2025-03-15 Dhwani server routing v1
Dhwani - v1
Load balancer with CPU - upgrades
Route all queries to GPU instances.
Reply with a wait message while systems that are down start up.
Provide a system status button to get info regarding available services.
Host tiny systems on the load balancer.
Pure FastAPI server only.
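A minimal sketch of the GPU-first routing and the status summary, in plain Python so it can later sit behind a FastAPI route. The names Backend, pick_gpu and system_status and the "gpu-1" style labels are hypothetical, not part of Dhwani. A None result means the caller should reply with the wait message.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Backend:
    name: str      # hypothetical label such as "gpu-1"
    kind: str      # "gpu" or "cpu"
    healthy: bool  # outcome of the most recent health probe


def pick_gpu(backends: List[Backend]) -> Optional[Backend]:
    """Return the first healthy GPU backend, or None if every GPU backend is down."""
    for backend in backends:
        if backend.kind == "gpu" and backend.healthy:
            return backend
    return None


def system_status(backends: List[Backend]) -> Dict[str, str]:
    """Summary a status button could display: backend name -> availability."""
    return {b.name: "available" if b.healthy else "unavailable" for b in backends}
```

Keeping the selection logic free of framework code means it can be unit tested without starting a server.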
Mobile app - server messages
Looks like our AI is taking a nap.
We are waking it up.
Please come back in 3 mins.
We will be ready to serve you.
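One possible way to deliver this message to the app, sketched under assumptions: the server answers with HTTP 503 and a Retry-After header (a standard HTTP header) set to 180 seconds to match "3 mins". The payload field names and the wait_response helper are hypothetical.

```python
import json
from typing import Dict, Tuple

# Message lines taken from the notes above.
WAIT_LINES = [
    "Looks like our AI is taking a nap.",
    "We are waking it up.",
    "Please come back in 3 mins.",
    "We will be ready to serve you.",
]


def wait_response(retry_after_s: int = 180) -> Tuple[int, Dict[str, str], str]:
    """Build (status_code, headers, body) for a service that is still starting."""
    body = json.dumps(
        {
            "status": "waking_up",            # assumed field name
            "message": " ".join(WAIT_LINES),
            "retry_after_s": retry_after_s,   # assumed field name
        }
    )
    headers = {
        "Content-Type": "application/json",
        "Retry-After": str(retry_after_s),
    }
    return 503, headers, body
```

The app can show the message field as-is and use retry_after_s to decide when to try again.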
Run CPU variants of all services.
Use them as a fallback system when GPU resources are unavailable.
Restart the GPU service on service request.
Run smaller models on CPU services.
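The fallback idea above could look roughly like this. It is a sketch, assuming the GPU and CPU services are reachable as plain callables and that a failed GPU call raises ConnectionError or TimeoutError. The names answer_with_fallback and request_gpu_restart are hypothetical.

```python
from typing import Callable, Dict


def answer_with_fallback(
    query: str,
    gpu_call: Callable[[str], str],
    cpu_call: Callable[[str], str],
    request_gpu_restart: Callable[[], None],
) -> Dict[str, str]:
    """Try the GPU service first; on failure request a restart and use the CPU variant."""
    try:
        return {"served_by": "gpu", "result": gpu_call(query)}
    except (ConnectionError, TimeoutError):
        # The incoming request doubles as the trigger to bring the GPU service back.
        request_gpu_restart()
        # The smaller CPU model answers in the meantime.
        return {"served_by": "cpu", "result": cpu_call(query)}
```

Returning served_by lets the app or the logs show when a reply came from the smaller CPU model.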
Server unavailability - graceful response
Handle it gracefully if a service is currently not available.
Provide a usable response to the app user.
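A small sketch of graceful handling, assuming service calls fail with ConnectionError or TimeoutError when the backend is down. The graceful decorator and the response fields are hypothetical. The point is that the app always receives a readable payload instead of a raw error.

```python
import functools
from typing import Any, Callable, Dict


def graceful(service_name: str) -> Callable:
    """Wrap a service call so outages become a user-facing message."""

    def decorator(fn: Callable[..., Any]) -> Callable[..., Dict[str, Any]]:
        @functools.wraps(fn)
        def wrapper(*args: Any, **kwargs: Any) -> Dict[str, Any]:
            try:
                return {"ok": True, "data": fn(*args, **kwargs)}
            except (ConnectionError, TimeoutError):
                return {
                    "ok": False,
                    "service": service_name,
                    "message": f"{service_name} is currently unavailable. "
                               "Please try again shortly.",
                }

        return wrapper

    return decorator
```

Only outage-type errors are caught here. Other exceptions still surface, so real bugs are not hidden behind the friendly message.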
--
Health check - evals
On startup, check outputs for basic commands.
Verify that the model returns correct results.
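The startup check could be sketched as a list of (input, expected output) pairs run once before the service accepts traffic. The function name run_startup_evals is hypothetical, and the case-insensitive exact match used here is an assumption; real model outputs may need a looser comparison.

```python
from typing import Callable, List, Tuple


def run_startup_evals(
    model: Callable[[str], str],
    cases: List[Tuple[str, str]],
) -> List[Tuple[str, str]]:
    """Run each basic command through the model and collect the failures.

    Returns a list of (prompt, what_went_wrong); an empty list means healthy.
    """
    failures = []
    for prompt, expected in cases:
        try:
            got = model(prompt)
        except Exception as exc:  # a crash counts as a failed check
            failures.append((prompt, f"error: {exc}"))
            continue
        if got.strip().lower() != expected.strip().lower():
            failures.append((prompt, got))
    return failures
```

A service would be marked available only when the returned list is empty.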
Analytics on usage
Focus on Android only.
Make it secure.
Add logs, with an option to enable/disable logging.
Add/enable analytics for system improvement, mainly logs and translation.
Add an option for rating responses.
Add an RLHF option.
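A rough sketch of switchable analytics with response ratings, assuming ratings are integers from 1 to 5 and records are stored as JSON lines. Feedback, AnalyticsLog and the field names are hypothetical. Stored ratings like these could later serve as input for an RLHF-style pipeline, which is not shown here.

```python
import json
from dataclasses import asdict, dataclass
from typing import TextIO


@dataclass
class Feedback:
    request_id: str  # hypothetical identifier of the rated response
    rating: int      # assumed scale: 1 (bad) to 5 (good)
    comment: str = ""


class AnalyticsLog:
    """Writes feedback as JSON lines; does nothing while disabled."""

    def __init__(self, sink: TextIO, enabled: bool = True) -> None:
        self.sink = sink
        self.enabled = enabled

    def record(self, feedback: Feedback) -> bool:
        """Store one rating. Returns True if it was written."""
        if not self.enabled:
            return False
        if not 1 <= feedback.rating <= 5:
            raise ValueError("rating must be between 1 and 5")
        self.sink.write(json.dumps(asdict(feedback)) + "\n")
        return True
```

The enabled flag covers the enable/disable requirement. Any text stream works as the sink, for example an open file or io.StringIO in tests.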
--