
2025 03 15 dhwani server routing v1

Dhwani - v1

Load balancer with CPU - upgrades

Route all queries to GPU instances.

Reply with a wait message while systems that are down start up.

Provide a system status button to get info about available services.

Host tiny systems on the load balancer.

Pure FastAPI server only.
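
A rough sketch of what the status button could return. Assumptions: the service names ("translate", "asr") and the field names are hypothetical, and the code is plain Python so it can sit behind any route handler.

```python
from dataclasses import dataclass


@dataclass
class ServiceState:
    name: str     # hypothetical service name, e.g. "translate"
    gpu_up: bool  # whether the GPU instance for this service is reachable


def system_status(services):
    """Summarise which services are available and which are still starting."""
    available = [s.name for s in services if s.gpu_up]
    starting = [s.name for s in services if not s.gpu_up]
    return {
        "available": available,
        "starting": starting,
        "all_ready": not starting,  # True only when nothing is starting
    }


if __name__ == "__main__":
    print(system_status([ServiceState("translate", True),
                         ServiceState("asr", False)]))
```

The app can show the "starting" list next to the wait message so the user knows which features to retry.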


Mobile app - server messages

Looks like our AI is taking a nap.

We are waking it up.
Please come back in 3 mins.

We will be ready to serve you.

Run CPU variants of all services.
Use them as a fallback system when GPU resources are unavailable.

Restart the GPU service on a service request.

Run smaller models on CPU services.
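
The GPU-first routing with CPU fallback could look roughly like this. Assumptions: `pick_backend` and the `request_gpu_restart` callback are hypothetical names, and health is passed in as plain booleans rather than probed here.

```python
def pick_backend(gpu_healthy, cpu_healthy, request_gpu_restart):
    """Decide which backend serves a request: "gpu", "cpu" or "unavailable".

    request_gpu_restart is a hypothetical callback that asks the GPU
    service to start; it is called only when the GPU is down.
    """
    if gpu_healthy:
        return "gpu"
    # GPU is down: request a restart so later requests can use it again.
    request_gpu_restart()
    if cpu_healthy:
        return "cpu"  # smaller model, used as the fallback
    return "unavailable"


if __name__ == "__main__":
    print(pick_backend(False, True, lambda: print("restart requested")))
```

Calling the restart before checking the CPU means the incoming request itself triggers the GPU wake-up, even when the CPU variant answers it.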

Server unavailability - graceful response

Handle it gracefully if a service is currently unavailable.

Provide a usable response to the app user.
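
One possible shape for the graceful response, as a sketch. Assumptions: the HTTP 503 status with a Retry-After header, the 180 second default (taken from the "3 mins" in the app message), and the field names are illustrative choices, not a settled design.

```python
WAIT_MESSAGE = (
    "Looks like our AI is taking a nap. "
    "We are waking it up. Please come back in 3 mins. "
    "We will be ready to serve you."
)


def unavailable_response(retry_after_seconds=180):
    """Build a response the app can display instead of a raw error."""
    return {
        "status_code": 503,  # Service Unavailable
        "headers": {"Retry-After": str(retry_after_seconds)},
        "body": {"status": "starting", "message": WAIT_MESSAGE},
    }


if __name__ == "__main__":
    print(unavailable_response())
```

The app only needs to read `body["message"]` to show the text, and can use the Retry-After value to schedule an automatic retry.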

--

Health check - evals

On startup, check outputs for basic commands.

Verify that the model returns correct results.
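
A minimal sketch of a startup eval. Assumptions: the model is any callable that maps a prompt string to an output string, `run_startup_evals` is a hypothetical helper, and exact string match stands in for whatever comparison the real evals need.

```python
def run_startup_evals(model, cases):
    """Run each (prompt, expected) pair through the model.

    Returns a list of (prompt, actual_or_error) for every failing case;
    an empty list means the service can be marked healthy.
    """
    failures = []
    for prompt, expected in cases:
        try:
            actual = model(prompt)
        except Exception as exc:  # a crash counts as a failed check
            failures.append((prompt, f"error: {exc}"))
            continue
        if actual != expected:
            failures.append((prompt, actual))
    return failures


if __name__ == "__main__":
    fake_model = str.upper  # stand-in for a real model call
    print(run_startup_evals(fake_model, [("hi", "HI"), ("ok", "no")]))
```

A service that returns any failures here could stay out of the routing pool until the checks pass.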


Analytics on usage

Focus on Android only.
Make it secure. Add logs, with an option to enable or disable logging.

Add an option to enable analytics for system improvement, mainly logs and translation.

Add an option for rating responses.

Add an RLHF option.
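
The logging toggle and response rating could be sketched as below. Assumptions: `UsageLog`, its in-memory `records` list, and the +1 / -1 rating scale are hypothetical; a real server would persist records and decide separately how ratings feed any RLHF data set.

```python
import time


class UsageLog:
    """In-memory usage log that can be switched on or off."""

    def __init__(self, enabled=False):
        self.enabled = enabled  # logging is off unless explicitly enabled
        self.records = []

    def record(self, event, **fields):
        """Store one event; returns False and stores nothing when disabled."""
        if not self.enabled:
            return False
        self.records.append({"ts": time.time(), "event": event, **fields})
        return True

    def rate_response(self, response_id, rating):
        """Record a thumbs up (+1) or thumbs down (-1) for a response."""
        if rating not in (1, -1):
            raise ValueError("rating must be 1 or -1")
        return self.record("rating", response_id=response_id, rating=rating)


if __name__ == "__main__":
    log = UsageLog(enabled=True)
    log.record("translation", chars=42)
    log.rate_response("resp-1", 1)
    print(len(log.records))
```

Keeping the default at disabled means nothing is collected until the user or operator opts in.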

--