Skip to content

dwani.ai > Technical report

Dhwani.ai Technical Report

Release Date: April 21, 2025
Product: Dhwani.ai Android App


Hardware Sizing

The Dhwani.ai server configurations are optimized for different scales of deployment, utilizing various GPU and CPU hardware. Below is the hardware sizing for different server configurations:

Server Size Hardware Configuration
Large H100 2 x Gemma3-27B-Instruct (4-bit) + 3x IndicTrans2-1B + IndicConformer Multilingual + IndicF5 + 2x TTS/IndicF5 + 1x LLM for PDF
Large A100 Gemma3-27B-Instruct (4-bit) + 3x IndicTrans2-1B + IndicConformer Multilingual + IndicF5
Medium 1x L40S Gemma3-12B-Instruct (4-bit) + 3x IndicTrans2-200M (distilled) + IndicConformer Multilingual + IndicF5
Medium 1x L4 Gemma3-12B-Instruct (4-bit) + 3x IndicTrans2-200M (distilled) + IndicConformer Multilingual + IndicF5
Small 1x T4 Gemma3-4B-Instruct (4-bit) + 3x IndicTrans2-200M (distilled) + IndicConformer Multilingual + IndicF5
Small Local/CPU Gemma3-4B-Instruct (4-bit) + 3x IndicTrans2-200M (distilled) + IndicConformer Multilingual + IndicF5

Model Configurations

The dwani.ai platform uses a combination of quantized large language models (LLMs), multilingual speech models, and translation models optimized for Indic languages. The configurations are tailored to server sizes:

Large Server

  • LLM: google/gemma-3-27b-it (4-bit quantized)
  • Speech Model: ai4bharat/indic-conformer-600m-multilingual
  • Translation Models:
  • ai4bharat/indictrans2-en-indic-1B
  • ai4bharat/indictrans2-indic-en-1B
  • ai4bharat/indictrans2-indic-indic-1B
  • TTS Model: ai4bharat/indicf5

Medium Server

  • LLM: google/gemma-3-12b-it (4-bit quantized)
  • Speech Model: ai4bharat/indic-conformer-600m-multilingual
  • Translation Models:
  • ai4bharat/indictrans2-en-indic-dist-200M
  • ai4bharat/indictrans2-indic-en-dist-200M
  • ai4bharat/indictrans2-indic-indic-dist-320M
  • TTS Model: ai4bharat/indicf5

Small Server

  • LLM: google/gemma-3-4b-it (4-bit quantized)
  • Speech Model: ai4bharat/indic-conformer-600m-multilingual
  • Translation Models:
  • ai4bharat/indictrans2-en-indic-dist-200M
  • ai4bharat/indictrans2-indic-en-dist-200M
  • ai4bharat/indictrans2-indic-indic-dist-320M
  • TTS Model: ai4bharat/indicf5

Evaluation Details

The following models were evaluated for performance and accuracy:

  • LLM: google/gemma-3-27b-it
  • Speech Model: ai4bharat/indic-conformer-600m-multilingual
  • Translation Models:
  • full precision
  • ai4bharat/indictrans2-en-indic-1B
  • ai4bharat/indictrans2-indic-en-1B
  • ai4bharat/indictrans2-indic-indic-1B
  • distilled
  • ai4bharat/indictrans2-en-indic-dist-200M
  • ai4bharat/indictrans2-indic-en-dist-200M
  • ai4bharat/indictrans2-indic-indic-dist-320M
  • TTS Model: ai4bharat/indicf5

Key Observations: - The models successfully transcribe and translate Kannada (kan_Knda) speech inputs. - Translation accuracy varies, with notable errors in mapping Indian locations (e.g., Hubli to New York). - Response generation is contextually limited, leading to irrelevant responses in some cases.


Performance Logs

The provided logs highlight two key interactions with the Dhwani.ai server on April 24, 2025, using the speech-to-speech endpoint (/v1/speech_to_speech).

Log 1: Incorrect Translation (00:55:04 - 00:55:56)

  • Input (Kannada): "ಹುಬ್ಬಳ್ಳಿಯಿಂದ ಬೆಂಗಳೂರು ಯಾವ ಟ್ರೈನ್ ತೊಗೋಬೇಕು" (Which train should I take from Hubli to Bangalore?)
  • Transcription: Correctly transcribed.
  • Translation to English: Incorrectly translated as "What train should I take from New York to New York?"
  • Generated Response (English): "This question is irrelevant as I am designed to provide information about India and Karnataka, and there are no trains running from New York to New York."
  • Translated Response (Kannada): Correctly translated back to Kannada but irrelevant due to the initial translation error.
  • Processing Time: 12.282 seconds (00:55:04 - 00:55:16)

Issues: - The translation model (likely IndicTrans2-200M) failed to recognize "Hubli" and "Bangalore," mapping them to "New York." - The response generation model rejected the query due to the incorrect translation, resulting in an irrelevant response. - Processing time is relatively high, indicating potential bottlenecks in the pipeline.

Log 2: Correct Translation (01:08:30 - 01:08:37)

  • Input (Kannada): "ಹುಬ್ಬಳ್ಳಿಯಿಂದ ಬೆಂಗಳೂರು ಯಾವ ಟ್ರೈನ್ ತಗೋಬೇಕು" (Which train should I take from Hubli to Bangalore?)
  • Transcription: Correctly transcribed.
  • Translation to English: Correctly translated as "What train to take from Hubli to Bangalore?"
  • Generated Response (English): "To travel from Hubli to Bangalore, you can consider the Rani Chennamma Express."
  • Translated Response (Kannada): Correctly translated as "ಹುಬ್ಬಳ್ಳಿಯಿಂದ ಬೆಂಗಳೂರಿಗೆ ಪ್ರಯಾಣಿಸಲು, ನೀವು ರಾಣಿ ಚೆನ್ನಮ್ಮ ಎಕ್ಸ್ಪ್ರೆಸ್ ಅನ್ನು ಪರಿಗಣಿಸಬಹುದು."
  • Processing Time: 8.482 seconds (01:08:30 - 01:08:37)

Observations: - The translation model (likely IndicTrans2-1B, given the improved accuracy) correctly identified Indian locations. - The response was accurate and contextually relevant, recommending a specific train. - Processing time improved significantly (8.482 seconds vs. 12.282 seconds), likely due to the use of a larger model or optimized pipeline.