2025 03 20 gemma speed up

Gemma - Speed Up

Using a slow image processor as use_fast is unset and a slow processor was saved with this model. use_fast=True will be the default behavior in v4.48, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with use_fast=False.

2025-03-20 00:20:03,894 - dhwani_api - INFO - LLM google/gemma-3-4b-it loaded on cuda with compiled forward pass

/usr/local/lib/python3.10/dist-packages/torch/_inductor/compile_fx.py:194: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled. Consider setting torch.set_float32_matmul_precision('high') for better performance.
  warnings.warn(

W0320 00:20:26.460000 1 torch/_inductor/utils.py:1137] [0/0] Not enough SMs to use max_autotune_gemm mode
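Two of the warnings above name their own fix: the TensorFloat32 warning suggests `torch.set_float32_matmul_precision('high')`, and the image-processor warning points at the `use_fast` flag. A minimal sketch of applying both at startup might look like the following. The helper names `enable_tf32_matmul` and `processor_kwargs` are hypothetical and not part of dhwani_api, and the sketch assumes the processor is loaded through `AutoProcessor.from_pretrained`. The last message (not enough SMs for `max_autotune_gemm`) appears to reflect the GPU's size rather than a setting, so it is not addressed here.

```python
MODEL_ID = "google/gemma-3-4b-it"  # model name taken from the log above


def enable_tf32_matmul() -> bool:
    """Apply the setting the TensorFloat32 warning suggests.

    Returns True if torch is importable and the setting took effect,
    False if torch is not installed.
    """
    try:
        import torch
    except ImportError:
        return False

    # "high" allows float32 matmuls to use TF32 tensor cores where available,
    # trading a little precision for speed.
    torch.set_float32_matmul_precision("high")
    return torch.get_float32_matmul_precision() == "high"


def processor_kwargs(use_fast: bool = True) -> dict:
    """Keyword arguments intended for AutoProcessor.from_pretrained (assumed loader)."""
    # Setting use_fast explicitly silences the "use_fast is unset" warning.
    return {"use_fast": use_fast}


if __name__ == "__main__":
    print("TF32 matmul enabled:", enable_tf32_matmul())
    # Loading the processor needs transformers and access to the model files:
    # from transformers import AutoProcessor
    # processor = AutoProcessor.from_pretrained(MODEL_ID, **processor_kwargs())
```

Call `enable_tf32_matmul()` before the model is compiled so the setting is in place when the compiled forward pass is built. As the log notes, the fast image processor can produce slightly different outputs from the slow one, so it is worth comparing results before switching.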