2025-07-07 GH200 arm64 tooling

vLLM and TensorRT-LLM do not work on the arm64 architecture without significant installation steps.

The Docker images on NGC did not support vLLM or TensorRT-LLM on arm64 when I verified last month. I have to check it again this week.
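One quick way to re-check is to inspect the image manifest and see whether an arm64 variant is published. A minimal sketch, assuming a placeholder NGC image tag (substitute the actual vLLM/TensorRT-LLM image):

```python
# Sketch: check whether an NGC image publishes an arm64 (aarch64) variant.
import json
import subprocess

IMAGE = "nvcr.io/nvidia/pytorch:24.06-py3"  # placeholder tag, adjust as needed


def image_architectures(image: str) -> list[str]:
    """Return the architectures listed in the image's manifest."""
    out = subprocess.run(
        ["docker", "manifest", "inspect", image],
        capture_output=True, text=True, check=True,
    ).stdout
    data = json.loads(out)
    # Multi-arch images expose a manifest list; single-arch images do not.
    if "manifests" in data:
        return [m["platform"]["architecture"] for m in data["manifests"]]
    return [data.get("architecture", "unknown")]


if __name__ == "__main__":
    archs = image_architectures(IMAGE)
    print(f"{IMAGE}: {archs}")
    print("arm64 supported" if "arm64" in archs else "no arm64 variant published")
```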

We got the vLLM Python library and Docker container running for arm64 GPUs (GH200):

https://github.com/dwani-ai/vllm-arm64
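A minimal smoke test of the kind we use to confirm the install is really running on aarch64 with CUDA (a sketch, not taken from the repo):

```python
# Sketch: sanity-check a vLLM install on an arm64 (aarch64) GPU host such as GH200.
import platform

import torch
import vllm

assert platform.machine() == "aarch64", f"not an arm64 host: {platform.machine()}"
assert torch.cuda.is_available(), "CUDA not visible to PyTorch"

print("machine:", platform.machine())
print("gpu    :", torch.cuda.get_device_name(0))
print("vllm   :", vllm.__version__)
```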

The changes need to be pushed upstream to vLLM, but without benchmarking the results, the vLLM maintainers might not accept them.
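Even a rough throughput number would help the upstream discussion. A sketch of the kind of measurement we have in mind (model choice and prompts are placeholders; comparable numbers should come from vLLM's own benchmark scripts):

```python
# Sketch: rough offline throughput benchmark with vLLM on GH200.
import time

from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")  # placeholder model choice
prompts = ["Explain the GH200 memory architecture."] * 64
params = SamplingParams(temperature=0.0, max_tokens=128)

start = time.perf_counter()
outputs = llm.generate(prompts, params)
elapsed = time.perf_counter() - start

generated = sum(len(o.outputs[0].token_ids) for o in outputs)
print(f"{generated} tokens in {elapsed:.1f}s -> {generated / elapsed:.1f} tok/s")
```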

We have to set up arm64 GitHub Actions runners to build the wheels for vLLM.
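Once an arm64 runner produces wheels, a quick check that they actually carry the aarch64 platform tag could look like this (the `dist/` path is an assumption, as with `python -m build` or `pip wheel`):

```python
# Sketch: verify that built wheels are tagged for arm64 (linux/manylinux aarch64).
from pathlib import Path

wheels = sorted(Path("dist").glob("*.whl"))
if not wheels:
    raise SystemExit("no wheels found in dist/")

for whl in wheels:
    # The platform tag is the last dash-separated field of the wheel filename.
    plat = whl.stem.split("-")[-1]
    ok = "aarch64" in plat
    print(f"{whl.name}: platform tag {plat} -> {'ok' if ok else 'NOT arm64'}")
```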

PyTorch builds for arm64 are currently available only as nightly builds.
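A quick heuristic check that the installed PyTorch is in fact a CUDA-enabled nightly on aarch64 (nightly wheel versions carry a `.dev` date tag):

```python
# Sketch: confirm the installed PyTorch is a nightly (dev) build with CUDA on aarch64.
import platform

import torch

is_nightly = ".dev" in torch.__version__   # nightly wheels are versioned like 2.x.0.devYYYYMMDD
has_cuda = torch.version.cuda is not None  # None for CPU-only wheels

print("torch  :", torch.__version__)
print("cuda   :", torch.version.cuda)
print("machine:", platform.machine())

if platform.machine() == "aarch64" and not (is_nightly and has_cuda):
    print("warning: this does not look like a CUDA-enabled nightly arm64 build")
```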

We would like to get access to GH200/B200 to test the combinations before creating a PR.