tiny-vLLM: high-performance LLM inference engine in C++ and CUDA

Launch · Developers

12 days ago

tiny-vLLM is an open-source LLM inference engine written in C++ and CUDA. The source code is available on GitHub.
