
tiny-vLLM: high-performance LLM inference engine in C++ and CUDA

tiny-vLLM is an open-source LLM inference engine written in C++ and CUDA. Its source code is available on GitHub.

12 days ago