Defilantech / LLMKube
Description
Kubernetes operator for local LLM inference with llama.cpp, vLLM, TGI, and mlx-server. It supports multi-GPU NVIDIA and Apple Silicon Metal hardware, autoscaling, and air-gapped deployments, and is described by the project as production-ready.
Technical Specifications
| Field | Value |
| --- | --- |
| Core Language | |
| GitHub Authority | ⭐ 127 stars |
| Last Code Push | 2026-06-10 |
| Open Issues / Bugs | 🛠️ 43 bugs listed |
| License Type | Open-Source (Free to use) |
Get Source Code
This project is open-source and hosted on GitHub, where you can explore the repository, read the deployment guides, or fork the code.