
Local LLM Tools

Explore the highest-rated open-source tools for running large language models (LLMs) locally. Discover inference engines, GUI clients, and API wrappers for GGUF and safetensors models. Entries are sorted by GitHub authority and active contributions.

Kubernetes operator for local LLM inference with llama.cpp, vLLM, TGI, and mlx-server — multi-GPU

This repository provides a ready-to-use Google Colab notebook that turns Colab into a temporary

Run local LLMs from Hugging Face in React Native or Expo using onnxruntime.

Run open-source/open-weight LLMs locally with OpenAI-compatible APIs

Chat, RAG search, multi-step Plans workflows, MCP tools, and Agents integration. Supports OpenAI,

Project Jarvis is a versatile AI assistant that integrates various functionalities.

SmarterRouter: An intelligent LLM gateway and VRAM-aware router for Ollama, llama.cpp, and OpenAI.

M-Courtyard: Local AI Model Fine-tuning Assistant for Apple Silicon. Zero-code, zero-cloud,

Native LLM inference server for Apple Silicon. OpenAI + Anthropic API compatible. No Python.

Vibecode Editor is a fullstack, web-based IDE built with Next.js and Monaco Editor. It features

OpenAI-style, fast and lightweight local language model inference with documents

A Python package for developing AI applications with local LLMs.

Give a query, get a dataroom. Pi + self-hosted Qwen3.6 research harness on a single L4.

🤖 Visual AI agent workflow automation platform with local LLM integration - build intelligent

Private on-device AI chat for Android — runs any GGUF model locally via llama.cpp with

Orchestrate a swarm of Claude Code agents with a local brain that learns from you.

Local LLM Testing & Benchmarking for Apple Silicon

macOS menu bar app that exposes Apple's on-device Foundation Models as an OpenAI-compatible local

Custom TTS component for Home Assistant. Utilizes the OpenAI speech engine or any compatible

Open Source Local Data Analysis Assistant.

One-click Qwen3.6-27B inference on Windows. 158 tok/s on RTX 5090, 72 tok/s on RTX 3090. Native, no

Find the best models and how to run them locally.

Your fully proficient, AI-powered, local chatbot assistant 🤖

LLM story writer with a focus on high-quality long output based on a user-provided prompt.