GPT4All: Run Local LLMs on Any Device. Open-source and available for commercial use.
OpenVINO™ is an open source toolkit for optimizing and deploying AI inference
High-speed Large Language Model Serving for Local Deployment
FlashInfer: Kernel Library for LLM Serving
Kernels & AI inference engine for mobile devices.
Distributed LLM inference. Connect home devices into a powerful cluster to accelerate LLM inference. More devices means faster inference.
LLMs as Copilots for Theorem Proving in Lean
A highly optimized LLM inference acceleration engine for Llama and its variants.
A high-performance inference engine for LLMs, optimized for diverse AI accelerators.
Pure C++ implementation of several models for real-time chatting on your computer (CPU & GPU)
Yet Another Language Model: LLM inference in C++/CUDA, no libraries except for I/O
A high-performance inference system for large language models, designed for production environments.
A great hands-on project for campus recruiting (autumn/spring hiring) and internships: build an LLM inference framework from scratch that supports LLama2/3 and Qwen2.5.
DEEPPOWERS is a Fully Homomorphic Encryption (FHE) framework built for MCP (Model Context Protocol), aiming to provide end-to-end privacy protection and efficient computation for the upstream and downstream MCP ecosystem.
CPU inference for the DeepSeek family of large language models in C++
Run generative AI models on Sophgo BM1684X/BM1688.
Inferflow is an efficient and highly configurable inference engine for large language models (LLMs).
An acceleration library that supports arbitrary bit-width combinatorial quantization operations
LLM in Godot
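Most of the engines above expose their C++ cores through language bindings. As a concrete illustration, here is a minimal sketch of local inference through GPT4All's Python bindings, the first project in this list. The model filename is only an example from the GPT4All catalog; available names vary by release.

```python
# Minimal sketch: local inference via the gpt4all Python bindings.
# Assumes `pip install gpt4all`; the model file is downloaded on first use.
from gpt4all import GPT4All

# Example model name from the GPT4All catalog; substitute any GGUF model you have.
model = GPT4All("Meta-Llama-3-8B-Instruct.Q4_0.gguf")

with model.chat_session():
    # generate() runs entirely on the local machine (CPU or GPU).
    reply = model.generate("Explain KV caching in one sentence.", max_tokens=128)
    print(reply)
```

The other projects follow a similar pattern of a local runtime plus a thin API surface, though each has its own model formats and configuration.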