GPT4All: Run Local LLMs on Any Device. Open-source and available for commercial use.
OpenVINO™ is an open source toolkit for optimizing and deploying AI inference
High-speed Large Language Model Serving for Local Deployment
FlashInfer: Kernel Library for LLM Serving
Kernels & AI inference engine for mobile devices.
Distributed LLM inference. Connect home devices into a powerful cluster to accelerate LLM inference. More devices means faster inference.
LLMs as Copilots for Theorem Proving in Lean
A highly optimized LLM inference acceleration engine for Llama and its variants.
A high-performance inference engine for LLMs, optimized for diverse AI accelerators.
Pure C++ implementation of several models for real-time chatting on your computer (CPU & GPU)
Yet Another Language Model: LLM inference in C++/CUDA, no libraries except for I/O
A high-performance inference system for large language models, designed for production environments.
A great hands-on project for campus recruiting (autumn/spring hiring) and internships: build an LLM inference framework from scratch that supports LLama2/3 and Qwen2.5.
DEEPPOWERS is a Fully Homomorphic Encryption (FHE) framework built for MCP (Model Context Protocol), aiming to provide end-to-end privacy protection and efficient computation for the upstream and downstream MCP ecosystem.
CPU inference for the DeepSeek family of large language models in C++
Run generative AI models on Sophgo BM1684X/BM1688.
Inferflow is an efficient and highly configurable inference engine for large language models (LLMs).
An acceleration library that supports arbitrary bit-width combinatorial quantization operations
LLM in Godot
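Most of the engines above expose their C++ cores through language bindings. As a concrete illustration, here is a minimal sketch of local inference through GPT4All's Python bindings, the first project in this list. The model filename is only an example from the GPT4All catalog; available names vary by release.

```python
# Minimal sketch: local inference via the gpt4all Python bindings.
# Assumes `pip install gpt4all`; the model file is downloaded on first use.
from gpt4all import GPT4All

# Example model name from the GPT4All catalog; substitute any GGUF model you have.
model = GPT4All("Meta-Llama-3-8B-Instruct.Q4_0.gguf")

with model.chat_session():
    # generate() runs entirely on the local machine (CPU or GPU).
    reply = model.generate("Explain KV caching in one sentence.", max_tokens=128)
    print(reply)
```

The other projects follow a similar pattern of a local runtime plus a thin API surface, though each has its own model formats and configuration.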