llm-d components
The llm-d ecosystem consists of multiple interconnected components that work together to provide distributed inference capabilities for large language models.
Latest Release: v0.3.0
Released: October 10, 2025
Components
| Component | Description | Repository | Documentation |
|---|---|---|---|
| Inference Scheduler | The scheduler that makes optimized routing decisions for inference requests within the llm-d inference framework. | llm-d/llm-d-inference-scheduler | View Docs |
| Modelservice | A Helm chart that simplifies LLM deployment on llm-d by declaratively managing the Kubernetes resources needed to serve base models. It enables reproducible, scalable, and tunable model deployments through modular presets and clean integration with llm-d ecosystem components (including vLLM, the Gateway API Inference Extension, and LeaderWorkerSet). | llm-d-incubation/llm-d-modelservice | View Docs |
| Routing Sidecar | A reverse proxy that redirects incoming requests to the prefill worker specified in the x-prefiller-host-port HTTP request header (see the example below the table). | llm-d/llm-d-routing-sidecar | View Docs |
| Inference Sim | A lightweight vLLM simulator that emulates responses to vLLM's HTTP REST endpoints (see the second example below the table). | llm-d/llm-d-inference-sim | View Docs |
| Infra | A Helm chart for deploying the gateway and related infrastructure assets for llm-d. | llm-d-incubation/llm-d-infra | View Docs |
| KV Cache Manager | A pluggable service that enables KV-cache-aware routing and lays the foundation for advanced, cross-node cache coordination in vLLM-based serving platforms. | llm-d/llm-d-kv-cache-manager | View Docs |
| Benchmark | An automated workflow for benchmarking LLM inference on the llm-d stack, including tools for deployment, experiment execution, data collection, and teardown across multiple environments and deployment styles. | llm-d/llm-d-benchmark | View Docs |
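For illustration, the sketch below shows how a client (or an upstream router) might steer a request to a specific prefill worker through the Routing Sidecar by setting the `x-prefiller-host-port` header. The sidecar address, worker host and port, model name, and endpoint path are assumptions made for this example; consult the llm-d-routing-sidecar repository for the authoritative interface.

```python
import requests

# Minimal sketch, assuming the Routing Sidecar listens locally on port 8000
# and fronts an OpenAI-style completions endpoint. The worker address,
# model name, and path below are illustrative, not values defined by llm-d.
SIDECAR_URL = "http://localhost:8000/v1/completions"

response = requests.post(
    SIDECAR_URL,
    # The sidecar reads this header to pick the target prefill worker.
    headers={"x-prefiller-host-port": "prefill-worker-0:8080"},
    json={
        "model": "example-model",
        "prompt": "Hello, llm-d!",
        "max_tokens": 16,
    },
)
print(response.status_code, response.json())
```

Similarly, because the Inference Sim emulates vLLM's HTTP REST endpoints, an OpenAI-style request should receive a simulated response, which makes it useful for testing routing and scheduling without GPUs. The port, path, and model name in this sketch are assumptions for the example.

```python
import requests

# Minimal sketch, assuming a local llm-d-inference-sim instance on port 8000.
# It emulates vLLM's REST API, so a chat-completions request returns a
# simulated (not model-generated) response.
SIM_URL = "http://localhost:8000/v1/chat/completions"

response = requests.post(
    SIM_URL,
    json={
        "model": "example-model",
        "messages": [{"role": "user", "content": "ping"}],
        "max_tokens": 8,
    },
)
print(response.json())
```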
Getting Started
Each component has its own detailed documentation page accessible from the links above. For a comprehensive view of how these components work together, see the main Architecture Overview.
Quick Links
- Main llm-d Repository - Core platform and orchestration
- llm-d-incubation Organization - Experimental and supporting components
- Latest Release - v0.3.0
- All Releases - Complete release history
Previous Releases
For information about previous versions and their features, visit the GitHub Releases page.
Contributing
To contribute to any of these components, visit their respective repositories and follow their contribution guidelines. Each component maintains its own development workflow and contribution process.