llm-d components

The llm-d ecosystem consists of multiple interconnected components that work together to provide distributed inference capabilities for large language models.

Latest Release: v0.3.0

Released: October 10, 2025

Components

| Component | Description | Repository | Documentation |
| --- | --- | --- | --- |
| Inference Scheduler | A scheduler that makes optimized routing decisions for inference requests to the llm-d inference framework. | llm-d/llm-d-inference-scheduler | View Docs |
| Modelservice | modelservice is a Helm chart that simplifies LLM deployment on llm-d by declaratively managing Kubernetes resources for serving base models. It enables reproducible, scalable, and tunable model deployments through modular presets and clean integration with llm-d ecosystem components (including vLLM, Gateway API Inference Extension, LeaderWorkerSet). | llm-d-incubation/llm-d-modelservice | View Docs |
| Routing Sidecar | A reverse proxy that redirects incoming requests to the prefill worker specified in the x-prefiller-host-port HTTP request header. | llm-d/llm-d-routing-sidecar | View Docs |
| Inference Sim | A lightweight vLLM simulator that emulates responses to the HTTP REST endpoints of vLLM. | llm-d/llm-d-inference-sim | View Docs |
| Infra | A Helm chart for deploying the gateway and gateway-related infrastructure assets for llm-d. | llm-d-incubation/llm-d-infra | View Docs |
| KV Cache Manager | This repository contains the llm-d-kv-cache-manager, a pluggable service designed to enable KV-Cache Aware Routing and lay the foundation for advanced, cross-node cache coordination in vLLM-based serving platforms. | llm-d/llm-d-kv-cache-manager | View Docs |
| Benchmark | This repository provides an automated workflow for benchmarking LLM inference using the llm-d stack. It includes tools for deployment, experiment execution, data collection, and teardown across multiple environments and deployment styles. | llm-d/llm-d-benchmark | View Docs |
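
The Routing Sidecar entry describes header-based forwarding: the target prefill worker is read from the x-prefiller-host-port request header. The following is a minimal sketch of that lookup step only, not the sidecar's actual code. The function name `pick_prefill_target`, the validation rules, the fallback to a default target, and the sample addresses are all assumptions made for illustration.

```python
from typing import Mapping, Optional
from urllib.parse import urlsplit

# Header name as given in the Routing Sidecar description.
PREFILL_HEADER = "x-prefiller-host-port"


def pick_prefill_target(
    headers: Mapping[str, str], default: Optional[str] = None
) -> Optional[str]:
    """Return the "host:port" named by the header, or `default` if absent/invalid.

    Hypothetical helper: the fallback behaviour is an assumption, not
    documented llm-d behaviour.
    """
    # HTTP header names are case-insensitive, so normalise before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    raw = lowered.get(PREFILL_HEADER, "").strip()
    if not raw:
        return default
    try:
        # Parse the value as a bare network location ("//host:port").
        parts = urlsplit("//" + raw)
        port = parts.port  # raises ValueError for non-numeric or out-of-range ports
    except ValueError:
        return default
    # Reject values with no host, no port, or trailing path/query components.
    if not parts.hostname or port is None or parts.netloc != raw:
        return default
    return raw


if __name__ == "__main__":
    # Sample address is made up for the example.
    print(pick_prefill_target({"X-Prefiller-Host-Port": "10.0.0.7:8000"}))
```

A real proxy would then forward the request body and remaining headers to the returned address. That forwarding step is left out here because its details depend on the sidecar's own implementation.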

Getting Started

Each component has its own detailed documentation page accessible from the links above. For a comprehensive view of how these components work together, see the main Architecture Overview.

Previous Releases

For information about previous versions and their features, visit the GitHub Releases page.

Contributing

To contribute to a component, visit its repository and follow the contribution guidelines there. Each component maintains its own development workflow and contribution process.