csm.rs: Blazing-fast Rust Conversational Speech Model
A few weeks ago, we introduced azzurra-voice, our open, state-of-the-art Italian Text-to-Speech (TTS) model. It was the first step toward our vision for the Azzurra Project: a private, personal, and empathetic AI companion that runs locally on your own device.
A beautiful voice is just one part of the equation. To create a truly interactive and real-time experience, that voice needs a powerful engine—one that can generate speech instantly, without compromising on quality or sending data to the cloud.
Today, we’re releasing the next crucial piece of the Azzurra Project:
csm.rs, a blazing-fast, open-source TTS inference engine written in Rust.
What is csm.rs?
csm.rs is a high-performance Rust implementation of Sesame’s Conversational Speech Model (CSM), the base model behind azzurra-voice, built on the powerful candle machine learning framework. It’s designed from the ground up for one thing: raw performance for real-time streaming TTS.
This is the engine that will power azzurra-voice, and any other csm-1b compatible model, in our upcoming Azzurra-Pipeline.
The engine is built around several key features that make it ideal for local AI:

- Blazing fast: built in Rust on candle to deliver the high-throughput, low-latency performance needed for natural, real-time conversation.
- Extremely efficient: supports GGUF-based q8_0 and q4_k quantization, letting you run large TTS models with a significantly smaller memory footprint — perfect for consumer hardware.
- Runs virtually anywhere: thanks to candle, it supports multiple hardware backends, including MKL (Intel), Accelerate (macOS), CUDA/cuDNN (NVIDIA), and Metal (Apple Silicon).
- Seamless integration: a built-in web server exposes an OpenAI-compatible API, so csm.rs can act as a drop-in replacement for existing TTS services.
- Broad model support: natively handles the original sesame/csm-1b weights as well as fine-tuned models from the Hugging Face Hub, like our own azzurra-voice.
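To put the quantization savings in rough numbers: a minimal back-of-the-envelope sketch, assuming the commonly cited average bit widths for GGUF formats (q8_0 stores a scale per block, averaging about 8.5 bits per weight; q4_k comes to roughly 4.5 bits per weight — approximations, not exact on-disk sizes) applied to a 1B-parameter model like csm-1b:

```python
def approx_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Rough model memory footprint in GB for an average bits-per-weight."""
    return n_params * bits_per_weight / 8 / 1e9

# Approximate average bits per weight; the q8_0 and q4_k figures are rough
# GGUF averages, ignoring per-block scale layout details.
N_PARAMS = 1e9  # a 1B-parameter model such as csm-1b
for fmt, bpw in [("f16", 16.0), ("q8_0", 8.5), ("q4_k", 4.5)]:
    print(f"{fmt}: ~{approx_size_gb(N_PARAMS, bpw):.2f} GB")
# f16: ~2.00 GB, q8_0: ~1.06 GB, q4_k: ~0.56 GB
```

So q4_k brings a model that needs about 2 GB of weights in fp16 down to well under 1 GB, which is what makes running it comfortably on consumer hardware plausible.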
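Because the server is OpenAI-compatible, any client that speaks the OpenAI text-to-speech API shape should work against it. The sketch below is a hypothetical example: the base URL, port, and request fields are assumptions modeled on OpenAI's `/v1/audio/speech` endpoint — check the csm.rs README for the actual server address and accepted parameters.

```python
import json
import urllib.request

# Assumption: a local csm.rs server listening here; adjust to your setup.
BASE_URL = "http://localhost:8080"

def build_speech_request(text: str, model: str = "azzurra-voice") -> urllib.request.Request:
    """Build an OpenAI-style text-to-speech request for a local csm.rs server."""
    body = json.dumps({"model": model, "input": text}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/v1/audio/speech",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# With a running server, fetch the generated audio like this:
# with urllib.request.urlopen(build_speech_request("Ciao, come stai?")) as resp:
#     with open("out.wav", "wb") as f:
#         f.write(resp.read())
```

The drop-in property is the point: swapping an existing cloud TTS backend for csm.rs should mostly be a matter of changing the base URL.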
Why We Built a New Engine
The goal of the Azzurra Project is to create a fully local conversational agent that listens, thinks, and speaks on your personal computer. For a conversation to feel natural, the delay between you speaking and the agent responding must be minimal. Traditional Python-based inference frameworks, while excellent for research, often carry overhead that makes achieving this ultra-low latency a challenge.
By building csm.rs in Rust, we gain the low-level control needed to squeeze every last drop of performance out of the hardware. This ensures the voice generation step is never the bottleneck, paving the way for a truly fluid and responsive AI companion.
Open Source for a Private Future
We are releasing csm.rs under the GNU Affero General Public License (AGPL) v3. We believe that foundational tools for private, personal AI should be open, transparent, and available to everyone. We encourage developers, researchers, and hobbyists to use it, learn from it, and contribute back to the project.
What’s Next?
With the voice (azzurra-voice) and now the engine (csm.rs) released, we are another step closer to realizing the complete Azzurra Project. The next component we’re focused on is azzurra-brain, a conversational Italian large language model designed to be a thoughtful companion.
Stay tuned as we continue to build the future of private, empathetic AI.
Federico Galatolo, Cartesia CTO