Install IndexTTS2 Locally

IndexTTS2 is an industrial-level, controllable, and efficient zero-shot Text-to-Speech (TTS) system developed by Bilibili and officially open-sourced on September 8, 2025. It converts text into natural, fluent speech and supports both Chinese and English.

Key Features and Upgrades from Version 1.x

Compared with Version 1.x, IndexTTS2 adds the following features and upgrades:

  • Precise Duration Control: IndexTTS2 achieves precise duration control in an autoregressive architecture for the first time, supporting two generation modes. One mode enables accurate duration control by explicitly specifying the number of tokens to generate, while the other allows free generation while preserving the prosodic features of the input prompt. In contrast, Version 1.x offers no duration control. This advantage makes IndexTTS2 particularly effective in scenarios requiring strict audio-visual synchronization (such as film and television dubbing), with an audio-visual synchronization error of less than 0.02%.

  • Decoupling of Timbre and Emotion: The model decouples emotional features from speaker timbre, allowing users to independently specify the source of timbre and the source of emotion. For example, users can retain the timbre from one audio clip and assign emotion using another audio clip with different emotions or a text description. Under zero-shot conditions, the model can accurately reproduce the target timbre while fully restoring the specified emotion. Version 1.x, however, lacked this capability, resulting in less flexibility in combining emotional expression and timbre.

  • Multiple Emotion Control Methods: IndexTTS2 introduces four new emotion control methods: using an emotional reference audio, controlling via an emotion vector, controlling via emotional descriptive text, and the default method (using the same reference audio as the timbre source). Users can choose different methods based on their needs to precisely regulate the emotional expression of the synthesized speech. In comparison, Version 1.x had relatively limited emotion control options.

  • Text-Driven Emotion Control: It incorporates a built-in T2E (Text-to-Emotion) module, fine-tuned from the Qwen3 model. This module converts natural language descriptions into emotion vectors, so users can drive the emotional expression of synthesized speech simply by entering a text description (e.g., "questioning angrily"). This significantly lowers the barrier to use, whereas Version 1.x likely lacked such a convenient text-driven emotion control function.

  • Integration of GPT Latent Representations: IndexTTS2 integrates GPT latent representations and designs a three-stage training strategy. This enhances the stability and clarity of speech in high-emotion scenarios, addresses issues of insufficient data and overfitting, and makes the synthesized results more natural and fluent. Version 1.x, by contrast, might have had problems such as unclear articulation when expressing strong emotions.

  • Performance Improvements: Multi-dataset experiments show that IndexTTS2 outperforms current state-of-the-art zero-shot TTS models in terms of word error rate (WER), speaker similarity, and emotion fidelity. For instance, the word error rate of IndexTTS2 is 1.883%, compared to 1.921% for Version 1.x, a reduction of 0.038 percentage points.
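The figures quoted in the list above can be made concrete with a little arithmetic. The sketch below computes the absolute and relative WER reduction from the two published values (about 0.038 percentage points, or roughly 1.98% relative to Version 1.x). It also converts the 0.02% synchronization bound into milliseconds for a given clip length, under the assumption that the percentage is measured against the clip duration; the source does not state this. The function names are illustrative and are not part of IndexTTS2.

```python
def wer_reduction(old_wer_pct: float, new_wer_pct: float) -> tuple[float, float]:
    """Return (absolute reduction in percentage points, relative reduction in %)."""
    absolute = old_wer_pct - new_wer_pct
    relative = absolute / old_wer_pct * 100
    return absolute, relative


def max_sync_error_ms(clip_seconds: float, error_pct: float = 0.02) -> float:
    """Upper bound on sync error in milliseconds.

    Assumption: the 0.02% figure is a fraction of the clip duration.
    """
    return clip_seconds * 1000 * error_pct / 100


if __name__ == "__main__":
    absolute, relative = wer_reduction(old_wer_pct=1.921, new_wer_pct=1.883)
    print(f"WER reduction: {absolute:.3f} points ({relative:.2f}% relative)")
    print(f"Sync error bound for a 10 s clip: {max_sync_error_ms(10):.1f} ms")
```

The relative figure shows that the WER gain over Version 1.x is modest; the larger differences between the two versions lie in duration and emotion control.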

System Requirements

  • RAM: 16GB minimum. Storage: 24GB or more recommended.
  • macOS 11+: Intel and Apple M-series Macs supported.
  • Windows 10/11: Intel/AMD GPUs supported, NVIDIA GPU recommended.
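Before installing, the requirements above can be checked with a short script. This is a minimal sketch using only the Python standard library: it checks the operating system version and free disk space, but not RAM, because the standard library has no portable way to read it. The function names are illustrative, and treating "GB" as decimal gigabytes is an assumption, since the source does not specify.

```python
import platform
import shutil

GB = 10 ** 9  # assumption: decimal gigabytes


def storage_ok(free_bytes: int, required_gb: int = 24) -> bool:
    """True if free space meets the recommended 24GB."""
    return free_bytes >= required_gb * GB


def os_ok(system: str, version: str) -> bool:
    """True for macOS 11+ or Windows 10/11, as listed above."""
    if system == "Darwin":
        major = version.split(".")[0]
        return major.isdigit() and int(major) >= 11
    if system == "Windows":
        return version in {"10", "11"}
    return False


def check_this_machine(path: str = ".") -> dict:
    """Check the current OS and the free space on the drive holding `path`."""
    system = platform.system()
    # mac_ver() gives the macOS version; release() gives "10" or "11" on Windows.
    version = platform.mac_ver()[0] if system == "Darwin" else platform.release()
    return {
        "os_ok": os_ok(system, version),
        "storage_ok": storage_ok(shutil.disk_usage(path).free),
    }


if __name__ == "__main__":
    print(check_this_machine())
```

Pass the directory where the app and its models will be stored as `path`, so that the free-space check applies to the right drive.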

Find IndexTTS2 in LM Downloader

Open LM Downloader, then click "Local Apps" in the left menu. IndexTTS2 appears in the app list.

The older IndexTTS Version 1.x is also available there and has lower hardware requirements.

Click the IndexTTS2 icon to go to the introduction page.

Click the Install button to open the install window. If you already have IndexTTS2 installed, the installation is treated as an update and does not affect the models you have previously downloaded.

Close this window after the installation is complete.

Run IndexTTS2

On the application details page, click the Run button on the right to open the execution window.

Upon successful launch, your browser will open automatically.