WhisperX API Server is a FastAPI-based server for transcribing audio files with the Whisper ASR (Automatic Speech Recognition) model, built on the WhisperX (https://github.com/m-bain/WhisperX) Python library. The API exposes an OpenAI-like interface that lets users upload audio files and receive transcription results in various formats, with customizable options such as model choice, language, temperature, and more.
## Features
- Audio Transcription: Transcribe audio files using the Whisper ASR model.
- Model Caching: Load and cache models for reusability and faster performance.
- OpenAI-like API, based on https://platform.openai.com/docs/api-reference/audio/createTranscription and https://platform.openai.com/docs/api-reference/audio/createTranslation
### Transcription

OpenAI reference: https://platform.openai.com/docs/api-reference/audio/createTranscription
Parameters:
- `file`: The audio file to transcribe.
- `model` (str): The Whisper model to use. Defaults to `config.whisper.model`.
- `language` (str): The language for transcription. Defaults to `config.default_language`.
- `prompt` (str): Optional transcription prompt.
- `response_format` (str): The format of the transcription output. Defaults to `json`.
- `temperature` (float): Temperature setting for transcription. Defaults to `0.0`.
- `timestamp_granularities` (list): Granularity of timestamps, either `segment` or `word`. Defaults to `["segment"]`.
- `stream` (bool): Enable streaming mode for real-time transcription. (Not currently functional.)
- `hotwords` (str): Optional hotwords for transcription.
- `suppress_numerals` (bool): Suppress numerals in the transcription. Defaults to `True`.
- `highlight_words` (bool): Highlight words in the transcription output for formats like VTT and SRT.
- `align` (bool): Align transcription timings. Defaults to `True`.
- `diarize` (bool): Diarize the transcription. Defaults to `False`.
Returns: Transcription results in the specified format.
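A transcription request can be sketched with only the Python standard library. Note the assumptions: the route `/v1/audio/transcriptions`, the host `localhost:8000`, and the model name `large-v3` are illustrative guesses based on the OpenAI-compatible API shape, not confirmed values from this server; check its FastAPI docs page for the real paths and defaults.

```python
import urllib.request
import uuid

def build_transcription_request(
    audio_bytes: bytes,
    filename: str = "audio.wav",
    url: str = "http://localhost:8000/v1/audio/transcriptions",  # assumed route
    model: str = "large-v3",          # server default is config.whisper.model
    response_format: str = "json",    # documented default
    temperature: float = 0.0,         # documented default
) -> urllib.request.Request:
    """Build a multipart/form-data POST mirroring the documented parameters."""
    boundary = uuid.uuid4().hex
    parts = []
    # Plain form fields, one part each.
    for name, value in [
        ("model", model),
        ("response_format", response_format),
        ("temperature", str(temperature)),
    ]:
        parts.append(
            (f"--{boundary}\r\n"
             f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
             f"{value}\r\n").encode()
        )
    # The audio file part.
    parts.append(
        (f"--{boundary}\r\n"
         f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
         f"Content-Type: application/octet-stream\r\n\r\n").encode()
    )
    parts.append(audio_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())
    return urllib.request.Request(
        url,
        data=b"".join(parts),
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )

# Sending the request (requires a running server):
# req = build_transcription_request(open("sample.wav", "rb").read())
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode())
```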
### Translation

OpenAI reference: https://platform.openai.com/docs/api-reference/audio/createTranslation
Parameters:
- `file`: The audio file to translate.
- `model` (str): The Whisper model to use. Defaults to `config.whisper.model`.
- `prompt` (str): Optional translation prompt.
- `response_format` (str): The format of the translation output. Defaults to `json`.
- `temperature` (float): Temperature setting for translation. Defaults to `0.0`.
Returns: Translation results in the specified format.
Additional endpoints:
- Health check: returns the current health status of the API server.
- List models: lists all models currently loaded on the server.
- Unload model: unloads a specific model from the memory cache.
- Load model: loads a specified model into memory.
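The management endpoints could be exercised as below. The paths are guesses inferred from the descriptions, not documented routes; consult the server's interactive OpenAPI docs (FastAPI serves them at `/docs`) for the actual paths.

```python
import urllib.request

def list_models_request(base_url: str = "http://localhost:8000") -> urllib.request.Request:
    # "/models" is an assumed path for the list-models endpoint.
    return urllib.request.Request(f"{base_url}/models", method="GET")

# Querying a running server:
# with urllib.request.urlopen(list_models_request()) as resp:
#     print(resp.read().decode())
```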
With Docker:

For CPU:

```shell
docker compose build whisperx-api-server-cpu
docker compose up whisperx-api-server-cpu
```

For CUDA (GPU):

```shell
docker compose build whisperx-api-server-cuda
docker compose up whisperx-api-server-cuda
```
Feel free to submit issues, fork the repository, and send pull requests to contribute to the project.
This project is licensed under the GNU General Public License, Version 3. See the LICENSE file for details.