$ npm install @qvac/transcription-parakeet
Technology Stack: C++20, CMake, vcpkg, Bare Runtime, ONNX Runtime
Package Type: Native Bare addon
A high-performance speech-to-text (STT) inference addon for the Bare runtime using NVIDIA's Parakeet ASR models. This addon provides fast, accurate transcription with support for multiple languages, speaker diarization, and streaming audio processing via ONNX Runtime.
This addon uses NVIDIA's Parakeet ASR models (TDT, CTC, EOU, and Sortformer) in ONNX format.
License: CC-BY-4.0 by NVIDIA
This addon is built on qvac-lib-inference-addon-cpp, which provides the foundational framework for QVAC inference addons.
Clone the repository:
git clone https://github.com/tetherto/qvac.git
cd qvac/packages/qvac-lib-infer-parakeet
Install npm dependencies (includes cmake-bare and cmake-vcpkg):
npm install
This automatically builds the native addon and places the output in the prebuilds/ directory.
Or build manually:
npm run build
The examples/ folder contains ready-to-run scripts demonstrating different use cases.
quickstart.js - Basic transcription of a WAV file using the TDT model. Start here to understand the core workflow: create instance, load weights, activate, transcribe, cleanup.
bare examples/quickstart.js
transcribe.js - Transcribe audio files in any supported language. Supports both WAV and raw PCM formats with automatic language detection.
# Transcribe Spanish audio
bare examples/transcribe.js --file examples/samples/LastQuestion_long_ES.raw
# Transcribe French audio
bare examples/transcribe.js --file examples/samples/French.raw
# Transcribe Croatian audio with INT8 model
bare examples/transcribe.js -f examples/samples/croatian.raw -m models/parakeet-tdt-0.6b-v3-onnx-int8-full
# Transcribe English WAV file
bare examples/transcribe.js --file examples/samples/sample-16k.wav
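transcribe.js accepts both WAV and raw PCM. The difference is mostly the RIFF header, which carries the sample rate and channel count that runJob() expects in its input object. The following is a minimal sketch of how to read those fields and extract the sample bytes. It assumes uncompressed PCM (format tag 1). `parseWav` is a hypothetical helper, not part of the package. Whether the addon wants 16-bit integer or float samples in `data` is not stated in this README, so check examples/transcribe.js.

```javascript
// Hypothetical helper (not part of the addon): pull PCM bytes plus the
// sampleRate/channels fields out of a RIFF/WAVE file.
// Assumes uncompressed PCM (format tag 1); other encodings are rejected.
function parseWav (arrayBuffer) {
  const view = new DataView(arrayBuffer)
  // Read a 4-character ASCII chunk id at the given byte offset.
  const tag = (offset) => String.fromCharCode(
    view.getUint8(offset), view.getUint8(offset + 1),
    view.getUint8(offset + 2), view.getUint8(offset + 3))

  if (tag(0) !== 'RIFF' || tag(8) !== 'WAVE') {
    throw new Error('Not a RIFF/WAVE file')
  }

  let fmt = null
  let offset = 12
  // Each chunk is: 4-byte id, 4-byte little-endian size, then the payload.
  while (offset + 8 <= view.byteLength) {
    const id = tag(offset)
    const size = view.getUint32(offset + 4, true)
    const body = offset + 8
    if (id === 'fmt ') {
      fmt = {
        formatTag: view.getUint16(body, true),
        channels: view.getUint16(body + 2, true),
        sampleRate: view.getUint32(body + 4, true),
        bitsPerSample: view.getUint16(body + 14, true)
      }
    } else if (id === 'data') {
      if (!fmt) throw new Error('data chunk appears before fmt chunk')
      if (fmt.formatTag !== 1) throw new Error('Only uncompressed PCM is handled here')
      return {
        data: arrayBuffer.slice(body, body + size), // slice() clamps a truncated chunk
        sampleRate: fmt.sampleRate,
        channels: fmt.channels,
        bitsPerSample: fmt.bitsPerSample
      }
    }
    // Chunks are padded to an even number of bytes.
    offset = body + size + (size % 2)
  }
  throw new Error('No data chunk found')
}
```

Usage would look roughly like `const { data, sampleRate, channels } = parseWav(fileBytes)`, followed by `runJob(handle, { type: 'audio', data, sampleRate, channels })`. Raw PCM files skip the parsing step, but you must then supply the sample rate and channel count yourself.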
quickstart-ctc.js - Fast English-only transcription using the CTC model. Includes punctuation and capitalization. Best for single-language, high-throughput use cases.
bare examples/quickstart-ctc.js
quickstart-eou.js - Real-time streaming transcription using the EOU model (120M params). Automatically detects utterance boundaries for turn-by-turn output. Note: the EOU model is optimized for low latency over accuracy — expect lower transcription quality compared to TDT/CTC.
bare examples/quickstart-eou.js
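One way to exercise a streaming model with a file on disk is to replay it in fixed-duration slices, as though it were arriving from a microphone. This README does not describe how quickstart-eou.js feeds audio to the addon. The generator below is therefore a hypothetical, addon-agnostic sketch. It assumes interleaved 16-bit PCM, and the 100 ms default slice length is an arbitrary choice.

```javascript
// Hypothetical helper: yield fixed-duration slices of interleaved 16-bit PCM
// so a recorded file can be replayed like a live stream.
function * pcmChunks (arrayBuffer, sampleRate, chunkMs = 100, channels = 1) {
  const BYTES_PER_SAMPLE = 2 // 16-bit PCM (assumption)
  // Round to whole frames so a chunk never splits a multi-channel frame.
  const framesPerChunk = Math.round((sampleRate * chunkMs) / 1000)
  const chunkBytes = framesPerChunk * channels * BYTES_PER_SAMPLE
  if (chunkBytes <= 0) throw new RangeError('chunkMs is too short for this sample rate')
  for (let offset = 0; offset < arrayBuffer.byteLength; offset += chunkBytes) {
    // slice() clamps the end, so the final chunk may be shorter than chunkBytes.
    yield arrayBuffer.slice(offset, offset + chunkBytes)
  }
}
```

For example, `for (const chunk of pcmChunks(pcm, 16000)) { /* hand chunk to your streaming consumer */ }` walks a 16 kHz mono recording in 100 ms steps. Smaller slices track real-time behaviour more closely at the cost of more calls.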
quickstart-sortformer.js - Identifies who is speaking when, using the Sortformer model (up to 4 speakers). Outputs speaker-labeled time segments.
bare examples/quickstart-sortformer.js
quickstart-diarized.js - Combines TDT transcription with Sortformer diarization to produce speaker-attributed text. Runs both models in parallel and merges the results. Diarization accuracy depends on audio quality and speaker overlap — some boundary imprecision is expected.
bare examples/quickstart-diarized.js
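The merge step in quickstart-diarized.js comes down to deciding which speaker owns each piece of text. The sketch below uses a common heuristic: it assigns each timed segment to the diarization turn it overlaps most, then joins consecutive segments from the same speaker. It is not necessarily what the example does. The turn objects use the documented diarization fields (speakerId, startTime, endTime). The documented transcription event carries no timestamps, so the `{ text, start, end }` segment shape is a hypothetical input.

```javascript
// Sketch: attribute timed text segments to speakers by maximum time overlap.
// `segments` ({ text, start, end }, seconds) is a hypothetical shape;
// `turns` uses the documented diarization fields.
function attributeSpeakers (segments, turns) {
  return segments.map((seg) => {
    let best = null
    let bestOverlap = 0
    for (const turn of turns) {
      const overlap = Math.min(seg.end, turn.endTime) - Math.max(seg.start, turn.startTime)
      if (overlap > bestOverlap) {
        bestOverlap = overlap
        best = turn.speakerId
      }
    }
    // speakerId stays null when no turn overlaps the segment at all.
    return { ...seg, speakerId: best }
  })
}

// Collapse consecutive segments from the same speaker into one entry.
function groupBySpeaker (attributed) {
  const out = []
  for (const seg of attributed) {
    const last = out[out.length - 1]
    if (last && last.speakerId === seg.speakerId) {
      last.text += ' ' + seg.text
      last.end = seg.end
    } else {
      out.push({ speakerId: seg.speakerId, text: seg.text, start: seg.start, end: seg.end })
    }
  }
  return out
}
```

Maximum overlap is robust to small boundary errors. This matters because diarization boundaries are expected to be somewhat imprecise.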
example.decoder.js - Demonstrates using @qvac/decoder-audio to decode audio files before transcription. Useful when working with compressed audio formats.
Download models from Hugging Face using the provided script:
./scripts/download-models.sh
The interactive script lets you choose which model variant to download (TDT, CTC, EOU, Sortformer, or all).
Model Variants:
| Variant | Size | Path | Notes |
|---|---|---|---|
| INT8 (default) | ~650 MB | models/parakeet-tdt-0.6b-v3-onnx-int8/ | Recommended; ~73% smaller than FP32; Conv + MatMul quantized |
| INT8 partial | ~890 MB | models/parakeet-tdt-0.6b-v3-onnx-int8-partial/ | MatMul-only quantized |
| FP32 | ~2.4 GB | models/parakeet-tdt-0.6b-v3-onnx/ | Full precision |
Models will be saved to the models/ directory.
The TDT model supports approximately 25 languages with automatic language detection.
Set language: 'auto' for automatic detection or specify the language code explicitly.
The following benchmarks were run using the parakeet-tdt-0.6b-v3-onnx model with 100 samples per language on CPU (4 threads).
| Language | Dataset | WER (%) | CER (%) | Quality |
|---|---|---|---|---|
| English | LibriSpeech (clean) | 7.51 | 6.61 | High |
| French | Multilingual LibriSpeech | 22.35 | 19.31 | Adequate |
| Spanish | Multilingual LibriSpeech | 27.34 | 25.93 | Adequate |
| Russian | FLEURS | 30.97 | 28.81 | Low |
| Italian | Multilingual LibriSpeech | 31.39 | 24.71 | Low |
| Portuguese | Multilingual LibriSpeech | 31.24 | 29.48 | Low |
| Czech | FLEURS | 35.39 | 30.18 | Low |
| German | Multilingual LibriSpeech | 40.99 | 38.83 | Low |
| WER Range | Quality | Description |
|---|---|---|
| 0–5% | Excellent | Near human-parity transcription |
| 5–15% | High | Minor word errors, highly usable |
| 15–30% | Adequate | Understandable but noticeable mistakes |
| >30% | Low | Transcript may need significant correction |
For detailed benchmark methodology and raw results, see the benchmarks/ directory.
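WER counts the word substitutions, deletions, and insertions needed to turn the hypothesis into the reference, and divides that count by the number of reference words. The sketch below computes it with a word-level Levenshtein distance and maps the result onto the quality bands in the table above. Lower-casing and whitespace tokenization are simplifying assumptions. The scripts in benchmarks/ may normalize punctuation and casing differently, so scores from this sketch will not necessarily match the published numbers.

```javascript
// Sketch: word error rate (%) via word-level Levenshtein distance.
function wordErrorRate (reference, hypothesis) {
  const ref = reference.toLowerCase().split(/\s+/).filter(Boolean)
  const hyp = hypothesis.toLowerCase().split(/\s+/).filter(Boolean)
  if (ref.length === 0) throw new Error('Reference must contain at least one word')
  // Before row i is computed, prev[j] is the edit distance between the first
  // i-1 reference words and the first j hypothesis words.
  let prev = Array.from({ length: hyp.length + 1 }, (_, j) => j)
  for (let i = 1; i <= ref.length; i++) {
    const curr = [i]
    for (let j = 1; j <= hyp.length; j++) {
      const cost = ref[i - 1] === hyp[j - 1] ? 0 : 1
      // deletion, insertion, substitution (or match)
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost)
    }
    prev = curr
  }
  return (100 * prev[hyp.length]) / ref.length
}

// Quality bands from the table; boundary values go to the better band (an assumption).
function qualityLabel (werPercent) {
  if (werPercent <= 5) return 'Excellent'
  if (werPercent <= 15) return 'High'
  if (werPercent <= 30) return 'Adequate'
  return 'Low'
}
```

For example, "the cat sat on the mat" against "the cat sit on mat" has one substitution and one deletion, so the WER is 2/6, or about 33.3%, which falls in the Low band.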
createInstance(config, outputCallback)
Creates a new Parakeet instance.
Parameters:
- config (Object):
  - modelPath (string): Path to the model directory
  - modelType (string): 'ctc', 'tdt', 'eou', or 'sortformer'
  - config (Object):
    - language (string): Language code or 'auto'
    - maxThreads (number): Maximum number of CPU threads to use
    - useGPU (boolean): Enable GPU acceleration
- outputCallback (Function): (handle, event, data, error) => {}
Returns: Handle (number) for this instance
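The config argument nests runtime options inside an inner `config` object, and that nesting is easy to get wrong by hand. The sketch below assembles the documented shape and checks the documented modelType values. `buildParakeetConfig` is a hypothetical helper. Its defaults (TDT, language 'auto', 4 threads to mirror the benchmark setup, GPU off) are illustrative choices, not documented defaults of the addon.

```javascript
// Hypothetical helper: build the nested createInstance() config described above.
const MODEL_TYPES = ['ctc', 'tdt', 'eou', 'sortformer']

function buildParakeetConfig ({ modelPath, modelType = 'tdt', language = 'auto', maxThreads = 4, useGPU = false }) {
  if (typeof modelPath !== 'string' || modelPath.length === 0) {
    throw new TypeError('modelPath must be a non-empty string')
  }
  if (!MODEL_TYPES.includes(modelType)) {
    throw new RangeError(`modelType must be one of ${MODEL_TYPES.join(', ')}`)
  }
  if (!Number.isInteger(maxThreads) || maxThreads < 1) {
    throw new RangeError('maxThreads must be a positive integer')
  }
  // Runtime options live in the inner `config` object.
  return {
    modelPath,
    modelType,
    config: { language, maxThreads, useGPU }
  }
}
```

The returned object is what would be passed as the first argument, for example `createInstance(buildParakeetConfig({ modelPath: 'models/parakeet-tdt-0.6b-v3-onnx-int8', useGPU: true }), outputCallback)`.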
loadWeights(handle, buffer)
Loads model weights from a buffer.
Parameters:
- handle (number): Instance handle
- buffer (ArrayBuffer): Model file data

activate(handle)
Activates the model after its weights have been loaded.
Parameters:
- handle (number): Instance handle

runJob(handle, input)
Runs a transcription job.
Parameters:
- handle (number): Instance handle
- input (Object):
  - type (string): 'audio'
  - data (ArrayBuffer): Audio data
  - sampleRate (number): Sample rate (e.g., 16000)
  - channels (number): Number of audio channels

cancelJob(handle)
Cancels the currently running job.

destroyInstance(handle)
Destroys the instance and frees its resources.
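The functions above form a fixed lifecycle: create, load weights, activate, run, destroy. Results arrive through outputCallback rather than as return values. The sketch below wraps that sequence in a Promise that resolves with the final transcript, using the transcription, complete, and error events described in the event list. It makes several assumptions. `addon` is any object exposing the documented functions, and how you obtain it from the package is not shown. 'complete' is treated as the end of the job. Final texts are joined with spaces. Each element of `weightBuffers` gets its own loadWeights() call.

```javascript
// Sketch: documented lifecycle wrapped in a Promise (see assumptions above).
function transcribeOnce (addon, config, weightBuffers, input) {
  return new Promise((resolve, reject) => {
    const finalTexts = []
    let handle = null
    let settled = false

    // Settle exactly once and always release the native instance.
    const finish = (err, text) => {
      if (settled) return
      settled = true
      if (handle !== null) addon.destroyInstance(handle)
      if (err) reject(err)
      else resolve(text)
    }

    const onOutput = (_handle, event, data, error) => {
      switch (event) {
        case 'transcription':
          // Keep only final results; partial ones are superseded later.
          if (data && data.isFinal) finalTexts.push(data.text)
          break
        case 'complete':
          finish(null, finalTexts.join(' '))
          break
        case 'error':
          // Read the message from the 4th argument, falling back to data.error.
          finish(new Error(String(error || (data && data.error) || 'Unknown error')))
          break
        default:
          // 'progress' and 'diarization' events are ignored in this sketch.
          break
      }
    }

    try {
      handle = addon.createInstance(config, onOutput)
      for (const buffer of weightBuffers) addon.loadWeights(handle, buffer)
      addon.activate(handle)
      addon.runJob(handle, input)
    } catch (err) {
      finish(err)
    }
  })
}
```

If destroying an instance from inside its own outputCallback turns out to be unsafe for the native addon, defer the destroyInstance() call until after the callback returns.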
The output callback receives these events:
- transcription: Partial or complete transcription result
  - data.text (string): Transcribed text
  - data.confidence (number): Confidence score (0-1)
  - data.isFinal (boolean): Whether this is the final result
- progress: Processing progress update
  - data.percent (number): Progress percentage (0-100)
  - data.timeElapsed (number): Elapsed time in ms
- diarization: Speaker identification (if using Sortformer)
  - data.speakerId (number): Speaker ID (0-3)
  - data.startTime (number): Start time in seconds
  - data.endTime (number): End time in seconds
- complete: Job completed successfully
- error: Error occurred
  - error (string): Error message

Models can be downloaded with ./scripts/download-models.sh or from Hugging Face.
Clone the repository:
git clone https://github.com/tetherto/qvac.git
cd qvac/packages/qvac-lib-infer-parakeet
Configure with vcpkg:
cmake -S . -B build \
-DCMAKE_TOOLCHAIN_FILE="$VCPKG_ROOT/scripts/buildsystems/vcpkg.cmake" \
-DCMAKE_BUILD_TYPE=Release
Build:
cmake --build build --config Release
# Build with tests enabled
cmake -S . -B build \
-DCMAKE_TOOLCHAIN_FILE="$VCPKG_ROOT/scripts/buildsystems/vcpkg.cmake" \
-DBUILD_TESTING=ON
cmake --build build
ctest --test-dir build --output-on-failure
qvac-lib-infer-parakeet/
├── src/
│ ├── ParakeetModel.hpp # Main model implementation
│ ├── ParakeetModel.cpp # ONNX Runtime integration
│ ├── binding.cpp # Bare addon registration
│ └── qvac-lib-inference-addon-cpp/ # Base framework (header-only)
├── models/ # Downloaded ONNX models (not in git)
├── tests/ # C++ tests
├── examples/ # JavaScript usage examples
├── CMakeLists.txt # Build configuration
├── vcpkg.json # C++ dependencies
├── package.json # npm/bare package
└── README.md
| Platform | Architecture | Min Version | Status | GPU Support |
|---|---|---|---|---|
| macOS | arm64, x64 | 14.0+ | ✅ Tier 1 | CoreML |
| iOS | arm64 | 17.0+ | ✅ Tier 1 | CoreML |
| Linux | arm64, x64 | Ubuntu-22+ | ✅ Tier 1 | CPU only |
| Android | arm64 | 12+ | ✅ Tier 1 | NNAPI |
| Windows | x64 | 10+ | ✅ Tier 1 | DirectML |
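Dependencies: C++ dependencies are declared in vcpkg.json and npm dependencies in package.json.
When useGPU: true is set, ONNX Runtime uses the hardware-acceleration provider listed in the GPU Support column above (CoreML on macOS and iOS, NNAPI on Android, DirectML on Windows; Linux runs on CPU only).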
If the selected GPU provider fails at session creation, inference falls back to CPU automatically.
NVIDIA does not publish official ONNX versions of these models on Hugging Face; the ONNX files used here are community conversions.
This project is licensed under the Apache-2.0 License – see the LICENSE file for details.
Model License: The Parakeet models are licensed under CC-BY-4.0 by NVIDIA.
Contributions are welcome! Please open an issue or pull request on GitHub.
For questions or issues, please open an issue on the GitHub repository.