Made by Antonio Ramirez

@qvac/bci-whispercpp

0.3.3

@GitHub Actions

$ npm install @qvac/bci-whispercpp

@qvac/bci-whispercpp

Brain-Computer Interface (BCI) neural signal transcription addon for qvac, powered by the tetherto/qvac-ext-lib-whisper.cpp fork of whisper.cpp.

Transcribes multi-channel neural signals (e.g., 512-channel microelectrode array recordings) into text using a BCI-trained whisper model running natively via GGML. Output matches the Python BrainWhisperer reference model exactly.

Table of Contents

  • Architecture
  • Results
  • Neural Signal Format
  • Installation
  • Quickstart
  • Model Conversion
  • Usage
  • Tests
  • Configuration
  • whisper.cpp Patches
  • Error Range
  • Resources
  • Glossary
  • License

Architecture

Neural Signal (512ch, 20ms bins)
    │
    ▼
┌──────────────────────────────┐
│   NeuralProcessor (C++)      │
│   - Gaussian smoothing       │  std=2, kernel=100
│   - Day-specific projection  │  low-rank (A·B) + month + softsign
│   - Pad to 3000 frames       │  mel-major layout for whisper.cpp
└──────────────┬───────────────┘
               │  mel features (512 × 3000)
               ▼
┌──────────────────────────────┐
│   whisper.cpp (patched)      │
│   - conv1 (k=7, 512→384)    │  BCI-trained embedder weights
│   - conv2 (k=3, stride=2)   │
│   - Positional encoding      │  learned time PE + sinusoidal day PE
│   - 6-layer encoder          │  windowed attention (w=57) on layers 0–3
│   - 4-layer decoder (LoRA)   │  beam search, length_penalty=0.14
└──────────────┬───────────────┘
               │
               ▼
          Text output
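
The first two NeuralProcessor stages can be illustrated in plain JavaScript. This is a minimal sketch, not the addon's C++ code: the kernel size (100) and standard deviation (2) come from the diagram, while the zero padding at the signal edges, the kernel alignment, and the function names are assumptions made for illustration.

```javascript
// Sketch only: padding and alignment are assumptions, not confirmed addon behaviour.

// Build a normalised Gaussian kernel with `size` taps and standard deviation `std`.
function gaussianKernel (size = 100, std = 2) {
  const center = (size - 1) / 2
  const kernel = new Float64Array(size)
  let sum = 0
  for (let k = 0; k < size; k++) {
    const d = k - center
    kernel[k] = Math.exp(-(d * d) / (2 * std * std))
    sum += kernel[k]
  }
  for (let k = 0; k < size; k++) kernel[k] /= sum
  return kernel
}

// Smooth one channel along time; samples outside the signal count as zero.
function smoothChannel (signal, kernel) {
  const half = Math.floor(kernel.length / 2)
  const out = new Float64Array(signal.length)
  for (let t = 0; t < signal.length; t++) {
    let acc = 0
    for (let k = 0; k < kernel.length; k++) {
      const src = t + k - half
      if (src >= 0 && src < signal.length) acc += kernel[k] * signal[src]
    }
    out[t] = acc
  }
  return out
}

// softsign(x) = x / (1 + |x|), the squashing step named in the projection stage.
function softsign (x) {
  return x / (1 + Math.abs(x))
}
```

With std=2 almost all of the kernel weight sits within a few bins of the centre, so the 100-tap kernel behaves like a much shorter filter in practice.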

Results

Native GGML inference matches the Python BrainWhisperer reference on all test samples:

| Sample | Ground Truth | GGML Native Output | WER |
| --- | --- | --- | --- |
| 0 | "You can see the code at this point as well." | "You can see the good at this point as well." | 10.0% |
| 1 | "How does it keep the cost down?" | "How does it keep the cost down?" | 0.0% |
| 2 | "Not too controversial." | "Not too controversial." | 0.0% |
| 3 | "The jury and a judge work together on it." | "The jury and a judge work together on it." | 0.0% |
| 4 | "Were quite vocal about it." | "We're quite vocal about it." | 20.0% |
| Average | | | 6.0% |

Neural Signal Format

Binary files with the following layout:

| Offset | Type | Description |
| --- | --- | --- |
| 0 | uint32 | Number of timesteps |
| 4 | uint32 | Number of channels |
| 8 | float32[] | Feature data (row-major: features[t * channels + c]) |

Each timestep represents a 20ms bin of neural activity. Channels correspond to individual electrodes in a microelectrode array (typically 512 channels).
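
The layout above can be written and read with a DataView. The following is a minimal sketch rather than code from the package: encodeNeuralSignal and parseNeuralSignal are hypothetical helper names, and little-endian byte order is assumed because the streaming header is documented as [T u32 LE, C u32 LE].

```javascript
// Hypothetical helpers; little-endian byte order is an assumption.

// Encode a row-major feature array (features[t * channels + c]) into the file layout.
function encodeNeuralSignal (features, timesteps, channels) {
  if (features.length !== timesteps * channels) {
    throw new RangeError('features.length must equal timesteps * channels')
  }
  const bytes = new Uint8Array(8 + features.length * 4)
  const view = new DataView(bytes.buffer)
  view.setUint32(0, timesteps, true) // offset 0: number of timesteps
  view.setUint32(4, channels, true)  // offset 4: number of channels
  for (let i = 0; i < features.length; i++) {
    view.setFloat32(8 + i * 4, features[i], true) // offset 8: float32 body
  }
  return bytes
}

// Parse the header and body back into a Float32Array plus an index helper.
function parseNeuralSignal (bytes) {
  if (bytes.byteLength < 8) throw new RangeError('missing 8-byte header')
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength)
  const timesteps = view.getUint32(0, true)
  const channels = view.getUint32(4, true)
  const count = timesteps * channels
  if (bytes.byteLength < 8 + count * 4) throw new RangeError('body is truncated')
  const features = new Float32Array(count)
  for (let i = 0; i < count; i++) features[i] = view.getFloat32(8 + i * 4, true)
  return {
    timesteps,
    channels,
    features,
    at: (t, c) => features[t * channels + c] // value for timestep t, channel c
  }
}
```

Reading through a DataView rather than constructing a Float32Array directly over the buffer avoids alignment errors when the input is a view at an arbitrary byte offset.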

Installation

cd packages/bci-whispercpp
npm install
VCPKG_ROOT=/path/to/vcpkg npm run build

Prerequisites

  • Bare runtime >= 1.24.0
  • CMake >= 3.25
  • vcpkg with VCPKG_ROOT environment variable set

Quickstart

To run an example you need the BCI model files and (for batch mode) the test fixtures. The download script fetches the model files from the QVAC model registry (no GitHub CLI, no auth) and the neural-signal fixtures from the public release tarball (the fixtures aren't in the registry yet).

cd packages/bci-whispercpp
npm install   # installs @qvac/registry-client (devDependency)

# Download model files (ggml-bci-windowed.bin + bci-embedder.bin) + test fixtures
npm run download-models
# node scripts/download-models.js --models    # models only (from the registry)
# node scripts/download-models.js --fixtures  # fixtures only
# node scripts/download-models.js --force     # re-download even if present

The models land in models/ and the neural-signal fixtures in test/fixtures/. Then run an example:

# Transcribe all bundled fixture samples and print WER
bare examples/transcribe-neural.js --batch

# Transcribe a single neural signal file
bare examples/transcribe-neural.js test/fixtures/neural_sample_0.bin

# Streaming transcription over a sliding window
bare examples/transcribe-stream-neural.js test/fixtures/neural_sample_0.bin

By default the examples look for models/ggml-bci-windowed.bin (with bci-embedder.bin alongside it). Override with WHISPER_MODEL_PATH=/path/to/ggml-bci-windowed.bin or by passing the model path as the final argument.

The model files come from the qvac model registry (engine @qvac/bci-whispercpp, S3 source qvac_models_compiled/bci-whispercpp/...). If you already have them locally, skip the download step and point WHISPER_MODEL_PATH at your copy. The fixtures URL can be overridden with BCI_FIXTURES_URL.

Model Conversion Prerequisites

  • Python 3 with numpy, torch, and transformers (pip install numpy torch transformers)

Model Conversion

Convert a trained BrainWhisperer checkpoint. This produces two files, both required for inference:

| File | Size | Description |
| --- | --- | --- |
| ggml-bci-windowed.bin | ~84 MB | GGML model: whisper encoder/decoder (LoRA-merged), tokenizer, positional embedding, windowed attention header |
| bci-embedder.bin | ~24 MB | Day projection weights: low-rank A·B matrices per recording day, month projections, session-to-day mapping |

python3 scripts/convert-model.py \
  --checkpoint /path/to/epoch=93-val_wer=0.0910.ckpt

Both files are written to models/ by default. All flags are optional:

| Flag | Default | Description |
| --- | --- | --- |
| --output | models/ggml-bci-windowed.bin | GGML model output path |
| --embedder-output | models/bci-embedder.bin | Embedder weights output path |
| --day-idx | 1 | Day index for baked positional embedding |
| --window-size | 57 | Windowed attention size (0 to disable) |
| --last-window-layer | 3 | Last encoder layer with windowed attention |
| --f32 | off | Use f32 for all tensors (avoids f16 precision loss, ~2x larger) |

Important: By default both files must be in the same directory at runtime — the addon resolves bci-embedder.bin next to the GGML model file and will fail if it is missing. To store the embedder elsewhere, pass an explicit path via files.embedder (see Usage).

Usage

The package's default export is the high-level BCIWhispercpp class. It owns model lifecycle, an inference queue, and a sliding-window streaming driver on top of the native addon.

const BCIWhispercpp = require('@qvac/bci-whispercpp')

1. Construct an instance

const bci = new BCIWhispercpp({
  files: { model: './models/ggml-bci-windowed.bin' },
  opts: { stats: true }            // optional — surfaces runtime stats on response.stats
}, {
  whisperConfig: { language: 'en', temperature: 0.0 },
  bciConfig:     { day_idx: 1 },   // session day index for day-specific projection
  miscConfig:    { caption_enabled: false }
})

By default the companion bci-embedder.bin must sit next to files.model — the native addon resolves it relative to the model path and will fail to load otherwise. To keep the embedder in a different location, pass its path explicitly via files.embedder:

const bci = new BCIWhispercpp({
  files: {
    model: './models/ggml-bci-windowed.bin',
    embedder: './weights/bci-embedder.bin' // optional override
  }
})
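
The lookup rule described above can be sketched as a small function. This is an illustration of the documented behaviour, not the addon's implementation: resolveEmbedderPath is a hypothetical name, and Node's path module stands in for whatever path handling the addon uses internally.

```javascript
const path = require('path')

// Hypothetical helper: an explicit files.embedder wins, otherwise look for
// bci-embedder.bin in the same directory as files.model.
function resolveEmbedderPath (files) {
  if (!files || typeof files.model !== 'string') {
    throw new TypeError('files.model is required')
  }
  if (typeof files.embedder === 'string' && files.embedder.length > 0) {
    return files.embedder
  }
  return path.join(path.dirname(files.model), 'bci-embedder.bin')
}
```

Checking that the resolved path exists before constructing the instance gives an earlier and clearer failure than waiting for the native load to reject it.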

2. Load the model

await bci.load()

load() is idempotent — calling it again unloads the existing model and re-initialises with the current config. There is no progress callback today.

3. Transcribe (batch mode)

Use this when you have the full neural signal up-front. transcribe() accepts the raw bytes (header + body); transcribeFile() is a convenience wrapper that reads the file for you.

const fs = require('bare-fs')

const response = await bci.transcribeFile('./signal.bin')
// or: const response = await bci.transcribe(new Uint8Array(fs.readFileSync('./signal.bin')))

const segments = await response.await()
const text = segments.map(s => s.text).join('').trim()
console.log(text)

if (response.stats) console.log(response.stats) // when opts.stats: true

Concurrent calls are serialised — a second transcribe() waits for the first to settle.

4. Transcribe (streaming mode)

transcribeStream() consumes a stream of bytes (async iterable, sync iterable, Uint8Array, or array of chunks) and decodes a sliding window over the body as data arrives. The first 8 bytes of the stream must be the standard [T u32 LE, C u32 LE] header (T is ignored in stream mode; C must be non-zero).

const response = await bci.transcribeStream(chunkIterable, {
  windowTimesteps: 1500,   // default
  hopTimesteps:    500,    // default — must be < windowTimesteps
  emit:            'delta' // 'delta' (default) | 'full'
})

response.onUpdate(segments => {
  // emit: 'delta' — newly-discovered tail segments since the last window.
  //                 Each segment carries native fields (text, t0, t1, ...) plus
  //                 windowStartTimestep so you can map back to the stream timeline.
  // emit: 'full'  — single { text } entry with the full running transcript.
  for (const s of segments) process.stdout.write(s.text)
})

await response.await()     // resolves when the stream ends and the final window decodes
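
One way to build a suitable chunk source is a generator that yields the 8-byte header first and then the float32 body in pieces. This is a sketch under assumptions: streamChunks and its timestepsPerChunk parameter are hypothetical, and the body is assumed to be little-endian float32 in the same row-major order as the batch format.

```javascript
// Hypothetical generator: header first, then the body in fixed-size chunks.
// `features` is a Float32Array in row-major order (features[t * channels + c]).
function * streamChunks (features, channels, timestepsPerChunk = 50) {
  if (!Number.isInteger(channels) || channels <= 0) {
    throw new RangeError('channels must be a positive integer')
  }
  const header = new Uint8Array(8)
  const headerView = new DataView(header.buffer)
  headerView.setUint32(0, 0, true)        // T: ignored in stream mode
  headerView.setUint32(4, channels, true) // C: must be non-zero
  yield header

  const valuesPerChunk = timestepsPerChunk * channels
  for (let start = 0; start < features.length; start += valuesPerChunk) {
    const end = Math.min(start + valuesPerChunk, features.length)
    const chunk = new Uint8Array((end - start) * 4)
    const view = new DataView(chunk.buffer)
    for (let i = start; i < end; i++) {
      view.setFloat32((i - start) * 4, features[i], true)
    }
    yield chunk
  }
}
```

Because transcribeStream() is documented to accept sync iterables, the generator's result could be passed directly, for example bci.transcribeStream(streamChunks(features, 512)).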

Streaming constraints:

| Option | Constraint |
| --- | --- |
| windowTimesteps | positive integer, ≤ 2900 (MAX_WINDOW_TIMESTEPS) |
| hopTimesteps | positive integer, < windowTimesteps |
| emit | 'delta' or 'full' |

Only one stream may be active at a time. response.stats is not populated for streams.
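
The constraints translate into wall-clock terms through the 20 ms bin size. The sketch below validates a set of options and reports the window length, hop length, and overlap; describeStreamOpts is a hypothetical helper, and the package's own validation may differ in its details and error types.

```javascript
const MAX_WINDOW_TIMESTEPS = 2900 // limit from the constraints above
const BIN_MS = 20                 // each timestep is a 20 ms bin

// Hypothetical helper: validate stream options and convert them to seconds.
function describeStreamOpts ({ windowTimesteps = 1500, hopTimesteps = 500, emit = 'delta' } = {}) {
  if (!Number.isInteger(windowTimesteps) || windowTimesteps <= 0 ||
      windowTimesteps > MAX_WINDOW_TIMESTEPS) {
    throw new RangeError('windowTimesteps must be a positive integer <= 2900')
  }
  if (!Number.isInteger(hopTimesteps) || hopTimesteps <= 0 ||
      hopTimesteps >= windowTimesteps) {
    throw new RangeError('hopTimesteps must be a positive integer < windowTimesteps')
  }
  if (emit !== 'delta' && emit !== 'full') {
    throw new TypeError("emit must be 'delta' or 'full'")
  }
  return {
    windowSeconds: windowTimesteps * BIN_MS / 1000,
    hopSeconds: hopTimesteps * BIN_MS / 1000,
    overlap: 1 - hopTimesteps / windowTimesteps // fraction shared by consecutive windows
  }
}
```

With the defaults this gives a 30-second window advanced every 10 seconds, so consecutive windows share two thirds of their timesteps.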

5. Cancel / unload / destroy

await bci.cancel()    // abort an in-flight job or stream; instance remains usable
await bci.unload()    // release native resources; bci.load() can be called again
await bci.destroy()   // permanent — instance cannot be reused

6. Word Error Rate helper

The package re-exports computeWER(hypothesis, reference) for evaluation:

const { computeWER } = require('@qvac/bci-whispercpp')
const wer = computeWER('how does it keep the cost down', 'how does it keep the cost down?')
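
For readers unfamiliar with the metric, word error rate is conventionally the word-level edit distance divided by the number of reference words. The following is a generic sketch of that definition, not the package's computeWER: wordErrorRate is a hypothetical name, and it compares words exactly, so case and punctuation differences count as errors unless the inputs are normalised first.

```javascript
// Generic WER sketch: word-level Levenshtein distance / reference word count.
function wordErrorRate (hypothesis, reference) {
  const hyp = hypothesis.trim().split(/\s+/).filter(Boolean)
  const ref = reference.trim().split(/\s+/).filter(Boolean)
  if (ref.length === 0) return hyp.length === 0 ? 0 : 1

  // prev[j] holds the distance between the first i-1 reference words
  // and the first j hypothesis words.
  let prev = Array.from({ length: hyp.length + 1 }, (_, j) => j)
  for (let i = 1; i <= ref.length; i++) {
    const curr = [i]
    for (let j = 1; j <= hyp.length; j++) {
      const cost = ref[i - 1] === hyp[j - 1] ? 0 : 1
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost  // substitution or match
      )
    }
    prev = curr
  }
  return prev[hyp.length] / ref.length
}
```

One substituted word in a ten-word reference gives 0.1, which is how a single wrong word yields the 10.0% figure for sample 0 in Results.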

Output shape

response.await() resolves to an array of segments; response.onUpdate(cb) receives the same shape per emission:

[
  { text: ' How does it keep the cost down?', t0: 0, t1: 280, /* ... */ }
]

In streaming delta mode each segment is annotated with windowStartTimestep. In full mode the array contains a single { text } entry.

Tests

| Script | Purpose |
| --- | --- |
| npm run test:unit | JS unit tests (brittle-bare test/unit/*.test.js); no model required |
| npm run test:integration | JS integration tests against the native addon; requires WHISPER_MODEL_PATH |
| npm run test:cpp | C++ unit tests (GoogleTest); bare-make rebuilds the addon with BUILD_TESTING=ON |
| npm run test:dts | Type-checks the published index.d.ts |
| npm test | Runs test:unit + test:integration |

# JS unit tests
npm run test:unit

# JS integration tests
WHISPER_MODEL_PATH=./models/ggml-bci-windowed.bin npm run test:integration

# C++ unit tests
VCPKG_ROOT=/path/to/vcpkg npm run test:cpp

# .d.ts typecheck
npm run test:dts

Integration tests require both ggml-bci-windowed.bin and bci-embedder.bin to be present in the same directory, plus the neural-signal fixtures. The quickest way to get them is npm run download-models (see Quickstart); alternatively produce the models yourself via Model Conversion.

Configuration

BCIWhispercpp accepts two arguments:

new BCIWhispercpp(args, config)

args

| Field | Type | Description |
| --- | --- | --- |
| files.model | string | Required. Path to BCI GGML model file. By default bci-embedder.bin must sit alongside it. |
| files.embedder | string | Optional explicit path to the embedder weights file. Overrides the default lookup of bci-embedder.bin next to files.model. |
| logger | object | Optional logger; wrapped in @qvac/logging. Defaults to a noop logger. |
| opts.stats | boolean | When true, runtime stats are surfaced on response.stats for batch jobs. Default false. |

config.whisperConfig

The convenience defaults below are surfaced explicitly. Any other whisper_full_params key is forwarded untouched to whisper.cpp — see Advanced configuration.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| language | string | "en" | Language code |
| temperature | number | 0.0 | Sampling temperature |
| n_threads | number | 0 (auto) | Number of threads |

config.bciConfig

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| day_idx | number | 0 | Session day index for the day-specific low-rank projection at runtime. Distinct from the conversion-time --day-idx flag, which bakes a positional embedding into ggml-bci-windowed.bin. |

config.contextParams

These keys back the whisper_context. Changing any of them between jobs forces a full model reload (unload → re-init → warmup), which can take several seconds.

| Parameter | Type | Description |
| --- | --- | --- |
| model | string | Optional override; usually set via args.files.model. |
| use_gpu | boolean | Enable GPU acceleration (Metal on macOS by default). |
| flash_attn | boolean | Enable flash attention. |
| gpu_device | number | Select a non-default GPU device. |
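
Since any change to these keys triggers a reload, it can be useful to check a new configuration before applying it. The sketch below compares two contextParams objects over the four keys listed above; contextParamsChanged is a hypothetical helper, and the package's internal comparison is not documented here.

```javascript
// Keys documented as backing the whisper_context.
const CONTEXT_KEYS = ['model', 'use_gpu', 'flash_attn', 'gpu_device']

// Hypothetical helper: list the context keys whose values differ.
// A non-empty result means applying `next` would force a full model reload.
function contextParamsChanged (prev = {}, next = {}) {
  return CONTEXT_KEYS.filter(key => prev[key] !== next[key])
}
```

An empty array means the two configurations agree on every context key, so no reload is expected from them.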

config.miscConfig

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| caption_enabled | boolean | false | Format segments with <\|start\|>..<\|end\|> markers. |

streamOpts (passed to transcribeStream())

| Parameter | Type | Default | Constraint | Description |
| --- | --- | --- | --- | --- |
| windowTimesteps | number | 1500 | positive integer, ≤ 2900 (MAX_WINDOW_TIMESTEPS) | Decode window size in 20 ms timesteps. |
| hopTimesteps | number | 500 | positive integer, < windowTimesteps | How far the window advances between decodes (one third of the window by default, so consecutive windows overlap by ~67%). |
| emit | string | 'delta' | 'delta' or 'full' | 'delta' emits the newly-discovered tail per window with native segment fields plus windowStartTimestep. 'full' emits a single { text } entry with the running transcript. |

The encoder accepts up to ~3000 timesteps per forward pass; MAX_WINDOW_TIMESTEPS = 2900 keeps a safety margin so partial flush windows always fit.

Advanced configuration

whisperConfig is a thin pass-through to whisper.cpp's whisper_full_params. For the full surface (decoding strategy, beam search, VAD, suppression, callbacks, etc.) refer to the upstream reference:

  • whisper_full_params in whisper.cpp
  • Concrete shapes used in production: see the examples directory and @qvac/transcription-whispercpp for richer usage patterns (VAD, chunking, live streaming).

whisper.cpp Patches

The BCI patches live in the tetherto/qvac-ext-lib-whisper.cpp fork (v1.8.4.2) and are consumed via the qvac-registry-vcpkg port:

| Feature | Description |
| --- | --- |
| Variable conv1 kernel | Read n_audio_conv1_kernel from model header (k=7 for 512ch BCI vs k=3 for audio) |
| Windowed attention | Attention mask with configurable window size/layer params in header |
| BCI SOS tokens | BCI-specific start-of-sequence token handling |
| Graph placement fix | Correct encoder-graph mask population for the encoder graph refactor |
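
To make the windowed attention patch concrete, the sketch below builds a local attention mask in JavaScript. It is an illustration only: the fork's actual mask construction is not described here, so the symmetric window centred on each frame (half-width floor(w / 2)) and the name windowedAttentionMask are assumptions. The treatment of a window size of 0 as "attend everywhere" follows the --window-size flag description.

```javascript
// Hypothetical sketch: mask[i][j] === 1 means frame i may attend to frame j.
// Assumes a symmetric window centred on each frame; the real patch may differ.
function windowedAttentionMask (nFrames, windowSize = 57) {
  const half = Math.floor(windowSize / 2)
  const mask = []
  for (let i = 0; i < nFrames; i++) {
    const row = new Uint8Array(nFrames)
    if (windowSize === 0) {
      row.fill(1) // windowing disabled: full attention
    } else {
      const lo = Math.max(0, i - half)
      const hi = Math.min(nFrames - 1, i + half)
      for (let j = lo; j <= hi; j++) row[j] = 1
    }
    mask.push(row)
  }
  return mask
}
```

Under this assumption, w=57 lets each frame see 28 neighbours on either side; the mask would apply to encoder layers 0 through 3, with later layers attending globally.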

Error Range

All errors thrown by this package are QvacErrorAddonBCI instances (extending QvacErrorBase from @qvac/error) and use codes in the range 26001–27000.

| Code | Name | When |
| --- | --- | --- |
| 26001 | FAILED_TO_LOAD_WEIGHTS | Native addon failed to load the GGML model |
| 26002 | FAILED_TO_CANCEL | cancel() failed at the addon layer |
| 26003 | FAILED_TO_APPEND | Append to processing queue failed |
| 26004 | FAILED_TO_DESTROY | destroy() failed at the addon layer |
| 26005 | FAILED_TO_ACTIVATE | addon.activate() failed during load() |
| 26006 | INVALID_NEURAL_INPUT | Batch input rejected by the addon |
| 26007 | JOB_ALREADY_RUNNING | transcribe() called while a job is in flight |
| 26008 | MODEL_NOT_LOADED | Inference called before load() or after destroy() |
| 26009 | MODEL_FILE_NOT_FOUND | files.model missing/unreadable, or files.embedder provided but missing/invalid |
| 26010 | BUFFER_LIMIT_EXCEEDED | Neural signal buffer exceeded the addon limit |
| 26011 | FAILED_TO_START_JOB | Addon refused to start the job |
| 26012 | INVALID_CONFIG | Constructor / context configuration rejected |
| 26013 | EMBEDDER_WEIGHTS_INVALID | bci-embedder.bin failed validation |
| 26014 | STREAM_ALREADY_ACTIVE | transcribeStream() called while one is already active |
| 26015 | INVALID_STREAM_INPUT | Bad stream input type or streamOpts |
| 26016 | INVALID_STREAM_HEADER | Stream [T u32, C u32] header malformed (e.g. C == 0) |
| 26017 | WINDOW_TOO_LARGE | windowTimesteps exceeds MAX_WINDOW_TIMESTEPS (2900) |

Codes are also re-exported via require('@qvac/bci-whispercpp/lib/error').ERR_CODES for programmatic matching.
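
A caller might use the code range to separate this package's errors from everything else. The sketch below is illustrative and makes two assumptions that are not confirmed above: that thrown errors expose a numeric code property, and that the two "already running" codes are the ones worth treating as a busy signal. classifyBciError and its return labels are hypothetical.

```javascript
// Range and codes copied from the table above.
const BCI_ERROR_RANGE = { min: 26001, max: 27000 }
const BUSY_CODES = new Set([26007, 26014]) // JOB_ALREADY_RUNNING, STREAM_ALREADY_ACTIVE

// Hypothetical classifier; assumes errors carry a numeric `code` property.
function classifyBciError (err) {
  const code = err && typeof err.code === 'number' ? err.code : null
  if (code === null || code < BCI_ERROR_RANGE.min || code > BCI_ERROR_RANGE.max) {
    return 'not-bci'
  }
  return BUSY_CODES.has(code) ? 'busy' : 'other'
}
```

In application code the literal numbers would normally be replaced by the re-exported ERR_CODES constants.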

Resources

  • whisper.cpp fork (Tether): tetherto/qvac-ext-lib-whisper.cpp
  • Sibling package — audio transcription: @qvac/transcription-whispercpp
  • vcpkg registry: qvac-registry-vcpkg
  • BrainWhisperer reference (Python): the model checkpoints converted by scripts/convert-model.py

Glossary

  • Bare — small modular JavaScript runtime for desktop and mobile.
  • QVAC — Tether's open-source SDK for building decentralized, local-first AI applications.
  • GGML — tensor library / file format used by whisper.cpp for native inference.
  • BCI — Brain-Computer Interface; here, microelectrode-array recordings of neural activity decoded into text.
  • Day index (day_idx) — selects the day-specific low-rank projection (A·B) baked into bci-embedder.bin. Sessions recorded on different days use different projections.
  • Windowed attention — encoder attention mask restricted to a local window (w=57 over layers 0–3 by default), configured at model conversion time.

License

Apache-2.0