GGML-backed OCR addon for QVAC.
Provides two inference pipelines on ggml / .gguf — no Python, no
PyTorch, and no ONNX Runtime at runtime:
| Pipeline | Detector | Recognizer | Notes |
|---|---|---|---|
| easyocr (default) | CRAFT | CRNN gen-2 (English / Latin) | Port of EasyOCR |
| doctr | DBNet (MobileNetV3-Large) | CRNN (MobileNetV3-Small) | Port of doctr |
Select the pipeline at construction time via params.pipelineType
(default 'easyocr'). Both pipelines emit the same output shape.
Sibling of @qvac/ocr-onnx. Same input/output shape, same
public surface — only the inference engine differs.
| | @qvac/ocr-onnx | @qvac/ocr-ggml |
|---|---|---|
| Inference backend | ONNX Runtime | GGML |
| Weight format | .onnx | .gguf |
| Pre/post-processing | C++ + OpenCV (EasyOCR) | C++ + OpenCV (EasyOCR + doctr, lifted) |
| Quantization | per-EP (limited) | block-quantized (Q8_0, Q4_K, …) out of the box |
| Pipelines | EasyOCR | EasyOCR + Doctr |
The C++ implementation is lifted from
EasyOcr-ggml; GGML is pulled
from qvac-fabric (instead of the upstream submodule), matching how the
sibling translation-nmtcpp addon consumes ggml.
npm install @qvac/ocr-ggml
The package ships a Bare addon. Build prerequisites (clang-22, libc++, vcpkg, bare-make) match the rest of the QVAC monorepo — see the root README for the canonical setup.
cd packages/ocr-ggml
npm install
bare-make generate
bare-make build
bare-make install # produces prebuilds/
const { OcrGgml } = require('@qvac/ocr-ggml')
const ocr = new OcrGgml({
  params: {
    pathDetector: '/abs/path/craft_mlt_25k.gguf',
    pathRecognizer: '/abs/path/english_g2.gguf',
    langList: ['en'],
    magRatio: 1.5
  },
  opts: { stats: true }
})
await ocr.load()
const response = await ocr.run({
  path: '/abs/path/photo.jpg',
  options: { paragraph: false }
})
response.onUpdate(rows => {
  for (const [box, text, conf] of rows) {
    console.log(`[${conf.toFixed(2)}] ${text}`, box)
  }
})
const stats = await response.await()
console.log(stats)
await ocr.unload()
bare examples/quickstart.js \
--image samples/english.png \
--detector models/craft_mlt_25k.gguf \
--recognizer models/english_g2.gguf \
--lang en
new OcrGgml({ params, opts?, logger? })

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| params.pathDetector | string | ✓ | - | detector .gguf (CRAFT for easyocr, DBNet for doctr) |
| params.pathRecognizer | string | ✓ | - | recognizer .gguf (english_g2/latin_g2 for easyocr, doctr CRNN for doctr) |
| params.langList | string[] | ✓ | - | language codes (['en'], ['en','fr'], …); used by easyocr, ignored by doctr |
| params.pipelineType | 'easyocr' \| 'doctr' | | 'easyocr' | which pipeline backs the addon |
| params.magRatio | number | | 1.5 | CRAFT input-image magnification (easyocr only) |
| params.defaultRotationAngles | number[] | | [90, 270] | rotations tried on low-confidence boxes (easyocr only) |
| params.contrastRetry | boolean | | false | retry low-confidence boxes with contrast adjustment (easyocr only) |
| params.lowConfidenceThreshold | number | | 0.4 | retry threshold (easyocr only) |
| params.recognizerBatchSize | number | | 32 | recognizer batch size (easyocr only) |
| params.nThreads | number | | 0 (auto) | CPU thread count for GGML; <0 leaves the GGML default |
| params.backendsDir | string | | `<package>/prebuilds` | directory holding libggml-*.so backend shared libs |
| params.backendDevice | 'cpu' \| 'vulkan' \| 'metal' \| 'opencl' | | 'cpu' | ggml backend device. 'vulkan' (Linux/Windows/Android), 'metal' (Apple) and 'opencl' (Android/Adreno) opt in to GPU inference with transparent CPU fallback (see Backend device) |
| params.gpuDevice | number | | prefer discrete | 0-based index into the matching GPU/iGPU devices for 'vulkan'/'metal'/'opencl'; out-of-range → CPU fallback (see Selecting a specific GPU) |
| opts.stats | boolean | | false | emit timing stats on finish |
| logger | Object | | null | optional { info, warn, error, debug }; receives C++ log lines |
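
As a caller-side illustration of the table above, the sketch below fills in the documented literal defaults and checks the three required fields before constructing the addon. `withDefaults` and `DOCUMENTED_DEFAULTS` are hypothetical names, not part of the package, and the sketch says nothing about how the addon validates params internally.

```javascript
// Hypothetical caller-side helper, NOT part of @qvac/ocr-ggml.
// It mirrors the "Required" and "Default" columns of the params table.
const DOCUMENTED_DEFAULTS = {
  pipelineType: 'easyocr',
  magRatio: 1.5,
  defaultRotationAngles: [90, 270],
  contrastRetry: false,
  lowConfidenceThreshold: 0.4,
  recognizerBatchSize: 32,
  nThreads: 0,
  backendDevice: 'cpu'
}

function withDefaults (params) {
  // pathDetector and pathRecognizer are marked required in the table.
  for (const key of ['pathDetector', 'pathRecognizer']) {
    if (typeof params[key] !== 'string' || params[key].length === 0) {
      throw new TypeError(`params.${key} must be a non-empty string`)
    }
  }
  // langList is marked required even though the doctr pipeline ignores it.
  if (!Array.isArray(params.langList)) {
    throw new TypeError('params.langList must be a string[]')
  }
  // Caller-supplied values win over the documented defaults.
  return { ...DOCUMENTED_DEFAULTS, ...params }
}

const params = withDefaults({
  pathDetector: '/abs/path/craft_mlt_25k.gguf',
  pathRecognizer: '/abs/path/english_g2.gguf',
  langList: ['en']
})
console.log(params.pipelineType, params.magRatio)
```

The resulting object could then be passed as `params` to `new OcrGgml({ params })`. `gpuDevice` and `backendsDir` are left out of the defaults on purpose because their defaults are not literal values.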
- `load(): Promise<void>`: loads both models, registers ggml backends, activates the addon
- `run(input): Promise<QvacResponse>`: serialised; one job at a time
- `unload(): Promise<void>`: frees the addon (destroys ggml contexts + backends)
- `destroy(): Promise<void>`: marks the instance as destroyed (no further use)
- `getState(): InferenceClientState`
- `getBackendInfo(): BackendInfo | null`: backend device resolved at load() ({ requested, backendDevice, backendName, deviceIndex, backendDescription, fallbackReason }); null before load() / after unload(). deviceIndex is the ggml device index of the selected device (or -1 on CPU); backendDescription is the human-readable model (e.g. 'NVIDIA GeForce RTX 4090', 'Apple M3')
- `OcrGgml.getModelKey(): string`: "ocr-ggml", used by the inference manager

Backend device

By default inference runs on the CPU ggml backend, which is always
available. Set params.backendDevice to 'vulkan' (Linux/Windows/Android),
'metal' (Apple) or 'opencl' (Android/Adreno) to opt in to GPU inference:
const ocr = new OcrGgml({
  params: {
    pathDetector: '/abs/path/craft_mlt_25k.gguf',
    pathRecognizer: '/abs/path/english_g2.gguf',
    langList: ['en'],
    backendDevice: 'metal' // 'cpu' (default) | 'vulkan' | 'metal' | 'opencl'
  }
})
await ocr.load()
console.log(ocr.getBackendInfo())
// Vulkan available → { requested: 'vulkan', backendDevice: 'GPU', backendName: 'Vulkan0', deviceIndex: 1, backendDescription: 'NVIDIA GeForce RTX 4090', fallbackReason: '' }
// no Vulkan device → { requested: 'vulkan', backendDevice: 'CPU', backendName: 'CPU', deviceIndex: -1, backendDescription: '…', fallbackReason: 'Vulkan backend requested but no Vulkan-capable GPU device was found; falling back to CPU' }
// Metal available → { requested: 'metal', backendDevice: 'GPU', backendName: 'MTL0', deviceIndex: 1, backendDescription: 'Apple M3 Ultra', fallbackReason: '' } // device name; 'MTL1'… on a multi-GPU host
// no Metal device → { requested: 'metal', backendDevice: 'CPU', backendName: 'CPU', deviceIndex: -1, backendDescription: '…', fallbackReason: 'Metal backend requested but no Metal-capable GPU device was found; falling back to CPU' }
// OpenCL available (Adreno) → { requested: 'opencl', backendDevice: 'GPU', backendName: 'GPUOpenCL', deviceIndex: 1, backendDescription: 'QUALCOMM Adreno(TM) 830', fallbackReason: '' }
// no OpenCL device → { requested: 'opencl', backendDevice: 'CPU', backendName: 'CPU', deviceIndex: -1, backendDescription: '…', fallbackReason: 'OpenCL backend requested but no OpenCL-capable GPU device was found; falling back to CPU' }
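
For logging, the object returned by getBackendInfo() can be collapsed into one line. The helper below is a hypothetical sketch: it only uses the fields shown above, and the wording of the summary string is an arbitrary choice.

```javascript
// Hypothetical helper: one-line summary of a getBackendInfo() result.
// Field names come from the examples above; the output format is ours.
function describeBackend (info) {
  if (info === null) return 'not loaded' // before load() / after unload()
  if (info.backendDevice === 'GPU') {
    return `GPU ${info.backendName} (#${info.deviceIndex}): ${info.backendDescription}`
  }
  // On CPU, a non-empty fallbackReason means a GPU was requested but not used.
  const why = info.fallbackReason
    ? `fallback: ${info.fallbackReason}`
    : `requested ${info.requested}`
  return `CPU (${why})`
}

console.log(describeBackend({
  requested: 'vulkan',
  backendDevice: 'GPU',
  backendName: 'Vulkan0',
  deviceIndex: 1,
  backendDescription: 'NVIDIA GeForce RTX 4090',
  fallbackReason: ''
}))
```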
Behaviour and expectations:

- If 'vulkan' / 'metal' / 'opencl' is requested but no matching GPU device is registered, the pipeline falls back to CPU and records a non-empty fallbackReason (also reflected by the numeric backendIsGpu stat). It never silently does the wrong thing.
- Vulkan needs the libggml-vulkan backend shared library (libggml-vulkan.so / .dll / .dylib) present in backendsDir (default `<package>/prebuilds/<target>/`), plus a working Vulkan driver/ICD and a Vulkan-capable GPU on the host. OpenCL likewise needs the libggml-opencl backend shared library plus a working OpenCL runtime (libOpenCL.so); it is built primarily for Android (the opencl vcpkg dependency is Android-only). Metal is compiled into the addon (no extra shared library), and is available whenever ggml was built with the qvac-fabric gpu-backends feature (the default on Apple). These GPU backends are only produced on platforms/feature sets where the upstream ggml port builds them; on other hosts the request falls back to CPU.
- For Adreno GPUs, request 'opencl' (OpenCL is Adreno's sound GPU path). As of qvac-fabric 8828.1.2 the OpenCL backend implements the vision ops the OCR graphs need (POOL_2D, CONV_2D_DW, HARDSWISH, HARDSIGMOID, …), so both the EasyOCR and DocTR pipelines now run end-to-end on Adreno via OpenCL: the EasyOCR CRAFT/CRNN and DocTR graphs take a backend-aware ggml_conv_2d_direct path on OpenCL (see the Direct conv path section below). Selection still runs a POOL_2D op-support probe on the chosen GPU device as a safety net: any backend that cannot run a required op transparently falls back to CPU with a fallbackReason instead of aborting at inference (GGML_ABORT). On a build that ships the libggml-opencl backend lib, requesting 'opencl' on an Adreno device resolves to the GPU.
- Set OCR_DOCTR_LSTM_CPU=1 to force the scalar CPU LSTM path. On Mali, where the CPU would otherwise sit idle next to the Vulkan recognizer, a CPU work-stealing assist runs a second feature extractor on disjoint crop chunks concurrently and the LSTM is split across CPU + GPU.
- nThreads only affects the CPU backend; it is ignored when a Vulkan, Metal or OpenCL device is selected.
- On Apple: EasyOCR → 'metal', DocTR → 'cpu'. Since backendDevice is per-instance, you can mix both. (Numbers are workload/hardware dependent; measure for your case.)
- On Mali GPUs, conv2d dispatches are pathologically slow under Vulkan, so a plain backendDevice: 'vulkan' request on a Mali GPU auto-routes detection to the CPU while keeping recognition on Vulkan (detected from the GPU description at load time; no API change). On a Pixel 9 Pro (Mali-G715) the clinical_chemistry page drops from ~11.9 s to ~2.7 s warm GPU end-to-end with identical output. Other GPUs (Adreno OpenCL, Apple Metal, NVIDIA/Intel Vulkan) keep full-GPU detection.

Selecting a specific GPU (gpuDevice)

On a host with more than one GPU (e.g. a discrete GPU plus an integrated GPU, or two discrete GPUs) the backend resolves which device to use as follows:
- Default (no gpuDevice): prefer discrete. Selection enumerates every GPU/iGPU device that matches the requested backend (Vulkan or Metal) and picks the first discrete GPU (GGML_BACKEND_DEVICE_TYPE_GPU); if none is discrete it uses the first integrated GPU. This avoids accidentally pinning inference to a weaker iGPU on laptops/APUs.
- gpuDevice: N. Pass a 0-based index to pin a specific device. The index counts only the matching devices, in ggml enumeration order (so gpuDevice: 0 is the first matching device, gpuDevice: 1 the second, …). An out-of-range index transparently falls back to CPU and records a fallbackReason naming the requested index and how many matching devices were found. The resolved ggml device index is reported as getBackendInfo().deviceIndex (and -1 on CPU).

const ocr = new OcrGgml({
  params: {
    pathDetector: '/abs/path/craft_mlt_25k.gguf',
    pathRecognizer: '/abs/path/english_g2.gguf',
    langList: ['en'],
    backendDevice: 'vulkan',
    gpuDevice: 1 // pin the 2nd matching Vulkan device
  }
})
await ocr.load()
console.log(ocr.getBackendInfo())
// → { requested: 'vulkan', backendDevice: 'GPU', backendName: 'Vulkan1',
// deviceIndex: 1, backendDescription: 'NVIDIA GeForce RTX 4090', fallbackReason: '' }
// out-of-range gpuDevice (e.g. 99) →
// { requested: 'vulkan', backendDevice: 'CPU', backendName: 'CPU', deviceIndex: -1,
// backendDescription: '…',
// fallbackReason: 'Vulkan backend requested with gpuDevice index 99 but only N matching device(s) were found; falling back to CPU' }
gpuDevice applies to both Vulkan and Metal (the prefer-discrete default
and the index selection share one code path).
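
The selection rules in this section can be sketched over plain objects. This is an illustration, not the addon's C++ code: `selectDevice`, the `type` labels ('GPU' for discrete, 'IGPU' for integrated) and the shape of the returned object are all assumptions, and `devices` is assumed to be already filtered to the requested backend, in ggml enumeration order.

```javascript
// Illustrative sketch of prefer-discrete selection with an optional index.
// 'GPU' / 'IGPU' are placeholder labels for ggml's discrete / integrated types.
function selectDevice (devices, gpuDevice) {
  if (devices.length === 0) {
    return { device: null, fallbackReason: 'no matching GPU device was found' }
  }
  if (gpuDevice === undefined) {
    // Default: first discrete GPU, else the first (integrated) device.
    const discrete = devices.find(d => d.type === 'GPU')
    return { device: discrete || devices[0], fallbackReason: '' }
  }
  if (!Number.isInteger(gpuDevice) || gpuDevice < 0 || gpuDevice >= devices.length) {
    // Out-of-range index: fall back to CPU and say why.
    return {
      device: null,
      fallbackReason: `gpuDevice index ${gpuDevice} requested but only ${devices.length} matching device(s) were found`
    }
  }
  return { device: devices[gpuDevice], fallbackReason: '' }
}

const devices = [
  { name: 'integrated', type: 'IGPU' },
  { name: 'discrete', type: 'GPU' }
]
console.log(selectDevice(devices).device.name)
console.log(selectDevice(devices, 0).device.name)
console.log(selectDevice(devices, 99).fallbackReason)
```

A `null` device stands for the CPU fallback here; the real addon reports that case through getBackendInfo() instead.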
Environment variable (GGML_VK_VISIBLE_DEVICES). For pinning or reordering
Vulkan devices without code, ggml's Vulkan backend honours the
GGML_VK_VISIBLE_DEVICES environment variable, a comma-separated list of
device indices (e.g. GGML_VK_VISIBLE_DEVICES=1,0) that restricts and
reorders the Vulkan devices ggml exposes. Because this is applied by ggml
before the addon enumerates devices, it composes with gpuDevice: the
addon's index counts the (already filtered/reordered) visible devices. Use it
as an interim lever (e.g. in CI or a launcher script) when you cannot pass
gpuDevice through the API. It does not affect Metal.

OCR_GGML_CRAFT_KERNEL_F32/F16 / OCR_GGML_CRNN_KERNEL_F32/F16

The EasyOCR pipeline can store its convolution kernels as F16 in the weights buffer, which lets ggml take the faster F16 im2col→GEMM conv path (and run on GPU backends). Kernels are cast F32→F16 at model-load time from the F32 GGUF, so no separate F16 model file is needed, and biases plus the BatchNorm-fold math stay F32 (the recognizer's LSTM / linear / Prediction weights also stay F32).
F16 only helps where the resolved backend has a fast F16 GEMM, so the default is backend-aware (decided at model-load time from the selected ggml device):
| Resolved backend / device | Default |
|---|---|
| GPU / iGPU with fast F16 (NVIDIA, Apple Metal, Intel, AMD…) | F16 |
| Mali GPU (Vulkan) | F32 (its F16 GEMM is ~4× slower) |
| Apple-Silicon CPU (native FP16) | F16 |
| Other CPUs — x86, non-Apple ARM (F16 emulated) | F32 |
Adreno Vulkan is already skipped by backend selection (it runs on CPU), so it follows the CPU rule above.
Per-pipeline env vars override the backend-aware default (read once when the
model is loaded; only the exact value 1 applies; _F32 wins if both are set):
| Env var | Affects | Effect |
|---|---|---|
| OCR_GGML_CRAFT_KERNEL_F32=1 | CRAFT detector conv kernels | force F32 |
| OCR_GGML_CRAFT_KERNEL_F16=1 | CRAFT detector conv kernels | force F16 |
| OCR_GGML_CRNN_KERNEL_F32=1 | CRNN gen-2 recognizer feature-extractor conv kernels | force F32 |
| OCR_GGML_CRNN_KERNEL_F16=1 | CRNN gen-2 recognizer feature-extractor conv kernels | force F16 |
These are useful for A/B-benchmarking the F16 fast path or bisecting an accuracy regression. None of them affect the DocTR pipeline.
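
The override precedence described above (exact value 1 only, _F32 wins over _F16, otherwise the backend-aware default) can be written down in a few lines. This is a sketch of the rule as documented, not the addon's C++ code; `resolveKernelType` and its arguments are hypothetical.

```javascript
// Sketch of the documented precedence for the kernel-precision env vars.
// `stage` is 'CRAFT' or 'CRNN'; `backendDefault` stands in for the
// backend-aware default from the table above ('F16' or 'F32').
function resolveKernelType (env, stage, backendDefault) {
  // Only the exact string '1' counts; F32 is checked first so it wins.
  if (env[`OCR_GGML_${stage}_KERNEL_F32`] === '1') return 'F32'
  if (env[`OCR_GGML_${stage}_KERNEL_F16`] === '1') return 'F16'
  return backendDefault
}

// Both set: F32 wins.
console.log(resolveKernelType(
  { OCR_GGML_CRAFT_KERNEL_F32: '1', OCR_GGML_CRAFT_KERNEL_F16: '1' }, 'CRAFT', 'F16'))
// 'true' is not the exact value 1, so the backend default applies.
console.log(resolveKernelType({ OCR_GGML_CRNN_KERNEL_F16: 'true' }, 'CRNN', 'F32'))
```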
OCR_GGML_CONV1X1_MULMAT / OCR_GGML_CONV1X1_CONV2D

A 1×1 convolution is a per-pixel linear map over channels, i.e. a plain matrix
multiply. The EasyOCR pipeline can run a 1×1, stride-1, no-padding conv
either through ggml_conv_2d (im2col → GEMM) or a direct ggml_mul_mat that
skips the im2col lowering and its materialised buffer. This mainly affects the
CRAFT detector's 1×1 convs (the upconv*.conv.0 legs, basenet.slice5.2, and
conv_cls.6/.8).
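
The equivalence can be checked numerically on a toy input. The sketch below uses plain nested arrays rather than ggml tensors, with the image flattened to H*W pixels per channel; the function names and data layout are illustrative choices only.

```javascript
// x: [C_in][H*W] activations, w: [C_out][C_in] weights of a 1×1 kernel.
function conv1x1 (x, w) {
  const pixels = x[0].length
  const out = w.map(() => new Array(pixels).fill(0))
  for (let p = 0; p < pixels; p++) { // the 1×1 window visits every pixel
    for (let oc = 0; oc < w.length; oc++) {
      for (let ic = 0; ic < x.length; ic++) {
        out[oc][p] += w[oc][ic] * x[ic][p]
      }
    }
  }
  return out
}

// Plain matrix multiply: a is [M][K], b is [K][N], result is [M][N].
function matmul (a, b) {
  return a.map(row =>
    b[0].map((_, n) => row.reduce((sum, v, k) => sum + v * b[k][n], 0)))
}

const x = [[1, 2, 3, 4], [5, 6, 7, 8]] // 2 channels, 2×2 image flattened
const w = [[1, 0], [1, 1], [2, -1]] // 3 output channels
console.log(JSON.stringify(conv1x1(x, w)))
console.log(JSON.stringify(matmul(w, x)))
```

Both calls produce the same [C_out][H*W] result, which is why a 1×1, stride-1, no-padding conv needs no im2col buffer.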
Skipping im2col helps GPU GEMM backends but adds permute/cont overhead that does not pay off on CPU, so the default is backend-aware, resolved once at model-load time (mirrors the F16 kernel decision):
| Resolved backend | 1×1 conv default |
|---|---|
| GPU / accelerator (NVIDIA Vulkan, Apple Metal, Mali Vulkan) | mul_mat (~−19% total / −43% detection on NVIDIA, ~−10% on Metal, ~neutral on Mali — output verified identical) |
| Adreno on Vulkan | conv_2d — Adreno's Vulkan compute is numerically fragile (and is already auto-skipped to CPU). Keyed on the backend API, so the Adreno-OpenCL path is not affected and follows the GPU mul_mat default. |
| Any CPU (x86, Apple-Silicon, non-Apple ARM) | conv_2d (mul_mat is neutral-to-slower there) |
Two env vars override the default (read once at model load; only the exact value
1 applies; CONV2D wins if both are set):
| Env var | Effect |
|---|---|
| OCR_GGML_CONV1X1_MULMAT=1 | force the mul_mat path on every backend |
| OCR_GGML_CONV1X1_CONV2D=1 | force the ggml_conv_2d path on every backend |
These are useful for A/B-benchmarking the two paths or as an escape hatch if a
backend's mul_mat path ever misbehaves. They do not affect the DocTR pipeline.
Direct conv path (OCR_GGML_DIRECT_CONV / OCR_GGML_IM2COL_CONV)

The non-pointwise (e.g. 3×3) convs can run either through ggml_conv_2d
(im2col → GEMM) or the fused ggml_conv_2d_direct (GGML_OP_CONV_2D). On the
OpenCL backend (Adreno) the im2col path rides a slow f16×f16 GEMV, so the
direct kernel is much faster there (the EasyOCR counterpart of the DocTR
doctrConv2d work). On CPU/Vulkan/Metal the im2col path is kept (direct is
~2× slower on Metal). The default is therefore backend-aware, resolved once
at model-load time:
| Resolved backend | non-1×1 conv default |
|---|---|
| OpenCL (Adreno) | ggml_conv_2d_direct |
| CPU / Vulkan / Metal | ggml_conv_2d (im2col) |
Two env vars override the default (read once at model load; IM2COL wins if both
are set):
| Env var | Effect |
|---|---|
| OCR_GGML_DIRECT_CONV=1 | force ggml_conv_2d_direct on every backend |
| OCR_GGML_IM2COL_CONV=1 | force the ggml_conv_2d (im2col) path on every backend |

Note: ggml_conv_2d_direct is only implemented on some backends; forcing it on a backend without GGML_OP_CONV_2D will abort. It does not affect the DocTR pipeline.
OCR_GGML_CRAFT_BIAS_REPEAT

Each convolution adds a per-output-channel bias. By default the EasyOCR
pipeline adds the [OC] bias via ggml_add's implicit broadcast
(ggml_add(x, bias_reshaped[1,1,OC,1])), so the [W,H,OC,N] activation never
has to materialise a full repeated copy of the bias — a small memory/op saving
on every conv. This is numerically identical to the older ggml_repeat path
(ggml_add broadcasts its second operand on CPU/Vulkan/Metal; verified equal on
all three and ~8-15% faster on CPU).
Set OCR_GGML_CRAFT_BIAS_REPEAT=1 to fall back to the legacy ggml_repeat
broadcast — an escape hatch to recover without a code change if a backend's
broadcast-add ever misbehaves (read once at graph-build time; only the exact
value 1 enables it). It does not affect the DocTR pipeline.
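
To make the difference between the two bias paths concrete, here is a toy version over nested arrays. It only illustrates why the results match and what the repeat path materialises; it does not use ggml, and the function names are hypothetical.

```javascript
// act: [OC][W*H] activation, bias: [OC].
function addBiasBroadcast (act, bias) {
  // Each channel reads its single bias value directly.
  return act.map((channel, oc) => channel.map(v => v + bias[oc]))
}

function addBiasRepeat (act, bias) {
  // Legacy shape of the computation: first build a full-size copy of the
  // bias (same size as the activation), then add element by element.
  const repeated = act.map((channel, oc) => new Array(channel.length).fill(bias[oc]))
  return act.map((channel, oc) => channel.map((v, i) => v + repeated[oc][i]))
}

const act = [[1, 2, 3], [4, 5, 6]]
const bias = [10, -1]
console.log(JSON.stringify(addBiasBroadcast(act, bias)))
console.log(JSON.stringify(addBiasRepeat(act, bias)))
```

Both functions perform the same additions per element, so the outputs are identical; the broadcast version simply skips the temporary `repeated` array.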
OCR_GGML_CRNN_BIAS_REPEAT

The EasyOCR recognizer applies the same broadcast to its sequence biases:
the BiLSTM Linear and the final Prediction add their [F] bias via
ggml_add's implicit broadcast over the (T, N) axes, instead of materialising
a full [F, T, N] ggml_repeat copy. Numerically identical to the legacy path.
Set OCR_GGML_CRNN_BIAS_REPEAT=1 to fall back to the legacy ggml_repeat
broadcast — the recognizer-side counterpart of OCR_GGML_CRAFT_BIAS_REPEAT
(read once at graph-build time; only the exact value 1 enables it). It does
not affect the DocTR pipeline.
run(input) shape

{
  path: string, // JPEG / PNG / BMP file
  options?: {
    paragraph?: boolean, // merge nearby boxes
    boxMarginMultiplier?: number, // padding around boxes
    rotationAngles?: number[] // override defaults for this call
  }
}
Output rows (delivered via response.onUpdate):
type InferredText = [
  [[number, number], [number, number], [number, number], [number, number]], // 4-point box
  string, // text
  number // confidence [0..1]
]
This is byte-for-byte the same shape @qvac/ocr-onnx returns.
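
As an example of consuming these rows, the sketch below drops low-confidence rows and converts each 4-point box to an axis-aligned rectangle. `summariseRows` is a hypothetical helper and the 0.5 threshold is an arbitrary illustration, not a value used by the addon.

```javascript
// Rows follow the InferredText shape above: [box, text, confidence].
// minConfidence = 0.5 is an arbitrary example threshold.
function summariseRows (rows, minConfidence = 0.5) {
  return rows
    .filter(([, , conf]) => conf >= minConfidence)
    .map(([box, text, conf]) => {
      const xs = box.map(([x]) => x)
      const ys = box.map(([, y]) => y)
      const left = Math.min(...xs)
      const top = Math.min(...ys)
      return {
        text,
        conf,
        x: left,
        y: top,
        width: Math.max(...xs) - left, // axis-aligned extent of the box
        height: Math.max(...ys) - top
      }
    })
}

const rows = [
  [[[10, 20], [110, 20], [110, 50], [10, 50]], 'Hello', 0.93],
  [[[0, 0], [5, 0], [5, 5], [0, 5]], '??', 0.2]
]
console.log(summariseRows(rows))
```

A function like this could be called from inside the response.onUpdate callback shown in the quickstart.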
Stats (opts.stats=true)

{
  totalTime: number, // seconds
  detectionTime: number, // seconds (CRAFT inference)
  recognitionTime: number, // seconds (CRNN inference)
  numBoxes: number, // total boxes (aligned + unaligned)
  backendIsGpu: number // 1 if inference ran on a GPU (Vulkan/Metal) device, else 0
}
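
A few derived figures can be computed from this object. The sketch below assumes totalTime covers both detectionTime and recognitionTime, which the stats shape suggests but does not state; `summariseStats` is a hypothetical helper.

```javascript
// Hypothetical helper over the stats object described above.
// Assumption: totalTime includes detectionTime and recognitionTime.
function summariseStats (stats) {
  const other = stats.totalTime - stats.detectionTime - stats.recognitionTime
  return {
    backend: stats.backendIsGpu === 1 ? 'GPU' : 'CPU',
    // Time not attributed to either model (clamped in case of rounding).
    otherTime: Math.max(0, other),
    boxesPerSecond: stats.totalTime > 0 ? stats.numBoxes / stats.totalTime : 0
  }
}

console.log(summariseStats({
  totalTime: 2,
  detectionTime: 1.25,
  recognitionTime: 0.5,
  numBoxes: 10,
  backendIsGpu: 0
}))
```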
The addon consumes GGUF weight files. Each pipeline expects its own detector + recognizer pair:
EasyOCR (pipelineType: 'easyocr')

| GGUF | Role |
|---|---|
| `craft_mlt_25k.gguf` / `*_q8_0.gguf` / `*_q4_k.gguf` | CRAFT detector |
| `english_g2.gguf` / `*_q8_0.gguf` / `*_q4_k.gguf` | English recognizer (gen-2) |
| `latin_g2.gguf` | Latin-script recognizer (gen-2; fr/de/it/es/pt/…) |
Use the converter in the upstream
tetherto/easy-ocr-ggml
repo (scripts/pth_to_gguf.py) to produce these from EasyOCR PyTorch
.pth checkpoints.
This first release ships the gen-2 recognizer family only (English / Latin). Other language groups (Arabic, Bengali, Cyrillic, Devanagari, CJK) will land as GGUFs are produced.
Doctr (pipelineType: 'doctr')

| GGUF | Role |
|---|---|
| `db_mobilenet_v3_large.gguf` | DBNet detector (MobileNetV3-Large backbone) |
| `crnn_mobilenet_v3_small.gguf` | doctr recognizer (MobileNetV3-Small backbone) |
Doctr is language-agnostic: it recognises any Latin-script text the
underlying CRNN was trained on, so it ignores langList, magRatio and
the contrast-retry / rotation knobs.
CI pulls pinned snapshots of both the EasyOCR and Doctr GGUFs from S3
(see .github/workflows/integration-test-ocr-ggml.yml)
and exposes them to the integration suite via the
OCR_GGML_DETECTOR + OCR_GGML_RECOGNIZER env vars (EasyOCR) and
OCR_GGML_DOCTR_DETECTOR + OCR_GGML_DOCTR_RECOGNIZER env vars
(Doctr). Both pipelines are exercised end-to-end on every PR.
A development-time CLI ships at the package root, ocr-ggml-cli, modelled
on @qvac/translation-nmtcpp's nmt-cli. It is not included in the
npm artifact (same convention as nmt-cli); run it directly from the
repository checkout:
# Default: OCR samples/english.png with bundled English weights (easyocr)
bare ocr-ggml-cli
# Doctr pipeline (DBNet detector + doctr recognizer)
bare ocr-ggml-cli --pipeline-type doctr \
--detector models/db_mobilenet_v3_large.gguf \
--recognizer models/crnn_mobilenet_v3_small.gguf \
--image /tmp/photo.jpg
# Detail mode (index + confidence + box per recognised line)
bare ocr-ggml-cli --detail 1
# JSON output (matches EasyOCR Python's readtext shape)
bare ocr-ggml-cli --output-format json | jq .
# Custom image + Q8_0 quantized EasyOCR models
bare ocr-ggml-cli --image /tmp/photo.jpg \
--detector models/craft_mlt_25k_q8_0.gguf \
--recognizer models/english_g2_q8_0.gguf
# Force a specific CPU thread count, with verbose C++ logs
bare ocr-ggml-cli --n-threads 8 --verbose
# Show help / version
bare ocr-ggml-cli --help
bare ocr-ggml-cli --version
The CLI is functionally equivalent to upstream EasyOcr-ggml's ocr-cli
binary — same flag surface (--image, --detector, --recognizer,
--lang, --paragraph, --mag-ratio, --detail, --output-format,
--n-threads) plus --pipeline-type {easyocr,doctr} for the second
pipeline, and the nmt-cli ergonomics (env-var fallbacks
OCR_GGML_{IMAGE,DETECTOR,RECOGNIZER,PIPELINE_TYPE}, -h/--help,
-v/--version, --verbose for C++ log forwarding). One deliberate
omission for v1: --debug-png (annotated overlay) — print boxes via
--detail 1 or --output-format json and render externally instead.
| Script | Purpose |
|---|---|
| scripts/check_ggml_backends.sh | Probe shipped ggml backends + BLAS/Vulkan/OpenCL paths in prebuilds/ |
Full usage in scripts/README.md. For weight
conversion (PyTorch .pth → GGUF), use the upstream converter in
tetherto/easy-ocr-ggml.
npm run lint
npm run test:unit # JS unit tests (no models required)
npm run test:integration # end-to-end smoke; soft-skips when models absent
npm run test:cpp # C++ GoogleTest (BUILD_TESTING=ON)
The integration smoke test reads the following env vars and runs each case only when the corresponding GGUFs are present on disk:
| Env var | Pipeline | Required for which test |
|---|---|---|
| OCR_GGML_DETECTOR | EasyOCR | EasyOCR case |
| OCR_GGML_RECOGNIZER | EasyOCR | EasyOCR case (CI uses latin_g2.gguf) |
| OCR_GGML_DOCTR_DETECTOR | Doctr | Doctr case |
| OCR_GGML_DOCTR_RECOGNIZER | Doctr | Doctr case |
| OCR_GGML_IMAGE | - | overrides the default sample image |
| OCR_GGML_BACKEND | - | manual ggml backend override for the whole suite: cpu, vulkan, metal or opencl (otherwise auto-detected, see below) |
CI sets these automatically; locally you can:
OCR_GGML_DETECTOR=$PWD/models/craft_mlt_25k.gguf \
OCR_GGML_RECOGNIZER=$PWD/models/latin_g2.gguf \
npm run test:integration
The harness auto-detects the backend. When the package ships a
ggml-vulkan backend lib in prebuilds/ (as the merged desktop CI prebuilds
do), the whole integration suite — every EasyOCR + DocTR case, with the same
expected-text / quality assertions as CPU — automatically runs through the
ggml Vulkan backend. This means the existing desktop test-<platform>-<arch>
integration job exercises Vulkan on the Vulkan-capable GPU runner (e.g.
qvac-ubuntu2404-x64-gpu) with no separate CI job.
On a host without a Vulkan-capable GPU (or without the ggml-vulkan backend
lib — e.g. local dev with unmerged prebuilds), the suite stays on CPU: when no
lib is present it never requests Vulkan, and when the lib is present but no GPU
is available the request transparently falls back to CPU. Either way the suite
still passes, and the recorded execution_provider reflects the backend
actually used (driven by the backendIsGpu stat), not the request.
OCR_GGML_BACKEND remains a manual override that takes precedence over
auto-detection — force the GPU path (or force CPU) with:
OCR_GGML_BACKEND=vulkan \
OCR_GGML_DETECTOR=$PWD/models/craft_mlt_25k.gguf \
OCR_GGML_RECOGNIZER=$PWD/models/latin_g2.gguf \
npm run test:integration
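
The precedence described above (manual override first, then auto-detection from the shipped backend lib, then CPU) might look like the following. This is a guess at the shape of the logic, not the harness source: `pickSuiteBackend`, the `hasVulkanLib` flag and the error on unknown values are all assumptions.

```javascript
// Hypothetical sketch of the harness's backend choice.
const KNOWN_BACKENDS = ['cpu', 'vulkan', 'metal', 'opencl']

function pickSuiteBackend (env, hasVulkanLib) {
  const override = env.OCR_GGML_BACKEND
  if (override !== undefined && override !== '') {
    // Rejecting unknown values is our choice; the real harness may differ.
    if (!KNOWN_BACKENDS.includes(override)) {
      throw new Error(`unknown OCR_GGML_BACKEND: ${override}`)
    }
    return override // manual override beats auto-detection
  }
  // Auto-detect: request Vulkan only when the backend lib is shipped.
  return hasVulkanLib ? 'vulkan' : 'cpu'
}

console.log(pickSuiteBackend({}, true))
console.log(pickSuiteBackend({ OCR_GGML_BACKEND: 'cpu' }, true))
console.log(pickSuiteBackend({}, false))
```

Whatever is requested, the backend actually used is still the one reported by the backendIsGpu stat, since a Vulkan request can fall back to CPU at load time.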
Android is the primary mobile Vulkan target, and the android-arm64 prebuild
ships the Vulkan backend lib (libqvac-ggml-vulkan.so). The mobile suite runs
on AWS Device Farm (see test/mobile/test-groups.json), where the harness
defaults to CPU — so a dedicated test,
test/integration/android-vulkan.test.js
(runAndroidVulkanTest, in the android → regularB shard), explicitly
requests backendDevice: 'vulkan'. It asserts the addon either runs on a
Vulkan device or reports an explicit CPU fallback, and — whichever backend
is resolved — that the OCR output is correct (an accuracy gate, not just an
"it executed" check). The test runs only on Android and is a clean skip on
desktop and iOS (iOS has no Vulkan).
Adreno caveat. Adreno Vulkan is numerically broken (cos-sim ~0.73 vs reference on Adreno 830 / Galaxy S25, while Mali / Metal / NVIDIA sit above 0.999; see vla-ggml). OcrBackendSelection therefore auto-skips Adreno GPUs for Vulkan and falls back to CPU (an explicit gpuDevice index still overrides this to force an Adreno device on purpose). The accuracy gate above is the backstop that catches a numerically-broken Vulkan device that slips through.
OpenCL is Adreno's sound GPU path (the inverse of the Vulkan Adreno guard above),
and the android-arm64 prebuild ships the OpenCL backend lib
(libqvac-ggml-opencl.so). Two tests exercise it:
- test/integration/android-opencl.test.js (runAndroidOpenclTest, android → regularB shard) requests backendDevice: 'opencl' on real Device Farm devices and asserts the addon either runs on an OpenCL device or reports an explicit CPU fallback, with a correctness (accuracy) gate either way. Android-only; clean skip on desktop and iOS.
- test/integration/opencl-backend.test.js (runOpenclBackendTest) covers the desktop opt-in path and skips cleanly on any host that did not ship a libggml-opencl backend lib.

Because the OCR vision ops are now implemented on OpenCL, an Adreno device that
ships the OpenCL backend lib resolves 'opencl' to the GPU and runs both
pipelines on-device (rather than falling back to CPU).
The Benchmark Performance (OCR-GGML) workflow reuses the integration suites,
which already record both a Vulkan ([GPU]) and a forced-CPU ([CPU]) pass
for each test on a GPU host (runOcrComparison / runDoctrComparison, tagged
via the backendIsGpu stat). The shared perf-report aggregator
(scripts/perf-report/aggregate.js) pairs those rows per device + test and
renders a "CPU → Vulkan Speedup" section (markdown + HTML) showing
speedup = CPU mean / Vulkan mean for total / detection / recognition time.
The section only appears when a test ran on both backends, so non-GPU runs are
unaffected.
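
The pairing and the speedup formula can be sketched as follows. This is not the code in scripts/perf-report/aggregate.js; the row fields `device` and `test` and the function name are assumptions made for illustration, while the formula (CPU mean / GPU mean per metric, only when both backends ran) follows the description above.

```javascript
// Illustrative sketch: pair CPU and GPU rows per device + test and compute
// speedup = CPU mean / GPU mean for each timing metric.
const METRICS = ['totalTime', 'detectionTime', 'recognitionTime']
const mean = xs => xs.reduce((a, b) => a + b, 0) / xs.length

function cpuToGpuSpeedups (rows) {
  const groups = new Map()
  for (const row of rows) {
    const key = `${row.device}::${row.test}`
    if (!groups.has(key)) groups.set(key, { cpu: [], gpu: [] })
    groups.get(key)[row.backendIsGpu === 1 ? 'gpu' : 'cpu'].push(row)
  }
  const out = []
  for (const [key, { cpu, gpu }] of groups) {
    if (cpu.length === 0 || gpu.length === 0) continue // needs both backends
    const entry = { key }
    for (const m of METRICS) {
      entry[m] = mean(cpu.map(r => r[m])) / mean(gpu.map(r => r[m]))
    }
    out.push(entry)
  }
  return out
}

const sample = [
  { device: 'runner', test: 'english', backendIsGpu: 0, totalTime: 4, detectionTime: 3, recognitionTime: 1 },
  { device: 'runner', test: 'english', backendIsGpu: 1, totalTime: 2, detectionTime: 1, recognitionTime: 1 },
  { device: 'laptop', test: 'english', backendIsGpu: 0, totalTime: 5, detectionTime: 4, recognitionTime: 1 }
]
console.log(cpuToGpuSpeedups(sample))
```

The 'laptop' row has no GPU counterpart, so it produces no entry, matching the rule that the section only appears when a test ran on both backends.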
On mobile, Android attempts a GPU pass per device family: Mali devices
(e.g. Pixel) run on Vulkan, while Adreno devices — auto-skipped on
Vulkan — run the GPU pass on OpenCL instead, so both families fill the GPU
column (the harness probes the device once and picks Vulkan or OpenCL
accordingly). To compare output quality (not just speed) across backends,
the Python quality benchmark takes a --backend flag:
python benchmarks/quality_eval/benchmark_100.py \
--pipeline easyocr \
--detector models/craft_mlt_25k.gguf \
--recognizer models/latin_g2.gguf \
--backend vulkan # cpu (default) | vulkan — falls back to CPU when unavailable
packages/ocr-ggml/
├── package.json # @qvac/ocr-ggml (bare addon)
├── CMakeLists.txt # bare_module(ocr-ggml), links ggml + opencv4
├── vcpkg.json # ggml from qvac-fabric, opencv4, inference-addon-cpp
├── vcpkg-configuration.json
├── vcpkg/ # custom triplets + toolchains
├── ocr-ggml-cli # dev-time CLI (mirrors nmt-cli), not shipped to npm
├── binding.js # require.addon() entry
├── index.js, index.d.ts # public JS surface (OcrGgml class)
├── ocr-ggml.js # thin wrapper over the bare binding
├── addonLogging.{js,d.ts} # setLogger / releaseLogger surface
├── lib/error.js # QvacErrorAddonOcrGgml + ERR_CODES
├── examples/quickstart.js # JS code example
├── samples/ # sample fixture images (english.png, …)
├── scripts/ # check_ggml_backends.sh diagnostic
├── test/{unit,integration}
└── addon/src/
├── js-interface/binding.cpp # BARE_MODULE entry
├── addon/AddonJs.hpp # createInstance / runJob / output handler
├── model-interface/
│ ├── OcrTypes.hpp # shared OcrInput/OcrConfig + PipelineMode enum
│ └── Pipeline.{hpp,cpp} # unified IModel adapter (EasyOCR + DocTR via mode)
├── ggml/ # gguf_loader, ops, craft, crnn, weights (lifted)
├── pipeline/ # lang, steps, step_* (EasyOCR; lifted)
├── easyocr-ggml/ # headers for the EasyOCR lifted code
└── doctr-ggml/ # MobileNetGraph + DBNet/CRNN steps
- C++ implementation lifted from tetherto/easy-ocr-ggml (Apache-2.0).
- Build layout follows @qvac/translation-nmtcpp (ggml from qvac-fabric, cmake-bare + cmake-vcpkg, inference-addon-cpp base classes).
- Public surface matches @qvac/ocr-onnx so callers can swap engines transparently.

Apache-2.0 (matches upstream EasyOCR, EasyOcr-ggml, @qvac/ocr-onnx, and
@qvac/translation-nmtcpp).