Made by Antonio Ramirez

@qvac/ocr-ggml

0.7.0

@qvac/ocr-ggml

GGML-backed OCR addon for QVAC. Provides two inference pipelines on ggml / .gguf — no Python, no PyTorch, and no ONNX Runtime at runtime:

| Pipeline | Detector | Recognizer | Notes |
|---|---|---|---|
| easyocr (default) | CRAFT | CRNN gen-2 (English / Latin) | Port of EasyOCR |
| doctr | DBNet (MobileNetV3-Large) | CRNN (MobileNetV3-Small) | Port of doctr |

Select the pipeline at construction time via params.pipelineType (default 'easyocr'). Both pipelines emit the same output shape.
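
As a rough sketch of switching pipelines, the helper below builds a params object for either pipelineType. buildParams and the modelsDir argument are hypothetical (they are not part of the package); the GGUF file names are the ones listed in the Models section, and langList is filled in because the API table marks it as required even though doctr ignores it.

```javascript
const path = require('path')

// Hypothetical helper, not part of @qvac/ocr-ggml: returns the `params`
// object for the requested pipeline, given a directory holding the GGUFs.
function buildParams (pipelineType, modelsDir) {
  const files = {
    easyocr: { detector: 'craft_mlt_25k.gguf', recognizer: 'english_g2.gguf' },
    doctr: { detector: 'db_mobilenet_v3_large.gguf', recognizer: 'crnn_mobilenet_v3_small.gguf' }
  }[pipelineType]
  if (!files) throw new Error(`unknown pipelineType: ${pipelineType}`)
  return {
    pipelineType,
    pathDetector: path.join(modelsDir, files.detector),
    pathRecognizer: path.join(modelsDir, files.recognizer),
    langList: ['en'] // required field; the doctr pipeline ignores it
  }
}
```

The result would be passed as `new OcrGgml({ params: buildParams('doctr', '/abs/path/models') })`.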

Sibling of @qvac/ocr-onnx. Same input/output shape, same public surface — only the inference engine differs.

| | @qvac/ocr-onnx | @qvac/ocr-ggml |
|---|---|---|
| Inference backend | ONNX Runtime | GGML |
| Weight format | .onnx | .gguf |
| Pre/post-processing | C++ + OpenCV (EasyOCR) | C++ + OpenCV (EasyOCR + doctr, lifted) |
| Quantization | per-EP (limited) | block-quantized (Q8_0, Q4_K, …) out of the box |
| Pipelines | EasyOCR | EasyOCR + Doctr |

The C++ implementation is lifted from EasyOcr-ggml; GGML is pulled from qvac-fabric (instead of the upstream submodule), matching how the sibling translation-nmtcpp addon consumes ggml.

Install

npm install @qvac/ocr-ggml

The package ships a Bare addon. Build prerequisites (clang-22, libc++, vcpkg, bare-make) match the rest of the QVAC monorepo — see the root README for the canonical setup.

cd packages/ocr-ggml
npm install
bare-make generate
bare-make build
bare-make install   # produces prebuilds/

Usage

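const { OcrGgml } = require('@qvac/ocr-ggml')

// Wrapped in an async function so `await` is valid in a CommonJS script.
async function main () {
  const ocr = new OcrGgml({
    params: {
      pathDetector: '/abs/path/craft_mlt_25k.gguf',
      pathRecognizer: '/abs/path/english_g2.gguf',
      langList: ['en'],
      magRatio: 1.5
    },
    opts: { stats: true }
  })

  // Loads both models and registers the ggml backends.
  await ocr.load()

  const response = await ocr.run({
    path: '/abs/path/photo.jpg',
    options: { paragraph: false }
  })

  // Each row is a [box, text, confidence] tuple.
  response.onUpdate(rows => {
    for (const [box, text, conf] of rows) {
      console.log(`[${conf.toFixed(2)}] ${text}`, box)
    }
  })

  // Timing stats are returned because opts.stats is true.
  const stats = await response.await()
  console.log(stats)

  await ocr.unload()
}

main().catch(console.error)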

Quickstart example

bare examples/quickstart.js \
  --image samples/english.png \
  --detector models/craft_mlt_25k.gguf \
  --recognizer models/english_g2.gguf \
  --lang en

API

new OcrGgml({ params, opts?, logger? })

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| params.pathDetector | string | ✓ | - | detector .gguf (CRAFT for easyocr, DBNet for doctr) |
| params.pathRecognizer | string | ✓ | - | recognizer .gguf (english_g2/latin_g2 for easyocr, doctr CRNN for doctr) |
| params.langList | string[] | ✓ | - | language codes (['en'], ['en','fr'], …); used by easyocr, ignored by doctr |
| params.pipelineType | 'easyocr' \| 'doctr' | | 'easyocr' | which pipeline backs the addon |
| params.magRatio | number | | 1.5 | CRAFT input-image magnification (easyocr only) |
| params.defaultRotationAngles | number[] | | [90, 270] | rotations tried on low-confidence boxes (easyocr only) |
| params.contrastRetry | boolean | | false | retry low-confidence boxes with contrast adjustment (easyocr only) |
| params.lowConfidenceThreshold | number | | 0.4 | retry threshold (easyocr only) |
| params.recognizerBatchSize | number | | 32 | recognizer batch size (easyocr only) |
| params.nThreads | number | | 0 (auto) | CPU thread count for GGML; <0 leaves the GGML default |
| params.backendsDir | string | | <package>/prebuilds | directory holding libggml-*.so backend shared libs |
| params.backendDevice | 'cpu' \| 'vulkan' \| 'metal' \| 'opencl' | | 'cpu' | ggml backend device. 'vulkan' (Linux/Windows/Android), 'metal' (Apple) and 'opencl' (Android/Adreno) opt in to GPU inference with transparent CPU fallback (see Backend device) |
| params.gpuDevice | number | | prefer discrete | 0-based index into the matching GPU/iGPU devices for 'vulkan'/'metal'/'opencl'; out-of-range → CPU fallback (see Selecting a specific GPU) |
| opts.stats | boolean | | false | emit timing stats on finish |
| logger | Object | | null | optional { info, warn, error, debug }; receives C++ log lines |

Methods

  • load(): Promise<void> — loads both models, registers ggml backends, activates the addon
  • run(input): Promise<QvacResponse> — serialised; one job at a time
  • unload(): Promise<void> — frees the addon (destroys ggml contexts + backends)
  • destroy(): Promise<void> — marks the instance as destroyed (no further use)
  • getState(): InferenceClientState
  • getBackendInfo(): BackendInfo | null — backend device resolved at load() ({ requested, backendDevice, backendName, deviceIndex, backendDescription, fallbackReason }); null before load() / after unload(). deviceIndex is the ggml device index of the selected device (or -1 on CPU); backendDescription is the human-readable model (e.g. 'NVIDIA GeForce RTX 4090', 'Apple M3')
  • OcrGgml.getModelKey(): string — "ocr-ggml", used by the inference manager
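
One way to pair load() and unload() reliably is a small wrapper that always unloads, even when the work in between throws. The sketch below is illustrative only: withOcr is a hypothetical helper, and makeFakeOcr is a stand-in that merely shares the load/unload method names with OcrGgml so the snippet runs without models.

```javascript
// Hypothetical helper: load, run a callback, and always unload afterwards.
async function withOcr (ocr, fn) {
  await ocr.load()
  try {
    return await fn(ocr)
  } finally {
    await ocr.unload()
  }
}

// Stand-in with the same method names as OcrGgml, for illustration only.
function makeFakeOcr (calls) {
  return {
    load: async () => { calls.push('load') },
    unload: async () => { calls.push('unload') }
  }
}
```

With a real instance the call would look like `await withOcr(ocr, o => o.run({ path }))`, keeping in mind that run() jobs are serialised.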

Backend device (CPU / Vulkan / Metal / OpenCL)

By default inference runs on the CPU ggml backend, which is always available. Set params.backendDevice to 'vulkan' (Linux/Windows/Android), 'metal' (Apple) or 'opencl' (Android/Adreno) to opt in to GPU inference:

const ocr = new OcrGgml({
  params: {
    pathDetector: '/abs/path/craft_mlt_25k.gguf',
    pathRecognizer: '/abs/path/english_g2.gguf',
    langList: ['en'],
    backendDevice: 'metal'   // 'cpu' (default) | 'vulkan' | 'metal' | 'opencl'
  }
})
await ocr.load()
console.log(ocr.getBackendInfo())
// Vulkan available → { requested: 'vulkan', backendDevice: 'GPU', backendName: 'Vulkan0', deviceIndex: 1, backendDescription: 'NVIDIA GeForce RTX 4090', fallbackReason: '' }
// no Vulkan device → { requested: 'vulkan', backendDevice: 'CPU', backendName: 'CPU', deviceIndex: -1, backendDescription: '…', fallbackReason: 'Vulkan backend requested but no Vulkan-capable GPU device was found; falling back to CPU' }
// Metal available  → { requested: 'metal',  backendDevice: 'GPU', backendName: 'MTL0', deviceIndex: 1, backendDescription: 'Apple M3 Ultra', fallbackReason: '' }  // device name; 'MTL1'… on a multi-GPU host
// no Metal device  → { requested: 'metal',  backendDevice: 'CPU', backendName: 'CPU', deviceIndex: -1, backendDescription: '…', fallbackReason: 'Metal backend requested but no Metal-capable GPU device was found; falling back to CPU' }
// OpenCL available (Adreno) → { requested: 'opencl', backendDevice: 'GPU', backendName: 'GPUOpenCL', deviceIndex: 1, backendDescription: 'QUALCOMM Adreno(TM) 830', fallbackReason: '' }
// no OpenCL device → { requested: 'opencl', backendDevice: 'CPU', backendName: 'CPU', deviceIndex: -1, backendDescription: '…', fallbackReason: 'OpenCL backend requested but no OpenCL-capable GPU device was found; falling back to CPU' }
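
To make the getBackendInfo() result easier to log, a caller might reduce it to one line. describeBackend below is a hypothetical helper; it only reads the fields documented above (requested, backendDevice, backendName, deviceIndex, backendDescription, fallbackReason) and treats a non-empty fallbackReason as the signal for a CPU fallback.

```javascript
// Hypothetical helper: one-line summary of a BackendInfo object (or null).
function describeBackend (info) {
  if (info === null) return 'not loaded'
  if (info.fallbackReason) {
    return `CPU fallback (requested ${info.requested}): ${info.fallbackReason}`
  }
  if (info.backendDevice === 'GPU') {
    return `${info.backendName} (${info.backendDescription}), ggml device ${info.deviceIndex}`
  }
  return info.backendName
}
```

Typical use would be `console.log(describeBackend(ocr.getBackendInfo()))` right after load().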

Behaviour and expectations:

  • Transparent CPU fallback. When 'vulkan' / 'metal' / 'opencl' is requested but no matching GPU device is registered, the pipeline falls back to CPU and records a non-empty fallbackReason (also reflected by the numeric backendIsGpu stat). It never silently does the wrong thing.
  • Required backend libs. Vulkan execution needs the libggml-vulkan backend shared library (libggml-vulkan.so / .dll / .dylib) present in backendsDir (default <package>/prebuilds/<target>/), plus a working Vulkan driver/ICD and a Vulkan-capable GPU on the host. OpenCL likewise needs the libggml-opencl backend shared library plus a working OpenCL runtime (libOpenCL.so); it is built primarily for Android (the opencl vcpkg dependency is Android-only). Metal is compiled into the addon (no extra shared library), and is available whenever ggml was built with the qvac-fabric gpu-backends feature (the default on Apple). These GPU backends are only produced on platforms/feature sets where the upstream ggml port builds them; on other hosts the request falls back to CPU and records a fallbackReason, as described above.
  • OpenCL is the Adreno GPU path. Qualcomm Adreno GPUs are skipped on the auto Vulkan path (their Vulkan compute is numerically broken) but are the intended target for 'opencl', the numerically sound GPU backend on Adreno. As of qvac-fabric 8828.1.2 the OpenCL backend implements the vision ops the OCR graphs need (POOL_2D, CONV_2D_DW, HARDSWISH, HARDSIGMOID, …), so both the EasyOCR and DocTR pipelines run end-to-end on Adreno via OpenCL. The EasyOCR CRAFT/CRNN and DocTR graphs take a backend-aware ggml_conv_2d_direct path on OpenCL (see the Direct conv path section below). Selection still runs a POOL_2D op-support probe on the chosen GPU device as a safety net: any backend that cannot run a required op falls back to CPU with a fallbackReason instead of aborting at inference (GGML_ABORT). On a build that ships the libggml-opencl backend lib, requesting 'opencl' on an Adreno device resolves to the GPU.
  • DocTR recognizer. The MobileNetV3 feature-extractor graph and the bidirectional LSTM + linear classifier run on the selected ggml device as a batched ggml graph (set OCR_DOCTR_LSTM_CPU=1 to force the scalar CPU LSTM path). On Mali, where the CPU would otherwise sit idle next to the Vulkan recognizer, a CPU work-stealing assist runs a second feature extractor on disjoint crop chunks concurrently and the LSTM is split across CPU + GPU.
  • Threads. nThreads only affects the CPU backend; it is ignored when a Vulkan, Metal or OpenCL device is selected.
  • Performance guidance (Metal). The win depends on the detector. The EasyOCR pipeline's CRAFT detector is dense-convolution and benefits strongly from the GPU (≈4.5× faster on Metal on an Apple M3 Ultra vs CPU). The DocTR detector is MobileNetV3 (depthwise-separable convolutions), a low-arithmetic-intensity, GPU-unfriendly workload that runs slower on Metal than on CPU; output is identical either way. Recommended default on Apple: EasyOCR → 'metal', DocTR → 'cpu'. Since backendDevice is per-instance, you can mix both. (Numbers are workload/hardware dependent; measure for your case.)
  • Performance guidance (Mali, DocTR). On Arm Mali / Immortalis GPUs the DBNet detector's many conv2d dispatches are pathologically slow under Vulkan, so a plain backendDevice: 'vulkan' request on a Mali GPU auto-routes detection to the CPU while keeping recognition on Vulkan (detected from the GPU description at load time; no API change). On a Pixel 9 Pro (Mali-G715) the clinical_chemistry page drops from ~11.9 s to ~2.7 s warm GPU end-to-end with identical output. Other GPUs (Adreno OpenCL, Apple Metal, NVIDIA/Intel Vulkan) keep full-GPU detection.

Selecting a specific GPU (gpuDevice)

On a host with more than one GPU (e.g. a discrete GPU plus an integrated GPU, or two discrete GPUs) the backend resolves which device to use as follows:

  • Default (no gpuDevice): prefer discrete. Selection enumerates every GPU/iGPU device that matches the requested backend (Vulkan or Metal) and picks the first discrete GPU (GGML_BACKEND_DEVICE_TYPE_GPU); if none is discrete it uses the first integrated GPU. This avoids accidentally pinning inference to a weaker iGPU on laptops/APUs.
  • Explicit gpuDevice: N. Pass a 0-based index to pin a specific device. The index counts only the matching devices, in ggml enumeration order (so gpuDevice: 0 is the first matching device, gpuDevice: 1 the second, …). An out-of-range index transparently falls back to CPU and records a fallbackReason naming the requested index and how many matching devices were found. The resolved ggml device index is reported as getBackendInfo().deviceIndex (and -1 on CPU).

const ocr = new OcrGgml({
  params: {
    pathDetector: '/abs/path/craft_mlt_25k.gguf',
    pathRecognizer: '/abs/path/english_g2.gguf',
    langList: ['en'],
    backendDevice: 'vulkan',
    gpuDevice: 1            // pin the 2nd matching Vulkan device
  }
})
await ocr.load()
console.log(ocr.getBackendInfo())
// → { requested: 'vulkan', backendDevice: 'GPU', backendName: 'Vulkan1',
//     deviceIndex: 1, backendDescription: 'NVIDIA GeForce RTX 4090', fallbackReason: '' }
// out-of-range gpuDevice (e.g. 99) →
//   { requested: 'vulkan', backendDevice: 'CPU', backendName: 'CPU', deviceIndex: -1,
//     backendDescription: '…',
//     fallbackReason: 'Vulkan backend requested with gpuDevice index 99 but only N matching device(s) were found; falling back to CPU' }

gpuDevice applies to both Vulkan and Metal (the prefer-discrete default and the index selection share one code path).
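
The selection rules above can be modelled in a few lines. This is a sketch of the documented behaviour, not the addon's C++ code: selectGpu, the device objects and their type field ('gpu' for discrete, 'igpu' for integrated) are all hypothetical, and the fallback strings are simplified.

```javascript
// Model of the documented rules. `devices` holds only the devices that match
// the requested backend, in enumeration order.
function selectGpu (devices, gpuDevice) {
  if (devices.length === 0) {
    return { device: null, fallbackReason: 'no matching GPU device was found' }
  }
  if (gpuDevice !== undefined) {
    const picked = devices[gpuDevice]
    if (!picked) {
      return {
        device: null,
        fallbackReason: `gpuDevice index ${gpuDevice} requested but only ${devices.length} matching device(s) were found`
      }
    }
    return { device: picked, fallbackReason: '' }
  }
  // Default: first discrete GPU, else first integrated GPU.
  const discrete = devices.find(d => d.type === 'gpu')
  return { device: discrete || devices[0], fallbackReason: '' }
}
```

For example, with an integrated GPU enumerated first and a discrete GPU second, the default picks the discrete one, while gpuDevice: 0 pins the integrated one.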

  • Interim env lever (GGML_VK_VISIBLE_DEVICES). For pinning or reordering Vulkan devices without code, ggml's Vulkan backend honours the GGML_VK_VISIBLE_DEVICES environment variable — a comma-separated list of device indices (e.g. GGML_VK_VISIBLE_DEVICES=1,0) that restricts and reorders the Vulkan devices ggml exposes. Because this is applied by ggml before the addon enumerates devices, it composes with gpuDevice: the addon's index counts the (already filtered/reordered) visible devices. Use it as an interim lever (e.g. in CI or a launcher script) when you cannot pass gpuDevice through the API. It does not affect Metal.
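
How the env variable composes with gpuDevice can be pictured with a toy model. visibleDevices below is hypothetical and only mimics the "restrict and reorder" behaviour described above; ggml's real parsing may differ in edge cases, and the device names are made up.

```javascript
// Toy model: the env value restricts and reorders the device list before
// any gpuDevice index is applied.
function visibleDevices (allDevices, envValue) {
  if (!envValue) return allDevices
  return envValue.split(',')
    .filter(s => s.trim() !== '')
    .map(s => allDevices[Number(s.trim())])
    .filter(d => d !== undefined)
}

const physical = ['Vulkan GPU A', 'Vulkan GPU B', 'Vulkan GPU C']
const visible = visibleDevices(physical, '1,0')
console.log(visible) // in this model, gpuDevice: 0 now refers to visible[0]
```

In this model, GGML_VK_VISIBLE_DEVICES=1,0 hides the third device and makes the second physical device the one that gpuDevice: 0 selects.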

Kernel precision (OCR_GGML_CRAFT_KERNEL_F32/F16 / OCR_GGML_CRNN_KERNEL_F32/F16)

The EasyOCR pipeline can store its convolution kernels as F16 in the weights buffer, which lets ggml take the faster F16 im2col→GEMM conv path (and run on GPU backends). Kernels are cast F32→F16 at model-load time from the F32 GGUF — no separate F16 model file is needed, and biases plus the BatchNorm-fold math stay F32 (the recognizer's LSTM / linear / Prediction weights also stay F32).

F16 only helps where the resolved backend has a fast F16 GEMM, so the default is backend-aware (decided at model-load time from the selected ggml device):

| Resolved backend / device | Default |
|---|---|
| GPU / iGPU with fast F16 (NVIDIA, Apple Metal, Intel, AMD…) | F16 |
| Mali GPU (Vulkan) | F32 (its F16 GEMM is ~4× slower) |
| Apple-Silicon CPU (native FP16) | F16 |
| Other CPUs: x86, non-Apple ARM (F16 emulated) | F32 |

Adreno Vulkan is already skipped by backend selection (it runs on CPU), so it follows the CPU rule above.

Per-pipeline env vars override the backend-aware default (read once when the model is loaded; only the exact value 1 applies; _F32 wins if both are set):

| Env var | Affects | Effect |
|---|---|---|
| OCR_GGML_CRAFT_KERNEL_F32=1 | CRAFT detector conv kernels | force F32 |
| OCR_GGML_CRAFT_KERNEL_F16=1 | CRAFT detector conv kernels | force F16 |
| OCR_GGML_CRNN_KERNEL_F32=1 | CRNN gen-2 recognizer feature-extractor conv kernels | force F32 |
| OCR_GGML_CRNN_KERNEL_F16=1 | CRNN gen-2 recognizer feature-extractor conv kernels | force F16 |

These are useful for A/B-benchmarking the F16 fast path or bisecting an accuracy regression. None of them affect the DocTR pipeline.
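
The override precedence (only the exact value 1 counts, and one variable wins when both are set) can be written out as a small function. resolveOverride is a hypothetical JavaScript model for reasoning about the rules; the addon resolves these variables in C++ at model-load time.

```javascript
// Model of the documented precedence. `winner` and `loser` are
// { name, value } pairs; `winner` takes priority when both are set to '1'.
function resolveOverride (env, backendDefault, winner, loser) {
  if (env[winner.name] === '1') return winner.value
  if (env[loser.name] === '1') return loser.value
  return backendDefault // any other value (e.g. 'true', '0') is ignored
}

const craftPrecision = resolveOverride(
  { OCR_GGML_CRAFT_KERNEL_F32: '1', OCR_GGML_CRAFT_KERNEL_F16: '1' },
  'F16',
  { name: 'OCR_GGML_CRAFT_KERNEL_F32', value: 'F32' },
  { name: 'OCR_GGML_CRAFT_KERNEL_F16', value: 'F16' }
)
console.log(craftPrecision) // prints F32, since _F32 wins when both are set
```

Passing process.env as the first argument would apply the same logic to the real environment.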

1×1 conv path (backend-aware; OCR_GGML_CONV1X1_MULMAT / OCR_GGML_CONV1X1_CONV2D)

A 1×1 convolution is a per-pixel linear map over channels — i.e. a plain matrix multiply. The EasyOCR pipeline can run a 1×1, stride-1, no-padding conv either through ggml_conv_2d (im2col → GEMM) or a direct ggml_mul_mat that skips the im2col lowering and its materialised buffer. This mainly affects the CRAFT detector's 1×1 convs (the upconv*.conv.0 legs, basenet.slice5.2, and conv_cls.6/.8).
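
The equivalence between a 1×1 convolution and a per-pixel matrix multiply can be checked numerically. The sketch below uses plain nested arrays rather than ggml tensors; conv2d, pixelwiseMatmul and the array layouts are illustrative choices, not the addon's data layout.

```javascript
// Generic "valid" 2D convolution, stride 1, no padding.
// x: [H][W][Cin], w: [Cout][kH][kW][Cin] -> [H-kH+1][W-kW+1][Cout]
function conv2d (x, w) {
  const kH = w[0].length
  const kW = w[0][0].length
  const out = []
  for (let i = 0; i + kH <= x.length; i++) {
    const row = []
    for (let j = 0; j + kW <= x[0].length; j++) {
      row.push(w.map(filter => {
        let acc = 0
        for (let di = 0; di < kH; di++) {
          for (let dj = 0; dj < kW; dj++) {
            filter[di][dj].forEach((wv, c) => { acc += wv * x[i + di][j + dj][c] })
          }
        }
        return acc
      }))
    }
    out.push(row)
  }
  return out
}

// Per-pixel matrix multiply over channels. m: [Cout][Cin]
function pixelwiseMatmul (x, m) {
  return x.map(row => row.map(px => m.map(r => r.reduce((s, wv, c) => s + wv * px[c], 0))))
}

const x = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]] // 2x2 image, 2 channels
const m = [[1, 0], [0.5, -1], [2, 2]] // 3 output channels
const w = m.map(r => [[r]]) // the same weights viewed as 1x1 kernels
console.log(JSON.stringify(conv2d(x, w)) === JSON.stringify(pixelwiseMatmul(x, m))) // prints true
```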

Skipping im2col helps GPU GEMM backends but adds permute/cont overhead that does not pay off on CPU, so the default is backend-aware, resolved once at model-load time (mirrors the F16 kernel decision):

| Resolved backend | 1×1 conv default |
|---|---|
| GPU / accelerator (NVIDIA Vulkan, Apple Metal, Mali Vulkan) | mul_mat (~−19% total / −43% detection on NVIDIA, ~−10% on Metal, ~neutral on Mali; output verified identical) |
| Adreno on Vulkan | conv_2d: Adreno's Vulkan compute is numerically fragile (and is already auto-skipped to CPU). Keyed on the backend API, so the Adreno-OpenCL path is not affected and follows the GPU mul_mat default. |
| Any CPU (x86, Apple-Silicon, non-Apple ARM) | conv_2d (mul_mat is neutral-to-slower there) |

Two env vars override the default (read once at model load; only the exact value 1 applies; CONV2D wins if both are set):

| Env var | Effect |
|---|---|
| OCR_GGML_CONV1X1_MULMAT=1 | force the mul_mat path on every backend |
| OCR_GGML_CONV1X1_CONV2D=1 | force the ggml_conv_2d path on every backend |

These are useful for A/B-benchmarking the two paths or as an escape hatch if a backend's mul_mat path ever misbehaves. They do not affect the DocTR pipeline.

Direct conv path (backend-aware; OCR_GGML_DIRECT_CONV / OCR_GGML_IM2COL_CONV)

The non-pointwise (e.g. 3×3) convs can run either through ggml_conv_2d (im2col → GEMM) or the fused ggml_conv_2d_direct (GGML_OP_CONV_2D). On the OpenCL backend (Adreno) the im2col path rides a slow f16×f16 GEMV, so the direct kernel is much faster there (the EasyOCR counterpart of the DocTR doctrConv2d work). On CPU/Vulkan/Metal the im2col path is kept (direct is ~2× slower on Metal). The default is therefore backend-aware, resolved once at model-load time:

| Resolved backend | non-1×1 conv default |
|---|---|
| OpenCL (Adreno) | ggml_conv_2d_direct |
| CPU / Vulkan / Metal | ggml_conv_2d (im2col) |

Two env vars override the default (read once at model load; IM2COL wins if both are set):

| Env var | Effect |
|---|---|
| OCR_GGML_DIRECT_CONV=1 | force ggml_conv_2d_direct on every backend |
| OCR_GGML_IM2COL_CONV=1 | force the ggml_conv_2d (im2col) path on every backend |

Note: ggml_conv_2d_direct is only implemented on some backends; forcing it on a backend without GGML_OP_CONV_2D will abort. It does not affect the DocTR pipeline.

Conv bias broadcast (OCR_GGML_CRAFT_BIAS_REPEAT)

Each convolution adds a per-output-channel bias. By default the EasyOCR pipeline adds the [OC] bias via ggml_add's implicit broadcast (ggml_add(x, bias_reshaped[1,1,OC,1])), so the [W,H,OC,N] activation never has to materialise a full repeated copy of the bias — a small memory/op saving on every conv. This is numerically identical to the older ggml_repeat path (ggml_add broadcasts its second operand on CPU/Vulkan/Metal; verified equal on all three and ~8-15% faster on CPU).
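
The difference between the two bias paths can be illustrated with plain arrays. In this sketch the activation is a simplified [OC][H*W] nested array and both function names are hypothetical; it only shows that adding a repeated bias copy and adding the bias by broadcast give the same values.

```javascript
// Legacy-style path: materialise a full-size copy of the bias, then add.
function addBiasRepeat (act, bias) {
  const repeated = act.map((channel, oc) => channel.map(() => bias[oc]))
  return act.map((channel, oc) => channel.map((v, i) => v + repeated[oc][i]))
}

// Broadcast path: add the per-channel value directly, no intermediate copy.
function addBiasBroadcast (act, bias) {
  return act.map((channel, oc) => channel.map(v => v + bias[oc]))
}

const act = [[1, 2, 3, 4], [10, 20, 30, 40]] // 2 output channels, 4 pixels each
const bias = [0.5, -1]
console.log(addBiasBroadcast(act, bias))
```

Both functions perform the same additions on the same operands, so the results match exactly; the broadcast version simply skips allocating the repeated tensor.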

Set OCR_GGML_CRAFT_BIAS_REPEAT=1 to fall back to the legacy ggml_repeat broadcast — an escape hatch to recover without a code change if a backend's broadcast-add ever misbehaves (read once at graph-build time; only the exact value 1 enables it). It does not affect the DocTR pipeline.

CRNN recognizer bias broadcast (OCR_GGML_CRNN_BIAS_REPEAT)

The EasyOCR recognizer applies the same broadcast to its sequence biases: the BiLSTM Linear and the final Prediction add their [F] bias via ggml_add's implicit broadcast over the (T, N) axes, instead of materialising a full [F, T, N] ggml_repeat copy. Numerically identical to the legacy path.

Set OCR_GGML_CRNN_BIAS_REPEAT=1 to fall back to the legacy ggml_repeat broadcast — the recognizer-side counterpart of OCR_GGML_CRAFT_BIAS_REPEAT (read once at graph-build time; only the exact value 1 enables it). It does not affect the DocTR pipeline.

run(input) shape

{
  path: string,                    // JPEG / PNG / BMP file
  options?: {
    paragraph?: boolean,           // merge nearby boxes
    boxMarginMultiplier?: number,  // padding around boxes
    rotationAngles?: number[]      // override defaults for this call
  }
}

Output rows (delivered via response.onUpdate):

type InferredText = [
  [[number, number], [number, number], [number, number], [number, number]],  // 4-point box
  string,                                                                    // text
  number                                                                     // confidence [0..1]
]

This is byte-for-byte the same shape @qvac/ocr-onnx returns.
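
Callers often want named fields and an axis-aligned rectangle rather than the raw tuple. toRecords below is a hypothetical convenience function built only on the InferredText shape shown above; the minConfidence filter is an illustrative extra, not an addon option.

```javascript
// Hypothetical helper: convert [box, text, confidence] rows into objects
// with an axis-aligned bounding rectangle around the 4-point box.
function toRecords (rows, minConfidence = 0) {
  return rows
    .filter(([, , confidence]) => confidence >= minConfidence)
    .map(([box, text, confidence]) => {
      const xs = box.map(point => point[0])
      const ys = box.map(point => point[1])
      return {
        text,
        confidence,
        left: Math.min(...xs),
        top: Math.min(...ys),
        right: Math.max(...xs),
        bottom: Math.max(...ys)
      }
    })
}
```

It could be used inside the callback, for example `response.onUpdate(rows => console.log(toRecords(rows, 0.5)))`.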

Stats (when opts.stats=true)

{
  totalTime: number,        // seconds
  detectionTime: number,    // seconds (CRAFT inference)
  recognitionTime: number,  // seconds (CRNN inference)
  numBoxes: number,         // total boxes (aligned + unaligned)
  backendIsGpu: number      // 1 if inference ran on a GPU (Vulkan/Metal/OpenCL) device, else 0
}
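
A stats object can be condensed into a short breakdown of where the time went. summarizeStats is a hypothetical helper that uses only the fields listed above; the percentages are rounded shares of totalTime.

```javascript
// Hypothetical helper: share of total time spent in detection and recognition.
function summarizeStats (stats) {
  const pct = seconds =>
    stats.totalTime > 0 ? Math.round((seconds / stats.totalTime) * 100) : 0
  return {
    backend: stats.backendIsGpu === 1 ? 'GPU' : 'CPU',
    detectionPct: pct(stats.detectionTime),
    recognitionPct: pct(stats.recognitionTime),
    numBoxes: stats.numBoxes
  }
}
```

For instance, `summarizeStats(await response.await())` would give a compact object suitable for logging, assuming the instance was created with opts.stats set to true.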

Models

The addon consumes GGUF weight files. Each pipeline expects its own detector + recognizer pair:

EasyOCR pipeline (pipelineType: 'easyocr')

| GGUF | Role |
|---|---|
| craft_mlt_25k.gguf / *_q8_0.gguf / *_q4_k.gguf | CRAFT detector |
| english_g2.gguf / *_q8_0.gguf / *_q4_k.gguf | English recognizer (gen-2) |
| latin_g2.gguf | Latin-script recognizer (gen-2; fr/de/it/es/pt/…) |

Use the converter in the upstream tetherto/easy-ocr-ggml repo (scripts/pth_to_gguf.py) to produce these from EasyOCR PyTorch .pth checkpoints.

This first release ships the gen-2 recognizer family only (English / Latin). Other language groups (Arabic, Bengali, Cyrillic, Devanagari, CJK) will land as GGUFs are produced.

Doctr pipeline (pipelineType: 'doctr')

| GGUF | Role |
|---|---|
| db_mobilenet_v3_large.gguf | DBNet detector (MobileNetV3-Large backbone) |
| crnn_mobilenet_v3_small.gguf | doctr recognizer (MobileNetV3-Small backbone) |

Doctr is language-agnostic: it recognises any Latin-script text the underlying CRNN was trained on, so it ignores langList, magRatio and the contrast-retry / rotation knobs.

CI distribution

CI pulls pinned snapshots of both the EasyOCR and Doctr GGUFs from S3 (see .github/workflows/integration-test-ocr-ggml.yml) and exposes them to the integration suite via the OCR_GGML_DETECTOR + OCR_GGML_RECOGNIZER env vars (EasyOCR) and OCR_GGML_DOCTR_DETECTOR + OCR_GGML_DOCTR_RECOGNIZER env vars (Doctr). Both pipelines are exercised end-to-end on every PR.

CLI

A development-time CLI ships at the package root, ocr-ggml-cli, modelled on @qvac/translation-nmtcpp's nmt-cli. It is not included in the npm artifact (same convention as nmt-cli); run it directly from the repository checkout:

# Default: OCR samples/english.png with bundled English weights (easyocr)
bare ocr-ggml-cli

# Doctr pipeline (DBNet detector + doctr recognizer)
bare ocr-ggml-cli --pipeline-type doctr \
                  --detector models/db_mobilenet_v3_large.gguf \
                  --recognizer models/crnn_mobilenet_v3_small.gguf \
                  --image /tmp/photo.jpg

# Detail mode (index + confidence + box per recognised line)
bare ocr-ggml-cli --detail 1

# JSON output (matches EasyOCR Python's readtext shape)
bare ocr-ggml-cli --output-format json | jq .

# Custom image + Q8_0 quantized EasyOCR models
bare ocr-ggml-cli --image /tmp/photo.jpg \
                  --detector models/craft_mlt_25k_q8_0.gguf \
                  --recognizer models/english_g2_q8_0.gguf

# Force a specific CPU thread count, with verbose C++ logs
bare ocr-ggml-cli --n-threads 8 --verbose

# Show help / version
bare ocr-ggml-cli --help
bare ocr-ggml-cli --version

The CLI is functionally equivalent to upstream EasyOcr-ggml's ocr-cli binary — same flag surface (--image, --detector, --recognizer, --lang, --paragraph, --mag-ratio, --detail, --output-format, --n-threads) plus --pipeline-type {easyocr,doctr} for the second pipeline, and the nmt-cli ergonomics (env-var fallbacks OCR_GGML_{IMAGE,DETECTOR,RECOGNIZER,PIPELINE_TYPE}, -h/--help, -v/--version, --verbose for C++ log forwarding). One deliberate omission for v1: --debug-png (annotated overlay) — print boxes via --detail 1 or --output-format json and render externally instead.

Scripts

| Script | Purpose |
|---|---|
| scripts/check_ggml_backends.sh | Probe shipped ggml backends + BLAS/Vulkan/OpenCL paths in prebuilds/ |

Full usage in scripts/README.md. For weight conversion (PyTorch .pth → GGUF), use the upstream converter in tetherto/easy-ocr-ggml.

Testing

npm run lint
npm run test:unit          # JS unit tests (no models required)
npm run test:integration   # end-to-end smoke; soft-skips when models absent
npm run test:cpp           # C++ GoogleTest (BUILD_TESTING=ON)

The integration smoke test reads the following env vars and runs each case only when the corresponding GGUFs are present on disk:

| Env var | Pipeline | Required for which test |
|---|---|---|
| OCR_GGML_DETECTOR | EasyOCR | EasyOCR case |
| OCR_GGML_RECOGNIZER | EasyOCR | EasyOCR case (CI uses latin_g2.gguf) |
| OCR_GGML_DOCTR_DETECTOR | Doctr | Doctr case |
| OCR_GGML_DOCTR_RECOGNIZER | Doctr | Doctr case |
| OCR_GGML_IMAGE | - | overrides the default sample image |
| OCR_GGML_BACKEND | - | manual ggml backend override for the whole suite: cpu, vulkan, metal or opencl (otherwise auto-detected, see below) |

CI sets these automatically; locally you can:

OCR_GGML_DETECTOR=$PWD/models/craft_mlt_25k.gguf \
OCR_GGML_RECOGNIZER=$PWD/models/latin_g2.gguf \
npm run test:integration
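
The soft-skip behaviour can be pictured as a check that both GGUFs for a pipeline are set and exist on disk. runnableCases is a hypothetical model of that rule, not the actual test harness; the exists parameter is injectable so the logic can be tried without real files.

```javascript
const fs = require('fs')

// Model of the soft-skip rule: a case runs only when both of its env vars
// are set and point at files that exist.
function runnableCases (env, exists = fs.existsSync) {
  const present = name => Boolean(env[name]) && exists(env[name])
  return {
    easyocr: present('OCR_GGML_DETECTOR') && present('OCR_GGML_RECOGNIZER'),
    doctr: present('OCR_GGML_DOCTR_DETECTOR') && present('OCR_GGML_DOCTR_RECOGNIZER')
  }
}
```

Calling `runnableCases(process.env)` shows which cases the current shell environment would enable under this model.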

Running the suite on Vulkan (GPU)

The harness auto-detects the backend. When the package ships a ggml-vulkan backend lib in prebuilds/ (as the merged desktop CI prebuilds do), the whole integration suite — every EasyOCR + DocTR case, with the same expected-text / quality assertions as CPU — automatically runs through the ggml Vulkan backend. This means the existing desktop test-<platform>-<arch> integration job exercises Vulkan on the Vulkan-capable GPU runner (e.g. qvac-ubuntu2404-x64-gpu) with no separate CI job.

On a host without a Vulkan-capable GPU (or without the ggml-vulkan backend lib — e.g. local dev with unmerged prebuilds), the suite stays on CPU: when no lib is present it never requests Vulkan, and when the lib is present but no GPU is available the request transparently falls back to CPU. Either way the suite still passes, and the recorded execution_provider reflects the backend actually used (driven by the backendIsGpu stat), not the request.

OCR_GGML_BACKEND remains a manual override that takes precedence over auto-detection — force the GPU path (or force CPU) with:

OCR_GGML_BACKEND=vulkan \
OCR_GGML_DETECTOR=$PWD/models/craft_mlt_25k.gguf \
OCR_GGML_RECOGNIZER=$PWD/models/latin_g2.gguf \
npm run test:integration
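
The precedence between the manual override and auto-detection can be summarised as below. pickSuiteBackend is a hypothetical model of the harness rule described in this section; treating an unrecognised OCR_GGML_BACKEND value as unset is an assumption, since the behaviour for invalid values is not documented here.

```javascript
// Model of the described precedence: explicit override first, then
// auto-detection based on whether a ggml-vulkan backend lib is shipped.
function pickSuiteBackend (env, hasVulkanLib) {
  const allowed = ['cpu', 'vulkan', 'metal', 'opencl']
  if (allowed.includes(env.OCR_GGML_BACKEND)) return env.OCR_GGML_BACKEND
  // Assumption: anything else is treated like an unset variable.
  return hasVulkanLib ? 'vulkan' : 'cpu'
}
```

Note that the value returned here is only the request; as described above, the addon may still fall back to CPU when no Vulkan-capable GPU is present.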

Android Vulkan (mobile suite)

Android is the primary mobile Vulkan target, and the android-arm64 prebuild ships the Vulkan backend lib (libqvac-ggml-vulkan.so). The mobile suite runs on AWS Device Farm (see test/mobile/test-groups.json), where the harness defaults to CPU — so a dedicated test, test/integration/android-vulkan.test.js (runAndroidVulkanTest, in the android → regularB shard), explicitly requests backendDevice: 'vulkan'. It asserts the addon either runs on a Vulkan device or reports an explicit CPU fallback, and — whichever backend is resolved — that the OCR output is correct (an accuracy gate, not just an "it executed" check). The test runs only on Android and is a clean skip on desktop and iOS (iOS has no Vulkan).

Adreno caveat. Adreno Vulkan is numerically broken (cos-sim ~0.73 vs reference on Adreno 830 / Galaxy S25, while Mali / Metal / NVIDIA sit above 0.999 — see vla-ggml). OcrBackendSelection therefore auto-skips Adreno GPUs for Vulkan and falls back to CPU (an explicit gpuDevice index still overrides this to force an Adreno device on purpose). The accuracy gate above is the backstop that catches a numerically-broken Vulkan device that slips through.
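
For readers unfamiliar with the cos-sim figure, cosine similarity between two equal-length vectors can be computed as below. This is a generic sketch; which tensors the project actually compares against the reference is not stated here, so the inputs are plain number arrays.

```javascript
// Cosine similarity: dot(a, b) / (|a| * |b|). 1 means identical direction.
function cosineSimilarity (a, b) {
  if (a.length !== b.length) throw new Error('length mismatch')
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  return dot / Math.sqrt(normA * normB)
}
```

A backend whose outputs score well below the ~0.999 seen on healthy devices would be flagged by a check of this kind.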

Android OpenCL (mobile suite)

OpenCL is Adreno's sound GPU path (the inverse of the Vulkan Adreno guard above), and the android-arm64 prebuild ships the OpenCL backend lib (libqvac-ggml-opencl.so). Two tests exercise it:

  • test/integration/android-opencl.test.js (runAndroidOpenclTest, android → regularB shard) requests backendDevice: 'opencl' on real Device Farm devices and asserts the addon either runs on an OpenCL device or reports an explicit CPU fallback — with a correctness (accuracy) gate either way. Android-only; clean skip on desktop and iOS.
  • test/integration/opencl-backend.test.js (runOpenclBackendTest) covers the desktop opt-in path and skips cleanly on any host that did not ship a libggml-opencl backend lib.

Because the OCR vision ops are now implemented on OpenCL, an Adreno device that ships the OpenCL backend lib resolves 'opencl' to the GPU and runs both pipelines on-device (rather than falling back to CPU).

CPU-vs-Vulkan benchmark

The Benchmark Performance (OCR-GGML) workflow reuses the integration suites, which already record both a Vulkan ([GPU]) and a forced-CPU ([CPU]) pass for each test on a GPU host (runOcrComparison / runDoctrComparison, tagged via the backendIsGpu stat). The shared perf-report aggregator (scripts/perf-report/aggregate.js) pairs those rows per device + test and renders a "CPU → Vulkan Speedup" section (markdown + HTML) showing speedup = CPU mean / Vulkan mean for total / detection / recognition time. The section only appears when a test ran on both backends, so non-GPU runs are unaffected.
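
The speedup formula (CPU mean divided by Vulkan mean, per device and test) can be sketched as follows. This is not the code in scripts/perf-report/aggregate.js; the row field names device and test are hypothetical, while totalTime and backendIsGpu follow the stats shape documented earlier.

```javascript
const mean = values => values.reduce((sum, v) => sum + v, 0) / values.length

// Pair CPU and GPU rows per device + test and compute CPU mean / GPU mean.
// Tests that ran on only one backend are left out of the result.
function speedups (rows) {
  const groups = new Map()
  for (const row of rows) {
    const key = `${row.device}|${row.test}`
    const group = groups.get(key) || { cpu: [], gpu: [] }
    const bucket = row.backendIsGpu === 1 ? group.gpu : group.cpu
    bucket.push(row.totalTime)
    groups.set(key, group)
  }
  const result = {}
  for (const [key, group] of groups) {
    if (group.cpu.length > 0 && group.gpu.length > 0) {
      result[key] = mean(group.cpu) / mean(group.gpu)
    }
  }
  return result
}
```

The same pairing could be repeated with detectionTime or recognitionTime in place of totalTime to get the per-stage columns.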

On mobile, Android attempts a GPU pass per device family: Mali devices (e.g. Pixel) run on Vulkan, while Adreno devices — auto-skipped on Vulkan — run the GPU pass on OpenCL instead, so both families fill the GPU column (the harness probes the device once and picks Vulkan or OpenCL accordingly). To compare output quality (not just speed) across backends, the Python quality benchmark takes a --backend flag:

python benchmarks/quality_eval/benchmark_100.py \
  --pipeline easyocr \
  --detector models/craft_mlt_25k.gguf \
  --recognizer models/latin_g2.gguf \
  --backend vulkan   # cpu (default) | vulkan — falls back to CPU when unavailable

Repository layout

packages/ocr-ggml/
├── package.json             # @qvac/ocr-ggml (bare addon)
├── CMakeLists.txt           # bare_module(ocr-ggml), links ggml + opencv4
├── vcpkg.json               # ggml from qvac-fabric, opencv4, inference-addon-cpp
├── vcpkg-configuration.json
├── vcpkg/                   # custom triplets + toolchains
├── ocr-ggml-cli             # dev-time CLI (mirrors nmt-cli), not shipped to npm
├── binding.js               # require.addon() entry
├── index.js, index.d.ts     # public JS surface (OcrGgml class)
├── ocr-ggml.js              # thin wrapper over the bare binding
├── addonLogging.{js,d.ts}   # setLogger / releaseLogger surface
├── lib/error.js             # QvacErrorAddonOcrGgml + ERR_CODES
├── examples/quickstart.js   # JS code example
├── samples/                 # sample fixture images (english.png, …)
├── scripts/                 # check_ggml_backends.sh diagnostic
├── test/{unit,integration}
└── addon/src/
    ├── js-interface/binding.cpp                  # BARE_MODULE entry
    ├── addon/AddonJs.hpp                         # createInstance / runJob / output handler
    ├── model-interface/
    │   ├── OcrTypes.hpp                          # shared OcrInput/OcrConfig + PipelineMode enum
    │   └── Pipeline.{hpp,cpp}                    # unified IModel adapter (EasyOCR + DocTR via mode)
    ├── ggml/                                     # gguf_loader, ops, craft, crnn, weights (lifted)
    ├── pipeline/                                 # lang, steps, step_* (EasyOCR; lifted)
    ├── easyocr-ggml/                             # headers for the EasyOCR lifted code
    └── doctr-ggml/                               # MobileNetGraph + DBNet/CRNN steps

Provenance

  • C++ pipeline + GGML graph code lifted from tetherto/easy-ocr-ggml (Apache-2.0).
  • Build / addon plumbing modelled on @qvac/translation-nmtcpp (ggml from qvac-fabric, cmake-bare + cmake-vcpkg, inference-addon-cpp base classes).
  • Public JS surface modelled on @qvac/ocr-onnx so callers can swap engines transparently.

License

Apache-2.0 (matches upstream EasyOCR, EasyOcr-ggml, @qvac/ocr-onnx, and @qvac/translation-nmtcpp).