@qvac/ocr-onnx

This library provides Optical Character Recognition (OCR) capabilities for QVAC runtime applications, leveraging the ONNX Runtime for efficient inference.

The library supports two OCR pipelines:

EasyOCR pipeline (default): Uses CRAFT detector + language-specific recognizers.
DocTR pipeline: Uses DBNet detector + CRNN/PARSeq recognizer, matching the OnnxTR Python library.

Supported Platforms
Installation
Building from Source
Usage
Output Format
Glossary
Supported Languages
DocTR Pipeline
Contributing
License
Support

Supported Platforms

Platform	Architecture	Min Version	Status
macOS	arm64, x64	14.0+	Tier 1
iOS	arm64	17.0+	Tier 1
Linux	arm64, x64	Ubuntu 22+	Tier 1
Android	arm64	12+	Tier 1
Windows	x64	10+	Tier 1

Installation

Prerequisites

Install Bare Runtime:

npm install -g bare

Note: Make sure the Bare version is >= 1.19.3. Check this using:

bare -v
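
If a setup script needs to enforce this minimum, the comparison could be sketched as below. `meetsMinimum` is a hypothetical helper, not part of Bare or this package, and it assumes the version is a plain `major.minor.patch` string, optionally prefixed with `v`.

```javascript
// Hypothetical helper: compare a "major.minor.patch" version against a minimum.
// Assumes purely numeric parts, optionally prefixed with "v" (no pre-release tags).
function meetsMinimum (version, minimum) {
  const parse = v => v.trim().replace(/^v/, '').split('.').map(Number)
  const a = parse(version)
  const b = parse(minimum)
  for (let i = 0; i < 3; i++) {
    const x = a[i] || 0
    const y = b[i] || 0
    if (x !== y) return x > y
  }
  return true // versions are equal
}

console.log(meetsMinimum('v1.19.3', '1.19.3')) // true
console.log(meetsMinimum('1.9.12', '1.19.3')) // false
```

Comparing the parts as numbers matters here: a plain string comparison would rank `1.9.12` above `1.19.3`.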

Installing the Package

Install the latest version of the package:

npm install @qvac/ocr-onnx@latest

Building from Source

If you want to build the addon from source (for development or customization), follow these steps:

Prerequisites

Before building, ensure you have the following installed:

vcpkg - Cross-platform C++ package manager

git clone https://github.com/microsoft/vcpkg.git
cd vcpkg && ./bootstrap-vcpkg.sh -disableMetrics
export VCPKG_ROOT=/path/to/vcpkg
export PATH=$VCPKG_ROOT:$PATH

Build tools for your platform:
- Linux: sudo apt install build-essential autoconf automake libtool pkg-config
- macOS: Xcode command line tools
- Windows: Visual Studio with C++ build tools
Node.js and npm (version 18+ recommended)
Bare runtime and build tools:
```
npm install -g bare-runtime bare-make
```

Building the Addon

Clone the repository:

git clone https://github.com/tetherto/qvac.git
cd qvac/packages/ocr-onnx

Install dependencies:
```
npm install
```
Build the addon:
```
npm run build
```

This command will:

Generate CMake build files (bare-make generate)
Build the native addon (bare-make build)
Install the addon to the prebuilds directory (bare-make install)

Verifying the Build

After building, verify everything works by running the registry example:

bare examples/example.registry.js

This example will:

Download the detector and recognizer models from the registry (cached locally)
Load the OCR model
Run text recognition on a test image
Display the detected text with confidence scores

Examples

The examples/ folder contains several examples to help you get started:

Example	Description
`example.registry.js`	Downloads models from registry and runs OCR
`example.fs.js`	Basic OCR using local model files
`exampleGPU.fs.js`	OCR with GPU acceleration enabled
`example.logger.js`	OCR with custom logging
`visualize_ocr.js`	Runs OCR and saves results to JSON for visualization
`draw_boxes.py`	Python script to draw bounding boxes on images using OCR results

Usage

The library provides a straightforward workflow for image-based text recognition. The pipeline is selected via the pipelineMode parameter: 'easyocr' (default) or 'doctr'.

1. Configure Parameters

EasyOCR Mode (default)

const args = {
  params: {
    // Required
    langList: ['en'],                        // Language codes for recognizer selection
    pathDetector: './models/ocr/detector_craft.onnx',
    pathRecognizer: './models/ocr/recognizer_latin.onnx',
    // Or use prefix: pathRecognizerPrefix: './models/ocr/recognizer_',

    // Shared optional
    useGPU: true,                            // Enable GPU/NPU acceleration (falls back to CPU)
    timeout: 120,                            // Max inference time in seconds

    // EasyOCR-specific optional
    magRatio: 1.5,                           // Detection magnification ratio (1.0–2.0)
    defaultRotationAngles: [90, 270],        // Rotation angles to try (use [] to disable)
    contrastRetry: false,                    // Re-process low-confidence regions with adjusted contrast
    lowConfidenceThreshold: 0.4,             // Confidence threshold below which contrast retry triggers
    recognizerBatchSize: 32                  // Text regions per batch (lower = less memory on mobile)
  },
  opts: {
    stats: true                              // Enable performance statistics
  }
}

DocTR Mode

const args = {
  params: {
    pipelineMode: 'doctr',                   // Select DocTR pipeline
    langList: ['en'],                        // Language codes (defaults to ['en'] for DocTR)
    pathDetector: './models/doctr/db_mobilenet_v3_large.onnx',
    pathRecognizer: './models/doctr/crnn_mobilenet_v3_small.onnx',

    // Shared optional
    useGPU: false,                           // Enable GPU/NPU acceleration (falls back to CPU)

    // DocTR-specific optional
    straightenPages: false,                  // Apply perspective transform to straighten text regions
    decodingMethod: 'greedy'                 // 'greedy' (all models) or 'attention' (PARSeq only)
  },
  opts: {
    stats: true                              // Enable performance statistics
  }
}

Shared Parameters (both pipelines)

Parameter	Type	Default	Required	Description
`pipelineMode`	`string`	`'easyocr'`	No	Pipeline to use: `'easyocr'` or `'doctr'`.
`langList`	`string[]`	—	Yes (EasyOCR), optional for DocTR	Language codes (ISO 639-1). In EasyOCR mode, determines the recognizer model. In DocTR mode, defaults to `['en']`.
`pathDetector`	`string`	—	Yes	Path to the detector ONNX model (CRAFT for EasyOCR, DBNet for DocTR).
`pathRecognizer`	`string`	—	Yes	Path to the recognizer ONNX model. In EasyOCR mode, can be omitted if `pathRecognizerPrefix` is provided.
`pathRecognizerPrefix`	`string`	—	No	EasyOCR only. Prefix path for recognizer model; the library appends the language suffix (e.g., `recognizer_latin.onnx`).
`useGPU`	`boolean`	`true`	No	Enable GPU/NPU/TPU acceleration. Falls back to CPU if unavailable.
`timeout`	`number`	`120`	No	Maximum inference time in seconds.
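
The requirements in the table above can be checked before a model is constructed. The following is a minimal sketch under that reading of the table; `checkParams` is a hypothetical helper that is not part of `@qvac/ocr-onnx`, and the library may apply its own validation.

```javascript
// Hypothetical pre-flight check based on the shared parameter table.
// Returns a list of problems; an empty list means the params look valid.
function checkParams (params) {
  const problems = []
  const mode = params.pipelineMode || 'easyocr' // documented default

  if (mode !== 'easyocr' && mode !== 'doctr') {
    problems.push(`unknown pipelineMode: ${mode}`)
  }
  if (!params.pathDetector) {
    problems.push('pathDetector is required')
  }
  // In EasyOCR mode, pathRecognizerPrefix may stand in for pathRecognizer.
  const hasRecognizer = Boolean(params.pathRecognizer) ||
    (mode === 'easyocr' && Boolean(params.pathRecognizerPrefix))
  if (!hasRecognizer) {
    problems.push('pathRecognizer is required')
  }
  // langList is required for EasyOCR; DocTR defaults to ['en'].
  const hasLangs = Array.isArray(params.langList) && params.langList.length > 0
  if (mode === 'easyocr' && !hasLangs) {
    problems.push('langList is required in EasyOCR mode')
  }
  return problems
}

console.log(checkParams({ pipelineMode: 'doctr', pathDetector: 'd.onnx', pathRecognizer: 'r.onnx' })) // []
console.log(checkParams({ langList: ['en'], pathDetector: 'd.onnx' })) // [ 'pathRecognizer is required' ]
```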

EasyOCR-Specific Parameters

Parameter	Type	Default	Description
`magRatio`	`number`	`1.5`	Detection magnification ratio (1.0-2.0). Higher values improve detection of small text but increase processing time.
`defaultRotationAngles`	`number[]`	`[90, 270]`	Rotation angles to try for text detection. Use `[]` to disable rotation.
`contrastRetry`	`boolean`	`false`	Re-process low-confidence regions with adjusted contrast.
`lowConfidenceThreshold`	`number`	`0.4`	Confidence threshold (0-1) below which contrast retry is triggered.
`recognizerBatchSize`	`number`	`32`	Number of text regions processed per batch. Lower values reduce memory on mobile.
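
To see how these options combine, the sketch below merges overrides onto the documented defaults. `easyOcrParams` is a hypothetical helper; the range checks mirror the ranges in the table, and whether the library itself rejects out-of-range values is not stated here. The batch size of 8 is an arbitrary illustration.

```javascript
// Documented EasyOCR defaults, copied from the table above.
const EASYOCR_DEFAULTS = {
  magRatio: 1.5,
  defaultRotationAngles: [90, 270],
  contrastRetry: false,
  lowConfidenceThreshold: 0.4,
  recognizerBatchSize: 32
}

// Hypothetical helper: merge overrides onto the defaults and reject
// values outside the documented ranges.
function easyOcrParams (overrides = {}) {
  const merged = { ...EASYOCR_DEFAULTS, ...overrides }
  if (merged.magRatio < 1.0 || merged.magRatio > 2.0) {
    throw new RangeError('magRatio should be between 1.0 and 2.0')
  }
  if (merged.lowConfidenceThreshold < 0 || merged.lowConfidenceThreshold > 1) {
    throw new RangeError('lowConfidenceThreshold should be between 0 and 1')
  }
  return merged
}

// Example: a memory-conscious preset that skips rotation and uses small batches.
const mobile = easyOcrParams({ defaultRotationAngles: [], recognizerBatchSize: 8 })
console.log(mobile)
```

The result can be spread into `params` next to the required fields such as `langList` and `pathDetector`.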

DocTR-Specific Parameters

Parameter	Type	Default	Description
`straightenPages`	`boolean`	`false`	Apply perspective transform to straighten detected text regions before recognition.
`decodingMethod`	`string`	`'greedy'`	Decoding method: `'greedy'` (all models) or `'attention'` (PARSeq only).
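
Because `'attention'` is documented as PARSeq only, a guard like the following could avoid pairing it with a CRNN recognizer. `pickDecodingMethod` is a hypothetical helper, and it assumes PARSeq model files have "parseq" in their file name, as in the file listed under DocTR Models.

```javascript
// Hypothetical helper: choose a decodingMethod that is valid for the recognizer.
// Assumption: PARSeq model files contain "parseq" in their file name.
function pickDecodingMethod (pathRecognizer, preferred = 'greedy') {
  const fileName = pathRecognizer.split(/[\\/]/).pop().toLowerCase()
  const isParseq = fileName.includes('parseq')
  if (preferred === 'attention' && !isParseq) {
    return 'greedy' // 'attention' is documented as PARSeq only
  }
  return preferred
}

console.log(pickDecodingMethod('./models/doctr/parseq.onnx', 'attention')) // attention
console.log(pickDecodingMethod('./models/doctr/crnn_mobilenet_v3_small.onnx', 'attention')) // greedy
```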

2. Create Model Instance

Import the library and create a new instance with the configured arguments.

const { ONNXOcr } = require('@qvac/ocr-onnx')

const model = new ONNXOcr(args)

3. Load Model

Asynchronously load the ONNX models specified in the parameters.

try {
  await model.load()
  console.log('OCR model loaded successfully.')
} catch (error) {
  console.error('Failed to load OCR model:', error)
}

4. Run OCR

Pass the path to the input image file to the run method. Supported formats: BMP, JPEG, and PNG.

const imagePath = 'path/to/your/image.jpg'

try {
  const response = await model.run({
    path: imagePath,
    options: {
      paragraph: true,           // Group results into paragraphs (default: false)
      rotationAngles: [90, 270], // Override default rotation angles for this run
      boxMarginMultiplier: 1.0   // Adjust bounding box margins
    }
  })
  // ... process the response (see step 5)
} catch (error) {
  console.error('OCR failed:', error)
}

Runtime Options

Option	Type	Default	Description
`paragraph`	`boolean`	`false`	Group detected text regions into paragraphs based on proximity.
`rotationAngles`	`number[]`	Uses `defaultRotationAngles`	Override default rotation angles for this specific run.
`boxMarginMultiplier`	`number`	`1.0`	Multiplier for bounding box margins around detected text.

5. Process Output

The run method returns a QvacResponse object. Use its methods to handle the OCR results as they become available.

// Option 1: Using onUpdate callback
await response
  .onUpdate(data => {
    // data contains OCR results for a chunk or the final result
    console.log('OCR Update:', JSON.stringify(data))
  })
  .await() // Wait for the entire process to complete

// Option 2: Using async iterator (if supported by QvacResponse in the future)
// for await (const data of response.iterate()) {
//   console.log('OCR Chunk:', JSON.stringify(data))
// }

// Access performance stats if enabled
if (response.stats) {
  console.log(`Inference stats: ${JSON.stringify(response.stats)}`)
}

See Output Format for the structure of the results.

6. Release Resources

Unload the model and free up resources when done.

try {
  await model.unload()
  console.log('OCR model unloaded.')
} catch (error) {
  console.error('Failed to unload model:', error)
}
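
One way to make sure the model is always unloaded, even when OCR fails partway through, is to wrap the load and unload calls around a task. This is a minimal sketch: `withModel` is a hypothetical helper, and the stand-in object below only mimics the `load()` and `unload()` methods so that the example runs without model files.

```javascript
// Hypothetical wrapper: load, run a task, and always unload, even if the task throws.
// Works with any object exposing async load() and unload(), such as an ONNXOcr instance.
async function withModel (model, task) {
  await model.load()
  try {
    return await task(model)
  } finally {
    await model.unload()
  }
}

// Demonstration with a stand-in object that records the call order.
const calls = []
const fakeModel = {
  load: async () => { calls.push('load') },
  unload: async () => { calls.push('unload') }
}

const demo = withModel(fakeModel, async () => {
  calls.push('task')
  return 'done'
}).then(result => {
  console.log(result, calls)
  return result
})
```

If `load()` itself rejects, `unload()` is not called in this sketch, since nothing was loaded.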

Output Format

The output is typically received via the onUpdate callback of the QvacResponse object. It's a JSON array where each element represents a detected text block.

Each text block contains:

Bounding Box: An array of four [x, y] coordinate pairs defining the corners of the box around the detected text. Coordinates are clockwise, starting from the top-left relative to the text orientation.
Detected Text: The recognized text string.
Confidence Score: A number indicating the model's confidence in the recognition, typically between 0 and 1.

[ // Array of detected text blocks
  [ // First text block
    [ // Bounding Box
      [x1, y1], // Top-left corner
      [x2, y2], // Top-right corner
      [x3, y3], // Bottom-right corner
      [x4, y4]  // Bottom-left corner
    ],
    "Detected Text String", // Recognized text
    0.95 // Confidence score
  ],
  [ // Second text block
    [ /* Bounding Box */ ],
    "Another piece of text",
    0.88
  ]
  // ... more text blocks
]

Example:

[[
  [
    [10, 10],
    [150, 12],
    [149, 30],
    [9, 28]
  ],
  "Example Text",
  0.85
]]
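
Positional tuples are compact but easy to misread, so it can help to convert them into named fields before further processing. The sketch below assumes the `[box, text, confidence]` layout described above; `toBlocks` and `confident` are hypothetical helpers, the 0.5 cutoff is arbitrary, and the second block is made-up sample data.

```javascript
// Convert the documented [box, text, confidence] tuples into named objects.
function toBlocks (output) {
  return output.map(([box, text, confidence]) => ({ box, text, confidence }))
}

// Hypothetical filter: keep blocks at or above a confidence cutoff.
function confident (blocks, minConfidence = 0.5) {
  return blocks.filter(block => block.confidence >= minConfidence)
}

const output = [
  [[[10, 10], [150, 12], [149, 30], [9, 28]], 'Example Text', 0.85],
  [[[10, 40], [90, 40], [90, 58], [10, 58]], 'Blurry', 0.31] // made-up second block
]

const kept = confident(toBlocks(output))
console.log(kept.map(block => block.text)) // [ 'Example Text' ]
```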

Box coordinates are always listed clockwise, starting from the top-left corner relative to the text itself rather than the image. The position of the first corner therefore indicates how the text is rotated within the image.
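
Given that corner ordering, the rotation can be estimated from the direction of the top edge. This is a minimal sketch: `textAngleDegrees` is a hypothetical helper, and it assumes image coordinates with y increasing downward, which matches the example box above.

```javascript
// Estimate text rotation from the top edge (top-left -> top-right corner).
// Image coordinates are assumed to have y increasing downward, so a positive
// angle means the text is rotated clockwise on screen.
function textAngleDegrees (box) {
  const [[x1, y1], [x2, y2]] = box
  return Math.atan2(y2 - y1, x2 - x1) * 180 / Math.PI
}

const upright = [[0, 0], [10, 0], [10, 5], [0, 5]]
const rotated = [[5, 0], [5, 10], [0, 10], [0, 0]] // same shape turned 90 degrees clockwise

console.log(Math.round(textAngleDegrees(upright))) // 0
console.log(Math.round(textAngleDegrees(rotated))) // 90
```

For the example box shown earlier, the top edge runs from `[10, 10]` to `[150, 12]`, which gives an angle of a little under one degree.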

(Note: The exact structure and timing of updates might depend on internal buffering and the paragraph option.)

Glossary

Bare – Small and modular JavaScript runtime for desktop and mobile.
QVAC – Our open-source AI SDK for building decentralized AI applications.
ONNX – Open Neural Network Exchange, an open format for representing machine learning models.

Supported Languages

Language support is determined by the recognizer model used. Each recognizer model supports a specific set of languages. The library automatically selects the appropriate model based on the langList parameter.

Recognizer Model	Languages
`recognizer_latin.onnx`	af, az, bs, cs, cy, da, de, en, es, et, fr, ga, hr, hu, id, is, it, ku, la, lt, lv, mi, ms, mt, nl, no, oc, pi, pl, pt, ro, rs_latin, sk, sl, sq, sv, sw, tl, tr, uz, vi
`recognizer_arabic.onnx`	ar, fa, ug, ur
`recognizer_cyrillic.onnx`	ru, rs_cyrillic, be, bg, uk, mn, abq, ady, kbd, ava, dar, inh, che, lbe, lez, tab, tjk
`recognizer_devanagari.onnx`	hi, mr, ne, bh, mai, ang, bho, mah, sck, new, gom, sa, bgc
`recognizer_bengali.onnx`	bn, as, mni
`recognizer_thai.onnx`	th
`recognizer_zh_sim.onnx`	ch_sim
`recognizer_zh_tra.onnx`	ch_tra
`recognizer_japanese.onnx`	ja
`recognizer_korean.onnx`	ko
`recognizer_tamil.onnx`	ta
`recognizer_telugu.onnx`	te
`recognizer_kannada.onnx`	kn

See supportedLanguages.js for the complete language definitions.

DocTR Pipeline

The DocTR pipeline (pipelineMode: 'doctr') provides an alternative OCR engine based on the OnnxTR project. It uses DBNet for text detection and CRNN or PARSeq for text recognition.

DocTR Models

Model	Type	Description
`db_resnet50.onnx`	Detector	DBNet with ResNet50 backbone (higher accuracy)
`db_mobilenet_v3_large.onnx`	Detector	DBNet with MobileNetV3 backbone (faster, mobile-friendly)
`parseq.onnx`	Recognizer	PARSeq attention-based recognizer (supports `attention` decoding)
`crnn_mobilenet_v3_small.onnx`	Recognizer	CRNN with MobileNetV3 backbone (faster, mobile-friendly, `greedy` decoding only)
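
The models above can be paired into presets. This is a minimal sketch: `doctrParams` and the profile names are hypothetical, pairing PARSeq with the ResNet50 detector in the larger preset is this sketch's choice rather than a documented recommendation, and the default directory mirrors the paths used in the DocTR Mode example.

```javascript
// Hypothetical presets pairing the DocTR models listed above.
const DOCTR_PROFILES = {
  mobile: { detector: 'db_mobilenet_v3_large.onnx', recognizer: 'crnn_mobilenet_v3_small.onnx' },
  full: { detector: 'db_resnet50.onnx', recognizer: 'parseq.onnx' }
}

// Build DocTR params for a profile. modelDir is wherever the .onnx files live.
function doctrParams (profile, modelDir = './models/doctr') {
  if (!Object.prototype.hasOwnProperty.call(DOCTR_PROFILES, profile)) {
    throw new Error(`unknown profile: ${profile}`)
  }
  const models = DOCTR_PROFILES[profile]
  return {
    pipelineMode: 'doctr',
    pathDetector: `${modelDir}/${models.detector}`,
    pathRecognizer: `${modelDir}/${models.recognizer}`,
    decodingMethod: 'greedy' // documented as valid for all models
  }
}

console.log(doctrParams('mobile'))
```

The returned object can be passed as `params`; with the `full` preset, `decodingMethod` could be switched to `'attention'` because that preset uses PARSeq.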

EasyOCR vs DocTR

Feature	EasyOCR	DocTR
`pipelineMode`	`'easyocr'` (default)	`'doctr'`
Detector	CRAFT	DBNet (ResNet50 / MobileNetV3)
Recognizer	Language-specific CRNN models	PARSeq or CRNN (single model)
Language support	50+ languages via separate models	French vocab (126 chars)
Mobile	Supported	Supported (MobileNet models recommended)
Perspective correction	N/A	`straightenPages` option
Image resizing	Auto-resize to 1200px	Full resolution (no resize)

Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

License

This project is licensed under the Apache-2.0 License - see the LICENSE file for details.

Support

For questions, bug reports, or feature requests, please open an issue on GitHub.

@qvac/ocr-onnx

0.7.0
