@qvac/classification-ggml

Version 0.7.0

GGML-powered image classification addon for QVAC. It runs a fine-tuned MobileNetV3-Small 3-class triage CNN on the CPU backend of libggml and exposes a small, stable JavaScript API. The bundled model is tuned for one specific image-triage task (food / report / other), but the addon can be adapted to other classification tasks by converting a new fine-tune.

| Property | Value |
|---|---|
| Model | MobileNetV3-Small (3 classes) |
| Parameters | ~2.5 M |
| Weights | FP16 GGUF, 2.94 MB, bundled in this package |
| Input | JPEG, PNG, or raw RGB bytes |
| Resize target | 224 × 224 (bilinear) |
| Normalization | ImageNet mean/std |
| Backend | libggml CPU (no GPU dependency) |
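The Normalization row refers to the usual ImageNet per-channel statistics. The addon does this step in C++; the sketch below only illustrates it, assuming the standard torchvision mean/std values, inputs scaled to [0, 1] first, and a planar output layout with x varying fastest (matching a GGML WHCN tensor). The helper name toNormalizedPlanar is hypothetical.

```javascript
// Hypothetical helper, not exported by @qvac/classification-ggml.
// Converts tightly packed RGB bytes (interleaved, row-major) into a planar
// Float32Array normalised with the standard ImageNet mean/std (an assumption).
const IMAGENET_MEAN = [0.485, 0.456, 0.406]
const IMAGENET_STD = [0.229, 0.224, 0.225]

function toNormalizedPlanar (rgb, width, height) {
  if (rgb.length !== width * height * 3) {
    throw new RangeError(`expected ${width * height * 3} bytes, got ${rgb.length}`)
  }
  const plane = width * height
  const out = new Float32Array(3 * plane)
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      const src = (y * width + x) * 3 // interleaved: R, G, B per pixel
      for (let c = 0; c < 3; c++) {
        const v = rgb[src + c] / 255 // scale byte to [0, 1]
        // Planar layout: x fastest, then y, then channel.
        out[c * plane + y * width + x] = (v - IMAGENET_MEAN[c]) / IMAGENET_STD[c]
      }
    }
  }
  return out
}
```

In the real pipeline this runs on the 224 × 224 resized image, so the output always holds 3 × 224 × 224 floats.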

Package name: @qvac/classification-ggml
Directory: packages/classification-ggml

Install

Install from npm with npm install @qvac/classification-ggml. The addon is published to the @qvac scope and consumed like any other QVAC native addon. When used from the monorepo, npm install resolves the @qvac/infer-base and @qvac/logging dependencies via the workspace.

Quickstart

```javascript
const fs = require('fs') // Node.js; under Bare, bare-fs provides readFileSync
const ImageClassifier = require('@qvac/classification-ggml')

async function main () {
  const classifier = new ImageClassifier()
  await classifier.load()

  const imageBuffer = fs.readFileSync('./my-image.jpg')
  const result = await classifier.classify(imageBuffer)
  // [ { label: 'food',   confidence: 0.93 },
  //   { label: 'other',  confidence: 0.05 },
  //   { label: 'report', confidence: 0.02 } ]

  await classifier.unload()
}

main()
```

Raw RGB input

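For pixels you have already decoded, pass tightly packed RGB bytes together with their dimensions:

```javascript
// rgbBuffer.length must equal width * height * channels (320 * 240 * 3)
const result = await classifier.classify(rgbBuffer, {
  width: 320,
  height: 240,
  channels: 3,
})
```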
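The addon validates raw input itself and throws a structured error on a mismatch. If you want your own error message before crossing the native boundary, a caller-side pre-check could look like this sketch; checkRawRgb is a hypothetical helper that simply mirrors the documented constraints (positive integer width/height, channels exactly 3, byte length equal to width × height × channels).

```javascript
// Hypothetical caller-side pre-check (not part of the addon's API).
function checkRawRgb (buffer, { width, height, channels }) {
  for (const [name, v] of [['width', width], ['height', height]]) {
    if (!Number.isInteger(v) || v <= 0) {
      throw new TypeError(`${name} must be a positive integer, got ${v}`)
    }
  }
  if (channels !== 3) {
    throw new TypeError(`channels must be 3, got ${channels}`)
  }
  const expected = width * height * channels
  if (buffer.byteLength !== expected) {
    throw new RangeError(`expected ${expected} bytes, got ${buffer.byteLength}`)
  }
  return expected // number of bytes the addon will read
}
```

For the 320 × 240 example, checkRawRgb returns 230400.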

topK filter

By default classify() returns one entry per class, sorted from most likely to least likely. Pass topK: N to keep only the top N results — for example topK: 1 returns just the single highest-scoring class:

```javascript
const best = await classifier.classify(buf, { topK: 1 })
```
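Conceptually, topK is a descending sort followed by a slice. The sketch below reproduces the documented semantics on a plain array (positive integer required, values at or above the class count are a no-op); applyTopK is a hypothetical name and does not call the addon.

```javascript
// Sketch of the documented topK behaviour on an already-computed result list.
function applyTopK (results, topK) {
  // Copy before sorting so the caller's array is not mutated.
  const sorted = [...results].sort((a, b) => b.confidence - a.confidence)
  if (topK === undefined) return sorted // default: one entry per class
  if (!Number.isInteger(topK) || topK <= 0) {
    throw new TypeError(`topK must be a positive integer, got ${topK}`)
  }
  return sorted.slice(0, topK) // topK >= length keeps everything
}
```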

API

| Method | Description |
|---|---|
| new ImageClassifier(opts?) | opts = { modelPath?, logger?, nativeLogger? } |
| await load() | Initialises the GGML backend and loads weights. Idempotent. |
| await classify(buffer, options?) | Runs inference. Returns [{ label, confidence }, …] sorted descending. |
| await unload() | Releases native resources. Safe to call again. |
| await destroy() | Releases resources and marks the instance as destroyed. |
| getState() | Returns { configLoaded, destroyed }. |

See index.d.ts for the full TypeScript surface.

Parameters

new ImageClassifier(opts?)

All constructor options are optional.

| Option | Type | Default | Description |
|---|---|---|---|
| modelPath | string | bundled weights/mobilenetv3_3class_v3_fp16.gguf | Absolute path to an FP16 GGUF file. Override only when pointing at a custom fine-tune produced by the ONNX→GGUF conversion guide. Can also be set via the QVAC_CLASSIFICATION_MODEL_PATH environment variable. |
| logger | QvacLogger-shaped object | null | A sink with optional error / warn / info / debug(msg) methods (compatible with @qvac/logging). Receives a JS-side info message after a successful load() and an error message after a failed load(). With nativeLogger: true, it also receives forwarded native LogMsg events at info level. Always honoured, regardless of nativeLogger. |
| nativeLogger | boolean | false | When true, native C++ QLOG(...) lines from the addon's model-loading and graph code are forwarded to logger. Disabled by default because the underlying qvac-lib-inference-addon-cpp logger is a process-wide singleton with a static uv_async_t that is not safe across rapid create/destroy cycles (e.g. in tests). |
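Any object with the optional error / warn / info / debug(msg) methods listed in the logger row satisfies the option. As a sketch, an in-memory sink that records messages (handy in tests) could look like this; createMemoryLogger is a hypothetical helper, not part of @qvac/logging.

```javascript
// Minimal QvacLogger-shaped sink: records every message with its level.
function createMemoryLogger () {
  const entries = []
  const make = (level) => (msg) => {
    entries.push({ level, msg: String(msg) })
  }
  return {
    entries, // inspect after load() to see what the addon reported
    error: make('error'),
    warn: make('warn'),
    info: make('info'),
    debug: make('debug')
  }
}
```

Usage: new ImageClassifier({ logger: createMemoryLogger() }), leaving nativeLogger at its default of false when instances are created and destroyed rapidly.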

await classify(imageInput, options?)

| Parameter | Type | Default | Description |
|---|---|---|---|
| imageInput (required) | Buffer or Uint8Array | none | Encoded JPEG/PNG bytes, or raw RGB bytes when width, height, and channels are set. |
| options.topK | number | undefined (all classes) | If set, the returned array is truncated to this many entries (the top-K highest confidences). Must be a positive integer. A value ≥ the class count is a no-op. |
| options.width | number | none | Required for raw RGB input. Integer > 0. The buffer must be exactly width × height × channels bytes; any mismatch throws a structured error. |
| options.height | number | none | Required for raw RGB input. Integer > 0. |
| options.channels | 3 | none | Required for raw RGB input. Must be exactly 3. Grayscale and RGBA are not supported; decode or drop the alpha channel on the caller side. |

Returns Promise<ClassificationResult[]> where each entry is { label: string; confidence: number }. The array is sorted by confidence descending, confidences are softmax probabilities in [0, 1] summing to ≈ 1, and label comes from the loaded GGUF's mobilenet.class_N metadata (so a future fine-tune can introduce new label strings without a code change).
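Since RGBA is rejected, RGBA pixels have to be reduced to RGB before calling classify(). A minimal sketch, assuming tightly packed RGBA bytes; rgbaToRgb is a hypothetical helper, and it simply discards alpha rather than compositing it onto a background.

```javascript
// Hypothetical caller-side helper: drops the alpha byte of every pixel.
// Alpha is discarded, not blended, so transparent pixels keep their colour.
function rgbaToRgb (rgba, width, height) {
  const pixels = width * height
  if (rgba.length !== pixels * 4) {
    throw new RangeError(`expected ${pixels * 4} bytes, got ${rgba.length}`)
  }
  const rgb = new Uint8Array(pixels * 3)
  for (let i = 0, j = 0; i < rgba.length; i += 4, j += 3) {
    rgb[j] = rgba[i] // R
    rgb[j + 1] = rgba[i + 1] // G
    rgb[j + 2] = rgba[i + 2] // B
  }
  return rgb
}
```

The result can be passed straight to classify(rgb, { width, height, channels: 3 }), since a Uint8Array is an accepted input type.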

await load() / await unload() / await destroy()

None of these take arguments. load() is idempotent: calling it a second time is a no-op (check getState().configLoaded if you want to verify that the model is loaded). unload() tears down the native handle and may safely be called multiple times. destroy() is equivalent to unload() plus a sticky destroyed flag in getState(), which is useful if your code needs to refuse reuse of a released instance.
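When unit-testing code that manages a classifier, a native-free test double that follows the lifecycle described above can stand in for the real addon. This is a hypothetical sketch: FakeClassifier runs no inference, and resetting configLoaded on unload() is an assumption, since the README does not state it.

```javascript
// Hypothetical test double mimicking the documented lifecycle only.
class FakeClassifier {
  constructor () {
    this._loaded = false
    this._destroyed = false
  }

  async load () {
    this._loaded = true // idempotent: a second call changes nothing
  }

  async unload () {
    this._loaded = false // assumed to reset; safe to call repeatedly
  }

  async destroy () {
    await this.unload()
    this._destroyed = true // sticky flag, never cleared
  }

  getState () {
    return { configLoaded: this._loaded, destroyed: this._destroyed }
  }
}
```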

Output contract

  • An array of { label: string, confidence: number }.
  • Sorted by confidence descending.
  • confidence values are softmax probabilities in [0, 1] and sum to ≈ 1.
  • Labels come from the GGUF metadata (mobilenet.class_0/1/2). For the bundled weights these are food, report, other.
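The contract above can be reproduced from three raw logits with a numerically stable softmax. This sketch assumes mobilenet.class_0, class_1, and class_2 map to food, report, and other in that order; toResults is a hypothetical name, and the addon computes this natively.

```javascript
// Sketch: logits -> softmax probabilities -> labelled, sorted results.
const LABELS = ['food', 'report', 'other'] // assumed class_0..class_2 order

function toResults (logits, labels = LABELS) {
  const max = Math.max(...logits) // subtracting the max avoids exp overflow
  const exps = logits.map((z) => Math.exp(z - max))
  const sum = exps.reduce((a, b) => a + b, 0)
  return exps
    .map((e, i) => ({ label: labels[i], confidence: e / sum }))
    .sort((a, b) => b.confidence - a.confidence) // descending
}
```

For logits [2, 0, 1] this yields food, other, report, with confidences summing to 1.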

Build (from source, monorepo)

Prerequisites: clang (LLVM ≥ 19) with matching libc++-dev, vcpkg, bare ≥ 1.24, bare-make. CI pins the exact LLVM major via the shared setup-llvm action; locally any recent clang works.

```sh
cd packages/classification-ggml
npm install
bare-make generate
bare-make build
bare-make install
```

One-liner: npm install && bare-make generate && bare-make build && bare-make install.

Testing

```sh
npm run test:integration     # brittle + bare JS integration tests (desktop)
npm run test:cpp             # GoogleTest C++ unit tests
npm run test:mobile:generate # regenerate test/mobile/integration.auto.cjs
npm run test:mobile:validate # verify mobile test file structure
```

Integration tests live in test/integration/*.test.js and use the 6 sample images under test/images/ (two images per class).

Mobile tests

Mobile tests use the shared qvac-test-addon-mobile framework. The test/mobile/integration.auto.cjs file is auto-generated by scripts/generate-mobile-integration-tests.js from every *.test.js under test/integration/, so adding a new integration test automatically exposes it on mobile too.

Before the mobile harness can be built, run

```sh
npm run mobile:copy-prebuilds
```

to populate test/mobile/testAssets/ (driven by scripts/copy-mobile-test-assets.js). The script:

  • fans out the single arm64 prebuild into the per-flavour directories the framework expects under prebuilds/;
  • copies the FP16 GGUF weights with a .gguf.bin suffix so the React Native bundler treats them as a binary asset;
  • copies every test/images/*.{jpg,jpeg,png} into testAssets/ so the integration tests can resolve them via global.assetPaths on-device.

None of these copied files are checked into git. See test/mobile/README.md for the lifecycle note about the shared native logger.
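The weight and image copy rules can be expressed as a pure planning function, which makes them easy to check without touching the filesystem. This is a hypothetical sketch, not the contents of scripts/copy-mobile-test-assets.js; it omits the prebuild fan-out and uses POSIX path joining for deterministic output.

```javascript
// Hypothetical sketch: plan which files land in testAssets/ and under what name.
const path = require('path').posix

const IMAGE_EXTS = new Set(['.jpg', '.jpeg', '.png']) // mirrors *.{jpg,jpeg,png}

function planAssetCopies (files, destDir) {
  const plan = []
  for (const file of files) {
    const base = path.basename(file)
    const ext = path.extname(base)
    if (ext === '.gguf') {
      // .bin suffix so the React Native bundler treats the weights as a binary asset
      plan.push({ from: file, to: path.join(destDir, base + '.bin') })
    } else if (IMAGE_EXTS.has(ext)) {
      plan.push({ from: file, to: path.join(destDir, base) })
    }
  }
  return plan
}
```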

Platform support

| Platform | CPU | Notes |
|---|---|---|
| Linux x64 | ✅ | |
| Linux arm64 | ✅ | |
| macOS arm64 (Apple) | ✅ | |
| macOS x64 (Intel) | ✅ | |
| Windows x64 | ✅ | |
| Android arm64 | ✅ | c++_shared STL |
| iOS arm64 | ✅ | |

All platforms are produced by the shared reusable-prebuilds.yml matrix and merged into a single prebuilds artifact for downstream consumption. GPU (Vulkan / Metal / CUDA) is not currently supported.

Performance

Depending on the platform, one call to classifier.classify(buffer) takes from a few tens to a couple of hundred milliseconds.

What affects classify() latency

  • CPU thread pool: libggml sizes its internal CPU worker pool to std::thread::hardware_concurrency() on every platform. The addon does not expose a tuning knob for this; if you need one, open an issue and we can add it.
  • Input size: the JPEG/PNG decode and the stb_image_resize2 bilinear pass scale with source pixel count. The 224×224 tensor pass is fixed-cost, so a 12 MP phone photo adds real overhead compared with a 640×480 webcam frame.
  • First-call overhead: load() already runs a full-pipeline warmup (a synthetic-pattern pass through preprocessing, GGML compute, and output read) before returning, so the GGML compute buffers, weight buffer, and worker thread are fully materialised when the first classify() is dispatched. Even so, the first user-supplied call is typically a few tens of milliseconds slower than the steady-state average.
  • Reuse: load() once, classify() many times. Tearing down and rebuilding the model for each image is roughly 4–6× slower end-to-end and is never necessary outside of tests.
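To see these effects on your own hardware, time the first call separately from the steady state. A minimal sketch, assuming a global performance.now() (Node.js 16+ and browsers); measureLatency is a hypothetical helper, and you would pass something like () => classifier.classify(buf) after a single load().

```javascript
// Hypothetical benchmark helper: one timed first call, then `runs` timed calls.
async function measureLatency (task, runs = 20) {
  const t0 = performance.now()
  await task()
  const firstMs = performance.now() - t0 // includes any first-call overhead

  const samples = []
  for (let i = 0; i < runs; i++) {
    const start = performance.now()
    await task()
    samples.push(performance.now() - start)
  }
  samples.sort((a, b) => a - b)
  const meanMs = samples.reduce((a, b) => a + b, 0) / samples.length
  const medianMs = samples[Math.floor(samples.length / 2)] // upper median for even counts
  return { firstMs, meanMs, medianMs }
}
```

Feeding it the same image at different source resolutions isolates the decode and resize cost from the fixed 224×224 compute.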

Memory footprint

| Component | Size |
|---|---|
| Bundled FP16 weights (mmapped) | 2.94 MB |
| Backend weight buffer (FP16 + folded BN + FP32 classifier) | ≈ 5.5 MB |
| Intermediate activations (compute buffer) | single-digit MB |
| Total resident during inference | ~8–10 MB |

All GGML compute buffers (input tensor, intermediate activations, output) are allocated once at load() time and reused on every classify() call — ggml_backend_tensor_set / _get are the only operations that touch them per request. Per-call C++ allocations are bounded: one input-buffer copy across the bare-runtime boundary, the decoded RGB buffer, the resized 224×224 RGB buffer, the WHCN F32 tensor, and the 3-element softmax + result vectors. Multiple ImageClassifier instances each keep their own compute buffer and worker thread — you pay the ~8 MB once per instance.
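The fixed-shape per-call buffers are small, while the decoded source buffer grows with the input image. The arithmetic below makes that concrete; the 4000 × 3000 resolution is an assumed example of a 12 MP photo.

```javascript
// Back-of-the-envelope sizes of the per-call buffers listed above.
const SIDE = 224
const CHANNELS = 3

const resizedRgbBytes = SIDE * SIDE * CHANNELS // one byte per channel
const inputTensorBytes = SIDE * SIDE * CHANNELS * Float32Array.BYTES_PER_ELEMENT
const decodedBytes = (w, h) => w * h * CHANNELS // depends on the source image

console.log(resizedRgbBytes) // 150528
console.log(inputTensorBytes) // 602112
console.log(decodedBytes(4000, 3000)) // 36000000
```

The two fixed buffers stay under 1 MB combined, whereas decoding a 12 MP photo transiently needs about 36 MB on top of the resident footprint.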

Why FP16 weights?

FP16 was chosen because it matches FP32 top-1 accuracy on the internal validation set while halving the on-disk footprint (≈3 MB vs ≈6 MB) and giving a measurable inference speed-up on every CPU backend we ship. More aggressive quantizations (Q8_0, Q4_K and below) were evaluated on the same validation set and showed noticeable accuracy degradation, which for a 3-class triage model is not acceptable. If you fine-tune your own MobileNetV3-Small, keep FP16 as the publish format unless you re-run the full validation suite at the lower precision.

Measuring locally

The integration suite hooks the shared scripts/test-utils/performance-reporter.js via test/integration/utils.js. Running

```sh
npm run test:integration
```

writes test/results/performance-report.json with one total_time_ms entry per sample image, and in GitHub Actions also emits a Markdown step summary.

Architecture

See [docs/architecture.md](docs/architecture.md) for the MobileNetV3-Small layer breakdown and graph construction notes, and [docs/data-flow.md](docs/data-flow.md) for the end-to-end request flow.

Why a custom GGML graph?

llama.cpp does not support CNN architectures, so this addon bypasses it entirely and talks directly to the stable public ggml_* / ggml_backend_* API.

For this MobileNetV3-Small the GGML CPU backend is, in most configurations, slower per call than the same network running on a mature PyTorch or ONNX Runtime build with their hand-tuned convolution kernels. Because the model is very small (≈2.5 M params, single-digit-millisecond compute on a modern phone), the absolute gap is negligible for a triage workload and is dominated by image decode and JS↔native marshalling. If a substantially larger classifier is ever added on top of this same scaffolding, expect to invest extra effort in graph-level optimisations (operator fusion, matmul tiling, FP16 SIMD kernels, threadpool sizing) before the GGML path is competitive.

Converting a new model

If you fine-tune or swap the underlying MobileNetV3 model, follow [docs/onnx-to-gguf-conversion.md](docs/onnx-to-gguf-conversion.md). The graph construction is parameterised by BLOCKS in MobileNetGraph.hpp — only classes and weights change between fine-tunes.
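For orientation, the sketch below writes out the bottleneck configuration of MobileNetV3-Small as used by torchvision's mobilenet_v3_small and walks it to the final feature-map shape. Whether BLOCKS in MobileNetGraph.hpp stores exactly these fields, in this order, is an assumption; the table and finalShape are illustrative only.

```javascript
// Assumed MobileNetV3-Small bottleneck table (torchvision configuration).
// [kernel, expandedChannels, outChannels, useSE, activation, stride]
const BLOCKS = [
  [3, 16, 16, true, 'RE', 2],
  [3, 72, 24, false, 'RE', 2],
  [3, 88, 24, false, 'RE', 1],
  [5, 96, 40, true, 'HS', 2],
  [5, 240, 40, true, 'HS', 1],
  [5, 240, 40, true, 'HS', 1],
  [5, 120, 48, true, 'HS', 1],
  [5, 144, 48, true, 'HS', 1],
  [5, 288, 96, true, 'HS', 2],
  [5, 576, 96, true, 'HS', 1],
  [5, 576, 96, true, 'HS', 1]
]

// With (kernel - 1) / 2 padding, each conv maps side -> ceil(side / stride).
function finalShape (inputSide = 224) {
  let side = Math.ceil(inputSide / 2) // 3x3 stride-2 stem: 224x224x3 -> 112x112x16
  let channels = 16
  for (const [, , out, , , stride] of BLOCKS) {
    side = Math.ceil(side / stride)
    channels = out
  }
  return { side, channels }
}
```

finalShape() returns { side: 7, channels: 96 }; only the final classifier's output width depends on the number of classes.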

Troubleshooting

  • “MobileNet GGUF weights not found”: the default path is <package>/weights/mobilenetv3_3class_v3_fp16.gguf. Override with new ImageClassifier({ modelPath: '/abs/path.gguf' }) or set the QVAC_CLASSIFICATION_MODEL_PATH env variable.
  • All predictions look wrong: verify the BN epsilon is still 0.001 (see the guarded unit test) — the architecture is unusually sensitive to this constant.
  • Build fails looking for stb_image.h: make sure the stb vcpkg port is installed. The vcpkg-configuration.json pins it.
  • Mobile build fails looking for libggml-cpu: the prebuild workflow copies all ggml::${_backend} targets into prebuilds/. Re-run bare-make install.
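The epsilon sensitivity is easiest to see in the standard BatchNorm fold, which bakes each BN layer into the preceding convolution (the memory table lists folded BN in the backend weight buffer). This sketch assumes the textbook per-channel formula; foldBatchNorm and its parameter names are hypothetical, not the addon's C++ code.

```javascript
// Standard BatchNorm fold for one output channel:
//   scale = gamma / sqrt(variance + eps)
//   w' = w * scale,  b' = beta + (b - mean) * scale
// eps defaults to 0.001, the value the troubleshooting note says must hold.
function foldBatchNorm ({ weights, bias = 0, gamma, beta, mean, variance, eps = 0.001 }) {
  const scale = gamma / Math.sqrt(variance + eps)
  return {
    weights: weights.map((w) => w * scale),
    bias: beta + (bias - mean) * scale
  }
}
```

For a channel whose running variance is near zero, eps dominates the denominator: folding with 1e-5 instead of 0.001 makes that channel's weights 10× larger, which is enough to scramble predictions.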

License

Apache-2.0. See [LICENSE](LICENSE) and [NOTICE](NOTICE).