How It Works

Overview

BiomAPI receives a biometry file, routes it to the appropriate extraction engine, returns structured data in a standardized response, and optionally encrypts and stores it under a BiomPIN for secure sharing. Every successful response follows the same StandardAPIResponse shape regardless of which engine processed the file.

File Routing

When you POST /api/v1/biom/process, the server inspects the file extension and selects one of two engines:

Extension	Engine	What happens
PDF, JPG, PNG, GIF, BMP	BiomAI	LLM extraction via Gemini (deployment-managed or BYOK AI Studio)
JSON	BiomJSON	Schema validation + metadata preservation

Routing depends only on the file extension. The server also runs lightweight signature checks on PDF and image uploads, and rejects unsupported extensions, or content that does not match its extension, with 400.
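The routing rule can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the server's code: `select_engine` and `UnsupportedUpload` are hypothetical names, and the signature table uses the well-known leading bytes of each format, which may differ from the checks the server actually performs.

```python
from pathlib import Path

# Extension -> engine, following the routing table above.
ENGINE_BY_EXTENSION = {
    ".pdf": "BiomAI", ".jpg": "BiomAI", ".png": "BiomAI",
    ".gif": "BiomAI", ".bmp": "BiomAI", ".json": "BiomJSON",
}

# Well-known leading bytes ("magic numbers") of each binary format.
SIGNATURES = {
    ".pdf": (b"%PDF-",),
    ".jpg": (b"\xff\xd8\xff",),
    ".png": (b"\x89PNG\r\n\x1a\n",),
    ".gif": (b"GIF87a", b"GIF89a"),
    ".bmp": (b"BM",),
}


class UnsupportedUpload(ValueError):
    """Hypothetical error that an HTTP layer would map to a 400 response."""


def select_engine(filename: str, content: bytes) -> str:
    """Pick an engine by extension, then sanity-check the leading bytes."""
    ext = Path(filename).suffix.lower()
    engine = ENGINE_BY_EXTENSION.get(ext)
    if engine is None:
        raise UnsupportedUpload(f"unsupported extension: {ext or '(none)'}")
    signatures = SIGNATURES.get(ext)  # JSON has no binary signature
    if signatures and not content.startswith(signatures):
        raise UnsupportedUpload(f"content does not look like {ext}")
    return engine
```

For example, `select_engine("scan.pdf", b"%PDF-1.7")` selects BiomAI, while a PNG renamed to `scan.pdf` is rejected.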

The Three Processing Paths

BiomAI — LLM Extraction

BiomAI transmits the file bytes directly to a Google Gemini model alongside an extraction prompt. Gemini responds with a structured BiometryReport JSON. The server validates this through Pydantic and wraps it in a StandardAPIResponse.

Backends:

Deployment-managed: BiomAPI chooses the server-managed Gemini/Vertex credential for the request.
BYOK: pass X-Gemini-API-Key to use your own Gemini key. BYOK always uses AI Studio, overrides deployment-managed routing for that request, and tracks usage under the biomai_byok bucket.

Credential routing is exact: BYOK failures do not fall back to server-managed credentials.
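A minimal sketch of the credential choice follows, assuming the decision depends only on whether the X-Gemini-API-Key header is present and non-empty. The function name `resolve_credential` is hypothetical; the bucket names come from the Rate Limiting section.

```python
from typing import Mapping, Optional, Tuple


def resolve_credential(headers: Mapping[str, str]) -> Tuple[str, Optional[str]]:
    """Return (rate-limit bucket, user key or None for the managed credential)."""
    # HTTP header names are case-insensitive, so normalize before the lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    byok_key = normalized.get("x-gemini-api-key")
    if byok_key:
        # BYOK overrides deployment-managed routing for this request.
        # There is deliberately no fallback if this key later fails.
        return "biomai_byok", byok_key
    return "biomai", None
```

Because the result is decided once per request, a failing BYOK key surfaces as an error instead of silently consuming server-managed quota.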

Other behavior:

If Gemini returns 429 or 503 (overload), the server retries up to 3 times with exponential backoff and jitter before returning an error to the caller.
A hard server-side timeout is applied (default 30 s); a request that exceeds it returns 504.
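The retry behavior can be sketched as follows. This is a generic backoff loop, not the server's implementation: `UpstreamOverloaded` is a hypothetical stand-in for a 429 or 503 from Gemini, the 1-second base delay and the "full jitter" strategy are assumptions, and "up to 3 times" is read here as three retries after the initial attempt.

```python
import random
import time


class UpstreamOverloaded(Exception):
    """Hypothetical stand-in for a 429 or 503 response from the model backend."""


def call_with_retries(call, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Run call(); on overload, retry up to max_retries times with backoff."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except UpstreamOverloaded:
            if attempt == max_retries:
                raise  # retries exhausted: surface the error to the caller
            # Exponential ceiling (1x, 2x, 4x base_delay) with full jitter:
            # the actual wait is drawn uniformly from [0, ceiling].
            sleep(random.uniform(0, base_delay * 2 ** attempt))
```

Passing `sleep` as a parameter keeps the loop testable without real delays. Jitter spreads retries from many clients over time so they do not hit the backend in lockstep.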

BiomJSON — Validation and Round-Trips

BiomJSON validates JSON payloads against the BiometryReport schema. It’s the engine for the round-trip workflow: extract a PDF with BiomAI → download JSON → edit locally → re-upload.

Metadata preservation: When an unedited BiomAI-origin JSON is re-uploaded, BiomJSON reconstructs and preserves the public BiomAI provenance fields (timestamp, byok, and llm). When a user manually edits data in the web app, the edited payload is tagged BiomDIRECT instead, because manual review and editing is now the active provenance. The input_schema_version field records the schema version declared in the uploaded JSON, which makes schema drift detectable.

If the uploaded JSON has no recognizable BiomAI provenance (or comes from a different origin), it’s attributed as BiomDIRECT.

Schema versioning: BiomJSON checks that the major version of the uploaded JSON’s schema_version matches the server’s current version. Minor/patch differences are tolerated; major version mismatch returns 422.
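As a sketch of the major-version rule, assuming schema_version is a dotted MAJOR.MINOR.PATCH string (the helper names are hypothetical):

```python
def major_version(version: str) -> int:
    """Return the leading MAJOR component of a dotted version string."""
    return int(version.split(".", 1)[0])


def is_schema_compatible(uploaded: str, server: str) -> bool:
    """Minor and patch differences are tolerated; only MAJOR must match."""
    return major_version(uploaded) == major_version(server)
```

Under this rule an upload declaring 2.1.0 is accepted by a server at 2.4.3, while one declaring 1.9.0 is rejected.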

BiomDIRECT — Direct Data Entry

BiomDIRECT is an attribution label, not a separate engine. Any biometry data not extracted by the Gemini LLM is tagged method: "BiomDIRECT" in the response metadata:

Manual transcription via the web UI Transcribe tab
Manual edits in the web app, including edits on top of BiomAI results
JSON constructed by an external script, EHR export, or automated pipeline
Re-uploaded JSONs without BiomAI provenance

BiomDIRECT metadata includes method, timestamp, input_schema_version, and optional client-declared source_app/source_version fields. Standard sources include BiomAPI Webapp for web transcription/editing and BiomLINK for BiomLINK-originated payloads.

BiomPIN

After any successful extraction, the server can optionally encrypt the response and store it under a PIN. BiomPIN is opt-in (pass biompin=true in the request). Patient name/initials and patient ID are removed from the stored payload before encryption; the immediate process response remains complete.
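The difference between the stored payload and the immediate response can be illustrated like this. The sketch assumes the patient block lives at data.patient, as described under Response Structure; the field names `name` and `id` are hypothetical placeholders, not the documented schema.

```python
import copy

# Hypothetical field names for the identifying patient fields.
IDENTIFYING_FIELDS = ("name", "id")


def strip_identifiers(response: dict) -> dict:
    """Return a copy for storage; the caller's response stays complete."""
    stored = copy.deepcopy(response)
    patient = (stored.get("data") or {}).get("patient") or {}
    for field in IDENTIFYING_FIELDS:
        patient.pop(field, None)
    return stored
```

Working on a deep copy is what keeps the process response untouched while only the de-identified copy is encrypted and stored.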

Two-part PIN design

word-word  -  123456
└──────┘      └────┘
 share_id   numeric PIN
(stored)   (never stored)

The word-word share ID is the database primary key — it’s stored in plain text and used to look up the record. The 6-digit numeric PIN is the encryption secret; it is never stored anywhere on the server. The decryption key is derived from the numeric PIN using Argon2id (a memory-hard key derivation function), with the SHA-256 hash of the share ID as salt.

Because the numeric PIN is never stored, the server cannot decrypt data without it. Even full database read access doesn’t compromise stored biometry data.
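The derivation scheme can be sketched with the Python standard library. Two things here are assumptions and not BiomAPI's implementation: the full PIN is taken to be the share ID and the numeric part joined by a final hyphen, and because the standard library has no Argon2id, the sketch substitutes scrypt, a different memory-hard KDF, purely to show the data flow (secret = numeric PIN, salt = SHA-256 of the share ID). The cost parameters are illustrative.

```python
import hashlib
from typing import Tuple


def split_biompin(pin: str) -> Tuple[str, str]:
    """Split 'word-word-123456' into (share_id, numeric_pin). Format assumed."""
    share_id, _, numeric = pin.rpartition("-")
    if not share_id or len(numeric) != 6 or not (numeric.isascii() and numeric.isdigit()):
        raise ValueError("expected <share-id>-<6 digits>")
    return share_id, numeric


def derive_key(share_id: str, numeric_pin: str) -> bytes:
    """Derive a 32-byte key. scrypt stands in for Argon2id in this sketch."""
    salt = hashlib.sha256(share_id.encode("utf-8")).digest()
    return hashlib.scrypt(
        numeric_pin.encode("utf-8"), salt=salt, n=2**14, r=8, p=1, dklen=32
    )
```

Only the share ID needs to be stored for lookup. The key is recomputed from the numeric PIN on each retrieval and discarded afterwards, which is why database access alone does not expose the data.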

Brute-force protection

After 3 wrong numeric PIN attempts, the database record is permanently deleted. This eliminates the stored ciphertext, making further brute-force attempts pointless. The response is 404 — there is no lockout period.
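A minimal in-memory sketch of the delete-after-three-failures rule follows. The class and exception names are hypothetical, and `try_decrypt` is an assumed callable that returns None when the PIN is wrong. The sketch also assumes that a wrong attempt raises a distinct error until the record is gone, after which every lookup is "not found" (the 404 case), and that the counter is not reset on success; the source does not specify either detail.

```python
MAX_WRONG_ATTEMPTS = 3


class RecordNotFound(KeyError):
    """Hypothetical error an HTTP layer would map to 404."""


class WrongPin(Exception):
    """Hypothetical error for a failed decryption attempt."""


class PinStore:
    def __init__(self):
        self._records = {}  # share_id -> {"ciphertext": ..., "wrong_attempts": int}

    def put(self, share_id, ciphertext):
        self._records[share_id] = {"ciphertext": ciphertext, "wrong_attempts": 0}

    def retrieve(self, share_id, try_decrypt):
        record = self._records.get(share_id)
        if record is None:
            raise RecordNotFound(share_id)
        plaintext = try_decrypt(record["ciphertext"])
        if plaintext is not None:
            return plaintext
        record["wrong_attempts"] += 1
        if record["wrong_attempts"] >= MAX_WRONG_ATTEMPTS:
            # Permanently delete the ciphertext: nothing is left to attack.
            del self._records[share_id]
        raise WrongPin(share_id)
```

Deleting the record, instead of locking it for a period, means an attacker gets at most three guesses out of one million possible 6-digit PINs.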

Expiry and cleanup

Records expire after 744 hours (31 days) by default. Expired records are purged lazily after each new store operation — there is no background cleanup process.
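Lazy cleanup can be sketched as a purge step that runs as part of every store call. The class name and the in-memory dictionary are illustrative assumptions; only the 744-hour default and the purge-on-store trigger come from the text above.

```python
from datetime import datetime, timedelta, timezone

DEFAULT_TTL = timedelta(hours=744)  # 31 days


class ExpiringStore:
    def __init__(self, ttl=DEFAULT_TTL):
        self._ttl = ttl
        self._records = {}  # share_id -> (expires_at, ciphertext)

    def store(self, share_id, ciphertext, now=None):
        now = now or datetime.now(timezone.utc)
        self._records[share_id] = (now + self._ttl, ciphertext)
        # No background job: expired rows are swept after each new store.
        self._purge_expired(now)

    def _purge_expired(self, now):
        expired = [key for key, (expires_at, _) in self._records.items()
                   if expires_at <= now]
        for key in expired:
            del self._records[key]

    def __len__(self):
        return len(self._records)
```

The trade-off is that an expired record can linger until the next store operation, in exchange for not running a scheduler.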

Rate Limiting

BiomAPI tracks usage across four independent engine buckets:

Bucket	Covers
`biomai`	PDF/image extraction, server-managed Gemini quota
`biomai_byok`	PDF/image extraction, user-supplied Gemini key
`biomjson`	JSON validation — no LLM call
`retrieve`	BiomPIN retrieval

Cost-aware application: BiomAI and BYOK rate limits are consumed before the Gemini call, after local file validation and credential resolution. This protects external API cost and quota, so an admitted BiomAI attempt can count even if extraction fails or times out. Deployment-managed calls can also consume internal capacity. BiomJSON usage is counted only after a successful response, because it is a local validation path with no external call.

Dual tracking: Every request is tracked by both the client IP and the authenticated user ID (if present). Public callers share per-IP limits; authenticated callers have custom per-user quotas. Both are checked independently.

Sliding window: The window is a continuous 24 hours (not a midnight calendar reset). Usage timestamps roll off exactly 24 hours after recording.
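The sliding window and the dual tracking can be sketched together. This is a simplified illustration: the class name is hypothetical, a single shared limit is used where the real service has separate per-IP limits and per-user quotas, and timestamps are plain seconds supplied by the caller.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 24 * 60 * 60


class SlidingWindowLimiter:
    def __init__(self, limit):
        self.limit = limit
        self._hits = defaultdict(deque)  # (bucket, subject) -> timestamps

    def _active(self, key, now):
        """Drop timestamps that are 24 hours old or older, return the rest."""
        hits = self._hits[key]
        while hits and hits[0] <= now - WINDOW_SECONDS:
            hits.popleft()
        return hits

    def allow(self, bucket, subjects, now):
        """Admit only if every subject (IP, user ID) is under the limit."""
        queues = [self._active((bucket, subject), now) for subject in subjects]
        if any(len(queue) >= self.limit for queue in queues):
            return False
        for queue in queues:
            queue.append(now)  # record usage against each subject
        return True
```

A call such as `limiter.allow("biomai", ["ip:203.0.113.7", "user:42"], now)` checks both identities independently; a request is recorded against either only if both have capacity left.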

Response Structure

Every successful response is a StandardAPIResponse with four top-level fields:

data         → BiometryReport  (biometer, patient, right_eye, left_eye)
extra_data   → ExtraReport | null  (notes, posterior_keratometry)
metadata     → ResponseMetadata  (request_id, schema_version, app_version, extraction)
biompin      → BiomPINInfo | null  (pin, expires_at, db_id)

Why data and extra_data are separate: BiometryReport in data contains the 12 core measurements present on virtually every device. extra_data holds optional, device-dependent fields — currently posterior keratometry (PK1/PK2) and notes. This separation means adding new optional fields doesn’t require bumping the core schema version.

Why metadata.extraction is a discriminated union: it holds either BiomAI provenance (model, BYOK status, timestamp) or BiomDIRECT provenance (manual/direct source details). Operational LLM metrics are tracked internally for reliability and analytics and are not exposed in the public response payload.
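On the client side, a discriminated union is typically handled by branching on the tag field. The sketch below assumes the tag is `method` and that BiomAI provenance carries the value "BiomAI"; the source only shows `method: "BiomDIRECT"` explicitly. The field names byok, llm, and source_app come from the sections above, and the function name is hypothetical.

```python
def describe_extraction(extraction: dict) -> str:
    """Summarize provenance from a parsed metadata.extraction object."""
    method = extraction.get("method")
    if method == "BiomAI":  # assumed tag value for LLM provenance
        credential = "a user-supplied key" if extraction.get("byok") else "a managed credential"
        return f"Extracted by {extraction.get('llm')} using {credential}"
    if method == "BiomDIRECT":
        app = extraction.get("source_app") or "an unspecified client"
        return f"Entered directly via {app}"
    raise ValueError(f"unknown extraction method: {method!r}")
```

Raising on an unknown tag, instead of guessing, keeps a client from misattributing data if a new provenance type is added later.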

The db_id field: biompin.db_id, like the db_id returned by GET /api/v1/status, identifies the BiomPIN database instance.