AI OCR

BBox-guided OCR with Paddle, Qwen, and Gemma

📷 Upload an Image

Upload an image and AI OCR will detect text boxes, read each crop, and keep every result tied to its original location.

💡 How it works

PaddleOCR first dewarps the image and supplies the authoritative text boxes. Qwen3VL:8b reads each rotated crop, then Gemma4:31B reviews the full image alongside the candidate readings and finalizes the text while preserving every Paddle bbox.

Drag & drop your image here, or click to browse
Supports PNG, JPG, JPEG, WebP (max 8 MB)

Product Overview

AI OCR uses a bbox-guided pipeline: PaddleOCR provides stable geometry, Qwen3VL:8b performs crop-level OCR, and Gemma4:31B finalizes the text with image context. The original Paddle boxes remain the overlay geometry so results stay anchored to the image.
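The stage order described above can be sketched in a few lines. This is a minimal illustration, not the product's actual code: the three stages are passed in as plain functions, and the names `Line`, `run_pipeline`, `detect`, `read_crop`, and `review`, as well as the source labels "paddle", "qwen", and "gemma", are hypothetical. It also assumes the reviewer returns exactly one text per box.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Quad = Tuple[Tuple[float, float], ...]  # four (x, y) corners of a text box


@dataclass(frozen=True)
class Line:
    box: Quad                                            # detector geometry, never reassigned
    texts: Dict[str, str] = field(default_factory=dict)  # one reading per source label


def run_pipeline(image, detect: Callable, read_crop: Callable, review: Callable) -> List[Line]:
    # Stage 1 (assumed interface): the detector returns (box, text) pairs.
    detected = detect(image)
    boxes = [box for box, _ in detected]
    paddle = [text for _, text in detected]

    # Stage 2: a crop-level reader produces one candidate per box.
    qwen = [read_crop(image, box) for box in boxes]

    # Stage 3: the reviewer sees the image plus both candidates and returns text only.
    final = review(image, list(zip(paddle, qwen)))
    if len(final) != len(boxes):
        raise ValueError("reviewer must return exactly one text per box")

    # Boxes come straight from stage 1, so the overlay geometry cannot drift.
    return [
        Line(box=b, texts={"paddle": p, "qwen": q, "gemma": g})
        for b, p, q, g in zip(boxes, paddle, qwen, final)
    ]
```

Because the reviewer only ever hands back strings, there is no code path through which it could move or resize a box.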

How It Works

1. Upload Image

Select or drag and drop an image that contains text, such as a document, sign, handwritten note, or screenshot.

2. Paddle Boxes

PaddleOCR dewarps the image, detects text lines, and provides the bounding boxes used for overlays.
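One way to turn a detected box into an overlay is to express it as percentages of the image size, so it stays aligned when the image is resized on screen. The sketch below assumes each box is a list of four (x, y) corner points in pixel coordinates; the function name `overlay_rect` and the returned keys are hypothetical, and a rotated box is approximated by its enclosing upright rectangle.

```python
from typing import Dict, Sequence, Tuple

Point = Tuple[float, float]


def overlay_rect(quad: Sequence[Point], image_w: float, image_h: float) -> Dict[str, float]:
    """Return the enclosing upright rectangle of a quad, in percent of image size."""
    xs = [x for x, _ in quad]
    ys = [y for _, y in quad]
    left, top = min(xs), min(ys)
    return {
        "left": 100.0 * left / image_w,
        "top": 100.0 * top / image_h,
        "width": 100.0 * (max(xs) - left) / image_w,
        "height": 100.0 * (max(ys) - top) / image_h,
    }


# A 200x50 px box at (100, 50) on a 400x200 px image.
rect = overlay_rect([(100, 50), (300, 50), (300, 100), (100, 100)], 400, 200)
print(rect)
```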

3. Qwen + Gemma Review

Qwen3VL:8b reads each rotated crop. Gemma4:31B reviews those candidates against the image and updates only the text, never the boxes.
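To cut a rotated crop, the reader first needs each box's tilt and size. A minimal sketch of that geometry follows, assuming the four corners arrive in the order top-left, top-right, bottom-right, bottom-left; that ordering and the function name `crop_geometry` are assumptions, not something this page specifies.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def crop_geometry(quad: Sequence[Point]) -> Tuple[float, float, float]:
    """Return (angle_degrees, width, height) of a text box given as four corners."""
    (x0, y0), (x1, y1), _, (x3, y3) = quad  # assumed order: TL, TR, BR, BL
    # Angle of the top edge. Image y grows downward, so a positive
    # angle means the text line tilts clockwise on screen.
    angle = math.degrees(math.atan2(y1 - y0, x1 - x0))
    width = math.hypot(x1 - x0, y1 - y0)    # length of the top edge
    height = math.hypot(x3 - x0, y3 - y0)   # length of the left edge
    return angle, width, height
```

Rotating the image by the negative of this angle around the box centre and then cutting a `width` by `height` window would give an upright crop for the reader.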

4. Compare & Copy

The extracted text is listed below the image with source labels for Paddle, Qwen, and Gemma. Use the toggle to compare engines or copy any result.
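The engine toggle can be thought of as a lookup over the per-source readings for each line. The sketch below is illustrative only: the labels "paddle", "qwen", and "gemma", the fallback order, and the function name `display_text` are assumptions rather than documented behavior.

```python
from typing import Dict, Optional, Tuple

# Assumed default order: prefer the reviewed text, then the crop reading, then the detector.
PRIORITY = ("gemma", "qwen", "paddle")


def display_text(texts: Dict[str, str], engine: Optional[str] = None) -> Tuple[Optional[str], str]:
    """Return (source_label, text) for one line, honoring an explicit engine choice."""
    if engine is not None:
        # An explicit toggle shows that engine's reading, even if it is empty.
        return engine, texts.get(engine, "")
    for label in PRIORITY:
        if texts.get(label):
            return label, texts[label]
    return None, ""
```

Returning the label together with the text keeps every displayed result traceable to the engine that produced it.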