GOT-OCR 2

A 580M-parameter General OCR Theory model that unifies recognition of plain text, math formulas, chemical equations, and geometric shapes in a single end-to-end model.

GOT-OCR 2.0 (General OCR Theory) replaces specialized recognition pipelines with a unified 580M-parameter architecture. It handles diverse inputs, including multi-page documents, sheet music, and complex mathematical formulas. It pairs a high-resolution vision encoder (1024x1024 input) with a language decoder through a linear connector layer, and is reported to achieve state-of-the-art results on the OCRBench benchmark. The system supports localized 'crop' OCR and formatted output such as Markdown or TikZ, making it a versatile tool for digitizing structured data from static images.
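Localized OCR implies selecting a region of the source image before recognition. The following is a minimal sketch, not code from the GOT repository, of how a pixel bounding box might be mapped onto a 1024x1024 encoder input. It assumes the whole image is resized directly to that resolution, which may differ from the model's actual preprocessing; the helper name `scale_box_to_encoder` and the `Box` type are hypothetical.

```python
from typing import Tuple

# (left, top, right, bottom) in pixels of the original image
Box = Tuple[int, int, int, int]

ENCODER_SIZE = 1024  # encoder input resolution stated in the description


def scale_box_to_encoder(box: Box, image_width: int, image_height: int,
                         encoder_size: int = ENCODER_SIZE) -> Box:
    """Clamp a pixel box to the image, then rescale it to the encoder grid.

    Hypothetical helper for illustration only. Assumes the full image is
    resized directly to encoder_size x encoder_size (aspect ratio ignored).
    """
    if image_width <= 0 or image_height <= 0:
        raise ValueError("image dimensions must be positive")

    left, top, right, bottom = box

    # Clamp to the image bounds so out-of-range selections stay valid.
    left = max(0, min(left, image_width))
    right = max(0, min(right, image_width))
    top = max(0, min(top, image_height))
    bottom = max(0, min(bottom, image_height))

    if right <= left or bottom <= top:
        raise ValueError("box is empty after clamping")

    # Independent horizontal and vertical scale factors.
    sx = encoder_size / image_width
    sy = encoder_size / image_height
    return (round(left * sx), round(top * sy),
            round(right * sx), round(bottom * sy))
```

For a 2048x1024 source image, `scale_box_to_encoder((512, 256, 1024, 512), 2048, 1024)` halves the horizontal coordinates and leaves the vertical ones unchanged. Clamping first keeps a selection that overshoots the image edge from producing coordinates outside the encoder grid.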

https://github.com/Ucas-HaoranWei/GOT