Comparison of optical character recognition software

This comparison of optical character recognition software includes:

OCR engines, which perform the actual character identification
Layout analysis software, which divides scanned documents into zones suitable for OCR
Graphical interfaces to one or more OCR engines
Software development kits that are used to add OCR capabilities to other software

Name

Founded year

Latest stable version

Latest release year

License

Online

Android

iOS

Programming language

SDK?

Languages

Fonts

Output formats

Notes

ABBYY FineReader

1989

2023

C/C++

198

All fonts

DOC, DOCX, XLS, XLSX, PPTX, RTF, PDF, HTML, CSV, TXT, ODT, DjVu, EPUB, FB2

ABBYY also supplies SDKs for embedded and mobile devices. Professional, Corporate and Site License Editions for Windows, Express Edition for Mac.

AIDA

2016

13.0

2024

All languages using Latin alphabet

Machine and handprinted text, Latin alphabet

DOCX, XLSX, PPTX, TXT, CSV, PDF, JSON, XML

AIDA is able to learn how to extract any value from any document, with a single click on a single document.

AnyDoc Software

1989

VBScript

Works with structured, semi-structured, and unstructured documents.

Asprise OCR SDK

1998

2015

Java, C#, VB.NET, C/C++/Delphi

20+

Plain text, searchable PDF, XML

Java, C#, VB.NET, C/C++/Delphi SDKs for OCR and Barcode recognition on Windows, Linux, Mac OS X and Unix.

CuneiForm

1996

1.1

2011

C/C++

Any printed font

HTML, hOCR, native, RTF, TeX, TXT

Enterprise-class system that preserves text formatting and recognizes complex tables of any structure

E-aksharayan

2010

RTF, TXT, BRL

GOCR

2000

0.52

2018

20+

Google Drive OCR or Google Cloud Vision

2015

Browser

Unknown

200+

All fonts

text

Google blog post

Microsoft Office Document Imaging

Office 2007

2007

Uses OmniPage

Microsoft Office OneNote 2007

2011

2007

OCRFeeder

2009-03

0.8.5

2022

Python

Features a full graphical user interface and a command-line tool for automated operation. Has its own segmentation algorithm but relies on system-wide OCR engines such as Tesseract or Ocrad.

Ocrad

0.29

2024

C++

Latin alphabet

Command line

OCRopus

2007

1.3.3

2017

Python

All languages using Latin script

Normal Latin script and Fraktur

TXT, hOCR, PDF

Pluggable framework under active development, used for Google Books

OmniPage

1970s

19.2

2015

C/C++, C#

125

Machine and handprinted fonts

DOC/DOCX, XLS/XLSX, PPTX, RTF, PDF, PDF/A, searchable PDF, HTML, text, XML, ePUB, MP3

Product of Nuance Communications

Puma.NET

2009

Any printed font

.NET OCR SDK based on Cognitive Technologies' CuneiForm recognition engine. Wraps the Puma COM server and provides a simplified API for .NET applications.

ReadSoft

14?

Scan, capture and classify business documents such as invoices, forms and purchase orders integrated with business processes.

Scantron

For working with localized interfaces, corresponding language support is required.

SmartScore

1991

10.5.8

2015

For musical scores

Tesseract

1985

5.5.0

2024

C++, C

100+

Any printed font

Text, ALTO, hOCR, PAGE, PDF, others with different user interfaces or the API

Developed at HP Labs and Google (2006–2018)

Evaluation

A 2016 analysis of the accuracy and reliability of four OCR packages (Google Docs OCR, Tesseract, ABBYY FineReader, and Transym), using a dataset of 1227 images from 15 different categories, concluded that Google Docs OCR and ABBYY FineReader performed better than the others.
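
The metric used in that analysis is not stated here. As an illustration only, OCR accuracy is often reported as a character error rate (CER): the edit distance between the recognized text and a ground-truth transcription, divided by the length of the ground truth. The following is a minimal sketch under that assumption; the function names `levenshtein` and `character_error_rate` are hypothetical and do not come from any of the packages listed above.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    # Keep the shorter string in the inner loop to reduce memory use.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, recognized: str) -> float:
    """CER = edit distance / length of the ground-truth text (assumed metric)."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return levenshtein(reference, recognized) / len(reference)


if __name__ == "__main__":
    # Hypothetical ground truth and OCR output, for illustration only.
    truth = "Optical character recognition"
    ocr_output = "0ptical charactcr recognition"
    print(f"CER: {character_error_rate(truth, ocr_output):.3f}")
```

A lower CER means a more accurate result, and a value of 0 means the recognized text matches the ground truth exactly. Comparing engines fairly would also require running them on the same images with the same ground-truth transcriptions.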