Comparison of optical character recognition software


This comparison of optical character recognition software includes:
  • OCR engines, which perform the actual character identification
  • Layout analysis software, which divides scanned documents into zones suitable for OCR
  • Graphical interfaces to one or more OCR engines
  • Software development kits that are used to add OCR capabilities to other software
| Name | Founded year | Latest stable version | Latest release year | License | Online | Windows | Mac OS X | Linux | BSD | Android | iOS | Programming language | SDK? | Languages | Fonts | Output formats | Notes |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ABBYY FineReader | 1989 | 16 | 2023 | | | | | | | | | C/C++ | | 198 | All fonts | DOC, DOCX, XLS, XLSX, PPTX, RTF, PDF, HTML, CSV, TXT, ODT, DjVu, EPUB, FB2 | ABBYY also supplies SDKs for embedded and mobile devices. Professional, Corporate and Site License Editions for Windows, Express Edition for Mac. |
| AIDA | 2016 | 13.0 | 2024 | | | | | | | | | | | All languages using Latin alphabet | Machine and handprinted text, Latin alphabet | DOCX, XLSX, PPTX, TXT, CSV, PDF, JSON, XML | AIDA is able to learn how to extract any value from any document, with a single click on a single document. |
| AnyDoc Software | 1989 | | | | | | | | | | | VBScript | | | | | Works with structured, semi-structured, and unstructured documents. |
| Asprise OCR SDK | 1998 | 15 | 2015 | | | | | | | | | Java, C#, VB.NET, C/C++/Delphi | | 20+ | | Plain text, searchable PDF, XML | Java, C#, VB.NET, C/C++/Delphi SDKs for OCR and barcode recognition on Windows, Linux, Mac OS X and Unix. |
| CuneiForm | 1996 | 1.1 | 2011 | | | | | | | | | C/C++ | | 28 | Any printed font | HTML, hOCR, native, RTF, TeX, TXT | Enterprise-class system; can save text formatting and recognizes complicated tables of any structure. |
| E-aksharayan | 2010 | | | | | | | | | | | | | 14 | | RTF, TXT, BRL | |
| GOCR | 2000 | 0.52 | 2018 | | | | | | | | | C | | 20+ | | | |
| Google Drive OCR or Google Cloud Vision | 2015 | | | | | Browser | Browser | Browser | | | | Unknown | | 200+ | All fonts | text | Google blog post |
| Microsoft Office Document Imaging | | Office 2007 | 2007 | | | | | | | | | | | | | | Uses OmniPage |
| Microsoft Office OneNote 2007 | | 2011 | 2007 | | | | | | | | | | | | | | |
| OCRFeeder | 2009-03 | 0.8.5 | 2022 | | | | | | | | | Python | | | | | Features a full user interface and has a command-line tool for automatic operations. Has its own segmentation algorithm but uses system-wide OCR engines like Tesseract or Ocrad. |
| Ocrad | | 0.29 | 2024 | | | | | | | | | C++ | | Latin alphabet | | | Command line |
| OCRopus | 2007 | 1.3.3 | 2017 | | | | | | | | | Python | | All languages using Latin script | Normal Latin script and Fraktur | TXT, hOCR, PDF | Pluggable framework under active development, used for Google Books |
| OmniPage | 1970s | 19.2 | 2015 | | | | | | | | | C/C++, C# | | 125 | Machine and handprinted fonts | DOC/DOCX, XLS/XLSX, PPTX, RTF, PDF, PDF/A, searchable PDF, HTML, text, XML, ePUB, MP3 | Product of Nuance Communications |
| Puma.NET | 2009 | | | | | | | | | | | C# | | 28 | Any printed font | | .NET OCR SDK based on Cognitive Technologies' CuneiForm recognition engine. Wraps Puma COM server and provides simplified API for .NET applications. |
| ReadSoft | | | | | | | | | | | | | | 14? | | | Scan, capture and classify business documents such as invoices, forms and purchase orders integrated with business processes. |
| Scantron | | | | | | | | | | | | | | | | | For working with localized interfaces, corresponding language support is required. |
| SmartScore | 1991 | 10.5.8 | 2015 | | | | | | | | | | | | | | For musical scores |
| Tesseract | 1985 | 5.5.0 | 2024 | | | | | | | | | C++, C | | 100+ | Any printed font | Text, ALTO, hOCR, PAGE, PDF, others with different user interfaces or the API | Developed at HP Labs and Google (2006–2018) |

Evaluation

A 2016 analysis of the accuracy and reliability of four OCR packages (Google Docs OCR, Tesseract, ABBYY FineReader, and Transym), using a dataset of 1227 images from 15 different categories, concluded that Google Docs OCR and ABBYY FineReader performed better than the others.