2019 Examples to Compare OCR Services: Amazon Textract/Rekognition vs Google Vision vs Microsoft Cognitive Services


Introduction

We're building a note app that will surface images and documents in full-text search, so it needs to do OCR as well as possible, preferably at a low price. We hoped there would be a good, modern comparison of the major OCR services, but as of July 2019, there wasn't -- so we wrote one. The main result Google kept sending us to was OK, but its review concluded more than a year ago, and these services are evolving very quickly. Most have launched completely new versions over the past year.


And so, we did some research on the current OCR providers. We figured that as long as we had to compile the research into a note, we might as well share that note with others who might need this knowledge for whatever reason.


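This article will compare Amazon Rekognition, Amazon Textract, Google Cloud Vision OCR, and Microsoft Cognitive Services (Read API).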


Since our use case is full-text search, we're not seeking to extract any structural data, just the set of words a user might produce when transcribing the image. Some of these products have a strong focus on specific use cases, like form data extraction, which we're not evaluating. Both Microsoft and Google have additional OCR services that focus on that use case.


In addition to providing transcriptions of sample images, we'll also touch on the current price of each service (with links to pricing pages so you can confirm the estimates are up-to-date), in case that is a factor in your consideration.




OCR Image Processing Results

We started with three image samples, representing archetypes we expect to see from our users. Our samples included a hand-written letter, webpage text, and text written on a whiteboard. The selection of these particular images wasn't scientific, but we figure that if the OCR solution can get these right, it's state-of-the-art for the moment.


For the tl;dr types, here's how each service performed on our non-scientific test:



See also: methodology notes.
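The methodology notes and Ruby scripts linked from this post aren't reproduced here. As a rough sketch of what a comparison aimed at full-text search might look like, the Ruby below scores each service's output by the fraction of reference words it recovers. Treat it as an assumption about a reasonable metric rather than the exact method behind our results: `word_set`, `word_recall`, the service names, and the sample strings are all hypothetical.

```ruby
require "set"

# Normalize a transcription into a set of lowercase words, ignoring
# punctuation and layout, since only searchable words matter here.
def word_set(text)
  text.downcase.scan(/[a-z0-9']+/).to_set
end

# Fraction of the reference words that also appear in the OCR output.
# An empty reference is treated as fully recovered.
def word_recall(reference, ocr_output)
  expected = word_set(reference)
  return 1.0 if expected.empty?

  found = expected & word_set(ocr_output)
  found.size.to_f / expected.size
end

# Hypothetical sample data, not taken from the images in this post.
reference = "Meet at the whiteboard at noon"
results = {
  "service_a" => "Meet at the whiteboard at noon",
  "service_b" => "Meet at the whiteboaro at n00n",
}

results.each do |name, text|
  puts format("%-10s %.2f", name, word_recall(reference, text))
end
```

Word-level recall suits a search use case because a note is only findable by the words the OCR got exactly right; it deliberately ignores word order and layout, which a structured-extraction comparison would need to score.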


Pricing: Amazon Rekognition, Amazon Textract, Google, Microsoft.


We don't really care which one you use, but Microsoft did best on our sample data. Textract was a very close second if you only need its headline feature: extracting text from digital documents. If someone wants to email bill -at- amplenote.com with comparable data for other images/services, I can try to work those into this post as time allows. 😎


Image 1: Hand-written note

See also: the result as interpreted by me.


Amazon Rekognition


See also: the result as text.


Amazon Textract


See also: the result as text.


Google Cloud Vision OCR


See also: the result as text.


Microsoft Cognitive Services (Read API)


See also: the result as text.


Ruby used to compare these: data, and method.


Image 2: Webpage text

See also: result as interpreted by me.


Amazon Rekognition


See also: result as text.


Amazon Textract


See also: result as text.


Google Cloud Vision OCR


See also: result as text.


Microsoft Cognitive Services (Read API)


See also: result as text.


Ruby used to compare these results: data, and method.


Image 3: Handwriting on whiteboard

This one was a toughie. My interpretation.


Amazon Rekognition


See also: result as text.


Amazon Textract


See also: result as text.


Google Cloud Vision OCR



See also: result as text.


Microsoft Cognitive Services (Read API)


See also: result as text.


Ruby used to compare these results: data, and method.


Thanks to Jordan for deriving the data and pasting the screenshots!

🍻🍻🍻

Plot twist

Thanks for stopping by the Amplenote blog. Did you know that the content of this "blog post" is just a plain old note, lifted from the author's Amplenote notebook? Rich footnotes, built-in emoji selector 😄, and the strongest encryption game among major note apps (all notes encrypted client-side by default) make us a solid option for modern writers. Try it out yourself.
