Using AI-ML to facilitate data collection for one of the largest consulting firms

Facilitating Ease and Accuracy in Data Collection Using Amazon Textract

Amazon Textract is a service that automatically extracts text and data from images or scanned documents.
Amazon Textract goes beyond simple optical character recognition (OCR) to also identify the contents of fields in invoices and forms, as well as information stored in tables.
Many companies today extract data from documents and forms through manual data entry or through simple OCR that requires manual customization or configuration; both approaches are slow and expensive.
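
As an illustration of what such an extraction call can look like, here is a minimal Python sketch. It assumes the boto3 SDK with AWS credentials already configured. The helper names `analyze_image` and `extract_lines` are hypothetical and are not part of the project described here.

```python
def analyze_image(image_bytes, client=None):
    """Send one image to Textract and return the raw response dict.

    `client` is optional so that a stub can be passed in for offline testing.
    """
    if client is None:
        import boto3  # imported lazily; only needed for a real AWS call
        client = boto3.client("textract")
    # FORMS asks for key-value pairs, TABLES asks for table cells.
    return client.analyze_document(
        Document={"Bytes": image_bytes},
        FeatureTypes=["FORMS", "TABLES"],
    )


def extract_lines(response):
    """Return the text of every LINE block, in the order Textract returned them."""
    return [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE"
    ]
```

A caller would read the image file in binary mode, pass the bytes to `analyze_image`, and hand the response to `extract_lines` to get plain text lines.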

Process of Data Collection

Why Amazon Textract?

Quick and Accurate
Versatile across image types (bills, product photos, etc.)
Bounding boxes help identify individual lines and words, which are returned as text that can be analysed
A human review step can be built in, so that ‘low confidence’ data is verified to increase the confidence of text extraction (as an additional quality check).
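
The bounding-box and human-review points above can be sketched in Python as follows. This is only an illustration: it works on the `Blocks` list of a Textract response, the cut-off of 90 is an assumed value to be tuned per project, and `split_by_confidence` is a hypothetical helper name.

```python
# Assumed cut-off for illustration; Textract reports Confidence on a 0-100 scale.
REVIEW_THRESHOLD = 90.0


def split_by_confidence(blocks, threshold=REVIEW_THRESHOLD):
    """Split LINE blocks into (accepted, needs_review) lists.

    Each item keeps the text, its confidence score and its bounding box
    (Left, Top, Width, Height as ratios of the page size).
    """
    accepted, needs_review = [], []
    for block in blocks:
        if block.get("BlockType") != "LINE":
            continue  # skip PAGE, WORD and other block types
        item = {
            "text": block["Text"],
            "confidence": block["Confidence"],
            "box": block["Geometry"]["BoundingBox"],
        }
        if item["confidence"] >= threshold:
            accepted.append(item)
        else:
            needs_review.append(item)
    return accepted, needs_review
```

Items in `needs_review` would go to a human reviewer, who can use the bounding box to locate the text on the original image.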

Using ML to build accuracy over time

Amazon Textract – OCR++

Traditionally, in OCR, the rules and workflows for each document or form often need to be hard-coded and then updated whenever the form changes or when multiple forms are involved.
If the form deviates from the rules, the output is often scrambled and unusable.
Textract overcomes these challenges by using machine learning without the need for any manual effort or custom code. 

Key Information we can collect

Textract can recognize most of the following information with high accuracy, provided the image is captured clearly:

Brand name
Flavour / Variant (if written on the pack)
Pack size / Quantity (if written on the pack)
Price (if written on the pack)
Any other key information on the pack that Kantar deems important or useful

Textract is effective at extracting key-value pairs. Any information that is missing from the pack can be cross-verified against the generated invoice and extracted directly from it.
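
A minimal Python sketch of reading key-value pairs from a Textract response and filling gaps from invoice data is shown below. It assumes the response came from a request with the FORMS feature enabled, where keys and values arrive as `KEY_VALUE_SET` blocks linked by `VALUE` and `CHILD` relationships. The helper names and the fallback rule (pack value wins when present, otherwise the invoice value) are assumptions for illustration, not the project's actual logic.

```python
def _text_of(block, by_id):
    """Join the WORD children of a block into a single string."""
    words = []
    for rel in block.get("Relationships", []):
        if rel["Type"] == "CHILD":
            for child_id in rel["Ids"]:
                child = by_id[child_id]
                if child["BlockType"] == "WORD":
                    words.append(child["Text"])
    return " ".join(words)


def key_value_pairs(blocks):
    """Map each detected key to its value text ('' when the value is empty)."""
    by_id = {block["Id"]: block for block in blocks}
    pairs = {}
    for block in blocks:
        is_key = (
            block["BlockType"] == "KEY_VALUE_SET"
            and "KEY" in block.get("EntityTypes", [])
        )
        if not is_key:
            continue
        parts = []
        for rel in block.get("Relationships", []):
            if rel["Type"] == "VALUE":
                parts.extend(_text_of(by_id[vid], by_id) for vid in rel["Ids"])
        pairs[_text_of(block, by_id)] = " ".join(p for p in parts if p)
    return pairs


def fill_missing(pack_fields, invoice_fields):
    """Prefer non-empty pack values; fall back to the invoice (assumed rule)."""
    merged = dict(invoice_fields)
    merged.update({k: v for k, v in pack_fields.items() if v})
    return merged
```

With this rule, a field such as the brand name that Textract could not read from the pack is taken from the invoice, while fields that were read from the pack are kept unchanged.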
