Azure Form Recognizer goes multilingual on invoices • The Register


A few days after launching the doodle-recognizing Express Design on the Power Apps platform, Microsoft has updated its Azure sibling: Form Recognizer.

Although Express Design is very much the newcomer, its ability to create a form from scribbles harks back to the Applied AI service, Form Recognizer.

Azure Form Recognizer, as the name suggests, extracts text and structure from documents using AI and OCR. The theory goes that users can automate data processing with the technology, which accepts PDFs, scanned images and handwritten forms (although, as with all handwriting recognition systems, a scribble barely readable by humans can also stump the robots).

More usefully, Azure Form Recognizer can map field relationships as key-value pairs and spit out structured JSON without “excessive manual intervention”, as Microsoft delicately puts it.
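As a rough illustration of what consuming that key-value output might look like, here is a minimal Python sketch. The JSON shape (`keyValuePairs`, `key`, `value`, `content`, `confidence`) and the sample values are assumptions made for this example rather than a documented response, and `extract_pairs` is a hypothetical helper, not part of any Microsoft SDK.

```python
import json

# Hypothetical, trimmed-down response. The field names are assumptions for
# illustration; the real service output may be shaped differently.
SAMPLE_RESPONSE = """
{
  "keyValuePairs": [
    {"key": {"content": "Invoice No."}, "value": {"content": "INV-1001"}, "confidence": 0.97},
    {"key": {"content": "Total"}, "value": {"content": "120.50"}, "confidence": 0.91},
    {"key": {"content": "PO Number"}, "confidence": 0.42}
  ]
}
"""


def extract_pairs(raw_json, min_confidence=0.5):
    """Return {key text: value text} for pairs at or above min_confidence."""
    result = json.loads(raw_json)
    pairs = {}
    for pair in result.get("keyValuePairs", []):
        # Skip low-confidence detections rather than trusting them blindly.
        if pair.get("confidence", 0.0) < min_confidence:
            continue
        key = pair["key"]["content"]
        # A key can be detected without a matching value; record it as None.
        value = pair.get("value", {}).get("content")
        pairs[key] = value
    return pairs


if __name__ == "__main__":
    print(extract_pairs(SAMPLE_RESPONSE))
```

Filtering on a confidence threshold is one way to decide which fields still need a human to look at them; the 0.5 default here is arbitrary.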

The latest technology preview adds the ability to extract paragraphs – handy for unstructured documents – and roles for those paragraphs (e.g. headings or footnotes). The June 2022 API release will also spot tabular fields, useful for turning document content into tables, and will handle tables that span multiple pages.

“If you have a dataset labeled with tables,” Microsoft explained, “train a model with the current API to start seeing multi-page tables in the response.”
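To make the paragraph roles mentioned above a little more concrete, the sketch below groups extracted paragraphs by role. The input structure and the role names (`title`, `sectionHeading`, `footnote`) are assumptions for illustration, and `group_by_role` is a hypothetical helper; paragraphs without a role are treated as body text.

```python
from collections import defaultdict

# Hypothetical paragraph list. The "role" and "content" field names and the
# role values are assumptions for this example, not a documented schema.
SAMPLE_PARAGRAPHS = [
    {"role": "title", "content": "Quarterly Report"},
    {"role": "sectionHeading", "content": "Summary"},
    {"content": "Revenue rose in the second quarter."},
    {"role": "footnote", "content": "Figures are unaudited."},
    {"content": "Costs were flat."},
]


def group_by_role(paragraphs, default_role="body"):
    """Group paragraph text by role, preserving document order within a role."""
    grouped = defaultdict(list)
    for paragraph in paragraphs:
        # Paragraphs with no role are filed under default_role.
        role = paragraph.get("role", default_role)
        grouped[role].append(paragraph["content"])
    return dict(grouped)


if __name__ == "__main__":
    for role, texts in group_by_role(SAMPLE_PARAGRAPHS).items():
        print(role, texts)
```

Separating headings and footnotes from body text in this way is what makes the feature useful for unstructured documents, where there are no form fields to key on.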

Other tweaks include the ability to extract text from Word, Excel, and PowerPoint files, as well as text from embedded images. HTML documents can also be scanned. Scanning of US driver’s licenses has been improved to extract fields such as date issued, height and weight, and Japanese has been added to the business card template.

However, it’s the additional languages now supported by the invoice template that are probably the most appealing. Spanish and English are joined by German, French, Italian, Portuguese and Dutch. “This opens up provisioning scenarios to invoices in many different languages,” Microsoft said.

A word of caution: we advise readers to check their regulatory obligations before handing over this data. After all, this is all handled in Azure, which might give some lawmakers pause.

Still, for developers tasked with document scanning and already aboard the Azure train, the (preview) updates will be welcome. However, they are unlikely to be enough to lure engineers away from competing products from Google and AWS, such as Cloud Vision and Textract respectively. ®

