What is OCR (Optical Character Recognition)?

In a perfect world, a translation expert receives source documentation in an easy-to-read, digital format that’s ready to be incorporated into translation tools.

Dare to dream.

The reality is that in this fast-paced, deadline-driven world of endless file formats, source documentation is delivered to a translation team in all shapes and sizes. Quite often, digital source files are missing completely.

This is where OCR (optical character recognition) becomes extremely handy.

OCR is software that electronically converts images of text into machine-encoded text. It allows printed documents, or scans of them saved as image-only PDFs, to be converted into text that can be edited and searched.

This conversion is a necessary step before any translation takes place. Why? Because a scanned PDF is only a picture of the text, and it can’t be edited without the proper software. Without editable text, a translation expert would be forced to retype every character. That means billable hours you don’t want to pay for, nor should you have to.

OCR is a vital, cost-efficient step toward successful localization of non-digital source material. It also makes accurate project quotes possible. If a translation team receives a PDF, they can save it as a Word file, or run OCR on it and export the result to Word, to get an accurate word count for a quote. If that doesn’t work, your translation team will need to estimate the word count manually.
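As a rough illustration of the word-count step, here is a minimal Python sketch. It assumes the OCR output has already been exported as plain text, and it counts whitespace-separated tokens, which is one common convention rather than the only one. The function name count_words and the sample sentence are hypothetical, chosen only for this example.

```python
def count_words(text: str) -> int:
    """Count whitespace-separated tokens in OCR output.

    This is an approximation for quoting purposes: hyphenated words and
    contractions each count as one word because they contain no spaces.
    """
    return len(text.split())


# Hypothetical sample standing in for text exported from an OCR tool.
sample = "It's a well-known fact: OCR helps."
print(count_words(sample))
```

A count like this is only as good as the OCR text behind it, so it is best treated as a starting point for a quote, not a final figure.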

Professional OCR oversight

As innovative and technologically impressive as many OCR programs are, with many able to replicate non-textual components such as columns and images, the conversion still requires the deft eye of a trained expert. An expert makes sure the layout survives the conversion, so you save on costs by not having to reformat all content.

You just can’t trust software to produce an exact, like-for-like copy of the original. (At least not yet.) Quality also varies from one OCR program to the next.

Too many mistakes can occur. And those mistakes can wreak havoc upon your brand, especially if your localization project involves sensitive content or communications. (And especially if your brand is in the legal or healthcare industry, or handles numerous hard-copy files.)

Common examples of OCR errors include confusing the letter “O” with the digit “0” and the lowercase letter “l” with the digit “1.”
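Character confusions like these can be surfaced for a human reviewer with a simple check. The Python sketch below is a heuristic, not a feature of any particular OCR product: it flags tokens that mix letters and digits and contain one of the easily confused characters. The function name flag_suspect_tokens, the character set, and the sample text are assumptions made for illustration. Legitimate codes such as "B12" will also be flagged, which is why the result is a review list and not an automatic correction.

```python
import re

# Characters that OCR engines commonly confuse with one another (assumed set).
CONFUSABLE = set("O0oIl1")


def flag_suspect_tokens(text: str) -> list[str]:
    """Return tokens that mix letters and digits and contain a confusable character."""
    suspects = []
    for token in re.findall(r"[A-Za-z0-9]+", text):
        has_letter = any(c.isalpha() for c in token)
        has_digit = any(c.isdigit() for c in token)
        if has_letter and has_digit and any(c in CONFUSABLE for c in token):
            suspects.append(token)
    return suspects


# Hypothetical OCR output: "2O19" contains a letter O, "1l5" a lowercase l.
sample = "Invoice 2O19 total: 1l5 units, model A7"
print(flag_suspect_tokens(sample))  # ['2O19', '1l5']
```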

Beyond single characters, fonts, image captions and sentence structure can also be corrupted before translation even begins. If those issues surface during translation, time is wasted tracking them down when it should be spent translating your files.

A mistake caught early is only a mistake. A mistake caught late could affect multiple projects.

Professional oversight will find these issues and correct them before translation. Translators will then have a localization-ready file to work from, which creates confidence that the translation process will be as efficient and high quality as possible. You will also have content that can be added to translation memory for future projects.

