Steve Donlon, Senior Publishing Lead at Acclaro, presents a debrief on the world of document translation. Those who work with multilingual documents on a regular basis may already understand the challenges involved. If so, feel free to comment below with your own experiences, tips, or tricks.
Next to software translation, document translation is one of the most complex types of translation we handle. We process a wide array of documentation file types and versions, including MS Word/Excel/PPT/Publisher (2003/2007), Adobe Creative Suite (CS, CS2, CS3, including InDesign), Adobe Acrobat, MS Help Workshop, QuarkXPress 7, RoboHelp, MadCap Flare, Lectora, Articulate, Flash, Flashbook, Captivate, Director, FrameMaker 7 and above, WebWorks Publisher, ePublisher, and many others.
Each of these programs works a little differently and has its own benefits and challenges. With that in mind, let’s look at some of the major steps in document translation:
How documents are translated
Every written language has a unique set of characteristics. Most professional translators and translation agencies, including Acclaro, use translation memory (TM) software to help translators as they work through a project. To reap the benefits of TM, we have to get the content out of the document (including any text in graphics!), into the TM environment for translation, and then back into the document. This is where the fun begins!
Language coding, expansion, and contraction
As mentioned above, each language has its own characteristics. Chinese, Japanese, and Korean have traditionally relied on double-byte character sets. Arabic and Hebrew are bidirectional, meaning that most text is read right to left, while numbers and words in foreign languages are read left to right. And every language, in some form or another, expands or contracts naturally compared with English, so you will almost always end up with more or less text on the page. As a result, the formatting (see below) must be verified in context for each language we handle.
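To make expansion and bidirectionality a little more concrete, here is a minimal Python sketch (our own illustration, not a tool from our production workflow) that compares the character length of a translation with its English source and flags whether a string contains right-to-left characters, using the Unicode bidirectional class exposed by Python's standard `unicodedata` module. The function names and sample strings are hypothetical, and character count is only a rough proxy: what really matters on the page is rendered width, which also depends on the font.

```python
import unicodedata

def expansion_ratio(source: str, target: str) -> float:
    """Character length of the translation relative to the source text.

    Both strings are NFC-normalized first, so an accented letter counts once
    whether it is stored precomposed or as base letter + combining mark.
    """
    src = unicodedata.normalize("NFC", source)
    tgt = unicodedata.normalize("NFC", target)
    return len(tgt) / len(src)

def contains_rtl(text: str) -> bool:
    """True if any character's Unicode bidi class is R (e.g. Hebrew) or AL (Arabic)."""
    return any(unicodedata.bidirectional(ch) in ("R", "AL") for ch in text)

english = "Save changes"
german = "Änderungen speichern"
arabic = "حفظ التغييرات"

print(f"German/English length ratio: {expansion_ratio(english, german):.2f}")  # 1.67
print(contains_rtl(german), contains_rtl(arabic))  # False True
```

A ratio well above 1 hints that text may overflow a fixed-size text frame, table cell, or button, while a right-to-left hit means the layout itself needs mirroring and bidi checks.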
Font rendering and support
Back to double-byte, bidirectional, and accented/extended characters for a bit. Until recently, most publishing applications were developed in English-speaking countries by English-speaking companies that didn't always take foreign-language support into consideration. So once we got a perfectly good translation back into the document, it sometimes looked like a bunch of symbols and random characters.
The problem wasn't a bad translation; the application simply didn't know how to parse the specific foreign-language characters. We typically applied all manner of language packs and patches to get the text to display correctly, and/or opened and processed each document on its own language-specific operating system. Today, many developers include font support for a growing number of languages, so this is not nearly the headache it used to be, but certain languages still present their own support challenges.
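One common way a correct translation turns into "a bunch of symbols" is an encoding mismatch, where text saved as UTF-8 is read by a program that assumes a legacy Western code page such as Windows-1252. The Python sketch below reproduces that failure and then reverses it; the function names are our own, and it illustrates only the encoding side of the problem. Missing fonts are a separate cause, which typically shows up as empty boxes rather than stray accented letters.

```python
def misdecode(text: str) -> str:
    """Simulate an application that reads UTF-8 bytes as Windows-1252 (cp1252)."""
    return text.encode("utf-8").decode("cp1252")

def repair(garbled: str) -> str:
    """Undo the mistake: turn the characters back into the original bytes, then decode as UTF-8."""
    return garbled.encode("cp1252").decode("utf-8")

original = "Größe ändern"  # German UI string meaning "Resize"
garbled = misdecode(original)
print(garbled)                      # GrÃ¶ÃŸe Ã¤ndern
print(repair(garbled) == original)  # True
```

Each accented letter occupies two bytes in UTF-8, and a single-byte code page renders each of those bytes as its own character, which is why every umlaut turns into a pair of symbols.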
Formatting: Graphics, TOCs, indexes, headers/footers, tables, and page sizing
Because the amount of text naturally changes in translation, the formatting shifts accordingly. We check the translated document against the English to make sure that graphics, tables, columns, page numbers, and headers/footers are all in their proper place, and that cross-references and hyperlinks point to the right targets. Once that's done, we review the table of contents (TOC) and any indexes for accuracy, making sure that any re-ordering (alphabetization, or for Asian languages, character ordering by Pinyin, stroke count, or radical number) is linguistically correct. Finally, paper size isn't the same the world over. If a region requires a different page size, that can affect how and where text appears on the page. As with font support, resizing doesn't always happen automatically or accurately in the document software, but it's getting better.
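As a rough illustration of the page-size point, this Python sketch compares how many lines of body text fit in one column on A4 (210 × 297 mm, the ISO 216 size used in most of the world) and US Letter (8.5 × 11 in, used mainly in North America). The combined 50 mm top and bottom margins and the 14 pt leading are hypothetical values chosen for the example, not settings from any real template.

```python
MM_PER_INCH = 25.4
PT_PER_INCH = 72  # typographic (PostScript) points per inch

# Page dimensions as (width, height) in millimetres.
PAGE_SIZES_MM = {
    "A4": (210.0, 297.0),
    "US Letter": (8.5 * MM_PER_INCH, 11 * MM_PER_INCH),
}

def mm_to_pt(mm: float) -> float:
    return mm / MM_PER_INCH * PT_PER_INCH

def lines_per_column(page: str, margins_mm: float = 50.0, leading_pt: float = 14.0) -> int:
    """Number of full text lines that fit vertically, given combined top + bottom margins.

    margins_mm and leading_pt are illustrative assumptions, not real template values.
    """
    _, height_mm = PAGE_SIZES_MM[page]
    usable_pt = mm_to_pt(height_mm - margins_mm)
    return int(usable_pt // leading_pt)

for page in PAGE_SIZES_MM:
    print(page, lines_per_column(page))
```

Under these assumptions A4 holds 50 lines per column and US Letter 46, so a layout that just fits on A4 loses about four lines per column when moved to Letter, before any translation expansion is counted. The overflow reflows onto later pages, which in turn shifts page numbers, TOC entries, and index references.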
So there you have it: Document Translation 101. It's probably not as simple as you thought! Publishing software is now friendlier and more automated, which has removed most of our aesthetic worries, but increasing document interactivity and an expanding language pool, especially Arabic and Indian languages, keep us on our toes.
Photo attribution: gomattolson