AI translation, including machine translation (MT), is getting a lot of attention right now. Everyone seems to be jumping on the bandwagon. But even though machine learning, engine performance, and quality are improving, many questions still surround MT and its suitability for various translation needs.
And now, with the explosive arrival of ChatGPT in November 2022, large language model (LLM) translation has entered the picture. LLMs have greatly advanced the fluency and overall quality of what AI translation can do. They produce more fluent, more creative, and more human-sounding translations than ever before. But LLM translations are also known to contain bias, errors, and omissions. They are, simply put, unpredictable, so much so that humans can’t yet step away from the translation process. So should humans own the translation process, or should businesses fully adopt AI translation and just keep ‘humans in the loop’?
In truth, the solution falls somewhere in the middle.
In this blog post, we’ll explore the different types of AI translation, what type of content is a good candidate for AI, and where humans should be involved. We will address questions like:
How well does MT translate creative, technical, or ecommerce content?
Can MT be used effectively in a professional setting for business content?
How and where should humans be involved?
How should you decide when to employ MT?
Let’s start with a quick survey of AI translation types, and then dive into the question of whether or not to deploy them.
The six key types of AI translation technologies
Machine translation was first used in the 1950s. Since then, substantial developments have propelled the technology forward to drive better quality and higher levels of customization. To figure out what’s best for you, let’s start by understanding the different types of MT used today. Some technologies are older and some are new, but each has a specific place in the translation landscape.
1. Rule-based machine translation (RBMT)
Rule-based MT uses a collection of “rules” governing the construction of language. These rules, developed by linguists, help the machine translate from the source language to the target language. A good example is a rule that helps the machine correctly choose between the formal and informal versions of “you” in German. Although this was the first type of MT to become commercially available, it is no longer widely used.
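To make the idea of a linguist-written rule concrete, here is a minimal Python sketch of a rule for choosing the German form of “you”. The function name and its two context features (formality and number) are hypothetical, and the rule covers only the nominative case; a real rule-based engine combines thousands of such rules with dictionaries and grammars.

```python
def german_you(formal: bool, plural: bool) -> str:
    """Choose the German nominative form of English "you".

    A toy version of a linguist-written rule:
      formal register (one or many people) -> "Sie"
      informal, one person                 -> "du"
      informal, several people             -> "ihr"
    """
    if formal:
        return "Sie"
    return "ihr" if plural else "du"


if __name__ == "__main__":
    # In a real engine, these features would come from earlier analysis rules
    # or from project settings (e.g. "address customers formally").
    print(german_you(formal=True, plural=False))   # Sie
    print(german_you(formal=False, plural=False))  # du
    print(german_you(formal=False, plural=True))   # ihr
```

Because the rule is explicit, its behavior is fully predictable, but every case it should handle has to be anticipated and written by hand.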
2. Statistical machine translation (SMT)
Statistical MT uses algorithms to produce millions of possible permutations and selects the text that appears to be the best translation. Like rule-based MT, statistical MT is no longer widely used.
Why? Because it takes a lot of effort to maintain the system and translation quality is relatively low.
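The “select the best translation” step can be illustrated with the classic noisy-channel formulation from early statistical MT: each candidate is scored by a translation-model probability (how well it accounts for the source) multiplied by a language-model probability (how fluent it is), and the highest score wins. The candidates and probabilities below are invented for illustration and do not come from any real system; real engines search over far larger candidate spaces.

```python
import math

# Hypothetical English candidates for the German "Ich habe Hunger",
# each with made-up probabilities:
#   tm = P(source | candidate): translation model, coverage of the source
#   lm = P(candidate): language model, fluency of the English
CANDIDATES = {
    "I have hunger": {"tm": 0.30, "lm": 0.001},
    "I am hungry": {"tm": 0.20, "lm": 0.020},
    "Me hungry": {"tm": 0.10, "lm": 0.0005},
}


def noisy_channel_score(tm: float, lm: float) -> float:
    """Log-space score log P(f|e) + log P(e); higher is better."""
    return math.log(tm) + math.log(lm)


def best_translation(candidates: dict) -> str:
    """Return the candidate with the highest noisy-channel score."""
    return max(candidates, key=lambda c: noisy_channel_score(**candidates[c]))


if __name__ == "__main__":
    print(best_translation(CANDIDATES))  # I am hungry
```

Note how the literal “I have hunger” matches the source best but loses on fluency; balancing the two models is the core idea.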
3. Hybrid machine translation (HMT)
Hybrid MT combines rule-based and statistical MT to improve translation quality. Using translation memory (TM), hybrid MT can deliver consistent quality. However, hybrid MT can also require extensive human post-editing by a qualified linguist.
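The way translation memory adds consistency can be sketched as a TM-first lookup: approved human translations are reused when a new segment matches a stored one, and the engine is consulted only otherwise. This is a simplified illustration, not the internals of any particular hybrid engine; the similarity measure (Python’s `difflib` ratio), the 75% threshold, the sample segments, and the `machine_translate` callable are all assumptions.

```python
from difflib import SequenceMatcher
from typing import Callable, Optional, Tuple

# A tiny, hypothetical translation memory of approved English -> German segments.
TM = {
    "Press the power button.": "Drücken Sie die Ein/Aus-Taste.",
    "Remove the battery cover.": "Entfernen Sie die Batterieabdeckung.",
}


def best_tm_match(source: str, memory: dict,
                  threshold: float = 0.75) -> Optional[Tuple[str, str, float]]:
    """Return (stored_source, stored_target, score) for the closest TM segment.

    The score is difflib's character-level similarity ratio (0.0 to 1.0), used
    here as a stand-in for the fuzzy-match percentages CAT tools report.
    Returns None when no stored segment reaches the threshold.
    """
    best = None
    for stored_source, stored_target in memory.items():
        score = SequenceMatcher(None, source, stored_source).ratio()
        if best is None or score > best[2]:
            best = (stored_source, stored_target, score)
    return best if best is not None and best[2] >= threshold else None


def suggest(source: str, machine_translate: Callable[[str], str]) -> Tuple[str, str]:
    """Return (suggestion, origin): TM matches first, machine output as fallback."""
    match = best_tm_match(source, TM)
    if match is None:
        return machine_translate(source), "mt"
    stored_source, stored_target, score = match
    if stored_source == source:
        return stored_target, "tm-exact"
    # A fuzzy match is offered to the linguist for editing, not used as-is.
    return stored_target, f"tm-fuzzy ({score:.0%})"


def fake_mt(text: str) -> str:
    """Placeholder for a call to a real MT engine (hypothetical)."""
    return f"[MT] {text}"


if __name__ == "__main__":
    print(suggest("Press the power button.", fake_mt))
    print(suggest("Press the power button twice.", fake_mt))
    print(suggest("Contact support.", fake_mt))
```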
4. Neural machine translation (NMT)
Advancements in deep learning, a type of machine learning that uses artificial neural networks to learn from data, led to neural machine translation. A breakthrough came in 2016, when Google introduced its neural MT system. Using an artificial neural network loosely modeled on the human brain, this type of MT predicts the most likely sequence of words. NMT is more accurate and fluent than SMT because it learns the relationships between words and phrases. NMT can also translate rare or low-resource languages more effectively than SMT, because it can learn from large amounts of data even when the source and target texts are not perfectly aligned. There are, however, some lingering challenges with neural MT, such as long sentences and word alignment, that the industry is still working to resolve.
5. Adaptive machine translation
Another innovation, adaptive MT, also arrived in 2016. Adaptive MT is usually used with an NMT model. In this approach, translators interact with MT suggestions, effectively training the MT engine in real time.
As it’s trained, the MT engine learns new terms and phrases in the right context and can even learn your brand’s tone and voice. This helps NMT generate higher-quality translations. With adaptive MT, MT platform providers can further optimize MT engines for speed, quality, and budget for large-scale localization initiatives.
6. Large language models (LLMs)
ChatGPT, which burst onto the scene in November 2022, is the best-known example of a large language model (LLM) that can translate. Microsoft Bing (powered by GPT-4), Google’s Bard, and PaLM 2 are now available as well. These LLMs are pre-trained on massive amounts of text data, which lets them acquire language knowledge and patterns from diverse sources in many languages. They generate content by prediction, so they can produce text that sounds human-like and conversational. They are more fluent but less accurate than traditional MT, prone to errors and bias, and fairly unpredictable.
How large language models are revolutionizing automatic translations
Large language models (LLMs) have greatly advanced the fluency and creativity of AI-translated output. As we mentioned, their output sounds and feels more conversational, and more human, than that of any earlier AI translation model. Enterprises are beginning to favor LLM translation for certain use cases, despite concerns about bias and errors.
Older machine translation systems (SMT and RBMT) were built around direct word-to-word or phrase-to-phrase substitutions. That approach helps speed up translators’ workflows, but the output is clunky and needs human quality assurance to sound natural and fluent. LLM translations are more fluid, human, and conversational than AI translations produced by other methods. Human translators benefit from streamlined workflows and can complete translation work faster and more efficiently.
A side-by-side comparison of NMT and LLM translation highlights the differences:
Neural Machine Translation | Large Language Model Translation
Controllable, known, and predictable output | Somewhat unpredictable output
More accurate, but less fluent and natural | More fluent, creative, and conversational, but a larger risk of errors, omissions, and bias; still under development
Customizable by domain | Good for general purposes (lacks domain knowledge)
Needs to be trained with specialized data | May be faster and cheaper because it comes ‘pre-trained’
Works 95% of the time (and we know where it can fail) | Hard to tell when it’s going to work well
Training can be controlled using clean, bilingual data | Training happens through vast amounts of unvalidated data plus human-generated prompts
Struggles with languages where there is not enough bilingual data to train | Better for rare or low-resource languages
Works with source content | Can generate content; does not need a traditional source
At the end of the day, LLMs cannot yet fully replace human translators. They are most effective when used alongside human expertise, serving as a powerful tool that helps language professionals work more productively and efficiently. Used together, humans and machines combine the strengths of both and give enterprises the greatest benefit.
How do LLMs work?
LLMs first analyze the source text to understand its meaning. Then, they generate a translation using knowledge of both languages.
They adapt the content rather than just translating by grammar rules, so their translations align more closely with a text’s intended meaning. This is why the output sounds natural.
They are trained on very large datasets (not only clean translation data in the form of a Translation Memory).
They take background context into account when generating a translation.
They learn from additional input provided by users at the time of translation – prompts – such as style guides or terminology information.
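As a sketch of how style guidance and terminology can be passed to an LLM at translation time, the helper below assembles a plain-text prompt. The function name, prompt wording, and glossary entries are hypothetical; the resulting string would be sent to whichever LLM API you use, which is deliberately not shown here.

```python
def build_translation_prompt(
    text: str,
    source_lang: str,
    target_lang: str,
    style_notes: list,
    glossary: dict,
) -> str:
    """Assemble a plain-text LLM prompt that carries style and terminology guidance."""
    lines = [f"Translate the following text from {source_lang} into {target_lang}."]
    if style_notes:
        lines.append("Follow these style rules:")
        lines += [f"- {note}" for note in style_notes]
    if glossary:
        # Approved terminology, e.g. exported from a client's termbase.
        lines.append("Always use these approved term translations:")
        lines += [f"- {src} -> {tgt}" for src, tgt in glossary.items()]
    lines += ["Return only the translation.", "", "Text:", text]
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_translation_prompt(
        text="Sign in to view your order history.",
        source_lang="English",
        target_lang="German",
        style_notes=["Address the reader formally (Sie).", "Keep sentences short."],
        glossary={"Sign in": "Anmelden", "order history": "Bestellverlauf"},
    ))
```

Because the model treats these instructions as guidance rather than hard constraints, the output still needs checking, which is where post-editing comes in.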
But again, a caveat: because the data an LLM is built on is not controlled or cleaned, its translations can contain bias, errors, omissions, and misleading information. Humans need to review and correct this output, a process called post-editing. (Read on for more about where humans should be involved.)
Will AI translation replace human translators? Not anytime soon. When it comes to AI translation, and automation in general, the role humans will continue to play remains a concern, and people often worry they will lose their jobs. High-quality translation relies on a human-plus-machine process. The reality is that jobs will change, but they won’t go away. Humans are still needed in roles such as:
Post-editors, who make the changes needed post-translation so the text is accurate and fluent
Solution architects, who help clients decide whether, when and how to use AI translation, and set up the workflow
Language data specialists, who train MT software with their knowledge of linguistics
Quality engineers, who drive measurement programs and assess the quality of MT output so re-training can occur (using traditional TER, BLEU, and METEOR measurement frameworks)
Transcreators and in-country copywriters, who adapt or author creative or highly emotive texts that are beyond the capability of MT
Language prompt engineers, who give the LLM the right instructions to handle the translation according to the specifications.
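To show what a quality engineer’s edit-based score measures, the sketch below computes a simplified, word-level version of TER (Translation Edit Rate): the minimum number of word insertions, deletions, and substitutions needed to turn the MT output into a human reference, divided by the number of reference words. Real TER also counts shifts of whole word blocks as single edits, which this sketch omits, so it can overstate the score when words are reordered; for production scoring, an established implementation such as the one in sacreBLEU is the safer choice.

```python
def word_edit_distance(hyp: list, ref: list) -> int:
    """Levenshtein distance over words (insert, delete, substitute each cost 1)."""
    prev = list(range(len(ref) + 1))  # distance from an empty hypothesis
    for i, h in enumerate(hyp, start=1):
        curr = [i] + [0] * len(ref)
        for j, r in enumerate(ref, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete hypothesis word h
                curr[j - 1] + 1,         # insert reference word r
                prev[j - 1] + (h != r),  # substitute (free when words match)
            )
        prev = curr
    return prev[-1]


def simplified_ter(hypothesis: str, reference: str) -> float:
    """Edits needed per reference word; 0.0 means the output matches exactly."""
    hyp, ref = hypothesis.split(), reference.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return word_edit_distance(hyp, ref) / len(ref)


if __name__ == "__main__":
    # One missing word out of six reference words.
    print(round(simplified_ter("the cat sat on mat", "the cat sat on the mat"), 3))  # 0.167
```

Lower is better: a falling score across re-training rounds suggests the engine is producing output that needs less post-editing.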
Humans are not going away. Machine translation and human efforts are symbiotic: intertwined and better together. It’s often not one or the other but rather a spectrum of AI/human translation practices: at one end there is no human intervention, and at the opposite end a professional linguist will post-edit all the output.
Weighing the pros and cons of using Machine Translation
As MT technology has developed, many people assume the choice is as simple as ‘use’ or ‘don’t use’, but the reality is more nuanced. Businesses often ask us whether they should use AI translation or stick with a fully human process. For most businesses, it’s not an either/or situation but rather a mix of the two. The symbiosis between humans and machines gives enterprises great benefit: MT can act as a real-time assistant to the professional linguist and accelerate high-quality human translation. To find the right mix, weigh these factors:
Language requirements. AI translation does not handle all languages equally well. Some languages are more difficult to translate than others and some languages are not well-supported by MT systems.
The type of content. Is it complex or straightforward? Is it high-visibility or low-profile? MT handles repetitive, simple text with clearly defined language and terminology well.
Volume of your content. If you are looking to translate as much content in as little time as possible, then MT can outpace human translators and complete otherwise massive projects in short time frames.
Budget available and size of investment. Some MT options are free (such as open-source engines), while others (proprietary tools) require significant investment.
Time to deploy. Customization, usually by domain, means greater accuracy. It takes time (and a significant amount of data) to customize an engine.
Quality requirements. Low-priority / low-profile content can be perfect for MT. For example, content that will be used internally might not require the same level of polish as a landing page for a new product.
Security. Are there any legal restrictions or strict brand guidelines surrounding this content? Relying on public MT services like Google Translate can pose security risks. Some public LLM services may use the data you enter to further train their models, so your content may no longer stay private, and ownership of what comes out can be unclear.
That’s a lot to sort out, but with advice from an expert translation partner, you can make an informed decision and implement the MT approach that’s best for you. Using MT where it suits your content is a worthwhile investment: it saves money, speeds up the translation process, and shortens your time to market.
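One way to make a few of these factors actionable is a simple triage rule that maps content to a point on the human/AI spectrum: raw MT, MT with full human post-editing, or a fully human workflow. The sketch below is a hypothetical simplification, not a description of any vendor’s methodology; the field names and rules are assumptions, and it leaves out volume, budget, time to deploy, and security, which a real assessment would weigh as well.

```python
from dataclasses import dataclass
from enum import Enum


class Workflow(Enum):
    RAW_MT = "raw MT"
    MT_POST_EDIT = "MT + full human post-editing"
    HUMAN = "human translation"


@dataclass
class ContentProfile:
    """Hypothetical description of one piece of content."""
    language_well_supported: bool  # does the chosen engine handle this language pair well?
    creative_or_branded: bool      # marketing, highly emotive, or branded copy
    high_visibility: bool          # e.g. home page, landing pages, product launches
    accuracy_critical: bool        # e.g. legal text, packaging, apps, dashboards
    internal_only: bool            # e.g. content used only inside the company


def choose_workflow(p: ContentProfile) -> Workflow:
    """Toy rule: the higher the content's value or risk, the more human involvement."""
    if p.creative_or_branded or p.high_visibility or not p.language_well_supported:
        return Workflow.HUMAN
    if p.accuracy_critical:
        # A linguist reviews everything, with MT acting as an assistant.
        return Workflow.MT_POST_EDIT
    if p.internal_only:
        return Workflow.RAW_MT
    return Workflow.MT_POST_EDIT


if __name__ == "__main__":
    internal_faq = ContentProfile(True, False, False, False, True)
    print(choose_workflow(internal_faq).value)  # raw MT
```

The order of the checks encodes the rule of thumb that value and visibility outweigh speed and cost.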
Types of content that work well with Machine Translation
MT can handle many different types of content, from service manuals to websites to marketing content, but it’s key to understand that not every piece of content is a good fit for MT. Sometimes, a human linguist will produce the best result and is the only logical choice, such as for your website’s home page. Other times, MT and a human linguist can work together to deliver exactly what you need. And there are situations in which a 100% machine process will fit the need, such as a customer service chat.
The type of content you need to translate will ultimately drive the decision of whether to use MT or not. A good rule of thumb is the higher the content’s value, the more likely you’ll need a human linguist.
Let’s look at some examples to help make this clear.
MT for informational content
This type of content generally only needs to inform and lacks the nuances of more creative content. It is also less visible than, say, marketing content. In other words, the goal is communication, not perfection.
Often, human-edited or even “raw” MT is a good fit for content that wouldn’t otherwise be translated because of budget and time constraints. We typically recommend using an MT approach when clients need to translate well-structured, informational content (often consisting of millions of words).
MT for legal and similar content where accuracy is key
Accuracy is extremely important for legal documents, product packaging, apps, and user dashboards, so the obvious choice is usually a human linguist. But early in the process, adaptive MT integrated into computer-aided translation (CAT) or translation memory (TM) tools can play a helpful role. These tools can speed up the translation process for the linguist, but humans need to assess the quality of the output and make fixes.
MT for creative and marketing content
When you need to capture accuracy and nuance, or when content is highly branded, a human linguist is key. Content like marketing campaigns, website copy, emails, sales presentations, and videos needs a high level of cultural and technical accuracy. It also needs to convey a brand’s personality. A human linguist is the best way to translate this type of content correctly, so we don’t recommend MT here.
Also, a human linguist should do most, if not all, of the translation for content types like SEO copy, pay-per-click ads, and landing pages.
The factors of quality, speed, and cost in machine translation
When you compare fully human translation with machine-plus-human translation, the decision boils down to quality, speed, and cost.
Machine translation can help improve efficiency
Machine translation is automatic and near-instantaneous, giving it an advantage when it comes to speed and efficiency
The human process of translation takes more time and consideration
Automating some parts of the translation process can improve the linguists’ efficiency, and in turn, reduce costs
Machine translation can be considerably more cost-effective
Machine translation is less expensive than human translation (once customized)
The efficiency that humans gain by using MT can help reduce overall labor costs
However, if the output requires a lot of human post-editing to bring it up to the quality expectation, it may not be worth it
Note that MT deployment is not free: there are often costs to customize and train the engine
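The warning about post-editing effort can be checked with simple arithmetic: MT pays off only once the per-word saving has covered the one-time customization cost. The sketch below computes that break-even volume; all rates are hypothetical placeholders, not market prices, and real projects also vary in how much post-editing different content needs.

```python
from typing import Optional


def human_cost(words: int, rate_per_word: float) -> float:
    """Cost of a fully human translation."""
    return words * rate_per_word


def mt_post_edit_cost(words: int, mt_rate: float, post_edit_rate: float,
                      setup_cost: float) -> float:
    """Cost of MT plus human post-editing, including one-time engine customization."""
    return setup_cost + words * (mt_rate + post_edit_rate)


def break_even_words(human_rate: float, mt_rate: float, post_edit_rate: float,
                     setup_cost: float) -> Optional[float]:
    """Word volume above which MT + post-editing is cheaper than human translation.

    Returns None when post-editing is so heavy that MT never pays off.
    """
    saving_per_word = human_rate - (mt_rate + post_edit_rate)
    if saving_per_word <= 0:
        return None
    return setup_cost / saving_per_word


if __name__ == "__main__":
    # Placeholder rates in currency units per word.
    print(break_even_words(0.20, 0.01, 0.09, 10_000))  # roughly 100,000 words
    print(break_even_words(0.10, 0.01, 0.12, 10_000))  # None: post-editing too heavy
```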
Machine translation isn’t 100% reliable for top quality
MT isn’t as adept at fully accounting for connotation and cultural nuance
AI doesn’t reason, feel, or think as well as a human, and it can’t pull in broader knowledge and experience about things like culture and connotation the way humans can
Machine translation doesn’t always succeed at preserving tone and style
AI translation technology is, however, continually improving. For example, LLMs are better at translating idioms and jokes than earlier MT models.
To summarize the tradeoff: when you implement MT you’ll likely have higher speed and lower cost, but you might sacrifice quality for some content types.
How we assess if MT is a good fit for you
We work closely with our clients to identify when MT makes sense for them. Here’s how we help our clients select the MT approach that best fits their goals, needs, budget and content.
1. Requirements analysis
In this first step, we evaluate your content volume and type(s), language pair(s), quality expectations, and privacy requirements.
2. Content analysis
Here, we assess if your source text is suitable for MT from a technical and linguistic perspective. This includes file types, structure, and formatting.
3. MT platform evaluation
Based on the first two steps, we evaluate which MT platform is best suited to your localization initiative. This may include an MT output evaluation, in which we run your content through several platforms and then review and evaluate the output.
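An output evaluation like this often comes down to comparing reviewer scores for the same sample across engines. The sketch below ranks engines by mean rating, breaking ties by the share of segments rated acceptable. The 1-to-5 scale, the acceptability threshold, the engine names, and the ratings in the usage example are all hypothetical, and this illustrates the idea rather than the exact evaluation process described above.

```python
from statistics import mean


def rank_engines(ratings: dict, acceptable: int = 4) -> list:
    """Return (engine, mean rating, share of segments rated >= acceptable), best first.

    `ratings` maps an engine name to reviewer scores for the same sample
    segments, on an assumed scale of 1 (unusable) to 5 (publishable).
    Engines with no ratings are skipped.
    """
    rows = []
    for engine, scores in ratings.items():
        if not scores:
            continue
        share_ok = sum(score >= acceptable for score in scores) / len(scores)
        rows.append((engine, mean(scores), share_ok))
    return sorted(rows, key=lambda row: (row[1], row[2]), reverse=True)


if __name__ == "__main__":
    sample = {
        "engine_a": [4, 5, 3, 4],
        "engine_b": [3, 3, 4, 2],
        "engine_c": [5, 4, 4, 5],
    }
    for engine, avg, share in rank_engines(sample):
        print(f"{engine}: mean {avg:.2f}, {share:.0%} acceptable")
```

Tracking the acceptable share alongside the mean helps spot an engine whose average looks fine but which produces too many segments that need heavy post-editing.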
4. MT custom engine development
In some cases, we’ll opt to create a custom neural MT solution. We only recommend this option when we know that it will outperform all other MT platforms for the type of content that you need to translate.
We can either create this completely from scratch, or we can create a custom solution on top of one of the best available neural MT systems. First, we gather and prepare a large collection of bilingual data and adapt the content to your specific domain. Later, we’ll choose the latest and best technology available to train a state-of-the-art MT solution, completely unique and customized to your needs and content.
Partner with Acclaro machine translation experts
We understand that choosing the right MT process for your organization’s content is complicated and can be overwhelming. But it becomes straightforward when you have a language service provider (LSP) to help you sort through all the factors.
You’ll want to weigh cost, speed, and quality.
You’ll want to think through what content types you need to translate, your volumes, the languages you require, and the quality you need at the end of the day.
You’ll need help choosing and implementing the right approach – NMT or LLM?
Our experts will talk you through all the decision criteria. With experience across many MT platforms and a holistic tech approach, we have the best foundation to help you choose the right approach. If you’re considering machine translation and would like to evaluate your options, check out our machine translation services.