Machine translation tools like SYSTRAN use a technique known as language description.
To the uninitiated, translation can seem like a magic show. You put a slightly dishevelled academic-looking person in a room with a piece of paper, and an hour later they emerge with two pieces of paper, one being a translation of the first. Ta-da!
While translation may seem totally cerebral to most people, with all the work going on in the mind of the translator, the fact is that we use a variety of tools and techniques to get our work done. Most of the translation tools we use employ the same basic strategy, exemplified by SYSTRAN’s suite of translation tools, which are very popular among translators at all levels: they describe the source language, to a greater or lesser extent, as a pathway towards a fast and accurate translation.
So, what do I mean by ‘describing’ a language? Well, consider that translation can happen at more than one level. On the surface is what we sometimes call gisting, a casual translation performed in real time that gives you the ‘gist’ of the text’s meaning. In other words, you gain a shallow comprehension of the subject and details, but you haven’t done a formal translation. Gisting is a useful skill when you’re in a rush or at the very beginning of a translation task, but it can of course be dangerous if relied upon: as the saying goes, the devil is in the details.
To get deeper, you need to analyse the source document’s language: you have to be able to describe it. For human translators, this part of the work happens invisibly, inside our heads. Often we ourselves are hard-pressed to articulate the process. But machine translation does this work explicitly.
Usually it’s a three-part process:
1. Analyse Document Holistically. The first step is to look at the source document as a whole and analyse the language it contains at the paragraph level. At this level, data such as the proper names of places and people are noted, which helps determine the locality of the language, and the overall subject matter is determined, which helps centre the vocabulary choices in the right lexicon.
2. Analyse Grammar. Once you’re ‘in the ballpark,’ so to speak, the next step is to dig deeper into the text and determine the grammatical structures being used. By analysing the subjects, verbs, objects, and articles, a general description of the language can be made, which can then be matched to languages known to the system.
3. Generate Translation. Once the source language has been adequately described, it can be matched against the target language and the translation can be generated.
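For readers who like to see things spelled out, the three steps above can be sketched as a toy Python pipeline. This is only an illustration and not how SYSTRAN or any real product works: the tiny French-to-English lexicon, the function names, and the "capitalised word that does not start a sentence" rule for spotting proper names are all assumptions made up for this example.

```python
import re

# Hypothetical toy lexicon: source word -> (part of speech, target word).
# A real system would use far richer dictionaries and grammar rules.
LEXICON = {
    "le": ("article", "the"),
    "chat": ("noun", "cat"),
    "mange": ("verb", "eats"),
    "poisson": ("noun", "fish"),
}


def analyse_document(text):
    """Step 1: scan the whole text and note candidate proper names.

    Assumption: a capitalised word that is not the first word of a
    sentence is treated as a proper name.
    """
    names = []
    for sentence in re.split(r"[.!?]\s*", text):
        words = sentence.split()
        names += [w for w in words[1:] if w[0].isupper()]
    return names


def analyse_grammar(text):
    """Step 2: tag each word with a part of speech from the lexicon."""
    tokens = re.findall(r"\w+", text)
    return [(t, LEXICON.get(t.lower(), ("unknown", t))[0]) for t in tokens]


def generate_translation(tagged, names):
    """Step 3: map each described word onto the target language.

    Proper names and unknown words are passed through unchanged.
    """
    out = []
    for token, pos in tagged:
        if token in names or pos == "unknown":
            out.append(token)
        else:
            out.append(LEXICON[token.lower()][1])
    return " ".join(out)


def translate(text):
    names = analyse_document(text)
    tagged = analyse_grammar(text)
    return generate_translation(tagged, names)


print(translate("Le chat mange le poisson."))
```

The point of the sketch is the shape of the process rather than the quality of the output: each stage produces an explicit description (names, then tagged words) that the next stage consumes, which is the part a human translator does without noticing.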
This might seem awkward compared to the ‘magician’ perception mentioned above, but it is essentially what’s happening in our damp, dark brains while we work, just formalised into a process that a machine can handle. In translation, describing the source language is always the first step.
Image courtesy SYSTRAN, twitter.com