Research on Bangla Language Processing

April 15th, 2013

As Bangladesh enjoys its newfound stability and growing economic and geopolitical importance, it’s seeking to solve some translation issues with technology.

Information Technology

As Bangladesh begins to enjoy the fruits of increased stability in its government, one of the most fascinating developments has been the rapidly increasing demand that Bangla take on a more prominent role, both in the internal affairs of the country and in its relations with the rest of the world. One sector where this is becoming visible is information technology.

On the one hand, there is a robust movement to develop new technologies, software, and devices natively in Bangla, instead of building them in English or another language and then having everything translated. As anyone who has worked in software translation knows, taking hard-coded text in software and rendering it into another language is a lengthy and frustrating effort, usually with mixed results even from the most meticulous translation pro. So this internal effort is a wise one that will no doubt yield great things for Bangla speakers.

Optical Character Recognition and Machine Translation

Other developments along these lines are less encouraging, as Bangla speakers have made efforts to solve all of their language issues with technology. I for one can tell you beyond a doubt that this is not possible.

Efforts have been made along two lines. The first is technology designed to scan existing Bangla documents and use Optical Character Recognition (OCR) to automatically bring the text into a digital format. This is a great idea in general, as digital text formats are much more flexible, more powerful, and easier to work with, especially for a translator. However, as anyone who has worked with OCR can attest, it is an unreliable technology whose output often takes more effort to clean up than it would have taken to simply key in the text in the first place! And that is my experience working in English, a language with 26 letters in its alphabet and a dominant position in the technology sector, compared to Bangla with 49 letters!

The other effort is in Machine Translation, which I am here to tell you never works. Machine translation is doomed from the start, although I can understand why Bangla speakers might seek it: for all the language's 200+ million speakers, there is a dearth of Bangla translation services pros in the world.

Image courtesy amandasfulbright.blogspot.com
