Google’s play in the translation space
Over the past few years, Google have been moving more and more into the machine translation (MT) space. See, for example, their language tools page, which allows you to translate an arbitrary webpage or a snippet of text from one language to another.
Google’s approach to machine translation is what’s called statistical machine translation (SMT). Essentially, they take the millions of human-translated webpages that their search engine has already indexed and align them: that is, they match sentences in one language (let’s say English) with their counterpart sentences in the second language (let’s say Spanish).
By repeating this process across millions and millions of webpages, they can build up pretty robust statistical methods of guessing a particular phrase’s correct translation.
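To make the idea concrete, here is a toy sketch of how translation guesses can be learned from aligned sentences. It is not Google’s system: it swaps in a simplified version of the classic IBM Model 1 word-alignment procedure (expectation-maximisation, without the usual NULL word), and the four-sentence corpus and the function names `train_word_translation` and `best_translation` are invented purely for illustration.

```python
from collections import defaultdict

# Hypothetical toy corpus of already-aligned (English, Spanish) sentence pairs.
CORPUS = [
    ("the house", "la casa"),
    ("the book", "el libro"),
    ("a house", "una casa"),
    ("a book", "un libro"),
]


def train_word_translation(pairs, iterations=10):
    """Estimate P(target word | source word) with simplified IBM Model 1 EM."""
    tokenized = [(src.lower().split(), tgt.lower().split()) for src, tgt in pairs]
    tgt_vocab = {word for _, tgt in tokenized for word in tgt}
    uniform = 1.0 / len(tgt_vocab)

    # Start with every target word equally likely for every source word.
    t = defaultdict(lambda: defaultdict(lambda: uniform))

    for _ in range(iterations):
        counts = defaultdict(lambda: defaultdict(float))
        totals = defaultdict(float)

        # E-step: share each target word among the source words in its sentence,
        # in proportion to the current translation probabilities.
        for src, tgt in tokenized:
            for f in tgt:
                norm = sum(t[e][f] for e in src)
                for e in src:
                    fraction = t[e][f] / norm
                    counts[e][f] += fraction
                    totals[e] += fraction

        # M-step: renormalise the expected counts into probabilities.
        new_t = defaultdict(lambda: defaultdict(float))
        for e, row in counts.items():
            for f, count in row.items():
                new_t[e][f] = count / totals[e]
        t = new_t

    return t


def best_translation(table, word):
    """Return the most probable target word, or None for an unseen source word."""
    row = table.get(word.lower())
    if not row:
        return None
    return max(row, key=row.get)


table = train_word_translation(CORPUS)
```

On this tiny corpus, `best_translation(table, "house")` settles on `casa`, because `casa` is the only Spanish word that appears every time `house` does. The point of the paragraph above is that the same counting logic becomes far more reliable when it is fed millions of sentence pairs instead of four.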
This approach was proposed relatively early in the development of machine translation, as far back as the 1940s or early ’50s, but until recently it could not compete with the other major school of thought in the area: rule-based machine translation. Google’s innovation, of course, was that their enormous web index let them bring several orders of magnitude more data (translated web pages) to the party than any previous approach to SMT. In so doing, they showed that an abundance of data can lead to a significant improvement in the quality of the resulting translation.
Why is all of this relevant now? For two reasons. First, SFI is funding the Centre for Next Generation Localisation CSET (one of the grants in my portfolio), part of whose work includes machine translation. Second, by way of TechCrunch, I learned of the newly released Google Translator Toolkit. This toolkit is designed to work with the existing Google translation system, but also to allow human translators to add or correct the translations as they see fit.
Of course, there are many professional software tools to support human translation of software packages, websites, documents, etc., but the new Google Translator Toolkit appears to be aimed more at crowd-sourced translations. This is the latest development in website localisation (in particular), led by companies such as Facebook, where the casual (as opposed to professional) translator can translate some of the content of a site into another language. Indeed, crowd-sourced translation is also one of the areas of particular interest to CNGL.
This is a very hot area, and with the release of this toolkit it looks likely to get hotter. It’ll be interesting to see what impact this has on the translation research community, the amateur/enthusiast translator, and indeed the professional translation business.
June 15th, 2009 at 7:44 pm
The fact that Google thinks this area is interesting certainly validates SFI’s decision to invest in this technology.
Do you think that Google might be persuaded to join up with CNGL?
June 19th, 2009 at 8:10 am
Brian, it would certainly be a real coup if Google were to participate, and a real boon to the centre. Let’s hope that they can be enticed to join in the future.