Abstract
This thesis presents a formalism for specifying grammars for automatic controlled language translation. The Augmented Lexical Entries (ALE) formalism was developed in the Webtran project, which was funded by TEKES and carried out at VTT Information Technology in 1997-1999. One of the two major results of the project was the controlled language machine translation system Webtran, which is based on the ALE formalism. Controlled languages are disambiguated sublanguages of human languages. They are characterised by a specific domain of use, a selected vocabulary and simplified syntax. Their benefits include accuracy and clarity of expression, which make them well suited to tasks where faultless and efficient communication is crucial, such as technical maintenance manuals, medical epicrises and weather reports. In this thesis, controlled languages have been used in commercial product descriptions to make them multilingually accessible through automatic translation with minimal or zero post-editing. The approach is called “write-once-publish-many”.
The ALE formalism is declarative and intuitive enough for a professional translator to use. It has sufficient expressive power for the targeted commercial product descriptions, and it has been found suitable for human-assisted machine learning of translation grammars. Moreover, it has been tested and found suitable for translating in the directions Swedish→Finnish, Finnish→English and Finnish→French. Small experiments have also been carried out on translation into Estonian and Norwegian. The Webtran system and the ALE formalism have been in production use at Ellos Postimyynti Oy since spring 2000, with an annual volume of around 2000 translated catalogue pages and 10000-15000 product descriptions. An independent survey by CSC Scientific Computing Ltd found that time savings of more than 30% had been achieved after only one year of use. Today, the translators at Ellos maintain the ALE-based grammars themselves.
Original language | English |
---|---|
Qualification | Licentiate Degree |
Awarding Institution | |
Supervisors/Advisors | |
Place of Publication | Espoo |
Publisher | |
Publication status | Published - 2004 |
MoE publication type | G3 Licentiate thesis |