Grammar Formalism for Controlled Language Machine Translation: Augmented Lexical Entries: Licentiate Thesis

Aarno Lehtola

Research output: Licentiate thesis

Abstract

The thesis presents a formalism for specifying grammars for automatic controlled language translation. The Augmented Lexical Entries (ALE) formalism was developed in the Webtran project, which was funded by TEKES and carried out at VTT Information Technology in 1997-1999. One of the two major results of the project was the controlled language machine translation system Webtran, which is based on the ALE formalism. Controlled languages are disambiguated sublanguages of human languages. They are characterised by a specific domain of use, a selected vocabulary and simplified syntax. Their benefits, such as accuracy and clarity of expression, make them well suited to tasks where faultless and efficient communication is crucial, for example technical maintenance manuals, medical epicrises and weather reports. In this thesis, controlled languages are used in commercial product descriptions to make them multilingually accessible through automatic translation with minimal or zero post-editing. The approach is called “write-once-publish-many”.
The ALE formalism is declarative and intuitive, so that a professional translator can use it. It has enough expressive power for the targeted commercial product descriptions and has been found suitable for human-assisted machine learning of translation grammars. It has also been tested and found suitable for translation in the directions Swedish→Finnish, Finnish→English and Finnish→French. Small experiments have also been carried out on translation into Estonian and Norwegian. The Webtran system and the ALE formalism have been in production use at Ellos Postimyynti Oy since spring 2000, with an annual volume of around 2000 translated catalogue pages and 10000-15000 product descriptions. An independent survey by CSC Scientific Computing Ltd found that time savings of more than 30% had been achieved after only one year of use. Nowadays, the translators at Ellos maintain the ALE-based grammars themselves.
Original language: English
Qualification: Licentiate Degree
Awarding Institution
  • Helsinki University of Technology
Supervisors/Advisors
  • Honkela, Timo, Supervisor, External person
Place of Publication: Espoo
Publisher: Helsinki University of Technology
Publication status: Published - 2004
MoE publication type: G3 Licentiate thesis

Fingerprint

Natural sciences computing
Information technology
Learning systems
Communication
Experiments

Cite this

Lehtola, Aarno. / Grammar Formalism for Controlled Language Machine Translation: Augmented Lexical Entries: Licentiate Thesis. Espoo: Helsinki University of Technology, 2004. 91 p.