Using community feedback to improve social networking terminology in Microsoft products
Sara Nicolini & Palle Petersen - Microsoft
Unlike more traditional software such as operating systems and office productivity suites, social networking technology, and therefore its terminology, has developed rapidly over the last few years.
With more and more Web 2.0 applications featuring a strong social aspect, including the voice of the customer is becoming increasingly important in order to capture the latest terminology used in the social networking domain. To be successful in tomorrow’s marketplace, established businesses need to create business models that are inclusive of their customers while leveraging the global expertise, vast know-how and future potential those customers bring to the table.
To address this challenge, Microsoft has launched several initiatives to embrace end-users and the “community”. One of them is the “MTCF” terminology community engagement and feedback program, designed to assess and improve the quality of localised Microsoft Messenger and Spaces terminology through community feedback, with a focus on social networking terminology.
This presentation will cover lessons learned from the more than 1,900 terminology suggestions received across 29 EMEA languages during this feedback program. It will explore interesting community observations on existing terminology, implications for source terminology, the importance of style and “artistic license” in translation, and challenges to existing, often anecdotal, assumptions about terminology quality.
Sarawak Language Technology (SaLT) Initiative: The preservation of Sarawak Ethnic Languages
Alvin Yeo - Universiti Malaysia Sarawak
According to Ethnologue, 84% of the 46 living languages in Sarawak have fewer than 10,000 speakers. The number of speakers of these languages is also decreasing, owing to the influence of major languages such as Bahasa Malaysia and English, especially in urban areas, and to exogamy. In addition, to the authors’ knowledge, there are no software applications that support Sarawak languages. For example, there are no Iban electronic dictionaries, nor any computer-supported translation tools that could help speed up the translation of materials into the Sarawak ethnic languages, or vice versa.
To address this, the Sarawak Language Technologies (SaLT) Research Group at Universiti Malaysia Sarawak has initiated a number of projects with the end goal of revitalising and maintaining the ethnic languages of Sarawak. The ongoing projects include building corpora for Iban, Melanau and Kelabit, as well as research and development of technologies that contribute to the implementation of software for these languages. Specifically, these include the development of morphological analysers and part-of-speech (POS) taggers that contribute to work on Iban-English translation, and work on human-computer interaction using Melanau speech and text. Other projects in the pipeline include a wiki-based approach to building a Bidayuh lexicon and a web-based Sarawak Malay dictionary. These projects would not have been possible without the collaboration of partners such as the Tun Jugah Foundation, Dewan Bahasa dan Pustaka (Sarawak Branch) and Pustaka Negeri Sarawak, and national funding from the Malaysian Ministry of Science, Technology and Innovation. The final paper will provide more detailed information on these projects.
Process Automation at LSPs – It Ain’t Just About the Tools
Dr. David Filip - Moravia
As overseer of internal change projects at Moravia, David Filip knows that while we at Localisation Service Providers (LSPs) work on projects for our clients every day, we perhaps do not really understand what it takes to specify, execute and monitor the success (or failure) of a change in how things are done, especially when that change is labelled internally as a “project”. LSPs need to apply professional project management approaches inside our own houses. We can automate things, but it’s not all about the technology.
Systemic localisation validation across languages
Martin Ørsted - Microsoft
Microsoft has designed an internal scripting tool that lets us systematically catch and fix known localisation issues across all languages. The aim is that an issue only ever has to be found once, in whatever language; by attaching localisation verification comments to the affected resource, we can then ensure that the issue is caught in every language where it may occur. Adopting this practice allows us to cut down dramatically on certain forms of testing, and makes the use of orthogonal arrays across languages and of pseudo-localisation much more efficient.
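The find-once, catch-everywhere idea can be illustrated with a minimal sketch. It is not Microsoft's internal tool: the rule syntax embedded in the verification comment (`{MaxLength=...}`, `{KeepPlaceholders}`, `{Locked=...}`), the `Resource` structure and the sample strings are all hypothetical. What it shows is that a rule written once against the source resource is then checked automatically in every target language.

```python
import re
from dataclasses import dataclass, field

# Hypothetical rule syntax for verification comments, e.g. "{MaxLength=20}{KeepPlaceholders}".
RULE = re.compile(r"\{(\w+)(?:=([^}]*))?\}")
# .NET-style numbered placeholders such as {0}.
PLACEHOLDER = re.compile(r"\{\d+\}")


@dataclass
class Resource:
    """A source string, its verification comment and its translations (language code -> text)."""
    resource_id: str
    source: str
    comment: str
    translations: dict = field(default_factory=dict)


def parse_rules(comment: str) -> dict:
    """Map each rule name in the comment to its value ('' for rules without a value)."""
    return {name: value for name, value in RULE.findall(comment)}


def check(resource: Resource) -> list:
    """Apply every rule attached to the resource to every language; return (language, problem) pairs."""
    rules = parse_rules(resource.comment)
    source_placeholders = sorted(PLACEHOLDER.findall(resource.source))
    problems = []
    for lang, target in sorted(resource.translations.items()):
        if "MaxLength" in rules and len(target) > int(rules["MaxLength"]):
            problems.append((lang, f"longer than {rules['MaxLength']} characters"))
        if "KeepPlaceholders" in rules and sorted(PLACEHOLDER.findall(target)) != source_placeholders:
            problems.append((lang, "placeholders differ from source"))
        if "Locked" in rules and rules["Locked"] not in target:
            problems.append((lang, f"locked term '{rules['Locked']}' missing"))
    return problems


if __name__ == "__main__":
    res = Resource(
        resource_id="IDS_SIGNIN",
        source="Sign in to {0}",
        comment="{MaxLength=20}{KeepPlaceholders}",
        translations={
            "de-DE": "Bei {0} anmelden",
            "fr-FR": "Se connecter à",                   # placeholder dropped
            "it-IT": "Accedi a {0} con il tuo account",  # exceeds the length limit
        },
    )
    for lang, problem in check(res):
        print(lang, problem)
```

Because the rule lives with the resource rather than with a language-specific test case, an issue first spotted in one language is checked in all the others without anyone having to rediscover it.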
Open standards in use in localisation - an engineering approach
Andres Vega - Tek Translations
This session will provide an overview of existing open standards for content handling and localisation from a technical and engineering perspective. For some of these standards, attendees will learn about their advantages, both general and localisation-specific, as well as their implementation challenges. Practical, hands-on examples will illustrate how they can be beneficial in a real production scenario.
The areas covered will include:
Localisation-specific standards will also be discussed, with a special focus on XLIFF, TMX and TBX, and a brief look at SRX.
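As a small hands-on illustration of the localisation standards named above, the sketch below reads translation units from a TMX 1.4 document using only Python's standard `xml.etree.ElementTree`. The function name `read_tmx` and the sample memory are invented for illustration; a production reader would also need to handle inline codes, encodings and very large files.

```python
import xml.etree.ElementTree as ET

# ElementTree reports xml:lang under the predefined XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def read_tmx(tmx_text: str, src: str, tgt: str) -> list:
    """Return (source, target) segment pairs for one language pair; language codes match case-insensitively."""
    root = ET.fromstring(tmx_text)
    src, tgt = src.lower(), tgt.lower()
    pairs = []
    for tu in root.iter("tu"):
        segs = {}
        for tuv in tu.findall("tuv"):
            seg = tuv.find("seg")
            if seg is not None:
                # itertext() joins all text inside <seg>, including the content of any
                # inline elements such as <bpt>; real readers treat those codes separately.
                segs[(tuv.get(XML_LANG) or "").lower()] = "".join(seg.itertext())
        if src in segs and tgt in segs:
            pairs.append((segs[src], segs[tgt]))
    return pairs


SAMPLE = """<tmx version="1.4">
  <header creationtool="example" creationtoolversion="1.0" segtype="sentence"
          o-tmf="none" adminlang="en-US" srclang="en-US" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en-US"><seg>Save the file.</seg></tuv>
      <tuv xml:lang="de-DE"><seg>Speichern Sie die Datei.</seg></tuv>
    </tu>
  </body>
</tmx>"""

if __name__ == "__main__":
    for source, target in read_tmx(SAMPLE, "en-US", "de-DE"):
        print(source, "->", target)
```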
Checking Terminology Consistency with Statistical Methods
Alfredo Maldonado Guerra & Masaki Itagaki - Microsoft
Work in statistical language technologies has produced numerous techniques that can be applied to software localisation. This paper explores the application of statistical methods to the automatic validation of terminology consistency in localised software.
It sets out a statistical algorithm that identifies the translation of a given source term in a software localisation project and then determines whether that translation has been used consistently within the project. However, the accuracy of this algorithm depends on the amount of linguistic data available to it (the more, the better), and since the typical software project is small by traditional statistical NLP standards, we need to find a way to compensate for this lack of data.
At the same time, the algorithm needs to deal with the different grammatical features of each target language we work with at Microsoft. To address these issues, the authors chose a hybrid approach combining statistical analysis with a minimal grammatical model. We discuss the statistical analysis applied and show the minimum amount of linguistic knowledge the model needs in order to deal with these issues successfully.
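The abstract does not give the algorithm itself, so the sketch below swaps in a simple, well-known association measure, the Dice coefficient, to illustrate the two steps it describes: guess how a source term is translated in a project, then flag segments that do not use that translation. It assumes sentence-aligned segment pairs and uses lower-casing and tokenisation as its only linguistic knowledge; the names `check_term`, `tokens`, `PAIRS` and the German toy data are illustrative.

```python
import re
from collections import defaultdict


def tokens(text: str) -> list:
    """Lower-cased word tokens; the regex word class is Unicode-aware, so umlauts and ß are kept."""
    return re.findall(r"\w+", text.lower())


def ngrams(toks: list, max_n: int) -> set:
    """All contiguous n-grams of length 1..max_n, as tuples."""
    return {tuple(toks[i:i + n]) for n in range(1, max_n + 1) for i in range(len(toks) - n + 1)}


def contains(toks: list, phrase: tuple) -> bool:
    n = len(phrase)
    return any(tuple(toks[i:i + n]) == phrase for i in range(len(toks) - n + 1))


def check_term(pairs: list, term: str, max_n: int = 3):
    """Guess the dominant translation of `term`, then list segments whose target does not use it.

    Returns (translation or None, sorted indices of inconsistent segments).
    """
    term_toks = tuple(tokens(term))
    target_toks = [tokens(tgt) for _, tgt in pairs]
    with_term = {i for i, (src, _) in enumerate(pairs) if contains(tokens(src), term_toks)}

    # Record, for every target n-gram, the set of segments it occurs in.
    seen_in = defaultdict(set)
    for i, toks in enumerate(target_toks):
        for gram in ngrams(toks, max_n):
            seen_in[gram].add(i)
    if not with_term or not seen_in:
        return None, []

    def dice(gram):
        # Dice coefficient between "segment contains the source term" and "segment contains gram".
        return 2 * len(with_term & seen_in[gram]) / (len(with_term) + len(seen_in[gram]))

    # Highest Dice score wins; ties prefer the longer phrase, then a fixed lexical order.
    best = max(seen_in, key=lambda g: (dice(g), len(g), g))
    inconsistent = sorted(i for i in with_term if not contains(target_toks[i], best))
    return " ".join(best), inconsistent


# Toy English -> German segments; the third uses a different rendering of "Control Panel".
PAIRS = [
    ("Open the Control Panel.", "Öffnen Sie die Systemsteuerung."),
    ("Your devices are shown in the Control Panel.", "In der Systemsteuerung werden Ihre Geräte angezeigt."),
    ("Close the Control Panel.", "Schließen Sie das Bedienfeld."),
    ("Open the file.", "Öffnen Sie die Datei."),
]

if __name__ == "__main__":
    print(check_term(PAIRS, "Control Panel"))
```

On real data, inflected forms of the same translation would count as different n-grams; handling them is where some grammatical knowledge of each target language would come in.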
Linguistic Comparison and Analysis of Statistical Post-Editing between Chinese and Japanese
Midori Tatsumi & Yanli Sun - Symantec/Dublin City University
Statistical post-editing (SPE) has been attracting increasing attention. While this technique can greatly improve the quality of some machine translation (MT) output, it sometimes makes unwanted and inappropriate changes. In addition, some problems in MT output cannot easily be addressed by SPE. This paper analyses the results of a recent SPE experiment conducted by Systran on English-to-Chinese and English-to-Japanese translation using data provided by Symantec. The improvements and degradations made by SPE will be compared both quantitatively and qualitatively, focusing on the similarities and differences between the two languages in the advantages and disadvantages of SPE. The paper also investigates whether SPE reduces the human post-editing effort that may be required after the SPE process when the text is prepared for publication. The first part of the paper introduces the theoretical and research background on combining Rule-Based Machine Translation (RBMT) with SPE. The second part briefly describes the methodology of the current research. A detailed comparison, analysis and discussion are then presented in the third part. Finally, a short conclusion summarises the research and points out future work.
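The abstract does not say which metric the quantitative comparison uses. As one illustrative option, the sketch below labels each segment as improved, degraded or unchanged by comparing its edit distance to a human reference before and after SPE. Because Chinese and Japanese are not space-delimited, distances are computed over characters. The function names and the Japanese toy segments are hypothetical, not data from the Symantec/Systran experiment.

```python
from collections import Counter


def units(text: str) -> list:
    """Character units with whitespace removed; Chinese and Japanese are not space-delimited."""
    return [ch for ch in text if not ch.isspace()]


def edit_distance(x, y) -> int:
    """Levenshtein distance between two sequences (insertions, deletions, substitutions)."""
    prev = list(range(len(y) + 1))
    for i, xi in enumerate(x, 1):
        cur = [i]
        for j, yj in enumerate(y, 1):
            cur.append(min(prev[j] + 1,                # delete xi
                           cur[j - 1] + 1,             # insert yj
                           prev[j - 1] + (xi != yj)))  # substitute, or match at no cost
        prev = cur
    return prev[-1]


def classify(rbmt: str, spe: str, reference: str) -> str:
    """Did SPE move the RBMT output closer to, or further from, the human reference?"""
    before = edit_distance(units(rbmt), units(reference))
    after = edit_distance(units(spe), units(reference))
    if after < before:
        return "improved"
    if after > before:
        return "degraded"
    return "unchanged"


if __name__ == "__main__":
    # Toy (RBMT output, SPE output, reference) triples; not data from the study.
    segments = [
        ("ファイルを保存する", "ファイルを保存します", "ファイルを保存します"),
        ("ファイルを保存します", "ファイル保存します", "ファイルを保存します"),
    ]
    print(Counter(classify(*s) for s in segments))
```

A reference-based distance only approximates post-editing effort; measuring actual edits made by post-editors would answer that question more directly.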
Automation of Terminology tasks using T-Manager
Rafael Guzman - Symantec
Good-quality, controlled terminology is critical for the success of rule-based Machine Translation (MT). Managing such terminology typically involves tasks such as leveraging, checking for duplicates and deprecated terms, customisation, glossary alignment and generating metrics reports. These tasks depend heavily on terminology comparisons, which require specific criteria. Unfortunately, these criteria are often underestimated or even neglected. When this happens, unexpected results and hidden issues arise. Doing these tasks manually is tedious, time-consuming, expensive and error-prone. This presentation will provide an overview of the necessary terminology comparison criteria, as well as a demo showing how many of these tasks can be automated with the T-Manager Terminology Tool.
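To make the role of comparison criteria concrete, the sketch below turns a few plausible criteria into explicit switches and shows how they change which glossary entries count as duplicates. It is not T-Manager's implementation, whose options the abstract does not list; `CompareCriteria`, `comparison_key`, `find_duplicates` and the sample terms are hypothetical.

```python
import unicodedata
from dataclasses import dataclass


@dataclass(frozen=True)
class CompareCriteria:
    """Hypothetical comparison switches; each one changes what counts as 'the same term'."""
    ignore_case: bool = True
    collapse_whitespace: bool = True
    unicode_nfkc: bool = True        # fold compatibility forms, e.g. full-width letters
    strip_punctuation: bool = False


def comparison_key(term: str, c: CompareCriteria) -> str:
    """Normalise a term according to the selected criteria."""
    if c.unicode_nfkc:
        term = unicodedata.normalize("NFKC", term)
    if c.strip_punctuation:
        # Drop every character in a Unicode punctuation category (Pc, Pd, Ps, Pe, Pi, Pf, Po).
        term = "".join(ch for ch in term if not unicodedata.category(ch).startswith("P"))
    if c.collapse_whitespace:
        term = " ".join(term.split())
    if c.ignore_case:
        term = term.casefold()
    return term


def find_duplicates(terms: list, c: CompareCriteria) -> list:
    """Group terms that compare equal under the criteria; keep only groups with more than one member."""
    groups = {}
    for t in terms:
        groups.setdefault(comparison_key(t, c), []).append(t)
    return [g for g in groups.values() if len(g) > 1]


TERMS = ["Anti-virus", "anti-virus", "Anti-Virus ", "Antivirus", "Firewall", "Ｆｉｒｅｗａｌｌ"]

if __name__ == "__main__":
    print(find_duplicates(TERMS, CompareCriteria()))
    print(find_duplicates(TERMS, CompareCriteria(strip_punctuation=True)))
    print(find_duplicates(TERMS, CompareCriteria(ignore_case=False, collapse_whitespace=False, unicode_nfkc=False)))
```

The same glossary yields two, two (larger) or zero duplicate groups depending on the switches, which is the kind of hidden variation the abstract warns about when criteria are left implicit.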