LRC XII 
The Localisation Research Forum

The 12th Annual Internationalisation and Localisation Conference 
organised by the Localisation Research Centre (LRC)
with the Global Initiative for Local Computing (GILC)

26 - 27 - 28 September 2007
European Foundation, Loughlinstown, 
Dublin, Ireland

Programme Abstracts

Title: Careers in Localisation and the CLP programme

Presenters: Angela Starkmann-Lehr & Inger Larsen

When starting a career in localisation, people are often not quite aware of where they are heading. The group of ‘Localisation Professionals’ consists of well-trained professionals who could just as easily have been working in another area of business. They start working in the industry when a job turns up, and just as easily choose another job in another field (as a career step, or simply to do something different again).

The localisation training programmes that have emerged in recent years (such as those at the LRC) are a welcome change to this perspective: students now actually 'plan' to become localisation professionals after attending one of these programmes. Still, there are many professionals, often rather young, who have no formal training as a ‘localiser’ and may take their talents and expertise with them when they change jobs. The goal should therefore be to create a common awareness of, and pride in, the localisation industry and its professionals, so that people can more easily be convinced not to jump ship and can be proud of the work they do.

This presentation will summarise possible professional choices and how to get there, offering a perspective on relevant career moves and how to plan for them. It should be considered a practical exercise relevant to everybody working in the industry: what is going on today, and what can I do about it? Within the framework of professional training, the emerging CLP programme and the TILP organisation will receive special attention.

Title: Automating Your QA Process to Save Time and Improve Consistency

Presenter: Iñaki Hernández-Lasa

Automating your localisation efforts is no longer a luxury – it’s a must. The larger the enterprise, the more critical automation has become – not merely for cost, but for quality. There’s little argument today that a well-designed, efficiently automated language QA system can significantly improve localization project output. 

Automation has become the single greatest methodology for improving translation quality and maintaining consistency of terminology and grammatical correctness, all vital to a project’s success. Today’s new breed of language quality checker tools can perform highly sophisticated automated QA checks such as glossary adherence, translation consistency and spelling, and can automate routine proofreading tasks that would otherwise be tedious and time-consuming. The result is consistent quality, a more streamlined final QA cycle and faster time-to-market.

In this presentation, based on a production project scenario, Iñaki Hernández-Lasa will describe the state of the art in automating language quality tools, exploring advanced techniques such as hashtables, a regular expression engine, the Windows IFilter interface, an XML parser, XSL transformations to generate different output files, and opportunities for using an advanced stemmer.
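
To illustrate two of the checks mentioned above, the following Python sketch uses a hashtable (a dict) to flag source segments that were translated inconsistently, and a regular expression to test glossary adherence. It is a minimal sketch assuming segments arrive as (source, target) pairs and that the glossary is a simple term-to-translation mapping; the function names and data formats are hypothetical and are not taken from the presenter's tool.

```python
import re
from collections import defaultdict


def find_inconsistencies(segments):
    """Return source segments that were translated in more than one way.

    segments: iterable of (source, target) string pairs.
    """
    # Hashtable: normalised source text -> set of distinct translations seen.
    translations = defaultdict(set)
    for source, target in segments:
        translations[source.strip()].add(target.strip())
    return {src: sorted(tgts) for src, tgts in translations.items() if len(tgts) > 1}


def check_glossary(segments, glossary):
    """Flag segments whose source uses a glossary term but whose target lacks
    the approved translation.

    glossary: dict mapping source term -> approved target term (hypothetical format).
    Returns a list of (segment_index, term, approved_translation) tuples.
    """
    issues = []
    for index, (source, target) in enumerate(segments):
        for term, approved in glossary.items():
            # Whole-word, case-insensitive match of the source term.
            in_source = re.search(rf"\b{re.escape(term)}\b", source, re.IGNORECASE)
            if in_source and approved.lower() not in target.lower():
                issues.append((index, term, approved))
    return issues


segments = [
    ("Save the file", "Guardar el archivo"),
    ("Save the file", "Guarde el fichero"),
    ("Open the file", "Abrir el archivo"),
]
print(find_inconsistencies(segments))
# {'Save the file': ['Guardar el archivo', 'Guarde el fichero']}
print(check_glossary(segments, {"file": "archivo"}))
# [(1, 'file', 'archivo')]
```

In a production checker the same hashtable would typically be keyed on a normalised form of the source (case, whitespace, tags), which is where a stemmer could also come in.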

Title: Exploding the myth (the gerund in machine translation)

Presenter: Nora Aranberri

This paper proposes to examine the relation between specific source-language linguistic structures and machine translation output across a range of target languages. In particular, it aims to determine the causes of "unacceptable" MT output. The work described in this paper is carried out in the context of a large IT developer (Symantec) which, owing to the growing translation volume and the required speed of technical documentation localisation, opted for a rule-based machine translation system to translate its product content.

For the purpose of this paper, English source structures involving “-ing” words were examined in their French, German, Japanese and Spanish translations. We used standard evaluation methodologies to isolate unsatisfactory output. We then attempt to classify the causes of this unsatisfactory output into two categories: problems that can be ascribed to source disambiguation, and problems that can be ascribed to poor performance in generating the translated output. The paper will present the results of the experiment and propose recommendations for improving source control and correcting target generation.
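
A study like this first needs to collect the source sentences that contain “-ing” words. The Python sketch below is a deliberately simple, hypothetical pre-filter based on a regular expression and a small exclusion list; it only finds surface candidates and does not distinguish gerunds from participles or adjectives, which in practice would require a part-of-speech tagger. It is not the method used in the paper.

```python
import re

# Hypothetical exclusion list: common words ending in "-ing" that are not verb forms.
NON_VERB_ING = {"thing", "something", "anything", "nothing", "everything",
                "string", "during", "king", "ring", "spring"}

# A whole word ending in "ing" (plural forms such as "settings" are not matched).
ING_WORD = re.compile(r"\b[A-Za-z]+ing\b")


def ing_candidates(sentence):
    """Return candidate "-ing" words in a sentence, in order of appearance."""
    return [w for w in ING_WORD.findall(sentence) if w.lower() not in NON_VERB_ING]


print(ing_candidates("Installing the update requires restarting the computer."))
# ['Installing', 'restarting']
```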

Title: Using heuristic methods for automatically correcting Persian texts

Presenter: Mohammad Azadnia

There are a great many Persian electronic texts, and their number increases exponentially each year. The more documents are generated, the more incorrect words find their way into them. Because Persian has some special characteristics, many parameters must be considered for error detection and correction. Manual error correction is time-consuming and costly, and some incorrect words may still be missed. In this paper we introduce a new automatic Persian text correction system based on heuristic methods. This system could be used for post-processing two inherently different kinds of Persian text: those generated by Persian optical character recognition (OCR) systems and those typed directly by humans.
The system consists of a Persian lexicon, an error detection unit and an error correction module. A set of heuristic functions models the errors typical of each of the two kinds of text production, so that these errors can be corrected automatically. To improve the system, we used different approaches to build a special Persian lexicon; these approaches were tested and the best ones were selected for use in the automatic text error correction system.
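
The abstract does not describe the heuristic functions themselves. As an illustration only, the Python sketch below shows one common way to model OCR errors: a weighted edit distance in which substituting visually similar letters (here, a hypothetical list of Persian letters that differ mainly in their dots or small strokes) costs less than other edits, so that a lexicon lookup prefers OCR-plausible corrections. The confusion list, the costs, the threshold and the toy lexicon are all assumptions.

```python
# Hypothetical pairs of Persian letters that differ mainly in dots or small strokes,
# and are therefore easily confused by OCR systems.
CONFUSABLE = {frozenset(pair) for pair in [
    ("ب", "ت"), ("ت", "ث"), ("ب", "ث"), ("ب", "ن"), ("ت", "ن"),
    ("ج", "چ"), ("ح", "خ"), ("ر", "ز"), ("س", "ش"), ("ک", "گ"),
]}

# Toy lexicon (book, house, work); a real system would use a large Persian lexicon.
LEXICON = {"کتاب", "خانه", "کار"}


def weighted_distance(a, b, ocr=True):
    """Levenshtein distance where confusable substitutions cost 0.5 if ocr is True."""
    previous = [float(j) for j in range(len(b) + 1)]
    for i, char_a in enumerate(a, start=1):
        current = [float(i)]
        for j, char_b in enumerate(b, start=1):
            if char_a == char_b:
                substitution = 0.0
            elif ocr and frozenset((char_a, char_b)) in CONFUSABLE:
                substitution = 0.5
            else:
                substitution = 1.0
            current.append(min(previous[j] + 1,       # deletion
                               current[j - 1] + 1,    # insertion
                               previous[j - 1] + substitution))
        previous = current
    return previous[-1]


def correct(word, lexicon, ocr=True, max_cost=1.0):
    """Return word if it is in the lexicon, else the closest entry within max_cost.

    Use ocr=True for OCR output and ocr=False for text typed by humans.
    """
    if word in lexicon:
        return word
    cost, best = min((weighted_distance(word, w, ocr), w) for w in lexicon)
    return best if cost <= max_cost else word


print(correct("کناب", LEXICON))  # an OCR misread of the word for "book"
```

A typed-text model would use a different heuristic, for example lower costs for letters on adjacent keys, which is why the ocr flag switches the weighting off.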

Title: Using pseudo translation to ensure world-ready software

Presenter: Aidan Killeen

McAfee has been using pseudo translation as a core part of its localisation development cycle for the last two years. It has been a very successful program and has resulted in dramatic improvements in quality along with savings in both time and costs.

This paper covers what pseudo translation is, how it was implemented in McAfee, what type of internationalisation defects were found and a summary of our estimated cost savings. It will also cover some of the improvements that were made after the initial attempts at pseudo translation and some criteria for tool selection. 
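
Pseudo translation is commonly implemented by transforming each source string so that internationalisation defects become visible before real translations exist. The Python sketch below shows one such transformation under stated assumptions: letters are swapped for accented look-alikes to expose encoding and font problems, the string is padded (30% here, an arbitrary figure) to simulate text expansion, and the result is bracketed so that truncated or hard-coded strings stand out. Placeholders such as %d and {name} are left untouched. The character map and padding rules are hypothetical and are not McAfee's implementation.

```python
import math
import re

# Hypothetical map from ASCII letters to accented look-alikes.
ACCENTED = str.maketrans({
    "a": "á", "e": "é", "i": "í", "o": "ó", "u": "ú",
    "A": "Á", "E": "É", "I": "Í", "O": "Ó", "U": "Ú",
    "c": "ç", "C": "Ç", "n": "ñ", "N": "Ñ",
})

# printf-style and brace-style placeholders must survive unchanged.
PLACEHOLDER = re.compile(r"(%[sd]|\{[^}]*\})")


def pseudo_translate(text, expansion=0.3, pad_char="~"):
    """Return a pseudo-translated version of text.

    Letters outside placeholders are replaced by accented look-alikes, the
    string is padded by `expansion` (a fraction of its length) to simulate
    longer translations, and the result is wrapped in brackets.
    """
    parts = PLACEHOLDER.split(text)  # odd indices hold the captured placeholders
    converted = "".join(part if i % 2 else part.translate(ACCENTED)
                        for i, part in enumerate(parts))
    padding = pad_char * math.ceil(len(text) * expansion)
    return f"[{converted}{padding}]"


print(pseudo_translate("Save file"))                        # [Sávé fílé~~~]
print(pseudo_translate("Hello {name}, you have %d files"))
```

When such a build is run, a missing bracket indicates truncation, unaccented text indicates a string that bypassed the resource files, and garbled accents point to encoding or font problems.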

Title: Future web-based translation environments 

Presenter: Stefan Kreckwitz

Web applications have been finding their way into the translation industry for many years. These applications were predominantly limited to terminology systems and project portals. As is common for conventional Web applications, user interaction was read-only or form-based. More recently, Web 2.0 has appeared as a term for the new generation of Web applications. Apart from social aspects, Web 2.0 designates a combination of technologies that can enable rich user interfaces and PC-equivalent interactivity. Some manufacturers of translation software have recognised the potential of Web 2.0 and are working on Web-based translation environments. The advantages of these solutions are promising. At present, however, Web-based translation environments are quite restricted in comparison with the market-leading Windows applications.

This talk addresses the challenges and opportunities of prospective Web-based translation environments. Apart from functional convergence with their Windows-based counterparts, the talk discusses ideas such as "software as a service" and the harnessing of collective intelligence.

Title: The Expansion/Compression Factor in Translated Texts

Presenter: Luis R. Cerna

The objective of this study is to identify, scientifically, the expansion or compression factor in texts translated from DE, EN or ES into EN, ES or DE respectively. In translation it is very difficult to obtain a statistically significant number of examples in a given technical field in all the languages analysed. This research was carried out on a set of 50 documents for each target language in each specialist field.

This work is relevant both to translators and their clients and to those involved in the localisation industry. In translation it matters because the market is nowadays extremely price-oriented, so it is becoming increasingly important for both the client and the freelancer to have a practical guide to reliable cost estimation. Clients need fast quotations, and translators must be able to base these on reliable calculations; this research can help in this regard.

In localisation it matters because most designers lay out documents for the original language (i.e. if the original manuscript is in German, they design the layout around the German text). The space is defined for German, and only when the translation arrives do they realise that other languages need a different amount of space. By applying the results of this study, designers, layout artists, localisers, etc. would be able to calculate the space needed for each language being localised.
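
The abstract does not state how the factor is computed. Assuming it is defined as the ratio of total target length to total source length over a corpus (measured here in characters; words would work the same way), a factor above 1 indicates expansion and below 1 compression, and the space to reserve for a translation is the source length multiplied by the factor, rounded up. A minimal Python sketch of that calculation, with illustrative rather than measured numbers:

```python
import math


def expansion_factor(lengths):
    """Aggregate expansion (> 1) or compression (< 1) factor for one language pair.

    lengths: (source_length, target_length) pairs, one per document, both
    measured in the same unit (characters in this sketch).
    """
    total_source = sum(source for source, _ in lengths)
    total_target = sum(target for _, target in lengths)
    if total_source == 0:
        raise ValueError("the source corpus is empty")
    return total_target / total_source


def space_needed(source_length, factor):
    """Space to reserve for the translation, rounded up to a whole unit."""
    return math.ceil(source_length * factor)


factor = expansion_factor([(1000, 1250), (3000, 3750)])  # illustrative numbers
print(factor, space_needed(600, factor))  # 1.25 750
```

Summing lengths before dividing weights long documents more heavily than averaging per-document ratios would; which choice is more appropriate depends on how the corpus was sampled.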

Title: Collaboration Between an LSP and a Technology Provider in Practice: Why and How Moravia Worldwide and Idiom Technologies are Working Together

Presenters: Gráinne Maycock, Peter Reynolds

The quest for value is driven by our clients’ requirements to manage more volume (more scope, more languages, more projects) with “flat” budgets – i.e., to do more with less. Another driver of the need to show value is merger and acquisition activity, both within the end-user client community (clients merging with clients) and in the supplier community (LSPs acquiring technologies; LSPs merging).
Technology is part of the solution in addressing value. LSPs (such as Moravia) invest in technology and development to manage linguistic assets and workflow, but so do technology providers and so do clients. Do we all need to make these investments in technology? And where should a workflow solution be hosted? At the client site? The LSP site? The technology provider site?
Which alternative technological solutions for adding value exist, and which are best for the end-user client? Is it a solution owned and managed by the LSP, where the end-user client “does nothing”? Or does the solution come from an LSP (such as SDL TeamWorks) or an independent technology provider (such as Idiom’s WorldServer) and get implemented at the end-user client site?
Independent LSPs are aware of their own strategies, but seek to understand the strategies of independent technology suppliers to see whether a joint approach, highlighting the strengths of each (scalable service delivery and technology, respectively), can benefit all.

Set against this background, this session will address the technology considerations, with their wide-reaching implications, that most LSPs face today, and will focus on the practical collaboration between Moravia and Idiom.

Title: What You See Is What You Get? - A Pilot Experiment on Access to Visual Information in Translation Interfaces

Presenter: José-Ramón Biau-Gil

Translation memories are, no doubt, the most widely used and probably the most controversial technology in the translation industry. Their use has an impact on terminology, style, layout, pricing and deadline decisions, to name but a few key issues for professional translation projects.

This paper presents the results of a pilot study that aims to find out whether the translation activity changes when translators work with a WYSIWYG interface or with a character-only interface of a translation memory system. We compared translations done with Word + Trados (where informants were able to see the text layout) and with Trados TagEditor (where informants only had access to the text layout by checking a PDF file of the source text). The presentation includes data on the translation output and on the translation process, which was tracked using a screen recording tool. The results of this pilot experiment show that the type of interface used influences not only the translation output but also the strategies used by translators.

Title: Keyboards for Indic Languages

Presenters: Gihan V. Dias, G. Balachandran

The keyboard remains the most popular text input method for computer applications. For most alphabetic languages, keyboard layouts and input methods have been standardised and accepted for each language and country. Each letter is typically assigned to one key, possibly together with a shift or control key. However, the same is not true for the abugida-based Indic languages of South Asia. Most Indic languages lack a widely accepted keyboard, and use a variety of keyboarding techniques and layouts. Software developers and localisers must therefore deal with the multiple keyboard layouts and input methods used for each language.

This paper first identifies the basic types of Indic keyboard, namely consonant-vowel, typewriter and transliteration keyboards, and compares their features and utility. In consonant-vowel keyboards, words are typed linguistically, i.e. vowels are typed after their associated consonants, irrespective of how they are written. Typewriter keyboards are based on manual typewriters, and symbols are generally typed from left to right as they appear on the page. In transliteration keyboards, an approximation of the text is typed in English characters. We evaluate the efficiency and user acceptance of these methods. Sinhala and Tamil are both Indic-script languages used in Sri Lanka.
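
Unicode stores Indic text in logical (phonetic) order: a dependent vowel sign always follows its consonant in memory, even when it is drawn to the left of it. This is why a consonant-vowel keyboard can be mapped very directly onto Unicode. The Python sketch below illustrates the idea for a handful of Tamil letters; the key assignments are hypothetical and are not the standard layouts discussed in the paper.

```python
# Hypothetical key assignments for a few Tamil letters (not a standard layout).
CONSONANTS = {"k": "\u0B95", "t": "\u0BA4", "n": "\u0BA8", "m": "\u0BAE"}  # க த ந ம

# key -> (independent vowel letter, dependent vowel sign); "" means the inherent vowel
VOWELS = {
    "a": ("\u0B85", ""),        # அ
    "A": ("\u0B86", "\u0BBE"),  # ஆ / ா
    "i": ("\u0B87", "\u0BBF"),  # இ / ி
    "e": ("\u0B8E", "\u0BC6"),  # எ / ெ (drawn to the LEFT of its consonant)
}


def type_keys(keys):
    """Convert a sequence of key presses into Unicode text, consonant-vowel style."""
    out = []
    after_consonant = False
    for key in keys:
        if key in CONSONANTS:
            out.append(CONSONANTS[key])
            after_consonant = True
        elif key in VOWELS:
            independent, sign = VOWELS[key]
            # A vowel typed after a consonant attaches to it as a dependent sign;
            # anywhere else it is written as an independent vowel letter.
            out.append(sign if after_consonant else independent)
            after_consonant = False
        else:
            out.append(key)  # pass through spaces, digits, punctuation
            after_consonant = False
    return "".join(out)


print(type_keys("ke"))
```

Typing k then e yields U+0B95 followed by U+0BC6, which renders with the vowel sign to the left of the consonant even though it was typed second. A typewriter layout, by contrast, has the user type that left-hand sign first, so the input method must reorder it before storage.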

The second part of this paper comprises case studies of the development of standard computer keyboards for these two languages. A widely-used standard keyboard provides major benefits in hardware manufacture and procurement, software development, training, and usability. However, weaning users from their familiar keyboard is hard. We endeavoured to balance the usability, efficiency and user acceptance of the keyboards.

Title: Minority language success: Not only Catalan (but in Catalan)

Presenter: Felix Donoso

While the promotion of the Catalan language in industries such as cinema and video games has clearly been unsuccessful, there are other areas, such as software and the Internet, where Catalan has been able to strengthen its position. Why are big companies such as Microsoft, IBM and Novell translating their products into Catalan? Does it have anything to do with the open source community? This paper will try to answer these questions, analysing the situation from several points of view. Issues that will be touched upon include:

History of the Catalan language: a brief history of the language from its beginning until today
Geographical frame: in which territories Catalan is spoken and some figures about the use of the language in different areas.
Legal frame: an explanation of the legal status of the language in Spain and Europe.
The peculiarities in the social and political environment of Catalonia and the Catalan language.
Official institutions: which official institutions are supporting Catalan and specifically those that are supporting the widespread use of Catalan in the IT world. 
Why localize a product into Catalan? From a “nice to have” (paid for by the Catalan government) to a “must have” (paid for by the publishers)
The creation of the ".cat" domain for Catalan-related websites: in less than a year, the number of registered domains has risen from 8364 to 21798!

Title: Computational Linguistics and Challenges of Persian Script and Language

Presenters: Ehsaneh Vilataj, Maryam Mahmoudi & Mohammad Azadnia

Language and text processing are essential parts of computational linguistics. The field also encompasses a variety of techniques, including grammatical and semantic analysis, ontologies, tagged and untagged text and speech corpora, information retrieval, continuous speech recognition systems, machine translation, handwriting and continuous text recognition systems, and methods of language resource construction such as lexicons and thesauri, along with many applications. Current computer and web platforms are mainly based around "Latin" characters. This has created difficulties for many non-Latin languages such as Persian, mainly because of their different characteristics.

In this paper, a survey of proposed approaches as well as developed tools for the Persian language and text will be presented. In this survey, all localisation-related activities are categorised into three groups: text, speech and image. Finally, based on this evaluation of localisation trends in computational linguistics for the Persian script and language, policies will be proposed.

Title: An alternative approach to software testing to enable SimShip for the localisation market, using Shadow

Presenter: Kieran Arthur

We believe that our approach to automated software testing is novel. We can test several language instances of a product simultaneously, either through direct engineer interaction or by a record/playback script. The Shadow application can handle cases where the user interfaces of the product instances under test differ slightly in layout, both among the localised versions and between them and the English version. In our pilot studies, we are examining the effect of separating the functions of a test engineer into those of a product specialist and a QA specialist.

Our testing methodology has multiple outputs: a failure report and a set of screenshots of the products under test in each language. The screenshots can be used either in building the product documentation or by a translator for linguistic/consistency QA. We propose to perform a comparative analysis of existing automation tools (SQATest, WinRunner) with the Shadow testing suite and the Shadow process.
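
The abstract does not explain how Shadow copes with layout differences between language instances. One plausible approach, assumed here purely for illustration, is to record each action against a stable control identifier rather than screen coordinates, and to resolve that identifier separately in every running instance; a control that cannot be found becomes an entry in the failure report. The class and function names below are hypothetical and are not Shadow's API.

```python
from dataclasses import dataclass


@dataclass
class Control:
    control_id: str  # identifier assumed to be identical in every language build
    label: str       # localised caption, differs per language
    x: int           # screen position, may differ per language
    y: int


def replay(control_id, instances):
    """Resolve one recorded action against every running language instance.

    instances: dict mapping language code -> controls on the current screen.
    Returns language code -> matching Control, or None if it was not found.
    """
    results = {}
    for language, controls in instances.items():
        by_id = {control.control_id: control for control in controls}
        results[language] = by_id.get(control_id)
    return results


def failed_languages(results):
    """Languages in which the recorded control could not be found (for the failure report)."""
    return sorted(language for language, control in results.items() if control is None)


instances = {
    "en": [Control("btnSave", "Save", 10, 20)],
    "de": [Control("btnSave", "Speichern", 14, 20), Control("btnOpen", "Öffnen", 14, 50)],
    "fr": [],
}
results = replay("btnSave", instances)
print(results["de"].label, failed_languages(results))  # Speichern ['fr']
```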

Title: XLIFF in the Localisation of Open Source Software – One step forward, two steps back?

Presenter: Asgeir Frimannsson

In 2004-2005, the researcher undertook a project investigating the possibility of adopting XML-based standards in the localisation of open source software. A key outcome of this study was a guide for representing the de facto standard PO format in XLIFF, work that is now incorporated as part of the XLIFF 1.2 specification. Three years later, it seems that XLIFF has in many ways failed to deliver on its promises for open source localisation. The vast majority of open source projects still use the simple PO format in the localisation process, and few, if any, projects take advantage of the many benefits that XLIFF has to offer.

In this presentation, we wish to highlight some of the unique challenges and opportunities in community-driven translation processes with regard to resource management, translation reuse and terminology management. We will give an overview of a broad range of open source localisation projects and discuss the unique challenges in adopting standard localisation file formats in these processes.
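
For readers unfamiliar with the mapping, the Python sketch below converts already-parsed PO entries into a minimal XLIFF 1.2 document using the standard library's ElementTree. It is a simplified illustration that handles singular entries only: the XLIFF 1.2 representation guide for PO also covers msgctxt, plural forms, comments and fuzzy flags, none of which are handled here, and the function name and default file name are hypothetical.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"
ET.register_namespace("", XLIFF_NS)  # serialise XLIFF elements without a prefix


def _q(tag):
    """Qualify a tag name with the XLIFF 1.2 namespace."""
    return f"{{{XLIFF_NS}}}{tag}"


def po_entries_to_xliff(entries, source_lang="en", target_lang="de", original="messages.po"):
    """Build a minimal XLIFF 1.2 document from (msgid, msgstr) pairs.

    An empty msgstr (an untranslated PO entry) yields a trans-unit without <target>.
    """
    root = ET.Element(_q("xliff"), version="1.2")
    file_el = ET.SubElement(root, _q("file"), {
        "original": original,
        "source-language": source_lang,
        "target-language": target_lang,
        "datatype": "po",
    })
    body = ET.SubElement(file_el, _q("body"))
    for number, (msgid, msgstr) in enumerate(entries, start=1):
        unit = ET.SubElement(body, _q("trans-unit"), id=str(number))
        ET.SubElement(unit, _q("source")).text = msgid
        if msgstr:
            ET.SubElement(unit, _q("target")).text = msgstr
    return ET.tostring(root, encoding="unicode")


print(po_entries_to_xliff([("Open file", "Datei öffnen"), ("Close", "")]))
```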

Title: Automating the Content Localisation Process for Legacy Web Applications

Presenter: Harshal K Dhote

The Internet is widely seen as the global medium of communication and information sharing, particularly with the evolution of eCommerce and eGovernance applications. The only constraint seems to be language: since the inception of the Internet, English has been the predominant language of the World Wide Web, which has restricted the full benefits of the web to an audience that understands English. The solution to this problem is web page localisation. The existing approach often involves manually replicating the HTML web page in several languages depending on the target audience, but this approach is tedious. In addition, many legacy applications generate non-standard HTML, which means that their content cannot be localised without a massive amount of redesign. This session will look at how the HTML localisation process can be automated through the use of a Java internationalisation approach, and proposes a custom HTML server localisation architecture. The approach helps to avoid issues caused by the complexity of updating web pages, and allows legacy applications to support localisation without any architectural changes.
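
The talk builds on Java internationalisation, where strings are looked up in locale-specific resource bundles with a fallback chain (for example de_CH, then de, then the default). As a rough illustration in Python rather than Java, the sketch below applies that lookup to the text nodes of HTML generated by a legacy application, so the application itself is unchanged. The bundle contents and function names are hypothetical, and the regular expression is only adequate for simple, well-behaved markup; real legacy HTML would need a tolerant HTML parser.

```python
import re

# Hypothetical resource bundles. The legacy application emits hard-coded
# English, so the English text itself serves as the lookup key.
BUNDLES = {
    "de": {"Welcome": "Willkommen", "Close": "Schließen"},
    "de_CH": {"Close": "Schliessen"},  # Swiss German writes "ss" instead of "ß"
}

# Text between a closing ">" and the next "<"; adequate only for simple markup.
TEXT_NODE = re.compile(r">([^<>]+)<")


def lookup(text, locale):
    """Resolve text with a ResourceBundle-style fallback, e.g. de_CH -> de -> text."""
    parts = locale.split("_")
    for length in range(len(parts), 0, -1):
        bundle = BUNDLES.get("_".join(parts[:length]), {})
        if text in bundle:
            return bundle[text]
    return text  # no translation: fall back to the original English


def localise_html(html, locale):
    """Replace the text nodes of generated HTML with their translations."""
    def replace(match):
        raw = match.group(1)
        stripped = raw.strip()
        if not stripped:
            return match.group(0)  # whitespace only: leave untouched
        # Keep the surrounding whitespace, swap only the visible text.
        return ">" + raw.replace(stripped, lookup(stripped, locale), 1) + "<"
    return TEXT_NODE.sub(replace, html)


print(localise_html("<h1>Welcome</h1><a href='/logout'> Close </a>", "de_CH"))
# <h1>Willkommen</h1><a href='/logout'> Schliessen </a>
```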

Title: A Process for Risk-Based Localisation Testing

Presenter: Tim Callanan

This session will discuss how to integrate localisation project requirements into the overall ‘Project Software Development Lifecycle’ (SWDLC) using entry and exit criteria, how to set a standard for writing localisation test cases based on risk and importance, and how to use these criteria, for two pilot languages, to make informed decisions about reducing the testing effort as subsequent localised languages are tested and released. The testing procedures described in this paper aim to reduce the ‘Time-To-Market’ (TTM) of localised product releases using a priority- and risk-based Quality Assurance (QA) system. This will be achieved using the following methodology.

A procedure is defined whereby the localisation process can be efficiently integrated into the milestones of the Software Development Lifecycle, producing entry and exit gate criteria that ensure software reaches a certain standard earlier in the process, so that it achieves internationalisation readiness and is easier to localise. Once these gate criteria are satisfied, localisation test cases can be written using strongly defined standards and prioritised based on risk and importance. After prioritisation, two languages are initially selected as pilot languages to validate the assumptions made when taking priority-based testing decisions. Then, based on the results of pilot testing, the test cases for the remaining languages are re-prioritised in order to reduce the testing effort for subsequent language releases.
This overall testing strategy will help reduce the time-to-market of language releases based upon a defined and understood risk-based testing model.
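
The prioritisation step can be sketched as follows, assuming each test case is scored as risk multiplied by importance on a 1 to 5 scale, and that a subsequent language keeps every case at or above a threshold plus any case in a feature area where the pilot languages exposed defects. The scoring scale, threshold and field names are hypothetical choices for illustration, not the process defined in the session.

```python
from dataclasses import dataclass


@dataclass
class LocalisationTest:
    name: str
    risk: int        # likelihood of a localisation defect, 1 (low) to 5 (high); assumed scale
    importance: int  # impact on users if it fails, 1 (low) to 5 (high); assumed scale
    area: str        # feature area, used to carry pilot results forward

    @property
    def priority(self):
        return self.risk * self.importance


def plan_subsequent_language(tests, pilot_defect_areas, threshold=12):
    """Select and order the tests to run for a language released after the pilots.

    Keeps every test at or above the priority threshold, plus every test in a
    feature area where the pilot languages exposed defects.
    """
    selected = [t for t in tests if t.priority >= threshold or t.area in pilot_defect_areas]
    return sorted(selected, key=lambda t: t.priority, reverse=True)


tests = [
    LocalisationTest("date format", 5, 4, "formatting"),
    LocalisationTest("dialog truncation", 3, 3, "layout"),
    LocalisationTest("help links", 2, 2, "help"),
    LocalisationTest("menu hotkeys", 2, 3, "menus"),
]
plan = plan_subsequent_language(tests, pilot_defect_areas={"layout"})
print([t.name for t in plan])  # ['date format', 'dialog truncation']
```

Carrying pilot defect areas forward is what lets the pilot languages validate, or overturn, the original priority assumptions before the remaining languages are tested.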

© Copyright 2007 Localisation Research Centre (LRC). All rights reserved.