Italian Localization problem n.2 – Father’s Day

While localizing a slogan for a camera manufacturer that will be offering Father’s Day specials, I pointed out to our client that they are a couple of months late for Italian. Here are the countries that celebrate on March 19th (St. Joseph’s Day):
image

And here is when the majority of countries celebrate:

image

For other countries and for further information, head to the relevant Wikipedia page.

Meme Miner: get translations of Wikipedia article titles

Wikipedia is a good source of terminology. Wikipedia articles often appear at the top of Google search results for specific terms or concepts, and since many articles are translated, or at least written in several languages, it’s often sufficient to click on the corresponding target language in Wikipedia’s Languages sidebar to jump to the corresponding translated article.

But that’s a lot of clicks, especially because translations are not always available for all languages. If you are only interested in obtaining the translation of the title (which would be the keyword or concept you are looking for), wouldn’t it be nice to just type the source word and see its translation immediately?

There are several online services that offer this functionality. After trying out a few of them, I have settled on Meme Miner for its nice interface and speed. Here’s what a simple search looks like:

image

So Meme Miner not only displays the term’s translation, but also offers the translated definition, as it appears in the Wikipedia article that uses the searched term as its title. Very useful.

It would be nice to see vendors of translation tools enter into an agreement with Wikipedia to integrate this type of search mechanism into their programs, for instance to pre-populate a glossary, complete with definitions, for a text that is about to be translated.

I also wonder if it would be possible to download bilingual Wikipedia article headings in a way that is easy to manipulate in order to generate bilingual term lists. Any comments about this possibility will be appreciated.
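In the meantime, here is a proof of concept: Wikipedia exposes its interlanguage links through the standard MediaWiki API (the langlinks property of the query action). The Python sketch below is a minimal illustration; the helper name and the User-Agent string are my own, and error handling is omitted:

```python
import json
import urllib.parse
import urllib.request

API = "https://en.wikipedia.org/w/api.php"

def title_translations(title, langs=None):
    """Return {language code: translated article title} from the
    interlanguage links ("langlinks") of an English Wikipedia article."""
    params = {
        "action": "query",
        "format": "json",
        "formatversion": "2",
        "titles": title,
        "prop": "langlinks",
        "lllimit": "500",
    }
    # Wikipedia asks API clients to identify themselves with a User-Agent.
    request = urllib.request.Request(
        API + "?" + urllib.parse.urlencode(params),
        headers={"User-Agent": "bilingual-term-list-sketch/0.1"})
    with urllib.request.urlopen(request) as response:
        data = json.load(response)
    links = data["query"]["pages"][0].get("langlinks", [])
    pairs = {link["lang"]: link["title"] for link in links}
    if langs:
        pairs = {code: t for code, t in pairs.items() if code in langs}
    return pairs

# A rough English-Italian term list is then one loop away:
for term in ("Camera", "Shutter speed", "Aperture"):
    print(term, "→", title_translations(term, langs={"it"}))
```

Looping over a list of source terms like this is already enough to draft a bilingual term list, although redirects and missing articles would need extra care.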

How to get rid of bad translation choices?

It’s one of the things I have to deal with on a regular basis when working with my in-house translators: how to improve on people’s bad translation choices. The word “bad” here is not meant as an absolute: it simply refers to a term that we as a group prefer not to use and that we invariably correct when we come across it. In most cases it’s just a turn of phrase that is not as universally common as one would like to think, or a typographic convention that has not been thoroughly understood and absorbed. Yet very often not even a clear explanation is enough to eradicate these choices, not even when we ask translators to keep a personal checklist of their own special little flaws.

I guess we all fall into the rut of using the same phrases, adjectives, adverbs and idiomatic expressions we have become accustomed to. To be honest, many of these are like a buoy in the ocean, useful little helpers that can come to our aid when nothing better or more appropriate comes to mind.

But if we are not conscious of this, if we don’t pay enough attention to the fact that habits in translation can turn our words to putty, then it is also possible that a change would not really take hold and that we would be unable to appreciate the better (or sometimes just different) choice that is imposed upon us.

In Italian, for instance, words like the verb “consentire” have become staples of our linguistic production because they are neutral and flexible. So much so, in fact, that when someone starts using other, less orthodox terms, we are immediately alerted to the change and run for cover.

In an ideal world we would have created a common, flexible, accurate set of language choices that we all share and that make our work (mine, like that of my translators) similar in that it stems from the same set of choices.

But these are things that take planning and time, and very often we refrain from engaging in these types of undertakings because we “just don’t have the time”. And all the while we know that doing it would only save time!

demaquina Select: sub-sentence segmentation, propagation etc. on TMX, XLIFF [@joaoalb]

A tweet I “intercepted” today led me to this potentially interesting tool: demaquina Select.

From the website:

Select is a sidekick tool for preprocessing and boosting work on CATs with support for XML Localization Interchange File Format (XLIFF) and Translation Memory Interchange (TMX).

Select offers an unequaled free sub-sentence segmentation ability which, together with its own chunk-based Dual-memory System and Sub-sentence Case-Aware Propagation, delivers ultimate terminology reusability.

With Select, each technical term, common expression or single-word translation is typed ONCE in a lifetime!

From your chosen CAT, export your work to a file in XLIFF format, create a Select Project and experience incredible time savings with its intelligent sub-sentence term/chunk reuse and case-aware propagation… and many other time-saving features. Thereafter, import the XLIFF file back into your CAT and sharpen your work, from Sub-sentence Zero Inconsistency to Perfection… Free to Care About Wording and Semantics…

Eradicate inconsistency at sub-sentence level from existing memories and term bases!

Export your chosen CAT’s memories as TMX, open them with Select and see how easy it is to select segments with specific terms or expressions and use the Replace process with Sub-sentence Case-Aware Propagation to eradicate inconsistency throughout them all at once!

 

Select is intended to:

Translate given XLIFF files’ content

If your chosen CAT, or the one required by your current project, has support for XLIFF (XML Localization Interchange File Format), you can export your work to an XLIFF file and create a Select Project from it. Taking advantage of Select’s Sub-sentence Case-Aware Propagation, you can deal with common language expressions and project-specific terminology, such as software UI elements, like never before! Then import the XLIFF file content back into your CAT and resume your work from Zero Inconsistency, Free to Care About Wording and Semantics!…

Edit Translation Memories (particularly, spot and eradicate inconsistency at sub-sentence level)

If your chosen CAT has support for TMX (Translation Memory Interchange), you can export any of your translation memories to a TMX file and create a Select Project from it. Then you can use the Replace process together with the Sub-sentence Case-Aware Propagation options to eradicate inconsistency and securely change wrong terminology!

It is also possible to translate using a TMX file as the interchange file, creating a “temporary” memory with the content of your source files, with both Source and Target units of each segment filled with the source text. (See below for how the exported TMX file should look in order to be imported into a Select Project.)
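The illustration that quote refers to lives on the Select website; as a rough stand-in, here is a minimal Python sketch (my own illustration, not demaquina’s code) that generates such a “temporary” TMX, with every target unit pre-filled with the source text:

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def source_only_tmx(segments, src_lang, tgt_lang, out_path):
    """Build a "temporary" TMX in which every target unit is pre-filled
    with the source text of its segment."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "sketch", "creationtoolversion": "0.1",
        "segtype": "sentence", "o-tmf": "none", "adminlang": "en",
        "srclang": src_lang, "datatype": "plaintext"})
    body = ET.SubElement(tmx, "body")
    for text in segments:
        tu = ET.SubElement(body, "tu")
        for lang in (src_lang, tgt_lang):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text  # target = source text
    ET.ElementTree(tmx).write(out_path, encoding="utf-8", xml_declaration=True)

source_only_tmx(["First sentence.", "Second sentence."], "en", "it", "temp.tmx")
```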

Sub-segment leveraging is certainly one area in which modern TEnTs have a lot of potential for improvement. If implemented correctly, it can save time and, most importantly, facilitate consistency without requiring a lot of time spent on creating and tweaking glossaries. I intend to take a close look at this program after the holidays.
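To make the idea more concrete: Select’s internals are not publicly documented, but a toy version of case-aware term replacement across the target units of a TMX file could look like the sketch below. The function names are mine, and real memories containing inline tags (<bpt>/<ept>) would need more care:

```python
import re
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def match_case(replacement, matched):
    """Capitalize the replacement only if the matched occurrence was
    capitalized (e.g. at the start of a sentence)."""
    if matched[:1].isupper():
        return replacement[:1].upper() + replacement[1:]
    return replacement

def replace_term_in_tmx(path, out_path, lang, old_term, new_term):
    """Replace old_term with new_term in every <seg> of the given target
    language, preserving sentence-initial capitalization."""
    tree = ET.parse(path)
    pattern = re.compile(re.escape(old_term), re.IGNORECASE)
    for tuv in tree.iter("tuv"):
        tuv_lang = tuv.get(XML_LANG) or tuv.get("lang") or ""
        if tuv_lang.lower() != lang:
            continue
        seg = tuv.find("seg")
        if seg is not None and seg.text:  # plain text only; inline tags skipped
            seg.text = pattern.sub(
                lambda m: match_case(new_term, m.group(0)), seg.text)
    tree.write(out_path, encoding="utf-8", xml_declaration=True)

# e.g. standardize on the verb we prefer:
# replace_term_in_tmx("memory.tmx", "memory_fixed.tmx",
#                     "it", "permettere", "consentire")
```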

Free PDFUnlock! web service lets you remove limitations from PDF files [@PDFUnlock]

Reference material plays a very important part in most translation projects. We often receive reference files from our clients, and sometimes we have to find them ourselves through web searches or by browsing the client’s website.

The management and usage of reference files is one aspect addressed by memoQ’s LiveDocs feature, which allows you to create searchable corpora of monolingual source and target files. So it’s finally time to put all those reference PDFs to good use! But wait, there’s a catch…

Very often, publishers put locks on PDF files for various reasons, e.g. intellectual property protection, forced consistency by preventing unwanted changes, etc. Here is an example of the possible locks that can be applied to a PDF file (in this case the file is completely unlocked):

image

Today we needed to unlock a few PDF files in order to use them in LiveDocs. While looking for a possible solution, I came across the PDFUnlock! web service. It’s very simple to use: you upload a locked PDF file and you immediately receive a link to download the unlocked file. Here are some features from the site’s description:

PDF files can be secured with restrictions that prevent you from, for example, copying text from them or editing, printing, merging or splitting them. PDFUnlock! can remove these restrictions (a.k.a. the “owner password”).

If a password is required to open the uploaded file, you will be asked to enter it (a.k.a. the “user password”). PDFUnlock! cannot, however, recover lost or unknown user passwords.

A PDF file can also be subject to non-standard encryption, such as DRM. PDFUnlock! does not remove such encryption.

There is a further limitation: the maximum file size is 5 MB. And, of course, the rule of thumb that applies to all free, unencrypted, unprotected web services: do not send anything confidential for conversion.
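For files that must stay in-house, or that exceed the 5 MB cap, the same standard restrictions can usually be stripped locally. Here is a minimal sketch using the open-source pikepdf library (my suggestion, unrelated to PDFUnlock!):

```python
import pikepdf

def unlock_pdf(src, dst, password=""):
    """Re-save a PDF without its owner-password restrictions. Like
    PDFUnlock!, this needs the user password if one is set, and it
    does not touch non-standard encryption such as DRM."""
    with pikepdf.open(src, password=password) as pdf:
        pdf.save(dst)  # the saved copy carries no restriction flags

unlock_pdf("reference_locked.pdf", "reference_unlocked.pdf")
```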

PDFUnlock!

What are back translations really for?

Whenever we’re asked to do a back translation, we instinctively recoil and kindly refuse.
It may not seem like a logical business choice, but to me, back translations are first and foremost a way for end clients to control your work, one far more intrusive than simply making sure quality is up to scratch. It’s as if they were saying: I don’t really know your mother tongue, and since I can never be sure whether you’re good or not, I’ve decided to bring it all back to my language so that I can judge for myself.

And this really irks me.

But of course, this is not all there is to it.
I’m sure at the root of it there’s a lack of communication between the middle entity, that is, the company that sits between us, the LSP, and the end client, and the end client themselves. Very often the middle company does not have any Italian linguists, and they have to find ways to reassure a client that they cannot reassure by other, more persuasive means. Hence the back translation.

But is it really effective?
We all know that when translating “you lose some, you gain some”, but what happens when you reverse the combination? I’m sure the end client thinks that if everything that was there to begin with is not there in the back translation, then… a-ha!, there’s your mistranslation! But it does not really work quite this way, and when you end up having to justify why “more” can and should be translated as “many” if there’s no comparison to follow (i.e. more than… something), well… frustration kicks in, and you end up having to justify your own language to people who neither speak nor understand it.

Okapi Framework (components and applications for localizing and translating documentation and software) Milestone 9

A new release of the Okapi Tools is available.

Also, there is now a wiki for Okapi’s help and documentation: http://www.opentag.com/okapi/wiki

Change Log – Sep-30-2010

Download page for latest stable release: http://okapi.opentag.com/downloads.html

Changes from M8 to M9

  • Rainbow:
    • Translation package Post-Processing utility:
      • Fixed the bug where pre-translated XLIFF entries with translate=’no’ could not be merged back properly, for example for PO files.
    • Added the user option "Always show the Log when starting a process".
  • Tikal:
    • Fixed the bug in the Merge command where pre-translated XLIFF entries with translate=’no’ could not be merged back properly, for example for PO files.
    • Switched help to use the wiki.
  • Ratel:
    • Window position and size are now saved for the next session.
  • CheckMate:
    • Added capability to save and load configurations outside the session.
    • Improved pattern checks defaults and processing.
    • Added support for short vs. long text in text length verification (new Length tab).
    • Added experimental support for terminology verification.
    • Added support for exceptions in verification of double-words.
    • Added some limited support for string-based term verification.
  • Translation resources:
    • Added batchQuery method to the IQuery interface.
    • Added leverage method to the IQuery interface.
    • Open-Tran connector:
      • Changed implementation to use the REST API instead of the XML-RPC.
      • Improved support for queries with inline codes.
    • SimpleTM connector:
      • IMPORTANT: Changed the H2 database dependency from version 1.1.103 (.data.db files) to 1.2.135 (.h2.db files). This breaks backward compatibility: the new SimpleTM connector cannot open the old .data.db files. To convert an older TM: use an M8 or earlier version of Rainbow to run the SimpleTM to TMX step and export your database to TMX. Then use this version of Rainbow to run the Generate SimpleTM step to convert your TMX document into a new .h2.db file.
  • Steps:
    • Added the Resource Simplifier Step. It modifies normal resources of filter events into simpler resources for some third-party tools.
    • Added the XLIFF Splitter Step. It splits the several <file> elements inside an XLIFF document into separate documents.
    • Added the Id-Based Aligner Step. It aligns text units from two input files, based on their unique IDs (resname).
    • Added the XML Validation Step. It performs XML well-formedness verification and, optionally, DTD or schema validation.
    • Sentence Aligner Step:
      • Updated so entries with empty text are skipped and don’t cause an error.
    • Diff Leverage Step:
      • Added support for 3 input files: new source, old source, old translation. The second and third files must have the same text units (same number and same order).
  • Filters:
    • Modified several filters to generate unique extraction ids in non-text-unit events.
    • Vignette Filter:
      • Added support for monolingual documents.
    • XML Filter:
      • Fixed the bug where text extracted from attribute values was not processed for the codeFinder option.
  • Libraries:
    • Implemented the Appendable and CharSequence interfaces for TextFragment.
    • IMPORTANT: Changed TextFragment.toString() to return the coded text instead of the original content of the fragment. The previous behavior of toString() is now accessible using text().
    • The net.sf.okapi.lib.extra.pipelinebuilder package has been added. It allows you to easily script and run pipelines, for example using Jython.

Translating Wordfast Pro TXML files in memoQ

Please note that this procedure no longer applies since version 5 of memoQ, which can open, translate and export TXML files directly.

(Updated on 2010-09-24 with some corrections, new filters and procedures for pretranslated files, and simplified procedure)

We sometimes receive files that have been processed using the latest version of Wordfast Pro. These are recognizable by the .txml extension.

This format is just a specific XML structure, and as such it should be possible to translate the files using memoQ after formatting them properly. Here is a simple workflow that will allow you to process the files in memoQ.

 

1. Copy the source segments to the target column

There are three possible situations:

1.a. There are only a few files in the project and you do not need to preserve any 100% pretranslated segments found in the source Wordfast files.

In this case, open each file in Wordfast and use the keyboard shortcut Ctrl-Shift-Ins to copy the source segments to the target column, overwriting all target segments, no matter what their status is. If there are any translated segments in the file, they will be overwritten. Save and close the files, and proceed to step 2.

1.b. No matter how many files are in the project, you want to preserve any 100% translated segments contained in the Wordfast files by masking them out in memoQ.

In this case you will have to open every file and confirm every segment with Wordfast’s shortcut Alt-Down Arrow. There are some prerequisites:

In the Wordfast preferences, go to Translation Memory and enable Copy source on no match.

To prevent Wordfast’s sluggish UI from slowing you down, make sure the outline is off (Window > Show view > Outline).

For even more speed, switch the Wordfast view to Text mode (see below).

image

Once you have completed the prerequisites above, place the cursor in the first segment of your Wordfast file and press and hold Alt-Down Arrow until you have scrolled through the whole file. Save and close the files, and proceed to step 2.

1.c. For projects including many files, perform a batch search/replace (caveat: you will lose any 100% translated segments contained in your Wordfast files).

Open the .txml file using the jEdit text editor, which you can download from the jEdit website. It is important to use this editor because it allows for a very simple search/replace syntax that takes care of “greedy” wildcards. You can obtain the same results using a different editor, but the syntax to use might be different.

After opening the file in jEdit, place the cursor at the top and choose Search > Find…

In the Search for field, insert the string below (be careful not to add superfluous spaces if copying from this page):
<segment(.*?)>(.*?)<source>(.*?)</source>(.*?)</segment>

In the Replace with field, insert the following string:
<segment$1>$2<source>$3</source>$4<target>$3</target></segment>

Check that the search options are configured as in the screenshot below:
image

Click on Replace All, save the file and quit jEdit. Proceed to step 2.
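If you are following procedure 1.c on a project with many files, the same replacement can be scripted instead of being repeated in jEdit for every file. Here is a minimal Python sketch (the folder name is a placeholder; back up your files first):

```python
import re
from pathlib import Path

# The same "greedy-safe" replacement performed in jEdit above;
# re.DOTALL lets the wildcards cross line breaks in case a segment
# spans several lines.
PATTERN = re.compile(
    r"<segment(.*?)>(.*?)<source>(.*?)</source>(.*?)</segment>", re.DOTALL)
REPLACEMENT = r"<segment\1>\2<source>\3</source>\4<target>\3</target></segment>"

for txml in Path("project_files").glob("*.txml"):
    text = txml.read_text(encoding="utf-8")
    # Write the result with the extra .xml extension memoQ prefers
    # (see step 2 below), leaving the original .txml file untouched.
    out = txml.with_name(txml.name + ".xml")
    out.write_text(PATTERN.sub(REPLACEMENT, text), encoding="utf-8")
```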

 

2.  Open the modified file in memoQ

  1. First perform a quick check by opening in Wordfast the file you just saved. The target column should now be identical to the source column, tags included. The total number of segments should be identical to the value you saw when you first opened the file in Wordfast. After checking this, you can open the file in memoQ.
  2. Add the .xml extension to the file name (e.g. filename.txml.xml), since memoQ likes this better.
  3. Open memoQ and create a new project. Call it, for instance, “Wordfast”, so you can re-use it easily for subsequent projects that involve translating Wordfast files.
  4. Go to Translations > Add document as…
  5. Select the file with the .xml extension and open it.
  6. The Document import settings window is displayed.
    If you followed procedure 1.a or 1.c (no preservation of pretranslated material), download this memoQ XML definition file (right click, Save As).
    If you followed procedure 1.b above (preservation of pretranslated material), download this memoQ XML definition file (right click, Save As).
  7. Click on the import control at the top of the window to import the file downloaded in sub-step 6 above.
  8. Click OK at the bottom of the window. The window closes and the file is imported.
  9. Open the file in memoQ. memoQ should have inserted any tags in the correct positions, corresponding to the tags contained in Wordfast. If you followed procedure 1.b (preservation of pretranslated material), the 100% translated segments contained in the original Wordfast files are hidden. They should, however, be restored when exporting from memoQ.
  10. Translate the file normally.
  11. When ready, export it with Export (dialog).

3.  Check the translated file in Wordfast

  1. Restore the .txml extension and open the translated file in Wordfast. You should get no error messages. Check that the total number of segments is identical to the original file, check the tag positions, etc.
  2. Make one simple modification in the file with Wordfast (e.g. add and delete a space), and save the file. This step rewrites some Wordfast-specific headers and guarantees compatibility.