About this blog Translator's Shack is a collection of links, news, reviews and opinions about translation technologies. It's edited and updated by Roberto Savelli, an English to Italian translator, project manager and company owner of Albatros Soluzioni Linguistiche, a team of English-Italian translators, which hosts and supports this blog.
The Life as a PM category, managed by Gabriella Ascari, contains topics that are less technical in nature, but which we're sure will be appreciated by owners of small translation businesses and freelancers.
|
Wikipedia is a good source of terminology. Wikipedia articles often appear at the top of Google search results for specific terms or concepts, and since many articles are translated, or at least written in different languages, it’s often sufficient to click on the target language in the Wikipedia Languages sidebar to jump to the corresponding translated article.
But that’s a lot of clicks, especially because translations are not always available for all languages. If you are only interested in obtaining the translation of the title (which would be the keyword or concept you are looking for), wouldn’t it be nice to just type the source word and see its translation immediately?
There are several online services that offer this functionality. After trying out a few of them, I have settled on Meme Miner for its nice interface and speed. Here’s what a simple search looks like:

So Meme Miner not only displays the term’s translation, but also offers the translated definition, as it appears in the Wikipedia article that uses the searched term as the title. Very useful.
It would be nice to see vendors of translation tools enter into an agreement with Wikipedia about integrating this type of search mechanism into their programs, for instance for pre-populating a glossary, complete with definitions, about a text that is about to be translated.
I also wonder if it would be possible to download bilingual Wikipedia article headings in a way that is easy to manipulate in order to generate bilingual term lists. Any comments about this possibility will be appreciated.
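One possible route to such a bilingual term list is the public MediaWiki API, whose `prop=langlinks` query returns the title of the corresponding article in another language. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, the response layout is the one I would expect with `formatversion=2`, and the User-Agent string is a placeholder.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_URL = "https://{lang}.wikipedia.org/w/api.php"


def build_langlinks_url(titles, source_lang="en", target_lang="it"):
    """Build a MediaWiki API query asking for the target-language titles."""
    params = {
        "action": "query",
        "prop": "langlinks",
        "titles": "|".join(titles),   # several titles can be sent at once
        "lllang": target_lang,        # only return links to this language
        "lllimit": "max",
        "redirects": 1,
        "format": "json",
        "formatversion": 2,
    }
    return API_URL.format(lang=source_lang) + "?" + urlencode(params)


def parse_langlinks(response):
    """Turn an API response (already decoded from JSON) into {source: target}."""
    pages = response.get("query", {}).get("pages", [])
    if isinstance(pages, dict):       # older response format keys pages by id
        pages = pages.values()
    pairs = {}
    for page in pages:
        for link in page.get("langlinks", []):
            # Assumption: the title is under "title" (or "*" in the old format).
            target = link.get("title") or link.get("*")
            if target:
                pairs[page["title"]] = target
    return pairs


def fetch_term_pairs(titles, source_lang="en", target_lang="it"):
    """Query Wikipedia and return a bilingual list of article titles."""
    url = build_langlinks_url(titles, source_lang, target_lang)
    request = Request(url, headers={"User-Agent": "term-list-sketch/0.1 (example)"})
    with urlopen(request, timeout=10) as reply:
        return parse_langlinks(json.load(reply))
```

Calling `fetch_term_pairs(["Translation memory", "Glossary"])` should return a dictionary that can be written out as a tab-separated term list. Titles without a counterpart in the target language are simply omitted, and the API caps the number of titles per request, so long lists would need to be sent in batches.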
It’s one of the things I have to deal with on a regular basis when working with my in-house translators: how to improve on people’s bad translation choices. The word “bad” here is not meant as an absolute: it simply refers to a term that we as a group prefer not to use and that we invariably correct if we come across it. In most cases it’s just a turn of phrase that is not as universally common as one would like to think, or a typographic convention that has not been thoroughly understood and absorbed. Yet very often not even a clear explanation is enough to eradicate these choices, even when translators are asked to keep a personal checklist of their own recurring flaws.
I guess we all fall into the rut of using the same phrases, adjectives, adverbs and idiomatic expressions we have become accustomed to. To be honest, many of these are like a buoy in the ocean, useful little helpers that can come to our aid when nothing better or more appropriate comes to mind.
But if we are not conscious of this, if we don’t pay enough attention to the fact that habits in translation can turn your words to putty, then it is also possible that making a change would not really work and that we would be unable to appreciate the better (or sometimes just different) choice that is imposed upon us.
In Italian, for instance, words like the verb “consentire” have become staples of our linguistic production because they are neutral and flexible. So much so, in fact, that when someone starts using other, less orthodox terms, we are immediately alerted to the change and run for cover.
In an ideal world we would have created a common, flexible, accurate set of language choices that we all share and that makes our work (mine, like that of my translators) similar in that it stems from the same set of choices.
But these are the things that take planning and time, and very often we refrain from engaging in these types of undertakings because we “just don’t have the time”. And all the time we know that it would only save time to do it!
A tweet I “intercepted” today led me to this potentially interesting tool: demaquina Select.
From the website:
Select is a sidekick tool for preprocessing and boost works on CATs with support for XML Localization Interchange File Format (XLIFF) and Translation Memory Interchange (TMX).
Select offers an unequaled sub-sentence free segmentation ability which together with its own chunk-based Dual-memory System™, and Sub-sentence Case Aware Propagation delivers an ultimate terminology reusability.
With Select, each technical term, common expression or single word translation is typed ONCE in life!
From your elected CAT, export your work to a file with XLIFF format, create a Select Project and experience an incredible time saving with its intelligent sub-sentence term/chunk resuse and case-aware propagation… and many other time-saving driven features. Thereafter import the XLIFF file back into your CAT and sharpen your work, from Sub-sentence Zero Inconsistency to Perfection… Free to Care About Wording and Semantics…
Eradicate inconsistency at sub-sentence level from existing memories and term bases!
Export your elected CAT’s memories as TMX, open them with Select and see how easy it is to select segments with specific terms or expressions and use the Replace process with Sub-sentence Case-aware Propagation to eradicate inconsistency through all them at once!
Select is intended to:
Translate given XLIFF files’ content
If your elected or current project required CAT has support for XLIFF (XML Localization Interchange File Format), you can export your work to a XLIFF file and create a Select Project from it. Taking advantage of Select’s Sub-sentence Case Aware Propagation, you can deal with language common expressions and project specific terminology like software UI elements like never before! Then import the XLIFF file content back to your CAT and resume your work from Zero inconsistency, Free to Care About Wording and Semantics!…
Edit Translation Memories (particularly spot and eradicate inconsistency at sub sentence level)
If your elected CAT has support for TMX (Translation Memory Interchange) you can export any of your translation memories to a TMX file and create a Select Project from it. Then you can use the Replace Process together with Sub-sentence Case-aware Propagation options to eradicate inconsistency and securely change wrong terminology!
It is also possible to translate using a TMX file as interchange file, creating a "temporary" memory with the content of your source files, with both Source and Target units of each segment filled with the source text. (See below, how its TMX exported file should look like to be imported into a Select Project.)
Sub-segment leveraging is certainly one area in which modern TEnTs have a lot of potential for improvement. If implemented correctly, it can save time and, most importantly, facilitate consistency without requiring a lot of time spent on creating and tweaking glossaries. I intend to take a close look at this program after the holidays.
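To make the idea of case-aware propagation more concrete, here is a minimal sketch of how a term replacement could be propagated across target segments while preserving the casing of each occurrence. This is my own illustration, not Select's actual algorithm; the function names and the three casing rules (all caps, initial capital, unchanged) are assumptions.

```python
import re


def match_case(template, replacement):
    """Give `replacement` the same casing pattern as `template`."""
    if template.isupper():
        return replacement.upper()
    if template[:1].isupper():
        return replacement[:1].upper() + replacement[1:]
    return replacement


def propagate(segments, old_term, new_term):
    """Replace every whole-word occurrence of old_term, keeping its casing."""
    pattern = re.compile(r"\b" + re.escape(old_term) + r"\b", re.IGNORECASE)
    return [
        pattern.sub(lambda m: match_case(m.group(0), new_term), segment)
        for segment in segments
    ]


segments = ["Apri il documento.", "Documento non trovato", "DOCUMENTO SALVATO"]
print(propagate(segments, "documento", "file"))
```

A single call updates lowercase, capitalized and all-caps occurrences consistently, which is the kind of sub-segment consistency a glossary alone does not enforce.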
Reference material plays a very important part in most translation projects. We often receive reference files from our clients, and sometimes we have to find them ourselves through web searches or by browsing the client’s website.
The management and usage of reference files is one aspect that has been addressed by memoQ’s LiveDocs feature, which allows you to create searchable corpora of monolingual source and target files. So it’s finally time to put all those reference PDFs to good use! But wait, there’s a catch…
Very often, publishers put locks on PDF files for various reasons, e.g. intellectual property protection, forced consistency by preventing unwanted changes, etc. Here is an example of the possible locks that can be applied to a PDF file (in this case the file is completely unlocked):

Today we needed to unlock a few PDF files in order to use them in LiveDocs. While looking for a possible solution, I came across the PDFUnlock! web service. It’s very simple to use: you upload a locked PDF file and you immediately receive a link to download the unlocked file. Here are some features from the site’s description:
PDF files can be secured with restrictions that prevent you from for example copying text from them or editing, printing, merging or splitting them. PDFUnlock! can remove these restrictions (a.k.a “owner password”).
If a password is required to open the uploaded file, you will be asked to enter it (a.k.a “user password”). PDFUnlock! cannot, however, recover lost or unknown user passwords.
A PDF file can also be subject to non-standard encryption, such as DRM. PDFUnlock! does not remove such.
There is a further limitation: the maximum file size is 5 MB. And, of course, the rule of thumb that applies to all free, unencrypted, unprotected web services: do not send anything confidential for conversion.
PDFUnlock!
Whenever we’re asked to do a back translation, we instinctively recoil and kindly refuse.
It may not seem like a logical business choice, but to me, back translations are first and foremost a way for end clients to check your work that is far more intrusive than simply making sure quality is up to scratch. It’s as if they were saying: I don’t really know your mother tongue and since I can never be sure whether you’re good or not, I’ve decided to bring it all back to my language so that I can judge for myself.
And this really irks me.
But of course, this is not all there is to it.
I’m sure at the root of it there’s a lack of communication between the middle entity (the company that sits between us, the LSP, and the end client) and the end client itself. Very often the middle company does not have any Italian linguists, so it has to find ways to reassure a client that it cannot reassure by other, more persuasive means. Hence the back translation.
But is it really effective?
We all know that when translating “you lose some, you gain some”, but what happens when you reverse the combination? I’m sure the end client thinks that if everything that was there to begin with is not there in the back translation, then… A-ha!, there’s your mistranslation! But it does not really work this way. When you end up having to justify why “more” can and should be translated as “many” if there’s no comparison to follow (i.e. more than… something), well… frustration kicks in, and you find yourself justifying your own language to people who neither speak nor understand it.
A new release of the Okapi Tools is available.
Also, there is now a wiki for Okapi’s help and documentation: http://www.opentag.com/okapi/wiki
Changes Log – Sep-30-2010
Download page for latest stable release: http://okapi.opentag.com/downloads.html
Changes from M8 to M9
- Rainbow:
- Translation package Post-Processing utility:
- Fixed the bug where pre-translated XLIFF entries with translate=’no’ could not be merged back properly, for example for PO files.
- Added the user option "Always show the Log when starting a process".
- Tikal:
- Fixed the bug in the Merge command where pre-translated XLIFF entries with translate=’no’ could not be merged back properly, for example for PO files.
- Switched help to use the wiki.
- Ratel:
- Windows position and size are now saved for the next session.
- CheckMate:
- Added capability to save and load configurations outside the session.
- Improved pattern checks defaults and processing.
- Added support for short vs. long text in text length verification (new Length tab)
- Added experimental support for terminology verification.
- Added support for exceptions in verification of double-words.
- Added some limited support for string-based term verification.
- Translation resources:
- Added batchQuery method to the IQuery interface.
- Added leverage method to the IQuery interface.
- Open-Tran connector:
- Changed implementation to use the REST API instead of the XML-RPC.
- Improved support for queries with inline codes.
- SimpleTM connector:
- IMPORTANT: Changed the H2 database dependency from version 1.1.103 (.data.db files) to 1.2.135 (.h2.db files). This breaks backward compatibility: the new SimpleTM connector cannot open the old .data.db files. To convert an older TM, use an M8 or earlier version of Rainbow to run the SimpleTM to TMX step and export your database to TMX. Then use this version of Rainbow to run the Generate SimpleTM step to convert your TMX document into a new .h2.db file.
- Steps:
- Added the Resource Simplifier Step. It modifies normal resources of filter events into simpler resources for some third-party tools.
- Added the XLIFF Splitter Step. It splits several <file> elements inside an XLIFF document into separate documents.
- Added the Id-Based Aligner Step. It aligns text units from two input files, based on their unique IDs (resname).
- Added the XML Validation Step. It performs XML well-formedness verification and, optionally, DTD or schema validation.
- Sentence Aligner Step:
- Updated so entries with empty text are skipped and don’t cause an error.
- Diff Leverage Step:
- Added support for 3 input files: new source, old source, old translation. The second and third files must have the same text units (same number and same order).
- Filters:
- Modified several filters to generate unique extraction ids in non-text-unit events.
- Vignette Filter:
- Added support for monolingual documents.
- XML Filter:
- Fixed the bug where text extracted from attribute values was not processed for the codeFinder option.
- Libraries:
- Implemented the Appendable and CharSequence interfaces for TextFragment.
- IMPORTANT: Changed TextFragment.toString() to return the coded text instead of the original content of the fragment. The previous behavior of toString() is now accessible using text().
- The net.sf.okapi.lib.extra.pipelinebuilder package has been added. It allows you to easily script run pipelines, for example using Jython.
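To illustrate what the Id-Based Aligner Step listed above does conceptually, here is a small Python sketch that pairs text units from two files by their shared identifier. It is not Okapi code and does not use the Okapi API; the function name and the dictionary-based input (ID mapped to text) are assumptions made for the example.

```python
def align_by_id(source_units, target_units):
    """Pair source and target text units that share the same ID.

    Both arguments are dicts mapping a unit ID (e.g. a resname) to its text.
    Returns the aligned triples and the list of source IDs with no target.
    """
    aligned = []
    unmatched = []
    for unit_id, source_text in source_units.items():
        if unit_id in target_units:
            aligned.append((unit_id, source_text, target_units[unit_id]))
        else:
            unmatched.append(unit_id)
    return aligned, unmatched


source = {"btn.ok": "OK", "btn.cancel": "Cancel", "msg.saved": "File saved"}
target = {"btn.cancel": "Annulla", "msg.saved": "File salvato"}
pairs, missing = align_by_id(source, target)
print(pairs)
print(missing)
```

Because the match relies on IDs rather than on position or sentence length, the two files do not need to list their entries in the same order.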
Please note that this procedure no longer applies since version 5 of memoQ, which allows you to open, translate and export TXML files directly.
(Updated on 2010-09-24 with some corrections, new filters and procedures for pretranslated files, and simplified procedure)
We sometimes receive files that have been processed using the latest version of Wordfast Pro. These are recognizable by the .txml extension.
This format is just a specific XML structure, and as such it should be possible to translate the files using memoQ after formatting them properly. Here is a simple workflow that will allow you to process the files in memoQ.
1. Copy the source segments to the target column
There are three possible situations:
1.a. There are only a few files in the project and you do not need to preserve any 100% pretranslated segments found in the source Wordfast files

In this case, open each file in Wordfast and use the keyboard shortcut Ctrl-Shift-Ins to copy the source segments to the target column by overwriting all target segments, no matter what their status is.
If there are any translated segments in the file, they will be overwritten. Save and close the files, and proceed to step 2.

1.b. No matter how many files are in the project, you want to preserve any 100% translated segments contained in the Wordfast files by masking them out in memoQ

In this case you will have to open every file and confirm every segment with Wordfast’s shortcut ALT-down arrow. There are some prerequisites:
- In the Wordfast preferences, go to Translation Memory and enable Copy source on no match.
- To prevent Wordfast’s sluggish UI from slowing you down, make sure the outline is off (Window > Show view > Outline).
- For even more speed, switch the Wordfast view to Text mode (see below).

Once you have completed the prerequisites above, place the cursor in the first segment of your Wordfast file and press and hold ALT-down arrow until you have scrolled through the whole file. Save and close the files, and proceed to step 2.

1.c. For projects including many files, perform a batch search/replace (caveat: you will lose any 100% translated segments contained in your Wordfast files)

Open the .txml file using the jEdit text editor, which is available from the jEdit download page. It is important to use this editor because it allows for a very simple search/replace syntax that takes care of “greedy” wildcards. You can obtain the same results using a different editor, but the syntax to use might be different.
After opening the file in jEdit, place the cursor at the top and choose Search > Find…
In the Search for field, insert the string below (be careful not to add superfluous spaces if copying from this page):
<segment(.*?)>(.*?)<source>(.*?)</source>(.*?)</segment>
In the Replace with field, insert the following string:
<segment$1>$2<source>$3</source>$4<target>$3</target></segment>
Check that the search options are configured as in the screenshot below:

Click on Replace All, save the file and quit jEdit. Proceed to step 2.
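For larger batches, the search/replace of procedure 1.c could also be scripted instead of being run by hand in an editor. The following is a minimal Python sketch of the same regular expression, not part of the original procedure. It assumes the .txml files are UTF-8 encoded and that segments look like `<segment ...><source>...</source></segment>`; the function names are hypothetical, and the same caveat applies (any existing translations are not preserved in a usable form).

```python
import re
from pathlib import Path

# Same pattern as the jEdit search string; DOTALL lets "." span line breaks.
SEGMENT_RE = re.compile(
    r"<segment(.*?)>(.*?)<source>(.*?)</source>(.*?)</segment>",
    re.DOTALL,
)
# Python writes group references as \1, \2... where jEdit uses $1, $2...
REPLACEMENT = r"<segment\1>\2<source>\3</source>\4<target>\3</target></segment>"


def copy_source_to_target(txml_text):
    """Return (new_text, number_of_segments_changed)."""
    return SEGMENT_RE.subn(REPLACEMENT, txml_text)


def convert_file(src_path, dst_path):
    """Read a .txml file, add <target> copies of <source>, write the result."""
    text = Path(src_path).read_text(encoding="utf-8")  # assumption: UTF-8
    new_text, count = copy_source_to_target(text)
    Path(dst_path).write_text(new_text, encoding="utf-8")
    return count
```

The count returned by `convert_file` can be compared with the number of segments Wordfast reports for the file, as a quick sanity check before moving on to step 2.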
2. Open the modified file in memoQ
- First perform a quick check by opening the file you just saved with Wordfast. Now the target column should be identical to the source column, tags included. The total number of segments should be identical to the value you saw when you first opened the file in Wordfast. After checking this, you can open the file in memoQ.
- Add the .xml extension to the file name (e.g. filename.txml.xml), since memoQ likes this better.
- Open memoQ and create a new project. Call it for instance “Wordfast”, so you can re-use it easily for subsequent projects that involve translating WordFast files.
- Go to Translations > Add document as…
- Select the file with .XML extension and open it.
- The Document import settings window is displayed.
If you followed procedure 1.a or 1.c (no preservation of pretranslated material), download this memoQ XML definition file (right click, Save As).
If you followed procedure 1.b above (preservation of pretranslated material), download this memoQ XML definition file (right click, Save As).
- Click on the … on top to import the file downloaded in sub-step 6 above.
- Click OK at the bottom of the window. The window closes and the file is imported.
- Open the file in memoQ. memoQ should have inserted any tags in the correct positions corresponding to the tags contained in the Wordfast file. If you followed procedure 1.b (preservation of pretranslated material), the 100% translated segments contained in the original Wordfast files are hidden. They should however be restored when exporting from memoQ.
- Translate the file normally.
- When ready, export it with Export (dialog)
3. Check the translated file in Wordfast
- Restore the .txml extension and open the translated file in Wordfast. You should get no error messages. Check that the total number of segments is identical to the original files, check the tag positions, etc.
- Make one simple modification in the file with Wordfast (e.g. add and delete a space), and save the file. This step will rewrite some specific Wordfast headers and guarantee compatibility.

While attending the “Train the Trainer” course in Pécz last week, I was pleased to hear that the developers are still progressing at a very fast pace, adding interesting features to all new releases of our favorite translation environment tool. Below is a brief outline of what I found out. Be sure to attend the Introducing memoQ 4.5 webinar that is offered tomorrow from 3:30 PM to 4:30 PM CEST to double-check the information I am giving here.
TM Performance improvements
Kilgray showed me a real-life comparison between the current version (4.2) and the new one (4.5) by running an analysis of a particularly complex file (containing many tags and many segments with numbers) against a TM (I do not recall if it was a server or a local TM, but it should not make any significant difference for the purpose of this comparison). Version 4.5 was visibly much faster than 4.2 in this test: I would say four times faster, or perhaps more. This is true both for analysis and pretranslation.
LiveDocs
In version 4.5, the “Alignment” tab will be gone, to be replaced by a brand-new concept called LiveDocs. This is a collection of text files, or a “corpus”, which can contain files in different formats (monolingual files paired by a specific function, bilingual files and monolingual documents such as PDFs).
When added to the corpus, all these files are immediately available for fuzzy searches. There is no need to align them. Add multiple files (you are no longer limited to one pair of files as in the alignment feature of the previous versions) from your reference folder, and you will see them appear in the concordance results. If you have the source and the target language version of the same document, you have the chance to highlight terms in the source and in the target to add them to your term base. Apparently, the new alignment algorithm also uses the term base entries and inline tags as reference points for alignment, elevating the score of the sentence containing the term pair or tags for a more precise alignment.
More importantly, when the user makes an alignment correction, the pair of strings is added as a reference point for the ongoing alignment function that works in the background, using the reference point to fine-tune and improve the alignment. So if you spot any misalignment in the corpus while doing a search, you will be able to correct it on the fly, and your correction will immediately be used for optimizing the index. The index is continuously updated in the background, but this feature can be temporarily stopped if it interferes with other resource-hungry applications.
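To give a rough idea of how term base entries could act as reference points for alignment, here is a toy Python sketch. It is not memoQ's algorithm, which I have not seen; it simply adds a bonus to a candidate sentence pair for every known term pair found on both sides, on top of a crude length-similarity score. All names and the scoring formula are assumptions.

```python
def anchor_score(source_sentence, target_sentence, term_pairs):
    """Count term-base pairs that appear on both sides of a candidate pair."""
    src = source_sentence.lower()
    tgt = target_sentence.lower()
    return sum(1 for s, t in term_pairs if s.lower() in src and t.lower() in tgt)


def best_match(source_sentence, candidates, term_pairs):
    """Pick the candidate target sentence with the highest combined score."""
    def score(candidate):
        shorter = min(len(source_sentence), len(candidate))
        longer = max(len(source_sentence), len(candidate), 1)
        length_similarity = shorter / longer          # crude baseline signal
        return length_similarity + anchor_score(source_sentence, candidate, term_pairs)
    return max(candidates, key=score)


source = "Open the translation memory before starting."
candidates = [
    "Salvare il file prima di chiudere il programma.",
    "Aprire la memoria di traduzione prima di iniziare.",
]
terms = [("translation memory", "memoria di traduzione")]
print(best_match(source, candidates, terms))
```

In this toy setup, a user correction could be handled the same way: the corrected pair is appended to `term_pairs` (or to a separate list of fixed anchors) so that later scoring takes it into account.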
By inserting monolingual files into the corpus, even if no translation is available for them, you will be able to obtain precious suggestions that include more context. Something that a translation memory alone cannot do, since it’s limited to one segment at a time.
Although not ground-breaking (if I’m not mistaken it’s offered, among others, by Multicorpora’s MultiTrans, but at a much, much higher price) this feature seems to have been implemented very nicely. Translators who have dozens of reference files will be able to access them instantaneously from the translation environment. One important concept that may not be obvious from the description above is the following: you will no longer waste time aligning sentences that you probably will never need to use. Instead, you will access useful linguistic data right when you need them.
Online/desktop documents
This will be a very welcome addition for users, like us, who have a memoQ Server. Up to now, while creating a new project your choice is limited to online projects (with files residing on the server, which implies that the translator has to be online all the time and is unable to create “views” from the documents), and to handoffs, which are local files with references to the server’s resources.
Version 4.5 introduces the concept of “desktop documents”. In practice, we are still dealing with an online project, but the translator will work on local documents and will benefit from superior manipulation options, such as the creation of views. However, the project will still be synchronized on the server, so the project manager will not have to give up precious progress information.
Management interface
This mode has a separate window now. What I understood about this feature is that you will be able to translate a project and at the same time manage it (or another project) without having to go through the annoying step of closing your working project before switching to PM mode. Again, a great time-saver for our team.
Other features
- It’s now easier to create new users thanks to links available in several places of the interface
- An InDesign CS5 filter will be available in the upcoming months
- Reverse lookup in TM will be possible (only for reference TMs)
- Slider to choose between “better recall” or “faster lookup” from the TM
- Backup feature for local projects to facilitate moving from one PC to another
UPDATE
Here is some more information I gathered on the memoQ mailing list:
- LiveAlign is the name for the “on the fly” alignment feature
- ActiveTM is the one which gives matches from all the documents in the corpus when you’re translating, indeed leveraging your previous assets
- [...] monolingual indexing à la dtSearch is the Library.
- Along with ezAttach, which allows the addition of binary files (pictures, music, anything)
… this dynamic quad team constitutes the LiveDocs all-in-one feature. Get the most out of your resources, whether monolingual or bilingual!
Thanks to LiveDocs, the time when reference documents were simply ignored by the translator because they were too long to read or only scarce parts were relevant is almost over!
|