About this blog

Translator's Shack is a collection of links, news, reviews and opinions about translation technologies. It's edited and updated by Roberto Savelli, an English to Italian translator, project manager and company owner of Albatros Soluzioni Linguistiche, a team of English-Italian translators, which hosts and supports this blog.


The Life as a PM category, managed by Gabriella Ascari, contains topics that are less technical in nature, but which we're sure will be appreciated by owners of small translation businesses and freelancers.




demaquina Select: sub-sentence segmentation, propagation etc. on TMX, XLIFF [@joaoalb]

A tweet I “intercepted” today led me to this potentially interesting tool: demaquina Select.

From the website:

Select is a sidekick tool for preprocessing and boost works on CATs with support for XML Localization Interchange File Format (XLIFF) and Translation Memory Interchange (TMX).

Select offers an unequaled sub-sentence free segmentation ability which together with its own chunk-based Dual-memory System, and Sub-sentence Case Aware Propagation delivers an ultimate terminology reusability.

With Select, each technical term, common expression or single word translation is typed ONCE in life!

From your elected CAT, export your work to a file with XLIFF format, create a Select Project and experience an incredible time saving with its intelligent sub-sentence term/chunk resuse and case-aware propagation… and many other time-saving driven features. Thereafter import the XLIFF file back into your CAT and sharpen your work, from Sub-sentence Zero Inconsistency to Perfection… Free to Care About Wording and Semantics…

Eradicate inconsistency at sub-sentence level from existing memories and term bases!

Export your elected CAT’s memories as TMX, open them with Select and see how easy it is to select segments with specific terms or expressions and use the Replace process with Sub-sentence Case-aware Propagation to eradicate inconsistency through all them at once!

 

Select is intended to:

Translate given XLIFF files’ content

If your elected or current project required CAT has support for XLIFF (XML Localization Interchange File Format), you can export your work to a XLIFF file and create a Select Project from it. Taking advantage of Select’s Sub-sentence Case Aware Propagation, you can deal with language common expressions and project specific terminology like software UI elements like never before! Then import the XLIFF file content back to your CAT and resume your work from Zero inconsistency, Free to Care About Wording and Semantics!…

Edit Translation Memories (particularly spot and eradicate inconsistency at sub sentence level)

If your elected CAT has support for TMX (Translation Memory Interchange) you can export any of your translation memories to a TMX file and create a Select Project from it. Then you can use the Replace Process together with Sub-sentence Case-aware Propagation options to eradicate inconsistency and securely change wrong terminology!

It is also possible to translate using a TMX file as interchange file, creating a "temporary" memory with the content of your source files, with both Source and Target units of each segment filled with the source text. (See below how its exported TMX file should look in order to be imported into a Select Project.)

Sub-segment leveraging is certainly one area in which modern TEnTs have a lot of potential for improvement. If implemented correctly, it can save time and, most importantly, facilitate consistency without requiring a lot of time spent on creating and tweaking glossaries. I intend to take a close look at this program after the holidays.
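
To make the idea of case-aware propagation a bit more concrete, here is a minimal Python sketch. It is not Select's actual algorithm, which the website does not describe. The function names `match_case` and `propagate` and the sample segments are made up for illustration, and only three case patterns (lowercase, Capitalized, UPPERCASE) are handled.

```python
import re


def match_case(template: str, replacement: str) -> str:
    """Give `replacement` the same case pattern as the matched `template`."""
    if template.isupper():
        return replacement.upper()
    if template[:1].isupper():
        return replacement[:1].upper() + replacement[1:]
    return replacement


def propagate(term: str, new_term: str, segments: list[str]) -> list[str]:
    """Replace every whole-word occurrence of `term`, keeping the case of each hit."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    return [
        pattern.sub(lambda m: match_case(m.group(0), new_term), segment)
        for segment in segments
    ]


# Hypothetical segments: one term changed once, propagated in three case variants.
segments = ["Folder not found", "open the folder", "FOLDER LIST"]
for line in propagate("folder", "directory", segments):
    print(line)
```

A real tool would also have to deal with inflected forms and inline tags, which this sketch ignores.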

Translating WordFast TXML files in MemoQ

Please note that this procedure no longer applies since version 5 of memoQ, which can open, translate and export TXML files directly.

(Updated on 2010-09-24 with some corrections, new filters and procedures for pretranslated files, and simplified procedure)

We sometimes receive files that have been processed using the latest version of Wordfast Pro. These are recognizable by the .txml extension.

This format is just a specific XML structure, and as such it should be possible to translate the files using memoQ after formatting them properly. Here is a simple workflow that will allow you to process the files in memoQ.

1. Copy the source segments to the target column

There are three possible situations:

1.a. There are only a few files in the project and you do not need to preserve any 100% pretranslated segments found in the source Wordfast files.

1.b. No matter how many files are in the project, you want to preserve any 100% translated segments contained in the Wordfast files by masking them out in memoQ.

1.c. The project includes many files and you want to perform a batch search/replace (caveat: you will lose any 100% translated segments contained in your Wordfast files).

Situation 1.a: open each file in Wordfast and use the keyboard shortcut Ctrl-Shift-Ins to copy the source segments to the target column, overwriting all target segments no matter what their status is. If there are any translated segments in the file, they will be overwritten. Save and close the files, and proceed to step 2.

Situation 1.b: you will have to open every file and confirm every segment with Wordfast’s shortcut ALT-down arrow. There are some prerequisites:

  • In the Wordfast preferences, go to Translation Memory and enable Copy source on no match.
  • To prevent Wordfast’s sluggish UI from slowing you down, make sure the outline is off (Window > Show view > Outline).
  • For even more speed, switch the Wordfast view to Text mode.

Once you have completed the prerequisites above, place the cursor in the first segment of your Wordfast file and press and hold ALT-down arrow until you have scrolled through the whole file. Save and close the files, and proceed to step 2.

Situation 1.c: open the .txml file using the jEdit text editor, which is available from the jEdit download page. It is important to use this editor because it allows for a very simple search/replace syntax that takes care of “greedy” wildcards. You can obtain the same results using a different editor, but the syntax to use might be different.

After opening the file in jEdit, place the cursor at the top and choose Search > Find…

In the Search for field, insert the string below (be careful not to add superfluous spaces if copying from this page):
<segment(.*?)>(.*?)<source>(.*?)</source>(.*?)</segment>

In the Replace with field, insert the following string:
<segment$1>$2<source>$3</source>$4<target>$3</target></segment>

Check that the search options are configured as in the screenshot below (the strings above only work if regular expressions are enabled):
[screenshot: jEdit search options]

Click on Replace All, save the file and quit jEdit. Proceed to step 2.
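
If you would rather script this step than use an editor, the same search/replace can be sketched in Python. The pattern and the replacement are the two strings given above, with jEdit's `$1`..`$4` back-references written as `\1`..`\4`. The `re.DOTALL` flag, the function name `copy_source_to_target` and the sample fragment are my own assumptions, and real TXML files contain more markup than this one-line example.

```python
import re

# The search string from the procedure above. The lazy (.*?) groups keep each
# match inside one segment.
SEGMENT_PATTERN = re.compile(
    r"<segment(.*?)>(.*?)<source>(.*?)</source>(.*?)</segment>",
    re.DOTALL,  # assumption: a segment may span several lines
)
# The replace string from the procedure above, in Python's back-reference notation.
REPLACEMENT = r"<segment\1>\2<source>\3</source>\4<target>\3</target></segment>"


def copy_source_to_target(txml_text: str) -> str:
    """Append a <target> holding a copy of <source> to every segment."""
    return SEGMENT_PATTERN.sub(REPLACEMENT, txml_text)


# Hypothetical, heavily simplified fragment with one inline tag.
sample = '<segment segmentId="1"><source>Hello <ut x="1"/>world</source></segment>'
result = copy_source_to_target(sample)
print(result)
```

As with the jEdit procedure, work on a copy of the file and run the Wordfast check from step 2 afterwards.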

 

2.  Open the modified file in memoQ

  1. First perform a quick check by opening the file you just saved with Wordfast. Now the target column should be identical to the source column, tags included. The total number of segments should be identical to the value you saw when you first opened the file in Wordfast. After checking this, you can open the file in memoQ.
  2. Add the .xml extension to the file name (e.g. filename.txml.xml), since memoQ likes this better.
  3. Open memoQ and create a new project. Call it for instance “Wordfast”, so you can re-use it easily for subsequent projects that involve translating WordFast files.
  4. Go to Translations > Add document as…
  5. Select the file with .XML extension and open it.
  6. The Document import settings window is displayed.
    If you followed procedure 1.a or 1.c (no preservation of pretranslated material) download this memoQ XML definition file (right click, Save As).
    If you followed procedure 1.b above (preservation of pretranslated material) download this memoQ XML definition file (right click, Save As).
  7. Click on the button at the top of the window to import the file downloaded in sub-step 6 above.
  8. Click OK at the bottom of the window. The window closes and the file is imported.
  9. Open the file in memoQ. memoQ should have inserted any tags in the correct positions corresponding to the tags contained in the Wordfast file. If you followed procedure 1.b (preservation of pretranslated material), the 100% translated segments contained in the original Wordfast files are hidden. They should however be restored when exporting from memoQ.
  10. Translate the file normally.
  11. When ready, export it with Export (dialog)
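
Renaming files by hand gets tedious when a project has many of them. A small Python sketch for adding the .xml extension, and for removing it again before the files go back to Wordfast, could look like this. The function names and the folder argument are my own choices, not part of memoQ or Wordfast.

```python
from pathlib import Path


def add_xml_extension(folder: Path) -> list[Path]:
    """Rename every *.txml file in `folder` to *.txml.xml and return the new paths."""
    renamed = []
    # sorted() builds the full list first, so renaming does not disturb the iteration.
    for path in sorted(folder.glob("*.txml")):
        target = path.with_name(path.name + ".xml")
        path.rename(target)
        renamed.append(target)
    return renamed


def restore_txml_extension(folder: Path) -> list[Path]:
    """Rename every *.txml.xml file in `folder` back to *.txml and return the new paths."""
    renamed = []
    for path in sorted(folder.glob("*.txml.xml")):
        target = path.with_name(path.name[: -len(".xml")])
        path.rename(target)
        renamed.append(target)
    return renamed
```

Call `add_xml_extension(Path("my_project"))` before importing into memoQ and `restore_txml_extension(Path("my_project"))` on the exported files; `my_project` is just a placeholder folder name.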

3.  Check the translated file in Wordfast

  1. Restore the .txml extension and open the translated file in Wordfast. You should get no error messages. Check that the total number of segments is identical to the original files, check the tag positions, etc.
  2. Make one simple modification in the file with Wordfast (e.g. add and delete a space), and save the file. This step will rewrite some specific Wordfast headers and guarantee compatibility.
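
The segment-count check can also be done outside Wordfast as a quick sanity test. The sketch below simply counts opening `<segment` tags, which assumes that every segment in a TXML file is stored as a `<segment>` element, as the search string in step 1.c suggests. The function names and the sample strings are hypothetical.

```python
import re

# Matches the opening tag of a segment; \b prevents a match on longer names
# such as <segments>.
SEGMENT_OPEN = re.compile(r"<segment\b")


def count_segments(txml_text: str) -> int:
    """Count the opening <segment> tags in the text of a TXML file."""
    return len(SEGMENT_OPEN.findall(txml_text))


def same_segment_count(original: str, translated: str) -> bool:
    """True if both files contain the same number of segments."""
    return count_segments(original) == count_segments(translated)


# Hypothetical, simplified file contents.
original = (
    '<txml><segment segmentId="1"><source>A</source></segment>'
    '<segment segmentId="2"><source>B</source></segment></txml>'
)
translated = (
    '<txml><segment segmentId="1"><source>A</source><target>X</target></segment>'
    '<segment segmentId="2"><source>B</source><target>Y</target></segment></txml>'
)
print(count_segments(original), same_segment_count(original, translated))
```

This does not replace opening the file in Wordfast, which also validates the tags and headers.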

memoQ 4.5: What to expect






While attending the “Train the Trainer” course in Pécz last week, I was pleased to hear that the developers are still progressing at a very fast pace, adding interesting features to all new releases of our favorite translation environment tool. Below is a brief outline of what I found out. Be sure to attend the Introducing memoQ 4.5 webinar that is offered tomorrow from 3:30 PM to 4:30 PM CEST to double-check the information I am giving here.

TM Performance improvements

Kilgray showed me a real-life comparison between the current version (4.2) and the new one (4.5) by running an analysis of a particularly complex file (containing many tags and many segments with numbers) against a TM (I do not recall if it was a server or a local TM, but it should not make any significant difference for the purpose of this comparison). Version 4.5 was visibly much faster than 4.2 in this test: I would say four times faster, perhaps more. This is true both for analysis and pretranslation.

LiveDocs

In version 4.5, the “Alignment” tab will be gone, to be replaced by a brand-new concept called LiveDocs. This is a collection of text files, or a “corpus”, which can contain files in different formats (monolingual files paired by a specific function, bilingual files and monolingual documents such as PDFs).

When added to the corpus, all these files are immediately available for fuzzy searches. There is no need to align them. Add multiple files (you are no longer limited to one pair of files as in the alignment feature of the previous versions) from your reference folder, and you will see them appear in the concordance results. If you have the source and the target language version of the same document, you have the chance to highlight terms in the source and in the target to add them to your term base. Apparently, the new alignment algorithm also uses the term base entries and inline tags as reference points for alignment, elevating the score of the sentence containing the term pair or tags for a more precise alignment.

More importantly, when the user makes an alignment correction, the pair of strings is added as a reference point for the ongoing alignment function that works in the background, using the reference point to fine-tune and improve the alignment. So if you spot any misalignment in the corpus while doing a search, you will be able to correct it on the fly, and your correction will immediately be used for optimizing the index. The index is continuously updated in the background, but this feature can be temporarily stopped if it interferes with other resource-hungry applications.

By inserting monolingual files into the corpus, even if no translation is available for them, you will be able to obtain precious suggestions that include more context. Something that a translation memory alone cannot do, since it’s limited to one segment at a time.

Although not ground-breaking (if I’m not mistaken it’s offered, among others, by Multicorpora’s MultiTrans, but at a much, much higher price) this feature seems to have been implemented very nicely. Translators who have dozens of reference files will be able to access them instantaneously from the translation environment. One important concept that may not be obvious from the description above is the following: you will no longer waste time aligning sentences that you probably will never need to use. Instead, you will access useful linguistic data right when you need them.

Online/desktop documents

This will be a very welcome addition for users, like us, who have a memoQ Server. Up to now, when creating a new project your choice has been limited to online projects (with files residing on the server, which implies that the translator has to be online all the time and is unable to create “views” from the documents), and to handoffs, which are local files with references to the server’s resources.

Version 4.5 introduces the concept of “desktop documents”. In practice, we are still dealing with an online project, but the translator will work on local documents and will benefit from superior manipulation options, such as the creation of views. However, the project will still be synchronized on the server, so the project manager will not have to give up precious progress information.

Management interface

This mode has a separate window now. What I understood about this feature is that you will be able to translate a project and at the same time manage it (or another project) without having to go through the annoying step of closing your working project before switching to PM mode. Again, a great time-saver for our team.

Other features

  • It’s now easier to create new users thanks to links available in several places of the interface
  • An InDesign CS5 filter will be available in the upcoming months
  • Reverse lookup in TM will be possible (only for reference TMs)
  • Slider to choose between “better recall” or “faster lookup” from the TM
  • Backup feature for local projects to facilitate moving from one PC to another

UPDATE

Here is some more information I gathered on the memoQ mailing list:

  • LiveAlign is the name for the “on the fly” alignment feature
  • ActiveTM is the one which gives matches from all the documents in the corpus when you’re translating, indeed leveraging your previous assets
  • [...] monolingual indexing à la dtSearch is the Library.
  • Along with ezAttach, which allows the addition of binary files (pictures, music, anything)

… this dynamic quad team constitutes the LiveDocs all-in-one feature. Get the most out of your resources, whether monolingual or bilingual!

Thanks to Livedocs, the time when reference documents were simply ignored by the translator because they were too long to read or only scarce parts were relevant is almost over!

Across Systems Releases Language Server v5 SP1




Across Systems announced Language Server v5 SP1. In brief, here are some of the new features:

  • Web-based Translation Workflows
  • Crowdsourcing
  • Integration of Machine Translation
  • Authoring Assistance

See the full press release here.

memoQfest 2010 - TM Repository

Moravia are the first adopters of the TM Repository.

The main problem with the TM repository so far was: “what is it, and what is it intended for?”

According to statistics, 80% of new software projects fail if there is no clear vision about the project. The TM Repository project has been put on hold for a certain period, but the company has resumed development now that there is such a vision.

Problem statements

TMs can be scattered in hundreds of tiny files.

Or, in a “big mama” type of approach, the TMs are managed by using filters and metadata. A project TM is a TM that only contains the segments (e.g. from a “big mama”) that are relevant to a specific project.

Including the reviewing process in the technology is essential. Some reviewers will refuse to use a translation environment to review a text (this has been solved by memoQ’s RTF columnar export).

TM contents need regular cleanup. What needs to be deleted/fixed: outdated entries, bad translations, “other” garbage.

In technical documentation, there is a lot of terminology that refers to concepts that get deprecated.

Cross-tool TMX transfer does not give good enough leverage. Attributes specified in one tool will often not be transferred to a different tool.

Context information: how can we transfer the context information (saved as hashes in Trados Studio TMs and as clear text in memoQ)? In short, we can’t. The hashing algorithm used in Idiom is patented. In other cases, it’s kept secret.

Features and workflows

  • Single database with TMX import/export. At its current stage TM Repository can only receive TMX files.

  • It accepts any sort of metadata (client, project number, dates, etc.), no matter which tool was used to create the TMX.

  • The TM is extracted from TM Repository and can be modified in any translation tool. The TM then needs to get back into the TM Repository, which offers complex features for comparing the new TM to the current contents of the repository.

  • You can import TMs containing one language pair and merge them into a multi-pair TM, then split it again in sub-pairs.

  • Version control and rollbacks: entry history is included for every translation unit. If you make a mistake, you can always roll back to the previous version

  • TM Repository is tool-agnostic. Every attribute is converted to an attribute system that is specified by the user for the TM Repository. Preconfigured mapping templates are available.

  • Full-text (not fuzzy!) concordance search

  • Search and replace in the repository
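
To illustrate the merge and split of language pairs mentioned in the list above, here is a rough Python sketch that works directly on TMX text. It says nothing about how TM Repository does this internally. The function names and the sample TMs are invented, translation units are keyed on their source text only, and `findtext` drops anything after the first inline tag in a segment, so this is a simplification.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under this qualified name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def read_units(tmx_text: str, source_lang: str) -> dict:
    """Map each source segment to {language: text} for one TMX document."""
    units = {}
    for tu in ET.fromstring(tmx_text).iter("tu"):
        variants = {
            tuv.get(XML_LANG): tuv.findtext("seg", default="")
            for tuv in tu.iter("tuv")
        }
        source = variants.get(source_lang)
        if source is not None:
            units.setdefault(source, {}).update(variants)
    return units


def merge_units(source_lang: str, *tmx_texts: str) -> dict:
    """Merge several single-pair TMs into one multi-pair structure."""
    merged = {}
    for text in tmx_texts:
        for source, variants in read_units(text, source_lang).items():
            merged.setdefault(source, {}).update(variants)
    return merged


def split_pair(merged: dict, target_lang: str) -> dict:
    """Extract one source-to-target sub-pair from the merged structure."""
    return {src: v[target_lang] for src, v in merged.items() if target_lang in v}


# Hypothetical single-pair TMs sharing the same English source.
EN_IT = ('<tmx version="1.4"><header/><body><tu><tuv xml:lang="en"><seg>Save</seg></tuv>'
         '<tuv xml:lang="it"><seg>Salva</seg></tuv></tu></body></tmx>')
EN_DE = ('<tmx version="1.4"><header/><body><tu><tuv xml:lang="en"><seg>Save</seg></tuv>'
         '<tuv xml:lang="de"><seg>Speichern</seg></tuv></tu></body></tmx>')

merged = merge_units("en", EN_IT, EN_DE)
print(merged)
print(split_pair(merged, "de"))
```

Tool-specific attributes and metadata, which are the hard part according to the problem statements above, are not handled here at all.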

Next steps

  • A beta version is available now
  • A connector to memoQ Server is in the works. It will make import/exports from/to memoQ transparent through an integration API.

Specifications

Back-end database application with no direct connection to translation management systems (for now). Requires a dedicated server, as it’s resource-intensive.

In its current state, TM Repository will not allow you to upload a translatable project, run a fuzzy-match analysis, and extract the relevant translation units from the repository, regardless of the metadata. From my personal point of view this is a real show-stopper. I cannot see real potential in this product for a small LSP if it does not offer this kind of functionality.

memoQfest 2010 – “ask the Geeks” Q&A session

New feature for memoQ 4.2 or later: project archival. Backup and restore functionality for the project including TM and TB. Useful for moving projects to a new PC and keeping two PCs in sync.

Speed degradation on network drives: it’s a problem created by the underlying input/output system.

More on archiving and paths. Suggestion to make relative paths the default. This suggestion is being considered.

In 4.2 a brand-new aligner is available. Interface has been reworked, improvements with multiple documents. Segments are editable.

Problems in handling Chinese/Japanese content: The focus from the input method editor shifts away from the main window. Developers are working to solve the problem. It appears to be a rather complex bug. There is also an alleged problem with fonts since version 4.0. If you copy Japanese text from the source to the target and the target uses a Latin-based font, you may get useless squares. However, most users translating from/to Japanese do not have a problem with this.

Dragon Naturally Speaking support: support in the 4.0 version has problems. These should be solved in version 4.2.

Feature request: hide the mouse pointer when the user starts typing. This is probably already supported by your mouse driver.

Plans to have a clone project feature for local projects.

Feature request: search for tags and/or filter for tags. It’s being considered.

Request: make the metadata from the term base visible in the translation environment. In all probability, qTerm will be addressing this issue.

Hunspell problem with the Rumanian dictionary. Perhaps the best solution is to replace the default dictionary with a new one.

The Ctrl-Shift-B and Ctrl-Shift-N keyboard shortcuts move the selection to the left/right.

Request: if the source term is in small caps, insert it as small caps even if it’s capitalized in the target term base.

Fragment-assembling: often short segments from the TM take precedence over the same term from the term base. Sometimes the result is that a term is inserted with wrong capitalization. Perhaps this issue will be solved in qTerm. qTerm will also offer filtering capabilities.

Inverting the “direction” of translation memories. Version 4.5 will introduce features that will make this “problem” obsolete.

Inline tags: if you have translated a project containing bilingual files that use the “old” (inserted by F8) type of tags, and then receive a new version that uses the “new” type of tags (F9), your match rates will decrease because the tags are not substituted on-the-fly. This is something the developers are working on.

Request: LAN access to term bases and translation memories without using memoQ server: this is recognized as an important feature, but it’s not going to happen. Tiny translator groups still have to purchase the server version if they want to share resources.

Request: add a keyboard shortcut to add a term as untranslatable. This is being considered as an addition to future versions.

memoQfest – XLIFF as a bilingual interchange format

Presentation by Thomas Imhof from localix.biz. Just some quick notes here.

Some interesting concepts:

  • allows translators to concentrate on the text rather than on the formatting.
  • standardized exchange of localization data
  • can serve as a common format for localization tool vendors
  • supports review comments, translation status of each string
  • XLIFF allows the target document to be created at any stage
  • Custom namespaces and attribute values make it possible to extend the information included in XLIFF files

Some limitations of XLIFF:

  • XLIFF knows nothing about segmentation. [see comments section. This appears not to be the case]
  • Extensibility is limited to the specific tool that added the specific extra features.
  • Inline elements: XLIFF does not control the filtering process, so the notation of inline elements is entirely in the hands of the translation tool vendor.

XLIFF support in the current translation tools:

Thomas divides XLIFF support in today’s tools into three groups:

level 1: source is copied to target. Considered as “messy”, offered by many translation tools today

level 2: offered by memoQ and Trados Studio. These tools open the files correctly and handle elements more or less correctly. They use custom namespaces for tool-specific functionality.

level 3: offered by Swordfish and Heartsome. These tools offer full support for all functionalities and features and do not add custom namespaces. They use the “note” element offered by XLIFF.

memoQ works well when opening third-party XLIFF files. Roundtrip of SDLXLIFF files produced by Trados Studio works well, but some Trados-specific attributes (e.g. segment status) are not updated.

Best practices:

  • make sure XLIFF file is bilingual and not multilingual
  • alt-trans elements are not supported in memoQ
  • etc.
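
The first two best practices can be checked before a file is handed to a translator. Below is a minimal Python sketch, assuming XLIFF 1.2 files that declare the standard namespace `urn:oasis:names:tc:xliff:document:1.2`. The function names and the sample document are hypothetical, and this is not a feature of memoQ or of any other tool mentioned here.

```python
import xml.etree.ElementTree as ET

# Namespace of XLIFF 1.2 in ElementTree's {uri}tag notation.
XLIFF_NS = "{urn:oasis:names:tc:xliff:document:1.2}"


def language_pairs(xliff_text: str) -> set:
    """Collect the (source-language, target-language) pair of every <file> element."""
    root = ET.fromstring(xliff_text)
    return {
        (f.get("source-language"), f.get("target-language"))
        for f in root.iter(XLIFF_NS + "file")
    }


def is_bilingual(xliff_text: str) -> bool:
    """True if all <file> elements share one fully specified language pair."""
    pairs = language_pairs(xliff_text)
    return len(pairs) == 1 and None not in next(iter(pairs))


def uses_alt_trans(xliff_text: str) -> bool:
    """True if the document contains at least one <alt-trans> element."""
    root = ET.fromstring(xliff_text)
    return next(root.iter(XLIFF_NS + "alt-trans"), None) is not None


# Hypothetical minimal XLIFF 1.2 document.
sample_xliff = (
    '<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">'
    '<file original="a.txt" source-language="en" target-language="it" datatype="plaintext">'
    '<body><trans-unit id="1"><source>Save</source><target>Salva</target></trans-unit></body>'
    '</file></xliff>'
)
print(is_bilingual(sample_xliff), uses_alt_trans(sample_xliff))
```

A file that fails `is_bilingual` or passes `uses_alt_trans` is worth a closer look before importing it.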

memoQfest 2010 – AGITO Translate

Next item on the program is AGITO Translate, a web-based translation environment based on memoQ server. The system is built on the memoQ APIs.

The main feature of this system is its simplicity. Thor Angelo from LanguageWire (the translation company that develops AGITO) admits that although AGITO might be even too simple for some translators, it’s the ideal solution for some clients who require super-fast turnaround and who send frequent, but small chunks of material to be translated. For instance, advertising agencies, web service companies, search engine optimization firms.

AGITO offers a modular approach (term base, translation, editor, integration, authoring, etc.). Clients, as well as translators, can access it through the web interface.

Interesting concept: a brief history outlining the transition of tools from everything offline, to TB and TM online, to documents online, to application online, which is supposed to be the final stage we are getting to now.

AGITO allows translators and proofreaders to access the same document simultaneously. No software installation is required, and project managers can see the real-time status of each job.

Some examples of problems on the user’s end were presented, for instance trouble with installing translation tools, problems with timely delivery, with completeness, etc. AGITO aims to solve these problems by simplifying the whole process on the translator’s end.

During the Q&A session, some concerns were raised by the audience, e.g. spelling control (it’s handled by the browser), quality assessment (there are some basic checks like double spaces etc. but according to LanguageWire a separate proofreader is the way to go). Also, the translator is not allowed to use his/her own translation memories or term bases. Moreover, at the current stage the translator has no local copy of the translation material.

Some concern was expressed about confidentiality. The system is protected by secured passwords. “Just like your home banking system”.

The system is, in theory, ideal for crowdsourced translation projects.

In conclusion, AGITO is certainly not a product that our team of translators would like to use any time soon, but it’s an innovative concept that could be interesting for some agencies that work with very tight deadlines and require multi-user collaboration without the overhead of supporting local installations.

memoQ masterclass by Angelika Zerfass, part 3

memoQ term bases

After the lunch break, the structure of memoQ term base entries was discussed. Angelika explained the import mappings for CSV files.

One trick for importing term bases in the fastest way possible: if you work frequently with one language pair and always use the same term base structure, export a sample term base from memoQ and delete all the content except the rows containing the headings. Then use this CSV template every time by pasting your contents under the column headings. When you have to import the resulting term base into memoQ, you will not need to do any mapping, because the column headings will be correctly accepted and configured by memoQ.
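
The template trick can also be scripted. The sketch below reads the heading row from an exported CSV and writes new terms under the same headings. The column names, the comma delimiter and the function names are assumptions made for the example, so check them against a real memoQ export before relying on them.

```python
import csv
import io


def read_header(exported_csv: str) -> list[str]:
    """Return the heading row of a term base CSV exported from memoQ."""
    return next(csv.reader(io.StringIO(exported_csv)))


def fill_template(header: list[str], rows: list[dict]) -> str:
    """Write `rows` under the exported headings; missing columns are left empty."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=header, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)  # raises ValueError if a row uses an unknown column
    return out.getvalue()


# Hypothetical export with made-up column names.
exported = "English,Italian,Note\nfile,file,\n"
header = read_header(exported)
new_csv = fill_template(header, [{"English": "folder", "Italian": "cartella"}])
print(new_csv)
```

Because the headings are copied from the export rather than typed by hand, the import mapping should stay identical from one project to the next.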

In the current version of memoQ, only 5-6 hard-coded fields are available. While this is probably enough for most translators, organizations that have terminology management systems feel the limitation of this setup. That’s why Kilgray will introduce a brand-new terminology system that will contain custom fields and complex structures.

Terminology plug-ins

memoQ 4.2 offers terminology plug-ins. One example is the EuroTermBank: if you start a term lookup (Ctrl-P), you can type a term and choose to search for it not only in the normal memoQ term bases, but also in the online EuroTermBank database. Kilgray is part of the EuroTermBank consortium and can offer this feature to all its users for free. Needless to say, you need to be online in order to use this plug-in.

Two-column RTF export

Balázs Kis then proceeded to show the brand-new functionality called two-column RTF export. In the presentation, Balázs added a couple of comments to some segments, created a view that only included commented segments, and proceeded to export the view as a two-column RTF file. He then opened the resulting file in Word. The resulting file is a multi-column, editable file that a reviewer can use even if she does not have memoQ. The third column contains a color-coded value of the segment status. The general comment in the room was “this is better implemented than in Déjà Vu”, “great!”. There was even some applause! You could really tell that this was a long-awaited feature.

XLIFF Translator 0.2 is out

XLIFF Translator is a free (MIT license) Windows desktop application for translating XLIFF files. XLIFF is a standard XML format for translation.

Changelog for this version:

  • Support for opening single XLIFF files
  • Wide range of XLIFF files passing test suite
  • Japanese localization finished.

From: Felix Blog