memoQ 4.5: What to expect

While attending the “Train the Trainer” course in Pécs last week, I was pleased to hear that the developers are still progressing at a very fast pace, adding interesting features with each new release of our favorite translation environment tool. Below is a brief outline of what I found out. Be sure to attend the Introducing memoQ 4.5 webinar, offered tomorrow from 3:30 PM to 4:30 PM CEST, to double-check the information I am giving here.

TM Performance improvements

Kilgray showed me a real-life comparison between the current version (4.2) and the new one (4.5) by running an analysis of a particularly complex file (containing many tags and many segments with numbers) against a TM (I do not recall whether it was a server or a local TM, but it should not make any significant difference for the purpose of this comparison). Version 4.5 was visibly much faster than 4.2 in this test; I would say four times faster, perhaps more. This is true for both analysis and pretranslation.

LiveDocs

In version 4.5, the “Alignment” tab will be gone, replaced by a brand-new concept called LiveDocs. This is a collection of text files, or a “corpus”, which can contain files in different formats (monolingual files paired by a specific function, bilingual files, and monolingual documents such as PDFs).

When added to the corpus, all these files are immediately available for fuzzy searches; there is no need to align them. Add multiple files from your reference folder (you are no longer limited to one pair of files, as in the alignment feature of previous versions), and you will see them appear in the concordance results. If you have the source and target language versions of the same document, you can highlight terms in the source and in the target to add them to your term base. Apparently, the new alignment algorithm also uses term base entries and inline tags as reference points, elevating the score of sentence pairs containing the term pair or tags for a more precise alignment.

More importantly, when the user makes an alignment correction, the pair of strings is added as a reference point for the ongoing alignment function that works in the background, using the reference point to fine-tune and improve the alignment. So if you spot any misalignment in the corpus while doing a search, you will be able to correct it on the fly, and your correction will immediately be used for optimizing the index. The index is continuously updated in the background, but this feature can be temporarily stopped if it interferes with other resource-hungry applications.
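To make the idea concrete, here is a toy sketch of how anchor-based alignment scoring might work. The scoring weights, the term base format, and the bonus values are all my own assumptions for illustration, not Kilgray’s actual algorithm:

```python
# Toy sketch: score candidate sentence pairs by length similarity, then boost
# pairs in which a known term pair co-occurs or which the user has confirmed.
# All weights and data structures are invented for illustration.

def pair_score(src, tgt, term_base, anchors, length_weight=1.0, anchor_bonus=2.0):
    """Score a candidate sentence pair for alignment."""
    # Length similarity: 1.0 for identical lengths, lower as they diverge.
    score = length_weight * (min(len(src), len(tgt)) / max(len(src), len(tgt)))
    # Boost pairs where a term and its known translation co-occur.
    for s_term, t_term in term_base:
        if s_term in src and t_term in tgt:
            score += anchor_bonus
    # User corrections act as hard anchors with a large fixed bonus.
    if (src, tgt) in anchors:
        score += 10.0
    return score

term_base = [("invoice", "fattura")]
anchors = set()  # pairs confirmed on the fly by the user

src_sentences = ["Open the invoice.", "Save the file."]
tgt_sentences = ["Aprire la fattura.", "Salvare il file."]

# Pick the best-scoring target sentence for each source sentence.
for src in src_sentences:
    best = max(tgt_sentences, key=lambda t: pair_score(src, t, term_base, anchors))
    print(src, "->", best)
```

In this sketch, a correction made during a concordance search would simply add a pair to `anchors`, immediately influencing subsequent scoring, which mirrors the on-the-fly behavior described above.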

By inserting monolingual files into the corpus, even if no translation is available for them, you will be able to obtain precious suggestions that include more context; this is something a translation memory alone cannot do, since it is limited to one segment at a time.

Although not ground-breaking (if I’m not mistaken, a similar feature is offered by Multicorpora’s MultiTrans, among others, but at a much, much higher price), this feature seems to have been implemented very nicely. Translators who have dozens of reference files will be able to access them instantaneously from the translation environment. One important point that may not be obvious from the description above: you will no longer waste time aligning sentences that you will probably never need. Instead, you will access useful linguistic data right when you need it.

Online/desktop documents

This will be a very welcome addition for users, like us, who have a memoQ Server. Up to now, when creating a new project, your choice has been limited to online projects (with files residing on the server, which implies that the translator has to be online all the time and cannot create “views” from the documents) and handoffs, which are local files with references to the server’s resources.

Version 4.5 introduces the concept of “desktop documents”. In practice, we are still dealing with an online project, but the translator will work on local documents and will benefit from superior manipulation options, such as the creation of views. However, the project will still be synchronized on the server, so the project manager will not have to give up precious progress information.

Management interface

This mode now has a separate window. As I understand it, you will be able to translate a project and manage it (or another project) at the same time, without having to go through the annoying step of closing your working project before switching to PM mode. Again, a great time-saver for our team.

Other features

  • It’s now easier to create new users, thanks to links available in several places in the interface
  • An InDesign CS5 filter will be available in the upcoming months
  • Reverse lookup in TM will be possible (only for reference TMs)
  • Slider to choose between “better recall” or “faster lookup” from the TM
  • Backup feature for local projects to facilitate moving from one PC to another

UPDATE

Here is some more information I gathered on the memoQ mailing list:

  • LiveAlign is the name for the “on the fly” alignment feature
  • ActiveTM is the one which gives matches from all the documents in the corpus when you’re translating, indeed leveraging your previous assets
  • […] monolingual indexing à la dtSearch is the Library.
  • Along with ezAttach, which allows the addition of binary files (pictures, music, anything)

… this dynamic quartet constitutes the LiveDocs all-in-one feature. Get the most out of your resources, whether monolingual or bilingual!

Thanks to LiveDocs, the days when translators simply ignored reference documents because they were too long to read, or because only a few scattered parts were relevant, are almost over!

Dynamic Translation Memory: Using Statistical Machine Translation to improve Translation Memory Fuzzy Matches

I came across this interesting paper by Ergun Biçici and Marc Dymetman. In short:

The paper proposes to perform this automation in the following way: a phrase-based Statistical Machine Translation (SMT) system (trained on a bilingual corpus in the same domain as the TM) is combined with the TM fuzzy match, by extracting from the fuzzy match a large (possibly gapped) bi-phrase that is dynamically added to the usual set of “static” bi-phrases used for decoding the source.
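To illustrate the core idea, here is a deliberately tiny sketch of harvesting a bi-phrase from a fuzzy match and adding it to a phrase table before decoding. The sentences, the assumed target-side alignment, and the phrase table entries are all invented; the actual system uses trained phrase-based SMT models and word alignments, not this naive string matching:

```python
# Toy illustration of "dynamic" bi-phrases: find the longest word span shared
# by the new source sentence and the TM fuzzy match, then add it (with its
# assumed target side) to the static phrase table used for decoding.

def common_span(src_words, tm_src_words):
    """Longest contiguous word sequence shared by the new source and the TM source."""
    best = []
    tm_text = " ".join(tm_src_words)
    for i in range(len(src_words)):
        for j in range(i + 1, len(src_words) + 1):
            span = src_words[i:j]
            if len(span) > len(best) and " ".join(span) in tm_text:
                best = span
    return best

# New sentence to translate, and the fuzzy match retrieved from the TM.
source = "press the red button twice".split()
tm_source = "press the red button once".split()

# "Static" phrase table from the trained SMT model (toy entry).
phrase_table = {"twice": "due volte"}

# Dynamically add the harvested bi-phrase; here we pretend a word aligner
# has already mapped the shared span to its target side.
span = common_span(source, tm_source)
phrase_table[" ".join(span)] = "premere il pulsante rosso"  # assumed alignment

print(phrase_table)
```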

Here is the download link.

Importing the Microsoft Terminology Collection

The latest Tool Kit contains a nice description and details about importing the newly-available Microsoft Terminology Collection into the translation environment of your choice. If your tool does not support the TBX format, however, you will have to transform the data into the proper format (e.g. CSV) before importing it.

The Tool Kit suggests using the excellent XBench for importing the TBX terminology file and exporting it into a comma-separated file. It also warns that XBench drops the “definition” field, which, in my opinion, contains very useful context information. So in this case I’d say XBench is not the way to go.

By digging into the memoQ user discussion forum, I found this useful tidbit of information from Denis Hay:

True, we don’t have official support for TBX yet, but just add ".xml" to your file, open in Excel 2003 or 2007 and save as Unicode text. You will easily be able to import that into any memoQ termbase, picking only those columns you want.

Excellent. This should solve the problem and make the TBX easily accessible even if your favorite translation tool does not support this format natively (as is currently the case with memoQ).
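If you prefer scripting over the Excel detour, the conversion can also be sketched in Python with the standard library, which has the added benefit of keeping the “definition” field that XBench drops. The element names below follow the common TBX structure (termEntry / langSet / tig / term); check them against your actual file, as real-world TBX exports may differ:

```python
# Sketch: parse a TBX (XML) glossary and write a CSV that memoQ or any other
# tool can import, preserving the definition field.
import csv
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under its full namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def tbx_to_rows(tbx_text, src_lang="en-US", tgt_lang="it-IT"):
    root = ET.fromstring(tbx_text)
    rows = []
    for entry in root.iter("termEntry"):
        terms = {}
        for lang_set in entry.iter("langSet"):
            lang = lang_set.get(XML_LANG)
            term = lang_set.find(".//term")
            if term is not None:
                terms[lang] = term.text
        definition = entry.findtext(".//descrip[@type='definition']", default="")
        if src_lang in terms and tgt_lang in terms:
            rows.append([terms[src_lang], terms[tgt_lang], definition])
    return rows

# Minimal made-up TBX fragment for demonstration.
sample = """<martif type="TBX"><text><body>
  <termEntry>
    <descrip type="definition">A security update.</descrip>
    <langSet xml:lang="en-US"><tig><term>patch</term></tig></langSet>
    <langSet xml:lang="it-IT"><tig><term>patch</term></tig></langSet>
  </termEntry>
</body></text></martif>"""

rows = tbx_to_rows(sample)
with open("terms.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["source", "target", "definition"])
    writer.writerows(rows)
```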

Another solution that I tried and found to work flawlessly is Wordfast Pro, which supports TBX out of the box and allows you to export an imported glossary to CSV format. Wordfast Pro is available in a free trial version with some limitations. I’m not sure whether the free version will allow you to import and then export the whole Microsoft glossary, but my guess is that it will.

CopyFlow Gold for InDesign CS5 5.2 released

CopyFlow Gold allows you to export formatted text from InDesign or QuarkXPress documents to a computerized translation system, and then batch-import the translated text back into its original page location, all while preserving the typographic formatting. CopyFlow Gold for Adobe Illustrator is also available.

According to the latest post on the developer’s blog, CopyFlow Gold for InDesign CS5 has received an update.

memoQfest 2010 – TM Repository

Moravia are the first adopters of the TM Repository.

The main problem with the TM repository so far was: “what is it, and what is it intended for?”

According to statistics, 80% of new software projects fail if there is no clear vision about the project. The TM Repository project has been put on hold for a certain period, but the company has resumed development now that there is such a vision.

Problem statements

TMs can be scattered in hundreds of tiny files.

Or, in a “big mama” type of approach, the TMs are managed by using filters and metadata. A project TM is a TM that only contains the segments (e.g. from a “big mama”) that are relevant to a specific project.

Including the reviewing process into the technology is essential. Some reviewers will refuse to use a translation environment to review a text (this has been solved by memoQ’s RTF columnar export).

TM contents need regular cleanup. What needs to be deleted/fixed: outdated entries, bad translations, “other” garbage.

In technical documentation, there is a lot of terminology that refers to concepts that get deprecated.

Cross-tool TMX transfer does not give good enough leverage. Attributes specified in one tool will often not be transferred to a different tool.

Context information: how can we transfer the context information (saved as hashes in Trados Studio TMs and as clear text in memoQ)? In short, we can’t. The hashing algorithm used in Idiom is patented; in other cases, it’s kept secret.
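A small sketch shows why hashed context is a dead end for interoperability: a hash cannot be inverted back to the surrounding text, and without the exact (secret or patented) algorithm another tool cannot even generate matching hashes for comparison. SHA-1 here merely stands in for whatever proprietary scheme a given tool actually uses:

```python
# Clear-text context (memoQ-style) is portable; hashed context is opaque.
import hashlib

def context_hash(previous_segment, following_segment):
    """One-way digest of the surrounding segments (stand-in algorithm)."""
    data = (previous_segment + "\x1f" + following_segment).encode("utf-8")
    return hashlib.sha1(data).hexdigest()

# memoQ-style storage: clear text, comparable by any tool.
clear_context = {"prev": "Click Save.", "next": "The dialog closes."}

# Trados/Idiom-style storage: an opaque digest; the original sentences
# cannot be recovered from it.
hashed_context = context_hash("Click Save.", "The dialog closes.")
print(hashed_context)
```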

Features and workflows

  • Single database with TMX import/export. At its current stage TM Repository can only receive TMX files.

  • It accepts any sort of metadata (client, project number, dates, etc.), no matter which tool was used to create the TMX.

  • The TM is extracted from TM Repository and can be modified in any translation tool. The TM then needs to get back into the TM Repository, which offers complex features for comparing the new TM to the current contents of the repository.

  • You can import TMs containing one language pair and merge them into a multi-pair TM, then split it again in sub-pairs.

  • Version control and rollbacks: entry history is included for every translation unit. If you make a mistake, you can always roll back to the previous version.

  • TM Repository is tool-agnostic. Every attribute is converted to an attribute system that is specified by the user for the TM Repository. Preconfigured mapping templates are available.

  • Full-text (not fuzzy!) concordance search

  • Search and replace in the repository
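The tool-agnostic attribute conversion mentioned above can be pictured as a simple mapping step on import. The property names and the mapping templates below are invented examples for illustration, not the repository’s actual configuration format:

```python
# Sketch: map tool-specific TMX <prop> attributes onto a single user-defined
# repository schema, using preconfigured per-tool mapping templates.

MAPPING_TEMPLATES = {
    "memoQ": {"client": "client", "project": "project_id", "domain": "subject"},
    "TradosStudio": {"customer": "client", "proj": "project_id"},
}

def normalize_attributes(tool, tmx_props):
    """Convert one tool's TMX properties to the repository's attribute system."""
    mapping = MAPPING_TEMPLATES[tool]
    return {mapping[k]: v for k, v in tmx_props.items() if k in mapping}

print(normalize_attributes("TradosStudio", {"customer": "ACME", "proj": "P-42"}))
```

Unmapped properties are silently dropped in this sketch; a real implementation would presumably flag them for the user to extend the template.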

Next steps

  • A beta version is available now
  • A connector to memoQ Server is in the works. It will make import/exports from/to memoQ transparent through an integration API.

Specifications

Back-end database application with no direct connection to translation management systems (for now). Requires a dedicated server, as it’s resource-intensive.

In its current state, TM Repository will not allow you to upload a translatable project, run a fuzzy-match analysis, and extract the relevant translation units from the repository, regardless of the metadata. From my personal point of view this is a real show-stopper. I cannot see real potential in this product for a small LSP if it does not offer this kind of functionality.

memoQfest 2010 – How to be successful in the translation industry

Isabella Moore from Comtec gave a very interesting presentation this morning here at memoQfest in Budapest. I am posting the four final points that summarize the issues she discussed.

Communicate results to staff: this will improve the relationship between company members at all levels (from management to employees) because it gives everyone good feedback about overall performance. Interestingly, this aspect is perceived as very important by employees.

You can’t do everything yourself: learn to delegate and start doing it now.

Join networks, but be selective: some of them can be a waste of time and resources (this includes online networks, in my opinion).

Consider selling to the public sector: a considerable chunk of Comtec’s turnover comes from the public sector. Of course, approaching this sector requires specific strategies that differ from those that apply to the private sector.

memoQfest 2010 – Q&A with Kilgray management

After an in-depth explanation of the “do not press this button” button (short answer: you can press it if you find it in a software program but you shouldn’t press it if you are in a steam bath), a Q&A session started with Kilgray management.

Training opportunities and initiatives will be expanded. Some of the information about trainers is already on Kilgray’s website. Training is seen as a great opportunity for expanding memoQ’s user base.

Suggestion to allow translators to register as memoQ users in order to reach new clients, i.e. memoQ translator marketplace. There are no plans for such a program at this point.

memoQ localized interface and manuals will be available in many more languages very soon.

One recurring request is a stripped-down “server” version that would allow two translators to share resources. The answer may be to adapt the existing offering rather than develop a whole new product.

Another requested feature is the ability to insert short translated texts into a TM without going through the procedure of opening an existing project or starting a new one. There are no plans to add such a feature for the time being.

Final words of praise and a warning from one user: Kilgray, please do not lose your personality and accessibility and become a “corporation”.

The incredible shrinking delivery times

It’s nothing new to note that delivery times have been shrinking steadily over the past 10 years. Everyone in the business knows this is how things are now, and that’s that.

But what are the actual reasons for the ever closer due dates?

Is it always the fault of the unreasonable end client who will not take no for an answer? Is it the fault of the large LSP between yourselves (well, at least ourselves) and the end client? And why is it their responsibility? Have they lost their bargaining power entirely and simply nod and agree to any conditions? Or is it because they find it hard to organize a workflow which involves translator, reviewer, DTP editor, and translator again? Or is it perhaps because they are under the desktop publishers’ thumb and have no choice but to sacrifice the translator’s time?

I ask myself these questions daily, as I incessantly negotiate for extra time with my clients, but the answer can only be guesswork.

My first explanation is that the economy is pushing and prodding end clients to issue their material earlier than reasonably possible. That is plausible. The economy has changed, recently for the worse, but even before that, communications had become so fast that a company simply has to keep up, or else.

My second explanation is that large LSPs are backing down and are no longer in a position to negotiate with a firm hand. I don’t think this was brought on by the recent economic crisis; I believe it started earlier, with the gradual merger and consolidation of the major global language service providers. As a result, the smaller LSPs still in business are no longer prepared to insist on certain (and previously perfectly reasonable) conditions, for fear their clients will go somewhere else. How else can one explain that negotiating delivery dates is not part of the larger LSPs’ routine the way it is part of mine? I often have to urge them to ask for extensions, as if they hadn’t even thought about it.

My third explanation is again a platitude: speed has become far more important than quality, and everyone is happy to live with that. Speed makes languages grow more similar to one another through previously unconscionable loans, and it turns details from paramount to expendable. Speed prevents translators (and reviewers, for that matter) from exercising their minds and their knowledge of the source and target languages; instead, it forces them to fall back on the words and expressions that have come to form their source database for this or that client. And the tighter the deadline, the more this source database is pillaged and abused.

I have no particular qualms of conscience about the way things are going, and I’m not a purist or perfectionist who still clings to a romantic view of translation. I do care about my mind, though, and I know that failing to exercise languages, even your own, amounts to forgetting them. And this simply won’t do.