A tweet I “intercepted” today led me to a potentially interesting tool: demaquina Select.
from the website:
Select is a sidekick tool for preprocessing and boosting work in CAT tools that support the XML Localization Interchange File Format (XLIFF) and Translation Memory eXchange (TMX).
Select offers an unequaled free sub-sentence segmentation ability which, together with its own chunk-based Dual-memory System and Sub-sentence Case-Aware Propagation, delivers ultimate terminology reusability.
With Select, each technical term, common expression or single-word translation is typed ONCE in a lifetime!
From your CAT of choice, export your work to an XLIFF file, create a Select Project and experience incredible time savings with its intelligent sub-sentence term/chunk reuse and case-aware propagation… and many other time-saving features. Then import the XLIFF file back into your CAT and polish your work, from sub-sentence Zero Inconsistency to Perfection… Free to Care About Wording and Semantics…
Eradicate inconsistency at sub-sentence level from existing memories and term bases!
Export your CAT’s memories as TMX, open them with Select and see how easy it is to select segments containing specific terms or expressions, then use the Replace process with Sub-sentence Case-Aware Propagation to eradicate inconsistency across all of them at once!
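Select’s actual algorithm is not documented, but here is a minimal Python sketch of what a sub-sentence case-aware replacement could look like; the term pair and the capitalization rules are my own assumptions, not the tool’s:

```python
import re

def case_aware_replace(text, old, new):
    """Replace every occurrence of `old` with `new`, adapting the
    replacement to the capitalization of each individual match."""
    def adapt(match):
        found = match.group(0)
        if found.isupper():          # COLOUR -> COLOR
            return new.upper()
        if found[0].isupper():       # Colour -> Color
            return new[0].upper() + new[1:]
        return new                   # colour -> color
    return re.sub(re.escape(old), adapt, text, flags=re.IGNORECASE)

segments = [
    "Colour settings are saved per user.",
    "ADJUST THE COLOUR BEFORE PRINTING.",
    "Pick a colour from the palette.",
]
fixed = [case_aware_replace(s, "colour", "color") for s in segments]
# fixed[0] -> "Color settings are saved per user."
```

The point of the propagation step is that one correction is applied to every segment in one pass, with each occurrence keeping its own casing.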
Select is intended to:
Translate the content of a given XLIFF file
If your CAT of choice (or the CAT required by your current project) supports XLIFF (XML Localization Interchange File Format), you can export your work to an XLIFF file and create a Select Project from it. Taking advantage of Select’s Sub-sentence Case-Aware Propagation, you can deal with common expressions and project-specific terminology, such as software UI elements, like never before! Then import the XLIFF file content back into your CAT and resume your work from Zero Inconsistency, Free to Care About Wording and Semantics!…
Edit translation memories (in particular, spot and eradicate inconsistency at sub-sentence level)
If your CAT supports TMX (Translation Memory eXchange), you can export any of your translation memories to a TMX file and create a Select Project from it. Then you can use the Replace process together with the Sub-sentence Case-Aware Propagation options to eradicate inconsistency and safely change wrong terminology!
It is also possible to translate using a TMX file as the interchange file, by creating a “temporary” memory with the content of your source files, in which both the source and target units of each segment are filled with the source text. (See below for how the exported TMX file should look before it is imported into a Select Project.)
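The post does not reproduce the expected TMX layout, so here is a hedged Python sketch that builds such a “temporary” memory: each translation unit carries the source text in both the source and the target unit. The header attributes and language codes are my assumptions, not Select’s documented requirements:

```python
import xml.etree.ElementTree as ET

def make_seed_tmx(segments, src_lang, tgt_lang, path):
    """Build a minimal TMX 1.4 file in which each target segment is
    pre-filled with the source text, ready to be translated."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "seed-script", "creationtoolversion": "1.0",
        "segtype": "sentence", "adminlang": "en-US",
        "srclang": src_lang, "datatype": "plaintext", "o-tmf": "plain",
    })
    body = ET.SubElement(tmx, "body")
    for text in segments:
        tu = ET.SubElement(body, "tu")
        for lang in (src_lang, tgt_lang):
            tuv = ET.SubElement(tu, "tuv", {"xml:lang": lang})
            # the target deliberately repeats the source text
            ET.SubElement(tuv, "seg").text = text
    ET.ElementTree(tmx).write(path, encoding="utf-8", xml_declaration=True)

make_seed_tmx(["Hello world."], "en-US", "fr-FR", "seed.tmx")
```

Once the file is translated, the target units hold the actual translations and the TMX can be imported back as a regular memory.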
Sub-segment leveraging is certainly one area in which modern TEnTs have a lot of potential for improvement. If implemented correctly, it can save time and, most importantly, facilitate consistency without requiring a lot of time spent on creating and tweaking glossaries. I intend to take a close look at this program after the holidays.
While attending the “Train the Trainer” course in Pécs last week, I was pleased to hear that the developers are still progressing at a very fast pace, adding interesting features to every new release of our favorite translation environment tool. Below is a brief outline of what I found out. Be sure to attend the Introducing memoQ 4.5 webinar that is offered tomorrow from 3:30 PM to 4:30 PM CEST to double-check the information I am giving here.
TM Performance improvements
Kilgray showed me a real-life comparison between the current version (4.2) and the new one (4.5) by running an analysis of a particularly complex file (containing many tags and many segments with numbers) against a TM (I do not recall whether it was a server or a local TM, but it should not make any significant difference for the purpose of this comparison). Version 4.5 was visibly much faster than 4.2 in this test, I would say four times faster, perhaps more. This is true both for analysis and for pretranslation.
In version 4.5, the “Alignment” tab will be gone, replaced by a brand-new concept called LiveDocs. This is a collection of text files, or a “corpus”, which can contain files in different formats (monolingual files paired by a specific function, bilingual files, and monolingual documents such as PDFs).
When added to the corpus, all these files are immediately available for fuzzy searches. There is no need to align them. Add multiple files (you are no longer limited to one pair of files as in the alignment feature of the previous versions) from your reference folder, and you will see them appear in the concordance results. If you have the source and the target language version of the same document, you have the chance to highlight terms in the source and in the target to add them to your term base. Apparently, the new alignment algorithm also uses the term base entries and inline tags as reference points for alignment, elevating the score of the sentence containing the term pair or tags for a more precise alignment.
More importantly, when the user makes an alignment correction, the pair of strings is added as a reference point for the ongoing alignment function that works in the background, using the reference point to fine-tune and improve the alignment. So if you spot any misalignment in the corpus while doing a search, you will be able to correct it on the fly, and your correction will immediately be used for optimizing the index. The index is continuously updated in the background, but this feature can be temporarily stopped if it interferes with other resource-hungry applications.
By inserting monolingual files into the corpus, even if no translation is available for them, you will be able to obtain precious suggestions that include more context. Something that a translation memory alone cannot do, since it’s limited to one segment at a time.
Although not ground-breaking (if I’m not mistaken it’s offered, among others, by Multicorpora’s MultiTrans, but at a much, much higher price) this feature seems to have been implemented very nicely. Translators who have dozens of reference files will be able to access them instantaneously from the translation environment. One important concept that may not be obvious from the description above is the following: you will no longer waste time aligning sentences that you probably will never need to use. Instead, you will access useful linguistic data right when you need them.
This will be a very welcome addition for users, like us, who have a memoQ Server. Up to now, when creating a new project, your choice has been limited to online projects (with files residing on the server, which implies that the translator has to be online all the time and is unable to create “views” from the documents) and to handoffs, which are local files with references to the server’s resources.
Version 4.5 introduces the concept of “desktop documents”. In practice, we are still dealing with an online project, but the translator will work on local documents and will benefit from superior manipulation options, such as the creation of views. However, the project will still be synchronized on the server, so the project manager will not have to give up precious progress information.
Project management mode now has a separate window. As I understand it, you will be able to translate a project and at the same time manage it (or another project) without having to go through the annoying step of closing your working project before switching to PM mode. Again, a great time-saver for our team.
- It’s now easier to create new users, thanks to links available in several places in the interface
- An InDesign CS5 filter will be available in the upcoming months
- Reverse lookup in TM will be possible (only for reference TMs)
- Slider to choose between “better recall” or “faster lookup” from the TM
- Backup feature for local projects to facilitate moving from one PC to another
Here is some more information I gathered on the memoQ mailing list:
- LiveAlign is the name for the “on the fly” alignment feature
- ActiveTM is the one which gives matches from all the documents in the corpus when you’re translating, indeed leveraging your previous assets
- [...] monolingual indexing à la dtSearch is the Library.
- Along with ezAttach, which allows the addition of binary files (pictures, music, anything)
… together, this dynamic quartet constitutes the all-in-one LiveDocs feature. Get the most out of your resources, whether monolingual or bilingual!
Thanks to LiveDocs, the days when reference documents were simply ignored by the translator, because they were too long to read or because only a few scattered parts were relevant, are almost over!
Moravia are the first adopters of the TM Repository.
The main problem with the TM repository so far was: “what is it, and what is it intended for?”
According to statistics, 80% of new software projects fail if there is no clear vision about the project. The TM Repository project has been put on hold for a certain period, but the company has resumed development now that there is such a vision.
TMs can be scattered in hundreds of tiny files.
Or, in a “big mama” type of approach, the TMs are managed by using filters and metadata. A project TM is a TM that only contains the segments (e.g. from a “big mama”) that are relevant to a specific project.
Including the reviewing process into the technology is essential. Some reviewers will refuse to use a translation environment to review a text (this has been solved by memoQ’s RTF columnar export).
TM contents need regular cleanup. What needs to be deleted/fixed: outdated entries, bad translations, “other” garbage.
In technical documentation, there is a lot of terminology that refers to concepts that get deprecated.
Cross-tool TMX transfer does not give good enough leverage. Attributes specified in one tool will often not be transferred to a different tool.
Context information: how can we transfer the context information (saved as hashes in Trados Studio TMs and as clear text in memoQ)? In short, we can’t. The hashing algorithm used in Idiom is patented. In other cases, it’s kept secret.
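To illustrate why context cannot cross tool boundaries, here is a hypothetical Python sketch; sha1 is a stand-in, since the real hashing algorithms are patented or kept secret:

```python
import hashlib

def hash_context(prev_segment, next_segment):
    """Tool A stores context as opaque hashes of the surrounding
    segments (sha1 here stands in for a proprietary algorithm)."""
    def h(s):
        return hashlib.sha1(s.encode("utf-8")).hexdigest()[:8]
    return h(prev_segment), h(next_segment)

def clear_context(prev_segment, next_segment):
    """Tool B simply stores the surrounding segments as clear text."""
    return prev_segment, next_segment

prev, nxt = "Open the lid.", "Close the lid."
fingerprint_a = hash_context(prev, nxt)   # opaque, tool-specific
fingerprint_b = clear_context(prev, nxt)  # human-readable
```

Clear text can always be hashed, but a hash can never be turned back into clear text, so even in the best case the transfer only works in one direction, and only if the receiving tool’s algorithm is known.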
Features and workflows
Single database with TMX import/export. At its current stage TM Repository can only receive TMX files.
It accepts any sort of metadata (client, project number, dates, etc.), no matter which tool was used to create the TMX.
The TM is extracted from TM Repository and can be modified in any translation tool. The TM then needs to get back into the TM Repository, which offers complex features for comparing the new TM to the current contents of the repository.
You can import TMs containing one language pair and merge them into a multi-pair TM, then split it again in sub-pairs.
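As a rough illustration of the merge/split idea, here is a Python sketch; the data model is entirely my own and says nothing about how TM Repository stores its units:

```python
def merge_pairs(*tms):
    """Merge single-pair TMs, given as lists of (source, lang, target)
    tuples, into one multi-pair structure keyed by source text."""
    multi = {}
    for tm in tms:
        for src, lang, tgt in tm:
            multi.setdefault(src, {})[lang] = tgt
    return multi

def split_pair(multi, lang):
    """Extract a single sub-pair TM back out of the multi-pair TM."""
    return [(src, lang, targets[lang])
            for src, targets in multi.items() if lang in targets]

en_fr = [("File", "fr", "Fichier"), ("Edit", "fr", "Edition")]
en_de = [("File", "de", "Datei")]
multi = merge_pairs(en_fr, en_de)
# multi["File"] now holds both the French and the German target
```

The appeal is that a source segment translated into several languages is stored once, and any language pair can be extracted on demand.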
Version control and rollbacks: entry history is included for every translation unit. If you make a mistake, you can always roll back to the previous version.
TM Repository is tool-agnostic. Every attribute is converted to an attribute system that is specified by the user for the TM Repository. Preconfigured mapping templates are available.
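A mapping template might conceptually work like the Python sketch below; the tool names and attribute names are invented for illustration and are not the product’s actual vocabulary:

```python
# Hypothetical mapping templates: each tool's attribute names on the
# left, the user-defined repository attribute system on the right.
MAPPING_TEMPLATES = {
    "toolA": {"client": "customer", "project": "project_id"},
    "toolB": {"CU": "customer", "PR": "project_id"},
}

def to_repository(tool, attrs):
    """Convert one tool's translation-unit attributes into the
    repository's unified attribute system."""
    template = MAPPING_TEMPLATES[tool]
    return {template[k]: v for k, v in attrs.items() if k in template}
```

The same unit exported from either tool then carries identical metadata once it is inside the repository, which is what makes tool-agnostic filtering possible.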
Full-text (not fuzzy!) concordance search
Search and replace in the repository
- A beta version is available now
- A connector to memoQ Server is in the works. It will make import/exports from/to memoQ transparent through an integration API.
Back-end database application with no direct connection to translation management systems (for now). Requires a dedicated server, as it’s resource-intensive.
In its current state, TM Repository will not allow you to upload a translatable project, run a fuzzy-match analysis, and extract the relevant translation units from the repository, regardless of the metadata. From my personal point of view this is a real show-stopper. I cannot see real potential in this product for a small LSP if it does not offer this kind of functionality.
TMbuilder is a small tool that makes building TM export/import files as straightforward as possible. You can use it to batch-import several files in Excel (2003 or 2007) or tab-delimited format and build a Trados-compatible or TMX 1.4b file with a couple of mouse clicks. Here are some more details about the features:
- Accepts two input formats: tab-delimited text files and MS Excel spreadsheets
- Creates output files in two formats: Translator’s Workbench 7.x/8.x (TXT) and Translation Memory eXchange (TMX)
- Works on multiple input files and offers a merging feature, so everything can end up in a single import file
- Allows the user to specify standard TM fields, such as source and target ISO flags, segment descriptions and author name
- Removes the extra quotes often created by MS Excel when saving a file in text form
- Works with standard encodings: Unicode (UTF-16) and UTF-8
- Rapid file creation: milliseconds for .txt and seconds for .xls input files
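For readers curious about what such a conversion involves, here is a minimal Python sketch of the tab-delimited-to-TMX step. This is not TMbuilder’s actual code, and the header attributes and default language codes are my own assumptions:

```python
import csv
import xml.etree.ElementTree as ET

def tab_to_tmx(txt_path, tmx_path, src_lang="EN-US", tgt_lang="DE-DE"):
    """Convert a two-column tab-delimited file (source<TAB>target per
    line) into a minimal TMX 1.4 file, stripping the stray quotes
    that Excel tends to add when saving as text."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "tab_to_tmx", "creationtoolversion": "0.1",
        "segtype": "sentence", "adminlang": "en-US",
        "srclang": src_lang, "datatype": "plaintext", "o-tmf": "txt",
    })
    body = ET.SubElement(tmx, "body")
    with open(txt_path, encoding="utf-8", newline="") as fh:
        for src, tgt in csv.reader(fh, delimiter="\t"):
            tu = ET.SubElement(body, "tu")
            for lang, text in ((src_lang, src), (tgt_lang, tgt)):
                tuv = ET.SubElement(tu, "tuv", {"xml:lang": lang})
                ET.SubElement(tuv, "seg").text = text.strip('"')
    ET.ElementTree(tmx).write(tmx_path, encoding="utf-8",
                              xml_declaration=True)
```

A real converter would add error handling for ragged rows and the Workbench TXT output format, but the structural core, one `tu` per line with one `tuv` per language, is this simple.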
The application is free for non-commercial use and can be distributed as a standalone executable program. It requires Microsoft .NET Framework 3.5.
TMbuilder – the easiest Translation Memory export creator
Yesterday I received the latest edition of the LogiTerm newsletter. You can download it here. There are some interesting announcements:
- Agreement with SYSTRAN:
Terminotix has entered into an agreement with SYSTRAN to add machine translation solutions to the Terminotix product line.
- YouAlign completely free:
YouAlign, the text alignment website launched by Terminotix in August 2009, was supposed to be free for a limited time only, but is now completely free. YouAlign lets you quickly and easily create HTML bitext and TMX translation memory files from pairs of input files. Bitext and translation memory files generated by YouAlign can be downloaded for use with bilingual full-text search engines and translation memory systems. No software to install — everything is done through your web browser.
- SynchroTerm 2010 released:
The 2010 release of SynchroTerm, the powerful bilingual term extraction program, is now available. Enhancements include optimized memory use for handling larger files; support for Greek, Dutch, Hungarian, Norwegian, Polish and Turkish;
First there’s “90% of in-country reviews are a waste of time”, from the Medical Translation Blog, where, after a somewhat provocative title, the author explains the difference between theoretical, ideal situations and the hard facts of in-country reviews, which are often marred by the following problems:
- Lack of information sharing (e.g., no reference materials)
- Lack of understanding regarding brand
- Review schedules that are “black holes”
- Clarity of review changes is lacking (ever try reading a French doctor’s handwriting?)
- Mechanics fail (file exchanges don’t work, changes are entered inconsistently)
- Quality of review changes (linguistic, technical errors are introduced)
Then there’s “Quality translation dictates a collaborative effort”, from the Translation and Software Localization Blog, which can be considered a retort of sorts to the previous post. The author adopts concepts from control theory to explain how in-country review is in fact an essential step of the translation process.
I think that the two articles complement each other and really support the idea that quality control, when done properly, can make a huge difference for the final quality of any translated or localized product. In conclusion, two interesting reads.
Yesterday Alchemy announced PUBLISHER 2.0, its advanced translation memory solution. The product contains three modules:
- Analysis Expert: allows you to re-use previously translated content by analyzing different types of translation memory formats (Catalyst TTK, Wordfast, SDL Trados, SDL TM Server, SDL Idiom TM, and TMX)
- Translate Expert: matches previously translated content to new content, in order to (guess what?) reduce the number of words being sent to translators
- Clean Up Expert: creates the localized version of the translated content and updates the translation memories
PUBLISHER 2.0 supports several source formats, including FrameMaker, Word, XPress, InDesign, several Help authoring systems, and Web and “tagged” formats (HTML, XML, PHP, JSP, etc.)
This is one of the major announcements from Alchemy since its acquisition by Translations.com last spring.
For the full announcement text and trial download:
Alchemy Publisher 2.0