The translation companies began investing in the various CAT tool programs approximately ten years ago. They have been steadily storing up growing “libraries” of source-language documents with their translations. This means that the types and quantities of text that can be processed with the help of these programs are expanding steadily. It also means that in the future there will be less need for translations done “from scratch” and more need for editing, “filling in of blanks” (translation of words and sentences not already stored in memory), adaptation of “fuzzy-match” sentences, and general “smoothing” of these canned translations culled from CAT-tool memories. There is likely also to be more need for pre- and post-editing of machine-produced translations.
In other words, freelance translators who subcontract to translation service companies will, to a great extent, need to stop being writers and become good copy-editors, proofreaders, and data managers. The same applies to those working for other types of clients, since awareness of machine translation programs and CAT tools, and insistence on their use, is increasing among corporate users of translations as well. Unfortunately, it is a long-standing truism in the publishing industry that writers tend not to make good copy-editors and proofreaders.
Systran and MultiCorpora integrate technologies for increased translation quality and volume
Gatineau, QC, Canada – January 8, 2009: MultiCorpora announced today that it has signed a technology and OEM agreement with Systran to add machine translation functionality to the MultiTrans client-server application.
Felix version 1.4.4 released
Published by Ryan Ginstrom on Jan 17th, 2009 in Felix, release
I’ve just made a quick release of version 1.4.4 of Felix in order to fix two bugs: one had to do with glossary lookup, and the other had to do with a GUI bug in the Memory Manager dialog.
Get the latest version here.
Since the previous post generated some interest, I have decided to put the program to the test. I saved the page from http://www.apple.com/mac/ as a local HTML file and fed it to the program. Here’s a screenshot of Caterpillar’s main window:
Just place the source files in the folder indicated by the In path, choose an extraction path and hit Extract. The processing speed on this single, short page was very high and the resulting TXT file was created instantaneously.
Here’s what the program’s output looks like:
As you can see, after the first couple of lines containing headers that will help the program reconstruct the file after it has been translated, the structure contains the following fields:
The translator will then have to translate all the fields preceded by “Target=” (using the CAT-tool of choice) and then reconstruct the translated HTM file by using Caterpillar’s Integration command.
Here’s what I found out during this brief test:
- The tags are completely taken out of the equation when the text is converted to TXT. This can be either good or bad, depending on the translator’s tastes and on the type and complexity of the file being processed
- The program does not assign internal/external styles to the file, so a translator who wants to use a CAT tool has two options: either move the cursor to the beginning of the “Target=” line after translating each sentence, or prep the file by assigning the “translatable” attribute to the Target sentences and making the rest of the text untranslatable
- I noticed that if the HTML file contains diacritics (accented letters) or characters that are rendered by using Unicode in the HTML file, these become corrupted during the conversion to the translatable TXT file. This issue might or might not be addressed by the Encoding Converter option available in the program, which I did not test
- Caterpillar has an option to merge the source files into a single translatable TXT file. This sounds particularly interesting for translators who like the auto-propagation feature offered by some CAT tools, and for complex projects comprising hundreds of tiny HTML files in multiple sub-folders
- Interestingly, the file types that are available for processing include (besides HTML) PHP, XML and ASP. I did not test all these formats. However, I did test the program with one of those dreaded XML files that contain embedded HTML code. Surprisingly, Caterpillar did a decent job of extracting the translatable text. On the downside, the program creates a segmentation break at each tag that is preceded and followed by translatable text, so the following code:
<p><b>Note:</b> When searching, look for the <img src="/images/search/plus_icon.gif" width=9 height=9> icon to see which items are only available to ACME <span class="hlt">Plus</span> customers.]]></Data></Cell>
is rendered as follows:
Source=When searching, look for the
Target=When searching, look for the
ID=133
Type=text
Source=icon to see which items are only available to ACME
Target=icon to see which items are only available to ACME
ID=134
Type=text
Source=Plus
Target=Plus
ID=135
Type=text
Source=customers.]]>
Target=customers.]]>
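To make the segmentation behaviour easier to picture, here is a minimal Python sketch of a tag-stripping extractor that starts a new segment at every tag. It is not Caterpillar's actual implementation: the `SegmentExtractor` class, the `extract_records` function and the exact record layout (ID, Type, Source, Target on separate lines) are assumptions based only on the excerpt above. It uses the standard library's `html.parser`, which decodes character references such as `&eacute;` to Unicode before the text is written out.

```python
from html.parser import HTMLParser


class SegmentExtractor(HTMLParser):
    """Collect each run of text between two tags as its own segment."""

    def __init__(self):
        # convert_charrefs=True decodes entities (e.g. &eacute;) to Unicode
        super().__init__(convert_charrefs=True)
        self.segments = []

    def handle_data(self, data):
        text = data.strip()
        if text:  # ignore whitespace-only runs between tags
            self.segments.append(text)


def extract_records(html, first_id=1):
    """Return one hypothetical Source=/Target= record per text run."""
    parser = SegmentExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    records = []
    for offset, text in enumerate(parser.segments):
        # Target starts out as a copy of Source, ready to be overwritten
        records.append(
            f"ID={first_id + offset}\nType=text\nSource={text}\nTarget={text}"
        )
    return records


sample = (
    '<p><b>Note:</b> When searching, look for the '
    '<img src="/images/search/plus_icon.gif" width=9 height=9> icon to see '
    'which items are only available to ACME '
    '<span class="hlt">Plus</span> customers.</p>'
)
records = extract_records(sample)
print("\n".join(records))
```

Because every tag ends the current text run, the single sentence in the sample is split into several records, which is the same fragmentation observed in the test. A more translator-friendly extractor would have to treat inline tags such as `<b>`, `<img>` and `<span>` as part of the sentence instead of as segment boundaries.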
In conclusion, this tool still requires some extra work to prep the files for processing with a CAT tool, and it takes a rather radical approach to tags (by deleting them from the working file). Even so, it might prove a useful addition to the utilities folder of translators who use basic CAT tools that cannot prep HTML and tagged files, and of advanced users who need a quick way of simplifying complex tagged files, for instance XML with embedded HTML.
However, Caterpillar's reliability should be tested carefully before using the tool on a large-scale project.
At 30.00 EUR, this inexpensive tool may be a valuable solution for translators who are using tools that do not have file-prepping capabilities (i.e. externalizing or separating and protecting HTML code that does not need to be translated or otherwise changed).
Has anyone heard of the tool or used it for translation projects?
Caterpillar is a high-speed HTML Text Extractor and Integrator written for translators working with web sites. Process whole folders of web pages with a single click, then translate using your choice of software.
By generating a single output file containing all the text requiring translation Caterpillar provides a simple way to incorporate web page localisation into your existing translation work flow.
WordFast compatible – now you can translate web sites in the familiar environment of MS Word.
Globalizer.NET provides a complete solution for localizing software built using Microsoft’s .NET development platform.
Here are some of the features, straight from the developer’s information page:
- Globalizer.NET can be used to localize ASP.NET and Windows Forms applications and components written in Visual Studio 2005 and 2008 using C#, VB.NET and C++.
- Globalizer.NET can localize the Windows Installer (MSI) files generated by Visual Studio and other tools, into any language
- Automatically scan Visual Studio Projects and Solutions for resources to translate
- Globalizer.NET can generate the language specific resource (resx) files for your application and automatically add these into your Visual Studio projects.
- Build the satellite assemblies (that contain language specific resources) for your application directly using Globalizer.NET. Developers can use this to build localized versions of an application without having to modify the Visual Studio project. Translators can use this mechanism to build and test localized versions of your application without them requiring access to your source code or needing to purchase and use Visual Studio.
- Translators can install and use the free Translators Edition of Globalizer.NET to translate your applications at no additional cost to you.
- Test your localized Application directly from Globalizer.NET using a selected locale without having to use the Control Panel to set the locale
- Allows Translators to preview application forms and controls directly from within Globalizer.NET.
- Supports importing and exporting industry standard Translation Memory eXchange (TMX) format files to allow interoperability with other Computer Aided Translation (CAT) tools
- Automatically translate duplicate strings
- Filter and sort the displayed translations based on a variety of criteria
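For readers unfamiliar with TMX, the exchange format mentioned in the feature list, the following Python sketch builds and reads back a tiny translation memory. It is a generic illustration of the TMX 1.4 layout (a `header`, then a `body` of `tu` translation units, each holding one `tuv` per language with a `seg` element), not a description of what Globalizer.NET itself writes. The function names, the header values and the sample strings are assumptions made up for the example.

```python
import xml.etree.ElementTree as ET

# ElementTree spells the xml:lang attribute with the XML namespace URI
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_tmx(pairs, src_lang="en-US", tgt_lang="fr-FR"):
    """Build a minimal TMX 1.4 tree from (source, target) string pairs."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(
        tmx,
        "header",
        {
            "creationtool": "example-script",  # hypothetical tool name
            "creationtoolversion": "0.1",
            "segtype": "sentence",
            "o-tmf": "unknown",
            "adminlang": "en-US",
            "srclang": src_lang,
            "datatype": "plaintext",
        },
    )
    body = ET.SubElement(tmx, "body")
    for source, target in pairs:
        tu = ET.SubElement(body, "tu")  # one translation unit per pair
        for lang, text in ((src_lang, source), (tgt_lang, target)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return tmx


def read_tmx(tmx):
    """Return each translation unit as a {language: segment text} dict."""
    return [
        {tuv.get(XML_LANG): tuv.findtext("seg") for tuv in tu.iter("tuv")}
        for tu in tmx.iter("tu")
    ]


tmx = build_tmx([("Save", "Enregistrer"), ("Cancel", "Annuler")])
xml_text = ET.tostring(tmx, encoding="unicode")
round_trip = read_tmx(ET.fromstring(xml_text))
print(xml_text)
print(round_trip)
```

Since the units are plain language-to-text pairs, any tool that reads TMX can reuse them, which is what makes the format useful for moving translations between CAT tools.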
Here is some pricing information:
|License|Price|
|---|---|
|Basic License (allows up to 1000 translations per workspace)|$120|
|Standard License (allows up to 5000 translations per workspace)|$240|
|Pro License (unlimited translations)|$480|
The Translate Toolkit website reports the launch of Virtaal, a “graphical translation tool”.
Here is a list of features:
Support for many localization file formats:
Gettext (.po and .mo)
WordFast TM (.txt)
Qt Linguist (.ts)
Qt Phrase Book (.qph)
Ideal for beginners:
Simple and intuitive layout
Highlighting of XML and escape characters
Displays comments from programmers and previous translators
Displays context (like msgctxt in PO)
Localisation guide available from the Help menu
Fast and easy navigation within the file
Auto-correction based on OpenOffice.org’s auto-correction data files
Auto-completion modelled after OpenOffice.org
Automatic sensing of the initial cursor position
Copying original string to target string taking your language’s punctuation rules into account
Easily find your work by moving between the units that are untranslated or fuzzy
Searching with regular expressions and Unicode normalisation
Spell checker for translation and original text (might not work on all Windows platforms)
Debug compiled application translations by opening .mo files directly
Kilgray announced version 3.0 of MemoQ. In short, here are some of the new features:
- Full support for XLIFF
- New spell checker
- New term base
- Filtering locked/unlocked segments
- New XML filter
- Support for proofreading
- Online document management add-in
- Enterprise license management
- New terminology moderation interface
- Server TM hit limit (you can specify the maximum number of TM hits retrieved from a server)
- Server admin enhancements
* Tested to work with OpenOffice.org 2.0.2
* Added keyboard modifier selection (Alt, Ctrl, Shift).
* Translation in text tables.
* Fixed bug in glossary loading.
* Source/Target locale now checking against OOo locales.
* Glossary and TM items checked before loading
* Fixed error 1971835 “TMX export fails if TM has empty paragraphs”
Blogoscoped has a new post about “Google Translation Center”. Here is a short excerpt:
According to the Google explanations on the frontpage and their product overview page, we can see this is meant to be a translation service which offers both volunteers and professional translators… and I suppose at least the professionals will want to get paid.
More than an all-in-one stop for paid translations as some of the competing services in this field, the Google Translation Center looks like it aims to be a marketplace coordinator and tool provider.
This is a highly recommended read that presents some interesting hypotheses about future developments in translation services.