About this blog
Translator's Shack is a collection of links, news, reviews and opinions about translation technologies. It is edited and updated by Roberto Savelli, an English-to-Italian translator, project manager and owner of Albatros Soluzioni Linguistiche, a team of English-Italian translators that hosts and supports this blog.
The Life as a PM category, managed by Gabriella Ascari, contains topics that are less technical in nature, but which we're sure will be appreciated by owners of small translation businesses and freelancers.
CopyFlow Gold allows you to export formatted text from InDesign or QuarkXPress documents to a computerized translation system and then batch-import the translated text back into its original page location, preserving all typographic formatting. CopyFlow Gold for Adobe Illustrator is also available.
According to the latest post on the developer’s blog, CopyFlow Gold for InDesign CS5 has received an update.
TMbuilder is a small tool that makes building TM export/import files as straightforward as possible. You can use it to batch-import several files in Excel (2003 or 2007) or tab-delimited format and build a Trados-compatible file or a TMX 1.4b file with a couple of mouse clicks. Here are some more details about its features:
- Accepts two input formats: tab-delimited text files and MS Excel spreadsheets
- Creates output files in two formats: Translator’s Workbench 7.x/8.x (TXT) and Translation Memory eXchange (TMX)
- Works on multiple input files and can merge them into a single import file
- Lets the user specify standard TM fields, such as source and target ISO language codes, segment descriptions and author name
- Removes the extra quotes that MS Excel often adds when saving a file as text
- Works with standard encodings: Unicode and UTF-8
- Fast file creation: milliseconds for .txt and seconds for .xls input files
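To give an idea of what a converter like this does under the hood, here is a minimal sketch that turns tab-delimited “source, tab, target” lines into a TMX 1.4 file. It is not TMbuilder’s code: the function name `build_tmx` and the `creationtool` value are made up, only the TMX output is covered (not the Workbench TXT format), and Python’s `csv` module takes care of the extra quotes Excel adds.

```python
import csv
import io
import xml.etree.ElementTree as ET


def build_tmx(tsv_text: str, src_lang: str, tgt_lang: str, author: str) -> str:
    """Turn 'source<TAB>target' lines into a minimal TMX 1.4 document."""
    tmx = ET.Element("tmx", version="1.4")
    # Attributes required by the TMX 1.4 header; the tool name is hypothetical.
    ET.SubElement(tmx, "header", {
        "creationtool": "tsv2tmx-sketch",
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "tab-delimited",
        "adminlang": "en-US",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    # csv undoes Excel-style quoting: '"Click ""OK"""' becomes 'Click "OK"'.
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if len(row) < 2 or not row[0].strip():
            continue  # skip blank or incomplete lines
        tu = ET.SubElement(body, "tu", creationid=author)
        for lang, text in ((src_lang, row[0]), (tgt_lang, row[1])):
            tuv = ET.SubElement(tu, "tuv", {"xml:lang": lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")
```

For example, `build_tmx('Save\tSalva\n"Click ""OK"""\tFare clic su "OK"\n', "en-US", "it-IT", "RS")` returns a TMX string with two translation units, the second one reading `Click "OK"` / `Fare clic su "OK"` without Excel’s doubled quotes.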
The application is free for non-commercial use and can be distributed as a standalone executable program. It requires Microsoft .NET Framework 3.5.
TMbuilder – the easiest Translation Memory export creator
The Translation Automation User Society (TAUS) offers a desktop program that can be used to search its database. The TAUS Data Association comprises 45 organizations, including well-known companies such as Intel, Dell and eBay, and large language service providers such as Lionbridge and SDL.
One of the main reasons that brought these companies together to share their large translation memories and glossaries is to improve existing machine translation systems. However, the TAUS database is also a very valuable source of bilingual texts in many languages, and it is freely searchable (registration required) through a Web interface. The widget goes one step further by bringing this powerful search tool to the desktop.
Let’s take a look at this “widget”:
Here I ran a simple search for the common term “taskbar”. The results include dozens of human-translated segments with the term highlighted in the source (and in the target too, after the system somehow works out its translation by analyzing the words that make up the search term).
The user can narrow the search (and I think the available options will vary by language combination) by selecting specific industries (e.g. hardware, software, business services), data owners (e.g. ABBYY, Adobe, Dell, etc.) and content types (instructions, marketing material, software strings, etc.).
The search is fast and accurate and it displays the data in a clear two-column layout. Users can interact with the database by reporting problems with specific segments or sentences (just click on the grey “X” to the right of the segment.)
The widget requires registration, is multi-platform and runs on the Java Runtime Environment.
TAUS Widget | TDA
PrepTags is file preparation software designed to prepare a wide range of formats using a powerful regular expression engine. It can “prep” tagged files such as HTML, PHP, XML, ASP, JavaScript, SQL, PO, etc. by converting them to RTF and protecting the code. Once a file is prepared, translators can use their regular CAT tool to translate it. PrepTags-prepared files can be translated with Wordfast, Trados, Deja Vu, MemoQ and any other tool that supports prepared RTF files (a format originally designed for Trados).
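The core idea behind regex-based prepping can be sketched in a few lines: a regular expression marks what counts as code, everything between matches is translatable text, and only the text chunks are handed to the translator. This is a sketch under my own assumptions, not PrepTags’ engine: the `CODE` pattern and the `prep`/`rebuild` names are hypothetical, and the RTF conversion step is left out entirely.

```python
import re

# Assumed "code" patterns: PHP blocks first (they may contain '>'),
# then HTML comments, then any other tag.
CODE = re.compile(r"<\?php.*?\?>|<!--.*?-->|<[^>]+>", re.DOTALL)


def prep(source: str) -> list[tuple[str, str]]:
    """Split a tagged file into ('code', ...) and ('text', ...) chunks."""
    chunks = []
    pos = 0
    for m in CODE.finditer(source):
        if m.start() > pos:
            chunks.append(("text", source[pos:m.start()]))
        chunks.append(("code", m.group()))
        pos = m.end()
    if pos < len(source):
        chunks.append(("text", source[pos:]))
    return chunks


def rebuild(chunks: list[tuple[str, str]], translate) -> str:
    """Reassemble the file, passing only non-blank text chunks to `translate`."""
    return "".join(
        translate(t) if kind == "text" and t.strip() else t
        for kind, t in chunks
    )
```

Putting PHP blocks first in the pattern matters: a `>` inside PHP code (e.g. `"a>b"`) would otherwise end the generic tag match too early and expose code to the translator.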
There are 3 versions:
- PrepTags Lite: Free and functional but limited to 1 file at a time and without advanced features.
- PrepTags – eBook: €15. Comes together with TransBook. Limited to 20 files at a time, but without advanced features.
- PrepTags Pro: €39. Fully functional and unlimited number of files.
The PrepTags website contains video tutorials to help with the installation and use of the program.
From Translation Solutions Blog » PrepTags – Officially Launched!
The Search bar
The Firefox Search bar is a convenient method for accessing search sites without first having to visit the site’s home page and locating the search field.
So, instead of heading to Answers.com, finding the search field, typing the search term and pressing Enter, you can stay on whatever page you are on, click the Search bar down arrow to select which engine to use (if it’s not already selected), type your search and press Enter. The relevant search results will be displayed immediately. It’s worth remembering that the shortcut for placing the cursor in the Search bar is Ctrl-E or Ctrl-K.
Adding common search engines
You can also add a search engine directly from the page you are visiting if the site’s publisher has made this feature available.
In that case, the down arrow will “glow” to show that a search engine can be added. See the top-left corner of the screenshot to the right (this hint is way too subtle for me, and I always overlook it).
In the example, two search engines are “discovered” while visiting the CNN.com website.
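As far as I know, the “glow” is triggered by OpenSearch autodiscovery: the site declares its engines with `<link rel="search" type="application/opensearchdescription+xml" ...>` elements in the page. The sketch below, using only Python’s standard `html.parser`, lists such links the way a browser might spot them; the class and function names are my own, and the sample page in the usage note is made up.

```python
from html.parser import HTMLParser

OSD_TYPE = "application/opensearchdescription+xml"


class SearchLinkFinder(HTMLParser):
    """Collect (title, href) pairs for OpenSearch autodiscovery links."""

    def __init__(self):
        super().__init__()
        self.engines = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)  # valueless attributes map to None
        rel = (a.get("rel") or "").lower().split()
        if tag == "link" and "search" in rel and a.get("type") == OSD_TYPE:
            self.engines.append((a.get("title"), a.get("href")))


def find_search_engines(html: str) -> list:
    finder = SearchLinkFinder()
    finder.feed(html)
    finder.close()
    return finder.engines
```

Feeding it a page whose head contains `<link rel="search" type="application/opensearchdescription+xml" title="Example Search" href="/opensearch.xml">` returns `[("Example Search", "/opensearch.xml")]`. Each href points to a small XML file describing the engine’s search URL.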
Adding specific search engines
The Firefox Search bar comes pre-loaded with Google, Yahoo, Amazon, eBay, Answers.com, and Creative Commons search, but it’s easy to add more by visiting popular resources such as the Firefox Add-ons page or the Mycroft Project. These pages contain very specific search engines such as the ProZ term and glossary search, IATE, etc.
So, what’s the problem with the search engine list?
After adding a dozen or so search engines for useful and fun websites, I noticed that my list was growing much too long and that clicking on it and scrolling to the right engine was very impractical. If you stick to about 5-10 engines, you’ll probably be fine with the standard configuration, but if you use several resources for terminology research while you are translating, you’ll soon realize how frustrating it is to reach for the mouse, click on the list, remember and then find the right engine for the job, go back to the keyboard, etc.
There must be a better way to accomplish this. Of course there are several extensions, add-ons and utilities that can help you get quick access to your favorite search engines. If you, like me, prefer a minimalist approach to computing and want to avoid having all those tiny utilities sitting in the system tray, eating processor cycles and creating conflicts, you may want to read on.
Adding search engines the “geeky way”
The first helpful feature provided by Firefox is the Keyword option that appears in the right-hand column of the Manage Search Engine List window, which you can open by clicking the down arrow on the Search bar and choosing Manage Search Engines…
Select any search engine in this window, then click the Edit Keyword… button. Then type the keyword you want to use for this specific engine.
In the screenshot to the right I have specified “de” for my online Italian dictionary of choice.
Once you have confirmed and closed the window, you can use the keywords for quick access to these engines, like this:
- Click on the URL bar or, better, jump to it with the Ctrl-L shortcut.
- Type the keyword for the search engine, followed by your query, for instance “dm motore”, no matter what page you’re on. Hit Enter.
- Bang! You’re taken to the results page of your query on the search engine corresponding to the keyword, no questions asked, no clicks involved.
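Conceptually, the keyword trick is just a lookup plus a substitution: the first word selects a URL template, and the rest of the line is URL-encoded and dropped into the template’s `%s` placeholder (the same placeholder Firefox uses in keyword bookmarks). Here is a rough Python model of that behaviour; the keyword table and its URLs are hypothetical placeholders, and Firefox’s exact escaping rules may differ from `quote_plus`.

```python
from typing import Optional
from urllib.parse import quote_plus

# Hypothetical keyword table: the URLs are placeholders, not real endpoints.
KEYWORDS = {
    "dm": "https://dictionary.example.org/search?q=%s",
    "wp": "https://wiki.example.org/find?term=%s",
}


def expand(typed: str) -> Optional[str]:
    """Turn 'keyword query' into a search URL, or None if no keyword matches."""
    keyword, _, query = typed.strip().partition(" ")
    template = KEYWORDS.get(keyword)
    if template is None or not query.strip():
        return None
    # Substitute the URL-encoded query for the %s placeholder.
    return template.replace("%s", quote_plus(query.strip()))
```

With this table, typing “dm motore a scoppio” would lead to `https://dictionary.example.org/search?q=motore+a+scoppio`, while an unknown keyword returns `None` (in the browser, the input would then be handled as an ordinary address or search).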
Firefox offers yet another, perhaps not as widely known way of consulting specific search engines. These engines will not appear in the Search bar, but are still quickly accessible by using custom keywords chosen by the user.
Supposing we want to add a specific IATE search for engineering terms from English to Italian:
Olifant is a utility that can be used to maintain translation memory files. It can import (even by drag-and-drop) translation memories in the TMX, tab-delimited and WordFast formats, and it can export to TMX or WordFast.
Among other things, Olifant can perform the following tasks on translation memories:
- Flag and remove duplicate entries
- Create a single trilingual TM from two separate bilingual TMs
- Reverse the source and target languages of a TM
- Open a TM that contains invalid XML characters
- Remove formatting codes (e.g. <bpt>, <ept>, etc.) from the TM segments
- Search and replace text using regular expressions
- Filter the entries based on various criteria
- Partially export the TM
- Find exact and fuzzy matches or concordances for the current TM entry
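Two of the tasks listed above, removing formatting codes and removing duplicates, are easy to picture on a raw TMX file. The following is a minimal sketch of that idea, not Olifant’s implementation: the function names are mine, and I assume that two entries are duplicates when they have the same languages and the same text once the inline codes (`bpt`, `ept`, `ph`, `it`, `ut`) are stripped.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
# TMX inline elements whose content is native formatting code, not text.
CODE_TAGS = {"bpt", "ept", "ph", "it", "ut"}


def plain_text(seg: ET.Element) -> str:
    """Text of a <seg> with code elements dropped (the text after them is kept)."""
    parts = [seg.text or ""]
    for child in seg:
        if child.tag not in CODE_TAGS:
            parts.append("".join(child.itertext()))  # e.g. <hi>: keep its text
        parts.append(child.tail or "")
    return "".join(parts)


def clean_tmx(tmx_xml: str) -> ET.Element:
    """Strip inline codes from every segment, then drop duplicate <tu>s."""
    root = ET.fromstring(tmx_xml)
    body = root.find("body")
    seen = set()
    for tu in list(body):  # iterate over a copy so we can remove entries
        if tu.tag != "tu":
            continue
        variants = []
        for tuv in tu.findall("tuv"):
            seg = tuv.find("seg")
            if seg is None:
                continue
            text = plain_text(seg)
            for child in list(seg):
                seg.remove(child)
            seg.text = text
            variants.append(((tuv.get(XML_LANG) or "").lower(), text))
        key = tuple(sorted(variants))  # same languages + same texts = duplicate
        if key in seen:
            body.remove(tu)
        else:
            seen.add(key)
    return root
```

Stripping the codes before comparing is a deliberate choice: “Save file” with and without bold codes counts as one entry. A real tool would probably flag such near-duplicates for review rather than silently drop them.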
Olifant is free software distributed under the GNU Lesser General Public License.
Olifant (Candidate)
The excellent Translator’s Tools blog contains a post on TranslateCAD, a utility that can be used to translate files saved in the AutoCAD format. We receive files in this format only occasionally, but I will definitely go back to that post the next time we receive this type of request.
via Translator’s Tools | Translating text in AutoCAD® drawings
Using PDF files as the source for a translation is always challenging, especially with documents that have a non-linear text flow like brochures and presentations.
Our standard policy is to ask our clients to send the original file that was used to produce the PDF file that they want us to translate. This is usually the best option and allows us to deliver a translated document that is editable with the same program that was used to produce the original (although I do not like working with the verbose “tag soup” produced by XPress or InDesign converters and would sometimes rather convert the PDF to Word when dealing with these two translator-unfriendly formats).
Freewaregenius has published a review of PDF to Word Free. Here is a short excerpt:
[…] in terms of conversion quality this is hands down the best free PDF to DOC/RTF converter that I have seen; there is simply nothing that comes close.
The service is still in private beta. The freewaregenius review contains an invite code that may allow you to join the program.
Update: Lifehacker also posted a review of PDF-to-Word. One piece of information added by this review is that the service performs OCR on the source file, so the conversion to Word can extract text from those lousy “static” (non-editable) PDF files that contain pages of text pasted as images (yes, sadly we sometimes receive that sort of file from clients). Lifehacker also offers an invite code to test the service.
Update 2: As pointed out by readers on the Lifehacker blog and on this MemoQ forum thread, PDFtoWord does not actually perform any OCR.
PDF to Word Free: a web service that delivers free, high quality PDF to DOC conversions | freewaregenius.com
PDF-to-Word Converter Pulls Readable Text from Scanned Images
Since the previous post generated some interest, I have decided to put the program to the test. I saved the page from http://www.apple.com/mac/ as a local HTML file and fed it to the program. Here’s a screenshot of Caterpillar’s main window:
 Caterpillar main window
Just place the source files in the folder indicated by the In path, choose an extraction path and hit Extract. The processing speed on this single, short page was very high and the resulting TXT file was created instantaneously.
Here’s what the program’s output looks like:
 Caterpillar output
As you can see, after the first couple of lines containing headers that will help the program reconstruct the file after it has been translated, the structure contains the following fields:
ID=
Type=
Source=
Target=
The translator then has to translate all the fields preceded by “Target=” (using the CAT tool of choice) and reconstruct the translated HTML file with Caterpillar’s Integration command.
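If you want to check a Caterpillar output file before integration (for instance, to spot records whose Target was never touched), the format shown above is simple enough to parse. This sketch relies only on the four fields visible in the screenshot and sample, and assumes that a line without a known prefix continues the previous Source or Target, as happens with wrapped text. Anything else about the format, including what the header lines contain, is a guess on my part.

```python
import re

# Field prefixes as they appear in the Caterpillar output shown above.
FIELD = re.compile(r"^(ID|Type|Source|Target)=(.*)$")


def parse_records(text: str) -> list[dict]:
    """Group ID=/Type=/Source=/Target= lines into one dict per record."""
    records, current, last = [], None, None
    for line in text.splitlines():
        m = FIELD.match(line)
        if m and m.group(1) == "ID":
            current = {"ID": m.group(2)}
            records.append(current)
            last = "ID"
        elif current is None:
            continue  # header lines before the first record
        elif m:
            last = m.group(1)
            current[last] = m.group(2)
        elif last in ("Source", "Target"):
            current[last] += "\n" + line  # wrapped text continues the field
    return records


def untranslated(records: list[dict]) -> list[str]:
    """IDs whose Target is still identical to Source."""
    return [r["ID"] for r in records if r.get("Target") == r.get("Source")]
```

One known weakness: a translated line that happens to start with, say, “Source=” would be misread as a new field, so this is a quick QA helper rather than a full reader.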
Here’s what I found out during this brief test:
- The tags are completely taken out of the equation when the text is converted to TXT. This can be either good or bad, depending on the translator’s tastes and on the type and complexity of the file being processed
- The program does not assign internal/external styles to the file, so translators who want to use a CAT tool must choose between moving the cursor to the beginning of each “Target=” line after translating each sentence, or prepping the file by making the Target sentences translatable and the rest of the text untranslatable
- I noticed that if the HTML file contains diacritics (accented letters) or characters that are rendered by using Unicode in the HTML file, these become corrupted during the conversion to the translatable TXT file. This issue might or might not be addressed by the Encoding Converter option available in the program, which I did not test
- Caterpillar has an option that merges the source files into one single translatable TXT file. This sounds particularly interesting for translators who like the auto-propagation feature offered by some CAT tools, and for complex projects comprising hundreds of tiny HTML files in multiple sub-folders
- Interestingly, the file types available for processing include (besides HTML) PHP, XML and ASP. I did not test all these formats. However, I did test the program with one of those dreaded XML files that contain embedded HTML code. Surprisingly, Caterpillar did a decent job of extracting the translatable text. On the downside, the program creates a segmentation break at each tag that is preceded and followed by translatable text, so the following code:
<p><b>Note:</b> When searching, look for the
<img src="/images/search/plus_icon.gif"
width=9 height=9> icon to see which items are
only available to ACME <span class="hlt">Plus</span>
customers.]]></Data></Cell>
is rendered as follows:
Source=When searching, look for the
Target=When searching, look for the
ID=133
Type=text
Source=icon to see which items are only available
to ACME
Target=icon to see which items are only available
to ACME
ID=134
Type=text
Source=Plus
Target=Plus
ID=135
Type=text
Source=customers.]]>
Target=customers.]]>
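For comparison, the usual way CAT tools avoid these broken segments is to replace inline tags with numbered placeholders, so that the whole sentence stays in one segment and the tags are put back after translation. This is a different technique from what Caterpillar does, shown here as a minimal sketch: the list of tags treated as inline and the `{n}` placeholder style are my own assumptions.

```python
import re

# Assumed list of inline tags that should stay inside the sentence.
INLINE_TAG = re.compile(r"</?(?:b|i|em|strong|span|a|img)\b[^>]*>", re.IGNORECASE)
PLACEHOLDER = re.compile(r"\{(\d+)\}")


def protect(html: str) -> tuple[str, list[str]]:
    """Replace inline tags with {1}, {2}, ... so the sentence stays whole."""
    tags: list[str] = []

    def to_placeholder(m: re.Match) -> str:
        tags.append(m.group(0))
        return "{%d}" % len(tags)

    return INLINE_TAG.sub(to_placeholder, html), tags


def restore(text: str, tags: list[str]) -> str:
    """Put the original tags back in place of the placeholders."""
    return PLACEHOLDER.sub(lambda m: tags[int(m.group(1)) - 1], text)
```

Applied to the sentence above, `protect` yields a single segment, “When searching, look for the {1} icon to see which items are only available to ACME {2}Plus{3} customers.”, which the translator can reorder freely as long as the placeholders survive. The catch is that literal `{1}`-style text in the source would be confused with a placeholder.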
In conclusion, although this tool still requires some extra work to prep the files for processing with a CAT tool, and although it takes a rather radical approach to tags (deleting them from the working file), it might prove a useful addition to the utilities folder of translators who use basic CAT tools that cannot prep HTML and tagged files, and of advanced users who need a quick way to simplify complex tagged files, for instance XML with embedded HTML.
However, Caterpillar’s reliability should be tested carefully before using the tool on a large-scale project.
At 30.00 EUR, this inexpensive tool may be a valuable solution for translators who are using tools that do not have file-prepping capabilities (i.e. externalizing or separating and protecting HTML code that does not need to be translated or otherwise changed).
Has anyone heard of the tool or used it for translation projects?
Caterpillar is a high-speed HTML Text Extractor and Integrator written for translators working with web sites. Process whole folders of web pages with a single click, then translate using your choice of software.
By generating a single output file containing all the text requiring translation Caterpillar provides a simple way to incorporate web page localisation into your existing translation work flow.
WordFast compatible – now you can translate web sites in the familiar environment of MS Word.
Caterpillar 1.3 – HTML Extractor and Integrator