About this blog

Translator's Shack is a collection of links, news, reviews and opinions about translation technologies. It's edited and updated by Roberto Savelli, an English to Italian translator, project manager and company owner of Albatros Soluzioni Linguistiche, a team of English-Italian translators, which hosts and supports this blog.


The Project Management category, managed by Gabriella Ascari, contains topics that are less technical in nature, but which we're sure will be appreciated by owners of small translation businesses and freelancers.



TAUS Widget

The Translation Automation User Society (TAUS) offers a desktop program that can be used for searching their database. The TAUS Data Association comprises 45 organizations, including well-known companies such as Intel, Dell and eBay, and large language service providers such as Lionbridge and SDL.

One of the main reasons that brought these companies together to share their large translation memories and glossaries is the improvement of existing machine translation systems. However, the TAUS database is also a very valuable source of bilingual texts in many languages, and it can be searched free of charge (registration is required) through a Web interface. The widget goes one step further by bringing this powerful search tool to the desktop.

Let’s take a look at this “widget”:

[Screenshot: TAUS widget search results]

Here I ran a simple search for the common term “taskbar”. The results include dozens of human-translated segments with the term highlighted in the source (and in the target too, after the system somehow computed its translation by analyzing the words that form the searched term).

The user can make the search more specific (and I think this will vary by language combination) by selecting specific industries (e.g. hardware, software, business services), data owners (e.g. ABBYY, Adobe, Dell, etc.) and content types (instructions, marketing material, software strings, etc.).

The search is fast and accurate, and it displays the data in a clear two-column layout. Users can interact with the database by reporting problems with specific segments or sentences (just click on the grey “X” to the right of the segment).

The widget requires registration, is multi-platform and runs on the Java Runtime Environment.

TAUS Widget | TDA

PrepTags (file preparation utility for tagged formats) Launched

PrepTags is file preparation software designed to prepare a wide range of formats using a powerful regular expression engine. It lets you “prep” tagged files such as HTML, PHP, XML, ASP, JavaScript, SQL, PO, etc. by converting them to RTF and protecting the code. Once a file is prepared, translators can use their regular CAT tool to translate it. PrepTags-prepared files can be translated with Wordfast, Trados, Deja Vu, MemoQ, and any other tool with support for prepared RTF files (a format originally designed for Trados).
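
To make the idea of “protecting the code” more concrete, here is a minimal Python sketch of the general technique. It is not PrepTags’ actual implementation: the function name `split_protected` and the tag pattern are assumptions made for illustration. It uses a regular expression to split a tagged fragment into protected code spans and translatable text; a real prep tool would also have to handle comments, scripts and attribute values that need translating.

```python
import re

# Assumed pattern for illustration: anything between "<" and ">" is treated as code.
TAG_PATTERN = re.compile(r"<[^>]+>")


def split_protected(markup):
    """Return a list of (kind, text) pairs, where kind is 'code' or 'text'."""
    parts = []
    position = 0
    for match in TAG_PATTERN.finditer(markup):
        # Text found between the previous tag and this one is translatable.
        if match.start() > position:
            parts.append(("text", markup[position:match.start()]))
        # The tag itself is protected code.
        parts.append(("code", match.group()))
        position = match.end()
    # Any text left after the last tag is translatable too.
    if position < len(markup):
        parts.append(("text", markup[position:]))
    return parts


if __name__ == "__main__":
    for kind, text in split_protected("<p>Click <b>Save</b> to continue.</p>"):
        print(kind, repr(text))
```

In a prepared RTF file, the “code” spans would be the ones marked as untouchable, so that the CAT tool only offers the “text” spans for translation.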

There are 3 versions:

  • PrepTags Lite: Free and functional but limited to 1 file at a time and without advanced features.
  • PrepTags – eBook: €15. Comes together with TransBook. Limited to 20 files at a time and without advanced features.
  • PrepTags Pro: €39. Fully functional and unlimited number of files.

The PrepTags website contains video tutorials to help with the installation and use of the program.

From Translation Solutions Blog » PrepTags – Officially Launched!

Creating Firefox smart keywords for quick access to frequently-used translation glossaries, dictionaries, resources, etc.

The Search bar

The Firefox Search bar is a convenient method for accessing search sites without first having to visit the site’s home page and locate the search field.

So, instead of heading to Answers.com, finding the search field, typing the search term and pressing Enter, you can stay on whatever page you are on, click on the Search bar down arrow to select which engine to use (if it’s not already selected), type your search and press Enter. The relevant search results will be displayed immediately. It’s worth remembering that the shortcut key for placing the cursor into the Search bar is Ctrl-E or Ctrl-K.

Adding common search engines

You can also add a search engine directly from the page you are visiting if the site’s publisher has made this feature available.

In that case, the down arrow will “glow” to show that a search engine can be added. See top-left corner in the screenshot to the right (this is way too subtle for me and I always overlook this information).

In the example, two search engines are “discovered” while visiting the CNN.com website.
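
Publishers typically make this feature available by advertising an OpenSearch description in the page header, through a `link` element with `rel="search"`. As a rough illustration of what the browser looks for, here is a small Python sketch; the sample page, its titles and its paths are invented for the example and are not taken from any real site.

```python
from html.parser import HTMLParser

OPENSEARCH_TYPE = "application/opensearchdescription+xml"

# Invented sample page: the titles and paths are placeholders.
SAMPLE_PAGE = """<html><head>
<link rel="search" type="application/opensearchdescription+xml"
      title="Example News Search" href="/opensearch/news.xml">
<link rel="search" type="application/opensearchdescription+xml"
      title="Example Video Search" href="/opensearch/video.xml">
<link rel="stylesheet" href="/style.css">
</head><body></body></html>"""


class SearchLinkFinder(HTMLParser):
    """Collect (title, href) pairs for every advertised search engine."""

    def __init__(self):
        super().__init__()
        self.engines = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        link_type = (attributes.get("type") or "").lower()
        if "search" in rel_values and link_type == OPENSEARCH_TYPE:
            self.engines.append((attributes.get("title"), attributes.get("href")))


def find_search_engines(page_source):
    finder = SearchLinkFinder()
    finder.feed(page_source)
    finder.close()
    return finder.engines


if __name__ == "__main__":
    for title, href in find_search_engines(SAMPLE_PAGE):
        print(title, href)
```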

Adding specific search engines

The Firefox Search bar comes pre-loaded with Google, Yahoo, Amazon, eBay, Answers.com, and Creative Commons search, but it’s easy to add more by visiting popular resources such as the Firefox Add-ons page or the Mycroft Project. These pages contain very specific search engines such as the ProZ term and glossary search, IATE, etc.

So, what’s the problem with the search engine list?

After adding a dozen or so search engines for useful and fun websites, I noticed that my list started to grow much too long and that it was very impractical to use it by clicking on it and scrolling to the right engine. If you stick to about 5-10 engines, you’ll probably be fine with the standard configuration, but if you use several resources for terminology research while you are translating, you’ll soon realize how frustrating it is to get the mouse, click on the list, remember and then find the right engine for the job, go back to the keyboard, etc.

There must be a better way to accomplish this. Of course there are several extensions, add-ons and utilities that can help you get quick access to your favorite search engines. If you, like me, prefer a minimalist approach to computing and want to avoid having all those tiny utilities sitting in the system tray, eating processor cycles and creating conflicts, you may want to read on.

Adding search engines the “geeky way”

A first helpful feature provided by Firefox is the Keyword option that appears in the right-hand column of the Manage Search Engine List window, which you can open by clicking on the down arrow on the Search bar and choosing Manage Search Engines…

Select any search engine in this window, then click the Edit Keyword… button. Then type the keyword you want to use for this specific engine.

In the screenshot to the right I have specified “dm” for my online Italian dictionary of choice.

Once you have confirmed and closed the window, you can use the keywords for quick access to these engines, like this:

Click on the URL bar or, better, enter it by using the appropriate shortcut, Ctrl-L.
Type the keyword for the search engine, followed by your query, for instance “dm motore”, no matter what page you’re on. Hit Enter.
Bang! You’re taken to the results page of your query on the search engine corresponding to the keyword, no questions asked, no clicks involved.

Firefox offers yet another, perhaps less widely known, way of consulting specific search engines. These engines will not appear in the Search bar, but are still quickly accessible by using custom keywords chosen by the user.

Supposing we want to add a specific IATE search for engineering terms from English to Italian:

First, build a sample search by going to the IATE page and by setting your specific options. Click on the screenshot to the right to see how I set the options for my specific purpose. You can change the language combinations and sectors to your own preference.

Once all the desired options are in place, right-click on the “Search term” field and choose the option Add a Keyword for this Search…
In the window that appears, type a descriptive name for this search in the “Name” field, and an easy-to-remember keyword in the “Keyword” field. Also, choose where you want to keep the relevant bookmark in the “Create in” field. Perhaps it’s a good idea to keep all the keyword searches in a separate folder in the bookmark structure. Press “Add” to confirm your choice.
Time to test the search. Go to the URL bar (Ctrl-L), type the keyword (in this case “iatemec”), followed by the term. Then press Enter.
Bang! You get the list of results matching the options (language combination, domain) that you specified during the one-time creation of the search keyword.
This works particularly well for websites that insist you choose a plethora of options to narrow down your results every time you get to the initial search mask.
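
Behind the scenes, a keyword search is simply a bookmark whose address contains a `%s` placeholder, which Firefox replaces with whatever you type after the keyword. The following Python sketch imitates that expansion. The keyword table and its URLs are hypothetical placeholders (they are not the real dictionary or IATE addresses), and the exact escaping applied to the query is an assumption.

```python
from urllib.parse import quote_plus

# Hypothetical keyword bookmarks: the URLs are placeholders, not real search addresses.
KEYWORD_BOOKMARKS = {
    "dm": "https://dictionary.example.com/search?word=%s",
    "iatemec": "https://terms.example.com/query?source=en&target=it&domain=mechanics&term=%s",
}


def expand_keyword(typed, bookmarks=KEYWORD_BOOKMARKS):
    """Expand 'keyword query' into a URL, or return None if the keyword is unknown."""
    keyword, _, query = typed.strip().partition(" ")
    template = bookmarks.get(keyword)
    if template is None:
        return None
    # Substitute the URL-encoded query for the %s placeholder.
    return template.replace("%s", quote_plus(query.strip()))


if __name__ == "__main__":
    print(expand_keyword("dm motore"))
    print(expand_keyword("iatemec ball bearing"))
```

Because all the options (language pair, domain) are frozen in the bookmark address, only the search term changes from one query to the next.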

Olifant Candidate Release 22 available

Olifant is a utility that can be used to maintain translation memory files. It can import (even by drag-and-drop) translation memories in the TMX, tab-delimited and Wordfast formats, and it can export to TMX or Wordfast.

Among other things, Olifant lets you perform the following tasks on translation memories:

  • Flag and remove duplicate entries
  • Create a single tri-lingual TM from two separate bi-lingual TMs
  • Reverse the source and target languages of a TM
  • Open a TM that contains invalid XML characters
  • Remove formatting codes (e.g. <bpt>, <ept>, etc.) from the TM segments
  • Search and replace text using regular expressions
  • Filter the entries based on various criteria
  • Partially export the TM
  • Find exact and fuzzy matches or concordances for the current TM entry
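
To give an idea of what one of these maintenance tasks involves, here is a minimal Python sketch of duplicate removal in a TMX file. It is not Olifant’s code: the sample memory is invented, the header is stripped down to a single attribute, and “duplicate” is assumed to mean a translation unit whose languages and segment texts all match an earlier unit.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under this namespaced name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Invented sample memory with a deliberately simplified header.
SAMPLE_TMX = """<tmx version="1.4">
  <header srclang="en"/>
  <body>
    <tu><tuv xml:lang="en"><seg>Open the file</seg></tuv><tuv xml:lang="it"><seg>Aprire il file</seg></tuv></tu>
    <tu><tuv xml:lang="en"><seg>Close the file</seg></tuv><tuv xml:lang="it"><seg>Chiudere il file</seg></tuv></tu>
    <tu><tuv xml:lang="en"><seg>Open the file</seg></tuv><tuv xml:lang="it"><seg>Aprire il file</seg></tuv></tu>
  </body>
</tmx>"""


def remove_duplicates(root):
    """Drop <tu> elements that repeat an earlier one; return how many were removed."""
    body = root.find("body")
    seen = set()
    removed = 0
    for tu in list(body.findall("tu")):
        # The key is the sequence of (language, segment text) pairs in the unit.
        key = tuple(
            (tuv.get(XML_LANG), "".join(tuv.find("seg").itertext()))
            for tuv in tu.findall("tuv")
        )
        if key in seen:
            body.remove(tu)
            removed += 1
        else:
            seen.add(key)
    return removed


if __name__ == "__main__":
    tmx_root = ET.fromstring(SAMPLE_TMX)
    print("Removed:", remove_duplicates(tmx_root))
    print("Remaining:", len(tmx_root.find("body").findall("tu")))
```

Using `itertext()` compares the plain text of each segment, so two units that differ only in inline formatting codes would also count as duplicates in this sketch.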

Olifant is free software distributed under the GNU Lesser General Public License.

Olifant (Candidate)

Translating text in AutoCAD drawings

The excellent Translator’s Tools blog contains a post on TranslateCAD, a utility that can be used to translate files saved in the AutoCAD format. We receive files in this format only occasionally, but I will definitely go back to that post the next time we receive this type of request.

via Translator’s Tools | Translating text in AutoCAD® drawings

High-quality, free PDF to Word conversion

Using PDF files as the source for a translation is always challenging, especially with documents that have a non-linear text flow like brochures and presentations.

Our standard policy is to ask our clients to send the original file that was used to produce the PDF file that they want us to translate. This is usually the best option and allows us to deliver a translated document that is editable with the same program that was used to produce the original (although I do not like working with the verbose “tag soup” produced by XPress or InDesign converters and would sometimes rather convert the PDF to Word when dealing with these two translator-unfriendly formats).

Freewaregenius has published a review of PDF to Word Free. Here is a short excerpt:

[…] in terms of conversion quality this is hands down the best free PDF to DOC/RTF converter that I have seen; there is simply nothing that comes close.

The service is still in private beta. The Freewaregenius review contains an invite code that may allow you to join the program.

Update: Lifehacker also posted a review of PDF-to-Word. One piece of information added by this review is that the service actually performs an OCR extraction of the source file. So the conversion to Word makes it possible to extract text from those lousy “static” (non-editable) PDF files that contain text pages pasted as images (yes, sadly we sometimes receive that sort of file from clients). Lifehacker also offers an invite code to test the service.

Update 2: as pointed out by readers on the Lifehacker blog and on this MemoQ forum thread, PDFtoWord does not perform any OCR.

PDF to Word Free: a web service that delivers free, high quality PDF to DOC conversions | freewaregenius.com
PDF-to-Word Converter Pulls Readable Text from Scanned Images

Caterpillar 1.3 tested

Since the previous post generated some interest, I have decided to put the program to the test. I saved the page from http://www.apple.com/mac/ as a local HTML file and fed it to the program. Here’s a screenshot of Caterpillar’s main window:

Caterpillar main window

Just place the source files in the folder indicated by the In path, choose an extraction path and hit Extract. The processing speed on this single, short page was very high and the resulting TXT file was created instantaneously.

Here’s what the program’s output looks like:

Caterpillar output

As you can see, after the first couple of lines containing headers that will help the program reconstruct the file after it has been translated, the structure contains the following fields:

ID=

Type=

Source=

Target=

The translator then has to translate all the fields preceded by “Target=” (using the CAT tool of choice) and reconstruct the translated HTML file by using Caterpillar’s Integration command.
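
For readers who want to check or post-process such a file with a script, here is a minimal Python sketch that groups the ID=, Type=, Source= and Target= fields into records. It rests on several assumptions that I have not verified against the program: that every record starts with an “ID=” line, that anything before the first “ID=” line (such as the headers) can be ignored, and that a line without a field name continues the previous value. The sample records are adapted from the tool’s output.

```python
FIELD_NAMES = ("ID", "Type", "Source", "Target")

# Sample records adapted from the tool's output; header lines are omitted.
SAMPLE_LINES = """ID=133
Type=text
Source=icon to see which items are only available
to ACME
Target=icon to see which items are only available
to ACME
ID=134
Type=text
Source=Plus
Target=Plus
""".splitlines()


def parse_records(lines):
    """Group 'Name=value' lines into dicts, starting a new record at each ID= line."""
    records = []
    current = None
    last_field = None
    for raw in lines:
        line = raw.rstrip("\n")
        name, separator, value = line.partition("=")
        if separator and name in FIELD_NAMES:
            if name == "ID":
                current = {}
                records.append(current)
            if current is not None:
                current[name] = value
                last_field = name
        elif current is not None and last_field is not None and line.strip():
            # Assumed: a line without a known field name continues the previous value.
            current[last_field] += " " + line.strip()
    return records


def untranslated(records):
    """Return the records whose Target is still identical to the Source."""
    return [r for r in records if r.get("Target") == r.get("Source")]


if __name__ == "__main__":
    parsed = parse_records(SAMPLE_LINES)
    print(len(parsed), "records,", len(untranslated(parsed)), "still untranslated")
```

A check like `untranslated` can be handy before running the Integration command, since the Target fields start out as copies of the Source.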

Here’s what I found out during this brief test:

  • The tags are completely taken out of the equation when the text is converted to TXT. This can be either good or bad, depending on the translator’s tastes and on the type and complexity of the file being processed
  • The program does not assign internal/external styles to the file, so a translator who wants to use a CAT tool has two choices: either move the cursor to the beginning of the “Target=” header after translating each sentence, or prep the file by assigning the “translatable” attribute to the Target sentences and making the rest of the text untranslatable
  • I noticed that if the HTML file contains diacritics (accented letters) or characters that are rendered by using Unicode in the HTML file, these become corrupted during the conversion to the translatable TXT file. This issue might or might not be addressed by the Encoding Converter option available in the program, which I did not test
  • Caterpillar has an option that lets you merge the source files into one single translatable TXT file. This sounds particularly interesting for translators who like the auto-propagation feature offered by some CAT tools and for those complex projects comprising hundreds of tiny HTML files in multiple sub-folders
  • Interestingly, the file types that are available for processing include (besides HTML) PHP, XML and ASP. I did not test all these formats. However, I did test the program with one of those dreaded XML files that contain embedded HTML code. Surprisingly, Caterpillar did a decent job of extracting the translatable text. On the downside, the program creates a segmentation break at each tag that is preceded and followed by translatable text, so the following code:
    <p><b>Note:</b> When searching, look for the
    <img src="/images/search/plus_icon.gif"
    width=9 height=9> icon to see which items are
    only available to ACME <span class="hlt">Plus</span>
    customers.]]></Data></Cell>

    is rendered as follows:

    Source=When searching, look for the
    Target=When searching, look for the
    ID=133
    Type=text
    Source=icon to see which items are only available
    to ACME
    Target=icon to see which items are only available
    to ACME
    ID=134
    Type=text
    Source=Plus
    Target=Plus
    ID=135
    Type=text
    Source=customers.]]>
    Target=customers.]]>
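
The segmentation behaviour described in the last point is what you would expect from any extractor that simply drops the tags and keeps each run of text between them. The Python sketch below reproduces the effect with the standard library’s HTML parser. It is only an illustration of the general technique, not Caterpillar’s algorithm, and the sample is a simplified version of the fragment above, without the XML wrapper.

```python
from html.parser import HTMLParser

# Simplified version of the fragment above, without the XML wrapper.
SAMPLE_MARKUP = (
    '<p><b>Note:</b> When searching, look for the '
    '<img src="/images/search/plus_icon.gif" width=9 height=9> '
    'icon to see which items are only available to ACME '
    '<span class="hlt">Plus</span> customers.</p>'
)


class TextRunCollector(HTMLParser):
    """Collect each run of text between two tags as a separate segment."""

    def __init__(self):
        super().__init__()
        self.segments = []

    def handle_data(self, data):
        # Collapse whitespace and skip runs that contain no text at all.
        text = " ".join(data.split())
        if text:
            self.segments.append(text)


def extract_segments(markup):
    collector = TextRunCollector()
    collector.feed(markup)
    collector.close()
    return collector.segments


if __name__ == "__main__":
    for number, segment in enumerate(extract_segments(SAMPLE_MARKUP), start=1):
        print(number, segment)
```

One sentence ends up split into several segments because inline tags such as the image and the span are treated like any other tag. A tool that keeps inline tags inside the segment as protected placeholders avoids this, at the cost of showing tags to the translator.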

In conclusion, although this tool still requires some extra work for prepping the files in order to process them with a CAT tool and it takes a rather radical approach to tags (by deleting them from the working file), it might prove a useful addition to the utilities folder of those translators who use basic CAT tools that cannot prep HTML and tagged files, and of advanced users who need a quick way of simplifying complex tagged files, for instance XML with embedded HTML.

However, the reliability of Caterpillar should be tested carefully before using the tool on a large-scale project.

Caterpillar 1.3: Wordfast-compatible HTML prepping tool

At 30.00 EUR, this inexpensive tool may be a valuable solution for translators who are using tools that do not have file-prepping capabilities (i.e. externalizing or separating and protecting HTML code that does not need to be translated or otherwise changed).

Has anyone heard of the tool or used it for translation projects?

Caterpillar is a high-speed HTML Text Extractor and Integrator written for translators working with web sites. Process whole folders of web pages with a single click, then translate using your choice of software.

By generating a single output file containing all the text requiring translation Caterpillar provides a simple way to incorporate web page localisation into your existing translation work flow.

WordFast compatible – now you can translate web sites in the familiar environment of MS Word.

Caterpillar 1.3 – HTML Extractor and Integrator