About this blog

Translator's Shack is a collection of links, news, reviews and opinions about translation technologies. It's edited and updated by Roberto Savelli, an English to Italian translator, project manager and company owner of Albatros Soluzioni Linguistiche, a team of English-Italian translators, which hosts and supports this blog.

The Life as a PM category, managed by Gabriella Ascari, contains topics that are less technical in nature, but which we're sure will be appreciated by owners of small translation businesses and freelancers.



demaquina Select: sub-sentence segmentation, propagation etc. on TMX, XLIFF [@joaoalb]

A tweet I “intercepted” today led me to this potentially interesting tool: demaquina Select.

From the website:

Select is a sidekick tool for preprocessing and boost works on CATs with support for XML Localization Interchange File Format (XLIFF) and Translation Memory Interchange (TMX).

Select offers an unequaled sub-sentence free segmentation ability which together with its own chunk-based Dual-memory System, and Sub-sentence Case Aware Propagation delivers an ultimate terminology reusability.

With Select, each technical term, common expression or single word translation is typed ONCE in life!

From your elected CAT, export your work to a file with XLIFF format, create a Select Project and experience an incredible time saving with its intelligent sub-sentence term/chunk reuse and case-aware propagation… and many other time-saving driven features. Thereafter import the XLIFF file back into your CAT and sharpen your work, from Sub-sentence Zero Inconsistency to Perfection… Free to Care About Wording and Semantics…

Eradicate inconsistency at sub-sentence level from existing memories and term bases!

Export your elected CAT’s memories as TMX, open them with Select and see how easy it is to select segments with specific terms or expressions and use the Replace process with Sub-sentence Case-aware Propagation to eradicate inconsistency through all them at once!

 

Select is intended to:

Translate given XLIFF files’ content

If your elected or current project required CAT has support for XLIFF (XML Localization Interchange File Format), you can export your work to a XLIFF file and create a Select Project from it. Taking advantage of Select’s Sub-sentence Case Aware Propagation, you can deal with language common expressions and project specific terminology like software UI elements like never before! Then import the XLIFF file content back to your CAT and resume your work from Zero inconsistency, Free to Care About Wording and Semantics!…

Edit Translation Memories (particularly spot and eradicate inconsistency at sub sentence level)

If your elected CAT has support for TMX (Translation Memory Interchange) you can export any of your translation memories to a TMX file and create a Select Project from it. Then you can use the Replace Process together with Sub-sentence Case-aware Propagation options to eradicate inconsistency and securely change wrong terminology!
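
The website does not explain how the case-aware replacement works internally. As a rough illustration of the general idea only, here is a minimal Python sketch that replaces a term in every segment while copying the capitalisation of each match. The function names and the sample segments are hypothetical and are not taken from Select.

```python
import re

def match_case(template, replacement):
    """Give `replacement` the capitalisation pattern of `template` (assumed rules)."""
    if template.isupper():
        return replacement.upper()
    if template[:1].isupper():
        return replacement[:1].upper() + replacement[1:]
    return replacement

def case_aware_replace(segment, old, new):
    """Replace whole-word occurrences of `old`, whatever their case."""
    pattern = re.compile(r"\b" + re.escape(old) + r"\b", re.IGNORECASE)
    return pattern.sub(lambda m: match_case(m.group(0), new), segment)

# Hypothetical target segments containing an unwanted term.
segments = [
    "Disco fisso",
    "Formattare il disco fisso.",
    "DISCO FISSO",
]
fixed = [case_aware_replace(s, "disco fisso", "disco rigido") for s in segments]
for line in fixed:
    print(line)
```

A real tool would need more rules (title case, mixed case, inflected forms), so this should be read as a sketch of the concept rather than a description of the product.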

It is also possible to translate using a TMX file as interchange file, creating a "temporary" memory with the content of your source files, with both Source and Target units of each segment filled with the source text. (See below how its TMX exported file should look to be imported into a Select Project.)
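
To make the "temporary" memory idea more concrete, here is a minimal Python sketch that builds a TMX document in which the source and target variants of each unit both contain the source text. The language codes, the `creationtool` value and the function name are assumptions made for illustration. Whether Select accepts exactly this layout is not confirmed by the website excerpt.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def build_source_only_tmx(segments, src_lang="en-US", tgt_lang="it-IT"):
    """Return a TMX tree whose target segments are copies of the source."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "example-script",   # hypothetical tool name
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "plain-text",
        "adminlang": "en-US",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for text in segments:
        tu = ET.SubElement(body, "tu")
        # Same text in both variants: the target is overwritten later.
        for lang in (src_lang, tgt_lang):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.ElementTree(tmx)

tree = build_source_only_tmx(["Click the Start button.", "Open the Control Panel."])
tmx_text = ET.tostring(tree.getroot(), encoding="unicode")
print(tmx_text)
```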

Sub-segment leveraging is certainly one area in which modern TEnTs have a lot of potential for improvement. If implemented correctly, it can save time and, most importantly, facilitate consistency without requiring a lot of time spent on creating and tweaking glossaries. I intend to take a close look at this program after the holidays.

Free PDFUnlock! web service removes restrictions from PDF files [@PDFUnlock]

Reference material plays a very important part in most translation projects. We often receive reference files from our clients, and sometimes we have to find them ourselves through web searches or by browsing the client’s website.

The management and use of reference files is one aspect addressed by memoQ’s LiveDocs feature, which lets you create searchable corpora of monolingual source and target files. So it’s finally time to put all those reference PDFs to good use! But wait, there’s a catch…

Very often, publishers put locks on PDF files for various reasons, e.g. intellectual property protection, forced consistency by preventing unwanted changes, etc. Here is an example of the possible locks that can be applied to a PDF file (in this case the file is completely unlocked):

[Screenshot: security settings of a completely unlocked PDF file]

Today we needed to unlock a few PDF files in order to use them in LiveDocs. While looking for a possible solution, I came across the PDFUnlock! web service. It’s very simple to use: you upload a locked PDF file and you immediately receive a link to download the unlocked file. Here are some features from the site’s description:

PDF files can be secured with restrictions that prevent you from for example copying text from them or editing, printing, merging or splitting them. PDFUnlock! can remove these restrictions (a.k.a “owner password”).

If a password is required to open the uploaded file, you will be asked to enter it (a.k.a “user password”). PDFUnlock! cannot, however, recover lost or unknown user passwords.

A PDF file can also be subject to non-standard encryption, such as DRM. PDFUnlock! does not remove such.

There is a further limitation: the maximum file size is 5 MB. And, of course, the rule of thumb that applies to all free, unencrypted, unprotected web services: do not send anything confidential for conversion.
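
Before uploading anything, a quick local check can tell you whether a file is worth sending at all. The Python sketch below is only a heuristic: it assumes that "5 MB" means 5 × 1024 × 1024 bytes, and it looks for the `/Encrypt` entry that standard PDF security adds to the file, without parsing the PDF properly. The function name and the sample bytes are made up for illustration.

```python
MAX_UPLOAD_BYTES = 5 * 1024 * 1024  # assumed interpretation of the 5 MB limit

def preflight(pdf_bytes):
    """Rough checks on raw PDF bytes; not a real PDF parser."""
    return {
        "looks_like_pdf": pdf_bytes.startswith(b"%PDF-"),
        "within_size_limit": len(pdf_bytes) <= MAX_UPLOAD_BYTES,
        # Heuristic: standard security handlers reference an /Encrypt dictionary.
        "has_encrypt_entry": b"/Encrypt" in pdf_bytes,
    }

# Tiny fake file, just enough to exercise the checks.
sample = b"%PDF-1.4\ntrailer\n<< /Root 1 0 R /Encrypt 5 0 R >>\n%%EOF"
print(preflight(sample))
```

For a real file you would pass the result of `open("reference.pdf", "rb").read()` (the file name is hypothetical). A file with no `/Encrypt` entry most likely has no standard restrictions to remove.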

PDFUnlock!

Okapi Framework (components and applications for localizing and translating documentation and software) Milestone 9

A new release of the Okapi Tools is available.

Also, there is now a wiki for Okapi’s help and documentation: http://www.opentag.com/okapi/wiki

Changes Log – Sep-30-2010

Download page for latest stable release: http://okapi.opentag.com/downloads.html

Changes from M8 to M9

  • Rainbow:
    • Translation package Post-Processing utility:
      • Fixed the bug where pre-translated XLIFF entries with translate=’no’ could not be merged back properly, for example for PO files.
    • Added the user option "Always show the Log when starting a process".
  • Tikal:
    • Fixed the bug in the Merge command where pre-translated XLIFF entries with translate=’no’ could not be merged back properly, for example for PO files.
    • Switched help to use the wiki.
  • Ratel:
    • Windows position and size are now saved for the next session.
  • CheckMate:
    • Added capability to save and load configurations outside the session.
    • Improved pattern checks defaults and processing.
    • Added support for short vs. long text in text length verification (new Length tab)
    • Added experimental support for terminology verification.
    • Added support for exceptions in verification of double-words.
    • Added some limited support for string-based term verification.
  • Translation resources:
    • Added batchQuery method to the IQuery interface.
    • Added leverage method to the IQuery interface.
    • Open-Tran connector:
      • Changed implementation to use the REST API instead of the XML-RPC.
      • Improved support for queries with inline codes.
    • SimpleTM connector:
      • IMPORTANT: Changed the H2 database dependency from version 1.1.103 (.data.db files) to 1.2.135 (.h2.db files). This breaks backward compatibility: the new SimpleTM connector cannot open the old .data.db files. To convert an older TM, use an M8 or earlier version of Rainbow to run the SimpleTM to TMX step and export your database to TMX. Then use this version of Rainbow to run the Generate SimpleTM step and convert your TMX document into a new .h2.db file.
  • Steps:
    • Added the Resource Simplifier Step. It modifies normal resources of filter events into simpler resources for some third-party tools.
    • Added the XLIFF Splitter Step. It splits multiple <file> elements inside an XLIFF document into separate documents.
    • Added the Id-Based Aligner Step. It aligns text units from two input files, based on their unique IDs (resname).
    • Added the XML Validation Step. It performs XML well-formedness verification and, optionally, DTD or schema validation.
    • Sentence Aligner Step:
      • Updated so entries with empty text are skipped and don’t cause an error.
    • Diff Leverage Step:
      • Added support for 3 input files: new source, old source, old translation. The second and third files must have the same text units (same number and same order).
  • Filters:
    • Modified several filters to generate unique extraction ids in non-text-unit events.
    • Vignette Filter:
      • Added support for monolingual documents.
    • XML Filter:
      • Fixed the bug where text extracted from attribute values was not processed for the codeFinder option.
  • Libraries:
    • Implemented the Appendable and CharSequence interfaces for TextFragment.
    • IMPORTANT: Changed TextFragment.toString() to return the coded text instead of the original content of the fragment. The previous behavior of toString() is now accessible using text().
    • The net.sf.okapi.lib.extra.pipelinebuilder package has been added. It allows you to easily script and run pipelines, for example using Jython.
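
To give an idea of what splitting an XLIFF document by <file> element involves, here is a small Python sketch. It is not Okapi's implementation and does not use the Okapi API. It assumes XLIFF 1.2 with the standard namespace, and the sample document and function name are invented for the example.

```python
import copy
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"
ET.register_namespace("", XLIFF_NS)  # keep the default namespace on output

def split_xliff(xliff_text):
    """Return one XLIFF document (as a string) per <file> element."""
    root = ET.fromstring(xliff_text)
    documents = []
    for file_el in root.findall("{%s}file" % XLIFF_NS):
        new_root = ET.Element(root.tag, dict(root.attrib))
        new_root.append(copy.deepcopy(file_el))
        documents.append(ET.tostring(new_root, encoding="unicode"))
    return documents

SAMPLE = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file original="a.po" source-language="en" datatype="po"><body>
    <trans-unit id="1"><source>Open</source></trans-unit>
  </body></file>
  <file original="b.po" source-language="en" datatype="po"><body>
    <trans-unit id="1"><source>Close</source></trans-unit>
  </body></file>
</xliff>"""

parts = split_xliff(SAMPLE)
for part in parts:
    print(part)
```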

Importing the Microsoft Terminology Collection

The latest Tool Kit contains a nice description and details about importing the newly-available Microsoft Terminology Collection into the translation environment of your choice. If your tool does not support the TBX format, however, you will have to transform the data into the proper format (e.g. CSV) before importing it.

The Tool Kit suggests using the excellent XBench for importing the TBX terminology file and exporting it into a comma-separated file. It also warns that XBench drops the “definition” field which, in my opinion, contains very useful context information. So in this case I’d say XBench is not the way to go.

By digging into the memoQ user discussion forum, I found this useful tidbit of information by Denis Hay:

True, we don’t have official support for TBX yet, but just add ".xml" to your file, open in Excel 2003 or 2007 and save as Unicode text. You will easily be able to import that into any memoQ termbase, picking only those columns you want.

Excellent. This should solve the problem and make the TBX easily accessible even if your favorite translation tool does not support this format natively (as is currently the case with memoQ).

Another solution that I tried and found to work flawlessly is Wordfast Pro, which supports TBX out of the box and lets you export an imported glossary to CSV format. Wordfast Pro is available in a free trial version, which has some limitations. I’m not sure whether the free version will let you import and then export the whole Microsoft glossary, but my guess is it will.
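
If you are comfortable with a bit of scripting, the same conversion can be sketched in a few lines of Python while keeping the definition field. The sample below is a simplified, made-up TBX fragment. The element names (termEntry, langSet, term, descrip) are standard TBX, but the exact layout of the Microsoft file is an assumption here, so the script may need adjusting.

```python
import csv
import io
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def tbx_to_rows(tbx_text, src_lang, tgt_lang):
    """Return (source term, target term, definition) tuples."""
    rows = []
    for entry in ET.fromstring(tbx_text).iter("termEntry"):
        terms = {}
        for lang_set in entry.iter("langSet"):
            term = lang_set.find(".//term")
            if term is not None and term.text:
                # Keep the first term found for each language.
                terms.setdefault(lang_set.get(XML_LANG, "").lower(), term.text.strip())
        descrip = entry.find(".//descrip[@type='definition']")
        definition = descrip.text.strip() if descrip is not None and descrip.text else ""
        if src_lang.lower() in terms and tgt_lang.lower() in terms:
            rows.append((terms[src_lang.lower()], terms[tgt_lang.lower()], definition))
    return rows

SAMPLE_TBX = """<martif type="TBX" xml:lang="en-US">
  <text><body>
    <termEntry id="1">
      <langSet xml:lang="en-US">
        <descripGrp><descrip type="definition">The bar that shows running programs.</descrip></descripGrp>
        <ntig><termGrp><term>taskbar</term></termGrp></ntig>
      </langSet>
      <langSet xml:lang="it-IT">
        <ntig><termGrp><term>barra delle applicazioni</term></termGrp></ntig>
      </langSet>
    </termEntry>
  </body></text>
</martif>"""

rows = tbx_to_rows(SAMPLE_TBX, "en-US", "it-IT")
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["source", "target", "definition"])
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```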

CopyFlow Gold for InDesign CS5 5.2 released

CopyFlow Gold lets you export formatted text out of InDesign or QuarkXPress documents to a computerized translation system, and then batch import the translated text back into its original page location, all while preserving the typographic formatting. CopyFlow Gold for Adobe Illustrator is also available.

According to the latest post on the developer’s blog, CopyFlow Gold for InDesign CS5 has received an update.

TMbuilder (translation memory export creator)

TMbuilder is a small tool that makes building TM export/import files as straightforward as possible. You can use it to batch-import several files in Excel (2003 or 2007) or tab-delimited format and build a Trados-compatible TXT file or a TMX 1.4b file with a couple of mouse clicks. Here are some more details about the features:

– Accepts two input formats: tab-delimited text files and MS Excel spreadsheets
– Creates output files in two file formats: Translator’s Workbench 7.x/8.x (TXT) and the Translation Memory eXchange (TMX)
– Works on multiple input files and offers a merging feature – there might be just one import file
– Allows the user to specify standard TM fields, like: source and target ISO flags, segment descriptions and author name
– Removes additional quotes often created by MS Excel when saving the file to the Text form
– Works with standard encodings: Unicode and UTF-8
– Rapid file creation: milliseconds for .txt and seconds for .xls input files

The application is free for non-commercial use and can be distributed as a standalone executable program. It requires Microsoft .NET Framework 3.5.
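
For readers curious about what such a conversion involves, here is a minimal Python sketch that turns tab-delimited text into a TMX document. It is not how TMbuilder itself works. The header values, language codes and function name are assumptions. Python's csv module is used because it also unwraps the extra quotes that Excel adds when saving as text.

```python
import csv
import io
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def tab_delimited_to_tmx(text, src_lang, tgt_lang):
    """Build a TMX root element from 'source<TAB>target' lines."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "example-script",   # hypothetical tool name
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "tab-delimited",
        "adminlang": "en-US",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        # Skip blank lines and rows with a missing source or target.
        if len(row) < 2 or not row[0].strip() or not row[1].strip():
            continue
        tu = ET.SubElement(body, "tu")
        for lang, seg_text in ((src_lang, row[0]), (tgt_lang, row[1])):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = seg_text.strip()
    return tmx

SAMPLE = (
    "Save the file\tSalvare il file\n"
    '"Click ""OK"", then close"\t"Fare clic su ""OK"", quindi chiudere"\n'
    "\n"
)
tmx = tab_delimited_to_tmx(SAMPLE, "en-US", "it-IT")
print(ET.tostring(tmx, encoding="unicode"))
```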

TMbuilder – the easiest Translation Memory export creator

TAUS Widget

The Translation Automation User Society (TAUS) offers a desktop program that can be used for searching their database. The TAUS Data Association comprises 45 organizations, including well-known companies such as Intel, Dell and eBay, and large language service providers such as Lionbridge and SDL.

One of the main reasons that brought these companies together to share their large translation memories and glossaries is the improvement of existing machine translation systems. However, the TAUS database represents a very valuable source of bilingual texts in many languages and it is freely searchable (but requires registration) through a Web interface. The widget goes one step further by taking this powerful search tool to the desktop.

Let’s take a look at this “widget”:

[Screenshot: the TAUS widget showing search results for “taskbar”]

Here I ran a simple search for the common term “taskbar”. The results include dozens of human-translated segments with the term highlighted in the source (and in the target too, after the system somehow computed its translation by analyzing the words that form the searched term).

The user can make the search more specific (and I think this will vary by language combination) by selecting specific industries (e.g. hardware, software, business services), data owners (e.g. ABBYY, Adobe, Dell, etc.) and content type (instructions, marketing material, software strings, etc.)

The search is fast and accurate and it displays the data in a clear two-column layout. Users can interact with the database by reporting problems with specific segments or sentences (just click on the grey “X” to the right of the segment.)

The widget requires registration, is multi-platform and runs on the Java Runtime Environment.

TAUS Widget | TDA

PrepTags (file preparation utility for tagged formats) Launched

PrepTags is a file preparation program designed to prepare a wide range of formats using a powerful regular expression engine. It lets you “prep” tagged files such as HTML, PHP, XML, ASP, JavaScript, SQL, PO, etc. by converting them to RTF and protecting the code. Once a file is prepared, translators can use their regular CAT tool to translate it. PrepTags-prepared files can be translated with Wordfast, Trados, Deja Vu, memoQ, and any other tool with support for prepared RTF files (a format originally designed for Trados).
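
The idea of protecting code with regular expressions can be illustrated with a short Python sketch. It only separates tags from translatable text. The RTF conversion and the styles that CAT tools use to mark protected text are left out, and the pattern and function name are assumptions rather than anything taken from PrepTags.

```python
import re

# Simplistic pattern: anything between angle brackets counts as code.
TAG_PATTERN = re.compile(r"<[^>]+>")

def split_protected(text):
    """Return (kind, chunk) pairs, where kind is 'code' or 'text'."""
    parts, pos = [], 0
    for match in TAG_PATTERN.finditer(text):
        if match.start() > pos:
            parts.append(("text", text[pos:match.start()]))
        parts.append(("code", match.group(0)))
        pos = match.end()
    if pos < len(text):
        parts.append(("text", text[pos:]))
    return parts

sample = '<p class="intro">Welcome to <b>our site</b>!</p>'
parts = split_protected(sample)
for kind, chunk in parts:
    print(kind, repr(chunk))
```

Formats such as PHP or SQL would need their own patterns, which is presumably where a configurable regular expression engine comes in.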

There are 3 versions:

  • PrepTags Lite: Free and functional but limited to 1 file at a time and without advanced features.
  • PrepTags – eBook: €15. Comes together with TransBook. Limited to 20 files at a time and without advanced features.
  • PrepTags Pro: €39. Fully functional and unlimited number of files.

The PrepTags website contains video tutorials to help with the installation and use of the program.

From Translation Solutions Blog » PrepTags – Officially Launched!

Creating Firefox smart keywords for quick access to frequently-used translation glossaries, dictionaries, resources, etc.

The Search bar

The Firefox Search bar is a convenient way to access search sites without first having to visit the site’s home page and locate the search field.

So, instead of heading to Answers.com, finding the search field, typing the search term and pressing Enter, you can stay on whatever page you are on, click on the Search bar down arrow to select which engine to use (if it’s not already selected), type your search and press Enter. The relevant search results will be displayed immediately. It’s worth remembering that the shortcut key for placing the cursor into the Search bar is Ctrl-E or Ctrl-K.

Adding common search engines

You can also add a search engine directly from the page you are visiting if the site’s publisher has made this feature available.

In that case, the down arrow will “glow” to show that a search engine can be added. See top-left corner in the screenshot to the right (this is way too subtle for me and I always overlook this information).

In the example, two search engines are “discovered” while visiting the CNN.com website.

Adding specific search engines

The Firefox Search bar comes pre-loaded with Google, Yahoo, Amazon, eBay, Answers.com, and Creative Commons search, but it’s easy to add more by visiting popular resources such as the Firefox Add-ons page or the Mycroft Project. These pages contain very specific search engines such as the ProZ term and glossary search, IATE, etc.

So, what’s the problem with the search engine list?

After adding a dozen or so search engines for useful and fun websites, I noticed that my list started to grow much too long and that it was very impractical to use it by clicking on it and scrolling to the right engine. If you stick to about 5-10 engines, you’ll probably be fine with the standard configuration, but if you use several resources for terminology research while you are translating, you’ll soon realize how frustrating it is to get the mouse, click on the list, remember and then find the right engine for the job, go back to the keyboard, etc.

There must be a better way to accomplish this. Of course there are several extensions, add-ons and utilities that can help you get quick access to your favorite search engines. If you, like me, prefer a minimalist approach to computing and want to avoid having all those tiny utilities sitting in the system tray, eating processor cycles and creating conflicts, you may want to read on.

Adding search engines the “geeky way”

A first helpful feature provided by Firefox is the Keyword option that appears in the right-hand column of the Manage Search Engine List window, accessible by clicking on the down arrow on the Search bar and choosing Manage Search Engines…

Select any search engine in this window, then click the Edit Keyword… button. Then type the keyword you want to use for this specific engine.

In the screenshot to the right I have specified “de” for my online Italian dictionary of choice.

Once you have confirmed and closed the window, you can use the keywords for quick access to these engines, like this:

Click on the URL bar or, better, enter it by using the appropriate shortcut, Ctrl-L.
Type the keyword for the search engine, followed by your query, for instance “dm motore”, no matter what page you’re on. Hit Enter.
Bang! You’re taken to the results page of your query on the search engine corresponding to the keyword, no questions asked, no clicks involved.

Firefox offers yet another, perhaps less widely known, way of consulting specific search engines. These engines will not appear in the Search bar, but are still quickly accessible through custom keywords chosen by the user.

Supposing we want to add a specific IATE search for engineering terms from English to Italian:

First, build a sample search by going to the IATE page and by setting your specific options. Click on the screenshot to the right to see how I set the options for my specific purpose. You can change the language combinations and sectors to your own preference.

[Screenshot: IATE search options]
Once all the desired options are in place, right-click on the “Search term” field and choose the option Add a Keyword for this Search…
In the window that appears, type a descriptive name for this search in the “Name” field, and an easy-to-remember keyword in the Keyword field. Also, choose where you want to keep the relevant bookmark in the “Create in” field. Perhaps it’s a good idea to keep all the keyword searches in a separate folder in the bookmark structure. Press “Add” to confirm your choice.
Time to test the search. Go to the URL bar (Ctrl-L), type the keyword (in this case “iatemec”), followed by the term. Then press Enter.
Bang! You get the list of results matching the options (language combination, domain) that you specified during the one-time creation of the search keyword.
This works particularly well for websites that insist you choose a plethora of options to narrow down your results every time you get to the initial search mask.
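
Behind the scenes, a keyword search is just a bookmark whose URL contains a %s placeholder that Firefox replaces with whatever you type after the keyword. The Python sketch below mimics that substitution. The URLs are invented placeholders (not the real IATE or dictionary addresses), and the exact escaping that Firefox applies to the query may differ from what is shown here.

```python
from urllib.parse import quote

# Hypothetical keyword bookmarks: keyword -> URL template with %s.
KEYWORDS = {
    "iatemec": "https://example.org/iate?query=%s&source=en&target=it",
    "dm": "https://example.org/dictionary?word=%s",
}

def expand(typed):
    """Turn 'keyword some query' into a full URL, or None if no keyword matches."""
    keyword, _, query = typed.strip().partition(" ")
    template = KEYWORDS.get(keyword)
    if template is None or not query.strip():
        return None
    return template.replace("%s", quote(query.strip(), safe=""))

print(expand("dm motore"))
print(expand("iatemec gear box"))
```

All the options chosen on the search form end up as fixed parameters in the URL template, which is why they do not have to be selected again for each search.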

Olifant Candidate Release 22 available

Olifant is a utility for maintaining translation memory files. It can import (even by drag-and-drop) translation memories in the TMX, tab-delimited and Wordfast formats, and it can export to TMX or Wordfast.

Among other things, Olifant lets you perform the following tasks on translation memories:

  • Flag and remove duplicate entries
  • Create a single tri-lingual TM from two separate bi-lingual TMs
  • Reverse the source and target languages of a TM
  • Open a TM that contains invalid XML characters
  • Remove formatting codes (e.g. <bpt>, <ept>, etc.) from the TM segments
  • Search and replace text using regular expressions
  • Filter the entries based on various criteria
  • Partially export the TM
  • Find exact and fuzzy matches or concordances for the current TM entry

Olifant is free software distributed under the GNU Lesser General Public License.
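
To show what a few of these maintenance tasks amount to, here is a minimal Python sketch that strips inline codes, removes duplicate entries and reverses the language direction on a list of (source, target) pairs. It is an illustration under simple assumptions (paired inline elements only, exact-match duplicates), not a description of how Olifant does it. The sample entries are invented.

```python
import re

# Paired TMX inline elements such as <bpt>...</bpt> and <ept>...</ept>.
INLINE_CODE = re.compile(r"<(bpt|ept|ph|it|ut)\b[^>]*>.*?</\1>", re.DOTALL)

def strip_codes(segment):
    """Remove inline formatting codes, keeping the surrounding text."""
    return INLINE_CODE.sub("", segment)

def remove_duplicates(entries):
    """Keep the first occurrence of each (source, target) pair."""
    seen, unique = set(), []
    for entry in entries:
        if entry not in seen:
            seen.add(entry)
            unique.append(entry)
    return unique

def reverse_languages(entries):
    """Swap source and target in every entry."""
    return [(target, source) for source, target in entries]

entries = [
    ('Click <bpt i="1">&lt;b&gt;</bpt>OK<ept i="1">&lt;/b&gt;</ept>.',
     'Fare clic su <bpt i="1">&lt;b&gt;</bpt>OK<ept i="1">&lt;/b&gt;</ept>.'),
    ("Click OK.", "Fare clic su OK."),
    ("Cancel", "Annulla"),
]
clean = [(strip_codes(s), strip_codes(t)) for s, t in entries]
unique = remove_duplicates(clean)
reversed_tm = reverse_languages(unique)
print(reversed_tm)
```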

Olifant (Candidate)