Since the previous post generated some interest, I have decided to put the program to the test. I saved the page from http://www.apple.com/mac/ as a local HTML file and fed it to the program. Here’s a screenshot of Caterpillar’s main window:
Just place the source files in the folder indicated by the In path, choose an extraction path and hit Extract. The processing speed on this single, short page was very high and the resulting TXT file was created instantaneously.
Here’s what the program’s output looks like:
As you can see, after the first couple of lines containing headers that will help the program reconstruct the file after it has been translated, the structure contains the following fields:
The translator will then have to translate all the fields preceded by “Target=” (using the CAT-tool of choice) and then reconstruct the translated HTM file by using Caterpillar’s Integration command.
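For anyone who wants to inspect the extracted file outside a CAT tool, the field structure can be read with a few lines of Python. This is only a minimal sketch, not part of Caterpillar: it assumes one `Key=Value` field per line and that each record starts at an `ID=` line, and both the `parse_records` helper and the sample text are hypothetical.

```python
def parse_records(text):
    """Group Key=Value lines into records.

    Assumption: every record starts with an ID= line and each field
    sits on its own line. Lines before the first ID= (for example the
    file headers) are ignored.
    """
    records = []
    current = None
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if not sep:
            continue  # blank line or line without '=': not a field
        if key == "ID":
            current = {}
            records.append(current)
        if current is not None:
            current[key] = value
    return records


# Hypothetical sample in the assumed layout
sample = """ID=133
Type=text
Source=icon to see which items
Target=icon to see which items
ID=134
Type=text
Source=Plus
Target=Plus
"""

records = parse_records(sample)
for record in records:
    print(record["ID"], "->", record["Target"])
```

A listing like this makes it easy to spot segments whose Target still equals the Source after translation.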
Here’s what I found out during this brief test:
- The tags are completely taken out of the equation when the text is converted to TXT. This can be either good or bad, depending on the translator’s tastes and on the type and complexity of the file being processed
- The program does not assign internal/external styles to the file, so a translator who wants to use a CAT tool has two options: either move the cursor to the beginning of the “Target=” header after translating each sentence, or prep the file by assigning the “translatable” attribute to the Target sentences and making the rest of the text untranslatable
- I noticed that if the HTML file contains diacritics (accented letters) or characters that are rendered by using Unicode in the HTML file, these become corrupted during the conversion to the translatable TXT file. This issue might or might not be addressed by the Encoding Converter option available in the program, which I did not test
- Caterpillar has an option that lets you merge the source files into a single translatable TXT file. This sounds particularly interesting for translators who like the auto-propagation feature offered by some CAT tools, and for complex projects comprising hundreds of tiny HTML files in multiple sub-folders
- Interestingly, the file types that are available for processing include (besides HTML) PHP, XML and ASP. I did not test all these formats. However, I did test the program with one of those dreaded XML files that contain embedded HTML code. Surprisingly, Caterpillar did a decent job of extracting the translatable text. On the downside, the program creates a segmentation break at each tag that is preceded and followed by translatable text, so the following code:
<p><b>Note:</b> When searching, look for the <img src="/images/search/plus_icon.gif" width=9 height=9> icon to see which items are only available to ACME <span class="hlt">Plus</span> customers.]]></Data></Cell>
is rendered as follows:
Source=When searching, look for the
Target=When searching, look for the
ID=133
Type=text
Source=icon to see which items are only available to ACME
Target=icon to see which items are only available to ACME
ID=134
Type=text
Source=Plus
Target=Plus
ID=135
Type=text
Source=customers.]]>
Target=customers.]]>
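The segmentation behaviour is easy to reproduce. The sketch below is not how Caterpillar works internally (that is unknown to me); it simply uses Python's standard `html.parser` module to show that treating every tag as a boundary splits the sentence into the same kind of fragments. The snippet is a simplified version of the code above, with the CDATA tail dropped and a closing `</p>` added, and `SegmentCollector` is a name made up for this illustration.

```python
from html.parser import HTMLParser


class SegmentCollector(HTMLParser):
    """Collect each run of text between two tags as a separate segment."""

    def __init__(self):
        super().__init__()
        self.segments = []

    def handle_data(self, data):
        # Every tag ends the current text run, so inline tags such as
        # <img> or <span> cut a sentence into several pieces.
        text = data.strip()
        if text:
            self.segments.append(text)


snippet = (
    '<p><b>Note:</b> When searching, look for the '
    '<img src="/images/search/plus_icon.gif" width=9 height=9> '
    'icon to see which items are only available to ACME '
    '<span class="hlt">Plus</span> customers.</p>'
)

collector = SegmentCollector()
collector.feed(snippet)
collector.close()
segments = collector.segments

for segment in segments:
    print("Source=" + segment)
```

One sentence comes out as several segments, which is exactly what hurts translation memory matching: a CAT tool sees "Plus" and "customers." as unrelated units.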
In conclusion, this tool still requires some extra work to prep the files for processing with a CAT tool, and it takes a rather radical approach to tags (by deleting them from the working file). Even so, it might prove a useful addition to the utilities folder of translators who use basic CAT tools that cannot prep HTML and tagged files, and of advanced users who need a quick way of simplifying complex tagged files, for instance XML with embedded HTML.
However, Caterpillar's reliability should be tested carefully before using the tool on a large-scale project.