After the not-so-great results I obtained with free online OCR services for PDF files (the main problem being that most services do not actually perform OCR but just convert editable PDF text to Word, without processing text embedded as graphics), I may have found a service that actually delivers on this promise: OnlineOCR.net. From the site’s own description:
OnlineOCR.net is a web-based Optical Character Recognition (OCR) service that allows you to convert scanned images and documents into editable Word, Text, Excel, PDF, Html output formats.
A couple of minor caveats
- You need to get a (free) account if you want to convert PDF to DOC
- The activation email I received ended up in my Gmail spam. So you may want to check your Spam folder if you think you have not received the activation message.
Testing the system
I did a test with a two-page PDF file containing editable text in fancy formatting on page 1 and text pasted in as lo-res graphics on page 2.
The system worked fairly well with my test document. Page 1 was rendered without any spelling errors and this confirms my impression that the editable text contained in the PDF is preserved without running it through OCR, which is great. The system has added frames, section breaks and tables in order to render the “fancy” multi-column formatting of the source PDF file.
Page 2 of the DOC file, which contained the graphic text, was rendered with some errors. This was low-resolution text, and you might obtain better results with better-quality embedded graphic text. In this case, too, the formatting was rendered by inserting tables and section breaks.
One immediately noticeable advantage is that OnlineOCR does a rather good job of preserving the original’s formatting, and does so without adding superfluous carriage returns, which are such a nuisance for translators since they disrupt the sentence-by-sentence segmentation used by most CAT tools.
I could not find any information on the website that would indicate a payment plan for this service, so I would assume it’s offered for free. Considering the price, I think that this system is well worth a try if you need to convert a PDF file into an editable format. If the PDF document only (or mainly) contains editable text, you will be pleased by the results. If the file also contains text that has been pasted as graphic pages, the output will likely require some post-editing, but I think that will be comparable to what you may obtain with the majority of commercial OCR packages.
I just tried it, and it would be great, but it is only “free to try” … for the first five pages. At this time there is clearly a payment method you must use for anything beyond your 5-page trial. You buy blocks of pages … 10c/pg for 30 pages ($3), 5c/pg for 200 pages ($20). Good service, but definitely NOT “free”.
Thanks for your review.
Better try http://www.free-pdf-to-word-converter.com – it is completely free.
I use GIRDAC PDF to Word Converter Pro to convert scanned and text PDF documents to Word format. It can produce various output formats (DOC, DOCX, XML, RTF) and uses multiple conversion methods.
Thanks, I’ve been searching for information on this subject for a long time, and yours is the best I’ve come across so far. However, what about the bottom line? Are you sure about the source?
I found another free online PDF to Word converter, http://www.online-code.net/pdf-to-word.html, which converts PDF to editable Microsoft Word online.