As translators, we receive PDF files from various sources. A PDF can be created from a scanned image, a PowerPoint presentation, an MS Word document, and so on. Because CAT tools won’t accept these files as they are, our obvious first option is to translate side by side, typing the translation straight into an MS Word file. However, if we would rather use a CAT tool, we first need to extract the text from the PDF so that the CAT tool can open it. In our opinion, using a CAT tool generally serves both our own needs and the final quality of our translation work better.
Extracting Text from a PDF
There are various ways of extracting text from a PDF, depending on its source. If the PDF has come from an MS Word document, we need to save or convert it into our preferred format (in our case, an MS Word document) and then correct whatever the conversion got wrong, such as misspellings, spacing, and so on.
Using a Pre-Desktop Publishing Process
However, if the PDF has come from another source, such as a scanned document, we need a different approach. If you’re part of a company, it’s highly likely that there will be a department, or at least a person, who can perform a pre-Desktop Publishing (pre-DTP) task to extract the text. Once the pre-DTP has been completed, the next step is to pre-edit the file.
Pre-Editing Is Vitally Important
It’s very important that pre-editing is done, particularly when dealing with content that involves numbers. Regardless of how reliable the DTP department or person may be, there will always be items that aren’t converted properly and need reviewing. It’s not unusual to find a ‘c’ that needs changing to an ‘e’, or an ‘l’ that needs changing to a ‘1’, and so on.
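For anyone comfortable with a little scripting, part of this check can be automated. The following is only a minimal sketch in Python, assuming the extracted text is available as plain text. The function name, the set of look-alike letters, and the sample text are all made up for illustration. It flags any token that mixes digits with letters that are easily mistaken for digits, so that a person can review it. It will not catch a ‘c’ that should be an ‘e’ (a spell checker is better suited to that), and it may flag legitimate codes and product names.

```python
import re

# Letters that are easily confused with digits (an illustrative, incomplete set).
LOOKALIKES = set("lIOoSB")

# Runs of letters and digits, plus separators that typically occur inside numbers.
TOKEN = re.compile(r"[A-Za-z0-9][A-Za-z0-9.,/-]*")


def flag_suspicious_tokens(text):
    """Return (line_number, token) pairs that mix digits with look-alike letters."""
    flagged = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for match in TOKEN.finditer(line):
            # Drop trailing punctuation such as a sentence-final full stop.
            token = match.group().rstrip(".,/-")
            has_digit = any(ch.isdigit() for ch in token)
            has_lookalike = any(ch in LOOKALIKES for ch in token)
            if has_digit and has_lookalike:
                flagged.append((line_number, token))
    return flagged


# Hypothetical sample: "1,2l5.00" contains a letter 'l', "4O7" a capital 'O'.
sample = "Invoice total: 1,2l5.00 EUR\nDelivery within 30 days\nRef. no. 4O7"

for line_number, token in flag_suspicious_tokens(sample):
    print(f"line {line_number}: check '{token}'")
```

A script like this only points to places worth a second look. The decision about what the correct character is still belongs to whoever is pre-editing the file against the original PDF.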
Obviously, as a translator you want the quality of your work to always be held to a high standard, and learning how to deal with a PDF is simply part of this process. Please don’t ever forget to pre-edit the file!