Missing Space within Clause [message #1233]
Fri, 18 June 2021 05:27
arusim
Messages: 5 Registered: June 2021
Junior Member
The indexed contents of some PDFs have no spaces between the words within each clause, such as "Asmentionedearlier, lookupsretrieveentriesfromallthetreesandmergetheminsortedord erofkeyvalue. ", which is very annoying. However, the text is completely fine if I just copy and paste it from Adobe Reader.
Is there any way to fix it?
Re: Missing Space within Clause [message #1235 is a reply to message #1233]
Fri, 18 June 2021 09:36
FoxTrot Engineering
Messages: 406 Registered: April 2020
Senior Member
Please read our FAQ on this topic.
Unfortunately, there is not a lot FoxTrot can do in this situation. You can try using Xpdf to extract text from all PDF files, but the result may be better for some documents and worse for others.
A possible way to fix the issue would be to convert these documents to image files, convert them back to PDF using a good, properly configured OCR engine that inserts spaces between words correctly, and then update your index.
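To decide which documents are worth re-running through OCR, a rough heuristic can flag extracted text whose words run together. The sketch below is only an illustration and is not part of FoxTrot: the function names and the threshold of 10 letters are assumptions, and it expects plain text that you have already extracted yourself (for example with Xpdf's pdftotext).

```python
import re


def mean_word_length(text: str) -> float:
    """Return the average length of the letter runs in text (0.0 if none)."""
    # A "word" here is any unbroken run of letters; digits, underscores,
    # punctuation and whitespace all act as separators.
    words = re.findall(r"[^\W\d_]+", text)
    if not words:
        return 0.0
    return sum(len(w) for w in words) / len(words)


def looks_run_together(text: str, threshold: float = 10.0) -> bool:
    """Guess whether text is missing the spaces between its words.

    The threshold is an assumed value: ordinary English prose averages
    roughly five letters per word, so a much higher mean is suspicious.
    """
    return mean_word_length(text) > threshold


if __name__ == "__main__":
    # Sample taken from the first post in this thread.
    sample = ("Asmentionedearlier, lookupsretrieveentriesfromallthetrees"
              "andmergetheminsortedord erofkeyvalue. ")
    print(looks_run_together(sample))
```

The threshold will probably need tuning for languages with long compound words, and very short texts can give misleading averages.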
Because FoxTrot uses one engine to index PDFs (either Spotlight's importer or Xpdf) and a different one to display them (PDFKit), a document is sometimes found but the occurrence can't be highlighted. In that case, option-click the found file (or use the "display type: plain text" toolbar menu in FoxTrot 7) to display a plain text version of the file. This won't fix how the document is indexed, but it will correctly highlight what has been found.
Jérôme - FoxTrot Engineering