FoxTrot Search Forum
FoxTrot Search for macOS Forum

PDF: text searchable with Preview but no text content in Foxtrot Pro [message #1777] Wed, 06 March 2024 17:50
Artighel
Messages: 10
Registered: November 2021
Junior Member
Hello all,

I am currently trialing FT Pro 8 and stumbled upon a perplexing issue:

I have a PDF whose text is selectable and searchable in plain old Preview, yet when it is indexed in FT, no text content is shown in the metadata (viewable with the "text only" FT view); therefore it is not searchable in FT!
I gather that this is an image PDF on which PDFKit could be doing on-the-fly OCR in Preview? In which case, is FT unable to perform the same action? That would avoid the need to find image-only PDF files and run OCR on them.

Thank you for any pointers!
Re: PDF: text searchable with Preview but no text content in Foxtrot Pro [message #1778 is a reply to message #1777] Wed, 06 March 2024 18:03
FoxTrot Engineering
Messages: 406
Registered: April 2020
Senior Member
Indeed, Preview.app automatically performs OCR when you open PDF files containing images, leading the user to believe that the file contains indexable text.
FoxTrot can also perform OCR on PDF files, but it won't do this automatically at indexing time, as it would slow down indexing considerably. Therefore, you have to manually perform OCR on these files to convert them to OCR'ed PDF files that can be indexed.
To do so, search for PDF files by name or by type, then select the files that you want to convert, and use the contextual menu (right-click) "PDF Optical Character Recognition".
You can specifically search for large PDF files with no textual content, which are very likely to contain scanned text: check our FoxTrot Tips page.


Jérôme - FoxTrot Engineering

[Updated on: Wed, 06 March 2024 18:07]
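
The "large PDF files with no textual content" heuristic mentioned above can be illustrated with a minimal sketch. This is not FoxTrot's implementation or its search syntax; the `PdfInfo` record, the `likely_scanned` function, and the 1 MB threshold are all hypothetical names and values chosen for illustration, and the sketch assumes the file size and the amount of extractable text have already been obtained by some other tool.

```python
from dataclasses import dataclass


@dataclass
class PdfInfo:
    # Hypothetical record; the values would come from whatever tool lists your PDFs.
    path: str
    size_bytes: int   # file size on disk
    text_chars: int   # characters of extractable text (0 for image-only PDFs)


def likely_scanned(pdfs, min_size_bytes=1_000_000, max_text_chars=0):
    """Return the PDFs that are large yet carry (almost) no extractable text."""
    return [
        p for p in pdfs
        if p.size_bytes >= min_size_bytes and p.text_chars <= max_text_chars
    ]


# Made-up sample data, for illustration only.
sample = [
    PdfInfo("scan.pdf", 5_000_000, 0),         # large, no text: OCR candidate
    PdfInfo("report.pdf", 2_000_000, 40_000),  # large, but already has text
    PdfInfo("blank.pdf", 20_000, 0),           # no text, but too small to matter
]
candidates = likely_scanned(sample)
print([p.path for p in candidates])
```

The size threshold is a judgment call: a small PDF without text is more likely to be an empty or purely graphical page than a scanned document, so raising or lowering it trades missed scans against false positives.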

Re: PDF: text searchable with Preview but no text content in Foxtrot Pro [message #1779 is a reply to message #1777] Wed, 06 March 2024 18:20
Artighel
Thanks for clarifying the issue!
So I take it there is no way for FT to leverage PDFKit's instant OCR data?
Re: PDF: text searchable with Preview but no text content in Foxtrot Pro [message #1780 is a reply to message #1779] Wed, 06 March 2024 18:25
FoxTrot Engineering
Unfortunately, there is no "instant OCR" in PDFKit. OCR is performed in the background when you open a PDF file; the first page quickly becomes selectable, but performing OCR on a large file is anything but instant. In any case, FoxTrot does use PDFKit to perform OCR, but it does not do so automatically.

Jérôme - FoxTrot Engineering
Re: PDF: text searchable with Preview but no text content in Foxtrot Pro [message #1781 is a reply to message #1780] Wed, 06 March 2024 18:33
Artighel
Will converting thousands of PDFs in FT be fine? I used the search method outlined in the tips you linked and was surprised by the result.
Thank you for your very quick reply anyway!