Re: Do html files rely on Spotlight index? [message #1360 is a reply to message #1354]
Sun, 20 February 2022 10:36
FoxTrot Engineering
Messages: 406 Registered: April 2020
Senior Member
(2): Finding a file relies on the index, and thus on the text that was extracted from the file at indexing time (usually by a Spotlight metadata importer, or in some specific cases by an alternate text extractor such as Xpdf or Gumbo). Highlighting found occurrences in the preview is a completely separate step, and uses PDFKit, WebKit, or macOS's text engine depending on the file type. So yes, the results can differ. However, as I said in the previous message, you can option-click a found message to display the plain text that has actually been indexed, and then you should see why a file can or can't be found.
The current FoxTrot version uses Gumbo only when the Spotlight importer completely fails to process an HTML file (usually because of a charset problem). With version 7.5, you will be able to use Gumbo for all HTML files by typing this command in Terminal.app:
defaults write com.ctmdev.FoxTrotShared PreferGumbo -bool YES
or to use Gumbo for some files only:
xattr -w com.ctmdev.foxtrot.extractor gumbo [file [file…]]
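If you want to tag a whole folder of HTML files rather than listing them one by one, the per-file command above could be wrapped in a small shell function. This is only a sketch: the function name tag_html_for_gumbo and the folder argument are made up for illustration, and it assumes the attribute name and value shown above are all that is needed. File names containing newlines are not handled.

```shell
#!/bin/sh
# Hypothetical helper (not part of FoxTrot): set the extractor attribute
# on every .html / .htm file found under the folder given as "$1".
tag_html_for_gumbo() {
    dir=$1
    find "$dir" -type f \( -name '*.html' -o -name '*.htm' \) |
    while IFS= read -r f; do
        # Same attribute name and value as in the per-file command above.
        xattr -w com.ctmdev.foxtrot.extractor gumbo "$f"
    done
}

# Example call (the path is a placeholder):
#   tag_html_for_gumbo "$HOME/Documents/MyArchive"
```

To check a single file afterwards, xattr -p com.ctmdev.foxtrot.extractor followed by the file path prints the attribute's value, and xattr -d with the same attribute name removes it again. In the same way, defaults read com.ctmdev.FoxTrotShared PreferGumbo shows the global setting, and defaults delete with the same domain and key should remove it.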
Jérôme - FoxTrot Engineering