FoxTrot Search Forum
FoxTrot Search for macOS Forum

Gumbo is not indexing entire file [message #1553] Sun, 06 November 2022 15:25
Atlas
Messages: 130
Registered: August 2009
Senior Member
1. I have an HTML file that's over 100MB (yes, I know it's big, but it's an archive export), and I've set Gumbo as my default HTML indexer with `defaults write com.ctmdev.FoxTrotShared PreferGumbo -bool YES`.
2. I noticed that FoxTrot could not find certain words that I know are in the file.
3. When I view the HTML file as plain text in FoxTrot, it only contains text for the top 1/5 of the file. Unsurprisingly, anything that isn't converted to text is also not searchable in FoxTrot.
4. As a sanity check, I ran the same search on a second machine that doesn't have Gumbo turned on, and there I can search the file just fine.
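
One way to confirm that a missing word really sits beyond the indexed portion is to look at its byte offset in the source file. This is only a minimal sketch: the function name is hypothetical, it is not part of FoxTrot, and it assumes a grep that reports the offset of the match itself when `-b` and `-o` are combined (GNU grep does; other versions may report the offset of the line instead).

```shell
# Print how far into FILE, as a percentage of its size, the first
# occurrence of TERM appears. Hypothetical helper, not a FoxTrot tool.
term_offset_percent() {
    file=$1
    term=$2
    # Total file size in bytes.
    size=$(wc -c < "$file" | tr -d ' ')
    # -a: treat the file as text, -b: print the byte offset,
    # -o: print only the match, -F: fixed string, -m 1: stop after
    # the first matching line. Keep only the first offset reported.
    offset=$(grep -a -b -o -F -m 1 -- "$term" "$file" | head -n 1 | cut -d: -f1)
    if [ -z "$offset" ]; then
        echo "term not found" >&2
        return 1
    fi
    echo $(( offset * 100 / size ))
}
```

Calling `term_offset_percent export.html "some word"` (the file name is a placeholder) prints a number from 0 to 99. If the words FoxTrot cannot find all land past roughly the same percentage, that supports the idea that the text is being cut off at a fixed point.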

Question: Am I missing something in my settings? I know that there's a size limit for text files, but is there also a size limit for HTML files when using Gumbo?

Thank you.
Re: Gumbo is not indexing entire file [message #1563 is a reply to message #1553] Tue, 08 November 2022 15:30
FoxTrot Engineering
Messages: 383
Registered: April 2020
Senior Member
I am not aware of any size limit for HTML files indexed using Gumbo, but Gumbo may have its own limitations…

Jérôme - FoxTrot Engineering
Re: Gumbo is not indexing entire file [message #1564 is a reply to message #1563] Tue, 08 November 2022 15:33
FoxTrot Engineering
Messages: 383
Registered: April 2020
Senior Member
Note that you no longer need the hidden preference to use Gumbo: we have added a checkbox in the "First Aid / manage third-party metadata importers" window (press the Command and Option keys while launching FoxTrot).

Jérôme - FoxTrot Engineering
Re: Gumbo is not indexing entire file [message #1574 is a reply to message #1564] Fri, 18 November 2022 21:28
Atlas
Messages: 130
Registered: August 2009
Senior Member
I just checked with another, smaller HTML file that's only 35MB, and the same thing happens: the plain text view of the file only shows text for maybe the top 30% - 50% of the file. Again, I'm using Gumbo to index HTML files.

Can you check whether Gumbo has a similar issue for you when indexing larger HTML files? I don't know what file size is too big for Gumbo, or if file size is even the problem. Thanks.
Re: Gumbo is not indexing entire file [message #1619 is a reply to message #1574] Wed, 15 February 2023 19:39
Atlas
Messages: 130
Registered: August 2009
Senior Member
I've been waiting several months for a reply on this thread. Can FoxTrot Engineering please let me know if they've looked into this? FoxTrot seems to produce an incomplete index of HTML files when using Gumbo, even when the files are as small as 35MB. <--- Is this fixed in 7.5.4? Sorry, I cannot delete posts. Feel free to delete this if the issue is already fixed.

[Updated on: Wed, 15 February 2023 20:29]

Re: Gumbo is not indexing entire file [message #1636 is a reply to message #1619] Tue, 21 February 2023 11:30
FoxTrot Engineering
Messages: 383
Registered: April 2020
Senior Member
This issue has been fixed in version 7.5.4, provided that you use Gumbo to index HTML files AND set the PlainTextFileLimitMB hidden preference, then rebuild the index:
`defaults write com.ctmdev.FoxTrotShared PlainTextFileLimitMB -int 0`
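
For anyone applying this from Terminal, the two preferences mentioned in this thread can be set and read back in one go. This is a rough sketch, not an official script: the function name and the `DRY_RUN` switch are made up for illustration, only the domain, keys, and values come from the posts above. The thread does not state what a value of 0 means; presumably it lifts the limit. Gumbo can also be enabled with the checkbox described earlier instead of `PreferGumbo`.

```shell
# Preference domain as given in this thread.
DOMAIN=com.ctmdev.FoxTrotShared

# Hypothetical helper: set DRY_RUN=1 to print the commands
# instead of running them.
apply_gumbo_settings() {
    run=${DRY_RUN:+echo}
    # Prefer Gumbo for HTML files (hidden preference from the first post).
    $run defaults write "$DOMAIN" PreferGumbo -bool YES
    # Value given by FoxTrot Engineering for the 7.5.4 fix.
    $run defaults write "$DOMAIN" PlainTextFileLimitMB -int 0
    # Read both values back to confirm they were stored.
    $run defaults read "$DOMAIN" PreferGumbo
    $run defaults read "$DOMAIN" PlainTextFileLimitMB
}
```

Run `DRY_RUN=1 apply_gumbo_settings` first to see the commands, then `apply_gumbo_settings` to apply them. The index still has to be rebuilt in FoxTrot afterwards, as described above.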


Jérôme - FoxTrot Engineering