Gumbo is not indexing entire file [message #1553]
Sun, 06 November 2022 15:25
Atlas
Messages: 140 Registered: August 2009
Senior Member
1. I have an HTML file that's over 100MB (yes, I know it's big, but it's an archive export), and I've set Gumbo as my default HTML indexer with `defaults write com.ctmdev.FoxTrotShared PreferGumbo -bool YES`.
2. I noticed that FoxTrot could not find certain words that I know are in the file.
3. When I view the HTML file as plain text in FoxTrot, I see that it only contains text from the top 1/5 of the file. Unsurprisingly, anything that isn't converted to text is also not searchable in FoxTrot.
4. As a sanity check, I ran the same search on a second machine that doesn't have Gumbo turned on, and there I can search the file just fine.
Question: Am I missing something in my settings? I know that there's a size limit for text files, but is there also a size limit for HTML files when I use Gumbo?
Thank you.
Re: Gumbo is not indexing entire file [message #1564 is a reply to message #1563]
Tue, 08 November 2022 15:33
FoxTrot Engineering
Messages: 406 Registered: April 2020
Senior Member
Note that you no longer need the hidden preference to enable Gumbo; we have added a checkbox in the "First Aid / manage third-party metadata importers" window (press the Command and Option keys while launching FoxTrot).
Jérôme - FoxTrot Engineering
Re: Gumbo is not indexing entire file [message #1574 is a reply to message #1564]
Fri, 18 November 2022 21:28
Atlas
I just checked with another, smaller HTML file that's only 35MB, and the same thing happens: the plain text view of the file only shows text for maybe the top 30% - 50% of the file. Again, I'm using Gumbo to index HTML files.
Can you check whether Gumbo has a similar issue for you when indexing larger HTML files? I don't know what file size is too big for Gumbo, or if the file size is even the problem. Thanks.
Re: Gumbo is not indexing entire file [message #1619 is a reply to message #1574]
Wed, 15 February 2023 19:39
Atlas
I've been waiting several months for a reply on this thread. Can FoxTrot Engineering please let me know if they've looked into this? FoxTrot seems to produce an incomplete index of HTML files when using Gumbo, even when they are as small as 35MB. <--- Is this fixed in 7.5.4?? Sorry, I cannot delete posts. Feel free to delete this if the issue is already fixed.
[Updated on: Wed, 15 February 2023 20:29]
Re: Gumbo is not indexing entire file [message #1636 is a reply to message #1619]
Tue, 21 February 2023 11:30
FoxTrot Engineering
This issue has been fixed in version 7.5.4, provided that you use Gumbo to index HTML files AND set the PlainTextFileLimitMB hidden preference, then rebuild the index:
defaults write com.ctmdev.FoxTrotShared PlainTextFileLimitMB -int 0
Jérôme - FoxTrot Engineering
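
For anyone who wants to confirm that the preferences from this thread are actually in place before rebuilding the index, here is a minimal sketch using the standard macOS `defaults read` command. The function names `foxtrot_pref` and `foxtrot_gumbo_settings` are made up for illustration and are not part of FoxTrot; the domain and key names are the ones quoted above. Whether the new First Aid checkbox writes the same `PreferGumbo` key is not stated in this thread, so an `unset` result for that key does not necessarily mean Gumbo is off.

```shell
# Hypothetical helper (not part of FoxTrot): print one hidden preference,
# or "unset" if the key has never been written in this domain.
# The domain and key names are the ones quoted in this thread.
foxtrot_pref() {
  defaults read com.ctmdev.FoxTrotShared "$1" 2>/dev/null || echo "unset"
}

# Hypothetical wrapper: show both preferences discussed in this thread.
foxtrot_gumbo_settings() {
  echo "PreferGumbo=$(foxtrot_pref PreferGumbo)"
  echo "PlainTextFileLimitMB=$(foxtrot_pref PlainTextFileLimitMB)"
}
```

Paste the two functions into Terminal and run `foxtrot_gumbo_settings`. If `PlainTextFileLimitMB` shows as `unset`, the `defaults write` command above has not been applied on that machine.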