Is it advisable to index html files as txt? [message #1547]
Mon, 31 October 2022 09:42
Atlas
Messages: 142 Registered: August 2009
Senior Member
I'm aware that some HTML files need the Gumbo importer instead of the default Spotlight importer to be searched accurately. But what about telling FoxTrot to index HTML files as plain txt files? Wouldn't that completely solve the problem of not knowing which importer to use? Perhaps I'm missing something. Thanks.
[Updated on: Mon, 31 October 2022 09:46]

Re: Is it advisable to index html files as txt? [message #1550 is a reply to message #1547]
Tue, 01 November 2022 08:38
FoxTrot Engineering
Messages: 413 Registered: April 2020
Senior Member
Indexing HTML files as plain text would usually give unexpected results:
- it would index every HTML tag, as well as JavaScript source code etc
- it would not decode HTML entities (accented letters and special characters encoded in US-ASCII)
- it would not handle character set encodings (UTF-8, ISO-8859-1 etc) properly
- etc
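To make the points above concrete, here is a rough Python sketch comparing the words a naive plain-text pass would see with the words an HTML-aware pass would see. It is only an illustration and says nothing about how FoxTrot's importers actually work: the sample document, the `TextExtractor` class and both helper functions are made up for this example, and the UTF-8 default in the plain-text pass is an assumption.

```python
from html.parser import HTMLParser

# Hypothetical sample page: ISO-8859-1 bytes (0xE9 is "e acute"),
# one HTML entity, and an inline script.
SAMPLE = (b'<html><head><meta charset="iso-8859-1">'
          b'<script>var x = 1;</script></head>'
          b'<body><p>Caf\xe9 cr&egrave;me</p></body></html>')


class TextExtractor(HTMLParser):
    """Collects displayed text only, skipping <script> and <style> content."""

    def __init__(self):
        # convert_charrefs=True decodes entities such as &egrave; in text.
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())


def words_as_plain_text(raw):
    # Assumption: the plain-text pass guesses UTF-8 and ignores the
    # charset declared inside the document.
    return raw.decode("utf-8", errors="replace").split()


def words_as_html(raw, encoding):
    # The caller supplies the declared charset; real importers detect it.
    parser = TextExtractor()
    parser.feed(raw.decode(encoding))
    parser.close()
    return " ".join(parser.parts).split()


if __name__ == "__main__":
    print(words_as_plain_text(SAMPLE))
    print(words_as_html(SAMPLE, "iso-8859-1"))
```

In the plain-text output the tags and the script source are glued onto the words, the entity stays as the literal `&egrave;`, and the accented letter is lost because the bytes were decoded with the wrong character set. The HTML-aware output contains only the two displayed words.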
However, if you actually need to index them as source code files rather than for their displayed content, you can use the hidden Aliases preference. We recommend setting this preference with PrefsEditor rather than from the command line.
Jérôme - FoxTrot Engineering