FoxTrot Search Forum
FoxTrot Search for macOS Forum

FoxTrot not indexing files containing utf16 tag [message #299] Wed, 27 May 2015 20:32
elsacha
Messages: 4
Registered: May 2015
Junior Member
Hello,

We ran into a problem building an index with HTML files containing a UTF-16
tag.
The files contain the following code in the header:



some_file.xml


If both lines are in the file, then FoxTrot indexes only the names of the
files and does not index their contents at all.

If we delete the UTF-16 line, then FoxTrot starts indexing the
contents and it is possible to search inside the files.

We cannot modify these files by deleting this line, and it is not feasible
anyway (we have thousands of such files).

I use OS X Yosemite 10.10.3, but colleagues ran into a similar issue with
previous OS versions too.

I would be grateful for any insight on this problem.
Re: FoxTrot not indexing files containing utf16 tag [message #300 is a reply to message #299] Thu, 28 May 2015 10:58
FoxTrot Engineering
Messages: 383
Registered: April 2020
Senior Member
elsacha wrote:

> We ran into a problem building an index with html files containing a utf16
> tag.

> If both lines are in the file, then FoxTrot indexes only the names of the
> files and does not index their contents at all.

FoxTrot relies on the Spotlight importers to extract indexable text from files; unfortunately, the RichText importer has some bugs or limitations concerning UTF-16 HTML files and/or HTML files with inconsistent charset tags.

If removing the charset=UTF-16 tag fixes the problem, maybe you can do a multi-file search and replace using, for example, TextWrangler?
If the files are really UTF-16, you may have to convert them to UTF-8; TextBatchConv can do this.
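
For readers who prefer a script to the tools named above, the two steps (convert to UTF-8, then relabel the charset tag) can be sketched in Python. This is only an illustration under assumptions: the function name utf16_html_to_utf8 is hypothetical, and the pattern targets a "charset=UTF-16" declaration, since the exact header lines of the affected files are not shown in this thread.

```python
import re

# Matches "charset=UTF-16" in any case, optionally quoted, with an optional
# LE/BE suffix. Assumption: the declaration in the real files looks like this.
CHARSET_RE = re.compile(r'(charset\s*=\s*["\']?)utf-?16(?:le|be)?', re.IGNORECASE)


def utf16_html_to_utf8(raw: bytes) -> bytes:
    """Decode UTF-16 bytes, relabel the charset declaration, return UTF-8 bytes."""
    # The "utf-16" codec honours a leading byte order mark; without one,
    # Python falls back to the platform's native byte order.
    text = raw.decode("utf-16")
    text = CHARSET_RE.sub(r"\g<1>UTF-8", text)
    return text.encode("utf-8")
```

The function works on bytes rather than on a path so that it can be tried on a copy of a single file before touching anything in place.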


Jérôme - CTM Engineering


------------------------------------------------------------ ---------
"FoxTrot is the one app with which I would have to go back to the PC
because Spotlight is so profoundly useless for serious research (don't
get me started). FoxTrot steps in and does about everything I need to
and does it quickly and with grace. Everytime I have emailed the devs,
I get a timely and responsive answer. I have a few quibbles of course
such as the use of non-standard Boolean operators (| instead of OR for
example) but overall I am very, very pleased.
Believe me, I am a serous researcher and this is what you want!"
FoxTrot Personal Search user comment on www.versiontracker.com

Download a demo version from www.foxtrot.ch
------------------------------------------------------------ ---------


Jérôme - FoxTrot Engineering
Re: FoxTrot not indexing files containing utf16 tag [message #301 is a reply to message #300] Thu, 28 May 2015 12:45
elsacha
Messages: 4
Registered: May 2015
Junior Member
Thank you very much for your reply.
Is it technically feasible to run a multi-file search and replace
on thousands of files in many different subfolders (approximately 10 GB)
with TextWrangler or BBEdit? The files are actually in UTF-16, so we will
also have to convert them. I will check the link you provided.
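
For a tree of this size, a script is one alternative to an editor's multi-file replace. The sketch below is not something proposed in this thread: it assumes the files carry a UTF-16 byte order mark and match "*.html", and the name convert_tree is hypothetical. It rewrites files in place, so it should only be run on a backup copy.

```python
import codecs
import re
from pathlib import Path

# Byte order marks used to recognise UTF-16 files (assumption: a BOM is present).
BOMS = (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)
# Assumption: the declaration to relabel looks like "charset=UTF-16".
CHARSET_RE = re.compile(r'(charset\s*=\s*["\']?)utf-?16', re.IGNORECASE)


def convert_tree(root, pattern="*.html"):
    """Rewrite BOM-marked UTF-16 files under root as UTF-8; return the paths changed."""
    changed = []
    for path in sorted(Path(root).rglob(pattern)):  # rglob descends into subfolders
        if not path.is_file():
            continue
        raw = path.read_bytes()
        if not raw.startswith(BOMS):
            continue  # no UTF-16 BOM: leave the file untouched
        text = CHARSET_RE.sub(r"\g<1>UTF-8", raw.decode("utf-16"))
        path.write_bytes(text.encode("utf-8"))
        changed.append(path)
    return changed
```

Each file is read whole into memory, which is reasonable when the 10 GB is spread over thousands of files; files without a BOM are skipped rather than guessed at.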

Thank you for your help.

Re: FoxTrot not indexing files containing utf16 tag [message #302 is a reply to message #301] Fri, 29 May 2015 04:36
elsacha
Messages: 4
Registered: May 2015
Junior Member
I managed to convert a large number of our files and replace the UTF-16 tag,
following the suggestion by FoxTrot Engineering.
Thank you very much for your help.

Alexandre
