FoxTrot not indexing files containing utf16 tag [message #299] | Wed, 27 May 2015 20:32
elsacha | Junior Member | Messages: 4 | Registered: May 2015

Hello,
 We ran into a problem building an index with HTML files containing a UTF-16 tag.
 The files contain the following code in the header:
 
 
 
 some_file.xml
 
 
 If both of these lines are in the file, FoxTrot indexes only the file name
 and does not index its contents at all.
 
 If we delete the UTF-16 line, FoxTrot starts indexing the contents
 and it becomes possible to search inside the files.
 
 We cannot modify these files by deleting this line, and doing so would not
 be feasible anyway (we have thousands of such files).
 
 I use OS X Yosemite 10.10.3, but colleagues have run into a similar issue
 with previous OS versions too.
 
 I would be grateful for any insight on this problem.

Re: FoxTrot not indexing files containing utf16 tag [message #300 is a reply to message #299] | Thu, 28 May 2015 10:58
FoxTrot Engineering | Senior Member | Messages: 427 | Registered: April 2020

elsacha wrote:
 > We ran into a problem building an index with html files containing a utf16
 > tag.
 
 > If both lines  and , are in the file, then FoxTrot indexes only the name of the
 > file and does not index their contents at all.
 
 FoxTrot relies on the Spotlight importers to extract indexable text from files; unfortunately, the RichText importer has some bugs or limitations with UTF-16 HTML files and/or HTML files whose charset tag does not match their actual encoding.
 
 If removing the charset=UTF-16 tag fixes the problem, maybe you can do a multi-file search and replace using, for example, TextWrangler?
 If the files really are UTF-16, you may have to convert them to UTF-8; TextBatchConv can do this.
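For anyone who prefers a script to a GUI batch tool, here is a minimal Python 3 sketch of those two steps for a single file: decode it as UTF-16, rewrite the charset declaration, and save it as UTF-8. It assumes the file starts with a byte order mark (BOM) and that the declaration takes one of the common forms (charset=UTF-16 in a meta tag, or encoding="UTF-16" in an XML declaration). The names convert_text and convert_file are illustrative only, not part of FoxTrot or TextBatchConv.

```python
import re
from pathlib import Path

# Assumption: the declaration looks like charset=UTF-16 (meta tag) or
# encoding="UTF-16" (XML declaration); UTF-16LE/BE variants are matched too.
CHARSET_RE = re.compile(
    r"""(charset\s*=\s*["']?|encoding\s*=\s*["'])utf-?16(?:le|be)?""",
    re.IGNORECASE,
)


def convert_text(text: str) -> str:
    """Replace any UTF-16 charset/encoding declaration with UTF-8."""
    return CHARSET_RE.sub(lambda m: m.group(1) + "UTF-8", text)


def convert_file(path: Path) -> None:
    """Re-encode one UTF-16 file as UTF-8, fixing its declaration in place."""
    # The "utf-16" codec reads the byte order from the BOM and drops the BOM.
    text = path.read_bytes().decode("utf-16")
    path.write_bytes(convert_text(text).encode("utf-8"))
```

The declaration is rewritten to UTF-8 rather than deleted, so the tag stays consistent with the new encoding. Try it on copies of a few files before running it on the real ones.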
 
 
 Jérôme - CTM Engineering
 
 
 ------------------------------------------------------------ ---------
 "FoxTrot is the one app with which I would have to go back to the PC
 because Spotlight is so profoundly useless for serious research (don't
 get me started). FoxTrot steps in and does about everything I need to
 and does it quickly and with grace. Everytime I have emailed the devs,
 I get a timely and responsive answer. I have a few quibbles of course
 such as the use of non-standard Boolean operators (| instead of OR for
 example) but overall I am very, very pleased.
 Believe me, I am a serous researcher and this is what you want!"
 FoxTrot Personal Search user comment on www.versiontracker.com
 
 Download a demo version from www.foxtrot.ch
 ------------------------------------------------------------ ---------
 
 Jérôme - FoxTrot Engineering

Re: FoxTrot not indexing files containing utf16 tag [message #301 is a reply to message #300] | Thu, 28 May 2015 12:45
elsacha | Junior Member | Messages: 4 | Registered: May 2015

Thank you very much for your reply. Is it technically feasible to run a multi-file search and replace
 on thousands of files in many different subfolders (approximately 10 GB)
 with TextWrangler or BBEdit? The files are actually in UTF-16, so we will
 also have to convert them; I will check the link you provided.
 
 Thank you for your help.
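A rough sketch of how such a batch run could look as a Python 3 script, assuming the affected files are .html/.htm files that begin with a UTF-16 byte order mark. It walks every subfolder, skips files that are already UTF-8 (so it can safely be rerun), and by default only reports what it would change. The regular expression and the function names are assumptions for illustration, not features of TextWrangler, BBEdit or TextBatchConv.

```python
import codecs
import re
from pathlib import Path

# Assumption: affected files start with a UTF-16 BOM (little- or big-endian).
BOMS = (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)
# Assumption: declarations look like charset=UTF-16 or encoding="UTF-16".
CHARSET_RE = re.compile(
    r"""(charset\s*=\s*["']?|encoding\s*=\s*["'])utf-?16(?:le|be)?""",
    re.IGNORECASE,
)


def is_utf16(path: Path) -> bool:
    """True if the file begins with a UTF-16 byte order mark."""
    with path.open("rb") as f:
        return f.read(2) in BOMS


def convert_tree(root: Path, suffixes: tuple[str, ...] = (".html", ".htm"),
                 dry_run: bool = True) -> list[Path]:
    """Convert every UTF-16 HTML file under root to UTF-8; return the files found."""
    touched = []
    # sorted() builds the full list first, so temporary files created below
    # are never picked up by the walk.
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix.lower() not in suffixes:
            continue
        if not is_utf16(path):
            continue  # already converted (or never UTF-16): leave it alone
        touched.append(path)
        if dry_run:
            continue
        text = path.read_bytes().decode("utf-16")  # BOM selects LE/BE and is dropped
        text = CHARSET_RE.sub(lambda m: m.group(1) + "UTF-8", text)
        tmp = path.with_name(path.name + ".tmp")
        tmp.write_bytes(text.encode("utf-8"))
        tmp.replace(path)  # swap in the converted file only once it is fully written
    return touched
```

Back up the folder first, call convert_tree(Path("/path/to/files")) (a placeholder path) to list the candidates, then repeat with dry_run=False. Writing to a temporary file and then renaming it means an interrupted run does not leave a half-written file behind.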
 

Re: FoxTrot not indexing files containing utf16 tag [message #302 is a reply to message #301] | Fri, 29 May 2015 04:36
elsacha | Junior Member | Messages: 4 | Registered: May 2015

I managed to convert a large number of our files and replace the UTF-16 tag, following FoxTrot Engineering's suggestion.
 Thank you very much for your help.
 
 Alexandre
 