| 
		
			| How to find strings inside sequences? [message #1172] | Sun, 04 April 2021 18:37  |  
			| 
				
				
					| Steven Miller Messages: 32
 Registered: August 2012
 | Member |  |  |  
	| I am going crazy trying to figure out how to do this. I have a dataset with sequences such as this: 
 144361345600:100040598733:Doe:John:male:::::32/14/3999
 
 How do I search for "Doe:John" inside such a sequence?
 
 I tried reading the FoxTrot Query help but I must say I find it to be somewhat impenetrable. I don't see a way to find strings inside sequences.
 
 Help would be appreciated.
 |  
	|  |  | 
	| 
		
			| Re: How to find strings inside sequences? [message #1173 is a reply to message #1172] | Sun, 04 April 2021 19:20   |  
			| 
				
				
					| FoxTrot Engineering Messages: 427
 Registered: April 2020
 | Senior Member |  |  |  
	| If searching for a string preceded and followed by colons is enough for you: [Contents…] [include the exact string] [case] [:doe:john:]
 
 For more precise processing of your sequences, something like:
 [Contents…] [include consecutive words] [Doe John]
 [then apply advanced filter] [contents] [contains the regular expression]  [\d{12}:\d{12}:Doe:John:(male|female):::::\d\d/\d\d/\d\d\d\d ]
 
 See our FAQ for more info; it includes some links to learn about regular expression syntax.
 
 Jérôme - FoxTrot Engineering
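 For anyone who wants to sanity-check the regular expression outside FoxTrot first, here is a minimal Python sketch. It assumes Python's `re` module interprets this pattern the same way FoxTrot does, which this thread does not confirm. It also drops the trailing space that appears inside the brackets in the post, on the assumption that it is a formatting artifact. `find_records` is a hypothetical helper name.

```python
import re

# Pattern copied from the reply above, minus the trailing space inside the
# brackets (assumed to be a forum formatting artifact, not part of the regex).
RECORD = re.compile(
    r"\d{12}:\d{12}:Doe:John:(male|female):::::\d\d/\d\d/\d\d\d\d"
)


def find_records(text):
    """Return every substring of `text` that matches the record pattern."""
    return [m.group(0) for m in RECORD.finditer(text)]


# The sample sequence from the first post in this thread.
sample = "144361345600:100040598733:Doe:John:male:::::32/14/3999"
print(find_records(sample))
```

 Note that the pattern is case sensitive as written, so `doe:john` would not match unless `re.IGNORECASE` is passed to `re.compile`.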
 |  
	|  |  | 
	|  | 
	|  | 
	| 
		
			| Re: How to find strings inside sequences? [message #1176 is a reply to message #1174] | Mon, 05 April 2021 10:14   |  
			| 
				
				
					| FoxTrot Engineering Messages: 427
 Registered: April 2020
 | Senior Member |  |  |  
	| These examples were for using the user interface popup menus… If you prefer using the FoxTrot Query syntax, to search for an exact string including punctuation: [contents…] [matches the FoxTrot query] [^:doe:john:^]
 
 Or to search for a regular expression:
 [Contents…] [matches the FoxTrot query] [doe john  `\d{12}:\d{12}:Doe:John:(male|female):::::\d\d/\d\d/\d\d\d\d `]
 
 Jérôme - FoxTrot Engineering
 |  
	|  |  | 
	|  | 
	|  | 
	|  | 
	|  | 
	| 
		
			| Re: How to find strings inside sequences? [message #1181 is a reply to message #1180] | Mon, 05 April 2021 14:37   |  
			| 
				
				
					| Steven Miller Messages: 32
 Registered: August 2012
 | Member |  |  |  
	| OK, sorry, but now I understand what you mean about the "user interface", which I had never used before because I never needed it. 
 Anyway, I made a test set of a folder with 8 large text files (400-500MB each) and then applied the attached searches using just a surname I know is in all of the files from doing a TextEdit search. With both searches, I only get a hit from one of the eight files which is also the same result if I do a simple search for just the name.
 
 Is it possible something went wrong with the indexing of these files or am I still doing something wrong?
 
 
 
 
 
 
 
	
	 Attachment: FT1.png (Size: 25.97KB, Downloaded 291 times)
	 Attachment: FT 2.png (Size: 18.96KB, Downloaded 317 times)
 [Updated on: Mon, 05 April 2021 15:00] |  
	|  |  | 
	| 
		
			| Re: How to find strings inside sequences? [message #1182 is a reply to message #1181] | Mon, 05 April 2021 17:18   |  
			| 
				
				
					| FoxTrot Engineering Messages: 427
 Registered: April 2020
 | Senior Member |  |  |  
	| 
  Should your query be case insensitive? If so, make sure to check [case] in the [ignoring] popup menu (or use the FoxTrot query: [^{c}:Doe:^])
 Do your files have a .txt filename extension? If they have another extension, they may be parsed by a Spotlight metadata importer on indexing. Option-click on a file in the result list (or use the [display type] popup menu in the toolbar to view the file as plain text) to show what has been indexed, which could differ from what is displayed in the preview.
 
 Jérôme - FoxTrot Engineering
 |  
	|  |  | 
	| 
		
			| Re: How to find strings inside sequences? [message #1183 is a reply to message #1182] | Mon, 05 April 2021 18:30   |  
			| 
				
				
					| Steven Miller Messages: 32
 Registered: August 2012
 | Member |  |  |  
	| "Should your query be case insensitive? If so, make sure to check [case] in the [ignoring] popup menu (or use the FoxTrot query: [^{c}:Doe:^])" 
 Already done.
 
 "Do your files have a .txt filename extension? "
 
 All are .txt
 
 "Option-click on a file in the result list (or use the [display type] popup menu in the toolbar "
 
 Not sure about this. When I option-click on one of these text files in the result list, nothing happens. The "Display as plain text" command is greyed out.
 |  
	|  |  | 
	| 
		
			| Re: How to find strings inside sequences? [message #1184 is a reply to message #1183] | Mon, 05 April 2021 22:59   |  
			| 
				
				
					| FoxTrot Engineering Messages: 427
 Registered: April 2020
 | Senior Member |  |  |  
	| Do you have the preview pane hidden, and view found files in the separate window? In this case, either option-double-click a file in the result list, or use the [display type] popup menu in the separate window… 
 Could you copy and paste an actual example of the string you can't find with FT, but can find in TextEdit, and a screenshot of your query in FT? Are you sure there are no special characters in the string, such as ASCII 0?
 
 Jérôme - FoxTrot Engineering
 |  
	|  |  | 
	|  | 
	| 
		
			| Re: How to find strings inside sequences? [message #1186 is a reply to message #1185] | Tue, 06 April 2021 16:24   |  
			| 
				
				
					| FoxTrot Engineering Messages: 427
 Registered: April 2020
 | Senior Member |  |  |  
	| This query should work, assuming that:
 - your file is actually inside the [USA] folder
 - the string in the file does not contain a zero-length character such as ASCII NULL; did you try to copy and paste the string (including the colons) directly from the file in TextEdit into the FoxTrot search field? Also, if the characters that you replaced by xxx contain accents, you should also check [ignoring] [accents] in the query, as there are multiple ways to encode a single accented character in Unicode.
 - you do not have a third-party Spotlight importer that parses .txt files, which seems pretty unlikely; you will be sure once you successfully view the file as it has been indexed: select it in the result list and option-click the "view in FoxTrot" toolbar button…
 
 
 Jérôme - FoxTrot Engineering
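 The two character-level pitfalls mentioned above (NUL characters and differently encoded accents) can be checked independently of FoxTrot. The following is only a sketch in Python: `count_nul` and `same_after_normalizing` are hypothetical helper names, and comparing in NFC form is one reasonable way to treat the two encodings as equal, not necessarily what FoxTrot's [ignoring] [accents] option does internally.

```python
import unicodedata


def count_nul(data: bytes) -> int:
    """Count ASCII NUL (0x00) bytes in raw file content."""
    return data.count(b"\x00")


def same_after_normalizing(a: str, b: str) -> bool:
    """Compare two strings after converting both to Unicode NFC form."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# Two ways to encode the same accented character in Unicode.
composed = "Jos\u00e9"     # "é" as a single code point
decomposed = "Jose\u0301"  # "e" followed by a combining acute accent

print(composed == decomposed)                        # False
print(same_after_normalizing(composed, decomposed))  # True
print(count_nul(b"Doe\x00:John"))                    # 1
```

 To check a real file, read it in binary mode (for example `open(path, "rb").read()`) and pass the bytes to `count_nul`.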
 |  
	|  |  | 
	| 
		
			| Re: How to find strings inside sequences? [message #1187 is a reply to message #1186] | Tue, 06 April 2021 17:45   |  
			| 
				
				
					| Steven Miller Messages: 32
 Registered: August 2012
 | Member |  |  |  
	| I tried all the suggestions with no luck, but I think I see the problem now. When I viewed one of the text files in FoxTrot by option-clicking the "view in FoxTrot" toolbar button, it seems that a huge chunk of the file was not indexed. In the FoxTrot version, the sequence numbers stop at 144390774, and there is nothing after that. The TextEdit file ends at number 16084490640. There isn't an entry for every number, but from visual inspection I would say about 75% of the file was not indexed. In other words, FoxTrot can't find the entry because it isn't in the index. 
 How do I fix this?
 
 |  
	|  |  | 
	| 
		
			| Re: How to find strings inside sequences? [message #1188 is a reply to message #1187] | Wed, 07 April 2021 09:19   |  
			| 
				
				
					| FoxTrot Engineering Messages: 427
 Registered: April 2020
 | Senior Member |  |  |  
	| Some Spotlight metadata importers (which we use to extract indexable text from documents) only return the first 10 MB of the file content. It is the case for the plain text (.txt) importer. We will add a hidden preference to FoxTrot 7.1 (that can be set from Terminal.app) so you can extend this limit for .txt and .log files.
 To extend the limit to 1 GB:
 defaults write com.ctmdev.FoxTrot PlainTextFileLimitMB -int 1024
 To completely remove the limit:
 defaults write com.ctmdev.FoxTrot PlainTextFileLimitMB -int 0
 Note that, in case you have large non-textual files with a .txt filename extension (or TEXT filesystem type attribute), this could considerably degrade the performance of the indexer.
 Jérôme - FoxTrot Engineering
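 To confirm that a missing hit is explained by the 10 MB limit described above, one can check where the string first occurs in the file. This is a rough Python sketch under stated assumptions: "10 MB" is read as 10 * 1024 * 1024 bytes, the importer is assumed to cut the file at exactly that byte offset, and `first_offset` and `outside_indexed_part` are hypothetical helper names. The file is read in chunks so that files of several hundred MB do not have to fit in memory.

```python
# Assumption: "10 MB" means 10 * 1024 * 1024 bytes, cut at a byte offset.
LIMIT_BYTES = 10 * 1024 * 1024


def first_offset(path, needle: bytes, chunk_size=1 << 20):
    """Return the byte offset of the first occurrence of `needle`, or -1."""
    overlap = len(needle) - 1  # bytes carried over so matches spanning chunks are found
    tail = b""
    pos = 0  # file offset of the first byte of `tail + chunk`
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return -1
            buf = tail + chunk
            idx = buf.find(needle)
            if idx != -1:
                return pos + idx
            tail = buf[-overlap:] if overlap > 0 else b""
            pos += len(buf) - len(tail)


def outside_indexed_part(path, needle: bytes, limit=LIMIT_BYTES):
    """True if the first match exists but ends after the first `limit` bytes."""
    off = first_offset(path, needle)
    return off != -1 and off + len(needle) > limit


# Example call (the path is a placeholder):
# outside_indexed_part("/path/to/file.txt", b":Doe:John:")
```

 If this returns True for a file, the string is present but would not have been seen by an indexer that only received the first 10 MB.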
 |  
	|  |  | 
	| 
		
			| Re: How to find strings inside sequences? [message #1189 is a reply to message #1188] | Wed, 07 April 2021 10:29   |  
			| 
				
				
					| Steven Miller Messages: 32
 Registered: August 2012
 | Member |  |  |  
	| "Some Spotlight metadata importers (which we use to extract indexable text from documents) only return the first 10 MB of the file content." 
 Ok, I did mention above that the files are very, very large.
 
 "We will add a hidden preference to FoxTrot 7.1 "
 
 I am on 7.04, so I assume it will not work with this version? When does 7.1 arrive?
 |  
	|  |  | 
	|  | 
	| 
		
			| Re: How to find strings inside sequences? [message #1191 is a reply to message #1190] | Wed, 07 April 2021 17:08  |  
			| 
				
				
					| Steven Miller Messages: 32
 Registered: August 2012
 | Member |  |  |  
	| "split these files" 
 Clearly not viable with files of this size as that would result in hundreds and hundreds of different files.
 
 There really needs to be a clear notice about this file size limitation. 10 MB is not very large, and now I need to go back over my entire data set to see what I might have missed over the years because of this. Not happy.
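 As a starting point for that kind of audit, a short script can list the files that are large enough to have been affected. This is only a sketch: it assumes the limit is 10 * 1024 * 1024 bytes and that only files ending in .txt or .log matter (the two extensions named earlier in this thread), and `oversized_text_files` is a hypothetical helper name.

```python
from pathlib import Path

# Assumptions: 10 MB is taken as 10 * 1024 * 1024 bytes, and only the
# .txt and .log extensions named in this thread are checked.
LIMIT_BYTES = 10 * 1024 * 1024
EXTENSIONS = {".txt", ".log"}


def oversized_text_files(root, limit=LIMIT_BYTES):
    """Yield (path, size_in_bytes) for .txt/.log files under `root` larger than `limit`."""
    for p in sorted(Path(root).rglob("*")):
        if p.is_file() and p.suffix.lower() in EXTENSIONS:
            size = p.stat().st_size
            if size > limit:
                yield p, size


# Example call (the folder path is a placeholder):
# for path, size in oversized_text_files("/path/to/indexed/folder"):
#     print(f"{size / (1024 * 1024):8.1f} MB  {path}")
```

 Any file this reports may have been only partly indexed, so searches run against it before the limit was raised are worth repeating.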
 |  
	|  |  |