FoxTrot Search Forum
FoxTrot Search for macOS Forum

How to find strings inside sequences? [message #1172] Sun, 04 April 2021 18:37
Steven Miller
Messages: 32
Registered: August 2012
Member
I am going crazy trying to figure out how to do this. I have a dataset with sequences such as this:

144361345600:100040598733:Doe:John:male:::::32/14/3999

How do I search for "Doe:John" inside such a sequence?

I tried reading the FoxTrot Query help but I must say I find it to be somewhat impenetrable. I don't see a way to find strings inside sequences.

Help would be appreciated.
Re: How to find strings inside sequences? [message #1173 is a reply to message #1172] Sun, 04 April 2021 19:20
FoxTrot Engineering
Messages: 383
Registered: April 2020
Senior Member
If searching for a string preceded and followed by colons is enough for you:
[Contents…] [include the exact string] [case] [:doe:john:]

For more precise processing of your sequences, something like:
[Contents…] [include consecutive words] [Doe John]
[then apply advanced filter] [contents] [contains the regular expression] [\d{12}:\d{12}:Doe:John:(male|female):::::\d\d/\d\d/\d\d\d\d ]

See our FAQ for more info; it includes some links to learn about regular expression syntax.


Jérôme - FoxTrot Engineering
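
As a rough illustration of what the regular expression in the reply above accepts, here is a minimal Python sketch. It assumes Python's `re` module treats these simple constructs (`\d`, `{12}`, `(male|female)`) the same way FoxTrot's regular-expression engine does, and it drops the trailing space before the closing bracket on the assumption that it is forum formatting rather than part of the expression. `find_record` is a hypothetical helper name.

```python
import re

# Pattern copied from the reply above, without the trailing space
# (assumed to be forum formatting, not part of the expression).
PATTERN = re.compile(
    r"\d{12}:\d{12}:Doe:John:(male|female):::::\d\d/\d\d/\d\d\d\d"
)

# Sample line from the first post in this thread.
sample = "144361345600:100040598733:Doe:John:male:::::32/14/3999"


def find_record(line: str):
    """Return the matched record text, or None if the line does not match."""
    match = PATTERN.search(line)
    return match.group(0) if match else None


if __name__ == "__main__":
    print(find_record(sample))
```

The fixed-width `\d{12}` fields and the `\d\d/\d\d/\d\d\d\d` date only match records shaped exactly like that first sample. Records with other field widths or date layouts would need looser quantifiers such as `\d+`.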
Re: How to find strings inside sequences? [message #1174 is a reply to message #1173] Sun, 04 April 2021 19:49
Steven Miller
Messages: 32
Registered: August 2012
Member
Thanks, but I tried a FoxTrot query and pasted the string into the search box:

[Contents…] [include the exact string] [case] [:doe:john:]

I changed the name to one from the data set, and nothing was found.

What am I missing?
Re: How to find strings inside sequences? [message #1175 is a reply to message #1173] Sun, 04 April 2021 20:40
Steven Miller
Messages: 32
Registered: August 2012
Member
The odd thing is that when I search on just a last name for example it finds some instances but not all of them even though they are all in the form ":Name:" inside strings.
Re: How to find strings inside sequences? [message #1176 is a reply to message #1174] Mon, 05 April 2021 10:14
FoxTrot Engineering
Messages: 383
Registered: April 2020
Senior Member
These examples were for using the user interface popup menus… If you prefer using the FoxTrot Query syntax, to search for an exact string including punctuation:
[Contents…] [matches the FoxTrot query] [^:doe:john:^]

Or to search for a regular expression:
[Contents…] [matches the FoxTrot query] [doe john `\d{12}:\d{12}:Doe:John:(male|female):::::\d\d/\d\d/\d\d\d\d `]


Jérôme - FoxTrot Engineering
Re: How to find strings inside sequences? [message #1177 is a reply to message #1175] Mon, 05 April 2021 10:19
FoxTrot Engineering
Messages: 383
Registered: April 2020
Senior Member
If you are searching PDF files, make sure to read our FAQ PDF Importer Problems.

Jérôme - FoxTrot Engineering
Re: How to find strings inside sequences? [message #1178 is a reply to message #1176] Mon, 05 April 2021 11:27
Steven Miller
Messages: 32
Registered: August 2012
Member
1. Sorry, but what are "user interface popup menus"? I have never encountered those.

2. I pasted the first query exactly into a FoxTrot query, using a name from the database, and still no results were found.
Re: How to find strings inside sequences? [message #1179 is a reply to message #1177] Mon, 05 April 2021 11:28
Steven Miller
Messages: 32
Registered: August 2012
Member
These are all text files.
Re: How to find strings inside sequences? [message #1180 is a reply to message #1178] Mon, 05 April 2021 11:38
FoxTrot Engineering
Messages: 383
Registered: April 2020
Senior Member
This applies to FoxTrot Pro, not FoxTrot Personal:
/index.php?t=getfile&id=15&private=0


Jérôme - FoxTrot Engineering
Re: How to find strings inside sequences? [message #1181 is a reply to message #1180] Mon, 05 April 2021 14:37
Steven Miller
Messages: 32
Registered: August 2012
Member
OK, sorry, but now I understand what you mean by the "user interface", which I had never used before because I never needed it.

Anyway, I made a test set of a folder with 8 large text files (400-500MB each) and then applied the attached searches using just a surname I know is in all of the files from doing a TextEdit search. With both searches, I only get a hit from one of the eight files which is also the same result if I do a simple search for just the name.

Is it possible something went wrong with the indexing of these files or am I still doing something wrong?





  • Attachment: FT1.png
    (Size: 25.97KB, Downloaded 138 times)
  • Attachment: FT 2.png
    (Size: 18.96KB, Downloaded 154 times)

[Updated on: Mon, 05 April 2021 15:00]

Re: How to find strings inside sequences? [message #1182 is a reply to message #1181] Mon, 05 April 2021 17:18
FoxTrot Engineering
Messages: 383
Registered: April 2020
Senior Member

  • Should your query be case insensitive? If so, make sure to check [case] in the [ignoring] popup menu (or use the FoxTrot query: [^{c}:Doe:^])
  • Do your files have a .txt filename extension? If they have another extension, they may be parsed by a Spotlight metadata importer on indexing. Option-click on a file in the result list (or use the [display type] popup menu in the toolbar to view the file as plain text) to show what has been indexed, which could differ from what is displayed in the preview.


Jérôme - FoxTrot Engineering
Re: How to find strings inside sequences? [message #1183 is a reply to message #1182] Mon, 05 April 2021 18:30
Steven Miller
Messages: 32
Registered: August 2012
Member
"Should your query be case insensitive? If so, make sure to check [case] in the [ignoring] popup menu (or use the FoxTrot query: [^{c}:Doe:^])"

Already done.

"Do your files have a .txt filename extension? "

All are .txt

"Option-click on a file in the result list (or use the [display type] popup menu in the toolbar "

Not sure about this. When I option-click on one of these text files in the results list, nothing happens. The "Display as plain text" command is greyed out.
Re: How to find strings inside sequences? [message #1184 is a reply to message #1183] Mon, 05 April 2021 22:59
FoxTrot Engineering
Messages: 383
Registered: April 2020
Senior Member
Do you have the preview pane hidden, and view found files in the separate window? In this case, either option-double-click a file in the result list, or use the [display type] popup menu in the separate window…

Could you copy and paste an actual example of the string you can't find with FT, but can find in TextEdit, and a screenshot of your query in FT? Are you sure there are no special characters in the string, like ASCII 0 etc.?


Jérôme - FoxTrot Engineering
Re: How to find strings inside sequences? [message #1185 is a reply to message #1184] Tue, 06 April 2021 14:02
Steven Miller
Messages: 32
Registered: August 2012
Member
" In this case, either option-double-click a file in the result list, "

option-double-click opens TextEdit and what I see there is the same as what I see in preview if that is what you mean.


"Could you copy and paste an actual example of the string you can't find with FT, but can find in TextEdit, and a screenshot of your query in FT? Are you sure there is no special characters in the string, like ASCII 0 etc?"

15039532923:100027362945234:Merxxx:Calderon:female:::::9/2/2018 12:00:00 AM::

This is the exact string but I did replace three letters in the last name with "xxx" for privacy. The search name and name in the file were identical however.
  • Attachment: FT Test.png
    (Size: 23.51KB, Downloaded 131 times)
Re: How to find strings inside sequences? [message #1186 is a reply to message #1185] Tue, 06 April 2021 16:24
FoxTrot Engineering
Messages: 383
Registered: April 2020
Senior Member
This query should work, assuming that:
- your file is actually inside the [USA] folder
- the string in the file does not contain a zero-length character such as ASCII NULL; did you try to copy-and-paste the string (including the colons) directly from the file in TextEdit to the FoxTrot search field? Also, if the characters that you replaced by xxx contain accents, you should also check [ignoring] [accents] in the query, as there are multiple ways to encode a single accented character in Unicode.
- you do not have a third-party Spotlight importer that parses .txt files, which seems pretty unlikely; you will be sure once you successfully view the file as it has been indexed: select it in the result list and option-click the "view in FoxTrot" toolbar button…


Jérôme - FoxTrot Engineering
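
The two pitfalls mentioned in the reply above (an embedded NUL character, and an accented letter stored in a different Unicode form) can be checked outside FoxTrot. The following is a minimal Python sketch under stated assumptions: the helper names and the sample name "José" are hypothetical, and the code only illustrates the Unicode behaviour. It does not describe how FoxTrot's [ignoring] [accents] option is implemented.

```python
import unicodedata


def has_nul(text: str) -> bool:
    """True if the text contains an ASCII NUL character."""
    return "\x00" in text


def same_after_normalizing(a: str, b: str) -> bool:
    """Compare two strings after converting both to Unicode NFC form."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# Hypothetical example: the same visible name stored in two different ways.
composed = "Jos\u00e9"     # "é" as a single code point
decomposed = "Jose\u0301"  # "e" followed by a combining acute accent

if __name__ == "__main__":
    print(composed == decomposed)                        # raw comparison
    print(same_after_normalizing(composed, decomposed))  # normalized comparison
    print(has_nul("Doe\x00:John"))
```

Both spellings look identical on screen, which is why a string copied from one program may not match the bytes stored in another unless accents are ignored or the text is normalized first.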
Re: How to find strings inside sequences? [message #1187 is a reply to message #1186] Tue, 06 April 2021 17:45
Steven Miller
Messages: 32
Registered: August 2012
Member
I tried all the suggestions with no luck, but I think I see the problem now. When I viewed one of the text files in FoxTrot by option-clicking the "view in FoxTrot" toolbar button and compared it with TextEdit, it seems that a huge chunk of the file was not indexed. In the FoxTrot version, the sequence numbers stop at 144390774; that is the last number, and everything after it is empty. The TextEdit file ends at number 16084490640. There isn't an entry for every number, but from a visual inspection I would say about 75% of the file was not indexed. In other words, FoxTrot can't find the entry because it isn't in the index.

How do I fix this?
Re: How to find strings inside sequences? [message #1188 is a reply to message #1187] Wed, 07 April 2021 09:19
FoxTrot Engineering
Messages: 383
Registered: April 2020
Senior Member
Some Spotlight metadata importers (which we use to extract indexable text from documents) only return the first 10 MB of the file content. This is the case for the plain text (.txt) importer.
We will add a hidden preference to FoxTrot 7.1 (that can be set from Terminal.app) so you can extend this limit for .txt and .log files.
To extend the limit to 1 GB:
defaults write com.ctmdev.FoxTrot PlainTextFileLimitMB -int 1024
To completely remove the limit:
defaults write com.ctmdev.FoxTrot PlainTextFileLimitMB -int 0
Note that, in case you have large non-textual files with a .txt filename extension (or TEXT filesystem type attribute), this could considerably degrade the performance of the indexer.


Jérôme - FoxTrot Engineering
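
To see which files in a folder are large enough to be affected by the 10 MB limit described above, something like the following Python sketch could be used. It is only an illustration: `files_over_limit` and `LIMIT_BYTES` are hypothetical names, the folder path is a placeholder, and the sketch assumes 1 MB means 1024 * 1024 bytes, which the reply does not specify.

```python
from pathlib import Path

# 10 MB limit from the reply above; 1 MB = 1024 * 1024 bytes is an assumption.
LIMIT_BYTES = 10 * 1024 * 1024


def files_over_limit(folder, limit=LIMIT_BYTES, suffixes=(".txt", ".log")):
    """Return sorted (path, size) pairs for text files larger than the limit."""
    hits = []
    for path in Path(folder).rglob("*"):
        # Only regular files with one of the given extensions are considered.
        if path.is_file() and path.suffix.lower() in suffixes:
            size = path.stat().st_size
            if size > limit:
                hits.append((path, size))
    return sorted(hits)


if __name__ == "__main__":
    # Placeholder path: replace with the folder that FoxTrot indexes.
    for path, size in files_over_limit("."):
        print(f"{size / (1024 * 1024):8.1f} MB  {path}")
```

Any file listed here may have been indexed only partially, so searches may have missed content past the first 10 MB.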
Re: How to find strings inside sequences? [message #1189 is a reply to message #1188] Wed, 07 April 2021 10:29
Steven Miller
Messages: 32
Registered: August 2012
Member
"Some Spotlight metadata importers (which we use to extract indexable text from documents) only return the first 10 MB of the file content."

Ok, I did mention above that the files are very, very large.

"We will add a hidden preference to FoxTrot 7.1 "

I am on 7.04, so I assume it will not work with this version? When does 7.1 arrive?
Re: How to find strings inside sequences? [message #1190 is a reply to message #1189] Wed, 07 April 2021 16:18
FoxTrot Engineering
Messages: 383
Registered: April 2020
Senior Member
There is no schedule yet for version 7.1.
The only workaround I see in the meantime is to split these files into smaller files, each under 10 MB.


Jérôme - FoxTrot Engineering
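
The splitting workaround suggested above could be scripted rather than done by hand. Here is a minimal Python sketch, assuming that records are separated by newlines and that a 9 MB target leaves enough margin under the 10 MB limit. `split_text_file` and the `.partNNNN` naming scheme are hypothetical choices, not anything FoxTrot requires.

```python
from pathlib import Path


def split_text_file(source, out_dir, max_bytes=9 * 1024 * 1024):
    """Split source into numbered parts of at most max_bytes each.

    Parts are cut only between lines, so no record is broken in two.
    A single line longer than max_bytes still ends up in its own,
    oversized part.
    """
    source = Path(source)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    parts = []
    chunk, size = [], 0

    def flush():
        # Write the buffered lines to the next numbered part file.
        nonlocal chunk, size
        if chunk:
            name = f"{source.stem}.part{len(parts) + 1:04d}{source.suffix}"
            part = out_dir / name
            part.write_bytes(b"".join(chunk))
            parts.append(part)
            chunk, size = [], 0

    with source.open("rb") as handle:
        for line in handle:
            # Start a new part before this line would exceed the limit.
            if chunk and size + len(line) > max_bytes:
                flush()
            chunk.append(line)
            size += len(line)
    flush()
    return parts
```

With a 9 MB target, a 450 MB file would produce roughly 50 parts, so a folder of eight such files would grow to several hundred files.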
Re: How to find strings inside sequences? [message #1191 is a reply to message #1190] Wed, 07 April 2021 17:08
Steven Miller
Messages: 32
Registered: August 2012
Member
"split these files"

Clearly not viable with files of this size as that would result in hundreds and hundreds of different files.

There really needs to be some kind of clear notice given about this file size limitation. 10 MB is not very large, and now I need to go back over my entire data set to see what I might've missed over the years because of this. Not happy.