How to match single wild character in Foxtrot Query? [message #1429]
Tue, 03 May 2022 00:51
Atlas
Messages: 140 Registered: August 2009
I understand we can use * to say that a string starts with or ends with something. But I frequently run into situations where a phrase has multiple spellings because the hyphen is optional. So it could be "multi-polar" or "multipolar" or "multi polar". Is there any way to search for all these possibilities without typing a bunch of OR's? Ideally, we could use something like [multi?polar], where the ? could mean any character or no character. I can use that syntax if I use regex, but regex won't allow me to combine OR logic in a FoxTrot query.
Re: How to match single wild character in Foxtrot Query? [message #1430 is a reply to message #1429]
Tue, 03 May 2022 09:13
FoxTrot Engineering
Messages: 406 Registered: April 2020
If you don't use [includes the exact string], the "-" character is treated in one of two ways. Either it is a word separator (like a space and most punctuation), or, when it is preceded by a space or other punctuation and immediately followed by an alphabetic, numeric or symbol character or by a quoted string ("), it means [does not contain word / quoted string].
Thus you don't need "a bunch of OR's", just a single one: [multipolar | "multi polar"] will also find multi-polar.
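To illustrate why a single OR is enough, here is a rough Python sketch. It assumes, as a simplification of the rules above, that a word is a run of characters other than whitespace and a few punctuation marks (including "-"), and that a quoted string means consecutive whole words. The names `words`, `contains_word`, `contains_phrase` and `matches_query` are hypothetical and are not FoxTrot APIs.

```python
import re

# Assumed approximation of FoxTrot's word splitting: "-" separates words,
# just like spaces and most punctuation.
WORD_RE = re.compile(r"[^\s\-.,;:!?()\"']+")

def words(text: str) -> list[str]:
    """Split text into lowercase whole words."""
    return [w.lower() for w in WORD_RE.findall(text)]

def contains_word(text: str, word: str) -> bool:
    """Like [multipolar]: the text contains this whole word."""
    return word.lower() in words(text)

def contains_phrase(text: str, phrase: str) -> bool:
    """Like ["multi polar"]: the text contains these whole words consecutively."""
    target, tokens = words(phrase), words(text)
    n = len(target)
    return any(tokens[i:i + n] == target for i in range(len(tokens) - n + 1))

def matches_query(text: str) -> bool:
    """Like [multipolar | "multi polar"]."""
    return contains_word(text, "multipolar") or contains_phrase(text, "multi polar")

for s in ["a multipolar world", "a multi-polar world",
          "a multi polar world", "a multiplayer world"]:
    print(s, matches_query(s))
```

Because "multi-polar" splits into the same two words as "multi polar", the quoted-string branch covers both spellings.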
We could enhance FoxTrot's syntax so that a ? character inside a word would mean "any single alphabetic or numeric or symbol character", or maybe "any single alphabetic or numeric or symbol character, or no character". However, such a wildcard would still only match within a single word, so a single pattern could not find either one word or two distinct consecutive words.
As for regular expressions, they support | for alternatives, e.g. [`(foo|multi[ -]?polar|bar)`] will find any of: [foo], [multipolar], [multi-polar], [multi polar], [bar].
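A quick way to check what that alternation accepts is to run the same pattern through Python's `re` module. This is only a stand-in: FoxTrot's own regex engine may differ in details, and `fullmatch` is used here just to test each candidate string on its own.

```python
import re

# The pattern from the reply above; [ -]? allows one optional space or hyphen.
PATTERN = re.compile(r"(foo|multi[ -]?polar|bar)")

candidates = ["foo", "multipolar", "multi-polar", "multi polar", "bar",
              "multi_polar", "multi--polar", "multi  polar"]
for c in candidates:
    print(f"{c!r:16} {bool(PATTERN.fullmatch(c))}")
```

The last three candidates are rejected: an underscore is not in `[ -]`, and `?` allows at most one separator character.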
Jérôme - FoxTrot Engineering
Re: How to match single wild character in Foxtrot Query? [message #1432 is a reply to message #1430]
Sun, 08 May 2022 22:36
Atlas
I think there is a pattern of miscommunication here that hopefully I can address. I wanted to respond earlier, but the forum was down.
1. The point of suggesting a syntax for a single-character wildcard is that we don't have to string together OR's. You're right that I can search for the three possibilities with [multipolar | "multi polar"] instead of [multipolar | "multi polar" | multi-polar], but that's just a difference in degree: users still have to write a long search string with OR in it to express something as simple as [multi?polar]. Maybe the best we have now is to use OR statements, but I'm wondering if we can make it better.
2. I think you're misunderstanding me when I said "no character". Clearly, if ? means "no character", then it wouldn't find "multi polar". I'm not trying to give the full engineering spec of how "?" should behave, and we can think through together how to spec its behavior. IF (I understand these syntax changes will take work) you are open to adding a "?" syntax, then perhaps it could mean "any alphanumeric, symbol, or space character, or no character". Look, I'm not an engineer, so I'm sure you can find ways to make it more exact; I'm just trying to illustrate some possible ways to tackle the issue. Another possibility is to take inspiration from regex syntax, where "?" could mean something like ".", but I think "." is not interpreted as "no character". If you're open to discussing alternative ways to implement "?", let me know. If you want me to give you an engineering spec, I'm open to that as well, but we will need to open up a workflow via email or elsewhere.
3. When using "Matches the FoxTrot query", we can only use OR logic WITHIN the regex (as you've noted), but users cannot use OR logic to COMBINE a regex with other FoxTrot expressions. This is stated in the FoxTrot query documentation, and I've confirmed it by testing: "Note that a regular expression may contain the | alternation operator, but you can’t combine a regular expression with another one, or with another FoxTrot query expression, with the boolean | operator. Thus, bob `(smith|doe)` is a valid FoxTrot query, but bob `smith.*` | `doe.*` or bob | `smith.*` are not valid."
Re: How to match single wild character in Foxtrot Query? [message #1436 is a reply to message #1432]
Tue, 10 May 2022 10:14
FoxTrot Engineering
FoxTrot searches whole words (groups of adjacent letter, number and symbol characters). You can use wildcards (leading or trailing *) to search any whole word matching a pattern, instead of a single whole word. You can't use wildcards to search characters that are not part of a word (punctuation, spaces etc), nor to find a sequence of whole words (however, you can use wildcards inside a quoted string, or when using [includes consecutive words]).
Thus, we could enhance the wildcard syntax to some extent, but that would not make what you request here possible.
For example, if we implemented * inside a word (in addition to leading and trailing), then [multi*polar] would find [multipolar] or [multiantipolar], but it would not find [multi-polar] or [multi polar], because those are not full words but sequences of full words. The same applies if we implemented a single-character (or single-or-no-character) wildcard like your suggested ?.
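The point can be sketched in Python by matching wildcard patterns against the whole words of an index. This is a hypothetical model, not FoxTrot code: `INDEX_WORDS` stands for the words produced by splitting documents at spaces and punctuation, and `fnmatch` syntax (`*` = any run of characters, `?` = exactly one character) stands in for the proposed in-word wildcards.

```python
from fnmatch import fnmatchcase

# Hypothetical index contents: "multi-polar" and "multi polar" were both
# indexed as the two separate words "multi" and "polar".
INDEX_WORDS = {"multipolar", "multiantipolar", "multi", "polar"}

def match_whole_words(pattern: str) -> set[str]:
    """Return the indexed whole words matching a wildcard pattern.

    Each candidate is a single word, so no pattern can span the separator
    between "multi" and "polar".
    """
    return {w for w in INDEX_WORDS if fnmatchcase(w, pattern)}

print(sorted(match_whole_words("multi*polar")))
print(sorted(match_whole_words("multi?polar")))
```

[multi*polar] finds only single words such as multipolar and multiantipolar, and an exactly-one-character ? finds nothing here; neither reaches the pair "multi" + "polar".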
Jérôme - FoxTrot Engineering
[Updated on: Tue, 10 May 2022 10:15]
Re: How to match single wild character in Foxtrot Query? [message #1440 is a reply to message #1436]
Tue, 10 May 2022 22:56
Atlas
1. You seem to be raising two separate points: (1) FoxTrot searches only whole words, and (2) modifying the "*" won't work. Is that correct?
2. The second point is confusing to me, because no one is suggesting that we implement the "?" syntax by modifying the existing "*" syntax. We all know that "*" currently works only as a leading or trailing wildcard on whole words. Maybe you are raising the idea yourself and then pointing out that it won't work? I'm not sure where this line of thought is coming from. Bottom line: YES, we probably cannot implement "?" simply by shoving "*" in the middle.
3. If FoxTrot searches only whole words, then would it be correct to say that FoxTrot has no way to implement a single-character wildcard? But earlier you said that "We could enhance FoxTrot's syntax so a ? character inside a word would mean "any single alphabetic or numeric or symbol character", or maybe "any single alphabetic or numeric or symbol character, or no character", but that would not allow finding either a single word, or two distinct consecutive words."?? This gives the impression that we CAN implement a new single-character syntax. Maybe the previous remark was a mistake? I'm confused that you say "FoxTrot only searches for whole words", but at the same time say that maybe we can implement a single-character search.
Help me understand what the issue is here. Is it the case that FoxTrot CANNOT implement a single-character wildcard because of a limitation in how FoxTrot is designed to search only whole words? Or is it the case that FoxTrot CAN implement a single-character wildcard, but it's hard to do and we don't have a good way of doing it yet? It would help me to see a clearer articulation of your engineering assessment.
[Updated on: Tue, 10 May 2022 22:58]
Re: How to match single wild character in Foxtrot Query? [message #1441 is a reply to message #1440]
Wed, 11 May 2022 11:34
FoxTrot Engineering
That is a single point: wildcards allow searching any whole word matching an expression; they can't be used to search a sequence of words, or any character that is not part of a word (i.e. spaces, newlines, punctuation etc). This applies to the current implementation (leading and trailing *), and it would also apply if we implemented other wildcards, like * inside a word to find whole words with a given prefix and suffix, or ? for any single alphabetic, numeric or symbol character.
Likewise, for your needs you can't use:
[includes the exact string] [ignore punctuation, ignore blanks] [multipolar]
nor:
[includes the exact string] [ignore punctuation, ignore blanks] [multi polar]
nor:
[includes the exact string] [ignore punctuation, ignore blanks] [multi-polar]
because [includes the exact string] first finds the documents containing all the whole words of the query (i.e. [multipolar] for the first one, and [multi] + [polar] for the other queries), then filters out of the results the documents that do not contain the query as an exact string (optionally after stripping accents and punctuation, converting case, etc.).
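The two phases can be modeled in a few lines of Python to show why each spelling excludes the others. The tokenization (`\w+`), the normalization and the function names are simplifying assumptions for illustration, not FoxTrot's actual implementation.

```python
import re

def words(text: str) -> set[str]:
    """Assumed tokenizer: whole words are runs of letters/digits."""
    return {w.lower() for w in re.findall(r"\w+", text)}

def normalize(text: str) -> str:
    """Approximates [ignore punctuation, ignore blanks]: keep letters/digits only."""
    return re.sub(r"[\W_]+", "", text.lower())

def exact_string_search(docs: list[str], query: str) -> list[str]:
    """Hypothetical model of [includes the exact string] [ignore punctuation, ignore blanks]."""
    needed = words(query)
    # Phase 1: index lookup keeps only documents containing every whole word of the query.
    candidates = [d for d in docs if needed <= words(d)]
    # Phase 2: keep candidates whose normalized text contains the normalized query.
    target = normalize(query)
    return [d for d in candidates if target in normalize(d)]

docs = ["a multipolar world", "a multi-polar world", "a multi polar world"]
print(exact_string_search(docs, "multipolar"))
print(exact_string_search(docs, "multi polar"))
```

With these documents, the [multipolar] query returns only "a multipolar world", and the [multi polar] query returns only the other two: phase 1 has already discarded the other spelling before the normalized comparison runs.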
You can however use:
[all items of type] [...]
[then apply advanced filter] [contents or any metadata] [contains the string] [ignore punctuation, ignore blanks] [multipolar]
This is much slower, however, because it doesn't look up any whole words in the index; instead it searches linearly for the given string in the raw text storage (the plain text content that was stored during indexing).
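A minimal Python sketch of such a linear scan follows, assuming the stored plain text is available as a mapping from a document id to its text. `linear_filter`, `normalize` and the sample data are hypothetical; the normalization (case folding plus dropping everything but letters and digits) is an approximation of the advanced filter options.

```python
import re

def normalize(text: str) -> str:
    """Approximates [ignore punctuation, ignore blanks] plus case folding."""
    return re.sub(r"[\W_]+", "", text.casefold())

def linear_filter(stored_texts: dict[str, str], needle: str) -> list[str]:
    """Scan every stored text: no index lookup, so cost grows with total text size."""
    target = normalize(needle)
    return [doc_id for doc_id, text in stored_texts.items() if target in normalize(text)]

store = {
    "a.txt": "a multipolar world",
    "b.txt": "a multi-polar world",
    "c.txt": "a multi\npolar world",
    "d.txt": "a bipolar world",
}
print(linear_filter(store, "multipolar"))
```

All three spellings match here, at the price of normalizing every stored text. Note that this sketch would also match across sentence boundaries (e.g. "multi. Polar").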
When possible, it is much faster to use:
[content, any metadata or filename] [includes all of the words] [someWholeWord]
[then apply advanced filter] [contents or any metadata] [contains the string] [ignore punctuation, ignore blanks] [multipolar]
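The speed-up comes from running the expensive string check only on documents that an index lookup has already selected. The Python sketch below models this with a toy inverted index. `build_index`, `prefiltered_search` and the sample data are hypothetical, and the whole word "world" plays the role of someWholeWord.

```python
import re
from collections import defaultdict

def normalize(text: str) -> str:
    """Approximates [ignore punctuation, ignore blanks] plus case folding."""
    return re.sub(r"[\W_]+", "", text.casefold())

def build_index(stored_texts: dict[str, str]) -> dict[str, set[str]]:
    """Toy inverted index: whole word -> ids of the documents containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in stored_texts.items():
        for word in re.findall(r"\w+", text.casefold()):
            index[word].add(doc_id)
    return index

def prefiltered_search(stored_texts: dict[str, str], index: dict[str, set[str]],
                       whole_word: str, needle: str) -> list[str]:
    # Stage 1: cheap index lookup for one whole word.
    candidates = index.get(whole_word.casefold(), set())
    # Stage 2: slower linear string check, but only on those candidates.
    target = normalize(needle)
    return sorted(d for d in candidates if target in normalize(stored_texts[d]))

STORE = {
    "a": "a multipolar world order",
    "b": "the multi-polar world",
    "c": "a multi polar magnet",
    "d": "bipolar world",
}
INDEX = build_index(STORE)
print(prefiltered_search(STORE, INDEX, "world", "multipolar"))
```

Document "c" is never scanned here because it lacks the word "world", which is also the trade-off: the prefilter word must be one that every wanted document contains.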
But I think the most efficient is still to search for [multipolar | "multi polar"].
Jérôme - FoxTrot Engineering