Unexpected behavior when searching for single character with Foxtrot Query [message #1749]
Sat, 23 December 2023 11:56
Atlas
Messages: 134 Registered: August 2009
I'm aware that searching for "exact strings" requires the search string to contain at least one whole word, but I'm not using an exact string search in this case.
What I did:
- Searched for the single character "‣" using "Contents Only" with "Matches the Foxtrot query".
- Searched for the string "adam smith" by adding another criterion that searches "Contents, any metadata or filename", and selected "Matches the Foxtrot query".
Unexpected Result:
- This returns many documents, none of which contains the character "‣" in its contents. It seems to search only for "adam smith". An example file is attached.
If I repeat the same process but search for another single character, such as the letter "f", it works as expected and I find documents containing a single letter "f" in their contents. Why doesn't this work for the character "‣"? If this is expected behavior, please clarify it for users; if it is a bug, please fix it.
Re: Unexpected behavior when searching for single character with Foxtrot Query [message #1752 is a reply to message #1749]
Tue, 26 December 2023 12:29
FoxTrot Engineering
Messages: 397 Registered: April 2020
FoxTrot finds whole words (or partial words when using a leading or trailing * wildcard). This applies to [includes all of the words], [includes at least one of the words], [includes consecutive words], [includes neighboring words], and [matches the FoxTrot query].
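As a rough illustration of whole-word versus wildcard matching, here is a minimal Python sketch. It is not FoxTrot's implementation: the functions `term_matches` and `find` are hypothetical, the indexed words are assumed to be already extracted, and case folding or other normalization is ignored.

```python
def term_matches(term: str, word: str) -> bool:
    """Hypothetical model of matching one query term against one indexed word.

    Without wildcards the term must equal the whole word; a leading '*'
    accepts any prefix and a trailing '*' accepts any suffix.
    """
    leading = term.startswith("*")
    trailing = term.endswith("*") and len(term) > 1
    core = term.strip("*")
    if leading and trailing:
        return core in word           # *core*: core anywhere in the word
    if leading:
        return word.endswith(core)    # *core: word ends with core
    if trailing:
        return word.startswith(core)  # core*: word starts with core
    return word == core               # plain term: whole-word match only


def find(term: str, words: list[str]) -> list[str]:
    return [w for w in words if term_matches(term, w)]


words = ["eco", "friendly", "friendship", "unfriendly"]
print(find("friend", words))     # []
print(find("friend*", words))    # ['friendly', 'friendship']
print(find("*friendly", words))  # ['friendly', 'unfriendly']
print(find("*friend*", words))   # ['friendly', 'friendship', 'unfriendly']
```

Note that a plain term such as "friend" finds nothing here, because no indexed word is exactly "friend".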
A word is a sequence of consecutive characters that fall into the following Unicode categories: Letter (Lu, Ll, Lt, Lm, Lo) or Number (Nd, Nl, No). For example, [H2O] and [ft²] are each a single word, whereas [eco-friendly] is composed of two distinct words separated by a punctuation character.
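A minimal sketch of this word definition, using Python's standard `unicodedata` module. `split_words` is a hypothetical helper, not part of FoxTrot; it models only the Letter/Number rule from this paragraph.

```python
import unicodedata

# Letter (Lu, Ll, Lt, Lm, Lo) and Number (Nd, Nl, No) general categories.
WORD_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nd", "Nl", "No"}


def is_word_char(ch: str) -> bool:
    return unicodedata.category(ch) in WORD_CATEGORIES


def split_words(text: str) -> list[str]:
    """Split text into maximal runs of Letter/Number characters."""
    words, current = [], []
    for ch in text:
        if is_word_char(ch):
            current.append(ch)
        elif current:  # any other character ends the current word
            words.append("".join(current))
            current = []
    if current:
        words.append("".join(current))
    return words


print(split_words("H2O"))           # ['H2O']
print(split_words("ft²"))           # ['ft²']  ('²' is category No)
print(split_words("eco-friendly"))  # ['eco', 'friendly']  ('-' is Pd)
```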
In addition, FoxTrot treats each of the following characters as a whole word on its own, even when it is adjacent to other high code point, Symbol, Ideographic, Letter or Number characters: Symbol characters (Sm, Sc, Sk, So), Chinese, Japanese and Korean Ideographic characters, and characters whose Unicode code point is higher than U+10000 (e.g. emojis).
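The rules in this paragraph and the previous one can be combined into a single tokenizer sketch. `tokenize` and `is_standalone` are hypothetical names, not FoxTrot code. The ideograph test is an assumption: Python's standard library does not expose the Unicode Ideographic property, so this sketch approximates it from character names.

```python
import unicodedata

LETTER_NUMBER = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nd", "Nl", "No"}
SYMBOL = {"Sm", "Sc", "Sk", "So"}


def is_standalone(ch: str) -> bool:
    """True for characters that form a one-character word on their own."""
    if unicodedata.category(ch) in SYMBOL:
        return True
    if ord(ch) > 0x10000:  # high code points, e.g. most emoji
        return True
    # Assumption: approximate "CJK ideographic" via the character name.
    name = unicodedata.name(ch, "")
    return name.startswith(("CJK UNIFIED IDEOGRAPH", "CJK COMPATIBILITY IDEOGRAPH"))


def tokenize(text: str) -> list[str]:
    words, current = [], []

    def flush() -> None:
        if current:
            words.append("".join(current))
            current.clear()

    for ch in text:
        if is_standalone(ch):  # checked first: CJK ideographs are also Lo
            flush()
            words.append(ch)
        elif unicodedata.category(ch) in LETTER_NUMBER:
            current.append(ch)
        else:  # punctuation, spaces, etc. only separate words
            flush()
    flush()
    return words


print(tokenize("5€ or 10°C"))  # ['5', '€', 'or', '10', '°', 'C']
print(tokenize("中文 text"))    # ['中', '文', 'text']
print(tokenize("hi😀there"))   # ['hi', '😀', 'there']
```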
As ‣ (U+2023) belongs to the Punctuation category (Po), it is never part of a word, so you can only find it using [then apply advanced filter] [contents] [contains the string]. Or, if it follows or precedes a whole word you also want to search for, you can use [includes the exact string].
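To see why ‣ behaves differently from "f", this sketch checks its Unicode category and compares a word-based lookup with a plain substring test. `index_words` is a hypothetical, simplified model of word extraction (Letter and Number runs plus single Symbol characters; the ideograph and high code point rules are omitted), not FoxTrot's indexer.

```python
import unicodedata


def index_words(text: str) -> set[str]:
    """Simplified word extraction: Letter/Number runs plus single Symbols."""
    words, current = set(), ""
    for ch in text:
        cat = unicodedata.category(ch)
        if cat[0] == "S":      # Symbols (Sm, Sc, Sk, So) are words on their own
            if current:
                words.add(current)
            words.add(ch)
            current = ""
        elif cat[0] in "LN":   # Letters and Numbers extend the current word
            current += ch
        else:                  # punctuation, spaces, ... only separate words
            if current:
                words.add(current)
            current = ""
    if current:
        words.add(current)
    return words


bullet = "\u2023"  # ‣
print(unicodedata.name(bullet), unicodedata.category(bullet))  # TRIANGULAR BULLET Po

text = "Notes \u2023 adam smith"
print(bullet in index_words(text))                 # False: never indexed as a word
print(bullet in text)                              # True: a substring test finds it
print("f" in index_words("item f, adam smith"))    # True: 'f' is a Letter word
```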
Jérôme - FoxTrot Engineering
[Updated on: Wed, 27 December 2023 11:07]