FoxTrot Search Forum
FoxTrot Search for macOS Forum

Foxtrot seems confused with certain unicode characters [message #1744] Wed, 06 December 2023 15:46
Atlas
Messages: 130
Registered: August 2009
Senior Member
I have a file with the name "231019656 ・ Non‐Western orientations to strategic intelligence - Bozeman [Book].pdf". When I search for files containing the Unicode character "non-breaking hyphen" (U+2011), this file also pops up, even though the hyphen inside "Non‐Western" is actually the Unicode character called "hyphen" (U+2010). I double-checked, and it looks like Foxtrot treats the "hyphen" (U+2010) as a "non-breaking hyphen" (U+2011) when I search for these characters using "Advanced Filter" > "File Name" > "Contains any of the strings" with "Ignore Case" + "Ignore Composition" + "Multiple Strings" enabled.

These two Unicode characters are definitely different, but maybe "Ignore Composition" is somehow having an unintended effect? As far as I can tell, "Ignore Composition" is only supposed to affect alphabetic letters (according to Foxtrot's FAQ page). Please look into this and let us know whether this is a bug that could be fixed, or part of Foxtrot's designed behavior that needs to be clarified. I'm on Foxtrot 7.5.6.

Thanks.
Re: Foxtrot seems confused with certain unicode characters [message #1745 is a reply to message #1744] Thu, 07 December 2023 11:32
FoxTrot Engineering
Messages: 384
Registered: April 2020
Senior Member
You are right: "hyphen" (U+2010) and "non-breaking hyphen" (U+2011) are considered equivalent when "Ignore Composition" is enabled. This is, however, an intended effect, as the "non-breaking hyphen" character has a compatibility decomposition to the "hyphen" character, tagged as a non-breaking variant (see the Unicode entry for U+2011).

We have updated the FAQ accordingly:
Ignore Composition: in Unicode, some characters can be encoded using either a single codepoint or a sequence of codepoints. This is especially the case for accented lowercase Roman vowels (those that are part of ISO-8859-1) and Korean letters. Also, some characters can be decomposed to an “equivalent” character or sequence, e.g. ¼ can be decomposed to 1/4, ² to 2, ④ to 4, 𝒄 to c, non-breaking hyphen to hyphen, etc. When enabled, both forms are considered equal.
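
For readers who want to check the decomposition themselves, here is a minimal sketch using Python's standard `unicodedata` module. It is not FoxTrot's implementation: the helpers `fold` and `contains_any` are hypothetical names, and using NFKD plus case folding is only an assumption about how an "Ignore Case" + "Ignore Composition" comparison might behave. The file name is shortened for the example.

```python
import unicodedata

HYPHEN = "\u2010"               # HYPHEN
NON_BREAKING_HYPHEN = "\u2011"  # NON-BREAKING HYPHEN


def fold(text: str) -> str:
    """Hypothetical stand-in for 'Ignore Case' + 'Ignore Composition'.

    NFKD applies canonical and compatibility decompositions; casefold()
    then removes case distinctions.
    """
    return unicodedata.normalize("NFKD", text).casefold()


def contains_any(file_name: str, needles: list[str]) -> bool:
    """Return True if any needle occurs in file_name after folding."""
    folded_name = fold(file_name)
    return any(fold(needle) in folded_name for needle in needles)


# Shortened version of the file name; the hyphen here is U+2010.
file_name = "Non\u2010Western orientations [Book].pdf"

# The Unicode database records the compatibility mapping for U+2011.
print(unicodedata.decomposition(NON_BREAKING_HYPHEN))  # <noBreak> 2010

# An exact comparison keeps the two hyphens distinct...
print(NON_BREAKING_HYPHEN in file_name)  # False
# ...but the folded comparison treats them as the same character.
print(contains_any(file_name, [NON_BREAKING_HYPHEN]))  # True
```

The canonical forms (NFC and NFD) leave U+2011 unchanged, so only a comparison that also applies compatibility decompositions merges the two hyphens. The same folding maps ² to 2 and ④ to 4, which matches the FAQ examples.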


Jérôme - FoxTrot Engineering