Re: Broken words at the end of line [message #1272 is a reply to message #1271]
Thu, 30 September 2021 10:18
Des Bw
Messages: 26 Registered: June 2017
Junior Member
Thank you for the reply. Yes, I know how to do these searches using regex or Foxtrot's own syntax. The problem is that I often don't know whether a word is hyphenated or not. For word ranking and everything else, broken words are effectively ignored in the current system.
- As to the first point, I think the way PDF-generating software creates hyphenated words is pretty consistent.
It is always a hyphen followed by a line break. When I look at these breaks in the text version of the index, they always appear as a hyphen followed by a space, as I have shown in the example.
(I am assuming all the ranking and proximity magic works on the text version of the index, and that PDF formatting such as margins, etc., is irrelevant to the search algorithm.)
- So, I was hoping Foxtrot could be programmed at its core to always remove (ignore) the hyphen+space sequence, so that broken words are read as one unified word.
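
To illustrate what I mean, here is a rough Python sketch of the idea. This is only my own illustration, not Foxtrot's code: the function name `join_broken_words` is made up, and I am assuming the break always shows up as a letter, a hyphen, a single space (or newline), and then a lowercase letter.

```python
import re

# Hypothetical helper, not part of Foxtrot.
# Matches a hyphen followed by one space or newline, but only when it sits
# between a letter and a lowercase letter (the assumed "broken word" shape).
_BROKEN_WORD = re.compile(r"(?<=[A-Za-z])-[ \n](?=[a-z])")


def join_broken_words(text: str) -> str:
    """Remove hyphen+space (or hyphen+newline) breaks inside words."""
    return _BROKEN_WORD.sub("", text)


sample = "The docu- ment was index- ed, but well-known terms stay."
print(join_broken_words(sample))
# prints: The document was indexed, but well-known terms stay.
```

Ordinary hyphenated words such as "well-known" are left alone because no space follows the hyphen. One caveat I can see: a phrase like "pre- and post-processing" would be wrongly joined into "preand", so a real implementation would probably need a dictionary check or would have to index both forms.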