Re: Indexing PDFs with lots of text [message #1314 is a reply to message #1311]
Sat, 27 November 2021 17:42
Grant Barrett
Messages: 35 Registered: October 2019
Member
I put together a script using pdftotext that counts the characters in each file, and I ran it on 10 problem files. Six of the 10 are well under 5,000,000 characters, so something else must be amiss. Here are the counts:
1554967
2167651
2293207
2458849
2903970
3388007
5458417
7954470
11318310
11481914
I have been option-clicking, as you mention, to look at the parsed text in FoxTrot. I have also read the FAQ and taken all the steps proposed there. I am now trying to determine what makes the files under 5,000,000 characters problem files. I can make them, and the odd cover-page one, available to you via a private link.
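
In case it helps anyone reproduce these numbers, here is a minimal sketch of this kind of count. It is not necessarily the exact script I ran. It assumes pdftotext (from Poppler or Xpdf) is on the PATH, and the function names and the LIMIT constant are just illustrative. Note that the count includes the newlines and form feeds that pdftotext emits, so it may not match FoxTrot's own tally exactly.

```python
import subprocess
import sys

# Threshold discussed in this thread; an assumption for illustration only.
LIMIT = 5_000_000


def pdf_char_count(path):
    """Return the number of characters pdftotext extracts from one PDF."""
    # "-" sends the extracted text to stdout; -q suppresses warnings.
    result = subprocess.run(
        ["pdftotext", "-q", "-enc", "UTF-8", path, "-"],
        capture_output=True,
        check=True,
    )
    return len(result.stdout.decode("utf-8", errors="replace"))


def over_limit(count, limit=LIMIT):
    """True if a character count exceeds the limit."""
    return count > limit


def main(paths):
    # One line per file: count, over/under flag, path.
    for path in paths:
        count = pdf_char_count(path)
        flag = "over" if over_limit(count) else "under"
        print(f"{count}\t{flag}\t{path}")


if __name__ == "__main__":
    main(sys.argv[1:])
```

Saved under a hypothetical name such as count_chars.py, it would be run as `python3 count_chars.py *.pdf`.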