FoxTrot Search Forum: FoxTrot Search User Forum » Indexing PDFs with lots of text


Re: Indexing PDFs with lots of text [message #1314 is a reply to message #1311]

Sat, 27 November 2021 17:42

Grant Barrett
Messages: 36
Registered: October 2019

Member

I put together a script that uses pdftotext to count the characters in each file's extracted text, and ran it on 10 problem files. Six of the 10 are well under 5,000,000 characters, so the character count alone cannot explain why they fail; something else is amiss. Here are the counts, in ascending order:

1554967
2167651
2293207
2458849
2903970
3388007
5458417
7954470
11318310
11481914
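
For anyone who wants to repeat this kind of count, here is a minimal sketch of how it could be done in Python. It is not the script that produced the numbers above; the function names and the `LIMIT` constant are made up for illustration, and it assumes the `pdftotext` command-line tool is installed and on the PATH.

```python
import subprocess

# The 5,000,000-character figure discussed in this thread, treated here
# as an assumption rather than a documented FoxTrot constant.
LIMIT = 5_000_000


def pdf_char_count(path):
    """Return the number of characters pdftotext extracts from one PDF."""
    # Passing "-" as the output file makes pdftotext write to stdout.
    result = subprocess.run(
        ["pdftotext", "-enc", "UTF-8", path, "-"],
        capture_output=True,
        check=True,
    )
    # Count characters, not bytes, so decode before measuring.
    return len(result.stdout.decode("utf-8", errors="replace"))


def split_by_limit(counts, limit=LIMIT):
    """Split character counts into (under limit, at or over limit)."""
    under = sorted(c for c in counts if c < limit)
    over = sorted(c for c in counts if c >= limit)
    return under, over


def report(paths):
    """Print one line per PDF: character count, under/over flag, path."""
    counted = sorted((pdf_char_count(p), p) for p in paths)
    for n, path in counted:
        flag = "over" if n >= LIMIT else "under"
        print(f"{n:>12}  {flag:<5}  {path}")
```

Calling `report(["a.pdf", "b.pdf"])` with real paths prints the files smallest first, and `split_by_limit` applied to the ten counts above separates the six files under the threshold from the four over it.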

As you suggested, I have been option-clicking to look at the parsed text in FoxTrot. I have also read the FAQ and taken all the steps proposed there. I am now trying to determine what makes the six files under 5,000,000 characters problem files. I can make them, and the odd cover-page one, available to you through a private link.


