Re: how to find all non searchable pdf [message #1200 is a reply to message #1193]
Tue, 04 May 2021 10:36
FoxTrot Engineering
Messages: 420 Registered: April 2020
Senior Member
I am not sure why this does not seem to work for you. It should. However, to find PDFs with no text at all, instead of [does not contain the string] [a], it is better to use:
[all items of type] [PDF]
[then apply advanced filter] [contents] [is exactly the string] [] []
If you want to find PDF files whose text content is no longer than 1000 characters (instead of absolutely empty), the following should theoretically work:
[all items of type] [PDF]
[then apply advanced filter] [contents] [contains the regular expression] [". also applies to newlines"] [^.{0,1000}$]
However, due to a bug, the latter currently does not work. This will be fixed in release 7.1, but in the meantime this one works:
[all items of type] [PDF]
[then apply advanced filter] [contents] [contains the regular expression] [". also applies to newlines"] [^.{0,1000}\x00*$]
Also note that this regular expression currently finds documents whose content length is between 1 and 1000 characters, and misses documents with a length of 0 characters. This will also be fixed in 7.1. Note that the maximum length you can search for with this kind of regular expression is 65535.
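To see what the two patterns above accept, here is a minimal sketch in Python. It is only an illustration with Python's own `re` module, not FoxTrot's regular expression engine, so details may differ. It assumes that `re.DOTALL` is a fair stand-in for the [". also applies to newlines"] option, and it uses `fullmatch()` in place of the `^` and `$` anchors. The function names and sample texts are made up for the example.

```python
import re

# Assumption: re.DOTALL plays the role of the ". also applies to newlines" option.
# fullmatch() stands in for the ^ and $ anchors, so the whole text must match.
SHORT = re.compile(r".{0,1000}", re.DOTALL)
SHORT_WITH_NULS = re.compile(r".{0,1000}\x00*", re.DOTALL)


def is_short(text: str) -> bool:
    """True when the text is at most 1000 characters long."""
    return SHORT.fullmatch(text) is not None


def is_short_ignoring_trailing_nuls(text: str) -> bool:
    """True when the text is at most 1000 characters plus any trailing NULs."""
    return SHORT_WITH_NULS.fullmatch(text) is not None


# Hypothetical sample contents, not taken from any real index.
samples = {
    "empty": "",
    "two lines": "scanned page\nno text layer",
    "1000 chars": "a" * 1000,
    "1001 chars": "a" * 1001,
    "short + NUL padding": "abc" + "\x00" * 5000,
}

for name, text in samples.items():
    print(name, is_short(text), is_short_ignoring_trailing_nuls(text))
```

In this sketch the empty string matches both patterns, which corresponds to the behaviour announced for 7.1 rather than the current one. The second pattern differs from the first only in that it also accepts any number of trailing NUL characters after the first 1000 characters.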
Jérôme - FoxTrot Engineering