Feature request: make selection of metadata importers a per-index option [message #1038] |
Mon, 27 April 2020 23:12 |
Zvi Biener
Messages: 8 Registered: April 2020
|
Junior Member |
|
|
Hi all and CTM,
I'm using the group to make a feature request, and to see if anyone else is
interested. The request is to make the choice of metadata importer a
per-index selection, not a global one.
I use mostly scanned and OCRed PDFs, and the difference between the Spotlight
and XPDF importers can be enormous. I duplicate all my indexes, one using
the Spotlight importer, and one using XPDF. Since FT sorts by relevance, it
is very easy to skip files that are next to one another (i.e., the same
results are found), but also easy to see that some files that are of low
relevance using one importer become high relevance on the other. It works.
However, this means I can't have the indexes updated automatically, since I
need to manually select the importer, then update the relevant indexes,
then select the other importer, and update the others. (I can streamline
one of these, but not both). Having the importers as a per-index selection
would solve this.
thanks.
|
|
|
Re: Feature request: make selection of metadata importers a per-index option [message #1708 is a reply to message #1038] |
Fri, 29 September 2023 09:45 |
madison437
Messages: 1 Registered: September 2023
|
Junior Member |
|
|
It would be EXTREMELY useful to have a per-index PDF parser, along with whatever options are available for the chosen parser.
Also, it would be great to be able to specify arguments for xpdf "pdftotext", or instead use the Poppler version of "pdftotext". I'm finding with my OCR'd handwritten text, the result is better with Poppler, and with Poppler I don't have to use the "stream order" option, i.e. "-raw", which appears a little safer in getting the order of text right, generally speaking.
It's true there is the "Use Xpdf's "stream order" layout mode" checkbox when configuring Xpdf, but it would be helpful to be able to set even that on a per-index basis.
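To make the idea concrete, here is a rough sketch of what per-index settings could look like if they were turned into a "pdftotext" command line. This is only an illustration, not how FoxTrot works internally: the `IndexPdfSettings` class, its field names, and the `build_pdftotext_command` helper are all hypothetical. The only things taken from the real tools are the "-raw" flag and the "pdftotext [options] file.pdf -" form (with "-" meaning write to standard output), which both the Xpdf and Poppler versions accept as far as I know.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class IndexPdfSettings:
    """Hypothetical per-index PDF settings (not an actual FoxTrot structure)."""
    # Path to the pdftotext binary for this index; could point at either
    # the Xpdf or the Poppler build. The default name is an assumption.
    pdftotext_path: str = "pdftotext"
    # Mirrors the "stream order" checkbox: adds -raw when True.
    stream_order: bool = False


def build_pdftotext_command(settings: IndexPdfSettings, pdf_path: str) -> List[str]:
    """Build the argument list for one PDF using this index's settings."""
    cmd = [settings.pdftotext_path]
    if settings.stream_order:
        cmd.append("-raw")
    # "-" as the output file sends the extracted text to standard output.
    cmd += [pdf_path, "-"]
    return cmd


# One index using stream order, another using a different binary without it.
# The second path is a placeholder, not a known install location.
xpdf_index = IndexPdfSettings(stream_order=True)
poppler_index = IndexPdfSettings(pdftotext_path="/path/to/poppler/pdftotext")

print(build_pdftotext_command(xpdf_index, "scan.pdf"))
print(build_pdftotext_command(poppler_index, "scan.pdf"))
```

The resulting list could be handed to something like Python's `subprocess.run`, so each index would pick its own binary and flags without touching a global preference.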
Thanks.
|
|
|
Re: Feature request: make selection of metadata importers a per-index option [message #1713 is a reply to message #1708] |
Thu, 19 October 2023 17:31 |
AJKS
Messages: 53 Registered: June 2020
|
Member |
|
|
I would add my request to this topic: not all PDF documents are equal, and not all indexes are equal.
I also use a curated portfolio of documents, a great number of which are scanned and OCR-ed PDFs, close to 4TB in all.
I've explored other data/text mining resources, but the affordable/accessible/understandable alternatives seem to work on plain text files as their raw data—that's just unworkable for me.
Do hear me when I say that FoxTrot Pro is basically a life-saver and has given me a career; I use it almost every day.
Still, anything that can be done to improve data indexing, cataloging, searching, and display of results would only make FTP even more valuable than it already is.
I'd love to see word clouds and graphical topic clustering (e.g. see DEVONagent Pro, also Wikiweb on iOS) in a future edition of FTP.
Thanks
[Updated on: Thu, 19 October 2023 17:31]
|
|
|