FoxTrot Search Forum
FoxTrot Search for macOS Forum

Feature request: make selection of metadata importers a per-index option [message #1038] Mon, 27 April 2020 23:12
Zvi Biener
Messages: 8
Registered: April 2020
Junior Member
Hi all and CTM,
I'm using the group to make a feature request, and see if anyone else is
interested. It is to make the choice of metadata importers a per-index
selection, not a global selection.

I mostly use scanned and OCRed PDFs, and the difference between the Spotlight
and XPDF importers can be enormous. I duplicate all my indexes, one using
the Spotlight importer and one using XPDF. Since FT sorts by relevance, it
is very easy to skip files that are next to one another (i.e., the same
results are found), but also easy to see that some files that are of low
relevance with one importer become high relevance with the other. It works.
However, this means I can't have the indexes updated automatically, since I
need to manually select one importer, update the relevant indexes,
then select the other importer and update the others. (I can streamline
one of these, but not both.) Having the importers as a per-index selection
would solve this.

thanks.
Re: Feature request: make selection of metadata importers a per-index option [message #1708 is a reply to message #1038] Fri, 29 September 2023 09:45
madison437
Messages: 1
Registered: September 2023
Junior Member
It would be EXTREMELY useful to have a per-index PDF parser, along with whatever options are available for the chosen parser.

Also, it would be great to be able to specify arguments for Xpdf's "pdftotext", or to use the Poppler version of "pdftotext" instead. With my OCR'd handwritten text, I'm finding that the result is better with Poppler, and with Poppler I don't have to use the "stream order" option, i.e. "-raw", which generally seems a little safer for getting the order of the text right.

It's true there is the "Use Xpdf's "stream order" layout mode" checkbox when configuring Xpdf, but even that would be more helpful if it could be set per index.
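
For anyone who wants to check how much "-raw" matters on their own files outside FoxTrot, here is a minimal Python sketch. It assumes a "pdftotext" binary (Xpdf or Poppler) is installed, and it relies only on the "-raw" and "-enc" flags and on "-" as the output file for stdout. The function names and the way the binary path is passed in are my own choices for illustration, not anything FoxTrot exposes.

```python
import difflib
import subprocess
import sys


def build_pdftotext_cmd(pdf_path, raw=False, binary="pdftotext"):
    """Build the argument list for a pdftotext run that writes to stdout."""
    cmd = [binary, "-enc", "UTF-8"]
    if raw:
        cmd.append("-raw")  # keep text in content-stream order
    cmd += [pdf_path, "-"]  # "-" as the text file means stdout
    return cmd


def extract_text(pdf_path, raw=False, binary="pdftotext"):
    """Run pdftotext and return the extracted text as a string."""
    result = subprocess.run(
        build_pdftotext_cmd(pdf_path, raw=raw, binary=binary),
        capture_output=True,
        check=True,
    )
    return result.stdout.decode("utf-8", errors="replace")


def similarity(a, b):
    """Word-level similarity of two extractions, from 0.0 to 1.0."""
    matcher = difflib.SequenceMatcher(None, a.split(), b.split(), autojunk=False)
    return matcher.ratio()


if __name__ == "__main__":
    # Usage (hypothetical script name): python compare_raw.py file.pdf [path/to/pdftotext]
    pdf = sys.argv[1]
    binary = sys.argv[2] if len(sys.argv) > 2 else "pdftotext"
    default_text = extract_text(pdf, raw=False, binary=binary)
    raw_text = extract_text(pdf, raw=True, binary=binary)
    print(f"default vs -raw similarity: {similarity(default_text, raw_text):.3f}")
```

Running it once with the Xpdf binary and once with the Poppler binary (wherever those are installed on your machine) gives a rough idea of which files are sensitive to the layout mode. A low similarity only means the word order differs, not which mode is right.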

Thanks.
Re: Feature request: make selection of metadata importers a per-index option [message #1713 is a reply to message #1708] Thu, 19 October 2023 17:31
AJKS
Messages: 53
Registered: June 2020
Member
I would add my request to this topic: not all PDF documents are equal, and not all indexes are equal.

I also use a curated portfolio of documents, a great number of which are scanned and OCR-ed PDFs, close to 4TB in all.

I've explored other data/text mining resources, but the affordable/accessible/understandable alternatives seem to work on plain text files as their raw data, and that's just unworkable for me.

Do hear me when I say that FoxTrot Pro is basically a life-saver and has given me a career; I use it almost every day.

Still, anything that can be done to improve data indexing, cataloging, searching, and display of results would only make FTP even more valuable than it already is.

I'd love to see word clouds and graphical topic clustering (e.g. see DEVONagent Pro, also Wikiweb on iOS) in a future edition of FTP.

Thanks

[Updated on: Thu, 19 October 2023 17:31]