FoxTrot Search Forum
FoxTrot Search for macOS Forum

How to Index Files in Distinct Directory Paths Under a Single Index? [message #1837] Thu, 22 August 2024 19:07
foxtrotter
Messages: 12
Registered: July 2024
Junior Member
Hi,

Making this thread to ask for advice on how to best handle the following situation in Foxtrot. Please pardon me if this has been covered elsewhere.

Suppose I have files (.pdf, .docx) pertaining to a subject, say, "Linear Algebra", and they are located at TWO distinct directory paths. My goal is to make all of these files searchable in Foxtrot, preferably under a single index.

The first location is easy to add to a Foxtrot index because all documents are housed in the same directory path, for example:

/path/to/lin_alg/lecture1.pdf
/path/to/lin_alg/lecture2.pdf
/path/to/lin_alg/lecture3.pdf
/path/to/lin_alg/tutorial1.pdf

(Currently, I am only indexing this first location. It would be nice if the relevant files in the second location, which I describe next, could be added to this index.)

The second location is more challenging. The relevant documents are housed in separate child paths. Additionally, there are documents UNrelated to "Linear Algebra" in neighboring paths. To illustrate what I mean, the paths look something like these:

/path/to/bookshelf/Kevin-H/linear_algebra_book1.pdf
/path/to/bookshelf/Sheldon-A/linear_algebra_book2.pdf
/path/to/bookshelf/Sergei-W/linear_algebra_book3.pdf
/path/to/bookshelf/John-B/self_love_book.pdf
/path/to/bookshelf/Vivian-K/cooking_western_cuisine_book.pdf
/path/to/bookshelf/Eugene-P/probability_theory_book.pdf
[… so on and so forth …]

To clarify, the last three documents (self_love_book.pdf, cooking_western_cuisine_book.pdf, probability_theory_book.pdf) are not related to the subject "Linear Algebra", and as such, should be excluded from indexing.

In case anyone is wondering why the second location is structured this way: it is generated automatically by the open-source digital book management app "Calibre", which I'd like to think is quite widely used to manage digital books.

If you're reading this, I'd be happy to hear your opinions or advice if you have any you'd like to share.

At the moment I'm wondering whether it'd be possible to tag the books in the second location with the tag "Linear Algebra", and then tell Foxtrot to index only these tagged files in this particular location while ignoring the other files. I'm currently on Mac OS, so I'm guessing the tagging should be done with Mac OS's native file tags. I think what Mac OS does when tagging a file is add an extended attribute to it, and I'm hoping that Foxtrot allows filtering on such attributes when specifying which files to include in an index.

If anyone knows of a better solution, please feel free to share, I would be very thankful to hear it.

Well, it appears that my post has turned out kind of lengthy, so thank you for reading through all this.
Re: How to Index Files in Distinct Directory Paths Under a Single Index? [message #1838 is a reply to message #1837] Tue, 27 August 2024 12:01
FoxTrot Engineering
Messages: 406
Registered: April 2020
Senior Member
You can only index whole folders, skipping some subfolders (at any depth) you manually add to the relevant pane.
You can't exclude files or folders from indexing based on their tags, but you can use tags at search time (this applies to files tagged individually; you can't tag a folder to mark all the files it contains):
- use the "by Tag" categorizer in the left pane to filter the found files
- or add a [tags] criterion
- or add a [then apply advanced filter] [tags] criterion


Jérôme - FoxTrot Engineering
Re: How to Index Files in Distinct Directory Paths Under a Single Index? [message #1854 is a reply to message #1838] Fri, 27 September 2024 12:51
Atlas
Messages: 140
Registered: August 2009
Senior Member
I'll share how I manage my searches in these situations, and maybe some of it will be useful for yours. In all the situations below, I create only one index for all the file locations, and I don't split up indexes.


Situation 1

If the files you're searching for contain the same pattern of keywords in their filenames or pathnames, then search for "full path" using regular expressions.  Specifically, use "Then apply advanced filter" -> "Full Path" -> "Contains any of the regular expression".  For example, if all the files contain "lin alg" or "linear algebra" in their filename or pathname, then you can write a regular expression to search for both variants.
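To make Situation 1 concrete, here is a minimal Python sketch of one regular expression that covers both spellings. The pattern itself is only an illustration (it is not from Foxtrot's documentation), and Foxtrot's regex dialect may differ from Python's in some details, so it is worth trying the pattern on a few real paths before relying on it.

```python
import re

# Hypothetical pattern: matches "lin alg", "lin_alg", "lin-alg",
# "linear algebra", "linear_algebra", and so on, ignoring case.
LIN_ALG = re.compile(r"lin(ear)?[ _-]?alg(ebra)?", re.IGNORECASE)

# Sample paths taken from the first post in this thread.
paths = [
    "/path/to/lin_alg/lecture1.pdf",
    "/path/to/bookshelf/Kevin-H/linear_algebra_book1.pdf",
    "/path/to/bookshelf/John-B/self_love_book.pdf",
    "/path/to/bookshelf/Eugene-P/probability_theory_book.pdf",
]

def matches(path):
    """Return True if the pattern occurs anywhere in the full path."""
    return LIN_ALG.search(path) is not None

# Keep only the paths that mention linear algebra in some spelling.
matching = [p for p in paths if matches(p)]
```

A loose pattern like this can produce false positives on unrelated names that happen to contain the same letters, so tighten it if your folder names allow.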

Situation 2

If the files you're searching for are part of a list, but their filenames don't necessarily have anything in common, then you probably have to use a more involved process. For me, I would write a script to grab the first 20 characters from the filename of each file (or its unique fileID if it has one), and then concatenate them procedurally to generate a long search string joined with "OR". For example, search for filenames with "string1 OR string2 OR string3 ..etc.". Use this with the Foxtrot feature "Contains any of the strings". <-- This is probably not you, but I'm just highlighting the option.
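The script described in Situation 2 could look roughly like the following Python sketch. The function name `build_or_query` and the exact "A OR B" output format are assumptions for illustration; check how Foxtrot expects the strings to be separated in the "Contains any of the strings" field before pasting the result.

```python
from pathlib import PurePosixPath

def build_or_query(paths, prefix_len=20):
    """Join the first `prefix_len` characters of each file name with ' OR '."""
    terms = []
    for p in paths:
        # Use the file name without its extension, truncated to the prefix length.
        stem = PurePosixPath(p).stem[:prefix_len]
        # Skip empty names and duplicates so the query stays as short as possible.
        if stem and stem not in terms:
            terms.append(stem)
    return " OR ".join(terms)

# Sample list of files, taken from the first post in this thread.
book_list = [
    "/path/to/bookshelf/Kevin-H/linear_algebra_book1.pdf",
    "/path/to/bookshelf/Sheldon-A/linear_algebra_book2.pdf",
    "/path/to/bookshelf/Sergei-W/linear_algebra_book3.pdf",
]

query = build_or_query(book_list)
print(query)
```

Truncating to a fixed prefix keeps the query short, but two different files can share the same first 20 characters, in which case they collapse into a single search term.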

Situation 3

If you use tags, then you should just use the method suggested by Foxtrot Engineering. Tags offer a painless way to deal with situations like this. However, I avoid using tags because (1) tag data doesn't travel well, (2) I can't easily back up the tag data structure, and (3) tag data can only be read within the Mac ecosystem. I do use tags for data processing purposes, such as tagging a large group of files with temporary tags and then deleting the tag data afterward.

[Updated on: Fri, 27 September 2024 12:52]

Re: How to Index Files in Distinct Directory Paths Under a Single Index? [message #1859 is a reply to message #1838] Sat, 28 September 2024 16:40
FoxTrot Engineering
You can only index whole folders, skipping some subfolders (at any depth) you manually add to the relevant pane.
However, there is a hidden preference (configurable from Terminal.app) to exclude files or entire folders based on their name or path. This won't work for Finder tags, but might work for pseudo-tags you add to file or folder names (e.g. "Thesis [lin-alg].pdf").
See "Skipping some folders at indexing time" in our hidden Preferences documentation.


Jérôme - FoxTrot Engineering
Re: How to Index Files in Distinct Directory Paths Under a Single Index? [message #1860 is a reply to message #1859] Sun, 29 September 2024 03:33
Atlas
Quote:
However, there is a hidden preference (configurable from Terminal.app) to exclude files or entire folders based on their name or path.
This prompted me to look up the new feature in Foxtrot 8. Please confirm whether my understanding is correct: can users now choose which files/folders to index by applying a regular expression filter to the full path? For example, can I choose to index only files matching {regex pattern A}, or to NOT index files matching {regex pattern B}? Why is this not news??? This would make Foxtrot indexes so much easier to use. Currently, I apply a full-path regex to every new search just because I cannot apply the regex filter at index time.

Quoted from documentation:
"With version 8, you can also define custom regular expressions to specify some other files or folders to skip. You can either use a single regular expression (define SkipPathRegex as a string) or multiple ones (define SkipPathRegex as an array of strings). To avoid incorrect parsing when using the defaults command, it is suggested to quote the regular expression between single quotes ('), and to prepend any string with a "-string" argument. The regular expression is applied to the full path, and a file or folder is skipped if its path contains the regex; use a leading ^ and / or a trailing $ if the regex should be found at the beginning / ending of the path."

As a small comment, I can see why you might not want to apply regexes at index time, and instead encourage users to apply them only at search time: they might slow down the index update process. Currently, my index update already takes 8 - 10 seconds, and I wouldn't want it to approach 15 - 20 seconds. Nonetheless, it's a good option to give users. So kudos, if this feature is actually implemented.
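The matching rule in the documentation quoted above (a path is skipped if it contains the regex, unless ^ or $ anchors it) can be mimicked in Python to test patterns before setting the preference. This is only a sketch of the described rule, not FoxTrot's implementation; the function name `is_skipped` and the sample patterns are made up for illustration, and Python's regex dialect may differ from the one FoxTrot uses.

```python
import re

def is_skipped(path, skip_path_regex):
    """Mimic the documented rule: skip if the path contains any of the regexes.

    `skip_path_regex` may be a single string or a list of strings,
    mirroring the two forms the documentation describes.
    """
    if isinstance(skip_path_regex, str):
        patterns = [skip_path_regex]
    else:
        patterns = list(skip_path_regex)
    # re.search finds the pattern anywhere in the path ("contains"),
    # unless the pattern itself is anchored with ^ or $.
    return any(re.search(p, path) for p in patterns)

# Unanchored: matches anywhere in the full path.
unanchored = is_skipped("/path/to/bookshelf/John-B/self_love_book.pdf", r"/John-B/")

# Anchored with a trailing $: only matches at the end of the path.
log_file = is_skipped("/var/app/debug.log", r"\.log$")
log_in_middle = is_skipped("/var/app/debug.log.pdf", r"\.log$")

# Multiple regexes: skipped if any one of them matches.
cached = is_skipped("/project/cache/data.bin", [r"/cache/", r"\.tmp$"])
```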

[Updated on: Sun, 29 September 2024 03:35]

Re: How to Index Files in Distinct Directory Paths Under a Single Index? [message #1861 is a reply to message #1860] Sun, 29 September 2024 10:06
FoxTrot Engineering
- this is a global setting that applies to all your indices. The goal is more to exclude some kind of configuration, cache, log, dump or archive files, rather than to have complex rules to organize documents with a path nomenclature
- the regexes are used to exclude some files or folders; if you only want to index files that match a given regex, you will have to write the opposite regex to exclude the other files, which can be quite tricky
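To illustrate why writing the "opposite" regex is tricky, here is a Python sketch using the bookshelf paths from the first post. Both patterns are hypothetical, and whether FoxTrot's regex engine supports negative lookahead `(?!...)` is an assumption to verify. Since the regex is also applied to folder paths, a naive inverse pattern matches the author folders themselves, which would presumably prevent their contents from being indexed at all.

```python
import re

# Naive inverse: "anything under the bookshelf without linear_algebra in its path".
NAIVE = re.compile(r"^/path/to/bookshelf/(?!.*linear_algebra)")

# Narrower inverse: only skip .pdf files, so folder paths never match.
FILES_ONLY = re.compile(r"^/path/to/bookshelf/(?!.*linear_algebra).*\.pdf$")

folder = "/path/to/bookshelf/Kevin-H"
wanted = "/path/to/bookshelf/Kevin-H/linear_algebra_book1.pdf"
unwanted = "/path/to/bookshelf/John-B/self_love_book.pdf"
elsewhere = "/path/to/lin_alg/lecture1.pdf"

def skipped(pattern, path):
    """Return True if the exclusion pattern matches the path."""
    return pattern.search(path) is not None

# The naive pattern matches the author folder, even though it holds a wanted book.
naive_skips_folder = skipped(NAIVE, folder)

# The narrower pattern leaves the folder and the wanted book alone,
# and skips only the unrelated PDF.
results = {
    path: skipped(FILES_ONLY, path)
    for path in (folder, wanted, unwanted, elsewhere)
}
```

Even the narrower pattern has gaps: it only excludes .pdf files, and because the setting is global it would apply to every index, not just the one covering the bookshelf.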


Jérôme - FoxTrot Engineering
Re: How to Index Files in Distinct Directory Paths Under a Single Index? [message #1862 is a reply to message #1861] Sun, 29 September 2024 12:41
Atlas
FoxTrot Engineering wrote on Sun, 29 September 2024 10:06
- the regexes are used to exclude some files or folders; if you only want to index files that match a given regex, you will have to write the opposite regex to exclude the other files, which can be quite tricky
This is true, in case anyone ever wants to write a complex regex that negates conditions. Thanks for clarifying that the feature is designed mainly to perform global exclusion from indexes.
Re: How to Index Files in Distinct Directory Paths Under a Single Index? [message #1863 is a reply to message #1838] Mon, 30 September 2024 01:33
foxtrotter
FoxTrot Engineering wrote on Tue, 27 August 2024 12:01
You can only index whole folders, skipping some subfolders (at any depth) you manually add to the relevant pane.
You can't exclude files or folders from indexing based on their tags, but you can use tags at search time (this applies to files tagged individually; you can't tag a folder to mark all the files it contains):
- use the "by Tag" categorizer in the left pane to filter the found files
- or add a [tags] criterion
- or add a [then apply advanced filter] [tags] criterion
Trying my luck here:

Suppose a PDF file's tags (say, "cooking" and "fusion cuisine" for a cookbook) could be written to the PDF file's XMP headers under the key "kMDItemKeywords". Could this data still be filtered on in Foxtrot Pro? I noticed that Mac OS tags are written to a different key, "kMDItemUserTags".

In case anyone is curious why this is happening, it's because the ebook management app I use embeds its tags under "kMDItemKeywords".

[Updated on: Mon, 30 September 2024 01:35]

Re: How to Index Files in Distinct Directory Paths Under a Single Index? [message #1864 is a reply to message #1863] Mon, 30 September 2024 09:50
FoxTrot Engineering
Yes, PDF keywords can also be used; they are handled by FoxTrot as part of "keywords and comments". There is no categorizer for this in the left pane, but you can add a second search criterion for [keywords and comments].
If you specifically want to search kMDItemKeywords but not other metadata that FoxTrot handles in the same "keywords and comments" field (like kMDItemProjects, kMDItemComment or kMDItemMusicalGenre), then you need to add a [then filter by Spotlight attribute] criterion; see this other topic


Jérôme - FoxTrot Engineering

[Updated on: Mon, 30 September 2024 16:15]

Re: How to Index Files in Distinct Directory Paths Under a Single Index? [message #1865 is a reply to message #1863] Mon, 30 September 2024 09:57
foxtrotter
Discovered that on the Mac OS, Microsoft Word also manages a .docx file's tags (these are called "keywords" in Microsoft Word) by storing them under the "kMDItemKeywords" key in the .docx file's XMP headers.

/index.php?t=getfile&id=66&private=0

/index.php?t=getfile&id=67&private=0
Re: How to Index Files in Distinct Directory Paths Under a Single Index? [message #1874 is a reply to message #1865] Mon, 28 October 2024 16:18
foxtrotter
On Mac OS, to filter by a file's keywords, use Foxtrot's "keyword" search criterion:

/index.php?t=getfile&id=74&private=0

/index.php?t=getfile&id=73&private=0
Re: How to Index Files in Distinct Directory Paths Under a Single Index? [message #1875 is a reply to message #1838] Mon, 28 October 2024 17:01
foxtrotter
FoxTrot Engineering wrote on Tue, 27 August 2024 12:01
You can only index whole folders, skipping some subfolders (at any depth) you manually add to the relevant pane.
You can't exclude files or folders from indexing based on their tags, but you can use tags at search time (this applies to files tagged individually; you can't tag a folder to mark all the files it contains):
- use the "by Tag" categorizer in the left pane to filter the found files
- or add a [tags] criterion
- or add a [then apply advanced filter] [tags] criterion

It would be really nice if the Foxtrot devs could consider letting users index files only if they carry particular tags or keywords. For example, only index files with the keyword "math". (For forum users who haven't been following this thread, by keyword I'm referring to metadata in files' XMP headers, and not the regular textual content one sees after opening a file.)

Notably, Foxtrot can already restrict indexing to certain file types. For instance, a user could choose to index only PDF files and ignore all XML files.

Without the ability to specify a tag/keyword criterion at indexing time, searching for, say, the word "affine" in my linear algebra material takes two searches. The material is spread across two indexes: one contains only linear algebra material (call it index1), and the other contains a mix of things such as cooking recipes, exercise tips, and linear algebra (call it index2). So I have to run:

1st query:
Select _only_ index1. Run search query: "affine", no keyword filter.

2nd query:
Deselect index1, select _only_ index2. Run search query: "affine", keyword filter: "linear algebra".

And I'll have to remember, or figure out during query time, that the keyword to filter on in the 2nd query is "linear algebra".

The same search could have been done with a single query with both index1 and index2 selected, and without having to get the "linear algebra" keyword/tag filter criterion right, if index2 had been built with a keyword filter that restricted it to linear algebra material.