# Example 2: Searching Through Your Archive

This example covers the basics of searching through your `Archive` metadata fields, using the `.csv` files generated in Example 1. This will introduce you to most of the methods in the `Archive` object, as well as some of the extra tools stored in `tools.py`.

In [None]:
import MetaViz as mv
album = mv.Archive()

## Performing a search

Now that we've set up our package in the last example, we can begin playing around with metadata! First, let's try some simple searches.

The `album` object contains two central functions for interfacing with your collection's metadata:

1. **`FindSource(searchterms)`**: A basic search function that returns any filenames whose metadata contains the terms given in the list `searchterms`. Several optional filters, such as `fields` and `subfolders`, allow a more refined search that only looks for the terms inside certain metadata fields or subfolders.

2. **`GrabData()`**: Grabs all of the metadata into a `pandas.DataFrame`. The optional filter `fields` allows you to specify which fields you want returned, and results can be refined to only include certain files or date ranges.

Because these two functions have slightly different filter options for refining results, the combination of them can be used for fairly complex searches.

## Starting simple

First, let's search through our imaginary metadata to find any mentions of some terms. Here, let's look for mentions of the cities *Austin*, *Oxford*, and *Singapore*. By default, this will look in all metadata fields available in our `.csv` files.

The output will be a list of filenames (excluding the folder path) whose metadata contain any of these terms.

In [None]:
terms = ['Austin','Oxford','Singapore'] # Terms for which to look

FileNames = album.FindSource(terms)
print(FileNames)

## Refining our search

Now, let's refine our search to be more specific. Because all our keywords are city names, perhaps we'd like to see which files were taken in these locations. We can do this by restricting our search to the *Coverage* XMP field.

Additionally, we can refine our search into specific sub-folders of our repository. Here we refine to *Travel*.

All of these inputs are lists and accept multiple entries.

**Note: If the full path to each file is desired, rather than just the filename, set the flag `withPath = True`.**

In [None]:
terms = ['Austin','Oxford','Singapore'] # Terms for which to look
fields = ['Coverage'] # XMP fields in which to look
subfolders = ['Travel'] # Sub-folders in which to search

FileNames = album.FindSource(terms, fields, subfolders)
print(FileNames)

## Complex search criteria

Note that, by default, `FindSource` returns filenames containing **any** one of our search terms inside the requested metadata fields. This makes sense for locations, as photos can only be taken in one place. However, perhaps for a different kind of keyword, we only want to find files in which **all** of the keywords appear. How can we do that?

There are a couple of ways to do this. One simple option is to use `FindSource` with the `include_all` flag set to `True`.

Let's look for every file that mentions **both** *Kodak* and *Polaroid* in the description of the file.

In [None]:
terms = ['Kodak','Polaroid'] # Terms for which to look
fields = ['Description'] # XMP fields in which to look

FileNames = album.FindSource(terms, fields, include_all=True)
print(FileNames)

As long as your keywords are relatively simple (i.e., they don't include many regex wildcards), this should do the job. However, this option isn't ideal in many applications. For example, the `include_all` flag only works inside one metadata field, so if the keywords can appear in several different fields, this approach will fail. Similarly, it gets messy if there are certain wildcards in your search terms, especially at the edge of keywords.
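To make the any/all distinction concrete, here is a minimal sketch in plain Python. It does not use MetaViz and is not how `FindSource` is implemented; the `records` dictionary and the `match_files` helper are hypothetical, invented purely for illustration.

```python
# Hypothetical metadata: filename -> {field: text}. Not real MetaViz data.
records = {
    'img001.jpg': {'Description': 'Shot on a Kodak, scanned later'},
    'img002.jpg': {'Description': 'Kodak and Polaroid side by side'},
    'img003.jpg': {'Description': 'Polaroid from the beach'},
}

def match_files(records, terms, field, include_all=False):
    """Return filenames whose `field` contains any (or all) of `terms`."""
    combine = all if include_all else any
    return [name for name, meta in records.items()
            if combine(term in meta.get(field, '') for term in terms)]

any_match = match_files(records, ['Kodak', 'Polaroid'], 'Description')
all_match = match_files(records, ['Kodak', 'Polaroid'], 'Description',
                        include_all=True)
print(any_match)  # all three files
print(all_match)  # only img002.jpg
```

Note that the sketch checks a single field, which mirrors the limitation described above: terms split across different fields would not count as a match.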

A better option is to combine multiple searches, using the list combination functions available in `tools.py`. Specifically, the functions `IntersectLists` and `DifferenceLists` can be used together to build some very complex searches. `IntersectLists` finds all filenames that appear in both lists, and `DifferenceLists` finds all filenames that appear in exactly one of the lists (but not both).

Let's try the above search again, except let's allow the terms to appear anywhere in the metadata fields.

In [None]:
first = album.FindSource(['Kodak'])
second = album.FindSource(['Polaroid'])

both = mv.IntersectLists([first, second])

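Perhaps the simplest application for `DifferenceLists` is searching for terms that often appear inside other words. For example, perhaps you have a lot of photos of your friend *Antonio*, but you don't want your search to return all those photos you took in *San Antonio*. Because every file that matches *San Antonio* also matches *Antonio*, the files that appear in only one of the two lists are exactly those that mention *Antonio* on its own.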

In [None]:
first = album.FindSource(['Antonio'])
second = album.FindSource(['San Antonio'])

either = mv.DifferenceLists([first, second])
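The list operations described above correspond to standard set operations. The sketch below uses Python's built-in `set` type with made-up filenames; it only illustrates the described behavior and is not the code in `tools.py`.

```python
# Made-up search results; every 'San Antonio' hit also matches 'Antonio'.
antonio = ['a.jpg', 'b.jpg', 'c.jpg', 'd.jpg']
san_antonio = ['c.jpg', 'd.jpg']

# Intersection: filenames present in both lists.
in_both = sorted(set(antonio) & set(san_antonio))

# Symmetric difference: filenames present in exactly one list.
in_one = sorted(set(antonio) ^ set(san_antonio))

print(in_both)  # ['c.jpg', 'd.jpg']
print(in_one)   # ['a.jpg', 'b.jpg']
```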

## Accessing other metadata

Up to now we've only been grabbing filenames. However, by using the `GrabData` function, we can load any or all of our metadata fields.

If left blank, the default output grabs all available metadata into a `pandas.DataFrame`.

In [None]:
data = album.GrabData()
print(data)

We can also combine this function with our outputs from `FindSource` to grab the metadata only for the results of our earlier search queries. We can also choose which metadata fields to return. Using our list of `FileNames` we found earlier, let's return the `CreateDate` for all of those files:

In [None]:
data = album.GrabData(FileNames, fields=['CreateDate'])
print(data)

## Filtering by date

Of particular note, `GrabData` allows users to filter results to certain date ranges by specifying a `startdate` and `enddate`. This can be used to add another layer of refinement to searches. 

Returning to our earlier search query, one could easily implement a date range on that search as follows:

In [None]:
terms = ['Austin','Oxford','Singapore'] # Terms for which to look
fields = ['Coverage'] # XMP fields in which to look
subfolders = ['Travel'] # Sub-folders in which to search
startdate = '19990301_120000' # Beginning date, YYYYmmdd_HHMMSS
enddate = '20050915_180000' # End date, YYYYmmdd_HHMMSS

FileNames = album.FindSource(terms, fields, subfolders)
FileNames = album.GrabData(FileNames, fields=['SourceFile'],
                           startdate=startdate,
                           enddate=enddate).tolist()
print(FileNames)
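The timestamp strings above follow the `YYYYmmdd_HHMMSS` pattern, which corresponds to the `strptime` format `%Y%m%d_%H%M%S`. As a minimal sketch of what a date-range filter does, the example below uses only the standard library and hypothetical filename/timestamp pairs. It is not `GrabData`'s implementation, and treating both bounds as inclusive is an assumption made here.

```python
from datetime import datetime

FMT = '%Y%m%d_%H%M%S'  # YYYYmmdd_HHMMSS
start = datetime.strptime('19990301_120000', FMT)
end = datetime.strptime('20050915_180000', FMT)

# Hypothetical filename -> CreateDate pairs.
created = {
    'austin.jpg': '19980704_093000',
    'oxford.jpg': '20010612_141500',
    'singapore.jpg': '20060101_000000',
}

# Keep files whose timestamp falls inside the range (inclusive bounds assumed).
in_range = [name for name, stamp in created.items()
            if start <= datetime.strptime(stamp, FMT) <= end]
print(in_range)  # ['oxford.jpg']
```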

## Copying files

Lastly, while many of these features will be useful for plotting routines in the next example, perhaps your goal is even simpler: maybe you just want to find all the pictures you took on a family vacation and send them to someone else who was there.

Once you have a list of filenames, it's very simple to copy those files into a specific folder somewhere else on your computer. Just specify a destination location (which need not exist yet), and call `CopyFiles`.

**Note: `CopyFiles` needs each filename with its full path, so generate the list with `withPath = True`.**

In [None]:
dst = r'/Pictures/FamilyVacations' # relative or abs path

mv.CopyFiles(FileNames, dst)
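For readers curious about what such a copy step involves, here is a rough standard-library sketch. It is not the `CopyFiles` implementation; the `copy_all` helper is hypothetical, and temporary directories stand in for real folders so the example can run anywhere. It also shows why full paths matter: the copy can only locate a source file if it is given the complete path.

```python
import os
import shutil
import tempfile

def copy_all(paths, dst):
    """Copy each file in `paths` into `dst`, creating `dst` if needed."""
    os.makedirs(dst, exist_ok=True)
    return [shutil.copy2(p, dst) for p in paths]

# Build a throwaway source file to copy.
src_dir = tempfile.mkdtemp()
src_file = os.path.join(src_dir, 'beach.jpg')
with open(src_file, 'wb') as f:
    f.write(b'not a real image')

# The destination folder does not exist yet; copy_all creates it.
dst_dir = os.path.join(tempfile.mkdtemp(), 'FamilyVacations')
copied = copy_all([src_file], dst_dir)
print(copied)
```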