# Demonstrate Grouping Methods
This notebook demonstrates the various grouping methods that are available in `mdf_matio`.

In [1]:
from mdf_matio.grouping import groupby_directory, groupby_file
from materials_io.utils.interface import ParseResult
import os

## Make Example Files
The grouping operations all operate on the `ParseResult` objects returned by MaterialsIO.
None of the grouping operations require access to the files themselves; they operate purely on the information available in these records.

In [2]:
example_files = [
    ParseResult(('a.in',), 'fake', {}),
    ParseResult((os.path.join('d', 'a.in'), 'a.in',), 'fake', {}),
    ParseResult((os.path.join('d', 'a.in'),), 'fake', {}),
    ParseResult((os.path.join('d', 'b.in'),), 'fake', {}),
    ParseResult((os.path.join('e', 'a.in'),), 'fake', {})
]

In [3]:
example_files

[ParseResult(group=('a.in',), parser='fake', metadata={}),
 ParseResult(group=('d\\a.in', 'a.in'), parser='fake', metadata={}),
 ParseResult(group=('d\\a.in',), parser='fake', metadata={}),
 ParseResult(group=('d\\b.in',), parser='fake', metadata={}),
 ParseResult(group=('e\\a.in',), parser='fake', metadata={})]

## Illustrate Grouping Mechanisms
All of our grouping mechanisms return a generator to reduce memory requirements.
You can still use the result like a list in `for` loops and other operations that accept iterators, but a generator can only be consumed once and does not support indexing or `len()`.
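
As a minimal sketch of what working with a generator looks like, the example below uses `fake_groups`, a hypothetical stand-in that is not part of `mdf_matio`; it only mimics a function that yields groups lazily. It shows iterating directly and materializing the groups with `list()` when random access is needed.

```python
from itertools import islice


def fake_groups():
    """Hypothetical stand-in for a grouping function: yields groups one at a time."""
    yield ['a.in']
    yield ['d/a.in', 'd/b.in']
    yield ['e/a.in']


# Iterating works exactly as it would for a list
for i, group in enumerate(fake_groups()):
    print(f'Group {i+1}: {group}')

# A generator has no len() or indexing, so materialize it when you need those
groups = list(fake_groups())
print(len(groups), groups[0])

# islice takes only the first few groups without producing the rest
first_two = list(islice(fake_groups(), 2))
```

Materializing with `list()` gives up the memory savings, so it is best kept for cases where the number of groups is known to be small.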

### Group by Directory
One option for MDF users is to group files by directory.

In [4]:
for i, group in enumerate(groupby_directory(example_files)):
    print(f'Group {i+1}: {group}')

Group 1: [ParseResult(group=('a.in',), parser='fake', metadata={}), ParseResult(group=('d\\a.in', 'a.in'), parser='fake', metadata={})]
Group 2: [ParseResult(group=('d\\a.in',), parser='fake', metadata={}), ParseResult(group=('d\\b.in',), parser='fake', metadata={})]
Group 3: [ParseResult(group=('e\\a.in',), parser='fake', metadata={})]


Group by directory produces three groups:

- Group 1 contains two parse results: one for the single file `a.in` and one for the group of files (`d/a.in`, `a.in`). The directory of a group of files is their [common path](https://docs.python.org/3/library/os.path.html#os.path.commonpath).
- Group 2 contains the records whose files are in directory `d`. Note that `d/a.in` appears in multiple parsing records that are assigned different directories by the "common path" rule, and it therefore appears in multiple groups.
- Group 3 contains the only record in directory `e`.
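
The "common path" rule can be sketched in a few lines. This is an illustration under assumptions, not the `mdf_matio` implementation: records are represented as plain tuples of paths rather than `ParseResult` objects, and `group_directory` and `sketch_groupby_directory` are hypothetical helper names. The directory of a record is taken to be the common path of the parent directories of its files.

```python
import os
from collections import defaultdict


def group_directory(files):
    """Common path of the parent directories of a tuple of file paths (hypothetical helper)."""
    return os.path.commonpath([os.path.dirname(f) for f in files])


def sketch_groupby_directory(file_groups):
    """Bucket records by directory. An illustration only, not the library's code."""
    by_dir = defaultdict(list)
    for files in file_groups:
        by_dir[group_directory(files)].append(files)
    return dict(by_dir)


# Same layout as the example records, written with forward slashes
records = [
    ('a.in',),
    ('d/a.in', 'a.in'),
    ('d/a.in',),
    ('d/b.in',),
    ('e/a.in',),
]
by_directory = sketch_groupby_directory(records)
for directory, members in by_directory.items():
    print(repr(directory), members)
```

The record `('d/a.in', 'a.in')` lands in the same bucket as `('a.in',)` because the only path its two files share is the top-level directory, represented here by the empty string.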

### Group by File
The MDF merges all metadata records from a single file into a single record.
The `groupby_file` operation supports this functionality.

In [5]:
for i, group in enumerate(groupby_file(example_files)):
    print(f'Group {i+1}: {group}')

Group 1: [ParseResult(group=('e\\a.in',), parser='fake', metadata={})]
Group 2: [ParseResult(group=('d\\b.in',), parser='fake', metadata={})]
Group 3: [ParseResult(group=('d\\a.in',), parser='fake', metadata={}), ParseResult(group=('a.in',), parser='fake', metadata={}), ParseResult(group=('d\\a.in', 'a.in'), parser='fake', metadata={})]


Group by file produces three groups:

- Group 1 contains 1 record, `e/a.in`, as it is the only record that contains that file.
- Group 2 contains 1 record, `d/b.in`, as it is the only record that contains that file.
- Group 3 contains 3 records. Records 0 and 1 share `a.in`, and records 1 and 2 share `d/a.in`. All three records are therefore grouped together.
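
The transitive merging in Group 3 amounts to finding connected sets of records, where two records are linked if they share a file. The sketch below is one way to do that and is not taken from `mdf_matio`: `sketch_groupby_file` is a hypothetical name, and records are again plain tuples of paths rather than `ParseResult` objects.

```python
def sketch_groupby_file(file_groups):
    """Merge records that share a file, directly or through a chain of other records.

    An illustration only, not the library's code.
    """
    clusters = []  # each entry: (set of files seen, list of records)
    for files in file_groups:
        files_seen = set(files)
        members = [files]
        untouched = []
        for other_files, other_members in clusters:
            if files_seen & other_files:
                # Shared file: absorb the existing cluster into the new one
                files_seen |= other_files
                members = other_members + members
            else:
                untouched.append((other_files, other_members))
        untouched.append((files_seen, members))
        clusters = untouched
    return [members for _, members in clusters]


records = [
    ('a.in',),
    ('d/a.in', 'a.in'),
    ('d/a.in',),
    ('d/b.in',),
    ('e/a.in',),
]
file_groups = sketch_groupby_file(records)
for i, group in enumerate(file_groups):
    print(f'Group {i+1}: {group}')
```

The groups come out in a different order than in the notebook output above, but their membership is the same: one group of three records and two groups of one.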