# Concept Map Analysis
Analyse a set of dynamic graphs, computing the change in [betweenness centrality](https://en.wikipedia.org/wiki/Betweenness_centrality) pre- and post-intervention.

The intervention is an interdisciplinary lesson. Students are asked to create a concept map at the beginning and at the end of the lesson, using a given set of concepts. The maps are drawn in [Concept Map Creator](https://www.ddi.uni-konstanz.de/forschung/forschungsprojekte/concept-map-creator/).

The resulting concept maps can be downloaded from _Concept Map Creator_ as one zip file per class. Each zip file contains one folder per student, which holds all of that student's graphs.
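
To make the measure concrete, here is a minimal pure-Python sketch of betweenness centrality for an undirected, unweighted graph, following Brandes' algorithm. It is an illustration only and not the graph_tool routine used in the cells below. The function name `betweenness`, the adjacency-dict input, and the concept names are assumptions made for this example, and the normalisation (dividing by the number of node pairs that exclude the node itself) is chosen here so that scores lie between 0 and 1.

```python
from collections import deque


def betweenness(adjacency):
    """Normalised betweenness for an undirected, unweighted graph.

    adjacency maps each node to the list of its neighbours (hypothetical format).
    """
    scores = {v: 0.0 for v in adjacency}
    for source in adjacency:
        # Breadth-first search: count shortest paths (sigma) and record predecessors.
        order = []
        preds = {v: [] for v in adjacency}
        sigma = {v: 0 for v in adjacency}
        dist = {v: -1 for v in adjacency}
        sigma[source] = 1
        dist[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating path dependencies.
        delta = {v: 0.0 for v in adjacency}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                scores[w] += delta[w]
    # Every unordered pair was visited from both endpoints, so (n-1)(n-2)
    # equals twice the number of pairs that exclude the node itself.
    n = len(adjacency)
    pairs = (n - 1) * (n - 2)
    if pairs > 0:
        scores = {v: s / pairs for v, s in scores.items()}
    return scores


# Hypothetical concept map: one hub concept linked to three others.
star = {
    "force": ["mass", "acceleration", "velocity"],
    "mass": ["force"],
    "acceleration": ["force"],
    "velocity": ["force"],
}
print(betweenness(star))
```

In this star-shaped map every shortest path between two outer concepts runs through the hub, so the hub receives the highest possible score and the outer concepts receive none.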

In [8]:
%pip install networkx tqdm

Collecting tqdm
  Downloading tqdm-4.66.5-py3-none-any.whl.metadata (57 kB)
Downloading tqdm-4.66.5-py3-none-any.whl (78 kB)
Installing collected packages: tqdm
Successfully installed tqdm-4.66.5
Note: you may need to restart the kernel to use updated packages.


## Import data
  * read zip file
  * find students with at least two graphs
  * pick the oldest and most recent graph
  * analyse each graph for per-concept node centrality
  * return a dictionary of {concept : change}
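
The last three steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the notebook's own implementation: the helper name `centrality_change` is hypothetical, the input is assumed to be one student's dictionary of `{timestamp: {concept: centrality}}`, timestamps are assumed to sort chronologically as strings (as `2024-09-02 07:52:24` does), and a concept that is missing from one of the two graphs is assumed to have centrality 0.

```python
def centrality_change(graphs):
    """Return {concept: change} between a student's oldest and most recent graph.

    graphs maps a timestamp string to a {concept: centrality} dict
    (hypothetical helper; the input format is an assumption).
    """
    oldest, newest = min(graphs), max(graphs)
    before, after = graphs[oldest], graphs[newest]
    # Assumption: a concept absent from a graph counts as centrality 0.
    concepts = set(before) | set(after)
    return {c: after.get(c, 0.0) - before.get(c, 0.0) for c in concepts}


# Made-up values for illustration only.
example = {
    "2024-09-02 07:52:24": {"force": 0.0, "mass": 0.5},
    "2024-09-02 09:16:15": {"force": 1.0, "mass": 0.5, "velocity": 0.0},
}
print(centrality_change(example))
```

A positive value means the concept became more central in the student's map over the course of the lesson; a negative value means it became less central.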

## Massage Graphs
The produced graphs cannot be read by either networkx or graph_tool, as they do not conform to the GraphML spec.

Let's massage them a little:
   * fix graphml namespace definitions: 
     * replace `xmlns="http://graphml.graphdrawing.org/xmlns/graphml"` by `xmlns="http://graphml.graphdrawing.org/xmlns"`.
     * replace `xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns/graphml` by `xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns`
   * fix graphml attribute definitions:
     * add `attr.type="string"` to the keys with ids `d2`, `d7`, and `d8`.
     * remove the illegal `d13` key with `<key for="graphml"`
   * change graph from directed to undirected
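
The effect of the namespace and direction fixes can be checked with the standard library alone. The snippet below is a sketch on a tiny hand-written GraphML fragment; the fragment and the helper `root_namespace` are assumptions made for illustration and are not taken from the real export.

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"

# Hand-written stand-in for an exported file (not real data).
BROKEN = (
    '<graphml xmlns="http://graphml.graphdrawing.org/xmlns/graphml">'
    '<graph id="G" edgedefault="directed">'
    '<node id="n0"/><node id="n1"/>'
    '<edge source="n0" target="n1"/>'
    '</graph>'
    '</graphml>'
)


def root_namespace(xml_text):
    """Return the namespace URI of the root element, or '' if it has none."""
    tag = ET.fromstring(xml_text).tag
    return tag[1:].split("}")[0] if tag.startswith("{") else ""


# Apply the same two string replacements described in the list above.
fixed = BROKEN.replace(GRAPHML_NS + "/graphml", GRAPHML_NS)
fixed = fixed.replace('edgedefault="directed"', 'edgedefault="undirected"')

print(root_namespace(BROKEN))
print(root_namespace(fixed))
graph = ET.fromstring(fixed).find(f"{{{GRAPHML_NS}}}graph")
print(graph.get("edgedefault"))
```

A reader that looks elements up by the standard GraphML namespace finds the `graph` element only after the replacement, which is consistent with the unmodified files being rejected.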

In [124]:
def fix_graphml(xmlcontents):
    """Return the GraphML text with the repairs listed above applied."""
    # fix the namespace; this also covers the xsi:schemaLocation attribute
    xmlcontents = xmlcontents.replace("http://graphml.graphdrawing.org/xmlns/graphml", "http://graphml.graphdrawing.org/xmlns")
    # remove the illegal d13 key
    xmlcontents = xmlcontents.replace('<key for="graphml" id="d13" yfiles.type="resources"/>\n', '')
    # treat the concept map as an undirected graph
    xmlcontents = xmlcontents.replace('edgedefault="directed"', 'edgedefault="undirected"')
    # declare the missing attribute types
    xmlcontents = xmlcontents.replace('<key for="node" id="d2"', '<key for="node" id="d2" attr.type="string"')
    xmlcontents = xmlcontents.replace('<key for="edge" id="d7"', '<key for="edge" id="d7" attr.type="string"')
    xmlcontents = xmlcontents.replace('<key for="graph" id="d8"', '<key for="graph" id="d8" attr.type="string"')
    return xmlcontents

def read_student_graph(zip_file, folder_info):
    """Return {timestamp: {concept: centrality}} for one student folder, or None if it holds fewer than two graphs."""
    from io import TextIOWrapper
    import pathlib
    import graph_tool as gt
    import graph_tool.centrality  # importing the submodule makes gt.centrality available
    import os
    result = {}
    for fileinfo in zip_file.infolist():
        if not fileinfo.is_dir() and fileinfo.filename.startswith(folder_info.filename):
            print(f'Extracting {fileinfo.filename}')
            path = pathlib.Path('tmp/'+fileinfo.filename)
            with zip_file.open(fileinfo) as graph_file:
                contents = TextIOWrapper(graph_file, "UTF-8").read()
                fixed = fix_graphml(contents)

                # write the repaired file to tmp/ so that graph_tool can load it from disk
                os.makedirs(path.parent, exist_ok=True)
                with open(path, "w", encoding="UTF-8") as out:
                    out.write(fixed)
            graph = gt.load_graph(str(path.absolute()))
            prop = graph.new_vertex_property("double")
            # pass the property map by keyword so that it does not depend on positional order
            vc, ec = gt.centrality.betweenness(graph, vprop=prop)
            #gt.draw.graph_draw(graph)
            centrality_dict = dict(zip(graph.vertex_properties['id'], vc))
            result[path.stem] = centrality_dict  # only use vertex centrality
    if len(result) > 1:
        return result
    print(f'Ignoring {folder_info.filename} as we need at least two graphs')
    return None

def read_recording(filename):
    """Read a class zip file and map each student folder to that student's graphs.

    The value is the result of read_student_graph: {timestamp: {concept: centrality}},
    or None if the student has fewer than two graphs.
    """
    import zipfile
    result = {}
    with zipfile.ZipFile(filename) as recording:
        for fileinfo in recording.infolist():
            # each directory entry in the zip file is one student's folder
            if fileinfo.is_dir():
                result[fileinfo.filename] = read_student_graph(recording, fileinfo)

    return result

In [125]:
rec2Ma = read_recording('data/Robotics_Acceleration 2Ma.zip')
rec2Mf = read_recording('data/Robotics_Acceleration 2Mf.zip')
for ts, centrality in rec2Ma['nescherr@ksr.ch/'].items():
    print(ts, centrality)


Extracting libisseg@ksr.ch/2024-09-02 07:52:24.graphml
Ignoring libisseg@ksr.ch/ as we need at least two graphs
Extracting seanbuck@ksr.ch/2024-09-02 07:52:24.graphml
Ignoring seanbuck@ksr.ch/ as we need at least two graphs
Extracting iawegner@ksr.ch/2024-09-02 07:52:25.graphml
Extracting iawegner@ksr.ch/2024-09-02 09:16:15.graphml
Extracting lyaepper@ksr.ch/2024-09-02 07:52:24.graphml
Extracting lyaepper@ksr.ch/2024-09-02 09:16:14.graphml
Extracting tomfuchs@ksr.ch/2024-09-02 09:16:14.graphml
Ignoring tomfuchs@ksr.ch/ as we need at least two graphs
Extracting legoetsc@ksr.ch/2024-09-02 07:52:24.graphml
Ignoring legoetsc@ksr.ch/ as we need at least two graphs
Extracting noahelms@ksr.ch/2024-09-02 07:52:25.graphml
Ignoring noahelms@ksr.ch/ as we need at least two graphs
Extracting estlopez@ksr.ch/2024-09-02 07:52:24.graphml
Extracting estlopez@ksr.ch/2024-09-02 09:16:15.graphml
Extracting maxoeler@ksr.ch/2024-09-02 07:52:24.graphml
Ignoring maxoeler@ksr.ch/ as we need at least two graph