# Concept Map Analysis
Analyse a set of dynamic graphs, computing the change in [betweenness centrality](https://en.wikipedia.org/wiki/Betweenness_centrality) from pre- to post-intervention.

The intervention is an interdisciplinary lesson, and students are asked to create a concept map at the beginning and at the end of the lesson. Concept maps are created from a set of given concepts. Mapping is performed in [Concept Map Creator](https://www.ddi.uni-konstanz.de/forschung/forschungsprojekte/concept-map-creator/).

The resulting concept maps can be downloaded from _Concept Map Creator_ as one zip file per class. Each zip file contains a hierarchical structure: one folder per student, holding all of that student's graphs.
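
As a minimal sketch of walking that structure, the snippet below groups the graph files of a class archive by student folder and picks the earliest and latest file for each student. It assumes the files are named `<timestamp>.graphml`, so that string order equals chronological order. The helper name `pre_post_files` and the archive contents are hypothetical and only serve as illustration.

```python
import io
import zipfile
from collections import defaultdict
from pathlib import PurePosixPath


def pre_post_files(zip_file):
    """Map each student folder to its (oldest, newest) graph file name."""
    by_student = defaultdict(list)
    for info in zip_file.infolist():
        if info.is_dir():
            continue
        path = PurePosixPath(info.filename)
        if path.suffix == ".graphml":
            by_student[path.parent.name].append(info.filename)
    # Assumption: file names are timestamps, so min/max on the strings
    # yields the oldest and the most recent graph.
    return {
        student: (min(files), max(files))
        for student, files in by_student.items()
        if len(files) >= 2
    }


# Build a tiny in-memory archive with hypothetical students to try it out.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("alice/", "")
    archive.writestr("alice/2024-09-02 07:52:24.graphml", "<graphml/>")
    archive.writestr("alice/2024-09-02 09:16:15.graphml", "<graphml/>")
    archive.writestr("bob/", "")
    archive.writestr("bob/2024-09-02 07:52:25.graphml", "<graphml/>")

with zipfile.ZipFile(buffer) as archive:
    selected = pre_post_files(archive)
print(selected)
```

Students with a single graph (here `bob`) are dropped, since a pre/post comparison needs two graphs.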

The following libraries need to be installed, preferably via conda:
  * graph_tool
  * tqdm
  * pandas
  * jupyter
  * ipython

```conda install -c conda-forge jupyter ipython tqdm graph-tool pandas```


## Import data
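  * read zip file
  * find students with at least two graphs
  * analyse each graph for per-concept node centrality
  * return a dictionary of {student : {timestamp : {concept : betweenness}}}
  * the oldest and most recent graph per student are picked later, in _Processing_, to compute the per-concept change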
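
To make the centrality step concrete, here is a dependency-free sketch of normalized betweenness for a small undirected, unweighted graph, using Brandes' algorithm. It is an illustration only and not what the notebook runs, since the analysis itself relies on `graph_tool`. The function name `betweenness` and the toy concept map are hypothetical.

```python
from collections import deque


def betweenness(adjacency):
    """Normalized betweenness for an undirected, unweighted graph."""
    nodes = list(adjacency)
    score = {v: 0.0 for v in nodes}
    for source in nodes:
        # Breadth-first search: count shortest paths and record predecessors.
        dist = {source: 0}
        sigma = {v: 0 for v in nodes}
        sigma[source] = 1
        preds = {v: [] for v in nodes}
        order = []
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies, starting from the farthest nodes.
        dependency = {v: 0.0 for v in nodes}
        for w in reversed(order):
            for v in preds[w]:
                dependency[v] += sigma[v] / sigma[w] * (1 + dependency[w])
            if w != source:
                score[w] += dependency[w]
    n = len(nodes)
    # Every unordered pair is visited twice, so divide by (n - 1) * (n - 2).
    pairs = (n - 1) * (n - 2)
    return {v: (score[v] / pairs if pairs else 0.0) for v in nodes}


# Toy concept map: Motor - Roboter - Sensor - Beschleunigung
concept_map = {
    "Motor": ["Roboter"],
    "Roboter": ["Motor", "Sensor"],
    "Sensor": ["Roboter", "Beschleunigung"],
    "Beschleunigung": ["Sensor"],
}
centrality = betweenness(concept_map)
print(centrality)
```

In this chain, `Roboter` and `Sensor` each lie on the shortest path of two of the three node pairs they do not belong to, so both score 2/3, while the two end concepts score 0.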

## Massage Graphs
The produced graphs cannot be read by either networkx or graph_tool, as they do not conform to the GraphML spec.

Let's massage them a little:
   * fix graphml namespace definitions: 
     * replace `xmlns="http://graphml.graphdrawing.org/xmlns/graphml"` by `xmlns="http://graphml.graphdrawing.org/xmlns"`.
     * replace `xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns/graphml` by `xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns`
   * fix graphml attribute definitions:
     * add `attr.type="string"` to the keys with ids `d2`, `d7`, and `d8`.
     * remove the illegal `d13` key, which is declared with `<key for="graphml"`
   * change graph from directed to undirected
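
The effect of these replacements can be checked with the standard library alone. The sketch below applies three of them to a heavily shortened, hypothetical header (real exports contain many more keys and attributes) and parses the result with `xml.etree.ElementTree` to confirm the namespace, the added `attr.type`, and the edge default.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily shortened document in the style described above.
broken = (
    '<graphml xmlns="http://graphml.graphdrawing.org/xmlns/graphml">'
    '<key for="node" id="d2"/>'
    '<graph edgedefault="directed"/>'
    "</graphml>"
)

fixed = (
    broken
    .replace("http://graphml.graphdrawing.org/xmlns/graphml",
             "http://graphml.graphdrawing.org/xmlns")
    .replace('<key for="node" id="d2"',
             '<key for="node" id="d2" attr.type="string"')
    .replace('edgedefault="directed"', 'edgedefault="undirected"')
)

# ElementTree prefixes tag names with the namespace in curly braces.
NS = "{http://graphml.graphdrawing.org/xmlns}"
root = ET.fromstring(fixed)
key = root.find(f"{NS}key")
graph = root.find(f"{NS}graph")
print(root.tag, key.get("attr.type"), graph.get("edgedefault"))
```

Plain string replacement is brittle if the exporter ever changes attribute order or whitespace, so a check like this is a cheap way to notice when the fix-up silently stops matching.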

In [229]:
def fix_graphml(xmlcontents):
    """Fix up GraphML such that graph_tool can import it."""
    xmlcontents = xmlcontents.replace("http://graphml.graphdrawing.org/xmlns/graphml", "http://graphml.graphdrawing.org/xmlns")
    xmlcontents = xmlcontents.replace('<key for="graphml" id="d13" yfiles.type="resources"/>\n', '')
    xmlcontents = xmlcontents.replace('edgedefault="directed"', 'edgedefault="undirected"')
    xmlcontents = xmlcontents.replace('<key for="node" id="d2"', '<key for="node" id="d2" attr.type="string"')
    xmlcontents = xmlcontents.replace('<key for="edge" id="d7"', '<key for="edge" id="d7" attr.type="string"')
    xmlcontents = xmlcontents.replace('<key for="graph" id="d8"', '<key for="graph" id="d8" attr.type="string"')
    return xmlcontents

def read_student_graph(zip_file, folder_info):
    """Read the graphs of a single student folder, compute betweenness, return a dict
       {timestamp : { concept : betweenness } }."""
    from io import TextIOWrapper
    import pathlib
    import graph_tool as gt
    import graph_tool.centrality  # submodules are not loaded by `import graph_tool` alone
    import os
    result = {}
    for fileinfo in zip_file.infolist():
        if not fileinfo.is_dir() and fileinfo.filename.startswith(folder_info.filename):
            print(f'Extracting {fileinfo.filename}')
            path = pathlib.Path('tmp/'+fileinfo.filename)
            with zip_file.open(fileinfo) as graph_file:
                contents = TextIOWrapper(graph_file, "UTF-8").read()
                fixed = fix_graphml(contents)

                os.makedirs(path.parent, exist_ok=True)
                with open(path, "w", encoding="utf-8") as out:
                    out.write(fixed)
            graph = gt.load_graph(str(path.absolute()))
            vc, ec = gt.centrality.betweenness(graph)
            #gt.draw.graph_draw(graph)
            # only use vertex centrality
            centrality_dict = dict(zip(graph.vertex_properties['id'], vc))
            # path is of the form <timestamp>.graphml
            result[path.stem] = centrality_dict
    if len(result) > 1:
        return result
    print(f'Ignoring {folder_info.filename} as we need at least two graphs')
    return None

def read_recording(filename):
    """Read the given filename and produce a dictionary {student : betweenness_info}."""
    import zipfile
    from tqdm.auto import tqdm
    result = {}
    with zipfile.ZipFile(filename) as recording:
        for fileinfo in tqdm(recording.infolist()):
            if fileinfo.is_dir():
                # record only students with valid data (>= 2 graphs)
                data = read_student_graph(recording, fileinfo)
                if data:
                    result[fileinfo.filename[:-1]] = data

    return result

In [230]:
rec2Ma = read_recording('data/Robotics_Acceleration 2Ma.zip')
rec2Mf = read_recording('data/Robotics_Acceleration 2Mf.zip')

  0%|          | 0/37 [00:00<?, ?it/s]

Extracting libisseg@ksr.ch/2024-09-02 07:52:24.graphml
Ignoring libisseg@ksr.ch/ as we need at least two graphs
Extracting seanbuck@ksr.ch/2024-09-02 07:52:24.graphml
Ignoring seanbuck@ksr.ch/ as we need at least two graphs
Extracting iawegner@ksr.ch/2024-09-02 07:52:25.graphml
Extracting iawegner@ksr.ch/2024-09-02 09:16:15.graphml
Extracting lyaepper@ksr.ch/2024-09-02 07:52:24.graphml
Extracting lyaepper@ksr.ch/2024-09-02 09:16:14.graphml
Extracting tomfuchs@ksr.ch/2024-09-02 09:16:14.graphml
Ignoring tomfuchs@ksr.ch/ as we need at least two graphs
Extracting legoetsc@ksr.ch/2024-09-02 07:52:24.graphml
Ignoring legoetsc@ksr.ch/ as we need at least two graphs
Extracting noahelms@ksr.ch/2024-09-02 07:52:25.graphml
Ignoring noahelms@ksr.ch/ as we need at least two graphs
Extracting estlopez@ksr.ch/2024-09-02 07:52:24.graphml
Extracting estlopez@ksr.ch/2024-09-02 09:16:15.graphml
Extracting maxoeler@ksr.ch/2024-09-02 07:52:24.graphml
Ignoring maxoeler@ksr.ch/ as we need at least two graphs

  0%|          | 0/21 [00:00<?, ?it/s]

Extracting kevichau@ksr.ch/2024-09-02 07:47:57.graphml
Extracting kevichau@ksr.ch/2024-09-02 08:19:21.graphml
Extracting moeugste@ksr.ch/2024-09-02 07:54:51.graphml
Ignoring moeugste@ksr.ch/ as we need at least two graphs
Extracting lehasano@ksr.ch/2024-09-02 07:47:58.graphml
Ignoring lehasano@ksr.ch/ as we need at least two graphs
Extracting samuelle@ksr.ch/2024-09-02 07:47:57.graphml
Extracting samuelle@ksr.ch/2024-09-02 08:19:21.graphml
Extracting coroesch@ksr.ch/2024-09-02 07:54:52.graphml
Extracting coroesch@ksr.ch/2024-09-02 08:19:22.graphml
Extracting maschoen@ksr.ch/2024-09-02 07:54:51.graphml
Ignoring maschoen@ksr.ch/ as we need at least two graphs
Extracting nicoweik@ksr.ch/2024-09-02 07:54:51.graphml
Extracting nicoweik@ksr.ch/2024-09-02 08:19:20.graphml
Extracting noewirth@ksr.ch/2024-09-02 07:47:57.graphml
Extracting noewirth@ksr.ch/2024-09-02 08:19:20.graphml


## Processing
For each student:
  * compute delta pre/post
  * assemble in data table: 
    * rows: vertex names
    * columns: students
    * values: betweenness delta
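
A minimal sketch of this table assembly on made-up data is shown below. The student names, timestamps, and betweenness values are hypothetical; only the nesting {student : {timestamp : {concept : betweenness}}} follows the structure returned by `read_recording`. Collecting the per-student series in a dictionary and building the frame once is a design choice of this sketch, not necessarily how the notebook does it.

```python
import pandas as pd

# Hypothetical betweenness values for two students, keyed by timestamp.
recordings = {
    "student A": {
        "2024-09-02 07:52:24": {"Sensor": 0.0, "Motor": 0.2},
        "2024-09-02 09:16:15": {"Sensor": 0.5, "Motor": 0.1},
    },
    "student B": {
        "2024-09-02 07:47:57": {"Sensor": 0.1, "Motor": 0.0},
        "2024-09-02 08:19:21": {"Sensor": 0.3, "Motor": 0.0},
    },
}

columns = {}
for student, data in recordings.items():
    # Rows are concepts, columns are timestamps.
    df = pd.DataFrame.from_dict(data)
    # Timestamps sort chronologically as strings: min is pre, max is post.
    pre, post = min(df.columns), max(df.columns)
    columns[student] = df[post] - df[pre]

delta = pd.DataFrame(columns)
delta["mean"] = delta.mean(axis=1)
print(delta.sort_values(by="mean", ascending=False))
```

A concept that is missing from one of a student's two graphs yields `NaN` for that student, and `mean(axis=1)` skips such entries by default.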
    

In [236]:
import pandas as pd
import itertools
# Produce a single dataframe for all students, recording per student centrality delta.
delta = pd.DataFrame()
for idx, (student, data) in enumerate(itertools.chain(rec2Ma.items(), rec2Mf.items())):
    # Produce a DataFrame for each student:
    df = pd.DataFrame.from_dict(data)
    # Select the oldest column as pre-intervention and the most recent as post-intervention.
    # Column headers are timestamps, hence min/max will do. Rename accordingly.
    cols = list(df)
    df.rename(columns={min(cols): 'pre', max(cols): 'post'}, inplace=True)
    df['delta'] = df['post'] - df['pre']
    # Anonymize as "student X" instead of username to not leak student emails in public.
    delta["student " + str(idx)] = df['delta']
    if student.startswith('noewir'):
        df.sort_values(by='delta', ascending=False, inplace=True)
        s = df

delta['mean betweenness delta'] = delta.mean(axis=1)
delta.sort_values(by='mean betweenness delta', ascending=False, inplace=True)
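# Display one example student's pre/post/delta table (the output shown below).
# Evaluate `delta` instead to see the per-student deltas for the whole cohort.
s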


| Concept | pre | post | delta |
|---|---|---|---|
| Sensor | 0.0 | 0.519231 | 0.519231 |
| Gravitation | 0.025641 | 0.282051 | 0.25641 |
| Beschleunigung | 0.179487 | 0.352564 | 0.173077 |
| Motor | 0.0 | 0.153846 | 0.153846 |
| Schwerelosigkeit | 0.0 | 0.153846 | 0.153846 |
| Microbit | 0.25641 | 0.384615 | 0.128205 |
| Roboter | 0.269231 | 0.282051 | 0.012821 |
| Aktor | 0.0 | 0.0 | 0.0 |
| Ausrichtung | 0.0 | 0.0 | 0.0 |
| Erdanziehung | 0.0 | 0.0 | 0.0 |


### Other Stats

Compute the mean and maximum betweenness per concept, pre- and post-intervention, together with the mean delta.

In [232]:
import pandas as pd
import itertools
pre_df = pd.DataFrame()
post_df = pd.DataFrame()
# Produce a single dataframe for all students, recording per student centrality delta.
delta_df = pd.DataFrame()
for student, data in itertools.chain(rec2Ma.items(), rec2Mf.items()):
    df = pd.DataFrame.from_dict(data)
    cols = list(df)
    df.rename(columns={min(cols): 'pre', max(cols): 'post'}, inplace=True)
    pre_df[student] = df['pre']
    post_df[student] = df['post']
    delta_df[student] = df['post'] - df['pre']
    #print(df.iloc[0])

main = pd.DataFrame()
main['pre_mean'] = pre_df.mean(axis=1)
main['pre_max'] = pre_df.max(axis=1)
main['post_mean'] = post_df.mean(axis=1)
main['post_max'] = post_df.max(axis=1)
main['delta_mean'] = delta_df.mean(axis=1)
main.sort_values(by='delta_mean', inplace=True, ascending=False)
main


| Concept | pre_mean | pre_max | post_mean | post_max | delta_mean |
|---|---|---|---|---|---|
| Accelerometer | 0.029915 | 0.230769 | 0.313568 | 0.769231 | 0.283654 |
| Beschleunigung | 0.047009 | 0.179487 | 0.208689 | 0.619658 | 0.161681 |
| Sensor | 0.012821 | 0.115385 | 0.155983 | 0.519231 | 0.143162 |
| Gravitation | 0.070513 | 0.192308 | 0.170228 | 0.423077 | 0.099715 |
| Microbit | 0.053419 | 0.25641 | 0.149038 | 0.653846 | 0.09562 |
| Motor | 0.04594 | 0.346154 | 0.13141 | 0.448718 | 0.08547 |
| Ausrichtung | 0.0 | 0.0 | 0.063034 | 0.602564 | 0.063034 |
| Erdanziehung | 0.011752 | 0.102564 | 0.069801 | 0.525641 | 0.058048 |
| Aktor | 0.0 | 0.0 | 0.042557 | 0.294872 | 0.042557 |
| Roboter | 0.060897 | 0.269231 | 0.097756 | 0.410256 | 0.036859 |
