# Social Computing/Social Gaming - Summer 2022
# Exercise Sheet 1 - Introduction to Python
Welcome to the 2022 Social Computing / Social Gaming tutorial assignments. Python is the programming language of choice for all exercise sheets, so this sheet provides an introduction to Python. In the latter part of the sheet you will take on your first Social Computing task.

In addition to the exercise sheet iPython notebooks, it is essential that you look at the introduction videos and/or the introduction slides for every exercise sheet, both provided on Moodle. They contain **helpful hints** and describe the **form of the assignment**, which is **mandatory**!

## Task 1.1: Largest palindrome product
A palindromic number reads the same both ways. The largest palindrome made from the product of two 2-digit numbers is 9009 = 91 × 99.
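The 2-digit claim above can be checked by brute force. The following is a minimal sketch, not the expected solution: the helper name `is_palindrome` is hypothetical, and comparing a number's decimal string with its reverse is only one of several ways to test the property.

```python
def is_palindrome(n):
    """Return True if the decimal digits of n read the same in both directions."""
    digits = str(n)
    return digits == digits[::-1]


# Try every pair of 2-digit factors and keep the largest palindromic product.
largest_two_digit = max(
    a * b
    for a in range(10, 100)
    for b in range(a, 100)  # b >= a avoids checking each pair twice
    if is_palindrome(a * b)
)

print(largest_two_digit)  # 9009
```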

Find the largest palindrome made from the product of two 3-digit numbers.

**a)** Create the function findPalindrome(N), which returns 1 if N is a palindrome and 0 if it isn't.

**Hints:**

- In order to execute a code cell, press Shift + Enter.

In [None]:
# TODO:
def findPalindrome(N):



**b)** Now create the function maxPalindrome(), which computes the largest palindrome made from the product of two 3-digit numbers.

In [None]:
# TODO:
def maxPalindrome():


print(maxPalindrome())

This task is problem 4 of [ProjectEuler.net](https://projecteuler.net/about) [1]. If you enjoy solving this kind of mathematical riddle, please check the site out.

## Task 1.2: The Simpsons are introducing Social Computing
In social computing research, we need powerful tools to create, manipulate and display graphs. Luckily, there is a plethora of tools and libraries for that. 
NetworkX is a Python package for the creation, manipulation, and study of the structure, dynamics, and functions of complex networks.

Especially for the second exercise sheet, we are going to use [**NetworkX**](https://networkx.github.io) [2]. It provides rich graph data structures and many out-of-the-box functions to process graphs and calculate different metrics. The tasks below should make you familiar with the library.<br>
Please consult the [reference](https://networkx.github.io/documentation/stable/reference/index.html) [3] and the [tutorial](https://networkx.github.io/documentation/stable/tutorial.html) [4].

To give you a short background: graph visualization is the research area in mathematics and computer science concerned with drawing graphs. It has applications in many fields, one of which is social computing. The quality of a graph visualization is measured by certain criteria, for example crossing minimization and bend minimization. There are many graph drawing algorithms, and their quality varies with the graph's application and size. One technique is to draw graphs by using physical analogies.<br>
The basic idea of this technique is to associate the edges between graph nodes with physical forces acting upon the nodes and to compute an energy minimum. Once the dynamics induced by these forces are set off, the graph eventually settles into a natural, low-energy layout. A famous algorithm that implements this technique is the **Fruchterman-Reingold** algorithm. Its basic idea is to replace the graph edges with mechanical springs and let the springs move the system to a minimal energy state.
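As a small illustration of such a force-directed layout, NetworkX offers `spring_layout`, which its documentation describes as using the Fruchterman-Reingold algorithm. The toy graph, the seed and the iteration count below are assumptions chosen only for this sketch.

```python
import networkx as nx

# Hypothetical toy graph: a 4-cycle with one chord.
G = nx.Graph()
G.add_edges_from([("a", "b"), ("b", "c"), ("c", "d"), ("d", "a"), ("a", "c")])

# Compute 2D node positions with the force-directed layout.
# A fixed seed makes the otherwise random start positions reproducible.
pos = nx.spring_layout(G, seed=42, iterations=50)

for node, (x, y) in pos.items():
    print(f"{node}: ({x:.2f}, {y:.2f})")
```

The returned dictionary maps each node to its coordinates and can be passed as the `pos` argument to the NetworkX drawing functions.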


Furthermore, you will be working with large datasets, so you need to store the data in a suitable format. In our case, the pandas library is a good choice. 
A pandas DataFrame is a 2D tabular structure, not unlike a SQL table.
A pandas DataFrame consists of rows, columns and data.
For more information on the library, see the [pandas manual](https://pandas.pydata.org/pandas-docs/version/0.19.2) [5] and its [tutorial](https://pandas.pydata.org/pandas-docs/version/0.19.2/10min.html) [6] on pandas DataFrames.
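To make the rows/columns/data picture concrete, here is a minimal sketch of a DataFrame built by hand. The column names and values are made up for illustration and are not taken from the exercise datasets.

```python
import pandas as pd

# Hypothetical toy data: one row per person, one column per attribute.
df = pd.DataFrame({"Id": [1, 2, 3], "Label": ["Ann", "Ben", "Cleo"]})

print(df.shape)             # (3, 2): three rows, two columns
print(df.columns.tolist())  # ['Id', 'Label']

# Select the labels of all rows whose Id is greater than 1.
print(df.loc[df["Id"] > 1, "Label"].tolist())  # ['Ben', 'Cleo']
```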

In this exercise you will analyze a dataset about the TV show "The Simpsons".

First import the datasets, consisting of ``nodes.csv``, ``edges.csv`` and ``ep-char.csv``.

- **nodes.csv**: each vertex represents a character
- **edges.csv**: each edge connects a source character and a target character. Together the edges form an undirected graph of the characters who appeared together in an episode
- **ep-char.csv**: lists which character appeared in which episode


**Hints:**
- For most TODOs it is sufficient to look at the pandas manual and use pandas library functions  
- You can get a better overview of the dataframe by printing it

**Import** the necessary libraries for this exercise.

In [None]:
import networkx as nx
import pandas as pd
import matplotlib.pyplot as plt

**Read** the csv files into pandas dataframes.

In [None]:
# Read the csv into pandas DataFrames
df_edges = pd.read_csv("simpsons/edges.csv")
df_nodes = pd.read_csv("simpsons/nodes.csv")
df_epchar = pd.read_csv("simpsons/ep-char.csv")

# 226 is the number of the last episode in season 10.
HIGHEST_EPISODE = 226

**a)** Your first task is to **drop unwanted rows** in the episodes dataframe. We are only interested in Seasons 1-10.   

**Hint:** Unwanted rows are rows which have an ``episode_id`` higher than ``HIGHEST_EPISODE``.   
**Note:** This operation only deletes the rows; it does not change the weights of the characters. Do not worry about this.
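One common pandas pattern for removing rows by condition is to collect the index labels of the matching rows and pass them to `drop`. The sketch below uses a made-up DataFrame and a hypothetical threshold `MAX_ID`; it illustrates the pattern only and is not the required solution.

```python
import pandas as pd

# Hypothetical toy data, not the exercise dataset.
df = pd.DataFrame({"episode_id": [1, 5, 9, 12], "character_id": [10, 11, 10, 12]})
MAX_ID = 8  # hypothetical cut-off for this sketch

# Boolean mask -> index labels of the rows that exceed the cut-off.
index_to_drop = df[df["episode_id"] > MAX_ID].index

# drop() returns a new DataFrame unless inplace=True is given.
df_kept = df.drop(index_to_drop)
print(df_kept["episode_id"].tolist())  # [1, 5]
```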

In [None]:
# TODO: drop rows of the df_epchar DataFrame.



# Delete these row indices from the dataframe
df_epchar.drop(indexNames, inplace=True)

**b)** Now you can **merge** the DataFrames together to link the required information. This is not unlike the join operation in SQL.

Since we are only interested in characters from the first 10 seasons, create a DataFrame ``df_merged`` which only contains characters from the first 226 episodes.
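As a rough illustration of a SQL-like join in pandas, the sketch below merges two made-up tables on differently named key columns and then removes duplicate rows. All table names, column names and values are assumptions for this example.

```python
import pandas as pd

# Hypothetical toy tables.
characters = pd.DataFrame({"Id": [1, 2, 3], "Label": ["Ann", "Ben", "Cleo"]})
appearances = pd.DataFrame({"episode_id": [1, 1, 2], "character_id": [1, 2, 1]})

# Inner join: keep only characters that have at least one appearance.
merged = characters.merge(
    appearances, left_on="Id", right_on="character_id", how="inner"
)

# Ann appears in two episodes, so she shows up twice until duplicates are dropped.
unique_chars = merged[["Id", "Label"]].drop_duplicates()
print(sorted(unique_chars["Label"]))  # ['Ann', 'Ben']
```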

In [None]:
# TODO:
df_merged = 


# df_merged now contains only characters which appear in the first 10 seasons
df_merged.drop(['episode_id', 'character_id'], axis=1, inplace=True)

# TODO: now we have unnecessary information, drop the duplicates.


df_merged

**c)** Now take the DataFrame of the remaining characters and **merge** it with the edges.
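A left join keeps every row of the left table, even if it has no partner in the right table; the missing values are then filled with `NaN`. The following sketch shows this on made-up data, so the frames and values are assumptions.

```python
import pandas as pd

# Hypothetical toy tables: three nodes and three weighted edges.
nodes = pd.DataFrame({"Id": [1, 2, 3], "Label": ["Ann", "Ben", "Cleo"]})
edges = pd.DataFrame({"Source": [1, 1, 2], "Target": [2, 3, 3], "Weight": [4, 1, 7]})

# Left join on differently named key columns.
joined = nodes.merge(edges, how="left", left_on="Id", right_on="Source")

print(len(joined))                    # 4: two edges for Id 1, one for Id 2, one empty row for Id 3
print(joined["Source"].isna().sum())  # 1: node 3 never occurs as a source
```

Rows without a match can be removed afterwards with `dropna` if they are not needed.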

In [None]:
# TODO:
# Hint: Use a left join, left_on='Id', right_on='Source'
df_merged2 =


# Drop Type, as it is not that interesting
df_merged2 = df_merged2.drop(['Type'], axis=1)
df_merged2

**d)** Now we are only interested in **characters who have appeared at least 20 times together. Select those.**

In [None]:
# TODO: find the rows of the df_merged2 DataFrame to drop.


# Delete these row indices from the dataframe
df_merged2.drop(indexNames, inplace=True)
df_merged2

**e)** Now you have to **include your alter ego into the network**. Create a pandas Series with your name, your Id (which is 1337) and weights. Connect yourself to Bart Simpson. 
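Adding a single row to a DataFrame can be sketched as follows. Note that `DataFrame.append` was removed in pandas 2.0, so this example uses `pd.concat` instead; the column names and values are made up and do not correspond to the exercise data.

```python
import pandas as pd

# Hypothetical toy edge table.
edges = pd.DataFrame({"Source": [1, 2], "Target": [2, 3], "Weight": [4, 7]})

# A Series whose index matches the DataFrame's column names describes one row.
new_row = pd.Series({"Source": 99, "Target": 1, "Weight": 5})

# Turn the Series into a one-row DataFrame and concatenate it.
edges = pd.concat([edges, new_row.to_frame().T], ignore_index=True)
print(len(edges))  # 3
```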

In [None]:
# TODO:
# Create a series for your character who is connected to homer 234 times and add it to the dataframe


# TODO: 
# Append the newly created series to the pandas data frame


# Create the graph from the dataframe
graph = nx.from_pandas_edgelist(df_merged2, source="Id", target="Target", edge_attr=True)

**f)** **Draw** the resulting graph with the given options. Choose 2 [layout](https://networkx.github.io/documentation/stable/reference/drawing.html) [7] options that seem the most suitable for the data. Briefly discuss why you chose these over the others.
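To compare layouts visually, the same graph can be drawn several times with different position dictionaries. The sketch below assumes a small made-up graph and picks two layouts arbitrarily; it is not a recommendation for the Simpsons data.

```python
import matplotlib.pyplot as plt
import networkx as nx

# Hypothetical toy graph: a path of five nodes, closed into a ring.
G = nx.path_graph(5)
G.add_edge(0, 4)

# Each layout function returns a dict mapping nodes to 2D positions.
layouts = {
    "circular": nx.circular_layout(G),
    "spring": nx.spring_layout(G, seed=1),
}

# Draw the same graph once per layout, side by side.
fig, axes = plt.subplots(1, 2, figsize=(8, 4))
for ax, (name, pos) in zip(axes, layouts.items()):
    nx.draw(G, pos=pos, ax=ax, with_labels=True, node_color="#22FF22")
    ax.set_title(name)
```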

In [None]:
# Relabel the graph
df_nodes_labels_dict = df_nodes.set_index('Id').to_dict()['charname']
graph = nx.relabel_nodes(graph, df_nodes_labels_dict)

# Set the edge color according to the weight
edges, weights = zip(*nx.get_edge_attributes(graph, 'Weight').items())

# Style the graph
options = {
    "font_size" : 14,
    "font_color" : '#552222',
    "node_color" : '#22FF22',
    "width" : 5.0,
    "edgelist" : edges,
    "edge_color" : weights,
    "edge_cmap" : plt.cm.Blues
}

plt.figure(1, figsize=(40, 40))

# TODO: plot the graph



**TODO: Write your observations here:**

## References

[1] https://projecteuler.net/about
<br>[2] https://networkx.github.io
<br>[3] https://networkx.github.io/documentation/stable/reference/index.html
<br>[4] https://networkx.github.io/documentation/stable/tutorial.html
<br>[5] https://pandas.pydata.org/pandas-docs/version/0.19.2
<br>[6] https://pandas.pydata.org/pandas-docs/version/0.19.2/10min.html
<br>[7] https://networkx.github.io/documentation/stable/reference/drawing.html