## Problem set

You will analyze a dataset representing payments between agents. Each row represents a payment with the following columns:

- **Sourceid:** ID of the agent initiating the payment.
- **Targetid:** ID of the recipient agent.
- **Weights:** The amount of money transferred (payment amount).

## 1. Preprocessing

- Load the dataset into a pandas DataFrame and inspect the first few rows. Rename the columns to `source`, `target`, and `weight` for consistency.
- Keep only the rows whose weight is greater than or equal to 1000. This focuses the analysis on significant payments.
- Using `NetworkX`, create a directed graph from the filtered data. Each row should become an edge from `source` to `target` carrying a `weight` attribute.
- Create an initial visualization of the full graph with `spring_layout`, where node sizes are proportional to degree centrality and edge thickness is proportional to the weight.
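The preprocessing steps above can be sketched as follows. This is a minimal illustration, not a reference solution: the `raw` DataFrame is hypothetical toy data standing in for the real file, and the helper names `build_payment_graph` and `draw_payment_graph` are made up for this example. Note that `nx.DiGraph` stores one edge per ordered pair, so if the real data repeats a source-target pair, only the last row's weight is kept unless you aggregate first.

```python
import networkx as nx
import pandas as pd

# Hypothetical toy rows standing in for the real payments file.
raw = pd.DataFrame({
    "Sourceid": [1, 1, 2, 3, 4],
    "Targetid": [2, 3, 3, 1, 1],
    "Weights": [1500.0, 200.0, 3000.0, 1000.0, 999.0],
})
print(raw.head())  # inspect the first few rows


def build_payment_graph(df, min_weight=1000):
    """Rename columns, drop small payments, and build a weighted DiGraph."""
    df = df.rename(
        columns={"Sourceid": "source", "Targetid": "target", "Weights": "weight"}
    )
    df = df[df["weight"] >= min_weight]
    return nx.from_pandas_edgelist(
        df,
        source="source",
        target="target",
        edge_attr="weight",
        create_using=nx.DiGraph,
    )


def draw_payment_graph(G, seed=42):
    """Draw G with node size ~ degree centrality and edge width ~ weight."""
    import matplotlib.pyplot as plt  # imported here so the rest runs without it

    pos = nx.spring_layout(G, seed=seed)
    centrality = nx.degree_centrality(G)
    node_sizes = [3000 * centrality[n] for n in G.nodes()]
    max_w = max(w for _, _, w in G.edges(data="weight"))
    widths = [4 * w / max_w for _, _, w in G.edges(data="weight")]
    nx.draw_networkx(G, pos, node_size=node_sizes, width=widths)
    plt.axis("off")
    plt.show()


G = build_payment_graph(raw)
```

Calling `draw_payment_graph(G)` then produces the plot. The scaling constants (3000 and 4) are arbitrary choices for readability and will likely need tuning on the real data.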


*Q1: Based on the visualization, do you observe any highly connected nodes? How would you interpret these nodes in the context of payment flows?*

*Q2: Are there any isolated clusters or communities that stand out visually? What might these clusters represent in real-world terms?*

## 2. Basic Analysis

Carry out the following computations on the graph:

- Compute the number of nodes.
- Compute the number of edges.
- Compute the average degree.
- Create a histogram of node degrees (number of edges connected to each node). First calculate the degree (in-degree + out-degree) for each node, then plot the histogram.
- Calculate the density of the graph.
- Calculate the reciprocity, which measures the proportion of edges that are bidirectional (if an edge exists from node A to node B, does one also exist from B to A?). Hint: Use `nx.reciprocity()` for directed graphs.
- Find the nodes that have no incoming edges (in-degree = 0) and those that have no outgoing edges (out-degree = 0). Print their IDs and counts.
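As a rough illustration of these quantities, the sketch below computes them on a small hypothetical graph (the node names and weights are invented). For a directed graph, `G.degree()` already returns in-degree plus out-degree.

```python
from collections import Counter

import networkx as nx

# Hypothetical toy payment graph, not the real dataset.
G = nx.DiGraph()
G.add_weighted_edges_from([
    ("a", "b", 1500),
    ("b", "a", 2000),
    ("b", "c", 3000),
    ("d", "a", 1200),
])

n_nodes = G.number_of_nodes()
n_edges = G.number_of_edges()

# Total degree (in + out) per node, and its average.
degrees = dict(G.degree())
avg_degree = sum(degrees.values()) / n_nodes

# Degree distribution as counts; the same values can be fed to a histogram plot.
degree_counts = Counter(degrees.values())

density = nx.density(G)          # m / (n * (n - 1)) for a directed graph
reciprocity = nx.reciprocity(G)  # share of edges whose reverse edge also exists

no_incoming = [n for n, d in G.in_degree() if d == 0]
no_outgoing = [n for n, d in G.out_degree() if d == 0]

print(f"nodes={n_nodes}, edges={n_edges}, avg degree={avg_degree:.2f}")
print(f"density={density:.3f}, reciprocity={reciprocity:.2f}")
print(f"no incoming ({len(no_incoming)}): {no_incoming}")
print(f"no outgoing ({len(no_outgoing)}): {no_outgoing}")
```

For the histogram itself, passing `list(degrees.values())` to `matplotlib.pyplot.hist` is one option.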

*Q3: What does the average degree tell you about the structure of the network?*

*Q4: If a node has a high degree, what might that indicate about its role in this payment network?*

*Q5: How might having a few high-degree nodes affect the network's resilience? Would the network be more vulnerable if these high-degree nodes were removed?*

*Q6: Is there a high level of reciprocity? How does this relate to the nature of payment flows between nodes?*

*Q7: If many nodes have no incoming or outgoing edges, what might that suggest about the connectivity of the network?*

## 3. Weighted Analysis

- Calculate the in-strength (sum of incoming edge weights) and out-strength (sum of outgoing edge weights) of each node, and find the top 8 nodes with the highest total strength (in-strength + out-strength).
- Calculate the average in-strength and average out-strength of the nodes.
- Calculate the standard deviation of the edge weights. A higher standard deviation indicates more variability in payment amounts. Comment on whether the payments are relatively uniform or highly variable.
- Select the top 10 nodes by out-strength and visualize them as a subgraph. Color these nodes differently and show the edge weights.
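One possible way to compute the weighted quantities above is sketched here on a hypothetical toy graph. Two assumptions to flag: the standard deviation uses the population form (`statistics.pstdev`), and the subgraph keeps only 2 nodes because the toy graph has just 4; on the real data you would use 10.

```python
import statistics

import networkx as nx

# Hypothetical toy payment graph, not the real dataset.
G = nx.DiGraph()
G.add_weighted_edges_from([
    ("a", "b", 1500),
    ("b", "a", 2000),
    ("b", "c", 3000),
    ("d", "a", 1200),
])

# Strength = weighted degree: the sum of edge weights into / out of a node.
in_strength = dict(G.in_degree(weight="weight"))
out_strength = dict(G.out_degree(weight="weight"))
total_strength = {n: in_strength[n] + out_strength[n] for n in G}

top_total = sorted(total_strength, key=total_strength.get, reverse=True)[:8]

avg_in = sum(in_strength.values()) / G.number_of_nodes()
avg_out = sum(out_strength.values()) / G.number_of_nodes()

# Spread of payment amounts (population standard deviation; an assumption).
weights = [w for _, _, w in G.edges(data="weight")]
weight_std = statistics.pstdev(weights)

# Subgraph induced by the top senders (k_out=2 here; the exercise asks for 10).
k_out = 2
top_out = sorted(out_strength, key=out_strength.get, reverse=True)[:k_out]
H = G.subgraph(top_out)

print("top total strength:", top_total)
print("avg in/out strength:", avg_in, avg_out)
print("edge weight std:", round(weight_std, 1))
print("subgraph edges:", list(H.edges(data="weight")))
```

The average in-strength and average out-strength always coincide, since every payment is counted once as outgoing and once as incoming. To show the weights on the plot, `nx.draw_networkx_edge_labels` with `nx.get_edge_attributes(H, "weight")` is one option.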

*Q8: Nodes with high in-strength or out-strength represent significant "payment hubs" or "payment distributors." Can you identify these nodes? What real-world roles might they play?*

## 4. Centrality Measures

**Alternative Centrality Measures**

Calculate the following centrality measures and list the top 5 nodes for each:
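- *Degree Centrality*: the fraction of other nodes that a node is directly connected to.
- *Closeness Centrality*: how close a node is, on average, to the other nodes along shortest paths.
- *Eigenvector Centrality*: highlights nodes that are connected to other important nodes.
- *Load Centrality*: measures the fraction of all shortest paths passing through a node, emphasizing the node’s role in connecting different parts of the network.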

Create a scatter plot comparing degree centrality and eigenvector centrality, with one point per node.
In the plot, highlight and label the top 5 nodes for each of the two measures.
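The centrality computations could look roughly like the sketch below, run on a hypothetical toy graph. The helper `top_k` is invented for this example. Power iteration for eigenvector centrality does not always converge on directed graphs, so the call is wrapped in a `try`/`except`; treating non-convergence by skipping the measure is an assumption of this sketch, not a recommendation.

```python
import networkx as nx

# Hypothetical toy graph with an obvious hub "h".
G = nx.DiGraph()
G.add_edges_from([
    ("h", "a"), ("h", "b"), ("h", "c"),
    ("a", "h"), ("b", "h"), ("c", "h"),
    ("a", "b"),
])


def top_k(scores, k=5):
    """Return the k (node, score) pairs with the highest score."""
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]


measures = {
    "degree": nx.degree_centrality(G),
    "closeness": nx.closeness_centrality(G),
    "load": nx.load_centrality(G),
}

try:
    measures["eigenvector"] = nx.eigenvector_centrality(G, max_iter=1000)
except nx.PowerIterationFailedConvergence:
    # Assumption: skip the measure if power iteration fails to converge.
    print("eigenvector centrality did not converge")

for name, scores in measures.items():
    print(name, top_k(scores))

# Points for the scatter plot: (degree centrality, eigenvector centrality).
if "eigenvector" in measures:
    points = {
        n: (measures["degree"][n], measures["eigenvector"][n]) for n in G
    }
```

Each entry of `points` gives the x and y coordinates of one node, which can be passed to a scatter plot and annotated with the node ID for the top-ranked nodes.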

*Q9: Which centrality measure do you think best reflects the "influence" of a node in a payment network? Why?*

*Q10: How does the correlation between degree centrality and eigenvector centrality help you understand the network's structure?*

## 5. Community and Core-Periphery Structure

- Convert the directed graph to an undirected graph and apply the Louvain method for community detection. List the number of communities detected and the size of each.

Hint: You can use `community_louvain.best_partition()` from the `python-louvain` package (imported with `import community as community_louvain`).
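As an alternative to the hinted package, recent NetworkX releases (2.8 and later) ship their own Louvain implementation, `louvain_communities`. The sketch below uses that swapped-in function on a hypothetical toy graph made of two dense groups joined by a single edge; it returns a list of node sets rather than the node-to-community dictionary that `best_partition()` gives.

```python
import networkx as nx
from networkx.algorithms.community import louvain_communities

# Hypothetical toy graph: two fully connected groups joined by one edge.
D = nx.DiGraph()
group_a = ["a0", "a1", "a2", "a3"]
group_b = ["b0", "b1", "b2", "b3"]
for group in (group_a, group_b):
    D.add_edges_from((u, v) for u in group for v in group if u < v)
D.add_edge("a0", "b0")

# Louvain works on undirected graphs, so edge direction is dropped first.
U = D.to_undirected()

# A fixed seed makes the (randomized) result reproducible.
communities = louvain_communities(U, seed=42)
sizes = sorted(len(c) for c in communities)

print("number of communities:", len(communities))
print("community sizes:", sizes)
```

On the real payment graph, edge weights are used by default (`weight="weight"`); pass `weight=None` if you want an unweighted partition instead.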

- Find the k-core of the network for k=2 and determine the core and periphery nodes.
- Calculate the average clustering coefficient for the core nodes and compare it with that of the periphery nodes.
- Find the node with the maximum clustering coefficient.

Hint: Use `nx.k_core()` and `nx.clustering()`.
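A minimal sketch of the core-periphery steps follows, on a hypothetical toy graph. It assumes the undirected version of the network is used and that self-loops are removed first, since `nx.k_core()` raises an error when self-loops are present. Here "periphery" simply means every node outside the 2-core.

```python
import networkx as nx

# Hypothetical toy graph: a triangle (1, 2, 3) with a tail 3 - 4 - 5.
U = nx.Graph()
U.add_edges_from([(1, 2), (2, 3), (1, 3), (3, 4), (4, 5)])

# k_core does not accept self-loops, so drop them defensively.
U.remove_edges_from(list(nx.selfloop_edges(U)))

core = nx.k_core(U, k=2)
core_nodes = set(core.nodes())
periphery_nodes = set(U.nodes()) - core_nodes

# Clustering coefficients are computed on the full graph, then averaged per group.
clustering = nx.clustering(U)


def mean_clustering(nodes):
    """Average clustering coefficient over the given nodes (0.0 if empty)."""
    nodes = list(nodes)
    return sum(clustering[n] for n in nodes) / len(nodes) if nodes else 0.0


avg_core = mean_clustering(core_nodes)
avg_periphery = mean_clustering(periphery_nodes)

# Ties are broken by node order; several nodes may share the maximum.
max_node = max(clustering, key=clustering.get)

print("core:", sorted(core_nodes), "periphery:", sorted(periphery_nodes))
print("avg clustering core/periphery:", round(avg_core, 3), avg_periphery)
print("max clustering node:", max_node, clustering[max_node])
```

Whether clustering should be measured on the full graph (as here) or within the core subgraph alone is a design choice worth stating in your answer.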

## 6. Bonus Questions

**Analyze Network Robustness**

- Remove the top 5 nodes by betweenness centrality and analyze how the graph’s structure changes.
- Calculate the new number of nodes and edges, as well as the network density, after these nodes are removed.
- Look at the size of the largest strongly connected component (SCC) after the removal and compare it with the original.
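The robustness check can be sketched as below, using a hypothetical toy graph in which two directed cycles share one node. The helper names are invented, and the sketch removes only 1 node because the toy graph has 5; the exercise uses 5.

```python
import networkx as nx


def largest_scc_size(G):
    """Number of nodes in the largest strongly connected component."""
    return max((len(c) for c in nx.strongly_connected_components(G)), default=0)


def remove_top_betweenness(G, k=5):
    """Return a copy of G without its k highest-betweenness nodes."""
    bc = nx.betweenness_centrality(G)
    top = sorted(bc, key=bc.get, reverse=True)[:k]
    H = G.copy()
    H.remove_nodes_from(top)
    return H, top


# Hypothetical toy graph: two directed 3-cycles that share the node "h".
G = nx.DiGraph()
G.add_edges_from([
    ("a", "h"), ("h", "b"), ("b", "a"),
    ("c", "h"), ("h", "d"), ("d", "c"),
])

H, removed = remove_top_betweenness(G, k=1)  # k=1 for the toy; use 5 on real data

before = (G.number_of_nodes(), G.number_of_edges(), nx.density(G), largest_scc_size(G))
after = (H.number_of_nodes(), H.number_of_edges(), nx.density(H), largest_scc_size(H))

print("removed:", removed)
print("before (nodes, edges, density, largest SCC):", before)
print("after  (nodes, edges, density, largest SCC):", after)
```

In this toy case, removing the single shared node breaks both cycles, so the largest SCC collapses from the whole graph to a single node.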

*Q11: After removing the top 5 nodes by betweenness centrality, how has the network changed in terms of node count, edge count, and density?*

*Q12: How did the SCC change? What is your interpretation?*