# Advanced Options and Performance

This notebook covers advanced options (node attributes, custom constraints) and performance benchmarking context.


In [None]:
import random

import contextily as cx
import geodatasets
import geopandas as gpd
import matplotlib.patches as mpatches
import matplotlib.pyplot as plt
import matplotlib_inline
import networkx as nx
from IPython.display import Image
import numpy as np

from pysgn import (
    geo_barabasi_albert_network,
    geo_watts_strogatz_network,
)

matplotlib_inline.backend_inline.set_matplotlib_formats("svg")


We will reuse the same grocery-store dataset from Getting Started so this notebook can run independently.


In [None]:
gdf = (
    gpd.read_file(geodatasets.get_path("geoda.groceries"))
    .explode(index_parts=False)
    .reset_index(drop=True)
    .to_crs("EPSG:26971")
)

_, ax = plt.subplots(figsize=(6, 6))
gdf.plot(ax=ax, markersize=6, color="#2b8cbe", alpha=0.8)
cx.add_basemap(ax, source=cx.providers.CartoDB.Positron, crs=gdf.crs)
ax.set_title("Grocery store locations (geodatasets)")
ax.set_axis_off()

# More Options

## Including Node Attributes

When generating a network, you can choose to include node attributes from your GeoDataFrame. This is done using the `node_attributes` parameter in PySGN functions. You can specify:

- `True`: To include all columns from the GeoDataFrame as node attributes. This is the default behavior.
- A string or list of strings: To include only specific attributes by their column names.
- `False`: Only the position of the nodes will be saved as a `pos` attribute in the network.

Including node attributes allows you to leverage additional data in your network analysis, such as demographic information, geographic features, or any other relevant metadata.

We can keep using the grocery stores: randomly assign each store to a synthetic group ("Group A" or "Group B") and then visualize how that attribute flows through a generated network.

In [None]:
random.seed(42)  # fix the seed so the random group assignment is reproducible
gdf["group"] = random.choices(["Group A", "Group B"], k=len(gdf))
gdf[["group", "geometry"]].head()

By default, PySGN functions keep all columns from the GeoDataFrame as node attributes:

In [None]:
graph = geo_watts_strogatz_network(gdf, k=4, p=0.3, random_state=42)

groups = nx.get_node_attributes(graph, "group")
colors = {"Group A": "tab:blue", "Group B": "tab:orange"}
node_colors = [colors[groups[node]] for node in graph.nodes]

fig, ax = plt.subplots(figsize=(6, 6))
gdf.plot(ax=ax, markersize=8, color="#d9d9d9", alpha=0.4)
nx.draw(
    graph,
    pos=nx.get_node_attributes(graph, "pos"),
    node_color=node_colors,
    edge_color="#636363",
    node_size=20,
    width=0.5,
    ax=ax,
)
cx.add_basemap(ax, source=cx.providers.CartoDB.Positron, crs=gdf.crs)
legend_handles = [
    mpatches.Patch(color=color, label=group) for group, color in colors.items()
]
ax.legend(
    handles=legend_handles,
    title="Groups",
    loc="upper right",
    bbox_to_anchor=[1.5, 1.0],
)
ax.set_title("Watts-Strogatz network with group attribute")
ax.set_axis_off()


You can turn this off by setting `node_attributes=False`, or restrict it by passing the names of the columns that should be included in the network:

```python
# saving only a `pos` node attribute
geo_watts_strogatz_network(gdf, k=4, p=0.3, node_attributes=False)

# each node has `pos` and `group` attributes
geo_watts_strogatz_network(gdf, k=4, p=0.3, node_attributes="group")

# each node has `pos`, `group`, and `some_other_col` attributes
geo_watts_strogatz_network(gdf, k=4, p=0.3, node_attributes=["group", "some_other_col"])
```

The `node_attributes` parameter works the same way with `geo_barabasi_albert_network`.
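
To check which attributes ended up on the nodes, the standard NetworkX accessors are usually enough. The sketch below builds a tiny graph by hand as a stand-in for a generated network (the coordinates and groups are made up for illustration) and inspects it; the same calls should apply to any graph returned by the functions above.

```python
from collections import Counter

import networkx as nx

# Hand-built stand-in for a generated network: the coordinates and groups
# are made up and do not come from the grocery dataset.
toy_graph = nx.Graph()
toy_graph.add_node(0, pos=(0.0, 0.0), group="Group A")
toy_graph.add_node(1, pos=(1.0, 0.0), group="Group B")
toy_graph.add_node(2, pos=(0.0, 1.0), group="Group A")
toy_graph.add_edges_from([(0, 1), (1, 2)])

# Names of the attributes stored on one node
attribute_names = sorted(toy_graph.nodes[0])

# One attribute for every node, as a {node: value} dict
toy_groups = nx.get_node_attributes(toy_graph, "group")

# Number of nodes per group
group_counts = Counter(toy_groups.values())

print(attribute_names)
print(toy_groups)
print(group_counts)
```

If a column is missing from `attribute_names`, it was probably excluded through `node_attributes`.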


## Custom Constraints

Custom constraints allow you to impose specific rules on the connections between nodes in your network. This can be useful for modeling real-world scenarios where certain connections are only possible under specific conditions.

In PySGN, you can define a custom constraint function that takes two nodes as input and returns a boolean value indicating whether an edge should be created between them. This function can be passed to the network generation method using the `constraint` parameter.
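
A named function is often easier to read and test than a lambda. The sketch below assumes, as the lambdas in this notebook do, that each node argument exposes GeoDataFrame columns as attributes (for example `u.group`). The `SimpleNamespace` objects are only stand-ins for whatever node objects PySGN passes in, and the `chain` column is hypothetical.

```python
from types import SimpleNamespace


def same_group(u, v):
    """Allow an edge only when both nodes share the same `group` value."""
    return u.group == v.group


def same_group_different_chain(u, v):
    """Combine two conditions; `chain` is a hypothetical column."""
    return u.group == v.group and u.chain != v.chain


# Stand-in node objects with made-up values, used only to exercise the functions
store_a = SimpleNamespace(group="Group A", chain="X")
store_b = SimpleNamespace(group="Group A", chain="Y")
store_c = SimpleNamespace(group="Group B", chain="X")

print(same_group(store_a, store_b))
print(same_group(store_a, store_c))
print(same_group_different_chain(store_a, store_b))
```

Such a function would then be passed as `constraint=same_group` in place of a lambda.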

We'll continue with the grocery stores and the synthetic `group` attribute: first restricting edges to stores that share the same group, then allowing cross-group connections only.

In [None]:
graph = geo_watts_strogatz_network(
    gdf, k=4, p=0.3, constraint=lambda u, v: u.group == v.group
)

groups = nx.get_node_attributes(graph, "group")
colors = {"Group A": "tab:blue", "Group B": "tab:orange"}
node_colors = [colors[groups[node]] for node in graph.nodes]
edge_colors = [colors[groups[edge[0]]] for edge in graph.edges]

fig, ax = plt.subplots(figsize=(6, 6))
gdf.plot(ax=ax, markersize=6, color="#bdbdbd", alpha=0.5)
nx.draw(
    graph,
    pos=nx.get_node_attributes(graph, "pos"),
    node_color=node_colors,
    edge_color=edge_colors,
    node_size=20,
    width=0.4,
    ax=ax,
)
cx.add_basemap(ax, source=cx.providers.CartoDB.Positron, crs=gdf.crs)
legend_handles = [
    mpatches.Patch(color=color, label=group) for group, color in colors.items()
]
ax.legend(
    handles=legend_handles, title="Groups", loc="upper right", bbox_to_anchor=[1.5, 1.0]
)
ax.set_title("Same-group constraint (Watts-Strogatz)")
ax.set_axis_off()

Conversely, to connect nodes only to nodes of a different group:

In [None]:
graph = geo_watts_strogatz_network(
    gdf, k=4, p=0.3, constraint=lambda u, v: u.group != v.group
)

groups = nx.get_node_attributes(graph, "group")
colors = {"Group A": "tab:blue", "Group B": "tab:orange"}
node_colors = [colors[groups[node]] for node in graph.nodes]

fig, ax = plt.subplots(figsize=(6, 6))
gdf.plot(ax=ax, markersize=6, color="#bdbdbd", alpha=0.5)
nx.draw(
    graph,
    pos=nx.get_node_attributes(graph, "pos"),
    node_color=node_colors,
    edge_color="#636363",
    node_size=20,
    width=0.4,
    ax=ax,
)
cx.add_basemap(ax, source=cx.providers.CartoDB.Positron, crs=gdf.crs)
legend_handles = [
    mpatches.Patch(color=color, label=group) for group, color in colors.items()
]
ax.legend(
    handles=legend_handles, title="Groups", loc="upper right", bbox_to_anchor=[1.5, 1.0]
)
ax.set_title("Cross-group constraint (Watts-Strogatz)")
ax.set_axis_off()
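
A quick sanity check is to count how many edges of a generated graph break the rule it was built with. The helper below is a sketch (the name `count_violations` is ours, not part of PySGN). It takes an iterable of edges and a node-to-group mapping, so with a NetworkX graph you could pass `graph.edges` and `nx.get_node_attributes(graph, "group")`. The toy edges and groups here are made up.

```python
def count_violations(edges, groups, rule):
    """Count edges (u, v) for which rule(groups[u], groups[v]) is False."""
    return sum(1 for u, v in edges if not rule(groups[u], groups[v]))


# Made-up toy data standing in for a generated network
demo_groups = {0: "Group A", 1: "Group B", 2: "Group A", 3: "Group B"}
demo_edges = [(0, 1), (1, 2), (2, 3), (0, 2)]

# Edges that break a cross-group rule (both endpoints in the same group)
cross_group_violations = count_violations(
    demo_edges, demo_groups, lambda a, b: a != b
)
# Edges that break a same-group rule (endpoints in different groups)
same_group_violations = count_violations(
    demo_edges, demo_groups, lambda a, b: a == b
)

print(cross_group_violations)
print(same_group_violations)
```

A count of 0 would indicate that every edge respects the rule.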

# Performance

We measure and visualize the running time and peak memory use of our synthetic network models as the number of nodes grows, to show how computational cost scales with network size.

Note that performance can vary significantly with the parameters used in your network models (e.g., distance decay exponent, scaling factor, `k`, `m`). Here we set `m=10` for the geospatial Barabási-Albert network and `k=10`, `p=0.1` for the geospatial Watts-Strogatz network; all other parameters keep their default values.

**Hardware Specifications**

The results shown below were measured on the following hardware, with Python 3.13.2:

- CPU: 2.6 GHz 6-Core Intel Core i7
- Memory: 16 GB 2667 MHz DDR4
- Operating System: macOS Sequoia 15.3.2

In [None]:
Image("perf.png")


You can run the code below in a code cell to test the performance on your own hardware. This gives you a personalized estimate of how the algorithms will perform in your environment.

```python
import time
import tracemalloc

import pandas as pd
from matplotlib import ticker
from shapely.geometry import Point

num_points = 10000
x_coords = np.random.uniform(-5000, 5000, num_points)
y_coords = np.random.uniform(-5000, 5000, num_points)
points = [Point(x, y) for x, y in zip(x_coords, y_coords)]
points_gdf = gpd.GeoDataFrame(pd.DataFrame({"geometry": points}), crs="EPSG:3857")
gdf_list = [
    points_gdf.sample(n=n, replace=False, random_state=42)
    for n in [100, 1000, 2000, 5000, 8000, 10000]
]

perf = {
    "num_nodes": [],
    "time_s": [],
    "memory_mb": [],
    "model": [],
}
# Use a distinct loop variable so the grocery-store `gdf` defined earlier is not overwritten
for sample_gdf in gdf_list:
    for (
        model,
        func,
        params,
    ) in zip(
        ["Geo Barabási-Albert", "Geo Watts-Strogatz"],
        [
            geo_barabasi_albert_network,
            geo_watts_strogatz_network,
        ],
        [{"m": 10}, {"k": 10, "p": 0.1}],
    ):
        perf["num_nodes"].append(len(sample_gdf))
        perf["model"].append(model)
        tracemalloc.start()
        start_time = time.perf_counter()
        _ = func(sample_gdf, **params)
        elapsed = time.perf_counter() - start_time
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        perf["time_s"].append(elapsed)
        perf["memory_mb"].append(peak / 1024**2)

perf_df = pd.DataFrame(perf)

fig, ax = plt.subplots(1, 2, figsize=(12, 5))
for color, model, marker in zip(
    ["tab:blue", "tab:orange"],
    ["Geo Barabási-Albert", "Geo Watts-Strogatz"],
    ["o", "*"],
):
    ax[0].plot(
        perf_df.loc[perf_df["model"] == model, "num_nodes"],
        perf_df.loc[perf_df["model"] == model, "time_s"],
        color=color,
        marker=marker,
        label=model,
    )
    ax[0].grid(True, linestyle="--")
    ax[1].plot(
        perf_df.loc[perf_df["model"] == model, "num_nodes"],
        perf_df.loc[perf_df["model"] == model, "memory_mb"],
        color=color,
        marker=marker,
        label=model,
    )
    ax[1].grid(True, linestyle="--")

ax[0].set_xticks(perf_df.loc[perf_df["model"] == model, "num_nodes"])
ax[0].set_xlabel("number of nodes")
ax[0].set_ylabel("time (s)")
ax[0].set_title("Time Complexity")
ax[0].xaxis.set_major_formatter(ticker.StrMethodFormatter("{x:,.0f}"))

ax[1].set_xticks(perf_df.loc[perf_df["model"] == model, "num_nodes"])
ax[1].set_xlabel("number of nodes")
ax[1].set_ylabel("memory (MB)")
ax[1].set_title("Space Complexity")
ax[1].xaxis.set_major_formatter(ticker.StrMethodFormatter("{x:,.0f}"))
lines, labels = ax[1].get_legend_handles_labels()
fig.legend(lines, labels, loc="upper left", bbox_to_anchor=(1.0, 0.94))
fig.tight_layout()
```
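
The timing and memory bookkeeping in the loop above can be factored into a small helper. This is a sketch using only the standard library: `measure` is a hypothetical name, and the list-building workload is a placeholder for a call such as `geo_watts_strogatz_network(points_gdf, k=10, p=0.1)`. Keep in mind that `tracemalloc` only sees allocations made through Python's memory allocator, so memory used inside native libraries may be undercounted.

```python
import time
import tracemalloc


def measure(func, *args, **kwargs):
    """Run func once; return (result, elapsed seconds, peak traced memory in MB)."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()
    finally:
        # Always stop tracing, even if func raises
        tracemalloc.stop()
    return result, elapsed, peak / 1024**2


# Placeholder workload; swap in a network generator call
squares, elapsed_s, peak_mb = measure(lambda n: [i * i for i in range(n)], 100_000)
print(f"time: {elapsed_s:.4f} s, peak memory: {peak_mb:.2f} MB")
```

Wrapping the measurement in `try`/`finally` keeps a failed run from leaving tracing switched on for later cells.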


## Benchmark Context

- Environment: run in your local Python environment using the currently installed `pysgn` and dependency versions.
- Dataset size: 2,000 sampled points in a projected CRS (`EPSG:3857`).
- Model/parameters: `geo_watts_strogatz_network(gdf_bench, k=10, p=0.1, random_state=42)`.
- Note: elapsed time varies by hardware, BLAS backend, and package versions.


In [None]:
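import time

# The grocery GeoDataFrame is too small to sample 2,000 rows without replacement,
# so generate 2,000 synthetic points in a projected CRS instead.
rng = np.random.default_rng(42)
gdf_bench = gpd.GeoDataFrame(
    geometry=gpd.points_from_xy(
        rng.uniform(-5000, 5000, 2000), rng.uniform(-5000, 5000, 2000)
    ),
    crs="EPSG:3857",
)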

start = time.perf_counter()
graph_bench = geo_watts_strogatz_network(
    gdf_bench,
    k=10,
    p=0.1,
    random_state=42,
)
elapsed = time.perf_counter() - start

print(f"Nodes: {graph_bench.number_of_nodes():,}")
print(f"Edges: {graph_bench.number_of_edges():,}")
print(f"Elapsed time (s): {elapsed:.4f}")
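
Because a single timing can be noisy, one option is to repeat the run and report the median. The sketch below uses only the standard library; `time_repeated` is a hypothetical helper, and the sorting workload is a placeholder for the generator call above.

```python
import statistics
import time


def time_repeated(func, repeats=5):
    """Call func() `repeats` times and return the elapsed times in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    return timings


# Placeholder workload; replace with e.g.
# lambda: geo_watts_strogatz_network(gdf_bench, k=10, p=0.1, random_state=42)
timings = time_repeated(lambda: sorted(range(200_000), reverse=True), repeats=5)

print(f"median: {statistics.median(timings):.4f} s")
print(f"min:    {min(timings):.4f} s")
```

The median is less sensitive than the mean to an occasional slow run caused by other activity on the machine.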


## See also

- Previous: [Utilities](utils.ipynb)
- Model tutorials: [Network Models: WS + BA](network_models_ws_ba.ipynb)
- Start here: [Getting Started](getting_started.ipynb)
