# End-to-End Demo
## Running PageRank on Wikipedia With vs. Without `nx-cugraph`

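This notebook demonstrates a zero-code-change, end-to-end workflow using `cudf.pandas` and `nx-cugraph`. It loads the English Wikipedia link graph with pandas, computes PageRank with NetworkX, and lists the highest-ranked pages.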

Before running this notebook, check the [System Requirements](https://docs.rapids.ai/api/cugraph/stable/nx_cugraph/installation/#system-requirements).

In [1]:
# These two lines enable GPU acceleration; comment them out to run on the CPU only.
# The rest of the code stays the same!

%load_ext cudf.pandas
%env NX_CUGRAPH_AUTOCONFIG=True

env: NX_CUGRAPH_AUTOCONFIG=True
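
The two magics above only work inside IPython or Jupyter. For a stand-alone script, a rough equivalent is to start the interpreter through the `cudf.pandas` module with the same environment variable set. The sketch below only builds the command and environment and runs nothing; the helper name `gpu_launch_spec` and the script name `demo.py` are hypothetical, and it assumes the `python -m cudf.pandas` entry point described in the cuDF documentation is present in your installation.

```python
import os
import sys


def gpu_launch_spec(script, enable_gpu=True):
    """Return (command, environment) for running `script`.

    Hypothetical helper that mirrors the notebook's two magics for a
    stand-alone script. Nothing is executed here.
    """
    env = dict(os.environ)
    cmd = [sys.executable]
    if enable_gpu:
        # Same variable the notebook sets with %env.
        env["NX_CUGRAPH_AUTOCONFIG"] = "True"
        # Assumed script-mode counterpart of %load_ext cudf.pandas.
        cmd += ["-m", "cudf.pandas"]
    cmd.append(script)
    return cmd, env


cmd, env = gpu_launch_spec("demo.py")
print(" ".join(cmd[1:]))  # arguments that follow the interpreter path
```

The returned pair could be handed to `subprocess.run(cmd, env=env)`; passing `enable_gpu=False` leaves the command as a plain CPU run.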


In [2]:
import pandas as pd
import networkx as nx

Downloading the data

In [3]:
import gzip
import shutil
import urllib.request
from pathlib import Path

# Get the data
def download_datafile(url, file_path):
    compressed_path = file_path + ".gz"

    if not Path(file_path).exists():
        print(f"File not found. Downloading from {url}...")
        urllib.request.urlretrieve(url, compressed_path)

        print(f"\tDownloaded to {compressed_path}. Unzipping...")
        with gzip.open(compressed_path, 'rb') as f_in, open(file_path, 'wb') as f_out:
            shutil.copyfileobj(f_in, f_out)

        print("Done.")
    else:
        print(f"File already exists at {file_path}. Skipping download")
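
The decompression step in `download_datafile` can be checked without network access. The following is a minimal sketch that writes a tiny gzip archive to a temporary directory and unpacks it with the same `gzip.open` plus `shutil.copyfileobj` pattern; the helper name `decompress_gzip` and the sample rows are made up for illustration.

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def decompress_gzip(compressed_path, file_path):
    """Stream-decompress a .gz file without loading it fully into memory."""
    with gzip.open(compressed_path, "rb") as f_in, open(file_path, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "edges.csv"
    archive = Path(str(target) + ".gz")
    sample = b"1 2\n2 3\n"  # made-up rows in "src dst" form

    # Create a small archive to stand in for the downloaded file.
    with gzip.open(archive, "wb") as f:
        f.write(sample)

    decompress_gzip(archive, target)
    round_trip_ok = target.read_bytes() == sample

print(round_trip_ok)
```

Streaming through `copyfileobj` matters for the real edge list, which is far larger than this sample.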

In [4]:
import os

# Create the data directories if they do not exist yet
top_dir = "./data/"
if not os.path.exists(top_dir):
    print("creating data directory")
    os.mkdir(top_dir)
data_dir = "./data/wiki/"
if not os.path.exists(data_dir):
    print("creating wiki data directory")
    os.mkdir(data_dir)

creating wiki data directory


In [5]:
nodedata_url = "https://data.rapids.ai/cugraph/benchmark/enwiki-20240620-nodeids.csv.gz"
nodedata_path = data_dir + "enwiki-20240620-nodeids.csv"
download_datafile(nodedata_url, nodedata_path)

edgelist_url = "https://data.rapids.ai/cugraph/benchmark/enwiki-20240620-edges.csv.gz"
edgelist_path = data_dir + "enwiki-20240620-edges.csv"
download_datafile(edgelist_url, edgelist_path)

File not found. Downloading from https://data.rapids.ai/cugraph/benchmark/enwiki-20240620-nodeids.csv.gz...
	Downloaded to ./data/wiki/enwiki-20240620-nodeids.csv.gz. Unzipping...
Done.
File not found. Downloading from https://data.rapids.ai/cugraph/benchmark/enwiki-20240620-edges.csv.gz...
	Downloaded to ./data/wiki/enwiki-20240620-edges.csv.gz. Unzipping...
Done.


The dataset used in this notebook is licensed under the Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) License, available at https://creativecommons.org/licenses/by-sa/4.0/legalcode.en

Timed end-to-end code

In [6]:
%%time

# Read the Wikipedia Connectivity data from `edgelist_path`
edgelist_df = pd.read_csv(
    edgelist_path,
    sep=" ",
    names=["src", "dst"],
    dtype="int32",
)

CPU times: user 3.17 s, sys: 1.8 s, total: 4.97 s
Wall time: 2.32 s
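
The `%%time` magic is only available in IPython. When the same steps run as a plain script, wall-clock time can be collected with a small context manager. This is a sketch under that assumption: the name `timed` and the `timings` dictionary are hypothetical, and unlike `%%time` it reports wall time only, not CPU time.

```python
import time
from contextlib import contextmanager

timings = {}  # label -> elapsed wall-clock seconds


@contextmanager
def timed(label):
    """Record and print the wall-clock duration of the enclosed block."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = time.perf_counter() - start
        print(f"{label}: {timings[label]:.3f} s")


# Stand-in workload; in the demo this would wrap pd.read_csv, nx.pagerank, etc.
with timed("sum of squares"):
    total = sum(i * i for i in range(1000))
```

Keeping the results in a dictionary makes it easy to compare a CPU run against a GPU run step by step.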


In [7]:
%%time

# Read the Wikipedia Page metadata from `nodedata_path`
nodedata_df = pd.read_csv(
    nodedata_path,
    sep="\t",
    names=["nodeid", "title"],
    dtype={"nodeid": "int32", "title": "str"},
)

CPU times: user 471 ms, sys: 314 ms, total: 785 ms
Wall time: 293 ms


In [8]:
%%time

# Create a NetworkX graph from the connectivity info
G = nx.from_pandas_edgelist(
    edgelist_df,
    source="src",
    target="dst",
    create_using=nx.DiGraph,
)

CPU times: user 6.25 s, sys: 3.2 s, total: 9.45 s
Wall time: 9.44 s


In [9]:
%%time

# Run PageRank with NetworkX
nx_pr_vals = nx.pagerank(G)

CPU times: user 5.15 s, sys: 1.81 s, total: 6.95 s
Wall time: 6.94 s
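
For readers who want to see what the `nx.pagerank` call computes, here is a minimal pure-Python power-iteration sketch. It is not the NetworkX or cuGraph implementation; it only mirrors the documented NetworkX defaults (`alpha=0.85`, `max_iter=100`, `tol=1e-06`), spreads the rank of nodes without outgoing links uniformly, and assumes the edge list has no duplicate edges. The function name `pagerank_sketch` and the four-node graph are made up for illustration.

```python
def pagerank_sketch(edges, alpha=0.85, max_iter=100, tol=1e-6):
    """Approximate PageRank for a directed edge list via power iteration."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}  # start from a uniform distribution
    for _ in range(max_iter):
        # Rank held by nodes with no outgoing links is shared by everyone.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        base = (1.0 - alpha) / n + alpha * dangling / n
        new_rank = {node: base for node in nodes}
        for node in nodes:
            for dst in out_links[node]:
                # Each node splits its rank evenly among its out-links.
                new_rank[dst] += alpha * rank[node] / len(out_links[node])
        err = sum(abs(new_rank[node] - rank[node]) for node in nodes)
        rank = new_rank
        if err < n * tol:
            break
    return rank


# Made-up graph: node 2 receives links from nodes 1, 3, and 4.
edges = [(1, 2), (3, 2), (2, 1), (4, 2)]
ranks = pagerank_sketch(edges)
print(max(ranks, key=ranks.get))
```

The scores always sum to 1, which is why individual values on a graph the size of Wikipedia are so small.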


In [10]:
%%time

# Create a DataFrame containing the results
pagerank_df = pd.DataFrame({
    "nodeid": nx_pr_vals.keys(),
    "pagerank": nx_pr_vals.values()
})

CPU times: user 3.65 s, sys: 436 ms, total: 4.09 s
Wall time: 4.09 s


In [11]:
%%time
# Add the NetworkX results to `nodedata_df` as a new column
nodedata_df = nodedata_df.merge(pagerank_df, how="left", on="nodeid")

# Show the top 25 pages by PageRank value
nodedata_df.sort_values(by="pagerank", ascending=False).head(25)

CPU times: user 1min 11s, sys: 10.6 s, total: 1min 22s
Wall time: 1min 22s


| | nodeid | title | pagerank |
|---|---|---|---|
| 5387 | 5993 | "'Category:Living people'" | 0.001056 |
| 11549889 | 14054455 | "'en:User:COIBot#Blacklist'" | 0.000815 |
| 11549893 | 14054459 | "'en:User:COIBot#Monitor list'" | 0.00073 |
| 2757876 | 3424753 | "'Wikipedia:Deletion review'" | 0.000707 |
| 11549892 | 14054458 | "'en:User:COIBot#Whitelist'" | 0.000695 |
| 11549891 | 14054457 | "'en:User:COIBot#Monitorlist'" | 0.000695 |
| 11549890 | 14054456 | "'en:User:COIBot#Domainredlist'" | 0.000695 |
| 3848948 | 4812183 | "'Wikipedia:Articles for deletion/PAGENAME (2n... | 0.000616 |
| 411173 | 490251 | "'Help:Talk pages'" | 0.000336 |
| 70301 | 81293 | "'List of sovereign states'" | 0.000319 |
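
The merge-and-sort step is a left join followed by a descending sort. The sketch below restates that logic with plain dictionaries so the behavior for pages without a score is easy to see; the IDs, titles, and scores are made up, and `None` stands in for the missing value (`NaN`) that pandas would produce.

```python
# Made-up sample data; IDs, titles, and scores are illustrative only.
node_titles = {1: "Page A", 2: "Page B", 3: "Page C"}
pagerank = {1: 0.2, 2: 0.7}  # node 3 never appeared in the edge list

# Left join: keep every node and attach a score when one exists.
rows = [(nodeid, title, pagerank.get(nodeid)) for nodeid, title in node_titles.items()]

# Sort by score, highest first, with missing scores placed last
# (the default placement of missing values in pandas sort_values).
ranked = sorted(rows, key=lambda row: (row[2] is None, -(row[2] or 0.0)))

top = ranked[:2]
print(top)
```

Because the join keeps every row of the node table, pages that never appear in the edge list still show up, just without a score.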
