Update paper.md
xianggebenben committed Apr 5, 2024
1 parent 107ac39 commit 057e3e2
Showing 1 changed file with 2 additions and 13 deletions.

![An example of graph source localization.\label{fig:example}](SL_example.png)

Graph diffusion is a fundamental task in graph learning that aims to predict future graph cascade patterns given source nodes. Its inverse problem, graph source localization, has rarely been explored yet is an important topic: it aims to detect the source nodes given their future graph cascade patterns. As illustrated in \autoref{fig:example}, graph diffusion seeks to predict the cascade pattern $\{b,c,d,e\}$ from the source node $b$, whereas graph source localization aims to identify the source node $b$ from the cascade pattern $\{b,c,d,e\}$. Graph source localization underpins a broad spectrum of research and real-world applications, such as rumor detection [@evanega2020coronavirus], tracking the sources of computer viruses [@kephart1993measuring], and failure detection in smart grids [@amin2007preventing]. Hence, the graph source localization problem deserves attention and extensive investigation from machine learning researchers.

Owing to the importance of this problem, several open-source tools have been developed to support research on graph source localization; two recent examples are cosasi [@McCabe2022joss] and RPaSDT [@frkaszczak2022rpasdt]. However, they lack comprehensive simulations of information diffusion, real-world benchmark datasets, and up-to-date state-of-the-art source localization approaches. To fill this gap, we propose GraphSL, a new library that, to our knowledge, is the first to include both real-world benchmark datasets and recent source localization methods, enabling researchers and practitioners to easily evaluate novel techniques against appropriate baselines. These methods do not require prior knowledge of the sources (e.g., whether there is a single source or multiple sources) and can handle graph source localization under various diffusion simulation models, such as Independent Cascade (IC) and Linear Threshold (LT). GraphSL is also standardized: for instance, tests of all source inference methods return a Metric object, which provides five performance metrics (accuracy, precision, recall, F-score, and area under the ROC curve) for performance evaluation.
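
To make the five metrics concrete, the following is a minimal pure-Python sketch of how they can be computed for a single predicted source vector. The function name `evaluate_sources`, the score-based input, and the 0.5 decision threshold are hypothetical choices for illustration; this is not GraphSL's `Metric` implementation.

```python
def evaluate_sources(x_true, scores, threshold=0.5):
    """Compare a ground-truth source vector with predicted source scores (hypothetical helper).

    x_true: list of 0/1 labels, 1 if the node is a source
    scores: list of predicted scores, one per node (higher = more likely a source)
    Returns accuracy, precision, recall, F-score, and ROC AUC.
    """
    # Binarize the scores with an assumed threshold for the first four metrics.
    pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for t, p in zip(x_true, pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(x_true, pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(x_true, pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(x_true, pred) if t == 0 and p == 0)

    accuracy = (tp + tn) / len(x_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)

    # AUC from raw scores: probability that a random source is scored above
    # a random non-source (ties count as one half).
    pos = [s for t, s in zip(x_true, scores) if t == 1]
    neg = [s for t, s in zip(x_true, scores) if t == 0]
    if pos and neg:
        wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
                   for p in pos for q in neg)
        auc = wins / (len(pos) * len(neg))
    else:
        auc = float("nan")  # undefined when only one class is present

    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_score": f_score, "auc": auc}
```

For `x_true=[0, 1, 0, 0, 1]` and `scores=[0.1, 0.9, 0.6, 0.2, 0.4]`, this gives an accuracy of 0.6, a precision, recall, and F-score of 0.5, and an AUC of 5/6. The AUC is computed from the scores directly, so unlike the other four metrics it does not depend on the chosen threshold.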


# Problem Definition
Consider a graph $G=(V,E)$, where $V=\{v_1,\cdots,v_n\}$ and $E$ are the node set and the edge set, respectively, and $n=\vert V\vert$ is the number of nodes.
$Y_t\in \{0,1\}^{n}$ is the diffusion vector at time $t$: $Y_{t,i}=1$ means that node $v_i$ is diffused, while $Y_{t,i}=0$ means that node $v_i$ is not diffused.
$S$ is the set of source nodes, and $x\in \{0,1\}^n$ is the source vector, with $x_i=1$ if $v_i\in S$ and $x_i=0$ otherwise.
The diffusion process begins at timestamp $0$ and terminates at timestamp $T$. We denote the graph diffusion model, which maps $x$ to $Y_T$, by $\theta$; its inverse problem,
graph source localization, is to infer $x$ from $Y_{T}$:
\begin{align}
\theta^{-1}: Y_T \rightarrow x. \label{eq:source localization}
\end{align}
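
As a concrete instance of the forward model $\theta$, the sketch below implements the Independent Cascade (IC) model, mapping a source vector $x$ to a diffusion vector $Y_T$. The adjacency-list format, the uniform per-edge activation probability `p`, the optional `max_steps` cap, and the function name are assumptions for illustration; GraphSL's own simulation interface may differ.

```python
import random


def independent_cascade(adj, x, p=0.1, max_steps=None, seed=None):
    """Hypothetical Independent Cascade simulation: source vector x -> diffusion vector Y_T.

    adj: adjacency list, adj[i] = list of out-neighbours of node i
    x:   0/1 source vector of length n
    p:   activation probability, assumed uniform across edges
    max_steps: optional cap on the number of rounds (None = run until no change)
    """
    rng = random.Random(seed)
    n = len(x)
    y = list(x)  # Y_0 = x: the sources are diffused at time 0
    frontier = [i for i in range(n) if x[i] == 1]
    t = 0
    while frontier and (max_steps is None or t < max_steps):
        new_frontier = []
        for u in frontier:
            for v in adj[u]:
                # Each newly diffused node gets exactly one chance
                # to activate each of its not-yet-diffused neighbours.
                if y[v] == 0 and rng.random() < p:
                    y[v] = 1
                    new_frontier.append(v)
        frontier = new_frontier
        t += 1
    return y  # Y_T, where T is the round at which the process stopped
```

With `p=1.0` the cascade deterministically reaches every node reachable from the sources: on the directed path 0→1→2 plus an isolated node 3, the source vector `[1, 0, 0, 0]` yields `[1, 1, 1, 0]`. Source localization then asks to recover `[1, 0, 0, 0]` from that output, which is ambiguous in general because several source sets can produce the same $Y_T$.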

# Methods and Benchmark Datasets

![The hierarchical structure of our GraphSL library version 0.1.\label{fig:overview}](overview.png)

Table: \label{tab:statistics} Statistics of the eight benchmark datasets.

Aside from methods, we also provide eight benchmark datasets to facilitate research on graph source localization; their statistics are shown in \autoref{tab:statistics}. Memetracker and Digg provide seed-diffusion vector pairs $(x,Y_{T})$. The other datasets do not include such pairs, but the pairs can be generated through information diffusion simulations.
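
One way such pairs could be generated is to sample random source sets and run a diffusion model on each. The sketch below does this with the Linear Threshold (LT) model. The function names, the weighted in-neighbour format, the 10% default source ratio, and the uniformly drawn thresholds are assumptions for illustration, not the procedure GraphSL uses.

```python
import random


def linear_threshold(adj_in, x, seed=None):
    """Hypothetical Linear Threshold simulation: source vector x -> diffusion vector Y_T.

    adj_in[v] = list of (u, w) pairs: in-neighbour u influences v with weight w,
    with the weights into each node assumed to sum to at most 1.
    Each node draws a threshold uniformly from [0, 1); it becomes diffused once
    the total weight of its diffused in-neighbours reaches that threshold.
    """
    rng = random.Random(seed)
    n = len(x)
    thresholds = [rng.random() for _ in range(n)]
    y = list(x)  # sources are diffused from the start
    changed = True
    while changed:  # repeat until no node changes state (T reached)
        changed = False
        for v in range(n):
            if y[v] == 0:
                influence = sum(w for u, w in adj_in[v] if y[u] == 1)
                if influence >= thresholds[v]:
                    y[v] = 1
                    changed = True
    return y


def make_pairs(adj_in, n, num_pairs, source_ratio=0.1, seed=0):
    """Generate hypothetical seed-diffusion pairs (x, Y_T) with random source sets."""
    rng = random.Random(seed)
    k = max(1, int(source_ratio * n))  # number of sources per sample
    pairs = []
    for _ in range(num_pairs):
        sources = set(rng.sample(range(n), k))
        x = [1 if i in sources else 0 for i in range(n)]
        y = linear_threshold(adj_in, x, seed=rng.randrange(2**32))
        pairs.append((x, y))
    return pairs
```

Because LT activation is monotone (more diffused neighbours can only increase a node's influence), sweeping over the nodes in place reaches the same final diffused set as a round-by-round update, so the sketch only needs to loop until nothing changes.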

# Availability and Documentation
