DAGs with NO TEARS 🚫💧

This is an implementation of the following paper:

DAGs with NO TEARS: Continuous Optimization for Structure Learning (NeurIPS 2018, Spotlight)

Xun Zheng, Bryon Aragam, Pradeep Ravikumar, Eric Xing.

If you find it useful, please consider citing:

@inproceedings{zheng2018dags,
    author = {Zheng, Xun and Aragam, Bryon and Ravikumar, Pradeep and Xing, Eric P.},
    booktitle = {Advances in Neural Information Processing Systems},
    title = {{DAGs with NO TEARS: Continuous Optimization for Structure Learning}},
    year = {2018}
}

tl;dr Structure learning in <60 lines

Check out notears.py for a complete, end-to-end implementation of the NOTEARS algorithm in fewer than 60 lines.

This includes L2, Logistic, and Poisson loss functions with L1 penalty.
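
If you would rather call the solver from Python, a minimal sketch looks like the following. The import path, the function name notears_linear, and its signature are assumptions for illustration; check notears.py for the actual entry point and defaults.

import numpy as np
from notears import notears_linear  # assumed name; see notears.py for the real entry point

X = np.loadtxt('X.csv', delimiter=',')                  # n x d data matrix
W_est = notears_linear(X, lambda1=0.1, loss_type='l2')  # estimated weighted adjacency (d x d)
np.savetxt('W_est.csv', W_est, delimiter=',')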

Introduction

A directed acyclic graphical model (aka Bayesian network) with d nodes defines a distribution over a random vector of size d. We are interested in the Bayesian Network Structure Learning (BNSL) problem: given n samples from such a distribution, how can we estimate the underlying graph G?

A major challenge of BNSL is enforcing the directed acyclic graph (DAG) constraint, which is combinatorial. While existing approaches rely on local heuristics, we introduce a fundamentally different strategy: we formulate it as a purely continuous optimization problem over real matrices that avoids this combinatorial constraint entirely. In other words,

min_{W} F(W)  subject to  G(W) ∈ DAGs        ⟺        min_{W} F(W)  subject to  h(W) = 0

where h is a smooth function whose level set exactly characterizes the space of DAGs.
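
For the linear case the paper instantiates h as h(W) = tr(e^{W∘W}) − d, which is zero exactly when the weighted adjacency matrix W has no directed cycles. A quick standalone sanity check of this characterization (not part of the package):

import numpy as np
from scipy.linalg import expm

def h(W):
    # Smooth acyclicity measure: tr(exp(W ∘ W)) - d, where ∘ is the elementwise product.
    # h(W) = 0 if and only if W is the weighted adjacency matrix of a DAG.
    d = W.shape[0]
    return np.trace(expm(W * W)) - d

W_dag = np.array([[0.0, 1.5],
                  [0.0, 0.0]])   # single edge 0 -> 1: acyclic
W_cyc = np.array([[0.0, 1.5],
                  [0.7, 0.0]])   # edges 0 -> 1 and 1 -> 0: a 2-cycle

print(h(W_dag))   # ~0.0
print(h(W_cyc))   # > 0, grows with the cycle weights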

Requirements

  • Python 3.6+
  • numpy
  • scipy
  • python-igraph: Install igraph C core and pkg-config first.

Contents (New version)

  • notears.py - the 60-line implementation of NOTEARS with L1 regularization for various losses
  • utils.py - graph simulation, data simulation, and accuracy evaluation

Running a simple demo

The simplest way to try out NOTEARS is to run a simple example:

$ git clone https://github.com/xunzheng/notears.git
$ cd notears/
$ python src/notears.py

This runs the L1-regularized NOTEARS on a randomly generated 20-node Erdos-Renyi graph with 100 samples. Within a few seconds, you should see output like this:

{'fdr': 0.0, 'tpr': 1.0, 'fpr': 0.0, 'shd': 0, 'nnz': 20}

The data, ground truth graph, and the estimate will be stored in X.csv, W_true.csv, and W_est.csv.
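
Here fdr, tpr, fpr, shd, and nnz stand for false discovery rate, true positive rate, false positive rate, structural Hamming distance, and the number of predicted edges. utils.py already computes these; the snippet below is only a rough standalone check of the saved files (the 0.3 cutoff and the simplified edge-disagreement count, which counts a reversed edge twice, are illustrative choices):

import numpy as np

W_true = np.loadtxt('W_true.csv', delimiter=',')   # ground-truth weighted adjacency
W_est = np.loadtxt('W_est.csv', delimiter=',')     # NOTEARS estimate

B_true = np.abs(W_true) > 0     # true edge set
B_est = np.abs(W_est) > 0.3     # threshold small weights to get predicted edges

print('predicted edges:', int(B_est.sum()))
print('edge disagreements:', int(np.sum(B_true != B_est)))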

Running as a command

Alternatively, if you have a CSV data file X.csv, you can install the package and run the algorithm as a command:

$ pip install git+git://github.com/xunzheng/notears
$ notears_linear_l1 X.csv

The output graph will be stored in W_est.csv.
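
The command expects X.csv to be a plain comma-separated n x d matrix, one sample per row and presumably no header row (the demo writes its files the same way). A minimal sketch of producing such a file from a NumPy array; the random data here only illustrates the format:

import numpy as np

X = np.random.default_rng(0).normal(size=(100, 20))   # stand-in for your own n x d data
np.savetxt('X.csv', X, delimiter=',')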

Examples: Erdos-Renyi graph

  • Ground truth: d = 20 nodes, 2d = 40 expected edges.

    ER2_W_true
  • Estimate with n = 1000 samples: lambda = 0, lambda = 0.1, and FGS (baseline).

    ER2_W_est_n1000

    Both lambda = 0 and lambda = 0.1 are close to the ground truth graph when n is large.

  • Estimate with n = 20 samples: lambda = 0, lambda = 0.1, and FGS (baseline).

    ER2_W_est_n20

    When n is small, lambda = 0 performs worse while lambda = 0.1 remains accurate, showing the advantage of L1-regularization.

Examples: Scale-free graph

  • Ground truth: d = 20 nodes, 4d = 80 expected edges.

    SF4_W_true

    The degree distribution is significantly different from that of the Erdos-Renyi graph. One nice property of our method is that it is agnostic about the graph structure.

  • Estimate with n = 1000 samples: lambda = 0, lambda = 0.1, and FGS (baseline).

    SF4_W_est_n1000

    The observation is similar to the Erdos-Renyi case: both lambda = 0 and lambda = 0.1 accurately estimate the ground truth when n is large.

  • Estimate with n = 20 samples: lambda = 0, lambda = 0.1, and FGS (baseline).

    SF4_W_est_n20

    Similarly, lambda = 0 suffers from small n while lambda = 0.1 remains accurate, showing the advantage of L1-regularization.

Other implementations
