Python package for estimating `idLogit` models

idlogit (CURRENTLY IN DEVELOPMENT)

idlogit is a Python package for estimating idLogit models, that is, Logit models with Idiosyncratic Deviations. The idLogit is a non-parametric model of choice heterogeneity whose maximum likelihood estimation problem is convex.

See this article for methodological details.

Installing

Install with pip install idlogit. The package requires numpy, scipy, and ecos.

Using idlogit

Basic Syntax

The most basic call is

x , info = idlogit( K , I , N , y , X , ind )

where

  • K (integer) is the number of model features
  • I (integer) is the number of individuals in the observations
  • N (integer) is the number of observations
  • y (numpy.array) is an N-vector of (binary) choices, coded as +/- 1
  • X (numpy.array or scipy.sparse) is an N x K matrix of observation-specific features (dense or sparse)
  • ind (list or numpy.array) is an N-vector of observation-individual assignments in {1,...,I}

and

  • x (numpy.array) is a K-vector of estimated coefficients
  • info (numpy.array) is the ECOS information structure resulting from the solve attempt
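
As a rough illustration of the argument shapes listed above, the following sketch builds a small synthetic data set with numpy. The coefficient values, sizes, and random seed are made up for the example, and the import path for idlogit is not stated in this README, so the call itself is only shown as a comment.

```python
import numpy as np

rng = np.random.default_rng(0)

K, I, N = 3, 5, 40          # features, individuals, observations

# N x K matrix of observation-specific features (dense here)
X = rng.normal(size=(N, K))

# Observation-to-individual assignments, using the 1-based labels {1,...,I}
ind = rng.integers(1, I + 1, size=N)

# Binary choices coded as +/- 1, simulated from a plain logit.
# beta_true is a hypothetical coefficient vector used only to generate data.
beta_true = np.array([1.0, -0.5, 0.25])
p = 1.0 / (1.0 + np.exp(-X @ beta_true))
y = np.where(rng.random(N) < p, 1, -1)

# With the package installed, the documented call would then be
#   x, info = idlogit(K, I, N, y, X, ind)
```

Note that ind uses labels starting at 1, not 0, which is easy to get wrong when the labels come from numpy indexing.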

Additional options are covered below. If a sparse X matrix is passed, it is converted internally to a scipy.sparse.coo_matrix before use. If a dense X matrix is passed, it is not treated as sparse; that is, idlogit presumes all of X's entries are nonzero. If many entries are actually zero (for example, because the data contains hard-coded dummies), passing a sparse matrix may be much more efficient.
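
To make the dense versus sparse choice concrete, here is a minimal sketch that builds a feature matrix with hard-coded dummies, measures how many entries are nonzero, and converts it to COO format with scipy. The column layout and sizes are invented for the example; only the use of scipy.sparse.coo_matrix comes from the text above.

```python
import numpy as np
from scipy import sparse

rng = np.random.default_rng(1)
N = 1000

# One numerical column plus three hard-coded 0/1 dummy columns (mostly zeros)
numeric = rng.normal(size=(N, 1))
level = rng.integers(0, 4, size=N)               # level 0 acts as the reference level
dummies = (level[:, None] == np.arange(1, 4)).astype(float)
X_dense = np.hstack([numeric, dummies])

# Fraction of entries that are actually nonzero
density = np.count_nonzero(X_dense) / X_dense.size

# COO is the format the README says sparse inputs are converted to internally
X_sparse = sparse.coo_matrix(X_dense)
```

Here each row has at most one nonzero dummy, so roughly half of the stored entries in the dense matrix are zeros that the sparse form does not keep.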

Options

Options that can currently be passed:

  • constant (boolean) Whether to include a constant in the model. If True, the returned x has K+1 elements, and its first element is the estimated parameter for the constant.
  • outopt (boolean) Is there an "outside good", "outside option", or no-choice option?
  • Lambdas (list) A 2-element list of L1 and L2 penalty parameter values (respectively).
  • bin (list) A list of indices from 1,...,K that identify which variables in X are binary (0/1). Indices must be mutually exclusive with cat. Binary variables are encoded with a single dummy equal to 1 for any "truthy" value in X. Variables not in bin or cat are interpreted as numerical and not transformed.
  • cat (list) A list of indices from 1,...,K that identify which are categorical (finite, with level-specific coefficients). Indices must be mutually exclusive with bin. Categorical variables are analyzed for their cardinality and subsequently "expanded" into level-dummies whose coefficients are constrained to sum to zero for identification. Variables not in bin or cat are interpreted as numerical and not transformed.
  • prints (dict) A dictionary of prints of the ECOS data created (for debugging, really). Valid keys are start, costs, lineq, lerhs, cones, ccrhs, and valid values are booleans (or anything "truthy").

Any other keyword arguments are treated as ecos-python options and passed directly to ECOS as **kwargs.
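
The sketch below collects the options above into a dictionary and illustrates two of the rules stated for bin and cat: indices are 1-based, and the two lists must not overlap. The helpers check_bin_cat and level_dummies are hypothetical and written only for this example; they are not part of idlogit, and the real package may expand categorical variables differently. Passing the options as keyword arguments is likewise an assumption about the call shape.

```python
import numpy as np

K = 4

# Keyword options as described above; the values are illustrative only
options = {
    "constant": True,
    "Lambdas": [1.0, 0.1],        # [L1, L2] penalty parameters
    "bin": [2],                   # 1-based column indices of binary variables
    "cat": [4],                   # 1-based column indices of categorical variables
    "prints": {"costs": True},
}

def check_bin_cat(K, bin_idx, cat_idx):
    """Hypothetical helper: indices must lie in 1..K and bin/cat must not overlap."""
    in_range = all(1 <= k <= K for k in list(bin_idx) + list(cat_idx))
    disjoint = set(bin_idx).isdisjoint(cat_idx)
    return in_range and disjoint

def level_dummies(column):
    """Hypothetical helper: one 0/1 column per observed level of a categorical column."""
    levels = np.unique(column)
    return levels, (column[:, None] == levels[None, :]).astype(float)

ok = check_bin_cat(K, options["bin"], options["cat"])
levels, D = level_dummies(np.array([3, 1, 2, 3, 1]))

# Assumed call shape, with options passed as keyword arguments:
#   x, info = idlogit(K, I, N, y, X, ind, **options)
```

Because every row of D contains exactly one 1, the level-dummy columns sum to a constant column, which is why a sum-to-zero constraint on their coefficients is needed for identification.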

Detailed Description

This code solves problems of the general form

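min (1/N) sum_n log( 1 + exp{ -y_n x_n'( b + d_{i(n)} ) } ) + (L1/N) || d ||_1 + (L2/(2N)) || d ||_2
with respect to b , d_1 , ... , d_I in Real(K)
subject to d_1 + ... + d_I = 0

Here i(n) is the individual assigned to observation n (the ind argument), d_i is the idiosyncratic deviation for individual i, and L1 and L2 are the penalty parameters passed in Lambdas.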
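
For readers who want to check the objective numerically, here is a minimal numpy sketch that evaluates it for given b and d. The function name idlogit_objective is hypothetical and not part of the package. It assumes d is stored as an I x K array, that both norms are taken over all entries of d, and that the 2-norm enters unsquared, exactly as written above.

```python
import numpy as np

def idlogit_objective(b, d, y, X, ind, L1, L2):
    """Evaluate the objective as written above (hypothetical helper).

    b: (K,), d: (I, K), y: (N,) in {-1, +1}, X: (N, K), ind: (N,) in {1..I}.
    """
    N = X.shape[0]
    coef = b[None, :] + d[np.asarray(ind) - 1]        # b + d_{i(n)}, one row per observation
    u = np.einsum("nk,nk->n", X, coef)                # x_n'(b + d_{i(n)})
    loss = np.logaddexp(0.0, -y * u).mean()           # stable log(1 + exp(.)), averaged over n
    pen1 = (L1 / N) * np.abs(d).sum()                 # 1-norm over all entries of d
    pen2 = (L2 / (2 * N)) * np.sqrt((d ** 2).sum())   # 2-norm as written, not squared
    return loss + pen1 + pen2

rng = np.random.default_rng(0)
N, K, I = 6, 2, 3
X = rng.normal(size=(N, K))
y = np.array([1, -1, 1, 1, -1, 1])
ind = np.array([1, 1, 2, 2, 3, 3])

b = rng.normal(size=K)
d = rng.normal(size=(I, K))
d -= d.mean(axis=0)            # enforce d_1 + ... + d_I = 0

value = idlogit_objective(b, d, y, X, ind, L1=1.0, L2=0.1)
at_zero = idlogit_objective(np.zeros(K), np.zeros((I, K)), y, X, ind, L1=1.0, L2=0.1)
```

With b and every d_i set to zero, each loss term is log(2) and both penalties vanish, which gives a quick sanity check on any implementation.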

The problem is solved by transforming it into an equivalent Exponential Cone Program that is passed to the ECOS solver.
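
One standard way to express a logistic loss term with exponential cones uses the equivalence t >= log(1 + exp(-z)) if and only if exp(-z - t) + exp(-t) <= 1, where each exponential is then bounded by an auxiliary variable through one exponential cone constraint. Whether idlogit uses exactly this reformulation internally is not stated here, so the following is only a numerical check of the equivalence, with hypothetical names.

```python
import numpy as np

def epigraph_lhs(z, t):
    """exp(-z - t) + exp(-t); the pair (z, t) is feasible when this is <= 1."""
    return np.exp(-z - t) + np.exp(-t)

z = np.linspace(-5.0, 5.0, 11)
t_tight = np.logaddexp(0.0, -z)              # t = log(1 + exp(-z))

lhs_tight = epigraph_lhs(z, t_tight)         # on the boundary of the feasible set
lhs_above = epigraph_lhs(z, t_tight + 0.5)   # t larger than the loss: feasible
lhs_below = epigraph_lhs(z, t_tight - 0.5)   # t smaller than the loss: infeasible
```

Minimizing the sum of the t variables subject to these constraints therefore recovers the logistic loss, while the L1 and L2 penalty terms can be handled with linear and second-order cone constraints respectively.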

Contact

W. Ross Morrow