A Python package for Recentered Influence Function (RIF) regression analysis. Provides tools for analyzing distributional effects in econometrics and data science applications.
You can install the package using pip:

    pip install pyrifreg

- Implementation of Recentered Influence Function (RIF) regression
- Support for various distributional statistics (mean, quantiles, variance, Gini, etc.)
- Easy-to-use API for regression analysis
- Integration with pandas and scikit-learn

    import numpy as np
    import pandas as pd
    from pyrifreg import RIFRegression

    # Create sample data
    X = np.random.randn(1000, 2)
    y = np.random.randn(1000)

    # Initialize and fit RIF regression
    median_rif = RIFRegression(statistic='quantile', q=0.5)
    median_rif.fit(X, y)

    # Get regression results
    results = median_rif.summary()
    print(results)

More examples are in example.py, and detailed usage examples are in the examples/ directory.
Many regression models focus on conditional statistics such as the conditional mean

$$ E[Y \mid X] $$

or conditional quantiles

$$ Q_\tau(Y \mid X). $$
But policy questions often require understanding how a variable like education or income influences the entire distribution of an outcome, not just its conditional mean or conditional quantiles. For example:
- How would expanding access to college change the 90th percentile of the wage distribution?
- What is the effect of a tax policy on income inequality or the Gini index?
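Instead of looking at changes within subgroups (conditional on $X$), RIF regression looks at the overall (unconditional) distribution of the outcome $Y$.

Let $T(F)$ be a statistic of the distribution $F$ of $Y$, such as a quantile, the variance, or the Gini index. We are interested in the effect of the covariates $X$ on $T(F)$, i.e., how that statistic changes when the distribution shifts. RIF regression provides a way to estimate how different variables contribute to such shifts.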
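The influence function measures how sensitive a statistic is to a small change in the data. More precisely, it tells us how much an individual observation $y$ moves the statistic $T(F)$.

Formally, imagine a slightly perturbed distribution:

$$ F_\epsilon = (1 - \epsilon) F + \epsilon \, \delta_y, $$

where $\delta_y$ is a point mass at $y$ and $\epsilon$ is a small contamination weight. The influence function is the derivative of the statistic along this perturbation:

$$ \mathrm{IF}(y; T, F) = \lim_{\epsilon \to 0} \frac{T(F_\epsilon) - T(F)}{\epsilon}. $$

This gives us a first-order approximation of how $T$ changes when a small amount of mass is added at $y$.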
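To make the definition concrete, the sketch below approximates the influence function numerically by mixing a small point mass into the empirical distribution, and compares the result with the well-known closed forms for the mean ($y - \mu$) and the variance ($(y - \mu)^2 - \sigma^2$). The helper names (`weighted_mean`, `weighted_variance`, `numerical_influence`) are illustrative assumptions and are not part of pyrifreg.

```python
import numpy as np


def weighted_mean(values, weights):
    """Mean of a discrete distribution given by (values, weights)."""
    return np.sum(weights * values)


def weighted_variance(values, weights):
    """Variance of a discrete distribution given by (values, weights)."""
    mu = weighted_mean(values, weights)
    return np.sum(weights * (values - mu) ** 2)


def numerical_influence(stat, sample, point, eps=1e-6):
    """Finite-difference approximation of the influence function at `point`.

    Hypothetical helper: it evaluates `stat` on the empirical distribution
    and on the mixture (1 - eps) * F_n + eps * (point mass at `point`).
    """
    n = sample.size
    base_weights = np.full(n, 1.0 / n)
    values = np.append(sample, point)
    weights = np.append((1.0 - eps) * base_weights, eps)
    return (stat(values, weights) - stat(sample, base_weights)) / eps


rng = np.random.default_rng(0)
sample = rng.normal(loc=2.0, scale=1.5, size=5_000)
point = 4.0

mu, sigma2 = sample.mean(), sample.var()

if_mean_numeric = numerical_influence(weighted_mean, sample, point)
if_var_numeric = numerical_influence(weighted_variance, sample, point)

# Closed-form influence functions for comparison.
if_mean_closed = point - mu
if_var_closed = (point - mu) ** 2 - sigma2

print(f"mean:     numeric={if_mean_numeric:.4f}  closed form={if_mean_closed:.4f}")
print(f"variance: numeric={if_var_numeric:.4f}  closed form={if_var_closed:.4f}")
```

The two columns should agree up to a small finite-difference error, which shrinks as `eps` gets smaller (until floating-point rounding takes over).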
Because the average of the influence function is always zero, we can’t use it directly in a regression. To fix this, we “recenter” it by adding the original statistic back:

$$ \mathrm{RIF}(y; T, F) = T(F) + \mathrm{IF}(y; T, F). $$

Now, the expected value of the RIF is equal to the statistic itself:

$$ E[\mathrm{RIF}(Y; T, F)] = T(F). $$

This makes it a useful outcome variable for regression, allowing us to relate changes in the statistic $T(F)$ to the covariates $X$.
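As a quick check of the recentering step, the following sketch computes the influence function and the RIF for two statistics with simple closed forms, the mean and the variance, and confirms that the influence function averages to zero while the RIF averages to the statistic. It uses plain NumPy on simulated data and does not rely on the pyrifreg API.

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.lognormal(mean=0.0, sigma=0.5, size=10_000)

# Mean: IF(y) = y - mu, so RIF(y) = mu + (y - mu) = y.
mu = y.mean()
if_mean = y - mu
rif_mean = mu + if_mean

# Variance: IF(y) = (y - mu)^2 - sigma^2, so RIF(y) = (y - mu)^2.
sigma2 = y.var()
if_var = (y - mu) ** 2 - sigma2
rif_var = sigma2 + if_var

print("average IF (mean):     ", if_mean.mean())
print("average IF (variance): ", if_var.mean())
print("average RIF (mean):    ", rif_mean.mean(), "vs statistic", mu)
print("average RIF (variance):", rif_var.mean(), "vs statistic", sigma2)
```

For the mean, the RIF is just the observation itself, which is why an ordinary regression of $Y$ on $X$ can be read as a RIF regression for the mean.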
RIF regression works in two main steps:
- Estimate the target statistic $T(F)$ (e.g. median or Gini) and compute the influence value for each observation.
- Construct the RIF pseudo-outcome $r_i$ for each data point and regress it on $X$ using linear regression:

  $$ r_i = x_i^\top \beta + \varepsilon_i. $$

The regression coefficients $\beta$ describe how a small shift in the covariates $X$ changes the statistic $T(F)$.
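The two steps can be sketched in a few lines of NumPy. This is a minimal illustration for the variance, whose RIF is $(y - \mu)^2$, and not the package's implementation; `rif_variance` and `rif_regression` are hypothetical names, and the data are simulated so that the second covariate raises the spread of the outcome.

```python
import numpy as np


def rif_variance(y):
    """RIF of the variance: sigma^2 + [(y - mu)^2 - sigma^2] = (y - mu)^2."""
    return (y - y.mean()) ** 2


def rif_regression(X, y, rif_fn):
    """Two-step RIF regression (illustrative helper, not the pyrifreg API)."""
    # Step 1: build the RIF pseudo-outcome r_i for every observation.
    rif = rif_fn(y)
    # Step 2: regress r_i on the covariates (with an intercept) by OLS.
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, rif, rcond=None)
    return beta, design


rng = np.random.default_rng(42)
n = 20_000
x1 = rng.normal(size=n)
x2 = rng.integers(0, 2, size=n)           # binary covariate
y = x1 + (1.0 + x2) * rng.normal(size=n)  # x2 = 1 doubles the noise scale

beta, design = rif_regression(np.column_stack([x1, x2]), y, rif_variance)
print("coefficients (intercept, x1, x2):", beta)

# With an intercept, the fitted RIF evaluated at the covariate means
# reproduces the statistic itself.
implied_variance = design.mean(axis=0) @ beta
print("implied variance:", implied_variance, " sample variance:", y.var())
```

The positive coefficient on `x2` reflects that observations with `x2 = 1` contribute more to the overall variance. The last check follows from OLS with an intercept: the fitted values average to the mean of the RIF, which is the statistic.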
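Unconditional quantile regression (UQR; Firpo, Fortin & Lemieux, 2009) is a special case of RIF regression, where the statistic of interest is an unconditional quantile $q_\tau$ of $Y$. Its RIF is

$$ \mathrm{RIF}(y; q_\tau, F) = q_\tau + \frac{\tau - \mathbf{1}\{y \le q_\tau\}}{f_Y(q_\tau)}, $$

where $f_Y(q_\tau)$ is the density of $Y$ evaluated at $q_\tau$, which has to be estimated in practice (for example with a kernel density estimator).

This is in contrast to conditional quantile regression (Koenker & Bassett, 1978), which examines changes in the quantiles of $Y$ conditional on $X$.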
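A minimal sketch of the quantile RIF, $q_\tau + (\tau - \mathbf{1}\{y \le q_\tau\}) / f_Y(q_\tau)$, is shown below. It assumes a Gaussian kernel density estimate with a normal-reference bandwidth for $f_Y(q_\tau)$; that choice, along with the helper names `gaussian_kde_at` and `rif_quantile`, is an assumption made for illustration and may differ from what pyrifreg does internally.

```python
import numpy as np


def gaussian_kde_at(sample, point):
    """Gaussian kernel density estimate of `sample` evaluated at `point`.

    Assumption: normal-reference bandwidth 1.06 * sd * n^(-1/5).
    """
    n = sample.size
    bandwidth = 1.06 * sample.std(ddof=1) * n ** (-1 / 5)
    z = (point - sample) / bandwidth
    return np.exp(-0.5 * z**2).sum() / (n * bandwidth * np.sqrt(2 * np.pi))


def rif_quantile(y, tau):
    """RIF of the tau-th unconditional quantile for each observation."""
    q = np.quantile(y, tau)
    density = gaussian_kde_at(y, q)
    indicator = (y <= q).astype(float)
    return q + (tau - indicator) / density, q


rng = np.random.default_rng(7)
n = 10_000
college = rng.integers(0, 2, size=n)               # hypothetical binary covariate
log_wage = 1.0 + 0.5 * college + rng.normal(size=n)

tau = 0.9
rif, q = rif_quantile(log_wage, tau)

# Regress the RIF on the covariate (with an intercept) by OLS.
design = np.column_stack([np.ones(n), college])
beta, *_ = np.linalg.lstsq(design, rif, rcond=None)

print("90th percentile:", q)
print("average RIF:    ", rif.mean())
print("UQR coefficients (intercept, college):", beta)
```

Because the indicator is binary, the quantile RIF takes only two distinct values, so the regression is effectively a rescaled linear probability model for the event that the outcome exceeds $q_\tau$.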
Since RIFs are estimated in a first step before regression, the usual OLS standard errors are biased. To correct this, inference proceeds in two stages:
- Estimate the statistic $T$, the influence function, and any needed density estimates.
- Run the regression and compute corrected standard errors using the bootstrap.
The package includes support for bootstrap inference out of the box.
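The idea behind the bootstrap correction can be sketched as follows: resample observations with replacement and repeat both stages inside every resample, so that the first-step estimation error shows up in the spread of the coefficients. This is an illustrative pairs bootstrap written in plain NumPy under those assumptions, not a description of the package's own bootstrap routine; `bootstrap_rif_se` and its parameters are hypothetical.

```python
import numpy as np


def rif_variance(y):
    """RIF of the variance, (y - mu)^2, recomputed from the sample passed in."""
    return (y - y.mean()) ** 2


def ols(design, target):
    """Least-squares coefficients."""
    beta, *_ = np.linalg.lstsq(design, target, rcond=None)
    return beta


def bootstrap_rif_se(X, y, rif_fn, n_boot=200, seed=0):
    """Pairs-bootstrap standard errors for RIF regression coefficients."""
    rng = np.random.default_rng(seed)
    n = y.size
    design = np.column_stack([np.ones(n), X])
    draws = np.empty((n_boot, design.shape[1]))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample rows with replacement
        # Stage 1 and stage 2 are both redone on the resampled data.
        draws[b] = ols(design[idx], rif_fn(y[idx]))
    point_estimate = ols(design, rif_fn(y))
    return point_estimate, draws.std(axis=0, ddof=1)


rng = np.random.default_rng(3)
n = 2_000
X = rng.normal(size=(n, 2))
y = X[:, 0] + np.exp(0.3 * X[:, 1]) * rng.normal(size=n)

coef, se = bootstrap_rif_se(X, y, rif_variance)
for name, b, s in zip(["intercept", "x1", "x2"], coef, se):
    print(f"{name:>9}: coef={b: .3f}  bootstrap se={s:.3f}")
```

Any other RIF function, for example one for a quantile that also re-estimates the density, can be passed as `rif_fn` as long as it recomputes everything from the sample it receives.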
- Firpo, S., Fortin, N. M., & Lemieux, T. (2009). Unconditional Quantile Regressions. Econometrica, 77(3), 953–973.
- Koenker, R., & Bassett, G., Jr. (1978). Regression Quantiles. Econometrica, 46(1), 33–50.
- Rios-Avila, F. (2020). Recentered influence functions (RIFs) in Stata: RIF regression and RIF decomposition. The Stata Journal, 20(1), 51–94.
This project is licensed under the MIT License - see the LICENSE file for details.
To cite this package in publications, please use the following BibTeX entry:
@misc{yasenov2025pyrifreg,
author = {Vasco Yasenov},
title = {pyrifreg: Python Tools for Recentered Influence Function (RIF) Regression},
year = {2025},
howpublished = {\url{https://github.com/vyasenov/pyrifreg}},
note = {Version 0.1.0}
}