Testing Benford's Law on the Total Assets, Total Revenue, Net Income and Net Operating Cash Flow reported in the quarterly financial information of US firms from 1995 to 2014.

Writing a function to test Benford's Law using a chi-square test
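For reference, the test below compares the observed count O_d of each leading digit d with the count E_d expected under Benford's Law for N values. Conformity is rejected at the 5% level when the statistic exceeds the critical value of a chi-square distribution with 9 - 1 = 8 degrees of freedom, which is the 15.51 used as the cut-off:

```latex
P(d) = \log_{10}\!\left(1 + \frac{1}{d}\right), \qquad d = 1, 2, \dots, 9
E_d = N \, P(d)
\chi^2 = \sum_{d=1}^{9} \frac{(O_d - E_d)^2}{E_d}, \qquad \text{reject if } \chi^2 > \chi^2_{0.95,\,8} \approx 15.51
```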

In [5]:
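import numpy as np
import pandas as pd
from pandas import DataFrame, Series

np.random.seed(6011993)

def isBenford_chisq(x):
    """Test whether a Series follows Benford's Law using a chi-square goodness-of-fit test.

    Returns (statistic, cutoff, passes_test, per-digit table). Values are assumed
    to have absolute value >= 1, so the first character of each value's string
    form is its leading significant digit.
    """
    # Leading digit of every non-missing value, ignoring sign
    digits = Series([s[0] for s in np.abs(x.dropna()).astype(str)]).astype(int)
    # Observed count for each digit 1-9; digits that never occur get a count of 0
    x = DataFrame({'Actual': digits.value_counts().reindex(range(1, 10), fill_value=0)})
    # Benford expected count for digit d: N * log10(1 + 1/d)
    x['Expected'] = np.log10(1 + 1.0 / x.index) * x['Actual'].sum()
    # Per-digit contribution to the chi-square statistic
    x['Chi Square'] = (x['Actual'] - x['Expected'])**2 / x['Expected']
    actual = x['Chi Square'].sum()
    # Chi-square critical value, 8 degrees of freedom (9 digits - 1), 5% level
    cutoff = 15.51
    return (actual, cutoff, actual <= cutoff, x)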
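One way to sanity-check the chi-square approach is to run it on data whose digit distribution is known in advance. The sketch below is self-contained and does not call isBenford_chisq; leading_digit and benford_chisq are hypothetical helpers written only for this illustration. It assumes that powers of 2, a sequence known to follow Benford's Law, should pass the 15.51 cut-off, while numbers whose leading digits are spread evenly over 1 to 9 should fail it.

```python
import numpy as np
import pandas as pd

def leading_digit(v):
    """First significant digit of a nonzero number, ignoring sign (hypothetical helper)."""
    # Scientific notation starts with the first significant digit: 0.0042 -> '4.2...e-03'
    return int(format(abs(float(v)), ".14e")[0])

def benford_chisq(values):
    """Chi-square statistic of observed leading digits against Benford's Law."""
    digits = pd.Series([leading_digit(v) for v in values if v != 0])
    # Observed counts for digits 1-9, with 0 for digits that never occur
    observed = digits.value_counts().reindex(range(1, 10), fill_value=0)
    expected = np.log10(1 + 1 / observed.index.to_numpy()) * observed.sum()
    return float((((observed.to_numpy() - expected) ** 2) / expected).sum())

CUTOFF = 15.51  # chi-square critical value, 8 degrees of freedom, 5% level

powers_of_two = [2.0 ** k for k in range(1000)]  # known to follow Benford's Law
flat_digits = list(range(1, 10)) * 100           # every leading digit equally often

print(benford_chisq(powers_of_two) <= CUTOFF)  # True
print(benford_chisq(flat_digits) <= CUTOFF)    # False
```

If a check like this failed on the powers of 2, the problem would lie in the digit extraction or the expected counts rather than in the financial data.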

Importing Data

In [6]:
companies = pd.read_csv("data.csv")
companies.describe()

statistic,gvkey,fyearq,fqtr,actq,atq,ceqq,cshoq,dlttq,lctq,niq,revtq,cogsy,oancfy,prccq,gind,sic
count,364285.0,364285.0,364285.0,364285.0,364285.0,364285.0,364285.0,364285.0,364285.0,364285.0,364138.0,364285.0,364285.0,364285.0,363607.0,364285.0
mean,57532.219987,2004.859025,2.51805,922.566861,2950.512708,1176.425015,106.273979,638.92132,669.41725,36.744126,633.213795,1062.234601,184.225278,18.088612,294285.936932,4301.55361
std,59025.37437,5.310472,1.121935,4319.336357,13870.26801,6037.645978,376.368922,2918.590656,3541.313324,413.809237,3324.084826,6805.990633,1206.355715,44.190655,125432.65087,2151.034637
min,1004.0,1995.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,-66047.822,0.0,0.0,-6360.0,0.0001,101010.0,100.0
25%,11051.0,2001.0,2.0,17.532,37.238,20.708,11.685,0.0,6.705,-0.932,5.92725,7.376,-1.05,2.68,201040.0,2835.0
50%,28429.0,2005.0,3.0,81.124,184.759,98.274,28.464,6.082,34.312,0.759,40.292,49.577,3.275,9.6,255040.0,3714.0
75%,107027.0,2009.0,4.0,366.211,1013.658,456.227,70.606,194.93,188.393,10.46,229.4825,320.519,42.363,24.49,451010.0,5812.0
max,294524.0,2014.0,4.0,153774.0,410074.0,284434.0,21667.189,129025.0,145148.0,69758.268,207307.334,432443.466,59725.0,5010.0,551050.0,9997.0


Cleaning atq (Total Assets): remove zeros and missing values, and ensure that the first digit of every atq value is not 0

In [13]:
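# Count fractional values (0 < atq < 1), whose string form starts with '0'
print(((companies.atq < 1) & (companies.atq > 0)).sum())
# Multiply by 1000 so these values get a nonzero leading digit
companies['atq'] = companies.atq * 1000
# Drop rows where Total Assets is zero or missing
companies = companies[(companies.atq != 0) & companies.atq.notnull()].copy()
# Check that no value below 1 remains
print((companies.atq < 1).sum())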

3358
0
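The multiplication by 1000 is needed because isBenford_chisq reads the first character of each value's string form, which is '0' for any value between 0 and 1. The sketch below uses made-up sample values to mimic that string approach and compares it with reading the first character of scientific notation, which is always the first significant digit. This is offered as an alternative, not the method used in this notebook; note that scaling by 1000 still leaves a leading '0' for values below 0.001, whereas the scientific-notation form does not depend on scale.

```python
# Hypothetical sample values, not taken from data.csv
values = [0.25, 0.0042, 7.5, 1234.0]

# First character of the plain string form, mimicking isBenford_chisq
naive = [str(abs(v))[0] for v in values]
# First character after multiplying by 1000, as in the cleaning step
scaled = [str(abs(v) * 1000)[0] for v in values]
# First character of scientific notation is always the first significant digit
sci = [format(abs(v), ".14e")[0] for v in values]

print(naive)   # ['0', '0', '7', '1']
print(scaled)  # ['2', '4', '7', '1']
print(sci)     # ['2', '4', '7', '1']
```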


Cleaning revtq (Total Revenue): remove zeros and missing values, and ensure that the first digit of every revtq value is not 0

In [None]:
# Drop rows where Total Revenue is zero or missing
companies = companies[(companies.revtq != 0) & companies.revtq.notnull()].copy()
# Count values below 1 before and after multiplying by 1000
print((companies.revtq < 1).sum())
companies['revtq'] = companies.revtq * 1000
print((companies.revtq < 1).sum())
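The same drop-then-rescale pattern is repeated for every column in this notebook, so it could be factored into a small helper. The name clean_for_benford is hypothetical, and the sketch assumes the same factor of 1000 used above. It filters missing values explicitly because NaN != 0 evaluates to True, so a zero filter alone would keep them.

```python
import numpy as np
import pandas as pd

def clean_for_benford(df, col, scale=1000):
    """Hypothetical helper: drop rows where `col` is zero or missing, then rescale it."""
    # NaN != 0 is True, so missing values must be filtered out explicitly
    keep = df[col].notnull() & (df[col] != 0)
    out = df[keep].copy()  # copy so the assignment below does not act on a view
    out[col] = out[col] * scale
    return out

# Made-up example data, not taken from data.csv
demo = pd.DataFrame({"revtq": [0.0, np.nan, 0.5, 12.0, -3.5]})
cleaned = clean_for_benford(demo, "revtq")
print(cleaned["revtq"].tolist())  # [500.0, 12000.0, -3500.0]
```

Applied to this notebook, a call such as clean_for_benford(companies, 'niq') would replace the three lines that drop zeros and rescale.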

Running isBenford_chisq on Total Assets and Total Revenue gives:

In [None]:
atq_actual, atq_cutoff, atq_isBenford, atq_dist = isBenford_chisq(companies.atq)
print("For Total Assets Distribution")
print(atq_dist)
print("Actual Chi Square Statistic: {0}, Cut-off: {1}, isBenford = {2}".format(
    atq_actual, atq_cutoff, atq_isBenford))

In [None]:
revtq_actual, revtq_cutoff, revtq_isBenford, revtq_dist = isBenford_chisq(companies.revtq)
print("For Total Revenue Distribution")
print(revtq_dist)
print("Actual Chi Square Statistic: {0}, Cut-off: {1}, isBenford = {2}".format(
    revtq_actual, revtq_cutoff, revtq_isBenford))

As we can see above, neither Total Revenue nor Total Assets follows Benford's Law. This suggests that the reported revenue and total asset figures may have been manipulated rather than reflecting a natural distribution, although a failed Benford test is a warning sign rather than proof of manipulation.
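One point worth keeping in mind when reading these rejections: for fixed digit proportions, the chi-square statistic grows in proportion to the number of observations, so with several hundred thousand quarterly records even a very small departure from Benford's Law exceeds 15.51. The sketch below illustrates this with hypothetical proportions that move one percentage point of mass from digit 1 to digit 9; the proportions are invented for illustration, and 364,285 is the row count reported by describe() above.

```python
import numpy as np

# Benford proportions for leading digits 1-9
benford = np.log10(1 + 1 / np.arange(1, 10))

# Hypothetical observed proportions: 1 percentage point moved from digit 1 to digit 9
observed = benford.copy()
observed[0] -= 0.01
observed[8] += 0.01

def chisq(n):
    """Chi-square statistic for n observations with the proportions above."""
    obs, exp = n * observed, n * benford
    return float(((obs - exp) ** 2 / exp).sum())

CUTOFF = 15.51  # 5% critical value, 8 degrees of freedom
for n in (1_000, 10_000, 364_285):
    # only the n = 1,000 case stays under the cut-off
    print(n, round(chisq(n), 1), chisq(n) <= CUTOFF)
```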

Cleaning Net Income (niq)

In [None]:
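# Drop rows where Net Income is zero (the leading digit of 0 is undefined)
companies = companies[companies.niq != 0].copy()
# Count values with |niq| < 1 before and after multiplying by 1000
print((np.abs(companies.niq) < 1).sum())
companies['niq'] = companies.niq * 1000
print((np.abs(companies.niq) < 1).sum())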

Cleaning Net Operating Cash Flows (oancfy)

In [None]:
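# Drop rows where Net Operating Cash Flow is zero
companies = companies[companies.oancfy != 0].copy()
# Count values with |oancfy| < 1 before and after multiplying by 1000
print((np.abs(companies.oancfy) < 1).sum())
companies['oancfy'] = companies.oancfy * 1000
print((np.abs(companies.oancfy) < 1).sum())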

Running isBenford_chisq on Net Income and Net Operating Cash Flow gives:

In [None]:
niq_actual, niq_cutoff, niq_isBenford, niq_dist = isBenford_chisq(companies.niq)
print("For Net Income Distribution")
print(niq_dist)
print("Actual Chi Square Statistic: {0}, Cut-off: {1}, isBenford = {2}".format(
    niq_actual, niq_cutoff, niq_isBenford))

In [None]:
oancfy_actual, oancfy_cutoff, oancfy_isBenford, oancfy_dist = isBenford_chisq(companies.oancfy)
print("For Net Operating Cash Flow Distribution")
print(oancfy_dist)
print("Actual Chi Square Statistic: {0}, Cut-off: {1}, isBenford = {2}".format(
    oancfy_actual, oancfy_cutoff, oancfy_isBenford))

As we can see above, Net Income does not follow Benford's Law. A plausible explanation is that Net Income is the difference between Revenue and Costs, so if those figures are manipulated, Net Income is not naturally distributed either; CEOs may manipulate Net Income to make their firm's performance look better. On the other hand, Net Operating Cash Flow does follow Benford's Law, which is consistent with there being little scope for a CEO to manipulate cash flows: as soon as cash or operating expenses are paid, the payment is recognized as a cash outflow, which limits the room for manipulation.