# **A test for Zipf's law**

A test statistic for Zipf's law, written in both Python and R.

## **Author**
Carlos M. Urzúa, urzuacarlosm@gmail.com

# **Description**

Given a vector *x* of positive real numbers, the statistic **lmz** proposed in Urzúa (2000) can be used to test for Zipf's law. Under the null hypothesis, **lmz** is asymptotically distributed as a chi-squared random variable with two degrees of freedom, so its p-value can be computed from that distribution. However, if the number of observations is less than or equal to 30, it is advisable to use instead the critical values given in Table 1 of that paper.
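In the notation of the code in this README, with x_n the smallest of the n observations used, the statistic reads as in the first line below. The second line is a sketch of where the weights 1, 6 and 12 come from, assuming x_n can be treated as the known lower bound x_0: under Zipf's law, log(x_i/x_0) is exponential with mean 1 and variance 1, x_0/x_i is uniform on (0, 1) with mean 1/2 and variance 1/12, and their covariance is -1/4, so **lmz** is the quadratic form of (z_1, z_2) in the inverse of that covariance matrix. This is offered as a consistency check, not as a transcription of the argument in Urzúa (2000). The last line uses the fact that a chi-squared variable with two degrees of freedom has survival function e^{-t/2}.

```latex
z_1 = 1 - \frac{1}{n}\sum_{i=1}^{n}\log\frac{x_i}{x_n}, \qquad
z_2 = \frac{1}{2} - \frac{1}{n}\sum_{i=1}^{n}\frac{x_n}{x_i}, \qquad
\mathrm{lmz} = 4n\left(z_1^2 + 6 z_1 z_2 + 12 z_2^2\right)

\Sigma = \begin{pmatrix} 1 & -\tfrac{1}{4} \\ -\tfrac{1}{4} & \tfrac{1}{12} \end{pmatrix},
\qquad
\Sigma^{-1} = \begin{pmatrix} 4 & 12 \\ 12 & 48 \end{pmatrix},
\qquad
n\,(z_1,\ z_2)\,\Sigma^{-1}\begin{pmatrix} z_1 \\ z_2 \end{pmatrix}
  = 4n\left(z_1^2 + 6 z_1 z_2 + 12 z_2^2\right)

p = \Pr\left(\chi^2_2 > \mathrm{lmz}\right) = e^{-\mathrm{lmz}/2}
```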

The vector *x* does not need to be ordered, and only the observations greater than or equal to a given threshold *mu* (with *mu* >= the minimum element of *x*) are used to compute the statistic. This is handy because Zipf's law is typically rejected when *mu* is not close to the right tail of the distribution; contrast the two examples given in the last section of Urzúa (2000).
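The short sketch below illustrates both points on simulated data. It uses a hypothetical helper `lmz_stat`, which mirrors the formula of the code in this README (with x_n taken as the smallest retained observation), and hypothetical data drawn from Zipf's law with lower bound 1 via x = 1/U, U uniform on (0, 1]. It does not rely on the repository files.

```python
import numpy as np
from scipy import stats

def lmz_stat(x, mu):
    """Sketch of the lmz statistic using only observations >= mu; the order of x is irrelevant."""
    x = np.asarray(x, dtype=float).ravel()
    x = x[x >= mu]                       # drop the observations below the threshold mu
    xn = x.min()                         # smallest retained observation
    n = x.size
    z1 = 1 - np.mean(np.log(x / xn))     # under Zipf's law, E[log(X/x0)] = 1
    z2 = 0.5 - np.mean(xn / x)           # under Zipf's law, E[x0/X] = 1/2
    lmz = 4 * n * (z1**2 + 6 * z1 * z2 + 12 * z2**2)
    return lmz, stats.chi2.sf(lmz, df=2), n   # statistic, asymptotic p-value, observations used

rng = np.random.default_rng(0)
# Hypothetical data: a Zipf (Pareto, alpha = 1) sample with lower bound 1
x = 1.0 / (1.0 - rng.uniform(size=1000))

desc = np.sort(x)[::-1]          # rank-size order, largest first
shuffled = rng.permutation(x)    # arbitrary order

print(lmz_stat(desc, mu=1.0))
print(lmz_stat(shuffled, mu=1.0))   # same result up to floating-point rounding
print(lmz_stat(x, mu=10.0))         # only the upper tail (x >= 10) is used
```

Because the part of a Pareto sample above any threshold is again Pareto with the same exponent, a sample that truly follows Zipf's law should be rejected only at about the nominal rate whatever the value of mu, as long as enough observations remain above it.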

# **Syntax**

In both the Python script lmz.py and the R program lmz.R included in this repository, the function is called simply as `lmztest(x, mu)`.


# **Notes**

* It is not advisable to test for Zipf's law by means of a regression (Urzúa, 2011).

* In disciplines ranging from linguistics to geography, it is common to test for Zipf's law. Note, however, that this law is a limiting case among the distributions that exhibit power-law behavior. To test for that more general behavior, one can use the pwlaw statistic proposed in Urzúa (2020); the repository https://github.com/urzuacarlosm/A-test-for-power-law contains its Python and R code.


# The following code is written in Python

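```python
import numpy as np
import scipy.stats as stats

def lmztest(x, mu=None):
    """Urzúa (2000) lmz statistic for Zipf's law and its asymptotic p-value.

    x  : positive observations (list, array, Series or one-column DataFrame), in any order
    mu : only the observations >= mu are used (all of them if mu is None)
    """
    # Flatten the input into a 1-D array of floats
    x = np.asarray(x, dtype=float).ravel()
    if mu is not None:
        x = x[x >= mu]
    # Smallest retained observation (the last element when x is sorted in decreasing order)
    xn = x.min()
    n = len(x)
    z1 = 1 - np.mean(np.log(x / xn))
    z2 = 0.5 - np.mean(xn / x)
    lmz = 4 * n * (z1 ** 2 + 6 * (z1 * z2) + 12 * (z2 ** 2))
    # Upper tail of the chi-squared distribution with 2 degrees of freedom
    p_value = stats.chi2.sf(lmz, df=2)
    return float(lmz), float(p_value)
```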
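Since the chi-squared approximation is only asymptotic, one way to see how it behaves for a given sample size is a Monte Carlo experiment. The sketch below is an illustration, not the procedure used to build Table 1 of Urzúa (2000): it simulates samples that follow Zipf's law exactly (lower bound 1), computes lmz with x_n taken as the smallest observation of each sample, and reports the simulated 5% critical value next to the asymptotic one. The function names `lmz_rows` and `simulated_critical_value`, and the default of 5,000 replications, are choices made for this example.

```python
import numpy as np
from scipy import stats

def lmz_rows(samples):
    """lmz statistic for each row of a 2-D array (one simulated sample per row)."""
    xn = samples.min(axis=1, keepdims=True)   # smallest observation of each sample
    n = samples.shape[1]
    z1 = 1 - np.mean(np.log(samples / xn), axis=1)
    z2 = 0.5 - np.mean(xn / samples, axis=1)
    return 4 * n * (z1**2 + 6 * z1 * z2 + 12 * z2**2)

def simulated_critical_value(n, level=0.05, reps=5000, seed=0):
    """Monte Carlo (1 - level) quantile of lmz under Zipf's law for samples of size n."""
    rng = np.random.default_rng(seed)
    # Zipf's law with lower bound 1: X = 1/U with U uniform on (0, 1]
    samples = 1.0 / (1.0 - rng.uniform(size=(reps, n)))
    return np.quantile(lmz_rows(samples), 1 - level)

asymptotic = stats.chi2.ppf(0.95, df=2)   # about 5.99
for n in (10, 30, 500):
    print(n, round(simulated_critical_value(n), 2), round(asymptotic, 2))
```

For large n the simulated critical value should be close to the asymptotic 5.99; the rows for n = 10 and n = 30 show what the simulation gives for small samples, where this README recommends Table 1 of Urzúa (2000) instead.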

To check that the program works as intended, you can use the following example.

Consider the US metropolitan areas that in 1991 had a population of 250,000 or more inhabitants (US Bureau of the Census, 1993, Table 42). This data set, 135 areas in total, is used by Krugman (1996, p. 40) and Gabaix (1998), both of whom claim that Zipf's law holds almost perfectly in this case. You can find the data in the following link: (link).
For this data set, the results are:

`lmz test result: (3.1592115995128527, 0.20605630964449895)`

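```python
import pandas as pd
import lmz as lmz_t  # lmz.py from this repository

# MET.xlsx holds one column of populations with no header row; [0] selects it as a Series
x = pd.read_excel("MET.xlsx", header=None)[0]
lmztest_result = lmz_t.lmztest(x)
print("lmz test result:", lmztest_result)
```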

# The following code is written in R

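```r
lmztest <- function(x, mu = min(x)) {
  # Flatten a vector, matrix or one-column data frame into a numeric vector
  x <- as.numeric(unlist(x))
  # Keep only the observations >= mu (by default, all of them)
  x <- x[x >= mu]
  # Smallest retained observation (the last element when x is sorted in decreasing order)
  xn <- min(x)
  n <- length(x)
  z1 <- 1 - mean(log(x / xn))
  z2 <- 0.5 - mean(xn / x)
  lmz <- 4 * n * (z1^2 + 6 * (z1 * z2) + 12 * (z2^2))
  # Upper tail of the chi-squared distribution with 2 degrees of freedom
  p_value <- pchisq(lmz, df = 2, lower.tail = FALSE)
  return(c(lmz, p_value))
}
```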

To check the R program, you can use the same data set of 135 US metropolitan areas described in the Python section. The results are:

`[1] 3.1592116 0.2060563`

```r
# install.packages("readxl")  # run once if readxl is not yet installed
library(readxl)

# MET.xlsx holds one column of populations with no header row; [[1]] extracts it as a numeric vector
x <- read_excel("MET.xlsx", col_names = FALSE)[[1]]
lmztest(x)
```

# **Bibliography**

* Gabaix, X. 1998. Zipf's law for cities: An explanation. Forthcoming in Quarterly Journal of Economics.
* Krugman, P. 1996. The Self-Organizing Economy. Blackwell, Oxford.
* Urzúa, C. M. 2000. A simple and efficient test for Zipf's law. Economics Letters, vol. 66, pp. 257-260.
* Urzúa, C. M. 2011. Testing for Zipf's law: A common pitfall. Economics Letters, vol. 112, pp. 254-255.
* Urzúa, C. M. 2020. A simple test for power-law behavior. Stata Journal, vol. 20, no. 3, pp. 604-612.
