Gini coefficient
In economics, the Gini coefficient, also known as the Gini index or Gini ratio, is a measure of statistical dispersion intended to represent the income inequality, the wealth inequality, or the consumption inequality within a nation or a social group. It was developed by Italian statistician and sociologist Corrado Gini.
The Gini coefficient measures the inequality among the values of a frequency distribution, such as income levels. A Gini coefficient of 0 reflects perfect equality, where all income or wealth values are the same. In contrast, a Gini coefficient of 1 reflects maximal inequality among values, where a single individual has all the income while all others have none.
Corrado Gini proposed the Gini coefficient as a measure of inequality of income or wealth. For OECD countries in the late 20th century, considering the effect of taxes and transfer payments, the income Gini coefficient ranged between 0.24 and 0.49, with Slovakia being the lowest and Mexico the highest. African countries had the highest pre-tax Gini coefficients in 2008–2009, with South Africa having the world's highest, estimated to be 0.63 to 0.7. However, this figure drops to 0.52 after social assistance is taken into account and drops again to 0.47 after taxation. Slovakia has the lowest Gini coefficient, at 0.232. Various sources have estimated the Gini coefficient of global income in 2005 to be between 0.61 and 0.68.
There are multiple issues in interpreting a Gini coefficient, as the same value may result from many different distribution curves. The demographic structure should be taken into account to mitigate this. Countries with an aging population or those with an increased birth rate experience an increasing pre-tax Gini coefficient even if real income distribution for working adults remains constant. Scholars have devised over a dozen variants of the Gini coefficient.
History
The Italian statistician Corrado Gini developed the Gini coefficient and published it in his 1912 paper Variabilità e mutabilità. Building on the work of American economist Max Lorenz, Gini proposed using the difference between the hypothetical straight line depicting perfect equality and the actual line depicting people's incomes as a measure of inequality. In this paper, he introduced the concept of simple mean difference as a measure of variability. He then applied the simple mean difference of observed variables to income and wealth inequality in his work On the measurement of concentration and variability of characters in 1914. Here, he presented the concentration ratio, which further developed into today's Gini coefficient. Secondly, Gini observed that improving methods introduced by Lorenz, Chatelain, or Séailles could also achieve his proposed ratio.
In 1915, Gaetano Pietra introduced a geometrical interpretation relating Gini's proposed ratio to the ratio between the observed area of concentration and the maximum area of concentration. This altered version of the Gini coefficient became the most commonly used inequality index in the following years.
According to data from the OECD, the Gini coefficient was first officially used country-wide in Canada in the 1970s. The Canadian index of income inequality ranged from 0.303 to 0.284 from 1976 to the end of the 1980s. The OECD has published more data on countries since the start of the 21st century. The Central European countries of Slovenia, Czechia, and Slovakia have had the lowest inequality index of all OECD countries since the 2000s. Scandinavian countries have also frequently appeared at the top of the equality list in recent decades.
Definition
The Gini coefficient is an index for the degree of inequality in the distribution of income or wealth, used to estimate how far a country's wealth or income distribution deviates from an equal distribution.
The Gini coefficient is usually defined mathematically based on the Lorenz curve, which plots the proportion of the total income of the population that is cumulatively earned by the bottom x of the population. The line at 45 degrees thus represents perfect equality of incomes. The Gini coefficient can then be thought of as the ratio of the area that lies between the line of equality and the Lorenz curve (call it A) over the total area under the line of equality (A plus B, where B is the area under the Lorenz curve); i.e., $G = A/(A+B)$. If there are no negative incomes, it is also equal to $2A$ and to $1 - 2B$, because $A + B = 0.5$.
Assuming non-negative income or wealth for all, the Gini coefficient's theoretical range is from 0 to 1. This measure is often rendered as a percentage, spanning 0 to 100. However, if negative values are factored in, as in cases of debt, the Gini index could exceed 1. Typically, we presuppose a positive mean or total, precluding a Gini coefficient below zero.
An alternative approach is to define the Gini coefficient as half of the relative mean absolute difference, which is equivalent to the definition based on the Lorenz curve. The mean absolute difference is the average absolute difference of all pairs of items of the population, and the relative mean absolute difference is the mean absolute difference divided by the average, $\bar{x}$, to normalize for scale. If $x_i$ is the wealth or income of person i, and there are n persons, then the Gini coefficient G is given by:
$$G = \frac{\sum_{i=1}^{n}\sum_{j=1}^{n}\left|x_i - x_j\right|}{2n^2\bar{x}} = \frac{\sum_{i=1}^{n}\sum_{j=1}^{n}\left|x_i - x_j\right|}{2n\sum_{i=1}^{n} x_i}$$
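The pairwise definition can be turned into a few lines of code. The following is a minimal sketch, assuming the half relative mean absolute difference form with the double sum running over all ordered pairs; the function name `gini_mean_difference` and the sample values are illustrative only.

```python
def gini_mean_difference(values):
    """Gini coefficient as half the relative mean absolute difference."""
    n = len(values)
    total = sum(values)
    if n == 0 or total <= 0:
        raise ValueError("need a non-empty sample with a positive total")
    # Sum |x_i - x_j| over all ordered pairs (i, j), including i == j.
    abs_diff_sum = sum(abs(xi - xj) for xi in values for xj in values)
    # Dividing by 2 * n * sum(x) is the same as dividing by 2 * n^2 * mean.
    return abs_diff_sum / (2 * n * total)


print(gini_mean_difference([5, 5, 5, 5]))   # everyone has the same income
print(gini_mean_difference([1, 2, 3, 4]))   # mildly unequal
print(gini_mean_difference([0, 0, 0, 10]))  # one person holds everything
```

The double loop costs O(n²) operations, so it is only practical for small populations. Note that for a finite population in which one person holds everything, this form gives (n − 1)/n rather than exactly 1.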
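When the income distribution is given as a continuous probability density function p(x), the Gini coefficient is again half of the relative mean absolute difference:
$$G = \frac{1}{2\mu}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} p(x)\,p(y)\,\left|x - y\right|\,dx\,dy$$
where $\mu = \int_{-\infty}^{\infty} x\,p(x)\,dx$ is the mean of the distribution, and the lower limits of integration may be replaced by zero when all incomes are positive.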
Calculation
While the income distribution of any particular country will not correspond perfectly to the theoretical models, these models can provide a qualitative explanation of the income distribution in a nation given the Gini coefficient.
Example: Two levels of income
The extreme cases are represented by the most equal possible society in which every person receives the same income, and the most unequal society where a single person receives 100% of the total income and the remaining people receive none.
A simple case assumes just two levels of income, low and high. If the high income group is a proportion u of the population and earns a proportion f of all income, then the Gini coefficient is $f - u$. A more graded distribution with these same values u and f will always have a higher Gini coefficient than $f - u$.
For example, if the wealthiest u = 20% of the population has f = 80% of all income, the income Gini coefficient is at least 60%. In another example, if u = 1% of the world's population owns f = 50% of all wealth, the wealth Gini coefficient is at least 49%.
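The two-level arithmetic can be checked numerically. The sketch below assumes the two-level result G = f − u and compares it against a brute-force pairwise calculation on a made-up population of 100 people; the function names and the population are hypothetical.

```python
def two_level_gini(u, f):
    """Gini for two income levels: the top share u of people earns share f of income."""
    if not (0 < u < 1) or not (u <= f <= 1):
        raise ValueError("expected 0 < u < 1 and u <= f <= 1")
    return f - u


def pairwise_gini(values):
    """Brute-force check: half the relative mean absolute difference."""
    n, total = len(values), sum(values)
    return sum(abs(a - b) for a in values for b in values) / (2 * n * total)


# Hypothetical population of 100 people with total income 100 units:
# 80 low earners share 20 units, 20 high earners share 80 units.
population = [0.25] * 80 + [4.0] * 20

print(two_level_gini(0.20, 0.80))  # the 20% / 80% example
print(pairwise_gini(population))   # same society, computed pair by pair
print(two_level_gini(0.01, 0.50))  # the 1% / 50% example
```

Both routes agree on roughly 0.6 for the first example, up to floating-point rounding.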
Alternative expressions
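In some cases, the Gini coefficient can be calculated without direct reference to the Lorenz curve. For example, taking y to indicate the income or wealth of a person or household:
For a population of n individuals with values $y_1 \leq y_2 \leq \cdots \leq y_n$,
$$G = \frac{1}{n}\left(n + 1 - 2\,\frac{\sum_{i=1}^{n}(n+1-i)\,y_i}{\sum_{i=1}^{n} y_i}\right)$$
This may be simplified to:
$$G = \frac{2\sum_{i=1}^{n} i\,y_i}{n\sum_{i=1}^{n} y_i} - \frac{n+1}{n}$$
For a random sample S with values $y_1 \leq y_2 \leq \cdots \leq y_n$, the sample Gini coefficient
$$G(S) = \frac{1}{n-1}\left(n + 1 - 2\,\frac{\sum_{i=1}^{n}(n+1-i)\,y_i}{\sum_{i=1}^{n} y_i}\right)$$
is a consistent estimator of the population Gini coefficient, but is not in general unbiased. In simplified form:
$$G(S) = 1 - \frac{2}{n-1}\left(n - \frac{\sum_{i=1}^{n} i\,y_i}{\sum_{i=1}^{n} y_i}\right)$$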
There does not exist a sample statistic that is always an unbiased estimator of the population Gini coefficient.
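As an illustration, the rank-weighted expressions can be sketched in code. This assumes the usual form G = 2 Σ i·y_i / (n Σ y_i) − (n + 1)/n for values sorted in increasing order, and a sample statistic that rescales it by n/(n − 1); the function names are illustrative.

```python
def gini_population(values):
    """Rank-weighted Gini for a full population (values need not be pre-sorted)."""
    y = sorted(values)
    n, total = len(y), sum(y)
    if n == 0 or total <= 0:
        raise ValueError("need a non-empty list with a positive total")
    # Ranks run from 1 (poorest) to n (richest).
    weighted = sum(i * yi for i, yi in enumerate(y, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


def gini_sample(values):
    """Sample statistic: the population form rescaled by n / (n - 1)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    return gini_population(values) * n / (n - 1)


data = [1, 2, 3, 4]
print(gini_population(data))
print(gini_sample(data))
```

Sorting makes this O(n log n), which scales far better than summing over all pairs.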
Discrete probability distribution
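For a discrete probability distribution with probability mass function $f(y_i)$, $i = 1, \ldots, n$, where $f(y_i)$ is the fraction of the population with income or wealth $y_i > 0$, the Gini coefficient is:
$$G = \frac{1}{2\mu}\sum_{i=1}^{n}\sum_{j=1}^{n} f(y_i)\,f(y_j)\,\left|y_i - y_j\right|$$
where
$$\mu = \sum_{i=1}^{n} y_i\,f(y_i)$$
If the points with non-zero probabilities are indexed in increasing order ($y_i < y_{i+1}$), then:
$$G = 1 - \frac{\sum_{i=1}^{n} f(y_i)\,(S_{i-1} + S_i)}{S_n}$$
where
$$S_i = \sum_{j=1}^{i} f(y_j)\,y_j \quad\text{and}\quad S_0 = 0$$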
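A short sketch of the discrete case follows. It assumes the cumulative form G = 1 − Σ f(y_i)(S_{i−1} + S_i)/S_n with S_i the running sum of f(y_j)·y_j, and cross-checks it against the double-sum form; the function names and the two-point distribution are made up for illustration.

```python
def gini_discrete(points):
    """points: iterable of (y, f) pairs, y = income level, f = population share."""
    pts = sorted((y, f) for y, f in points if f > 0)  # increasing income
    s_prev = 0.0  # S_{i-1}
    acc = 0.0     # running sum of f_i * (S_{i-1} + S_i)
    for y, f in pts:
        s_curr = s_prev + f * y
        acc += f * (s_prev + s_curr)
        s_prev = s_curr
    if s_prev <= 0:
        raise ValueError("mean income must be positive")
    return 1 - acc / s_prev  # s_prev now holds S_n


def gini_discrete_pairs(points):
    """Cross-check using the double sum over all pairs of income levels."""
    pts = list(points)
    mu = sum(y * f for y, f in pts)
    double_sum = sum(fi * fj * abs(yi - yj) for yi, fi in pts for yj, fj in pts)
    return double_sum / (2 * mu)


# Hypothetical two-point distribution: 80% earn 0.25, 20% earn 4.0.
dist = [(0.25, 0.8), (4.0, 0.2)]
print(gini_discrete(dist), gini_discrete_pairs(dist))
```

The shares are assumed to sum to 1; the sketch does not renormalise them.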
Continuous probability distribution
When the population is large, the income distribution may be represented by a continuous probability density function f(x), where f(x) dx is the fraction of the population with wealth or income in the interval dx about x. If F(x) is the cumulative distribution function for f(x):
$$F(x) = \int_0^x f(t)\,dt$$
and L(x) is the Lorenz function:
$$L(x) = \frac{\int_0^x t\,f(t)\,dt}{\int_0^\infty t\,f(t)\,dt}$$
then the Lorenz curve L(F) may be represented as a function parametric in L(x) and F(x), and the value of B can be found by integration:
$$B = \int_0^1 L(F)\,dF$$
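The Gini coefficient can also be calculated directly from the cumulative distribution function of the distribution F(y). Defining μ as the mean of the distribution, then specifying that F(y) is zero for all negative values, the Gini coefficient is given by:
$$G = 1 - \frac{1}{\mu}\int_0^\infty \bigl(1 - F(y)\bigr)^2\,dy = \frac{1}{\mu}\int_0^\infty F(y)\,\bigl(1 - F(y)\bigr)\,dy$$
The latter result comes from integration by parts.
The Gini coefficient may be expressed in terms of the quantile function Q(F), the inverse of the cumulative distribution function, so that Q(F(x)) = x:
$$G = \frac{1}{2\mu}\int_0^1\int_0^1 \left|Q(F_1) - Q(F_2)\right|\,dF_1\,dF_2$$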
Since the Gini coefficient is independent of scale, if the distribution function can be expressed in the form f(x, φ, a, b, c, ...), where φ is a scale factor and a, b, c, ... are dimensionless parameters, then the Gini coefficient will be a function only of a, b, c, .... For example, for the exponential distribution, which is a function of only x and a scale parameter, the Gini coefficient is a constant, equal to 1/2.
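The scale independence of the exponential case can be checked numerically. The sketch below assumes the cumulative-distribution form G = (1/μ) ∫ F(y)(1 − F(y)) dy, truncates the integral at a finite upper limit, and uses a plain trapezoidal rule; the function name, the truncation point and the step count are arbitrary choices.

```python
import math


def gini_from_cdf(cdf, mean, upper, steps=20_000):
    """Trapezoidal estimate of (1/mean) * integral_0^upper F(y) * (1 - F(y)) dy."""
    h = upper / steps

    def integrand(y):
        F = cdf(y)
        return F * (1.0 - F)

    total = 0.5 * (integrand(0.0) + integrand(upper))
    for k in range(1, steps):
        total += integrand(k * h)
    return total * h / mean


# Exponential distributions with different rates (i.e. different scales).
for rate in (0.5, 1.0, 4.0):
    cdf = lambda y, r=rate: 1.0 - math.exp(-r * y)
    # Truncate at 40 mean lengths, where the remaining tail is negligible.
    print(rate, gini_from_cdf(cdf, mean=1.0 / rate, upper=40.0 / rate))
```

Each rate yields a value very close to 0.5, consistent with the coefficient depending only on the shape of the distribution and not on its scale.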
For some functional forms, the Gini index can be calculated explicitly. For example, if y follows a log-normal distribution with the standard deviation of logs equal to $\sigma$, then $G = \operatorname{erf}\left(\frac{\sigma}{2}\right)$, where $\operatorname{erf}$ is the error function. In the table below, some examples for probability density functions with support on $[0, \infty)$ are shown. The Dirac delta distribution represents the case where everyone has the same wealth (or income); it implies no variations between incomes.
- $\Gamma(\cdot)$ is the Gamma function
- $B(\cdot)$ is the Beta function
- $I_k(\cdot)$ is the Regularized incomplete beta function
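The log-normal closed form mentioned above lends itself to a quick numerical check. The sketch below assumes G = erf(σ/2) and compares it with a rank-weighted Gini computed on an evenly spaced grid of log-normal quantiles; the grid size and function names are illustrative choices, and the grid slightly underweights the far upper tail.

```python
import math
from statistics import NormalDist


def lognormal_gini(sigma):
    """Closed form for a log-normal distribution with log standard deviation sigma."""
    return math.erf(sigma / 2)


def gini_sorted(y):
    """Rank-weighted Gini for values already sorted in increasing order."""
    n, total = len(y), sum(y)
    weighted = sum(i * yi for i, yi in enumerate(y, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


def lognormal_gini_numeric(sigma, n=20_000):
    """Approximate the distribution by n equally weighted quantile points."""
    std_normal = NormalDist()
    # Midpoint probabilities (k - 0.5) / n avoid the infinite quantiles at 0 and 1.
    incomes = [math.exp(sigma * std_normal.inv_cdf((k - 0.5) / n))
               for k in range(1, n + 1)]
    return gini_sorted(incomes)  # incomes are increasing by construction


for sigma in (0.5, 1.0):
    print(sigma, lognormal_gini(sigma), lognormal_gini_numeric(sigma))
```

The two columns agree to a few decimal places, and the gap shrinks as the grid is refined.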
Other approaches
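Sometimes only certain points of the Lorenz curve are known rather than the entire curve. Suppose $(X_k, Y_k)$ are the known points on the Lorenz curve, with the $X_k$ indexed in increasing order ($X_{k-1} < X_k$), so that:
- $X_k$ is the cumulated proportion of the population variable, for k = 0,...,n, with $X_0 = 0$, $X_n = 1$.
- $Y_k$ is the cumulated proportion of the income variable, for k = 0,...,n, with $Y_0 = 0$, $Y_n = 1$.
- $Y_k$ should be indexed in non-decreasing order ($Y_k \geq Y_{k-1}$).
If the Lorenz curve is approximated on each interval as a line between consecutive points, then the area B can be approximated with trapezoids and:
$$G_1 = 1 - \sum_{k=1}^{n}\left(X_k - X_{k-1}\right)\left(Y_k + Y_{k-1}\right)$$
is the resulting approximation for G. More accurate results can be obtained using other methods to approximate the area B, such as approximating the Lorenz curve with a quadratic function across pairs of intervals or building an appropriately smooth approximation to the underlying distribution function that matches the known data. If the population mean and boundary values for each interval are also known, these can also often be used to improve the accuracy of the approximation.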
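The piecewise-linear approximation can be sketched as follows, assuming the trapezoid form G₁ = 1 − Σ (X_k − X_{k−1})(Y_k + Y_{k−1}) over known Lorenz-curve points. The function name and the quintile shares are invented for illustration and do not describe any real country.

```python
def gini_trapezoid(xs, ys):
    """Approximate G from cumulative population shares xs and income shares ys.

    Both lists must start at 0 and end at 1, and xs must be strictly increasing.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")
    if (xs[0], ys[0]) != (0, 0) or (xs[-1], ys[-1]) != (1, 1):
        raise ValueError("the Lorenz curve must run from (0, 0) to (1, 1)")
    # Each term is twice the area of one trapezoid under the Lorenz curve.
    twice_b = sum((xs[k] - xs[k - 1]) * (ys[k] + ys[k - 1])
                  for k in range(1, len(xs)))
    return 1 - twice_b


# Made-up quintile data: income shares of 5%, 10%, 15%, 25% and 45%.
xs = [0, 0.2, 0.4, 0.6, 0.8, 1]
ys = [0, 0.05, 0.15, 0.30, 0.55, 1]
print(gini_trapezoid(xs, ys))
```

Because straight segments lie on or above a convex Lorenz curve, this estimate tends to understate the true coefficient when only a few points are known.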
The Gini coefficient calculated from a sample is a statistic, and its standard error, or confidence intervals for the population Gini coefficient, should be reported. These can be calculated using bootstrap techniques, which are mathematically complicated and computationally demanding even in an era of fast computers. Economist Tomson Ogwang made the process more efficient by setting up a "trick regression model" in which respective income variables in the sample are ranked, with the lowest income being allocated rank 1. The model then expresses the rank k as the sum of a constant A and a normal error term whose variance is inversely proportional to $y_k$:
$$k = A + N\!\left(0, \frac{s^2}{y_k}\right)$$
Thus, G can be expressed as a function of the weighted least squares estimate of the constant A, and this can be used to speed up the calculation of the jackknife estimate for the standard error. Economist David Giles argued that the standard error of the estimate of A can be used to derive the standard error of the estimate of G directly, without using a jackknife. This method only requires using ordinary least squares regression after ordering the sample data. The results compare favorably with the estimates from the jackknife, with agreement improving with increasing sample size.
However, it has been argued that this depends on the model's assumptions about the error distributions and the independence of error terms. These assumptions are often not valid for real data sets. There is still ongoing debate surrounding this topic.
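For comparison with the regression-based shortcuts, a plain resampling estimate of the standard error can be sketched as follows. This is a basic nonparametric bootstrap, not Ogwang's or Giles's method; the function names, the number of replicates and the data are assumptions chosen for illustration.

```python
import random
import statistics


def gini(values):
    """Rank-weighted Gini coefficient of a list of positive values."""
    y = sorted(values)
    n, total = len(y), sum(y)
    weighted = sum(i * v for i, v in enumerate(y, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


def bootstrap_gini_se(values, n_boot=500, seed=0):
    """Standard deviation of the Gini coefficient over bootstrap resamples."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    n = len(values)
    # Each replicate draws n observations with replacement from the sample.
    replicates = [gini(rng.choices(values, k=n)) for _ in range(n_boot)]
    return statistics.stdev(replicates)


incomes = list(range(1, 51))  # made-up sample: incomes 1, 2, ..., 50
print(gini(incomes), bootstrap_gini_se(incomes))
```

The sketch assumes strictly positive incomes, so that no resample has a zero total. Its cost grows with the number of replicates, which is the computational burden the regression approaches aim to avoid.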
Guillermina Jasso and Angus Deaton independently proposed the following formula for the Gini coefficient:
$$G = \frac{N+1}{N-1} - \frac{2}{N(N-1)\mu}\sum_{i=1}^{N} P_i X_i$$
where $\mu$ is the mean income of the population and $P_i$ is the income rank P of person i, with income X, such that the richest person receives a rank of 1 and the poorest a rank of N. This effectively gives higher weight to poorer people in the income distribution, which allows the Gini to meet the Transfer Principle. Note that the Jasso-Deaton formula rescales the coefficient so that its value is one if all the $X_i$ are zero except one. Note however Allison's reply on the need to divide by N² instead.
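The rank-based formula can be illustrated with a short sketch. It assumes the form G = (N + 1)/(N − 1) − 2 Σ P_i X_i / (N(N − 1)μ), with rank 1 assigned to the richest person; the function name and test values are illustrative.

```python
def gini_jasso_deaton(incomes):
    """Rank-weighted Gini with rank 1 for the richest and rank N for the poorest."""
    n = len(incomes)
    if n < 2:
        raise ValueError("need at least two people")
    mu = sum(incomes) / n
    if mu <= 0:
        raise ValueError("mean income must be positive")
    ranked = sorted(incomes, reverse=True)  # richest first, so index 0 has rank 1
    weighted = sum(rank * x for rank, x in enumerate(ranked, start=1))
    return (n + 1) / (n - 1) - 2 * weighted / (n * (n - 1) * mu)


print(gini_jasso_deaton([5, 5, 5, 5]))   # perfect equality
print(gini_jasso_deaton([0, 0, 0, 10]))  # one person holds everything
print(gini_jasso_deaton([1, 2, 3, 4]))
```

With one person holding all income the result is one (up to rounding), as the rescaling intends. Multiplying the result by (N − 1)/N gives the corresponding value normalised by N² instead.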
FAO explains another version of the formula.