Iterative proportional fitting


The iterative proportional fitting procedure (IPF or IPFP) is the operation of finding the fitted matrix X which is the closest to an initial matrix Z but with the row and column totals of a target matrix Y, the fitted matrix being of the form X = RZS, where R and S are diagonal matrices such that X has the margins of Y. Several algorithms can be chosen to perform biproportion: entropy maximization, information loss minimization, or RAS, which consists of factoring the matrix rows to match the specified row totals, then factoring its columns to match the specified column totals; each step usually disturbs the previous step's match, so these steps are repeated in cycles, re-adjusting the rows and columns in turn, until all specified marginal totals are satisfactorily approximated. All of these algorithms, however, give the same solution.
In three- or more-dimensional cases, adjustment steps are applied for the marginals of each dimension in turn, the steps likewise repeated in cycles.

History

IPF has been "re-invented" many times, the earliest by Kruithof in 1937 in relation to telephone traffic, by Deming and Stephan in 1940 for adjusting census crosstabulations, and by G.V. Sheleikhovskii for traffic, as reported by Bregman.
Early proofs of uniqueness and convergence came from Sinkhorn, Bacharach, Bishop, and Fienberg. Bishop's proof that IPFP finds the maximum likelihood estimator for any number of dimensions extended a 1959 proof by Brown for 2×2×2… cases. Fienberg's proof by differential geometry exploits the method's constant crossproduct ratios for strictly positive tables. Csiszár found necessary and sufficient conditions for general tables having zero entries. Pukelsheim and Simeone
give further results on convergence and error behavior.
An exhaustive treatment of the algorithm and its mathematical foundations can be found in the book by Bishop et al. Idel gives a more recent survey.
Other general algorithms can be modified to yield the same limit as the IPFP, for instance the Newton–Raphson method and
the EM algorithm. In most cases, IPFP is preferred due to its computational speed, low storage requirements, numerical stability and algebraic simplicity.
Applications of IPFP have grown to include trip distribution models (Fratar or Furness) and other applications in transportation planning, survey weighting, synthesis of cross-classified demographic data, adjusting input–output models in economics, estimating expected quasi-independent contingency tables, biproportional apportionment systems of political representation, and a preconditioner in linear algebra.

Biproportion

Biproportion, whatever the algorithm used to solve it, is the following concept: matrix Y and matrix Z are known real nonnegative matrices of dimension n × m; the interior of X is unknown and is sought such that X has the same margins as Y, i.e. x_i• = y_i• ∀i and x_•j = y_•j ∀j (a dot standing for a summed-over index), and such that X is close to Z following a given criterion, the fitted matrix being of the form X = RZS, where R and S are diagonal matrices. The information-loss criterion reads

    min_X Σ_ij x_ij ln(x_ij / z_ij)
    s.t. Σ_j x_ij = y_i•, ∀i and Σ_i x_ij = y_•j, ∀j.

The Lagrangian is

    L = Σ_ij x_ij ln(x_ij / z_ij) − Σ_i λ_i (Σ_j x_ij − y_i•) − Σ_j τ_j (Σ_i x_ij − y_•j).

Thus, setting ∂L/∂x_ij = 0 gives

    ln(x_ij / z_ij) + 1 − λ_i − τ_j = 0, ∀(i, j),

which, after posing r_i = exp(λ_i − 1) and s_j = exp(τ_j), yields

    x_ij = r_i z_ij s_j, ∀(i, j), i.e., X = RZS,

with r_i = y_i• / Σ_j z_ij s_j, ∀i and s_j = y_•j / Σ_i r_i z_ij, ∀j. These two sets of equations form a system that can be solved iteratively:

    r_i ← y_i• / Σ_j z_ij s_j, ∀i and then s_j ← y_•j / Σ_i r_i z_ij, ∀j.
The solution X is independent of the initialization chosen (see Input-Output Analysis: Foundations and Extensions, second edition, Cambridge: Cambridge University Press, pp. 335–336).
Some properties:
Lack of information: if Z brings no information, i.e., z_ij = z ∀(i, j), then X is the independence table x_ij = y_i• · y_•j / y_••.
Idempotency: X = Z if Z already has the same margins as Y.
Composition of biproportions: fitting Z to the margins of Y2, then fitting the result to the margins of Y1, gives the same X as fitting Z directly to the margins of Y1.
Zeros: a zero in Z is projected as a zero in X. Thus, a block-diagonal matrix is projected as a block-diagonal matrix and a triangular matrix is projected as a triangular matrix.
Theorem of separable modifications: if Z is premultiplied by a diagonal matrix and/or postmultiplied by a diagonal matrix, then the solution X is unchanged.
Theorem of "unicity": whatever the (non-specified) algorithm used, if the fitted matrix is of the biproportional form RZS with the margins of Y, then the diagonal factors can always be changed into a standard form, so the fit itself is unique. The demonstration calls on some of the above properties, particularly the theorem of separable modifications and the composition of biproportions.

Algorithm 1 (classical IPF)

Given a two-way (I × J)-table of counts x_ij, we wish to estimate a new table m_ij = a_i b_j x_ij for all i and j such that the marginals satisfy Σ_j m_ij = u_i and Σ_i m_ij = v_j.
Choose initial values m_ij^(0) := x_ij, and for η ≥ 1 set

    m_ij^(2η−1) = m_ij^(2η−2) · u_i / Σ_k m_ik^(2η−2)   (row fitting)
    m_ij^(2η) = m_ij^(2η−1) · v_j / Σ_k m_kj^(2η−1)   (column fitting)

Repeat these steps until row and column totals are sufficiently close to u and v.
Notes:
  • For the RAS form of the algorithm, define the diagonalization operator diag: ℝ^k → ℝ^(k×k), which produces a matrix with its input vector on the main diagonal and zeros elsewhere. Then, for each row adjustment, let R = diag(u_i / Σ_j m_ij), from which M^(2η−1) = R M^(2η−2). Similarly, each column adjustment's S = diag(v_j / Σ_i m_ij), from which M^(2η) = M^(2η−1) S. Reducing the operations to the necessary ones, it can easily be seen that RAS does the same as classical IPF. In practice, one would not implement actual matrix multiplication with the whole R and S matrices; the RAS form is more a notational than a computational convenience.
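The alternating row and column fitting steps can be sketched in a few lines of NumPy (a minimal illustration; the function name `ipf` and the convergence tolerances are arbitrary choices, not part of the original presentation):

```python
import numpy as np

def ipf(x, u, v, tol=1e-6, max_iter=1000):
    """Classical IPFP sketch: alternately rescale rows of the seed
    table x to the target row sums u and columns to the target
    column sums v until both sets of margins (nearly) match."""
    m = np.asarray(x, dtype=float).copy()
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    for _ in range(max_iter):
        m *= (u / m.sum(axis=1))[:, None]   # row fitting step
        m *= (v / m.sum(axis=0))[None, :]   # column fitting step
        if (np.abs(m.sum(axis=1) - u).max() < tol
                and np.abs(m.sum(axis=0) - v).max() < tol):
            break
    return m
```

On small, strictly positive tables such as the 4 × 4 example below, this converges to the targets in well under a hundred cycles.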

Algorithm 2 (factor estimation)

Assume the same setting as in the classical IPFP.
Alternatively, we can estimate the row and column factors separately: choose initial values b_j^(0) := 1, and for η ≥ 1 set

    a_i^(η) = u_i / Σ_j x_ij b_j^(η−1)
    b_j^(η) = v_j / Σ_i a_i^(η) x_ij

Repeat these steps until successive changes of a and b are sufficiently negligible.
Finally, the result matrix is m_ij = a_i^(η) b_j^(η) x_ij.
Notes:
  • The two variants of the algorithm are mathematically equivalent, as can be seen by formal induction. With factor estimation, it is not necessary to actually compute each cycle's table M^(η).
  • The factorization is not unique, since m_ij = a_i b_j x_ij = (γ a_i)(b_j / γ) x_ij for all γ > 0.
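Under the same assumptions, the factor-estimation variant can be sketched by iterating only on the vectors a and b (again an illustrative sketch; the function name and tolerances are arbitrary):

```python
import numpy as np

def ipf_factors(x, u, v, tol=1e-9, max_iter=1000):
    """Factor-estimation sketch: iterate only on the row factors a
    and the column factors b; the fitted table a_i * b_j * x_ij is
    formed once at the end."""
    x = np.asarray(x, dtype=float)
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    a = np.ones(x.shape[0])
    b = np.ones(x.shape[1])
    for _ in range(max_iter):
        a_new = u / (x @ b)          # row factors given current b
        b_new = v / (a_new @ x)      # column factors given new a
        done = (np.abs(a_new - a).max() < tol
                and np.abs(b_new - b).max() < tol)
        a, b = a_new, b_new
        if done:
            break
    return a[:, None] * x * b[None, :]
```

Note that the full table is never materialized inside the loop, which is exactly the saving the factor-estimation variant offers.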

Discussion

The vaguely demanded 'similarity' between M and X can be explained as follows: IPFP maintains the crossproduct ratios, i.e.,

    (m_ij m_hk) / (m_ik m_hj) = (x_ij x_hk) / (x_ik x_hj)   for all i ≠ h, j ≠ k,

since m_ij = a_i b_j x_ij and the factors a_i, a_h, b_j, b_k cancel out of the ratio. This property is sometimes called structure conservation and directly leads to the geometrical interpretation of contingency tables and the proof of convergence in the seminal paper of Fienberg.
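Structure conservation can be checked numerically on a small made-up 2 × 2 table (all numbers below are illustrative, not from the original):

```python
import numpy as np

# Illustrative 2x2 seed table and consistent margin targets.
x = np.array([[40., 30.], [35., 50.]])
u = np.array([100., 80.])   # target row sums (made up)
v = np.array([90., 90.])    # target column sums (made up)

# Run the row/column scaling steps of IPFP.
m = x.copy()
for _ in range(200):
    m *= (u / m.sum(axis=1))[:, None]
    m *= (v / m.sum(axis=0))[None, :]

# The crossproduct (odds) ratio of the seed survives the fitting.
ratio_before = x[0, 0] * x[1, 1] / (x[0, 1] * x[1, 0])
ratio_after = m[0, 0] * m[1, 1] / (m[0, 1] * m[1, 0])
```

The two ratios agree to floating-point precision, even though every cell of m differs from x.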
Direct factor estimation is generally the more efficient way to solve IPF: whereas a form of the classical IPFP needs

    IJ(2 + J) + IJ(2 + I)

elementary operations in each iteration step (one row fitting and one column fitting), factor estimation needs only

    I(1 + J) + J(1 + I)

operations, being at least one order of magnitude faster than classical IPFP.
IPFP can be used to estimate expected quasi-independent contingency tables, where the targets u_i and v_j are the observed margins and the seed table has entries 1 for included cells and 0 for excluded cells. For fully independent contingency tables, estimation with IPFP concludes exactly in one cycle.
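As a toy sketch of the quasi-independence use (the margins below are made up), structural zeros placed in the seed table survive every scaling step:

```python
import numpy as np

# Seed table: 1 for included cells, 0 for excluded cells
# (here the diagonal is excluded by design).
x = np.array([[0., 1., 1.],
              [1., 0., 1.],
              [1., 1., 0.]])
u = np.array([30., 30., 30.])   # target row totals (illustrative)
v = np.array([30., 30., 30.])   # target column totals (illustrative)

m = x.copy()
for _ in range(100):
    m *= (u / m.sum(axis=1))[:, None]   # row fitting
    m *= (v / m.sum(axis=0))[None, :]   # column fitting
```

Multiplicative updates can never turn a zero cell into a nonzero one, so the excluded cells stay exactly zero while the included cells are fitted to the margins.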

Comparison with the NM-method

Similar to the IPF, the NM-method is also an operation of finding a matrix X which is the "closest" to a matrix Z while its row totals and column totals are identical to those of a target matrix Y.
However, there are differences between the NM-method and the IPF. For instance, the NM-method defines closeness of matrices of the same size differently from the IPF. Also, the NM-method was developed to solve for matrix X in problems where matrix Z "is not a sample from the population characterized by the row totals and column totals of matrix [X], but represents another population. In contrast, matrix [Z] is a sample from this population in problems where the IPF is applied as the maximum likelihood estimator".
Macdonald
is at ease with the conclusion by Naszodi
that the IPF is suitable for sampling correction tasks, but not for generation of counterfactuals. Similarly to Naszodi, Macdonald also questions whether the row and column proportional transformations of the IPF preserve the structure of association within a contingency table that allows us to study social mobility.

Existence and uniqueness of MLEs

Necessary and sufficient conditions for the existence and uniqueness of MLEs are complicated in the general case, but sufficient conditions for 2-dimensional tables are simple:
  • the marginals of the observed table do not vanish and
  • the observed table is inseparable.
If unique MLEs exist, IPFP exhibits linear convergence in the worst case, but exponential convergence has also been observed. If a direct estimator exists, IPFP converges after 2 iterations. If unique MLEs do not exist, IPFP converges toward the so-called extended MLEs by design, but convergence may be arbitrarily slow and often computationally infeasible.
If all observed values are strictly positive, existence and uniqueness of MLEs and therefore convergence is ensured.

Example

Consider the following table, given with the row- and column-sums and targets.
             1       2       3       4   TOTAL  TARGET
1           40      30      20      10     100     150
2           35      50     100      75     260     300
3           30      80      70     120     300     400
4           20      30      40      50     140     150
TOTAL      125     190     230     255     800
TARGET     200     300     400     100            1000

For executing the classical IPFP, we first adjust the rows:
             1       2       3       4    TOTAL  TARGET
1        60.00   45.00   30.00   15.00   150.00     150
2        40.38   57.69  115.38   86.54   300.00     300
3        40.00  106.67   93.33  160.00   400.00     400
4        21.43   32.14   42.86   53.57   150.00     150
TOTAL   161.81  241.50  281.58  315.11  1000.00
TARGET     200     300     400     100             1000

The first step exactly matched row sums, but not the column sums. Next we adjust the columns:
             1       2       3       4    TOTAL  TARGET
1        74.16   55.90   42.62    4.76   177.44     150
2        49.92   71.67  163.91   27.46   312.96     300
3        49.44  132.50  132.59   50.78   365.31     400
4        26.49   39.93   60.88   17.00   144.30     150
TOTAL   200.00  300.00  400.00  100.00  1000.00
TARGET     200     300     400     100             1000

Now the column sums exactly match their targets, but the row sums no longer match theirs. After completing three cycles, each with a row adjustment and a column adjustment, we get a closer approximation:
             1       2       3       4    TOTAL  TARGET
1        64.61   46.28   35.42    3.83   150.13     150
2        49.95   68.15  156.49   25.37   299.96     300
3        56.70  144.40  145.06   53.76   399.92     400
4        28.74   41.18   63.03   17.03   149.99     150
TOTAL   200.00  300.00  400.00  100.00  1000.00
TARGET     200     300     400     100             1000
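The tables above can be reproduced with a short NumPy script (a sketch; the variable names are arbitrary):

```python
import numpy as np

# Initial table from the example, with its margin targets.
x = np.array([[40., 30., 20., 10.],
              [35., 50., 100., 75.],
              [30., 80., 70., 120.],
              [20., 30., 40., 50.]])
u = np.array([150., 300., 400., 150.])   # target row sums
v = np.array([200., 300., 400., 100.])   # target column sums

m = x.copy()
for _ in range(3):                        # three cycles, as in the text
    m *= (u / m.sum(axis=1))[:, None]     # row adjustment
    m *= (v / m.sum(axis=0))[None, :]     # column adjustment
```

After the loop, `np.round(m, 2)` matches the last table above: the column sums are exact and the row sums are within about 0.13 of their targets.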