NM-method

The NM-method or Naszodi–Mendonca method is an operation that can be applied in statistics, econometrics, economics, sociology, and demography to construct counterfactual contingency tables. The method finds the matrix $X$ which is "closest" to the seed matrix $Z$ in the sense of being ranked the same, but with the row and column totals of a target matrix $Y$ (all three matrices being of size $n \times m$). While the row totals and column totals of $Y$ are known, matrix $Y$ itself may not be known.
Since the solution for matrix $X$ is unique, the NM-method is a function: $X = \text{NM}(Z, Ye_m^T, e_nY)$, where $e_n$ is a row vector of ones of size $1 \times n$, while $e_m^T$ is a column vector of ones of size $m \times 1$.
The NM-method was developed by Naszodi and Mendonca to solve for matrix $X$ in problems where matrix $Z$ is not a sample from the population characterized by the row totals and column totals of matrix $Y$, but represents another population.
Their application aimed at quantifying intergenerational changes in the strength of educational homophily and thus measuring the historical change in social inequality between different educational groups in the US between 1980 and 2010. The trend in inequality was found to be U-shaped, supporting the view that with appropriate social and economic policies inequality can be reduced.

Definition of matrix ranking

The closeness between two matrices of the same size can be defined in several ways. The Euclidean distance and the Kullback–Leibler divergence are two well-known examples.
The NM-method is consistent with a definition relying on the ordinal Liu–Lu index, which is a slightly modified version of the Coleman index defined in Coleman. According to this definition, matrix $X$ is "closest" to matrix $Z$ if their Liu–Lu values are the same, in other words, if they are ranked the same by the ordinal Liu–Lu index.
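If $Z$ is a 2×2 matrix, its scalar-valued Liu–Lu index is defined as
$\text{LL}(Z) = \frac{Z_{1,1} - Q^{-}}{\min(Z_{1,.}, Z_{.,1}) - Q^{-}}$, where $Z_{1,.} = Z_{1,1} + Z_{1,2}$ is the first row total, $Z_{.,1} = Z_{1,1} + Z_{2,1}$ is the first column total, $Z_{.,.}$ is the grand total, $Q = Z_{1,.}Z_{.,1}/Z_{.,.}$, and $Q^{-}$ is the largest integer not exceeding $Q$.
Following Coleman, this index is interpreted as the “actual minus expected over maximum minus minimum”, where $Z_{1,1}$ is the actual value of the $(1,1)$ entry of the seed matrix $Z$; $Q$ is its expected value under the counterfactual assumptions that the corresponding row total and column total of $Z$ are predetermined, while its interior is random. Also, $Q^{-}$ is its minimum value if the association between the row variable and the column variable of $Z$ is non-negative. Finally, $\min(Z_{1,.}, Z_{.,1})$ is the maximum value of $Z_{1,1}$ for given row total $Z_{1,.}$ and column total $Z_{.,1}$.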
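As a minimal sketch of the scalar-valued index, the following Python function evaluates "actual minus expected over maximum minus minimum" for a 2×2 table, taking the integer part of the expected value as the minimum (the non-negative association case). The function name `liu_lu_index` and the example table are illustrative assumptions, not taken from the cited papers.

```python
import math

def liu_lu_index(table):
    """Scalar-valued Liu-Lu index of a 2x2 table (non-negative association case)."""
    (a, b), (c, d) = table
    row1 = a + b             # first row total
    col1 = a + c             # first column total
    total = a + b + c + d    # grand total
    q = row1 * col1 / total  # expected (1,1) entry if the interior were random
    q_floor = math.floor(q)  # integer part of the expected value
    return (a - q_floor) / (min(row1, col1) - q_floor)

# Hypothetical 2x2 table, chosen only for illustration.
print(liu_lu_index([[40, 10], [20, 30]]))  # 0.5
```

Here the expected value is 50 × 60 / 100 = 30 and the maximum is 50, so the index is (40 − 30) / (50 − 30) = 0.5.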
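For a matrix $Z$ of size $n \times m$, the Liu–Lu index was generalized by Naszodi and Mendonca to a matrix-valued index. One of the preconditions for the generalization is that the row variable and the column variable of matrix $Z$ have to be ordered. Equating the generalized, matrix-valued Liu–Lu index of $Z$ with that of matrix $X$ is equivalent to dichotomizing their ordered row variable and ordered column variable in $(n-1) \times (m-1)$ ways by exploiting the ordered nature of the row and column variables, and then equating the original, scalar-valued Liu–Lu indices of the 2×2 matrices obtained with the dichotomizations. I.e., for any pair $(i,j)$ with $i \in \{1, \dots, n-1\}$ and $j \in \{1, \dots, m-1\}$, the restriction $\text{LL}(V_i Z W_j^T) = \text{LL}(V_i X W_j^T)$ is imposed, where $V_i$ is the $2 \times n$ matrix $V_i = \begin{bmatrix} e_i & 0 \\ 0 & e_{n-i} \end{bmatrix}$ with its block of ones $e_i$ being of size $1 \times i$, and its block of ones $e_{n-i}$ being of size $1 \times (n-i)$. Similarly, $W_j^T$ is the $m \times 2$ matrix given by the transpose of $W_j = \begin{bmatrix} e_j & 0 \\ 0 & e_{m-j} \end{bmatrix}$ with its $e_j$ being of size $1 \times j$, and its $e_{m-j}$ being of size $1 \times (m-j)$.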
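The dichotomization can be sketched in a few lines of Python. The helper below collapses an n×m table into the 2×2 table obtained by splitting the ordered row categories after the i-th one and the ordered column categories after the j-th one. The name `dichotomize` and the 3×3 table are hypothetical and serve only as an illustration.

```python
def dichotomize(table, i, j):
    """Collapse an n x m table into 2x2: rows 1..i vs i+1..n, columns 1..j vs j+1..m."""
    n, m = len(table), len(table[0])

    def block(rows, cols):
        # Sum of the cells in the given row and column ranges.
        return sum(table[r][c] for r in rows for c in cols)

    top, bottom = range(0, i), range(i, n)
    left, right = range(0, j), range(j, m)
    return [[block(top, left), block(top, right)],
            [block(bottom, left), block(bottom, right)]]

# Hypothetical 3x3 table with ordered categories.
table = [[30, 10, 5], [10, 40, 10], [5, 10, 30]]
# All (n-1) x (m-1) = 4 dichotomizations of the 3x3 table.
all_cuts = {(i, j): dichotomize(table, i, j) for i in range(1, 3) for j in range(1, 3)}
print(all_cuts[(1, 2)])  # [[40, 5], [65, 40]]
```

Each collapsed table keeps the grand total of the original table, so the scalar-valued index can be applied to it directly.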

Constraints on the row totals and column totals

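Matrix $X$ should satisfy not only the ranking restriction $\text{LL}(V_i Z W_j^T) = \text{LL}(V_i X W_j^T)$ for all pairs $(i,j)$ but also the pair of constraints on its row totals and column totals: $Xe_m^T = Ye_m^T$ and $e_nX = e_nY$.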

Solution

Assuming that $\text{LL}(V_i Z W_j^T) \geq 0$ for all pairs $(i,j)$, the solution for $X$ is unique, deterministic, and given by a closed-form formula.
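For matrices $Z$ and $X$ of size 2×2, the solution is
$X_{1,1} = \frac{Z_{1,1} - Q_Z^{-}}{\min(Z_{1,.}, Z_{.,1}) - Q_Z^{-}} \left[\min(Y_{1,.}, Y_{.,1}) - Q_Y^{-}\right] + Q_Y^{-}$, where $Q_Z^{-}$ and $Q_Y^{-}$ are the largest integers not exceeding $Z_{1,.}Z_{.,1}/Z_{.,.}$ and $Y_{1,.}Y_{.,1}/Y_{.,.}$, respectively.
The other 3 cells of $X$ are uniquely determined by the row totals and column totals. So, this is how the NM-method works for 2×2 seed tables.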
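The 2×2 case can be sketched as follows: compute the Liu–Lu index of the seed, invert the same formula at the target margins to get the (1,1) cell, then fill the other three cells from the margins. The function name `nm_2x2` and the numbers are hypothetical, and the sketch covers only the non-negative association case.

```python
import math

def nm_2x2(seed, target_rows, target_cols):
    """Sketch of the NM transformation of a 2x2 seed to given target margins."""
    (a, b), (c, d) = seed
    z_row1, z_col1, z_total = a + b, a + c, a + b + c + d
    q_z = math.floor(z_row1 * z_col1 / z_total)
    ll = (a - q_z) / (min(z_row1, z_col1) - q_z)   # Liu-Lu index of the seed
    if ll < 0:
        raise ValueError("this sketch covers only the non-negative association case")
    y_row1, y_col1, y_total = target_rows[0], target_cols[0], sum(target_rows)
    q_y = math.floor(y_row1 * y_col1 / y_total)
    x11 = ll * (min(y_row1, y_col1) - q_y) + q_y   # same index at the target margins
    x12 = y_row1 - x11                             # remaining cells follow from the margins
    x21 = y_col1 - x11
    x22 = target_rows[1] - x21
    return [[x11, x12], [x21, x22]]

# Hypothetical seed and target margins.
print(nm_2x2([[40, 10], [20, 30]], [120, 80], [100, 100]))
# [[80.0, 40.0], [20.0, 60.0]]
```

The seed has index 0.5, and the result has index (80 − 60) / (100 − 60) = 0.5 as well, while its margins equal the targets.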
For matrices $Z$ and $X$ of size $n \times m$, the solution is obtained by dichotomizing their ordered row variable and ordered column variable in all possible meaningful ways before solving $(n-1)(m-1)$ problems of 2×2 form. Each problem is defined for an $(i,j)$ pair with the collapsed seed $V_i Z W_j^T$ and the target row totals and column totals $V_i Y e_m^T$ and $e_n Y W_j^T$, respectively. Each problem is to be solved separately by the formula for $X_{1,1}$. The set of solutions determines $(n-1)(m-1)$ entries of matrix $X$. Its remaining $n+m-1$ elements are uniquely determined by the target row totals and column totals.
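The general case can be sketched by solving each 2×2 problem for the cumulative count in the first i rows and first j columns, and then differencing these cumulative counts to recover the individual cells. This is an illustrative sketch under stated assumptions (non-negative association in every dichotomization, target row and column totals with the same grand total); the function name, seed, and targets are hypothetical, and it is not one of the published implementations.

```python
import math

def nm_method(seed, target_rows, target_cols):
    """Sketch of the NM transformation for an n x m seed with ordered categories."""
    n, m = len(seed), len(seed[0])
    z_total = sum(sum(row) for row in seed)
    y_total = sum(target_rows)
    # cum[i][j]: counterfactual count in the first i rows and first j columns.
    cum = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            y_row, y_col = sum(target_rows[:i]), sum(target_cols[:j])
            if i == n or j == m:
                # On the last row or column the block is fixed by a target margin.
                cum[i][j] = min(y_row, y_col)
                continue
            z11 = sum(seed[r][c] for r in range(i) for c in range(j))
            z_row = sum(sum(seed[r]) for r in range(i))
            z_col = sum(seed[r][c] for r in range(n) for c in range(j))
            q_z = math.floor(z_row * z_col / z_total)
            ll = (z11 - q_z) / (min(z_row, z_col) - q_z)
            if ll < 0:
                raise ValueError("negative association is outside this sketch")
            q_y = math.floor(y_row * y_col / y_total)
            cum[i][j] = ll * (min(y_row, y_col) - q_y) + q_y
    # Recover the cells by differencing the cumulative counts.
    return [[cum[i][j] - cum[i - 1][j] - cum[i][j - 1] + cum[i - 1][j - 1]
             for j in range(1, m + 1)] for i in range(1, n + 1)]

seed = [[30, 10, 5], [10, 40, 10], [5, 10, 30]]   # hypothetical seed table
x = nm_method(seed, [60, 60, 30], [50, 50, 50])   # hypothetical target margins
for row in x:
    print([round(v, 2) for v in row])
```

With a 3×3 seed, four 2×2 problems are solved, and the remaining five cells follow from the target margins. Feeding the seed's own margins as targets returns the seed itself.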
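Next, let us see how the NM-method works if matrix $Z$ is such that the second precondition of $\text{LL}(V_i Z W_j^T) \geq 0$ is not met for all pairs $(i,j)$.
If $\text{LL}(V_i Z W_j^T) < 0$ for all pairs $(i,j)$, the solution for $X$ is also unique, deterministic, and given by a closed-form formula. However, the corresponding concept of matrix ranking is slightly different from the one discussed above. Liu and Lu define it as $\text{LL}^{-}(Z) = \frac{Z_{1,1} - Q^{+}}{Q^{+} - \max(0, Z_{.,1} - Z_{2,.})}$, where $Q = Z_{1,.}Z_{.,1}/Z_{.,.}$; $Q^{+}$ is the smallest integer being larger than or equal to $Q$.
Finally, neither the NM-method nor the matrix-valued Liu–Lu index is defined if there exists a pair $(i,j)$ such that $\text{LL}(V_i Z W_j^T) \geq 0$, while $\text{LL}(V_k Z W_l^T) < 0$ for another pair $(k,l)$.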

A numerical example

Consider the following seed matrix $Z$ complemented with its row totals and column totals and the targets, i.e., the row totals $Ye_m^T$ and the column totals $e_nY$ of matrix $Y$:

Z	1	2	3	4	TOTAL	TARGET
1					240	400
2					235	300
3					185	150
4					140	150
TOTAL	210	230	185	175	800
TARGET	400	300	200	100		1,000

As a first step of the NM-method, matrix $Z$ is multiplied by the $V_i$ and $W_j^T$ matrices for each pair $(i,j)$. This yields 9 matrices of size 2×2, $V_i Z W_j^T$, each with its own target row totals and column totals.

The next step is to calculate the generalized matrix-valued Liu–Lu index, by applying the formula of the original scalar-valued Liu–Lu index to each of the 9 matrices:


0.39	0.54	0.62
0.53	0.44	0.47
0.73	0.61	0.45

All entries of the matrix-valued index are positive. Therefore, the NM-method is defined. Solving each of the 9 problems of the 2×2 form yields 9 entries of matrix $X$. Its other 7 entries are uniquely determined by the target row totals and column totals. The solution for $X$ is:

	1	2	3	4	TOTAL
1	253.1	91.4	40.5	15.1	400
2	91.1	147.1	39.8	21.9	300
3	39.6	36.8	64.2	9.3	150
4	16.2	24.7	55.5	53.6	150
TOTAL	400	300	200	100	1,000
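Because the NM-method keeps the generalized Liu–Lu index of the seed unchanged, the index of the solution table should reproduce the 3×3 index matrix reported above. The sketch below checks this, using the margins from the TOTAL row and column (the cell entries are rounded to one decimal) and assuming that the index uses the integer part of the expected value. The function name `generalized_ll` is hypothetical.

```python
import math

solution = [
    [253.1, 91.4, 40.5, 15.1],
    [91.1, 147.1, 39.8, 21.9],
    [39.6, 36.8, 64.2, 9.3],
    [16.2, 24.7, 55.5, 53.6],
]
row_totals = [400, 300, 150, 150]
col_totals = [400, 300, 200, 100]
reported = [[0.39, 0.54, 0.62], [0.53, 0.44, 0.47], [0.73, 0.61, 0.45]]

def generalized_ll(table, row_totals, col_totals):
    """Matrix-valued Liu-Lu index: one scalar index per dichotomization (i, j)."""
    n, m = len(table), len(table[0])
    total = sum(row_totals)
    index = []
    for i in range(1, n):
        row = []
        for j in range(1, m):
            x11 = sum(table[r][c] for r in range(i) for c in range(j))
            r_tot, c_tot = sum(row_totals[:i]), sum(col_totals[:j])
            q = math.floor(r_tot * c_tot / total)
            row.append((x11 - q) / (min(r_tot, c_tot) - q))
        index.append(row)
    return index

computed = generalized_ll(solution, row_totals, col_totals)
for row in computed:
    print([round(v, 3) for v in row])
```

Each computed entry agrees with the reported index to within 0.01, which is the precision at which the index is reported.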

Another numerical example taken from Abbott et al. (2019)

Consider the following seed matrix $Z$ complemented with its row totals and column totals and the targets, i.e., the row totals $Ye_m^T$ and the column totals $e_nY$ of matrix $Y$:

Z	1	2	3	TOTAL	TARGET
1				1,360	1,600
2				5,840	5,900
3				2,800	2,500
TOTAL	1,390	5,670	2,940	10,000
TARGET	1,390	5,670	2,940		10,000

As a first step of the NM-method, matrix $Z$ is multiplied by the $V_i$ and $W_j^T$ matrices for each pair $(i,j)$. This yields 4 matrices of size 2×2, $V_i Z W_j^T$, each with its own target row totals and column totals.

The next step is to calculate the generalized matrix-valued Liu–Lu index, by applying the formula of the original scalar-valued Liu–Lu index to each of the 4 matrices:


	0.75	0.95
	0.95	0.78

All entries of the matrix-valued index are positive. Therefore, the NM-method is defined. Solving each of the 4 problems of the 2×2 form yields 4 entries of matrix $X$. Its other 5 entries are uniquely determined by the target row totals and column totals. The solution for $X$ is:

	1	2	3	TOTAL
1	1,101	476	24	1,600
2	271	4,819	809	5,900
3	18	375	2,107	2,500
TOTAL	1,390	5,670	2,940	10,000

Implementation

The NM-method is implemented in Excel, Visual Basic, R, and also in Stata.

Applications

The NM-method can be applied to study various phenomena including assortative mating, intergenerational mobility as a type of social mobility, residential segregation, recruitment and talent management.
In all of these applications, matrices $Z$, $Y$, and $X$ represent joint distributions of one-to-one matched entities characterized either by a dichotomous categorical variable, or an ordered multinomial categorical variable. Although the NM-method has a wide range of applicability, all the examples presented next are about assortative mating along the education level. In these applications, it is not disputed that the two preconditions are met.
Assume that matrix $Z$ characterizes the joint educational distribution of husbands and wives in Zimbabwe, while matrix $Y$ characterizes the same in Yemen. Matrix $X$, to be constructed with the NM-method, tells us what the joint educational distribution of couples in Zimbabwe would be if the educational distributions of husbands and wives were the same as in Yemen, while the overall desire for homogamy were unchanged.
In a second application, matrices $Z$ and $Y$ characterize the same country in two different years. Matrix $Z$ is the joint educational distribution of American newlyweds in 2040, where the husbands are from Generation Z and are young adults when observed. Matrix $Y$ is the same but for Generation Y observed in 2024. By constructing matrix $X$, one can study in the future what the educational distribution among newly married young American couples would be if they sorted into marriages the same way as the males in Generation Z and their partners do, while the education levels were the same as among the males in Generation Y and their partners.
In a third application, matrices $Z$ and $Y$ again characterize the same country in two different years. In this application, matrix $Z$ is the joint educational distribution of Portuguese young couples in 2011, and $Y$ is the same but observed in 1981. One may aim to construct matrix $X$ in order to study what the educational distribution of Portuguese young couples would have been if they had sorted into marriages like their peers did in 2011, while their gender-specific educational distributions were the same as in 1981.
In each of the first two applications, matrix $X$ represents a counterfactual joint distribution. It can be used to quantify certain ceteris paribus effects. More precisely, it can be used to quantify on a cardinal scale the difference between the directly unobservable degree of marital sorting in Zimbabwe and Yemen, or in Generation Z and Generation Y, with a counterfactual decomposition. For the decomposition, the counterfactual table is used to calculate the contribution of each of the driving forces, and that of their interaction, to an observable cardinal-scaled statistic.
The third application was used by Naszodi and Mendonca as an example of a nonsensical counterfactual: the education level changed so drastically in Portugal over the three decades studied that this counterfactual is impossible to obtain. Surprisingly, a method that was commonly used in the assortative mating literature until recently returns a seemingly valid solution for the impossible counterfactual in the third example, while the NM-method refuses to construct it.

Some features of the NM-method

First, the NM-method does not yield a meaningful solution if it reaches the limit of its applicability. For instance, in the third application, the NM-method signals with a negative entry in matrix $X$ that the counterfactual is impossible. In this respect, the NM-method is similar to the linear probability model, which signals the same with a predicted probability outside the unit interval.
Second, the NM-method commutes with merging neighboring categories of the row variable and that of the column variable: $\text{NM}(MZ, MYe_m^T, e_nY) = M\,\text{NM}(Z, Ye_m^T, e_nY)$, where $M$ is the row merging matrix of size $(n-1) \times n$; and $\text{NM}(ZC, Ye_m^T, e_nYC) = \text{NM}(Z, Ye_m^T, e_nY)\,C$, where $C$ is the column merging matrix of size $m \times (m-1)$.
Third, the NM-method works even if there are zero entries in matrix $Z$.
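The commutation property can be illustrated numerically. The sketch below merges two neighboring row categories either before or after applying an NM transformation and compares the two results. The `nm_method` function is an illustrative sketch (non-negative association only, consistent target totals), and the seed and targets are hypothetical.

```python
import math

def nm_method(seed, target_rows, target_cols):
    """Sketch of the NM transformation for an n x m seed with ordered categories."""
    n, m = len(seed), len(seed[0])
    z_total = sum(sum(row) for row in seed)
    y_total = sum(target_rows)
    # cum[i][j]: counterfactual count in the first i rows and first j columns.
    cum = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            y_row, y_col = sum(target_rows[:i]), sum(target_cols[:j])
            if i == n or j == m:
                cum[i][j] = min(y_row, y_col)  # fixed by a target margin
                continue
            z11 = sum(seed[r][c] for r in range(i) for c in range(j))
            z_row = sum(sum(seed[r]) for r in range(i))
            z_col = sum(seed[r][c] for r in range(n) for c in range(j))
            q_z = math.floor(z_row * z_col / z_total)
            ll = (z11 - q_z) / (min(z_row, z_col) - q_z)
            if ll < 0:
                raise ValueError("negative association is outside this sketch")
            q_y = math.floor(y_row * y_col / y_total)
            cum[i][j] = ll * (min(y_row, y_col) - q_y) + q_y
    return [[cum[i][j] - cum[i - 1][j] - cum[i][j - 1] + cum[i - 1][j - 1]
             for j in range(1, m + 1)] for i in range(1, n + 1)]

def merge_rows(table, k):
    """Merge row k with row k+1 (0-based) by adding them cell by cell."""
    merged = [a + b for a, b in zip(table[k], table[k + 1])]
    return table[:k] + [merged] + table[k + 2:]

seed = [[30, 10, 5], [10, 40, 10], [5, 10, 30]]    # hypothetical seed
rows, cols = [60, 60, 30], [50, 50, 50]            # hypothetical targets

merge_then_nm = nm_method(merge_rows(seed, 1), [60, 90], cols)
nm_then_merge = merge_rows(nm_method(seed, rows, cols), 1)
gap = max(abs(a - b) for ra, rb in zip(merge_then_nm, nm_then_merge)
          for a, b in zip(ra, rb))
print(gap < 1e-9)  # True
```

The two routes agree because the dichotomizations of the merged table are a subset of those of the original table.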

Comparison with the IPF

The iterative proportional fitting procedure (IPF) is also a function: $X^{\text{IPF}} = \text{IPF}(Z, Ye_m^T, e_nY)$. It is the operation of finding the fitted matrix $X^{\text{IPF}}$ which fulfills a set of conditions similar to those met by matrix $X$ constructed with the NM-method. E.g., matrix $X^{\text{IPF}}$ is the closest to matrix $Z$ but with the row and column totals of the target matrix $Y$.
However, there are differences between the IPF and the NM-method. The IPF defines closeness of matrices of the same size by the cross-entropy, or the Kullback–Leibler divergence. Accordingly, the IPF-compatible concept of distance between the 2×2 matrices $X^{\text{IPF}}$ and $Z$ is zero if their cross-product ratios are the same: $\frac{X^{\text{IPF}}_{1,1}X^{\text{IPF}}_{2,2}}{X^{\text{IPF}}_{1,2}X^{\text{IPF}}_{2,1}} = \frac{Z_{1,1}Z_{2,2}}{Z_{1,2}Z_{2,1}}$. To recall, the NM-method's condition for equal ranking of matrices $X$ and $Z$ is $\text{LL}(X) = \text{LL}(Z)$.
The following numerical example highlights that the IPF and the NM-method are not identical: $\text{IPF}(Z, Ye_m^T, e_nY) \neq \text{NM}(Z, Ye_m^T, e_nY)$. Consider the seed matrix $Z$ with its row totals, column totals, and targets:

	1	2	TOTAL	TARGET
1			600	1,050
2			400	450
TOTAL	500	500	1,000
TARGET	1,000	500		1,500

The NM-method yields the following matrix $X$:

	1	2	TOTAL
1	925	125	1,050
2	75	375	450
TOTAL	1,000	500	1,500

Whereas the solution obtained with the IPF, matrix $X^{\text{IPF}}$, is:

	1	2	TOTAL
1	900	150	1,050
2	100	350	450
TOTAL	1,000	500	1,500
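The two solutions can be reproduced with a short sketch. The seed entries used here, [[450, 150], [50, 350]], are an assumption: they are not shown in the table above, but they are the values consistent with the stated seed margins and with both reported solutions. The function names are hypothetical, and the IPF loop uses a fixed number of iterations for simplicity.

```python
import math

def ipf(seed, target_rows, target_cols, iterations=200):
    """Iterative proportional fitting: alternate row and column rescaling."""
    table = [[float(v) for v in row] for row in seed]
    for _ in range(iterations):
        for r, target in enumerate(target_rows):
            scale = target / sum(table[r])
            table[r] = [v * scale for v in table[r]]
        for c, target in enumerate(target_cols):
            scale = target / sum(row[c] for row in table)
            for row in table:
                row[c] *= scale
    return table

def nm_x11(seed, target_rows, target_cols):
    """(1,1) cell of the NM solution for a 2x2 seed (non-negative association)."""
    (a, b), (c, d) = seed
    z_row1, z_col1, z_total = a + b, a + c, a + b + c + d
    q_z = math.floor(z_row1 * z_col1 / z_total)
    ll = (a - q_z) / (min(z_row1, z_col1) - q_z)
    q_y = math.floor(target_rows[0] * target_cols[0] / sum(target_rows))
    return ll * (min(target_rows[0], target_cols[0]) - q_y) + q_y

seed = [[450, 150], [50, 350]]   # assumed seed, inferred from margins and solutions
fitted = ipf(seed, [1050, 450], [1000, 500])
print(round(fitted[0][0]), nm_x11(seed, [1050, 450], [1000, 500]))  # 900 925.0
```

The IPF result keeps the cross-product ratio of the assumed seed (450 × 350 / (150 × 50) = 21 = 900 × 350 / (150 × 100)), whereas the NM result keeps its Liu–Lu index (0.75).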

The IPF is equivalent to the maximum likelihood estimator of a joint population distribution, where matrix $X^{\text{IPF}}$ is calculated from matrix $Z$, the observed joint distribution in a random sample taken from the population characterized by the row totals and column totals of matrix $Y$. In contrast to the problem solved by the IPF, matrix $Z$ is not sampled from this population in the problem that the NM-method was developed to solve. In fact, in the NM-problem, matrices $Z$ and $Y$ characterize two different populations. This difference facilitates the choice between the NM-method and the IPF in empirical applications.
Deming and Stephan, the inventors of the IPF, illustrated the application of their method on a classic maximum likelihood estimation problem, where matrix $Z$ was sampled from the population characterized by the row totals and column totals of matrix $Y$. They were aware that, in general, the IPF is not suitable for counterfactual predictions: they explicitly warned that their algorithm is “not by itself useful for prediction”.
In addition, the domains for which the IPF and the NM-method yield solutions are different. First, unlike the NM-method, the IPF does not provide a solution for all seed tables with zero entries.
Second, the precondition of the NM-method is not a precondition for the applicability of the IPF. Third, unlike the NM-method, the IPF provides a seemingly meaningful solution for pairs of matrices $Z$ and $Y$ defining impossible counterfactuals, such as the pair of matrices in the third application with Portugal.
Finally, unlike the NM-method, the IPF does not commute with the operation of merging neighboring categories of the row variable and that of the column variable, as illustrated with a numerical example in Naszodi.
For this reason, the transformed table obtained with the IPF can be sensitive to the choice of the number of trait categories.
Kenneth Macdonald is at ease with the conclusion by Naszodi that the IPF is suitable for sampling correction tasks, but not for the generation of counterfactuals. Similarly to Naszodi, Macdonald also questions whether the row and column proportional transformations of the IPF preserve the structure of association within a contingency table that allows us to study social mobility.

Comparison with the Minimum Euclidean Distance Approach

The Minimum Euclidean Distance Approach (MEDA) is also a function: $X^{\text{MEDA}} = \text{MEDA}(Z, Ye_m^T, e_nY)$.
First, the MEDA assigns a scalar to matrix $Z$: it is the weight used for constructing the convex combination of two extreme cases that minimizes the Euclidean distance to $Z$. A value of this scalar can be computed, for instance, for the numerical example taken from Abbott et al. (2019).
Second, for any pair of counterfactual marginal distributions, the MEDA constructs the convex combination of the two extreme cases with the same scalar as the weight.
Differences between the NM-method and the MEDA:
While the NM-method holds the assortativeness unchanged by keeping the generalized matrix-valued Liu–Lu index fixed, the MEDA does the same by keeping the scalar fixed.
For matrices $Z$ and $X$ of size 2×2, the two methods produce the same transformed table, provided the MEDA scalar ranks the contingency tables the same as the scalar-valued Liu–Lu index does.
However, for matrices larger than 2×2, the generalized Liu–Lu index is matrix-valued, so it is different from the scalar used by the MEDA.
Therefore, the NM-transformed table is also different from the MEDA-transformed table.
For instance, in the numerical example taken from Abbott et al. (2019), the counterfactual table constructed by the MEDA is the matrix $X^{\text{MEDA}}$:

	1	2	3	TOTAL
1	1,081	240	279	1,600
2	217	5,054	629	5,900
3	92	376	2,032	2,500
TOTAL	1,390	5,670	2,940	10,000

The difference between the NM-constructed matrix $X$ and the MEDA-constructed matrix $X^{\text{MEDA}}$ is not negligible. E.g., the share of homogamous couples is 2 percentage points smaller in the MEDA-constructed counterfactual matrix than in the observed matrix $Z$, whereas it is 3.4 percentage points smaller in the NM-constructed counterfactual matrix relative to $Z$.
Because Abbott's example is not fictional but is based on the empirical educational distribution of American couples, the difference between 2 percentage points and 3.4 percentage points means that the MEDA quantifies the change in inequality from one generation to the next to be significantly smaller than the NM-method does.
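The gap between the two counterfactual tables can be checked directly from the tables reported above. The sketch below assumes that "homogamous couples" are the couples on the main diagonal (spouses in the same education category); the function name `diagonal_share` is hypothetical.

```python
# Counterfactual tables as reported in this article (Abbott et al. example).
nm_table = [[1101, 476, 24], [271, 4819, 809], [18, 375, 2107]]
meda_table = [[1081, 240, 279], [217, 5054, 629], [92, 376, 2032]]

def diagonal_share(table):
    """Share of couples on the main diagonal, i.e. with equal categories."""
    total = sum(sum(row) for row in table)
    return sum(table[k][k] for k in range(len(table))) / total

nm_share = diagonal_share(nm_table)
meda_share = diagonal_share(meda_table)
print(round(100 * nm_share, 2), round(100 * meda_share, 2))  # 80.27 81.67
print(round(100 * (meda_share - nm_share), 2))               # 1.4
```

The 1.4 percentage point gap between the two tables matches the difference between the 3.4 and 2 percentage point reductions quoted above.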