Understanding the Relationship Between Leverage and Mahalanobis Distance

Data Science
Mathematics
Statistics
Author

Lam Fu Yuan, Kevin

Published

December 15, 2022

In Multiple Linear Regression, it is useful to detect the presence of outliers in the sample. An indicator of the presence of outliers is the leverage (McCullagh & Nelder, 1989, p. 405). The leverage is a measure of the distance between an observation in the sample and the sample mean vector (p. 405). Given that the leverage is a measure of distance, it is perhaps unsurprising that it is related to another measure of distance known as the Mahalanobis distance (Mahalanobis, 1936). In this post, I prove the following mathematical relationship between the leverage and the Mahalanobis distance:

\[ d_{i}^{2}=(n-1)\left( h_{ii} - \frac{1}{n} \right) \] where \(d_{i}^{2}\) is the square of the Mahalanobis distance between the \(i\)-th observation and the sample mean vector, and \(h_{ii}\) is the leverage of the \(i\)-th observation, for \(i=1,2,…,n\).

Notations

Before we proceed to prove the mathematical relationship between the leverage and the Mahalanobis distance, it is useful to introduce the notation that will be used in the proof.

Sample. Let \(X=(x_{1}, x_{2}, …, x_{n})^{T}\) be an \(n \times p\) matrix which represents a sample of \(n\) observations across \(p\) covariates. In \(X\), the \(i\)-th row represents the \(i\)-th observation and the \(j\)-th column represents the \(j\)-th covariate, for \(i=1,2,…,n\) and \(j=1,2,…,p\).

Sample Means. Let \(\mu=(\mu_{1},\mu_{2},…,\mu_{p})^{T}=\frac{1}{n}X^{T}J_{n,1}\) be a column vector with \(p\) elements which represents the sample mean vector, where \(J_{n,1}\) is a column vector with \(n\) \(1\)s.

Sample Covariances. Let \(\Sigma=\frac{1}{n-1}(X-J_{n,1}\mu^{T})^{T}(X-J_{n,1}\mu^{T})\) be a \(p \times p\) matrix which represents the sample covariance matrix.

Model Matrix. Let \((J_{n,1} \quad X)=((1 \quad x_{1}^{T})^{T}, (1 \quad x_{2}^{T})^{T}, …, (1 \quad x_{n}^{T})^{T})^{T}\) be an \(n \times (p+1)\) matrix which represents the model matrix which includes an intercept in addition to the \(p\) covariates.
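
To make the notation concrete, here is a small NumPy sketch (not part of the original derivation; the data and variable names are mine, chosen purely for illustration) that constructs \(X\), \(\mu\), \(\Sigma\) and the model matrix \((J_{n,1} \quad X)\).

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 10, 3

X = rng.normal(size=(n, p))                  # n x p sample matrix
J_n1 = np.ones((n, 1))                       # column vector of n ones
mu = (X.T @ J_n1 / n).ravel()                # sample mean vector, p elements
Sigma = (X - mu).T @ (X - mu) / (n - 1)      # p x p sample covariance matrix
M = np.hstack([J_n1, X])                     # n x (p+1) model matrix with intercept

print(mu.shape, Sigma.shape, M.shape)        # (3,) (3, 3) (10, 4)
```

Note that `Sigma` agrees with `np.cov(X, rowvar=False)`, which also uses the \(\frac{1}{n-1}\) convention.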

Leverage. The leverage is a measure of the distance between an observation in the sample and the sample mean vector (McCullagh & Nelder, 1989, p. 405). Let \(h_{ii}\) be the leverage of the \(i\)-th observation in the sample, for \(i=1,2,…,n\). By definition,

\[ \begin{align*} h_{ii} &= (1 \quad x_{i}^{T})((J_{n,1} \quad X)^{T}(J_{n,1} \quad X))^{-1} (1 \quad x_{i}^{T})^{T} \end{align*} \]

Mahalanobis Distance. The Mahalanobis distance between two vectors is a measure of the distance between the vectors. Let \(d_{i}^{2}\) be the square of the Mahalanobis distance between the \(i\)-th observation in the sample and the sample mean vector, for \(i=1,2,…,n\). By definition,

\[ \begin{align*} d_{i}^{2} &=(x_{i}-\mu)^{T}\Sigma^{-1}(x_{i}-\mu) \end{align*} \]
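
With \(X\), \(\mu\) and \(\Sigma\) in hand, both definitions translate directly into code. The sketch below (again illustrative only, with simulated data) computes the leverages as the diagonal of the hat matrix and the squared Mahalanobis distances as quadratic forms.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 10, 3
X = rng.normal(size=(n, p))

mu = X.mean(axis=0)                          # sample mean vector
Sigma = np.cov(X, rowvar=False)              # sample covariance matrix, 1/(n-1) convention
M = np.hstack([np.ones((n, 1)), X])          # model matrix (J_{n,1}  X)

# Leverages: diagonal of the hat matrix M (M^T M)^{-1} M^T
h = np.diag(M @ np.linalg.inv(M.T @ M) @ M.T)

# Squared Mahalanobis distances between each observation and the sample mean vector
diff = X - mu
d2 = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(Sigma), diff)
```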

Result

Now that we have introduced the notation that will be used in the proof, let us proceed to prove the mathematical relationship between the leverage and the Mahalanobis distance.

Theorem 1. The Mahalanobis distances are related to the leverages as follows: \[ d_{i}^{2}=(n-1)\left( h_{ii} - \frac{1}{n} \right) \] Proof:

\[ \begin{align*} h_{ii} &= (1 \quad x_{i}^{T})((J_{n,1} \quad X)^{T}(J_{n,1} \quad X))^{-1} (1 \quad x_{i}^{T})^{T} \\ &= (1 \quad x_{i}^{T}) \begin{pmatrix} n & n\mu^{T} \\ n\mu & X^{T}X \end{pmatrix}^{-1} (1 \quad x_{i}^{T})^{T} \\ &= (1 \quad x_{i}^{T}) \begin{pmatrix} \frac{1}{n} + \frac{1}{n}n\mu^{T}\frac{1}{n-1}\Sigma^{-1}n\mu\frac{1}{n} & -\frac{1}{n}n\mu^{T}\frac{1}{n-1}\Sigma^{-1} \\ -\frac{1}{n-1}\Sigma^{-1}n\mu\frac{1}{n} & \frac{1}{n-1}\Sigma^{-1} \end{pmatrix} (1 \quad x_{i}^{T})^{T} \\ &= (1 \quad x_{i}^{T}) \begin{pmatrix} \frac{1}{n}+\frac{1}{n-1}\mu^{T}\Sigma^{-1}\mu & -\frac{1}{n-1}\mu^{T}\Sigma^{-1} \\ -\frac{1}{n-1}\Sigma^{-1}\mu & \frac{1}{n-1}\Sigma^{-1} \end{pmatrix} (1 \quad x_{i}^{T})^{T} \\ &= \begin{pmatrix} \frac{1}{n}+\frac{1}{n-1}\mu^{T}\Sigma^{-1}\mu-\frac{1}{n-1}x_{i}^{T}\Sigma^{-1}\mu & -\frac{1}{n-1}\mu^{T}\Sigma^{-1}+\frac{1}{n-1}x_{i}^{T}\Sigma^{-1} \end{pmatrix} (1 \quad x_{i}^{T})^{T} \\ &= \frac{1}{n} + \frac{1}{n-1} \left( \mu^{T}\Sigma^{-1}\mu - x_{i}^{T}\Sigma^{-1}\mu -\mu^{T}\Sigma^{-1}x_{i} + x_{i}^{T}\Sigma^{-1}x_{i} \right)\\ &= \frac{1}{n} + \frac{1}{n-1} \left[ -(x_{i}^{T}-\mu^{T})\Sigma^{-1}\mu+(x_{i}^{T}-\mu^{T})\Sigma^{-1}x_{i} \right]\\ &= \frac{1}{n} + \frac{1}{n-1} \left[ (x_{i}^{T}-\mu^{T})\Sigma^{-1}(x_{i}-\mu) \right] \\ &= \frac{1}{n}+\frac{1}{n-1}d_{i}^{2} \end{align*} \]

Therefore,

\[ d_{i}^{2}=(n-1)\left( h_{ii} - \frac{1}{n} \right) \] which was to be demonstrated.
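
Theorem 1 is also easy to check numerically. The following sketch (not part of the proof; the data are simulated) computes the leverages and the squared Mahalanobis distances independently and confirms that they satisfy the stated identity.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 4
X = rng.normal(size=(n, p))

M = np.hstack([np.ones((n, 1)), X])
h = np.diag(M @ np.linalg.inv(M.T @ M) @ M.T)          # leverages

diff = X - X.mean(axis=0)
Sigma_inv = np.linalg.inv(np.cov(X, rowvar=False))
d2 = np.einsum('ij,jk,ik->i', diff, Sigma_inv, diff)   # squared Mahalanobis distances

print(np.allclose(d2, (n - 1) * (h - 1 / n)))          # True
```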

In the proof,

\[ \begin{pmatrix} n & n\mu^{T} \\ n\mu & X^{T}X \end{pmatrix} \] is inverted blockwise using the analytic inversion formula:

\[ \begin{pmatrix} \textbf{A} & \textbf{B} \\ \textbf{C} & \textbf{D} \end{pmatrix}^{-1} = \begin{pmatrix} \textbf{A}^{-1}+\textbf{A}^{-1}\textbf{B}(\textbf{D}-\textbf{CA}^{-1}\textbf{B})^{-1}\textbf{CA}^{-1} & -\textbf{A}^{-1}\textbf{B}(\textbf{D}-\textbf{CA}^{-1}\textbf{B})^{-1} \\ -(\textbf{D}-\textbf{CA}^{-1}\textbf{B})^{-1}\textbf{CA}^{-1} & (\textbf{D}-\textbf{CA}^{-1}\textbf{B})^{-1} \end{pmatrix} \]
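
The formula itself can be verified numerically as well. The sketch below (illustrative only) partitions a random, well-conditioned matrix into blocks \(\textbf{A}\), \(\textbf{B}\), \(\textbf{C}\), \(\textbf{D}\) and checks that the blockwise expression reproduces the ordinary inverse.

```python
import numpy as np

rng = np.random.default_rng(0)
k, m = 2, 3
M = rng.normal(size=(k + m, k + m)) + 5 * np.eye(k + m)   # keeps A and the Schur complement invertible
A, B = M[:k, :k], M[:k, k:]
C, D = M[k:, :k], M[k:, k:]

A_inv = np.linalg.inv(A)
S_inv = np.linalg.inv(D - C @ A_inv @ B)                  # inverse of the Schur complement

blockwise = np.block([
    [A_inv + A_inv @ B @ S_inv @ C @ A_inv, -A_inv @ B @ S_inv],
    [-S_inv @ C @ A_inv,                     S_inv],
])
print(np.allclose(blockwise, np.linalg.inv(M)))           # True
```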

In the analytic inversion formula,

\[ \begin{align*} \textbf{D}-\textbf{CA}^{-1}\textbf{B} &= X^{T}X-n\mu\frac{1}{n}n\mu^{T} \\ &= X^{T}X-n\mu\mu^{T} \\ &= (n-1)\Sigma \end{align*} \]

This is because

\[ \begin{align*} \Sigma &= \frac{1}{n-1}(X-J_{n,1}\mu^{T})^{T}(X-J_{n,1}\mu^{T}) \\ &= \frac{1}{n-1}(X^{T}X-X^{T}J_{n,1}\mu^{T}-\mu J_{1,n}X+\mu J_{1,n}J_{n,1}\mu^{T}) \\ &= \frac{1}{n-1}(X^{T}X-n\mu\mu^{T}-n\mu\mu^{T}+n\mu\mu^{T}) \\ &= \frac{1}{n-1}(X^{T}X-n\mu\mu^{T}) \end{align*} \]
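
This identity, too, can be confirmed on simulated data. The following quick check (illustrative only) compares \((n-1)\Sigma\) with \(X^{T}X-n\mu\mu^{T}\).

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 20, 3
X = rng.normal(size=(n, p))
mu = X.mean(axis=0)

lhs = (n - 1) * np.cov(X, rowvar=False)    # (n-1) * Sigma
rhs = X.T @ X - n * np.outer(mu, mu)       # X^T X - n * mu mu^T
print(np.allclose(lhs, rhs))               # True
```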

References

Mahalanobis, P. C. (1936). On the generalized distance in statistics. In Proceedings of the National Institute of Sciences of India (Vol. 2, No. 1, pp. 49-55).

McCullagh, P., & Nelder, J. A. (1989). Generalized linear models. Springer.

Appendix

In the Appendix, I present some results on the equivalence between statistics obtained from the sample with and without mean-centring.

Notations

Sample. Let \(X^{c}=X-J_{n,1}\mu^{T}=(x_{1}^{c}, x_{2}^{c}, …, x_{n}^{c})^{T}\) be an \(n \times p\) matrix which represents the mean-centred sample.

Sample Means. Let \(\mu^{c}=(\mu_{1}^{c}, \mu_{2}^{c}, …, \mu_{p}^{c})^{T}=0_{p,1}\) be a column vector with \(p\) elements which represents the mean-centred sample mean vector.

Sample Covariances. Let \(\Sigma^{c}=\frac{1}{n-1}(X^{c}-J_{n,1}(\mu^{c})^{T})^{T}(X^{c}-J_{n,1}(\mu^{c})^{T})=\frac{1}{n-1}(X^{c})^{T}(X^{c})\) be a \(p \times p\) matrix which represents the mean-centred sample covariance matrix.

Model Matrix. Let \((J_{n,1} \quad X^{c})\) be an \(n \times (p+1)\) matrix which represents the model matrix using the mean-centred sample.

Leverage. Let \(h_{ii}^{c}\) be the leverage of the \(i\)-th observation in the mean-centred sample, for \(i=1,2,…,n\). By definition,

\[ \begin{align*} h_{ii}^{c} &= (1 \quad (x_{i}^{c})^{T})((J_{n,1} \quad X^{c})^{T}(J_{n,1} \quad X^{c}))^{-1} (1 \quad (x_{i}^{c})^{T})^{T} \\ &= (1 \quad x_{i}^{T}-\mu^{T})((J_{n,1} \quad X^{c})^{T}(J_{n,1} \quad X^{c}))^{-1} (1 \quad x_{i}^{T}-\mu^{T})^{T} \end{align*} \]

Mahalanobis Distance. Let \((d_{i}^{c})^{2}\) be the square of the Mahalanobis distance between the \(i\)-th observation in the mean-centred sample and \(\mu^{c}\), for \(i=1,2,…,n\). By definition,

\[ \begin{align*} (d_{i}^{c})^{2} &=(x_{i}^{c}-\mu^{c})^{T}(\Sigma^{c})^{-1}(x_{i}^{c}-\mu^{c}) \\ &=(x_{i}-\mu)^{T}(\Sigma^{c})^{-1}(x_{i}-\mu) \end{align*} \]

Results

Proposition 1. The sample covariance matrix is the same regardless of whether the sample has been mean-centred. In other words, \(\Sigma=\Sigma^{c}\).

Proof:

\[ \begin{align*} \Sigma &=\frac{1}{n-1}(X-J_{n,1}\mu^{T})^{T}(X-J_{n,1}\mu^{T}) \\ &= \frac{1}{n-1}(X^{c})^{T}(X^{c}) \\ &= \Sigma^{c} \end{align*} \]

which was to be demonstrated.

Proposition 2. The leverages are the same regardless of whether the sample has been mean-centred. In other words, \(h_{ii}=h_{ii}^{c}\).

Proof:

\[ \begin{align*} h_{ii}^{c} &= (1 \quad x_{i}^{T}-\mu^{T})((J_{n,1} \quad X^{c})^{T}(J_{n,1} \quad X^{c}))^{-1} (1 \quad x_{i}^{T}-\mu^{T})^{T} \\ &= (1 \quad x_{i}^{T}-\mu^{T}) \begin{pmatrix} n & 0_{1,p} \\ 0_{p,1} & (X^{c})^{T}X^{c} \end{pmatrix}^{-1} (1 \quad x_{i}^{T}-\mu^{T})^{T} \\ &= (1 \quad x_{i}^{T}-\mu^{T}) \begin{pmatrix} n & 0_{1,p} \\ 0_{p,1} & (n-1)\Sigma \end{pmatrix}^{-1} (1 \quad x_{i}^{T}-\mu^{T})^{T} \\ &= (1 \quad x_{i}^{T}-\mu^{T}) \begin{pmatrix} \frac{1}{n} & 0_{1,p} \\ 0_{p,1} & \frac{1}{n-1}\Sigma^{-1} \end{pmatrix} (1 \quad x_{i}^{T}-\mu^{T})^{T} \\ &= \begin{pmatrix} \frac{1}{n} & \frac{1}{n-1}(x_{i}^{T}-\mu^{T})\Sigma^{-1} \end{pmatrix} (1 \quad x_{i}^{T}-\mu^{T})^{T} \\ &= \frac{1}{n} + \frac{1}{n-1} \begin{bmatrix} (x_{i}^{T}-\mu^{T})\Sigma^{-1}(x_{i}-\mu) \end{bmatrix} \\ &= \frac{1}{n}+\frac{1}{n-1}d_{i}^{2} \\ &= h_{ii} \end{align*} \]

which was to be demonstrated.

In the proof, \(\begin{pmatrix} n & 0_{1,p} \\ 0_{p,1} & (n-1)\Sigma \end{pmatrix}\) is a block-diagonal matrix and is therefore inverted blockwise.

Proposition 3. The Mahalanobis distances are the same regardless of whether the sample has been mean-centred. In other words, \(d_{i}=d_{i}^{c}\).

Proof:

\[ \begin{align*} d_{i}^{2} &=(x_{i}-\mu)^{T}\Sigma^{-1}(x_{i}-\mu) \\ &=(x_{i}-\mu)^{T}(\Sigma^{c})^{-1}(x_{i}-\mu) \\ &=(d_{i}^{c})^{2} \end{align*} \]

Therefore,

\[ d_{i}=d_{i}^{c} \]

which was to be demonstrated.
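
All three propositions can be checked numerically at once. The sketch below (illustrative only; the data are simulated with a non-zero mean so that centring actually changes \(X\)) confirms that the covariance matrix, the leverages and the Mahalanobis distances are unchanged by mean-centring.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 30, 3
X = rng.normal(loc=2.0, size=(n, p))
Xc = X - X.mean(axis=0)                                    # mean-centred sample

def leverages(sample):
    M = np.hstack([np.ones((len(sample), 1)), sample])     # model matrix with intercept
    return np.diag(M @ np.linalg.inv(M.T @ M) @ M.T)

def mahalanobis2(sample):
    diff = sample - sample.mean(axis=0)
    S_inv = np.linalg.inv(np.cov(sample, rowvar=False))
    return np.einsum('ij,jk,ik->i', diff, S_inv, diff)

print(np.allclose(np.cov(X, rowvar=False), np.cov(Xc, rowvar=False)))  # Proposition 1
print(np.allclose(leverages(X), leverages(Xc)))                        # Proposition 2
print(np.allclose(mahalanobis2(X), mahalanobis2(Xc)))                  # Proposition 3
```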

Copyright © 2024 Lam Fu Yuan, Kevin. All rights reserved.