 Moore-Penrose Pseudoinverse Calculation

So this isn’t the type of spreadsheet I normally upload to nicksfinancetricks.com but it was something I spent quite a bit of time preparing and something that my research indicated was not easily found on the internet…

This spreadsheet calculates the Moore-Penrose Pseudoinverse (i.e. an estimate of the hypothetical inverse) of a matrix, a feature I thought was required for my soon to be released Mean-Variance Portfolio Optimisation spreadsheet. As it turned out, being able to calculate the Pseudoinverse of the Variance-Covariance matrix isn’t critical as this matrix is always square and seldom Singular. Nonetheless, this feature means an estimate for the inverse matrix can always be achieved, where the MS Excel built-in matrix inverse function can sometimes fail due to the presence of near singular values.

Moore-Penrose Pseudoinverse

I would appreciate any contributions to the VBA Subroutine if anyone is interested. The following link will take you to my Github repository:

Github – VBA Pseudoinverse

Ironically, what can be done in 1 line of code in MATLAB using the “pinv” built-in function took me almost 400 lines of code in VBA. But if you like working with spreadsheets like I do then you’ll appreciate the convenience of having the ability to calculate the pseudoinverse of a matrix and immediately use the result within your spreadsheet. Plus, not everybody has access to MATLAB.

Anyway, if you’re interested in the mechanics behind the pseudoinverse algorithm it’s probably worth reading on. There are three main components to the algorithm, these include QR Factorisation via Gram-Schmidt Orthogonalisation, Singular Value Decomposition (SVD), and finally the removal of any Singular Values so the pseudoinverse can be formulated.

The algorithm I used for the QR Factorisation function was obtained from Timothy Sauer – Numerical Analysis 2E. The following pseudo-code for the function was extracted from Sauer-Chapter 4. For the SVD algorithm, I converted a MATLAB function prepared by a Paul Godfrey (23/10/2006) into a VBA function. Paul’s MATLAB function can be found on the Mathworks file exchange at the following link:

Mathworks – Simple SVD

Paul’s algorithm uses QR Factorisation on matrix-A (the matrix to be inverted) to gradually pull matrix-U out from the left and then uses QR Factorisation on matrix-A transposed to gradually pull matrix-V out from the right. The process makes matrix-A lower triangular and then upper triangular alternately. Eventually, matrix-A becomes both upper and lower triangular at the same time (i.e. diagonal) with the singular values on the diagonal. The algorithm outputs three matrices U, S and V where S is a diagonal matrix with any singular values and A = U*S*VT.

For the final step, I converted MATLAB’s own “pinv” function to a VBA function that uses the above SVD function in place of MATLAB’s “svd” built-in function. This step identifies any values in the diagonal S-matrix as singular if they are smaller than the user-defined tolerance. These singular values are removed from matrix-S so the inverse can be estimated without rounding errors forcing the result erroneously towards infinity. The function outputs matrix X of the same dimensions as AT so that A*X*A = A, X*A*X = X and A*X and X*A are Hermitian.

The other, more trivial, VBA functions required to produce the pseudoinverse are as follows:

• EYE – returns identity matrix;
• ZEROS – returns matrix composed entirely of zeros;
• NORM – returns the length of a vector;
• TRIU – returns the upper triangular part of a matrix;
• DIAG – returns a column vector of the main diagonal elements of a matrix;
• TransposeArray – returns the transpose of a matrix;
• MultArrays – returns the matrix multiplication of two matrices of compatible size.

To assess the accuracy of the pseudoinverse, the result of multiplying matrix-A by matrix-X (pseudoinverse) is compared with the appropriately sized identity matrix. This is based on the notion that the inverse of a matrix multiplied by the matrix itself is equal to the identity, i.e. A*A-1 = EYE where EYE is the identity matrix. The difference between EYE and A*X is compared via the Frobenius Norm and printed to the “Start” sheet. The closer this number is to zero, the closer the pseudoinverse is to the hypothetical inverse.

The condition number of matrix-A is also estimated by dividing the largest absolute value found in the diagonal matrix-S (from SVD) by the smallest absolute value. This number can be used to estimate the expected number of correct digits produced by inverting matrix-A.

By way of comparison between MATLAB’s “pinv” function and the pseudoinverse algorithm I wrote in VBA, I’ve carried out a couple of tests. These tests were carried out on a 15-stock variance-covariance matrix. The results of which are summarised below:

 Test Matrix-A size Tol. Cond. # Rank Singular values F-Norm (VBA) F-Norm (MATLAB) Max. Diff. Matrix Type 1 15 x 15 1e-16 8.27e+01 15 0 2.09e-14 1.56e-14 4.64e-11 Square 2 14 x 15 1e-16 8.24e+01 14 0 2.16e-14 1.96e-14 4.58e-11 Underdetermined 3 15 x 14 1e-16 8.24e+01 14 0 7.82e-02 1.00e+00 4.56e-11 Overdetermined 4 16 x 16 1e-16 6.52e+16 15 1 1.00e+00 1.00e+00 4.81e-11 Square 5 15 x 16 1e-16 4.90e+16 14 1 1.00e+00 1.00e+00 4.59e-11 Underdetermined 6 16 x 15 1e-16 6.61e+16 14 1 1.00e+00 1.41e+00 4.65e-11 Overdetermined

Where ‘Max. Diff.’ represents the maximum difference between MATLAB’s “pinv” function and my VBA algorithm.

So basically, for overdetermined matrices (matrices with more rows than columns, equivalent to systems with more equations than unknowns) and matrices with singular values, both VBA and MATLAB functions produce pseudoinverse estimations lacking in accuracy (since an accurate result is not achievable under these circumstances given double-precision data types and the tolerance specified). For underdetermined matrices (matrices with less rows than columns, or systems with fewer equations than unknowns) and square matrices without singular values, both MATLAB and VBA functions produce pseudoinverse estimations with up to approximately 14 decimal places of accuracy.

The MATLAB and VBA functions typically produce results of comparable accuracy according to the Frobenius Norms. The maximum differences observed between the results of the two functions are less meaningful in terms of accuracy but still serve as an indicator to correctness.