"Over the past twenty years, understanding of and business practice in mortgage markets has been influenced significantly by the application of statistical models. Mortgage underwriting was automated using statistical models of default and default loss, and statistical models of denial rates and loan pricing were used to test for discrimination in lending. Efforts to measure mortgage market discrimination and credit risk have been propelled by an increase in the loan-level data available through various resources. Unfortunately, as researchers strived to produce results from these data, critical statistical errors were overlooked and then repeated in what has become the “conventional approach” to measuring discrimination and credit risk."

That is the executive summary of A Review of Statistical Problems in the Measurement of Mortgage Market Discrimination and Credit Risk, a paper sponsored by MBA's Research Institute for Housing America (RIHA) and conducted by Professor Anthony M. Yezer of George Washington University.  The study examined three models used in conventional approaches to testing for discrimination in the granting of mortgages - testing based on applicant rejection equations, on mortgage pricing equations, and on mortgage default equations - and one conventional approach to measuring credit risk.

The study found that statistical errors had been perpetuated through many different analyses of the underlying data and that those errors were then unknowingly institutionalized and accepted as conventional practice; in effect, simple statistical models that are not firmly grounded in economic theory became the standard.

According to Yezer, the major simplifying assumption made in the models is that borrowers have no knowledge of the mortgage lending process and do not select mortgage terms strategically.  The theory has been that the lender selects the mortgage terms and borrowers are oblivious to the effects of their own decisions on the transaction.  For example, conventional statistical techniques assume that borrowers determine the amount they are willing to pay for a home without considering that this decision will affect their chances of being rejected.  This, the author says, flies in the face of reality.  Any buyer finds out early in the process that his ability to qualify will be based largely on the price of the home he selects and the amount of his down payment.  This self-selection is largely overlooked in conventional studies.
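
To see why this matters statistically, consider a small simulation sketch (not from the paper; the variable names, functional forms, and parameter values are all invented for illustration).  If riskier applicants strategically request lower loan-to-value ratios to improve their approval odds, a naive rejection equation that treats LTV as exogenous can even get the sign of its effect wrong:

```python
# Hypothetical sketch of self-selection bias in a rejection equation.
# Invented assumption: riskier applicants strategically request lower LTVs.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 100_000

risk = rng.normal(size=n)                       # unobserved applicant riskiness
# Riskier applicants choose lower LTVs, plus some idiosyncratic preference.
ltv = 0.95 - 0.10 / (1 + np.exp(-risk)) + rng.normal(0, 0.03, n)

# Structural rejection rule: higher LTV and higher risk both raise rejection odds.
p_reject = 1 / (1 + np.exp(-(1.5 * risk + 5.0 * (ltv - 0.90))))
reject = rng.binomial(1, p_reject)

# Naive model of rejection on LTV alone: self-selection flips the apparent sign,
# because a low requested LTV now signals a risky applicant.
naive = sm.Logit(reject, sm.add_constant(ltv)).fit(disp=0)
print("naive LTV coefficient:", naive.params[1])

# Conditioning on the risk factor (observable here only because we simulated it)
# recovers the positive structural effect of LTV.
full = sm.Logit(reject, sm.add_constant(np.column_stack([ltv, risk]))).fit(disp=0)
print("LTV coefficient given risk:", full.params[1])
```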

Another problem with conventional techniques is omitted variable bias.  For example, in an evaluation of discrimination based on rejection, the race variable may incorrectly appear to correlate with rejection when the effect is actually driven by an omitted variable that is correlated with race (perhaps a lower average income or credit score in the minority population) but is not included in the equation.
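
A minimal simulation (hypothetical; the group shares, score distributions, and coefficients are invented) shows how the bias arises: if rejection depends only on credit score, but credit score is omitted from the equation and is correlated with race, the race dummy absorbs the effect.

```python
# Hypothetical sketch of omitted variable bias in a rejection equation.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 50_000

minority = rng.binomial(1, 0.2, n)
# Invented assumption: the minority group has a lower average credit score.
credit_score = rng.normal(700 - 40 * minority, 50)

# Rejection depends only on credit score, not on race.
p_reject = 1 / (1 + np.exp(0.02 * (credit_score - 650)))
reject = rng.binomial(1, p_reject)

# Misspecified equation (race only): the race coefficient picks up the omitted
# score and wrongly suggests discrimination.
short = sm.Logit(reject, sm.add_constant(minority.astype(float))).fit(disp=0)
print("race coefficient, score omitted: ", short.params[1])

# Correct equation (race and score): the race coefficient shrinks toward zero.
full = sm.Logit(reject, sm.add_constant(np.column_stack([minority, credit_score]))).fit(disp=0)
print("race coefficient, score included:", full.params[1])
```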

Incorrect coding also led to incorrect interpretations.  For example, in an analysis of Boston Federal Reserve data, one study found that a significant source of errors was the difference between what was initially claimed by the applicant and the final determination by the underwriter.  Sometimes both the initial, inaccurate claim and the verified information were retained and coded, and it is unclear which should properly be included in a statistical analysis.

The complexity of the loan decision leads to other problems.  For example, some characteristics of the borrower may become less important when there is a cosigner, and when there is a poor credit history, LTV takes on more significance.  When one study interacted various underwriting variables with a race dummy, it found that for some variables being a minority was an advantage, while for others it was a disadvantage.
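
The mechanics of that kind of test look roughly like the sketch below (purely hypothetical; the underwriting variables, the data-generating process, and every coefficient are invented to show how interaction terms are read, not to reproduce any empirical result).

```python
# Hypothetical sketch of interacting underwriting variables with a race dummy.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 80_000

minority = rng.binomial(1, 0.25, n).astype(float)
bad_credit = rng.binomial(1, 0.30, n).astype(float)
high_ltv = rng.binomial(1, 0.40, n).astype(float)

# Invented data-generating process: bad credit is penalized more heavily for
# minority applicants, while high LTV is penalized less heavily.
index = (-2.0 + 1.0 * bad_credit + 0.8 * high_ltv
         + 0.6 * minority * bad_credit - 0.5 * minority * high_ltv)
reject = rng.binomial(1, 1 / (1 + np.exp(-index)))

X = sm.add_constant(np.column_stack([
    minority, bad_credit, high_ltv,
    minority * bad_credit,   # interaction: race x credit history
    minority * high_ltv,     # interaction: race x LTV
]))
fit = sm.Logit(reject, X).fit(disp=0)
# The two interaction coefficients come out with opposite signs (by construction
# here), which is how a study can find minority status helping on one margin
# and hurting on another.
print(fit.params)
```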

These types of complexity also affect analyses that use loan pricing to test for discrimination.  The loan terms used to determine risk are not only causes of the APR; they are also caused by it.  For example, while the likelihood of prepayment may cause a higher APR, it is equally true that a higher APR makes prepayment more likely.
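
A toy simulation (hypothetical throughout; the pricing rule and the behavioral response are invented) illustrates the problem: when the lender prices a borrower's prepayment propensity into the APR and the APR in turn drives prepayment, a single-equation regression of prepayment on APR identifies neither effect.

```python
# Hypothetical sketch of simultaneity between APR and prepayment.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 60_000

propensity = rng.normal(size=n)                          # latent prepayment propensity
apr = 5.0 + 0.4 * propensity + rng.normal(0, 0.3, n)     # lender prices the propensity

# Behavioral rule: expensive loans are refinanced more often, on top of the
# borrower's underlying propensity; the structural APR effect is 0.8.
p_prepay = 1 / (1 + np.exp(-(-1.0 + 0.8 * (apr - 5.0) + 0.7 * propensity)))
prepaid = rng.binomial(1, p_prepay)

# Single-equation regression of prepayment on APR: the estimate mixes the
# behavioral effect with the lender's pricing of the unobserved propensity.
fit = sm.Logit(prepaid, sm.add_constant(apr)).fit(disp=0)
print("estimated APR effect:", fit.params[1], "(structural value is 0.8)")
```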

The report states that the recent financial crisis revealed many shortcomings in the market, one of which was that default and default loss models "woefully underestimated credit losses."  Two models are in common use.  The first estimates ex-ante default probabilities from seasoned mortgages and relies on the probability of default falling drastically with time.  This model does not take borrower demographics into consideration and eliminates the chance of statistical discrimination because lending decisions are made on objective criteria.  The second model is designed to estimate the cash flow from mortgages and includes variables reflecting conditions at application as well as variables reflecting the evolving conditions of the mortgage and housing markets.  This model is also flawed because the population of loans surviving into later years is fundamentally different from the original population of mortgages, simply because those loans have not yet prepaid or defaulted.
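
The survivorship problem in the second model can be seen with an elementary simulation (hypothetical; the two risk types and their default probabilities are invented): the loans that remain in a seasoned pool are systematically the safer ones, so a model calibrated on survivors understates the risk of a newly originated book.

```python
# Hypothetical sketch of survivorship bias in a seasoned mortgage pool.
import numpy as np

rng = np.random.default_rng(4)
n = 200_000
years = 5

# Invented two-type population: 20% high-risk loans (30% annual default
# probability) and 80% low-risk loans (2%).
high_risk = rng.random(n) < 0.20
annual_pd = np.where(high_risk, 0.30, 0.02)

alive = np.ones(n, dtype=bool)
for _ in range(years):
    alive &= rng.random(n) >= annual_pd     # defaulted loans drop out of the pool

print("mean annual PD, original book:    ", annual_pd.mean())
print("mean annual PD, 5-year survivors: ", annual_pd[alive].mean())
# The survivors' average risk is far lower (roughly 3% vs. 8% here), so default
# rates estimated from seasoned loans understate the risk of new originations.
```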

The study concludes that the outcome of a mortgage transaction involves the simultaneous consideration of many factors, and this complexity is ignored by current theoretical models that consider two variables at a time. The serious limitations of current statistical approaches to testing for discrimination and credit risk in mortgage lending have likely contributed to recent problems in mortgage markets. If these limitations are not recognized and naïve reliance on them continues, current problems are likely to recur in the future. Alternatively, there are major gains to be made if economic analysis of mortgage market discrimination and mortgage credit risk can be improved.
