"Over the past twenty years, understanding of and business practice in mortgage markets has been influenced significantly by the application of statistical models. Mortgage underwriting was automated using statistical models of default and default loss, and statistical models of denial rates and loan pricing were used to test for discrimination in lending. Efforts to measure mortgage market discrimination and credit risk have been propelled by an increase in the loan-level data available through various resources. Unfortunately, as researchers strived to produce results from these data, critical statistical errors were overlooked and then repeated in what has become the “conventional approach” to measuring discrimination and credit risk."
That is from the executive summary of A Review of Statistical Problems
in the Measurement of Mortgage Market Discrimination and Credit Risk, a paper sponsored by the MBA's Research
Institute for Housing America (RIHA) and written by Professor Anthony M.
Yezer of George Washington University. The study examined three
conventional approaches to testing for discrimination in the granting of
mortgages: tests based on applicant rejection equations, on mortgage
pricing equations, and on mortgage default equations. It also examined one conventional
approach to measuring credit risk.
The study found that statistical
errors had been perpetuated through many different analyses of the underlying
data, and that those errors were then unknowingly institutionalized and accepted as conventional practice. That is, some simple
statistical models are not firmly
grounded in economic theory.
According to Yezer, the major simplifying
assumption made in the models is that borrowers have no knowledge of the
mortgage lending process and do not select mortgage terms strategically. The theory has been that the lender selects
the mortgage terms and borrowers are oblivious to the effects of their own decisions
on the transaction. For example,
conventional statistical techniques assume that borrowers determine the amount
they are willing to pay for a home without considering that this decision will
affect the chances that they will be rejected.
This, the author says, flies in the face of reality. Any buyer finds out early in the process that
his ability to qualify will be based largely on the price of the home he selects
and the amount of his down payment. This
self-selection is largely overlooked in conventional studies.
Another problem with conventional
techniques is omitted variable bias. For
example, in an evaluation of discrimination based on rejection, the variable
indicating race may be incorrectly interpreted as driving rejection when
the effect is actually caused by a variable that is correlated with race (perhaps a lower average
income or credit score for a minority population)
but is not included in the equation.
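The omitted-variable problem can be illustrated with a small simulation. This is only a sketch on made-up data, not the method used in the study: the applicant groups, score distribution, and rejection rule are all hypothetical. Rejection is constructed to depend on credit score alone, yet a regression that leaves the score out attributes a rejection gap of roughly 0.08 to group membership.

```python
import random


def ols_slope(x, y):
    """Slope from a least-squares fit of y on a single regressor x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def residuals(y, x):
    """Part of y left over after a linear fit of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = ols_slope(x, y)
    return [yi - my - b * (xi - mx) for xi, yi in zip(x, y)]


def simulate(n=50000, seed=0):
    """Hypothetical applicants: rejection depends on credit score only."""
    rng = random.Random(seed)
    group, score, rejected = [], [], []
    for _ in range(n):
        g = 1 if rng.random() < 0.3 else 0
        # Assumption: group 1 has a lower average score (made-up numbers).
        s = rng.gauss(680 - 40 * g, 50)
        # Rejection probability falls with score; group plays no role.
        p = min(1.0, max(0.0, 0.3 - 0.002 * (s - 650)))
        group.append(g)
        score.append(s)
        rejected.append(1 if rng.random() < p else 0)
    return group, score, rejected


group, score, rejected = simulate()

# Score omitted: the group coefficient absorbs the score effect.
naive = ols_slope(group, rejected)

# Score included, using the Frisch-Waugh-Lovell partialling-out step:
# regress what is left of rejection on what is left of group after
# each has been fitted on score.
controlled = ols_slope(residuals(group, score), residuals(rejected, score))

print(f"group effect, score omitted:  {naive:.3f}")
print(f"group effect, score included: {controlled:.3f}")
```

Once the score enters the equation, the estimated group effect should fall to approximately zero, which is the true value in this constructed example.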
Incorrect coding also led to
incorrect interpretations. For example,
in an analysis of Boston Federal Reserve data, one study found a significant
source of errors was the difference between what was initially claimed by the
applicant and the final determination by the underwriter. Sometimes the initial false claim and
the verified information were both retained and coded, and it is unclear which
should properly be included in a statistical analysis.
The complexity of the loan
decision leads to other problems. For
example, some characteristics of the borrower may become less important when
there is a cosigner. If there is a poor
credit history, then the loan-to-value ratio (LTV) takes on more significance. When one study interacted various
underwriting variables with a race dummy, it found that, for some
variables, being a minority was an advantage while for others it was a
disadvantage.
These types of complexity also
affect the analysis of loan pricing as an indicator of discrimination. Loan terms used to determine risk are not
only causes of APR; they are also caused by APR.
For example, while the likelihood of prepayment may cause a higher APR, it
is equally true that a higher APR makes prepayment more likely.
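A short simulation shows why this two-way causation matters for a pricing equation. It is a sketch under assumed values: the two structural equations and the coefficients A1 and B1 are hypothetical, and both variables are measured as deviations from their means rather than in realistic units. A single-equation regression of APR on prepayment is run on data where each variable causes the other.

```python
import random


def ols_slope(x, y):
    """Slope from a least-squares fit of y on a single regressor x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


# Hypothetical structural system (coefficients are made up):
#   apr    = A1 * prepay + u    prepayment risk raises APR
#   prepay = B1 * apr    + v    higher APR raises prepayment
A1, B1 = 0.5, 0.4


def simulate(n=50000, seed=1):
    rng = random.Random(seed)
    apr, prepay = [], []
    for _ in range(n):
        u, v = rng.gauss(0, 1), rng.gauss(0, 1)
        # Solve the two equations jointly for each loan.
        apr.append((u + A1 * v) / (1 - A1 * B1))
        prepay.append((v + B1 * u) / (1 - A1 * B1))
    return apr, prepay


apr, prepay = simulate()
estimate = ols_slope(prepay, apr)

# With unit-variance shocks, the single-equation slope converges to
# (A1 + B1) / (1 + B1**2) rather than to the true effect A1.
expected_limit = (A1 + B1) / (1 + B1 ** 2)

print(f"true effect of prepayment on APR: {A1:.3f}")
print(f"single-equation estimate:         {estimate:.3f}")
print(f"large-sample limit of estimate:   {expected_limit:.3f}")
```

In this setup the single-equation estimate overstates the effect of prepayment on APR because part of the correlation runs in the opposite direction.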
The report states that the
recent financial crisis revealed many shortcomings in the market, one of which
was that default and default loss models "woefully underestimated credit
losses." Two models are in common use. The first is an
ex-ante default model based on seasoned mortgages that relies on the probability
of default falling drastically with time. This model does not take
borrower demographics into consideration and eliminates the chance of statistical
discrimination because lending decisions are made on objective criteria. The second model is designed to estimate the
cash flow from mortgages and includes variables reflecting conditions at application
and those reflecting the evolving conditions of the mortgage and housing
markets. This model is also flawed
because the population surviving into the out years is fundamentally different from
the original population of mortgages, simply because its members have neither prepaid
nor defaulted.
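The change in the surviving population can be worked through with simple arithmetic. The sketch below assumes a hypothetical pool split evenly between two loan types with constant annual exit rates (exit meaning prepayment or default); the 10 percent and 1 percent rates are made up for illustration and do not come from the study.

```python
def survivor_mix(years, share_high=0.5, h_high=0.10, h_low=0.01):
    """Return (share of high-exit loans, average annual exit rate) among
    loans still active after `years`, given constant per-type exit rates."""
    alive_high = share_high * (1 - h_high) ** years
    alive_low = (1 - share_high) * (1 - h_low) ** years
    share = alive_high / (alive_high + alive_low)
    avg_rate = share * h_high + (1 - share) * h_low
    return share, avg_rate


for t in (0, 1, 3, 5, 10):
    share, rate = survivor_mix(t)
    print(f"year {t:2d}: high-exit share {share:.3f}, average exit rate {rate:.4f}")
```

No individual loan becomes safer over time in this example, yet the average exit rate among survivors falls each year because the loans most likely to exit leave the pool first. A model estimated on seasoned loans therefore describes a different population from the one that existed at origination.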
The study concludes that
the outcome of a mortgage transaction involves the simultaneous consideration
of many factors and this complexity is ignored by current theoretical models
that consider two variables at a time. The serious limitations of current
statistical approaches to testing for discrimination and credit risk in
mortgage lending have likely contributed to recent problems in mortgage markets.
If these limitations are not recognized and naïve reliance on these approaches continues,
current problems are likely to recur in the future. Alternatively, there are
major gains to be made if economic analysis of mortgage market discrimination
and mortgage credit risk can be improved.