P2.T6.705. Logistic regression and principal component analysis (PCA, De Laurentis)

Nicole Seaman

Director of CFA & FRM Operations
Learning objectives: Describe the application of a logistic regression model to estimate default probability. Define and interpret cluster analysis and principal component analysis.

Questions:

705.1. Logistic regression is often used to predict whether a loan will default. For example, the logit function estimates the conditional default probability as a function of several explanatory variables x(1), x(2), ..., x(n), where x(i) could be, for example, income or loan-to-value. Here is the general form:

P2.T6.705.1.png


In reference to this logistic regression model, each of the following is a true statement EXCEPT which is false?

a. The logistic regression assumes homoskedasticity and normally distributed error terms just like the classical linear regression model (CLRM)
b. The slope coefficient β(1) can be interpreted as the change in the "odds ratio" associated with a one unit change in the explanatory (predictor) variable x(1)
c. The logistic function constrains the dependent variable (output) to a value within the [0,1] interval, which is necessary for it to be interpreted as a probability
d. The logistic function is a transformation of a linear regression: the link function, LN[π/(1-π)], is a linear combination of the explanatory (predictor) variables
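
As a quick illustration of the logit mechanics, here is a minimal Python sketch (using scikit-learn on an entirely invented income/loan-to-value dataset; all names and numbers are hypothetical, not from the source). It shows that the fitted probability lies between 0 and 1 and that exp(β) gives the multiplicative change in the odds ratio per unit change in a predictor:

```python
# Minimal sketch: fit a logit model of default probability on two
# hypothetical predictors (income, loan-to-value). All data are simulated
# for illustration only.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 500
income = rng.normal(60, 15, n)          # income in $ thousands (invented)
ltv = rng.uniform(0.4, 1.1, n)          # loan-to-value ratio (invented)

# Assumed relationship: higher LTV and lower income raise default risk
log_odds = -4.0 - 0.03 * income + 5.0 * ltv
pd_true = 1.0 / (1.0 + np.exp(-log_odds))
default = rng.binomial(1, pd_true)

X = np.column_stack([income, ltv])
model = LogisticRegression().fit(X, default)

# The logistic link constrains the predicted probability to (0, 1)
borrower = np.array([[50.0, 0.95]])     # income = 50k, LTV = 95%
print("Estimated PD:", model.predict_proba(borrower)[0, 1])

# exp(beta) is the multiplicative change in the odds per unit change in x(i)
print("Odds-ratio multipliers:", np.exp(model.coef_))
```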


705.2. De Laurentis says about cluster analysis that "the objective of cluster analysis is to explore if, in a dataset, groups of similar cases are observable. This classification is based on measures of distance of observations’ characteristics. Clusters of observations can be discovered using an aggregating criterion based on a specific homogeneity definition. Therefore, groups are subsets of observations that, in the statistical domain of the (q) variables, have some similarities due to analogous variables’ profiles and are distinguishable from those belonging to other groups. The usefulness of clusters depends on: (i) algorithms used to define them, and (ii) economic meanings that we can find in the extracted aggregations. Operationally, we can use two approaches: hierarchical or aggregative on the one hand, and partitioned or divisive on the other hand." (Source: Giacomo De Laurentis, Renato Maino, and Luca Molteni, Developing, Validating and Using Internal Ratings (West Sussex, United Kingdom: John Wiley & Sons, 2010))

About cluster analysis, each of the following statements is true EXCEPT which is false?

a. In the case of divisive clustering with (n) observations, the initial state is one cluster of size (n)
b. In the case of hierarchical clustering with (n) observations, the initial state is (n) clusters, each of size one
c. A "pre-treatment" (i.e., preliminary transformation) of variables in order to reach similar magnitude and variability
d. Both hierarchical and divisive clustering require the assumption of Euclidean distance which does not penalize greater distances
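
For intuition on the agglomerative (hierarchical) approach, here is a minimal Python sketch (SciPy on a simulated two-variable dataset; the data and group structure are invented). It applies the "pre-treatment" step by standardizing the variables and then merges the n singleton clusters using Ward's criterion:

```python
# Minimal sketch of agglomerative clustering on standardized variables;
# the dataset is hypothetical.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.stats import zscore

rng = np.random.default_rng(1)
# Two loose groups of observations described by (q = 2) ratios
X = np.vstack([rng.normal([0.2, 1.5], 0.1, (20, 2)),
               rng.normal([0.8, 0.5], 0.1, (20, 2))])

# "Pre-treatment": standardize so variables have similar magnitude/variability
Xz = zscore(X, axis=0)

# Agglomerative: start from n clusters of size one, merge by Ward's criterion
Z = linkage(Xz, method="ward")
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)
```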


705.3. Your colleague has conducted a principal component analysis (PCA) and prepared the output below. These are De Laurentis' Tables 3.11 and 3.12. The variables are financial performance indicators, most of which are income statement and/or balance sheet ratios. Specifically: return on equity (ROE), return on investment (EBIT/invested capital), current ratio (CR; current assets/current liabilities), quick ratio, leverage (MCTI), market share (SHARE%), and intangibles (R&S%):

P2.T6.705.3.png

(Source: Giacomo De Laurentis, Renato Maino, and Luca Molteni, Developing, Validating and Using Internal Ratings (West Sussex, United Kingdom: John Wiley & Sons, 2010))

Each of the following statements is true about this PCA output EXCEPT which is inaccurate?

a. The third component (COMP3) is the current ratio (CR) variable
b. The first four components of the PCA explain over 90.0% of the total variance
c. It is possible to use the components themselves as independent (explanatory) variables in a linear regression
d. The first and second components (COMP1 and COMP2) are orthogonal to each other; i.e., uncorrelated vectors
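
To see the PCA properties referenced in the options, here is a minimal Python sketch (scikit-learn on simulated stand-ins for the seven indicators; all data are invented, not De Laurentis' tables). It prints the cumulative share of variance explained by the leading components, confirms the component scores are uncorrelated, and notes that the scores can serve as regressors:

```python
# Minimal sketch of a PCA on standardized financial ratios; data are simulated
# stand-ins for indicators such as ROE, ROI, CR, quick ratio, etc.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 7))            # 7 hypothetical indicators
X[:, 1] = 0.8 * X[:, 0] + 0.2 * X[:, 1]  # make two ratios correlated
Xz = StandardScaler().fit_transform(X)

pca = PCA().fit(Xz)
scores = pca.transform(Xz)

# Cumulative share of total variance explained by the leading components
print(np.cumsum(pca.explained_variance_ratio_))

# Component scores are orthogonal (uncorrelated) by construction:
# off-diagonal correlations are ~0
print(np.round(np.corrcoef(scores, rowvar=False), 3))

# The component scores can be used directly as explanatory variables
# in a subsequent linear regression
```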

Answers here:
 