Learning objectives: Understand the differences between and consequences of underfitting and overfitting, and propose potential remedies for each. Use principal components analysis to reduce the dimensionality of a set of features.
Questions:
23.2.1. Patricia is building a factor model for her firm's primary equity portfolio. Her database includes several dozen candidate common factors, aka features. She considers employing principal component analysis (PCA) for the task. Each of the following statements about PCA is true EXCEPT which is false?
a. PCA is an unsupervised learning technique
b. The primary benefit of PCA is dimensionality reduction
c. PCA translates correlated features into a linear combination of uncorrelated components
d. PCA is an ensemble technique that combines models where each model partitions the data into hierarchical nodes with branches
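For reference, here is a minimal Python sketch (scikit-learn on simulated factor data; the variable names, dimensions, and number of retained components are illustrative assumptions only) of PCA as an unsupervised, dimension-reducing transform whose components are uncorrelated:

```python
# Minimal illustration on simulated data: PCA is fit on the features alone
# (no labels), reduces 12 candidate factors to 3 components, and the
# resulting component scores are (numerically) uncorrelated.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Hypothetical factor data: 500 observations of 12 correlated candidate factors
base = rng.normal(size=(500, 3))
X = base @ rng.normal(size=(3, 12)) + 0.1 * rng.normal(size=(500, 12))

X_std = StandardScaler().fit_transform(X)   # standardize the features first
pca = PCA(n_components=3).fit(X_std)        # unsupervised: no target is passed
scores = pca.transform(X_std)               # dimensionality reduction: 12 -> 3

print(pca.explained_variance_ratio_)                    # variance captured per component
print(np.round(np.corrcoef(scores, rowvar=False), 3))   # off-diagonals ~ 0 (uncorrelated)
```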
23.2.2. Oliver is conducting principal component analysis (PCA) for risk management purposes: his goal is to isolate the primary drivers of price variability in the firm's portfolios. His dataset includes many features, and his general procedure includes the following steps (but various sub-steps are excluded):
- Standardize the data, which contains (n) features
- Compute covariance matrix
- Retrieve eigenvalues and eigenvectors
- Sort the eigenvectors (with respect to their eigenvalues) and choose the components with a scree plot
a. Variables with larger scales will dominate the analysis
b. It may be hard to interpret the meaning of the components
c. It will require that he label the observations, which is a time-consuming process
d. Depending on the interior relationships, his final product will generate n*(n-1)/2 principal components
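The four steps Oliver lists can be sketched in a few lines of NumPy. The sample size, feature count, and number of retained components below are arbitrary assumptions for illustration:

```python
# Sketch of the four listed PCA steps on simulated data (NumPy only)
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(250, 6)) @ rng.normal(size=(6, 6))   # 250 observations, n = 6 features

# 1. Standardize the data
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# 2. Compute the covariance matrix
cov = np.cov(Z, rowvar=False)

# 3. Retrieve eigenvalues and eigenvectors
eigvals, eigvecs = np.linalg.eigh(cov)      # eigh() because the covariance matrix is symmetric

# 4. Sort eigenvectors by descending eigenvalue; a scree plot charts these shares
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
print(np.round(eigvals / eigvals.sum(), 3))  # proportion of variance per component

# Keep the first k components and project the data onto them
k = 2
scores = Z @ eigvecs[:, :k]
```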
23.2.3. For purposes of fraud detection, Sarah is experimenting with the following machine learning techniques:
- Decision tree
- Random forest
- K-nearest neighbor (KNN)
- A high-order (aka, high-degree) polynomial regression
a. She should reduce complexity with the goal of minimizing the variance of the prediction in the training and validation set
b. Her models exhibit the classic performance-complexity symptom because she is mistakenly using unsupervised learning models, so she should switch to supervised learning models
c. She should increase complexity until the model's error begins to increase (aka, deteriorates) for the validation set because further complexity is likely to produce predictions with low bias but excessive variance
d. She should increase complexity until the model's error begins to increase (aka, deteriorates) for the validation set because further complexity is likely to produce predictions with low variance but excessive bias
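The complexity-versus-error behavior described in the answer choices can be sketched with a simple simulated example: a noisy sine curve fit by polynomial regressions of increasing degree. The data, train/validation split, and degrees below are assumptions for illustration only:

```python
# Sketch: training vs. validation error as complexity (polynomial degree) grows.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
x = rng.uniform(-3, 3, size=(40, 1))
y = np.sin(x).ravel() + 0.3 * rng.normal(size=40)        # noisy nonlinear target

x_tr, x_va, y_tr, y_va = train_test_split(x, y, test_size=0.5, random_state=0)

for degree in (1, 2, 4, 8, 12):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(x_tr, y_tr)
    err_tr = mean_squared_error(y_tr, model.predict(x_tr))
    err_va = mean_squared_error(y_va, model.predict(x_va))
    # Training error keeps falling with degree, while validation error
    # typically bottoms out and then deteriorates once the fit has low
    # bias but excessive variance (overfitting).
    print(f"degree {degree:2d}: train MSE {err_tr:.3f}, validation MSE {err_va:.3f}")
```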
Answers here: