Principal Component Analysis

brian.field

Well-Known Member
Subscriber
Principal Component Analysis (PCA) is a multivariate statistical technique that is often used for dimension reduction. Assume you have a plethora of data with 100s or even 1000s of variables and you don't have a good idea of which are potential explanatory variables. You could use PCA to identify the main drivers of the variation in the data, which are called principal components. The principal components are not actual variables but rather hypothetical constructs (similar to the factors in factor analysis) that are ordered from most explanatory to least explanatory. Often, the first 2-3 components explain a large share of the variance in the data, so the statistician has to decide how many components to retain. Lastly, much of PCA is described within linear algebra via eigenvectors and eigenvalues.
 

ami44

Well-Known Member
Subscriber
Just to add to brian.field's explanation, the principal components are linear combinations of your original variables. Depending on your application, they might or might not have a useful interpretation.

Mathematically, PCA is the diagonalization of the correlation (or covariance) matrix of your variables. This means the principal components are defined by the eigenvectors of that matrix.
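In case it helps to see the diagonalization concretely, here is a minimal NumPy sketch. The data is synthetic and made up purely for illustration (4 variables driven by 2 hidden factors), and all variable names are my own assumptions, not anything from the reading.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (made up for illustration): 500 observations of 4 variables,
# built from 2 hidden factors plus noise so that the variables are correlated.
factors = rng.normal(size=(500, 2))
mixing = np.array([[1.0, 0.8, 0.5, 0.1],
                   [0.0, 0.3, 0.7, 1.0]])
X = factors @ mixing + 0.2 * rng.normal(size=(500, 4))

# Correlation matrix of the variables (variables are in columns).
R = np.corrcoef(X, rowvar=False)

# eigh is meant for symmetric matrices; it returns eigenvalues in ascending
# order, so re-sort them from largest to smallest.
eigenvalues, eigenvectors = np.linalg.eigh(R)
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# Diagonalization: V^T R V is diagonal, with the eigenvalues on the diagonal.
diagonalized = eigenvectors.T @ R @ eigenvectors

# Each principal component is a linear combination of the standardized
# variables, with the weights given by one eigenvector (one column of V).
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
scores = Z @ eigenvectors
```

Because the correlation matrix is used, the eigenvalues sum to the number of variables (4 here), and the sample variance of each column of `scores` equals the corresponding eigenvalue.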
 

bpdulog

Active Member
The reading states that the principal components themselves are not correlated. How can this be the case? To use the Tuckman reading example, if I am hedging a bond portfolio, my first few components may be a 2 year and a 30 year Treasury. How can there be no correlation between these two components?
 

brian.field

Well-Known Member
Subscriber
The components are mathematical/theoretical constructs....the components would NOT be the 2 year and the 30 year if you were to use PCA in your example. PCA in a problem with only 2 independent variables would not buy you much. I forget the details, so some of my statements may be off slightly, but if I remember correctly, PCA is most useful for high dimensional problems. It is a dimension reduction technique used in multivariate statistics. The PCA approach identifies eigenvalues/eigenvectors that "explain" the behavior of the data. The eigenvectors of a symmetric matrix (such as a correlation matrix) are perpendicular, or can be chosen to be, i.e., their dot products are 0, and as a result the principal components built from them are uncorrelated. Say you have 10 eigenvalues....the first 4 of them might explain 95% of the behavior in the data, so you could effectively utilize the 4 principal components associated with those 4 eigenvalues rather than using all 10, since the last 6 appear unimportant as they explain only 5% of the behavior. So, you will have reduced the dimensionality down from 10 to 4.
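A rough sketch of the two points above, again on synthetic data that I made up (10 correlated variables driven by 3 hidden factors): the eigenvectors are orthogonal, the component scores are uncorrelated even though the original variables are not, and the cumulative share of variance tells you how many components are needed to reach a threshold such as 95%. The threshold and all names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data (illustrative only): 10 correlated variables, 3 hidden factors.
n_obs, n_vars = 1000, 10
factors = rng.normal(size=(n_obs, 3))
mixing = rng.normal(size=(3, n_vars))
X = factors @ mixing + 0.3 * rng.normal(size=(n_obs, n_vars))

# Standardize, then eigen-decompose the correlation matrix.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
R = np.corrcoef(Z, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(R)
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

# Eigenvectors are orthonormal: V^T V is (numerically) the identity matrix.
gram = eigenvectors.T @ eigenvectors

# The component scores are uncorrelated with each other.
scores = Z @ eigenvectors
score_corr = np.corrcoef(scores, rowvar=False)

# Share of total variance per component, and the running total.
explained = eigenvalues / eigenvalues.sum()
cumulative = np.cumsum(explained)

# Smallest number of components whose cumulative share reaches 95%.
k = int(np.searchsorted(cumulative, 0.95) + 1)
reduced = scores[:, :k]

print(f"components kept: {k} of {n_vars}")
```

The same check answers the 2 year / 30 year question: the original series can be highly correlated, but the components are new linear combinations chosen so that the correlation between them is zero.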
 

bpdulog

Active Member
Thanks for the explanation! I'll stick with the high-level concept for now. Hopefully we won't have to calculate any eigenvalues...
 

brian.field

Well-Known Member
Subscriber
I'd take a quick look at the previous posts in this thread - perhaps I should have as well before responding!
 

brian.field

Well-Known Member
Subscriber
Incidentally, I think it is a fascinating technique....I have never been able to use it in my work though....
 

Matthew Graves

Active Member
Subscriber
Just in case anybody wants a practical example, PCA on yield curve changes is quite common. In well-behaved, liquid markets the first three components are usually described as Shift (approx. parallel change), Twist (approx. steepness) and Butterfly (approx. curvature). Thus you can model the changes in, say, 15 tenor points along the curve in terms of the 3 principal components. From there you can start to describe your return etc. in terms of these less granular, more intuitive components.
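For anyone who wants to try this, below is a small sketch of the mechanics. The yield changes are synthetic: I build them from three made-up shapes (level, slope, curvature) plus noise, so the tenor grid, the factor volatilities and the shapes are all assumptions for illustration, not market data. With real data you would replace `changes` with your own matrix of daily changes per tenor.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical tenor grid in years (15 points).
tenors = np.array([0.25, 0.5, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 30])

# Made-up factor shapes used only to generate synthetic data.
level = np.ones_like(tenors)
slope = (tenors - tenors.mean()) / tenors.std()
curvature = np.exp(-((tenors - 5.0) / 4.0) ** 2)

# Synthetic daily yield changes in basis points: three factors plus noise.
n_days = 750
changes = (5.0 * rng.normal(size=(n_days, 1)) * level
           + 2.0 * rng.normal(size=(n_days, 1)) * slope
           + 1.0 * rng.normal(size=(n_days, 1)) * curvature
           + 0.2 * rng.normal(size=(n_days, len(tenors))))

# PCA on the covariance matrix of the changes.
centered = changes - changes.mean(axis=0)
cov = np.cov(centered, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(cov)
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

# Keep the first three components.
loadings = eigenvectors[:, :3]    # 15 tenors x 3 components
scores = centered @ loadings      # daily value of each component
approx = scores @ loadings.T      # curve changes rebuilt from 3 components

explained = eigenvalues[:3].sum() / eigenvalues.sum()
residual_rms = np.sqrt(np.mean((centered - approx) ** 2))
print(f"share of variance in first 3 components: {explained:.1%}")
print(f"RMS reconstruction error (bp): {residual_rms:.2f}")
```

Plotting each column of `loadings` against `tenors` is the usual way to judge whether the components look like a shift, a twist and a butterfly. Note that the sign of each eigenvector is arbitrary, so a component may come out flipped.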
 

Linghan

Active Member
@brian.field is the eigenvalue in PCA the same thing that explains the curvature and shape of the model? I was trying to associate the canonical table of data from my design and experiment class with PCA...probably not necessary...
 

Matthew Graves

Active Member
Subscriber
The eigenvalue of each principal component essentially indicates the amount of variance explained by that component. Associating eigenvectors (components) with real world concepts is a tricky business, as they are by definition mathematical constructs rather than real world observables. If you want to try to make sense of the components, try looking at the values within the eigenvector for each input factor. There may be clusters/patterns that will allow you to (loosely) associate each component with a real world concept.
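As a rough illustration of reading the eigenvector values, here is a sketch on synthetic data where I deliberately built two groups of variables (the names `a1`..`a4` and `b1`, `b2` are hypothetical). It prints each component's eigenvalue, its share of the variance, and the weight on each input variable, then lists the variables with the largest absolute weights.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical variable names: four driven by one factor, two by another.
names = ["a1", "a2", "a3", "a4", "b1", "b2"]
n_obs = 2000
factor_a = rng.normal(size=(n_obs, 1))
factor_b = rng.normal(size=(n_obs, 1))
X = np.hstack([
    factor_a + 0.5 * rng.normal(size=(n_obs, 4)),
    factor_b + 0.5 * rng.normal(size=(n_obs, 2)),
])

R = np.corrcoef(X, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(R)
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

# Eigenvalue divided by the total = share of variance explained.
explained = eigenvalues / eigenvalues.sum()

# Print the weights (eigenvector entries) of the first two components.
for j in range(2):
    print(f"PC{j + 1}: eigenvalue={eigenvalues[j]:.2f}, share={explained[j]:.1%}")
    for name, weight in zip(names, eigenvectors[:, j]):
        print(f"  {name:>3}: {weight:+.2f}")

# The two variables with the largest absolute weight in each component.
dominant = [
    [names[i] for i in np.argsort(-np.abs(eigenvectors[:, j]))[:2]]
    for j in range(2)
]
print(dominant)
```

In this constructed example the grouping is obvious because the data was built that way. With real data the patterns are usually less clean, and the signs of the weights can flip between runs or libraries, so only the relative pattern within a component is meaningful.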
 