P1.T2.24.2. Rescaling variables in data preparation

Nicole Seaman

Director of CFA & FRM Operations
Learning Objectives: Compare and apply the two methods used for rescaling variables in data preparation.
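For reference, the two methods are commonly written as shown below. Min-max normalization maps each value onto the interval [0, 1] using the minimum and maximum of the variable. Standardization (the z-score) expresses each value as a number of standard deviations from the mean. The symbols are illustrative, and whether $s_x$ denotes the sample or the population standard deviation is a convention the learning objective does not fix.

```latex
\tilde{x}_i = \frac{x_i - \min_j x_j}{\max_j x_j - \min_j x_j} \in [0, 1],
\qquad
z_i = \frac{x_i - \bar{x}}{s_x}
```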

Questions:

24.2.1: You are analyzing a dataset containing information about customers' purchasing behavior, including variables such as the amount spent, frequency of purchases, and customer age. These variables have different scales and units. In which of the following scenarios would you choose standardization over normalization?

a. You want to scale the features to a common range based on their minimum and maximum values while preserving their original distribution.
b. You plan to apply machine learning algorithms that require features to have similar scales, as large differences in scale may lead to biased model performance.
c. You aim to transform the features to a fixed range between 0 and 1 to facilitate direct comparison between different customers.
d. You are using clustering algorithms such as k-means clustering, which calculates distances between data points based on the Euclidean distance metric, making them sensitive to differences in feature scales.
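To illustrate why feature scale matters for distance-based methods such as k-means, the sketch below computes Euclidean distances between three customers before and after rescaling each feature. The customer records and the helper names (`min_max`, `z_score`, `rescale`) are hypothetical and chosen only for illustration; standardization here assumes the sample standard deviation (`statistics.stdev`).

```python
import math
import statistics

# Hypothetical customer records: (amount spent in $, purchases per year, age in years).
# Illustrative values only; they do not come from the question.
customers = [
    (1200.0, 4.0, 25.0),
    (1250.0, 20.0, 60.0),
    (300.0, 5.0, 26.0),
]

def min_max(column):
    """Min-max normalization: map values linearly onto [0, 1]."""
    lo, hi = min(column), max(column)
    return [(x - lo) / (hi - lo) for x in column]

def z_score(column):
    """Standardization: subtract the mean, divide by the sample standard deviation."""
    mu, sd = statistics.mean(column), statistics.stdev(column)
    return [(x - mu) / sd for x in column]

def rescale(rows, method):
    """Rescale each feature (column) independently, then rebuild the rows."""
    columns = [method(list(col)) for col in zip(*rows)]
    return list(zip(*columns))

for label, rows in [("raw", customers),
                    ("normalized", rescale(customers, min_max)),
                    ("standardized", rescale(customers, z_score))]:
    d01 = math.dist(rows[0], rows[1])  # distance from customer 0 to customer 1
    d02 = math.dist(rows[0], rows[2])  # distance from customer 0 to customer 2
    print(f"{label:>12}: d(c0,c1)={d01:.3f}  d(c0,c2)={d02:.3f}")
```

On the raw scale, customer 0 looks closest to customer 1 because the dollar amount dominates the distance. After either rescaling, customer 0 is closest to customer 2, whose purchase frequency and age are nearly identical.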


24.2.2: A real estate analyst is tasked with comparing home prices in two neighborhoods, Neighborhood A and Neighborhood B. These neighborhoods differ in characteristics such as average home size, median income of residents, and proximity to amenities like schools and parks. The analyst wants to ensure that prices are comparable in terms of their distribution and variability.

[Image 24_2-2Q.png not reproduced: home price data for the two neighborhoods]


Should the analyst use normalization or standardization, and what would be the appropriate normalized or standardized value for home 1 in Neighborhood A?

a. Normalization. -0.8697
b. Standardization. -0.8697
c. Normalization. 0.9591
d. Standardization. 0.9591
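To see what each method does when prices are rescaled within each group, the sketch below applies both methods to two hypothetical neighborhoods separately, each using its own statistics. The prices are invented for illustration and are not the values in the question's table; standardization assumes the sample standard deviation (`statistics.stdev`), so swap in `statistics.pstdev` if the population convention is intended.

```python
import statistics

# Hypothetical home prices ($) for two neighborhoods; illustrative only,
# not the values from the question's table. Neighborhood B includes one outlier.
prices = {
    "A": [250_000, 275_000, 300_000, 325_000, 350_000],
    "B": [500_000, 550_000, 600_000, 650_000, 1_200_000],
}

def standardize(values):
    """Z-scores using the group's own mean and sample standard deviation."""
    mu = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mu) / sd for v in values]

def normalize(values):
    """Min-max scaling of the group onto [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

for name, values in prices.items():
    z = standardize(values)
    n = normalize(values)
    print(name, "z:", [round(x, 4) for x in z], "min-max:", [round(x, 4) for x in n])
```

Within each neighborhood the z-scores have mean 0 and sample standard deviation 1, so a home's z-score says how many of its own neighborhood's standard deviations it sits from that neighborhood's mean. A min-max value only says where the home sits between that neighborhood's cheapest and most expensive home, which is why the single outlier in B compresses the other four B homes toward 0.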


24.2.3: A real estate investor is considering investing in properties in two cities, City A and City B. City A is a small town with relatively low home prices: the average home price is about $149,000, with a standard deviation of about $20,000. City B, by contrast, is a metropolitan area with higher-priced homes: the average home price is $750,000, with a standard deviation of $100,000. The investor would like to compare the relative affordability of homes in each city.

[Image 24_2-3Q.png not reproduced: home price data for City A and City B]


Should the investor use normalization or standardization, and what would be the appropriate normalized or standardized value for home 1 in City A?

a. Normalization. 0.3724
b. Standardization. 0.3724
c. Normalization. -0.6391
d. Standardization. -0.6391
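For reference, the sketch below shows how a standardized value would be computed from the approximate means and standard deviations stated in the question. The two home prices are hypothetical and are not home 1 from the question's table; the names `CITY_PARAMS` and `z_score` are illustrative.

```python
# Approximate parameters stated in the question (USD).
CITY_PARAMS = {
    "A": {"mean": 149_000, "sd": 20_000},
    "B": {"mean": 750_000, "sd": 100_000},
}

def z_score(price, mean, sd):
    """Number of standard deviations a price lies above (+) or below (-) the mean."""
    return (price - mean) / sd

# Hypothetical prices, chosen for illustration only.
home_a = 139_000
home_b = 700_000

z_a = z_score(home_a, **CITY_PARAMS["A"])
z_b = z_score(home_b, **CITY_PARAMS["B"])
print(z_a, z_b)  # -0.5 -0.5
```

Both hypothetical homes sit half a standard deviation below their own city's mean, so they receive the same z-score despite a $561,000 difference in price.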
