David Harper CFA FRM

Learning objectives: Differentiate among unsupervised, supervised, and reinforcement learning models. Explain how reinforcement learning operates and how it is used in decision-making.

Questions:

23.4.1. Emma is designing a reinforcement learning algorithm to support her investment firm's technical trading guidelines. She is writing code according to both a Monte Carlo (MC) approach and temporal difference (TD) approach. Symbolically, these are represented as follows:

I. MC: V(S) ← V(S) + α[GT - V(S)]
II. TD: V(S) ← V(S) + α[R1 + γ*V(S1) - V(S)]

In these functions, V(S) is the value of the state at time T, as given by V(S) = max(A)[Q(S, A)], and V(S1) is the value of the next (T+1) state; i.e., V[S(T+1)] is abbreviated to V(S1). Further, α is the learning rate, γ is the discount factor, GT = total reward, and R1 = reward at the next step.
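To make the notation concrete, here is a minimal Python sketch of the two update rules exactly as written above. The function names, the dictionary used as a value table, and the numbers (alpha = 0.1, gamma = 0.9, the rewards) are assumptions for illustration only; they are not part of the question.

```python
def mc_update(v, state, total_reward, alpha):
    """Rule I (MC): V(S) <- V(S) + alpha * [GT - V(S)]."""
    v[state] += alpha * (total_reward - v[state])
    return v[state]


def td_update(v, state, reward, next_state, alpha, gamma):
    """Rule II (TD): V(S) <- V(S) + alpha * [R1 + gamma * V(S1) - V(S)]."""
    target = reward + gamma * v[next_state]
    v[state] += alpha * (target - v[state])
    return v[state]


# Hypothetical starting values for the current state S and the next state S1.
start = {"S": 0.0, "S1": 1.0}

mc_values = dict(start)
td_values = dict(start)

# Illustrative inputs only: GT = 2.0 for the MC rule, R1 = 0.5 for the TD rule.
mc_new = mc_update(mc_values, "S", total_reward=2.0, alpha=0.1)
td_new = td_update(td_values, "S", reward=0.5, next_state="S1", alpha=0.1, gamma=0.9)

print(f"MC update: V(S) = {mc_new:.4f}")
print(f"TD update: V(S) = {td_new:.4f}")
```

Both functions share the same form, old value plus alpha times (target minus old value); they differ only in what they use as the target.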

In regard to these MC and TD approaches to reinforcement learning, which of the following statements is TRUE?

a. In both methods (i.e., MC and TD), the algorithm randomly chooses between greedy exploitation and non-greedy exploration
b. As more trials are conducted for both methods, the probability of exploration increases while the probability of exploitation decreases
c. Both methods are supervised learning; the TD approach is supervised due to its next step's value, V(S1), and the MC approach is supervised due to its GT variable
d. The MC method updates state value by looking one decision ahead, while the TD method updates state value as a function of the total reward


23.4.2. Robert just started his new job as a data scientist, and his first discovery is that a data scientist spends a lot of time performing data wrangling. The first dataset he needs to wrangle (i.e., clean) contains some typical issues, including duplicates, missing features, and outliers.

In regard to his data wrangling, each of the following statements is true EXCEPT which one?

a. He should remove duplicates
b. He should remove all observations that contain any missing features
c. Because his data contains outliers, standardization is probably better than normalization
d. If the data's distribution is non-Gaussian and he wants to preserve its distribution, normalization is probably better than standardization
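For reference, the two rescaling methods named in (c) and (d) can be sketched in Python as below. This is a minimal sketch that assumes "normalization" means min-max scaling to [0, 1] and "standardization" means z-scores using the sample standard deviation; the sample data, including the single outlier, are made up for illustration.

```python
import statistics


def normalize(xs):
    """Min-max normalization: rescale values to the range [0, 1].

    Assumes the values are not all identical (otherwise hi - lo is zero).
    """
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]


def standardize(xs):
    """Standardization (z-score): subtract the mean, divide by the sample std dev."""
    mu = statistics.mean(xs)
    sigma = statistics.stdev(xs)
    return [(x - mu) / sigma for x in xs]


# Hypothetical feature values with one large outlier.
data = [1.0, 2.0, 3.0, 4.0, 100.0]

normalized = normalize(data)
standardized = standardize(data)

print("normalized:  ", [round(x, 3) for x in normalized])
print("standardized:", [round(x, 3) for x in standardized])
```

Printing both lists side by side is one way to see how a single extreme value affects each rescaling.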


23.4.3. For her investment firm, Audrey has identified six high-priority machine learning use cases. They are given below in three pairs:

I. Segment customers into different groups; and perform sentiment analysis of SEC 10-K filings
II. Predict if borrower(s) will default; and forecast future stock price(s) based on fundamental data
III. Decide how to split large-volume trades to sell quickly while minimizing the adverse price effect; and decide how much of a position to hedge using derivatives

Which of the following is the best mapping of a general machine learning approach to its pair of use cases?

a. I=Supervised, II=Reinforcement, III=Unsupervised
b. I=Reinforcement, II=Unsupervised, III=Supervised
c. I=Reinforcement, II=Unsupervised, III=Unsupervised
d. I=Unsupervised, II=Supervised, III=Reinforcement

Answers:
 