Let’s visualize the distribution of each feature.
The price return and feature data look sensible: mostly unimodal, with some skew.

We have not standardized the data yet, so these values are in the original units as collected from FactSet.
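As a numeric complement to the histograms, the shape of each distribution can be summarized with a few statistics. The snippet below is a minimal sketch on synthetic data: the column names `ratio_a` and `ratio_b` and the helper `summarize_distributions` are hypothetical stand-ins, not the FactSet fields or code used in this series.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data; column names are hypothetical, not FactSet fields.
rng = np.random.default_rng(0)
features = pd.DataFrame({
    "ratio_a": rng.lognormal(mean=0.0, sigma=1.0, size=1000),  # right-skewed
    "ratio_b": rng.normal(loc=0.0, scale=1.0, size=1000),      # roughly symmetric
})

def summarize_distributions(df: pd.DataFrame) -> pd.DataFrame:
    """Return mean, standard deviation, median, and sample skewness per column."""
    return pd.DataFrame({
        "mean": df.mean(),
        "std": df.std(),
        "median": df.median(),
        "skew": df.skew(),
    })

summary = summarize_distributions(features)
print(summary)
```

A skew value well above zero flags a long right tail, which is the kind of feature that may benefit from a transform or robust scaling once we standardize.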

Next, we scatterplot monthly returns against each feature to look for any obvious direct relationships.

Note that below, P_PRICE_RETURNS is our y (i.e., the value we want to estimate, decompose, and model). P_PRICE_RETURNS_PR, by contrast, is the trailing 6-month return and is a potential feature (i.e., an x predictor), often called momentum in the financial literature.
Visually, we don’t see any strong predictors here, with the possible exception of momentum, but in the interest of time and experimentation we move ahead with the modeling process.
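The visual impression from the scatterplots can be cross-checked by ranking features on their Pearson correlation with the target. This is only a sketch on synthetic data: `OTHER_FEATURE` and the helper `rank_features_by_correlation` are hypothetical, and the weak link between the two return columns is built into the simulated data by construction, so it says nothing about the real dataset.

```python
import numpy as np
import pandas as pd

# Synthetic data only: the dependence of returns on momentum is imposed here
# for illustration and is NOT a result from the FactSet data.
rng = np.random.default_rng(1)
n = 500
momentum = rng.normal(size=n)
df = pd.DataFrame({
    "P_PRICE_RETURNS": 0.3 * momentum + rng.normal(size=n),  # target (y)
    "P_PRICE_RETURNS_PR": momentum,                          # trailing-return feature
    "OTHER_FEATURE": rng.normal(size=n),                     # hypothetical, unrelated
})

def rank_features_by_correlation(data: pd.DataFrame, target: str) -> pd.Series:
    """Pearson correlation of each feature with the target, sorted by magnitude."""
    feature_cols = data.drop(columns=[target])
    corr = feature_cols.corrwith(data[target])
    order = corr.abs().sort_values(ascending=False).index
    return corr.reindex(order)

ranked = rank_features_by_correlation(df, target="P_PRICE_RETURNS")
print(ranked)
```

Pearson correlation only captures linear association, so a low value does not rule out a nonlinear relationship that a scatterplot might still reveal.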

Next, we scatterplot features against one another to explore correlations between them.

Highly correlated features are not recommended in a regression model, because they make individual coefficient estimates unstable and hard to interpret. Certain correlations below are sensible and expected, such as any price ratio versus price itself. We will not use price itself in the actual feature set for the Lasso regression, but several of the ratio features involve price.
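One way to screen for such pairs numerically is to list every feature pair whose absolute correlation exceeds a cutoff. The sketch below uses synthetic data; the column names, the helper `highly_correlated_pairs`, and the 0.8 threshold are all illustrative assumptions rather than choices made in this series.

```python
from itertools import combinations

import numpy as np
import pandas as pd

# Synthetic data; column names are hypothetical. The ratio is built from price,
# so the two are strongly correlated by construction.
rng = np.random.default_rng(2)
n = 500
price = rng.lognormal(mean=3.0, sigma=0.5, size=n)
X = pd.DataFrame({
    "price": price,
    "price_to_book": price / rng.uniform(5.0, 6.0, size=n),
    "unrelated": rng.normal(size=n),
})

def highly_correlated_pairs(df: pd.DataFrame, threshold: float = 0.8):
    """Return (feature_a, feature_b, r) for each pair with |Pearson r| > threshold."""
    corr = df.corr()
    return [
        (a, b, float(corr.loc[a, b]))
        for a, b in combinations(corr.columns, 2)  # each unordered pair once
        if abs(corr.loc[a, b]) > threshold
    ]

flagged = highly_correlated_pairs(X, threshold=0.8)
for a, b, r in flagged:
    print(f"{a} vs {b}: r = {r:.2f}")
```

Flagged pairs are candidates for dropping one member, although Lasso's L1 penalty will also tend to keep only one of a group of strongly correlated features.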
