AI Workflow

Workflow for AI Factor

Written by Marco Salerno

Machine Learning Any Investor Can Master

A machine learning workflow defines the phases carried out during a machine learning project. The typical phases are data collection, data pre-processing, model training and refinement, evaluation, and deployment to production. Check out the AI Factor User Guide for how Portfolio123 implements these steps.

Preparation

Define the main settings:

Target: The factor you want to predict, for example the excess return over the benchmark during the next three months.

Universe: The stocks used to create the training dataset.

Features: The input variables a machine learning model uses to predict a target variable. They can be financial ratios, technical indicators, macro data, or other relevant data that may influence the target variable.

Preprocessing: The method used to transform features to the same scale (e.g., Z-score, Rank, Min-Max scaling). This is crucial for certain algorithms that rely on distance metrics.
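Portfolio123 applies these transforms for you, but as a minimal illustration of what they do, here is a sketch using pandas and scikit-learn. The feature names and values are hypothetical.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler, MinMaxScaler

# Hypothetical feature table: one row per stock, one column per factor.
features = pd.DataFrame({
    "pe_ratio": [12.0, 25.0, 8.5, 40.0],
    "momentum": [0.10, -0.05, 0.22, 0.03],
})

# Z-score: center each feature at 0 with unit variance.
z_scored = pd.DataFrame(
    StandardScaler().fit_transform(features), columns=features.columns
)

# Min-Max: rescale each feature to the [0, 1] interval.
min_maxed = pd.DataFrame(
    MinMaxScaler().fit_transform(features), columns=features.columns
)

# Rank: replace each value with its percentile rank within the universe.
ranked = features.rank(pct=True)
```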

Feature Engineering

Feature engineering is the process of using domain knowledge to select, modify, or create features (input variables) that make machine learning algorithms work more effectively. It plays a crucial role in supervised learning, where the goal is to map input data to a target variable. The quality of features can significantly influence the performance of a model.
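For a concrete sense of what feature engineering looks like in code, here is a small sketch that derives a momentum signal and an earnings yield from raw data. The column names and values are hypothetical, chosen only to illustrate the idea of turning raw inputs into predictive features.

```python
import pandas as pd

# Hypothetical raw inputs for a small universe of stocks.
raw = pd.DataFrame({
    "price":         [50.0, 120.0, 35.0],
    "price_63d_ago": [45.0, 130.0, 30.0],
    "net_income":    [2.0e8, 5.0e8, 1.0e8],
    "market_cap":    [4.0e9, 2.0e10, 1.5e9],
})

# Domain-driven features: a 3-month momentum signal and an earnings yield.
features = pd.DataFrame({
    "momentum_3m":    raw["price"] / raw["price_63d_ago"] - 1.0,
    "earnings_yield": raw["net_income"] / raw["market_cap"],
})
```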

Cross Validation

In this step, many different models are trained using cross-validation to produce a variety of reports. Cross-validation involves dividing the dataset into multiple subsets, or "folds", training on some and evaluating on the rest. It is the preferred method for trading systems because the model's generalization (how well it performs on unseen data) can be tested across different market cycles.

Portfolio123 provides many predefined models you can test, specifically engineered for financial data, or you can create your own by modifying a model's hyperparameters. In addition, different validation methods can be used, such as K-fold or Rolling Time Series.
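To make the idea concrete, here is a minimal sketch of a rolling time-series validation using scikit-learn's TimeSeriesSplit (K-fold works analogously with KFold). This is generic illustrative code with randomly generated data, not Portfolio123's validation engine.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import TimeSeriesSplit

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))   # hypothetical feature matrix
y = rng.normal(size=500)        # hypothetical target (e.g., excess returns)

# Rolling time-series splits: each fold trains on the past and
# evaluates on a later, unseen slice of data.
scores = []
for train_idx, test_idx in TimeSeriesSplit(n_splits=5).split(X):
    model = RandomForestRegressor(n_estimators=100, random_state=0)
    model.fit(X[train_idx], y[train_idx])
    scores.append(model.score(X[test_idx], y[test_idx]))

print("Out-of-sample R^2 per fold:", scores)
```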

Evaluate Models

The validated models are compared using standard ML tools, such as the Lift Chart, alongside reports specifically geared toward investors. For example, quantile portfolio backtests are automatically generated to compare model performance.
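The idea behind a quantile backtest can be sketched in a few lines: bucket stocks by predicted score and compare the average realized return of each bucket. The data below is randomly generated for illustration only.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
predicted = rng.normal(size=1000)                   # model scores for 1000 stocks
realized = 0.3 * predicted + rng.normal(size=1000)  # hypothetical realized returns

# Bucket stocks into 5 quantiles by predicted score and
# average the realized return within each bucket.
buckets = pd.qcut(predicted, q=5, labels=["Q1", "Q2", "Q3", "Q4", "Q5"])
print(pd.Series(realized).groupby(buckets).mean())
```

If average returns rise monotonically from Q1 to Q5, the model is ranking stocks usefully.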

The information learned from this step is used to iteratively tune models and/or to further refine the features. Keep in mind that the more iterations you perform, the greater the chance of over-fitting to the past.

Tune Models

The validation reports can be used to further tune hyperparameters. For example, if you notice that Random Forest models with more "trees" perform better, new models can be tested using even more trees. Alternatively, hundreds of hyperparameter combinations can be tested by brute force using a grid search.
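Here is a minimal grid-search sketch using scikit-learn's GridSearchCV, one common way to run such a brute-force search. The model, grid, and data are illustrative, not Portfolio123's implementation.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

# Synthetic data standing in for a real training set.
X, y = make_regression(n_samples=300, n_features=4, noise=0.5, random_state=0)

# Brute-force search over a small hyperparameter grid, scored with
# rolling time-series folds so each fit is validated on later data.
grid = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid={"n_estimators": [50, 100, 200], "max_depth": [3, 5, None]},
    cv=TimeSeriesSplit(n_splits=4),
)
grid.fit(X, y)
print(grid.best_params_)
```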

Train Predictor

The best model is chosen and a Predictor is trained that can be used to generate current predictions. These predictions can be used in Ranking Systems, Buy/Sell rules, and for screening. It's also possible to combine multiple models into an ensemble.
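A minimal sketch of this final step, assuming scikit-learn-style models and synthetic data: train the chosen model on the full dataset, score current data, and optionally average the predictions of a second model to form a simple ensemble.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor

# Synthetic training data and a few hypothetical rows to score today.
X, y = make_regression(n_samples=300, n_features=4, noise=0.5, random_state=0)
X_new = X[:5]

# Train the chosen model on the full dataset, then generate predictions.
predictor = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
single_preds = predictor.predict(X_new)

# Simple ensemble: average the predictions of two different models.
booster = GradientBoostingRegressor(random_state=0).fit(X, y)
ensemble_preds = np.mean([single_preds, booster.predict(X_new)], axis=0)
```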
