AI algorithms
XGBoost
https://xgboost.readthedocs.io/en/latest/parameter.html
XGBoost (Extreme Gradient Boosting) implements a Gradient Boosting algorithm that forms a strong predictor by training a sequence of weak predictors, each improving on the previous ones' results. It is a non-parametric machine learning algorithm, meaning it does not rely on assumptions about the underlying distribution of the data. Memory usage can be a concern for extremely large datasets.
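The idea of training a sequence of weak predictors, each improving on the previous ones, can be sketched with shallow scikit-learn trees fitted to the residuals of the ensemble so far. This is a minimal illustration of plain gradient boosting with squared error, not XGBoost's actual implementation (which adds regularization and other refinements); the function names, data, and settings below are assumptions made for the example.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def fit_boosted_trees(X, y, n_rounds=50, learning_rate=0.1, max_depth=2):
    """Fit shallow trees one after another, each on the current residuals."""
    base = float(np.mean(y))                 # start from a constant prediction
    prediction = np.full(len(y), base)
    trees = []
    for _ in range(n_rounds):
        residuals = y - prediction           # what the ensemble still gets wrong
        tree = DecisionTreeRegressor(max_depth=max_depth, random_state=0)
        tree.fit(X, residuals)               # weak predictor for the remaining error
        prediction += learning_rate * tree.predict(X)
        trees.append(tree)
    return {"base": base, "trees": trees, "learning_rate": learning_rate}


def predict_boosted_trees(model, X):
    """Sum the constant start value and the scaled contribution of every tree."""
    total = np.full(X.shape[0], model["base"])
    for tree in model["trees"]:
        total += model["learning_rate"] * tree.predict(X)
    return total


# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(500, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + rng.normal(scale=0.1, size=500)

model = fit_boosted_trees(X, y)
mse_baseline = float(np.mean((y - y.mean()) ** 2))
mse_boosted = float(np.mean((y - predict_boosted_trees(model, X)) ** 2))
print(f"baseline MSE {mse_baseline:.3f}, boosted MSE {mse_boosted:.3f}")
```

Each round only has to correct what the earlier rounds left over, which is why many weak trees can add up to a strong predictor.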
Random Forests
https://scikit-learn.org/1.5/modules/generated/sklearn.ensemble.RandomForestRegressor.html
Random Forest is an ensemble algorithm formed by averaging the outputs of a set of decision trees. It is a non-parametric machine learning algorithm, meaning it does not rely on assumptions about the underlying distribution of the data. It can be memory-intensive when there are many trees or features.
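The averaging step can be checked directly in scikit-learn: the forest's prediction should equal the mean of the predictions of its individual trees. The dataset and settings below are synthetic and chosen only for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic regression data, for illustration only.
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)

forest = RandomForestRegressor(n_estimators=25, random_state=0).fit(X, y)

# Predict with every fitted tree, then average across trees by hand.
per_tree = np.stack([tree.predict(X) for tree in forest.estimators_])
manual_average = per_tree.mean(axis=0)

forest_prediction = forest.predict(X)
matches = bool(np.allclose(manual_average, forest_prediction))
print("trees stored in memory:", len(forest.estimators_))
print("forest prediction equals the per-tree average:", matches)
```

Every fitted tree is kept in `forest.estimators_`, which is one reason memory use grows with the number of trees.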
Extra Trees
https://scikit-learn.org/1.5/modules/generated/sklearn.ensemble.ExtraTreesRegressor.html
Extra Trees Regressor is a machine learning model that predicts numerical values using an ensemble of decision trees. It improves accuracy by introducing randomness in tree splits and data sampling, making it more robust and less prone to overfitting. It is a non-parametric machine learning algorithm, meaning it does not rely on assumptions about the underlying distribution of the data. It typically trains faster than Random Forest but has similar memory limitations.
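A short scikit-learn sketch of the two sources of randomness mentioned above. One point worth knowing is that scikit-learn's ExtraTreesRegressor draws split thresholds at random in every tree, but by default trains each tree on the full dataset; row sampling has to be switched on with `bootstrap=True`. The data and settings here are illustrative assumptions, not recommended values.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import ExtraTreesRegressor, RandomForestRegressor

# Synthetic regression data, for illustration only.
X, y = make_regression(n_samples=400, n_features=10, noise=5.0, random_state=0)

# Default settings differ: Random Forest bootstraps rows, Extra Trees does not.
default_extra_bootstrap = ExtraTreesRegressor().bootstrap
default_forest_bootstrap = RandomForestRegressor().bootstrap
print("Extra Trees bootstrap default:", default_extra_bootstrap)
print("Random Forest bootstrap default:", default_forest_bootstrap)

# Enable row sampling explicitly to combine random splits with data sampling.
extra = ExtraTreesRegressor(n_estimators=50, bootstrap=True, random_state=0).fit(X, y)

# Each underlying tree uses the random splitter rather than the best-split search.
splitter = extra.estimators_[0].splitter
predictions = extra.predict(X)
print("splitter used by each tree:", splitter)
```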
LightGBM
https://lightgbm.readthedocs.io/en/stable
LightGBM is a gradient boosting framework that uses tree-based learning algorithms. It is designed to be distributed and efficient, with the following advantages:
Faster training speed and higher efficiency.
Lower memory usage.
Better accuracy.
Support of parallel, distributed, and GPU learning.
Capable of handling large-scale data.
Linear Regression
https://scikit-learn.org/1.5/modules/generated/sklearn.linear_model.LinearRegression.html
Linear regression is a parametric statistical technique that fits a linear relationship between the features and the target. It is best paired with the Z-Score preprocessor.
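As a rough illustration of combining z-score standardization with linear regression, the sketch below uses scikit-learn's StandardScaler as a stand-in for the Z-Score preprocessor (an assumption; the two may not behave identically). The data is synthetic.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Two synthetic features on very different scales.
rng = np.random.default_rng(0)
X = np.column_stack([rng.normal(0, 1, 500), rng.normal(100, 1000, 500)])
y = 2.0 * X[:, 0] + 0.003 * X[:, 1] + rng.normal(scale=0.1, size=500)

# Standardize each feature to zero mean and unit variance, then fit the line.
model = make_pipeline(StandardScaler(), LinearRegression()).fit(X, y)

coefs = model.named_steps["linearregression"].coef_
r2 = model.score(X, y)
print("coefficients on standardized features:", coefs)
print("R^2 on the training data:", round(r2, 4))
```

After standardization each coefficient is the effect of a one-standard-deviation move in its feature, so coefficients become directly comparable even when the raw features have very different units.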
Keras Neural Networks
https://keras.io/about/
Neural networks are loosely modeled on the layers of neurons in the human brain. They fit multiple layers of interconnected nodes, producing a non-linear transformation of the input data. They are non-parametric machine learning algorithms, meaning they do not rely on assumptions about the underlying distribution of the data.
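To show where the non-linearity comes from, here is a small NumPy sketch of a two-layer forward pass. It is not Keras code, and the weights are random values chosen only for illustration. Without an activation function the two layers collapse into a single linear map; with ReLU they do not.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))                      # 5 samples, 3 input features
W1, b1 = rng.normal(size=(3, 4)), rng.normal(size=4)   # hidden layer, 4 nodes
W2, b2 = rng.normal(size=(4, 1)), rng.normal(size=1)   # output layer, 1 node


def forward(X, activation):
    """Two connected layers with an activation applied to the hidden layer."""
    hidden = activation(X @ W1 + b1)
    return hidden @ W2 + b2


def identity(z):
    return z


def relu(z):
    return np.maximum(z, 0.0)


# With no activation, the network is equivalent to one linear layer.
linear_out = forward(X, identity)
collapsed = X @ (W1 @ W2) + (b1 @ W2 + b2)
print("linear layers collapse:", bool(np.allclose(linear_out, collapsed)))

# With ReLU, the output is no longer a linear function of the input.
relu_out = forward(X, relu)
print("ReLU output differs:", not np.allclose(relu_out, linear_out))
```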
Support Vector Machines
https://scikit-learn.org/1.5/modules/generated/sklearn.svm.SVR.html
SVMs do not assume a particular distribution for the data, but how well they capture non-linear relationships depends on the chosen kernel. They might not perform as well on highly nonlinear data as non-parametric methods like decision trees or neural networks. Preprocessing steps such as feature scaling are recommended before applying SVMs to any data type.
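The effect of feature scaling on an SVM can be sketched with scikit-learn's SVR. The example is synthetic and deliberately extreme: the target depends on a small-scale feature, while an irrelevant feature has a much larger scale. The feature names and settings are assumptions for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(0)
n = 400
informative = rng.uniform(-1, 1, n)          # small-scale feature that drives y
large_scale_noise = rng.normal(0, 1000, n)   # irrelevant feature with a huge scale
X = np.column_stack([informative, large_scale_noise])
y = np.sin(3 * informative) + rng.normal(scale=0.05, size=n)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Same model, with and without standardizing the features first.
raw = SVR(kernel="rbf").fit(X_train, y_train)
scaled = make_pipeline(StandardScaler(), SVR(kernel="rbf")).fit(X_train, y_train)

r2_raw = raw.score(X_test, y_test)
r2_scaled = scaled.score(X_test, y_test)
print(f"R^2 without scaling: {r2_raw:.3f}")
print(f"R^2 with scaling:    {r2_scaled:.3f}")
```

Without scaling, distances between samples are dominated by the large-scale feature, so the kernel barely sees the feature that actually matters.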
Generalized Additive Models
https://www.statsmodels.org/stable/gam.html
GAMs are a flexible class of statistical models that can accommodate parametric and non-parametric relationships between the predictors and the response variable. Their flexibility makes them suitable for various data types and distributions. They are handy when the relationship between predictors and the response is unknown or suspected to be nonlinear.
DeepTables
https://deeptables.readthedocs.io/en/latest/
DeepTables is a machine learning tool for efficiently working with tabular data using neural networks. It automates data preprocessing, feature engineering, model selection, hyperparameter tuning, and ensemble learning. DeepTables can handle both parametric and non-parametric variables.
Scalability with large datasets
Certain models do not scale well and will struggle with datasets containing millions of rows or with high-dimensional datasets containing hundreds of features. To reduce training time and avoid out-of-memory errors when using these models:
Use a smaller training universe
Shorten the dataset period or lengthen the dataset frequency
Reduce the number of features
Scale Well: LightGBM, XGBoost, Random Forest, Extra Trees, Linear Regression, Keras, DeepTables.
Do Not Scale Well: Support Vector Machines (SVMs), Generalized Additive Models (GAMs).
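The three reductions listed above can be sketched with pandas on a generic long-format table. Everything in this example is hypothetical: the ticker names, the feature names, the column layout, and the `shrink` helper are assumptions for illustration and do not describe how any particular platform stores its data.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
dates = pd.date_range("2023-01-02", periods=260, freq="B")   # daily observations
tickers = [f"T{i:02d}" for i in range(20)]                   # hypothetical universe
feature_cols = [f"f{i}" for i in range(50)]                  # hypothetical features

# One row per (date, ticker) pair, with 50 random feature columns.
keys = pd.MultiIndex.from_product(
    [dates, tickers], names=["date", "ticker"]
).to_frame(index=False)
values = pd.DataFrame(
    rng.normal(size=(len(keys), len(feature_cols))), columns=feature_cols
)
full = pd.concat([keys, values], axis=1)


def shrink(df, keep_tickers, start, keep_features):
    """Apply the three reductions: universe, period and frequency, features."""
    # Smaller universe and shorter period.
    out = df[df["ticker"].isin(keep_tickers) & (df["date"] >= start)]
    # Longer frequency: keep the last daily row of each week per ticker.
    out = out.assign(week=out["date"].dt.to_period("W")).sort_values("date")
    out = out.groupby(["ticker", "week"], as_index=False).last()
    # Fewer features.
    return out[["date", "ticker", *keep_features]]


small = shrink(full, tickers[:5], pd.Timestamp("2023-07-01"), feature_cols[:10])
print(full.shape, "->", small.shape)
```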