Random Forest

Random forests are an ensemble learning technique that combines multiple decision trees into a single model, the forest, whose aggregated predictions are more accurate and stable than those of any individual tree.
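For concreteness, here is a minimal sketch of fitting a random forest with scikit-learn's RandomForestClassifier. The synthetic dataset, hyperparameters, and variable names are illustrative assumptions rather than anything prescribed by this article.

```python
# Minimal sketch (assumes scikit-learn is installed); the dataset and
# hyperparameters here are illustrative, not prescribed by this article.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic binary classification data, for illustration only.
X, y = make_classification(n_samples=2000, n_features=20,
                           n_informative=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A forest of 200 decision trees, each fit on a bootstrap sample.
forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X_train, y_train)
print("test accuracy:", forest.score(X_test, y_test))
```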

Random forests operate on the principle that a large number of trees acting as a committee (forming a strong learner) will outperform any single constituent tree (a weak learner). This is akin to the statistical requirement that a sample be large enough to be significant. Some individual trees may be wrong, but as long as they are not making completely random predictions, their aggregate will form a good approximation of the underlying data.
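A rough way to see this committee effect is to compute how often a majority vote of independent voters is correct when each voter is only somewhat better than chance. The 60% accuracy and the independence assumption below are illustrative; trees in a real forest are partially correlated, so the gain is smaller in practice.

```python
# Probability that a strict majority of n independent voters, each correct
# with probability p, votes for the right answer (a binomial tail sum).
from math import comb

def majority_correct(n: int, p: float) -> float:
    needed = n // 2 + 1  # votes required for a strict majority (n odd)
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(needed, n + 1))

print(majority_correct(1, 0.6))    # one weak learner:      0.60
print(majority_correct(101, 0.6))  # committee of 101: about 0.98
```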

Bagging (bootstrap aggregating) is the algorithmic technique used in random forests. This, we may recall, differs from the gradient boosting technique. Bagging trains each decision tree on a random bootstrap sample of the dataset to reduce correlation between the trees. A benefit of bagging over boosting is that bagging can be performed in parallel, while boosting is a sequential operation.
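The following sketch bags decision trees by hand to make the mechanics explicit: each tree is fit on a bootstrap sample drawn with replacement, and predictions are combined by majority vote. The function names are hypothetical and scikit-learn's DecisionTreeClassifier is assumed; because the loop iterations are independent, they could also be dispatched in parallel, which is exactly why bagging parallelizes while boosting does not.

```python
# Hand-rolled bagging sketch; names are illustrative, and X, y are assumed
# to be NumPy arrays. A real random forest additionally subsamples the
# features considered at each split to further decorrelate the trees.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagged_trees(X, y, n_trees=50, seed=0):
    rng = np.random.default_rng(seed)
    n = len(X)
    trees = []
    for _ in range(n_trees):                    # independent iterations: parallelizable
        idx = rng.integers(0, n, size=n)        # bootstrap sample, drawn with replacement
        trees.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return trees

def majority_vote(trees, X):
    votes = np.stack([t.predict(X) for t in trees])   # shape: (n_trees, n_samples)
    return (votes.mean(axis=0) >= 0.5).astype(int)    # majority vote, binary labels assumed
```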

Individual decision trees are prone to overfitting and tend to learn the noise in the dataset. Random forests average the predictions of many trees, so as long as the individual decision trees are not highly correlated, this strategy reduces overfitting and sensitivity to noise in the dataset.
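To see this variance reduction in practice, one can compare a single fully grown tree against a forest on held-out data; the single tree typically memorizes the training set (near-perfect training accuracy) while generalizing worse than the forest. The dataset and settings below are assumptions for illustration.

```python
# Illustrative comparison of a single decision tree vs. a random forest
# on noisy synthetic data; exact numbers will vary with the settings.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, n_informative=8,
                           flip_y=0.1, random_state=1)   # flip_y adds label noise
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

for name, model in [("single tree", DecisionTreeClassifier(random_state=1)),
                    ("random forest", RandomForestClassifier(n_estimators=200, random_state=1))]:
    model.fit(X_train, y_train)
    print(f"{name:13s} train={model.score(X_train, y_train):.3f} "
          f"test={model.score(X_test, y_test):.3f}")
```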

Image sources: TIBCO and CitizenNet.