🤖
AI Wiki
Gradient PlatformDocsGet Started FreeContact Sales
  • Artificial Intelligence Wiki
  • Topics
    • Accuracy and Loss
    • Activation Function
    • AI Chips for Training and Inference
    • Artifacts
    • Artificial General Intelligence (AGI)
    • AUC (Area under the ROC Curve)
    • Automated Machine Learning (AutoML)
    • CI/CD for Machine Learning
    • Comparison of ML Frameworks
    • Confusion Matrix
    • Containers
    • Convergence
    • Convolutional Neural Network (CNN)
    • Datasets and Machine Learning
    • Data Science vs Machine Learning vs Deep Learning
    • Distributed Training (TensorFlow, MPI, & Horovod)
    • Generative Adversarial Network (GAN)
    • Epochs, Batch Size, & Iterations
    • ETL
    • Features, Feature Engineering, & Feature Stores
    • Gradient Boosting
    • Gradient Descent
    • Hyperparameter Optimization
    • Interpretability
    • Jupyter Notebooks
    • Kubernetes
    • Linear Regression
    • Logistic Regression
    • Long Short-Term Memory (LSTM)
    • Machine Learning Operations (MLOps)
    • Managing Machine Learning Models
    • ML Showcase
    • Metrics in Machine Learning
    • Machine Learning Models Explained
    • Model Deployment (Inference)
    • Model Drift & Decay
    • Model Training
    • MNIST
    • Overfitting vs Underfitting
    • Random Forest
    • Recurrent Neural Network (RNN)
    • Reproducibility in Machine Learning
    • REST and gRPC
    • Serverless ML: FaaS and Lambda
    • Synthetic Data
    • Structured vs Unstructured Data
    • Supervised, Unsupervised, & Reinforcement Learning
    • TensorBoard
    • Tensor Processing Unit (TPU)
    • Transfer Learning
    • Weights and Biases
Powered by GitBook
On this page
  • What is the right number of epochs?
  • What is the right batch size?

Was this helpful?

  1. Topics

Epochs, Batch Size, & Iterations

PreviousGenerative Adversarial Network (GAN)NextETL

Last updated 5 years ago

Was this helpful?

In most cases, it is not possible to feed all the training data into an algorithm in one pass. This is due to the size of the dataset and memory limitations of the compute instance used for training. There is some terminology required to better understand how data is best broken into smaller pieces.

An epoch elapses when an entire dataset is passed forward and backward through the neural network exactly one time. If the entire dataset cannot be passed into the algorithm at once, it must be divided into mini-batches. Batch size is the total number of training samples present in a single min-batch. An iteration is a single gradient update (update of the model's weights) during training. The number of iterations is equivalent to the number of batches needed to complete one epoch.

So if a dataset includes 1,000 images split into mini-batches of 100 images, it will take 10 iterations to complete a single epoch.

What is the right number of epochs?

During each pass through the network, the weights are updated and the curve goes from , to optimal, to . There is no magic rule for choosing the number of epochs — this is a that must be determined before training begins.

What is the right batch size?

Like the number of epochs, batch size is a with no magic rule of thumb. Choosing a batch size that is too small will introduce a high degree of variance (noisiness) within each batch as it is unlikely that a small sample is a good representation of the entire dataset. Conversely, if a batch size is too large, it may not fit in memory of the compute instance used for training and it will have the tendency to the data. It's important to note that batch size is influenced by other hyperparameters such as learning rate so the combination of these hyperparameters is as important as batch size itself.

A common heuristic for batch size is to use the square root of the size of the dataset. However this is a hotly debated topic.

underfitting
overfitting
hyperparameter
hyperparameter
overfit