Long Short-Term Memory (LSTM)

We recommend reading the RNN article before diving into LSTMs.

Long Short-Term Memory networks (LSTMs) are a special kind of recurrent neural network (RNN) capable of learning long-term dependencies in sequence data. LSTMs were created to overcome the inability of standard RNNs to retain information over long periods of time.
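
As a concrete illustration, the sketch below builds a small LSTM-based sequence classifier with tf.keras. The vocabulary size, layer widths, number of classes, and toy data are illustrative assumptions, not part of the original article.

```python
# Minimal sketch: an LSTM sequence classifier with tf.keras.
# Vocabulary size, embedding width, and class count are assumed for the example.
import numpy as np
import tensorflow as tf

vocab_size, embed_dim, num_classes = 1000, 32, 2

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, embed_dim),        # map token ids to vectors
    tf.keras.layers.LSTM(64),                                 # summarize the whole sequence
    tf.keras.layers.Dense(num_classes, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Toy data: 256 random sequences of length 20 with random binary labels.
x = np.random.randint(0, vocab_size, size=(256, 20))
y = np.random.randint(0, num_classes, size=(256,))
model.fit(x, y, epochs=2, batch_size=32)
```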

The following image illustrates the structure of a vanilla RNN:

Each LSTM cell contains four interacting layers, three sigmoid gates and a tanh layer, which decide what pertinent information to store, update, and pass along to later steps in the sequence.

The horizontal line running across the cells illustrates the cell state, a channel that lets information flow along the entire chain. The cell state is updated only when information is deemed pertinent, a decision regulated by the gates in each cell. With this refinement over plain RNNs, LSTMs are explicitly designed to capture, retain, and reuse key information over long sequences.
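
To make the gating mechanism concrete, here is a sketch of a single LSTM cell step in NumPy. It follows the standard formulation (forget, input, and output gates with sigmoid activations, plus a tanh candidate layer); the weight shapes, variable names, and toy sizes are assumptions made for illustration.

```python
# Sketch of one LSTM cell step in NumPy (standard formulation; names are illustrative).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One time step. W, U, b hold the parameters of the four layers,
    stacked as [forget, input, candidate, output]."""
    hidden = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b                  # all four pre-activations at once
    f = sigmoid(z[0:hidden])                      # forget gate: what to erase from the cell state
    i = sigmoid(z[hidden:2 * hidden])             # input gate: what new information to admit
    g = np.tanh(z[2 * hidden:3 * hidden])         # candidate values to be added
    o = sigmoid(z[3 * hidden:4 * hidden])         # output gate: what to expose as the hidden state
    c_t = f * c_prev + i * g                      # updated cell state (the horizontal channel)
    h_t = o * np.tanh(c_t)                        # new hidden state
    return h_t, c_t

# Toy usage with assumed sizes.
inp, hid = 8, 16
rng = np.random.default_rng(0)
W = rng.normal(size=(4 * hid, inp))
U = rng.normal(size=(4 * hid, hid))
b = np.zeros(4 * hid)
h, c = np.zeros(hid), np.zeros(hid)
h, c = lstm_step(rng.normal(size=inp), h, c, W, U, b)
```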

Source: Christopher Olah