Hyperparameter Optimization

Introduction

Hyperparameter optimization (sometimes called hyperparameter search, sweep, or tuning) is the process of tuning a model's hyperparameters to improve its final accuracy.

Common hyperparameters include the number of hidden layers, learning rate, activation function, and number of epochs. There are various methods for searching the possible combinations for the best outcome. Examples include grid search, random search, and Bayesian methods.
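
To make the difference between these strategies concrete, here is a minimal sketch, assuming scikit-learn and a random forest classifier as the model being tuned (both choices are illustrative):

```python
# A minimal sketch of grid search vs. random search, assuming scikit-learn
# and a random forest classifier as the model being tuned (both are
# illustrative choices).
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

X, y = make_classification(n_samples=500, random_state=0)  # toy data

# Grid search: every combination of the listed values is evaluated.
grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100, 200], "max_depth": [3, 5, 10]},
    cv=3,
)
grid.fit(X, y)

# Random search: a fixed budget of 10 candidates drawn from distributions
# over the same space.
rand = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions={"n_estimators": randint(50, 300), "max_depth": randint(3, 12)},
    n_iter=10,
    cv=3,
    random_state=0,
)
rand.fit(X, y)

print("grid best:", grid.best_params_)
print("random best:", rand.best_params_)
```

Grid search evaluates every listed combination, while random search draws a fixed budget of samples from distributions, which tends to scale better as the number of hyperparameters grows.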

What is a Hyperparameter?

A model hyperparameter is a configuration that is external to the model and whose value cannot be estimated from the data. Since it is not possible to know the best value for a hyperparameter on a given problem in advance, the optimization process must try many possible combinations. We may use rules of thumb, copy values that worked on other problems, or search for the best value by trial and error.
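
As a small illustration, here is a minimal sketch in plain NumPy, with made-up data, of the distinction: the learning rate and number of epochs are hyperparameters chosen before training, while the weights are parameters learned from the data.

```python
# A minimal sketch, in plain NumPy with made-up data, of the distinction between
# hyperparameters (chosen before training) and parameters (learned from data).
import numpy as np

def train_logistic_regression(X, y, learning_rate=0.1, epochs=100):
    # learning_rate and epochs are hyperparameters: set by hand or by a search,
    # never estimated from the training data itself.
    w = np.zeros(X.shape[1])  # w and b are model parameters: learned from the data
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))       # predictions
        w -= learning_rate * X.T @ (p - y) / len(y)  # gradient step on the weights
        b -= learning_rate * np.mean(p - y)
    return w, b

# Trial and error over candidate hyperparameter values (toy data):
rng = np.random.default_rng(0)
X, y = rng.normal(size=(200, 3)), rng.integers(0, 2, 200)
for lr in (0.01, 0.1, 1.0):
    w, b = train_logistic_regression(X, y, learning_rate=lr, epochs=200)
```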

Hyperparameter Tuning with Hyperopt

Hyperopt is a Python library for searching through a hyperparameter space. For example, it can use the Tree-structured Parzen Estimator (TPE) algorithm, which explores the search space intelligently while narrowing in on the best estimated parameters.
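
A minimal Hyperopt sketch might look like the following; the objective function here is only a stand-in, and in practice it would train a model with the sampled hyperparameters and return its validation loss:

```python
# A minimal Hyperopt sketch using TPE. The objective below is a stand-in:
# in practice it would train a model with the sampled hyperparameters and
# return its validation loss.
from hyperopt import Trials, fmin, hp, tpe

space = {
    "learning_rate": hp.loguniform("learning_rate", -7, 0),  # roughly 1e-3 to 1
    "num_hidden": hp.quniform("num_hidden", 16, 256, 16),    # hidden units, steps of 16
}

def objective(params):
    # Replace with: build and train a model using params, return validation loss.
    return (params["learning_rate"] - 0.01) ** 2 + 1e-5 * (params["num_hidden"] - 64) ** 2

trials = Trials()
best = fmin(fn=objective, space=space, algo=tpe.suggest, max_evals=50, trials=trials)
print(best)
```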

This makes it a good tool for meta-optimizing a neural network: the network's weights are themselves tuned with gradient descent, but its hyperparameters cannot be optimized that way, so they require a separate search.

That's where Hyperopt shines -- it's useful not only for tuning hyperparameters like learning rate, but also for tuning more sophisticated parameters in a flexible way. Hyperopt can change the number of layers of different types, the number of neurons in one layer or another, or even the type of layer to use at a certain place in the network given an array of choices -- each of which may have nested, tunable hyperparameters.
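
For example, a nested search space can be expressed with hp.choice; the layer counts, unit ranges, and activation choices below are illustrative:

```python
# A sketch of a nested Hyperopt search space (layer counts, unit ranges, and
# activation choices are illustrative). Each branch of hp.choice carries its
# own tunable hyperparameters.
from hyperopt import hp

space = hp.choice("architecture", [
    {
        "n_layers": 1,
        "layer_1_units": hp.quniform("l1_units_a", 32, 512, 32),
    },
    {
        "n_layers": 2,
        "layer_1_units": hp.quniform("l1_units_b", 32, 512, 32),
        "layer_2_units": hp.quniform("l2_units_b", 16, 256, 16),
        "layer_2_activation": hp.choice("l2_act_b", ["relu", "tanh"]),
    },
])
```

Passing a space like this to fmin lets the optimizer decide both the depth of the network and the per-layer settings within whichever branch it samples.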

It is more efficient to sample values at random and intelligently narrow the search space than to loop over fixed sets of hyperparameter values. This kind of oriented random search is Hyperopt's strength, as opposed to a simpler grid search, where hyperparameters are pre-established with fixed-step increases. Random search has proven to be such an effective technique that the paper describing it, "Random Search for Hyper-Parameter Optimization" (Bergstra & Bengio, 2012), is among the most cited of all deep learning papers.

If you want to learn more about Hyperopt, you'll probably want to watch the video below, made by the creator of Hyperopt:

Hyperparameter Optimization + Gradient

Gradient offers powerful hyperparameter tuning out of the box, something that is very difficult to implement on your own. At a bare minimum, you need a mechanism to orchestrate serial and parallel training runs, a central data repository to sync results to, and a way to measure and explore the output. Gradient uses TensorBoard for model comparison and Hyperopt on the backend.
