Machine Learning Operations (MLOps)

Last updated 5 years ago

What Is MLOps?

Machine Learning Operations (MLOps) is a set of practices that provides determinism, scalability, agility, and governance across the model development and deployment pipeline. It spans the following areas.

Simplified model deployment - Data scientists use a variety of languages, frameworks, tools, and IDEs. With MLOps, ML teams can develop models using the interface, framework, and language that make the most sense for the task at hand. For example, transitioning from a prototype model in a Jupyter Notebook to a large-scale hyperparameter sweep should be trivial.
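As a rough illustration of the notebook-to-sweep transition, the sketch below expands a parameter grid and calls a training function once per combination. The `train` function, its parameters (`learning_rate`, `batch_size`), and its toy scoring formula are hypothetical stand-ins for a real training job; nothing here is a Gradient API.

```python
from itertools import product


def train(learning_rate: float, batch_size: int) -> float:
    """Hypothetical stand-in for a real training run; returns a validation score."""
    # Toy score that peaks at learning_rate=0.01 and batch_size=64.
    return 1.0 - abs(learning_rate - 0.01) * 10 - abs(batch_size - 64) / 1000


def sweep(grid: dict) -> list:
    """Run train() once per combination in the grid; best result first."""
    names = list(grid)
    results = []
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        results.append((train(**params), params))
    # Sort on the score only, since dicts are not orderable.
    return sorted(results, key=lambda item: item[0], reverse=True)


grid = {"learning_rate": [0.001, 0.01, 0.1], "batch_size": [32, 64]}
best_score, best_params = sweep(grid)[0]
print(best_params, round(best_score, 3))
```

In practice each call to `train` would be dispatched to its own worker or container rather than run in a loop, but the grid expansion stays the same.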

Accelerated model training - The machine learning training process is computationally complex and time-intensive. Data scientists need on-demand access to compute and storage resources so they can iterate faster in the training phase. With MLOps, the training phase is infrastructure-agnostic and scalable, and it minimizes complexity for the data scientist.

Purpose-built model monitoring - Tools and techniques used to monitor traditional software are not suitable for machine learning. MLOps provides purpose-built monitoring systems designed for machine learning. Key metrics must be tracked to gauge and compare the performance of deployed models. Real-time alerting on important signals such as model drift is also critical.
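One common way to turn "model drift" into an alertable signal is to compare the distribution of a feature (or of model scores) in production against a reference sample. The sketch below uses the Population Stability Index (PSI) for that comparison. The choice of PSI, the bin count, and the 0.2 alert threshold (a widely used rule of thumb) are assumptions made for illustration, not something this page prescribes.

```python
import math


def psi(expected: list, actual: list, bins: int = 10) -> float:
    """Population Stability Index between a reference sample and a live sample."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if the reference is constant

    def fractions(sample):
        counts = [0] * bins
        for x in sample:
            # Clamp values outside the reference range into the edge bins.
            idx = min(max(int((x - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # A small floor avoids log(0) for empty bins.
        return [max(c / len(sample), 1e-6) for c in counts]

    e, a = fractions(expected), fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


def drift_alert(expected: list, actual: list, threshold: float = 0.2) -> bool:
    """Return True when the live sample has drifted past the threshold."""
    return psi(expected, actual) > threshold


reference = [i / 100 for i in range(100)]        # roughly uniform on [0, 1)
shifted = [0.5 + i / 200 for i in range(100)]    # mass moved to the upper half
print(drift_alert(reference, reference), drift_alert(reference, shifted))
```

A monitoring system would run a check like this on a schedule per feature and per model, and route alerts to whoever owns the model.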

Model life cycle management - Models are not static assets. They are constantly retrained on new data and improved over time. Managing these updates becomes a burden as teams scale up and begin to juggle multiple models and contributors. MLOps provides a unified hub for tracking lineage and performance as new models are developed and rolled out to production.
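To make lineage and performance tracking concrete, here is a minimal in-memory sketch of a model registry. The `ModelRegistry` and `ModelVersion` classes, their fields, and the dataset names are all hypothetical; a production registry would persist this metadata and store the model artifacts alongside it.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ModelVersion:
    version: int
    dataset_id: str          # which data this version was trained on
    metrics: dict            # e.g. {"accuracy": 0.86}
    parent: Optional[int]    # version this one was retrained from, if any


@dataclass
class ModelRegistry:
    versions: dict = field(default_factory=dict)

    def register(self, dataset_id: str, metrics: dict,
                 parent: Optional[int] = None) -> ModelVersion:
        if parent is not None and parent not in self.versions:
            raise KeyError(f"unknown parent version {parent}")
        mv = ModelVersion(len(self.versions) + 1, dataset_id, metrics, parent)
        self.versions[mv.version] = mv
        return mv

    def lineage(self, version: int) -> list:
        """Walk parent links back to the original model, newest first."""
        chain = []
        current = version
        while current is not None:
            chain.append(current)
            current = self.versions[current].parent
        return chain

    def best(self, metric: str) -> ModelVersion:
        """Return the registered version with the highest value for a metric."""
        return max(self.versions.values(), key=lambda v: v.metrics[metric])


registry = ModelRegistry()
v1 = registry.register("sales-2023", {"accuracy": 0.81})
v2 = registry.register("sales-2024", {"accuracy": 0.86}, parent=v1.version)
v3 = registry.register("sales-2024-q2", {"accuracy": 0.84}, parent=v2.version)
print(registry.lineage(v3.version), registry.best("accuracy").version)
```

Recording the parent version and the training dataset together is what lets a team answer "where did this production model come from?" after several rounds of retraining.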

Model governance - Data, models, and other resources need to be tightly controlled to prevent undesirable changes and to ensure regulatory compliance where applicable. MLOps provides centralized access control, traceability, and audit logs to minimize that risk.
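The combination of access control and audit logging can be sketched as a single check that records every decision it makes. The role names, permission strings, and the `authorize` helper below are hypothetical and exist only to show the shape of the idea; a real system would back this with an identity provider and an append-only log store.

```python
from datetime import datetime, timezone

# Hypothetical role-to-permission mapping.
PERMISSIONS = {
    "data_scientist": {"read_model", "train_model"},
    "ml_admin": {"read_model", "train_model", "deploy_model"},
}

audit_log = []  # in-memory stand-in for an append-only audit store


def authorize(user: str, role: str, action: str) -> bool:
    """Check a permission and record the decision, whether allowed or denied."""
    allowed = action in PERMISSIONS.get(role, set())
    audit_log.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "role": role,
        "action": action,
        "allowed": allowed,
    })
    return allowed


authorize("ana", "data_scientist", "deploy_model")  # denied, but still logged
authorize("raj", "ml_admin", "deploy_model")        # allowed and logged
for entry in audit_log:
    print(entry["user"], entry["action"], entry["allowed"])
```

Logging denied attempts as well as successful ones is the part that supports traceability: the log shows who tried to change what, not just what changed.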

Multi-cloud

Enterprises are increasingly opting for a multi-cloud approach to leverage various resources across environments and decrease vendor lock-in. They require enhanced development capabilities with greater flexibility across multiple cloud and on-prem environments to suit their specific workflow needs.

Why Is MLOps Important?

Developing and deploying machine learning models is often a slow process. Tooling lacks automation, collaboration is hard, and workflows do not scale easily. Ultimately, the time it takes to move from concept to production and deliver business value is a major hurdle in the industry. That's why teams need MLOps practices designed to standardize and streamline the lifecycle of ML in production.

An Analogy

DevOps as a practice ensures that the software development and IT operations lifecycle is efficient, well documented, scalable, and easy to troubleshoot. MLOps incorporates these practices to deliver machine learning applications and services at high velocity. This new paradigm is useful as a way for enterprises to overcome the many challenges of training and deploying models into production.

MLOps + Gradient

Gradient abstracts infrastructure and provides robust reproducibility and determinism. For the first time, data scientists can operate without a dependence on SREs or software teams to handle tasks like managing a Kubernetes cluster, ingesting data, and pipelining. When ML teams can operate with full autonomy and own the entire stack, they are much more efficient and agile.

Gradient from Paperspace offers agile ML tooling and methodology across multi-cloud, on-premise, and hybrid environments, without the need for DevOps or any manual configuration.
