Introducing Large Language Models to Traditional Machine Learning Operations

Machine learning (ML) operations (MLOps) is the set of practices, tools, and associated culture that bridges the gap between building ML models and running them reliably in production. MLOps is as much about science and engineering as it is about systems thinking. Some of the learnings in this article come from a session at GTC focused on explaining why MLOps is needed, and what changes in MLOps when generative technologies are introduced.

A traditional MLOps approach starts with problem definition and culminates with continuous monitoring and validation, like most (good) product development lifecycles.

Why does MLOps matter?

Let’s start with why this matters. Especially in financial services, we have to be able to answer at least four key questions:

  1. What is this model doing and where is it in production?
  2. What was it trained and validated on, and is production still consistent with that?
  3. Who approved it, who owns it now, and what changed since approval?
  4. Is it still performing safely, fairly, and within policy, and what happens if it does not?

MLOps is at the heart of many of these questions. Model management is not new to us in mortgage and has significantly evolved from the early days of automated underwriting and securitization. With the financial crisis of 2008, model risk became increasingly central to fair lending, risk assumptions, and stress testing. The introduction of generative models into our ecosystem only increases the importance of good operational practices around this new kind of ML system, hence the idea of MLOps as central to the new AI-powered mortgage ecosystem.

Famous Examples of MLOps Gone Wrong

There are many examples of MLOps "gone wrong", or cases where an engineering operations focus could have created better outcomes. A few examples were mentioned at the NVIDIA conference.

Failure Through an MLOps Lens

It was helpful for me to see this diagram explained by serious MLOps engineers in the context of those spectacular failures.

Looking at a robust MLOps framework like the one used at NVIDIA, we can pinpoint the failure points.

How MLOps Changes with Generative Models


Generative models are simply a new and more complicated kind of model to manage. Even a simple retrieval augmented generation (RAG) bot is actually a relatively complicated MLOps system.

We still start with problem definition; that hasn't changed. And we still have to make our data "ready" for use in retrieval. This data, which will be used to enrich and further contextualize the base knowledge of the large language model, has to be converted to vector embeddings, which requires chunking, tokenization, and use of an embedding model. All steps in an MLOps pipeline flow. We should probably also store the original natural language data, yet another thing to manage. And don't forget change: how we govern the process of updating our embeddings is kind of a big deal. More MLOps.

We have this new problem of prompt management and evaluation, which requires gold-standard data sets with both the questions and the answers (also called QA pairs, for question and answer). It also requires us to have a good natural language versioning solution, again because the problem of change and optimization is a big deal. Prompts are a significant asset to understand, version, evaluate, test, optimize, and control.
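A prompt evaluation harness against a golden QA set can be sketched like this. Everything here is illustrative: the golden pairs, the canned model responses, and the keyword-containment scoring are assumptions; a real suite would call the actual LLM and use richer metrics (semantic similarity, LLM-as-judge, and so on).

```python
def build_prompt(template: str, question: str) -> str:
    """Fill a versioned prompt template with a user question."""
    return template.format(question=question)

def evaluate_prompt(template, qa_pairs, generate) -> float:
    """Score a prompt template against a golden QA set.

    `generate` stands in for a call to the LLM. Scoring here is a simple
    keyword-containment check; production suites would use richer metrics."""
    hits = 0
    for question, expected_keyword in qa_pairs:
        answer = generate(build_prompt(template, question))
        if expected_keyword.lower() in answer.lower():
            hits += 1
    return hits / len(qa_pairs)

# Hypothetical golden set: (question, keyword the answer must contain).
golden = [
    ("What is the max DTI for this program?", "43"),
    ("Is an appraisal required?", "appraisal"),
]

# A canned 'model' so the sketch runs without an API key.
canned = {
    "What is the max DTI for this program?": "The maximum DTI is 43 percent.",
    "Is an appraisal required?": "Yes, an appraisal is required.",
}
score = evaluate_prompt("Answer concisely: {question}",
                        golden,
                        lambda p: next(v for k, v in canned.items() if k in p))
```

Running this harness on every prompt revision, and versioning the template string itself, is what turns prompts into the managed, testable asset described above.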

And of course model management: not exactly new in a generative scenario, but perhaps more fluid. Even if we are not training foundation models, we still have to preserve optionality, adapt the behavior of the model (perhaps going so far as to fine-tune it), and customize how it behaves under real workloads and in the real world. More work at the intersections of prompting, data, and evaluation. Ensuring the model performs well on edge cases, in unanticipated scenarios, and over time (detecting and addressing model drift) would all be the purview of MLOps.
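One common way to detect the drift mentioned above is the population stability index (PSI), which compares the distribution of scores (or any monitored signal) at training time against what the model sees in production. A minimal sketch, with made-up score data:

```python
import math

def psi(expected: list[float], actual: list[float], bins: int = 10) -> float:
    """Population Stability Index between a baseline (training-time) and a
    production distribution. Common rule of thumb: < 0.1 stable,
    0.1-0.25 watch, > 0.25 significant drift."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0

    def histogram(values: list[float]) -> list[float]:
        counts = [0] * bins
        for v in values:
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # Smooth empty bins so the logarithm is always defined.
        return [(c + 1e-6) / (len(values) + bins * 1e-6) for c in counts]

    e, a = histogram(expected), histogram(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

train_scores = [i / 100 for i in range(100)]        # uniform at training time
prod_scores = [0.8 + i / 500 for i in range(100)]   # production shifted high
drift = psi(train_scores, prod_scores)
```

Scheduling a check like this against live traffic, and alerting when the index crosses a threshold, is exactly the kind of continuous monitoring the MLOps lifecycle calls for.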

Although not expressly called out in our diagram above, latency management seems more significant with the addition of generative technologies. Users have come to expect instant answers, regardless of how complicated the request is. We have to think about when to offload workloads to an async process, when to use streaming so users can at least see the answer as it builds, and when to parallelize.
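The two patterns above, streaming and async offload, can be sketched with Python's `asyncio`. The token generator is a stand-in for a streaming LLM call; the token text and delays are invented for illustration.

```python
import asyncio

async def generate_tokens(prompt: str):
    """Stand-in for a streaming LLM call: yields tokens with simulated latency."""
    for token in ["Your", "loan", "estimate", "is", "ready."]:
        await asyncio.sleep(0.01)  # simulated network / inference delay
        yield token

async def stream_answer(prompt: str) -> list[str]:
    """Streaming: surface each token as it arrives so the user sees progress."""
    shown = []
    async for token in generate_tokens(prompt):
        shown.append(token)  # in a real UI, render this token immediately
    return shown

async def offload_answer(prompt: str) -> str:
    """Async offload: run the long generation as a background task so the
    caller stays free to handle other work, then await the result."""
    task = asyncio.create_task(stream_answer(prompt))
    # ... the caller could serve other requests here while tokens arrive ...
    return " ".join(await task)

answer = asyncio.run(offload_answer("Summarize my loan terms"))
```

Parallelization follows the same shape: launch several `create_task` calls (for example, one per retrieval source) and `asyncio.gather` the results.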

Finally, guardrails: again not exactly new to traditional MLOps, but acutely important for understanding, controlling, and evidencing an AI system. Proving the responsible implementation, demonstrating safety, evidencing harm prevention - all essential to managing a good AI system, and all the purview of MLOps if you want it done well.
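A guardrail check that produces an auditable verdict might look like the sketch below. The deny-list phrases and the single PII pattern are hypothetical placeholders; a production system would use a dedicated guardrails framework and policy-approved rule sets, and would log every verdict as evidence.

```python
import re

# Hypothetical deny-list and PII pattern for illustration only.
DENY_TOPICS = ["guaranteed approval", "skip the appraisal"]
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def check_output(text: str) -> dict:
    """Screen a model response and return an auditable verdict, the kind of
    evidence trail needed to demonstrate safety and harm prevention."""
    violations = []
    for phrase in DENY_TOPICS:
        if phrase in text.lower():
            violations.append(f"deny-list: {phrase}")
    if SSN_PATTERN.search(text):
        violations.append("pii: possible SSN in output")
    return {"allowed": not violations, "violations": violations}

verdict = check_output("You have guaranteed approval! SSN on file: 123-45-6789.")
```

Storing each verdict (input, output, rules fired, timestamp) gives you the "evidencing" half of the story, not just blocking bad outputs but proving you did.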

By Tela Mathias, CTO & Chief Nerd and Mad Scientist at PhoenixTeam
