This Time it's Different: Why GenAI Demands a New Playbook

Tiemen Vanderstraeten

03 Mar 26

6min read

AI & Data Science
Data Engineering

In late 2022, Air Canada deployed a generative AI chatbot on its website to help customers with flight information. It seemed like a straightforward upgrade to customer service. But months later, a grieving customer asked the chatbot about bereavement fares. Instead of retrieving the actual policy, the chatbot confidently invented a completely new one, assuring the customer they could book a regular ticket and claim a refund later.

When the airline refused to honor the fictional refund, the customer took them to small claims court. Air Canada argued that the chatbot was a "separate legal entity" responsible for its own actions. The tribunal disagreed, holding the airline liable for the chatbot's "confident lie" and forcing them to pay damages.

This highly publicized incident perfectly highlights a dangerous misconception in modern tech: treating a generative AI system like traditional software. A traditional bug crashes an app; a generative AI bug invents policies, alters your brand voice, or exposes you to legal liability.

This is where LLMOps comes in.

What is LLMOps?

Operationalizing GenAI, commonly referred to as LLMOps, is the set of operational disciplines needed to run LLM-enabled applications reliably in production.

The key point is that the operational “unit” is rarely the model alone. It is the application behavior that emerges from the model plus prompts, context, retrieval, tools, policies, and user interaction.

The shift from deterministic software to probabilistic generation forces a fundamental rethink of how we operationalize systems. With GenAI, production behavior is no longer defined only by code paths and configurations. This requires us to adapt MLOps principles, making LLMOps an extension rather than a simple renaming.

Comparing the Operational Paradigms:

Why MLOps is Insufficient on its Own

MLOps was designed around predictive systems. In those systems, performance can be validated against ground truth using numeric metrics and relatively stable input schemas. With LLM systems, the landscape shifts entirely:

  • The Data Gap: The emphasis shifts from training pipelines to inference-time context (like retrieval/RAG) that directly shapes outputs.
  • The Evaluation Gap: Many GenAI tasks don’t have a single “correct” output. Quality is often linguistic and context-sensitive rather than purely numeric.
  • Component Complexity: The deployable unit is frequently a chain or graph consisting of prompts, retrieval, policies, and tools. When a failure happens, it is often semantic or policy-related rather than a simple model crash.

Stepping back, the most important conceptual difference is this: LLMOps operationalizes behavior that is co-authored by code and language. Fitting GenAI into a purely MLOps-shaped frame often creates blind spots in visibility and quality control.

Why It Is Needed: Do Not Ship Blindly

Using GenAI without these operational rules is like releasing a powerful tool that can make up facts and be tricked by users into a risky environment. Standard software updates simply don't cover these unique risks:

  • Risk of "Confident Lies": AI models are designed to generate answers even when they are unsure. "Hallucinations" aren't rare glitches; they are predictable patterns of how these systems behave.
  • Words Become Weapons: Unlike normal software, GenAI can be hacked using plain language. Bad actors can hide "instructions" inside content to trick the AI into doing things it shouldn't, creating new security flaws like "prompt injection."
  • Reliance on Others: Many apps rely on AI models controlled by outside vendors. If the vendor changes the model, your app's behavior might change unexpectedly. Operations must manage this dependency.
  • Governance is Harder: Managing risks isn't just a technical job anymore. It involves legal and design teams because the AI creates new content that can expose the company to liability (as seen in the Air Canada case).

The Challenges: Why LLMOps is Hard in Practice

Managing GenAI combines uncertainty, security threats, and hard-to-measure quality into one complex problem. The main challenge is that GenAI blurs the lines between strict computer code and human conversation.

  • Defining "Good" is Subjective: In traditional software, a program either works or crashes. With GenAI, an answer is judged on “soft” qualities like helpfulness, tone, or safety. We have to monitor multiple complex traits at once.
  • Inconsistency: The system plays a guessing game. Even with the exact same input, the AI might give a different output later. Furthermore, the “input” includes background data and conversational history, making it incredibly hard to reproduce errors.
  • Hard to Blame the Right Component: When the system fails, it is often unclear why. Was the prompt confusing? Was the background data bad? Did the AI model just get confused? Fixing problems is much harder than fixing standard code bugs.
  • Privacy and Security in Real-Time: These systems process sensitive business data instantly. Ensuring the AI doesn't accidentally reveal private information or follow malicious hidden instructions makes compliance an ongoing, real-time problem.
  • Unpredictable Costs: The cost of running these systems can change rapidly. Small changes to the product can lead to huge spikes in the amount of text the AI reads or writes, driving up the price.

Transitioning from experimental AI to reliable, production-ready systems requires a fundamental shift in how you build, monitor, and govern your applications. Implementing effective LLMOps is the key to ensuring your generative systems remain safe, grounded, and aligned with your business goals rather than becoming costly, unpredictable liabilities. If you are ready to move beyond the prototype phase and need expert guidance to safely operationalize your GenAI initiatives, reach out to the team at Datashift today to build a strategy that protects your brand and drives real value.

Subscribe to our newsletter

Read, learn, adapt, grow.