AI Guardrails, Gateways, Governance Nightmares (and what we can do about it)

The reality of Generative AI is that it is fundamentally insecure-by-design. And as the popularity of different models and AI products have grown, we’ve seen a number of breaches and data leaks occur. Since these products significantly boost productivity in organizations (or, at the very least, have the promise to), both innovation and adoption of AI tools won’t be slowing down any time soon.

We’re seeing a tremendous shift in how enterprises are increasingly integrating AI into their development lifecycle and other operations, recognizing the undeniable productivity gains it offers.

As the ecosystem matures, the security industry has built solutions at every step of the way. Unfortunately, it’s also led to plenty of terminology confusion from guardrails to gateways.

Disambiguation: Understanding the AI security landscape

The ambiguity is likely because of several overlapping concepts that, while related, serve distinct purposes in the AI security stack.

Guardrails (Mechanism)
In the context of AI security, a guardrail refers to any mechanism that constrains or monitors AI system behavior to prevent harmful inputs/outputs or actions. Guardrails can be implemented at various levels. For example, from content filtering at the model output layer to behavioral constraints that prevent certain types of API calls.

Gateway (Solution)
A gateway in AI security acts as an intermediary point where security policies can be enforced before requests reach AI systems or external resources. Gateways operate at the network or application layer and can incorporate various security mechanisms, including guardrails, to provide comprehensive protection across AI interactions.

LLM Firewall (Solution)
An LLM firewall specifically focuses on protecting large language models from malicious inputs and preventing dangerous outputs. These systems typically operate at the HTTP API level, analyzing requests and responses to and from LLM services. LLM firewalls usually incorporate multiple guardrail mechanisms such as prompt injection detection and content filtering.

AI Guardrails - The Concept
As a broader concept, AI guardrails encompass any systematic approach to ensuring AI systems operate within acceptable boundaries. This includes technical controls, policy frameworks, and governance structures designed to prevent AI misuse.

AI Guardrails - The Product
In the commercial space, "AI Guardrails" often refers to specific products or services that provide pre-built security controls for AI deployments. These products will also offer features like prompt injection detection, content filtering, and compliance monitoring for AI applications. There’s significant overlap here with LLM Firewalls; the difference being that AI Guardrail products will cover more than protecting large language models. For example, they can also intercept the user’s interactions with online chats in the browser.

Note: for the purposes of this blog post, we will be referring to the concept of AI Guardrails and how they work with MCP Servers, unless specifically specified it’s an AI Guardrails product!

LLMs are insecure by design

Large language models operate by predicting the next token in a sequence. However, this prediction process is highly sensitive to how a user phrases their input, making LLM’s inherently vulnerable to manipulation through carefully crafted prompts. It's important to remember that the full context provided to the LLM is a combination of the application's prompt (with a specific purpose to achieve something) and the user's input (which can be malicious). This design creates substantial risk through two primary attack vectors: prompt injections and data leaks.

Prompt Injection Attacks occur when malicious users embed instructions within seemingly legitimate inputs, causing the LLM to ignore its original instructions and produce unintended outputs that can lead to harmful actions. An analogous example would be like social-engineering someone to fool them into giving you their bank account credentials. Unlike traditional injection attacks that target specific parsing vulnerabilities, prompt injections exploit the LLM's core function of following natural language instructions.

Data Leakage represents another critical vulnerability, as LLMs may inadvertently expose sensitive information from multiple sources: their training data, connected data sources like databases, APIs, and external systems they access during operation, user-provided context from history of conversations, or reveal details about their system prompts and internal processes.The probabilistic nature of token generation means that even well-intentioned models can be manipulated into revealing confidential information.

Courtesy: https://www.securityweek.com/google-gemini-tricked-into-showing-phishing-message-hidden-in-email/

LLM Firewalls: The first generation of AI security

LLM Firewalls were built specifically to address these types of vulnerabilities. They provide a focused security layer that operates on a much smaller scope compared to traditional network security solutions. Most importantly, the LLM-focused guardrails in these products only require HTTP API integration at the software level, making them relatively straightforward to deploy in existing infrastructure. Integration can also occur via a network proxy, depending on the vendor.

Traditional LLM firewalls excel in controlled environments where all AI interactions flow through well-defined API endpoints. However, LLM firewalls have significant limitations in the modern AI landscape. They were designed for the first generation of AI interactions which followed predictable request-response patterns. As AI systems became more autonomous (agents) and began interacting with external resources through protocols like Model Context Protocol (MCP), the limitations of HTTP-only security became apparent.

MCPs are insecure by design

The Model Context Protocol (MCP) enables AI systems to dynamically discover and interact with external services without requiring the pre-defined web APIs of a pre-AI era. This means that AI agents can act and make actual changes on your behalf with any MCP-compatible resource. For example, an AI agent might read from a CRM, correlate it with data from a website analytics tool, generate a report, and then email it to stakeholders, all through different MCP servers.

The inherent insecurity stems from the core logic of how MCP works, which relies on an LLM to determine tool selection and parameter passing (according to the user prompt or additional context, like that customer record from the CRM, which can also be malicious). Ultimately, this decision-making process for MCP is LLM-based.

While this is a game-changer for AI productivity, it also means that organizations have to do much more than securing simple request-response interactions between known endpoints. They now need to protect AI systems that can autonomously discover new services, establish trust relationships, and perform complex multi-step operations across distributed environments they've never seen before.

The Chaos of MCP Deployment
On top of that, the ability to let AI agents interact with external applications has caused the MCP builder ecosystem to explode with countless new MCP servers being added to GitHub every day. There are two issues that arise, and together they create a massive security and governance headache.

Both official and community-developed MCP servers introduce new supply chain vulnerabilities: Third-party implementations have varying security posture and capabilities. This doesn’t just pertain to community-developed servers, but official vendor ones as well.
Recent examples include vulnerabilities in both the GitHub MCP server and Anthropic's Slack MCP server, which exposed potential data leakage risks. (https://invariantlabs.ai/blog/mcp-github-vulnerability) (https://embracethered.com/blog/posts/2025/security-advisory-anthropic-slack-mcp-server-data-leakage/)

Flexible deployment options means the rise of “Shadow” MCP Servers: Developers can deploy Desktop MCP Servers or leverage Cloud-hosted ones without much IT oversight or security review.
Third-party Account Access is Highly Risky: Typically, MCP servers connect you to their respective cloud applications, necessitating credentials to access your account. These credentials are inherently sensitive, allowing an untrusted server to potentially compromise your account by stealing your data or inflicting damage. Not to mention, stealing the credentials themselves.

Desktop MCP Server Risks

Desktop MCP server deployments present unique security challenges. Developers often store API keys in plaintext configuration (mcp.json) files on local machines, creating shadow IT scenarios that endpoint detection systems may not flag as security risks. These implementations become supply chain vulnerabilities when malicious MCP servers can access and exfiltrate stored credentials from developer workstations.

As the ecosystem continues to expand, adversaries are also able to take advantage of the lack of protection and governance.

For example, Rug Pull Attacks, or plainly rogue MCP servers, represent a threat unique to MCP ecosystems. In these attacks, a malicious MCP server initially identifies itself correctly and behaves as expected to build trust with AI agents. Once trust is established, the server changes its behavior either by impersonating legitimate servers or altering its responses to steal data, inject malicious content, or redirect AI agents to perform unauthorized actions Because AI systems dynamically discover and trust MCP capabilities, they may not detect when a previously trusted server becomes malicious.

The challenge with rug pull attacks is that they exploit the trust relationship inherent in the MCP protocol. Unlike traditional APIs where security policies can be based on static endpoint analysis, MCP requires real-time behavioral analysis to detect when trusted servers begin acting maliciously.

MCP Gateways: The next evolution of AI security

MCP Gateways address the complexities of modern AI sprawl.

Protocol-Native Security
Unlike LLM firewalls that operate only at the HTTP layer, MCP Gateways understand the JSON-RPC protocol that MCP servers use to communicate. This allows them to analyze the semantic content of MCP requests and responses, not just the transport layer data.

Integrated LLM Guardrails
MCP Gateways also incorporate traditional LLM guardrail functionality because securing the MCP interaction is only part of the challenge. Organizations still need to block malicious prompt injections and prevent data leaks, but now these protections must work in the context of multi-touchpoint AI workflows. It is also required that such gateways can check for indirection prompt injections in malicious links or attachments, complicating their implementation.

Runtime Behavior Monitoring
MCP Gateways include monitoring server behavior over time to detect rug pull attacks, validating the security posture of newly discovered MCP servers, and maintaining access controls that work across distributed MCP deployments.

Policy Enforcement
MCP Gateways enable organizations to define and control which servers and tools are permitted or blocked for use, or even at the parameter level to be able to look for some patterns and decide what action to take. Security managers need to be able to block connecting AI workflows with specific MCP servers. This is crucial when data is highly sensitive or susceptible to exfiltration.

Resource Access Control
MCP Gateways must also manage resource scoping challenges. When an AI agent connects to external services like Google Drive or CRM systems, the authentication tokens may grant overly broad permissions. Organizations need granular control to limit AI agents to specific folders, data types, or operational scopes rather than full account access.

How AI guardrails and MCP gateways work together

To understand the comprehensive security approach needed for modern AI deployments, consider what happens when Steve the Developer is using Claude that's also configured to communicate with an MCP server.

In this scenario, Steve's prompt must first pass through an AI Guardrails product that analyzes the input for prompt injection attempts and policy violations. If the prompt passes these checks, Claude processes the request and determines that it needs to interact with an MCP server to complete the task.

The MCP Gateway then intercepts Claude's JSON-RPC communication with the MCP server, validates that the server is authorized and behaving correctly, and analyzes the requested operation for potential security issues. The Gateway might also apply additional guardrails to Claude's interaction with the MCP server, ensuring that sensitive data isn't inadvertently exposed or that the AI doesn't exceed its authorized scope of operations.

Finally, when Claude generates its response to Steve, the AI Guardrails analyzes the output to ensure that no sensitive information leaked from the MCP interaction, that malicious content wasn't injected by compromised MCP servers, and that the response complies with organizational policies.

Are just gateways enough?

In our humble opinion here at MCPTotal, no. As our customers also say:

“with LLM the security risk was acceptable, but not anymore!”.

Gateways do cover a lot more ground in the much larger attack surface created by both approved and rogue MCP deployments, but the principle of defense-in-depth still applies. AI Guardrails, whether it’s an integrated part of a Gateway or as a standalone product are still very much needed. More importantly, organizations also need solutions for the deployment and policy enforcement of AI agents.

At MCPTotal, that’s exactly what we’ve built. Our holistic strategy provides the following:

With MCPTotal's Hub:
A. Organizations can deploy MCP servers in a secure sandboxed environment. While they can deploy any server of their choosing, they can also select from secured versions of MCP servers (with the same capabilities) to reduce their supply-chain risk.

B. Equally important, the Hub provides centralized credential management, eliminating the need to distribute OAuth tokens and API keys across individual MCP server deployments. Instead of each server managing its own third-party service credentials, organizations can centrally provision, rotate, and monitor all access tokens from a single control plane, preventing credential sprawl and shadow IT risks.

C. MCP servers hosted in the hub undergo automatic static security code reviews and runtime monitoring to detect malicious behavioral activities such as data exfiltration.
MCPTotal’s Gateway helps protect AI workflows as they interact with internal and external MCP Servers, complete with real-time monitoring, detection, blocking, and auditing.
MCPTotal’s Gateway also provides governance functionality to continuously monitor and enforce policies custom to your organization.

You can sign up below for a free scan of your environment and get an understanding of any Shadow MCP deployments or supply chain risk!