
MCP Security Top 10 - Part 11: Overreliance on AI
This is the eleventh article in a series about the top 10 security risks associated with the Model Context Protocol (MCP). This post focuses on Overreliance on AI, a critical vulnerability where users place excessive trust in AI systems with tool access, potentially leading to unreviewed destructive actions, overlooked errors, and diminished human oversight.
Introduction
The Model Context Protocol (MCP) transforms AI systems from conversational assistants into agents capable of taking real-world actions through integrated tools. While this capability dramatically enhances productivity, it also introduces a significant psychological and security risk: as users watch AI systems successfully perform complex tasks, they tend to develop excessive trust in the AI's judgment and may gradually abdicate their critical oversight responsibilities (see Bruce Schneier's blog post "The Fallacy of AI Trust").
Understanding the Psychology of AI Overreliance
Overreliance on AI is a complex psychological phenomenon with several contributing factors:
Automation Bias
Research has consistently demonstrated that humans exhibit a cognitive bias toward accepting computer-generated recommendations, even when those recommendations contradict their own judgment or observable evidence. This "automation bias" becomes even more pronounced with advanced AI systems that can:
- Communicate fluently and persuasively in natural language
- Present information with apparent confidence and authority
- Provide detailed justifications for their recommendations
- Successfully perform complex tasks that previously required human expertise

The Authority Effect of Tool Access
MCP significantly amplifies automation bias by granting AI systems access to tools that:
- Increase Perceived Competence: Users observe the AI performing real actions (sending emails, querying databases, modifying files), which dramatically increases their perception of the AI's competence
- Create Functional Authority: The AI's ability to access and manipulate systems creates a sense of authority: "if it has access to these systems, it must be trusted"
- Reduce Verification Impulse: As users see the AI successfully using tools, they become less likely to verify its actions and recommendations over time
MCP Security Top 10 Series
This article is part of a comprehensive series examining the top 10 security risks when using MCP with AI agents:
- MCP Security Top 10 Series: Introduction & Index
- MCP Overview
- Over-Privileged Access
- Prompt Injection Attacks
- Malicious MCP Servers
- Unvalidated Tool Responses
- Command Injection
- Resource Exhaustion
- Cross-Context Data Leakage
- MITM Attacks
- Social Engineering
- Overreliance on AI (this article)
Real-World Examples and Consequences
Several documented incidents highlight the dangers of overreliance on AI systems with tool access:
1. The GPS Navigation Incidents
While not directly MCP-related, GPS navigation systems provide a clear parallel. Numerous incidents have occurred where drivers followed GPS directions into dangerous situations (driving into bodies of water, following non-existent roads) despite clear visual evidence contradicting the system's guidance. As one researcher noted, "When the system tells them to turn right, they turn right, even if the road goes into a lake."
With MCP, this same dynamic could lead to users approving destructive actions proposed by an AI without adequate review.
2. Financial Decision Support Systems
In financial trading, algorithmic systems have caused significant losses when human operators failed to adequately scrutinize their recommendations. The 2010 "Flash Crash" and similar incidents were exacerbated by traders' overreliance on automated systems. As AI agents gain direct access to financial systems through MCP, the potential for similar incidents increases.
3. Data Deletion Incidents
In early testing environments, researchers have documented cases where users approved AI recommendations to delete "unnecessary" data without thoroughly reviewing what would be deleted, resulting in the loss of important information. The AI's confident presentation of the recommendation ("These files are clearly redundant") combined with its ability to execute the deletion created a dangerous scenario.
4. Code Generation Overreliance
Developers using AI-assisted coding tools have reported deploying code generated by AI without proper review, assuming its correctness because "the AI has access to all these libraries and documentation." This has led to the introduction of security vulnerabilities and logical errors that human review would have caught.
MCP-Specific Risks of Overreliance
The Model Context Protocol introduces unique overreliance risks due to its tool integration capabilities:
1. Deferred Responsibility
Users may develop a mental model where responsibility for outcomes is shifted to the AI:
User: "Clean up the old log files."
AI: "I'll use the file system tool to remove log files older than 30 days."
[AI proceeds to delete critical system files that were mistakenly identified as logs]
Without clear agency boundaries and responsibility models, users may not adequately verify what the AI is doing before approving actions.
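One practical guard against deferred responsibility is to force a dry run: before a destructive tool call is approved, the user sees the concrete items that would be affected rather than the AI's summary of them. The sketch below is a minimal illustration of that idea for the log-cleanup example; the directory path, file pattern, and 30-day cutoff are assumptions for the example, not part of MCP.
import time
from pathlib import Path

def preview_log_cleanup(root, max_age_days=30):
    """Dry run: list the files a cleanup would delete, without deleting anything."""
    root_path = Path(root)
    if not root_path.is_dir():
        return []
    cutoff = time.time() - max_age_days * 86400
    return [
        str(path)
        for path in root_path.rglob("*.log")
        if path.is_file() and path.stat().st_mtime < cutoff
    ]

# The approval prompt shows the concrete file list, not just the AI's summary of it.
candidates = preview_log_cleanup("/var/log/myapp")  # placeholder path
print(f"The AI proposes deleting {len(candidates)} files:")
for name in candidates[:20]:  # show a bounded sample for human review
    print("  ", name)
# Only after a human reviews this list would the real deletion tool be invoked.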
2. Expertise Atrophy
As AI systems handle increasingly complex tasks, users may stop developing their own expertise:
User: "Configure our database for optimal performance."
AI: "I'll modify these 15 database parameters to optimize for your workload."
[User approves complex changes without understanding them]
This creates dangerous knowledge gaps where no human in the organization fully understands critical systems.
3. Confirmation Fatigue
When AI systems require human confirmation for every action, users may develop "confirmation fatigue" and start automatically approving actions without review:
AI: "I need to modify 200 files to update the API version. Approve?"
User: [Clicks "Approve" without reviewing the specific changes]
This pattern effectively negates the security benefit of human-in-the-loop designs.
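One way to reduce confirmation fatigue without removing the human from the loop is to replace hundreds of individual prompts with a single, information-rich approval: an aggregate summary plus a random sample of full diffs for spot-checking. The sketch below assumes each proposed change is represented as a dict with hypothetical "path", "kind", and "diff" fields.
import random
from collections import Counter

def summarize_bulk_change(changes, sample_size=5):
    """Collapse a large batch of proposed changes into one reviewable summary."""
    breakdown = Counter(change["kind"] for change in changes)  # e.g. "modified", "deleted"
    sample = random.sample(changes, min(sample_size, len(changes)))
    return {
        "total_files": len(changes),
        "breakdown": dict(breakdown),
        # The reviewer inspects full diffs for a random sample instead of
        # rubber-stamping 200 identical-looking prompts.
        "sample_diffs": [(change["path"], change["diff"]) for change in sample],
    }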
Implementing Balanced Trust Safeguards
To mitigate overreliance risks in MCP-enabled systems, implement these protective measures:

1. Graduated Trust Model
Implement a tiered approach to AI permissions and approval requirements:
def determine_approval_requirements(action, context):
    # Define risk tiers for different actions
    risk_tiers = {
        "read_public_file": "low",
        "read_sensitive_file": "medium",
        "modify_file": "high",
        "delete_file": "very_high",
        "deploy_to_production": "critical"
    }

    # Map the action to its risk tier
    action_risk = risk_tiers.get(action, "high")  # Default to high if unknown

    # Define approval requirements based on risk tier
    if action_risk == "low":
        return {"requires_approval": False}
    elif action_risk == "medium":
        return {"requires_approval": True, "approval_type": "quick_confirm"}
    elif action_risk == "high":
        return {"requires_approval": True, "approval_type": "detailed_review"}
    elif action_risk == "very_high":
        return {"requires_approval": True, "approval_type": "detailed_review",
                "requires_justification": True}
    elif action_risk == "critical":
        return {"requires_approval": True, "approval_type": "detailed_review",
                "requires_justification": True, "requires_secondary_approval": True}
2. Meaningful Approval Interfaces
Design approval interfaces that encourage thoughtful review rather than automatic confirmation:
function requestActionApproval(action, details) {
  // Don't just show a generic "Approve?" dialog
  // Instead, show specific details about the action
  const approvalDialog = {
    title: `Approve ${action}?`,
    description: `The AI wants to ${action} with the following details:`,
    // Show a diff or preview when possible
    preview: generatePreviewForAction(action, details),
    // Ask specific questions to ensure engagement
    confirmationQuestions: [
      `What files will be affected by this action?`,
      `What will happen if this action fails?`
    ],
    // Require typing a specific confirmation for high-risk actions
    confirmationText: action === "delete" ? "I understand these files will be permanently deleted" : null,
    // Add a time delay for critical actions to prevent automatic clicking
    approvalDelay: calculateDelayForAction(action, details)
  };
  return showApprovalDialog(approvalDialog);
}
3. Explainability Requirements
Require AI systems to explain their reasoning in human-understandable terms before taking actions:
def execute_tool_action(tool_name, parameters, context):
    # Get explanation from the AI
    explanation = context.get("explanation")

    # Check if explanation is required for this tool
    if requires_explanation(tool_name) and not explanation:
        return {
            "error": "Explanation required",
            "message": "Please explain your reasoning before using this tool"
        }

    # Log the explanation for audit
    log_action_explanation(tool_name, parameters, explanation)

    # Proceed with action execution
    return execute_action(tool_name, parameters)
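The helpers referenced above are left abstract; a minimal version might look like the following sketch, where the tool names are purely illustrative and the audit log is stubbed out with a print statement.
# Tools that must not run without a stated rationale (names are illustrative)
EXPLANATION_REQUIRED_TOOLS = {"modify_file", "delete_file", "deploy_to_production"}

def requires_explanation(tool_name):
    return tool_name in EXPLANATION_REQUIRED_TOOLS

def log_action_explanation(tool_name, parameters, explanation):
    # In practice this would append to a tamper-evident audit log;
    # a print keeps the sketch self-contained.
    print(f"[AUDIT] tool={tool_name} params={parameters} explanation={explanation!r}")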
4. Surprise-Based Intervention
Implement systems that detect when AI actions deviate significantly from expected patterns:
def check_for_surprising_actions(action, parameters, user_history):
    # Check if this action is unusual for this user
    if action not in user_history.common_actions:
        # Calculate how surprising this action is
        surprise_score = calculate_surprise_score(action, parameters, user_history)
        if surprise_score > SURPRISE_THRESHOLD:
            # Require additional verification for surprising actions
            return {
                "requires_extra_verification": True,
                "verification_message": f"This action is unusual based on your history. Please confirm you want to {action}."
            }
    return {"requires_extra_verification": False}
5. Knowledge Preservation Requirements
For critical systems, require human understanding as part of the approval process:
function ensureKnowledgePreservation(action, details) {
  if (isComplexSystemChange(action, details)) {
    // Require documentation of how the change works
    return promptForDocumentation(
      "Please document how this change works and why it's being made"
    );
  }
  return Promise.resolve(true);
}
Organizational Strategies to Combat Overreliance
Beyond technical safeguards, organizations should implement these cultural and training practices:
1. Explicit AI Competency Models
Define and communicate what the AI system can and cannot do reliably. This clear boundary setting helps prevent unrealistic expectations.
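Such a competency model can also be written down in machine-readable form so that tooling and documentation stay in sync. The sketch below is one possible shape; the categories are illustrative only.
# Illustrative competency model: which task categories the assistant handles
# reliably, which require human review, and which it should not attempt.
AI_COMPETENCY_MODEL = {
    "reliable": ["summarizing documents", "drafting routine emails", "read-only queries"],
    "review_required": ["code changes", "database configuration", "bulk file operations"],
    "out_of_scope": ["production deployments", "financial transactions", "access-control changes"],
}

def competency_for(task_category):
    for level, categories in AI_COMPETENCY_MODEL.items():
        if task_category in categories:
            return level
    return "review_required"  # unknown tasks default to human review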
2. Structured Critical Review Training
Train users specifically on how to review AI recommendations, including:
- Which aspects to verify independently
- Common AI failure modes to watch for
- When to seek additional human expertise
3. Responsible AI Champions
Designate team members as "AI responsibility champions" who emphasize appropriate trust levels and model thoughtful AI interaction practices.
4. Regular Trust Calibration Exercises
Periodically introduce deliberate errors into AI recommendations (in test environments) to ensure humans remain vigilant in their review processes.
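One way to run such an exercise is to seed a fraction of test-environment recommendations with known flaws and measure how often reviewers catch them. The sketch below assumes hypothetical introduce_known_flaw and collect_reviewer_verdict helpers; it is an illustration of the idea, not a prescribed harness.
import random

def run_trust_calibration(recommendations, inject_rate=0.1):
    """Seed some test-environment recommendations with known flaws and measure the catch rate."""
    seeded_total = 0
    caught = 0
    for rec in recommendations:
        is_seeded = random.random() < inject_rate
        shown = introduce_known_flaw(rec) if is_seeded else rec   # hypothetical helper
        verdict = collect_reviewer_verdict(shown)                 # hypothetical helper: "approve" / "reject"
        if is_seeded:
            seeded_total += 1
            if verdict == "reject":
                caught += 1
    # A falling catch rate over time is a concrete signal of growing overreliance.
    return caught / seeded_total if seeded_total else None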
Conclusion
Overreliance on AI represents one of the most subtle yet dangerous risks of MCP-enabled systems (see the Model Context Protocol discussion on Hacker News). The very features that make these systems valuable, above all their ability to take actions on behalf of users, also create significant psychological pressure toward excessive trust and diminished oversight.
By implementing graduated trust models, meaningful approval interfaces, explainability requirements, and organizational safeguards, we can harness the benefits of AI tool access while maintaining appropriate human judgment and oversight. The goal is not to eliminate trust in AI systems, but to calibrate it appropriately—trusting AI for what it does well while maintaining human responsibility for final decisions.
As we conclude this series on MCP security risks, overreliance stands out as perhaps the most foundational concern. Without appropriate human oversight, every other technical security measure can ultimately be undermined by the human tendency to trust increasingly capable autonomous systems.
Secure Your MCP Implementations with Garnet
As we've explored in this article, overreliance on AI systems with MCP tool access introduces subtle yet significant security risks that traditional security tools cannot address alone. These risks manifest in users' psychology and behavior patterns rather than purely technical vulnerabilities.

Garnet provides specialized runtime security monitoring designed to detect potentially dangerous patterns of AI tool usage that might indicate overreliance. Unlike conventional security tools, Garnet's approach focuses on runtime behavior monitoring, allowing it to identify situations where critical human oversight might be bypassed.
With Garnet's Linux-based Jibril sensor, you can protect your environments at every stage:
- Build Pipeline Protection: Detect patterns of diminished oversight during CI/CD processes where MCP servers might be leveraged
- Test Environment Security: Monitor approval patterns during testing to identify potential overreliance issues before they reach production
- Production Safeguards: Maintain continuous protection in live environments where MCP-enabled tools operate with appropriate human oversight
The Garnet Platform provides centralized visibility into AI interaction patterns and potential overreliance risks, with integrations that deliver alerts directly within your existing workflows.
Learn more about securing your AI-augmented development environments at Garnet.ai.