
Case Study: Second Opinion Reviews Itself

When we shipped the Second Opinion tools, we used them to critique their own implementation. This is a real example of the kind of issues the tool can catch that you might otherwise miss.

The Setup

{
"context": "Built a 'Second Opinion' feature for MCPammer that lets AI agents get challenger feedback from alternative models via Artemis LLM Gateway.",
"proposal": "Implementation includes 5 tools with file reading security checks (500KB limit, blocked patterns, path containment).",
"mode": "challenge",
"file_paths": ["mcpammer_api/clients/second_opinion.py"]
}

What It Found

The review flagged issues in four categories:

| Category | Issues found |
| --- | --- |
| Security | Symlink traversal possible after path resolution; no rate limiting; no MIME type validation |
| Reliability | No retry logic for API calls; 120s timeout too long; no circuit breaker |
| Performance | Files loaded into memory (no streaming); sequential API calls in dialogue mode |
| Architecture | Singleton pattern limits testing; tight coupling to providers; no fallback models |
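The Reliability findings map onto two standard patterns: retrying transient failures with backoff, and a circuit breaker that stops calling a backend that keeps failing. Below is a minimal Python sketch of both, assuming each model call is wrapped in a zero-argument callable. `retry_with_backoff`, `CircuitBreaker`, and all default values are hypothetical placeholders, not MCPammer code, and nothing here reflects the Artemis LLM Gateway's actual client API.

```python
import random
import time
from typing import Callable, Optional, Tuple, Type, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised when the breaker is open and calls are being short-circuited."""


class CircuitBreaker:
    """Opens after `failure_threshold` consecutive failures; once `reset_after`
    seconds pass, lets a trial call through (half-open). Not thread-safe."""

    def __init__(self, failure_threshold: int = 5, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, fn: Callable[[], T]) -> T:
        if self.opened_at is not None and self.clock() - self.opened_at < self.reset_after:
            raise CircuitOpenError("circuit open; skipping call")
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # (re)open and restart the wait
            raise
        # Any success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result


def retry_with_backoff(
    fn: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying only `retry_on` errors with capped exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return fn()
        except CircuitOpenError:
            raise  # the backend is marked down; retrying would not help
        except retry_on:
            if attempt == attempts - 1:
                raise
            # "Full jitter": sleep a random time up to the capped exponential delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))
    raise AssertionError("unreachable")
```

Composing them as `retry_with_backoff(lambda: breaker.call(call_model))`, where `call_model` stands in for the real request, lets repeated failures trip the breaker, after which callers fail fast instead of waiting out more timeouts. A breaker shared across concurrent requests would need a lock.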

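The most critical issue was subtle. Our security check used resolve() to get the canonical path, but a carefully crafted symlink inside an allowed directory could still point outside it:

~/allowed/evil-link -> /etc/passwd

The path ~/allowed/evil-link passes the containment check, but it resolves to /etc/passwd. Calling resolve() only helps if the containment check runs on the resolved result and the file that gets read is the one that was checked; a link created or swapped between the check and the read escapes the allowed directory.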
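A minimal Python sketch of a safer file read, assuming a single allowed root directory. The 500KB limit comes from the setup request; the blocked patterns, the `read_allowed_file` and `FileAccessError` names, and the use of `O_NOFOLLOW` are illustrative assumptions, not MCPammer's actual code. It resolves symlinks before the containment check, opens the resolved path rather than the original, re-checks the opened descriptor, and reads at most one byte past the limit so an oversized file is rejected without loading it whole.

```python
import fnmatch
import os
import stat
from pathlib import Path

# 500KB limit from the case study; the blocked patterns are hypothetical examples.
MAX_BYTES = 500 * 1024
BLOCKED_PATTERNS = ["*.pem", "*.key", ".env", "id_rsa*"]


class FileAccessError(Exception):
    """Raised when a requested path fails a security check (hypothetical name)."""


def read_allowed_file(requested: str, allowed_root: str) -> str:
    root = Path(allowed_root).expanduser().resolve(strict=True)
    # Resolve symlinks *before* checking containment, so a link inside the
    # allowed directory is judged by where it actually points.
    target = Path(requested).expanduser().resolve(strict=True)
    if not target.is_relative_to(root):  # Path.is_relative_to needs Python 3.9+
        raise FileAccessError(f"{requested} resolves outside {root}")
    if any(fnmatch.fnmatch(target.name, pat) for pat in BLOCKED_PATTERNS):
        raise FileAccessError(f"{target.name} matches a blocked pattern")
    if not target.is_file():
        raise FileAccessError(f"{target} is not a regular file")

    # Open the *resolved* path and refuse to follow a symlink in its final
    # component (O_NOFOLLOW is POSIX-only, so fall back to 0 elsewhere).
    flags = os.O_RDONLY | getattr(os, "O_NOFOLLOW", 0) | getattr(os, "O_BINARY", 0)
    fd = os.open(target, flags)
    with os.fdopen(fd, "rb") as fh:
        # Re-check what was actually opened, via the descriptor, not the path.
        if not stat.S_ISREG(os.fstat(fh.fileno()).st_mode):
            raise FileAccessError(f"{target} is not a regular file")
        # Read one byte past the limit: enough to detect an oversized file
        # without loading all of it into memory.
        data = fh.read(MAX_BYTES + 1)
    if len(data) > MAX_BYTES:
        raise FileAccessError(f"{target} exceeds {MAX_BYTES} bytes")
    return data.decode("utf-8", errors="replace")
```

This narrows the race window but does not fully close it: `O_NOFOLLOW` only guards the final path component, so a directory earlier in the resolved path swapped for a symlink between the check and the open would still be followed. On Linux, `openat2` with `RESOLVE_BENEATH` is one way to enforce containment at open time.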

The Outcome

Instead of blocking the release, we:

  1. Shipped the feature - It's an internal tool with limited exposure
  2. Created a hardening epic - Tracked all issues for follow-up
  3. Used second_opinion_quick to validate this approach:
{
"likely_to_work": true,
"confidence": "high",
"brief_reasoning": "Since it's an internal tool with limited exposure, fixing these issues in a follow-up sprint is reasonable."
}
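Because quick mode returns a small, fixed JSON shape, a caller can parse it into a typed value before acting on it. A minimal sketch, assuming the response arrives as a JSON string with exactly the three fields shown above; `QuickVerdict` and `parse_quick_verdict` are hypothetical names, not part of MCPammer.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class QuickVerdict:
    likely_to_work: bool
    confidence: str
    brief_reasoning: str


def parse_quick_verdict(raw: str) -> QuickVerdict:
    """Parse a quick-mode response, failing loudly on a malformed shape."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    likely = data.get("likely_to_work")
    confidence = data.get("confidence")
    reasoning = data.get("brief_reasoning")
    # Require a real boolean: the string "false" is truthy and would be misread.
    if not isinstance(likely, bool):
        raise ValueError("likely_to_work must be a boolean")
    if not isinstance(confidence, str) or not isinstance(reasoning, str):
        raise ValueError("confidence and brief_reasoning must be strings")
    return QuickVerdict(likely, confidence, reasoning)
```

Validating the shape up front means a malformed model reply surfaces as an error rather than silently reading as approval.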

Lessons Learned

  1. Use the tool on your own code - You'll find things you missed
  2. Pass actual file paths - The tool gives better feedback with real code
  3. Don't let perfect be the enemy of shipped - Use quick mode to validate your prioritization
  4. Create tickets from findings - Don't lose the insights

The hardening epic (5023653e) now tracks the symlink fix, rate limiting, retry logic, a circuit breaker, and observability improvements.
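For the rate-limiting item, a token bucket is one common approach: each caller gets a burst allowance that refills at a fixed rate. A minimal single-process sketch; `TokenBucket` and its parameters are hypothetical and do not describe the design the epic actually settled on.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow bursts of up to `capacity` calls, refilled at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        if rate <= 0 or capacity <= 0:
            raise ValueError("rate and capacity must be positive")
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity  # start full so an idle caller gets its burst
        self.updated = clock()

    def try_acquire(self, cost: float = 1.0) -> bool:
        """Spend `cost` tokens if available; never blocks."""
        now = self.clock()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

A tool handler would proceed only when `try_acquire()` returns True, keeping one bucket per agent or API key if limits should apply per caller. Multiple worker processes would need shared state (for example, a central store) rather than an in-memory bucket.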