You wired an LLM into your incident workflow, gave it kubectl access via an MCP server you found on GitHub, and only later realized it was running against your production cluster with your personal kubeconfig. That’s the kind of mistake that happens when you move fast with MCP server DevOps tooling without thinking through the operational model first. I’ve been there. This post is the comparison I wish I had before we started.
The Model Context Protocol (MCP) is a JSON-RPC 2.0 protocol that gives LLMs like Claude or GPT-4 structured, typed access to external tools — kubectl, Terraform, PagerDuty, your internal CMDB — with session state and proper error handling. It’s not just another API wrapper. It’s the difference between prompt-engineering your way to fragile shell commands and giving your AI assistant a real, auditable toolchain. But the moment you go from a weekend demo to a team-shared production deployment, you face a real architectural decision: do you use community-maintained pre-built MCP servers, or do you build your own?
The wrong answer costs you either brittle, unmaintainable hacks or an over-engineered custom server nobody on your team wants to touch at 3 AM. Let me walk you through both options with the honest trade-offs.
When You Face This Choice

This decision usually surfaces when you’re wiring LLMs into something that actually matters: incident triage, infra query automation, or CI/CD pipeline introspection. You need structured tool access — not raw API calls buried in a system prompt, but proper MCP tools with typed inputSchema, error normalization, and the ability to maintain context across multi-step operations like “get the failing pods, look up the runbook, then check recent deploys.”
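The MCP spec is versioned by date (2025-03-26 at the time of writing). That revision broke compatibility with the earlier 2024-11-05 version, most visibly by replacing the HTTP+SSE transport with Streamable HTTP and by adding an OAuth-based authorization framework. The initialize handshake carries a protocolVersion field so that client and server can agree on a revision. When both sides negotiate properly, a mismatch ends the session cleanly; when one side ignores the field, tool calls can misbehave without any clear error. That’s your first hint that the ecosystem is still moving fast and your tooling choices have real consequences.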
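To make the handshake concrete, here is a minimal sketch of a client-side check on the initialize result. It assumes you have the JSON-RPC result as a plain dict; SUPPORTED_VERSIONS and check_protocol_version are hypothetical names of my own, not part of any SDK.

```python
# Hypothetical helper: compare the protocolVersion a server returns in its
# initialize result against the spec revisions this client was tested with.
SUPPORTED_VERSIONS = {"2024-11-05", "2025-03-26"}  # assumption: revisions you have tested


def check_protocol_version(initialize_result: dict) -> str:
    """Return the negotiated version, or raise if this client does not support it."""
    version = initialize_result.get("protocolVersion")
    if version is None:
        raise ValueError("initialize result has no protocolVersion field")
    if version not in SUPPORTED_VERSIONS:
        raise ValueError(
            f"server negotiated {version!r}, supported: {sorted(SUPPORTED_VERSIONS)}"
        )
    return version


# Example initialize result, shaped like the "result" member of the response.
example = {
    "protocolVersion": "2025-03-26",
    "capabilities": {"tools": {}},
    "serverInfo": {"name": "sre-mcp-server", "version": "0.1.0"},
}
print(check_protocol_version(example))
```

Failing loudly here is the point: a hard error at connect time is much cheaper than a tool call that half-works during an incident.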
The core question: does your team need to expose internal tools that will never have a community server, and does this MCP setup need to serve more than one engineer? If yes to either, you’re already leaning toward custom. If you’re running a proof-of-concept for one person against non-production systems, pre-built is the right starting point. Let’s go through both properly.
Option A — Pre-Built Community MCP Servers
Pre-built servers are genuinely impressive for getting started. uvx mcp-server-kubernetes gives you pod listing, log fetching, and resource inspection in under ten minutes. The official GitHub MCP server (github.com/github/github-mcp-server, distributed as a Go binary) supports fine-grained PATs and, as of mid-2025, is the only pre-built server I know of with a published security advisory process. mcp-server-prometheus handles instant PromQL queries with authentication. You get community-maintained tool schemas that already handle pagination, error normalization, and auth flows, all of which take real time to get right when you build from scratch.
Pros: Zero-to-running in under 10 minutes. npx @modelcontextprotocol/server-github or uvx mcp-server-kubernetes works out of the box. The tool schemas are already battle-tested for common operations. You don’t own the spec upgrade burden on day one.
Cons: The tool surface is fixed. You cannot expose your internal Backstage catalog, your custom SLO dashboard API, or your Spinnaker deployment history. Versioning is genuinely chaotic — mcp-server-kubernetes v0.1.x vs v0.2.x broke tool signatures in March 2025, and if you’re pinned to an old version in a team deployment, you’ll spend a debugging session figuring out why tool calls are returning MCP error -32602: Invalid params. You also inherit their security model entirely.
Watch out for this: uvx mcp-server-kubernetes requires kubectl in PATH and uses your active kubeconfig context. It will happily point at production if that’s your current context. Always set KUBECONFIG explicitly in your MCP server config to a context-specific kubeconfig file, not your default ~/.kube/config.
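If you launch servers from your own wrapper, you can enforce that rule instead of relying on memory. A minimal sketch, assuming the wrapper receives the environment as a dict; resolve_kubeconfig is a hypothetical helper, not part of any MCP server.

```python
import os
from pathlib import Path


def resolve_kubeconfig(env: dict[str, str]) -> Path:
    """Return the explicitly configured kubeconfig path, refusing the implicit default."""
    raw = env.get("KUBECONFIG", "")
    if not raw:
        raise RuntimeError("KUBECONFIG is not set; refusing to fall back to the default")
    if os.pathsep in raw:
        # KUBECONFIG may list several files; require exactly one to avoid merged contexts
        raise RuntimeError("KUBECONFIG lists several files; pin exactly one")
    path = Path(raw).expanduser()
    if path == Path("~/.kube/config").expanduser():
        raise RuntimeError("KUBECONFIG points at the default ~/.kube/config")
    return path


# Path reused from the config example in this post
print(resolve_kubeconfig({"KUBECONFIG": "/home/sre/.kube/config-staging"}))
```

This only checks which file is used, not which cluster the file points at, so it complements a context-specific kubeconfig rather than replacing it.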
Watch out for this too: Most community servers run as stdio transport by default: one process per session, zero shared state, fine for local Claude Desktop usage. The moment you want a team-shared deployment, you need a proxy layer. mcp-proxy (available on PyPI) bridges stdio servers to SSE: mcp-proxy --port 9000 -- uvx mcp-server-kubernetes. That’s an extra operational component you now own.
And one more: mcp-server-prometheus does not support PromQL range queries out of the box, only instant queries. If your on-call workflow needs query_range, you’re building a custom tool regardless.
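If you do end up writing that range-query tool, the request itself is small. Here is a minimal sketch, assuming the standard Prometheus HTTP API endpoint /api/v1/query_range with its query, start, end, and step parameters. build_query_range_url and MAX_POINTS are hypothetical names, and the cap value is an arbitrary assumption.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode

MAX_POINTS = 120  # assumption: cap samples per series to keep responses small


def build_query_range_url(
    base_url: str,
    promql: str,
    end: datetime,
    lookback: timedelta,
    step_seconds: int,
) -> str:
    """Build a GET URL for Prometheus /api/v1/query_range."""
    if step_seconds <= 0:
        raise ValueError("step_seconds must be positive")
    points = int(lookback.total_seconds() // step_seconds) + 1
    if points > MAX_POINTS:
        raise ValueError(f"{points} points per series exceeds the cap of {MAX_POINTS}")
    start = end - lookback
    params = {
        "query": promql,
        "start": str(int(start.timestamp())),  # Unix timestamps are accepted
        "end": str(int(end.timestamp())),
        "step": f"{step_seconds}s",
    }
    return f"{base_url.rstrip('/')}/api/v1/query_range?{urlencode(params)}"


url = build_query_range_url(
    "https://prometheus.internal.example.com",
    'rate(http_requests_total{job="api"}[5m])',
    end=datetime(2025, 1, 1, tzinfo=timezone.utc),
    lookback=timedelta(hours=1),
    step_seconds=60,
)
print(url)
```

Capping the number of points server-side matters for the same reason as truncating kubectl output: a week of data at a 15-second step will flood the context window.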
Option B — Custom MCP Servers with the Python or TypeScript SDK
Building your own MCP server with the official Python SDK (pip install mcp==1.6.0) or TypeScript SDK (npm install @modelcontextprotocol/sdk, pinned to an exact version) gives you complete control. You define exactly what tools exist, what their schemas look like, how auth works (mTLS, OIDC tokens, Vault-sourced secrets), and, critically, what the LLM is actually allowed to touch. You can wrap internal APIs that will never have a community server: your CMDB, your internal runbook API, your custom SLO dashboards, your Spinnaker deployment history.
Pros: Full control over tool schema, authentication model, and blast radius. You can implement tool-level allow-lists in handler middleware. You can log every single tool call to your SIEM. You can truncate verbose responses server-side before they hit the LLM context window — because full kubectl get pods -A JSON output runs 8,000–15,000 tokens per call, which is both expensive and context-window-polluting.
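The allow-list and audit-logging points can live in one small wrapper around your tool dispatcher. This is a minimal sketch, not an SDK feature: guarded, ALLOWED_TOOLS, and the mcp.audit logger name are hypothetical, and it assumes your log pipeline already ships that logger’s output to the SIEM.

```python
import asyncio
import json
import logging
import time
from typing import Any, Awaitable, Callable

audit_log = logging.getLogger("mcp.audit")  # assumption: forwarded to your SIEM

ALLOWED_TOOLS = frozenset({"kubectl_get_pods", "runbook_lookup"})

Handler = Callable[[str, dict], Awaitable[Any]]


def guarded(handler: Handler) -> Handler:
    """Wrap a tool dispatcher with an allow-list check and one audit record per call."""

    async def wrapper(name: str, arguments: dict) -> Any:
        started = time.monotonic()
        outcome = "denied"
        try:
            if name not in ALLOWED_TOOLS:
                raise PermissionError(f"tool not allowed: {name}")
            outcome = "error"  # stays "error" unless the handler returns normally
            result = await handler(name, arguments)
            outcome = "ok"
            return result
        finally:
            # Log argument names only; values may contain secrets or log contents
            audit_log.info(json.dumps({
                "tool": name,
                "argument_keys": sorted(arguments),
                "outcome": outcome,
                "duration_ms": round((time.monotonic() - started) * 1000),
            }))

    return wrapper


async def dispatch(name: str, arguments: dict) -> str:
    """Stand-in for a real tool dispatcher."""
    return f"ran {name}"


safe_dispatch = guarded(dispatch)
print(asyncio.run(safe_dispatch("kubectl_get_pods", {"namespace": "default"})))
```

Because the record is written in a finally block, denied and failed calls are logged too, and those are usually the entries you care about when investigating a prompt injection.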
Cons: You own the maintenance burden. The MCP spec moved from 2024-11-05 to 2025-03-26 with breaking changes, including a new HTTP transport. The Python and TypeScript SDKs carry stable-looking version numbers but still behave like pre-1.0 software in practice. You will be doing spec upgrades on a Friday afternoon at some point.
Critical gotcha that almost everyone gets wrong: The MCP SDK does NOT automatically validate tool.inputSchema against the arguments the LLM passes. If the LLM sends malformed args — or if a prompt injection in a log file triggers an unexpected tool call with bad parameters — your handler receives them as-is. You will get a confusing Python KeyError or TypeError deep in your handler instead of a clean error message. You must add Pydantic (Python) or Zod (TypeScript) validation yourself, at the top of every tool handler. This is non-negotiable for production.
Also: do not register more than 15–20 tools in a single MCP server. LLMs degrade in tool selection accuracy above roughly 15 tools. Split by domain: infra-tools, observability-tools, incident-tools. Separate servers, separate concerns.
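One way to keep yourself honest about that limit is to plan the split explicitly. Here is a rough sketch in which the tool inventory, the domain names, and plan_servers are all hypothetical; the ceiling of 15 is the rule of thumb from above, not a protocol limit.

```python
MAX_TOOLS_PER_SERVER = 15  # rule of thumb, not a protocol limit

# Hypothetical inventory: tool name -> domain server that should own it
TOOL_DOMAINS = {
    "kubectl_get_pods": "infra-tools",
    "kubectl_get_events": "infra-tools",
    "promql_instant_query": "observability-tools",
    "runbook_lookup": "incident-tools",
    "pagerduty_list_incidents": "incident-tools",
}


def plan_servers(tool_domains: dict[str, str]) -> dict[str, list[str]]:
    """Group tool names by owning server and reject any server over the ceiling."""
    servers: dict[str, list[str]] = {}
    for tool, domain in sorted(tool_domains.items()):
        servers.setdefault(domain, []).append(tool)
    for domain, tools in servers.items():
        if len(tools) > MAX_TOOLS_PER_SERVER:
            raise ValueError(f"{domain} has {len(tools)} tools; split it further")
    return servers


print(plan_servers(TOOL_DOMAINS))
```

Running a check like this in CI turns the tool budget into something a reviewer sees when a new tool is added, rather than something you notice when tool selection gets worse.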
Here’s a minimal but production-pattern custom MCP server in Python. It wraps kubectl for pod listing and an internal runbook API, with proper Pydantic validation and response truncation:
# custom_mcp_server.py
# Minimal production-pattern MCP server wrapping kubectl + internal runbook API
# Requires: pip install mcp==1.6.0 pydantic==2.7.0 httpx==0.27.0
import asyncio
import json
import os
import subprocess

import httpx
from mcp.server import Server
from mcp.server.stdio import stdio_server
from mcp.types import TextContent, Tool
from pydantic import BaseModel, Field, ValidationError


# --- Input schema models (SDK does NOT validate these automatically) ---
class KubectlGetPodsInput(BaseModel):
    namespace: str
    label_selector: str = ""  # optional label filter e.g. "app=nginx"
    max_results: int = Field(default=20, ge=1, le=20)  # hard cap to control token usage


class RunbookLookupInput(BaseModel):
    service_name: str
    alert_name: str


# --- Server init ---
app = Server("sre-mcp-server")
RUNBOOK_API_BASE = "https://runbooks.internal.example.com/api/v1"
# Injected via env at runtime (Vault-sourced in prod); never hardcode the token
RUNBOOK_API_TOKEN = os.environ.get("RUNBOOK_API_TOKEN", "")


def text(message: str) -> list[TextContent]:
    """Wrap a string in the content list that tool handlers return."""
    return [TextContent(type="text", text=message)]


# --- Tool definitions ---
@app.list_tools()
async def list_tools() -> list[Tool]:
    return [
        Tool(
            name="kubectl_get_pods",
            description="List pods in a namespace. Read-only. Max 20 results.",
            inputSchema=KubectlGetPodsInput.model_json_schema(),
        ),
        Tool(
            name="runbook_lookup",
            description="Fetch the runbook for a given service and alert name.",
            inputSchema=RunbookLookupInput.model_json_schema(),
        ),
    ]


# --- Tool handlers ---
# With mcp 1.6.0 the low-level call_tool decorator expects a list of content
# items; an exception raised here is returned to the client as an isError result.
@app.call_tool()
async def call_tool(name: str, arguments: dict) -> list[TextContent]:
    if name == "kubectl_get_pods":
        try:
            args = KubectlGetPodsInput(**arguments)  # validate here, not in SDK
        except ValidationError as e:
            raise ValueError(f"Invalid args: {e}") from e
        cmd = ["kubectl", "get", "pods", "-n", args.namespace, "-o", "json"]
        if args.label_selector:
            cmd += ["-l", args.label_selector]
        # Run kubectl in a worker thread so the event loop is not blocked
        result = await asyncio.to_thread(
            subprocess.run, cmd, capture_output=True, text=True, timeout=15
        )
        if result.returncode != 0:
            raise RuntimeError(f"kubectl error: {result.stderr}")
        pods_json = json.loads(result.stdout)
        # Truncate to max_results and keep only a few fields to reduce token usage
        items = pods_json.get("items", [])[: args.max_results]
        slim = []
        for p in items:
            statuses = p["status"].get("containerStatuses", [])
            slim.append(
                {
                    "name": p["metadata"]["name"],
                    "phase": p["status"].get("phase"),
                    # A pod with no container statuses yet is not ready
                    "ready": bool(statuses) and all(c["ready"] for c in statuses),
                    "restarts": sum(c.get("restartCount", 0) for c in statuses),
                }
            )
        return text(json.dumps(slim, indent=2))

    if name == "runbook_lookup":
        try:
            args = RunbookLookupInput(**arguments)
        except ValidationError as e:
            raise ValueError(f"Invalid args: {e}") from e
        async with httpx.AsyncClient() as client:
            resp = await client.get(
                f"{RUNBOOK_API_BASE}/runbooks",
                params={"service": args.service_name, "alert": args.alert_name},
                headers={"Authorization": f"Bearer {RUNBOOK_API_TOKEN}"},
                timeout=10,
            )
        if resp.status_code != 200:
            raise RuntimeError(f"Runbook API error: {resp.status_code}")
        return text(resp.text)

    raise ValueError(f"Unknown tool: {name}")


# --- Entry point ---
async def main():
    async with stdio_server() as (read_stream, write_stream):
        await app.run(read_stream, write_stream, app.create_initialization_options())


if __name__ == "__main__":
    asyncio.run(main())
And here’s how to wire both a custom server and community servers together in your Claude Desktop config. This is the hybrid pattern I actually run:
{
  "mcpServers": {
    "sre-tools": {
      "command": "python",
      "args": ["/opt/mcp-servers/custom_mcp_server.py"],
      "env": {
        "KUBECONFIG": "/home/sre/.kube/config-staging",
        "RUNBOOK_API_TOKEN": "${RUNBOOK_API_TOKEN}"
      }
    },
    "prometheus": {
      "command": "uvx",
      "args": ["mcp-server-prometheus"],
      "env": {
        "PROMETHEUS_URL": "https://prometheus.internal.example.com",
        "PROMETHEUS_USERNAME": "mcp-readonly",
        "PROMETHEUS_PASSWORD": "${PROM_PASSWORD}"
      }
    },
    "github": {
      "command": "/usr/local/bin/github-mcp-server",
      "args": ["stdio"],
      "env": {
        "GITHUB_PERSONAL_ACCESS_TOKEN": "${GITHUB_PAT}"
      }
    }
  }
}
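Note the explicit KUBECONFIG path pointing to a staging context. Note that the API token comes from a shell environment variable, not hardcoded. Verify that your client actually expands the ${...} placeholders; if it passes env values through literally, launch the server from a small wrapper script that exports the secrets first. Small details, but they matter when this runs unattended.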
You can test any MCP server locally without a full LLM client using the mcp dev CLI command, which launches the MCP Inspector UI at http://localhost:5173. Run it before you ever connect Claude to a new server. It saves a lot of confusion.
Decision Matrix
Here’s how these two options actually stack up across the dimensions that matter for real DevOps teams — not theoretical ones.
| Dimension | Pre-Built (Option A) | Custom (Option B) |
|---|---|---|
| Setup time | Under 10 minutes | Half a day minimum |
| Tool customization | Fixed, community-defined | Full control |
| Internal API access | Not possible | First-class |
| Security posture | Inherited, opaque | Explicit, auditable |
| Maintenance burden | Low initially, chaotic on upgrades | Ongoing, owned by you |
| Multi-user / team deployment | Requires mcp-proxy, extra ops | SSE transport, proper auth |
| Production readiness | Proof-of-concept to small teams | Production-grade with work |
| SIEM / audit logging | Not available | Implement in handler middleware |
The inflection point is clear. If your team has fewer than three internal-only tool integrations and this is a proof-of-concept, use pre-built. If you’re running MCP in production for more than five engineers, touching real infrastructure, you need custom servers with proper auth and audit logging. There’s no middle ground that scales.
The transport layer is another forcing function. Stdio transport spawns a new process per Claude Desktop session, so nothing is shared between sessions or between engineers, which breaks any tool that needs to carry a Terraform plan or a multi-step workflow from one session to another. SSE transport (http://localhost:8000/sse by default, configurable with --host, --port, or MCP_HOST/MCP_PORT env vars) is HTTP-based and supports multiple concurrent clients. The moment you need SSE, you’re in custom server territory anyway, because most community servers don’t ship with SSE support configured for team use.
MCP has no built-in authorization at the tool level. This is the security gap that matters most. If your server exposes both kubectl_get and kubectl_delete, the LLM, or a prompt injection hiding in a log file your LLM just read, can call either. Implement tool-level allow-lists in your handler middleware. The official MCP documentation at modelcontextprotocol.io covers the protocol spec in detail.
For Kubernetes RBAC, bind your MCP server’s service account to a ClusterRole with only get, list, and watch verbs; the Kubernetes RBAC documentation describes the ClusterRole pattern. Never bind create, delete, or patch unless you have a separate, explicitly audited use case for it.
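For the RBAC half of that advice, here is a sketch of what a read-only role looks like. It builds a ClusterRole manifest as a Python dict and prints it as JSON, which kubectl apply -f accepts alongside YAML. The role name mcp-readonly and the resource list are assumptions for illustration, and read_only_cluster_role is a hypothetical helper.

```python
import json

READ_ONLY_VERBS = ["get", "list", "watch"]


def read_only_cluster_role(name: str, resources: list[str]) -> dict:
    """Build a ClusterRole manifest that grants only read verbs on core resources."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "ClusterRole",
        "metadata": {"name": name},
        "rules": [
            {
                "apiGroups": [""],  # "" is the core API group (pods, events, ...)
                "resources": resources,
                "verbs": list(READ_ONLY_VERBS),
            }
        ],
    }


# Assumed name and resources; adjust to what your tools actually read
role = read_only_cluster_role("mcp-readonly", ["pods", "pods/log", "events"])
print(json.dumps(role, indent=2))
```

The role does nothing until it is bound, so you still need a ClusterRoleBinding (or a namespaced RoleBinding, which is tighter) that ties it to the MCP server’s dedicated service account.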
My Pick
I prefer the hybrid approach, and I’m not hedging on this. Start with pre-built for your proof-of-concept — specifically the mcp-server-kubernetes plus mcp-server-prometheus combination. Get your team comfortable with what MCP-powered workflows actually feel like before you write a line of server code. Then, once you know which internal APIs you actually need and what the real operational requirements are, build a thin custom Python MCP server that wraps those internal integrations.
Do not fork community servers. That’s the trap. You end up owning their entire codebase plus your customizations, and when the spec upgrades, you’re doing a merge instead of just bumping your SDK version.
The pattern I run: community servers (Prometheus, GitHub) run as separate stdio processes. My custom server handles everything internal. Both are registered in Claude Desktop config as separate mcpServers entries. The LLM picks tools from both. Clean separation, minimal blast radius per server.
One non-negotiable: any MCP server touching production infrastructure runs with a dedicated service account, scoped RBAC, and logs every tool call to the SIEM. Treat it like a privileged CI runner, not a chatbot plugin. Store credentials in Vault or your secrets manager, inject them as environment variables at runtime, and rotate them on the same schedule as your other service accounts. The mcp dev inspector tool at http://localhost:5173 is your friend for validating tool schemas before you ever point a real LLM at a server.
MCP server DevOps tooling is moving fast. The spec will break again. But the operational principles — scoped access, audit logging, input validation, response truncation — those don’t change. Get those right and you can absorb the spec churn without a production incident. You can find more patterns for building secure, automated infrastructure tooling at kuryzhev.cloud.
