The Model Context Protocol has moved from an interesting specification to real production infrastructure faster than most people expected.
If you’ve shipped an MCP server, or you rely on third-party MCP servers in your agent workflows, you’ve added a new class of dependency to your stack. It has all the reliability characteristics of an external API: it can be down, it can be slow, and it can fail in partial ways that are hard to detect.
And yet most teams aren’t monitoring their MCP servers at all.
What an MCP server actually is, from a monitoring perspective
A remote MCP server is an HTTP service that exposes tools, resources, and prompts to an AI client. (MCP also defines a local stdio transport, but remote servers are the ones you monitor over the network.) From a network perspective, it’s a service with a defined protocol, the MCP spec, and it can fail in exactly the same ways as any other service:
- The process crashes or OOMs
- The underlying host becomes unavailable
- The network path between client and server degrades
- The server is reachable but responding slowly
- The server is responding quickly but returning protocol errors
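From the outside, the failure modes above collapse into a handful of outcomes that a single probe can actually observe. The Python sketch below shows one way to map a probe result onto them. The ProbeOutcome shape, the classify function, and the 2-second "slow" threshold are illustrative assumptions, not values taken from Vigil.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical threshold: tune it to your server's normal latency.
SLOW_THRESHOLD_S = 2.0


@dataclass
class ProbeOutcome:
    connected: bool             # did we get any HTTP response at all?
    latency_s: Optional[float]  # round-trip time; None if there was no response
    body: Optional[dict]        # parsed JSON-RPC message; None if unparseable


def classify(outcome: ProbeOutcome, slow_threshold_s: float = SLOW_THRESHOLD_S) -> str:
    """Map one probe onto the failure modes listed above."""
    if not outcome.connected:
        # A crashed process, a dead host, or a broken network path all
        # look identical from the outside: no response at all.
        return "unreachable"
    if outcome.body is None or "result" not in outcome.body:
        # Reachable, but not speaking valid MCP (bad JSON or a JSON-RPC error).
        return "protocol_error"
    if outcome.latency_s is not None and outcome.latency_s > slow_threshold_s:
        return "slow"
    return "healthy"
```

Protocol errors are checked before latency on purpose: a fast wrong answer and a slow wrong answer are both wrong.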
The difference from a traditional API is the blast radius. When a REST API endpoint goes down, some user-facing feature breaks. When an MCP server your agents depend on goes down, the agents either fail silently, hallucinate tool results, or hit error paths you may not have tested thoroughly.
None of these outcomes is good. Silent failure is the worst.
How Vigil monitors MCP servers
Vigil has a dedicated MCP monitor type — not a generic HTTP check, but a monitor that understands the MCP protocol.
When a check runs, Vigil sends a real JSON-RPC initialize request to your server:
{
  "jsonrpc": "2.0",
  "method": "initialize",
  "params": {
    "protocolVersion": "2024-11-05",
    "capabilities": {},
    "clientInfo": { "name": "Vigil", "version": "1.0" }
  },
  "id": 1
}
This is the first message of the initialization handshake defined by the MCP spec, the same message any real MCP client sends first. Vigil confirms that the server answers with a valid JSON-RPC result: if the response contains "result", the server is up; if not, it’s down.
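To make the check concrete, here is a minimal Python sketch using only the standard library. It is not Vigil’s implementation. It assumes the server uses MCP’s Streamable HTTP transport, which may answer the POST either with plain JSON or with a short SSE stream; the function names (build_initialize_request, parse_response_body, is_up, check_mcp) are hypothetical.

```python
import json
import urllib.request


def build_initialize_request() -> bytes:
    """Serialize the same initialize payload shown above."""
    payload = {
        "jsonrpc": "2.0",
        "method": "initialize",
        "params": {
            "protocolVersion": "2024-11-05",
            "capabilities": {},
            "clientInfo": {"name": "Vigil", "version": "1.0"},
        },
        "id": 1,
    }
    return json.dumps(payload).encode("utf-8")


def parse_response_body(body: str, content_type: str) -> dict:
    """Extract the JSON-RPC message from a JSON or an SSE response."""
    if content_type.startswith("text/event-stream"):
        # Assumption: the initialize response fits on a single `data:` line.
        for line in body.splitlines():
            if line.startswith("data:"):
                return json.loads(line[len("data:"):].strip())
        raise ValueError("no data line in SSE stream")
    return json.loads(body)


def is_up(message: dict) -> bool:
    """The rule described above: a 'result' member means the server is up."""
    return "result" in message


def check_mcp(url: str, timeout_s: float = 10.0) -> bool:
    req = urllib.request.Request(
        url,
        data=build_initialize_request(),
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Streamable HTTP clients advertise both response formats.
            "Accept": "application/json, text/event-stream",
        },
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout_s) as resp:
            body = resp.read().decode("utf-8")
            ctype = resp.headers.get("Content-Type", "")
            return is_up(parse_response_body(body, ctype))
    except (OSError, ValueError):
        # OSError covers URLError, HTTPError, and timeouts; ValueError covers bad JSON.
        return False
```

Usage would be a single call such as check_mcp("https://mcp.example.com/mcp"), which returns True only when the server answers initialize with a JSON-RPC result.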
This matters because a plain HTTP ping tells you the process is alive. The initialize handshake tells you the server is actually speaking MCP correctly.
Setting it up
Navigate to Monitors → New monitor, select the type MCP, and paste your server URL:
https://mcp.example.com/mcp
That’s it. You don’t configure a method, expected status code, or body assertion — Vigil handles all of that automatically. The MCP protocol details are built in.
If your MCP server requires authentication, expand Advanced and add your Bearer token. Vigil will attach it as Authorization: Bearer … on every check. If you don’t provide a token and the server returns 401, Vigil still marks it as up — a 401 means the server is reachable and correctly rejecting unauthenticated requests, which is a valid state.
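The authentication rules above can be expressed as a small decision function. This Python sketch is illustrative only: auth_headers and classify_status are hypothetical names, and treating a 401 as down *when a token was configured* is our assumption, since the behavior described above only covers the no-token case.

```python
from typing import Dict, Optional


def auth_headers(token: Optional[str]) -> Dict[str, str]:
    """Attach the Bearer token to every check, if one was configured."""
    return {"Authorization": f"Bearer {token}"} if token else {}


def classify_status(status: int, has_result: bool, token_configured: bool) -> str:
    """Apply the up/down rules described above to one HTTP response."""
    if status == 401 and not token_configured:
        # Reachable and correctly rejecting an unauthenticated request.
        return "up"
    if 200 <= status < 300 and has_result:
        # A normal initialize success.
        return "up"
    # Assumption: everything else, including a 401 despite a configured
    # token (i.e. the token was rejected), counts as down.
    return "down"
```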
Vigil also ships with a preset catalog of real-world MCP servers — GitHub, Linear, Stripe, Supabase, Figma, Slack, and many others — so if you’re depending on a third-party MCP server, you can add it in a few clicks without looking up the endpoint URL.
Alerts that matter
Once the monitor is running, a few additional settings add real signal:
Check interval. The default is 5 minutes. For MCP servers that are critical to your agent workflows, consider dropping to 60 seconds. The faster you know it’s down, the sooner you can respond — or route around it.
SSL certificate expiry. Vigil monitors certificate validity automatically. A 14-day warning gives you time to rotate before a production incident.
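If you want to see what a certificate-expiry check involves, here is a minimal Python sketch built on the standard ssl module. The function names are hypothetical and this is not how Vigil performs the check; the 14-day threshold mirrors the warning window above.

```python
import socket
import ssl
from urllib.parse import urlparse

WARNING_DAYS = 14  # mirrors the 14-day warning window above


def days_until_expiry(not_after: str, now: float) -> float:
    """not_after is the cert's 'notAfter' field, e.g. 'Jun  1 12:00:00 2030 GMT'."""
    return (ssl.cert_time_to_seconds(not_after) - now) / 86400


def fetch_not_after(url: str, timeout_s: float = 10.0) -> str:
    """Open a verified TLS connection and read the peer certificate's expiry."""
    parsed = urlparse(url)
    host, port = parsed.hostname, parsed.port or 443
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout_s) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def needs_warning(not_after: str, now: float, warning_days: int = WARNING_DAYS) -> bool:
    return days_until_expiry(not_after, now) <= warning_days
```

In practice you would call needs_warning(fetch_not_after("https://mcp.example.com/mcp"), time.time()); keeping the date arithmetic separate from the network call makes it easy to test.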
Grace period. A single failed check doesn’t immediately trigger an alert — Vigil waits for the grace period to expire before notifying. This filters transient blips. For MCP servers, the default (5 minutes) is usually fine.
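A grace period is essentially a timer that starts at the first of a run of consecutive failures and resets on any success. The Python sketch below is one plausible reading of that rule, not Vigil’s documented algorithm; CheckResult and should_alert are hypothetical names, and measuring the period from the first failure to the latest check is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CheckResult:
    timestamp_s: float  # when the check ran, in seconds
    up: bool


def should_alert(history: List[CheckResult], grace_period_s: float = 300.0) -> bool:
    """Alert only if failures have persisted for at least the grace period."""
    first_failure: Optional[float] = None
    for check in sorted(history, key=lambda c: c.timestamp_s):
        if check.up:
            first_failure = None  # any success resets the clock
        elif first_failure is None:
            first_failure = check.timestamp_s  # start of a failure run
    if first_failure is None:
        return False  # no checks yet, or the latest check succeeded
    latest = max(c.timestamp_s for c in history)
    return latest - first_failure >= grace_period_s
```

With a 60-second interval and the default 5-minute grace period, a single blip never alerts, but six consecutive failures spanning 300 seconds do.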
The agent observability problem
There’s a deeper issue that protocol-level monitoring doesn’t fully solve: knowing when your MCP server is responding correctly but not working correctly.
An initialize success tells you the server is alive and speaking MCP. It doesn’t tell you:
- Whether the search tool is returning useful results
- Whether a downstream API your tool depends on is degraded
- Whether tool execution is succeeding or returning soft failures
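To see what a "soft failure" looks like at the protocol level: per the MCP spec, a tools/call request can fail as a JSON-RPC error, or it can return a normal result with isError set to true when the tool itself ran but failed. This Python sketch classifies both. The tool name "search" and its "query" argument are hypothetical, a real call needs an already-initialized session, and treating empty content as suspect is our assumption.

```python
def build_tool_call(name: str, arguments: dict, request_id: int = 2) -> dict:
    """A tools/call request; sent only after a successful initialize."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


def classify_tool_call(message: dict) -> str:
    """Classify the JSON-RPC response to a tools/call request."""
    if "error" in message:
        # Protocol-level failure: unknown tool, invalid params, server fault.
        return "protocol_error"
    result = message.get("result", {})
    if result.get("isError"):
        # The tool ran but reported failure, e.g. an upstream API timed out.
        return "tool_error"
    if not result.get("content"):
        # Assumption: an empty content list is treated as a suspect soft failure.
        return "empty_result"
    return "ok"
```

Note that the tool_error case still arrives as a successful JSON-RPC response, which is exactly why an initialize check alone cannot see it.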
For this layer of observability, monitor the dependencies your MCP server itself relies on. If your MCP server’s search tool calls a third-party search API, that API needs its own monitor in Vigil — separate from the MCP server check.
This lets you correlate: when your agents start degrading, you can check whether the MCP server is up and whether any of its upstream dependencies are degraded. Often, that’s exactly where the problem is.
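The correlation step above amounts to a simple triage rule. This Python sketch assumes you already have an up/down status for the MCP server and for each upstream dependency; the diagnose function and the dependency names are hypothetical.

```python
from typing import Dict, List


def diagnose(mcp_up: bool, upstream: Dict[str, bool]) -> str:
    """Triage agent degradation from the MCP monitor plus its upstream monitors."""
    if not mcp_up:
        return "MCP server down"
    degraded: List[str] = sorted(name for name, up in upstream.items() if not up)
    if degraded:
        return "MCP server up; degraded upstream: " + ", ".join(degraded)
    # Every monitored component is up, so the cause lies outside these checks.
    return "MCP server and upstreams healthy; look at tool logic or the agent"
```

For example, diagnose(True, {"search-api": False, "db": True}) points straight at the search API even though the MCP check itself is green.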
A realistic monitoring setup
For a production MCP server, a complete monitoring setup looks like this:
- Your MCP server — MCP monitor in Vigil, 60s interval
- Each upstream API the server depends on — separate endpoint monitors per dependency
- Any third-party MCP servers you consume — use Vigil’s preset catalog to add them in seconds
This typically means 3–6 monitors for a single MCP server with 2–3 tool dependencies. The setup takes under 10 minutes. The operational visibility it gives you is substantial.
Closing the visibility gap
The teams that run reliable AI agent infrastructure treat MCP servers the same way they treat any other production dependency: explicit monitoring, clear ownership, alerts that fire before users notice.
The gap between “I think our MCP server is running” and “I have a monitor confirming it’s up and speaking the protocol correctly, and my team is notified within minutes of any degradation” is a gap worth closing.
Vigil is available at seppia.ai/products/vigil. Setting up your first MCP monitor takes about two minutes.