Monitoring & Observability
Storm MCP Gateway provides comprehensive monitoring and observability features to help you track the health, performance, and usage of your MCP integrations. This guide covers all monitoring capabilities and how to use them effectively.
Overview
The monitoring system provides:
- Real-time Metrics - Live performance data
- Request Logs - Detailed request/response history
- Analytics - Usage patterns and trends
- Alerts - Proactive issue detection
- Session Tracking - Active connection monitoring
Observability Dashboard
The main Observability page provides a comprehensive view of your system:
Key Metrics Panel
Monitor critical performance indicators:
┌──────────────────┬──────────────────┬──────────────────┐
│ Response Time │ Error Rate │ Active Sessions │
│ 245ms ↓15% │ 0.2% ↑0.1% │ 12 sessions │
└──────────────────┴──────────────────┴──────────────────┘
Response Time
- Average response time across all requests
- Trend indicator (↑↓) shows change over period
- Click for detailed breakdown by gateway/app
Error Rate
- Percentage of failed requests
- Categorized by error type
- Drill down to specific error patterns
Active Sessions
- Current connected clients
- Click to view session details
- Shows geographic distribution
Time Range Selection
Choose your analysis window:
- 1h - Last hour (real-time monitoring)
- 6h - Last 6 hours (short-term trends)
- 24h - Last day (daily patterns)
- 7d - Last week (weekly trends)
- 30d - Last month (long-term analysis)
Performance Monitoring
Response Time Analysis
Track latency across your integrations:
{
  "metrics": {
    "p50": 120,   // Median: 120ms
    "p95": 450,   // 95th percentile: 450ms
    "p99": 1200,  // 99th percentile: 1200ms
    "mean": 245   // Average: 245ms
  }
}
Interpreting Metrics:
- P50 - Half of requests complete faster than this
- P95 - 95% of requests complete faster than this
- P99 - Only 1% of requests are slower than this
- Mean - Arithmetic average across all requests
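These values can be reproduced from raw latency samples. A minimal sketch using the nearest-rank percentile method (the sample latencies are made up for illustration):

```python
def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample that covers
    pct percent of the sorted data."""
    ordered = sorted(samples)
    rank = max(1, round(pct / 100 * len(ordered)))
    return ordered[rank - 1]

# Hypothetical per-request latencies in ms
latencies_ms = [95, 110, 120, 130, 180, 245, 300, 450, 800, 1200]
p50 = percentile(latencies_ms, 50)           # median
p95 = percentile(latencies_ms, 95)           # tail latency
mean = sum(latencies_ms) / len(latencies_ms)
```

P95 and P99 usually matter more than the mean: a handful of slow outliers can leave the average looking healthy while the tail is not.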
Gateway Performance
View performance by gateway:
Gateway: Development Tools
├── GitHub: 180ms avg, 0.1% errors
├── GitLab: 220ms avg, 0.3% errors
└── Docker: 95ms avg, 0.0% errors
Performance Indicators:
- 🟢 Green - Excellent (under 200ms, under 1% errors)
- 🟡 Yellow - Acceptable (200-500ms, 1-5% errors)
- 🔴 Red - Poor (over 500ms, over 5% errors)
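The color thresholds can be expressed as a small classifier. A sketch using the cutoffs above (here yellow means "not excellent, but within the acceptable limits"):

```python
def health_status(avg_ms, error_rate):
    """Map average latency (ms) and error rate (fraction) to a status color."""
    if avg_ms < 200 and error_rate < 0.01:
        return "green"   # Excellent
    if avg_ms <= 500 and error_rate <= 0.05:
        return "yellow"  # Acceptable
    return "red"         # Poor
```

With the gateway figures above, GitHub (180ms, 0.1% errors) classifies green while GitLab (220ms, 0.3% errors) classifies yellow.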
App-Level Metrics
Drill down to individual app performance:
Request Volume
- Requests per minute/hour/day
- Peak usage times
- Trending patterns
Function Usage
- Most called functions
- Slowest operations
- Error-prone endpoints
Resource Consumption
- Token usage (for AI clients)
- Data transfer volumes
- API rate limit usage
Request Logs
Accessing Logs
Navigate to Logging page for detailed request history:
Real-time Stream
- Live request feed
- Automatic updates
- Color-coded by status
Historical Logs
- Search past requests
- Filter by multiple criteria
- Export for analysis
Log Details
Each log entry contains:
{
"timestamp": "2025-01-28T10:30:45Z",
"gateway": "dev-tools",
"app": "github",
"function": "repos_create_issue",
"duration": 245,
"status": "success",
"client": "claude-desktop",
"session": "sess_abc123",
"request": {...},
"response": {...},
"metadata": {
"ip": "203.0.113.45",
"user_agent": "Claude/1.1.0",
"tokens_used": 1250
}
}
Filtering and Search
Filter Options:
- Gateway - Specific gateway
- App - Individual service
- Status - Success/Error/Warning
- Time Range - Custom date range
- Session - Specific client session
Search Capabilities:
function:create_issue AND status:error
gateway:production AND duration:>1000
app:slack AND timestamp:[2025-01-28 TO 2025-01-29]
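Exported logs can be filtered the same way in code. A sketch over plain dictionaries shaped like the log entry above (the sample entries are made up):

```python
def filter_logs(logs, **criteria):
    """Keep entries whose fields equal all given criteria,
    e.g. filter_logs(logs, function="create_issue", status="error")."""
    return [
        entry for entry in logs
        if all(entry.get(field) == value for field, value in criteria.items())
    ]

logs = [
    {"gateway": "production", "function": "create_issue", "status": "error", "duration": 1250},
    {"gateway": "production", "function": "create_issue", "status": "success", "duration": 245},
    {"gateway": "dev-tools", "function": "list_repos", "status": "success", "duration": 95},
]
errors = filter_logs(logs, function="create_issue", status="error")
```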
Log Analysis
Error Patterns
- Identify recurring errors
- Group by error type
- Track error resolution
Performance Issues
- Find slow queries
- Identify bottlenecks
- Optimize problem areas
Usage Patterns
- Peak usage times
- Popular functions
- User behavior analysis
Session Management
Active Sessions View
Monitor all active client connections:
┌─────────────────────────────────────────────────┐
│ Session ID │ Client │ Duration │
├─────────────────────────────────────────────────┤
│ sess_abc123 │ Claude Desktop│ 2h 15m │
│ sess_def456 │ Cursor IDE │ 45m │
│ sess_ghi789 │ API Client │ 3h 30m │
└─────────────────────────────────────────────────┘
Session Details
Click any session to view:
Connection Info
- Client type and version
- Connection time
- IP address and location
Activity Summary
- Total requests
- Functions used
- Error count
Resource Usage
- Token consumption
- Data transferred
- Rate limit status
Session Controls
Manage active sessions:
- Terminate - End specific session
- Block - Prevent reconnection
- Message - Send notification to client
- Export - Download session data
Analytics & Insights
Usage Analytics
Understand how your integrations are used:
Daily Patterns
View your 24-hour usage patterns in the dashboard to identify peak times and optimize resource allocation.
Weekly Trends
Compare week-over-week usage to track growth and identify patterns.
Function Distribution
See which functions are most commonly used to optimize your gateway configuration.
Cost Analysis (Enterprise)
Track resource consumption and costs:
Monthly Usage Summary
├── API Calls: 125,430 ($125.43)
├── Data Transfer: 45.6 GB ($4.56)
├── Token Usage: 15.2M ($152.00)
└── Total Cost: $281.99
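The line items in this summary follow simple per-unit pricing. A sketch using the unit prices implied by the example numbers (hypothetical rates, not a published price list):

```python
# Hypothetical unit prices implied by the example summary above
PRICE_PER_CALL = 0.001       # $ per API call
PRICE_PER_GB = 0.10          # $ per GB transferred
PRICE_PER_M_TOKENS = 10.00   # $ per million tokens

def monthly_cost(api_calls, gb_transferred, million_tokens):
    """Total monthly cost in dollars, rounded to cents."""
    return round(
        api_calls * PRICE_PER_CALL
        + gb_transferred * PRICE_PER_GB
        + million_tokens * PRICE_PER_M_TOKENS,
        2,
    )

total = monthly_cost(125_430, 45.6, 15.2)
```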
User Analytics
Monitor team usage:
User Activity (Last 30 days)
├── alice@team.com: 45,230 requests
├── bob@team.com: 32,150 requests
└── charlie@team.com: 18,420 requests
Alerting & Notifications
Setting Up Alerts
Configure proactive monitoring:
Navigate to Settings → Alerts
Create New Alert
   Name: High Error Rate
   Condition: error_rate > 5%
   Window: 5 minutes
   Action: Email + Slack
Alert Types
- Threshold - Value exceeds limit
- Anomaly - Unusual patterns
- Absence - No data received
- Trend - Sustained direction
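A Threshold alert boils down to a comparison held for the whole window. A sketch (field names mirror the JSON alert examples in this guide; the evaluation logic itself is an assumption about how such alerts typically behave):

```python
OPERATORS = {
    ">": lambda value, limit: value > limit,
    "<": lambda value, limit: value < limit,
}

def should_fire(alert, window_samples):
    """Fire only if every sample in the window breaches the threshold,
    i.e. the condition held for the full duration."""
    breaches = OPERATORS[alert["operator"]]
    return all(breaches(s, alert["threshold"]) for s in window_samples)

alert = {"name": "High Error Rate", "metric": "error_rate",
         "operator": ">", "threshold": 0.05, "duration": "10m"}
fired = should_fire(alert, [0.06, 0.08, 0.09])   # sustained breach
quiet = should_fire(alert, [0.08, 0.04, 0.09])   # dipped below, no alert
```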
Alert Conditions
Common alert configurations:
Performance Alerts
{
"name": "Slow Response",
"metric": "response_time_p95",
"operator": ">",
"threshold": 1000,
"duration": "5m"
}
Error Alerts
{
"name": "High Error Rate",
"metric": "error_rate",
"operator": ">",
"threshold": 0.05,
"duration": "10m"
}
Usage Alerts
{
"name": "Rate Limit Warning",
"metric": "rate_limit_remaining",
"operator": "<",
"threshold": 100,
"duration": "1m"
}
Notification Channels
Configure where alerts are sent:
- Email - Individual or team addresses
- Slack - Channel notifications
- Discord - Server notifications
- PagerDuty - Incident management
Debugging & Troubleshooting
Debug Mode
Enable detailed logging for investigation:
Gateway Debug
- Settings → Debug Mode
- Verbose request/response logs
- Performance profiling
App Debug
- App Settings → Enable Debug
- Detailed error messages
- Stack traces
Trace Requests
Follow requests through the system:
Request ID: req_xyz789
├── Received: 10:30:45.123
├── Gateway Processing: +12ms
├── App Authentication: +45ms
├── Function Execution: +180ms
├── Response Sent: +8ms
└── Total Duration: 245ms
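Stage durations in a trace are additive, which makes the slowest stage easy to spot. A quick sketch with the stage timings above:

```python
stages = {
    "gateway_processing": 12,
    "app_authentication": 45,
    "function_execution": 180,
    "response_sent": 8,
}
total_ms = sum(stages.values())        # matches the trace total
slowest = max(stages, key=stages.get)  # where to optimize first
```

Here function execution accounts for roughly 73% of the 245ms total, so that is where optimization effort pays off first.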
Common Issues
High Latency
- Check network connectivity
- Review gateway configuration
- Optimize function selection
- Consider caching
Frequent Errors
- Verify authentication
- Check rate limits
- Review error patterns
- Update configurations
Missing Data
- Confirm client connection
- Check data retention settings
- Verify time zone settings
- Review filters
Exporting & Reporting
Data Export
Export monitoring data for analysis:
Formats Available:
- CSV - Spreadsheet analysis
- JSON - Programmatic processing
- PDF - Reports and documentation
Export Options:
Time Range: Last 7 days
Data: Logs + Metrics
Filters: Gateway=production
Format: CSV
Scheduled Reports
Set up automated reporting:
Create Report
- Choose metrics
- Set schedule
- Add recipients
Report Types
- Daily summary
- Weekly trends
- Monthly analytics
- Custom dashboards
API Access
Programmatic access to monitoring data:
# Fetch metrics
curl -H "Authorization: Bearer YOUR_API_KEY" \
https://api.stormmcp.ai/v1/metrics?range=24h
# Get logs
curl -H "Authorization: Bearer YOUR_API_KEY" \
https://api.stormmcp.ai/v1/logs?gateway=dev-tools
Best Practices
Monitoring Strategy
Define Key Metrics
- Identify critical indicators
- Set baseline values
- Track trends over time
Set Appropriate Alerts
- Avoid alert fatigue
- Focus on actionable items
- Use escalation policies
Regular Reviews
- Weekly performance reviews
- Monthly trend analysis
- Quarterly optimization
Performance Optimization
Based on monitoring insights:
Optimize Slow Functions
- Identify bottlenecks
- Cache frequent queries
- Batch operations
Reduce Error Rates
- Fix common errors first
- Improve error handling
- Add retry logic
Balance Load
- Distribute across gateways
- Use multiple instances
- Implement rate limiting
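The "add retry logic" item above is most commonly implemented as exponential backoff on transient failures. A generic sketch (the flaky operation is a stand-in for any upstream call):

```python
import time

def with_retries(operation, max_attempts=3, base_delay=0.5):
    """Run operation(), retrying on exception with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts:
                raise   # out of attempts: surface the error
            time.sleep(base_delay * 2 ** (attempt - 1))

attempts = []

def flaky_call():
    """Stand-in that fails twice, then succeeds."""
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("transient failure")
    return "ok"

result = with_retries(flaky_call, max_attempts=5, base_delay=0.01)
```

Only retry operations that are safe to repeat; retrying a non-idempotent call (e.g. creating an issue) can duplicate work.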
Next Steps
With monitoring configured: