Monitoring & Observability
Storm MCP Gateway provides comprehensive monitoring and observability features to help you track the health, performance, and usage of your MCP integrations. This guide covers all monitoring capabilities and how to use them effectively.
Overview
The monitoring system provides:
- Real-time Metrics - Live performance data
- Request Logs - Detailed request/response history
- Analytics - Usage patterns and trends
- Alerts - Proactive issue detection
- Session Tracking - Active connection monitoring
Observability Dashboard
The main Observability page provides a comprehensive view of your system:
Key Metrics Panel
Monitor critical performance indicators:
┌──────────────────┬──────────────────┬──────────────────┐
│ Response Time │ Error Rate │ Active Sessions │
│ 245ms ↓15% │ 0.2% ↑0.1% │ 12 sessions │
└──────────────────┴──────────────────┴──────────────────┘
Response Time
- Average response time across all requests
- Trend indicator (↑↓) shows change over period
- Click for detailed breakdown by gateway/app
Error Rate
- Percentage of failed requests
- Categorized by error type
- Drill down to specific error patterns
Active Sessions
- Current connected clients
- Click to view session details
- Shows geographic distribution
Time Range Selection
Choose your analysis window:
- 1h - Last hour (real-time monitoring)
- 6h - Last 6 hours (short-term trends)
- 24h - Last day (daily patterns)
- 7d - Last week (weekly trends)
- 30d - Last month (long-term analysis)
Performance Monitoring
Response Time Analysis
Track latency across your integrations:
{
  "metrics": {
    "p50": 120,   // Median: 120ms
    "p95": 450,   // 95th percentile: 450ms
    "p99": 1200,  // 99th percentile: 1200ms
    "mean": 245   // Average: 245ms
  }
}
Interpreting Metrics:
- P50 - Half of requests complete faster than this
- P95 - 95% of requests complete faster than this
- P99 - Only 1% of requests are slower than this
- Mean - Arithmetic average across all requests
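These values can be reproduced from raw latency samples. A minimal sketch using the nearest-rank percentile method (the sample latencies are made up for illustration):

```python
def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample that covers
    pct percent of the sorted data."""
    ordered = sorted(samples)
    rank = max(1, round(pct / 100 * len(ordered)))
    return ordered[rank - 1]

# Hypothetical per-request latencies in ms
latencies_ms = [95, 110, 120, 130, 180, 245, 300, 450, 800, 1200]
p50 = percentile(latencies_ms, 50)           # median
p95 = percentile(latencies_ms, 95)           # tail latency
mean = sum(latencies_ms) / len(latencies_ms)
```

P95 and P99 usually matter more than the mean: a handful of slow outliers can leave the average looking healthy while the tail is not.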
Gateway Performance
View performance by gateway:
Gateway: Development Tools
├── GitHub: 180ms avg, 0.1% errors
├── GitLab: 220ms avg, 0.3% errors
└── Docker: 95ms avg, 0.0% errors
Performance Indicators:
- 🟢 Green - Excellent (under 200ms, under 1% errors)
- 🟡 Yellow - Acceptable (200-500ms, 1-5% errors)
- 🔴 Red - Poor (over 500ms, over 5% errors)
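The color thresholds can be expressed as a small classifier. A sketch using the cutoffs above (here yellow means "not excellent, but within the acceptable limits"):

```python
def health_status(avg_ms, error_rate):
    """Map average latency (ms) and error rate (fraction) to a status color."""
    if avg_ms < 200 and error_rate < 0.01:
        return "green"   # Excellent
    if avg_ms <= 500 and error_rate <= 0.05:
        return "yellow"  # Acceptable
    return "red"         # Poor
```

With the gateway figures above, GitHub (180ms, 0.1% errors) classifies green while GitLab (220ms, 0.3% errors) classifies yellow.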
App-Level Metrics
Drill down to individual app performance:
Request Volume
- Requests per minute/hour/day
- Peak usage times
- Trending patterns
Function Usage
- Most called functions
- Slowest operations
- Error-prone endpoints
Resource Consumption
- Token usage (for AI clients)
- Data transfer volumes
- API rate limit usage
Request Logs
Accessing Logs
Navigate to Logging page for detailed request history:
Real-time Stream
- Live request feed
- Automatic updates
- Color-coded by status
Historical Logs
- Search past requests
- Filter by multiple criteria
- Export for analysis
Log Details
Each log entry contains:
{
"timestamp": "2025-01-28T10:30:45Z",
"gateway": "dev-tools",
"app": "github",
"function": "repos_create_issue",
"duration": 245,
"status": "success",
"client": "claude-desktop",
"session": "sess_abc123",
"request": {...},
"response": {...},
"metadata": {
"ip": "203.0.113.45",
"user_agent": "Claude/1.1.0",
"tokens_used": 1250
}
}
Filtering and Search
Filter Options:
- Gateway - Specific gateway
- App - Individual service
- Status - Success/Error/Warning
- Time Range - Custom date range
- Session - Specific client session
Search Capabilities:
function:create_issue AND status:error
gateway:production AND duration:>1000
app:slack AND timestamp:[2025-01-28 TO 2025-01-29]
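Exported logs can be filtered the same way in code. A sketch over plain dictionaries shaped like the log entry above (the sample entries are made up):

```python
def filter_logs(logs, **criteria):
    """Keep entries whose fields equal all given criteria,
    e.g. filter_logs(logs, function="create_issue", status="error")."""
    return [
        entry for entry in logs
        if all(entry.get(field) == value for field, value in criteria.items())
    ]

logs = [
    {"gateway": "production", "function": "create_issue", "status": "error", "duration": 1250},
    {"gateway": "production", "function": "create_issue", "status": "success", "duration": 245},
    {"gateway": "dev-tools", "function": "list_repos", "status": "success", "duration": 95},
]
errors = filter_logs(logs, function="create_issue", status="error")
```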
Log Analysis
Error Patterns
- Identify recurring errors
- Group by error type
- Track error resolution
Performance Issues
- Find slow queries
- Identify bottlenecks
- Optimize problem areas
Usage Patterns
- Peak usage times
- Popular functions
- User behavior analysis
Session Management
Active Sessions View
Monitor all active client connections:
┌─────────────────────────────────────────────────┐
│ Session ID │ Client │ Duration │
├─────────────────────────────────────────────────┤
│ sess_abc123 │ Claude Desktop│ 2h 15m │
│ sess_def456 │ Cursor IDE │ 45m │
│ sess_ghi789 │ API Client │ 3h 30m │
└─────────────────────────────────────────────────┘
Session Details
Click any session to view:
Connection Info
- Client type and version
- Connection time
- IP address and location
Activity Summary
- Total requests
- Functions used
- Error count
Resource Usage
- Token consumption
- Data transferred
- Rate limit status
Session Controls
Manage active sessions:
- Terminate - End specific session
- Block - Prevent reconnection
- Message - Send notification to client
- Export - Download session data
Analytics & Insights
Usage Analytics
Understand how your integrations are used:
Daily Patterns
View your 24-hour usage patterns in the dashboard to identify peak times and optimize resource allocation.
Weekly Trends
Compare week-over-week usage to track growth and identify patterns.
Function Distribution
See which functions are most commonly used to optimize your gateway configuration.
Cost Analysis (Enterprise)
Track resource consumption and costs:
Monthly Usage Summary
├── API Calls: 125,430 ($125.43)
├── Data Transfer: 45.6 GB ($4.56)
├── Token Usage: 15.2M ($152.00)
└── Total Cost: $281.99
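The line items in this summary follow simple per-unit pricing. A sketch using the unit prices implied by the example numbers (hypothetical rates, not a published price list):

```python
# Hypothetical unit prices implied by the example summary above
PRICE_PER_CALL = 0.001       # $ per API call
PRICE_PER_GB = 0.10          # $ per GB transferred
PRICE_PER_M_TOKENS = 10.00   # $ per million tokens

def monthly_cost(api_calls, gb_transferred, million_tokens):
    """Total monthly cost in dollars, rounded to cents."""
    return round(
        api_calls * PRICE_PER_CALL
        + gb_transferred * PRICE_PER_GB
        + million_tokens * PRICE_PER_M_TOKENS,
        2,
    )

total = monthly_cost(125_430, 45.6, 15.2)
```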
User Analytics
Monitor team usage:
User Activity (Last 30 days)
├── alice@team.com: 45,230 requests
├── bob@team.com: 32,150 requests
└── charlie@team.com: 18,420 requests
Alerting & Notifications
Setting Up Alerts
Configure proactive monitoring:
Navigate to Settings → Alerts
Create New Alert
   Name: High Error Rate
   Condition: error_rate > 5%
   Window: 5 minutes
   Action: Email + Slack
Alert Types
- Threshold - Value exceeds limit
- Anomaly - Unusual patterns
- Absence - No data received
- Trend - Sustained direction
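A Threshold alert boils down to a comparison held for the whole window. A sketch (field names mirror the JSON alert examples in this guide; the evaluation logic itself is an assumption about how such alerts typically behave):

```python
OPERATORS = {
    ">": lambda value, limit: value > limit,
    "<": lambda value, limit: value < limit,
}

def should_fire(alert, window_samples):
    """Fire only if every sample in the window breaches the threshold,
    i.e. the condition held for the full duration."""
    breaches = OPERATORS[alert["operator"]]
    return all(breaches(s, alert["threshold"]) for s in window_samples)

alert = {"name": "High Error Rate", "metric": "error_rate",
         "operator": ">", "threshold": 0.05, "duration": "10m"}
fired = should_fire(alert, [0.06, 0.08, 0.09])   # sustained breach
quiet = should_fire(alert, [0.08, 0.04, 0.09])   # dipped below, no alert
```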
Alert Conditions
Common alert configurations:
Performance Alerts
{
"name": "Slow Response",
"metric": "response_time_p95",
"operator": ">",
"threshold": 1000,
"duration": "5m"
}
Error Alerts
{
"name": "High Error Rate",
"metric": "error_rate",
"operator": ">",
"threshold": 0.05,
"duration": "10m"
}
Usage Alerts
{
"name": "Rate Limit Warning",
"metric": "rate_limit_remaining",
"operator": "<",
"threshold": 100,
"duration": "1m"
}
Notification Channels
Configure where alerts are sent:
- Email - Individual or team addresses
- Slack - Channel notifications
- Discord - Server notifications
- PagerDuty - Incident management
Debugging & Troubleshooting
Debug Mode
Enable detailed logging for investigation:
Gateway Debug
- Settings → Debug Mode
- Verbose request/response logs
- Performance profiling
App Debug
- App Settings → Enable Debug
- Detailed error messages
- Stack traces
Trace Requests
Follow requests through the system:
Request ID: req_xyz789
├── Received: 10:30:45.123
├── Gateway Processing: +12ms
├── App Authentication: +45ms
├── Function Execution: +180ms
├── Response Sent: +8ms
└── Total Duration: 245ms
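Stage durations in a trace are additive, which makes the slowest stage easy to spot. A quick sketch with the stage timings above:

```python
stages = {
    "gateway_processing": 12,
    "app_authentication": 45,
    "function_execution": 180,
    "response_sent": 8,
}
total_ms = sum(stages.values())        # matches the trace total
slowest = max(stages, key=stages.get)  # where to optimize first
```

Here function execution accounts for roughly 73% of the 245ms total, so that is where optimization effort pays off first.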
Common Issues
High Latency
- Check network connectivity
- Review gateway configuration
- Optimize function selection
- Consider caching
Frequent Errors
- Verify authentication
- Check rate limits
- Review error patterns
- Update configurations
Missing Data
- Confirm client connection
- Check data retention settings
- Verify time zone settings
- Review filters
Exporting & Reporting
Data Export
Export monitoring data for analysis:
Formats Available:
- CSV - Spreadsheet analysis
- JSON - Programmatic processing
- PDF - Reports and documentation
Export Options:
Time Range: Last 7 days
Data: Logs + Metrics
Filters: Gateway=production
Format: CSV
Scheduled Reports
Set up automated reporting:
Create Report
- Choose metrics
- Set schedule
- Add recipients
Report Types
- Daily summary
- Weekly trends
- Monthly analytics
- Custom dashboards
API Access
Programmatic access to monitoring data:
# Fetch metrics
curl -H "Authorization: Bearer YOUR_API_KEY" \
https://api.stormmcp.ai/v1/metrics?range=24h
# Get logs
curl -H "Authorization: Bearer YOUR_API_KEY" \
https://api.stormmcp.ai/v1/logs?gateway=dev-tools
Best Practices
Monitoring Strategy
Define Key Metrics
- Identify critical indicators
- Set baseline values
- Track trends over time
Set Appropriate Alerts
- Avoid alert fatigue
- Focus on actionable items
- Use escalation policies
Regular Reviews
- Weekly performance reviews
- Monthly trend analysis
- Quarterly optimization
Performance Optimization
Based on monitoring insights:
Optimize Slow Functions
- Identify bottlenecks
- Cache frequent queries
- Batch operations
Reduce Error Rates
- Fix common errors first
- Improve error handling
- Add retry logic
Balance Load
- Distribute across gateways
- Use multiple instances
- Implement rate limiting
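The "add retry logic" item above is most commonly implemented as exponential backoff on transient failures. A generic sketch (the flaky operation is a stand-in for any upstream call):

```python
import time

def with_retries(operation, max_attempts=3, base_delay=0.5):
    """Run operation(), retrying on exception with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts:
                raise   # out of attempts: surface the error
            time.sleep(base_delay * 2 ** (attempt - 1))

attempts = []

def flaky_call():
    """Stand-in that fails twice, then succeeds."""
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("transient failure")
    return "ok"

result = with_retries(flaky_call, max_attempts=5, base_delay=0.01)
```

Only retry operations that are safe to repeat; retrying a non-idempotent call (e.g. creating an issue) can duplicate work.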
Next Steps
With monitoring configured: