Storm Tools

Monitoring & Observability

Storm MCP Gateway provides comprehensive monitoring and observability features to help you track the health, performance, and usage of your MCP integrations. This guide covers all monitoring capabilities and how to use them effectively.

Overview

The monitoring system provides:

  • Real-time Metrics - Live performance data
  • Request Logs - Detailed request/response history
  • Analytics - Usage patterns and trends
  • Alerts - Proactive issue detection
  • Session Tracking - Active connection monitoring

Observability Dashboard

The main Observability page provides a comprehensive view of your system:

Key Metrics Panel

Monitor critical performance indicators:

```
┌──────────────────┬──────────────────┬──────────────────┐
│ Response Time    │ Error Rate       │ Active Sessions  │
│ 245ms ↓15%       │ 0.2% ↑0.1%       │ 12 sessions      │
└──────────────────┴──────────────────┴──────────────────┘
```

Response Time

  • Average response time across all requests
  • Trend indicator (↑/↓) shows the change over the selected period
  • Click for detailed breakdown by gateway/app

Error Rate

  • Percentage of failed requests
  • Categorized by error type
  • Drill down to specific error patterns

Active Sessions

  • Currently connected clients
  • Click to view session details
  • Shows geographic distribution

Time Range Selection

Choose your analysis window:

  • 1h - Last hour (real-time monitoring)
  • 6h - Last 6 hours (short-term trends)
  • 24h - Last day (daily patterns)
  • 7d - Last week (weekly trends)
  • 30d - Last month (long-term analysis)
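Each preset is just a fixed-length window ending now. The sketch below converts a preset label into a `timedelta` and computes where the window starts, assuming labels are always a whole number followed by `h` (hours) or `d` (days); `parse_range` and `window_start` are hypothetical helper names, not part of the product. A label such as `24h` is also what the `range` query parameter carries in the API Access examples.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Suffix -> timedelta keyword. Assumption: presets look like "<number>h" or "<number>d".
UNITS = {"h": "hours", "d": "days"}


def parse_range(label: str) -> timedelta:
    """Convert a preset label such as '6h' or '7d' into a timedelta."""
    amount, unit = label[:-1], label[-1:]
    if unit not in UNITS or not amount.isdigit():
        raise ValueError(f"unsupported time range: {label!r}")
    return timedelta(**{UNITS[unit]: int(amount)})


def window_start(label: str, now: Optional[datetime] = None) -> datetime:
    """Start of the analysis window that ends at `now` (defaults to the current UTC time)."""
    now = now or datetime.now(timezone.utc)
    return now - parse_range(label)
```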

Performance Monitoring

Response Time Analysis

Track latency across your integrations (all values in milliseconds):

```json
{
  "metrics": {
    "p50": 120,
    "p95": 450,
    "p99": 1200,
    "mean": 245
  }
}
```

Interpreting Metrics:

  • P50 - Half of requests complete faster than this (the median)
  • P95 - 95% of requests complete faster than this
  • P99 - Only 1% of requests are slower than this
  • Mean - Arithmetic average; a few very slow requests can pull it well above P50
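To make the difference between percentiles and the mean concrete, the sketch below computes the same four figures from a list of request durations. It uses the nearest-rank percentile definition; the gateway's own interpolation method is not documented here, so treat the exact values as illustrative. In the example, one slow request out of 100 pushes the mean from 100 ms to 199 ms while P50 stays at 100 ms.

```python
import math
from statistics import mean


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples <= it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))  # 1-based rank
    return ordered[rank - 1]


def latency_summary(durations_ms):
    """Summarize request durations (ms) in the same shape as the metrics example above."""
    return {
        "p50": percentile(durations_ms, 50),
        "p95": percentile(durations_ms, 95),
        "p99": percentile(durations_ms, 99),
        "mean": mean(durations_ms),
    }


# 99 fast requests and one very slow one: the mean moves a lot, P50 does not.
summary = latency_summary([100] * 99 + [10_000])
print(summary)
```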

Gateway Performance

View performance by gateway:

```
Gateway: Development Tools
├── GitHub: 180ms avg, 0.1% errors
├── GitLab: 220ms avg, 0.3% errors
└── Docker: 95ms avg, 0.0% errors
```

Performance Indicators:

  • 🟢 Green - Excellent (under 200ms, under 1% errors)
  • 🟡 Yellow - Acceptable (200-500ms, 1-5% errors)
  • 🔴 Red - Poor (over 500ms, over 5% errors)
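The list gives separate bands for latency and error rate but does not say how they combine or whether boundary values (exactly 200 ms, exactly 5%) fall in the better or worse band. The sketch below assumes the overall indicator is the worse of the two tiers and that the yellow bands are inclusive; the function names are hypothetical. Applied to the Development Tools gateway above, it rates GitHub and Docker green and GitLab yellow.

```python
# Tier boundaries from the indicator list above (average latency in ms, error rate in %).
SEVERITY = ["green", "yellow", "red"]


def latency_tier(avg_ms: float) -> str:
    if avg_ms < 200:
        return "green"
    if avg_ms <= 500:
        return "yellow"
    return "red"


def error_tier(error_pct: float) -> str:
    if error_pct < 1:
        return "green"
    if error_pct <= 5:
        return "yellow"
    return "red"


def health(avg_ms: float, error_pct: float) -> str:
    """Assumption: the overall indicator is the worse of the two tiers."""
    return max(latency_tier(avg_ms), error_tier(error_pct), key=SEVERITY.index)


for app, avg_ms, error_pct in [("GitHub", 180, 0.1), ("GitLab", 220, 0.3), ("Docker", 95, 0.0)]:
    print(app, health(avg_ms, error_pct))
```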

App-Level Metrics

Drill down to individual app performance:

  1. Request Volume

    • Requests per minute/hour/day
    • Peak usage times
    • Trending patterns
  2. Function Usage

    • Most called functions
    • Slowest operations
    • Error-prone endpoints
  3. Resource Consumption

    • Token usage (for AI clients)
    • Data transfer volumes
    • API rate limit usage
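The Function Usage figures (most called, slowest, most error-prone) can be derived from request logs. The sketch below aggregates them per function, assuming entries carry the `function`, `duration` (ms) and `status` fields shown in the log entry format under Request Logs, and that failed calls have `status` set to `"error"`; the function name `list_repos` in the sample data is hypothetical.

```python
from collections import defaultdict


def function_usage(entries):
    """Per-function call count, mean duration (ms) and error rate."""
    calls = defaultdict(int)
    total_ms = defaultdict(int)
    errors = defaultdict(int)
    for entry in entries:
        fn = entry["function"]
        calls[fn] += 1
        total_ms[fn] += entry["duration"]
        if entry["status"] == "error":  # assumption: lowercase "error" marks a failure
            errors[fn] += 1
    return {
        fn: {
            "calls": calls[fn],
            "mean_ms": total_ms[fn] / calls[fn],
            "error_rate": errors[fn] / calls[fn],
        }
        for fn in calls
    }


entries = [
    {"function": "repos_create_issue", "duration": 240, "status": "success"},
    {"function": "repos_create_issue", "duration": 260, "status": "error"},
    {"function": "list_repos", "duration": 90, "status": "success"},
]
stats = function_usage(entries)
most_called = max(stats, key=lambda f: stats[f]["calls"])
slowest = max(stats, key=lambda f: stats[f]["mean_ms"])
```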

Request Logs

Accessing Logs

Navigate to the Logging page for detailed request history:

  1. Real-time Stream

    • Live request feed
    • Automatic updates
    • Color-coded by status
  2. Historical Logs

    • Search past requests
    • Filter by multiple criteria
    • Export for analysis

Log Details

Each log entry contains:

```json
{
  "timestamp": "2025-01-28T10:30:45Z",
  "gateway": "dev-tools",
  "app": "github",
  "function": "repos_create_issue",
  "duration": 245,
  "status": "success",
  "client": "claude-desktop",
  "session": "sess_abc123",
  "request": {...},
  "response": {...},
  "metadata": {
    "ip": "203.0.113.45",
    "user_agent": "Claude/1.1.0",
    "tokens_used": 1250
  }
}
```

Filtering and Search

Filter Options:

  • Gateway - Specific gateway
  • App - Individual service
  • Status - Success/Error/Warning
  • Time Range - Custom date range
  • Session - Specific client session

Search Capabilities:

```
function:create_issue AND status:error
gateway:production AND duration:>1000
app:slack AND timestamp:[2025-01-28 TO 2025-01-29]
```
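For offline analysis of exported logs, a subset of this query syntax can be reproduced client-side. The sketch below is a hypothetical re-implementation, not the Logging page's search engine: it supports `AND`-ed terms of the form `field:value`, `field:>N` and `field:<N`, treats plain values as substring matches (so `create_issue` matches `repos_create_issue`), and rejects the `[A TO B]` range form rather than guessing its inclusivity.

```python
import re

# field:value, field:>number or field:<number (an assumed subset of the search syntax)
TERM = re.compile(r"(\w+):([<>]?)(\S+)")


def matches(entry: dict, query: str) -> bool:
    """Check one log entry against AND-ed search terms."""
    for term in query.split(" AND "):
        m = TERM.fullmatch(term.strip())
        if m is None:
            raise ValueError(f"unsupported term: {term!r}")
        field, op, value = m.groups()
        if field not in entry:
            return False
        actual = entry[field]
        if op == ">":
            ok = float(actual) > float(value)
        elif op == "<":
            ok = float(actual) < float(value)
        else:
            ok = value in str(actual)  # assumption: substring match for plain values
        if not ok:
            return False
    return True


entry = {"gateway": "production", "function": "repos_create_issue", "status": "error", "duration": 1500}
```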

Log Analysis

Error Patterns

  • Identify recurring errors
  • Group by error type
  • Track error resolution

Performance Issues

  • Find slow queries
  • Identify bottlenecks
  • Optimize problem areas

Usage Patterns

  • Peak usage times
  • Popular functions
  • User behavior analysis
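Grouping errors by type works best when messages that differ only in variable parts (counts, IDs, quoted names) collapse into one pattern. The sketch below normalizes messages with two regular expressions and counts patterns per app. The log entry format above has no error-message field, so the `error` key is a hypothetical stand-in for wherever your error text lives.

```python
import re
from collections import Counter


def signature(message: str) -> str:
    """Collapse variable parts so recurring errors share one key."""
    message = re.sub(r"'[^']*'|\"[^\"]*\"", "<str>", message)  # quoted values
    return re.sub(r"\d+", "<n>", message)  # numbers, durations, counts


def error_patterns(entries):
    """Most frequent (app, error pattern) pairs among failed requests."""
    counts = Counter(
        (entry["app"], signature(entry.get("error", "")))  # "error" field is hypothetical
        for entry in entries
        if entry["status"] == "error"
    )
    return counts.most_common()


entries = [
    {"app": "github", "status": "error", "error": "Rate limit exceeded, retry in 30s"},
    {"app": "github", "status": "error", "error": "Rate limit exceeded, retry in 12s"},
    {"app": "slack", "status": "error", "error": "Channel 'ops' not found"},
    {"app": "github", "status": "success"},
]
patterns = error_patterns(entries)
```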

Session Management

Active Sessions View

Monitor all active client connections:

```
┌───────────────┬─────────────────┬───────────────┐
│ Session ID    │ Client          │ Duration      │
├───────────────┼─────────────────┼───────────────┤
│ sess_abc123   │ Claude Desktop  │ 2h 15m        │
│ sess_def456   │ Cursor IDE      │ 45m           │
│ sess_ghi789   │ API Client      │ 3h 30m        │
└───────────────┴─────────────────┴───────────────┘
```

Session Details

Click any session to view:

  1. Connection Info

    • Client type and version
    • Connection time
    • IP address and location
  2. Activity Summary

    • Total requests
    • Functions used
    • Error count
  3. Resource Usage

    • Token consumption
    • Data transferred
    • Rate limit status

Session Controls

Manage active sessions:

  • Terminate - End specific session
  • Block - Prevent reconnection
  • Message - Send notification to client
  • Export - Download session data

Analytics & Insights

Usage Analytics

Understand how your integrations are used:

  • Daily Patterns - View your 24-hour usage patterns in the dashboard to identify peak times and optimize resource allocation.
  • Weekly Trends - Compare week-over-week usage to track growth and identify patterns.
  • Function Distribution - See which functions are most commonly used to optimize your gateway configuration.

Cost Analysis (Enterprise)

Track resource consumption and costs:

```
Monthly Usage Summary
├── API Calls: 125,430 ($125.43)
├── Data Transfer: 45.6 GB ($4.56)
├── Token Usage: 15.2M ($152.00)
└── Total Cost: $281.99
```
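The figures in this summary imply unit rates of $0.001 per API call, $0.10 per GB transferred and $10.00 per million tokens. The sketch below reproduces the total from those rates so you can estimate costs for other usage levels; the rates are back-calculated from the example above and are an assumption, not published Storm MCP pricing. `Decimal` avoids floating-point rounding drift in currency arithmetic.

```python
from decimal import ROUND_HALF_UP, Decimal

# Hypothetical unit prices implied by the example summary; NOT official pricing.
RATES = {
    "api_calls": Decimal("0.001"),        # $ per API call
    "data_transfer_gb": Decimal("0.10"),  # $ per GB
    "tokens_millions": Decimal("10.00"),  # $ per 1M tokens
}
CENT = Decimal("0.01")


def monthly_cost(usage):
    """Per-line and total cost in dollars, each line rounded to the cent."""
    lines = {
        key: (Decimal(str(usage[key])) * rate).quantize(CENT, rounding=ROUND_HALF_UP)
        for key, rate in RATES.items()
    }
    lines["total"] = sum(lines.values(), Decimal("0"))
    return lines


cost = monthly_cost({"api_calls": 125_430, "data_transfer_gb": 45.6, "tokens_millions": 15.2})
```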

User Analytics

Monitor team usage:

```
User Activity (Last 30 days)
├── alice@team.com: 45,230 requests
├── bob@team.com: 32,150 requests
└── charlie@team.com: 18,420 requests
```

Alerting & Notifications

Setting Up Alerts

Configure proactive monitoring:

  1. Navigate to Settings → Alerts

  2. Create New Alert

    ```
    Name: High Error Rate
    Condition: error_rate > 5%
    Window: 5 minutes
    Action: Email + Slack
    ```
  3. Alert Types

    • Threshold - Value exceeds limit
    • Anomaly - Unusual patterns
    • Absence - No data received
    • Trend - Sustained direction
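This page does not describe how Anomaly alerts decide that a pattern is unusual. As one common approach (swapped in here for illustration, not confirmed as Storm's method), a z-score test flags a new value that lies more than a few standard deviations from recent history. The function name, the 3.0 default and the sample numbers are assumptions.

```python
from statistics import mean, stdev


def is_anomalous(history, latest, z_threshold=3.0):
    """Flag `latest` if it sits more than `z_threshold` standard deviations from the history mean."""
    if len(history) < 2:
        return False  # not enough data to estimate spread
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return latest != mu  # flat history: any change is unusual
    return abs(latest - mu) / sigma > z_threshold


# Requests per minute over recent intervals (illustrative numbers).
baseline = [100, 102, 98, 101, 99]
```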

Alert Conditions

Common alert configurations:

Performance Alerts

```json
{
  "name": "Slow Response",
  "metric": "response_time_p95",
  "operator": ">",
  "threshold": 1000,
  "duration": "5m"
}
```

Error Alerts

```json
{
  "name": "High Error Rate",
  "metric": "error_rate",
  "operator": ">",
  "threshold": 0.05,
  "duration": "10m"
}
```

Usage Alerts

```json
{
  "name": "Rate Limit Warning",
  "metric": "rate_limit_remaining",
  "operator": "<",
  "threshold": 100,
  "duration": "1m"
}
```
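The `duration` field is what keeps a single bad reading from paging anyone: the condition has to hold for the whole window. The sketch below evaluates rules in the JSON shape above against timestamped samples, assuming "hold for the duration" means every sample in the trailing window breaches the threshold and the samples reach back to the start of that window. `should_fire` and the sample layout are hypothetical, not the gateway's evaluator.

```python
import operator

OPS = {">": operator.gt, "<": operator.lt, ">=": operator.ge, "<=": operator.le}
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def duration_seconds(text: str) -> int:
    """Parse durations such as '1m', '5m' or '10m'."""
    return int(text[:-1]) * UNIT_SECONDS[text[-1]]


def should_fire(rule: dict, samples, now: float) -> bool:
    """samples: (unix_seconds, value) pairs for rule['metric'], oldest first."""
    start = now - duration_seconds(rule["duration"])
    if not samples or samples[0][0] > start:
        return False  # history does not yet cover the whole window
    breaches = OPS[rule["operator"]]
    recent = [value for t, value in samples if start <= t <= now]
    return bool(recent) and all(breaches(value, rule["threshold"]) for value in recent)


slow = {"name": "Slow Response", "metric": "response_time_p95",
        "operator": ">", "threshold": 1000, "duration": "5m"}
samples = [(t, 1200) for t in range(0, 601, 60)]  # one P95 reading per minute
```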

Notification Channels

Configure where alerts are sent:

  • Email - Individual or team addresses
  • Slack - Channel notifications
  • Discord - Server notifications
  • PagerDuty - Incident management

Debugging & Troubleshooting

Debug Mode

Enable detailed logging for investigation:

  1. Gateway Debug

    • Settings → Debug Mode
    • Verbose request/response logs
    • Performance profiling
  2. App Debug

    • App Settings → Enable Debug
    • Detailed error messages
    • Stack traces

Trace Requests

Follow requests through the system:

```
Request ID: req_xyz789
├── Received: 10:30:45.123
├── Gateway Processing: +12ms
├── App Authentication: +45ms
├── Function Execution: +180ms
├── Response Sent: +8ms
└── Total Duration: 245ms
```
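Reading a trace is mostly arithmetic: each stage's offset is added to the receive time, the stages sum to the total, and the largest stage is the bottleneck. The sketch below reproduces the trace above, assuming each `+Nms` value is the time spent in that stage (they do add up to the 245 ms total). The `timeline` helper is hypothetical; in this trace, function execution accounts for 180 of the 245 ms.

```python
from datetime import datetime, timedelta

STAGES = [
    ("Gateway Processing", 12),
    ("App Authentication", 45),
    ("Function Execution", 180),
    ("Response Sent", 8),
]


def timeline(received: str, stages):
    """Return (stage, finished_at) pairs and the total duration in ms."""
    t = datetime.strptime(received, "%H:%M:%S.%f")
    finished = []
    for name, ms in stages:
        t += timedelta(milliseconds=ms)
        finished.append((name, t.strftime("%H:%M:%S.%f")[:-3]))  # trim to milliseconds
    return finished, sum(ms for _, ms in stages)


steps, total_ms = timeline("10:30:45.123", STAGES)
bottleneck = max(STAGES, key=lambda stage: stage[1])
```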

Common Issues

High Latency

  1. Check network connectivity
  2. Review gateway configuration
  3. Optimize function selection
  4. Consider caching

Frequent Errors

  1. Verify authentication
  2. Check rate limits
  3. Review error patterns
  4. Update configurations

Missing Data

  1. Confirm client connection
  2. Check data retention settings
  3. Verify time zone settings
  4. Review filters

Exporting & Reporting

Data Export

Export monitoring data for analysis:

Formats Available:

  • CSV - Spreadsheet analysis
  • JSON - Programmatic processing
  • PDF - Reports and documentation

Export Options:

```
Time Range: Last 7 days
Data: Logs + Metrics
Filters: Gateway=production
Format: CSV
```
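If you already hold log entries as JSON (for example from a JSON export), flattening them to CSV for spreadsheet analysis takes only the standard library. The sketch below keeps the flat fields from the log entry format under Request Logs, drops nested `request`/`response` bodies, and optionally filters to one gateway, mirroring the `Gateway=production` filter above; the column choice and `export_csv` name are assumptions, not the product's export layout.

```python
import csv
import io

# Flat columns from the log entry format; nested request/response/metadata are skipped.
COLUMNS = ["timestamp", "gateway", "app", "function", "duration", "status", "client", "session"]


def export_csv(entries, gateway=None):
    """Render log entries as CSV text, optionally keeping only one gateway."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS, extrasaction="ignore")
    writer.writeheader()
    for entry in entries:
        if gateway is None or entry.get("gateway") == gateway:
            writer.writerow(entry)
    return buf.getvalue()


entries = [
    {"timestamp": "2025-01-28T10:30:45Z", "gateway": "production", "app": "github",
     "function": "repos_create_issue", "duration": 245, "status": "success",
     "client": "claude-desktop", "session": "sess_abc123", "request": {}},
    {"timestamp": "2025-01-28T10:31:02Z", "gateway": "dev-tools", "app": "slack",
     "function": "post_message", "duration": 120, "status": "error",
     "client": "cursor", "session": "sess_def456"},
]
lines = export_csv(entries, gateway="production").splitlines()
```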

Scheduled Reports

Set up automated reporting:

  1. Create Report

    • Choose metrics
    • Set schedule
    • Add recipients
  2. Report Types

    • Daily summary
    • Weekly trends
    • Monthly analytics
    • Custom dashboards

API Access

Programmatic access to monitoring data:

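```bash
# Fetch metrics for the last 24 hours
curl -H "Authorization: Bearer YOUR_API_KEY" \
  "https://api.stormmcp.ai/v1/metrics?range=24h"

# Get logs for the dev-tools gateway
curl -H "Authorization: Bearer YOUR_API_KEY" \
  "https://api.stormmcp.ai/v1/logs?gateway=dev-tools"
```

Quote the URLs so shells such as zsh do not treat `?` as a filename wildcard.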
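The same calls can be made from Python's standard library. The sketch below builds an authenticated GET request for the endpoints shown in the curl examples; it only constructs the request, because the response format is not documented on this page. The `build_request` helper is hypothetical, and `YOUR_API_KEY` is a placeholder as above.

```python
import urllib.parse
import urllib.request

BASE_URL = "https://api.stormmcp.ai/v1"  # from the curl examples above


def build_request(endpoint: str, api_key: str, **params: str) -> urllib.request.Request:
    """Build an authenticated GET request for an endpoint such as 'metrics' or 'logs'."""
    url = f"{BASE_URL}/{endpoint}"
    if params:
        url += "?" + urllib.parse.urlencode(params)
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {api_key}"})


req = build_request("logs", "YOUR_API_KEY", gateway="dev-tools")
# To send it (not executed here):
# with urllib.request.urlopen(req) as resp:
#     body = resp.read()
```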

Best Practices

Monitoring Strategy

  1. Define Key Metrics

    • Identify critical indicators
    • Set baseline values
    • Track trends over time
  2. Set Appropriate Alerts

    • Avoid alert fatigue
    • Focus on actionable items
    • Use escalation policies
  3. Regular Reviews

    • Weekly performance reviews
    • Monthly trend analysis
    • Quarterly optimization

Performance Optimization

Based on monitoring insights:

  1. Optimize Slow Functions

    • Identify bottlenecks
    • Cache frequent queries
    • Batch operations
  2. Reduce Error Rates

    • Fix common errors first
    • Improve error handling
    • Add retry logic
  3. Balance Load

    • Distribute across gateways
    • Use multiple instances
    • Implement rate limiting
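"Add retry logic" usually means retrying only transient failures, with exponentially growing waits plus random jitter so many clients do not retry in lockstep. The sketch below shows that pattern around any callable; which exceptions count as transient (`TimeoutError`, `ConnectionError` here) and the default delays are assumptions to adapt, and authentication or validation errors should not be retried at all.

```python
import random
import time


def with_retries(call, attempts=4, base_delay=0.5,
                 retry_on=(TimeoutError, ConnectionError), sleep=time.sleep):
    """Call `call()`, retrying transient errors with exponential backoff and jitter."""
    for attempt in range(attempts):
        try:
            return call()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            delay = base_delay * 2 ** attempt            # 0.5s, 1s, 2s, ...
            sleep(delay + random.uniform(0, delay / 2))  # add up to 50% jitter


calls = {"n": 0}


def flaky():
    """Hypothetical upstream call that times out twice, then succeeds."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("slow upstream")
    return "ok"


delays = []
result = with_retries(flaky, sleep=delays.append)
```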

Next Steps

With monitoring configured:

  1. Set Up Alerts
  2. Optimize Performance
  3. Create Custom Dashboards

Related Resources