
Monitoring API

The Qwestly Monitoring API is a FastAPI-based backend service that provides real-time system health metrics, log shipping management, and operational insights for internal monitoring systems.

Overview

URL: https://qwestly-monitoring-api.vercel.app
Repository: /qwestly-monitoring/qwestly-monitoring-api/
Technology: FastAPI, Python 3.11+, psutil, AWS S3
Documentation: https://qwestly-monitoring-api.vercel.app/docs

Architecture

graph TB
    A[FastAPI Application] --> B[Health Endpoints]
    A --> C[Log Management Endpoints]
    A --> D[CORS Middleware]
    B --> E[psutil System Metrics]
    B --> F[Environment Validation]
    C --> G[AWS S3]
    C --> H[Log File Analysis]
    C --> I[Manual Shipping Triggers]
    J[Status Dashboard] -->|HTTPS| A
    K[CI/CD Pipeline] -->|Deploy| L[Vercel Serverless]

API Endpoints

Health & Status

GET /

Service information and available endpoints.

Response:

{
  "service": "Qwestly Internal Monitoring",
  "version": "1.0.0",
  "status": "operational",
  "cors": "permissive-internal",
  "endpoints": {
    "health": "/api/health",
    "logs_status": "/api/logs/status",
    "logs_report": "/api/logs/report",
    "documentation": "/docs"
  }
}

GET /api/health

Comprehensive system health check with detailed metrics.

Response:

{
  "status": "healthy",
  "timestamp": "2025-07-02T10:30:00Z",
  "system": {
    "cpu_percent": 45.2,
    "memory_percent": 68.1,
    "memory_available_gb": 8.2,
    "disk_percent": 35.7,
    "disk_free_gb": 125.4
  },
  "environment": {
    "AWS_REGION": "✅ Set",
    "AWS_BUCKET": "✅ Set",
    "DEPLOYMENT_DATE": "✅ Set"
  },
  "missing_variables": []
}

GET /api/health/simple

Basic health status for uptime monitoring.

Response:

{
  "status": "healthy",
  "timestamp": "2025-07-02T10:30:00Z"
}

Log Management

GET /api/logs/status

Current log shipping status with historical data.

Response:

{
  "status": "healthy",
  "deployment_date": "2025-07-01",
  "days_since_deployment": 1,
  "bucket": "qwestly-logs",
  "recent_logs": {
    "2025-07-01": {
      "auth": true,
      "postgres": true,
      "api": false
    }
  },
  "yesterday_stats": {
    "auth": {
      "exists": true,
      "count": 1547,
      "size_bytes": 156780
    },
    "postgres": {
      "exists": true,
      "count": 892,
      "size_bytes": 89432
    },
    "api": {
      "exists": false,
      "count": 0,
      "size_bytes": 0
    }
  },
  "missing_dates": [],
  "last_check": "2025-07-02T10:30:00Z"
}
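
A minimal sketch of how `days_since_deployment` and a list of problem dates might be derived from `recent_logs`. It assumes, going by the log type descriptions in this document, that only `auth` and `postgres` must be present for a day to count as complete; the function names are hypothetical and this is not the service's actual code.

```python
from datetime import date

REQUIRED_TYPES = ("auth", "postgres")  # assumption: "api" logs are optional


def days_since(deployment_date: str, today: date) -> int:
    """Whole days between an ISO deployment date and today."""
    return (today - date.fromisoformat(deployment_date)).days


def dates_missing_required(recent_logs: dict) -> list:
    """Dates on which at least one required log type is absent."""
    return sorted(
        day
        for day, types in recent_logs.items()
        if not all(types.get(t, False) for t in REQUIRED_TYPES)
    )


recent = {"2025-07-01": {"auth": True, "postgres": True, "api": False}}
print(days_since("2025-07-01", date(2025, 7, 2)))
print(dates_missing_required(recent))
```

With the sample response above, the missing `api` log does not flag 2025-07-01, which is consistent with the `healthy` status shown.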

GET /api/logs/report

Detailed log shipping report with comprehensive analysis.

Response:

{
  "report": "=== Qwestly Log Shipping Report ===\n\nGenerated: 2025-07-02 10:30:00\nDeployment Date: 2025-07-01\nDays Since Deployment: 1\n\n=== Recent Log Status ===\n2025-07-01:\n  ✅ auth: 1547 entries (153 KB)\n  ✅ postgres: 892 entries (87 KB)\n  ❌ api: No logs found\n\n=== Summary ===\nTotal log days: 1\nSuccessful auth days: 1\nSuccessful postgres days: 1\nSuccessful api days: 0\n\n=== Recommendations ===\n- Monitor API log generation\n- All critical logs present for operational day"
}
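
The per-type lines in the report can be reproduced from the `yesterday_stats` fields returned by `/api/logs/status`. The sketch below assumes sizes are shown as whole kibibytes rounded down, which matches the sample figures (156780 bytes shown as 153 KB); `format_log_line` is a hypothetical helper, not the service's own function.

```python
def format_log_line(log_type: str, stats: dict) -> str:
    """Render one report line from a yesterday_stats-style entry."""
    if not stats.get("exists"):
        return f"  ❌ {log_type}: No logs found"
    size_kb = stats["size_bytes"] // 1024  # assumption: whole KiB, rounded down
    return f"  ✅ {log_type}: {stats['count']} entries ({size_kb} KB)"


sample = {
    "auth": {"exists": True, "count": 1547, "size_bytes": 156780},
    "postgres": {"exists": True, "count": 892, "size_bytes": 89432},
    "api": {"exists": False, "count": 0, "size_bytes": 0},
}
for name, stats in sample.items():
    print(format_log_line(name, stats))
```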

POST /api/logs/ship

Manual log shipping trigger for on-demand log collection.

Response:

{
  "success": true,
  "message": "Log shipping initiated successfully",
  "timestamp": "2025-07-02T10:30:00Z"
}

Documentation

GET /docs

Interactive API documentation (Swagger UI) with live testing capabilities.

GET /redoc

Alternative documentation format (ReDoc) with detailed schemas.

System Metrics

CPU Monitoring

  • Data Source: psutil.cpu_percent()
  • Update Frequency: Real-time on request
  • Thresholds: Warning 80%, Critical 90%

Memory Monitoring

  • Data Source: psutil.virtual_memory()
  • Metrics: Usage percentage, available GB
  • Thresholds: Warning 80%, Critical 90%

Disk Monitoring

  • Data Source: psutil.disk_usage('/')
  • Metrics: Usage percentage, free space GB
  • Thresholds: Warning 80%, Critical 90%
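
The thresholds above can be applied with a small classifier. This is a sketch under assumptions: the labels `ok`, `warning`, and `critical` and the inclusive comparison are illustrative choices, and `collect_metrics` is a hypothetical helper. It passes a short `interval` to `psutil.cpu_percent()` because the first call with no interval returns a meaningless 0.0, which matters in a freshly started process.

```python
WARNING_PCT = 80.0
CRITICAL_PCT = 90.0


def classify(percent: float) -> str:
    """Map a usage percentage onto the documented thresholds."""
    if percent >= CRITICAL_PCT:
        return "critical"
    if percent >= WARNING_PCT:
        return "warning"
    return "ok"


def collect_metrics() -> dict:
    """Read the three usage percentages the health endpoint reports."""
    import psutil  # third-party; imported lazily so classify() has no dependency

    return {
        "cpu_percent": psutil.cpu_percent(interval=0.1),
        "memory_percent": psutil.virtual_memory().percent,
        "disk_percent": psutil.disk_usage("/").percent,
    }


sample = {"cpu_percent": 45.2, "memory_percent": 68.1, "disk_percent": 35.7}
print({name: classify(value) for name, value in sample.items()})
```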

Environment Validation

The API validates required environment variables and reports their status:

| Variable | Purpose | Required |
|----------|---------|----------|
| AWS_REGION | AWS region | Yes |
| AWS_BUCKET | Log storage bucket | Yes |
| DEPLOYMENT_DATE | Deployment tracking | No |
| FRONTEND_URL | CORS configuration | No |
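
A minimal sketch of how the `environment` and `missing_variables` fields of the health response could be built from this table. The `❌ Missing` label and the choice to report all four variables are assumptions (the sample response only shows `✅ Set` for three of them), and `validate_environment` is a hypothetical name.

```python
import os

REQUIRED = ("AWS_REGION", "AWS_BUCKET")
OPTIONAL = ("DEPLOYMENT_DATE", "FRONTEND_URL")


def validate_environment(env=None) -> dict:
    """Report which variables are set and list the required ones that are missing."""
    env = os.environ if env is None else env
    status = {
        name: "✅ Set" if env.get(name) else "❌ Missing"
        for name in REQUIRED + OPTIONAL
    }
    missing = [name for name in REQUIRED if not env.get(name)]
    return {"environment": status, "missing_variables": missing}


print(validate_environment({"AWS_REGION": "us-east-1"}))
```

Passing a plain dict instead of reading `os.environ` directly keeps the check easy to exercise without touching the real environment.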

Log Shipping Integration

AWS S3

  • Bucket: Configured via AWS_BUCKET
  • Authentication: IAM credentials or the default AWS credential chain
  • File Format: Daily log files organized by date and type

Log Types

  • auth: Authentication and authorization logs (critical)
  • postgres: Database query and connection logs (important)
  • api: Application API request logs (optional)

Shipping Schedule

  • Automated: Daily shipping via CI/CD pipeline
  • Manual: On-demand via POST /api/logs/ship
  • Monitoring: Status tracked and reported in dashboard

CORS Configuration

The API is configured with permissive CORS for internal tools:

allow_origins = ["*"]  # Internal monitoring tools only
allow_credentials = False
allow_methods = ["GET", "POST", "PUT", "DELETE", "OPTIONS"]
allow_headers = ["*"]
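
For context, these four values correspond to keyword arguments of FastAPI's `CORSMiddleware`. The sketch below builds them in one place; the idea that a configured `FRONTEND_URL` would narrow `allow_origins` is an assumption for illustration, and `cors_settings` is a hypothetical helper.

```python
def cors_settings(frontend_url=None) -> dict:
    """Build keyword arguments for FastAPI's CORSMiddleware."""
    # Assumption: a configured frontend URL replaces the wildcard origin.
    origins = [frontend_url] if frontend_url else ["*"]
    return {
        "allow_origins": origins,
        "allow_credentials": False,
        "allow_methods": ["GET", "POST", "PUT", "DELETE", "OPTIONS"],
        "allow_headers": ["*"],
    }


# Typical wiring (requires fastapi to be installed):
#   from fastapi import FastAPI
#   from fastapi.middleware.cors import CORSMiddleware
#   app = FastAPI()
#   app.add_middleware(CORSMiddleware, **cors_settings())
print(cors_settings("https://qwestly-status.vercel.app")["allow_origins"])
```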

Security Considerations

  • Internal Use Only: API designed for internal operations
  • Network Security: Should be behind VPN or internal network
  • No Authentication: Currently open access (internal tool)
  • Audit Trail: All requests logged for monitoring

Development

Local Setup

cd qwestly-monitoring
npm run dev:api

Environment Variables

# qwestly-monitoring-api/.env
AWS_REGION=your-region
AWS_BUCKET=your-log-bucket
DEPLOYMENT_DATE=2025-07-02
FRONTEND_URL=https://qwestly-status.vercel.app

Testing

# Test health endpoint
curl http://localhost:3001/api/health

# Test log status
curl http://localhost:3001/api/logs/status

# Test manual log shipping
curl -X POST http://localhost:3001/api/logs/ship

Deployment

Production Environment

  • Platform: Vercel Serverless Functions
  • Runtime: Python 3.11
  • Auto-scaling: Automatic based on request volume
  • Geographic Distribution: Global edge deployment

Deployment Process

cd qwestly-monitoring-api
npm run deploy

Environment Configuration

Set environment variables in Vercel dashboard:

  1. Navigate to project settings
  2. Add environment variables
  3. Redeploy for changes to take effect

Health Monitoring

  • Uptime: Monitored via Vercel analytics
  • Response Times: Tracked via API metrics
  • Error Rates: Logged and alerted

Performance

Response Times

| Endpoint | Target | Typical |
|----------|--------|---------|
| /api/health | < 200ms | ~150ms |
| /api/logs/status | < 500ms | ~300ms |
| /api/logs/report | < 1s | ~800ms |

Optimization Features

  • Caching: System metrics cached for 30 seconds
  • Lazy Loading: S3 operations only when needed
  • Error Handling: Fast failure with meaningful messages
  • Connection Pooling: Efficient resource utilization
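
The 30-second metric cache could be implemented with a small in-memory time-to-live wrapper. This is a sketch, not the service's code: `TTLCache` is a hypothetical class, and the clock is injectable only to make the behavior easy to test.

```python
import time


class TTLCache:
    """Cache a single computed value for a fixed number of seconds."""

    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._value = None
        self._expires_at = None  # None means nothing has been cached yet

    def get(self, compute):
        """Return the cached value, recomputing it once the TTL has elapsed."""
        now = self._clock()
        if self._expires_at is None or now >= self._expires_at:
            self._value = compute()
            self._expires_at = now + self._ttl
        return self._value


calls = []
cache = TTLCache(ttl_seconds=30)
cache.get(lambda: calls.append("read") or {"cpu_percent": 45.2})
cache.get(lambda: calls.append("read") or {"cpu_percent": 99.0})
print(len(calls))  # the second call is served from the cache
```

An in-memory cache like this lives only as long as the process, so on a serverless platform each function instance would hold its own copy and a cold start begins empty.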

Error Handling

HTTP Status Codes

  • 200: Success
  • 400: Bad Request (invalid parameters)
  • 500: Internal Server Error
  • 503: Service Unavailable (dependencies down)

Error Response Format

{
  "error": "Service temporarily unavailable",
  "detail": "AWS S3 connection failed",
  "timestamp": "2025-07-02T10:30:00Z"
}
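
A minimal sketch of a helper that produces a body in this shape with a UTC timestamp. `error_body` is a hypothetical name; in a FastAPI handler the result could be returned with something like `JSONResponse(status_code=503, content=...)`.

```python
from datetime import datetime, timezone


def error_body(error: str, detail: str, now=None) -> dict:
    """Build an error payload in the documented shape with a UTC timestamp."""
    now = now or datetime.now(timezone.utc)
    stamp = now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {"error": error, "detail": detail, "timestamp": stamp}


print(error_body(
    "Service temporarily unavailable",
    "AWS S3 connection failed",
    now=datetime(2025, 7, 2, 10, 30, tzinfo=timezone.utc),
))
```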

Common Errors

  • S3 Connection: Credential or network issues
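  • System Metrics: In a serverless runtime, psutil reports on the short-lived function container rather than a persistent host, and some metrics may be unavailable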
  • Environment: Missing required variables

Monitoring & Logging

Application Logs

  • Request Logging: All API calls logged
  • Error Tracking: Exceptions captured with context
  • Performance Metrics: Response times and resource usage

Health Checks

  • Self-Monitoring: Health endpoint includes API status
  • Dependencies: S3 connectivity validation
  • Resource Monitoring: System resource availability

Security

Input Validation

  • Request Validation: Pydantic models for all inputs
  • Output Sanitization: Structured JSON responses only
  • Error Sanitization: No sensitive data in error messages

Access Control

  • CORS Policy: Permissive (all origins) for internal tools, as listed under CORS Configuration
  • Rate Limiting: Vercel built-in protection
  • Input Sanitization: All user inputs validated

Troubleshooting

Common Issues

API not responding:

  1. Check Vercel deployment status
  2. Verify environment variables
  3. Test individual endpoints
  4. Review application logs

S3 connection failures:

  1. Verify AWS_REGION and AWS_BUCKET
  2. Check IAM permissions for the credentials in use
  3. Test S3 connectivity directly
  4. Review authentication configuration

CORS errors:

  1. Check FRONTEND_URL configuration
  2. Verify frontend domain in CORS settings
  3. Test with curl to isolate browser issues

Debug Endpoints

CORS Debug:

curl https://qwestly-monitoring-api.vercel.app/api/cors-debug

Health Check:

curl https://qwestly-monitoring-api.vercel.app/api/health

Logs Access

  • Vercel Dashboard: Real-time function logs
  • CLI: vercel logs --project qwestly-monitoring-api
  • API: Error details in response bodies

Future Enhancements

Planned Features

  • Authentication: API key or JWT token validation
  • Rate Limiting: Per-client request quotas
  • Webhooks: Event notifications for status changes
  • Metrics Export: Prometheus/Grafana integration
  • Alerting: Automated notifications for critical issues

Performance Improvements

  • Database Integration: Persistent storage for historical data
  • Caching Layer: Redis for improved response times
  • Background Jobs: Async processing for heavy operations
  • Load Balancing: Multi-region deployment

Support

For API-related issues:

  1. Check interactive documentation at /docs
  2. Review this documentation
  3. Test endpoints with curl or Postman
  4. Contact engineering team via Slack #engineering

Last updated: July 2025