Monitoring API
The Qwestly Monitoring API is a FastAPI-based backend service that provides real-time system health metrics, log shipping management, and operational insights for internal monitoring systems.
Overview
URL: https://qwestly-monitoring-api.vercel.app
Repository: /qwestly-monitoring/qwestly-monitoring-api/
Technology: FastAPI, Python 3.11+, psutil, AWS S3
Documentation: https://qwestly-monitoring-api.vercel.app/docs
Architecture
graph TB
A[FastAPI Application] --> B[Health Endpoints]
A --> C[Log Management Endpoints]
A --> D[CORS Middleware]
B --> E[psutil System Metrics]
B --> F[Environment Validation]
C --> G[AWS S3]
C --> H[Log File Analysis]
C --> I[Manual Shipping Triggers]
J[Status Dashboard] -->|HTTPS| A
K[CI/CD Pipeline] -->|Deploy| L[Vercel Serverless]
API Endpoints
Health & Status
GET /
Service information and available endpoints.
Response:
{
"service": "Qwestly Internal Monitoring",
"version": "1.0.0",
"status": "operational",
"cors": "permissive-internal",
"endpoints": {
"health": "/api/health",
"logs_status": "/api/logs/status",
"logs_report": "/api/logs/report",
"documentation": "/docs"
}
}
GET /api/health
Comprehensive system health check with detailed metrics.
Response:
{
"status": "healthy",
"timestamp": "2025-07-02T10:30:00Z",
"system": {
"cpu_percent": 45.2,
"memory_percent": 68.1,
"memory_available_gb": 8.2,
"disk_percent": 35.7,
"disk_free_gb": 125.4
},
"environment": {
"AWS_REGION": "✅ Set",
"AWS_BUCKET": "✅ Set",
"DEPLOYMENT_DATE": "✅ Set"
},
"missing_variables": []
}
GET /api/health/simple
Basic health status for uptime monitoring.
Response:
{
"status": "healthy",
"timestamp": "2025-07-02T10:30:00Z"
}
Log Management
GET /api/logs/status
Current log shipping status with historical data.
Response:
{
"status": "healthy",
"deployment_date": "2025-07-01",
"days_since_deployment": 1,
"bucket": "qwestly-logs",
"recent_logs": {
"2025-07-01": {
"auth": true,
"postgres": true,
"api": false
}
},
"yesterday_stats": {
"auth": {
"exists": true,
"count": 1547,
"size_bytes": 156780
},
"postgres": {
"exists": true,
"count": 892,
"size_bytes": 89432
},
"api": {
"exists": false,
"count": 0,
"size_bytes": 0
}
},
"missing_dates": [],
"last_check": "2025-07-02T10:30:00Z"
}
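A consumer such as the status dashboard might reduce this response to the items that need attention. The following is a minimal sketch, assuming the field names shown in the sample response above; `days_since` and `missing_log_types` are hypothetical helper names, not part of the API.

```python
from datetime import date


def days_since(deployment_date: str, today: date) -> int:
    """Whole days between an ISO deployment date and `today`."""
    return (today - date.fromisoformat(deployment_date)).days


def missing_log_types(status: dict) -> list[str]:
    """Log types whose `exists` flag is false in `yesterday_stats`."""
    stats = status.get("yesterday_stats", {})
    return sorted(name for name, info in stats.items() if not info.get("exists"))


# Trimmed copy of the sample response above.
sample = {
    "deployment_date": "2025-07-01",
    "yesterday_stats": {
        "auth": {"exists": True, "count": 1547, "size_bytes": 156780},
        "postgres": {"exists": True, "count": 892, "size_bytes": 89432},
        "api": {"exists": False, "count": 0, "size_bytes": 0},
    },
}

print(days_since(sample["deployment_date"], date(2025, 7, 2)))  # 1
print(missing_log_types(sample))  # ['api']
```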
GET /api/logs/report
Detailed log shipping report with comprehensive analysis.
Response:
{
"report": "=== Qwestly Log Shipping Report ===\n\nGenerated: 2025-07-02 10:30:00\nDeployment Date: 2025-07-01\nDays Since Deployment: 1\n\n=== Recent Log Status ===\n2025-07-01:\n ✅ auth: 1547 entries (153 KB)\n ✅ postgres: 892 entries (87 KB)\n ❌ api: No logs found\n\n=== Summary ===\nTotal log days: 1\nSuccessful auth days: 1\nSuccessful postgres days: 1\nSuccessful api days: 0\n\n=== Recommendations ===\n- Monitor API log generation\n- All critical logs present for operational day"
}
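The per-type lines in the sample report could be produced by a small formatter like the one below. This is a sketch only: `format_log_line` is a hypothetical name, and rounding sizes down to whole units of 1024 bytes is an assumption that happens to match the sample figures (156780 bytes shown as 153 KB, 89432 bytes as 87 KB).

```python
def format_log_line(log_type: str, exists: bool, count: int, size_bytes: int) -> str:
    """Render one per-type line in the style of the sample report."""
    if not exists:
        return f" ❌ {log_type}: No logs found"
    # Assumption: sizes are shown in units of 1024 bytes, rounded down.
    return f" ✅ {log_type}: {count} entries ({size_bytes // 1024} KB)"


for args in [("auth", True, 1547, 156780),
             ("postgres", True, 892, 89432),
             ("api", False, 0, 0)]:
    print(format_log_line(*args))
```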
POST /api/logs/ship
Manual log shipping trigger for on-demand log collection.
Response:
{
"success": true,
"message": "Log shipping initiated successfully",
"timestamp": "2025-07-02T10:30:00Z"
}
Documentation
GET /docs
Interactive API documentation (Swagger UI) with live testing capabilities.
GET /redoc
Alternative documentation format (ReDoc) with detailed schemas.
System Metrics
CPU Monitoring
- Data Source: `psutil.cpu_percent()`
- Update Frequency: Real-time on request
- Thresholds: Warning 80%, Critical 90%
Memory Monitoring
- Data Source: `psutil.virtual_memory()`
- Metrics: Usage percentage, available GB
- Thresholds: Warning 80%, Critical 90%
Disk Monitoring
- Data Source: `psutil.disk_usage('/')`
- Metrics: Usage percentage, free space GB
- Thresholds: Warning 80%, Critical 90%
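CPU, memory, and disk share the same thresholds, so one classifier could cover all three. The sketch below assumes the thresholds are inclusive and uses the level names `ok`, `warning`, and `critical`; neither detail is confirmed by this document, and `classify_usage` is a hypothetical helper.

```python
WARNING_PERCENT = 80.0   # from the thresholds listed above
CRITICAL_PERCENT = 90.0


def classify_usage(percent: float) -> str:
    """Map a usage percentage to a severity level (names are assumptions)."""
    if percent >= CRITICAL_PERCENT:
        return "critical"
    if percent >= WARNING_PERCENT:
        return "warning"
    return "ok"


# Values taken from the sample /api/health response.
# In the service they would come from psutil, e.g. psutil.cpu_percent(),
# psutil.virtual_memory().percent, and psutil.disk_usage('/').percent.
system = {"cpu_percent": 45.2, "memory_percent": 68.1, "disk_percent": 35.7}
levels = {name: classify_usage(value) for name, value in system.items()}
print(levels)
```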
Environment Validation
The API validates required environment variables and reports their status:
| Variable | Purpose | Required |
|---|---|---|
| `AWS_REGION` | AWS region | Yes |
| `AWS_BUCKET` | Log storage bucket | Yes |
| `DEPLOYMENT_DATE` | Deployment tracking | No |
| `FRONTEND_URL` | CORS configuration | No |
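The validation could look roughly like the sketch below, which builds the `environment` and `missing_variables` fields seen in the `/api/health` response. Several details are assumptions: the `❌ Missing` label (only `✅ Set` appears in the sample), reporting all four variables (the sample shows three), and listing only required variables as missing. `validate_environment` is a hypothetical name.

```python
import os
from typing import Mapping

REQUIRED_VARS = ("AWS_REGION", "AWS_BUCKET")
OPTIONAL_VARS = ("DEPLOYMENT_DATE", "FRONTEND_URL")


def validate_environment(env: Mapping[str, str] = os.environ) -> dict:
    """Report which variables are set and which required ones are missing."""
    names = REQUIRED_VARS + OPTIONAL_VARS
    # An empty string is treated the same as an unset variable.
    environment = {n: "✅ Set" if env.get(n) else "❌ Missing" for n in names}
    missing = [n for n in REQUIRED_VARS if not env.get(n)]
    return {"environment": environment, "missing_variables": missing}


result = validate_environment({"AWS_REGION": "us-east-1"})
print(result["missing_variables"])  # ['AWS_BUCKET']
```

Passing the mapping as a parameter keeps the function testable without touching the real process environment.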
Log Shipping Integration
AWS S3
- Bucket: Configured via `AWS_BUCKET`
- Authentication: Service account or default credentials
- File Format: Daily log files organized by date and type
Log Types
- auth: Authentication and authorization logs (critical)
- postgres: Database query and connection logs (important)
- api: Application API request logs (optional)
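The severity labels suggest how an overall status might be derived from which log types are present. The sketch below is one possible rule, not the service's confirmed logic: the `degraded` and `critical` status names are assumptions, and `overall_status` is a hypothetical helper. It is consistent with the sample `/api/logs/status` response, where `api` logs are absent yet the status is `healthy`.

```python
SEVERITY = {"auth": "critical", "postgres": "important", "api": "optional"}


def overall_status(present: dict[str, bool]) -> str:
    """Derive a status from per-type presence flags for one day."""
    # Unknown log types are treated as optional; types absent from the
    # mapping altogether are not counted as missing.
    missing = {SEVERITY.get(name, "optional")
               for name, ok in present.items() if not ok}
    if "critical" in missing:
        return "critical"
    if "important" in missing:
        return "degraded"
    return "healthy"


print(overall_status({"auth": True, "postgres": True, "api": False}))  # healthy
```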
Shipping Schedule
- Automated: Daily shipping via CI/CD pipeline
- Manual: On-demand via `POST /api/logs/ship`
- Monitoring: Status tracked and reported in dashboard
CORS Configuration
The API is configured with permissive CORS for internal tools:
allow_origins = ["*"] # Internal monitoring tools only
allow_credentials = False
allow_methods = ["GET", "POST", "PUT", "DELETE", "OPTIONS"]
allow_headers = ["*"]
Security Considerations
- Internal Use Only: API designed for internal operations
- Network Security: Should be behind VPN or internal network
- No Authentication: Currently open access (internal tool)
- Audit Trail: All requests logged for monitoring
Development
Local Setup
cd /Users/dominick/Work/qwestly-workspace/qwestly-monitoring
npm run dev:api
Environment Variables
# qwestly-monitoring-api/.env
AWS_REGION=your-region
AWS_BUCKET=your-log-bucket
DEPLOYMENT_DATE=2025-07-02
FRONTEND_URL=https://qwestly-status.vercel.app
Testing
# Test health endpoint
curl http://localhost:3001/api/health
# Test log status
curl http://localhost:3001/api/logs/status
# Test manual log shipping
curl -X POST http://localhost:3001/api/logs/ship
Deployment
Production Environment
- Platform: Vercel Serverless Functions
- Runtime: Python 3.11
- Auto-scaling: Automatic based on request volume
- Geographic Distribution: Global edge deployment
Deployment Process
cd qwestly-monitoring-api
npm run deploy
Environment Configuration
Set environment variables in Vercel dashboard:
- Navigate to project settings
- Add environment variables
- Redeploy for changes to take effect
Health Monitoring
- Uptime: Monitored via Vercel analytics
- Response Times: Tracked via API metrics
- Error Rates: Logged and alerted
Performance
Response Times
| Endpoint | Target | Typical |
|---|---|---|
| `/api/health` | < 200ms | ~150ms |
| `/api/logs/status` | < 500ms | ~300ms |
| `/api/logs/report` | < 1s | ~800ms |
Optimization Features
- Caching: System metrics cached for 30 seconds
- Lazy Loading: S3 operations only when needed
- Error Handling: Fast failure with meaningful messages
- Connection Pooling: Efficient resource utilization
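The 30-second metrics cache could be implemented as a small time-to-live wrapper around the metrics collector. This is a minimal sketch under stated assumptions: `TTLCache` is a hypothetical class, not the service's actual code, and the injectable `clock` exists only to make the behaviour testable.

```python
import time
from typing import Any, Callable, Optional


class TTLCache:
    """Cache a single computed value for a fixed number of seconds."""

    def __init__(self, loader: Callable[[], Any], ttl_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock
        self._value: Any = None
        self._expires_at: Optional[float] = None

    def get(self) -> Any:
        """Return the cached value, reloading it once the TTL has elapsed."""
        now = self._clock()
        if self._expires_at is None or now >= self._expires_at:
            self._value = self._loader()
            self._expires_at = now + self._ttl
        return self._value


# Usage with a stand-in loader; the real one would collect psutil metrics.
metrics_cache = TTLCache(lambda: {"cpu_percent": 45.2})
print(metrics_cache.get())
```

In a serverless deployment an in-process cache like this would typically survive only while a function instance stays warm, so the effective hit rate depends on traffic.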
Error Handling
HTTP Status Codes
- 200: Success
- 400: Bad Request (invalid parameters)
- 500: Internal Server Error
- 503: Service Unavailable (dependencies down)
Error Response Format
{
"error": "Service temporarily unavailable",
"detail": "AWS S3 connection failed",
"timestamp": "2025-07-02T10:30:00Z"
}
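A helper along these lines could produce the error body shown above. It is a sketch only: `error_body` is a hypothetical name, and the second-precision UTC timestamp with a trailing `Z` is inferred from the sample rather than confirmed.

```python
from datetime import datetime, timezone
from typing import Optional


def error_body(error: str, detail: str, now: Optional[datetime] = None) -> dict:
    """Build an error payload in the documented shape.

    `now` should be a timezone-aware datetime; it defaults to the current time.
    """
    moment = now or datetime.now(timezone.utc)
    timestamp = moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {"error": error, "detail": detail, "timestamp": timestamp}


body = error_body(
    "Service temporarily unavailable",
    "AWS S3 connection failed",
    now=datetime(2025, 7, 2, 10, 30, tzinfo=timezone.utc),
)
print(body["timestamp"])  # 2025-07-02T10:30:00Z
```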
Common Errors
- S3 Connection: Credential or network issues
- System Metrics: Platform limitations in serverless
- Environment: Missing required variables
Monitoring & Logging
Application Logs
- Request Logging: All API calls logged
- Error Tracking: Exceptions captured with context
- Performance Metrics: Response times and resource usage
Health Checks
- Self-Monitoring: Health endpoint includes API status
- Dependencies: S3 connectivity validation
- Resource Monitoring: System resource availability
Security
Input Validation
- Request Validation: Pydantic models for all inputs
- Output Sanitization: Structured JSON responses only
- Error Sanitization: No sensitive data in error messages
Access Control
- CORS Policy: Permissive for internal tools (see CORS Configuration)
- Rate Limiting: Vercel built-in protection
- Input Sanitization: All user inputs validated
Troubleshooting
Common Issues
API not responding:
- Check Vercel deployment status
- Verify environment variables
- Test individual endpoints
- Review application logs
S3 connection failures:
- Verify `AWS_REGION` and `AWS_BUCKET`
- Check service account permissions
- Test S3 connectivity directly
- Review authentication configuration
CORS errors:
- Check `FRONTEND_URL` configuration
- Verify frontend domain in CORS settings
- Test with curl to isolate browser issues
Debug Endpoints
CORS Debug:
curl https://qwestly-monitoring-api.vercel.app/api/cors-debug
Health Check:
curl https://qwestly-monitoring-api.vercel.app/api/health
Logs Access
- Vercel Dashboard: Real-time function logs
- CLI: `vercel logs --project qwestly-monitoring-api`
- API: Error details in response bodies
Future Enhancements
Planned Features
- Authentication: API key or JWT token validation
- Rate Limiting: Per-client request quotas
- Webhooks: Event notifications for status changes
- Metrics Export: Prometheus/Grafana integration
- Alerting: Automated notifications for critical issues
Performance Improvements
- Database Integration: Persistent storage for historical data
- Caching Layer: Redis for improved response times
- Background Jobs: Async processing for heavy operations
- Load Balancing: Multi-region deployment
Support
For API-related issues:
- Check interactive documentation at `/docs`
- Review this documentation
- Test endpoints with curl or Postman
- Contact engineering team via Slack `#engineering`
Last updated: July 2025