Log Shipping System

Document Version: 2.0
Date: July 2025
Author: Dominick Pham
Purpose: Comprehensive reference for Qwestly's log shipping system (user & engineering)


System Overview

The Qwestly Log Shipping System collects application logs (Auth and API only) from production systems, ships them to AWS S3, and monitors their availability so they can be used for analysis, compliance, and operational insight. It is designed to meet retention and audit requirements cost-effectively.

Key Goals

  • Operational Monitoring: Real-time system health and performance tracking
  • Security Analysis: Authentication and authorization event monitoring
  • Compliance: Audit trails and data retention
  • Debugging: Historical log analysis for issue resolution

Architecture

graph TB
    A[Application Services] --> B[Log Generation]
    B --> C[Local Log Files]
    C --> D[Log Shipping Process]
    D --> E[AWS S3]
    F[Monitoring API] --> G[Log Status Checking]
    G --> E
    G --> H[Status Dashboard]
    I[Manual Triggers] --> D
    J[Scheduled Jobs] --> D
    E --> K[Log Analysis Tools]
    E --> L[Compliance Reports]
    E --> M[Alert Systems]

Log Types

Authentication Logs (auth)

  • Purpose: Track user authentication, authorization, and security events
  • Criticality: Critical - Must be present daily
  • Content: Login attempts, permission changes, security violations
  • Format: JSON with timestamp, user_id, action, result, metadata

Example Entry:

{
  "timestamp": "2025-07-02T10:30:00Z",
  "level": "INFO",
  "type": "auth",
  "user_id": "user_123",
  "action": "login_attempt",
  "result": "success",
  "ip_address": "192.168.1.100",
  "user_agent": "Mozilla/5.0...",
  "metadata": {
    "method": "email_password",
    "session_id": "sess_abc123"
  }
}
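The format described above can be checked mechanically. The following is a minimal sketch, assuming one JSON object per line; the function name `validate_auth_entry` and the exact set of checks are illustrative and not taken from the shipping service itself.

```python
import json
from datetime import datetime

# Fields listed under "Format" for auth logs.
REQUIRED_AUTH_FIELDS = ("timestamp", "user_id", "action", "result", "metadata")


def validate_auth_entry(line: str) -> list[str]:
    """Return a list of problems found in one JSON log line (empty if valid)."""
    try:
        entry = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(entry, dict):
        return ["entry is not a JSON object"]

    problems = [
        f"missing field: {name}"
        for name in REQUIRED_AUTH_FIELDS
        if name not in entry
    ]

    ts = entry.get("timestamp")
    if isinstance(ts, str):
        try:
            # The example uses a trailing "Z"; map it to an explicit UTC offset.
            datetime.fromisoformat(ts.replace("Z", "+00:00"))
        except ValueError:
            problems.append("timestamp is not ISO 8601")
    elif ts is not None:
        problems.append("timestamp is not a string")
    return problems
```

An empty result means the line carries every documented field and a parseable timestamp; anything else can be logged or counted during the validation step.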

API Logs (api)

  • Purpose: HTTP request/response tracking and performance monitoring
  • Criticality: Optional - May not exist on low-traffic days
  • Content: Request details, response codes, processing times, errors
  • Format: Structured JSON with request/response metadata

Example Entry:

{
  "timestamp": "2025-07-02T10:30:00Z",
  "level": "INFO",
  "type": "api",
  "method": "GET",
  "path": "/api/users",
  "status_code": 200,
  "duration_ms": 125,
  "user_id": "user_123",
  "request_id": "req_abc123",
  "ip_address": "192.168.1.100"
}
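As an illustration of the performance-monitoring use case, the sketch below aggregates entries shaped like the example above. It assumes one JSON object per line and treats status codes of 500 and above as errors; the function name and the choice of metrics are hypothetical.

```python
import json


def summarize_api_entries(lines):
    """Aggregate request count, 5xx error rate, and mean duration from JSON lines."""
    total = 0
    server_errors = 0
    duration_sum = 0
    for line in lines:
        entry = json.loads(line)
        if entry.get("type") != "api":
            continue  # skip anything that is not an API log entry
        total += 1
        if entry["status_code"] >= 500:
            server_errors += 1
        duration_sum += entry["duration_ms"]
    if total == 0:
        return {"requests": 0, "error_rate": 0.0, "mean_duration_ms": 0.0}
    return {
        "requests": total,
        "error_rate": server_errors / total,
        "mean_duration_ms": duration_sum / total,
    }
```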

Storage Structure

AWS S3 Organization

qwestly--logs/
├── auth/
│   ├── 2025-07-01/
│   │   └── auth.log
│   └── ...
└── api/
    ├── 2025-07-01/
    │   └── api.log
    └── ...

  • Directory: {log_type}/{YYYY-MM-DD}/
  • Filename: {log_type}.log
  • Full Path: qwestly--logs/{log_type}/{YYYY-MM-DD}/{log_type}.log
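The path convention above maps directly to a small helper. This is a sketch only; `build_log_key` is a hypothetical name, and the key is relative to the bucket.

```python
from datetime import date

LOG_TYPES = ("auth", "api")


def build_log_key(log_type: str, day: date) -> str:
    """Build the object key {log_type}/{YYYY-MM-DD}/{log_type}.log."""
    if log_type not in LOG_TYPES:
        raise ValueError(f"unknown log type: {log_type!r}")
    return f"{log_type}/{day.isoformat()}/{log_type}.log"
```

For example, `build_log_key("auth", date(2025, 7, 1))` yields `auth/2025-07-01/auth.log`, matching the tree shown above.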

Shipping Process

Automated Daily Shipping

  1. Collection: Gather logs from production systems
  2. Validation: Verify log format and completeness
  3. Compression: Gzip compress for storage efficiency
  4. Upload: Transfer to AWS S3
  5. Verification: Confirm successful upload
  6. Cleanup: Remove local temporary files
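Steps 2 through 4 can be sketched as follows. This is not the actual service code: the function names are hypothetical, the upload is an injected callable rather than a real S3 client, and dropping malformed lines (instead of failing the run) is an assumption. The source also does not say whether the compressed object keeps the plain `.log` name.

```python
import gzip
import json
from typing import Callable


def prepare_payload(raw_lines: list[str]) -> bytes:
    """Steps 2-3: keep only lines that parse as JSON, then gzip them."""
    valid = []
    for line in raw_lines:
        line = line.strip()
        if not line:
            continue
        try:
            json.loads(line)
        except json.JSONDecodeError:
            continue  # assumption: malformed lines are dropped, not fatal
        valid.append(line)
    body = "\n".join(valid).encode("utf-8")
    return gzip.compress(body)


def ship(raw_lines: list[str], key: str, upload: Callable[[str, bytes], None]) -> int:
    """Steps 2-4: validate, compress, and hand the payload to `upload`.

    Returns the number of compressed bytes passed to the uploader.
    """
    payload = prepare_payload(raw_lines)
    upload(key, payload)
    return len(payload)
```

Injecting `upload` keeps the sketch testable without AWS credentials; in practice it would wrap whatever S3 client the shipper uses.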

Manual Shipping

Available via the Monitoring API for on-demand log collection:

curl -X POST https://qwestly-monitoring-api.vercel.app/api/logs/ship

Shipping Schedule

  • Automated: Daily at 02:00 UTC via CI/CD pipeline
  • Manual: On-demand via API or dashboard
  • Retry Logic: 3 attempts with exponential backoff
  • Alerting: Notifications for failed shipping attempts
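The retry behaviour listed above (3 attempts with exponential backoff) might look like the sketch below. The base delay of one second and the function name `with_retries` are assumptions; the source states only the attempt count and the backoff style.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    operation: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation`, retrying on any exception with doubling delays."""
    for attempt in range(attempts):
        try:
            return operation()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            sleep(base_delay * 2 ** attempt)  # base_delay, then 2x, 4x, ...
    raise ValueError("attempts must be at least 1")
```

With the defaults, a failing operation is tried three times with waits of 1 and 2 seconds between tries, after which the last exception propagates so that alerting can pick it up.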

Monitoring & Status

  • Recent Log Status: Last 7 days of log availability by type
  • Missing Dates: Gaps in log collection timeline
  • File Statistics: Entry counts and file sizes
  • Shipping History: Success/failure tracking
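Gap detection over the recent window can be sketched as below. The assumption here is that the 7-day window ends yesterday, since the current day's logs are not shipped until the next run; the function name is hypothetical.

```python
from datetime import date, timedelta


def missing_dates(present: set[date], today: date, days: int = 7) -> list[date]:
    """Return dates in the last `days` full days (ending yesterday) with no log."""
    window = [today - timedelta(days=offset) for offset in range(days, 0, -1)]
    return [d for d in window if d not in present]
```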

Health Indicators

| Status   | Criteria                  | Action Required         |
|----------|---------------------------|-------------------------|
| Healthy  | All expected logs present | None                    |
| Warning  | Optional logs missing     | Monitor                 |
| Critical | Auth logs missing         | Immediate investigation |
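The health criteria above reduce to a small decision function. This sketch assumes that API logs are the only "optional" logs, as stated in the Log Types section; the function name is illustrative.

```python
def classify_health(auth_present: bool, api_present: bool) -> str:
    """Map log presence for one day to the documented health status."""
    if not auth_present:
        return "Critical"  # auth logs must be present daily
    if not api_present:
        return "Warning"  # API logs may be absent on low-traffic days
    return "Healthy"
```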

Dashboard Integration


Configuration

Environment Variables

# Supabase
SUPABASE_URL=https://project.supabase.co
SUPABASE_SERVICE_ROLE_KEY=service_role_key

# AWS
AWS_ACCESS_KEY_ID=access_key
AWS_SECRET_ACCESS_KEY=secret_key
AWS_S3_LOGS_BUCKET=qwestly--logs
# AWS_REGION is only needed for CLI, not the API

# Optional
SLACK_WEBHOOK_URL=https://hooks.slack.com/...

# Deployment Tracking
DEPLOYMENT_DATE=2025-07-01

# Log Directory (optional)
LOG_DIRECTORY=/var/log/qwestly
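One way to load and check these variables at startup is sketched below. It assumes the Supabase and AWS variables are mandatory and the rest optional; the source marks only SLACK_WEBHOOK_URL and LOG_DIRECTORY as optional, so the treatment of DEPLOYMENT_DATE is a guess. `ShipperConfig` and `load_config` are hypothetical names.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Assumption: these five are required for the shipper to run at all.
REQUIRED_VARS = (
    "SUPABASE_URL",
    "SUPABASE_SERVICE_ROLE_KEY",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_S3_LOGS_BUCKET",
)


@dataclass(frozen=True)
class ShipperConfig:
    supabase_url: str
    supabase_service_role_key: str
    aws_access_key_id: str
    aws_secret_access_key: str
    s3_logs_bucket: str
    slack_webhook_url: Optional[str] = None
    deployment_date: Optional[str] = None
    log_directory: Optional[str] = None


def load_config(env: Mapping[str, str] = os.environ) -> ShipperConfig:
    """Read configuration from `env`, failing fast on missing required values."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return ShipperConfig(
        supabase_url=env["SUPABASE_URL"],
        supabase_service_role_key=env["SUPABASE_SERVICE_ROLE_KEY"],
        aws_access_key_id=env["AWS_ACCESS_KEY_ID"],
        aws_secret_access_key=env["AWS_SECRET_ACCESS_KEY"],
        s3_logs_bucket=env["AWS_S3_LOGS_BUCKET"],
        slack_webhook_url=env.get("SLACK_WEBHOOK_URL"),
        deployment_date=env.get("DEPLOYMENT_DATE"),
        log_directory=env.get("LOG_DIRECTORY"),
    )
```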

AWS Permissions

  • s3:PutObject - Upload log files
  • s3:GetObject - Read existing logs for verification
  • s3:ListBucket - Check log inventory
  • s3:GetBucketLocation - Verify bucket access
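The four permissions above could be expressed as an IAM policy document along these lines. This is a sketch, not the policy actually deployed: the statement IDs are made up. The split between statements reflects how S3 scopes these actions: object actions apply to `bucket/*`, while bucket-level actions apply to the bucket ARN itself.

```python
import json


def build_log_shipper_policy(bucket: str) -> dict:
    """Build a least-privilege policy covering the four listed S3 actions."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ObjectAccess",  # hypothetical statement ID
                "Effect": "Allow",
                "Action": ["s3:PutObject", "s3:GetObject"],
                "Resource": f"{bucket_arn}/*",
            },
            {
                "Sid": "BucketAccess",  # hypothetical statement ID
                "Effect": "Allow",
                "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
                "Resource": bucket_arn,
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_log_shipper_policy("qwestly--logs"), indent=2))
```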

Bucket Configuration

aws s3 mb s3://qwestly--logs
aws s3api put-bucket-lifecycle-configuration --bucket qwestly--logs --lifecycle-configuration file://log-lifecycle.json

Lifecycle Policy (log-lifecycle.json):

{
  "Rules": [
    {
      "ID": "ID",
      "Filter": { "Prefix": "" },
      "Status": "Enabled",
      "Expiration": { "Days": 2555 }
    }
  ]
}
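The 2555-day expiration equals 7 × 365, that is, the seven-year retention period stated under Compliance Requirements with leap days ignored. A small generator makes that relationship explicit. This is a sketch; the function name is hypothetical and the default rule ID simply mirrors the placeholder in the policy above.

```python
import json


def build_lifecycle(retention_years: int, rule_id: str = "ID") -> dict:
    """Build a lifecycle configuration expiring all objects after N years."""
    return {
        "Rules": [
            {
                "ID": rule_id,
                "Filter": {"Prefix": ""},  # empty prefix: applies to every object
                "Status": "Enabled",
                "Expiration": {"Days": retention_years * 365},  # leap days ignored
            }
        ]
    }


if __name__ == "__main__":
    print(json.dumps(build_lifecycle(7), indent=2))
```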

Security & Compliance

Data Security

  • Encryption in Transit: HTTPS/TLS for all transfers
  • Encryption at Rest: AWS S3 default encryption
  • Access Control: IAM-based access with least privilege
  • Audit Trail: All access logged and monitored

Compliance Requirements

  • Retention Period: 7 years for regulatory compliance
  • Data Location: US region for data sovereignty
  • Access Logging: All log access tracked and auditable
  • Integrity Verification: Checksums and file validation
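The integrity-verification item can be illustrated with a streaming checksum. The source does not say which hash algorithm is used, so SHA-256 here is an assumption, as are the function names.

```python
import hashlib
import io
from typing import BinaryIO


def sha256_of_stream(stream: BinaryIO, chunk_size: int = 1024 * 1024) -> str:
    """Hash a binary stream in chunks so large logs need not fit in memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def checksums_match(local_bytes: bytes, downloaded_bytes: bytes) -> bool:
    """Compare the local payload with what was read back after upload."""
    local = sha256_of_stream(io.BytesIO(local_bytes))
    remote = sha256_of_stream(io.BytesIO(downloaded_bytes))
    return local == remote
```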

Sensitive Data Handling

  • PII Redaction: Personal information masked in logs
  • Credential Filtering: No passwords or tokens logged
  • Data Classification: Logs classified by sensitivity level
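A redaction pass over a log entry might look like the sketch below. The list of sensitive keys, the email pattern, and the replacement markers are all assumptions for illustration; the source states only that PII is masked and that passwords and tokens are never logged.

```python
import re

# Hypothetical list of keys whose values must never reach storage.
SENSITIVE_KEYS = {"password", "token", "access_token", "refresh_token", "authorization"}
# Deliberately simple email pattern; real PII detection needs more than this.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def redact(value):
    """Recursively mask sensitive keys and email addresses in a log entry."""
    if isinstance(value, dict):
        return {
            key: "[REDACTED]" if key.lower() in SENSITIVE_KEYS else redact(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact(item) for item in value]
    if isinstance(value, str):
        return EMAIL_RE.sub("[EMAIL]", value)
    return value
```

The function returns a new structure and leaves its input untouched, so it can be applied just before serialization without affecting application state.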

Troubleshooting & Operational Procedures

Common Issues

Missing Auth Logs:

  1. Check application authentication logging configuration
  2. Verify log file permissions and location
  3. Test authentication flow to generate logs
  4. Review log shipping process execution

S3 Upload Failures:

  1. Verify service account credentials
  2. Check bucket permissions and existence
  3. Test network connectivity to S3
  4. Review upload process logs

Large Log Files:

  1. Implement log rotation at application level
  2. Use streaming upload for large files
  3. Consider log compression before upload
  4. Monitor storage costs and usage

Debug Commands

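# Detailed shipping status
curl https://qwestly-monitoring-api.vercel.app/api/logs/status
# Human-readable report
curl https://qwestly-monitoring-api.vercel.app/api/logs/report
# List the top-level contents of the log bucket
aws s3 ls s3://qwestly--logs/
# Manually upload a local file to the expected key for a given date
aws s3 cp local-log-file.log s3://qwestly--logs/auth/2025-07-02/auth.log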

Technical Implementation

Core Log Shipping Service

  • Connects to Supabase using service role key
  • Extracts logs from Auth and API sources
  • Formats and structures log data with metadata
  • Uploads to S3 with proper encryption and storage class
  • Handles errors and retry logic

Key Methods:

class SupabaseLogShipper:
    def get_auth_logs(self, start_date, end_date): ...      # Fetch authentication logs
    def get_api_logs(self, start_date, end_date): ...       # Fetch API logs
    def ship_logs_to_s3(self, logs, log_type, date): ...    # Upload to S3
    def ship_daily_logs(self, target_date): ...             # Main execution method
    def setup_s3_bucket(self): ...                          # Initialize S3 configuration

CLI & Automation

  • setup - Initialize S3 bucket and lifecycle policies
  • test - Verify connections to Supabase and AWS
  • daily - Ship logs for specific date (default: yesterday)
  • historical - Ship logs for multiple days
  • monitor - Check shipping status and generate reports

Usage Examples:

python ship_logs.py daily
python ship_logs.py daily --date 2025-01-15
python ship_logs.py setup
python ship_logs.py test
python ship_logs.py historical --days 7
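The command structure above could be wired up with `argparse` roughly as follows. This is a sketch of one possible layout, not the actual contents of `ship_logs.py`; the help strings paraphrase the command list, and making `--days` required is an assumption.

```python
import argparse
from datetime import date, timedelta


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="ship_logs.py")
    sub = parser.add_subparsers(dest="command", required=True)

    sub.add_parser("setup", help="Initialize S3 bucket and lifecycle policies")
    sub.add_parser("test", help="Verify connections to Supabase and AWS")
    sub.add_parser("monitor", help="Check shipping status and generate reports")

    daily = sub.add_parser("daily", help="Ship logs for a specific date")
    # date.fromisoformat rejects anything that is not YYYY-MM-DD.
    daily.add_argument("--date", type=date.fromisoformat, default=None)

    historical = sub.add_parser("historical", help="Ship logs for multiple days")
    historical.add_argument("--days", type=int, required=True)
    return parser


def resolve_daily_date(args: argparse.Namespace, today: date) -> date:
    """Apply the documented default: yesterday when --date is omitted."""
    return args.date or today - timedelta(days=1)
```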

Monitoring & Observability

  • Check log shipping status for recent days
  • Identify missing logs and gaps
  • Generate comprehensive status reports
  • Calculate storage statistics and costs
  • Health checks for system components

API Endpoints:

  • GET /api/logs/health - System health check
  • GET /api/logs/status - Detailed shipping status
  • GET /api/logs/report - Human-readable report
  • POST /api/logs/ship - Manual trigger
  • GET /api/logs/supabase - Fetch Supabase logs from S3
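For scripted access, the endpoints above can be wrapped in a small request builder. The sketch assumes every endpoint lives under the same base URL as the curl examples in this document and that no authentication headers are needed; neither is confirmed by the source, and the response formats are not documented here.

```python
import urllib.request

# Base URL taken from the curl examples in this document.
BASE_URL = "https://qwestly-monitoring-api.vercel.app"

ENDPOINTS = {
    "health": ("GET", "/api/logs/health"),
    "status": ("GET", "/api/logs/status"),
    "report": ("GET", "/api/logs/report"),
    "ship": ("POST", "/api/logs/ship"),
    "supabase": ("GET", "/api/logs/supabase"),
}


def build_request(name: str) -> urllib.request.Request:
    """Build (but do not send) the HTTP request for a named endpoint."""
    method, path = ENDPOINTS[name]
    return urllib.request.Request(BASE_URL + path, method=method)
```

A request built this way can be sent with `urllib.request.urlopen(build_request("status"), timeout=10)`.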

Testing & Validation

  • Unit and integration tests for log collection, shipping, and monitoring
  • Manual verification of log completeness and S3 storage
  • API endpoint testing
  • Audit preparation and evidence collection

Cost Optimization

  • Compress logs before storage
  • Filter out non-essential log entries
  • Implement intelligent archiving
  • Monitor and adjust lifecycle policies
  • Use free/low-cost AWS tiers where possible

Future Enhancements

  • Real-time log analytics dashboard
  • Anomaly detection and alerting
  • Integration with external SIEM tools
  • Custom metrics and visualization
  • Parallel processing for large log volumes
  • Incremental shipping for real-time logs
  • Compression and deduplication
  • Enhanced error handling and retry logic
  • Machine learning for pattern detection
  • Automated threat intelligence
  • Compliance reporting automation
  • Multi-region storage replication
  • High-availability architecture
  • Auto-scaling based on volume

Support

  • Check the monitoring dashboard for current status
  • Review this documentation
  • Test manual log shipping via the API (POST /api/logs/ship)
  • Contact the engineering team via Slack #engineering