Log Shipping System
Document Version: 2.0
Date: July 2025
Author: Dominick Pham
Purpose: Comprehensive reference for Qwestly's log shipping system (user & engineering)
System Overview
The Qwestly Log Shipping System collects application logs (Auth and API only) from production systems, stores them in AWS S3, and monitors their availability for analysis, compliance, and operational insight. It is designed to meet retention and audit requirements cost-effectively.
Key Goals
- Operational Monitoring: Real-time system health and performance tracking
- Security Analysis: Authentication and authorization event monitoring
- Compliance: Audit trails and data retention
- Debugging: Historical log analysis for issue resolution
Architecture
graph TB
A[Application Services] --> B[Log Generation]
B --> C[Local Log Files]
C --> D[Log Shipping Process]
D --> E[AWS S3]
F[Monitoring API] --> G[Log Status Checking]
G --> E
G --> H[Status Dashboard]
I[Manual Triggers] --> D
J[Scheduled Jobs] --> D
E --> K[Log Analysis Tools]
E --> L[Compliance Reports]
E --> M[Alert Systems]
Log Types
Authentication Logs (auth)
- Purpose: Track user authentication, authorization, and security events
- Criticality: Critical - Must be present daily
- Content: Login attempts, permission changes, security violations
- Format: JSON with timestamp, user_id, action, result, metadata
Example Entry:
{
"timestamp": "2025-07-02T10:30:00Z",
"level": "INFO",
"type": "auth",
"user_id": "user_123",
"action": "login_attempt",
"result": "success",
"ip_address": "192.168.1.100",
"user_agent": "Mozilla/5.0...",
"metadata": {
"method": "email_password",
"session_id": "sess_abc123"
}
}
API Logs (api)
- Purpose: HTTP request/response tracking and performance monitoring
- Criticality: Optional - May not exist on low-traffic days
- Content: Request details, response codes, processing times, errors
- Format: Structured JSON with request/response metadata
Example Entry:
{
"timestamp": "2025-07-02T10:30:00Z",
"level": "INFO",
"type": "api",
"method": "GET",
"path": "/api/users",
"status_code": 200,
"duration_ms": 125,
"user_id": "user_123",
"request_id": "req_abc123",
"ip_address": "192.168.1.100"
}
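The two example entries suggest a simple completeness check. The sketch below is illustrative only: the `REQUIRED_FIELDS` table and the `missing_fields` helper are hypothetical names, and treating these particular fields as mandatory is an assumption drawn from the examples, not a documented schema.

```python
import json

# Assumed mandatory fields per log type, inferred from the example entries.
REQUIRED_FIELDS = {
    "auth": {"timestamp", "level", "type", "user_id", "action", "result"},
    "api": {"timestamp", "level", "type", "method", "path", "status_code", "duration_ms"},
}


def missing_fields(raw_entry: str) -> set[str]:
    """Return the assumed-required fields that are absent from one JSON log entry."""
    entry = json.loads(raw_entry)
    log_type = entry.get("type")
    if log_type not in REQUIRED_FIELDS:
        raise ValueError(f"unknown log type: {log_type!r}")
    return REQUIRED_FIELDS[log_type] - set(entry)
```

An empty result means the entry carries every field the check expects; a non-empty result names what is missing.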
Storage Structure
AWS S3 Organization
qwestly--logs/
├── auth/
│ ├── 2025-07-01/
│ │ └── auth.log
│ └── ...
└── api/
├── 2025-07-01/
│ └── api.log
└── ...
- Directory: {log_type}/{YYYY-MM-DD}/
- Filename: {log_type}.log
- Full Path: qwestly--logs/{log_type}/{YYYY-MM-DD}/{log_type}.log
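The path convention above can be expressed as a small helper. This is a minimal sketch; `build_log_key` and `LOG_TYPES` are hypothetical names, not part of the existing codebase.

```python
from datetime import date

LOG_TYPES = ("auth", "api")  # the two log types this system ships


def build_log_key(log_type: str, day: date) -> str:
    """Build the object key {log_type}/{YYYY-MM-DD}/{log_type}.log inside the bucket."""
    if log_type not in LOG_TYPES:
        raise ValueError(f"unsupported log type: {log_type!r}")
    return f"{log_type}/{day.isoformat()}/{log_type}.log"
```

For example, `build_log_key("auth", date(2025, 7, 1))` yields `auth/2025-07-01/auth.log`, which sits under the `qwestly--logs` bucket.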
Shipping Process
Automated Daily Shipping
- Collection: Gather logs from production systems
- Validation: Verify log format and completeness
- Compression: Gzip compress for storage efficiency
- Upload: Transfer to AWS S3
- Verification: Confirm successful upload
- Cleanup: Remove local temporary files
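The six steps above could be sketched roughly as follows. This is an illustration under stated assumptions, not the production implementation: `ship_file` is a hypothetical name, `upload` and `fetch` are stand-in callables for whatever S3 client calls the real service uses, the log file is assumed to hold one JSON object per line, and verification is assumed to mean comparing SHA-256 digests of the uploaded and re-read bytes.

```python
import gzip
import hashlib
import json
from pathlib import Path
from typing import Callable


def ship_file(
    local_path: Path,
    key: str,
    upload: Callable[[str, bytes], None],  # stand-in for an S3 upload call
    fetch: Callable[[str], bytes],         # stand-in for an S3 download call
) -> str:
    """Run collect, validate, compress, upload, verify, cleanup for one file."""
    raw = local_path.read_bytes()                          # Collection
    lines = [ln for ln in raw.splitlines() if ln.strip()]
    if not lines:
        raise ValueError(f"{local_path} contains no log entries")
    for ln in lines:                                       # Validation
        json.loads(ln)  # raises ValueError on a malformed entry
    compressed = gzip.compress(raw)                        # Compression
    upload(key, compressed)                                # Upload
    digest = hashlib.sha256(compressed).hexdigest()
    if hashlib.sha256(fetch(key)).hexdigest() != digest:   # Verification
        raise RuntimeError(f"checksum mismatch for {key}")
    local_path.unlink()                                    # Cleanup
    return digest
```

Passing the storage calls in as parameters keeps the sketch testable without AWS credentials; the local file is deleted only after verification succeeds.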
Manual Shipping
Available via the Monitoring API for on-demand log collection:
curl -X POST https://qwestly-monitoring-api.vercel.app/api/logs/ship
Shipping Schedule
- Automated: Daily at 02:00 UTC via CI/CD pipeline
- Manual: On-demand via API or dashboard
- Retry Logic: 3 attempts with exponential backoff
- Alerting: Notifications for failed shipping attempts
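The retry policy (3 attempts with exponential backoff) might look like the sketch below. The helper name `with_retries`, the 1-second base delay, and the doubling factor are assumptions for illustration; the document does not specify the actual delays.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    operation: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,  # assumed; the real delay is not documented
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation, retrying on any exception with doubling delays."""
    for attempt in range(attempts):
        try:
            return operation()
        except Exception:
            if attempt == attempts - 1:
                raise  # last attempt failed: surface the error for alerting
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")
```

With the defaults, a persistently failing operation is tried three times with waits of 1 and 2 seconds in between, after which the final exception propagates so that a failure notification can be sent.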
Monitoring & Status
- Recent Log Status: Last 7 days of log availability by type
- Missing Dates: Gaps in log collection timeline
- File Statistics: Entry counts and file sizes
- Shipping History: Success/failure tracking
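Gap detection over the recent window can be illustrated with a short sketch. It assumes the set of object keys already present in the bucket has been listed beforehand, that keys follow the {log_type}/{YYYY-MM-DD}/{log_type}.log layout, and that the 7-day window ends yesterday; `missing_dates` is a hypothetical helper name.

```python
from datetime import date, timedelta


def missing_dates(present_keys: set[str], log_type: str, today: date, days: int = 7) -> list[date]:
    """Return dates in the last `days` days (ending yesterday) that have no log object."""
    window = [today - timedelta(days=offset) for offset in range(days, 0, -1)]
    return [
        day
        for day in window
        if f"{log_type}/{day.isoformat()}/{log_type}.log" not in present_keys
    ]
```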
Health Indicators
| Status | Criteria | Action Required |
|---|---|---|
| Healthy | All expected logs present | None |
| Warning | Optional logs missing | Monitor |
| Critical | Auth logs missing | Immediate investigation |
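The table maps onto a small decision function. This is a sketch of the rule as written above, taking "optional logs" to mean API logs as described under Log Types; `health_status` is a hypothetical name.

```python
def health_status(auth_present: bool, api_present: bool) -> str:
    """Classify one day's log availability using the Health Indicators table."""
    if not auth_present:
        return "Critical"  # auth logs must be present daily
    if not api_present:
        return "Warning"   # API logs are optional on low-traffic days
    return "Healthy"
```

Auth is checked first so that a day missing both log types is reported as Critical rather than Warning.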
Dashboard Integration
- Real-time status: https://qwestly-status.vercel.app
- Status API: https://qwestly-monitoring-api.vercel.app/api/logs/status
Configuration
Environment Variables
# Supabase
SUPABASE_URL=https://project.supabase.co
SUPABASE_SERVICE_ROLE_KEY=service_role_key
# AWS
AWS_ACCESS_KEY_ID=access_key
AWS_SECRET_ACCESS_KEY=secret_key
AWS_S3_LOGS_BUCKET=qwestly--logs
# AWS_REGION is only needed for CLI, not the API
# Optional
SLACK_WEBHOOK_URL=https://hooks.slack.com/...
# Deployment Tracking
DEPLOYMENT_DATE=2025-07-01
# Log Directory (optional)
LOG_DIRECTORY=/var/log/qwestly
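One way to load and validate these variables at startup is sketched below. Which variables are strictly required is partly an assumption: `SLACK_WEBHOOK_URL` and `LOG_DIRECTORY` are marked optional above, the Supabase and AWS values are treated as required, and `DEPLOYMENT_DATE` and `AWS_REGION` are left out of the sketch. `ShipperConfig` and `load_config` are hypothetical names.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

REQUIRED_VARS = (
    "SUPABASE_URL",
    "SUPABASE_SERVICE_ROLE_KEY",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_S3_LOGS_BUCKET",
)


@dataclass(frozen=True)
class ShipperConfig:
    supabase_url: str
    supabase_service_role_key: str
    aws_access_key_id: str
    aws_secret_access_key: str
    logs_bucket: str
    slack_webhook_url: Optional[str] = None  # optional per the list above
    log_directory: Optional[str] = None      # optional per the list above


def load_config(env: Mapping[str, str] = os.environ) -> ShipperConfig:
    """Fail fast with a single error naming every missing required variable."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("missing required environment variables: " + ", ".join(missing))
    return ShipperConfig(
        supabase_url=env["SUPABASE_URL"],
        supabase_service_role_key=env["SUPABASE_SERVICE_ROLE_KEY"],
        aws_access_key_id=env["AWS_ACCESS_KEY_ID"],
        aws_secret_access_key=env["AWS_SECRET_ACCESS_KEY"],
        logs_bucket=env["AWS_S3_LOGS_BUCKET"],
        slack_webhook_url=env.get("SLACK_WEBHOOK_URL"),
        log_directory=env.get("LOG_DIRECTORY"),
    )
```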
AWS Permissions
- s3:PutObject - Upload log files
- s3:GetObject - Read existing logs for verification
- s3:ListBucket - Check log inventory
- s3:GetBucketLocation - Verify bucket access
Bucket Configuration
aws s3 mb s3://qwestly--logs
aws s3api put-bucket-lifecycle-configuration --bucket qwestly--logs --lifecycle-configuration file://log-lifecycle.json
Lifecycle Policy (log-lifecycle.json):
{
"Rules": [
{
"ID": "ID",
"Filter": { "Prefix": "" },
"Status": "Enabled",
"Expiration": { "Days": 2555 }
}
]
}
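The `Days` value of 2555 corresponds to 7 × 365, matching the 7-year retention period stated under Compliance Requirements (leap days are not counted). A minimal sketch for generating the same policy document from a retention period in years, with `lifecycle_policy` as a hypothetical helper name:

```python
import json


def lifecycle_policy(retention_years: int, rule_id: str = "ID") -> dict:
    """Build a lifecycle configuration that expires every object after N years."""
    return {
        "Rules": [
            {
                "ID": rule_id,
                "Filter": {"Prefix": ""},  # empty prefix: rule applies to all objects
                "Status": "Enabled",
                "Expiration": {"Days": retention_years * 365},
            }
        ]
    }


if __name__ == "__main__":
    # Write the file consumed by put-bucket-lifecycle-configuration.
    with open("log-lifecycle.json", "w", encoding="utf-8") as handle:
        json.dump(lifecycle_policy(7), handle, indent=2)
```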
Security & Compliance
Data Security
- Encryption in Transit: HTTPS/TLS for all transfers
- Encryption at Rest: AWS S3 default encryption
- Access Control: IAM-based access with least privilege
- Audit Trail: All access logged and monitored
Compliance Requirements
- Retention Period: 7 years for regulatory compliance
- Data Location: US region for data sovereignty
- Access Logging: All log access tracked and auditable
- Integrity Verification: Checksums and file validation
Sensitive Data Handling
- PII Redaction: Personal information masked in logs
- Credential Filtering: No passwords or tokens logged
- Data Classification: Logs classified by sensitivity level
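The redaction and credential-filtering rules above could be applied before an entry is written, roughly as sketched here. The key list, the email pattern, and the `[EMAIL]` placeholder are illustrative assumptions; the actual masking rules used in production are not documented in this section.

```python
import re

# Assumed set of keys whose values must never reach the logs.
SENSITIVE_KEYS = {"password", "token", "access_token", "refresh_token", "authorization"}
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def redact(value):
    """Recursively drop credential fields and mask email addresses in strings."""
    if isinstance(value, dict):
        return {
            key: redact(item)
            for key, item in value.items()
            if key.lower() not in SENSITIVE_KEYS
        }
    if isinstance(value, list):
        return [redact(item) for item in value]
    if isinstance(value, str):
        return EMAIL_RE.sub("[EMAIL]", value)
    return value
```

Dropping credential keys entirely, rather than masking their values, reflects the "no passwords or tokens logged" rule; email masking is only one example of PII handling.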
Troubleshooting & Operational Procedures
Common Issues
Missing Auth Logs:
- Check application authentication logging configuration
- Verify log file permissions and location
- Test authentication flow to generate logs
- Review log shipping process execution
S3 Upload Failures:
- Verify service account credentials
- Check bucket permissions and existence
- Test network connectivity to S3
- Review upload process logs
Large Log Files:
- Implement log rotation at application level
- Use streaming upload for large files
- Consider log compression before upload
- Monitor storage costs and usage
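For the compression suggestion above, a large file can be gzip-compressed in fixed-size chunks so that it never has to fit in memory. This is a minimal sketch; `gzip_file` is a hypothetical helper and the 1 MiB chunk size is an arbitrary choice.

```python
import gzip
import shutil
from pathlib import Path


def gzip_file(source: Path, chunk_size: int = 1024 * 1024) -> Path:
    """Compress source to source.gz, streaming chunk_size bytes at a time."""
    target = source.with_name(source.name + ".gz")
    with source.open("rb") as src, gzip.open(target, "wb") as dst:
        shutil.copyfileobj(src, dst, chunk_size)
    return target
```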
Debug Commands
curl https://qwestly-monitoring-api.vercel.app/api/logs/status
curl https://qwestly-monitoring-api.vercel.app/api/logs/report
aws s3 ls s3://qwestly--logs/
aws s3 cp local-log-file.log s3://qwestly--logs/auth/2025-07-02/
Technical Implementation
Core Log Shipping Service
- Connects to Supabase using service role key
- Extracts logs from Auth and API sources
- Formats and structures log data with metadata
- Uploads to S3 with proper encryption and storage class
- Handles errors and retry logic
Key Methods:
class SupabaseLogShipper:
    def get_auth_logs(self, start_date, end_date): ...  # Fetch authentication logs
    def get_api_logs(self, start_date, end_date): ...  # Fetch API logs
    def ship_logs_to_s3(self, logs, log_type, date): ...  # Upload to S3
    def ship_daily_logs(self, target_date): ...  # Main execution method
    def setup_s3_bucket(self): ...  # Initialize S3 configuration
CLI & Automation
- setup - Initialize S3 bucket and lifecycle policies
- test - Verify connections to Supabase and AWS
- daily - Ship logs for a specific date (default: yesterday)
- historical - Ship logs for multiple days
- monitor - Check shipping status and generate reports
Usage Examples:
python ship_logs.py daily
python ship_logs.py daily --date 2025-01-15
python ship_logs.py setup
python ship_logs.py test
python ship_logs.py historical --days 7
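A command-line interface accepting the invocations above could be declared with `argparse` as in the sketch below. This is not the actual `ship_logs.py` source; `build_parser` and `resolve_target_date` are hypothetical names, and making `--days` mandatory for `historical` is an assumption, since no default is documented.

```python
import argparse
from datetime import date, timedelta


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="ship_logs.py")
    commands = parser.add_subparsers(dest="command", required=True)
    commands.add_parser("setup", help="Initialize S3 bucket and lifecycle policies")
    commands.add_parser("test", help="Verify connections to Supabase and AWS")
    daily = commands.add_parser("daily", help="Ship logs for a specific date")
    daily.add_argument("--date", type=date.fromisoformat, default=None,
                       help="YYYY-MM-DD (default: yesterday)")
    historical = commands.add_parser("historical", help="Ship logs for multiple days")
    historical.add_argument("--days", type=int, required=True)
    commands.add_parser("monitor", help="Check shipping status and generate reports")
    return parser


def resolve_target_date(args: argparse.Namespace, today: date) -> date:
    """Apply the documented default: yesterday when --date is omitted."""
    return args.date or today - timedelta(days=1)
```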
Monitoring & Observability
- Check log shipping status for recent days
- Identify missing logs and gaps
- Generate comprehensive status reports
- Calculate storage statistics and costs
- Health checks for system components
API Endpoints:
- GET /api/logs/health - System health check
- GET /api/logs/status - Detailed shipping status
- GET /api/logs/report - Human-readable report
- POST /api/logs/ship - Manual trigger
- GET /api/logs/supabase - Fetch Supabase logs from S3
Testing & Validation
- Unit and integration tests for log collection, shipping, and monitoring
- Manual verification of log completeness and S3 storage
- API endpoint testing
- Audit preparation and evidence collection
Cost Optimization
- Compress logs before storage
- Filter out non-essential log entries
- Implement intelligent archiving
- Monitor and adjust lifecycle policies
- Use free/low-cost AWS tiers where possible
Future Enhancements
- Real-time log analytics dashboard
- Anomaly detection and alerting
- Integration with external SIEM tools
- Custom metrics and visualization
- Parallel processing for large log volumes
- Incremental shipping for real-time logs
- Compression and deduplication
- Enhanced error handling and retry logic
- Machine learning for pattern detection
- Automated threat intelligence
- Compliance reporting automation
- Multi-region storage replication
- High-availability architecture
- Auto-scaling based on volume
Support
- Check the monitoring dashboard for current status
- Review this documentation
- Test manual log shipping via the API
- Contact the engineering team via Slack in #engineering