Monitoring and Metrics
This guide covers monitoring parsedmarc-go using Prometheus metrics and setting up alerting.
Prometheus Metrics
parsedmarc-go exposes Prometheus-compatible metrics at the /metrics endpoint.
Available Metrics
Processing Metrics
# Total reports processed
parsedmarc_reports_processed_total{type="aggregate|forensic|tls"} counter
# Failed report processing
parsedmarc_reports_failed_total{type="aggregate|forensic|tls", reason="parse_error|db_error|validation_error"} counter
# Processing duration
parsedmarc_report_processing_duration_seconds{type="aggregate|forensic|tls"} histogram
# Current processing queue size
parsedmarc_processing_queue_size gauge
HTTP Metrics
# HTTP requests
parsedmarc_http_requests_total{method="GET|POST", endpoint="/health|/metrics|/dmarc/report", status="200|400|500"} counter
# HTTP request duration
parsedmarc_http_request_duration_seconds{method="GET|POST", endpoint="/health|/metrics|/dmarc/report"} histogram
# Active HTTP connections
parsedmarc_http_connections_active gauge
# Upload size
parsedmarc_http_upload_size_bytes histogram
IMAP Metrics
# IMAP messages processed
parsedmarc_imap_messages_processed_total{mailbox="INBOX"} counter
# IMAP connection status
parsedmarc_imap_connection_status{server="imap.example.com"} gauge
# IMAP check duration
parsedmarc_imap_check_duration_seconds{mailbox="INBOX"} histogram
# Messages in mailbox
parsedmarc_imap_messages_in_mailbox{mailbox="INBOX"} gauge
Database Metrics
# ClickHouse operations
parsedmarc_clickhouse_operations_total{operation="insert|select", table="dmarc_aggregate_reports|dmarc_aggregate_records|dmarc_forensic_reports"} counter
# ClickHouse operation duration
parsedmarc_clickhouse_operation_duration_seconds{operation="insert|select"} histogram
# ClickHouse connection pool
parsedmarc_clickhouse_connections_active gauge
parsedmarc_clickhouse_connections_idle gauge
System Metrics
# Memory usage
parsedmarc_memory_usage_bytes gauge
# CPU usage
parsedmarc_cpu_usage_percent gauge
# Goroutines
parsedmarc_goroutines_total gauge
# File descriptors
parsedmarc_file_descriptors_open gauge
Metrics Endpoint
Access metrics at:
curl http://localhost:8080/metrics
Example output:
# HELP parsedmarc_reports_processed_total Total number of reports processed
# TYPE parsedmarc_reports_processed_total counter
parsedmarc_reports_processed_total{type="aggregate"} 1247
parsedmarc_reports_processed_total{type="forensic"} 23
# HELP parsedmarc_http_requests_total Total HTTP requests
# TYPE parsedmarc_http_requests_total counter
parsedmarc_http_requests_total{method="POST",endpoint="/dmarc/report",status="200"} 856
parsedmarc_http_requests_total{method="GET",endpoint="/health",status="200"} 12450
# HELP parsedmarc_report_processing_duration_seconds Time spent processing reports
# TYPE parsedmarc_report_processing_duration_seconds histogram
parsedmarc_report_processing_duration_seconds_bucket{type="aggregate",le="0.1"} 234
parsedmarc_report_processing_duration_seconds_bucket{type="aggregate",le="0.5"} 1156
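For ad-hoc checks outside Prometheus, the text format shown above can be parsed with a few lines of Python. The following is a minimal sketch, not part of parsedmarc-go: the helper names `parse_metrics` and `total` are hypothetical, and the parser only handles plain sample lines (no exemplars or escaped-quote edge cases beyond the basics).

```python
import re

# One sample line: metric name, optional {labels}, then the value.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
# One label pair inside the braces: key="value".
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Parse Prometheus text exposition into {(name, labels): value}.

    labels is a sorted tuple of (key, value) pairs so it can serve as a
    dict key. Comment lines (# HELP, # TYPE) and blank lines are skipped.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # ignore anything that is not a sample line
        name, raw_labels, raw_value = match.groups()
        labels = tuple(sorted(LABEL_RE.findall(raw_labels or "")))
        samples[(name, labels)] = float(raw_value)
    return samples


def total(samples, name, **match_labels):
    """Sum every sample of `name` whose labels include match_labels."""
    wanted = set(match_labels.items())
    return sum(
        value
        for (sample_name, labels), value in samples.items()
        if sample_name == name and wanted <= set(labels)
    )
```

Feeding the output of the curl command above into `parse_metrics` and calling `total(samples, "parsedmarc_reports_processed_total")` gives the report count summed across all report types.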
Prometheus Configuration
Prometheus Setup
Add parsedmarc-go to your prometheus.yml:
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: 'parsedmarc-go'
    static_configs:
      - targets: ['localhost:8080']
    metrics_path: /metrics
    scrape_interval: 30s
    scrape_timeout: 10s
Docker Compose with Prometheus
version: '3.8'
services:
  parsedmarc-go:
    image: parsedmarc-go:latest
    ports:
      - "8080:8080"
    volumes:
      - ./config.yaml:/app/config.yaml
  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus_data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--web.console.libraries=/etc/prometheus/console_libraries'
      - '--web.console.templates=/etc/prometheus/consoles'
  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    environment:
      - GF_INSTALL_PLUGINS=grafana-clickhouse-datasource
    volumes:
      - grafana_data:/var/lib/grafana
    depends_on:
      - prometheus
volumes:
  prometheus_data:
  grafana_data:
Grafana Dashboards
Prometheus Data Source
Add Prometheus as a data source in Grafana:
- URL: http://prometheus:9090
- Access: Server (default)
Key Dashboards
Processing Performance
# Reports processed per second
rate(parsedmarc_reports_processed_total[5m])
# Processing success rate (sum() drops the "reason" label so both sides match)
sum(rate(parsedmarc_reports_processed_total[5m])) /
(sum(rate(parsedmarc_reports_processed_total[5m])) + sum(rate(parsedmarc_reports_failed_total[5m]))) * 100
# Average processing time
rate(parsedmarc_report_processing_duration_seconds_sum[5m]) /
rate(parsedmarc_report_processing_duration_seconds_count[5m])
HTTP Performance
# HTTP requests per second
rate(parsedmarc_http_requests_total[5m])
# HTTP error rate (aggregate first; dividing per series would always yield 100%)
sum(rate(parsedmarc_http_requests_total{status=~"4..|5.."}[5m])) /
sum(rate(parsedmarc_http_requests_total[5m])) * 100
# Average response time
rate(parsedmarc_http_request_duration_seconds_sum[5m]) /
rate(parsedmarc_http_request_duration_seconds_count[5m])
System Resources
# Memory usage
parsedmarc_memory_usage_bytes
# CPU usage
parsedmarc_cpu_usage_percent
# Active connections
parsedmarc_http_connections_active
Alerting
AlertManager Configuration
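Create alert rules in alert_rules.yml. Prometheus evaluates these rules once the file is listed under rule_files in prometheus.yml; Alertmanager only routes the alerts that fire: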
groups:
  - name: parsedmarc-go
    rules:
      - alert: ParsedmarcHighErrorRate
        expr: rate(parsedmarc_reports_failed_total[5m]) > 0.1
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "High error rate in parsedmarc-go"
          description: "parsedmarc-go has error rate of {{ $value }} errors per second"
      - alert: ParsedmarcDown
        expr: up{job="parsedmarc-go"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "parsedmarc-go is down"
          description: "parsedmarc-go has been down for more than 1 minute"
      - alert: ParsedmarcHighMemoryUsage
        expr: parsedmarc_memory_usage_bytes > 500000000 # 500MB
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High memory usage in parsedmarc-go"
          description: "parsedmarc-go is using {{ $value | humanize }}B of memory"
      - alert: ParsedmarcLowProcessingRate
        expr: rate(parsedmarc_reports_processed_total[10m]) < 0.1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Low processing rate in parsedmarc-go"
          description: "parsedmarc-go is processing only {{ $value }} reports per second"
      - alert: ParsedmarcQueueBacklog
        expr: parsedmarc_processing_queue_size > 100
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "Processing queue backlog in parsedmarc-go"
          description: "parsedmarc-go has {{ $value }} items in processing queue"
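The for: field in each rule delays firing until the expression has been true continuously for that long, which filters out short spikes. The sketch below models that behaviour for a simple "value > threshold" rule. It is a simplified illustration, not Prometheus code; `alert_states` is a hypothetical helper and the samples stand in for successive rule evaluations.

```python
def alert_states(samples, threshold, for_seconds):
    """Return 'inactive', 'pending' or 'firing' for each (timestamp, value).

    Simplified model of a rule like `expr: metric > threshold` with a
    `for:` clause: the alert is pending while the condition holds and
    becomes firing once it has held for at least for_seconds.
    """
    states = []
    active_since = None  # timestamp at which the condition first became true
    for timestamp, value in samples:
        if value > threshold:
            if active_since is None:
                active_since = timestamp
            held = timestamp - active_since
            states.append("firing" if held >= for_seconds else "pending")
        else:
            active_since = None  # a single false evaluation resets the timer
            states.append("inactive")
    return states


# Queue size evaluated every 30 seconds against the ParsedmarcQueueBacklog
# rule (threshold 100, for: 2m). The values are made up for illustration.
queue_samples = [(0, 50), (30, 150), (60, 160), (90, 170),
                 (120, 180), (150, 190), (180, 90), (210, 200)]
states = alert_states(queue_samples, threshold=100, for_seconds=120)
```

In this run the alert only reaches firing at t=150, two minutes after the queue first exceeded 100, and the dip at t=180 resets it to inactive.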
Health Checks
Basic Health Check
curl http://localhost:8080/health
Response:
{
  "status": "ok",
  "timestamp": "2024-12-01T10:30:45Z",
  "version": "1.0.0",
  "uptime": "24h30m15s"
}
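A script-based check can go one step further than the HTTP status code and inspect the response body. The following sketch assumes the response has exactly the fields shown above and that uptime is a Go-style duration string; `parse_uptime`, `is_healthy` and the `min_uptime_seconds` guard are hypothetical names introduced for illustration.

```python
import json
import re

# Only the units that appear in durations like "24h30m15s" (plus ms).
DURATION_UNITS = {"h": 3600.0, "m": 60.0, "s": 1.0, "ms": 0.001}
DURATION_PART = re.compile(r"(\d+(?:\.\d+)?)(ms|h|m|s)")


def parse_uptime(text):
    """Convert a Go-style duration such as '24h30m15s' to seconds."""
    parts = DURATION_PART.findall(text)
    if not parts or "".join(number + unit for number, unit in parts) != text:
        raise ValueError(f"unrecognised duration: {text!r}")
    return sum(float(number) * DURATION_UNITS[unit] for number, unit in parts)


def is_healthy(body, min_uptime_seconds=0.0):
    """Return True when the /health body reports status 'ok'.

    min_uptime_seconds is an optional extra guard: a very small uptime
    can indicate that the process keeps restarting.
    """
    payload = json.loads(body)
    if payload.get("status") != "ok":
        return False
    return parse_uptime(payload.get("uptime", "0s")) >= min_uptime_seconds
```

The body can come from the curl command above or from `urllib.request.urlopen("http://localhost:8080/health").read()`.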
Kubernetes Health Checks
apiVersion: v1
kind: Pod
metadata:
  name: parsedmarc-go
spec:
  containers:
    - name: parsedmarc-go
      image: parsedmarc-go:latest
      livenessProbe:
        httpGet:
          path: /health
          port: 8080
        initialDelaySeconds: 30
        periodSeconds: 10
        timeoutSeconds: 5
        failureThreshold: 3
      readinessProbe:
        httpGet:
          path: /health
          port: 8080
        initialDelaySeconds: 5
        periodSeconds: 5
        timeoutSeconds: 3
        failureThreshold: 2
Docker Health Check
HEALTHCHECK --interval=30s --timeout=3s --start-period=10s --retries=3 \
CMD curl -f http://localhost:8080/health || exit 1
Log Monitoring
Structured Logging
parsedmarc-go uses structured JSON logging:
{
  "timestamp": "2024-12-01T10:30:45Z",
  "level": "info",
  "message": "Processing DMARC report",
  "report_id": "12345678901234567890",
  "org_name": "google.com",
  "domain": "example.com",
  "records": 15,
  "processing_time_ms": 125
}
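Because each log entry is a JSON object, slow reports can be picked out with a short script. This is a minimal sketch that assumes one JSON object per line with the field names from the example above; `slow_reports` and its default threshold are hypothetical.

```python
import json


def slow_reports(log_lines, threshold_ms=1000):
    """Yield (report_id, domain, processing_time_ms) for slow log entries.

    Lines that are not JSON objects or that lack processing_time_ms are
    skipped, since a log stream may interleave other message types.
    """
    for line in log_lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue
        if not isinstance(entry, dict):
            continue
        elapsed = entry.get("processing_time_ms")
        if isinstance(elapsed, (int, float)) and elapsed >= threshold_ms:
            yield entry.get("report_id"), entry.get("domain"), elapsed
```

Passing an open log file (or `sys.stdin`) as `log_lines` works because the function only iterates over lines.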
Performance Monitoring
Key Performance Indicators
- Throughput: Reports processed per second
- Latency: Average processing time per report
- Error Rate: Percentage of failed operations
- Resource Usage: CPU, memory, and network utilization
- Queue Depth: Backlog of pending operations
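The first three indicators can be derived from two scrapes of the counters documented earlier. The sketch below assumes the values have already been summed across labels and that parsedmarc_reports_processed_total counts successful reports only, which is how the success-rate query in this guide treats it. The function name `kpis` and the snapshot keys are hypothetical.

```python
def kpis(before, after, interval_seconds):
    """Derive KPIs from two counter snapshots taken interval_seconds apart.

    Each snapshot is a dict holding label-summed values for:
      processed       parsedmarc_reports_processed_total
      failed          parsedmarc_reports_failed_total
      duration_sum    parsedmarc_report_processing_duration_seconds_sum
      duration_count  parsedmarc_report_processing_duration_seconds_count
    Counter resets between the two snapshots are not handled here.
    """
    delta = {key: after[key] - before[key] for key in before}
    attempts = delta["processed"] + delta["failed"]
    return {
        "throughput_per_s": delta["processed"] / interval_seconds,
        "error_rate_pct": 100.0 * delta["failed"] / attempts if attempts else 0.0,
        "avg_latency_s": (
            delta["duration_sum"] / delta["duration_count"]
            if delta["duration_count"] else 0.0
        ),
    }
```

This mirrors what the rate() based dashboard queries compute, without the extrapolation and reset handling that Prometheus applies.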
To spot-check throughput from the command line, read the processed-reports counter directly:
curl -s http://localhost:8080/metrics | grep parsedmarc_reports_processed_total
Debug Metrics
Enable debug metrics in development:
export PARSEDMARC_DEBUG_METRICS=true
parsedmarc-go -daemon
This exposes additional debug metrics:
- Garbage collection statistics
- Memory allocation details
- Goroutine stack traces
Performance Profiling
With the pprof endpoint enabled, collect profiles using go tool pprof:
# CPU profile
go tool pprof http://localhost:8080/debug/pprof/profile
# Memory profile
go tool pprof http://localhost:8080/debug/pprof/heap
# Goroutine profile
go tool pprof http://localhost:8080/debug/pprof/goroutine
For more monitoring and observability best practices, see the Configuration Guide.