CloudSnooze - Data Model

This document defines the primary data structures used in CloudSnooze, including configuration, metrics, and communication protocols.

Configuration Model

The main configuration for CloudSnooze is stored in JSON format at /etc/snooze/snooze.json:

// Config represents the complete configuration
type Config struct {
    // General settings
    CheckIntervalSeconds int     `json:"check_interval_seconds"`
    NaptimeMinutes       int     `json:"naptime_minutes"`
    
    // Thresholds
    CPUThresholdPercent    float64 `json:"cpu_threshold_percent"`
    MemoryThresholdPercent float64 `json:"memory_threshold_percent"`
    NetworkThresholdKBps   float64 `json:"network_threshold_kbps"`
    DiskIOThresholdKBps    float64 `json:"disk_io_threshold_kbps"`
    InputIdleThresholdSecs int     `json:"input_idle_threshold_secs"`
    
    // GPU/Accelerator settings
    GPUMonitoringEnabled bool    `json:"gpu_monitoring_enabled"`
    GPUThresholdPercent  float64 `json:"gpu_threshold_percent"`
    
    // AWS settings
    AWSRegion          string `json:"aws_region"`
    EnableInstanceTags bool   `json:"enable_instance_tags"`
    TaggingPrefix      string `json:"tagging_prefix"`
    
    // Logging settings
    Logging LoggingConfig `json:"logging"`
    
    // Advanced settings
    MonitoringMode string `json:"monitoring_mode"` // "basic" or "advanced"
}

// LoggingConfig defines logging behavior
type LoggingConfig struct {
    LogLevel           string `json:"log_level"` // "debug", "info", "warn", "error"
    EnableFileLogging  bool   `json:"enable_file_logging"`
    LogFilePath        string `json:"log_file_path"`
    EnableSyslog       bool   `json:"enable_syslog"`
    EnableCloudWatch   bool   `json:"enable_cloudwatch"`
    CloudWatchLogGroup string `json:"cloudwatch_log_group"`
}
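Before the daemon acts on a loaded configuration, it should reject obviously invalid values. The following is a minimal sketch, not the CloudSnooze implementation. Validate is a hypothetical method on a trimmed copy of Config. The accepted log_level and monitoring_mode values come from the struct comments above. The other rules (a positive interval and naptime, and percentage thresholds between 0 and 100) are assumptions.

```go
package main

import (
	"errors"
	"fmt"
)

// Trimmed copies of LoggingConfig and Config holding only the fields checked here.
type LoggingConfig struct {
	LogLevel string `json:"log_level"`
}

type Config struct {
	CheckIntervalSeconds   int           `json:"check_interval_seconds"`
	NaptimeMinutes         int           `json:"naptime_minutes"`
	CPUThresholdPercent    float64       `json:"cpu_threshold_percent"`
	MemoryThresholdPercent float64       `json:"memory_threshold_percent"`
	GPUThresholdPercent    float64       `json:"gpu_threshold_percent"`
	Logging                LoggingConfig `json:"logging"`
	MonitoringMode         string        `json:"monitoring_mode"`
}

// Validate (hypothetical) collects every problem it finds into a single error,
// or returns nil when the configuration looks usable.
func (c Config) Validate() error {
	var errs []error
	if c.CheckIntervalSeconds <= 0 {
		errs = append(errs, fmt.Errorf("check_interval_seconds must be positive, got %d", c.CheckIntervalSeconds))
	}
	if c.NaptimeMinutes <= 0 {
		errs = append(errs, fmt.Errorf("naptime_minutes must be positive, got %d", c.NaptimeMinutes))
	}
	// Assumption: percentage thresholds must lie in [0, 100].
	percents := []struct {
		key   string
		value float64
	}{
		{"cpu_threshold_percent", c.CPUThresholdPercent},
		{"memory_threshold_percent", c.MemoryThresholdPercent},
		{"gpu_threshold_percent", c.GPUThresholdPercent},
	}
	for _, p := range percents {
		if p.value < 0 || p.value > 100 {
			errs = append(errs, fmt.Errorf("%s must be between 0 and 100, got %.1f", p.key, p.value))
		}
	}
	switch c.Logging.LogLevel {
	case "debug", "info", "warn", "error":
	default:
		errs = append(errs, fmt.Errorf("logging.log_level %q is not debug, info, warn or error", c.Logging.LogLevel))
	}
	switch c.MonitoringMode {
	case "basic", "advanced":
	default:
		errs = append(errs, fmt.Errorf("monitoring_mode %q is not basic or advanced", c.MonitoringMode))
	}
	return errors.Join(errs...) // nil when errs is empty (Go 1.20+)
}
```

Collecting all problems rather than stopping at the first one lets an operator fix a bad snooze.json in a single pass.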

Default configuration (snooze.json):

{
  "check_interval_seconds": 60,
  "naptime_minutes": 30,
  "cpu_threshold_percent": 10.0,
  "memory_threshold_percent": 30.0,
  "network_threshold_kbps": 50.0,
  "disk_io_threshold_kbps": 100.0,
  "input_idle_threshold_secs": 900,
  "gpu_monitoring_enabled": true,
  "gpu_threshold_percent": 5.0,
  "aws_region": "us-east-1",
  "enable_instance_tags": true,
  "tagging_prefix": "CloudSnooze",
  "logging": {
    "log_level": "info",
    "enable_file_logging": true,
    "log_file_path": "/var/log/cloudsnooze.log",
    "enable_syslog": false,
    "enable_cloudwatch": false,
    "cloudwatch_log_group": "CloudSnooze"
  },
  "monitoring_mode": "basic"
}
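encoding/json leaves struct fields untouched when they are absent from the input. One way to apply these defaults, then, is to decode snooze.json on top of a Config that already holds them, so a partial file overrides only the keys it contains. DefaultConfig, ParseConfig and LoadConfig are hypothetical names, and the struct is trimmed to a few fields for brevity.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// Trimmed copies of LoggingConfig and Config; the remaining fields follow the same pattern.
type LoggingConfig struct {
	LogLevel    string `json:"log_level"`
	LogFilePath string `json:"log_file_path"`
}

type Config struct {
	CheckIntervalSeconds int           `json:"check_interval_seconds"`
	NaptimeMinutes       int           `json:"naptime_minutes"`
	CPUThresholdPercent  float64       `json:"cpu_threshold_percent"`
	AWSRegion            string        `json:"aws_region"`
	Logging              LoggingConfig `json:"logging"`
	MonitoringMode       string        `json:"monitoring_mode"`
}

// DefaultConfig (hypothetical) mirrors the values in the default snooze.json.
func DefaultConfig() Config {
	return Config{
		CheckIntervalSeconds: 60,
		NaptimeMinutes:       30,
		CPUThresholdPercent:  10.0,
		AWSRegion:            "us-east-1",
		Logging: LoggingConfig{
			LogLevel:    "info",
			LogFilePath: "/var/log/cloudsnooze.log",
		},
		MonitoringMode: "basic",
	}
}

// ParseConfig decodes data over the defaults. Keys missing from the file,
// including keys inside the nested "logging" object, keep their default values.
func ParseConfig(data []byte) (Config, error) {
	cfg := DefaultConfig()
	if err := json.Unmarshal(data, &cfg); err != nil {
		return Config{}, fmt.Errorf("parse config: %w", err)
	}
	return cfg, nil
}

// LoadConfig reads and parses a file such as /etc/snooze/snooze.json.
func LoadConfig(path string) (Config, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return Config{}, err
	}
	return ParseConfig(data)
}
```

For example, a file containing only {"cpu_threshold_percent": 20} yields a 20% CPU threshold with every other value taken from the defaults.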

System Metrics Model

Collected metrics are structured as follows:

// SystemMetrics represents a complete set of system measurements
type SystemMetrics struct {
    Timestamp        time.Time       `json:"timestamp"`
    CPUPercent       float64         `json:"cpu_percent"`
    MemoryPercent    float64         `json:"memory_percent"`
    NetworkKBps      float64         `json:"network_kbps"`
    DiskIOKBps       float64         `json:"disk_io_kbps"`
    InputIdleSecs    int             `json:"input_idle_secs"`
    GPUMetrics       []GPUMetric     `json:"gpu_metrics,omitempty"`
    IdleStatus       bool            `json:"idle_status"` // true if system is idle
    IdleReason       string          `json:"idle_reason,omitempty"`
}

// GPUMetric represents metrics for a single GPU
type GPUMetric struct {
    Type        string  `json:"type"` // "NVIDIA", "AMD", etc.
    ID          int     `json:"id"`
    Name        string  `json:"name"`
    Utilization float64 `json:"utilization"`
    MemoryUsed  uint64  `json:"memory_used"`
    MemoryTotal uint64  `json:"memory_total"`
    Temperature float64 `json:"temperature,omitempty"`
}

Snooze Event Model

When an instance is snoozed (stopped), an event is recorded:

// SnoozeEvent records a single instance stop and the metrics that triggered it
type SnoozeEvent struct {
    Timestamp     time.Time         `json:"timestamp"`
    InstanceID    string            `json:"instance_id"`
    InstanceType  string            `json:"instance_type"`
    Region        string            `json:"region"`
    Reason        string            `json:"reason"`
    Metrics       SystemMetrics     `json:"metrics"`
    Tags          map[string]string `json:"tags,omitempty"`
    NaptimeMins   int               `json:"naptime_mins"`
}

AWS Tags Model

When tagging is enabled, the following tags are applied to instances:

CloudSnooze:State      = "Snoozed"
CloudSnooze:Timestamp  = "2025-04-19T15:30:45Z"
CloudSnooze:Reason     = "CPU usage 2.3% below threshold 10.0%; Memory usage 15.7% below threshold 30.0%"
CloudSnooze:Version    = "1.0.0"

The prefix (CloudSnooze) comes from the tagging_prefix setting, and each tag key has the form <prefix>:<Name>.

Socket API Communication Model

Components communicate over a Unix socket API that exchanges JSON messages:

Request Format

{
  "command": "STATUS|CONFIG_GET|CONFIG_SET|HISTORY|START|STOP|RESTART|SIMULATE",
  "params": {
    "key1": "value1",
    "key2": "value2"
  }
}

Response Format

{
  "status": "success|error",
  "data": {},
  "error": "Error message if status is error"
}

Supported commands:

  1. STATUS - Get current system status and metrics
  2. CONFIG_GET - Retrieve current configuration
  3. CONFIG_SET param value - Update a configuration parameter
  4. HISTORY - Get snooze history
  5. START|STOP|RESTART - Control the daemon
  6. SIMULATE - Run a simulation with specified metrics

History Storage Model

Snooze history is stored in a local SQLite database in advanced mode, or in a JSON file in basic mode:

// HistoryEntry represents one row of the history database
type HistoryEntry struct {
    ID          int64     `json:"id"`
    Timestamp   time.Time `json:"timestamp"`
    InstanceID  string    `json:"instance_id"`
    Reason      string    `json:"reason"`
    MetricsJSON string    `json:"metrics_json"` // JSON serialized metrics
}

In basic mode, a simple file-based history is maintained at /var/lib/snooze/history.json:

{
  "events": [
    {
      "timestamp": "2025-04-19T14:20:30Z",
      "instance_id": "i-1234567890abcdef0",
      "reason": "CPU usage 2.3% below threshold 10.0%",
      "naptime_mins": 30
    },
    {
      "timestamp": "2025-04-18T23:15:10Z",
      "instance_id": "i-1234567890abcdef0",
      "reason": "All metrics below thresholds",
      "naptime_mins": 30
    }
  ]
}