CloudSnooze Restart Logic for External Tools
This document provides detailed guidance on implementing restart logic for instances that have been stopped by CloudSnooze.
New Restart Capability
CloudSnooze now supports explicit restart authorization for external tools through additional tags. When an instance is stopped, CloudSnooze can be configured to tag it with a RestartAllowed flag and optionally specify which external service IDs are allowed to perform restarts.
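The authorization rule described above can be sketched as a pure function over an instance's tags. This is a minimal illustration, not CloudSnooze's own code: the function name `is_restart_authorized` and the plain-dict input shape are assumptions made for the example, and the matching rules (exact `true`, comma-separated IDs) mirror the AWS example later in this document.

```python
def is_restart_authorized(tags, service_id, tag_prefix="CloudSnooze"):
    """Return True if `service_id` may restart an instance carrying `tags`.

    `tags` is a plain dict of tag key -> tag value (an assumed input shape).
    """
    # The RestartAllowed flag must be present and set to "true".
    if tags.get(f"{tag_prefix}:RestartAllowed") != "true":
        return False

    # AllowedRestarters is optional. When the tag is absent, any service
    # may restart; when present, the caller must be listed in it.
    key = f"{tag_prefix}:AllowedRestarters"
    if key not in tags:
        return True
    allowed = [s.strip() for s in tags[key].split(",") if s.strip()]
    return service_id in allowed
```

Keeping the check free of cloud API calls makes it easy to unit test and to reuse across providers.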
Overview
While CloudSnooze focuses on stopping idle instances, external tools like provisioners may need to implement logic to restart these instances when:
- Users need to access the instance
- Scheduled jobs need to run
- System maintenance requires the instance to be online
Implementation Patterns
1. On-Demand Restart
The simplest pattern, in which an instance is restarted only when a user or service explicitly requests it:
User/Service Request → Check if CloudSnooze-stopped → Verify restart authorization → Restart → Update Tags
Example Implementation (AWS)
import boto3
from datetime import datetime, timezone


def restart_cloudsnooze_instance(instance_id, service_id, tag_prefix='CloudSnooze'):
    """Restart an instance stopped by CloudSnooze; return (success, message)."""
    ec2 = boto3.client('ec2')

    # Check if instance was stopped by CloudSnooze
    response = ec2.describe_tags(
        Filters=[
            {'Name': 'resource-id', 'Values': [instance_id]},
            {'Name': 'key', 'Values': [f'{tag_prefix}:Status']},
            {'Name': 'value', 'Values': ['Stopped']}
        ]
    )
    if not response['Tags']:
        return False, "Instance not stopped by CloudSnooze"

    # Check if restart is allowed
    restart_allowed_response = ec2.describe_tags(
        Filters=[
            {'Name': 'resource-id', 'Values': [instance_id]},
            {'Name': 'key', 'Values': [f'{tag_prefix}:RestartAllowed']},
            {'Name': 'value', 'Values': ['true']}
        ]
    )
    if not restart_allowed_response['Tags']:
        return False, "Restart not allowed for this instance"

    # Check if specific restarters are defined
    allowed_restarters_response = ec2.describe_tags(
        Filters=[
            {'Name': 'resource-id', 'Values': [instance_id]},
            {'Name': 'key', 'Values': [f'{tag_prefix}:AllowedRestarters']}
        ]
    )

    # If specific restarters are defined, check if this service is allowed
    if allowed_restarters_response['Tags']:
        raw_value = allowed_restarters_response['Tags'][0]['Value']
        # Strip whitespace so "UserPortal, JobScheduler" also matches
        allowed_restarters = [s.strip() for s in raw_value.split(',')]
        if service_id not in allowed_restarters:
            return False, f"Service {service_id} not authorized to restart this instance"

    # Start the instance
    try:
        ec2.start_instances(InstanceIds=[instance_id])

        # Update tags (timestamp recorded in UTC)
        ec2.create_tags(
            Resources=[instance_id],
            Tags=[
                {'Key': f'{tag_prefix}:Status', 'Value': 'Running'},
                {'Key': f'{tag_prefix}:RestartTimestamp', 'Value': datetime.now(timezone.utc).isoformat()},
                {'Key': f'{tag_prefix}:RestartReason', 'Value': 'User requested restart'},
                {'Key': f'{tag_prefix}:RestartedBy', 'Value': service_id}
            ]
        )
        return True, "Instance restarted successfully"
    except Exception as e:
        return False, f"Error restarting instance: {str(e)}"
2. Scheduled Restart
For instances that need to run scheduled jobs:
Scheduled Event → Find Matching Stopped Instances → Restart → Run Job → Allow to Stop Again
Example Implementation
import boto3
from datetime import datetime, timezone


def schedule_instance_restart(schedule_expression, instance_tags, tag_prefix='CloudSnooze'):
    # Note: instance_tags is accepted but not used in this example;
    # 'YourScheduleTag' below is a placeholder for your own tag key.
    ec2 = boto3.client('ec2')

    # Find instances matching tags that were stopped by CloudSnooze
    response = ec2.describe_instances(
        Filters=[
            {'Name': 'tag:YourScheduleTag', 'Values': [schedule_expression]},
            {'Name': f'tag:{tag_prefix}:Status', 'Values': ['Stopped']},
            {'Name': 'instance-state-name', 'Values': ['stopped']}
        ]
    )

    restarted_instances = []
    for reservation in response['Reservations']:
        for instance in reservation['Instances']:
            instance_id = instance['InstanceId']

            # Restart the instance
            ec2.start_instances(InstanceIds=[instance_id])

            # Update tags (timestamp recorded in UTC)
            ec2.create_tags(
                Resources=[instance_id],
                Tags=[
                    {'Key': f'{tag_prefix}:Status', 'Value': 'Running'},
                    {'Key': f'{tag_prefix}:RestartTimestamp', 'Value': datetime.now(timezone.utc).isoformat()},
                    {'Key': f'{tag_prefix}:RestartReason', 'Value': f'Scheduled event: {schedule_expression}'}
                ]
            )
            restarted_instances.append(instance_id)

    return restarted_instances
3. Predictive Restart
A more sophisticated approach that predicts when users will need instances:
User Activity Data → Predict Usage Pattern → Preemptively Restart → Update Tags
Factors to Consider
- Historical usage patterns
- Time of day/week
- User login activity from other services
- Calendar/meeting data
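As one possible starting point, the sketch below uses only the first two factors (historical usage and day of week) to suggest a restart time. It is a deliberately simple heuristic, not a CloudSnooze feature: the function name `predict_restart_time`, the `lead_minutes` parameter, and the use of a per-weekday median are all assumptions chosen for illustration.

```python
from datetime import datetime, timedelta
from statistics import median


def predict_restart_time(first_logins, target_day, lead_minutes=15):
    """Suggest when to restart on `target_day` from past first-login times.

    first_logins: list of datetime objects, one per past day of activity.
    target_day: the date to predict for.
    Returns a datetime, or None when there is no history for that weekday.
    """
    # Keep only history from the same weekday, since usage often
    # differs between, say, Mondays and Saturdays.
    minutes = [
        dt.hour * 60 + dt.minute
        for dt in first_logins
        if dt.weekday() == target_day.weekday()
    ]
    if not minutes:
        return None

    # The median is less sensitive to an unusually early or late login.
    typical = int(median(minutes))
    start_of_day = datetime.combine(target_day, datetime.min.time())
    # Restart slightly ahead of the typical login to absorb boot time.
    return start_of_day + timedelta(minutes=typical - lead_minutes)
```

A real predictor would likely combine more of the factors listed above, but even this level of logic can be validated against logged restarts before anything more elaborate is built.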
Coordination with CloudSnooze
To ensure proper coordination with CloudSnooze, external tools should:
Set Expected Usage Tag: When restarting an instance, set a tag indicating when the instance might become idle again.
Modify Status Tag: Set CloudSnooze:Status to Running when restarting.
Add Context Tags: Include information about why and when the instance was restarted.
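These three coordination steps can be gathered into one helper that builds the tag list before it is written. The sketch below is illustrative only: `build_restart_tags` is a hypothetical name, the `now` parameter exists just to make the function testable, and the tag keys follow the schema in the next section. The returned list uses the `Key`/`Value` shape that boto3's `create_tags` accepts.

```python
from datetime import datetime, timezone


def build_restart_tags(service_id, reason, expected_minutes=None,
                       tag_prefix="CloudSnooze", now=None):
    """Return restart tags as a list of {'Key': ..., 'Value': ...} dicts."""
    now = now or datetime.now(timezone.utc)
    tags = [
        # Status tag: mark the instance as running again.
        {"Key": f"{tag_prefix}:Status", "Value": "Running"},
        # Context tags: when, why, and by whom it was restarted.
        {"Key": f"{tag_prefix}:RestartTimestamp",
         "Value": now.strftime("%Y-%m-%dT%H:%M:%SZ")},
        {"Key": f"{tag_prefix}:RestartReason", "Value": reason},
        {"Key": f"{tag_prefix}:RestartedBy", "Value": service_id},
    ]
    # Expected usage tag: optional hint, in minutes.
    if expected_minutes is not None:
        tags.append({"Key": f"{tag_prefix}:ExpectedUsageDuration",
                     "Value": str(expected_minutes)})
    return tags
```

The result could then be passed as `Tags=` to `ec2.create_tags(Resources=[instance_id], Tags=...)`.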
Tag Schema for Restarts
Tags Set by CloudSnooze
Tag | Description | Example |
---|---|---|
CloudSnooze:Status | Current status | Stopped |
CloudSnooze:StopTimestamp | When the instance was stopped | 2023-04-19T15:30:00Z |
CloudSnooze:StopReason | Why the instance was stopped | System idle for 30 minutes |
CloudSnooze:RestartAllowed | Whether external tools can restart | true |
CloudSnooze:AllowedRestarters | Comma-separated list of service IDs allowed to restart | UserPortal,JobScheduler |
Tags Set by External Tools
Tag | Description | Example |
---|---|---|
CloudSnooze:Status | Current status (updated) | Running |
CloudSnooze:RestartTimestamp | When the instance was restarted | 2023-04-19T15:30:00Z |
CloudSnooze:RestartReason | Why the instance was restarted | User login or Scheduled job |
CloudSnooze:ExpectedUsageDuration | How long the instance is expected to be needed | 120 (minutes) |
CloudSnooze:RestartedBy | Service that restarted the instance | UserPortal or JobScheduler |
State Machine
The complete lifecycle of an instance with CloudSnooze and an external restart tool:
Running → Idle → Stopped by CloudSnooze → Restarted by External Tool → Running → ...
State Transitions
- Running to Idle: CloudSnooze detects inactivity below thresholds
- Idle to Stopped: CloudSnooze stops the instance after naptime
- Stopped to Restarting: External tool initiates restart
- Restarting to Running: Instance becomes available
- Running to Monitored: CloudSnooze resumes monitoring
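An external tool may want to reject out-of-order operations, such as restarting an instance that is not stopped. The sketch below encodes the transitions listed above as a lookup table. The names `State`, `ALLOWED`, and `transition` are hypothetical, and treating "Monitored" as part of the Running state rather than a state of its own is an assumption made to keep the example small.

```python
from enum import Enum


class State(Enum):
    RUNNING = "Running"
    IDLE = "Idle"
    STOPPED = "Stopped"
    RESTARTING = "Restarting"


# Each state maps to the set of states it may move to next.
ALLOWED = {
    State.RUNNING: {State.IDLE},           # inactivity below thresholds
    State.IDLE: {State.STOPPED},           # stopped after naptime
    State.STOPPED: {State.RESTARTING},     # external tool initiates restart
    State.RESTARTING: {State.RUNNING},     # instance becomes available
}


def transition(current, target):
    """Return `target` if the move is allowed, otherwise raise ValueError."""
    if target not in ALLOWED[current]:
        raise ValueError(f"Invalid transition: {current.value} -> {target.value}")
    return target
```

For example, `transition(State.STOPPED, State.RESTARTING)` succeeds, while `transition(State.RUNNING, State.RESTARTING)` raises `ValueError`.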
Best Practices
Respect Authorization Boundaries:
- Only attempt to restart instances where RestartAllowed is set to true
- Verify your service ID is in the AllowedRestarters list if specified
- Log authorization failures for security auditing
Respect Idle Detection:
- Don’t disable CloudSnooze when restarting instances
- Allow the natural idle detection to work
Throttle Restarts:
- Implement cooldown periods to prevent rapid stop/start cycles
- Consider enforcing a minimum runtime after each restart
Track Effectiveness:
- Log when an instance is restarted
- Track how long it remains active
- Analyze if the restart was necessary
User Communication:
- Inform users when an instance is restarted
- Provide context about when it might be stopped again
Optimize Cold Start:
- Identify instances that take time to become fully useful after restart
- Consider warming caches or preloading data
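The cooldown idea under "Throttle Restarts" can be sketched as a small check run before each restart attempt. This is one possible approach under stated assumptions: the function names and the 15-minute default are hypothetical, and the previous restart time is assumed to come from the CloudSnooze:RestartTimestamp tag in ISO 8601 form.

```python
from datetime import datetime, timedelta, timezone


def parse_timestamp(value):
    """Parse an ISO 8601 timestamp, accepting a trailing 'Z' for UTC."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def restart_permitted(last_restart, now, cooldown=timedelta(minutes=15)):
    """Return True when at least `cooldown` has passed since `last_restart`.

    last_restart: datetime of the previous restart, or None if never restarted.
    """
    if last_restart is None:
        return True
    return now - last_restart >= cooldown
```

A caller would typically pass `datetime.now(timezone.utc)` as `now` and log refused attempts, so that repeated stop/start cycles show up in monitoring.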
Example Architecture
For a complete solution, consider:
Central Management Service:
- Maintains state of all CloudSnooze-managed instances
- Coordinates restart operations
User Portal Integration:
- Allows users to see stopped instances
- Provides one-click restart capability
Scheduler Integration:
- Ensures instances are running for scheduled jobs
- Allows jobs to complete before instances idle out
Monitoring Integration:
- Tracks stop/restart patterns
- Identifies opportunities for optimization
Performance Considerations
Cold Start Time:
- Account for the time needed for instances to fully restart
- For time-sensitive operations, restart in advance
Resource Bursting:
- Be aware that restarting many instances simultaneously can cause resource contention
- Consider staggered restarts for large fleets
Cost Implications:
- Balance between the cost savings of stopping and the overhead of restarting
- Some instances may be better left running if restart frequency is high
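The staggered-restart suggestion under "Resource Bursting" can be sketched as batching with a pause between batches. The helper below is an illustration only: `staggered_restart`, `start_fn`, and the default batch size and delay are assumptions, and the `sleep` parameter is injectable purely so the function can be tested without waiting.

```python
import time


def batched(items, size):
    """Yield consecutive slices of `items`, each with at most `size` elements."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


def staggered_restart(instance_ids, start_fn, batch_size=10,
                      delay_seconds=30, sleep=time.sleep):
    """Start instances in batches, pausing between batches.

    start_fn: callable that takes a list of instance IDs and starts them.
    Returns the number of batches processed.
    """
    batches = list(batched(instance_ids, batch_size))
    for index, batch in enumerate(batches):
        start_fn(batch)
        # No need to wait after the final batch.
        if index < len(batches) - 1:
            sleep(delay_seconds)
    return len(batches)
```

With boto3, `start_fn` could be `lambda ids: ec2.start_instances(InstanceIds=ids)`; suitable batch sizes and delays depend on the fleet and on provider rate limits.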