AMI Deployment Strategy - Research Environment Snapshots

Overview

Enable users to create, share, and deploy pre-configured research environments as Amazon Machine Images (AMIs). Launching from an AMI skips the 20-30 minute package installation step and gives every deployment the same pre-built software stack.

Problem Statement

Current deployment process requires:

  • 20-30 minutes for package installation (54 packages in the genomics domain pack)
  • Complex dependency resolution with potential failures
  • Network connectivity for all package downloads
  • Repeated compilation of identical software stacks

Solution: AMI-Based Deployment

Core Features

1. AMI Creation & Management

# Create AMI from successful deployment
aws-research-wizard ami create --stack genomics-env --name "genomics-v1.0" --description "Genomics pack with 54 bioinformatics tools"

# Deploy from existing AMI
aws-research-wizard deploy --from-ami ami-12345678 --instance r5.large

# List available AMIs
aws-research-wizard ami list --domain genomics

2. AMI Sharing & Distribution

# Share AMI with specific AWS account
aws-research-wizard ami share --ami-id ami-12345678 --account-id 123456789012

# Make AMI public (for community sharing)
aws-research-wizard ami publish --ami-id ami-12345678 --public

# Copy AMI to multiple regions
aws-research-wizard ami distribute --ami-id ami-12345678 --regions us-west-2,us-east-1,eu-west-1

3. Community AMI Registry

# Submit AMI to community registry
aws-research-wizard ami submit --ami-id ami-12345678 --registry community

# Search community AMIs
aws-research-wizard ami search --domain genomics --tags "covid-19,variant-calling"

# Install from community registry
aws-research-wizard deploy --from-registry community/genomics-covid-v2.1
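
The `--from-registry` argument above looks like a `<registry>/<name>` reference with a trailing `-v<version>`. A minimal sketch of splitting such a reference, assuming that shape holds in general; `RegistryRef` and `ParseRegistryRef` are hypothetical names used only for illustration, not part of the tool.

```go
package main

import (
	"fmt"
	"strings"
)

// RegistryRef is a hypothetical parsed form of a reference such as
// "community/genomics-covid-v2.1".
type RegistryRef struct {
	Registry string
	Name     string
	Version  string // empty when the reference carries no "-v<digit>..." suffix
}

// ParseRegistryRef splits "<registry>/<name>[-v<version>]".
// The version suffix convention is an assumption inferred from the example.
func ParseRegistryRef(ref string) (RegistryRef, error) {
	registry, rest, ok := strings.Cut(ref, "/")
	if !ok || registry == "" || rest == "" {
		return RegistryRef{}, fmt.Errorf("invalid registry reference %q: want <registry>/<name>", ref)
	}
	out := RegistryRef{Registry: registry, Name: rest}
	// Only treat "-v" as a version marker when a digit follows it,
	// so names like "my-variant-calling" are left intact.
	i := strings.LastIndex(rest, "-v")
	if i > 0 && i+2 < len(rest) && rest[i+2] >= '0' && rest[i+2] <= '9' {
		out.Name = rest[:i]
		out.Version = rest[i+2:]
	}
	return out, nil
}

func main() {
	ref, err := ParseRegistryRef("community/genomics-covid-v2.1")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("registry=%s name=%s version=%s\n", ref.Registry, ref.Name, ref.Version)
}
```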

Technical Implementation

AMI Creation Pipeline

1. Post-Deployment AMI Creation

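# Extension to terraform/environments/aws/main.tf
resource "aws_ami_from_instance" "research_ami" {
  count = var.create_ami ? 1 : 0

  # AMI names may not contain ":", so format the timestamp without separators
  name               = "${var.domain_name}-${var.ami_version}-${formatdate("YYYYMMDDhhmmss", timestamp())}"
  source_instance_id = aws_instance.research_node.id

  # Wait for setup completion
  depends_on = [null_resource.setup_complete]

  tags = {
    DomainPack   = var.domain_name
    Version      = var.ami_version
    Created      = timestamp()
    PackageCount = length(var.spack_packages)
  }

  # timestamp() changes on every plan; ignore it so the AMI is not rebuilt on each apply
  lifecycle {
    ignore_changes = [name, tags]
  }
}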

2. AMI Optimization Script

#!/bin/bash
# ami-cleanup.sh - Prepare instance for AMI creation

# Remove sensitive data (user SSH keys, SSH host keys, setup log)
rm -rf /home/ec2-user/.ssh/*
rm -rf /root/.ssh/*
rm -f /etc/ssh/ssh_host_*
rm -f /var/log/research-setup.log

# Clean package caches
rm -rf /opt/spack/var/spack/cache/*
conda clean -a -y
yum clean all

# Remove bash history
rm -f /home/ec2-user/.bash_history
rm -f /root/.bash_history

# Clear system logs
truncate -s 0 /var/log/messages
truncate -s 0 /var/log/secure

# Create AMI metadata
cat > /home/ec2-user/ami-info.json << EOF
{
  "domain": "${domain_name}",
  "version": "${ami_version}",
  "packages": ${package_count},
  "created": "$(date -u +%Y-%m-%dT%H:%M:%SZ)",
  "instance_type": "${instance_type}",
  "region": "${aws_region}"
}
EOF
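
The metadata file written by the script can be read back on the CLI side. A minimal sketch, assuming the JSON keys shown in the heredoc above; `AMIInfo` and `ParseAMIInfo` are hypothetical names, and the required-field check is an illustrative choice.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// AMIInfo mirrors the keys of ami-info.json as written by ami-cleanup.sh.
type AMIInfo struct {
	Domain       string    `json:"domain"`
	Version      string    `json:"version"`
	Packages     int       `json:"packages"`
	Created      time.Time `json:"created"` // RFC 3339, e.g. 2025-01-15T10:30:00Z
	InstanceType string    `json:"instance_type"`
	Region       string    `json:"region"`
}

// ParseAMIInfo decodes the metadata and rejects files missing domain or version.
func ParseAMIInfo(data []byte) (AMIInfo, error) {
	var info AMIInfo
	if err := json.Unmarshal(data, &info); err != nil {
		return AMIInfo{}, fmt.Errorf("decode ami-info.json: %w", err)
	}
	if info.Domain == "" || info.Version == "" {
		return AMIInfo{}, fmt.Errorf("ami-info.json is missing domain or version")
	}
	return info, nil
}

func main() {
	// Sample content with made-up values, shaped like the script's output.
	sample := []byte(`{
	  "domain": "genomics",
	  "version": "1.0",
	  "packages": 54,
	  "created": "2025-01-15T10:30:00Z",
	  "instance_type": "r5.large",
	  "region": "us-west-2"
	}`)
	info, err := ParseAMIInfo(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%s %s: %d packages, built %s\n",
		info.Domain, info.Version, info.Packages, info.Created.Format(time.RFC3339))
}
```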

AMI Registry & Sharing

1. Community AMI Registry

// internal/ami/registry.go
package ami

import "time"

// AMIRegistry groups known AMIs by visibility tier.
type AMIRegistry struct {
    PublicAMIs    map[string]AMIMetadata
    CommunityAMIs map[string]AMIMetadata
    UserAMIs      map[string]AMIMetadata
}

// AMIMetadata describes one registered AMI.
type AMIMetadata struct {
    AMIID         string      `json:"ami_id"`
    Name          string      `json:"name"`
    Domain        string      `json:"domain"`
    Version       string      `json:"version"`
    Description   string      `json:"description"`
    Tags          []string    `json:"tags"`
    PackageCount  int         `json:"package_count"`
    InstanceTypes []string    `json:"supported_instances"`
    Regions       []string    `json:"available_regions"`
    CreatedBy     string      `json:"created_by"`
    CreatedAt     time.Time   `json:"created_at"`
    Downloads     int         `json:"downloads"`
    Rating        float64     `json:"rating"`
    TestResults   TestResults `json:"test_results"` // TestResults type is not shown here
}
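
A search such as `ami search --domain genomics --tags "covid-19,variant-calling"` could be served from one of these maps. The sketch below uses a trimmed copy of `AMIMetadata` and assumes that every requested tag must be present and that results are ordered by rating. Both are illustrative choices, and `Search` is a hypothetical function.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// AMIMetadata is a trimmed copy of the registry struct, holding only the
// fields this sketch needs.
type AMIMetadata struct {
	AMIID  string
	Domain string
	Tags   []string
	Rating float64
}

// Search returns AMIs in the given domain that carry all requested tags,
// highest rating first (ties broken by AMI ID for stable output).
func Search(amis map[string]AMIMetadata, domain string, tags []string) []AMIMetadata {
	var matches []AMIMetadata
	for _, ami := range amis {
		if strings.EqualFold(ami.Domain, domain) && hasAllTags(ami.Tags, tags) {
			matches = append(matches, ami)
		}
	}
	sort.Slice(matches, func(i, j int) bool {
		if matches[i].Rating != matches[j].Rating {
			return matches[i].Rating > matches[j].Rating
		}
		return matches[i].AMIID < matches[j].AMIID
	})
	return matches
}

// hasAllTags reports whether every tag in want appears in have (case-insensitive).
func hasAllTags(have, want []string) bool {
	set := make(map[string]struct{}, len(have))
	for _, t := range have {
		set[strings.ToLower(t)] = struct{}{}
	}
	for _, t := range want {
		if _, ok := set[strings.ToLower(strings.TrimSpace(t))]; !ok {
			return false
		}
	}
	return true
}

func main() {
	// Made-up registry entries for illustration.
	community := map[string]AMIMetadata{
		"genomics-covid": {AMIID: "ami-aaaa1111", Domain: "genomics", Tags: []string{"covid-19", "variant-calling"}, Rating: 4.5},
		"genomics-rna":   {AMIID: "ami-bbbb2222", Domain: "genomics", Tags: []string{"rna-seq"}, Rating: 4.9},
	}
	tags := strings.Split("covid-19,variant-calling", ",")
	for _, ami := range Search(community, "genomics", tags) {
		fmt.Println(ami.AMIID, ami.Rating)
	}
}
```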

2. AMI Sharing Infrastructure

# AMI sharing configuration
sharing_config:
  auto_share_regions:
    - us-west-2
    - us-east-1
    - eu-west-1
    - ap-northeast-1

  community_registry:
    enabled: true
    approval_required: true
    moderation_rules:
      - security_scan: true
      - package_verification: true
      - performance_benchmark: true

  public_sharing:
    enabled: true
    requires_approval: true
    cost_optimization: true
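
The moderation rules above amount to a gate: a submission is listed only once every enabled rule has passed. A minimal sketch of that check, assuming rules and results are both available as name-to-boolean maps; `MissingChecks` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"sort"
)

// MissingChecks returns the enabled rules that have not passed, sorted by name.
// An empty result means the submission clears moderation.
func MissingChecks(required, passed map[string]bool) []string {
	var missing []string
	for rule, enabled := range required {
		if enabled && !passed[rule] {
			missing = append(missing, rule)
		}
	}
	sort.Strings(missing)
	return missing
}

func main() {
	// Rule names taken from moderation_rules in the config above.
	required := map[string]bool{
		"security_scan":         true,
		"package_verification":  true,
		"performance_benchmark": true,
	}
	// Made-up results for one submission.
	passed := map[string]bool{
		"security_scan":         true,
		"performance_benchmark": true,
	}
	if missing := MissingChecks(required, passed); len(missing) > 0 {
		fmt.Println("submission blocked, missing:", missing)
	} else {
		fmt.Println("submission accepted")
	}
}
```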

Integration with Existing System

1. Extended Terraform Configuration

# terraform/environments/aws/variables.tf
variable "create_ami" {
  description = "Create AMI after successful deployment"
  type        = bool
  default     = false
}

variable "ami_version" {
  description = "Version tag for created AMI"
  type        = string
  default     = "1.0"
}

variable "source_ami_id" {
  description = "Use existing AMI instead of building from scratch"
  type        = string
  default     = ""
}

2. CLI Command Extensions

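// internal/commands/ami/ami.go
package ami

import "github.com/spf13/cobra"

// NewAMICommand builds the "ami" parent command and registers its subcommands.
func NewAMICommand() *cobra.Command {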
    cmd := &cobra.Command{
        Use:   "ami",
        Short: "Manage research environment AMIs",
        Long:  "Create, share, and deploy AMIs for research environments",
    }

    cmd.AddCommand(
        createCreateCommand(),
        createListCommand(),
        createShareCommand(),
        createPublishCommand(),
        createSearchCommand(),
        createDeployCommand(),
    )

    return cmd
}

AMI Sharing Ecosystem

1. Official AMIs

  • Curated AMIs - Officially maintained and tested
  • Version management - Clear versioning and changelog
  • Multi-region availability - Distributed for global access
  • Security updates - Regular patching and updates

2. Community AMI Marketplace

  • User-contributed AMIs - Community-created specialized environments
  • Rating system - User feedback and quality metrics
  • Tagging system - Searchable by research area, tools, datasets
  • Usage analytics - Download counts, performance metrics

3. Educational AMI Library

  • Workshop AMIs - Pre-configured for tutorials and classes
  • Course-specific - Tailored for specific academic programs
  • Reproducible research - Exact environments for paper reproduction
  • Collaborative projects - Shared environments for team research

Security & Compliance

1. AMI Security Scanning

# Automated security scan before sharing
aws-research-wizard ami scan --ami-id ami-12345678
# - Vulnerability assessment
# - Malware detection
# - Compliance checking
# - License validation

2. Access Control

# AMI access control policy
access_control:
  private_amis:
    - owner_only: true
    - explicit_sharing: true

  community_amis:
    - moderation_required: true
    - security_scan: true
    - performance_validation: true

  public_amis:
    - approval_workflow: true
    - regular_security_updates: true
    - usage_monitoring: true

Cost Management

1. AMI Storage Optimization

  • Incremental snapshots - Only store changes between versions
  • Automated cleanup - Remove old AMIs based on policy
  • Regional optimization - Store AMIs in cost-effective regions
  • Compression - Optimize AMI size for storage costs

2. Cost Sharing Model

# Cost sharing configuration
cost_model:
  official_amis:
    - funded_by: aws_research_wizard_project
    - free_to_users: true

  community_amis:
    - creator_pays: snapshot_storage
    - users_pay: ec2_instance_costs

  enterprise_amis:
    - subscription_model: true
    - premium_support: true

Implementation Roadmap

Phase 1: Core AMI Creation (Q1 2025)

  • Add AMI creation to deployment pipeline
  • Implement AMI cleanup and optimization
  • Basic AMI metadata and tagging
  • CLI commands for AMI management

Phase 2: AMI Sharing (Q2 2025)

  • Account-to-account AMI sharing
  • Multi-region AMI distribution
  • AMI registry infrastructure
  • Community AMI submission workflow

Phase 3: AMI Marketplace (Q3 2025)

  • Public AMI gallery
  • Search and discovery features
  • Rating and review system
  • Usage analytics and metrics

Phase 4: Advanced Features (Q4 2025)

  • Automated AMI updates
  • Security scanning integration
  • Performance benchmarking
  • Cost optimization tools

Configuration Examples

1. Domain Pack AMI Configuration

# configs/ami/genomics.yaml
ami_config:
  domain: genomics
  base_ami: ami-0c02fb55956c7d316  # Amazon Linux 2 (AMI IDs are region-specific)
  create_ami: true

  optimization:
    cleanup_build_artifacts: true
    remove_package_caches: true
    compress_logs: true

  sharing:
    auto_share_regions: ["us-west-2", "us-east-1"]
    community_registry: true
    public_sharing: false

  metadata:
    tags: ["genomics", "bioinformatics", "variant-calling"]
    description: "Complete genomics environment with 54 tools"
    supported_instances: ["r5.large", "r5.xlarge", "r5.2xlarge"]

2. User AMI Creation Workflow

# 1. Deploy environment normally
aws-research-wizard deploy --domain genomics --instance r5.large

# 2. Customize environment (install additional tools, data, etc.)
ssh -i ~/.ssh/id_rsa ec2-user@instance-ip
# ... customization work ...

# 3. Create AMI from customized environment
aws-research-wizard ami create --stack genomics-env --name "my-genomics-env" --tags "covid-19,custom"

# 4. Share with collaborators
aws-research-wizard ami share --ami-id ami-12345678 --account-id 123456789012

# 5. Submit to community registry
aws-research-wizard ami submit --ami-id ami-12345678 --registry community

Success Metrics

1. Performance Metrics

  • Deployment time reduction: 20-30 minutes → 2-3 minutes
  • Success rate improvement: 85% → 99%
  • Cost reduction: 40% less compute time for setup

2. Adoption Metrics

  • AMI creation rate: Track user-created AMIs
  • Community sharing: Number of shared AMIs
  • Deployment preference: AMI vs. package installation ratio

3. Quality Metrics

  • AMI ratings: Community feedback scores
  • Performance benchmarks: Standardized testing results
  • Security compliance: Clean security scans

This AMI strategy extends AWS Research Wizard from a deployment tool into a platform for building and sharing research environments, supporting faster collaboration and reproducible research.