Ai Research

Ai Research

Deep learning and AI research with GPU optimization

Overview

Domain Pack: ai-research Version: 1.0.0 Categories: computer-science Maintainers: AWS Research Wizard Team

Quick Start

# Deploy this domain pack
aws-research-wizard deploy --domain ai-research --size medium

# Get detailed information
aws-research-wizard config info ai-research

# List available workflows
aws-research-wizard workflow list --domain ai-research

Software Stack

Core Packages

  • python: python@3.11.0 %gcc@11.4.0 +ssl+zlib
  • py-torch: py-torch@2.0.1 %gcc@11.4.0 +cuda+nccl
  • py-tensorflow: py-tensorflow@2.13.0 %gcc@11.4.0 +cuda
  • py-jax: py-jax@0.4.13 %gcc@11.4.0 +cuda
  • py-numpy: py-numpy@1.24.3 %gcc@11.4.0 +blas+lapack
  • py-scipy: py-scipy@1.11.1 %gcc@11.4.0
  • py-pandas: py-pandas@2.0.3 %gcc@11.4.0
  • py-scikit-learn: py-scikit-learn@1.3.0 %gcc@11.4.0
  • py-matplotlib: py-matplotlib@3.7.1 %gcc@11.4.0
  • py-jupyter: py-jupyter@1.0.0 %gcc@11.4.0

And 3 more packages…

Optimization Settings

  • Compiler: gcc@11.4.0
  • Target Architecture: x86_64_v3
  • Optimization Flags: -O3 -march=native

AWS Infrastructure

Instance Types

  • Small: g5.xlarge
  • Medium: g5.4xlarge
  • Large: p4d.24xlarge

Storage Configuration

  • Type: gp3
  • Size: 1000 GB
  • IOPS: 16000
  • Throughput: 1000 MB/s

Research Workflows

This domain pack includes 2 pre-configured research workflows:

Distributed Training

Multi-GPU distributed deep learning training

# Run this workflow
aws-research-wizard workflow run distributed_training --domain ai-research
  • Input Data: s3://aws-research-data/imagenet/
  • Expected Output: Trained model checkpoints

Hyperparameter Tuning

Large-scale hyperparameter optimization

# Run this workflow
aws-research-wizard workflow run hyperparameter_tuning --domain ai-research
  • Input Data: s3://aws-research-data/ml-datasets/
  • Expected Output: Optimal hyperparameter configurations

Cost Estimates

Workload Size Estimated Daily Cost
Small $25-50/day
Medium $40-120/day
Large $200-800/day

!!! note “Cost Optimization” These estimates assume on-demand pricing. Significant savings are possible with:

- **Spot Instances**: 70-90% savings for fault-tolerant workloads
- **Reserved Instances**: 30-60% savings for predictable usage
- **Savings Plans**: 20-72% savings with flexible commitment

Example Configuration

# ai-research-research-config.yaml
domain: ai-research
size: medium
aws:
  region: us-east-1
  availability_zone: us-east-1a
compute:
  instance_type: g5.4xlarge
  instance_count: 1
storage:
  type: gp3
  size_gb: 1000

Getting Help

Other domain packs in Computer Science: