Optimize the cost and performance of your Amazon ECS clusters running on EC2 instances by rightsizing the underlying EC2 instances within their Auto Scaling Groups (ASGs). CloudFix analyzes resource utilization (CPU, Memory) across the instances managed by an ECS Capacity Provider or leverages AWS Compute Optimizer recommendations. Based on this analysis, CloudFix identifies opportunities to change the instance type for the entire ASG to better match workload requirements, potentially recommending memory-optimized or compute-optimized instance families.

Manual Fix Required

CloudFix identifies the rightsizing opportunity but requires manual action to implement the change. Modifying the instance type in an Auto Scaling Group’s launch configuration or launch template affects all future instances launched by the ASG and often requires careful planning to manage instance replacement and potential downtime. Users must manually update the ASG configuration.

Overview

Problem Statement

EC2 instances backing ECS clusters are often configured with a single instance type within an Auto Scaling Group managed by a Capacity Provider. If the chosen instance type is not well-matched to the actual resource needs (CPU vs. Memory) of the containerized workloads, it can lead to inefficient resource utilization and unnecessary costs across the entire cluster.

Solution Identification

CloudFix analyzes the average CPU and memory utilization across all EC2 instances within an ASG associated with an ECS Capacity Provider. Alternatively, for clusters where applicable, it ingests recommendations from AWS Compute Optimizer. By comparing the dominant resource constraint (CPU or Memory) with the characteristics of the current instance type, CloudFix identifies situations where switching the entire ASG to a different instance family (e.g., from general-purpose to memory-optimized or compute-optimized) could lead to better cost-efficiency based on On-Demand pricing differences. Spot instance retyping is excluded.
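The comparison of the dominant resource constraint can be illustrated with a minimal sketch. This is not CloudFix's actual logic: the `AsgUtilization` type, the `suggest_family` function, and the 20-point `gap_threshold` are all hypothetical, chosen only to show the idea of mapping average CPU and memory utilization to a family hint.

```python
from dataclasses import dataclass


@dataclass
class AsgUtilization:
    """Average utilization across all instances in one ASG (percent, 0-100)."""
    avg_cpu_percent: float
    avg_memory_percent: float


def suggest_family(util: AsgUtilization, gap_threshold: float = 20.0) -> str:
    """Return a coarse instance-family hint.

    gap_threshold is an assumed cutoff (in percentage points) for how far
    memory and CPU utilization must diverge before a family change is hinted.
    """
    gap = util.avg_memory_percent - util.avg_cpu_percent
    if gap >= gap_threshold:
        # Memory is the dominant constraint.
        return "memory-optimized"
    if gap <= -gap_threshold:
        # CPU is the dominant constraint.
        return "compute-optimized"
    # Utilization is roughly balanced; no family change is hinted.
    return "general-purpose"
```

For example, an ASG averaging 20% CPU and 70% memory would be hinted toward a memory-optimized family under these assumed thresholds.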

AWS Services Affected

  • Amazon ECS
  • Amazon EC2
  • ECS Capacity Providers
  • EC2 Auto Scaling
  • AWS Compute Optimizer

How CloudFix Identifies the Opportunity

CloudFix identifies potential ECS EC2 retyping opportunities based on the following:

  • Detects EC2 instances belonging to an ECS cluster’s Capacity Provider ASG using AWS Cost and Usage Report (CUR) data and ECS API calls (`DescribeContainerInstances`).
  • Analyzes average CPU and Memory utilization across the instances in the ASG.
  • Alternatively, uses AWS Compute Optimizer recommendations if available for the instances.
  • Recommends changing the instance type in the ASG’s launch configuration/template if a different instance family (e.g., memory-optimized, compute-optimized) better suits the observed average utilization pattern and offers potential On-Demand cost savings.
  • Excludes Spot instances from retyping recommendations.
  • Estimates potential savings based on the reduction in On-Demand costs.
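The savings estimate in the last item comes down to simple arithmetic on On-Demand prices. The sketch below is an assumption about how such an estimate could be computed, not CloudFix's formula. The function name, the 730-hour month, and any prices passed in are placeholders; real On-Demand prices vary by region and instance type.

```python
HOURS_PER_MONTH = 730  # assumed average number of hours in a month


def estimate_monthly_savings(
    current_hourly: float,
    recommended_hourly: float,
    instance_count: int,
    hours: int = HOURS_PER_MONTH,
) -> float:
    """Estimate monthly On-Demand savings for retyping a whole ASG.

    Prices are per instance per hour. Returns 0.0 when the recommended
    type is not cheaper than the current one.
    """
    per_instance_hourly = current_hourly - recommended_hourly
    savings = per_instance_hourly * instance_count * hours
    return round(max(0.0, savings), 2)
```

With placeholder prices of 0.20 and 0.15 per hour across 4 instances, the function returns 146.0 for a 730-hour month.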

Manual Fix Steps

After CloudFix identifies an ECS EC2 retyping opportunity:

  1. Review Recommendation: Examine the identified ASG, the current instance type, the recommended instance type, and the supporting utilization data (average CPU/Memory) or Compute Optimizer findings provided by CloudFix.
  2. Validate Recommendation: Assess if the recommended instance type aligns with your application’s performance needs. Consider peak loads, specific task requirements, and potential impacts of changing instance families (e.g., network performance, storage options).
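  3. Update ASG Configuration: Change the instance type used by the Auto Scaling Group to the recommended type. For a Launch Template, create a new template version with the updated `InstanceType` and make sure the ASG uses that version. Launch Configurations cannot be edited after creation, so create a new Launch Configuration with the updated `InstanceType` and attach it to the ASG.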
  4. Implement Instance Refresh/Replacement: Changing the ASG configuration only affects *new* instances. To apply the change to existing instances, you need to either:
    • Use EC2 Auto Scaling Instance Refresh to perform a rolling replacement of old instances with new ones based on the updated configuration.
    • Manually terminate old instances (allowing the ASG to launch replacements with the new type) in a controlled manner, potentially using ECS container instance draining to gracefully stop tasks first.
  5. Monitor Cluster: After the instances have been replaced, monitor ECS service performance, task placement, and CloudWatch metrics for the new instances to ensure the cluster operates as expected.
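Steps 3 and 4 could be scripted roughly as follows. This is a sketch under stated assumptions, not a CloudFix feature: it assumes the ASG uses a Launch Template (not a Launch Configuration or a mixed instances policy), and the function name `retype_asg` and its parameters are hypothetical. The `ec2` and `autoscaling` arguments are expected to be boto3 clients created by the caller, for example `boto3.client("ec2")` and `boto3.client("autoscaling")`.

```python
def retype_asg(
    ec2,
    autoscaling,
    asg_name: str,
    launch_template_id: str,
    source_version: str,
    new_instance_type: str,
    min_healthy_percentage: int = 90,
) -> str:
    """Create a new launch template version and start a rolling refresh.

    Returns the ID of the instance refresh that was started.
    """
    # Step 3: new template version that inherits from source_version and
    # overrides only the instance type.
    response = ec2.create_launch_template_version(
        LaunchTemplateId=launch_template_id,
        SourceVersion=source_version,
        LaunchTemplateData={"InstanceType": new_instance_type},
    )
    new_version = response["LaunchTemplateVersion"]["VersionNumber"]

    # Point the ASG at the new version explicitly.
    autoscaling.update_auto_scaling_group(
        AutoScalingGroupName=asg_name,
        LaunchTemplate={
            "LaunchTemplateId": launch_template_id,
            "Version": str(new_version),
        },
    )

    # Step 4: rolling replacement of existing instances.
    # min_healthy_percentage is an assumed default; tune it per workload.
    refresh = autoscaling.start_instance_refresh(
        AutoScalingGroupName=asg_name,
        Strategy="Rolling",
        Preferences={"MinHealthyPercentage": min_healthy_percentage},
    )
    return refresh["InstanceRefreshId"]
```

Passing the clients in as arguments keeps the function easy to dry-run against stub objects before it touches a real cluster. Progress of the refresh can then be followed in the EC2 Auto Scaling console or with the `DescribeInstanceRefreshes` API.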

FAQ

Q: Why is this a manual fix?
A: Modifying an ASG’s instance type and replacing the running instances is an operational change that can impact cluster capacity and potentially cause downtime if not managed carefully. User intervention is required for planning and execution.

Q: Does this finder recommend changing instance *size* (e.g., m5.large to m5.xlarge)?
A: This specific finder focuses on changing the instance *family* or *type* (e.g., m5.large to r5.large) based on CPU/Memory balance, not necessarily changing the size. Other rightsizing finders might recommend size changes.

Q: What happens if I use Spot instances in my ASG?
A: This finder does not recommend retyping Spot instances, and it does not recommend switching On-Demand instances to Spot.

Q: Is downtime required?
A: Yes, at the instance level. Replacing the EC2 instances within the ASG typically requires terminating old instances and launching new ones. Instance refresh and ECS container instance draining can minimize disruption to running tasks, but careful planning is still needed to avoid impacting service availability.
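
The container instance draining mentioned in the answer above could be scripted along these lines. This is a minimal sketch, not part of CloudFix: the function name `drain_container_instances` is hypothetical, and `ecs` is assumed to be a boto3 ECS client such as `boto3.client("ecs")`. The ECS `UpdateContainerInstancesState` API accepts up to 10 container instances per call, so the sketch works in batches.

```python
def drain_container_instances(ecs, cluster: str, container_instance_arns: list,
                              batch_size: int = 10) -> list:
    """Set the given container instances to DRAINING.

    Returns the ARNs that ECS reports as updated. Draining stops new task
    placement on an instance and lets services move their tasks elsewhere.
    """
    drained = []
    for start in range(0, len(container_instance_arns), batch_size):
        batch = container_instance_arns[start:start + batch_size]
        response = ecs.update_container_instances_state(
            cluster=cluster,
            containerInstances=batch,
            status="DRAINING",
        )
        # Collect only the instances ECS confirms; failures are skipped here.
        for instance in response.get("containerInstances", []):
            drained.append(instance["containerInstanceArn"])
    return drained
```

An instance is safe to terminate once its running task count reaches zero, which can be checked with `DescribeContainerInstances`.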