Category: Compute

Introducing the capacity-optimized allocation strategy for Amazon EC2 Spot Instances


AWS announces the new capacity-optimized allocation strategy for Amazon EC2 Auto Scaling and EC2 Fleet. This new strategy automatically makes the most efficient use of spare capacity while still taking advantage of the steep discounts offered by Spot Instances. It’s a new way for you to gain easy access to extra EC2 compute capacity in the AWS Cloud.

This post compares how the new capacity-optimized allocation strategy and the existing lowest-price allocation strategy deploy capacity.


Spot Instances are spare EC2 compute capacity in the AWS Cloud, available to you at savings of up to 90% compared to On-Demand prices. The only difference between On-Demand Instances and Spot Instances is that EC2 can interrupt Spot Instances, with a two-minute notification, when it needs the capacity back.

When making requests for Spot Instances, customers can take advantage of allocation strategies within services such as EC2 Auto Scaling and EC2 Fleet. The allocation strategy determines how the Spot portion of your request is fulfilled from the possible Spot Instance pools you provide in the configuration.

The existing allocation strategy available in EC2 Auto Scaling and EC2 Fleet is called “lowest-price” (with an option to diversify across N pools). This strategy allocates capacity strictly based on the lowest-priced Spot Instance pool or pools. The “diversified” allocation strategy (available in EC2 Fleet but not in EC2 Auto Scaling) spreads your Spot Instances across all the Spot Instance pools you’ve specified as evenly as possible.

As the AWS global infrastructure has grown over time, both in geographic Regions and Availability Zones and in the raw number of EC2 instance families and types, so has the amount of spare EC2 capacity. Therefore, it is important that customers have tools that help them use spare EC2 capacity optimally. The new capacity-optimized strategy for both EC2 Auto Scaling and EC2 Fleet provisions Spot Instances from the most-available Spot Instance pools by analyzing capacity metrics.


To illustrate how the capacity-optimized allocation strategy deploys capacity compared to the existing lowest-price allocation strategy, here are examples of Auto Scaling group configurations and use cases for each strategy.

Lowest-price (diversified over N pools) allocation strategy

The lowest-price allocation strategy deploys Spot Instances from the pools with the lowest price in each Availability Zone. This strategy has an optional modifier SpotInstancePools that provides the ability to diversify over the N lowest-priced pools in each Availability Zone.

Spot pricing changes slowly over time based on long-term trends in supply and demand, but capacity fluctuates in real time. The lowest-price strategy does not account for pool capacity depth as it deploys Spot Instances.

As a result, the lowest-price allocation strategy is a good choice for workloads with a low cost of interruption that want the lowest possible prices, such as:

  • Time-insensitive workloads
  • Extremely transient workloads
  • Workloads that are easily check-pointed and restarted


The following example configuration shows how capacity could be allocated in an Auto Scaling group using the lowest-price allocation strategy diversified over two pools:

{
  "AutoScalingGroupName": "runningAmazonEC2WorkloadsAtScale",
  "MixedInstancesPolicy": {
    "LaunchTemplate": {
      "LaunchTemplateSpecification": {
        "LaunchTemplateName": "my-launch-template",
        "Version": "$Latest"
      },
      "Overrides": [
        { "InstanceType": "c3.large" },
        { "InstanceType": "c4.large" },
        { "InstanceType": "c5.large" }
      ]
    },
    "InstancesDistribution": {
      "OnDemandPercentageAboveBaseCapacity": 0,
      "SpotAllocationStrategy": "lowest-price",
      "SpotInstancePools": 2
    }
  },
  "MinSize": 10,
  "MaxSize": 100,
  "DesiredCapacity": 60,
  "HealthCheckType": "EC2",
  "VPCZoneIdentifier": "subnet-a1234567890123456,subnet-b1234567890123456,subnet-c1234567890123456"
}

In this configuration, you request 60 Spot Instances because DesiredCapacity is set to 60 and OnDemandPercentageAboveBaseCapacity is set to 0. The example follows Spot best practices and is flexible across c3.large, c4.large, and c5.large in us-east-1a, us-east-1b, and us-east-1c (mapped according to the subnets in VPCZoneIdentifier). The Spot allocation strategy is set to lowest-price over two SpotInstancePools.
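To spell out the arithmetic behind "60 Spot Instances", here is a small Python sketch of how a desired capacity divides into On-Demand and Spot Instances. It assumes OnDemandBaseCapacity, which the example does not set, defaults to 0, and that a fractional On-Demand count above the base is rounded up in favor of On-Demand. The helper name split_capacity is hypothetical and not part of any AWS SDK.

```python
import math


def split_capacity(desired_capacity, on_demand_base_capacity=0,
                   on_demand_percentage_above_base=100):
    """Return (on_demand, spot) instance counts for a mixed instances group.

    Hypothetical helper. Assumes a fractional On-Demand count above the
    base capacity is rounded up in favor of On-Demand.
    """
    # The base capacity is filled with On-Demand Instances first.
    base = min(on_demand_base_capacity, desired_capacity)
    above_base = desired_capacity - base
    # The percentage applies only to capacity above the base.
    on_demand_above = math.ceil(above_base * on_demand_percentage_above_base / 100)
    on_demand = base + on_demand_above
    return on_demand, desired_capacity - on_demand


# The example: DesiredCapacity 60, OnDemandPercentageAboveBaseCapacity 0.
print(split_capacity(60, on_demand_percentage_above_base=0))  # (0, 60)
```

With a percentage of 0 and no base capacity, every instance in the group is a Spot Instance, which is why the example requests 60 Spot Instances.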

First, EC2 Auto Scaling tries to make sure that it balances the requested capacity across all the Availability Zones provided in the request. To do so, it splits the target capacity request of 60 across the three zones. Then, the lowest-price allocation strategy allocates the Spot Instance launches to the lowest-priced pool per zone.

Using the example Spot prices shown in the following table, the resulting allocation is:

  • 20 Spot Instances from us-east-1a (10 c3.large, 10 c4.large)
  • 20 Spot Instances from us-east-1b (10 c3.large, 10 c4.large)
  • 20 Spot Instances from us-east-1c (10 c3.large, 10 c4.large)

Availability Zone   Instance type   Spot Instances allocated   Spot price (per hour)
us-east-1a          c3.large        10                         $0.0294
us-east-1a          c4.large        10                         $0.0308
us-east-1a          c5.large        0                          $0.0408
us-east-1b          c3.large        10                         $0.0294
us-east-1b          c4.large        10                         $0.0308
us-east-1b          c5.large        0                          $0.0387
us-east-1c          c3.large        10                         $0.0294
us-east-1c          c4.large        10                         $0.0331
us-east-1c          c5.large        0                          $0.0353

The cost for this Auto Scaling group is $1.83/hour. Note, however, that the Spot Instances are allocated by price alone and are not optimized for capacity. The Auto Scaling group could experience more interruptions if the lowest-priced Spot Instance pools are not as deep as others, because after an interruption the Auto Scaling group attempts to re-provision instances into the lowest-priced Spot Instance pools.
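The two-step allocation described above can be reproduced with a short Python sketch. This is a simplified model, not EC2 Auto Scaling's actual implementation. It assumes an even split across Availability Zones, followed by an even split of each zone's share over the N cheapest pools in that zone. The function names are hypothetical. Using the example prices from the table, it recovers the allocation and the $1.83/hour figure.

```python
# Example Spot prices in $/hour, copied from the table above.
SPOT_PRICES = {
    "us-east-1a": {"c3.large": 0.0294, "c4.large": 0.0308, "c5.large": 0.0408},
    "us-east-1b": {"c3.large": 0.0294, "c4.large": 0.0308, "c5.large": 0.0387},
    "us-east-1c": {"c3.large": 0.0294, "c4.large": 0.0331, "c5.large": 0.0353},
}


def even_split(total, parts):
    """Split total into `parts` integers that differ by at most one."""
    return [total // parts + (1 if i < total % parts else 0) for i in range(parts)]


def lowest_price_allocation(desired, prices, spot_instance_pools):
    """Simplified model of lowest-price diversified over N pools.

    1. Balance the desired capacity across Availability Zones.
    2. In each zone, spread that zone's share evenly over the N
       lowest-priced instance types.
    """
    allocation = {}
    zones = sorted(prices)
    for zone, zone_share in zip(zones, even_split(desired, len(zones))):
        cheapest = sorted(prices[zone], key=prices[zone].get)[:spot_instance_pools]
        for itype, count in zip(cheapest, even_split(zone_share, len(cheapest))):
            allocation[(zone, itype)] = count
    return allocation


def hourly_cost(allocation, prices):
    """Sum instances x Spot price over every allocated pool."""
    return sum(count * prices[zone][itype] for (zone, itype), count in allocation.items())


alloc = lowest_price_allocation(60, SPOT_PRICES, spot_instance_pools=2)
for (zone, itype), count in sorted(alloc.items()):
    print(zone, itype, count)
print(f"${hourly_cost(alloc, SPOT_PRICES):.2f}/hour")  # $1.83/hour
```

Note that the model never looks at pool depth. That blind spot is why the lowest-price strategy can keep landing in shallow pools.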

Capacity-optimized allocation strategy

There is a price associated with interruptions, restarting work, and checkpointing. While the overall hourly cost of the capacity-optimized allocation strategy might be slightly higher, having fewer interruptions can lower the overall cost of your workload.

The effectiveness of the capacity-optimized allocation strategy depends on following Spot best practices by being flexible and providing as many instance types and Availability Zones (Spot Instance pools) as possible in the configuration. It is also important to understand that as capacity demands change, the allocations provided by this strategy also change over time.

Remember that Spot pricing changes slowly over time based on long-term trends in supply and demand, but capacity fluctuates in real time. The capacity-optimized strategy does account for pool capacity depth as it deploys Spot Instances, but it does not account for Spot prices.

As a result, the capacity-optimized allocation strategy is a good choice for workloads with a high cost of interruption, such as:

  • Big data and analytics
  • Image and media rendering
  • Machine learning
  • High performance computing


The following example configuration shows how capacity could be allocated in an Auto Scaling group using the capacity-optimized allocation strategy:

{
  "AutoScalingGroupName": "runningAmazonEC2WorkloadsAtScale",
  "MixedInstancesPolicy": {
    "LaunchTemplate": {
      "LaunchTemplateSpecification": {
        "LaunchTemplateName": "my-launch-template",
        "Version": "$Latest"
      },
      "Overrides": [
        { "InstanceType": "c3.large" },
        { "InstanceType": "c4.large" },
        { "InstanceType": "c5.large" }
      ]
    },
    "InstancesDistribution": {
      "OnDemandPercentageAboveBaseCapacity": 0,
      "SpotAllocationStrategy": "capacity-optimized"
    }
  },
  "MinSize": 10,
  "MaxSize": 100,
  "DesiredCapacity": 60,
  "HealthCheckType": "EC2",
  "VPCZoneIdentifier": "subnet-a1234567890123456,subnet-b1234567890123456,subnet-c1234567890123456"
}

In this configuration, you request 60 Spot Instances because DesiredCapacity is set to 60 and OnDemandPercentageAboveBaseCapacity is set to 0. The example follows Spot best practices (especially critical when using the capacity-optimized allocation strategy) and is flexible across c3.large, c4.large, and c5.large in us-east-1a, us-east-1b, and us-east-1c (mapped according to the subnets in VPCZoneIdentifier). The Spot allocation strategy is set to capacity-optimized.
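The only difference between the two configurations is the InstancesDistribution block. SpotInstancePools is valid only with the lowest-price strategy, so it is dropped here. The Python sketch below builds either request as a dict. The build_asg_request helper is hypothetical, and the values are the example's own.

```python
import json


def build_asg_request(spot_allocation_strategy, spot_instance_pools=None):
    """Build the example Auto Scaling group request as a Python dict.

    SpotInstancePools applies only to the lowest-price strategy, so it is
    included only when that strategy is selected.
    """
    distribution = {
        "OnDemandPercentageAboveBaseCapacity": 0,
        "SpotAllocationStrategy": spot_allocation_strategy,
    }
    if spot_allocation_strategy == "lowest-price" and spot_instance_pools is not None:
        distribution["SpotInstancePools"] = spot_instance_pools
    return {
        "AutoScalingGroupName": "runningAmazonEC2WorkloadsAtScale",
        "MixedInstancesPolicy": {
            "LaunchTemplate": {
                "LaunchTemplateSpecification": {
                    "LaunchTemplateName": "my-launch-template",
                    "Version": "$Latest",
                },
                # One override per instance type the group may launch.
                "Overrides": [
                    {"InstanceType": t} for t in ("c3.large", "c4.large", "c5.large")
                ],
            },
            "InstancesDistribution": distribution,
        },
        "MinSize": 10,
        "MaxSize": 100,
        "DesiredCapacity": 60,
        "HealthCheckType": "EC2",
        # One subnet per Availability Zone, comma-separated.
        "VPCZoneIdentifier": ",".join(
            ["subnet-a1234567890123456", "subnet-b1234567890123456",
             "subnet-c1234567890123456"]
        ),
    }


print(json.dumps(build_asg_request("capacity-optimized"), indent=2))
```

If you use the AWS SDK for Python (Boto3), the same dict can be passed as keyword arguments to `boto3.client("autoscaling").create_auto_scaling_group(**request)`. That call is not run here because it requires credentials and the example subnets and launch template.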

First, EC2 Auto Scaling tries to make sure that the requested capacity is evenly balanced across all the Availability Zones provided in the request. To do so, it splits the target capacity request of 60 across the three zones. Then, the capacity-optimized allocation strategy places the Spot Instance launches by analyzing capacity metrics per instance type per zone. In other words, this strategy optimizes for available capacity instead of for the lowest price (hence its name).
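EC2 does not publish the capacity metrics this strategy uses, so the following Python sketch substitutes hypothetical spare-capacity numbers. They are invented for illustration and chosen so that the result matches the allocation in the following table. The model is deliberately simplified: after balancing across zones, it puts each zone's share into the pool with the most spare capacity and ignores price entirely. It is not the actual EC2 implementation, and the function name is hypothetical.

```python
# HYPOTHETICAL spare capacity (instances) per Spot Instance pool.
# EC2 does not expose these numbers; they are invented for illustration.
SPARE_CAPACITY = {
    "us-east-1a": {"c3.large": 40, "c4.large": 300, "c5.large": 90},
    "us-east-1b": {"c3.large": 250, "c4.large": 60, "c5.large": 80},
    "us-east-1c": {"c3.large": 30, "c4.large": 50, "c5.large": 400},
}


def capacity_optimized_allocation(desired, capacity):
    """Simplified model: balance across zones, then put each zone's share
    into the pool with the most spare capacity. Price is not considered."""
    zones = sorted(capacity)
    allocation = {}
    for i, zone in enumerate(zones):
        # Spread the desired capacity as evenly as possible across zones.
        share = desired // len(zones) + (1 if i < desired % len(zones) else 0)
        # Pick the deepest pool in this zone.
        deepest = max(capacity[zone], key=capacity[zone].get)
        allocation[(zone, deepest)] = share
    return allocation


print(capacity_optimized_allocation(60, SPARE_CAPACITY))
# {('us-east-1a', 'c4.large'): 20, ('us-east-1b', 'c3.large'): 20, ('us-east-1c', 'c5.large'): 20}
```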

Using the example Spot prices shown in the following table, the resulting allocation is:

  • 20 Spot Instances from us-east-1a (20 c4.large)
  • 20 Spot Instances from us-east-1b (20 c3.large)
  • 20 Spot Instances from us-east-1c (20 c5.large)

Availability Zone   Instance type   Spot Instances allocated   Spot price (per hour)
us-east-1a          c3.large        0                          $0.0294
us-east-1a          c4.large        20                         $0.0308
us-east-1a          c5.large        0                          $0.0408
us-east-1b          c3.large        20                         $0.0294
us-east-1b          c4.large        0                          $0.0308
us-east-1b          c5.large        0                          $0.0387
us-east-1c          c3.large        0                          $0.0294
us-east-1c          c4.large        0                          $0.0308
us-east-1c          c5.large        20                         $0.0353

The cost for this Auto Scaling group is $1.91/hour, only about 4% more than the lowest-price example above. However, notice that the distribution of the Spot Instances is different. This is because the capacity-optimized allocation strategy determined that this was the most efficient distribution from an available-capacity perspective.
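The cost comparison can be checked directly from the two tables. The sketch below multiplies the instance count by the example Spot price for every pool that received instances. Using unrounded totals ($1.829 versus $1.910 per hour), the capacity-optimized group costs about 4.4% more.

```python
def hourly_cost(rows):
    """Sum instances x Spot price ($/hour) over (count, price) rows."""
    return sum(count * price for count, price in rows)


# (instances allocated, Spot price $/hour) for each non-empty table row.
lowest_price_rows = [(10, 0.0294), (10, 0.0308),   # us-east-1a
                     (10, 0.0294), (10, 0.0308),   # us-east-1b
                     (10, 0.0294), (10, 0.0331)]   # us-east-1c
capacity_optimized_rows = [(20, 0.0308),           # us-east-1a c4.large
                           (20, 0.0294),           # us-east-1b c3.large
                           (20, 0.0353)]           # us-east-1c c5.large

low = hourly_cost(lowest_price_rows)
cap = hourly_cost(capacity_optimized_rows)
print(f"lowest-price:       ${low:.2f}/hour")   # $1.83/hour
print(f"capacity-optimized: ${cap:.2f}/hour")   # $1.91/hour
print(f"difference:         {100 * (cap - low) / low:.1f}%")  # 4.4%
```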


Consider using the new capacity-optimized allocation strategy to make the most efficient use of spare capacity. It automatically deploys into the most-available Spot Instance pools while still taking advantage of the steep discounts provided by Spot Instances.

This allocation strategy may be especially useful for workloads with a high cost of interruption, including:

  • Big data and analytics
  • Image and media rendering
  • Machine learning
  • High performance computing

No matter which allocation strategy you choose, you still enjoy the steep discounts provided by Spot Instances. These discounts are possible thanks to the stable Spot pricing made available with the new Spot pricing model.

Chad Schmutzer is a Principal Developer Advocate for the EC2 Spot team. Follow him on Twitter to get the latest updates on saving at scale with Spot Instances, to provide feedback, or just to say hi.

from AWS Compute Blog

ICYMI: Serverless Q2 2019


This post is courtesy of Moheeb Zara, Senior Developer Advocate – AWS Serverless

Welcome to the sixth edition of the AWS Serverless ICYMI (in case you missed it) quarterly recap. Every quarter, we share all of the most recent product launches, feature enhancements, blog posts, webinars, Twitch live streams, and other interesting things that you might have missed!

In case you missed our last ICYMI, check out what happened last quarter here.

April - June 2019

Amazon EventBridge

Before we dive into all that happened in Q2, we’re excited about this quarter’s launch of Amazon EventBridge, the serverless event bus that connects application data from your own apps, SaaS applications, and AWS services. This allows you to create powerful event-driven serverless applications using a variety of event sources.

Our very own AWS Solutions Architect, Mike Deck, sat down with AWS Serverless Hero Jeremy Daly and recorded a podcast on Amazon EventBridge. It’s a worthy listen if you’re interested in exploring all the features offered by this launch.

Now, back to Q2, here’s what’s new.

AWS Lambda


Amazon CloudWatch Logs Insights now allows you to see statistics from recent invocations of your Lambda functions in the Lambda monitoring tab.

Additionally, as of June, you can monitor the Lambda@Edge functions associated with your Amazon CloudFront distributions directly from your Amazon CloudFront console. This includes a revamped monitoring dashboard for CloudFront distributions and Lambda@Edge functions.

AWS Step Functions


AWS Step Functions now supports workflow execution events, which help in building and monitoring event-driven serverless workflows. Execution event notifications are delivered automatically through CloudWatch Events and Amazon EventBridge when a workflow execution starts or completes. This allows services such as AWS Lambda, Amazon SNS, Amazon Kinesis, or AWS Step Functions to respond to these events.

Additionally, you can use callback patterns to automate workflows for applications with human activities and custom integrations with third-party services. You can create callback patterns in minutes with less code to write and maintain. They run without servers or infrastructure to manage, and they scale reliably.
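To make the callback pattern concrete, here is a minimal sketch of an Amazon States Language definition, built as a Python dict. In it, a Task state sends a message to an Amazon SQS queue and then pauses until a task token is returned. The queue URL, account ID, and state names are hypothetical placeholders.

```python
import json

# Hypothetical queue URL; replace with your own.
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/approval-requests"

state_machine = {
    "Comment": "Pause until a human or external system returns the task token",
    "StartAt": "RequestApproval",
    "States": {
        "RequestApproval": {
            "Type": "Task",
            # The .waitForTaskToken suffix makes the state wait for a callback.
            "Resource": "arn:aws:states:::sqs:sendMessage.waitForTaskToken",
            "Parameters": {
                "QueueUrl": QUEUE_URL,
                "MessageBody": {
                    # The context object exposes the token at $$.Task.Token.
                    "TaskToken.$": "$$.Task.Token",
                    "Input.$": "$",
                },
            },
            "End": True,
        }
    },
}

print(json.dumps(state_machine, indent=2))
```

The worker that reads the message resumes the workflow by calling the SendTaskSuccess (or SendTaskFailure) API with that token. For example: `boto3.client("stepfunctions").send_task_success(taskToken=token, output=json.dumps(result))`.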

Amazon API Gateway


Amazon API Gateway now offers tag-based access control for WebSocket APIs using AWS Identity and Access Management (IAM) policies, allowing you to categorize API Gateway resources for WebSocket APIs by purpose, owner, or other criteria. With tag-based access control for WebSocket resources, you can now grant permissions to WebSocket resources at various levels by creating policies based on tags. For example, you can grant full access to admins while limiting access for developers.

You can now enforce a minimum Transport Layer Security (TLS) version and cipher suites through a security policy for connecting to your Amazon API Gateway custom domain.

In addition, Amazon API Gateway now allows you to define VPC Endpoint policies, enabling you to specify which Private APIs a VPC Endpoint can connect to. This enables granular security control using VPC Endpoint policies.

AWS Amplify

Amplify CLI (part of the open source Amplify Framework) now includes support for adding and configuring AWS Lambda triggers for events when using Amazon Cognito, Amazon Simple Storage Service (Amazon S3), and Amazon DynamoDB as event sources. This means you can set up custom authentication flows for mobile and web applications via the Amplify CLI, using an Amazon Cognito User Pool as the authentication provider.

Amplify Console

Amplify Console, a Git-based workflow for continuous deployment and hosting of full-stack serverless web apps, launched several updates to the build service, including SAM CLI and custom container support.

Amazon Kinesis

Amazon Kinesis Data Firehose can now utilize AWS PrivateLink to securely ingest data. AWS PrivateLink provides private connectivity between VPCs, AWS services, and on-premises applications, securely over the Amazon network. When AWS PrivateLink is used with Amazon Kinesis Data Firehose, all traffic to a Kinesis Data Firehose from a VPC flows over a private connection.

You can now assign AWS resource tags to applications in Amazon Kinesis Data Analytics. These key/value tags can be used to organize and identify resources, create cost allocation reports, and control access to resources within Amazon Kinesis Data Analytics.

Amazon Kinesis Data Firehose is now available in the AWS GovCloud (US-East), Europe (Stockholm), Asia Pacific (Seoul), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Tokyo), and EU (London) regions.

For a complete list of where Amazon Kinesis Data Analytics is available, please see the AWS Region Table.

AWS Cloud9


Amazon Web Services (AWS) Cloud9 integrated development environment (IDE) now has a Quick Start which deploys in the AWS cloud in about 30 minutes. This enables organizations to provide developers a powerful cloud-based IDE that can edit, run, and debug code in the browser and allow easy sharing and collaboration.

AWS Cloud9 is also now available in the EU (Frankfurt) and Asia Pacific (Tokyo) regions. For a current list of supported regions, see AWS Regions and Endpoints in the AWS documentation.

Amazon DynamoDB

You can now tag Amazon DynamoDB tables when you create them. Tags are labels you can attach to AWS resources to make them easier to manage, search, and filter.  Tagging support has also been extended to the AWS GovCloud (US) Regions.

DynamoDBMapper now supports Amazon DynamoDB transactional API calls. This support is included within the AWS SDK for Java. These transactional APIs provide developers atomic, consistent, isolated, and durable (ACID) operations to help ensure data correctness.

Amazon DynamoDB now applies adaptive capacity in real time in response to changing application traffic patterns, which helps you maintain uninterrupted performance indefinitely, even for imbalanced workloads.

AWS Training and Certification has launched Amazon DynamoDB: Building NoSQL Database–Driven Applications, a new self-paced, digital course available exclusively on edX.

Amazon Aurora

Amazon Aurora Serverless MySQL 5.6 can now be accessed using the built-in Data API, enabling you to access Aurora Serverless from web services-based applications, including AWS Lambda, AWS AppSync, and AWS Cloud9. For more, check out this post.

Sharing snapshots of Aurora Serverless DB clusters with other AWS accounts or publicly is now possible. We are also giving you the ability to copy Aurora Serverless DB cluster snapshots across AWS regions.

You can now set the minimum capacity of your Aurora Serverless DB clusters to 1 Aurora Capacity Unit (ACU). With Aurora Serverless, you specify the minimum and maximum ACUs for your Aurora Serverless DB cluster instead of provisioning and managing database instances. Each ACU is a combination of processing and memory capacity. By setting the minimum capacity to 1 ACU, you can keep your Aurora Serverless DB cluster running at a lower cost.

AWS Serverless Application Repository

The AWS Serverless Application Repository is now available in 17 regions with the addition of the AWS GovCloud (US-West) region.

Region support includes Asia Pacific (Mumbai, Singapore, Sydney, Tokyo), Canada (Central), EU (Frankfurt, Ireland, London, Paris, Stockholm), South America (São Paulo), US West (N. California, Oregon), and US East (N. Virginia, Ohio).

Amazon Cognito

Amazon Cognito has launched a new API – AdminSetUserPassword – for the Cognito User Pool service that provides a way for administrators to set temporary or permanent passwords for their end users. This functionality is available for end users even when their verified phone or email are unavailable.

Serverless Posts





Events this quarter

Senior Developer Advocates for AWS Serverless spoke at several conferences this quarter. Here are some recordings worth watching!

Tech Talks

We hold several AWS Online Tech Talks on serverless topics throughout the year, so look out for them in the Serverless section of the AWS Online Tech Talks page. Here are the ones from Q2.


Twitch Series

In April, we started a 13-week deep dive into building APIs on AWS as part of our Twitch Build On series. The Building Happy Little APIs series covers the common and not-so-common use cases for APIs on AWS and the features available to customers as they look to build secure, scalable, efficient, and flexible APIs.

There are also a number of other helpful video series covering Serverless available on the AWS Twitch Channel.

Build with Serverless on Twitch

Serverless expert and AWS Specialist Solutions Architect Heitor Lessa has been hosting a weekly Twitch series since April. Join him and others as they build an end-to-end airline booking solution using serverless. The final episode airs on Wednesday, August 7th, at 8:00 AM PT.

Here’s a recap of the last quarter:

AWS re:Invent


AWS re:Invent 2019 is around the corner! From December 2 – 6 in Las Vegas, Nevada, join tens of thousands of AWS customers to learn, share ideas, and see exciting keynote announcements. Be sure to take a look at the growing catalog of serverless sessions this year.

Register for AWS re:Invent now!

What did we do at AWS re:Invent 2018? Check out our recap here: AWS re:Invent 2018 Recap at the San Francisco Loft

AWS Serverless Heroes

We urge you to explore the efforts of our AWS Serverless Heroes Community. This is a worldwide network of AWS Serverless experts with a diverse background of experience. For example, check out this post from last month where Marcia Villalba demonstrates how to set up unit tests for serverless applications.

Still looking for more?

The Serverless landing page has lots of information. The Lambda resources page contains case studies, webinars, whitepapers, customer stories, reference architectures, and even more Getting Started tutorials.

from AWS Compute Blog

Why AWS is the best place for your Windows workloads, and how Microsoft is changing their licensing to try to awkwardly force you into Azure


This post is contributed by Sandy Carter, Vice President at AWS. It is also published on LinkedIn.

Many companies today are considering how to migrate to the cloud to take advantage of the agility and innovation that the cloud brings. Having the right to choose the best provider for your business is critical.

AWS is the best cloud for running your Windows workloads and our experience running Windows applications has earned our customers’ trust. It’s been more than 11 years since AWS first made it possible for customers to run their Windows workloads on AWS—longer than Azure has even been around, and according to a report by IDC, we host nearly two times as many Windows Server instances in the cloud as Microsoft. And more and more enterprises are entrusting their Windows workloads to AWS because of its greater reliability, higher performance, and lower cost, with the number of AWS enterprise customers using AWS for Windows Server growing more than 400% in the past three years.

In fact, we are seeing a trend of customers moving from Azure to AWS. eMarketer started their digital transformation with Azure but found performance challenges and higher costs that led them to migrate all of their workloads to AWS. Why did they migrate? They found a better experience, stronger support, higher availability, and better performance, with 4x faster launch times and 35% lower costs compared to Azure. Ancestry, a leader in consumer genomics, went all-in on development in the cloud, moving 10 PB of data and 400 Windows-based applications in less than 9 months. They also modernized to Linux with .NET Core and adopted advanced technologies including serverless and containers. With results like that, you can see why organizations like Sysco, Edwards Life Sciences, Expedia, and NextGen Healthcare have chosen AWS to upgrade, migrate, and modernize their Windows workloads.

If you are interested in seeing your cost savings compared with running on-premises or on Azure, send us an email at [email protected] or visit why AWS is the best cloud for Windows.

from AWS Compute Blog

DevOps on AWS Radio: Automating AWS IoT (Episode 25)


In this episode, we chat with Michael Neil, a DevOps Automation Engineer here at Mphasis Stelligent, about the AWS IoT platform. Because AWS IoT consists of many products and services, it can be difficult to know where to start when piecing the offerings together into an IoT solution. Paul Duvall and Michael Neil give you an overview of the AWS IoT platform, guide you in how to get started with AWS IoT, teach you how to automate it, and walk through a use case using AWS IoT. Listen here:

DevOps on AWS News

Episode Topics

  1. Michael Neil Intro & Background 
  2. Overview of AWS IoT and AWS IoT Services
    1. Device software
      1. IoT Greengrass, IoT Device SDK
    2. Control services
      1. AWS IoT Core, AWS IoT Device Defender, AWS IoT Things Graph
    3. Data services
      1. AWS IoT Analytics, AWS IoT Events
  3. Continuous Delivery with AWS IoT
    1. How is CD different when it comes to embedded devices and AWS IoT?
    2. How do you provision devices at the edge with MCUXpresso IDE?
    3. How to do CD with AWS IoT via AWS CodePipeline and AWS CodeBuild
    4. How to do just-in-time provisioning and give devices the right permissions
  4. Bootstrapping Automation
    1. Bootstrapping process
    2. How we started automating via the SDK
  5. Automating and provisioning AWS IoT Services
    1. IoT Greengrass
    2. IoT Things
  6. Integrations with other AWS Services
    1. Amazon Simple Storage Service (Amazon S3)
    2. AWS Lambda
    3. Amazon Simple Queue Service (SQS)
    4. Amazon DynamoDB
    5. Amazon Kinesis Data Firehose
    6. Amazon QuickSight
  7. Amazon FreeRTOS
  8. Automobile Assembly Line Use Case 
    1. How might they employ AWS IoT?
    2. How to do Continuous Delivery?
    3. Machine Learning

Additional Resources


About DevOps on AWS Radio

On DevOps on AWS Radio, we cover topics around applying DevOps principles and practices such as Continuous Delivery on the Amazon Web Services cloud. This is what we do at Stelligent for our customers. We’ll bring listeners in and speak with engineers who’ve recently published on our blog and we’ll also be reaching out to the wider DevOps on AWS community to get their thoughts and insights.

The overall vision of this podcast is to describe how listeners can create a one-click (or “no click”) implementation of their software systems and infrastructure in the Amazon Web Services cloud so that teams can deliver software to users whenever there’s a business need to do so. The podcast will delve into the cultural, process, tooling, and organizational changes that can make this possible including:

  • Automation of
    • Networks (e.g. VPC)
    • Compute (EC2, Containers, Serverless, etc.)
    • Storage (e.g. S3, EBS, etc.)
    • Database and Data (RDS, DynamoDB, etc.)
  • Organizational and Team Structures and Practices
  • Team and Organization Communication and Collaboration
  • Cultural Indicators
  • Version control systems and processes
  • Deployment Pipelines
    • Orchestration of software delivery workflows
    • Execution of these workflows
  • Application/service Architectures – e.g. Microservices
  • Automation of Build and deployment processes
  • Automation of testing and other verification approaches, tools and systems
  • Automation of security practices and approaches
  • Continuous Feedback systems
  • Many other Topics…


The post DevOps on AWS Radio: Automating AWS IoT (Episode 25) appeared first on Stelligent.

from Blog – Stelligent

Optimizing Amazon ECS task density using awsvpc network mode


This post is contributed by Tony Pujals | Senior Developer Advocate, AWS


AWS recently increased the number of elastic network interfaces available when you run tasks on Amazon ECS, through an account setting called awsvpcTrunking. If you use the Amazon EC2 launch type and task networking (awsvpc network mode), you can now run 5 to 17 times as many tasks on an instance as you could before.

As more of you embrace microservices architectures, you deploy increasing numbers of smaller tasks. AWS now offers you the option of more efficient packing per instance, potentially resulting in smaller clusters and associated savings.



To manage your own cluster of EC2 instances, use the EC2 launch type. Use task networking to run ECS tasks using the same networking properties as if tasks were distinct EC2 instances.

Task networking offers several benefits. Every task launched with awsvpc network mode has its own attached network interface, a primary private IP address, and an internal DNS hostname. This simplifies container networking and gives you more control over how tasks communicate, both with each other and with other services within their virtual private clouds (VPCs).

Task networking also lets you take advantage of other EC2 networking features like VPC Flow Logs. This feature lets you monitor traffic to and from tasks. It also provides greater security control for containers, allowing you to use security groups and network monitoring tools at a more granular level within tasks. For more information, see Introducing Cloud Native Networking for Amazon ECS Containers.

However, if you run container tasks on EC2 instances with task networking, you can face a networking limit. This might surprise you, particularly when an instance has plenty of free CPU and memory. The limit reflects the number of network interfaces available to support awsvpc network mode per container instance.


Raise network interface density limits with trunking

The good news is that AWS raised network interface density limits by implementing a networking feature on ECS called “trunking.” This is a technique for multiplexing data over a shared communication link.

If you’re migrating to microservices using AWS App Mesh, you should optimize network interface density. App Mesh requires awsvpc networking to provide routing control and visibility over an ever-expanding array of running tasks. In this context, increased network interface density might save money.

By opting for network interface trunking, you should see a significant increase in capacity—from 5 to 17 times more than the previous limit. For more information on the new task limits per container instance, see Supported Amazon EC2 Instance Types.

Applications with tasks not hitting CPU or memory limits also benefit from this feature through the more cost-effective “bin packing” of container instances.


Trunking is an opt-in feature

AWS chose to make the trunking feature opt-in due to the following factors:

  • Instance registration: While normal instance registration is straightforward with trunking, this feature increases the number of asynchronous instance registration steps that can potentially fail. Any such failures might add extra seconds to launch time.
  • Available IP addresses: The “trunk” belongs to the same subnet in which the instance’s primary network interface originates. This effectively reduces the available IP addresses and potentially the ability to scale out on other EC2 instances sharing the same subnet. The trunk consumes an IP address. With a trunk attached, there are two assigned IP addresses per instance, one for the primary interface and one for the trunk.
  • Differing customer preferences and infrastructure: If you have high CPU or memory workloads, you might not benefit from trunking. Or, you may not want awsvpc networking.
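The IP address consideration is easy to quantify. The Python sketch below estimates how many container instances fit in one subnet. It assumes every awsvpc task draws its IP address from the same subnet as the instance, which is a simplification because tasks can be placed in other subnets. It also relies on AWS reserving five addresses in every subnet. The tasks-per-instance figure is a hypothetical input, held fixed so that only the trunk's extra address changes the result.

```python
def max_instances_in_subnet(prefix_length, tasks_per_instance, trunking):
    """Estimate how many container instances a subnet can hold.

    Assumes each task's IP address comes from the same subnet as the
    instance. AWS reserves 5 addresses in every subnet.
    """
    usable = 2 ** (32 - prefix_length) - 5
    # Primary interface, plus the trunk interface when trunking is enabled.
    per_instance = 1 + (1 if trunking else 0) + tasks_per_instance
    return usable // per_instance


# A /24 subnet with a hypothetical 10 tasks per instance:
print(max_instances_in_subnet(24, 10, trunking=False))  # 22
print(max_instances_in_subnet(24, 10, trunking=True))   # 20
```

The trunk costs one address per instance. In a busy subnet, that overhead can cap scale-out before CPU or memory does.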

Consequently, AWS leaves it to you to decide if you want to use this feature. AWS might revisit this decision in the future, based on customer feedback. For now, your account roles or users must opt in to the awsvpcTrunking account setting to gain the benefits of increased task density per container instance.


Enable trunking

Enable the ECS elastic network interface trunking feature to increase the number of network interfaces that can be attached to supported EC2 container instance types. You must meet the following prerequisites before you can launch a container instance with the increased network interface limits:

  • Your account must have the AWSServiceRoleForECS service-linked role for ECS.
  • You must opt in to the awsvpcTrunking account setting.


Make sure that a service-linked role exists for ECS

A service-linked role is a unique type of IAM role linked to an AWS service (such as ECS). This role lets you delegate the permissions necessary to call other AWS services on your behalf. Because ECS is a service that manages resources on your behalf, you need this role to proceed.

In most cases, you won’t have to create a service-linked role. If you created or updated an ECS cluster, ECS likely created the service-linked role for you.

You can confirm that your service-linked role exists using the AWS CLI, as shown in the following code example:

$ aws iam get-role --role-name AWSServiceRoleForECS
{
    "Role": {
        "Path": "/aws-service-role/",
        "RoleName": "AWSServiceRoleForECS",
        "RoleId": "AROAJRUPKI7I2FGUZMJJY",
        "Arn": "arn:aws:iam::226767807331:role/aws-service-role/",
        "CreateDate": "2018-11-09T21:27:17Z",
        "AssumeRolePolicyDocument": {
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Effect": "Allow",
                    "Principal": {
                        "Service": ""
                    },
                    "Action": "sts:AssumeRole"
                }
            ]
        },
        "Description": "Role to enable Amazon ECS to manage your cluster.",
        "MaxSessionDuration": 3600
    }
}

If the service-linked role does not exist, create it manually with the following command:

aws iam create-service-linked-role --aws-service-name

For more information, see Using Service-Linked Roles for Amazon ECS.


Opt in to the awsvpcTrunking account setting

Your account, IAM user, or role must opt in to the awsvpcTrunking account setting. You can change this setting using the AWS CLI or the ECS console. You can opt in for an entire account by making awsvpcTrunking its default setting. Alternatively, you can enable the setting only for the role associated with the instance profile that the instance launches with. For instructions, see Account Settings.


Other considerations

After completing the prerequisites described in the preceding sections, launch a new container instance with increased network interface limits using one of the supported EC2 instance types.

Keep the following in mind:

  • It’s available with the latest variant of the ECS-optimized AMI.
  • It affects only container instances created after you opt in to awsvpcTrunking.
  • It only affects tasks created with awsvpc network mode and EC2 launch type. Tasks created with the AWS Fargate launch type always have a dedicated network interface, no matter how many you launch.

For details, see ENI Trunking Considerations.



If you seek to optimize the usage of your EC2 container instances for clusters that you manage, enable the increased network interface density feature with awsvpcTrunking. By following the steps outlined in this post, you can launch tasks using significantly fewer EC2 instances. This is especially useful if you embrace a microservices architecture, with its increasing numbers of lighter tasks.

Hopefully, you found this post informative and the proposed solution intriguing. As always, AWS welcomes all feedback and comments.

from AWS Compute Blog

Using AWS App Mesh with Fargate


This post is contributed by Tony Pujals | Senior Developer Advocate, AWS


AWS App Mesh is a service mesh, which provides a framework to control and monitor services spanning multiple AWS compute environments. My previous post provided a walkthrough to get you started. In it, I showed how to deploy a simple microservice application to Amazon ECS and configure App Mesh to provide traffic control and observability.

In this post, I show more advanced techniques using AWS Fargate as an ECS launch type. I show you how to deploy a specific version of the colorteller service from the previous post. Finally, I move on and explore distributing traffic across other environments, such as Amazon EC2 and Amazon EKS.

I simplified this example for clarity, but in the real world, a service mesh that bridges different compute environments becomes genuinely useful. Fargate is a compute service for AWS that helps you run containerized tasks using the primitives (the tasks and services) of an ECS application, without needing to directly configure and manage EC2 instances.


Solution overview

This post assumes that you already have a containerized application running on ECS, but want to shift your workloads to use Fargate.

You deploy a new version of the colorteller service with Fargate, and then begin shifting traffic to it. If all goes well, you continue to shift more traffic to the new version until it serves 100% of all requests. The label “blue” represents the original version and “green” represents the new version. The following diagram shows the programmer model of the Color App.

You want to begin shifting traffic from version 1 (represented by colorteller-blue in the following diagram) to version 2 (represented by colorteller-green).

In App Mesh, every version of a service is ultimately backed by actual running code somewhere, in this case ECS/Fargate tasks. Each service has its own virtual node representation in the mesh that provides this conduit.

The following diagram shows the App Mesh configuration of the Color App.



Shifting traffic is only half the work: you must also physically deploy each version of the application to a compute environment. In this demo, colorteller-blue runs on ECS using the EC2 launch type and colorteller-green runs on ECS using the Fargate launch type. The goal is to start by sending a portion of traffic to colorteller-green, ultimately increasing to 100% of traffic going to the new green version.


AWS compute model of the Color App.


Before following along, set up the resources and deploy the Color App as described in the previous walkthrough.


Deploy the Fargate app

To get started after you complete your Color App, configure it so that your traffic goes to colorteller-blue for now. The blue color represents version 1 of your colorteller service.

Log in to the App Mesh console and navigate to Virtual routers for the mesh. Configure the HTTP route to send 100% of traffic to the colorteller-blue virtual node.

The following screenshot shows routes in the App Mesh console.

Test the service and confirm in AWS X-Ray that the traffic flows through the colorteller-blue as expected with no errors.

The following screenshot shows the trace for the colorgateway virtual node.


Deploy the new colorteller to Fargate

With your original app in place, deploy the second version on Fargate and slowly increase the share of traffic it handles relative to the original. The app colorteller-green represents version 2 of the colorteller service. Initially, send only 30% of your traffic to it.

If your monitoring indicates a healthy service, then increase it to 60%, then finally to 100%. In the real world, you might choose more granular increases with automated rollout (and rollback if issues arise), but this demonstration keeps things simple.

You pushed the gateway and colorteller images to ECR (see Deploy Images) in the previous post, and then launched ECS tasks with these images. For this post, launch an ECS task using the Fargate launch type with the same colorteller and envoy images. This sets up the running envoy container as a sidecar for the colorteller container.

You don’t have to manually configure any EC2 instances with the Fargate launch type. Fargate automatically colocates the sidecar on the same physical instance, and with the same lifecycle, as the primary application container.

To begin deploying the Fargate task and diverting traffic to it, follow these steps.


Step 1: Update the mesh configuration

You can download updated AWS CloudFormation templates located in the repo under walkthroughs/fargate.

This updated mesh configuration adds a new virtual node (colorteller-green-vn). It updates the virtual router (colorteller-vr) for the colorteller virtual service so that it distributes traffic between the blue and green virtual nodes at a 2:1 ratio. That is, the green node receives one-third of the traffic.

$ ./
Waiting for changeset to be created..
Waiting for stack create/update to complete
Successfully created/updated stack - DEMO-appmesh-colorapp
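The 2:1 split can be pictured with a small Java sketch of weighted random target selection. Each virtual node gets an integer weight, and a request goes to a node with probability weight / total. This only illustrates the arithmetic behind the route configuration. It is not how Envoy or App Mesh implement routing internally, and `WeightedRouter` is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

/** Hypothetical weighted router: picks a target with probability proportional to its weight. */
final class WeightedRouter {
    private final Map<String, Integer> weights = new LinkedHashMap<>();
    private int totalWeight;

    WeightedRouter add(String target, int weight) {
        if (weight < 0) throw new IllegalArgumentException("weight must be non-negative");
        weights.merge(target, weight, Integer::sum);
        totalWeight += weight;
        return this;
    }

    /** Maps a ticket in [0, totalWeight) to a target by walking the cumulative weights. */
    String pick(int ticket) {
        if (ticket < 0 || ticket >= totalWeight) throw new IllegalArgumentException("ticket out of range");
        int cumulative = 0;
        for (Map.Entry<String, Integer> entry : weights.entrySet()) {
            cumulative += entry.getValue();
            if (ticket < cumulative) return entry.getKey();
        }
        throw new IllegalStateException("unreachable: ticket is below totalWeight");
    }

    /** Draws a uniformly random ticket, so each target wins weight / totalWeight of the time. */
    String pick(Random random) {
        return pick(random.nextInt(totalWeight));
    }

    public static void main(String[] args) {
        WeightedRouter router = new WeightedRouter()
                .add("colorteller-blue-vn", 2)
                .add("colorteller-green-vn", 1);
        Random random = new Random(42); // fixed seed for a repeatable run
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (int i = 0; i < 3000; i++) {
            counts.merge(router.pick(random), 1, Integer::sum);
        }
        System.out.println(counts); // expect about two blue picks for every green one
    }
}
```

Because the weights are relative, 2 and 1 express the same split as 66 and 33. Only the ratio matters.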

Step 2: Deploy the green task to Fargate

The script creates parameterized template definitions before deploying the fargate-colorteller.yaml CloudFormation template. The change to launch a colorteller task as a Fargate task is in fargate-colorteller-task-def.json.

$ ./

Waiting for changeset to be created..
Waiting for stack create/update to complete
Successfully created/updated stack - DEMO-fargate-colorteller


Verify the Fargate deployment

The ColorApp endpoint is one of the CloudFormation template’s outputs. You can view it in the stack output in the AWS CloudFormation console, or fetch it with the AWS CLI:

$ colorapp=$(aws cloudformation describe-stacks --stack-name=$ENVIRONMENT_NAME-ecs-colorapp --query="Stacks[0].Outputs[?OutputKey=='ColorAppEndpoint'].OutputValue" --output=text); echo $colorapp

This command assigns the endpoint to the colorapp shell variable so that you can use it for a few curl requests:

$ curl $colorapp/color
{"color":"blue", "stats": {"blue":1}}

The 2:1 weight of blue to green makes the results predictable: over many requests, about two-thirds of the responses should be blue. Clear the histogram, then send a series of requests and watch for green results:

$ curl $colorapp/color/clear

$ for ((n=0;n<200;n++)); do echo "$n: $(curl -s $colorapp/color)"; done

0: {"color":"blue", "stats": {"blue":1}}
1: {"color":"green", "stats": {"blue":0.5,"green":0.5}}
2: {"color":"blue", "stats": {"blue":0.67,"green":0.33}}
3: {"color":"green", "stats": {"blue":0.5,"green":0.5}}
4: {"color":"blue", "stats": {"blue":0.6,"green":0.4}}
5: {"color":"green", "stats": {"blue":0.5,"green":0.5}}
6: {"color":"blue", "stats": {"blue":0.57,"green":0.43}}
7: {"color":"blue", "stats": {"blue":0.63,"green":0.38}}
8: {"color":"green", "stats": {"blue":0.56,"green":0.44}}
199: {"color":"blue", "stats": {"blue":0.66,"green":0.34}}

This reflects the expected result for a 2:1 ratio. Check everything on your AWS X-Ray console.

The following screenshot shows the X-Ray console map after the initial testing.

The results look good: 100% success, no errors.
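The stats field in each response above behaves like a running tally. It shows the fraction of all responses since the last clear that returned each color, rounded to two decimals. The Java sketch below reproduces that arithmetic under this assumption; the Color App’s actual implementation may differ, and `ColorHistogram` is a hypothetical name. With the same sequence of colors as the first output, it yields the same fractions. For example, after five blues and three greens it gives 0.63 and 0.38.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical re-creation of a running color histogram like the one in the Color App output. */
final class ColorHistogram {
    private final Map<String, Integer> counts = new LinkedHashMap<>();
    private int total;

    /** Records one response color. */
    void record(String color) {
        counts.merge(color, 1, Integer::sum);
        total++;
    }

    /** Plays the role of calling /color/clear. */
    void clear() {
        counts.clear();
        total = 0;
    }

    /** Fraction of responses with the given color, rounded half-up to two decimals (exact arithmetic). */
    BigDecimal share(String color) {
        if (total == 0) return BigDecimal.ZERO;
        int count = counts.getOrDefault(color, 0);
        return BigDecimal.valueOf(count).divide(BigDecimal.valueOf(total), 2, RoundingMode.HALF_UP);
    }

    public static void main(String[] args) {
        ColorHistogram histogram = new ColorHistogram();
        String[] observed = {"blue", "green", "blue", "green", "blue", "green", "blue", "blue", "green"};
        for (int n = 0; n < observed.length; n++) {
            histogram.record(observed[n]);
            System.out.printf("%d: blue=%s green=%s%n", n,
                    histogram.share("blue").stripTrailingZeros().toPlainString(),
                    histogram.share("green").stripTrailingZeros().toPlainString());
        }
    }
}
```

Using `BigDecimal` avoids binary floating-point surprises on ties such as 5/8 = 0.625, which rounds half-up to 0.63.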


You can now increase the rollout of the new (green) version of your service running on Fargate.

Using AWS CloudFormation to manage your stacks lets you keep your configuration under version control and simplifies the process of deploying resources. AWS CloudFormation also gives you the option to update the route in appmesh-colorapp.yaml and deploy the updated mesh configuration by running the same deployment script you used in Step 1.

For this post, use the App Mesh console to make the change. Choose Virtual routers for appmesh-mesh, and edit the colorteller-route. Update the HTTP route so colorteller-blue-vn handles 33.3% of the traffic and colorteller-green-vn now handles 66.7%.

Run your simple verification test again:

$ curl $colorapp/color/clear
$ for ((n=0;n<200;n++)); do echo "$n: $(curl -s $colorapp/color)"; done
0: {"color":"green", "stats": {"green":1}}
1: {"color":"blue", "stats": {"blue":0.5,"green":0.5}}
2: {"color":"green", "stats": {"blue":0.33,"green":0.67}}
3: {"color":"green", "stats": {"blue":0.25,"green":0.75}}
4: {"color":"green", "stats": {"blue":0.2,"green":0.8}}
5: {"color":"green", "stats": {"blue":0.17,"green":0.83}}
6: {"color":"blue", "stats": {"blue":0.29,"green":0.71}}
7: {"color":"green", "stats": {"blue":0.25,"green":0.75}}
199: {"color":"green", "stats": {"blue":0.32,"green":0.68}}

If your results look good, double-check your result in the X-Ray console.

Finally, shift 100% of your traffic over to the new colorteller version. This time, instead of using the App Mesh console, modify the route in the mesh configuration template and redeploy it:

    Type: AWS::AppMesh::Route
    DependsOn:
      - ColorTellerVirtualRouter
      - ColorTellerGreenVirtualNode
    Properties:
      MeshName: !Ref AppMeshMeshName
      VirtualRouterName: colorteller-vr
      RouteName: colorteller-route
      Spec:
        HttpRoute:
          Action:
            WeightedTargets:
              - VirtualNode: colorteller-green-vn
                Weight: 1
          Match:
            Prefix: "/"

$ ./
Waiting for changeset to be created..
Waiting for stack create/update to complete
Successfully created/updated stack - DEMO-appmesh-colorapp

Again, repeat your verification process in both the CLI and X-Ray to confirm that the new version of your service is running successfully.



In this walkthrough, I showed you how to roll out an update from version 1 (blue) of the colorteller service to version 2 (green). I demonstrated that App Mesh supports a mesh spanning ECS services that you ran as EC2 tasks and as Fargate tasks.

In my next walkthrough, I will demonstrate that App Mesh handles even uncontainerized services launched directly on EC2 instances. It provides a uniform and powerful way to control and monitor your distributed microservice applications on AWS.

If you have any questions or feedback, feel free to comment below.

from AWS Compute Blog

Simple Two-way Messaging using the Amazon SQS Temporary Queue Client


This post is contributed by Robin Salkeld, Sr. Software Development Engineer

Amazon SQS is a fully managed message queuing service that makes it easy to decouple and scale microservices, distributed systems, and serverless applications. Asynchronous workflows have always been the primary use case for SQS. Using queues ensures one component can keep running smoothly without losing data when another component is unavailable or slow.

We were surprised, then, to discover that many customers use SQS in synchronous workflows. For example, many applications use queues to communicate between frontends and backends when processing a login request from a user.

Why would anyone use SQS for this? The service stores messages for up to 14 days with high durability, but messages in a synchronous workflow often must be processed within a few minutes, or even seconds. Why not just set up an HTTPS endpoint?

The more we talked to customers, the more we understood. Here’s what we learned:

  • Creating a queue is often easier and faster than creating an HTTPS endpoint and the infrastructure necessary to ensure the endpoint’s scalability.
  • Queues are safe by default because they are locked down to the AWS account that created them. In addition, any DDoS attempt on your service is absorbed by SQS instead of loading down your own servers.
  • There is generally no need to create firewall rules for the communication between microservices if they use queues.
  • Although SQS provides durable storage (which isn’t necessary for short-lived messages), it is still a cost-effective solution for this use case. This is especially true when you consider all the messaging broker maintenance that is done for you.

However, setting up efficient two-way communication through one-way queues requires some non-trivial client-side code. In our previous two-part post series on implementing enterprise integration patterns with AWS messaging services, Point-to-point channels and Publish-subscribe channels, we discussed the Request-Response Messaging Pattern. In this pattern, each requester creates a temporary destination to receive each response message.

The simplest approach is to create a new queue for each response, but this is like building a road just so a single car can drive on it before tearing it down. Technically, this can work (and SQS can create and delete queues quickly), but we can definitely make it faster and cheaper.

To better support short-lived, lightweight messaging destinations, we are pleased to present the Amazon SQS Temporary Queue Client. This client makes it easy to create and delete many temporary messaging destinations without inflating your AWS bill.

Virtual queues

The key concept behind the client is the virtual queue. Virtual queues let you multiplex many low-traffic queues onto a single SQS queue. Creating a virtual queue only instantiates a local buffer to hold messages for consumers as they arrive; there is no API call to SQS and no costs associated with creating a virtual queue.

The Temporary Queue Client includes the AmazonSQSVirtualQueuesClient class for creating and managing virtual queues. This class implements the AmazonSQS interface and adds support for attributes related to virtual queues. You can create a virtual queue using this client by calling the CreateQueue API action and including the HostQueueURL queue attribute. This attribute specifies the existing SQS queue on which to host the virtual queue. The queue URL for a virtual queue is in the form <host queue URL>#<virtual queue name>.

When you call the SendMessage or SendMessageBatch API actions on AmazonSQSVirtualQueuesClient with a virtual queue URL, the client first extracts the virtual queue name. It then attaches this name as an additional message attribute to each message, and sends the messages to the host queue. When you call the ReceiveMessage API action on a virtual queue, the calling thread waits for messages to appear in the in-memory buffer for the virtual queue. Meanwhile, a background thread polls the host queue and dispatches messages to these buffers according to the additional message attribute.

This mechanism is similar to how the AmazonSQSBufferedAsyncClient prefetches messages, and the benefits are similar. A single call to SQS can provide messages for up to 10 virtual queues, reducing the API calls that you pay for by up to a factor of ten. Deleting a virtual queue simply removes the client-side resources used to implement it, again without making API calls to SQS.
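As a rough illustration of this multiplexing, here is a single-threaded, in-memory Java sketch. Several things in it are assumptions for illustration only:

- An `ArrayDeque` stands in for the SQS host queue.
- The attribute name `VirtualQueueName` is hypothetical; the real client uses its own internal attribute.
- `pollHostQueue` drains at most 10 messages per call, to mirror the ReceiveMessage batch limit.
- Unlike the real client, `receive` does not block while a background thread polls. This sketch polls explicitly so that it stays deterministic.

This is not the Temporary Queue Client’s code.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, single-threaded model of virtual queues multiplexed onto one host queue. */
final class VirtualQueueMux {
    /** Stand-in for an SQS message: a body plus message attributes. */
    static final class Msg {
        final String body;
        final Map<String, String> attributes;

        Msg(String body, Map<String, String> attributes) {
            this.body = body;
            this.attributes = attributes;
        }
    }

    static final String VIRTUAL_QUEUE_ATTRIBUTE = "VirtualQueueName"; // hypothetical attribute name
    static final int MAX_BATCH = 10; // one ReceiveMessage call returns at most 10 messages

    private final String hostQueueUrl;
    private final Deque<Msg> hostQueue = new ArrayDeque<>();            // stands in for the SQS host queue
    private final Map<String, Deque<String>> buffers = new HashMap<>(); // local per-virtual-queue buffers

    VirtualQueueMux(String hostQueueUrl) {
        this.hostQueueUrl = hostQueueUrl;
    }

    /** Creating a virtual queue only allocates a local buffer; nothing is sent to SQS. */
    String createVirtualQueue(String name) {
        buffers.putIfAbsent(name, new ArrayDeque<>());
        return hostQueueUrl + "#" + name;
    }

    /** Extracts the name from a URL of the form host-queue-URL#virtual-queue-name. */
    static String virtualQueueName(String virtualQueueUrl) {
        int hash = virtualQueueUrl.lastIndexOf('#');
        if (hash < 0) {
            throw new IllegalArgumentException("not a virtual queue URL: " + virtualQueueUrl);
        }
        return virtualQueueUrl.substring(hash + 1);
    }

    /** Tags the message with its virtual queue name and sends it to the shared host queue. */
    void send(String virtualQueueUrl, String body) {
        String name = virtualQueueName(virtualQueueUrl);
        hostQueue.add(new Msg(body, Collections.singletonMap(VIRTUAL_QUEUE_ATTRIBUTE, name)));
    }

    /** Models one ReceiveMessage call on the host queue: dispatches up to 10 messages to buffers. */
    int pollHostQueue() {
        int dispatched = 0;
        while (dispatched < MAX_BATCH && !hostQueue.isEmpty()) {
            Msg message = hostQueue.poll();
            Deque<String> buffer = buffers.get(message.attributes.get(VIRTUAL_QUEUE_ATTRIBUTE));
            if (buffer != null) {
                buffer.add(message.body); // messages for deleted virtual queues are dropped
            }
            dispatched++;
        }
        return dispatched;
    }

    /** Reads from the local buffer only; returns null if nothing has been dispatched to it. */
    String receive(String virtualQueueUrl) {
        Deque<String> buffer = buffers.get(virtualQueueName(virtualQueueUrl));
        return buffer == null ? null : buffer.poll();
    }

    /** Deleting a virtual queue just discards its local buffer. */
    void deleteVirtualQueue(String virtualQueueUrl) {
        buffers.remove(virtualQueueName(virtualQueueUrl));
    }

    public static void main(String[] args) {
        // Hypothetical host queue URL, for illustration only.
        VirtualQueueMux mux = new VirtualQueueMux("https://sqs.us-east-1.amazonaws.com/123456789012/host-queue");
        String a = mux.createVirtualQueue("response-a");
        String b = mux.createVirtualQueue("response-b");
        mux.send(a, "hello a");
        mux.send(b, "hello b");
        System.out.println(mux.pollHostQueue() + " messages dispatched by one host-queue poll");
        System.out.println(mux.receive(a) + ", " + mux.receive(b));
    }
}
```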

The diagram below illustrates the end-to-end process for sending messages through virtual queues:

Sending messages through virtual queues

Virtual queues are similar to virtual machines. Just as virtual machines share one physical machine while each provides the experience of a dedicated machine, virtual queues divide the resources of a single SQS queue into smaller logical queues. This is ideal for temporary queues, since they frequently receive only a handful of messages in their lifetime. Virtual queues are currently implemented entirely within the Temporary Queue Client, but additional support and optimizations might be added to SQS itself in the future.

In most cases, you don’t have to manage virtual queues yourself. The library also includes the AmazonSQSTemporaryQueuesClient class. This class automatically creates virtual queues when the CreateQueue API action is called and creates host queues on demand for all queues with the same queue attributes. To optimize existing application code that creates and deletes queues, you can use this class as a drop-in replacement implementation of the AmazonSQS interface.
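The on-demand host queues of AmazonSQSTemporaryQueuesClient can be pictured as a map from a set of queue attributes to one shared host queue. The Java sketch below is a hypothetical model of that bookkeeping only. The class name, the host-queue naming scheme, and the example.invalid URLs are assumptions, and no SQS calls are made.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical model: virtual queues whose attributes match share one host queue. */
final class HostQueueRegistry {
    private final String prefix;
    private final Map<Map<String, String>, String> hostQueueByAttributes = new HashMap<>();
    private int virtualQueueCount;

    HostQueueRegistry(String prefix) {
        this.prefix = prefix;
    }

    /** Returns a virtual queue URL, "creating" a host queue only for an unseen attribute set. */
    String createQueue(String queueName, Map<String, String> attributes) {
        // Defensive copy, so later changes by the caller can't corrupt the map key.
        Map<String, String> key = new TreeMap<>(attributes);
        String hostQueueUrl = hostQueueByAttributes.get(key);
        if (hostQueueUrl == null) {
            // Hypothetical naming scheme; a real client would call CreateQueue on SQS here.
            hostQueueUrl = "https://example.invalid/" + prefix + "-host-" + hostQueueByAttributes.size();
            hostQueueByAttributes.put(key, hostQueueUrl);
        }
        virtualQueueCount++;
        return hostQueueUrl + "#" + queueName;
    }

    int hostQueueCount() {
        return hostQueueByAttributes.size();
    }

    int virtualQueueCount() {
        return virtualQueueCount;
    }

    public static void main(String[] args) {
        Map<String, String> shortVisibility = new HashMap<>();
        shortVisibility.put("VisibilityTimeout", "30");
        HostQueueRegistry registry = new HostQueueRegistry("tmp");
        System.out.println(registry.createQueue("reply-1", shortVisibility));
        System.out.println(registry.createQueue("reply-2", shortVisibility));
        System.out.println(registry.hostQueueCount() + " host queue(s) for "
                + registry.virtualQueueCount() + " virtual queues");
    }
}
```

Keying on the attributes matters because every message in a host queue shares that queue’s settings. Queues that need different settings cannot safely share one host.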

The client also includes the AmazonSQSRequester and AmazonSQSResponder interfaces, which enable two-way communication through SQS queues. The following is an example of an RPC implementation for a web application’s login process.

// Imports for both classes below (each class would normally live in its own file).
// The Temporary Queue Client classes are assumed to be in the com.amazonaws.services.sqs package.
import com.amazonaws.services.sqs.AmazonSQSRequester;
import com.amazonaws.services.sqs.AmazonSQSRequesterClientBuilder;
import com.amazonaws.services.sqs.AmazonSQSResponder;
import com.amazonaws.services.sqs.AmazonSQSResponderClientBuilder;
import com.amazonaws.services.sqs.MessageContent;
import com.amazonaws.services.sqs.model.Message;
import com.amazonaws.services.sqs.model.SendMessageRequest;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/**
 * This class handles a user's login request on the client side.
 */
public class LoginClient {

    // The SQS queue to send the requests to.
    private final String requestQueueUrl;

    // The AmazonSQSRequester creates a temporary queue for each response.
    private final AmazonSQSRequester sqsRequester = AmazonSQSRequesterClientBuilder.defaultClient();

    public LoginClient(String requestQueueUrl) {
        this.requestQueueUrl = requestQueueUrl;
    }

    /**
     * Send a login request to the server.
     */
    public String login(String body) throws TimeoutException {
        SendMessageRequest request = new SendMessageRequest()
                .withMessageBody(body)
                .withQueueUrl(requestQueueUrl);

        // This:
        //  - creates a temporary queue
        //  - attaches its URL as an attribute on the message
        //  - sends the message
        //  - receives the response from the temporary queue
        //  - deletes the temporary queue
        //  - returns the response
        // If something goes wrong and the server's response never shows up, this method throws a
        // TimeoutException.
        Message response = sqsRequester.sendMessageAndGetResponse(request, 20, TimeUnit.SECONDS);
        return response.getBody();
    }
}

/**
 * This class processes users' login requests on the server side.
 */
public class LoginServer {

    // The SQS queue to poll for login requests.
    // Assume that on construction a thread is created to poll this queue and call
    // handleLoginRequest() below for each message.
    private final String requestQueueUrl;

    // The AmazonSQSResponder sends responses to the correct response destination.
    private final AmazonSQSResponder sqsResponder = AmazonSQSResponderClientBuilder.defaultClient();

    public LoginServer(String requestQueueUrl) {
        this.requestQueueUrl = requestQueueUrl;
    }

    /**
     * Handle a login request sent from the client above.
     */
    public void handleLoginRequest(Message message) {
        // Assume doLogin (not shown) does the actual work, and returns a serialized result.
        String response = doLogin(message.getBody());

        // This extracts the URL of the temporary queue from the message attribute and sends the
        // response to that queue.
        sqsResponder.sendResponseMessage(MessageContent.fromMessage(message), new MessageContent(response));
    }
}

Cleaning up

As with any other AWS SDK client, your code should call the shutdown() method when the temporary queue client is no longer needed. The AmazonSQSRequester interface also provides a shutdown() method, which shuts down its internal temporary queue client. This ensures that the in-memory resources needed for any virtual queues are reclaimed, and that any host queue the client created automatically is also deleted.

However, in the world of distributed systems things are a little more complex. Processes can run out of memory and crash, and hosts can reboot suddenly and unexpectedly. There are even cases where you don’t have the opportunity to run custom code on shutdown.

The Temporary Queue Client addresses this issue as well. For each host queue with recent API calls, the client periodically uses the TagQueue API action to attach a fresh tag value that indicates the queue is still being used. This tagging serves as a heartbeat that keeps the queue alive. At a configurable interval (by default, every 5 minutes), a background thread uses the ListQueues API action to obtain the URLs of all queues with the configured prefix. It then deletes each queue that has not been tagged recently. The mechanism is similar to how the Amazon DynamoDB Lock Client expires stale lock leases.

If you use the AmazonSQSTemporaryQueuesClient directly, you can customize how long queues must be idle before they are deleted by configuring the IdleQueueRetentionPeriodSeconds queue attribute. The client supports setting this attribute on both host queues and virtual queues. For virtual queues, setting this attribute ensures that their in-memory resources do not become a memory leak in the presence of application bugs.

Any API call to a queue marks it as non-idle, including ReceiveMessage calls that don’t return any messages. The only reason to increase the retention period attribute is to give the client more time when it can’t send heartbeats—for example, due to garbage collection pauses or networking issues.
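The idle check described above comes down to comparing each queue’s last heartbeat with a retention period. The Java sketch below models that decision with `java.time`. The map of last-activity instants stands in for the tag values the client writes with TagQueue. The class name is hypothetical, and the 300-second retention in the example is an illustrative value, not a documented default for IdleQueueRetentionPeriodSeconds.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical model of the heartbeat-and-sweep idle check (no SQS calls are made). */
final class IdleQueueSweeper {
    private final Duration retention; // plays the role of IdleQueueRetentionPeriodSeconds
    private final Map<String, Instant> lastHeartbeat = new TreeMap<>(); // stands in for queue tags

    IdleQueueSweeper(Duration retention) {
        this.retention = retention;
    }

    /** Any API call on a queue counts as activity, even a ReceiveMessage that returns nothing. */
    void recordActivity(String queueUrl, Instant now) {
        lastHeartbeat.put(queueUrl, now);
    }

    /** True if the queue has not been touched within the retention period. */
    boolean isIdle(String queueUrl, Instant now) {
        Instant last = lastHeartbeat.get(queueUrl);
        return last == null || Duration.between(last, now).compareTo(retention) > 0;
    }

    /** Returns (and forgets) every queue that is idle at now; a real sweeper would delete them. */
    List<String> sweep(Instant now) {
        List<String> idle = new ArrayList<>();
        for (String url : lastHeartbeat.keySet()) {
            if (isIdle(url, now)) idle.add(url);
        }
        for (String url : idle) lastHeartbeat.remove(url);
        return idle;
    }

    public static void main(String[] args) {
        IdleQueueSweeper sweeper = new IdleQueueSweeper(Duration.ofSeconds(300)); // illustrative value
        Instant start = Instant.parse("2019-07-01T00:00:00Z");
        sweeper.recordActivity("queue-a", start);
        sweeper.recordActivity("queue-b", start);
        sweeper.recordActivity("queue-a", start.plusSeconds(250)); // a heartbeat keeps queue-a alive
        System.out.println("Deleted: " + sweeper.sweep(start.plusSeconds(400)));
    }
}
```

Raising the retention period widens the window a stalled client has before its queues are swept. That is the trade-off described above for garbage-collection pauses and network issues.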

But what if you want to use this client in a fleet of a thousand EC2 instances? Won’t every client spend a lot of time checking every queue for idleness? Doesn’t that imply duplicate work that increases as the fleet is scaled up?

We thought of this too. The Temporary Queue Client creates a shared queue for all clients using the same queue prefix, and uses this queue as a work queue for the distributed task. Instead of every client calling the ListQueues API action every five minutes, a new seed message (which triggers the sweeping process) is sent to this queue every five minutes.

When one of the clients receives this message, it calls the ListQueues API action and sends each queue URL in the result as another kind of message to the same shared work queue. The work of actually checking each queue for idleness is distributed roughly evenly across the active clients, ensuring scalability. There is even a mechanism that works around the fact that the ListQueues API action currently returns at most 1,000 queue URLs at a time.
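Here is a single-threaded Java sketch of that fan-out. A seed message expands into one check message per queue URL on a shared work queue, and clients take turns receiving from it, so the checks spread across clients. The seed marker, the class name, and the round-robin receive order are simplifying assumptions. In SQS, which client receives a given message is not deterministic, and this sketch does not model the 1,000-URL workaround.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical simulation of distributing idle-queue checks through one shared work queue. */
final class SharedSweepSimulation {
    static final String SEED = "SEED"; // hypothetical marker for the seed message

    /**
     * Runs one sweep. Clients receive from the shared queue in round-robin order; the client
     * that gets the seed "lists" the queues and enqueues one check message per URL.
     * Returns how many queue checks each client performed.
     */
    static Map<String, Integer> runSweep(List<String> clients, List<String> allQueueUrls) {
        if (clients.isEmpty()) throw new IllegalArgumentException("need at least one client");
        Deque<String> workQueue = new ArrayDeque<>();
        workQueue.add(SEED);
        Map<String, Integer> checksPerClient = new LinkedHashMap<>();
        for (String client : clients) checksPerClient.put(client, 0);

        int turn = 0;
        while (!workQueue.isEmpty()) {
            String client = clients.get(turn++ % clients.size());
            String message = workQueue.poll();
            if (SEED.equals(message)) {
                workQueue.addAll(allQueueUrls); // stands in for ListQueues plus one SendMessage per URL
            } else {
                checksPerClient.merge(client, 1, Integer::sum); // stands in for checking one queue
            }
        }
        return checksPerClient;
    }

    public static void main(String[] args) {
        List<String> queues = new ArrayList<>();
        for (int i = 0; i < 10; i++) queues.add("tmp-queue-" + i);
        System.out.println(runSweep(Arrays.asList("client-a", "client-b", "client-c"), queues));
    }
}
```

Only one client pays for the ListQueues call per cycle. The per-queue checks, which grow with the number of queues, are shared by everyone.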


We are excited about how the new Amazon SQS Temporary Queue Client makes more messaging patterns easier and cheaper for you. Download the code from GitHub, have a look at Temporary Queues in the Amazon SQS Developer Guide, try out the client, and let us know what you think!

from AWS Compute Blog

Dance like Nobody’s Watching; Encrypt like Everyone Is


While AWS is making computing easier, it can be challenging to know how to effectively use encryption. In this screencast, we provide an overview of the encryption landscape on AWS. This includes services like AWS Certificate Manager, AWS Key Management Service, and the Encryption SDK, which provide encryption in transit and at rest. In addition, we share how to enable encryption for services such as Amazon DynamoDB, Amazon EBS, Amazon RDS, and Amazon S3. Finally, we show you how to automate encryption provisioning for some of these services using AWS CloudFormation.

Below, I’ve included a screencast of the talk I gave last week at the AWS NYC Summit in July 2019 along with a transcript (generated by Amazon Transcribe).

This is Paul Duvall – founder and CTO of Mphasis Stelligent. I gave this talk at the AWS New York City Summit in 2019 so just sharing this as a screencast as well. At Stelligent, we’ve been helping companies apply DevOps practices on AWS for over a decade now. So what I’ll be sharing here is based on that perspective. When you think about encryption, encryption is about protecting your data from unauthorized access so you’re gonna learn how to apply encryption in a practical way and doing that through automation. But, before we get into this, I want to share a perspective on what we often see when it comes to security and compliance, in general, at most enterprises. 

What we often see is that security is something that’s the responsibility of a separate team or might be multiple teams that are further downstream in the software development life cycle. So, if you imagine you have a development team and they’re writing code, writing tests, maybe they’re performing continuous integration. They might be doing this over a period of a few weeks or so and then they’re ready to release, so let’s just say it goes to QA, then it might go to a change advisory board, internal audit, and in some cases, it might go to a separate security team that gets involved, and the problem is that there’s often a significant amount of time between, say, when a developer commits the code and when there might be any kind of security and compliance checks that are applied – that are comprehensive in nature. And, so even if there are some security control directives that are well documented, it doesn’t mean that they’re always run for every release and doesn’t mean that they’re run the exact same way every single time. And then there’s also the fact that it could be weeks after the developer made that change, so the developer’s going to lack context if (the security team) brings it up, and there’s gonna be pressure to release and things like that. The reason this occurs in many organizations is the cost of processes that might require human intervention, even if it’s just one thing that they have to do, so (these compliance checks) often get batched and scheduled across all the different application service teams that these central security, operations, and audit teams have to support. Another reason, when you think about it from an AWS perspective, is that (even though) you can automate all of these things, there might simply be a lack of knowledge that you can (perform) this (automation). 
So it’s not just sort of the old style data centers, non-cloud types of companies, but even companies that are using AWS might lack the knowledge that they can actually just check a box or automate this through the SDK or through CloudFormation. 

And so, as Werner Vogels talks about, the bottom line of all this is that security is everyone’s job, and the beauty of this now is that AWS gives you the tools to bake the security and compliance into every step of the software development process. So, from the standpoint of encryption, you can automate all the encryption as a part of your software development process. You can also automate things like static analysis and runtime checks against your software in order to ensure that you’re always in compliance with your encryption directives. And so, as Werner also says, you can “dance like nobody’s watching, but we encrypt like everyone is”. AWS announced – and this is as of July 2019 – that 117 AWS services now integrate with KMS. As of now, there are about 175-180 total services on the AWS platform, so more than half of these services provide (this integration with KMS). These might be storage services like S3 and EBS, or database services like RDS and DynamoDB. The plan is to eventually have all the services have this capability. 

In terms of what I’ll be covering, I’ll be talking about how you automate all of this: how you incorporate this into the software development lifecycle, how to use things like AWS CloudFormation, and how you use the SDK or get access to the API ultimately in order to make this a part of that software development process. When developers need to apply things like client-side encryption or manage secrets, we’ll go over that a little bit. Once you need to send (data) over the wire, you need to encrypt in transit, so we’ll be talking about things like AWS Certificate Manager and CloudFront along with ELB. And then how do you encrypt that data at rest, either through database services like RDS and DynamoDB, or through EBS, S3, and so forth. And then the underlying service that allows you to encrypt all these (resources) is the Key Management Service. So we’ll go over that and also how to give fine-grained permission to keys and fine-grained permission to the service itself. Then when it gets into production and you want to detect whether or not encryption is enabled across all your AWS accounts, we’ll cover AWS Config Rules and CloudWatch Event Rules as well. AWS just recently announced that they provide encryption in transit within a VPC for certain instances, the Nitro instances, so I’ll briefly cover that. Then finally we’ll talk about logging. You can log all of your API calls, but then – from an encryption standpoint – how do you know when those keys are used, and what mitigations might you have to go through as a result of that monitoring and logging? 

So there’s a heuristic that we use when we look at really anything that we’re building and deploying, testing, and releasing into production. And, there are three steps to this, and the first is to codify: codify all the things. So whether it’s in an AWS service or it’s application code, configuration, infrastructure, or the data: we can codify all that. We can use things like AWS CloudFormation to automate the provisioning and configuration of these services: whether it’s database or storage, or it’s the pipeline itself, containers and things like that. How do we codify all of that? And the next thing we consider as a part of this heuristic is how do you create a pipeline out of this? And not just how do you code it and version it, but how do you put it through a series of stages and actions in which you’re building, testing, deploying, and getting it out to end users, and the users might not just be end users of the services and applications that your customers consume, but it also might be internal, like some of the security services within AWS or even AWS accounts. You might put (these services) through a deployment pipeline as well. And, the last part of this is then how do you secure it? How do you ensure that security is run as a part of your pipelines? How do you ensure that you have security of the pipeline through hardened build images, that everything goes through the proper checks, and how do you give fine-grained permission to all the resources in your AWS accounts? So these are the three steps that we consider and, from an encryption standpoint, we’re gonna look at how do you codify that encryption, how do you put the encryption through a pipeline, and then how do you secure that? 

So let's look at a brief example of this through an automation lens. AWS CloudFormation is the service that lets you automate provisioning. Imagine you have a bootstrap template (which you might launch from the command line) that you create a stack from. It sets up the core services you want: KMS (the Key Management Service), AWS Config Rules, Identity and Access Management (IAM), and finally the pipeline itself. Within the pipeline, you define the stages, actions, and services you want to automate. In this example, imagine we have a directive that encryption must be enabled for all of our AWS services, and we'll look at that from a couple of perspectives. First, every time you build the software system, you want to make sure the build doesn't introduce security vulnerabilities. In the context of encryption, we want anything we build that needs encryption to have it turned on, so we can use a static analysis tool like cfn_nag, an open source tool with 45 built-in rules, some of which look for encryption on certain AWS resources. If a resource doesn't have encryption, we can fail the build and send a notification, and before we even launch the infrastructure, someone can remediate the template and commit the fix back to the version control repository. That covers the static perspective, as part of the pipeline. Second, we can set up detective controls with automated detection mechanisms through AWS Config Rules. AWS Config notices state changes to many different resource types in your AWS account, so we could set up a Config Rule that watches for changes to your DynamoDB tables or RDS databases, or ensures that your EBS volumes are encrypted.
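The static check described above can be sketched in Python. This is not cfn_nag itself (cfn_nag ships its own rule set); it is a minimal, hypothetical stand-in, assuming a JSON-format template, that checks three resource types for the properties CloudFormation uses to turn on encryption at rest and returns a nonzero exit code so a pipeline stage would fail.

```python
import json
import sys


def _is_true(value) -> bool:
    # CloudFormation accepts both JSON booleans and the strings "true"/"false".
    return value is True or (isinstance(value, str) and value.lower() == "true")


# Hypothetical rule table: resource type -> test on the resource's Properties.
ENCRYPTION_CHECKS = {
    "AWS::DynamoDB::Table": lambda p: _is_true((p.get("SSESpecification") or {}).get("SSEEnabled")),
    "AWS::EC2::Volume": lambda p: _is_true(p.get("Encrypted")),
    "AWS::RDS::DBInstance": lambda p: _is_true(p.get("StorageEncrypted")),
}


def find_unencrypted(template: dict) -> list:
    """Return the logical IDs of resources that fail an encryption check."""
    failures = []
    for logical_id, resource in template.get("Resources", {}).items():
        check = ENCRYPTION_CHECKS.get(resource.get("Type"))
        if check and not check(resource.get("Properties") or {}):
            failures.append(logical_id)
    return failures


def main(path: str) -> int:
    with open(path) as f:
        failures = find_unencrypted(json.load(f))
    for logical_id in failures:
        print(f"FAIL: {logical_id} does not enable encryption at rest")
    return 1 if failures else 0  # nonzero exit fails the build stage


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1]))
```

Run as `python check_encryption.py template.json` in a build step (the script name is a placeholder). Values supplied through `Ref` or conditions are treated as failures here, which errs on the side of blocking the build.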
We can also automate the provisioning of these Config Rules and deploy them, and under the hood, custom rules run their checks in AWS Lambda, which you can write in a number of languages: Python, Ruby, and so forth. A rule can run on a schedule or be triggered by an event such as a state change, and we can then perform remediation actions based on the result. That might mean sending a Slack message to the developers saying "you should automate this, and here's how," disabling the resource, or automatically turning encryption on, if that's the rule we're checking. That's an overall view of how you might do this.
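A custom Config Rule backed by Lambda might look roughly like the following sketch for EBS volumes. It assumes the standard change-triggered event (an `invokingEvent` JSON string carrying a `configurationItem`, plus a `resultToken`) and assumes the volume's `configuration` carries an `encrypted` flag as in the EC2 DescribeVolumes output; oversized-item notifications are not handled. The Config client is injectable so the logic can be exercised without AWS.

```python
import json


def evaluate_volume(configuration_item: dict) -> str:
    """Map an EBS volume configuration item to a Config compliance type."""
    if configuration_item.get("resourceType") != "AWS::EC2::Volume":
        return "NOT_APPLICABLE"
    if configuration_item.get("configurationItemStatus") == "ResourceDeleted":
        return "NOT_APPLICABLE"
    encrypted = (configuration_item.get("configuration") or {}).get("encrypted")
    return "COMPLIANT" if encrypted else "NON_COMPLIANT"


def lambda_handler(event, context, config_client=None):
    """Entry point for a change-triggered custom Config rule."""
    item = json.loads(event["invokingEvent"])["configurationItem"]
    compliance = evaluate_volume(item)
    if config_client is None:
        import boto3  # bundled with the Lambda Python runtime
        config_client = boto3.client("config")
    # Report the result back to AWS Config for this resource.
    config_client.put_evaluations(
        Evaluations=[{
            "ComplianceResourceType": item["resourceType"],
            "ComplianceResourceId": item["resourceId"],
            "ComplianceType": compliance,
            "OrderingTimestamp": item["configurationItemCaptureTime"],
        }],
        ResultToken=event["resultToken"],
    )
    return compliance
```

Remediation (a Slack notification, disabling the resource, and so on) would hang off the `NON_COMPLIANT` result; that part is left out of the sketch.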

So let's get into the first part of this: automation. AWS provides a number of ways to automate. AWS CloudFormation provides a common language you can write in JSON or YAML, and higher-level tools such as the AWS CDK (Cloud Development Kit) provide an abstraction layer that generates CloudFormation templates for you, so you can define your infrastructure in code. Because it lives in a template, you can version it and test the template itself, just as you would any other software asset. You can also use the AWS SDKs, which exist for all the common languages (JavaScript, Java, .NET, Python, Ruby, and so forth) and give you access to all the APIs. CloudFormation supports most AWS resources, and from an encryption standpoint, you'll see in some of these examples that there might be a property for a KMS key, or a server-side encryption (SSE) property that's just a boolean you set to true, and then you're on your way. It's great to have that checkbox in the console, but if you want a repeatable, testable process, AWS CloudFormation is one of the best ways of doing that.


This next example has nothing to do with encryption specifically; it's a high-level view of CloudFormation in case you haven't seen it before. You have parameters, resources, and outputs, which are some of the most common constructs in a CloudFormation template. You'll generally see multiple parameters that let you customize the behavior of the stack you create from the template, followed by multiple AWS resources. A typical CloudFormation template has hundreds of lines of configuration code. It's declarative, so CloudFormation determines how to apply the changes, and you can set dependencies between resources. Real templates get much more complex than this simple example, but the idea is that you can define all of your AWS resources in code using the CloudFormation language.
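The parameters/resources/outputs structure can be shown with a template built as a Python dictionary and printed as JSON. It's a minimal sketch: the parameter, bucket, and output names are hypothetical placeholders.

```python
import json

skeleton_template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    # Parameters customize the stack at creation time.
    "Parameters": {
        "EnvironmentName": {"Type": "String", "Default": "dev", "AllowedValues": ["dev", "prod"]}
    },
    # Resources are the AWS resources the stack creates.
    "Resources": {
        "ArtifactBucket": {
            "Type": "AWS::S3::Bucket",
            "Properties": {
                "Tags": [{"Key": "Environment", "Value": {"Ref": "EnvironmentName"}}]
            },
        }
    },
    # Outputs expose values (here, the bucket name) after the stack is created.
    "Outputs": {
        "ArtifactBucketName": {
            "Description": "Name of the bucket created by this stack",
            "Value": {"Ref": "ArtifactBucket"},
        }
    },
}

print(json.dumps(skeleton_template, indent=2))
```

Saved to a file, the printed JSON could be deployed with `aws cloudformation deploy --template-file template.json --stack-name demo` (file and stack names are placeholders).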


From a development standpoint, especially for client-side encryption, AWS provides the AWS Encryption SDK. It supports a number of programming languages, such as Java, C, and Python, and provides a CLI, so you can use it with essentially any language. Another service you'll come across as a developer addresses the need to store secrets somewhere, whether that's the username and password for a database, API keys, and so forth. AWS Secrets Manager is a fully managed service that lets you create secrets, generate random secrets, and provide cross-account access, and it integrates with a number of other services. For example, it integrates with RDS and Redshift, so you can generate a username and password randomly, never even see those secrets, and rotate them automatically using Secrets Manager. We'll touch on this again later when we get to key management.
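Generating a random secret and storing it without ever printing it might look like this boto3-based sketch. The secret name and the JSON layout of the secret string are hypothetical choices; the client is passed in so the function can be exercised with a stand-in.

```python
import json


def create_generated_secret(client, name: str, username: str, length: int = 32) -> str:
    """Store a database credential whose password Secrets Manager generates.

    `client` is expected to be boto3.client("secretsmanager").
    Returns the new secret's ARN; the password itself is never returned.
    """
    password = client.get_random_password(
        PasswordLength=length, ExcludePunctuation=True
    )["RandomPassword"]
    response = client.create_secret(
        Name=name,
        SecretString=json.dumps({"username": username, "password": password}),
    )
    return response["ARN"]
```

Applications would later read the value with `get_secret_value(SecretId=...)` instead of keeping the password in code or configuration.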

This is an example of using the AWS Encryption SDK in Python: we take some plaintext source data, encrypt it into ciphertext, and then decrypt it again. One thing to note: when people think about encryption, they often think back two decades, when encryption meant a 20 to 40% hit on performance. With KMS you don't take that kind of hit, because KMS handles the key management and the integrated services do the encryption work, so your compute, your database, or whatever else you're using doesn't bear that cost. Keep that in mind, because people often believe encryption brings a significant performance degradation, and that's not the case.
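The talk's slide code isn't reproduced here, but a plaintext-to-ciphertext-and-back round trip with the `aws-encryption-sdk` Python package (2.x and later API) might be sketched as follows. The KMS key ARN is a placeholder you'd supply; the client and key provider are injected so the round-trip logic can be tested without AWS credentials.

```python
def encrypt_decrypt_round_trip(client, key_provider, plaintext: bytes):
    """Encrypt, then decrypt; return (ciphertext, recovered_plaintext)."""
    ciphertext, _encrypt_header = client.encrypt(source=plaintext, key_provider=key_provider)
    recovered, _decrypt_header = client.decrypt(source=ciphertext, key_provider=key_provider)
    if recovered != plaintext:
        raise ValueError("round trip did not return the original plaintext")
    return ciphertext, recovered


def make_sdk_objects(key_arn: str):
    """Build a real client and KMS key provider (requires aws-encryption-sdk)."""
    import aws_encryption_sdk
    from aws_encryption_sdk import CommitmentPolicy

    client = aws_encryption_sdk.EncryptionSDKClient(
        commitment_policy=CommitmentPolicy.REQUIRE_ENCRYPT_REQUIRE_DECRYPT
    )
    # The provider asks KMS for data keys under this customer master key.
    key_provider = aws_encryption_sdk.StrictAwsKmsMasterKeyProvider(key_ids=[key_arn])
    return client, key_provider
```

Typical use would be `client, provider = make_sdk_objects(key_arn)` followed by `encrypt_decrypt_round_trip(client, provider, b"source data")`.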

So we've talked about encrypting on the client side. When we send data over the wire (in transit), we want to secure the connection between the end user and the endpoint they're hitting, and AWS provides a number of services for that. One is AWS Certificate Manager (ACM), which acts as a certificate authority, lets you generate public certificates, and renews those certificates for you, so you don't get into the bind of an expired certificate taking your website down. If you've ever been through that, you know the trouble that ensues. So ACM manages a lot for you. Public certificates are free, and you can apply up to 25 certificates to an Application Load Balancer. Another service that supports encryption in transit, and that's really useful, is CloudFront, a CDN that provides performance caching and so on. It also gives you AWS Shield for volumetric DDoS protection, built in. In fact, the Security Pillar of the Well-Architected Framework points to examples in GitHub repositories and quick starts, including CloudFormation code that uses CloudFront with Certificate Manager, and you can get all of that set up in less than an hour, so definitely have a look.

This is a simple ACM example in CloudFormation, in which we're generating a certificate for our top-level domain. As you can see, it's pretty simple to set up. We're assuming you created the domain using Route 53 and set up your DNS; you then attach the certificate to your load balancer or CloudFront distribution, and you're off and running with this CloudFormation code.
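A comparable certificate resource can be sketched as a template dictionary. `example.com` is a hypothetical placeholder for your own domain, and DNS validation is one assumed choice (email validation also exists).

```python
import json

DOMAIN = "example.com"  # hypothetical placeholder; substitute your domain

certificate_template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Resources": {
        "SiteCertificate": {
            "Type": "AWS::CertificateManager::Certificate",
            "Properties": {
                "DomainName": DOMAIN,
                "SubjectAlternativeNames": [f"*.{DOMAIN}"],
                # ACM proves domain control via a CNAME record added in DNS.
                "ValidationMethod": "DNS",
            },
        }
    },
    "Outputs": {
        # Ref returns the certificate ARN, which listeners and distributions use.
        "CertificateArn": {"Value": {"Ref": "SiteCertificate"}}
    },
}

print(json.dumps(certificate_template, indent=2))
```

One design note: a certificate used by CloudFront must be created in the us-east-1 Region, while one for an Application Load Balancer lives in the load balancer's Region.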

The endpoint could be any number of things, but in this example of encryption in transit, you're hitting a website. When we go to the site, we see the lock icon and can click it for more information. It tells us that a certificate authority has been identified and that the connection between the end user and the website is encrypted, so it's secure in transit.
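The browser's lock check has a rough programmatic equivalent in Python's standard `ssl` module: open a verified TLS connection and inspect the issuer of the peer certificate. This is a sketch; the host name you pass in is up to you, and `fetch_certificate` needs network access.

```python
import socket
import ssl


def describe_issuer(cert: dict) -> str:
    """Flatten the 'issuer' field of an SSLSocket.getpeercert() dictionary."""
    return ", ".join(f"{key}={value}" for rdn in cert.get("issuer", ()) for key, value in rdn)


def fetch_certificate(host: str, port: int = 443, timeout: float = 5.0) -> dict:
    """Connect over TLS and return the server's (already verified) certificate."""
    # The default context verifies the chain against trusted CAs and checks
    # that the certificate matches the host name, like a browser does.
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()
```

For example, `describe_issuer(fetch_certificate("aws.amazon.com"))` prints the issuing CA; an untrusted or mismatched certificate raises `ssl.SSLCertVerificationError` instead.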

So how do we encrypt at rest? Of those 117 services, a number help you encrypt your data. One is Amazon Elastic Block Store (EBS). In fact, EBS recently added the ability to encrypt volumes by default, so every volume you create is encrypted, or you can choose to encrypt volumes one by one. But basically, for all of these it's pretty much a checkbox: you have a KMS key, and you associate it with the resource in that particular service. Others include RDS and DynamoDB; we'll look at a DynamoDB example in a moment. Amazon S3, for object storage, supports server-side encryption as well. The underpinning of all this, as I mentioned, is the Key Management Service, which we'll get to shortly; it lets you create, manage, and rotate keys. What we're doing is creating these keys and attaching the key ID to resources in those services so that we can encrypt our data at rest.
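Turning on EBS encryption by default can also be scripted rather than clicked. A minimal boto3-based sketch, assuming you pass in an EC2 client; the setting is per Region, so it would be applied once per Region you use.

```python
def ensure_ebs_encryption_by_default(ec2_client) -> bool:
    """Enable EBS encryption by default in the client's Region if it's off.

    `ec2_client` is expected to be boto3.client("ec2", region_name=...).
    Returns True if the setting was changed, False if it was already on.
    """
    if ec2_client.get_ebs_encryption_by_default()["EbsEncryptionByDefault"]:
        return False
    ec2_client.enable_ebs_encryption_by_default()
    return True
```

Because the function only changes state when needed, it's safe to run repeatedly from a pipeline.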

Here's an example of encrypting data at rest in CloudFormation for DynamoDB. We have a DynamoDB table resource with a few other properties and attributes specified, but the one that matters here is the SSESpecification property: it holds a boolean (SSEEnabled), we set it to true, and now this table, defined in CloudFormation, has encryption at rest turned on. As we discussed, you could use a checkbox, but this lets you make encryption part of the software development process by codifying it.
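A table resource with that property set might look like the following sketch; the table name, key attribute, and billing mode are hypothetical choices. With only `SSEEnabled` set, DynamoDB uses the AWS managed KMS key unless `KMSMasterKeyId` is also given.

```python
import json

table_template = {
    "Resources": {
        "OrdersTable": {  # hypothetical logical ID
            "Type": "AWS::DynamoDB::Table",
            "Properties": {
                "BillingMode": "PAY_PER_REQUEST",
                "AttributeDefinitions": [{"AttributeName": "OrderId", "AttributeType": "S"}],
                "KeySchema": [{"AttributeName": "OrderId", "KeyType": "HASH"}],
                # The boolean that turns on KMS-based server-side encryption.
                "SSESpecification": {"SSEEnabled": True},
            },
        }
    }
}

print(json.dumps(table_template, indent=2))
```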

And then you see the results of that in the DynamoDB console. 

The underpinning of everything I'm talking about, from an encryption standpoint, is the Key Management Service, or KMS. With KMS you create and manage keys known as customer master keys (CMKs). There are also built-in AWS managed keys for specific services, such as S3. But you can create your own CMKs, and these let you generate data keys: you use a data key to encrypt the plaintext for any KMS-supported service, or any data those services use, and store only the encrypted copy of the data key alongside the data. You also get fine-grained access control on each key through a key policy, and you'll see an example of that. Another capability is automatic rotation: you can check a box in KMS (or, of course, automate it) to say you want a particular key rotated automatically every year. For fine-grained access to the KMS service itself, we use Identity and Access Management (IAM). The other service is AWS CloudHSM, a cloud-based hardware security module. It's single-tenant and supports asymmetric (as well as symmetric) encryption. KMS, by contrast, provides symmetric encryption, meaning you use the same key to encrypt and decrypt, whereas asymmetric encryption uses the same math but different keys for encryption and decryption. Both are validated under FIPS 140-2, the NIST standard, but KMS is multi-tenant and validated at Level 2, while CloudHSM, as a single-tenant hardware security module, gives you that extra level: it's validated at Level 3.
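The CMK-to-data-key flow can be sketched with the KMS `GenerateDataKey` and `Decrypt` APIs through boto3. The key ID (for example an alias such as `alias/my-app-key`) is a hypothetical placeholder, and the local encryption step that would use the plaintext key is left out.

```python
def new_data_key(kms_client, key_id: str) -> tuple:
    """Request a 256-bit data key under a customer master key.

    Returns (plaintext_key, encrypted_key). Encrypt data locally with the
    plaintext key, discard it, and store only the encrypted key with the data.
    """
    response = kms_client.generate_data_key(KeyId=key_id, KeySpec="AES_256")
    return response["Plaintext"], response["CiphertextBlob"]


def recover_data_key(kms_client, encrypted_key: bytes) -> bytes:
    """Ask KMS to decrypt a stored data key (symmetric CMKs need no KeyId here)."""
    return kms_client.decrypt(CiphertextBlob=encrypted_key)["Plaintext"]
```

The CMK itself never leaves KMS; only the data key travels, and only in encrypted form when stored.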

We discussed this before, but here's Secrets Manager again. It lets you create secrets for any of your services that need them, rotate those secrets, generate random secrets, and give cross-account access to them as well.

Here's an example of defining a KMS key in CloudFormation. I'm able to enable key rotation: we talked about the checkbox in the console, but here we're defining it as code. Enabling key rotation means the customer master key I'm creating here will automatically rotate once a year. I can also set the pending window for deletion: when a key is scheduled for deletion, I can set anywhere from 7 to 30 days (30 being the default) before the key is actually deleted. This gives us time, because once a key is gone, it's gone. If you created a key, attached it to a resource, and that key ultimately gets deleted, you're never going to get it, or the data it encrypted, back. So the window gives you time to make sure, and we'll talk later about how to find usage of your KMS keys with CloudTrail. The rest of this is the key policy I mentioned, which gives fine-grained permissions on the key itself. IAM gives you fine-grained permissions on the use of KMS, and the key policy gives you fine-grained permissions on this particular key that we're creating in CloudFormation. The first statement indicates which IAM user has access, the second allows administration of the key, and the last defines the use of the key. You can see how you define the principal and the actions it's allowed to perform; there are a number of other actions you'd typically define in your CloudFormation template.
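A key resource along those lines might be sketched as follows. The three policy statements loosely follow the account/administration/use split described above, but the `KeyAdmin` and `AppRole` role names are hypothetical, and those roles would need to exist before the stack is created, because KMS rejects policies that name unknown principals.

```python
import json

ACCOUNT = "arn:aws:iam::${AWS::AccountId}"  # expanded by Fn::Sub at deploy time

kms_key_template = {
    "Resources": {
        "AppKey": {
            "Type": "AWS::KMS::Key",
            "Properties": {
                "Description": "Customer master key for application data",
                "EnableKeyRotation": True,  # rotate key material yearly
                "PendingWindowInDays": 30,  # 7-30 days; 30 is the default
                "KeyPolicy": {
                    "Version": "2012-10-17",
                    "Statement": [
                        {   # Let IAM policies in this account govern access.
                            "Sid": "EnableIAMPolicies", "Effect": "Allow",
                            "Principal": {"AWS": {"Fn::Sub": f"{ACCOUNT}:root"}},
                            "Action": "kms:*", "Resource": "*",
                        },
                        {   # Hypothetical role that administers the key.
                            "Sid": "AllowKeyAdministration", "Effect": "Allow",
                            "Principal": {"AWS": {"Fn::Sub": f"{ACCOUNT}:role/KeyAdmin"}},
                            "Action": ["kms:Describe*", "kms:Enable*", "kms:Disable*", "kms:Put*",
                                       "kms:Update*", "kms:ScheduleKeyDeletion", "kms:CancelKeyDeletion"],
                            "Resource": "*",
                        },
                        {   # Hypothetical role that uses the key for data.
                            "Sid": "AllowKeyUse", "Effect": "Allow",
                            "Principal": {"AWS": {"Fn::Sub": f"{ACCOUNT}:role/AppRole"}},
                            "Action": ["kms:Encrypt", "kms:Decrypt", "kms:GenerateDataKey*", "kms:DescribeKey"],
                            "Resource": "*",
                        },
                    ],
                },
            },
        }
    }
}

print(json.dumps(kms_key_template, indent=2))
```

The f-string only inserts the prefix; the doubled `${AWS::AccountId}` placeholder is left for CloudFormation's `Fn::Sub` to resolve.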

And then you see that in the Key Management Service console. If I went over to the key rotation tab, I would see a checkbox indicating that the key is automatically being rotated.

So I mentioned how we can statically check whether encryption will be enabled, at least from an automation standpoint, through CloudFormation and tools like cfn_nag. But from a runtime perspective, how do you detect that encryption may have been turned off? You have a directive that everything needs to be encrypted, but maybe it didn't get automated, or maybe someone turned it off after it went into production. You can put detective controls in place using AWS Config Rules, which notice state changes. In the case of encryption, imagine that someone turned off, or never enabled, encryption for a DynamoDB table. Config notices that change, and you can configure it to run through some kind of remediation workflow: it might Slack the developer and explain how to codify the fix, it could enable encryption automatically, or it could disable the resource altogether, or at least warn that the resource can't be used unless it's encrypted. There are lots of ways to do this. Custom Config Rules are defined in AWS Lambda, so you can write your own, and Config Rules also comes with 86 managed rules, six of which are encryption related. If you want to extend coverage to, say, all the AWS services that KMS supports, you can do that through custom rules. Other relevant services include CloudWatch Event Rules, which give you a near real-time stream of events so you can perform actions based on them, and Amazon Inspector, which also helps from a detection standpoint.

Here's an example of defining a Config Rule, provisioned with AWS CloudFormation and using one of the managed encryption rules. This one makes sure that any CloudTrail trail that gets created has encryption enabled. If it doesn't, Config lets us know, and we might then have a workflow, as I described before, for auto-remediation or notification, or one that points developers to a knowledge base: whatever we decide is the best way to inform developers and be responsive to them about the overall control directive.
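A managed-rule definition of that kind can be sketched as a template dictionary using the `CLOUD_TRAIL_ENCRYPTION_ENABLED` managed rule identifier. The rule name and the 24-hour evaluation frequency are assumptions, and the account and Region need an AWS Config configuration recorder already running for the rule to evaluate anything.

```python
import json

config_rule_template = {
    "Resources": {
        "CloudTrailEncryptionRule": {
            "Type": "AWS::Config::ConfigRule",
            "Properties": {
                "ConfigRuleName": "cloud-trail-encryption-enabled",
                "Description": "Checks that CloudTrail trails use SSE-KMS encryption",
                "Source": {
                    "Owner": "AWS",  # AWS managed rule, so no Lambda function needed
                    "SourceIdentifier": "CLOUD_TRAIL_ENCRYPTION_ENABLED",
                },
                # This managed rule runs periodically rather than on each change.
                "MaximumExecutionFrequency": "TwentyFour_Hours",
            },
        }
    }
}

print(json.dumps(config_rule_template, indent=2))
```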

And then we see the Config Rules dashboard that lets us know that we’re not compliant with the particular rule that we’ve set up. 

On the networking side, AWS announced encryption in transit within the VPC for its newer Nitro-based instances, and you can also encrypt traffic between your data center and AWS using AWS VPN. You can define that in code as well.

The last thing I want to talk about is logging. AWS CloudTrail logs all API calls, not just those related to security, compliance, and encryption: it logs every call you make to the AWS API. When it comes to encryption, CloudTrail helps us track the use of, say, the Key Management Service or CloudHSM, and it records when particular keys are used. This becomes useful if you want to disable a key and need to know which users are using it and how they're accessing it. You can also encrypt the CloudTrail trails themselves; you saw the detection check for that with AWS Config Rules.

And here's how we do that in CloudFormation. We define the AWS CloudTrail Trail resource and give it a KMS key ID. Imagine that elsewhere in this CloudFormation template we've automated the provisioning of the KMS key and of a KMS alias, which is what we're referring to here. The alias points back to the key, so we're able to use it to attach the key to this trail. We can also enable log file validation to ensure that nothing has been modified. This is how we encrypt the trail, comply with the encryption directive, and stay operationally in compliance with the Config Rule we've set up.
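The trail resource might be sketched like this. It's a template fragment: `TrailBucket` and `TrailKeyAlias` are hypothetical logical IDs assumed to be defined elsewhere in the same template, the bucket policy must allow CloudTrail to write to the bucket, and the key policy must allow CloudTrail to use the key.

```python
import json

trail_template = {
    "Resources": {
        "AuditTrail": {
            "Type": "AWS::CloudTrail::Trail",
            "Properties": {
                "IsLogging": True,
                "S3BucketName": {"Ref": "TrailBucket"},   # hypothetical bucket resource
                # Ref on an AWS::KMS::Alias returns the alias name, which
                # KMSKeyId accepts in place of a key ID or ARN.
                "KMSKeyId": {"Ref": "TrailKeyAlias"},
                "EnableLogFileValidation": True,          # digest files reveal tampering
                "IsMultiRegionTrail": True,
                "IncludeGlobalServiceEvents": True,
            },
        }
    }
}

print(json.dumps(trail_template, indent=2))
```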

And this is a JSON payload from CloudTrail. We can see the KMS API action that was called (here, a decrypt), the time it happened, the key it was made against, the EBS volume involved, and the resource the request came from. We also see the user who made the request, which helps us troubleshoot when we need to disable a key: it gives us time to hunt down usage of that key before we actually take that action.
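Hunting down a key's usage before disabling it can be partly automated by filtering CloudTrail records. This sketch assumes log files already downloaded from the trail's S3 bucket (JSON objects with a top-level `Records` list, optionally gzipped) and KMS events that list the key under `resources`; for EBS, the encryption context typically names the volume.

```python
import gzip
import json


def kms_key_usage(records: list, key_arn: str) -> list:
    """Summarize KMS API calls made against one key in CloudTrail records."""
    usage = []
    for record in records:
        if record.get("eventSource") != "kms.amazonaws.com":
            continue
        if key_arn not in {r.get("ARN") for r in record.get("resources") or []}:
            continue
        usage.append({
            "time": record.get("eventTime"),
            "action": record.get("eventName"),
            "caller": (record.get("userIdentity") or {}).get("arn"),
            "context": (record.get("requestParameters") or {}).get("encryptionContext"),
        })
    return usage


def load_log_file(path: str) -> list:
    """Read the Records list from a CloudTrail log file (.json or .json.gz)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as f:
        return json.load(f)["Records"]
```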

Overall, these are the takeaways. Don't write the crypto yourself: AWS provides AES-256-GCM encryption, so you definitely don't need to. If your auditors are looking for third-party attestations such as SOC reports, FIPS 140-2 validation, PCI, and so forth, you can get them through AWS Artifact. That gives you the trust that a third party has examined how the services work within the AWS data centers. We also went over how encryption becomes part of the software development lifecycle using CloudFormation (you can use other tools for that too), and how you can build in static analysis checks to ensure encryption is configured before you launch the resources in your software systems and infrastructure. You can automate all of this as part of a deployment pipeline. You get encryption in transit through CloudFront and AWS Certificate Manager, which provide transport layer encryption, and CloudFront integrates with AWS Shield for DDoS protection. Of course, KMS is the underpinning of all this: it lets you create keys, delete them, grant fine-grained access to them, and rotate them, and you're assured your keys never leave the hardware security modules KMS runs on unencrypted. You can also use Secrets Manager to store secrets, such as usernames and passwords, that need to be kept and encrypted; it performs rotation for you and lets you generate random secrets. Likewise, ACM handles certificate renewal for you.
We also run detective controls for runtime encryption checks using AWS Config Rules or CloudWatch Event Rules, so once resources are in use, whether in preproduction or production, we can run those checks to ensure we're always in compliance. We can encrypt CloudTrail logs, and we can also use CloudTrail to monitor key usage, so we know how keys are being used and what actions we might need to take before, say, deleting a key. Finally, when it comes to the internal or external audits you need to perform, if you build all of this into your end-to-end software development lifecycle, the whole process becomes easier. You're always in compliance with the directives you have in place and with the compliance regimes out there, both through the security of the cloud that AWS provides and through security in the cloud, based on how you use these services as part of your overall software development lifecycle.

Thanks very much. You can reach me on Twitter and you can reach me at the email address. And if you have any questions, feel free to reach out to us at Stelligent. 

The post Dance like Nobody’s Watching; Encrypt like Everyone Is appeared first on Stelligent.

from Blog – Stelligent

AWS re:Inforce: Novelties + Key Insights

Are you a cloud security expert or enthusiast? Were you at the first-ever security-focused AWS conference in Boston? If your answers are Yes and No respectively, I have just one more question for you: where were you?

The first-ever AWS re:Inforce was definitely a success by all means (aside from all the free t-shirts I got). It highlighted all the security components you need to properly secure your account, infrastructure, and application in AWS.

Here are my key takeaways that will highlight features to help you better secure your workload.

10 Security Pillars of AWS

Access Layer

Who has access to your account and what can they do?
  1. Federated Access
  2. Programmatic Key Rotation
  3. Enforce Multi-Factor Authentication
  4. Disable Root Account Programmatic Access
  5. Utilize IAM Groups to grant permissions
  6. Cognito – Identity management for your apps

Account Layer

Is my account exposed or compromised?
  1. Amazon GuardDuty to detect intrusion
  2. AWS Config to monitor changes to Account
  3. AWS Trusted Advisor to audit security best practices
  4. AWS Organizations to manage multiple accounts
  5. AWS Control Tower to secure and enforce security standards across accounts

Network Layer

Is my network properly secured?
  1. Network ACLs to control VPC incoming and outgoing traffic
  2. VPC to isolate cloud resources
  3. AWS Shield for DDoS protection
  4. Web Application Firewall (WAF): Filter malicious web traffic
  5. PrivateLink: Securely access services hosted on AWS
  6. Firewall Manager: Manage WAF rules across accounts

Compute Layer

Can my compute infrastructure be hacked for bitcoin mining?
  1. AWS Systems Manager for patching
  2. AMI Hardening using CIS Standards
  3. Security Groups to limit port access
  4. Amazon Inspector to identify security vulnerabilities
  5. Amazon CloudFront to limit exposure of your origin servers
  6. Application Load Balancers to limit direct traffic to your app servers

Application Layer

Can my application be compromised or brought down by hackers?
  1. AWS Shield and Shield Advanced for DDoS protection
  2. AWS X-Ray for application request tracing
  3. Amazon CloudWatch for application logs
  4. Application runtime monitoring – Contrast, etc.
  5. Amazon Inspector to identify application vulnerabilities

Pipeline Layer

Am I enforcing security standards in my build and deploy systems?
  1. Infrastructure code analysis with cfn_nag
  2. Application code analysis – Spotbugs, Fortify
  3. Dependency vulnerability Checks – OWASP
  4. Docker image scanning (if using docker) – Twistlock, Anchore CLI

Storage Layer

Always encrypt everything!
  1. KMS encryption for EBS volumes
  2. Server-Side Encryption for S3 Buckets
  3. RDS Encryption

Data Layer

Is my data safe? Am I leaking secrets?
  1. AWS Secrets Manager to rotate and manage secrets
  2. Amazon Macie to discover and classify data
  3. Regular Data backups and replication across regions
  4. Data Integrity Checks
  5. Client-side encryption

Transport Layer

Am I securely moving my data?
  1. Enforce SSL/TLS Encryption of all traffic
  2. AWS Certificate Manager to generate SSL Certificates
  3. ACM Private CA to create and deploy private certificates

Operation Layer

Are my engineers ready for security threats and breaches?
  1. Use PlayBooks and Runbooks to plan and prepare for security threats and breaches
  2. Utilize Cloud Native services when possible to leverage AWS best security practices

Other Noteworthy Mentions

Nitro Innovation

Nitro allows micro-services concepts to be applied to hardware. This enables faster development and deployment of new instance types; while creating higher throughput and stability. Some security features include:

  • Uses the Nitro controller as the root of trust
  • Hardware acceleration of encryption
  • Firmware is cryptographically validated
  • Encryption keys are secured in Nitro devices
  • No SSH, hence no human access

Nitro with Firecracker

This is most notably used to run serverless workloads (Lambda), enabling hardware infrastructure to be shared between multiple accounts. The security features of Nitro make this possible. Some features include:

  • Minimal device model reduces memory footprint and attack surface area
  • Boots to user-space code in under 125 ms; up to 150 microVMs per second per host
  • Low memory overhead with a high density of VMs on each server

AWS Control Tower

The easiest way to set up and govern a secure, compliant multi-account AWS environment. Features include:

  • Prescriptive guidance on IAM, Landing Zones
  • Workflows to provision compliant accounts
  • Set up AWS with multi-account structure
  • Pre-configured architectures


That’s all folks! I’m looking forward to AWS re:Inforce 2020 in Houston. Until then, Stay Secured My Friends!

The post AWS re:Inforce: Novelties + Key Insights appeared first on Stelligent.

from Blog – Stelligent

Deploying an Nginx-based HTTP/HTTPS load balancer with Amazon Lightsail

In this post, I discuss how to configure a load balancer to route web traffic for Amazon Lightsail using NGINX. I define load balancers and explain their value. Then, I briefly weigh the pros and cons of self-hosted load balancers against Lightsail's managed load balancer service. Finally, I cover how to set up an NGINX-based load balancer inside a Lightsail instance.

If you feel like you already understand what load balancers are, and the pros and cons of self-hosted load balancers vs. managed services, feel free to skip ahead to the deployment section.

What is a load balancer?

Although load balancers offer many different functionalities, for the sake of this discussion, I focus on one main task: A load balancer accepts your users’ traffic and routes it to the right server.

For example, if I assign the load balancer a DNS name, anyone visiting the site first encounters the load balancer. The load balancer then routes the request to one of the backend servers: web-1, web-2, or web-3. That backend server then responds to the requestor.
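The routing step can be modeled in a few lines of Python. Round robin, which hands each new request to the next server in a fixed rotation, is NGINX's default method for distributing requests across an upstream group; this sketch models only that selection logic, not NGINX itself, and the server names mirror the example above.

```python
import itertools


class RoundRobinBalancer:
    """Hand each new request to the next backend in a fixed rotation."""

    def __init__(self, backends):
        self._cycle = itertools.cycle(list(backends))

    def pick(self) -> str:
        return next(self._cycle)
```

With `RoundRobinBalancer(["web-1", "web-2", "web-3"])`, four consecutive requests go to web-1, web-2, web-3, and then web-1 again.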

A load balancer provides multiple benefits to your application or website. Here are a few key advantages:

  • Redundancy
  • Publicly available IP addresses
  • Horizontally scaled application capacity


Redundancy

Load balancers usually front at least two backend servers. Using something called a health check, they ensure that these servers are online and available to service user requests. If a backend server goes out of service, the load balancer stops routing traffic to that instance. By running multiple servers, you help to ensure that at least one is always available to respond to incoming traffic. As a result, your users aren't bogged down by errors from a downed machine.
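The effect of a health check on routing can be sketched as a round-robin pool that skips servers that failed their last check. This is an illustrative model with active checks; open source NGINX instead marks servers unavailable passively, after failed requests (its `max_fails` and `fail_timeout` server parameters). The `/health` URL in the usage note is a hypothetical endpoint.

```python
import urllib.request


def http_check(url: str, timeout: float = 2.0) -> bool:
    """Example check: healthy if the URL answers 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:  # connection errors, timeouts, and HTTP error statuses
        return False


class HealthCheckedPool:
    """Round-robin over backends, skipping any that failed the last check."""

    def __init__(self, backends, check):
        self.backends = list(backends)
        self.check = check  # callable: backend -> True if healthy
        self.healthy = set(self.backends)
        self._next = 0

    def run_health_checks(self):
        self.healthy = {b for b in self.backends if self.check(b)}

    def pick(self) -> str:
        for _ in range(len(self.backends)):
            backend = self.backends[self._next % len(self.backends)]
            self._next += 1
            if backend in self.healthy:
                return backend
        raise RuntimeError("no healthy backends available")
```

In use, the backends could be health URLs such as `http://10.0.0.11/health` with `http_check` as the check, and `run_health_checks()` would be called on a timer.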

Publicly available IP addresses

Without a load balancer, a server requires a unique IP address to accept an incoming request via the internet. There are a finite number of these IP addresses, and most cloud providers limit the number of statically assigned public IP addresses that you can have.

By using a load balancer, you can task a single public IP address with servicing multiple backend servers. Later in this post, I return to this topic as I discuss configuring a load balancer.

Horizontally scaled application capacity

As your application or website becomes more popular, its performance may degrade. Adding additional capacity can be as easy as spinning up a new instance and placing it behind your load balancer. If demand drops, you can spin down any unneeded instances to save money.

Horizontal scaling means the deployment of additional identically configured servers to handle increased load. Vertical scaling means the deployment of a more powerful server to handle increased load. If you deploy an underpowered server, expect poor performance, whether you have a single server or ten.

Self-managed load balancer vs. a managed service

Now that you have a better understanding of load balancers and the key benefits that they provide, the next question is: How can you get one into your environment?

On one hand, you could spin up a Lightsail load balancer. These load balancers are all managed by AWS and don't require any patching or maintenance on your part to stay up-to-date. You only need to name your load balancer and pick instances to service. Your load balancer is then up and running. If you're so inclined, you can also get a free SSL/TLS certificate with a few extra clicks.

Lightsail load balancers deploy easily and require essentially no maintenance after they’re operational, for $18 per month (at publication time). Lightsail load balancer design prioritizes easy installation and maintenance. As a result, they lack some advanced configuration settings found with other models.

Consequently, you might prefer to configure your own load balancer if you prioritize configuration flexibility or cost reduction. A self-hosted load balancer provides access to many advanced features, and your only hard cost is the instance price.

The downsides of self-hosting are that you are also responsible for the following:

  • Installing the load balancer software.
  • Keeping the software (and the host operating system) updated and secure.

Deploying an NGINX-based load balancer with Lightsail

Although many software-based load balancers are available, I recommend building a solution on NGINX because this wildly popular tool:

  • Is open source/free.
  • Offers great community support.
  • Has a custom Lightsail blueprint.


This tutorial assumes that you already have your backend servers deployed. These servers should all:

  • Be identically configured.
  • Point to a central backend database.

In other words, the target servers shouldn’t each have database copies installed.

To deploy a centralized database, see Lightsail’s managed database offering.

Because I’ve tailored these instructions to generic website and web app hosting, they may not work with specific applications such as WordPress.

Required prerequisites

Before installing an optional SSL certificate, you need to have the following:

  • A purchased domain name.
  • Permissions to update the DNS servers for that domain.

Optional prerequisites

Although not required, the following prerequisites may also be helpful:

  • Familiarity with accessing instances via SSH.
  • Familiarity with basic Linux commands.

Deploy an NGINX instance

To begin, deploy an NGINX instance in Lightsail, choosing the NGINX blueprint. Make sure that you are deploying it into the same Region as the servers to load balance.

Choose an instance size appropriate for your application, considering its RAM, CPU, and data transfer allowance. If you choose an undersized instance, you can scale up by creating a snapshot and launching a larger instance from it. However, an oversized instance can't be scaled down as easily, because a snapshot can't be restored to a smaller plan. You may need to rebuild the load balancer from scratch on a smaller instance.

Configure HTTP load balancing

In the following steps, edit the NGINX configuration file to load balance HTTP requests to the appropriate servers.

First, start up an SSH session with your new NGINX instance and change into the appropriate configuration directory:

cd /opt/bitnami/nginx/conf/bitnami/

Make sure that you have the IP addresses of the servers to load balance. In most cases, traffic between your load balancer and your instances shouldn't cross the internet, so use each instance's private IP address. You can find it on the instance management page in the Lightsail console.
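If you copy addresses by hand, a quick sanity check can catch a public IP pasted by mistake. The following is a minimal bash sketch (is_private_ipv4 is a hypothetical helper, not part of Lightsail or NGINX) that reports whether an IPv4 address falls in one of the RFC 1918 private ranges: 10.0.0.0/8, 172.16.0.0/12, or 192.168.0.0/16. The sample addresses are made up.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of Lightsail or NGINX:
# succeeds (exit status 0) if $1 is an RFC 1918 private IPv4 address.
is_private_ipv4() {
  local ip=$1 a b octet
  # Require four dot-separated groups of one to three digits.
  [[ $ip =~ ^([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})$ ]] || return 1
  # Reject any octet above 255 (10# avoids octal parsing of values like 08).
  for octet in "${BASH_REMATCH[@]:1}"; do
    (( 10#$octet <= 255 )) || return 1
  done
  a=$((10#${BASH_REMATCH[1]})) b=$((10#${BASH_REMATCH[2]}))
  (( a == 10 )) && return 0                        # 10.0.0.0/8
  (( a == 172 && b >= 16 && b <= 31 )) && return 0 # 172.16.0.0/12
  (( a == 192 && b == 168 )) && return 0           # 192.168.0.0/16
  return 1
}

# Example: check addresses before pasting them into bitnami.conf
# (these sample addresses are made up).
for ip in 172.26.5.10 172.26.5.11 203.0.113.7; do
  if is_private_ipv4 "$ip"; then echo "$ip: private"; else echo "$ip: NOT private"; fi
done
```

The check is purely syntactic: it tells you an address is in a private range, not that it belongs to one of your instances.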

In this example, my three servers have the following private IP addresses:


The configuration file to edit is named bitnami.conf. Open it using your preferred text editor (use sudo to edit the file):

sudo vi bitnami.conf

Clear the contents of the file and add the following code, making sure to substitute the private IP addresses of the servers to load balance:

# Define Pool of servers to load balance
upstream webservers {
    server max_fails=3 fail_timeout=30s;
    server max_fails=3 fail_timeout=30s;
    server max_fails=3 fail_timeout=30s;
}

This code uses the upstream directive to define a pool (named webservers) of three servers to which NGINX should route traffic. If you don't specify how NGINX should route each request, it defaults to round-robin routing, which sends requests to each server in turn. Other routing methods are available, including:

  • Least connected, which routes each request to the server with the fewest active connections.
  • IP hash, which hashes the client's IP address so that requests from the same client reach the same server, enabling sticky sessions (also called session persistence).

A discussion of these methods is out of scope for this post. For more information, see Using nginx as HTTP load balancer.
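To switch methods, you add a directive such as least_conn; or ip_hash; inside the upstream block. The bash sketch below (render_upstream is a hypothetical helper, and the IP addresses are made-up examples) prints an upstream block in the same shape as the one above, with an optional balancing method. Leaving the method empty keeps the round-robin default.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print an NGINX upstream block named "webservers".
# Usage: render_upstream METHOD IP...
#   METHOD is "" for the round-robin default, "least_conn", or "ip_hash".
render_upstream() {
  local method=$1 ip
  shift
  # Validate the method before printing anything.
  case $method in
    ""|least_conn|ip_hash) ;;
    *) echo "unknown method: $method" >&2; return 1 ;;
  esac
  echo "upstream webservers {"
  # Round-robin needs no directive; the others are a single line in the block.
  [[ -n $method ]] && echo "    $method;"
  for ip in "$@"; do
    # Same passive health-check parameters as the example above.
    echo "    server $ip max_fails=3 fail_timeout=30s;"
  done
  echo "}"
}

# Example with made-up private addresses:
render_upstream least_conn 172.26.5.10 172.26.5.11 172.26.5.12
```

You could paste the printed block into bitnami.conf in place of the existing upstream block; generating it from a list mainly helps avoid typos when you add or remove servers.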

Additionally, I recommend using the max_fails and fail_timeout parameters to define health checks. Based on the configuration above, NGINX marks a server as unavailable if three attempts to communicate with it fail (for example, because of a connection error or timeout) within 30 seconds. NGINX then stops sending requests to that server for 30 seconds. After that period, it tries the server again with incoming requests, and if it receives a successful response, it marks the server as live.
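To make the timing concrete, the following bash sketch models the rule described above: a server counts as unavailable once max_fails failures fall within a fail_timeout-second window, and it stays unavailable for fail_timeout seconds after the failure that tripped the limit. This is a deliberate simplification for illustration (the function name and timestamps are hypothetical), not NGINX's internal algorithm.

```shell
#!/usr/bin/env bash
# Simplified model for illustration only; this is NOT NGINX's source code.
# Usage: is_unavailable MAX_FAILS FAIL_TIMEOUT NOW FAILURE_TIME...
#   Times are in seconds; list failure times in ascending order.
# Succeeds (exit status 0) if the server would be skipped at time NOW.
is_unavailable() {
  local max_fails=$1 fail_timeout=$2 now=$3 i first last
  shift 3
  local -a times=("$@")
  # In NGINX, max_fails=0 disables failure accounting entirely.
  (( max_fails >= 1 )) || return 1
  # Slide a window of max_fails consecutive failures over the list.
  for (( i = max_fails - 1; i < ${#times[@]}; i++ )); do
    first=${times[i - max_fails + 1]}
    last=${times[i]}
    # max_fails failures within fail_timeout seconds trip the limit at $last,
    # and the server is then skipped for fail_timeout seconds.
    if (( last - first < fail_timeout && now >= last && now < last + fail_timeout )); then
      return 0
    fi
  done
  return 1
}
```

With max_fails=3 and fail_timeout=30, failures at seconds 0, 10, and 20 make the server unavailable from second 20 until second 50, so is_unavailable 3 30 25 0 10 20 succeeds while is_unavailable 3 30 55 0 10 20 fails. Failures at 0, 20, and 40 never fit inside one 30-second window, so they don't trip the limit in this model.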

Below the upstream block that you just added to bitnami.conf, add the following:

# Forward traffic on port 80 to one of the servers in the webservers group
server {
    listen 80;
    location / {
        proxy_pass http://webservers;
    }
}

This code tells NGINX to listen for requests on port 80, the default port for unencrypted web (HTTP) traffic, and to forward those requests to one of the servers in the webservers group defined by the upstream directive.

Save the file and quit back to your command prompt.

For the changes to take effect, restart the NGINX service using the Bitnami control script:

sudo /opt/bitnami/ restart nginx

At this point, you should be able to visit the IP address of your NGINX instance in your web browser. The load balancer then routes the request to one of the servers defined in your webservers group.

For reference, here’s the full bitnami.conf file.

# Define Pool of servers to load balance
upstream webservers {
    server max_fails=3 fail_timeout=30s;
    server max_fails=3 fail_timeout=30s;
    server max_fails=3 fail_timeout=30s;
}

# Forward traffic on port 80 to one of the servers in the webservers group
server {
    listen 80;
    location / {
        proxy_pass http://webservers;
    }
}

Configure HTTPS load balancing

Configuring your load balancer to use SSL requires three steps:

  1. Ensure that you have a domain record for your NGINX load balancer instance.
  2. Obtain and install a certificate.
  3. Update the NGINX configuration file.

If you have not already done so, create a DNS record for your NGINX instance with your DNS provider. Remember, the load balancer's address is the one your users use to reach your site. For instance, it might be appropriate to create a record that points at your NGINX load balancer. If you need help configuring DNS in Lightsail, see DNS in Amazon Lightsail.

Similarly, to configure your NGINX instance to use a free SSL certificate from Let's Encrypt, follow steps 1–7 in Tutorial: Using Let's Encrypt SSL certificates with your Nginx instance in Amazon Lightsail. You handle step 8 later in this post.

After you configure NGINX to use the SSL certificate and update your DNS, you must modify the configuration file to allow for HTTPS traffic.

Again, use a text editor to open the bitnami.conf file:

sudo vi bitnami.conf

Add the following code to the bottom of the file:

server {
    listen 443 ssl;
    location / {
        proxy_pass http://webservers;
    }
    ssl_certificate server.crt;
    ssl_certificate_key server.key;
    ssl_session_cache shared:SSL:1m;
    ssl_session_timeout 5m;
    ssl_ciphers HIGH:!aNULL:!MD5;
    ssl_prefer_server_ciphers on;
}

This code closely resembles the HTTP code added previously. However, in this case, the code tells NGINX to accept SSL connections on the secure port 443 (HTTPS) and forward them to one of your web servers. The remaining directives tell NGINX where to locate the SSL certificate and its private key, and set various SSL session and cipher parameters.
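Before restarting NGINX, you may want to confirm that the certificate and private key files named by ssl_certificate and ssl_certificate_key actually belong together; if they don't, NGINX refuses to load the configuration. The following minimal bash sketch, which assumes the openssl command-line tool is installed, compares the public key embedded in the certificate with the public key derived from the private key. The function name cert_matches_key is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if CERT_FILE and KEY_FILE contain matching keys.
# Usage: cert_matches_key CERT_FILE KEY_FILE
# Returns 0 on a match, 1 on a mismatch, 2 if either file can't be read.
cert_matches_key() {
  local cert_pub key_pub
  # Public key embedded in the certificate (PEM, SubjectPublicKeyInfo).
  cert_pub=$(openssl x509 -in "$1" -noout -pubkey) || return 2
  # Public key derived from the private key, in the same PEM format.
  key_pub=$(openssl pkey -in "$2" -pubout) || return 2
  [[ "$cert_pub" == "$key_pub" ]]
}
```

Pass it the paths of the certificate and key files you set up with the Let's Encrypt tutorial; a zero exit status means the pair matches.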

Here again, restart the NGINX service:

sudo /opt/bitnami/ restart nginx

Optional steps

At this point, you should be able to access your website using both HTTP and HTTPS. However, there are a couple of optional steps to consider, including:

  • Shutting off direct HTTP/HTTPS access to your web servers.
  • Automatically redirecting incoming load balancer HTTP requests to HTTPS.

It’s probably not a great idea to allow people to access your load-balanced servers directly. Fortunately, you can easily restrict access:

  1. Navigate to each instance’s management page in the Lightsail console.
  2. Choose Networking.
  3. Remove the HTTP and HTTPS (if enabled) firewall port entries.

This restriction shuts down access via the internet while still allowing communications between the load balancer and the servers over the private AWS network.
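If you prefer the command line to the console, the AWS CLI's aws lightsail close-instance-public-ports command removes a firewall entry from an instance. The bash sketch below only prints the commands for a list of backend instances, so you can review them before running anything. It assumes the AWS CLI is installed and configured for the Region your instances run in; the function name and instance names are hypothetical.

```shell
#!/usr/bin/env bash
# Print (do not run) AWS CLI commands that close the public HTTP (80) and
# HTTPS (443) firewall ports on each named backend instance.
# Usage: print_close_port_commands INSTANCE_NAME...
print_close_port_commands() {
  local name port
  for name in "$@"; do
    for port in 80 443; do
      printf 'aws lightsail close-instance-public-ports --instance-name %s --port-info fromPort=%d,toPort=%d,protocol=tcp\n' \
        "$name" "$port" "$port"
    done
  done
}

# Example with hypothetical instance names:
print_close_port_commands web-1 web-2 web-3
```

After reviewing the printed commands, you can run them yourself. Leave the load balancer instance's own HTTP and HTTPS ports open.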

In many cases, there’s no good reason to allow access to a website or web app over unencrypted HTTP. However, the load balancer configuration described to this point still accepts HTTP requests. To automatically reroute requests from HTTP to HTTPS, make one small change to the configuration file:

  1. Open bitnami.conf in your text editor.
  2. Find this code:
server {
    listen 80;
    location / {
        proxy_pass http://webservers;
    }
}
  3. Replace it with this code:
server {
    listen 80;
    return 301 https://$host$request_uri;
}

The replacement code instructs NGINX to respond to HTTP requests with a 301 ("moved permanently") status and a Location header giving the new page address. The new address is simply the requested one, accessed over HTTPS instead of HTTP.

For this change to take effect, you must restart NGINX:

sudo /opt/bitnami/ restart nginx
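Once NGINX has restarted, you can confirm the redirect from a terminal: curl -sI prints only the response headers, and you should see a 301 status with a Location header pointing at the HTTPS version of the same URL. The bash sketch below (check_https_redirect is a hypothetical helper) reads those headers on standard input, so it can be tested without a live server.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read HTTP response headers on stdin and succeed only if
# they describe a 301 redirect to EXPECTED_URL. Requires bash 4+ for ${var,,}.
# Usage: curl -sI http://HOST/PATH | check_https_redirect https://HOST/PATH
check_https_redirect() {
  local expected=$1 line status="" location=""
  while IFS= read -r line; do
    line=${line%$'\r'}                       # HTTP header lines end in CRLF
    if [[ -z $status && $line =~ ^HTTP/[0-9.]+\ ([0-9]{3}) ]]; then
      status=${BASH_REMATCH[1]}              # status code from the first line
    elif [[ ${line,,} == location:* ]]; then
      location=${line#*:}                    # drop the header name
      location=${location# }                 # and the single leading space
    fi
  done
  [[ $status == 301 && $location == "$expected" ]]
}
```

For example, with your own domain in place of the placeholder, curl -sI http://www.example.com/about | check_https_redirect https://www.example.com/about should exit with status 0.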

For reference, this is what the final bitnami.conf file looks like:

# Define the pool of servers to load balance
upstream webservers {
    server max_fails=3 fail_timeout=30s;
    server max_fails=3 fail_timeout=30s;
    server max_fails=3 fail_timeout=30s;
}

# Redirect traffic on port 80 to use HTTPS
server {
    listen 80;
    return 301 https://$host$request_uri;
}

# Forward traffic on port 443 to one of the servers in the webservers group
server {
    listen 443 ssl;
    location / {
        proxy_pass http://webservers;
    }
    ssl_certificate server.crt;
    ssl_certificate_key server.key;
    ssl_session_cache shared:SSL:1m;
    ssl_session_timeout 5m;
    ssl_ciphers HIGH:!aNULL:!MD5;
    ssl_prefer_server_ciphers on;
}


This post explained how to configure a load balancer to route web traffic for Amazon Lightsail using NGINX. I defined load balancers and their utility. I weighed the pros and cons of self-hosted load balancers against Lightsail's managed load balancer service. Finally, I walked you through how to set up an NGINX-based load balancer inside of a Lightsail instance.

Thanks for reading this post. If you have any questions, feel free to contact me on Twitter, @mikegcoleman or visit the Amazon Lightsail forums.

from AWS Compute Blog