Tag: Compute

Introducing the capacity-optimized allocation strategy for Amazon EC2 Spot Instances

Introducing the capacity-optimized allocation strategy for Amazon EC2 Spot Instances

AWS announces the new capacity-optimized allocation strategy for Amazon EC2 Auto Scaling and EC2 Fleet. This new strategy automatically makes the most efficient use of spare capacity while still taking advantage of the steep discounts offered by Spot Instances. It’s a new way for you to gain easy access to extra EC2 compute capacity in the AWS Cloud.

This post compares how the capacity-optimized allocation strategy deploys capacity compared to the current lowest-price allocation strategy.

Overview

Spot Instances are spare EC2 compute capacity in the AWS Cloud available to you at savings of up to 90% off compared to On-Demand prices. The only difference between On-Demand Instances and Spot Instances is that Spot Instances can be interrupted by EC2 with two minutes of notification when EC2 needs the capacity back.

When making requests for Spot Instances, customers can take advantage of allocation strategies within services such as EC2 Auto Scaling and EC2 Fleet. The allocation strategy determines how the Spot portion of your request is fulfilled from the possible Spot Instance pools you provide in the configuration.

The existing allocation strategy available in EC2 Auto Scaling and EC2 Fleet is called “lowest-price” (with an option to diversify across N pools). This strategy allocates capacity strictly based on the lowest-priced Spot Instance pool or pools. The “diversified” allocation strategy (available in EC2 Fleet but not in EC2 Auto Scaling) spreads your Spot Instances across all the Spot Instance pools you’ve specified as evenly as possible.

As the AWS global infrastructure has grown over time in terms of geographic Regions and Availability Zones as well as the raw number of EC2 Instance families and types, so has the amount of spare EC2 capacity. Therefore it is important that customers have access to tools to help them utilize spare EC2 capacity optimally. The new capacity-optimized strategy for both EC2 Auto Scaling and EC2 Fleet provisions Spot Instances from the most-available Spot Instance pools by analyzing capacity metrics.

Walkthrough

To illustrate how the capacity-optimized allocation strategy deploys capacity compared to the existing lowest-price allocation strategy, here are examples of Auto Scaling group configurations and use cases for each strategy.

Lowest-price (diversified over N pools) allocation strategy

The lowest-price allocation strategy deploys Spot Instances from the pools with the lowest price in each Availability Zone. This strategy has an optional modifier SpotInstancePools that provides the ability to diversify over the N lowest-priced pools in each Availability Zone.

Spot pricing changes slowly over time based on long-term trends in supply and demand, but capacity fluctuates in real time. The lowest-price strategy does not account for pool capacity depth as it deploys Spot Instances.

As a result, the lowest-price allocation strategy is a good choice for workloads with a low cost of interruption that want the lowest possible prices, such as:

  • Time-insensitive workloads
  • Extremely transient workloads
  • Workloads that are easily check-pointed and restarted

Example

The following example configuration shows how capacity could be allocated in an Auto Scaling group using the lowest-price allocation strategy diversified over two pools:

{
  "AutoScalingGroupName": "runningAmazonEC2WorkloadsAtScale",
  "MixedInstancesPolicy": {
    "LaunchTemplate": {
      "LaunchTemplateSpecification": {
        "LaunchTemplateName": "my-launch-template",
        "Version": "$Latest"
      },
      "Overrides": [
        {
          "InstanceType": "c3.large"
        },
        {
          "InstanceType": "c4.large"
        },
        {
          "InstanceType": "c5.large"
        }
      ]
    },
    "InstancesDistribution": {
      "OnDemandPercentageAboveBaseCapacity": 0,
      "SpotAllocationStrategy": "lowest-price",
      "SpotInstancePools": 2
    }
  },
  "MinSize": 10,
  "MaxSize": 100,
  "DesiredCapacity": 60,
  "HealthCheckType": "EC2",
  "VPCZoneIdentifier": "subnet-a1234567890123456,subnet-b1234567890123456,subnet-c1234567890123456"
}

In this configuration, you request 60 Spot Instances because DesiredCapacity is set to 60 and OnDemandPercentageAboveBaseCapacity is set to 0. The example follows Spot best practices and is flexible across c3.large, c4.large, and c5.large in us-east-1a, us-east-1b, and us-east-1c (mapped according to the subnets in VPCZoneIdentifier). The Spot allocation strategy is set to lowest-price over two SpotInstancePools.

First, EC2 Auto Scaling tries to make sure that it balances the requested capacity across all the Availability Zones provided in the request. To do so, it splits the target capacity request of 60 across the three zones. Then, the lowest-price allocation strategy allocates the Spot Instance launches to the lowest-priced pool per zone.

Using the example Spot prices shown in the following table, the resulting allocation is:

  • 20 Spot Instances from us-east-1a (10 c3.large, 10 c4.large)
  • 20 Spot Instances from us-east-1b (10 c3.large, 10 c4.large)
  • 20 Spot Instances from us-east-1c (10 c3.large, 10 c4.large)
Availability Zone Instance type Spot Instances allocated Spot price
us-east-1a c3.large 10 $0.0294
us-east-1a c4.large 10 $0.0308
us-east-1a c5.large 0 $0.0408
us-east-1b c3.large 10 $0.0294
us-east-1b c4.large 10 $0.0308
us-east-1b c5.large 0 $0.0387
us-east-1c c3.large 10 $0.0294
us-east-1c c4.large 10 $0.0331
us-east-1c c5.large 0 $0.0353

The cost for this Auto Scaling group is $1.83/hour. Of course, the Spot Instances are allocated according to the lowest price and are not optimized for capacity. The Auto Scaling group could experience higher interruptions if the lowest-priced Spot Instance pools are not as deep as others, since upon interruption the Auto Scaling group will attempt to re-provision instances into the lowest-priced Spot Instance pools.

Capacity-optimized allocation strategy

There is a price associated with interruptions, restarting work, and checkpointing. While the overall hourly cost of capacity-optimized allocation strategy might be slightly higher, the possibility of having fewer interruptions can lower the overall cost of your workload.

The effectiveness of the capacity-optimized allocation strategy depends on following Spot best practices by being flexible and providing as many instance types and Availability Zones (Spot Instance pools) as possible in the configuration. It is also important to understand that as capacity demands change, the allocations provided by this strategy also change over time.

Remember that Spot pricing changes slowly over time based on long-term trends in supply and demand, but capacity fluctuates in real time. The capacity-optimized strategy does account for pool capacity depth as it deploys Spot Instances, but it does not account for Spot prices.

As a result, the capacity-optimized allocation strategy is a good choice for workloads with a high cost of interruption, such as:

  • Big data and analytics
  • Image and media rendering
  • Machine learning
  • High performance computing

Example

The following example configuration shows how capacity could be allocated in an Auto Scaling group using the capacity-optimized allocation strategy:

{
  "AutoScalingGroupName": "runningAmazonEC2WorkloadsAtScale",
  "MixedInstancesPolicy": {
    "LaunchTemplate": {
      "LaunchTemplateSpecification": {
        "LaunchTemplateName": "my-launch-template",
        "Version": "$Latest"
      },
      "Overrides": [
        {
          "InstanceType": "c3.large"
        },
        {
          "InstanceType": "c4.large"
        },
        {
          "InstanceType": "c5.large"
        }
      ]
    },
    "InstancesDistribution": {
      "OnDemandPercentageAboveBaseCapacity": 0,
      "SpotAllocationStrategy": "capacity-optimized"
    }
  },
  "MinSize": 10,
  "MaxSize": 100,
  "DesiredCapacity": 60,
  "HealthCheckType": "EC2",
  "VPCZoneIdentifier": "subnet-a1234567890123456,subnet-b1234567890123456,subnet-c1234567890123456"
}

In this configuration, you request 60 Spot Instances because DesiredCapacity is set to 60 and OnDemandPercentageAboveBaseCapacity is set to 0. The example follows Spot best practices (especially critical when using the capacity-optimized allocation strategy) and is flexible across c3.large, c4.large, and c5.large in us-east-1a, us-east-1b, and us-east-1c (mapped according to the subnets in VPCZoneIdentifier). The Spot allocation strategy is set to capacity-optimized.

First, EC2 Auto Scaling tries to make sure that the requested capacity is evenly balanced across all the Availability Zones provided in the request. To do so, it splits the target capacity request of 60 across the three zones. Then, the capacity-optimized allocation strategy optimizes the Spot Instance launches by analyzing capacity metrics per instance type per zone. This is because this strategy effectively optimizes by capacity instead of by the lowest price (hence its name).

Using the example Spot prices shown in the following table, the resulting allocation is:

  • 20 Spot Instances from us-east-1a (20 c4.large)
  • 20 Spot Instances from us-east-1b (20 c3.large)
  • 20 Spot Instances from us-east-1c (20 c5.large)
Availability Zone Instance type Spot Instances allocated Spot price
us-east-1a c3.large 0 $0.0294
us-east-1a c4.large 20 $0.0308
us-east-1a c5.large 0 $0.0408
us-east-1b c3.large 20 $0.0294
us-east-1b c4.large 0 $0.0308
us-east-1b c5.large 0 $0.0387
us-east-1c c3.large 0 $0.0294
us-east-1c c4.large 0 $0.0308
us-east-1c c5.large 20 $0.0353

The cost for this Auto Scaling group is $1.91/hour, only 5% more than the lowest-priced example above. However, notice the distribution of the Spot Instances is different. This is because the capacity-optimized allocation strategy determined this was the most efficient distribution from an available capacity perspective.

Conclusion

Consider using the new capacity-optimized allocation strategy to make the most efficient use of spare capacity. Automatically deploy into the most available Spot Instance pools—while still taking advantage of the steep discounts provided by Spot Instances.

This allocation strategy may be especially useful for workloads with a high cost of interruption, including:

  • Big data and analytics
  • Image and media rendering
  • Machine learning
  • High performance computing

No matter which allocation strategy you choose, you still enjoy the steep discounts provided by Spot Instances. These discounts are possible thanks to the stable Spot pricing made available with the new Spot pricing model.

Chad Schmutzer is a Principal Developer Advocate for the EC2 Spot team. Follow him on twitter to get the latest updates on saving at scale with Spot Instances, to provide feedback, or just say HI.

from AWS Compute Blog

ICYMI: Serverless Q2 2019

ICYMI: Serverless Q2 2019

This post is courtesy of Moheeb Zara, Senior Developer Advocate – AWS Serverless

Welcome to the sixth edition of the AWS Serverless ICYMI (in case you missed it) quarterly recap. Every quarter, we share all of the most recent product launches, feature enhancements, blog posts, webinars, Twitch live streams, and other interesting things that you might have missed!

In case you missed our last ICYMI, checkout what happened last quarter here.

April - June 2019

Amazon EventBridge

Before we dive in to all that happened in Q2, we’re excited about this quarter’s launch of Amazon EventBridge, the serverless event bus that connects application data from your own apps, SaaS, and AWS-as-a-service. This allows you to create powerful event-driven serverless applications using a variety of event sources.

Our very own AWS Solutions Architect, Mike Deck, sat down with AWS Serverless Hero Jeremy Daly and recorded a podcast on Amazon EventBridge. It’s a worthy listen if you’re interested in exploring all the features offered by this launch.

Now, back to Q2, here’s what’s new.

AWS Lambda

Lambda Monitoring

Amazon CloudWatch Logs Insights now allows you to see statistics from recent invocations of your Lambda functions in the Lambda monitoring tab.

Additionally, as of June, you can monitor the [email protected] functions associated with your Amazon CloudFront distributions directly from your Amazon CloudFront console. This includes a revamped monitoring dashboard for CloudFront distributions and [email protected] functions.

AWS Step Functions

Step Functions

AWS Step Functions now supports workflow execution events, which help in the building and monitoring of even-driven serverless workflows. Automatic Execution event notifications can be delivered upon start/completion of CloudWatch Events/Amazon EventBridge. This allows services such as AWS Lambda, Amazon SNS, Amazon Kinesis, or AWS Step Functions to respond to these events.

Additionally you can use callback patterns to automate workflows for applications with human activities and custom integrations with third-party services. You create callback patterns in minutes with less code to write and maintain, run without servers and infrastructure to manage, and scale reliably.

Amazon API Gateway

API Gateway Tag Based Control

Amazon API Gateway now offers tag-based access control for WebSocket APIs using AWS Identity and Access Management (IAM) policies, allowing you to categorize API Gateway resources for WebSocket APIs by purpose, owner, or other criteria.  With the addition of tag-based access control to WebSocket resources, you can now give permissions to WebSocket resources at various levels by creating policies based on tags. For example, you can grant full access to admins to while limiting access to developers.

You can now enforce a minimum Transport Layer Security (TLS) version and cipher suites through a security policy for connecting to your Amazon API Gateway custom domain.

In addition, Amazon API Gateway now allows you to define VPC Endpoint policies, enabling you to specify which Private APIs a VPC Endpoint can connect to. This enables granular security control using VPC Endpoint policies.

AWS Amplify

Amplify CLI (part of the open source Amplify Framework) now includes support for adding and configuring AWS Lambda triggers for events when using Amazon Cognito, Amazon Simple Storage Service, and Amazon DynamoDB as event sources. This means you can setup custom authentication flows for mobile and web applications via the Amplify CLI and Amazon Cognito User Pool as an authentication provider.

Amplify Console

Amplify Console,  a Git-based workflow for continuous deployment and hosting for fullstack serverless web apps, launched several updates to the build service including SAM CLI and custom container support.

Amazon Kinesis

Amazon Kinesis Data Firehose can now utilize AWS PrivateLink to securely ingest data. AWS PrivateLink provides private connectivity between VPCs, AWS services, and on-premises applications, securely over the Amazon network. When AWS PrivateLink is used with Amazon Kinesis Data Firehose, all traffic to a Kinesis Data Firehose from a VPC flows over a private connection.

You can now assign AWS resource tags to applications in Amazon Kinesis Data Analytics. These key/value tags can be used to organize and identify resources, create cost allocation reports, and control access to resources within Amazon Kinesis Data Analytics.

Amazon Kinesis Data Firehose is now available in the AWS GovCloud (US-East), Europe (Stockholm), Asia Pacific (Seoul), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Tokyo), and EU (London) regions.

For a complete list of where Amazon Kinesis Data Analytics is available, please see the AWS Region Table.

AWS Cloud9

Cloud9 Quick Starts

Amazon Web Services (AWS) Cloud9 integrated development environment (IDE) now has a Quick Start which deploys in the AWS cloud in about 30 minutes. This enables organizations to provide developers a powerful cloud-based IDE that can edit, run, and debug code in the browser and allow easy sharing and collaboration.

AWS Cloud9 is also now available in the EU (Frankfurt) and Asia Pacific (Tokyo) regions. For a current list of supported regions, see AWS Regions and Endpoints in the AWS documentation.

Amazon DynamoDB

You can now tag Amazon DynamoDB tables when you create them. Tags are labels you can attach to AWS resources to make them easier to manage, search, and filter.  Tagging support has also been extended to the AWS GovCloud (US) Regions.

DynamoDBMapper now supports Amazon DynamoDB transactional API calls. This support is included within the AWS SDK for Java. These transactional APIs provide developers atomic, consistent, isolated, and durable (ACID) operations to help ensure data correctness.

Amazon DynamoDB now applies adaptive capacity in real time in response to changing application traffic patterns, which helps you maintain uninterrupted performance indefinitely, even for imbalanced workloads.

AWS Training and Certification has launched Amazon DynamoDB: Building NoSQL Database–Driven Applications, a new self-paced, digital course available exclusively on edX.

Amazon Aurora

Amazon Aurora Serverless MySQL 5.6 can now be accessed using the built-in Data API enabling you to access Aurora Serverless with web services-based applications, including AWS LambdaAWS AppSync, and AWS Cloud9. For more check out this post.

Sharing snapshots of Aurora Serverless DB clusters with other AWS accounts or publicly is now possible. We are also giving you the ability to copy Aurora Serverless DB cluster snapshots across AWS regions.

You can now set the minimum capacity of your Aurora Serverless DB clusters to 1 Aurora Capacity Unit (ACU). With Aurora Serverless, you specify the minimum and maximum ACUs for your Aurora Serverless DB cluster instead of provisioning and managing database instances. Each ACU is a combination of processing and memory capacity. By setting the minimum capacity to 1 ACU, you can keep your Aurora Serverless DB cluster running at a lower cost.

AWS Serverless Application Repository

The AWS Serverless Application Repository is now available in 17 regions with the addition of the AWS GovCloud (US-West) region.

Region support includes Asia Pacific (Mumbai, Singapore, Sydney, Tokyo), Canada (Central), EU (Frankfurt, Ireland, London, Paris, Stockholm), South America (São Paulo), US West (N. California, Oregon), and US East (N. Virginia, Ohio).

Amazon Cognito

Amazon Cognito has launched a new API – AdminSetUserPassword – for the Cognito User Pool service that provides a way for administrators to set temporary or permanent passwords for their end users. This functionality is available for end users even when their verified phone or email are unavailable.

Serverless Posts

April

May

June

Events

Events this quarter

Senior Developer Advocates for AWS Serverless spoke at several conferences this quarter. Here are some recordings worth watching!

Tech Talks

We hold several AWS Online Tech Talks covering serverless tech talks throughout the year, so look out for them in the Serverless section of the AWS Online Tech Talks page. Here are the ones from Q2.

Twitch

Twitch Series

In April, we started a 13-week deep dive into building APIs on AWS as part of our Twitch Build On series. The Building Happy Little APIs series covers the common and not-so-common use cases for APIs on AWS and the features available to customers as they look to build secure, scalable, efficient, and flexible APIs.

There are also a number of other helpful video series covering Serverless available on the AWS Twitch Channel.

Build with Serverless on Twitch

Serverless expert and AWS Specialist Solutions architect, Heitor Lessa, has been hosting a weekly Twitch series since April. Join him and others as they build an end-to-end airline booking solution using serverless. The final episode airs on August 7th at Wednesday 8:00am PT.

Here’s a recap of the last quarter:

AWS re:Invent

AWS re:Invent 2019

AWS re:Invent 2019 is around the corner! From December 2 – 6 in Las Vegas, Nevada, join tens of thousands of AWS customers to learn, share ideas, and see exciting keynote announcements. Be sure to take a look at the growing catalog of serverless sessions this year.

Register for AWS re:Invent now!

What did we do at AWS re:Invent 2018? Check out our recap here: AWS re:Invent 2018 Recap at the San Francisco Loft

AWS Serverless Heroes

We urge you to explore the efforts of our AWS Serverless Heroes Community. This is a worldwide network of AWS Serverless experts with a diverse background of experience. For example, check out this post from last month where Marcia Villalba demonstrates how to set up unit tests for serverless applications.

Still looking for more?

The Serverless landing page has lots of information. The Lambda resources page contains case studies, webinars, whitepapers, customer stories, reference architectures, and even more Getting Started tutorials.

from AWS Compute Blog

Why AWS is the best place for your Windows workloads, and how Microsoft is changing their licensing to try to awkwardly force you into Azure

Why AWS is the best place for your Windows workloads, and how Microsoft is changing their licensing to try to awkwardly force you into Azure

This post is contributed by Sandy Carter, Vice President at AWS. It is also located on LinkedIn

Many companies today are considering how to migrate to the cloud to take advantage of the agility and innovation that the cloud brings. Having the right to choose the best provider for your business is critical.

AWS is the best cloud for running your Windows workloads and our experience running Windows applications has earned our customers’ trust. It’s been more than 11 years since AWS first made it possible for customers to run their Windows workloads on AWS—longer than Azure has even been around, and according to a report by IDC, we host nearly two times as many Windows Server instances in the cloud as Microsoft. And more and more enterprises are entrusting their Windows workloads to AWS because of its greater reliability, higher performance, and lower cost, with the number of AWS enterprise customers using AWS for Windows Server growing more than 400% in the past three years.

In fact, we are seeing a trend of customers moving from Azure to AWS. eMarketer started their digital transformation with Azure, but found performance challenges and higher costs that led them to migrate all of their workloads over to AWS. Why did they migrate? They found a better experience, stronger support, higher availability, and better performance, with 4x faster launch times and 35% lower costs compared to Azure. Ancestry, a leader in consumer genomics, went all-in on development in the cloud moving 10 PB data and 400 Windows-based applications in less than 9 months. They also modernized to Linux with .NET Core and leveraged advanced technologies including serverless and containers. With results like that, you can see why organizations like Sysco, Edwards Life Sciences, Expedia, and NextGen Healthcare have chosen AWS to upgrade, migrate, and modernize their Windows workloads

If you are interested in seeing your cost savings over running on-premises or over running on Azure,  send us an email at [email protected] or visit why AWS is the best cloud for Windows.

from AWS Compute Blog

Optimizing Amazon ECS task density using awsvpc network mode

Optimizing Amazon ECS task density using awsvpc network mode

This post is contributed by Tony Pujals | Senior Developer Advocate, AWS

 

AWS recently increased the number of elastic network interfaces available when you run tasks on Amazon ECS. Use the account setting called awsvpcTrunking. If you use the Amazon EC2 launch type and task networking (awsvpc network mode), you can now run more tasks on an instance—5 to 17 times as many—as you did before.

As more of you embrace microservices architectures, you deploy increasing numbers of smaller tasks. AWS now offers you the option of more efficient packing per instance, potentially resulting in smaller clusters and associated savings.

 

Overview

To manage your own cluster of EC2 instances, use the EC2 launch type. Use task networking to run ECS tasks using the same networking properties as if tasks were distinct EC2 instances.

Task networking offers several benefits. Every task launched with awsvpc network mode has its own attached network interface, a primary private IP address, and an internal DNS hostname. This simplifies container networking and gives you more control over how tasks communicate, both with each other and with other services within their virtual private clouds (VPCs).

Task networking also lets you take advantage of other EC2 networking features like VPC Flow Logs. This feature lets you monitor traffic to and from tasks. It also provides greater security control for containers, allowing you to use security groups and network monitoring tools at a more granular level within tasks. For more information, see Introducing Cloud Native Networking for Amazon ECS Containers.

However, if you run container tasks on EC2 instances with task networking, you can face a networking limit. This might surprise you, particularly when an instance has plenty of free CPU and memory. The limit reflects the number of network interfaces available to support awsvpc network mode per container instance.

 

Raise network interface density limits with trunking

The good news is that AWS raised network interface density limits by implementing a networking feature on ECS called “trunking.” This is a technique for multiplexing data over a shared communication link.

If you’re migrating to microservices using AWS App Mesh, you should optimize network interface density. App Mesh requires awsvpc networking to provide routing control and visibility over an ever-expanding array of running tasks. In this context, increased network interface density might save money.

By opting for network interface trunking, you should see a significant increase in capacity—from 5 to 17 times more than the previous limit. For more information on the new task limits per container instance, see Supported Amazon EC2 Instance Types.

Applications with tasks not hitting CPU or memory limits also benefit from this feature through the more cost-effective “bin packing” of container instances.

 

Trunking is an opt-in feature

AWS chose to make the trunking feature opt-in due to the following factors:

  • Instance registration: While normal instance registration is straightforward with trunking, this feature increases the number of asynchronous instance registration steps that can potentially fail. Any such failures might add extra seconds to launch time.
  • Available IP addresses: The “trunk” belongs to the same subnet in which the instance’s primary network interface originates. This effectively reduces the available IP addresses and potentially the ability to scale out on other EC2 instances sharing the same subnet. The trunk consumes an IP address. With a trunk attached, there are two assigned IP addresses per instance, one for the primary interface and one for the trunk.
  • Differing customer preferences and infrastructure: If you have high CPU or memory workloads, you might not benefit from trunking. Or, you may not want awsvpc networking.

Consequently, AWS leaves it to you to decide if you want to use this feature. AWS might revisit this decision in the future, based on customer feedback. For now, your account roles or users must opt in to the awsvpcTrunking account setting to gain the benefits of increased task density per container instance.

 

Enable trunking

Enable the ECS elastic network interface trunking feature to increase the number of network interfaces that can be attached to supported EC2 container instance types. You must meet the following prerequisites before you can launch a container instance with the increased network interface limits:

  • Your account must have the AWSServiceRoleForECS service-linked role for ECS.
  • You must opt into the awsvpcTrunking  account setting.

 

Make sure that a service-linked role exists for ECS

A service-linked role is a unique type of IAM role linked to an AWS service (such as ECS). This role lets you delegate the permissions necessary to call other AWS services on your behalf. Because ECS is a service that manages resources on your behalf, you need this role to proceed.

In most cases, you won’t have to create a service-linked role. If you created or updated an ECS cluster, ECS likely created the service-linked role for you.

You can confirm that your service-linked role exists using the AWS CLI, as shown in the following code example:

$ aws iam get-role --role-name AWSServiceRoleForECS
{
    "Role": {
        "Path": "/aws-service-role/ecs.amazonaws.com/",
        "RoleName": "AWSServiceRoleForECS",
        "RoleId": "AROAJRUPKI7I2FGUZMJJY",
        "Arn": "arn:aws:iam::226767807331:role/aws-service-role/ecs.amazonaws.com/AWSServiceRoleForECS",
        "CreateDate": "2018-11-09T21:27:17Z",
        "AssumeRolePolicyDocument": {
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Effect": "Allow",
                    "Principal": {
                        "Service": "ecs.amazonaws.com"
                    },
                    "Action": "sts:AssumeRole"
                }
            ]
        },
        "Description": "Role to enable Amazon ECS to manage your cluster.",
        "MaxSessionDuration": 3600
    }
}

If the service-linked role does not exist, create it manually with the following command:

aws iam create-service-linked-role --aws-service-name ecs.amazonaws.com

For more information, see Using Service-Linked Roles for Amazon ECS.

 

Opt in to the awsvpcTrunking account setting

Your account, IAM user, or role must opt in to the awsvpcTrunking account setting. Select this setting using the AWS CLI or the ECS console. You can opt in for an account by making awsvpcTrunking  its default setting. Or, you can enable this setting for the role associated with the instance profile with which the instance launches. For instructions, see Account Settings.

 

Other considerations

After completing the prerequisites described in the preceding sections, launch a new container instance with increased network interface limits using one of the supported EC2 instance types.

Keep the following in mind:

  • It’s available with the latest variant of the ECS-optimized AMI.
  • It only affects creation of new container instances after opting into awsvpcTrunking.
  • It only affects tasks created with awsvpc network mode and EC2 launch type. Tasks created with the AWS Fargate launch type always have a dedicated network interface, no matter how many you launch.

For details, see ENI Trunking Considerations.

 

Summary

If you seek to optimize the usage of your EC2 container instances for clusters that you manage, enable the increased network interface density feature with awsvpcTrunking. By following the steps outlined in this post, you can launch tasks using significantly fewer EC2 instances. This is especially useful if you embrace a microservices architecture, with its increasing numbers of lighter tasks.

Hopefully, you found this post informative and the proposed solution intriguing. As always, AWS welcomes all feedback or comment.

from AWS Compute Blog

Using AWS App Mesh with Fargate

Using AWS App Mesh with Fargate

This post is contributed by Tony Pujals | Senior Developer Advocate, AWS

 

AWS App Mesh is a service mesh, which provides a framework to control and monitor services spanning multiple AWS compute environments. My previous post provided a walkthrough to get you started. In it, I showed deploying a simple microservice application to Amazon ECS and configuring App Mesh to provide traffic control and observability.

In this post, I show more advanced techniques using AWS Fargate as an ECS launch type. I show you how to deploy a specific version of the colorteller service from the previous post. Finally, I move on and explore distributing traffic across other environments, such as Amazon EC2 and Amazon EKS.

I simplified this example for clarity, but in the real world, creating a service mesh that bridges different compute environments becomes useful. Fargate is a compute service for AWS that helps you run containerized tasks using the primitives (the tasks and services) of an ECS application. This lets you work without needing to directly configure and manage EC2 instances.

 

Solution overview

This post assumes that you already have a containerized application running on ECS, but want to shift your workloads to use Fargate.

You deploy a new version of the colorteller service with Fargate, and then begin shifting traffic to it. If all goes well, then you continue to shift more traffic to the new version until it serves 100% of all requests. Use the labels “blue” to represent the original version and “green” to represent the new version. The following diagram shows programmer model of the Color App.

You want to begin shifting traffic over from version 1 (represented by colorteller-blue in the following diagram) over to version 2 (represented by colorteller-green).

In App Mesh, every version of a service is ultimately backed by actual running code somewhere, in this case ECS/Fargate tasks. Each service has its own virtual node representation in the mesh that provides this conduit.

The following diagram shows the App Mesh configuration of the Color App.

 

 

After shifting the traffic, you must physically deploy the application to a compute environment. In this demo, colorteller-blue runs on ECS using the EC2 launch type and colorteller-green runs on ECS using the Fargate launch type. The goal is to test with a portion of traffic going to colorteller-green, ultimately increasing to 100% of traffic going to the new green version.

 

AWS compute model of the Color App.

Prerequisites

Before following along, set up the resources and deploy the Color App as described in the previous walkthrough.

 

Deploy the Fargate app

To get started after you complete your Color App, configure it so that your traffic goes to colorteller-blue for now. The blue color represents version 1 of your colorteller service.

Log into the App Mesh console and navigate to Virtual routers for the mesh. Configure the HTTP route to send 100% of traffic to the colorteller-blue virtual node.

The following screenshot shows routes in the App Mesh console.

Test the service and confirm in AWS X-Ray that the traffic flows through the colorteller-blue as expected with no errors.

The following screenshot shows racing the colorgateway virtual node.

 

Deploy the new colorteller to Fargate

With your original app in place, deploy the send version on Fargate and begin slowly increasing the traffic that it handles rather than the original. The app colorteller-green represents version 2 of the colorteller service. Initially, only send 30% of your traffic to it.

If your monitoring indicates a healthy service, then increase it to 60%, then finally to 100%. In the real world, you might choose more granular increases with automated rollout (and rollback if issues arise), but this demonstration keeps things simple.

You pushed the gateway and colorteller images to ECR (see Deploy Images) in the previous post, and then launched ECS tasks with these images. For this post, launch an ECS task using the Fargate launch type with the same colorteller and envoy images. This sets up the running envoy container as a sidecar for the colorteller container.

You don’t have to manually configure the EC2 instances in a Fargate launch type. Fargate automatically colocates the sidecar on the same physical instance and lifecycle as the primary application container.

To begin deploying the Fargate instance and diverting traffic to it, follow these steps.

 

Step 1: Update the mesh configuration

You can download updated AWS CloudFormation templates located in the repo under walkthroughs/fargate.

This updated mesh configuration adds a new virtual node (colorteller-green-vn). It updates the virtual router (colorteller-vr) for the colorteller virtual service so that it distributes traffic between the blue and green virtual nodes at a 2:1 ratio. That is, the green node receives one-third of the traffic.

$ ./appmesh-colorapp.sh
...
Waiting for changeset to be created..
Waiting for stack create/update to complete
...
Successfully created/updated stack - DEMO-appmesh-colorapp
$

Step 2: Deploy the green task to Fargate

The fargate-colorteller.sh script creates parameterized template definitions before deploying the fargate-colorteller.yaml CloudFormation template. The change to launch a colorteller task as a Fargate task is in fargate-colorteller-task-def.json.

$ ./fargate-colorteller.sh
...

Waiting for changeset to be created..
Waiting for stack create/update to complete
Successfully created/updated stack - DEMO-fargate-colorteller
$

 

Verify the Fargate deployment

The ColorApp endpoint is one of the CloudFormation template’s outputs. You can view it in the stack output in the AWS CloudFormation console, or fetch it with the AWS CLI:

$ colorapp=$(aws cloudformation describe-stacks --stack-name=$ENVIRONMENT_NAME-ecs-colorapp --query="Stacks[0
].Outputs[?OutputKey=='ColorAppEndpoint'].OutputValue" --output=text); echo $colorapp> ].Outputs[?OutputKey=='ColorAppEndpoint'].OutputValue" --output=text); echo $colorapp
http://DEMO-Publi-YGZIJQXL5U7S-471987363.us-west-2.elb.amazonaws.com

Assign the endpoint to the colorapp environment variable so you can use it for a few curl requests:

$ curl $colorapp/color
{"color":"blue", "stats": {"blue":1}}
$

The 2:1 weight of blue to green provides predictable results. Clear the histogram and run it a few times until you get a green result:

$ curl $colorapp/color/clear
cleared

$ for ((n=0;n<200;n++)); do echo "$n: $(curl -s $colorapp/color)"; done

0: {"color":"blue", "stats": {"blue":1}}
1: {"color":"green", "stats": {"blue":0.5,"green":0.5}}
2: {"color":"blue", "stats": {"blue":0.67,"green":0.33}}
3: {"color":"green", "stats": {"blue":0.5,"green":0.5}}
4: {"color":"blue", "stats": {"blue":0.6,"green":0.4}}
5: {"color":"gre
en", "stats": {"blue":0.5,"green":0.5}}
6: {"color":"blue", "stats": {"blue":0.57,"green":0.43}}
7: {"color":"blue", "stats": {"blue":0.63,"green":0.38}}
8: {"color":"green", "stats": {"blue":0.56,"green":0.44}}
...
199: {"color":"blue", "stats": {"blue":0.66,"green":0.34}}

This reflects the expected result for a 2:1 ratio. Check everything on your AWS X-Ray console.

The following screenshot shows the X-Ray console map after the initial testing.

The results look good: 100% success, no errors.

You can now increase the rollout of the new (green) version of your service running on Fargate.

Using AWS CloudFormation to manage your stacks lets you keep your configuration under version control and simplifies the process of deploying resources. AWS CloudFormation also gives you the option to update the virtual route in appmesh-colorapp.yaml and deploy the updated mesh configuration by running appmesh-colorapp.sh.

For this post, use the App Mesh console to make the change. Choose Virtual routers for appmesh-mesh, and edit the colorteller-route. Update the HTTP route so colorteller-blue-vn handles 33.3% of the traffic and colorteller-green-vn now handles 66.7%.

Run your simple verification test again:

$ curl $colorapp/color/clear
cleared
fargate $ for ((n=0;n<200;n++)); do echo "$n: $(curl -s $colorapp/color)"; done
0: {"color":"green", "stats": {"green":1}}
1: {"color":"blue", "stats": {"blue":0.5,"green":0.5}}
2: {"color":"green", "stats": {"blue":0.33,"green":0.67}}
3: {"color":"green", "stats": {"blue":0.25,"green":0.75}}
4: {"color":"green", "stats": {"blue":0.2,"green":0.8}}
5: {"color":"green", "stats": {"blue":0.17,"green":0.83}}
6: {"color":"blue", "stats": {"blue":0.29,"green":0.71}}
7: {"color":"green", "stats": {"blue":0.25,"green":0.75}}
...
199: {"color":"green", "stats": {"blue":0.32,"green":0.68}}
$

If your results look good, double-check your result in the X-Ray console.

Finally, shift 100% of your traffic over to the new colorteller version using the same App Mesh console. This time, modify the mesh configuration template and redeploy it:

appmesh-colorteller.yaml
  ColorTellerRoute:
    Type: AWS::AppMesh::Route
    DependsOn:
      - ColorTellerVirtualRouter
      - ColorTellerGreenVirtualNode
    Properties:
      MeshName: !Ref AppMeshMeshName
      VirtualRouterName: colorteller-vr
      RouteName: colorteller-route
      Spec:
        HttpRoute:
          Action:
            WeightedTargets:
              - VirtualNode: colorteller-green-vn
                Weight: 1
          Match:
            Prefix: "/"
$ ./appmesh-colorapp.sh
...
Waiting for changeset to be created..
Waiting for stack create/update to complete
...
Successfully created/updated stack - DEMO-appmesh-colorapp
$

Again, repeat your verification process in both the CLI and X-Ray to confirm that the new version of your service is running successfully.

 

Conclusion

In this walkthrough, I showed you how to roll out an update from version 1 (blue) of the colorteller service to version 2 (green). I demonstrated that App Mesh supports a mesh spanning ECS services that you ran as EC2 tasks and as Fargate tasks.

In my next walkthrough, I will demonstrate that App Mesh handles even uncontainerized services launched directly on EC2 instances. It provides a uniform and powerful way to control and monitor your distributed microservice applications on AWS.

If you have any questions or feedback, feel free to comment below.

from AWS Compute Blog

Simple Two-way Messaging using the Amazon SQS Temporary Queue Client

Simple Two-way Messaging using the Amazon SQS Temporary Queue Client

This post is contributed by Robin Salkeld, Sr. Software Development Engineer

Amazon SQS is a fully managed message queuing service that makes it easy to decouple and scale microservices, distributed systems, and serverless applications. Asynchronous workflows have always been the primary use case for SQS. Using queues ensures one component can keep running smoothly without losing data when another component is unavailable or slow.

We were surprised, then, to discover that many customers use SQS in synchronous workflows. For example, many applications use queues to communicate between frontends and backends when processing a login request from a user.

Why would anyone use SQS for this? The service stores messages for up to 14 days with high durability, but messages in a synchronous workflow often must be processed within a few minutes, or even seconds. Why not just set up an HTTPS endpoint?

The more we talked to customers, the more we understood. Here’s what we learned:

  • Creating a queue is often easier and faster than creating an HTTPS endpoint and the infrastructure necessary to ensure the endpoint’s scalability.
  • Queues are safe by default because they are locked down to the AWS account that created them. In addition, any DDoS attempt on your service is absorbed by SQS instead of loading down your own servers.
  • There is generally no need to create firewall rules for the communication between microservices if they use queues.
  • Although SQS provides durable storage (which isn’t necessary for short-lived messages), it is still a cost-effective solution for this use case. This is especially true when you consider all the messaging broker maintenance that is done for you.

However, setting up efficient two-way communication through one-way queues requires some non-trivial client-side code. In our previous two-part post series on implementing enterprise integration patterns with AWS messaging services, Point-to-point channels and Publish-subscribe channels, we discussed the Request-Response Messaging Pattern. In this pattern, each requester creates a temporary destination to receive each response message.

The simplest approach is to create a new queue for each response, but this is like building a road just so a single car can drive on it before tearing it down. Technically, this can work (and SQS can create and delete queues quickly), but we can definitely make it faster and cheaper.

To better support short-lived, lightweight messaging destinations, we are pleased to present the Amazon SQS Temporary Queue Client. This client makes it easy to create and delete many temporary messaging destinations without inflating your AWS bill.

Virtual queues

The key concept behind the client is the virtual queue. Virtual queues let you multiplex many low-traffic queues onto a single SQS queue. Creating a virtual queue only instantiates a local buffer to hold messages for consumers as they arrive; there is no API call to SQS and no costs associated with creating a virtual queue.

The Temporary Queue Client includes the AmazonSQSVirtualQueuesClient class for creating and managing virtual queues. This class implements the AmazonSQS interface and adds support for attributes related to virtual queues. You can create a virtual queue using this client by calling the CreateQueue API action and including the HostQueueURL queue attribute. This attribute specifies the existing SQS queue on which to host the virtual queue. The queue URL for a virtual queue is in the form <host queue URL>#<virtual queue name>. For example:

https://sqs.us-east-2.amazonaws.com/123456789012/MyQueue#MyVirtualQueueName

When you call the SendMessage or SendMessageBatch API actions on AmazonSQSVirtualQueuesClient with a virtual queue URL, the client first extracts the virtual queue name. It then attaches this name as an additional message attribute to each message, and sends the messages to the host queue. When you call the ReceiveMessage API action on a virtual queue, the calling thread waits for messages to appear in the in-memory buffer for the virtual queue. Meanwhile, a background thread polls the host queue and dispatches messages to these buffers according to the additional message attribute.

This mechanism is similar to how the AmazonSQSBufferedAsyncClient prefetches messages, and the benefits are similar. A single call to SQS can provide messages for up to 10 virtual queues, reducing the API calls that you pay for by up to a factor of ten. Deleting a virtual queue simply removes the client-side resources used to implement them, again without making API calls to SQS.

The diagram below illustrates the end-to-end process for sending messages through virtual queues:

Sending messages through virtual queues

Virtual queues are similar to virtual machines. Just as a virtual machine provides the same experience as a physical machine, a virtual queue divides the resources of a single SQS queue into smaller logical queues. This is ideal for temporary queues, since they frequently only receive a handful of messages in their lifetime. Virtual queues are currently implemented entirely within the Temporary Queue Client, but additional support and optimizations might be added to SQS itself in the future.

In most cases, you don’t have to manage virtual queues yourself. The library also includes the AmazonSQSTemporaryQueuesClient class. This class automatically creates virtual queues when the CreateQueue API action is called and creates host queues on demand for all queues with the same queue attributes. To optimize existing application code that creates and deletes queues, you can use this class as a drop-in replacement implementation of the AmazonSQS interface.

The client also includes the AmazonSQSRequester and AmazonSQSResponder interfaces, which enable two-way communication through SQS queues. The following is an example of an RPC implementation for a web application’s login process.

/**
 * This class handles a user's login request on the client side.
 */
public class LoginClient {

    // The SQS queue to send the requests to.
    private final String requestQueueUrl;

    // The AmazonSQSRequester creates a temporary queue for each response.
    private final AmazonSQSRequester sqsRequester = AmazonSQSRequesterClientBuilder.defaultClient();

    private final LoginClient(String requestQueueUrl) {
        this.requestQueueUrl = requestQueueUrl;
    }

    /**
     * Send a login request to the server.
     */
    public String login(String body) throws TimeoutException {
        SendMessageRequest request = new SendMessageRequest()
                .withMessageBody(body)
                .withQueueUrl(requestQueueUrl);

        // This:
        //  - creates a temporary queue
        //  - attaches its URL as an attribute on the message
        //  - sends the message
        //  - receives the response from the temporary queue
        //  - deletes the temporary queue
        //  - returns the response
        //
        // If something goes wrong and the server's response never shows up, this method throws a
        // TimeoutException.
        Message response = sqsRequester.sendMessageAndGetResponse(request, 20, TimeUnit.SECONDS);
        
        return response.getBody();
    }
}

/**
 * This class processes users' login requests on the server side.
 */
public class LoginServer {

    // The SQS queue to poll for login requests.
    // Assume that on construction a thread is created to poll this queue and call
    // handleLoginRequest() below for each message.
    private final String requestQueueUrl;

    // The AmazonSQSResponder sends responses to the correct response destination.
    private final AmazonSQSResponder sqsResponder = AmazonSQSResponderClientBuilder.defaultClient();

    private final AmazonSQS(String requestQueueUrl) {
        this.requestQueueUrl = requestQueueUrl;
    }

    /**
     * Handle a login request sent from the client above.
     */
    public void handleLoginRequest(Message message) {
        // Assume doLogin does the actual work, and returns a serialized result
        String response = doLogin(message.getBody());

        // This extracts the URL of the temporary queue from the message attribute and sends the
        // response to that queue.
        sqsResponder.sendResponseMessage(MessageContent.fromMessage(message), new MessageContent(response));  
    }
}

Cleaning up

As with any other AWS SDK client, your code should call the shutdown() method when the temporary queue client is no longer needed. The AmazonSQSRequester interface also provides a shutdown() method, which shuts down its internal temporary queue client. This ensures that the in-memory resources needed for any virtual queues are reclaimed, and that the host queue that the client automatically created is also deleted automatically.

However, in the world of distributed systems things are a little more complex. Processes can run out of memory and crash, and hosts can reboot suddenly and unexpectedly. There are even cases where you don’t have the opportunity to run custom code on shutdown.

The Temporary Queue Client client addresses this issue as well. For each host queue with recent API calls, the client periodically uses the TagQueue API action to attach a fresh tag value that indicates the queue is still being used. The tagging process serves as a heartbeat to keep the queue alive. According to a configurable time period (by default, 5 minutes), a background thread uses the ListQueues API action to obtain the URLs of all queues with the configured prefix. Then, it deletes each queue that has not been tagged recently. The mechanism is similar to how the Amazon DynamoDB Lock Client expires stale lock leases.

If you use the AmazonSQSTemporaryQueuesClient directly, you can customize how long queues must be idle before they is deleted by configuring the IdleQueueRetentionPeriodSeconds queue attribute. The client supports setting this attribute on both host queues and virtual queues. For virtual queues, setting this attribute ensures that the in-memory resources do not become a memory leak in the presence of application bugs.

Any API call to a queue marks it as non-idle, including ReceiveMessage calls that don’t return any messages. The only reason to increase the retention period attribute is to give the client more time when it can’t send heartbeats—for example, due to garbage collection pauses or networking issues.

But what if you want to use this client in a fleet of a thousand EC2 instances? Won’t every client spend a lot of time checking every queue for idleness? Doesn’t that imply duplicate work that increases as the fleet is scaled up?

We thought of this too. The Temporary Queue Client creates a shared queue for all clients using the same queue prefix, and uses this queue as a work queue for the distributed task. Instead of every client calling the ListQueues API action every five minutes, a new seed message (which triggers the sweeping process) is sent to this queue every five minutes.

When one of the clients receives this message, it calls the ListQueues API action and sends each queue URL in the result as another kind of message to the same shared work queue. The work of actually checking each queue for idleness is distributed roughly evenly to the active clients, ensuring scalability. There is even a mechanism that works around the fact that the ListQueues API action currently only returns no more than 1,000 queue URLs at time.

Summary

We are excited about how the new Amazon SQS Temporary Queue Client makes more messaging patterns easier and cheaper for you. Download the code from GitHub, have a look at Temporary Queues in the Amazon SQS Developer Guide, try out the client, and let us know what you think!

from AWS Compute Blog

Deploying an Nginx-based HTTP/HTTPS load balancer with Amazon Lightsail

Deploying an Nginx-based HTTP/HTTPS load balancer with Amazon Lightsail

In this post, I discuss how to configure a load balancer to route web traffic for Amazon Lightsail using NGINX. I define load balancers and explain their value. Then, I briefly weigh the pros and cons of self-hosted load balancers against Lightsail’s managed load balancer service. Finally, I cover how to set up a NGINX-based load balancer inside of a Lightsail instance.

If you feel like you already understand what load balancers are, and the pros and cons of self-hosted load balancers vs. managed services, feel free to skip ahead to the deployment section.

What is a load balancer?

Although load balancers offer many different functionalities, for the sake of this discussion, I focus on one main task: A load balancer accepts your users’ traffic and routes it to the right server.

For example, if I assign the load balancer the DNS name: www.example.com, anyone visiting the site first encounters the load balancer. The load balancer then routes the request to one of the backend servers: web-1, web-2, or web-3. The backend servers then respond to the requestor.

A load balancer provides multiple benefits to your application or website. Here are a few key advantages:

  • Redundancy
  • Publicly available IP addresses
  • Horizontally scaled application capacity

Redundancy

Load balancers usually front at least two backend servers. Using something called a health check, they ensure that these servers are online and available to service user requests. If a backend server goes out of service, the load balancer stops routing traffic to that instance. By running multiple servers, you help to ensure that at least one is always available to respond to incoming traffic. As a result, your users aren’t bogged down by errors from a downed machine.

Publicly available IP addresses

Without a load balancer, a server requires a unique IP address to accept an incoming request via the internet. There are a finite number of these IP addresses, and most cloud providers limit the number of statically assigned public IP addresses that you can have.

By using a load balancer, you can task a single public IP address with servicing multiple backend servers. Later in this post, I return to this topic as I discuss configuring a load balancer.

Horizontally scaled application capacity

As your application or website becomes more popular, its performance may degrade. Adding additional capacity can be as easy as spinning up a new instance and placing it behind your load balancer. If demand drops, you can spin down any unneeded instances to save money.

Horizontal scaling means the deployment of additional identically configured servers to handle increased load. Vertical scaling means the deployment of a more powerful server to handle increased load. If you deploy an underpowered server, expect poor performance, whether you have a single server or ten.

Self-managed load balancer vs. a managed service

Now that you have a better understanding of load balancers and the key benefits that they provide, the next question is: How can you get one into your environment?

On one hand, you could spin up a Lightsail load balancer. These load balancers are all managed by AWS and don’t require any patching or maintenance on your part to stay up-to-date. You only need to name your load balancer and pick instances to service. Your load balancer is then up and running. If you’re so inclined, you can also get a free SSL (secure socket layer) certificate with a few extra clicks.

Lightsail load balancers deploy easily and require essentially no maintenance after they’re operational, for $18 per month (at publication time). Lightsail load balancer design prioritizes easy installation and maintenance. As a result, they lack some advanced configuration settings found with other models.

Consequently, you might prefer to configure your load balancer if you prioritize configuration flexibility or cost reduction. A self-hosted load balancer provides access to many advanced features, and your only hard cost is the instance price.

The downsides of self-hosting are that you are also responsible for the following:

  • Installing the load balancer software.
  • Keeping the software (and the host operating system) updated and secure.

Deploying a NGINX-based load balancer with Lightsail

Although many software-based load balancers are available, I recommend building a solution on NGINX because this wildly popular tool:

  • Is open source/free.
  • Offers great community support.
  • Has a custom Lightsail blueprint.

Prerequisites

This tutorial assumes that you already have your backend servers deployed. These servers should all:

  • Be identically configured.
  • Point to a central backend database.

In other words, the target servers shouldn’t each have database copies installed.

To deploy a centralized database, see Lightsail’s managed database offering.

Because I’ve tailored these instructions to generic website and web app hosting, they may not work with specific applications such as WordPress.

Required prerequisites

Before installing an optional SSL certificate, you need to have the following:

  • A purchased domain name.
  • Permissions to update the DNS servers for that domain.

Optional prerequisites

Although not required, the following prerequisites may also be helpful:

  • Familiarity with accessing instances via SSH.
  • Using basic LINUX commands. 

Deploy an NGINX instance

To begin, deploy an NGINX instance in Lightsail, choosing the NGINX blueprint. Make sure that you are deploying it into the same Region as the servers to load balance.

Choose an appropriate instance size for your application, being aware of the amount of RAM and CPU, as well as the data transfer. If you choose an undersized instance, you can always scale it up via a snapshot. However, an oversized instance cannot as easily be scaled down. You may need to rebuild the load balancer on a smaller-sized instance from scratch.

Configure HTTP load balancing

In the following steps, edit the NGINX configuration file to load balance HTTP requests to the appropriate servers.

First, start up an SSH session with your new NGINX instance and change into the appropriate configuration directory:

cd /opt/bitnami/nginx/conf/bitnami/

Make sure that you have the IP addresses for the servers to load balance. In most cases, traffic shouldn’t flow from your load balancer to your instances across the internet. So, make sure to use the instance’s private IP address. You can find this information on the instance management page, in the Lightsail console.

In this example, my three servers have the following private IP addresses:

  • 192.0.2.67
  • 192.0.2.206
  • 192.0.2.198

The configuration file to edit is named bitnami.conf. Open it using your preferred text editor (use sudo to edit the file):

sudo vi bitnami.conf

Clear the contents of the file and add the following code, making sure to substitute the private IP addresses of the servers to load balance:

# Define Pool of servers to load balance upstream webservers { 
server 192.0.2.67 max_fails=3 fail_timeout=30s; 
server 192.0.2.206 max_fails=3 fail_timeout=30s;
server 192.0.2.198 max_fails=3 fail_timeout=30s;
}

In the code, you used the keyword upstream to define a pool (named webservers) of three servers to which NGINX should route traffic. If you don’t specify how NGINX should route each request, it defaults to round-robin server routing. Two other routing methods are available:

  • Least connected, which routes to the server with the fewest number of active connections.
  • IP hash, which uses a hashing function to enable sticky sessions (otherwise called session persistence).

Discussion on these methods is out of scope for this post. For more information, see Using nginx as HTTP load balancer.

Additionally, I recommend max_fails and fail_timeout to define health checks. Based on the configuration above, NGINX marks a server as down if it fails to respond or responds with an error three times in 30 seconds. If a server is marked down, NGINX continues to probe every 30 seconds. If it receives a positive response, it marks the server as live.

After the code you just inserted to the file, add the following:

# Forward traffic on port 80 to one of the servers in the webservers group server {
listen 80; location / {
   proxy_pass http://webservers;
   }
}

This code tells NGINX to listen for requests on port 80, the default port for unencrypted web (HTTP) traffic and forward such requests to one of the servers in the webservers group defined by the upstream keyword.

Save the file and quit back to your command prompt.

For the changes to take effect, restart the NGINX service using the Bitnami control script:

sudo /opt/bitnami/ctlscript.sh restart nginx

At this point, you should be able to visit the IP address of your NGINX instance in your web browser. The load balancer then routes the request to one of the servers defined in your webservers group.

For reference, here’s the full bitnami.conf file.

# Define Pool of servers to load balance
upstream webservers {
server 192.0.2.67 max_fails=3 fail_timeout=30s;
server 192.0.2.206 max_fails=3 fail_timeout=30s;
server 192.0.2.198 max_fails=3 fail_timeout=30s;
}
# Forward traffic on port 80 to one of the servers in the webservers group server {
listen 80; location / {
proxy_pass http://webservers;
}
}

Configure HTTPS load balancing

Configuring your load balancer to use SSL requires three steps:

  1. Ensure that you have a domain record for your NGINX load balancer instance.
  2. Obtain and install a certificate.
  3. Update the NGINX configuration file.

If you have not already done so, assign your NGINX instance an entry with your DNS provider. Remember, the load balancer is the address your users use to reach your site. For instance, it might be appropriate to create a record that points http://www.yourdomain.com/ at your NGINX load balancer. If you need help configuring the DNS in Lightsail, see DNS in Amazon Lightsail.

Similarly, to configure your NGINX instance to use a free SSL certificate from Let’s Encrypt, follow steps 1–7 in Tutorial: Using Let’s Encrypt SSL certificates with your Nginx instance in Amazon Lightsail. You handle step 8 later in this post,

After you configure NGINX to use the SSL certificate and update your DNS, you must modify the configuration file to allow for HTTPS traffic.

Again, use a text editor to open the bitnami.conf file:

sudo vi bitnami.conf

Add the following code to the bottom of the file:

server {
     listen 443 ssl;
     location / {
          proxy_pass http://webservers;
     }
     ssl_certificate server.crt;
     ssl_certificate_key server.key;
     ssl_session_cache shared:SSL:1m;
     ssl_session_timeout 5m;
     ssl_ciphers HIGH:!aNULL:!MD5;
     ssl_prefer_server_ciphers on;
}

This code closely resembles the HTTP code added previously. However, in this case, the code tells NGINX to accept SSL connections on the secure port 443 (HTTPS) and forward them to one of your web servers. The rest of the commands instruct NGINX on where to locate SSL certificates, as well as setting various SSL parameters.

Here again, restart the NGINX service:

sudo /opt/bitnami/ctlscript.sh restart nginx

Optional steps

At this point, you should be able to access your website using both HTTP and HTTPS. However, there are a couple of optional steps to consider, including:

  • Shutting off direct HTTP/HTTPS access to your web servers.
  • Automatically redirecting incoming load balancer HTTP requests to HTTPS.

It’s probably not a great idea to allow people to access your load-balanced servers directly. Fortunately, you can easily restrict access:

  1. Navigate to each instance’s management page in the Lightsail console.
  2. Choose Networking.
  3. Remove the HTTP and HTTPS (if enabled) firewall port entries.

This restriction shuts down access via the internet while still allowing communications between the load balancer and the servers over the private AWS network.

In many cases, there’s no good reason to allow access to a website or web app over unencrypted HTTP. However, the load balancer configuration described to this point still accepts HTTP requests. To automatically reroute requests from HTTP to HTTPS, make one small change to the configuration file:

  1. Edit the conf file.
  2. Find this code:
server {
listen 80; location / {
proxy_pass http://webservers;
}
}
  1. Replace it with this code:
server {
listen 80;
return 301 https://$host$request_uri;
}

The replacement code instructs NGINX to respond to HTTP requests with a “page has been permanently redirected” message and a citation of the new page address. The new address is simply requested one, only accessed over HTTPS instead of HTTP.

For this change to take effect, you must restart NGINX:

sudo /opt/bitnami/ctlscript.sh restart nginx

For reference, this is what the final bitnami.conf file looks like:

# Define the pool of servers to load balance
upstream webservers {
server 192.0.2.67 max_fails=3 fail_timeout=30s;
server 192.0.2.206 max_fails=3 fail_timeout=30s;
server 192.0.2.198 max_fails=3 fail_timeout=30s;
}
# Redirect traffic on port 80 to use HTTPS
server {
listen 80;
return 301 https://$host$request_uri;
}
# Forward traffic on port 443 to one of the servers in the web servers group
server {
     listen 443 ssl;
     location / {
          proxy_pass http://webservers;
     }
     ssl_certificate server.crt;
     ssl_certificate_key server.key;
     ssl_session_cache shared:SSL:1m;
     ssl_session_timeout 5m;
     ssl_ciphers HIGH:!aNULL:!MD5;
     ssl_prefer_server_ciphers on;
}

Conclusion

This post explained how to configure a load balancer to route web traffic for Amazon Lightsail using NGINX. I defined load balancers and their utility. I weighed the pros and cons of self-hosted load balancers against Lightsail’s managed load balancer service. Finally, I walked you through how to set up a NGINX-based load balancer inside of a Lightsail instance.

Thanks for reading this post. If you have any questions, feel free to contact me on Twitter, @mikegcoleman or visit the Amazon Lightsail forums.

from AWS Compute Blog

Scaling Kubernetes deployments with Amazon CloudWatch metrics

Scaling Kubernetes deployments with Amazon CloudWatch metrics

This post is contributed by Kwunhok Chan | Solutions Architect, AWS

 

In an earlier post, AWS introduced Horizontal Pod Autoscaler and Kubernetes Metrics Server support for Amazon Elastic Kubernetes Service. These tools make it easy to scale your Kubernetes workloads managed by EKS in response to built-in metrics like CPU and memory.

However, one common use case for applications running on EKS is the integration with AWS services. For example, you administer an application that processes messages published to an Amazon SQS queue. You want the application to scale according to the number of messages in that queue. The Amazon CloudWatch Metrics Adapter for Kubernetes (k8s-cloudwatch-adapter) helps.

 

Amazon CloudWatch Metrics Adapter for Kubernetes

The k8s-cloudwatch-adapter is an implementation of the Kubernetes Custom Metrics API and External Metrics API with integration for CloudWatch metrics. It allows you to scale your Kubernetes deployment using the Horizontal Pod Autoscaler (HPA) with CloudWatch metrics.

 

Prerequisites

Before starting, you need the following:

 

Getting started

Before using the k8s-cloudwatch-adapter, set up a way to manage IAM credentials to Kubernetes pods. The CloudWatch Metrics Adapter requires the following permissions to access metric data from CloudWatch:

cloudwatch:GetMetricData

Create an IAM policy with the following template:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "cloudwatch:GetMetricData"
            ],
            "Resource": "*"
        }
    ]
}

For demo purposes, I’m granting admin permissions to my Kubernetes worker nodes. Don’t do this in your production environment. To associate IAM roles to your Kubernetes pods, you may want to look at kube2iam or kiam.

If you’re using an EKS cluster, you most likely provisioned it with AWS CloudFormation. The following command uses AWS CloudFormation stacks to update the proper instance policy with the correct permissions:

aws iam attach-role-policy \
--policy-arn arn:aws:iam::aws:policy/AdministratorAccess \
--role-name $(aws cloudformation describe-stacks --stack-name ${STACK_NAME} --query 'Stacks[0].Parameters[?ParameterKey==`NodeInstanceRoleName`].ParameterValue' | jq -r ".[0]")

 

Make sure to replace ${STACK_NAME} with the nodegroup stack name from the AWS CloudFormation console .

 

You can now deploy the k8s-cloudwatch-adapter to your Kubernetes cluster.

$ kubectl apply -f https://raw.githubusercontent.com/awslabs/k8s-cloudwatch-adapter/master/deploy/adapter.yaml

 

This deployment creates a new namespace—custom-metrics—and deploys the necessary ClusterRole, Service Account, and Role Binding values, along with the deployment of the adapter. Use the created custom resource definition (CRD) to define the configuration for the external metrics to retrieve from CloudWatch. The adapter reads the configuration defined in ExternalMetric CRDs and loads its external metrics. That allows you to use HPA to autoscale your Kubernetes pods.

 

Verifying the deployment

Next, query the metrics APIs to see if the adapter is deployed correctly. Run the following command:

$ kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1" | jq.
{
  "kind": "APIResourceList",
  "apiVersion": "v1",
  "groupVersion": "external.metrics.k8s.io/v1beta1",
  "resources": [
  ]
}

There are no resources from the response because you haven’t registered any metric resources yet.

 

Deploying an Amazon SQS application

Next, deploy a sample SQS application to test out k8s-cloudwatch-adapter. The SQS producer and consumer are provided, together with the YAML files for deploying the consumer, metric configuration, and HPA.

Both the producer and consumer use an SQS queue named helloworld. If it doesn’t exist already, the producer creates this queue.

Deploy the consumer with the following command:

$ kubectl apply -f https://raw.githubusercontent.com/awslabs/k8s-cloudwatch-adapter/master/samples/sqs/deploy/consumer-deployment.yaml

 

You can verify that the consumer is running with the following command:

$ kubectl get deploy sqs-consumer
NAME           DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
sqs-consumer   1         1         1            0           5s

 

Set up Amazon CloudWatch metric and HPA

Next, create an ExternalMetric resource for the CloudWatch metric. Take note of the Kind value for this resource. This CRD resource tells the adapter how to retrieve metric data from CloudWatch.

You define the query parameters used to retrieve the ApproximateNumberOfMessagesVisible for an SQS queue named helloworld. For details about how metric data queries work, see CloudWatch GetMetricData API.

apiVersion: metrics.aws/v1alpha1
kind: ExternalMetric:
  metadata:
    name: hello-queue-length
  spec:
    name: hello-queue-length
    resource:
      resource: "deployment"
      queries:
        - id: sqs_helloworld
          metricStat:
            metric:
              namespace: "AWS/SQS"
              metricName: "ApproximateNumberOfMessagesVisible"
              dimensions:
                - name: QueueName
                  value: "helloworld"
            period: 300
            stat: Average
            unit: Count
          returnData: true

 

Create the ExternalMetric resource:

$ kubectl apply -f https://raw.githubusercontent.com/awslabs/k8s-cloudwatch-adapter/master/samples/sqs/deploy/externalmetric.yaml

 

Then, set up the HPA for your consumer. Here is the configuration to use:

kind: HorizontalPodAutoscaler
apiVersion: autoscaling/v2beta1
metadata:
  name: sqs-consumer-scaler
spec:
  scaleTargetRef:
    apiVersion: apps/v1beta1
    kind: Deployment
    name: sqs-consumer
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: External
    external:
      metricName: hello-queue-length
      targetValue: 30

 

This HPA rule starts scaling out when the number of messages visible in your SQS queue exceeds 30, and scales in when there are fewer than 30 messages in the queue.

Create the HPA resource:

$ kubectl apply -f https://raw.githubusercontent.com/awslabs/k8s-cloudwatch-adapter/master/samples/sqs/deploy/hpa.yaml

 

Generate load using a producer

Finally, you can start generating messages to the queue:

$ kubectl apply -f https://raw.githubusercontent.com/awslabs/k8s-cloudwatch-adapter/master/samples/sqs/deploy/producer-deployment.yaml

On a separate terminal, you can now watch your HPA retrieving the queue length and start scaling the replicas. SQS metrics generate at five-minute intervals, so give the process a few minutes:

$ kubectl get hpa sqs-consumer-scaler -w

 

Clean up

After you complete this experiment, you can delete the Kubernetes deployment and respective resources.

Run the following commands to remove the consumer, external metric, HPA, and SQS queue:

$ kubectl delete deploy sqs-producer
$ kubectl delete hpa sqs-consumer-scaler
$ kubectl delete externalmetric sqs-helloworld-length
$ kubectl delete deploy sqs-consumer

$ aws sqs delete-queue helloworld

 

Other CloudWatch integrations

AWS recently announced the preview for Amazon CloudWatch Container Insights, which monitors, isolates, and diagnoses containerized applications running on EKS and Kubernetes clusters. To get started, see Using Container Insights.

 

Get involved

This project is currently under development. AWS welcomes issues and pull requests, and would love to hear your feedback.

How could this adapter be best implemented to work in your environment? Visit the Amazon CloudWatch Metrics Adapter for Kubernetes project on GitHub and let AWS know what you think.

from AWS Compute Blog

Configuring user creation workflows with AWS Step Functions and AWS Managed Microsoft AD logs

Configuring user creation workflows with AWS Step Functions and AWS Managed Microsoft AD logs

This post is contributed by Taka Matsumoto, Cloud Support Engineer

AWS Directory Service lets you run Microsoft Active Directory as a managed service. Directory Service for Microsoft Active Directory, also referred to as AWS Managed Microsoft AD, is powered by Microsoft Windows Server 2012 R2. It manages users and makes it easy to integrate with compatible AWS services and other applications. Using the log forwarding feature, you can stay aware of all security events in Amazon CloudWatch Logs. This helps monitor events like the addition of a new user.

When new users are created in your AWS Managed Microsoft AD, you might go through the initial setup workflow manually. However, AWS Step Functions can coordinate new user creation activities into serverless workflows that automate the process. With Step Functions, AWS Lambda can be also used to run code for the automation workflows without provisioning or managing servers.

In this post, I show how to create and trigger a new user creation workflow in Step Functions. This workflow creates a WorkSpace in Amazon WorkSpaces and a user in Amazon Connect using AWS Managed Microsoft AD, Step Functions, Lambda, and Amazon CloudWatch Logs.

Overview

The following diagram shows the solution graphically.

Configuring user creation workflows with AWS Step Functions and AWS Managed Microsoft AD logs

Walkthrough

Using the following procedures, create an automated user creation workflow with AWS Managed Microsoft AD. The solution requires the creation of new resources in CloudWatch, Lambda, and Step Functions, and a new user in Amazon WorkSpaces and Amazon Connect. Here’s the list of steps:

  1. Enable log forwarding.
  2. Create the Lambda functions.
  3. Set up log streaming.
  4. Create a state machine in Step Functions.
  5. Test the solution.

Requirements

To follow along, you need the following resources:

  • AWS Managed Microsoft AD
    • Must be registered with Amazon WorkSpaces
    • Must be registered with Amazon Connect

In this example, you use an Amazon Connect instance with SAML 2.0-based authentication as identity management. For more information, see Configure SAML for Identity Management in Amazon Connect.

Enable log forwarding

Enable log forwarding for your AWS Managed Microsoft AD.  Use /aws/directoryservice/<directory id> for the CloudWatch log group name. You will use this log group name when creating a Log Streaming in Step 3.

Create Lambda functions

Create two Lambda functions. The first starts a Step Functions execution with CloudWatch Logs. The second performs a user registration process with Amazon WorkSpaces and Amazon Connect within a Step Functions execution.

Create the first function with the following settings:

  • Name: DS-Log-Stream-Function
  • Runtime: Python 3.7
  • Memory: 128 MB
  • Timeout: 3 seconds
  • Environment variables:
    • Key: stateMachineArn
    • Value: arn:aws:states:<Region>:<AccountId>:stateMachine:NewUserWorkFlow
  • IAM role with the following permissions:
    • AWSLambdaBasicExecutionRole
    • The following permissions policy
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "states:StartExecution",
            "Resource": "*"
        }
    ]
}
import base64
import boto3
import gzip
import json
import re
import os
def lambda_handler(event, context):
    logEvents = DecodeCWPayload(event)
    print('Event payload:', logEvents)
    returnResultDict = []
    
    # Because there can be more than one message pushed in a single payload, use a for loop to start a workflow for every user
    for logevent in logEvents:
        logMessage = logevent['message']
        upnMessage =  re.search("(<Data Name='UserPrincipalName'>)(.*?)(<\/Data>)",logMessage)
        if upnMessage != None:
            upn = upnMessage.group(2).lower()
            userNameAndDomain = upn.split('@')
            userName = userNameAndDomain[0].lower()
            userNameAndDomain = upn.split('@')
            domainName = userNameAndDomain[1].lower()
            sfnInputDict = {'Username': userName, 'UPN': upn, 'DomainName': domainName}
            sfnResponse = StartSFNExecution(json.dumps(sfnInputDict))
            print('Username:',upn)
            print('Execution ARN:', sfnResponse['executionArn'])
            print('Execution start time:', sfnResponse['startDate'])
            returnResultDict.append({'Username': upn, 'ExectionArn': sfnResponse['executionArn'], 'Time': str(sfnResponse['startDate'])})

    returnObject = {'Result':returnResultDict}
    return {
        'statusCode': 200,
        'body': json.dumps(returnObject)
    }

# Helper function decode the payload
def DecodeCWPayload(payload):
    # CloudWatch Log Stream event 
    cloudWatchLog = payload['awslogs']['data']
    # Base 64 decode the log 
    base64DecodedValue = base64.b64decode(cloudWatchLog)
    # Uncompress the gzipped decoded value
    gunzipValue = gzip.decompress(base64DecodedValue)
    dictPayload = json.loads(gunzipValue)
    decodedLogEvents = dictPayload['logEvents']
    return decodedLogEvents

# Step Functions state machine execution function
def StartSFNExecution(sfnInput):
    sfnClient = boto3.client('stepfunctions')
    try:
        response = sfnClient.start_execution(
            stateMachineArn=os.environ['stateMachineArn'],
            input=sfnInput
        )
        return response
    except Exception as e:
        return e

For the other function used to perform a user creation task, use the following settings:

  • Name: SFN-New-User-Flow
  • Runtime: Python 3.7
  • Memory: 128 MB
  • Timeout: 3 seconds
  • Environment variables:
    • Key: nameDelimiter
    • Value: . [period]

This delimiter is used to split the username into a first name and last name, as Amazon Connect instances with SAML-based authentication require both a first name and last name for users. For more information, see CreateUser API and UserIdentity Info.

  • Key: bundleId
  • Value: <WorkSpaces bundle ID>

Run the following AWS CLI command to return Amazon-owned WorkSpaces bundles. Use one of the bundle IDs for the key-value pair.

aws workspaces describe-workspace-bundles –owner AMAZON

  • Key: directoryId
  • Value: <WorkSpaces directory ID>

Run the following AWS CLI command to return Amazon WorkSpaces directories. Use your directory ID for the key-value pair.

aws workspaces describe-workspace-directories

  • Key: instanceId
  • Value: <Amazon Connect instance ID>

Find the Amazon Connect instance ID the Amazon Connect instance ID.

  • Key: routingProfile
  • Value: <Amazon Connect routing profile>

Run the following AWS CLI command to list routing profiles with their IDs. For this walkthrough, use the ID for the basic routing profile.

aws connect list-routing-profiles –instance-id <instance id>

  • Key: securityProfile
  • Value: <Amazon Connect security profile>

Run the following AWS CLI command to list security profiles with their IDs. For this walkthrough, use the ID for an agent security profile.

aws connect list-security-profiles –instance-id  <instance id>

  • IAM role permissions:
    • AWSLambdaBasicExecutionRole

The following permissions policy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "connect:CreateUser",
                "workspaces:CreateWorkspaces"
            ],
            "Resource": "*"
        }
    ]
}
import json
import os
import boto3

def lambda_handler(event, context):
    userName = event['input']['User']
    nameDelimiter = os.environ['nameDelimiter']
    if nameDelimiter in userName:
        firstName = userName.split(nameDelimiter)[0]
        lastName = userName.split(nameDelimiter)[1]
    else:
        firstName = userName
        lastName = userName
    domainName = event['input']['Domain']
    upn = event['input']['UPN']
    serviceName = event['input']['Service']
    if serviceName == 'WorkSpaces':
        # Setting WorkSpaces variables
        workspacesDirectoryId = os.environ['directoryId']
        workspacesUsername = upn
        workspacesBundleId = os.environ['bundleId']
        createNewWorkSpace = create_new_workspace(
            directoryId=workspacesDirectoryId,
            username=workspacesUsername,
            bundleId=workspacesBundleId
        )
        return createNewWorkSpace
    elif serviceName == 'Connect':
        createConnectUser = create_connect_user(
            connectUsername=upn,
            connectFirstName=firstName,
            connectLastName=lastName,
            securityProfile=os.environ['securityProfile'], 
            routingProfile=os.environ['routingProfile'], 
            instanceId=os.environ['instanceId']
        )
        return createConnectUser
    else:
        print(serviceName, 'is not recognized...')
        print('Available service names are WorkSpaces and Connect')
        unknownServiceException = {
            'statusCode': 500,
            'body': json.dumps(f'Service name, {serviceName}, is not recognized')}
        raise Exception(unknownServiceException)

class FailedWorkSpaceCreationException(Exception):
    pass

class WorkSpaceResourceExists(Exception):
    pass

def create_new_workspace(directoryId, username, bundleId):
    workspacesClient = boto3.client('workspaces')
    response = workspacesClient.create_workspaces(
        Workspaces=[{
                'DirectoryId': directoryId,
                'UserName': username,
                'BundleId': bundleId,
                'WorkspaceProperties': {
                    'RunningMode': 'AUTO_STOP',
                    'RunningModeAutoStopTimeoutInMinutes': 60,
                    'RootVolumeSizeGib': 80,
                    'UserVolumeSizeGib': 100,
                    'ComputeTypeName': 'VALUE'
                    }}]
                    )
    print('create_workspaces response:',response)
    for pendingRequest in response['PendingRequests']:
        if pendingRequest['UserName'] == username:
            workspacesResultObject = {'UserName':username, 'ServiceName':'WorkSpaces', 'Status': 'Success'}
            return {
                'statusCode': 200,
                'body': json.dumps(workspacesResultObject)
                }
    for failedRequest in response['FailedRequests']:
        if failedRequest['WorkspaceRequest']['UserName'] == username:
            errorCode = failedRequest['ErrorCode']
            errorMessage = failedRequest['ErrorMessage']
            errorResponse = {'Error Code:', errorCode, 'Error Message:', errorMessage}
            if errorCode == "ResourceExists.WorkSpace": 
                raise WorkSpaceResourceExists(str(errorResponse))
            else:
                raise FailedWorkSpaceCreationException(str(errorResponse))
                
def create_connect_user(connectUsername, connectFirstName,connectLastName,securityProfile,routingProfile,instanceId):
    connectClient = boto3.client('connect')
    response = connectClient.create_user(
                    Username=connectUsername,
                    IdentityInfo={
                        'FirstName': connectFirstName,
                        'LastName': connectLastName
                        },
                    PhoneConfig={
                        'PhoneType': 'SOFT_PHONE',
                        'AutoAccept': False,
                        },
                    SecurityProfileIds=[
                        securityProfile,
                        ],
                    RoutingProfileId=routingProfile,
                    InstanceId = instanceId
                    )
    connectSuccessResultObject = {'UserName':connectUsername,'ServiceName':'Connect','FirstName': connectFirstName, 'LastName': connectLastName,'Status': 'Success'}
    return {
        'statusCode': 200,
        'body': json.dumps(connectSuccessResultObject)
        }

Set up log streaming

Create a new CloudWatch Logs subscription filter that sends log data to the Lambda function DS-Log-Stream-Function created in Step 2.

  1. In the CloudWatch console, choose Logs, Log Groups, and select the log group, /aws/directoryservice/<directory id>, for the directory set up in Step 1.
  2. Choose Actions, Stream to AWS Lambda.
  3. Choose Destination, and select the Lambda function DS-Log-Stream-Function.
  4. For Log format, choose Other as the log format and enter “<EventID>4720</EventID>” (include the double quotes).
  5. Choose Start streaming.

If there is an existing subscription filter for the log group, run the following AWS CLI command to create a subscription filter for the Lambda function, DS-Log-Stream-Function.

aws logs put-subscription-filter \

--log-group-name /aws/directoryservice/<directoryid> \

--filter-name NewUser \

--filter-pattern "<EventID>4720</EventID>" \

--destination-arn arn:aws:lambda:<Region>:<ACCOUNT_NUMBER>:function:DS-Log-Stream-Function

For more information, see Using CloudWatch Logs Subscription Filters.

Create a state machine in Step Functions

The next step is to create a state machine in Step Functions. This state machine runs the Lambda function, SFN-New-User-Flow, to create a user in Amazon WorkSpaces and Amazon Connect.

Define the state machine, using the following settings:

  • Name: NewUserWorkFlow
  • State machine definition: Copy the following state machine definition:
{
    "Comment": "An example state machine for a new user creation workflow",
    "StartAt": "Parallel",
    "States": {
        "Parallel": {
            "Type": "Parallel",
            "End": true,
            "Branches": [
                {
                    "StartAt": "CreateWorkSpace",
                    "States": {
                        "CreateWorkSpace": {
                            "Type": "Task",
                            "Parameters": {
                                "input": {
                                    "User.$": "$.Username",
                                    "UPN.$": "$.UPN",
                                    "Domain.$": "$.DomainName",
                                    "Service": "WorkSpaces"
                                }
                            },
                            "Resource": "arn:aws:lambda:{region}:{account id}:function:SFN-New-User-Flow",
                            "Retry": [
                                {
                                    "ErrorEquals": [
                                        "WorkSpaceResourceExists"
                                    ],
                                    "IntervalSeconds": 1,
                                    "MaxAttempts": 0,
                                    "BackoffRate": 1
                                },
                                {
                                    "ErrorEquals": [
                                        "States.ALL"
                                    ],
                                    "IntervalSeconds": 10,
                                    "MaxAttempts": 2,
                                    "BackoffRate": 2
                                }
                            ],
                            "Catch": [
                                {
                                    "ErrorEquals": [
                                        "WorkSpaceResourceExists"
                                    ],
                                    "ResultPath": "$.workspacesResult",
                                    "Next": "WorkSpacesPassState"
                                },
                                {
                                    "ErrorEquals": [
                                        "States.ALL"
                                    ],
                                    "ResultPath": "$.workspacesResult",
                                    "Next": "WorkSpacesPassState"
                                }
                            ],
                            "End": true
                        },
                        "WorkSpacesPassState": {
                            "Type": "Pass",
                            "Parameters": {
                                "Result.$": "$.workspacesResult"
                            },
                            "End": true
                        }
                    }
                },
                {
                    "StartAt": "CreateConnectUser",
                    "States": {
                        "CreateConnectUser": {
                            "Type": "Task",
                            "Parameters": {
                                "input": {
                                    "User.$": "$.Username",
                                    "UPN.$": "$.UPN",
                                    "Domain.$": "$.DomainName",
                                    "Service": "Connect"
                                }
                            },
                            "Resource": "arn:aws:lambda:{region}:{account id}:function:SFN-New-User-Flow",
                            "Retry": [
                                {
                                    "ErrorEquals": [
                                        "DuplicateResourceException"
                                    ],
                                    "IntervalSeconds": 1,
                                    "MaxAttempts": 0,
                                    "BackoffRate": 1
                                },
                                {
                                    "ErrorEquals": [
                                        "States.ALL"
                                    ],
                                    "IntervalSeconds": 10,
                                    "MaxAttempts": 2,
                                    "BackoffRate": 2
                                }
                            ],
                            "Catch": [
                                {
                                    "ErrorEquals": [
                                        "DuplicateResourceException"
                                    ],
                                    "ResultPath": "$.connectResult",
                                    "Next": "ConnectPassState"
                                },
                                {
                                    "ErrorEquals": [
                                        "States.ALL"
                                    ],
                                    "ResultPath": "$.connectResult",
                                    "Next": "ConnectPassState"
                                }
                            ],
                            "End": true,
                            "ResultPath": "$.connectResult"
                        },
                        "ConnectPassState": {
                            "Type": "Pass",
                            "Parameters": {
                                "Result.$": "$.connectResult"
                            },
                            "End": true
                        }
                    }
                }
            ]
        }
    }
}

After entering the name and state machine definition, choose Next.

Configure the settings by choosing Create an IAM role for me. This creates an IAM role for the state machine to run the Lambda function SFN-New-User-Flow.

Here’s the list of states in the NewUserWorkFlow state machine definition:

  • Start—When the state machine starts, it creates a parallel state to start both the CreateWorkSpace and CreateConnectUser states.
  • CreateWorkSpace—This task state runs the SFN-New-User-Flow Lambda function to create a new WorkSpace for the user. If this is successful, it goes to the End state.
  • WorkSpacesPassState—This pass state returns the result from the CreateWorkSpace state.
  • CreateConnectUse — This task state runs the SFN-New-User-Flow Lambda function to create a user in Amazon Connect. If this is successful, it goes to the End state.
  • ConnectPassState—This pass state returns the result from the CreateWorkSpace state.
  • End

The following diagram shows how these states relate to each other.

Step Functions State Machine

Test the solution

It’s time to test the solution. Create a user in AWS Managed Microsoft AD. The new user should have the following attributes:

  • First name: SFNFirst
  • Last name: SFNLast
  • Email: [email protected]
  • Username: SFNFirst.SFNLast

This starts a new state machine execution in Step Functions. Here’s the flow:

  1. When there is a user creation event (Event ID: 4720) in the AWS Managed Microsoft AD security log, CloudWatch invokes the Lambda function, DS-Log-Stream-Function, to start a new state machine execution in Step Functions.
  2. To create a new WorkSpace and create a user in the Amazon Connect instance, the state machine execution runs tasks to invoke the other Lambda function, SFN-New-User-Flow.

Conclusion

This solution automates the initial user registration workflow. Step Functions provides the flexibility to customize the workflow to meet your needs. This walkthrough included Amazon WorkSpaces and Amazon Connect; both services are used to register the new user. For organizations that create a number of new users on a regular basis, this new user automation workflow can save time when configuring resources for a new user.

The event source of the automation workflow can be any event that triggers the new user workflow, so the event source isn’t limited to CloudWatch Logs. Also, the integrated service used for new user registration can be any AWS service that offers API and works with AWS Managed Microsoft AD. Other programmatically accessible services within or outside AWS can also fill that role.

In this post, I showed you how serverless workflows can streamline and coordinate user creation activities. Step Functions provides this functionality, with the help of Lambda, Amazon WorkSpaces, AWS Managed Microsoft AD, and Amazon Connect. Together, these services offer increased power and functionality when managing users, monitoring security, and integrating with compatible AWS services.

from AWS Compute Blog

New: Using Amazon EC2 Instance Connect for SSH access to your EC2 Instances

New: Using Amazon EC2 Instance Connect for SSH access to your EC2 Instances

This post is courtesy of Saloni Sonpal – Senior Product Manager – Amazon EC2

Today, AWS is introducing Amazon EC2 Instance Connect, a new way to control SSH access to your EC2 instances using AWS Identity and Access Management (IAM).

About Amazon EC2 Instance Connect

While infrastructure as code (IaC) tools such as Chef and Puppet have become customary in the industry for configuring servers, you occasionally must access your instances to fine-tune, consult system logs, or debug application issues. The most common tool to connect to Linux servers is Secure Shell (SSH). It was created in 1995 and is now installed by default on almost every Linux distribution.

When connecting to hosts via SSH, SSH key pairs are often used to individually authorize users. As a result, organizations have to store, share, manage access for, and maintain these SSH keys.

Some organizations also maintain bastion hosts, which help limit network access into hosts by the use of a single jump point. They provide logging and prevent rogue SSH access by adding an additional layer of network obfuscation. However, running bastion hosts comes with challenges. You maintain the installed user keys, handle rotation, and make sure that the bastion host is always available and, more importantly, secured.

Amazon EC2 Instance Connect simplifies many of these issues and provides the following benefits to help improve your security posture:

  • Centralized access control – You get centralized access control to your EC2 instances on a per-user and per-instance level. IAM policies and principals remove the need to share and manage SSH keys.
  • Short-lived keys – SSH keys are not persisted on the instance, but are ephemeral in nature. They are only accessible by an instance at the time that an authorized user connects, making it easier to grant or revoke access in real time. This also allows you to move away from long-lived keys. Instead, you generate one-time SSH keys each time that an authorized user connects, eliminating the need to track and maintain keys.
  • Auditability – User connections via EC2 Instance Connect are logged to AWS CloudTrail, providing the visibility needed to easily audit connection requests and maintain compliance.
  • Ubiquitous access – EC2 Instance Connect works seamlessly with your existing SSH client. You can also connect to your instances from a new browser-based SSH client in the EC2 console, providing a consistent experience without having to change your workflows or tools.

How EC2 Instance Connect works

When the EC2 Instance Connect feature is enabled on an instance, the SSH daemon (sshd) on that instance is configured with a custom AuthorizedKeysCommand script. This script updates AuthorizedKeysCommand to read SSH public keys from instance metadata during the SSH authentication process, and connects you to the instance.

The SSH public keys are only available for one-time use for 60 seconds in the instance metadata. To connect to the instance successfully, you must connect using SSH within this time window. Because the keys expire, there is no need to track or manage these keys directly, as you did previously.

Configuring an EC2 instance for EC2 Instance Connect

To get started using EC2 Instance Connect, you first configure your existing instances. Currently, EC2 Instance Connect supports Amazon Linux 2 and Ubuntu. Install RPM or Debian packages respectively to enable the feature. New Amazon Linux 2 instances have the EC2 Instance Connect feature enabled by default, so you can connect to those newly launched instances right away using SSH without any further configuration.

First, configure an existing instance. In this case, set up an Amazon Linux 2 instance running in your account. For the steps for Ubuntu, see Set Up EC2 Instance Connect.

  1. Connect to the instance using SSH. The instance is running a relatively recent version of Amazon Linux 2:
    [[email protected] ~]$ uname -srv
    Linux 4.14.104-95.84.amzn2.x86_64 #1 SMP Fri Jun 21 12:40:53 UTC 2019
  2. Use the yum command to install the ec2-instance-connect RPM package.
    $ sudo yum install ec2-instance-connect
    Loaded plugins: extras_suggestions, langpacks, priorities, update-motd
    Resolving Dependencies
    --> Running transaction check
    ---> Package ec2-instance-connect.noarch 0:1.1-9.amzn2 will be installed
    --> Finished Dependency Resolution
    
    ........
    
    Installed:
      ec2-instance-connect.noarch 0:1.1-9.amzn2                                                                                                     
    
    Complete!

    This RPM installs a few scripts locally and changes the AuthorizedKeysCommand and AuthorizedKeysCommandUser configurations in /etc/ssh/sshd_config. If you are using a configuration management tool to manage your sshd configuration, install the package and add the lines as described in the documentation.

With ec2-instance-connect installed, you are ready to set up your users and have them connect to instances.

Set up IAM users

First, allow an IAM user to be able to push their SSH keys up to EC2 Instance Connect. Create a new IAM policy so that you can add it to any other users in your organization.

  1. In the IAM console, choose Policies, Create Policy.
  2. On the Create Policy page, choose JSON and enter the following JSON document. Replace $REGION and $ACCOUNTID with your own values:
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "ec2-instance-connect:SendSSHPublicKey"
                ],
                "Resource": [
                    "arn:aws:ec2:$REGION:$ACCOUNTID:instance/*"
                ],
                "Condition": {
                    "StringEquals": {
                        "ec2:osuser": "ec2-user"
                    }
                }
            }
        ]
    }

    Use ec2-user as the value for ec2:osuser with Amazon Linux 2. Specify this so that the metadata is made available for the proper SSH user. For more information, see Actions, Resources, and Condition Keys for Amazon EC2 Instance Connect Service.

  3. Choose Review Policy.
  4. On the Review Policy page, name the policy, give it a description, and choose Create Policy.
  5. On the Create Policy page, attach the policy to something by creating a new group (I named mine “HostAdmins”).
  6. On the Attach Policy page, attach the policy that you just created and choose Next Step.
  7. Choose Create Group.
  8. On the Groups page, select your newly created group and choose Group Actions, Add Users to Group.
  9. Select the user or users to add to this group, then choose Add Users.

Your users can now use EC2 Instance Connect.

Connecting to an instance using EC2 Instance Connect

With your instance configured and the users set with the proper policy, connect to your instance with your normal SSH client or directly, using the AWS Management Console.

To offer a seamless SSH experience, EC2 Instance Connect wraps up these steps in a command line tool. It also offers a browser-based interface in the console, which takes care of the SSH key generation and distribution for you.

To connect with your SSH client

  1. Generate the new private and public keys mynew_key and mynew_key.pub, respectively:
    $ ssh-keygen -t rsa -t mynew_key 
  2. Use the following AWS CLI command to authorize the user and push the public key to the instance using the send-ssh-public-key command. To support this, you need the latest version of the AWS CLI.
    $ aws ec2-instance-connect send-ssh-public-key --region us-east-1 --instance-id i-0989ec3292613a4f9 --availability-zone us-east-1f --instance-os-user ec2-user --ssh-public-key file://mynew_key.pub
    {
        "RequestId": "505f8675-710a-11e9-9263-4d440e7745c6", 
        "Success": true
    } 
  3. After authentication, the public key is made available to the instance through the instance metadata for 60 seconds. During this time, connect to the instance using the associated private key:
    $ ssh -i mynew_key [email protected]

If for some reason you don’t connect within that 60-second window, you see the following error:

$ ssh -i mynew_key [email protected]
Permission denied (publickey,gssapi-keyex,gssapi-with-mic).

If you do, run the send-ssh-public-key command again to connect using SSH.

Now, connect to your instance from the console.

To connect from the Amazon EC2 console

  1. Open the Amazon EC2 console.
  2. In the left navigation pane, choose Instances and select the instance to which to connect.
  3. Choose Connect.
  4. On the Connect To Your Instance page, choose EC2 Instance Connect (browser-based SSH connection), Connect.

The following terminal window opens and you are now connected through SSH to your instance.

Auditing with CloudTrail

For every connection attempt, you can also view the event details. These include the destination instance ID, OS user name, and public key, all used to make the SSH connection that corresponds to the SendSSHPublicKey API calls in CloudTrail.

In the CloudTrail console, search for SendSSHPublicKey.

If EC2 Instance Connect has been used recently, you should see records of your users having called this API operation to send their SSH key to the target host. Viewing the event’s details shows you the instance and other valuable information that you might want to use for auditing.

In the following example, see the JSON from a CloudTrail event that shows the SendSSHPublicKey command in use:

{
    "eventVersion": "1.05",
    "userIdentity": {
        "type": "User",
        "principalId": "1234567890",
        "arn": "arn:aws:iam:: 1234567890:$USER",
        "accountId": "1234567890",
        "accessKeyId": "ABCDEFGHIJK3RFIONTQQ",
        "userName": "$ACCOUNT_NAME",
        "sessionContext": {
            "attributes": {
                "mfaAuthenticated": "true",
                "creationDate": "2019-05-07T18:35:18Z"
            }
        }
    },
    "eventTime": "2019-06-21T13:58:32Z",
    "eventSource": "ec2-instance-connect.amazonaws.com",
    "eventName": "SendSSHPublicKey",
    "awsRegion": "us-east-1",
    "sourceIPAddress": "34.204.194.237",
    "userAgent": "aws-cli/1.15.61 Python/2.7.16 Linux/4.14.77-70.59.amzn1.x86_64 botocore/1.10.60",
    "requestParameters": {
        "instanceId": "i-0989ec3292613a4f9",
        "osUser": "ec2-user",
        "SSHKey": {
            "publicKey": "ssh-rsa <removed>\\n"
        }
    },
    "responseElements": null,
    "requestID": "df1a5fa5-710a-11e9-9a13-cba345085725",
    "eventID": "070c0ca9-5878-4af9-8eca-6be4158359c4",
    "eventType": "AwsApiCall",
    "recipientAccountId": "1234567890"
}

If you’ve configured your AWS account to collect CloudTrail events in an S3 bucket, you can download and audit the information programmatically. For more information, see Getting and Viewing Your CloudTrail Log Files.

Conclusion

EC2 Instance Connect offers an alternative to complicated SSH key management strategies and includes the benefits of using built-in auditability with CloudTrail. By integrating with IAM and the EC2 instance metadata available on all EC2 instances, you get a secure way to distribute short-lived keys and control access by IAM policy.

There are some additional features in the works for EC2 Instance Connect. In the future, AWS hopes to launch tag-based authorization, which allows you to use resource tags in the condition of a policy to control access. AWS also plans to enable EC2 Instance Connect by default in popular Linux distros in addition to Amazon Linux 2.

EC2 Instance Connect is available now at no extra charge in the US East (Ohio and N. Virginia), US West (N. California and Oregon), Asia Pacific (Mumbai, Seoul, Singapore, Sydney, and Tokyo), Canada (Central), EU (Frankfurt, Ireland, London, and Paris), and South America (São Paulo) AWS Regions.

from AWS Compute Blog