Month: April 2019

New – Amazon S3 Batch Operations

AWS customers routinely store millions or billions of objects in individual Amazon Simple Storage Service (S3) buckets, taking advantage of S3’s scale, durability, low cost, security, and storage options. These customers store images, videos, log files, backups, and other mission-critical data, and use S3 as a crucial part of their data storage strategy.

Batch Operations
Today, I would like to tell you about Amazon S3 Batch Operations. You can use this new feature to easily process hundreds, millions, or billions of S3 objects in a simple and straightforward fashion. You can copy objects to another bucket, set tags or access control lists (ACLs), initiate a restore from Glacier, or invoke an AWS Lambda function on each one.

This feature builds on S3’s existing support for inventory reports (read my S3 Storage Management Update post to learn more), and can use the reports or CSV files to drive your batch operations. You don’t have to write code, set up any server fleets, or figure out how to partition the work and distribute it to the fleet. Instead, you create a job in minutes with a couple of clicks, turn it loose, and sit back while S3 uses massive, behind-the-scenes parallelism to take care of the work. You can create, monitor, and manage your batch jobs using the S3 Console, the S3 CLI, or the S3 APIs.
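
Everything below uses the console, but the same job can be created with a single CreateJob call. Here's a minimal sketch using the boto3 s3control client; the account ID, ARNs, bucket names, and tag values are placeholders, and a real manifest ETag is required:

import boto3

s3control = boto3.client('s3control')

response = s3control.create_job(
    AccountId='123456789012',                  # placeholder account ID
    ConfirmationRequired=False,                # API-created jobs can skip the console confirmation step
    Operation={
        'S3PutObjectTagging': {
            'TagSet': [{'Key': 'reviewed', 'Value': 'true'}]   # illustrative tag
        }
    },
    Manifest={
        'Spec': {
            'Format': 'S3BatchOperations_CSV_20180820',
            'Fields': ['Bucket', 'Key']
        },
        'Location': {
            'ObjectArn': 'arn:aws:s3:::example-inventory-bucket/manifest.csv',  # hypothetical manifest object
            'ETag': 'replace-with-the-manifest-etag'
        }
    },
    Report={
        'Bucket': 'arn:aws:s3:::example-report-bucket',
        'Format': 'Report_CSV_20180820',
        'Enabled': True,
        'Prefix': 'completion-reports',
        'ReportScope': 'AllTasks'
    },
    Priority=10,
    RoleArn='arn:aws:iam::123456789012:role/ExampleBatchOperationsRole',        # hypothetical role
    ClientRequestToken='replace-all-tags-job-1'                                 # idempotency token
)
print(response['JobId'])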

A Quick Vocabulary Lesson
Before we get started and create a batch job, let’s review and introduce a couple of important terms:

Bucket – An S3 bucket holds a collection of any number of S3 objects, with optional per-object versioning.

Inventory Report – An S3 inventory report is generated each time a daily or weekly bucket inventory is run. A report can be configured to include all of the objects in a bucket, or to focus on a prefix-delimited subset.

Manifest – A list (either an Inventory Report, or a file in CSV format) that identifies the objects to be processed in the batch job; there's a short sketch of the CSV format right after this list.

Batch Action – The desired action on the objects described by a Manifest. Applying an action to an object constitutes an S3 Batch Task.

IAM Role – An IAM role that provides S3 with permission to read the objects in the inventory report, perform the desired actions, and to write the optional completion report. If you choose Invoke AWS Lambda function as your action, the function’s execution role must grant permission to access the desired AWS services and resources.

Batch Job – References all of the items above. Each job has a status and a priority; higher priority (numerically) jobs take precedence over those with lower priority.
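
To make the CSV flavor of a Manifest concrete: each line names one object as bucket,key (with an optional version ID as a third column for versioned buckets), and keys should be URL-encoded. A tiny, hypothetical sketch that writes one:

import csv

# Hypothetical objects to process; each manifest row is bucket,key[,versionId]
objects = [
    ('example-bucket', 'photos/img-0001.jpg'),
    ('example-bucket', 'photos/img-0002.jpg'),
]

with open('manifest.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    for bucket, key in objects:
        writer.writerow([bucket, key])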

Running a Batch Job
Ok, let’s use the S3 Console to create and run a batch job! In preparation for this blog post I enabled inventory reports for one of my S3 buckets (jbarr-batch-camera) earlier this week, with the reports routed to jbarr-batch-inventory:

I select the desired inventory item, and click Create job from manifest to get started (I can also click Batch operations while browsing my list of buckets). All of the relevant information is already filled in, but I can choose an earlier version of the manifest if I want (this option is only applicable if the manifest is stored in a bucket that has versioning enabled). I click Next to proceed:

I choose my operation (Replace all tags), enter the options that are specific to it (I’ll review the other operations later), and click Next:

I enter a name for my job, set its priority, and request a completion report that encompasses all tasks. Then I choose a bucket for the report and select an IAM Role that grants the necessary permissions (the console also displays a role policy and a trust policy that I can copy and use), and click Next:

Finally, I review my job, and click Create job:

The job enters the Preparing state. S3 Batch Operations checks the manifest and does some other verification, and the job enters the Awaiting your confirmation state (this only happens when I use the console). I select it and click Confirm and run:

I review the confirmation (not shown) to make sure that I understand the action to be performed, and click Run job. The job enters the Ready state, and starts to run shortly thereafter. When it is done it enters the Complete state:

If I were running a job that processed a substantially larger number of objects, I could refresh this page to monitor status. One important thing to know: after the first 1000 objects have been processed, S3 Batch Operations examines and monitors the overall failure rate, and will stop the job if the rate exceeds 50%.
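
The console is the easiest way to watch a job, but the same status and progress counters are available from the DescribeJob API. A quick sketch with placeholder IDs:

import boto3

s3control = boto3.client('s3control')

job = s3control.describe_job(
    AccountId='123456789012',     # placeholder account ID
    JobId='example-job-id'        # placeholder; use the JobId returned by CreateJob
)['Job']

print(job['Status'])              # e.g. Preparing, Ready, Active, Complete, Failed
print(job['ProgressSummary'])     # TotalNumberOfTasks, NumberOfTasksSucceeded, NumberOfTasksFailed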

The completion report contains one line for each of my objects, and looks like this:

Other Built-In Batch Operations
I don’t have enough space to give you a full run-through of the other built-in batch operations. Here’s an overview:

The PUT copy operation copies my objects, with control of the storage class, encryption, access control list, tags, and metadata:

I can copy objects to the same bucket to change their encryption status. I can also copy them to another region, or to a bucket owned by another AWS account.

The Replace Access Control List (ACL) operation does exactly that, with control over the permissions that are granted:

And the Restore operation initiates an object-level restore from the Glacier or Glacier Deep Archive storage class:
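
The console screenshots aren't reproduced here, but for a rough sense of the API shape, the restore described above corresponds to an Operation block like this in a CreateJob call (the values are just an illustration):

# Operation parameter for a batch job that restores archived objects
operation = {
    'S3InitiateRestoreObject': {
        'ExpirationInDays': 7,        # how long the restored copies remain available
        'GlacierJobTier': 'BULK'      # or 'STANDARD'
    }
}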

Invoking AWS Lambda Functions
I have saved the most general option for last. I can invoke a Lambda function for each object, and that Lambda function can programmatically analyze and manipulate each object. The Execution Role for the function must trust S3 Batch Operations:

Also, the Role for the Batch job must allow Lambda functions to be invoked.
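
The role screenshots aren't reproduced here either; as a sketch, the trust relationship names the S3 Batch Operations service principal, and the invoke permission targets the function (the function ARN below is hypothetical):

import json

# Trust relationship that lets S3 Batch Operations assume a role
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "batchoperations.s3.amazonaws.com"},
        "Action": "sts:AssumeRole"
    }]
}

# Statement that allows the batch job to invoke the function (placeholder ARN)
invoke_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": "lambda:InvokeFunction",
        "Resource": "arn:aws:lambda:us-east-1:123456789012:function:BatchProcessObject"
    }]
}

print(json.dumps(trust_policy, indent=2))
print(json.dumps(invoke_policy, indent=2))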

With the necessary roles in place, I can create a simple function that calls Amazon Rekognition for each image:

import boto3
def lambda_handler(event, context):
    s3Client = boto3.client('s3')
    rekClient = boto3.client('rekognition')
    
    # Parse job parameters
    jobId = event['job']['id']
    invocationId = event['invocationId']
    invocationSchemaVersion = event['invocationSchemaVersion']

    # Process the task
    task = event['tasks'][0]
    taskId = task['taskId']
    s3Key = task['s3Key']
    s3VersionId = task['s3VersionId']
    s3BucketArn = task['s3BucketArn']
    s3Bucket = s3BucketArn.split(':')[-1]
    print('BatchProcessObject(' + s3Bucket + "/" + s3Key + ')')
    resp = rekClient.detect_labels(Image={'S3Object':{'Bucket' : s3Bucket, 'Name' : s3Key}}, MaxLabels=10, MinConfidence=85)
    
    l = [lb['Name'] for lb in resp['Labels']]
    print(s3Key + ' - Detected:' + str(sorted(l)))

    results = [{
        'taskId': taskId,
        'resultCode': 'Succeeded',
        'resultString': 'Succeeded'
    }]
    
    return {
        'invocationSchemaVersion': invocationSchemaVersion,
        'treatMissingKeysAs': 'PermanentFailure',
        'invocationId': invocationId,
        'results': results
    }

With my function in place, I select Invoke AWS Lambda function as my operation when I create my job, and choose my BatchProcessObject function:

Then I create and confirm my job as usual. The function will be invoked for each object, taking advantage of Lambda’s ability to scale and allowing this moderately-sized job to run to completion in less than a minute:

I can find the “Detected” messages in the CloudWatch Logs Console:

As you can see from my very simple example, the ability to easily run Lambda functions on large numbers of S3 objects opens the door to all sorts of interesting applications.

Things to Know
I am looking forward to seeing and hearing about the use cases that you discover for S3 Batch Operations! Before I wrap up, here are some final thoughts:

Job Cloning – You can clone an existing job, fine-tune the parameters, and resubmit it as a fresh job. You can use this to re-run a failed job or to make any necessary adjustments.

Programmatic Job Creation – You could attach a Lambda function to the bucket where you generate your inventory reports and create a fresh batch job each time a report arrives. Jobs that are created programmatically do not need to be confirmed, and are immediately ready to execute.

CSV Object Lists – If you need to process a subset of the objects in a bucket and cannot use a common prefix to identify them, you can create a CSV file and use it to drive your job. You could start from an inventory report and filter the objects based on name or by checking them against a database or other reference. For example, perhaps you use Amazon Comprehend to perform sentiment analysis on all of your stored documents. You can process inventory reports to find documents that have not yet been analyzed and add them to a CSV file.

Job Priorities – You can have multiple jobs active at once in each AWS region. Your jobs with a higher priority take precedence, and can cause existing jobs to be paused momentarily. You can select an active job and click Update priority in order to make changes on the fly:
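
The priority screenshot isn't reproduced here; from code, the same change is a single UpdateJobPriority call (placeholder IDs):

import boto3

s3control = boto3.client('s3control')

# Move a job ahead of the other active jobs in this account and region
s3control.update_job_priority(
    AccountId='123456789012',     # placeholder account ID
    JobId='example-job-id',       # placeholder job ID
    Priority=100                  # numerically higher priorities take precedence
)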

Learn More
Here are some resources to help you learn more about S3 Batch Operations:

Documentation – Read about Creating a Job, Batch Operations, and Managing Batch Operations Jobs.

Tutorial Videos – Check out the S3 Batch Operations Video Tutorials to learn how to Create a Job, Manage and Track a Job, and to Grant Permissions.

Now Available
You can start using S3 Batch Operations in all commercial AWS regions except Asia Pacific (Osaka) today. S3 Batch Operations is also available in both of the AWS GovCloud (US) regions.

Jeff;

from AWS News Blog https://aws.amazon.com/blogs/aws/new-amazon-s3-batch-operations/

V2 AWS SDK for Go adds Context to API operations

The v2 AWS SDK for Go developer preview made a breaking change in the release of v0.8.0. The v0.8.0 release added a new parameter, context.Context, to the SDK’s Send and Paginate Next methods.

Context was added as a required parameter to the Send and Paginate Next methods to enable you to use the v2 SDK for Go in your application with cancellation and request tracing.

Using the Context pattern helps reduce the chance of code paths mistakenly dropping the Context, causing the cancellation and tracing chain to be lost. When the Context is lost, it can be difficult to track down the missing cancellation and tracing metrics within an application.

Migrating to v0.8.0

After you update your application to depend on v0.8.0 of the v2 SDK, you’ll encounter compile errors. This is because of the Context parameter that was added to the Send and Paginate Next methods.

If your application is already using the Context pattern, you can now pass the Context into Send and Paginate Next methods directly, instead of calling SetContext on the request returned by the client’s operation request method.

If you don’t need a Context within your application, you can use context.Background() or context.TODO() instead of a Context that carries a timeout, deadline, cancellation, or httptrace.ClientTrace.

Example code: before v0.8.0

The following code is an example of an application using the Amazon S3 service’s PutObject API operation with the v2 SDK before v0.8.0. The example code uses the req.SetContext method to specify the Context for the PutObject operation.

func uploadObject(ctx context.Context, bucket, key string, obj io.ReadSeeker) error {
	req := svc.PutObjectRequest(&s3.PutObjectInput{
		Bucket: &bucket,
		Key:    &key,
		Body:   obj,
	})
	req.SetContext(ctx)

	_, err := req.Send()
	return err
}

Example code: updated to v0.8.0

To migrate the previous example code to use v0.8.0 of the v2 SDK, we need to remove the req.SetContext method call, and pass the Context directly to the Send method instead. This change will make the example code compatible with v0.8.0 of the v2 SDK.

func uploadObject(ctx context.Context, bucket, key string, obj io.ReadSeeker) error {
	req := svc.PutObjectRequest(&s3.PutObjectInput{
		Bucket: &bucket,
		Key:    &key,
		Body:   obj,
	})

	_, err := req.Send(ctx)
	return err
}

What’s next for the v2 SDK for Go developer preview?

We’re working to improve usability and reduce pain points with the v2 SDK. Two specific areas we’re looking at are the SDK’s request lifecycle and error handling.

Improving the SDK’s request lifecycle will help reduce your application’s CPU and memory usage when using the SDK. It also makes it easier for you to extend and modify the SDK’s core functionality.

For the SDK’s error handling, we’re investigating alternative approaches, such as typed errors for API operation exceptions. By using typed errors, your application can assert directly against the error type. This would reduce the need to do string comparisons for SDK API operation response errors.

See our issues on GitHub to share your feedback, questions, and feature requests, and to stay current with the v2 AWS SDK for Go developer preview as it moves to GA.

from AWS Developer Blog https://aws.amazon.com/blogs/developer/v2-aws-sdk-for-go-adds-context-to-api-operations/

Announcing AWS X-Ray Analytics – An Interactive approach to Trace Analysis

AWS X-Ray now includes Analytics, an interactive approach to analyzing user request paths (i.e., traces). Analytics allows you to easily understand how your application and its underlying services are performing. With X-Ray Analytics, you can quickly detect application issues, pinpoint their root cause, determine their severity, and identify which end users were impacted.

from What’s New https://aws.amazon.com/about-aws/whats-new/2019/04/aws_x_ray_interactive_approach_analyze_traces/

AWS Certificate Manager Private Certificate Authority Increases Certificate Limit To One Million

AWS Certificate Manager (ACM) Private Certificate Authority (CA) has increased the per-account limits for certificate creation and certificate revocation. The limit on the lifetime total of certificates generated and revoked by a single CA has been increased from 50,000 to 1,000,000. This applies to all regions in which ACM Private CA is available. This limit increase is intended for organizations doing large-scale certificate deployment to IoT, endpoints, and devices.

from What’s New https://aws.amazon.com/about-aws/whats-new/2019/04/aws-certificate-manager-private-certificate-authority-increases-certificate-limit-to-one-million/

AWS Serverless Application Model (SAM) supports IAM permissions and custom responses for Amazon API Gateway

You can now use a single property setting in the AWS Serverless Application Model (AWS SAM) to control access using IAM permissions for all paths and methods of an Amazon API Gateway API. In addition, you can now configure custom responses for your APIs using simple AWS SAM syntax.  

from What’s New https://aws.amazon.com/about-aws/whats-new/2019/aws_serverless_application_Model_support_IAM/

New VMware Cloud on AWS Navigate Track

Today, we are excited to announce the new VMware Cloud on AWS Navigate track.

VMware Cloud on AWS delivers a seamlessly integrated hybrid cloud solution that extends on-premises VMware environments to elastic, bare-metal AWS infrastructure that is fully integrated as part of the AWS Cloud. VMware Cloud on AWS brings the capabilities of VMware’s enterprise-class Software-Defined Data Center (SDDC) technologies to the AWS Cloud, and enables customers to run any application across vSphere-based private, public, and hybrid cloud environments.

The VMware Cloud on AWS Navigate track creates a prescriptive journey for APN Partners who want to build expertise in supporting AWS Customer projects for VMware Cloud solutions on AWS.

Learn more about the AWS Navigate Program, and all the AWS Navigate tracks available, on the APN blog.

from What’s New https://aws.amazon.com/about-aws/whats-new/2019/04/vmware-navigate-track/

Announcing General Availability of Amazon Managed Blockchain for Hyperledger Fabric

Amazon Web Services (AWS) announces general availability of Amazon Managed Blockchain, which is a fully managed service that makes it easy to create and manage scalable blockchain networks using the popular open source frameworks Hyperledger Fabric and Ethereum. Hyperledger Fabric is available today. Ethereum is coming soon.

from What’s New https://aws.amazon.com/about-aws/whats-new/2019/04/introducing-amazon-managed-blockchain/

Amazon ECR now supports AWS PrivateLink in AWS GovCloud (US-West)

Amazon Elastic Container Registry (ECR) now has support for AWS PrivateLink in the AWS GovCloud (US-West) Region. AWS PrivateLink is a networking technology designed to enable access to AWS services in a highly available and scalable manner, while keeping all the network traffic within the AWS network. When you create an AWS PrivateLink endpoint for Amazon ECR, the service endpoints appear as elastic network interfaces with a private IP address in your Amazon Virtual Private Cloud (VPC).
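
As a rough, hypothetical sketch of that setup using boto3 (the VPC, subnet, and security group IDs are placeholders, the service name should be confirmed for your Region, and ECR also relies on an Amazon S3 gateway endpoint for image layers):

import boto3

ec2 = boto3.client('ec2', region_name='us-gov-west-1')

# Interface endpoint for the ECR Docker registry (placeholder IDs)
ec2.create_vpc_endpoint(
    VpcEndpointType='Interface',
    VpcId='vpc-0123456789abcdef0',
    ServiceName='com.amazonaws.us-gov-west-1.ecr.dkr',
    SubnetIds=['subnet-0123456789abcdef0'],
    SecurityGroupIds=['sg-0123456789abcdef0'],
    PrivateDnsEnabled=True
)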

from What’s New https://aws.amazon.com/about-aws/whats-new/2019/04/amazon-ecr-now-supports-aws-privatelink-in-aws-govcloud–us-west/