Netflix Studio Hack Day — May 2019

By Tom Richards, Carenina Garcia Motion, and Marlee Tart

Hack Days are a big deal at Netflix. They’re a chance to bring together employees from all our different disciplines to explore new ideas and experiment with emerging technologies.

For the most recent hack day, we channeled our creative energy towards our studio efforts. The goal remained the same: team up with new colleagues and have fun while learning, creating, and experimenting. We know even the silliest idea can spur something more.

The most important value of hack days is that they support a culture of innovation. We believe in this work, even if it never ships, and love to share the creativity and thought put into these ideas.

Below, you can find videos made by the hackers of some of our favorite hacks from this event.

Project Rumble Pak

You’re watching your favorite episode of Voltron when, after a suspenseful pause, there’s a huge explosion — and your phone starts to vibrate in your hands.

The Project Rumble Pak hack day project explores how haptics can enhance the content you’re watching. With every explosion, sword clank, and laser blast, you get force feedback to amp up the excitement.

For this project, we synchronized Netflix content with haptic effects using Immersion Corporation technology.

By Hans van de Bruggen and Ed Barker

The Voice of Netflix

Introducing The Voice of Netflix. We trained a neural net to spot words in Netflix content and reassemble them into new sentences on demand. For our stage demonstration, we hooked this up to a speech recognition engine to respond to our verbal questions in the voice of Netflix’s favorite characters. Try it out yourself at blogofsomeguy.com/v!

By Guy Cirino and Carenina Garcia Motion

TerraVision

TerraVision re-envisions the creative process and revolutionizes the way our filmmakers can search for and discover filming locations. Filmmakers can drop a photo of a look they like into an interface and find the closest visual matches from our centralized library of location photos. We are using a computer vision model trained to recognize places to build reverse image search functionality. The model converts each image into a low-dimensional vector, and matches are obtained by computing the nearest neighbors of the query.

By Noessa Higa, Ben Klein, Jonathan Huang, Tyler Childs, Tie Zhong, and Kenna Hasson

Get Out!

Have you ever found yourself needing to give the Evil Eye™ to colleagues who are hogging your conference room after their meeting has ended?

Our hack is a simple web application that lets employees select a Netflix meeting room anywhere in the world and press a button to kick people out if they have overstayed their meeting. First, the app looks up calendar events associated with the room and finds the latest meeting that should have already ended. It then automatically calls in to that meeting and plays walk-off music, similar to the Oscars, to not-so-subtly encourage your colleagues to Get Out! We built this hack using Java (Spring Boot framework), the Google OAuth and Calendar APIs (for finding rooms), and the Twilio API (for calling into the meeting), and deployed it on AWS.
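
As an illustration of that flow (the hack itself was Java and Spring Boot, not Python), here is a hedged Python sketch; the room calendar ID, phone numbers, and TwiML URL are hypothetical, and the Google Calendar and Twilio clients assume already-configured credentials:

# Hypothetical sketch of the "Get Out!" flow; not the hack's actual code.
from datetime import datetime, timezone

from googleapiclient.discovery import build  # Google Calendar API client
from twilio.rest import Client               # Twilio REST client

def find_overdue_meeting(calendar, room_id):
    """Return the latest event in the room that should have already ended."""
    now = datetime.now(timezone.utc).isoformat()
    events = calendar.events().list(
        calendarId=room_id, timeMax=now, singleEvents=True,
        orderBy="startTime", maxResults=50,
    ).execute().get("items", [])
    overdue = [e for e in events if e["end"].get("dateTime", "") < now]
    return overdue[-1] if overdue else None

calendar = build("calendar", "v3")  # assumes OAuth credentials are configured
meeting = find_overdue_meeting(calendar, "room-4a@resource.calendar.google.com")
if meeting:
    # Dial in to the room's conference line and play walk-off music (TwiML <Play>).
    twilio = Client("ACCOUNT_SID", "AUTH_TOKEN")
    twilio.calls.create(
        to="+15555550100",     # the meeting's dial-in number (hypothetical)
        from_="+15555550199",  # our Twilio number (hypothetical)
        url="https://example.com/walkoff-music.xml",
    )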

By Abi Seshadri and Rachel Rivera

You can also check out highlights from our past events: November 2018, March 2018, August 2017, January 2017, May 2016, November 2015, March 2015, February 2014 & August 2014.

Thanks to all the teams who put together a great round of hacks in 24 hours.



from Netflix TechBlog – Medium https://medium.com/netflix-techblog/netflix-studio-hack-day-may-2019-b4a0ecc629eb?source=rss—-2615bd06b42e—4

Referencing the AWS SDK for .NET Standard 2.0 from Unity, Xamarin, or UWP

In March 2019, AWS announced support for .NET Standard 2.0 in the AWS SDK for .NET, along with plans to remove the Portable Class Library (PCL) assemblies from the NuGet packages in favor of the .NET Standard 2.0 binaries.

If you’re starting a new project targeting a platform supported by .NET Standard 2.0, especially recent versions of Unity, Xamarin, and UWP, you may want to use the .NET Standard 2.0 assemblies of the AWS SDK instead of the PCL assemblies.

Currently, it’s challenging to consume .NET Standard 2.0 assemblies from NuGet packages directly in your Unity, Xamarin, or UWP applications. Unfortunately, the new csproj file format and NuGet don’t let you select assemblies for a specific target framework (in this case, .NET Standard 2.0). This limitation can cause problems because NuGet always restores the assemblies for the target framework of the project being built (in this case, one of the legacy PCL assemblies).

Given this limitation, our guidance is to have your application reference the AWS SDK assemblies (DLL files) directly instead of through the NuGet packages. To do that:

  1. Go to the NuGet page for the specific package (for example, AWSSDK.Core) and choose Download Package.
  2. Rename the downloaded .nupkg file with a .zip extension.
  3. Open it to extract the assemblies for a specific target framework (for example /lib/netstandard2.0/AWSSDK.Core.dll).
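
For Xamarin or UWP projects, one way to consume the extracted DLLs is a plain MSBuild Reference with a HintPath. A minimal sketch, assuming the DLLs were copied into a libs folder of your choosing:

<!-- Sketch: directly referencing an extracted SDK assembly from a csproj. -->
<!-- The libs\ path is an assumption; point HintPath wherever you copied the DLLs. -->
<ItemGroup>
  <Reference Include="AWSSDK.Core">
    <HintPath>libs\AWSSDK.Core.dll</HintPath>
  </Reference>
</ItemGroup>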

When using Unity (2018.1 or newer), choose .NET 4.x Equivalent as the Scripting Runtime Version and copy the AWS SDK for .NET assemblies into the Assets folder.

Because this process can be time-consuming and error-prone, you should use a script to perform the download and extraction, especially if your project references multiple AWS services. The following PowerShell script downloads and extracts all the latest SDK .dll files into the current folder:

<#
.Synopsis
    Downloads all assemblies of the AWS SDK for .NET for a specific target framework.
.DESCRIPTION
    Downloads all assemblies of the AWS SDK for .NET for a specific target framework.
    This script allows specifying a version of the SDK to download or a target framework.

.NOTES
    This script downloads all files to the current folder (the folder returned by Get-Location).
    This script depends on GitHub to retrieve the list of assemblies to download and on NuGet
    to retrieve the relative packages.

.EXAMPLE
   ./DownloadSDK.ps1

   Downloads the latest AWS SDK for .NET assemblies for .NET Standard 2.0.

.EXAMPLE
    ./DownloadSDK.ps1 -TargetFramework net35

    Downloads the latest AWS SDK for .NET assemblies for .NET Framework 3.5.
    
.EXAMPLE
    ./DownloadSDK.ps1 -SDKVersion 3.3.0.0

    Downloads the AWS SDK for .NET version 3.3.0.0 assemblies for .NET Standard 2.0.

.PARAMETER TargetFramework
    The name of the target framework for which to download the AWS SDK for .NET assemblies. It must be a valid Target Framework Moniker, as described in https://docs.microsoft.com/en-us/dotnet/standard/frameworks.

.PARAMETER SDKVersion
    The AWS SDK for .NET version to download. This must be in the full four-number format (e.g., "3.3.0.0") and it must correspond to a tag on the https://github.com/aws/aws-sdk-net/ repository.
#>

Param (
    [Parameter()]
    [ValidateNotNullOrEmpty()]
    [string]$TargetFramework = 'netstandard2.0',
    [Parameter()]
    [ValidateNotNullOrEmpty()]
    [string]$SDKVersion = 'master'
)

function DownloadPackageAndExtractDll
{
    Param (
        [Parameter(Mandatory = $true)]
        [string] $name,
        [Parameter(Mandatory = $true)]
        [string] $version
    )

    Write-Progress -Activity "Downloading $name"

    $packageUri = "https://www.nuget.org/api/v2/package/$name/$version"
    $filePath = [System.IO.Path]::GetTempFileName()
    # WebClient.DownloadFile is noticeably faster than Invoke-WebRequest for binary downloads.
    $WebClient.DownloadFile($packageUri, $filePath)
    try {
        $zipArchive = [System.IO.Compression.ZipFile]::OpenRead($filePath)
        $entry = $zipArchive.GetEntry("lib/$TargetFramework/$name.dll")
        if ($null -ne $entry)
        {
            $entryStream = $entry.Open()
            $dllPath = Get-Location | Join-Path -ChildPath "./$name.dll"
            $dllFileStream = [System.IO.File]::OpenWrite($dllPath)
            $entryStream.CopyTo($dllFileStream)
            $dllFileStream.Close();
        }
    }
    finally {
        if ($null -ne $dllFileStream)
        {
            $dllFileStream.Dispose()
        }
        if ($null -ne $entryStream)
        {
            $entryStream.Dispose()
        }
        if ($null -ne $zipArchive)
        {
            $zipArchive.Dispose()
        }
        Remove-Item $filePath
    }
}

try {
    $WebClient = New-Object System.Net.WebClient
    Add-Type -AssemblyName System.IO.Compression.FileSystem

    $sdkVersionsUri = "https://raw.githubusercontent.com/aws/aws-sdk-net/$SDKVersion/generator/ServiceModels/_sdk-versions.json"
    $versions = Invoke-WebRequest $sdkVersionsUri | ConvertFrom-Json
    DownloadPackageAndExtractDll "AWSSDK.Core" $versions.CoreVersion
    foreach ($service in $versions.ServiceVersions.psobject.Properties)
    {
        DownloadPackageAndExtractDll "AWSSDK.$($service.Name)" $service.Value.Version
    }    
}
finally {
    if ($null -ne $WebClient)
    {
        $WebClient.Dispose()
    } 
}

At this time, not all features specific to the PCL and Unity SDK libraries have been ported over to .NET Standard 2.0. To suggest features, changes, or leave other feedback to make PCL and Unity development easier, open an issue on our aws-sdk-net-issues GitHub repo.

This workaround will only be needed until PCL assemblies are removed from the NuGet packages. At that time, restoring the NuGet packages from an iOS, Android or UWP project (either a Xamarin or Unity project) should result in the .NET Standard 2.0 assemblies being referenced and included in your build outputs.

from AWS Developer Blog https://aws.amazon.com/blogs/developer/referencing-the-aws-sdk-for-net-standard-2-0-from-unity-xamarin-or-uwp/

Predictive CPU isolation of containers at Netflix

By Benoit Rostykus, Gabriel Hartmann

Noisy Neighbors

We’ve all had noisy neighbors at one point in our life. Whether it’s at a cafe or through an apartment wall, it is always disruptive. The need for good manners in shared spaces turns out to be important not just for people, but for your Docker containers too.

When you’re running in the cloud, your containers are in a shared space; in particular, they share the memory hierarchy of the host instance’s CPUs.

Because microprocessors are so fast, computer architecture design has evolved towards adding various levels of caching between compute units and the main memory, in order to hide the latency of bringing the bits to the brains. However, the key insight here is that these caches are partially shared among the CPUs, which means that perfect performance isolation of co-hosted containers is not possible. If the container running on the core next to your container suddenly decides to fetch a lot of data from the RAM, it will inevitably result in more cache misses for you (and hence a potential performance degradation).

Linux to the rescue?

Traditionally it has been the responsibility of the operating system’s task scheduler to mitigate this performance isolation problem. In Linux, the current mainstream solution is CFS (Completely Fair Scheduler). Its goal is to assign running processes to time slices of the CPU in a “fair” way.

CFS is widely used and therefore well tested, and Linux machines around the world run with reasonable performance. So why mess with it? As it turns out, for the large majority of Netflix use cases, its performance is far from optimal. Titus is Netflix’s container platform. Every month, we run millions of containers on thousands of machines on Titus, serving hundreds of internal applications and customers. These applications range from critical low-latency services powering our customer-facing video streaming service to batch jobs for encoding or machine learning. Maintaining performance isolation between these different applications is critical to ensuring a good experience for internal and external customers.

We were able to meaningfully improve both the predictability and performance of these containers by taking some of the CPU isolation responsibility away from the operating system and moving towards a data driven solution involving combinatorial optimization and machine learning.

The idea

CFS operates by very frequently (every few microseconds) applying a set of heuristics which encapsulate a general concept of best practices around CPU hardware use.

Instead, what if we reduced the frequency of interventions (to every few seconds) but made better data-driven decisions regarding the allocation of processes to compute resources in order to minimize collocation noise?

One traditional way of mitigating CFS performance issues is for application owners to manually cooperate through the use of core pinning or nice values. However, we can automatically make better global decisions by detecting collocation opportunities based on actual usage information. For example, if we predict that container A is going to become very CPU intensive soon, then maybe we should run it on a different NUMA socket than container B, which is very latency-sensitive. This avoids thrashing B’s caches too much and evens out the pressure on the L3 caches of the machine.

Optimizing placements through combinatorial optimization

What the OS task scheduler is doing is essentially solving a resource allocation problem: I have X threads to run but only Y CPUs available; how do I allocate the threads to the CPUs to give the illusion of concurrency?

As an illustrative example, let’s consider a toy instance with 16 hyperthreads. It has 8 physical hyperthreaded cores, split across 2 NUMA sockets. Each hyperthread shares its L1 and L2 caches with its neighbor, and shares its L3 cache with the 7 other hyperthreads on the socket.

If we want to run container A on 4 threads and container B on 2 threads on this instance, we can compare what “bad” and “good” placement decisions look like.

The first placement is intuitively bad because we potentially create collocation noise between A and B on the first 2 cores through their L1/L2 caches, and on the socket through the L3 cache while leaving a whole socket empty. The second placement looks better as each CPU is given its own L1/L2 caches, and we make better use of the two L3 caches available.

Resource allocation problems can be efficiently solved through a branch of mathematics called combinatorial optimization, used for example for airline scheduling or logistics problems.

We formulate the problem as a Mixed Integer Program (MIP). Given a set of K containers each requesting a specific number of CPUs on an instance possessing d threads, the goal is to find a binary assignment matrix M of size (d, K) such that each container gets the number of CPUs it requested. The loss function and constraints contain various terms expressing a priori good placement decisions (a small code sketch follows the list), such as:

  • avoid spreading a container across multiple NUMA sockets (to avoid potentially slow cross-sockets memory accesses or page migrations)
  • don’t use hyper-threads unless you need to (to reduce L1/L2 thrashing)
  • try to even out pressure on the L3 caches (based on potential measurements of the container’s hardware usage)
  • don’t shuffle things too much between placement decisions
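
To make this concrete, here is a minimal cvxpy sketch of such a formulation for the toy 16-hyperthread instance above, keeping only the socket-spreading term; this is an illustration under simplified assumptions, not the production model:

# Toy placement MIP: 16 hyperthreads on 2 sockets, containers A and B
# requesting 4 and 2 threads. Only the "don't spread across sockets" term.
import cvxpy as cp
import numpy as np

d, K = 16, 2                           # hyperthreads, containers
requests = np.array([4, 2])            # threads requested per container
socket_of = np.repeat([0, 1], d // 2)  # socket id of each hyperthread

M = cp.Variable((d, K), boolean=True)  # M[i, j] = 1 iff thread i runs container j
constraints = [
    cp.sum(M, axis=0) == requests,     # each container gets what it requested
    cp.sum(M, axis=1) <= 1,            # each thread hosts at most one container
]

# Auxiliary binaries: does container j occupy socket s at all?
uses_socket = cp.Variable((2, K), boolean=True)
for s in (0, 1):
    threads = np.where(socket_of == s)[0]
    constraints.append(cp.sum(M[threads, :], axis=0) <= len(threads) * uses_socket[s, :])

# Minimizing total sockets used penalizes spreading a container across sockets.
problem = cp.Problem(cp.Minimize(cp.sum(uses_socket)), constraints)
problem.solve()  # requires a mixed-integer-capable solver (e.g., GLPK_MI or CBC)
print(np.rint(M.value))  # a (16, 2) binary assignment matrix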

Given the low-latency and low-compute requirements of the system (we certainly don’t want to spend too many CPU cycles figuring out how containers should use CPU cycles!), can we actually make this work in practice?

Implementation

We decided to implement the strategy through Linux cgroups since they are fully supported by CFS, by modifying each container’s cpuset cgroup based on the desired mapping of containers to hyper-threads. In this way a user-space process defines a “fence” within which CFS operates for each container. In effect we remove the impact of CFS heuristics on performance isolation while retaining its core scheduling capabilities.
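
Concretely, drawing such a fence amounts to writing the chosen thread ids into the container’s cpuset cgroup. A minimal sketch, assuming a cgroup v1 hierarchy and an illustrative per-container path (Titus’s actual layout may differ):

# Sketch: fence a container onto specific hyperthreads via its cpuset cgroup.
def set_cpuset(container_id, threads):
    cgroup = "/sys/fs/cgroup/cpuset/containers/%s" % container_id  # illustrative path
    cpus = ",".join(str(t) for t in threads)  # e.g. "0,1,8,9"
    with open(cgroup + "/cpuset.cpus", "w") as f:
        f.write(cpus)

# e.g. give a container both hyperthreads of two cores on socket 0
set_cpuset("container-a", [0, 1, 8, 9])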

This user-space process is a Titus subsystem called titus-isolate which works as follows. On each instance, we define three events that trigger a placement optimization:

  • add: A new container was allocated by the Titus scheduler to this instance and needs to be run
  • remove: A running container just finished
  • rebalance: CPU usage may have changed in the containers so we should reevaluate our placement decisions

We periodically enqueue rebalance events when no other event has recently triggered a placement decision.

Every time a placement event is triggered, titus-isolate queries a remote optimization service (running as a Titus service, hence also isolating itself… turtles all the way down) which solves the container-to-threads placement problem.

This service then queries a local GBRT model (retrained every couple of hours on weeks of data collected from the whole Titus platform) predicting the P95 CPU usage of each container in the coming 10 minutes (conditional quantile regression). The model contains both contextual features (metadata associated with the container: who launched it, image, memory and network configuration, app name…) as well as time-series features extracted from the last hour of historical CPU usage of the container collected regularly by the host from the kernel CPU accounting controller.
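
As an illustration of conditional quantile regression with a GBRT (the post doesn’t say which library is used, so scikit-learn stands in here, and the features and target are synthetic):

# Sketch: predicting a P95 via quantile loss in a gradient-boosted tree model.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.random((1000, 8))                    # stand-in contextual/time-series features
y = X[:, 0] * 4 + rng.gamma(2.0, 0.5, 1000)  # stand-in future CPU usage

# loss="quantile" with alpha=0.95 trains the model to predict the 95th percentile.
model = GradientBoostingRegressor(loss="quantile", alpha=0.95)
model.fit(X, y)
p95_usage = model.predict(X[:1])             # predicted P95 CPU usage for one container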

The predictions are then fed into a MIP which is solved on the fly. We’re using cvxpy as a nice generic symbolic front-end to represent the problem, which can then be fed into various open-source or proprietary MIP solver backends. Since MIPs are NP-hard, some care needs to be taken. We impose a hard time budget on the solver to drive the branch-and-cut strategy into a low-latency regime, with guardrails around the MIP gap to control the overall quality of the solution found.

The service then returns the placement decision to the host, which executes it by modifying the cpusets of the containers.

For example, at any moment in time, an r4.16xlarge with 64 logical CPUs might look like this (the color scale represents CPU usage).

Results

The first version of the system led to surprisingly good results. We reduced the overall runtime of batch jobs by several percent on average while, most importantly, reducing job runtime variance (a reasonable proxy for isolation), as illustrated below by a real-world batch job runtime distribution with and without improved isolation.

Notice how we mostly made the problem of long-running outliers disappear. The right-tail of unlucky noisy-neighbors runs is now gone.

For services, the gains were even more impressive. One specific Titus middleware service serving the Netflix streaming service saw a capacity reduction of 13% (a decrease of more than 1000 containers) needed at peak traffic to serve the same load with the required P99 latency SLA! We also noticed a sharp reduction of the CPU usage on the machines, since far less time was spent by the kernel in cache invalidation logic. Our containers are now more predictable, faster and the machine is less used! It’s not often that you can have your cake and eat it too.

Next Steps

We are excited about the strides made so far in this area and are working on multiple fronts to extend the solution presented here.

We want to extend the system to support CPU oversubscription. Most of our users have challenges knowing how to properly size the number of CPUs their app needs, and in fact this number varies during the lifetime of their containers. Since we already predict future CPU usage of the containers, we want to automatically detect and reclaim unused resources. For example, if we can detect our users’ sensitivity thresholds along various axes, we could auto-assign a specific container to a shared cgroup of underutilized CPUs to improve overall isolation and machine utilization.

We also want to leverage kernel PMC events to more directly optimize for minimal cache noise. One possible avenue is to use the Intel-based bare-metal instances recently introduced by Amazon that allow deep access to performance analysis tools. We could then feed this information directly into the optimization engine to move towards a more supervised learning approach. This would require a proper continuous randomization of the placements to collect unbiased counterfactuals, so we could build some sort of interference model (“what would be the performance of container A in the next minute, if I were to colocate one of its threads on the same core as container B, knowing that there’s also C running on the same socket right now?”).

Conclusion

If any of this piques your interest, reach out to us! We’re looking for ML engineers to help us push the boundary of container performance and “machine learning for systems,” and for systems engineers for our core infrastructure and compute platform.



from Netflix TechBlog – Medium https://medium.com/netflix-techblog/predictive-cpu-isolation-of-containers-at-netflix-91f014d856c7?source=rss—-2615bd06b42e—4

Making our Android Studio Apps Reactive with UI Components & Redux

By Juliano Moraes, David Henry, Corey Grunewald & Jim Isaacs

Recently, Netflix has started building mobile apps to bring technology and innovation to our Studio Physical Productions, the portion of the business responsible for producing our TV shows and movies.

Our very first mobile app is called Prodicle. It was built for Android and iOS using the same reactive architecture on both platforms, which allowed us to build two apps from scratch in three months with four software engineers.

The app helps production crews organize their shooting days through shooting milestones and keeps everyone in a production informed about what is currently happening.

Here is a shooting day for Glow Season 3.

We’ve been experimenting with an idea to use reactive components on Android for the last two years. While there are some frameworks that implement this, we wanted to stay very close to the Android native framework. It was extremely important to the team that we did not completely change the way our engineers write Android code.

We believe reactive components are the key foundation to achieve composable UIs that are scalable, reusable, unit testable and AB test friendly. Composable UIs contribute to fast engineering velocity and produce less side effect bugs.

Our current player UI in the Netflix Android app uses our first iteration of this componentization architecture. We took the opportunity while building Prodicle to improve upon what we learned from the Player UI, and to build the app from scratch using Redux, Components, and 100% Kotlin.

Overall Architecture

Fragments & Activities

— Fragment is not your view.

Having large Fragments or Activities causes all sorts of problems: it makes the code hard to read, maintain, and extend. Keeping them small helps with code encapsulation and better separation of concerns — the presentation logic should live inside a component or a class that represents a view, not in the Fragment.

This is how a clean Fragment looks in our app; there is no business logic. During onViewCreated we pass pre-inflated view containers and the global Redux store’s dispatch function.

UI Components

Components are responsible for owning their own XML layout and inflating themselves into a container. They implement a single render(state: ComponentState) interface and have their state defined by a Kotlin data class.

A component’s render method is a pure function that can easily be tested by creating permutations of possible states.

Dispatch functions are the way components fire actions to change app state, make network requests, communicate with other components, etc.

A component defines its own state as a data class at the top of the file; that is the type the render loop passes to its render() function when invoking it.

It receives a ViewGroup container that will be used to inflate the component’s own layout file, R.layout.list_header in this example.

All the Android views are instantiated using a lazy approach and the render function is the one that will set all the values in the views.

Layout

All of these components are independent by design, which means they do not know anything about each other, but somehow we need to layout our components within our screens. The architecture is very flexible and provides different ways of achieving it:

  1. Self-inflation into a container: A component receives a ViewGroup as a container in its constructor and inflates itself using a LayoutInflater. Useful when the screen has a skeleton of containers or is a LinearLayout.
  2. Pre-inflated views: A component accepts a View in its constructor, so there is no need to inflate it. This is used when the layout is owned by the screen in a single XML file.
  3. Self-inflation into a ConstraintLayout: A component inflates itself into a ConstraintLayout available in its constructor and exposes a getMainViewId to be used by the parent to set constraints programmatically.

Redux

Redux provides an event-driven, unidirectional data flow architecture through a global, centralized application state that can only be mutated by Actions, which are processed by Reducers. When the app state changes, it cascades down to all the subscribed components.
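
Since the app’s Kotlin sources aren’t reproduced in this post, here is a minimal, language-agnostic sketch of the pattern (shown in Python for brevity); the state shape, actions, and skip-repeats subscription are illustrative:

# Sketch of a Redux-style store: state mutated only by actions through a reducer,
# with selector-based subscriptions that skip repeated values.
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class AppState:
    back_stack: tuple = ()
    milestones: tuple = ()

def reducer(state, action):
    if action["type"] == "PUSH_ROUTE":
        return replace(state, back_stack=state.back_stack + (action["route"],))
    if action["type"] == "SET_MILESTONES":
        return replace(state, milestones=tuple(action["milestones"]))
    return state

class Store:
    def __init__(self, state):
        self._state, self._subscribers = state, []

    def subscribe(self, selector, callback):
        self._subscribers.append((selector, [selector(self._state)], callback))

    def dispatch(self, action):
        self._state = reducer(self._state, action)
        for selector, last, callback in self._subscribers:
            selected = selector(self._state)
            if selected != last[0]:  # "select with skip repeats"
                last[0] = selected
                callback(selected)   # cascade to the subscribed component

store = Store(AppState())
store.subscribe(lambda s: s.milestones, lambda m: print("render:", m))
store.dispatch({"type": "SET_MILESTONES", "milestones": ["scene 1"]})
store.dispatch({"type": "PUSH_ROUTE", "route": "details"})  # milestones unchanged: no render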

Having a centralized app state makes disk persistence very simple using serialization. It also provides, for free, the ability to rewind actions that have affected the state. After persisting the current state to disk, the next app launch puts the user in exactly the same state they were in before. This removes all the boilerplate associated with Android’s onSaveInstanceState() and onRestoreInstanceState().

The Android FragmentManager has been abstracted away in favor of Redux-managed navigation. Actions are fired to push, pop, and set the current route. Another component, the NavigationComponent, listens for changes to the back stack and handles the creation of new Screens.

The Render Loop

The render loop is the mechanism that iterates through all the components and invokes component.render() when needed.

Components need to subscribe to changes in the App State to have their render() called. For optimization purposes, they can specify a transformation function containing the portion of the App State they care about — using selectWithSkipRepeats prevents unnecessary render calls if a part of the state changes that the component does not care about.

The ComponentManager is responsible for subscribing and unsubscribing Components. It extends Android ViewModel to persist state on configuration change, and has a 1:1 association with Screens (Fragments). It is lifecycle aware and unsubscribes all the components when onDestroy is called.

Below is our fragment with its subscriptions and transformation functions:

ComponentManager code is below:

Recycler Views

Components should be flexible enough to work inside and outside of a list. To work with Android’s RecyclerView implementation we’ve created UIComponent and UIComponentForList; the only difference is that the latter extends a ViewHolder and does not subscribe directly to the Redux store.

Here is how all the pieces fit together.

Fragment:

The Fragment initializes a MilestoneListComponent, subscribes it to the Store, and implements the transformation function that defines how the global state is translated into the component state.

List Component:

A List Component uses a custom adapter that supports multiple component types, provides async diffing on a background thread through the adapter.update() interface, and invokes each item component’s render() function during onBind() of the list item.

Item List Component:

Item List Components can be used outside of a list; they look like any other component except that UIComponentForList extends Android’s ViewHolder class. Like any other component, it implements the render function based on a state data class it defines.

Unit Tests

Unit tests on Android are generally hard to implement and slow to run: we somehow need to mock all the dependencies — Activities, Context, Lifecycle, etc. — in order to start testing the code.

Because our components’ render methods are pure functions, we can easily test them by constructing states, without any additional dependencies.

In this unit test example, we initialize a UI Component inside before(), and for every test we directly invoke the render() function with a state that we define. There is no need for Activity initialization or any other dependency.

Conclusion & Next Steps

The first version of our app using this architecture was released a couple of months ago and we are very happy with the results we’ve achieved so far. It has proven to be composable, reusable, and testable — currently we have 60% unit test coverage.

Using a common architecture approach allows us to move very fast by having one platform implement a feature first and the other follow. Once the data layer, business logic, and component structure are figured out, it becomes very easy for the second platform to implement the same feature by translating the code from Kotlin to Swift or vice versa.

To fully embrace this architecture we’ve had to think a bit outside of the platform’s provided paradigms. The goal is not to fight the platform, but instead to smooth out some rough edges.



from Netflix TechBlog – Medium https://medium.com/netflix-techblog/making-our-android-studio-apps-reactive-with-ui-components-redux-5e37aac3b244?source=rss—-2615bd06b42e—4

Getting started with the AWS Cloud Development Kit and Python

This post introduces you to the new Python bindings for the AWS Cloud Development Kit (AWS CDK).

What’s the AWS CDK, you might ask? Good question! You are probably familiar with the concept of infrastructure as code (IaC). When you think of IaC, you might think of things like AWS CloudFormation.

AWS CloudFormation allows you to define your AWS infrastructure in JSON or YAML files that can be managed within your source code repository, just like any other code. You can do pull requests and code reviews. When everything looks good, you can use these files as input into an automated process (CI/CD) that deploys your infrastructure changes.

The CDK actually builds on AWS CloudFormation and uses it as the engine for provisioning AWS resources. Rather than using a declarative language like JSON or YAML to define your infrastructure, the CDK lets you do that in your favorite imperative programming language. This includes languages such as TypeScript, Java, C#, and now Python.

About this post

Time to read: 19 minutes
Time to complete (estimated): 30 minutes
Cost to complete: $0 with the free tier (a tiny fraction of a penny if you aren’t on the free tier)
Learning level: Intermediate (200)
Services used: AWS CDK, AWS CloudFormation

Why would an imperative language be better than a declarative language? Well, it may not always be, but there are some real advantages: IDE integration and composition.

IDE integration

You probably have your favorite IDE for your favorite programming language. It provides all kinds of useful features that make you a more productive developer (for example, code completion, integrated documentation, or refactoring tools).

With CDK, you automatically get all of those same advantages when defining your AWS infrastructure. That’s because you’re doing it in the same language that you use for your application code.

Composition

One of the things that modern programming languages do well is composition. By that, I mean the creation of new, higher-level abstractions that hide the details of what is happening underneath and expose a much simpler API. This is one of the main things that we do as developers, creating higher levels of abstraction to simplify code.

It turns out that this is also useful when defining your infrastructure. The existing APIs to AWS services are, by design, fairly low level because they are trying to expose as much functionality as possible to a broad audience of developers. IaC tools like AWS CloudFormation expose a declarative interface, but that interface is at the same level as the API, so it’s equally complex.

In contrast, CDK allows you to compose new abstractions that hide details and simplify common use cases. Then, it packages that code up as a library in your language of choice so that others can easily take advantage of it.

One of the other neat things about the CDK is that it is designed to support multiple programming languages. The core of the system is written in TypeScript, but bindings for other languages can be added.

That brings me back to the topic of this post, the Python bindings for CDK.

Sample Python application

First, there is some installation that must happen. Rather than describe all of that here, see Getting Started with the AWS CDK.

Create the application

Now, create a sample application.

$ mkdir my_python_sample
$ cd my_python_sample
$ cdk init
Available templates:
* app: Template for a CDK Application
└─ cdk init app --language=[csharp|fsharp|java|python|typescript]
* lib: Template for a CDK Construct Library
└─ cdk init lib --language=typescript
* sample-app: Example CDK Application with some constructs
└─ cdk init sample-app --language=[python|typescript]

The first thing you do is create a directory that contains your Python CDK sample. The CDK provides a CLI tool to make it easy to perform many CDK-related operations. You can see that you are running the init command with no parameters.

The CLI is responding with information about all the things that the init command can do. There are different types of apps that you can initialize and there are a number of different programming languages available. Choose sample-app and python, of course.

$ cdk init --language python sample-app
Applying project template sample-app for python
Initializing a new git repository...
Executing python -m venv .env
Welcome to your CDK Python project!

You should explore the contents of this template. It demonstrates a CDK app with two instances of a stack (`HelloStack`) which also uses a user-defined construct (`HelloConstruct`). 

The `cdk.json` file tells the CDK Toolkit how to execute your app.

This project is set up like a standard Python project. The initialization process also creates a virtualenv within this project, stored under the .env directory.

After the init process completes, you can use the following steps to get your project set up.

'''
$ source .env/bin/activate
$ pip install -r requirements.txt
'''

At this point you can now synthesize the CloudFormation template for this code.

'''
$ cdk synth
'''

You can now begin exploring the source code, contained in the hello directory. There is also a very trivial test included that can be run like this:

'''
$ pytest
'''

To add additional dependencies, for example other CDK libraries, just add to your requirements.txt file and rerun the pip install -r requirements.txt command.

Useful commands:

cdk ls          list all stacks in the app
cdk synth       emits the synthesized CloudFormation template
cdk deploy      deploy this stack to your default AWS account/region
cdk diff        compare deployed stack with current state
cdk docs        open CDK documentation

Enjoy!

So, what just happened? Quite a bit, actually. The CDK CLI created some Python source code for your sample application. It also created other support files and infrastructure to make it easy to get started with CDK in Python. Here’s what your directory contains now:

(.env) $ tree
.
├── README.md
├── app.py
├── cdk.json
├── hello
│   ├── __init__.py
│   ├── hello_construct.py
│   └── hello_stack.py
├── requirements.txt
├── setup.py
└── tests
    ├── __init__.py
    └── unit
        ├── __init__.py
        └── test_hello_construct.py

Take a closer look at the contents of your directory:

  • README.md—The introductory README for this project.
  • app.py—The “main” for this sample application.
  • cdk.json—A configuration file for CDK that defines what executable CDK should run to generate the CDK construct tree.
  • hello—A Python module directory.
    • hello_construct.py—A custom CDK construct defined for use in your CDK application.
    • hello_stack.py—A custom CDK stack construct for use in your CDK application.
  • requirements.txt—This file is used by pip to install all of the dependencies for your application. In this case, it contains only -e . This tells pip to install the requirements specified in setup.py. It also tells pip to run python setup.py develop to install the code in the hello module so that it can be edited in place.
  • setup.py—Defines how this Python package would be constructed and what the dependencies are.
  • tests—Contains all tests.
  • unit—Contains unit tests.
    • test_hello_construct.py—A trivial test of the custom CDK construct created in the hello package. This is mainly to demonstrate how tests can be hooked up to the project.

You may have also noticed that as the init command was running, it mentioned that it had created a virtualenv for the project as well. I don’t have time to go into virtualenvs in detail for this post. They are basically a great tool in the Python world for isolating your development environments from your system Python environment and from other development environments.

All dependencies are installed within this virtual environment and have no effect on anything else on your machine. When you are done with this example, you can just delete the entire directory and everything goes away.

You don’t have to use the virtualenv created here but I highly recommend that you do. Here’s how you would initialize your virtualenv and then install all of your dependencies.

$ source .env/bin/activate
(.env) $ pip install -r requirements.txt
...
(.env) $ pytest
============================= test session starts ==============================
platform darwin -- Python 3.7.0, pytest-4.4.0, py-1.8.0, pluggy-0.9.0
rootdir: /Users/garnaat/projects/cdkdev/my_sample
collected 1 item                                                              
tests/unit/test_hello_construct.py .                                     [100%]
=========================== 1 passed in 0.67 seconds ===========================

As you can see, you even have tests included, although they are admittedly simple at this point. They do give you a way to make sure that your sample application and all of its dependencies are installed correctly.

Generate an AWS CloudFormation template

Okay, now that you know what’s here, try to generate an AWS CloudFormation template for the constructs that you are defining in your CDK app. You use the CDK Toolkit (the CLI) to do this.

$ cdk synth 
Multiple stacks selected (hello-cdk-1, hello-cdk-2), but output is directed to stdout. Either select one stack, or use --output to send templates to a directory. 
$

Hmm, that was unexpected. What does this mean? Well, as you will see in a minute, your CDK app actually defines two stacks, hello-cdk-1 and hello-cdk-2. The synth command can only synthesize one stack at a time. It is telling you about the two that it has found and asking you to choose one of them.

$ cdk synth hello-cdk-1
Resources:
  MyFirstQueueFF09316A:
    Type: AWS::SQS::Queue
    Properties:
      VisibilityTimeout: 300
    Metadata:
      aws:cdk:path: hello-cdk-1/MyFirstQueue/Resource
  MyFirstQueueMyFirstTopicSubscription774591B6:
    Type: AWS::SNS::Subscription
    Properties:
      Protocol: sqs
      TopicArn:
        Ref: MyFirstTopic0ED1F8A4
      Endpoint:
        Fn::GetAtt:
          - MyFirstQueueFF09316A
          - Arn
    Metadata:
      aws:cdk:path: hello-cdk-1/MyFirstQueue/MyFirstTopicSubscription/Resource
  MyFirstQueuePolicy596EEC78:
    Type: AWS::SQS::QueuePolicy
    Properties:
      PolicyDocument:
        Statement:
          - Action: sqs:SendMessage
            Condition:
              ArnEquals:
                aws:SourceArn:
                  Ref: MyFirstTopic0ED1F8A4
            Effect: Allow
            Principal:
              Service: sns.amazonaws.com
            Resource:
              Fn::GetAtt:
                - MyFirstQueueFF09316A
                - Arn
        Version: "2012-10-17"
      Queues:
        - Ref: MyFirstQueueFF09316A
    Metadata:
      aws:cdk:path: hello-cdk-1/MyFirstQueue/Policy/Resource
  MyFirstTopic0ED1F8A4:
    Type: AWS::SNS::Topic
    Properties:
      DisplayName: My First Topic
    Metadata:
      aws:cdk:path: hello-cdk-1/MyFirstTopic/Resource
  MyHelloConstructBucket0DAEC57E1:
    Type: AWS::S3::Bucket
    DeletionPolicy: Retain
    Metadata:
      aws:cdk:path: hello-cdk-1/MyHelloConstruct/Bucket-0/Resource
  MyHelloConstructBucket18D9883BE:
    Type: AWS::S3::Bucket
    DeletionPolicy: Retain
    Metadata:
      aws:cdk:path: hello-cdk-1/MyHelloConstruct/Bucket-1/Resource
  MyHelloConstructBucket2C1DA3656:
    Type: AWS::S3::Bucket
    DeletionPolicy: Retain
    Metadata:
      aws:cdk:path: hello-cdk-1/MyHelloConstruct/Bucket-2/Resource
  MyHelloConstructBucket398A5DE67:
    Type: AWS::S3::Bucket
    DeletionPolicy: Retain
    Metadata:
      aws:cdk:path: hello-cdk-1/MyHelloConstruct/Bucket-3/Resource
  MyUserDC45028B:
    Type: AWS::IAM::User
    Metadata:
      aws:cdk:path: hello-cdk-1/MyUser/Resource
  MyUserDefaultPolicy7B897426:
    Type: AWS::IAM::Policy
    Properties:
      PolicyDocument:
        Statement:
          - Action:
              - s3:GetObject*
              - s3:GetBucket*
              - s3:List*
            Effect: Allow
            Resource:
              - Fn::GetAtt:
                  - MyHelloConstructBucket0DAEC57E1
                  - Arn
              - Fn::Join:
                  - ""
                  - - Fn::GetAtt:
                        - MyHelloConstructBucket0DAEC57E1
                        - Arn
                    - /*
          - Action:
              - s3:GetObject*
              - s3:GetBucket*
              - s3:List*
            Effect: Allow
            Resource:
              - Fn::GetAtt:
                  - MyHelloConstructBucket18D9883BE
                  - Arn
              - Fn::Join:
                  - ""
                  - - Fn::GetAtt:
                        - MyHelloConstructBucket18D9883BE
                        - Arn
                    - /*
          - Action:
              - s3:GetObject*
              - s3:GetBucket*
              - s3:List*
            Effect: Allow
            Resource:
              - Fn::GetAtt:
                  - MyHelloConstructBucket2C1DA3656
                  - Arn
              - Fn::Join:
                  - ""
                  - - Fn::GetAtt:
                        - MyHelloConstructBucket2C1DA3656
                        - Arn
                    - /*
          - Action:
              - s3:GetObject*
              - s3:GetBucket*
              - s3:List*
            Effect: Allow
            Resource:
              - Fn::GetAtt:
                  - MyHelloConstructBucket398A5DE67
                  - Arn
              - Fn::Join:
                  - ""
                  - - Fn::GetAtt:
                        - MyHelloConstructBucket398A5DE67
                        - Arn
                    - /*
        Version: "2012-10-17"
      PolicyName: MyUserDefaultPolicy7B897426
      Users:
        - Ref: MyUserDC45028B
    Metadata:
      aws:cdk:path: hello-cdk-1/MyUser/DefaultPolicy/Resource
  CDKMetadata:
    Type: AWS::CDK::Metadata
    Properties:
      Modules: aws-cdk=0.27.0,@aws-cdk/assets=0.27.0,@aws-cdk/aws-autoscaling-api=0.27.0,@aws-cdk/aws-cloudwatch=0.27.0,@aws-cdk/aws-codepipeline-api=0.27.0,@aws-cdk/aws-ec2=0.27.0,@aws-cdk/aws-events=0.27.0,@aws-cdk/aws-iam=0.27.0,@aws-cdk/aws-kms=0.27.0,@aws-cdk/aws-lambda=0.27.0,@aws-cdk/aws-logs=0.27.0,@aws-cdk/aws-s3=0.27.0,@aws-cdk/aws-s3-notifications=0.27.0,@aws-cdk/aws-sns=0.27.0,@aws-cdk/aws-sqs=0.27.0,@aws-cdk/aws-stepfunctions=0.27.0,@aws-cdk/cdk=0.27.0,@aws-cdk/cx-api=0.27.0,@aws-cdk/region-info=0.27.0,jsii-runtime=Python/3.7.0

That’s a lot of YAML. 147 lines to be exact. If you take some time to study this, you can probably understand all of the AWS resources that are being created. You could probably even understand why they are being created. Rather than go through that in detail right now, instead focus on the Python code that makes up your CDK app. It’s a lot shorter and a lot easier to understand.

First, look at your “main,” app.py.

#!/usr/bin/env python3

from aws_cdk import cdk
from hello.hello_stack import MyStack

app = cdk.App()

MyStack(app, "hello-cdk-1", env={'region': 'us-east-2'})
MyStack(app, "hello-cdk-2", env={'region': 'us-west-2'})

app.run()

Well, that’s short and sweet. You are creating an App, adding two instances of some class called MyStack to the app, and then calling the run method of the App object.

Now find out what’s going on in the MyStack class.

from aws_cdk import (
    aws_iam as iam,
    aws_sqs as sqs,
    aws_sns as sns,
    cdk
)

from hello.hello_construct import HelloConstruct

class MyStack(cdk.Stack):
    def __init__(self, app: cdk.App, id: str, **kwargs) -> None:
        super().__init__(app, id, **kwargs)

        queue = sqs.Queue(
            self, "MyFirstQueue",
            visibility_timeout_sec=300,
        )

        topic = sns.Topic(
            self, "MyFirstTopic",
            display_name="My First Topic"
        )

        topic.subscribe_queue(queue)

        hello = HelloConstruct(self, "MyHelloConstruct", num_buckets=4)
        user = iam.User(self, "MyUser")
        hello.grant_read(user)

This is a bit more interesting. This code is importing some CDK packages and then using those to create a few AWS resources.

First, you create an SQS queue called MyFirstQueue and set the queue’s visibility timeout. Then you create an SNS topic called MyFirstTopic.

The next line of code is interesting. You subscribe the SQS queue to the SNS topic, and it all happens in one simple, easy-to-understand line of code.

If you have ever done this with the SDKs or with the CLI, you know that there are several steps to this process. You have to create an IAM policy that grants the topic permission to send messages to the queue, you have to create a topic subscription, etc. You can see the details in the AWS CloudFormation stack generated earlier.

All of that gets simplified into a single, readable line of code. That’s an example of what CDK constructs can do to hide complexity in your infrastructure.

The final thing happening here is that you are creating an instance of a HelloConstruct class. Look at the code behind this.


from aws_cdk import (
     aws_iam as iam,
     aws_s3 as s3,
     cdk,
)

class HelloConstruct(cdk.Construct):

    @property
    def buckets(self):
        return tuple(self._buckets)

    def __init__(self, scope: cdk.Construct, id: str, num_buckets: int) -> None:
        super().__init__(scope, id)
        self._buckets = []
        for i in range(0, num_buckets):
            self._buckets.append(s3.Bucket(self, f"Bucket-{i}"))

    def grant_read(self, principal: iam.IPrincipal):
        for b in self.buckets:
            b.grant_read(principal, "*")

This code shows an example of creating your own custom constructs in CDK that define arbitrary AWS resources under the hood while exposing a simple API.

Here, your construct accepts an integer parameter num_buckets in the constructor and then creates that number of buckets inside the scope passed in. It also exposes a grant_read method that automatically grants the IAM principal passed in read permissions to all buckets associated with your construct.

Deploy the AWS CloudFormation templates

The whole point of CDK is to create AWS infrastructure and so far you haven’t done any of that. So now use your CDK program to generate the AWS CloudFormation templates. Then, deploy those templates to your AWS account and validate that the right resources got created.

$ cdk deploy
This deployment will make potentially sensitive changes according to your current security approval level (--require-approval broadening).
Please confirm you intend to make the following modifications:

IAM Statement Changes
┌───┬───────────────┬────────┬───────────────┬───────────────┬────────────────┐
│   │ Resource      │ Effect │ Action        │ Principal     │ Condition      │
├───┼───────────────┼────────┼───────────────┼───────────────┼────────────────┤
│ + │ ${MyFirstQueu │ Allow  │ sqs:SendMessa │ Service:sns.a │ "ArnEquals": { │
│   │ e.Arn}        │        │ ge            │ mazonaws.com  │   "aws:SourceA │
│   │               │        │               │               │ rn": "${MyFirs │
│   │               │        │               │               │ tTopic}"       │
│   │               │        │               │               │ }              │
├───┼───────────────┼────────┼───────────────┼───────────────┼────────────────┤
│ + │ ${MyHelloCons │ Allow  │ s3:GetBucket* │ AWS:${MyUser} │                │
│   │ truct/Bucket- │        │ s3:GetObject* │               │                │
│   │ 0.Arn}        │        │ s3:List*      │               │                │
│   │ ${MyHelloCons │        │               │               │                │
│   │ truct/Bucket- │        │               │               │                │
│   │ 0.Arn}/*      │        │               │               │                │
├───┼───────────────┼────────┼───────────────┼───────────────┼────────────────┤
│ + │ ${MyHelloCons │ Allow  │ s3:GetBucket* │ AWS:${MyUser} │                │
│   │ truct/Bucket- │        │ s3:GetObject* │               │                │
│   │ 1.Arn}        │        │ s3:List*      │               │                │
│   │ ${MyHelloCons │        │               │               │                │
│   │ truct/Bucket- │        │               │               │                │
│   │ 1.Arn}/*      │        │               │               │                │
├───┼───────────────┼────────┼───────────────┼───────────────┼────────────────┤
│ + │ ${MyHelloCons │ Allow  │ s3:GetBucket* │ AWS:${MyUser} │                │
│   │ truct/Bucket- │        │ s3:GetObject* │               │                │
│   │ 2.Arn}        │        │ s3:List*      │               │                │
│   │ ${MyHelloCons │        │               │               │                │
│   │ truct/Bucket- │        │               │               │                │
│   │ 2.Arn}/*      │        │               │               │                │
├───┼───────────────┼────────┼───────────────┼───────────────┼────────────────┤
│ + │ ${MyHelloCons │ Allow  │ s3:GetBucket* │ AWS:${MyUser} │                │
│   │ truct/Bucket- │        │ s3:GetObject* │               │                │
│   │ 3.Arn}        │        │ s3:List*      │               │                │
│   │ ${MyHelloCons │        │               │               │                │
│   │ truct/Bucket- │        │               │               │                │
│   │ 3.Arn}/*      │        │               │               │                │
└───┴───────────────┴────────┴───────────────┴───────────────┴────────────────┘
(NOTE: There may be security-related changes not in this list. See http://bit.ly/cdk-2EhF7Np)

Do you wish to deploy these changes (y/n)?

Here, the CDK is telling you about the security-related changes that this deployment includes. It shows you the resources or ARN patterns involved, the actions being granted, and the IAM principals to which the grants apply. You can review these and press y when ready. You then see status reported about the resources being created.

hello-cdk-1: deploying...
hello-cdk-1: creating CloudFormation changeset...
0/12 | 8:41:14 AM | CREATE_IN_PROGRESS | AWS::S3::Bucket | MyHelloConstruct/Bucket-0 (MyHelloConstructBucket0DAEC57E1)
0/12 | 8:41:14 AM | CREATE_IN_PROGRESS | AWS::IAM::User | MyUser (MyUserDC45028B)
0/12 | 8:41:14 AM | CREATE_IN_PROGRESS | AWS::IAM::User | MyUser (MyUserDC45028B) Resource creation Initiated
0/12 | 8:41:15 AM | CREATE_IN_PROGRESS | AWS::CDK::Metadata | CDKMetadata
0/12 | 8:41:15 AM | CREATE_IN_PROGRESS | AWS::S3::Bucket | MyHelloConstruct/Bucket-3 (MyHelloConstructBucket398A5DE67)
0/12 | 8:41:15 AM | CREATE_IN_PROGRESS | AWS::S3::Bucket | MyHelloConstruct/Bucket-1 (MyHelloConstructBucket18D9883BE)
0/12 | 8:41:15 AM | CREATE_IN_PROGRESS | AWS::S3::Bucket | MyHelloConstruct/Bucket-0 (MyHelloConstructBucket0DAEC57E1) Resource creation Initiated
0/12 | 8:41:15 AM | CREATE_IN_PROGRESS | AWS::SQS::Queue | MyFirstQueue (MyFirstQueueFF09316A)
0/12 | 8:41:15 AM | CREATE_IN_PROGRESS | AWS::S3::Bucket | MyHelloConstruct/Bucket-2 (MyHelloConstructBucket2C1DA3656)
0/12 | 8:41:15 AM | CREATE_IN_PROGRESS | AWS::SNS::Topic | MyFirstTopic (MyFirstTopic0ED1F8A4)
0/12 | 8:41:15 AM | CREATE_IN_PROGRESS | AWS::S3::Bucket | MyHelloConstruct/Bucket-3 (MyHelloConstructBucket398A5DE67) Resource creation Initiated
0/12 | 8:41:15 AM | CREATE_IN_PROGRESS | AWS::S3::Bucket | MyHelloConstruct/Bucket-1 (MyHelloConstructBucket18D9883BE) Resource creation Initiated
0/12 | 8:41:15 AM | CREATE_IN_PROGRESS | AWS::SQS::Queue | MyFirstQueue (MyFirstQueueFF09316A) Resource creation Initiated
0/12 | 8:41:16 AM | CREATE_IN_PROGRESS | AWS::SNS::Topic | MyFirstTopic (MyFirstTopic0ED1F8A4) Resource creation Initiated
0/12 | 8:41:16 AM | CREATE_IN_PROGRESS | AWS::S3::Bucket | MyHelloConstruct/Bucket-2 (MyHelloConstructBucket2C1DA3656) Resource creation Initiated
1/12 | 8:41:16 AM | CREATE_COMPLETE | AWS::SQS::Queue | MyFirstQueue (MyFirstQueueFF09316A)
1/12 | 8:41:17 AM | CREATE_IN_PROGRESS | AWS::CDK::Metadata | CDKMetadata Resource creation Initiated
2/12 | 8:41:17 AM | CREATE_COMPLETE | AWS::CDK::Metadata | CDKMetadata
3/12 | 8:41:26 AM | CREATE_COMPLETE | AWS::SNS::Topic | MyFirstTopic (MyFirstTopic0ED1F8A4)
3/12 | 8:41:28 AM | CREATE_IN_PROGRESS | AWS::SNS::Subscription | MyFirstQueue/MyFirstTopicSubscription (MyFirstQueueMyFirstTopicSubscription774591B6)
3/12 | 8:41:29 AM | CREATE_IN_PROGRESS | AWS::SQS::QueuePolicy | MyFirstQueue/Policy (MyFirstQueuePolicy596EEC78)
3/12 | 8:41:29 AM | CREATE_IN_PROGRESS | AWS::SNS::Subscription | MyFirstQueue/MyFirstTopicSubscription (MyFirstQueueMyFirstTopicSubscription774591B6) Resource creation Initiated
4/12 | 8:41:30 AM | CREATE_COMPLETE | AWS::SNS::Subscription | MyFirstQueue/MyFirstTopicSubscription (MyFirstQueueMyFirstTopicSubscription774591B6)
4/12 | 8:41:30 AM | CREATE_IN_PROGRESS | AWS::SQS::QueuePolicy | MyFirstQueue/Policy (MyFirstQueuePolicy596EEC78) Resource creation Initiated
5/12 | 8:41:30 AM | CREATE_COMPLETE | AWS::SQS::QueuePolicy | MyFirstQueue/Policy (MyFirstQueuePolicy596EEC78)
6/12 | 8:41:35 AM | CREATE_COMPLETE | AWS::S3::Bucket | MyHelloConstruct/Bucket-0 (MyHelloConstructBucket0DAEC57E1)
7/12 | 8:41:36 AM | CREATE_COMPLETE | AWS::S3::Bucket | MyHelloConstruct/Bucket-3 (MyHelloConstructBucket398A5DE67)
8/12 | 8:41:36 AM | CREATE_COMPLETE | AWS::S3::Bucket | MyHelloConstruct/Bucket-1 (MyHelloConstructBucket18D9883BE)
9/12 | 8:41:36 AM | CREATE_COMPLETE | AWS::S3::Bucket | MyHelloConstruct/Bucket-2 (MyHelloConstructBucket2C1DA3656)
10/12 | 8:41:50 AM | CREATE_COMPLETE | AWS::IAM::User | MyUser (MyUserDC45028B)
10/12 | 8:41:53 AM | CREATE_IN_PROGRESS | AWS::IAM::Policy | MyUser/DefaultPolicy (MyUserDefaultPolicy7B897426)
10/12 | 8:41:53 AM | CREATE_IN_PROGRESS | AWS::IAM::Policy | MyUser/DefaultPolicy (MyUserDefaultPolicy7B897426) Resource creation Initiated
11/12 | 8:42:02 AM | CREATE_COMPLETE | AWS::IAM::Policy | MyUser/DefaultPolicy (MyUserDefaultPolicy7B897426)
12/12 | 8:42:03 AM | CREATE_COMPLETE | AWS::CloudFormation::Stack | hello-cdk-1

✅ hello-cdk-1

Stack ARN:
arn:aws:cloudformation:us-east-2:433781611764:stack/hello-cdk-1/87482f50-6c27-11e9-87d0-026465bb0bfc

At this point, the CLI presents you with another summary of IAM changes and asks you to confirm. This is because your CDK sample application creates two stacks in two different AWS Regions. Approve the changes for the second stack and you see similar status output.

Clean up

Now you can use the AWS Management Console to look at the resources that were created and validate that it all makes sense. After you are finished, you can easily destroy all of these resources with a single command.

$ cdk destroy
Are you sure you want to delete: hello-cdk-2, hello-cdk-1 (y/n)? y

hello-cdk-2: destroying...
   0 | 8:48:31 AM | DELETE_IN_PROGRESS   | AWS::CloudFormation::Stack | hello-cdk-2 User Initiated
   0 | 8:48:33 AM | DELETE_IN_PROGRESS   | AWS::CDK::Metadata     | CDKMetadata 
   0 | 8:48:33 AM | DELETE_IN_PROGRESS   | AWS::IAM::Policy       | MyUser/DefaultPolicy (MyUserDefaultPolicy7B897426) 
   0 | 8:48:33 AM | DELETE_IN_PROGRESS   | AWS::SNS::Subscription | MyFirstQueue/MyFirstTopicSubscription (MyFirstQueueMyFirstTopicSubscription774591B6) 
   0 | 8:48:33 AM | DELETE_IN_PROGRESS   | AWS::SQS::QueuePolicy  | MyFirstQueue/Policy (MyFirstQueuePolicy596EEC78) 
   1 | 8:48:34 AM | DELETE_COMPLETE      | AWS::SQS::QueuePolicy  | MyFirstQueue/Policy (MyFirstQueuePolicy596EEC78)
   2 | 8:48:34 AM | DELETE_COMPLETE      | AWS::SNS::Subscription | MyFirstQueue/MyFirstTopicSubscription (MyFirstQueueMyFirstTopicSubscription774591B6)
   3 | 8:48:34 AM | DELETE_COMPLETE      | AWS::IAM::Policy       | MyUser/DefaultPolicy (MyUserDefaultPolicy7B897426) 
   4 | 8:48:35 AM | DELETE_COMPLETE      | AWS::CDK::Metadata     | CDKMetadata 
   4 | 8:48:35 AM | DELETE_IN_PROGRESS   | AWS::IAM::User         | MyUser (MyUserDC45028B) 
   4 | 8:48:36 AM | DELETE_IN_PROGRESS   | AWS::SNS::Topic        | MyFirstTopic (MyFirstTopic0ED1F8A4)
   4 | 8:48:36 AM | DELETE_SKIPPED       | AWS::S3::Bucket        | MyHelloConstruct/Bucket-0 (MyHelloConstructBucket0DAEC57E1) 
   4 | 8:48:36 AM | DELETE_SKIPPED       | AWS::S3::Bucket        | MyHelloConstruct/Bucket-2 (MyHelloConstructBucket2C1DA3656) 
   4 | 8:48:36 AM | DELETE_SKIPPED       | AWS::S3::Bucket        | MyHelloConstruct/Bucket-1 (MyHelloConstructBucket18D9883BE) 
   4 | 8:48:36 AM | DELETE_IN_PROGRESS   | AWS::SQS::Queue        | MyFirstQueue (MyFirstQueueFF09316A) 
   4 | 8:48:36 AM | DELETE_SKIPPED       | AWS::S3::Bucket        | MyHelloConstruct/Bucket-3 (MyHelloConstructBucket398A5DE67) 
   5 | 8:48:36 AM | DELETE_COMPLETE      | AWS::SNS::Topic        | MyFirstTopic (MyFirstTopic0ED1F8A4) 
   6 | 8:48:36 AM | DELETE_COMPLETE      | AWS::IAM::User         | MyUser (MyUserDC45028B) 
 6 Currently in progress: hello-cdk-2, MyFirstQueueFF09316A

 ✅  hello-cdk-2: destroyed
hello-cdk-1: destroying...
   0 | 8:49:38 AM | DELETE_IN_PROGRESS   | AWS::CloudFormation::Stack | hello-cdk-1 User Initiated
   0 | 8:49:40 AM | DELETE_IN_PROGRESS   | AWS::CDK::Metadata     | CDKMetadata 
   0 | 8:49:40 AM | DELETE_IN_PROGRESS   | AWS::IAM::Policy       | MyUser/DefaultPolicy (MyUserDefaultPolicy7B897426) 
   0 | 8:49:40 AM | DELETE_IN_PROGRESS   | AWS::SQS::QueuePolicy  | MyFirstQueue/Policy (MyFirstQueuePolicy596EEC78) 
   0 | 8:49:40 AM | DELETE_IN_PROGRESS   | AWS::SNS::Subscription | MyFirstQueue/MyFirstTopicSubscription (MyFirstQueueMyFirstTopicSubscription774591B6) 
   1 | 8:49:41 AM | DELETE_COMPLETE      | AWS::IAM::Policy       | MyUser/DefaultPolicy (MyUserDefaultPolicy7B897426) 
   2 | 8:49:41 AM | DELETE_COMPLETE      | AWS::SQS::QueuePolicy  | MyFirstQueue/Policy (MyFirstQueuePolicy596EEC78) 
   3 | 8:49:41 AM | DELETE_COMPLETE      | AWS::SNS::Subscription | MyFirstQueue/MyFirstTopicSubscription (MyFirstQueueMyFirstTopicSubscription774591B6) 
   4 | 8:49:42 AM | DELETE_COMPLETE      | AWS::CDK::Metadata     | CDKMetadata 
   4 | 8:49:42 AM | DELETE_IN_PROGRESS   | AWS::IAM::User         | MyUser (MyUserDC45028B) 
   4 | 8:49:42 AM | DELETE_SKIPPED       | AWS::S3::Bucket        | MyHelloConstruct/Bucket-2 (MyHelloConstructBucket2C1DA3656) 
   4 | 8:49:42 AM | DELETE_SKIPPED       | AWS::S3::Bucket        | MyHelloConstruct/Bucket-3 (MyHelloConstructBucket398A5DE67) 
   4 | 8:49:42 AM | DELETE_SKIPPED       | AWS::S3::Bucket        | MyHelloConstruct/Bucket-0 (MyHelloConstructBucket0DAEC57E1) 
   4 | 8:49:42 AM | DELETE_IN_PROGRESS   | AWS::SNS::Topic        | MyFirstTopic (MyFirstTopic0ED1F8A4) 
   4 | 8:49:42 AM | DELETE_SKIPPED       | AWS::S3::Bucket        | MyHelloConstruct/Bucket-1 (MyHelloConstructBucket18D9883BE) 
   5 | 8:49:42 AM | DELETE_COMPLETE      | AWS::IAM::User         | MyUser (MyUserDC45028B) 
   5 | 8:49:42 AM | DELETE_IN_PROGRESS   | AWS::SQS::Queue        | MyFirstQueue (MyFirstQueueFF09316A) 
   6 | 8:49:43 AM | DELETE_COMPLETE      | AWS::SNS::Topic        | MyFirstTopic (MyFirstTopic0ED1F8A4) 
 6 Currently in progress: hello-cdk-1, MyFirstQueueFF09316A
   7 | 8:50:43 AM | DELETE_COMPLETE      | AWS::SQS::Queue        | MyFirstQueue (MyFirstQueueFF09316A)

 ✅  hello-cdk-1: destroyed

 

Conclusion

In this post, I introduced you to the AWS Cloud Development Kit. You saw how it enables you to define your AWS infrastructure in modern programming languages like TypeScript, Java, C#, and now Python. I showed you how to use the CDK CLI to initialize a new sample application in Python, and walked you through the project structure. You learned how to use the CDK to synthesize your Python code into AWS CloudFormation templates and deploy them to provision AWS infrastructure. Finally, I showed you how to clean up these resources when you’re done.

Now it’s your turn. Go build something amazing with the AWS CDK for Python!

The CDK and the Python language binding are currently in developer preview, so I’d love to get feedback on what you like and where AWS can do better. The team lives on GitHub at https://github.com/awslabs/aws-cdk, where it’s easy to get directly in touch with the engineers building the CDK. Raise an issue if you discover a bug or want to make a feature request. Join the conversation on the aws-cdk Gitter channel to ask questions.

 

from AWS Developer Blog https://aws.amazon.com/blogs/developer/getting-started-with-the-aws-cloud-development-kit-and-python/

Node.js 6 is approaching End-of-Life – upgrade your AWS Lambda functions to the Node.js 10 LTS

Node.js 6 is approaching End-of-Life – upgrade your AWS Lambda functions to the Node.js 10 LTS

This blog was authored by Liz Parody, Developer Relations Manager at NodeSource.

 

Node.js 6.x (“Boron”), which has been maintained as a long-term support (LTS) release line since fall of 2016, is reaching its scheduled end-of-life (EOL) on April 30, 2019. After the maintenance period ends, Node.js 6 will no longer receive new releases of any kind, including releases that address critical bugs, security vulnerabilities, patches, or other important updates.

[Image source]

Recently, AWS has been reminding users to upgrade AWS Lambda functions built on the Node.js 6 runtime to a newer version. This is because language runtimes that have reached EOL are unsupported in Lambda.

Requests for feature additions to this release line aren’t accepted. Continued use of the Node.js 6 runtime after April 30, 2019 increases your exposure to various risks, including the following:

  • Security vulnerabilities – Node.js contributors are constantly working to fix security flaws of all severity levels (low, moderate, and high). In the February 2019 Security Release, all actively maintained Node.js release lines were patched, including “Boron”. After April 30, security releases will no longer be applied to Node.js 6, increasing the potential for malicious attacks.
  • Software incompatibility – Newer versions of Node.js better support current best practices and newer design patterns. For example, the popular async/await pattern for working with promises was first introduced in the Node.js 8 (“Carbon”) release line, so “Boron” users can’t take advantage of it (see the short sketch after this list). If you don’t upgrade to a newer release line, you miss out on features and improvements that enable you to write better, more performant applications.
  • Compliance issues – This risk applies most to teams in highly regulated industries such as healthcare, finance, or ecommerce. It also applies to those who deal with sensitive data such as personally identifiable information (PII). Exposing these types of data to unnecessary risk can result in severe consequences, ranging from extended legal battles to hefty fines.
  • Poor performance and reliability – The Node.js 10 (“Dubnium”) runtime is significantly faster than Node.js 6, with the capacity to perform twice as many operations per second. Lambda is an especially popular choice for applications that must deliver low latency and high performance. Upgrading to a newer version of the Node.js runtime is a relatively painless way to improve the performance of your application.
  • Higher operating costs – The performance benefits of the Node.js 10 runtime compared to Node.js 6 can directly translate to reduced operational costs. Aside from missing the day-to-day savings, running an unmaintained version of the Node.js runtime also significantly increases the likelihood of unexpected costs associated with an outage or critical issue.
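To make the software-incompatibility point concrete, here is a minimal sketch contrasting the Node.js 6 promise-chain style with async/await, which is available from Node.js 8 onward. The fetchUser and fetchOrders helpers are hypothetical stand-ins for real asynchronous work.

// Hypothetical async helpers standing in for real I/O.
const fetchUser = async (id: string) => ({ id, name: "demo" });
const fetchOrders = async (user: { id: string }) => [{ owner: user.id }];

// Node.js 6 style: promise chains are the only built-in option.
function getOrderCount(userId: string): Promise<number> {
  return fetchUser(userId)
    .then((user) => fetchOrders(user))
    .then((orders) => orders.length);
}

// Node.js 8+ style: async/await expresses the same flow linearly.
async function getOrderCountAwait(userId: string): Promise<number> {
  const user = await fetchUser(userId);
  const orders = await fetchOrders(user);
  return orders.length;
}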

Key differences between Node.js 6 and Node.js 10

Metrics provided by the Node.js Benchmarking working group highlight the performance benefits of upgrading from Node.js 6 to the most recent LTS release line, Node.js 10:

  • Operations per second are nearly two times higher in Node.js 10 versus Node.js 6.
  • Latency has decreased by 65% in Node.js 10 versus Node.js 6.
  • The footprint after load is 35% lower in Node.js 10 versus Node.js 6, resulting in improved performance in the event of a cold start.

While benchmarks don’t always reflect real-world results, the trend is clear that performance is increasing in each new Node.js release. [Data Source]

The most recent LTS release line is Node.js 10 (“Dubnium”). This release line features several enhancements and improvements over earlier versions, including the following:

  • Node.js 10 is the first release line to upgrade to OpenSSL version 1.1.0.
  • Native support for HTTP/2, first added to the Node.js 8 LTS release line, was stabilized in Node.js 10. It offers massive performance improvements over HTTP/1 (including reduced latency and minimized protocol overhead), and adds support for request prioritization and server push.
  • Node.js 10 introduces new JavaScript language capabilities, such as Function.prototype.toString() and mitigations for side-channel vulnerabilities, to help prevent information leaks.

“While there are a handful of new features, the standout changes in Node.js 10.0.0 are improvements to error handling and diagnostics that will improve the overall developer experience,” said James Snell, a member of the Node.js Technical Steering Committee (TSC). [Quote source]

Upgrade using the N|Solid Lambda layer

AWS doesn’t currently offer the Node.js 10 runtime in Lambda. However, you may want to test the Node.js 10 runtime version in a development or staging environment before rolling out updates to production Lambda functions.

Until AWS adds the Node.js 10 runtime to Lambda, NodeSource’s N|Solid runtime is available for use as a Lambda layer. It includes a fully compatible version of the Node.js 10 LTS release line.

If you install N|Solid as a Lambda layer, you can begin migration and testing before the Node.js 6 EOL date. You can also easily switch to the Node.js 10 runtime provided by AWS when it’s available. Choose between versions based on the Node.js 8 (“Carbon”) and 10 (“Dubnium”) LTS release lines. It takes just a few minutes to get up and running.

First, when you’re creating a function, choose Use custom runtime in function code or layer. (If you’re migrating an existing function, you can change the runtime for the function.)

 

Next, add a new Lambda layer, and choose Provide a layer version ARN. You can find the latest ARN for the N|Solid Lambda layer here. Enter the N|Solid runtime ARN for your AWS Region and Node.js version (Node.js 8 “Carbon” or Node.js 10 “Dubnium”). This is where you can use Node.js 10.

 

That’s it! Your Lambda function is now set up to use Node.js 10.

You can also update your functions to use the N|Solid Lambda layer with the AWS CLI.

To update an existing function:

aws lambda update-function-configuration --function-name <YOUR_FUNCTION_NAME> --layers arn:aws:lambda:<AWS_REGION>:800406105498:layer:nsolid-node-10:6 --runtime provided
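If you’re creating a new function from the CLI instead, a rough equivalent might look like the following sketch. The handler, role, and zip file names are placeholders, and the layer ARN is the same illustrative Node.js 10 ARN as above.

aws lambda create-function --function-name <YOUR_FUNCTION_NAME> --runtime provided --handler index.handler --role arn:aws:iam::<ACCOUNT_ID>:role/<LAMBDA_EXECUTION_ROLE> --zip-file fileb://function.zip --layers arn:aws:lambda:<AWS_REGION>:800406105498:layer:nsolid-node-10:6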

In addition to the Node.js 10 runtime, the Lambda layer provided by NodeSource includes N|Solid. N|Solid for AWS Lambda provides low-impact performance monitoring for Lambda functions. To take advantage of this feature, you can also sign up for a free NodeSource account. After you sign up, you just need to set your N|Solid license key as an environment variable in your Lambda function.
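For example, you could set the key from the AWS CLI. The variable name NSOLID_LICENSE_KEY below is illustrative; check the N|Solid getting started guide for the exact name your layer expects.

aws lambda update-function-configuration --function-name <YOUR_FUNCTION_NAME> --environment "Variables={NSOLID_LICENSE_KEY=<YOUR_LICENSE_KEY>}"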

That’s all you have to do to start monitoring your Node.js Lambda functions. After you add your license key, your Lambda function invocations should show up on the Functions tab of your N|Solid dashboard.

For more information, see our N|Solid for AWS Lambda getting started guide.

Upgrade to Node.js 10 LTS (“Dubnium”) outside of Lambda

Lambda workloads aren’t the only ones affected; you should also consider anywhere else you’re running Node.js 6. Below, I review two more ways to upgrade your version of Node.js in other compute environments.

Use NVM

One of the best practices for upgrading Node.js versions is using NVM. NVM, or Node Version Manager, lets you manage multiple active Node.js versions.

To install NVM on *nix systems, you can run the install script with cURL:

$ curl -o- https://raw.githubusercontent.com/creationix/nvm/v0.34.0/install.sh | bash

or Wget:

$ wget -qO- https://raw.githubusercontent.com/creationix/nvm/v0.34.0/install.sh | bash

For Windows-based systems, you can use NVM for Windows.

After NVM is installed, you can manage your versions of Node.js with a few simple commands.

To download, compile, and install the latest release of Node.js:

$ nvm install node # "node" is an alias for the latest version

To install a specific version of Node.js:

$ nvm install 10.10.0 # or 8.5.0, 8.9.1, etc.
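To switch the active version in your current shell after installing (a standard NVM command):

$ nvm use 10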

Upgrade manually

To upgrade Node.js without a tool like NVM, you can manually install a new version. NodeSource provides Linux distributions for Node.js, and recommends that you upgrade using the NodeSource Node.js Binary Distributions.

To install Node.js 10:

Using Ubuntu

$ curl -sL https://deb.nodesource.com/setup_10.x | sudo -E bash - 
$ sudo apt-get install -y nodejs

Using Amazon Linux

$ curl -sL https://rpm.nodesource.com/setup_10.x | sudo bash -
$ sudo yum install -y nodejs

Most production applications built on Node.js make use of LTS release lines. We highly recommend that you upgrade any application or Lambda function currently using the Node.js 6 runtime version to Node.js 10, the newest LTS version.

To hear more about the latest release line, check out NodeSource’s webinar, New and Exciting Features Landing in Node.js 12. Node.js 12 is scheduled to enter long-term support in October 2019.

About the Author

Liz is a self-taught software engineer focused on JavaScript, and Developer Relations Manager at NodeSource. She organizes community events such as JSConf Colombia, Pioneras Developers, and Startup Weekend, and has spoken at EmpireJS, MedellinJS, PionerasDev, and GDG.

She loves sharing knowledge, promoting the JavaScript and Node.js ecosystems, and participating in key tech events and conferences to grow her knowledge and network.

Disclaimer
The content and opinions in this post are those of the third-party author and AWS is not responsible for the content or accuracy of this post.

 

from AWS Developer Blog https://aws.amazon.com/blogs/developer/node-js-6-is-approaching-end-of-life-upgrade-your-aws-lambda-functions-to-the-node-js-10-lts/

Android Rx onError Guidelines

Android Rx onError Guidelines

By Ed Ballot

“Creating a good API is hard.” — anyone who has created an API used by others

As with any API, wrapping your data stream in an Rx observable requires consideration for reasonable error handling and intuitive behavior. The following guidelines are intended to help developers create consistent and intuitive APIs.

Since we frequently create Rx Observables in our Android app, we needed a common understanding of when to use onNext() and when to use onError() to make the API more consistent for subscribers. This divergent understanding arises partly because the name “onError” is a bit misleading. The item emitted by onError() is not a simple error, but a throwable that can cause significant damage if not caught. Our app has a global handler that prevents it from crashing outright, but an uncaught exception can still leave parts of the app in an unpredictable state.

TL;DR — Prefer onNext() and only use onError() for exceptional cases.

Considerations for onNext / onError

The following are points to consider when determining whether to use onNext() versus onError().

The Contract

First, here are the definitions of the two from the ReactiveX contract page:

OnNext
conveys an item that is emitted by the Observable to the observer

OnError
indicates that the Observable has terminated with a specified error condition and that it will be emitting no further items

As pointed out in the above definition, a subscription is automatically disposed after onError(), just like after onComplete(). Because of this, onError() should only be used to signal a fatal error and never to signal an intermittent problem where more data is expected to stream through the subscription after the error.
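Although this post concerns RxJava on Android, the termination rule is part of the Rx contract in every implementation. As a minimal sketch (written in RxJS/TypeScript, assuming RxJS 7), nothing is delivered after the error signal:

import { Observable } from "rxjs";

// An observable that emits one item and then signals a fatal error.
const source = new Observable<number>((subscriber) => {
  subscriber.next(1);
  subscriber.error(new Error("fatal"));
  subscriber.next(2); // never delivered: the subscription ended at error()
});

source.subscribe({
  next: (v) => console.log("next", v),   // prints "next 1"
  error: (e) => console.log("error", e), // prints "error Error: fatal"
});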

Treat it like an Exception

Limit onError() to exceptional circumstances, the kind in which you’d also consider throwing an Error or Exception. The reasoning is that the onError() parameter is a Throwable. An example of the distinction: a database query returning zero results is typically not an exception, but the database returning zero results because it was forcibly closed (or otherwise put in a state that cancels the running query) would be an exceptional condition.

Be Consistent

Do not make your observable emit a mix of both deterministic and non-deterministic errors. Something is deterministic if the same input always results in the same output; for example, dividing by zero fails every time. Something is non-deterministic if the same inputs may result in different outputs, such as a network request that may time out or may return results before the timeout. Rx has convenience methods built around error handling, such as retry() (and our retryWithBackoff()). The primary use of retry() is to automatically re-subscribe an observable that has non-deterministic errors. When an observable mixes the two types of errors, it makes retrying less obvious, since retrying a deterministic failure doesn’t make sense and is wasteful: the retry is guaranteed to fail. (Two notes: 1. retry can also be used in certain deterministic cases like user login attempts, where the failure is caused by incorrectly entering credentials. 2. For mixed errors, retryWhen() could be used to only retry the non-deterministic errors; see the sketch after this paragraph.) If you find your observable needs to emit both types of errors, consider whether there is an appropriate separation of concerns. It may be that the observable can be split into several observables that each have a more targeted purpose.
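As a sketch of the retryWhen() idea for mixed errors (again in RxJS/TypeScript rather than RxJava, and with TransientNetworkError as a hypothetical marker class), you might retry only the non-deterministic failures:

import { Observable, throwError, timer } from "rxjs";
import { mergeMap, retryWhen } from "rxjs/operators";

class TransientNetworkError extends Error {}

// Re-subscribe after a delay for transient failures only;
// deterministic failures are rethrown and fail fast.
function retryTransient<T>(source: Observable<T>): Observable<T> {
  return source.pipe(
    retryWhen((errors) =>
      errors.pipe(
        mergeMap((err) =>
          err instanceof TransientNetworkError
            ? timer(1000)            // wait 1s, then retry
            : throwError(() => err)  // deterministic: propagate immediately
        )
      )
    )
  );
}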

Be Consistent with Underlying APIs

When wrapping an asynchronous API in Rx, consider maintaining consistency with the underlying API’s error handling. For example, if you are wrapping a touch event system that treats moving off the device’s touchscreen as an exception and terminates the touch session, then it may make sense to emit that error via onError(). On the other hand, if it treats moving off the touchscreen as a data event and allows the user to drag their finger back onto the screen, it makes sense to emit it via onNext().

Avoid Business Logic

Related to the previous point: avoid adding business logic that interprets the data and converts it into errors. The code that the observable is wrapping should have the appropriate logic to perform these conversions. In the rare case that it does not, consider adding an abstraction layer that encapsulates this logic (for both normal and error cases) rather than building it into the observable.

Passing Details in onError()

If your code is going to use onError(), remember that the throwable it emits should include appropriate data for the subscriber to understand what went wrong and how to handle it.

For example, our Falcor response handler uses a FalcorError class that includes the Status from the callback. Repositories could also throw an extension of this class, if extra details need to be included.
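The real FalcorError is internal to our app, but a minimal sketch of the pattern (in TypeScript for consistency with the other sketches here; the Status shape is assumed) might look like this:

// Hypothetical status payload carried by the response callback.
interface Status {
  code: number;
  message: string;
}

// A throwable that carries enough context for subscribers to react.
class FalcorError extends Error {
  constructor(readonly status: Status) {
    super(`Falcor request failed: ${status.code} ${status.message}`);
    this.name = "FalcorError";
  }
}

// Repositories can extend it when extra details need to be included.
class ProductRepositoryError extends FalcorError {
  constructor(status: Status, readonly productId: string) {
    super(status);
    this.name = "ProductRepositoryError";
  }
}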


Android Rx onError Guidelines was originally published in Netflix TechBlog on Medium, where people are continuing the conversation by highlighting and responding to this story.

from Netflix TechBlog – Medium https://medium.com/netflix-techblog/android-rx-onerror-guidelines-e68e8dc7383f?source=rss—-2615bd06b42e—4

Engineering a Studio Quality Experience With High-Quality Audio at Netflix

Engineering a Studio Quality Experience With High-Quality Audio at Netflix

by Guillaume du Pontavice, Phill Williams and Kylee Peña (on behalf of our Streaming Algorithms, Audio Algorithms, and Creative Technologies teams)

Remember the epic opening sequence of Stranger Things 2? The thrill of that car chase through Pittsburgh not only introduced a whole new set of mysteries, but it returned us to a beloved and dangerous world alongside Dustin, Lucas, Mike, Will and Eleven. Maybe you were one of the millions of people who watched it in HDR, experiencing the brilliant imagery as it was meant to be seen by the creatives who dreamt it up.

Imagine this scene without the sound. Even taking away one part of the soundtrack — the brilliant synth-pop score or the perfectly mixed soundscape of a high speed chase — is the story nearly as thrilling and emotional?

Most conversations about streaming quality focus on video. In fact, Netflix has led the charge for most of the video technology that drives these conversations, from visual quality improvements like 4K and HDR, to behind-the-scenes technologies that make the streaming experience better for everyone, like adaptive streaming, complexity-based encoding, and AV1.

We’re really proud of the improvements we’ve brought to the video experience, but the focus on those makes it easy to overlook the importance of sound, and sound is every bit as important to entertainment as video. Variances in sound can be extremely subtle, but their impact on how the viewer perceives a scene is often measurable. For example, have you ever seen a TV show where the video and audio were a little out of sync?

Among those who understand the vital nature of sound are the Duffer brothers. In late 2017, we received some critical feedback from the brothers on the Stranger Things 2 audio mix: in some scenes, there was a reduced sense of where sounds are located in the 5.1-channel stream, as well as audible degradation of high frequencies.

Our engineering team and Creative Technologies sound expert joined forces to quickly solve the issue, but a larger conversation about higher quality audio continued. Series mixes were getting bolder and more cinematic with tight levels between dialog, music and effects elements. Creative choices increasingly tested the limits of our encoding quality. We needed to support these choices better.

At Netflix, we work hard to bring great audio to our members. We began streaming 5.1 surround audio in 2010, and began streaming Dolby Atmos in 2016, but we wanted to bring studio quality sound to our members around the world. We want your experience to be brilliant even if you aren’t listening with a state-of-the-art home theater system. Just as we support initiatives like HDR and Netflix Calibrated Mode to maintain creative intent in the picture you stream, we wanted to do the same for the sound. That’s why we developed and launched high-quality audio.

To learn more about the people and inspiration behind this effort, check out this video. In this tech blog, we’ll dive deep into what high-quality audio is, how we deliver it to members worldwide, and why it’s so important to us.

What do we mean by “studio quality” sound?

If you’ve ever been in a professional recording studio, you’ve probably noted the difference in how things sound. One reason for that is the files used in mastering sessions are 24-bit 48 kHz with a bitrate of around 1 Mbps per channel. Studio mixes are uncompressed, which is why we consider them to be the “master” version.

Our high-quality sound feature is not lossless, but it is perceptually transparent. That means that while the audio is compressed, it is indistinguishable from the original source. Based on internal listening tests, listening test results provided by Dolby, and scientific studies, we determined that for Dolby Digital Plus at and above 640 kbps, the audio coding quality is perceptually transparent. Beyond that, we would be sending you files that have a higher bitrate (and take up more bandwidth) without bringing any additional value to the listening experience.

After determining that 640 kbps (roughly a 10:1 compression ratio compared to a 24-bit 5.1-channel studio master) was the perceptually transparent threshold for audio, we set up a bitrate ladder for 5.1-channel audio ranging from 192 up to 640 kbps. This ranges from “good” audio to “transparent” — there aren’t any bad audio experiences when you stream!

At the same time, we revisited our Dolby Atmos bitrates and increased the highest offering to 768 kbps. We expect these bitrates to evolve over time as we get more efficient with our encoding techniques.

Our high-quality sound is a great experience for our members even if they aren’t audiophiles. Sound helps to tell the story subconsciously, shaping our experience through subtle cues like the sharpness of a phone ring or the way the chirping of a dense flock of birds can heighten anxiety in a scene. Although variances in sound can be nuanced, the impact on the viewing and listening experience is often measurable.

And perhaps most of all, our “studio quality” sound is faithful to what the mixers are creating on the mix stage. For many years in the film and television industry, creatives would spend days on the stage perfecting the mix only to have it significantly degraded by the time it was broadcast to viewers. Sometimes critical sound cues might even be lost to the detriment of the story. By delivering studio quality sound, we’re preserving the creative intent from the mix stage.

Adaptive Streaming for Audio

Since we began streaming, we’ve used static audio streaming at a constant bitrate. This approach selects the audio bitrate based on network conditions at the start of playback. However, we have spent years optimizing our adaptive streaming engine for video, so we know adaptive streaming has obvious benefits. Until now, we’ve only used adaptive streaming for video.

Adaptive streaming is a technology designed to deliver media to the user in the most optimal way for their network connection. Media is split into many small segments (chunks) and each chunk contains a few seconds of playback data. Media is provided in several qualities.

An adaptive streaming algorithm’s goal is to provide the best overall playback experience — even under a constrained environment. A great playback experience should provide the best overall quality, considering both audio and video, and avoid buffer starvation which leads to a rebuffering event — or playback interruption.

Constrained environments can result from changing network conditions and device performance limitations, and adaptive streaming has to take all of these into account, which makes delivering a great playback experience difficult.
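As a toy illustration only (this is not Netflix’s algorithm), a throughput-based selector might pick the highest audio bitrate that fits within a safety margin of the measured bandwidth, leaving the rest for video. The intermediate ladder values below are made up; only the 192 and 640 kbps endpoints come from the bitrate ladder described earlier.

// Available audio bitrates in kbps, from "good" to "transparent".
// Only the endpoints are real; the middle rungs are illustrative.
const audioLadder = [192, 256, 384, 448, 640];

// Pick the highest rung that fits under a fraction of measured
// throughput after accounting for the current video bitrate.
function selectAudioBitrate(throughputKbps: number, videoKbps: number): number {
  const budget = throughputKbps * 0.9 - videoKbps; // keep a 10% safety margin
  const fit = audioLadder.filter((rate) => rate <= budget);
  return fit.length > 0 ? fit[fit.length - 1] : audioLadder[0];
}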

Let’s first look at how static audio streaming paired with adaptive video operates in a session with variable network conditions — in this case, a sudden throughput drop during the session.

The top graph shows both the audio and video bitrate, along with the available network throughput. The audio bitrate is fixed and has been selected at playback start whereas video bitrate varies and can adapt periodically.

The bottom graph shows audio and video buffer evolution: if we are able to fill the buffer faster than we play out, our buffer will grow. If not, our buffer will shrink.

In the first session above, the adaptive streaming algorithm for video has reacted to the throughput drop and was able to quickly stabilize both the audio and video buffer level by down-switching the video bitrate.

In the second scenario below, under the same network conditions we used a static high-quality audio bitrate at session start instead.

Our adaptive streaming logic for video reacts, but in this case the available throughput falls below the sum of the audio and video bitrates, and our buffer starts draining. This ultimately leads to a rebuffer.

In this scenario, the video bitrate dropped below the audio bitrate, which might not provide the best playback experience.

This simple example highlights that static audio streaming can lead to suboptimal playback experiences with fluctuating network conditions. This motivated us to use adaptive streaming for audio.

By using adaptive streaming for audio, we allow audio quality to adjust during playback to bandwidth capabilities, just like we do for video.

Let’s consider a playback session with exactly the same network conditions (a sudden throughput drop) to illustrate the benefit of adaptive streaming for audio.

In this case we are able to select a higher audio bitrate when network conditions supported it and we are able to gracefully switch down the audio bitrate and avoid a rebuffer event by maintaining healthy audio and video buffer levels. Moreover, we were able to maintain a higher video bitrate when compared to the previous example.

The benefits are obvious in this simple case, but extending it to our broad streaming ecosystem was another challenge. There were many questions we had to answer in order to move forward with adaptive streaming for audio.

What about device reach? We have hundreds of millions of TV devices in the field, with different CPU, network and memory profiles, and adaptive audio has never been certified. Do these devices even support audio stream switching?

  • We had to assess this by testing adaptive audio switching on all Netflix supported devices.
  • We also added adaptive audio testing in our certification process so that every new certified device can benefit from it.

Once we knew that adaptive streaming for audio was achievable on most of our TV devices, we had to answer the following questions as we designed the algorithm:

  • How could we guarantee that we can improve audio subjective quality without degrading video quality and vice-versa?
  • How could we guarantee that we won’t introduce additional rebuffers or increase the startup delay with high-quality audio?
  • How could we guarantee that this algorithm will gracefully handle devices with different performance characteristics?

We answered these questions through experimentation, fine-tuning the adaptive streaming for audio algorithm to increase audio quality without degrading the video experience. After a year of work, we were able to implement adaptive audio streaming on a majority of TV devices.

Enjoying a Higher Quality Experience

By using our listening tests and scientific data to choose an optimal “transparent” bitrate, and designing an adaptive audio algorithm that could serve it based on network conditions, we’ve been able to enable this feature on a wide variety of devices with different CPU, network and memory profiles: the vast majority of our members using 5.1 should be able to enjoy new high-quality audio.

And it won’t have any negative impact on the streaming experience. The adaptive bitrate switching happens seamlessly during a streaming experience, with the available bitrates ranging from good to transparent, so you shouldn’t notice a difference other than better sound. If your network conditions are good, you’ll be served up the best possible audio, and it will now likely sound like it did on the mixing stage. If your network has an issue — your sister starts a huge download or your cat unplugs your router — our adaptive streaming will help you out.

After years perfecting our adaptive video switching, we’re thrilled that a similar approach can enable studio quality sound to make it to members’ households, ensuring that every detail of the mix is preserved. Uniquely combining creative technology with engineering teams at Netflix, we’ve been able to not only solve a problem, but use that problem to improve the quality of audio for millions of our members worldwide.

Preserving the original creative intent of the hard-working people who make shows like Stranger Things is a top priority, and we know it enhances your viewing — and listening — experience for many more moments of joy. Whether you’ve fallen into the Upside Down or you’re being chased by the Demogorgon, get ready for a sound experience like never before.


Engineering a Studio Quality Experience With High-Quality Audio at Netflix was originally published in Netflix TechBlog on Medium, where people are continuing the conversation by highlighting and responding to this story.

from Netflix TechBlog – Medium https://medium.com/netflix-techblog/engineering-a-studio-quality-experience-with-high-quality-audio-at-netflix-eaa0b6145f32?source=rss—-2615bd06b42e—4

V2 AWS SDK for Go adds Context to API operations

V2 AWS SDK for Go adds Context to API operations

The v0.8.0 release of the v2 AWS SDK for Go developer preview introduced a breaking change: it added a new parameter, context.Context, to the SDK’s Send and Paginate Next methods.

Context was added as a required parameter to the Send and Paginate Next methods to enable you to use the v2 SDK for Go in your application with cancellation and request tracing.

Using the Context pattern helps reduce the chance of code paths mistakenly dropping the Context, causing the cancellation and tracing chain to be lost. When the Context is lost, it can be difficult to track down the missing cancellation and tracing metrics within an application.

Migrating to v0.8.0

After you update your application to depend on v0.8.0 of the v2 SDK, you’ll encounter compile errors. This is because of the Context parameter that was added to the Send and Paginate Next methods.

If your application is already using the Context pattern, you can now pass the Context into Send and Paginate Next methods directly, instead of calling SetContext on the request returned by the client’s operation request method.

If you don’t need a Context within your application, you can pass context.Background() or context.TODO() instead of a Context that carries a timeout, deadline, cancellation signal, or httptrace.ClientTrace.

Example code: before v0.8.0

The following code is an example of an application using the Amazon S3 service’s PutObject API operation with the v2 SDK before v0.8.0. It uses the req.SetContext method to specify the Context for the PutObject operation.

func uploadObject(ctx context.Context, bucket, key string, obj io.ReadSeeker) error {
	// Build the PutObject request, then attach the Context before sending.
	req := svc.PutObjectRequest(&s3.PutObjectInput{
		Bucket: &bucket,
		Key:    &key,
		Body:   obj,
	})
	req.SetContext(ctx)

	_, err := req.Send()
	return err
}

Example code: updated to v0.8.0

To migrate the previous example code to v0.8.0 of the v2 SDK, we remove the req.SetContext method call and pass the Context directly to the Send method instead. This change makes the example code compatible with v0.8.0 of the v2 SDK.

func uploadObject(ctx context.Context, bucket, key string, obj io.ReadSeeker) error {
	req := svc.PutObjectRequest(&s3.PutObjectInput{
		Bucket: &bucket,
		Key:    &key,
		Body:   obj,
	})

	// The Context is now passed directly to Send.
	_, err := req.Send(ctx)
	return err
}
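In practice, the ctx passed to Send can carry a deadline or cancellation signal, for example one created with context.WithTimeout, and the in-flight request is canceled when that signal fires.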

What’s next for the v2 SDK for Go developer preview?

We’re working to improve usability and reduce pain points with the v2 SDK. Two specific areas we’re looking at are the SDK’s request lifecycle and error handling.

Improving the SDK’s request lifecycle will help reduce your application’s CPU and memory usage when using the SDK. It also makes it easier for you to extend and modify the SDK’s core functionality.

For the SDK’s error handling, we’re investigating alternative approaches, such as typed errors for API operation exceptions. By using typed errors, your application can assert directly against the error type. This would reduce the need to do string comparisons for SDK API operation response errors.

See our issues on GitHub to share your feedback, questions, and feature requests, and to stay current with the v2 AWS SDK for Go developer preview as it moves to GA.

from AWS Developer Blog https://aws.amazon.com/blogs/developer/v2-aws-sdk-for-go-adds-context-to-api-operations/

New — Analyze and debug distributed applications interactively using AWS X-Ray Analytics

New — Analyze and debug distributed applications interactively using AWS X-Ray Analytics

Developers spend a lot of time searching through application logs, service logs, metrics, and traces to understand performance bottlenecks and to pinpoint their root causes. Correlating this information to identify its impact on end users comes with its own challenges of mining the data and performing analysis. This adds to the triaging time when using a distributed microservices architecture, where the call passes through several microservices. To address these challenges, AWS launched AWS X-Ray Analytics.

X-Ray helps you analyze and debug distributed applications, such as those built using a microservices architecture. Using X-Ray, you can understand how your application and its underlying services are performing to identify and troubleshoot the root causes of performance issues and errors. It helps you debug and triage distributed applications wherever those applications are running, whether the architecture is serverless, containers, Amazon EC2, on-premises, or a mixture of all of these.

AWS X-Ray Analytics helps you quickly and easily understand:

  • Any latency degradation or increase in error or fault rates.
  • The latency experienced by customers in the 50th, 90th, and 95th percentiles.
  • The root cause of the issue at hand.
  • End users who are impacted, and by how much.
  • Comparisons of trends, based on different criteria. For example, you can understand if new deployments caused a regression.

In this post, I walk you through several use cases to see how you can use X-Ray Analytics to address these issues.

AWS X-Ray Analytics Walkthrough

The following is a service map of an online store application hosted on Amazon EC2 and serverless technologies like Amazon API Gateway, AWS Lambda, and Amazon DynamoDB. Using this service map, you can easily see that there are faults in the “products” microservice in the selected time range.

Use X-Ray Analytics to explore the root cause and end-user impact. Looking at the X-Ray Analytics console, you can determine from the response time distribution that customers at the 50th percentile experience latency of around 1.6 seconds, while customers at the 95th percentile experience latency of more than 2.5 seconds.

This chart also helps you see the overall latency distribution of the requests in the selected group for the selected time range. You can learn more about X-Ray groups and their use cases in the Deep dive into AWS X-Ray groups and use cases post.

Now, you want to triage the increase in latency for requests that are taking more than 1.5 seconds and get to the root cause. Select those traces from the graph, as shown below. You see that all the numbers in the chart, like Time series activity and tables, are automatically updated based on the filter criteria. Also, a new Filtered traces trend line, indicated in blue, is added.
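Under the hood, such a selection corresponds to an X-Ray filter expression. Typing an expression like responsetime > 1.5 into the search bar should produce a roughly equivalent trace set; consult the X-Ray filter expression documentation for the exact syntax.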

This Filtered trace set A trend line keeps updating as you add new criteria. For example, looking at the following tables, you can easily see that around 85% of these high-latency requests result in 500 errors, and Emma is the most impacted customer.

To focus on the traces that result in 500 errors, select that row from the table and see the filtered traces and other data points getting updated. In the Root Cause section, see the root cause of issues resulting in this increased latency. You can see that the DynamoDB wait in the “products” service has resulted in around 57% of the errors. You can also view individual traces that match the selected filters, as shown.

Selecting the Fault Root Cause using the cog icon lets you view the fault exception. In this case, it shows that requests exceeded the provisioned throughput capacity configured on the DynamoDB table, giving a clear indication of the root cause of the issue.

You just saw how you can use X-Ray Analytics to detect an increase in latency and understand the root cause of the issue and end-user impact.

Comparison of trends

Now, see how you can compare two trends using the compare functionality in X-Ray Analytics. You can use this functionality to compare any two filter expressions. For example, you can compare performance experience between two users, or compare and analyze whether a new deployment caused any regression.

Say that you have deployed a new Lambda function at 3:40 AM. You want to compare five minutes before and five minutes after the deployment was completed to understand whether any regression was caused, and what the impact is to end users.

Use the compare functionality provided in X-Ray Analytics. In this case, two different time ranges are represented. Filtered trace set A, starting from 3:35 AM to 3:40 AM, is shown in blue, and Filtered trace set B, starting from 3:40 AM to 3:45 AM, is shown in green.

In compare mode, the percentage deviation column that is automatically calculated clearly indicates that 500 errors decreased by 32 percentage points after the new deployment was completed. This gives a clear indication to the DevOps team that the new deployment didn’t cause any regression and was successful in reducing errors.

Identifying outlying users

Take an example in which one of the end users, “Ava,” is complaining about degradation in performance experience from the application. None of the other users have reported this issue.

Use the compare feature in X-Ray Analytics to compare the response time of all users (blue trend line) with that of Ava (green trend line). Looking at the following response time distribution graph, it’s not easy to notice the difference in end-user experience based on the data.

However, as you look into the details of other attributes, like the annotations that you added during code instrumentation (annotation.ACL_CACHED) and the response time root cause, you can get actionable insights. You see that the performance latency is in the “api” service and related to the time spent in the “generate_acl” module. You can correlate that with the ACL not being cached, based on the roughly 55% delta between Ava’s requests and those of other users.

You can also validate this by looking at the traces from the trace list and see that there is a 300-millisecond delay added by the “generate_acl” module. This shows how X-Ray Analytics helps correlate different attributes to understand the root cause of the issue.

Getting Started

To get started using X-Ray Analytics, visit the AWS Management Console for X-Ray. There is no additional charge for using this feature.

from AWS Developer Blog https://aws.amazon.com/blogs/developer/new-analyze-and-debug-distributed-applications-interactively-using-aws-x-ray-analytics/