Tag: Cloud

Optimizing AWS Control Tower For Multiple AWS Accounts And Teams


Control tower

You can see everything from up here!

One of the major benefits of Amazon Web Services is that it comes with an extensive set of tools for managing deployments and user identities. Most organizations can closely manage how their cloud environment is set up, and how users access different parts of that environment, through AWS IAM.

However, there are times when even the most extensive IAM and other management tools just aren’t enough. For larger corporations, or businesses scaling their cloud deployments to a higher level, setting up multiple AWS accounts—run by different teams—is often the solution.

The need for multi-account AWS environments isn’t something that Amazon ignores. In fact, the company introduced AWS Control Tower, whose sole purpose is to make setting up new multi-account AWS environments easy.

You may also enjoy:  How AWS Control Tower Lowers the Barrier to Enterprise Cloud Migration

Quick Environment Setup With AWS Control Tower

As the name suggests, AWS Control Tower is designed to give you a comprehensive bird’s-eye view of multiple cloud environments. It makes deploying, managing, and monitoring multiple AWS accounts and teams easy, and the way it is structured also makes deploying new AWS environments simple.

Rather than going through the setup process of new AWS accounts manually, you can now automate the creation of multiple AWS accounts and environments using Control Tower.

First, you need to define the blueprint that will be used by all of the environments; this is very similar to setting up a base operating system for OEM devices.

Blueprints are designed to make sure that the new AWS environments comply with best practices and are set up correctly from the beginning. Any customization can then be made on a per-account basis, giving the organization maximum flexibility with their cloud environments.

Among the things that the AWS Control Tower blueprints provide are identity management, access management, centralized logging, and cross-account security audits. Provisioning of cloud resources and network configurations are also included in the blueprints. You even have the ability to customize the blueprint you use to specific requirements.

Easy Monitoring Of Environments

Since AWS Control Tower is designed as a centralization tool, you can also expect easy monitoring and maintenance of multiple AWS accounts and teams from this platform. Guardrails are added to the blueprints of AWS environments, so you know your environments are secure from the beginning. All you need to do is enforce the security policies; even that is easy and centralized.

Service control policies (SCPs) are monitored constantly. When configurations of the environments don’t comply with the required policies, warnings are triggered, and you are informed immediately. Every new account created using AWS Control Tower utilizes the same set of policies, leading to a more standardized cloud environment as a whole.

What’s interesting about the SCPs is the fact that you can dig deep into details—particularly details about accounts that don’t comply with the predefined security policies—and make adjustments as necessary. You always know the kind of information security and policy violations you are dealing with and you know exactly who to address to get the issues fixed.
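
Control Tower creates and manages its guardrails for you, but it can help to see what a preventive guardrail looks like under the hood: essentially, a service control policy attached to an organizational unit. The following is a minimal, hypothetical sketch of such an SCP expressed with Terraform's AWS provider; the policy name, statement, and OU ID are illustrative assumptions, not Control Tower's actual managed guardrails.

```hcl
# Hypothetical example of a preventive guardrail expressed as an SCP.
# Control Tower manages its own guardrail policies; this sketch only
# illustrates the underlying mechanism.
resource "aws_organizations_policy" "deny_root_access_keys" {
  name = "deny-root-access-keys" # illustrative name
  type = "SERVICE_CONTROL_POLICY"

  content = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Sid      = "DenyRootAccessKeys"
      Effect   = "Deny"
      Action   = ["iam:CreateAccessKey"]
      Resource = "*"
      Condition = {
        StringLike = { "aws:PrincipalArn" = "arn:aws:iam::*:root" }
      }
    }]
  })
}

# Attach the policy to an organizational unit so every account in it
# inherits the guardrail.
resource "aws_organizations_policy_attachment" "deny_root_access_keys" {
  policy_id = aws_organizations_policy.deny_root_access_keys.id
  target_id = "ou-xxxx-example" # illustrative OU ID
}
```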

As an added bonus, AWS Control Tower provides extensive reports, including on governance of workloads, security control policies, and the state of the cloud environments in general. The tool goes beyond setting up a landing zone based on best practices; it helps you monitor those landing zones meticulously too.

Automation Is The Key

From the previous explanation, it is easy to see how AWS Control Tower is incredibly useful for organizations that need to set up multiple cloud environments. The tool allows top administrators and business owners to keep an eye on their overall cloud deployment while maintaining high visibility into individual environments, deployments, and users.

That said, AWS Control Tower doesn’t stop there. It adds one crucial element that positions Amazon as the leader in this specific market segment: automation. Account provisioning, resource provisioning, and even the complete setup of landing zones can be fully automated with ‘recipes’ defined in blueprints.

Ibexlabs, for example, is already leveraging AWS Control Tower on behalf of current clients and has designed an onboarding process specifically to leverage the tool for new enterprises, too. As well as creating a landing zone with log archive and audit accounts, the team uses Control Tower to launch VPCs and subnets for the organization, in addition to portfolio setup.

Ibexlabs also scripts the installation of a comprehensive suite of other tools to enhance client usage of AWS including: Jenkins; CircleCI; Datadog; NewRelic; OpenVPN; and VPC peering within the accounts. On top of all this, Ibexlabs leverages CloudFormation with launch configuration and autoscaling as well as other app services according to the clients’ needs.

Automation eliminates countless mundane tasks associated with setting up and securing a new cloud environment. What used to be a tedious process that could take hours—if not days—to complete is now one or two clicks away. Automation makes the whole system more robust and flexible since customizations can now be done on a specific deployment level.

We really have to see the implementation of automation in AWS Control Tower as part of a bigger trend. Amazon has been automating many of its AWS components in recent years, signaling a serious shift beyond DevOps. As it gets easier for even the most complex organizations to maintain their cloud environments, the days of developers running their own environments may soon be here.

Regardless of the shift, AWS Control Tower is a step in the right direction. Organizations that require multiple AWS accounts can now gain access to the resources they need without jumping through the hoops of setting those environments up manually.

This post was originally published here.

Further Reading

AWS DevOps: Introduction to DevOps on AWS

Top 3 Areas to Automate in AWS to Avoid Overpaying Cloud Costs

from DZone Cloud Zone

Adopting Kubernetes? Here’s What You Should Definitely Not Do



Never, ever, ever forget or neglect to do these things

Kubernetes is changing the infrastructure landscape by enabling platform teams to scale much more effectively. Rather than convincing every team to secure VMs properly, manage network devices, and then following up to make sure they’ve done so, platform teams can now hide all of these details behind a K8s abstraction. This lets both application and platform teams move more quickly: application teams because they don’t need to know all the details, and platform teams because they are free to change them.

You may also enjoy:  Kubernetes #Fails

There are some great tutorials and online courses on the Kubernetes website. If you’re new to Kubernetes, you should definitely check these out. But it’s also helpful to understand what not to do.

Not Budgeting for Maintenance

The first big failure mode with Kubernetes is not budgeting for maintenance. Just because K8s hides a lot of details from application developers doesn’t mean that those details aren’t there. Someone still needs to allocate time for upgrades, setting up monitoring, and thinking about provisioning new nodes.

You should budget for (at least) quarterly upgrades of master and node infrastructure. However frequently you were upgrading VM images before, make sure you now do the same for your Kubernetes infrastructure as well.

Who should do these upgrades? If you’re going to roll out K8s in a way that is standard across your org (which you should be doing!), this needs to be your infrastructure team.

Moving Too Fast

The second big failure mode is that teams move so quickly they forget that adopting a new paradigm for orchestrating services creates new challenges around observability. Not only is a move to K8s often coincident with a move to microservices (which necessitates new observability tools) but pods and other K8s abstractions are often shorter-lived than traditional VMs, meaning that the way that telemetry is gathered from the application also needs to change.

The solution is to build in observability from day one. This includes instrumenting code in such a way that you can take both a user-centric and an infrastructure-centric view of requests — and understanding how instrumentation data is transmitted, aggregated, and analyzed. Waiting until you’ve experienced an outage is too late to address these issues, as it will be virtually impossible to get the data you need to understand and remediate that outage.

Not Accounting for Infrastructure

With all the hype around Kubernetes — and, of course, its many, many benefits — it’s easy to assume that it will magically solve all your infrastructure problems. What’s great about K8s is that it goes a long way toward isolating those problems (so that platform teams can solve them more effectively), but they’ll still be there, in need of a solution.

So in addition to managing OS upgrades, vulnerability scans, and patches, your infrastructure team will also need to run, monitor, and upgrade K8s master components (API server, etcd) as well as all of the node components (docker, kubelet). If you choose a managed K8s solution, then a lot of that work will be taken care of for you, but you still need to initiate master and node upgrades. And even if they are easy, node upgrades can still be disruptive: You’ll want to make sure you have enough capacity to move services around during the upgrade. While it’s good news that application developers no longer need to think about these issues, the platform team (or someone else) still does.

Not Embracing the K8s Community

The K8s community is an incredible resource whose value really can’t be overstated. Kubernetes is certainly not the first open source orchestration tool, but it has a vibrant and quickly growing community. This is what powers the continued development of K8s as it turns out new features.

The platform boasts thousands of contributors, including collaboration with all major cloud providers and dozens of tech companies (you can check out the list here). If you have questions or need help, it’s almost guaranteed that you can find the answer on GitHub or Slack, or find someone who can point you in the right direction.

And last, but certainly not least, contributing to and being a part of the community can be a great way to meet other developers who might one day become members of your team.

Not Thinking Through “Matters of State”

Of course, how you divide your application into smaller services is a critical decision to get right. But for K8s specifically, it’s really important to think about how you are going to handle state: whether it’s using StatefulSets, leveraging your provider’s block storage devices, or moving to a completely managed storage solution, implementing stateful services correctly the first time around is going to save you huge headaches.

It’s all too easy to get burned by a corrupted shard in a database or other storage system, and recovering from these sorts of failures is by definition more complex when running on K8s. Needless to say, make sure you are testing disaster recovery for stateful services as they are deployed on your cluster (and not just trusting that it will just work like it did before you moved to K8s).
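
For teams that do choose StatefulSets, the following is a minimal sketch of a replicated stateful workload declared through the Terraform Kubernetes provider, with a volume claim template so each replica gets its own persistent volume backed by the provider's block storage. The names, image, and sizes are assumptions for illustration only, not a recommendation for your cluster.

```hcl
resource "kubernetes_stateful_set" "db" {
  metadata {
    name = "counter-db" # illustrative name
  }

  spec {
    service_name = "counter-db"
    replicas     = 3

    selector {
      match_labels = { app = "counter-db" }
    }

    template {
      metadata {
        labels = { app = "counter-db" }
      }
      spec {
        container {
          name  = "db"
          image = "postgres:11" # illustrative image

          volume_mount {
            name       = "data"
            mount_path = "/var/lib/postgresql/data"
          }
        }
      }
    }

    # Each replica gets its own PersistentVolumeClaim, typically backed
    # by the cloud provider's block storage.
    volume_claim_template {
      metadata {
        name = "data"
      }
      spec {
        access_modes = ["ReadWriteOnce"]
        resources {
          requests = { storage = "10Gi" }
        }
      }
    }
  }
}
```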

Not Accounting for Migration

Another important item to address is what your application looked like when you began implementing K8s. Did it already have hundreds of services? If so, your biggest concern should be understanding how to migrate those services in an incremental but seamless way.

Are you just breaking the first few services off of your monolith? Making sure you have the infrastructure to support an influx of services is going to be critical to a successful implementation.

Wrapping Up: The Need for Distributed Tracing

K8s and other orchestration and coordination tools like service meshes are really only half the story. They provide flexibility in how services are deployed as well as the ability to react quickly, but they don’t offer insight into what’s actually happening in your services.

The other half is about building that insight and understanding how performance is impacting your users. That’s where LightStep comes in: we enable teams to understand how a bad deployment in one service is affecting users 5, 10, or 100 services away.

Further Reading

Kubernetes Anti-Patterns: Let’s Do GitOps, Not CIOps!

Creating an Affordable Kubernetes Cluster

from DZone Cloud Zone

Consul Connect Integration in HashiCorp Nomad


At Hashiconf EU 2019, we announced native Consul Connect integration in Nomad available in a technology preview release. A beta release candidate for Nomad 0.10 that includes Consul Connect integration is now available. This blog post presents an overview of service segmentation, and how to use features in Nomad to enable end-to-end mTLS between services through Consul Connect.

Background

The transition to cloud environments and a microservices architecture represents a generational challenge for IT. This transition means shifting from largely dedicated servers in a private datacenter to a pool of compute capacity available on demand. The networking layer transitions from being heavily dependent on the physical location and IP address of services and applications to using a dynamic registry of services for discovery, segmentation, and composition. An enterprise IT team does not have the same control over the network or the physical locations of compute resources and must think about service-based connectivity. The runtime layer shifts from deploying artifacts to a static application server to deploying applications to a cluster of resources that are provisioned on-demand.

HashiCorp Nomad’s focus on ease of use, flexibility, and performance, enables operators to deploy a mix of microservice, batch, containerized, and non-containerized applications in a cloud-native environment. Nomad already integrates with HashiCorp Consul to provide dynamic service registration and service configuration capabilities.

Another core challenge is service segmentation. East-West firewalls use IP-based rules to secure ingress and egress traffic. But in a dynamic world where services move across machines and machines are frequently created and destroyed, this perimeter-based approach is difficult to scale as it results in complex network topologies and a sprawl of short-lived firewall rules and proxy configurations.

Consul Connect provides service-to-service connection authorization and encryption using mutual Transport Layer Security (mTLS). Applications can use sidecar proxies in a service mesh configuration to automatically establish TLS connections for inbound and outbound connections without being aware of Connect at all. From the application’s point of view, it uses a localhost connection to send outbound traffic, and the details of TLS termination and forwarding to the right destination service are handled by Connect.

Nomad 0.10 will extend Nomad’s Consul integration capabilities to include native Connect integration. This enables services being managed by Nomad to easily opt into mTLS between services, without having to make additional code changes to their application. Developers of microservices can continue to focus on their core business logic while operating in a cloud native environment and realizing the security benefits of service segmentation. Prior to Nomad 0.10, job specification authors would have to directly run and manage Connect proxies and did not get network level isolation between tasks.

Nomad 0.10 introduces two new stanzas to Nomad’s job specification—connect and sidecar_service. The rest of this blog post shows how to leverage Consul Connect with an example dashboard application that communicates with an API service.

Prerequisites

Consul

Connect integration with Nomad requires Consul 1.6 or later. The Consul agent can be run in dev mode with the following command:

```bash
$ consul agent -dev
```

Nomad

Nomad must schedule onto a routable interface in order for the proxies to connect to each other. The following steps show how to start a Nomad dev agent configured for Connect:
```bash
$ sudo nomad agent -dev-connect
```

CNI Plugins

Nomad uses CNI plugins to configure the task group networks; these plugins need to be downloaded to /opt/cni/bin on the Nomad client nodes.

Envoy

Nomad launches and manages Envoy, which runs alongside applications that opt into Connect integration. Envoy acts as a proxy to provide secure communication with other applications in the cluster. Nomad launches Envoy using its official Docker container.

Also, note that the Connect integration in 0.10 works only in Linux environments.

Example Overview

The example in this blog post enables secure communication between a web application and an API service. The web application and the API service are run and managed by Nomad. Nomad additionally configures Envoy proxies to run alongside these applications. The API service is a simple microservice that increments a count every time it is invoked. It then returns the current count as JSON. The web application is a dashboard that displays the value of the count.

Architecture Diagram

The following Nomad architecture diagram illustrates the flow of network traffic between the dashboard web application and the API microservice. As shown below, traffic originating from the dashboard to the API is proxied through Envoy and secured via mTLS.

Networking Model

Prior to Nomad 0.10, Nomad’s networking model optimized for simplicity by running all applications in host networking mode. This means that applications running on the same host could see each other and communicate with each other over localhost.

In order to support security features in Consul Connect, Nomad 0.10 introduces network namespace support. This is a new network model within Nomad where task groups are a single network endpoint and share a network namespace. This is analogous to a Kubernetes Pod. In this model, tasks launched in the same task group share a network stack that is isolated from the host where possible. This means the local IP of the task will be different than the IP of the client node. Users can also configure a port map to expose ports through the host if they wish.

Configuring Network Stanza

Nomad’s network stanza will become valid at the task group level in addition to the resources stanza of a task. The network stanza will get an additional ‘mode’ option which tells the client what network mode to run in. The following network modes are available:

  • “none” – Task group will have an isolated network without any network interfaces.
  • “bridge” – Task group will have an isolated network namespace with an interface that is bridged with the host
  • “host” – Each task will join the host network namespace and a shared network namespace is not created. This matches the current behavior in Nomad 0.9

Additionally, Nomad’s port stanza now includes a new “to” field. This field allows for configuration of the port to map to inside of the allocation or task. With bridge networking mode, and the network stanza at the task group level, all tasks in the same task group share the network stack including interfaces, routes, and firewall rules. This allows Connect enabled applications to bind only to localhost within the shared network stack, and use the proxy for ingress and egress traffic.

The following is a minimal network stanza for the API service in order to opt into Connect.

```hcl
network {
  mode = "bridge"
}
```

The following is the network stanza for the web dashboard application, illustrating the use of port mapping.

```hcl
network {
  mode = "bridge"
  port "http" {
    static = 9002
    to     = 9002
  }
}
```

Configuring Connect in the API Service

In order to enable Connect in the API service, we need to specify a network stanza at the group level and use the connect stanza inside the service definition. The following snippet illustrates this:

```hcl
group "api" {
  network {
    mode = "bridge"
  }

  service {
    name = "count-api"
    port = "9001"

    connect {
      sidecar_service {}
    }
  }

  task "web" {
    driver = "docker"
    config {
      image = "hashicorpnomad/counter-api:v1"
    }
  }
}
```

Nomad will run Envoy in the same network namespace as the API service, and register it as a proxy with Consul Connect.

Configuring Upstreams

In order to enable Connect in the web application, we need to configure the network stanza at the task group level. We also need to provide details about the upstream services it communicates with — in this case, the API service. More generally, upstreams should be configured for any other service that this application depends on.

The following snippet illustrates this.

```hcl
group "dashboard" {
  network {
    mode = "bridge"
    port "http" {
      static = 9002
      to     = 9002
    }
  }

  service {
    name = "count-dashboard"
    port = "9002"

    connect {
      sidecar_service {
        proxy {
          upstreams {
            destination_name = "count-api"
            local_bind_port  = 8080
          }
        }
      }
    }
  }

  task "dashboard" {
    driver = "docker"
    env {
      COUNTING_SERVICE_URL = "http://${NOMAD_UPSTREAM_ADDR_count_api}"
    }
    config {
      image = "hashicorpnomad/counter-dashboard:v1"
    }
  }
}
```

In the above example, the static = 9002 parameter requests the Nomad scheduler reserve port 9002 on a host network interface. The to = 9002 parameter forwards that host port to port 9002 inside the network namespace. This allows you to connect to the web frontend in a browser by visiting http://<host_ip>:9002.

The web frontend connects to the API service via Consul Connect. The upstreams stanza defines the remote service to access (count-api) and what port to expose that service on inside the network namespace (8080). The web frontend is configured to communicate with the API service with an environment variable, $COUNTING_SERVICE_URL. The upstream's address is interpolated into that environment variable. In this example, $COUNTING_SERVICE_URL will be set to “localhost:8080”.

With this set up, the dashboard application communicates over localhost to the proxy’s upstream local bind port in order to communicate with the API service. The proxy handles mTLS communication using Consul to route traffic to the correct destination IP where the API service runs. The Envoy proxy on the other end terminates TLS and forwards traffic to the API service listening on localhost.

Job Specification

The following job specification contains both the API service and the web dashboard. You can run this using nomad run connect.nomad after saving the contents to a file named connect.nomad.

```hcl
job "countdash" {
  datacenters = ["dc1"]

  group "api" {
    network {
      mode = "bridge"
    }

    service {
      name = "count-api"
      port = "9001"

      connect {
        sidecar_service {}
      }
    }

    task "web" {
      driver = "docker"
      config {
        image = "hashicorpnomad/counter-api:v1"
      }
    }
  }

  group "dashboard" {
    network {
      mode = "bridge"
      port "http" {
        static = 9002
        to     = 9002
      }
    }

    service {
      name = "count-dashboard"
      port = "9002"

      connect {
        sidecar_service {
          proxy {
            upstreams {
              destination_name = "count-api"
              local_bind_port  = 8080
            }
          }
        }
      }
    }

    task "dashboard" {
      driver = "docker"
      env {
        COUNTING_SERVICE_URL = "http://${NOMAD_UPSTREAM_ADDR_count_api}"
      }
      config {
        image = "hashicorpnomad/counter-dashboard:v1"
      }
    }
  }
}
```

UI

The web UI in Nomad 0.10 shows details relevant to Connect integration whenever applicable. The allocation details page now shows information about each service that is proxied through Connect.

In the above screenshot from the allocation details page for the dashboard application, the UI shows the Envoy proxy task. It also shows the service (count-dashboard) as well as the name of the upstream (count-api).

Limitations

  • The Consul binary must be present in Nomad's $PATH to run the Envoy proxy sidecar on client nodes.
  • Consul Connect Native is not yet supported.
  • Consul Connect HTTP and gRPC checks are not yet supported.
  • Consul ACLs are not yet supported.
  • Only the Docker, exec, and raw exec drivers support network namespaces and Connect.
  • Variable interpolation for group services and checks is not yet supported.

Conclusion

In this blog post, we shared an overview of native Consul Connect integration in Nomad. This enables job specification authors to easily opt in to mTLS across services. For more information, see the Consul Connect guide.

from Hashicorp Blog

What’s New in Terraform V0.12


Man reading newspaper

Read all about it! Terraform gets new features!

Over recent years, infrastructure as code has made creating and managing a complex cloud environment—plus making necessary ongoing changes to that environment—significantly more streamlined. Rather than having to handle infrastructure management separately, developers can push infrastructure changes and updates alongside updates to application code.

Terraform from HashiCorp has been a leading tool in the charge to make infrastructure as code more accessible. While early versions admittedly involved a steep learning curve, successive upgrades have made the tool more and more workable. The latest version, Terraform v0.12, introduces a number of changes that make the configuration language inside the tool much simpler to use.

You may also enjoy: Intro to Terraform for Infrastructure as Code

Command Improvements

Terraform v0.12 has seen major updates indeed, with a lot of commands and features changing over the last three months. HCL, the underlying language used by Terraform, has been updated too; we will get to this in a bit.

The use of first-class expressions is perhaps the most notable update of them all in the new version. Rather than wrapping expressions in interpolation sequences inside double quotes, expressions can now be written natively, which is a more natural style for most developers. Developers can now reference values like var.conf[1] directly rather than using the old “${var.conf}”-style interpolation format.
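
As a quick illustration (the resource and variable names here are just placeholders), the same reference written in the old interpolation style and in the new native expression style:

```hcl
# Terraform 0.11 style: every reference wrapped in "${ ... }"
# subnet_id = "${var.subnet_ids[0]}"

# Terraform 0.12 style: expressions are first-class values
resource "aws_instance" "app" {
  ami           = var.ami_id
  instance_type = var.instance_type
  subnet_id     = var.subnet_ids[0]
}
```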

Expressions are also strengthened by the introduction of for expressions to filter and transform lists and maps. Using for as a filter is a great improvement; it makes advanced, condition-based manipulation of infrastructure values much easier to achieve.
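
For example, a for expression can transform and filter a list in a single pass; the variable and names below are illustrative:

```hcl
variable "instance_names" {
  type = list(string)
}

locals {
  # Upper-case every name, keeping only instances whose name starts with "web"
  web_instance_names = [
    for name in var.instance_names : upper(name)
    if substr(name, 0, 3) == "web"
  ]
}
```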

The splat operator has also become more general. You can now use the resource.*.field syntax against any list value in your code, not just against resources that have count set.
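
A short sketch of the generalized splat syntax, with placeholder names:

```hcl
variable "subnets" {
  type = list(object({
    name = string
    cidr = string
  }))
}

# The splat syntax now works against any list value, not only resources that set count
output "subnet_cidrs" {
  value = var.subnets.*.cidr
}
```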

At Caylent, we love the 1:1 JSON mapping. Conversion of HCL configuration to and from JSON now happens seamlessly and unambiguously. This may look like a small update at first, but it is a simple change that will make life significantly smoother for developers who manage their own infrastructure using Terraform.

More Updates in Terraform v0.12

Other changes are just as interesting. As mentioned before, HashiCorp updated the HCL language extensively in this version. There have long been limitations to using HCL, and we — developers — have been relying on workarounds to make things work. Terraform v0.12, as the previous points demonstrate, eliminates a number of the bottlenecks we’ve been facing all along.

There is also the fact that you can now use conditional operators to configure infrastructure. First of all, I’m glad to inform you that null is now a recognized value. When you assign null to a parameter or field, the default value for that argument will be used instead.

Conditional operators like ? and : are also handy, and you can now rely on lazy evaluation of conditional results when coding your infrastructure. Building on the previous example, arguments set to null are simply omitted from the upstream API calls.
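
The following sketch (with placeholder variable names and AMI ID) shows both features together: a conditional expression choosing an instance type, and a null default that causes an argument to be omitted from the API call entirely:

```hcl
variable "environment" {
  type = string
}

variable "key_name" {
  type    = string
  default = null
}

resource "aws_instance" "app" {
  ami           = "ami-0123456789abcdef0" # placeholder AMI ID
  instance_type = var.environment == "prod" ? "m5.large" : "t3.micro"

  # When var.key_name is null, Terraform behaves as if the argument were
  # omitted, so it never appears in the upstream API call.
  key_name = var.key_name
}
```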

Complex lists and maps are also now supported. This is perhaps the biggest change in this version. While Terraform previously supported only simple values in many contexts, you can now use complex lists and maps in both inputs and outputs, including the values passed to and from modules and other components.
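
For instance, a module input can now be a map of objects, and an output can return a derived complex value; the names below are illustrative:

```hcl
variable "clusters" {
  type = map(object({
    instance_type = string
    min_size      = number
    max_size      = number
  }))
}

output "cluster_max_sizes" {
  # Complex values can be passed between modules and outputs directly
  value = { for name, cluster in var.clusters : name => cluster.max_size }
}
```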

Another upgrade worth mentioning is the new way of accessing resource details through remote state outputs. In v0.12, the terraform_remote_state data source changes slightly so that all the remote state outputs are now available as a single map value, in contrast to the previous behavior where they were exposed as top-level attributes.
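
In practice, that means reading values through the outputs map rather than as top-level attributes; the backend settings and output names below are assumptions for illustration:

```hcl
data "terraform_remote_state" "network" {
  backend = "s3"

  config = {
    bucket = "my-terraform-state"        # placeholder bucket
    key    = "network/terraform.tfstate" # placeholder state key
    region = "us-east-1"
  }
}

# v0.11: data.terraform_remote_state.network.subnet_id
# v0.12: everything lives under the single "outputs" map
locals {
  subnet_id = data.terraform_remote_state.network.outputs.subnet_id
}
```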

It’s also worth commenting on the new, improved “Context-Rich Error Messages” which make debugging much easier than in previous versions. Not all messages will include the following elements but most will conform to this structure:

  • A short problem description
  • A reference to the specific configuration construct involved
  • The values of any relevant references
  • A longer, more thorough description of the problem and possible solutions

Last but certainly not least, Terraform now treats references as first-class values. Yes, no more layers of quotes and nested interpolation when coding updates for your infrastructure. Resource identifiers can also be output or used as parameters, depending on how your infrastructure is set up.

Real-World Challenges

The new version of Terraform has seen many big updates since v0.12.0 shipped in May 2019, and the engineers behind the framework have worked hard to ensure that, in most cases, no configuration changes are needed. In fact, the average Terraform user will not have to make any changes when updating to Terraform v0.12.

However, breaking changes are to be expected in some configurations, and if you do have to mitigate changes required by the upgrade to Terraform v0.12, HashiCorp is working on an automated tool that makes the process easier. You can use the tool to check compatibility and anticipate potential issues that may stem from the upgrade.

One last thing to note about this latest update: it’s only the beginning. Terraform v0.12 is a signal from the engineers behind this tool that they are serious about pushing Terraform (and HCL) further. Upcoming releases and iterations are already in the pipeline, with refinements like better versioning and seamless upgrade being among the major changes.

It is inspiring to see engineers work so hard to ensure that there are few to no breaking changes with every version they push. Terraform may well be the IaC language of the future if it continues to bring positive updates and useful tools to its framework at the current pace.

This post was originally published here.

Further Reading

The Top 7 Infrastructure-As-Code Tools For Automation

Infrastructure-As-Code: A DevOps Way To Manage IT Infrastructure

from DZone Cloud Zone

TechTalks With Tom Smith: Tools and Techniques for Migration


birds migrating

Time to pack it up and move it out. Or, rather, up.

To understand the current state of migrating legacy apps to microservices, we spoke to IT executives from 18 different companies. We asked, “What are the most effective techniques and tools for migrating legacy apps to microservices?” Here’s what we learned:

You may also enjoy: The Best Cloud Migration Approach: Lift-And-Shift, Replatform, Or Refactor?

Kubernetes

  • We did everything with Docker and Kubernetes. The datastore was on PostgreSQL. It depends on the use case. We went through an era of container wars, and Docker/Kubernetes has won. It’s less about the technology you’re going to use and more about how you’re going to get there and get your team aligned. It’s more about the journey than the tooling. Alignment and culture are the hardest part.
  • Spring and Java tend to be the most highly regarded stacks. Seeing a shift to serverless with Lambda and K8s in the cloud. CloudWatch can watch all your containers for you. Use best-in-class CI/CD tools.
  • A manual process is required. In addition to decomposing legacy apps into components, there is a challenge in understanding how the legacy app acquires its configuration and how to pass the identical configuration to the microservice. If you use K8s as your target platform, it supports two of the three ways of mapping to a legacy app – it requires thinking and design work, and it is not easily automated.
  • K8s is the deployment vehicle of choice. On the app side, we are seeing features or sub-services move into cloud-native design one-by-one. People start with the least critical and get confidence in running in a dual hybrid mode. Start with one or two services and quickly get to 10, 20, and more. Scalability, fault tolerance, and geographic distributions. Day two operations are quite different than with a traditional, legacy app.
  • Docker and K8s are the big ones. All are derivatives on the infrastructure side.
  • Containerizing our workloads and standardizing all runtime aspects on K8s primitives is one of the major factors in this effort. These include the environment variable override scheme, secrets handling, service discovery, load balancing, automatic failover procedures, network encryption, access control, process scheduling, joining nodes to a cluster, monitoring, and more. Each of these aspects had been handled in a bespoke way in the past for each service. Abstracting the infrastructure further away from the operating system has also contributed to the portability of the product. We’re employing a message delivery system to propagate events as the main asynchronous interaction between services. The events payload is encoded using a Protocol Buffers schema. This gives us an easy-to-maintain, easy-to-evolve contract model between the services. A nice property of that technique is also the type safety and ease of use that comes with using the generated model. We primarily use Java as our runtime of choice. Adopting Spring Boot has helped to standardize how we externalize and consume configuration and allowed us to hook into an existing ecosystem of available integrations (like Micrometer, gRPC, etc.). We adopted Project Reactor as our reactive programming library of choice. The learning curve was steep, but it has helped us to apply common principled solutions to very complex problems. It greatly contributes to the resilience of the system.

Docker

  • Containers (Docker) and container orchestration platforms help a lot in managing the microservices architecture and its dependencies. Tools for generating microservices client and server APIs. Examples would be Swagger for REST APIs, and Google’s Protocol Buffers for internal microservice-to-microservice APIs.

Prometheus

  • K8s, plus debugging and tracing tools like Prometheus, controllers, and logging. Getting traffic into K8s, and efficient communication within K8s, with Envoy as a sidecar proxy for networking functions (routing, caching). API gateway metering, API management, API monitoring, and a dev portal. A gateway that’s lightweight, flexible, and portable, taking into account east-west traffic with micro-gateways.
  • When going to microservices, you need to think about scale: there are so many APIs, and you have to keep up with them and their metrics. You might want to use Prometheus, or CloudWatch if you’re going to Lambda. When going to microservices you have to bring open telemetry, tracing, and logging. With a monolithic application, as a developer I can attach a debugger to a binary; I can’t do that with microservices. With microservices, every outage is like a murder mystery. This is a big issue for serverless and microservices.

Other

  • There are a couple. It comes down to accepting CI/CD; otherwise you’re going to have trouble. Consider the build and delivery pipeline as part of the microservice itself. Everything is in one place. With microservices, you scatter functionality all over the place, more independent and less tightly bound. The pipeline becomes an artifact. You need to provide test coverage across microservices as permutations grow. Automation begins at the developer’s desk.
  • There are many tools that help the migration to microservices, such as Spring Boot and Microprofile, both of which simplify the development of standalone apps in a microservices architecture. Stream processing technologies are becoming increasingly popular for building microservices, as there is a lot of overlap between a “stream-oriented” architecture and a “microservices-oriented” architecture. In-memory technology platforms are useful when high-performance microservices are a key part of the architecture.
  • Legacy apps should be migrated to independent, loosely coupled services through gradual decomposition, by splitting off capabilities using a strangler pattern. Select functionality based on a domain with clear boundaries that need to be modified or scaled independently. Make sure your teams are organized to build and support their portions of the application, avoid dependencies and bottlenecks that minimize the benefits of microservice adoption, and take advantage of infrastructure-level tools for monitoring, log management, and other supporting capabilities.
  • There are many tools available such as Amazon’s AWS Application Discovery Services, which will help IT understand your application and workloads, while others help IT understand server and database migration. Microsoft Azure has tools that help you define the business case and understand your current application and workloads. There is an entire ecosystem of partners who provide similar tools that may fit your specific needs for your migration that help you map dependencies, optimize workload and determine the best cloud computing model, so it may be prudent to look at others if you have specific needs or requirements. We provide the ability to monitor your applications in real-time both from an app performance as well as business performance perspective, providing data you need to see your application in action, validate the decisions and money spent on the migration, improve your user experience and provide the ability to rapidly change your development and release cycles.
  • Have a service mesh in place before you start making the transition. API ingress management platform (service control platform). An entry point for every connect in the monolith. Implement security and observability. Existing service mesh solutions are super hyper-focused on greenfield and K8s. Creates an enclosure around the monolith during the transitions.
  • There are a couple of approaches that work well when moving to microservices. Break the application down into logical components that each fit well in a single microservice, and then build it again as component pieces, which will minimize code changes while fully taking advantage of running smaller autonomous application components that are easier to deploy and manage separately. Build data access-as-a-service for the application to use for all data request and write calls. This moves data complexity into its own domain, decoupling the data from the application components. It’s essential to embrace containers and container orchestration, and to use DevOps tools – integrating security into your processes and automating throughout.
  • You need a hybrid integration platform that supports direct legacy to microservice communication so that you can choose the ideal digital environment without compromises based on your original IT debt.
  • Design Patterns such as the Strangler pattern are effective in migrating components of legacy apps to the microservices architecture. Saga, Circuit Breaker, Chassis and Contract are other design patterns that can help. Another technique/practice is to decompose by domain models so that microservices reflect business functionality. A third aspect is to have data segregation within each microservice or for a group of related microservices, supplemented by a more traditional permanent data store for some applications. Non-relational databases, lightweight message queues, API gateways, and serverless platforms are speeding up migrations. Server-side JavaScript and newer languages such as Go are fast becoming the programming platforms of choice for developing self-sufficient services.

Here’s who shared their insights:

Further Reading

TechTalks With Tom Smith: What Devs Need to Know About Kubernetes

A Guide to Cloud Migration

from DZone Cloud Zone

HashiCorp Consul Enterprise Supports VMware NSX Service Mesh Federation


Recently at VMworld 2019 in San Francisco, VMware announced a new open specification for Service Mesh Federation. This specification defines a common standard to facilitate secure communication between different service mesh solutions.

Service mesh is quickly becoming a necessity for organizations embarking upon application modernization and transitioning to microservice architectures. Consul service mesh provides unified support across a heterogeneous environment: bare metal, virtual machines, Kubernetes, and other workloads. However, some organizations may choose to run different mesh technologies on different platforms. For these customers, federation becomes critical to enable secure connectivity across the boundaries of different mesh deployments.

We have partnered with VMware to support the Service Mesh Federation Specification. This blog will explain how services running in HashiCorp Consul service mesh can discover and connect with services in VMware NSX Service Mesh (NSX-SM).

What is Service Mesh Federation

consul service mesh federation

Service Mesh Federation is the ability for services running in separate meshes to communicate as if they were running in the same mesh. For example, a Consul service can communicate with an NSX-SM service running in a remote cluster in the same way it would communicate with another Consul service running in the same cluster.

How Does Consul Enterprise Support Service Mesh Federation

Service Sync

The first step towards supporting federation is Service Sync: sharing which services are running on each mesh. To accomplish this, Consul Enterprise implements the Service Mesh Federation Spec via the new Consul federation service. The Consul federation service communicates with NSX-SM’s federation service to keep the service lists in sync so that each mesh is aware of each other’s services.

consul service mesh federation service

First, Consul sends the foo service to the remote federation service and receives the bar service.

consul service sync

Next, Consul creates a Consul bar service to represent the remote bar service.

Inter-Mesh Communication: Consul to NSX-SM

With services synced, Consul services can now talk to remote services as if they were running in the same cluster. To do this, they configure their upstreams to route to the remote service’s name.

In this example, the Consul foo service wants to call the NSX-SM bar service. We configure an upstream so that port 8080 routes to bar:

```hcl
service {
  name = "foo"

  connect {
    sidecar_service {
      proxy {
        upstreams = [
          {
            destination_name = "bar"
            local_bind_port  = 8080
          }
        ]
      }
    }
  }
}
```

Then from the foo service, we simply need to talk to http://localhost:8080:

$ curl http://localhost:8080
<response from bar service>

Under the hood, we’re using the Consul service mesh sidecar proxies to encrypt all the traffic using TLS.

Consul connect to nsx service mesh

Inter-Mesh Communication: NSX-SM to Consul

From the bar service running in NSX-SM, we can use KubeDNS to talk to the foo service in Consul:

$ curl foo.default.svc.cluster.local
<response from foo service>

This request will route to the Consul Mesh Gateway and then to foo’s sidecar proxy. The sidecar proxy decrypts the traffic and then routes it to the foo service.

Conclusion

Service mesh federation between Consul Enterprise and NSX-SM allows traffic to flow securely beyond the boundary of each individual mesh, enabling flexibility and interoperability. If you would like to learn more about Consul Enterprise’s integration with NSX-SM, please reach out to our sales representatives to schedule a demo.

For more information about this and other features of HashiCorp Consul, please visit: https://www.hashicorp.com/products/consul.

from Hashicorp Blog

AWS Cloud Gets New Software Defined Perimeter Offering


News


Israeli security specialist Safe-T has unveiled a Software Defined Perimeter (SDP) offering for the Amazon Web Services Inc. (AWS) cloud.

Now available on the AWS Marketplace is Safe-T Software Defined Perimeter 3.51, letting users securely access cloud services without having to use a virtual private network (VPN).

Supporting protocols like HTTP/S, RDH5 and WebDAV, SDP provides remote users and partners with secure access to internal services, the company said, including Web, Remote Desktop Protocol (RDP), NTFS, e-mail and so on.

The company, a “Zero Trust provider” that describes itself as a vendor of Secure Access solutions for on-premise and hybrid cloud environments, said its offering comes with the flexibility of a “Bring Your Own License” (BYOL) model.

“More and more organizations are moving towards the cloud to cut costs and increase agility, selecting AWS Marketplace as their cloud infrastructure,” said company exec Eitan Bremler. “With the introduction of our SDP solution on the AWS Marketplace, we can now reach and help more businesses to simply secure their cloud, data, and application access with easy and quick purchase subscriptions to Safe-T’s SDP.”

In addition to obviating the need for VPN access, the company’s Web site says SDP provides capabilities including:

  • Firewall is constantly in a deny-all state, no open port (inbound or outbound) is required for access.
  • Supports a variety of applications — HTTP/S, SMTP, SFTP, SSH, APIs, RDH5, WebDAV.
  • Bi-directional traffic is handled on outbound connections from the LAN to the outside world.
  • Defines new reverse-access rules on-demand.
  • Allows client-less access to data, services, networks and APIs.
  • Robust partner authentication options.
  • Performs SSL decryption in a secure zone.
  • Scans all incoming traffic using the organization’s security solutions.
  • Hides DMZ components which can be hacked and utilized to access the network.
  • Detects and reports on the presence of bots and malicious insiders for quick event resolution.
  • Provides only direct application/service access, thereby blocking network access.

“The addition of the AWS Marketplace listing, positions Safe-T as the Zero Trust vendor with the ability to provide the widest range of SDP solutions in the market, with a fit to all types of organizations — those that prefer to purchase and deploy On-Premise solutions, those organizations that consume Software-as-a-Service (SaaS) solutions, and now also consumers of Infrastructure-as-a-Service (IaaS) solutions,” the company said.

About the Author

David Ramel is an editor and writer for Converge360.

from News

TechTalks With Tom Smith: VMworld Hybrid Cloud and Multicloud Conversations


Toys having conversation

Take a seat and see what everyone’s talking about

In addition to meeting with VMware executives and covering the keynotes, I was able to meet with a number of other IT executives and companies during the conference to learn what they and their companies were doing to make developers’ lives simpler and easier. 

You may also enjoy:  TechTalks With Tom Smith: VMware Courts Developers

Here’s what I learned:

Jim Souders, CEO of Adaptiva, announced the VMware edition of its OneSite peer-to-peer content distribution product, which works with VMware’s Workspace ONE to distribute software from the cloud across enterprise endpoints with speed and scale. According to Jim, this will help developers automate vulnerable processes, eliminating the need for developers to build scripts and tools so they can focus on DevOps rather than security.

Shashi Kiran, Chief Marketing Officer of Aryaka, shared key findings from their State of the WAN report, which surveyed more than 800 network and IT practitioners worldwide: nearly 50% of enterprises are implementing a multi-cloud strategy and leveraging 5+ providers and SaaS apps, with 15% having more than 1,000 apps deployed. The implication for developers is to develop with multi-cloud in mind.

Tom Barsi, SVP, Business & Corporate Development, Carbon Black — Carbon Black currently monitors 15 million endpoints, but when it’s incorporated into VMware’s vSphere and Workspace ONE, the number of endpoints will grow by an order of magnitude for even greater endpoint detection and response, freeing developers to focus on building applications without concern for endpoint security.

Don Foster, Senior Director Worldwide Solutions Marketing, Commvault — with cloud migrations, ransomware attacks, privacy regulations and a multi-cloud world, Commvault is helping clients to be “more than ready” for the new era of IT. They are also humanizing the company as they hosted a Data Therapy Dog Park during the conference.

Jonathan Ellis, CTO & Co-founder, and Kathryn Erickson, Director of Strategic Partnerships, DataStax —  spoke on the ease of deploying new applications in a hyper-converged infrastructure (HCI) with the key benefits of management, availability, better security, and consistent operations across on-premises and cloud environments. If DSE/Cassandra is on K8s or VMware, developers can access additional resources without going through DBAs or the infrastructure team.

Scott Johnson, Chief Product Officer, Docker — continuing to give developers a choice of languages and platforms while providing operations a unified pipeline. Compose helps developers assemble multiple container operations without rewriting code. Docker also helps improve the security of K8s deployments by installing with smart defaults along with image scanning and digital signatures.

Mark Jamensky, E.V.P. Products, Embotics — cloud management platforms help with DevOps pipelines and accelerate application development by providing automation, governance, and speed under control. It enables developers to go fast with guardrails.

Duan van der Westhuizen, V.P. of Marketing, Faction — shared the key findings of their first VMware Cloud on AWS survey. 29% of respondents plan to start running or increase workloads in the next 12 months. Tech services, financial services, education, and healthcare are the industries showing the most interest. The key drivers are scalability, strategic IT initiative, and cost savings and the top use cases are data center extension, disaster recovery, and cloud migration.

Ambuj Kumar, CEO and Nishank Vaish, Product Management, Fortanix — discussed developers’ dislike of hardware security modules (HSM) and how self-defending key management services (SDKMS) can securely generate, store, and use cryptographic keys and certificates, as well as secrets like passwords, API keys, tokens, and blobs of data, to achieve a consistent level of high performance.

Sam Kumasamy, Senior Product Marketing Manager, Gigamon — owns 38% of the network visibility market share. The ability to see all traffic across virtual and physical environments provides visibility and analytics for digital apps and services in any cloud, container, K8s cluster, or Docker deployment, enabling applications to run fast while remaining secure.

Stan Zaffos, Senior V.P. Product Marketing and Gregory Touretsky, Technical Product Manager and Solutions Architect, Infinidat — are helping clients achieve multi-petabyte scale as more applications on more devices generate more data. They are building out their developer portal to enable developers to leverage APIs and share code, use cases, and solutions.

Rich Petersen, President/Co-founder, JetStream Software — is helping move virtual machines to the cloud for running applications, storage systems, and disaster recovery platforms. The I/O filters provide the flexibility to copy data to a physical device for shipment and to send only newly-written data over the network, resulting in near-zero recovery time objective (RTO) and recovery point objective (RPO).

Josh Epstein, CMO, Kaminario — is providing a Storage-as-a-Service (STaaS) platform enabling developers to think about stored service as a box with shared storage arrays with data reduction/compression, deduplication, data mobility, data replication, orchestration, and the ability to spin up storages instances using K8s at a traditional data center or in a public cloud with the public API framework.

Kevin Deierling, Vice President, Marketing, Mellanox Technologies — provides remote direct memory access (RDMA) networking solutions to enable virtualized machine learning (ML) solutions that achieve higher GPU use and efficiency. Hardware compute accelerators boosts app performance in virtualized deployments.

Adam Hicks, Senior Solution Architect, Morpheus — provides a multi-cloud management platform for hybrid IT and DevOps automation for unified multi-cloud container management. The platform reduces the number of tools developers and operations need to automate development and deployment. Compliance is achieved with role-based access, approvals, quotas, and policy enforcement. Agile DevOps with self-service provisioning with APIs. Manage day-2 operations: scaling, logging, monitoring, backup, and migration.

Ingo Fuchs, Chief Technologist, Cloud and DevOps, NetApp —  as new cloud environments promote greater collaboration between developers and IT operations, NetApp is enabling the cloning of standardized developer workspaces so developers can get up and running quickly, new workspace versions can be rolled out simultaneously, standard datasets can be made available for DevTest, and developers are able to go back to a previous state with a single API call if they need to correct a mistake. 

Kamesh Pemmaraju, Head of Product Marketing, Platform9 — provides a SaaS-managed Hybrid Cloud solution that delivers fully automated day-2 operations with a 99.9% SLA for K8s, bare-metal, and VM-based environments. Developers get a public-cloud experience with databases, data services, open-source frameworks, Spark, and more deployed with a single click.

Mike Condy, System Consultant, Quest — is focusing on monitoring, operations, and cloud from an IT management perspective. They provide optimized support for K8s and Swarm and enable clients to compare on-premise to cloud to identify the optimal placement of workloads from a performance and cost perspective. They enable clients to see how containers are performing, interacting, and scaling up or down with heat maps and optimization recommendations.

Peter FitzGibbon, V.P. Product Alliances, Rackspace — consistent with the move to hybrid-cloud environments, Rackspace is supporting customers and developers by providing new offerings around managed VMware Cloud on AWS, K8s and container services, cloud-native support, managed security, and integration and API management assessment. Peter also felt that the Tanzu announcement would help to bring technology and people together.

Roshan Kumar, Senior Product Marketing Manager, Redis Labs — developers tend to be the first adopters of Redis for the cloud or Docker since the database can hold data in specific structures with no objects or relational standing. Data is always in state. The database works with 62 programming languages. It addresses DevOps and Ops concerns for backup and disaster recovery with high availability, reliability, and scalability. While AWS has had more than 2,000 node failures, no data has been lost on Redis due to their primary and backup servers.


Chris Wahl, Chief Technologist and Rebecca Fitzhugh, Principal Technologist, Rubrik — have focused on developers for the last three to four years and provide an environment to serve the needs of different application environments. Rubrik Build is an open-source community helping to build the future of cloud data management and supporting a programmatic approach to automation for developers and infrastructure engineers with an API-first architecture.

Mihir Shah, CEO and Surya Varanasi, CTO, StorCentric for Nexsan — provide purpose-built storage for backup, databases, and secure archive to ensure compliance, protect against ransomware and hackers, and provide litigation support. Data integrity is maintained by using a combination of two cryptographic hashes for unique identification, and developers can move seamlessly to the cloud and automate data compliance using APIs.

Mario Blandini, CMO & Chief Evangelist, Tintri by DDN — Virtualization is the new normal. More than 75 percent of new workloads are now virtualized, and companies are beginning to make significant investments in virtual desktop infrastructure (VDI). Tintri enables developers to manage their virtual machines through automation, isolate traffic between virtual machines, and use self-service automation in the development phase.

Danny Allen, V.P., Product Strategy, Veeam — Veeam is moving to a subscription model to provide cloud data availability and backup, which is critical for applications and containers. The platform offers the agility to move workloads from infrastructure to cloud-agnostic portable data storage, accelerates backup of business data to containers, and enables DevOps to iterate on existing workloads with the ability to run scripts that mask data.

Steve Athanas, President, VMware User Group (VMUG) — The active community of more than 150,000 members works to connect users, share stories, and solve problems. Steve says, “If you have VMware in your stack, there’s probably a VMUG member in your company.” He would love for more developers to become involved in VMUG to help others understand how dev and ops can work better together, address each other’s pain, and solve business problems.

Nelson Nahum, Co-founder & CEO, and Greg Newman, V.P., Marketing, Zadara — offers NVMe-as-a-Service to make storage as simple as possible. It includes optimized on-premises object storage-as-a-service for big data analytics, AI/ML, and video-on-demand. Developers can try Zadara risk-free.

Further Reading

3 Pitfalls Everyone Should Avoid with Hybrid Multicloud (Part 1)

Solving for Endpoint Compliance in a Cloud-First Landscape

from DZone Cloud Zone

Three Features to Control and Manage AWS API Gateway Invocations


faucet and bucket

Without management, the bucket quickly overfills

This post is part of my blog post series about AWS API Gateway and Lambda functions, but this time the focus is solely on API Gateway. So, after getting a working Lambda function behind AWS API Gateway, the next step is to ensure the Lambda function is invoked under control.

You may also enjoy:  Building Microservices: Using an API Gateway

The Gist

This blog post describes three features that help control access to an API Gateway method and reduce the number of Lambda function invocations:

  1. Defining Throttling
  2. Managing API Keys
  3. Using Cache

1. Throttling

A throttling threshold is a combination of two values: the rate (the number of calls per second) and the burst (the number of requests that can be absorbed at once). API Gateway throttles requests using the token bucket algorithm, where a token represents a single request.

To depict this algorithm simply, you can imagine a hose pouring water into a bucket on one side and a pump drawing water from the same bucket on the other side, while the bucket's level is checked at fixed intervals.

In this allegory, the hose flow is equivalent to the number of incoming requests, and the bucket size is the burst limit.

Setting throttling parameters: rate and burst

In the image above, the maximum rate is 500 requests per second and the bucket depth is 100 requests, so up to 100 requests can be held at any given time. After a request has been processed, it leaves the bucket and makes room for other requests; the processing rate sets the pace. If the bucket is full, new requests are handled in the next cycle (the next second). However, when the incoming flow consistently exceeds the processing rate, requests are rejected and an error is returned.
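
To make the algorithm concrete, here is a minimal, self-contained Python sketch of a token bucket (purely illustrative, not AWS code); the rate and burst values mirror the example above:

import time

class TokenBucket:
    """Illustrative token bucket: tokens refill at 'rate' per second, capped at 'burst'."""

    def __init__(self, rate, burst):
        self.rate = rate              # steady-state requests per second
        self.burst = burst            # bucket depth (maximum burst)
        self.tokens = burst           # start with a full bucket
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # refill according to elapsed time, never exceeding the bucket depth
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1          # a token is a single request
            return True
        return False                  # bucket empty: the request is rejected (HTTP 429)

bucket = TokenBucket(rate=500, burst=100)
print(bucket.allow())   # True while tokens remain, False once the burst is exhausted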

A client receives an error when its calls exceed the threshold:

The remote server returned an error: (429)

AWS allows defining the rate and the burst parameters in two places: the Stage and the Usage plan.

Stage Definitions

By default, a definition at the stage level propagates to all the methods under that stage. Nevertheless, you can override this definition per method by selecting "Override for this method":

Overriding the throttling parameters
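
If you prefer scripting these settings over the console, a hedged boto3 sketch is shown below; the REST API ID, stage name, and resource path are placeholders, and the patch paths are my assumption of the Stage resource conventions, so verify them against your own account:

import boto3

apigw = boto3.client("apigateway")

apigw.update_stage(
    restApiId="a1b2c3d4e5",    # placeholder REST API ID
    stageName="test",          # placeholder stage name
    patchOperations=[
        # stage-level defaults for every method
        {"op": "replace", "path": "/*/*/throttling/rateLimit", "value": "500"},
        {"op": "replace", "path": "/*/*/throttling/burstLimit", "value": "100"},
        # per-method override for GET /items ('/' in the resource path is escaped as '~1')
        {"op": "replace", "path": "/~1items/GET/throttling/rateLimit", "value": "100"},
        {"op": "replace", "path": "/~1items/GET/throttling/burstLimit", "value": "20"},
    ],
)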

Setting the Throttling in a Usage Plan

Before addressing the throttling parameters in a usage plan, let's briefly describe the usage plan concept. In short, a usage plan is a set of rules that operates as a barrier between the client and the target of the API Gateway (e.g. a Lambda function). This set of rules can be applied to one or more APIs and stages.

Two API definitions under one usage plan

Besides the throttling definition, a usage plan can set the quota for incoming requests, which is the number of requests per period (day, week, month). The quota setting is more for a business case than for load control purposes, as it limits the absolute number of requests per period.

The quota feature helps avoid a situation where an API Gateway method is flooded with unwanted traffic under the radar. It is recommended in development environments to keep your budget from blowing up. In production environments, however, this feature can be risky, as it may create a denial of service for legitimate requests.

Quota definition

As mentioned, the advantage of a usage plan is the granularity it provides. The throttling parameters for a method (e.g. GET, POST) can be defined in a single location rather than in each stage separately. This simple configuration prevents confusion and standardises settings across different stages and methods.

Define throttling for each method
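
The same definitions can be created programmatically. A minimal boto3 sketch (the API ID and stage name are placeholders) that bundles throttling and a quota into one usage plan might look like this:

import boto3

apigw = boto3.client("apigateway")

plan = apigw.create_usage_plan(
    name="standard-plan",
    description="Shared throttling and quota for the attached APIs",
    apiStages=[{"apiId": "a1b2c3d4e5", "stage": "test"}],   # placeholder API and stage
    throttle={"rateLimit": 500.0, "burstLimit": 100},       # requests per second and burst
    quota={"limit": 10000, "period": "MONTH"},              # absolute cap per period
)
print(plan["id"])   # keep the plan ID for attaching API keys later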

Which Has Precedence?

So, we saw that the throttling definition can be configured in three different places. Which one wins?

When all three are configured (stage level, usage plan level, and an individual method under a usage plan), precedence is granted to the lowest value. For instance, if the rate defined in a usage plan is lower than the rate defined in the stage, the usage plan's value supersedes the stage's definition.
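
In other words, the effective limit is simply the strictest (lowest) of the configured values, roughly:

def effective_rate(stage_rate, usage_plan_rate=None, method_rate=None):
    """The strictest (lowest) configured rate wins; None means 'not configured'."""
    configured = [r for r in (stage_rate, usage_plan_rate, method_rate) if r is not None]
    return min(configured)

print(effective_rate(stage_rate=500, usage_plan_rate=200))   # -> 200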

On top of simplifying the throttling management by allowing a unified configuration, a usage plan has another useful feature that facilitates controlling the requests flow — setting API Keys, which leads us to the next section.

2. API Key

The API Key is a string used to authorize calls in a low-level way; the server expects to receive this authorization string as part of each request’s header. Unless the header includes a valid API Key, the API Gateway rejects the request. It is a simple way to validate requests and distinguish between authorised and arbitrary requests.

Setting the API key is done at the server level; hence, you can disable or delete an existing API key and cause all incoming requests using this key to fail (error 403). Practically, this forces clients to align with a new API key.

Generating an API Key

The API Key can be either autogenerated by AWS or custom-defined by the user.

Another option is to import one or more keys in CSV format and assign them to one or more existing usage plans. The mandatory fields are key and name (column names are not case-sensitive):
API Key creation
Name,key,description,Enabled,usageplanIds
MyApiKeyName,apikeyuniqueForMe,A descriptive information,TRUE,c7y23b
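
Both options can be scripted as well. The boto3 sketch below, with placeholder values, creates one key directly and imports the CSV example above:

import boto3

apigw = boto3.client("apigateway")

# Option 1: let AWS generate the key value (pass value="..." to customize it instead)
key = apigw.create_api_key(name="MyApiKeyName", enabled=True)
print(key["id"], key["value"])

# Option 2: import keys from CSV, including their usage-plan association
csv_body = (
    "Name,key,description,Enabled,usageplanIds\n"
    "MyApiKeyName,apikeyuniqueForMe,A descriptive information,TRUE,c7y23b\n"
)
apigw.import_api_keys(body=csv_body.encode("utf-8"), format="csv", failOnWarnings=True)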

Associating the API Key With a Usage Plan

To be effective, an API Key must be associated with one or more usage plans; otherwise, it is not attached to any API. Once attached, the API key applies to every API under the usage plan.

Usage plan

Moreover, a usage plan can have one or more keys. When an API key is disabled, it stops working across all the usage plans it belongs to.
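
Attaching a key to a usage plan is a single call; a minimal boto3 sketch with placeholder IDs:

import boto3

apigw = boto3.client("apigateway")

apigw.create_usage_plan_key(
    usagePlanId="c7y23b",          # the usage plan ID (placeholder from the CSV example)
    keyId="k1l2m3n4o5",            # the API key ID returned by create_api_key (placeholder)
    keyType="API_KEY",
)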

The last step to enforce the usage of an API Key is enabling this feature at the resource level, under the Method Request definition.

Enforcing API Key in the Resource level
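
This switch can also be flipped programmatically; a hedged boto3 sketch (the REST API and resource IDs are placeholders):

import boto3

apigw = boto3.client("apigateway")

apigw.update_method(
    restApiId="a1b2c3d4e5",    # placeholder REST API ID
    resourceId="res123",       # placeholder resource ID (e.g. the /items resource)
    httpMethod="GET",
    patchOperations=[
        # require a valid x-api-key header on every call to this method
        {"op": "replace", "path": "/apiKeyRequired", "value": "true"},
    ],
)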

The API Key in Action

An example of the API key passed as part of the header (screenshot from Postman):

API in the header

In the example above, the request will yield a 403 error as the API Key is not part of the header (the checkbox is not ticked):

{ "message": "Forbidden" }

When calling the service with a valid API key, the response is as expected: 200.
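
Outside Postman, the same call is a one-liner; API Gateway expects the key in the x-api-key header (the URL and key below are placeholders):

import requests

url = "https://a1b2c3d4e5.execute-api.us-west-2.amazonaws.com/test/items"   # placeholder URL
headers = {"x-api-key": "apikeyuniqueForMe"}   # a key attached to the API's usage plan

response = requests.get(url, headers=headers)
print(response.status_code)   # 200 with a valid key, 403 ("Forbidden") without one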

I find the API Key feature useful for filtering unsolicited requests. When developing an API that is exposed to the world, it can ensure only authorized calls are handled. However, it is not a proper authorization mechanism, since the key is exposed in the header; for real authorization, use authentication services like AWS Cognito or standards like OAuth.

3. Caching

Although caching does not restrict access, it functions as a barrier between the request and its execution; it can reduce the invocations of a Lambda function or other API Gateway end-target.

The caching properties are defined at the stage level and apply to all resources, but they can be overridden for each method of the stage as well. This enables a different cache configuration per method.
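
For reference, the stage-level cache and a per-method override can also be set with boto3. This is a sketch under the assumption that the patch paths below match the Stage resource; the identifiers and resource path are placeholders:

import boto3

apigw = boto3.client("apigateway")

apigw.update_stage(
    restApiId="a1b2c3d4e5",    # placeholder REST API ID
    stageName="test",          # placeholder stage name
    patchOperations=[
        # enable and size the stage cache
        {"op": "replace", "path": "/cacheClusterEnabled", "value": "true"},
        {"op": "replace", "path": "/cacheClusterSize", "value": "0.5"},
        # per-method override: enable caching and set the TTL for GET /items
        {"op": "replace", "path": "/~1items/GET/caching/enabled", "value": "true"},
        {"op": "replace", "path": "/~1items/GET/caching/ttlInSeconds", "value": "300"},
    ],
)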

Setting the Cache Parameters

The caching mechanism is based on a cache key, which is a combination of all cache parameters and their values. For example, assuming the param1 parameter is assigned as a cache parameter, the calls method?param1=111&param2=222 and method?param1=111&param2=333 will yield the same response.

Another interesting example is enabling the API cache without setting any cache parameter. In this case, no parameter is part of the cache key, and thus method?param1=111&param2=222 will return the same response as method?param1=444&param2=333.

The cache parameters are defined for each API operation (GET, POST, PUT); both header and query string can include cache parameters.

Cache Parameters: query string and header
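
Cache key parameters are declared on the integration. The hedged boto3 sketch below, in which every identifier and the Lambda URI are placeholders, marks param1 as the only cache key; note that re-running put_integration replaces the whole integration definition, and the referenced parameter should also be declared on the Method Request:

import boto3

apigw = boto3.client("apigateway")

apigw.put_integration(
    restApiId="a1b2c3d4e5",    # placeholder REST API ID
    resourceId="res123",       # placeholder resource ID for /items
    httpMethod="GET",
    type="AWS_PROXY",
    integrationHttpMethod="POST",
    # placeholder Lambda invocation URI
    uri=("arn:aws:apigateway:us-west-2:lambda:path/2015-03-31/functions/"
         "arn:aws:lambda:us-west-2:123456789012:function:my-function/invocations"),
    cacheNamespace="res123",
    # only param1 participates in the cache key; param2 is ignored
    cacheKeyParameters=["method.request.querystring.param1"],
)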

Caching and CORS

Since the API Gateway populates the cache based on the cache parameters, different clients can send the same parameters and receive the same cached response. This behavior makes sense, but there is a nuance when handling CORS responses.

A CORS response should include the Access-Control-Allow-Origin header; the browser that executed the call then validates whether the origin is allowed to read the response. If the caller's origin is absent from the permitted origins, the browser raises an error (usually a JavaScript error).

An edge case can occur if another client has already populated a cache entry using the same cache key as our CORS call. In that case, the cached response may lack the essential header value, and the response will be rejected by the client.

To overcome this scenario, add a header parameter named Origin as a cache parameter; this forces the cache key to include the origin. With that, each origin populates its own cache entry, bypassing the edge case described above. For further reading about CORS calls, click here.
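
Continuing the sketch above, varying the cache by origin would just mean extending the cache key parameter list (again, an assumption to verify in your own setup):

# extending the earlier put_integration sketch: vary cache entries by calling origin
cache_key_parameters = [
    "method.request.querystring.param1",
    "method.request.header.Origin",    # the CORS caller's origin becomes part of the key
]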

Client Cache-Control: What? How?

An interesting nuance about controlling the cache: a client can bypass the cache mechanism by setting the request header Cache-Control: max-age=0. We can decide how API Gateway responds to this header. There are three options:

Defining the response to cache invalidation

The first option ignores the Cache-Control key but returns a warning in the response header:

199 Cache-control headers were ignored because the caller was unauthorized.

The second option ignores the header without adding any warning, while the third option rejects the request with a 403 error:

403 error

If you want to strictly reject any misalignment and rigidly enforce your cache policy, the last option is recommended, as it raises an error. The error is logged and can be monitored later on.
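
If you would rather configure this behavior from code than from the console, my assumption is that the strategy is a per-method stage setting; a hedged boto3 sketch with placeholder identifiers:

import boto3

apigw = boto3.client("apigateway")

apigw.update_stage(
    restApiId="a1b2c3d4e5",    # placeholder REST API ID
    stageName="test",          # placeholder stage name
    patchOperations=[
        # reject unauthorized Cache-Control: max-age=0 requests for GET /items with a 403
        {
            "op": "replace",
            "path": "/~1items/GET/caching/unauthorizedCacheControlHeaderStrategy",
            "value": "FAIL_WITH_403",
        },
        # other values: SUCCEED_WITH_RESPONSE_HEADER, SUCCEED_WITHOUT_RESPONSE_HEADER
    ],
)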

There is another, more rigid way to control cache invalidation: the underlying policy layer. Like any other policy, it allows low-level control; you can allow or deny cache invalidation for a specific user/role or for anyone. All you have to do is edit the resource policy, as the example below demonstrates:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Deny",
            "Principal": "*",
            "Action": "execute-api:InvalidateCache",
            "Resource":
                "arn:aws:execute-api:us-west-2:933711111590:v4sxxxxc/*/GET/test"
        }
    ]
}

In this case, the server will throw a 403 error with a detailed explanation:

User: anonymous is not authorized to perform: execute-api:Invoke on resource: arn:aws:execute-api:us-west-2:********1590:v4smxxxbc/test/GET/test

Lastly, What About Monitoring?

We have covered three techniques to reduce the number of invocations and filter them. Now, how do we measure that? The answer is CloudWatch.

Measuring the Number of Errors

The CloudWatch metric 4XXError displays the number of errors over a period. These errors can originate from exceeding the throttling limits or from a missing API key. They can be analysed in the CloudWatch Logs Insights tool (see the example of its SQL-like query below) or by going through the raw logs themselves.

fields @timestamp, @message
| filter @message like '403'
| sort @timestamp desc
| limit 200
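
The same query can also be run programmatically with the CloudWatch Logs Insights API; a minimal sketch, assuming the execution log group name below (a placeholder) and a one-hour window:

import time
import boto3

logs = boto3.client("logs")

QUERY = (
    "fields @timestamp, @message\n"
    "| filter @message like '403'\n"
    "| sort @timestamp desc\n"
    "| limit 200"
)

# placeholder log group; execution logs typically follow API-Gateway-Execution-Logs_{api-id}/{stage}
query = logs.start_query(
    logGroupName="API-Gateway-Execution-Logs_a1b2c3d4e5/test",
    startTime=int(time.time()) - 3600,   # the last hour
    endTime=int(time.time()),
    queryString=QUERY,
)

# poll until the query finishes, then print the matching log lines
result = logs.get_query_results(queryId=query["queryId"])
while result["status"] in ("Scheduled", "Running"):
    time.sleep(1)
    result = logs.get_query_results(queryId=query["queryId"])
print(result["results"])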

Monitoring the Cache Usage

Examining the metrics provides tangible evidence of whether using the cache pays off. CloudWatch exposes two metrics that measure cache usage per API or stage. These metrics can reveal whether the cache parameters should be changed, for example, by extending its capacity or TTL (time-to-live); the graph below presents the cache hits and misses:

CloudWatch cache metrics diagram

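
These two metrics, CacheHitCount and CacheMissCount, live in the AWS/ApiGateway namespace. Here is a small sketch that pulls both and computes a rough hit ratio; the API name and stage are placeholders:

from datetime import datetime, timedelta
import boto3

cw = boto3.client("cloudwatch")

def metric_sum(name):
    # sum one API Gateway cache metric over the last 24 hours
    stats = cw.get_metric_statistics(
        Namespace="AWS/ApiGateway",
        MetricName=name,
        Dimensions=[{"Name": "ApiName", "Value": "my-api"},   # placeholder API name
                    {"Name": "Stage", "Value": "test"}],      # placeholder stage
        StartTime=datetime.utcnow() - timedelta(hours=24),
        EndTime=datetime.utcnow(),
        Period=3600,
        Statistics=["Sum"],
    )
    return sum(point["Sum"] for point in stats["Datapoints"])

hits = metric_sum("CacheHitCount")
misses = metric_sum("CacheMissCount")
print(f"cache hit ratio: {hits / (hits + misses):.0%}" if hits + misses else "no cached traffic")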

Analyzing the metrics and raising alerts when specific thresholds are breached is recommended to ensure the parameters and configuration are appropriately tuned.
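
As a starting point for such alerts, here is a hedged sketch of a CloudWatch alarm on the 4XXError metric; the threshold, API name, and SNS topic are placeholders to tune for your own traffic:

import boto3

cw = boto3.client("cloudwatch")

cw.put_metric_alarm(
    AlarmName="api-gateway-4xx-spike",
    Namespace="AWS/ApiGateway",
    MetricName="4XXError",
    Dimensions=[{"Name": "ApiName", "Value": "my-api"},   # placeholder API name
                {"Name": "Stage", "Value": "test"}],      # placeholder stage
    Statistic="Sum",
    Period=300,                        # evaluate in 5-minute buckets
    EvaluationPeriods=1,
    Threshold=50,                      # placeholder; tune to your traffic profile
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-west-2:123456789012:ops-alerts"],  # placeholder SNS topic
    TreatMissingData="notBreaching",
)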

Wrapping Up

Well, that’s it for now. I hope you enjoyed reading this post and find these API Gateway features useful for filtering invocations and avoiding unnecessary calls.

Until the next time, keep on clouding ⛅.

— Lior

Further Reading 

Rule Your Microservices With an API Gateway: Part I

API Gateways Are Going Through an Identity Crisis

from DZone Cloud Zone