Tag: Open Source

Using GraalVM to Build Minimal Docker Images for Java Applications

Optimizing the size of Docker images has several benefits. One of these is faster deployment times, which is very important if your application needs to scale out quickly to respond to an unexpected traffic burst. In this post, I’ll show you an interesting approach for optimizing Docker images for Java applications, which also helps to improve startup times. The examples used are based on another post that I published several months ago, Reactive Microservices Architecture on AWS.

How does the Java application work?

The Java application is implemented using Java 11, with Vert.x 3.6 as the main framework. Vert.x is an event-driven, reactive, non-blocking, polyglot framework to implement microservices. It runs on the Java virtual machine (JVM) by using the low-level I/O library Netty. The application consists of five different verticles covering different aspects of the business logic.

To build the application, I used Maven with different profiles. The first profile (the default) uses a “standard” build to create an Uber JAR – a self-contained application with all of its dependencies. The second profile uses GraalVM to compile a native image. The standard build uses jlink to build a custom Java runtime with a limited set of modules. (jlink is a command-line tool that links sets of modules and their transitive dependencies to create a custom runtime image.)
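
For reference, the two profiles can be exercised locally roughly as follows; the native profile name matches the pom.xml snippet shown later in this post, and in the pipeline described here the native build actually runs inside a Docker multi-stage build:

# default profile: builds the self-contained Uber JAR
mvn clean package

# GraalVM profile: compiles a native image (requires a local GraalVM installation)
mvn clean package -Pnative-image-fargate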

Build a custom JDK distribution using jlink

An interesting feature of JDK 9 is the Java Platform Module System (JPMS), also known as Project Jigsaw, which was developed to build modular Java runtimes that include only the necessary dependencies. For this application, you need only a limited set of modules, which you can specify during the build. To prepare for your build, download Amazon Corretto 11, unpack it, and delete unnecessary files such as the src.zip file that ships with the JDK. To make the process easier to follow, the following sections use a Docker multi-stage build and cover the different parts of the build separately.
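
If you are unsure which modules your application actually needs, the jdeps tool that ships with the JDK can suggest a starting list. The following is only a sketch run against the Uber JAR from the standard build (flags may vary slightly by JDK version); the module list used in the Dockerfile below was curated for this specific application:

# ask jdeps for the modules required by the fat JAR (sketch)
/opt/jdk/bin/jdeps \
    --multi-release 11 \
    --ignore-missing-deps \
    --print-module-deps \
    target/reactive-vertx-1.5-fat.jar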

Step 1: Build a custom runtime module

In the first step of the build process, build a custom runtime with just a few modules necessary to run your application, and then write the result to /opt/minimal:

FROM debian:9-slim AS builder
LABEL maintainer="Sascha Möllering <[email protected]>"

# First step: build java runtime module
RUN set -ex && \
    apt-get update && apt-get install -y wget unzip && \
    wget https://d3pxv6yz143wms.cloudfront.net/11.0.3.7.1/amazon-corretto-11.0.3.7.1-linux-x64.tar.gz -nv && \
    mkdir -p /opt/jdk && \
    tar zxvf amazon-corretto-11.0.3.7.1-linux-x64.tar.gz -C /opt/jdk --strip-components=1 && \
    rm amazon-corretto-11.0.3.7.1-linux-x64.tar.gz && \
    rm /opt/jdk/lib/src.zip

RUN /opt/jdk/bin/jlink \
    --module-path /opt/jdk/jmods \
    --verbose \
    --add-modules java.base,java.logging,java.naming,java.net.http,java.se,java.security.jgss,java.security.sasl,jdk.aot,jdk.attach,jdk.compiler,jdk.crypto.cryptoki,jdk.crypto.ec,jdk.internal.ed,jdk.internal.le,jdk.internal.opt,jdk.naming.dns,jdk.net,jdk.security.auth,jdk.security.jgss,jdk.unsupported,jdk.zipfs \
    --output /opt/jdk-minimal \
    --compress 2 \
    --no-header-files

Step 2: Copy the custom runtime to the target image

Next, copy the freshly created custom runtime from the build image to the actual target image. In this step, you again use debian:9-slim as the base image. After you copy the minimal runtime, copy your Java application to /opt, add a Docker health check, and start the Java process:

FROM debian:9-slim
LABEL maintainer="Sascha Möllering <[email protected]>"

COPY --from=builder /opt/jdk-minimal /opt/jdk-minimal

ENV JAVA_HOME=/opt/jdk-minimal
ENV PATH="$PATH:$JAVA_HOME/bin"

RUN mkdir /opt/app && apt-get update && apt-get install curl -y
COPY target/reactive-vertx-1.5-fat.jar /opt/app

HEALTHCHECK --interval=5s --timeout=3s --retries=3 \
  CMD curl -f http://localhost:8080/health/check || exit 1

EXPOSE 8080

CMD ["java", "-server", "-XX:+DoEscapeAnalysis", "-XX:+UseStringDeduplication", \
        "-XX:+UseCompressedOops", "-XX:+UseG1GC", \
        "-jar", "opt/app/reactive-vertx-1.5-fat.jar"]

Compile Java to native using GraalVM

GraalVM is an open source, high-performance polyglot virtual machine from Oracle. You can use it to compile native images ahead of time to improve startup performance and to reduce the memory consumption and file size of JVM-based applications. The framework that enables ahead-of-time compilation is called SubstrateVM.
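
Outside of Maven, the same compilation can also be run directly with the native-image tool that ships with GraalVM. The following is a sketch only; the flags mirror the buildArgs used in the Maven profile below, and it assumes the Uber JAR from the standard build carries a Main-Class manifest entry:

# hypothetical direct invocation of the GraalVM native-image tool
native-image \
    --enable-all-security-services \
    -H:+ReportUnsupportedElementsAtRuntime \
    --allow-incomplete-classpath \
    -jar target/reactive-vertx-1.5-fat.jar \
    reactive-vertx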

In the following section, you can see the relevant snippet of the pom.xml file. Create an additional Maven profile called native-image-fargate that uses the native-image-maven-plugin to compile the source code to a native image during the package phase:

<profile>
    <id>native-image-fargate</id>
    <build>
        <plugins>
            <plugin>
                <groupId>com.oracle.substratevm</groupId>
                <artifactId>native-image-maven-plugin</artifactId>
                <version>${graal.version}</version>
                <executions>
                    <execution>
                        <goals>
                            <goal>native-image</goal>
                        </goals>
                        <phase>package</phase>
                    </execution>
                </executions>
                <configuration>
                    <imageName>${project.artifactId}</imageName>
                    <mainClass>${vertx.verticle}</mainClass>
                    <buildArgs>--enable-all-security-services -H:+ReportUnsupportedElementsAtRuntime --allow-incomplete-classpath</buildArgs>
                </configuration>
            </plugin>
        </plugins>
    </build>
</profile>

Docker multi-stage build

Your goal is to define a reproducible build environment that needs as few dependencies as possible. To achieve that, create a self-contained build process that uses a Docker multi-stage build.

An interesting aspect of multi-stage builds is that you can use multiple FROM statements in your Dockerfile. Each FROM instruction can use a different base image, and begins a new stage of the build. You can pick the necessary files and copy them from one stage to another, which is great because that allows you to limit the number of files you have to copy. Use this feature to build your application in one stage and copy your compiled artifact and additional files to your target image.

In the following section, you can see the two different stages of the build. Your Dockerfile (which is called Dockerfile-native) is split into two parts: the builder image and the target image.

The first code example shows the builder image, which is based on graalvm-ce. During the build, you install Maven, set some environment variables, and copy the necessary files into the Docker image. For the build, you need the source code and the pom.xml file. After the files are copied into the Docker image, the application is compiled to an executable binary using the native-image-fargate profile. Of course, it would also be possible to start from the Maven base image and install GraalVM on top of it (the build process would then look slightly different).

FROM oracle/graalvm-ce:1.0.0-rc16 AS build-aot

RUN yum update -y
RUN yum install wget -y
RUN wget https://www-eu.apache.org/dist/maven/maven-3/3.6.1/binaries/apache-maven-3.6.1-bin.tar.gz -P /tmp
RUN tar xf /tmp/apache-maven-3.6.1-bin.tar.gz -C /opt
RUN ln -s /opt/apache-maven-3.6.1 /opt/maven
RUN ln -s /opt/graalvm-ce-1.0.0-rc16 /opt/graalvm

ENV JAVA_HOME=/opt/graalvm
ENV M2_HOME=/opt/maven
ENV MAVEN_HOME=/opt/maven
ENV PATH=${M2_HOME}/bin:${PATH}
ENV PATH=${JAVA_HOME}/bin:${PATH}

COPY ./pom.xml ./pom.xml
COPY src ./src/

ENV MAVEN_OPTS='-Xmx6g'
RUN mvn -Dmaven.test.skip=true -Pnative-image-fargate clean package

Now the second part of the multi-stage build process begins: creating the actual target image. This image is based on debian:9-slim and sets two environment variables to TLS-specific settings, because the application uses TLS to communicate with Amazon Kinesis Data Streams.

FROM debian:9-slim
LABEL maintainer="Sascha Möllering <[email protected]>"

ENV javax.net.ssl.trustStore /cacerts
ENV javax.net.ssl.trustAnchors /cacerts

RUN apt-get update && apt-get install -y curl

COPY --from=build-aot target/reactive-vertx /usr/bin/reactive-vertx
COPY --from=build-aot /opt/graalvm/jre/lib/amd64/libsunec.so /libsunec.so
COPY --from=build-aot /opt/graalvm/jre/lib/security/cacerts /cacerts

HEALTHCHECK --interval=5s --timeout=3s --retries=3 \
  CMD curl -f http://localhost:8080/health/check || exit 1

EXPOSE 8080

CMD [ "/usr/bin/reactive-vertx" ]

Building your target image is easy. Run the following command:

docker build . -t <your_docker_repo>/reactive-vertx-native -f Dockerfile-native

To build a standard Docker image with an Uber JAR, run the following command:

docker build . -t <your_docker_repo>/reactive-vertx -f Dockerfile

After you successfully finish both builds, running the command docker images shows the following result:

REPOSITORY                     TAG                 IMAGE ID            CREATED             SIZE
smoell/reactive-vertx          latest              391f944bb553        19 minutes ago      181MB
<none>                         <none>              389ee5ec6a8c        19 minutes ago      411MB
smoell/reactive-vertx-native   latest              ecd72b58a3d2        25 minutes ago      133MB
<none>                         <none>              d93993d1d5ab        26 minutes ago      2.89GB
debian                         9-slim              92d2f0789514        4 days ago          55.3MB
oracle/graalvm-ce              1.0.0-rc16          131b80926177        2 weeks ago         1.72G

Here you have the different base images used for your build (oracle/graalvm-ce:1.0.0-rc16 and debian:9-slim), the temporary images you used during your build (without a proper name), and your target images smoell/reactive-vertx and smoell/reactive-vertx-native.

Conclusion

In this post, I described how Java applications can be compiled to a native image with GraalVM as part of a self-contained Docker multi-stage build, and how a custom JDK distribution can be created with jlink to produce smaller target images. I hope I’ve given you some ideas on how you can optimize your existing Java application to reduce startup time and memory consumption.

from AWS Open Source Blog

Amazonians at OSCON 2019: A Week of Open Source

AWS image for OSCON with the words code, contribute, collaborate, commit.

AWS is happy once again to be a Diamond sponsor at OSCON, and we’re looking forward to seeing you there! We’ve got a busy schedule this year, including an [email protected] day (Tuesday) open to all OSCON 2019 pass holders.

Schedule

MONDAY, July 15

TUESDAY, July 16

  • 9:00am-5:00pm: [email protected] [F150/151]
    Join us to understand how to engage, participate, contribute and grow with AWSOpen. We will discuss what it takes to keep open source open, and where we see open source heading over the next 20 years. We’ll also cover the story of how AWS open sourced the SAM toolset, tour the AWS IDE Toolkits and how you can help to make the tooling better for everyone, and more. We’ll have team members around to help answer questions, and close the day out with a happy hour.
    Full agenda and registration for [email protected] day

WEDNESDAY, July 17

THURSDAY, July 18

Stop by the AWS booth (#301) to collect our game piece for the OSCON Attendee Game, add to your sticker collection with all our sticker options, and pick up an AWS t-shirt made just for OSCON attendees. Quantities are limited, so you’ll want to come early!

Still need to get your ticket to OSCON? Register now with the code AWS25 to get 25% off current ticket prices.

from AWS Open Source Blog

Using Pod Security Policies with Amazon EKS Clusters

You asked for it, and with Kubernetes 1.13 we have enabled it: Amazon Elastic Container Service for Kubernetes (EKS) now supports Pod Security Policies. In this post, we will review what PSPs are, how they are enabled in the Kubernetes control plane, and how to use them, from both the cluster admin and the developer perspective.

What is a Pod Security Policy and why should I care?

As a cluster admin, you may have wondered how to enforce certain policies concerning runtime properties for pods in a cluster. For example, you may want to prevent developers from running a pod with containers that don’t define a user (hence, run as root). You may have documentation for developers about setting the security context in a pod specification, and developers may follow it … or they may choose not to. In any case, you need a mechanism to enforce such policies cluster-wide.

The solution is to use Pod Security Policies (PSP) as part of a defense-in-depth strategy.

As a quick reminder, a pod’s security context defines privileges and access control settings, such as discretionary access control (for example, access to a file based on a certain user ID), Linux capabilities, AppArmor profiles, seccomp (filtering certain system calls), and mandatory access control (through SELinux).
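
As a minimal, illustrative sketch (the pod name and user ID here are not from this walkthrough), a pod spec that sets such a security context might look like this:

$ kubectl apply -f- <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: secured-busybox
spec:
  securityContext:
    runAsNonRoot: true        # reject containers that would run as root
    runAsUser: 1000
  containers:
    - name: busybox
      image: busybox
      command: [ "sh", "-c", "sleep 1h" ]
      securityContext:
        readOnlyRootFilesystem: true
        allowPrivilegeEscalation: false
EOF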

A PSP, on the other hand, is a cluster-wide resource, enabling you as a cluster admin to enforce the usage of security contexts in your cluster. The enforcement of PSPs is carried out by the API server’s admission controller. In a nutshell: if a pod spec doesn’t meet what you defined in a PSP, the API server will refuse to launch it. For PSPs to work, the respective admission plugin must be enabled, and permissions must be granted to users. An EKS 1.13 cluster now has the PSP admission plugin enabled by default, so there’s nothing EKS users need to do.

In general, you want to define PSPs according to the least-privilege principle: from enforcing rootless containers, to read-only root filesystems, to limitations on what can be mounted from the host (the EC2 instance the containers in a pod are running on).
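
A least-privilege policy along those lines could look roughly like the following sketch; the policy name and the exact volume whitelist are illustrative, not part of the walkthrough below:

$ cat > /tmp/eks.least-privilege-psp.yaml <<EOF
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: eks.least-privilege
spec:
  privileged: false
  allowPrivilegeEscalation: false
  runAsUser:
    rule: MustRunAsNonRoot        # enforce rootless containers
  readOnlyRootFilesystem: true    # enforce read-only root filesystems
  hostNetwork: false
  hostPID: false
  hostIPC: false
  seLinux:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  fsGroup:
    rule: RunAsAny
  volumes:                        # no hostPath, so nothing is mounted from the EC2 instance
  - configMap
  - secret
  - emptyDir
  - projected
  - downwardAPI
  - persistentVolumeClaim
EOF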

Usage

A new EKS 1.13 cluster creates a default policy named eks.privileged that has no restriction on what kind of pod can be accepted into the system (equivalent to running the cluster with the PodSecurityPolicy controller disabled).

To check the existing pod security policies in your EKS cluster:

$ kubectl get psp
NAME             PRIV   CAPS   SELINUX    RUNASUSER   FSGROUP    SUPGROUP   READONLYROOTFS   VOLUMES
eks.privileged   true   *      RunAsAny   RunAsAny    RunAsAny   RunAsAny   false            *

Now, to describe the default policy we’ve defined for you:

$ kubectl describe psp eks.privileged

As you can see in the output below – anything goes! This policy is permissive to any sort of pod specification:

Name:  eks.privileged

Settings:
  Allow Privileged:                       true
  Allow Privilege Escalation:             true
  Default Add Capabilities:               <none>
  Required Drop Capabilities:             <none>
  Allowed Capabilities:                   *
  Allowed Volume Types:                   *
  Allow Host Network:                     true
  Allow Host Ports:                       0-65535
  Allow Host PID:                         true
  Allow Host IPC:                         true
  Read Only Root Filesystem:              false
  SELinux Context Strategy: RunAsAny
    User:                                 <none>
    Role:                                 <none>
    Type:                                 <none>
    Level:                                <none>
  Run As User Strategy: RunAsAny
    Ranges:                               <none>
  FSGroup Strategy: RunAsAny
    Ranges:                               <none>
  Supplemental Groups Strategy: RunAsAny
    Ranges:                               <none>

Note that any authenticated user can create any pod on this EKS cluster as currently configured, and here’s the proof:

$ kubectl describe clusterrolebindings eks:podsecuritypolicy:authenticated

The output of the above command shows that the cluster role eks:podsecuritypolicy:privileged is assigned to all members of the system:authenticated group:

Name:         eks:podsecuritypolicy:authenticated
Labels:       eks.amazonaws.com/component=pod-security-policy
              kubernetes.io/cluster-service=true
Annotations:  kubectl.kubernetes.io/last-applied-configuration: ...

Role:
  Kind:  ClusterRole
  Name:  eks:podsecuritypolicy:privileged
Subjects:
  Kind   Name                  Namespace
  ----   ----                  ---------
  Group  system:authenticated

Note that if multiple PSPs are available, the Kubernetes admission controller selects the first policy that validates successfully. Policies are ordered alphabetically by name, and a policy that does not mutate the pod is preferred over mutating policies.

Now let’s create a new PSP that we will call eks.restrictive. First, create a dedicated namespace as well as a service account. We’ll use this service account for a non-admin user:

$ kubectl create ns psp-eks-restrictive
namespace/psp-eks-restrictive created

$ kubectl -n psp-eks-restrictive create sa eks-test-user
serviceaccount/eks-test-user created

$ kubectl -n psp-eks-restrictive create rolebinding eks-test-editor \
             --clusterrole=edit \
             --serviceaccount=psp-eks-restrictive:eks-test-user

rolebinding.rbac.authorization.k8s.io/eks-test-editor created

Next, create two aliases to highlight the difference between admin and non-admin users:

$ alias kubectl-admin='kubectl -n psp-eks-restrictive'
$ alias kubectl-dev='kubectl --as=system:serviceaccount:psp-eks-restrictive:eks-test-user -n psp-eks-restrictive'

Now, with the cluster admin role, create a policy that disallows creation of pods using host networking:

$ cat > /tmp/eks.restrictive-psp.yaml <<EOF
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: eks.restrictive
spec:
  hostNetwork: false
  seLinux:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  runAsUser:
    rule: RunAsAny
  fsGroup:
    rule: RunAsAny
  volumes:
  - '*'
EOF

$ kubectl-admin apply -f /tmp/eks.restrictive-psp.yaml
podsecuritypolicy.policy/eks.restrictive created

Also, don’t forget to remove the default (permissive) policy eks.privileged:

$ kubectl delete psp eks.privileged
$ kubectl delete clusterrole eks:podsecuritypolicy:privileged
$ kubectl delete clusterrolebindings eks:podsecuritypolicy:authenticated

WARNING
Deleting the default EKS policy before adding your own PSPs can impair the cluster. When you delete the default policy, no pods can be created on the cluster except those that satisfy a policy you have bound, such as the restrictive policy in your new namespace. For an existing cluster, be sure to create enough restrictive policies to cover all of your running pods and namespaces before deleting the default policy.

Now, to confirm that the policy has been created:

$ kubectl get psp
NAME              PRIV    CAPS   SELINUX    RUNASUSER   FSGROUP    SUPGROUP   READONLYROOTFS   VOLUMES
eks.restrictive   false          RunAsAny   RunAsAny    RunAsAny   RunAsAny   false            *

Finally, try creating a pod that violates the policy, as the unprivileged user (simulating a developer):

$ kubectl-dev apply -f- <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: busybox
spec:
  containers:
    - name: busybox
      image: busybox
      command: [ "sh", "-c", "sleep 1h" ]
EOF

As you might expect, you get the following result:

Error from server (Forbidden): error when creating "STDIN": pods "busybox" is forbidden: unable to validate against any pod security policy: []

The above operation failed because we have not yet given the developer the appropriate permissions. In other words, there is no role binding for the developer user eks-test-user. So let’s change this by creating a role psp:unprivileged for the pod security policy eks.restrictive:

$ kubectl-admin create role psp:unprivileged \
  --verb=use \
  --resource=podsecuritypolicy \
  --resource-name=eks.restrictive

role.rbac.authorization.k8s.io/psp:unprivileged created

Now, create the rolebinding to grant the eks-test-user the use verb on the eks.restrictive policy.

$ kubectl-admin create rolebinding eks-test-user:psp:unprivileged \
  --role=psp:unprivileged \
  --serviceaccount=psp-eks-restrictive:eks-test-user

rolebinding.rbac.authorization.k8s.io/eks-test-user:psp:unprivileged created

To verify that eks-test-user can use the PSP eks.restrictive:

$ kubectl-dev auth can-i use podsecuritypolicy/eks.restrictive
yes

At this point, the developer (eks-test-user) should be able to create a pod:

$ kubectl-dev apply -f- <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: busybox
spec:
  containers:
    - name: busybox
      image: busybox
      command: [ "sh", "-c", "sleep 1h" ]
EOF
pod/busybox created

Yay, that worked! However, we would expect that a host networking-based pod creation should be rejected, because of what we defined in our eks.restrictive PSP, above:

$ kubectl-dev apply -f- <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: privileged
spec:
  hostNetwork: true
  containers:
    - name: busybox
      image: busybox
      command: [ "sh", "-c", "sleep 1h" ]
EOF

Error from server (Forbidden): error when creating "STDIN": pods "privileged" is forbidden: unable to validate against any pod security policy: [spec.securityContext.hostNetwork: Invalid value: true: Host network is not allowed to be used]

Great! This confirms that the PSP eks.restrictive works as expected, restricting privileged pod creation by the developer.

What’s new

For all new EKS clusters using Kubernetes version 1.13, PSPs are now available. For clusters that have been upgraded from previous versions, a fully-permissive PSP is automatically created during the upgrade process. Your main task is to define sensible PSPs that are scoped for your environment, and enable them as described above. By sensible, I mean that (for example) you may choose to be less restrictive in a dev/test environment compared to a production environment. Or, equally possible, different projects or teams might require different levels of protection and hence different PSPs.

Here’s a final tip: as a cluster admin, be sure to educate your developers about security contexts in general and PSPs in particular. Have your CI/CD pipeline test PSPs as part of your smoke tests, along with other security-related checks such as verifying the permissions defined via RBAC roles and bindings.
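
For example, a smoke test along the following lines (a sketch that reuses the namespace, service account, and policy names from this walkthrough) could run in your pipeline to catch accidental changes to the policy or its bindings:

#!/bin/bash
# hypothetical CI smoke test for the eks.restrictive setup above
set -euo pipefail

# the developer service account must still be allowed to use the restrictive PSP
kubectl --as=system:serviceaccount:psp-eks-restrictive:eks-test-user \
        -n psp-eks-restrictive auth can-i use podsecuritypolicy/eks.restrictive

# the restrictive policy itself must still exist
kubectl get psp eks.restrictive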

You can learn more about PSP in the Amazon EKS documentation. Please leave any comments below or reach out to me via Twitter!

— Michael

from AWS Open Source Blog

Growing and Sharing Open Source Wisdom Globally

Open Source in the Enterprise book in 4 languages, cover images.

During my keynote at O’Reilly Media’s Open Source Conference (OSCON) in July, 2018, I had the pleasure of announcing a new book, Open Source in the Enterprise.  A collaboration between O’Reilly Media and AWS, this book is intended to empower enterprises with open source best practices. Our hope was to help the reader gain the many benefits of open source software and its innovative, collaborative development process.

Although I wrote this guide with a single co-author, Andy Oram, it embodied the open source wisdom of crowds and drew insights from many reviewers. The book is for anyone who wants to learn more about consuming open source code, effective collaboration with communities, or how to engage so that your contributions are accepted. We packed it with insights into the culture, values, and benefits of open source software development. The book can assist decision makers, code developers, and everyone who works with them on projects touched by software.

Over the past year, the book has been shared with customers globally in many forums: AWS Summits, re:Invent, and open source software conferences including All Things Open and O’Reilly Media’s Software Architecture. If you missed getting a copy, you’ll find the English language pdf here. Now, AWS is offering four translations to spread these insights still further and more deeply into organizations worldwide.

We all live in a connected world, and no human endeavor reflects that development more than the open source movement. It crosses borders, enables global collaboration, and helps millions globally to build new futures. In that spirit, we’ve translated and are distributing “Open Source in the Enterprise” in Simplified Chinese, French, Korean, and Spanish. The book content is available under a Creative Commons 4.0. License. In addition to the English-language version, all four translations are free for download at opensource.amazon.com.

If you are at the AWS Summit, June 20, or KubeCon, CloudNativeCon China, June 25, both located in Shanghai, look out for the print version of the book at the AWS booth.

We’d love to hear of any topics in the book you’d like to learn more about, via [email protected]. And we welcome both feedback and input. Help us share and build on our collective open source knowledge to support the global community. Also let us know what other languages would be well served by a translation, so we can add those to our list.

from AWS Open Source Blog

Manage Your Open Distro for Elasticsearch Alerting Monitors With odfe-monitor-cli

When you use Open Distro for Elasticsearch Alerting, you create monitors in Kibana. Setting up monitors with a UI is fast and convenient, making it easy to get started. If monitoring is a major workload for your cluster, though, you may have hundreds or even thousands of monitors to create, update, and tune over time. Setting so many monitors using the Kibana UI would be time-consuming and tedious. Fortunately, the Alerting plugin has a REST API that makes it easier for you to manage your monitors from the command line.

If you’re new to the alerting features in Open Distro for Elasticsearch, take a look at some prior posts, where we covered the basics of setting up a monitor in Kibana and alerting on Open Distro for Elasticsearch Security audit logs.

The Alerting plugin’s REST API lets you perform CRUD and other operations on your monitors. odfe-monitor-cli uses this API for its requests, but lets you save your monitors in YAML files. You can build an automated pipeline to deploy monitors to your cluster and use that pipeline to deploy the same monitors to multiple clusters that support development, testing, and production. You can maintain your monitors in a source control system for sharing, versioning, and review. The CLI helps you guard against drift by reading monitors from your cluster and diffing them against your YAML files.

This blog post explains how to manage your monitors using YAML files through odfe-monitor-cli, available on GitHub under the Apache 2.0 license.

Prerequisite

odfe-monitor-cli currently uses HTTP basic authentication. Make sure basic authentication is enabled on your cluster.

Install odfe-monitor-cli

The install process is a single command:

curl -sfL https://raw.githubusercontent.com/mihirsoni/odfe-monitor-cli/master/godownloader.sh | bash -s -- -b /usr/local/bin

Note: See the odfe-monitor-cli README for other installation methods and instructions on how to build from source.

Once installation is successful, verify that it works as expected:

$ odfe-monitor-cli
This application will help you to manage the Opendistro alerting monitors using YAML files.

Usage:
  odfe-monitor-cli [command]
...

Create and sync destinations

You define destinations in Open Distro for Elasticsearch Alerting to specify where messages (Slack, Chime, or custom) should be sent. odfe-monitor-cli doesn’t support managing destinations yet, so you need to use the Kibana UI to create them.

First, navigate to https://localhost:5601 to access Kibana. Log in, and select the Alerting tab. Select Destinations, and create a destination.

 

Open Distro for Elasticsearch's Alerting Destination definition pane. Setting up a destination for alerts.

On your computer, create a new directory, odfe-monitor-cli. This directory will hold the monitors you create, and any monitors or destinations you sync from your cluster.

$ mkdir odfe-monitor-cli
$ cd odfe-monitor-cli
$ odfe-monitor-cli sync --destinations #Sync remote destination

The final command in that sequence fetches all remote destinations and writes them to a new file, destinations.yml. The file contains a map of destination names and IDs. You’ll use the destination name later when you create a monitor. If you view the file using cat destinations.yml, it should look like this:

#destinations.yml file content
sample_destination: _6wzIGsBoP5_pydBFBzc

If you already have existing monitors on your cluster and would like to preserve them, you can sync those, as well. If not, skip this step. This command fetches all remote monitors to monitors.yml:

odfe-monitor-cli sync --monitors #Sync existing remote monitors

You can add additional directories under your root directory and break your monitors into multiple YAML files, organizing them however you see fit. When you use odfe-monitor-cli to send changes to your cluster, it walks the entire directory structure under the current directory, finding all .yml files. Use the --rootDir option to change the root directory to traverse.
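
For example, a layout like the following works; the subdirectory name is purely illustrative, and the --rootDir path is a hypothetical location outside the current directory:

odfe-monitor-cli/
  destinations.yml              # synced destinations
  monitors.yml                  # optional: monitors synced from the cluster
  payments/
    error-count-alert.yml       # monitor created in the next section

# diff and push walk the current directory by default; --rootDir changes that
odfe-monitor-cli diff --rootDir /path/to/monitors-repo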

Create a new monitor

Use a text editor to create a new file, error-count-alert.yml. Copy and paste the YAML below into that file, and change destinationId to the name of an existing destination. You can place your file anywhere in or below the odfe-monitor-cli directory.

- name: 'Sample Alerting monitor'
  schedule:
    period:
      interval: 10
      unit: MINUTES
  enabled: true
  inputs:
    - search:
        indices:
          - log* # Change this as per monitor, this is just an example
        query: # This block should be valid Elasticsearch query
          size: 0
          query:
            match_all: {
              boost: 1.0
            }
  triggers:
    - name: '500'
      severity: '2'
      condition: | #This is how you can create multiline
        // Performs some crude custom scoring and returns true if that score exceeds a certain value
        int score = 0;
        for (int i = 0; i < ctx.results[0].hits.hits.length; i++) {
          // Weighs 500 errors 10 times as heavily as 503 errors
          if (ctx.results[0].hits.hits[i]._source.http_status_code == "500") {
            score += 10;
          } else if (ctx.results[0].hits.hits[i]._source.http_status_code == "503") {
            score += 1;
          }
        }
        if (score > 99) {
          return true;
        } else {
          return false;
        }
      actions:
        - name: Sample Action
          destinationId: sample_destination #This destination should be available in destinations.yaml file otherwise it will throw an error.
          subject: 'There is an error'
          message: | # mustache variables below reference the alerting context
            Monitor {{ctx.monitor.name}} just entered an alert state. Please investigate the issue.
            - Trigger: {{ctx.trigger.name}}
            - Severity: {{ctx.trigger.severity}}
            - Period start: {{ctx.periodStart}}
            - Period end: {{ctx.periodEnd}}

odfe-monitor-cli provides a diff command that retrieves monitors from your cluster and walks your local directory structure to show you any differences between your cluster’s monitors and your local monitors. You can use the diff command to validate that no one has changed the monitors in your cluster. For now, call the diff command to verify that it finds the new monitor you just created.

$ odfe-monitor-cli diff
---------------------------------------------------------
 These monitors are currently missing in alerting
---------------------------------------------------------
name: 'Sample Alerting monitor'
type: 'monitor'
schedule:
...

After verifying the diff, you could get any new or changed monitors reviewed by peers, or approved by your management or security department.

You use the push command to send your local changes to your Open Distro for Elasticsearch cluster. When you use push, odfe-monitor-cli calls the Run Monitor API to verify your monitor configurations and ensure that there are no errors. If any error occurs, odfe-monitor-cli displays the error with details. You can fix them and re-run the push command until you get a clean run.

By default, the push command runs in dry run mode, simply diffing and checking the syntax of any additions. Because it doesn’t publish anything to the cluster, it won’t publish any accidental changes. Use the --submit option to send your changes to your cluster when you’re ready:

$ odfe-monitor-cli push --submit

The push command does the following:

  • Runs and validates modified and new monitors.
  • Creates new monitors and updates existing monitors when the --submit flag is provided. Warning: pushing changes with --submit overrides any changes you have made to existing monitors on your cluster (via Kibana or any other way).
  • Does not delete any monitors. Provide --delete along with --submit to delete all untracked monitors. Be careful! You can’t un-delete monitors.

Conclusion

This post introduced you to odfe-monitor-cli, a command-line interface for managing monitors on your Open Distro for Elasticsearch cluster. odfe-monitor-cli makes it easy to store your monitors in version control and deploy these monitors to your Open Distro for Elasticsearch cluster. You can validate that your monitors work as intended and share monitors between environments.

Have an issue or question? Want to contribute? Check out the Open Distro for Elasticsearch forums. You can file issues here. We welcome your participation on the project! See you on the forums and code repos!

from AWS Open Source Blog

Scale HPC Workloads with Elastic Fabric Adapter and AWS ParallelCluster

In April, 2019, AWS announced the general availability of Elastic Fabric Adapter (EFA), an EC2 network device that improves throughput and scalability of distributed High Performance Computing (HPC) and Machine Learning (ML) workloads. Today, we’re excited to announce support of EFA through AWS ParallelCluster.

EFA is a network interface for Amazon EC2 instances that enables you to run HPC applications requiring high levels of inter-instance communications (such as computational fluid dynamics, weather modeling, and reservoir simulation) at scale on AWS. It uses an industry-standard operating system bypass technique, with a new custom Scalable Reliable Datagram (SRD) Protocol to enhance the performance of inter-instance communications, which is critical to scaling HPC applications. For more on EFA and supported instance types, see Elastic Fabric Adapter (EFA) for Tightly-Coupled HPC Workloads.

AWS ParallelCluster takes care of the undifferentiated heavy lifting involved in setting up an HPC cluster with EFA enabled. When you set the enable_efa = compute flag in your cluster section, AWS ParallelCluster adds EFA to all network-enhanced compute instances. Under the hood, AWS ParallelCluster performs the following steps:

  1. Sets InterfaceType = efa in the Launch Template.
  2. Ensures that the security group has rules to allow all inbound and outbound traffic to itself. Unlike traditional TCP traffic, EFA requires an inbound rule and an outbound rule that explicitly allow all traffic to its own security group ID sg-xxxxx. See Prepare an EFA-enabled Security Group for more information.
  3. Installs the EFA kernel module, an AWS-specific version of the libfabric network stack, and Open MPI 3.1.4.
  4. Validates the instance type, base OS, and placement group.

To get started, you’ll need to have AWS ParallelCluster set up; see Getting Started with AWS ParallelCluster. For this tutorial, we’ll assume that you have AWS ParallelCluster installed and are familiar with the ~/.parallelcluster/config file.

Modify your ~/.parallelcluster/config file to include a cluster section that minimally includes the following:

[global]
cluster_template = efa
update_check = true
sanity_check = true

[aws]
aws_region_name = [your_aws_region]

[cluster efa]
key_name =               [your_keypair]
vpc_settings =           public
base_os =                alinux
master_instance_type =   c5n.xlarge
compute_instance_type =  c5n.18xlarge
placement_group =        DYNAMIC
enable_efa = compute

[vpc public]
vpc_id = [your_vpc]
master_subnet_id = [your_subnet]

  • base_os – currently we support Amazon Linux (alinux), CentOS 7 (centos7), and Ubuntu 16.04 (ubuntu1604) with EFA.
  • master_instance_type – this can be any instance type. It sits outside of the placement group formed for the compute nodes and does not have EFA enabled. We chose c5n.xlarge due to its cheaper price and still-good network performance compared with the c5n.18xlarge.
  • compute_instance_type – EFA is enabled only on the compute nodes; this is where your code runs when submitted as a job through one of the schedulers. These instances need to be one of the supported instance types, which at the time of writing include c5n.18xlarge, i3en.24xlarge, and p3dn.24xlarge. See the docs for currently supported instances.
  • placement_group – places your compute nodes physically adjacent to each other, which enables you to benefit fully from EFA’s low network latency and high throughput.
  • enable_efa – this is the only new parameter we’ve added to turn on EFA support for the compute nodes. At this time, the only option is compute, which is designed to draw your attention to the fact that EFA is enabled only on the compute nodes.

Now you can create the cluster:

$ pcluster create efa
Status: CREATE_COMPLETE
MasterServer: RUNNING
MasterPublicIP: 3.215.238.41
ClusterUser: ec2-user
MasterPrivateIP: 172.31.25.64

Once cluster creation is complete, you can SSH into the cluster:

$ pcluster ssh efa -i ~/path/to/ssh_key

You can now see that there’s a module, openmpi/3.1.4, available. When this is loaded, you can confirm that mpirun is correctly set on the PATH to be the EFA-enabled version in /opt/amazon/efa:

[[email protected] ~]$ module avail

----------------------------------------------- /usr/share/Modules/modulefiles ------------------------------------------------
dot           module-git    module-info   modules       null          openmpi/3.1.4 use.own
[[email protected] ~]$ module load openmpi/3.1.4
[[email protected] ~]$ which mpirun
/opt/amazon/efa/bin/mpirun

This version of Open MPI is compiled with support for libfabric, a library that allows communication over the EFA device through standard MPI calls. At the time of writing, Open MPI is the only MPI library that supports EFA; Intel MPI support is expected to be released shortly.

Now you’re ready to submit a job. First create a file submit.sge containing the following:

#!/bin/bash
#$ -pe mpi 2

module load openmpi
mpirun -N 1 -np 2 [command here]
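
How you submit depends on the scheduler your cluster is configured with; with the SGE-style directives used above, submission would look roughly like this (sketch):

$ qsub submit.sge
$ qstat    # watch the job start and the compute fleet scale up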

CFD++ Example

EFA speeds up common workloads, such as Computational Fluid Dynamics. In the following example, we ran CFD++ on a 24M cell case using EFA-enabled c5n.18xlarge instances. CFD++ is a flow solver developed by Metacomp Technologies. The model is an example of a Mach 3 external flow calculation (it’s a Klingon bird of prey):

example of a Mach 3 external flow calculation.

You can see the two scaling curves below; the blue curve shows scaling with EFA; the purple curve without EFA. EFA offers significantly greater scaling and is many times more performant at higher core counts.

scaling curves, with and without EFA.

New Docs!

Last, but definitely not least, we are also excited to announce new docs for AWS ParallelCluster. These are available in ten languages and improve on the readthedocs version in many ways. Take a look! Of course, you can still submit doc updates by creating a pull request on the AWS Docs GitHub repo.

AWS ParallelCluster is a community-driven project. We encourage submitting a pull request or providing feedback through GitHub issues. User feedback drives our development and pushes us to excel in every way!

from AWS Open Source Blog

Set up Multi-Tenant Kibana Access in Open Distro for Elasticsearch

Elasticsearch has become a default choice for storing and analyzing log data to deliver insights on your application’s performance, your security stance, and your users’ interactions with your application. It’s so useful that many teams adopt Elasticsearch early in their development cycle to support DevOps. This grass-roots adoption often mushrooms into a confusing set of clusters and users across a large organization. At some point, you want to centralize logs so that you can manage your spending and usage more closely.

The flip side of a centralized logging architecture is that you must manage access to the data. You want your payments processing department to keep its data private and invisible from, for example, your front end developers. Open Distro for Elasticsearch Security allows you to manage access to data at document- and field-level granularity. You create Roles, assign Action Groups to those roles, and map Users to the roles to control their access to indices.

Access control for Kibana is harder to achieve. Kibana’s visualizations and dashboards normally share a common index, .kibana. If your users have access to that index, then they have access to all of the visualizations in it. The Open Distro for Elasticsearch Security fixes this problem by letting you define Tenants — silos that segregate visualizations and dashboards, for a multi-tenant Kibana experience. In this post, I’ll walk through setting up multi-tenancy for two hypothetical departments, payments and front end.

Prerequisites

Kibana multi-tenancy is enabled out of the box in Open Distro for Elasticsearch. If you have disabled multi-tenancy, our documentation will guide you in enabling it. You’ll also need a running Open Distro for Elasticsearch cluster. I ran esrally, using the http_logs track to generate indexes in my cluster. I’ll use logs-221998 for the payments department and logs-211998 for the front end department.
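
If you want to generate similar test data, a Rally invocation roughly like the following works against a local Open Distro cluster; the credentials and TLS client options shown here are assumptions based on the demo security configuration, so adjust them for your setup:

$ esrally --track=http_logs \
          --target-hosts=localhost:9200 \
          --pipeline=benchmark-only \
          --client-options="use_ssl:true,verify_certs:false,basic_auth_user:'admin',basic_auth_password:'admin'"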

Important: You must give your roles, users, and tenants different names! I’ve used the convention of appending -role, -user, and -tenant to ensure that the names are unique.

Set up roles

Roles are the basis for access control in Open Distro for Elasticsearch Security. Roles allow you to specify which actions its users can take, and which indices those users can access.

I’ll create two roles — payments-role and frontend-role — each with access to the appropriate underlying index. To create a role, navigate to https://localhost:5601 to open Kibana. Log in with a user that has administrator rights (the default admin user is my choice). Click Explore on my own to dismiss the splash screen, click the Security tab in Kibana’s left rail, then click the Roles button:

 

Open Distro for Elasticsearch Security plugin main panel, selecting the roles button

 

Next, click the “+” button to add a new role.

 

Open Distro for Elasticsearch Security plugin, pane for adding a new role

 

In the Overview section, name the role payments-role and then click the Index Permissions tab at the top of the page. You can also give the role cluster-level permissions in the Cluster Permissions tab. For the purposes of this post, I’ll limit to index-level access control.

 

Open Distro for Elasticsearch Security plugin, pane setting index access permission

In the Index Permissions tab, click the Add new index and document Type button. In the resulting page, select logs-221998 from the Index drop-down, then click Save.

 

Open Distro for Elasticsearch Security plugin restricting a role's access to a particular index

 

Clicking Save reveals a Permissions: Action Groups drop-down. Select ALL (you can make this role more restrictive by choosing READ, for example, which limits it to read-only access). Don’t click Save Role Definition yet; you still need to add the tenant. Select the Tenants tab and click the Add button. Fill in the Tenant field with payments-tenant. You can use any unique value for the field; it’s just the name you choose to refer to the tenant.

You are done configuring this role. Click Save Role Definition.

 

Open Distro for Elasticsearch Security plugin, pane for adding a tenant to a role

 

Repeat this process to create the frontend-role role, with access to a different index and a different tenant name. I’m using logs-211998, and frontend-tenant in my cluster.

Set up users

Users in Open Distro for Elasticsearch are authenticated entities. You add them to roles to grant them the permissions that those roles allow. The Open Distro for Elasticsearch Security plugin has an internal user database. If you authenticate directly, via the Kibana login or HTTP basic authentication, you are signed in as that internal user.

You’ll also see the term “backend” in many of the following screens. A backend role is the role supplied by a federated identity provider. The backend role is distinct from, and mapped onto, internal roles and users. It’s a little confusing, but for this post, we can ignore the backend roles.

I’ll create two users — payments-user and frontend-user. Click the Security tab in Kibana’s left rail, then click the Internal User Database button.

 

Open Distro for Elasticsearch Security plugin, main panel selecting the internal user database

 

Click the ‘+’ symbol to add a new internal user:

 

Open Distro for Elasticsearch Security plugin, pane for adding a new user to the internal user database

 

Fill in the Username, Password, and Repeat password fields. Click Submit.

 

Open Distro for Elasticsearch Security plugin, pane for setting a new user's username and password

 

Repeat this process to create the frontend-user.

Map Users to Roles

The last step is to map the users you created (along with their tenants) to the roles that you created. Click the Security tab and then the Role Mappings button.

 

Open Distro for Elasticsearch Security plugin, main panel, selecting the role mappings button

 

Click the “+” button to add a new role mapping. Select payments-role from the Role dropdown. Click the + Add User button, and type payments-user in the text box. Finally, click Submit.

 

Open Distro for Elasticsearch Security plugin, mapping the user onto the role

 

Repeat the process for the frontend-role and frontend-user.

In order for your users to be able to use Kibana, you need to add them to the kibana_user role, too. From the role mappings screen, click the edit pencil for kibana_user.

 

Open Distro for Elasticsearch Security plugin, showing where to click to add users to the kibana_user role

 

On the next screen, click Add User and type payments-user in the text box. Click Add User again to add the frontend-user. Click Submit to save your changes.

Congratulations, your setup is complete!
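
If you prefer the command line, you can also sanity-check a mapping through the Security plugin’s authinfo endpoint, which returns the roles and tenants resolved for the authenticated user. This is a sketch for a local test setup; the password placeholder is yours to fill in, and -k skips certificate validation for the demo’s self-signed certificate:

$ curl -sk -u payments-user:<password> "https://localhost:9200/_opendistro/_security/authinfo?pretty"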

Test your tenants

Note: you may run into problems if your browser has cached your identity in cookies. To test in a clean environment, with Firefox, use File > New Private Window, which opens a window with no saved cookies. In Chrome, use File > New Incognito Window.

To test your tenancy and access control, you’ll create a visualization as the payments-user, with the payments-tenant, and verify that you cannot access that visualization when you log in as the frontend-user. In your new window, navigate to https://localhost:5601 and log in as the payments-user user. Click on Explore on my own to dismiss the splash screen. Click the Tenants tab in Kibana’s left rail.

 

Open Distro for Elasticsearch Security plugin selecting the tenant for Kibana visualizations and dashboards

 

You can see that the Private tenant is currently selected. Every role has a Global tenant and a Private tenant. Work that you do when you select the Global tenant is visible to all other users/tenants. Work that you do in the Private tenant is visible only to the logged-in user (currently, the payments-user). Lastly, you can see the payments-tenant tenant. Only users that have roles with the payments-tenant can see visualizations and dashboards created when that tenant is selected. Click Select next to the payments-tenant to choose the payments-tenant tenant.

Now you need to create and save a visualization. First, create an index pattern. Click the Management tab, and then click Index Patterns. Type logs-221998 in the Index Pattern text box. Click Next Step. On the following screen, set your Time filter field name.

 

Creating an index pattern in Kibana, showing how to set the specific index

 

Note: Normally, you use a wildcard for your index pattern. Esrally created 6 indices in my cluster, all with the pattern logs-XXXXXX. When you set up your roles, you gave access to a specific index for each role. In this case, the payments-user has access only to the logs-221998 index. When you create a visualization as this user, Kibana will access all indices that match the wildcard in the index pattern you create now, including the other five indices that are prohibited. Kibana fails with an access error. To work around this issue, type the index name exactly. For centralized logging, make sure that each department uses a unique prefix for its indices. Then your index patterns can contain wildcard values for each department.

In the Visualize tab, build a simple Metric with a traffic count. (Note: Rally’s http_logs data has timestamps in 1998, so you’ll need to set your time selector accordingly to see any results.) Save it as payments-traffic.

 

A simple metric visualization in Kibana, with the value 10,716,760

 

Log out and log back in as frontend-user in a New Private Window. On the Tenants tab, you will see that you have the frontend-tenant, not the payments-tenant.

 

Open Distro for Elasticsearch Security plugin, tenant selection pane. The selected tenant does not have access to other tenants

 

Select the Visualize tab and you will be asked to create an index pattern. Use the logs-211998 index. Select the Visualize tab again. Kibana tells you that you have no visualizations.

Conclusion

In this post, you used Open Distro for Elasticsearch Security to create two users with their own Kibana tenants provided through the roles you assigned to them. Open Distro for Elasticsearch’s tenancy model keeps tenants segregated so that your payments department’s visualizations and dashboards are not visible to users in your front end department. You then further restricted access to the underlying indices so that users in your front end department can’t access the data in your payment department’s indices. You’ve created a silo that allows you to manage your sensitive data!

Join in on GitHub to improve project documentation, add examples, submit feature requests, and file bug reports. Check out the code, build a plugin, and open a pull request – we’re happy to review and figure out steps to integrate. We welcome your participation on the project. If you have any questions, don’t hesitate to ask on the community discussion forums.

from AWS Open Source Blog

Announcing Gluon Time Series, an Open-Source Time Series Modeling Toolkit

Today, we announce the availability of Gluon Time Series (GluonTS), an MXNet-based toolkit for time series analysis using the Gluon API. We are excited to give researchers and practitioners working with time series data access to this toolkit, which we have built for our own needs as applied scientists working on real-world industrial time series problems both at Amazon and on behalf of our customers. GluonTS is available as open source software on GitHub today, under the Apache license, version 2.0.

Time series applications are everywhere

We can find time series data, i.e. collections of data points indexed by time, across many different fields and industries. The time series of item sales in retail, metrics from monitoring devices, applications, or cloud resources, or time series of measurements generated by Internet of Things sensors, are only some of the many examples of time series data. The most common machine learning tasks related to time series are extrapolation (forecasting), interpolation (smoothing), detection (such as outlier, anomaly, or change-point detection), and classification.

Within Amazon, we record and make use of time series data across a variety of domains and applications. Some of these include forecasting the product and labor demand in our supply chain, or making sure that we can elastically scale AWS compute and storage capacity for all AWS customers. Anomaly detection on system and application metrics allows us to automatically detect when cloud-based applications are experiencing operational issues.

With GluonTS, we are open-sourcing a toolkit that we’ve developed internally to build algorithms for these and similar applications. It allows machine learning scientists to build new time series models, in particular deep-learning-based models, and compare them with state-of-the-art models included in GluonTS.

GluonTS highlights

GluonTS enables users to build time series models from pre-built blocks that contain useful abstractions. GluonTS also has reference implementations of popular models assembled from these building blocks, which can be used both as a starting point for model exploration, and for comparison. We’ve included tooling in GluonTS to alleviate researchers’ burden of having to re-implement methods for data processing, backtesting, model comparison, and evaluation. All of these are a time-sink and a source of error — after all, a bug in evaluation code leading to mischaracterization of a model’s actual performance can be much more severe than a bug in an algorithm (which would be detected before it is deployed).

Building blocks for assembling new time series models

We have written GluonTS such that many components can be combined and assembled in different ways, so that we can come up with and test new models quickly. Perhaps the most obvious components to include are neural network architectures, and GluonTS offers a sequence-to-sequence framework, auto-regressive networks, and causal convolutions, to name just a few. We’ve also included finer-grained components. For example, forecasts should typically be probabilistic, to better support optimal decision making. For this, GluonTS offers a number of typical parametric probability distributions, as well as tools for modeling cumulative distribution functions or quantile functions directly, which can be readily included in a neural network architecture. Further probabilistic components such as Gaussian Processes and linear-Gaussian state-space models (including a Kalman filter implementation) are also included, so that combinations of neural network and traditional probabilistic models can easily be created. We’ve also included data transformations such as the venerable Box-Cox transformation, whose parameters can be learned jointly with other model parameters.

Easy comparison with state-of-the-art models

GluonTS contains reference implementations of deep-learning-based time series models from the literature, which showcase how to use the components and can be used as starting points for model exploration. We’ve included models from our own line of research, such as DeepAR and spline quantile function RNNs, but also sequence models from other domains such as WaveNet (originally for speech synthesis, adapted here for the forecasting use case). GluonTS makes it easy to compare against these reference implementations, and also allows easy benchmarking against other models from other open-source libraries, such as Prophet and the R forecast package.

Tooling

GluonTS includes tooling for loading and transforming input data, so that data in different forms can be used and transformed to meet the requirements of a particular model. We have also included an evaluation component that computes many of the accuracy metrics discussed in the forecasting literature, and we look forward to contributions from the community in adding more metrics. As there are subtleties around how exactly the metrics are computed, having a standardized implementation is invaluable for making meaningful and reproducible comparisons between different models.

While metrics are, of course, important, the work of exploring, debugging, and continuously improving models often starts with plotting results on controlled data. For plotting, we rely on Matplotlib, and we’ve included a synthetic data set generator that can simulate time series data with various configurable characteristics.

How does GluonTS relate to Amazon Forecast?

GluonTS is targeted towards researchers, that is, machine learning, time series modeling, and forecasting experts who want to design novel time series models, build their models from scratch, or require custom models for special use cases. For production use cases and users who don’t need to build custom models, Amazon offers Amazon Forecast, a fully managed service that uses machine learning to deliver highly accurate forecasts. With Amazon Forecast, no machine learning expertise is required to build accurate, machine-learning-based time series forecasting models, as Amazon Forecast employs AutoML capabilities that take care of the heavy lifting of selecting, building, and optimizing the right models for you.

Getting started with GluonTS

GluonTS is available on GitHub and on PyPI. After you’ve completed installation, it’s easy to arrive at your first forecast using a pre-built forecasting model. Once you have collected your data, training a model and producing the following plot takes about ten lines of Python.
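
The sketch below is roughly what those ten lines look like, using the publicly available Twitter_volume_AMZN series from the Numenta Anomaly Benchmark; the exact training cut-off date and the API names are assumptions on my part, so treat this as a starting point rather than the canonical example.

import pandas as pd
from gluonts.dataset.common import ListDataset
from gluonts.model.deepar import DeepAREstimator
from gluonts.trainer import Trainer

# Load the five-minute Tweet-volume series for the AMZN ticker from the
# Numenta Anomaly Benchmark repository.
url = "https://raw.githubusercontent.com/numenta/NAB/master/data/realTweets/Twitter_volume_AMZN.csv"
df = pd.read_csv(url, header=0, index_col=0)

# Train on the first part of the series (the cut-off date is illustrative).
training_data = ListDataset(
    [{"start": df.index[0], "target": df.value[:"2015-04-05 00:00:00"]}],
    freq="5min",
)

estimator = DeepAREstimator(freq="5min", prediction_length=12, trainer=Trainer(epochs=10))
predictor = estimator.train(training_data=training_data)

# Produce a probabilistic forecast and write a plot like the one below to a file.
forecast = next(predictor.predict(training_data))
forecast.plot(output_file="graph.png")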

A GluonTS forecast plot of Tweet volume mentioning the AMZN ticker symbol.

The figure above shows the forecast for the volume of Tweets (every five minutes) mentioning the AMZN ticker symbol. This was obtained by training a model on data from the Numenta Anomaly Benchmark dataset.

It is early days for us and GluonTS. We expect GluonTS to evolve over time, and we will add more applications beyond forecasting. Some more work is needed to reach a 1.0 version. We look forward to feedback and contributions to GluonTS in the form of bug reports, proposals for feature enhancements, pull requests for new and improved functionality, and, of course, implementations of the latest and greatest time series models.

Related literature and upcoming events

We have a paper on GluonTS at the ICML 2019 Time Series workshop and we will be giving tutorials at SIGMOD 2019 and KDD 2019 on forecasting, where we will feature GluonTS.


Also see, on the AWS Machine Learning blog: Creating neural time series models with Gluon Time Series.

Lorenzo Stella, Syama Rangapuram, Konstantinos Benidis, Alexander Alexandrov, David Salinas, Danielle Maddix, Yuyang Wang, Valentin Flunkert, Jasper Schulz, and Michael Bohlke-Schneider also contributed to this post and to GluonTS.

from AWS Open Source Blog

New! Open Distro for Elasticsearch’s Job Scheduler Plugin

New! Open Distro for Elasticsearch’s Job Scheduler Plugin

Open Distro for Elasticsearch’s Job Scheduler plugin provides a framework for developers to accomplish common, scheduled tasks on their cluster. You can implement Job Scheduler’s Service Provider Interface (SPI) to take snapshots, manage your data’s lifecycle, run periodic jobs, and much more.

When you use Job Scheduler, you build a plugin that implements interfaces provided in the Job Scheduler library. You can schedule jobs by specifying an interval, or using a Unix Cron expression to define a more flexible schedule to execute your job. Job Scheduler has a sweeper that listens for update events on the Elasticsearch cluster, and a scheduler that manages when jobs run.

Build, install, code, run!

You can build and install the Job Scheduler plugin by following the instructions in the Open Distro for Elasticsearch Job Scheduler GitHub repo.

Please take a look at the source code – play with it, build with it! Let us know if it doesn’t support your use case or if you have ideas for how to improve it. The sample-extension-plugin example code in the Job Scheduler source repo provides a complete example of using Job Scheduler.

Join in on GitHub to improve project documentation, add examples, submit feature requests, and file bug reports. Check out the code, build a plugin, and open a pull request – we’re happy to review and figure out steps to integrate. We welcome your participation on the project. If you have any questions, don’t hesitate to ask on the community discussion forums.

from AWS Open Source Blog

Store Open Distro for Elasticsearch’s Performance Analyzer Output in Elasticsearch

Store Open Distro for Elasticsearch’s Performance Analyzer Output in Elasticsearch

Open Distro for Elasticsearch’s Performance Analyzer plugin exposes a REST API that returns metrics from your Elasticsearch cluster. To get the most out of these metrics, you can store them in Elasticsearch and use Kibana to visualize them. While you can use Open Distro for Elasticsearch’s PerfTop to build visualizations, PerfTop doesn’t retain data and is meant to be lightweight.

In this post, I’ll explore Performance Analyzer’s API through a code sample that reads Performance Analyzer’s metrics and writes them to Elasticsearch. You might wonder why Performance Analyzer doesn’t do that already (we welcome your pull requests!). Performance Analyzer is designed as a lightweight co-process for Elasticsearch. If your Elasticsearch cluster is in trouble, it might not be able to respond to requests, and Kibana might be down. If you adopt the sample code, I recommend that you send the data to a different Open Distro for Elasticsearch cluster to avoid this issue.

You can follow along with the sample code I published in our GitHub Community repository. The code is in the pa-to-es folder when you clone the repository. You can find information about the other code samples in past blog posts.

Code overview

The pa-to-es folder contains three Python files (Python version 3.x required) and an Elasticsearch template that sets the type of the @timestamp field to be date. main.py is the application, consisting of an infinite loop that calls Performance Analyzer – pulling metrics, parsing those metrics, and sending them to Elasticsearch:

    while 1:
        # Pull one batch of metrics from Performance Analyzer...
        print('Gathering docs')
        docs = MetricGatherer().get_all_metrics()
        # ...then write the whole batch to Elasticsearch.
        print('Sending docs: ', len(docs))
        MetricWriter(get_args()).put_doc_batches(docs)

As you can see, main.py supplies two classes — MetricGatherer and MetricWriter — to communicate with Elasticsearch. MetricGatherer.get_all_metrics() loops through the working metric descriptions in metric_descriptions.py, calling get_metric() for each.

To get the metrics, MetricGatherer generates a URL of the form:

http://localhost:9600/_opendistro/_performanceanalyzer/metrics?metrics=<metric>&dim=<dimensions>&agg=<aggregation>&nodes=all

(You can get more details on Performance Analyzer’s API in our documentation.) The metric descriptions are namedtuples, providing metric/dimension/aggregation trios. It would be more efficient to request several metrics at once, but I found parsing the combined results so much more complicated that any performance gain wasn’t worth it. To determine the metric descriptions, I generated all of the possible combinations of metric/dimension/aggregation, tested them, and retained the working descriptions in metric_descriptions.py. It would be great to build an API that exposes valid combinations rather than working from a static set of descriptions (did I mention, we welcome all pull requests?).
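
To make the shape of those descriptions concrete, here is a hedged sketch of how one metric/dimension/aggregation trio could be turned into a request; the namedtuple field names and the function name are illustrative, not the actual pa-to-es code.

from collections import namedtuple
import requests

# Illustrative stand-in for the descriptions kept in metric_descriptions.py.
MetricDescription = namedtuple('MetricDescription', ['metric', 'dimensions', 'agg'])

PA_URL = 'http://localhost:9600/_opendistro/_performanceanalyzer/metrics'

def get_metric(description):
    # Build the metrics?metrics=...&dim=...&agg=...&nodes=all query described above.
    params = {
        'metrics': description.metric,
        'dim': description.dimensions,
        'agg': description.agg,
        'nodes': 'all',
    }
    return requests.get(PA_URL, params=params).json()

cpu_by_shard = MetricDescription(metric='CPU_Utilization', dimensions='ShardID', agg='avg')
print(get_metric(cpu_by_shard))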

MetricGatherer uses result_parse.ResultParser to interpret the output of the call to Performance Analyzer. The output JSON consists of one element per node. Within that element, it returns a list of fields, followed by a set of records:

{
  "XU9kOXBBQbmFSvkGLv4iGw": {
    "timestamp": 1558636900000,
     "data": {
      "fields":[
        {
          "name":"ShardID",
          "type":"VARCHAR"
        },
        {
          "name":"Latency",
          "type":"DOUBLE"
        },
        {
          "name":"CPU_Utilization",
          "type":"DOUBLE"
        }
      ],
      "records":[
        [
          null,
          null,
          0.016093937677199393
        ]
      ]
    }
  }, ...

ResultParser zips together the separated field names and values and generates a dict, skipping empty values. The records generator function uses this dict as the basis for its return, adding the timestamp from the original return body. records also adds the node name and the aggregation as fields in the dict to facilitate visualizing the data in Kibana.
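
A minimal sketch of that zipping logic, for illustration only (the function name and document keys are my own; see result_parse.py in the pa-to-es folder for the real implementation):

def docs_for_node(node_name, node_body, agg):
    """Pair each record's values with the field names, skip empty values, and add
    the timestamp, node name, and aggregation so Kibana can filter on them."""
    field_names = [field['name'] for field in node_body['data']['fields']]
    for record in node_body['data']['records']:
        doc = {name: value for name, value in zip(field_names, record) if value is not None}
        doc['@timestamp'] = node_body['timestamp']
        doc['node'] = node_name
        doc['agg'] = agg
        yield doc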

MetricWriter closes the loop, taking the collection of dicts, each of which will be written as a document to Elasticsearch, building a _bulk body, and POSTing that batch to Elasticsearch. As written, the code is hard-wired to send the _bulk to https://localhost:9200. In practice, you’ll want to change the output to go to a different Elasticsearch cluster. The authentication for the POST request is admin:admin – be sure to change that when you change your passwords for Open Distro for Elasticsearch.
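
A hedged sketch of that final step is shown below; the index name, TLS handling, and function name are illustrative, and the actual MetricWriter implementation may differ.

import json
import requests

def put_doc_batch(docs, index='pa-metrics'):   # illustrative index name
    """Build an Elasticsearch _bulk body from a list of dicts and POST it."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({'index': {'_index': index}}))
        lines.append(json.dumps(doc))
    body = '\n'.join(lines) + '\n'  # _bulk bodies must end with a newline
    response = requests.post(
        'https://localhost:9200/_bulk',
        data=body,
        headers={'Content-Type': 'application/x-ndjson'},
        auth=('admin', 'admin'),  # change this along with your Open Distro passwords
        verify=False,             # the demo configuration ships a self-signed certificate
    )
    response.raise_for_status()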

Add the template to your cluster

You can run the code as written, and you will see data flow into your Open Distro for Elasticsearch cluster. However, the timestamp returned by Performance Analyzer is a long integer, so Elasticsearch maps it as a numeric field, and you won’t be able to use Kibana’s time-based functions for the index. I could truncate the timestamp or rewrite it into a format that Elasticsearch automatically detects as a date. I chose instead to set a template.

The template below (template.json in the pa-to-es folder) sets the field type for @timestamp to date. You need to send this template to Elasticsearch before you send any data, because the first write will auto-create the index. (If you already ran pa-to-es, don’t worry; just DELETE any indices that it created.) You can use Kibana’s developer pane to send the template to Elasticsearch.

Navigate to https://localhost:5601. Log in, dismiss the splash screen, and select the DevTools tab. Click Get to work. Copy-paste the below text into the interactive pane and click the triangle to the right. (Depending on the version of Elasticsearch you’re running, you may receive a warning about type removal. It’s OK to ignore this warning.)

POST _template/pa 
{
    "index_patterns": ["pa-*"],
    "settings": {
        "number_of_shards": 1
    },
    "mappings": {
        "log": {
            "properties": {
                "@timestamp": {
                    "type": "date"
                }
            }
        }
    }
}
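
If you prefer to skip Kibana, you can send the same template from Python; this is a sketch that assumes the template.json file from the pa-to-es folder and the default admin:admin credentials on https://localhost:9200.

import json
import requests

# Load the template shipped with pa-to-es and register it under the name "pa".
with open('template.json') as f:
    template = json.load(f)

response = requests.post(
    'https://localhost:9200/_template/pa',
    json=template,
    auth=('admin', 'admin'),
    verify=False,  # the demo configuration uses a self-signed certificate
)
print(response.status_code, response.text)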

Monitoring Elasticsearch

I ran esrally with the http_logs track against my Open Distro for Elasticsearch cluster, and also ran main.py to gather metrics. I then used the data to build a Kibana dashboard for monitoring my cluster.

A Kibana dashboard with metrics gathered by Open Distro for Elasticsearch's Performance Analyzer plugin.

Conclusion

The metrics stored in Elasticsearch documents each hold a single metric/dimension/aggregation combination, giving you the freedom to build Kibana visualizations at the finest granularity. For example, my dashboard exposes CPU utilization down to the Elasticsearch operation level, the disk wait time on each node, and read and write throughput for each operation. In a future post, I will dive deep on building out dashboards and other visualizations with Performance Analyzer data.

from AWS Open Source Blog