Category: Open Source

Add Single Sign-On (SSO) to Open Distro for Elasticsearch Kibana Using SAML and Okta

Open Distro for Elasticsearch Security implements the web browser single sign-on (SSO) profile of the SAML 2.0 protocol. This enables you to configure federated access with any SAML 2.0 compliant identity provider (IdP). In a prior post, I discussed setting up SAML-based SSO using Microsoft Active Directory Federation Services (ADFS). In this post, I’ll cover the Okta-specific configuration.

Prerequisites

 
User     Okta Group   Open Distro Security role
esuser1  ESAdmins     all_access
esuser2  ESUsers      readall
esuser3  N/A          N/A

 

Okta configuration

In your Okta account, click on Application -> Add Application -> Create New App.

add new application

In the next screen, choose Web app as type, SAML 2.0 as the authentication method, and click Create. In the next screen, type in an application name and click Next.

select integration type

In SAML settings, set the Single sign on URL and the Audience URI (SP Entity ID). Enter the Kibana URL below as the Single sign on URL.

https://<kibana_base_url>:<kibana_port>/_opendistro/_security/saml/acs

Make sure to replace the kibana_base_url and kibana_port with your actual Kibana configuration as noted in the prerequisites. In my setup this is https://new-kibana.ad.example.com:5601/....

Add a string for the Audience URI. You can choose any name here. I used kibana-saml. You will use this name in the Elasticsearch Security plugin SAML config as the SP-entity-id.

saml settings

You will pass the user’s group memberships from Okta to Elasticsearch using Okta’s group attribute statements. Set the Name to “Roles”. The name you choose must match the roles_key defined in Open Distro Security’s configuration. Click Next and Finish.

roles to group mapping
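
For reference, this is where that name has to line up on the Elasticsearch side. A minimal fragment of the Security plugin's config.yml looks roughly like the sketch below; the full authentication domain, including the idp, sp, and exchange_key settings, appears later in this post:

        http_authenticator:
          type: saml
          challenge: true
          config:
            roles_key: Roles
            ...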

On the Application Settings screen, click Identity Provider metadata link to download the metadata XML file and copy it to the Elasticsearch config directory. Set the idp.metadata_file property in Open Distro Security’s config.yml file to the path of the XML file. The path has to be specified relative to the config directory (you can also specify metadata_url instead of file).

download idp metadata file

This metadata file contains the idp.entity_id.

metadata file showing entity id

To complete the configuration of Open Distro for Elasticsearch Security, refer to my prior post on adding single sign-on with ADFS. Follow the steps in that post to map Open Distro Security roles to Okta groups, update Open Distro Security configuration and Kibana configuration, and restart Kibana. My copy of the Security config file with Okta integration is as below:

...
        http_enabled: true
        transport_enabled: true
        order: 1
        http_authenticator:
          type: saml
          challenge: true
          config:
            idp:
              metadata_file: okta-metadata.xml
              entity_id: http://www.okta.com/exksz5jfvfaUjGSuU356
            sp:
              entity_id: kibana-saml
            kibana_url: https://new-kibana.ad.example.com:5601/
            exchange_key: 'MIIDAzCCAeugAwIB...'
        authentication_backend:
          type: noop
...
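
For the role mapping step, the relevant entries in roles_mapping.yml, based on the Okta groups from the prerequisites table, might look like the sketch below (adjust the group names to your own setup):

all_access:
  backend_roles:
    - "ESAdmins"

readall:
  backend_roles:
    - "ESUsers"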

Once you restart Kibana, you are ready to test the integration. You should observe the same behavior as covered in the ADFS post.

okta login screen

 

kibana esuser2 read screenshot

Conclusion

In this post, I covered SAML authentication for Kibana single sign-on with Okta. You can use a similar process to configure integration with any SAML 2.0-compliant identity provider. Please refer to the Open Distro for Elasticsearch documentation for additional options when configuring Open Distro for Elasticsearch Security with SAML.

Have an issue or a question? Want to contribute? You can get help and discuss Open Distro for Elasticsearch on our forums. You can file issues here.

from AWS Open Source Blog

Demystifying Elasticsearch Shard Allocation

At the core of Open Distro for Elasticsearch’s ability to provide a seamless scaling experience lies its ability to distribute its workload across machines. This is achieved via sharding. When you create an index, you set a primary and replica shard count for that index. Elasticsearch distributes your data and requests across those shards, and the shards across your data nodes.

The capacity and performance of your cluster depends critically on how Elasticsearch allocates shards on nodes. If all of your traffic goes to one or two nodes because they contain the active indexes in your cluster, those nodes will show high CPU, RAM, disk, and network use. You might have tens or hundreds of nodes in your cluster sitting idle while these few nodes melt down.

In this post, I will dig into Elasticsearch’s shard allocation strategy and discuss the reasons for “hot” nodes in your cluster. With this understanding, you can fix the root cause issues to achieve better performance and a more stable cluster.

Shard skew can result in cluster failure

In an optimal shard distribution, each machine has uniform resource utilization: every shard has the same storage footprint, every request is serviced by every shard, and every request uses CPU, RAM, disk, and network resources equally. As you scale vertically or horizontally, additional nodes contribute equally to performing the work of the cluster, increasing its capacity.

So much for the optimal case. In practice, you run more than one index in a cluster, the distribution of data is uneven, and requests are processed at different rates on different nodes. In a prior post Jon Handler explained how storage usage can become skewed. When shard distribution is skewed, CPU, network, and disk bandwidth usage can also become skewed.

For example, let’s say you have a cluster with three indexes, each with four primary shards, deployed on six nodes as in the figure below. The shards for the square index have all landed on two nodes, while the circle and rounded-rectangle indexes are mixed on the other four nodes. If the square index receives ten times the traffic of the other two indexes, its two nodes will likely need ten times the CPU, disk, network, and RAM of the other four nodes. You either need to overscale based on the requirements for the square index, or watch your cluster fall over if you have scaled for the other indexes.

 

Diagram of six elasticsearch nodes with three indexes showing uneven, skewed CPU, RAM, JVM, and I/O usage.

The correct allocation strategy should make intelligent decisions that respect system requirements. This is a difficult problem, and Elasticsearch does a good job of solving it. Let’s dive into Elasticsearch’s algorithm.

ShardsAllocator figures out where to place shards

The ShardsAllocator is an interface in Elasticsearch whose implementations are responsible for shard placement. When shards are unassigned for any reason, ShardsAllocator decides on which nodes in the cluster to place them.

ShardsAllocator is engaged to determine shard locations under the following conditions:

  • Index Creation – when you add an index to your cluster (or restore an index from snapshot), ShardsAllocator decides where to place its shards. When you increase replica count for an index, it decides locations for the new replica copies.
  • Node failure – if a node drops out of the cluster, ShardsAllocator figures out where to place the shards that were on that node.
  • Cluster resize – if nodes are added or removed from the cluster, ShardsAllocator decides how to rebalance the cluster.
  • Disk high water mark – when disk usage on a node hits the high water mark (90% full, by default), Elasticsearch engages ShardsAllocator to move shards off that node.
  • Manual shard routing – when you manually route shards, ShardsAllocator also moves other shards to ensure that the cluster stays balanced.
  • Routing-related setting updates – when you change cluster or index settings that affect shard routing, such as allocation awareness, excluding or including a node (by IP or node attribute), or filtering indexes to include or exclude specific nodes (see the example below).
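
As a sketch of that last trigger, excluding a node by IP address with a dynamic cluster setting (the IP here is a placeholder) causes a reroute, and shards are moved off the excluded node:

PUT _cluster/settings
{
  "transient": {
    "cluster.routing.allocation.exclude._ip": "10.0.0.1"
  }
}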

Shard placement strategy can be broken into two smaller subproblems: which shard to act on, and which target node to place it at. The default Elasticsearch implementation, BalancedShardsAllocator, divides its responsibilities into three major buckets: allocate unassigned shards, move shards, and rebalance shards. Each of these internally solves the primitive subproblems and decides an action for the shard: whether to allocate it on a specific node, move it from one node to another, or simply leave it as-is.

The overall placement operation, called reroute in Elasticsearch, is invoked when there are cluster state changes that can affect shard placement.

Node selection

Elasticsearch gets the list of eligible nodes by processing a series of Allocation Deciders. Node eligibility can vary depending on the shard and on the current allocations on the node. Not all nodes may be eligible to accept a particular shard. For example, Elasticsearch won’t put a replica shard on the same node as the primary. Or, if a node’s disk is full, Elasticsearch cannot place another shard on it.

Elasticsearch follows a greedy approach for shard placement: it makes locally optimal decisions, hoping to reach a global optimum. A node’s eligibility for a shard is abstracted into a weight function; each shard is then allocated to the node that is currently most eligible to accept it. Think of this weight function as a mathematical function which, given some parameters, returns the weight of a shard on a node. The most eligible node for a shard is the one with minimum weight.

AllocateUnassigned

The first operation that a reroute invocation undertakes is allocateUnassigned. Each time an index is created, its shards (both primary and replicas) are unassigned. When a node leaves the cluster, shards that were on that node are lost. For lost primary shards, their surviving replicas (if any) are promoted to primary (this is done by a different module), and the corresponding replicas are rendered unassigned. All of these are allocated to nodes in this operation.

For allocateUnassigned(), the BalancedShardsAllocator iterates through all unassigned shards, finds the subset of nodes eligible to accept the shard (Allocation Deciders), and out of these, picks the node with minimum weight.

There is a set order in which Elasticsearch picks unassigned shards for allocation. It picks primary shards first, allocating all shards for one index before moving on to the next index’s primaries. To choose indexes, it uses a comparator based on index priority, creation date, and index name (see PriorityComparator). This ensures that Elasticsearch assigns all primaries for as many indices as possible, rather than creating several partially-assigned indices. Once Elasticsearch has assigned all primaries, it moves to the first replica for each index. Then, it moves to the second replica for each index, and so on.
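
If you need to influence this ordering, you can raise an index's priority with a dynamic index setting (the index name and value below are illustrative); higher-priority indexes have their unassigned shards allocated first:

PUT logs-2019-09-25/_settings
{
  "index.priority": 10
}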

Move Shards

Consider a scenario in which you are scaling down your cluster. Responding to seasonal variation in your workload, you have just passed a high-traffic season and are now back to moderate workloads. You want to right-size your cluster by removing some nodes. If you remove data nodes too quickly, you might take out a primary and all of its replicas at the same time, permanently losing that data. A better approach is to exclude a subset of nodes, wait for all shards to move out, and then terminate them.

Or, consider a situation where a node has its disk full and some shards must be moved out to free up space. In such cases, a shard must be moved out of a node. This is handled by the moveShards() operation, triggered right after allocateUnassigned() completes.

For “move shards”, Elasticsearch iterates through each shard in the cluster, and checks whether it can remain on its current node. If not, it selects the node with minimum weight, from the subset of eligible nodes (filtered by deciders), as the target node for this shard. A shard relocation is then triggered from current node to target node.

The move operation only applies to STARTED shards; shards in any other state are skipped. To move shards uniformly from all nodes, moveShards uses a nodeInterleavedShardIterator. This iterator goes breadth first across nodes, picking one shard from each node, followed by the next shard, and so on. Thus all shards on all nodes are evaluated for move, without preferring one over the other.

Rebalance shards

As you hit workload limits, you may decide to add more nodes to scale your cluster. Elasticsearch should automatically detect these nodes and relocate shards for better distribution. The addition or removal of nodes may not always require shard movement – what if the nodes had very few shards (say just one), and extra nodes were added only as a proactive scaling measure?

Elasticsearch generalizes this decision using the weight function abstraction in shard allocator. Given current allocations on a node, the weight function provides the weight of a shard on a node. Nodes with a high weight value are less suited to place the shard than nodes with a lower weight value. Comparing the weight of a shard on different nodes, we can decide if relocating can improve the overall weight distribution.

For rebalance decisions, Elasticsearch computes the weight for each index on every node, and the delta between min and max possible weights for an index. (This can be done at the index level, since Elasticsearch treats each shard of an index as equal.) Indexes are then processed starting with the most unbalanced index.

Shard movement is a heavy operation. Before actual relocation, Elasticsearch models shard weights pre- and post-rebalance; shards are relocated only if the operation leads to a more balanced distribution of weights.

Finally, rebalancing is an optimization problem. Beyond a threshold, the cost of moving shards begins to outweigh the benefits of balanced weights. In Elasticsearch, this threshold is currently a fixed value, configurable by the dynamic setting cluster.routing.allocation.balance.threshold. When the computed weight delta for an index — the difference between its min and max weights across nodes — is less than this threshold, the index is considered balanced.
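
For example, you can tune this threshold with a dynamic cluster settings call; a higher value makes the allocator more tolerant of small imbalances and less eager to relocate shards (the value below is only an illustration; the default is 1.0):

PUT _cluster/settings
{
  "transient": {
    "cluster.routing.allocation.balance.threshold": 2.0
  }
}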

Conclusion

In this post, we covered the algorithms that power shard placement and balancing decisions in Elasticsearch. Each reroute invocation goes through the process of allocating unassigned shards, moving shards that must be evacuated from their current nodes, and rebalancing shards wherever possible. Together, they maintain a stable balanced cluster.

In the next post, we will dive deep into the default weight function implementation, which is responsible for selecting one node over another for a given shard’s placement.

from AWS Open Source Blog

Using a Network Load Balancer with the NGINX Ingress Controller on EKS

Kubernetes Ingress is an API object that provides a collection of routing rules that govern how external/internal users access Kubernetes services running in a cluster. An ingress controller is responsible for reading the ingress resource information and processing it appropriately. As there are different ingress controllers that can do this job, it’s important to choose the right one for the type of traffic and load coming into your Kubernetes cluster. In this post, we will discuss how to use an NGINX ingress controller on Amazon EKS, and how to front-face it with a Network Load Balancer (NLB).

What is a Network Load Balancer?

An AWS Network Load Balancer functions at the fourth layer of the Open Systems Interconnection (OSI) model. It can handle millions of requests per second. After the load balancer receives a connection request, it selects a target from the target group for the default rule. It attempts to open a TCP connection to the selected target on the port specified in the listener configuration.

Exposing your application on Kubernetes

In Kubernetes, there are several different ways to expose your application; using Ingress to expose your service is one way of doing it. Ingress is not a service type, but it acts as the entry point for your cluster. It lets you consolidate your routing rules into a single resource, as it can expose multiple services under the same IP address.

This post will explain how to use an ingress resource and front it with a NLB (Network Load Balancer), with an example.

Ingress in Kubernetes

This image shows the flow of traffic from the outside world hitting the Ingress resource and being diverted to the required Kubernetes service based on the path rules set up in the Ingress resource.

 

Kubernetes supports a high-level abstraction called Ingress, which allows simple host- or URL-based HTTP routing. An Ingress is a core concept (in beta) of Kubernetes. It is always implemented by a third party proxy; these implementations are known as ingress controllers. An ingress controller is responsible for reading the ingress resource information and processing that data accordingly. Different ingress controllers have extended the specification in different ways to support additional use cases.

Typically, your Kubernetes services will impose additional requirements on your ingress. Examples of this include:

  • Content-based routing: e.g., routing based on HTTP method, request headers, or other properties of the specific request.
  • Resilience: e.g., rate limiting, timeouts.
  • Support for multiple protocols: e.g., WebSockets or gRPC.
  • Authentication.

An ingress controller is a daemon or deployment, deployed as a Kubernetes Pod, that watches the endpoint of the API server for updates to the Ingress resource. Its job is to satisfy requests for Ingresses. NGINX ingress is one such implementation. This blog post implements the ingress controller as a deployment with the default values. To suit your use case and for more availability, you can use it as a daemon or increase the replica count.

Why would I choose the NGINX ingress controller over the Application Load Balancer (ALB) ingress controller?

The ALB ingress controller is great, but there are certain use cases where the NLB with the NGINX ingress controller will be a better fit. I will discuss scenarios where you would need a NLB over the ALB later in this post, but first let’s discuss the ingress controllers.

By default, the NGINX Ingress controller will listen to all the ingress events from all the namespaces and add corresponding directives and rules into the NGINX configuration file. This makes it possible to use a centralized routing file which includes all the ingress rules, hosts, and paths.

With the NGINX Ingress controller you can also have multiple ingress objects for multiple environments or namespaces with the same network load balancer; with the ALB, each ingress object requires a new load balancer.

Furthermore, features like path-based routing can be added to the NLB when used with the NGINX ingress controller.

Why do I need a load balancer in front of an ingress?

Ingress is tightly integrated into Kubernetes, meaning that your existing workflows around kubectl will likely extend nicely to managing ingress. An Ingress controller does not typically eliminate the need for an external load balancer; it simply adds an additional layer of routing and control behind the load balancer.

Pods and nodes are not guaranteed to live for the whole lifetime that the user intends: pods are ephemeral and vulnerable to kill signals from Kubernetes during occasions such as:

  • Scaling.
  • Memory or CPU saturation.
  • Rescheduling for more efficient resource use.
  • Downtime due to outside factors.

The load balancer (Kubernetes service) is a construct that stands as a single, fixed-service endpoint for a given set of pods or worker nodes. To take advantage of the previously-discussed benefits of a Network Load Balancer (NLB), we create a Kubernetes service of type LoadBalancer with the NLB annotations, and this load balancer sits in front of the ingress controller – which is itself a pod or a set of pods. In AWS, for a set of EC2 compute instances managed by an Autoscaling Group, there should be a load balancer that acts as both a fixed referable address and a load balancing mechanism.

Ingress with load balancer

 

This image shows the flow of traffic from the outside world hitting the Network Load Balancer, then the Ingress resource, and being diverted to the required Kubernetes service based on the path rules set up in the Ingress resource.

 

The diagram above shows a Network Load Balancer in front of the Ingress resource. This load balancer will route traffic to a Kubernetes service (or Ingress) on your cluster that will perform service-specific routing. NLB with the Ingress definition provides the benefits of both a NLB and an Ingress resource.

What advantages does the NLB have over the Application Load Balancer (ALB)?

A Network Load Balancer is capable of handling millions of requests per second while maintaining ultra-low latencies, making it ideal for load balancing TCP traffic. NLB is optimized to handle sudden and volatile traffic patterns while using a single static IP address per Availability Zone. The benefits of using a NLB are:

  • Static IP/elastic IP addresses: For each Availability Zone (AZ) you enable on the NLB, you have a network interface. Each load balancer node in the AZ uses this network interface to get a static IP address. You can also use Elastic IP to assign a fixed IP address for each Availability Zone.
  • Scalability: Ability to handle volatile workloads and scale to millions of requests per second.
  • Zonal isolation: The Network Load Balancer can be used for application architectures within a Single Zone. Network Load Balancers attempt to route a series of requests from a particular source to targets in a single AZ while still providing automatic failover should those targets become unavailable.
  • Source/remote address preservation: With a Network Load Balancer, the original source IP address and source ports for the incoming connections remain unmodified. With Classic and Application load balancers, we had to use HTTP header X-Forwarded-For to get the remote IP address.
  • Long-lived TCP connections: Network Load Balancer supports long-running TCP connections that can be open for months or years, making it ideal for WebSocket-type applications, IoT, gaming, and messaging applications.
  • Reduced bandwidth usage: Most applications are bandwidth-bound and should see a cost reduction (for load balancing) of about 25% compared to Application or Classic Load Balancers.
  • SSL termination: SSL termination will need to happen at the backend, since SSL termination on NLB for Kubernetes is not yet available.

For any NLB usage, the backend security groups control access to the application (the NLB does not have security groups of its own). The worker node security group handles the security for inbound/outbound traffic.

How to use a Network Load Balancer with the NGINX Ingress resource in Kubernetes

Start by creating the mandatory resources for NGINX Ingress in your cluster:

kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/master/deploy/static/mandatory.yaml

Create the NLB for the ingress controller:

kubectl apply -f https://raw.githubusercontent.com/cornellanthony/nlb-nginxIngress-eks/master/nlb-service.yaml
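
If you want to see what that file roughly contains before applying it, the essential piece is a Service of type LoadBalancer carrying the NLB annotation. The sketch below is an approximation: the selector and port names are assumptions and must match how the NGINX ingress controller was deployed in your cluster.

apiVersion: v1
kind: Service
metadata:
  name: ingress-nginx
  namespace: ingress-nginx
  annotations:
    # Tells the AWS cloud provider to provision an NLB instead of a Classic ELB
    service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
spec:
  type: LoadBalancer
  selector:
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/part-of: ingress-nginx
  ports:
    - name: http
      port: 80
      targetPort: http
    - name: https
      port: 443
      targetPort: https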

Now create two services (apple.yaml and banana.yaml) to demonstrate how the Ingress routes our request.  We’ll run two web applications that each output a slightly different response. Each of the files below has a service definition and a pod definition.

Create the resources:

$ kubectl apply -f https://raw.githubusercontent.com/cornellanthony/nlb-nginxIngress-eks/master/apple.yaml
$ kubectl apply -f https://raw.githubusercontent.com/cornellanthony/nlb-nginxIngress-eks/master/banana.yaml

Defining the Ingress resource (with SSL termination) to route traffic to the services created above 

If you’ve purchased and configured a custom domain name for your server, you can use that certificate, otherwise you can still use SSL with a self-signed certificate for development and testing.

In this example, where we are terminating SSL on the backend, we will create a self-signed certificate.

Anytime we reference a TLS secret, we mean a PEM-encoded X.509, RSA (2048) secret. Now generate a self-signed certificate and private key with:

openssl req -x509 -nodes -days 365 -newkey rsa:2048 \
  -keyout tls.key -out tls.crt \
  -subj "/CN=anthonycornell.com/O=anthonycornell.com"

Then create the secret in the cluster:

kubectl create secret tls tls-secret --key tls.key --cert tls.crt

Now declare an Ingress to route requests to /apple to the first service, and requests to /banana to second service. Check out the Ingress’ rules field that declares how requests are passed along:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: example-ingress
  annotations:
    nginx.ingress.kubernetes.io/ssl-redirect: "false"
    nginx.ingress.kubernetes.io/force-ssl-redirect: "false"
    nginx.ingress.kubernetes.io/rewrite-target: /
spec:
  tls:
  - hosts:
    - anthonycornell.com
    secretName: tls-secret
  rules:
  - host: anthonycornell.com
    http:
      paths:
        - path: /apple
          backend:
            serviceName: apple-service
            servicePort: 5678
        - path: /banana
          backend:
            serviceName: banana-service
            servicePort: 5678

Create the Ingress in the cluster:

kubectl create -f https://raw.githubusercontent.com/cornellanthony/nlb-nginxIngress-eks/master/example-ingress.yaml

Set up Route 53 to have your domain pointed to the NLB (optional):

anthonycornell.com.    A    ALIAS abf3d14967d6511e9903d12aa583c79b-e3b2965682e9fbde.elb.us-east-1.amazonaws.com

Test your application:

curl  https://anthonycornell.com/banana -k
Banana
 
curl  https://anthonycornell.com/apple -k
Apple

Can I reuse a NLB with services running in different namespaces? In the same namespace?  

Install the NGINX ingress controller as explained above. In each of your namespaces, define an Ingress Resource.

Example for test:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: api-ingresse-test
  namespace: test
  annotations:
    kubernetes.io/ingress.class: "nginx"
spec:
  rules:
  - host: test.anthonycornell.com
    http:
      paths:
      - backend:
          serviceName: myApp
          servicePort: 80
        path: /

Suppose we have three namespaces – Test, Demo, and Staging. After creating the Ingress resource in each namespace, the NGINX ingress controller will process those resources as shown below:

 

This image shows traffic from three namespaces – Test, Demo, and Staging – hitting the NLB; the NGINX ingress controller routes each request to the service in the respective namespace.

 

Cleanup

Delete the Ingress resource:

kubectl delete -f https://raw.githubusercontent.com/cornellanthony/nlb-nginxIngress-eks/master/example-ingress.yaml

Delete the services:

kubectl delete -f https://raw.githubusercontent.com/cornellanthony/nlb-nginxIngress-eks/master/apple.yaml 
kubectl delete -f https://raw.githubusercontent.com/cornellanthony/nlb-nginxIngress-eks/master/banana.yaml

Delete the NLB:

kubectl delete -f https://raw.githubusercontent.com/cornellanthony/nlb-nginxIngress-eks/master/nlb-service.yaml

Delete the NGINX ingress controller:

kubectl delete -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/master/deploy/static/mandatory.yaml

We hope this post was useful! Please let us know in the comments.

from AWS Open Source Blog

AWS API Gateway for HPC Job Submission

AWS ParallelCluster simplifies the creation and the deployment of HPC clusters. AWS API Gateway is a fully managed service that makes it easy for developers to create, publish, maintain, monitor, and secure APIs at any scale.

In this post we combine AWS ParallelCluster and AWS API Gateway to allow an HTTP interaction with the scheduler. You can submit, monitor, and terminate jobs using the API, instead of connecting to the master node via SSH. This makes it possible to integrate ParallelCluster programmatically with other applications running on premises or on AWS.

The API uses AWS Lambda and AWS Systems Manager to execute the user commands without granting direct SSH access to the nodes, thus enhancing the security of the whole cluster.

VPC configuration

The VPC used for this configuration can be created using the VPC Wizard. You can also use an existing VPC that respects the AWS ParallelCluster network requirements.

 

Launch VPC Wizard

 

In Select a VPC Configuration, choose VPC with Public and Private Subnets and then Select.

 

Select a VPC Configuration

Before starting the VPC Wizard, allocate an Elastic IP Address. This will be used to configure a NAT gateway for the private subnet. A NAT gateway is required to enable compute nodes in the AWS ParallelCluster private subnet to download the required packages and to access the AWS services public endpoints. See AWS ParallelCluster network requirements.

You can find more details about the VPC creation and configuration options in VPC with Public and Private Subnets (NAT).

The example below uses the following configuration:

IPv4 CIDR block: 10.0.0.0/16
VPC name: Cluster VPC
Public subnet’s IPv4 CIDR: 10.0.0.0/24
Availability Zone: eu-west-1a
Public subnet name: Public subnet
Private subnet’s IPv4 CIDR: 10.0.1.0/24
Availability Zone: eu-west-1b
Private subnet name: Private subnet
Elastic IP Allocation ID: <id of the allocated Elastic IP>
Enable DNS hostnames: yes

VPC with Public and Private Subnets

AWS ParallelCluster configuration

AWS ParallelCluster is an open source cluster management tool to deploy and manage HPC clusters in the AWS cloud; to get started, see Installing AWS ParallelCluster.

After the AWS ParallelCluster command line has been configured, create the cluster template file below in .parallelcluster/config. The master_subnet_id contains the id of the created public subnet and the compute_subnet_id contains the private one. The ec2_iam_role is the role that will be used for all the instances of the cluster. The steps to create this role will be explained below.

[aws]
aws_region_name = eu-west-1

[cluster slurm]
scheduler = slurm
compute_instance_type = c5.large
initial_queue_size = 2
max_queue_size = 10
maintain_initial_size = false
base_os = alinux
key_name = AWS_Ireland
vpc_settings = public
ec2_iam_role = parallelcluster-custom-role

[vpc public]
master_subnet_id = subnet-01fc20e143543f8af
compute_subnet_id = subnet-0b1ae2790497d83ec
vpc_id = vpc-0cdee679c5a6163bd

[global]
update_check = true
sanity_check = true
cluster_template = slurm

[aliases]
ssh = ssh {CFN_USER}@{MASTER_IP} {ARGS}

IAM custom Roles for SSM endpoints

To allow ParallelCluster nodes to call Lambda and SSM endpoints, it is necessary to configure a custom IAM Role.

See AWS Identity and Access Management Roles in AWS ParallelCluster for details on the default AWS ParallelCluster policy.

From the AWS console:

  • Access the AWS Identity and Access Management (IAM) service and click on Policies.
  • Choose Create policy and paste the following policy into the JSON section. Be sure to modify <REGION> and <AWS ACCOUNT ID> to match the values for your account, and also update the S3 bucket name from pcluster-data to the bucket you want to use to store the input/output data from jobs and save the output of SSM execution commands.
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Resource": [
                "*"
            ],
            "Action": [
                "ec2:DescribeVolumes",
                "ec2:AttachVolume",
                "ec2:DescribeInstanceAttribute",
                "ec2:DescribeInstanceStatus",
                "ec2:DescribeInstances",
                "ec2:DescribeRegions"
            ],
            "Sid": "EC2",
            "Effect": "Allow"
        },
        {
            "Resource": [
                "*"
            ],
            "Action": [
                "dynamodb:ListTables"
            ],
            "Sid": "DynamoDBList",
            "Effect": "Allow"
        },
        {
            "Resource": [
                "arn:aws:sqs:<REGION>:<AWS ACCOUNT ID>:parallelcluster-*"
            ],
            "Action": [
                "sqs:SendMessage",
                "sqs:ReceiveMessage",
                "sqs:ChangeMessageVisibility",
                "sqs:DeleteMessage",
                "sqs:GetQueueUrl"
            ],
            "Sid": "SQSQueue",
            "Effect": "Allow"
        },
        {
            "Resource": [
                "*"
            ],
            "Action": [
                "autoscaling:DescribeAutoScalingGroups",
                "autoscaling:TerminateInstanceInAutoScalingGroup",
                "autoscaling:SetDesiredCapacity",
                "autoscaling:DescribeTags",
                "autoScaling:UpdateAutoScalingGroup",
                "autoscaling:SetInstanceHealth"
            ],
            "Sid": "Autoscaling",
            "Effect": "Allow"
        },
        {
            "Resource": [
                "arn:aws:dynamodb:<REGION>:<AWS ACCOUNT ID>:table/parallelcluster-*"
            ],
            "Action": [
                "dynamodb:PutItem",
                "dynamodb:Query",
                "dynamodb:GetItem",
                "dynamodb:DeleteItem",
                "dynamodb:DescribeTable"
            ],
            "Sid": "DynamoDBTable",
            "Effect": "Allow"
        },
        {
            "Resource": [
                "arn:aws:s3:::<REGION>-aws-parallelcluster/*"
            ],
            "Action": [
                "s3:GetObject"
            ],
            "Sid": "S3GetObj",
            "Effect": "Allow"
        },
        {
            "Resource": [
                "arn:aws:cloudformation:<REGION>:<AWS ACCOUNT ID>:stack/parallelcluster-*"
            ],
            "Action": [
                "cloudformation:DescribeStacks"
            ],
            "Sid": "CloudFormationDescribe",
            "Effect": "Allow"
        },
        {
            "Resource": [
                "*"
            ],
            "Action": [
                "sqs:ListQueues"
            ],
            "Sid": "SQSList",
            "Effect": "Allow"
        },
        {
            "Effect": "Allow",
            "Action": [
                "ssm:DescribeAssociation",
                "ssm:GetDeployablePatchSnapshotForInstance",
                "ssm:GetDocument",
                "ssm:DescribeDocument",
                "ssm:GetManifest",
                "ssm:GetParameter",
                "ssm:GetParameters",
                "ssm:ListAssociations",
                "ssm:ListInstanceAssociations",
                "ssm:PutInventory",
                "ssm:PutComplianceItems",
                "ssm:PutConfigurePackageResult",
                "ssm:UpdateAssociationStatus",
                "ssm:UpdateInstanceAssociationStatus",
                "ssm:UpdateInstanceInformation"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "ssmmessages:CreateControlChannel",
                "ssmmessages:CreateDataChannel",
                "ssmmessages:OpenControlChannel",
                "ssmmessages:OpenDataChannel"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "ec2messages:AcknowledgeMessage",
                "ec2messages:DeleteMessage",
                "ec2messages:FailMessage",
                "ec2messages:GetEndpoint",
                "ec2messages:GetMessages",
                "ec2messages:SendReply"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:*"
            ],
            "Resource": [
                "arn:aws:s3:::pcluster-data/*"
            ]
        }
    ]
}

Choose Review policy and, in the next section, enter parallelcluster-custom-policy as the policy name and choose Create policy.

Now you can create the Role. Choose Role in the left menu and then Create role.

Select AWS service as type of trusted entity and EC2 as service that will use this role as shown here:

 

Create role

 

Choose Next: Permissions to proceed.

In the policy selection, select the parallelcluster-custom-policy that you just created.

Choose Next: Tags and then Next: Review.

In the Role name box, enter parallelcluster-custom-role and confirm by choosing Create role.

Slurm commands execution with AWS Lambda

AWS Lambda allows you to run your code without provisioning or managing servers. Lambda is used, in this solution, to execute the Slurm commands in the Master node. The AWS Lambda function can be created from the AWS console as explained in the Create a Lambda Function with the Console documentation.

For Function name, enter slurmAPI.

As Runtime, enter Python 2.7.

Choose Create function to create it.

 

Create function

The code below should be pasted into the Function code section, which you can see by scrolling further down the page. The Lambda function uses AWS Systems Manager to execute the scheduler commands, preventing any SSH access to the node. Please modify <REGION> appropriately, and update the S3 bucket name from pcluster-data to the name you chose earlier.

import boto3
import time
import json
import random
import string
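
# lambda_handler is the entry point invoked by API Gateway (Lambda proxy integration).
# The query string parameters select the Slurm command to run; execute_command then
# runs it on the master node through SSM Run Command and returns the output from S3.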

def lambda_handler(event, context):
    instance_id = event["queryStringParameters"]["instanceid"]
    selected_function = event["queryStringParameters"]["function"]
    if selected_function == 'list_jobs':
      command='squeue'
    elif selected_function == 'list_nodes':
      command='scontrol show nodes'
    elif selected_function == 'list_partitions':
      command='scontrol show partitions'
    elif selected_function == 'job_details':
      jobid = event["queryStringParameters"]["jobid"]
      command='scontrol show jobs %s'%jobid
    elif selected_function == 'submit_job':
      script_name = ''.join([random.choice(string.ascii_letters + string.digits) for n in xrange(10)])
      jobscript_location = event["queryStringParameters"]["jobscript_location"]
      command = 'aws s3 cp s3://%s %s.sh; chmod +x %s.sh'%(jobscript_location,script_name,script_name)
      s3_tmp_out = execute_command(command,instance_id)
      submitopts = ''
      try:
        submitopts = event["headers"]["submitopts"]
      except Exception as e:
        submitopts = ''
      command = 'sbatch %s %s.sh'%(submitopts,script_name)
    body = execute_command(command,instance_id)
    return {
        'statusCode': 200,
        'body': body
    }
    
def execute_command(command,instance_id):
    bucket_name = 'pcluster-data'
    ssm_client = boto3.client('ssm', region_name="<REGION>")
    s3 = boto3.resource('s3')
    bucket = s3.Bucket(bucket_name)
    username='ec2-user'
    response = ssm_client.send_command(
             InstanceIds=[
                "%s"%instance_id
                     ],
             DocumentName="AWS-RunShellScript",
             OutputS3BucketName=bucket_name,
             OutputS3KeyPrefix="ssm",
             Parameters={
                'commands':[
                     'sudo su - %s -c "%s"'%(username,command)
                       ]
                   },
             )
    command_id = response['Command']['CommandId']
    time.sleep(1)
    output = ssm_client.get_command_invocation(
      CommandId=command_id,
      InstanceId=instance_id,
    )
    while output['Status'] != 'Success':
      time.sleep(1)
      output = ssm_client.get_command_invocation(CommandId=command_id,InstanceId=instance_id)
      if (output['Status'] == 'Failed') or (output['Status'] =='Cancelled') or (output['Status'] == 'TimedOut'):
        break
    body = ''
    files = list(bucket.objects.filter(Prefix='ssm/%s/%s/awsrunShellScript/0.awsrunShellScript'%(command_id,instance_id)))
    for obj in files:
      key = obj.key
      body += obj.get()['Body'].read()
    return body

In the Basic settings section, set 10 seconds as Timeout.

Choose Save in the top right to save the function.

In the Execution role section, choose the link to view the function’s role on the IAM console (indicated by the red arrow in the image below).

 

Execution role

 

In the newly-opened tab, choose Attach Policy and then Create Policy.

 

Permissions policies

This last action will open a new tab in your browser. From this new tab, choose Create policy and then JSON.

 

Attach Permissions

Create policy

 

Modify the <REGION> and <AWS ACCOUNT ID> appropriately, and also update the S3 bucket name from pcluster-data to the name you chose earlier.

 

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ssm:SendCommand"
            ],
            "Resource": [
                "arn:aws:ec2:<REGION>:<AWS ACCOUNT ID>:instance/*",
                "arn:aws:ssm:<REGION>::document/AWS-RunShellScript",
                "arn:aws:s3:::pcluster-data/ssm"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "ssm:GetCommandInvocation"
            ],
            "Resource": [
                "arn:aws:ssm:<REGION>:<AWS ACCOUNT ID>:*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:*"
            ],
            "Resource": [
                "arn:aws:s3:::pcluster-data",
                "arn:aws:s3:::pcluster-data/*"
            ]
        }
    ]
}

In the next section, enter ExecuteSlurmCommands as the policy name and then choose Create policy.

Close the current tab and move to the previous one.

Refresh the list, select the ExecuteSlurmCommands policy and then Attach policy, as shown here:

 

Attach Permissions

Execute the AWS Lambda function with AWS API Gateway

The AWS API Gateway allows the creation of REST and WebSocket APIs that act as a “front door” for applications to access data, business logic, or functionality from your backend services like AWS Lambda.

Sign in to the API Gateway console.

If this is your first time using API Gateway, you will see a page that introduces you to the features of the service. Choose Get Started. When the Create Example API popup appears, choose OK.

If this is not your first time using API Gateway, choose Create API.

Create an empty API as follows and choose Create API:

 

Create API

 

You can now create the slurm resource by choosing the root resource (/) in the Resources tree and selecting Create Resource from the Actions dropdown menu as shown here:

 

Actions dropdown

 

The new resource can be configured as follows:

Configure as proxy resource: unchecked
Resource Name: slurm
Resource Path: /slurm
Enable API Gateway CORS: unchecked

To confirm the configuration, choose Create Resource.

 

New Child Resource

In the Resource list, choose /slurm and then Actions and Create method as shown here:

 

Create Method

Choose ANY from the dropdown menu, and choose the checkmark icon.

In the “/slurm – ANY – Setup” section, use the following values:

Integration type: Lambda Function
Use Lambda Proxy integration: checked
Lambda Region: eu-west-1
Lambda Function: slurmAPI
Use Default Timeout: checked

and then choose Save.

 

slurm - ANY - Setup

Choose OK when prompted with Add Permission to Lambda Function.

You can now deploy the API by choosing Deploy API from the Actions dropdown menu as shown here:

 

Deploy API

 

For Deployment stage choose [new stage], for Stage name enter slurm, and then choose Deploy:

 

Deploy API

Take note of the API’s Invoke URL – it will be required for the API interaction.

Deploy the Cluster

The cluster can now be created using the following command line:

pcluster create -t slurm slurmcluster

-t slurm indicates which section of the cluster template to use.
slurmcluster is the name of the cluster that will be created.

For more details, see the AWS ParallelCluster Documentation. A detailed explanation of the pcluster command line parameters can be found in AWS ParallelCluster CLI Commands.

How to interact with the slurm API

The slurm API created in the previous steps requires some parameters:

  • instanceid – the instance ID of the Master node.
  • function – the API function to execute. Accepted values are list_jobs, list_nodes, list_partitions, job_details, and submit_job.
  • jobscript_location – the S3 location of the job script (required only when function=submit_job).
  • submitopts – the submission parameters passed to the scheduler (optional, can be used when function=submit_job).

Here is an example of the interaction with the API:

#Submit a job
$ curl -s POST "https://966p4hvg04.execute-api.eu-west-1.amazonaws.com/slurm/slurm?instanceid=i-062155b00c02a6c8e&function=submit_job&jobscript_location=pcluster-data/job_script.sh" -H 'submitopts: --job-name=TestJob --partition=compute'
Submitted batch job 11

#List of the jobs
$ curl -s POST "https://966p4hvg04.execute-api.eu-west-1.amazonaws.com/slurm/slurm?instanceid=i-062155b00c02a6c8e&function=list_jobs"
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
                11   compute  TestJob ec2-user  R       0:14      1 ip-10-0-3-209
 
 #Job details
 $ curl -s POST "https://966p4hvg04.execute-api.eu-west-1.amazonaws.com/slurm/slurm?instanceid=i-062155b00c02a6c8e&function=job_details&jobid=11" 
JobId=11 JobName=TestJob
   UserId=ec2-user(500) GroupId=ec2-user(500) MCS_label=N/A
   Priority=4294901759 Nice=0 Account=(null) QOS=(null)
   JobState=RUNNING Reason=None Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
   RunTime=00:00:06 TimeLimit=UNLIMITED TimeMin=N/A
   SubmitTime=2019-06-26T14:42:09 EligibleTime=2019-06-26T14:42:09
   AccrueTime=Unknown
   StartTime=2019-06-26T14:49:18 EndTime=Unknown Deadline=N/A
   PreemptTime=None SuspendTime=None SecsPreSuspend=0
   LastSchedEval=2019-06-26T14:49:18
   Partition=compute AllocNode:Sid=ip-10-0-1-181:28284
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=ip-10-0-3-209
   BatchHost=ip-10-0-3-209
   NumNodes=1 NumCPUs=1 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   TRES=cpu=1,node=1,billing=1
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
   MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0
   Features=(null) DelayBoot=00:00:00
   OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
   Command=/home/ec2-user/C7XMOG2hPo.sh
   WorkDir=/home/ec2-user
   StdErr=/home/ec2-user/slurm-11.out
   StdIn=/dev/null
   StdOut=/home/ec2-user/slurm-11.out
   Power=

The authentication to the API can be managed following the Controlling and Managing Access to a REST API in API Gateway Documentation.

Teardown

When you have finished your computation, the cluster can be destroyed using the following command:

pcluster delete slurmcluster

The additional resources created in this post (VPC, IAM role and policies, Lambda function, and API Gateway API) can be destroyed by following the official AWS documentation for each service.

Conclusion

This post has shown you how to deploy a Slurm cluster using AWS ParallelCluster, and integrate it with the AWS API Gateway.

This solution uses the AWS API Gateway, AWS Lambda, and AWS Systems Manager to simplify interaction with the cluster without granting access to the command line of the Master node, improving the overall security. You can extend the API by adding additional schedulers or interaction workflows, and it can be integrated with external applications.

from AWS Open Source Blog

Open Distro for Elasticsearch 1.1.0 Released

We are happy to announce that Open Distro for Elasticsearch 1.1.0 is now available for download!

Version 1.1.0 includes the upstream open source versions of Elasticsearch 7.1.1, Kibana 7.1.1, and the latest updates for alerting, SQL, security, performance analyzer, and Kibana plugins, as well as the SQL JDBC driver. You can find details on enhancements, bug fixes, and more in the release notes for each plugin in their respective GitHub repositories. See Open Distro’s version history table for previous releases.

Download the latest packages

You can find images for Open Distro for Elasticsearch 1.1.0 and Open Distro for Elasticsearch Kibana 1.1.0 on Docker Hub. Make sure your compose file specifies 1.1.0 or uses the latest tag. See our documentation on how to install Open Distro for Elasticsearch with RPMs and install Open Distro for Elasticsearch with Debian packages. You can find Open Distro for Elasticsearch’s Security plugin artifacts on Maven Central.
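
As an illustration, the relevant fragment of a docker-compose.yml would pin the images to the new version along these lines (service names are placeholders; see the install documentation for a complete compose file):

services:
  odfe-node1:
    image: amazon/opendistro-for-elasticsearch:1.1.0
  odfe-kibana:
    image: amazon/opendistro-for-elasticsearch-kibana:1.1.0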

We have updated our tools as well! You can download Open Distro for Elasticsearch’s PerfTop client, and Open Distro for Elasticsearch’s SQL JDBC driver.

For more detail, see our release notes for Open Distro for Elasticsearch 1.1.0.

New features in development

We’re also excited to pre-announce new plugins in development. We’ve made pre-release alpha versions of these plugin artifacts available for developers to integrate into their applications (see below for links). We invite you to join in by submitting issues and PRs for the features, bugs, and tests you need or want to build.

k-NN Search

Open Distro for Elasticsearch’s k-nearest neighbor (k-NN) search plugin will enable high-scale, low-latency nearest neighbor search on billions of documents across thousands of dimensions with the same ease as running any regular Elasticsearch query. The k-NN plugin relies on the Non-Metric Space Library (NMSLIB). It will power use cases such as recommendations, fraud detection, and related document search. We are extending the Apache Lucene codec to introduce a new file format to store vector data. k-NN search uses the standard Elasticsearch mapping and query syntax: to designate a field as a k-NN vector you simply map it to the new k-NN field type provided by the k-NN plugin.
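
Since the plugin is still in alpha, the exact syntax may change, but a mapping along these lines illustrates the idea (index name, field name, and dimension are placeholders):

PUT my-knn-index
{
  "settings": {
    "index.knn": true
  },
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "knn_vector",
        "dimension": 2
      }
    }
  }
}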

Index management

Open Distro for Elasticsearch Index Management will enable you to run periodic operations on your indexes, eliminating the need to build and manage external systems for these tasks. You will define custom policies to optimize and move indexes, applied based on wildcard index patterns. Policies are finite-state automata. Policies define states and transitions (Actions). The first release of Index Management will support force merge, delete, rollover, snapshot, replica_count, close/open, read_only/read_write actions, and more. Index Management will be configurable via REST or the associated Kibana plugin. We’ve made artifacts of the alpha version of Open Distro for Elasticsearch Index Management and Open Distro for Elasticsearch Kibana Index Management available on GitHub.

Job scheduler

Open Distro for Elasticsearch’s Job Scheduler plugin is a library that enables you to build plugins that can run periodic jobs on your cluster. You can use Job Scheduler for a variety of use cases, from taking snapshots once per hour, to deleting indexes more than 90 days old, to providing scheduled reports. Read our announcement page for Open Distro for Elasticsearch Job Scheduler for more details.

SQL Kibana UI

Open Distro for Elasticsearch’s Kibana UI for SQL will make it easier for you to run SQL queries and explore your data. This plugin will support SQL syntax highlighting and output results in the familiar tabular format. The SQL Kibana UI will support nested documents, allowing you to expand columns with these documents and drill down into the nested data. You will also be able to translate your SQL query to Elasticsearch query DSL with a single click and download results of the query as a CSV file.

Questions?

Please feel free to ask questions on the Open Distro for Elasticsearch community discussion forum.

Report a bug or request a feature

You can file a bug, request a feature, or propose new ideas to enhance Open Distro for Elasticsearch. If you find bugs or want to propose a feature for a particular plug-in, you can go to the specific repo and file an issue on the plug-in repo.

Getting Started

If you’re getting started on building your open source contribution karma, you can select an issue tagged as a “Good First Issue” to start contributing to Open Distro for Elasticsearch. Read the Open Distro technical documentation on the project website to help you get started.

Go develop! And contribute to Open Distro 🙂

from AWS Open Source Blog

Use Elasticsearch’s _rollover API For Efficient Storage Use

Many Open Distro for Elasticsearch users manage data life cycle in their clusters by creating an index based on a standard time period, usually one index per day. This pattern has many advantages: ingest tools like Logstash support index rollover out of the box; defining a retention window is straightforward; and deleting old data is as simple as dropping an index.

If your workload has multiple data streams with different data sizes per stream, you can run into problems: Your resource usage, especially your storage per node, can become unbalanced, or “skewed.” When that happens, some nodes will become overloaded or run out of storage before other nodes, and your cluster can fall over.

You can use the _rollover API to manage the size of your indexes. You call _rollover on a regular schedule, with a threshold that defines when Elasticsearch should create a new index and start writing to it. That way, each index is as close to the same size as possible. When Elasticsearch distributes the shards for your index to nodes in your cluster, you use storage from each node as evenly as possible.

What is skew?

Elasticsearch distributes shards to nodes based primarily on the count of shards on each node (it’s more complicated than that, but that’s a good first approximation). When you have a single index, because the shards are all approximately the same size, you can ensure even distribution of data by making your shard count divisible by your node count. For example, if you have five primaries and one replica, or ten total shards, and you deploy two nodes, you will have five shards on each node (Elasticsearch always places a primary and its first replica on different nodes).

 

Two Open Distro for Elasticsearch nodes with balanced shard usage

 

When you have multiple indexes, you get heterogeneity in the storage per node. For example, say your application is generating one GB of log data per day and your VPC Flow Logs are ten GB per day. For both of these data streams, you use one primary shard and one replica, following the best practice of up to 50GB per shard. Further, assume you have six nodes in your cluster. After seven days, each index has 14 total shards (one primary and one replica per day). Your cluster might look like the following – in the best case, you have even distribution of data:

 

An Open Distro for Elasticsearch cluster with balanced resource usage

In the worst case, assume you have five nodes. Then your shard count is indivisible by your node count, so larger shards can land together on one node, as in the image below. The nodes with larger shards use ten times more storage than the nodes with smaller shards.

An Open Distro for Elasticsearch cluster with unbalanced resource usage

While this example is somewhat manufactured, it represents a real problem that Elasticsearch users must solve.

Rollover instead!

The _rollover API creates a new index when you hit a threshold that you define in the call. First, you create an _alias for reading and writing the current index. Then you use cron or other scheduling tool to call the _rollover API on a regular basis, e.g. every minute. When your index exceeds the threshold, Elasticsearch creates a new index behind the alias, and you continue writing to that alias.

To create an alias for your index, call the _aliases API:

POST _aliases
{
  "actions": [
    {
      "add": {
        "index": "weblogs-000001",
        "alias": "weblogs",
        "is_write_index": true
      }
    }
  ]
}

You must set is_write_index to true to tell _rollover which index it needs to update.

When you call the _rollover API:

POST /weblogs/_rollover 
{
  "conditions": {
    "max_size":  "10gb"
  }
}

You will receive a response that details which of the conditions, if any, is true and whether Elasticsearch created a new index as a result of the call. If you name your indexes with a trailing number (e.g. -000001), Elasticsearch increments the number for the next index it creates. In either case, you can continue to write to the alias, uninterrupted.

Elasticsearch 7.x accepts three conditions: max_age, max_docs, and max_size. If you call _rollover with the same max_size across all of your indexes, they will all roll over at approximately the same size. [Note: Size is difficult to nail down in a distributed system. Don’t expect that you will hit exactly the same size. Variation is normal. In fact, earlier versions of Elasticsearch don’t accept max_size as a condition. For those versions, you can use max_docs, normalizing for your document size.]

The one significant tradeoff is in lifecycle management. Returning to our prior example, let’s say you roll over at ten GB of index. The data stream with ten GB daily will roll over every day. The data stream with one GB daily will roll over every ten days. You need to manage these indexes at different times, based on their size. Data in the lower-volume indexes will persist for longer than data in the higher-volume indexes.

Conclusion

When running an Elasticsearch cluster with multiple data streams of different sizes, typically for log analytics, you use the _rollover API to maintain a more nearly even distribution of data in your cluster’s nodes. This prevents skew in your storage usage and results in a more stable cluster.

Have an issue or question? Want to contribute? You can get help and discuss Open Distro for Elasticsearch on our forums. You can file issues here.

from AWS Open Source Blog

Add Single Sign-On to Open Distro for Elasticsearch Kibana Using SAML and ADFS

Add Single Sign-On to Open Distro for Elasticsearch Kibana Using SAML and ADFS

Open Distro for Elasticsearch Security (Open Distro Security) comes with authentication and access control out of the box. Prior posts have discussed LDAP integration with Open Distro for Elasticsearch and JSON Web Token authentication with Open Distro for Elasticsearch.

Security Assertion Markup Language 2.0 (SAML) is an open standard for exchanging identity and security information with applications and service providers. Open Distro Security implements the web browser Single Sign On (SSO) profile of the SAML 2.0 protocol. This enables you to configure federated access with any SAML 2.0-compliant identity provider (IdP) – e.g. Microsoft Active Directory Federation Services (ADFS), Okta, Auth0, and AWS SSO. This integration is meant for use with web browsers only; it is not a general-purpose method of authenticating users. Its primary use case is to support Kibana single sign-on. In this post, we’ll talk about setting up SAML-based SSO using Microsoft ADFS.

Prerequisites

 
User  Active Directory Group

Open Distro Security role

esuser1 ESAdmins all_access
esuser2 ESUsers readall
esuser3 N/A N/A

Active Directory Federation Services (ADFS) configuration

ADFS federation occurs with the participation of two parties: the identity or claims provider (Active Directory in our case) and the relying party (Kibana in our case). SAML federation works by issuing and transforming claims between claims providers and relying parties. A claim is information about a user from a trusted source: the trusted source is asserting that the information is true, and that the source has authenticated the user in some manner. The claims provider is the source of the claim. The relying party (Kibana) is the destination for the claims.

Relying party

Configure Kibana as a relying party in ADFS:

1. From the ADFS Management Console, right-click ADFS and select Add Relying Party Trust.

Add relying party dialog box

 

2. In the Add Relying Party Trust Wizard, select ‘Claims aware’ and click Start.

Select claims aware

 

3. Next select the option – “Enter data about the relying party manually” and click Next.

 

Select data source

 

4. Add an appropriate display name and click Next.

 

Specify Display Name

5. In the URL configuration screen, tick the box to “Enable support for the SAML 2.0 WebSSO protocol” and enter this Kibana URL as the Service URL:

https://<kibana_base_url>:<kibana_port>/_opendistro/_security/saml/acs

Make sure to replace the kibana_base_url and kibana_port with your actual Kibana configuration as noted in the prerequisites. In my setup, this is https://new-kibana.ad.example.com:5601/

Click Next.

 

Configure URL

 

6. Add a string for the Relying party trust identifier. You can choose any name here; for this demo, use kibana-saml. You will use this name in the Open Distro Security SAML config as the SP-entity-id.

 

Configure Identifiers

 

7. In the access control policy, you can permit everyone to access Kibana, or restrict it to select groups. Note: this only provides access for the users to authenticate into Kibana. We have not yet set up Open Distro Security roles and permissions. The mapping from the user’s AD groups to Elasticsearch backend roles is covered below.

 

Choose Access control policy

 

8. In the next screen, review all the settings and click Finish to Add Kibana as a Relying Party Trust.

 

Click Finish

Claim rules

Subjects (i.e., usernames) are usually stored in the NameId element of a SAML response. We’ll create two claim rules, one for NameId (usernames) and another for Roles (group mappings).

Right Click Kibana from the Relying Party Trusts and select Edit Claim Issuance Policy…

 

Edit Claim Issuance policy

1. NameId

In the Edit Claim Issuance Policy dialog box, click Add Rule…

Select Transform an Incoming Claim as rule type and then click Next.

In the next screen, use the following settings:

  • Claim rule name: NameId
  • Incoming claim type: Windows Account Name
  • Outgoing claim type: Name ID
  • Outgoing name ID format: Unspecified
  • Pass through all claim values: checked

Click Finish.

 

Configure name id rule

 

2. Roles

In the Edit Claims issuance policy dialog box, click Add Rule…

Select Send LDAP Attributes as Claims as rule type and click Next.

In the next screen, use the following settings:

  • Claim rule name: Send-Group-as-Roles
  • Attribute Store: Active Directory
  • LDAP Attribute: Token-Groups – Unqualified Names (to select the group name)
  • Outgoing Claim Type: Roles. (This should match the “roles_key” defined in Open Distro Security’s config.)

Click Finish.

 

Configure Roles claim rule

 

Finally, download the SAML Metadata file from ADFS and copy to the Elasticsearch config directory. The ADFS metadata file can be accessed from https://<ADFS FQDN>/FederationMetadata/2007-06/FederationMetadata.xml

Configuring SAML in Open Distro Security

For a new setup, you can use plugins/opendistro_security/securityconfig/config.yml to update the SAML configuration details. In an established setup, make sure you retrieve the current Open Distro Security configuration from a running cluster and use those files to avoid losing any changes. For additional details on how to use securityadmin.sh, please refer to the Open Distro for Elasticsearch documentation.

To use SAML for authentication, you need to configure an authentication domain in the authc section of config.yml. Since SAML works solely on the HTTP layer, you do not need any authentication_backend and can set it to noop. I recommend adding at least one other authentication domain, such as LDAP or the internal user database, to support API access to Elasticsearch without SAML. For Kibana and the internal Kibana server user, you also need to add another authentication domain that supports basic authentication. This authentication domain should be placed first in the chain, and the challenge flag must be set to false. Configuring multiple authentication mechanisms ensures that a single failure will not lock you out of the system.

My config is as below:

    authc:
      basic_internal_auth_domain:
        http_enabled: true
        transport_enabled: true
        order: 0
        http_authenticator:
          type: "basic"
          challenge: false
        authentication_backend:
          type: "intern"
      saml_auth_domain:
        http_enabled: true
        transport_enabled: false
        order: 1
        http_authenticator:
          type: saml
          challenge: true
          config:
            idp:
              metadata_file: adfs.xml
              entity_id: http://sts.ad.example.com/adfs/services/trust
            sp:
              entity_id: kibana-saml
            kibana_url: https://new-kibana.ad.example.com:5601/
            roles_key: Roles
            exchange_key: 'AbCDefg123......'
        authentication_backend:
          type: noop

idp.metadata_file: The path to the SAML 2.0 metadata file of your IdP. Place the metadata file in the config directory of Open Distro for Elasticsearch. The path has to be specified relative to the config directory (you can also specify metadata_url instead of file).

idp.entity_id: This is the unique ID of your identity provider. You can find this ID in the SAML metadata:

SAML metadata entity id

sp.entity_id: This should match the “Relying Party identifier” in ADFS configuration.

kibana_url: The base URL of your Kibana installation that receives HTTP requests.

roles_key: The attribute in the SAML response where the roles are stored. You called your Claim type “Roles” in ADFS.

exchange_key: SAML, unlike other protocols, is not meant to be used for exchanging user credentials with each request. Open Distro Security trades the SAML response for a lightweight JSON web token that stores the validated user attributes. This token is signed by an exchange key that you can choose freely. The algorithm is HMAC256, so it should have at least 32 characters. Note that when you change this key all tokens signed with it will immediately become invalid.
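Any random string of sufficient length works as the exchange key. One quick way to generate one is with openssl; this is just a sketch, and any generator of 32 or more random characters is fine:

# Generates 64 hexadecimal characters, comfortably above the 32-character minimum.
openssl rand -hex 32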

Update Open Distro Security’s config

Now run securityadmin.sh to update the Open Distro Security’s config as shown below:

$ /usr/share/elasticsearch/plugins/opendistro_security/tools/securityadmin.sh \
    -cacert /etc/elasticsearch/root-ca.pem \
    -cert /etc/elasticsearch/kirk.pem \
    -key /etc/elasticsearch/kirk-key.pem \
    -h <Cluster-IP-or-FQDN> \
    -f <Path-to-config>/config.yml -t config

Here I’m specifying the paths for the Root CA certificate (-cacert), Admin Certificate (-cert), and Admin Private Key (-key) files. The distinguished name (DN) of the Admin certificate needs to be configured in the elasticsearch.yml file under opendistro_security.authcz.admin_dn. I’m restricting this update to the security configuration by passing the config.yml location with -f and the configuration type with -t config.

Role mapping

You can map Open Distro Security roles to usernames, backend roles, and/or hosts. Backend roles are determined as part of the authentication and authorization process; in our case, these are the values of the “Roles” attribute in the SAML response. My prior post on setting up LDAP integration for Open Distro for Elasticsearch details how to configure the security roles and role mappings. I mapped my AD group ESAdmins to the Security role all_access and my AD group ESUsers to readall.

Kibana role mappings
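If you prefer not to edit roles_mapping.yml and re-run securityadmin.sh for this step, the same mappings can also be created through the Open Distro Security REST API. The sketch below assumes the REST API is reachable on localhost, that the admin user is permitted to change role mappings, and that the mappings are not marked reserved in your setup; the AD group names are the ones from my prerequisites table:

# Map the ESAdmins AD group (delivered as a backend role via the SAML "Roles" attribute)
# to the all_access security role.
curl -s -k -u admin:admin -X PUT "https://localhost:9200/_opendistro/_security/api/rolesmapping/all_access" \
     -H 'Content-Type: application/json' \
     -d '{"backend_roles": ["ESAdmins"]}'

# Map the ESUsers AD group to the readall security role.
curl -s -k -u admin:admin -X PUT "https://localhost:9200/_opendistro/_security/api/rolesmapping/readall" \
     -H 'Content-Type: application/json' \
     -d '{"backend_roles": ["ESUsers"]}'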

Kibana configuration

Since most of the SAML-specific configuration is done in Open Distro Security, you can simply activate SAML in your kibana.yml by adding:

opendistro_security.auth.type: "saml"

In addition, you must whitelist the Kibana endpoint for validating the SAML assertions and the logout endpoint:

server.xsrf.whitelist: ["/_opendistro/_security/saml/acs", "/_opendistro/_security/saml/logout"]

In order to test your configuration, you must restart Kibana.

sudo systemctl restart kibana.service

Test logging in as different users

Navigate to https://<kibana_base_url>:<kibana_port>. You will be redirected to the ADFS login page:

ADFS login screen

As a read-write user esuser1

User esuser1 is part of the ESAdmins AD group and is mapped to the security role all_access. This user is allowed to perform read and write operations.

Add a document (succeeds as expected):

 

esuser1 create document result

Run a search query (succeeds as expected):

 

esuser1 search query result

As read-only user esuser2

User esuser2 is part of the ESUsers AD group and is mapped to the security role readall. This user is allowed to perform only read operations.

Create a document (fails as expected; esuser2 is not a member of a group that allows writing):

 

esuser2 create document result

 

Run a search query (succeeds, as expected):

esuser2 search query result

As user esuser3 – not in any groups

User esuser3 is not part of any AD groups and is not mapped to any security roles. This user is not allowed to perform any operations.

Run a search query (fails, as expected; esuser3 is not a member of any group):

 

esuser3 search query result

Conclusion

In this post, I covered SAML authentication for Kibana single sign-on with ADFS. Please refer to the Open Distro for Elasticsearch documentation for additional configuration options for Open Distro Security configuration with SAML.

Have an issue or question? Want to contribute? You can get help and discuss Open Distro for Elasticsearch on our forums. You can file issues here.

from AWS Open Source Blog

Announcing PartiQL: One query language for all your data

Announcing PartiQL: One query language for all your data

Data is being gathered and created at rates unprecedented in history. Much of this data is intended to drive business outcomes but, according to the Harvard Business Review, “…on average, less than half of an organization’s structured data is actively used in making decisions…“

The root of the problem is that data is typically spread across a combination of relational databases, non-relational data stores, and data lakes. Some data may be highly structured and stored in SQL databases or data warehouses. Other data may be stored in NoSQL engines, including key-value stores, graph databases, ledger databases, or time-series databases. Data may also reside in the data lake, stored in formats that may lack schema, or may involve nesting or multiple values (e.g., Parquet, JSON). Every different type and flavor of data store may suit a particular use case, but each also comes with its own query language. The result is tight coupling between the query language and the format in which data is stored. Hence, if you want to change your data to another format, or change the database engine you use to access/process that data (which is not uncommon in a data lake world), or change the location of your data, you may also need to change your application and queries. This is a very large obstacle to the agility and flexibility needed to effectively use data lakes.

Today we are happy to announce PartiQL, a SQL-compatible query language that makes it easy to efficiently query data, regardless of where or in what format it is stored. As long as your query engine supports PartiQL, you can process structured data from relational databases (both transactional and analytical), semi-structured and nested data in open data formats (such as an Amazon S3 data lake), and even schema-less data in NoSQL or document databases that allow different attributes for different rows. We are open sourcing the PartiQL tutorial, specification, and a reference implementation of the language under the Apache 2.0 license, so that everyone can participate, contribute, and use it to drive widespread adoption for this unifying query language.

 

Diagram showing where PartiQL fits with other data sources.

 

The PartiQL open source implementation makes it easy for developers to parse and embed PartiQL in their own applications. It supports parsing PartiQL queries into abstract syntax trees that applications can analyze or process, and it supports interpreting PartiQL queries directly.

PartiQL solves problems we faced within Amazon. It is already being used by Amazon S3 Select, Amazon Glacier Select, Amazon Redshift Spectrum, Amazon Quantum Ledger Database (Amazon QLDB), and Amazon internal systems. Also, Amazon EMR pushes down PartiQL queries to S3 Select. More AWS services will add support in the coming months. Outside of Amazon, Couchbase also looks forward to supporting PartiQL in the Couchbase Server.

We look forward to the creators of data processing engines diving deep into PartiQL, and joining us in solving a problem that affects all users of data, across all industries.

Why we built it

We developed PartiQL in response to Amazon’s own needs to query and transform vast amounts and varieties of data – not just SQL tabular data, but also nested and semi-structured data – found in a variety of formats and storage engines. Amazon’s retail business already had vast sets of semi-structured data, most often in the Ion format, and that business, led by Chris Suver, was in pursuit of an SQL-like query language. Multiple AWS services, such as QLDB, saw the benefits of schema-optional, document-oriented data models, but also wanted to leverage existing SQL knowledge and tools. Finally, AWS relational database services like Redshift, and the many existing clients of SQL, needed to expand into accessing the non-relational data of the data lake, while maintaining strict backwards compatibility with SQL. At the same time, the database research community, with query language work like UCSD’s SQL++, was showing that it is possible to devise clean, well-founded query languages that stay very close to SQL, while having the power needed to process nested and semi-structured data.

Don Chamberlin, creator of the SQL language specification, says: “As JSON and other nested and semi-structured data formats have increased in importance, the need for a query language for these data formats has become clear. The approach of adapting SQL for this purpose has the advantage of building on our industry’s investment in SQL skills, tools, and infrastructure. The SQL++ proposal of Dr. Yannis Papakonstantinou, and languages based on SQL++ such as PartiQL, have shown that the extensions to SQL needed for querying semistructured data are fairly minimal. I hope that these small language extensions will help to facilitate a new generation of applications that process data in JSON and other flexible formats, with and without predefined schemas.”

We therefore set out to create a language that offers strict SQL compatibility, achieves nested and semi-structured processing with minimal extensions, treats nested data as a first-class citizen, allows optional schema, and is independent of physical formats and data stores.

The result was PartiQL, which provides a simple and consistent way to query data across a variety of formats and services. This gives you the freedom to move your data across data sources, without having to change your queries. It is backwards-compatible with SQL, and provides extensions for multi-valued, nested, and schema-less data, which blend seamlessly with the join, filtering, and aggregation capabilities of standard SQL.

PartiQL design tenets

The following design tenets captured our design goals and were fundamental to PartiQL:

  • SQL compatibility: PartiQL facilitates adoption by maintaining compatibility with SQL. Existing SQL queries will continue to work (that is, they will maintain their syntax and semantics) in SQL query processors that are extended to provide PartiQL. This avoids any need to rewrite existing SQL, and makes it easy for developers and business intelligence tools to leverage PartiQL.
  • First-class nested data: The data model treats nested data as a fundamental part of the data abstraction. Consequently, the PartiQL query language provides syntax and semantics that comprehensively and accurately access and query nested data, while naturally composing with the standard features of SQL.
  • Optional schema and query stability: PartiQL does not require a predefined schema over a dataset. It is designed to be usable by database engines that assume the presence of a schema (be it schema-on-write or schema-on-read) or schemaless engines. Technically, the result of a working query does not change as a schema is imposed on existing data, so long as the data itself remains the same. It is thus easier to provide consistent access to multiple stores, despite the different schema assumptions of the participating engines.
  • Minimal extensions: PartiQL has a minimum number of extensions over SQL. The extensions are easy to understand, lend themselves to efficient implementation, and compose well with each other and with SQL itself. This enables intuitive filtering, joining, aggregation, and windowing on the combination of structured, semi-structured, and nested datasets.
  • Format independence: PartiQL syntax and semantics are not tied to any particular data format. A query is written identically across underlying data in JSON, Parquet, ORC, CSV, Ion, or other formats. Queries operate on a comprehensive logical type system that maps to diverse underlying formats.
  • Data store independence: PartiQL syntax and semantics are not tied to a particular underlying data store. Thanks to its expressiveness, the language is applicable to diverse underlying data stores.

Past languages have addressed subsets of these tenets. For example, Postgres JSON is SQL-compatible, but does not treat the JSON nested data as a first-class citizen. Semi-structured query languages treat nested data as first-class citizens, but either allow occasional incompatibilities with SQL, or do not even look like SQL. PartiQL is the first language to address this full set of tenets.

As you would expect from its design tenets, PartiQL will be both easy and familiar for SQL users. It has been in use by several customers of Amazon Redshift Spectrum since 2018:

Annalect is Omnicom’s global data and analytics arm, providing purpose-built, scalable solutions that make data actionable, and is the driving force behind Omnicom’s revolutionary precision marketing and insights platform, Omni. “PartiQL enables us to query nested data with Amazon Redshift Spectrum directly in Amazon S3 without un-nesting, and will also enable us to easily bring nested data from Amazon S3 into local tables in Amazon Redshift using standardized language,” said Eric Kamm, senior engineer and architect at Annalect. John Briscoe, Director of Data and Operations at Annalect added: “We’re also excited that it will give us consistent query syntax from one data platform to another, allowing for easier development of multi-data platform applications and easier developer onboarding.”

Steven Moy, Software Engineer at Yelp: “PartiQL addresses the critical missing piece in a poly-store environment — a high-level declarative language that works across multiple domain-specific data stores. At Yelp, we leverage multiple AWS data stores (Redshift, S3, DynamoDB) technology to deliver the best local businesses to users and best way to reach local audiences for local business owners. With Amazon Redshift Spectrum, Yelp enables eight times the amount of data to help our developer communities make data informed decisions, and we look forward to taking that partnership a step further with PartiQL which will allow Yelp’s developers to focus their time on creating delightful user experiences instead of mastering a new query language or solving classic consistency problems.”

Unlike traditional SQL, the PartiQL query language also meets the needs of NoSQL and non-relational databases. PartiQL has already been adopted by the Amazon Quantum Ledger Database (QLDB) as their query language.

Andrew Certain, AWS Senior Principal Engineer and Amazon Quantum Ledger Database (QLDB) architect, says about the choice of PartiQL: “QLDB needed a flexible, document-oriented data model so that users can easily store and process both structured and semi-structured data, without the burden of defining and evolving a schema. At the same time, QLDB wanted to benefit from the wide knowledge of SQL. PartiQL has greatly served both purposes. Its extensions for accessing nested and semistructured data are few, powerful and quite intuitive.” QLDB, currently in preview mode, is one of the AWS services that adopted PartiQL.

The Couchbase Server, which utilizes a JSON-based document-oriented data model, is also looking forward to adopting PartiQL:

Ravi Mayuram, Senior Vice President of Engineering and CTO of Couchbase, says: “As a pioneer in bringing SQL to JSON when we introduced N1QL more than three years ago, Couchbase believes that the foundations on which SQL was built for relational databases are just as sound for a JSON data model and database. PartiQL is a welcome next step in this convergence, and we look forward to supporting it.”

PartiQL reference engine

PartiQL reference implementation architecture.

This diagram shows, at a very high level, the PartiQL reference implementation. We are open sourcing the lexer, parser, and compiler for PartiQL query expressions. We provide a library that can be embedded or used as a standalone tool for running queries. A user could use this library to simply validate PartiQL queries, or to embed a PartiQL evaluator to process data within their system. The library provides a data interface to bind to whatever data back end an application may have, and provides out-of-the-box support for Ion and JSON.

Getting started

The PartiQL open source implementation provides an interactive shell (or Read Evaluate Print Loop (REPL)) which allows users to write and evaluate PartiQL queries.

Prerequisites

PartiQL requires the Java Runtime (JVM) to be installed on your machine. You can obtain the latest version of the Java Runtime from OpenJDK, OpenJDK for Windows, or Oracle.

Follow the instructions for Installing the JDK Software and Setting JAVA_HOME to the path where your Java Runtime is installed.

Download the PartiQL REPL

Each release of PartiQL comes with an archive that contains the PartiQL REPL as a zip file.

You may have to click on Assets to see the zip and tgz archives. Download the latest partiql-cli zip archive to your machine. The file name includes PartiQL’s release version, e.g., partiql-cli-0.1.0.zip.

Expand (unzip) the archive on your machine, which yields the following folder structure (where ... represents elided files/directories):

├── partiql-cli
    ├── bin
    │   ├── partiql
    │   └── partiql.bat
    ├── lib
    │   └── ... 
    ├── README.md
    └── Tutorial
        ├── code
        │   └── ... 
        ├── tutorial.html
        └── tutorial.pdf

The root folder partiql-cli contains a README.md file and three subfolders:

  1. bin contains startup scripts: partiql for macOS and Unix systems and partiql.bat for Windows systems. Execute these files to start the REPL.
  2. lib contains all the necessary Java libraries needed to run PartiQL.
  3. Tutorial contains the tutorial in pdf and html form. The subfolder code contains three types of files:
    1. Data files with the extension .env. These files contain PartiQL data that we can query.
    2. PartiQL query files with the extension .sql. These files contain the PartiQL queries used in the tutorial.
    3. Sample query output files with the extension .output. These files contain sample output from running the tutorial queries on the appropriate data.

Running the PartiQL REPL

Windows

Run (double-click on) partiql.bat. This should open a command-line prompt and start the PartiQL REPL, which displays:

Welcome to the PartiQL REPL!
PartiQL>

macOS (Mac) and Unix

  1. Open a terminal and navigate to the partiql-cli folder. The folder name will have the PartiQL version as a suffix, i.e., partiql-cli-0.1.0.
  2. Start the REPL by typing ./bin/partiql and pressing ENTER, which displays:
Welcome to the PartiQL REPL!
PartiQL>

Testing the PartiQL REPL

Write a simple query to verify that your PartiQL REPL is working. At the PartiQL> prompt type:

PartiQL> SELECT * FROM [1,2,3]

and press ENTER twice. The output should look similar to:

PartiQL> SELECT * FROM [1,2,3]
   | 
===' 
<<
  {
    '_1': 1
  },
  {
    '_1': 2
  },
  {
    '_1': 3
  }
>>
--- 
OK! (86 ms)
PartiQL>

Congratulations! You have successfully installed and run the PartiQL REPL. The PartiQL REPL is now waiting for more input.

To exit the PartiQL REPL, press:

  • Control+D in macOS or Unix
  • Control+C on Windows

or close the terminal/command prompt window.

Loading data from a file

An easy way to load the necessary data into the REPL is to use the -e switch when starting the REPL and provide the name of a file that contains your data:

./bin/partiql  -e Tutorial/code/q1.env

You can then see what is loaded in the REPL’s global environment using the special REPL command !global_env, i.e.,

Welcome to the PartiQL REPL!
PartiQL> !global_env
   | 
===' 
{
  'hr': {
    'employees': <<
      {
        'id': 3,
        'name': 'Bob Smith',
        'title': NULL
      },
      {
        'id': 4,
        'name': 'Susan Smith',
        'title': 'Dev Mgr'
      },
      {
        'id': 6,
        'name': 'Jane Smith',
        'title': 'Software Eng 2'
      }
    >>
  }
}
--- 
OK! (6 ms)
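With that environment loaded, you can query the nested data directly. The query below is a hypothetical example of my own (it is not one of the tutorial queries); typed at the prompt and followed by ENTER twice, it should return the two employees whose title is not NULL:

PartiQL> SELECT e.name, e.title FROM hr.employees AS e WHERE e.title IS NOT NULL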

How to participate in PartiQL

PartiQL is fully open sourced under the Apache 2.0 license. We welcome your contributions in further expanding the specification, building the technology, and increasing its adoption and mindshare in the user community. Learn more about PartiQL.

You can contribute to the project by sending a pull request for a good first issue. File an issue if there are bugs or missing features. Read through the tutorial to understand PartiQL syntax, how it extends SQL, and take a step-by-step walkthrough. Want to learn about every little detail of PartiQL? Read through the specification.

from AWS Open Source Blog

AWS ParallelCluster with AWS Directory Services Authentication

AWS ParallelCluster with AWS Directory Services Authentication

AWS ParallelCluster simplifies the creation and deployment of HPC clusters. In this post we combine ParallelCluster with AWS Directory Services to create a multi-user, POSIX-compliant system with centralized authentication and automated home directory creation.

To grant only the minimum permissions to the nodes in the cluster, no AD configuration parameters or permissions are stored directly on the cluster nodes. Instead, when the ParallelCluster nodes boot, they automatically trigger an AWS Lambda function, which in turn uses AWS Systems Manager Parameter Store and AWS KMS to securely join the node to the domain. Users will log in to ParallelCluster nodes using their AD credentials.

VPC configuration for ParallelCluster

The VPC used for this configuration can be created using the “VPC Wizard” tool. You can also use an existing VPC that meets the AWS ParallelCluster network requirements.

 

 

In Select a VPC Configuration, choose VPC with Public and Private Subnets and then click Select.

 

Prior to starting the VPC Wizard, allocate an Elastic IP Address. This will be used to configure a NAT gateway for the private subnet. A NAT gateway is required to enable compute nodes in the AWS ParallelCluster private subnet to download the required packages and to access the AWS services public endpoints. See AWS ParallelCluster network requirements.
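If you prefer the AWS CLI to the console for this step, allocating the Elastic IP looks roughly like the following (the region is the one used throughout this example):

# Allocate an Elastic IP in the VPC scope; note the AllocationId for the NAT gateway configuration.
aws ec2 allocate-address --domain vpc --region eu-west-1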

Please be sure to select two different availability zones for the Public and Private subnets. While this is not strictly required for ParallelCluster itself, we will later use these subnets again for SimpleAD, which requires subnets to be in two distinct availability zones.

 

You can find more details about VPC creation and configuration options in VPC with Public and Private Subnets (NAT).

AWS Directory Services configuration

For simplicity in this example, we will configure Simple AD as the directory service, but this solution will work with any Active Directory system.

Simple AD configuration is performed from the AWS Directory Service console. The required configuration steps are described in Getting Started with Simple AD.

For this example, set the Simple AD configuration as follows:

Directory DNS name: test.domain
Directory NetBIOS name: TEST
Administrator password: <Your DOMAIN password>

 

 

In the networking section, select the VPC and the two subnets created in the previous steps.

The following screenshot contains the Directory details:

 

Make note of the DNS addresses listed in the directory details as these will be needed later (in this example, 10.0.0.92 and 10.0.1.215).

DHCP options set for AD

In order for nodes to join the AD Domain, a DHCP option set must be configured for the VPC, consistent with the domain name and DNS of the Simple AD service previously configured.

From the AWS VPC dashboard, set the following:

Name: custom DHCP options set
Domain name: test.domain eu-west-1.compute.internal
Domain name servers: 10.0.0.92, 10.0.1.215

The “Domain name” field must contain the Simple AD domain and the AWS regional domain (where the cluster and SimpleAD are being configured), separated by a space.

 

 

You can now assign the new DHCP options set to the VPC:

 

How to manage users and groups in Simple Active Directory

See Manage Users and Groups in Simple AD. If you prefer to use a Linux OS for account management, see How to Manage Identities in Simple AD Directories for details.

Using AWS Key Management Service to secure AD Domain joining credentials

AWS Key Management Service is a secure and resilient service that uses FIPS 140-2 validated hardware security modules to protect your keys. This service will be used to generate a key and encrypt the domain joining password, as explained in the next section.

In the AWS Console, navigate to the AWS Key Management Service (KMS) and click on Create key.

In Display name for the key, write “SimpleADJoinPassword” and click Next, leaving the default settings for all other sections.

In Customer managed keys, take note of the created Key ID.
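If you script your infrastructure, the equivalent CLI calls are roughly as follows (the display name chosen in the console corresponds to a key alias here; treat this as a sketch):

# Create the key and capture its ID, then give it the alias used later by Parameter Store.
KEY_ID=$(aws kms create-key --description "Simple AD join password key" \
         --query KeyMetadata.KeyId --output text)
aws kms create-alias --alias-name alias/SimpleADJoinPassword --target-key-id "$KEY_ID"
echo "$KEY_ID"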

 

AWS Systems Manager Parameter Store

AWS Systems Manager Parameter Store provides secure, hierarchical storage for configuration data management and secrets management. We will use it to securely store the domain joining information, i.e. the domain name and the joining password.

From the AWS console, access AWS Systems Manager and select Parameter Store. You need to create two parameters: DomainName, which contains the name of the domain, and DomainPassword, which contains the domain administrator password.

To create the first parameter, click on Create parameter and add the following information in the Parameter details section:

Name: DomainName
Type: String
Value: test.domain

Click on Create parameter to create the parameter.

You can now create the DomainPassword parameter with the following details:

Name: DomainPassword
Type: SecureString
KMS KEY ID: alias/SimpleADJoinPassword
Value: <your_ad_password>

Click on Create parameter to create it.
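The same two parameters can also be created from the AWS CLI; a minimal sketch, assuming the KMS alias created earlier:

aws ssm put-parameter --name DomainName --type String --value test.domain
aws ssm put-parameter --name DomainPassword --type SecureString \
    --key-id alias/SimpleADJoinPassword --value '<your_ad_password>'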

The result should be similar to the screenshot below:

 

AWS ParallelCluster configuration

AWS ParallelCluster is an open source cluster management tool to deploy and manage HPC clusters in the AWS cloud; to get started, see Installing AWS ParallelCluster.
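If you have not installed the CLI yet, a minimal setup looks like this (you may prefer to do it inside a virtual environment):

# Install the ParallelCluster CLI and run the interactive configuration wizard once.
pip install aws-parallelcluster
pcluster configure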

After the AWS ParallelCluster command line has been configured, create the cluster template file provided below in .parallelcluster/config. The master_subnet_id contains the ID of the created public subnet; the compute_subnet_id contains the private one.

The ec2_iam_role is the role that will be used for all the instances of the cluster. The steps for creating this role will be explained in the next section.

[aws]
aws_region_name = eu-west-1

[cluster slurm]
scheduler = slurm
compute_instance_type = c5.large
initial_queue_size = 2
max_queue_size = 10
maintain_initial_size = false
base_os = alinux
key_name = AWS_Ireland
vpc_settings = public
ec2_iam_role = parallelcluster-custom-role
pre_install = s3://pcluster-scripts/pre_install.sh
post_install = s3://pcluster-scripts/post_install.sh

[vpc public]
master_subnet_id = subnet-01fc20e143543f8af
compute_subnet_id = subnet-0b1ae2790497d83ec
vpc_id = vpc-0cdee679c5a6163bd

[global]
update_check = true
sanity_check = true
cluster_template = slurm

[aliases]
ssh = ssh {CFN_USER}@{MASTER_IP} {ARGS}

The s3://pcluster-scripts bucket contains the pre- and post-installation scripts required for the configuration of the master and compute nodes inside the domain. A unique bucket name will be required – create an S3 bucket and replace s3://pcluster-scripts with your chosen name.

The pre_install script installs the required packages and joins the node to the domain:

#!/bin/bash

# Install the required packages
yum -y install sssd realmd krb5-workstation samba-common-tools
instance_id=$(curl http://169.254.169.254/latest/meta-data/instance-id)
region=$(curl  -s http://169.254.169.254/latest/meta-data/placement/availability-zone | sed 's/[a-z]$//')
# Lambda function to join the linux system in the domain
aws --region ${region} lambda invoke --function-name join-domain-function /tmp/out --payload '{"instance": "'${instance_id}'"}' --log-type None
output=""
while [ -z "$output" ]
do
  sleep 5
  output=$(realm list)
done
#This line allows users to log in without specifying the domain name
sed -i 's/use_fully_qualified_names = True/use_fully_qualified_names = False/g' /etc/sssd/sssd.conf
#This line configures sssd to create the home directories in the shared folder (/shared/home)
mkdir /shared/home/
sed -i '/fallback_homedir/c\fallback_homedir = /home/%u' /etc/sssd/sssd.conf
sleep 1
service sssd restart
# This line is required for AWS ParallelCluster to correctly handle the custom domain name
sed -i "s/--fail \${local_hostname_url}/--fail \${local_hostname_url} | awk '{print \$1}'/g" /opt/parallelcluster/scripts/compute_ready

The post_install script configures the ssh service to accept connections with a password:

#!/bin/bash

sed -i 's/PasswordAuthentication no//g' /etc/ssh/sshd_config
echo "PasswordAuthentication yes" >> /etc/ssh/sshd_config
sleep 1
service sshd restart

Copy the pre_install and post_install scripts into the S3 bucket created previously.
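Assuming you saved the two scripts as pre_install.sh and post_install.sh in your working directory, the copy is a one-liner each (replace the bucket name with your own):

aws s3 cp pre_install.sh s3://pcluster-scripts/pre_install.sh
aws s3 cp post_install.sh s3://pcluster-scripts/post_install.sh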

AD Domain join with AWS Lambda

AWS Lambda allows you to run code without provisioning or managing servers. Lambda is used in this solution to securely join the Linux node to the Simple AD domain.

You can Create a Lambda Function with the Console.

For Function name, enter join-domain-function.

For Runtime, select Python 2.7.

Choose “Create function” to create it.

 

The following code should be entered within the Function code section, which you can find by scrolling down in the page. Please modify <REGION> with the correct value.

import json
import boto3
import time

def lambda_handler(event, context):
    # The instance ID of the node to join is passed in the invocation payload
    json_message = json.dumps(event)
    message = json.loads(json_message)
    instance_id = message['instance']
    ssm_client = boto3.client('ssm', region_name="<REGION>") # use region code in which you are working
    # Read the domain name and the KMS-encrypted join password from Parameter Store
    DomainName = ssm_client.get_parameter(Name='DomainName')
    DomainName_value = DomainName['Parameter']['Value']
    DomainPassword = ssm_client.get_parameter(Name='DomainPassword', WithDecryption=True)
    DomainPassword_value = DomainPassword['Parameter']['Value']
    # Use SSM Run Command to join the instance to the domain.
    # This assumes the Simple AD "Administrator" account performs the join.
    response = ssm_client.send_command(
        InstanceIds=[instance_id],
        DocumentName="AWS-RunShellScript",
        Parameters={
            'commands': [
                'echo "%s" | realm join -U Administrator@%s %s --verbose;rm -rf /var/lib/amazon/ssm/i-*/document/orchestration/*' % (DomainPassword_value, DomainName_value, DomainName_value)
            ]
        },
    )
    return {
        'statusCode': 200,
        'body': json.dumps('Command Executed!')
    }

In the Basic settings section, set 10 sec as Timeout.

Click on Save in the top right to save the function.

In the Execution role section, click on the highlighted section to edit the role.

 

 

In the newly-opened tab, Click on Attach Policies and then Create Policy.

 

The last action opens another new tab in your browser.

Click on Create policy and then JSON.

 

 

The following policy can be entered inside the JSON editor. Please modify the <REGION>, <AWS ACCOUNT ID> and <KEY ID> with the correct values.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ssm:GetParameter"
            ],
            "Resource": [
                "arn:aws:ssm:<REGION>:<AWS ACCOUNT ID>:parameter/DomainName"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "ssm:GetParameter"
            ],
            "Resource": [
                "arn:aws:ssm:<REGION>:<AWS ACCOUNT ID>:parameter/DomainPassword"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "ssm:SendCommand"
            ],
            "Resource": [
                "arn:aws:ec2:<REGION>:<AWS ACCOUNT ID>:instance/*",
                "arn:aws:ssm:<REGION>::document/AWS-RunShellScript"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "kms:Decrypt"
            ],
            "Resource": [
                "arn:aws:kms:<REGION>:<AWS ACCOUNT ID>:key/<KEY ID>"
            ]
        }
    ]
}

In the next section, enter “GetJoinCredentials” as the Name and click Create policy.

Close the current tab and move to the previous one to select the policy for the Lambda role.

Refresh the list, select the GetJoinCredentials policy, and click Attach policy.

 

IAM custom Roles for Lambda and SSM endpoints

To allow ParallelCluster nodes to call Lambda and SSM endpoints, you need to configure a custom IAM Role.

See AWS Identity and Access Management Roles in AWS ParallelCluster for details on the default AWS ParallelCluster policy.

From the AWS console:

  • access the AWS Identity and Access Management (IAM) service and click on Policies.
  • choose Create policy and, in the JSON section, paste the following policy. Be sure to modify <REGION> , <AWS ACCOUNT ID> to match the values for your account, and also update the S3 bucket name from pcluster-scripts to the name you chose earlier.
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Resource": [
                "*"
            ],
            "Action": [
                "ec2:DescribeVolumes",
                "ec2:AttachVolume",
                "ec2:DescribeInstanceAttribute",
                "ec2:DescribeInstanceStatus",
                "ec2:DescribeInstances",
                "ec2:DescribeRegions"
            ],
            "Sid": "EC2",
            "Effect": "Allow"
        },
        {
            "Resource": [
                "*"
            ],
            "Action": [
                "dynamodb:ListTables"
            ],
            "Sid": "DynamoDBList",
            "Effect": "Allow"
        },
        {
            "Resource": [
                "arn:aws:sqs:<REGION>:<AWS ACCOUNT ID>:parallelcluster-*"
            ],
            "Action": [
                "sqs:SendMessage",
                "sqs:ReceiveMessage",
                "sqs:ChangeMessageVisibility",
                "sqs:DeleteMessage",
                "sqs:GetQueueUrl"
            ],
            "Sid": "SQSQueue",
            "Effect": "Allow"
        },
        {
            "Resource": [
                "*"
            ],
            "Action": [
                "autoscaling:DescribeAutoScalingGroups",
                "autoscaling:TerminateInstanceInAutoScalingGroup",
                "autoscaling:SetDesiredCapacity",
                "autoscaling:DescribeTags",
                "autoScaling:UpdateAutoScalingGroup",
                "autoscaling:SetInstanceHealth"
            ],
            "Sid": "Autoscaling",
            "Effect": "Allow"
        },
        {
            "Resource": [
                "arn:aws:dynamodb:<REGION>:<AWS ACCOUNT ID>:table/parallelcluster-*"
            ],
            "Action": [
                "dynamodb:PutItem",
                "dynamodb:Query",
                "dynamodb:GetItem",
                "dynamodb:DeleteItem",
                "dynamodb:DescribeTable"
            ],
            "Sid": "DynamoDBTable",
            "Effect": "Allow"
        },
        {
            "Resource": [
                "arn:aws:s3:::<REGION>-aws-parallelcluster/*"
            ],
            "Action": [
                "s3:GetObject"
            ],
            "Sid": "S3GetObj",
            "Effect": "Allow"
        },
        {
            "Resource": [
                "arn:aws:cloudformation:<REGION>:<AWS ACCOUNT ID>:stack/parallelcluster-*"
            ],
            "Action": [
                "cloudformation:DescribeStacks"
            ],
            "Sid": "CloudFormationDescribe",
            "Effect": "Allow"
        },
        {
            "Resource": [
                "*"
            ],
            "Action": [
                "sqs:ListQueues"
            ],
            "Sid": "SQSList",
            "Effect": "Allow"
        },
        {
            "Effect": "Allow",
            "Action": [
                "ssm:DescribeAssociation",
                "ssm:GetDeployablePatchSnapshotForInstance",
                "ssm:GetDocument",
                "ssm:DescribeDocument",
                "ssm:GetManifest",
                "ssm:GetParameter",
                "ssm:GetParameters",
                "ssm:ListAssociations",
                "ssm:ListInstanceAssociations",
                "ssm:PutInventory",
                "ssm:PutComplianceItems",
                "ssm:PutConfigurePackageResult",
                "ssm:UpdateAssociationStatus",
                "ssm:UpdateInstanceAssociationStatus",
                "ssm:UpdateInstanceInformation"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "ssmmessages:CreateControlChannel",
                "ssmmessages:CreateDataChannel",
                "ssmmessages:OpenControlChannel",
                "ssmmessages:OpenDataChannel"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "ec2messages:AcknowledgeMessage",
                "ec2messages:DeleteMessage",
                "ec2messages:FailMessage",
                "ec2messages:GetEndpoint",
                "ec2messages:GetMessages",
                "ec2messages:SendReply"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": "lambda:InvokeFunction",
            "Resource": "arn:aws:lambda:<REGION>:<AWS ACCOUNT ID>:function:join-domain-function"
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject"
            ],
            "Resource": [
                "arn:aws:s3:::pcluster-scripts/*"
            ]
        }
    ]
}

Click Review policy, and in the next section enter “parallelcluster-custom-policy” as the Name string. Click Create policy.

Now you can finally create the Role. Choose Role in the left menu and then Create role.

Select AWS service as the type of trusted entity, and EC2 as the service that will use this role.

Choose Next to proceed in the creation process.

 

 

In the policy selection, select the parallelcluster-custom-policy that was just created.

Click through the Next: Tags and then Next: Review pages.

In the Role name box, enter “parallelcluster-custom-role” and confirm with the Create role button.

Deploy ParallelCluster

The cluster can now be created using the following command line:

pcluster create -t slurm slurmcluster

-t slurm indicates which section of the cluster template to use. slurmcluster is the name of the cluster that will be created. For more details, see the AWS ParallelCluster Documentation. A detailed explanation of the pcluster command line parameters can be found in AWS ParallelCluster CLI Commands.

You can now connect to the Master node of the cluster with any Simple AD user and run the desired workload.
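A quick way to verify the whole chain (AD login, scheduler, and automatic scale-out) is to connect and submit a trivial Slurm job. The key path and commands below are a sketch based on my configuration; adjust them to yours:

# From your workstation: log in to the master node with the cluster key pair.
pcluster ssh slurmcluster -i ~/.ssh/AWS_Ireland.pem

# On the master node, as one of your Simple AD users:
sbatch --nodes=2 --job-name=smoke-test --wrap "srun hostname"
squeue   # the job stays pending until ParallelCluster adds the two compute nodes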

Teardown

When you have finished your computation, the cluster can be destroyed using the following command:

pcluster delete slurmcluster

The additional resources that were created can be deleted by following the instructions in the AWS documentation.

Conclusion

This blog post has shown you how to deploy and integrate Simple AD with AWS ParallelCluster, allowing cluster nodes to be securely and automatically joined to a domain to provide centralized user authentication. This solution encrypts and stores the domain joining credentials using AWS Systems Manager Parameter Store with AWS KMS, and uses AWS Lambda at node boot to join the AD Domain.

from AWS Open Source Blog

Best Practices for Running Ansys Fluent on AWS ParallelCluster

Best Practices for Running Ansys Fluent on AWS ParallelCluster

Using HPC (high performance computing) to solve Computational Fluid Dynamics (CFD) challenges has become common practice. As the growth from HPC workstation to supercomputer has slowed over the last decade or two, compute clusters have increasingly taken the place of single, big SMP (shared memory processing) supercomputers, and have become the ‘new normal’. Another, more recent innovation, the cloud, has also enabled dramatic growth in total throughput.

This post will show you good practices for setting up an HPC cluster on AWS running Ansys Fluent (a commercial computational fluid dynamics software package) in just a few minutes. In addition, you will find some sample scripts to install Ansys Fluent and run your first job. ‘Best guidance’ is a relative term, and in the cloud even more so, as there are many possibilities (aka services) that can be combined in different ways to achieve the same goal. Whether one option is better than another can only be decided in the context of the specific application characteristics or application features to be used. For example, “a high performance parallel file system (Amazon FSx) is better than an NFS share” is true for the vast majority of HPC workloads, but there could be cases (like non-I/O-intensive applications, or small HPC clusters created to run few and/or small jobs) where an NFS share is more than enough, and it’s cheaper and simpler to set up. In this post we will share what we consider good practices, together with some additional options – valid alternatives that you may wish to consider.

The main cluster components we will use are the following AWS services:

  • AWS ParallelCluster, an AWS-supported open source cluster management tool to deploy and manage HPC clusters in the AWS cloud.
  • The new AWS C5n instances that can use up to 100 Gbps of network bandwidth.
  • Amazon FSx for Lustre, a highly parallel file system that supports sub-millisecond access to petabyte-scale file systems, designed to deliver 200 MB/s of aggregate throughput at 10,000 IOPS for every 1 TiB of provisioned capacity.
  • NICE DCV as the remote visualization protocol.

Note: We announced Elastic Fabric Adapter (EFA) at re:Invent 2018, and have recently launched the service in multiple AWS regions. EFA is a network device that you can attach to your Amazon EC2 instances to accelerate HPC applications, providing lower and more consistent latency and higher throughput than the TCP transport traditionally used in cloud-based HPC systems. It enhances the performance of inter-instance communication critical for scaling HPC applications, and is optimized to work on the existing AWS network infrastructure. Ansys Fluent is not yet ready for use with EFA, so the use of this specific network device will not be extensively covered in this post.

Note: ANSYS Fluent is a commercial software package that requires a license. This post assumes that you already have your Ansys Fluent license on (or accessible from) AWS. Also, the installation script you will find below requires Ansys installation packages. You can download the current release from Ansys under “Downloads → Current Release”.

First step: Create a Custom AMI

To speed up cluster creation and, most importantly, to shorten the time needed to start up the compute nodes, it’s good practice to create a custom AMI that has certain packages preinstalled and settings pre-configured.

  1. Start based on an existing AMI and note down the AMI id appropriate to the region where you plan to deploy your cluster; see our list of AMIs by region. For example, we started with CentOS7 in Virginia (us-east-1), and the AMI ID is ami-0a4d7e08ea5178c02.
  2. Open the AWS Console and launch an instance in your preferred region (the same you chose your AMI from), using the ami-id as before.
  3. Make sure that your instance is accessible from the internet and has a public IP address.
  4. Give the instance an IAM role that allows it to download files from S3 (or from a specific S3 bucket).
  5. Optionally, tag the instance. (i.e., Name = Fluent-AMI-v1)
  6. Configure the security group to allow incoming connections on port 22.
  7. If you need additional details on how to create a custom AMI for AWS ParallelCluster, please refer to Building a custom AWS ParallelCluster AMI in the official documentation.
  8. Once the instance is ready, ssh into it and run the following commands as root:
yum -y update

yum install -y dkms zlib-devel libXext-devel libGLU-devel libXt-devel libXrender-devel libXinerama-devel libpng-devel libXrandr-devel libXi-devel libXft-devel libjpeg-turbo-devel libXcursor-devel readline-devel ncurses-devel python python-devel cmake qt-devel qt-assistant mpfr-devel gmp-devel htop wget screen vim xorg-x11-drv-dummy xorg-x11-server-utils libXp.x86_64 xorg-x11-fonts-cyrillic.noarch xterm.x86_64 openmotif.x86_64 compat-libstdc++-33.x86_64 libstdc++.x86_64 libstdc++.i686 gcc-c++.x86_64 compat-libstdc++-33.i686 libstdc++-devel.x86_64 libstdc++-devel.i686 compat-gcc-34.x86_64 gtk2.i686 libXxf86vm.i686 libSM.i686 libXt.i686 xorg-x11-fonts-ISO8859-1-75dpi.no xorg-x11-fonts-iso8859-1-75dpi.no libXext gdm gnome-session gnome-classic-session gnome-session-xsession xorg-x11-server-Xorg xorg-x11-drv-dummy xorg-x11-fonts-Type1 xorg-x11-utils gnome-terminal gnu-free-fonts-common gnu-free-mono-fonts gnu-free-sans-fonts gnu-free-serif-fonts alsa-plugins-pulseaudio alsa-utils

yum -y groupinstall "GNOME Desktop"

yum -y erase initial-setup gnome-initial-setup initial-setup-gui

#a reboot here may be helpful in case the kernel has been updated

#this will disable the ssh host key checking
#usually this may not be needed, but with some specific configuration Fluent may require this setting.
cat <<\EOF >> /etc/ssh/ssh_config
StrictHostKeyChecking no
UserKnownHostsFile /dev/null
EOF


#set higher limits, useful when running Fluent (and HPC applications in general) on multiple nodes via MPI
cat <<\EOF >> /etc/security/limits.conf
* hard memlock unlimited
* soft memlock unlimited
* hard stack 1024000
* soft stack 1024000
* hard nofile 1024000
* soft nofile 1024000
EOF

#stop and disable the firewall
systemctl disable firewalld
systemctl stop firewalld

#install the latest ENA driver, ATM 2.1.1
cd /tmp
wget https://github.com/amzn/amzn-drivers/archive/ena_linux_2.1.1.tar.gz
tar zxvf ena_linux_2.1.1.tar.gz
mv amzn-drivers-ena_linux_2.1.1 /usr/src/ena-2.1.1

cat <<EOF > /usr/src/ena-2.1.1/dkms.conf
PACKAGE_NAME="ena"
PACKAGE_VERSION="2.1.1"
AUTOINSTALL="yes"
REMAKE_INITRD="yes"
BUILT_MODULE_LOCATION[0]="kernel/linux/ena"
BUILT_MODULE_NAME[0]="ena"
DEST_MODULE_LOCATION[0]="/updates"
DEST_MODULE_NAME[0]="ena"
CLEAN="cd kernel/linux/ena; make clean"
MAKE="cd kernel/linux/ena; make BUILD_KERNEL=\${kernelver}"
EOF

dkms add -m ena -v 2.1.1
dkms build -m ena -v 2.1.1
dkms install -m ena -v 2.1.1

dracut -f --add-drivers ena
#reboot again, and make sure that after the reboot the ena driver is up to date (run modinfo ena to check)


#install the latest version of NICE DCV (at the moment it is 2017.4)
cd /tmp
wget https://d1uj6qtbmh3dt5.cloudfront.net/server/nice-dcv-2017.4-6898-el7.tgz
tar xzvf nice-dcv-2017.4-6898-el7.tgz
cd nice-dcv-2017.4-6898-el7
yum install -y nice-dcv-server-2017.4.6898-1.el7.x86_64.rpm nice-dcv-gltest-2017.4.216-1.el7.x86_64.rpm nice-xdcv-2017.4.210-1.el7.x86_64.rpm 

#install this additional package only in case you are running on an instance equipped with GPU
yum install -y nice-dcv-gl-2017.4.490-1.el7.i686.rpm nice-dcv-gl-2017.4.490-1.el7.x86_64.rpm

# Add the line "blacklist = /usr/bin/Xorg" to section [gl] of /etc/dcv/dcv-gl.conf
# to fix an incompatibility introduced with the latest versions of Xorg and Nvidia driver
sed -i 's|\[gl\]|&\nblacklist = /usr/bin/Xorg|' /etc/dcv/dcv-gl.conf

#Clean up the instance before creating the AMI.
/usr/local/sbin/ami_cleanup.sh

#shutdown the instance
shutdown -h now

Now you can create your AMI via the AWS CLI (or the AWS Web Console):

aws ec2 create-image --instance-id i-1234567890abcdef0 --name "Fluent-AMI-v1" --description "This is my first Ansys Fluent AMI"

The output would be something like:

{
    "ImageId": "ami-1a2b3c4d5e6f7g"
}

Note down the AMI id. It will be used later in the AWS ParallelCluster configuration file.
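
AMI creation can take a few minutes. If you are scripting this step, you can, for example, wait for the image to become available and check its state with the AWS CLI (the AMI ID below is the placeholder from the previous output):

aws ec2 wait image-available --image-ids ami-1a2b3c4d5e6f7g
aws ec2 describe-images --image-ids ami-1a2b3c4d5e6f7g --query "Images[0].State"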

Create / reuse the VPC, subnet, and security group

Next, create a new VPC or reuse an existing one. Note the VPC ID and the subnet ID. More information on how to create and configure your VPC for AWS ParallelCluster is available in Network Configurations.

You can either use a single subnet for both master and compute instances, or two subnets: master in one public subnet and compute instances in a private subnet.

The configuration file below shows how to run your cluster in a single subnet, as shown in this architecture diagram:

networking single subnet architecture diagram

Also create an ad-hoc security group with port 8443 open. This will be used to allow incoming connections to the master node using NICE DCV as the remote desktop streaming protocol.
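
If you prefer the AWS CLI, a minimal sketch for creating such a security group could look like the following (the group name is arbitrary and the IDs are placeholders; where possible, restrict the source CIDR to your own IP instead of 0.0.0.0/0):

aws ec2 create-security-group --group-name dcv-sg --description "Allow NICE DCV on port 8443" --vpc-id vpc-<VPC-ID>
aws ec2 authorize-security-group-ingress --group-id sg-<Security-Group-ID> --protocol tcp --port 8443 --cidr <your-ip>/32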

Create the cluster configuration file and the post-install script

Now you can start writing your configuration file. Open a text file on your local PC and paste in the code below. (This is an example; you may want to modify some parameters according to your preferences. You will also need to replace the <XXX> placeholders with values from your own setup.)

[aws]
aws_region_name = <your-preferred-region>

[global]
sanity_check = true
cluster_template = fluent_cluster
update_check = true


[vpc vpc-us-east-1]
vpc_id = vpc-<VPC-ID>
master_subnet_id = subnet-<Subnet-ID>
additional_sg=sg-<Security-Group-ID>


[cluster fluent_cluster]
key_name = <Key-Name>
vpc_settings = vpc-us-east-1
compute_instance_type=c5n.18xlarge
master_instance_type=g3.4xlarge
initial_queue_size = 0
max_queue_size = 10
maintain_initial_size = true
scheduler=sge
cluster_type = ondemand
s3_read_write_resource=arn:aws:s3:::<Your-S3-Bucket>*
post_install = s3://<Your-S3-Bucket>/fluent-post-install.sh
placement_group = DYNAMIC
placement = compute
master_root_volume_size = 64
compute_root_volume_size = 20
base_os = centos7
extra_json = { "cluster" : {"cfn_scheduler_slots" : "cores" } }
tags = {"Name" : "fluent_cluster_test1"}
fsx_settings = parallel-fs
custom_ami = ami-<AMI-ID>


[fsx parallel-fs]
shared_dir = /fsx
storage_capacity = 3600
import_path = s3://<Your-S3-Bucket>
imported_file_chunk_size = 1024
export_path = s3://<Your-S3-Bucket>/export

Let’s dive deep into some of the settings of this configuration:

  1. aws_region_name: choosing the proper AWS Region is very important for the usability of your remote desktop session: the closer you are geographically to the selected region, the lower the network latency and the better the usability and interactivity. If you are unsure which AWS Region is closest to you, use the simple CloudPing service to determine which region gives you the lowest latency.
  2. initial_queue_size = 0. This setting defines the initial size of your cluster. In this case it is 0 (you are free to change it according to your needs). 0 means that when you first submit a job, it will be queued in the pending state, and it moves to the running state as nodes are added to the cluster. By default, AWS ParallelCluster checks the scheduler queue every five minutes and adds (or removes) nodes depending on the number of slots needed to run the pending jobs.
  3. compute_instance_type = c5n.18xlarge. This setting defines the instance type for the compute nodes of your cluster. This configuration file uses c5n.18xlarge, which is – at the time of writing – the most suitable instance for tightly coupled workloads: it has the best price/performance ratio, an optimal memory/core ratio, and – very important – it supports EFA. Other suitable instances are (the newer) c5.24xlarge and c4.8xlarge; both are similarly priced, but do not support EFA. If you want to build your mesh and need a higher memory/core ratio, m5.24xlarge or r5.24xlarge are good candidates, at a different cost. Finally, z1d.12xlarge instances deliver the highest per-core performance thanks to a custom Intel® Xeon® Scalable processor with a sustained all-core frequency of up to 4.0 GHz, the fastest of any cloud instance. Regardless of the instance type, our recommendation is to always choose the biggest size. The scalability of tightly coupled workloads is often constrained by network bandwidth (and latency), so using the biggest size reduces inter-node communication by packing as many cores as possible onto every single instance.
  4. master_instance_type = g3.4xlarge. This setting is used to define the instance type for the master node (or login node) of your cluster. In this example, we chose an instance equipped with a GPU (Nvidia M60) because we also want to post-process our data after the job is completed. Post-processing applications usually require a GPU to render heavy 3D images. If you won’t need to do any post-processing (or your post-process does not require a GPU), you could choose the same instance type as the compute nodes (maybe just a smaller size), or you could choose an instance type suitable for building your mesh (m5.24xlarge or r5.24xlarge).
  5. placement_group = DYNAMIC and placement = compute together tell AWS that we want to use cluster placement groups, and that only the compute nodes need to be in the same placement group; the master does not. It is good practice to also put the master node in the same placement group when an NFS share is enabled, because latency between the compute nodes and the master needs to be low. In our example, though, we are not using an NFS share but FSx.
  6. extra_json = { "cluster" : {"cfn_scheduler_slots" : "cores" } }. This, together with the "for" loop at the beginning of the post-install script below, is used to disable hyper-threading. The vast majority of HPC applications do not benefit from hyper-threading; however, if you disable hyper-threading without this line, SGE will not be able to correctly map slots to cores.
  7. custom_ami = ami-<AMI-ID> This setting will tell AWS ParallelCluster to use the AMI you created previously.
  8. [fsx parallel-fs]. This section contains the settings that define your parallel high-performance file system based on FSx for Lustre.
  9. post_install = s3://<Your-S3-Bucket>/fluent-post-install.sh. This setting defines the location of a script that runs on all instances right after they are created. Below is an example script tuned for this use case; feel free to use it as is or modify it as needed:
#!/bin/bash

#source the AWS ParallelCluster profile
. /etc/parallelcluster/cfnconfig

#disable hyper-threading
for cpunum in $(cat /sys/devices/system/cpu/cpu*/topology/thread_siblings_list | cut -s -d, -f2- | tr ',' '\n' | sort -un); do
echo 0 > /sys/devices/system/cpu/cpu$cpunum/online
done
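
#optionally verify that the sibling threads are now offline, e.g.:
#  lscpu | grep -i 'cpu(s) list'   # compare the on-line and off-line CPU lists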


case "${cfn_node_type}" in
MasterServer)

#save the instance type
instanceType=$(curl -s http://169.254.169.254/latest/meta-data/instance-type)
if [[ $instanceType == *"g3"* ]]; then

  # configure Xorg to use the Nvidia GPU with the right driver
  nvidia-xconfig --preserve-busid --enable-all-gpus

  #Configure the GPU settings to be persistent
  nvidia-persistenced

  #Disable the autoboost feature for all GPUs on the instance
  nvidia-smi --auto-boost-default=0

  #Set all GPU clock speeds to their maximum frequency.
  nvidia-smi -ac 2505,1177
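  #the supported clock pairs can be queried with: nvidia-smi -q -d SUPPORTED_CLOCKS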

else
  cd /etc/X11/
  #download a dummy xorg.conf,
  #This is used by instances without GPU
  wget -q https://xpra.org/xorg.conf 
fi

#set the system to run the graphical mode (init 5)
systemctl set-default graphical.target
#and start GDM
systemctl enable gdm.service
systemctl start gdm.service
sleep 2

#enable and start also DCV
systemctl enable dcvserver.service
systemctl start dcvserver.service
sleep 2

#create a NICE DCV session
dcv create-session --owner centos --user centos test1
echo "centos:<YourPassword>" | chpasswd

;;
ComputeFleet)

#nothing here, for now


;;
esac

Note: Replace the placeholder <YourPassword> with your own password. This password will only be used for connecting via NICE DCV. In order to connect via SSH, you still need to use the private key defined in the configuration file.

Note: Some of the services mentioned so far, FSx for Lustre and C5n in particular, are quite new and their availability may be limited to a subset of regions. Please check the Region Table to see whether your preferred region has all the services needed. If C5n is not available, opt for c4.8xlarge or c5.18xlarge. If FSx is not available, use an NFS share over EBS. Below is a sample snippet that enables an NFS share over an io1 volume. io1 is an I/O-intensive, high-performance SSD volume type designed to deliver a consistent baseline performance of up to 50 IOPS/GB (to a maximum of 64,000 IOPS) and up to 1,000 MB/s of throughput per volume (i.e., with one 1 TB volume you can provision up to 50,000 IOPS). You may also consider gp2 as a lower-cost alternative: it offers single-digit-millisecond latencies, delivers a consistent baseline performance of 3 IOPS/GB (minimum 100 IOPS) to a maximum of 16,000 IOPS, and provides up to 250 MB/s of throughput per volume. Learn more under EBS in the AWS ParallelCluster documentation.

[ebs shared-fs]
volume_size = <size in GB>
shared_dir = /shared
#ebs_snapshot_id =
volume_type = io1
volume_iops = <number of IOPS>

You also need to comment out the fsx_settings parameter:

#fsx_settings = parallel-fs

and replace it with:

ebs_settings = shared-fs

When using NFS, be mindful that it has limited scalability; FSx is particularly useful when thousands of clients need to access the file system simultaneously (not an unusual situation if you plan to run several jobs, each using multiple nodes).

Deploy your first cluster

Now that you have in place all the basic components to create your first AWS ParallelCluster for Ansys Fluent, you just need to upload the post-install script into your S3 bucket:

aws s3 cp fluent-post-install.sh s3://<Your-S3-Bucket>/fluent-post-install.sh

Note: Make sure that you upload the post-install script into the bucket you specified in the configuration file.

Once the post-install script is uploaded, you can create the cluster by running the command below, using the ParallelCluster configuration file defined previously:

pcluster create -c fluent.config fluent-test -t fluent_cluster -r <your-preferred-region>

Note: If this is your first test with AWS ParallelCluster and you need additional instructions on how to get started, you can refer to this blog post on getting started with AWS ParallelCluster, and/or to the AWS ParallelCluster documentation.
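
While the cluster is being created, and once creation has completed, you can check its state with the ParallelCluster CLI; the public IP address of the master node is reported among the outputs. For example:

pcluster status fluent-test
pcluster ssh fluent-test -i /path/of/your/ssh.key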

Install Ansys Fluent

It is now time to connect to the master node of your cluster and install the Ansys suite. You will need to use the public IP address found in the output of the previous command.

You can connect to the master node using SSH and/or DCV.

  • via SSH: ssh -i /path/of/your/ssh.key centos@<public-ip-address>
  • via DCV: open your browser and connect to https://<public-ip-address>:8443. You can use "centos" as the username and the password you defined in the post-install script.

Once you are logged in, become root (sudo su - or sudo -i ) and install the Ansys suite under the /fsx directory. You can install it manually, or you can use the sample script below.

Note: We have defined the import_path = s3://<Your-S3-Bucket> in the FSx section of the configuration file. This tells FSx to preload all the data from <Your-S3-Bucket>. We recommend copying the Ansys installation files, as well as any other file or package you may need, to S3 in advance, so that all these files will be available for you under the /fsx directory of your cluster. The example below uses the Ansys iso installation files. You can use either the tar or the iso file; both can be downloaded from the Ansys website under “Download → Current Release”.
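
For example, assuming you use the same ISO file names referenced in the script below, you could stage them in your bucket before creating the cluster:

aws s3 cp ANSYS2019R2_LINX64_Disk1.iso s3://<Your-S3-Bucket>/
aws s3 cp ANSYS2019R2_LINX64_Disk2.iso s3://<Your-S3-Bucket>/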

#!/bin/bash

#check the installation directory
if [ ! -d "${1}" -o -z "${1}" ]; then
 echo "Error: please check the install dir"
 exit -1
fi

ansysDir="${1}/ansys_inc"
installDir="${1}/"

ansysver="2019 R2"
ansysDisk1="ANSYS2019R2_LINX64_Disk1.iso"
ansysDisk2="ANSYS2019R2_LINX64_Disk2.iso"

# mount the Disks
disk1="${installDir}/AnsysDisk1"
disk2="${installDir}/AnsysDisk2"
mkdir -p "${disk1}"
mkdir -p "${disk2}"

echo "Mounting ${ansysDisk1} ..."
mount -o loop "${installDir}/${ansysDisk1}" "${disk1}"

echo "Mounting ${ansysDisk2} ..."
mount -o loop "${installDir}/${ansysDisk2}" "${disk2}"

# INSTALL Ansys WB
echo "Installing Ansys ${ansysver}"
"${disk1}/INSTALL" -silent -install_dir "${ansysDir}" -media_dir2 "${disk2}"

echo "Ansys installed"

umount -l "${disk1}"
echo "${ansysDisk1} unmounted..."

umount -l "${disk2}"
echo "${ansysDisk2} unmounted..."

echo "Cleaning up temporary install directory"
rm -rf "${disk1}"
rm -rf "${disk2}"

echo "Installation process completed!"

Note: If you decided to use the EBS shared option instead of FSx, once the installation of Ansys is completed, you may want to create a snapshot of your EBS volume so you can re-use it in another cluster. You can create the snapshot via web console:

  1. Open the Amazon EC2 console.
  2. Choose Snapshots in the navigation pane.
  3. Choose Create Snapshot.
  4. On the Create Snapshot page, select the volume to create a snapshot for.
  5. (Optional) Choose Add tags to your snapshot. For each tag, provide a tag key and a tag value.
  6. Choose Create Snapshot.

Or via the CLI:

aws ec2 create-snapshot --volume-id vol-xyz --description "This is the snapshot of my Ansys installation"

If you want to re-use an existing snapshot, add the following parameter in the “ebs” section of your AWS ParallelCluster configuration file:

ebs_snapshot_id = snap-XXXX

For more information, see Amazon EBS Snapshots.

Run your first Ansys Fluent job

Finally, you can run your first job by submitting the following script via qsub. Here's an example of how to run a job on 360 cores (10 c5n.18xlarge instances, with hyper-threading disabled): qsub -pe mpi 360 /fsx/ansys-run.sh

ansys-run.sh can be something like:

#!/bin/bash

#$ -N Fluent
#$ -j Y
#$ -S /bin/bash

export ANSYSLMD_LICENSE_FILE=1055@<your-license-server>
export ANSYSLI_SERVERS=2325@<your-license-server>

basedir="/fsx"
workdir="${basedir}/$(date "+%d-%m-%Y-%H-%M")-${NSLOTS}-$RANDOM"
mkdir "${workdir}"
cd "${workdir}"
cp "${basedir}/f1_racecar_140m.tar.gz" .
tar xzvf f1_racecar_140m.tar.gz
rm -f f1_racecar_140m.tar.gz
cd bench/fluent/v6/f1_racecar_140m/cas_dat

${basedir}/ansys_inc/v194/fluent/bin/fluentbench.pl f1_racecar_140m -t${NSLOTS} -cnf=$PE_HOSTFILE -part=1 -nosyslog -noloadchk -ssh -mpi=intel -cflush

Note: you may want to copy the benchmark file f1_racecar_140m.tar.gz (or any other dataset you want to use) to S3, so that it is preloaded on FSx and ready for you to use.
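
Once the job is submitted, you can follow the cluster scale-out and the job state with the standard SGE commands, for example:

qstat -f
qhost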

Conclusion

Even though this post focused primarily on installing, setting up, and running Ansys Fluent, similar practices can be applied to run other CFD applications, as well as other applications built on the Message Passing Interface (MPI) standard using implementations such as Open MPI or Intel MPI. We are happy to help you run your HPC applications on AWS and share our best practices with you, so feel free to submit your requests either via the AWS Docs GitHub repo or email.

Finally, don’t forget that AWS ParallelCluster is a community-driven project. We encourage everyone to submit pull requests or provide feedback through GitHub issues. Users’ feedback is extremely important for AWS as it drives the development of each and every service and feature!

from AWS Open Source Blog