Stuff The Internet Says On Scalability For March 29th, 2019

Wake up! It’s HighScalability time:

Uber’s microservice Graph. Thousands of microservices. Crazy like a fox? Or just crazy? (@msuriar)

Do you like this sort of Stuff? I’d greatly appreciate your support on Patreon. I wrote Explain the Cloud Like I’m 10 for people who need to understand the cloud. And who doesn’t these days? On Amazon it has 42 mostly 5 star reviews (100 on Goodreads). They’ll learn a lot and love you for the hookup.

  • 1.5 billion: monthly WhatsApp users; 80 billion: Docker downloads in 6 years; 1 billion: players on the App Store; 300,000: games; 13.5 billion: Voyager 1 miles from Earth; 11 years: Teeny-Tiny Bluetooth Transmitter; 500 million: Airbnb guests; 12.5 million bits: information learned by average adult; 7.7%: Amazon’s share of US retail sales; 100 million: Stack Overflow monthly visitors; $156B: Consumer spending in apps across iOS and Google Play by 2023.
  • Quotable Quotes:
    • John C. Lilly: When I say we may be our programs, nothing more, nothing less, I mean the substrate, the basic substratum under all else of our metaprograms is our programs. All we are as humans is what is built-in, what has been acquired, and what we make of both of these. So we are the result of the program substrate—the self-metaprogrammer.
    • @andrewhurstdog: I caused a Gmail outage so big it made the national news, by forgetting to dereference a pointer…Right, this comes from blameless postmortems. Firing a person doesn’t fix the problem, it removes a person that understands the problem.
    • Sean Michael Kerner: At the 2018 Dockercon conference in San Francisco, NASA engineers discussed the Double Asteroid Redirection Test (DART) mission. DART is a spacecraft that will deploy a kinetic-impact technique to deflect an asteroid that could potentially end all life on Earth. At the core of DART is a software stack that is built using Docker.
    • @bwest: Apple’s new credit card should’ve been called Hypercard and anyone that disagrees is wrong
    • Maxim Fedorov: Testing never replaces troubleshooting.
    • William R. Kerr: My analysis of employee-level U.S. Census Bureau data and qualitative interviews show that U.S. tech workers over age 40 have good reasons to be concerned about how globalization affects their career longevity. In addition to competing with greater numbers of skilled foreign workers, older tech workers are now also more likely than younger workers to lose their jobs when technical work moves overseas.
    • @funcOfJoe: 25yrs ago: COM (focus on your biz logic) 20yrs ago: Java (focus on your biz logic) 15yrs ago: .NET (focus on your biz logic) 10yrs ago: Dynamic langs (focus on your biz logic) 5yrs ago: Microservices (focus on your biz logic) 0yrs ago: Serverless (focus on your biz logic)
    • @devfacet: Agree. We built managed database services top of Kubernetes using statefulsets. That being said I wouldn’t recommend to anyone. Operational cost is high, versioning/upgrades are painful, multi-region is hard, backups are tricky. And it’s a distraction against your real focus.
    • realusername: Android is becoming more and more useless every year, we could have a powerful computer in our pocket to do everything we want and instead we have a dumb device with a clunky system which is just good for running chat apps and small games. 
    • Shoshana Zuboff: Surveillance capitalism operates through unprecedented asymmetries in knowledge and the power that accrues to knowledge. Surveillance capitalists know everything about us, whereas their operations are designed to be unknowable to us. They accumulate vast domains of new knowledge from us, but not for us. They predict our futures for the sake of others’ gain, not ours. 
    • @mikeal: I always feel like Kubernetes wants me to be a cloud provider, and I’m like “Hey, aren’t I paying someone else to be the cloud provider?”
    • Lana Chan: What will be interesting to see in the upcoming months is how NVMe-TCP and computational storage will play out.  NVMe-TCP is the latest transport added to NVMe; PCIe, RDMA and FC.  NVMe-TCP promises to allow data centers to use their existing Ethernet infrastructure.   Overhauling existing infrastructure has been cited as potential impediments for other NVMe-oF options.    This may be the key to achieving wide adoption in the enterprise space.  Finally, with computational storage in its infancy, there is a promise to bring computation to the data.  The conversation about real-time big data analytics will change dramatically if this gets off the ground.
    • @caitie: Scaling any kind of cluster membership protocol beyond single digit thousands is currently a hard problem with Cluster Membership protocols we have today. Even in the single digit thousands 1k-5k you are going to have to have a team of folks that meticulously attend to your ETCD/ZK/cluster membership service. Finally, discussions of wanting to go beyond this size rarely talk about failure domains. Do you really want failure domains of 50k nodes? Probably not.
    • @MarcJBrooker: This is a good thread. My experience has been that autonomous (i.e. hosts joining themselves) approaches stop scaling in the low thousands. Above that, having a dedicate stateful host management plane is the only successful approach I’ve seen over the long term. I’ve also found it important to separate discovery from failure detection. Discovery is a relatively slow moving property that changes intentionally (and scales O(dN/dt)). Failure detection can change results really fast, especially in the face of partitions. 2/
    • @ianmiell: My original thesis was that AWS is the new Windows to Kubernetes’ Linux. If that’s the case, the industry better hurry up with its distro management if it’s not going to go the way of OpenStack. Or to put it another way: where is the data centre’s Debian? Ubuntu?
    • Alex Guarnaschelli: This is about choices, not ability. 
    • @asymco: Taiwan’s two largest bike makers report doubling ebike shipments. Giant shipped about 385,000 e-bikes in 2018; close to doubling the number recorded a year earlier. Merida more than doubled its e-bike shipments to 143,000.
    • Doc Searls: —yet things are worse. Yet I remain optimistic. Because Cluetrain was early by (it turns out) at least two decades. And mainstream media are starting to get the clues. I know that because last week I heard from The New York Times, The Wall Street Journal, AP and an HBO show. I normally hear from none of those (or maybe one, a time or two per year).
    • Something is in the water. It’s us, and the water is still the Internet.
    • @hawkinjs: I love graphql but this is the problem – you probably don’t need it. And if you do, you probably already know it and don’t need Twitter hype to tell you. Keep it simple, until you can’t. Then think hard.
    • maxxxxx: that’s how I feel. Since working in a cube farm I am totally shot when I come home. The noise, lack of daylight and visual distractions suck all energy out of me. There is no escape from the stress while I am at work.
    • Marek Majkowski: If you are piping data between TCP sockets, you should definitely take a look at SOCKMAP. While our benchmarks show it’s not ready for prime time yet, with poor performance, high jitter and a couple of bugs, it’s very promising. We are very excited about it. It’s the first technology on Linux that truly allows the user-space process to offload TCP splicing to the kernel. It also has potential for much better performance than other approaches, ticking all the boxes of being async, kernel-only and totally avoiding needless copying of data.
    • Joel Hruska: Samsung’s memory business is under heavy pressure from declines in the DRAM and NAND market. After a burst of data center building last year, a number of semiconductor firms have predicted a relatively weak first half. But Samsung is hit coming and going by these kinds of problems. It counts companies like Apple among its major display customers, which means any slowdown in iPhone demand will hit that business. Its memory business is similarly exposed. DRAM prices are in freefall and NAND prices have dropped significantly.
    • John Hagel: in a rapidly changing world filled with uncertainty, I question whether routine tasks are even feasible, much less helpful. To the extent that routine tasks are necessary, I’ve made the case that they will be quickly taken by robots and AI – they shouldn’t be done by humans. My belief is that all workers should be focused on addressing unseen problems and opportunities to create more value.
    • @ben11kehoe: Met someone who has a rule: every container image gets rebuilt every 24 hours even if source is unchanged, and every running container gets recycled in at most 24 hours. Brilliant to link those together. Because it keeps all your dependencies fresh. Think about all the things that Lambda updates continually under your zip file. You don’t normally get that with containers. infosec is the primary reason here!
    • @Vaelec: I would say I don’t get why anybody would have disagreed with you here but I’ve had the same arguments as well on other projects and was told I simply didn’t understand things. Surprise, when we implemented multi-threading we saw an increase of 20x or more in some cases.
    • @troutman: Waiting outside a data center building in Portland, ME for the last 2.5 hours. Customer’s  servers are down. Access code for door doesn’t work. Customer escalated to DC owner engineers, tried multiple door codes. Dispatching a tech to drive an hour to let us in. Sigh.
    • Pascal Luban: The Games-on-Demand business model is interesting mainly for publishers that already own a large catalog of titles, including older ones that nobody buys anymore. Each title generates little revenue but it is their number that make it worth. And this model features another handicap: Studios cannot complement their revenue with in-app purchases or ads; Apple Arcade forbids them.
    • B-Barrington: I consider PayPal the best of the worst choices.
    • @rossalexwilson: I thought I wanted infrastructure as code, but maybe I don’t actually care about my infrastructure. I just have some code, that serves some business value, and I just want it to run somewhere. Along with being scalable and resilient to underlying failure.
    • @halvarflake: Strange question: Almost everybody we talk to has plans to move to k8s, but I have yet to personally meet someone from a company that runs a 1000+ node cluster for production. Are there any blogs/talks by people that do? The discrepancy is striking.
    • @tmclaughbos: I just had one of those “I get serverless” moments: Don’t code if you don’t have to. I had a simple task, check the state of a resource and send an alert if the value changed. We’ve all done this before. I started building a Lambda function to check the resource state and then alert Pagerduty if the state changed. For the pagerduty part, I immediately went to the API documentation and then looked at the Python module. I was all set to write some code. When I looked at the integration setup, I noticed I could just use AWS cloudwatch. I read some more documentation and realized I didn’t need to write code to alert pagerduty. All I needed to do was create a cloudwatch alarm or cloudwatch event rule, and send events to an SNS topic with an endpoint, given to me by pagerduty, subscribed to it. My code is so simple now, it just records a state change to cloudwatch. And the event generated by that just travels through resources I set up in cloudformation, all the way to pagerduty where it alerts me.
  • Tired: webhooks. Wired: all-in. 
    • @tmclaughbos: I also wonder how many companies should be offering more than just a webhook URL and also an SNS topic subscription endpoint. What I’m realizing today is I’ve still been in the regular mindset of building serverless apps similarly to how I would have built microservices.  I 1:1 map old patterns to new services.
    • What do you give up? Any idea of abstraction, that layer of indirection that allows you to transparently change implementations. You’re all in, as must be all the tools you want to use.
    • ranman/awesome-sns: A curated list of useful SNS topics.
  • Tired: REST. Wired: GraphQL
    • @JoeEmison: if we use GraphQL instead of REST (or SOAP), we can represent our API calls in the exact same way we handle state/data/objects in code. Adding fields, subobjects, etc can be done in one place and everything else just works….except in the RDBM. Once we shift to a data store that similarly stores data in the way it exists in code and also in our GraphQL calls, we truly can define data structure in one place (the GraphQL schema) and everywhere in our application/systems will always reference it the same way. It massively reduces complexity, cost of changes, interdependencies, regressions, etc.
    • What do you give up? Simplicity. GraphQL cuts a complex vertical slice through your system. New APIs, new descriptors, new tooling, new ways of thinking, lots of new things. 
  • Google is once again on the right side of a growth curve. This time it’s bandwidth instead of web pages. Stadia is Google’s new game streaming service. It will consume a hefty 25 Mbps of bandwidth to deliver 1080p at 60 FPS. Impressive, but average global bandwidth is still under 20 Mbps. Remember when Netflix jumped from delivering DVDs to streaming video? People said it wouldn’t work. The network couldn’t handle it. In short order bandwidth increased and Netflix rode the bandwidth curve to success. Sound familiar? 
  • Build platform advantage. Bundle. Unbundle. Bundle. Unbundle. Extend platform advantage. That’s the new cycle of content in the age of platforms. All the New Services Apple Announced.  
  • Tired of the simplistic OOP vs functional way of looking at the world? You’ll love Overwatch Gameplay Architecture and Netcode. It does a deep dive on a sophisticated modeling concept that takes years of experience to develop and truly appreciate. Overwatch is organized around an ECS architecture: Entity, Component, System. A world is a collection of systems and entities. An entity is an ID that corresponds to a collection of components. Components store game state and have no behaviours. Systems have behaviours and store no game state. The breakthrough is realizing identity is primary and is fundamentally relational—separate from both state and behaviour. State and behaviour serve identity, not the other way around. In a game this separation becomes clear whereas in typical software it’s hard to disentangle. The example: imagine a cherry tree in your front yard. A tree means something subjectively different to you as the owner, to a bird, a gardener, a property assessor, or a termite. Each observer sees different behaviour in the state that describes the tree. The tree is a subject that is dealt with differently by various observers. Boom. That’s all of software. You model behaviours through subjective experiences, yet still relate them all together by the concept of identity. 
  • If a picture is worth a thousand words then Jerry Hargrove’s awesome diagram of AWS App Mesh will save a lot of reading. Need more? Read Werner Vogels classic new service style post: Redefining application communications with AWS App Mesh. There’s still not a lot of comments on this service yet. It appeals to organizations far up the microservices adoption path, so it may take a while. shubha-aws: we built App Mesh to enable customers to use microservices in any compute service in AWS – be it ECS, Fargate, EKS or even directly on EC2. You configure capabilities using APIs and App Mesh configures Envoy proxies deployed with your pods. @nathankpeck: ECS Service Discovery + Cloud Map is basically the underlying foundation. It gives you the list of other container to connect to but it stops there, and you have to implement the rest of the client side load balancing, retries, SSL in transit, etc yourself. App Mesh is the layer that adds the extra intelligence on top, such as ability to detect a failed request and retry it, distribute requests according to your desired rules/patterns, and its the layer where we will be implementing other features. In general I’d say use the raw underlying ECS service discovery and cloud map if you want to build your own service mesh logic, but use App Mesh if you just want a service mesh that works out of the box and lets you focus on your own application
  • When a limit of 1.8 million new connections per hour per region is a deal breaker you know “at scale” is no lie. Is AWS ready to provide serverless WebSockets at scale? AppSync: limitations on authentication create a security risk and the requirement for two requests prevents caching. Finally, using the preferred database DynamoDB presents further complexity and the solution is too expensive to use at scale. WebSockets: doesn’t meet requirements for scale or broadcasting to millions of clients. The WebSocket limitation is an interesting one: you often want to use pub/sub as a command and control bus, so you really want to send a message to everyone with one simple call. That should be doable. So they are building their own. 
  • How do you transition from mass marketing to mass personalization? To do that, you’ve really got to unlock the data within that ecosystem in a way that’s useful to a customer.  McDonald’s Acquires Machine-Learning Startup Dynamic Yield for $300 million. Weird fit or just the future arriving right on time?
    • jhayward: It’s mentioned in the article, but much of what drives McDonald’s profitability is their supply chain and logistics, and r&d around food prep/meal production. There are huge dollar volumes there at very low margins and it is an area of great advantage (not only in cost, but in what it’s possible to do product-wise) if done well. It’s a perfect area for data science / ML type applications. 
    • pionar: For companies like these, that operate in 10’s of thousands of franchisee-owned locations, the number of products and the combination of configurations is not merely a function of the things you see on the menu. It’s all of those products, with their different combination of parts (beef patties, lettuce, etc.), and then franchisee and regional variations (In some countries, you can’t tell a franchisee what they can or can’t sell, etc.) Add on top of that customer modifications to the product in their order (extra pickles, no onions, etc.) Why does this matter? It drives lots of stuff – food costs, inventory, inventory & sales forecasting (how many pickles do I need this week?) new item research, profit margins, etc. So you take this, multiply it by 10,000, 15,000, or, in McD’s case, 36,000 stores across 100+ countries (half both those numbers in my company’s case) and you’re talking about vast amounts of information across millions of transactions every day, and that is in fact Big Data.
    • Domino’s generates over 60% of sales via digital channels. berbec: It was a nightmare. We all had a deadline to book these installers for the Cisco VPNs and VM servers. IIRC IBM wrote the code, but Domino’s retained the rights. It was a great time for the company. Total 180 on quality, investing heavily in the right tech. 3 years after we installed the server & thin clients all around, 33% of orders and 50% of revenue was online. Online sales drove order frequency, ticket price and customer satisfaction while lowering costs. It was such a genius move. Source: I was a Domino’s GM and franchise for 17 years and saw this transition.
  • The most interesting part of this story is how the change to serverless was made incrementally. They were able to replace parts of their system over time and learn along the way. The Journey to 90% Serverless at Comic Relief. When a lot of people were let go they needed a simpler architecture, so serverless made sense. They also had spiky load patterns. So do you really want to keep a fleet of varnish servers to handle the occasional load of tens of thousands of requests a second? Of course not. Serverless again makes sense. They had outsourced part of the donation system and now were able to bring it back in-house: “Users would trigger deltas as they passed through the donation steps on the platform, these would go into an SQS queue, and then an SQS fan out on the backend would read the number of messages in the queue and trigger enough lambda’s to consume the message, but most importantly not overwhelm the backend services/database. The API would load balance the payment service providers (Stripe, Worldpay, Braintree & Paypal), allowing us to gain redundancy and reach the required 150 donations per second that would safely get us through the night of TV (it can handle much more than this).” A combination of Sentry and IOpipe provided a 360-degree view of errors. A regional AWS Web Application Firewall (WAF) was added to all endpoints, introducing some basic protections before API Gateway was even touched.
  • The great migrations are no longer geographical, they are from platform to platform. Warning: expect projectile emoji vomiting. Why we migrated 😼 inboxkitten (77 million serverless API requests) from 🔥 Firebase to ☁️ Cloudflare workers & 🐑 CommonsHost: What if it bills only per request, or only for the amount of CPU and ram used. Cloudflare worker, which is part of the growing trend of “edge” serverless computing. Also happens to simplify serverless billing down to a single metric. Turning what was previously convoluted, and hard to decipher bills from GCP…Into something much easily understood, and cheaper overall per request …😸 [Bill went from $112 to $39] That’s enough net savings for 7 more $9.99 games during summer sale! And has a bonus benefit of lower latency edge computing! Each request is limited to < 5ms of CPU time. Incompatibility with express.js (as it uses web workers pattern). Another thing to note is that Cloudflare workers are based on the web workers model, where it hooks onto the “fetch” event in cloudflare like an interceptor function. One script limitation per domain (unless you are on enterprise). While not a deal breaker for inboxkitten, this can be one for many commercial / production workload. Because Cloudflare serverless packages cannot be broken down into smaller packages for individual subdomains and/or URI routes. This greatly complicates things, making it impossible to have testing and production code separated on a single domain, among many more complicated setups.
  • IDs should never be 32 bits. They wrap. And you won’t handle the wrapping properly. Really. What We Learned from the Recent Mandrill Outage: In practice it’s more complicated, but the important detail is that the XID is a globally incrementing counter critical to the operation of the database. The feature that Postgres uses to combat this issue is a daemonized process called auto_vacuum which runs periodically and clears out old XIDs, protecting against wraparound. Tuning this is important, as there can be significant performance impacts.
  • That’s the promise of software, software can always get better…or worse. @bensprecher: What other car company on Earth says, 6 months after you bought a car, “oh, hey, our engineers figured out how to squeeze 5% more performance out of the *existing vehicle* you own, here’s a free OTA software update for that”?
  • The China Study and longevity. The important part here isn’t the conclusion; the important part is the power of open data. Trial data from studies should be open so anyone can perform an analysis. For example, much of the advice about statins is from studies where the data is not available. Global policies are being made impacting millions. Are we just supposed to trust the people who won’t release the data they base their often self-serving conclusions on? No. All data used to create public policy should be public. An even bigger question: should the data used by tech to manipulate us be kept private?
  • Great picture illustrating how a monoculture fails. Your blast radius has no containment. Fleet of Southwest 737 jets spotted in Victorville after FAA grounded planes following two deadly crashes.
  • Who would design a system with a single point of failure? Indeed. A person would not, but somehow organizations do. Software Won’t Fix Boeing’s ‘Faulty’ Airframe. But this is BS: “Ultimately, Travis also bemoans what he calls “cultural laziness” within the software development community that is creeping into mission-critical systems like flight computers. By laziness, I mean that less and less thought is being given to getting a design correct, and simple – up-front”  The design goal wasn’t to create a safe plane. The design goal was to create the lowest cost option that would pass certification. Do you think programmers came up with that goal? Do you think programmers let it pass? Do you think programmers chose to make the second MCAS optional? Do you think programmers chose not to have a third MCAS for triple modular redundancy?
  • @awscloud: Application Load Balancers now support advanced request routing based on standard or custom HTTP headers & methods, query parameters & source IP addresses. @rchrdbyd: AWS is soooo close to bringing cell-based architectures to the masses. Cell Architectures
  • The problem with a saying like “Security needs to become everyone’s job” is it is verbless. There’s no action. There’s no story of what it means or how to do it. It’s like saying “Being good is everyone’s job.” What does it mean to be good? How does one become good? How do you know you are good? Until security becomes a verb we will not be secure…or good.
  • 2019 SRE Report. 49% worked on an incident last week, with 4% working on over 10 incidents a week; 92% work on fewer than five incidents a week. 79% report having stress. I tend to think 21% are lying, but maybe they have an advanced meditation practice. 69% don’t think their company cares about their stress. 30% of work is maintenance tasks. Only 10% strongly agree automation has been used to reduce toil. 
  • ilhaan/kubeCDN (article): A self-hosted content delivery network based on Kubernetes. Easily set up Kubernetes clusters in multiple AWS regions and deploy resilient and reliable services to a global user base within minutes.
  • comicrelief/lambda-wrapper: When writing Serverless endpoints, we have found ourselves replicating a lot of boiler plate code to do basic actions, such as reading request variables or writing to SQS. The aim of this package is to provide a wrapper for our lambda functions, to provide some level of dependency and configuration injection and to reduce time spent on project setup.
  • infinimesh/infinimesh: an opinionated Platform to connect IoT devices securely. It exposes simple to consume RESTful & gRPC APIs with both high-level (e.g. device shadow) and low-level (sending messages) concepts. Infinimesh Platform is open source and fully cloud native. No vendor lock-in – run it yourself on Kubernetes or use our SaaS offering (TBA).

from High Scalability

Dockerizing Java MicroProfile Applications

For cloud-native applications, Kubernetes and Istio deliver a lot of important functionality out of the box, like ensuring resiliency and scalability. This functionality works generically for microservices, no matter which language they are implemented in, and independently of the application logic.

Some cloud-native functionality, like application-specific failover functionality, metrics, and fine-grained authorization, cannot be handled by Kubernetes and Istio, since it needs to be handled in the business logic of the microservices.

MicroProfile

That’s why I started to look into Eclipse MicroProfile, which is an extension to Java EE for building microservices-based architectures and a great programming model for Istio. In addition to the application-specific logic that Istio cannot handle, it also comes with convenience functions that you typically need when developing microservices, like invoking REST APIs and implementing REST APIs including their documentation.

There is a MicroProfile Starter that includes several simple samples for MicroProfile functionality. In order to try these features in Istio, I’ve started to create a simple sample application.

Get the code of the cloud-native starter sample.

Container

The first thing you need to run MicroProfile applications on Kubernetes is an image. MicroProfile is supported by several Java application servers, different JVMs can be used and there are different versions of all of these components. Because of this, there are many different images you need to choose from.

I looked for an image that contains only components that are available as open source. Here is my Open Liberty server.xml file and this is what my Dockerfile looks like:

FROM openliberty/open-liberty:microProfile2-java8-openj9
ADD https://github.com/WASdev/sample.opentracing.zipkintracer/releases/download/1.2/liberty-opentracing-zipkintracer-1.2-sample.zip /
RUN unzip liberty-opentracing-zipkintracer-1.2-sample.zip -d /opt/ol/wlp/usr/ \
 && rm liberty-opentracing-zipkintracer-1.2-sample.zip
COPY liberty/server.xml /config/
ADD target/articles.war /config/dropins/

The image contains these components: Open Liberty with MicroProfile 2, Java 8 with OpenJ9 (per the base image tag), the Zipkin tracer extension for OpenTracing, and the application’s articles.war.

There are several different images available for Open Liberty. I picked a community image since it comes with OpenJ9 instead of the IBM version of OpenJ9. Unfortunately, that image doesn’t seem to support MicroProfile 2.2 yet (at least I haven’t found it).

Additionally, I download and copy a file needed for Zipkin tracing onto the image which you need to do manually at this point if you want to use the tracing functionality built into MicroProfile. This functionality is pretty useful since it allows you to see the chains of invocations between microservices.

This screenshot shows the Jaeger dashboard. The BFF (backend for frontend) ‘web-api’ microservice invokes another ‘articles’ service:

Variations of the Dockerfile

In order to avoid downloading the Zipkin file every time a new image is built, I’ve created a slightly different Dockerfile where the file is added from a local directory. The image is built with a script that downloads the file if it doesn’t exist locally. Alternatively, you can download the file via Maven (check out the example pom.xml and example Dockerfile).
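
For reference, here is a minimal sketch of such a build script (my own approximation, not the exact script from the repo; the image tag is a placeholder):

#!/bin/sh
# Download the Zipkin tracer sample only if it is not already present locally
ZIP=liberty-opentracing-zipkintracer-1.2-sample.zip
if [ ! -f "$ZIP" ]; then
  curl -L -o "$ZIP" https://github.com/WASdev/sample.opentracing.zipkintracer/releases/download/1.2/$ZIP
fi
# Build the image using the Dockerfile variant that ADDs the local zip
docker build -t articles-jee .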

Additionally, I have created another variation of the Docker image so that my sample application can be installed even by people who don’t have Java and Maven installed locally (or who have wrong Java/Maven versions). This Dockerfile uses a multistage build.

FROM maven:3.5-jdk-8 as BUILD
COPY src /usr/src/app/src
COPY pom.xml /usr/src/app
RUN mvn -f /usr/src/app/pom.xml clean package

FROM openliberty/open-liberty:microProfile2-java8-openj9
ADD liberty-opentracing-zipkintracer-1.2-sample.zip /
RUN unzip liberty-opentracing-zipkintracer-1.2-sample.zip -d /opt/ol/wlp/usr/ \
 && rm liberty-opentracing-zipkintracer-1.2-sample.zip
COPY liberty/server.xml /config/
COPY --from=BUILD /usr/src/app/target/articles.war /config/dropins/

Sample Application

If you want to try MicroProfile on Istio, use the sample application, set up a local development environment, make sure you have installed all necessary prerequisites and run these commands:

$ git clone https://github.com/nheidloff/cloud-native-starter.git
$ scripts/check-prerequisites.sh
$ scripts/deploy-articles-java-jee.sh
$ scripts/deploy-web-api-java-jee.sh

from DZone Cloud Zone

Docker Container Resource Management: CPU, RAM and IO, Part 1

This tutorial aims to give you practical experience of using Docker container resource limitation functionalities on an Alibaba Cloud Elastic Compute Service (ECS) instance, including:

  • CPU quotas
  • RAM quotas
  • IO bandwidth quotas

Prerequisites

You need access to an ECS server with a recent version of Docker already installed. If you don’t have one already, you can follow the steps in this tutorial.

These resource limit tests use 20-30 MB of RAM, so even a server with only a total RAM of 512MB will do.

The CPU tests are done on a server with only 2 cores. You will get more interesting results — for one of the tests — if your server has 4 cores or more. Some of the CPU tests hog all CPUs for 15 seconds. It would be great for your teammates if you did this tutorial directly on your computer and not on the shared development server.
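
If you are not sure what your machine has, a quick check of core count and free RAM (standard Linux tools, nothing Docker-specific):

nproc
free -m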

I am writing this tutorial using CentOS. You can use Debian/Ubuntu. 99% of this tutorial will work on any Linux distro since it mostly uses Docker commands.

You need a very basic understanding of Docker, images, containers and using  docker run  and docker ps -a .

Clean Up Preparation

It will really help if you have only a few (preferably no) containers running. That way you can easily find your tutorial container in  docker ps -a  output lists.

So stop and prune all the containers you do not need running.

You can quickly do that (in your development environment) using:

docker stop $(docker ps -a -q) #stop ALL containers

To now remove all containers, run

docker rm -f $(docker ps -a -q) # remove ALL containers

--memory-reservation

From https://docs.docker.com/config/containers/resource_constraints/

Allows you to specify a soft limit smaller than --memory   which is activated when Docker detects contention or low memory on the host machine. If you use  --memory-reservation , it must be set lower than --memory   for it to take precedence. Because it is a soft limit, it does not guarantee that the container doesn’t exceed the limit.

I am running this on a 1 GB RAM server.

Let’s run 5 containers each reserving 250 MB of RAM.

docker container run -d --memory-reservation=250m --name mymem1 alpine:3.8 sleep 3600
docker container run -d --memory-reservation=250m --name mymem2 alpine:3.8 sleep 3602
docker container run -d --memory-reservation=250m --name mymem3 alpine:3.8 sleep 3603
docker container run -d --memory-reservation=250m --name mymem4 alpine:3.8 sleep 3604
docker container run -d --memory-reservation=250m --name mymem5 alpine:3.8 sleep 3605

All containers are running even though I over-reserved RAM by 250 MB. So this is pointless: reservations that do not reserve, and do not prevent over-reservations.

If you run top  you will see no virtual RAM allocated. This setting is internal to Docker.

  PID USER        VIRT    RES    SHR S %MEM     TIME+ COMMAND
  933 root      967.4m  86.0m  24.3m S  8.7   0:55.87 dockerd
  940 root      582.0m  36.3m  12.3m S  3.7   0:46.50 docker-containe
13422 root        8.7m   3.3m   2.5m S  0.3   0:00.02 docker-containe
13309 root        7.3m   3.0m   2.3m S  0.3   0:00.02 docker-containe
13676 root        7.3m   2.9m   2.2m S  0.3   0:00.01 docker-containe
13540 root        7.3m   2.8m   2.1m S  0.3   0:00.01 docker-containe
13793 root        8.7m   2.7m   2.1m S  0.3           docker-containe

 docker stats  does not show RAM reservations.

CONTAINER ID        NAME                CPU %               MEM USAGE / LIMIT     MEM %               NET I/O             BLOCK I/O           PIDS
a1a4bd1c226b        mymem5              0.00%               1.086MiB / 985.2MiB   0.11%               578B / 0B           1.19MB / 0B         0
9ced89c63a7e        mymem4              0.00%               1.105MiB / 985.2MiB   0.11%               648B / 0B           1.19MB / 0B         0
696f1cef7d57        mymem3              0.00%               1.113MiB / 985.2MiB   0.11%               648B / 0B           1.19MB / 0B         0
77d61012b5fd        mymem2              0.00%               1.086MiB / 985.2MiB   0.11%               648B / 0B           1.19MB / 0B         0
fab3faa6d23d        mymem1              0.00%               1.043MiB / 985.2MiB   0.11%               648B / 0B           1.19MB / 0B         0
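
Docker does record the reservation, though; docker inspect exposes it under HostConfig (a quick check; the value is in bytes):

docker inspect --format '{{.HostConfig.MemoryReservation}}' mymem1
# should print 262144000 (250 MB in bytes)
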
docker ps -a

Shows all 5 containers running successfully.

CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
a1a4bd1c226b        alpine:3.8          "sleep 3605"        2 minutes ago       Up 2 minutes                            mymem5
9ced89c63a7e        alpine:3.8          "sleep 3604"        4 minutes ago       Up 4 minutes                            mymem4
696f1cef7d57        alpine:3.8          "sleep 3603"        5 minutes ago       Up 5 minutes                            mymem3
77d61012b5fd        alpine:3.8          "sleep 3602"        6 minutes ago       Up 6 minutes                            mymem2
fab3faa6d23d        alpine:3.8          "sleep 3600"        8 minutes ago       Up 8 minutes                            mymem1

We are finished with these containers. We can stop and then prune them.

docker container stop mymem1 -t 0
docker container stop mymem2 -t 0
docker container stop mymem3 -t 0
docker container stop mymem4 -t 0
docker container stop mymem5 -t 0
docker container prune -f 

--memory and --memory-swap (No Swapping Allowed)

From https://docs.docker.com/config/containers/resource_constraints/

  •  -m  or  --memory= : The maximum amount of memory the container can use. If you set this option, the minimum allowed value is 4m (4 megabyte).
  •  --memory-swap : The amount of memory this container is allowed to swap to disk.
  • If --memory-swap   is set to the same value as  --memory , and --memory   is set to a positive integer, the container does not have access to swap

We are now testing no swapping allowed.

We need a tool to allocate RAM on an MB-by-MB basis so that we can carefully overstep our defined RAM limits. I decided on Python. (You do not need to know Python to understand the 4 lines of code used here.)

In the second part of this tutorial, we will use actual benchmark tools.

Download the Python Docker image if you do not already have it:

docker pull python:3-alpine 

Run our container, limiting RAM: --memory=20m --memory-swap=20m  

docker container run -d --memory=20m --memory-swap=20m --name myPython python:3-alpine sleep 3600
docker exec -it myPython /bin/sh        

At the shell prompt, enter python3 to start the interactive Python interpreter. Cut and paste the code below. In Python, whitespace has syntactic meaning, so be careful not to add any extra spaces or tabs to the code.

longstring = []
for x in range(17):
    len(longstring)
    longstring.append('1' * 10**6)

Press Enter to exit the for statement block. This will run the code.

Expected output :

>>> for x in range(17):
...     len(longstring)
...     longstring.append('1' * 10**6)
...
0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
Killed

We allocated 20 MB RAM to this container. Python uses 5 MB. The for loop gets killed when it tries to append 16 MB of ‘1’ characters to the longstring variable.

Three things of note:

  • RAM allocations within limit of 20 MB worked
  • RAM allocation that exceeded limit got killed
  • No swap used: allocations did not quietly continue to work by using swap
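
If you want to confirm it was the kernel’s OOM killer that did the killing, the kernel log on the host records it (exact wording varies by kernel version):

dmesg | grep -i "out of memory"
# look for a line about the python3 process being killed in the memory cgroup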

Summary: --memory   and --memory-swap   (No swapping allowed) works when both are set to the same value. Based on your knowledge of the applications running in your containers you should set those values appropriately.

We are finished with this container. You can stop and prune it.

docker container stop myPython
docker container prune -f 

--memory and --memory-swap (Swapping Allowed)

By specifying --memory=20m   and --memory-swap=30m  we allow 10 MB of swap.

Let’s see how that works:

docker container run -d --memory=20m --memory-swap=30m --name myPython python:3-alpine sleep 3600

docker exec -it myPython /bin/sh             

At the shell prompt, enter python3 to start the interactive Python interpreter. Cut and paste the code below. In Python, whitespace has syntactic meaning, so be careful not to add any extra spaces or tabs to the code.

longstring = []
for x in range(24):
    len(longstring)
    longstring.append('1' * 10**6)

Press Enter to exit the for statement block. This will run the code.

Expected output:

0 to 23 shown ... no killing

5 MB RAM used by Python. 25 MB RAM allocated above with no errors.

We specified: --memory=20m --memory-swap=30m  

We just used 30 MB, meaning 10 MB is swapped. Let’s confirm by running top in another shell.

top - 13:20:38 up  4:41,  2 users,  load average: 0.11, 0.05, 0.06
Tasks: 119 total,   1 running, 118 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.2 us,  0.3 sy,  0.0 ni, 99.5 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :  985.219 total,  466.879 free,  190.812 used,  327.527 buff/cache
MiB Swap: 1499.996 total, 1490.078 free,    9.918 used.  618.730 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND             SWAP
  933 root      20   0  967.4m  91.5m  24.3m S        9.3   0:45.46 dockerd
  940 root      20   0  579.9m  33.1m  12.3m S   0.3  3.4   0:36.73 docker-containe
11900 root      20   0  253.5m  19.1m  10.5m S        1.9   0:00.25 docker
11941 root      20   0   39.1m  17.4m        S        1.8   0:00.39 python3             9.5m

As expected: 10 MB swap used. (You will have to show the SWAP field in top.)
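
If you would rather not add the SWAP column to top, the kernel exposes per-process swap usage directly (replace <pid> with the python3 PID from top):

grep VmSwap /proc/<pid>/status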

Let’s carefully try to use 2 MB more RAM – container should run out of RAM.

Cut and paste this in Python editor. Press Enter to run.

longstring = []
for x in range(26):
    len(longstring)
    longstring.append('1' * 10**6)

Expected output :

it gets killed

We are finished with this container. You can stop and prune it.

docker container stop myPython
docker container prune -f 

Summary: --memory   and --memory-swap  (swapping allowed) works when --memory-swap   is larger than  --memory .

Limits enforced perfectly.

You need to specify appropriate limits for your containers in your production environment.

Investigate current prod system RAM usage. Define limits according to those, adding a large margin for error, but still preventing runaway containers from crashing the prod server.
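
For example, a production run command might look like this (a sketch only: the image name is hypothetical and the limits should come from your own measurements):

docker container run -d --memory=256m --memory-swap=256m --name billing-api yourorg/billing-api:1.4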

--oom-kill-disable

So far, the out-of-memory (OOM) killer, which is enabled by default, has killed our runaway Python program.

Let’s see what happens if we disable it.

Note the  --oom-kill-disable  below:

docker container run -d --oom-kill-disable --memory=20m --memory-swap=30m --name myPython python:3-alpine sleep 3600

Enter our unsuspecting container:

docker exec -it myPython /bin/sh   

Enter the python3 interpreter, paste the code below, and press Enter to run it.

python3    
a = []
for x in range(26):
    len(a)
    a.append('1' * 10**6)

The container hangs.

Run top   in another shell console:

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND             SWAP
12317 root      20   0   41.0m  17.6m   0.0m D        1.8   0:00.32 python3            10.7m

Our container’s python3 process is in state D, uninterruptible sleep.

In another shell:

docker exec -it myPython /bin/sh  

It hangs, too.

Let’s use another shell to get our hanging container’s PID so that we can kill it:

docker inspect myPython

Get the PID.

Use top  or  kill -9 your-PID  to kill it.
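
A shortcut, if you would rather not scan the full inspect output, is to pull the PID field directly (this is the PID of the container’s main process on the host):

docker inspect --format '{{.State.Pid}}' myPython
kill -9 $(docker inspect --format '{{.State.Pid}}' myPython)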

Conclusion:

Do not use  --oom-kill-disable 

Your hung shells now have a Linux prompt back. You can exit them.

--cpu-shares

From https://docs.docker.com/config/containers/resource_constraints/#cpu

 --cpu-shares : Set this flag to a value greater or less than the default of 1024 to increase or reduce the container’s weight, and give it access to a greater or lesser proportion of the host machine’s CPU cycles.
This is only enforced when CPU cycles are constrained. When plenty of CPU cycles are available, all containers use as much CPU as they need. In that way, this is a soft limit. --cpu-shares   does not prevent containers from being scheduled in swarm mode.
It prioritizes container CPU resources for the available CPU cycles. It does not guarantee or reserve any specific CPU access.

The plan: run 3 containers providing them with 100, 500 and 1000 CPU-shares.

The following is a terrible test. Carefully read above descriptions again, then read the next 3 commands and see if you can determine why this will not clearly show those CPU proportions allocated correctly.

Please note these CPU tests assume you are running this on your own computer and not on a shared development server. 3 tests hog 100% CPU for 20 seconds.

Later in this tutorial series, we will do these tests using our own bench container using actual Linux benchmark tools. We will specifically focus on running these CPU hogs for very short runtimes and still get accurate results. However, please read and follow these CPU tests so that you can get a feel for how wrong and slow this quick-hack testing is.

Note that dd, urandom and md5sum are not bench tools either.

The problem is not the dd or its timing.

Our CPU stress application: time dd if=/dev/urandom bs=1M count=2 | md5sum  

Benchmark explanation:

  • time … measures elapsed time: shows those 3 timer lines
  • dd if=/dev/urandom bs=1M count=2 … copies bs=blocksize one MB of randomness twice
  • md5sum … calculates MD5 hashes (to give the CPU a load)

Let’s run it and investigate the results:

docker container run -d --cpu-shares=1024 --name mycpu1024 alpine:3.8 /bin/sh -c 'time dd if=/dev/urandom bs=1M count=100 | md5sum'
docker container run -d --cpu-shares=500 --name mycpu500 alpine:3.8 /bin/sh -c 'time dd if=/dev/urandom bs=1M count=100 | md5sum'
docker container run -d --cpu-shares=100 --name mycpu100 alpine:3.8 /bin/sh -c 'time dd if=/dev/urandom bs=1M count=100 | md5sum'

Let’s investigate the logs to determine runtimes:

docker logs mycpu1024
docker logs mycpu500
docker logs mycpu100

Expected output :

docker logs mycpu1024
real    0m 15.29s
user    0m 0.00s
sys     0m 14.51s

docker logs mycpu500
real    0m 18.65s
user    0m 0.00s
sys     0m 15.28s

docker logs mycpu100
real    0m 23.28s
user    0m 0.00s
sys     0m 13.09s

Note all containers used about the same sys cpu time — understandable since they all did the exact same work.

--cpu-shares=100 clearly takes longer, but --cpu-shares=500 is only slightly slower than --cpu-shares=1024.

The problem is that --cpu-shares=1024 runs very fast, then exits.

Then --cpu-shares=500 and --cpu-shares=100 have full access to the CPU.

Then --cpu-shares=500 finishes quickly since it has the most CPU shares.

Then --cpu-shares=100 finishes quickly: NOTHING else is running, so it gets all the CPU.

Consider this problem and how you could solve it.

Figure it out before reading further.

You are welcome to test your solution.

My solution: all 3 of these containers must run in parallel the whole time. CPU-shares only matter while the CPU is under contention.

mycpu1024 – count must be set to 10 times that of mycpu100
mycpu500 – count must be set to 5 times that of mycpu100

This way all 3 containers will probably run for roughly the same time: based on their CPU-shares, each got a CPU-share-appropriate workload.

Then divide the mycpu1024 runtime by 10 – it got 10 times the workload of mycpu100.
Then divide the mycpu500 runtime by 5 – it got 5 times the workload of mycpu100.

It should be very obvious that Docker divided the CPU-shares appropriately.
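
One way to set that up, scaling count with the share ratio (a sketch based on the commands above; mycpu100 keeps count=100):

docker container prune -f # remove the earlier stopped containers first so the names can be reused
docker container run -d --cpu-shares=1024 --name mycpu1024 alpine:3.8 /bin/sh -c 'time dd if=/dev/urandom bs=1M count=1000 | md5sum'
docker container run -d --cpu-shares=500 --name mycpu500 alpine:3.8 /bin/sh -c 'time dd if=/dev/urandom bs=1M count=500 | md5sum'
docker container run -d --cpu-shares=100 --name mycpu100 alpine:3.8 /bin/sh -c 'time dd if=/dev/urandom bs=1M count=100 | md5sum'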

Busy Docker administrator shortcut/quick method: Submit all the above containers to run again.

Have the following ready to run as well.

 --cpu-shares=250  and --cpu-shares=200 containers  
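
Something like this (a sketch following the same pattern as the earlier runs; the container names are placeholders):

docker container run -d --cpu-shares=250 --name mycpu250 alpine:3.8 /bin/sh -c 'time dd if=/dev/urandom bs=1M count=100 | md5sum'
docker container run -d --cpu-shares=200 --name mycpu200 alpine:3.8 /bin/sh -c 'time dd if=/dev/urandom bs=1M count=100 | md5sum'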

Then in another shell run  docker stats  and press ctrl C to freeze the display.

It should be obvious the CPU-shares got allocated correctly.

Clean up containers:

docker container prune -f 

--cpu-shares Identically Allocated

 --cpu-shares : Set this flag to increase or reduce the container’s weight, and give it access to a greater or lesser proportion of the host machine’s CPU cycles.

This means that equal --cpu-shares settings should result in equal shares of CPU time.

Let’s have 3 containers running, all with CPU-shares = 1024.

docker container run -d --cpu-shares=1024 --name mycpu1024a alpine:3.8 /bin/sh -c 'time dd if=/dev/urandom bs=1M count=100 | md5sum'
docker container run -d --cpu-shares=1024 --name mycpu1024b alpine:3.8 /bin/sh -c 'time dd if=/dev/urandom bs=1M count=100 | md5sum'
docker container run -d --cpu-shares=1024 --name mycpu1024c alpine:3.8 /bin/sh -c 'time dd if=/dev/urandom bs=1M count=100 | md5sum'

Run docker stats:

CONTAINER ID        NAME                CPU %               MEM USAGE / LIMIT     MEM %               NET I/O             BLOCK I/O           PIDS
c4625617f339        mycpu1024c          63.79%              1.262MiB / 985.2MiB   0.13%               648B / 0B           1.33MB / 0B         0
44362316e33a        mycpu1024b          68.44%              1.254MiB / 985.2MiB   0.13%               648B / 0B           1.33MB / 0B         0
a704aca5c0d7        mycpu1024a          66.27%              1.254MiB / 985.2MiB   0.13%               648B / 0B           1.35MB / 0B         0

As expected, all 3 containers get about the same percentage of CPU time.

docker logs mycpu1024a
docker logs mycpu1024b
docker logs mycpu1024c

Just to confirm that they all ran for about the same elapsed time:

docker logs mycpu1024a
real    0m 21.25s
user    0m 0.00s
sys     0m 14.72s

docker logs mycpu1024b
real    0m 22.53s
user    0m 0.00s
sys     0m 15.21s

docker logs mycpu1024c
real    0m 21.45s
user    0m 0.00s
sys     0m 15.09s

Prune the containers, we are done with them.

docker container prune -f 

from DZone Cloud Zone

Add Your Own SSL Certificates to Open Distro for Elasticsearch

Open Distro for Elasticsearch’s security plugin comes with authentication and access control out of the box. To make it easy to get started, the binary distributions contain passwords and SSL certificates that let you try out the plugin. Before adding any of your private data, you need to change the default passwords and certificates. In a prior post, we showed how you can change your admin password in Open Distro for Elasticsearch. In this post we cover changing your SSL certificates.

To change your SSL certificates, you’ll copy the certificate files into the distribution and modify your elasticsearch.yml to use them. I’ll cover changing certificates for Elasticsearch’s node-to-node communication, REST APIs, and Kibana’s back-end communication to Elasticsearch. I’ll cover both the RPM and Docker distributions of Open Distro for Elasticsearch.

Collect Files

Before you can change the certificates, you’ll need to generate (or have) the following .pem files for the certificate and key:

  • Elasticsearch admin
  • Elasticsearch node
  • Kibana node
  • Certificate authority

If you want to support SSL connections to Kibana, you need to add a certificate to Kibana as well. You can use the Elasticsearch node certificate and key files for Kibana, or use separate certificates.

There are many ways that you can create the CA and certificates. You might have a certificate authority (CA) that can issue certificates in your organization. If so, use that. If you don’t have access to your own CA, you can use the demo files that ship with Open Distro for Elasticsearch. Or you can use OpenSSL, create a CA, and then create and sign certificates with your CA. In this post, I describe copying the demo files and also creating a CA and certificates with OpenSSL.

First, make a directory to hold the various assets you’re building:

$ mkdir setup-ssl

Using the demo .pem Files

Download and install the Open Distro for Elasticsearch RPM, or run Open Distro for Elasticsearch in Docker (see Get Up and Running with Open Distro for Elasticsearch for instructions on how to run Docker locally). The demo .pem files are located in different directories, depending on the distribution you’re running:

  • Docker: /usr/share/elasticsearch/config
  • RPM: /etc/elasticsearch

Copy kirk.pem, kirk-key.pem, esnode.pem, esnode-key.pem, and root-ca.pem to the setup-ssl directory.

If you’re running Docker, use:


$ docker exec <container id> cat /usr/share/elasticsearch/config/filename.pem > filename2.pem

to cat the files to your machine. Replace <container ID> with the ID from one of your Elasticsearch containers. Replace filename.pem and filename2.pem with the above files.

If you’re running the RPM, you can simply cp the files to the setup-ssl directory.
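
For example (assuming the demo files are in /etc/elasticsearch, as noted above):

$ cp /etc/elasticsearch/{kirk.pem,kirk-key.pem,esnode.pem,esnode-key.pem,root-ca.pem} setup-ssl/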

Creating a New Certificate Authority (CA), Node, and Admin Certificates

If you want to create a CA and new certificates instead, you use OpenSSL to create a local, self-signed Certificate Authority (CA). You also create server and admin certificates. Then, use your CA to sign the certificates.

To install OpenSSL, run the command below. You can find the latest version on the OpenSSL website:


$ sudo yum -y install openssl

First, create a private key for the CA:

$ openssl genrsa -out MyRootCA.key 2048
Generating RSA private key, 2048 bit long modulus
................+++
...............................+++
e is 65537 (0x10001)

Create the CA and enter the Organization details:

$ openssl req -x509 -new -key MyRootCA.key -sha256 -out MyRootCA.pem
You are about to be asked to enter information that will be incorporated
into your certificate request.
What you are about to enter is what is called a Distinguished Name or a DN.
There are quite a few fields but you can leave some blank
For some fields there will be a default value,
If you enter '.', the field will be left blank.
----
Country Name (2 letter code) [AU]:GB
State or Province Name (full name) [Some-State]:
Locality Name (eg, city) []:London
Organization Name (eg, company) [Internet Widgits Pty Ltd]:Example Corp
Organizational Unit Name (eg, section) []:
Common Name (e.g. server FQDN or YOUR name) []:Example Corp CA Root
Email Address []:

For the server and admin certificates, create keys, a certificate signing request (CSR) and a certificate signed by the CA. In the below example, I walk through the commands for one server, “odfe-node1”. You need to repeat this process for odfe-node2, the admin certificate, and the kibana certificate:


$ openssl genrsa -out odfe-node1-pkcs12.key 2048

IMPORTANT: Convert these to PKCS#5 v1.5 to work correctly with the JDK. Output from this command will be used in all the config files.


$ openssl pkcs8 -v1 "PBE-SHA1-3DES" -in "odfe-node1-pkcs12.key" -topk8 -out "odfe-node1.key" -nocrypt

Create the CSR and enter the organization and server details:

$ openssl req -new -key odfe-node1.key -out odfe-node1.csr
You are about to be asked to enter information that will be incorporated
into your certificate request.
What you are about to enter is what is called a Distinguished Name or a DN.
There are quite a few fields but you can leave some blank
For some fields there will be a default value, If you enter '.', the field will
be left blank.
----
Country Name (2 letter code) [AU]:GB
State or Province Name (full name) [Some-State]:
Locality Name (eg, city) []:London
Organization Name (eg, company) [Internet Widgits Pty Ltd]:Example Corp
Organizational Unit Name (eg, section) []:
Common Name (e.g. server FQDN or YOUR name) []:odfe-node1.example.com
Email Address []:
Please enter the following 'extra' attributes to be sent with your certificate request
A challenge password []:
An optional company name []:

Use the CSR to generate the signed Certificate:

$ openssl x509 -req -in odfe-node1.csr -CA MyRootCA.pem -CAkey MyRootCA.key -CAcreateserial -out odfe-node1.pem -sha256
Signature ok
subject=/C=GB/ST=Some-State/L=London/O=Example Corp/CN=odfe-node1.example.com
Getting CA Private Key
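
For example, the same four commands repeated for odfe-node2 (use the node2 hostname as the Common Name when prompted; the admin and kibana certificates follow the same pattern):

$ openssl genrsa -out odfe-node2-pkcs12.key 2048
$ openssl pkcs8 -v1 "PBE-SHA1-3DES" -in "odfe-node2-pkcs12.key" -topk8 -out "odfe-node2.key" -nocrypt
$ openssl req -new -key odfe-node2.key -out odfe-node2.csr
$ openssl x509 -req -in odfe-node2.csr -CA MyRootCA.pem -CAkey MyRootCA.key -CAcreateserial -out odfe-node2.pem -sha256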

Edit elasticsearch.yml to Add Your Certificates

Now you need to use the certificates you created or copied to setup-ssl. Whether you are running the .rpm distribution of Open Distro for Elasticsearch or the Docker distribution, you’ll edit elasticsearch.yml to add the certificate information. This will enable Open Distro for Elasticsearch’s security plugin to accept SSL requests, as well as enable node-to-node SSL communication. Create a copy of elasticsearch.yml in your setup-ssl directory. You can find elasticsearch.yml in the same directory as the .pems.

Open your local copy of elasticsearch.yml with your favorite editor. You’ll see a block of settings that begins with:

######## Start OpenDistro for Elasticsearch Security Demo Configuration ########
# WARNING: revise all the lines below before you go into production
opendistro_security.ssl.transport.pemcert_filepath: esnode.pem
opendistro_security.ssl.transport.pemkey_filepath: esnode-key.pem
opendistro_security.ssl.transport.pemtrustedcas_filepath: root-ca.pem
...

The opendistro_security.ssl.transport.* settings enable SSL transport between nodes. The opendistro_security.ssl.http.* settings enable SSL for REST requests to the cluster. You need to replace the values of these settings with your own certificate files.

Make sure to remove the entry:

opendistro_security.allow_unsafe_democertificates: true

to use your certificates instead of the demo certificates.

You can further improve security by adding Distinguished Name (DN) verification settings, as shown below. The Security plugin supports wildcards and regular expressions:

opendistro_security.nodes_dn:
    - 'CN=node2.example.com,OU=SSL,O=Example Corp,L=London,C=GB'
    - 'CN=*.example.com,OU=SSL,O=Example Corp,L=London,C=GB'
    - 'CN=odfe-cluster*'
    - '/CN=.*regex/'
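
Putting this together for odfe-node1, each of these settings ends up pointing at your own files. The following is a minimal sketch, not the original post’s exact file: it uses the file names from this walkthrough and the node1-elasticsearch.yml copy used in the Docker example below (for the RPM flow the copy is simply elasticsearch.yml), and it assumes you have already deleted the demo esnode.* lines and the allow_unsafe_democertificates entry. Keep any other opendistro_security.* settings from the demo block (for example the admin DN entry) and update them to match your own certificates.

# Sketch: append the SSL settings for odfe-node1, using the certificates created above.
cat >> node1-elasticsearch.yml <<'EOF'
opendistro_security.ssl.transport.pemcert_filepath: odfe-node1.pem
opendistro_security.ssl.transport.pemkey_filepath: odfe-node1.key
opendistro_security.ssl.transport.pemtrustedcas_filepath: MyRootCA.pem
opendistro_security.ssl.http.enabled: true
opendistro_security.ssl.http.pemcert_filepath: odfe-node1.pem
opendistro_security.ssl.http.pemkey_filepath: odfe-node1.key
opendistro_security.ssl.http.pemtrustedcas_filepath: MyRootCA.pem
EOF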

If you are running the .rpm distribution, copy your certificates and elasticsearch.yml to the /etc/elasticsearch/config directory. Change the file names to match the names of your certificate files.
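
For example, a sketch assuming the directory mentioned above and the odfe-node1 file names (run with sudo or as root):

$ sudo cp MyRootCA.pem odfe-node1.pem odfe-node1.key /etc/elasticsearch/config/
$ sudo cp elasticsearch.yml /etc/elasticsearch/config/elasticsearch.yml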

For container deployments, override the files in the container with your local files by modifying docker-compose.yml. Open this file in your editor and locate the volumes section for both the odfe-node1 and odfe-node2 services. Add additional lines to these sections that map your local files onto the container’s file system. When you’re done, it should look like this:

version: '3'
services:
  odfe-node1:
    image: amazon/opendistro-for-elasticsearch:0.7.0
    container_name: odfe-node1
    environment:
      - cluster.name=odfe-cluster
      - bootstrap.memory_lock=true # along with the memlock settings below, disables swapping
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m" # minimum and maximum Java heap size, recommend setting both to 50% of system RAM
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - odfe-data1:/usr/share/elasticsearch/data
      - ./MyRootCA.pem:/usr/share/elasticsearch/config/MyRootCA.pem
      - ./odfe-node1.pem:/usr/share/elasticsearch/config/odfe-node1.pem
      - ./odfe-node1.key:/usr/share/elasticsearch/config/odfe-node1.key
      - ./node1-elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml
    ports:
      - 9200:9200
      - 9600:9600 # required for Performance Analyzer
    networks:
      - odfe-net
  odfe-node2:
    image: amazon/opendistro-for-elasticsearch:0.7.0
    container_name: odfe-node2
    environment:
      - cluster.name=odfe-cluster
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
      - discovery.zen.ping.unicast.hosts=odfe-node1
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - odfe-data2:/usr/share/elasticsearch/data
      - ./MyRootCA.pem:/usr/share/elasticsearch/config/MyRootCA.pem
      - ./odfe-node2.pem:/usr/share/elasticsearch/config/odfe-node2.pem
      - ./odfe-node2.key:/usr/share/elasticsearch/config/odfe-node2.key
      - ./node2-elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml
    networks:
      - odfe-net
.....
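
Before restarting anything, you can check that the edited file still parses; docker-compose prints the resolved configuration, or an error if the YAML is malformed:

$ docker-compose config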

Encrypting Access to Kibana

You enable TLS/SSL encryption between the browser and the Kibana server by setting the server.ssl options below in kibana.yml. The location of kibana.yml depends on the distribution you’re running:

  • Docker: /usr/share/kibana/config
  • RPM: /etc/kibana

server.ssl.enabled: true
server.ssl.key: <full path to your key file>
server.ssl.certificate: <full path to your certificate>

If you are running the .rpm distribution, copy your certificates to the /etc/kibana/ directory and update the SSL settings in kibana.yml.
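
As a sketch, assuming you reuse the odfe-node2 certificate and key for Kibana (as the Docker example below does); substitute your dedicated kibana certificate if you created one:

$ sudo cp MyRootCA.pem odfe-node2.pem odfe-node2.key /etc/kibana/

Then point server.ssl.key and server.ssl.certificate in /etc/kibana/kibana.yml at the full paths of the copied files.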

For container deployments, update the Kibana section of the docker-compose.yml file: add the file mappings to the volumes section and the SERVER_SSL options to the environment section, then save the file.

.....
  kibana:
    image: amazon/opendistro-for-elasticsearch-kibana:0.7.0
    container_name: odfe-kibana
    ports:
      - 5601:5601
    expose:
      - "5601"
    environment:
      ELASTICSEARCH_URL: https://odfe-node1:9200
      SERVER_SSL_ENABLED: "true"
      SERVER_SSL_KEY: /usr/share/kibana/config/odfe-node2.key
      SERVER_SSL_CERTIFICATE: /usr/share/kibana/config/odfe-node2.pem
    volumes:
      - ./MyRootCA.pem:/usr/share/kibana/config/MyRootCA.pem
      - ./odfe-node2.pem:/usr/share/kibana/config/odfe-node2.pem
      - ./odfe-node2.key:/usr/share/kibana/config/odfe-node2.key
    networks:
      - odfe-net
 .....

Restart Your World

Now you need to restart Elasticsearch. In order to remove the demo certificates from the security plugin’s Elasticsearch index, you need to remove the existing volumes. From the directory that contains your docker-compose.yml, issue the following commands:

NOTE: The following commands will erase all the data you have in Elasticsearch!


docker-compose down -v
docker-compose up

You should be able to browse to https://<localhost or FQDN of kibana>:5601/. You might need to sign out of Kibana’s UI to remove any browser-cached certificates before you can log in.
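
You can also check the Elasticsearch endpoint directly with curl. This sketch assumes the default admin/admin user that ships with the security plugin’s demo users (change the credentials if you have already replaced the internal users); the --resolve flag maps the certificate’s hostname to the locally published port so hostname verification succeeds:

$ curl --cacert MyRootCA.pem \
    --resolve odfe-node1.example.com:9200:127.0.0.1 \
    -u admin:admin \
    "https://odfe-node1.example.com:9200/_cluster/health?pretty"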

To suppress security warnings in the browser, you can use its settings panel to add the self-signed MyRootCA certificate to your Trusted Certificate Authorities.

Conclusion

You have now made your Open Distro for Elasticsearch cluster even more secure by adding your own SSL certificates. Your certificates cover (optionally) communication from your browser to Kibana, communication to your Elasticsearch endpoint, and intra-cluster communication between nodes.

Have an issue or question? Want to contribute? You can get help and discuss Open Distro for Elasticsearch on our forums. You can file issues here.

Jagadeesh Pusapadi

Jagadeesh Pusapadi is a Solutions Architect with AWS working with customers on their strategic initiatives. He helps customers build innovative solutions on AWS Cloud by providing architectural guidance to achieve desired business outcomes.

from AWS Open Source Blog

Enable self-service, secured data science using Amazon SageMaker notebooks and AWS Service Catalog

Enterprises of all sizes are moving to the AWS Cloud. We hear from the leadership of those enterprise teams that they want a safe, cost-governed way to give their teams easy access to Amazon SageMaker, so they can experiment with data science, unlock new business opportunities, and disrupt the status quo. In this blog post, Veb Singh and I will show you how you can easily enable self-service, secured data science using Amazon SageMaker, AWS Service Catalog, and AWS Key Management Service (KMS).

This blog post explains how AWS Service Catalog uses a pre-configured AWS KMS key to encrypt data at rest on the machine learning (ML) storage volume that is attached to your notebook instance without ever exposing the complex, unnecessary details to data scientists. ML storage volume encryption is enforced by an AWS Service Catalog product that is pre-configured and blessed by centralized security and/or infrastructure teams. When you create Amazon SageMaker notebook instances, training jobs, or endpoints, you can specify an AWS KMS key ID and that key will encrypt the attached ML storage volumes. You can specify an output Amazon S3 bucket for training jobs that is also encrypted with a key managed with AWS KMS. You can pass in the KMS Key ID for storing the model artifacts in that output Amazon S3 bucket.
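
Outside of AWS Service Catalog, the same encryption knob is exposed directly on the SageMaker API. For example, with the AWS CLI (the instance name, role ARN, and key ID below are placeholders):

$ aws sagemaker create-notebook-instance \
    --notebook-instance-name my-encrypted-notebook \
    --instance-type ml.t2.medium \
    --role-arn arn:aws:iam::111122223333:role/MySageMakerRole \
    --kms-key-id 1234abcd-12ab-34cd-56ef-1234567890ab

The point of the rest of this post is that data scientists never have to think about that key ID: the AWS Service Catalog product bakes it in.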

AWS Service Catalog’s launch constraint feature allows provisioning of AWS resources by giving developers and data scientists either minimum or no IAM permissions to underlying AWS services. Governed access through AWS Service Catalog enables a better security posture and limits the blast radius. AWS Service Catalog also allows the centralized infrastructure team to enforce configuration standards across AWS services, while granting development teams the flexibility to customize AWS resources by using parameters at launch time.

The following diagram shows how AWS Service Catalog ensures two separate workflows for cloud system administrators and data scientists or developers who work with Amazon SageMaker.

Depending on your role, you will perform different tasks in the workflow:

  1. Administrator: Create an AWS CloudFormation template that deploys the Amazon SageMaker notebook instance.
  2. Administrator: Create a product portfolio and a product (the Amazon SageMaker notebook instance) in AWS Service Catalog.
  3. Developer / data scientist: Discover and launch the Amazon SageMaker notebook instance.
  4. (optional) Administrator: Ensure that the notebooks are encrypted by using Amazon CloudWatch and AWS CloudTrail logs.

Step 1. Create an AWS CloudFormation template

Open a text editor or your favorite code editor, copy the following CloudFormation template, and paste it into a new file. Save the file as deploy-sagemaker-notebook.template; you will upload it to AWS Service Catalog in Step 2.
Note the AWS KMS key ID used to encrypt data at rest on the ML storage volume attached to your notebook instance. Replace this value with your own already-provisioned AWS KMS key for the specific AWS Region.

AWSTemplateFormatVersion: '2010-09-09'
Metadata: 
  License: Apache-2.0
Description: '@Author: Sanjay Garje. AWS CloudFormation Sample Template SageMaker NotebookInstance: This template demonstrates
  the creation of a SageMaker NotebookInstance with encryption. You will be billed for the AWS resources used if you create a stack from
  this template.'
Parameters:
  NotebookInstanceName:
    AllowedPattern: '[A-Za-z0-9-]{1,63}'
    ConstraintDescription: Maximum of 63 alphanumeric characters. Can include hyphens
      (-), but not spaces. Must be unique within your account in an AWS Region.
    Description: SageMaker Notebook instance name
    MaxLength: '63'
    MinLength: '1'
    Type: String
    Default: 'myNotebook'
  NotebookInstanceType:
    AllowedValues:
      - ml.t2.medium
    ConstraintDescription: Must select a valid notebook instance type.
    Default: ml.t2.medium
    Description: Select Instance type for the SageMaker Notebook
    Type: String
  KMSKeyId:
    Description: AWS KMS key ID used to encrypt data at rest on the ML storage volume attached to notebook instance.
    Type: String
    Default: 'Replace it with your KMSKeyId'
Resources:
  SageMakerRole: 
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: 2012-10-17
        Statement:
          - Effect: Allow
            Principal:
              Service:
                - "sagemaker.amazonaws.com"
            Action:
              - "sts:AssumeRole"
      ManagedPolicyArns:
        - "arn:aws:iam::aws:policy/AmazonSageMakerFullAccess"
        - "arn:aws:iam::aws:policy/AmazonS3FullAccess"
        - "arn:aws:iam::aws:policy/IAMReadOnlyAccess"
  SageMakerNotebookInstance:
    Type: "AWS::SageMaker::NotebookInstance"
    Properties:
      KmsKeyId: !Ref KMSKeyId
      NotebookInstanceName: !Ref NotebookInstanceName
      InstanceType: !Ref NotebookInstanceType 
      RoleArn: !GetAtt SageMakerRole.Arn
Outputs:
  SageMakerNoteBookURL:
    Description: "URL for the newly created SageMaker Notebook Instance"
    Value: !Sub 'https://${AWS::Region}.console.aws.amazon.com/sagemaker/home?region=${AWS::Region}#/notebook-instances/openNotebook/${NotebookInstanceName}'
  SageMakerNoteBookTerminalURL:
    Description: "Terminal access URL for the newly created SageMaker Notebook Instance"
    Value: !Sub 'https://${NotebookInstanceName}.notebook.${AWS::Region}.sagemaker.aws/terminals/1'
  SageMakerNotebookInstanceARN:
    Description: "ARN for the newly created SageMaker Notebook Instance"
    Value: !Ref SageMakerNotebookInstance
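
If you don’t yet have a KMS key provisioned for the KMSKeyId parameter, and you want to sanity-check the template before uploading it, something like the following works (a sketch; the alias name is just an example):

$ aws kms create-key --description "SageMaker notebook ML volume encryption"
$ aws kms create-alias --alias-name alias/sagemaker-notebooks \
    --target-key-id <KeyId returned by create-key>
$ aws cloudformation validate-template \
    --template-body file://deploy-sagemaker-notebook.template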

Step 2. Create a product portfolio and a product for instantiating the Amazon SageMaker notebook in AWS Service Catalog

To provide users with products, begin by creating a portfolio for those products. To create a portfolio, follow the detailed instructions in the AWS Service Catalog documentation.
On the AWS Service Catalog console Create portfolio page, use the following values for creating the portfolio:

  • Portfolio name – ML Portfolio
  • Description – Machine Learning Portfolio
  • Owner – IT

Provide TagOptions details to mandate tags.

AWS Service Catalog enforces the use of mandated tags when any of the portfolio’s products is launched. Follow the Managing TagOptions link for further details.
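
If you prefer to script this step, the equivalent portfolio setup with the AWS CLI looks roughly like this (the TagOption key and value are examples; substitute the IDs returned by the first two commands into the third):

$ aws servicecatalog create-portfolio \
    --display-name "ML Portfolio" \
    --description "Machine Learning Portfolio" \
    --provider-name "IT"
$ aws servicecatalog create-tag-option --key CostCenter --value 1234
$ aws servicecatalog associate-tag-option-with-resource \
    --resource-id <portfolio id> --tag-option-id <tag option id>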

Create a new product using detailed instructions in the AWS Service Catalog documentation. On the AWS Service Catalog console Upload new product page, use the following values for creating the product:

  • Product name – SageMaker Notebooks
  • Description – Notebooks for Data Scientists
  • Provided by – IT
  • Vendor (optional) – Amazon Web Services

On the Enter support details page, type the following and then choose Next:

  • Email contact – valid Email address
  • Support link – http://it.org/support
  • Support description – Support details for SageMaker Notebooks

On the Version details page, choose Upload a template file, select Choose file, locate the deploy-sagemaker-notebook.template file you saved when you set up the CloudFormation template, and then choose Next:

  • Version title – 1.0
  • Description – This is the initial version of enabling SageMaker Notebooks

On the Review page, choose CREATE.

Let’s add the “SageMaker Notebooks” product to an existing product portfolio. Choose ADD PRODUCT.

Select SageMaker Notebooks and choose ADD PRODUCT.

Add the end user who needs access to this product portfolio by following the steps mentioned here.
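
This step, too, can be scripted. Here is a sketch with the AWS CLI, assuming you have uploaded the template to an S3 bucket you control and that your data scientists use an IAM role you substitute in the last command:

$ aws s3 cp deploy-sagemaker-notebook.template s3://<your-bucket>/
$ aws servicecatalog create-product \
    --name "SageMaker Notebooks" \
    --owner "IT" \
    --description "Notebooks for Data Scientists" \
    --product-type CLOUD_FORMATION_TEMPLATE \
    --provisioning-artifact-parameters 'Name=1.0,Description=Initial version,Info={LoadTemplateFromURL=https://<your-bucket>.s3.amazonaws.com/deploy-sagemaker-notebook.template},Type=CLOUD_FORMATION_TEMPLATE'
$ aws servicecatalog associate-product-with-portfolio \
    --product-id <product id> --portfolio-id <portfolio id>
$ aws servicecatalog associate-principal-with-portfolio \
    --portfolio-id <portfolio id> \
    --principal-arn arn:aws:iam::<account id>:role/<data scientist role> \
    --principal-type IAM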

Step 3. Log in as a data scientist or a developer to launch the product

Log in as a data scientist end user and choose Products List.

Choose Launch product.

The mandatory tag Cost Center is automatically populated by AWS Service Catalog.

Tags are a very powerful feature that you can use to further optimize your costs. For example, you can write an AWS Lambda function that stops all Amazon SageMaker notebook instances tagged as ‘dev’ at 6 PM and starts them again at 8 AM every day, and another Lambda function to keep them stopped over the weekend. That’s cost optimization! Here is a sample.
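
As a rough illustration of what such a job does, here is a sketch using the AWS CLI (the Environment=dev tag key and value are assumptions; use whatever tag your team mandates). You could run it from cron or adapt the same logic into a Lambda function:

# Stop every in-service notebook instance tagged Environment=dev.
aws sagemaker list-notebook-instances --status-equals InService \
  --query 'NotebookInstances[].[NotebookInstanceName,NotebookInstanceArn]' --output text |
while read -r name arn; do
  env=$(aws sagemaker list-tags --resource-arn "$arn" \
        --query "Tags[?Key=='Environment'].Value" --output text)
  if [ "$env" = "dev" ]; then
    aws sagemaker stop-notebook-instance --notebook-instance-name "$name"
  fi
done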

Review all the parameters and choose Launch.

The product is launched and its status is “In progress”. After the status changes to “Succeeded”, go to SageMaker Notebooks and you should see the newly-provisioned notebook.

Choose the notebook name to see the details.

Step 4. Validate using the AWS CloudTrail console to ensure that the AWS KMS key is used during notebook instance creation
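
In the CloudTrail console, open Event history, filter by the CreateNotebookInstance event name, and confirm that the event’s request parameters include the KMS key ID you configured. A rough CLI equivalent (inspect the returned event records for your key ID):

$ aws cloudtrail lookup-events \
    --lookup-attributes AttributeKey=EventName,AttributeValue=CreateNotebookInstance \
    --query 'Events[].CloudTrailEvent'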

 

Conclusion

Customers from enterprises of all sizes have asked for self-service enablement of a machine learning environment for data scientists that comes with the right level of governance. In this blog post, Veb Singh and I show you how AWS Service Catalog now provides an easy way to enforce governance and security for provisioning Amazon SageMaker notebooks. By leveraging AWS Service Catalog, cloud administrators are able to define the right level of controls and enforce data encryption along with centrally-mandated tags for any AWS service used by various groups. At the same time, data scientists can achieve self-service and a better security posture by simply launching an Amazon SageMaker notebook instance through AWS Service Catalog.

About the Authors

Vebhhav (Veb) Singh is a San Francisco-based Sr. Solutions Architect for AWS. Veb works with some of AWS’s large strategic customers. He loves to play with technologies and find simple solutions to complex problems. Veb is passionate about hydroponics. Using AWS IoT, serverless, and machine learning, he has built a fully automated greenhouse with 20+ microcontrollers. Leafy greens and strawberries are available in his greenhouse all year round. He loves to travel to a new destination every year with his family.

 

 

Sanjay Garje leads the US West region’s technical business development for AWS Service Catalog and AWS Control Tower. Sanjay is a passionate technology leader who takes pride in helping customers on their AWS Cloud journeys by showing them how to transform their business and technology outcomes. In his free time, Sanjay enjoys running, learning new things, teaching Cloud & Big Data technologies at SJSU, and traveling to new destinations with his family.

 

 

from AWS Management Tools Blog

The AWS Toolkit for Visual Studio Code (Developer Preview) is Now Available for Download from the Visual Studio Marketplace

The AWS Toolkit for Visual Studio Code is now available for download from the Visual Studio Marketplace, with support for Node.js. Previously, the toolkit was only available on GitHub. The AWS Toolkit for Visual Studio Code is in developer preview.

from What’s New https://aws.amazon.com/about-aws/whats-new/2019/03/the-aws-toolkit-for-visual-studio-code–developer-preview–is-now-available-for-download-from-vs-marketplace/