
Nancy, on a Boat! (Announcing Nancy for Docker)

Nancy has arrived.

You may also enjoy:  Integrating Docker Solutions Into Your CI/CD Pipeline

Nancy is now wrapped up as a Docker image for execution in a pipeline or via an alias in a terminal.

Nancy is a tool to check for vulnerabilities in your Golang dependencies, powered by Sonatype OSS Index. docker-nancy wraps the nancy executable in a Docker image.
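
For reference, here is a minimal sketch of how the image might be wired into a terminal alias or a pipeline step; the image name, tag, and exact invocation below are assumptions, so check the docker-nancy README for the published name and entrypoint.

# Hypothetical alias: run Nancy from the Docker image instead of a local binary
# (image name/tag and arguments are assumptions; see the docker-nancy repo).
alias nancy='docker run --rm -i -v "$PWD":/scan -w /scan sonatypecommunity/nancy:latest'

# Feed it the dependency list of the current Go module and check it against OSS Index:
go list -m all | nancy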

To see how Nancy reports its findings when it detects vulnerabilities, use our intentionally vulnerable repo. Check out this build on Travis-CI or this build on CircleCI.

I demonstrate how you can use docker-nancy in the video below:

Additional details can be found on GitHub. Thank you to The Lonely Island for your late-night inspiration about boats.

Further Reading

Don’t Let Open Source Vulnerabilities Crawl Into Your Docker Images

Check Docker Images for Vulnerabilities With Anchore Engine

from DZone Cloud Zone

Join HashiCorp OSS projects for Hacktoberfest 2019

Every October, open source projects participate in a global event called Hacktoberfest. The event promotes open source and helps beginners learn to contribute. This year HashiCorp's Terraform AWS Provider and Nomad projects would like your help! We hope to spread the word about our OSS work and connect with practitioners who are interested in learning how to contribute.

What’s Hacktoberfest?

The rules are simple: submit four valid pull requests to open source projects on GitHub, and your reward is a t-shirt! Last year, 46,088 people completed the challenge. The four pull requests can be spread across any open source projects. For more information, see the Hacktoberfest FAQ. The challenge ends at the end of the month.

Hacktoberfest for HashiCorp

If you’re interested in helping with HashiCorp projects, check out the Terraform Provider for AWS and Nomad.

First, review the contribution guidelines for each project (such as the Terraform AWS Provider contributing guide), where you’ll find what we look for in a pull request. Then, search the open issues for something that seems interesting to try. We encourage first-time contributors to find a small documentation fix or bug report to work on and submit a PR.

Don’t be afraid to ask questions! If you need some help, drop a reply to the “Hacktoberfest 2019” topic in our community forum and one of the team can help. There are all sorts of new things to learn. We’ll review your pull request within a week, per Hacktoberfest guidelines, and give you some feedback!

With Hacktoberfest, we hope to spread the word about our OSS work and connect with practitioners who are interested in learning how to contribute.

from Hashicorp Blog

Writing About Cloud [Prompts]

Here are a few cloud content ideas for your consideration.

Ever struggle with what to write? No worries, we’ve got you covered. Here’s a list of cloud prompts and article ideas to help cure even the worst cases of writer’s block. So, take a moment, check out the prompts below, pick one (or more!), and get to writing.

Also, please feel free to comment on this post to bounce around ideas, ask questions, or share which prompt(s) you’re working on. 

Need help?  Here’s how to submit a post to DZone!

Prompt #1: Cloud News We Could Use

Technology in the cloud computing space is changing all the time, and exciting news stories and developments emerge seemingly every day. Take, for example, the latest breakthrough in quantum computing from Google, or the shifts happening below the surface at Oracle.

That’s why this first prompt is dedicated to informing the DZone masses about some of the latest happenings in cloud computing. We’re asking for a roundup of some of the latest stories in cloud: at least five headlines, each with a paragraph or two of commentary about how they are shaping cloud computing. And if you can find any Twitter reactions to your news, that’s always great, too.

Prompt #2: Kubernetes-less Tutorials

Kubernetes is the largest and most popular container orchestrator among developers, and for good reason. But we’ve already got tons of tutorials on Kubernetes and how it works in cloud computing environments; now it’s time to let some of the other container orchestrators, like OpenShift or AWS Fargate, take a turn. 

To be clear, we’re not asking for a Kubernetes comparison article (we’ve got those in spades, too). Take your favorite container orchestration tool and tell us how it works, components and all. 

Prompt #3: You’re Doing Cloud Wrong!!

Are you tired of hearing developers legitimately fear that the serverless overlords are coming for their jobs, or of seeing public, private, and hybrid cloud put head-to-head when one is clearly better than the rest? We’re giving you license to rant and rave about it here. If there’s something you need to get off your chest about how developers, engineers, or leaders are talking about an aspect of cloud computing, (respectfully) tell them about it.

Once again, please feel free to comment on this post to share your thoughts, ideas, questions, or other article ideas on what you would like to see on DZone! The sky’s the limit!

from DZone Cloud Zone

Amazon Relational Database Service on VMware Debuts

Amazon Web Services (AWS) announced the general availability of Amazon Relational Database Service (RDS) on VMware, an offering first announced in September 2018.

Amazon Relational Database Service (RDS) is a managed relational database service that works on many popular database engines. AWS says it enables the setup, operation and scaling of a relational database in the cloud with just a few clicks.

In September of last year, AWS announced the preview, saying “Amazon RDS on VMware automates database management regardless of where the database is deployed, freeing up customers to focus on developing and tuning their applications.”

This week, an Oct. 16 blog post announced it’s available for production, with initial support for Microsoft SQL Server, PostgreSQL, and MySQL, providing benefits such as:

  • Compute scaling
  • Database instance health monitoring
  • Failover
  • OS and database patching
  • The ability to provision new on-premises databases in minutes
  • The ability to make backups, and restore to a point in time
  • Automated management of on-premises databases, without having to provision and manage the database engine
Amazon RDS on VMware (source: AWS)

The Amazon RDS on VMware site says: “Amazon RDS provides cost-efficient and resizable capacity while automating time-consuming administration tasks including infrastructure provisioning, database setup, patching, and backups, freeing you to focus on your applications. RDS on VMware brings many of these same benefits to your on-premises deployments, making it easy to set up, operate, and scale databases in VMware vSphere private data centers.”

Prerequisites to note as presented by the blog post include:

  • Compatibility – RDS on VMware works with vSphere clusters that run version 6.5 or better.
  • Connectivity – Your vSphere cluster must have outbound connectivity to the Internet, and must be able to make HTTPS connections to the public AWS endpoints.
  • Permissions – You will need to have Administrative privileges (and the skills to match) on the cluster in order to set up RDS on VMware. You will also need to have (or create) a second set of credentials for use by RDS on VMware.
  • Hardware – The hardware that you use to host RDS on VMware must be listed in the relevant VMware Hardware Compatibility Guide.
  • Resources – Each cluster must have at least 24 vCPUs, 24 GiB of memory, and 180 GB of storage for the on-premises management components of RDS on VMware, along with additional resources to support the on-premises database instances that you launch.

RDS on VMware is available in the US East Region. More information is provided in a FAQ.

About the Author

David Ramel is an editor and writer for Converge360.

from News

Will Kubernetes Fall into the "Shiny Things" Trap?

Everything that glitters isn’t gold forever.

Everybody loves new technology. Just like that shiny toy on Christmas morning, new tech is desirable simply because it’s new.

Software marketers are keenly aware of this “shiny things” principle, and are only too happy to position the most recent addition to the product list as the must-have of the season.

Open-source efforts, of course, can be just as shiny – regardless of whether the latest and greatest is really the solution to the business problems at hand.

Over the years, every new aspect of the technology story – software, hardware, methodologies, architectural approaches – has succumbed to this shiny things syndrome, for better or worse.

Sad to say, usually for the worse. Shininess bears little correlation to the best tool for the job, and given that “new” usually means “immature,” it can often lead decision-makers astray.

Today, there’s no question that Kubernetes is so shiny, you need to wear shades just to talk about it. This open-source container orchestration platform has soundly out-shined all the competition, and is now the wunderkind of cloud-native computing – which, of course, is plenty shiny in its own right.

The question we must answer, therefore, is whether the excitement around the still-new Kubernetes (and cloud-native computing in general) has gone overboard, leading to the counterproductive shiny things scenario.

Or perhaps – just possibly – we’ve learned our lesson this time, and we’ll be able to avoid overplaying Kubernetes’s hand.

Understanding Cloud-Native Computing

The notion of cloud-native computing may be basking in Kubernetes’ reflected glow, but we must still place the platform into cloud-native’s broader context.

The core principle of cloud-native computing is to take the best practices of the cloud and extend them to all of IT. Massive horizontal scalability, elasticity, pay-as-you-go pricing, and all the other wonderfulness we enjoy with cloud computing should now be a core benefit of the rest of IT beyond the cloud, namely on-premises as well as edge computing.

Furthermore, to be cloud-native, we want to build comprehensive abstractions that let us handle all of this complex heterogeneity beneath a seamless management layer, enabling us to focus on the needs of workloads and the applications they support, regardless of the particular choice of execution environment.

Cloud-native computing thus encompasses all of hybrid IT as well as the world of containers and microservices that Kubernetes inhabits. And that’s not all – both serverless and edge computing fall under the cloud-native banner as well.

It may sound like my definition of cloud-native is so broad that it’s merely a bucket we throw everything into – but that perspective misses the key point: cloud-native brings cloud best practices to this heterogeneous mix of environments and technologies.

In other words, cloud-native hybrid IT brings the best of the cloud to the full gamut of hybrid environments, including legacy and virtualized on-premises as well as public and private clouds.

Just because you’re implementing hybrid IT, however, doesn’t mean you’re cloud native – and the same goes for serverless and edge computing as well for that matter.

Why Cloud-Native Computing Might Be Different

What does this exploration of the finer points of cloud-native computing have to do with the shiny things problem? It acts as an antidote.

Kubernetes on its own (without the rest of the cloud-native context) is plenty shiny. Developers and infrastructure engineers are only too happy to focus on all the new and cool parts of the Kubernetes ecosystem at the expense of all that dull, boring tech that came before.

But taking that shiny things path is inherently at odds with the cloud-native story. Cloud-native by design abstracts both shiny and dull, new and old, cloud and on-prem, modern and legacy.

No organization can ever become cloud-native simply by answering the question, “How can we leverage Kubernetes at scale?” Instead, people must ask questions like, “How can we bring Kubernetes, on-premises virtualization, and everything else we’re doing under a single comprehensive abstraction in order to achieve the benefits of the cloud across our entire IT landscape?”

Answering that question will lead you away from shiny things, not toward them.

Keeping the Vendors Away from Shiny

The risk of the shiny things syndrome to enterprises is choosing inappropriate, often immature technology for the wrong reason. For the vendors peddling shiny things, however, the risks are more subtle.

We could think of the “washing” metaphor (as in “cloudwashing”) as a kind of vendor-centric shiny things problem. For example, we might coin the term “Kubernetes-washing” to refer to vendors who rush to stick the term Kubernetes on their products, even though their Kubernetes support may be mostly theoretical.

In fact, “cloud-native-washing” may be a bigger problem than “Kubernetes-washing.” I’ve already seen several vendors slapping “cloud-native” on their old tech, where what they really mean is that the tech might run in the cloud, or perhaps works with software that does.

There is more to cloud-native than running in the cloud, of course. For a vendor’s product to qualify as cloud-native, the vendor must rearchitect it to take advantage of modern cloud best practices, which typically (but not necessarily) means reworking it as containerized microservices.

But even rewriting a piece of tech as microservices isn’t sufficient. The technology must be able to support and take advantage of the comprehensive abstraction that cloud-native computing requires.

For example, there are plenty of zero-trust security solutions on the market today. But in order to be cloud-native zero-trust, a product must fully abstract the physical network, including IP addresses, VLANs, etc.

In the cloud-native world, workloads – and the people that manage them – shouldn’t have to care about such details.

The Dreaded “Embrace and Extend”

In addition to cloud-native-washing and Kubernetes-washing, there is another nefarious side effect of the shiny things syndrome on the horizon: the dreaded “embrace and extend.”

With embrace and extend, a vendor takes some open source software and creates a proprietary offering on top of it that is incompatible with other vendors’ proprietary offerings, as well as with the plain vanilla open source product itself.

Embrace and extend was the primary tactic vendors fought with on the browser-wars battleground of the late 1990s, and it has also given us multiple, generally incompatible versions of Linux.

There are indications, however, that Kubernetes may be different. True, there are already many flavors of Kubernetes out there: Red Hat (IBM), Pivotal (VMware), Google, Microsoft, and AWS, to name a few, all offer their own.

Unlike Linux and other earlier open source efforts, however, the creators of Kubernetes designed it to be extensible as opposed to customizable – giving vendors the ability to embrace and extend the core open source platform without creating versions that would be incompatible with the other versions.

But of course, vendor lock-in is the reason so many vendors leverage shininess to convince customers to buy their proprietary gear. Having a proprietary version of Kubernetes that customers get locked into might be a huge win for the commercial Kubernetes leaders.

Will they fall for their own shiny things syndrome and build incompatible versions of Kubernetes in spite of its inherent extensibility, or will they avert their eyes from the shine and build out tech that supports full compatibility with other Kubernetes flavors?

Only time will tell. Unfortunately, there are already indications that at least some of the big Kubernetes players may be falling for the shiny things syndrome.

The Intellyx Take

Enterprises who are committed to becoming cloud-native should have nothing to do with such incompatible implementations of Kubernetes (or anything else, for that matter).

After all, cloud-native means building an abstraction across all environments that supports compatibility and portability. The last thing anyone wants to do is have to extend this abstraction across different flavors of Kubernetes, simply because the vendors in question don’t want to play ball.

There is hope, of course. Enterprises vote with their wallets. If a vendor is working at cross purposes to your organization’s cloud-native priorities, simply take your business elsewhere.

The smart vendors will get your message. The others? You don’t want to do business with them anyway.

Further Reading

The Complete Kubernetes Collection [Tutorials and Tools]

What Exactly Is a Cloud-Native Application?

from DZone Cloud Zone

EC2 Scheduling Best Practices

Save money and optimize costs with EC2 scheduling.

The “pay for what you consume” model has been favored by AWS users for as long as one can remember. Even with EC2, certain servers need to be kept running 24/7, usually in the production environment, but other servers in the staging and dev environments are only used at certain calculable intervals. Running all your instances at all times therefore prevents you from leveraging the best part of AWS. This is where basic instance scheduling comes in: you automate the start and stop of such non-essential instances according to a schedule.

You may also enjoy:  Provision a Free AWS EC2 Instance in 5 Minutes

EC2 instance scheduling has itself become a best practice for optimizing instance usage and costs, but we’ve realized that the scheduling process can also be optimized, with certain best practices of its own. The savings you generate just through basic scheduling around regular business hours can run up to almost 70%; imagine what happens if you go a step further. Here are some best practices to follow when it comes to scheduling your instances.

1. Find the Right Instances to Schedule

The first step in setting up a schedule is identifying the instances that need to be subjected to one in the first place. As we’ve said earlier, instances used for dev, staging, or testing purposes are ideal for scheduling. It is essential to ensure that no critical instances (like production) are incorrectly selected. Certain production instances can also be scheduled if their uptime is predictable and limited.

2. Tag Correctly

Standardized naming conventions or tags for all your instances help you differentiate between production and development instances. To enable easier scheduling, set server hostnames based on environment type (staging, test, prod, etc.). This way, the non-essential resources can be filtered out easily.

Additionally, you can encode the schedule in the tag itself to ensure the right instances are selected. For instance, if you want your instance to start at 7 a.m. and stop at 6 p.m. from Monday to Friday, your tag value could be something along the lines of “700;1800;mon-fri”.
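
As a rough sketch of how such a tag can drive automation, the snippet below assumes a tag key of "Schedule" (that key name is purely illustrative) holding the value above; a cron job or Lambda running at the boundary times could then do something like this with the AWS CLI:

# Find running instances carrying the business-hours schedule tag
ids=$(aws ec2 describe-instances \
  --filters "Name=tag:Schedule,Values=700;1800;mon-fri" \
            "Name=instance-state-name,Values=running" \
  --query "Reservations[].Instances[].InstanceId" --output text)

# Stop them at 18:00 on weekdays; a matching start-instances call would run at 07:00
[ -n "$ids" ] && aws ec2 stop-instances --instance-ids $ids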

3. Find the Ideal Scheduling Time

Once you’ve identified the instances that qualify as non-essential, it’s imperative to assign them the right schedule. In most cases, you can estimate the running times for non-essential instances like staging and dev. Chances are that such instances only need to be running between 8 a.m. and 5 p.m. at most. These timings can be substantiated through instance usage reports, which TotalCloud can generate for you in a single workflow.

4. Go Micro on It (Auto-Scheduling)

A basic 9-hour-a-day schedule means your instances are running 45 out of 168 hours in a week. But in reality, actual usage will only be about 30-40%, or roughly 18-20 hours. This means that despite scheduling, you’re paying for more than what you use. Real-time auto-scheduling (a unique TotalCloud feature!) can shut down an instance based on actual metric usage; for instance, if you define the CPU utilization threshold as less than 10% for a period of 30 minutes or more, the instance will shut down when that condition is met. Imagine the magnitude of savings if you schedule for 20 hours instead of 45!
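
TotalCloud’s auto-scheduling is its own feature, but as a hedged sketch of the underlying idea, a plain CloudWatch alarm can stop an instance whose average CPU stays under 10% for 30 minutes (the instance ID and region below are placeholders):

aws cloudwatch put-metric-alarm \
  --alarm-name stop-idle-dev-instance \
  --namespace AWS/EC2 \
  --metric-name CPUUtilization \
  --dimensions Name=InstanceId,Value=i-0123456789abcdef0 \
  --statistic Average \
  --period 1800 \
  --evaluation-periods 1 \
  --threshold 10 \
  --comparison-operator LessThanThreshold \
  --alarm-actions arn:aws:automate:us-east-1:ec2:stop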

5. Implement Manual Schedule Overrides

At any given point in time, having an option to manually override a schedule counts as a safety measure. In case of any ad hoc usage changes or during maintenance activities, you can instantly start or stop the instance to maintain more uptime or downtime, as required.
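
In AWS terms, a manual override can be as simple as an explicit start or stop call (or the equivalent console button); the instance ID below is a placeholder:

# Bring a scheduled-down instance up for ad hoc work, then stop it again when done
$ aws ec2 start-instances --instance-ids i-0123456789abcdef0
$ aws ec2 stop-instances --instance-ids i-0123456789abcdef0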

6. Take Into Account Instance Restart Behavior

When an instance is restarted, certain configurations need to be accounted for.

  • IP Address: An instance’s public IP address changes upon restart, and servers with hardcoded IPs will be unable to find each other after a restart. It is imperative to structure your AWS architecture in a way that minimizes the effect of this issue.
  • Elastic IP: If your instance is attached to an EIP, the instance needs to be reattached to the EIP upon restart (VPC instances keep EIPs associated through stop/start.)
  • ELB: If the instance was attached to an ELB, ensure that it is re-registered to the ELB upon restart.
  • ASG: Similarly, if an instance was part of an ASG, the ASG settings need to be configured to accommodate the scheduled instance.

7. Enable Notifications and Approval

It serves you better to be kept informed of when an instance’s start/stop action has been executed, so you can track whether the tagging is accurate and whether any changes need to be made. In certain cases, you can also set up user approval before scheduling, so that you are made aware when an instance is about to be stopped or started and can approve the action.
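
One hedged way to wire up such notifications on AWS itself is a CloudWatch Events (EventBridge) rule that forwards EC2 state-change events to an SNS topic you subscribe to; the rule name and topic ARN below are placeholders:

$ aws events put-rule \
    --name ec2-schedule-notifications \
    --event-pattern '{"source":["aws.ec2"],"detail-type":["EC2 Instance State-change Notification"]}'
$ aws events put-targets \
    --rule ec2-schedule-notifications \
    --targets 'Id=1,Arn=arn:aws:sns:us-east-1:123456789012:ec2-schedule-alerts'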

In Conclusion

With cloud users leveraging every available opportunity to save money on the cloud, the importance of resource scheduling has shot up. It gets even easier when you let us automate it completely for you, from basic scheduling to auto-scheduling to fully customized scheduling. The cost you incur with us will be offset by the incredible amount of money you save on your instances. You can sign up for a free trial to find out how this works!

Further Reading

Scheduling Jobs Using PCF Scheduler

28 Actions You Can Take to Manage Cloud Costs

Hardening an AWS EC2 Instance

from DZone Cloud Zone

Azure Cosmos DB Change Feed: A Zero Downtime Data Migration Story


Azure Cosmos DB is a fully managed, highly scalable, multi-model PaaS database solution offered by Microsoft as part of the Azure platform stack.

Azure Cosmos DB offers many useful features that can be easily enabled/disabled via feature toggles and fine-tuned through the Azure portal; the one we’re going to talk about today is called Change Feed.

What’s a Change Feed?

Change Feed is a system that allows you to listen to all insertion and update events that happen on records (documents) inside any given Azure Cosmos DB collection.

You can imagine it as a sort of commit log that can be queried at any time to retrieve useful information about whatever happens to your collection of interest.

You may also like: Experience Using Azure Cosmos DB in a Commercial Project.

When Should I Use a Change Feed?

There are several common use cases for employing a Change Feed as an event sourcing mechanism.

Employing a change feed
  1. Pulling data from the Change Feed as it comes, and triggering calls to external APIs (including Azure Functions, Azure Logic Apps and the like!) as a consequence.

    Imagine a system that needs to send out notification emails to all of your subscribers whenever a new article (stored on Azure Cosmos DB) is published. Consuming the Change Feed and channeling it into an Azure Function, which in turn triggers an Azure Logic App that uses an Office 365 Outlook connector, would make for a very elegant, highly scalable, and fully serverless solution!

  2. Data ingestion and near-real-time processing via tools such as Azure Stream Analytics or Apache Spark. When using the right API for the task, it is possible to fine-tune parameters such as the Change Feed polling window and the max item count for a given polling window, leaving us with a quite predictable and controlled stream of data that can be used for further elaboration without fear of stressing the system beyond its capacity.
  3. Finally, my favourite scenario: zero downtime data migrations can easily be performed by consuming the Change Feed and replicating all insertion and update events coming from it into another persistence solution (e.g., Azure Redis Cache). This allows multiple applications to work on the same data at any given time, laying the basis for a resilient system with a solid disaster recovery plan and zero waiting time.

How to Use the Azure Cosmos DB Change Feed

There are three main ways you can interact with the Azure Cosmos DB Change Feed:

  1. Using the SQL API SDK library for Java or .NET, and specifically the AsyncDocumentClient.queryDocumentChangeFeed API; this is the low-level approach. You have full control over the Change Feed interaction, but at a significant price: you must reinvent the wheel!

    You’ll have to implement your own polling and leasing mechanisms if you want to use this API to continuously consume the Change Feed. It is really only advised to use this API for scenarios where you need to query the Change Feed just once.

  2. Again, using the SQL API SDK library for Java/.NET, but this time invoking the high-level ChangeFeedProcessor API. The ChangeFeedProcessor is quite easy to use, as it doesn’t require a large amount of code in order to work properly and has polling, leasing, and customization mechanisms already in place.
  3. Finally, the highest-level solution is to use Azure Functions that are invoked by Azure Cosmos DB Change Feed triggers automatically. This is the full serverless approach and should be the way to go when the scenario to be handled is fairly simple and there is no useful code in your applications that should be re-used (the lower-level APIs could directly interact with your existing code as they would live within the same codebase, whereas an Azure Function would have to perform an HTTP request against your application or invoke its services by producing events, making it less efficient for the task at hand).

Change Feed Processor

Of the three ways we just talked about, my personal favorite for tricky scenarios would have to be the ChangeFeedProcessor API, since it offers the best compromise between ease of use and customizability.

Polling, Leasing, and Customization

One of the main things that makes the ChangeFeedProcessor such a great API is the fact that polling and leasing are already implemented for us. As far as polling is concerned, the Change Feed is queried by the ChangeFeedProcessor under the hood every feed poll delay seconds. We can easily change the feed poll delay, which is set to five seconds by default.

Leasing is the mechanism through which a ChangeFeedProcessor acquires unique ownership of one or more partition ranges within a Change Feed, distributing work evenly amongst multiple ChangeFeedProcessors (making sure they don’t end up consuming the same events) and keeping track of the last event that was successfully read from the Change Feed, so that you can resume consuming the feed from that event in a failover scenario (once the lease has been freed up by its previous owner, either actively or through lease expiration; the default lease expiration interval is 60 seconds).

Leasing relies on a technical collection that must be created in advance and is used to store all the leases for a given Change Feed, their current owners, and the related continuation tokens. It doesn’t need any special name, and the minimum RU provisioning will most likely suffice for the vast majority of use cases.

It should also be noted that it’s possible to have multiple ChangeFeedProcessors consume the same events from the Change Feed by configuring them to use different lease prefixes.

Azure Cosmos DB ChangeFeedProcessor leasing diagram

The diagram above shows two different hosts (e.g., microservices), each with its own ChangeFeedProcessor, each gaining ownership of two leases (each lease is represented as a single document inside the leases technical collection), thereby equally distributing the work by consuming events from different partition ranges within the Change Feed (e.g., host 1 will only consume events for cities that start with A..I, whereas host 2 will only consume events for cities that start with J..S).

Customization is possible via the ChangeFeedProcessorOptions API, which lets us tune all sorts of things, from the feed poll delay to the max item count, lease prefix and lease expiration interval.

It can also be used to specify whether the Change Feed should be consumed starting from the beginning, from a certain date-time, or from the stored continuation token within the lease (the latter is the default behaviour)!

Using the ChangeFeedProcessor

First of all, import the required SQL API SDK dependency for Java by adding the snippet below to your pom.xml:

<dependency>
    <groupId>com.microsoft.azure</groupId>
    <artifactId>azure-cosmos</artifactId>
    <version>3.2.1</version>
</dependency>

Then build a CosmosClient, with your connection properties:

public static CosmosClient getCosmosClient() {
  return CosmosClient.builder()
    .endpoint("my-cosmosdb-endpoint")
    .key("my-cosmosdb-master-key")
    .connectionPolicy(ConnectionPolicy.defaultPolicy())
    .consistencyLevel(ConsistencyLevel.EVENTUAL)
    .build();
}

Finally, build and start your ChangeFeedProcessor, providing at least: the hostname that will be used as an identifier for lease ownership, the feedContainer (the collection that should provide the Change Feed), and the leaseContainer (the technical collection used to store the leases):

public static void startChangeFeedProcessor(String hostName, CosmosContainer feedContainer, CosmosContainer leaseContainer) {
    ChangeFeedProcessor.Builder()
        .hostName(hostName)                // identifier used for lease ownership
        .feedContainer(feedContainer)      // monitored collection that provides the Change Feed
        .leaseContainer(leaseContainer)    // technical collection that stores the leases
        .options(new ChangeFeedProcessorOptions().leasePrefix("my-lease-prefix")) // not required
        .handleChanges(docs -> {
            // callback invoked with each batch of inserted/updated documents
            for (CosmosItemProperties document : docs) {
                System.out.println("Read document from Change Feed: " + document.toJson(SerializationFormattingPolicy.INDENTED));
            }
        })
        .build()
        .start()
        .subscribe();
}

This is really all there is to it!

Once the ChangeFeedProcessor has been started, it will keep receiving events from the Change Feed until it is gracefully stopped by invoking the ChangeFeedProcessor#stop() method, which releases the leases that were owned by that particular ChangeFeedProcessor so that the others can immediately gain their ownership and keep consuming events as they come without having to wait for the expiration interval.

Conclusion

The Azure Cosmos DB Change Feed is a nifty feature that can be used to forge elegant, resilient and scalable solutions for a specific set of use cases.

It should not be overlooked, especially when your requirements call for its usage and you already rely on Azure Cosmos DB for other purposes!

from DZone Cloud Zone

Developing Serverless Applications With Quarkus


Learn more about developing serverless apps with Quarkus.

In the first part of this article, I will explain both Quarkus and Knative. In the second part, I describe how to write simple microservices with Quarkus in Visual Studio Code in just a few minutes. And finally, in the last part of this post, I walk you through how to deploy and run microservices as serverless applications on Kubernetes via Knative.

Let’s get started!

You may also like: Thoughts on Quarkus

Quarkus

Quarkus is an open-source project led by Red Hat: a Java framework for building microservices that require little memory and start very fast. In other words, Quarkus is a great technology for developing efficient containers.

From the outset, Quarkus has been designed around a container-first philosophy. What this means, in real terms, is that Quarkus is optimized for low memory usage and fast startup.

Quarkus is not a full Jakarta EE application server but comes with a lot of similar functionality. The big difference is that Quarkus optimizes for container workloads by doing as much processing as possible at build-time rather than at run-time. This means that functionality like reflection is not supported.

Quarkus comes in two flavors. You can run it on classic JVMs like HotSpot, or you can compile the Java code to a native binary via GraalVM; with a native binary, memory usage is even lower and startup time is even shorter. Here are the measurements from the Quarkus homepage.

 Measurements from the Quarkus homepage

In this article, I focus on classic JVMs. Rather than using Hotspot, I’ve used OpenJ9, which requires only roughly half of the memory compared to Hotspot. In my little test, my simple microservice with a REST endpoint required only 34 MB, which is similar to the Node.js node:8-alpine image. Until recently, serverless applications were primarily developed with less resource-intensive technologies like Node.js.

I think that, with Quarkus, more enterprise developers will leverage their existing Java skills to build container workloads, for example, serverless applications. In the cloud, where you pay for (1) memory usage and (2) the duration your code runs, frameworks like Quarkus make a huge difference.

Knative

Knative is an open-source project initiated by Google with contributions from over 50 companies, for example, Red Hat and IBM. Knative allows running containers on Kubernetes in a serverless fashion so that containers only run when actually needed.

Run serverless containers on Kubernetes with ease. Knative takes care of the details of networking, autoscaling (even to zero), and revision tracking. You just have to focus on your core logic.

I think Knative’s key functionality is that it supports ‘scale to zero’. When containers are not needed anymore, they are shut down automatically so that they don’t consume compute resources. In other words, you can run more containers in a cluster, just not all of them at the same time.

The big challenge for all serverless platforms is how to handle ‘cold starts’ of containers. The first time endpoints of containers are invoked, the containers need to be started first. Since Knative shuts down containers after a certain duration of inactivity, the same ‘cold starts’ occur the next time endpoints are invoked.

When using Knative, the container with the microservice is not the only container in the pod. Additionally, an Istio proxy and a Knative queue proxy are running, and a fourth Istio container initializes the pod. These other three containers need to be started in addition to the microservice container. Even though Quarkus containers start in less than a second, the overall startup time of the pod is much longer. In my little test, it took 16 seconds to get responses from my Quarkus container. So until there are better ways to handle cold starts, Knative shouldn’t be used for all kinds of workloads.

I think the best usage of Knative are scenarios where a lot of compute is required infrequently and responses in a few milliseconds are not required. A good example is the processing of large data sets of massive scale, which can be handled with serverless technologies running in parallel.

Development of Microservices With Quarkus

After the introduction of Quarkus and Knative, let’s now move on and take a closer look at how these technologies can be used. In this part, I describe how to build your first microservice with Quarkus.

I think Quarkus provides an awesome developer experience. There is an extension for Visual Studio Code to create projects very easily. Quarkus also supports a development mode for hot code replace and built-in debugging functionality.

To create a new project, open the palette (on Mac via ⇧⌘P) and choose “>Quarkus: Generate a Maven project”.

Generating a Maven project in Quarkus

When you accept all defaults, a simple project is created with a greeting resource.

Project created with greeting resource

After this, you can open a terminal and invoke “./mvnw compile quarkus:dev” to use the development mode. The endpoint will be accessible via the URL “http://localhost:8080/hello”. Try out the hot replace functionality! Very nice.
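
For reference, the dev-mode loop from a terminal looks like this (the endpoint path comes from the generated greeting resource):

$ ./mvnw compile quarkus:dev
$ curl http://localhost:8080/hello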

Additionally, the VS Code extension supports debugging; the screenshot below shows the debugger in action:

Building Quarkus Images

Before microservices can be deployed to Kubernetes, the Java applications and the Docker images have to be built.

When running Maven via “./mvnw package”, two jar files are created in the /target directory.

  • getting-started-1.0-SNAPSHOT.jar: Classes and resources of the projects
  • getting-started-1.0-SNAPSHOT-runner.jar: Executable jar. Dependencies are needed additionally (in /target/lib)

Two Jars created from running Maven

The VS Code extension also creates a Dockerfile to run the Quarkus application with Hotspot. However, I’ve used OpenJ9 since it consumes only half of the memory. To use OpenJ9, create the file ‘src/main/docker/Dockerfile.jvm-j9’ with the following content.

# Run the Quarkus runner jar on the OpenJ9 JRE
FROM adoptopenjdk:8-jre-openj9
RUN mkdir /opt/app
# The runner jar expects its dependencies in an adjacent lib/ directory
COPY target/lib/* /opt/app/lib/
COPY target/*-runner.jar /opt/app/app.jar
CMD ["java", "-jar", "/opt/app/app.jar"]

To build the images and to run the container locally, run the following commands.

$ docker build -f src/main/docker/Dockerfile.jvm-j9 -t quarkus/quarkus-getting-started-jvm-j9 .
$ docker run -i --rm -p 8080:8080 quarkus/quarkus-getting-started-jvm-j9
$ curl http://localhost:8080/hello

The Quarkus application starts in around 0.7 seconds.

Quarkus app starts in around .7 seconds

The application requires 32 MB of memory.
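
If you want to verify the footprint on your own machine, one way is to check the running container's stats (use the container ID or name that docker ps reports for your run):

$ docker ps
$ docker stats --no-stream <container-id>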

Next, push the image to DockerHub. Replace ‘nheidloff’ with your DockerHub name.

docker tag quarkus/quarkus-getting-started-jvm-j9 nheidloff/quarkus-getting-started-jvm-j9
docker login
docker push nheidloff/quarkus-getting-started-jvm-j9

Deploying Quarkus Images as Serverless Applications

In order to deploy Quarkus microservices as a serverless application, you need to install Knative on Kubernetes. One option is to do this locally with Minikube.

I’ve used another option. IBM provides a managed Kubernetes service. To install Knative, you can simply activate it in the user interface. Read the documentation for details.

Before deploying the application, log in to the IBM Cloud.

$ ibmcloud login -a cloud.ibm.com -r eu-de -g default
$ ibmcloud ks cluster config --cluster <cluster-name-or-id>
$ export KUBECONFIG=<file-from-previous-command>
$ kubectl get pods

One option to deploy the application is to use yaml files with the custom resources provided by Knative. Another option is to use the Knative CLI (command-line interface) kn. In that case, you only have to invoke one command.

$ kn service create quarkus-getting-started --image nheidloff/quarkus-getting-started-jvm-j9
$ kn service list

The screenshot shows the three containers, including the Quarkus container, running in the same pod. Invocations of the greeting endpoint take between 0.06 and 0.2 seconds, depending on the network.

To find out memory usage, invoke the following command:

$ kubectl get pods
$ kubectl top pod quarkus-getting-started-<your-pod-id> --containers

The Quarkus container with OpenJ9 consumes 34 MB of memory. The first invocation of the endpoint (cold start) took 17 seconds; after that, calls took only 0.7 seconds.

Next Steps

I hope you enjoyed this article! To find out more about these technologies, check out the following resources.

Further Reading

Serverless in Your Microservices Architecture

Thoughts on Quarkus

Lightweight Serverless Java Functions With Quarkus

from DZone Cloud Zone

Getting Started With EMR Hive on Alluxio in 10 Minutes


Find out what the buzz is behind working with Hive and Alluxio.

This tutorial describes steps to set up an EMR cluster with Alluxio as a distributed caching layer for Hive, and run sample queries to access data in S3 through Alluxio.

You may also enjoy:  Distributed Data Querying With Alluxio

Prerequisites

  • Install the AWS command line tool on your local laptop. If you are running Linux or macOS, it is as simple as running pip install awscli.
  • Create an EC2 key pair from the EC2 console if you don’t have an existing one.

Step 1: Create an EMR Cluster

First, let’s create an EMR cluster with Hive as its built-in application and Alluxio as an additional application through bootstrap scripts. The following command will submit a request to create such a cluster with one master and two worker instances running on EC2. Remember to replace “alluxio-aws-east” in the following command with your AWS keypair name, and “m4.xlarge” with the EC2 instance type you’d like to use. Check out this page for more details on this bootstrap script.

$ aws emr create-cluster \
--release-label emr-5.25.0 \
--instance-count 3 \
--instance-type m4.xlarge \
--applications Name=Hive \
--name 'EMR-Alluxio' \
--bootstrap-actions \
Path=s3://alluxio-public/emr/2.0.1/alluxio-emr.sh,\
Args=[s3://apc999/emr-tutorial/example-ml-100] \
--configurations https://alluxio-public.s3.amazonaws.com/emr/2.0.1/alluxio-emr.json \
--ec2-attributes KeyName=alluxio-aws-east

You can check the progress on the AWS EMR console. This process can take 5 to 10 minutes until the status shows “Waiting Cluster ready” as shown in the screenshot below.

Waiting Cluster ready

So far, we have a three-node cluster running.
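
If you prefer the CLI over the console, you can also poll the cluster state with the cluster ID returned by the create-cluster call (the ID below is a placeholder):

$ aws emr describe-cluster --cluster-id j-XXXXXXXXXXXXX \
    --query "Cluster.Status.State" --output text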

Step 2: Create a Hive Table on Alluxio

Log in to the master node (its hostname will be different for your run; check the “Cluster details” on the console page).

$ ssh -i /path/to/alluxio-aws-east.pem hadoop@<master-public-dns>

Check to see whether the S3 bucket “apc999” with my example input data has been properly mounted. Note that this bucket is pre-configured to be a public bucket and accessible for all AWS users.

[hadoop@<master-node> ~]$ alluxio fs mount
s3://apc999/emr-tutorial/example-ml-100  on  /  (s3, capacity=-1B, used=-1B, not read-only, not shared, properties={})
[hadoop@<master-node> ~]$ alluxio fs ls -R /
              1       PERSISTED 10-07-2019 20:32:09:071  DIR /ml-100k
          22628       PERSISTED 10-01-2019 07:15:07:000 100% /ml-100k/u.user

Start Hive and run a simple HQL query to create an external table “users” based on the file in Alluxio directory /ml-100k:

[hadoop@<master-node> ~]$ hive
> DROP TABLE IF EXISTS users;
> CREATE EXTERNAL TABLE users (
userid INT,
age INT,
gender CHAR(1),
occupation STRING,
zipcode STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
LOCATION 'alluxio:///ml-100k';

Step 3: Query the Hive Table

After creating this external table, run Hive with the following query to scan the table users and select the first 10 records from this table:

> SELECT * FROM users limit 10;

You will see results like:

1   24  M   technician  85711
2   53  F   other   94043
3   23  M   writer  32067
4   24  M   technician  43537
5   33  F   other   15213
6   42  M   executive   98101
7   57  M   administrator   91344
8   36  M   administrator   05201
9   29  M   student 01002
10  53  M   lawyer  90703

Step 4: Write a New Table

Let us mount a new bucket, where you have write permission, into the same Alluxio file system namespace. Make sure you can write to this bucket address. In my example, I mounted a new Alluxio directory /output with a bucket path (writable only by me) under s3://apc999/output.

[hadoop@<master-node> ~]$ alluxio fs mount /output s3://apc999/output
Mounted s3://apc999/output at /output

Inside Hive, write a new table to the output directory:

> DROP TABLE IF EXISTS new_users;
> CREATE EXTERNAL TABLE new_users (
userid INT,
age INT,
gender CHAR(1),
occupation STRING,
zipcode STRING)
LOCATION 'alluxio:///output/';
> INSERT OVERWRITE TABLE new_users SELECT * from users;

The above queries will create a new table called new_users based on the same content in table users. One can check the data inside alluxio:///output:

[hadoop@<master-node> ~]$ alluxio fs ls -R /output
          22628       PERSISTED 10-07-2019 21:36:22:506 100% /output/000000_0

Summary

In this tutorial, we demonstrated how to run EMR Hive with Alluxio in a few simple steps based on the Alluxio bootstrap scripts. Feel free to ask questions in our Alluxio community Slack channel.

Further Reading

How to Properly Collect AWS EMR Metrics

from DZone Cloud Zone

Schedule Azure WebJobs Using Azure Logic Apps


Migrate your Azure WebJobs with Azure Logic Apps before it’s too late.

In this article, we will see how to migrate Azure WebJobs from Azure Scheduler to Azure Logic Apps. As we all know, Azure Scheduler will be retired on December 31, 2019, after which all Scheduler job collections and jobs will stop running and will be deleted from the system.

You may also enjoy:  An Inside Look at Microsoft Azure WebJobs

In order to continue running these jobs, we must move them from Azure Scheduler to Azure Logic Apps as soon as possible. Azure Logic Apps has numerous features:

  • Uses a visual designer and connectors to integrate with more than 200 different services, including Azure Blob storage, Azure Service Bus, Microsoft Outlook, and SAP.

  • Manages each scheduled workload as a first-class Azure resource.

  • Runs multiple one-time jobs using a single logic app.

  • Sets schedules that automatically adjust to daylight saving time.

  • For more details about its features and usage, please refer to this article.

WebJobs is a feature of the Azure App Service that enables you to run a program or script in the same context as a web app, API app, or mobile app. There is no additional cost to use WebJobs. 

Schedule Azure WebJobs Using Azure Logic Apps

Step 1: Create and deploy an on-demand (triggered) job under Azure App Service. Click here to learn how to create and deploy a WebJob.

Create on-demand job

Step 2: Create a blank Logic App.

Blank Logic App

Step 3: Edit the Logic App and select the Recurrence trigger from the Schedule category.

Recurrence type

Step 4: Set the interval according to your requirements. You can also set other parameters like Time Zone and Start Time, as shown in the image below. In my case, I set an interval of 1 minute without any extra parameters.

Setting parameters

Step 5: After configuring the trigger, it’s time to add an HTTP action with the POST method and basic authentication, as shown in the snippet below.

Set HTTP action

"inputs": {

   "authentication": {

       "password": "password",

       "type": "Basic",

       "username": "$username"

        },

    "method": "POST",

    "uri": https://appname-webjob.scm.azurewebsites.net/api/webjobtype/webjobname/run

  },

Authentication

Note: You can get the basic authentication credentials from the publish profile of the application where the WebJob is deployed.
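
Before wiring this into the Logic App, you can sanity-check the endpoint and credentials manually; here is a hedged sketch with curl, where the app name, WebJob name, and credentials are placeholders and the path uses "triggeredwebjobs" as the webjobtype segment for an on-demand (triggered) job:

$ curl -X POST -u '$username:password' \
    "https://appname-webjob.scm.azurewebsites.net/api/triggeredwebjobs/webjobname/run"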

Step 6: Run the Logic App and check the status under run history.

Status check

WebJob Details

Note: You can download the code and copy and paste it into code view. Change the credentials and URI according to your WebJob, and your Recurrence-type Logic App is ready.

Expected Results

The Logic App has been configured correctly and is working as expected: it runs the Azure WebJob at the set frequency. If there is an error in the WebJob, the run will fail, and you can see it under Runs History.

Summary

In this article, we have learned how to schedule WebJobs using Azure Logic Apps. As noted above, Azure Scheduler is scheduled to retire fully on December 31, 2019, at which point all Scheduler job collections and jobs will stop running and be deleted from the system. So, before that happens, shift your Azure Scheduler jobs to Azure Logic Apps as soon as possible.

Further Reading

Azure Logic Apps Lifecycle – The Big Picture

Azure WebJobs vs. Azure Functions

from DZone Cloud Zone