How to Get the Global Scalability of AWS Storage at Local Speed with Nasuni

By Henry Axelrod, Partner Solutions Architect at AWS

Being able to access files via standard file protocols from one or more locations is important to many organizations.

In this post, I will explore how Nasuni’s solution allows customers to access files across many locations through the use of a physical or virtual appliance. You can run these appliances in a data center or the Amazon Web Services (AWS) Cloud.

Nasuni is an AWS Partner Network (APN) Advanced Technology Partner with the AWS Storage Competency. With industry-leading encryption, local performance, and pay-as-you-go pricing, Nasuni makes it easy and cost-effective to add storage capacity.

With Nasuni, all data is stored on highly durable Amazon Simple Storage Service (Amazon S3). This enables customers to access their data from anywhere over standard file protocols, at local performance and with many of the NAS features they are used to, without having to maintain full replicas of the data in each location.

The Nasuni system has two main components: edge appliances, which can be deployed anywhere, and a storage layer, where data is stored. The edge appliances present data to users on the frontend, cache the latest data, and persist it to Amazon S3.

Because Nasuni appliances are just used for caching, Nasuni is not bound by any traditional capacity limits and is able to take advantage of the virtually unlimited capacity of Amazon S3.

Nasuni also has a control plane running on AWS that enables functionality such as global file locking.

There’s also a Nasuni management console that lets you monitor and manage multiple Nasuni edge appliances.

Figure 1 – Nasuni architecture on AWS.

Deployment

Nasuni can be procured from AWS Marketplace or purchased as a hardware appliance. Once you have procured Nasuni, you can access the Nasuni portal, where you can download a virtual machine template.

For the Amazon Elastic Compute Cloud (Amazon EC2) version, Amazon Machine Images (AMIs) for both the edge appliance and the management console are shared with your AWS account. Full details about the installation on Amazon EC2 can be found in the Nasuni EC2 Installation Guide.

Make sure you add the instance to an existing or new security group that grants HTTPS access from the machine you'll be using for configuration.
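If you prefer to script this step, a rule like the following can be added with the AWS CLI. This is a minimal sketch; the security group ID and source CIDR are placeholders for your own values:

# Allow HTTPS only from the workstation that will perform the initial configuration
aws ec2 authorize-security-group-ingress \
    --group-id sg-0123456789abcdef0 \
    --protocol tcp \
    --port 443 \
    --cidr 203.0.113.10/32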

After the AMI has been successfully deployed and the instance has started, it may take a few minutes before you can access the web console while the system boots and initializes. In the Amazon EC2 console, you can go to Actions > Instance Settings > Get Instance Screenshot to see how close the instance is to being ready. You will see a screen similar to the one below that lets you know your instance is ready to be accessed.

Figure 2 – Instance readiness alert.
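If you'd rather check readiness from the command line, the same screenshot shown in Figure 2 is also available through the AWS CLI. The instance ID and output file name below are placeholders:

# Fetch the console screenshot and decode it to a local image file
aws ec2 get-console-screenshot \
    --instance-id i-0123456789abcdef0 \
    --query ImageData \
    --output text | base64 --decode > nasuni-boot.jpg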

Configuration

Access the edge appliance's administrative interface using https://<FILE_IP_ADDRESS OR DNS>.

In this scenario, two appliances were installed: one in us-east-1 and one in us-west-1.

Once in the interface, you’ll be prompted to specify a host name and network info. Learn more about these steps in the Nasuni Filer Initial Configuration Guide.

Please note that until you finish the initial configuration, anyone with HTTPS network access to the system can configure the host. Make sure to lock down the security group so that HTTPS access is limited to the machines or networks that need to reach the administrative interface.

At this point, you’ll need to grab one of your serial numbers and authorization codes from the Nasuni portal. Follow the remaining prompts and set up a username and password for the initial administrator.

Once done with the configuration wizard, you will be on the home screen of the filer and ready to set up a volume, which is the logical location where data is stored. Since data can be accessed by one or more filers, we can start by adding a new volume to the first filer and then add that existing volume to the second filer.

You can go to the Add New Volume screen, as seen below. In this case, the volume is named “eastcoast”.

Figure 3 – Adding a new volume.

You can select the volume to be either CIFS or NFS. By default, Nasuni will create a share or export of the volume. You can uncheck the box if you want to manually create shares or exports. In this case, an NFS volume was created and the default export setting was retained.

Figure 4 – Create a default share or export.

Once the volume has been created, you can take the first snapshot of the volume using the Take Snapshot Now button on the volume properties page. Once you’ve taken the snapshot, an Amazon S3 bucket is created in your account with a name that begins with the prefix "nasunifiler".
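As a quick sanity check, you can confirm the bucket exists from the AWS CLI. This assumes your CLI credentials point at the same AWS account the filer is using:

# List buckets created by Nasuni (names start with "nasunifiler")
aws s3 ls | grep nasunifiler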

Still within the volume properties, you can select Remote Access to enable the volume to be shared with other filers, as seen below. You can enable read-only or read/write access for all other filers in your account, or customize access on a per-filer basis.

Figure 5 – Remote Access dialog box.

Working with the Second Filer

Now you can go to the administrative interface of the “westcoast” edge appliance at https://<FILE_IP_ADDRESS OR DNS> and follow the same configuration steps as detailed above, up until adding the volume.

Instead of adding a volume this time, you can go to the All Volumes page where you should see your previously created volume, to which you can connect. You can inherit settings or customize the settings for this filer.

Figure 6 – Connecting to the second filer.

Access the Volume

First, set up Linux instances in us-east-1 and us-west-1, respectively. Next, mount the volume on the clients to the filer in the same region.
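Before running the mount commands below, the NFS client utilities and the local mount point need to exist on each client. A minimal sketch, assuming Amazon Linux clients (the package name and mount path are assumptions):

# Install the NFS client tools (package name assumes Amazon Linux / RHEL family)
sudo yum install -y nfs-utils

# Create the local mount point used in the examples below (run on both clients)
sudo mkdir -p /mnt/nasuni-eastcoast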

On the us-east-1 client, use the DNS name of the filer on the East Coast and the name of the export, which in this case is the volume name:

mount eastfiler:/nfs/eastcoast /mnt/nasuni-eastcoast

On the us-west-1 client, use the DNS name of the filer on the West Coast:

mount westfiler:/nfs/eastcoast /mnt/nasuni-eastcoast

Next, create a simple file called data.txt on the East Coast client:

echo "this is my data" > data.txt

You can now see the file data.txt on the volume and read it.

Within minutes, if you check the West Coast client you should be able to successfully read the file that was created on the East Coast filer.

# cat data.txt

this is my data

In less than an hour, you can set up a file system that’s accessible across the country, enabling geographically dispersed users to collaborate seamlessly.

Summary

Enabling file sharing across multiple geographic locations can be simple when using the Nasuni solution. Combined with the scale and durability of Amazon S3, Nasuni provides a strong solution for file sharing, whether for Windows or Linux clients.

The Nasuni file system can be shared on AWS or with on-premises users, keeping all the data securely stored in an Amazon S3 bucket.

To learn more, visit the Nasuni website. If you’re ready to get started, check out Nasuni in AWS Marketplace.

AWS Competency Partners: The Next Smart

Nasuni is an AWS Competency Partner, and if you want to be successful in today’s complex IT environment and remain that way tomorrow and into the future, teaming up with an AWS Competency Partner is The Next Smart.


Nasuni – APN Partner Spotlight

Nasuni is an AWS Storage Competency Partner. Nasuni makes file storage simple by turning Amazon S3 into your local file server. With industry-leading encryption, local performance, and pay-as-you-go pricing, adding storage capacity is easy and cost-effective.

Contact Nasuni | Solution Overview | AWS Marketplace

*Already worked with Nasuni? Rate this Partner

*To review an APN Partner, you must be an AWS customer that has worked with them directly on a project.

from AWS Partner Network (APN) Blog

Reinventing the Internet of Things (IoT) Platform for Discrete Manufacturers

By David Westrom, VP of Business Development at MachineMetrics
By Graham Immerman, Director of Marketing at MachineMetrics

The Industrial Internet of Things (IoT) space is hot right now, as manufacturing represents perhaps the largest greenfield opportunity left for digitization. Yet, IoT platform implementations have historically had a high rate of failure within this vertical.

What’s contributing to these failure rates, and what needs to change?

In this post, we examine common approaches for enabling Industrial IoT initiatives, the pros and cons, and their culpability for the high failure rate. We then introduce a new approach that’s already driving rapid, continuous value creation for discrete manufacturers and the companies that provide and service their manufacturing assets.

First, a bit about us—MachineMetrics is manufacturing’s first Industrial IoT platform designed for discrete manufacturing. We like to think of ourselves as the machine data component of the Amazon Web Services (AWS) digital factory.

MachineMetrics, an AWS Partner Network (APN) Advanced Technology Partner with the AWS Industrial Software Competency, has developed a hybrid solution for manufacturers and machine builders that combines machine connectivity and rapid value creation of packaged software-as-a-service (SaaS) applications with the innovation enablement of an IoT platform.

Right now, hundreds of manufacturers and machine builders are using the MachineMetrics platform to measure and analyze the performance of thousands of machines across global factories.

Our solutions provide these companies with the real-time data they need to optimize machine performance and productivity, increase capacity utilization, and ultimately win more business to remain globally competitive.

The IoT Platform “Revolution”

The term Industry 4.0 encompasses the promise of a new industrial revolution, one that marries advanced manufacturing techniques with the Internet of Things to create systems that are not only interconnected, but also communicate, analyze, and use information to drive further intelligent action back in the physical world.

Today is the internet moment for manufacturing, and with it comes the gold rush of providers ready to enable the industry’s digital transformation.

At this point, it would be impossible to work within the manufacturing space and not have spent the past few years bombarded by pitches for Industrial IoT platforms claiming to best support the Industry 4.0 revolution.

Figure 1 – Despite manufacturing being the largest industry in the U.S. and producing perhaps the most raw data, it has the least amount of digitization of any major industry.

These magical platforms market their unique machine learning (ML), artificial intelligence (AI), and edge/cloud/fog technologies to enable the fabled digital transformation of any industry through predictive models, digital twins, and fully-automated workflows.

There are more than 450 IoT platforms to choose from, according to IoT Analytics, and it can be easy to think Industry 4.0 has indeed arrived and manufacturing’s digital transformation is finally at hand. The data, however, tells another story.

IoT implementations have had a historically high rate of failure. Cisco produced a report of survey results indicating that companies considered 76 percent of their Industrial IoT initiatives failures. This has led to greater hesitation on the part of manufacturers to embark on digital transformation journeys.

So what’s driving companies to fail at such a high rate when a majority said that IoT initiatives looked good on paper?

We explore some of the organizational causes in our eBook Why Industrial IoT Projects Fail. In this post, however, we focus on the story’s technology component and IoT platforms specifically, with the goal of both identifying and proposing a solution to manufacturing’s platform problem.

The Platform Problem

There are many different types of IoT platforms, including application enablement platforms, device management platforms, analytics platforms, and others.

In July 2019, Gartner published its first-ever 2018 Magic Quadrant for Industrial IoT, which included companies that provide IoT platforms that work in multiple verticals. No company crossed Gartner’s bar for execution, however, and no one made it above the midway horizontal line (indicating a strong ability to execute). This demonstrates that successful execution and value attainment is elusive.

The challenge with a platform is that it can be time-consuming and expensive to implement and deploy. The investment required to be trained on the platform, and model and build the initial applications and solutions that generate value for the customer, can be prohibitive.

When evaluating return on investment (ROI), use cases and not the underlying platforms ultimately drive value. Since most generic IoT platforms can’t deliver packaged manufacturing use cases in the form of services, applications, or solutions by themselves, the onus of enabling a use case using an IoT platform falls on the customer or systems integrator.

Many manufacturing transformation leaders have struggled with determining a tangible and acceptable ROI from their IoT platform investments. More often than not, their projects go over budget, deployment times run long, interoperability issues occur across legacy systems, or planning and resources aren’t allocated appropriately. This all leads to a disappointing ROI, or even cancellation of the initiative.

The Discrete Manufacturing Challenge

Manufacturing has several unique challenges that are difficult to address with generic IoT platforms.

Data Variety

Not only are there many distinct types of equipment—Lathes, Mills, Plastic Injection Molding, Stamping, Laser Cutters, Robotics—but depending on the mechanisms available for acquiring data from those systems, the data points can be diverse. To provide effective tools for analyzing that data across distinct systems, the data must be transformed into a common data model.

Data Volume

Manufacturing equipment, and discrete manufacturing equipment in particular, is very complex. A machine is a large system of components that work in coordination, resulting in hundreds of distinct data points that change constantly.

Depending on the application, it may be necessary to capture data at rates of 100 Hz or even 100 kHz. Platforms consuming this data must analyze it at multiple levels within the system to avoid sending and storing unnecessary data when only an aggregate or computed result is needed.

These systems must be capable of performing complex processing where it’s most appropriate—at both the edge and in the cloud.

Data Speed

While some systems can provide value with low fidelity and high latency, certain IoT use cases require much more real-time data to be effective. Edge technology is required to process high volumes of data, make decisions in milliseconds or less, and act to potentially prevent damage to the machine or the work piece.

Number of Disparate Systems

Integrating legacy systems is a complex task. Robust data models for each application serving the vertical are required to adequately capture events. Furthermore, a deep understanding of how the data from each of those systems interacts within manufacturing is also necessary to be able to make correlations and provide coherent analysis for process improvement.

IT Infrastructure

Highly elastic and scalable systems are new to most manufacturers and their IT organizations. Due to the intense processing and storage requirements of IoT, it can be prohibitively costly to overprovision a system for peak load at all times.

Utilizing secure cloud systems with virtual hardware architectures designed to be highly available, scalable, and fault tolerant, with geographically separated data centers for disaster recovery, is even more important when you consider the value that a successful IoT initiative can bring to an organization.

Investing in Industrial IoT Platforms

Discrete manufacturers invest in IoT platforms to answer four basic questions:

  • What’s happening?
  • Why is it happening?
  • What’s going to happen next?
  • What can I do about it?

To properly answer these questions and deliver business value, one needs to understand their data within the context of their own operations. This requires a singular focus on the vertical in order to realize the immediate and continuous value promised when undertaking an Industry 4.0 digital transformation initiative.

Figure 2 – We often talk to companies who have predictive and preventative aspirations, but who don’t know what’s happening right now on their shop floor. By providing this visibility, our customers experience a 20 percent increase in uptime in the first month on average.

With the complexity introduced by these challenges, it should be no surprise that generic IoT platforms often fall short when it comes to manufacturing.

Limitations of On-Premises Packaged Applications

Currently, manufacturers have options to embrace data-driven manufacturing other than IoT platforms. Instead of building out applications and solutions that meet specific needs using a platform, manufacturers often opt to buy a packaged application or solution, or hire a third-party to build a solution for them leveraging various platforms, tools, and technologies.

These packaged applications and solutions have proliferated in the market for decades, and their primary benefit is time to value. They can be set up quickly at a relatively lower cost and rapidly drive incremental value.

One disadvantage of packaged applications and solutions is the customer must adapt their processes to conform to the software. Packaged applications and solutions can also be difficult to customize, extend, and integrate with the many disparate systems that exist in a manufacturing facility.

The customization of packaged applications and solutions, along with customized integrations to other disparate applications and systems across the manufacturing enterprise, can create a maintenance nightmare and has resulted in a state of paralysis at many global manufacturers.

These issues are a few of many that have led to the opportunity we see today in the market for something better.

A New Approach: Jump Starting the Platform

Until recently, the options for enabling IoT initiatives and driving digital transformation for discrete manufacturers were limited to the horizontal platform approach, the integration of a packaged application or solution, or some combination of each through a myriad of disparate product and service vendors.

But what if you could have your cake and eat it, too? What if you could have the continuous improvement and innovation opportunity provided by a platform with the immediate benefit and ROI of a packaged service, application, or solution?

Here, we present the methodology behind why we built MachineMetrics and lay out a new approach to solving manufacturing’s platform problem. This approach has driven tremendous value, to the tune of 20 percent increases in manufacturing efficiency on average for our customers within the first month.

The Industrial IoT Foundation

Enabling any Industry 4.0 initiative starts with a data infrastructure that facilitates rapid connection to any type of asset, regardless of brand or age. The data must be captured, transformed into a standard format, and stored securely in a cloud infrastructure where it can easily be consumed by any technology.

This is not a simple task, though. We first simplified IoT connectivity with an inexpensive edge device that enables secure Ethernet, Wi-Fi, and cellular communication while connecting directly to machine tool PLCs and controls. This device is programmed with dozens of custom software adapters developed to automatically unlock, map out, collect, and standardize the available data points (Status, Modes, Alarms, Overrides, Load, Speeds, Feeds, and more).

We then add the ability to connect additional sensors or collect data from legacy equipment with digital and analog I/O that is configured and managed remotely through a web interface. This task is paramount, as every manufacturer has a wide variety of equipment types and ages whose data must be unsiloed in order to drive analytics.

Figure 3 – MachineMetrics Edge runs numerous custom software adapters that connect, collect, transform, and push machine data to their Amazon VPC platform via Wi-Fi, Ethernet, or cellular connectivity.

Our data collection infrastructure is the foundation of the MachineMetrics IIoT Platform. Once collected and transformed, the data can be made actionable through vertically focused applications delivered either by MachineMetrics, through third parties, or with custom applications built by customers using available APIs.

Adding Vertically Packaged Applications

In order to offer quick time to value, vertically focused applications that provide actionable information for specific user personas are necessary. The foundation itself provides the data, but the value is in making the data actionable.

There are many opportunities to deliver packaged applications that make manufacturing data actionable for a variety of consumers within the manufacturing lifecycle. Here are a few examples we’ve developed to ensure our users are able to create rapid value.

Built-in capacity utilization reporting allows plant managers to make capital equipment purchase decisions. Knowing how your factory’s machine utilization compares with the industry average can drive the decision to purchase more or newer equipment, or to invest in internal operations to optimize the utilization of existing equipment.

A tablet application mounted at the machine empowers operators to add human context to machine data and to meet production goals.

Through text and email notifications delivered to their mobile phones, the app enables operators to respond to problems faster and more efficiently with instructions that display when a machine needs attention, such as an inspection, a tool change, or maintenance. The app also tracks changeovers and setups so that when the process exceeds the expected time, the supervisor is called over to proactively manage the issue.

For OEMs and distributors of manufacturing equipment, a packaged application enables remote monitoring and diagnosis of machine health problems for assets in the field and at customer sites in real time.

With the application, service technicians can remotely visualize machine diagnostics and conditions to quickly identify problems, allowing them to either help troubleshoot and resolve the problem without an on-site visit or, if they do need to go to the plant, bring the right tools and order the right parts in advance of the trip. The result is better service and improved machine uptime for the customer, as well as a reduction in costly on-site service visits.

Platform Extensibility

The challenge with vertically packaged applications is that there are more use cases that drive rapid value creation with manufacturing data than any one company can build and support alone. Thus, robust APIs are needed that allow third parties to extend the platform and build their own vertically packaged applications, opening up even more opportunities to create value with the data.

This requires more investment of time and risk, but the payoff can be large. The MachineMetrics Platform is extensible at multiple levels, including at the edge, via API access to data in the cloud, and through our operator interface. This enables manufacturing customers and partners to leverage their deep domain expertise to add their own unique IP to the platform and optimize the value created for our mutual data consumers.

For example, machine builders have unique domain expertise with regards to the design and operation of their own machines. This gives them a natural advantage when it comes to the optimization of these complex and highly specialized assets. The challenge, however, is to deliver a solution to enable machine builders to leverage this expertise.

It’s tremendously difficult to seamlessly deploy an application across a fleet of machines, not to mention to maintain this solution over time. For the customer who has many different assets, the challenge is to leverage as few apps as possible to monitor, analyze, and manage the proper notifications for their various equipment types.

Figure 4 – Many companies leverage MachineMetrics as a true Industrial IoT platform; others leverage the technology as the machine data component of a much larger digital factory initiative.

Leveraging MachineMetrics, machine builders can now provide advanced analytics and custom micro-services to their customers through our scalable, vertically integrated cloud platform. For example, machine builders can offer algorithms customized for their own equipment that deliver predictive health notifications of spindle life. This has created a competitive differentiation for our machine builder partners, and an opportunity to drive incremental growth through new services and business models.

The ultimate end customer, the discrete manufacturer, along with their consulting partners, also possess unique insight and expertise with regards to their manufacturing operations, processes, and products.

This domain knowledge can be leveraged by the manufacturer to extend the MachineMetrics Platform to drive continuous operational improvement. One example is monitoring statistical process control (SPC) of part cycles, where outliers can be configured as a trigger to indicate a problem cycle or a bad part.

Integrations with Existing Third-Party Applications

An ecosystem of modern, cloud-based SaaS manufacturing applications is developing. These are designed to easily integrate with other SaaS applications through APIs. For manufacturers who use these applications, integration effort and time to value are reduced.

Industrial IoT data can drive value across this ecosystem with integrations into ERP/MES, BI, Quality, HR, and CMMS/Maintenance. As this ecosystem evolves with more vertically focused applications in the market, the time to value and need for customization is greatly reduced.

This provides the manufacturer with a complete solution across the entire manufacturing stack, one that is vertically integrated for their business, flexible as business processes change, and easily updated and maintained through cloud deployments.

Legacy ERP/MRP systems that try to tackle every problem do not support integrations, require very expensive customization and configuration, and prevent changes to business processes.

Conclusion

At MachineMetrics, our mission has always been to provide manufacturers with the data they need to increase productivity, win more business, and stay competitive. In the age of Industry 4.0, discrete manufacturing’s digital transformation requires a new approach that creates confidence and inspires future innovation.

Our approach provides the necessary rapid value creation for implementations to break free of the standard “pilot purgatory,” as C-level executives have been able to achieve ROI numbers that justify larger roll-out plans, while shop floor workers are simultaneously experiencing the benefits of real time visibility and automation.

The initial success of an implementation is important not just to demonstrate the value of the initial technology, but to build belief in the benefits enabled by Industrial IoT technology.

This is, of course, not the end of the digital transformation story; it’s just the beginning. As users gain confidence in the value of the technology, it’s essential to provide customers a roadmap to continuous innovation enablement. We call this roadmap the manufacturing analytics journey.

Only by incorporating this approach can IoT platforms deliver real business impact for discrete manufacturing. Without it, most of them will continue struggling to deliver on the promise of Industry 4.0.

The content and opinions in this blog are those of the third party author and AWS is not responsible for the content or accuracy of this post.


AWS Competency Partners: The Next Smart

MachineMetrics is an AWS Competency Partner, and if you want to be successful in today’s complex IT environment and remain that way tomorrow and into the future, teaming up with an AWS Competency Partner is The Next Smart.


MachineMetrics – APN Partner Spotlight

MachineMetrics is an AWS Competency Partner whose Industrial IoT platform measures and analyzes the performance of thousands of machines across global factories.

Contact MachineMetrics | Solution Overview

*Already worked with MachineMetrics? Rate this Partner

*To review an APN Partner, you must be an AWS customer that has worked with them directly on a project.

from AWS Partner Network (APN) Blog

How to Use Nubeva with Amazon VPC Traffic Mirroring to Gain Decrypted Visibility of Your Network Traffic

By Randy Chou, CEO at Nubeva Technologies
By Miguel Cervantes, Partner Solutions Architect at AWS
By James Wenzel, Partner Solutions Architect at AWS

Amazon Web Services (AWS) has added a new feature to Amazon Virtual Private Cloud (Amazon VPC) called traffic mirroring. You can think of Amazon VPC traffic mirroring as a virtual network tap that gives you direct access to the network packets flowing through your Amazon VPC.

Customers rely on network tapping and mirroring functions for testing, troubleshooting, network analysis, security, and compliance requirements, to name a few. They are already taking advantage of this new cloud-native network tapping solution, which offers network and security capabilities that are commonplace in the data center.

Customers who have made the move to AWS enjoy a wealth of benefits, but also face the challenge of gaining visibility into the network traffic flowing over their Amazon VPCs.

Customers could use Amazon VPC Flow Logs to see network flows, but they were missing the application-level context of the packets. This left many security teams with a challenge to build complex monitoring solutions that came with undifferentiated heavy lifting.

Now that Amazon VPC traffic mirroring is available, customers can satisfy their security teams’ requirements and gain visibility into their network.

In this post, we will explore how Nubeva Technologies, an AWS Partner Network (APN) Advanced Technology Partner, built a solution that directly integrates with Amazon VPC traffic mirroring to provide an out-of-band decryption solution for AWS customers.

This joint solution with Nubeva’s product and Amazon VPC traffic mirroring gives customers a surgical approach to capture and analyze network traffic on the AWS Cloud.

What is Amazon VPC Traffic Mirroring?

Amazon VPC traffic mirroring allows you to capture and mirror network traffic for AWS Nitro System-based instances. The key benefit of Amazon VPC traffic mirroring is its relationship to the Elastic Network Interface (ENI) of the Amazon Elastic Compute Cloud (Amazon EC2) instance you want to enable a traffic mirroring session on.

These traffic mirroring sessions allow you to choose to capture all of the network traffic flowing over the ENI, or you can use traffic mirroring filters to capture the packets that are of particular interest to you. You also have the option to limit the number of bytes captured per packet.

You can use VPC traffic mirroring in a multi-account AWS environment, capturing traffic from Amazon VPCs spread across many AWS accounts, and then routing it to monitoring instances, using Amazon VPC peering or AWS Transit Gateway, in a central Amazon VPC for inspection.

A traffic mirroring session can be created and orchestrated using the AWS Software Development Kit (SDK) or AWS Command Line Interface (CLI). As you create new workloads, enabling Amazon VPC traffic mirroring at the time of launch is just a few additional commands in your build scripts. You can check out those steps here.
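As a rough sketch of what those build-script additions look like, the AWS CLI calls below create a mirror target, a filter, and a session. All of the resource IDs and descriptions shown are placeholders:

# Create a mirror target pointing at the ENI of the monitoring appliance
aws ec2 create-traffic-mirror-target \
    --network-interface-id eni-0aaaaaaaaaaaaaaaa \
    --description "Monitoring appliance"

# Create an empty filter (rules are added separately)
aws ec2 create-traffic-mirror-filter \
    --description "Mirror all traffic"

# Mirror the source ENI to the target using the filter
aws ec2 create-traffic-mirror-session \
    --network-interface-id eni-0bbbbbbbbbbbbbbbb \
    --traffic-mirror-target-id tmt-01234567890abcdef \
    --traffic-mirror-filter-id tmf-01234567890abcdef \
    --session-number 1 \
    --description "Mirror workload ENI"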

What Customers Did Before Amazon VPC Traffic Mirroring

A challenge for network and security monitoring in any environment is traffic gathering and acquisition. In the on-premises world, a number of methods were created to solve this issue, such as SPAN (Switched Port Analyzer) sessions on physical network switches, or putting inline hardware on physical network connections to gain visibility into the traffic flowing over the network.

As your environment continues to grow on AWS, it becomes critical to keep an ever-watchful eye out for unusual traffic patterns or content that could signify a network intrusion, a compromised instance, or some other anomaly.

Solutions for network traffic monitoring on AWS have historically been limited to anything that can be installed on an Amazon EC2 instance, usually in the form of a software agent, or to what can be extracted from Amazon VPC flow logs. This has impacted the adoption of packet-level monitoring in the cloud due to the cost and complexity of traditional solutions, such as the need to deploy host-based agents on every instance.

Amazon VPC traffic mirroring cuts through this problem elegantly. Now, you can simply enable a traffic mirroring session on an individual ENI without impacting the resources on the underlying workload.

Then, you can direct all of this mirrored traffic, or filter based on components like the protocol, source/destination IP address and port, to various tools, such as the open source options Zeek, Suricata, and Moloch, to name a few, or any other monitoring solution. That’s it. No need for expensive tooling or middleware.

Amazon VPC traffic mirroring enables customers to detect network and security anomalies, gain operational insights, implement compliance and security controls, and most importantly troubleshoot network issues.

Amazon VPC Traffic Mirroring Use Cases

Keeping traffic mirroring costs low is critical when companies begin to look at comprehensive monitoring solutions, such as cases where forensic analysis is required. Incident response, for example, has many facets.

Let’s look at a few different techniques you can execute when using Amazon VPC traffic mirroring in practice.

On-Demand

This is the traditional “something happened” button. A company’s security team identifies a potential threat inside their environment, and they start their incident response procedures. Immediately, Amazon VPC traffic mirroring can be enabled on the Amazon EC2 instances in the identified Amazon VPCs. Traffic is then sent, in real-time, to your security tools in your AWS environment.

Further automation can be achieved here to enable traffic mirroring on the fly for Amazon EC2 instances that meet certain threat remediation criteria defined by your organization. Like most things on AWS, enabling Amazon VPC traffic mirroring is just an API call away.

Constant

This is similar to the option above, except the monitoring is in a constant state. This means you’ll be capturing all in-bound and out-bound communication on an Amazon EC2 instance for the duration of its uptime. Constant capture is what most security organizations do on-premises today, but it has not been possible to easily replicate this in the cloud until now.

You can use your monitoring tools to store packet captures in Amazon Simple Storage Service (Amazon S3) for long-term archival, ready to analyze when needed. Amazon VPC traffic mirroring allows you to instrument everything and keep a forensic record of your network traffic.

Sampled

The on-demand use case is often too late for many organizations, while the constant approach is often too much. Because of this, many AWS customers choose sampling as a unique and effective approach to monitoring. The automation and orchestration capabilities of Amazon VPC traffic mirroring allow you to monitor one or many groups of Amazon VPC resources for short periods of time and then shift to another set of resources.

If any of these monitored groups show any irregularities, they can be tagged and immediately set to be monitored by another set of tools for further analysis, while the packet captures continue to sample traffic from the workloads, looking for threats and anomalies.

Filtering Packet Captures on Amazon VPC Traffic Mirroring Sessions

The best thing about packet captures is that you get all the data. The worst thing about packet captures is that you get all the data. The key with any packet capture strategy is being able to ensure you receive exactly what you need in one area, while still preserving the remainder of the data for later analysis as needed. Amazon VPC traffic mirroring allows you to be surgical, as well as expansive at the same time, with the same data.

Amazon VPC traffic mirroring allows for the creation of multiple sessions for a source ENI. This allows various types of traffic to be mirrored to different tools. For instance, maybe all HTTP/HTTPS traffic is sent to an application performance tool for deeper review. At the same time, SMTP traffic is sent to a specialized tool for data loss prevention. Finally, the remainder of the traffic is sent to an IDS solution for further analysis.
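A hedged sketch of how such a split can be expressed with traffic mirror filter rules is shown below; the filter IDs and rule numbers are placeholders, and each filter would be attached to its own mirror session:

# Filter for the application-performance session: accept inbound HTTPS
aws ec2 create-traffic-mirror-filter-rule \
    --traffic-mirror-filter-id tmf-01234567890abcdef \
    --traffic-direction ingress \
    --rule-number 100 \
    --rule-action accept \
    --protocol 6 \
    --destination-port-range FromPort=443,ToPort=443 \
    --destination-cidr-block 0.0.0.0/0 \
    --source-cidr-block 0.0.0.0/0

# Filter for the data-loss-prevention session: accept only SMTP
aws ec2 create-traffic-mirror-filter-rule \
    --traffic-mirror-filter-id tmf-0fedcba09876543210 \
    --traffic-direction ingress \
    --rule-number 100 \
    --rule-action accept \
    --protocol 6 \
    --destination-port-range FromPort=25,ToPort=25 \
    --destination-cidr-block 0.0.0.0/0 \
    --source-cidr-block 0.0.0.0/0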

TLS/SSL Decryption with Nubeva and Amazon VPC Traffic Mirroring

Nubeva’s TLS Decrypt is a new, out-of-band solution that decrypts SSL/TLS traffic, enabling security and application teams to inspect and monitor their data in motion.

Nubeva’s born-in-the-cloud architecture works great for TLS 1.3, Elliptic Curve Diffie-Hellman Ephemeral (ECDHE), perfect forward secrecy (PFS), and pinned certificates. This allows customers to promote encryption-in-transit practices in their AWS environment, while providing a solution to securely decrypt the mirrored traffic for additional visibility.

More than 70 percent of all network traffic is currently encrypted. Enterprises need to monitor their applications across Amazon VPCs for security, compliance, application performance, and diagnostics reasons.

Figure 1 – Challenges with visibility of encrypted network traffic.

While modern encryption protocols provide the highest levels of security, they also limit visibility due to the packet’s encryption. Nubeva integrates with Amazon VPC traffic mirroring to enable decryption and visibility for mirrored encrypted packets.

Nubeva applies a unique out-of-band decryption approach without software or hardware man-in-the-middle (MITM) components. This architecture uses a key-extraction plane independent of the encrypted traffic plane. Nubeva stores encryption keys securely in Amazon DynamoDB tables in the customer’s own AWS account.

Nubeva’s decryption agents merge keys with encrypted traffic and send the original encrypted packet, as well as the decrypted packet, to the attached tool. This process ensures that decrypted traffic never traverses the customer’s Amazon VPC network environment.

Figure 2 – Nubeva Decrypt solution overview.

Customer Success: Financial Services

For one Fortune 500 financial services company, the implication of this capability is significant. This finally unlocks one of the more problematic issues for their security team. One of their top five projects for 2019 was sending decrypted/unencrypted packets to open source tools in the cloud.

In their on-premises data centers, the SOC would decrypt all traffic to their web and app tiers using standard MITM approaches. With the shift to the cloud and the change to TLS 1.2 PFS and TLS 1.3, the MITM approaches simply were not feasible.

With Nubeva’s innovative decryption capabilities, this customer project has new life. It’s now possible to decrypt traffic that no MITM solution could ever hope to decrypt.

Nubeva’s seamless decryption allows customers to send Amazon VPC traffic mirroring data to their centralized Amazon VPC for infosec tooling, which could contain Zeek or Moloch, for example, as well as any other solutions. These tools then query the key database for the applicable keys, unlocking visibility across the toolset.

Summary

Amazon VPC traffic mirroring and Nubeva are better together. The introduction of Amazon VPC traffic mirroring has increased network visibility possibilities for AWS customers, whether you’re looking to do captures on-demand, constantly, or sample the traffic.

For customers that need the ability to execute even deeper inspection of Amazon VPC network traffic, Nubeva’s TLS decryption works great with Amazon VPC traffic mirroring to decrypt mirrored traffic on the destination for deeper analysis.

Together, they form an elegant solution that enables customers to adopt aggressive encryption in their environment, while also giving IT teams the right level of visibility into their cloud network traffic.

For more details on how to use Nubeva Decrypt with Amazon VPC traffic mirroring, check out the overview video. If you’re interested in trying out the Nubeva product, please visit nubeva.com/aws.


Nubeva Technologies – APN Partner Spotlight

Nubeva is an APN Advanced Technology Partner that allows organizations to gain more visibility of their decrypted packet traffic on the AWS Cloud. Nubeva merges TLS keys with packet feeds for multiple tools and services both in-cloud and on-premises.

Contact Nubeva | Solution Overview

*Already worked with Nubeva? Rate this Partner

*To review an APN Partner, you must be an AWS customer that has worked with them directly on a project.

from AWS Partner Network (APN) Blog

Leveraging Multi-Model Architecture to Deliver Rich Customer Relationship Profiles with Reltio Cloud

By Anastasia Zamyshlyaeva, Chief Architect, Co-Founder at Reltio

Imagine you are on an international business trip and spot a popular fast-food restaurant that has a global presence. Striding inside in a hurry, you swipe your loyalty card and are ready to order.

The sandwich artist cheerfully asks if you want to go for your favorite—toasted turkey and cheddar sandwich on wheat bread with extra pickles—or if you want to try a tuna sandwich this time. With the access to your profile in the restaurant’s point-of-sale (POS) system, the sandwich artist is well aware of your usual preferences, and that you have occasionally chosen the tuna sandwich before.

Top companies put in great efforts to delight customers like this at every stage in the customer journey. By leveraging technology to deliver highly-personalized experiences, digitally savvy companies are leading the way in building rich customer relationship profiles.

The key challenge for organizations struggling to keep up in this fast-paced world is the lack of reliable customer information from the vast amounts of data stored in different systems. Customer information includes data provided by individuals, transactional data from multiple customer touch points, third-party data, and derived data from analytics.

Building a true Customer 360 requires gaining a comprehensive view of customer behavior and preferences by aggregating data from all of these sources, and more.

With a single source of truth, a true Customer 360 delivers a complete, real-time customer view to all parts of the organization—sales, marketing, service, support, etc.

This consistent and contextual insight can help enterprises delight customers with personalized experience and timely offers through each touch point in the customer journey.

With Customer 360, enterprises can achieve:

  • Granular segmentation.
  • Real-time personalization.
  • Omnichannel customer engagement.
  • Faster regulatory compliance.

In this post, I will discuss Customer 360 challenges with traditional databases, and how a different approach of multi-model architecture can help. I will also share how Reltio Cloud delivers true Customer 360 for rich customer relationship profiles, and how they can be leveraged for building great customer experiences.

Reltio is an AWS Partner Network (APN) Advanced Technology Partner with the AWS Life Sciences Competency. The Reltio Cloud platform-as-a-service (PaaS) blends data from various sources to create a true 360 view of a company’s customer base.

Customer 360 Challenges with Traditional Databases

The concept of Customer 360 has been around for some time, but not in the truest sense.

Applications built on rigid relational data models are not suited to be a single source of truth for customer data across all enterprise applications. They are not capable of capturing the complex and dynamic real-world relationships that power graph-based applications like LinkedIn and Facebook.

The other category of databases, NoSQL or non-relational, includes graph, columnar, key-value, and document, among others. These databases are more suitable for managing large distributed datasets and supporting modern application development.

In practice, enterprises face two key challenges:

  • Manage data with a huge volume, variety, and velocity.
  • Perform multiple types of operations on that data.

Relational databases can support the volume, but not relationships essential for true Customer 360. Columnar databases are fast but do not manage relationships well. Graph databases are perfect for uncovering and handling relationships but lack the horizontal scalability to meet enterprise requirements.

Different types of operations need different types of data storage, and no single technology works best for all of them. This realization has sparked the idea of a multi-model architecture for a single application that supports the right storage for the right data and offers the flexibility to run different workflows on the same data.

Multi-Model Architecture for True Customer 360

Multi-model databases support a mix of data models and use cases, with a single backend storage. These databases realistically handle the varying data storage needs with different data storage technologies—in other words, polyglot persistence—without compromising performance.

Multi-model architecture allows you to run different queries against different storage types: master data can live in a document store for search queries, and reference data in a relational database. High-volume transactional data can live in flat files that tie back to the master data.

Multi-model architecture helps you to get the maximum business value from your data assets, generating insights and actionable recommendations through data-driven applications.

Minimizing the backend complexity and components to be maintained, multi-model cloud storage can provide a single reliable version of truth available to all users at all times.

Reltio Cloud is built on a multi-model architecture data foundation. It’s a single place where you can bring different types of data together—entities, relationships, graphs, interactions, transactions, reference data, as well as social and third-party data. Basically, everything under one virtual data roof.

Free from the limitations of relational data modeling, Reltio Cloud captures and automatically models a variety of structured and unstructured master, reference, transaction, and activity data, without any volume restrictions.

Figure 1 – Delivering speed to value with Reltio and AWS.

The Reltio graph delivers powerful customer insights, including identity resolution, householding, quick segmentation based on any attribute, roll-up of dynamic hierarchical information, finding key influencers, and understanding customer preferences.

AWS for Agility and Security

Reltio Cloud leverages a wide variety of AWS technologies and services to innovate at scale and expand into new markets at speed.

The AWS SaaS Factory helped optimize Reltio Cloud for cost-efficiency and resiliency. Full support from AWS technical services provides effortless scaling up and down with thousands of virtual machines to move any type and any volume of data into Reltio Cloud.

Reltio Cloud takes advantage of multiple AWS solutions, including:

The security model of AWS and its compliance with the stringent security standards of diverse verticals assure a safe global infrastructure with high availability.

Building Great Customer Experiences with Rich Customer Relationship Profiles

Innovative global companies leading in this experience economy use Reltio Cloud to create rich customer profiles that deliver exceptional customer experiences.

Reltio Cloud continuously unlocks the value of data relationships across people, products, locations, devices, and other newly-added datasets in an increasingly complex digital and regulatory landscape.

Say you’re on a business trip and you check in to a country club where you plan to spend a night, but this is not your home club. The receptionist surprises you by upgrading your stay and giving you a great deal at a local award-winning restaurant. He also offers to secure discounted tickets for VIP seating to an upcoming concert near your home club as a great way to celebrate your wedding anniversary next month.

The country club operator can deliver this type of exciting experience because they have access to your fresh, rich, and actionable member profile, fueled by Reltio. This data is then taken through analytics to get insights that empower the receptionist with the next best offers and actions aligned with your profile, relationships, and preferences.

Reltio empowers the club operator to personalize the offering for every club member, while adhering to privacy and preferences to deliver memorable experiences and retain their high-net-worth members.

Conclusion

A true Customer 360 provides a real-time, complete view of your customers’ behaviors and preferences using aggregated data from all of your data sources. This is the starting point for designing great customer experiences in today’s experience economy.

Traditional databases are not suitable to build a single source of truth of customer data across all enterprise applications. Capturing complex and dynamic real-world relationships needs a new approach to leveraging different data storage types. A multi-model architecture efficiently manages the varying data storage needs to deliver a true Customer 360.

Using multi-model architecture, Reltio Cloud delivers a single source of truth for customer data, utilizing multiple AWS services for security and agility. Mastering customer data in Reltio Cloud allows you to build rich customer profiles by continuously unlocking the value of data relationships across people, products, locations, and devices.

You can leverage the rich customer profiles built with Reltio Cloud to design and deliver great personalized experiences across all touch-points, from a country club receptionist offering VIP tickets to a concert you may like, to a sandwich artist that knows you just might try the tuna sandwich this time around.

To learn more about Customer 360 and Reltio Cloud, check out these whitepapers:

The content and opinions in this blog are those of the third party author and AWS is not responsible for the content or accuracy of this post.


AWS Competency Partners: The Next Smart

Reltio is an AWS Competency Partner, and if you want to be successful in today’s complex IT environment and remain that way tomorrow and into the future, teaming up with an AWS Competency Partner is The Next Smart.


Reltio – APN Partner Spotlight

Reltio is an AWS Competency Partner. The Reltio Cloud Platform as a Service (PaaS) blends data across all domains and formats from any internal, third party, and social media sources to create a true Customer 360.

Contact Reltio | Solution Overview

*Already worked with Reltio? Rate this Partner

*To review an APN Partner, you must be an AWS customer that has worked with them directly on a project.

from AWS Partner Network (APN) Blog

How to Secure Enterprise Cloud Environments with AWS and HashiCorp

By Kevin Cochran, Senior Solutions Engineer at HashiCorp

Securing applications can be a tricky topic, as security isn’t always top of mind for developers because it can slow down software releases.

HashiCorp Vault helps eliminate much of the security burden developers experience while trying to comply with security team requirements.

Vault was built to address the difficult task of passing sensitive data to users and applications without it being compromised. Within Vault, all transactions are token-based, which limits potential malicious activity, and provides greater visibility into whom and what is accessing that information.

Vault achieves this through a number of secrets engines and authentication methods that leverage trusted sources of identity, like AWS Identity and Access Management (IAM).

In this post, I will walk you through several of Vault’s features that can help you get started. You’ll see just how simple security can be!

HashiCorp is an AWS Partner Network (APN) Advanced Technology Partner with AWS Competencies in both DevOps and Containers.

Vault Auto Unseal with AWS Key Management Service

First, let’s cover how to unseal a Vault cluster. This can be simplified by storing Vault’s master key in AWS Key Management Service (KMS) and enabling the auto unseal feature.

By default, when a Vault server is created or reset it initiates in a sealed state. This is important because when Vault is sealed it can’t encrypt or decrypt anything. Basically, it forgets what its master key is and locks out any potential threats from gaining access. This is a critical feature when you know or suspect your environment has been compromised.

Unsealing Vault is deliberately not simple, and for good reason. Vault utilizes Shamir’s secret-sharing technique when a new server is initialized. To unseal Vault with this technique, you must provide the minimum number of keys, as determined by the team that created the Vault server.

Should you need to unseal Vault (either from a manual seal or a restart), getting the requisite number of keys may take longer than your service level agreement (SLA) can support.

With auto unseal, Vault reaches out to KMS to retrieve its master key rather than reconstructing it from key shards. This means that to unseal Vault, all you need to do is restart the Vault service. You can still manually seal Vault in the case of a security issue, but unsealing can be done safely, securely, and easily.

Setting Up Auto Unseal

Setting up auto unseal with KMS takes only a few minutes, and the configuration is very simple. To get started, make sure your server’s environment is set up with your AWS credentials.
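A minimal sketch of that prerequisite, assuming you are testing with environment variables rather than an EC2 instance profile (an instance profile with KMS permissions is the better choice for production), and assuming you still need to create the KMS key. All values shown are placeholders:

# Credentials and region Vault will use to call KMS
export AWS_REGION="us-east-1"
export AWS_ACCESS_KEY_ID="AKIA..."
export AWS_SECRET_ACCESS_KEY="..."

# Create a dedicated KMS key and note the KeyId for the seal stanza below
aws kms create-key --description "vault-auto-unseal" --query KeyMetadata.KeyId --output text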

First, we need to add a stanza to Vault’s configuration file. Add the following lines:

seal "awskms" {
    region = "AWS_REGION"
    kms_key_id = "AWS_KMS_KEY_ID"
}

Next, issue a service restart command, such as:

service vault restart

If you’re starting with a brand new instance of Vault, you can go ahead and initialize Vault now by issuing the following command:

vault operator init -key-shares=1 -key-threshold=1

However, if you’re migrating your master key to KMS, we’ll need a couple more steps. Moving the key to KMS takes place during the unseal process by adding a -migrate flag.

vault operator unseal -migrate UNSEAL_KEY_1
...
vault operator unseal -migrate UNSEAL_KEY_N

Your master key is now stored in KMS. Since KMS is considered a trusted source, we no longer need to use key shares. However, we still need to rekey Vault to have a single key.

We’ll first need to initialize a rekey and reduce the key shares and key threshold each to 1:

vault operator rekey -init -target=recovery -key-shares=1 -key-threshold=1

Vault provides you with a nonce token, which we’ll need for the next step. Now, we need to complete the process by using our nonce token with each of our original unseal keys:

vault operator rekey -target=recovery -key-shares=1 -key-threshold=1 -nonce=NONCE_TOKEN UNSEAL_KEY_1
...
vault operator rekey -target=recovery -key-shares=1 -key-threshold=1 -nonce=NONCE_TOKEN UNSEAL_KEY_N

That’s all there is to it! You can test it by restarting your Vault service and checking the status:

service vault restart

Then:

vault status

Your Vault should be automatically unsealed:

Key                      Value
---                      -----
Recovery Seal Type       shamir
Initialized              true
Sealed                   false
Total Recovery Shares    1
Threshold                1
Version                  1.1.0
Cluster Name             vault-cluster-e4e06553
Cluster ID               29314997-4388-f66d-4b5a-3ac892504ee9
HA Enabled               false

Now that Vault can be safely sealed or unsealed, you’re ready to use your Vault instance for secrets management.

Dynamic Database Credentials

Databases are where we typically find the most sensitive data of any organization.

It would make sense to take extra precautions with database access, but in the past this access was managed at a local level on a username and password basis. In the cloud, we need to manage credentials for users and applications at a much greater scale.

Vault allows you to dynamically create database credentials, which opens up a whole world of possibilities. For instance, your application may get a 24-hour lease on database credentials, and upon expiration, have a new set of credentials generated. Or you may want to generate short-lived credentials with read-only database permissions through a self-service portal.

These credentials are removed from the database upon expiry, meaning Vault manages the clean-up and reduces the burden of password rotations. In addition, you can ensure that each instance of an application has its own unique credentials for provenance.

Setup is actually quite simple. In this example, we’re using MySQL.

To get started, you’ll need a username and password that has permission to create users. We’re going to use root for that; the role we define below determines what permissions the generated credentials actually receive.

The first thing we need to do is enable the database secrets engine:

vault secrets enable database

Next, we need to configure a database connection. Vault currently supports eight major database engines with multiple variants of each, and custom configurations.

For MySQL, a database configuration can be issued like this:

vault write database/config/mysqlvaultdb \
    plugin_name="mysql-database-plugin" \
    connection_url="{{username}}:{{password}}@tcp(mysql.example.com:3306)/" \
    allowed_roles="db-app-role" \
    username="root" \
    password="password"

In the command above, we create a configuration named mysqlvaultdb. The connection URL contains a reference to the username and password, and points to your MySQL instance.

We haven’t created any roles just yet, but we’re letting Vault know that only the db-app-role role is allowed to use this connection.

Finally, we provide the username and password to be used in the connection string which Vault uses to interact with the database.

Next, we need to create a role which executes a CREATE USER statement in MySQL:

vault write database/roles/db-app-role \
    db_name=mysqlvaultdb \
    creation_statements="CREATE USER '{{name}}'@'%' IDENTIFIED BY '{{password}}';GRANT SELECT ON *.* TO '{{name}}'@'%';" \
    default_ttl="1h" \
    max_ttl="24h"

This role, which we named db-app-role, is the same name we referenced in the allowed_roles in the connection configuration. The db_name is the Vault connection we created just prior: mysqlvaultdb.

The creation_statements is where the action takes place and gives the Vault administrator total control over what Vault is allowed to do within the database.

Here, Vault internally creates a username and a password, interpolates creation_statements to plug in the username ({{name}}) and password ({{password}}), then passes the final SQL statement to the connection to be executed on the server.

Upon success, Vault returns the username and password, valid for default_ttl—in this case one hour.

Now, let’s put this to use by creating a policy and a user. We’ll create a file called getdbcreds.hcl and put the following contents in it:

path "database/creds/*" {
    capabilities = ["read"]
}

Then, we need to create the policy in Vault. We’ll call the policy getdbcreds:

vault policy write getdbcreds getdbcreds.hcl

We’re going to create a user with a simple username/password scheme. This authentication method needs to be enabled first:

vault auth enable userpass

Finally, we’ll create our user and assign the policy we just created:

vault write auth/userpass/users/james \
    password="superpass" \
    policies="getdbcreds"

To test that everything works, simply log in to Vault as our new user:

vault login -method=userpass username=james

Enter the password, then issue the following command:

vault read database/creds/db-app-role

You’ll see something similar to this:

Key                Value
---                -----
lease_id           database/creds/db-app-role/iaIWuTCjE4KszxSHPFbpS6V7
lease_duration     1h
lease_renewable    true
password           A1a-ClBMDtllDELhA47d
username           v-userpass-j-app-role-o1msTfFl1e

Our user james is now able to log in to the MySQL database using these credentials. After one hour, those credentials will expire and he’ll need to request a new set of credentials.
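As a quick sanity check, you could plug the generated credentials straight into the MySQL client (the hostname here is illustrative):

mysql -h mysql.example.com \
    -u v-userpass-j-app-role-o1msTfFl1e \
    -pA1a-ClBMDtllDELhA47d \
    -e "SELECT CURRENT_USER();"

Because the role only granted SELECT, any write attempted with these credentials will be rejected.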

Amazon EC2 Authentication

Manually passing around secrets and tokens to applications and servers is a security hazard. Once they get loose, it’s hard to reel them back in.

At HashiCorp, we call this challenge secrets sprawl. Vault provides several mechanisms by which users and applications can authenticate without passing secrets, keys, or tokens. One way is by using Amazon Elastic Compute Cloud (Amazon EC2) authentication. Though the IAM authentication method is preferred, Amazon EC2 allows us to use existing resources.

The Amazon EC2 authentication method allows Vault to identify an instance based on any number of attributes.

For our example, we’re just going to be using the Amazon Machine Image (AMI) ID to validate that the instance can log in. If the attributes don’t match, login is denied. The full set of attributes can be found in our AWS Auth API documentation.

We’ll need to start by enabling the AWS authentication method in Vault:

vault auth enable aws

Vault will also need to communicate with our AWS account, so we need to provide our access credentials:

vault write auth/aws/config/client \
    secret_key=XXXXXX \
    access_key=XXXXXX

Our Amazon EC2 instance will be requesting access to our MySQL database, so we can use the policy we created in our previous example: getdbcreds.

We want a role that authenticates an Amazon EC2 instance based on its AMI ID, creates a session that lasts up to one hour, and grants the ability to get database credentials:

vault write \
    auth/aws/role/app-db-role \
    auth_type=ec2 \
    policies=getdbcreds \
    max_ttl=1h \
    disallow_reauthentication=false \
    bound_ami_id=ami-0475f60cdd8fd2120

Vault is now ready to authenticate Amazon EC2 instances. To validate, we need an instance which is using the AMI we specified. We’ll login to that system and use the HTTP API to communicate with our Vault server.

Once logged in, we’ll want to get the PKCS7 signature from the instance’s metadata:

pkcs7=$(curl -s \
  "http://169.254.169.254/latest/dynamic/instance-identity/pkcs7" | tr -d '\n')

Along with the signature, we need to tell Vault what role we are requesting access to:

data=$(cat <<EOF
{
  "role": "app-db-role",
  "pkcs7": "$pkcs7"
}
EOF
)

Now, we’re ready to login to Vault:

curl --request POST \
  --data "$data" \
  "http://vault.example.com:8200/v1/auth/aws/login"

Vault responds with a JSON payload, which looks something like this:

{
  "request_id": "b30f4111-95b7-4481-e98f-f7a86ba9c0b9",
  "lease_id": "",
  "renewable": false,
  "lease_duration": 0,
  "data": null,
  "wrap_info": null,
  "warnings": [
    "TTL of \"768h0m0s\" exceeded the effective max_ttl of \"1h0m0s\"; TTL value is capped accordingly"
  ],
  "auth": {
    "client_token": "s.FErTfpbFlkfDX3pUjkgldXT8",
    "accessor": "22pd7MJLBTMK2gvRmw7tM3Ku",
    "policies": [
      "default",
      "getdbcreds"
    ],
    "token_policies": [
      "default",
      "getdbcreds"
    ],
    "metadata": {
      "account_id": "753646501470",
      "ami_id": "ami-0475f60cdd8fd2120",
      "instance_id": "i-0e50b4b3e6fce4853",
      "nonce": "03a6eb04-931d-d602-8bb2-9065134144d8",
      "region": "us-west-2",
      "role": "app-db-role",
      "role_tag_max_ttl": "0s"
    },
    "lease_duration": 3600,
    "renewable": true,
    "entity_id": "f02d29a9-f72c-fa34-2bbf-31baeb8c5fee",
    "token_type": "service",
    "orphan": true
  }
}

The value we’re most interested in is the client_token, which tells us we authenticated successfully and can now communicate with Vault using the specified role.

We can now simply pass that token as a header value and get our database credentials:

curl \
    --header "X-Vault-Token: CLIENT_TOKEN." \
    http://vault.example.com:8200/v1/database/creds/db-app-role

Which returns the following:

{
  "request_id": "1aac4536-97e1-8121-17d1-656ab953a963",
  "lease_id": "database/creds/db-app-role/wPefgAXF5rZjiRJfdC2S7fik",
  "renewable": true,
  "lease_duration": 3600,
  "data": {
    "password": "A1a-kENCugtGPxPDq4tn",
    "username": "v-aws-app-role-Clp0KoQNv5TdOvXzx"
  },
  "wrap_info": null,
  "warnings": null,
  "auth": null
}

Your application can now use dynamic database credentials via Amazon EC2 authentication. By running Vault Agent on the instance itself, the instance can stay logged in without needing to reauthenticate.
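Vault Agent configuration is outside the scope of this post, but as a rough sketch, an agent on the instance might authenticate with the same aws/ec2 method and keep a token available to local applications. The file paths and sink location below are assumptions:

cat > /etc/vault.d/agent.hcl <<'EOF'
vault {
  address = "http://vault.example.com:8200"
}

auto_auth {
  method "aws" {
    mount_path = "auth/aws"
    config = {
      type = "ec2"
      role = "app-db-role"
    }
  }

  sink "file" {
    config = {
      path = "/tmp/vault-token"
    }
  }
}
EOF

vault agent -config=/etc/vault.d/agent.hcl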

You can easily test that no other instances can log in by spinning up an Amazon EC2 instance with a different AMI ID and running through the same commands.

Encryption as a Service

Encryption is complicated. We need it for all kinds of data, both at rest and in transit. Outside of application development, security teams understand how to put it all together, but within the developer community encryption remains a highly specialized skill.

Unfortunately, some organizations fully expect developers to responsibly handle the encryption and decryption of sensitive data.

Yes, there are encryption libraries available, but they’re not as easy to use as most developers would like. They are, in fact, libraries and must support a multitude of use cases. Developers need to know which encryption algorithm should be used for their project.

Vault’s transit engine solves this dilemma by providing an API for developers to use for encrypting and decrypting data. This makes encryption a part of their existing workflow.

The developer simply passes in the data, and Vault returns the ciphertext. That text can be stored in place of the original data, and should your database ever be compromised, the attacker will only see useless, encrypted text.

As you might know by now, enabling the transit engine is quite simple:

vault secrets enable transit

Creating a key is just as simple:

vault write -f transit/keys/customer-key

Next, we need to add a policy that allows use of our new key for encryption and decryption:

vault policy write "custkey" -<<EOF
path "transit/encrypt/customer-key" {
    capabilities = ["update"]
}
path "transit/decrypt/customer-key" {
    capabilities = ["update"]
}
EOF

Now, assign this policy to any entity you would like to have access to this key, such as an app role, an IAM role, or an instance.

Once your resource has authenticated, use the token with the API to encrypt/decrypt:

curl -s \
    --header "X-Vault-Token: $CLIENT_TOKEN" \
    --request POST \
    --data '{ "plaintext": "SGFzaGlDb3JwIFZhdWx0IFJvY2tzIQ==" }' \
    http://vault.example.com:8200/v1/transit/encrypt/customer-key

In return, we receive a payload with a ciphertext value made up of three fields delimited by colons (:). The first field is the word vault, which makes it easy for developers to determine whether a value is encrypted data or not. The second field (currently v1) is the version of the key, and the third is the encrypted data itself.

The entire payload looks like this:

{
  "request_id": "f87aab69-5b96-4311-358e-d157cc5a4e77",
  "lease_id": "",
  "renewable": false,
  "lease_duration": 0,
  "data": {
    "ciphertext": "vault:v1:ctwlaZ4QI+hzwZJwMsQo0zJzGNfhhLoCoQh4PV1lPO0QhgxLhNZfXeM4KvJj0CKq9gM="
  },
  "wrap_info": null,
  "warnings": null,
  "auth": null
}

Decrypting the data follows the same process. We just pass over the entire ciphertext we first received from Vault’s transit engine:

curl -s \
    --header "X-Vault-Token: $CLIENT_TOKEN" \
    --request POST \
    --data '{ "ciphertext": "vault:v1:ctwlaZ4QI+hzwZJwMsQo0zJzGNfhhLoCoQh4PV1lPO0QhgxLhNZfXeM4KvJj0CKq9gM=" }' \
    http://vault.example.com:8200/v1/transit/decrypt/customer-key

Here, we get back our plaintext we originally sent over:

{
  "request_id": "eee22c0d-5674-2171-9df3-398d3d231f78",
  "lease_id": "",
  "renewable": false,
  "lease_duration": 0,
  "data": {
    "plaintext": "SGFzaGlDb3JwIFZhdWx0IFJvY2tzIQ=="
  },
  "wrap_info": null,
  "warnings": null,
  "auth": null
}

As you may have noticed, our plaintext is actually a base64-encoded bit of text.
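For example, using the base64 utility (from GNU coreutils here), you can confirm that the plaintext in these calls is simply the string “HashiCorp Vault Rocks!” encoded before being sent to Vault:

echo -n "HashiCorp Vault Rocks!" | base64
# SGFzaGlDb3JwIFZhdWx0IFJvY2tzIQ==

echo "SGFzaGlDb3JwIFZhdWx0IFJvY2tzIQ==" | base64 --decode
# HashiCorp Vault Rocks!

Vault never interprets this payload; base64-encoding it simply allows you to encrypt binary data as well as text.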

Summary

HashiCorp Vault is specifically designed for public and private clouds operating in low- or zero-trust environments. In this post, we’ve covered a few of the many features Vault offers for IT organizations.

For enterprise customers, Vault offers a host of robust features meeting the requirements of governance and compliance, such as HSM integration, FIPS 140-2 compliance, disaster recovery, replication (performance, cross-region, and filter sets), namespaces, and more.

If you’ve never used Vault, download our open source version and run through the tutorials on learn.hashicorp.com.

The content and opinions in this blog are those of the third party author and AWS is not responsible for the content or accuracy of this post.


AWS Competency Partners: The Next Smart

HashiCorp is an AWS Competency Partner, and if you want to be successful in today’s complex IT environment and remain that way tomorrow and into the future, teaming up with an AWS Competency Partner is The Next Smart.

The Next Smart-APN Blog-1


HashiCorp-Logo-1
Connect with HashiCorp-1

HashiCorp – APN Partner Spotlight

HashiCorp is an AWS DevOps Competency Partner. Enterprise versions of products like Vault enhance the open source tools with features that promote collaboration, operations, governance, and multi-data center functionality.

Contact HashiCorp | Solution Overview

*Already worked with HashiCorp? Rate this Partner

*To review an APN Partner, you must be an AWS customer that has worked with them directly on a project.

from AWS Partner Network (APN) Blog

Pivotal Greenplum on AWS: Parallel Postgres for Enterprise Analytics at Scale

Pivotal Greenplum on AWS: Parallel Postgres for Enterprise Analytics at Scale

By Ivan Bishop, Partner Solutions Architect, ISV Migrations at AWS
By Jon Roberts, Principal Engineer at Greenplum (Pivotal)

Pivotal-Logo-1
Pivotal-APN-Badge-2
Connect with Pivotal-1
Rate Pivotal-1

Are you thinking of deploying Pivotal Greenplum on the Amazon Web Services (AWS) Cloud? Many customers want to shift the responsibility for infrastructure to AWS, and it’s easy to see why when you consider the alternatives.

With an on-premises deployment, it takes time to provision floor space in a data center, run power cables and fiber, and ensure adequate cooling. Then, you have to acquire the hardware, provision IP addresses, install and harden the operating system (OS) across multiple machines, and finally address monitoring and security. Only then can you install and configure Pivotal Greenplum to your on-premises infrastructure.

With Pivotal Greenplum on AWS, deployments are completely automated and complete in less than an hour. In fact, the barrier to entry is low enough that business units may deploy production-ready clusters themselves, without IT involvement.

Pivotal Greenplum is a commercial fully-featured Massively Parallel Processing (MPP) data warehouse platform powered by the open source Greenplum Database. It provides powerful and rapid analytics on petabyte scale data volumes, and is available on AWS Marketplace.

Pivotal and AWS have worked together to make deployment and ongoing operations of Pivotal Greenplum easy and painless. Speed, ease of management, and security are some of the key reasons we see enterprises shifting Pivotal Greenplum to AWS.

In this post, we focus on leveraging Pivotal Greenplum (parallel Postgres) for enterprise-scale analytics. We present discussions around deployment, updates, security, and speed.

Pivotal is an AWS Partner Network (APN) Advanced Technology Partner with the AWS Container Competency that helps the world’s largest companies transform the way they build and run software.

Performance Benefits

Customers use Greenplum because it’s fast. You can run your query and the results will come back moments later. Greenplum on AWS is optimized for performance and can be even faster than a comparably configured on-premises deployment.

To achieve the best performance, we tuned both Greenplum and AWS resources in the following ways:

  • All Pivotal Greenplum nodes are placed into Auto Scaling Groups to boost resiliency.
  • All nodes feature 10 Gbps or faster networking for maximum performance.
  • Data volumes use Throughput Optimized HDD (st1) EBS volumes.

The Greenplum CloudFormation template uses AWS placement groups to minimize latency by placing nodes in close physical proximity.

We gauge performance on AWS using the same open source utilities that are used for on-premises deployments: gpcheckperf and the TPC-DS benchmark. We also factor in the documented AWS specs for each virtual machine (VM) and disk type.

In particular, the TPC-DS benchmark is extremely useful for comparing performance across deployments using real-world database loading and query activities.

TPC-DS Benchmark Score

The Transaction Processing Performance Council (TPC) has created many benchmarks for different database workloads. The most commonly used benchmark for big data, data warehousing, and analytics is the Decision Support (DS) benchmark, or TPC-DS.

This benchmark consists of a star schema with 24 tables and 99 queries. Common parameters for benchmarking are 3TB of data, and the query execution of both one and five concurrent users.

The TPC-DS benchmark also includes more traditional DS activities like update statements. However, for Pivotal Greenplum, these activities are omitted from the quoted scores you can see in Figure 1, as they do not apply.

Simply put, the higher the score, the faster the cluster. Thanks to Pivotal Greenplum’s MPP architecture, more hardware will produce better results. Therefore, it’s most useful to compare the score in relation to the number of segment cores in the cluster.

Pivotal Greenplum-1

Figure 1 – TPC-DS benchmark scores as a function of instance size.

As you can see, Pivotal Greenplum on AWS can achieve better results than a comparably deployed on-premises appliance solution. Of course, there’s a price-performance balance you’ll need to strike, and your Pivotal account team can help you with that.

Note that the AWS vCPUs quoted above are hyper-threaded, so two vCPUs equate to a single physical core.

The best way to deploy Pivotal Greenplum is via AWS Marketplace. Follow the documentation, and your deployment will complete quickly, in less than an hour.

Pivotal Greenplum-2

Figure 2 – A typical Greenplum deployment on AWS.

Node Replacement

Pivotal Greenplum nodes are deployed on AWS using Auto Scaling Groups. Each group automatically provisions the number of nodes specified, and if a node fails for any reason, the Auto Scaling Group terminates the failed node and replaces it with a new one.

Pivotal Greenplum-3

Figure 3 – Failed Greenplum nodes replaced by Auto Scaling Group.

For data availability, Pivotal Greenplum uses mirroring, a concept similar to HDFS replication (three copies of the data). When a node fails, the Master node “promotes” the Mirror Segment to act as a Primary. After the new node comes online, the self-healing mechanism goes to work. It executes the commands needed to restore the system to its fully-functional state.

To ensure that user queries operate as normal during Segment recovery, the pgBouncer connection pooler pauses queries before Segments are rebalanced. This ensures that queries stay in the queue during Segment recovery.

Single Master Node Replacement

In an on-premises deployment of Pivotal Greenplum, a Standby Master node is recommended. This node is mostly idle; it’s there in case the Master node fails, ensuring continuity if and when the Master node is replaced.

Thanks to self-healing on AWS, the Standby Master process has been moved to the first Segment node as part of the automated AWS install process. Scripts within the Amazon Machine Image (AMI) assign roles to the nodes in the Auto Scaling Group. If the Master node fails, the Standby Master is temporarily promoted to act as the Master and, once the failed node has been replaced, demoted back to Standby Master. This is all done automatically.

Pivotal Greenplum-4

Figure 4 – The MDW distributes queries via the network interconnect to Segment nodes.

The Greenplum Database master (MDW) is the entry to the Greenplum Database system, accepting client connections and SQL queries, and distributing work to the segment instances (SDWn). When a user connects to the database via the Greenplum master and issues a query, processes are created in each segment database to handle the work of that query.
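For example, a client connecting to the Master with the standard psql tool issues queries exactly as it would against a single Postgres instance, while the Master fans the work out to the Segments. The hostname and database below are placeholders:

psql -h mdw.example.com -p 5432 -U gpadmin -d analytics \
    -c "SELECT region, SUM(amount) FROM sales GROUP BY region;"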

By carefully matching AWS instance types and storage usage, customers can optimize their AWS consumption and Pivotal Greenplum license spend whilst preserving or increasing performance.

Disk Snapshots

Amazon Elastic Block Store (Amazon EBS) volumes have a snapshot feature that is useful in backing up an EBS volume to Amazon Simple Storage Service (Amazon S3). EBS snapshots are stored in Amazon S3, but not in a user-visible bucket.

Pivotal Greenplum on AWS includes the gpsnap utility. This automates the execution of EBS snapshots in parallel for your entire cluster.

Pivotal Greenplum-5

Figure 5 – Making a gpsnap backup for a future possible restore.

Each disk gets a snapshot and is tagged so that gpsnap can be used to restore the snapshots to the correct nodes and mounts.

A backup can be created with gpsnap on AWS extremely quickly—typical execution times are around one minute. Snapshot performance is completely dependent on AWS, and Greenplum waits until all of the disk snapshots are in the “pending” or “completed” status before a database restart process kicks off.

The snapshots then have to complete, and that performance depends on how full the disks are and whether there are prior snapshots. The gpcronsnap utility automates the scheduled execution of backups and is pre-configured to execute weekly.
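gpsnap and gpcronsnap drive all of this for you. Purely as an illustration of the underlying EBS operation, a manual snapshot of a single data volume, tagged with its host, might look like the following (the volume ID and tag values are hypothetical):

aws ec2 create-snapshot \
    --volume-id vol-0abc1234def567890 \
    --description "Greenplum sdw1 data volume" \
    --tag-specifications 'ResourceType=snapshot,Tags=[{Key=Cluster,Value=greenplum},{Key=Host,Value=sdw1}]'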

Disaster Recovery

A great advantage of deploying Pivotal Greenplum on AWS is taking advantage of EBS snapshots for disaster recovery (DR).

Pivotal Greenplum-6

Figure 6 – With Greenplum, gpsnap data can be copied across AWS regions.

The aforementioned gpsnap utility can copy a snapshot from one region to another. You can then restore it to a new cluster when needed in a different region.
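Again, gpsnap handles this step for you. Under the hood, copying an EBS snapshot to another region amounts to a call like the following (the snapshot ID and regions are placeholders):

aws ec2 copy-snapshot \
    --region us-west-2 \
    --source-region us-east-1 \
    --source-snapshot-id snap-0123456789abcdef0 \
    --description "Greenplum DR copy"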

This is an on-demand, DR solution that is cost effective. You don’t need to add the cost and complexity of a second cluster.

Upgrading Pivotal Greenplum

Another cloud-only utility for Pivotal Greenplum is gprelease, which automates the upgrade of Pivotal Greenplum on AWS. It also upgrades optional packages, like MADlib, Command Center, and PostGIS.

The gpcronrelease utility runs weekly and will notify you when a new version is available. Even the cloud tools such as gpsnap and gprelease are upgraded with gprelease.

Automated Maintenance

Customers will enjoy peak performance for Pivotal Greenplum by following a few proven best practices, like analyzing, vacuuming, and reindexing.

All of these practices are combined in the gpmaintain utility, which automates many of the administrative tasks needed in a production database. The gpcronmaintain utility automates scheduled maintenance and can be easily configured to run more or less frequently.

Optional Installs

During the initial deployment of Pivotal Greenplum on AWS, many optional components are available. In Figure 7 below, you can see a few components that may interest data scientists and administrators, such as:

  • Greenplum Database provides a collection of data science-related Python modules that can be used with the Greenplum Database using PL/Python or PL/R languages.
  • Greenplum Command Center (GPCC) is a web-based application for monitoring and managing Greenplum clusters. GPCC works with data collected by agents running on the segment hosts and saved to the gpperfmon database.
  • MADlib is an open-source library for scalable in-database analytics. With the MADlib extension, you can use MADlib functionality in a Greenplum database.
  • PostGIS is a spatial database extension that allows GIS objects to be stored in a Greenplum database.

Pivotal Greenplum-7

Figure 7 – Greenplum install window.

After the deployment has completed, you can use the gpoptional utility to install or re-install any of these components to further customize your environment.

Web-Based phpPgAdmin

Pivotal Greenplum on AWS also includes phpPgAdmin, a web-based SQL tool. Business users, developers, and administrators use phpPgAdmin to perform ad hoc queries and browse schemas. It’s a handy utility for many common scenarios.

Pivotal has optimized phpPgAdmin for Pivotal Greenplum and created a Pivotal user interface theme. A self-signed SSL certificate is created during the deployment, so that traffic from your browser to the cluster is encrypted.

Pivotal Greenplum-8

Figure 8 – A self-signed or commercial SSL certificate encrypts client-Greenplum connections.

In Figure 8 above, you see a Pivotal Greenplum SSL connection encrypting a query using a self-signed certificate.

Security in Review

Security is paramount, so Pivotal has worked with AWS to incorporate a number of best practices. These capabilities are designed to reduce your risk and ensure compliance with common enterprise requirements.

The Pivotal Greenplum AMI is regularly reviewed and scanned for vulnerabilities. The AWS CloudFormation template is also reviewed by AWS Solutions Architects, offering additional protection.

We protect your credentials, too. SSH password authentication is disabled; we use SSH keys instead. Database logins use MD5-hashed password authentication, and root and password file logins are disabled.

Want data encryption at rest? It’s available via Amazon EBS encryption. As an added bonus, your snapshots are automatically encrypted if the source EBS volume is encrypted.

Lastly, all Greenplum deployments are created in a dedicated Amazon Virtual Private Cloud (VPC) to ensure network isolation and easier management of security rules.

Summary

In this post, we provided a stepwise discussion on why running Pivotal Greenplum on AWS is a compelling option for enterprise-scale analytics.

You can choose to leverage AWS for a Greenplum deployment to simplify operations compared with a traditional on-premises solution. By right-sizing the instance types selected during a highly automated CloudFormation execution, performance of the Greenplum database is comparable to, or greater than, an on-premises deployment. TPC-DS benchmark data helps align performance with instance pricing.

The AWS-deployed environment scales and “self heals” using Auto Scaling Groups, while day-to-day backups and disaster recovery (even across AWS regions) are possible by leveraging Amazon EBS snapshots combined with the Greenplum gpsnap tool.

Upgrading Greenplum is simplified using the cloud-only gprelease tool, whereas the core data science and other in-database analytics tools may be readily (re)configured using the gprelease and gpmaintain utilities.

Furthermore, optional installs provide a highly customized, customer-centric data science environment. The phpPgAdmin tool provides easy access to Greenplum databases to run queries and perform schema analysis via SSL, if needed.

Pivotal works closely with AWS to deploy and maintain a secure operating environment, and AWS Marketplace makes it simple for even small business groups to deploy Pivotal Greenplum on AWS.

You can learn more about Pivotal Greenplum in the eBook Data Warehousing with Greenplum, Second Edition.

AWS Competency Partners: The Next Smart

Pivotal is an AWS Competency Partner, and if you want to be successful in today’s complex IT environment and remain that way tomorrow and into the future, teaming up with an AWS Competency Partner is The Next Smart.

The Next Smart-APN Blog-1


Pivotal-Logo-1
Connect with Pivotal-1

Pivotal – APN Partner Spotlight

Pivotal is an AWS Competency Partner. They help the world’s largest companies transform the way they build and run software. Pivotal Greenplum is a commercial fully-featured MPP data warehouse platform powered by the open source Greenplum Database.

Contact Pivotal | Solution Overview | AWS Marketplace

*Already worked with Pivotal? Rate this Partner

*To review an APN Partner, you must be an AWS customer that has worked with them directly on a project.

from AWS Partner Network (APN) Blog

Introducing Amazon Forecast and a Look into the Future of Time Series Prediction

Introducing Amazon Forecast and a Look into the Future of Time Series Prediction

By Dr. Sami Alsindi, Data Scientist at Inawisdom

Inawisdom-Logo-1
Inawisdom-APN-Badge-2
Connect with Inawisdom-1
Rate Inawisdom-1

Time series forecasting is a common customer need, so a means to rapidly create accurate forecasting models is key to many projects. Amazon Forecast accelerates this and is based on the same technology used at Amazon.com. This post explores the use of this new service for energy consumption forecasting.

Inawisdom is an AWS Partner Network (APN) Advanced Consulting Partner with the AWS Machine Learning Competency. We work with organizations in a variety of industries to help them exploit their data assets.

Our goal at Inawisdom is to accelerate adoption of advanced analytics, artificial intelligence (AI), and machine learning (ML) by providing a full-stack of AWS Cloud and data services, from platform through data engineering, data science, AI/ML, and operational services.

We routinely work with time series data to perform forecasting for a variety of customer use cases, including personal financial predictions for consumers and predictive maintenance for manufacturers. Being able to project time series data into the future with a measure of confidence allows customers to make informed business decisions in a quantitative manner.

The Problem

One of the most exciting projects I have worked on at Inawisdom was with Drax, a UK-based energy supplier. The goal was to automatically detect anomalous energy consumption within their Haven Power retail business.

Across a portfolio of thousands of customers, each reporting their consumption every half hour, manually detecting consumption pattern changes and anomalous activity is difficult and time consuming.

The time taken to identify events that indicate faulty meters, safety issues, energy theft, and changes of tenancy results in inefficiencies and debt recovery challenges.

Current Solution

One very effective approach to create forecasts for electricity consumption is to use Amazon SageMaker’s built-in model DeepAR.

DeepAR is an LSTM-based neural network that can be used to forecast time series data, accounting for trends and seasonality so that the network can learn and give accurate forecasts.

The raw dataset we worked on consisted of millions of half-hourly energy consumption readings with years of data per customer. The results are impressive, but data wrangling took roughly two weeks in the initial phase of the project to create the forecasts.

From the created forecasts, anomalies for the previous week can be detected using another Amazon SageMaker built-in model—RandomCutForest (RCF)—on the differences from observed usage to predicted usage. To learn more, check out the case study for this project.

Inawisdom-Forecast-1.2

Figure 1 – Example of a Fault Drop anomaly.

In Figure 1, you can see an example of an automatically-detected anomaly with a week’s worth of electrical usage shown. In blue, we have the real consumption; in pink, the confidence interval from DeepAR is plotted, with the median shown as a line.

The uncharacteristic blip downwards is the 29th most significant anomaly; this triggers a classification procedure that has identified this pattern as a “Fault Drop.”

Inawisdom-Forecast-2.1

Figure 2 – Example of a Change of Tenancy anomaly.

Another example of a detected anomaly is shown in Figure 2. This time, continuous uncharacteristically low usage triggered the class of “Change of Tenancy.”

This is perhaps the most important business anomaly type that needs to be identified. The longer it has been since the customer moved out of the premises, the less likely the contact details Haven Power has for the customer are up to date. Consequently, there’s a lower chance of recovering the customer’s outstanding debt.

Integrating Amazon Forecast with Amazon SageMaker

Amazon Forecast is the new tool for automated time series forecasting. With Amazon Forecast, I was pleasantly surprised (and slightly irritated) to discover that we could accomplish those two weeks of work in about 10 minutes using the Amazon Web Services (AWS) console.

From my initial experiences, Amazon Forecast will be an extremely useful accelerator for any time series predictions, such as retail demand forecasting, freeing up the time of data scientists for more interesting things.

AWS has supplied a Software Development Kit (SDK) for full integration into Amazon SageMaker, and you can view the documentation and example Jupyter notebooks on GitHub. Using the graphical user interface (GUI), however, sidesteps this whole issue and is a lot easier.

To integrate Amazon Forecast with Amazon SageMaker, you first need to create a dataset group. All that’s required is a single TARGET_TIME_SERIES file containing the data as a row-wise .csv with three columns: timestamp, item_id, and a float that’s the target of the predictor model. You can also add ITEM_METADATA and RELATED_TIME_SERIES data.

Sticking with an electricity example, the TARGET_TIME_SERIES data will be hourly meter readings, the item_ids will correspond to individual meters, and the target float will be consumption in kWh. We could add to the ITEM_METADATA any groupings, such as Standard Industry Classification (SIC) codes that group similar businesses. Finally, RELATED_TIME_SERIES data could consist of weather data, for example.
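As a sketch of what that TARGET_TIME_SERIES file might contain, each row is simply a timestamp, an item_id, and the consumption value. The meter ID and readings below are made up:

2019-01-01 00:00:00,client_10,38.5
2019-01-01 01:00:00,client_10,41.2
2019-01-01 02:00:00,client_10,40.7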

Amazon Forecast handles the backend processing and transformation of these data, while you submit a job—this can take some time—and come back to your newly-parsed dataset. There is the option to automatically refresh the dataset the model is trained on, which is something that used to involve significant effort in setting up an AWS Step Function and several AWS Lambda functions to re-parse the data, or re-process it. Forecast takes the hard work away.

Inawisdom-Forecast-3

Figure 3 – Forecast datasets.

Once this is complete, you can train a predictor that can predict for up to one-third the duration of your dataset, with predictions starting for the time periods just after your dataset ends.

You define the forecast horizon (how many periods you want Amazon Forecast to look into the future) and the “recipe,” which can be one of the built-in predictor types such as DeepAR+, an evolution of DeepAR. However, you can forego the guesswork and allow Amazon Forecast to determine the optimal predictor automatically by choosing the AutoML option, which trains using all of the recipes and selects the one that best fits your dataset.

There is also the option to automatically, and periodically, retrain your predictive model. This also used to involve setting up AWS Step Functions and AWS Lambda functions, and again is made simple with Amazon Forecast.

In our case, we will first predict the next few days (72 hours):

Inawisdom-Forecast-4

Figure 4 – Train predictor parameters.
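If you prefer to work outside the console, roughly the same predictor can be requested through the AWS CLI. The predictor name and dataset group ARN below are placeholders, and the exact parameters may vary for your setup:

aws forecast create-predictor \
    --predictor-name energy_auto_predictor \
    --forecast-horizon 72 \
    --perform-auto-ml \
    --input-data-config DatasetGroupArn=arn:aws:forecast:eu-west-1:123456789012:dataset-group/energy \
    --featurization-config ForecastFrequency=H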

Once your predictor is trained, you can deploy it in order to make predictions.

Inawisdom-Forecast-5

Figure 5 – Predictor overview.

Once deployed, you can make predictions.

Inawisdom-Forecast-6

Figure 6 – Forecasting configuration.
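Once a forecast has been generated from the deployed predictor, you can also query individual series programmatically. As a rough sketch using the AWS CLI (the forecast ARN is a placeholder):

aws forecastquery query-forecast \
    --forecast-arn arn:aws:forecast:eu-west-1:123456789012:forecast/energy_forecast \
    --filters '{"item_id":"client_10"}'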

In Figure 7 below, you can see hourly predictions for the 72-hour period after the last of the data available for meter “client_10.” In grey and black, we have the original data, the tail end of the observed usage for this particular meter. In orange, we have the median (50 percent) prediction, and in green the upper confidence interval (90 percent).

Inawisdom-Forecast-7

Figure 7 – Forecast results (hourly).

Predictions can also be generated with lower frequency (e.g. daily) to see gradual trends. I have done this below with another predictor that calculates daily predictions.

Inawisdom-Forecast-8

Figure 8 – Forecast results (daily).

And, of course, all of the above can be carried out algorithmically or parametrically using Amazon SageMaker implementations, as well. The possibilities are limitless!

Conclusion

Amazon Forecast makes time series forecasting effortless, removing the need for the undifferentiated heavy-lifting aspects that usually underpin it.

Additionally, Amazon Forecast massively reduces the effort required to automate data updating and model retraining. It manages this while also retaining the granularity of control that data scientists will appreciate and utilize. If only this tool had arrived three months sooner for my previous project!

AWS continues to champion the democratization of advanced and cutting-edge machine learning models, with Amazon Forecast being a perfect example of abstracting away the difficulty of model selection with the AutoML mode.

At Inawisdom, we fully embrace these developments that allow us to provide ever greater business benefit to customers and facilitate more and more exciting projects. I can’t wait to see what comes along next. Perhaps I can forecast it.

The content and opinions in this blog are those of the third party author and AWS is not responsible for the content or accuracy of this post.


AWS Competency Partners: The Next Smart

Inawisdom is an AWS Competency Partner, and if you want to be successful in today’s complex IT environment and remain that way tomorrow and into the future, teaming up with an AWS Competency Partner is The Next Smart.

The Next Smart-APN Blog-2



Inawisdom-Logo-1
Connect with Inawisdom-1

Inawisdom – APN Partner Spotlight

Inawisdom is an AWS Machine Learning Competency Partner. Their ML practice enables customers to outperform the market by discovering value within their data through implementing advanced analytics, as well as AI and ML techniques.

Contact Inawisdom | Practice Overview

*Already worked with Inawisdom? Rate this Partner

*To review an APN Partner, you must be an AWS customer that has worked with them directly on a project.

from AWS Partner Network (APN) Blog

Building a Simple Serverless WebSocket with Stackery and AWS

Building a Simple Serverless WebSocket with Stackery and AWS

By AM Grobelny, Startup Partner Solutions Architect at AWS
By Chase Douglas, CTO at Stackery

Stackery-Logo-1
Stackery-APN-Badge-2
Connect with Stackery-1
Rate Stackery-1

With the addition of WebSocket support in Amazon API Gateway, we set out to build an AWS CloudFormation template for the simplest version of a WebSocket: one backed by an AWS Lambda function that responds directly to the client that initiated the request.

We learned quite a few things along the way, and even ended up with an easy solution in Stackery for managing and deploying one of these simple WebSockets.

Stackery is an AWS Partner Network (APN) Advanced Technology Partner with the AWS DevOps Competency. Its tool is dedicated to making serverless application development simpler.

In this post, we use the Stackery Canvas, which lets developers visually construct serverless applications and automatically generates CloudFormation templates. For the more terminal-driven people out there, Stackery also offers a command line interface (CLI) tool for deploying and managing serverless architectures.

To understand the distinction between one-way and two-way WebSockets with Amazon API Gateway, we’ll also discuss the alternative—a one-way communication WebSocket with messages sent to clients through the @connections API.

Overview

Our goal in this post is to build a very simple serverless WebSocket. Since this is a serverless app, we’ll use Stackery to visualize our serverless architecture and deploy directly to AWS.

Luckily, Stackery has a built-in resource for this kind of WebSocket, so it’s as easy as dragging and dropping a WebSocket onto the Stackery Canvas. To get started, sign up for a free account with Stackery.

The type of WebSocket we’re building needs two-way communication enabled, along with a route response and an integration response. This allows interaction with clients directly without the need for client connection IDs and the @connections API.

We’ll explore these configuration settings, the @connections API, and more in this post. If you’d rather skip the details and just deploy, see the section called “Deploying with Stackery” below.

Concepts

In this post, we’ll discuss the following concepts: connecting and disconnecting, sending messages, routing messages, one-way and two-way communication, and various CloudFormation types used with WebSockets in Amazon API Gateway.

First, as promised, it helps to have a general understanding of how WebSockets work with Amazon API Gateway. Let’s start with a general diagram:

Stackery-Serverless-1

Figure 1 – How WebSockets communicate using Amazon API Gateway.

Refer back to this diagram as we discuss the different parts of WebSockets in Amazon API Gateway.

Connections

Clients connect to a WebSocket via an Amazon API Gateway URI formatted like this: wss://<api-id>.execute-api.<region>.amazonaws.com/<api-stage>.

Once a client connects to a WebSocket, that connection stays open until the client terminates the connection or until your backend closes the connection. When a client connects to or disconnects from your WebSocket via this URI, event data is sent to special $connect and $disconnect routes.

Part of the data in these events includes unique connection IDs that identify connected clients and allow you to send response messages directly to that client through the @connections API.

Routing Messages

Clients communicate with your backend by sending messages over this connection. In a REST API, you’d typically map specific actions in your backend to individual paths in a URI, and clients would send requests to each of those URIs.

To achieve this with WebSockets, clients instead send messages formatted as JSON objects. The JSON object needs to include a predefined key of your choosing to be used for routing. Let’s add a route selector for this JSON object: {"action": "test"}.

You register a JSON key like “action,” and that key is evaluated for routing on every message sent through the WebSocket. With the routing key set, you declare specific routes with values for that routing key, like “test.”

In a CloudFormation template, the RouteSelectionExpression under the AWS::ApiGatewayV2::Api type sets a JSON routing key. The RouteKey under the AWS::ApiGatewayV2::Route type sets a route.

In our example, we set the RouteSelectionExpression to “$request.body.action,” which evaluates the “action” key. We register “test” under RouteKey and assign the Target for this route to be an AWS::ApiGatewayV2::Integration.

The AWS::ApiGatewayV2::Integration declares which part of your backend will fulfill client messages sent to this route. We’ll go into more depth on the AWS::ApiGatewayV2::Integration type later.

Here’s a snippet used to create a WebSocket in Amazon API Gateway:

Stackery-Serverless-2

And here’s a snippet used to create a route for a WebSocket in Amazon API Gateway:

Stackery-Serverless-3
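Those CloudFormation snippets aren’t reproduced as text here, but to give a feel for the same settings, roughly equivalent AWS CLI calls would look like this (the API ID and integration ID are placeholders):

aws apigatewayv2 create-api \
    --name SimpleWebSocket \
    --protocol-type WEBSOCKET \
    --route-selection-expression '$request.body.action'

aws apigatewayv2 create-route \
    --api-id abc123 \
    --route-key test \
    --target integrations/xyz789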

Respond to Clients with One- or Two-Way Communication

Now that we know how to get client requests to the right place, we need to make sure the backend can respond. Our goal is to respond directly to the client that initiated the request, but that’s not the only kind of response used with WebSockets in Amazon API Gateway.

We need to decide on one- or two-way communication for the WebSocket. These options pertain to how you send responses back to clients.

We need two-way communication enabled so we can respond directly to clients without using the @connections API. But first, let’s evaluate what one-way communication means and why you’d use it.

With one-way communication, your backend doesn’t directly respond to the messages clients send. Instead, when your backend gets invoked by a client request, your backend integration is responsible for sending messages back to clients via the @connections API by using one or more client connection IDs.

Stackery-Serverless-4

Figure 2 – How WebSockets communicate to a client using the @connections API.

Remember those special $connect and $disconnect routes? In addition to notifying you when clients connect to and disconnect from the WebSocket, these events also allow you to capture connection IDs that are generated for tracking and interacting with clients.

To manage these connection IDs, you register Lambda functions to handle the $connect and $disconnect events and store these connection IDs in a database like Amazon DynamoDB. For more information on connection IDs and the @connections API, see the documentation.

This allows for more complex communication, but adds the overhead of managing connection IDs. What if you only need to communicate directly back to the same client that sent a request? That’s where two-way communication comes in.

Stackery-Serverless-5

Figure 3 – How WebSockets communicate to a client directly.

The two-way communication configuration allows for Lambda to directly respond to a client connected via WebSocket. We enable two-way communication by supplying an AWS::ApiGatewayV2::RouteResponse type. For more information on the AWS::ApiGatewayV2::RouteResponse type, see the documentation.

Here’s a snippet used to create a route response for a WebSocket in Amazon API Gateway:

Stackery-Serverless-6
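As with the earlier snippets, a roughly equivalent AWS CLI call might look like this (the API and route IDs are placeholders); the $default route response key is what enables the two-way behavior for the route:

aws apigatewayv2 create-route-response \
    --api-id abc123 \
    --route-id route456 \
    --route-response-key '$default'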

Backend Integrations

Finally, we need an AWS::ApiGatewayV2::Integration to tie everything together. We’ve got client requests coming through the WebSocket connection and being routed, but we haven’t told the routing mechanism where to send messages.

With an integration, we declare which backend piece fulfills a route. We want Lambda to act as the backend for the “test” route, so here’s the CloudFormation:

Stackery-Serverless-7

Figure 4 – A snippet used to create a Lambda function and Amazon API Gateway integration.

It’s worth noting that choosing the IntegrationType of “AWS_PROXY” makes your Lambda function responsible for parsing the JSON event body sent to the WebSocket, and also that you must format your response in a JSON object that has a “body” key.

Deploying with Stackery

Writing CloudFormation wasn’t our goal, but we needed to learn about a lot of CloudFormation types, settings, and values on the journey to creating this WebSocket.

These detours on our trip bring us to Stackery’s motto: write functions, not YAML. We’ll be using the Stackery Canvas to visually create our architecture, but you can also utilize the Stackery CLI.

Before you do anything else, you need to create a Stackery account, link your AWS account to your Stackery account, and link a version control system to your Stackery account. This quick start guide gives great step-by-step instructions on how to get set up.

After setting up, create a new stack:

Stackery-Serverless-8

Figure 5 – Create a stack in your favorite Git provider.

You can create a new repo, or use an existing repo with your linked version control system. The code we use for the function is simple:

module.exports.handler = async (event) => {
    // Log the full WebSocket event for debugging
    console.log(JSON.stringify(event, null, 2));

    let echo = '';
    let connectionId = '';
    try {
        // The client's message arrives as a JSON string in event.body
        const message = JSON.parse(event.body);
        console.log(message);
        echo = message.echo || '';

        // The connection ID identifies the client that sent this message
        connectionId = event.requestContext.connectionId;
        console.log(connectionId);
    } catch (e) {
        console.log(e);
    }

    // With two-way communication enabled, the returned body is sent back to the client
    return {
        body: "Echoing your message: " + echo
    };
};

We are just returning a message that contains the string in the “echo” key in the JSON object sent by the client.

Then, it’s as simple as dragging and dropping a WebSocket.

Stackery-Serverless-9

Figure 6 – Drag and drop a WebSocket API into your stack.

Once the WebSocket exists on the Stackery Canvas, double-click to open the configuration. Add the “test” route.

Stackery-Serverless-10.1

Figure 7 – Add a “test” route to your WebSocket API.

Next, drag and drop an AWS Lambda function. Connect the route with the function by clicking the handle on the “test” route and dragging to the handle on the function.

Finally, double-click the function to verify the Source Path field matches the location of the code in the GitHub repo you’re using.

Stackery-Serverless-11

Figure 8 – Create and integrate a Lambda function to handle “test” route messages.

Now, you’re ready to deploy. Move to the Deploy section in Stackery and prepare a new deployment. After your code is packaged for you, click Deploy and you’re redirected to the CloudFormation console.

You can alternatively use the Stackery CLI for an easy way to deploy in one step using AWS credentials in your local environment.

Click Execute and monitor for when CloudFormation completes.

Stackery-Serverless-12

Figure 9 – Deploy the stack into your AWS account.

Finally, once the stack finishes creating, click View back in the Stackery dashboard.

Retrieve the URI for the deployed WebSocket by double-clicking the API Gateway resource in the Stackery Canvas.

Stackery-Serverless-13

Figure 10 – View the properties of your newly deployed WebSocket API.

You can use a tool like wscat to connect via this URI and send messages to the Lambda function that simply echoes back the value in the “echo” key.
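For example, assuming wscat is installed via npm and using the URI retrieved above (the API ID, region, and stage are placeholders), a quick test session might look like this:

npm install -g wscat
wscat -c wss://abc123.execute-api.us-west-2.amazonaws.com/prod
> {"action": "test", "echo": "hello"}
< Echoing your message: hello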

Stackery-Serverless-14

Figure 11 – Send a message to the test route of the API.

You can optionally commit all these template changes directly to your linked version control system.

Stackery-Serverless-15

Figure 12 – Save the stack in your Git repo.

And with that, you have a basic WebSocket ready to interact with clients.

Summary

Through this guide, we created the simplest serverless WebSocket in Amazon API Gateway that can directly communicate back to the client who sent a message through the WebSocket. We also explored how to manage client connections and send messages to clients via the @connections API.

By using Stackery, we were able to skip past writing our own AWS CloudFormation templates. Instead, we moved directly to deploying our two-way enabled WebSocket, and we could instead focus our efforts on writing code for our backend Lambda function.

For more information on WebSockets in Amazon API Gateway, see the documentation. Be sure to follow Stackery on Twitter to keep up with the latest features and tune in for their live events.


AWS Competency Partners: The Next Smart

Stackery is an AWS Competency Partner, and if you want to be successful in today’s complex IT environment and remain that way tomorrow and into the future, teaming up with an AWS Competency Partner is The Next Smart.

The Next Smart-APN Blog-1


Stackery-Logo-1
Connect with Stackery-1

Stackery – APN Partner Spotlight

Stackery is an AWS DevOps Competency Partner. Its operations dashboards and command line (CLI) tools provide sophisticated runtime support to the most complex serverless systems.

Contact Stackery | Solution Overview | AWS Marketplace

*Already worked with Stackery? Rate this Partner

*To review an APN Partner, you must be an AWS customer that has worked with them directly on a project.

from AWS Partner Network (APN) Blog

Migrating Data Warehouse Workloads from On-Premises Databases to Amazon Redshift with SnapLogic

Migrating Data Warehouse Workloads from On-Premises Databases to Amazon Redshift with SnapLogic

By Sriram Kalyanaraman, Product Manager at SnapLogic, Inc.
By Saunak Chandra, Sr. Solutions Architect at AWS

SnapLogic-Logo-2
Panoply-APN Badge-1.3
Connect with SnapLogic-1
Rate SnapLogic-1

Amazon Redshift is a fast, scalable, easy-to-use data warehouse solution built on massively parallel processing (MPP) architecture and columnar storage. It’s the most suitable solution for analytical workloads, and many organizations choose Redshift for running analytics on data requiring enhanced throughput and concurrency.

Business analysts and data scientists collect data from various systems, build pipelines for orchestration, and finally load the data into Redshift before doing any analysis.

SnapLogic is an easy-to-learn data integration tool that allows business analysts and integration specialists to accomplish data ingestion from various sources into Redshift. The SnapLogic Redshift Bulk Load Snap (pre-built connector) is part of the SnapLogic Intelligent Integration Platform and enables loading large volumes of data rapidly into Redshift all at once.

In this post, we’ll cover some features of SnapLogic that let you migrate schema and data using a simple and easy-to-learn visual designer. We’ll also describe the synchronization feature of SnapLogic that enables you to transfer data on an ongoing basis after the initial migration.

SnapLogic is an AWS Partner Network (APN) Advanced Technology Partner with the AWS Data & Analytics Competency.

Background

Many organizations run transactional workloads on-premises with databases such as MySQL. When they run analytical queries involving complex joins and aggregations on the same transactional databases, they often experience poor performance, because the high throughput requirements of analytical workloads contend with the transactional processing.

Organizations need better performance for their analytical workloads, which is something only a dedicated data warehouse application can provide. This is why many businesses are moving their data to Amazon Redshift, a cloud data warehouse optimized for analytics.

One of the biggest challenges when moving an on-premises MySQL database to a cloud-based one such as Redshift is performing a bulk data migration. During this bulk data migration, you have to be mindful of:

  • Conversion of the table schema with the associated data types for the table columns.
  • Conversion of database-native stored procedures.
  • Transferring data from the source to the target and pointing your analytical workload to the target system.

The last challenge listed above is particularly important because, after turning on the new system followed by the initial extraction, there’s a need to synchronize any new transactions that come through.

Typically, migrations are done through traditional extract, transform, load (ETL) tools. These approaches are onerous, forcing you to spend considerable time writing code and debugging software, and they require specialized skills. What’s more, the risk of schema errors is higher in these settings. Such approaches to bulk data migrations are expensive and time-intensive.

You can automate the migration process with an Integration Platform as a Service (iPaaS) solution like SnapLogic. This reduces the cost and effort of moving large volumes of data while closing the skills gap.

SnapLogic Designer

The screenshot in Figure 1 shows the SnapLogic Designer where you build the pipeline by dragging various widgets specific to Amazon Redshift. Users follow this sequence of steps to build an integration pipeline in SnapLogic:

  1. Log into SnapLogic Designer.
  2. Search for the Snap you are looking for.
  3. Drag-and-drop Snaps onto the canvas to build the integration pipeline.

Along the way, you can easily configure various options for Snaps, and the configuration for the Redshift Bulk Load Snap is shown later in this post.

Alternatively, if a pre-built pattern pipeline is available, you can simply reuse it to complete the integration.

SnapLogic-Redshift-1

Figure 1 – SnapLogic Pipeline Designer.

Systems Architecture

The diagram in Figure 2 shows the SnapLogic integration runtime, Groundplex, installed on Amazon Web Services (AWS) in the customer’s private subnet. The Redshift cluster is also up and running in the same subnet.

SnapLogic-Redshift-2

Figure 2 – Architecture for data migration to Amazon Redshift.

Redshift Bulk Load Snap: Under the Hood

SnapLogic provides a comprehensive platform to meet enterprise integration requirements, offering a high degree of flexibility and ease of use when migrating databases.

You can leverage SnapLogic’s unified platform for data integration, data migration, application integration, API management, and data engineering, among other capabilities, all catered to meet enterprise standards and requirements.

Whether you’re loading data from a MySQL database, Salesforce, or any other software-as-a-service (SaaS) or on-premises application, you can load data into Redshift efficiently using the low-code, no-code paradigm of the SnapLogic platform.

The Redshift Bulk Load Snap consumes data from an upstream source and writes it to a staging file on Amazon Simple Storage Service (Amazon S3). It does this by automatically representing the data in JSON format as it streams through the SnapLogic platform, without any manual intervention. It also takes care of schema compatibility checks between the source and target systems.

Subsequently, the Snap automatically runs the COPY command to insert data into a target Redshift table.

SnapLogic-Redshift-3

Figure 3 – COPY command initiated by the Redshift Bulk Load Snap.
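To make those two steps concrete, here is a minimal Python sketch of what staging data to S3 followed by a COPY looks like when done by hand. The bucket, table, cluster endpoint, and IAM role are hypothetical placeholders; the Snap performs the equivalent work for you, so treat this as an illustration of the mechanism rather than the Snap's actual implementation.

```python
import json
import boto3
import psycopg2

# --- Step 1: stage upstream records as a JSON file on Amazon S3 ---
# (the Snap does this automatically; bucket and key names are hypothetical)
records = [{"order_id": 1, "customer": "alpha"}, {"order_id": 2, "customer": "beta"}]
body = "\n".join(json.dumps(r) for r in records)  # newline-delimited JSON objects

s3 = boto3.client("s3")
s3.put_object(Bucket="my-staging-bucket", Key="staging/orders.json", Body=body.encode("utf-8"))

# --- Step 2: run COPY so Redshift ingests the staged file in bulk ---
conn = psycopg2.connect(
    host="my-cluster.example.us-east-1.redshift.amazonaws.com",
    port=5439, dbname="analytics", user="admin", password="REPLACE_ME",
)
with conn, conn.cursor() as cur:
    cur.execute("""
        COPY public.orders
        FROM 's3://my-staging-bucket/staging/orders.json'
        IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
        FORMAT AS JSON 'auto';
    """)
```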

If the target table does not exist when the Snap initiates the bulk load operation, the Snap automatically creates a table in the target schema with the necessary columns and datatypes to hold the first incoming document.

The Snap also provides an option to specify the target table’s metadata before creating the actual table. This ability provides even more flexibility for users that intend to migrate their data from a relational database management system (RDBMS) to Redshift, without having to first create a table in Redshift.

Effectively, the Snap allows you to replicate a table from one database to another. The Redshift Bulk Load Snap also allows you to control what data goes into the Redshift instance by configuring the Snap properties.
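As a rough illustration of what such schema replication involves, the hypothetical sketch below maps a handful of MySQL column types to Redshift types and emits a CREATE TABLE IF NOT EXISTS statement. The type map, column list, and table name are assumptions for the example only and do not reflect the Snap's internal logic.

```python
# Hypothetical sketch: deriving Redshift DDL from source column metadata.
MYSQL_TO_REDSHIFT = {
    "int": "INTEGER",
    "bigint": "BIGINT",
    "varchar": "VARCHAR(256)",
    "datetime": "TIMESTAMP",
    "decimal": "DECIMAL(18,2)",
}

# Illustrative source metadata, e.g. read from MySQL's information_schema.
source_columns = [("order_id", "int"), ("customer", "varchar"), ("ordered_at", "datetime")]

column_defs = ", ".join(
    f"{name} {MYSQL_TO_REDSHIFT.get(mysql_type, 'VARCHAR(65535)')}"
    for name, mysql_type in source_columns
)
ddl = f"CREATE TABLE IF NOT EXISTS public.orders ({column_defs});"
print(ddl)
# CREATE TABLE IF NOT EXISTS public.orders
#   (order_id INTEGER, customer VARCHAR(256), ordered_at TIMESTAMP);
```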

SnapLogic-Redshift-4

Figure 4 – Configuration for SnapLogic Redshift Bulk Load Snap.

Overall, this is a feature-rich Snap that has multiple options to handle nearly every use case.

Here are a few key highlights:

  • As part of ensuring the validity of upstream data, the Snap gives users the option to validate input data so that non-flat map data is handled gracefully.
  • The Truncate data option on the Snap allows users to truncate the existing target table’s data before initiating the bulk load operation.
  • To improve the efficiency of the bulk load operation, you can adjust Parallelism as necessary based on the capacity of the Redshift cluster. A value greater than “1” (say, N) makes the Snap consume upstream data and create “N” staging files on Amazon S3, followed by concurrent executions of the COPY command, thereby reducing execution time for the bulk load operation.
  • It’s important to clean up tables after a bulk delete, bulk load, or a series of updates. The Vacuum command can be run against the entire database or individual tables.

The Snap supports multiple options for Vacuum type such as FULL, SORT ONLY, DELETE ONLY, and REINDEX.
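The sketch below shows what those maintenance commands look like when issued directly against Redshift from Python; in practice the Snap runs the selected Vacuum type for you. The cluster endpoint, credentials, and table name are placeholders, and autocommit is enabled because Redshift does not allow VACUUM inside a transaction block.

```python
import psycopg2

conn = psycopg2.connect(
    host="my-cluster.example.us-east-1.redshift.amazonaws.com",
    port=5439, dbname="analytics", user="admin", password="REPLACE_ME",
)
conn.autocommit = True  # Redshift cannot run VACUUM inside a transaction block

with conn.cursor() as cur:
    # Each statement corresponds to one of the Vacuum types exposed by the Snap.
    cur.execute("VACUUM FULL public.orders;")         # re-sort rows and reclaim space
    cur.execute("VACUUM SORT ONLY public.orders;")    # re-sort without reclaiming space
    cur.execute("VACUUM DELETE ONLY public.orders;")  # reclaim space left by deleted rows
    cur.execute("VACUUM REINDEX public.orders;")      # reanalyze interleaved sort keys, then vacuum
```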

In addition, certain Snaps such as Redshift Bulk Load, Redshift Unload, Redshift Bulk Upsert, and Redshift S3 Upsert support AWS Key Management Service (KMS) encryption.

If “Server-Side” KMS encryption is selected, output files written to Amazon S3 are encrypted using SSE-S3.
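For context on what those server-side options correspond to at the Amazon S3 API level, here is a hypothetical boto3 sketch that writes one staging object with SSE-S3 and another with SSE-KMS. The bucket, key, and KMS key ARN are placeholders; this is not SnapLogic's code, only the underlying S3 encryption parameters.

```python
import boto3

s3 = boto3.client("s3")

# SSE-S3: Amazon S3 manages the encryption keys.
s3.put_object(
    Bucket="my-staging-bucket",
    Key="staging/orders-sse-s3.json",
    Body=b"example payload",
    ServerSideEncryption="AES256",
)

# SSE-KMS: encrypt with an AWS KMS key (the key ARN below is a placeholder).
s3.put_object(
    Bucket="my-staging-bucket",
    Key="staging/orders-sse-kms.json",
    Body=b"example payload",
    ServerSideEncryption="aws:kms",
    SSEKMSKeyId="arn:aws:kms:us-east-1:123456789012:key/00000000-0000-0000-0000-000000000000",
)
```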

Inside SnapLogic

In this example, we’ll migrate data from MySQL to Amazon Redshift. This integration pipeline leverages the Redshift Bulk Load Snap to load the source data into a target table.

SnapLogic-Redshift-5

Figure 5 – Integration pipeline that migrates data from MySQL database to Redshift.

For convenience, SnapLogic lets users review the following pipeline execution statistics for every Snap in the pipeline, all from a single interface:

  • Pipeline execution duration.
  • CPU and memory consumption.
  • Total documents processed.
  • Rate at which the documents were processed.

SnapLogic-Redshift-6

Figure 6 – Pipeline execution statistics for the MySQL to Redshift pipeline.

For use cases involving a massive data migration, such as an initial migration from another database, the Redshift Bulk Load Snap is efficient because it abstracts behind-the-scenes complexities.

The main purpose of the Redshift Bulk Load Snap is to fetch massive volumes of data in chunks or batches from the source system and write it to Redshift, the target system.

In this example, the Bulk Load Snap was 30 times faster than a query-based insert operation into Redshift, though performance will vary based on factors such as:

  • Volume of data to be loaded.
  • Hardware availability on the SnapLogic Snaplex (execution) node.
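For reference, the “query-based insert” baseline in that comparison looks roughly like the hypothetical sketch below, which pushes rows through ordinary INSERT statements over a standard connection. Redshift is not optimized for this row-by-row pattern, which is why the staged COPY approach shown earlier is dramatically faster. Connection details and table names are placeholders.

```python
import psycopg2

conn = psycopg2.connect(
    host="my-cluster.example.us-east-1.redshift.amazonaws.com",
    port=5439, dbname="analytics", user="admin", password="REPLACE_ME",
)

rows = [(1, "alpha"), (2, "beta"), (3, "gamma")]

with conn, conn.cursor() as cur:
    # Query-based insert: each batch of rows is sent as individual INSERT
    # statements, a pattern Redshift handles far less efficiently than COPY.
    cur.executemany(
        "INSERT INTO public.orders (order_id, customer) VALUES (%s, %s);",
        rows,
    )
```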

Synchronization and Fail-safe Execution

After the initial bulk load, customers can keep the source system in operation and keep it synchronized with Redshift. This is done by scheduling a batch workload for your SnapLogic pipelines.

You control the frequency of the batch execution to meet business requirements. For example, you can schedule this pipeline to run every hour, or at the end of the business day, to upload incremental changes to Redshift seamlessly.

The pipeline shown in Figure 7 can help you determine changes made to a record using the Redshift SCD2 (Slowly Changing Dimensions) Snap, and upsert new or updated records.

Also, for both batch and real-time integration use cases that move data from a source system to Redshift, SnapLogic lets you create APIs that can be consumed within or outside the enterprise to automate the business logic.

SnapLogic-Redshift-7

Figure 7 – Pipeline to synchronize data with Redshift after initial upload.
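Conceptually, an upsert into Redshift after an incremental extract often follows a staging-table pattern like the hypothetical sketch below: delete target rows that have a newer version in staging, then insert everything from staging. This is a common hand-rolled pattern, not necessarily how the Redshift SCD2 or Bulk Upsert Snaps work internally, and all table names and connection details are placeholders.

```python
import psycopg2

conn = psycopg2.connect(
    host="my-cluster.example.us-east-1.redshift.amazonaws.com",
    port=5439, dbname="analytics", user="admin", password="REPLACE_ME",
)

with conn, conn.cursor() as cur:
    # 1. The incremental extract has already been loaded into a staging table
    #    (for example via COPY, as shown earlier).
    # 2. Delete target rows that have a newer version in staging.
    cur.execute("""
        DELETE FROM public.orders
        USING public.orders_staging s
        WHERE public.orders.order_id = s.order_id;
    """)
    # 3. Insert all staged rows, covering both updated and brand-new records.
    cur.execute("INSERT INTO public.orders SELECT * FROM public.orders_staging;")
    # 4. Clear the staging table for the next scheduled run
    #    (note: TRUNCATE commits immediately in Redshift).
    cur.execute("TRUNCATE public.orders_staging;")
```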

SnapLogic provides a number of features to recover from errors. The platform makes it easy to identify and resolve pipeline errors with error outputs for all the Snaps.

In case of network failures, SnapLogic automatically retries the scheduled pipelines that were unable to execute. Additionally, the platform provides resumable pipelines that help recover from source or target endpoint failures to provide exactly-once guaranteed data delivery.

Customer Success: Kaplan, Inc.

As a cloud-based education company, Kaplan, Inc. leverages the AWS platform to drive their big data initiative. Before SnapLogic, Kaplan forged integrations using a data virtualization technology along with a couple of off-the-shelf products, all of which required an exorbitant amount of time to derive the insights needed.

At the same time, they were undergoing an expansion of their big data strategy and were digitally transforming their company. To help in this effort, they sought a partner with an iPaaS that was flexible, scalable, had a shallow learning curve, and required minimal maintenance.

Kaplan also sought a solution that complied with their security standards and policies. The organization found these capabilities with SnapLogic, and today Kaplan has successfully created their own data lake within Amazon Redshift and ingests data from multiple sources, including ones that are part of the AWS ecosystem.

Kaplan has integrated more than 50 applications in less than a year, and plans to integrate 40 more applications over the next year. The platform ingests 20-30 million new records per day, while columnar compression has helped keep storage demands to under three terabytes.

Summary

In this post, we covered why you need Amazon Redshift, a cloud-based data warehouse, and some of the challenges faced when migrating data from an on-premises database such as MySQL.

We also covered how SnapLogic can help you migrate and subsequently synchronize on-premises based databases with Redshift, and how SnapLogic’s connector for Redshift operates under the hood. Finally, we outlined how Kaplan has leveraged AWS services and the SnapLogic platform to drive their big data initiative.

If your organization is interested in learning more about how SnapLogic works with Amazon Redshift and other AWS services, please sign up for a free trial. The trial lets you test every feature of the Redshift Snap Pack and explore other aspects of the SnapLogic Intelligent Integration Platform.

AWS Competency Partners: The Next Smart

SnapLogic is an AWS Competency Partner, and if you want to be successful in today’s complex IT environment and remain that way tomorrow and into the future, teaming up with an AWS Competency Partner is The Next Smart.

The Next Smart-APN Blog-1



SnapLogic-Logo-2
Connect with SnapLogic-1

SnapLogic – APN Partner Spotlight

SnapLogic is an AWS Data & Analytics Competency Partner. Through its visual, automated approach to integration, SnapLogic uniquely empowers business and IT users to accelerate integration needs for applications, data warehouse, big data, and analytics initiatives.

Contact SnapLogic | Solution Overview

*Already worked with SnapLogic? Rate this Partner

*To review an APN Partner, you must be an AWS customer that has worked with them directly on a project.

from AWS Partner Network (APN) Blog

Shifting Away from Legacy Data Systems Helps Companies Tap into Their Most Vital Resource

Shifting Away from Legacy Data Systems Helps Companies Tap into Their Most Vital Resource

By Jason Harris, Evangelist at Panoply

Panoply-Logo-2
Panoply-APN Badge-1.3
Connect with Panoply-1
Rate Panoply-1

Have you ever noticed that when businesses talk about data, they often use metaphors having to do with water?

Data “flows” throughout an organization and has multiple “streams.” We build “pipelines” to get data where it needs to go. Data can “trickle in,” or an unorganized approach can leave hapless users “drowning” in data.

There’s a good reason why companies talk like that. Just like water is the stuff of life, data is the lifeblood of any sophisticated business today. In the same way plumbing makes modern life possible, a well-maintained data infrastructure is a crucial foundation on which the health of a business depends.

In this post, I will examine how replacing an inflexible, legacy data system with a cloud data warehousing solution can open the “floodgates” to new opportunities.

We’ll take a close look at the experiences of Fresh Water Systems, a business based in Greenville, South Carolina, that recently worked with Panoply to fully modernize its approach to data management.

Panoply is an AWS Partner Network (APN) Advanced Technology Partner with the AWS Data & Analytics Competency and Amazon Redshift Service Delivery designation.

Built for the cloud, Panoply delivers fast time to insights by eliminating the development and coding typically associated with transforming, integrating, and managing big data.

Wringing Out Value from Legacy Systems

Making water safe for people to use is the mission of Fresh Water Systems, a company that was founded 30 years ago and started out by installing and servicing commercial water coolers.

Today, Fresh Water Systems offers thousands of water filtration, purification, and treatment products from an array of manufacturers. For example, almost 15,000 pharmacies in the U.S. rely on Fresh Water Systems as the exclusive provider of a water treatment and dispensing system that meets quality requirements for reconstituting and compounding medications.

But as the company grew, its data infrastructure didn’t—at least not at first. Before Fresh Water Systems implemented a cloud data warehousing solution, their data was managed using a legacy system, says John Wessel, the company’s director of IT and data manager. This system was inflexible, making new integrations and connections to other data sources resource-intensive.

As Fresh Water Systems has grown, the amount of data and platforms they leverage has increased as well. The company needed a data warehouse that would adapt and grow with them into the future. Wessel began leading the implementation of a much-needed cloud data management platform.

Diving into a Modern Approach

Wessel proposed a vision of cloud-based data management to his team and was faced with initial opposition. They had been reporting out of the production ERP database (Microsoft Great Plains) and needed some extra convincing.

Rachel Thurmes, Fresh Water’s business intelligence and marketing analyst, quickly saw the business value and benefit in harnessing insights immediately from new data sources such as Facebook, Google AdWords, and Google Analytics.

After assessing the current load on the production database, Josh Sutphin, operations engineer at Fresh Water Systems, also came to support the new data analytics management solution. The team knew something had to change, and the timing seemed right.

“We wanted something fully hosted, columnar, and we preferred Panoply because it’s built as a layer on top of Amazon Redshift,” says Wessel.

When implementing Panoply, Wessel immediately noticed the solution’s speed and ease of use. Of the setup process, he shared, “Once I started using Panoply, I added data sources and data was pulled in and accessible within minutes. I realized I had to do zero ETL. That was huge.”

Panoply-Fresh-Water-2

Figure 1 – Data flow within Fresh Water Systems that powers analytics in Mode via Panoply.

Previously, the team had used Microsoft SSIS to run dozens of Extract, Transform, Load (ETL) packages. Unfortunately, these packages would fail and it was rather laborious to fix them.

On one occasion, it took the team several hours to find a server with the correct version of Business Intelligence Development Studio to open one of the legacy .dtsx packages for repair. Even after the package was opened, it was often hard to diagnose problems because none of the original authors were still on Fresh Water Systems’ IT team.

In addition to Panoply, Fresh Water Systems also implemented Stitch Data for ETL, which opened up many data sources for easy access. Throughout its data management, the company uses many of Panoply’s native data connectors, such as web analytics, e-commerce, and digital ads platforms, to trace how performance drives browsing and purchasing on Fresh Water Systems’ store.

Data on Tap, Ready on Demand

For Fresh Water Systems, the benefits of implementing a cloud data warehouse solution have included time-savings, resource management, and data democratization.

Now that Fresh Water Systems operates on a cloud-based data management solution, the company has optimized IT staff time and project management, according to Wessel.

“We have strategic partners now, such as Panoply, and have saved many costs and long-term headaches,” says Wessel. “Previously, we had to maintain database indexes, conduct nightly backups, plan for disaster recovery scenarios, and watch for blocking in the production database.

“On one particular day, the entire ERP system had resources deadlocked due to a larger report being run,” he adds. “This resulted in not being able to take phone orders for a period of time. We are glad to be past that.”

Panoply-Fresh-Water-1

Figure 2 – Fresh Water Systems’ activity-based report shows daily activity against contributions to the overall goal.

Fresh Water Systems has also been able to access its e-commerce data seamlessly. The company uses Shopify, and with just a few clicks, the business was able to access its data for querying and analysis. Wessel says, “I easily saved two months of work with that single data integration alone.”

Finally, the company’s marketing department is able to access data easily and frequently for key performance indicator (KPI) monitoring and reporting. Company executives have even begun exploring data because it’s so much more accessible now, and more accurate.

Next Steps

Fresh Water Systems’ IT team is excited about adding more intelligence to their business layer using DBT (Data Build Tool). In combination with Panoply, DBT will provide a robust layer that acts as a single source of truth for all departments to be able to run their respective areas.

“Two years ago, we never thought we could move this fast, and here we are now,” Wessel says proudly.

Soaking in the Success

In teaming up with Panoply, Fresh Water Systems found multiple benefits from letting go of a legacy system that was holding them back. Upgrading your company’s approach to smart data management can drive results that weren’t possible before, including:

  • Quicker access to data across teams and job functions.
  • Increased accessibility, and ease of sharing data with leadership.
  • More accurate data to generate better insights.
  • Integration among multiple sources of data.
  • Streamlined IT operations, freeing up engineers for more valuable work.

It’s clear that upgrading and modernizing your data pipeline can integrate multiple streams into one place, and take your data-driven insights from a trickle to a flood of new knowledge.

The content and opinions in this blog are those of the third party author and AWS is not responsible for the content or accuracy of this post.


AWS Competency Partners: The Next Smart

Panoply is an AWS Competency Partner, and if you want to be successful in today’s complex IT environment and remain that way tomorrow and into the future, teaming up with an AWS Competency Partner is The Next Smart.

The Next Smart-APN Blog-2



Panoply-Logo-2
Connect with Panoply-1

Panoply – APN Partner Spotlight

Panoply is an AWS Data & Analytics Competency Partner. Built for the cloud, Panoply delivers fast time to insights by eliminating the development and coding typically associated with transforming, integrating, and managing big data.

Contact Panoply | Solution Overview | AWS Marketplace

*Already worked with Panoply? Rate this Partner

*To review an APN Partner, you must be an AWS customer that has worked with them directly on a project.

from AWS Partner Network (APN) Blog