Author: ifttt

Dremio Speeds Data Lake Queries on AWS Cloud


Data lake specialist Dremio today announced a new offering that speeds queries on cloud-hosted platforms, including Microsoft Azure and Amazon Web Services Inc. (AWS).

Crucial to Big Data analytics and other data-centric scenarios, data lakes allow for the storage of huge amounts of data in various formats in a flat architecture such as “blobs” or files, as opposed to relational and other data stores.

However, quickly querying such data, especially in the cloud on storage mechanisms such as Amazon S3, can be problematic, an issue addressed by the Dremio Data Lake Engines for AWS, Azure and Hybrid Cloud.

The company said its patent-pending technology accelerates query execution of various forms of data — including JSON, text-delimited (CSV) and more — directly from the data lakes, with no need to load it into data warehouses or other systems.

The new version of the company’s open source platform speeds query execution through means such as columnar caching, predictive pipelining and a new execution engine kernel said to boost performance by up to 70 times.

For security, the platform natively supports enterprise AWS offerings such as AWS Secrets Manager, Multiple AWS IAM Roles, Server-Side Encryption with AWS KMS–Managed Keys and more.

“Organizations recognize the value of being able to quickly leverage data and analytics services to further their data-driven initiatives,” the company quoted Mike Leone, senior analyst at Enterprise Strategy Group, as saying. “But it’s more important than ever to start with a strong data foundation, especially one that can simplify the usage of a data lake to enable organizations to maximize data availability, accessibility, and insights. Dremio is addressing this need by providing a self-sufficient way for organizations and personnel to do what they want with the data that matters, no matter where that data is, how big it is, how quickly it changes, or what structure it’s in.”

The new edition of Dremio’s Data Lake Engine is available now.

About the Author

David Ramel is an editor and writer for Converge360.

from News

Serverless vs. Containers


The choice between containers and serverless depends on your use case needs

Most modern applications are built on either serverless or container technology, yet it is often difficult to choose the option best suited to a particular requirement.


In this article, we will look at how the two approaches differ and in which scenarios to use one or the other.

Let's start with the basics of serverless and container technology.

What Is Serverless Computing?

Serverless is a development approach that replaces long-running virtual machines with computing power that comes into existence on demand and disappears immediately after use.

Despite the name, there certainly are servers involved in running your application. It’s just that your cloud service provider, whether it’s AWS, Azure, or Google Cloud Platform, manages these servers, and they’re not always running.

It attempts to resolve issues such as:

  • Unnecessary charges for keeping a server up even when no resources are being consumed.
  • Overall responsibility for the maintenance and uptime of the server.
  • Responsibility for applying the appropriate security updates to the server.
  • The need to scale the server up as usage grows and, conversely, scale it down when usage drops.
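To make the model concrete, here is a minimal sketch of what a serverless unit of work looks like as an AWS Lambda handler written in Python. The handler signature and response shape follow Lambda's standard Python programming model; the greeting logic and the "name" field are purely illustrative:

import json

def handler(event, context):
    # Lambda invokes this entry point on demand; you never provision or manage the server.
    name = event.get("name", "world")  # illustrative input field
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }

The function exists only for the duration of an invocation, and you are billed for execution time and memory rather than for idle servers.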

What Are Containers?

A container is a lightweight, stand-alone, executable package of a piece of software that includes everything needed to run it: code, runtime, system tools, system libraries, and settings.

Containers solve the problem of software running reliably when it is moved from one computing environment to another by isolating it from its surroundings. For instance, containers allow you to move software from development to staging and from staging to production and have it run reliably despite the differences between those environments.
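As a small illustration of that portability, the sketch below runs a command inside a container from Python using the Docker SDK (the docker package). It assumes Docker is installed and running locally; the image and command are arbitrary examples:

import docker

client = docker.from_env()  # connect to the local Docker daemon

# Run a throwaway container from a public image; the same image behaves
# identically on a laptop, a CI runner, or a production host.
logs = client.containers.run(
    "python:3.9-slim",
    ["python", "-c", "print('hello from inside a container')"],
    remove=True,  # clean up the container after it exits
)
print(logs.decode().strip())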

Serverless vs. Containers

To start with, both serverless and containers are elements of an architecture designed for future change and for leveraging the latest innovations in cloud computing. Although people often compare Docker containers and serverless computing, the two have very little in common: they are not the same thing and serve different purposes. First, let's go over what they do share:

  1. Less overhead
  2. High performance
  3. Less interaction at the infrastructure level when provisioning

Although serverless is the newer technology, both approaches have drawbacks as well as benefits that keep them useful and relevant. So let's review the two.

The comparison below contrasts serverless (AWS Lambda) with containers (Amazon ECS) across several aspects.

Longevity

Serverless: Lambda functions are short-lived. Once execution finishes, a function spins down, and Lambda enforces a timeout of 15 minutes, so long-running workloads cannot run on it directly. Step Functions can break long-running application logic into smaller steps (functions), but that approach does not suit every kind of long-running application.

Containers: ECS provides long-running containers that can run for as long as you want.

Throughput

Serverless: For a high-throughput application, Lambda usually costs more than a container-based solution. Such an application needs more memory and longer execution times, and because Lambda charges by memory and execution time, the cost multiplies. In addition, a single function is limited to 3 GB of memory, so high throughput may require many concurrent executions, which can introduce latency from cold starts. For lower throughput, Lambda is a good choice in terms of cost, performance, and time to deploy.

Containers: ECS uses EC2 instances to host applications. EC2 handles high throughput more effectively than serverless functions because you can choose an instance type sized for the workload, usually at lower cost, and latency is better when a single instance can absorb the load. EC2 also works well for lower throughput; when comparing it with Lambda there, weigh the other factors in this comparison.

Scaling

Serverless: Lambda has auto-scaling built in.

  • It scales functions through concurrent executions.
  • There is a default account-level limit of 1,000 concurrent executions.
  • Horizontal scaling is very fast, although cold starts add a small amount of latency.

Containers: Containers have no hard constraint on scaling. However:

  • You need to forecast the scaling requirements.
  • Scaling has to be designed and configured manually, or automated through scripts.
  • Scaling containers is slower than scaling Lambda.
  • The more worker nodes you run, the more maintenance problems you add, such as handling latency and throttling issues.

Time to Deploy

Serverless: Lambda functions are small and deploy in significantly less time than containers, taking milliseconds rather than the seconds a container needs.

Containers: Containers take significant time to configure initially because of system settings and libraries. Once configured, however, they deploy in seconds.
Cost

Serverless: With serverless, infrastructure is not used unless the application is invoked, so you pay only for the compute capacity the application consumes while it runs. This can be cost-effective in scenarios such as:

  • The application is used rarely (once or twice a day).
  • Traffic fluctuates frequently, so the application scales up and down often.
  • The application needs few resources to run. Because Lambda cost depends on memory and execution time, it almost always beats a container running 24 hours a day in this case.

Containers: Containers run constantly, so cloud providers charge for the server capacity even when no one is using the application. If throughput is high, containers are more cost-effective than Lambda. Note also that an ECS cluster itself is free, whereas an EKS cluster is not. A rough cost sketch follows this section.
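As a rough illustration of that trade-off, the sketch below compares monthly Lambda charges with a small always-on EC2 instance. The prices are representative us-east-1 list prices at the time of writing and the traffic figures are invented; substitute current pricing and your own workload numbers:

# Assumed figures; check current AWS pricing before relying on these.
LAMBDA_GB_SECOND = 0.0000166667        # USD per GB-second
LAMBDA_PER_REQUEST = 0.20 / 1_000_000  # USD per request
EC2_T3_MEDIUM_HOURLY = 0.0416          # USD per hour, on-demand

def lambda_monthly_cost(requests, avg_duration_s, memory_gb):
    compute = requests * avg_duration_s * memory_gb * LAMBDA_GB_SECOND
    return compute + requests * LAMBDA_PER_REQUEST

def ec2_monthly_cost(hours=730):
    return EC2_T3_MEDIUM_HOURLY * hours

# Low traffic: 100k requests/month at 200 ms and 512 MB; Lambda wins easily.
print(f"Lambda, low traffic:  ${lambda_monthly_cost(100_000, 0.2, 0.5):.2f}")
# High traffic: 50M requests/month at 200 ms and 512 MB; the always-on instance is cheaper.
print(f"Lambda, high traffic: ${lambda_monthly_cost(50_000_000, 0.2, 0.5):.2f}")
print(f"EC2 t3.medium:        ${ec2_monthly_cost():.2f}")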

Security

Serverless: With Lambda, AWS takes care of system-level security; you handle application-level security through IAM roles and policies. If the function runs inside a VPC, VPC-level security applies as well.

Containers: With containers, you are also responsible for applying the appropriate security updates to the server, including patching the OS and upgrading software and libraries. ECS supports IAM Roles for Tasks, which is a convenient way to grant containers access to AWS resources, for example to let containers access S3, DynamoDB, SQS, or SES at runtime. EKS does not provide IAM-level security at the pod level.

Vendor Lock-In

Serverless: Serverless functions bring vendor lock-in; moving from Lambda functions to Azure Functions requires significant changes at the code and configuration level.

Containers: Containers are designed to run on any platform that supports container technologies, so you can build once and run anywhere. However, the services used for security (IAM, KMS, security groups, and others) are tightly coupled with AWS, so moving the workload to another platform still requires some rework.

Infrastructure Control

Serverless: If a team lacks infrastructure skills, Lambda is a good option: the team can concentrate on business-logic development and let AWS handle the infrastructure.

Containers: Containers give full control of the server, OS, and network components, within the limits set by the cloud provider. If an application or system needs fine-grained control of its infrastructure, this solution works better.

Maintenance

Serverless: Lambda needs no maintenance work; everything at the server level is taken care of by AWS.

Containers: Containers need maintenance such as patching and upgrading, which also requires skilled people. Keep this in mind when choosing this architecture for deployment.

State Persistence

Serverless: Lambda is designed to be stateless and short-lived, so it does not maintain state between invocations. That rules out in-memory caching, which can cause latency problems.

Containers: Containers can take advantage of caching.
Latency and Startup Time

Serverless: For Lambda, cold-start and warm-start times are key factors to consider; they can add latency as well as cost to function executions.

Containers: Because containers are always running, there is no cold or warm start, and caching can reduce latency further. Compared with EKS, ECS has no proxy at the node level; load balancing happens directly between the ALB and the EC2 instances, so there is no extra hop of latency.

VPC and ENI

Serverless: If Lambda is deployed in a VPC, its concurrent execution is limited by the ENI capacity of the subnets. The number of ENIs per EC2 instance ranges from 2 to 15 depending on the instance type.

Containers: In ECS, each task is assigned a single ENI, so an EC2 instance can host a maximum of about 15 such tasks.

Monolithic Applications

Serverless: Lambda is not a fit for monolithic applications and cannot run complex applications of that kind.

Containers: ECS can be used to run a monolithic application.

Testing

Serverless: Testing is difficult in serverless-based web applications because it is often hard for developers to replicate the backend environment locally.

Containers: Since containers run on the same platform where they are deployed, it is relatively simple to test a container-based application before deploying it to production.

Monitoring

Serverless: Lambda can be monitored through CloudWatch and X-Ray. You rely on the cloud vendor for monitoring capabilities, but infrastructure-level monitoring is not required.

Containers: Container monitoring needs to capture availability, system-error, performance, and capacity metrics in order to configure high availability for the container applications.

When to Use Serverless

Serverless Computing is a perfect fit for the following use cases:

  1. The team doesn't want to spend much time thinking about where the code runs and how.
  2. The team doesn't have skilled infrastructure engineers and is worried about the cost of maintaining servers and the resources the application consumes.
  3. The application's traffic pattern changes frequently; serverless handles this automatically and even scales down to nothing when there is no traffic at all.
  4. You want to write and deploy a website or application without the work of setting up infrastructure, which makes it possible to launch a fully functional app or site in days.
  5. The team needs a small batch job that can finish within Lambda's limits.

When to Use Containers

Containers are best to use for application deployment in the following use cases:

  • The team wants an operating system of its own choice and full control over the installed programming language and runtime version.
  • The team needs software with specific version requirements.
  • The team is willing to bear the cost of running large, traditional servers for workloads such as web APIs, machine-learning computations, and long-running processes; containers will generally cost less than full servers anyway.
  • The team wants to develop new container-native applications.
  • The team needs to refactor a very large, complicated monolithic application; containers are better suited to complex applications.

Summary

In a nutshell, both technologies are sound and complement each other rather than competing. They solve different problems and should be chosen accordingly.

Further Reading

10 Tips for Building and Managing Containers

Serverless Computing: Ready for Prime Time

from DZone Cloud Zone

AWS Marketplace makes it easier to find solutions from the AWS Console


AWS Marketplace, a curated digital catalog with over 4,800 solutions, announced a feature that makes it easier for you to find relevant third-party solutions directly in the AWS console. Now, third-party solutions from AWS Marketplace are automatically filtered and listed in the left-hand navigation panel based on the AWS service console you're in. For example, in the Amazon SageMaker console you see only machine learning models and algorithms. You can also use the new search bar or filtering options to look for specific solutions without leaving the console.

from Recent Announcements https://aws.amazon.com/about-aws/whats-new/2019/09/aws-marketplace-easier-to-find-solutions-from-aws-console/

Build, test, and deploy your Amazon SageMaker inference models to AWS Lambda


Amazon SageMaker is a fully managed platform that enables developers and data scientists to quickly and easily build, train, and deploy machine learning (ML) models at any scale. When you deploy an ML model, Amazon SageMaker typically uses ML hosting instances to host the model and exposes an API endpoint that serves inferences. Models can also be deployed to edge devices through AWS IoT Greengrass.

However, thanks to Amazon SageMaker's flexibility, which allows deployment to different targets, there are situations when hosting the model on AWS Lambda can provide some advantages. Not every model can be hosted on AWS Lambda, for instance when a GPU is needed, and other limits, such as the size of AWS Lambda's deployment package, can also prevent you from using this method. When AWS Lambda is an option, though, this architecture offers advantages such as lower cost, event-driven triggering, seamless scalability, and better handling of request spikes. For example, when the model is small and invoked infrequently, it may be cheaper to use AWS Lambda.

In this post, I create a pipeline to build, test, and deploy a Lambda function that serves inferences.

Prerequisites

I assume that the reader has experience with Amazon SageMaker, AWS CloudFormation, AWS Lambda, and the AWS Code* suite.

Architecture description

To create the pipeline for CI/CD, use AWS Developer Tools. The suite uses AWS CodeDeploy, AWS CodeBuild, and AWS CodePipeline. Following is a diagram of the architecture:

When I train the model with Amazon SageMaker, the output model is saved into an Amazon S3 bucket. Each time a file is put into the bucket, AWS CloudTrail triggers an Amazon CloudWatch event. This event invokes a Lambda function to check whether the file uploaded is a new model file. It then moves this file to a different S3 bucket. This is necessary because Amazon SageMaker saves other files, like checkpoints, in different folders, along with the model file. But to trigger AWS CodePipeline, there must be a specific file in a specific folder of an S3 bucket.
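A minimal sketch of what that checker function might look like is shown below. It assumes the CloudWatch Events rule is fed by CloudTrail for S3 PutObject calls, so the bucket and key arrive under the event's detail.requestParameters; the destination bucket name is a hypothetical environment variable, and the post's actual function may differ in detail:

import os
import boto3

s3 = boto3.client("s3")
DEST_BUCKET = os.environ["PIPELINE_SOURCE_BUCKET"]  # hypothetical environment variable

def handler(event, context):
    # CloudTrail-based CloudWatch Events place the S3 call parameters under "detail".
    params = event["detail"]["requestParameters"]
    bucket = params["bucketName"]
    key = params["key"]

    # Ignore checkpoints and other training artifacts; only forward the model archive.
    if not key.endswith("model.tar.gz"):
        return {"copied": False, "key": key}

    # Copy the model into the fixed location that triggers AWS CodePipeline.
    s3.copy({"Bucket": bucket, "Key": key}, DEST_BUCKET, "model.tar.gz")
    return {"copied": True, "key": key}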

Therefore, after the model file is moved from the Amazon SageMaker bucket to the destination bucket, AWS CodePipeline is triggered. First, AWS CodePipeline invokes AWS CodeBuild to create three items:

  • The deployment package of the Lambda function.
  • The AWS Serverless Application Model (AWS SAM) template to create the API.
  • The Lambda function to serve the inference.

After this is done, AWS CodePipeline executes the change set to transform the AWS SAM template into an AWS CloudFormation template. When the template executes, AWS CodeDeploy is triggered. AWS CodeDeploy invokes a Lambda function to test whether the newly created Lambda function, running the latest version of your model, works as expected. If so, AWS CodeDeploy shifts the traffic from the old version to the new version of the Lambda function with the newest version of the model. Then, the deployment is done.
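That validation step is a CodeDeploy lifecycle hook: CodeDeploy invokes a test Lambda function, which exercises the new version and reports the result back. Below is a minimal sketch of such a hook; the name of the function under test and the shape of its response are assumptions, and the post's actual test checks the model's inference output:

import json
import os
import boto3

codedeploy = boto3.client("codedeploy")
lambda_client = boto3.client("lambda")
TARGET_FUNCTION = os.environ["TARGET_FUNCTION_NAME"]  # hypothetical environment variable

def handler(event, context):
    # CodeDeploy passes these IDs so the hook can report a result for this deployment.
    deployment_id = event["DeploymentId"]
    hook_execution_id = event["LifecycleEventHookExecutionId"]

    status = "Failed"
    try:
        # Invoke the newly deployed function with a sample payload and sanity-check the reply.
        response = lambda_client.invoke(
            FunctionName=TARGET_FUNCTION,
            Payload=json.dumps({"message": "test message"}),
        )
        payload = json.loads(response["Payload"].read())
        if "prediction" in payload:  # assumed response shape
            status = "Succeeded"
    finally:
        codedeploy.put_lifecycle_event_hook_execution_status(
            deploymentId=deployment_id,
            lifecycleEventHookExecutionId=hook_execution_id,
            status=status,
        )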

How the Lambda function deployment package is created

In the AWS CloudFormation template that I created to generate the pipelines, I included a section where I indicate how AWS CodeBuild should create this package. I also outlined how to create the AWS SAM template to generate the API and the Lambda function itself.

Here’s the code example:

- "git clone ${GitRepository}"
- "cd ${GitRepositoryName}"
- "rm -rf .git "
- "ls -al "
- "aws s3 cp s3://${SourceBucket}/${SourceS3ObjectKey} ."
- "tar zxf ${SourceS3ObjectKey}"
- "ls -al"
- "pwd"
- "rm -f ${SourceS3ObjectKey}"
- "aws cloudformation package --template-file samTemplateLambdaChecker.yaml --s3-bucket ${SourceBucket} --output-template-file ../outputSamTemplate.yaml"
- "cp samTemplateLambdaChecker.yaml ../"

In the BuildSpec, I use a GitHub repository to download the necessary files. These files are the Lambda function code, the Lambda function checker (which AWS CodeDeploy uses to check whether the new model works as expected), and the AWS SAM template. In addition, AWS CodeBuild copies the latest model.tar.gz file from S3.

To work, the Lambda function also must have the Apache MXNet dependencies. The AWS CloudFormation template that you use creates a Lambda layer that contains the MXNet libraries necessary to run inferences in Lambda. I have not created a pipeline to build the layer, as that isn't the focus of this post. You can find the steps I used to compile MXNet for Lambda in the following section.
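For reference, a bare-bones inference handler on top of that layer might look like the sketch below. It assumes the model archive was unpacked alongside the function code as model-symbol.json and model-0000.params and that the caller sends an already-vectorized input; the real function in this pipeline also handles the text preprocessing needed by the spam classifier:

import json
import mxnet as mx
from mxnet import gluon, nd

# Load the exported Gluon model once, at cold start, so warm invocations reuse it.
net = gluon.SymbolBlock.imports(
    "model-symbol.json",   # assumed file names inside the deployment package
    ["data"],
    "model-0000.params",
    ctx=mx.cpu(),
)

def handler(event, context):
    # Assumes the event already carries a numeric feature vector for one message.
    features = nd.array([event["features"]])
    prediction = net(features)[0].asnumpy().tolist()
    return {
        "statusCode": 200,
        "body": json.dumps({"prediction": prediction}),
    }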

Testing the pipeline

Before proceeding, create a new S3 bucket into which to move the model file:

  1. In the S3 console, choose Create bucket.
  2. For Bucket Name, enter a custom name.
  3. For Region, choose the Region in which to create the pipeline and choose Next.
  4. Enable versioning by selecting Keep all versions of an object in the same bucket and choose Next.
  5. Choose Create bucket.

In this bucket, add three files:

  • An empty zip file called empty.zip. This is necessary because AWS CodeBuild must receive a source file when it is invoked, although the file is not used in this case.
  • The file mxnet-layer.zip.
  • A zip file containing the function that copies the model file from the Amazon SageMaker bucket to the bucket that triggers AWS CodePipeline.

To upload these files:

  1. Open the S3 console.
  2. Choose your bucket.
  3. Choose Upload, then Add files, and select the files.
  4. Choose Next until you can select Upload.

Now that you have created this new bucket, you can launch the AWS CloudFormation template after downloading the template.

  1. Open the AWS CloudFormation console.
  2. Choose Create Stack.
  3. For Choose a template, select Upload a template to Amazon S3 and select the file.
  4. Choose Next.
  5. Add a Stack name.
  6. Change SourceS3Bucket to the bucket name that you created previously.
  7. Choose Next, then Next again.
  8. Select I acknowledge that AWS CloudFormation might create IAM resources with custom names.
  9. Choose Create.

 

This creates the pipeline on your behalf and deploys everything necessary. When you train the model in Amazon SageMaker, you must specify the S3 bucket created by your AWS CloudFormation template as the location for the output model. To find the name of your S3 bucket:

  1. Open the AWS CloudFormation console.
  2. Select your Stack Name.
  3. Choose Resources and find ModelS3Location.

To simulate that a new model has been trained by Amazon SageMaker and uploaded to S3, download a model that I previously trained and uploaded here on GitHub.

After that's downloaded, you can upload the file to the S3 bucket that you created. The model was trained on the SMS Spam Collection dataset provided by the University of California. You can also view the workshop from re:Invent 2018 that covers how to train this model. The model is a simple neural network built with Gluon, which is based on Apache MXNet.

  1. Open the S3 console.
  2. Choose your ModelS3Location bucket.
  3. Choose Upload, Add files, and select the model file you downloaded.
  4. Choose Next, and choose Upload.

From the AWS CodeDeploy console, you should be able to see that the process has been initiated, as shown in the following image.

After the process has been completed, you can see that a new AWS CloudFormation stack called AntiSpamAPI has been created. As previously explained, this new stack has created the Lambda function and the API to serve the inference. You can invoke the endpoint directly. First, find the endpoint URL.

  1. In the AWS CloudFormation console, choose your AntiSpamAPI.
  2. Choose Resources and find ServerlessRestApi.
  3. Choose the ServerlessRestApi resource, which opens the API Gateway console.
  4. From the API Gateway console, select AntiSpamAPI.
  5. Choose Stages, Prod.
  6. Copy the Invoke URL.

After you have the endpoint URL, you can test it using a simple page that I've created.

For example, a sample message entered on the page is reported as having a 99% probability of being spam, as you can see from the raw output.
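If you prefer to call the endpoint directly instead of using the test page, a sketch with the requests library is shown below. The resource path and the request and response shapes are assumptions; adjust them to match how the deployed Lambda function actually parses its input:

import requests

INVOKE_URL = "https://<api-id>.execute-api.<region>.amazonaws.com/Prod"  # from API Gateway

# Assumed contract: POST a JSON body containing the text to classify.
resp = requests.post(
    f"{INVOKE_URL}/predict",  # hypothetical resource path
    json={"message": "You have won a free prize, click here to claim it!"},
    timeout=10,
)
resp.raise_for_status()
print(resp.json())  # expect a spam probability close to 0.99 for a message like this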

Conclusion

I hope this post proves useful for understanding how you can automatically deploy your model into a Lambda function using AWS developer tools. Having a pipeline can reduce the overhead associated with using a model with a serverless architecture. With minor changes, you can use this pipeline to deploy a model trained anywhere, such as on AWS Deep Learning AMIs, AWS Deep Learning Containers, or on premises.

If you have questions or suggestions, please share them on GitHub or in the comments.


About the Author

Diego Natali is a solutions architect for Amazon Web Services in Italy. With an engineering background of several years, he helps ISV and startup customers design flexible and resilient architectures using AWS services. In his spare time he enjoys watching movies and riding his dirt bike.

from AWS Machine Learning Blog

NIH’s Strategic Vision for Data Science: Enabling a FAIR-Data Ecosystem

Susan K. Gregurick is the Division Director for Biophysics, Biomedical Technology, and Computational Biosciences (BBCB) in NIH’s National Institute of General Medical Sciences (NIGMS). Her mission in BBCB is to advance research in computational biology, biophysics and data sciences, mathematical and biostatistical methods, and biomedical technologies in support of the NIGMS mission to increase understanding of life processes. Dr. Gregurick also serves as the Senior Advisor to the Office of Data Science Strategy, a newly formed office within the Office of the Director at NIH. Prior to joining the NIH, Susan was a program manager for the Department of Energy where she oversaw the development and implementation of the DOE Systems Biology Knowledgebase, which is a framework to integrate data, models, and simulations together for a better understanding of energy and environmental processes. During Susan’s academic career she was a Professor of Computational Biology at the University of Maryland, Baltimore County and her research interests include dynamics of large biological macromolecules. Susan holds a Ph.D. in Computational Chemistry and her areas of expertise are computational biology, high performance computing, neutron scattering and bioinformatics.

View on YouTube

Amazon Elastic Inference Now Available In Amazon ECS Tasks


Amazon ECS supports attaching Amazon Elastic Inference accelerators to your containers to make running deep learning inference workloads more cost-effective. Amazon Elastic Inference allows you to attach just the right amount of GPU-powered acceleration to any Amazon EC2 or Amazon SageMaker instance, or ECS task, to reduce the cost of running deep learning inference by up to 75%.

from Recent Announcements https://aws.amazon.com/about-aws/whats-new/2019/09/amazon-elastic-inference-now-available-in-amazon-ecs-tasks/

Amazon EC2 I3en Instances are Now Available in AWS GovCloud (US-East), Asia Pacific (Sydney), Europe (London) AWS Regions


Starting today, Amazon EC2 I3en instances are available in the AWS GovCloud (US-East), Asia Pacific (Sydney), and Europe (London) AWS Regions. I3en global availability now includes the Asia Pacific (Tokyo, Seoul, Singapore, Sydney), Europe (Frankfurt, Ireland, London), US East (N. Virginia, Ohio, AWS GovCloud), and US West (Oregon, N. California, AWS GovCloud) regions. 

from Recent Announcements https://aws.amazon.com/about-aws/whats-new/2019/09/amazon-ec2-i3en-instances-now-available-aws-govcloud-useast-sydney-and-london/

Building a Serverless FHIR Interface on AWS


This post is courtesy of Mithun Mallick, Senior Solutions Architect (Messaging), and Navneet Srivastava, Senior Solutions Architect.

Technology is revolutionizing the healthcare industry, but it can be a challenge for healthcare providers to take full advantage of it because of software systems that don't easily communicate with each other. A single patient visit involves multiple systems such as practice management, electronic health records, and billing. When these systems can't operate together, it's harder to leverage them to improve patient care.

To help make it easier to exchange data between these systems, Health Level Seven International (HL7) developed Fast Healthcare Interoperability Resources (FHIR), an interoperability standard for the electronic exchange of healthcare information. In this post, I will show you the AWS services you can use to build a serverless FHIR interface in the cloud.

In FHIR, resources are your basic building blocks. A resource is an exchangeable piece of content that has a common way to define and represent it, a set of common metadata, and a human readable part. Each resource type has the same set of operations, called interactions, that you use to manage the resources in a granular fashion. For more information, see the FHIR overview.

FHIR Serverless Architecture

My FHIR architecture features a server with its own data repository and a simple consumer application that displays Patient and Observation data. To make it easier to build, my server only supports the JSON content type over HTTPS, and it only supports the Bundle, Patient, and Observation FHIR resource types. In a production environment, your server should support all resource types.

For this architecture, the server supports the following interactions:

  • Posting bundles as collections of Patients and Observations
  • Searching Patients and Observations
  • Updating and reading Patients
  • Creating a CapabilityStatement

You can expand this architecture to support all FHIR resource types, interactions, and data formats.

The following diagram shows how the described services work together to create a serverless FHIR messaging interface.

Amazon API Gateway

In Amazon API Gateway, you create the REST API that acts as a “front door” for the consumer application to access the data and business logic of this architecture. I used API Gateway to host the API endpoints. I created the resource definitions and API methods in the API Gateway.

For this architecture, the FHIR resources map to the resource definitions in API Gateway. The Bundle FHIR resource type maps to the Bundle API Gateway resource. The Observation FHIR resource type maps to the Observation API Gateway resource. And the Patient FHIR resource type maps to the Patient API Gateway resource.

To keep the API definitions simple, I used the ANY method. The ANY method handles the various URL mappings in the AWS Lambda code, and uses Lambda proxy integration to send requests to the Lambda function.

You can use the ANY method to handle HTTP methods, such as:

  • POST to represent the interaction to create a Patient resource type
  • GET to read a Patient instance based on a patient ID, or to search based on predefined parameters

We chose Amazon DynamoDB because it provides the input data representation and query patterns necessary for a FHIR data repository. For this architecture, each resource type is stored in its own Amazon DynamoDB table. Metadata for resources stored in the repository is also stored in its own table.

We set up global secondary indexes on the patient and observations tables in order to perform searches and retrieve observations for a patient. In this architecture, the patient id is stored as a patient reference id in the observation table. The patientRefid-index allows you to retrieve observations based on the patient id without performing a full scan of the table.
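As an illustration, a query against that index from Python might look like the following sketch. The table and attribute names (observation, patientRefid) follow the description above, but the exact names in the deployed stack may differ:

import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource("dynamodb")
observations = dynamodb.Table("observation")  # assumed table name

def observations_for_patient(patient_id):
    # Query the global secondary index instead of scanning the whole table.
    response = observations.query(
        IndexName="patientRefid-index",
        KeyConditionExpression=Key("patientRefid").eq(patient_id),
    )
    return response["Items"]

print(observations_for_patient("patient-123"))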

We chose Amazon S3 to store archived FHIR messages because of its low cost and high durability.

Processing FHIR Messages

Each Amazon API Gateway request in this architecture is backed by an AWS Lambda function containing the Jersey RESTful web services framework, the AWS serverless Java container framework, and the HAPI FHIR library.

The AWS serverless Java framework provides a base implementation for the handleRequest method in the LambdaHandler class. It uses the serverless Java container initialized in the global scope to proxy requests to our Jersey application.

The handler method calls a proxy class and passes the stream classes along with the context.

This source code from the LambdaHandler class shows the handleRequest method:

// Main entry point of the Lambda function; uses the serverless-java-container initialized in the global scope
// to proxy requests to our jersey application
public void handleRequest(InputStream inputStream, OutputStream outputStream, Context context)
        throws IOException {

    handler.proxyStream(inputStream, outputStream, context);

    // just in case it wasn't closed by the mapper
    outputStream.close();
}

The resource implementation classes are in the com.amazonaws.lab.resources package. This package defines the URL mappings necessary for routing the REST API calls.

The following method from the PatientResource class implements the GET patient interaction based on a patient id. The annotations describe the HTTP method called, as well as the path that is used to make the call. This method is invoked when a request is sent with the URL pattern: Patient/{id}. It retrieves the Patient resource type based on the id sent as part of the URL.

@GET
@Path("/{id}")
public Response gETPatientid(@Context SecurityContext securityContext,
        @ApiParam(value = "", required = true) @PathParam("id") String id,
        @HeaderParam("Accept") String accepted) {
…
}

Deploying the FHIR Interface

To deploy the resources for this architecture, we used an AWS Serverless Application Model (SAM) template. During deployment, SAM templates are expanded and transformed into AWS CloudFormation syntax. The template launches and configures all the services that make up the architecture.

Building the Consumer Application

For our architecture, we wrote a simple Node.js client application that calls the APIs on the FHIR server to get a list of patients and related observations. You can build more advanced applications for this architecture. For example, you could build a patient-focused application that displays vitals and immunization charts. Or, you could build a backend/mid-tier application that consumes a large number of messages and transforms them for downstream analytics.

This is the code we used to get the token from Amazon Cognito:

token = authcognito.token();

//Setting url to call FHIR server

var options = {
  url: "https://<FHIR SERVER>",
  host: "FHIR SERVER",
  path: "Prod/Patient",
  method: "GET",
  headers: {
    "Content-Type": "application/json",
    "Authorization": token
  }
};

This is the code we used to call the FHIR server:

request(options, function(err, response, body) {
  if (err) {
    console.log("In error");
    console.log(err);
  } else {
    let patientlist = JSON.parse(body);

    console.log(patientlist);
    res.json(patientlist["entry"]);
  }
});

We used AWS CloudTrail and AWS X-Ray for logging and debugging.

The screenshots below display the results:

Conclusion

In this post, we demonstrated how to build a serverless FHIR architecture. We used Amazon API Gateway and AWS Lambda to ingest and process FHIR resources, and Amazon DynamoDB and Amazon S3 to provide a repository for the resources. Amazon Cognito provides secure access to the API Gateway. We also showed you how to build a simple consumer application that displays patient and observation data. You can modify this architecture for your individual use case.

About the authors

Mithun Mallick is a Sr. Solutions Architect and is responsible for helping customers in the HCLS industry build secure, scalable, and cost-effective solutions on AWS. Mithun helps develop and implement strategic plans to engage customers and partners in the industry and works with the community of technically focused HCLS specialists within AWS. He has hands-on experience with messaging standards such as X12, HL7, and FHIR. Mithun has an M.B.A. from CSU (Ft. Collins, CO) and a bachelor's degree in Computer Engineering. He holds several associate and professional certifications for architecting on AWS.

Navneet Srivastava, a Sr. Solutions Architect, is responsible for helping provider organizations and healthcare companies deploy electronic medical records, devices, and AI/ML-based applications while educating customers about how to build secure, scalable, and cost-effective AWS solutions. He develops strategic plans to engage customers and partners, and works with a community of technically focused HCLS specialists within AWS. He is skilled in AI, ML, big data, and healthcare-related technologies. Navneet has an M.B.A. from NYIT and a bachelor's degree in software engineering, and holds several associate and professional certifications for architecting on AWS.

from AWS Architecture Blog

NoSQL Workbench for Amazon DynamoDB – Available in Preview


I am always impressed by the flexibility of Amazon DynamoDB, providing our customers a fully-managed key-value and document database that can easily scale from a few requests per month to millions of requests per second.

The DynamoDB team has released many great features recently, from on-demand capacity to support for native ACID transactions. Here's a great recap of other recent DynamoDB announcements such as global tables, point-in-time recovery, and instant adaptive capacity. DynamoDB now encrypts all customer data at rest by default.

However, switching mindset from a relational database to NoSQL is not that easy. Last year we had two amazing talks at re:Invent that can help you understand how DynamoDB works and how you can use it for your use cases.

To help you even further, we are introducing today in preview NoSQL Workbench for Amazon DynamoDB, a free, client-side application available for Windows and macOS to help you design and visualize your data model, run queries on your data, and generate the code for your application!

The three main capabilities provided by the NoSQL Workbench are:

  • Data modeler — to build new data models, adding tables and indexes, or to import, modify, and export existing data models.
  • Visualizer — to visualize data models based on their applications' access patterns, with sample data that you can add manually or import via a SQL query.
  • Operation builder — to define and execute data-plane operations or generate ready-to-use sample code for them.

To see how this new tool can simplify working with DynamoDB, let’s build an application to retrieve information on customers and their orders.

Using the NoSQL Workbench
In the Data modeler, I start by creating a CustomerOrders data model, and I add a table, CustomerAndOrders, to hold my customer data and the information on their orders. You can use this tool to create a simple data model where customers and orders are in two distinct tables, each one with their own primary keys. There would be nothing wrong with that. Here I’d like to show how this tool can also help you use more advanced design patterns. By having the customer and order data in a single table, I can construct queries that return all the data I need with a single interaction with DynamoDB, speeding up the performance of my application.

As partition key, I use the customerId. This choice provides an even distribution of data across multiple partitions. The sort key in my data model will be an overloaded attribute, in the sense that it can hold different data depending on the item:

  • A fixed string, for example customer, for the items containing the customer data.
  • The order date, written using ISO 8601 strings such as 20190823, for the items containing orders.

By overloading the sort key with these two possible values, I am able to run a single query that returns the customer data and the most recent orders. For this reason, I use a generic name for the sort key. In this case, I use sk.

Apart from the partition key and the optional sort key, DynamoDB has a flexible schema, and the other attributes can be different for each item in a table. However, with this tool I have the option to describe in the data model all the possible attributes I am going to use for a table. In this way, I can check later that all the access patterns I need for my application work well with this data model.

For this table, I add the following attributes:

  • customerName and customerAddress, for the items in the table containing customer data.
  • orderId and deliveryAddress, for the items in the table containing order data.

I am not adding an orderDate attribute, because for this data model the value will be stored in the sk sort key. For a real production use case, you would probably have many more attributes to describe your customers and orders, but I am trying to keep things simple enough here to show what you can do, without getting lost in details.
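To make the overloaded sort key concrete, here is a small sketch that writes one customer item and one order item into the table using the attributes described above; the concrete values are invented sample data:

import boto3

table = boto3.resource("dynamodb").Table("CustomerAndOrders")

# Customer item: the sort key holds the fixed string "customer".
table.put_item(Item={
    "customerId": "123",
    "sk": "customer",
    "customerName": "Alice Example",
    "customerAddress": "123 Main St, Seattle",
})

# Order item: the sort key holds the order date as an ISO 8601 string.
table.put_item(Item={
    "customerId": "123",
    "sk": "20190823",
    "orderId": "order-001",
    "deliveryAddress": "123 Main St, Seattle",
})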

Another access pattern for my application is to be able to get a specific order by ID. For that, I add a global secondary index to my table, with orderId as partition key and no sort key.

I add the table definition to the data model, and move on to the Visualizer. There, I update the table by adding some sample data. I add data manually, but I could import a few rows from a table in a MySQL database, for example to simplify a NoSQL migration from a relational database.

Now, I visualize my data model with the sample data to have a better understanding of what to expect from this table. For example, if I select a customerId and query for all the orders greater than a specific date, I also get the customer data at the end, because the string customer, stored in the sk sort key, is always greater than any date written in ISO 8601 syntax.

In the Visualizer, I can also see how the global secondary index on the orderId works. Interestingly, items without an orderId are not part of this index, so I get only 4 of the 6 items that are part of my sample data. This happens because DynamoDB writes a corresponding index entry only if the index sort key value is present in the item. If the sort key doesn't appear in every table item, the index is said to be sparse. Sparse indexes are useful for queries over a subsection of a table.

I now commit my data model to DynamoDB. This step creates server-side resources such as tables and global secondary indexes for the selected data model, and loads the sample data. To do so, I need AWS credentials for an AWS account. I have the AWS Command Line Interface (CLI) installed and configured in the environment where I am using this tool, so I can just select one of my named profiles.

I move to the Operation builder, where I see all the tables in the selected AWS Region. I select the newly created CustomerAndOrders table to browse the data and build the code for the operations I need in my application.

In this case, I want to run a query that, for a specific customer, selects all orders more recent than a date I provide. As we saw previously, the overloaded sort key would also return the customer data as the last item. The Operation builder can help you use the full syntax of DynamoDB operations, for example adding conditions and child expressions. In this case, I add the condition to only return orders where the deliveryAddress contains Seattle.

I have the option to execute the operation on the DynamoDB table, but this time I want to use the query in my application. To generate the code, I select between Python, JavaScript (Node.js), or Java.
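The Python code produced by the Operation builder is essentially a boto3 call. A sketch of what it might resemble for the query described above is shown below; treat it as an approximation rather than the exact generated output:

import boto3
from boto3.dynamodb.conditions import Attr, Key

table = boto3.resource("dynamodb").Table("CustomerAndOrders")

# Orders for customer 123 placed after 2019-08-01, delivered in Seattle.
# The item with the fixed "customer" sort key also satisfies the key condition,
# although the deliveryAddress filter excludes it here.
response = table.query(
    KeyConditionExpression=Key("customerId").eq("123") & Key("sk").gt("20190801"),
    FilterExpression=Attr("deliveryAddress").contains("Seattle"),
)
for item in response["Items"]:
    print(item)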

You can use the Operation builder to generate the code for all the access patterns that you plan to use with your application, using all the advanced features that DynamoDB provides, including ACID transactions.

Available Now
You can find how to set up NoSQL Workbench for Amazon DynamoDB (Preview) for Windows and macOS here.

We welcome your suggestions in the DynamoDB discussion forum. Let us know what you build with this new tool and how we can help you more!

from AWS News Blog https://aws.amazon.com/blogs/aws/nosql-workbench-for-amazon-dynamodb-available-in-preview/