
The AWS DeepRacer League and countdown to the re:Invent Championship Cup 2019

The AWS DeepRacer League is the world’s first autonomous racing league, open to anyone. Announced at re:Invent 2018, it puts machine learning in the hands of every developer in a fun and exciting way. Throughout 2019, developers of all skill levels have competed in the League at 21 Amazon events globally, including Amazon re:MARS and select AWS Summits, and put their skills to the test in the League’s virtual circuit via the AWS DeepRacer console. The League concludes at re:Invent 2019. Log in today and start racing—time is running out to win an expenses-paid trip to re:Invent!

The final AWS Summit race in Toronto

In the eight months since the League kicked off in Santa Clara, it has visited 17 countries, with thousands of developers completing over 13,000 laps and 165 miles of track. Each city has crowned its champion, and we will see each of them at re:Invent 2019!

On October 3, 2019, the 21st and final AWS DeepRacer Summit race took place in Toronto, Canada. The event concluded in-person racing for the AWS DeepRacer League, and not one but four expenses-paid trips were up for grabs.

First was the crowning of our Toronto champion Mohammad Al Ansari, with a winning time of 7.85 seconds, just 0.4 seconds away from beating the current world record of 7.44 seconds. Mohammad came to the AWS Summit with his colleague from Myplanet, where they took part in an AWS-led workshop for AWS DeepRacer to learn more about machine learning. They then made connections with AWS DeepRacer communities and received support from AWS DeepRacer enthusiasts such as Lyndon Leggate, a recently announced AWS ML Hero.

The re:Invent line up is shaping up

Once the racing concluded, it was time to tally up the scores for the overall competition and name the top three overall Summit participants. Foreign Exchange IT specialist Ray Goh traveled from Singapore to compete in his fourth race in his quest to top the overall leaderboard. Ray previously attended the Singapore, Hong Kong, and re:MARS races, and has steadily improved his models all year. He closed out the season with his fastest time of 8.15 seconds at the Toronto race. The other two spots went to [email protected] and [email protected], who have also secured their place in the knockouts at re:Invent along with the 21 Summit Champions.

It could be you that lifts the Championship Cup

The Championship Cup at re:Invent is sure to be filled with fun and surprises, so watch this space for more information. There is still time for developers of all skill levels to advance to the knockouts. Compete now in the final AWS DeepRacer League Virtual Circuit, and it could be you who is the Champion of the 2019 AWS DeepRacer League!


About the Author

Alexandra Bush is a Senior Product Marketing Manager for AWS AI. She is passionate about how technology impacts the world around us and enjoys being able to help make it accessible to all. Out of the office she loves to run, travel and stay active in the outdoors with family and friends.

from AWS Machine Learning Blog

Calculating new stats in Major League Baseball with Amazon SageMaker

The 2019 Major League Baseball (MLB) postseason is here after an exhilarating regular season in which fans saw many exciting new developments. MLB and Amazon Web Services (AWS) teamed up to develop and deliver three new, real-time machine learning (ML) stats to MLB games: Stolen Base Success Probability, Shift Impact, and Pitcher Similarity Match-up Analysis. These features are giving fans a deeper understanding of America’s pastime through Statcast AI, MLB’s state-of-the-art technology for collecting massive amounts of baseball data and delivering more insights, perspectives, and context to fans in every way they’re consuming baseball games.

This post looks at the role machine learning plays in providing fans with deeper insights into the game. We also provide code snippets that show the training and deployment process behind these insights on Amazon SageMaker.

Machine learning steals second

Stolen Base Success Probability provides viewers with a new depth of understanding of the cat and mouse game between the pitcher and the baserunner.

To calculate the Stolen Base Success Probability, AWS used MLB data to train, test, and deploy an ML model that analyzes thousands of data points covering 37 variables that, together, determine whether or not a player safely arrives at second if he attempts to steal. Those variables include the runner’s speed and burst, the catcher’s average pop time to second base, the pitcher’s velocity and handedness, historical stolen base success rates for the runner, batter, and pitcher, along with relevant data about the game context.

We took a 10-fold cross-validation approach to explore a range of classification algorithms, such as logistic regression, support vector machines, random forests, and neural networks, using historical play data from 2015 to 2018 provided by MLB that corresponds to ~7.3K stolen base attempts, with ~5.5K successful stolen bases and ~1.8K runners caught stealing. We applied numerous strategies to deal with the class imbalance, including class weights, custom loss functions, and sampling strategies, and found that the best-performing model for predicting the probability of stolen base success was a deep neural network trained on an AWS Deep Learning (DL) AMI preconfigured with popular DL frameworks. The trained model was deployed using Amazon SageMaker, which provided the subsecond response times required for integrating predictions into in-game graphics in real time, on ML instances that auto-scaled across multiple Availability Zones. For more information, see Deploy trained Keras or TensorFlow models using Amazon SageMaker.
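
As an illustration of the model-selection step (not MLB's production code), the following sketch compares several candidate classifiers with 10-fold cross-validation and class weights on synthetic stand-in data:

# Minimal sketch: comparing candidate classifiers with 10-fold cross-validation.
# X and y are synthetic stand-ins for the historical stolen base attempt data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

rng = np.random.RandomState(0)
X = rng.rand(2000, 37)                      # 37 features per stolen base attempt
y = (rng.rand(2000) < 0.75).astype(int)     # roughly 3:1 imbalance of safe vs. caught stealing

candidates = {
    'logistic_regression': LogisticRegression(class_weight='balanced', max_iter=1000),
    'svm': SVC(class_weight='balanced', probability=True),
    'random_forest': RandomForestClassifier(n_estimators=200, class_weight='balanced'),
    'neural_network': MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500),
}

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
for name, model in candidates.items():
    # Log loss mirrors the probabilistic objective behind the live success probability
    scores = cross_val_score(model, X, y, cv=cv, scoring='neg_log_loss')
    print(f"{name}: mean log loss = {-scores.mean():.4f}")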

As the player on first base contemplates stealing second, viewers can see his Stolen Base Success Probability score in real-time right on their screens.

MLB offered fans a pilot test and preview of Stolen Base Success Probability during the 2018 postseason. Thanks to feedback from broadcasters and fans, MLB and AWS collaborated during the past offseason to develop an enhanced version with new graphics, improved latency of real-time stats for replays, and a cleaner look. One particular enhancement is the “Go Zone,” the point along the baseline where the player’s chances of successfully making the steal reaches a minimum of 85%.

As the player extends his lead towards second, viewers can now see the probability changing dynamically and a jump in his chances of success when he hits the “Go Zone.” After the runner reaches second base, whether he gets called “safe” or “out,” viewers have the opportunity during a replay to see data generated from a variety of factors that may have determined the ultimate outcome, like the runner’s sprint speed and the catcher’s pop time. Plus, that data is color-coded in green, yellow, and red to help fans visualize the factors that played the most significant roles in determining whether or not the player successfully made it to second.

Predicting impact of infield defensive strategies

Over the last decade, there have been few changes in MLB as dramatic as the rise of the infield shift, a “situational defensive realignment of fielders away from their traditional starting points.” Teams use the shift to exploit batted-ball patterns, such as a batter’s tendency to pull batted balls (right field for left-handed hitters and left field for right-handed hitters). As a batter steps up to the plate, the defensive infielders adjust their positions to cover the area where the batter has historically hit the ball into play.

Using Statcast AI data, teams can give their defense an advantage by shifting players to prevent base hits—and teams are employing this strategy more often now than at any other time in baseball history. League-wide shifting rates have increased by 86% over the last three years, up to 25.6% in 2019 from 13.8% in 2016.

AWS and MLB teamed up to employ machine learning to give baseball fans insight into the effectiveness of a shifting strategy. We developed a model to estimate the Shift Impact—the change in a hitter’s expected batting average on ground balls—as he steps up to the plate, using historical data and Amazon SageMaker. As infielders move around the field, the Shift Impact dynamically updates by re-computing the expected batting average with the changing positions of the defenders. This provides a real-time experience for fans.

Using data to quantify the Shift Impact

A spray chart illustrates a batter’s tendency to hit balls toward a particular direction. The chart indicates the percentage at which a player’s batted balls are hit through various sections of the field. The following chart shows the 2018 spray distribution of batted balls by Joey Gallo of the Texas Rangers that were hit within the infielders’ reach, defined as having a projected distance of less than 200 feet from home plate. For more information, see Joey Gallo’s current stats on Baseball Savant.

The preceding chart shows the tendency to pull the ball toward right field for Joey Gallo, who hit 74% of his balls to the right of second base in 2018. A prepared defense can take advantage of this observation by overloading the right side of the infield, cutting short the trajectory of the ball and increasing the chance of converting the batted ball into an out.
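
As a rough illustration of how such a spray distribution could be computed (the column names and synthetic data below are assumptions, not MLB's pipeline):

import numpy as np
import pandas as pd

# Synthetic stand-in data: spray angle in degrees (positive = right of second base)
# and projected distance in feet for each batted ball
batted_balls = pd.DataFrame({
    'spray_angle_deg': np.random.normal(15, 25, 1000),
    'projected_distance_ft': np.random.uniform(20, 350, 1000),
})

# Keep only balls within the infielders' reach (projected distance under 200 feet)
infield = batted_balls[batted_balls['projected_distance_ft'] < 200]

# Percentage of infield batted balls hit to the right of second base (pull side for a lefty)
pull_pct = (infield['spray_angle_deg'] > 0).mean() * 100
print(f"Hit to the right of second base: {pull_pct:.0f}%")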

We estimated the value of specific infield alignments against batters based on their historical batted-ball distribution by taking into account the last three seasons of play, or approximately 60,000 batted balls in the infield. For each of these at-bats, we gathered the launch angle and exit velocity of the batted ball and infielder positions during the pitch, while looking up the known sprint speed and handedness of the batter. While there are many metrics for offensive production in baseball, we chose to use batting average on balls in play—that is, the probability of a ball in play resulting in a base hit.

We calculated how effective a shift might be by estimating the amount by which a specific alignment decreases our offensive measure. After deriving new features, such as the projected landing path of the ball, and one-hot encoding the categorical variables, the data was ready for ingestion into various ML frameworks to estimate the probability that a ball in play results in a base hit. From that, we could compute the changes in this probability due to changing infielder alignments.
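
A simplified sketch of this feature-preparation step is shown below. The column names and the toy distance formula are assumptions; the only firm requirement reflected here is that the built-in XGBoost algorithm expects CSV input with the label in the first column and no header row:

import numpy as np
import pandas as pd

# Synthetic stand-in for the batted-ball events
at_bats = pd.DataFrame({
    'is_hit': np.random.randint(0, 2, 1000),
    'launch_angle': np.random.uniform(-30, 60, 1000),
    'exit_velocity': np.random.uniform(50, 115, 1000),
    'sprint_speed': np.random.uniform(23, 31, 1000),
    'batter_hand': np.random.choice(['L', 'R'], 1000),
    'alignment': np.random.choice(['standard', 'shift'], 1000),
})

# Derived feature: rough projected landing distance from launch parameters (toy formula)
at_bats['projected_distance'] = (
    at_bats['exit_velocity'] * np.cos(np.radians(at_bats['launch_angle'])) * 2.5
)

# One-hot encode the categorical variables
features = pd.get_dummies(at_bats, columns=['batter_hand', 'alignment'])

# Label first, no header, as required by the built-in XGBoost container
train = features.sample(frac=0.8, random_state=0)
validation = features.drop(train.index)
train.to_csv('train.csv', index=False, header=False)
validation.to_csv('validation.csv', index=False, header=False)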

Using Amazon SageMaker to calculate Shift Impact

We trained ML models on more than 50,000 at-bat samples. A Bayesian search, run as a hyperparameter optimization (HPO) job with the automatic model tuning feature of Amazon SageMaker over the pre-built XGBoost algorithm, returned the most performant model, with precision of 88%, recall of 88%, and an F1 score of 88% on a validation set of nearly 10,000 events. Launching an HPO job on Amazon SageMaker is as simple as defining the parameters that describe the job, then firing it off to the backend services that manage the core infrastructure (Amazon EC2, Amazon S3, Amazon ECR) to iterate through the defined hyperparameter space efficiently and find the optimal model.

The code snippets shown use boto3, the AWS SDK for Python. Amazon SageMaker also offers the SageMaker Python SDK, an open-source library with several high-level abstractions for working with Amazon SageMaker and popular deep learning frameworks.
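
For reference, the tuning job defined with boto3 in the following sections could be expressed more compactly with the SageMaker Python SDK. The sketch below is illustrative only and uses the v1-era SDK interface; it assumes the same S3 locations (s3_input_train, s3_input_validation, and s3_output) used in the boto3 example:

import sagemaker
from sagemaker.amazon.amazon_estimator import get_image_uri
from sagemaker.estimator import Estimator
from sagemaker.session import s3_input
from sagemaker.tuner import HyperparameterTuner, ContinuousParameter, IntegerParameter

session = sagemaker.Session()
role = sagemaker.get_execution_role()
xgboost_image = get_image_uri(session.boto_region_name, 'xgboost')

# Estimator wrapping the pre-built XGBoost container
estimator = Estimator(image_name=xgboost_image,
                      role=role,
                      train_instance_count=2,
                      train_instance_type='ml.c4.2xlarge',
                      output_path=s3_output,              # S3 path for model artifacts
                      sagemaker_session=session)
estimator.set_hyperparameters(objective='binary:logistic', eval_metric='logloss')

# Bayesian search over the same hyperparameter ranges as the boto3 example below
tuner = HyperparameterTuner(estimator,
                            objective_metric_name='validation:logloss',
                            objective_type='Minimize',
                            hyperparameter_ranges={'eta': ContinuousParameter(0, 1),
                                                   'alpha': ContinuousParameter(0, 2),
                                                   'max_depth': IntegerParameter(1, 10)},
                            max_jobs=100,
                            max_parallel_jobs=10)

tuner.fit({'train': s3_input(s3_input_train, content_type='csv'),
           'validation': s3_input(s3_input_validation, content_type='csv')})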

Defining the HPO job

We started by setting up the Amazon SageMaker client and defining the tuning job. This specifies which parameters to vary during tuning, along with the evaluation metric we wish to optimize towards. In the following code, we set it to minimize the log loss on the validation set:

import boto3
from sagemaker import get_execution_role
from sagemaker.amazon.amazon_estimator import get_image_uri

sm_client = boto3.Session().client('sagemaker')
xgboost_image = get_image_uri(boto3.Session().region_name, 'xgboost')
role = get_execution_role()

tuning_job_config = {
    "ParameterRanges": {
      "CategoricalParameterRanges": [],
      "ContinuousParameterRanges": [
        {
          "MaxValue": "1",
          "MinValue": "0",
          "Name": "eta"
        },
        {
          "MaxValue": "2",
          "MinValue": "0",
          "Name": "alpha"
        },
      ],
      "IntegerParameterRanges": [
        {
          "MaxValue": "10",
          "MinValue": "1",
          "Name": "max_depth"
        },
      ]
    },
    "ResourceLimits": {
      "MaxNumberOfTrainingJobs": 100,
      "MaxParallelTrainingJobs": 10
    },
    "Strategy": "Bayesian",
    "HyperParameterTuningJobObjective": {
      "MetricName": "validation:logloss",
      "Type": "Minimize"
    }
  }
 
training_job_definition = {
    "AlgorithmSpecification": {
      "TrainingImage": xgboost_image,
      "TrainingInputMode": "File"
    },
    "InputDataConfig": [
      {
        "ChannelName": "train",
        "CompressionType": "None",
        "ContentType": "csv",
        "DataSource": {
          "S3DataSource": {
            "S3DataDistributionType": "FullyReplicated",
            "S3DataType": "S3Prefix",
            "S3Uri": s3_input_train # path to training data
          }
        }
      },
      {
        "ChannelName": "validation",
        "CompressionType": "None",
        "ContentType": "csv",
        "DataSource": {
          "S3DataSource": {
            "S3DataDistributionType": "FullyReplicated",
            "S3DataType": "S3Prefix",
            "S3Uri": s3_input_validation # path to validation data
          }
        }
      }
    ],
    "OutputDataConfig": {
      "S3OutputPath": s3_output # outpath path for model artifacts
    },
    "ResourceConfig": {
      "InstanceCount": 2,
      "InstanceType": "ml.c4.2xlarge",
      "VolumeSizeInGB": 10
    },
    "RoleArn": role,
    "StaticHyperParameters": {
      "eval_metric": "logloss",
      "objective": "binary:logistic",
      "rate_drop": "0.3",
      "tweedie_variance_power": "1.4",
    },
    "StoppingCondition": {
      "MaxRuntimeInSeconds": 43200
    }
}

Launching the HPO job

With the tuning job defined in the Python dictionary above, we now submit it to the Amazon SageMaker client, which then automates the process of launching EC2 instances with containers optimized to run XGBoost, pulled from Amazon ECR. See the following code:

tuning_job_name = "shift-impact-hpo"  # example job name; SageMaker allows letters, digits, and hyphens only

sm_client.create_hyper_parameter_tuning_job(HyperParameterTuningJobName = tuning_job_name,
                                            HyperParameterTuningJobConfig = tuning_job_config,
                                            TrainingJobDefinition = training_job_definition)

During the game, we can analyze a given batter with his most recent at-bats and run those events through the model for all infielder positions as laid out on a grid. Since the amount of compute required for inference increases geometrically as the size of each grid cell is reduced, we adjusted the size to reach a balance between the resolution required for meaningful predictions and compute time. For example, consider a shortstop that shifts over to his left. If he moves over by only one foot, there will be a negligible effect on the outcome of a batted ball. However, if he repositions himself 10 feet to his left, that can very well put himself in a better position to field a ground ball pulled to right field. Examining all at-bats in our dataset, we found such a balance on a grid composed of 10-foot by 10-foot cells, accounting for more than 10,000 infielder configurations.
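
As a toy illustration of the grid itself (the field bounds below are rough assumptions), the candidate positions for a single infielder on 10-foot cells can be enumerated as follows:

import itertools
import numpy as np

cell = 10  # feet per grid cell
xs = np.arange(-130, 131, cell)  # feet left/right of the line from home plate through second base
ys = np.arange(60, 201, cell)    # feet from home plate, capped at the 200-foot infield cutoff

grid_cells = list(itertools.product(xs, ys))
print(f"{len(grid_cells)} candidate positions per infielder")

# Scoring combinations of positions for all four infielders over these cells is what makes
# the number of inference calls grow so quickly as the cell size shrinks.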

The process of obtaining the best performing model from the HPO job and deploying to production follows in the next section. Due to the large number of calls required for real-time inference, the results of the model are prepopulated into a lookup table that provides the relevant predictions during a live game.

Deploying the most performant model

Each tuning job launches a number of training jobs, from which the best model is selected according to the criteria defined earlier when configuring the HPO job. From Amazon SageMaker, we first pull the best training job and its model artifacts, which are stored at the S3 output path specified in the training job definition. See the following code:

# get best model from HPO job
best_training_job = sm_client.describe_hyper_parameter_tuning_job(
    HyperParameterTuningJobName=tuning_job_name)['BestTrainingJob']
info = sm_client.describe_training_job(TrainingJobName=best_training_job['TrainingJobName'])
model_name = best_training_job['TrainingJobName'] + '-model'
model_data = info['ModelArtifacts']['S3ModelArtifacts']

Next, we refer to the pre-configured container optimized to run XGBoost models and link it to the model artifacts of the best-trained model. Once this model-container pair is created on our account, we can configure an endpoint with the instance type, number of instances, and traffic splits (for A/B testing) of our choice:

create_model_response = sm_client.create_model(
    ModelName = model_name,
    ExecutionRoleArn = role,
    PrimaryContainer = {
        'Image': xgboost_image,
        'ModelDataUrl': model_data})

# create endpoint configuration
endpoint_config_name = model_name+'-endpointconfig'
create_endpoint_config_response = sm_client.create_endpoint_config(
    EndpointConfigName = endpoint_config_name,
    ProductionVariants=[{
        'InstanceType':'ml.m5.2xlarge',
        'InitialVariantWeight':1,
        'InitialInstanceCount':1,
        'ModelName':model_name,
        'VariantName':'AllTraffic'}])

# create endpoint
endpoint_name = model_name+'-endpoint'
create_endpoint_response = sm_client.create_endpoint(
    EndpointName=endpoint_name,
    EndpointConfigName=endpoint_config_name)
resp = sm_client.describe_endpoint(EndpointName=endpoint_name)
status = resp['EndpointStatus']

print("Arn: " + resp['EndpointArn'])
print(create_endpoint_response['EndpointArn'])

Inference from the endpoint

The Amazon SageMaker runtime client makes predictions by sending a request to the endpoint hosting the model container on an EC2 instance and returning the output. For custom models, you can also configure the endpoint’s entry points to add data processing steps:

# invoke endpoint with a random CSV payload
import numpy as np

runtime_client = boto3.client('runtime.sagemaker')
num_features = 25  # illustrative value; use the number of features the model was trained on
random_payload = np.array2string(np.random.random(num_features), separator=',', max_line_width=np.inf)[1:-1]
response = runtime_client.invoke_endpoint(EndpointName=endpoint_name,
                                          ContentType='text/csv',
                                          Body=random_payload)
prediction = response['Body'].read().decode("utf-8")
print(prediction)

With all of the predictions for a given batter and infielder configurations, we then average the probability of a base hit returned from the model stored in the lookup table and subtract the expected batting average for the same sample of batted balls. The resulting metric is the Shift Impact.
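
In other words, the final calculation reduces to a simple difference of averages. The following is a hedged sketch, where lookup is assumed to map a batted-ball ID and alignment key to the model's predicted hit probability, and baseline_avg is the batter's expected batting average on the same batted balls:

import numpy as np

def shift_impact(lookup, batted_ball_ids, alignment_key, baseline_avg):
    # Average predicted hit probability under the proposed alignment minus the baseline
    probs = [lookup[(bb_id, alignment_key)] for bb_id in batted_ball_ids]
    return np.mean(probs) - baseline_avg

# Toy numbers: a shift that lowers the expected average by 20 points
lookup = {(1, 'shift'): 0.22, (2, 'shift'): 0.18, (3, 'shift'): 0.26}
print(shift_impact(lookup, [1, 2, 3], 'shift', baseline_avg=0.240))  # -> -0.02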

Matchup Analysis

In interleague games, where teams from the American and National Leagues compete against each other, many batters face pitchers they have never seen before. Estimating outcomes in interleague games is difficult because there is limited relevant historical data. AWS worked with MLB to group similar pitchers together to gain insight into how the batter has historically performed against similar pitchers. We took a machine learning approach, which allowed us to combine the domain knowledge of experts with data comprising hundreds of thousands of pitches to find additional patterns we could use to identify similar pitchers.

Modeling

Taking inspiration from the field of recommendation systems, in which the matching problem is typically solved by computing a user’s inclination towards a product, here we seek to determine the interaction between a pitcher and batter. There are many algorithms appropriate for building recommenders, but few that also allow us to cluster the items that are fed into the algorithm. Neural networks shine in this area. End layers in a neural network architecture can be interpreted as numerical representations of the input data, whether it be an image or a pitcher ID. Given input data, its associated numerical representation, or embedding, can be compared against the embeddings of other input items. Embeddings that lie near each other are similar, not just in the embedding space, but also in interpretable characteristics. For example, we expect handedness to play a role in defining which pitchers are similar. This approach to recommendation systems and clustering items is known as deep matrix factorization.

Deep matrix factorization accounts for nonlinear interactions between a pair of entities, while also mixing in the techniques of content-based and collaborative filtering. Rather than working solely with a pitcher-batter matrix, as in matrix factorization, we build a neural network that aligns each pitcher and batter with their own embedding and then pass them through a series of hidden layers that are trained towards predicting the outcome of a pitch. In addition to the collaborative nature of this architecture, additional contextual data is included for each pitch such as the count, number of runners on base, and the score.

The model is optimized against the predicted outcome of each pitch, including both the pitch characteristics (slider, changeup, fastball, etc.) and the outcome (ball, single, strike, swinging strike, etc.). After training a model on this classification problem, the end layer of the pitcher ID input is extracted as the embedding for that particular pitcher.
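
An illustrative Keras sketch of this idea is shown below; the layer sizes and vocabulary sizes are made up, and this is not MLB's actual architecture. The pitcher embedding layer is what gets extracted after training:

from tensorflow.keras import Model, layers

n_pitchers, n_batters, n_context, n_outcomes, emb_dim = 800, 1000, 8, 12, 32

pitcher_id = layers.Input(shape=(1,), name='pitcher_id')
batter_id = layers.Input(shape=(1,), name='batter_id')
context = layers.Input(shape=(n_context,), name='context')  # count, runners on base, score, ...

pitcher_emb = layers.Flatten()(layers.Embedding(n_pitchers, emb_dim, name='pitcher_embedding')(pitcher_id))
batter_emb = layers.Flatten()(layers.Embedding(n_batters, emb_dim)(batter_id))

# Concatenate embeddings with contextual features and pass through hidden layers
x = layers.Concatenate()([pitcher_emb, batter_emb, context])
x = layers.Dense(128, activation='relu')(x)
x = layers.Dense(64, activation='relu')(x)
outcome = layers.Dense(n_outcomes, activation='softmax')(x)  # pitch type and result classes

model = Model([pitcher_id, batter_id, context], outcome)
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')

# After training, pull the per-pitcher embeddings for nearest-neighbor comparisons
embeddings = model.get_layer('pitcher_embedding').get_weights()[0]  # shape: (n_pitchers, emb_dim)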

Results

As a batter steps up to the plate against a pitcher he hasn’t faced before, we search for the nearest embeddings to that of the opposing pitcher and calculate the on-base plus slugging percentage (OPS) against that group of pitchers. To see the results in action, see 9/11/19: FSN-Ohio executes OPS comparison.
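
A minimal sketch of that lookup, assuming an embeddings matrix like the one above and a hypothetical mapping of the batter's historical OPS against each pitcher:

import numpy as np

def nearest_pitchers(embeddings, pitcher_idx, k=10):
    # Cosine similarity between the opposing pitcher and every other pitcher
    target = embeddings[pitcher_idx]
    sims = embeddings @ target / (np.linalg.norm(embeddings, axis=1) * np.linalg.norm(target) + 1e-9)
    sims[pitcher_idx] = -np.inf  # exclude the pitcher himself
    return np.argsort(sims)[::-1][:k]

def ops_vs_similar(embeddings, pitcher_idx, batter_ops_vs_pitcher, k=10):
    # Aggregate the batter's OPS against the most similar pitchers he has actually faced
    group = nearest_pitchers(embeddings, pitcher_idx, k)
    return np.nanmean([batter_ops_vs_pitcher.get(int(p), np.nan) for p in group])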

Summary

MLB uses cloud computing to create innovative experiences that introduce additional ways for fans to experience baseball. With Stolen Base Success Probability, Shift Impact, and Pitcher Similarity Match-up Analysis, MLB provides compelling, real-time insight into what’s happening on the field and a greater connection to the context that builds the unique drama of the game that fans love.

This postseason, fans will have many opportunities to see stolen base probability in action, the potential effects of infield alignments, and launch into debates with friends about what makes pitchers similar.

Fans can expect to see these new stats in live game broadcasts with partners such as ESPN and MLB Network. Plus, other professional sports leagues including the NFL and Formula 1 have selected AWS as their cloud and machine learning provider of choice.

You can find full, end-to-end examples of implementing an HPO job on Amazon SageMaker at the AWSLabs GitHub repo. If you’d like help accelerating your use of machine learning in your products and processes, please contact the Amazon ML Solutions Lab program.


About the Authors

Hussain Karimi is a data scientist at the Amazon ML Solutions Lab, where he works with AWS customers to develop machine learning models that uncover unique insights in various domains.

Travis Petersen is a Senior Data Scientist at MLB Advanced Media and an adjunct professor at Fordham University.

Priya Ponnapalli is a principal scientist and manager at Amazon ML Solutions Lab, where she helps AWS customers across different industries accelerate their AI and cloud adoption.

from AWS Machine Learning Blog

Managing multi-topic conversation flows with Amazon Lex Session API checkpoints

In daily conversations, you often jump back and forth between multiple topics. For example, when discussing a home improvement project related to new windows and curtains, you might have questions like, “How about closing out on curtain styles and then revisiting colors?” When AWS launched Amazon Lex Session API, you learned how to address such digressions in the conversation. You can use Session API actions to switch an intent and continue the conversation. But in everyday interactions, you might have to deal with multiple digressions: “Let’s finish selecting windows before we get to curtains.”

How do you design conversation flows that contain a series of digressions? If you are like me, you’d have a dozen questions before even considering a specific product in a home improvement project.

With session checkpoints, you can easily design a conversation to support a switch to one of many topics. You can model the home improvement conversation as two intents: OrderWindows and OrderCurtains. Now it is easy to switch topics. The flow for OrderWindows would have a checkpoint. If the user is ordering curtains but wants to complete selecting windows first, you can move the conversation back to OrderWindows using the "windowSelection" checkpoint.

Managing session checkpoints

The Amazon Lex runtime API provides operations that enable you to manage session checkpoints for a conversation. The PutSession and GetSession calls enable you to define and retrieve checkpoints.  Here’s how you can use the APIs to manage the conversation flows described earlier. Please review the bot schema for bot details.

Follow these steps to manage the conversation flow:

  1. Store the current state of the conversation
  2. Retrieve the previously stored state and continue the conversation

Store the current state of the conversation

Call the GetSession API with no filters to retrieve the current state of the conversation between your bot and the user. The GetSession API call is followed by a PutSession API call, which applies a checkpoint ‘windowSelection’ onto the OrderWindows intent. The PutSession call is shown in the code example:

PutSession Request:  Applying 'windowSelection' checkpoint on 'OrderWindows' intent

import boto3

client = boto3.client('lex-runtime')  # Amazon Lex runtime client

response = client.put_session(
	botName='HomeImprovementBot',
	botAlias='Prod',
	userId='abc123',
	recentIntentSummaryView=[
	  {
	    "intentName": "OrderCurtains",
	    "slots": {
	      "curtainSize": "None",
	      "curtainStyle": "None"
	    },
	    "confirmationStatus": "None",
	    "dialogActionType": "ElicitSlot",
	    "slotToElicit": "curtainSize",
	    "checkpointLabel": "None"
	  },
	  {
	    "intentName": "OrderWindows",
	    "slots": {
	      "windowSize": "large",
	      "windowStyle": "None"
	    },
	    "confirmationStatus": "None",
	    "dialogActionType": "ElicitSlot",
	    "slotToElicit": "windowStyle",
	    "checkpointLabel": "windowSelection"
	  }
	]
)

Retrieve the previously stored state

At this point, the OrderCurtains intent has completed. Issue a GetSession API call, passing a ‘windowSelection’ checkpointLabelFilter. This call returns the matching intent (OrderWindows), which received the checkpoint label in the previous step.

Continue with the conversation

Finally, issue a PutSession API call that sets the next step in the conversation so the user continues where they left off in OrderWindows. The following code example shows the GetSession request and its filtered response:


GetSession Request:  Filtering on 'windowSelection' CheckpointLabel

--- GetSession Request with filter: ---
 
response = client.get_session(
	botName='HomeImprovementBot',
	botAlias='Prod',
	userId='abc123',
	checkpointLabelFilter='windowSelection'
)

--- Filtered GetSession Response: --- 
{
  "recentIntentSummaryView": [
    {
      "intentName": "OrderWindows",
      "slots": {
        "windowSize": "large",
        "windowStyle": "None"
      },
      "confirmationStatus": "None",
      "dialogActionType": "ElicitSlot",
      "slotToElicit": "windowStyle",
      "checkpointLabel": "windowSelection"
    }
  ],
  "sessionAttributes": {},
  "sessionId": "XXX",
  "dialogAction": {
    "type": "ElicitSlot",
    "intentName": "OrderCurtains",
    "slots": {
      "curtainSize": "None",
      "curtainStyle": "None"
    },
    "slotToElicit": "curtainSize"
  }
}
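
The closing PutSession call itself is not shown above. A minimal sketch, based on the filtered GetSession response and using the same lex-runtime client, might look like the following (the slot values simply mirror the response above):

# Sketch only: resume the conversation at the point captured by the 'windowSelection' checkpoint.
# The dialogAction mirrors the intent, slots, and slotToElicit returned by the filtered GetSession call.
response = client.put_session(
	botName='HomeImprovementBot',
	botAlias='Prod',
	userId='abc123',
	dialogAction={
	  'type': 'ElicitSlot',
	  'intentName': 'OrderWindows',
	  'slots': {
	    'windowSize': 'large',
	    'windowStyle': 'None'
	  },
	  'slotToElicit': 'windowStyle'
	}
)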

Getting started with Session API checkpoints

In this post, you learned how to use Session API checkpoints to manage multiple digressions. You can define Session API checkpoints using the AWS SDK. You can download the bot schema for the conversation in this post to implement a quick application. For more information, see the Amazon Lex documentation.


About the Author

Shahab Shekari works as a Software Development Engineer at Amazon AI. He works on scalable distributed systems and enhancing Lex user experiences. Outside of work, he can be found traveling and enjoying the Pacific Northwest with his dogs, friends and family.

from AWS Machine Learning Blog

Verifying and adjusting your data labels to create higher quality training datasets with Amazon SageMaker Ground Truth

Building a highly accurate training dataset for your machine learning (ML) algorithm is an iterative process. It is common to review and continuously adjust your labels until you are satisfied that the labels accurately represent the ground truth, or what is directly observable in the real world. Because accurately labeled data is critical to ML model quality, ML practitioners have often built custom systems to review and update data labels. If there are issues with the labels, the ML model can’t effectively learn the ground truth, which leads to inaccurate predictions.

One way that ML practitioners have improved the accuracy of their labeled data is through using audit workflows. Audit workflows enable a group of reviewers to verify the accuracy of labels (a process called label verification) or adjust them (a process called label adjustment) if needed.

Amazon SageMaker Ground Truth now features built-in workflows for label verification and label adjustment for bounding boxes and semantic segmentation. With these new workflows, you can chain an existing Amazon SageMaker Ground Truth labeling job to a verification or adjustment job, or you can import your existing labels for a verification or adjustment job.

This post walks you through both options for bounding box labels. The walkthrough assumes that you are familiar with running a labeling job or have existing labels. For more information, see Amazon SageMaker Ground Truth – Build Highly Accurate Datasets and Reduce Labeling Costs by up to 70%.

Chaining a completed Amazon SageMaker Ground Truth labeling job

To chain a completed labeling job, complete the following steps.

  1. From the Amazon SageMaker Ground Truth console, choose Labeling jobs.
  2. Select your desired job.
  3. From the Actions drop-down menu, choose Chain.

The following screenshot shows the Labeling jobs page:

For more information, see Chaining labeling jobs.

The Job overview page carries forward the configurations you used for your chained job. If there are no changes, you can move to the next section, Task type.

Configuring label verification

To use label verification, from Task type, choose Label verification.

See the following screenshot of the Task type page:

The Workers section is preconfigured to the selections you made for the chained labeling job. You can opt to choose a different workforce or stick with the same configurations for your label verification job. For more information, see Managing Your Workforce.

You can define your verification labels, for example, Label Correct, Label Incorrect – Object(s) Missed, and Label Incorrect – Box(es) Not Tightly Drawn.

You can also specify the instructions in the left-hand panel to guide reviewers on how to verify the labels.

See the following screenshot of the Label verification tool page:

Configuring label adjustment

To perform label adjustment, from the Task type section, choose Bounding box. See the following screenshot of the Task type page:

The following steps for configuring the Workers section and setting up the labeling tool are similar to creating a verification job. The one exception is that you must opt into displaying existing labels in the Existing-labels display options section. See the following screenshot:

Uploading your existing labels from outside Amazon SageMaker Ground Truth

If you labeled your data outside of Amazon SageMaker Ground Truth, you can still use the service to verify or adjust your labels. Import your existing labels by following these steps.

  1. Create an augmented manifest with both your data and existing labels. For example, in the following example code, the source-ref attribute points to the images that were labeled, and the bound-box attribute is the label. (A minimal script for generating and uploading such a manifest appears after this list.)
    {"source-ref": "<S3 location of image 1>", "bound-box": <bounding box label>}
    {"source-ref": "<S3 location of image 2>", "bound-box": <bounding box label>}

  2. Save your augmented manifest in Amazon S3. You should save the manifest in the same S3 bucket as your images. Also, remember the attribute name of your labels (in this post, bound-box) because you need to point to this when you set up your jobs. Additionally, make sure that the labels conform to the label format prescribed by Amazon SageMaker Ground Truth. For example, you can see the label format for bounding boxes in Bounding Box Job Output. You are now ready to create verification and adjustment jobs.
  3. From the Amazon SageMaker Ground Truth console, create a new labeling job.
  4. In Job overview, for Input dataset location, point to the S3 path of the augmented manifest that you created. See the following screenshot of the Job overview page:
  5. Follow the steps previously outlined to configure Task Type, Workers, and the labeling tool when setting up your verification or adjustment job.
  6. In Existing-labels display options, for Label attribute name, select the attribute name of your labels (in this post, bound-box) from the drop-down menu. See the following screenshot of Existing-labels display options:
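
For reference, the following minimal Python sketch builds such an augmented manifest and uploads it to Amazon S3. The bucket name, object keys, and label structure are illustrative assumptions; your labels must follow the format described in Bounding Box Job Output:

import json
import boto3

bucket = "my-labeling-bucket"  # hypothetical bucket; use the bucket that holds your images

# Illustrative existing labels; each "label" must follow the Bounding Box Job Output format
existing_labels = [
    {
        "image": "s3://my-labeling-bucket/images/img-001.jpg",
        "label": {
            "image_size": [{"width": 1280, "height": 720, "depth": 3}],
            "annotations": [{"class_id": 0, "left": 100, "top": 80, "width": 250, "height": 180}],
        },
    },
]

# One JSON object per line, with the label under the "bound-box" attribute
lines = [json.dumps({"source-ref": item["image"], "bound-box": item["label"]})
         for item in existing_labels]

s3 = boto3.client("s3")
s3.put_object(Bucket=bucket,
              Key="manifests/existing-labels.manifest",
              Body="\n".join(lines))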

Conclusion

A highly accurate training dataset is critical for achieving your ML initiatives, and you now have built-in workflows to perform label verification and adjustment through Amazon SageMaker Ground Truth. This post walked you through how to use the new label verification and adjustment features. You can chain a completed labeling job, or you can upload labels. Visit the Amazon SageMaker Ground Truth console to get started.

As always, AWS welcomes feedback. Please submit any comments or questions.


About the Authors

Sifan Wang is a Software Development Engineer for AWS AI. His focus is on building scalable systems to process big data and intelligent systems to learn from the data. In his spare time, he enjoys traveling and hitting the gym.

Carter Williams is a Web Development Engineer on the Mechanical Turk Requester CX team with a focus in Computer Vision UIs. He strives to learn and develop new ways to gather accurate annotation data in intuitive ways using web technologies. In his free time, he enjoys paintball, hockey, and snowboarding.

Vikram Madan is the Product Manager for Amazon SageMaker Ground Truth. He focusing on delivering products that make it easier to build machine learning solutions. In his spare time, he enjoys running long distances and watching documentaries.

from AWS Machine Learning Blog

Amazon Textract is now HIPAA eligible

Today, Amazon Web Services (AWS) announced that Amazon Textract, a machine learning service that quickly and easily extracts text and data from scanned documents, is now a HIPAA eligible service, so it can be used for healthcare workloads that must comply with HIPAA. This launch builds upon the existing portfolio of HIPAA-eligible AWS artificial intelligence services, including Amazon Translate, Amazon Comprehend, Amazon Transcribe, Amazon Polly, Amazon SageMaker, and Amazon Rekognition, which help deliver better healthcare outcomes.

Healthcare providers routinely extract text and data from documents such as medical records and forms through manual data entry or simple optical character recognition (OCR) software. This is a time-consuming and often inaccurate process that produces output requiring extensive post-processing before it can be used by other applications. What organizations want instead is the ability to accurately identify and extract text and data from forms and tables in documents of any format and from a variety of file types and templates.

Amazon Textract analyzes virtually any type of document, automatically generating highly accurate text, form, and table data. Amazon Textract identifies text and data from tables and forms in documents – such as patient information from an insurance claim or values from a table in a scanned medical chart – and recognizes a range of document formats, including those specific to healthcare and insurance, without requiring any customization or human intervention. Amazon Textract makes it easy for customers to accurately process millions of document pages in a matter of hours, significantly lowering document processing costs, and allowing customers to focus on deriving business value from their text and data instead of wasting time and effort on post-processing. Results are delivered via an API that can be easily accessed and used without requiring any machine learning experience.

Starting today, Amazon Textract is a HIPAA-eligible service, which means healthcare customers can take full advantage of it. Many healthcare customers, like Cerner, Fred Hutchinson Cancer Research Center, and the American Heart Association, are already exploring new ways to use the power of ML to automate their current workloads and transform how they provide care to patients, all while meeting the security and privacy requirements of HIPAA.

Change Healthcare is a leading independent healthcare technology company that provides data and analytics-driven solutions to improve clinical, financial, and patient engagement outcomes in the U.S. healthcare system. “At Change Healthcare, we believe that we can make healthcare affordable and accessible to all by improving the timeliness and quality of financial and administrative decisions.  This can be achieved by the power of machine learning technology to understand more from our data. But unlocking the potential of this information can often be difficult as it’s siloed in tables and forms that traditional optical character recognition hasn’t been able to analyze,” said Nick Giannasi, EVP and Chief AI Officer at Change Healthcare. “Amazon Textract further advances document understanding with the ability to retrieve structured data in addition to text, and now with the service becoming HIPAA eligible, we’ll be able to liberate the information from millions of documents and create even more value for patients, payers, and providers.”

Cambia Health Solutions is a total health solutions company and the parent company of six regional health plans, including Regence, an insurer serving 2.6 million members in Oregon, Idaho, Utah, and Washington. Cambia is transforming the health care system to be more economically sustainable and efficient for people and their families. “Over the past 100 years Cambia has been dedicated to improving health care for people and their families. To help us achieve that goal, we’re always evaluating new innovations and opportunities to optimize care coordination. One area of focus is streamlining administrative processes that are time and labor intensive. We’re excited to explore Amazon Textract to help us automate the process of extracting valuable data from paper forms accurately and efficiently. The powerful combination of data science, A.I., and a person-focused approach is key to our mission of transforming the health care system” said Faraz Shafiq, Cambia Health Solutions Chief Artificial Intelligence Officer.

ClearDATA is a HITRUST-certified AWS Managed Service Provider trusted by customers across the globe to safeguard their sensitive data and power their critical applications. Matt Ferrari, Chief Technology Officer at ClearDATA, says, “It’s exciting to see AWS add their optical character recognition service powered by machine learning, Amazon Textract, to their list of HIPAA eligible services. A lot of medical data that is shared among payers and providers is locked in image-based files like PDFs. Instead of manually processing that kind of data, healthcare organizations can now use the Amazon Textract service to extract medical data from files that previously have been non-machine readable. This brings an opportunity to integrate this data with their electronic health records, or other cloud technologies like Amazon Comprehend Medical that can identify protected health information in the dataset. This is just another step forward in increasing the opportunity to use these emerging technologies to improve access to data, get better insights, lower costs, and improve patient and member experiences.” ClearDATA offers solutions and services that protect healthcare organizations from data privacy risks, improve their data management, and scale their healthcare IT infrastructure, along with one of the most comprehensive Business Associate Agreements in the healthcare industry.

For additional information on Amazon Machine Learning services and how healthcare and life sciences companies can run HIPAA-eligible workloads on AWS please reference the following materials:

To get started with Amazon Textract, choose the Get Started with Amazon Textract button on the Amazon Textract page. You must have an Amazon Web Services account; if you do not already have one, you will be prompted to create one during the process. Once you are signed in to your AWS account, try out Amazon Textract with your own images or PDF documents using the Amazon Textract Management Console. You can also download the Amazon Textract SDKs to start creating your own applications. Please refer to our step-by-step Getting Started Guide for more information.
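
If you prefer to start from the SDK, the following minimal example calls the DetectDocumentText API with the AWS SDK for Python (boto3); the file name is a placeholder for one of your own scanned documents:

# Detect lines of text in a local image with Amazon Textract
import boto3

textract = boto3.client('textract')

with open('scanned-claim-form.png', 'rb') as document:  # placeholder file name
    response = textract.detect_document_text(Document={'Bytes': document.read()})

# Print each detected LINE block
for block in response['Blocks']:
    if block['BlockType'] == 'LINE':
        print(block['Text'])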


About the author

Kriti Bharti is the Product Lead for Amazon Textract. Kriti has over 15 years’ experience in Product Management, Program Management, and Technology Management across multiple industries such as Healthcare, Banking and Finance, and Retail. In her time at AWS, she has helped launch a number of new services including AWS IoT Device Management and AWS IoT Device Defender. In her spare time, you can find Kriti spending a pawsome time with Fifi and her cousins, reading, or learning different dance forms.

from AWS Machine Learning Blog

Managing conversation flow with a fallback intent on Amazon Lex

Ever been stumped by a question? Imagine you’re in a business review going over weekly numbers and someone asks, “What about expenses?” Your response might be, “I don’t know. I wasn’t prepared to have that discussion right now.”

Bots aren’t fortunate enough to have the same comprehension capabilities, so how should they respond when they don’t have an answer? How can a bot recover when it doesn’t have the response? Asking you to repeat yourself could be quite frustrating if the bot still doesn’t understand. Perhaps it can pretend to understand what you said based on the last exchange? That might not always work and could also sound foolish. Maybe the bot can admit its limitations and tell you what it can do? That would be acceptable the first few times but can be suboptimal in the long run.

There is no single correct way. Conversation repair strategies vary by the kind of experience you’re trying to create. You can use error handling prompts: the bot tries to clarify by prompting “Sorry, can you please say that again?” a few times before hanging up with a message such as, “I am not able to assist you at this time.”

Building on the sample conversation above, let us first build a simple chatbot to answer questions related to revenue numbers. This bot answers questions such as “What’s the revenue in Q1?” and “What were our sales in the western region?” The Lex bot contains only two intents: RegionDetails and QuarterDetails. With this bot definition, if someone were to discuss expenses (“How much did we spend last quarter?”), the bot would go through the clarification prompts and eventually hang up. You couldn’t intervene or execute business logic. The conversation would resemble the following:

Starting today, you can add a fallback intent to help your bot recover gracefully in such situations. With a fallback intent, you can now control the bot’s recovery by providing additional information, managing the dialog, or executing business logic. You can control the conversation better and manage the flow for an ideal outcome, such as the following:

Configuring the fallback intent

You can configure your fallback intent by completing the following steps.

  1. From the Amazon Lex console, choose Create intent.
  2. Search for AMAZON.FallbackIntent in the existing intents.

See the following screenshot of the BusinessMetricsFallback page:

If you have any clarification prompts, the fallback intent is triggered after the clarification prompts are executed. We recommend disabling the clarification prompts. The hang-up phrase is not used when a fallback intent is configured. See the following screenshot of the Error handling page:

  3. Add an intent ContactDetails to collect the email ID.

This is a simple intent with just the email address as a slot type. Please review the bot definition for intent details.

  4. Add an AWS Lambda function in the fulfillment code hook of the fallback intent.

This function performs two operations. First, it creates a task (for example, a ticket entry in a database) to record your request for an operator follow-up. Second, it switches the intent to elicit additional information, such as your email ID, so that a response goes out after an operator has processed the query. Please review the Lambda definition for code details.

With the preceding bot definition, you can now control the conversation. When you ask, “How much did we spend last quarter?”, the input does not match any of the configured intents and triggers the fallback intent. The Lambda function in the fulfillment code hook creates the ticket and switches the intent to ContactDetails to capture the email ID.
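
A minimal sketch of such a fulfillment function is shown below. The create_ticket helper and the emailAddress slot name are illustrative assumptions; the response uses the Amazon Lex dialogAction format to switch the conversation to ContactDetails:

# Hedged sketch of the fallback intent's fulfillment Lambda function
def create_ticket(user_utterance):
    # In a real bot, this would record a follow-up task in a database or ticketing system
    print(f"Ticket created for unhandled request: {user_utterance}")

def lambda_handler(event, context):
    create_ticket(event.get('inputTranscript', ''))

    # Switch the conversation to the ContactDetails intent to collect an email address
    return {
        'dialogAction': {
            'type': 'ElicitSlot',
            'intentName': 'ContactDetails',
            'slots': {'emailAddress': None},
            'slotToElicit': 'emailAddress',
            'message': {
                'contentType': 'PlainText',
                'content': "I can't answer that right now, but an operator will follow up. "
                           "What email address should we use?"
            }
        }
    }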

Summary

This post demonstrated how to have better control of the conversation flow with a fallback intent. You can switch intents, execute business logic, or provide custom responses. For more information about incorporating these techniques into real bots, see the Amazon Lex documentation and FAQ page.


About the Author

Kartik Rustagi works as a Software Development Manager in Amazon AI. He and his team focus on enhancing the conversation capability of chat bots powered by Amazon Lex. When not at work, he enjoys exploring the outdoors and savoring different cuisines.

from AWS Machine Learning Blog

Generating searchable PDFs from scanned documents automatically with Amazon Textract

Amazon Textract is a machine learning service that makes it easy to extract text and data from virtually any document. Textract goes beyond simple optical character recognition (OCR) to also identify the contents of fields in forms and information stored in tables. This allows you to use Amazon Textract to instantly “read” virtually any type of document and accurately extract text and data without the need for any manual effort or custom code.

The blog post Automatically extract text and structured data from documents with Amazon Textract shows how to use Amazon Textract to automatically extract text and data from scanned documents without any machine learning (ML) experience. One of the use cases covered in the post is search and discovery. You can search through millions of documents by extracting text and structured data from documents with Amazon Textract and creating a smart index using Amazon ES.

This post demonstrates how to generate searchable PDF documents by extracting text from scanned documents using Amazon Textract. The solution allows you to download relevant documents, search within a document when it is stored offline, or select and copy text.

You can see an example of a searchable PDF document that is generated with Amazon Textract from a scanned document. While text is locked in images in the scanned document, you can select, copy, and search text in the searchable PDF document.

To generate a searchable PDF, use Amazon Textract to extract text from documents and add the extracted text as a layer to the image in the PDF document. Amazon Textract detects and analyzes text input documents and returns information about detected items such as pages, words, lines, form data (key-value pairs), tables, and selection elements. It also provides bounding box information, which is an axis-aligned coarse representation of the location of the recognized item on the document page. You can use the detected text and its bounding box information to place text in the PDF page.

PDFDocument is a sample library in the AWS Samples GitHub repo that provides the necessary logic to generate a searchable PDF document using Amazon Textract. It uses the open-source Java library Apache PDFBox to create PDF documents, but similar PDF processing libraries are available in other programming languages.

The following code example shows how to use the sample library to generate a searchable PDF document from an image:

...

//Extract text using Amazon Textract
List<TextLine> lines = extractText(imageBytes);

//Generate searchable PDF with image and text
PDFDocument doc = new PDFDocument();
doc.addPage(image, imageType, lines);

//Save PDF to local disk
try(OutputStream outputStream = new FileOutputStream(outputDocumentName)) {
    doc.save(outputStream);
}

...

Generating a searchable PDF from an image document

The following code shows how to take an image document and generate a corresponding searchable PDF document. Extract the text using Amazon Textract and create a searchable PDF by adding the text as a layer with the image.

public class DemoPdfFromLocalImage {

    public static void run(String documentName, String outputDocumentName) throws IOException {

        System.out.println("Generating searchable pdf from: " + documentName);

        ImageType imageType = ImageType.JPEG;
        if(documentName.toLowerCase().endsWith(".png"))
            imageType = ImageType.PNG;

        //Get image bytes
        ByteBuffer imageBytes = null;
        try(InputStream in = new FileInputStream(documentName)) {
            imageBytes = ByteBuffer.wrap(IOUtils.toByteArray(in));
        }

        //Extract text
        List<TextLine> lines = extractText(imageBytes);

        //Get Image
        BufferedImage image = getImage(documentName);

        //Create new pdf document
        PDFDocument pdfDocument = new PDFDocument();

        //Add page with text layer and image in the pdf document
        pdfDocument.addPage(image, imageType, lines);

        //Save PDF to local disk
        try(OutputStream outputStream = new FileOutputStream(outputDocumentName)) {
            pdfDocument.save(outputStream);
            pdfDocument.close();
        }

        System.out.println("Generated searchable pdf: " + outputDocumentName);
    }
    
    private static BufferedImage getImage(String documentName) throws IOException {

        BufferedImage image = null;

        try(InputStream in = new FileInputStream(documentName)) {
            image = ImageIO.read(in);
        }

        return image;
    }

    private static List<TextLine> extractText(ByteBuffer imageBytes) {

        AmazonTextract client = AmazonTextractClientBuilder.defaultClient();

        DetectDocumentTextRequest request = new DetectDocumentTextRequest()
                .withDocument(new Document()
                        .withBytes(imageBytes));

        DetectDocumentTextResult result = client.detectDocumentText(request);

        List<TextLine> lines = new ArrayList<TextLine>();
        List<Block> blocks = result.getBlocks();
        BoundingBox boundingBox = null;
        for (Block block : blocks) {
            if ((block.getBlockType()).equals("LINE")) {
                boundingBox = block.getGeometry().getBoundingBox();
                lines.add(new TextLine(boundingBox.getLeft(),
                        boundingBox.getTop(),
                        boundingBox.getWidth(),
                        boundingBox.getHeight(),
                        block.getText()));
            }
        }

        return lines;
    }
}

Generating a searchable PDF from a PDF document

The following code example takes an input PDF document from an Amazon S3 bucket and generates the corresponding searchable PDF document. You extract text from the PDF document using Amazon Textract, and create a searchable PDF by adding text as a layer with an image for each page.

public class DemoPdfFromS3Pdf {
    public static void run(String bucketName, String documentName, String outputDocumentName) throws IOException, InterruptedException {

        System.out.println("Generating searchable pdf from: " + bucketName + "/" + documentName);

        //Extract text using Amazon Textract
        List<ArrayList<TextLine>> linesInPages = extractText(bucketName, documentName);

        //Get input pdf document from Amazon S3
        InputStream inputPdf = getPdfFromS3(bucketName, documentName);

        //Create new PDF document
        PDFDocument pdfDocument = new PDFDocument();

        //For each page add text layer and image in the pdf document
        PDDocument inputDocument = PDDocument.load(inputPdf);
        PDFRenderer pdfRenderer = new PDFRenderer(inputDocument);
        BufferedImage image = null;
        for (int page = 0; page < inputDocument.getNumberOfPages(); ++page) {
            image = pdfRenderer.renderImageWithDPI(page, 300, org.apache.pdfbox.rendering.ImageType.RGB);

            pdfDocument.addPage(image, ImageType.JPEG, linesInPages.get(page));

            System.out.println("Processed page index: " + page);
        }

        //Save PDF to stream
        ByteArrayOutputStream os = new ByteArrayOutputStream();
        pdfDocument.save(os);
        pdfDocument.close();
        inputDocument.close();

        //Upload PDF to S3
        UploadToS3(bucketName, outputDocumentName, "application/pdf", os.toByteArray());

        System.out.println("Generated searchable pdf: " + bucketName + "/" + outputDocumentName);
    }

    private static List<ArrayList<TextLine>> extractText(String bucketName, String documentName) throws InterruptedException {

        AmazonTextract client = AmazonTextractClientBuilder.defaultClient();

        StartDocumentTextDetectionRequest req = new StartDocumentTextDetectionRequest()
                .withDocumentLocation(new DocumentLocation()
                        .withS3Object(new S3Object()
                                .withBucket(bucketName)
                                .withName(documentName)))
                .withJobTag("DetectingText");

        StartDocumentTextDetectionResult startDocumentTextDetectionResult = client.startDocumentTextDetection(req);
        String startJobId = startDocumentTextDetectionResult.getJobId();

        System.out.println("Text detection job started with Id: " + startJobId);

        GetDocumentTextDetectionRequest documentTextDetectionRequest = null;
        GetDocumentTextDetectionResult response = null;

        String jobStatus = "IN_PROGRESS";

        while (jobStatus.equals("IN_PROGRESS")) {
            System.out.println("Waiting for job to complete...");
            TimeUnit.SECONDS.sleep(10);
            documentTextDetectionRequest = new GetDocumentTextDetectionRequest()
                    .withJobId(startJobId)
                    .withMaxResults(1);

            response = client.getDocumentTextDetection(documentTextDetectionRequest);
            jobStatus = response.getJobStatus();
        }

        int maxResults = 1000;
        String paginationToken = null;
        Boolean finished = false;

        List<ArrayList<TextLine>> pages = new ArrayList<ArrayList<TextLine>>();
        ArrayList<TextLine> page = null;
        BoundingBox boundingBox = null;

        while (finished == false) {
            documentTextDetectionRequest = new GetDocumentTextDetectionRequest()
                    .withJobId(startJobId)
                    .withMaxResults(maxResults)
                    .withNextToken(paginationToken);
            response = client.getDocumentTextDetection(documentTextDetectionRequest);

            //Show blocks information
            List<Block> blocks = response.getBlocks();
            for (Block block : blocks) {
                if (block.getBlockType().equals("PAGE")) {
                    page = new ArrayList<TextLine>();
                    pages.add(page);
                } else if (block.getBlockType().equals("LINE")) {
                    boundingBox = block.getGeometry().getBoundingBox();
                    page.add(new TextLine(boundingBox.getLeft(),
                            boundingBox.getTop(),
                            boundingBox.getWidth(),
                            boundingBox.getHeight(),
                            block.getText()));
                }
            }
            paginationToken = response.getNextToken();
            if (paginationToken == null)
                finished = true;
        }

        return pages;
    }

    private static InputStream getPdfFromS3(String bucketName, String documentName) throws IOException {

        AmazonS3 s3client = AmazonS3ClientBuilder.defaultClient();
        com.amazonaws.services.s3.model.S3Object fullObject = s3client.getObject(new GetObjectRequest(bucketName, documentName));
        InputStream in = fullObject.getObjectContent();
        return in;
    }

    private static void UploadToS3(String bucketName, String objectName, String contentType, byte[] bytes) {
        AmazonS3 s3client = AmazonS3ClientBuilder.defaultClient();
        ByteArrayInputStream baInputStream = new ByteArrayInputStream(bytes);
        ObjectMetadata metadata = new ObjectMetadata();
        metadata.setContentLength(bytes.length);
        metadata.setContentType(contentType);
        PutObjectRequest putRequest = new PutObjectRequest(bucketName, objectName, baInputStream, metadata);
        s3client.putObject(putRequest);
    }
}

Running code on a local machine

To run the code on a local machine, complete the following steps. The code examples are available on the GitHub repo.

  1. Set up your AWS Account and AWS CLI.

For more information, see Getting Started with Amazon Textract.

  2. Download and unzip searchablepdf.zip from the GitHub repo.
  3. Install Apache Maven if it is not already installed.
  4. In the project directory, run mvn package.
  5. Run java -cp target/searchable-pdf-1.0.jar Demo.

This runs the Java project with Demo as the main class.

By default, only the first example, which creates a searchable PDF from an image on a local drive, is enabled. To run the other examples, uncomment the relevant lines in the Demo class.

Running code in Lambda

To run the code in Lambda, complete the following steps. The code examples are available on the GitHub repo.

  1. Download and unzip searchablepdf.zip from the GitHub repo.
  2. Install Apache Maven if it is not already installed.
  3. In the project directory, run mvn package.

The build creates a .jar in project-dir/target/searchable-pdf-1.0.jar, using the information in pom.xml to perform the necessary transforms. This standalone .jar includes all the dependencies and is the deployment package that you upload to Lambda to create a function. For more information, see AWS Lambda Deployment Package in Java. DemoLambda has all the necessary code to read S3 events and take action based on the type of input document. An equivalent function setup using the AWS CLI is sketched after these steps.

  4. Create a Lambda function with the Java 8 runtime and an IAM role that has read and write permissions to your S3 bucket.
  5. Configure the IAM role to also have permissions to call Amazon Textract.
  6. Set the handler to DemoLambda::handleRequest.
  7. Increase the timeout to 5 minutes.
  8. Upload the .jar file you built earlier.
  9. Create an S3 bucket (if you don't already have one).
  10. In the S3 bucket, create a folder labeled documents.
  11. Add a trigger in the Lambda function so that when an object uploads to the documents folder, the Lambda function executes.

Make sure that you set the trigger for the documents folder only. If you add a trigger for the whole bucket, the function also runs every time an output PDF document is generated, triggering the function on its own output.

  12. Upload an image (.jpeg or .png) or PDF document to the documents folder in your S3 bucket.

In a few seconds, you should see the searchable PDF document in your S3 bucket.
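
As an alternative to the console steps above, a roughly equivalent function setup can be sketched with the AWS CLI. The function name, role name, account ID, and memory size below are placeholders, not values from this post:

aws lambda create-function \
    --function-name searchable-pdf-demo \
    --runtime java8 \
    --handler DemoLambda::handleRequest \
    --timeout 300 \
    --memory-size 1024 \
    --zip-file fileb://target/searchable-pdf-1.0.jar \
    --role arn:aws:iam::123456789012:role/searchable-pdf-lambda-role

You would still add the S3 trigger on the documents folder (for example, from the Lambda console) so the function runs on new uploads.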

These steps show a simple S3 and Lambda integration. For large-scale document processing, see the reference architecture in the following GitHub repo.

Conclusion

This post showed how to use Amazon Textract to generate searchable PDF documents automatically. By creating a smart search index using Amazon ES, you can search across millions of documents to find the relevant file. Searchable PDF documents also let you select and copy text, and search within a document after downloading it for offline use.

To learn more about different text and data extraction features of Amazon Textract, see How Amazon Textract Works.


About the Authors

Kashif Imran is a Solutions Architect at Amazon Web Services. He works with some of the largest strategic AWS customers to provide technical guidance and design advice. His expertise spans application architecture, serverless, containers, NoSQL and machine learning.

from AWS Machine Learning Blog

Transcribe speech to text in real time using Amazon Transcribe with WebSocket

Transcribe speech to text in real time using Amazon Transcribe with WebSocket

Amazon Transcribe is an automatic speech recognition (ASR) service that makes it easy for developers to add speech-to-text capability to applications. In November 2018, we added streaming transcriptions over HTTP/2 to Amazon Transcribe. This enabled users to pass a live audio stream to our service and, in return, receive text transcripts in real time. We are excited to share that we recently started supporting real-time transcriptions over the WebSocket protocol. WebSocket support makes streaming speech-to-text through Amazon Transcribe more accessible to a wider user base, especially for those who want to build browser or mobile-based applications.

In this blog post, we assume that you are aware of our streaming transcription service running over HTTP/2, and focus on showing you how to use the real-time offering over WebSocket. However, for reference on using HTTP/2, you can read our previous blog post and tech documentation.

What is WebSocket?

WebSocket is a full-duplex communication protocol built over TCP. The protocol was standardized by the IETF as RFC 6455 in 2011. WebSocket is suited to long-lived connections in which both the server and the client can transmit data over the same connection at the same time. It is also practical for cross-domain usage. Voila! No need to worry about cross-origin resource sharing (CORS) as there is when using HTTP.

Using Amazon Transcribe streaming with WebSocket

To use Amazon Transcribe's StartStreamTranscriptionWebSocket API, you first need to authorize your IAM user to use Amazon Transcribe streaming over WebSocket. In the AWS Management Console, navigate to Identity & Access Management (IAM) and attach the following inline policy to your user. Refer to "To embed an inline policy for a user or role" for instructions on adding permissions.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "transcribestreaming",
            "Effect": "Allow",
            "Action": "transcribe:StartStreamTranscriptionWebSocket",
            "Resource": "*"
        }
    ]
}

Your upgrade request should be pre-signed with your AWS credentials using AWS Signature Version 4. The request should contain the required parameters, namely sample-rate, language-code, and media-encoding. You can optionally supply vocabulary-name to use a custom vocabulary. The StartStreamTranscriptionWebSocket API supports all of the languages that Amazon Transcribe streaming supports today. After your connection is upgraded to WebSocket, you can send your audio chunks as AudioEvent messages in event-stream encoding in binary WebSocket frames. The response you get is the transcript JSON, which is also event-stream encoded. For more details, please refer to our tech docs.
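
As a minimal sketch (not official SDK code), the following Java snippet shows the shape of such a request. The endpoint format and port are assumptions based on the standard Amazon Transcribe streaming endpoint, and the resulting URL still needs to be pre-signed with Signature Version 4 before use.

public class TranscribeWebSocketUrlSketch {
    public static void main(String[] args) {
        String region = "us-east-1";
        // Assumed endpoint format for Amazon Transcribe streaming over WebSocket.
        String endpoint = "wss://transcribestreaming." + region
                + ".amazonaws.com:8443/stream-transcription-websocket";

        // Required query parameters for StartStreamTranscriptionWebSocket.
        String query = String.join("&",
                "language-code=en-US",   // language of the incoming audio
                "media-encoding=pcm",    // encoding of the audio chunks
                "sample-rate=16000");    // sample rate in Hz
        // Optionally append "&vocabulary-name=<your-custom-vocabulary>".

        // The full URL (endpoint + "?" + query) must still be pre-signed with
        // AWS Signature Version 4 before the WebSocket upgrade request is sent;
        // then send audio as event-stream encoded AudioEvent binary frames.
        System.out.println(endpoint + "?" + query);
    }
}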

To demonstrate how you can power your application with Amazon Transcribe in real time over WebSocket, we built a sample static website. On the website, you can enter your account credentials, choose your preferred language, and start streaming. The complete sample code is available on GitHub. JavaScript developers, among others, may find this a helpful starting point. We'd love to see what other cool applications you can build using Amazon Transcribe streaming with WebSocket!


About the authors

Bhaskar Bagchi is an engineer in the Amazon Transcribe service team. Outside of work, Bhaskar enjoys photography and singing.

Karan Grover is an engineer in the Amazon Transcribe service team. Outside of work, Karan enjoys hiking and is a photography enthusiast.

Paul Zhao is a Product Manager at AWS Machine Learning. He manages the Amazon Transcribe service. Outside of work, Paul is a motorcycle enthusiast and avid woodworker.

from AWS Machine Learning Blog

Using Amazon Polly in Windows Applications

Using Amazon Polly in Windows Applications

AWS offers a vast array of services that allow developers to build applications in the cloud. At the same time, Windows desktop applications can take advantage of these services as well. Today, we are releasing Amazon Polly for Windows, an open-source engine that allows users to take advantage of Amazon Polly voices in SAPI-compliant Windows applications.

What is SAPI? SAPI (Speech Application Programming Interface) is a Microsoft Windows API that allows desktop applications to implement speech synthesis. When an application supports SAPI, it can access any of the installed SAPI voices to generate speech.

Out of the box, Microsoft Windows provides one male and one female SAPI voice that can be used in any supported voice application. With Amazon Polly for Windows, users can install over 50 additional voices across more than 25 languages, paying only for what they use. For more details, please visit the Amazon Polly documentation and check the full list of text-to-speech voices.

Create an AWS account

If you don't already have an AWS account, you can sign up here, which gives you 12 months in our free tier. During the first 12 months, Amazon Polly is free for the first 5 million characters per month. How many characters is that? As an example, "Ulysses" by James Joyce is 730 pages and contains approximately 1.5 million characters. So you could have Amazon Polly read the entire book three times and still have an additional 500,000 free characters for the remainder of the month.

Step 1: Configure your account

  1. Log in to your AWS account.
  2. After you've logged in, click Services from the top menu bar, then type IAM in the search box. Click IAM when it pops up.
  3. On the left, click Users.
  4. Click Add User.
  5. Type in polly-windows-user (you can use any name).
  6. Click the Programmatic access check box and leave AWS Management Console access unchecked.
  7. Click Next: Permissions.
  8. Click Attach existing policies directly.
  9. At the bottom of the page, in the search box next to Filter: Policy type, type polly.
  10. Click the check box next to AmazonPollyReadOnlyAccess.
  11. Click Next: Review.
  12. Click Create user.

IMPORTANT: Don’t close the webpage. You’ll need both the access key ID and the secret access key in Step 3.
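
If you prefer the command line, roughly the same user setup can be sketched with the AWS CLI. The commands below are an illustrative alternative, not part of the official install steps:

aws iam create-user --user-name polly-windows-user
aws iam attach-user-policy --user-name polly-windows-user --policy-arn arn:aws:iam::aws:policy/AmazonPollyReadOnlyAccess
aws iam create-access-key --user-name polly-windows-user

The last command returns the access key ID and secret access key that you use in Step 3.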

Step 2: Install the AWS CLI for Windows

Click here to download the AWS CLI for Windows.

Step 3: Configure the AWS client

Amazon Polly for Windows requires an AWS profile called polly-windows. This ensures that the Amazon Polly engine is using the correct account.

  1. Open a Windows command prompt
  2. Type this command:
    aws configure --profile polly-windows 

  3. When prompted for the AWS Access Key ID and AWS Secret Access Key, use the values from the previous step.
  4. For Default Region, you can hit Enter for the default (us-east-1) or enter a different Region. Make sure to use all lower-case.
  5. For Default output format, just hit Enter
  6. Verify this worked by running the following command. You should see a list of voices:
    aws --profile polly-windows polly describe-voices 
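
Running aws configure with --profile polly-windows writes the profile to your AWS credentials and config files, typically under %UserProfile%\.aws on Windows. The result should look roughly like the following (values are placeholders):

%UserProfile%\.aws\credentials:

[polly-windows]
aws_access_key_id = AKIAEXAMPLEACCESSKEY
aws_secret_access_key = exampleSecretAccessKeyValue

%UserProfile%\.aws\config:

[profile polly-windows]
region = us-east-1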

Step 4: Install Amazon Polly TTS Engine for Windows

Click here to download and run the installer. You can verify that the installation worked with PollyPlayer, an application included with Amazon Polly for Windows that lets you experiment with the voices without any additional software. Simply pick a voice, enter text, and then click Say It.

Using Amazon Polly Voices in Applications

The Amazon Polly voices are accessible in any Windows application that implements Windows SAPI. This means that after the Amazon Polly voices are installed, you simply need to select the Amazon Polly voice that you want to use from the list of voices in the application.

Amazon Polly supports SSML (Speech Synthesis Markup Language), which allows users to add tags that customize speech generation. With Amazon Polly for Windows, users can submit requests as either plain text or SSML. The standard Amazon Polly limits apply: a maximum of 3,000 billed characters per request, or 6,000 characters total (SSML tags are not billed).
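
As a brief illustration, a request using standard SSML tags supported by Amazon Polly might look like the following; the specific pause length, rate, and emphasis level here are arbitrary examples:

<speak>
    Hello! <break time="500ms"/>
    This sentence is spoken <prosody rate="slow">more slowly</prosody>,
    and this word gets <emphasis level="strong">extra emphasis</emphasis>.
</speak>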

Example: Using Amazon Polly for Windows with Adobe Captivate

Building eLearning content is a great use case for generated speech. In the past, content managers would need to record voice content and then re-record it as the content changed. Using an eLearning designer such as Adobe Captivate along with Amazon Polly voices allows you to easily create and dynamically update content whenever you need.

You can use any SAPI-enabled eLearning solution. In this demonstration, we walk through creating a simple slide with Captivate to show how quickly and easily you can add voice content. If you don’t already have Captivate, you can download a free trial here.

Step 1: Create a project

Start Captivate and click New Project / Blank Project to create a new project.

At this point, you have a new blank project with a single slide.

Step 2: Add speech content

From the Audio menu, click Speech Management.

This brings up a Speech Management modal window, where you can add speech content to the slide. Click the Speech Agent drop-down and select Amazon Polly – US English – Salli (Neural). By default, all slides are set to use this voice.

Click the + button to add content.

In the textbox, type My name is Salli. My speech is generated by Amazon Polly.

Now we must generate the audio. Behind the scenes, Captivate uses the Windows SAPI driver to call back to AWS to generate the speech. Click Save and Generate Audio.

After the speech is generated, you can preview the audio by clicking the Play button next to the Generate Audio button.

You hear Salli speaking the text. Click the Close button.

After closing the window, you can preview the entire project to hear the speech with the slide.

The wide selection of Amazon Polly voices allows a content manager to build and experiment with limitless combinations of speech. Because content and voice selections can be updated at any time, content managers can keep both the audio presentation and content fresh without ever having to go near a recording studio.

Now that you've installed Amazon Polly for Windows, you can have fun experimenting with different variations of speech using SSML tags, which are fully supported in Windows. And because Amazon Polly for Windows is open-source, feel free to contribute features and submit feature requests. You can share feedback at the Amazon Polly forum. We'd love to hear how you're using Amazon Polly for Windows!


About the Author

Troy Larson is a Senior DevOps Cloud Architect for AWS Professional Services.

from AWS Machine Learning Blog

Build your ML skills with AWS Machine Learning on Coursera

Build your ML skills with AWS Machine Learning on Coursera

Machine learning (ML) is one of the fastest growing areas in technology and a highly sought after skillset in today’s job market. Today, I am excited to announce a new education course, built in collaboration with Coursera, to help you build your ML skills: Getting started with AWS Machine Learning. You can access the course content for free now on the Coursera website.

The World Economic Forum [1] states that the growth of artificial intelligence (AI) could create 58 million net new jobs in the next few years, yet it's estimated that there are currently only 300,000 AI engineers worldwide when millions are needed [2]. This means there is a unique and immediate opportunity for you to get started learning the essential ML concepts that are used to build AI applications, no matter what your skill level. Learning the foundations of ML now will help you keep pace with this growth, expand your skills, and even help advance your career.

Based on the same ML courses used to train engineers at Amazon, this course teaches you how to get started with AWS Machine Learning. Key topics include Machine Learning on AWS, Computer Vision on AWS, and Natural Language Processing (NLP) on AWS. Each topic consists of several modules that dive deep into a variety of ML concepts and AWS services, along with insights from experts to put the concepts into practice. This course is a great way to build your foundational knowledge of machine learning before diving deeper with the AWS Machine Learning Certification.

How it Works

You can read and view the course content for free on Coursera. If you want to access assessments, take graded assignments, and receive a course certificate, it costs $49 in the USA and $29 in Brazil, Russia, Mexico, and India. If you choose the paid route, when you complete the course you'll get an electronic certificate that you can print and even add to your LinkedIn profile to showcase your newfound machine learning knowledge.

Enroll now to build your skills towards becoming an ML developer!


About the Author

Tara Shankar Jana is a Senior Product Marketing Manager for AWS Machine Learning. Currently, he is working on building unique and scalable educational offerings for aspiring ML developer communities to help them expand their ML skills. Outside of work he loves reading books, travelling and spending time with his family.


[1] Artificial Intelligence to Create 58 Million New Jobs by 2022, Says Report (Forbes)
[2] Tencent says there are only 300,000 AI engineers worldwide, but millions are needed (The Verge)


from AWS Machine Learning Blog