Contributed by: Amr Ragab, Business Development Manager, Accelerated Computing, AWS and Kong Zhao, Solution Architect, NVIDIA Corporation

AWS continually evolves GPU offerings, striving to showcase how new technical improvements created by AWS partners improve the platform’s performance.

One result from AWS’s collaboration with NVIDIA is the recent release of the G4 instance type, a technology update from the G2 and G3. The G4 features a Turing T4 GPU with 16GB of GPU memory, offered under the Nitro hypervisor with one GPU to 4 GPUS per node. A bare metal option will be released in the coming months. It also includes up to 1.8 TB of local non-volatile memory express (NVMe) storage and up to 100 Gbps of network bandwidth.

The Turing T4 is the latest offering from NVIDIA, accelerating machine learning (ML) training and inferencing, video transcoding, and other compute-intensive workloads. With such a diverse array of optimized directives, you can now perform diverse accelerated compute workloads on a single instance family.

NVIDIA has also taken the lead in providing a robust and performant software layer in the form of SDKs and container solutions through the NVIDIA GPU Cloud (NGC) container registry. These accelerated components, combined with AWS elasticity and scale, provide a powerful combination for performant pipelines on AWS.

NVIDIA DeepStream SDK

This post focuses on one such NVIDIA SDK: DeepStream.

The DeepStream SDK is built to provide an end-to-end video processing and ML inferencing analytics solution. It uses the Video Codec API and TensorRT as key components.

DeepStream also supports an edge-cloud strategy to stream perception on the edge and other sensor metadata into AWS for further processing. An example includes wide-area consumption of multiple camera streams and metadata through the Amazon Kinesis platform.

Another classic workload that can take advantage of DeepStream is compiling the model artifacts resulting from distributed training in AWS with Amazon SageMaker Neo. Use this model on the edge or on an Amazon S3 video data lake.

If you are interested in exploring these solutions, contact your AWS account team.

Deployment

Set up programmatic access to AWS to instantiate a g4dn.2xlarge instance type with Ubuntu 18.04 in a subnet that routes SSH access. If you are interested in the full stack details, the following are required to set up the instance to execute DeepStream SDK workflows.

  • An Ubuntu 18.04 Instance with:
    • NVIDIA Turing T4 Driver (418.67 or latest)
    • CUDA 10.1
    • nvidia-docker2

Alternatively, you can launch the NVIDIA Deep Learning AMI available in AWS Marketplace, which includes the latest drivers and SDKs.

aws ec2 run-instances --region us-east-1 --image-id ami-026c8acd92718196b --instance-type g4dn.2xlarge --key-name <key-name> —subnet-id <subnet> --security-group-ids {<security-groupids>} —block-device-mappings 'DeviceName=/dev/sda1,Ebs={VolumeSize=75}'

When the instance is up, connect with SSH and pull the latest DeepStream SDK Docker image from the NGC container registry.

docker pull nvcr.io/nvidia/deepstream:4.0-19.07

nvidia-docker run -it --rm -v /usr/lib/x86_64-linux-gnu/libnvidia-encode.so:/usr/lib/x86_64-linux-gnu/libnvidia-encode.so -v /tmp/.X11-unix:/tmp/.X11-unix -p 8554:8554 -e DISPLAY=$DISPLAY nvcr.io/nvidia/deepstream: 4.0-19.07

If your instance is running a full X environment, you can pass the authentication and display to the container to view the results in real time. However, for the purposes of this post, just execute the workload on the shell.

Go to the /root/deepstream_sdk_v4.0_x86_64/samples/configs/deepstream-app/ folder.

The following configuration files are included in the package:

  • source30_1080p_resnet_dec_infer_tiled_display_int8.txt: This configuration file demonstrates 30 stream decodes with primary inferencing.
  • source4_1080p_resnet_dec_infer_tiled_display_int8.txt: This configuration file demonstrates four stream decodes with primary inferencing, object tracking, and three different secondary classifiers.
  • source4_1080p_resnet_dec_infer_tracker_sgie_tiled_display_int8_gpu1.txt: This configuration file demonstrates four stream decodes with primary inferencing, object tracking, and three different secondary classifiers on GPU 1.
  • config_infer_primary.txt: This configuration file configures an nvinfer element as the primary detector.
  • config_infer_secondary_carcolor.txt, config_infer_secondary_carmake.txt, config_infer_secondary_vehicletypes.txt: These configuration files configure an nvinfer element as the secondary classifier.
  • iou_config.txt: This configuration file configures a low-level Intersection over Union (IOU) tracker.
  • source1_usb_dec_infer_resnet_int8.txt: This configuration file demonstrates one USB camera as input.

The following sample models are provided with the SDK.

Model Model type Number of classes Resolution
Primary Detector Resnet10 4 640 x 368
Secondary Car Color Classifier Resnet18 12 224 x 224
Secondary Car Make Classifier Resnet18 6 224 x 224
Secondary Vehicle Type Classifier Resnet18 20 224 x 224

Edit the configuration file source30_1080p_dec_infer-resnet_tiled_display_int8.txt to disable [sink0] and enable [sink1] for file output. Save the file, then run the DeepStream sample code.

[sink0]
enable=0
#Type - 1=FakeSink 2=EglSink 3=File
type=2
 
[sink1]
enable=1
type=3
#1=mp4 2=mkv
container=1
#1=h264 2=h265
codec=1
sync=0
#iframeinterval=10
bitrate=2000000
output-file=out.mp4
source-id=0
 
deepstream-app -c source30_1080p_dec_infer-resnet_tiled_display_int8.txt

You get performance data on the inferencing workflow.

 (deepstream-app:1059): GLib-GObject-WARNING **: 20:38:25.991: g_object_set_is_valid_property: object class 'nvv4l2h264enc' has no property named 'bufapi-version'
Creating LL OSD context new
 
Runtime commands:
        h: Print this help
        q: Quit
 
        p: Pause
        r: Resume
 
NOTE: To expand a source in the 2D tiled display and view object details, left-click on the source.
      To go back to the tiled display, right-click anywhere on the window.
 
** INFO: <bus_callback:163>: Pipeline ready
 
Creating LL OSD context new
** INFO: <bus_callback:149>: Pipeline running
 
 
**PERF: FPS 0 (Avg)     FPS 1 (Avg)     FPS 2 (Avg)     FPS 3 (Avg)     FPS 4 (Avg)     FPS 5 (Avg)     FPS 6 (Avg)     FPS 7 (Avg)     FPS 8 (Avg)     FPS 9 (Avg)     FPS 10 (Avg)   FPS 11 (Avg)     FPS 12 (Avg)    FPS 13 (Avg)    FPS 14 (Avg)    FPS 15 (Avg)    FPS 16 (Avg)    FPS 17 (Avg)    FPS 18 (Avg)    FPS 19 (Avg)    FPS 20 (Avg)    FPS 21 (Avg)    FPS 22 (Avg)    FPS 23 (Avg)    FPS 24 (Avg)    FPS 25 (Avg)    FPS 26 (Avg)    FPS 27 (Avg)    FPS 28 (Avg)    FPS 29 (Avg)
**PERF: 35.02 (35.02)   37.92 (37.92)   37.93 (37.93)   35.85 (35.85)   36.39 (36.39)   38.40 (38.40)   35.85 (35.85)   35.18 (35.18)   35.60 (35.60)   35.02 (35.02)   38.77 (38.77)  37.71 (37.71)    35.18 (35.18)   38.60 (38.60)   38.40 (38.40)   38.60 (38.60)   34.77 (34.77)   37.70 (37.70)   35.97 (35.97)   37.00 (37.00)   35.51 (35.51)   38.40 (38.40)   38.60 (38.60)   38.40 (38.40)   38.13 (38.13)   37.70 (37.70)   35.85 (35.85)   35.97 (35.97)   37.92 (37.92)   37.92 (37.92)
**PERF: 39.10 (37.76)   38.90 (38.60)   38.70 (38.47)   38.70 (37.78)   38.90 (38.10)   38.90 (38.75)   39.10 (38.05)   38.70 (37.55)   39.10 (37.96)   39.10 (37.76)   39.10 (39.00)  39.10 (38.68)    39.10 (37.83)   39.10 (38.95)   39.10 (38.89)   39.10 (38.95)   38.90 (37.55)   38.70 (38.39)   38.90 (37.96)   38.50 (38.03)   39.10 (37.98)   38.90 (38.75)   38.30 (38.39)   38.70 (38.61)   38.90 (38.67)   39.10 (38.66)   39.10 (38.05)   39.10 (38.10)   39.10 (38.74)   38.90 (38.60)
**PERF: 38.91 (38.22)   38.71 (38.65)   39.31 (38.82)   39.31 (38.40)   39.11 (38.51)   38.91 (38.82)   39.31 (38.56)   39.31 (38.26)   39.11 (38.42)   38.51 (38.06)   38.51 (38.80)  39.31 (38.94)    39.31 (38.42)   39.11 (39.02)   37.71 (38.41)   39.31 (39.10)   39.31 (38.26)   39.31 (38.77)   39.31 (38.51)   39.31 (38.55)   39.11 (38.44)   39.31 (38.98)   39.11 (38.69)   39.31 (38.90)   39.11 (38.85)   39.31 (38.93)   39.31 (38.56)   39.31 (38.59)   39.31 (38.97)   39.31 (38.89)
**PERF: 37.56 (38.03)   38.15 (38.50)   38.35 (38.68)   38.35 (38.38)   37.76 (38.29)   38.15 (38.62)   38.35 (38.50)   37.56 (38.06)   38.15 (38.35)   37.76 (37.97)   37.96 (38.55)  38.35 (38.77)    38.35 (38.40)   37.56 (38.59)   38.35 (38.39)   37.96 (38.77)   36.96 (37.88)   38.35 (38.65)   38.15 (38.41)   38.35 (38.49)   38.35 (38.41)   38.35 (38.80)   37.96 (38.47)   37.96 (38.62)   37.56 (38.47)   37.56 (38.53)   38.15 (38.44)   38.35 (38.52)   38.35 (38.79)   38.35 (38.73)
**PERF: 40.71 (38.63)   40.31 (38.91)   40.51 (39.09)   40.90 (38.95)   39.91 (38.65)   40.90 (39.14)   40.90 (39.04)   40.51 (38.60)   40.71 (38.87)   40.51 (38.54)   40.71 (39.04)  40.90 (39.25)    40.71 (38.92)   40.90 (39.11)   40.90 (38.96)   40.90 (39.25)   40.90 (38.56)   40.90 (39.15)   40.11 (38.79)   40.90 (39.03)   40.90 (38.97)   40.90 (39.27)   40.90 (39.02)   40.90 (39.14)   39.51 (38.71)   40.90 (39.06)   40.51 (38.90)   40.71 (39.01)   40.90 (39.27)   40.90 (39.22)
**PERF: 39.46 (38.78)   39.26 (38.97)   39.46 (39.16)   39.26 (39.00)   39.26 (38.76)   39.26 (39.16)   39.06 (39.04)   39.46 (38.76)   39.46 (38.98)   39.26 (38.67)   39.46 (39.12)  39.46 (39.29)    39.26 (38.98)   39.26 (39.14)   39.26 (39.01)   38.65 (39.14)   38.45 (38.54)   39.46 (39.21)   39.46 (38.91)   39.46 (39.11)   39.26 (39.03)   39.26 (39.27)   39.46 (39.10)   39.26 (39.16)   39.26 (38.81)   39.26 (39.10)   39.06 (38.93)   39.46 (39.09)   39.06 (39.23)   39.26 (39.23)
**PERF: 39.04 (38.82)   38.84 (38.95)   38.84 (39.11)   38.84 (38.98)   38.84 (38.77)   38.64 (39.08)   39.04 (39.04)   39.04 (38.80)   39.04 (38.99)   39.04 (38.73)   38.64 (39.04)  39.04 (39.25)    38.44 (38.90)   39.04 (39.13)   38.84 (38.99)   38.44 (39.03)   39.04 (38.62)   39.04 (39.18)   38.84 (38.90)   38.84 (39.07)   37.84 (38.84)   39.04 (39.24)   39.04 (39.09)   39.04 (39.14)   38.64 (38.78)   38.64 (39.03)   39.04 (38.95)   38.84 (39.05)   38.64 (39.14)   38.24 (39.08)
** INFO: <bus_callback:186>: Received EOS. Exiting ...
 
Quitting
App run successful

The output video file, out.mp4, is under the current folder and can be played after download.

Extending the architecture further, you can make use of AWS Batch to execute an event-driven pipeline.

Here, the input file from S3 triggers an Amazon CloudWatch event, standing up a G4 instance with a DeepStream Docker image, sourced in Amazon ECR, to process the pipeline. The video and ML analytics results can be pushed back to S3 for further processing.

Conclusion

With this basic architecture in place, you can execute a video analytics and ML inferencing pipeline. Future work can also include integration with Kinesis and cataloging DeepStream results. Let us know how it goes working with DeepStream and the NVIDIA container stack on AWS.

 

from AWS Compute Blog