SageMaker Batch Transform

SageMaker Batch Transform creates a fleet of containers to run parallel processing on objects in S3. Batch Transform is best used when you need a custom image or to load large objects into memory (e.g., batch machine learning).

  • If the process is not parallel across files, use SageMaker Processing, which allocates a machine and makes S3 files available locally for Python processing. See the SageMaker Processing documentation.
  • If the process fits in a small JavaScript package, it can be run faster, cheaper, and with better parallelization using S3 Batch. See the S3 Batch documentation.

Usage

Create a SageMaker model, which consists of:

  • GZip file containing code
  • ECR docker image URI

You can create a model:

  • Automatically as the output of any aws-sagemaker-remote training job
  • Manually by uploading a GZip containing your code, building an ECR image, and running aws-sagemaker-remote model create

You can create a fleet of containers running your model from the command line.

  • You define the number of instances and the instance type
  • Each file is posted to your model using the Accept (output) and Content-Type (input) MIME types you specify
  • Each response from your model is saved to S3 with the extension .out
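The .out naming convention above can be sketched as follows; the function name and prefix values are illustrative, not part of aws-sagemaker-remote:

```python
# Illustrative sketch: each input object's response is written under the
# output prefix with ".out" appended to the file name.
import posixpath

def output_key(input_key: str, output_prefix: str) -> str:
    """Map an input S3 key to the key where Batch Transform saves its result."""
    filename = posixpath.basename(input_key)
    return posixpath.join(output_prefix, filename + ".out")

print(output_key("data/images/cat.png", "results/run-1"))
# results/run-1/cat.png.out
```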

Command-Line Interface

The command aws-sagemaker-remote transform create will start a job.

Run aws-sagemaker-remote transform create --help for help.

aws-sagemaker-remote

Set of utilities for managing AWS training, processing, and more.

aws-sagemaker-remote [OPTIONS] COMMAND [ARGS]...

Options

--profile <profile>

AWS profile. Run aws configure to configure a profile.

transform

SageMaker batch transform commands

aws-sagemaker-remote transform [OPTIONS] COMMAND [ARGS]...

create

Create a batch transformation job for objects in S3

  • Model must already exist in SageMaker
  • Model instances are deployed
  • Each S3 object is posted to one of your instances
  • Results are saved in S3 with the extension .out
  • Model instances are destroyed

aws-sagemaker-remote transform create [OPTIONS]

Options

--base-job-name <base_job_name>

Transform job base name. If --job-name is not provided, the job name is the base job name plus a timestamp.

--job-name <job_name>

Transform job name for tracking in AWS console

--model-name <model_name>

Required SageMaker Model name

--concurrency <concurrency>

Concurrency (number of concurrent requests to each container)

--timeout <timeout>

Timeout in seconds per request

--retries <retries>

Number of retries for each failed request

--input-s3 <input_s3>

Required Input path on S3

--output-s3 <output_s3>

Required Output path on S3

--input-type <input_type>

Required Input MIME type (“Content-Type” header)

--output-type <output_type>

Required Output MIME type (“Accept” header)

--output-json <output_json>

Save job information in JSON file

--instance-type <instance_type>

SageMaker Instance type (e.g., ml.m5.large)

--instance-count <instance_count>

Number of containers to use (processing will be distributed)

--payload-mb <payload_mb>

Maximum payload size (MB)
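Putting the options above together, an invocation might look like the following; the model name, bucket paths, MIME types, and instance settings are placeholders to substitute with your own values:

```shell
aws-sagemaker-remote transform create \
  --model-name my-model \
  --input-s3 s3://my-bucket/inputs/ \
  --output-s3 s3://my-bucket/outputs/ \
  --input-type image/png \
  --output-type application/json \
  --instance-type ml.m5.large \
  --instance-count 2 \
  --output-json transform-job.json
```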

See CLI documentation.