SageMaker Batch Transform

SageMaker Batch Transform creates a fleet of containers to run parallel processing on objects in S3. Batch Transform is best used when you need a custom image or to load large objects into memory (e.g., batch machine learning).

  • If the process is not parallel across files, use SageMaker Processing, which allocates a machine and makes S3 files available locally for Python processing. See the SageMaker Processing documentation.
  • If the process fits in a small JavaScript package, it can be run faster, cheaper, and with better parallelization using S3 Batch. See the S3 Batch documentation.

Usage

Create a SageMaker model, which consists of:

  • GZip file containing code
  • ECR docker image URI

You can create a model:

  • Automatically as the output of any aws-sagemaker-remote training job
  • Manually by uploading a GZip containing your code, building an ECR image, and running aws-sagemaker-remote model create

You can create a fleet of containers running your model from the command line.

  • You define the number of instances and the instance type
  • Each file is posted to your model using the Accept (output) and Content-Type (input) MIME types you specify
  • Each response from your model is saved to S3 with the extension .out
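The .out naming convention above can be sketched as follows; the function name and prefix values are illustrative, not part of aws-sagemaker-remote:

```python
# Illustrative sketch: each input object's response is written under the
# output prefix with ".out" appended to the file name.
import posixpath

def output_key(input_key: str, output_prefix: str) -> str:
    """Map an input S3 key to the key where Batch Transform saves its result."""
    filename = posixpath.basename(input_key)
    return posixpath.join(output_prefix, filename + ".out")

print(output_key("data/images/cat.png", "results/run-1"))
# results/run-1/cat.png.out
```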

Command-Line Interface

The command aws-sagemaker-remote transform create will start a job.

Run aws-sagemaker-remote transform create --help for help.

aws-sagemaker-remote

Set of utilities for managing AWS training, processing, and more.

aws-sagemaker-remote [OPTIONS] COMMAND [ARGS]...

Options

--profile <profile>

AWS profile. Run aws configure to configure a profile.

transform

SageMaker batch transform commands

aws-sagemaker-remote transform [OPTIONS] COMMAND [ARGS]...

create

Create a batch transformation job for objects in S3

  • Model must already exist in SageMaker
  • Model instances are deployed
  • Each S3 object is posted to one of your instances
  • Results are saved in S3 with the extension .out
  • Model instances are destroyed

aws-sagemaker-remote transform create [OPTIONS]

Options

--base-job-name <base_job_name>

Transform job base name. If --job-name is not provided, the job name is the base job name plus a timestamp.

--job-name <job_name>

Transform job name for tracking in AWS console

--model-name <model_name>

Required SageMaker Model name

--concurrency <concurrency>

Concurrency (number of concurrent requests to each container)

--timeout <timeout>

Timeout in seconds per request

--retries <retries>

Number of retries for each failed request

--input-s3 <input_s3>

Required Input path on S3

--output-s3 <output_s3>

Required Output path on S3

--input-type <input_type>

Required Input MIME type (“Content-Type” header)

--output-type <output_type>

Required Output MIME type (“Accept” header)

--output-json <output_json>

Save job information in JSON file

--instance-type <instance_type>

SageMaker Instance type (e.g., ml.m5.large)

--instance-count <instance_count>

Number of containers to use (processing will be distributed)

--payload-mb <payload_mb>

Maximum payload size (MB)
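Putting the options above together, an invocation might look like the following; the model name, bucket paths, MIME types, and instance settings are placeholders to substitute with your own values:

```shell
aws-sagemaker-remote transform create \
  --model-name my-model \
  --input-s3 s3://my-bucket/inputs/ \
  --output-s3 s3://my-bucket/outputs/ \
  --input-type image/png \
  --output-type application/json \
  --instance-type ml.m5.large \
  --instance-count 2 \
  --output-json transform-job.json
```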

See CLI documentation.