SageMaker Batch Transform¶
SageMaker Batch Transform creates a fleet of containers to run parallel processing on objects in S3. Batch Transform is best used when you need a custom image or to load large objects into memory (e.g., batch machine learning).
- If the process is not parallel across files, use SageMaker Processing, which allocates a machine and makes S3 files available locally for Python processing (see the SageMaker Processing documentation)
- If the process can be run as a small JavaScript package, processing can be performed faster, cheaper, and with better parallelization using S3 Batch (see the S3 Batch documentation)
Usage¶
Create a SageMaker model, which consists of:
- GZip file containing code
- ECR docker image URI
You can create a model:
- Automatically, as the output of any aws-sagemaker-remote training job
- Manually, by uploading a GZip containing your code, building an ECR image, and running aws-sagemaker-remote model create
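For the manual route, the same model can be registered directly with boto3. The sketch below only assembles the CreateModel request so the pieces are visible; the image URI, S3 path, role ARN, and the helper name `build_create_model_request` are placeholders, not real resources.

```python
# Sketch: registering a SageMaker Model manually with boto3, as an
# alternative to `aws-sagemaker-remote model create`. All identifiers
# below are illustrative placeholders.

def build_create_model_request(model_name, image_uri, code_s3_uri, role_arn):
    """Assemble a boto3 CreateModel request: an ECR docker image
    plus a GZip of code stored on S3."""
    return {
        "ModelName": model_name,
        "PrimaryContainer": {
            "Image": image_uri,           # ECR docker image URI
            "ModelDataUrl": code_s3_uri,  # GZip file containing code
        },
        "ExecutionRoleArn": role_arn,
    }

request = build_create_model_request(
    model_name="my-model",
    image_uri="123456789012.dkr.ecr.us-east-1.amazonaws.com/my-image:latest",
    code_s3_uri="s3://my-bucket/models/my-model/model.tar.gz",
    role_arn="arn:aws:iam::123456789012:role/MySageMakerRole",
)

# To actually create the model (requires AWS credentials):
# import boto3
# boto3.client("sagemaker").create_model(**request)
```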
You can create a fleet of containers running your model from the command line.
- You define the number of instances and the instance type
- Each file is posted to your model using the Accept (output) and Content-Type (input) MIME types you specify
- Each response from your model is saved to S3 with the extension .out
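The .out naming convention can be sketched as a pure function mapping an input object key to the key where its result lands under the output prefix. `output_key_for` is a hypothetical helper for illustration, not part of the tool.

```python
# Sketch: how Batch Transform names its outputs. Each response is
# written under the output prefix with ".out" appended to the object
# name. `output_key_for` is an illustrative helper, not a real API.

def output_key_for(input_key, input_prefix, output_prefix):
    """Map an input S3 key to the key where its transform result lands."""
    relative = input_key[len(input_prefix):].lstrip("/")
    return f"{output_prefix.rstrip('/')}/{relative}.out"

print(output_key_for(
    "batch/images/cat.jpg",
    input_prefix="batch/images",
    output_prefix="batch/results",
))
# → batch/results/cat.jpg.out
```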
Command-Line Interface¶
The command aws-sagemaker-remote transform create will start a job.
Run aws-sagemaker-remote transform create --help for help.
aws-sagemaker-remote¶
Set of utilities for managing AWS training, processing, and more.
aws-sagemaker-remote [OPTIONS] COMMAND [ARGS]...
Options

--profile <profile>¶
AWS profile. Run aws configure to configure a profile.
transform¶
SageMaker batch transform commands
aws-sagemaker-remote transform [OPTIONS] COMMAND [ARGS]...
create¶
Create a batch transformation job for objects in S3
- Model must already exist in SageMaker
- Model instances are deployed
- Each S3 object is posted to one of your instances
- Results are saved in S3 with the extension “.out”
- Model instances are destroyed
aws-sagemaker-remote transform create [OPTIONS]
Options

--base-job-name <base_job_name>¶
Transform job base name. If a job name is not provided, the job name is the base job name plus a timestamp.

--job-name <job_name>¶
Transform job name for tracking in the AWS console

--model-name <model_name>¶
Required SageMaker Model name

--concurrency <concurrency>¶
Concurrency (number of concurrent requests to each container)

--timeout <timeout>¶
Timeout in seconds per request

--retries <retries>¶
Number of retries for each failed request

--input-s3 <input_s3>¶
Required Input path on S3

--output-s3 <output_s3>¶
Required Output path on S3

--input-type <input_type>¶
Required Input MIME type ("Content-Type" header)

--output-type <output_type>¶
Required Output MIME type ("Accept" header)

--output-json <output_json>¶
Save job information in a JSON file

--instance-type <instance_type>¶
SageMaker instance type (e.g., ml.m5.large)

--instance-count <instance_count>¶
Number of containers to use (processing will be distributed)

--payload-mb <payload_mb>¶
Maximum payload size (MB)
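Putting the options above together, a full invocation might look like the following. The bucket, model name, MIME types, and instance settings are placeholder values, not recommendations.

```shell
# Sketch: start a batch transform job against an existing SageMaker
# model. All values below are illustrative placeholders.
aws-sagemaker-remote transform create \
  --model-name my-model \
  --input-s3 s3://my-bucket/inputs/ \
  --output-s3 s3://my-bucket/outputs/ \
  --input-type image/jpeg \
  --output-type application/json \
  --instance-type ml.m5.large \
  --instance-count 2 \
  --output-json job.json
```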
See CLI documentation.