aws_sagemaker_remote.training package

Submodules

aws_sagemaker_remote.training.args module

aws_sagemaker_remote.training.args.CHECKPOINT_LOCAL_PATH = '/opt/ml/checkpoints'

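Rendered alongside this constant is the following code for a sagemaker_env_args(config) helper. It collects the SageMaker output, model, and channel directories from environment variables:

import os


def sagemaker_env_args(config):
    args = {}
    output_dir = os.environ.get('SM_OUTPUT_DIR', None)
    if output_dir:
        args['output_dir'] = output_dir
    model_dir = os.environ.get('SM_MODEL_DIR', None)
    if model_dir:
        args['model_dir'] = model_dir
    # SageMaker sets one SM_CHANNEL_<NAME> variable per input channel
    for channel in config.inputs.keys():
        env_key = 'SM_CHANNEL_{}'.format(channel.upper())
        channel_dir = os.environ.get(env_key, None)
        if channel_dir:
            args[channel] = channel_dir
    return args
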
aws_sagemaker_remote.training.args.is_sagemaker()
aws_sagemaker_remote.training.args.sagemaker_env_arg()

Check for the SM_TRAINING_ENV environment variable and return its contents as an object if it exists.

aws_sagemaker_remote.training.args.sagemaker_env_args(args: argparse.Namespace, config: aws_sagemaker_remote.training.config.SageMakerTrainingConfig)

Check for the SM_TRAINING_ENV environment variable and use it to override the parsed arguments.
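A minimal sketch of the override step follows. It assumes SM_TRAINING_ENV holds the JSON object written by the SageMaker training toolkit, with model_dir, output_dir and channel_input_dirs keys. The helper names read_training_env and override_args are hypothetical and do not reproduce this package's implementation:

```python
import argparse
import json
import os


def read_training_env():
    """Return the parsed SM_TRAINING_ENV JSON object, or None if it is not set."""
    raw = os.environ.get("SM_TRAINING_ENV")
    return json.loads(raw) if raw else None


def override_args(args: argparse.Namespace, channel_names) -> argparse.Namespace:
    """Hypothetical sketch: point directory arguments at SageMaker locations.

    Assumes SM_TRAINING_ENV contains "model_dir", "output_dir" and
    "channel_input_dirs" keys, as written by the SageMaker training toolkit.
    """
    env = read_training_env()
    if env is None:
        # Running locally: leave the parsed arguments untouched.
        return args
    if env.get("model_dir"):
        args.model_dir = env["model_dir"]
    if env.get("output_dir"):
        args.output_dir = env["output_dir"]
    channel_dirs = env.get("channel_input_dirs", {})
    for name in channel_names:
        # Each input channel is replaced by its mount point inside the container.
        if name in channel_dirs:
            setattr(args, name, channel_dirs[name])
    return args
```

When the variable is absent, as on a local run, the namespace comes back unchanged, so the same script can run in both environments.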

aws_sagemaker_remote.training.args.sagemaker_training_args(parser: argparse.ArgumentParser, script, source='', base_job_name='training-job', job_name='', profile='default', run=False, wait=True, inputs=None, dependencies=None, additional_arguments=None, argparse_callback=None, model_dir='output/model', output_dir='output/output', checkpoint_dir='output/checkpoint', checkpoint_s3='default', checkpoint_container='/opt/ml/checkpoints', checkpoint_initial=None, training_image='aws-sagemaker-remote-training:latest', training_image_path='<aws_sagemaker_remote install directory>/ecr/training', training_image_accounts=['763104351884'], training_instance='ml.m5.large', training_role='aws-sagemaker-remote-training-role', enable_sagemaker=True, experiment_name=None, trial_name=None, spot_instances=False, volume_size=30, max_run=43200, max_wait=86400, env=None, workers=2, output_json=None)

Configure argparse.ArgumentParser for training scripts.

Parameters:
  • parser (argparse.ArgumentParser) – Parser to configure
  • script (str) – Path to script file to execute. Set default for --sagemaker-script
  • source (str, optional) – Path of source directory to upload. Must include script path. Defaults to directory containing script if not provided.
  • base_job_name (str, optional) – Job name will be generated from base_job_name and a timestamp if job_name is not provided. Set default for --sagemaker-base-job-name.
  • job_name (str, optional) – Job name is used for tracking and organization. Generated from base_job_name if not provided. Use base_job_name and leave job_name blank for most use-cases. Set default for --sagemaker-job-name.
  • profile (str, optional) – AWS profile to use for session. Set default for --sagemaker-profile.
  • run (bool, optional) – Run on SageMaker. Set default for --sagemaker-run.
  • wait (bool, optional) – Wait for the SageMaker training job to complete. Set default for --sagemaker-wait.
  • inputs (dict(str,str), optional) – Dictionary of input arguments. For each key and value, create an argument --key that defaults to value.
    • Running locally, input arguments are unmodified.
    • Running remotely, inputs are set to the appropriate SageMaker mount points. Local inputs are uploaded automatically.
  • dependencies (dict(str, str)) – Dictionary of modules. For each key and value, create an argument --module-key that defaults to value. This controls the path of a dependency of your code: the files at the given path are uploaded to S3, downloaded to SageMaker, and put on PYTHONPATH.
  • additional_arguments (list, optional) – List of tuples, each holding the positional and keyword arguments for one call to argparse.ArgumentParser.add_argument. Use to add additional arguments to the script.
  • argparse_callback (function, optional) – Function that accepts a single argument, parser (argparse.ArgumentParser), and adds arguments to it. Use to add additional arguments to the script.
  • model_dir (string, optional) – Directory to save trained inference model. Set default for --model-dir.
  • output_dir (string, optional) – Directory to save outputs (images, logs, etc.). Set default for --output-dir.
  • checkpoint_dir (string, optional) – Directory to save checkpoints for saving and resuming training. Set default for --checkpoint-dir.
  • checkpoint_s3 (string, optional) – S3 storage for checkpoints for saving and resuming training or “default”. Set default for --sagemaker-checkpoint-s3.
  • checkpoint_container (string, optional) – Local directory for checkpoints when running remotely. Set default for --sagemaker-checkpoint-container.
  • training_image (str, optional) – URI of ECR or DockerHub Docker image to use for training. Set default for --sagemaker-training-image.
  • training_instance (str, optional) – Type of instance to use for training (e.g., ml.m5.large). Set default for --sagemaker-training-instance.
  • training_role (str, optional) – AWS IAM role name to use for training. Will be created if it does not exist. Set default for --sagemaker-training-role.
  • experiment_name (str, optional) – Name of experiment. Required if trial_name is provided. Set default for --sagemaker-experiment-name.
  • trial_name (str, optional) – Name of trial within experiment. Set default for --sagemaker-trial-name.
  • enable_sagemaker (bool, optional) –
    • True: Include SageMaker command-line options.
    • False: Only include local command-line options.
  • max_run (int, optional) – Maximum training time in seconds.
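  • max_wait (int, optional) – Maximum time in seconds to wait for a spot training job to complete, including time spent waiting for spot capacity. Only relevant when spot_instances is enabled; must be at least max_run.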
  • workers (int, optional) – Number of workers
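The sketch below illustrates how inputs, dependencies, additional_arguments and argparse_callback could translate into plain argparse calls. configure_parser_sketch is a hypothetical stand-in, not sagemaker_training_args itself, and it assumes each additional_arguments entry is a (positional args, keyword args) pair:

```python
import argparse


def configure_parser_sketch(parser, inputs=None, dependencies=None,
                            additional_arguments=None, argparse_callback=None,
                            model_dir="output/model", output_dir="output/output"):
    """Hypothetical sketch of how the documented options map onto argparse."""
    parser.add_argument("--model-dir", default=model_dir)
    parser.add_argument("--output-dir", default=output_dir)
    # inputs: each key becomes --key, defaulting to the given path.
    for key, value in (inputs or {}).items():
        parser.add_argument(f"--{key}", default=value)
    # dependencies: each key becomes --module-key, defaulting to the given path.
    for key, value in (dependencies or {}).items():
        parser.add_argument(f"--module-{key}", default=value)
    # additional_arguments: (positional args, keyword args) pairs for add_argument.
    for arg_args, arg_kwargs in (additional_arguments or []):
        parser.add_argument(*arg_args, **arg_kwargs)
    # argparse_callback: receives the parser and adds its own arguments.
    if argparse_callback is not None:
        argparse_callback(parser)
    return parser


def add_learning_rate(parser):
    parser.add_argument("--learning-rate", type=float, default=1e-3)


parser = configure_parser_sketch(
    argparse.ArgumentParser(),
    inputs={"input": "data/train"},
    dependencies={"mylib": "../mylib"},
    additional_arguments=[(["--epochs"], {"type": int, "default": 10})],
    argparse_callback=add_learning_rate,
)
args = parser.parse_args(["--epochs", "5"])
# args.input == "data/train", args.module_mylib == "../mylib", args.epochs == 5
```

Both extension points end up as ordinary add_argument calls; the callback form is handy when arguments depend on logic, while the tuple list suits a fixed set of flags.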
aws_sagemaker_remote.training.args.sagemaker_training_channel_args(parser: argparse.ArgumentParser, inputs)
aws_sagemaker_remote.training.args.sagemaker_training_checkpoint_args(parser: argparse.ArgumentParser, checkpoint_dir, checkpoint_initial=None, checkpoint_s3='default', checkpoint_container='/opt/ml/checkpoints', enable_sagemaker=True)
aws_sagemaker_remote.training.args.sagemaker_training_dependency_args(parser: argparse.ArgumentParser, dependencies)
aws_sagemaker_remote.training.args.sagemaker_training_model_args(parser: argparse.ArgumentParser, model_dir='model')
aws_sagemaker_remote.training.args.sagemaker_training_output_args(parser: argparse.ArgumentParser, output_dir)
aws_sagemaker_remote.training.args.sagemaker_training_parser_for_docs()
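Assuming the job is configured with SageMaker checkpointing, files written under checkpoint_container (default /opt/ml/checkpoints) on the remote instance are synced with the S3 checkpoint location, so a script only needs to read and write its checkpoint directory. A minimal save/resume sketch follows; the checkpoint-<epoch>.json naming and JSON state are hypothetical conventions (real training would store model weights):

```python
import json
import os
import re

# Hypothetical naming scheme: one file per completed epoch.
CHECKPOINT_PATTERN = re.compile(r"checkpoint-(\d+)\.json$")


def latest_checkpoint(checkpoint_dir):
    """Return (epoch, path) of the newest checkpoint in checkpoint_dir, or None."""
    if not os.path.isdir(checkpoint_dir):
        return None
    found = []
    for name in os.listdir(checkpoint_dir):
        match = CHECKPOINT_PATTERN.match(name)
        if match:
            found.append((int(match.group(1)), os.path.join(checkpoint_dir, name)))
    return max(found) if found else None


def save_checkpoint(checkpoint_dir, epoch, state):
    """Write the training state for one epoch and return the file path."""
    os.makedirs(checkpoint_dir, exist_ok=True)
    path = os.path.join(checkpoint_dir, f"checkpoint-{epoch}.json")
    with open(path, "w") as f:
        json.dump(state, f)
    return path


def train(checkpoint_dir, epochs):
    """Toy loop: resume from the newest checkpoint, then save one per epoch."""
    start, state = 0, {"step": 0}
    latest = latest_checkpoint(checkpoint_dir)
    if latest is not None:
        start = latest[0] + 1
        with open(latest[1]) as f:
            state = json.load(f)
    for epoch in range(start, epochs):
        state["step"] += 1  # stand-in for real training work
        save_checkpoint(checkpoint_dir, epoch, state)
    return state
```

Calling train(args.checkpoint_dir, n) works unchanged locally (output/checkpoint by default) and remotely, where an interrupted spot job resumes from the last synced epoch.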

aws_sagemaker_remote.training.channels module

aws_sagemaker_remote.training.channels.expand_folder_channels(channels, session)
aws_sagemaker_remote.training.channels.expand_list_channels(channels)
aws_sagemaker_remote.training.channels.expand_repeated_channels(channels)
aws_sagemaker_remote.training.channels.parse_channel_arguments(channels, session)
aws_sagemaker_remote.training.channels.process_channels(channels, args, session, prefix)
aws_sagemaker_remote.training.channels.read_channel_arguments(channels, args)
aws_sagemaker_remote.training.channels.remove_empty_channels(channels)
aws_sagemaker_remote.training.channels.set_suffixes(channels, session, hyperparameters)
aws_sagemaker_remote.training.channels.standardize_channel(channel)
aws_sagemaker_remote.training.channels.standardize_channels(channels)
aws_sagemaker_remote.training.channels.upload_local_channel(channel, session, s3_uri)
aws_sagemaker_remote.training.channels.upload_local_channels(channels, session, prefix)
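The channel helpers above are listed without descriptions. As a rough illustration of the behavior described for inputs (local inputs uploaded automatically, remote inputs mounted by SageMaker), the hypothetical functions below split channels into existing S3 URIs and local paths that would need uploading, skipping empty channels. They are not this module's implementation; only the mount path /opt/ml/input/data/<channel> is SageMaker's standard File-mode location:

```python
from typing import Dict, Tuple


def plan_channel_uploads(channels: Dict[str, str],
                         prefix: str) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Hypothetical sketch: resolve every channel to an S3 URI.

    Returns (resolved, uploads): resolved maps channel name to S3 URI, and
    uploads maps each local path to the S3 URI it would be uploaded to.
    """
    resolved, uploads = {}, {}
    for name, path in channels.items():
        if not path:
            continue  # drop empty channels
        if path.startswith("s3://"):
            resolved[name] = path  # already in S3, use as-is
        else:
            s3_uri = f"{prefix.rstrip('/')}/{name}"
            uploads[path] = s3_uri
            resolved[name] = s3_uri
    return resolved, uploads


def container_paths(channels: Dict[str, str]) -> Dict[str, str]:
    """SageMaker mounts each File-mode channel at /opt/ml/input/data/<name>."""
    return {name: f"/opt/ml/input/data/{name}" for name in channels}
```

Inside the container, each mount point is also exposed as SM_CHANNEL_<NAME>, which is what the sagemaker_env_args code in the args module reads.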

aws_sagemaker_remote.training.config module

class aws_sagemaker_remote.training.config.SageMakerTrainingConfig(inputs=None, dependencies=None, env=None)

Bases: object

aws_sagemaker_remote.training.experiment module

aws_sagemaker_remote.training.experiment.ensure_experiment(client, experiment_name)

aws_sagemaker_remote.training.iam module

aws_sagemaker_remote.training.iam.ensure_training_role(iam, role_name)
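Because the training_role is created if it does not exist, a helper like ensure_training_role presumably follows a get-or-create pattern. A sketch against the boto3 IAM client (get_role, create_role, attach_role_policy); the function name ensure_role_sketch and the attached AmazonSageMakerFullAccess policy are assumptions, not the package's documented behavior:

```python
import json

# Trust policy allowing the SageMaker service to assume the role.
SAGEMAKER_TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "sagemaker.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}

# Assumption: which managed policies the real helper attaches is not documented.
DEFAULT_POLICY_ARNS = ("arn:aws:iam::aws:policy/AmazonSageMakerFullAccess",)


def ensure_role_sketch(iam, role_name, policy_arns=DEFAULT_POLICY_ARNS):
    """Return the role's ARN, creating the role first if it does not exist.

    `iam` is a boto3 IAM client, e.g. boto3.client("iam").
    """
    try:
        return iam.get_role(RoleName=role_name)["Role"]["Arn"]
    except iam.exceptions.NoSuchEntityException:
        pass  # role is missing: fall through and create it
    role = iam.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(SAGEMAKER_TRUST_POLICY),
    )
    for arn in policy_arns:
        iam.attach_role_policy(RoleName=role_name, PolicyArn=arn)
    return role["Role"]["Arn"]
```

Checking with get_role first keeps repeated runs idempotent: policies are attached only when the role is created.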

aws_sagemaker_remote.training.main module

class aws_sagemaker_remote.training.main.TrainingCommand(main, script=None, help=None, metrics=None, **training_args)

Bases: aws_sagemaker_remote.commands.Command

configure(parser: argparse.ArgumentParser)
run(args)
aws_sagemaker_remote.training.main.sagemaker_training_handle(args, config, main, metrics=None)
aws_sagemaker_remote.training.main.sagemaker_training_main(main, script=None, script_fn=None, description=None, metrics=None, **training_args)

Entry point for training.

Example

from aws_sagemaker_remote.training.main import sagemaker_training_main

def main(args):
    # your code here
    pass

if __name__ == '__main__':
    sagemaker_training_main(
        main=main,
        # ... additional configuration
    )
Parameters:
  • main (function) – Main function. Must accept a single argument args (argparse.Namespace)
  • script (str, optional) – Path to script file to execute. Set to __file__ for most use-cases. If empty or None, defaults to the file containing main. If an object is passed, the file containing that object is used.
  • description (str, optional) – Script description for argparse
  • metrics (dict, optional) – Metrics to record. Dictionary of metric name (str) to RegEx that extracts metric (str). See SageMaker Training Metrics Docs
  • **training_args (dict, optional) – Keyword arguments to aws_sagemaker_remote.training.args.sagemaker_training_args()
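The metrics argument maps metric names to regular expressions that SageMaker applies to the job's log output, taking the value from the first capture group. A small sketch for checking such patterns locally before launching a job; the metric names and log format are hypothetical:

```python
import re

# Hypothetical metric names and log format. Each regex has exactly one capture
# group holding the numeric value.
METRICS = {
    "train:loss": r"train_loss=([0-9.]+)",
    "validation:accuracy": r"val_acc=([0-9.]+)",
}


def extract_metrics(log_line, metrics=METRICS):
    """Apply each metric regex to one log line and return the parsed values."""
    values = {}
    for name, pattern in metrics.items():
        match = re.search(pattern, log_line)
        if match:
            values[name] = float(match.group(1))
    return values


# main() would print lines like this one so the regexes can find them.
print(extract_metrics("epoch=3 train_loss=0.4213 val_acc=0.87"))
# prints {'train:loss': 0.4213, 'validation:accuracy': 0.87}
```

The same dictionary could then be passed as metrics=METRICS to sagemaker_training_main, with main printing lines in the matching format.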

aws_sagemaker_remote.training.train module

aws_sagemaker_remote.training.train.sagemaker_training_run(args, config: aws_sagemaker_remote.training.config.SageMakerTrainingConfig, metrics=None)

aws_sagemaker_remote.training.training_inputs module

aws_sagemaker_remote.training.training_inputs.build_training_input(channel, i, args)
aws_sagemaker_remote.training.training_inputs.build_training_inputs(channels, args)

Module contents