Title: R6sagemaker lower-level API calls
Description: `R6sagemaker` lower-level API calls.
Authors: Dyfan Jones [aut, cre], Amazon.com, Inc. [cph]
Maintainer: Dyfan Jones <[email protected]>
License: Apache License (>= 2.0)
Version: 0.2.0.9000
Built: 2024-11-15 03:52:05 UTC
Source: https://github.com/DyfanJones/sagemaker-r-common
Maintainer: Dyfan Jones [email protected]
Other contributors:
Amazon.com, Inc. [copyright holder]
Base class for tuning job or training job analytics classes. Provides common functionality such as persistence and caching.
new()
Initialize an `AnalyticsMetricsBase` instance.
AnalyticsMetricsBase$new()
export_csv()
Persists the analytics dataframe to a file.
AnalyticsMetricsBase$export_csv(filename)
filename
(str): The name of the file to save to.
dataframe()
A dataframe with lots of interesting results about this object. Created by calling SageMaker List and Describe APIs and converting them into a convenient tabular summary.
AnalyticsMetricsBase$dataframe(force_refresh = FALSE)
force_refresh
(bool): Set to True to fetch the latest data from SageMaker API.
clear_cache()
Clear the object of all local caches of API methods, so that the next time any properties are accessed they will be refreshed from the service.
AnalyticsMetricsBase$clear_cache()
format()
format class
AnalyticsMetricsBase$format()
clone()
The objects of this class are cloneable with this method.
AnalyticsMetricsBase$clone(deep = FALSE)
deep
Whether to make a deep clone.
With this input, SQL queries will be executed using Athena to generate datasets to S3.
sagemaker.core::ApiObject -> AthenaDatasetDefinition
new()
Initialize AthenaDatasetDefinition.
AthenaDatasetDefinition$new( catalog = NULL, database = NULL, query_string = NULL, output_s3_uri = NULL, work_group = NULL, kms_key_id = NULL, output_format = NULL, output_compression = NULL )
catalog
(str, default=None): The name of the data catalog used in Athena query execution.
database
(str, default=None): The name of the database used in the Athena query execution.
query_string
(str, default=None): The SQL query statements to be executed.
output_s3_uri
(str, default=None): The location in Amazon S3 where Athena query results are stored.
work_group
(str, default=None): The name of the workgroup in which the Athena query is being started.
kms_key_id
(str, default=None): The AWS Key Management Service (AWS KMS) key that Amazon SageMaker uses to encrypt data generated from an Athena query execution.
output_format
(str, default=None): The data storage format for Athena query results. Valid options are "PARQUET", "ORC", "AVRO", "JSON", "TEXTFILE"
output_compression
(str, default=None): The compression used for Athena query results. Valid options are "GZIP", "SNAPPY", "ZLIB"
clone()
The objects of this class are cloneable with this method.
AthenaDatasetDefinition$clone(deep = FALSE)
deep
Whether to make a deep clone.
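For illustration, a minimal construction sketch (assuming `sagemaker.common` is attached; the catalog, database, query, and S3 URI below are placeholders):
library(sagemaker.common)
athena_dataset = AthenaDatasetDefinition$new(
  catalog = "AwsDataCatalog",                       # placeholder catalog name
  database = "my_database",                         # placeholder database
  query_string = "SELECT * FROM my_table",          # placeholder query
  output_s3_uri = "s3://my-bucket/athena/results/", # placeholder output location
  output_format = "PARQUET"
)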
Config object related to bias configurations of the input dataset.
analysis_config
Analysis config dictionary
new()
Initializes a configuration of the sensitive groups in the dataset.
BiasConfig$new( label_values_or_threshold, facet_name, facet_values_or_threshold = NULL, group_name = NULL )
label_values_or_threshold
(Any): List of label values or threshold to indicate positive outcome used for bias metrics.
facet_name
(str): Sensitive attribute in the input data for which we like to compare metrics.
facet_values_or_threshold
(list): Optional list of values to form a sensitive group or threshold for a numeric facet column that defines the lower bound of a sensitive group. Defaults to considering each possible value as sensitive group and computing metrics vs all the other examples.
group_name
(str): Optional column name or index to indicate a group column to be used for the bias metric 'Conditional Demographic Disparity in Labels - CDDL' or 'Conditional Demographic Disparity in Predicted Labels - CDDPL'.
get_config()
Returns part of an analysis config dictionary.
BiasConfig$get_config()
format()
format class
BiasConfig$format()
clone()
The objects of this class are cloneable with this method.
BiasConfig$clone(deep = FALSE)
deep
Whether to make a deep clone.
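A hedged sketch of a bias configuration for a binary label and a single sensitive facet (the facet name and values are placeholders):
bias_config = BiasConfig$new(
  label_values_or_threshold = list(1),   # label value indicating a positive outcome
  facet_name = "gender",                 # placeholder sensitive attribute
  facet_values_or_threshold = list(0)    # placeholder value defining the sensitive group
)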
Creates tensor collections for SageMaker Debugger
name
The name of the collection configuration.
parameters
The parameters for the collection configuration.
new()
Constructor for collection configuration.
CollectionConfig$new(name, parameters = NULL)
name
(str): Required. The name of the collection configuration.
parameters
(dict): Optional. The parameters for the collection configuration.
to_request_list()
Generate a request dictionary using the parameters provided when initializing the object.
CollectionConfig$to_request_list()
dict: A portion of an API request as a dictionary.
format()
format class
CollectionConfig$format()
clone()
The objects of this class are cloneable with this method.
CollectionConfig$clone(deep = FALSE)
deep
Whether to make a deep clone.
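As a brief sketch, a collection configuration and its request dictionary ('losses' and 'save_interval' follow the SageMaker Debugger built-in collection conventions):
collection_config = CollectionConfig$new(
  name = "losses",
  parameters = list(save_interval = "50")  # save tensors every 50 steps
)
collection_config$to_request_list()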
List the possible cProfile timers for Python profiling.
cProfileTimer
An object of class cProfileTimer (inherits from Enum, environment) of length 4: an environment containing [TOTAL_TIME, CPU_TIME, OFF_CPU_TIME, DEFAULT].
Config object related to configurations of the input and output dataset.
s3_data_input_path
Dataset S3 prefix/object URI.
s3_output_path
S3 prefix to store the output.
s3_analysis_config_output_path
S3 prefix to store the analysis_config output.
s3_data_distribution_type
Valid options are "FullyReplicated" or "ShardedByS3Key".
s3_compression_type
Valid options are "None" or "Gzip".
label
Target attribute of the model required by bias metrics
headers
A list of column names in the input dataset.
features
JSONPath for locating the feature columns
analysis_config
Analysis config dictionary
new()
Initializes a configuration of both input and output datasets.
DataConfig$new( s3_data_input_path, s3_output_path, s3_analysis_config_output_path = NULL, label = NULL, headers = NULL, features = NULL, dataset_type = c("text/csv", "application/jsonlines", "application/x-parquet", "application/x-image"), s3_data_distribution_type = "FullyReplicated", s3_compression_type = c("None", "Gzip"), joinsource = NULL )
s3_data_input_path
(str): Dataset S3 prefix/object URI.
s3_output_path
(str): S3 prefix to store the output.
s3_analysis_config_output_path
(str): S3 prefix to store the analysis_config output. If this field is NULL, then the s3_output_path will be used to store the analysis_config output.
label
(str): Target attribute of the model required by bias metrics (optional for SHAP) Specified as column name or index for CSV dataset, or as JSONPath for JSONLines.
headers
(list[str]): A list of column names in the input dataset.
features
(str): JSONPath for locating the feature columns for bias metrics if the dataset format is JSONLines.
dataset_type
(str): Format of the dataset. Valid values are "text/csv" for CSV, "application/jsonlines" for JSONLines, "application/x-parquet" for Parquet, and "application/x-image" for images.
s3_data_distribution_type
(str): Valid options are "FullyReplicated" or "ShardedByS3Key".
s3_compression_type
(str): Valid options are "None" or "Gzip".
joinsource
(str): The name or index of the column in the dataset that acts as an identifier column (for instance, while performing a join). This column is only used as an identifier, and not used for any other computations. This is an optional field in all cases except when the dataset contains more than one file, and 'save_local_shap_values' is set to true in SHAPConfig.
get_config()
Returns part of an analysis config dictionary.
DataConfig$get_config()
format()
format class
DataConfig$format()
clone()
The objects of this class are cloneable with this method.
DataConfig$clone(deep = FALSE)
deep
Whether to make a deep clone.
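A minimal sketch of a CSV dataset configuration (bucket, prefix, and column names are placeholders):
data_config = DataConfig$new(
  s3_data_input_path = "s3://my-bucket/input/dataset.csv",  # placeholder input
  s3_output_path = "s3://my-bucket/clarify-output/",        # placeholder output prefix
  label = "target",                                         # placeholder label column
  headers = c("feature_1", "feature_2", "target"),
  dataset_type = "text/csv"
)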
The configuration for framework metrics to be collected for data loader profiling.
sagemaker.common::MetricsConfigBase -> DataloaderProfilingConfig
new()
Specify target steps or a target duration to profile. By default, it profiles step 7 of training. If profile_default_steps is set to 'True' and none of the other range parameters is specified, the class uses the default config for dataloader profiling.
DataloaderProfilingConfig$new( start_step = NULL, num_steps = NULL, start_unix_time = NULL, duration = NULL, profile_default_steps = FALSE, metrics_regex = ".*" )
start_step
(int): The step to start profiling. The default is step 7.
num_steps
(int): The number of steps to profile. The default is 1 step.
start_unix_time
(int): The Unix time to start profiling.
duration
(float): The duration in seconds to profile.
profile_default_steps
(bool): Indicates whether the default config should be used.
metrics_regex
(str): A regex pattern used to filter the profiled dataloader metrics. The default is ".*".
clone()
The objects of this class are cloneable with this method.
DataloaderProfilingConfig$clone(deep = FALSE)
deep
Whether to make a deep clone.
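For example, a brief sketch that profiles two steps of dataloader metrics starting at step 7:
dataloader_config = DataloaderProfilingConfig$new(start_step = 7, num_steps = 2)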
DatasetDefinition input.
sagemaker.core::ApiObject -> DatasetDefinition
new()
Initialize DatasetDefinition.
DatasetDefinition$new( data_distribution_type = "ShardedByS3Key", input_mode = "File", local_path = NULL, redshift_dataset_definition = NULL, athena_dataset_definition = NULL )
data_distribution_type
(str, default="ShardedByS3Key"): Whether the generated dataset is FullyReplicated or ShardedByS3Key (default).
input_mode
(str, default="File"): Whether to use File or Pipe input mode. In File (default) mode, Amazon SageMaker copies the data from the input source onto the local Amazon Elastic Block Store (Amazon EBS) volumes before starting your training algorithm. This is the most commonly used input mode. In Pipe mode, Amazon SageMaker streams input data from the source directly to your algorithm without using the EBS volume.
local_path
(str, default=None): The local path where you want Amazon SageMaker to download the Dataset Definition inputs to run a processing job. LocalPath is an absolute path to the input data. This is a required parameter when 'AppManaged' is False (default).
redshift_dataset_definition
(:class:'sagemaker.common::RedshiftDatasetDefinition', default=None): Configuration for Redshift Dataset Definition input.
athena_dataset_definition
(:class:'sagemaker.common::AthenaDatasetDefinition', default=None): Configuration for Athena Dataset Definition input.
clone()
The objects of this class are cloneable with this method.
DatasetDefinition$clone(deep = FALSE)
deep
Whether to make a deep clone.
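A minimal sketch that wraps the Athena dataset definition from the earlier example (the container path is a placeholder):
dataset_def = DatasetDefinition$new(
  data_distribution_type = "FullyReplicated",
  input_mode = "File",
  local_path = "/opt/ml/processing/input/athena",  # placeholder container path
  athena_dataset_definition = athena_dataset       # from the AthenaDatasetDefinition sketch above
)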
DebuggerHookConfig provides options to customize how debugging information is emitted and saved. This high-level DebuggerHookConfig class runs based on the 'smdebug.SaveConfig' class (https://github.com/awslabs/sagemaker-debugger/blob/master/docs/api.md#saveconfig).
s3_output_path
The location in Amazon S3 to store the output tensors
container_local_output_path
The local path in the container
hook_parameters
A dictionary of parameters
collection_configs
A list of :class:'~sagemaker.debugger.CollectionConfig' objects
new()
Initialize the DebuggerHookConfig instance.
DebuggerHookConfig$new( s3_output_path = NULL, container_local_output_path = NULL, hook_parameters = NULL, collection_configs = NULL )
s3_output_path
(str): Optional. The location in Amazon S3 to store the output tensors. The default Debugger output path is created under the default output path of the :class:'~sagemaker.estimator.Estimator' class. For example, s3://sagemaker-<region>-<12digit_account_id>/<training-job-name>/debug-output/.
container_local_output_path
(str): Optional. The local path in the container.
hook_parameters
(dict): Optional. A dictionary of parameters.
collection_configs
([sagemaker.debugger.CollectionConfig]): Required. A list of :class:'~sagemaker.debugger.CollectionConfig' objects to be saved at the s3_output_path.
to_request_list()
Generate a request dictionary using the parameters provided when initializing the object.
DebuggerHookConfig$to_request_list()
dict: A portion of an API request as a dictionary.
format()
format class
DebuggerHookConfig$format()
clone()
The objects of this class are cloneable with this method.
DebuggerHookConfig$clone(deep = FALSE)
deep
Whether to make a deep clone.
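A minimal sketch that saves the built-in 'losses' collection to a placeholder S3 prefix:
hook_config = DebuggerHookConfig$new(
  s3_output_path = "s3://my-bucket/debug-output/",  # placeholder output location
  collection_configs = list(
    CollectionConfig$new(name = "losses", parameters = list(save_interval = "50"))
  )
)
hook_config$to_request_list()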
The configuration for framework metrics to be collected for detailed profiling.
sagemaker.common::MetricsConfigBase -> DetailedProfilingConfig
new()
Specify target steps or a target duration to profile. By default, it profiles step 5 of training. If profile_default_steps is set to 'True' and none of the other range parameters is specified, the class uses the default configuration for detailed profiling.
DetailedProfilingConfig$new( start_step = NULL, num_steps = NULL, start_unix_time = NULL, duration = NULL, profile_default_steps = FALSE )
start_step
(int): The step to start profiling. The default is step 5.
num_steps
(int): The number of steps to profile. The default is for 1 step.
start_unix_time
(int): The Unix time to start profiling.
duration
(float): The duration in seconds to profile.
profile_default_steps
(bool): Indicates whether the default config should be used.
clone()
The objects of this class are cloneable with this method.
DetailedProfilingConfig$clone(deep = FALSE)
deep
Whether to make a deep clone.
Accepts drift check baselines parameters for conversion to request dict.
new()
Initialize a `DriftCheckBaselines` instance and turn the parameters into a dict.
DriftCheckBaselines$new( model_statistics = NULL, model_constraints = NULL, model_data_statistics = NULL, model_data_constraints = NULL, bias_config_file = NULL, bias_pre_training_constraints = NULL, bias_post_training_constraints = NULL, explainability_constraints = NULL, explainability_config_file = NULL )
model_statistics
(MetricsSource): A metric source object that represents model statistics.
model_constraints
(MetricsSource): A metric source object that represents model constraints.
model_data_statistics
(MetricsSource): A metric source object that represents model data statistics.
model_data_constraints
(MetricsSource): A metric source object that represents model data constraints.
bias_config_file
(FileSource): A file source object that represents the bias config.
bias_pre_training_constraints
(MetricsSource): A metric source object that represents pre-training bias constraints.
bias_post_training_constraints
(MetricsSource): A metric source object that represents post-training bias constraints.
explainability_constraints
(MetricsSource): A metric source object that represents explainability constraints.
explainability_config_file
(FileSource): A file source object that represents the explainability config.
to_request_list()
Generates a request dictionary using the parameters provided to the class.
DriftCheckBaselines$to_request_list()
clone()
The objects of this class are cloneable with this method.
DriftCheckBaselines$clone(deep = FALSE)
deep
Whether to make a deep clone.
Fetch trial component data and make them accessible for analytics.
sagemaker.common::AnalyticsMetricsBase -> ExperimentAnalytics
MAX_TRIAL_COMPONENTS
class metadata
name
Name of the Experiment being analyzed
new()
Initialize an `ExperimentAnalytics` instance.
ExperimentAnalytics$new( experiment_name = NULL, search_expression = NULL, sort_by = NULL, sort_order = NULL, metric_names = NULL, parameter_names = NULL, sagemaker_session = NULL )
experiment_name
(str, optional): Name of the experiment if you want to constrain the search to only trial components belonging to an experiment.
search_expression
(dict, optional): The search query to find the set of trial components to use to populate the data frame.
sort_by
(str, optional): The name of the resource property used to sort the set of trial components.
sort_order
(str, optional): How trial components are ordered. Valid values are Ascending and Descending. The default is Descending.
metric_names
(list, optional): String names of all the metrics to be shown in the data frame. If not specified, all metrics of all trials will be shown.
parameter_names
(list, optional): String names of the parameters to be shown in the data frame. If not specified, all parameters of all trials will be shown.
sagemaker_session
(sagemaker.session.Session): Session object which manages interactions with Amazon SageMaker APIs and any other AWS services needed. If not specified, one is created using the default AWS configuration chain.
clear_cache()
Clear the object of all local caches of API methods.
ExperimentAnalytics$clear_cache()
clone()
The objects of this class are cloneable with this method.
ExperimentAnalytics$clone(deep = FALSE)
deep
Whether to make a deep clone.
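A hedged usage sketch (the experiment name is a placeholder; credentials and region come from the default AWS configuration chain):
analytics = ExperimentAnalytics$new(experiment_name = "my-experiment")
df = analytics$dataframe()                     # tabular summary of all trial components
analytics$export_csv("experiment_metrics.csv") # persist the summary locally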
Abstract config class to configure an explainability method.
get_explainability_config()
Returns config.
ExplainabilityConfig$get_explainability_config()
format()
format class
ExplainabilityConfig$format()
clone()
The objects of this class are cloneable with this method.
ExplainabilityConfig$clone(deep = FALSE)
deep
Whether to make a deep clone.
Configuration for processing job outputs in Amazon SageMaker Feature Store
sagemaker.core::ApiObject -> FeatureStoreOutput
feature_group_name
(str): The name of the feature group.
clone()
The objects of this class are cloneable with this method.
FeatureStoreOutput$clone(deep = FALSE)
deep
Whether to make a deep clone.
Handles Amazon SageMaker processing tasks for jobs using a machine learning framework
sagemaker.common::Processor -> sagemaker.common::ScriptProcessor -> FrameworkProcessor
framework_entrypoint_command
new()
Initializes a `FrameworkProcessor` instance. The `FrameworkProcessor` handles Amazon SageMaker Processing tasks for jobs using a machine learning framework, which allows for a set of Python scripts to be run as part of the Processing Job.
FrameworkProcessor$new( estimator_cls, framework_version, role, instance_count, instance_type, py_version = "py3", image_uri = NULL, command = NULL, volume_size_in_gb = 30, volume_kms_key = NULL, output_kms_key = NULL, code_location = NULL, max_runtime_in_seconds = NULL, base_job_name = NULL, sagemaker_session = NULL, env = NULL, tags = NULL, network_config = NULL )
estimator_cls
(type): A subclass of the :class:'~sagemaker.estimator.Framework' estimator
framework_version
(str): The version of the framework. Value is ignored when `image_uri` is provided.
role
(str): An AWS IAM role name or ARN. Amazon SageMaker Processing uses this role to access AWS resources, such as data stored in Amazon S3.
instance_count
(int): The number of instances to run a processing job with.
instance_type
(str): The type of EC2 instance to use for processing, for example, 'ml.c4.xlarge'.
py_version
(str): Python version you want to use for executing your model training code. One of 'py2' or 'py3'. Defaults to 'py3'. Value is ignored when `image_uri` is provided.
image_uri
(str): The URI of the Docker image to use for the processing jobs (default: None).
command
([str]): The command to run, along with any command-line flags to *precede* the `code script`. Example: ["python3", "-v"]. If not provided, ["python"] will be chosen (default: None).
volume_size_in_gb
(int): Size in GB of the EBS volume to use for storing data during processing (default: 30).
volume_kms_key
(str): A KMS key for the processing volume (default: None).
output_kms_key
(str): The KMS key ID for processing job outputs (default: None).
code_location
(str): The S3 prefix URI where custom code will be uploaded (default: None). The code file uploaded to S3 is 'code_location/job-name/source/sourcedir.tar.gz'. If not specified, the default code location is 's3://sagemaker-default-bucket'.
max_runtime_in_seconds
(int): Timeout in seconds (default: None). After this amount of time, Amazon SageMaker terminates the job, regardless of its current status. If 'max_runtime_in_seconds' is not specified, the default value is 24 hours.
base_job_name
(str): Prefix for processing name. If not specified, the processor generates a default job name, based on the processing image name and current timestamp (default: None).
sagemaker_session
(:class:'~sagemaker.session.Session'): Session object which manages interactions with Amazon SageMaker and any other AWS services needed. If not specified, the processor creates one using the default AWS configuration chain (default: None).
env
(dict[str, str]): Environment variables to be passed to the processing jobs (default: None).
tags
(list[dict]): List of tags to be passed to the processing job (default: None). For more, see https://docs.aws.amazon.com/sagemaker/latest/dg/API_Tag.html.
network_config
(:class:'~sagemaker.network.NetworkConfig'): A :class:'~sagemaker.network.NetworkConfig' object that configures network isolation, encryption of inter-container traffic, security group IDs, and subnets (default: None).
get_run_args()
Gets a RunArgs object containing the normalized inputs, outputs, and arguments needed when using a `FrameworkProcessor` in a :class:'~sagemaker.workflow.steps.ProcessingStep'.
FrameworkProcessor$get_run_args( code, source_dir = NULL, dependencies = NULL, git_config = NULL, inputs = NULL, outputs = NULL, arguments = NULL, job_name = NULL )
code
(str): This can be an S3 URI or a local path to a file with the framework script to run. See the `code` argument in 'sagemaker.processing.FrameworkProcessor.run()'.
source_dir
(str): Path (absolute, relative, or an S3 URI) to a directory with any other processing source code dependencies aside from the entrypoint file (default: None). See the `source_dir` argument in 'sagemaker.processing.FrameworkProcessor.run()'.
dependencies
(list[str]): A list of paths to directories (absolute or relative) with any additional libraries that will be exported to the container (default: []). See the `dependencies` argument in 'sagemaker.processing.FrameworkProcessor.run()'.
git_config
(dict[str, str]): Git configurations used for cloning files. See the 'git_config' argument in 'sagemaker.processing.FrameworkProcessor.run()'.
inputs
(list[:class:'~sagemaker.processing.ProcessingInput']): Input files for the processing job. These must be provided as :class:'~sagemaker.processing.ProcessingInput' objects (default: None).
outputs
(list[:class:'~sagemaker.processing.ProcessingOutput']): Outputs for the processing job. These can be specified as either path strings or :class:'~sagemaker.processing.ProcessingOutput' objects (default: None).
arguments
(list[str]): A list of string arguments to be passed to a processing job (default: None).
job_name
(str): Processing job name. If not specified, the processor generates a default job name, based on the base job name and current timestamp.
Returns a RunArgs object.
run()
Runs a processing job.
FrameworkProcessor$run( code, source_dir = NULL, dependencies = NULL, git_config = NULL, inputs = NULL, outputs = NULL, arguments = NULL, wait = TRUE, logs = TRUE, job_name = NULL, experiment_config = NULL, kms_key = NULL )
code
(str): This can be an S3 URI or a local path to a file with the framework script to run: the path (absolute or relative) to the local Python source file to be executed as the entry point. When 'code' is an S3 URI, 'source_dir', 'dependencies', and 'git_config' are ignored. If `source_dir` is specified, then `code` must point to a file located at the root of `source_dir`.
source_dir
(str): Path (absolute, relative, or an S3 URI) to a directory with any other processing source code dependencies aside from the entry point file (default: None). If `source_dir` is an S3 URI, it must point to a tar.gz file. The structure within this directory is preserved when processing on Amazon SageMaker.
dependencies
(list[str]): A list of paths to directories (absolute or relative) with any additional libraries that will be exported to the container (default: []). The library folders will be copied to SageMaker in the same folder where the entrypoint is copied. If 'git_config' is provided, 'dependencies' should be a list of relative locations to directories with any additional libraries needed in the Git repo (default: None).
git_config
(dict[str, str]): Git configurations used for cloning files, including `repo`, `branch`, `commit`, `2FA_enabled`, `username`, `password` and `token`. The `repo` field is required; all other fields are optional. `repo` specifies the Git repository where your training script is stored. If you don't provide `branch`, the default value 'master' is used. If you don't provide `commit`, the latest commit in the specified branch is used. `2FA_enabled`, `username`, `password` and `token` are used for authentication. For GitHub (or other Git) accounts, set `2FA_enabled` to 'True' if two-factor authentication is enabled for the account; otherwise set it to 'False'. If you do not provide a value for `2FA_enabled`, a default value of 'False' is used. CodeCommit does not support two-factor authentication, so do not provide `2FA_enabled` with CodeCommit repositories. For GitHub and other Git repos, when SSH URLs are provided, it does not matter whether 2FA is enabled or disabled: you should either have no passphrase for the SSH key pairs, or have the ssh-agent configured so that you are not prompted for the SSH passphrase when running 'git clone' with SSH URLs. When HTTPS URLs are provided: if 2FA is disabled, either token or username+password will be used for authentication if provided (token prioritized); if 2FA is enabled, only token will be used for authentication if provided. If the required authentication info is not provided, the SDK tries to use local credential storage to authenticate; if that also fails, an error is raised. For CodeCommit repos, 2FA is not supported, so '2FA_enabled' should not be provided; there is no token in CodeCommit, so 'token' should not be provided either. When 'repo' is an SSH URL, the requirements are the same as for GitHub-like repos. When 'repo' is an HTTPS URL, username+password are used for authentication if provided; otherwise, the SDK tries to use either the CodeCommit credential helper or local credential storage.
inputs
(list[:class:'~sagemaker.processing.ProcessingInput']): Input files for the processing job. These must be provided as :class:'~sagemaker.processing.ProcessingInput' objects (default: None).
outputs
(list[:class:'~sagemaker.processing.ProcessingOutput']): Outputs for the processing job. These can be specified as either path strings or :class:'~sagemaker.processing.ProcessingOutput' objects (default: None).
arguments
(list[str]): A list of string arguments to be passed to a processing job (default: None).
wait
(bool): Whether the call should wait until the job completes (default: True).
logs
(bool): Whether to show the logs produced by the job. Only meaningful when wait is True (default: True).
job_name
(str): Processing job name. If not specified, the processor generates a default job name, based on the base job name and current timestamp.
experiment_config
(dict[str, str]): Experiment management configuration. Dictionary contains three optional keys: 'ExperimentName', 'TrialName', and 'TrialComponentDisplayName'.
kms_key
(str): The ARN of the KMS key that is used to encrypt the user code file (default: None).
clone()
The objects of this class are cloneable with this method.
FrameworkProcessor$clone(deep = FALSE)
deep
Whether to make a deep clone.
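A rough sketch only: 'SKLearn' stands in for any :class:'~sagemaker.estimator.Framework' estimator subclass available in your session, and the framework version, role ARN, and script name are placeholders:
processor = FrameworkProcessor$new(
  estimator_cls = SKLearn,          # assumed Framework estimator subclass
  framework_version = "0.23-1",     # placeholder framework version
  role = "arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder role ARN
  instance_count = 1,
  instance_type = "ml.m5.xlarge"
)
processor$run(code = "preprocessing.py")  # placeholder entrypoint script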
Validates user inputs and fills in default values if no input is provided. There are three main profiling options to choose from: :class:'~sagemaker.debugger.metrics_config.DetailedProfilingConfig', :class:'~sagemaker.debugger.metrics_config.DataloaderProfilingConfig', and :class:'~sagemaker.debugger.metrics_config.PythonProfilingConfig'. The following list shows available scenarios of configuring the profiling options.
None of the profiling configuration, step range, or time range is specified. SageMaker Debugger activates framework profiling based on the default settings of each profiling option.
Target step or time range is specified to this :class:'~sagemaker.debugger.metrics_config.FrameworkProfile' class. The requested target step or time range setting propagates to all of the framework profiling options. For example, if you configure this class as follows, all of the profiling options profile the 6th step.
Individual profiling configurations are specified through the “*_profiling_config“ parameters. SageMaker Debugger profiles framework metrics only for the specified profiling configurations. For example, if the :class:'~sagemaker.debugger.metrics_config.DetailedProfilingConfig' class is configured but not the other profiling options, Debugger only profiles based on the settings specified to the :class:'~sagemaker.debugger.metrics_config.DetailedProfilingConfig' class. The example after this section shows a profiling configuration that performs detailed profiling at step 10, data loader profiling at steps 9 and 10, and Python profiling at step 12. If individual profiling configurations are specified in addition to the step or time range, SageMaker Debugger prioritizes the individual profiling configurations and ignores the step or time range; in that example, the `start_step = 1` and `num_steps = 10` settings are ignored.
new()
Initialize the FrameworkProfile class object.
FrameworkProfile$new( local_path = BASE_FOLDER_DEFAULT, file_max_size = MAX_FILE_SIZE_DEFAULT, file_close_interval = CLOSE_FILE_INTERVAL_DEFAULT, file_open_fail_threshold = FILE_OPEN_FAIL_THRESHOLD_DEFAULT, detailed_profiling_config = NULL, dataloader_profiling_config = NULL, python_profiling_config = NULL, horovod_profiling_config = NULL, smdataparallel_profiling_config = NULL, start_step = NULL, num_steps = NULL, start_unix_time = NULL, duration = NULL )
local_path
(str): The local path where the profiling output is saved. The default is BASE_FOLDER_DEFAULT.
file_max_size
(int): The maximum size, in bytes, of a profiling output file. The default is MAX_FILE_SIZE_DEFAULT.
file_close_interval
(int): The interval, in seconds, at which profiling output files are closed. The default is CLOSE_FILE_INTERVAL_DEFAULT.
file_open_fail_threshold
(int): The maximum number of file-open failures tolerated while writing profiling output. The default is FILE_OPEN_FAIL_THRESHOLD_DEFAULT.
detailed_profiling_config
(DetailedProfilingConfig): The configuration for detailed profiling. Configure it using the :class:'~sagemaker.debugger.metrics_config.DetailedProfilingConfig' class. Pass `DetailedProfilingConfig$new()` to use the default configuration.
dataloader_profiling_config
(DataloaderProfilingConfig): The configuration for dataloader metrics profiling. Configure it using the :class:'~sagemaker.debugger.metrics_config.DataloaderProfilingConfig' class. Pass `DataloaderProfilingConfig$new()` to use the default configuration.
python_profiling_config
(PythonProfilingConfig): The configuration for stats collected by the Python profiler (cProfile or Pyinstrument). Configure it using the :class:'~sagemaker.debugger.metrics_config.PythonProfilingConfig' class. Pass `PythonProfilingConfig$new()` to use the default configuration.
horovod_profiling_config
(HorovodProfilingConfig): The configuration for Horovod metrics profiling.
smdataparallel_profiling_config
(SMDataParallelProfilingConfig): The configuration for SageMaker distributed data parallel metrics profiling.
start_step
(int): The step at which to start profiling.
num_steps
(int): The number of steps to profile.
start_unix_time
(int): The Unix time at which to start profiling.
duration
(float): The duration in seconds to profile.
format()
format class
FrameworkProfile$format()
clone()
The objects of this class are cloneable with this method.
FrameworkProfile$clone(deep = FALSE)
deep
Whether to make a deep clone.
library(sagemaker.common)
profiler_config = ProfilerConfig$new(
  framework_profile_params = FrameworkProfile$new(
    start_step = 1,
    num_steps = 10,
    detailed_profiling_config = DetailedProfilingConfig$new(start_step = 10, num_steps = 1),
    dataloader_profiling_config = DataloaderProfilingConfig$new(start_step = 9, num_steps = 2),
    python_profiling_config = PythonProfilingConfig$new(start_step = 12, num_steps = 1)
  )
)
Return the default built-in profiler rule with a unique name.
get_default_profiler_rule()
get_default_profiler_rule()
ProfilerRule: The instance of the built-in ProfilerRule.
For a full list of rule image URIs, see 'Use Debugger Docker Images for Built-in or Custom Rules' (https://docs.aws.amazon.com/sagemaker/latest/dg/debugger-docker-images-rules.html).
get_rule_container_image_uri(region)
get_rule_container_image_uri(region)
region
(str): A string of AWS Region. For example, 'us-east-1'.
str : Formatted image URI for the given AWS Region and the rule container type.
The configuration for framework metrics from Horovod distributed training.
sagemaker.common::MetricsConfigBase -> HorovodProfilingConfig
new()
Specify target steps or a target duration to profile. By default, it profiles step 13 of training. If profile_default_steps is set to 'True' and none of the other range parameters is specified, the class uses the default config for horovod profiling.
HorovodProfilingConfig$new( start_step = NULL, num_steps = NULL, start_unix_time = NULL, duration = NULL, profile_default_steps = FALSE )
start_step
(int): The step to start profiling. The default is step 13.
num_steps
(int): The number of steps to profile. The default is 1 step.
start_unix_time
(int): The Unix time to start profiling.
duration
(float): The duration in seconds to profile.
profile_default_steps
(bool): Indicates whether the default config should be used.
clone()
The objects of this class are cloneable with this method.
HorovodProfilingConfig$clone(deep = FALSE)
deep
Whether to make a deep clone.
Fetch results about a hyperparameter tuning job and make them accessible for analytics.
sagemaker.common::AnalyticsMetricsBase -> HyperparameterTuningJobAnalytics
name
Name of the HyperparameterTuningJob being analyzed
tuning_ranges
A dictionary describing the ranges of all tuned hyperparameters. The keys are the names of the hyperparameters, and the values are the ranges. The output can take one of two forms:
* If the 'TrainingJobDefinition' field is present in the job description, the output is a dictionary constructed from 'ParameterRanges' in 'HyperParameterTuningJobConfig' of the job description. The keys are the parameter names and the values are the parameter ranges. Example:
list(
  "eta" = list("MaxValue" = "1", "MinValue" = "0", "Name" = "eta"),
  "gamma" = list("MaxValue" = "10", "MinValue" = "0", "Name" = "gamma"),
  "iterations" = list("MaxValue" = "100", "MinValue" = "50", "Name" = "iterations"),
  "num_layers" = list("MaxValue" = "30", "MinValue" = "5", "Name" = "num_layers")
)
* If the 'TrainingJobDefinitions' field (list) is present in the job description, the output is a dictionary whose keys are the 'DefinitionName' values from all items in 'TrainingJobDefinitions', and each value is a dictionary constructed from 'HyperParameterRanges' in the corresponding item, in the same format as above. Example:
list(
  "estimator_1" = list(
    "eta" = list("MaxValue" = "1", "MinValue" = "0", "Name" = "eta"),
    "gamma" = list("MaxValue" = "10", "MinValue" = "0", "Name" = "gamma")
  ),
  "estimator_2" = list(
    "framework" = list("Values" = c("TF", "MXNet"), "Name" = "framework"),
    "gamma" = list("MaxValue" = "1.0", "MinValue" = "0.2", "Name" = "gamma")
  )
)
For more details about the 'TrainingJobDefinition' and 'TrainingJobDefinitions' fields in the job description, see https://botocore.readthedocs.io/en/latest/reference/services/sagemaker.html#SageMaker.Client.create_hyper_parameter_tuning_job
new()
Initialize a `HyperparameterTuningJobAnalytics` instance.
HyperparameterTuningJobAnalytics$new( hyperparameter_tuning_job_name, sagemaker_session = NULL )
hyperparameter_tuning_job_name
(str): name of the HyperparameterTuningJob to analyze.
sagemaker_session
(sagemaker.session.Session): Session object which manages interactions with Amazon SageMaker APIs and any other AWS services needed. If not specified, one is created using the default AWS configuration chain.
description()
Call `DescribeHyperParameterTuningJob` for the hyperparameter tuning job.
HyperparameterTuningJobAnalytics$description(force_refresh = FALSE)
force_refresh
(bool): Set to True to fetch the latest data from SageMaker API.
dict: The Amazon SageMaker response for `DescribeHyperParameterTuningJob`.
training_job_summaries()
A (paginated) list of everything from `ListTrainingJobsForTuningJob`.
HyperparameterTuningJobAnalytics$training_job_summaries(force_refresh = FALSE)
force_refresh
(bool): Set to True to fetch the latest data from SageMaker API.
dict: The Amazon SageMaker response for `ListTrainingJobsForTuningJob`.
clear_cache()
Clear the object of all local caches of API methods.
HyperparameterTuningJobAnalytics$clear_cache()
clone()
The objects of this class are cloneable with this method.
HyperparameterTuningJobAnalytics$clone(deep = FALSE)
deep
Whether to make a deep clone.
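A brief usage sketch (the tuning job name is a placeholder):
tuner_analytics = HyperparameterTuningJobAnalytics$new("my-tuning-job")
df = tuner_analytics$dataframe()        # one row per training job
ranges = tuner_analytics$tuning_ranges  # the tuned hyperparameter ranges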
Config object for handling images
sagemaker.common::ExplainabilityConfig -> ImageConfig
image_config
Image config
new()
Initializes all configuration parameters needed for SHAP CV explainability
ImageConfig$new( model_type, num_segments = NULL, feature_extraction_method = NULL, segment_compactness = NULL, max_objects = NULL, iou_threshold = NULL, context = NULL )
model_type
(str): Specifies the type of CV model. Options: (IMAGE_CLASSIFICATION | OBJECT_DETECTION).
num_segments
(NULL or int): Clarify uses SKLearn's SLIC method for image segmentation to generate features/superpixels. num_segments specifies the approximate number of segments to be generated. Default is NULL; SLIC will then default to 100 segments.
feature_extraction_method
(NULL or str): The method used for extracting features from the image, e.g. "segmentation". Default is "segmentation".
segment_compactness
(NULL or float): Balances color proximity and space proximity. Higher values give more weight to space proximity, making superpixel shapes more square/cubic. We recommend exploring possible values on a log scale, e.g., 0.01, 0.1, 1, 10, 100, before refining around a chosen value.
max_objects
(NULL or int): The maximum number of objects displayed. The object detection algorithm may detect more than max_objects objects in a single image; the top max_objects objects according to confidence score will be displayed.
iou_threshold
(NULL or float): The minimum intersection over union for the object bounding box to consider its confidence score for computing SHAP values, in [0.0, 1.0]. This parameter is used for the object detection case.
context
(NULL or float): Refers to the portion of the image outside of the bounding box, on a scale of [0.0, 1.0]. If set to 1.0, the whole image is considered; if set to 0.0, only the image inside the bounding box is considered.
get_image_config()
Returns the image config part of an analysis config dictionary.
ImageConfig$get_image_config()
clone()
The objects of this class are cloneable with this method.
ImageConfig$clone(deep = FALSE)
deep
Whether to make a deep clone.
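A hedged sketch for an object detection model (the threshold values are illustrative):
image_config = ImageConfig$new(
  model_type = "OBJECT_DETECTION",
  max_objects = 3,        # explain the top 3 detected objects
  iou_threshold = 0.5,    # minimum IoU for a box to be considered
  context = 1.0           # consider the whole image around each box
)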
Helper function to determine whether the provided regex is valid.
is_valid_regex(regex)
is_valid_regex(regex)
regex
(str): The user-provided regex.
bool: Indicates whether the provided regex was valid or not.
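For example:
is_valid_regex(".*loss.*")   # TRUE: a well-formed pattern
is_valid_regex("([a-z")      # FALSE: unbalanced parenthesis/bracket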
Contains Lambda paws wrappers to create, update, delete and invoke Lambda functions.
new()
Constructs a Lambda instance. This instance represents a Lambda function and provides methods for updating, deleting and invoking the function. This class can be used either for creating a new Lambda function or for using an existing one. When using an existing Lambda function, only the function_arn argument is required. When creating a new one, the function_name, execution_role_arn and handler arguments are required, as well as either script or zipped_code_dir.
Lambda$new( function_arn = NULL, function_name = NULL, execution_role_arn = NULL, zipped_code_dir = NULL, s3_bucket = NULL, script = NULL, handler = NULL, session = NULL, timeout = 120, memory_size = 128, runtime = "python3.8" )
function_arn
(str): The arn of the Lambda function.
function_name
(str): The name of the Lambda function. Function name must be provided to create a Lambda function.
execution_role_arn
(str): The role to be attached to Lambda function.
zipped_code_dir
(str): The path of the zipped code package of the Lambda function.
s3_bucket
(str): The bucket where zipped code is uploaded. If not provided, default session bucket is used to upload zipped_code_dir.
script
(str): The path of Lambda function script for direct zipped upload
handler
(str): The Lambda handler. The format for handler should be file_name.function_name. For example: if the name of the Lambda script is hello_world.py and the Lambda function definition in that script is lambda_handler(event, context), the handler should be hello_world.lambda_handler.
session
(sagemaker.session.Session): Session object which manages interactions with Amazon SageMaker APIs and any other AWS services needed. If not specified, new session is created.
timeout
(int): Timeout of the Lambda function in seconds. Default is 120 seconds.
memory_size
(int): Memory of the Lambda function in megabytes. Default is 128 MB.
runtime
(str): Runtime of the Lambda function. Default is set to python3.8.
create()
Method to create a lambda function.
Lambda$create()
paws response from Lambda's create_function method.
update()
Method to update a lambda function.
Lambda$update()
paws response from Lambda's update_function method.
invoke()
Method to invoke a lambda function.
Lambda$invoke()
paws response from Lambda's invoke method.
delete()
Method to delete a lambda function.
Lambda$delete()
paws response from Lambda's delete_function method.
format()
format class
Lambda$format()
clone()
The objects of this class are cloneable with this method.
Lambda$clone(deep = FALSE)
deep
Whether to make a deep clone.
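A minimal creation-and-invocation sketch (the function name, role ARN, and script are placeholders):
my_lambda = Lambda$new(
  function_name = "my-function",   # placeholder function name
  execution_role_arn = "arn:aws:iam::123456789012:role/LambdaRole",  # placeholder role
  script = "hello_world.py",       # placeholder script for direct zipped upload
  handler = "hello_world.lambda_handler"
)
my_lambda$create()   # create the function
my_lambda$invoke()   # invoke it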
The base class for the metrics configuration. It determines the step or time range that needs to be profiled and validates the input value pairs. Available profiling range parameter pairs are (start_step and num_steps) and (start_unix_time and duration). The two parameter pairs are mutually exclusive; this class validates that only one of the two pairs is used. If both pairs are specified, a FOUND_BOTH_STEP_AND_TIME_FIELDS error occurs.
new()
Validate the provided range fields and set the range to be profiled accordingly.
MetricsConfigBase$new( name, start_step = NULL, num_steps = NULL, start_unix_time = NULL, duration = NULL )
name
(str): The name of the metrics config.
start_step
(int): The step to start profiling.
num_steps
(int): The number of steps to profile.
start_unix_time
(int): The Unix time to start profiling.
duration
(float): The duration in seconds to profile.
to_json_string()
Convert this metrics configuration to a dictionary formatted as a string. Calling eval on the return value is the same as calling _to_json directly.
MetricsConfigBase$to_json_string()
str: This metrics configuration as a dictionary and formatted as a string.
format()
format class
MetricsConfigBase$format()
clone()
The objects of this class are cloneable with this method.
MetricsConfigBase$clone(deep = FALSE)
deep
Whether to make a deep clone.
Config object related to a model and its endpoint to be created.
predictor_config
Predictor dictionary of the analysis config
new()
Initializes a configuration of a model and the endpoint to be created for it.
ModelConfig$new( model_name, instance_count, instance_type, accept_type = NULL, content_type = NULL, content_template = NULL, custom_attributes = NULL, accelerator_type = NULL, endpoint_name_prefix = NULL )
model_name
(str): Model name (as created by 'CreateModel').
instance_count
(int): The number of instances of a new endpoint for model inference.
instance_type
(str): The type of EC2 instance to use for model inference, for example, 'ml.c5.xlarge'.
accept_type
(str): The model output format to be used for getting inferences with the shadow endpoint. Valid values are "text/csv" for CSV and "application/jsonlines". Default is the same as content_type.
content_type
(str): The model input format to be used for getting inferences with the shadow endpoint. Valid values are "text/csv" for CSV and "application/jsonlines". Default is the same as dataset format.
content_template
(str): A template string used to construct the model input from dataset instances. It is only used when "model_content_type" is "application/jsonlines". The template should have one and only one placeholder, $features, which will be replaced by a features list to form the model inference input.
custom_attributes
(str): Provides additional information about a request for an inference submitted to a model hosted at an Amazon SageMaker endpoint. The information is an opaque value that is forwarded verbatim. You could use this value, for example, to provide an ID that you can use to track a request or to provide other metadata that a service endpoint was programmed to process. The value must consist of no more than 1024 visible US-ASCII characters as specified in Section 3.2.6. Field Value Components (https://tools.ietf.org/html/rfc7230#section-3.2.6) of the Hypertext Transfer Protocol (HTTP/1.1).
accelerator_type
(str): The Elastic Inference accelerator type to deploy to the model endpoint instance for making inferences to the model, see https://docs.aws.amazon.com/sagemaker/latest/dg/ei.html.
endpoint_name_prefix
(str): The endpoint name prefix of a new endpoint. Must follow the pattern "^[a-zA-Z0-9](-*[a-zA-Z0-9])*".
get_predictor_config()
Returns part of the predictor dictionary of the analysis config.
ModelConfig$get_predictor_config()
format()
format class
ModelConfig$format()
clone()
The objects of this class are cloneable with this method.
ModelConfig$clone(deep = FALSE)
deep
Whether to make a deep clone.
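A minimal sketch of a shadow endpoint configuration (the model name is a placeholder):
model_config = ModelConfig$new(
  model_name = "my-model",        # placeholder, as created by 'CreateModel'
  instance_count = 1,
  instance_type = "ml.c5.xlarge",
  content_type = "text/csv"
)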
Config object to extract a predicted label from the model output.
label
Predicted label of the same type as the label in the dataset
probability
Optional index or JSONPath location in the model
probability_threshold
An optional value for binary prediction task
predictor_config
Predictor dictionary of the analysis config.
new()
Initializes a model output config to extract the predicted label. The following examples show different parameter configurations depending on the endpoint:
Regression task: The model returns the score, e.g. 1.2; we don't need to specify anything. For JSON output, e.g. list('score' = 1.2), we can set label = 'score'.
Binary classification:
* The model returns a single probability and we would like to classify as 'yes' those with a probability exceeding 0.2. We can set probability_threshold = 0.2, label_headers = 'yes'.
* The model returns list('probability' = 0.3), for which we would like to apply a threshold of 0.5 to obtain a predicted label in c(0, 1). In this case we can set label = 'probability'.
* The model returns a tuple of the predicted label and the probability. In this case we can set label = 0.
Multiclass classification:
* The model returns list('labels' = c('cat', 'dog', 'fish'), 'probabilities' = c(0.35, 0.25, 0.4)). In this case we would set probability = 'probabilities' and label = 'labels' and infer the predicted label to be 'fish'.
* The model returns list('predicted_label' = 'fish', 'probabilities' = c(0.35, 0.25, 0.4)). In this case we would set label = 'predicted_label'.
* The model returns c(0.35, 0.25, 0.4). In this case we can set label_headers = c('cat', 'dog', 'fish') and infer the predicted label to be 'fish'.
ModelPredictedLabelConfig$new( label = NULL, probability = NULL, probability_threshold = NULL, label_headers = NULL )
label
(str or [integer] or list[integer]): Optional index or JSONPath location in the model output for the prediction. In case this is a predicted label of the same type as the label in the dataset, no further arguments need to be specified.
probability
(str or [integer] or list[integer]): Optional index or JSONPath location in the model output for the predicted scores.
probability_threshold
(float): An optional value for binary prediction tasks in which the model returns a probability, to indicate the threshold to convert the prediction to a boolean value. Default is 0.5.
label_headers
(list): List of label values, one for each score of the `probability`.
get_predictor_config()
Returns probability_threshold, predictor config.
ModelPredictedLabelConfig$get_predictor_config()
format()
format class
ModelPredictedLabelConfig$format()
clone()
The objects of this class are cloneable with this method.
ModelPredictedLabelConfig$clone(deep = FALSE)
deep
Whether to make a deep clone.
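For instance, a sketch matching the multiclass scenario above, where the model returns a predicted label and a probability vector:
predicted_label_config = ModelPredictedLabelConfig$new(
  label = "predicted_label",     # JSONPath to the predicted label
  probability = "probabilities"  # JSONPath to the predicted scores
)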
Accepts network configuration parameters and provides a method to turn these parameters into a dictionary.
new()
Initialize a `NetworkConfig` instance. NetworkConfig accepts network configuration parameters and provides a method to turn these parameters into a dictionary.
NetworkConfig$new( enable_network_isolation = FALSE, security_group_ids = NULL, subnets = NULL, encrypt_inter_container_traffic = NULL )
enable_network_isolation
(bool): Boolean that determines whether to enable network isolation.
security_group_ids
([str]): A list of strings representing security group IDs.
subnets
([str]): A list of strings representing subnets.
encrypt_inter_container_traffic
(bool): Boolean that determines whether to encrypt inter-container traffic. Default value is None.
to_request_list()
Generates a request dictionary using the parameters provided to the class.
NetworkConfig$to_request_list()
format()
format class
NetworkConfig$format()
clone()
The objects of this class are cloneable with this method.
NetworkConfig$clone(deep = FALSE)
deep
Whether to make a deep clone.
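A brief sketch (the security group and subnet IDs are placeholders):
network_config = NetworkConfig$new(
  enable_network_isolation = TRUE,
  security_group_ids = list("sg-0123456789abcdef0"),  # placeholder security group
  subnets = list("subnet-0123456789abcdef0"),         # placeholder subnet
  encrypt_inter_container_traffic = TRUE
)
network_config$to_request_list()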
If PDP is requested, the Partial Dependence Plots will be included in the report, and the corresponding values will be included in the analysis output.
sagemaker.common::ExplainabilityConfig
-> PDPConfig
pdp_config
PDP Config
new()
Initializes config for PDP.
PDPConfig$new(features = NULL, grid_resolution = 15, top_k_features = 10)
features
(None or list): List of feature names or indices for which partial dependence plots must be computed and plotted. When ShapConfig is provided, this parameter is optional, as Clarify will try to compute the partial dependence plots for the top features based on SHAP attributions. When ShapConfig is not provided, 'features' must be provided.
grid_resolution
(int): For numerical features, this value represents the number of buckets into which the range of values is divided. This decides the granularity of the grid on which the PDPs are plotted.
top_k_features
(int): Set the number of top SHAP attributes to be selected to compute partial dependence plots.
get_explainability_config()
Returns config.
PDPConfig$get_explainability_config()
clone()
The objects of this class are cloneable with this method.
PDPConfig$clone(deep = FALSE)
deep
Whether to make a deep clone.
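For example, a hedged sketch requesting PDPs for two placeholder features:
pdp_config = PDPConfig$new(
  features = list("age", "income"),  # placeholder feature names
  grid_resolution = 20,              # 20 buckets per numerical feature
  top_k_features = 10
)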
Accepts parameters that specify an Amazon S3 input for a processing job and provides a method to turn those parameters into a dictionary.
new()
Initializes a `ProcessingInput` instance. `ProcessingInput` accepts parameters that specify an Amazon S3 input for a processing job and provides a method to turn those parameters into a dictionary.
ProcessingInput$new( source = NULL, destination = NULL, input_name = NULL, s3_data_type = c("S3Prefix", "ManifestFile"), s3_input_mode = c("File", "Pipe"), s3_data_distribution_type = c("FullyReplicated", "ShardedByS3Key"), s3_compression_type = c("None", "Gzip"), s3_input = NULL, dataset_definition = NULL, app_managed = FALSE )
source
(str): The source for the input. If a local path is provided, it will automatically be uploaded to S3 under: "s3://<default-bucket-name>/<job-name>/input/<input-name>".
destination
(str): The destination of the input.
input_name
(str): The name for the input. If a name is not provided, one will be generated (e.g. "input-1").
s3_data_type
(str): Valid options are "ManifestFile" or "S3Prefix".
s3_input_mode
(str): Valid options are "Pipe" or "File".
s3_data_distribution_type
(str): Valid options are "FullyReplicated" or "ShardedByS3Key".
s3_compression_type
(str): Valid options are "None" or "Gzip".
s3_input
(:class:'~sagemaker.dataset_definition.S3Input') Metadata of data objects stored in S3
dataset_definition
(:class:'~sagemaker.dataset_definition.DatasetDefinition') DatasetDefinition input
app_managed
(bool): Whether the input is managed by SageMaker or by the application.
to_request_list()
Generates a request dictionary using the parameters provided to the class.
ProcessingInput$to_request_list()
format()
format class
ProcessingInput$format()
clone()
The objects of this class are cloneable with this method.
ProcessingInput$clone(deep = FALSE)
deep
Whether to make a deep clone.
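A minimal sketch (the S3 source is a placeholder):
processing_input = ProcessingInput$new(
  source = "s3://my-bucket/input/data.csv",  # placeholder S3 source
  destination = "/opt/ml/processing/input",
  input_name = "raw-data"
)
processing_input$to_request_list()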
Other Processor: ProcessingJob, ProcessingOutput, Processor, ScriptProcessor
Provides functionality to start, describe, and stop processing jobs.
sagemaker.common::.Job -> ProcessingJob
inputs
A list of :class:'~sagemaker.processing.ProcessingInput' objects.
outputs
A list of :class:'~sagemaker.processing.ProcessingOutput' objects.
output_kms_key
The output KMS key associated with the job
new()
Initializes a Processing job.
ProcessingJob$new( sagemaker_session = NULL, job_name = NULL, inputs = NULL, outputs = NULL, output_kms_key = NULL )
sagemaker_session
(:class:'~sagemaker.session.Session'): Session object which manages interactions with Amazon SageMaker and any other AWS services needed. If not specified, the processor creates one using the default AWS configuration chain.
job_name
(str): Name of the Processing job.
inputs
(list[:class:'~sagemaker.processing.ProcessingInput']): A list of :class:'~sagemaker.processing.ProcessingInput' objects.
outputs
(list[:class:'~sagemaker.processing.ProcessingOutput']): A list of :class:'~sagemaker.processing.ProcessingOutput' objects.
output_kms_key
(str): The output KMS key associated with the job (default: None).
start_new()
Starts a new processing job using the provided inputs and outputs.
ProcessingJob$start_new(processor, inputs, outputs, experiment_config)
processor
(:class:'~sagemaker.processing.Processor'): The `Processor` instance that started the job.
inputs
(list[:class:'~sagemaker.processing.ProcessingInput']): A list of :class:'~sagemaker.processing.ProcessingInput' objects.
outputs
(list[:class:'~sagemaker.processing.ProcessingOutput']): A list of :class:'~sagemaker.processing.ProcessingOutput' objects.
experiment_config
(dict[str, str]): Experiment management configuration. Dictionary contains three optional keys: 'ExperimentName', 'TrialName', and 'TrialComponentDisplayName'.
:class:'~sagemaker.processing.ProcessingJob': The instance of `ProcessingJob` created using the `Processor`.
from_processing_name()
Initializes a `ProcessingJob` from a processing job name.
ProcessingJob$from_processing_name(sagemaker_session, processing_job_name)
sagemaker_session
(:class:'~sagemaker.session.Session'): Session object which manages interactions with Amazon SageMaker and any other AWS services needed. If not specified, the processor creates one using the default AWS configuration chain.
processing_job_name
(str): Name of the processing job.
:class:'~sagemaker.processing.ProcessingJob': The instance of `ProcessingJob` created from the job name.
from_processing_arn()
Initializes a `ProcessingJob` from a Processing ARN.
ProcessingJob$from_processing_arn(sagemaker_session, processing_job_arn)
sagemaker_session
(:class:'~sagemaker.session.Session'): Session object which manages interactions with Amazon SageMaker and any other AWS services needed. If not specified, the processor creates one using the default AWS configuration chain.
processing_job_arn
(str): ARN of the processing job.
:class:'~sagemaker.processing.ProcessingJob': The instance of `ProcessingJob` created from the processing job's ARN.
wait()
Waits for the processing job to complete.
ProcessingJob$wait(logs = TRUE)
logs
(bool): Whether to show the logs produced by the job (default: True).
describe()
Prints out a response from the DescribeProcessingJob API call.
ProcessingJob$describe()
stop()
Stops the processing job.
ProcessingJob$stop()
prepare_app_specification()
Prepares a dict that represents a ProcessingJob's AppSpecification.
ProcessingJob$prepare_app_specification( container_arguments, container_entrypoint, image_uri )
container_arguments
(list[str]): The arguments for a container used to run a processing job.
container_entrypoint
(list[str]): The entrypoint for a container used to run a processing job.
image_uri
(str): The container image to be run by the processing job.
dict: Represents AppSpecification which configures the processing job to run a specified Docker container image.
prepare_output_config()
Prepares a dict that represents a ProcessingOutputConfig.
ProcessingJob$prepare_output_config(kms_key_id, outputs)
kms_key_id
(str): The AWS Key Management Service (AWS KMS) key that Amazon SageMaker uses to encrypt the processing job output. KmsKeyId can be an ID of a KMS key, the ARN of a KMS key, the alias of a KMS key, or the ARN of a KMS key alias. The KmsKeyId is applied to all outputs.
outputs
(list[dict]): Output configuration information for a processing job.
dict: Represents output configuration for the processing job.
prepare_processing_resources()
Prepares a dict that represents the ProcessingResources.
ProcessingJob$prepare_processing_resources( instance_count, instance_type, volume_kms_key_id, volume_size_in_gb )
instance_count
(int): The number of ML compute instances to use in the processing job. For distributed processing jobs, specify a value greater than 1. The default value is 1.
instance_type
(str): The ML compute instance type for the processing job.
volume_kms_key_id
(str): The AWS Key Management Service (AWS KMS) key that Amazon SageMaker uses to encrypt data on the storage volume attached to the ML compute instance(s) that run the processing job.
volume_size_in_gb
(int): The size of the ML storage volume in gigabytes that you want to provision. You must specify sufficient ML storage for your scenario.
dict: Represents ProcessingResources which identifies the resources, ML compute instances, and ML storage volumes to deploy for a processing job.
prepare_stopping_condition()
Prepares a dict that represents the job's StoppingCondition.
ProcessingJob$prepare_stopping_condition(max_runtime_in_seconds)
max_runtime_in_seconds
(int): Specifies the maximum runtime in seconds.
list: Represents the job's StoppingCondition.
clone()
The objects of this class are cloneable with this method.
ProcessingJob$clone(deep = FALSE)
deep
Whether to make a deep clone.
Other Processor: ProcessingInput, ProcessingOutput, Processor, ScriptProcessor
Accepts parameters that specify an Amazon S3 output for a processing job and provides a method to turn those parameters into a dictionary.
new()
Initializes a “ProcessingOutput“ instance. “ProcessingOutput“ accepts parameters that specify an Amazon S3 output for a processing job and provides a method to turn those parameters into a dictionary.
ProcessingOutput$new( source = NULL, destination = NULL, output_name = NULL, s3_upload_mode = c("EndOfJob", "Continuous"), app_managed = FALSE, feature_store_output = NULL )
source
(str): The source for the output.
destination
(str): The destination of the output. If a destination is not provided, one will be generated: "s3://<default-bucket-name>/<job-name>/output/<output-name>".
output_name
(str): The name of the output. If a name is not provided, one will be generated (e.g. "output-1").
s3_upload_mode
(str): Valid options are "EndOfJob" or "Continuous".
app_managed
(bool): Whether the output is managed by SageMaker or by the application.
feature_store_output
(:class:'~sagemaker.processing.FeatureStoreOutput') Configuration for processing job outputs of FeatureStore.
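For illustration, a minimal sketch of a “ProcessingOutput“; the container path and S3 destination are hypothetical:
# Upload the container's output directory to S3 at the end of the job.
output = ProcessingOutput$new(
  source = "/opt/ml/processing/output",
  destination = "s3://my-bucket/my-job/output",
  output_name = "output-1",
  s3_upload_mode = "EndOfJob"
)
output$to_request_list()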
to_request_list()
Generates a request dictionary using the parameters provided to the class.
ProcessingOutput$to_request_list()
format()
format class
ProcessingOutput$format()
clone()
The objects of this class are cloneable with this method.
ProcessingOutput$clone(deep = FALSE)
deep
Whether to make a deep clone.
Other Processor: ProcessingInput, ProcessingJob, Processor, ScriptProcessor
Handles Amazon SageMaker Processing tasks.
role
An AWS IAM role name or ARN
image_uri
The URI of the Docker image to use
instance_count
The number of instances to run
instance_type
The type of EC2 instance to use
entrypoint
The entrypoint for the processing job
volume_size_in_gb
Size in GB of the EBS volume
volume_kms_key
A KMS key for the processing
output_kms_key
The KMS key ID for processing job outputs
max_runtime_in_seconds
Timeout in seconds
base_job_name
Prefix for processing job name
sagemaker_session
Session object which manages interactions with Amazon SageMaker
env
Environment variables
tags
List of tags to be passed
network_config
A :class:'~sagemaker.network.NetworkConfig'
jobs
Jobs that have run or are currently running
latest_job
The most recently run job
.current_job_name
Current job
arguments
Extra arguments
new()
Initializes a “Processor“ instance. The “Processor“ handles Amazon SageMaker Processing tasks.
Processor$new( role, image_uri, instance_count, instance_type, entrypoint = NULL, volume_size_in_gb = 30, volume_kms_key = NULL, output_kms_key = NULL, max_runtime_in_seconds = NULL, base_job_name = NULL, sagemaker_session = NULL, env = NULL, tags = NULL, network_config = NULL )
role
(str): An AWS IAM role name or ARN. Amazon SageMaker Processing uses this role to access AWS resources, such as data stored in Amazon S3.
image_uri
(str): The URI of the Docker image to use for the processing jobs.
instance_count
(int): The number of instances to run a processing job with.
instance_type
(str): The type of EC2 instance to use for processing, for example, 'ml.c4.xlarge'.
entrypoint
(list[str]): The entrypoint for the processing job (default: NULL). This is in the form of a list of strings that make a command.
volume_size_in_gb
(int): Size in GB of the EBS volume to use for storing data during processing (default: 30).
volume_kms_key
(str): A KMS key for the processing volume (default: NULL).
output_kms_key
(str): The KMS key ID for processing job outputs (default: NULL).
max_runtime_in_seconds
(int): Timeout in seconds (default: NULL). After this amount of time, Amazon SageMaker terminates the job, regardless of its current status. If 'max_runtime_in_seconds' is not specified, the default value is 24 hours.
base_job_name
(str): Prefix for processing job name. If not specified, the processor generates a default job name, based on the processing image name and current timestamp.
sagemaker_session
(:class:'~sagemaker.session.Session'): Session object which manages interactions with Amazon SageMaker and any other AWS services needed. If not specified, the processor creates one using the default AWS configuration chain.
env
(dict[str, str]): Environment variables to be passed to the processing jobs (default: NULL).
tags
(list[dict]): List of tags to be passed to the processing job (default: NULL). For more, see https://docs.aws.amazon.com/sagemaker/latest/dg/API_Tag.html.
network_config
(:class:'~sagemaker.network.NetworkConfig'): A :class:'~sagemaker.network.NetworkConfig' object that configures network isolation, encryption of inter-container traffic, security group IDs, and subnets.
run()
Runs a processing job.
Processor$run( inputs = NULL, outputs = NULL, arguments = NULL, wait = TRUE, logs = TRUE, job_name = NULL, experiment_config = NULL )
inputs
(list[:class:'~sagemaker.processing.ProcessingInput']): Input files for the processing job. These must be provided as :class:'~sagemaker.processing.ProcessingInput' objects (default: NULL).
outputs
(list[:class:'~sagemaker.processing.ProcessingOutput']): Outputs for the processing job. These can be specified as either path strings or :class:'~sagemaker.processing.ProcessingOutput' objects (default: NULL).
arguments
(list[str]): A list of string arguments to be passed to a processing job (default: NULL).
wait
(bool): Whether the call should wait until the job completes (default: True).
logs
(bool): Whether to show the logs produced by the job. Only meaningful when “wait“ is True (default: True).
job_name
(str): Processing job name. If not specified, the processor generates a default job name, based on the base job name and current timestamp.
experiment_config
(dict[str, str]): Experiment management configuration. Dictionary contains three optional keys: 'ExperimentName', 'TrialName', and 'TrialComponentDisplayName'.
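Putting it together, a minimal sketch of constructing a generic “Processor“ and running it; the role ARN, image URI, and S3 paths are placeholders, and “ProcessingInput“ is assumed to follow the same source/destination pattern as “ProcessingOutput“:
library(sagemaker.common)
# Build a generic processor around a custom container image.
processor = Processor$new(
  role = "arn:aws:iam::111122223333:role/SageMakerRole",
  image_uri = "111122223333.dkr.ecr.us-east-1.amazonaws.com/my-image:latest",
  instance_count = 1,
  instance_type = "ml.m5.xlarge",
  entrypoint = list("python3", "process.py")
)
# Run it against a hypothetical S3 input prefix.
processor$run(
  inputs = list(ProcessingInput$new(
    source = "s3://my-bucket/input",
    destination = "/opt/ml/processing/input"
  )),
  outputs = list(ProcessingOutput$new(source = "/opt/ml/processing/output")),
  wait = TRUE
)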
format()
format class
Processor$format()
clone()
The objects of this class are cloneable with this method.
Processor$clone(deep = FALSE)
deep
Whether to make a deep clone.
Other Processor: ProcessingInput, ProcessingJob, ProcessingOutput, ScriptProcessor
SageMaker Debugger collects system and framework profiling information of training jobs and identifies performance bottlenecks.
new()
Initialize a “ProfilerConfig“ instance. Pass the output of this class to the “profiler_config“ parameter of the generic :class:'~sagemaker.estimator.Estimator' class and SageMaker Framework estimators.
ProfilerConfig$new( s3_output_path = NULL, system_monitor_interval_millis = NULL, framework_profile_params = NULL )
s3_output_path
(str): The location in Amazon S3 to store the output. The default Debugger output path for profiling data is created under the default output path of the :class:'~sagemaker.estimator.Estimator' class. For example, s3://sagemaker-<region>-<12digit_account_id>/<training-job-name>/profiler-output/.
system_monitor_interval_millis
(int): The time interval in milliseconds to collect system metrics. Available values are 100, 200, 500, 1000 (1 second), 5000 (5 seconds), and 60000 (1 minute) milliseconds. The default is 500 milliseconds.
framework_profile_params
(:class:'~sagemaker.debugger.FrameworkProfile'): A parameter object for framework metrics profiling. Configure it using the :class:'~sagemaker.debugger.FrameworkProfile' class. To use the default framework profile parameters, pass “FrameworkProfile()“. For more information about the default values, see :class:'~sagemaker.debugger.FrameworkProfile'.
# The following example shows the basic ``profiler_config``
# parameter configuration, enabling system monitoring every 5000 milliseconds
# and framework profiling with default parameter values.
library(sagemaker.common)
profiler_config = ProfilerConfig$new(
  system_monitor_interval_millis = 5000,
  framework_profile_params = FrameworkProfile$new()
)
to_request_list()
Generate a request dictionary using the parameters provided when initializing the object.
ProfilerConfig$to_request_list()
dict: A portion of an API request as a dictionary.
format()
format class
ProfilerConfig$format()
clone()
The objects of this class are cloneable with this method.
ProfilerConfig$clone(deep = FALSE)
deep
Whether to make a deep clone.
## ------------------------------------------------
## Method `ProfilerConfig$new`
## ------------------------------------------------

# The following example shows the basic ``profiler_config``
# parameter configuration, enabling system monitoring every 5000 milliseconds
# and framework profiling with default parameter values.
library(sagemaker.common)
profiler_config = ProfilerConfig$new(
  system_monitor_interval_millis = 5000,
  framework_profile_params = FrameworkProfile$new()
)
SageMaker Debugger profiling rules automatically analyze hardware system resource utilization and framework metrics of a training job to identify performance bottlenecks. SageMaker Debugger comes pre-packaged with built-in *profiling* rules. For example, the profiling rules can detect if GPUs are underutilized due to CPU bottlenecks or IO bottlenecks. For a full list of built-in rules for debugging, see 'List of Debugger Built-in Rules' https://docs.aws.amazon.com/sagemaker/latest/dg/debugger-built-in-rules.html. You can also write your own profiling rules using the Amazon SageMaker Debugger APIs.
sagemaker.common::RuleBase
-> ProfilerRule
sagemaker()
Initialize a “ProfilerRule“ object for a *built-in* profiling rule. The rule analyzes system and framework metrics of a given training job to identify performance bottlenecks.
ProfilerRule$sagemaker( base_config, name = NULL, container_local_output_path = NULL, s3_output_path = NULL )
base_config
(sagemaker.debugger::ProfilerReport): The base rule configuration object returned from a sagemaker.debugger method, for example sagemaker.debugger::ProfilerReport$new(). For a full list of built-in rules for debugging, see 'List of Debugger Built-in Rules' https://docs.aws.amazon.com/sagemaker/latest/dg/debugger-built-in-rules.html.
name
(str): The name of the profiler rule. If one is not provided, the name of the base_config will be used.
container_local_output_path
(str): The path in the container.
s3_output_path
(str): The location in Amazon S3 to store the profiling output data. The default Debugger output path for profiling data is created under the default output path of the :class:'~sagemaker.estimator.Estimator' class. For example, s3://sagemaker-<region>-<12digit_account_id>/<training-job-name>/profiler-output/.
:class:'~sagemaker.debugger.ProfilerRule': The instance of the built-in ProfilerRule.
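For illustration, a minimal sketch creating a built-in profiler rule from the default ProfilerReport base configuration referenced above:
# Create the built-in profiler report rule with default settings.
rule = ProfilerRule$sagemaker(
  base_config = sagemaker.debugger::ProfilerReport$new()
)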
custom()
Initialize a “ProfilerRule“ object for a *custom* profiling rule. You can create a rule that analyzes system and framework metrics emitted during the training of a model and monitors conditions that are critical for the success of a training job.
ProfilerRule$custom( name, image_uri, instance_type, volume_size_in_gb, source = NULL, rule_to_invoke = NULL, container_local_output_path = NULL, s3_output_path = NULL, rule_parameters = NULL )
name
(str): The name of the profiler rule.
image_uri
(str): The URI of the image to be used by the profiler rule.
instance_type
(str): Type of EC2 instance to use, for example, 'ml.c4.xlarge'.
volume_size_in_gb
(int): Size in GB of the EBS volume to use for storing data.
source
(str): A source file containing a rule to invoke. If provided, you must also provide rule_to_invoke. This can either be an S3 uri or a local path.
rule_to_invoke
(str): The name of the rule to invoke within the source. If provided, you must also provide the source.
container_local_output_path
(str): The path in the container.
s3_output_path
(str): The location in Amazon S3 to store the output. The default Debugger output path for profiling data is created under the default output path of the :class:'~sagemaker.estimator.Estimator' class. For example, s3://sagemaker-<region>-<12digit_account_id>/<training-job-name>/profiler-output/.
rule_parameters
(dict): A dictionary of parameters for the rule.
:class:'~sagemaker.debugger.ProfilerRule': The instance of the custom ProfilerRule.
to_profiler_rule_config_list()
Generates a request dictionary using the parameters provided when initializing the object.
ProfilerRule$to_profiler_rule_config_list()
dict: A portion of an API request as a dictionary.
clone()
The objects of this class are cloneable with this method.
ProfilerRule$clone(deep = FALSE)
deep
Whether to make a deep clone.
List the Python profiler options for Python profiling.
PythonProfiler
An object of class PythonProfiler (inherits from Enum, environment) of length 2: an environment containing [CPROFILE, PYINSTRUMENT].
The configuration for framework metrics to be collected for Python profiling.
sagemaker.common::MetricsConfigBase
-> PythonProfilingConfig
new()
Choose a Python profiler: cProfile or Pyinstrument. Specify target steps or a target duration to profile. If no parameter is specified, it profiles based on the profiling configurations preset by the profile_default_steps parameter, which is set to 'True' by default. If you specify any of the following parameters, the profile_default_steps parameter will be ignored.
PythonProfilingConfig$new( start_step = NULL, num_steps = NULL, start_unix_time = NULL, duration = NULL, profile_default_steps = FALSE, python_profiler = PythonProfiler$CPROFILE, cprofile_timer = cProfileTimer$TOTAL_TIME )
start_step
(int): The step to start profiling. The default is step 9.
num_steps
(int): The number of steps to profile. The default is 3 steps.
start_unix_time
(int): The Unix time to start profiling.
duration
(float): The duration in seconds to profile.
profile_default_steps
(bool): Indicates whether the default configuration should be used. If set to 'True', Python profiling is done at steps 9, 10, and 11 of training, using cProfile and collecting metrics based on the total time, CPU time, and off-CPU time for these three steps respectively. The default is “True“.
python_profiler
(PythonProfiler): The Python profiler to use to collect python profiling stats. Available options are “"cProfile"“ and “"Pyinstrument"“. The default is “"cProfile"“. Instead of passing the string values, you can also use the enumerator util, :class:'~sagemaker.debugger.utils.PythonProfiler', to choose one of the available options.
cprofile_timer
(cProfileTimer): The timer to be used by cProfile when collecting python profiling stats. Available options are “"total_time"“, “"cpu_time"“, and “"off_cpu_time"“. The default is “"total_time"“. If you choose Pyinstrument, this parameter is ignored. Instead of passing the string values, you can also use the enumerator util, :class:'~sagemaker.debugger.utils.cProfileTimer', to choose one of the available options.
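For illustration, a minimal sketch that profiles five steps starting at step 10 using Pyinstrument (with which the cprofile_timer parameter is ignored):
# Profile steps 10-14 with the Pyinstrument profiler.
python_config = PythonProfilingConfig$new(
  start_step = 10,
  num_steps = 5,
  python_profiler = PythonProfiler$PYINSTRUMENT
)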
clone()
The objects of this class are cloneable with this method.
PythonProfilingConfig$clone(deep = FALSE)
deep
Whether to make a deep clone.
With this input, SQL queries will be executed using Redshift to generate datasets to S3.
sagemaker.core::ApiObject
-> RedshiftDatasetDefinition
new()
Initialize RedshiftDatasetDefinition.
RedshiftDatasetDefinition$new( cluster_id = NULL, database = NULL, db_user = NULL, query_string = NULL, cluster_role_arn = NULL, output_s3_uri = NULL, kms_key_id = NULL, output_format = NULL, output_compression = NULL )
cluster_id
(str, default=None): The Redshift cluster identifier.
database
(str, default=None): The name of the Redshift database used in Redshift query execution.
db_user
(str, default=None): The database user name used in Redshift query execution.
query_string
(str, default=None): The SQL query statements to be executed.
cluster_role_arn
(str, default=None): The IAM role attached to your Redshift cluster that Amazon SageMaker uses to generate datasets.
output_s3_uri
(str, default=None): The location in Amazon S3 where the Redshift query results are stored.
kms_key_id
(str, default=None): The AWS Key Management Service (AWS KMS) key that Amazon SageMaker uses to encrypt data from a Redshift execution.
output_format
(str, default=None): The data storage format for Redshift query results. Valid options are "PARQUET", "CSV"
output_compression
(str, default=None): The compression used for Redshift query results. Valid options are "None", "GZIP", "SNAPPY", "ZSTD", "BZIP2"
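For illustration, a minimal sketch of a Redshift dataset definition; the cluster, role, bucket, and query are hypothetical:
# Export the results of a Redshift query to S3 as compressed Parquet.
redshift_def = RedshiftDatasetDefinition$new(
  cluster_id = "my-redshift-cluster",
  database = "dev",
  db_user = "analyst",
  query_string = "SELECT * FROM sales",
  cluster_role_arn = "arn:aws:iam::111122223333:role/RedshiftRole",
  output_s3_uri = "s3://my-bucket/redshift-output/",
  output_format = "PARQUET",
  output_compression = "GZIP"
)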
clone()
The objects of this class are cloneable with this method.
RedshiftDatasetDefinition$clone(deep = FALSE)
deep
Whether to make a deep clone.
The SageMaker Debugger Rule class configures *debugging* rules to debug your training job. The debugging rules analyze tensor outputs from your training job and monitor conditions that are critical for the success of the training job. SageMaker Debugger comes pre-packaged with built-in *debugging* rules. For example, the debugging rules can detect whether gradients are getting too large or too small, or if a model is overfitting. For a full list of built-in rules for debugging, see 'List of Debugger Built-in Rules' https://docs.aws.amazon.com/sagemaker/latest/dg/debugger-built-in-rules.html. You can also write your own rules using the custom rule classmethod.
sagemaker.common::RuleBase
-> Rule
collection_configs
A list of :class:'~sagemaker.debugger.CollectionConfig' objects
actions
Placeholder
new()
Configure the debugging rules using the following classmethods.
Rule$new( name = NULL, image_uri = NULL, instance_type = NULL, container_local_output_path = NULL, s3_output_path = NULL, volume_size_in_gb = NULL, rule_parameters = NULL, collections_to_save = NULL, actions = NULL )
name
(str): The name of the rule.
image_uri
(str): The image URI to use for the rule.
instance_type
(str): Type of EC2 instance to use. For example, 'ml.c4.xlarge'.
container_local_output_path
(str): The local path to store the Rule output.
s3_output_path
(str): The location in S3 to store the output.
volume_size_in_gb
(int): Size in GB of the EBS volume to use for storing data.
rule_parameters
(dict): A dictionary of parameters for the rule.
collections_to_save
([sagemaker.debugger.CollectionConfig]): Optional. A list of :class:'~sagemaker.debugger.CollectionConfig' objects to be saved.
actions
(list): Optional. An action or list of actions to take when the rule is triggered.
sagemaker()
Initialize a “Rule“ object for a *built-in* debugging rule.
Rule$sagemaker( base_config, name = NULL, container_local_output_path = NULL, s3_output_path = NULL, other_trials_s3_input_paths = NULL, rule_parameters = NULL, collections_to_save = NULL, actions = NULL )
base_config
(dict): Required. This is the base rule config dictionary returned from a :class:sagemaker.debugger method, for example sagemaker.debugger::dead_relu(). For a full list of built-in rules for debugging, see 'List of Debugger Built-in Rules' https://docs.aws.amazon.com/sagemaker/latest/dg/debugger-built-in-rules.html.
name
(str): Optional. The name of the debugger rule. If one is not provided, the name of the base_config will be used.
container_local_output_path
(str): Optional. The local path in the rule processing container.
s3_output_path
(str): Optional. The location in Amazon S3 to store the output tensors. The default Debugger output path for debugging data is created under the default output path of the :class:'~sagemaker.estimator.Estimator' class. For example, s3://sagemaker-<region>-<12digit_account_id>/<training-job-name>/debug-output/.
other_trials_s3_input_paths
([str]): Optional. The Amazon S3 input paths of other trials to use the SimilarAcrossRuns rule.
rule_parameters
(dict): Optional. A dictionary of parameters for the rule.
collections_to_save
(list[:class:'~sagemaker.debugger.CollectionConfig']): Optional. A list of :class:'~sagemaker.debugger.CollectionConfig' objects to be saved.
actions
(list): Optional. An action or list of actions to take when the rule is triggered.
:class:'~sagemaker.debugger.Rule': An instance of the built-in rule.
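For illustration, a minimal sketch creating a built-in debugging rule from the dead_relu base configuration referenced above:
# Create the built-in dead ReLU detection rule with default settings.
rule = Rule$sagemaker(
  base_config = sagemaker.debugger::dead_relu()
)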
custom()
Initialize a “Rule“ object for a *custom* debugging rule. You can create a custom rule that analyzes tensors emitted during the training of a model and monitors conditions that are critical for the success of a training job. For more information, see 'Create Debugger Custom Rules for Training Job Analysis' https://docs.aws.amazon.com/sagemaker/latest/dg/debugger-custom-rules.html.
Rule$custom( name, image_uri, instance_type, volume_size_in_gb, source = NULL, rule_to_invoke = NULL, container_local_output_path = NULL, s3_output_path = NULL, other_trials_s3_input_paths = NULL, rule_parameters = NULL, collections_to_save = NULL, actions = NULL )
name
(str): Required. The name of the debugger rule.
image_uri
(str): Required. The URI of the image to be used by the debugger rule.
instance_type
(str): Required. Type of EC2 instance to use, for example, 'ml.c4.xlarge'.
volume_size_in_gb
(int): Required. Size in GB of the EBS volume to use for storing data.
source
(str): Optional. A source file containing a rule to invoke. If provided, you must also provide rule_to_invoke. This can either be an S3 uri or a local path.
rule_to_invoke
(str): Optional. The name of the rule to invoke within the source. If provided, you must also provide source.
container_local_output_path
(str): Optional. The local path in the container.
s3_output_path
(str): Optional. The location in Amazon S3 to store the output tensors. The default Debugger output path for debugging data is created under the default output path of the :class:'~sagemaker.estimator.Estimator' class. For example, s3://sagemaker-<region>-<12digit_account_id>/<training-job-name>/debug-output/.
other_trials_s3_input_paths
([str]): Optional. The Amazon S3 input paths of other trials to use the SimilarAcrossRuns rule.
rule_parameters
(dict): Optional. A dictionary of parameters for the rule.
collections_to_save
([sagemaker.debugger.CollectionConfig]): Optional. A list of :class:'~sagemaker.debugger.CollectionConfig' objects to be saved.
actions
(list): Optional. An action or list of actions to take when the rule is triggered.
:class:'~sagemaker.debugger.Rule': The instance of the custom rule.
prepare_actions()
Prepare actions for Debugger Rule.
Rule$prepare_actions(training_job_name)
training_job_name
(str): The training job name. To be set as the default training job prefix for the StopTraining action if it is specified.
to_debugger_rule_config_list()
Generates a request dictionary using the parameters provided when initializing the object.
Rule$to_debugger_rule_config_list()
dict: A portion of an API request as a dictionary.
clone()
The objects of this class are cloneable with this method.
Rule$clone(deep = FALSE)
deep
Whether to make a deep clone.
Debugger rule classes inheriting this RuleBase class are :class:'~sagemaker.debugger.Rule' and :class:'~sagemaker.debugger.ProfilerRule'. Do not directly use the rule base class to instantiate a SageMaker Debugger rule. Use the :class:'~sagemaker.debugger.Rule' classmethods for debugging and the :class:'~sagemaker.debugger.ProfilerRule' classmethods for profiling.
name
(str): The name of the rule.
image_uri
(str): The image URI to use for the rule.
instance_type
(str): Type of EC2 instance to use. For example, 'ml.c4.xlarge'.
container_local_output_path
(str): The local path to store the Rule output.
s3_output_path
(str): The location in S3 to store the output.
volume_size_in_gb
(int): Size in GB of the EBS volume to use for storing data.
rule_parameters
(dict): A dictionary of parameters for the rule.
new()
Initialize RuleBase class
RuleBase$new( name = NULL, image_uri = NULL, instance_type = NULL, container_local_output_path = NULL, s3_output_path = NULL, volume_size_in_gb = NULL, rule_parameters = NULL )
name
(str): The name of the rule.
image_uri
(str): The image URI to use for the rule.
instance_type
(str): Type of EC2 instance to use. For example, 'ml.c4.xlarge'.
container_local_output_path
(str): The local path to store the Rule output.
s3_output_path
(str): The location in S3 to store the output.
volume_size_in_gb
(int): Size in GB of the EBS volume to use for storing data.
rule_parameters
(dict): A dictionary of parameters for the rule.
format()
format class
RuleBase$format()
clone()
The objects of this class are cloneable with this method.
RuleBase$clone(deep = FALSE)
deep
Whether to make a deep clone.
Accepts parameters that correspond to ScriptProcessors.
code
This can be an S3 URI or a local path to a file with the framework script to run
inputs
Input files for the processing job
outputs
Outputs for the processing job
arguments
A list of string arguments to be passed to a processing job
new()
An instance of this class is returned from the “get_run_args()“ method on processors, and is used for normalizing the arguments so that they can be passed to :class:'~sagemaker.workflow.steps.ProcessingStep'
RunArgs$new(code, inputs = NULL, outputs = NULL, arguments = NULL)
code
(str): This can be an S3 URI or a local path to a file with the framework script to run.
inputs
(list[:class:'~sagemaker.processing.ProcessingInput']): Input files for the processing job. These must be provided as :class:'~sagemaker.processing.ProcessingInput' objects (default: None).
outputs
(list[:class:'~sagemaker.processing.ProcessingOutput']): Outputs for the processing job. These can be specified as either path strings or :class:'~sagemaker.processing.ProcessingOutput' objects (default: None).
arguments
(list[str]): A list of string arguments to be passed to a processing job (default: None).
clone()
The objects of this class are cloneable with this method.
RunArgs$clone(deep = FALSE)
deep
Whether to make a deep clone.
Two options are provided: specifying an S3 prefix, or explicitly listing the files in a manifest file and referencing the manifest file's S3 path.
sagemaker.core::ApiObject
-> S3Input
new()
Initialize S3Input.
S3Input$new( s3_uri = NULL, local_path = NULL, s3_data_type = "S3Prefix", s3_input_mode = "File", s3_data_distribution_type = "FullyReplicated", s3_compression_type = NULL )
s3_uri
(str, default=None): the path to a specific S3 object or an S3 prefix.
local_path
(str, default=None): the path to a local directory. If not provided, skips data download by SageMaker platform.
s3_data_type
(str, default="S3Prefix"): Valid options are "ManifestFile" or "S3Prefix".
s3_input_mode
(str, default="File"): Valid options are "Pipe" or "File".
s3_data_distribution_type
(str, default="FullyReplicated"): Valid options are "FullyReplicated" or "ShardedByS3Key".
s3_compression_type
(str, default=None): Valid options are "None" or "Gzip"
clone()
The objects of this class are cloneable with this method.
S3Input$clone(deep = FALSE)
deep
Whether to make a deep clone.
Note: Strong consistency is not guaranteed if S3Prefix is provided here. S3 list operations are not strongly consistent. Use ManifestFile if strong consistency is required.
Handles SageMaker Processing task to compute bias metrics and explain a model.
sagemaker.common::Processor
-> SageMakerClarifyProcessor
job_name_prefix
Processing job name prefix
new()
Initializes a “Processor“ instance, computing bias metrics and model explanations.
SageMakerClarifyProcessor$new( role, instance_count, instance_type, volume_size_in_gb = 30, volume_kms_key = NULL, output_kms_key = NULL, max_runtime_in_seconds = NULL, sagemaker_session = NULL, env = NULL, tags = NULL, network_config = NULL, job_name_prefix = NULL, version = NULL )
role
(str): An AWS IAM role name or ARN. Amazon SageMaker Processing uses this role to access AWS resources, such as data stored in Amazon S3.
instance_count
(int): The number of instances to run a processing job with.
instance_type
(str): The type of EC2 instance to use for processing, for example, 'ml.c4.xlarge'.
volume_size_in_gb
(int): Size in GB of the EBS volume to use for storing data during processing (default: 30).
volume_kms_key
(str): A KMS key for the processing volume (default: None).
output_kms_key
(str): The KMS key ID for processing job outputs (default: None).
max_runtime_in_seconds
(int): Timeout in seconds (default: None). After this amount of time, Amazon SageMaker terminates the job, regardless of its current status. If 'max_runtime_in_seconds' is not specified, the default value is 24 hours.
sagemaker_session
(:class:'~sagemaker.session.Session'): Session object which manages interactions with Amazon SageMaker and any other AWS services needed. If not specified, the processor creates one using the default AWS configuration chain.
env
(dict[str, str]): Environment variables to be passed to the processing jobs (default: None).
tags
(list[dict]): List of tags to be passed to the processing job (default: None). For more, see https://docs.aws.amazon.com/sagemaker/latest/dg/API_Tag.html.
network_config
(:class:'~sagemaker.network.NetworkConfig'): A :class:'~sagemaker.network.NetworkConfig' object that configures network isolation, encryption of inter-container traffic, security group IDs, and subnets.
job_name_prefix
(str): Processing job name prefix.
version
(str): The Clarify version to be used.
run()
Overrides the base class method, deferring to the specific run_* methods.
SageMakerClarifyProcessor$run()
run_pre_training_bias()
Runs a ProcessingJob to compute the requested bias 'methods' of the input data. Computes the requested 'methods' (e.g. fraction of examples) that compare the sensitive group against the other examples.
SageMakerClarifyProcessor$run_pre_training_bias( data_config, data_bias_config, methods = "all", wait = TRUE, logs = TRUE, job_name = NULL, kms_key = NULL, experiment_config = NULL )
data_config
(:class:'~sagemaker.clarify.DataConfig'): Config of the input/output data.
data_bias_config
(:class:'~sagemaker.clarify.BiasConfig'): Config of sensitive groups.
methods
(str or list[str]): Selector of a subset of potential metrics:
'CI' https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-post-training-bias-metric-ci.html
'DPL' https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-post-training-bias-metric-dpl.html
'KL' https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-post-training-bias-metric-kl.html
'JS' https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-post-training-bias-metric-js.html
'LP' https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-post-training-bias-metric-lp.html
'TVD' https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-post-training-bias-metric-tvd.html
'KS' https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-post-training-bias-metric-ks.html
'CDDL' https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-post-training-bias-metric-cdd.html
Defaults to computing all.
wait
(bool): Whether the call should wait until the job completes (default: True).
logs
(bool): Whether to show the logs produced by the job. Only meaningful when “wait“ is True (default: True).
job_name
(str): Processing job name. If not specified, a name is composed of "Clarify-Pretraining-Bias" and current timestamp.
kms_key
(str): The ARN of the KMS key that is used to encrypt the user code file (default: None).
experiment_config
(dict[str, str]): Experiment management configuration. Dictionary contains three optional keys: 'ExperimentName', 'TrialName', and 'TrialComponentDisplayName'.
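For illustration, a minimal sketch of a pre-training bias run; 'clarify_processor', 'data_config', and 'bias_config' are assumed to be previously constructed SageMakerClarifyProcessor, DataConfig, and BiasConfig objects:
# Compute only the CI and DPL pre-training bias metrics.
clarify_processor$run_pre_training_bias(
  data_config = data_config,
  data_bias_config = bias_config,
  methods = list("CI", "DPL"),
  wait = TRUE
)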
run_post_training_bias()
Runs a ProcessingJob to compute the requested bias 'methods' of the model predictions. Spins up a model endpoint and runs inference over the input examples in the 's3_data_input_path' to obtain predicted labels. Computes the requested 'methods' (e.g. accuracy, precision, recall) that compare the sensitive group against the other examples.
SageMakerClarifyProcessor$run_post_training_bias( data_config, data_bias_config, model_config, model_predicted_label_config, methods = "all", wait = TRUE, logs = TRUE, job_name = NULL, kms_key = NULL, experiment_config = NULL )
data_config
(:class:'~sagemaker.clarify.DataConfig'): Config of the input/output data.
data_bias_config
(:class:'~sagemaker.clarify.BiasConfig'): Config of sensitive groups.
model_config
(:class:'~sagemaker.clarify.ModelConfig'): Config of the model and its endpoint to be created.
model_predicted_label_config
(:class:'~sagemaker.clarify.ModelPredictedLabelConfig'): Config of how to extract the predicted label from the model output.
methods
(str or list[str]): Selector of a subset of potential metrics:
'CI' https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-post-training-bias-metric-ci.html
'DPL' https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-post-training-bias-metric-dpl.html
'KL' https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-post-training-bias-metric-kl.html
'JS' https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-post-training-bias-metric-js.html
'LP' https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-post-training-bias-metric-lp.html
'TVD' https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-post-training-bias-metric-tvd.html
'KS' https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-post-training-bias-metric-ks.html
'CDDL' https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-post-training-bias-metric-cdd.html
Defaults to computing all.
wait
(bool): Whether the call should wait until the job completes (default: True).
logs
(bool): Whether to show the logs produced by the job. Only meaningful when “wait“ is True (default: True).
job_name
(str): Processing job name. If not specified, a name is composed of "Clarify-Posttraining-Bias" and current timestamp.
kms_key
(str): The ARN of the KMS key that is used to encrypt the user code file (default: None).
experiment_config
(dict[str, str]): Experiment management configuration. Dictionary contains three optional keys: 'ExperimentName', 'TrialName', and 'TrialComponentDisplayName'.
run_bias()
Runs a ProcessingJob to compute the requested bias 'methods' of the model predictions. Spins up a model endpoint and runs inference over the input examples in the 's3_data_input_path' to obtain predicted labels. Computes the requested 'methods' (e.g. accuracy, precision, recall) that compare the sensitive group against the other examples.
SageMakerClarifyProcessor$run_bias( data_config, bias_config, model_config, model_predicted_label_config = NULL, pre_training_methods = "all", post_training_methods = "all", wait = TRUE, logs = TRUE, job_name = NULL, kms_key = NULL, experiment_config = NULL )
data_config
(:class:'~sagemaker.clarify.DataConfig'): Config of the input/output data.
bias_config
(:class:'~sagemaker.clarify.BiasConfig'): Config of sensitive groups.
model_config
(:class:'~sagemaker.clarify.ModelConfig'): Config of the model and its endpoint to be created.
model_predicted_label_config
(:class:'~sagemaker.clarify.ModelPredictedLabelConfig'): Config of how to extract the predicted label from the model output.
pre_training_methods
(str or list[str]): Selector of a subset of potential metrics:
'CI' https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-post-training-bias-metric-ci.html
'DPL' https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-post-training-bias-metric-dpl.html
'KL' https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-post-training-bias-metric-kl.html
'JS' https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-post-training-bias-metric-js.html
'LP' https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-post-training-bias-metric-lp.html
'TVD' https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-post-training-bias-metric-tvd.html
'KS' https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-post-training-bias-metric-ks.html
'CDDL' https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-post-training-bias-metric-cdd.html
Defaults to computing all.
post_training_methods
(str or list[str]): Selector of a subset of potential metrics:
'DPPL' https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-post-training-bias-metric-dppl.html
'DI' https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-post-training-bias-metric-di.html
'DCA' https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-post-training-bias-metric-dca.html
'DCR' https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-post-training-bias-metric-dcr.html
'RD' https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-post-training-bias-metric-rd.html
'DAR' https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-post-training-bias-metric-dar.html
'DRR' https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-post-training-bias-metric-drr.html
'AD' https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-post-training-bias-metric-ad.html
'CDDPL' https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-post-training-bias-metric-cddpl.html
'TE' https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-post-training-bias-metric-te.html
'FT' https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-post-training-bias-metric-ft.html
Defaults to computing all.
wait
(bool): Whether the call should wait until the job completes (default: True).
logs
(bool): Whether to show the logs produced by the job. Only meaningful when “wait“ is True (default: True).
job_name
(str): Processing job name. If not specified, a name is composed of "Clarify-Bias" and current timestamp.
kms_key
(str): The ARN of the KMS key that is used to encrypt the user code file (default: None).
experiment_config
(dict[str, str]): Experiment management configuration. Dictionary contains three optional keys: 'ExperimentName', 'TrialName', and 'TrialComponentDisplayName'.
run_explainability()
Runs a ProcessingJob that computes the feature importance for each example in the input. Currently, only SHAP is supported as the explainability method. Spins up a model endpoint. For each input example in the 's3_data_input_path', the SHAP algorithm determines feature importance by creating 'num_samples' copies of the example with a subset of features replaced with values from the 'baseline'. Model inference is run to see how the prediction changes with the replaced features. If the model output returns multiple scores, importance is computed for each of them. Across examples, feature importance is aggregated using 'agg_method'.
SageMakerClarifyProcessor$run_explainability( data_config, model_config, explainability_config, model_scores = NULL, wait = TRUE, logs = TRUE, job_name = NULL, kms_key = NULL, experiment_config = NULL )
data_config
(:class:'~sagemaker.clarify.DataConfig'): Config of the input/output data.
model_config
(:class:'~sagemaker.clarify.ModelConfig'): Config of the model and its endpoint to be created.
explainability_config
(:class:'~sagemaker.clarify.ExplainabilityConfig'): Config of the specific explainability method. Currently, only SHAP is supported.
model_scores
: Index or JSONPath location in the model output for the predicted scores to be explained. This is not required if the model output is a single score.
wait
(bool): Whether the call should wait until the job completes (default: True).
logs
(bool): Whether to show the logs produced by the job. Only meaningful when “wait“ is True (default: True).
job_name
(str): Processing job name. If not specified, a name is composed of "Clarify-Explainability" and current timestamp.
kms_key
(str): The ARN of the KMS key that is used to encrypt the user code file (default: None).
experiment_config
(dict[str, str]): Experiment management configuration. Dictionary contains three optional keys: 'ExperimentName', 'TrialName', and 'TrialComponentDisplayName'.
clone()
The objects of this class are cloneable with this method.
SageMakerClarifyProcessor$clone(deep = FALSE)
deep
Whether to make a deep clone.
Handles Amazon SageMaker processing tasks for jobs using a machine learning framework.
sagemaker.common::Processor
-> ScriptProcessor
new()
Initializes a “ScriptProcessor“ instance. The “ScriptProcessor“ handles Amazon SageMaker Processing tasks for jobs using a machine learning framework, which allows for providing a script to be run as part of the Processing Job.
ScriptProcessor$new( role, image_uri, command, instance_count, instance_type, volume_size_in_gb = 30, volume_kms_key = NULL, output_kms_key = NULL, max_runtime_in_seconds = NULL, base_job_name = NULL, sagemaker_session = NULL, env = NULL, tags = NULL, network_config = NULL )
role
(str): An AWS IAM role name or ARN. Amazon SageMaker Processing uses this role to access AWS resources, such as data stored in Amazon S3.
image_uri
(str): The URI of the Docker image to use for the processing jobs.
command
([str]): The command to run, along with any command-line flags. Example: ["python3", "-v"].
instance_count
(int): The number of instances to run a processing job with.
instance_type
(str): The type of EC2 instance to use for processing, for example, 'ml.c4.xlarge'.
volume_size_in_gb
(int): Size in GB of the EBS volume to use for storing data during processing (default: 30).
volume_kms_key
(str): A KMS key for the processing volume (default: NULL).
output_kms_key
(str): The KMS key ID for processing job outputs (default: NULL).
max_runtime_in_seconds
(int): Timeout in seconds (default: NULL). After this amount of time, Amazon SageMaker terminates the job, regardless of its current status. If 'max_runtime_in_seconds' is not specified, the default value is 24 hours.
base_job_name
(str): Prefix for processing name. If not specified, the processor generates a default job name, based on the processing image name and current timestamp.
sagemaker_session
(:class:'~sagemaker.session.Session'): Session object which manages interactions with Amazon SageMaker and any other AWS services needed. If not specified, the processor creates one using the default AWS configuration chain.
env
(dict[str, str]): Environment variables to be passed to the processing jobs (default: NULL).
tags
(list[dict]): List of tags to be passed to the processing job (default: NULL). For more, see https://docs.aws.amazon.com/sagemaker/latest/dg/API_Tag.html.
network_config
(:class:'~sagemaker.network.NetworkConfig'): A :class:'~sagemaker.network.NetworkConfig' object that configures network isolation, encryption of inter-container traffic, security group IDs, and subnets.
get_run_args()
Returns a RunArgs object. For processors (:class:'~sagemaker.spark.processing.PySparkProcessor', :class:'~sagemaker.spark.processing.SparkJar') that have special run() arguments, this object contains the normalized arguments for passing to :class:'~sagemaker.workflow.steps.ProcessingStep'.
ScriptProcessor$get_run_args( code, inputs = NULL, outputs = NULL, arguments = NULL )
code
(str): This can be an S3 URI or a local path to a file with the framework script to run.
inputs
(list[:class:'~sagemaker.processing.ProcessingInput']): Input files for the processing job. These must be provided as :class:'~sagemaker.processing.ProcessingInput' objects (default: None).
outputs
(list[:class:'~sagemaker.processing.ProcessingOutput']): Outputs for the processing job. These can be specified as either path strings or :class:'~sagemaker.processing.ProcessingOutput' objects (default: None).
arguments
(list[str]): A list of string arguments to be passed to a processing job (default: None).
run()
Runs a processing job.
ScriptProcessor$run( code, inputs = NULL, outputs = NULL, arguments = NULL, wait = TRUE, logs = TRUE, job_name = NULL, experiment_config = NULL, kms_key = NULL )
code
(str): This can be an S3 URI or a local path to a file with the framework script to run.
inputs
(list[:class:'~sagemaker.processing.ProcessingInput']): Input files for the processing job. These must be provided as :class:'~sagemaker.processing.ProcessingInput' objects (default: NULL).
outputs
(list[:class:'~sagemaker.processing.ProcessingOutput']): Outputs for the processing job. These can be specified as either path strings or :class:'~sagemaker.processing.ProcessingOutput' objects (default: NULL).
arguments
(list[str]): A list of string arguments to be passed to a processing job (default: NULL).
wait
(bool): Whether the call should wait until the job completes (default: True).
logs
(bool): Whether to show the logs produced by the job. Only meaningful when wait is True (default: True).
job_name
(str): Processing job name. If not specified, the processor generates a default job name, based on the base job name and current timestamp.
experiment_config
(dict[str, str]): Experiment management configuration. Dictionary contains three optional keys: 'ExperimentName', 'TrialName', and 'TrialComponentDisplayName'.
kms_key
(str): The ARN of the KMS key that is used to encrypt the user code file (default: None).
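For illustration, a minimal sketch of a scripted run; the role ARN, image URI, and script location are hypothetical:
# Run a single script inside a custom container image.
script_processor = ScriptProcessor$new(
  role = "arn:aws:iam::111122223333:role/SageMakerRole",
  image_uri = "111122223333.dkr.ecr.us-east-1.amazonaws.com/my-image:latest",
  command = list("python3"),
  instance_count = 1,
  instance_type = "ml.m5.xlarge"
)
script_processor$run(code = "s3://my-bucket/code/preprocess.py")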
clone()
The objects of this class are cloneable with this method.
ScriptProcessor$clone(deep = FALSE)
deep
Whether to make a deep clone.
Other Processor: ProcessingInput, ProcessingJob, ProcessingOutput, Processor
This object specifies configuration related to a serverless endpoint. Use this configuration when creating a serverless endpoint and making serverless inference requests.
memory_size_in_mb
The memory size of your serverless endpoint.
max_concurrency
The maximum number of concurrent invocations your serverless endpoint can process
new()
Initialize a ServerlessInferenceConfig object for serverless inference configuration.
ServerlessInferenceConfig$new(memory_size_in_mb = 2048, max_concurrency = 5)
memory_size_in_mb
(int): Optional. The memory size of your serverless endpoint. Valid values are in 1 GB increments: 1024 MB, 2048 MB, 3072 MB, 4096 MB, 5120 MB, or 6144 MB. If no value is provided, Amazon SageMaker will choose the default value for you. (Default: 2048)
max_concurrency
(int): Optional. The maximum number of concurrent invocations your serverless endpoint can process. If no value is provided, Amazon SageMaker will choose the default value for you. (Default: 5)
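For illustration, a minimal sketch using one of the documented 1 GB-increment memory sizes:
# 4 GB of endpoint memory, up to 10 concurrent invocations.
serverless_config = ServerlessInferenceConfig$new(
  memory_size_in_mb = 4096,
  max_concurrency = 10
)
serverless_config$to_request_list()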
to_request_list()
Generates a request dictionary using the parameters provided to the class.
ServerlessInferenceConfig$to_request_list()
format()
Format class
ServerlessInferenceConfig$format()
clone()
The objects of this class are cloneable with this method.
ServerlessInferenceConfig$clone(deep = FALSE)
deep
Whether to make a deep clone.
Config class of SHAP.
sagemaker.common::ExplainabilityConfig
-> SHAPConfig
shap_config
Shap Config
new()
Initializes config for SHAP.
SHAPConfig$new( baseline, num_samples, agg_method = c("mean_abs", "median", "mean_sq"), use_logit = FALSE, save_local_shap_values = TRUE, seed = NULL, num_clusters = NULL, text_config = NULL, image_config = NULL )
baseline
(str or list): A list of rows (at least one) or S3 object URI to be used as the baseline dataset in the Kernel SHAP algorithm. The format should be the same as the dataset format. Each row should contain only the feature columns/values and omit the label column/values.
num_samples
(int): Number of samples to be used in the Kernel SHAP algorithm. This number determines the size of the generated synthetic dataset to compute the SHAP values.
agg_method
(str): Aggregation method for global SHAP values. Valid values are "mean_abs" (mean of absolute SHAP values for all instances), "median" (median of SHAP values for all instances) and "mean_sq" (mean of squared SHAP values for all instances).
use_logit
(bool): Indicator of whether the logit function is to be applied to the model predictions. Default is False. If "use_logit" is true then the SHAP values will have log-odds units.
save_local_shap_values
(bool): Indicator of whether to save the local SHAP values in the output location. Default is True.
seed
(int): seed value to get deterministic SHAP values. Default is NULL.
num_clusters
(NULL or int): If a baseline is not provided, Clarify automatically computes a baseline dataset via a clustering algorithm (K-means/K-prototypes). num_clusters is a parameter for this algorithm. num_clusters will be the resulting size of the baseline dataset. If not provided, Clarify job will use a default value.
text_config
(:class:'~sagemaker.clarify.TextConfig'): Config to handle text features. Default is NULL
image_config
(:class:'~sagemaker.clarify.ImageConfig'): Config to handle image features. Default is NULL
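For illustration, a minimal sketch with a single-row baseline; the feature values are hypothetical and assume a three-feature dataset:
# Kernel SHAP over 100 synthetic samples, aggregated by mean absolute value.
shap_config = SHAPConfig$new(
  baseline = list(list(0.5, 1.0, 2.0)),
  num_samples = 100,
  agg_method = "mean_abs",
  save_local_shap_values = TRUE
)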
get_explainability_config()
Returns config.
SHAPConfig$get_explainability_config()
clone()
The objects of this class are cloneable with this method.
SHAPConfig$clone(deep = FALSE)
deep
Whether to make a deep clone.
Configuration for framework metrics collected from a SageMaker Distributed training job.
sagemaker.common::MetricsConfigBase
-> SMDataParallelProfilingConfig
new()
Specify target steps or a target duration to profile. By default, it profiles step 15 of training. If profile_default_steps is set to 'True' and none of the other range parameters is specified, the class uses the default configuration for SageMaker Distributed profiling.
SMDataParallelProfilingConfig$new( start_step = NULL, num_steps = NULL, start_unix_time = NULL, duration = NULL, profile_default_steps = FALSE )
start_step
(int): The step to start profiling. The default is step 15.
num_steps
(int): The number of steps to profile. The default is 1 step.
start_unix_time
(int): The Unix time to start profiling.
duration
(float): The duration in seconds to profile.
profile_default_steps
(bool): Indicates whether the default configuration should be used.
clone()
The objects of this class are cloneable with this method.
SMDataParallelProfilingConfig$clone(deep = FALSE)
deep
Whether to make a deep clone.
It returns the target steps in dictionary format that you can pass to the :class:'~sagemaker.debugger.FrameworkProfile' class.
new()
Set the start step and num steps. If the start step is not specified, Debugger starts profiling at step 0. If num steps is not specified, Debugger profiles for 1 step.
StepRange$new(start_step, num_steps)
start_step
(int): The step to start profiling.
num_steps
(int): The number of steps to profile.
to_json()
Convert the step range into a dictionary.
StepRange$to_json()
list: The step range as a dictionary.
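For illustration, a minimal sketch converting a step range for use with :class:'~sagemaker.debugger.FrameworkProfile':
# Profile 10 steps starting at step 5.
step_range = StepRange$new(start_step = 5, num_steps = 10)
step_range$to_json()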
format()
format class
StepRange$format()
clone()
The objects of this class are cloneable with this method.
StepRange$clone(deep = FALSE)
deep
Whether to make a deep clone.
Creates a tensor output configuration object for debugging visualizations on TensorBoard.
s3_output_path
The location in Amazon S3 to store the output.
container_local_output_path
The local path in the container.
new()
Initialize the TensorBoardOutputConfig instance.
TensorBoardOutputConfig$new(s3_output_path, container_local_output_path = NULL)
s3_output_path
(str): Optional. The location in Amazon S3 to store the output.
container_local_output_path
(str): Optional. The local path in the container.
to_request_list()
Generate a request dictionary using the instance's attributes.
TensorBoardOutputConfig$to_request_list()
dict: A portion of an API request as a dictionary.
format()
format class
TensorBoardOutputConfig$format()
clone()
The objects of this class are cloneable with this method.
TensorBoardOutputConfig$clone(deep = FALSE)
deep
Whether to make a deep clone.
The SHAP analysis will break down longer text into chunks (e.g. tokens, sentences, or paragraphs) and replace them with the strings specified in the baseline for that feature. The SHAP value of a chunk then captures how much replacing it affects the prediction.
sagemaker.common::ExplainabilityConfig
-> TextConfig
text_config
Text Config
new()
Initializes a text configuration.
TextConfig$new(granularity, language)
granularity
(str): Determines the granularity to which text features are broken down; can be "token", "sentence", or "paragraph". SHAP values are computed for these units.
language
(str): Specifies the language of the text features, can be "chinese", "danish", "dutch", "english", "french", "german", "greek", "italian", "japanese", "lithuanian", "multi-language", "norwegian bokmal", "polish", "portuguese", "romanian", "russian", "spanish", "afrikaans", "albanian", "arabic", "armenian", "basque", "bengali", "bulgarian", "catalan", "croatian", "czech", "estonian", "finnish", "gujarati", "hebrew", "hindi", "hungarian", "icelandic", "indonesian", "irish", "kannada", "kyrgyz", "latvian", "ligurian", "luxembourgish", "macedonian", "malayalam", "marathi", "nepali", "persian", "sanskrit", "serbian", "setswana", "sinhala", "slovak", "slovenian", "swedish", "tagalog", "tamil", "tatar", "telugu", "thai", "turkish", "ukrainian", "urdu", "vietnamese", "yoruba". Use "multi-language" for a mix of multiple languages.
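For illustration, a minimal sketch of sentence-level analysis over English text:
# Compute SHAP values per sentence for English-language features.
text_config = TextConfig$new(granularity = "sentence", language = "english")
text_config$get_text_config()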
get_text_config()
Returns part of an analysis config dictionary.
TextConfig$get_text_config()
clone()
The objects of this class are cloneable with this method.
TextConfig$clone(deep = FALSE)
deep
Whether to make a deep clone.
It returns the target time duration in dictionary format that you can pass to the :class:'~sagemaker.debugger.FrameworkProfile' class.
new()
Set the start Unix time and duration. If the start Unix time is not specified, profiling starts at step 0. If the duration is not specified, profiling runs for 1 step.
TimeRange$new(start_unix_time = NULL, duration = NULL)
start_unix_time
(int): The Unix time to start profiling.
duration
(float): The duration in seconds to profile.
to_json()
Convert the time range into a dictionary.
TimeRange$to_json()
dict: The time range as a dictionary.
format()
format class
TimeRange$format()
...
(ignored).
clone()
The objects of this class are cloneable with this method.
TimeRange$clone(deep = FALSE)
deep
Whether to make a deep clone.
Fetch training curve data from CloudWatch Metrics for a specific training job.
sagemaker.common::AnalyticsMetricsBase
-> TrainingJobAnalytics
CLOUDWATCH_NAMESPACE
CloudWatch namespace to return Training Job Analytics data
name
Name of the TrainingJob being analyzed
new()
Initialize a “TrainingJobAnalytics“ instance.
TrainingJobAnalytics$new( training_job_name, metric_names = NULL, sagemaker_session = NULL, start_time = NULL, end_time = NULL, period = NULL )
training_job_name
(str): name of the TrainingJob to analyze.
metric_names
(list, optional): string names of all the metrics to collect for this training job. If not specified, then it will use all metric names configured for this job.
sagemaker_session
(sagemaker.session.Session): Session object which manages interactions with Amazon SageMaker APIs and any other AWS services needed. If not specified, one is created using the default AWS configuration chain.
start_time
: Optional. The start time of the CloudWatch metrics query window.
end_time
: Optional. The end time of the CloudWatch metrics query window.
period
: Optional. The period, in seconds, over which CloudWatch metric data points are aggregated.
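For illustration, a minimal sketch for a hypothetical training job name; 'dataframe()' is inherited from “AnalyticsMetricsBase“:
# Fetch the training curve metrics as a tabular summary.
analytics = TrainingJobAnalytics$new(training_job_name = "my-training-job")
df = analytics$dataframe()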
clear_cache()
Clear the object of all local caches of API methods, so that the next time any properties are accessed they will be refreshed from the service.
TrainingJobAnalytics$clear_cache()
clone()
The objects of this class are cloneable with this method.
TrainingJobAnalytics$clone(deep = FALSE)
deep
Whether to make a deep clone.
A class for creating and interacting with Amazon SageMaker transform jobs.
new()
Initialize a “Transformer“.
Transformer$new( model_name, instance_count, instance_type, strategy = NULL, assemble_with = NULL, output_path = NULL, output_kms_key = NULL, accept = NULL, max_concurrent_transforms = NULL, max_payload = NULL, tags = NULL, env = NULL, base_transform_job_name = NULL, sagemaker_session = NULL, volume_kms_key = NULL )
model_name
(str): Name of the SageMaker model being used for the transform job.
instance_count
(int): Number of EC2 instances to use.
instance_type
(str): Type of EC2 instance to use, for example, 'ml.c4.xlarge'.
strategy
(str): The strategy used to decide how to batch records in a single request (default: None). Valid values: 'MultiRecord' and 'SingleRecord'.
assemble_with
(str): How the output is assembled (default: None). Valid values: 'Line' or 'None'.
output_path
(str): S3 location for saving the transform result. If not specified, results are stored to a default bucket.
output_kms_key
(str): Optional. KMS key ID for encrypting the transform output (default: None).
accept
(str): The accept header passed by the client to the inference endpoint. If it is supported by the endpoint, it will be the format of the batch transform output.
max_concurrent_transforms
(int): The maximum number of HTTP requests to be made to each individual transform container at one time.
max_payload
(int): Maximum size of the payload in a single HTTP request to the container in MB.
tags
(list[dict]): List of tags for labeling a transform job (default: None). For more information, see the SageMaker API documentation for Tag (https://docs.aws.amazon.com/sagemaker/latest/dg/API_Tag.html).
env
(dict): Environment variables to be set for use during the transform job (default: None).
base_transform_job_name
(str): Prefix for the transform job name when the transform() method launches a job. If not specified, a default prefix will be generated based on the training image name that was used to train the model associated with the transform job.
sagemaker_session
(sagemaker.session.Session): Session object which manages interactions with Amazon SageMaker APIs and any other AWS services needed. If not specified, the estimator creates one using the default AWS configuration chain.
volume_kms_key
(str): Optional. KMS key ID for encrypting the volume attached to the ML compute instance (default: None).
transform()
Start a new transform job.
Transformer$transform( data, data_type = "S3Prefix", content_type = NULL, compression_type = NULL, split_type = NULL, job_name = NULL, input_filter = NULL, output_filter = NULL, join_source = NULL, experiment_config = NULL, model_client_config = NULL, wait = TRUE, logs = TRUE, ... )
data
(str): Input data location in S3.
data_type
(str): What the S3 location defines (default: 'S3Prefix'). Valid values:
'S3Prefix' - the S3 URI defines a key name prefix. All objects with this prefix will be used as inputs for the transform job.
'ManifestFile' - the S3 URI points to a single manifest file listing each S3 object to use as an input for the transform job.
content_type
(str): MIME type of the input data (default: None).
compression_type
(str): Compression type of the input data, if compressed (default: None). Valid values: 'Gzip', None.
split_type
(str): The record delimiter for the input object (default: 'None'). Valid values: 'None', 'Line', 'RecordIO', and 'TFRecord'.
job_name
(str): Transform job name (default: None). If not specified, one will be generated.
input_filter
(str): A JSONPath to select a portion of the input to pass to the algorithm container for inference. If you omit the field, it gets the value '$', representing the entire input. For CSV data, each row is taken as a JSON array, so only index-based JSONPaths can be applied, e.g. $[0], $[1:]. CSV data should follow the RFC 4180 format (https://tools.ietf.org/html/rfc4180). See Supported JSONPath Operators (https://docs.aws.amazon.com/sagemaker/latest/dg/batch-transform-data-processing.html#data-processing-operators) for a table of supported JSONPath operators. For more information, see the SageMaker API documentation for CreateTransformJob (https://docs.aws.amazon.com/sagemaker/latest/dg/API_CreateTransformJob.html). Some examples: "$[1:]", "$.features" (default: None).
output_filter
(str): A JSONPath to select a portion of the joined/original output to return as the output. For more information, see the SageMaker API documentation for CreateTransformJob (https://docs.aws.amazon.com/sagemaker/latest/dg/API_CreateTransformJob.html). Some examples: "$[1:]", "$.prediction" (default: None).
join_source
(str): The source of data to be joined to the transform output. It can be set to 'Input' meaning the entire input record will be joined to the inference result. You can use OutputFilter to select the useful portion before uploading to S3. (default: None). Valid values: Input, None.
experiment_config
(dict[str, str]): Experiment management configuration. Dictionary contains three optional keys, 'ExperimentName', 'TrialName', and 'TrialComponentDisplayName'. (default: “None“).
model_client_config
(dict[str, str]): Model configuration. Dictionary contains two optional keys, 'InvocationsTimeoutInSeconds', and 'InvocationsMaxRetries'. (default: “None“).
wait
(bool): Whether the call should wait until the job completes (default: TRUE).
logs
(bool): Whether to show the logs produced by the job. Only meaningful when wait is True (default: TRUE).
...
Other parameters (currently not used)
Returns NULL, invisibly.
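To illustrate the end-to-end flow, a minimal sketch using a hypothetical model name and S3 locations:

# Create a transformer for an existing (hypothetical) SageMaker model.
transformer <- Transformer$new(
  model_name = "my-model",                        # hypothetical model name
  instance_count = 1,
  instance_type = "ml.c4.xlarge",
  output_path = "s3://my-bucket/transform-output" # hypothetical bucket
)

# Run a batch transform over every object under an S3 prefix and
# block until the job completes, streaming its logs.
transformer$transform(
  data = "s3://my-bucket/transform-input",        # hypothetical input prefix
  data_type = "S3Prefix",
  content_type = "text/csv",
  split_type = "Line",
  wait = TRUE,
  logs = TRUE
)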
delete_model()
Delete the corresponding SageMaker model for this Transformer.
Transformer$delete_model()
wait()
Wait for the latest running batch transform job.
Transformer$wait(logs = TRUE)
logs
(bool): Whether to show the logs produced by the job while waiting (default: TRUE).
stop_transform_job()
Stop the latest running batch transform job.
Transformer$stop_transform_job(wait = TRUE)
wait
(bool): Whether to wait until the transform job stops (default: TRUE).
attach()
Attach an existing transform job to a new Transformer instance.
Transformer$attach(transform_job_name, sagemaker_session = NULL)
transform_job_name
(str): Name for the transform job to be attached.
sagemaker_session
(sagemaker.session.Session): Session object which manages interactions with Amazon SageMaker APIs and any other AWS services needed. If not specified, one will be created using the default AWS configuration chain.
Transformer (class): The Transformer instance with the specified transform job attached.
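As a sketch, assuming attach() is called on an existing Transformer object as the signature above suggests (the job name is hypothetical):

# Re-attach to a previously launched transform job and wait on it.
attached <- transformer$attach(
  transform_job_name = "my-transform-job" # hypothetical existing job name
)
attached$wait(logs = TRUE)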
format()
format class
Transformer$format()
clone()
The objects of this class are cloneable with this method.
Transformer$clone(deep = FALSE)
deep
Whether to make a deep clone.