Title: | sagemaker pipeline and workflows |
---|---|
Description: | `sagemaker` pipeline and workflows. |
Authors: | Dyfan Jones [aut, cre], Amazon.com, Inc. [cph] |
Maintainer: | Dyfan Jones <[email protected]> |
License: | Apache License (>= 2.0) |
Version: | 0.1.2.9000 |
Built: | 2025-01-15 05:43:08 UTC |
Source: | https://github.com/DyfanJones/sagemaker-r-workflow |
Other contributors:
Amazon.com, Inc. [copyright holder]
Configuration class to enable caching in pipeline workflow.
enable_caching
To enable step caching.
expire_after
If step caching is enabled, a timeout also needs to be defined.
config
Configures caching in pipeline steps.
new()
Initialize Workflow CacheConfig. If caching is enabled, the pipeline attempts to find a previous execution of a step that was called with the same arguments. Step caching only considers successful executions. If a successful previous execution is found, the pipeline propagates the values from that execution rather than recomputing the step. When multiple successful executions exist within the timeout period, it uses the result of the most recent successful execution.
CacheConfig$new(enable_caching = FALSE, expire_after = NULL)
enable_caching
(bool): To enable step caching. Defaults to 'FALSE'.
expire_after
(str): If step caching is enabled, a timeout also needs to be defined. It defines how old a previous execution can be and still be considered for reuse. The value should be an ISO 8601 duration string. Defaults to 'NULL'.
format()
format class
CacheConfig$format()
clone()
The objects of this class are cloneable with this method.
CacheConfig$clone(deep = FALSE)
deep
Whether to make a deep clone.
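The following is a minimal sketch of enabling step caching with an ISO 8601 duration. The helper 'is_iso8601_duration' is hypothetical (it is not part of 'sagemaker.workflow') and only covers the simple designator form such as "PT1H" or "P30D"; the constructor call uses the signature documented above and is guarded so the snippet also runs when the package is not installed.

```r
# Hypothetical helper, not part of sagemaker.workflow: a simplified check that
# a string looks like an ISO 8601 duration (e.g. "PT1H", "P30D", "P1DT12H").
is_iso8601_duration <- function(x) {
  pattern <- paste0(
    "^P(?=\\d|T\\d)(\\d+Y)?(\\d+M)?(\\d+W)?(\\d+D)?",
    "(T(?=\\d)(\\d+H)?(\\d+M)?(\\d+S)?)?$"
  )
  is.character(x) && length(x) == 1L && grepl(pattern, x, perl = TRUE)
}

expire_after <- "PT1H" # reuse results that are at most one hour old
stopifnot(is_iso8601_duration(expire_after))

# Only construct the config when the package is available.
if (requireNamespace("sagemaker.workflow", quietly = TRUE)) {
  cache_config <- sagemaker.workflow::CacheConfig$new(
    enable_caching = TRUE,
    expire_after = expire_after
  )
}
```

Checking the duration string before constructing the object is a design choice of this sketch; whether the package validates the string itself is not stated in this manual.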
Output for a callback step.
output_name
The output name
output_type
The output type
new()
Initialize CallbackOutput class
CallbackOutput$new(output_name, output_type = CallbackOutputTypeEnum$String)
output_name
(str): The output name
output_type
(CallbackOutputTypeEnum): The output type
to_request()
Get the request structure for workflow service calls.
CallbackOutput$to_request()
expr()
The 'Get' expression dict for a 'CallbackOutput'.
CallbackOutput$expr(step_name)
step_name
(str): The name of the callback step that this output is associated with.
format()
format class
CallbackOutput$format()
clone()
The objects of this class are cloneable with this method.
CallbackOutput$clone(deep = FALSE)
deep
Whether to make a deep clone.
CallbackOutput type enum.
CallbackOutputTypeEnum
An object of class Enum (inherits from environment) of length 4.
Callback step for workflow.
sagemaker.workflow::Entity
-> sagemaker.workflow::Step
-> CallbackStep
sqs_queue_url
An SQS queue URL for receiving callback messages.
inputs
Input arguments that will be provided in the SQS message body of callback messages
outputs
Outputs that can be provided when completing a callback.
cache_config
A 'CacheConfig' instance.
arguments
The arguments dict that is used to define the callback step
properties
A Properties object representing the output parameters of the callback step.
new()
Constructs a CallbackStep.
CallbackStep$new( name, sqs_queue_url, inputs, outputs, display_name = NULL, description = NULL, cache_config = NULL, depends_on = NULL )
name
(str): The name of the callback step.
sqs_queue_url
(str): An SQS queue URL for receiving callback messages.
inputs
(dict): Input arguments that will be provided in the SQS message body of callback messages.
outputs
(List[CallbackOutput]): Outputs that can be provided when completing a callback.
display_name
(str): The display name of the callback step.
description
(str): The description of the callback step.
cache_config
(CacheConfig): A 'CacheConfig' instance.
depends_on
(List[str]): A list of step names this 'CallbackStep' depends on.
to_request()
Updates the dictionary with cache configuration.
CallbackStep$to_request()
clone()
The objects of this class are cloneable with this method.
CallbackStep$clone(deep = FALSE)
deep
Whether to make a deep clone.
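As an illustration of how the pieces above fit together, the sketch below builds one 'CallbackOutput' and passes it to 'CallbackStep$new()' using the documented signatures. The queue URL, step name, output name, and input values are placeholders invented for this example, and the package calls are guarded so the snippet is harmless when 'sagemaker.workflow' is not installed.

```r
# Placeholder values for illustration only.
queue_url <- "https://sqs.us-east-1.amazonaws.com/123456789012/my-queue"
callback_inputs <- list(dataset = "s3://my-bucket/input.csv")

if (requireNamespace("sagemaker.workflow", quietly = TRUE)) {
  # One string-typed output that the callback can supply on completion.
  output <- sagemaker.workflow::CallbackOutput$new(
    output_name = "result_path",
    output_type = sagemaker.workflow::CallbackOutputTypeEnum$String
  )

  # The inputs are sent in the SQS message body of the callback message.
  step <- sagemaker.workflow::CallbackStep$new(
    name = "wait-for-callback",
    sqs_queue_url = queue_url,
    inputs = callback_inputs,
    outputs = list(output)
  )
}
```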
Check job config for QualityCheckStep and ClarifyCheckStep
new()
Constructs a CheckJobConfig instance.
CheckJobConfig$new( role, instance_count = 1, instance_type = "ml.m5.xlarge", volume_size_in_gb = 30, volume_kms_key = NULL, output_kms_key = NULL, max_runtime_in_seconds = NULL, base_job_name = NULL, sagemaker_session = NULL, env = NULL, tags = NULL, network_config = NULL )
role
(str): An AWS IAM role. The Amazon SageMaker jobs use this role.
instance_count
(int): The number of instances to run the jobs with (default: 1).
instance_type
(str): Type of EC2 instance to use for the job (default: 'ml.m5.xlarge').
volume_size_in_gb
(int): Size in GB of the EBS volume to use for storing data during processing (default: 30).
volume_kms_key
(str): A KMS key for the processing volume (default: None).
output_kms_key
(str): The KMS key id for the job's outputs (default: None).
max_runtime_in_seconds
(int): Timeout in seconds. After this amount of time, Amazon SageMaker terminates the job regardless of its current status. Default: 3600 if not specified
base_job_name
(str): Prefix for the job name. If not specified, a default name is generated based on the training image name and current timestamp (default: None).
sagemaker_session
(sagemaker.session.Session): Session object which manages interactions with Amazon SageMaker APIs and any other AWS services needed (default: None). If not specified, one is created using the default AWS configuration chain.
env
(dict): Environment variables to be passed to the job (default: None).
tags
([dict]): List of tags to be passed to the job (default: None).
network_config
(sagemaker.network.NetworkConfig): A NetworkConfig object that configures network isolation, encryption of inter-container traffic, security group IDs, and subnets (default: None).
.generate_model_monitor()
Generates a ModelMonitor object with the required config attributes for QualityCheckStep and ClarifyCheckStep.
CheckJobConfig$.generate_model_monitor(mm_type)
mm_type
(str): The subclass type of ModelMonitor object. A valid mm_type should be one of the following: "DefaultModelMonitor", "ModelQualityMonitor", "ModelBiasMonitor", "ModelExplainabilityMonitor"
sagemaker.model_monitor.ModelMonitor or None if the mm_type is not valid
format()
Format class
CheckJobConfig$format()
clone()
The objects of this class are cloneable with this method.
CheckJobConfig$clone(deep = FALSE)
deep
Whether to make a deep clone.
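Because '.generate_model_monitor()' is documented to return None for an unrecognised 'mm_type', it can help to check the value up front. The sketch below is plain R and does not call the package; 'valid_mm_types' and 'is_valid_mm_type' are hypothetical names, and the list of accepted values is copied from the 'mm_type' description above.

```r
# Accepted values as listed in the mm_type argument description.
valid_mm_types <- c(
  "DefaultModelMonitor",
  "ModelQualityMonitor",
  "ModelBiasMonitor",
  "ModelExplainabilityMonitor"
)

# Hypothetical helper: TRUE only for a single string naming a listed type.
is_valid_mm_type <- function(mm_type) {
  is.character(mm_type) && length(mm_type) == 1L && mm_type %in% valid_mm_types
}

is_valid_mm_type("ModelBiasMonitor")
is_valid_mm_type("BiasMonitor")
```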
Clarify Check Config
data_config
Config of the input/output data.
kms_key
The ARN of the KMS key that is used to encrypt the user code file
monitoring_analysis_config_uri
The uri of monitoring analysis config.
new()
Initialize ClarifyCheckConfig class
ClarifyCheckConfig$new( data_config, kms_key = NULL, monitoring_analysis_config_uri = NULL )
data_config
(DataConfig): Config of the input/output data.
kms_key
(str): The ARN of the KMS key that is used to encrypt the user code file (default: None). This field CANNOT be any of PipelineNonPrimitiveInputTypes.
monitoring_analysis_config_uri
(str): The URI of the monitoring analysis config. This field does not take input. It is generated when the created analysis config file is uploaded.
format()
format class
ClarifyCheckConfig$format()
clone()
The objects of this class are cloneable with this method.
ClarifyCheckConfig$clone(deep = FALSE)
deep
Whether to make a deep clone.
ClarifyCheckStep step for workflow.
sagemaker.workflow::Entity
-> sagemaker.workflow::Step
-> ClarifyCheckStep
arguments
The arguments dict that is used to define the ClarifyCheck step.
properties
A Properties object representing the output parameters of the ClarifyCheck step.
new()
Constructs a ClarifyCheckStep.
ClarifyCheckStep$new( name, clarify_check_config, check_job_config, skip_check = FALSE, register_new_baseline = FALSE, model_package_group_name = NULL, supplied_baseline_constraints = NULL, display_name = NULL, description = NULL, cache_config = NULL, depends_on = NULL )
name
(str): The name of the ClarifyCheckStep step.
clarify_check_config
(ClarifyCheckConfig): A ClarifyCheckConfig instance.
check_job_config
(CheckJobConfig): A CheckJobConfig instance.
skip_check
(bool or PipelineNonPrimitiveInputTypes): Whether the check should be skipped (default: False).
register_new_baseline
(bool or PipelineNonPrimitiveInputTypes): Whether the new baseline should be registered (default: False).
model_package_group_name
(str or PipelineNonPrimitiveInputTypes): The name of a registered model package group. The baseline is fetched from the latest approved model in that group (default: None).
supplied_baseline_constraints
(str or PipelineNonPrimitiveInputTypes): The S3 path to the supplied constraints object representing the constraints JSON file that will be used for the drift check (default: None).
display_name
(str): The display name of the ClarifyCheckStep step (default: None).
description
(str): The description of the ClarifyCheckStep step (default: None).
cache_config
(CacheConfig): A 'sagemaker.workflow.steps.CacheConfig' instance (default: None).
depends_on
(List[str] or List[Step]): A list of step names or step instances this 'sagemaker.workflow.steps.ClarifyCheckStep' depends on (default: None).
to_request()
Updates the dictionary with cache configuration etc.
ClarifyCheckStep$to_request()
clone()
The objects of this class are cloneable with this method.
ClarifyCheckStep$clone(deep = FALSE)
deep
Whether to make a deep clone.
Compilation step for workflow.
sagemaker.workflow::Entity
-> sagemaker.workflow::Step
-> sagemaker.workflow::ConfigurableRetryStep
-> CompilationStep
arguments
The arguments dict that is used to call 'create_compilation_job'. NOTE: The CreateTrainingJob request is not quite the args list that workflow needs. The TrainingJobName and ExperimentConfig attributes cannot be included.
properties
A Properties object representing the DescribeTrainingJobResponse data model.
new()
Construct a CompilationStep. Given an 'EstimatorBase' and a 'sagemaker.model.Model' instance, construct a CompilationStep. In addition to the estimator and Model instances, the other arguments are those that are supplied to the 'sagemaker.model.Model.compile_model' method.
CompilationStep$new( name, estimator, model, inputs = NULL, job_arguments = NULL, depends_on = NULL, retry_policies = NULL, display_name = NULL, description = NULL, cache_config = NULL )
name
(str): The name of the compilation step.
estimator
(EstimatorBase): A 'sagemaker.estimator.EstimatorBase' instance.
model
(Model): A 'sagemaker.model.Model' instance.
inputs
(CompilationInput): A 'sagemaker.inputs.CompilationInput' instance. Defaults to 'None'.
job_arguments
(List[str]): A list of strings to be passed into the processing job. Defaults to 'None'.
depends_on
(List[str] or List[Step]): A list of step names or step instances this 'sagemaker.workflow.steps.CompilationStep' depends on
retry_policies
(List[RetryPolicy]): A list of retry policies.
display_name
(str): The display name of the compilation step.
description
(str): The description of the compilation step.
cache_config
(CacheConfig): A 'sagemaker.workflow.steps.CacheConfig' instance.
to_request()
Updates the dictionary with cache configuration.
CompilationStep$to_request()
clone()
The objects of this class are cloneable with this method.
CompilationStep$clone(deep = FALSE)
deep
Whether to make a deep clone.
Abstract Condition entity.
sagemaker.workflow::Entity
-> Condition
condition_type
The type of condition.
new()
Initialize Condition class
Condition$new(condition_type = enum_items(ConditionTypeEnum))
condition_type
(ConditionTypeEnum): The type of condition.
clone()
The objects of this class are cloneable with this method.
Condition$clone(deep = FALSE)
deep
Whether to make a deep clone.
Generic comparison condition that can be used to derive specific condition comparisons.
sagemaker.workflow::Entity
-> sagemaker.workflow::Condition
-> ConditionComparison
left
The execution variable, parameter, or property to use in the comparison.
right
The execution variable, parameter, property, or Python primitive value to compare to.
new()
Initialize ConditionComparison Class
ConditionComparison$new( condition_type = enum_items(ConditionTypeEnum), left, right )
condition_type
(ConditionTypeEnum): The type of condition.
left
(ConditionValueType): The execution variable, parameter, or property to use in the comparison.
right
(Union[ConditionValueType, PrimitiveType]): The execution variable, parameter, property, or Python primitive value to compare to.
to_request()
Get the request structure for workflow service calls.
ConditionComparison$to_request()
clone()
The objects of this class are cloneable with this method.
ConditionComparison$clone(deep = FALSE)
deep
Whether to make a deep clone.
A condition for equality comparisons.
sagemaker.workflow::Entity
-> sagemaker.workflow::Condition
-> sagemaker.workflow::ConditionComparison
-> ConditionEquals
new()
Construct a condition for equality comparisons.
ConditionEquals$new(left, right)
left
(ConditionValueType): The execution variable, parameter, or property to use in the comparison.
right
(Union[ConditionValueType, PrimitiveType]): The execution variable, parameter, property, or Python primitive value to compare to.
clone()
The objects of this class are cloneable with this method.
ConditionEquals$clone(deep = FALSE)
deep
Whether to make a deep clone.
A condition for greater than comparisons.
sagemaker.workflow::Entity
-> sagemaker.workflow::Condition
-> sagemaker.workflow::ConditionComparison
-> ConditionGreaterThan
new()
Construct an instance of ConditionGreaterThan for greater than comparisons.
ConditionGreaterThan$new(left, right)
left
(ConditionValueType): The execution variable, parameter, or property to use in the comparison.
right
(Union[ConditionValueType, PrimitiveType]): The execution variable, parameter, property, or Python primitive value to compare to.
clone()
The objects of this class are cloneable with this method.
ConditionGreaterThan$clone(deep = FALSE)
deep
Whether to make a deep clone.
A condition for greater than or equal to comparisons.
sagemaker.workflow::Entity
-> sagemaker.workflow::Condition
-> sagemaker.workflow::ConditionComparison
-> ConditionGreaterThanOrEqualTo
new()
Construct an instance of ConditionGreaterThanOrEqualTo for greater than or equal to comparisons.
ConditionGreaterThanOrEqualTo$new(left, right)
left
(ConditionValueType): The execution variable, parameter, or property to use in the comparison.
right
(Union[ConditionValueType, PrimitiveType]): The execution variable, parameter, property, or Python primitive value to compare to.
clone()
The objects of this class are cloneable with this method.
ConditionGreaterThanOrEqualTo$clone(deep = FALSE)
deep
Whether to make a deep clone.
A condition to check membership.
sagemaker.workflow::Entity
-> sagemaker.workflow::Condition
-> ConditionIn
new()
Construct a 'ConditionIn' condition to check membership.
ConditionIn$new(value, in_values)
value
(ConditionValueType): The execution variable, parameter, or property to use for the in comparison.
in_values
(List[Union[ConditionValueType, PrimitiveType]]): The list of values to check for membership in.
to_request()
Get the request structure for workflow service calls.
ConditionIn$to_request()
clone()
The objects of this class are cloneable with this method.
ConditionIn$clone(deep = FALSE)
deep
Whether to make a deep clone.
A condition for less than comparisons.
sagemaker.workflow::Entity
-> sagemaker.workflow::Condition
-> sagemaker.workflow::ConditionComparison
-> ConditionLessThan
new()
Construct an instance of ConditionLessThan for less than comparisons.
ConditionLessThan$new(left, right)
left
(ConditionValueType): The execution variable, parameter, or property to use in the comparison.
right
(Union[ConditionValueType, PrimitiveType]): The execution variable, parameter, property, or Python primitive value to compare to.
clone()
The objects of this class are cloneable with this method.
ConditionLessThan$clone(deep = FALSE)
deep
Whether to make a deep clone.
A condition for less than or equal to comparisons.
sagemaker.workflow::Entity
-> sagemaker.workflow::Condition
-> sagemaker.workflow::ConditionComparison
-> ConditionLessThanOrEqualTo
new()
Construct an instance of ConditionLessThanOrEqualTo for less than or equal to comparisons.
ConditionLessThanOrEqualTo$new(left, right)
left
(ConditionValueType): The execution variable, parameter, or property to use in the comparison.
right
(Union[ConditionValueType, PrimitiveType]): The execution variable, parameter, property, or Python primitive value to compare to.
clone()
The objects of this class are cloneable with this method.
ConditionLessThanOrEqualTo$clone(deep = FALSE)
deep
Whether to make a deep clone.
A condition for negating another 'Condition'.
sagemaker.workflow::Entity
-> sagemaker.workflow::Condition
-> ConditionNot
new()
Construct a 'ConditionNot' condition for negating another 'Condition'.
ConditionNot$new(expression)
expression
(Condition): A 'Condition' to take the negation of.
to_request()
Get the request structure for workflow service calls.
ConditionNot$to_request()
clone()
The objects of this class are cloneable with this method.
ConditionNot$clone(deep = FALSE)
deep
Whether to make a deep clone.
A condition for taking the logical OR of a list of 'Condition' instances.
sagemaker.workflow::Entity
-> sagemaker.workflow::Condition
-> ConditionOr
new()
Construct a 'ConditionOr' condition.
ConditionOr$new(conditions = NULL)
conditions
(List[Condition]): A list of 'Condition' instances to logically OR.
to_request()
Get the request structure for workflow service calls.
ConditionOr$to_request()
clone()
The objects of this class are cloneable with this method.
ConditionOr$clone(deep = FALSE)
deep
Whether to make a deep clone.
Conditional step for pipelines to support conditional branching in the execution of steps.
sagemaker.workflow::Entity
-> sagemaker.workflow::Step
-> ConditionStep
conditions
A list of 'sagemaker.workflow.conditions.Condition' instances.
if_steps
A list of 'sagemaker.workflow.steps.Step' and 'sagemaker.workflow.step_collections.StepCollection' instances
else_steps
A list of 'sagemaker.workflow.steps.Step' and 'sagemaker.workflow.step_collections.StepCollection' instances
arguments
The arguments dict that is used to define the conditional branching in the pipeline.
properties
A simple Properties object with 'Outcome' as the only property
new()
Construct a ConditionStep for pipelines to support conditional branching. If all of the conditions in the condition list evaluate to True, the 'if_steps' are marked as ready for execution. Otherwise, the 'else_steps' are marked as ready for execution.
ConditionStep$new( name, depends_on = NULL, display_name = NULL, description = NULL, conditions = NULL, if_steps = NULL, else_steps = NULL )
name
(str): The name of the step.
depends_on
(List[str]): The list of step names the current step depends on
display_name
(str): The display name of the condition step.
description
(str): The description of the condition step.
conditions
(List[Condition]): A list of 'sagemaker.workflow.conditions.Condition' instances.
if_steps
(List[Union[Step, StepCollection]]): A list of 'sagemaker.workflow.steps.Step' and 'sagemaker.workflow.step_collections.StepCollection' instances that are marked as ready for execution if the list of conditions evaluates to True.
else_steps
(List[Union[Step, StepCollection]]): A list of 'sagemaker.workflow.steps.Step' and 'sagemaker.workflow.step_collections.StepCollection' instances that are marked as ready for execution if the list of conditions evaluates to False.
clone()
The objects of this class are cloneable with this method.
ConditionStep$clone(deep = FALSE)
deep
Whether to make a deep clone.
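The branching rule described for 'ConditionStep' (all conditions true selects 'if_steps', otherwise 'else_steps') can be sketched in plain R. This is only an illustration of the documented semantics, not the package's implementation; 'choose_branch' and the step names are hypothetical, and the conditions are represented as already-evaluated logical values.

```r
# Hypothetical illustration: pick the branch the way the ConditionStep
# description states, given the evaluated results of each condition.
choose_branch <- function(condition_results, if_steps, else_steps) {
  stopifnot(is.logical(condition_results), !anyNA(condition_results))
  if (all(condition_results)) if_steps else else_steps
}

if_steps <- c("register-model", "deploy-model")
else_steps <- c("notify-failure")

# Every condition holds, so the if-branch is selected.
choose_branch(c(TRUE, TRUE), if_steps, else_steps)

# One condition fails, so the else-branch is selected.
choose_branch(c(TRUE, FALSE), if_steps, else_steps)
```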
Condition type enum.
ConditionTypeEnum
An object of class Enum (inherits from environment) of length 8.
ConfigurableRetryStep step for workflow.
sagemaker.workflow::Entity
-> sagemaker.workflow::Step
-> ConfigurableRetryStep
new()
Initialize ConfigurableRetryStep class
ConfigurableRetryStep$new( name, step_type = enum_items(StepTypeEnum), display_name = NULL, description = NULL, depends_on = NULL, retry_policies = NULL )
name
(str): The name of the step.
step_type
(StepTypeEnum): The type of the step.
display_name
(str): The display name of the step.
description
(str): The description of the step.
depends_on
(List[str] or List[Step]): The list of step names or step instances the current step depends on
retry_policies
(List[RetryPolicy]): The custom retry policy configuration
add_retry_policy()
Add a retry policy to the current step retry policies list.
ConfigurableRetryStep$add_retry_policy(retry_policy)
retry_policy
(RetryPolicy): The retry policy to add to the step's retry policies list.
to_request()
Gets the request structure for ConfigurableRetryStep
ConfigurableRetryStep$to_request()
clone()
The objects of this class are cloneable with this method.
ConfigurableRetryStep$clone(deep = FALSE)
deep
Whether to make a deep clone.
CreateModel step for workflow.
sagemaker.workflow::Entity
-> sagemaker.workflow::Step
-> sagemaker.workflow::ConfigurableRetryStep
-> CreateModelStep
arguments
The arguments dict that is used to call 'create_model'. NOTE: The CreateModelRequest is not quite the args list that workflow needs. ModelName cannot be included in the arguments.
properties
A Properties object representing the DescribeModelResponse data model.
new()
Construct a CreateModelStep, given a 'sagemaker.model.Model' instance. In addition to the Model instance, the other arguments are those that are supplied to the 'sagemaker.model.Model._create_sagemaker_model' method.
CreateModelStep$new( name, model, inputs, depends_on = NULL, retry_policies = NULL, display_name = NULL, description = NULL )
name
(str): The name of the CreateModel step.
model
(Model): A 'sagemaker.model.Model' instance.
inputs
(CreateModelInput): A 'sagemaker.inputs.CreateModelInput' instance. Defaults to 'None'.
depends_on
(List[str]): A list of step names this 'sagemaker.workflow.steps.CreateModelStep' depends on
retry_policies
(List[RetryPolicy]): A list of retry policies.
display_name
(str): The display name of the CreateModel step.
description
(str): The description of the CreateModel step.
clone()
The objects of this class are cloneable with this method.
CreateModelStep$clone(deep = FALSE)
deep
Whether to make a deep clone.
Data Bias Check Config
sagemaker.workflow::ClarifyCheckConfig
-> DataBiasCheckConfig
data_bias_config
Config of sensitive groups
methods
Selector of a subset of potential metrics
new()
Initialize DataBiasCheckConfig class
DataBiasCheckConfig$new(data_bias_config, methods = "all", ...)
data_bias_config
(BiasConfig): Config of sensitive groups.
methods
(str or list[str]): Selector of a subset of potential metrics:
"CI" https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-bias-metric-class-imbalance.html,
"DPL" https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-data-bias-metric-true-label-imbalance.html,
"KL" https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-data-bias-metric-kl-divergence.html,
"LP" https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-data-bias-metric-lp-norm.html,
"KS" https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-data-bias-metric-kolmogorov-smirnov.html,
"CDDL" https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-data-bias-metric-cddl.html
Defaults to computing all. This field CANNOT be any of PipelineNonPrimitiveInputTypes.
...
: Parameters from ClarifyCheckConfig
clone()
The objects of this class are cloneable with this method.
DataBiasCheckConfig$clone(deep = FALSE)
deep
Whether to make a deep clone.
Data Quality Check Config.
sagemaker.workflow::QualityCheckConfig
-> DataQualityCheckConfig
record_preprocessor_script
(str): The path to the record preprocessor script (default: None). This can be a local path or an S3 uri string but CANNOT be any of PipelineNonPrimitiveInputTypes.
clone()
The objects of this class are cloneable with this method.
DataQualityCheckConfig$clone(deep = FALSE)
deep
Whether to make a deep clone.
Handles Amazon SageMaker DataWrangler tasks
new()
Initializes a 'Processor' instance. The 'Processor' handles Amazon SageMaker Processing tasks.
DataWranglerProcessor$new( role, data_wrangler_flow_source, instance_count, instance_type, volume_size_in_gb = 30L, volume_kms_key = NULL, output_kms_key = NULL, max_runtime_in_seconds = NULL, base_job_name = NULL, sagemaker_session = NULL, env = NULL, tags = NULL, network_config = NULL )
role
(str): An AWS IAM role name or ARN. Amazon SageMaker Processing uses this role to access AWS resources, such as data stored in Amazon S3.
data_wrangler_flow_source
(str): The source of the DataWrangler flow which will be used for the DataWrangler job. If a local path is provided, it will automatically be uploaded to S3 under: "s3://<default-bucket-name>/<job-name>/input/<input-name>".
instance_count
(int): The number of instances to run a processing job with.
instance_type
(str): The type of EC2 instance to use for processing, for example, 'ml.c4.xlarge'.
volume_size_in_gb
(int): Size in GB of the EBS volume to use for storing data during processing (default: 30).
volume_kms_key
(str): A KMS key for the processing volume (default: None).
output_kms_key
(str): The KMS key ID for processing job outputs (default: None).
max_runtime_in_seconds
(int): Timeout in seconds (default: None). After this amount of time, Amazon SageMaker terminates the job, regardless of its current status. If 'max_runtime_in_seconds' is not specified, the default value is 24 hours.
base_job_name
(str): Prefix for processing job name. If not specified, the processor generates a default job name, based on the processing image name and current timestamp.
sagemaker_session
(sagemaker.session.Session): Session object which manages interactions with Amazon SageMaker and any other AWS services needed. If not specified, the processor creates one using the default AWS configuration chain.
env
(dict[str, str]): Environment variables to be passed to the processing jobs (default: None).
tags
(list[dict]): List of tags to be passed to the processing job (default: None). For more, see https://docs.aws.amazon.com/sagemaker/latest/dg/API_Tag.html.
network_config
(sagemaker.network.NetworkConfig): A 'sagemaker.network.NetworkConfig' object that configures network isolation, encryption of inter-container traffic, security group IDs, and subnets.
clone()
The objects of this class are cloneable with this method.
DataWranglerProcessor$clone(deep = FALSE)
deep
Whether to make a deep clone.
Export Airflow deploy config from a SageMaker model
deploy_config( model, initial_instance_count, instance_type, endpoint_name = NULL, tags = NULL )
model
(sagemaker.model.Model): The SageMaker model to export the Airflow config from.
initial_instance_count
(int): The initial number of instances to run in the 'Endpoint' created from this 'Model'.
instance_type
(str): The EC2 instance type to deploy this Model to. For example, 'ml.p2.xlarge'.
endpoint_name
(str): The name of the endpoint to create (default: None). If not specified, a unique endpoint name will be created.
tags
(list[dict]): List of tags for labeling a training job. For more, see https://docs.aws.amazon.com/sagemaker/latest/dg/API_Tag.html.
dict: Deploy config that can be directly used by SageMakerEndpointOperator in Airflow.
Export Airflow deploy config from a SageMaker estimator
deploy_config_from_estimator( estimator, task_id, task_type, initial_instance_count, instance_type, model_name = NULL, endpoint_name = NULL, tags = NULL, ... )
estimator
(sagemaker.model.EstimatorBase): The SageMaker estimator to export Airflow config from. It has to be an estimator associated with a training job.
task_id
(str): The task id of any airflow.contrib.operators.SageMakerTrainingOperator or airflow.contrib.operators.SageMakerTuningOperator that generates training jobs in the DAG. The endpoint config is built based on the training job generated in this operator.
task_type
(str): Whether the task is from SageMakerTrainingOperator or SageMakerTuningOperator. Values can be 'training', 'tuning' or None (which means training job is not from any task).
initial_instance_count
(int): Minimum number of EC2 instances to deploy to an endpoint for prediction.
instance_type
(str): Type of EC2 instance to deploy to an endpoint for prediction, for example, 'ml.c4.xlarge'.
model_name
(str): Name to use for creating an Amazon SageMaker model. If not specified, one will be generated.
endpoint_name
(str): Name to use for creating an Amazon SageMaker endpoint. If not specified, the name of the SageMaker model is used.
tags
(list[dict]): List of tags for labeling a training job. For more, see https://docs.aws.amazon.com/sagemaker/latest/dg/API_Tag.html.
...
: Passed to invocation of 'create_model()'. Implementations may customize 'create_model()' to accept '**kwargs' to customize model creation during deploy. For more, see the implementation docs.
dict: Deploy config that can be directly used by SageMakerEndpointOperator in Airflow.
EMR step for workflow.
sagemaker.workflow::Entity
-> sagemaker.workflow::Step
-> EMRStep
arguments
The arguments dict that is used to call 'AddJobFlowSteps'. NOTE: The AddJobFlowSteps request is not quite the args list that workflow needs. The Name attribute in AddJobFlowSteps cannot be passed; it will be set during runtime. In addition, the EMR job inputs and output config also need to be included.
properties
A Properties object representing the EMR DescribeStepResponse model
new()
Constructs an EMRStep.
EMRStep$new( name, display_name, description, cluster_id, step_config, depends_on = NULL, cache_config = NULL )
name
(str): The name of the EMR step.
display_name
(str): The display name of the EMR step.
description
(str): The description of the EMR step.
cluster_id
(str): The ID of the running EMR cluster.
step_config
(EMRStepConfig): One StepConfig to be executed by the job flow.
depends_on
(List[str]): A list of step names this 'sagemaker.workflow.steps.EMRStep' depends on
cache_config
(CacheConfig): A 'sagemaker.workflow.steps.CacheConfig' instance.
to_request()
Updates the dictionary with cache configuration.
EMRStep$to_request()
clone()
The objects of this class are cloneable with this method.
EMRStep$clone(deep = FALSE)
deep
Whether to make a deep clone.
Config for a Hadoop Jar step
jar
A path to a JAR file run during the step.
args
A list of command line arguments
main_class
The name of the main class in the specified Java file.
properties
A list of key-value pairs that are set when the step runs.
new()
Create a definition for input data used by an EMR cluster (job flow) step. See AWS documentation on the 'StepConfig' API for more details on the parameters.
EMRStepConfig$new(jar, args = NULL, main_class = NULL, properties = NULL)
jar
(str): A path to a JAR file run during the step.
args
(List[str]): A list of command line arguments passed to the JAR file's main function when executed.
main_class
(str): The name of the main class in the specified Java file.
properties
(List(dict)): A list of key-value pairs that are set when the step runs.
to_request()
Convert EMRStepConfig object to request list.
EMRStepConfig$to_request()
format()
format class
EMRStepConfig$format()
clone()
The objects of this class are cloneable with this method.
EMRStepConfig$clone(deep = FALSE)
deep
Whether to make a deep clone.
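As a rough illustration of what a request list for a Hadoop JAR step might look like, the base R sketch below builds one by hand. The function name `sketch_emr_step_config_request`, the S3 paths and the class name are hypothetical, and the 'HadoopJarStep' layout with 'Jar', 'Args', 'MainClass' and 'Properties' keys is assumed from the EMR 'StepConfig' API rather than taken from this package's source, so the output of 'EMRStepConfig$to_request()' may differ.

```r
# Illustrative stand-in only; this is NOT the package's implementation.
sketch_emr_step_config_request <- function(jar, args = NULL, main_class = NULL,
                                           properties = NULL) {
  hadoop_jar_step <- list(Jar = jar)
  # Optional fields are added only when the caller supplies them.
  if (!is.null(args)) hadoop_jar_step$Args <- args
  if (!is.null(main_class)) hadoop_jar_step$MainClass <- main_class
  if (!is.null(properties)) hadoop_jar_step$Properties <- properties
  list(HadoopJarStep = hadoop_jar_step)
}

request <- sketch_emr_step_config_request(
  jar = "s3://my-bucket/jars/word-count.jar",      # hypothetical location
  args = list("--input", "s3://my-bucket/input/"), # hypothetical arguments
  main_class = "com.example.WordCount"             # hypothetical class
)
str(request)
```

Leaving out unset optional fields keeps the request minimal, which is the usual convention for AWS request payloads.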
Creates a Transformer step collection for workflow.
sagemaker.workflow::StepCollection
-> EstimatorTransformer
new()
Construct steps required for a Transformer step collection. This is an estimator-centric step collection that models what happens in workflows when invoking the 'transform()' method on an estimator instance. First, if custom model artifacts are required, a '_RepackModelStep' is included. Second, a 'CreateModelStep' is added with the model data passed in from a training step or other training job output. Finally, a 'TransformerStep' is added. If repacking the model artifacts is not necessary, only the 'CreateModelStep' and 'TransformerStep' are in the step collection.
EstimatorTransformer$new( name, estimator, model_data, model_inputs, instance_count, instance_type, transform_inputs, description = NULL, display_name = NULL, image_uri = NULL, predictor_cls = NULL, env = NULL, strategy = NULL, assemble_with = NULL, output_path = NULL, output_kms_key = NULL, accept = NULL, max_concurrent_transforms = NULL, max_payload = NULL, tags = NULL, volume_kms_key = NULL, depends_on = NULL, repack_model_step_retry_policies = NULL, model_step_retry_policies = NULL, transform_step_retry_policies = NULL, ... )
name
(str): The name of the Transform Step.
estimator
The estimator instance.
model_data
(str): The S3 location of a SageMaker model data '.tar.gz' file (default: None).
model_inputs
(CreateModelInput): A 'sagemaker.inputs.CreateModelInput' instance. Defaults to 'None'.
instance_count
(int): The number of EC2 instances to use.
instance_type
(str): The type of EC2 instance to use.
transform_inputs
(TransformInput): A 'sagemaker.inputs.TransformInput' instance.
description
(str): The description of the step.
display_name
(str): The display name of the step.
image_uri
(str): A Docker image URI.
predictor_cls
(callable[string, Session]): A function to call to create a predictor (default: None). If not None, 'deploy' will return the result of invoking this function on the created endpoint name.
env
(dict): The Environment variables to be set for use during the transform job (default: None).
strategy
(str): The strategy used to decide how to batch records in a single request (default: None). Valid values: 'MultiRecord' and 'SingleRecord'.
assemble_with
(str): How the output is assembled (default: None). Valid values: 'Line' or 'None'.
output_path
(str): The S3 location for saving the transform result. If not specified, results are stored to a default bucket.
output_kms_key
(str): Optional. A KMS key ID for encrypting the transform output (default: None).
accept
(str): The accept header passed by the client to the inference endpoint. If it is supported by the endpoint, it will be the format of the batch transform output.
max_concurrent_transforms
(int): The maximum number of HTTP requests to be made to each individual transform container at one time.
max_payload
(int): Maximum size of the payload in a single HTTP request to the container, in MB.
tags
(list[dict]): List of tags for labeling a training job. For more, see https://docs.aws.amazon.com/sagemaker/latest/dg/API_Tag.html.
volume_kms_key
(str): Optional. KMS key ID for encrypting the volume attached to the ML compute instance (default: None).
depends_on
(List[str] or List[Step]): The list of step names or step instances the first step in the collection depends on
repack_model_step_retry_policies
(List[RetryPolicy]): The list of retry policies for the repack model step
model_step_retry_policies
(List[RetryPolicy]): The list of retry policies for model step
transform_step_retry_policies
(List[RetryPolicy]): The list of retry policies for transform step
...
Additional arguments passed on to the model class.
clone()
The objects of this class are cloneable with this method.
EstimatorTransformer$clone(deep = FALSE)
deep
Whether to make a deep clone.
Pipeline execution variables for workflow.
sagemaker.workflow::Expression
-> ExecutionVariable
name
The name of the execution variable.
expr
The 'Get' expression dict for an 'ExecutionVariable'.
new()
Create a pipeline execution variable.
ExecutionVariable$new(name)
name
(str): The name of the execution variable.
clone()
The objects of this class are cloneable with this method.
ExecutionVariable$clone(deep = FALSE)
deep
Whether to make a deep clone.
Consideration should be given to moving these to module-level constants.
ExecutionVariables
An object of class Enum (inherits from environment) of length 6.
'FailStep' for SageMaker Pipelines Workflows.
sagemaker.workflow::Entity
-> sagemaker.workflow::Step
-> FailStep
error_message
An error message defined by the user.
arguments
The arguments dictionary that is used to define the 'FailStep'.
properties
A 'Properties' object is not available for the 'FailStep'. Executing a 'FailStep' will terminate the pipeline. 'FailStep' properties should not be referenced.
new()
Constructs a 'FailStep'.
FailStep$new( name, error_message = NULL, display_name = NULL, description = NULL, depends_on = NULL )
name
(str): The name of the 'FailStep'. A name is required and must be unique within a pipeline.
error_message
(str or PipelineNonPrimitiveInputTypes): An error message defined by the user. Once the 'FailStep' is reached, the execution fails and the error message is set as the failure reason (default: None).
display_name
(str): The display name of the 'FailStep'. The display name provides better UI readability. (default: None).
description
(str): The description of the 'FailStep' (default: None).
depends_on
(List[str] or List[Step]): A list of 'Step' names or 'Step' instances that this 'FailStep' depends on. If a listed 'Step' name does not exist, an error is returned (default: None).
clone()
The objects of this class are cloneable with this method.
FailStep$clone(deep = FALSE)
deep
Whether to make a deep clone.
This list of dicts adheres to the request schema of: '{"Name": "MyParameterName", "Value": "MyValue"}'
format_start_parameters(parameters)
parameters
(Dict[str, Any]): A dict of named values where the keys are the names of the parameters to pass values into.
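The conversion from a named list to the 'Name'/'Value' request schema can be sketched in base R as follows. This is a minimal sketch under stated assumptions: `sketch_format_start_parameters` is a hypothetical helper, not the package function, and coercing every value to a string is an assumption about how the service expects parameter overrides.

```r
# Illustrative stand-in only; this is NOT the package's implementation.
sketch_format_start_parameters <- function(parameters) {
  if (is.null(parameters) || length(parameters) == 0) {
    return(NULL)
  }
  # Pair each name with its value; values are coerced to character (assumption).
  unname(Map(
    function(name, value) list(Name = name, Value = as.character(value)),
    names(parameters),
    parameters
  ))
}

params <- list(InstanceType = "ml.m5.xlarge", Epochs = 10)
formatted <- sketch_format_start_parameters(params)
str(formatted)
```

The result is an unnamed list with one 'Name'/'Value' entry per parameter, in the order the parameters were supplied.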
Generate the data ingestion only flow from athena input
generate_data_ingestion_flow_from_athena_dataset_definition( input_name, athena_dataset_definition, operator_version = "0.1", schema = NULL )
input_name
(str): The name of the input to the flow source node.
athena_dataset_definition
(AthenaDatasetDefinition): The Athena input to the flow source node.
operator_version
(str): The version of the operator.
schema
(list): The schema for the data to be ingested.
dict (typing.Dict): A flow that only conducts data ingestion with a 1-1 mapping.
output_name (str): The output name used to configure 'sagemaker.processing.FeatureStoreOutput'.
Generate the data ingestion only flow from redshift input
generate_data_ingestion_flow_from_redshift_dataset_definition( input_name, redshift_dataset_definition, operator_version = "0.1", schema = NULL )
input_name
(str): The name of the input to the flow source node.
redshift_dataset_definition
(RedshiftDatasetDefinition): The Redshift input to the flow source node.
operator_version
(str): The version of the operator.
schema
(list): The schema for the data to be ingested.
list: A flow that only conducts data ingestion with a 1-1 mapping.
output_name (str): The output name used to configure 'sagemaker.processing.FeatureStoreOutput'.
Generate the data ingestion only flow from s3 input
generate_data_ingestion_flow_from_s3_input( input_name, s3_uri, s3_content_type = "csv", s3_has_header = FALSE, operator_version = "0.1", schema = NULL )
input_name
(str): The name of the input to the flow source node.
s3_uri
(str): The URI of the S3 input to the flow source node.
s3_content_type
(str): The S3 input content type.
s3_has_header
(bool): Flag indicating whether the input has a header.
operator_version
(str): The version of the operator.
schema
(list): The schema for the data to be ingested.
list: A flow that only conducts data ingestion with a 1-1 mapping.
output_name (str): The output name used to configure 'sagemaker.processing.FeatureStoreOutput'.
Get the MD5 hash of a file.
hash_file(path)
path
(str): The local path for the file.
str: The MD5 hash of the file.
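For readers who want to see how an MD5 file hash can be computed without the package, the sketch below uses 'tools::md5sum' from base R. The helper name `sketch_hash_file` is hypothetical, and whether 'hash_file' uses the same routine internally is not stated here.

```r
# Illustrative stand-in only; this is NOT the package's implementation.
sketch_hash_file <- function(path) {
  # tools::md5sum returns a character vector named by file path; drop the name.
  unname(tools::md5sum(path))
}

tmp <- tempfile()
writeBin(charToRaw("hello"), tmp)  # write exactly five bytes, no trailing newline
digest <- sketch_hash_file(tmp)
print(digest)
unlink(tmp)
```

The hash depends only on the bytes of the file, so the same content always yields the same digest regardless of the file name.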
Converts a list of ProcessingInput or ProcessingOutput objects to a list of dicts
input_output_list_converter(object_list)
object_list
(list[ProcessingInput or ProcessingOutput]): The list of objects to convert.
List of dicts
Replaces Parameter values in a list of nested Dict[str, Any] with their workflow expression.
interpolate( request_obj, callback_output_to_step_map, lambda_output_to_step_map )
request_obj
(RequestType): The request dict.
callback_output_to_step_map
(list[str, str]): A dict of output name -> step name.
lambda_output_to_step_map
(list[str, str]): Placeholder
RequestType: The request dict with Parameter values replaced by their expression.
Join together properties.
sagemaker.workflow::Expression
-> Join
on
The string to join the values on (Defaults to "").
values
The primitive types and parameters to join.
expr
The expression dict for a 'Join' function.
new()
Initialize Join Class
Join$new(on = "", values = "")
on
(str): The string to join the values on (Defaults to "").
values
(List[Union[PrimitiveType, Parameter]]): The primitive types and parameters to join.
clone()
The objects of this class are cloneable with this method.
Join$clone(deep = FALSE)
deep
Whether to make a deep clone.
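To make the idea of a join expression concrete, the base R sketch below assembles one by hand. Everything in it is illustrative: `sketch_join_expr` is a hypothetical helper, pipeline variables are stood in for by plain lists carrying an 'expr' element, and the 'Std:Join' and 'Get' keys are assumed from the SageMaker Pipelines definition format rather than read from this package, so the actual 'expr' of a 'Join' object may differ.

```r
# Illustrative stand-in only; this is NOT the package's implementation.
sketch_join_expr <- function(on = "", values = list()) {
  list(
    "Std:Join" = list(
      On = on,
      Values = lapply(values, function(value) {
        # Stand-in for pipeline variables: a list with an 'expr' element is
        # replaced by that expression; primitives pass through unchanged.
        if (is.list(value) && !is.null(value$expr)) value$expr else value
      })
    )
  )
}

# Hypothetical stand-in for an execution variable.
execution_id <- list(expr = list(Get = "Execution.PipelineExecutionId"))

joined <- sketch_join_expr(
  on = "/",
  values = list("s3://my-bucket/output", execution_id)  # hypothetical bucket
)
str(joined)
```

Because the values are kept as expressions rather than concatenated locally, the final string can only be resolved when the pipeline executes.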
Get JSON properties from PropertyFiles.
sagemaker.workflow::Expression
-> JsonGet
step_name
The step from which to get the property file.
property_file
Either a PropertyFile instance or the name of a property file.
json_path
The JSON path expression to the requested value.
expr
The expression dict for a 'JsonGet' function.
new()
Initialize JsonGet class
JsonGet$new(step_name, property_file, json_path)
step_name
(Step): The step from which to get the property file.
property_file
(Union[PropertyFile, str]): Either a PropertyFile instance or the name of a property file.
json_path
(str): The JSON path expression to the requested value.
clone()
The objects of this class are cloneable with this method.
JsonGet$clone(deep = FALSE)
deep
Whether to make a deep clone.
Output for a lambda step.
output_name
(str): The output name
output_type
(LambdaOutputTypeEnum): The output type
new()
Initialize LambdaOutput class
LambdaOutput$new(output_name, output_type = enum_items(LambdaOutputTypeEnum))
output_name
(str): The output name
output_type
(LambdaOutputTypeEnum): The output type
to_request()
Get the request structure for workflow service calls.
LambdaOutput$to_request()
expr()
The 'Get' expression dict for a 'LambdaOutput'.
LambdaOutput$expr(step_name)
step_name
(str): The name of the lambda step that the output is associated with.
format()
format class
LambdaOutput$format()
clone()
The objects of this class are cloneable with this method.
LambdaOutput$clone(deep = FALSE)
deep
Whether to make a deep clone.
LambdaOutput type enum.
LambdaOutputTypeEnum
An object of class Enum (inherits from environment) of length 4.
Lambda step for workflow.
sagemaker.workflow::Entity
-> sagemaker.workflow::Step
-> LambdaStep
arguments
The arguments dict that is used to define the lambda step.
properties
A Properties object representing the output parameters of the lambda step.
new()
Constructs a LambdaStep.
LambdaStep$new( name, lambda_func, display_name = NULL, description = NULL, inputs = NULL, outputs = NULL, cache_config = NULL, depends_on = NULL )
name
(str): The name of the lambda step.
lambda_func
(Lambda): An instance of sagemaker.lambda_helper.Lambda. If a Lambda ARN is specified in the instance, LambdaStep just invokes the function; otherwise, the Lambda function is created when the pipeline is created.
display_name
(str): The display name of the Lambda step.
description
(str): The description of the Lambda step.
inputs
(dict): Input arguments that will be provided to the lambda function.
outputs
(List[LambdaOutput]): List of outputs from the lambda function.
cache_config
(CacheConfig): A 'sagemaker.workflow.steps.CacheConfig' instance.
depends_on
(List[str]): A list of step names this 'sagemaker.workflow.steps.LambdaStep' depends on
to_request()
Updates the dictionary with cache configuration.
LambdaStep$to_request()
clone()
The objects of this class are cloneable with this method.
LambdaStep$clone(deep = FALSE)
deep
Whether to make a deep clone.
Get the request structure for list of entities.
list_to_request(entities)
entities
(Sequence[Entity]): A list of entities.
list: A request structure for a workflow service call.
Export Airflow model config from a SageMaker model
model_config(model, instance_type = NULL, role = NULL, image_uri = NULL)
model
(sagemaker.model.Model): The Model object from which to export the Airflow config.
instance_type
(str): The EC2 instance type to deploy this Model to. For example, 'ml.p2.xlarge'.
role
(str): The 'ExecutionRoleArn' IAM Role ARN for the model.
image_uri
(str): A Docker image URI to use for deploying the model.
dict: Model config that can be directly used by SageMakerModelOperator in Airflow. It can also be part of the config used by SageMakerEndpointOperator and SageMakerTransformOperator in Airflow.
Export Airflow model config from a SageMaker estimator
model_config_from_estimator( estimator, task_id, task_type, instance_type = NULL, role = NULL, image_uri = NULL, name = NULL, model_server_workers = NULL, vpc_config_override = "VPC_CONFIG_DEFAULT" )
estimator
(sagemaker.model.EstimatorBase): The SageMaker estimator to export Airflow config from. It has to be an estimator associated with a training job.
task_id
(str): The task id of any airflow.contrib.operators.SageMakerTrainingOperator or airflow.contrib.operators.SageMakerTuningOperator that generates training jobs in the DAG. The model config is built based on the training job generated in this operator.
task_type
(str): Whether the task is from SageMakerTrainingOperator or SageMakerTuningOperator. Values can be 'training', 'tuning' or None (which means training job is not from any task).
instance_type
(str): The EC2 instance type to deploy this Model to. For example, 'ml.p2.xlarge'.
role
(str): The 'ExecutionRoleArn' IAM Role ARN for the model.
image_uri
(str): A Docker image URI to use for deploying the model.
name
(str): Name of the model.
model_server_workers
(int): The number of worker processes used by the inference server. If None, server will use one worker per vCPU. Only effective when estimator is a SageMaker framework.
vpc_config_override
(dict[str, list[str]]): Override for VpcConfig set on the model. Default: use subnets and security groups from this Estimator.
* 'Subnets' (list[str]): List of subnet ids.
* 'SecurityGroupIds' (list[str]): List of security group ids.
dict: Model config that can be directly used by SageMakerModelOperator in Airflow. It can also be part of the config used by SageMakerEndpointOperator in Airflow.
Model Bias Check Config
sagemaker.workflow::ClarifyCheckConfig
-> ModelBiasCheckConfig
data_bias_config
Config of sensitive groups
model_config
Config of the model and its endpoint to be created
model_predicted_label_config
Config of how to extract the predicted label from the model output
methods
Selector of a subset of potential metrics
new()
Initialize ModelBiasCheckConfig class
ModelBiasCheckConfig$new( data_bias_config, model_config, model_predicted_label_config, methods = "all", ... )
data_bias_config
(BiasConfig): Config of sensitive groups.
model_config
(ModelConfig): Config of the model and its endpoint to be created.
model_predicted_label_config
(ModelPredictedLabelConfig): Config of how to extract the predicted label from the model output.
methods
(str or list[str]): Selector of a subset of potential metrics:
"DPPL" https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-post-training-bias-metric-dppl.html,
"DI" https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-post-training-bias-metric-di.html,
"DCA" https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-post-training-bias-metric-dca.html,
"DCR" https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-post-training-bias-metric-dcr.html,
"RD" https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-post-training-bias-metric-rd.html,
"DAR" https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-post-training-bias-metric-dar.html,
"DRR" https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-post-training-bias-metric-drr.html,
"AD" https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-post-training-bias-metric-ad.html,
"CDDPL" https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-post-training-bias-metric-cddpl.html,
"TE" https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-post-training-bias-metric-te.html,
"FT" https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-post-training-bias-metric-ft.html
Defaults to computing all. This field CANNOT be any of PipelineNonPrimitiveInputTypes.
...
Parameters from ClarifyCheckConfig.
clone()
The objects of this class are cloneable with this method.
ModelBiasCheckConfig$clone(deep = FALSE)
deep
Whether to make a deep clone.
Model Explainability Check Config
sagemaker.workflow::ClarifyCheckConfig
-> ModelExplainabilityCheckConfig
model_config
Config of the model and its endpoint to be created
explainability_config
Config of the specific explainability method
model_scores
Index or JSONPath location in the model output
new()
Initialize ModelExplainabilityCheckConfig class
ModelExplainabilityCheckConfig$new( model_config, explainability_config, model_scores = NULL, ... )
model_config
(ModelConfig): Config of the model and its endpoint to be created.
explainability_config
(SHAPConfig): Config of the specific explainability method. Currently, only SHAP is supported.
model_scores
(str or int or ModelPredictedLabelConfig): Index or JSONPath location in the model output for the predicted scores to be explained (default: None). This is not required if the model output is a single score. Alternatively, an instance of ModelPredictedLabelConfig can be provided but this field CANNOT be any of PipelineNonPrimitiveInputTypes.
...
Parameters from ClarifyCheckConfig.
clone()
The objects of this class are cloneable with this method.
ModelExplainabilityCheckConfig$clone(deep = FALSE)
deep
Whether to make a deep clone.
Model Quality Check Config.
sagemaker.workflow::QualityCheckConfig
-> ModelQualityCheckConfig
problem_type
(str or PipelineNonPrimitiveInputTypes): The type of problem of this model quality monitoring. Valid values are "Regression", "BinaryClassification", "MulticlassClassification".
inference_attribute
(str or PipelineNonPrimitiveInputTypes): Index or JSONpath to locate predicted label(s) (default: None).
probability_attribute
(str or PipelineNonPrimitiveInputTypes): Index or JSONpath to locate probabilities (default: None).
ground_truth_attribute
(str or PipelineNonPrimitiveInputTypes): Index or JSONpath to locate actual label(s) (default: None).
probability_threshold_attribute
(str or PipelineNonPrimitiveInputTypes): Threshold to convert probabilities to binaries (default: None).
clone()
The objects of this class are cloneable with this method.
ModelQualityCheckConfig$clone(deep = FALSE)
deep
Whether to make a deep clone.
Parallelism config for SageMaker pipeline
max_parallel_execution_steps
Max number of steps which could be parallelized
new()
Create a ParallelismConfiguration
ParallelismConfiguration$new(max_parallel_execution_steps)
max_parallel_execution_steps
(int): Max number of steps which could be parallelized.
to_request()
The request structure.
ParallelismConfiguration$to_request()
format()
format class
ParallelismConfiguration$format()
clone()
The objects of this class are cloneable with this method.
ParallelismConfiguration$clone(deep = FALSE)
deep
Whether to make a deep clone.
Pipeline parameter for workflow.
sagemaker.workflow::Entity
-> Parameter
name
The name of the parameter.
parameter_type
The type of the parameter
default_value
The default python value of the parameter
expr
The 'Get' expression dict for a 'Parameter'
new()
Initialize Parameter class
Parameter$new( name, parameter_type = ParameterTypeEnum$new(), default_value = NULL )
name
(str): The name of the parameter.
parameter_type
(ParameterTypeEnum): The type of the parameter.
default_value
(PrimitiveType): The default Python value of the parameter.
to_request()
Get the request structure for workflow service calls.
Parameter$to_request()
clone()
The objects of this class are cloneable with this method.
Parameter$clone(deep = FALSE)
deep
Whether to make a deep clone.
Pipeline boolean parameter for workflow.
sagemaker.workflow::Entity
-> sagemaker.workflow::Parameter
-> ParameterBoolean
new()
Create a pipeline boolean parameter.
ParameterBoolean$new(name, default_value = NULL)
name
(str): The name of the parameter.
default_value
(bool): The default Python value of the parameter. Defaults to None.
clone()
The objects of this class are cloneable with this method.
ParameterBoolean$clone(deep = FALSE)
deep
Whether to make a deep clone.
Pipeline float parameter for workflow.
sagemaker.workflow::Entity
-> sagemaker.workflow::Parameter
-> ParameterFloat
float
Return default value or implicit value
new()
Create a pipeline float parameter.
ParameterFloat$new(name, default_value = NULL)
name
(str): The name of the parameter.
default_value
(float): The default Python value of the parameter.
clone()
The objects of this class are cloneable with this method.
ParameterFloat$clone(deep = FALSE)
deep
Whether to make a deep clone.
Pipeline integer parameter for workflow.
sagemaker.workflow::Entity
-> sagemaker.workflow::Parameter
-> ParameterInteger
int
Return default value or implicit value
new()
Create a pipeline integer parameter.
ParameterInteger$new(name, default_value = NULL)
name
(str): The name of the parameter.
default_value
(int): The default Python value of the parameter.
clone()
The objects of this class are cloneable with this method.
ParameterInteger$clone(deep = FALSE)
deep
Whether to make a deep clone.
Pipeline string parameter for workflow.
sagemaker.workflow::Entity
-> sagemaker.workflow::Parameter
-> ParameterString
enum_values
Placeholder
str
Return default value or implicit value
new()
Create a pipeline string parameter.
ParameterString$new(name, default_value = NULL, enum_values = NULL)
name
(str): The name of the parameter.
default_value
(str): The default Python value of the parameter.
enum_values
(list): placeholder
to_request()
Get the request structure for workflow service calls.
ParameterString$to_request()
clone()
The objects of this class are cloneable with this method.
ParameterString$clone(deep = FALSE)
deep
Whether to make a deep clone.
Pipeline for workflow.
sagemaker.workflow::Entity
-> Pipeline
new()
Initialize Pipeline Class
Pipeline$new( name, parameters = list(), pipeline_experiment_config = PipelineExperimentConfig$new(ExecutionVariables$PIPELINE_NAME, ExecutionVariables$PIPELINE_EXECUTION_ID), steps = list(), sagemaker_session = NULL )
name
(str): The name of the pipeline.
parameters
(Sequence[Parameter]): The list of the parameters.
pipeline_experiment_config
(Optional[PipelineExperimentConfig]): If set, the workflow will attempt to create an experiment and trial before executing the steps. Creation will be skipped if an experiment or a trial with the same name already exists. By default, pipeline name is used as experiment name and execution id is used as the trial name. If set to None, no experiment or trial will be created automatically.
steps
(Sequence[Union[Step, StepCollection]]): The list of the non-conditional steps associated with the pipeline. Any steps that are within the 'if_steps' or 'else_steps' of a 'ConditionStep' cannot be listed in the steps of a pipeline. Of particular note, the workflow service rejects any pipeline definitions that specify a step in the list of steps of a pipeline and that step in the 'if_steps' or 'else_steps' of any 'ConditionStep'.
sagemaker_session
(Session): Session object that manages interactions with Amazon SageMaker APIs and any other AWS services needed. If not specified, the pipeline creates one using the default AWS configuration chain.
to_request()
Gets the request structure for workflow service calls.
Pipeline$to_request()
create()
Creates a Pipeline in the Pipelines service.
Pipeline$create( role_arn, description = NULL, tags = NULL, parallelism_config = NULL )
role_arn
(str): The role arn that is assumed by the pipeline to create step artifacts.
description
(str): A description of the pipeline.
tags
(List[Dict[str, str]]): A list of '{"Key": "string", "Value": "string"}' dicts as tags.
parallelism_config
(Optional[ParallelismConfiguration]): Parallelism configuration that is applied to each of the executions of the pipeline. It takes precedence over the parallelism configuration of the parent pipeline.
A response dict from the service.
describe()
Describes a Pipeline in the Workflow service.
Pipeline$describe()
Response dict from the service. See the boto3 client documentation: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html#SageMaker.Client.describe_pipeline
update()
Updates a Pipeline in the Workflow service.
Pipeline$update(role_arn, description = NULL, parallelism_config = NULL)
role_arn
(str): The role arn that is assumed by pipelines to create step artifacts.
description
(str): A description of the pipeline.
parallelism_config
(Optional[ParallelismConfiguration]): Parallelism configuration that is applied to each of the executions of the pipeline. It takes precedence over the parallelism configuration of the parent pipeline.
A response dict from the service.
upsert()
Creates a pipeline or updates it, if it already exists.
Pipeline$upsert( role_arn, description = NULL, tags = NULL, parallelism_config = NULL )
role_arn
(str): The role arn that is assumed by workflow to create step artifacts.
description
(str): A description of the pipeline.
tags
(List[Dict[str, str]]): A list of '{"Key": "string", "Value": "string"}' dicts as tags.
parallelism_config
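(Optional[ParallelismConfiguration]): Parallelism configuration that is applied to each of the executions of the pipeline. It takes precedence over the parallelism configuration of the parent pipeline.
A response dict from the service.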
delete()
Deletes a Pipeline in the Workflow service.
Pipeline$delete()
A response dict from the service.
start()
Starts a Pipeline execution in the Workflow service.
Pipeline$start( parameters = NULL, execution_display_name = NULL, execution_description = NULL, parallelism_config = NULL )
parameters
(Dict[str, Union[str, bool, int, float]]): values to override pipeline parameters.
execution_display_name
(str): The display name of the pipeline execution.
execution_description
(str): A description of the execution.
parallelism_config
(Optional[ParallelismConfiguration]): Parallelism configuration that is applied to each of the executions of the pipeline. It takes precedence over the parallelism configuration of the parent pipeline.
A 'PipelineExecution' instance, if successful.
definition()
Converts a request structure to string representation for workflow service calls.
Pipeline$definition()
clone()
The objects of this class are cloneable with this method.
Pipeline$clone(deep = FALSE)
deep
Whether to make a deep clone.
Experiment config for SageMaker pipeline.
sagemaker.workflow::Entity
-> PipelineExperimentConfig
new()
Create a PipelineExperimentConfig
PipelineExperimentConfig$new(experiment_name, trial_name)
experiment_name
(Union[str, Parameter, ExecutionVariable, Expression]): the name of the experiment that will be created.
trial_name
(Union[str, Parameter, ExecutionVariable, Expression]): the name of the trial that will be created.
# Use pipeline name as the experiment name and pipeline execution id as the trial name
PipelineExperimentConfig$new(
  ExecutionVariables$PIPELINE_NAME, ExecutionVariables$PIPELINE_EXECUTION_ID)
# Use a customized experiment name and pipeline execution id as the trial name
PipelineExperimentConfig$new(
  'MyExperiment', ExecutionVariables$PIPELINE_EXECUTION_ID)
to_request()
The request structure.
PipelineExperimentConfig$to_request()
clone()
The objects of this class are cloneable with this method.
PipelineExperimentConfig$clone(deep = FALSE)
deep
Whether to make a deep clone.
## ------------------------------------------------
## Method `PipelineExperimentConfig$new`
## ------------------------------------------------
# Use pipeline name as the experiment name and pipeline execution id as the trial name
PipelineExperimentConfig$new(
  ExecutionVariables$PIPELINE_NAME, ExecutionVariables$PIPELINE_EXECUTION_ID)
# Use a customized experiment name and pipeline execution id as the trial name
PipelineExperimentConfig$new(
  'MyExperiment', ExecutionVariables$PIPELINE_EXECUTION_ID)
Enum-like class for all pipeline experiment config property references.
PipelineExperimentConfigProperties
An object of class Enum (inherits from environment) of length 2.
Reference to pipeline experiment config property.
sagemaker.workflow::Expression
-> PipelineExperimentConfigProperty
expr
The 'Get' expression dict for a pipeline experiment config property.
new()
Create a reference to a pipeline experiment config property.
PipelineExperimentConfigProperty$new(name)
name
(str): The name of the pipeline experiment config property.
clone()
The objects of this class are cloneable with this method.
PipelineExperimentConfigProperty$clone(deep = FALSE)
deep
Whether to make a deep clone.
PipelineVariables must implement the expr property.
sagemaker.workflow::PropertiesMeta
-> PipelineVariable
expr
Get the expression structure for workflow service calls.
to_string()
Prompt the pipeline to convert the pipeline variable to a string at runtime.
PipelineVariable$to_string()
startswith()
Simulate the Python string's built-in method: startswith
PipelineVariable$startswith(prefix, start = NULL, end = NULL)
prefix
(str, tuple): The (tuple of) string to be checked.
start
(int): To set the start index of the matching boundary (default: None).
end
(int): To set the end index of the matching boundary (default: None).
bool: Always returns FALSE, because pipeline variables are only resolved at execution time.
endswith()
Simulate the Python string's built-in method: endswith
PipelineVariable$endswith(suffix, start = NULL, end = NULL)
suffix
(str, tuple): The (tuple of) string to be checked.
start
(int): To set the start index of the matching boundary (default: None).
end
(int): To set the end index of the matching boundary (default: None).
bool: Always returns FALSE, because pipeline variables are only resolved at execution time.
clone()
The objects of this class are cloneable with this method.
PipelineVariable$clone(deep = FALSE)
deep
Whether to make a deep clone.
This is done by adding the required 'feature_dim' hyperparameter from training data.
prepare_amazon_algorithm_estimator(estimator, inputs, mini_batch_size = NULL)
estimator |
(sagemaker.amazon.amazon_estimator.AmazonAlgorithmEstimatorBase): An estimator for a built-in Amazon algorithm to get information from and update. |
inputs |
The training data. One of: * (sagemaker.amazon.amazon_estimator.RecordSet) - A collection of Amazon 'Record' objects serialized and stored in S3. For use with an estimator for an Amazon algorithm. * (list[sagemaker.amazon.amazon_estimator.RecordSet]) - A list of 'sagemaker.amazon.amazon_estimator.RecordSet' objects, where each instance is a different channel of training data. |
mini_batch_size |
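(numeric): The mini batch size. Specify this argument only when estimator is a built-in estimator of an Amazon algorithm. |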
S3 operations specify where to upload 'source_dir'.
prepare_framework(estimator, s3_operations)
estimator |
(sagemaker.estimator.Estimator): The framework estimator to get information from and update. |
s3_operations |
(list): The S3 operations to perform (upload 'source_dir'). |
Prepare the framework model container information. Specify related S3 operations for Airflow to perform (upload 'source_dir').
prepare_framework_container_def(model, instance_type, s3_operations)
model |
(sagemaker.model.FrameworkModel): The framework model |
instance_type |
(str): The EC2 instance type to deploy this Model to. For example, 'ml.p2.xlarge'. |
s3_operations |
(dict): The dict to specify S3 operations (upload 'source_dir'). |
dict: The container information of this framework model.
Export Airflow processing config from a SageMaker processor
processing_config( processor, inputs = NULL, outputs = NULL, job_name = NULL, experiment_config = NULL, container_arguments = NULL, container_entrypoint = NULL, kms_key_id = NULL )
processor |
(sagemaker.processor.Processor): The SageMaker processor to export Airflow config from. |
inputs |
(list[sagemaker.processing.ProcessingInput]): Input files for the processing job. These must be provided as 'sagemaker.processing.ProcessingInput' objects (default: None). |
outputs |
(list[sagemaker.processing.ProcessingOutput]): Outputs for the processing job. These can be specified as either path strings or 'sagemaker.processing.ProcessingOutput' objects (default: None). |
job_name |
(str): Processing job name. If not specified, the processor generates a default job name, based on the base job name and current timestamp. |
experiment_config |
(dict[str, str]): Experiment management configuration. Dictionary contains three optional keys: 'ExperimentName', 'TrialName', and 'TrialComponentDisplayName'. |
container_arguments |
([str]): The arguments for a container used to run a processing job. |
container_entrypoint |
([str]): The entrypoint for a container used to run a processing job. |
kms_key_id |
(str): The AWS Key Management Service (AWS KMS) key that Amazon SageMaker uses to encrypt the processing job output. KmsKeyId can be the ID, ARN, or alias of a KMS key. The KmsKeyId is applied to all outputs. |
dict: Processing config that can be directly used by SageMakerProcessingOperator in Airflow.
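The 'experiment_config' argument is described as a dictionary. A minimal sketch of how it might look in R follows, assuming the package uses the usual R convention of a named list in place of a Python dict. The experiment, trial and display names are hypothetical placeholders, and each of the three keys is optional.

```r
# Hypothetical values; each key is optional according to the argument description.
experiment_config <- list(
  ExperimentName = "my-experiment",
  TrialName = "my-trial",
  TrialComponentDisplayName = "preprocessing"
)

# Only the three documented keys should appear in the list.
allowed_keys <- c("ExperimentName", "TrialName", "TrialComponentDisplayName")
stopifnot(all(names(experiment_config) %in% allowed_keys))
```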
Processing step for workflow.
sagemaker.workflow::Entity
-> sagemaker.workflow::Step
-> sagemaker.workflow::ConfigurableRetryStep
-> ProcessingStep
arguments
The arguments dict that is used to call 'create_processing_job'. NOTE: The CreateProcessingJob request is not quite the args list that workflow needs. ProcessingJobName and ExperimentConfig cannot be included in the arguments.
properties
A Properties object representing the DescribeProcessingJobResponse data model.
new()
Construct a ProcessingStep, given a 'Processor' instance. In addition to the processor instance, the other arguments are those that are supplied to the 'process' method of the 'sagemaker.processing.Processor'.
ProcessingStep$new( name, processor, display_name = NULL, description = NULL, inputs = NULL, outputs = NULL, job_arguments = NULL, code = NULL, property_files = NULL, cache_config = NULL, depends_on = NULL, retry_policies = NULL, kms_key = NULL )
name
(str): The name of the processing step.
processor
(Processor): A 'sagemaker.processing.Processor' instance.
display_name
(str): The display name of the processing step.
description
(str): The description of the processing step.
inputs
(List[ProcessingInput]): A list of 'sagemaker.processing.ProcessingInput' instances. Defaults to 'None'.
outputs
(List[ProcessingOutput]): A list of 'sagemaker.processing.ProcessingOutput' instances. Defaults to 'None'.
job_arguments
(List[str]): A list of strings to be passed into the processing job. Defaults to 'None'.
code
(str): This can be an S3 URI or a local path to a file with the framework script to run. Defaults to 'None'.
property_files
(List[PropertyFile]): A list of property files that workflow looks for and resolves from the configured processing output list.
cache_config
(CacheConfig): A 'sagemaker.workflow.steps.CacheConfig' instance.
depends_on
(List[str] or List[Step]): A list of step names or step instances this 'sagemaker.workflow.steps.ProcessingStep' depends on.
retry_policies
(List[RetryPolicy]): A list of retry policies.
kms_key
(str): The ARN of the KMS key that is used to encrypt the user code file. Defaults to 'None'.
to_request()
Get the request structure for workflow service calls.
ProcessingStep$to_request()
clone()
The objects of this class are cloneable with this method.
ProcessingStep$clone(deep = FALSE)
deep
Whether to make a deep clone.
Properties for use in workflow expressions.
sagemaker.workflow::PropertiesMeta
-> sagemaker.workflow::PipelineVariable
-> Properties
shape_name
The botocore sagemaker service model shape name.
shape_names
A list of botocore sagemaker service model shape names.
expr
The 'Get' expression dict for a 'Properties'.
new()
Create a Properties instance representing the given shape.
Properties$new( path, shape_name = NULL, shape_names = NULL, service_name = "sagemaker" )
path
(str): The parent path of the Properties instance.
shape_name
(str): The botocore sagemaker service model shape name.
shape_names
(list[str]): A list of botocore sagemaker service model shape names.
service_name
(str): The botocore service name.
clone()
The objects of this class are cloneable with this method.
Properties$clone(deep = FALSE)
deep
Whether to make a deep clone.
PropertiesList for use in workflow expressions
sagemaker.workflow::PropertiesMeta
-> sagemaker.workflow::PipelineVariable
-> sagemaker.workflow::Properties
-> PropertiesList
new()
Create a PropertiesList instance representing the given shape.
PropertiesList$new(path, shape_name = NULL, service_name = "sagemaker")
path
(str): The parent path of the PropertiesList instance.
shape_name
(str): The botocore sagemaker service model shape name.
service_name
(str): The botocore service name.
root_shape_name
(str): The botocore sagemaker service model shape name.
get_item()
Populate the indexing item with a Property, for both lists and dictionaries.
PropertiesList$get_item(item)
item
(Union[int, str]): The index of the item in sequence.
clone()
The objects of this class are cloneable with this method.
PropertiesList$clone(deep = FALSE)
deep
Whether to make a deep clone.
PropertiesMap for use in workflow expressions.
sagemaker.workflow::PropertiesMeta
-> sagemaker.workflow::PipelineVariable
-> sagemaker.workflow::Properties
-> PropertiesMap
path
The parent path of the PropertiesMap instance.
shape_name
The botocore sagemaker service model shape name.
service_name
The botocore service name.
new()
Create a PropertiesMap instance representing the given shape.
PropertiesMap$new(path, shape_name = NULL, service_name = "sagemaker")
path
(str): The parent path of the PropertiesMap instance.
shape_name
(str): The botocore sagemaker service model shape name.
service_name
(str): The botocore service name.
get_item()
Populate the indexing item with a Property, for both lists and dictionaries.
PropertiesMap$get_item(item)
item
(Union[int, str]): The index of the item in sequence.
clone()
The objects of this class are cloneable with this method.
PropertiesMap$clone(deep = FALSE)
deep
Whether to make a deep clone.
Provides a property file struct.
sagemaker.workflow::Expression
-> PropertyFile
name
The name of the property file for reference with 'JsonGet' functions.
output_name
The name of the processing job output channel.
path
The path to the file at the output channel location.
expr
Get the expression structure for workflow service calls.
new()
Initialize PropertyFile class
PropertyFile$new(name, output_name, path)
name
(str): The name of the property file for reference with 'JsonGet' functions.
output_name
(str): The name of the processing job output channel.
path
(str): The path to the file at the output channel location.
clone()
The objects of this class are cloneable with this method.
PropertyFile$clone(deep = FALSE)
deep
Whether to make a deep clone.
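As a rough illustration of the constructor above, the following sketch builds a property file that points at a JSON report written by a processing job. The values 'EvaluationReport', 'evaluation' and 'evaluation.json' are hypothetical, and the code assumes the package is installed under the name 'sagemaker.workflow' (the namespace shown in the inheritance lines).

```r
# Guarded so the snippet does nothing when the package is not installed.
property_file <- NULL
if (requireNamespace("sagemaker.workflow", quietly = TRUE)) {
  property_file <- sagemaker.workflow::PropertyFile$new(
    name = "EvaluationReport",   # hypothetical name referenced by 'JsonGet'
    output_name = "evaluation",  # hypothetical processing job output channel
    path = "evaluation.json"     # hypothetical file at the output channel location
  )
}
```

An object built this way would typically be supplied through the 'property_files' argument of 'ProcessingStep$new()'.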
Quality Check Config.
baseline_dataset
(str or PipelineNonPrimitiveInputTypes): The path to the baseline_dataset file. This can be a local path or an S3 URI string.
dataset_format
(dict): The format of the baseline_dataset.
output_s3_uri
(str or PipelineNonPrimitiveInputTypes): Desired S3 destination of the constraint_violations and statistics json files (default: None). If not specified an auto generated path will be used: "s3://<default_session_bucket>/model-monitor/baselining/<job_name>/results"
post_analytics_processor_script
(str): The path to the record post-analytics processor script (default: None). This can be a local path or an S3 uri string but CANNOT be any of PipelineNonPrimitiveInputTypes.
clone()
The objects of this class are cloneable with this method.
QualityCheckConfig$clone(deep = FALSE)
deep
Whether to make a deep clone.
QualityCheck step for workflow.
sagemaker.workflow::Entity
-> sagemaker.workflow::Step
-> QualityCheckStep
arguments
The arguments dict that is used to define the QualityCheck step.
properties
A Properties object representing the output parameters of the QualityCheck step.
new()
Constructs a QualityCheckStep.
QualityCheckStep$new( name, quality_check_config, check_job_config, skip_check = FALSE, register_new_baseline = FALSE, model_package_group_name = NULL, supplied_baseline_statistics = NULL, supplied_baseline_constraints = NULL, display_name = NULL, description = NULL, cache_config = NULL, depends_on = NULL )
name
(str): The name of the QualityCheckStep step.
quality_check_config
(QualityCheckConfig): A QualityCheckConfig instance.
check_job_config
(CheckJobConfig): A CheckJobConfig instance.
skip_check
(bool or PipelineNonPrimitiveInputTypes): Whether the check should be skipped (default: False).
register_new_baseline
(bool or PipelineNonPrimitiveInputTypes): Whether the new baseline should be registered (default: False).
model_package_group_name
(str or PipelineNonPrimitiveInputTypes): The name of a registered model package group. The baseline is fetched from the latest approved model in that group (default: None).
supplied_baseline_statistics
(str or PipelineNonPrimitiveInputTypes): The S3 path to the supplied statistics object, representing the statistics JSON file used for the drift check (default: None).
supplied_baseline_constraints
(str or PipelineNonPrimitiveInputTypes): The S3 path to the supplied constraints object, representing the constraints JSON file used for the drift check (default: None).
display_name
(str): The display name of the QualityCheckStep step (default: None).
description
(str): The description of the QualityCheckStep step (default: None).
cache_config
(CacheConfig): A 'sagemaker.workflow.steps.CacheConfig' instance (default: None).
depends_on
(List[str] or List[Step]): A list of step names or step instances this 'sagemaker.workflow.steps.QualityCheckStep' depends on (default: None).
to_request()
Updates the dictionary with cache configuration etc.
QualityCheckStep$to_request()
clone()
The objects of this class are cloneable with this method.
QualityCheckStep$clone(deep = FALSE)
deep
Whether to make a deep clone.
Register Model step collection for workflow.
sagemaker.workflow::StepCollection
-> RegisterModel
new()
Construct steps '_RepackModelStep' and '_RegisterModelStep' based on the estimator.
RegisterModel$new( name, content_types, response_types, inference_instances, transform_instances, estimator = NULL, model_data = NULL, depends_on = NULL, repack_model_step_retry_policies = NULL, register_model_step_retry_policies = NULL, model_package_group_name = NULL, model_metrics = NULL, approval_status = NULL, image_uri = NULL, compile_model_family = NULL, display_name = NULL, description = NULL, tags = NULL, model = NULL, drift_check_baselines = NULL, ... )
name
(str): The name of the step collection.
content_types
(list): The supported MIME types for the input data (default: None).
response_types
(list): The supported MIME types for the output data (default: None).
inference_instances
(list): A list of the instance types that are used to generate inferences in real-time (default: None).
transform_instances
(list): A list of the instance types on which a transformation job can be run or on which an endpoint can be deployed (default: None).
estimator
The estimator instance.
model_data
The S3 uri to the model data from training.
depends_on
(List[str] or List[Step]): The list of step names or step instances the first step in the collection depends on
repack_model_step_retry_policies
(List[RetryPolicy]): The list of retry policies for the repack model step
register_model_step_retry_policies
(List[RetryPolicy]): The list of retry policies for register model step
model_package_group_name
(str): The Model Package Group name. Mutually exclusive with 'model_package_name'; using 'model_package_group_name' makes the Model Package versioned (default: None).
model_metrics
(ModelMetrics): ModelMetrics object (default: None).
approval_status
(str): Model Approval Status, values can be "Approved", "Rejected", or "PendingManualApproval" (default: "PendingManualApproval").
image_uri
(str): The container image uri for Model Package, if not specified, Estimator's training container image is used (default: None).
compile_model_family
(str): The instance family for the compiled model. If specified, a compiled model is used (default: None).
display_name
(str): The display name of the step.
description
(str): Model Package description (default: None).
tags
(List[dict[str, str]]): The list of tags to attach to the model package group. Note that tags will only be applied to newly created model package groups; if the name of an existing group is passed to "model_package_group_name", tags will not be applied.
model
(object or Model): Either a PipelineModel object, which comprises a list of models executed as a serial inference pipeline, or a Model object.
drift_check_baselines
(DriftCheckBaselines): DriftCheckBaselines object (default: None).
...
Additional arguments passed to 'create_model'.
clone()
The objects of this class are cloneable with this method.
RegisterModel$clone(deep = FALSE)
deep
Whether to make a deep clone.
RetryPolicy base class
sagemaker.workflow::Entity
-> RetryPolicy
backoff_rate
(float): The multiplier by which the retry interval increases during each attempt (default: 2.0)
interval_seconds
(int): An integer that represents the number of seconds before the first retry attempt (default: 1)
max_attempts
(int): A positive integer that represents the maximum number of retry attempts. (default: None)
expire_after_mins
(int): A positive integer that represents the number of minutes after which no further retry attempts are made (default: None)
new()
Initialize RetryPolicy class
RetryPolicy$new( backoff_rate = DEFAULT_BACKOFF_RATE, interval_seconds = DEFAULT_INTERVAL_SECONDS, max_attempts = NULL, expire_after_mins = NULL )
backoff_rate
(float): The multiplier by which the retry interval increases during each attempt (default: 2.0)
interval_seconds
(int): An integer that represents the number of seconds before the first retry attempt (default: 1)
max_attempts
(int): A positive integer that represents the maximum number of retry attempts. (default: None)
expire_after_mins
(int): A positive integer that represents the number of minutes after which no further retry attempts are made (default: None)
validate_backoff_rate()
Validate the input back off rate type
RetryPolicy$validate_backoff_rate(value)
value
object to be checked
validate_interval_seconds()
Validate the input interval seconds
RetryPolicy$validate_interval_seconds(value)
value
object to be checked
validate_max_attempts()
Validate the input max attempts
RetryPolicy$validate_max_attempts(value)
value
object to be checked
validate_expire_after_mins()
Validate expire after mins
RetryPolicy$validate_expire_after_mins(value)
value
object to be checked
to_request()
Get the request structure for workflow service calls.
RetryPolicy$to_request()
format()
format class
RetryPolicy$format()
clone()
The objects of this class are cloneable with this method.
RetryPolicy$clone(deep = FALSE)
deep
Whether to make a deep clone.
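The interaction of 'interval_seconds' and 'backoff_rate' can be sketched in base R. The sketch assumes that the wait before retry k is interval_seconds * backoff_rate^(k - 1), which is one reading of the argument descriptions above rather than a documented formula. The helper 'retry_intervals' is hypothetical and not part of the package.

```r
# Hypothetical helper: wait (in seconds) before each retry attempt,
# assuming the interval is multiplied by backoff_rate after every attempt.
retry_intervals <- function(max_attempts, interval_seconds = 1, backoff_rate = 2.0) {
  stopifnot(max_attempts >= 1, interval_seconds >= 0, backoff_rate >= 0)
  interval_seconds * backoff_rate^(seq_len(max_attempts) - 1)
}

retry_intervals(4)                        # 1 2 4 8
retry_intervals(3, interval_seconds = 5)  # 5 10 20
```

Under this assumption the default settings double the wait on every attempt, so 'max_attempts' or 'expire_after_mins' is what bounds the total retry time.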
SageMaker Job ExceptionType enum.
SageMakerJobExceptionTypeEnum
An object of class Enum (inherits from environment) of length 3.
RetryPolicy for exception thrown by SageMaker Job.
sagemaker.workflow::Entity
-> sagemaker.workflow::RetryPolicy
-> SageMakerJobStepRetryPolicy
exception_type_list
Contains exception_types or failure_reason_types
new()
Initialize SageMakerJobStepRetryPolicy
SageMakerJobStepRetryPolicy$new( exception_types = NULL, failure_reason_types = NULL, backoff_rate = 2, interval_seconds = 1, max_attempts = NULL, expire_after_mins = NULL )
exception_types
(List[SageMakerJobExceptionTypeEnum]): The SageMaker exceptions to match for this policy. The exceptions captured here are those thrown when the job is created synchronously, for instance the resource limit exception.
failure_reason_types
(List[SageMakerJobExceptionTypeEnum]): The SageMaker failure reason types to match for this policy. The failure reason type is presented in the FailureReason field of the Describe response and indicates the runtime failure reason for a job.
backoff_rate
(float): The multiplier by which the retry interval increases during each attempt (default: 2.0)
interval_seconds
(int): An integer that represents the number of seconds before the first retry attempt (default: 1)
max_attempts
(int): A positive integer that represents the maximum number of retry attempts. (default: None)
expire_after_mins
(int): A positive integer that represents the number of minutes after which no further retry attempts are made (default: None)
to_request()
Gets the request structure for retry policy
SageMakerJobStepRetryPolicy$to_request()
clone()
The objects of this class are cloneable with this method.
SageMakerJobStepRetryPolicy$clone(deep = FALSE)
deep
Whether to make a deep clone.
Pipeline step for workflow.
sagemaker.workflow::Entity
-> Step
name
The name of the step.
display_name
The display name of the step.
description
The description of the step.
step_type
The type of the step.
depends_on
The list of step names the current step depends on
retry_policies
(List[RetryPolicy]): The custom retry policy configuration
arguments
The arguments to the particular step service call.
properties
The properties of the particular step.
ref
Gets a reference dict for steps
new()
Initialize Workflow Step
Step$new( name, display_name = NULL, description = NULL, step_type = enum_items(StepTypeEnum), depends_on = NULL )
name
(str): The name of the step.
display_name
(str): The display name of the step.
description
(str): The description of the step.
step_type
(StepTypeEnum): The type of the step.
depends_on
(List[str] or List[Step]): The list of step names or step instances the current step depends on
to_request()
Gets the request structure for workflow service calls.
Step$to_request()
add_depends_on()
Add step names to the current step's depends_on list.
Step$add_depends_on(step_names)
step_names
(list): The step names to add to the depends_on list.
format()
formats class
Step$format()
clone()
The objects of this class are cloneable with this method.
Step$clone(deep = FALSE)
deep
Whether to make a deep clone.
A wrapper of pipeline steps for workflow.
steps
A list of steps.
new()
Initialize StepCollection class
StepCollection$new(steps)
steps
(List[Step]): A list of steps.
request_list()
Get the request structure for workflow service calls.
StepCollection$request_list()
format()
format class
StepCollection$format()
clone()
The objects of this class are cloneable with this method.
StepCollection$clone(deep = FALSE)
deep
Whether to make a deep clone.
Step ExceptionType enum.
StepExceptionTypeEnum
An object of class Enum (inherits from environment) of length 2.
RetryPolicy for a retryable step. The pipeline service will retry
sagemaker.workflow::Entity
-> sagemaker.workflow::RetryPolicy
-> StepRetryPolicy
exception_types
(List[StepExceptionTypeEnum]): the exception types to match for this policy
new()
Initialize StepRetryPolicy class
StepRetryPolicy$new( exception_types, backoff_rate = 2, interval_seconds = 1, max_attempts = NULL, expire_after_mins = NULL )
exception_types
(List[StepExceptionTypeEnum]): the exception types to match for this policy
backoff_rate
(float): The multiplier by which the retry interval increases during each attempt (default: 2.0)
interval_seconds
(int): An integer that represents the number of seconds before the first retry attempt (default: 1)
max_attempts
(int): A positive integer that represents the maximum number of retry attempts. (default: None)
expire_after_mins
(int): A positive integer that represents the number of minutes after which no further retry attempts are made (default: None)
to_request()
Gets the request structure for retry policy.
StepRetryPolicy$to_request()
clone()
The objects of this class are cloneable with this method.
StepRetryPolicy$clone(deep = FALSE)
deep
Whether to make a deep clone.
Enum of step types.
StepTypeEnum
An object of class Enum (inherits from environment) of length 13.
Export Airflow base training config from an estimator
training_base_config( estimator, inputs = NULL, job_name = NULL, mini_batch_size = NULL )
estimator |
(sagemaker.estimator.EstimatorBase): The estimator to export training config from. Can be a BYO estimator, Framework estimator or Amazon algorithm estimator. |
inputs |
Information about the training data. Please refer to the 'fit()' method of the associated estimator, as this can take any of the following forms: * (str) - The S3 location where training data is saved. * (dict[str, str] or dict[str, sagemaker.inputs.TrainingInput]) - If using multiple channels for training data, you can specify a dict mapping channel names to strings or 'sagemaker.inputs.TrainingInput' objects. * (sagemaker.inputs.TrainingInput) - Channel configuration for S3 data sources that can provide additional information about the training dataset. See 'sagemaker.inputs.TrainingInput' for full details. * (sagemaker.amazon.amazon_estimator.RecordSet) - A collection of Amazon 'Record' objects serialized and stored in S3. For use with an estimator for an Amazon algorithm. * (list[sagemaker.amazon.amazon_estimator.RecordSet]) - A list of 'sagemaker.amazon.amazon_estimator.RecordSet' objects, where each instance is a different channel of training data. |
job_name |
(str): Specify a training job name if needed. |
mini_batch_size |
(int): Specify this argument only when estimator is a built-in estimator of an Amazon algorithm. For other estimators, batch size should be specified in the estimator. |
dict: Training config that can be directly used by SageMakerTrainingOperator in Airflow.
Export Airflow training config from an estimator
training_config( estimator, inputs = NULL, job_name = NULL, mini_batch_size = NULL )
estimator |
(sagemaker.estimator.EstimatorBase): The estimator to export training config from. Can be a BYO estimator, Framework estimator or Amazon algorithm estimator. |
inputs |
Information about the training data. Please refer to the 'fit()' method of the associated estimator, as this can take any of the following forms: * (str) - The S3 location where training data is saved. * (dict[str, str] or dict[str, sagemaker.inputs.TrainingInput]) - If using multiple channels for training data, you can specify a dict mapping channel names to strings or 'sagemaker.inputs.TrainingInput' objects. * (sagemaker.inputs.TrainingInput) - Channel configuration for S3 data sources that can provide additional information about the training dataset. See 'sagemaker.inputs.TrainingInput' for full details. * (sagemaker.amazon.amazon_estimator.RecordSet) - A collection of Amazon 'Record' objects serialized and stored in S3. For use with an estimator for an Amazon algorithm. * (list[sagemaker.amazon.amazon_estimator.RecordSet]) - A list of 'sagemaker.amazon.amazon_estimator.RecordSet' objects, where each instance is a different channel of training data. |
job_name |
(str): Specify a training job name if needed. |
mini_batch_size |
(int): Specify this argument only when estimator is a built-in estimator of an Amazon algorithm. For other estimators, batch size should be specified in the estimator. |
list: Training config that can be directly used by SageMakerTrainingOperator in Airflow.
Training step for workflow.
sagemaker.workflow::Entity
-> sagemaker.workflow::Step
-> sagemaker.workflow::ConfigurableRetryStep
-> TrainingStep
arguments
The arguments dict that is used to call 'create_training_job'. NOTE: The CreateTrainingJob request is not quite the args list that workflow needs. The TrainingJobName and ExperimentConfig attributes cannot be included.
properties
A Properties object representing the DescribeTrainingJobResponse data model.
new()
Construct a TrainingStep, given an 'EstimatorBase' instance. In addition to the estimator instance, the other arguments are those that are supplied to the 'fit' method of the 'sagemaker.estimator.Estimator'.
TrainingStep$new( name, estimator, display_name = NULL, description = NULL, inputs = NULL, cache_config = NULL, depends_on = NULL, retry_policies = NULL )
name
(str): The name of the training step.
estimator
(EstimatorBase): A 'sagemaker.estimator.EstimatorBase' instance.
display_name
(str): The display name of the training step.
description
(str): The description of the training step.
inputs
(str or dict or sagemaker.inputs.TrainingInput or sagemaker.inputs.FileSystemInput): Information about the training data. This can be one of four types:
(str) - the S3 location where training data is saved, or a file:// path in local mode.
(dict[str, str] or dict[str, sagemaker.inputs.TrainingInput]) - if using multiple channels for training data, you can specify a dict mapping channel names to strings or 'sagemaker.inputs.TrainingInput' objects.
(sagemaker.inputs.TrainingInput) - channel configuration for S3 data sources that can provide additional information as well as the path to the training dataset. See 'sagemaker.inputs.TrainingInput' for full details.
(sagemaker.inputs.FileSystemInput) - channel configuration for a file system data source that can provide additional information as well as the path to the training dataset.
cache_config
(CacheConfig): A 'sagemaker.workflow.steps.CacheConfig' instance.
depends_on
(List[str]): A list of step names this 'sagemaker.workflow.steps.TrainingStep' depends on
retry_policies
(List[RetryPolicy]): A list of retry policies.
to_request()
Get the request structure for workflow service calls.
TrainingStep$to_request()
clone()
The objects of this class are cloneable with this method.
TrainingStep$clone(deep = FALSE)
deep
Whether to make a deep clone.
Export Airflow transform config from a SageMaker transformer
transform_config( transformer, data, data_type = "S3Prefix", content_type = NULL, compression_type = NULL, split_type = NULL, job_name = NULL, input_filter = NULL, output_filter = NULL, join_source = NULL )
transformer |
(sagemaker.transformer.Transformer): The SageMaker transformer to export Airflow config from. |
data |
(str): Input data location in S3. |
data_type |
(str): What the S3 location defines (default: 'S3Prefix'). Valid values: * 'S3Prefix' - the S3 URI defines a key name prefix. All objects with this prefix will be used as inputs for the transform job. * 'ManifestFile' - the S3 URI points to a single manifest file listing each S3 object to use as an input for the transform job. |
content_type |
(str): MIME type of the input data (default: None). |
compression_type |
(str): Compression type of the input data, if compressed (default: None). Valid values: 'Gzip', None. |
split_type |
(str): The record delimiter for the input object (default: 'None'). Valid values: 'None', 'Line', 'RecordIO', and 'TFRecord'. |
job_name |
(str): job name (default: None). If not specified, one will be generated. |
input_filter |
(str): A JSONPath to select a portion of the input to pass to the algorithm container for inference. If you omit the field, it gets the value '$', representing the entire input. For CSV data, each row is taken as a JSON array, so only index-based JSONPaths can be applied, e.g. $[0], $[1:]. CSV data should follow the RFC 4180 format (https://tools.ietf.org/html/rfc4180). See Supported JSONPath Operators (https://docs.aws.amazon.com/sagemaker/latest/dg/batch-transform-data-processing.html#data-processing-operators) for a table of supported JSONPath operators. For more information, see the SageMaker API documentation for CreateTransformJob (https://docs.aws.amazon.com/sagemaker/latest/dg/API_CreateTransformJob.html). Some examples: "$[1:]", "$.features" (default: None). |
output_filter |
(str): A JSONPath to select a portion of the joined/original output to return as the output. For more information, see the SageMaker API documentation for CreateTransformJob (https://docs.aws.amazon.com/sagemaker/latest/dg/API_CreateTransformJob.html). Some examples: "$[1:]", "$.prediction" (default: None). |
join_source |
(str): The source of data to be joined to the transform output. It can be set to 'Input' meaning the entire input record will be joined to the inference result. You can use OutputFilter to select the useful portion before uploading to S3. (default: None). Valid values: Input, None. |
dict: Transform config that can be directly used by SageMakerTransformOperator in Airflow.
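To make the index-based JSONPath examples for 'input_filter' more concrete, the sketch below mimics in base R what '$[0]' and '$[1:]' would select from a single CSV row treated as an array. It only illustrates the selection semantics and does not show how SageMaker evaluates the filter; the row values are made up. JSONPath indices are zero-based, whereas R vectors are one-based.

```r
# One CSV row, treated as an array of fields (made-up values).
row <- c("id-001", "0.25", "0.75")

# "$[0]" selects the first element (JSONPath is zero-based).
first_field <- row[1]

# "$[1:]" selects everything from the second element onward,
# for example to drop an identifier column before inference.
remaining_fields <- row[-1]
```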
Export Airflow transform config from a SageMaker estimator
transform_config_from_estimator( estimator, task_id, task_type, instance_count, instance_type, data, data_type = "S3Prefix", content_type = NULL, compression_type = NULL, split_type = NULL, job_name = NULL, model_name = NULL, strategy = NULL, assemble_with = NULL, output_path = NULL, output_kms_key = NULL, accept = NULL, env = NULL, max_concurrent_transforms = NULL, max_payload = NULL, tags = NULL, role = NULL, volume_kms_key = NULL, model_server_workers = NULL, image_uri = NULL, vpc_config_override = NULL, input_filter = NULL, output_filter = NULL, join_source = NULL )
estimator |
(sagemaker.estimator.EstimatorBase): The SageMaker estimator to export Airflow config from. It has to be an estimator associated with a training job. |
task_id |
(str): The task id of any airflow.contrib.operators.SageMakerTrainingOperator or airflow.contrib.operators.SageMakerTuningOperator that generates training jobs in the DAG. The transform config is built based on the training job generated in this operator. |
task_type |
(str): Whether the task is from SageMakerTrainingOperator or SageMakerTuningOperator. Values can be 'training', 'tuning' or None (which means training job is not from any task). |
instance_count |
(int): Number of EC2 instances to use. |
instance_type |
(str): Type of EC2 instance to use, for example, 'ml.c4.xlarge'. |
data |
(str): Input data location in S3. |
data_type |
(str): What the S3 location defines (default: 'S3Prefix'). Valid values: * 'S3Prefix' - the S3 URI defines a key name prefix. All objects with this prefix will be used as inputs for the transform job. * 'ManifestFile' - the S3 URI points to a single manifest file listing each S3 object to use as an input for the transform job. |
content_type |
(str): MIME type of the input data (default: None). |
compression_type |
(str): Compression type of the input data, if compressed (default: None). Valid values: 'Gzip', None. |
split_type |
(str): The record delimiter for the input object (default: 'None'). Valid values: 'None', 'Line', 'RecordIO', and 'TFRecord'. |
job_name |
(str): transform job name (default: None). If not specified, one will be generated. |
model_name |
(str): model name (default: None). If not specified, one will be generated. |
strategy |
(str): The strategy used to decide how to batch records in a single request (default: None). Valid values: 'MultiRecord' and 'SingleRecord'. |
assemble_with |
(str): How the output is assembled (default: None). Valid values: 'Line' or 'None'. |
output_path |
(str): S3 location for saving the transform result. If not specified, results are stored to a default bucket. |
output_kms_key |
(str): Optional. KMS key ID for encrypting the transform output (default: None). |
accept |
(str): The accept header passed by the client to the inference endpoint. If it is supported by the endpoint, it will be the format of the batch transform output. |
env |
(dict): Environment variables to be set for use during the transform job (default: None). |
max_concurrent_transforms |
(int): The maximum number of HTTP requests to be made to each individual transform container at one time. |
max_payload |
(int): Maximum size of the payload in a single HTTP request to the container in MB. |
tags |
(list[dict]): List of tags for labeling a transform job. If none specified, then the tags used for the training job are used for the transform job. |
role |
(str): The 'ExecutionRoleArn' IAM Role ARN for the 'Model', which is also used during transform jobs. If not specified, the role from the Estimator will be used. |
volume_kms_key |
(str): Optional. KMS key ID for encrypting the volume attached to the ML compute instance (default: None). |
model_server_workers |
(int): Optional. The number of worker processes used by the inference server. If None, server will use one worker per vCPU. |
image_uri |
(str): A Docker image URI to use for deploying the model. |
vpc_config_override |
(dict[str, list[str]]): Override for VpcConfig set on the model. Default: use subnets and security groups from this Estimator. * 'Subnets' (list[str]): List of subnet ids. * 'SecurityGroupIds' (list[str]): List of security group ids. |
input_filter |
(str): A JSONPath to select a portion of the input to pass to the algorithm container for inference. If you omit the field, it gets the value '$', representing the entire input. For CSV data, each row is taken as a JSON array, so only index-based JSONPaths can be applied, e.g. $[0], $[1:]. CSV data should follow the RFC 4180 format (https://tools.ietf.org/html/rfc4180). See Supported JSONPath Operators (https://docs.aws.amazon.com/sagemaker/latest/dg/batch-transform-data-processing.html#data-processing-operators) for a table of supported JSONPath operators. For more information, see the SageMaker API documentation for CreateTransformJob (https://docs.aws.amazon.com/sagemaker/latest/dg/API_CreateTransformJob.html). Some examples: "$[1:]", "$.features" (default: None). |
output_filter |
(str): A JSONPath to select a portion of the joined/original output to return as the output. For more information, see the SageMaker API documentation for CreateTransformJob (https://docs.aws.amazon.com/sagemaker/latest/dg/API_CreateTransformJob.html). Some examples: "$[1:]", "$.prediction" (default: None). |
join_source |
(str): The source of data to be joined to the transform output. It can be set to 'Input' meaning the entire input record will be joined to the inference result. You can use OutputFilter to select the useful portion before uploading to S3. (default: None). Valid values: Input, None. |
dict: Transform config that can be directly used by SageMakerTransformOperator in Airflow.
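The interplay of 'input_filter', 'join_source' and 'output_filter' is easier to see on a single record. The sketch below does not call SageMaker; it only mimics, in base R, what the three settings would select for one CSV row. The record, the prediction value and the output filter "$[0,-1]" are assumptions chosen for illustration (the latter relies on the union and negative-index operators listed in the AWS table of supported JSONPath operators).

```r
# One CSV record, read as a JSON array: an ID followed by three features.
record <- c("id-001", "0.5", "1.2", "3.4")

# input_filter = "$[1:]" keeps everything after the first element.
# JSONPath indices are 0-based; R indices are 1-based.
model_input <- record[-1]

# Hypothetical stand-in for the value returned by the model container.
prediction <- "0.87"

# join_source = "Input" appends the inference result to the full input record.
joined <- c(record, prediction)

# output_filter = "$[0,-1]" keeps the first and last elements (ID and prediction).
output <- joined[c(1, length(joined))]

print(model_input)
print(output)
```

Filtering the ID out before inference and joining it back afterwards keeps each prediction traceable to its input row without the model ever seeing the ID column.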
Transform step for workflow.
sagemaker.workflow::Entity
-> sagemaker.workflow::Step
-> sagemaker.workflow::ConfigurableRetryStep
-> TransformStep
arguments
The arguments dict that is used to call 'create_transform_job'. NOTE: The CreateTransformJob request is not quite the args list that workflow needs. TransformJobName and ExperimentConfig cannot be included in the arguments.
properties
A Properties object representing the DescribeTransformJobResponse data model.
new()
Constructs a TransformStep, given a 'Transformer' instance. In addition to the transformer instance, the other arguments are those that are supplied to the 'transform' method of the 'sagemaker.transformer.Transformer'.
TransformStep$new( name, transformer, inputs, display_name = NULL, description = NULL, cache_config = NULL, depends_on = NULL, retry_policies = NULL )
name
(str): The name of the transform step.
transformer
(Transformer): A 'sagemaker.transformer.Transformer' instance.
inputs
(TransformInput): A 'sagemaker.inputs.TransformInput' instance.
display_name
(str): The display name of the transform step.
description
(str): The description of the transform step.
cache_config
(CacheConfig): A 'sagemaker.workflow.steps.CacheConfig' instance.
depends_on
(List[str]): A list of step names this 'sagemaker.workflow.steps.TransformStep' depends on.
retry_policies
(List[RetryPolicy]): A list of retry policies.
to_request()
Updates the dictionary with cache configuration.
TransformStep$to_request()
clone()
The objects of this class are cloneable with this method.
TransformStep$clone(deep = FALSE)
deep
Whether to make a deep clone.
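A minimal sketch of how the constructor arguments fit together, using only the 'TransformStep$new()' and 'CacheConfig$new()' signatures documented in this manual. The helper name 'make_transform_step', the default step name and the one-hour cache window are assumptions for illustration; 'transformer' and 'inputs' must be objects the caller has already built, and the package is only touched when the helper is actually called.

```r
# Hypothetical helper: wraps TransformStep$new() with step caching enabled.
make_transform_step <- function(transformer, inputs, step_name = "BatchTransform") {
  # Reuse a successful earlier execution if it is at most one hour old.
  cache <- sagemaker.workflow::CacheConfig$new(
    enable_caching = TRUE,
    expire_after = "PT1H"  # ISO 8601 duration
  )
  sagemaker.workflow::TransformStep$new(
    name = step_name,
    transformer = transformer,
    inputs = inputs,
    cache_config = cache
  )
}
```

Calling 'make_transform_step(my_transformer, my_inputs)' would then return a step that can be placed in a pipeline definition.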
Export Airflow tuning config from a HyperparameterTuner
tuning_config( tuner, inputs, job_name = NULL, include_cls_metadata = FALSE, mini_batch_size = NULL )
tuner |
(sagemaker.tuner.HyperparameterTuner): The tuner to export tuning config from. |
inputs |
Information about the training data. Please refer to the 'fit()' method of the associated estimator in the tuner, as this can take any of the following forms: * (str) - The S3 location where training data is saved. * (dict[str, str] or dict[str, sagemaker.inputs.TrainingInput]) - If using multiple channels for training data, you can specify a dict mapping channel names to strings or 'sagemaker.inputs.TrainingInput' objects. * (sagemaker.inputs.TrainingInput) - Channel configuration for S3 data sources that can provide additional information about the training dataset. See 'sagemaker.inputs.TrainingInput' for full details. * (sagemaker.amazon.amazon_estimator.RecordSet) - A collection of Amazon 'Record' objects serialized and stored in S3. For use with an estimator for an Amazon algorithm. * (list[sagemaker.amazon.amazon_estimator.RecordSet]) - A list of 'sagemaker.amazon.amazon_estimator.RecordSet' objects, where each instance is a different channel of training data. * (dict[str, one of the forms above]) - Required only by tuners created via the factory method 'HyperparameterTuner.create()'. The keys should be the same estimator names as the keys of the 'estimator_list' argument of the 'HyperparameterTuner.create()' method. |
job_name |
(str): Specify a tuning job name if needed. |
include_cls_metadata |
It can take one of the following two forms. * (bool) - Whether or not the hyperparameter tuning job should include information about the estimator class (default: False). This information is passed as a hyperparameter, so if the algorithm you are using cannot handle unknown hyperparameters (e.g. an Amazon SageMaker built-in algorithm that does not have a custom estimator in the Python SDK), then set 'include_cls_metadata' to 'False'. * (dict[str, bool]) - This form should be used for tuners created via the factory method 'HyperparameterTuner.create()', to specify the flag for individual estimators provided in the 'estimator_list' argument of the method. The keys are the same estimator names as in 'estimator_list'. If an estimator doesn't need the flag set, there is no need to include it in the dictionary. If none of the estimators need the flag set, an empty dictionary '{}' must be used. |
mini_batch_size |
It can take one of the following two forms. * (int) - Specify this argument only when the estimator is a built-in estimator of an Amazon algorithm. For other estimators, batch size should be specified in the estimator. * (dict[str, int]) - This form should be used for tuners created via the factory method 'HyperparameterTuner.create()', to specify the value for individual estimators provided in the 'estimator_list' argument of the method. The keys are the same estimator names as in 'estimator_list'. If an estimator doesn't need the value set, there is no need to include it in the dictionary. If none of the estimators need the value set, an empty dictionary '{}' must be used. |
list: Tuning config that can be directly used by SageMakerTuningOperator in Airflow.
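The per-estimator forms of 'include_cls_metadata' and 'mini_batch_size' can be sketched as named lists, on the assumption that a named list is the R counterpart of the dict described above and that 'list()' stands in for the empty dictionary. The estimator names and the helper 'check_keys' are hypothetical and not part of the package; the helper only checks that every key matches a known estimator name.

```r
# Hypothetical estimator names, as they would appear in 'estimator_list'.
estimator_names <- c("estimator_a", "estimator_b")

# Only list the estimators that need a value; use list() when none do.
include_cls_metadata <- list(estimator_a = TRUE)
mini_batch_size <- list(estimator_b = 128L)

# Check that every key refers to a known estimator name.
check_keys <- function(per_estimator, known_names) {
  unknown <- setdiff(names(per_estimator), known_names)
  if (length(unknown) > 0) {
    stop("Unknown estimator name(s): ", paste(unknown, collapse = ", "))
  }
  invisible(TRUE)
}

check_keys(include_cls_metadata, estimator_names)
check_keys(mini_batch_size, estimator_names)
check_keys(list(), estimator_names)  # the "no estimator needs it" case
```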
Tuning step for workflow.
sagemaker.workflow::Entity
-> sagemaker.workflow::Step
-> sagemaker.workflow::ConfigurableRetryStep
-> TuningStep
arguments
The arguments dict that is used to call 'create_hyper_parameter_tuning_job'. NOTE: The CreateHyperParameterTuningJob request is not quite the args list that workflow needs. The HyperParameterTuningJobName attribute cannot be included.
properties
A Properties object representing the 'DescribeHyperParameterTuningJobResponse' and 'ListTrainingJobsForHyperParameterTuningJobResponse' data models.
new()
Construct a TuningStep, given a 'HyperparameterTuner' instance. In addition to the tuner instance, the other arguments are those that are supplied to the 'fit' method of the 'sagemaker.tuner.HyperparameterTuner'.
TuningStep$new( name, tuner, display_name = NULL, description = NULL, inputs = NULL, job_arguments = NULL, cache_config = NULL, depends_on = NULL, retry_policies = NULL )
name
(str): The name of the tuning step.
tuner
(HyperparameterTuner): A 'sagemaker.tuner.HyperparameterTuner' instance.
display_name
(str): The display name of the tuning step.
description
(str): The description of the tuning step.
inputs
Information about the training data. Please refer to the 'fit()' method of the associated estimator, as this can take any of the following forms:
(str) - The S3 location where training data is saved.
(dict[str, str] or dict[str, sagemaker.inputs.TrainingInput]) - If using multiple channels for training data, you can specify a dict mapping channel names to strings or 'sagemaker.inputs.TrainingInput' objects.
(sagemaker.inputs.TrainingInput) - Channel configuration for S3 data sources that can provide additional information about the training dataset. See 'sagemaker.inputs.TrainingInput' for full details.
(sagemaker.session.FileSystemInput) - Channel configuration for a file system data source that can provide additional information as well as the path to the training dataset.
(sagemaker.amazon.amazon_estimator.RecordSet) - A collection of Amazon 'Record' objects serialized and stored in S3. For use with an estimator for an Amazon algorithm.
(sagemaker.amazon.amazon_estimator.FileSystemRecordSet) - Amazon SageMaker channel configuration for a file system data source for Amazon algorithms.
(list[sagemaker.amazon.amazon_estimator.RecordSet]) - A list of 'sagemaker.amazon.amazon_estimator.RecordSet' objects, where each instance is a different channel of training data.
(list[sagemaker.amazon.amazon_estimator.FileSystemRecordSet]) - A list of 'sagemaker.amazon.amazon_estimator.FileSystemRecordSet' objects, where each instance is a different channel of training data.
job_arguments
(List[str]): A list of strings to be passed into the processing job. Defaults to 'None'.
cache_config
(CacheConfig): A 'sagemaker.workflow.steps.CacheConfig' instance.
depends_on
(List[str] or List[Step]): A list of step names or step instances this 'sagemaker.workflow.steps.TuningStep' depends on.
retry_policies
(List[RetryPolicy]): A list of retry policies.
to_request()
Updates the dictionary with cache configuration.
TuningStep$to_request()
get_top_model_s3_uri()
Get the model artifact S3 URI from the top performing training jobs.
TuningStep$get_top_model_s3_uri(top_k, s3_bucket, prefix = "")
top_k
(int): The index of the top performing training job. The tuning step stores up to 50 top performing training jobs, so valid top_k values range from 0 to 49. The best training job model is at index 0.
s3_bucket
(str): The S3 bucket to store the training job output artifact.
prefix
(str): The S3 key prefix to store the training job output artifact.
clone()
The objects of this class are cloneable with this method.
TuningStep$clone(deep = FALSE)
deep
Whether to make a deep clone.
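Because 'top_k' is a 0-based rank rather than a count, an off-by-one mistake is easy to make. The sketch below guards the documented 0 to 49 range before delegating to 'get_top_model_s3_uri()'. Both helper names are hypothetical and not part of the package; 'tuning_step' is assumed to be an existing 'TuningStep' object.

```r
# top_k is a 0-based rank: 0 is the best training job, 49 the last one kept.
validate_top_k <- function(top_k) {
  ok <- is.numeric(top_k) && length(top_k) == 1 && !is.na(top_k) &&
    top_k %% 1 == 0 && top_k >= 0 && top_k <= 49
  if (!ok) stop("top_k must be a single whole number between 0 and 49")
  as.integer(top_k)
}

# Hypothetical convenience wrapper: S3 URI of the best model (rank 0).
best_model_uri <- function(tuning_step, s3_bucket, prefix = "") {
  tuning_step$get_top_model_s3_uri(
    top_k = validate_top_k(0),
    s3_bucket = s3_bucket,
    prefix = prefix
  )
}
```

Asking for the second-best model would use 'top_k = 1' in the same call, not 'top_k = 2'.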
Update training job of the estimator from a task in the DAG
update_estimator_from_task(estimator, task_id, task_type)
estimator |
(sagemaker.estimator.EstimatorBase): The estimator to update |
task_id |
(str): The task id of any airflow.contrib.operators.SageMakerTrainingOperator or airflow.contrib.operators.SageMakerTuningOperator that generates training jobs in the DAG. |
task_type |
(str): Whether the task is from SageMakerTrainingOperator or SageMakerTuningOperator. Values can be 'training', 'tuning' or None (which means training job is not from any task). |
Update the S3 URI of the framework source directory in the given estimator.
update_submit_s3_uri(estimator, job_name)
estimator |
(sagemaker.estimator.Framework): The Framework estimator to update. |
job_name |
(str): The new job name included in the submit S3 URI |
str: The updated S3 URI of framework source directory