
Increase Length Limit for Arguments to Steps #5224

Open
@eugeneyarovoi

Description


Describe the feature you'd like
Consider raising the character limit on arguments passed to scripts a little. Right now, each parameter passed to a step through job_arguments must be no more than 256 characters. This is an oppressive limit. Consider:

arguments = [
    "--s3_data_path",
    s3_path,
    "--model_description",
    # This, whether it's a Parameter or a str, cannot be > 256 chars
    model_description,
]

preprocessing_step = ProcessingStep(
    name=...,
    processor=...,
    code=...,
    job_arguments=arguments,
)

As a result, when building pipelines, you end up having to upload even something as simple as a description string to a file on S3. What's worse is that if model_description in the example above is a Parameter, the mismatch gets stranger: a Parameter can legally be up to 1024 characters, so not even every Parameter can be passed to a Step! Whether the above code breaks, when model_description is a Parameter, depends on the Parameter's runtime value. Surely this is a very sad state of affairs.

Even just fairly modest increases to the size limits would greatly improve quality-of-life. At the very least, the size limit should match the size limit of ParameterString, at 1024 characters, so that any Parameter can be passed as a step argument. Ideally, for both ParameterString and the argument length, the limit would be increased further to a number more like 10k. That would allow any small content that's reasonable to have as a Parameter (e.g. small text blob) to be a Parameter while things like data files, etc. would still have to be uploaded to S3.

I am fairly confident that this could be done without hitting command length limitations on any modern shell. If the issue is that there may be up to 100 parameters, impose an overall length limit on the shell command so that each individual parameter does not need to be so constrained.
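To make the suggestion concrete, here is a minimal sketch of what an overall-budget check could look like instead of a per-argument cap. The limit value and function name are hypothetical, purely for illustration; this is not the SDK's actual validation logic.

```python
# Hypothetical sketch: validate job_arguments against one shared length
# budget instead of capping each argument at 256 characters. The constant
# and function are illustrative, not part of the SageMaker SDK.

OVERALL_LIMIT = 25_600  # e.g. 100 args x 256 chars, spent as a shared budget


def validate_job_arguments(arguments):
    """Reject the argument list only if its *total* length exceeds the budget."""
    total = sum(len(str(a)) for a in arguments)
    if total > OVERALL_LIMIT:
        raise ValueError(
            f"job_arguments total length {total} exceeds limit {OVERALL_LIMIT}"
        )
    return arguments


# Under this scheme, a single 10k-character description fits easily,
# because the budget is shared rather than applied per argument:
validate_job_arguments(["--model_description", "x" * 10_000])
```

The point of the sketch is that the shell-command-length concern can be addressed globally, leaving individual arguments free to be long.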

How would this feature be used? Please describe.
Users could pass more data directly, without having to put things in S3 files all the time, greatly improving the usability of SageMaker pipelines.

Describe alternatives you've considered
The workaround is that any content even potentially exceeding 256 characters must go in an S3 file.
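As an illustration of how tedious that workaround is, here is a minimal sketch of the spill-to-S3 pattern. The helper name and key scheme are hypothetical, and the upload step is injected as a callable so the sketch stands on its own; in real code it would be something like boto3.client("s3").put_object(Bucket=bucket, Key=key, Body=value).

```python
# Hypothetical sketch of the workaround: values over 256 characters are
# uploaded to S3 and replaced by their S3 URI, which the processing script
# must then download and read. Names and key layout are illustrative only.
import hashlib

ARG_LIMIT = 256  # the current per-argument cap being worked around


def inline_or_s3(value, bucket, upload):
    """Pass short values inline; spill long ones to S3 and pass the URI.

    `upload(bucket, key, body)` stands in for an S3 put_object call.
    """
    if len(value) <= ARG_LIMIT:
        return value
    # Content-addressed key so repeated values reuse the same object
    key = f"step-args/{hashlib.sha256(value.encode()).hexdigest()}.txt"
    upload(bucket, key, value)
    return f"s3://{bucket}/{key}"
```

For example, a 300-character model description would come back as an s3:// URI that the step's script then has to fetch, while a short description would pass through unchanged.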
