Description
Describe the feature you'd like
Consider raising the character limit on arguments passed to scripts a little. Right now, each parameter passed to a step through job_arguments must be no more than 256 characters. This is an oppressive limit. Consider:
arguments = [
    "--s3_data_path",
    s3_path,
    "--model_description",
    # This, whether it's a Parameter or a str, cannot be > 256 chars
    model_description,
]
preprocessing_step = ProcessingStep(
    name=...,
    processor=...,
    code=...,
    job_arguments=arguments,
)
As a result, when building pipelines, you end up having to upload even something as simple as a description string to a file on S3. What's worse, if model_description in the example above is a Parameter, its value can legally be up to 1024 characters. So not even every Parameter can be passed to a Step! If model_description is a Parameter, whether the above code breaks depends on the Parameter's runtime value, which is not known when the pipeline is defined. Surely this is a very sad state of affairs.
Even fairly modest increases to the size limits would greatly improve quality of life. At the very least, the argument limit should match the size limit of ParameterString, at 1024 characters, so that any Parameter can be passed as a step argument. Ideally, for both ParameterString and the argument length, the limit would be increased further, to something more like 10k characters. That would allow any small content that is reasonable to have as a Parameter (e.g. a small text blob) to be a Parameter, while things like data files would still have to be uploaded to S3.
I am fairly confident that this could be done without hitting command length limitations on any modern shell. If the issue is that there may be up to 100 parameters, impose an overall length limit on the shell command so that each individual parameter does not need to be so constrained.
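To make the overall-limit idea concrete, here is a rough sketch of what such a check could look like. It is not how the service validates arguments today. The 100-argument cap comes from the paragraph above; the total budget of 25,600 characters (100 x 256, the current worst case) and the function name check_total_length are hypothetical choices for illustration only.

```python
from typing import Sequence

MAX_ARGS = 100            # argument count mentioned above
MAX_TOTAL_CHARS = 25_600  # hypothetical budget: 100 args x 256 chars


def check_total_length(
    arguments: Sequence[str],
    max_args: int = MAX_ARGS,
    max_total: int = MAX_TOTAL_CHARS,
) -> int:
    """Return the combined length of all arguments, or raise ValueError.

    Only the sum is constrained, so one long argument is fine as long
    as the whole command stays within the budget.
    """
    if len(arguments) > max_args:
        raise ValueError(
            f"{len(arguments)} arguments exceeds the cap of {max_args}"
        )
    total = sum(len(arg) for arg in arguments)
    if total > max_total:
        raise ValueError(
            f"arguments total {total} characters, budget is {max_total}"
        )
    return total
```

Under a rule like this, a single 1024-character description would pass, while 100 such arguments together would still be rejected.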
How would this feature be used? Please describe.
Users could pass more data directly as arguments, without having to put things in S3 files all the time, which would greatly increase the usability of SageMaker pipelines.
Describe alternatives you've considered
The workaround is that any content even potentially exceeding 256 characters must go in an S3 file.
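For reference, the script side of that workaround might look roughly like the sketch below. It assumes the description has been uploaded to S3 and made available to the job as a local file (for example through a processing input), and that the script receives the local path instead of the text itself. The flag name --model_description_file and the function load_description are hypothetical, not part of the SDK.

```python
import argparse
from pathlib import Path
from typing import List


def load_description(argv: List[str]) -> str:
    """Read the model description from a file path passed as an argument.

    The argument stays short (just a path), so it fits in 256 characters
    even when the description itself does not.
    """
    parser = argparse.ArgumentParser()
    # Hypothetical flag: a local path where the S3 object was downloaded.
    parser.add_argument("--model_description_file", type=Path, required=True)
    args = parser.parse_args(argv)
    return args.model_description_file.read_text(encoding="utf-8")
```

This works, but it means an extra upload, an extra input on the step, and extra parsing code for what is conceptually a single string argument.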