Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enhancement: Add custom formatting specifiers for uppercase, lowercase, etc. in templates #5736

Open
2 tasks done
BigRoy opened this issue Oct 9, 2023 · 7 comments
Open
2 tasks done
Assignees
Labels
community contribution type: enhancement Enhancements to existing functionality

Comments

@BigRoy
Copy link
Collaborator

BigRoy commented Oct 9, 2023

Is there an existing issue for this?

  • I have searched the existing issues.

Please describe the feature you have in mind and explain what the current shortcomings are?

It might be worth using a custom formatter in OpenPype so we don't need all the keys like Task task, TASK etc. in the data and can actually explicitly state what we want to have done with the value.

It would allow explicitly forcing lowercase, uppercase or whatever we want as custom formatting. 🙂
Plus, you'd just need to just extend the anatomy formatting class and don't need to actually wrangle all data. It'd then automatically work for all data formatted with anatomy.

Simpler. (And faster potentially since the formatting would only occur when requested instead of preparing the data beforehand).
So uppercase task would e.g. be {task:upper} or {task!u} or whatever we want to use as format specifier or solution.

How would you imagine the implementation of the feature?

Custom value conversions

An example of specifying custom conversion specs for formatting can be found here which shows how to add e.g. {myvar!u} to uppercase a string or {myvar!l} to lowercase a string.

Here's an example of that:

from string import Formatter

class ExtendedFormatter(Formatter):
    """An extended format string formatter

    Formatter with extended conversion symbol
    """
    def convert_field(self, value, conversion):
        """ Extend conversion symbol
        
        Conversions:
        * l: convert to string and low case
        * u: convert to string and up case

        """
        if conversion == "u":
            return str(value).upper()
        elif conversion == "l":
            return str(value).lower()
        # Do the default conversion or raise error if no matching conversion found
        return super(ExtendedFormatter, self).convert_field(value, conversion)


# Example
template = "{task!u}/{asset!l}/v{version:03d}"
data = {
  "task": "anim",
  "asset": "char_SuperHero",
  "version": 1
}
formatter = ExtendedFormatter()
output = formatter.format(template, **data)
print(output)
# ANIM/char_superhero/v001

This requires the conversion character to start with ! and the conversion can only be a single character.

Custom field format specifiers

However, it's also possible to define custom field conversions - where we can use the format spec to implement the same but allowing e.g. longer readable specs like :lower and :upper

from string import Formatter

class MyFormatter(Formatter):
    def format_field(self, value, format_spec):
        if format_spec == "upper":
            return str(value).upper()
        elif format_spec == "lower":
            return str(value).lower()
        
        return super(MyFormatter, self).format_field(value, format_spec)


# Example
template_str = "{task:upper}/{asset:lower}/{subset:lower}/v{version:03d}"
data = {
  "task": "anim",
  "asset": "char_SuperHero",
  "subset": "helloWorld",
  "version": 1
}
myformatter = MyFormatter()
output = myformatter.format(template_str, **data)
print(output)
# ANIM/char_superhero/helloworld/v001

Maybe low and up are nicer because they are shorter maybe?

These could be implemented on the StringTemplate class when formatting.

Custom field format specifiers - uppercase only first character

We currently also support {Task} to only uppercase the first character - we can take that same logic and implement it as well:

from string import Formatter

class MyFormatter(Formatter):
    def format_field(self, value, format_spec):
        if format_spec == "upper":
            return str(value).upper()
        elif format_spec == "lower":
            return str(value).lower()
        elif format_spec == "upperfirst":
            value = str(value)
            if value:
                # Uppercase first character, leave rest as is
                return "".join([value[0].upper(), value[1:]])
            else:
                return value
        
        return super(MyFormatter, self).format_field(value, format_spec)


# Example
template_str = "{task:upper}/{asset:lower}/{family}{task:upperfirst}{variant:upperfirst}/v{version:03d}"
data = {
  "task": "anim",
  "asset": "char_SuperHero",
  "family": "render",
  "variant": "main",
  "version": 1
}
myformatter = MyFormatter()
output = myformatter.format(template_str, **data)
print(output)
# ANIM/char_superhero/renderAnimMain/v001

Are there any labels you wish to add?

  • I have added the relevant labels to the enhancement request.

Describe alternatives you've considered:

The alternative is basically the hassle we've been riding with currently where all data needs to be prepared with: {task}, {Task}, {TASK}, etc.

It would mean we'd need to preformat all data, which can be slow due to the fact that we're likely formatting a lot of versions that we don't end up using in the data. Also with these keys it's unclear what e.g. would be a key for force lowercase formatting.

Additional context:

Originally proposed on Discord

[cuID:OP-7100]

@BigRoy BigRoy added the type: enhancement Enhancements to existing functionality label Oct 9, 2023
@BigRoy BigRoy changed the title Enhancement: Add custom formatting specifiers for uppercase, lowercase, etc. Enhancement: Add custom formatting specifiers for uppercase, lowercase, etc. in templates Oct 9, 2023
@MustafaJafar
Copy link
Contributor

MustafaJafar commented Oct 16, 2023

Simply Brilliant!
I like the idea of adding extended formatter, I think we may be able to combine the two proposed ideas
e.g.
resolve : User -> user:upperFirst
then use your code above.

@MustafaJafar
Copy link
Contributor

I've made a quick prototype.
I believe this function can be imporved!

def my_formatter(my_dict, key):
    # Test if only the first letter is uppercase
    if key[0].isupper() and not key.isupper() :
        return my_dict[key.lower()].capitalize()
        
    # Test whether all letters are uppercase
    elif key.isupper() :
        return my_dict[key.lower()].upper()
    
    # I'm forcing lower case because someone may write stupid thing like "uSER"
    return my_dict[key.lower()]


phonebook = {"user" : "mustafa"}

print( my_formatter(phonebook, "user"))
print( my_formatter(phonebook, "User"))
print( my_formatter(phonebook, "USER"))

@BigRoy
Copy link
Collaborator Author

BigRoy commented Oct 16, 2023

resolve : User -> user:upperFirst
then use your code above.

Yes, we could do that for backwards compatibility for the time being - but of course having to parse the keys all the time would be slower than just formatting what gets requested.

Technically we could override Formatter.parse to convert certain cases as legacy like you described:

from string import Formatter

class MyFormatter(Formatter):
    
    def format_field(self, value, format_spec):
        if format_spec == "upper":
            return str(value).upper()
        elif format_spec == "lower":
            return str(value).lower()
        elif format_spec == "upperfirst":
            value = str(value)
            if value:
                # Uppercase first character, leave rest as is
                return "".join([value[0].upper(), value[1:]])
            else:
                return value
        
        return super(MyFormatter, self).format_field(value, format_spec)

    def parse(self, format_string):
        for (literal_text, field_name, format_spec, conversion) in super(MyFormatter, self).parse(format_string):
            if field_name and not format_spec:
                # Allow legacy uppercase conversions
                if field_name[0].isupper() and not field_name[1:].isupper():
                    # If field name has only uppercase first character 
                    field_name = field_name.lower()
                    format_spec = "upperfirst"
                elif field_name.isupper():
                    # All characters are uppercase
                    field_name = field_name.lower()
                    format_spec = "upper"
            yield (literal_text, field_name, format_spec, conversion)
                
        
formatter = MyFormatter()
text = """
upper: {HELLO} {WORLD}
upperfirst: {Hello} {World}
regular: {hello} {world}
"""
data = {"world": "world", "hello": "hello"}
result = formatter.format(text, **data)
print(result)

Prints:

upper: HELLO WORLD
upperfirst: Hello World
regular: hello world

@BigRoy
Copy link
Collaborator Author

BigRoy commented Oct 16, 2023

Note that the above example I provided would be problematic with formatting e.g. environment variables because the keys there are always all uppercase and thus would be considered the upper format specifier, e.g.:

formatter.format("{AVALON_TASK}", **{"AVALON_TASK": "test"})

This will result in a key error because avalon_task key is not provided.

@iLLiCiTiT
Copy link
Member

iLLiCiTiT commented Nov 9, 2023

Ad backwards compatibility. I don't think we should handle backwards compatibility in the formatting logic. We should still pass in the same data as we do now, until it's fully deprecated.

I was thinking about implementing callable attributes access in the formatter, instead of using formatting description.
That requires formatter_field_name_split from _string because it is C implementation, but I think it is more "intuitive" to use.

import re
from string import Formatter, _string


class Akwargs:
    def __init__(self, args_str):
        args, kwargs = self._parse(args_str)
        self.args = tuple(args)
        self.kwargs = kwargs

    def _convert_value(self, value):
        if value == "True":
            return True
        if value == "False":
            return False
        if value == "None":
            return None

        try:
            return float(value)
        except ValueError:
            pass

        try:
            return int(value)
        except ValueError:
            pass
        return value.strip('"')

    def _parse(self, args_str):
        parts = args_str.split(",")
        args = []
        kwargs = {}
        for part in parts:
            part = part.strip(" ")
            if not part:
                continue
            if "=" in part:
                key, value = part.split("=")
                kwargs[key] = self._convert_value(value)
            else:
                args.append(self._convert_value(part))
        return args, kwargs


class MyFormatter(Formatter):
    _char_regex = re.compile(r"[a-zA-Z0-9]")

    def _upperfirst(self, value):
        capitalized = ""
        for idx in range(len(value or "")):
            char = value[idx]
            if not self._char_regex.match(char):
                capitalized += char
            else:
                capitalized += char.upper()
                capitalized += value[idx + 1:]
                break
        return capitalized

    def _parse_func_args_kwargs(self, attr_name):
        if "(" not in attr_name:
            return False, tuple(), dict()

        attr_name, args = attr_name.split("(", 1)
        args_kwargs = Akwargs(args.strip(")"))
        return attr_name, True, args_kwargs.args, args_kwargs.kwargs

    def get_field(self, field_name, args, kwargs):
        first, rest = _string.formatter_field_name_split(field_name)

        obj = self.get_value(first, args, kwargs)

        # loop through the rest of the field_name, doing
        #  getattr or getitem as needed
        for is_attr, attr_name in rest:
            if not is_attr:
                obj = obj[attr_name]
                continue
            attr_name, is_callable, func_args, func_kwargs = (
                self._parse_func_args_kwargs(attr_name)
            )
            # Fake 'str' method
            if attr_name == "upperfirst":
                if not is_callable:
                    raise AttributeError(
                        "'str' object has no attribute 'upperfirst'"
                    )
                obj = self._upperfirst(obj)
                continue

            obj = getattr(obj, attr_name)
            if is_callable:
                obj = obj(*func_args, **func_kwargs)
        return obj, first


class TestClass:
    def get_value(self, test=True):
        if test:
            return "Value1"
        return "Value2"

# Example
template_str = "{task.upper()}/{folder[name].lower()}/{family}{task.upperfirst()}/{variant.get_value(test=False)}/v{version:03d}"
data = {
    "task": "anim",
    "asset": "char_SuperHero",
    "folder": {"name": "char_SuperHero"},
    "family": "render",
    "variant": TestClass(),
    "version": 1
}

myformatter = MyFormatter()
output = myformatter.format(template_str, **data)
print(output)

Note: It does not work if called method would have brackets (, ) or dot . in args (e.g. {some_obj.add_offset(offset=1.0, data=dict())}).

@BigRoy
Copy link
Collaborator Author

BigRoy commented Nov 9, 2023

Ad backwards compatibility. I don't think we should handle backwards compatibility in the formatting logic. We should still pass in the same data as we do now, until it's fully deprecated.

Agreed - safest way forward.


I think allowing custom scripts to be triggered will likely be:

  • More complex in code
  • More edges cases someone could run into - easier to break
  • Likely "nice" for developers but harder for admins?
  • Slower maybe?
  • But mostly, potentially dangerous if someone just happens to be able to create a string that runs random code via the templating system.

Additionally - I feel like if we're going that far with the formatting we're just as well off doing f-string evaluations or calls to eval - which also, is dangerous (and slow?).

I think we're best off designing it so that it only allows certain calls, not all - so we can document them and test them. If more is needed specifically and it's a logical request - then we implement the extra formatting.

@iLLiCiTiT
Copy link
Member

iLLiCiTiT commented Nov 9, 2023

More edges cases someone could run into

I don't see any more other edge cases than with custom descriptor...

Likely "nice" for developers but harder for admins?

I would say more admins would more understand python calls than using custom formatting descriptor. Learning curve is same if they don't understand neither.

But mostly, potentially dangerous if someone just happens to be able to create a string that runs random code via the templating system.

It would be dangerous only if you would pass dangerous objects in... Only methods can be called, not random functions as in eval (you can't call sys.exit(0) until "sys" is passed in as {"sys": sys}).

Slower maybe?

We still have to use StringTemplate because of optional keys, which is slow because it's custom implementation of template formatting.

But almost nothing of that is important. We would still need to "insert" the logic to StringTemplate, and keep it's logic. If you think it's easy, and must have now, then do it please.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
community contribution type: enhancement Enhancements to existing functionality
Projects
None yet
Development

No branches or pull requests

5 participants