This shouldn't be too difficult to extend: let's define a `count_garbage` function to replace `remove_garbage`.

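Define `parse_group` as before, but call `count_garbage` instead of `remove_garbage`. `count_garbage` returns a pair of the number of garbage characters and the remaining input string, and `parse_group` now also threads a running garbage count through its return value: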

In [4]:
def parse_group(input_str, currentDepth_i=1, garbageCount_i=0):
    '''
    Build the parse tree from the start of the next group. Check
    that the first character of input_str is a '{'.
    Return a triple of the nested list of group depths, the
    remaining input string, and the running garbage count.
    '''
    assert input_str[0]=='{'
    out_ls=[]
    
    while input_str[0] != '}':
        # Remove the first character of the string:
        input_str=input_str[1:]
        
        # if it's a '{', then parse the next group:
        if input_str[0]=='{':
            (parse_ls, input_str, garbageCount_i)=parse_group(input_str, currentDepth_i+1, garbageCount_i)
            out_ls.append(parse_ls)
        
        # if it's a '<', parse as garbage:
        if input_str[0]=='<':
            (gc_i, input_str)=count_garbage(input_str)
            garbageCount_i += gc_i
        
    # Finally, add the current depth to the beginning 
    # of the output list, and return it, the current
    # (shortened) input string, and the number of garbage
    # characters.
    out_ls[0:0]=[currentDepth_i]
    return (out_ls, input_str[1:], garbageCount_i)

The following test cases contain no garbage, so they exercise `parse_group` on its own (that is, without the garbage recogniser) and should give the correct answers:

In [5]:
# Should have 1 group
parse_group('{}')

([1], '', 0)

In [6]:
# Should have 3 groups
parse_group('{{{}}}')

([1, [2, [3]]], '', 0)

In [7]:
# Should have 6 groups
parse_group('{{{},{},{{}}}}')

([1, [2, [3], [3], [3, [4]]]], '', 0)
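The comments on these cells give the expected number of groups, while the output is a nested list of depths, so the count has to be read off by eye. As a rough sketch, a small helper can do the counting instead. The name `count_groups` is made up here for illustration, and it assumes the tree has the shape `[depth, child, child, ...]` seen in the outputs above:

```python
def count_groups(tree):
    '''
    Count the groups in a parse tree shaped like the first element
    of parse_group's result: [depth, child_tree, child_tree, ...].
    (Hypothetical helper, not part of the original solution.)
    '''
    # One for this group, plus the groups inside each nested child list.
    return 1 + sum(count_groups(child) for child in tree[1:])

# The three trees shown in the outputs above:
assert count_groups([1]) == 1
assert count_groups([1, [2, [3]]]) == 3
assert count_groups([1, [2, [3], [3], [3, [4]]]]) == 6
```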

That looks right so far. Let's try adding the garbage counter now:

In [16]:
def count_garbage(input_str):
    '''
    Remove garbage from the front of input_str, and return a
    pair of the number of garbage characters and the string
    with the garbage removed.
    '''
    # Check that we have the garbage opening character:
    assert input_str[0]=='<'
    
    garbageCount_i=0
    
    # Remove initial '<', 'cos we're not including
    # it in the garbage count
    input_str=input_str[1:]
    
    # Now pass over anything until the closing character:
    while input_str[0] != '>':
        # unless it's a '!', in which case ignore the following character
        if input_str[0]=='!':
            input_str=input_str[2:]
        # otherwise, remove the next character and increment the count:
        else:
            input_str=input_str[1:]
            garbageCount_i+=1
    
    # Finally, return the pair of the garbage count and the
    # remaining input string (without the closing bracket)
    return (garbageCount_i, input_str[1:])

In [18]:
# Run the test cases:

assert count_garbage('<>End of garbage')==(0, 'End of garbage')
assert count_garbage('<random characters>End of garbage')==(17, 'End of garbage')
assert count_garbage('<<<<>End of garbage')==(3, 'End of garbage')
assert count_garbage('<{!>}>End of garbage')==(2, 'End of garbage')
assert count_garbage('<!!>End of garbage')==(0, 'End of garbage')
assert count_garbage('<!!!>>End of garbage')==(0, 'End of garbage')
assert count_garbage('<{o"i!a,<{i<a>End of garbage')==(10, 'End of garbage')
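One thing to be aware of: `input_str[1:]` copies the rest of the string on every step, which could get slow on a long input. A possible alternative, sketched here only as an illustration, walks an index through the string instead. The name `count_garbage_indexed` and the `start_i` parameter are assumptions of this sketch, and it returns the index just past the closing `>` rather than the shortened string:

```python
def count_garbage_indexed(input_str, start_i=0):
    '''
    Hypothetical variant of count_garbage: scan from start_i without
    slicing, and return a pair of the garbage count and the index
    just past the closing '>'.
    '''
    # Check that we have the garbage opening character:
    assert input_str[start_i] == '<'

    pos_i = start_i + 1
    garbageCount_i = 0

    while input_str[pos_i] != '>':
        if input_str[pos_i] == '!':
            # Skip the '!' and the character it cancels
            pos_i += 2
        else:
            # Ordinary garbage character: count it and move on
            pos_i += 1
            garbageCount_i += 1

    return (garbageCount_i, pos_i + 1)

test_str = '<{o"i!a,<{i<a>End of garbage'
(gc_i, end_i) = count_garbage_indexed(test_str)
assert gc_i == 10
assert test_str[end_i:] == 'End of garbage'
```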

Good. Now check the remaining test cases.

In [19]:
# Should have 4 garbage characters
parse_group('{<a>,<a>,<a>,<a>}')

([1], '', 4)

In [20]:
# Should have 4 garbage characters
parse_group('{{<a>},{<a>},{<a>},{<a>}}')

([1, [2], [2], [2], [2]], '', 4)

In [21]:
# Should have 13 garbage characters
parse_group('{{<!>},{<!>},{<!>},{<a>}}')

([1, [2]], '', 13)

That all seems to be behaving properly. So let's run it on my input:

In [22]:
with open('data/day9.txt') as fIn:
    myInput_str=fIn.read().strip()

parse_group(myInput_str)[2]

7298
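As a cross-check on the recursive parser, the same counts can be obtained in a single left-to-right pass with a couple of flags. This is only a sketch: `scan_stream` is a made-up name, it returns group and garbage counts rather than the depth tree, and it assumes (as the test cases do) that `!` only matters inside garbage:

```python
def scan_stream(input_str):
    '''
    Return a pair of the number of groups and the number of garbage
    characters, found in one pass over input_str.
    (Hypothetical cross-check, not the solution used above.)
    '''
    groupCount_i = 0
    garbageCount_i = 0
    inGarbage_b = False
    skipNext_b = False

    for ch in input_str:
        if inGarbage_b:
            if skipNext_b:
                # This character was cancelled by a preceding '!'
                skipNext_b = False
            elif ch == '!':
                skipNext_b = True
            elif ch == '>':
                inGarbage_b = False
            else:
                garbageCount_i += 1
        elif ch == '<':
            inGarbage_b = True
        elif ch == '{':
            groupCount_i += 1

    return (groupCount_i, garbageCount_i)

# Same test strings as above:
assert scan_stream('{{{},{},{{}}}}') == (6, 0)
assert scan_stream('{<a>,<a>,<a>,<a>}') == (1, 4)
assert scan_stream('{{<!>},{<!>},{<!>},{<a>}}') == (2, 13)
```

Running `scan_stream(myInput_str)[1]` should then agree with the garbage count returned by `parse_group`.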