# Documentation Introspection Analysis

1. Get the documentation URL:
    This can be done by using the documentation path from the library. In the case of pytorch, it is
    `https://pytorch.org/docs/stable/generated/torch.nn.Linear.html#torch.nn.Linear`
    and for pygnn it is `https://pytorch-geometric.readthedocs.io/en/latest/modules/nn.html#torch_geometric.nn.conv.GCNConv`.
    For pygnn we just need to substitute the URL hash fragment at the end; for pytorch the class path also appears in the page name.
2. Get the docstring and remove unwanted sections from it, such as the Examples section.
3. Figure out how many inputs and outputs the graph editor node component will have.

## 1. Get the documentation URL:

In [1]:
from typing import Optional

from app.features.model.generate import layers, featurizers

def is_from_pygnn(class_path: str) -> bool:
    return class_path.startswith('torch_geometric.')

def is_from_pytorch(class_path: str) -> bool:
    return class_path.startswith('torch.')

def get_documentation_link(class_path: str) -> Optional[str]:
    """Return the documentation URL for a class path, or None for other libraries."""
    if is_from_pygnn(class_path):
        # all pygnn layers share one page, so only the hash fragment changes
        return f'https://pytorch-geometric.readthedocs.io/en/latest/modules/nn.html#{class_path}'
    elif is_from_pytorch(class_path):
        # each pytorch class has its own generated page, so the class path
        # goes in both the page name and the hash fragment
        return f'https://pytorch.org/docs/stable/generated/{class_path}.html#{class_path}'
    return None

print([get_documentation_link(comp.name) for comp in layers + featurizers])

[None, None, 'https://pytorch.org/docs/stable/generated/torch.nn.Linear.html#torch.nn.Linear', 'https://pytorch.org/docs/stable/generated/torch.nn.Sigmoid.html#torch.nn.Sigmoid', 'https://pytorch.org/docs/stable/generated/torch.nn.ReLU.html#torch.nn.ReLU', 'https://pytorch-geometric.readthedocs.io/en/latest/modules/nn.html#torch_geometric.nn.GCNConv', None]
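The overview describes building a link by substituting the hash fragment of a known documentation URL. A minimal sketch of that substitution using only the standard library is shown here; the helper name `with_fragment` is hypothetical and not part of the app, and the second class path is used purely as an illustration.

```python
from urllib.parse import urldefrag


def with_fragment(url: str, fragment: str) -> str:
    """Return `url` with its hash fragment replaced by `fragment`."""
    # urldefrag splits the URL into the part before '#' and the fragment
    base, _old_fragment = urldefrag(url)
    return f"{base}#{fragment}"


known_url = (
    "https://pytorch-geometric.readthedocs.io/en/latest/modules/nn.html"
    "#torch_geometric.nn.conv.GCNConv"
)
print(with_fragment(known_url, "torch_geometric.nn.conv.GATConv"))
```

Working from a full example URL rather than a hard-coded prefix keeps the base address in one place, which may help if the documentation host changes.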


## 2. Get the docs

In [2]:
import pandas as pd

from app.features.model.generate import layers, featurizers
from app.features.model.utils import get_class_from_path_string


alldocs = [
    (layer.name, get_class_from_path_string(layer.name).__doc__)
    for layer in layers + featurizers
]
def get_by_first_start(tuples, first_start):
    return [
        item
        for item in tuples
        if item[0].startswith(first_start)
    ]

torch_geometric_comps = get_by_first_start(alldocs, 'torch_geometric.')
torch_comps = get_by_first_start(alldocs, 'torch.')
app_comps = get_by_first_start(alldocs, 'app.')

def has_examples(docs):
    return 'Examples:' in docs


def has_args(docs):
    return 'Args:\n' in docs

def has_docs(docs):
    return docs is not None and len(docs) > 0

df = pd.DataFrame({ 'src_lib': [], 'docs': [], 'has_examples': [], 'has_args': [] })
def info(title, stuff, df):
    for (class_path, docs) in stuff:
        df.loc[class_path] = {
            'docs': docs,
            'src_lib': title,
            'has_args': int(has_args(docs)),
            'has_examples': int(has_examples(docs))
        }
    
info('pygnn', torch_geometric_comps, df)
info('torch', torch_comps, df)
info('app', app_comps, df)
df.loc[:]



| | src_lib | docs | has_examples | has_args |
| --- | --- | --- | --- | --- |
| torch_geometric.nn.GCNConv | pygnn | The graph convolutional operator from the \`"Se... | 0 | 1 |
| torch.nn.Linear | torch | Applies a linear transformation to the incomin... | 1 | 1 |
| torch.nn.Sigmoid | torch | Applies the element-wise function:\n\n .. m... | 1 | 0 |
| torch.nn.ReLU | torch | Applies the rectified linear unit function ele... | 1 | 1 |
| app.features.model.layers.GlobalPooling | app | A global pooling module that wraps the usage o... | 0 | 1 |
| app.features.model.layers.Concat | app | \n A helper layer that concatenates the out... | 0 | 0 |
| app.features.model.featurizers.MoleculeFeaturizer | app | \n Small molecule featurizer.\n Args:\n ... | 0 | 1 |


In [3]:
print(df.loc['torch.nn.Linear', 'docs'])

Applies a linear transformation to the incoming data: :math:`y = xA^T + b`

    This module supports :ref:`TensorFloat32<tf32_on_ampere>`.

    Args:
        in_features: size of each input sample
        out_features: size of each output sample
        bias: If set to ``False``, the layer will not learn an additive bias.
            Default: ``True``

    Shape:
        - Input: :math:`(N, *, H_{in})` where :math:`*` means any number of
          additional dimensions and :math:`H_{in} = \text{in\_features}`
        - Output: :math:`(N, *, H_{out})` where all but the last dimension
          are the same shape as the input and :math:`H_{out} = \text{out\_features}`.

    Attributes:
        weight: the learnable weights of the module of shape
            :math:`(\text{out\_features}, \text{in\_features})`. The values are
            initialized from :math:`\mathcal{U}(-\sqrt{k}, \sqrt{k})`, where
            :math:`k = \frac{1}{\text{in\_features}}`
        bias:   the learnable bias 
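The printed docstring keeps the indentation of the source file it came from, so every line after the first starts with extra spaces. One possible preprocessing step, which this notebook does not use, is the standard library's `inspect.cleandoc`, which removes that common margin. The sample docstring below is a shortened stand-in, not the real `torch.nn.Linear` text.

```python
import inspect

# shortened, hypothetical docstring with the same layout as the one above
raw_docs = """Applies a linear transformation.

    Args:
        in_features: size of each input sample
    """

# cleandoc strips the first line, removes the common indentation of the
# remaining lines and drops leading/trailing blank lines
clean_docs = inspect.cleandoc(raw_docs)
print(clean_docs)
```

After cleaning, section titles such as `Args:` start at column zero, which would change the indentation thresholds used by any parser that runs afterwards.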

### 2.1 Removing Sections of the docs
To make the docs simpler, we need to remove sections that refer exclusively to coding, such as the Examples section.

As can be seen, the documentation is divided into indentation blocks.

So one idea for removing a section is to find its title and the indentation block that follows it, then slice both off.

In [4]:
print(df.loc['torch_geometric.nn.GCNConv', 'docs'])

The graph convolutional operator from the `"Semi-supervised
    Classification with Graph Convolutional Networks"
    <https://arxiv.org/abs/1609.02907>`_ paper

    .. math::
        \mathbf{X}^{\prime} = \mathbf{\hat{D}}^{-1/2} \mathbf{\hat{A}}
        \mathbf{\hat{D}}^{-1/2} \mathbf{X} \mathbf{\Theta},

    where :math:`\mathbf{\hat{A}} = \mathbf{A} + \mathbf{I}` denotes the
    adjacency matrix with inserted self-loops and
    :math:`\hat{D}_{ii} = \sum_{j=0} \hat{A}_{ij}` its diagonal degree matrix.
    The adjacency matrix can include other values than :obj:`1` representing
    edge weights via the optional :obj:`edge_weight` tensor.

    Its node-wise formulation is given by:

    .. math::
        \mathbf{x}^{\prime}_i = \mathbf{\Theta}^{\top} \sum_{j \in
        \mathcal{N}(v) \cup \{ i \}} \frac{e_{j,i}}{\sqrt{\hat{d}_j
        \hat{d}_i}} \mathbf{x}_j

    with :math:`\hat{d}_i = 1 + \sum_{j \in \mathcal{N}(i)} e_{j,i}`, where
    :math:`e_{j,i}` denotes the edge weight from

### 2.2 Parse the docs

In [5]:

def count_tabs(line: str) -> int:
    """Count the leading whitespace characters (spaces or tabs) of a line."""
    return len(line) - len(line.lstrip(' \t'))

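def remove_indentation_of_section(text: str, section_title: str) -> str:
    """Remove the section containing `section_title` and its indented body.

    The body is every consecutive line after the title that is indented at
    least 4 characters deeper than the title. Any other line, including a
    blank one, ends the body.
    """
    lines = text.split('\n')
    start_idx = None
    tab_size = None
    end_idx = None
    for idx, line in enumerate(lines):
        if section_title in line:
            # title found: the body must be indented one level deeper
            start_idx = idx
            end_idx = start_idx
            tab_size = count_tabs(line) + 4
            continue
        if start_idx is not None and count_tabs(line) >= tab_size:
            # still inside the section body
            end_idx += 1
        elif start_idx is not None:
            # first line outside the body ends the section
            break
    if start_idx is None:
        return text
    # keep everything before the title and everything after the body
    return '\n'.join(lines[:start_idx] + lines[end_idx+1:])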
def test_remove_indentation():
    assert remove_indentation_of_section(r'''The graph convolutional operator from the `"Semi-supervised
    Classification with Graph Convolutional Networks"
    <https://arxiv.org/abs/1609.02907>`_ paper

    .. math::
        \mathbf{X}^{\prime} = \mathbf{\hat{D}}^{-1/2} \mathbf{\hat{A}}
        \mathbf{\hat{D}}^{-1/2} \mathbf{X} \mathbf{\Theta},

    where :math:`\mathbf{\hat{A}} = \mathbf{A} + \mathbf{I}` denotes the
    adjacency matrix with inserted self-loops and
    :math:`\hat{D}_{ii} = \sum_{j=0} \hat{A}_{ij}` its diagonal degree matrix.
    The adjacency matrix can include other values than :obj:`1` representing
    edge weights via the optional :obj:`edge_weight` tensor.

    Its node-wise formulation is given by:

    .. math::
        \mathbf{x}^{\prime}_i = \mathbf{\Theta}^{\top} \sum_{j \in
        \mathcal{N}(v) \cup \{ i \}} \frac{e_{j,i}}{\sqrt{\hat{d}_j
        \hat{d}_i}} \mathbf{x}_j

    with :math:`\hat{d}_i = 1 + \sum_{j \in \mathcal{N}(i)} e_{j,i}`, where
    :math:`e_{j,i}` denotes the edge weight from source node :obj:`j` to target
    node :obj:`i` (default: :obj:`1.0`)

    Args:
        in_channels (int): Size of each input sample, or :obj:`-1` to derive
            the size from the first input(s) to the forward method.
        out_channels (int): Size of each output sample.
        improved (bool, optional): If set to :obj:`True`, the layer computes
            :math:`\mathbf{\hat{A}}` as :math:`\mathbf{A} + 2\mathbf{I}`.
            (default: :obj:`False`)
        cached (bool, optional): If set to :obj:`True`, the layer will cache
            the computation of :math:`\mathbf{\hat{D}}^{-1/2} \mathbf{\hat{A}}
            \mathbf{\hat{D}}^{-1/2}` on first execution, and will use the
            cached version for further executions.
            This parameter should only be set to :obj:`True` in transductive
            learning scenarios. (default: :obj:`False`)
        add_self_loops (bool, optional): If set to :obj:`False`, will not add
            self-loops to the input graph. (default: :obj:`True`)
        normalize (bool, optional): Whether to add self-loops and compute
            symmetric normalization coefficients on the fly.
            (default: :obj:`True`)
        bias (bool, optional): If set to :obj:`False`, the layer will not learn
            an additive bias. (default: :obj:`True`)
        **kwargs (optional): Additional arguments of
            :class:`torch_geometric.nn.conv.MessagePassing`.

    Shapes:''', 'Args') == r'''The graph convolutional operator from the `"Semi-supervised
    Classification with Graph Convolutional Networks"
    <https://arxiv.org/abs/1609.02907>`_ paper

    .. math::
        \mathbf{X}^{\prime} = \mathbf{\hat{D}}^{-1/2} \mathbf{\hat{A}}
        \mathbf{\hat{D}}^{-1/2} \mathbf{X} \mathbf{\Theta},

    where :math:`\mathbf{\hat{A}} = \mathbf{A} + \mathbf{I}` denotes the
    adjacency matrix with inserted self-loops and
    :math:`\hat{D}_{ii} = \sum_{j=0} \hat{A}_{ij}` its diagonal degree matrix.
    The adjacency matrix can include other values than :obj:`1` representing
    edge weights via the optional :obj:`edge_weight` tensor.

    Its node-wise formulation is given by:

    .. math::
        \mathbf{x}^{\prime}_i = \mathbf{\Theta}^{\top} \sum_{j \in
        \mathcal{N}(v) \cup \{ i \}} \frac{e_{j,i}}{\sqrt{\hat{d}_j
        \hat{d}_i}} \mathbf{x}_j

    with :math:`\hat{d}_i = 1 + \sum_{j \in \mathcal{N}(i)} e_{j,i}`, where
    :math:`e_{j,i}` denotes the edge weight from source node :obj:`j` to target
    node :obj:`i` (default: :obj:`1.0`)


    Shapes:'''
    
test_remove_indentation()
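The function above ends a section at the first line that is not indented deeper than the title, and a blank line counts as such a line. For docstrings that place a blank line right after the title, a variant that treats blank lines as part of the section may be useful. The sketch below is one way to do that; `remove_section` and `leading_whitespace` are hypothetical names, and the sample docstring is made up for illustration.

```python
def leading_whitespace(line: str) -> int:
    """Number of leading spaces or tabs in a line."""
    return len(line) - len(line.lstrip(' \t'))


def remove_section(text: str, section_title: str) -> str:
    """Drop the first line containing `section_title` plus its indented body."""
    lines = text.split('\n')
    start = next((i for i, line in enumerate(lines) if section_title in line), None)
    if start is None:
        return text
    title_indent = leading_whitespace(lines[start])
    end = start + 1
    while end < len(lines):
        line = lines[end]
        # blank lines stay in the section; a non-blank line indented no
        # deeper than the title marks the start of the next block
        if line.strip() and leading_whitespace(line) <= title_indent:
            break
        end += 1
    return '\n'.join(lines[:start] + lines[end:])


sample = (
    "Applies a linear transformation.\n"
    "\n"
    "    Args:\n"
    "        in_features: size\n"
    "\n"
    "    Examples::\n"
    "\n"
    "        >>> m = nn.Linear(20, 30)\n"
    "        >>> out = m(x)\n"
    "\n"
    "    Shape:\n"
    "        - Input: (N, *)"
)
stripped = remove_section(sample, 'Examples:')
print(stripped)
```

Unlike the original function, this variant also consumes the blank line that follows the section body, so the surrounding spacing of the result differs slightly.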

## 3. Get info on the inputs and outputs of layer

The actual information we can use might be more than what I'm selecting right now.

### The Model Editor (ME)

The editor is a wrapper around the [react-flow library](https://reactflow.dev/),
along with other components to configure the dataset used for training and validation.

A textual description follows, although the editing experience should be familiar
to anyone who has used a flow diagram editing tool:

ME is intended to build the model's architecture, which corresponds to the layers field
in [our model schema](), as well as the preprocessing of the inputs given to it during
training and prediction, which corresponds to the featurizers field in the same schema.

The initial state of the model editor is the configuration of a given `dataset`, namely
which columns are going to be used as features and targets. In the ME, these are shown as
draggable nodes, each one labeled after the column it originated from. Each node
must have at least one endpoint, which can be a source endpoint or a target endpoint.
The only connections allowed between nodes are from source endpoints to target ones.
Inputs/outputs are nodes with a single source/target endpoint.

Given this initial state, we can use ME options as building blocks for the rest
of the model editing. Each option is selected from the `mariner`, `pytorch` and `pygnn`
libraries and may accept one or more inputs and produce one or more outputs. For each input
it should accept one connection in its target endpoint, and for each output it should
require a connection from its source endpoint.

| component                  | inputs | outputs | special conditions                               |
| -------------------------- | ------ | ------- | ------------------------------------------------ |
| mariner.Concat             |   2    |    1    | inputs should be tensors of the same dtype       |
| torch.nn.Linear            |   1    |    1    | input should be a tensor                         |
| torch_geometric.nn.GCNConv |   1    |    1    | inputs should be graph-featurized SMILES columns |
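The connection rule described above (connections only run from source endpoints to target endpoints) can be sketched as a small check. The `Endpoint` type and `can_connect` function are hypothetical illustrations and do not reflect how the editor or react-flow actually model endpoints.

```python
from dataclasses import dataclass
from typing import Literal


@dataclass(frozen=True)
class Endpoint:
    """Hypothetical endpoint: the node it belongs to and its direction."""
    node_id: str
    kind: Literal['source', 'target']


def can_connect(start: Endpoint, end: Endpoint) -> bool:
    """A connection is valid only from a source endpoint to a target endpoint."""
    return start.kind == 'source' and end.kind == 'target'


# a dataset feature column exposes a single source endpoint,
# and a one-input layer exposes a target endpoint for it
feature_out = Endpoint(node_id='smiles_column', kind='source')
layer_in = Endpoint(node_id='linear_1', kind='target')

print(can_connect(feature_out, layer_in))
print(can_connect(layer_in, feature_out))
```

Rules from the table, such as requiring inputs of the same dtype, would need extra information about each node and are left out of this sketch.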



### How type hints may help us

In [6]:
from typing import get_type_hints
from pprint import pprint
import torch
import torch_geometric
from app.features.model import layers as mariner

linear_type_hints = get_type_hints(torch.nn.Linear.forward)
gcn_type_hints = get_type_hints(torch_geometric.nn.GCNConv.forward)
concat_type_hints = get_type_hints(mariner.Concat.forward)
global_pooling_type_hints = get_type_hints(mariner.GlobalPooling.forward)
# Each type hints result is a dictionary whose keys are the
# method argument names plus a special 'return' key. The values
# are the detected type of each argument and, under 'return',
# the return type.

pprint('Linear type hints:')
pprint(linear_type_hints)
pprint('GCNConv type hints:')
pprint(gcn_type_hints)
pprint('Concat type hints:')
pprint(concat_type_hints)
pprint('GlobalPooling type hints:')
pprint(global_pooling_type_hints)

'Linear type hints:'
{'input': <class 'torch.Tensor'>, 'return': <class 'torch.Tensor'>}
'GCNConv type hints:'
{'edge_index': typing.Union[torch.Tensor, torch_sparse.tensor.SparseTensor],
 'edge_weight': typing.Optional[torch.Tensor],
 'return': <class 'torch.Tensor'>,
 'x': <class 'torch.Tensor'>}
'Concat type hints:'
{'x1': <class 'torch.Tensor'>, 'x2': <class 'torch.Tensor'>}
'GlobalPooling type hints:'
{'batch': typing.Optional[torch.Tensor],
 'return': <class 'torch.Tensor'>,
 'size': typing.Optional[int],
 'x': <class 'torch.Tensor'>}
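Type hints only list annotated parameters, and they include optional ones such as `edge_weight` or `size`. If the goal is to count the inputs a caller must provide, `inspect.signature` is one alternative worth considering. The sketch below uses toy classes as hypothetical stand-ins for real layers, so it runs without torch; `count_required_inputs` is not part of the app.

```python
import inspect


class ToyConcat:
    """Hypothetical stand-in for a layer with two required inputs."""

    def forward(self, x1, x2):
        return (x1, x2)


class ToyPooling:
    """Hypothetical stand-in for a layer with optional arguments."""

    def forward(self, x, batch=None, size=None):
        return x


def count_required_inputs(component_cls) -> int:
    """Count positional parameters of `forward` that have no default value."""
    params = inspect.signature(component_cls.forward).parameters.values()
    return sum(
        1
        for p in params
        if p.name != 'self'
        and p.kind in (p.POSITIONAL_ONLY, p.POSITIONAL_OR_KEYWORD)
        and p.default is p.empty
    )


print(count_required_inputs(ToyConcat))
print(count_required_inputs(ToyPooling))
```

This works for unannotated methods too, but it still cannot tell which required arguments should become editor endpoints, so special cases may remain necessary.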


In [19]:
from typing import List, Literal, Union
from pydantic import BaseModel

MessagePassingRule = Literal['graph-receiver']
InputsSameTypeRule = Literal['inputs-same-type']
LayerRule = Union[
    MessagePassingRule,
    InputsSameTypeRule
]


def get_inputs_outputs_and_rules(component_cls) -> tuple[int, int, List[LayerRule]]:
    rules = []
    if (
        issubclass(component_cls, torch_geometric.nn.MessagePassing) or
        issubclass(component_cls, mariner.GlobalPooling)
    ):
        rules.append('graph-receiver')
        return 1, 1, rules
    elif issubclass(component_cls, mariner.Concat):
        rules.append('inputs-same-type')
        return 2, 1, rules
    elif issubclass(component_cls, torch.nn.Module):
        type_hints_keys = get_type_hints(component_cls.forward).keys()
        inputs = len(type_hints_keys) - int('return' in type_hints_keys)
        return inputs, 1, rules
    else:
        type_hints_keys = get_type_hints(component_cls.__call__).keys()
        inputs = len(type_hints_keys) - int('return' in type_hints_keys)
        return inputs, 1, rules
    
assert get_inputs_outputs_and_rules(torch.nn.Linear) == (1, 1, [])
assert get_inputs_outputs_and_rules(torch_geometric.nn.GCNConv) == (1, 1, ['graph-receiver'])
assert get_inputs_outputs_and_rules(mariner.Concat) == (2, 1, ['inputs-same-type'])

## All Together Now

In [20]:
from typing import Optional

class LayerAnnotations(BaseModel):
    num_inputs: int
    num_outputs: int
    docs: str
    docs_link: Optional[str] # None for layers without a documentation link
    rules: List[LayerRule]
    
def get_annotations_from_cls(cls_path: str) -> LayerAnnotations:
    docs_link = get_documentation_link(cls_path)
    cls = get_class_from_path_string(cls_path)
    docs = remove_indentation_of_section(cls.__doc__, 'Examples:')
    inputs, outputs, rules = get_inputs_outputs_and_rules(cls)
    return LayerAnnotations(
        docs_link=docs_link,
        docs=docs,
        num_inputs=inputs,
        num_outputs=outputs,
        rules=rules
    )

for component in layers + featurizers:
    assert get_annotations_from_cls(component.name)

get_annotations_from_cls('torch.nn.Linear')

LayerAnnotations(num_inputs=1, num_outputs=1, docs='Applies a linear transformation to the incoming data: :math:`y = xA^T + b`\n\n    This module supports :ref:`TensorFloat32<tf32_on_ampere>`.\n\n    Args:\n        in_features: size of each input sample\n        out_features: size of each output sample\n        bias: If set to ``False``, the layer will not learn an additive bias.\n            Default: ``True``\n\n    Shape:\n        - Input: :math:`(N, *, H_{in})` where :math:`*` means any number of\n          additional dimensions and :math:`H_{in} = \\text{in\\_features}`\n        - Output: :math:`(N, *, H_{out})` where all but the last dimension\n          are the same shape as the input and :math:`H_{out} = \\text{out\\_features}`.\n\n    Attributes:\n        weight: the learnable weights of the module of shape\n            :math:`(\\text{out\\_features}, \\text{in\\_features})`. The values are\n            initialized from :math:`\\mathcal{U}(-\\sqrt{k}, \\sqrt{k})`, where\n      