# Python Code2Vec
Trying to replicate the Code2Vec model using path contexts extracted from the Python AST. From the [Code2Vec GitHub](https://github.com/tech-srl/code2vec#extending-to-other-languages):

> In order to extend code2vec to work with other languages, a new extractor (similar to the JavaExtractor) should be implemented, and be called by preprocess.sh. Basically, an extractor should be able to output for each directory containing source files:
> - A single text file, where each row is an example.
> - Each example is a space-delimited list of fields, where:
>   1. The first "word" is the target label, internally delimited by the "|" character.
>   2. Each of the following words are contexts, where each context has three components separated by commas (","). Each of these components cannot include spaces nor commas. We refer to these three components as a token, a path, and another token, but in general other types of ternary contexts can be considered.
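As a minimal sketch of that row format, the two hypothetical helpers below turn an identifier such as `get_name` into a `|`-delimited subtoken label and join `(token, path, token)` triples into one space-delimited row. The snake_case/camelCase splitting rule is an assumption, and the context in the usage line is made up for illustration rather than extracted from real code.

```python
import re
from typing import Iterable, List, Tuple

def split_subtokens(name: str) -> List[str]:
    """Split an identifier into lowercase subtokens on underscores and camelCase boundaries."""
    subtokens = []
    for part in re.split(r'_+', name):
        # an all-caps run (e.g. "HTTP"), a capitalised or lowercase word, or a digit run
        subtokens.extend(re.findall(r'[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+', part))
    return [s.lower() for s in subtokens]

def format_example(label: str, contexts: Iterable[Tuple[str, str, str]]) -> str:
    """Build one row: a '|'-delimited label followed by space-separated token,path,token contexts."""
    fields = ['|'.join(split_subtokens(label))]
    for start, path, end in contexts:
        # the README forbids spaces and commas inside any of the three components
        for component in (start, path, end):
            if ' ' in component or ',' in component:
                raise ValueError(f'invalid context component: {component!r}')
        fields.append(f'{start},{path},{end}')
    return ' '.join(fields)

# illustrative context only; a real extractor would derive it from the AST
print(format_example('get_name', [('self', 'Name|Attribute|Return', 'name')]))
# get|name self,Name|Attribute|Return,name
```

Splitting the label into subtokens matches the README's note that the target label is internally delimited by `|`; whether context tokens should be split the same way is left open here.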

From the [Code2Seq GitHub](https://github.com/tech-srl/code2seq#extending-to-other-languages):

>Each path is a path between two tokens, split to path nodes (or other kinds of building blocks) using the "|" character.

>Example for a context:
>`my|key,StringExression|MethodCall|Name,get|value`

>Here `my|key` and `get|value` are tokens, and `StringExression|MethodCall|Name` is the syntactic path that connects them.
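The path idea can be sketched with Python's `ast` module: record each node's parent, then for every pair of terminals climb from the first to their lowest common ancestor and descend to the second, naming each node by its class. This is a rough sketch under several assumptions: only `ast.Name` and `ast.arg` nodes count as terminals, tokens are raw identifiers rather than `|`-split subtokens, and no limit is placed on path length or width (the original extractors cap both). `path_contexts` is a hypothetical helper, not part of code2vec.

```python
import ast
from itertools import combinations

def _terminal_token(node):
    """Token for nodes treated as terminals (assumption: identifiers and parameters only)."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.arg):
        return node.arg
    return None

def _chain_to_root(node, parents):
    """Return [node, parent, grandparent, ..., Module] using the id()-keyed parent map."""
    chain = [node]
    while id(node) in parents:
        node = parents[id(node)]
        chain.append(node)
    return chain

def path_contexts(source):
    """Yield (token, path, token) for every pair of terminals in the parsed source."""
    tree = ast.parse(source)
    parents, terminals = {}, []
    for node in ast.walk(tree):
        for child in ast.iter_child_nodes(node):
            parents[id(child)] = node
        if _terminal_token(node) is not None:
            terminals.append(node)
    for a, b in combinations(terminals, 2):
        up, down = _chain_to_root(a, parents), _chain_to_root(b, parents)
        down_ids = [id(n) for n in down]
        # lowest common ancestor: first node on a's root chain that is also on b's
        i = next(k for k, n in enumerate(up) if id(n) in down_ids)
        j = down_ids.index(id(up[i]))
        # climb from a to the common ancestor, then descend to b
        nodes = up[:i + 1] + down[:j][::-1]
        yield _terminal_token(a), '|'.join(type(n).__name__ for n in nodes), _terminal_token(b)

for context in path_contexts('x = y'):
    print(','.join(context))
# x,Name|Assign|Name,y
```

The number of pairs grows quadratically with the number of terminals, which is one reason real extractors bound path length and width or sample contexts.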

The [Python AST documentation](https://docs.python.org/3/library/ast.html) is dense but helpful.
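Two standard-library functions make the docs easier to follow in practice: `ast.dump` prints a tree's nested structure, and `ast.walk` visits every node, whose class names (`Module`, `FunctionDef`, `Return`, and so on) are what would appear as path nodes. A small sketch on an inline snippet; the `indent` argument to `ast.dump` needs Python 3.9 or later, and the exact dump text varies between Python versions.

```python
import ast

source = "def greet(name):\n    return 'hi ' + name\n"
tree = ast.parse(source)

# nested view of every node and its fields (indent= needs Python 3.9+)
print(ast.dump(tree, indent=2))

# class names of every node in the tree, the vocabulary for path nodes
node_types = [type(node).__name__ for node in ast.walk(tree)]
print(node_types)
```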

```python
import ast
```

```python
# open and read the test file (the context manager closes it automatically)
with open('ast_parse_test.py', 'r') as f:
    source = f.read()

# parse the source into an abstract syntax tree
module = ast.parse(source)

# list to hold function definition nodes
function_nodes = []

# gather module-level functions and methods of module-level classes
for node in module.body:
    if isinstance(node, ast.FunctionDef):
        function_nodes.append(node)
    elif isinstance(node, ast.ClassDef):
        # loop over the class body looking for method definitions
        for class_child in node.body:
            if isinstance(class_child, ast.FunctionDef):
                function_nodes.append(class_child)

for node in function_nodes:
    print(node.name)
```

```
__init__
get_name
set_name
a_function
```
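The loop over `module.body` only finds module-level functions and methods of module-level classes, so it misses `async def` functions, functions nested inside other functions, and methods of nested classes. A recursive alternative is sketched below with a hypothetical `collect_functions` helper and an inline `SAMPLE` string standing in for `ast_parse_test.py`; the dotted qualified names are an illustrative choice, not something code2vec requires.

```python
import ast

def collect_functions(source):
    """Return (qualified_name, node) pairs for every def/async def, including nested ones."""
    tree = ast.parse(source)
    found = []

    def visit(node, prefix):
        for child in ast.iter_child_nodes(node):
            if isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef)):
                name = prefix + child.name
                found.append((name, child))
                visit(child, name + '.')  # look for functions defined inside this one
            elif isinstance(child, ast.ClassDef):
                visit(child, prefix + child.name + '.')  # methods get the class prefix
            else:
                visit(child, prefix)

    visit(tree, '')
    return found

SAMPLE = '''
class Person:
    def __init__(self, name):
        self.name = name

    async def fetch(self):
        def helper():
            return self.name
        return helper()

def a_function():
    pass
'''

for name, node in collect_functions(SAMPLE):
    print(name, type(node).__name__)
```

Each returned node's `name` attribute can then serve as the target label for the contexts extracted from its body.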
