
Flor doesn't work with functions with arguments :( #31

Closed
rjurney opened this issue Dec 6, 2018 · 5 comments
@rjurney

rjurney commented Dec 6, 2018

Flor won't work with functions that take arguments.

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-12-ba0e1c3e4b78> in <module>
     13     epochs=10,
     14     validation_split=0.2,
---> 15     embedding_size=100
     16 ):
     17 

~/flor/flor/interface/input/execution_tracker.py in track(f)
    103     if filename not in os.listdir(secret_dir):
    104         # Needs compilation
--> 105         with open(tru_path, 'r') as sourcefile:
    106             tree = ast.parse(sourcefile.read())
    107             logger.debug(astor.dump_tree(tree))

TypeError: 'OutStream' object is not callable
import flor

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelBinarizer

from keras.preprocessing.text import Tokenizer
from keras.models import Sequential
from keras.layers import Flatten, Dense, Embedding, Activation, Dropout, LSTM, GlobalAveragePooling1D

@flor.track
def main(
    sample_size=1000, 
    vocab_size=1000, 
    batch_size=32, 
    epochs=10, 
    validation_split=0.2,
    embedding_size=100
):
    # From Series to lists
    groups = [p for p in patents['group_id']][0:sample_size]
    num_labels = len(set(groups))
    summaries = [s for s in patents['summary']][0:sample_size]

    # Test/train split, summary sequences as X, CPC groups as Y
    train_sums, test_sums, train_labels, test_labels = train_test_split(
        summaries,
        groups
    )

    label_count = len(set(groups))

    # Tokenize the summaries
    tokenizer = Tokenizer(num_words=vocab_size)
    tokenizer.fit_on_texts(summaries) 
    x_train = tokenizer.texts_to_matrix(train_sums, mode='tfidf')
    x_test = tokenizer.texts_to_matrix(test_sums, mode='tfidf')

    encoder = LabelBinarizer()
    encoder.fit(groups)
    y_train = encoder.transform(train_labels)
    y_test = encoder.transform(test_labels)

    print('Label count: {}'.format(label_count))
    print('x_train shape:', x_train.shape)
    print('x_test shape:', x_test.shape)
    print('y_train shape:', y_train.shape)
    print('y_test shape:', y_test.shape)

    model = Sequential()

    # Original
    # model.add(Embedding(vocab_size, embedding_size, input_length=vocab_size))
    # #model.add(Dropout(0.2))
    # model.add(Flatten())
    # model.add(Dense(512, input_shape=(vocab_size,)))
    # model.add(Activation('relu'))
    # # model.add(Dense(64, input_shape=(vocab_size,)))
    # # model.add(Activation('relu'))
    # model.add(Dense(label_count, input_dim=(vocab_size,)))
    # model.add(Activation('softmax'))

    model.add(Embedding(vocab_size, embedding_size, input_length=vocab_size))
    model.add(Dropout(0.2))
    model.add(Flatten())
    model.add(Dense(512, input_shape=(vocab_size,)))
    model.add(Activation('relu'))
    model.add(Dropout(0.2))
    model.add(Dense(label_count, input_dim=(vocab_size,)))
    model.add(Activation('sigmoid'))

    model.compile(
        loss='categorical_crossentropy',
        optimizer='adam',
        metrics=['accuracy']
    )
    model.summary()

    history = model.fit(
        x_train, 
        y_train, 
        batch_size=batch_size, 
        epochs=epochs, 
        verbose=1, 
        validation_split=validation_split
    )

    score = model.evaluate(
        x_test,
        y_test,
        batch_size=batch_size, 
        verbose=1
    )
    print('Test accuracy:', score[1])

with flor.Context('patent'):
    main(sample_size=100)
@rjurney
Author

rjurney commented Dec 6, 2018

I don't understand how inspect.getsourcefile can return an OutStream. How is this possible?

https://docs.python.org/3/library/inspect.html#inspect.getsourcefile

I can't figure out what an OutStream is! I don't know how to print(type(tru_path)) in a Jupyter notebook :(
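One way to see what `tru_path` actually is at the failing frame, sketched with the standard `sys` traceback machinery (the variable and its value here are hypothetical stand-ins for Flor's internals, not Flor's actual code):

```python
import sys

def failing():
    # hypothetical stand-in for Flor's tru_path variable
    tru_path = "<ipython-input-12>"
    raise TypeError("'OutStream' object is not callable")

try:
    failing()
except TypeError:
    tb = sys.exc_info()[2]
    while tb.tb_next is not None:   # walk to the innermost frame
        tb = tb.tb_next
    # read the locals of the frame that raised, like %debug / pdb does
    print(type(tb.tb_frame.f_locals["tru_path"]))   # -> <class 'str'>
```

In a notebook, running the `%debug` magic in the cell right after the exception drops into pdb at that same frame, where `p type(tru_path)` works interactively.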

@rlnsanz
Collaborator

rlnsanz commented Dec 6, 2018

Thanks for bringing this issue to our attention. Are you running this code from a Jupyter Notebook cell? We don't yet support interactive environments (but we're aware of the issue; see #11). If you're running this code from Jupyter, could you try running an ordinary Python script instead and see if the issue persists? Thank you!
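The failure mode can be reproduced outside Jupyter: a function compiled under a pseudo-filename (the way IPython tags cell code; the exact filename below is an assumption for illustration) has no readable source file on disk, so any tracker that reads source the way the traceback above shows would break:

```python
import inspect
import os

# Simulate a function defined in a notebook cell: IPython compiles cell
# code under a pseudo-filename, not a real path on disk.
code = compile("def cell_fn():\n    pass\n", "<ipython-input-1>", "exec")
ns = {}
exec(code, ns)
cell_fn = ns["cell_fn"]

# inspect.getsourcefile returns None here; the code object only carries
# the pseudo-filename, which cannot be open()ed.
path = inspect.getsourcefile(cell_fn) or cell_fn.__code__.co_filename
print(path)                  # -> <ipython-input-1>
print(os.path.exists(path))  # -> False
```

The same function pasted into a `.py` file and imported would give a real, openable path, which is why running an ordinary script is a useful check.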

@rjurney
Author

rjurney commented Dec 6, 2018 via email

@rlnsanz
Collaborator

rlnsanz commented Dec 6, 2018

Thank you. The demo was for an earlier version of Flor. We've since started doing program analysis (to significantly alleviate the burden of manual specification), so some features that were working in the past need to be re-enabled. We'll make it a priority.

@rlnsanz rlnsanz closed this as completed Dec 6, 2018
@rjurney
Author

rjurney commented Dec 8, 2018

Cool, thanks!
