In [1]:
import time
import pandas as pd
from py2neo import Graph, Node, Relationship
from textgenrnn import textgenrnn

Using TensorFlow backend.


In [2]:
def query_to_df(query, graph):
    """Run a Cypher query and return the result as a DataFrame, printing the elapsed time."""
    print("Starting query...", end=" ")
    query_start_time = time.time()
    df = graph.run(query).to_data_frame()
    print("Done ({:.2f} minutes).".format((time.time() - query_start_time) / 60))
    return df

In [3]:
graph = Graph("bolt://matlaber5.media.mit.edu:7687", auth=('neo4j', 'myneo'))
print("Connected to graph database with {:,} nodes and {:,} relationships!".format(
    graph.database.primitive_counts['NumberOfNodeIdsInUse'],
    graph.database.primitive_counts['NumberOfRelationshipIdsInUse']))

Connected to graph database with 278,432,359 nodes and 1,844,501,832 relationships!


In [26]:
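def gen_query(venue):
    """Build a Cypher query that collects the titles of all Quanta nodes whose venue is `venue`."""
    # Note: the pattern matches one row per (author, paper) pair, so a title is
    # collected once for each of its authors. The venue is interpolated directly
    # into the query string, so a name containing a single quote would break it.
    query = """
    MATCH (a:Author)-[:AUTHORED]-(q:Quanta)
    WHERE q.venue = '{}'
    RETURN COLLECT(q.title) AS titles
    """.format(venue)
    return query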
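
A possible variant of `gen_query`, shown only as a sketch: it passes the venue as a Cypher parameter (`$venue`) instead of formatting it into the string, and uses `COLLECT(DISTINCT ...)` so a title is not repeated once per author. The name `gen_query_params` is hypothetical, and `DISTINCT` would also merge different papers that share the same title, so the counts would differ from the ones printed in this notebook.

```python
def gen_query_params(venue):
    """Return a (query, parameters) pair rather than a pre-formatted string."""
    query = """
    MATCH (a:Author)-[:AUTHORED]-(q:Quanta)
    WHERE q.venue = $venue
    RETURN COLLECT(DISTINCT q.title) AS titles
    """
    # The venue travels separately from the query text, so quotes in the
    # name do not need escaping.
    return query, {"venue": venue}


query, params = gen_query_params("The New England Journal of Medicine")
# With a py2neo Graph this could then be run as:
#     graph.run(query, parameters=params).to_data_frame()
```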

In [27]:
top_5 = ['Cell', 'Nature', 'Nature Biotechnology','Proceedings of the National Academy of Sciences of the United States of America','Science']
top_10 = ['Cell', 'Nature', 'Nature Biotechnology','Proceedings of the National Academy of Sciences of the United States of America','Science', 'Journal of the American Chemical Society', 'JAMA', 'The New England Journal of Medicine', 'Nature Genetics', 'Neuron']
top_42 = ['Angewandte Chemie','Blood','Cancer Cell','Cancer Discovery','Cancer Research','Cell','Cell Host & Microbe','Cell Metabolism','Cell Stem Cell','Chemistry & Biology','The EMBO Journal','Genes & Development','Immunity','Journal of Neurology','Journal of the American Chemical Society','JAMA','Journal of Biological Chemistry','Journal of Cell Biology','Journal of Clinical Investigation','Journal of Experimental Medicine','Journal of Medicinal Chemistry','The Lancet','Nature Cell Biology','Nature Chemical Biology','Nature Chemistry','Nature Medicine','Nature Methods','Nature','Nature Biotechnology','The New England Journal of Medicine','Neuron','Nature Genetics','Nature Immunology','Nature Neuroscience','Nature Structural & Molecular Biology','PLOS Biology','PLOS Genetics','PLOS Pathogens','Proceedings of the National Academy of Sciences of the United States of America','Science Signaling','Science Translational Medicine','Science']

In [41]:
# Write one CSV per venue; each file holds a single row whose 'titles' cell is the collected list.
for venue in top_5:
    query = gen_query(venue)
    df = query_to_df(query, graph)
    df.to_csv('C:\\Users\\Brend\\Downloads\\Title Gen\\{}_titles.csv'.format(venue), index=False, encoding="UTF-8")

Starting query... Done (0.58 minutes).
Starting query... Done (1.60 minutes).
Starting query... Done (0.10 minutes).
Starting query... Done (1.84 minutes).
Starting query... Done (1.58 minutes).
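
Each CSV written above has one `titles` cell that contains the whole list rendered as a string, so reading it back means parsing that string into a list again. The round trip can be sketched with the standard library alone. The `csv` and `io` modules stand in for pandas here, and the sample titles are made up for illustration.

```python
import ast
import csv
import io

# Made-up titles containing a comma and a quote, the characters most
# likely to cause trouble in a CSV cell.
titles = ["A title, with a comma", "It's a title with a quote"]

# Write a header row and a single cell holding the list's string form.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["titles"])
writer.writerow([repr(titles)])

# Read the cell back and parse the string into a Python list.
buffer.seek(0)
rows = list(csv.reader(buffer))
restored = ast.literal_eval(rows[1][0])
```

`ast.literal_eval` only accepts Python literals, so it is a safer choice than `eval` for this kind of parsing.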


In [97]:
import ast
import random

def gen_titles(venue, num_titles=5, num_samples=1000):
    """Train textgenrnn on a random sample of a venue's titles and write generated titles to a file."""
    df_titles = pd.read_csv('C:\\Users\\Brend\\Downloads\\Title Gen\\{}_titles.csv'.format(venue), encoding="UTF-8")
    # The CSV stores the whole title list as one string, so parse it back into a Python list.
    tl = ast.literal_eval(df_titles.iloc[0]['titles'])
    print('Venue: ', venue)
    print('Number of titles:', len(tl))
    print('Sampling: ', num_samples)
    # Draw num_samples titles uniformly at random, with replacement.
    texts = random.choices(tl, k=num_samples)
    textgen = textgenrnn()
    textgen.train_on_texts(texts, num_epochs=1, batch_size=128)
    # The Top5_Generated directory is assumed to exist already.
    textgen.generate_to_file('Top5_Generated\\{}_generated.txt'.format(venue),
                             n=num_titles,
                             temperature=[0.2] * num_titles,
                             max_gen_length=100,
                             progress=False)
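
`gen_titles` draws its training titles with replacement, so the same title can appear more than once in a sample. The sketch below contrasts that with sampling without replacement and adds an optional seed for repeatable runs. The helper `sample_titles` and its parameters are hypothetical and not part of the notebook.

```python
import random


def sample_titles(titles, k, with_replacement=True, seed=None):
    """Return k titles drawn at random; a fixed seed makes the draw repeatable."""
    rng = random.Random(seed)
    if with_replacement:
        # The same title may be drawn several times.
        return rng.choices(titles, k=k)
    # Each title is drawn at most once, so k is capped at the list length.
    return rng.sample(titles, min(k, len(titles)))


titles = ["title {}".format(i) for i in range(10)]
with_dupes = sample_titles(titles, 20, seed=0)
unique = sample_titles(titles, 5, with_replacement=False, seed=0)
```

Since the title lists collected by `gen_query` can hold the same title several times, sampling without replacement would not by itself guarantee distinct training texts.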

In [98]:
for venue in top_5:
    gen_titles(venue)

Venue:  Cell
Number of titles: 99673
Sampling:  1000
Training on 89,296 character sequences.
Epoch 1/1

####################
Temperature: 0.2
####################
A specific sequence of the specific control of the structure of the specific cell containing the specific protein in the specific protein structure of the specific containing the structure of the sequence the drosophila match interactions in the structure of the specific cell protein structure of t

A Chemoribitin Is the Enhance of the Control of the Containing the Containing Interactions of the Containing A Cell Specific Activity in The Containing the Control of the Compathy Control of the Methylate Dependent Interacting the Cell Containing The Specific Activity in A Control Structure of the 

A Cell Control of the Specific Cell Gene Cell Containing A Neuroment Cell Gene Interacting the Structure of the Containing The Dependent and Activity Interacting Containing Containing Metabole Interacting the Control of the DNA Is A Binding Cell Control of the Cell Cell Between Cell Structure Inter

####################
Temperature: 0.5




####################
Temperature: 0.2
####################
Dissensible of the and member of the contross of the and phyolanace and channels in a specification of the and heads of the and and comparison in the complex in the contact and specification of the contact and and regulated and sequence of the the and receptor of the sequence of the DNA the homosit

The sequence of the contact and metalian the comparison contally in the neuron and colon contross in the human cell colone of the regular insights in the human channel colon in the homose and the and genomic and the defect and the and the and the cancel in the thermal charged of the Massible of the

The and the comparison of the the and metal in the and phyolanast in the comparation in the and home and the and receptor and the defect and the colone in the metalogenocy of the thermal and the regulation of the charged and the contact and the cancel and the member of the comparative channel trans

####################
Temperature: 0.5




####################
Temperature: 0.2
####################
Protein of the protein control of the production of protein in the control of a complete control of cells and in the genome cells for production of the control in the reporter in the gene to the protein production of protein cells

Erratural control of cells in the production of the biotechnol production of cells in the protein production of protein controls in the france in the modifient in the control for the biome protein production of protein control of the genetic sequencing to constrite to production of control of prote

A complextive protein of the production of production of the recombinatoration of protein cells in the biomation of control of the protein control in the modifient in the modified production of cells in community production of complex of recombination of the modified production of human complextive



####################
Temperature: 0.5
####################
Errator of the Micror Human Colonic Ergense in Allectivity.

Generation of the recombinature biology to production of protein production of compretion of something quantiote plants of cells from cells and in human prosters in single genetic interaction and marker genes in the data in the teen-the origeration in the control in the topasite cell protein conjec

Engineering the regulate cells for the microarray broad quantimatic discovery in biologing in the reproducible the creection of the control for mutatoming of production of cells for convinutuals of protein of practices the neurons to consortive for based plant remotoman against the biomation cancer

####################
Temperature: 1.0
####################
The neurifies dorance in wife

Decrytome Human Gameric Biote Recoloring Wrion, Mething Draftieng CRISPOR-32, Autelficing Pviation in Reclafning (x) 2' MicRR-cells

A sequencing of e'f. CNC-Viro Sequencing activate targ





####################
Temperature: 0.2
####################
A sequence of the model of the progress of the and and molecular complex in model of the model of the endose of the progression of the progression of the dependent of the model of the model of the complex and model of the and sequence of the model of the monocyte of the different of an and sequence

A mediate progress of the induced by and protein progress of the protein controlled by modulation of the progression of the mouse the model of biological progress of the and the sequence of the protein and sequence of and sequent of an and progress and the controle sequence of the model of the mous

A signaling and molecular protein of the mouse protein and progression of the molecular progression of the maintain and sequence of the and and progress of the protein spot in the channel by model of the protein and subunity of the progress and the spectrosis of the and sequence of the model of the

####################
Temperature: 0.5




####################
Temperature: 0.2
####################
The Sereation of the Sereation of the Scientific Spectrocynogna Controllers

The Methof of Structure of the Spectroconsition of the Shepthering of the Scientific Structure of Molecular Spectrocynochy Controllers

The Spectroconsition of Structure of the Scientific Structure of The Sermination of Are A Structure of Methomens

####################
Temperature: 0.5
####################
The Human Shoes on Catmands by A Specific Chemistry of Strong Controller

Conception of bears: catchels of the and cattle tell in a churastine caters

The Accident Genes and Trynaming On Sereation

####################
Temperature: 1.0
####################
Human-Shef Sciencinal Computes Pole by Myine Detires in Last In Enomehienlanosstic Bungifility in Scientific Singary in the Skil And Life in Nuclanenar'kings in the Ferexiting petwear and anoymonerature

Carbinicity Antarloxide Hyphootchmility of Americas.

Function of analysic and Neuron, subemin
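
The samples above become less repetitive and less word-like as the temperature rises from 0.2 to 1.0. A generic sketch of temperature scaling shows why: dividing log-probabilities by a temperature below 1 concentrates probability on the most likely next character, while a temperature of 1 leaves the distribution unchanged. This is a standard formulation written for illustration, not code taken from textgenrnn, and `apply_temperature` is a hypothetical name.

```python
import math


def apply_temperature(probs, temperature):
    """Rescale a probability distribution; all probabilities must be positive."""
    scaled = [math.log(p) / temperature for p in probs]
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(scaled)
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


probs = [0.5, 0.3, 0.2]
cold = apply_temperature(probs, 0.2)       # sharper: favours the top choice
neutral = apply_temperature(probs, 1.0)    # same as the input distribution
```

At low temperature the sampler almost always picks the most likely character, which is consistent with the looping phrases in the 0.2 samples.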