# Neural citation network for local citation recommendation

In [1]:
from ncn.evaluation import Evaluator
from ncn.data import get_datasets

In [2]:
def display(d):
    """Print each recommended title together with its citation rank."""
    for key, value in d.items():
        print(f"Citation rank {key}|\t {value}")

## Data: Basic statistics
Preprocessing removed the following samples:
1. 8260 triplets of paper data with empty or missing files.
2. 1 data sample that threw a regex error.
3. 161670 context samples whose information was missing or could not be parsed from the files.

After these removals:
* __502353 context-citation pairs__ with full information remain.
* __Context vocabulary__ size after processing: __72046__.
* __Title vocabulary__ size after processing: __43208__.
* Number of __citing authors__: __28200__.
* Number of __cited authors__: __169236__.

![Context and title length distributions](assets/title_context_distribution.jpg)

In [3]:
path_to_weights = "/home/timo/Downloads/best_model_bn_TDNN/NCN_7_17_10.pt"
path_to_data = "/home/timo/DataSets/KD_arxiv_CS/arxiv_data.csv"

In [4]:
data = get_datasets(path_to_data)

INFO:neural_citation.data:Getting fields...
INFO:neural_citation.data:Loading dataset...
INFO:neural_citation.data:Building vocab...


## Data preprocessing with torchtext Fields
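As a rough illustration of what a torchtext Field does for each text column, the sketch below tokenizes text, builds a size-limited vocabulary, maps tokens to indices, and pads a batch. It is written in plain Python rather than with torchtext itself, and the helper names (`tokenize`, `build_vocab`, `numericalize`, `pad_batch`) are hypothetical. The special-token order (`<unk>` at index 0, `<pad>` at index 1) is an assumption chosen to match the `Pad index = 1` reported by the model summary in this notebook.

```python
from collections import Counter

UNK, PAD = "<unk>", "<pad>"  # assumed special tokens, unknown first, padding second


def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for a real tokenizer)."""
    return text.lower().split()


def build_vocab(texts, max_size):
    """Map the max_size most frequent tokens to indices, after the special tokens."""
    counts = Counter(tok for text in texts for tok in tokenize(text))
    # Sort by descending frequency, then alphabetically, so the result is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:max_size]
    itos = [UNK, PAD] + [tok for tok, _ in ranked]
    return {tok: idx for idx, tok in enumerate(itos)}


def numericalize(text, stoi):
    """Convert a string into a list of vocabulary indices."""
    return [stoi.get(tok, stoi[UNK]) for tok in tokenize(text)]


def pad_batch(seqs, pad_idx):
    """Right-pad every sequence to the length of the longest one."""
    width = max(len(seq) for seq in seqs)
    return [seq + [pad_idx] * (width - len(seq)) for seq in seqs]


texts = ["neural networks are cool", "convolutional neural networks"]
stoi = build_vocab(texts, max_size=30000)
batch = pad_batch([numericalize(t, stoi) for t in texts], stoi[PAD])
print(batch)
```

With a vocabulary cap of 30000 and two special tokens, a scheme like this would yield 30002 entries, which is consistent with the context and author vocabulary sizes in the model summary.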

## Bucketing: What is it and why do we need it?
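Bucketing groups samples of similar length into the same batch, so that less of each padded batch consists of padding tokens. The sketch below shows the general idea in plain Python; it is not torchtext's own iterator, and `bucket_batches`, `padding_cost`, and the `bucket_mult` parameter are hypothetical names used for illustration only.

```python
import random


def bucket_batches(lengths, batch_size, bucket_mult=100, seed=0):
    """Return batches of sample indices whose lengths are close to each other."""
    rng = random.Random(seed)
    indices = list(range(len(lengths)))
    rng.shuffle(indices)  # keep some randomness between epochs
    chunk = batch_size * bucket_mult
    batches = []
    for start in range(0, len(indices), chunk):
        # Sort only within a chunk, then cut the chunk into batches.
        bucket = sorted(indices[start:start + chunk], key=lambda i: lengths[i])
        for pos in range(0, len(bucket), batch_size):
            batches.append(bucket[pos:pos + batch_size])
    rng.shuffle(batches)  # avoid feeding batches in order of length
    return batches


def padding_cost(batches, lengths):
    """Count the padding tokens needed to pad each batch to its longest sample."""
    total = 0
    for batch in batches:
        longest = max(lengths[i] for i in batch)
        total += sum(longest - lengths[i] for i in batch)
    return total


rng = random.Random(1)
lengths = [rng.randint(5, 50) for _ in range(1000)]  # made-up sequence lengths

order = list(range(len(lengths)))
naive = [order[i:i + 32] for i in range(0, len(order), 32)]
bucketed = bucket_batches(lengths, batch_size=32)

print("padding tokens without bucketing:", padding_cost(naive, lengths))
print("padding tokens with bucketing:   ", padding_cost(bucketed, lengths))
```

Sorting inside shuffled chunks, instead of sorting the whole dataset once, is a common compromise: batches stay length-homogeneous while their composition still varies between epochs.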

In [5]:
evaluator = Evaluator(path_to_weights, data, evaluate=False)

INFO:neural_citation.evaluation:INITIALIZING NEURAL CITATION NETWORK WITH AUTHORS = True
Running on: cpu
Number of model parameters: 23,533,236
Encoders: # Filters = 128, Context filter length = [4, 4, 5],  Context filter length = [1, 2]
Embeddings: Dimension = 128, Pad index = 1, Context vocab = 30002, Author vocab = 30002, Title vocab = 30004
Decoder: # GRU cells = 2, Hidden size = 128
Parameters: Dropout = 0.2
-------------------------------------------------
INFO:neural_citation.evaluation:Creating corpus in eval=False mode.
INFO:neural_citation.evaluation:Number of samples in BM25 corpus: 1054941
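
The log above mentions a BM25 corpus. A plausible reading, stated here as an assumption, is that BM25 pre-selects candidate titles for a context before the network ranks them. The following minimal sketch scores a few titles against a query with the standard Okapi BM25 formula; the function `bm25_scores` and the parameter defaults `k1=1.5`, `b=0.75` are illustrative and not taken from the `ncn` package.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document against the query with Okapi BM25."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(doc) for doc in tokenized) / n_docs
    # Document frequency: in how many documents does each token occur?
    df = Counter(tok for doc in tokenized for tok in set(doc))
    scores = []
    for doc in tokenized:
        tf = Counter(doc)
        score = 0.0
        for tok in query.lower().split():
            if tok not in tf:
                continue
            idf = math.log(1 + (n_docs - df[tok] + 0.5) / (df[tok] + 0.5))
            length_norm = k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[tok] * (k1 + 1) / (tf[tok] + length_norm)
        scores.append(score)
    return scores


titles = [
    "Visualizing and understanding convolutional networks",
    "Imagenet classification with deep convolutional neural networks",
    "Group equivariant convolutional networks",
    "Fully convolutional networks for semantic segmentation",
    "Convolutional network codes",
]
scores = bm25_scores("convolutional networks for segmentation", titles)
best = max(range(len(titles)), key=lambda i: scores[i])
print(titles[best])
```

Terms that occur in almost every title, such as "convolutional" here, contribute little to the score, while rarer query terms dominate the ranking.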


In [9]:
context = "Neural networks are really cool, especially if they are convolutional."
authors = "Jim Foo, Bruce Lee"

In [10]:
recomms = evaluator.recommend(context, authors)

In [11]:
display(recomms)

Citation rank 0|	 Visualizing and understanding convolutional networks
Citation rank 1|	 Imagenet classification with   deep convolutional neural networks
Citation rank 2|	 Group equivariant convolutional networks
Citation rank 3|	 Fully convolutional networks for semantic segmentation
Citation rank 4|	 Convolutional network codes


## Parameter tuning
![Parameter tuning](assets/lecun.jpeg)

## Documentation
![Documentation](assets/Documentation.png)