# Recommender Systems Lab

> **Group**: Andres Altamirano and Diego Quintana

> **Report due:** Wednesday, May 31


In [1]:
import os
import nltk
import sklearn
import gensim
import string
import pandas as pd
from scipy.sparse import csr_matrix
from collections import Counter
from nltk.corpus import stopwords
from nltk.stem.porter import PorterStemmer 
from gensim import corpora, models, similarities
from sklearn.neighbors import NearestNeighbors

In [2]:
corpus_df = pd.read_csv('./corpus1.csv', sep='\t', header=None, encoding='latin')
corpus_df.columns = ['id', 'title', 'abstract']
corpus_df = corpus_df[['id', 'title', 'abstract']]

In [3]:
stemm = False
stemmer = PorterStemmer()

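def get_tokens(text):
    lowers = text.lower()
    # Map every punctuation character to None so translate() deletes it
    # (the abstracts are unicode strings because read_csv is given an encoding).
    no_punctuation = lowers.translate({ord(c): None for c in string.punctuation})
    tokens = nltk.word_tokenize(no_punctuation)
    if stemm:
        tokens = [stemmer.stem(token) for token in tokens]
    return tokens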

The following code builds a dictionary and may take a couple of minutes.

In [4]:
# Tokenise first: the dictionary is built from this column.
corpus_df['tokenised_abstract'] = corpus_df.abstract.map(get_tokens)

dic_file = 'resources/dictionary-stemm.p' if stemm else 'resources/dictionary.p'
if os.path.isfile(dic_file):
    dictionary = corpora.dictionary.Dictionary.load(dic_file)
else:
    dictionary = corpora.dictionary.Dictionary(documents=corpus_df.tokenised_abstract.tolist())
    dictionary.save(dic_file)

In [5]:
corpus_df['bow'] = corpus_df.tokenised_abstract.map(dictionary.doc2bow)
del corpus_df['tokenised_abstract']
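
As a rough illustration of what a bag-of-words representation looks like, the sketch below builds one by hand. The helpers `build_vocab` and `to_bow` are hypothetical names used only here; they are not gensim functions, and gensim's `Dictionary` may assign ids differently.

```python
from collections import Counter

def build_vocab(token_lists):
    """Assign an integer id to every distinct token, in order of first appearance."""
    vocab = {}
    for tokens in token_lists:
        for tok in tokens:
            if tok not in vocab:
                vocab[tok] = len(vocab)
    return vocab

def to_bow(tokens, vocab):
    """Return sorted (token_id, count) pairs, skipping tokens not in the vocabulary."""
    counts = Counter(vocab[t] for t in tokens if t in vocab)
    return sorted(counts.items())

# Two tiny, made-up tokenised documents.
docs = [["message", "passing", "protocol", "message"], ["traffic", "protocol"]]
vocab = build_vocab(docs)
bows = [to_bow(d, vocab) for d in docs]
print(bows)
```

Each document becomes a short list of `(id, count)` pairs, which is the shape the later TF-IDF and LDA steps consume.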

In [6]:
corpus = corpus_df['bow'].tolist()

**(1)** Here we build the TF-IDF representation of the corpus

In [7]:
tfidf_model_file = 'resources/tfidf_model-stemm.p' if stemm else 'resources/tfidf_model.p'
if os.path.isfile(tfidf_model_file):
    tfidf_model = models.tfidfmodel.TfidfModel.load(tfidf_model_file)
else:
    tfidf_model = models.tfidfmodel.TfidfModel(corpus, dictionary=dictionary)
    tfidf_model.save(tfidf_model_file)

# tfidf_model = models.tfidfmodel.TfidfModel(corpus, dictionary=dictionary)
corpus_df['tf_idf'] = tfidf_model[corpus_df.bow.tolist()]
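
To make the weighting concrete, here is a minimal pure-Python sketch of TF-IDF. It assumes a plain `count * log2(N / df)` weight followed by L2 normalization, which we believe is close to gensim's default settings, but it should be read as an approximation and not as gensim's implementation. The function name `tfidf` is ours.

```python
import math

def tfidf(bows):
    """bows: list of [(term_id, count), ...]. Returns unit-length weighted vectors."""
    n_docs = len(bows)
    # Document frequency: in how many documents each term appears.
    df = {}
    for bow in bows:
        for term_id, _ in bow:
            df[term_id] = df.get(term_id, 0) + 1
    vectors = []
    for bow in bows:
        weights = [(t, c * math.log2(n_docs / df[t])) for t, c in bow]
        # Terms present in every document get weight 0 and are dropped.
        weights = [(t, w) for t, w in weights if w > 0.0]
        norm = math.sqrt(sum(w * w for _, w in weights))
        vectors.append([(t, w / norm) for t, w in weights] if norm else [])
    return vectors

bows = [[(0, 2), (1, 1), (2, 1)], [(2, 1), (3, 1)]]
vectors = tfidf(bows)
print(vectors)
```

Term 2 occurs in both toy documents, so it carries no weight; the remaining weights are rescaled so that each vector has length 1.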

**(2)** Here we build the LDA representation of the corpus using 5 topics (ignore the warning message)

In [22]:
%%time

topic_number = 5

lda_model = models.LdaModel(corpus, num_topics=topic_number)
corpus_df['lda'] = lda_model[corpus_df.bow.tolist()]



CPU times: user 39.7 s, sys: 0 ns, total: 39.7 s
Wall time: 39.8 s


In [9]:
corpus_df['lda'] = lda_model[corpus_df.bow.tolist()]

Here we sample 3 documents from the corpus at random and assume they are the favourite documents of a user 'x'

In [10]:
samples = corpus_df.sample(3)

for n, (ix, paper) in enumerate(samples.iterrows()):
    idx, title, abstract, bow, tf_idf, lda = paper
    print '%d) %s' % (n+1, title)
    print ''
    print abstract
    print '\n' 

1) Bounds on the Efficiency of Message-Passing Protocols for Parallel Computers

This paper considers the problem of creating message-passing protocols for parallel computers. It is assumed that the processors are connected by a network that provides guaranteed delivery of every message, provided that each message delivered by the network is removed by the receiving processor unconditionally and in finite time. Two models of message-passing are considered, namely a selective model in which the receiver specifies the source of the message, and a nonselective model in which the receiver accepts messages from all sources. We consider only space-efficient protocols in which each processor has storage for a constant number of messages and message headers. We present three main results. First, we give a protocol for the selective model that performs a constant amount of communication per send or receive posted by the application. Second, we prove that no such efficient protocol exists for th

#### The following function is needed to process LDA

In [11]:
def to_sparse(matrix):
    return csr_matrix([gensim.matutils.sparse2full(row, length=N) for row in matrix]) 
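
`to_sparse` first expands every row to a dense vector of length `N` and only then compresses it again. As a possible alternative, the sketch below shows how the three CSR arrays could be filled directly from `(column, value)` pairs. The helper name `bow_rows_to_csr_arrays` is hypothetical, and the sketch uses only the standard library so that the layout is visible.

```python
def bow_rows_to_csr_arrays(rows):
    """rows: list of [(col, value), ...]. Returns the CSR triple (data, indices, indptr)."""
    data, indices, indptr = [], [], [0]
    for row in rows:
        for col, value in sorted(row):
            indices.append(col)   # column index of each stored value
            data.append(value)    # the stored value itself
        indptr.append(len(data))  # where the next row starts in data/indices
    return data, indices, indptr

# Three toy rows; the second one is empty.
rows = [[(0, 0.5), (3, 0.5)], [], [(2, 1.0)]]
data, indices, indptr = bow_rows_to_csr_arrays(rows)
print(data, indices, indptr)

# With SciPy available, the matrix could then be built without any dense step:
#   csr_matrix((data, indices, indptr), shape=(len(rows), n_columns))
# where n_columns is the vector width (N in this notebook).
```

Whether this is noticeably faster on this corpus was not measured here.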

## Activity
Run the following code with these parameter changes:

**nearest_neighbors: 5, 10, 20**
What effect does the model have on the observed recommendations?

With a fixed value for nearest neighbors, run
**metric = 'cosine'**
What effect does the distance metric have on the observed recommendations?

With fixed values for nearest_neighbors and metric
**model: 'lda'**
What effect does using LDA versus TF-IDF have on the observed recommendations?

Try again with LDA using only 5 topics, rebuilding the model above in **(2)**
What effect does the number of topics have on the observed recommendations?

In [12]:
model = 'tf_idf' # 'lda' 'tf_idf'
nearest_neighbors = 5
metric = 'euclidean' # 'cosine' 'euclidean'
M = len(corpus)
N = len(dictionary)

X = to_sparse(corpus_df[model].tolist())
document_index = NearestNeighbors(n_neighbors=(nearest_neighbors+1), algorithm='brute', metric=metric).fit(X)

In [13]:
for n, (ix, paper) in enumerate(samples.iterrows()):
    # kneighbors expects a 2D array of shape (n_queries, n_features)
    dists, neighbors = document_index.kneighbors(gensim.matutils.sparse2full(paper[model], length=N).reshape(1, -1))
    print paper['title']
    print ''
    print 'Documentos cercanos: '
    i = 1
    for neighbour in neighbors[0]:
        if ix != neighbour:
            line = str(i) + ". " + corpus_df.iloc[neighbour]['title']
            print line
            i+=1
    print '\n' 



Bounds on the Efficiency of Message-Passing Protocols for Parallel Computers

Documentos cercanos: 
1. An Atomic Model for Message-Passing
2. Universal Continuous Routing Strategies
3. Trade-Offs in Implementing Optimal Message Logging Protocols
4. Object-Oriented Open Implementation of Reliable Communication Protocols
5. Systematic Design of Two-Party Authentication Protocols*


A Network Calculus with Effective Bandwidth

Documentos cercanos: 
1. A New Approach for Allocating Buffers and Bandwidth to Heterogeneous, Regulated Traffic in an ATM Node
2. Passive Traffic Measurement for IP Operations
3. Graph Wavelets for Spatial Traffic Analysis
4. On the Effect of Traffic Self-Similarity on Network Performance
5. Rapid Model Parameterization from Traffic Measurements


Hierarchical Non-linear Factor Analysis and Topographic Maps

Documentos cercanos: 
1. Self-Organizing Feature Maps with Lateral Connections: Modeling Ocular Dominance
2. Scale-space Properties of Quadratic Feature Detect

# Answers

Run the following code with the parameter changes below:


## Changing the `nearest_neighbors` parameter
nearest_neighbors: 5, 10, 20. What effect does the model have on the observed recommendations?

### Answer:
Looking at the recommended titles, the only visible change is that the list gets longer. No differences were found beyond the number of items: the first 5 titles stay the same in the two larger runs, and the first 10 stay the same in the last run. This is expected, because the brute-force search ranks all documents by distance to the query, so a larger `nearest_neighbors` only extends the same ranked list.
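
The observation above can be reproduced on toy data. This is a minimal sketch with a hand-written brute-force search; the points are made up and `k_nearest` is a hypothetical helper, not the scikit-learn API.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def k_nearest(query_idx, points, k):
    """Indices of the k points closest to points[query_idx], excluding itself."""
    ranked = sorted(
        (i for i in range(len(points)) if i != query_idx),
        # Ties are broken by index so the ranking is deterministic.
        key=lambda i: (euclidean(points[query_idx], points[i]), i),
    )
    return ranked[:k]

points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 3.0), (0.5, 0.5), (5.0, 1.0)]
top2 = k_nearest(0, points, 2)
top4 = k_nearest(0, points, 4)
print(top2, top4)
```

The shorter list is always a prefix of the longer one, because both are cut from the same full ranking.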

In [14]:
import time
from IPython.display import display, Markdown
# Define the list of values to try for nearest_neighbors
neighbors_arr=[5,10,20]

for nearest_neighbors in neighbors_arr:
    display(Markdown("# neighbors =  " + str(nearest_neighbors)))

    model = 'tf_idf' # 'lda' 'tf_idf'
    #nearest_neighbors = 5
    metric = 'euclidean' # 'cosine' 'euclidean'
    M = len(corpus)
    N = len(dictionary)

    s_time=time.time()
    X = to_sparse(corpus_df[model].tolist())
    ttts=time.time() - s_time
    
    s_time=time.time()
    document_index = NearestNeighbors(n_neighbors=(nearest_neighbors+1), algorithm='brute', metric=metric).fit(X)
    tfit=time.time() - s_time
    
    print "Tiempo to_sparse:", ttts
    print "Tiempo NN fit   :", tfit
    
    for n, (ix, paper) in enumerate(samples.iterrows()):
        # kneighbors expects a 2D array of shape (n_queries, n_features)
        dists, neighbors = document_index.kneighbors(gensim.matutils.sparse2full(paper[model], length=N).reshape(1, -1))
        display(Markdown("## Titulo: " + paper['title']))
        #print paper['title']
        display(Markdown("### Documentos cercanos"))

        #print 'Documentos cercanos: '
        i = 1
        for neighbour in neighbors[0]:
            if ix != neighbour:
                line = str(i) + ". " + corpus_df.iloc[neighbour]['title']
                print line
                i+=1
        print '\n' 

# neighbors =  5

Tiempo to_sparse: 11.3871970177
Tiempo NN fit   : 0.0441038608551




## Titulo: Bounds on the Efficiency of Message-Passing Protocols for Parallel Computers

### Documentos cercanos

1. An Atomic Model for Message-Passing
2. Universal Continuous Routing Strategies
3. Trade-Offs in Implementing Optimal Message Logging Protocols
4. Object-Oriented Open Implementation of Reliable Communication Protocols
5. Systematic Design of Two-Party Authentication Protocols*






## Titulo: A Network Calculus with Effective Bandwidth

### Documentos cercanos

1. A New Approach for Allocating Buffers and Bandwidth to Heterogeneous, Regulated Traffic in an ATM Node
2. Passive Traffic Measurement for IP Operations
3. Graph Wavelets for Spatial Traffic Analysis
4. On the Effect of Traffic Self-Similarity on Network Performance
5. Rapid Model Parameterization from Traffic Measurements






## Titulo: Hierarchical Non-linear Factor Analysis and Topographic Maps

### Documentos cercanos

1. Self-Organizing Feature Maps with Lateral Connections: Modeling Ocular Dominance
2. Scale-space Properties of Quadratic Feature Detectors
3. How Lateral Interaction Develops in a Self-Organizing Feature Map
4. A Unified Neural Network Model for the Self-organization of Topographic Receptive Fields and Lateral Interaction
5. Factor Analysis Using Delta-Rule Wake-Sleep Learning




# neighbors =  10

Tiempo to_sparse: 25.065032959
Tiempo NN fit   : 0.0741899013519




## Titulo: Bounds on the Efficiency of Message-Passing Protocols for Parallel Computers

### Documentos cercanos

1. An Atomic Model for Message-Passing
2. Universal Continuous Routing Strategies
3. Trade-Offs in Implementing Optimal Message Logging Protocols
4. Object-Oriented Open Implementation of Reliable Communication Protocols
5. Systematic Design of Two-Party Authentication Protocols*
6. A knowledge-based algorithm for the Internet protocol TCP
7. Nonblocking and Orphan-Free Message Logging Protocols
8. A Probabilistically Correct Leader Election Protocol for Large Groups
9. A Unified and Generalized Treatment of Authentication Theory
10. Distributed Constraint Reasoning under Unreliable Communication






## Titulo: A Network Calculus with Effective Bandwidth

### Documentos cercanos

1. A New Approach for Allocating Buffers and Bandwidth to Heterogeneous, Regulated Traffic in an ATM Node
2. Passive Traffic Measurement for IP Operations
3. Graph Wavelets for Spatial Traffic Analysis
4. On the Effect of Traffic Self-Similarity on Network Performance
5. Rapid Model Parameterization from Traffic Measurements
6. Traffic Engineering With Traditional IP Routing Protocols
7. On the Relevance of Time Scales in Performance Oriented Traffic Characterizations
8. On Bandwidth Smoothing
9. Earliest Deadline Scheduling for Real-Time Database Systems
10. Efficient Network QoS Provisioning Based on per Node Traffic Shaping






## Titulo: Hierarchical Non-linear Factor Analysis and Topographic Maps

### Documentos cercanos

1. Self-Organizing Feature Maps with Lateral Connections: Modeling Ocular Dominance
2. Scale-space Properties of Quadratic Feature Detectors
3. How Lateral Interaction Develops in a Self-Organizing Feature Map
4. A Unified Neural Network Model for the Self-organization of Topographic Receptive Fields and Lateral Interaction
5. Factor Analysis Using Delta-Rule Wake-Sleep Learning
6. Stereo Image Compression with Disparity Compensation Using the MRF Model
7. The Growing Hierarchical Self-Organizing Map: Exploratory Analysis of High-Dimensional Data
8. Self-Organization and Functional Role of Lateral Connections and Multisize Receptive Fields in the Primary Visual Cortex
9. Processing Images By Semi-Linear Predictability Minimization
10. Computing Stereo Disparity and Motion with Known Binocular Cell Properties




# neighbors =  20

Tiempo to_sparse: 17.0441319942
Tiempo NN fit   : 0.0351529121399




## Titulo: Bounds on the Efficiency of Message-Passing Protocols for Parallel Computers

### Documentos cercanos

1. An Atomic Model for Message-Passing
2. Universal Continuous Routing Strategies
3. Trade-Offs in Implementing Optimal Message Logging Protocols
4. Object-Oriented Open Implementation of Reliable Communication Protocols
5. Systematic Design of Two-Party Authentication Protocols*
6. A knowledge-based algorithm for the Internet protocol TCP
7. Nonblocking and Orphan-Free Message Logging Protocols
8. A Probabilistically Correct Leader Election Protocol for Large Groups
9. A Unified and Generalized Treatment of Authentication Theory
10. Distributed Constraint Reasoning under Unreliable Communication
11. Four Issues Concerning the Semantics of Message Flow Graphs
12. A Routing Scheme for Content-Based Networking
13. MIME (Multipurpose Internet Mail Extensions) Part One: Mechanisms for Specifying and Describing the Format of Internet
14. A Note on Redundancy in Encrypted Messages
15. Volatile Logging in n-Fault-Tolerant Distributed Systems
16. Verification Techniques for Cache Coherence Pro



## Titulo: A Network Calculus with Effective Bandwidth

### Documentos cercanos

1. A New Approach for Allocating Buffers and Bandwidth to Heterogeneous, Regulated Traffic in an ATM Node
2. Passive Traffic Measurement for IP Operations
3. Graph Wavelets for Spatial Traffic Analysis
4. On the Effect of Traffic Self-Similarity on Network Performance
5. Rapid Model Parameterization from Traffic Measurements
6. Traffic Engineering With Traditional IP Routing Protocols
7. On the Relevance of Time Scales in Performance Oriented Traffic Characterizations
8. On Bandwidth Smoothing
9. Earliest Deadline Scheduling for Real-Time Database Systems
10. Efficient Network QoS Provisioning Based on per Node Traffic Shaping
11. Statistical Per-Flow Service Bounds in a Network with Aggregate Provisioning
12. Analysis of an ATM Buffer with Self-Similar (Fractal) Input Traffic
13. Experimental Queueing Analysis with Long-Range Dependent Packet Traffic
14. Link Capacity Allocation and Network Control by Filtered Input Rate in High Speed Networks
15. Fast, Approximate Synthesis of Fracti



## Titulo: Hierarchical Non-linear Factor Analysis and Topographic Maps

### Documentos cercanos

1. Self-Organizing Feature Maps with Lateral Connections: Modeling Ocular Dominance
2. Scale-space Properties of Quadratic Feature Detectors
3. How Lateral Interaction Develops in a Self-Organizing Feature Map
4. A Unified Neural Network Model for the Self-organization of Topographic Receptive Fields and Lateral Interaction
5. Factor Analysis Using Delta-Rule Wake-Sleep Learning
6. Stereo Image Compression with Disparity Compensation Using the MRF Model
7. The Growing Hierarchical Self-Organizing Map: Exploratory Analysis of High-Dimensional Data
8. Self-Organization and Functional Role of Lateral Connections and Multisize Receptive Fields in the Primary Visual Cortex
9. Processing Images By Semi-Linear Predictability Minimization
10. Computing Stereo Disparity and Motion with Known Binocular Cell Properties
11. Using Real-Valued Genetic Algorithms to Evolve Rule Sets for Classification
12. A Simple Algorithm for Topic Identification in 0-1 Data
13. Kalman Filter-based Algorithms for E

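## metric = cosine
With a fixed value for nearest neighbors, run
**metric = 'cosine'**
What effect does the distance metric have on the observed recommendations?
### Answer:
As in the previous case, no variation is seen beyond the number of recommended items. This is what one would expect if the TF-IDF vectors have unit length (as far as we know, gensim's `TfidfModel` normalizes them by default): for unit vectors the squared Euclidean distance equals twice the cosine distance, so both metrics produce the same ranking.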
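
A small check of why the two metrics can agree: for unit-length vectors `a` and `b`, `||a - b||^2 = 2 * (1 - cos(a, b))`, so sorting by one distance gives the same order as sorting by the other. The sketch below uses made-up vectors and hypothetical helper names.

```python
import math

def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (na * nb)

def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def rank(query, docs, dist):
    """Document indices sorted from nearest to farthest under the given distance."""
    return sorted(range(len(docs)), key=lambda i: dist(query, docs[i]))

docs = [normalize(v) for v in [[1, 0, 0], [1, 1, 0], [0, 1, 1], [1, 2, 3]]]
query = normalize([2, 1, 0])

by_cosine = rank(query, docs, cosine_distance)
by_euclid = rank(query, docs, euclidean_distance)
print(by_cosine, by_euclid)
```

If the vectors were not normalized, the two rankings could differ, since Euclidean distance would then also depend on vector length.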


In [15]:
nearest_neighbors=10
model = 'tf_idf' # 'lda' 'tf_idf'
metric = 'cosine' # 'cosine' 'euclidean'

display(Markdown("# `neighbors=" + str(nearest_neighbors) + "` y `metric='" + metric + "'`"))


M = len(corpus)
N = len(dictionary)

s_time=time.time()
X = to_sparse(corpus_df[model].tolist())
ttts=time.time() - s_time
s_time=time.time()
document_index = NearestNeighbors(n_neighbors=(nearest_neighbors+1), algorithm='brute', metric=metric).fit(X)
tfit=time.time() - s_time
print "Tiempo to_sparse:", ttts
print "Tiempo NN fit   :", tfit
for n, (ix, paper) in enumerate(samples.iterrows()):
    # kneighbors expects a 2D array of shape (n_queries, n_features)
    dists, neighbors = document_index.kneighbors(gensim.matutils.sparse2full(paper[model], length=N).reshape(1, -1))
    display(Markdown("## Titulo: " + paper['title']))
    #print paper['title']
    display(Markdown("### Documentos cercanos"))

    #print 'Documentos cercanos: '
    i = 1
    for neighbour in neighbors[0]:
        if ix != neighbour:
            line = str(i) + ". " + corpus_df.iloc[neighbour]['title']
            print line
            i+=1
    print '\n' 

# `neighbors=10` y `metric='cosine'`

Tiempo to_sparse: 18.8653659821
Tiempo NN fit   : 0.0730140209198




## Titulo: Bounds on the Efficiency of Message-Passing Protocols for Parallel Computers

### Documentos cercanos

1. An Atomic Model for Message-Passing
2. Universal Continuous Routing Strategies
3. Trade-Offs in Implementing Optimal Message Logging Protocols
4. Object-Oriented Open Implementation of Reliable Communication Protocols
5. Systematic Design of Two-Party Authentication Protocols*
6. A knowledge-based algorithm for the Internet protocol TCP
7. Nonblocking and Orphan-Free Message Logging Protocols
8. A Probabilistically Correct Leader Election Protocol for Large Groups
9. A Unified and Generalized Treatment of Authentication Theory
10. Distributed Constraint Reasoning under Unreliable Communication






## Titulo: A Network Calculus with Effective Bandwidth

### Documentos cercanos

1. A New Approach for Allocating Buffers and Bandwidth to Heterogeneous, Regulated Traffic in an ATM Node
2. Passive Traffic Measurement for IP Operations
3. Graph Wavelets for Spatial Traffic Analysis
4. On the Effect of Traffic Self-Similarity on Network Performance
5. Rapid Model Parameterization from Traffic Measurements
6. Traffic Engineering With Traditional IP Routing Protocols
7. On the Relevance of Time Scales in Performance Oriented Traffic Characterizations
8. On Bandwidth Smoothing
9. Earliest Deadline Scheduling for Real-Time Database Systems
10. Efficient Network QoS Provisioning Based on per Node Traffic Shaping






## Titulo: Hierarchical Non-linear Factor Analysis and Topographic Maps

### Documentos cercanos

1. Self-Organizing Feature Maps with Lateral Connections: Modeling Ocular Dominance
2. Scale-space Properties of Quadratic Feature Detectors
3. How Lateral Interaction Develops in a Self-Organizing Feature Map
4. A Unified Neural Network Model for the Self-organization of Topographic Receptive Fields and Lateral Interaction
5. Factor Analysis Using Delta-Rule Wake-Sleep Learning
6. Stereo Image Compression with Disparity Compensation Using the MRF Model
7. The Growing Hierarchical Self-Organizing Map: Exploratory Analysis of High-Dimensional Data
8. Self-Organization and Functional Role of Lateral Connections and Multisize Receptive Fields in the Primary Visual Cortex
9. Processing Images By Semi-Linear Predictability Minimization
10. Computing Stereo Disparity and Motion with Known Binocular Cell Properties




## model : lda
With fixed values for nearest_neighbors and metric
**model: 'lda'**
What effect does using LDA versus TF-IDF have on the observed recommendations?

### Answer:
With the `lda` model the recommended documents change completely. They even seem to drift away from the topic of the chosen document.
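
One possible explanation, which was not verified on this corpus, is that a handful of topics is a very coarse description: documents on different subjects can end up with almost the same topic mixture and therefore look like close neighbors. The sketch below illustrates the effect with hypothetical 5-topic vectors; the labels and proportions are invented for the example.

```python
import math

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (na * nb)

# Hypothetical topic mixtures (one weight per topic); not taken from the real model.
topic_vectors = {
    "networking paper": [0.70, 0.30, 0.00, 0.00, 0.00],
    "cryptography paper": [0.68, 0.32, 0.00, 0.00, 0.00],
    "vision paper": [0.00, 0.10, 0.90, 0.00, 0.00],
}

d_close = cosine_distance(topic_vectors["networking paper"], topic_vectors["cryptography paper"])
d_far = cosine_distance(topic_vectors["networking paper"], topic_vectors["vision paper"])
print(d_close, d_far)
```

Two papers that share no specific vocabulary can still sit at a distance close to zero if the model assigns them to the same broad topics.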

In [16]:
nearest_neighbors=10
model = 'lda' # 'lda' 'tf_idf'
metric = 'cosine' # 'cosine' 'euclidean'

display(Markdown("# `neighbors=" + str(nearest_neighbors) + "`, `metric='" + metric + "'` y `model='" + model + "'`"))

M = len(corpus)
N = len(dictionary)

X = to_sparse(corpus_df[model].tolist())

document_index = NearestNeighbors(n_neighbors=(nearest_neighbors+1), algorithm='brute', metric=metric).fit(X)


for n, (ix, paper) in enumerate(samples.iterrows()):
    # kneighbors expects a 2D array of shape (n_queries, n_features)
    dists, neighbors = document_index.kneighbors(gensim.matutils.sparse2full(paper[model], length=N).reshape(1, -1))
    display(Markdown("## Titulo: " + paper['title']))
    #print paper['title']
    display(Markdown("### Documentos cercanos"))

    #print 'Documentos cercanos: '
    i = 1
    for neighbour in neighbors[0]:
        if ix != neighbour:
            line = str(i) + ". " + corpus_df.iloc[neighbour]['title']
            print line
            i+=1
    print '\n' 

# `neighbors=10`, `metric='cosine'` y `model='lda'`



## Titulo: Bounds on the Efficiency of Message-Passing Protocols for Parallel Computers

### Documentos cercanos

1. How To Prove Yourself: Practical Solutions to Identification and Signature Problems
2. Definitions And Properties Of Zero-Knowledge Proof Systems
3. Smart Playing Cards: A Ubiquitous Computing Game
4. On Two-Step Routing for FPGAs
5. Learning Context Sensitive Languages with LSTM Trained with Kalman Filters
6. Performance Anomaly of 802.11b
7. Best-Path vs. Multi-Path Overlay Routing
8. Algorithm for Optimal Winner Determination in Combinatorial Auctions
9. Multicast in DKS(N, k, f) Overlay Networks
10. AIMD, Fairness and Fractal Scaling of TCP Traffic






## Titulo: A Network Calculus with Effective Bandwidth

### Documentos cercanos

1. Rateless Codes and Big Downloads
2. Trajectory Based Forwarding and Its Applications
3. Strong Interaction Fairness via Randomization
4. Opportunistic Fair Scheduling over Multiple Wireless Channels
5. Focusing Search in Hierarchical Structures with Directory Sets
6. Evaluating High Accuracy Retrieval Techniques
7. Exploring the VLSI Scalability of Stream Processors
8. From Symptom to Cause: Localizing Errors in Counterexample Traces
9. A Synchronization Strategy for a Time-Triggered Multicluster Real-Time System
10. A Neural Network Based Head Tracking System






## Titulo: Hierarchical Non-linear Factor Analysis and Topographic Maps

### Documentos cercanos

1. Data-Centric Storage in Sensornets
2. Hierarchical Wavelet Networks for Facial Feature Localization
3. Secure Identity Based Encryption without Random Oracles
4. Tail Index Estimation for Dependent Data
5. A Multigrid Approach for Hierarchical Motion Estimation
6. An Image-Based Approach To Three-Dimensional Computer Graphics
7. Image Representation Using 2D Gabor Wavelets
8. Human Pose Estimation From Silhouettes: A Consistent Approach Using Distance Level Sets
9. Epipolar Curves on Surfaces
10. Separability of Polyhedra for Optimal Filtering of Spatial and Constraint Data




## LDA and 5 topics
Try again with LDA using only 5 topics, rebuilding the model above in **(2)**
What effect does the number of topics have on the observed recommendations?


### Answer
`topic_number=10` is used instead, following Denis's clarification by email.
The recommendation list changed completely. In addition, building the corpus in this case takes about 11 seconds longer than with 5 topics.
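
To put a number on "changed completely", one option is the Jaccard overlap between two recommendation lists. This is a sketch with hypothetical document ids; the function `jaccard` is ours and the lists are not taken from the runs in this notebook.

```python
def jaccard(a, b):
    """Size of the intersection divided by size of the union (1.0 for two empty lists)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

# Hypothetical document ids standing in for two top-5 lists.
run_a = [12, 48, 7, 93, 31]
run_b = [12, 5, 64, 93, 20]
overlap = jaccard(run_a, run_b)
print(overlap)
```

A value of 0 means the two lists share no documents, and 1 means they contain exactly the same documents, regardless of order.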


In [21]:
%%time
# Set to 10 topics and rerun the code
topic_number = 10

lda_model = models.LdaModel(corpus, num_topics=topic_number)
corpus_df['lda'] = lda_model[corpus_df.bow.tolist()]



CPU times: user 50.7 s, sys: 0 ns, total: 50.7 s
Wall time: 51 s


In [23]:
%%time

nearest_neighbors=10
model = 'lda' # 'lda' 'tf_idf'
metric = 'cosine' # 'cosine' 'euclidean'

display(Markdown("# `neighbors=" + str(nearest_neighbors) + "`, `metric='" + metric + "'` y `model='" + model + "'`"))

M = len(corpus)
N = len(dictionary)

X = to_sparse(corpus_df[model].tolist())

document_index = NearestNeighbors(n_neighbors=(nearest_neighbors+1), algorithm='brute', metric=metric).fit(X)


for n, (ix, paper) in enumerate(samples.iterrows()):
    # kneighbors expects a 2D array of shape (n_queries, n_features)
    dists, neighbors = document_index.kneighbors(gensim.matutils.sparse2full(paper[model], length=N).reshape(1, -1))
    display(Markdown("## Titulo: " + paper['title']))
    #print paper['title']
    display(Markdown("### Documentos cercanos"))

    #print 'Documentos cercanos: '
    i = 1
    for neighbour in neighbors[0]:
        if ix != neighbour:
            line = str(i) + ". " + corpus_df.iloc[neighbour]['title']
            print line
            i+=1
    print '\n' 

# `neighbors=10`, `metric='cosine'` y `model='lda'`



## Titulo: Bounds on the Efficiency of Message-Passing Protocols for Parallel Computers

### Documentos cercanos

1. Satometer: How Much Have We Searched?
2. Linearizing Intuitionistic Implication
3. Asymptotic Equivalence of Density Estimation and Gaussian White Noise
4. Computer-Assisted Bounds for the Rate of Decay of Correlations
5. Compressing Polygon Mesh Geometry with Parallelogram Prediction
6. Simulating Physics with Computers
7. Precise Flow-Insensitive May-Alias Analysis is NP-Hard
8. Computing Contour Closure
9. Incremental Gaussian Processes
10. Pac-learning Recursive Logic Programs: Negative Results
11. Bottom-up Grammar Analysis - A Functional Formulation






## Titulo: A Network Calculus with Effective Bandwidth

### Documentos cercanos

1. Protocol Analysis using a timed version of SDL
2. Analysis of Sequential SLG Evaluation
3. Verification of Linear Hybrid Systems By Means of Convex Approximations
4. Real Time Inverse Kinematics for General 6R Manipulators
5. Second-Order Stability Cells of a Frictionless Rigid Body Grasped by Rigid Fingers
6. Quantum Computers and Dissipation
7. Modal Logic and Attribute Value Structures
8. On Unifying Some Cryptographic Protocol Logics
9. A Logical Semantics For Nonmonotonic Sorts
10. Verifying Programs with Unreliable Channels
11. Adding learning to the cellular development of neural networks: Evolution and the Baldwin effect.






## Titulo: Hierarchical Non-linear Factor Analysis and Topographic Maps

### Documentos cercanos

1. Greedy approximation Algorithm For constructing Shortest Common Superstring
2. Axiomatic Rewriting Theory IV - A stability theorem in Rewriting Theory
3. Computing With Very Weak Random Sources
4. Short Length Menger's Theorem and Reliable Optical Routing,
5. Gene Pool Recombination in Genetic Algorithms
6. Computing Discrete Minimal Surfaces and Their Conjugates
7. Three-Dimensional Orthogonal Graph Drawing with Optimal Volume
8. Infinitary Logic and Inductive Definability over Finite Structures
9. Branching Processes
10. Categorical Combinators for Charity
11. Space Efficient Suffix Trees


CPU times: user 3.49 s, sys: 944 ms, total: 4.43 s
Wall time: 5.05 s
