
Patent text similarity and cross-cultural venture-backed innovation

TensorFlow-Keras neural network implementation of venture capital/corporate innovation research

Tools used: TensorFlow-Keras, PlaidML (for GPU support), and Hyperas (for hyperparameter tuning).

This repository contains a series of predictive objects built from the data in my finance research paper "Patent text similarity and cross-cultural venture-backed innovation," which is currently under embargo for publication in the Journal of Behavioral and Experimental Finance's special issue on AI in finance. An earlier version of the paper, which contains the exact details of the research methodology, is located here.

The research finds that venture-backed portfolio companies whose patents are textually more similar to those of their industry peers produce both more patents and higher-quality patents, where quality is measured by the number of citations a patent receives. A dataset of 4.9 million cosine similarity measures between the patents of 961 venture-backed portfolio companies from 28 nations, combined with a cultural-linguistic distance measure from West and Graham (2004), yields results that generalize beyond the English language.
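For a concrete picture of the similarity measure, the following is a minimal sketch of pairwise cosine similarity between patent texts using TF-IDF vectors. The toy abstracts and the scikit-learn pipeline are illustrative assumptions, not the exact preprocessing used in the paper.

```python
# Minimal sketch: pairwise cosine similarity between patent texts.
# The toy abstracts and TF-IDF weighting are placeholders; the paper's
# actual text preprocessing may differ.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

patent_abstracts = [
    "a method for sequencing nucleic acids using fluorescent markers",
    "wireless communication protocol for low power sensor networks",
    "fluorescent labeling of dna fragments for high throughput sequencing",
]

# Vectorize the texts, then compute an n_patents x n_patents similarity matrix.
tfidf = TfidfVectorizer(stop_words="english").fit_transform(patent_abstracts)
similarity = cosine_similarity(tfidf)  # entry [i, j] compares patents i and j

print(similarity.round(2))
```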

Industry categories within those patents are defined by Thomson Reuters and split into six groups: biotechnology, communications & media, computer-related, medical/health/life sciences, non-high tech, and semiconductor-related patents. To validate these categories, I fit a Bayesian latent Dirichlet allocation (LDA) topic model to the patent text, constraining it to find six distributions of words, or topics, among the patents. I then matched the LDA-derived topics to the Thomson Reuters categories using the five most frequently occurring terms in each LDA-derived topic. Finally, I took the KL distance between the two measures, which made it possible to control for aggregate measurement error in each topic.

![LDA topic words matched with SDC industry categories](topic_words_matched_with_SDC_industry_categories.png)
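A minimal sketch of this validation step is below, assuming scikit-learn's LatentDirichletAllocation and a small toy corpus; the documents and variable names are placeholders, not the code or data behind the figure above.

```python
# Minimal sketch: fit a six-topic LDA model and list the top five terms per
# topic, which can then be matched to the six Thomson Reuters categories.
# The toy corpus below is a placeholder, not the patent data.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

patent_texts = [
    "gene sequencing assay for protein expression in cell cultures",
    "wireless broadcast antenna for digital media transmission",
    "software method for indexing database records on a computer",
    "surgical implant device for cardiac health monitoring",
    "injection molded plastic housing for industrial equipment",
    "semiconductor wafer etching process for integrated circuits",
]

vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(patent_texts)

lda = LatentDirichletAllocation(n_components=6, random_state=0)
lda.fit(counts)

# Five most frequent terms in each LDA-derived topic.
terms = vectorizer.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top_terms = [terms[i] for i in topic.argsort()[::-1][:5]]
    print(f"Topic {k}: {', '.join(top_terms)}")
```

The KL distance between a matched pair of term distributions can then be computed, for example with `scipy.stats.entropy(p, q)`, to quantify how far each LDA topic sits from its Thomson Reuters counterpart.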

The VC data are summarized in the density plot of VC deals by nation below. The plot covers a slightly larger sample of 1,294 VC deals, of which the 961 deals used here are a subset. Though the representation of nations varies by year, the effect of the dot-com bubble in 1999 is clearly visible:

![Density of VC deals by year](Density_of_VC_deals_by_year.png)

Although inferential statistics on data such as these are the standard in academic business research, a great deal of econometrics carries over to neural networks. My dissertation chapters each made use of left-censored tobits, which are essentially inferential versions of the ReLUs used in neural networks. A co-authored journal publication of mine uses structural equation models, which are essentially inferential versions of causal Bayesian neural networks in which the path weights, rather than predictions on new data, are of interest.
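To make the tobit/ReLU parallel concrete, here is a minimal Keras sketch, not one of the models from the paper: a left-censored tobit posits a latent index Xβ + ε observed as max(0, ·), and the predictive analogue is a linear layer whose output passes through a ReLU. The feature count `n_features` is a placeholder.

```python
# Minimal sketch of the tobit/ReLU analogy (illustrative, not the paper's models).
# A left-censored tobit observes y = max(0, X @ beta + eps); the predictive
# counterpart is a single linear layer followed by a ReLU.
from tensorflow import keras
from tensorflow.keras import layers

n_features = 10  # placeholder; set to the width of your design matrix

model = keras.Sequential([
    keras.Input(shape=(n_features,)),
    layers.Dense(1),   # linear index X @ w + b
    layers.ReLU(),     # left-censoring at zero
])
model.compile(optimizer="adam", loss="mse")
model.summary()
```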

Furthermore, despite the focus in finance research on such inferential statistics, predictive analytics are increasingly being used in practice, including in venture capital. For example, Google Ventures actively uses machine learning to predict whether a venture capital investment should be undertaken, and other venture capital firms are beginning to follow suit as venture capital shifts overseas. To that end, and to take the opportunity to apply neural networks to research-quality data, I have made predictive objects from my research available to practitioners.

About

TensorFlow-Keras implementation of JBEF.2020.100319
