
Patent text similarity and cross-cultural venture-backed innovation

TensorFlow-Keras neural network implementation of venture capital/corporate innovation research

Tools used: TensorFlow-Keras, PlaidML (for GPU support), and Hyperas (for hyperparameter tuning).

This repository contains a series of predictive objects built from the data in my finance research paper "Patent text similarity and cross-cultural venture-backed innovation," which is currently under embargo for publication in the Journal of Behavioral and Experimental Finance's special issue on AI in finance. An earlier version of the paper, which contains the exact details of the research methodology, is located here.

The research finds that venture-backed portfolio companies whose patents are textually more similar to those of their industry peers produce both more patents and higher-quality patents, where quality is measured by the number of citations a patent receives. A dataset of 4.9 million cosine similarity measures between the patents of 961 venture-backed portfolio companies from 28 nations, combined with a cultural-linguistic distance measure from West and Graham (2004), yields results that generalize beyond the English language.
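For a concrete picture of the similarity measure, the following is a minimal sketch of pairwise cosine similarity between patent texts using TF-IDF vectors. The toy abstracts and the scikit-learn pipeline are illustrative assumptions, not the exact preprocessing used in the paper.

```python
# Minimal sketch: pairwise cosine similarity between patent texts.
# The toy abstracts and TF-IDF weighting are placeholders; the paper's
# actual text preprocessing may differ.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

patent_abstracts = [
    "a method for sequencing nucleic acids using fluorescent markers",
    "wireless communication protocol for low power sensor networks",
    "fluorescent labeling of dna fragments for high throughput sequencing",
]

# Vectorize the texts, then compute an n_patents x n_patents similarity matrix.
tfidf = TfidfVectorizer(stop_words="english").fit_transform(patent_abstracts)
similarity = cosine_similarity(tfidf)  # entry [i, j] compares patents i and j

print(similarity.round(2))
```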

Industry categories within those patents are defined by Thomson Reuters and split into six groups: biotechnology, communications & media, computer-related, medical/health/life sciences, non-high tech, and semiconductor-related patents. To validate these categories, I fit a Bayesian latent Dirichlet allocation (LDA) topic model to the patent text, constraining it to find six distributions of words, or topics, among the patents. I then matched the LDA-derived topics to the Thomson Reuters categories using the five most frequently occurring terms in each LDA-derived topic. Finally, I took the KL distance between the two measures, which made it possible to control for aggregate measurement error in each topic.

![LDA topic words matched with SDC industry categories](topic_words_matched_with_SDC_industry_categories.png)
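A minimal sketch of this validation step is below, assuming scikit-learn's LatentDirichletAllocation and a small toy corpus; the documents and variable names are placeholders, not the code or data behind the figure above.

```python
# Minimal sketch: fit a six-topic LDA model and list the top five terms per
# topic, which can then be matched to the six Thomson Reuters categories.
# The toy corpus below is a placeholder, not the patent data.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

patent_texts = [
    "gene sequencing assay for protein expression in cell cultures",
    "wireless broadcast antenna for digital media transmission",
    "software method for indexing database records on a computer",
    "surgical implant device for cardiac health monitoring",
    "injection molded plastic housing for industrial equipment",
    "semiconductor wafer etching process for integrated circuits",
]

vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(patent_texts)

lda = LatentDirichletAllocation(n_components=6, random_state=0)
lda.fit(counts)

# Five most frequent terms in each LDA-derived topic.
terms = vectorizer.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top_terms = [terms[i] for i in topic.argsort()[::-1][:5]]
    print(f"Topic {k}: {', '.join(top_terms)}")
```

The KL distance between a matched pair of term distributions can then be computed, for example with `scipy.stats.entropy(p, q)`, to quantify how far each LDA topic sits from its Thomson Reuters counterpart.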

The VC data are summarized in the density plot of VC deals by nation below. The plot covers a slightly larger sample of 1,294 VC deals, of which the 961 deals used here are a subset. Though the representation of nations varies by year, the effect of the dot-com bubble in 1999 is clearly visible:

![Density of VC deals by year](Density_of_VC_deals_by_year.png)

Although inferential statistics on data such as these are the standard in academic business research, a great deal of econometrics carries over to neural networks. My dissertation chapters each made use of left-censored tobits, which are essentially inferential versions of the ReLUs used in neural networks. A co-authored journal publication of mine uses structural equation models, which are essentially inferential versions of causal Bayesian neural networks in which the path weights, rather than predictions on new data, are of interest.
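To make the tobit/ReLU parallel concrete, here is a minimal Keras sketch, not one of the models from the paper: a left-censored tobit posits a latent index Xβ + ε observed as max(0, ·), and the predictive analogue is a linear layer whose output passes through a ReLU. The feature count `n_features` is a placeholder.

```python
# Minimal sketch of the tobit/ReLU analogy (illustrative, not the paper's models).
# A left-censored tobit observes y = max(0, X @ beta + eps); the predictive
# counterpart is a single linear layer followed by a ReLU.
from tensorflow import keras
from tensorflow.keras import layers

n_features = 10  # placeholder; set to the width of your design matrix

model = keras.Sequential([
    keras.Input(shape=(n_features,)),
    layers.Dense(1),   # linear index X @ w + b
    layers.ReLU(),     # left-censoring at zero
])
model.compile(optimizer="adam", loss="mse")
model.summary()
```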

Furthermore, despite the focus in finance research on such inferential statistics, predictive analytics are increasingly being used in practice, including in venture capital. For example, Google Ventures actively uses machine learning to predict whether a venture capital investment should be undertaken, and other venture capital firms are beginning to follow suit as venture capital shifts overseas. To that end, and to take the opportunity to apply neural networks to research-quality data, I have made predictive objects from my research available to practitioners.

About

TensorFlow-Keras implementation of JBEF.2020.100319
