HackerNews analytics using word2vec


HackerNews analytics

Using the available HackerNews dataset, produce insight into the most meaningful topics.

Goals

  • Most discussed topics (and yearly shift)
  • Top technologies and startups

Technology behind it

  • Java
  • Deeplearning4J
  • Word2vec
  • Stanford CoreNLP (lemmatizing)

Project

Online version

Roadmap

  • Gather data (DONE)
  • Produce JSON (DONE)
  • For selected terms: related words trending over the years 2007-2017 (DONE)
  • For all terms: display yearly counts (TODO)
  • Term cleanup (DONE)
  • Auto build SPA, i.e. push -> CI -> deploy (TODO)
  • Fine tune word2vec params (see below)
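
As an illustration of the open "yearly counts" item, here is a minimal sketch in plain Java. It assumes posts have already been reduced to a year plus a list of lemmatized terms; the `Post` class and `countByYear` method are hypothetical names chosen for this example and are not taken from the project sources.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class YearlyTermCounts {

    /** Hypothetical input record: one post's year and its already-lemmatized terms. */
    public static final class Post {
        final int year;
        final List<String> terms;

        public Post(int year, List<String> terms) {
            this.year = year;
            this.terms = terms;
        }
    }

    /** Returns year -> (term -> occurrences in that year), both levels sorted by key. */
    public static Map<Integer, Map<String, Integer>> countByYear(List<Post> posts) {
        Map<Integer, Map<String, Integer>> counts = new TreeMap<>();
        for (Post post : posts) {
            // Create the per-year map on first use, then bump each term's counter.
            Map<String, Integer> perYear =
                    counts.computeIfAbsent(post.year, y -> new TreeMap<>());
            for (String term : post.terms) {
                perYear.merge(term, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample data, for illustration only.
        List<Post> posts = List.of(
                new Post(2007, List.of("java", "startup")),
                new Post(2007, List.of("java")),
                new Post(2017, List.of("rust", "startup")));
        // prints {2007={java=2, startup=1}, 2017={rust=1, startup=1}}
        System.out.println(countByYear(posts));
    }
}
```

The nested sorted maps serialize naturally to a year-keyed JSON object, which could be one way to feed per-year counts to the front end.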

Source repo

The project is hosted on GitHub.

Word2vec tuning

Help with fine-tuning the word2vec parameters is welcome. Here is the current setup:

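// `iter` (the text source) and `t` (the tokenizer factory) are defined elsewhere in the project.
Word2Vec vec = new Word2Vec.Builder()
        .minWordFrequency(5)              // drop words that occur fewer than 5 times
        .iterations(1)                    // training iterations per mini-batch
        .layerSize(100)                   // dimensionality of each word vector
        .seed(System.currentTimeMillis()) // time-based seed, so runs are not reproducible
        .windowSize(5)                    // context window size
        .iterate(iter)
        .tokenizerFactory(t)
        .build();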
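
To make it easier to reason about what tuning changes, it may help to recall that "related words" are commonly ranked by cosine similarity between the learned vectors. The sketch below shows that ranking in plain Java. It is not the Deeplearning4J implementation, the class and method names are hypothetical, and the 3-dimensional vectors are made up to stand in for the 100-dimensional ones produced with `layerSize(100)`.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

public class NearestWords {

    /** Cosine similarity of two equal-length vectors (NaN if either is all zeros). */
    public static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Returns up to n words whose vectors are most similar to the vector of `term`. */
    public static List<String> nearest(Map<String, double[]> vectors, String term, int n) {
        double[] target = vectors.get(term);
        List<String> candidates = new ArrayList<>(vectors.keySet());
        candidates.remove(term); // a word is trivially nearest to itself
        // Sort by similarity to the target, highest first.
        candidates.sort(Comparator.comparingDouble(
                (String w) -> cosine(target, vectors.get(w))).reversed());
        return new ArrayList<>(candidates.subList(0, Math.min(n, candidates.size())));
    }

    public static void main(String[] args) {
        // Made-up toy vectors, for illustration only.
        Map<String, double[]> vectors = Map.of(
                "java", new double[] {1.0, 0.1, 0.0},
                "kotlin", new double[] {0.9, 0.2, 0.0},
                "startup", new double[] {0.0, 0.1, 1.0});
        System.out.println(nearest(vectors, "java", 1)); // prints [kotlin]
    }
}
```

One possible way to compare parameter sets is to fix the seed and check whether the nearest words for a handful of well-understood terms look sensible after each change.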