ResearchPapersAnalyticsDemo

A demonstration of what can be done with a corpus of research papers.

The research papers were collected by crawling Google Scholar, using the names of SIIT instructors obtained from the SIIT website.

Files explanation

siit_instructor_crawler.py: Gets all SIIT instructor names from the SIIT website and stores them in a CSV file.
gscholar_crawler.py: Crawls research papers (as PDFs) from Google Scholar using the SIIT staff info.
Summary.ipynb: A short version of everything.

Paper Content Preprocessing.ipynb: Converts the collected PDFs into a single .csv file.
PdfPaper Analytics (Bag of Words).ipynb: Uses a bag-of-words model to find similarity between research papers.
Paper Analytics with tf-idf.ipynb: Uses TF-IDF to find similarity between research papers.
Apply word2vec to papers.ipynb: Word2Vec implementation.
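
To illustrate the idea behind the bag-of-words and TF-IDF notebooks, here is a minimal sketch of document similarity using only the Python standard library. It is not the code from the notebooks, which may well rely on a library instead. The function names (`tokenize`, `bow_vectors`, `tfidf_vectors`, `cosine`), the whitespace tokenizer, the unsmoothed `log(N / df)` weighting, and the three sample strings are all assumptions made for this example.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def bow_vectors(docs):
    """Bag of words: one {term: raw count} mapping per document."""
    return [Counter(tokenize(doc)) for doc in docs]


def tfidf_vectors(docs):
    """TF-IDF: one {term: tf * idf} mapping per document.

    tf is the term's relative frequency in the document and
    idf is log(N / df), with no smoothing (an assumption of this sketch).
    """
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Invented sample texts standing in for the "Content" of three papers.
papers = [
    "neural networks for image classification",
    "deep neural networks for speech recognition",
    "supply chain optimization with linear programming",
]

bow = bow_vectors(papers)
tfidf = tfidf_vectors(papers)

bow_sim_01 = cosine(bow[0], bow[1])
tfidf_sim_01 = cosine(tfidf[0], tfidf[1])
tfidf_sim_02 = cosine(tfidf[0], tfidf[2])
```

In this toy input the first two texts share terms and the third shares none with the first, so `tfidf_sim_01` is positive while `tfidf_sim_02` is zero. TF-IDF also gives less weight than raw counts to terms that appear in many documents.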

Data

Our data has 784 papers and 5 columns.
Each row in the .csv file represents one research paper.

Column Information

Name: Author's name
Content: Text content of the paper, extracted from the PDF file
Directory: Directory of the paper's PDF file
Faculty: Faculty that the professor belongs to
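
As a rough sketch of how a file with these columns could be read, the snippet below parses CSV text with the standard `csv` module and counts papers per faculty and per author. The rows are invented placeholders held in memory, not real data from this project, and only the four columns described above are used.

```python
import csv
import io
from collections import Counter

# Placeholder rows standing in for the real .csv file; all values are invented.
sample_csv = io.StringIO(
    "Name,Content,Directory,Faculty\n"
    "Author A,text of paper one,papers/a1.pdf,Faculty X\n"
    "Author A,text of paper two,papers/a2.pdf,Faculty X\n"
    "Author B,text of paper three,papers/b1.pdf,Faculty Y\n"
)

# DictReader maps each row to {column name: value} using the header line.
rows = list(csv.DictReader(sample_csv))

# Each row is one paper, so counting rows per key counts papers.
papers_per_faculty = Counter(row["Faculty"] for row in rows)
papers_per_author = Counter(row["Name"] for row in rows)
```

To read an actual file instead, the `io.StringIO` object can be swapped for `open(path, newline="", encoding="utf-8")`, where `path` is wherever the preprocessing notebook wrote the .csv file.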

Result

Summary: [Link]

TF-IDF wordcloud: [Link]

teerapat-ch/ResearchPapersAnalyticsDemo
