
Generates a headline for a given text using a recurrent neural network


vatsalgit/DeepNews

 
 


DeepNews

Automatic Headline Generation from News Articles in Hindi Language

DeepNews is a high-level headline generation tool written in Python on top of Keras, which can run with either a TensorFlow or a Theano backend. It was developed for media organizations and writers who need to quickly come up with a headline that is short and informative.


Getting started

Installing

DeepNews is written in Python on top of Keras, with TensorFlow or Theano as the backend.

Installing Python:

Installing Keras

  • sudo pip install keras
  • On Windows-based systems, follow the steps described on Stack Overflow

Installing TensorFlow

Amazon AWS (all libraries are preinstalled in the AMI)

Neural networks are computationally heavy, so a GPU configuration is recommended.
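After installation, it can help to confirm that the libraries named above are importable. The following is a minimal sketch, not part of DeepNews: the function name `check_packages` and the `REQUIRED` tuple are illustrative assumptions, and only one of TensorFlow or Theano is needed as the Keras backend.

```python
import importlib.util

# Packages named in this README; this tuple is an assumption for
# illustration. Only one backend (tensorflow or theano) is required.
REQUIRED = ("keras", "tensorflow", "theano")


def check_packages(names=REQUIRED):
    """Return a dict mapping each package name to True if it can be imported."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_packages().items():
        print(f"{name}: {'installed' if found else 'missing'}")
```

A package reported as missing can then be installed with pip before running the model.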


Dataset

Word2Vec (Hindi Language)

Word2Vec Link Image

Neural Network Model

Input Model

[Figure: input NN model]

Dataset Statistics

Length of Article histogram

[Figure: histogram of article lengths]

Length of Headline histogram

[Figure: histogram of headline lengths]
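Length histograms like these can be computed with a few lines of Python. The sketch below is only an illustration: it assumes whitespace tokenization and a hypothetical `bin_width` parameter, neither of which is confirmed as the method used to produce the original plots.

```python
from collections import Counter


def length_histogram(texts, bin_width=50):
    """Count texts per length bin.

    Length is the number of whitespace-separated tokens (an assumption).
    Keys of the result are the lower edge of each bin, in ascending order.
    """
    counts = Counter((len(t.split()) // bin_width) * bin_width for t in texts)
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    sample = ["a b c", "a b", "a b c d e f"]
    # With bin_width=5, lengths 3 and 2 fall in bin 0 and length 6 in bin 5.
    print(length_histogram(sample, bin_width=5))
```

A narrow bin width suits headlines, which are only a few tokens long, while a wider one suits articles.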

FIRE Dataset stats

| Feature | Value |
| --- | --- |
| No. of articles | 297,965 |
| No. of tokens | 85,940,081 (85.94M) |
| No. of unique tokens in articles | 388,449 |
| No. of unique tokens in headlines | 58,448 |
| Avg. length of article | 272 |
| Avg. length of headline | 7 |
| Size of dataset | 1.06 GB |
| Avg. of ratio len(article)/len(headline) | 43 (about 1 headline word per 43 article words) |

Crawled Dataset stats

| Feature | Value |
| --- | --- |
| No. of articles | 595,847 |
| No. of tokens | 209,232,922 (209M) |
| No. of unique tokens in articles | 1,026,083 |
| No. of unique tokens in headlines | 124,965 |
| Avg. length of article | 316 |
| Avg. length of headline | 11 |
| Size of dataset | 3.70 GB |
| Avg. of ratio len(article)/len(headline) | 34 (about 1 headline word per 34 article words) |
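The features in the two tables above can be computed from a list of (article, headline) pairs. The sketch below is a hedged illustration, not the script used to produce these numbers: the function name `dataset_stats`, whitespace tokenization, and counting headline tokens toward the token total are all assumptions. It also shows why the last row is reported separately: the average of per-article ratios is generally different from the ratio of the two average lengths.

```python
def dataset_stats(pairs):
    """Compute corpus statistics from (article, headline) string pairs.

    Assumptions: tokens are whitespace-separated, the token total counts
    both articles and headlines, and pairs with an empty article or
    headline are skipped.
    """
    article_vocab, headline_vocab = set(), set()
    n = total_tokens = article_len = headline_len = 0
    ratio_sum = 0.0
    for article, headline in pairs:
        a, h = article.split(), headline.split()
        if not a or not h:
            continue  # skip unusable pairs to avoid division by zero
        n += 1
        total_tokens += len(a) + len(h)
        article_len += len(a)
        headline_len += len(h)
        ratio_sum += len(a) / len(h)
        article_vocab.update(a)
        headline_vocab.update(h)
    if n == 0:
        raise ValueError("no usable (article, headline) pairs")
    return {
        "articles": n,
        "tokens": total_tokens,
        "unique_article_tokens": len(article_vocab),
        "unique_headline_tokens": len(headline_vocab),
        "avg_article_len": article_len / n,
        "avg_headline_len": headline_len / n,
        "avg_ratio": ratio_sum / n,
    }


if __name__ == "__main__":
    pairs = [("a b c d", "a b"), ("a b c d e f", "x")]
    for feature, value in dataset_stats(pairs).items():
        print(feature, value)
```

In the toy example, the average ratio is 4.0 while the ratio of the average lengths is 5.0 / 1.5, about 3.33, which mirrors how the tables can report a ratio of 43 alongside average lengths of 272 and 7.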

Number of Crawled Articles per source

| News Website | Number of Articles | URL |
| --- | --- | --- |
| Aaj Tak | 92765 | http://www.aajtak.intoday.in |
| ABP News | 13654 | http://www.abpnews.abplive.in |
| Amar Ujala | 181 | http://www.amarujala.com |
| BBC Hindi | 28861 | http://bbc.com/hindi |
| Deshbandhu | 3174 | http://deshbandhu.co.in |
| Economic Times | 993 | http://hindi.economictimes.indiatimes.com |
| Jagran | 73290 | http://www.jagran.com |
| Navbharat Times | 10329 | http://www.navbharattimes.indiatimes.com |
| NDTV | 92942 | http://www.khabar.ndtv.com/news/ |
| News18 | 38833 | http://www.news18.com |
| Patrika | 68288 | http://www.patrika.com |
| Punjab Kesari | 15494 | http://www.punjabkesari.in |
| Rajasthan Patrika | 89038 | http://www.rajasthanpatrika.patrika.com |
| Zee News | 10463 | http://www.zeenews.india.com/hindi |