In [1]:

import nltk
import re
from bs4 import BeautifulSoup
import urllib.request 

In [2]:
site = 'https://en.wikipedia.org/wiki/Healthcare_in_India'
html = urllib.request.urlopen(site)

In [3]:
soup = BeautifulSoup(html,'html.parser')

In [4]:
text = ''
for para in soup.find_all('p'):
    text += para.text

In [5]:
text

'\nThe Indian Constitution makes the provision of healthcare in India the responsibility of the state governments, rather than the central federal government. It makes every state responsible for "raising the level of nutrition and the standard of living of its people and the improvement of public health as among its primary duties".[1][2]\nThe National Health Policy was endorsed by the Parliament of India in 1983 and updated in 2002,  and then again updated in 2017. The recent four main updates in 2017 mentions the need to focus on the growing burden of non-communicable diseases, on the emergence of the robust healthcare industry, on growing incidences of unsustainable expenditure due to health care costs and on rising economic growth enabling enhanced fiscal capacity.[3]\nIn practice however, the private healthcare sector is responsible for the majority of healthcare in India, and most healthcare expenses are paid directly out of pocket by patients and their families, rather than thr

In [6]:
#preprocessing text
text = re.sub(r'[[0-9]*]',' ',text)
text = re.sub(r'\s+',' ',text)
clean_text = text.lower()
clean_text = re.sub(r'\W+',' ',clean_text)
clean_text = re.sub(r'\d+',' ',clean_text)
clean_text = re.sub(r'\s+',' ',clean_text)
clean_text

  


' the indian constitution makes the provision of healthcare in india the responsibility of the state governments rather than the central federal government it makes every state responsible for raising the level of nutrition and the standard of living of its people and the improvement of public health as among its primary duties the national health policy was endorsed by the parliament of india in and updated in and then again updated in the recent four main updates in mentions the need to focus on the growing burden of non communicable diseases on the emergence of the robust healthcare industry on growing incidences of unsustainable expenditure due to health care costs and on rising economic growth enabling enhanced fiscal capacity in practice however the private healthcare sector is responsible for the majority of healthcare in india and most healthcare expenses are paid directly out of pocket by patients and their families rather than through health insurance government health policy

In [7]:
#tokensize sentences
sent = nltk.sent_tokenize(text)
sent

[' The Indian Constitution makes the provision of healthcare in India the responsibility of the state governments, rather than the central federal government.',
 'It makes every state responsible for "raising the level of nutrition and the standard of living of its people and the improvement of public health as among its primary duties".',
 'The National Health Policy was endorsed by the Parliament of India in 1983 and updated in 2002, and then again updated in 2017.',
 'The recent four main updates in 2017 mentions the need to focus on the growing burden of non-communicable diseases, on the emergence of the robust healthcare industry, on growing incidences of unsustainable expenditure due to health care costs and on rising economic growth enabling enhanced fiscal capacity.',
 'In practice however, the private healthcare sector is responsible for the majority of healthcare in India, and most healthcare expenses are paid directly out of pocket by patients and their families, rather than

In [8]:
len(sent)

259

In [9]:
# for stopwords
stop_word = nltk.corpus.stopwords.words('english')

In [10]:
#making a histogram
word_count = {}
for word in nltk.word_tokenize(clean_text):
    if word not in stop_word:
        if word not in word_count.keys():
            word_count[word] = 1
        else:
            word_count[word] += 1

In [11]:
#making weighted histogram
for key in word_count.keys():
    word_count[key] = word_count[key]/max(word_count.values())

In [12]:
word_count

{'indian': 0.08928571428571429,
 'constitution': 0.008928571428571428,
 'makes': 0.017857142857142856,
 'provision': 0.03571428571428571,
 'healthcare': 1.0,
 'india': 0.6145833333333334,
 'responsibility': 0.052083333333333336,
 'state': 0.1875,
 'governments': 0.09375,
 'rather': 0.03125,
 'central': 0.020833333333333332,
 'federal': 0.020833333333333332,
 'government': 0.2708333333333333,
 'every': 0.020833333333333332,
 'responsible': 0.03125,
 'raising': 0.010416666666666666,
 'level': 0.13541666666666666,
 'nutrition': 0.03125,
 'standard': 0.041666666666666664,
 'living': 0.052083333333333336,
 'people': 0.15625,
 'improvement': 0.020833333333333332,
 'public': 0.3541666666666667,
 'health': 1.0,
 'among': 0.0851063829787234,
 'primary': 0.1276595744680851,
 'duties': 0.02127659574468085,
 'national': 0.425531914893617,
 'policy': 0.10638297872340426,
 'endorsed': 0.02127659574468085,
 'parliament': 0.0425531914893617,
 'updated': 0.0425531914893617,
 'recent': 0.085106382978723

In [15]:
#sentences with their score
#sentences = nltk.sent_tokenize(text)


sent_score = {}
for sentence in sent:
    for word in nltk.word_tokenize(sentence.lower()):
        if word in word_count.keys():
            if len(sentence.split(' ')) < 30:
                if sentence not in sent_score.keys():
                    sent_score[sentence] = word_count[word]
                else:
                    sent_score[sentence] += word_count[word]


In [16]:
sent_score

{' The Indian Constitution makes the provision of healthcare in India the responsibility of the state governments, rather than the central federal government.': 2.443452380952382,
 'The National Health Policy was endorsed by the Parliament of India in 1983 and updated in 2002, and then again updated in 2017.': 2.2954343971631204,
 'Government health policy has thus far largely encouraged private-sector expansion in conjunction with well designed but limited public health programmes.': 3.199468085106383,
 'A government-funded health insurance project was launched in 2018 by the Government of India, called Ayushman Bharat.': 2.268395390070922,
 'According to the World Bank, the total expenditure on health care as a proportion of GDP in 2015 was 3.89%.': 2.702127659574468,
 'Public healthcare is free and subsidized for those who are below the poverty line.': 1.6307624113475179,
 'The Indian public health sector encompasses 18% of total outpatient care and 44% of total inpatient care.': 3.

In [20]:
# summary
import heapq
best_sentence = heapq.nlargest(25,sent_score,sent_score.get)

In [25]:
for sentences in best_sentence:
    print(sentences)

Other previous studies have also delved into the influence of gender in terms of access to healthcare in rural areas, finding gender inequalities in access to healthcare.
Much of the public healthcare sector caters to the rural areas, and the poor quality arises from the reluctance of experienced healthcare providers to visit the rural areas.
In terms of other healthcare providers, the study found that of the qualified paramedical staff present in Madhya Pradesh, 71% performed work in the rural areas of the region.
A 2016 study by Wameq Raza et al., published in BMC Health Services Research, specifically surveyed healthcare-seeking behaviors among people in rural Bihar and Uttar Pradesh, India.
The initiative recognizes that urban healthcare is lacking due to overpopulation, exclusion of populations, lack of information on health and economic ability, and unorganized health services.
In terms of adherence to treatment, two sub-factors were investigated, persistence of treatment and tre

In [24]:
summary

['Other previous studies have also delved into the influence of gender in terms of access to healthcare in rural areas, finding gender inequalities in access to healthcare.',
 'Much of the public healthcare sector caters to the rural areas, and the poor quality arises from the reluctance of experienced healthcare providers to visit the rural areas.',
 'In terms of other healthcare providers, the study found that of the qualified paramedical staff present in Madhya Pradesh, 71% performed work in the rural areas of the region.',
 'A 2016 study by Wameq Raza et al., published in BMC Health Services Research, specifically surveyed healthcare-seeking behaviors among people in rural Bihar and Uttar Pradesh, India.',
 'The initiative recognizes that urban healthcare is lacking due to overpopulation, exclusion of populations, lack of information on health and economic ability, and unorganized health services.',
 'In terms of adherence to treatment, two sub-factors were investigated, persistenc