<font size="5">__NLP Final Project__</font><br><br>
<font size="3">__Topic Modeling: Finding Related Articles__</font><br><br>
Team Member:
- David Kurniadi
- Rene Lizarra
- Xin Gu

In [46]:
import numpy as np 
import pandas as pd
import os
import json
import glob
import sys
import scispacy
import spacy
import joblib
from sklearn.feature_extraction import text
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from scipy.spatial.distance import jensenshannon
import pyLDAvis
import pyLDAvis.sklearn
from IPython.display import HTML, display
from ipywidgets import interact, Layout, HBox, VBox, Box
import ipywidgets as widgets
from IPython.display import clear_output
from tqdm import tqdm

import nltk
from nltk.tokenize import sent_tokenize
from nltk.corpus import stopwords
from sklearn.metrics.pairwise import cosine_similarity

In [47]:
import sys
import warnings

if not sys.warnoptions:
    warnings.simplefilter("ignore")

In [48]:
pyLDAvis.enable_notebook()

In [49]:
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
pd.set_option('display.max_colwidth', 100)

## Load and Prepare Data

### Kaggle_Covid19_All_Sources.csv

In [50]:
sys.path.insert(0, "../")
root_path = './CORD-19-research-challenge/2020-03-13/'

In [51]:
corona_features = {"doc_id": [None], "source": [None], "title": [None],
                  "abstract": [None], "text_body": [None]}
corona_df = pd.DataFrame.from_dict(corona_features)

In [52]:
corona_df.shape

(1, 5)

### Get all JSON files

In [53]:
json_filenames = glob.glob(f'{root_path}/**/*.json', recursive=True)

In [54]:
json_filenames

['./CORD-19-research-challenge/2020-03-13\\biorxiv_medrxiv\\0015023cc06b5362d332b3baf348d11567ca2fbb.json',
 './CORD-19-research-challenge/2020-03-13\\biorxiv_medrxiv\\004f0f8bb66cf446678dc13cf2701feec4f36d76.json',
 './CORD-19-research-challenge/2020-03-13\\biorxiv_medrxiv\\00d16927588fb04d4be0e6b269fc02f0d3c2aa7b.json',
 './CORD-19-research-challenge/2020-03-13\\biorxiv_medrxiv\\013d9d1cba8a54d5d3718c229b812d7cf91b6c89.json',
 './CORD-19-research-challenge/2020-03-13\\biorxiv_medrxiv\\01d162d7fae6aaba8e6e60e563ef4c2fca7b0e18.json',
 './CORD-19-research-challenge/2020-03-13\\biorxiv_medrxiv\\01e3b313e78a352593be2ff64927192af66619b5.json',
 './CORD-19-research-challenge/2020-03-13\\biorxiv_medrxiv\\02201e4601ab0eb70b6c26480cf2bfeae2625193.json',
 './CORD-19-research-challenge/2020-03-13\\biorxiv_medrxiv\\0255ea4b2f26a51a3bfa3bd8f3e1978c82c976d5.json',
 './CORD-19-research-challenge/2020-03-13\\biorxiv_medrxiv\\029c1c588047f1d612a219ee15494d2d19ff7439.json',
 './CORD-19-research-challen

### Iterate over the files and populate the data frame. 

In [55]:
################################################################################################
#
#    Title: Create Corona.csv File
#    Author: Frank Mitchell
#    Date: 2020
#    Code version: 1
#    Availability: https://www.kaggle.com/fmitchell259/create-corona-csv-file
# 
################################################################################################

def return_corona_df(json_filenames, df, source):

    for file_name in json_filenames:

        row = {"doc_id": None, "source": None, "title": None,
              "abstract": None, "text_body": None}

        with open(file_name) as json_data:
            data = json.load(json_data)

            row['doc_id'] = data['paper_id']
            row['title'] = data['metadata']['title']

            abstract_list = [data['abstract'][x]['text'] for x in range(len(data['abstract']) - 1)]
            abstract = "\n ".join(abstract_list)

            row['abstract'] = abstract
            
            body_list = []
            for _ in range(len(data['body_text'])):
                try:
                    body_list.append(data['body_text'][_]['text'])
                except:
                    pass

            body = "\n ".join(body_list)
            
            row['text_body'] = body
            
            if source == 'b':
                row['source'] = "biorxiv_medrxiv"
            elif source == "c":
                row['source'] = "common_use_sub"
            elif source == "n":
                row['source'] = "non_common_use"
            elif source == "p":
                row['source'] = "pmc_custom_license"
            
            df = df.append(row, ignore_index=True)
    
    return df

In [56]:
corona_df = return_corona_df(json_filenames, corona_df, 'b')

In [57]:
corona_df.shape

(13203, 5)

In [58]:
corona_out = corona_df.to_csv('kaggle_covid19.csv')

In [59]:
corona_df.head()

Unnamed: 0,doc_id,source,title,abstract,text_body
0,,,,,
1,0015023cc06b5362d332b3baf348d11567ca2fbb,biorxiv_medrxiv,The RNA pseudoknots in foot-and-mouth disease virus are dispensable for genome replication but e...,word count: 194 22 Text word count: 5168 23 24 25 author/funder. All rights reserved. No reuse a...,"VP3, and VP0 (which is further processed to VP2 and VP4 during virus assembly) (6). The P2 64 an..."
2,004f0f8bb66cf446678dc13cf2701feec4f36d76,biorxiv_medrxiv,Healthcare-resource-adjusted vulnerabilities towards the 2019-nCoV epidemic across China,,"The 2019-nCoV epidemic has spread across China and 24 other countries 1-3 as of February 8, 2020..."
3,00d16927588fb04d4be0e6b269fc02f0d3c2aa7b,biorxiv_medrxiv,"Real-time, MinION-based, amplicon sequencing for lineage typing of infectious bronchitis virus f...",Infectious bronchitis (IB) causes significant economic losses in the global poultry industry. Co...,"Infectious bronchitis (IB), which is caused by infectious bronchitis virus (IBV), is one of the ..."
4,013d9d1cba8a54d5d3718c229b812d7cf91b6c89,biorxiv_medrxiv,"Assessing spread risk of Wuhan novel coronavirus within and beyond China, January-April 2020: a ...",,"In December 2019, a cluster of patients with pneumonia of unknown cause were reported in the cit..."


### Load All Sources Metadata

In [60]:
sources = pd.read_csv('./CORD-19-research-challenge/2020-03-13/all_sources_metadata_2020-03-13.csv')

sources.drop_duplicates(subset=['sha'], inplace=True)

def doi_url(d):
    if d.startswith('http://'):
        return d
    elif d.startswith('doi.org'):
        return f'http://{d}'
    else:
        return f'http://doi.org/{d}'

sources.doi = sources.doi.fillna('').apply(doi_url)

papers = pd.read_csv('kaggle_covid19.csv')
papers = papers.iloc[1:, 1:].reset_index(drop=True)

### Merge All Data Frames

In [61]:
cols_to_use = sources.columns.difference(papers.columns)
all_data = pd.merge(papers, sources[cols_to_use], left_on='doc_id', right_on='sha', how='left')

In [62]:
all_data.to_csv('Kaggle_Covid19_All_Sources.csv')

### Upload from CSV into Data Frame for building LDA model

In [63]:
all_data = pd.read_csv('Kaggle_Covid19_All_Sources.csv')

### Data Preparations

In [64]:
all_data['publish_year'] = all_data.publish_time.str[:4].fillna(-1).astype(int) # 360 times None

all_data.title = all_data.title.astype(str) # change to string, there are also some numeric values

In [65]:
all_data.head(2)

Unnamed: 0.1,Unnamed: 0,doc_id,source,title,abstract,text_body,Microsoft Academic Paper ID,WHO #Covidence,authors,doi,has_full_text,journal,license,pmcid,publish_time,pubmed_id,sha,source_x,publish_year
0,0,0015023cc06b5362d332b3baf348d11567ca2fbb,biorxiv_medrxiv,The RNA pseudoknots in foot-and-mouth disease virus are dispensable for genome replication but e...,word count: 194 22 Text word count: 5168 23 24 25 author/funder. All rights reserved. No reuse a...,"VP3, and VP0 (which is further processed to VP2 and VP4 during virus assembly) (6). The P2 64 an...",,,"Ward, J. C. J.; Lasecka-Dykes, L.; Neil, C.; Adeyemi, O.; Gold, S.; McLean, N.; Wright, C.; Hero...",http://doi.org/10.1101/2020.01.10.901801,True,,See https://www.biorxiv.org/about-biorxiv,,2020-01-11,,0015023cc06b5362d332b3baf348d11567ca2fbb,biorxiv,2020
1,1,004f0f8bb66cf446678dc13cf2701feec4f36d76,biorxiv_medrxiv,Healthcare-resource-adjusted vulnerabilities towards the 2019-nCoV epidemic across China,,"The 2019-nCoV epidemic has spread across China and 24 other countries 1-3 as of February 8, 2020...",,,,http://doi.org/10.1101/2020.02.11.20022111,True,,See https://www.medrxiv.org/submit-a-manuscript,,,,004f0f8bb66cf446678dc13cf2701feec4f36d76,medrxiv,-1


### We consider the text body, but the approach could also be applied to the abstracts only.

In [66]:
all_texts = all_data.text_body

In [67]:
all_texts[0][:500]

'VP3, and VP0 (which is further processed to VP2 and VP4 during virus assembly) (6). The P2 64 and P3 regions encode the non-structural proteins 2B and 2C and 3A, 3B (1-3) (VPg), 3C pro and 4 structural protein-coding region is replaced by reporter genes, allow the study of genome 68 replication without the requirement for high containment (9, 10) ( figure 1A ).\n The FMDV 5′ UTR is the largest known picornavirus UTR, comprising approximately 1300 71 nucleotides and containing several highly struc'

## Latend Dirichlet Allocation

### Medium Model

In [68]:
import en_core_sci_md
nlp = en_core_sci_md.load(disable=["tagger", "parser", "ner"])
nlp.max_length = 2000000

In [69]:
#########################################################################################################################
#
#    Title: Topic Modeling: Finding Related Articles
#    Author: Daniel Wolffram
#    Date: 2020
#    Code version: 10
#    Availability: https://www.kaggle.com/danielwolffram/topic-modeling-finding-related-articles?scriptVersionId=30463507
# 
#########################################################################################################################
def spacy_tokenizer(sentence):
    return [word.lemma_ for word in nlp(sentence) if not (word.like_num or word.is_stop or word.is_punct or word.is_space)] # remove numbers (e.g. from references [1], etc.)

def print_top_words(model, feature_names, n_top_words):
    for topic_idx, topic in enumerate(model.components_):
        message = "\nTopic #%d: " % topic_idx
        message += " ".join([feature_names[i]
                             for i in topic.argsort()[:-n_top_words - 1:-1]])
        print(message)
    print()

### New stop words list 

In [70]:
#########################################################################################################################
#
#    Title: Topic Modeling: Finding Related Articles
#    Author: Daniel Wolffram
#    Date: 2020
#    Code version: 10
#    Availability: https://www.kaggle.com/danielwolffram/topic-modeling-finding-related-articles?scriptVersionId=30463507
# 
#########################################################################################################################
customize_stop_words = [
    'doi', 'preprint', 'copyright', 'peer', 'reviewed', 'org', 'https', 'et', 'al', 'author', 'figure', 
    'rights', 'reserved', 'permission', 'used', 'using', 'biorxiv', 'fig', 'fig.', 'al.',
    'di', 'la', 'il', 'del', 'le', 'della', 'dei', 'delle', 'una', 'da',  'dell',  'non', 'si'
]

for w in customize_stop_words:
    nlp.vocab[w].is_stop = True

In [71]:
tf_vectorizer = CountVectorizer(tokenizer = spacy_tokenizer)

tf = tf_vectorizer.fit_transform(tqdm(all_texts))

tf.shape

100%|████████████████████████████████████████████████████████████████████████████| 13202/13202 [05:53<00:00, 37.34it/s]


(13202, 641818)

In [72]:
joblib.dump(tf_vectorizer, 'tf_vectorizer.csv')
joblib.dump(tf, 'tf.csv')

['tf.csv']

In [73]:
lda_tf = LatentDirichletAllocation(n_components=50, random_state=0, n_jobs=-1)
lda_tf.fit(tf)

LatentDirichletAllocation(batch_size=128, doc_topic_prior=None,
                          evaluate_every=-1, learning_decay=0.7,
                          learning_method='batch', learning_offset=10.0,
                          max_doc_update_iter=100, max_iter=10,
                          mean_change_tol=0.001, n_components=50, n_jobs=-1,
                          perp_tol=0.1, random_state=0, topic_word_prior=None,
                          total_samples=1000000.0, verbose=0)

In [74]:
joblib.dump(lda_tf, 'lda.csv')

['lda.csv']

In [75]:
joblib.dump(lda_tf, 'lda.pkl')

['lda.pkl']

In [76]:
lda_tf = joblib.load('lda.pkl')

## Discovered Topics

In [77]:
tfidf_feature_names = tf_vectorizer.get_feature_names()
print_top_words(lda_tf, tfidf_feature_names, 25)


Topic #0: disease increase cell study effect lung cancer cause include patient tissue lesion show treatment clinical report associate find inflammation bacterium inflammatory image damage activity reduce

Topic #1: q prp cell ifit1 show datum sc resistance increase bind observe mm model line week numb antibiotic prion concentration ifit3 result complex time growth license

Topic #2: cell virus viral hiv-1 infection entry hiv fusion ifitm3 virion protein particle receptor target membrane human env antiviral t replication host inhibit glycoprotein infect envelope

Topic #3: ace2 ang ii activity ace heart increase diabetes level mouse kidney cardiac facemask effect ras receptor study renal ang-(1 plasma cardiovascular hypertension expression angiotensin inhibitor

Topic #4: cat sample fcov fip t. study c type result fipv ° show p. antibody control gondii high temperature test table min extract feline fee time

Topic #5: dog cat = study p group disease concentration clinical test sample h

Topic #48: disease human health pathogen animal infectious country population global risk emerge new control research include increase cause infection area world example development approach africa change

Topic #49: cell expression gene protein infection response level prrsv express signal show study ifn analysis pathway result immune control type virus antibody induce h c datum



### Visualization

In [78]:
#########################################################################################################################
#
#    Title: Topic Modeling: Finding Related Articles
#    Author: Daniel Wolffram
#    Date: 2020
#    Code version: 10
#    Availability: https://www.kaggle.com/danielwolffram/topic-modeling-finding-related-articles?scriptVersionId=30463507
# 
#########################################################################################################################
viz = pyLDAvis.sklearn.prepare(lda_tf, tf, tf_vectorizer)
pyLDAvis.display(viz)
pyLDAvis.save_html(viz, 'lda.html')

In [79]:
topic_dist = pd.DataFrame(lda_tf.transform(tf))

In [80]:
topic_dist.to_csv('topic_dist.csv', index=False)

In [81]:
topic_dist.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49
0,2.5e-05,2.5e-05,2.5e-05,2.5e-05,2.5e-05,2.5e-05,2.5e-05,2.5e-05,2.5e-05,2.5e-05,2.5e-05,2.5e-05,2.5e-05,2.5e-05,0.259887,2.5e-05,2.5e-05,2.5e-05,2.5e-05,2.5e-05,2.5e-05,2.5e-05,2.5e-05,2.5e-05,2.5e-05,2.5e-05,2.5e-05,2.5e-05,2.5e-05,2.5e-05,2.5e-05,2.5e-05,2.5e-05,2.5e-05,2.5e-05,2.5e-05,2.5e-05,2.5e-05,0.44256,2.5e-05,2.5e-05,0.064784,0.231597,2.5e-05,2.5e-05,2.5e-05,2.5e-05,2.5e-05,2.5e-05,2.5e-05
1,5e-05,5e-05,5e-05,5e-05,5e-05,5e-05,5e-05,5e-05,5e-05,5e-05,5e-05,5e-05,5e-05,0.038465,5e-05,0.133305,5e-05,5e-05,5e-05,5e-05,5e-05,5e-05,0.012764,0.7847,5e-05,5e-05,5e-05,5e-05,5e-05,5e-05,5e-05,5e-05,5e-05,5e-05,5e-05,5e-05,5e-05,5e-05,5e-05,5e-05,5e-05,5e-05,0.028499,5e-05,5e-05,5e-05,5e-05,5e-05,5e-05,5e-05
2,1e-05,1e-05,1e-05,1e-05,1e-05,1e-05,1e-05,1e-05,0.581407,1e-05,1e-05,1e-05,1e-05,1e-05,1e-05,1e-05,1e-05,1e-05,1e-05,1e-05,1e-05,1e-05,1e-05,1e-05,1e-05,1e-05,1e-05,1e-05,1e-05,1e-05,1e-05,1e-05,1e-05,1e-05,1e-05,1e-05,0.003909,1e-05,1e-05,1e-05,1e-05,0.355169,0.059073,1e-05,1e-05,1e-05,1e-05,1e-05,1e-05,1e-05
3,8e-06,8e-06,8e-06,8e-06,8e-06,8e-06,8e-06,8e-06,8e-06,8e-06,8e-06,8e-06,8e-06,0.067244,8e-06,8e-06,8e-06,8e-06,8e-06,8e-06,8e-06,8e-06,8e-06,0.79732,8e-06,8e-06,8e-06,8e-06,8e-06,8e-06,8e-06,8e-06,8e-06,8e-06,8e-06,8e-06,8e-06,8e-06,8e-06,8e-06,8e-06,8e-06,0.135038,8e-06,8e-06,8e-06,8e-06,8e-06,8e-06,8e-06
4,0.005689,9e-06,0.020226,0.043477,9e-06,9e-06,9e-06,9e-06,9e-06,9e-06,9e-06,9e-06,9e-06,9e-06,9e-06,0.090508,9e-06,9e-06,0.317526,9e-06,0.135642,0.063361,0.029706,9e-06,0.032777,9e-06,9e-06,0.091177,9e-06,9e-06,9e-06,9e-06,0.007241,9e-06,9e-06,9e-06,9e-06,9e-06,9e-06,9e-06,9e-06,9e-06,0.162339,9e-06,9e-06,9e-06,9e-06,9e-06,9e-06,9e-06


In [82]:
topic_dist.shape

(13202, 50)

## Get "Nearest" Papers (in Topic Space)

In [83]:
#########################################################################################################################
#
#    Title: Topic Modeling: Finding Related Articles
#    Author: Daniel Wolffram
#    Date: 2020
#    Code version: 10
#    Availability: https://www.kaggle.com/danielwolffram/topic-modeling-finding-related-articles?scriptVersionId=30463507
# 
#########################################################################################################################
def get_k_nearest_docs(doc_dist, k=5, lower=1950, upper=2020, only_covid19=False):
    '''
    doc_dist: topic distribution (sums to 1) of one article
    
    Returns the index of the k nearest articles (as by Jensen–Shannon divergence in topic space). 
    '''
    
    relevant_time = all_data.publish_year.between(lower, upper)
    
    if only_covid19:
        is_covid19_article = all_data.text_body.str.contains('COVID-19|SARS-CoV-2|2019-nCov')
        topic_dist_temp = topic_dist[relevant_time & is_covid19_article]
        
    else:
        topic_dist_temp = topic_dist[relevant_time]
         
    distances = topic_dist_temp.apply(lambda x: jensenshannon(x, doc_dist), axis=1)
    k_nearest = distances[distances != 0].nsmallest(n=k).index
        
    return k_nearest

## Search related papers to a chosen one

In [84]:
#########################################################################################################################
#
#    Title: Topic Modeling: Finding Related Articles
#    Author: Daniel Wolffram
#    Date: 2020
#    Code version: 10
#    Availability: https://www.kaggle.com/danielwolffram/topic-modeling-finding-related-articles?scriptVersionId=30463507
# 
#########################################################################################################################
def recommendation(doc_id, k=5, lower=1950, upper=2020, only_covid19=False):
    '''
    Returns the title of the k papers that are closest (topic-wise) to the paper given by paper_id.
    '''
    
    print(all_data.title[all_data.doc_id == doc_id].values[0])

    recommended = get_k_nearest_docs(topic_dist[all_data.doc_id == doc_id].iloc[0], k, lower, upper, only_covid19)
    recommended = all_data.iloc[recommended]
    
    h = '<br/>'.join(['<a href="' + l + '" target="_blank">'+ n + '</a>' for l, n in recommended[['doi','title']].values])
    display(HTML(h))

In [85]:
recommendation('a137eb51461b4a4ed3980aa5b9cb2f2c1cf0292a', k=5, lower=2005, upper=2018, only_covid19=False)

The effect of inhibition of PP1 and TNFα signaling on pathogenesis of SARS coronavirus


In [86]:
recommendation('a137eb51461b4a4ed3980aa5b9cb2f2c1cf0292a', k=5, lower=1950, upper=2020, only_covid19=True)

The effect of inhibition of PP1 and TNFα signaling on pathogenesis of SARS coronavirus


In [87]:
recommendation('90b5ecf991032f3918ad43b252e17d1171b4ea63', k=5, only_covid19=True)

The role of absolute humidity on transmission rates of the COVID-19 outbreak


In [88]:
recommendation('c04c7fb330a409a00f67040dde0f83b3da88eacb', k=5, only_covid19=True)

Potential inhibitors for 2019-nCoV coronavirus M protease from clinically approved medicines


In [89]:
recommendation('36521caf90f471c9da1a4e84f8562440d73ead9a', k=10)

Estimation of the epidemic properties of the 2019 novel coronavirus: A mathematical modeling study


## Widget: Pick a COVID-19-Paper

In [90]:
#########################################################################################################################
#
#    Title: Topic Modeling: Finding Related Articles
#    Author: Daniel Wolffram
#    Date: 2020
#    Code version: 10
#    Availability: https://www.kaggle.com/danielwolffram/topic-modeling-finding-related-articles?scriptVersionId=30463507
# 
#########################################################################################################################
def related_papers():
    '''
    Creates a widget where you can select one of many papers about covid-19 and then displays related articles from the whole dataset.
    '''
    covid_papers = all_data[all_data.text_body.str.contains('COVID-19|SARS-CoV-2|2019-nCov')][['doc_id', 'title']] # are there more names?
    title_to_id = covid_papers.set_index('title')['doc_id'].to_dict()
    
    def main_function(bullet, k=5, year_range=[1950, 2020], only_covid19=False):
        recommendation(title_to_id[bullet], k, lower=year_range[0], upper=year_range[1], only_covid19=only_covid19)
    
    yearW = widgets.IntRangeSlider(min=1950, max=2020, value=[2010, 2020], description='Year Range', 
                                   continuous_update=False, layout=Layout(width='40%'))
    covidW = widgets.Checkbox(value=False,description='Only COVID-19-Papers',disabled=False, indent=False, layout=Layout(width='20%'))
    kWidget = widgets.IntSlider(value=10, description='k', max=50, min=1, layout=Layout(width='20%'))

    bulletW = widgets.Select(options=title_to_id.keys(), layout=Layout(width='90%', height='200px'), description='Title:')

    widget = widgets.interactive(main_function, bullet=bulletW, k=kWidget, year_range=yearW, only_covid19=covidW)

    controls = VBox([Box(children=[widget.children[:-1][1], widget.children[:-1][2], widget.children[:-1][3]], 
                         layout=Layout(justify_content='space-around')), widget.children[:-1][0]])
    output = widget.children[-1]
    display(VBox([controls, output]))

In [91]:
related_papers()

VBox(children=(VBox(children=(Box(children=(IntSlider(value=10, description='k', layout=Layout(width='20%'), m…

## Browse Tasks

We can now also map a task or bullet point into the topic space and find related articles that might help to solve the question at hand.

Note: Some of the bullet points are very short - results might not be reliable in this case.

(A similar approach, but with a different underlying model, can be found in CORD-19 Search articles with Doc2Vec)

In [92]:
#########################################################################################################################
#
#    Title: Topic Modeling: Finding Related Articles
#    Author: Daniel Wolffram
#    Date: 2020
#    Code version: 10
#    Availability: https://www.kaggle.com/danielwolffram/topic-modeling-finding-related-articles?scriptVersionId=30463507
# 
#########################################################################################################################
task1 = ["Range of incubation periods for the disease in humans (and how this varies across age and health status) and how long individuals are contagious, even after recovery.",
"Prevalence of asymptomatic shedding and transmission (e.g., particularly children).",
"Seasonality of transmission.",
"Physical science of the coronavirus (e.g., charge distribution, adhesion to hydrophilic/phobic surfaces, environmental survival to inform decontamination efforts for affected areas and provide information about viral shedding).",
"Persistence and stability on a multitude of substrates and sources (e.g., nasal discharge, sputum, urine, fecal matter, blood).",
"Persistence of virus on surfaces of different materials (e,g., copper, stainless steel, plastic).",
"Natural history of the virus and shedding of it from an infected person",
"Implementation of diagnostics and products to improve clinical processes",
"Disease models, including animal models for infection, disease and transmission",
"Tools and studies to monitor phenotypic change and potential adaptation of the virus",
"Immune response and immunity",
"Effectiveness of movement control strategies to prevent secondary transmission in health care and community settings",
 "Effectiveness of personal protective equipment (PPE) and its usefulness to reduce risk of transmission in health care and community settings",
"Role of the environment in transmission"]

task2 = ['Data on potential risks factors',
'Smoking, pre-existing pulmonary disease',
'Co-infections (determine whether co-existing respiratory/viral infections make the virus more transmissible or virulent) and other co-morbidities',
'Neonates and pregnant women',
'Socio-economic and behavioral factors to understand the economic impact of the virus and whether there were differences.',
'Transmission dynamics of the virus, including the basic reproductive number, incubation period, serial interval, modes of transmission and environmental factors', 
'Severity of disease, including risk of fatality among symptomatic hospitalized patients, and high-risk patient groups',
'Susceptibility of populations',
'Public health mitigation measures that could be effective for control']

task3 = ['Real-time tracking of whole genomes and a mechanism for coordinating the rapid dissemination of that information to inform the development of diagnostics and therapeutics and to track variations of the virus over time.',
'Access to geographic and temporal diverse sample sets to understand geographic distribution and genomic differences, and determine whether there is more than one strain in circulation. Multi-lateral agreements such as the Nagoya Protocol could be leveraged.',
'Evidence that livestock could be infected (e.g., field surveillance, genetic sequencing, receptor binding) and serve as a reservoir after the epidemic appears to be over.',
'Evidence of whether farmers are infected, and whether farmers could have played a role in the origin.',
'Surveillance of mixed wildlife- livestock farms for SARS-CoV-2 and other coronaviruses in Southeast Asia.',
'Experimental infections to test host range for this pathogen.',
'Animal host(s) and any evidence of continued spill-over to humans',
'Socioeconomic and behavioral risk factors for this spill-over',
'Sustainable risk reduction strategies']

task4 = ["Guidance on ways to scale up NPIs in a more coordinated way (e.g., establish funding, infrastructure and authorities to support real time, authoritative (qualified participants) collaboration with all states to gain consensus on consistent guidance and to mobilize resources to geographic areas where critical shortfalls are identified) to give us time to enhance our health care delivery system capacity to respond to an increase in cases.",
"Rapid design and execution of experiments to examine and compare NPIs currently being implemented. DHS Centers for Excellence could potentially be leveraged to conduct these experiments.",
"Rapid assessment of the likely efficacy of school closures, travel bans, bans on mass gatherings of various sizes, and other social distancing approaches.",
"Methods to control the spread in communities, barriers to compliance and how these vary among different populations..",
"Models of potential interventions to predict costs and benefits that take account of such factors as race, income, disability, age, geographic location, immigration status, housing status, employment status, and health insurance status.",
"Policy changes necessary to enable the compliance of individuals with limited resources and the underserved with NPIs.",
"Research on why people fail to comply with public health advice, even if they want to do so (e.g., social or financial costs may be too high).",
"Research on the economic impact of this or any pandemic. This would include identifying policy and programmatic alternatives that lessen/mitigate risks to critical government services, food distribution and supplies, access to critical household supplies, and access to health diagnoses, treatment, and needed care, regardless of ability to pay."]

task5 = ["Effectiveness of drugs being developed and tried to treat COVID-19 patients. Clinical and bench trials to investigate less common viral inhibitors against COVID-19 such as naproxen, clarithromycin, and minocyclinethat that may exert effects on viral replication.",
"Methods evaluating potential complication of Antibody-Dependent Enhancement (ADE) in vaccine recipients.",
"Exploration of use of best animal models and their predictive value for a human vaccine.",
"Capabilities to discover a therapeutic (not vaccine) for the disease, and clinical effectiveness studies to discover therapeutics, to include antiviral agents.",
"Alternative models to aid decision makers in determining how to prioritize and distribute scarce, newly proven therapeutics as production ramps up. This could include identifying approaches for expanding production capacity to ensure equitable and timely distribution to populations in need.",
"Efforts targeted at a universal coronavirus vaccine.",
"Efforts to develop animal models and standardize challenge studies",
"Efforts to develop prophylaxis clinical studies and prioritize in healthcare workers",
"Approaches to evaluate risk for enhanced disease after vaccination",
"Assays to evaluate vaccine immune response and process development for vaccines, alongside suitable animal models [in conjunction with therapeutics]"]

task6 = ["Efforts to articulate and translate existing ethical principles and standards to salient issues in COVID-2019", 
"Efforts to embed ethics across all thematic areas, engage with novel ethical issues that arise and coordinate to minimize duplication of oversight",
"Efforts to support sustained education, access, and capacity building in the area of ethics",
"Efforts to establish a team at WHO that will be integrated within multidisciplinary research and operational platforms and that will connect with existing and expanded global networks of social sciences.",
"Efforts to develop qualitative assessment frameworks to systematically collect information related to local barriers and enablers for the uptake and adherence to public health measures for prevention and control. This includes the rapid identification of the secondary impacts of these measures. (e.g. use of surgical masks, modification of health seeking behaviors for SRH, school closures)",
"Efforts to identify how the burden of responding to the outbreak and implementing public health measures affects the physical and psychological health of those providing care for Covid-19 patients and identify the immediate needs that must be addressed.",
"Efforts to identify the underlying drivers of fear, anxiety and stigma that fuel misinformation and rumor, particularly through social media."]

task7 = ["How widespread current exposure is to be able to make immediate policy recommendations on mitigation measures. Denominators for testing and a mechanism for rapidly sharing that information, including demographics, to the extent possible. Sampling methods to determine asymptomatic disease (e.g., use of serosurveys (such as convalescent samples) and early detection of disease (e.g., use of screening of neutralizing antibodies such as ELISAs).",
"Efforts to increase capacity on existing diagnostic platforms and tap into existing surveillance platforms.",
"Recruitment, support, and coordination of local expertise and capacity (public, private—commercial, and non-profit, including academic), including legal, ethical, communications, and operational issues.",
"National guidance and guidelines about best practices to states (e.g., how states might leverage universities and private laboratories for testing purposes, communications to public health officials and the public).",
"Development of a point-of-care test (like a rapid influenza test) and rapid bed-side tests, recognizing the tradeoffs between speed, accessibility, and accuracy.",
"Rapid design and execution of targeted surveillance experiments calling for all potential testers using PCR in a defined area to start testing and report to a specific entity. These experiments could aid in collecting longitudinal samples, which are critical to understanding the impact of ad hoc local interventions (which also need to be recorded).",
"Separation of assay development issues from instruments, and the role of the private sector to help quickly migrate assays onto those devices.",
"Efforts to track the evolution of the virus (i.e., genetic drift or mutations) and avoid locking into specific reagents and surveillance/detection schemes.",
"Latency issues and when there is sufficient viral load to detect the pathogen, and understanding of what is needed in terms of biological and environmental sampling.",
"Use of diagnostics such as host response markers (e.g., cytokines) to detect early disease or predict severe disease progression, which would be important to understanding best clinical practice and efficacy of therapeutic interventions.",
"Policies and protocols for screening and testing.",
"Policies to mitigate the effects on supplies associated with mass testing, including swabs and reagents.",
"Technology roadmap for diagnostics.",
"Barriers to developing and scaling up new diagnostic tests (e.g., market forces), how future coalition and accelerator models (e.g., Coalition for Epidemic Preparedness Innovations) could provide critical funding for diagnostics, and opportunities for a streamlined regulatory environment.",
"New platforms and technology (e.g., CRISPR) to improve response times and employ more holistic approaches to COVID-19 and future diseases.",
"Coupling genomics and diagnostic testing on a large scale.",
"Enhance capabilities for rapid sequencing and bioinformatics to target regions of the genome that will allow specificity for a particular variant.",
"Enhance capacity (people, technology, data) for sequencing with advanced analytics for unknown pathogens, and explore capabilities for distinguishing naturally-occurring pathogens from intentional.",
"One Health surveillance of humans and potential sources of future spillover or ongoing exposure for this organism and future pathogens, including both evolutionary hosts (e.g., bats) and transmission hosts (e.g., heavily trafficked and farmed wildlife and domestic food and companion species), inclusive of environmental, demographic, and occupational risk factors."]

task8 = ["Resources to support skilled nursing facilities and long term care facilities.",
"Mobilization of surge medical staff to address shortages in overwhelmed communities",
"Age-adjusted mortality data for Acute Respiratory Distress Syndrome (ARDS) with/without other organ failure – particularly for viral etiologies",
"Extracorporeal membrane oxygenation (ECMO) outcomes data of COVID-19 patients",
"Outcomes data for COVID-19 after mechanical ventilation adjusted for age.",
"Knowledge of the frequency, manifestations, and course of extrapulmonary manifestations of COVID-19, including, but not limited to, possible cardiomyopathy and cardiac arrest.",
"Application of regulatory standards (e.g., EUA, CLIA) and ability to adapt care to crisis standards of care level.",
"Approaches for encouraging and facilitating the production of elastomeric respirators, which can save thousands of N95 masks.",
"Best telemedicine practices, barriers and faciitators, and specific actions to remove/expand them within and across state boundaries.",
"Guidance on the simple things people can do at home to take care of sick people and manage disease.",
"Oral medications that might potentially work.",
"Use of AI in real-time health care delivery to evaluate interventions, risk factors, and outcomes in a way that could not be done manually.",
"Best practices and critical challenges and innovative solutions and technologies in hospital flow and organization, workforce protection, workforce allocation, community-based support resources, payment, and supply chain management to enhance capacity, efficiency, and outcomes.",
"Efforts to define the natural history of disease to inform clinical care, public health interventions, infection prevention control, transmission, and clinical trials",
"Efforts to develop a core clinical outcome set to maximize usability of data across a range of trials",
"Efforts to determine adjunctive and supportive interventions that can improve the clinical outcomes of infected patients (e.g. steroids, high flow oxygen)"]

task9 = ["Methods for coordinating data-gathering with standardized nomenclature.",
"Sharing response information among planners, providers, and others.",
"Understanding and mitigating barriers to information-sharing.",
"How to recruit, support, and coordinate local (non-Federal) expertise and capacity relevant to public health emergency response (public, private, commercial and non-profit, including academic).",
"Integration of federal/state/local public health surveillance systems.",
"Value of investments in baseline public health response infrastructure preparedness",
"Modes of communicating with target high-risk populations (elderly, health care workers).",
"Risk communication and guidelines that are easy to understand and follow (include targeting at risk populations’ families too).",
"Communication that indicates potential risk of disease to all population groups.",
"Misunderstanding around containment and mitigation.",
"Action plan to mitigate gaps and problems of inequity in the Nation’s public health capability, capacity, and funding to ensure all citizens in need are supported and can access information, surveillance, and treatment.",
"Measures to reach marginalized and disadvantaged populations.",
"Data systems and research priorities and agendas incorporate attention to the needs and circumstances of disadvantaged populations and underrepresented minorities.",
"Mitigating threats to incarcerated people from COVID-19, assuring access to information, prevention, diagnosis, and treatment.",
"Understanding coverage policies (barriers and opportunities) related to testing, treatment, and care"]

In [93]:
#########################################################################################################################
#
#    Title: Topic Modeling: Finding Related Articles
#    Author: Daniel Wolffram
#    Date: 2020
#    Code version: 10
#    Availability: https://www.kaggle.com/danielwolffram/topic-modeling-finding-related-articles?scriptVersionId=30463507
# 
#########################################################################################################################
def relevant_articles(tasks, k=3, lower=1950, upper=2020, only_covid19=False):
    tasks = [tasks] if type(tasks) is str else tasks 
    
    tasks_tf = tf_vectorizer.transform(tasks)
    tasks_topic_dist = pd.DataFrame(lda_tf.transform(tasks_tf))

    for index, bullet in enumerate(tasks):
        print(bullet)
        recommended = get_k_nearest_docs(tasks_topic_dist.iloc[index], k, lower, upper, only_covid19)
        recommended = all_data.iloc[recommended]

        h = '<br/>'.join(['<a href="' + l + '" target="_blank">'+ n + '</a>' for l, n in recommended[['doi','title']].values])
        display(HTML(h))

### What is known about transmission, incubation, and environmental stability?

In [94]:
relevant_articles(task1, 5, only_covid19=True)

Range of incubation periods for the disease in humans (and how this varies across age and health status) and how long individuals are contagious, even after recovery.


Prevalence of asymptomatic shedding and transmission (e.g., particularly children).


Seasonality of transmission.


Physical science of the coronavirus (e.g., charge distribution, adhesion to hydrophilic/phobic surfaces, environmental survival to inform decontamination efforts for affected areas and provide information about viral shedding).


Persistence and stability on a multitude of substrates and sources (e.g., nasal discharge, sputum, urine, fecal matter, blood).


Persistence of virus on surfaces of different materials (e,g., copper, stainless steel, plastic).


Natural history of the virus and shedding of it from an infected person


Implementation of diagnostics and products to improve clinical processes


Disease models, including animal models for infection, disease and transmission


Tools and studies to monitor phenotypic change and potential adaptation of the virus


Immune response and immunity


Effectiveness of movement control strategies to prevent secondary transmission in health care and community settings


Effectiveness of personal protective equipment (PPE) and its usefulness to reduce risk of transmission in health care and community settings


Role of the environment in transmission


### What do we know about COVID-19 risk factors?

In [95]:
relevant_articles(task2, 5, only_covid19=True)

Data on potential risks factors


Smoking, pre-existing pulmonary disease


Co-infections (determine whether co-existing respiratory/viral infections make the virus more transmissible or virulent) and other co-morbidities


Neonates and pregnant women


Socio-economic and behavioral factors to understand the economic impact of the virus and whether there were differences.


Transmission dynamics of the virus, including the basic reproductive number, incubation period, serial interval, modes of transmission and environmental factors


Severity of disease, including risk of fatality among symptomatic hospitalized patients, and high-risk patient groups


Susceptibility of populations


Public health mitigation measures that could be effective for control


### What do we know about virus genetics, origin, and evolution?

In [96]:
relevant_articles(task3, 5, only_covid19=True)

Real-time tracking of whole genomes and a mechanism for coordinating the rapid dissemination of that information to inform the development of diagnostics and therapeutics and to track variations of the virus over time.


Access to geographic and temporal diverse sample sets to understand geographic distribution and genomic differences, and determine whether there is more than one strain in circulation. Multi-lateral agreements such as the Nagoya Protocol could be leveraged.


Evidence that livestock could be infected (e.g., field surveillance, genetic sequencing, receptor binding) and serve as a reservoir after the epidemic appears to be over.


Evidence of whether farmers are infected, and whether farmers could have played a role in the origin.


Surveillance of mixed wildlife- livestock farms for SARS-CoV-2 and other coronaviruses in Southeast Asia.


Experimental infections to test host range for this pathogen.


Animal host(s) and any evidence of continued spill-over to humans


Socioeconomic and behavioral risk factors for this spill-over


Sustainable risk reduction strategies


### What do we know about non-pharmaceutical interventions?

In [97]:
relevant_articles(task4, 5, only_covid19=True)

Guidance on ways to scale up NPIs in a more coordinated way (e.g., establish funding, infrastructure and authorities to support real time, authoritative (qualified participants) collaboration with all states to gain consensus on consistent guidance and to mobilize resources to geographic areas where critical shortfalls are identified) to give us time to enhance our health care delivery system capacity to respond to an increase in cases.


Rapid design and execution of experiments to examine and compare NPIs currently being implemented. DHS Centers for Excellence could potentially be leveraged to conduct these experiments.


Rapid assessment of the likely efficacy of school closures, travel bans, bans on mass gatherings of various sizes, and other social distancing approaches.


Methods to control the spread in communities, barriers to compliance and how these vary among different populations..


Models of potential interventions to predict costs and benefits that take account of such factors as race, income, disability, age, geographic location, immigration status, housing status, employment status, and health insurance status.


Policy changes necessary to enable the compliance of individuals with limited resources and the underserved with NPIs.


Research on why people fail to comply with public health advice, even if they want to do so (e.g., social or financial costs may be too high).


Research on the economic impact of this or any pandemic. This would include identifying policy and programmatic alternatives that lessen/mitigate risks to critical government services, food distribution and supplies, access to critical household supplies, and access to health diagnoses, treatment, and needed care, regardless of ability to pay.


### What do we know about vaccines and therapeutics?

In [98]:
relevant_articles(task5, 5, only_covid19=True)

Effectiveness of drugs being developed and tried to treat COVID-19 patients. Clinical and bench trials to investigate less common viral inhibitors against COVID-19 such as naproxen, clarithromycin, and minocyclinethat that may exert effects on viral replication.


Methods evaluating potential complication of Antibody-Dependent Enhancement (ADE) in vaccine recipients.


Exploration of use of best animal models and their predictive value for a human vaccine.


Capabilities to discover a therapeutic (not vaccine) for the disease, and clinical effectiveness studies to discover therapeutics, to include antiviral agents.


Alternative models to aid decision makers in determining how to prioritize and distribute scarce, newly proven therapeutics as production ramps up. This could include identifying approaches for expanding production capacity to ensure equitable and timely distribution to populations in need.


Efforts targeted at a universal coronavirus vaccine.


Efforts to develop animal models and standardize challenge studies


Efforts to develop prophylaxis clinical studies and prioritize in healthcare workers


Approaches to evaluate risk for enhanced disease after vaccination


Assays to evaluate vaccine immune response and process development for vaccines, alongside suitable animal models [in conjunction with therapeutics]


### What has been published about ethical and social science considerations?

In [99]:
relevant_articles(task6, 5, only_covid19=True)

Efforts to articulate and translate existing ethical principles and standards to salient issues in COVID-2019


Efforts to embed ethics across all thematic areas, engage with novel ethical issues that arise and coordinate to minimize duplication of oversight


Efforts to support sustained education, access, and capacity building in the area of ethics


Efforts to establish a team at WHO that will be integrated within multidisciplinary research and operational platforms and that will connect with existing and expanded global networks of social sciences.


Efforts to develop qualitative assessment frameworks to systematically collect information related to local barriers and enablers for the uptake and adherence to public health measures for prevention and control. This includes the rapid identification of the secondary impacts of these measures. (e.g. use of surgical masks, modification of health seeking behaviors for SRH, school closures)


Efforts to identify how the burden of responding to the outbreak and implementing public health measures affects the physical and psychological health of those providing care for Covid-19 patients and identify the immediate needs that must be addressed.


Efforts to identify the underlying drivers of fear, anxiety and stigma that fuel misinformation and rumor, particularly through social media.


### What do we know about diagnostics and surveillance?

In [100]:
relevant_articles(task7, 5, only_covid19=True)

How widespread current exposure is to be able to make immediate policy recommendations on mitigation measures. Denominators for testing and a mechanism for rapidly sharing that information, including demographics, to the extent possible. Sampling methods to determine asymptomatic disease (e.g., use of serosurveys (such as convalescent samples) and early detection of disease (e.g., use of screening of neutralizing antibodies such as ELISAs).


Efforts to increase capacity on existing diagnostic platforms and tap into existing surveillance platforms.


Recruitment, support, and coordination of local expertise and capacity (public, private—commercial, and non-profit, including academic), including legal, ethical, communications, and operational issues.


National guidance and guidelines about best practices to states (e.g., how states might leverage universities and private laboratories for testing purposes, communications to public health officials and the public).


Development of a point-of-care test (like a rapid influenza test) and rapid bed-side tests, recognizing the tradeoffs between speed, accessibility, and accuracy.


Rapid design and execution of targeted surveillance experiments calling for all potential testers using PCR in a defined area to start testing and report to a specific entity. These experiments could aid in collecting longitudinal samples, which are critical to understanding the impact of ad hoc local interventions (which also need to be recorded).


Separation of assay development issues from instruments, and the role of the private sector to help quickly migrate assays onto those devices.


Efforts to track the evolution of the virus (i.e., genetic drift or mutations) and avoid locking into specific reagents and surveillance/detection schemes.


Latency issues and when there is sufficient viral load to detect the pathogen, and understanding of what is needed in terms of biological and environmental sampling.


Use of diagnostics such as host response markers (e.g., cytokines) to detect early disease or predict severe disease progression, which would be important to understanding best clinical practice and efficacy of therapeutic interventions.


Policies and protocols for screening and testing.


Policies to mitigate the effects on supplies associated with mass testing, including swabs and reagents.


Technology roadmap for diagnostics.


Barriers to developing and scaling up new diagnostic tests (e.g., market forces), how future coalition and accelerator models (e.g., Coalition for Epidemic Preparedness Innovations) could provide critical funding for diagnostics, and opportunities for a streamlined regulatory environment.


New platforms and technology (e.g., CRISPR) to improve response times and employ more holistic approaches to COVID-19 and future diseases.


Coupling genomics and diagnostic testing on a large scale.


Enhance capabilities for rapid sequencing and bioinformatics to target regions of the genome that will allow specificity for a particular variant.


Enhance capacity (people, technology, data) for sequencing with advanced analytics for unknown pathogens, and explore capabilities for distinguishing naturally-occurring pathogens from intentional.


One Health surveillance of humans and potential sources of future spillover or ongoing exposure for this organism and future pathogens, including both evolutionary hosts (e.g., bats) and transmission hosts (e.g., heavily trafficked and farmed wildlife and domestic food and companion species), inclusive of environmental, demographic, and occupational risk factors.


### What has been published about medical care?

In [101]:
relevant_articles(task8, 5, only_covid19=True)

Resources to support skilled nursing facilities and long term care facilities.


Mobilization of surge medical staff to address shortages in overwhelmed communities


Age-adjusted mortality data for Acute Respiratory Distress Syndrome (ARDS) with/without other organ failure – particularly for viral etiologies


Extracorporeal membrane oxygenation (ECMO) outcomes data of COVID-19 patients


Outcomes data for COVID-19 after mechanical ventilation adjusted for age.


Knowledge of the frequency, manifestations, and course of extrapulmonary manifestations of COVID-19, including, but not limited to, possible cardiomyopathy and cardiac arrest.


Application of regulatory standards (e.g., EUA, CLIA) and ability to adapt care to crisis standards of care level.


Approaches for encouraging and facilitating the production of elastomeric respirators, which can save thousands of N95 masks.


Best telemedicine practices, barriers and faciitators, and specific actions to remove/expand them within and across state boundaries.


Guidance on the simple things people can do at home to take care of sick people and manage disease.


Oral medications that might potentially work.


Use of AI in real-time health care delivery to evaluate interventions, risk factors, and outcomes in a way that could not be done manually.


Best practices and critical challenges and innovative solutions and technologies in hospital flow and organization, workforce protection, workforce allocation, community-based support resources, payment, and supply chain management to enhance capacity, efficiency, and outcomes.


Efforts to define the natural history of disease to inform clinical care, public health interventions, infection prevention control, transmission, and clinical trials


Efforts to develop a core clinical outcome set to maximize usability of data across a range of trials


Efforts to determine adjunctive and supportive interventions that can improve the clinical outcomes of infected patients (e.g. steroids, high flow oxygen)


### What has been published about information sharing and inter-sectoral collaboration?

In [102]:
relevant_articles(task9, 5, only_covid19=True)

Methods for coordinating data-gathering with standardized nomenclature.


Sharing response information among planners, providers, and others.


Understanding and mitigating barriers to information-sharing.


How to recruit, support, and coordinate local (non-Federal) expertise and capacity relevant to public health emergency response (public, private, commercial and non-profit, including academic).


Integration of federal/state/local public health surveillance systems.


Value of investments in baseline public health response infrastructure preparedness


Modes of communicating with target high-risk populations (elderly, health care workers).


Risk communication and guidelines that are easy to understand and follow (include targeting at risk populations’ families too).


Communication that indicates potential risk of disease to all population groups.


Misunderstanding around containment and mitigation.


Action plan to mitigate gaps and problems of inequity in the Nation’s public health capability, capacity, and funding to ensure all citizens in need are supported and can access information, surveillance, and treatment.


Measures to reach marginalized and disadvantaged populations.


Data systems and research priorities and agendas incorporate attention to the needs and circumstances of disadvantaged populations and underrepresented minorities.


Mitigating threats to incarcerated people from COVID-19, assuring access to information, prevention, diagnosis, and treatment.


Understanding coverage policies (barriers and opportunities) related to testing, treatment, and care


## Widget: Pick a Task

In [103]:
#########################################################################################################################
#
#    Title: Topic Modeling: Finding Related Articles
#    Author: Daniel Wolffram
#    Date: 2020
#    Code version: 10
#    Availability: https://www.kaggle.com/danielwolffram/topic-modeling-finding-related-articles?scriptVersionId=30463507
# 
#########################################################################################################################
def relevant_articles_for_task():
    tasks={'What is known about transmission, incubation, and environmental stability?': task1,
           'What do we know about COVID-19 risk factors?': task2, 
           'What do we know about virus genetics, origin, and evolution?': task3, 
           'What do we know about non-pharmaceutical interventions?': task4,
           'What do we know about vaccines and therapeutics?': task5, 
           'What has been published about ethical and social science considerations?': task6, 
           'What do we know about diagnostics and surveillance?': task7,
           'What has been published about medical care?': task8, 
           'What has been published about information sharing and inter-sectoral collaboration?': task9}

    def main_function(bullet, task, k=5, year_range=[1950, 2020], only_covid19=False):
        relevant_articles([bullet], k, lower=year_range[0], upper=year_range[1], only_covid19=only_covid19)
        bulletW.options = tasks[task]    

    yearW = widgets.IntRangeSlider(min=1950, max=2020, value=[2010, 2020], description='Year Range', 
                                   continuous_update=False, layout=Layout(width='40%'))
    covidW = widgets.Checkbox(value=True,description='Only COVID-19-Papers',disabled=False, indent=False, layout=Layout(width='20%'))
    kWidget = widgets.IntSlider(value=10, description='k', max=50, min=1, layout=Layout(width='30%'))

    taskW = widgets.Dropdown(options=tasks.keys(), layout=Layout(width='90%', height='50px'), description='Task:')
    init = taskW.value
    bulletW = widgets.Select(options=tasks[init], layout=Layout(width='90%', height='200px'), description='Bullet Point:')

    widget = widgets.interactive(main_function, task=taskW, bullet=bulletW, k=kWidget, year_range=yearW, only_covid19=covidW)
    
    controls = VBox([HBox([widget.children[2], widget.children[3], widget.children[4]], layout=Layout(width='90%', justify_content='space-around')),
                     widget.children[1],
                     widget.children[0]], layout=Layout(align_items='center'))
    
    output = widget.children[-1]
    display(VBox([controls, output]))

In [104]:
relevant_articles_for_task()

VBox(children=(VBox(children=(HBox(children=(IntSlider(value=10, description='k', layout=Layout(width='30%'), …

## Widget: Free Text Search

In [105]:
#########################################################################################################################
#
#    Title: Topic Modeling: Finding Related Articles
#    Author: Daniel Wolffram
#    Date: 2020
#    Code version: 10
#    Availability: https://www.kaggle.com/danielwolffram/topic-modeling-finding-related-articles?scriptVersionId=30463507
# 
#########################################################################################################################
def relevant_articles_for_text():    
    textW = widgets.Textarea(
        value='',
        placeholder='Type something',
        description='',
        disabled=False,
        layout=Layout(width='90%', height='200px')
    )

    yearW = widgets.IntRangeSlider(min=1950, max=2020, value=[2010, 2020], description='Year Range', 
                               continuous_update=False, layout=Layout(width='40%'))
    covidW = widgets.Checkbox(value=True,description='Only COVID-19-Papers',disabled=False, indent=False, layout=Layout(width='25%'))
    kWidget = widgets.IntSlider(value=10, description='k', max=50, min=1, layout=Layout(width='25%'))

    button = widgets.Button(description="Search")

    display(VBox([HBox([kWidget, yearW, covidW], layout=Layout(width='90%', justify_content='space-around')),
        textW, button], layout=Layout(align_items='center')))

    def on_button_clicked(b):
        clear_output()
        display(VBox([HBox([kWidget, yearW, covidW], layout=Layout(width='90%', justify_content='space-around')),
            textW, button], layout=Layout(align_items='center')))        
        relevant_articles(textW.value, kWidget.value, yearW.value[0], yearW.value[1], covidW.value)

    button.on_click(on_button_clicked)

In [106]:
relevant_articles_for_text()

VBox(children=(HBox(children=(IntSlider(value=10, description='k', layout=Layout(width='25%'), max=50, min=1),…

covid 19 covid-19 corona virus


## Finding Related Sentences (GloVe and cosine_similarity)

In [107]:
from scipy.spatial import distance
from scipy import spatial

### Load Data "Kaggle_Covid19_All_Sources.csv"

In [108]:
all_data = pd.read_csv('Kaggle_Covid19_All_Sources.csv')

In [109]:
all_data['publish_year'] = all_data.publish_time.str[:4].fillna(-1).astype(int)

In [110]:
tf = joblib.load('tf.csv')

In [111]:
lda_tf = joblib.load('lda.pkl')

In [112]:
topic_dist = pd.read_csv('topic_dist.csv')

### Split Text into Sentences

In [113]:
sentences = []
for s in all_data.text_body:
    sentences.append(sent_tokenize(s))
sentences = [y for x in sentences for y in x]

### Extract word vectors

Download GloVe: Global Vectors for Word Representation Wikipedia 2014 + Gigaword 5
Ref: Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. GloVe: Global Vectors for Word Representation. 


In [66]:
!unzip glove*.zip

Archive:  glove.6B.zip
  inflating: glove.6B.50d.txt        
  inflating: glove.6B.100d.txt       
  inflating: glove.6B.200d.txt       
  inflating: glove.6B.300d.txt       


In [114]:
#########################################################################################################################
#
#    Title: An Introduction to Text Summarization using the TextRank Algorithm (with Python implementation)
#    Author: Prateek Joshi
#    Date: 2018
#    Code version: 1
#    Availability: https://www.analyticsvidhya.com/blog/2018/11/introduction-text-summarization-textrank-python/
#########################################################################################################################

# Extract word vectors
word_embeddings = {}
f = open('glove.6B.100d.txt', encoding='utf-8')
for line in f:
    values = line.split()
    word = values[0]
    coefs = np.asarray(values[1:], dtype='float32')
    word_embeddings[word] = coefs
f.close()

### Clean sentences

In [115]:
def fun_clean_sentences(sentences):

    # remove punctuations, numbers and special characters
    clean_sentences = pd.Series(sentences).str.replace("[^a-zA-Z]", " ")

    # make alphabets lowercase
    clean_sentences = [s.lower() for s in clean_sentences]
    stop_words = stopwords.words('english')
    stop_words.extend(['background', 'methods', 'introduction', 'conclusions', 'results', 
                   'purpose', 'materials', 'discussions','methodology','result analysis'])
    def remove_stopwords(sen):
        sen_new = " ".join([i for i in sen if i not in stop_words])
        return sen_new
    clean_sentences = [remove_stopwords(r.split()) for r in clean_sentences]
    return clean_sentences
    

In [116]:
clean_sentences = fun_clean_sentences(sentences)

### Vector Representation of Text-body Sentences

In [117]:
#########################################################################################################################
#
#    Title: An Introduction to Text Summarization using the TextRank Algorithm (with Python implementation)
#    Author: Prateek Joshi
#    Date: 2018
#    Code version: 1
#    Availability: https://www.analyticsvidhya.com/blog/2018/11/introduction-text-summarization-textrank-python/
#########################################################################################################################
        
sentence_vectors = []
for i in clean_sentences:
    if len(i) != 0:
        v = sum([word_embeddings.get(w, np.zeros((100,))) for w in i.split()])/(len(i.split())+0.001)
    else:
        v = np.zeros((100,))
    sentence_vectors.append(v)

### Vector Representation of Question Sentences

In [118]:
questions = ["Range of incubation periods for the disease in humans (and how this varies across age and health status) and how long individuals are contagious, even after recovery. days, years, time, period, childern, kid, young, senior, adults, old, COVID-19, 2019-nCov, 'coronavirus', 'cov-2', 'sars-cov-2', 'sars-cov, hcov, 2019-ncov",
"Prevalence of asymptomatic shedding and transmission (e.g., particularly children).",
"Seasonality of transmission. spring, winter, summer, autumn, fall, cold, hot, warm",
"Physical science of the coronavirus (e.g., charge distribution, adhesion to hydrophilic/phobic surfaces, environmental survival to inform decontamination efforts for affected areas and provide information about viral shedding).",
"Persistence and stability on a multitude of substrates and sources (e.g., nasal discharge, sputum, urine, fecal matter, blood).",
"Persistence of virus on surfaces of different materials (e,g., copper, stainless steel, plastic,paper, wood, metal, food, cloth, hair, skin, porcelain).",
"Natural history of the virus and shedding of it from an infected person",
"Implementation of diagnostics and products to improve clinical processes",
"Disease models, including animal models for infection, disease and transmission",
"Tools and studies to monitor phenotypic change and potential adaptation of the virus",
"Immune response and immunity",
"Effectiveness of movement control strategies to prevent secondary transmission in health care and community settings",
"Effectiveness of personal protective equipment (PPE) and its usefulness to reduce risk of transmission in health care and community settings, mask, google, gloves",
"Role of the environment in transmission"]

In [119]:
#########################################################################################################################
#
#    Title: An Introduction to Text Summarization using the TextRank Algorithm (with Python implementation)
#    Author: Prateek Joshi
#    Date: 2018
#    Code version: 1
#    Availability: https://www.analyticsvidhya.com/blog/2018/11/introduction-text-summarization-textrank-python/
#########################################################################################################################

clean_questions = fun_clean_sentences(questions)
question_vectors = []
for i in clean_questions:
    if len(i) != 0:
        v = sum([word_embeddings.get(w, np.zeros((100,))) for w in i.split()])/(len(i.split())+0.001)
    else:
        v = np.zeros((100,))
    question_vectors.append(v)

### Cosine Similarity Matrix

In [120]:
sim_mat= cosine_similarity(sentence_vectors, question_vectors)
sim_mat_df = pd.DataFrame(sim_mat)

In [121]:
question_vectors[:10]

[array([ 0.02598357,  0.28678034,  0.14133468, -0.05999389, -0.25050776,
         0.06722982,  0.0099764 , -0.0545159 , -0.12801499, -0.107351  ,
        -0.12949992,  0.04054813,  0.39085074, -0.07860075,  0.38762581,
        -0.13924788,  0.03399591, -0.06326686, -0.22715112, -0.0834815 ,
         0.08473672,  0.02601746, -0.09232635,  0.03516977,  0.08181691,
         0.09949684,  0.15261544, -0.28712908, -0.26239698,  0.07720456,
        -0.02225858,  0.2132642 ,  0.1177438 , -0.14531829,  0.01536673,
         0.01753675, -0.08392329,  0.10756533, -0.15314619, -0.0931374 ,
        -0.35264745, -0.08428739, -0.03105246, -0.1670065 ,  0.24759063,
         0.0225693 ,  0.32225427, -0.2098243 , -0.01991602, -0.32972661,
         0.1620186 , -0.32015048, -0.06429728,  0.63006004,  0.06188462,
        -1.13237943, -0.04947465, -0.13596332,  0.90605182,  0.39211474,
        -0.04900051,  0.70174528,  0.00291355, -0.07888153,  0.39033658,
         0.38007339,  0.03124955, -0.19778275,  0.3

Print out the top n most similar sentences with given question sentences (starting from 1)

In [122]:
def top_n_similar_sentences_glove_cosine(n_question, n=10):
    print(f"QUESTION #{n_question}: {questions[n_question-1]}\n\n")
    top_n_similar_sentences = []
    top_n_sim = sim_mat_df.loc[:,(n_question-1)].nlargest(n)
    for idx in top_n_sim.index:
        top_n_similar_sentences.append(sentences[idx])
    return(top_n_similar_sentences)

## Glove-Cosine after LDA selected articles

### Get "Nearest" Papers (in Topic Space)

In [123]:
#########################################################################################################################
#
#    Title: Topic Modeling: Finding Related Articles
#    Author: Daniel Wolffram
#    Date: 2020
#    Code version: 10
#    Availability: https://www.kaggle.com/danielwolffram/topic-modeling-finding-related-articles?scriptVersionId=30463507
# 
#########################################################################################################################
def get_k_nearest_docs(doc_dist, k=5, lower=1950, upper=2020, only_covid19=False):
    '''
    doc_dist: topic distribution (sums to 1) of one article
    
    Returns the index of the k nearest articles (as by Jensen–Shannon divergence in topic space). 
    '''
    
    relevant_time = all_data.publish_year.between(lower, upper)
    
    if only_covid19:
        is_covid19_article = all_data.text_body.str.contains('COVID-19|SARS-CoV-2|2019-nCov')
        topic_dist_temp = topic_dist[relevant_time & is_covid19_article]
        
    else:
        topic_dist_temp = topic_dist[relevant_time]
         
    distances = topic_dist_temp.apply(lambda x: jensenshannon(x, doc_dist), axis=1)
    k_nearest = distances[distances != 0].nsmallest(n=k).index
        
    return k_nearest

In [124]:
#########################################################################################################################
#
#    Title: Topic Modeling: Finding Related Articles
#    Author: Daniel Wolffram
#    Date: 2020
#    Code version: 10
#    Availability: https://www.kaggle.com/danielwolffram/topic-modeling-finding-related-articles?scriptVersionId=30463507
# 
#########################################################################################################################
def relevant_articles_doc_id(tasks, k=3, lower=1950, upper=2020, only_covid19=False):
    tasks = [tasks] if type(tasks) is str else tasks 
    
    tasks_tf = tf_vectorizer.transform(tasks)
    tasks_topic_dist = pd.DataFrame(lda_tf.transform(tasks_tf))

    for index, bullet in enumerate(tasks):

        recommended = get_k_nearest_docs(tasks_topic_dist.iloc[index], k, lower, upper, only_covid19)
        recommended = pd.DataFrame(all_data.iloc[recommended])
        recommended_doc_id = recommended.doc_id
    return recommended_doc_id

In [125]:
stop_words = stopwords.words('english')
stop_words.extend(['background', 'methods', 'introduction', 'conclusions', 'results', 
               'purpose', 'materials', 'discussions','methodology','result analysis'])

def remove_stopwords(sen):
    sen_new = " ".join([i for i in sen if i not in stop_words])
    return sen_new

def fun_clean_sentences(sentences):

    clean_sentences = pd.Series(sentences).str.replace("[^a-zA-Z]", " ")

    clean_sentences = [s.lower() for s in clean_sentences]

    clean_sentences = [remove_stopwords(r.split()) for r in clean_sentences]
    return clean_sentences
    

In [126]:
def top_n_similar_sentences(n_question, sentence_df, n=10):
    print(f"QUESTION #{n_question}: {task1[n_question-1]}\n\n")
    top_n_similar_sentences = []
    for i in range(len(sentence_df)):
        sentence_df['sim_score'][i] = sentence_df['sim_score'][i][0][0]
    top_n_similar_sentences = sentence_df.sort_values(by=['sim_score'], ascending=False).iloc[0:n,]
    return(top_n_similar_sentences)

In [127]:
#########################################################################################################################
#
#    Title: An Introduction to Text Summarization using the TextRank Algorithm (with Python implementation)
#    Author: Prateek Joshi
#    Date: 2018
#    Code version: 1
#    Availability: https://www.analyticsvidhya.com/blog/2018/11/introduction-text-summarization-textrank-python/
#########################################################################################################################

def glove_vectorizer(clean_sentences):
    sentence_vectors = []
    for i in clean_sentences:
        if len(i) != 0:
            v = sum([word_embeddings.get(w, np.zeros((100,))) for w in i.split()])/(len(i.split())+0.001)
        else:
            v = np.zeros((100,))
        sentence_vectors.append(v)
    return sentence_vectors

In [128]:
def get_recommanded_articles(question_no, top_n_articles, lower=1950, upper=2020, only_covid19=True):
    recommanded_articles = pd.DataFrame(columns=all_data.columns)
    recommended_doc_id = relevant_articles_doc_id(tasks=task1[question_no-1], k=top_n_articles, lower=lower, upper=upper, only_covid19=only_covid19)    
    recommended_doc_id = list(recommended_doc_id)
    for i in range(len(recommended_doc_id)):
        recommanded_articles = recommanded_articles.append(all_data[all_data.doc_id == recommended_doc_id[i]])[['doc_id', 'title', 'text_body', 'doi']]
    return recommanded_articles

In [129]:
def sent_tokenizer(text):
    sentences = []
    for s in text:
        sentences.append(sent_tokenize(s))
    sentences = [y for x in sentences for y in x]
    return sentences

In [130]:
def top_n_similar_sentences_LDA_glove_cosine(question_no, top_n_articles, top_n_sentences, only_covid19=True):
    recommanded_articles = get_recommanded_articles(question_no, top_n_articles, only_covid19=only_covid19)
    sentence_df = pd.DataFrame(columns = ['doc_id','title','sentences','clean_sentences', 'sentence_vectors', 'sim_score'])
    for i in range(len(recommanded_articles)):
        article_df = pd.DataFrame(columns = ['doc_id','title','sentences'])    
        sentences = sent_tokenize(recommanded_articles.iloc[i].text_body)
        article_df['sentences'] = sentences
        article_df['doc_id'] = recommanded_articles.iloc[i].doc_id
        article_df['title'] = recommanded_articles.iloc[i].title
        article_df['doi'] = recommanded_articles.iloc[i].doi
        sentence_df = sentence_df.append(article_df, ignore_index=True)
    clean_questions = fun_clean_sentences(questions[question_no-1])
    question_vectors = glove_vectorizer(clean_questions)
    for i in range(len(sentence_df)):
        sentence_df.loc[i,'clean_sentences'] = fun_clean_sentences(sentence_df.iloc[i].sentences)
        sentence_df.loc[i,'sentence_vectors'] = glove_vectorizer(sentence_df.iloc[i].clean_sentences)
        sentence_df.loc[i,'sim_score'] = cosine_similarity(sentence_df.iloc[i].sentence_vectors, question_vectors)
    similar_sentences = top_n_similar_sentences(question_no, sentence_df, top_n_sentences)
    for i in range(len(similar_sentences)):
        l,n = similar_sentences.iloc[i][['doi','title']].values
        if n !=n:
            n = 'no title'
            h = '<a href="' + l + '" target="_blank">'+ n + '</a>'
        else:
            h = '<a href="' + l + '" target="_blank">'+ n + '</a>'
        print(similar_sentences.iloc[i].sentences)
        display(HTML(h))

## Comparations of results of two approches

Question 1: 

In [131]:
top_n_similar_sentences_glove_cosine(1,30)

QUESTION #1: Range of incubation periods for the disease in humans (and how this varies across age and health status) and how long individuals are contagious, even after recovery. days, years, time, period, childern, kid, young, senior, adults, old, COVID-19, 2019-nCov, 'coronavirus', 'cov-2', 'sars-cov-2', 'sars-cov, hcov, 2019-ncov




['Depending on how much time has passed since exposure to the primary infected individual, those infected may not yet be symptomatic -this period of time between infection and symptoms is an important epidemiological trait of an infectious disease called the incubation period.',
 'It is known that people infected with hepatitis A virus experience an incubation period of 28 days ranging 15-50 days to become ill [16, 24] .',
 'In microbiological terms, the book describes the emergence and spread of a pandemic whose infection and mortality rates are 100%, with an incubation period of a few days, whose symptoms make those infected extremely dangerous to society, and for which there is no treatment.',
 'Of the 154 SARS patients who satisfied the WHO definition for SARS, their age range was 20-80 years (mean 41.5 years).',
 'If we assume a particular infectious pathogen has a contact rate, = 0.75, with an incubation period of two days, = 0.5, and an infectious period of five days, = 0.2 (sim

In [162]:
top_n_similar_sentences_LDA_glove_cosine(1, 5, 30)

QUESTION #1: Range of incubation periods for the disease in humans (and how this varies across age and health status) and how long individuals are contagious, even after recovery.


A detailed analysis of one of the early COVID-19 clusters by Chan and colleagues 19 revealed symptomatic infections in five adult members of the same household, while a child in the same household aged 10 years was infected but remained asymptomatic, potentially indicating biological differences in the risk of clinical disease driven by age.


In any case, if the age distribution of cases reported here was to be confirmed and the epidemic were to progress globally, we would expect an increase in respiratory mortality concentrated among people aged 30 years and older.


Adjustment for the age demographics of China confirmed a deficit of infections among children, with a RR below 0·5 in patients younger than 15 years (figure 1).


If the current intervention continues, the number of infected individuals is expected to peak in early March 2020 (80 days since initiation) with a peak population size of 827 (421-1232) infectious individuals in China.


Even in the period after Jan 18, 2020, when awareness of the outbreak increased, a shorter delay between symptom onset and seeking care at a hospital or clinic was seen for international patients than for those in mainland China (Wilcoxon test p<0·0001).


A key area of uncertainty is whether, and for how long, individuals are infectious before symptom onset, and whether subclinical infection occurs; both are likely to make the outbreak harder to control.


Patient-level information is important to estimate key time-to-delay events (such as the incubation period and interval between symptom onset and visit to a hospital), analyse the age profile of infected patients, reconstruct epidemic curves by onset dates, and infer transmission parameters.


Across the study period, the median delay between symptom onset and seeking care at a hospital or clinic was 2 days (IQR 0-5 days) in mainland China ( figure 4 ).


A series of epidemiological criteria were required for COVID-19 testing, including travel history to Wuhan within the past 2 weeks; residence in Wuhan within the past 2 weeks; contact with individuals from Wuhan (with fever and respiratory symptoms) within the past 2 weeks; and being part of an established disease cluster.


Nevertheless, we would also expect children younger than 5 years to be at risk of severe outcomes and to be reported to the healthcare system, as is seen for other respiratory infections.


This mortality pattern would be substantially different from the profile of the 2009 influenza pandemic, for which excess mortality was concentrated in those younger than 65 years.


This sampling approach ensured that the serial interval and incubation period for each case was correlated, and prevented biologically implausible scenarios where a case could develop symptoms soon after exposure, but not become infectious until very late after exposure and vice versa.


Secondary cases were only created if the person with the infection had not been isolated by the time of infection.


However, we acknowledge that in the early phase of the epidemic, the death cases are likely under-reported as many infected cases have not progressed to the critical stage.


cough, lethargy, myalgia) infected individuals was identified early in the course of this outbreak, with human-to-human transmission detected in international case series [7] .


For SARS in Hong Kong, the average time from illness to death for fatal cases was 24 days [26] .


23
 Outbreak control was defined as no new infections between 12 and 16 weeks after the initial cases.


Increased awareness of prodromal symptoms, and therefore short delays until isolation-as seen in the SARS outbreak in Beijing in 2003 35 -would increase control of outbreaks in our model.


Our line list comprised 507 patients reported from Jan 13, to Jan 31, 2020, including 364 (72%) from mainland China and 143 (28%) from outside of China (table) The age distribution of COVID-19 cases was skewed towards older age groups with a median age of 45 years (IQR 33-56) for patients who were alive or who had an unknown outcome at the time of reporting (figure 1).


Although the data so far suggest that the disease is mild in most cases and that the case fatality rate is currently reported to be lower than SARS or MERS, the situation is likely to go on for months and could cause severe disruption in countries that are not well prepared.


Given that the cases reported outside Wuhan have mostly not been severe, it would be reasonable to infer that there might be a large number of undetected relatively mild infections in Wuhan and that the infection fatality risk is below 1% or even below 0.1%.


Effective contact tracing and isolation could contribute to reducing the overall size of an outbreak or bringing it under control over a longer time period.


Based on this definition, we reported the probability that an outbreak of a severe acute respiratory syndrome coronavirus 2-like pathogen would be controlled within 12 weeks for each scenario, assuming that the basic reproduction number remained constant and no other interventions were implemented.


While the overall severity profile among cases may change as more mild cases are identified, we estimate a risk of fatality among hospitalised cases at 14% (95% confidence interval: 3.9-32%).


Our model indicates that every one-day reduction in this duration would reduce the peak population size by 72-84% and the cumulative infected cases and deaths both by 68-80% (Figure 1c,d) .


[13] [14] [15] [16] The incubation period is a useful parameter to guide isolation and contact tracing; based on existing data, the disease status of a contact should be known with near certainty after a period of observation of 14 days.


We calculated the effective reproduction number (R eff ) of the simulation as the average number of secondary cases produced by each infected person in the presence of isolation and contact tracing.


As an example (figure 1), a person infected with the virus could potentially produce three secondary infections (because three is drawn from the negative binomial distribution), but only two transmissions might occur before the case is isolated.


Facing the rapidly rising epidemic, the Chinese government has timely amended the Law of the PRC on the Prevention and Treatment of Infectious Diseases on 20 th January 2020 to include the 2019-nCov as a class-B infection but manage it as a class-A infection due to its severity [8] .


This occurs because with better contact tracing it becomes possible to control outbreaks with higher numbers of weekly cases.


Question 2: 

In [133]:
top_n_similar_sentences_glove_cosine(2,30)

QUESTION #2: Prevalence of asymptomatic shedding and transmission (e.g., particularly children).




['F I G U R E 1 Viral load variation among the groups of asymptomatic contacts, children possibly not transmitting RSV, and children possibly transmitting RSV (11 days) , and additional studies related to RSV infection among these patients would understand the dynamic of this type of viral infection.',
 'Furthermore, mother-to-child transmission of GBV-C reduces the vertical transmission of HIV-1 from GBV-C/HIV-1 coinfected mothers [21] , and recently it was reported that accidental GBV-C acquisition via transfusion is associated with a significant reduction in mortality in HIV-infected individuals [22] .',
 'Both pregnancies that end with vertical transmission of the infection, the mothers were in third stage of HIV-infection with severe immune suppression (CD4 <200/μl) and high viral load of HIV > 100 000 c/μl.',
 'Infants carrying H4 or H6 haplotypes had increased risk of IU HIV-1-infection, whereas H2 haplotype carriers were less likely to be infected during pregnancy compared to i

In [163]:
top_n_similar_sentences_LDA_glove_cosine(2, 5, 30)

QUESTION #2: Prevalence of asymptomatic shedding and transmission (e.g., particularly children).


After the stage, likely after mid-January, 2020, the virus further spread to the family via infected adults to cause intrafamilial transmission, especially transmission to the elderly and children, who are vulnerable to the infection.


With more diagnostic detection done, the proportion of mild infections mainly in children and young adults became higher.


1B , during the emerging stage of the SARS-CoV-2 outbreak, the infection was disseminated by person-to-person transmission in the community almost exclusively among adults.


First, asymptomatic cases were diagnosed based on positive viral nucleic acid test results, but without any COVID-19 symptoms, such as fever, gastrointestinal, or respiratory symptoms, and no significant abnormalities on chest radiograph 7, 8 However, the transmission of COVID-19 through asymptomatic carriers via person-to-person contact was observed in many reports.


9 Similarly, at the beginning of the 2009 pandemic H1N1 influenza outbreak, the percentage age distributions for mortality and morbidity for patients with severe pneumonia show a marked shift to persons between the ages of 5 and 59 years, as compared with distributions observed during previous periods of epidemic influenza.


reported that the viral load detected in asymptomatic patients was similar to that found in symptomatic patients; however, the viral loads from patients with severe diseases were higher than those in patients with mild-to-moderate presentations.


The number of pediatric patients may increase in the future and a lower number of pediatric patients at the beginning of a pandemic does not necessarily mean that children are less susceptible to the infection.


It is essential to know the incubation period, the time elapsing between the moment of exposure to an infectious agent and the appearance of signs and Furthermore, patients with pneumonia were older, with a higher prevalence of smoking history, more underlying diseases, and were more likely to have fever, myalgia/fatigue, dyspnea, headache, and nausea/vomiting compared to patients with ARD (all p < .05) ( Table 3 ).


7e11 Most infected children have mild clinical manifestations and usually have a good prognosis.


First, the clinical manifestation of COVID-19 ranges from the asymptomatic carrier 19 state to severe pneumonia; however, most early reports only showed the findings of SARS-CoV-2 pneumonia, in which the ratio of male patients was much larger than that of female patients, there were no pediatric cases, and the mortality rate was high.


Thus, it is surprising to see that all the attention focused on a virus whose mortality ultimately appears to be of the same order of magnitude as that of common coronaviruses or other respiratory viruses such as influenza or respiratory syncytial virus, while the four common HCoV diagnosed go unnoticed although their incidence is high.


Adults with COVID-19 usually showed a significant or progressive decrease in the absolute number of peripheral blood lymphocytes at the early stage of the disease.


T lymphocyte subsets showed a decrease in both CD4 þ and CD8 þ T cell subsets, and neutrophil-to-lymphocyte ratio is an early and reliable indicator for the development of severe COVID-19, suggesting that SARS-CoV-2 can consume lymphocytes, which may also be an important reason for the virus to proliferate and spread in the early stage of the disease.


13 In children, however, white blood cell count and absolute lymphocyte count were mostly normal, and no lymphocyte depletion occurred, suggesting less immune dysfunction after the SARS-CoV-2 infection.


10 We assume that neonates born to infected mothers via vaginal delivery could still be at risk for the infection due to close baby-mother contact during the delivery.


The positive correlation of the accumulated cases from adult and pediatric populations strongly supports the transmission dynamics of pediatric patients we described (Fig.


11, 17 Fortunately, no evidence was found for intrauterine infection caused by vertical transmission in women who contracted COVID-19 pneumonia in late pregnancy.


11 A difference in the distribution, maturation, and functioning of viral receptors is frequently mentioned as a possible reason of the age-related difference in incidence.


17 This finding may not be consistent with a relatively low susceptibility of children to COVID-19.


In one study using the MulBSTA score system, 46 which includes six indices, namely multilocular infiltration, lymphopenia, bacterial co-infection, smoking history, hypertension, and age, revealed that these indices were poor prognostic factors.


[12] [13] [14] Subsequent to the publication of the studies of patients with only ARD or mild pneumonia, we found the ratio of male-to-female patients decreased, children or neonates could contract COVID-19, and the mortality rate declined compared to that of previous reports.


15 Similarly, the China CDC reported that patients aged ≥ 80 years had the highest case fatality rate, 14.8%, among different age groups, and the case fatality rate of patients in which disease severity was critical was 49.0%.


Among these, the most relevant is influenza, usually characterised by fever, myalgia, headache and non-productive cough, that may also cause complications with high morbidity and mortality rate, such as pneumonia, myocarditis, central nervous system disease and death [10, 11] .


Nevertheless, all these children belonged to familial cluster circles, so aggregative onset is an important feature in pediatric cases, and this is also a strong indicator that the virus is highly contagious.


This illustrated the major disconnect between the fear of a hypothetical spread in France of a virus emerging in the Middle East and the reality of the absence of diagnosed cases, while concomitantly the very real and high incidence of respiratory viruses common worldwide and in our country and their associated mortality appeared largely neglected.


However, this study involved a population of only 18 patients, including one asymptomatic patient.


12 Other suggested reasons include children having a more active innate immune response, healthier respiratory tracts because they have not been exposed to as much cigarette smoke and air pollution as adults, and fewer underlying disorders.


Nevertheless, those aged 15e24 and 25e44 years experienced sharply elevated death rates.


14 Most, if not all, of the infants received regular immunizations, including BCG, in China and other Asian countries, and it is well known that influenza can cause more ARDS in the adults, yet very less in children.


Usually they recover within 1e2 weeks after the onset of the disease.


Question 3: 

In [135]:
top_n_similar_sentences_glove_cosine(3,30)

QUESTION #3: Seasonality of transmission. spring, winter, summer, autumn, fall, cold, hot, warm




['The weather during spring and autumn is usually warm during the days and cooler at nights.',
 'The spring and autumn are relatively short when compared with the duration of summer and winter.',
 'Adenovirus was most likely to be detected in summer, followed by spring, winter and autumn.',
 'RSV was most likely to be detected in winter, followed by spring, autumn, and summer.',
 'IFVB was active in cold winter and IFVA was active in autumn, while both were rarely detected in summer.',
 'positivity was low in spring-summer and high in fall-winter, although no significant differences were observed (38.8, 40.9, 47.3 and 49.5% in spring, summer, fall and winter, respectively).',
 'The area has a continental monsoon climate, winter is longer than other seasons, and summer is warm.',
 'In detail, ADV infection was prevalent in summer (20/34, 58.8%) and spring (10/34, 29.4%) seasons, FLU was prevalent in spring (7/14, 50%) and winter (4/14, 28.5%), HMPV in winter (2/4, 50%), HCoV in autumn/w

In [164]:
top_n_similar_sentences_LDA_glove_cosine(3, 5, 30)

QUESTION #3: Seasonality of transmission.


In 2003, the spring festival transport period started from Jan 17 to Feb 25, 2020 and coincided with the peak incidence [ Figure 1F , purple box].


Both outbreaks happened in the winter, when the two provinces have similar climate patterns suitable for virus survival and spread.


Coincidentally, the SARS outbreak duration also coincided with the Chinese spring festival.


The spring festival travel period in 2020 started from Jan 10 to Feb 18, which coincided with the rapid increase in SARS-CoV-2 cases between Jan 10 and 22, 2020 [ Figure 1F , red box].


Similar to the SARS outbreak, this outbreak also occurred during the spring festival, the most important of the Chinese traditional festivals, when 3 billion people travel throughout the country [6] .


This person reported no previous visit to wet markets in Wuhan nor contact with any other case within 2 weeks before illness onset.


One cluster of three close relatives all with illness onset on the same day, were thought to have occurred through a common exposure since they all lived together and worked in the same stall in the Southern China Seafood Wholesale Market.


Each year, the Chinese government launches a 40-day spring festival transport support system, and during this period, billions of people migrate around China.


The timing of this outbreak around the lunar new year widely celebrated in China coincides with a period of highest annual human movement patterns in the region and between China and globally [8] , increasing the potential for rapid geographic dispersal of the infection.


While wet markets selling such perishable food products are common in China, they usually do not sell such a wide variety of wild animals.


These super-spreaders may be distributed in different places and are difficult to track.


The lack of exposure history to wet markets in Wuhan in two of four generally-mild exported cases indicated that there might be a larger number of undetected infections in Wuhan.


This market is a large open complex of 50,000 square metres including sections selling seafood, fresh meat, produce, other perishable goods, and a very wide variety of live wild animals for consumption.


Given previous trends, this is unlikely to be the incidence peak of this new virus outbreak.


If the current intervention continues, the number of infected individuals is expected to peak in early March 2020 (80 days since initiation) with a peak population size of 827 (421-1232) infectious individuals in China.


Scenario 1 comprises a large zoonotic spillover event starting in early December 2019, perhaps over a number of days or weeks, and very limited human-to-human transmission subsequently.


The current duration from symptom onset to isolation is about six days.


Nevertheless, as China is facing its 'Spring Festival travel rush' and the epidemic has spread beyond its borders, further investigation on its potential spatiotemporal transmission pattern and novel intervention strategies are warranted.


Nevertheless, low and middle countries on these continents are more likely to see the ongoing spread and major disruption from the introduction of a single case, even if the risk of importation is lower.


This large-scale migration has brought favorable conditions for disease spread that are difficult to control.


Across the study period, the median delay between symptom onset and seeking care at a hospital or clinic was 2 days (IQR 0-5 days) in mainland China ( figure 4 ).


Compared to the SARS outbreak of 2003, the situation in 2020 differs due to the increased frequency and volume of international air travel.


We did not consider the risk associated with the travel route through water and land which might have an impact in the spread of 2019-nCoV.


Because we are now in the early stage of the outbreak, we must be prepared for subsequent larger-scale outbreaks and predict the scale of the outbreak.


This person was in their 70s and had landed in Bangkok on 13 January, reporting an illness onset on 6 January and unclear history of exposure to the market in Wuhan.


13, 14 A narrow window of exposure could be defined for a subset of patients who had a short stay in Wuhan, at a time when the epidemic was still localised to Wuhan.


Although the data so far suggest that the disease is mild in most cases and that the case fatality rate is currently reported to be lower than SARS or MERS, the situation is likely to go on for months and could cause severe disruption in countries that are not well prepared.


A series of epidemiological criteria were required for COVID-19 testing, including travel history to Wuhan within the past 2 weeks; residence in Wuhan within the past 2 weeks; contact with individuals from Wuhan (with fever and respiratory symptoms) within the past 2 weeks; and being part of an established disease cluster.


Setting the upper limit of cumulative incidence (K) to 50,000, 60,000, or 70,000, the end date of incidences will be in 56 days (Mar 6, 2020), 60 days (Mar 10, 2020), or 62 days (Mar 12, 2020), respectively.


As of Jan 31, 2020, province-level epidemic curves are only available by date of reporting, rather than date of symptom onset, which usually inflates recent case counts if detection has increased.


Question 4: 

In [137]:
top_n_similar_sentences_glove_cosine(4,30)

QUESTION #4: Physical science of the coronavirus (e.g., charge distribution, adhesion to hydrophilic/phobic surfaces, environmental survival to inform decontamination efforts for affected areas and provide information about viral shedding).




['This study, aimed at controlling the microbial bioburden and infectious reservoir on inanimate, everyday objects, builds upon our previous work in the development of photo-activated, antimicrobial polymers for the prevention of medical device associated infection [34, 35, 37] , thereby expanding scope of photoactive materials for potential infection control applications in a healthcare setting.',
 'Understanding infectious disease transmission is a complex problem due to the many factors involved, such as environmental conditions, human activity (e.g., more crowding in buildings during wintertime), hygiene, and host susceptibility.',
 'The review addresses the following topics: a) the burden of waterborne disease, both globally and in the United States, with discussion of the dramatic underreporting of disease; b) etiology ofwaterborne disease, e.g., direct contamination of water by enteric pathogens and growth of opportunistic pathogens within distribution system biofilms; c) charac

In [165]:
top_n_similar_sentences_LDA_glove_cosine(4, 5, 30)

QUESTION #4: Physical science of the coronavirus (e.g., charge distribution, adhesion to hydrophilic/phobic surfaces, environmental survival to inform decontamination efforts for affected areas and provide information about viral shedding).


[8] supports that data from various technological products can help enrich health databases, provide more accurate, efficient, comprehensive and real-time information on outbreaks and their dispersal, thus aiding in the provision of better urban fabric risk management decisions.


4 I anticipate international efforts in these areas over the coming decade will enable the tapping of useful new biological functions and processes, methods for controlling infection, and the deployment of symbiotic or subclinical viruses in new therapies and biotechnologies that are so crucially needed.


This is valid as smart cities host a rich array of technological products [6, 7] that can assist in early detection of outbreaks; either through thermal cameras or Internet of Things (IoT) sensors, and early discussions could render efforts towards better management of similar situations in case of future potential outbreaks, and to improve the health fabric of cities generally.


2 These frame Australia's Medical Research and Innovation Priorities, which include antimicrobial resistance, global health and health security, drug repurposing and translational research infrastructure, 15 capturing many of the key elements of this CTI Special Feature.


Multidisciplinary research in biomedical, social, and environmental sciences is required to achieve a deeper understanding of disease transmission and develop more effective systems for emergency response.


Similarly, though substantial data and information on the disease has been shared, Wetsman [26] acknowledges that there is a lack of some vital information, like the ease of spread of the virus from person-to-person, and this is a key to containing the disease as interactions between people from different parts of the globe are still active.


On this, the saddest part is that some global cities are less prepared to handle the challenges posed by this type of outbreak for lack of information on issues like symptoms of the virus, the protective measures to be taken, and the treatment procedures that an infected person should be processed through, amongst other issues.


We must also take full advantage of existing knowledge and experience to improve the diagnosis, treatment, prevention, and control of the disease and accelerate the development of drugs and vaccines to save lives.


[33] hail these devices for their role in transforming the health care sector especially by allowing for Connected Health (CH) care, where data collected from them can be analyzed and provide insightful information on the health scenario in any given area.


Such data-sharing truth is emphasized in situations like the recent case of Coronavirus outbreak threatening the global health environment, facilitated by air transportation.


It is therefore an urgent priority for local and international health and wildlife regulatory authorities to structure and implement robust control mechanisms that effectively reduce human exposure to wild game meat and their products.


[36] and Allam [37] , it would be possible to facilitate early detection, achieve better diagnosis and provide better urban management decisions for increased efficiency for virus containment.


However, these exact processes ultimately restrict viral infectivity by strongly limiting virus genome sizes and their incorporation of new information.


While the significance of such data in advancing efficiency, productivity and processes in different sectors is being lauded, there are criticisms arising as to the nature of data collection, storage, management and accessibility by only a small group of users.


The sharing of data has also been quicker, as immediately after the virus' genetic sequence was discovered, Chinese scientists were able to share the information with the WHO, thus helping in its identification and enabling the auctioning of precautionary measures in other countries.


For the effective control of the spread of a newly identified virus, we must first understand its infection and pathogenicity patterns, as quickly and as thoroughly as possible, to provide insights into the outbreak and develop targeted prevention and control strategies.


Though these are not specifically fashioned to track the present case of virus outbreak, they are able to track other related parameters like heartbeat, blood pressure, body temperature and others variables, that when analyzed can offer valuable insights.


While thermal cameras are not sufficient on their own for the detection of pandemics -like the case of the COVID-19, the integration of such products with artificial intelligence (AI) can provide added benefits.


Despite recent advances in the therapeutic control of immune function and viral infection, current therapies are often challenging to develop, expensive to deploy and readily select for resistance-conferring mutants.


7 They also discuss their recent work revealing how two IFN-cinducible factors exhibit broad-spectrum inhibition of IAV, measles (MV), zika (ZikV) and HIV by suppressing furin activity.


Nakagawa and colleagues here report on their latest experiments using this system, further improving its performance for use in resource-poor contexts for meningitis diagnoses.


The spread of infectious diseases is affected not only by the biological characteristics of the pathogen but also by various other factors such as politics, culture, economy, and the environment.


This is possible by ensuring collaborative, proactive measures to control outbreak spread and thus, human movements.


Latest technological tools have also allowed for the receipt of information in realtime, in contrast to traditional epidemiological approaches that would have required months to identify the outbreak type [25] .


Through this platform, scientists from other regions were observed to gain access to information and are, subsequently, able to act in a much faster capacity; like in the case of scientists from the Virus Identification Laboratory based at Doherty Institute, Australia, who managed to grow a similar virus in the laboratory after accessing the data shared by the Chinese scientists [5] .


Beyond the aspect of pandemic preparedness and response, the case of COVID-19 virus and its spread provide a fascinating case study for the thematics of urban health.


Such could lead to an even earlier detection scenario of future virus outbreaks, and in the better curative management of the same, without minimal compromise on urban functions and on an urban economy.


Altogether, there exist significant differences in transmission, pathogenesis, clinical treatment, and vaccine development between these two viruses.


However, while the potential for the data market is understood, such issues like privacy of information, data protection and sharing, and obligatory requirements of healthcare management and monitoring, among others, are critical.


With a collaborated data sharing protocol, it would be possible to have a larger dataset resulting in increased processing capabilities especially with technologies that are powered by artificial intelligence (AI) tools.


Question 5: 

In [139]:
top_n_similar_sentences_glove_cosine(5,30)

QUESTION #5: Persistence and stability on a multitude of substrates and sources (e.g., nasal discharge, sputum, urine, fecal matter, blood).




['A liquid extract of each fecal sample (1:5 dilution) was used for NMH analysis and fecal NMH concentrations were back-calculated for the wet weight of the fecal samples and expressed in ng/g feces.',
 'The presence of surrounding organic material (e.g., blood, saliva, mucus, etc.)',
 'Viral RNAs could be found in nasal discharge, sputum, and sometimes blood or feces.',
 'Viral RNAs could be found in nasal discharge, sputum, and sometimes blood or feces.',
 'In particular, the concentration of PEDV nucleic acid in individual pig rectal swabs was significantly lower than oral fluid or penbased fecal samples.',
 'Shedding in urine was not strictly associated with high blood loads (no significant correlation between blood and urine loads was observed, p S > 0.05) or blood contamination of the urine sample (data not shown).',
 'feces, blood or tissues) and viral detection method (e.g.',
 'A positive result from blood or pleural fluid (e.g., for meningitis, cerebrospinal fluid) samples by 

In [161]:
top_n_similar_sentences_LDA_glove_cosine(5, 5, 30)

QUESTION #5: Persistence and stability on a multitude of substrates and sources (e.g., nasal discharge, sputum, urine, fecal matter, blood).


In these traditional methods, the samples need pre-lysis in an appropriate buffer to release nucleic acids from viral particles before binding to the column membrane and multiple centrifugation steps are required to enable binding, washing and elution of extracted nucleic acids.


Among other goals for this work, we plan to evaluate: (1) sensitivity of the SARS-CoV-2 assay against clinical isolates and patient samples-including sputum, throat, and nasal swabs-some of which may be challenging sample types to test; (2) specificity at both the species and subspecies levels against highly related viruses.


By following a simple lysis/binding-washing-elution protocol shown in Scheme2, 10 5 copies of SARS-CoV-2 pseudovirus in 200 μL serum samples were extracted and subject to RT-PCR analysis.


Three types of aerosol samples were collected: 1) Aerosol samples of total suspended particles (TSP) with no upper size limit to quantify RNA concentration of SARS-CoV-2 aerosol; 2) Aerodynamic size segregated aerosol samples to determine the size distribution of airborne SARS-CoV-2; 3) Aerosol deposition samples to determine the deposition rate of airborne SARS-CoV-2.


This result suggests that our pcMNPs-based viral RNA extraction protocol not only exhibits nearly 100% RNA extraction efficiency in serum samples, but also provides high-purity products without PCR inhibitors.


11 Our finding has confirmed the aerosol transmission as an important pathway for surface contamination.


The measurement of the fragment ions (MS2) has a higher specificity and lower level of false positives and is the method of choice in clinical diagnostics.


Our findings add support to a hypothesis that virus-laden aerosol deposition may play a role in surface contamination and subsequent contact by susceptible people resulting in human infection.


This may come from either the patient's breath or the aerosolization of the virus-laden aerosol from patient's faeces or urine during use.


The much longer integration time of 7 days for the deposition sample has contributed to the accumulation of virus sediment.


In these experiments, the serum sample without pseudoviruses was used as a negative control, while the PCR reaction mixture directly spiked with 10 5 copies of pseudoviruses was regarded as a positive control.


In RNA extraction, one lysis/binding step and one washing step are required for nucleic acids extraction and purification from complex samples.


For the latter, we intend to use a mixture of synthetic targets reflecting different viral sequences, and patient samples or viral seedstocks when available.


[16] [17] [18] Proteolytic digestion of a protein mixture increases the total number of molecules present and thereby increases the relative background noise of the sample.


The aerosol deposition sample collected from the Renmin Hospital ICU room had raw counts of SARS-CoV-2 RNA significantly above the detection limit as shown in Table S1 , although the TSP aerosol sample concentration inside this ICU room was below detection limit during the 3 hour sampling period.


After fast washing steps to eliminate trace impurities, purified nucleic acids can be further released from the surface of MNPs by elution buffer with altered ionic strength.


The concentration reported by the procedure equals copies of template per microliter of the final 1x ddPCR reaction, which was normalized to copies m -3 in all the results, and hence the virus or viral RNA concentration in aerosol is expressed in copies m -3 hereafter.


A high throughput automated RNA extraction method was adapted using a commercial NP968-C automatic nucleic acid extraction system (TIANLONG, Xi'an, China), which could simultaneously process up to 32 parallel samples in 96-well sample plates.


9 At present, there is little information on the characteristics of airborne SARS-CoV-2 containing aerosols, their concentration patterns and behaviour during airborne transmission due to the difficulties in sampling virus-laden aerosols and challenges in their quantification at low concentration.


[13] [14] In the control and diagnosis towards SARS-CoV-2 currently, silica-based spin column RNA extraction methods are widely used, in which a silica membrane or glass fiber is applied to bind nucleic acids.


As shown in Figure4B, the amplification curve of the automated sample is very close to that of the positive control and manually performed Direct RT-PCR samples, which suggests that our pcMNPs-based method is highly suitable for the automated high throughput viral RNA extraction.


[11] [12] Low extraction quality, on the other hand, may contain a variety of PCR inhibitors, which gives unreliable readouts during amplification.


We extensively tested this assay using a synthetic RNA target and determined the limit of detection to be 10 copies/µl using both fluorescent and lateral flow detection ( Figure 1 ).


In detail, 400 μL of lysis/binding buffer, 40 μg pcMNPs and 200 μL of the sample containing a specific number of pseudovirus particles was sequentially author/funder.


[9] [10] While RT-PCR-based methods have been widely used in COVID-19 diagnosis, their application in accurate diagnosis of viral infection and epidemic control is severely hampered by their laborious and time-consuming sample processing steps.


While the mass to charge (m/z) ratio of a peptide or protein (MS1) may be a specific diagnostic in some materials, the majority of LCMS methods employ tandem MS in which the peptide or protein parent ion is subjected to gas phase collision to produce fragment ions.


A very low frequency of the residues will create very large peptides, or undigested ("intact") proteins in some cases, that are difficult to detect and fragment.


Because no centrifugation steps are required, MNPs-based methods allow fully automated nucleic acid purification, which is highly important in current SARS-CoV-2 diagnosis.


Secondly, pcMNPs have excellent viral RNA binding performances, which results in 10-copy sensitivity and the high linearity over 5 logs of gradient in SARS-CoV-2 viral RNA detection using RT-PCR.


In DDA, the masses of all ions are observed in a relatively wide m/z range (MS1); the MS1 peptide ions meeting user-defined thresholds are subjected to fragmentation.


Question 6: 

In [141]:
top_n_similar_sentences_glove_cosine(6,30)

QUESTION #6: Persistence of virus on surfaces of different materials (e,g., copper, stainless steel, plastic,paper, wood, metal, food, cloth, hair, skin, porcelain).




['Non-animal food-contact swabs were analyzed by surface type (metal garage, metal tabletop, concrete floor, and rubber boot bottoms worn during the experiment).',
 'It can survive on plastic surfaces, stainless steel, glass slides, and paper files.',
 'Thus, the material itself (vinyl chloride, aluminum, plastic, stainless steel) had no effect on the diminishing bacterial numbers under dry conditions.',
 'Virus recovery from surfaces of Styrofoam, nitrile gloves, aluminum foil, Tyvek ® coverall, metal, rubber, plastic, cardboard, and cloth showed no significant differences between the materials at RT, suggesting that storage temperature had a substantial influence on virus survival.',
 'There was a significant difference in appearance between purified HuCoV-229E exposed to stainless steel and that exposed to copper surfaces (Fig.',
 'Virus that had been exposed to copper and brass surfaces demon- was applied to 1-cm 2 coupons of a range of brasses (A and B [early time points only]), c

In [166]:
top_n_similar_sentences_LDA_glove_cosine(6, 5, 30)

QUESTION #6: Persistence of virus on surfaces of different materials (e,g., copper, stainless steel, plastic).


RBD is shown in a space-filled model with colored surface.


Indicated proteins were coated on 96 well plates using CBS buffer over night at 4℃.


[13] [14] In the control and diagnosis towards SARS-CoV-2 currently, silica-based spin column RNA extraction methods are widely used, in which a silica membrane or glass fiber is applied to bind nucleic acids.


Bare magnetic nanoparticles (MNPs) were prepared based on a simple coprecipitation protocol as previously reported.


ACE2 is shown as gray tube model.


After centrifuge for 1 min at 12,000×g, the column was placed onto a fresh 2 mL EP collection tube, followed by adding 600 µL washing buffer and then centrifuge for 30 sec at 12,000×g.


BSA was used for blocking at room temperature for 1h.


Data on the transmissibility of coronaviruses from contaminated surfaces to hands were not found.


The mixture was vortexed thoroughly and then transferred into a column placed onto a 2 mL EP collection tube.


A high throughput automated RNA extraction method was adapted using a commercial NP968-C automatic nucleic acid extraction system (TIANLONG, Xi'an, China), which could simultaneously process up to 32 parallel samples in 96-well sample plates.


Once the program was finished, the 96-well sample plate was removed and the 15 μL of the eluted product was analysed using the conventional RT-PCR protocol as described above.


Finally, the 96-well sample plate was plugged onto the matrix and RNA extraction was performed by following an optimized program (TableS1).


The pseudovirus samples were obtained from Zeesan Biotech (Xiamen, China) and the standard samples were freshly prepared by step-wise dilution of pseudovirus in fetal calf serum purchased from Thermofisher (Massachusetts, USA) before nucleic acid extraction experiments.


Thoroughly cleaning environmental surfaces with water and detergent and applying commonly used hospital-level disinfectants (such as sodium hypochlorite) are effective and sufficient procedures."


It has been postulated that coronaviruses can be transmitted from contaminated dry surfaces including selfinoculation of mucous membranes of the nose, eyes or mouth [4, 5] .


200 μL of as-prepared standard samples with a known copy number of viral particles (down to 10 copies) were incubated with 400 μL lysis/binding buffer (1 M NaI, 2.5 M NaCl, 10% Triton X-100, 40% polyethylene glycol 8000, 25 mM EDTA) and 40 μg pcMNPs for 10 min at room temperature on a rotating shaker.


Briefly, 500 µL samples preserved in R503 was first added into a 1.5 ml EP tube containing 200 µL of absolute ethanol and 200 µL samples preserved in Hank's solution was added into a 1.5 ml EP tube containing 500 µL lysis buffer.


The supernatant containing pseudovirus was collected after 48-72 hours and filtered through a 0.45μm filter and stored at -80℃ for longtime storage or 4℃ for short time storage.


After centrifuge for 2 min at 12,000×g, the column was then placed onto a 1.5 ml collection tube.


293T-ACE2 cells were plated in 96-well plate at 1,0000 cells/well in 100 μ L DMEM+10% FBS.


As compared with traditional column-based nucleic acids extraction methods, our pcMNPs-based method has several advantages ( Table 2) .


During the extraction process, although the shaking pattern of magnetic rods was set at the most vigorous level, no breakage or leakage of pcMNPs were observed, since the eluted solution are colourless and transparent (FigureS4).


On the other hand, significant portions of RBD (marked in pink) outside the RBM motif are highly conserved.


Compared with the control group that the sample was stored at 4 o C, both DNA and RNA in the samples incubated at 56 o C for 30 -60 minutes also displayed obvious degradation, among which the 28 S and 18 S bands of total RNA of human cells were obviously smeared, and the large bands of genomic DNA became weaker while there were almost no visible genomic DNA and 28 S RNA bands in the sample incubated at 92 o C for 5 minutes (Fig 1D) .


Although the viral load of coronaviruses on inanimate surfaces is not known during an outbreak situation it seems plausible to reduce the viral load on surfaces by disinfection, especially on frequent touch surfaces in the immediate patient surrounding where the highest viral load can be expected.


For the disinfection of small surfaces ethanol (62%e71%; carrier tests) revealed a similar efficacy against coronavirus [7] .


Then 100μL/well of the antibody-PSV mixture was added onto the 293T/ACE2 cell wells and incubated for author/funder.


The human embryonic kidney 293T cell line (Cat:CRL-11268) used for pseudovirus (PSV) packaging were purchased from ATCC.


[11] [12] Low extraction quality, on the other hand, may contain a variety of PCR inhibitors, which gives unreliable readouts during amplification.


On the other hand, the interaction patterns in both complex structures' interfaces are somewhat different ( Figure 1A and Table S2 ).


Question 7: 

In [143]:
top_n_similar_sentences_glove_cosine(7,30)

QUESTION #7: Natural history of the virus and shedding of it from an infected person




['When that virus infects people, a new and unknown form of influenza that is transmitted from person to person develops.',
 'When that virus infects people, a new and unknown form of influenza that is transmitted from person to person develops.',
 '27 It is important\n During the course of their life, humans are infected by several strains of influenza virus.',
 "Scientists think that the occasional outbreaks of the disease occur because the virus ''jumps'' from an infected animal to a person (a rare event) and then is transmitted between people by direct contact with infected blood or other body fluids or parts.",
 'Many of these respiratory viruses spread easily from person to person, or can be spread from an animal reservoir (23) .',
 'This is probably because, compared with some other viruses, the flu virus is transmitted from one person to another very quickly and affects many people.',
 'Based on finding that the utilized virus was virulent and caused EBOV hemorrhagic disease, w

In [167]:
top_n_similar_sentences_LDA_glove_cosine(7, 5, 30)

QUESTION #7: Natural history of the virus and shedding of it from an infected person


Before viruses in wildlife make a jump to infect human beings, they usually accumulate a series of mutations in their viral genomes [42] and invade human beings as a result of human occupation of their normal ecosystem, as exemplified with a story of initial human infection by HIV carried by chimpanzees in rainforests of West Africa [43, 44] .


cough, lethargy, myalgia) infected individuals was identified early in the course of this outbreak, with human-to-human transmission detected in international case series [7] .


At the outset, SARS-COVs might have a species barrier before it can be transmitted to humans.


The increased vulnerability of human beings in winter time and the increased human exposure to wild animals during holidays made infection to SARS-COV-2 more likely.


This mini-review evaluated the common epidemiological patterns of both SARS epidemics in China and identified cold, dry winter as a common environmental condition conducive for SARS virus infection to human beings.


The separate evolution and the recombination of these viruses might lead to the creation of various SARS-CoVs capable of cross-species transmission and ultimate infection of human beings.


Before the emergence of SARS-CoV, four CoVs were known as human coronaviruses (HCoVs), i.e., CoVs capable of infecting human beings.


Although the remotely occurring SARS-2 usually have a human-human linkage and can be traced to a single source of infection, some Wuhan cases and the surrounding cases in Hubei Province still lack reliable sources of infection.


SARS-CoV-2 has entered human communities, and eliminating virus from human bodies does not means its eradication in nature.


Four strains of coronaviruses are known to spread easily in humans, causing generally-mild acute respiratory illnesses known as the common cold [1] .


At the same time, because Zhejiang is a natural habitat for bamboo rats, it is possible that some farms directly introduced wild bamboo rats, which were already infected with SARS-COV-2 virus.


Thus, humans might become unfortunate hosts for SARS-CoVs as a result of some inappropriate interactions with wildlife and thus exposure to unfriendly viruses ( Figure 2 
 Having identified some relevant natural and social factors common for affecting both SARS epidemics, it is also necessary to discuss if variations in these factors contributed to the unique outbreak of SARS-2 in Wuhan.


The cold and dry winter helped viruses to survive in the environment and eventually found some ways to cross the species barrier, a phenomenon known as "viral chatter" [50] .


Thus, in order for these bats and/or rats to pass the virus to humans, they must have first been able to migrate or be moved to Wuhan and also must have carried viruses that actually achieved mutations for affording the capability of infecting human beings.


The model of SARS-COV-2 transmission, similar to Nipah virus, is that farms are built around bat habitats, causing bats to pass the virus to animals through saliva, urine, and feces [30] .


Five days later, clusters of cases, including 15 healthcare workers, were confirmed to have been infected via patients, confirming that SARS-CoV-2 also has human-to-human transmission capability [ Figure 1E ].


How could this likely single source of virus quickly infect so many people in such large geographic area?


Therefore, when bats carrying SARS-COV-2 virus forage at Huanan Seafood Market, they may pass the virus directly or indirectly to intermediate hosts.


A much larger number of coronaviruses have been detected in animals, particularly in bats, but have not been found in humans [2] .


For the above reasons, the bamboo rats carrying SARS-COV-2 virus were transported from the infected place to the incident site in the same way that civets spread SARS-CoV [32] .


This suggests that the way the bats spread the virus is not only via direct contact, but also through feces.


Quarantine of patients (both confirmed and suspected), isolation of susceptible population, and protection of high-risk professions are necessary measures for reducing exposure to the viruses and eliminating the risk of getting infected by the viruses.


However, due to human activities, the virus has expanded its host of infection.


Therefore, there are two possible places for bamboo rat be infected with SARS-COV-2.


If there had been only one case infected by human-to-human transmission among the first 41 identified cases by that date, it implies R 0 was 0.02 (i.e.


The isolated virus showed more than 95% genome sequence identity with human and civet SARS-CoVs.


Subsequent case investigations also showed that SARS-CoV had the capability to multiply and continuously undergo human-to-human transmission [ Figure 1D ]; at least four generations of cases were identified from one original patient.


Prior to December 2019 when clusters of pneumonia cases with unknown aetiology were detected in Wuhan, China, only two additional strains of coronaviruses had caused outbreaks of severe acute respiratory disease around the world [3] .


On 31 December 2019, local hospitals in Wuhan, China reported that they had detected a series of cases of Novel Coronavirus-infected pneumonia to the World Health Organization (WHO) [1] .


Although the origins and the occurrences of SARS-CoV-2 are both unclear, the control measures for the current epidemic should focus on immediate cut-off of transmission of the disease and through disinfection of infected locations.


Question 8: 

In [145]:
top_n_similar_sentences_glove_cosine(8,30)

QUESTION #8: Implementation of diagnostics and products to improve clinical processes




['There is a need for availability of data and information on humanvector-pathogen-ecosystem interfaces, drugs and vaccines development as well as diagnostics techniques and tools from preclinical to clinical levels.',
 'Improving the optimization and implementation of protocols suitable for clinical samples will no doubt improve microbial diagnosis in clinical practice.',
 'These techniques are expected to greatly accelerate the identification of specific phage binders, facilitating mAb development for use in research, clinical diagnostics, and pharmaceuticals for the treatment of human disease.',
 'Integration of knowledge about microbiology and immunology, establishment of efficient vaccine development strategies, and streamlining of regulatory approval processes may facilitate this trend.',
 'Tools from information technology and progress in microbiology will reduce diagnostic uncertainty and improve antimicrobial dosing, selection, and treatment duration.',
 'There were 8 key elem

In [168]:
top_n_similar_sentences_LDA_glove_cosine(8, 5, 30)

QUESTION #8: Implementation of diagnostics and products to improve clinical processes


13 Incremental improvements in rapid sample preparation techniques, chromatography, and data processing have also contributed to the increasing use of LCMS-based clinical testing.


Ongoing SARS-CoV-2 sequencing is key to developing and monitoring diagnostics and similar surveillance tools.


The subfractionation of existing heparin preparations against anticoagulant activities (with proven low-toxicity profiles, good bioavailability and industrial-scale manufacturing) for off-label pathologies, provides an attractive strategy for quickly and effectively responding to COVID-19 and for the development of next generation heparin-based therapeutics.


Today, protein array and antibody-based methods are falling out of favor in both research and clinical diagnostics, due in large part to the improvements in LCMS technology.


Although much simpler and faster than spin column-based methods, most MNPs-based extraction strategies still contain multiple processing steps such as lysis, binding, washing and elution, which increases operational difficulties in real clinical diagnosis.


Because no centrifugation steps are required, MNPs-based methods allow fully automated nucleic acid purification, which is highly important in current SARS-CoV-2 diagnosis.


In conclusion, due to its simplicity, robustness, and excellent performances, our pcMNPs-based method may provide a promising alternative to solve the laborious and time-consuming viral RNA extraction operations, and thus exhibits a great potential in the high throughput SARS-CoV-2 molecular diagnosis.


Given recent studies concerning high variability in antibody production, LCMS-based methods are an attractive alternative approach for the rapid identification of small molecules, proteins, and peptides in clinical settings where consistency is paramount.


Here, we describe preliminary tests for the ability of the SARS-CoV-2 S1 RBD to bind heparin, an important prerequisite for the underpinning research related to the development of SARS-CoV-2 heparin-based therapeutic.


We have been developing algorithms and machine learning models for rapidly designing nucleic acid detection assays, linked in a system called ADAPT (manuscript in preparation).


Efficient and robust nucleic acids extraction from complex clinical samples is the first and the most important step for subsequent molecular diagnosis, but currently it is still highly labour intensive and time-consuming.


The prominent proteomic technical journals, the Journal of Proteome Research and Molecular and Cellular Proteomics, strictly require that all unprocessed instrument files and processed results are made publicly available through these services.


[9] [10] While RT-PCR-based methods have been widely used in COVID-19 diagnosis, their application in accurate diagnosis of viral infection and epidemic control is severely hampered by their laborious and time-consuming sample processing steps.


11, 12 A review of this growth by Grebe and Singh described a clinical lab with no LCMS systems in 1998 that completed over 2 million individual LCMS clinical assays in 2010.


Our work demonstrates not only the feasibility of this approach, but also its ability to rapidly develop methods even in the face of limitation of access to sample experimental data.


Traditional drug development processes are slow and ineffective against emerging public health threats such as the current SARS-CoV-2 coronavirus outbreak which makes the repurposing of existing drugs a timely and attractive alternative.


Overall, the updated MB-based extraction method had highly extraction efficiency and compatibility of PCR amplification in any of the patterns, which dramatically simplified laborious sample processing work and was ideally suitable for RT-PCR assay of SARS-CoV-2 with a sensitivity of 10 copies at least.


Thus, fast, convenient and automated nucleic acids extraction methods are highly desirable not just in the molecular diagnosis of SARS-CoV-2, but also in the monitoring and prevention of other infectious diseases.


New tools that employ deep learning algorithms have been demonstrated to produce theoretical MS2 spectra superior to previous prediction models and, in the absence of true experimental data, are the best resources currently available.


First, high case counts overwhelm diagnostic testing capacity, underscoring the need for a rapid pipeline for sample processing [5, 6] .


https://doi.org/10.1101/2020.02.22.961268 doi: bioRxiv preprint sequences of SARS-CoV2 have been fully revealed and various RT-PCR-based detection kits have been developed, timely diagnosis of COVID-19 is still highly challenging partially due to the lack of satisfactory viral RNA extraction strategy.


We have not yet experimentally tested most of these designs, instead focusing our efforts so far on extensively testing a point-of-care assay for SARS-CoV-2 using the Cas13-based SHERLOCK technology [6, 8, 9] .


• Predicted sensitivity : Assays are predicted by our machine learning model to have high detection activity against the full scope of targeted genomic diversity (here, based on Lwa Cas13a activity only).


We and others, relying on this data [14] , have shown that it is possible to rapidly design CRISPR-based tools for detection and surveillance during an outbreak.


The Prosit spectral libraries (Supplemental Material) enable the interrogation of DIA data and may be used for DDA experiments that employ tools such as the MSPepSearch (NIST).


Our in silico developed materials facilitate both global and targeted analysis by providing all necessary materials for both DDA and DIA investigation of these materials through the production of FASTA databases, spectral libraries and a list of predicted PTMs investigators should consider when searching with historic peptide search engines.


In vitro validation of this method is required and outside the scope of this paper given our lack of access to such samples.


Therefore, it is important to characterize these other pathogens, for both patient diagnostics and outbreak response.


Therefore, the feasibility of automating viral RNA extraction procedure based on our pcMNPs was subsequently evaluated.


Such a lack of understanding limits effective risk assessment, prevention and control of COVID-19 disease outbreaks.


Question 9: 

In [147]:
top_n_similar_sentences_glove_cosine(9,30)

QUESTION #9: Disease models, including animal models for infection, disease and transmission




['More recently it has been shown that novel combinations of alleles have also resulted in new models for human disease such a spontaneous colitis [67] , and that F1 hybrids of CC mice were used to create an improved mouse model for Ebola virus disease [68] including hemorrhagic signs of disease previously not observed in a small animal model.',
 'The ferret is the most suitable disease model for human influenza infection as it displays very human-like disease [20] [21] [22] .',
 'The MA15 virus will enhance the use of the mouse model for SARS because infection with this virus in mice reproduces many aspects of severe human disease, including morbidity, mortality, and pulmonary pathology.',
 'Ferrets are commonly used as experimental models of infection for a variety of respiratory viruses due to their susceptibility to these viruses and the close resemblance of the pathological features to those found in human infections [11, 12] , including the development of severe respiratory and n

In [169]:
top_n_similar_sentences_LDA_glove_cosine(9, 5, 30)

QUESTION #9: Disease models, including animal models for infection, disease and transmission


The second component, f late , models HIV transmission during the asymptomatic stage and the disease stage (after progression to Acquired Immune Deficiency Syndrome (AIDS)).


Furin, along with other PCSK family members, is widely implicated in immune regulation, cancer and the entry, maturation or release of a broad array of evolutionarily diverse viruses including human papillomavirus (HPV), influenza (IAV), Ebola (EboV), dengue (DenV) and human immunodeficiency virus (HIV).


It has been noted that the two previously known human coronaviruses causing epidemic disease and spread, SARS-CoV and MERS-CoV, had a relatively low rate of spread from an individual infected patient (an index referred to as its basic reproductive number-R。).


Coronaviruses have in the past been known to be the etiologic agents of mild upper respiratory infections in humans, similar to the ubiquitous and relatively benign "common cold"-type upper respiratory illnesses induced by the human rhinoviruses in adults and children.


Observed characteristics of the outbreak led us to believe that the cluster of cases was due to "Disease X" (i.e., an infectious disease of previously unknown viral etiology).


In our HIV example, both the HIV epidemic and the test-and-treat intervention can be best characterized using speed.


We model the infection kernel of the HIV as a sum of two gamma distributions:
 The first component, f early (τ ), models early HIV transmission during the acute infection stage.


The outbreaks of two previous coronaviruses, SARS-CoV and MERS-CoV in 2003 and 2012, respectively, have approved the transmission from animal to animal, and human to human [4] .


In this instance, atypical pneumonia combined with reduced white blood cell counts and the lack of response to antibiotics indicated that the pathogen was consistent with viral rather than bacterial infection.


The probability that the outbreak is due to an unknown pathogen (Disease X) increases as more information becomes available, for two reasons: (i) the current outbreak can be seen to exhibit characteristics that are not similar to those observed in previous outbreaks, and; (ii) previously observed pathogens are ruled out by laboratory test results.


In our model, disease incidence at time t is given by:
 Here, K(τ, t) is the infection kernel describing how infectious we expect an individual infected τ time units ago to be in the population.


For instance, most cases shared a history of visiting or working at a seafood market in Wuhan [3] , where exposure to the novel coronavirus is suspected to have occurred with no evidence of direct human-to-human transmission [2] , although human-to-human transmission was found later to be common.


Because the only information on 30 December 2019 was that cases had symptoms of atypical pneumonia, the distances between the ongoing outbreak and the eleven known pathogens were all zero; thus, all eleven candidate pathogens initially showed an identical probability of 8.3% (i.e., 1/12, when the possibility of Disease X is accounted for).


It is now hypothesized that one of the reservoir coronavirus species in bats crossed the species barrier to an intermediate mammal host (presumed to be a masked civet) sold at the wet market at the epicenter of the current epidemic, with subsequent mutation and transmission to humans, initiating the present epidemic of COVID-19.


In particular, we provided an alternative explanation for the result of Eaton and Hallett (2014) who used detailed mathematical modeling of HIV transmission to show that the amount of early transmission does not affect the effectiveness of the ART: we can control an outbreak if we can identify infected individuals and enroll them on ART faster than the observed rate at which new cases are generated, which does not depend on the estimates of the amount of early transmission.


Now imagine an intervention that reduces transmission at a constant hazard rate φ across the disease generation ( Fig.


In particular, we study how the amount of early HIV transmission affects estimates of intervention effectiveness.


Next, we consider a "test-and-treat" strategy in which infected individuals are identified, linked to care and receive antiretroviral therapy (ART) with the goal of both preserving health and preventing transmission through viral suppression.


As the outbreak unfolded, we calculated in real-time the probability that the pathogen responsible for the atypical pneumonia cases was novel (Disease X), as opposed to the outbreak instead being generated by a previously known pathogen that can cause atypical pneumonia.


For example, in the classic case of vaccination to eliminate a previously established childhood disease, both disease spread and intervention can be clearly characterized using strength (Anderson and May, 1985) .


Despite the future improvements to our statistical modelling framework that are required, including the need to test our approach using data from outbreaks of previously known pathogens, this short study demonstrated clearly that the ongoing outbreak is consistent with causation by a novel pathogen, "Disease X".


We suggest that infectious disease modelers should be aware of the complementarity of these two frameworks when analyzing disease outbreaks.


Six coronavirus species had, prior to the 08 th January 2020, been known to cause disease in humans.


An impressive series of rapid virological examinations ruled out common pneumonia-causing viruses such as influenza viruses, adenoviruses, and the coronaviruses associated with Middle East respiratory syndrome (MERS) and severe acute respiratory syndrome (SARS) [2] [3] [4] [5] .


Subsequent to the severe acute respiratory syndrome (SARS) outbreak in China 2003, and the Middle East respiratory syndrome (MERS) outbreak in the Middle East in 2012, global concerns regarding the pathogenicity and epidemic/ pandemic potential of novel human coronaviruses began to emerge, with some experts predicting that novel coronaviruses could likely again cross the species barrier and present humans with future pandemic-potential infections [1] .


Although virological investigation is the gold standard for pathogen identification, and the virus has now been confirmed to be a novel coronavirus that is a relative of SARS, laboratory-based outcomes can only be obtained after successfully sequencing the novel virus, which can sometimes be a lengthy process.


SARS-CoV-2 is the seventh coronavirus species that is now known to infect humans, is also zoonotic in origin, and is the causative organism for the current viral pneumonia epidemic in China.


A significantly large variety of coronavirus species cause a diverse range of diseases in domesticated and wild mammals and birds, and these animals may also be carriers of and reservoirs for coronaviruses [2] .


Four species are endemic in human populations, and cause mild common cold symptoms in immunocompetent humans.


The two remaining species, SARS-CoV and MERS-CoV, are zoonotic in origin, and their infection of humans may have fatal outcomes.


Question 10: 

In [149]:
top_n_similar_sentences_glove_cosine(10,30)

QUESTION #10: Tools and studies to monitor phenotypic change and potential adaptation of the virus




['We demonstrate viral genetic and phenotypic differences in viruses from West Africa, which may be relevant to differences in zoonotic potential, highlighting the need for studies of MERS-CoV at the animal-human interface.',
 'New animal models that recapitulate features of human lung disease allow preclinical studies to refine delivery protocols and measure novel metrics of phenotypic correction.',
 'Theoretically, a replicating RNA virus expresses a range of genetic and phenotypic variants and has the potential to generate novel virions, which may be selected in response to environmental pressures.',
 'To assess the risk and potential for human pathogenicity of novel bat viruses, it is crucial to establish valid research tools for comparative in vitro infection modeling.',
 'Future experimental studies on these reassortant viruses, that assess viral transmissibility between species, together with epidemiological studies, such as viral monitoring within Indonesian animal populations 

In [170]:
top_n_similar_sentences_LDA_glove_cosine(10, 5, 30)

QUESTION #10: Tools and studies to monitor phenotypic change and potential adaptation of the virus


Importantly, 149
 Muc4 has been detected in synovial sarcomas in humans, thus presenting a novel tissue Screening genetically diverse mouse models provides an opportunity to identify natural 163 variation in novel factors which drive viral disease responses.


It could facilitate identification of human virus receptors in either computational or experimental studies.


developed an in silico computational framework (P-HIPSTer) that employs structural information to predict more than 280,000 PPIs between 1,001 human-infecting viruses and humans, and made a series of new findings about human-virus interactions (Lasso et al., 2019) .


Daily results for all pathology-related data types (histopathology, weight 329 change, and footpad swelling) was analyzed by three-way ANOVA to determine the impact of 330 mouse strain, sex, and infection status.


In this study, we used TWIRLS, a machine-based approach to collect, summarize, and analyze about 15,000 biomedical articles related to coronavirus, with the aim to elucidate the mechanisms underlying coronavirus-induced host pathological changes.


Our initial hypothesis for Muc4 to play a 176 role in controlling virus replication proved incorrect, or, at the very least, substantially more 177 complex than a simple, direct correlate; however, our exploration found a disease-interaction 178 that played a role in pathogenesis across multiple viruses.


Therefore, TWIRLS clusters CSSEs according to the rules defined by CSHG distribution, as genetic level research can accurately answer and solve physiological and pathological problems.


These studies can also provide 164 therapeutic, prophylactic and molecular insights into emerging pathogens, which are difficult to 165 study during the context of an outbreak.


Here, prior phenotypic QTL analysis, bioinformatics, 166
 and RNA expression analysis were leveraged to identify Muc4 as a high priority candidate gene 167 driving differences in SARS-CoV titer.


By combining this system with generalized interaction databases, we can reveal further associations that can provide a deeper understanding of the biological mechanisms of the disease phenotype caused by virus-host interactions.


Therefore, genetic analysis of expression quantitative trait loci (eQTLs) 8 and potential functional coding variants in ACE2 among populations are required for further epidemiological investigations of 2019-nCoV/SARS-CoV-2 spreading in East Asian (EAS) and other populations.


Combining the RF model with the model of PPI predictions such as Lasso's work can help identify virus-receptor interactions.


We found that these two genes are assigned to the C5 category, which has a corresponding HR label of "Spike protein (S) of coronavirus", suggesting that TWIRLS can automatically provide an interface to summarize human findings and help human experts quickly understand the research directions and necessary knowledge in this field.


These studies could facilitate identification of human virus receptors.


Subsequently, these linker genes may contain information on the biological mechanisms that may be important for understanding the disease.


Here, we developed a computational model to predict the receptorome of the human-infecting virome based on the features of human virus receptors and protein sequences.


To systematically investigate the candidate functional coding variants in ACE2 and the allele frequency (AF) differences between populations, we analyzed all the 1700 variants (Supplementary Table S1) in ACE2 gene region from the ChinaMAP (China Metabolic Analytics Project, under reviewing) and 1KGP (1000 Genomes Project) 9 databases.


This study provides a biological background for the epidemic investigation of the 2019-nCov infection disease, and could be informative for future anti-ACE2 therapeutic strategy development.


The development of computational methods for identifying the human virus receptors is in great need.


Therefore, TWIRLS can be used to guide human researchers by providing further potential therapeutic target information for the treatment of acute viral lung injury based on the regulation of RAS.


By establishing a random distribution of one of the candidate genes in a control sample, the significance of this gene appearing in the local samples can be determined when the frequency of the current gene is an outlier of the random distribution of the control samples (see Methods for details).


The sooner this information is added to the current clinical knowledge of these viruses, the better the control and treatment of this disease.


Transformed daily 327
 Buxco results were similarly analyzed by two-way ANOVA to determine the impact of infection 328 and mouse strain.


Currently, several experimental approaches have been developed in identifying virus receptors.


The study would greatly facilitate identification of human virus receptors.


Firstly, the number of human virus receptor proteins was much less than that of human membrane proteins in the modeling, which may hinder accurate modeling.


Largescale and multiple tissue-level analysis of single-cell RNAseq would be more accurate for the expression analysis of ACE2 in different populations.


The genes that serve as linkers are potential targets for gain-and loss-of-function experiments to identify those systems described by the meaningful entities in these categories.


In contrast, the ACE2 expression analysis using the RNA-seq and microarray datasets from control lung tissues indicated there were no significant differences between Asian and Caucasian, or male and female 11 .


Combination of the model with Lasso's work further predicted receptors for 693 human-infecting viruses.


Question 11: 

In [151]:
top_n_similar_sentences_glove_cosine(11,30)

QUESTION #11: Immune response and immunity




['Immune response.',
 'Immune Response.',
 'They confer protection mainly through humoral immune responses with little or no cellular immunity.',
 'Protection against IAV infection is also provided by the humoral immune response.',
 'Activation of innate immunity in response to IFV regulates anti-viral protective immunity 31 .',
 'Activation of innate immunity in response to IFV regulates anti-viral protective immunity 26 .',
 'Apart from the humoral immune response, cell-mediated immune response is also induced after mucosal vaccination.',
 'As the antigens are expressed intracellularly, both humoral and cell-mediated immunity can be activated to offer broad immune protection.',
 'Regardless of any humoral immune response elicited in these animals it is unlikely that antibody alone confers protection.',
 'Evidence indicates that the maternal immune system may tolerate fetal antigens by suppressing cell-mediated immunity while retaining normal humoral immunity.',
 'This result suggests

In [171]:
top_n_similar_sentences_LDA_glove_cosine(11, 5, 30)

QUESTION #11: Immune response and immunity


We saw what appears to be an innate immune response at the 10,000 PFU EBOV exposure level.


It has been suggested that EBOV can mediate an innate immunity response through stimulation of TLR-4 [28] .


Hence, the S2 subunit may serve as an important antigen for inducing 2 0 2 both humoral as well as cell-mediated immunity against SARS-CoV and SARS-CoV-2.


We chose Muc4 192 as our priority candidate gene for follow up, with the initial hypothesis that Muc4 suppressed 193 apoptosis and possible the interferon response, and that its absence in a Muc4 -/mouse would 194 therefore lead to increased apoptosis and inflammation, thereby inhibiting Our validation utilized a Muc4 -/mouse and confirmed a role for Muc4 in protection from 197 SARS-CoV-and CHIKV-induced disease and pathogenesis.


Other vaccine studies have Dosing 
 Most preventative vaccines are designed to elicit a humoral immune response, typically via the administration of whole protein from a pathogen.


The neutralization mechanism of these non-blocking antibodies is not clear yet.


Interaction of EBOV specific antibody, NHP lung tissue and EBOV delivered to NHPs via aerosol can produce a more lethal effect than in NHPs without 210 circulating anti-EBOV antibody exposed to aerosolized EBOV (unpublished conference presentation).


Other potential 72 targets included Lrrc33, Sec22a, Parp14, and Ildr1; however, these genes either have little 73 known linkage to viral replication and the immune response (e.g., Lrrc33, Sec22a, and Ildr1) or 74 the directionality of the correlation did not support their relationship to viral titer in the lung 75 (Parp14).


Developing potent and cross-protective therapeutic antibodies and vaccines is possible but could be challenging.


The neutralization mechanism for these non-blocking but cross-reactive antibodies is likely unrelated to ACE2 blockage.


However, coronavirus is a single-stranded RNA virus prone to rapid mutations during transmission, nAbs without cross-reactivity to a broad spectrum of viral mutants could lead to treatment failure 10, [27] [28] [29] , therefore, highly potent and cross-protective nAbs and prophylactic vaccines against SARS-CoV-2 are in urgent needs.


In contrast,
 a T-cell vaccine is meant to elicit a cellular immune response directing CD8+ NP44-52 is located within one of the EBOV nucleocapsid proteins considered 8 author/funder.


ELISPOT analysis of PBMCs taken from the peripheral blood of COVID-19 controllers and progres-285 sors to assess the presence of a differential response to the 53 peptides could lead to a broadly applicable protective CTL vaccine against COVID-19 by incorporating peptides into the vaccine that are more commonly targeted for CD8+ attack by the controllers versus the progressors.


This suggests that a CTL vaccine may be more effective for prophylaxis against filovirus protection than an antibody vaccine if the anticipated route of EBOV exposure is via aerosol.


Whether these non-blocking RBD antibodies also interfere with S2 protein's configuration change remains to be investigated.


Due to the limited number of antibodies obtained, it is difficult to conclude whether there is a consistent pattern between ACE2 blockage and neutralizing potency, but one can speculate that neutralizing antibodies targeting conserved epitopes outside the RBM region may be cross-protective but may also be less potent due to the lack of ACE2 blocking activities.


Mechanism of neutralization for the non-blocking RBD antibodies remains to be investigated.


Virus-specific nucleotidepositive and viral-protein seroconversion was observed in all patients tested and provides evidence of an association between the disease and the presence of this virus.


Although peptide vaccines are by their nature HLA restricted, it may be possible to create a CTL vaccine directed against EBOV for use alone or in conjunction with a whole protein vaccine to produce an antibody response in tandem, by incorporating additional Class I peptides from epitopes targeted by controllers to broaden the HLA cov- in-vitro testing is also possible.


The above experimental results are consistent with our structure modeling analysis, which indicates that SARS-CoV-2 virus likely infects human cells through similar mechanisms as SARS-CoV virus by binding to human ACE2 with comparable affinities, and hence may possess similar transmissibility.


IF N −γ release as quantified by ELISPOT after the spleens were harvested on day 14 100 after immunization showed that the immune response to the 9mer NP44-52 was higher than the immune response after vaccination with NP43-53 and that this difference was statistically significant (P < 0.0001) Figure 2 .


Understanding whether antibodies raised from SARS-CoV spike protein immunization have cross-reactivity to the new SARS-CoV-2 will offer important insights and guidance to therapeutic antibody and prophylactic vaccine development.


mouse using an adjuvanted microsphere peptide vaccine formulation containing NP44-52 is enough to confer immunity in mice.


If researchers act now during the COVID-19 outbreak, perhaps controller and progressor blood samples could be collected and prospectively analyzed, quickly creating a database of optimal candidate class I peptides for inclusion into a CTL 295 vaccine with potentially broad HLA coverage for subsequent rapid manufacture and deployment.


In the long term, broad-spectrum antiviral drugs and vaccines should be prepared for emerging infectious diseases that are caused by this cluster of viruses in the future.


Due to the nature of RNA virus' rapid mutation rates, changes in the S-protein's amino acid sequence, especially in the RBD's receptor binding motif, could have significant impact on virus infectivity, pathogenicity, transmissibility, and cross-protection from previous coronavirus infection, as well as therapeutic antibody and prophylactic vaccine development.


Neutralizing antibodies targeting epitopes in these regions could potentially have cross-protective activities against different mutant strains.


Although speculative, it is possible that Muc4 functions not to modulate local viral 244 replication, but rather to limit disseminated disease.


It is known that antibodies targeting the S2 region could block S2 protein's configuration change and hence interfere with virus entry into host cells.


The disease was determined to be caused by virus-induced pneumonia by clinicians according to clinical symptoms and other criteria, including a rise in body temperature, decreases in the number of lymphocytes and white blood cells (although levels of the latter were sometimes normal), new pulmonary infiltrates on chest radiography and no obvious improvement after treatment with antibiotics for three days.


Question 12: 

In [153]:
top_n_similar_sentences_glove_cosine(12,30)

QUESTION #12: Effectiveness of movement control strategies to prevent secondary transmission in health care and community settings




['In resource-limited settings, relative to resource-adequate settings, there continues to be a paucity of data in support of infection prevention and control, and patient safety interventions to ensure that regional, if not national, healthcare systems work effectively to improve infection prevention and control interventions.',
 'Thus, studies to prevent transmission in healthcare setting are critical for the development of control measures.',
 'Hence, our current working toolbox available to control the spread of Ebola still hinges on supportive medical care to increase the survival of those infected and basic non-pharmaceutical public health measures [96] to prevent transmission, namely: 1) infection control measures including standard precautions in health care settings; 2) rapid contact tracing and isolation of infectious individuals; and 3) social distancing interventions in the community which may include the dissemination of awareness campaigns to inform the population on how 

In [172]:
top_n_similar_sentences_LDA_glove_cosine(12, 5, 30)

QUESTION #12: Effectiveness of movement control strategies to prevent secondary transmission in health care and community settings


This type of transmission would make effective contact tracing challenging, and good respiratory and hand hygiene would be crucial to reduce this route of transmission, coupled with environmental decontamination in health-care settings.


If COVID-19 can be controlled by isolation and contact tracing, then public health efforts should be focused on this strategy; however, if this is not enough to control outbreaks, then additional resources might be needed for additional interventions.


International investment needs to be directed especially to countries with limited healthcare and public health surveillance capacity to enable the detection of cases and disease control [16, 17] Our estimation showed a lower risk of transmission in Africa and South America.


In this population-level observational study, we used crowdsourced reports from DXY.cn, a social network for Chinese physicians, health-care professionals, pharmacies, and health-care facilities established in 2000.


Future studies based on larger samples of patients with COVID-19 could explore in more detail the transmission dynamics of the outbreak in different locations, the effectiveness of interventions, and the demographic factors driving transmission.


Line list data can help assess the effectiveness of interventions and the potential for widespread transmission beyond the initial foci of infection.


The authors suggest an ongoing risk-based approach to the prioritisation of and investment by international and national agencies and authorities, in emergency interventions for the prevention of movement of 2019-nCoV (SARS-COV-2) through human travel.


Future research on the transmission characteristics could improve precision on control estimates.


Closure of certain routes, targeted airport screening, risk communication, public awareness and targeted training and vigilance of health workers associated with the portals of entry of visitors to their countries will help mitigate the force of further spread of 2019-nCoV.


We assessed the ability of isolation and contact tracing to control disease outbreaks in areas without widespread transmission using a mathematical model.


Notably, DXY.cn does not generate data outside of traditional surveillance systems but rather provides a channel of rapid communication between the public and health authorities.


12 Line lists provide unique information on the delays between symptom onset and detection by the health-care system, reporting delays, and travel histories.


A crowd sourced system would not be expected to catch all cases, especially if many cases are too mild to be captured by the health-care system, digital surveillance, or social media.


Furthermore, because patient data in our dataset were captured by the health system, they are biased towards the more severe spectrum of the disease, especially for patients from mainland China.


The model could be modified to include some transmission after isolation (such as in hospitals), which would decrease the probability of achieving control.


We simplified our model to determine the effect of contact tracing and isolation on the control of outbreaks under different scenarios of transmission; however, as more data becomes available, the model can be updated or tailored to particular public health contexts.


This might be of concern to local authorities for reducing the health-care surges, and might limit geographical spread.


It can be effective but might require intensive public health effort and cooperation to effectively reach and monitor all contacts.


To estimate trends in the strength of case detection and interventions, we analysed delays between symptom onset and visit to a health-care provider, at a hospital or clinic, and from seeking care at a hospital or clinic to reporting, by time period and location.


Our model did not include other control measures that might decrease the reproduction number and therefore also increase the probability of achieving control of an outbreak.


We searched DXY.cn, a Chinese health-care-oriented social network that broadcasts information from local and national health authorities, to reconstruct patient-level information on COVID-19 in China.


The robustness of control measures is likely to be affected both by differences in transmission between countries, but also by the concurrent number of cases that require contact tracing in each scenario.


We determined conditions in which case isolation, contact tracing, and preventing transmission by contacts who are infected would be sufficient to control a new COVID-19 outbreak in the absence of other control measures.


A quick diagnosis that leads to quarantine and integrated interventions will have a major impact on its future trend.


7 These efforts can help generate and disseminate detailed information in the early stages of an outbreak when little other data are available, enabling independent estimation of key parameters that affect interventions.


The main issue with the quality of patient-level data obtained during health emergencies is the potential lack of information from locations overwhelmed by the outbreak (in this case, Hubei province and other provinces with weaker health infrastructures).


We further looked in more detail at the risk to Africa where the health infrastructure would be challenged tracking a new epidemic across its 54 countries.


The aim of the current study was to explore the effect of sustained transmission from the four Chinese cities of Wuhan, Beijing, Shanghai and Guangzhou on international disease importation risk to 168 countries and territories, with a specific focus on Africa where current levels of healthcare infrastructure could provide a significant challenge for managing this novel epidemic.


Hence countries ranked as high risk in our model (4 th and 3 rd quantiles) should take all steps necessary to ensure prompt detection of cases and the capacity to manage these cases to prevent ongoing spread.


31 Values of the natural history represent the current best understanding of COVID-19 transmission, and we used 20 index cases and a short delay to isolation to represent a relatively large influx into a setting of high awareness of possible infection.


Question 13: 

In [155]:
top_n_similar_sentences_glove_cosine(13,30)

QUESTION #13: Effectiveness of personal protective equipment (PPE) and its usefulness to reduce risk of transmission in health care and community settings, mask, google, gloves




['Effective use of personal protective equipment (PPE) is essential to protect personnel and patients in healthcare settings [1] .',
 'Effective use of personal protective equipment (PPE) by healthcare workers (HCWs) is an important component of infection prevention in healthcare settings.',
 'Third, healthcare providers need to be better trained to maximize the effectiveness of infection control measures, including use of masks, respirators, and other personal protective equipment.',
 'A key aspect of patient isolation is proper use of personal protective equipment (PPE) to protect HCWs from pathogen exposure during patient care.',
 'have shown that in health care settings, the use of masks could reduce the transmission of infl uenza [23] .',
 'Other factors that affect the use of personal protective equipment, such as staff and management attitudes about the value of respirator use, fatigue and the availability of replacement masks, also need to be considered.',
 'Triage, isolation a

In [173]:
top_n_similar_sentences_LDA_glove_cosine(13, 5, 30)

QUESTION #13: Effectiveness of personal protective equipment (PPE) and its usefulness to reduce risk of transmission in health care and community settings


However, there are still many problems with the current psychological interventions, including effective utilization of Internet resources/tools, and efficient cooperation between medical staffs and psychologists.


Whilst strong epidemiology and surveillance systems are indispensable tools for the detection and monitoring of outbreaks and public health emergencies, strong primary care systems form the foundation of any emergency response.


Decent access to primary health care is essential in health emergencies, and its infrastructure crucial for containment, 10 just as good access to high-quality primary care is at the foundation of any strong health system.


As the 'front door' of the health system, primary care professionals should be involved in planning and action for health emergency risk management.


5) is an effective method for understanding and managing psychological impacts among medical staffs, including managing the full risk and resilience in the responder "hazard specific" stress.


9 Strong health systems built on comprehensive primary care are able to integrate both functions, disseminating the emergency response resources and information required to community-level staff who have the breadth of training required to manage new suspected cases alongside routine family medicine.


In the UK, primary care handles over 95% of all health system activity.


To guarantee their continued effective work, their mental health status should be monitored and a continuum of timely interventions should be made available to support them.


[3] [4] [5] Sudden outbreaks of public health events always pose huge challenges to the mental health service system.


8 suggested three important factors: 1) multidisciplinary mental health teams (psychiatrists, psychiatric nurses, clinical psychologists, and other mental health professionals), 2) clear communication with regular and accurate updates about the COVID-19 outbreak, and 3) establishment of secure services to provide psychological counseling (e.g., electronic devices and applications).


11 With the support for remote psychological intervention provided by the development of Internet technology, especially the widespread application of 4G or 5G networks and smartphones, we developed a new intervention model to handle the present COVID-19 public health event.


WHO member states have repeatedly affirmed their commitment to developing their primary care systems with a view to training up community-based health professionals who are able to provide care across the spectrum of prevention, preparedness, response, and recovery.


8 could be a good reference for providing mental health care in response to the COVID-19 outbreak in Korea.


4 The resources needed to do this effectively may not be routinely available, however, particularly in low-resource settings.


7 The absence of mental health and psychosocial support systems and the lack of well-trained psychiatrists and/or psychologists in these regions increased the risks of psychological distress and progression to psychopathology.


Responding to such concerns is not usually part of public health approaches to epidemic communications, which emphasise biomedical and epidemiological information.


Tracking heroisation and blame dynamics in realtime, as epidemics unfold, can help health authorities to understand public attitudes both to the threats posed by epidemics and the hope offered by health interventions, to fine-tune targeted health communication strategies accordingly, to identify and amplify local and international heroes, to identify and counter attempts to blame, scapegoat, and spread misinformation, and to improve crisis management practices for the future.


Heroes can include, for example, whistle-blowers (who put their careers on the line to alert the public) and health workers (who generate essential information while doing their work).


Follow-up is performed regardless of whether the individual reports mental health problems or not.


WONCA (the global professional body for family medicine) has actively championed the ways in which primary care can be supported to deliver care during population emergencies.


But trust is a crucial support to public health systems.


It is also necessary that the detailed mental health problems of the confirmed or suspected COV-ID-19 patients, healthcare workers treating the infected patients, and public should be collected to update the information regarding the distress caused by infectious disease outbreaks and provide more advancedmental health care for COVID-19 pneumonia.


15 In the APD process, medical staffs receive a pre-event stress training focusing on the psychosocial impact of high-casualty events on the hospital and field disaster settings.


Medical staffs working for the quarantined are the special group who need a lot of social support, and they are also an important force to provide social support for the isolated patients.


To improve timely access to data in the context of the COVID-19 emergency the Bulletin of the World Health Organization will implement an "COVID-19 Open" data sharing and reporting protocol, which will apply during the current COVID-19 emergency.


National primary care bodies can coordinate with public health leads to cascade information to practitioners, communicate with the public, and collate health intelligence from the frontline primary care.


Psychological assistance (such as hotline, online consulting) is used to identify and help the target groups who need intervention.


12 Specifically, psychological crisis interventions should be integrated into the treatment of pneumonia and blocking of the transmission routes.


Moreover, mental health care services for the COVID-19 outbreak are being provided by national hospitals and community mental health centers throughout the country.


WHO has asked all countries to prepare for cases including through surveillance, tracing, treatment and isolation practices, and by sharing data.


Question 14: 

In [157]:
top_n_similar_sentences_glove_cosine(14,30)

QUESTION #14: Role of the environment in transmission




['Uncov- ering the basic mechanisms governing complex human behaviors in resource-poor urban environments is paramount for developing better infrastructure, fostering local economic development and responding to the emergence, transmission and propagation of infectious disease threats.',
 'Thus, studies to prevent transmission in healthcare setting are critical for the development of control measures.',
 'Therefore, improved understanding of the potential wildlife host involvement in the transmission of emerging flaviviruses is essential.',
 'Economic behavior is known to play a key role in disease transmission.',
 'Static and dynamic interactions can play a role in disease transmission [3, 4] .',
 'Human, animal, and vector interactions play a major role in disease transmission and form a dynamic transmission cycle.',
 'These events underscore the potential for aerosol transmission in non-health care settings and the dramatic role such transmission can play in the global transmission 

In [174]:
top_n_similar_sentences_LDA_glove_cosine(14, 5, 30)

QUESTION #14: Role of the environment in transmission


This review does not aspire to cover the large subject of human-to-human transmission control, but also here a mixture of measures is important from strictly medical (transmission routes, efficiency of PPE, vaccines, antivirals and so on) to more social science-oriented (How do people behave when they suspect they could be infected?


There is an urgent need for the implementation of multidisciplinary One Health to address the current complex health challenges at the human-animal-environment interface [42, 43] One Health approaches in China have recently been described [49] [50] [51] [52] .


[3] Studies also showed that the virus has strong human-to-human transmission capability.


The implementation and development of One Health collaborations on a global scale are critical to reduce the threats of emerging viruses [42, 43] .


Fuelled by the availability of new research technologies, as well as changing disease, cost and other pressing issues of our time, further growth in this exciting space will undoubtedly continue.


2 These frame Australia's Medical Research and Innovation Priorities, which include antimicrobial resistance, global health and health security, drug repurposing and translational research infrastructure, 15 capturing many of the key elements of this CTI Special Feature.


It is therefore an urgent priority for local and international health and wildlife regulatory authorities to structure and implement robust control mechanisms that effectively reduce human exposure to wild game meat and their products.


We must also take full advantage of existing knowledge and experience to improve the diagnosis, treatment, prevention, and control of the disease and accelerate the development of drugs and vaccines to save lives.


Multidisciplinary research in biomedical, social, and environmental sciences is required to achieve a deeper understanding of disease transmission and develop more effective systems for emergency response.


iii) Decreasing the human-to-human transmission.


4 I anticipate international efforts in these areas over the coming decade will enable the tapping of useful new biological functions and processes, methods for controlling infection, and the deployment of symbiotic or subclinical viruses in new therapies and biotechnologies that are so crucially needed.


Therefore, it is needless to state that for each phase in the control and mitigation of a novel emerging pathogen, adequate laboratory preparedness and response are crucial.


Managing this requires international cooperation using traditional and proven public health strategies that ultimately succeeded inthe SARS epidemic.


Infections due to SARS-CoV-2 among healthcare workers and family clusters were also reported and human-to-human transmission has been confirmed [37] , however further investigations are required to determine and understand the full extent of this mode of transmission.


Therefore, we maintain that the urgent implementation and monitoring of diagnostic capacities and capabilities for SARS-CoV-2 in Europe is proportional to the containment and expected mitigation phase of the global public health response and is critical for care of local patients.


Nakagawa and colleagues here report on their latest experiments using this system, further improving its performance for use in resource-poor contexts for meningitis diagnoses.


The spread of infectious diseases is affected not only by the biological characteristics of the pathogen but also by various other factors such as politics, culture, economy, and the environment.


There are two main lines of combat against this public health threat: (1) control and prevention of the epidemic and (2) scientific research.


Emerging infections in humans and animals, along with other threats such as antimicrobial resistance, are difficult challenges to humanity, to a large extent driven by increasing food production and other issues related to a growing and more resource-demanding population.


In this Clinical & Translational Immunology Special Feature, I illustrate a strategic vision integrating these themes to create new, effective, economical and robust antiviral therapies and immunotherapies, with both the realities and the opportunities afforded to researchers working in our changing world squarely in mind.


So far, there is no evidence of airborne transmission of the SARS-CoV-2, however precautionary measures are recommended due to the lack of information excluding this mode of transmission.


While the current response strategy is still focused on containment, it is becoming increasingly clear that the epidemic may turn global, in which case mitigation will be the next option to control the impact of the pandemic.


7 They also discuss their recent work revealing how two IFN-cinducible factors exhibit broad-spectrum inhibition of IAV, measles (MV), zika (ZikV) and HIV by suppressing furin activity.


For the effective control of the spread of a newly identified virus, we must first understand its infection and pathogenicity patterns, as quickly and as thoroughly as possible, to provide insights into the outbreak and develop targeted prevention and control strategies.


Each preventable zoonotic outbreak costs the country of origin and the world vast amounts of money and resources, and an inestimable cost in human lives, and if emerging zoonotic outbreaks can be prevented by severely limiting human exposure to wild animals and their trade, then effective measures to ensure that this occurs should be implemented by regulatory government authorities globally as soon as it is practicable.


Scientific research is of vital importance for tackling emerging infectious diseases and developing effective intervention methods.


Ominously, a further mathematical model, proposed by Tang et al, [9] suggests that the basic reproductive number for SARS-CoV-2 might be as high as 6.47. outbreak to the global economy will, without doubt, be scrupulously studied after the present outbreak ends, and the global economic costs will be immense, and the human cost, agonizing.


6 These RBPs include tristetraprolin and AUF1, which promote degradation of AU-rich element (ARE)-containing mRNA; members of the Roquin and Regnase families, which respectively promote or effect degradation of mRNAs harbouring stem-loop structures; and the increasingly apparent role of the RNA methylation machinery in controlling inflammatory mRNA stability.


By expanding and cooperatively leveraging our respective research strengths, our efforts may yet solve the many pressing disease, cost and other sustainability issues of our time.


One of the cornerstones and a prerequisite for a proper public health and clinical response is the availability of a reliable diagnostic and reference laboratory service with adequate capacity.
