# TopicAnalyser

For those curious about data science, Jupyter notebooks are an essential tool for experimentation: you can quickly try different AI/ML models on different data. Feel free to experiment here with other topic-analysis methods or with other datasets. Once an experiment works, consider moving the code into TopicAnalyser.py and exposing it through the API in main.py.

```python
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
from sklearn.datasets import fetch_20newsgroups
from sklearn.decomposition import NMF, LatentDirichletAllocation


def display_topics(model, feature_names, no_top_words):
    """Print the no_top_words highest-weighted terms of each topic."""
    for topic_idx, topic in enumerate(model.components_):
        print("Topic %d:" % topic_idx)
        # argsort is ascending; [:-no_top_words - 1:-1] walks back from the end,
        # giving the largest weights in descending order
        print(" ".join([feature_names[i]
                        for i in topic.argsort()[:-no_top_words - 1:-1]]))

# Alternative corpus: the 20 Newsgroups dataset
# dataset = fetch_20newsgroups(shuffle=True, random_state=1, remove=('headers', 'footers', 'quotes'))
# documents = dataset.data

# Each line of the article is treated as one document
with open('../frontend_notebook/articles/article1.txt') as f:
    documents = f.readlines()

no_features = 1000

# NMF can use tf-idf weights
tfidf_vectorizer = TfidfVectorizer(max_df=0.95, min_df=2, max_features=no_features, stop_words='english')
tfidf = tfidf_vectorizer.fit_transform(documents)
# get_feature_names() was removed in scikit-learn 1.2; get_feature_names_out() replaces it
tfidf_feature_names = tfidf_vectorizer.get_feature_names_out()

# LDA uses raw term counts because it is a probabilistic graphical model
# tf_vectorizer = CountVectorizer(max_df=0.95, min_df=2, max_features=no_features, stop_words='english')
# tf = tf_vectorizer.fit_transform(documents)
# tf_feature_names = tf_vectorizer.get_feature_names_out()

no_topics = 20

# Run NMF. The old `alpha` argument was removed in scikit-learn 1.2. The new
# alpha_W / alpha_H penalties are multiplied by n_features and n_samples, so
# dividing by them reproduces the old alpha=.1 penalty on both W and H.
n_samples, n_features = tfidf.shape
nmf = NMF(n_components=no_topics, random_state=1,
          alpha_W=.1 / n_features, alpha_H=.1 / n_samples,
          l1_ratio=.5, init='nndsvd').fit(tfidf)

# Run LDA
# lda = LatentDirichletAllocation(n_components=no_topics, max_iter=5, learning_method='online', learning_offset=50., random_state=0).fit(tf)

no_top_words = 10
display_topics(nmf, tfidf_feature_names, no_top_words)
# display_topics(lda, tf_feature_names, no_top_words)
```

```
Topic 0:
reduced significantly overhaul strategic capital db strength leverage market exposure
Topic 1:
bank ratings deutsche upgrade moody placed upgrades outlook performance agency
Topic 2:
revenue businesses bank market segments global attrition areas good added
Topic 3:
upgrade key finance credit db debt deutsche diversified earnings equity
Topic 4:
cost sustained db achieve leverage restructuring operating equity attrition billion
Topic 5:
risk db billion unsecured half debt diversified asset segments exposure
Topic 6:
progress deutsche bank new activities transforming officer added ago chief
Topic 7:
business cost half transforming chief model officer sustainable great profit
Topic 8:
sustainable ahead targets bank finance key deutsche db moody goals
Topic 9:
bank markets base capital db substantial sound profit strengths ago
Topic 10:
revenue core capital db restructuring model strengths ahead growing goals
Topic 11:
capital bank credit good noted rating sound substantial equity
```