<a href="https://colab.research.google.com/github/textspur/testland/blob/main/BIGSSS_BERTopic_Example.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Go to Runtime -> Change runtime type -> set "Hardware accelerator" to "GPU" (if it is not set already).
---



# BERTopic example
BERTopic is a topic modeling technique that leverages transformers and c-TF-IDF to create dense clusters, allowing for easily interpretable topics whilst keeping important words in the topic descriptions. BERTopic supports many variants of topic modeling, including guided, hierarchical, class-based and dynamic topic modeling.
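Under the hood this is a pipeline of roughly four stages: embed the documents, reduce the dimensionality of the embeddings, cluster them, and describe each cluster with c-TF-IDF. By default BERTopic uses a sentence-transformers model, UMAP and HDBSCAN for the first three stages. The toy sketch below swaps all three for scikit-learn stand-ins (TF-IDF vectors, `TruncatedSVD` and `KMeans`) purely so that it runs without downloads; the function name is hypothetical, and the sketch only illustrates the flow of data, not BERTopic's actual components or results.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer


def sketch_topic_pipeline(docs, n_topics=2, n_components=2, seed=0):
    """Hypothetical toy pipeline: embed -> reduce -> cluster -> class documents."""
    # 1) "Embed" documents; TF-IDF stands in for a transformer embedding model
    vectors = TfidfVectorizer(stop_words="english").fit_transform(docs)
    # 2) Reduce dimensionality; TruncatedSVD stands in for UMAP
    reduced = TruncatedSVD(n_components=n_components, random_state=seed).fit_transform(vectors)
    # 3) Cluster the reduced vectors; KMeans stands in for HDBSCAN
    labels = KMeans(n_clusters=n_topics, n_init=10, random_state=seed).fit_predict(reduced)
    # 4) Join the documents of each cluster into one "class document";
    #    c-TF-IDF is then computed on these, one per topic
    class_docs = {
        int(topic): " ".join(doc for doc, label in zip(docs, labels) if label == topic)
        for topic in np.unique(labels)
    }
    return labels, class_docs


toy_docs = [
    "hockey game team", "team won the game", "hockey players game",
    "space shuttle launch", "nasa launch orbit", "shuttle orbit nasa",
]
labels, class_docs = sketch_topic_pipeline(toy_docs)
print(labels)
print(class_docs)
```

Every document ends up in exactly one class document, which is the input the c-TF-IDF step works on.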

More information is available in the [documentation](https://maartengr.github.io/BERTopic/getting_started/quickstart/quickstart.html), on [GitHub](https://github.com/MaartenGr/BERTopic) and in a [preprint](https://arxiv.org/abs/2203.05794).


In [1]:
pip install bertopic



## Data set
20 Newsgroups: the `fetch_20newsgroups` dataset is a collection of roughly 18,000 newsgroup posts from 20 different newsgroups. Headers, footers and quoted replies are stripped when loading.


In [2]:
from sklearn.datasets import fetch_20newsgroups

docs = fetch_20newsgroups(subset='all',  remove=('headers', 'footers', 'quotes'))['data']

In [3]:
print(docs[676])
print("type of object docs: ", type(docs))
# keep the first half
docs = docs[:len(docs)//2]
print("number of documents: ", len(docs))


I've been wondering about this myself.  The house wiring thing is really
hokey.  There is no doubt that high pressure ultrasound is annoying, but to
whom?  Given that these devices have been advertised to be effective against
everything from insects to rodents to nasty dogs, what is to say that my
insect repeller won't just annnoy my dog and give me headaches?  Could there
be that much selectivity in frequencies?  Have there been ANY studies
on the effects of various pressure levels, bands, and sweep patterns on
various life forms?

And how effective could they be?  I certainly would not want to tell anyone
that they are safe from nasty dogs because they were carrying a piezoelectric
buzzer...

type of object docs:  <class 'list'>
number of documents:  9423


## Run BERTopic


By default, BERTopic uses an English embedding model. Use `BERTopic(language="multilingual")` to select a model that supports 50+ languages.

In [4]:
from bertopic import BERTopic

topic_model = BERTopic()
topics, probs = topic_model.fit_transform(docs)


After generating topics and their probabilities, we can list the generated topics together with their sizes:


In [5]:
topic_model.get_topic_info()

,Topic,Count,Name,Representation,Representative_Docs
0,-1,3365,-1_the_to_of_is,"[the, to, of, is, and, in, that, it, for, this]",[----- Begin Included Message -----\n\nThe fol...
1,0,873,0_game_he_team_was,"[game, he, team, was, games, play, 10, players...","[Scoring stats for the Swedish NHL players, Ap..."
2,1,381,1_space_shuttle_launch_nasa,"[space, shuttle, launch, nasa, the, mission, o...",[Archive-name: space/controversy\nLast-modifie...
3,2,276,2_whatta_cheek_ditto_ass,"[whatta, cheek, ditto, ass, hello, why, hi, ea...","[\n\nDitto,, \n\tWhatta ass!!!!!\n\n, \n \n ..."
4,3,272,3_key_clipper_encryption_chip,"[key, clipper, encryption, chip, keys, be, gov...","[April 16, 1993\n\nINITIAL EFF ANALYSIS OF CLI..."
...,...,...,...,...,...
121,120,11,120_tape_tapes_marker_backup,"[tape, tapes, marker, backup, holes, 250mb, re...",[\n\tGreetings. There are 3 types of warnings ...
122,121,10,121_w4wg_mail_lan_workgroups,"[w4wg, mail, lan, workgroups, vax, network, ga...",[This may be a simple question but:\n\nWe have...
123,122,10,122_lock_kryptonite_cable_zipper,"[lock, kryptonite, cable, zipper, locks, bicyc...","[Greetings netters,\n\nSteve writes ... <about..."
124,123,10,123_bmw_bmws_r80_moa,"[bmw, bmws, r80, moa, parts, bike, insurance, ...","[Hi,\n I'm now in the market for buying a BM..."
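Topic -1 in this table collects the outlier documents; here it holds 3,365 of the 9,423 documents, roughly 36%. The outlier share can be computed directly from the `topics` list returned by `fit_transform`. A minimal sketch, assuming `topics` is a plain list of integer topic IDs (the helper name is hypothetical and the example IDs are made up):

```python
from collections import Counter


def outlier_share(topics):
    """Fraction of documents assigned to the outlier topic -1."""
    if len(topics) == 0:
        return 0.0
    return Counter(topics).get(-1, 0) / len(topics)


# Made-up topic IDs: 2 of the 6 documents are outliers
share = outlier_share([-1, 0, 0, 1, -1, 2])
print(share)
```

On the model above, `outlier_share(topics)` would give the same ratio as the -1 row of `get_topic_info()`.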


## Visualisation

In [6]:
topic_model.visualize_barchart()

Topic -1 collects all outlier documents and should typically be ignored. Next, let's take a look at the most frequent topic, topic 0:


In [7]:
topic_model.get_topic(0)


[('game', 0.01237810444676795),
 ('he', 0.009725829504340913),
 ('team', 0.009626583134544285),
 ('was', 0.00741789742754877),
 ('games', 0.007249313095590201),
 ('play', 0.007128070946973859),
 ('10', 0.007078644722751718),
 ('players', 0.0069851367684264755),
 ('the', 0.006753076026264185),
 ('his', 0.0066632457304438624)]

In [8]:
topic_model.get_document_info(docs)

,Document,Topic,Name,Representation,Representative_Docs,Top_n_words,Probability,Representative_document
0,\n\nI am sure some bashers of Pens fans are pr...,0,0_game_he_team_was,"[game, he, team, was, games, play, 10, players...","[Scoring stats for the Swedish NHL players, Ap...",game - he - team - was - games - play - 10 - p...,1.0,False
1,My brother is in the market for a high-perform...,5,5_card_monitor_video_drivers,"[card, monitor, video, drivers, vga, windows, ...",[I have a Radius Precision Color 24x video car...,card - monitor - video - drivers - vga - windo...,1.0,False
2,\n\n\n\n\tFinally you said what you dream abou...,-1,-1_the_to_of_is,"[the, to, of, is, and, in, that, it, for, this]",[----- Begin Included Message -----\n\nThe fol...,the - to - of - is - and - in - that - it - fo...,0.0,False
3,\nThink!\n\nIt's the SCSI card doing the DMA t...,12,12_scsi_scsi2_drive_scsi1,"[scsi, scsi2, drive, scsi1, drives, ide, contr...",[The above does not tell the proper story of S...,scsi - scsi2 - drive - scsi1 - drives - ide - ...,1.0,False
4,1) I have an old Jasmine drive which I cann...,-1,-1_the_to_of_is,"[the, to, of, is, and, in, that, it, for, this]",[----- Begin Included Message -----\n\nThe fol...,the - to - of - is - and - in - that - it - fo...,0.0,False
...,...,...,...,...,...,...,...,...
9418,\nAssume in this case the usual canard-adversa...,3,3_key_clipper_encryption_chip,"[key, clipper, encryption, chip, keys, be, gov...","[April 16, 1993\n\nINITIAL EFF ANALYSIS OF CLI...",key - clipper - encryption - chip - keys - be ...,1.0,False
9419,\n\nIt is a long standing good luck Redwing's ...,0,0_game_he_team_was,"[game, he, team, was, games, play, 10, players...","[Scoring stats for the Swedish NHL players, Ap...",game - he - team - was - games - play - 10 - p...,1.0,False
9420,\nWhich translates to 7% not satisfied. I don...,91,91_join_doublespace_dos_stacker,"[join, doublespace, dos, stacker, dos6, workgr...",[\nHe's right ya know. I've helped to install...,join - doublespace - dos - stacker - dos6 - wo...,1.0,False
9421,\nThe willingness of the majority of the peopl...,111,111_government_libertarians_libertarian_regula...,"[government, libertarians, libertarian, regula...",[\nI'm afraid that I've lost the thread here. ...,government - libertarians - libertarian - regu...,1.0,False


# Reduce topic outliers
By default, BERTopic assigns documents that do not fit any cluster to the outlier topic -1, which helps in creating meaningful topic representations. However, you might want to assign every single document to a topic. We can use `.reduce_outliers` to map some or all outliers to existing topics; there are several different strategies for doing so.

## Assign to existing topics

In [9]:
# Reduce outliers
reduced_outliers_topics = topic_model.reduce_outliers(docs, topics)



In [10]:
topic_model.update_topics(docs, topics=reduced_outliers_topics)

In [11]:
topic_model.get_topic_info()

,Topic,Count,Name,Representation,Representative_Docs
0,0,918,0_game_he_team_was,"[game, he, team, was, 10, games, play, the, pl...","[Scoring stats for the Swedish NHL players, Ap..."
1,1,454,1_space_the_shuttle_of,"[space, the, shuttle, of, launch, nasa, and, m...",[Archive-name: space/controversy\nLast-modifie...
2,2,279,2_ypu_hello_why_whatta,"[ypu, hello, why, whatta, stereotype, cheek, d...","[\n\nDitto,, \n\tWhatta ass!!!!!\n\n, \n \n ..."
3,3,426,3_key_encryption_clipper_be,"[key, encryption, clipper, be, chip, to, the, ...","[April 16, 1993\n\nINITIAL EFF ANALYSIS OF CLI..."
4,4,289,4_israel_jews_israeli_arab,"[israel, jews, israeli, arab, jewish, of, the,...","[\n\n""Assuming""? Also: come on, Brad. If we ar..."
...,...,...,...,...,...
120,120,16,120_tape_tapes_marker_irwin,"[tape, tapes, marker, irwin, 16ea, backup, hol...",[\n\tGreetings. There are 3 types of warnings ...
121,121,24,121_w4wg_mail_network_lan,"[w4wg, mail, network, lan, gateway, workgroups...",[This may be a simple question but:\n\nWe have...
122,122,17,122_lock_cable_kryptonite_600,"[lock, cable, kryptonite, 600, zipper, locks, ...","[Greetings netters,\n\nSteve writes ... <about..."
123,123,17,123_bmw_bmws_r80_moa,"[bmw, bmws, r80, moa, parts, bike, dealer, dod...","[Hi,\n I'm now in the market for buying a BM..."
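`reduce_outliers` offers several strategies (`"probabilities"`, `"distributions"`, `"c-tf-idf"` and `"embeddings"`), selected with its `strategy` argument. The NumPy sketch below illustrates only the idea behind an embedding-based reassignment: each outlier moves to the topic whose centroid embedding is most cosine-similar, provided that similarity reaches a threshold. The helper name, the centroid computation and the threshold handling are assumptions for illustration, not BERTopic's implementation, and the embeddings are made up.

```python
import numpy as np


def reassign_outliers(embeddings, topics, threshold=0.0):
    """Hypothetical helper: move -1 documents to the most similar topic centroid."""
    embeddings = np.asarray(embeddings, dtype=float)
    topics = np.asarray(topics)
    topic_ids = sorted(t for t in set(topics.tolist()) if t != -1)
    # Mean embedding of every non-outlier topic
    centroids = np.vstack([embeddings[topics == t].mean(axis=0) for t in topic_ids])
    # Cosine similarity between every document and every centroid
    doc_unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    cen_unit = centroids / np.linalg.norm(centroids, axis=1, keepdims=True)
    sims = doc_unit @ cen_unit.T
    new_topics = topics.copy()
    for i in np.where(topics == -1)[0]:
        best = int(np.argmax(sims[i]))
        # Only reassign when the best match is similar enough
        if sims[i, best] >= threshold:
            new_topics[i] = topic_ids[best]
    return new_topics.tolist()


emb = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [0.8, 0.2], [0.2, 0.8]]
print(reassign_outliers(emb, [0, 0, 1, 1, -1, -1]))
print(reassign_outliers(emb, [0, 0, 1, 1, -1, -1], threshold=0.99))
```

A higher threshold leaves more documents in topic -1, which is the trade-off between full coverage and cleaner topics.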


## Change clustering algorithm

In [12]:
from bertopic import BERTopic
from sklearn.cluster import KMeans

cluster_model = KMeans(n_clusters=50)
topic_model_kmeans = BERTopic(hdbscan_model=cluster_model)
topicsKmeans, probsKmeans = topic_model_kmeans.fit_transform(docs)

topic_model_kmeans.get_topic_info()

,Topic,Count,Name,Representation,Representative_Docs
0,0,358,0_of_and_to_is,"[of, and, to, is, the, in, that, patients, it,...",[I can not believe the way this thread on cand...
1,1,351,1_he_year_the_in,"[he, year, the, in, his, to, hit, was, and, but]",[\nYes. But this is *irrelevant*. You're tal...
2,2,326,2_window_to_the_this,"[window, to, the, this, is, motif, it, on, if,...",[Im new to Xlib programming. Im having a probl...
3,3,301,3_the_israel_of_that,"[the, israel, of, that, to, in, and, you, not,...",[THE WHITE HOUSE\n\n Office...
4,4,295,4_car_the_cars_engine,"[car, the, cars, engine, it, is, for, and, in,...",[\n\nMy whole point was not to say that the ca...
5,5,293,5_key_the_clipper_to,"[key, the, clipper, to, be, encryption, chip, ...","[Hmm, followup on my own posting... Well, who ..."
6,6,282,6_of_is_that_the,"[of, is, that, the, to, not, and, it, god, are]",[= \n= : [ The discussion begins: why does th...
7,7,279,7_the_game_he_team,"[the, game, he, team, was, in, to, hockey, and...","[It was unlikely, improbable. For the Bruins,..."
8,8,279,8_drive_scsi_ide_drives,"[drive, scsi, ide, drives, controller, disk, t...",[I have a 486sx25 computer with a 105 Mg Seaga...
9,9,278,9_god_that_is_of,"[god, that, is, of, to, the, not, jesus, in, and]",[Someone sent me this FAQ by E-mail and I post...


In [13]:
topic_model_kmeans.get_document_info(docs)

,Document,Topic,Name,Representation,Representative_Docs,Top_n_words,Representative_document
0,\n\nI am sure some bashers of Pens fans are pr...,7,7_the_game_he_team,"[the, game, he, team, was, in, to, hockey, and...","[It was unlikely, improbable. For the Bruins,...",the - game - he - team - was - in - to - hocke...,False
1,My brother is in the market for a high-perform...,10,10_card_monitor_video_it,"[card, monitor, video, it, the, for, drivers, ...",[Experiences with Diamond Viper VLB video card...,card - monitor - video - it - the - for - driv...,False
2,\n\n\n\n\tFinally you said what you dream abou...,31,31_armenian_the_of_and,"[armenian, the, of, and, armenians, were, in, ...",[Accounts of Anti-Armenian Human Right Violati...,armenian - the - of - and - armenians - were -...,False
3,\nThink!\n\nIt's the SCSI card doing the DMA t...,8,8_drive_scsi_ide_drives,"[drive, scsi, ide, drives, controller, disk, t...",[I have a 486sx25 computer with a 105 Mg Seaga...,drive - scsi - ide - drives - controller - dis...,False
4,1) I have an old Jasmine drive which I cann...,8,8_drive_scsi_ide_drives,"[drive, scsi, ide, drives, controller, disk, t...",[I have a 486sx25 computer with a 105 Mg Seaga...,drive - scsi - ide - drives - controller - dis...,False
...,...,...,...,...,...,...,...
9418,\nAssume in this case the usual canard-adversa...,5,5_key_the_clipper_to,"[key, the, clipper, to, be, encryption, chip, ...","[Hmm, followup on my own posting... Well, who ...",key - the - clipper - to - be - encryption - c...,False
9419,\n\nIt is a long standing good luck Redwing's ...,7,7_the_game_he_team,"[the, game, he, team, was, in, to, hockey, and...","[It was unlikely, improbable. For the Bruins,...",the - game - he - team - was - in - to - hocke...,False
9420,\nWhich translates to 7% not satisfied. I don...,19,19_windows_dos_it_the,"[windows, dos, it, the, have, problem, to, mem...","[Hi Netters,\n\nAs promised, here are the summ...",windows - dos - it - the - have - problem - to...,False
9421,\nThe willingness of the majority of the peopl...,32,32_the_to_government_of,"[the, to, government, of, that, and, people, i...",[\nIt appears it is time that this article (or...,the - to - government - of - that - and - peop...,False
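Unlike HDBSCAN, k-means has no notion of noise: every document is forced into one of the `n_clusters` clusters, which is why the KMeans topic table starts at topic 0 instead of -1. The document table of this model also lacks a `Probability` column, since k-means does not provide membership probabilities. A small scikit-learn illustration on made-up 2-D points, using `DBSCAN` as a stand-in for a density-based clusterer (BERTopic's default is HDBSCAN, not DBSCAN):

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans

# Two tight groups of made-up points plus one isolated point
points = np.array([
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1],
    [5.0, 5.0], [5.1, 5.0], [5.0, 5.1], [5.1, 5.1],
    [20.0, -20.0],  # far away from both groups
])

kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(points)
dbscan_labels = DBSCAN(eps=0.5, min_samples=3).fit_predict(points)

print("KMeans:", kmeans_labels)  # every point gets a cluster id >= 0
print("DBSCAN:", dbscan_labels)  # the isolated point is labelled -1 (noise)
```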


# Guided topic model

In [15]:
docs = fetch_20newsgroups(subset='all',  remove=('headers', 'footers', 'quotes'))["data"]
docs = docs[:len(docs)//2]

seed_topic_list = [["drug", "cancer", "drugs", "doctor"],
                   ["windows", "drive", "dos", "file"],
                   ["space", "launch", "orbit", "lunar"]]

topic_model = BERTopic(seed_topic_list=seed_topic_list)
topics, probs = topic_model.fit_transform(docs)

# List the generated topics together with their sizes

topic_model.get_topic_info()



,Topic,Count,Name,Representation,Representative_Docs
0,-1,3310,-1_to_the_is_and,"[to, the, is, and, of, you, it, for, in, that]",[\nIf I have a habit that I really want to bre...
1,0,871,0_game_he_team_was,"[game, he, team, was, the, games, 10, play, pl...","[Scoring stats for the Swedish NHL players, Ap..."
2,1,452,1_space_launch_the_shuttle,"[space, launch, the, shuttle, orbit, of, nasa,...",[Archive-name: space/astronaut\nLast-modified:...
3,2,360,2_cancer_medical_patients_disease,"[cancer, medical, patients, disease, doctor, o...",[------------- cut here -----------------\nUni...
4,3,276,3_idjits_whatta_cheek_dancing,"[idjits, whatta, cheek, dancing, ditto, ass, w...",[\n \n ...
...,...,...,...,...,...
109,108,11,108_book_commentary_books_introduction,"[book, commentary, books, introduction, harper...",[From time to time I have made reference to a ...
110,109,11,109_context_jim_stephen_trm,"[context, jim, stephen, trm, telepathy, you, q...",[\n(1) Stephen said you took a quote out of c...
111,110,11,110_shaft_wheelie_wheelies_shaftdrive,"[shaft, wheelie, wheelies, shaftdrive, chain, ...","[: \n: >I bought it, I tried it:\n: \n: >It is..."
112,111,11,111_turn_monitor_electricity_off,"[turn, monitor, electricity, off, computer, co...",[\nthis is a bad idea. my machine is on 24 hou...


In [16]:
similar_topics, similarity = topic_model.find_topics("lunar", top_n=5)
topic_model.get_topic(similar_topics[0])

[('space', 0.02001896230657995),
 ('launch', 0.010607673512321716),
 ('the', 0.009232033091110594),
 ('shuttle', 0.009165128058046944),
 ('orbit', 0.008653373295278856),
 ('of', 0.008154579327037498),
 ('nasa', 0.007982041631793177),
 ('mission', 0.007496629788725885),
 ('and', 0.007192128154726913),
 ('to', 0.007050238205802654)]
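`find_topics` embeds the search term and compares that embedding with the topic embeddings, returning the `top_n` most similar topics together with their similarity scores; `similar_topics[0]` is therefore the best match, here the space topic. The ranking step can be sketched with cosine similarity in NumPy. The helper name and the topic vectors are made up for illustration.

```python
import numpy as np


def rank_topics(query_vec, topic_vecs, topic_ids, top_n=5):
    """Hypothetical helper: return the top_n topic ids and their cosine similarities."""
    query = np.asarray(query_vec, dtype=float)
    topics = np.asarray(topic_vecs, dtype=float)
    sims = topics @ query / (np.linalg.norm(topics, axis=1) * np.linalg.norm(query))
    order = np.argsort(sims)[::-1][:top_n]  # highest similarity first
    return [topic_ids[i] for i in order], sims[order].tolist()


ids, scores = rank_topics([0.0, 1.0], [[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]], [0, 1, 2], top_n=2)
print(ids, scores)
```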

In [17]:
topic_model.visualize_barchart(n_words=8, top_n_topics=10)

In [18]:
topic_model.get_document_info(docs)

,Document,Topic,Name,Representation,Representative_Docs,Top_n_words,Probability,Representative_document
0,\n\nI am sure some bashers of Pens fans are pr...,0,0_game_he_team_was,"[game, he, team, was, the, games, 10, play, pl...","[Scoring stats for the Swedish NHL players, Ap...",game - he - team - was - the - games - 10 - pl...,1.000000,False
1,My brother is in the market for a high-perform...,5,5_card_monitor_video_vga,"[card, monitor, video, vga, drivers, windows, ...","[Has anyone connected a high-res, fixed freque...",card - monitor - video - vga - drivers - windo...,1.000000,False
2,\n\n\n\n\tFinally you said what you dream abou...,9,9_armenian_armenians_were_turkish,"[armenian, armenians, were, turkish, and, of, ...",[\n\nA typical Armenian revisionist. As in the...,armenian - armenians - were - turkish - and - ...,0.943913,False
3,\nThink!\n\nIt's the SCSI card doing the DMA t...,-1,-1_to_the_is_and,"[to, the, is, and, of, you, it, for, in, that]",[\nIf I have a habit that I really want to bre...,to - the - is - and - of - you - it - for - in...,0.000000,False
4,1) I have an old Jasmine drive which I cann...,-1,-1_to_the_is_and,"[to, the, is, and, of, you, it, for, in, that]",[\nIf I have a habit that I really want to bre...,to - the - is - and - of - you - it - for - in...,0.000000,False
...,...,...,...,...,...,...,...,...
9418,\nAssume in this case the usual canard-adversa...,4,4_key_clipper_encryption_chip,"[key, clipper, encryption, chip, keys, be, gov...","[Hmm, followup on my own posting... Well, who ...",key - clipper - encryption - chip - keys - be ...,1.000000,False
9419,\n\nIt is a long standing good luck Redwing's ...,0,0_game_he_team_was,"[game, he, team, was, the, games, 10, play, pl...","[Scoring stats for the Swedish NHL players, Ap...",game - he - team - was - the - games - 10 - pl...,1.000000,False
9420,\nWhich translates to 7% not satisfied. I don...,-1,-1_to_the_is_and,"[to, the, is, and, of, you, it, for, in, that]",[\nIf I have a habit that I really want to bre...,to - the - is - and - of - you - it - for - in...,0.000000,False
9421,\nThe willingness of the majority of the peopl...,106,106_government_libertarians_libertarian_regula...,"[government, libertarians, libertarian, regula...",[\nI'm afraid that I've lost the thread here. ...,government - libertarians - libertarian - regu...,1.000000,False


# South Park example

In [19]:
import pandas as pd

# CSV files for seasons 1-19; only the first 5 seasons are loaded below
urls = ["https://raw.githubusercontent.com/BobAdamsEE/SouthParkData/master/by-season/Season-{}.csv".format(i) for i in range(1, 20)]


south_park_data = pd.concat([pd.read_csv(url) for url in urls[0:5]])
south_park_data.head()

,Season,Episode,Character,Line
0,1,1,Boys,"School day, school day, teacher's golden ru...\n"
1,1,1,Kyle,"Ah, damn it! My little brother's trying to fol..."
2,1,1,Ike,Zeeponanner.\n
3,1,1,Kyle,"Ike, you can't come to school with me. \n"
4,1,1,Cartman,"Yeah, go home you little dildo.\n"


In [24]:
main_characters = ["Cartman", "Kyle", "Stan", "Kenny"]
south_park_data = south_park_data[south_park_data['Character'].isin(main_characters)]
south_park_data.head()

,Season,Episode,Character,Line
1,1,1,Kyle,"Ah, damn it! My little brother's trying to fol..."
3,1,1,Kyle,"Ike, you can't come to school with me. \n"
4,1,1,Cartman,"Yeah, go home you little dildo.\n"
5,1,1,Kyle,"Dude, don't call my brother a dildo!\n"
6,1,1,Stan,What's a dildo?\n


In [27]:
south_park_docs = south_park_data['Line']
south_park_docs[1]

1    Ah, damn it! My little brother's trying to fol...
1                                              What?\n
Name: Line, dtype: object
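The cell above returns two rows rather than a single line: `pd.concat` keeps each season file's original row labels, so label `1` can occur once per season, and `south_park_docs[1]` selects by label. Passing `ignore_index=True` to `pd.concat`, or selecting by position with `.iloc`, avoids this. A toy illustration with made-up frames:

```python
import pandas as pd

season_1 = pd.DataFrame({"Line": ["a", "b"]})
season_2 = pd.DataFrame({"Line": ["c", "d"]})

kept = pd.concat([season_1, season_2])
print(kept["Line"][1])  # label 1 exists twice, so a 2-row Series comes back

fixed = pd.concat([season_1, season_2], ignore_index=True)
print(fixed["Line"][1])  # labels are now 0..3, so this is the string "b"

print(kept["Line"].iloc[1])  # positional access also returns "b"
```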

In [28]:
sp_topic_model = BERTopic()
sp_topics, sp_probs = sp_topic_model.fit_transform(south_park_docs)

In [29]:
# List the generated topics together with their sizes
sp_topic_model.get_topic_info()

,Topic,Count,Name,Representation,Representative_Docs
0,-1,2934,-1_cartman_it_to_the,"[cartman, it, to, the, you, your, we, and, hav...","[...Okay Cartman, what do you want this time?\..."
1,0,167,0_kyle_kyles_bitch_kidney,"[kyle, kyles, bitch, kidney, brother, friend, ...","[Kyle! Kyle!\n, I'm Kyle.\n, Kyle?\n]"
2,1,151,1_jesus_lord_confess_sins,"[jesus, lord, confess, sins, church, crackers,...","[Jesus, ix-nay on the ool-schay.\n, Well, you ..."
3,2,142,2_him_hes_he_dead,"[him, hes, he, dead, gone, found, forget, yet,...","[Do you see him?\n, Yeah. Do you know him?\n, ..."
4,3,136,3_dad_mom_father_dads,"[dad, mom, father, dads, my, nyah, nurection, ...","[Dad.\n, Mom! Dad!\n, Mom! Dad!\n]"
...,...,...,...,...,...
249,248,10,248_cirque_quintuplets_du_cheville,"[cirque, quintuplets, du, cheville, mayors, ro...",[Will you guys be in our Cirque du Celville?\n...
250,249,10,249_house_size_houses_folks,"[house, size, houses, folks, storm, tail, lobs...","[Is this the right house?\n, Oh, come on, Toke..."
251,250,10,250_hurry_hammer_faster_catch,"[hurry, hammer, faster, catch, weve, come, qui...","[You guys, we have to hurry!\n, Hurry up, you ..."
252,251,10,251_what_about__,"[what, about, , , , , , , , ]","[What?\n, What?\n, What??\n]"


## Remove stopwords and infrequent terms

Two components of BERTopic, the CountVectorizer and the c-TF-IDF calculation, together create the topic representations. Both are flexible, and they can be tuned after training with `update_topics` without refitting the whole model.
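c-TF-IDF treats all documents of one topic as a single class document. Roughly following the formula in the BERTopic preprint, the weight of term $t$ in class $c$ is $W_{t,c} = \lVert tf_{t,c} \rVert \cdot \log\left(1 + \frac{A}{f_t}\right)$, where $tf_{t,c}$ is the frequency of $t$ in class $c$ (normalised per class), $f_t$ is the frequency of $t$ across all classes and $A$ is the average number of words per class. The NumPy sketch below applies this to a made-up count matrix (rows are topics, columns are terms); treat it as an illustration of the formula rather than the library's code.

```python
import numpy as np


def c_tf_idf(counts):
    """Class-based TF-IDF on an (n_topics x n_terms) count matrix."""
    counts = np.asarray(counts, dtype=float)
    # tf: term frequency per class, L1-normalised per row
    tf = counts / counts.sum(axis=1, keepdims=True)
    # f_t: frequency of each term across all classes (assumed > 0)
    term_freq = counts.sum(axis=0)
    # A: average number of words per class
    avg_words = counts.sum(axis=1).mean()
    idf = np.log(1 + avg_words / term_freq)
    return tf * idf


# Made-up counts for 2 topics and 3 terms: "the", "game", "space"
weights = c_tf_idf([[10, 5, 0],
                    [10, 0, 5]])
print(weights.round(3))
```

Here "game" outweighs "the" in the first topic even though "the" is twice as frequent there, because "the" is common across all topics.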


In [30]:
from sklearn.feature_extraction.text import CountVectorizer

# Fine-tune topic representations after training BERTopic
vectorizer_model = CountVectorizer(stop_words="english", ngram_range=(1, 3), min_df=10)
sp_topic_model.update_topics(south_park_docs, vectorizer_model=vectorizer_model)

In [31]:
sp_topic_model.get_topic_info()

,Topic,Count,Name,Representation,Representative_Docs
0,-1,2934,-1_cartman_isnt_stupid_kenny,"[cartman, isnt, stupid, kenny, cartmans, thats...","[...Okay Cartman, what do you want this time?\..."
1,0,167,0_kyle_bitch_friend_shes,"[kyle, bitch, friend, shes, looking, best, soo...","[Kyle! Kyle!\n, I'm Kyle.\n, Kyle?\n]"
2,1,151,1_jesus_night_god_walk,"[jesus, night, god, walk, hell, needs, eat, sa...","[Jesus, ix-nay on the ool-schay.\n, Well, you ..."
3,2,142,2_hes_dead_dude hes_gone,"[hes, dead, dude hes, gone, dude think, guy, l...","[Do you see him?\n, Yeah. Do you know him?\n, ..."
4,3,136,3_dad_mom_mother_leave,"[dad, mom, mother, leave, whos, moms, happen, ...","[Dad.\n, Mom! Dad!\n, Mom! Dad!\n]"
...,...,...,...,...,...
249,248,10,248_havent_hoh_came_took,"[havent, hoh, came, took, listen, stay, start,...",[Will you guys be in our Cirque du Celville?\n...
250,249,10,249_house_room_times_outside,"[house, room, times, outside, food, ive got, h...","[Is this the right house?\n, Oh, come on, Toke..."
251,250,10,250_hurry_weve gotta_quick_weve,"[hurry, weve gotta, quick, weve, wed, course, ...","[You guys, we have to hurry!\n, Hurry up, you ..."
252,251,10,251____,"[, , , , , , , , , ]","[What?\n, What?\n, What??\n]"


## Topics per Class

In [32]:
classes = south_park_data["Character"]

In [33]:
topics_per_class = sp_topic_model.topics_per_class(south_park_docs, classes=classes)

In [34]:
sp_topic_model.visualize_topics_per_class(topics_per_class, top_n_topics=10)
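`topics_per_class` recomputes the topic representations separately for each class (here, each character), and `visualize_topics_per_class` plots how often each topic occurs per class. The frequency part alone can be approximated with pandas; the sketch below uses made-up character/topic pairs and skips the per-class c-TF-IDF step entirely.

```python
import pandas as pd

# Made-up example: speaking character and assigned topic for six lines
lines = pd.DataFrame({
    "Character": ["Cartman", "Kyle", "Cartman", "Stan", "Kyle", "Cartman"],
    "Topic": [0, 0, 1, 1, 0, -1],
})

# Drop outliers, then count lines per (character, topic) pair
kept = lines[lines["Topic"] != -1]
counts = pd.crosstab(kept["Character"], kept["Topic"])
print(counts)
```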

# Bigger Picture



*   BERTopic is a topic modeling technique that uses transformers and c-TF-IDF
*   Its main strength is that it produces easily interpretable topics
*   BERTopic generates document embeddings with pre-trained transformer-based language models
*   BERTopic supports many topic modeling variants, for example guided, class-based and dynamic topic modeling. Check out the documentation to find the right settings for your use case

https://maartengr.github.io/BERTopic/getting_started/quickstart/quickstart.html

https://github.com/MaartenGr/BERTopic