## 📚 Imported Libraries

In [1]:
import pandas as pd
import time
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
import matplotlib.pyplot as plt
import os
import logging
import requests
import bs4
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC


In [2]:
# Launch the browser
driver = webdriver.Chrome()
driver.get("https://en.wikipedia.org/wiki/Key_events_of_the_20th_century")

In [3]:
WebDriverWait(driver, 20).until(
    EC.presence_of_element_located((By.CLASS_NAME, "mw-heading4"))
)


<selenium.webdriver.remote.webelement.WebElement (session="2088b1ee5025e46d94a9483d3083ec19", element="f.C4878FB9701FDED1FCB2E33687AE0548.d.6DB2A5C971AF56B87E39F89EDFF7293C.e.114")>

In [4]:
# Find all year/event headings
year_elems = driver.find_elements(By.CLASS_NAME, "mw-heading4")

# Preview the first few
for elem in year_elems[:5]:  # Slicing avoids an IndexError if fewer than 5 headings exist
    print(elem.text)

"The war to end all wars": World War I (1914–1918)[edit]
Russian Revolution and communism[edit]
Economic depression[edit]
The rise of dictatorship[edit]
The war in Europe[edit]


In [5]:
# Find the main article container
article = driver.find_element(By.ID, "mw-content-text")

# Then extract only paragraph tags inside the main content
paragraphs = article.find_elements(By.TAG_NAME, "p")

# Combine all paragraph text
text = ""
for p in paragraphs:
    text += p.text + "\n"

# Save it to a clean .txt file
with open("20th_century_events.txt", "w", encoding="utf-8") as file:
    file.write(text)

print("Scraping complete and file saved.")



Scraping complete and file saved.


In [9]:
import re

years = []
for p in paragraphs:
    # (?:...) is non-capturing, so findall returns whole years
    # rather than only the '19' / '20' prefix
    found_years = re.findall(r"\b(?:19|20)\d{2}\b", p.text)
    years.extend(found_years)

print(set(years))  # Unique years
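When a pattern contains exactly one capturing group, `re.findall` returns the text of that group instead of the whole match. The short sketch below compares a capturing and a non-capturing version of the year pattern on a made-up sample string (the sample text is an assumption for illustration, not taken from the article).

```python
import re

# Hypothetical sample text, used only to illustrate the behaviour
sample = "Events in 1914, 1918 and 2001."

# One capturing group: findall returns only what the group matched
capturing = re.findall(r"\b(19|20)\d{2}\b", sample)

# Non-capturing group: findall returns the full match
non_capturing = re.findall(r"\b(?:19|20)\d{2}\b", sample)

print(capturing)      # ['19', '19', '20']
print(non_capturing)  # ['1914', '1918', '2001']
```

Switching `(19|20)` to `(?:19|20)` is therefore enough to collect full four-digit years.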


In [10]:
links = driver.find_elements(By.TAG_NAME, "a")
for link in links[:5]:
    print(link.get_attribute("href"))


https://en.wikipedia.org/wiki/Key_events_of_the_20th_century#bodyContent
https://en.wikipedia.org/wiki/Main_Page
https://en.wikipedia.org/wiki/Wikipedia:Contents
https://en.wikipedia.org/wiki/Portal:Current_events
https://en.wikipedia.org/wiki/Special:Random


In [11]:
images = driver.find_elements(By.TAG_NAME, "img")
for img in images[:5]:
    print(img.get_attribute("src"))


https://en.wikipedia.org/static/images/icons/wikipedia.png
https://en.wikipedia.org/static/images/mobile/copyright/wikipedia-wordmark-en.svg
https://en.wikipedia.org/static/images/mobile/copyright/wikipedia-tagline-en.svg
https://upload.wikimedia.org/wikipedia/commons/thumb/8/85/Ferdinand_Behr_arrested_in_Sarajevo_1914.jpg/250px-Ferdinand_Behr_arrested_in_Sarajevo_1914.jpg
https://upload.wikimedia.org/wikipedia/commons/thumb/5/50/Dissolution_of_Austria-Hungary.png/250px-Dissolution_of_Austria-Hungary.png


In [12]:
from IPython.display import Image, display

# Show every image hosted on Wikimedia Commons, skipping icons and logos
for img in images:
    src = img.get_attribute("src")
    if src and src.endswith((".jpg", ".jpeg", ".png")) and "commons" in src:
        display(Image(url=src))


In [14]:
# Combine scraped data
output = []

# Headings (h4)
headings = driver.find_elements(By.CLASS_NAME, "mw-heading4")
output.append("== Headings ==")
for h in headings:
    output.append(h.text)

# Paragraphs (p)
paragraphs = driver.find_elements(By.TAG_NAME, "p")
output.append("\n== Paragraphs ==")
for p in paragraphs[:5]:  # Only taking top 5 for now
    output.append(p.text)


# Image URLs
images = driver.find_elements(By.TAG_NAME, "img")
output.append("\n== Image Links ==")
for img in images[:5]:  # Only top 5
    src = img.get_attribute("src")
    if src:
        output.append(src)

# Save all to a text file
with open("20th_century_scrape.txt", "w", encoding="utf-8") as f:
    for line in output:
        f.write(line + "\n")

print("✅ File saved as '20th_century_scrape.txt'")


✅ File saved as '20th_century_scrape.txt'


In [4]:
import re
from collections import defaultdict

# Load the article content
with open("20th_century_events.txt", "r", encoding="utf-8") as file:
    text = file.read()

# Create dictionary to group events by year
years_dict = defaultdict(list)

# Regex pattern for 4-digit years
year_pattern = r'\b(1[0-9]{3}|20[0-9]{2})\b'

# Split the text into sentences
sentences = re.split(r'(?<=\.|\?|!)\s+', text)

# Loop through sentences and collect years
for sentence in sentences:
    years_found = re.findall(year_pattern, sentence)
    for year in years_found:
        years_dict[year].append(sentence.strip())

# Display grouped events
for year in sorted(years_dict.keys()):
    print(f"\n🗓 {year}")
    for event in years_dict[year]:
        print(f" - {event}")



🗓 1910
 - The Korean Peninsula was a Japanese colony between 1910 and 1945, when Soviet and American troops invaded and divided it along the 38th parallel.[198] A communist government controlled the territory north of the border and a capitalist one controlled the South, with both authorities considering the other one illegitimate and claiming sovereignty over the entire peninsula.[199] North Korea's invasion of South Korea on 25 June 1950 led to United Nations intervention.[200] General Douglas MacArthur led troops from the United States, Canada, Australia, Great Britain, and other countries in repulsing the Northern invasion.

🗓 1914
 - 1914 saw the completion of the Panama Canal.
 - From 1914 to 1918, the First World War, and its aftermath, caused major changes in the power balance of the world, destroying or transforming some of the most powerful empires.
 - The First World War (or simply WWI), termed "The Great War" by contemporaries, started in July 1914 and ended in November 19
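In the grouped output, a whole paragraph can end up as a single "sentence" because citation markers such as `[198]` sit between the full stop and the following space, so the split pattern never sees a period followed by whitespace. A minimal sketch of one possible fix, assuming it is acceptable to strip numeric citation markers before splitting, is shown below. The function name `group_sentences_by_year` and the shortened sample text are illustrative assumptions.

```python
import re
from collections import defaultdict

CITATION = re.compile(r"\[\d+\]")                    # e.g. [198]
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")          # whitespace after ., ! or ?
YEAR = re.compile(r"\b(?:1[0-9]{3}|20[0-9]{2})\b")   # four-digit years 1000-2099


def group_sentences_by_year(text):
    """Map each year to the sentences that mention it (hypothetical helper)."""
    text = CITATION.sub("", text)  # strip citations so ".[198] " becomes ". "
    groups = defaultdict(list)
    for sentence in SENTENCE_END.split(text):
        sentence = sentence.strip()
        # set() keeps a sentence from being added twice for a repeated year
        for year in sorted(set(YEAR.findall(sentence))):
            groups[year].append(sentence)
    return dict(groups)


# Shortened, paraphrased sample (an assumption, not the scraped text itself)
sample = (
    "Korea was a colony between 1910 and 1945.[198] "
    "An invasion on 25 June 1950 led to intervention.[200] "
    "A general led the troops."
)
groups = group_sentences_by_year(sample)
for year in sorted(groups):
    print(year, groups[year])
```

With the citations removed first, each year maps to the single sentence that mentions it rather than to the whole paragraph.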

In [1]:
import re

# Let's say this is your raw scraped text
with open('20th_century_events.txt', 'r', encoding='utf-8') as f:
    raw_text = f.read()

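# Remove citations like [144], [26], etc.
clean_text = re.sub(r'\[\d+\]', '', raw_text)

# Convert en/em dashes to hyphens first, so ranges like "1914–1918" are not fused into "19141918"
clean_text = clean_text.replace('\u2013', '-').replace('\u2014', '-')

# Remove remaining non-ASCII characters (note: this also drops accented letters)
clean_text = re.sub(r'[^\x00-\x7F]+', '', clean_text)

# Optional: remove excessive whitespace
clean_text = re.sub(r'\s+', ' ', clean_text).strip()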

# Save to a new file (optional)
with open('cleaned_20th_century_events.txt', 'w', encoding='utf-8') as f:
    f.write(clean_text)

print("🧽 Cleaned text saved successfully!")


🧽 Cleaned text saved successfully!


In [2]:
import re

# Load cleaned text
with open("cleaned_20th_century_events.txt", "r", encoding="utf-8") as file:
    cleaned_text = file.read()

# Split into sentences at whitespace that follows ., ! or ?
sentences = re.split(r'(?<=[.!?])\s+', cleaned_text.strip())

# Save it back with one sentence per line
with open("formatted_cleaned_20th_century_events.txt", "w", encoding="utf-8") as file:
    for sentence in sentences:
        file.write(sentence.strip() + '\n')


---

## ✨ BONUS TASK: Alternate Country List Scraper from UN Page

To push myself further, I attempted a second scraping approach using the Wikipedia page titled  
**"List of countries by population (United Nations)"**.  
This demonstrates flexibility in working with multiple page structures, such as HTML tables.

**Objective:**  
Scrape country names from a structured table instead of an unordered list (`<ul>`),  
then clean, sort, and present them neatly in both notebook and `.txt` format.

---
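The core idea of the objective, taking the first cell of each table row and then de-duplicating and sorting, can be tried without network access. The sketch below is only an illustration: it uses the standard library's `html.parser` rather than BeautifulSoup, and the tiny `SAMPLE_HTML` table with its placeholder population values is an assumption, not data from the Wikipedia page.

```python
from html.parser import HTMLParser

# Hypothetical miniature table; the numbers are placeholders, not real figures
SAMPLE_HTML = """
<table class="wikitable">
  <tr><th>Country</th><th>Population</th></tr>
  <tr><td>China[a]</td><td>1</td></tr>
  <tr><td>Aruba (Netherlands)</td><td>2</td></tr>
  <tr><td>Fiji</td><td>3</td></tr>
  <tr><td>Fiji</td><td>3</td></tr>
</table>
"""


class FirstCellParser(HTMLParser):
    """Collect the text of the first <td> in every <tr> (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.in_td = False
        self.cell_index = -1
        self.current = []
        self.names = []

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.cell_index = -1          # new row: reset the cell counter
        elif tag == "td":
            self.cell_index += 1
            self.in_td = True
            self.current = []

    def handle_endtag(self, tag):
        if tag == "td" and self.in_td:
            self.in_td = False
            if self.cell_index == 0:      # keep only the first data cell
                self.names.append("".join(self.current).strip())

    def handle_data(self, data):
        if self.in_td:
            self.current.append(data)


parser = FirstCellParser()
parser.feed(SAMPLE_HTML)
sample_countries = sorted(set(parser.names))  # de-duplicate, then sort
print(sample_countries)
```

Header rows contain only `<th>` cells, so they contribute nothing, which mirrors the `if cols:` check used with BeautifulSoup.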


In [50]:
import requests
from bs4 import BeautifulSoup
from IPython.display import Markdown, display

# Step 1: Use Wikipedia page
url = "https://en.wikipedia.org/wiki/List_of_countries_by_population_(United_Nations)"
response = requests.get(url)
response.raise_for_status()  # Fail early on HTTP errors instead of parsing an error page
soup = BeautifulSoup(response.text, "html.parser")

# Step 2: Find the main table
table = soup.find("table", class_="wikitable")

# Step 3: Extract country names
countries = []
for row in table.find_all("tr")[1:]:
    cols = row.find_all("td")
    if cols:
        country_name = cols[0].text.strip()
        countries.append(country_name)

# Step 4: Clean + sort
countries = sorted(set(countries))

# Step 5: Display in notebook
markdown_output = "### 🌍 List of Scraped Countries:\n" + "\n".join([f"{i+1}. {country}" for i, country in enumerate(countries)])
display(Markdown(markdown_output))

# Save to txt
with open("countries_list.txt", "w", encoding="utf-8") as f:  # names such as "Curaçao" need UTF-8
    for c in countries:
        f.write(c + "\n")


### 🌍 List of Scraped Countries:
1. Afghanistan
2. Albania
3. Algeria
4. American Samoa (United States)
5. Andorra
6. Angola
7. Anguilla (United Kingdom)
8. Antigua and Barbuda
9. Argentina
10. Armenia
11. Aruba (Netherlands)
12. Australia[f]
13. Austria
14. Azerbaijan
15. Bahamas
16. Bahrain
17. Bangladesh
18. Barbados
19. Belarus
20. Belgium
21. Belize
22. Benin
23. Bermuda (United Kingdom)
24. Bhutan
25. Bolivia
26. Bosnia and Herzegovina
27. Botswana
28. Brazil
29. British Virgin Islands (United Kingdom)
30. Brunei
31. Bulgaria
32. Burkina Faso
33. Burundi
34. Cambodia
35. Cameroon
36. Canada
37. Cape Verde
38. Caribbean Netherlands (Netherlands)[v]
39. Cayman Islands (United Kingdom)
40. Central African Republic
41. Chad
42. Chile
43. China[a]
44. Colombia
45. Comoros
46. Congo
47. Cook Islands (New Zealand)
48. Costa Rica
49. Croatia
50. Cuba
51. Curaçao (Netherlands)
52. Cyprus[s]
53. Czechia
54. DR Congo
55. Denmark
56. Djibouti
57. Dominica
58. Dominican Republic
59. Ecuador
60. Egypt
61. El Salvador
62. Equatorial Guinea
63. Eritrea
64. Estonia
65. Eswatini
66. Ethiopia
67. Falkland Islands (United Kingdom)
68. Faroe Islands (Denmark)
69. Fiji
70. Finland[m]
71. France[c]
72. French Guiana (France)
73. French Polynesia (France)
74. Gabon
75. Gambia
76. Georgia[p]
77. Germany
78. Ghana
79. Gibraltar (United Kingdom)
80. Greece
81. Greenland (Denmark)
82. Grenada
83. Guadeloupe (France)
84. Guam (United States)
85. Guatemala
86. Guernsey (United Kingdom)
87. Guinea
88. Guinea-Bissau
89. Guyana
90. Haiti
91. Honduras
92. Hong Kong (China)[k]
93. Hungary
94. Iceland
95. India
96. Indonesia
97. Iran
98. Iraq
99. Ireland
100. Isle of Man (United Kingdom)
101. Israel
102. Italy
103. Ivory Coast
104. Jamaica
105. Japan
106. Jersey (United Kingdom)
107. Jordan
108. Kazakhstan
109. Kenya
110. Kiribati
111. Kosovo[r]
112. Kuwait
113. Kyrgyzstan
114. Laos
115. Latvia
116. Lebanon
117. Lesotho
118. Liberia
119. Libya
120. Liechtenstein
121. Lithuania
122. Luxembourg
123. Macao (China)[t]
124. Madagascar
125. Malawi
126. Malaysia
127. Maldives
128. Mali
129. Malta
130. Marshall Islands
131. Martinique (France)
132. Mauritania
133. Mauritius
134. Mayotte (France)
135. Mexico
136. Micronesia
137. Moldova[q]
138. Monaco
139. Mongolia
140. Montenegro
141. Montserrat (United Kingdom)
142. Morocco
143. Mozambique
144. Myanmar
145. Namibia
146. Nauru
147. Nepal
148. Netherlands[i]
149. New Caledonia (France)
150. New Zealand
151. Nicaragua
152. Niger
153. Nigeria
154. Niue (New Zealand)
155. North Korea
156. North Macedonia
157. Northern Mariana Islands (United States)
158. Norway[n]
159. Oman
160. Pakistan
161. Palau
162. Palestine[o]
163. Panama
164. Papua New Guinea
165. Paraguay
166. Peru
167. Philippines
168. Poland
169. Portugal[j]
170. Puerto Rico (United States)
171. Qatar
172. Romania
173. Russia
174. Rwanda
175. Réunion (France)
176. Saint Barthélemy (France)
177. Saint Helena (United Kingdom)[w]
178. Saint Kitts and Nevis
179. Saint Lucia
180. Saint Martin (France)
181. Saint Pierre and Miquelon (France)
182. Saint Vincent and the Grenadines
183. Samoa
184. San Marino
185. Saudi Arabia
186. Senegal
187. Serbia[l]
188. Seychelles
189. Sierra Leone
190. Singapore
191. Sint Maarten (Netherlands)
192. Slovakia
193. Slovenia
194. Solomon Islands
195. Somalia[h]
196. South Africa
197. South Korea
198. South Sudan
199. Spain[d]
200. Sri Lanka
201. Sudan
202. Suriname
203. Sweden
204. Switzerland
205. Syria
206. São Tomé and Príncipe
207. Taiwan[g]
208. Tajikistan
209. Tanzania[b]
210. Thailand
211. Timor-Leste
212. Togo
213. Tokelau (New Zealand)
214. Tonga
215. Trinidad and Tobago
216. Tunisia
217. Turkey
218. Turkmenistan
219. Turks and Caicos Islands (United Kingdom)
220. Tuvalu
221. U.S. Virgin Islands (United States)
222. Uganda
223. Ukraine[e]
224. United Arab Emirates
225. United Kingdom
226. United States
227. Uruguay
228. Uzbekistan
229. Vanuatu
230. Vatican City[x]
231. Venezuela
232. Vietnam
233. Wallis and Futuna (France)
234. Western Sahara (disputed)[u]
235. World
236. Yemen
237. Zambia
238. Zimbabwe

In [32]:
import re

# Clean out footnote markers like [f]
countries = [re.sub(r"\[.*?\]", "", c).strip() for c in countries]


In [33]:
import re

cleaned_countries = []
for c in countries:
    c = re.sub(r"\[.*?\]", "", c)  # remove footnote markers like [f], [note 1], etc.
    c = re.sub(r"\s*\(.*?\)", "", c)  # remove parentheses like (United Kingdom)
    c = re.sub(r"\s{2,}", " ", c)  # reduce multiple spaces to single space
    c = c.strip()  # remove leading/trailing spaces
    if c and c[0].isalpha() and len(c) > 2:  # skip any remaining junk
        cleaned_countries.append(c)

countries = sorted(set(cleaned_countries))


In [34]:
import re

cleaned_countries = []
for c in countries:
    c = re.sub(r"\[.*?\]", "", c)         # Remove things like [w], [d], etc.
    c = re.sub(r"\s*\(.*?\)", "", c)      # Remove (France), (UK), etc.
    c = re.sub(r"\s{2,}", " ", c)         # Remove double spaces
    c = c.strip()
    if c and c[0].isalpha() and len(c) > 2:
        cleaned_countries.append(c)

countries = sorted(set(cleaned_countries))


In [35]:
import re

# Full clean-up
cleaned_countries = []
for c in countries:
    c = re.sub(r"\[.*?\]", "", c)              # Remove [w], [d], [e], etc.
    c = re.sub(r"\(.*?\)", "", c)              # Remove (France), (disputed), etc.
    c = re.sub(r"\s{2,}", " ", c)              # Remove double spaces
    c = c.strip()                              # Remove leading/trailing whitespace
    if c and c[0].isalpha() and len(c) > 2:    # Keep valid names
        cleaned_countries.append(c)

countries = sorted(set(cleaned_countries))     # Deduplicate and sort


In [41]:
# Step 5: Display nicely
markdown_output = "### 🌍 List of Scraped Countries:\n\n"
markdown_output += "\n".join([f"&nbsp;&nbsp;&nbsp;{i+1}. {country}" for i, country in enumerate(countries)])
display(Markdown(markdown_output))


### 🌍 List of Scraped Countries:

&nbsp;&nbsp;&nbsp;1. Afghanistan
&nbsp;&nbsp;&nbsp;2. Albania
&nbsp;&nbsp;&nbsp;3. Algeria
&nbsp;&nbsp;&nbsp;4. American Samoa
&nbsp;&nbsp;&nbsp;5. Andorra
&nbsp;&nbsp;&nbsp;6. Angola
&nbsp;&nbsp;&nbsp;7. Anguilla
&nbsp;&nbsp;&nbsp;8. Antigua and Barbuda
&nbsp;&nbsp;&nbsp;9. Argentina
&nbsp;&nbsp;&nbsp;10. Armenia
&nbsp;&nbsp;&nbsp;11. Aruba
&nbsp;&nbsp;&nbsp;12. Australia
&nbsp;&nbsp;&nbsp;13. Austria
&nbsp;&nbsp;&nbsp;14. Azerbaijan
&nbsp;&nbsp;&nbsp;15. Bahamas
&nbsp;&nbsp;&nbsp;16. Bahrain
&nbsp;&nbsp;&nbsp;17. Bangladesh
&nbsp;&nbsp;&nbsp;18. Barbados
&nbsp;&nbsp;&nbsp;19. Belarus
&nbsp;&nbsp;&nbsp;20. Belgium
&nbsp;&nbsp;&nbsp;21. Belize
&nbsp;&nbsp;&nbsp;22. Benin
&nbsp;&nbsp;&nbsp;23. Bermuda
&nbsp;&nbsp;&nbsp;24. Bhutan
&nbsp;&nbsp;&nbsp;25. Bolivia
&nbsp;&nbsp;&nbsp;26. Bosnia and Herzegovina
&nbsp;&nbsp;&nbsp;27. Botswana
&nbsp;&nbsp;&nbsp;28. Brazil
&nbsp;&nbsp;&nbsp;29. British Virgin Islands
&nbsp;&nbsp;&nbsp;30. Brunei
&nbsp;&nbsp;&nbsp;31. Bulgaria
&nbsp;&nbsp;&nbsp;32. Burkina Faso
&nbsp;&nbsp;&nbsp;33. Burundi
&nbsp;&nbsp;&nbsp;34. Cambodia
&nbsp;&nbsp;&nbsp;35. Cameroon
&nbsp;&nbsp;&nbsp;36. Canada
&nbsp;&nbsp;&nbsp;37. Cape Verde
&nbsp;&nbsp;&nbsp;38. Caribbean Netherlands
&nbsp;&nbsp;&nbsp;39. Cayman Islands
&nbsp;&nbsp;&nbsp;40. Central African Republic
&nbsp;&nbsp;&nbsp;41. Chad
&nbsp;&nbsp;&nbsp;42. Chile
&nbsp;&nbsp;&nbsp;43. China
&nbsp;&nbsp;&nbsp;44. Colombia
&nbsp;&nbsp;&nbsp;45. Comoros
&nbsp;&nbsp;&nbsp;46. Congo
&nbsp;&nbsp;&nbsp;47. Cook Islands
&nbsp;&nbsp;&nbsp;48. Costa Rica
&nbsp;&nbsp;&nbsp;49. Croatia
&nbsp;&nbsp;&nbsp;50. Cuba
&nbsp;&nbsp;&nbsp;51. Curaçao
&nbsp;&nbsp;&nbsp;52. Cyprus
&nbsp;&nbsp;&nbsp;53. Czechia
&nbsp;&nbsp;&nbsp;54. DR Congo
&nbsp;&nbsp;&nbsp;55. Denmark
&nbsp;&nbsp;&nbsp;56. Djibouti
&nbsp;&nbsp;&nbsp;57. Dominica
&nbsp;&nbsp;&nbsp;58. Dominican Republic
&nbsp;&nbsp;&nbsp;59. Ecuador
&nbsp;&nbsp;&nbsp;60. Egypt
&nbsp;&nbsp;&nbsp;61. El Salvador
&nbsp;&nbsp;&nbsp;62. Equatorial Guinea
&nbsp;&nbsp;&nbsp;63. Eritrea
&nbsp;&nbsp;&nbsp;64. Estonia
&nbsp;&nbsp;&nbsp;65. Eswatini
&nbsp;&nbsp;&nbsp;66. Ethiopia
&nbsp;&nbsp;&nbsp;67. Falkland Islands
&nbsp;&nbsp;&nbsp;68. Faroe Islands
&nbsp;&nbsp;&nbsp;69. Fiji
&nbsp;&nbsp;&nbsp;70. Finland
&nbsp;&nbsp;&nbsp;71. France
&nbsp;&nbsp;&nbsp;72. French Guiana
&nbsp;&nbsp;&nbsp;73. French Polynesia
&nbsp;&nbsp;&nbsp;74. Gabon
&nbsp;&nbsp;&nbsp;75. Gambia
&nbsp;&nbsp;&nbsp;76. Georgia
&nbsp;&nbsp;&nbsp;77. Germany
&nbsp;&nbsp;&nbsp;78. Ghana
&nbsp;&nbsp;&nbsp;79. Gibraltar
&nbsp;&nbsp;&nbsp;80. Greece
&nbsp;&nbsp;&nbsp;81. Greenland
&nbsp;&nbsp;&nbsp;82. Grenada
&nbsp;&nbsp;&nbsp;83. Guadeloupe
&nbsp;&nbsp;&nbsp;84. Guam
&nbsp;&nbsp;&nbsp;85. Guatemala
&nbsp;&nbsp;&nbsp;86. Guernsey
&nbsp;&nbsp;&nbsp;87. Guinea
&nbsp;&nbsp;&nbsp;88. Guinea-Bissau
&nbsp;&nbsp;&nbsp;89. Guyana
&nbsp;&nbsp;&nbsp;90. Haiti
&nbsp;&nbsp;&nbsp;91. Honduras
&nbsp;&nbsp;&nbsp;92. Hong Kong
&nbsp;&nbsp;&nbsp;93. Hungary
&nbsp;&nbsp;&nbsp;94. Iceland
&nbsp;&nbsp;&nbsp;95. India
&nbsp;&nbsp;&nbsp;96. Indonesia
&nbsp;&nbsp;&nbsp;97. Iran
&nbsp;&nbsp;&nbsp;98. Iraq
&nbsp;&nbsp;&nbsp;99. Ireland
&nbsp;&nbsp;&nbsp;100. Isle of Man
&nbsp;&nbsp;&nbsp;101. Israel
&nbsp;&nbsp;&nbsp;102. Italy
&nbsp;&nbsp;&nbsp;103. Ivory Coast
&nbsp;&nbsp;&nbsp;104. Jamaica
&nbsp;&nbsp;&nbsp;105. Japan
&nbsp;&nbsp;&nbsp;106. Jersey
&nbsp;&nbsp;&nbsp;107. Jordan
&nbsp;&nbsp;&nbsp;108. Kazakhstan
&nbsp;&nbsp;&nbsp;109. Kenya
&nbsp;&nbsp;&nbsp;110. Kiribati
&nbsp;&nbsp;&nbsp;111. Kosovo
&nbsp;&nbsp;&nbsp;112. Kuwait
&nbsp;&nbsp;&nbsp;113. Kyrgyzstan
&nbsp;&nbsp;&nbsp;114. Laos
&nbsp;&nbsp;&nbsp;115. Latvia
&nbsp;&nbsp;&nbsp;116. Lebanon
&nbsp;&nbsp;&nbsp;117. Lesotho
&nbsp;&nbsp;&nbsp;118. Liberia
&nbsp;&nbsp;&nbsp;119. Libya
&nbsp;&nbsp;&nbsp;120. Liechtenstein
&nbsp;&nbsp;&nbsp;121. Lithuania
&nbsp;&nbsp;&nbsp;122. Luxembourg
&nbsp;&nbsp;&nbsp;123. Macao
&nbsp;&nbsp;&nbsp;124. Madagascar
&nbsp;&nbsp;&nbsp;125. Malawi
&nbsp;&nbsp;&nbsp;126. Malaysia
&nbsp;&nbsp;&nbsp;127. Maldives
&nbsp;&nbsp;&nbsp;128. Mali
&nbsp;&nbsp;&nbsp;129. Malta
&nbsp;&nbsp;&nbsp;130. Marshall Islands
&nbsp;&nbsp;&nbsp;131. Martinique
&nbsp;&nbsp;&nbsp;132. Mauritania
&nbsp;&nbsp;&nbsp;133. Mauritius
&nbsp;&nbsp;&nbsp;134. Mayotte
&nbsp;&nbsp;&nbsp;135. Mexico
&nbsp;&nbsp;&nbsp;136. Micronesia
&nbsp;&nbsp;&nbsp;137. Moldova
&nbsp;&nbsp;&nbsp;138. Monaco
&nbsp;&nbsp;&nbsp;139. Mongolia
&nbsp;&nbsp;&nbsp;140. Montenegro
&nbsp;&nbsp;&nbsp;141. Montserrat
&nbsp;&nbsp;&nbsp;142. Morocco
&nbsp;&nbsp;&nbsp;143. Mozambique
&nbsp;&nbsp;&nbsp;144. Myanmar
&nbsp;&nbsp;&nbsp;145. Namibia
&nbsp;&nbsp;&nbsp;146. Nauru
&nbsp;&nbsp;&nbsp;147. Nepal
&nbsp;&nbsp;&nbsp;148. Netherlands
&nbsp;&nbsp;&nbsp;149. New Caledonia
&nbsp;&nbsp;&nbsp;150. New Zealand
&nbsp;&nbsp;&nbsp;151. Nicaragua
&nbsp;&nbsp;&nbsp;152. Niger
&nbsp;&nbsp;&nbsp;153. Nigeria
&nbsp;&nbsp;&nbsp;154. Niue
&nbsp;&nbsp;&nbsp;155. North Korea
&nbsp;&nbsp;&nbsp;156. North Macedonia
&nbsp;&nbsp;&nbsp;157. Northern Mariana Islands
&nbsp;&nbsp;&nbsp;158. Norway
&nbsp;&nbsp;&nbsp;159. Oman
&nbsp;&nbsp;&nbsp;160. Pakistan
&nbsp;&nbsp;&nbsp;161. Palau
&nbsp;&nbsp;&nbsp;162. Palestine
&nbsp;&nbsp;&nbsp;163. Panama
&nbsp;&nbsp;&nbsp;164. Papua New Guinea
&nbsp;&nbsp;&nbsp;165. Paraguay
&nbsp;&nbsp;&nbsp;166. Peru
&nbsp;&nbsp;&nbsp;167. Philippines
&nbsp;&nbsp;&nbsp;168. Poland
&nbsp;&nbsp;&nbsp;169. Portugal
&nbsp;&nbsp;&nbsp;170. Puerto Rico
&nbsp;&nbsp;&nbsp;171. Qatar
&nbsp;&nbsp;&nbsp;172. Romania
&nbsp;&nbsp;&nbsp;173. Russia
&nbsp;&nbsp;&nbsp;174. Rwanda
&nbsp;&nbsp;&nbsp;175. Réunion
&nbsp;&nbsp;&nbsp;176. Saint Barthélemy
&nbsp;&nbsp;&nbsp;177. Saint Helena
&nbsp;&nbsp;&nbsp;178. Saint Kitts and Nevis
&nbsp;&nbsp;&nbsp;179. Saint Lucia
&nbsp;&nbsp;&nbsp;180. Saint Martin
&nbsp;&nbsp;&nbsp;181. Saint Pierre and Miquelon
&nbsp;&nbsp;&nbsp;182. Saint Vincent and the Grenadines
&nbsp;&nbsp;&nbsp;183. Samoa
&nbsp;&nbsp;&nbsp;184. San Marino
&nbsp;&nbsp;&nbsp;185. Saudi Arabia
&nbsp;&nbsp;&nbsp;186. Senegal
&nbsp;&nbsp;&nbsp;187. Serbia
&nbsp;&nbsp;&nbsp;188. Seychelles
&nbsp;&nbsp;&nbsp;189. Sierra Leone
&nbsp;&nbsp;&nbsp;190. Singapore
&nbsp;&nbsp;&nbsp;191. Sint Maarten
&nbsp;&nbsp;&nbsp;192. Slovakia
&nbsp;&nbsp;&nbsp;193. Slovenia
&nbsp;&nbsp;&nbsp;194. Solomon Islands
&nbsp;&nbsp;&nbsp;195. Somalia
&nbsp;&nbsp;&nbsp;196. South Africa
&nbsp;&nbsp;&nbsp;197. South Korea
&nbsp;&nbsp;&nbsp;198. South Sudan
&nbsp;&nbsp;&nbsp;199. Spain
&nbsp;&nbsp;&nbsp;200. Sri Lanka
&nbsp;&nbsp;&nbsp;201. Sudan
&nbsp;&nbsp;&nbsp;202. Suriname
&nbsp;&nbsp;&nbsp;203. Sweden
&nbsp;&nbsp;&nbsp;204. Switzerland
&nbsp;&nbsp;&nbsp;205. Syria
&nbsp;&nbsp;&nbsp;206. São Tomé and Príncipe
&nbsp;&nbsp;&nbsp;207. Taiwan
&nbsp;&nbsp;&nbsp;208. Tajikistan
&nbsp;&nbsp;&nbsp;209. Tanzania
&nbsp;&nbsp;&nbsp;210. Thailand
&nbsp;&nbsp;&nbsp;211. Timor-Leste
&nbsp;&nbsp;&nbsp;212. Togo
&nbsp;&nbsp;&nbsp;213. Tokelau
&nbsp;&nbsp;&nbsp;214. Tonga
&nbsp;&nbsp;&nbsp;215. Trinidad and Tobago
&nbsp;&nbsp;&nbsp;216. Tunisia
&nbsp;&nbsp;&nbsp;217. Turkey
&nbsp;&nbsp;&nbsp;218. Turkmenistan
&nbsp;&nbsp;&nbsp;219. Turks and Caicos Islands
&nbsp;&nbsp;&nbsp;220. Tuvalu
&nbsp;&nbsp;&nbsp;221. U.S. Virgin Islands
&nbsp;&nbsp;&nbsp;222. Uganda
&nbsp;&nbsp;&nbsp;223. Ukraine
&nbsp;&nbsp;&nbsp;224. United Arab Emirates
&nbsp;&nbsp;&nbsp;225. United Kingdom
&nbsp;&nbsp;&nbsp;226. United States
&nbsp;&nbsp;&nbsp;227. Uruguay
&nbsp;&nbsp;&nbsp;228. Uzbekistan
&nbsp;&nbsp;&nbsp;229. Vanuatu
&nbsp;&nbsp;&nbsp;230. Vatican City
&nbsp;&nbsp;&nbsp;231. Venezuela
&nbsp;&nbsp;&nbsp;232. Vietnam
&nbsp;&nbsp;&nbsp;233. Wallis and Futuna
&nbsp;&nbsp;&nbsp;234. Western Sahara
&nbsp;&nbsp;&nbsp;235. World
&nbsp;&nbsp;&nbsp;236. Yemen
&nbsp;&nbsp;&nbsp;237. Zambia
&nbsp;&nbsp;&nbsp;238. Zimbabwe

In [45]:
# Save the cleaned country list to a .txt file
with open("countries_list.txt", "w", encoding="utf-8") as f:
    for country in countries:
        f.write(f"{country}\n")



In [46]:
# Step 4: Clean and sort
countries = sorted(set(countries))

# Step 6: Save to txt
with open("countries_list.txt", "w", encoding="utf-8") as f:
    for country in countries:
        f.write(f"{country}\n")



In [48]:
import re

# Step 1: Remove unwanted text like brackets and notes [a], (France), etc.
cleaned_countries = []
for c in countries:
    # Remove footnotes like [a], [f], [w], [x], etc.
    c = re.sub(r"\[[^\]]*\]", "", c)
    # Optional: remove info in parentheses like (France), (UK) if needed
    # c = re.sub(r"\([^)]*\)", "", c)
    c = c.strip()
    if c and c.lower() != 'world':  # optional: remove "World"
        cleaned_countries.append(c)

# Step 2: Deduplicate and sort
cleaned_countries = sorted(set(cleaned_countries))

# Step 3: Save clean list to txt
with open("countries_list.txt", "w", encoding="utf-8") as f:
    for country in cleaned_countries:
        f.write(f"{country}\n")



### ✅ Bonus Task: Exporting Cleaned Country List

As part of the bonus task, I created and exported a cleaned list of country names into a `.txt` file named `countries_list.txt`.

**What I did:**
- ✅ Removed reference letters (e.g., `[g]`, `[w]`, `[b]`, etc.)
- ✅ Stripped away suffixes like `(France)`, `(United Kingdom)`, etc.
- ✅ Sorted the countries alphabetically
- ✅ Removed duplicates and formatting issues
- ✅ Applied clean numbering without visual overlap

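The notebook display lists **238 entries** (sovereign states, dependent territories, and the aggregate `World` row). The final export drops `World`, leaving 237 names, and is:
- Sorted alphabetically, one name per line
- Stored in `countries_list.txt`
- Displayed as a numbered list in the notebook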

This cleaned `.txt` file is now ready for use in any future project or visualization task. 📄🌍
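The clean-up steps listed above are spread over several cells. As a rough consolidation, they could be folded into one helper. This is a sketch under assumptions: the function name `clean_country_names`, its `drop` parameter, and the sample input are hypothetical and not part of the original notebook.

```python
import re


def clean_country_names(raw_names, drop=("World",)):
    """Strip footnotes and parenthetical suffixes, de-duplicate, and sort."""
    cleaned = set()
    for name in raw_names:
        name = re.sub(r"\[[^\]]*\]", "", name)      # footnote markers like [a], [w]
        name = re.sub(r"\s*\([^)]*\)", "", name)    # suffixes like (France)
        name = re.sub(r"\s{2,}", " ", name).strip() # collapse leftover spaces
        if name and name not in drop:               # skip blanks and aggregate rows
            cleaned.add(name)
    return sorted(cleaned)


# Hypothetical sample resembling the scraped values
sample = [
    "China[a]",
    "Aruba (Netherlands)",
    "Saint Helena (United Kingdom)[w]",
    "World",
    "Fiji",
    "Fiji",
]
result = clean_country_names(sample)
print(result)  # ['Aruba', 'China', 'Fiji', 'Saint Helena']
```

Keeping the rules in one function makes it harder for the displayed list and the exported file to drift apart, since both can be built from the same return value.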
