In [25]:
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
import pandas as pd

In [26]:
# Text extracted from the webpage
text = """
Zimbabwe and several other Southern African countries are expected to suffer drought conditions in the annual rain season from October 2023 to April 2024, which coincides with the regional summer cropping season, according weather scientists. The forecast drought conditions are traced to a weather pattern called El Niño, which has traditionally badly affected farming production in Zimbabwe.

El Niño refers to a cycle of warming and cooling events that happens along the equator in the Pacific Ocean leading to an increase in sea surface temperatures across the Pacific. The warming phase of the phenomenon called El Niño Southern Oscillation (ENSO) stimulates drought conditions. ENSO creates both dry and hot conditions that negatively affect food crops. The cooling part of the cycle is called La Niña and has the opposite effect.

The World Meteorological Organization (WMO) says that El Niño conditions have developed in the tropical Pacific for the first time in seven years, setting the stage for a likely surge in global temperatures and disruptive weather and climate patterns. The WMO statement came on the eve of Zimbabwe’s summer cropping season.

It causes drought and heatwaves, affects water supply for domestic, animal and industrial use and hits farming output for rain-fed agriculture.

The El Niño poses a threat to the agricultural livelihoods of millions of people globally. In Southern Africa and Zimbabwe in particular, the impacts of El Niño have been felt across all sectors affecting the most vulnerable communities.

El Niño and La Niña events happen every two to seven years, on average, but they don’t occur on a regular schedule.

They usually last for 9-12 months but have been known to last for several years at a time.

El Niño affects weather and storm patterns in different parts of the world.

El Nino come in different varieties (no two El Niño events are exactly alike in intensity).

In February 2016, former Zimbabwe President Robert Mugabe declared “ A state of disaster” following a drought triggered by El Niño, which left 2.44 million people struggling for food. The following was the recorded impact:

Some 75 percent of Zimbabwe received less than normal rainfall.

Severe livestock deaths – 17,000 were recorded in 2016.

Crop failure, yield reduction leading to drought.

Grain shortfall of 1.5 million tonnes.

Increased household food insecurity as a result of loss of income. At least 70 percent of food production depends on peasant agriculture, with a majority of farmers in this category having no access to water for irrigation purposes.

Outbreak of water borne diseases as people access water from insecure sources.

Deepening poverty.

Immediately commit resources to fund early action.

Initiate collaborative action between private and public actors to mitigate the socio-economic and environmental risk posed by El Niño.

Facilitate awareness campaigns to educate the public on the strategies to mitigate the impact of El Niño.

Stepping up investment in resilience building for sustainable rural agriculture in Southern Africa.

Upscaling social protection, adopting SMART agriculture techniques, climate change mitigation through smart livelihoods options, and management of natural resources.

The rural communities should make use of indigenous knowledge to enhance and ensure crop and livestock production.

Put in place resilient water management systems.

Put vulnerable groups at the centre of the design and implementation of anticipatory action.

Learn modern agricultural practices.

Use drought resistant varieties.

Do due diligence by seeking expert advice on the best crop choice to plant during El Niño.
"""

In [20]:
# Initialize and fit the TfidfVectorizer object
tfidf_vectorizer = TfidfVectorizer()
tfidf_matrix = tfidf_vectorizer.fit_transform([text])


In [21]:
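# Get the learned vocabulary. get_feature_names() was removed in scikit-learn 1.2;
# get_feature_names_out() returns a NumPy array, so convert it to a list for printing.
feature_names = tfidf_vectorizer.get_feature_names_out().tolist()
print(feature_names)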

['000', '12', '17', '2016', '2023', '2024', '44', '70', '75', 'access', 'according', 'across', 'action', 'actors', 'adopting', 'advice', 'affect', 'affected', 'affecting', 'affects', 'africa', 'african', 'agricultural', 'agriculture', 'alike', 'all', 'along', 'an', 'and', 'animal', 'annual', 'anticipatory', 'april', 'are', 'as', 'at', 'average', 'awareness', 'badly', 'been', 'best', 'between', 'borne', 'both', 'building', 'but', 'by', 'called', 'came', 'campaigns', 'category', 'causes', 'centre', 'change', 'choice', 'climate', 'coincides', 'collaborative', 'come', 'commit', 'communities', 'conditions', 'cooling', 'countries', 'creates', 'crop', 'cropping', 'crops', 'cycle', 'deaths', 'declared', 'deepening', 'depends', 'design', 'developed', 'different', 'diligence', 'disaster', 'diseases', 'disruptive', 'do', 'domestic', 'don', 'drought', 'dry', 'due', 'during', 'early', 'economic', 'educate', 'effect', 'el', 'enhance', 'enso', 'ensure', 'environmental', 'equator', 'eve', 'events', 'e

In [31]:
# Convert the sparse TF-IDF matrix to a dense DataFrame with one column per vocabulary term
tfidf_df = pd.DataFrame(tfidf_matrix.toarray(), columns=tfidf_vectorizer.get_feature_names_out())


In [32]:
# Display TF-IDF results
print("\nTF-IDF Representation:\n")
print(tfidf_df.head())


TF-IDF Representation:

        000        12        17      2016      2023      2024        44  \
0  0.016616  0.016616  0.016616  0.033232  0.016616  0.016616  0.016616   

         70        75    access  ...    water   weather      were     which  \
0  0.016616  0.016616  0.033232  ...  0.08308  0.066464  0.016616  0.049848   

       with       wmo     world     years     yield  zimbabwe  
0  0.033232  0.033232  0.033232  0.049848  0.016616  0.099696  

[1 rows x 302 columns]


In [33]:
# Create Bag of Words representation
bow_vectorizer = CountVectorizer()
bow_matrix = bow_vectorizer.fit_transform([text])

In [35]:
# Convert BoW matrix to DataFrame
bow_df = pd.DataFrame(bow_matrix.toarray(), columns=bow_vectorizer.get_feature_names_out())

In [36]:
# Display Bag of Words results
print("\nBag of Words Representation:\n")
print(bow_df.head())


Bag of Words Representation:

   000  12  17  2016  2023  2024  44  70  75  access  ...  water  weather  \
0    1   1   1     2     1     1   1   1   1       2  ...      5        4   

   were  which  with  wmo  world  years  yield  zimbabwe  
0     1      3     2    2      2      3      1         6  

[1 rows x 302 columns]


# Explanation


TF-IDF Representation

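    TF-IDF (Term Frequency-Inverse Document Frequency) weights each word by how often it occurs in a document, scaled down by how many documents in the corpus contain it.

    Each word is assigned a weight based on its importance:

        High weight: Words that occur often in this document but rarely in the rest of the corpus.

        Low weight: Common words that appear across many documents (e.g., "the", "and").

    Because the vectorizer here is fit on a single document, every word receives the same IDF, so the TF-IDF weights are simply the Bag of Words counts divided by the vector's L2 norm (e.g., "zimbabwe": 6 × 0.016616 = 0.099696; "water": 5 × 0.016616 = 0.08308).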
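A minimal pure-Python sketch of the computation, assuming scikit-learn's defaults for `TfidfVectorizer`: lowercasing, the token pattern `(?u)\b\w\w+\b`, raw counts as term frequency, smoothed IDF `ln((1 + n) / (1 + df)) + 1`, and L2 row normalization. The helpers `tokenize` and `tfidf` are hypothetical names for illustration, not the library's implementation:

```python
import math
import re
from collections import Counter

# Assumed to match scikit-learn's default token_pattern: runs of 2+ word characters.
TOKEN_RE = re.compile(r"(?u)\b\w\w+\b")


def tokenize(doc):
    """Lowercase the document and split it into tokens."""
    return TOKEN_RE.findall(doc.lower())


def tfidf(docs):
    """Return (rows, idf): one {term: weight} dict per document, plus the IDF table."""
    counts = [Counter(tokenize(d)) for d in docs]
    n = len(docs)
    # Document frequency: iterating a Counter yields each distinct term once.
    df = Counter(term for c in counts for term in c)
    # Smoothed IDF, as with smooth_idf=True.
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in df}
    rows = []
    for c in counts:
        raw = {t: tf * idf[t] for t, tf in c.items()}
        norm = math.sqrt(sum(w * w for w in raw.values()))
        # L2-normalize each row; an empty document stays an empty row.
        rows.append({t: w / norm for t, w in raw.items()} if norm else {})
    return rows, idf


# With a single document, n = df = 1 for every term, so every IDF is 1.
rows, idf = tfidf(["Zimbabwe drought Zimbabwe"])
print(idf)  # {'zimbabwe': 1.0, 'drought': 1.0}
print(rows)
```

Passing several documents instead of one gives terms that occur in fewer documents a larger IDF, which is what lets TF-IDF separate distinctive words from common ones.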


Bag of Words Representation

    Bag of Words counts how many times each word occurs in a document, ignoring word order, context, and importance.

    Each word is represented by its frequency:

        Example: "zimbabwe" appears six times in the text, so its count is 6.
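The counting step can be sketched in plain Python, assuming `CountVectorizer`'s default lowercasing, its default token pattern, and its alphabetically sorted vocabulary. `bag_of_words` is a hypothetical helper, not part of scikit-learn:

```python
import re
from collections import Counter

# Assumed to match CountVectorizer's default token_pattern (tokens of 2+ word characters).
TOKEN_RE = re.compile(r"(?u)\b\w\w+\b")


def bag_of_words(docs):
    """Return (vocabulary, matrix): sorted terms and one count row per document."""
    token_counts = [Counter(TOKEN_RE.findall(d.lower())) for d in docs]
    vocabulary = sorted(set().union(*token_counts))
    # Counter returns 0 for terms a document does not contain.
    matrix = [[c[t] for t in vocabulary] for c in token_counts]
    return vocabulary, matrix


vocab, matrix = bag_of_words(["Zimbabwe expects drought; drought hits Zimbabwe farms."])
print(dict(zip(vocab, matrix[0])))
# {'drought': 2, 'expects': 1, 'farms': 1, 'hits': 1, 'zimbabwe': 2}
```

Word order is discarded: "drought hits Zimbabwe" and "Zimbabwe hits drought" produce identical rows.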


# Conclusion


    TF-IDF is suitable for identifying distinctive terms related to climate change impacts (e.g., "drought", or "el" and "niño", which the default unigram tokenizer splits into separate features), provided it is fit on a corpus of several documents rather than a single text.

    Bag of Words is useful for analyzing raw word frequencies without weighting their importance.

    Both methods provide useful insights into text data extracted from SCM-related contexts such as climate change impacts on agriculture in Zimbabwe.
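For IDF to discriminate at all, the text has to be split into several documents. The sketch below assumes scikit-learn's smoothed IDF formula and uses a hypothetical `top_terms` helper with a made-up toy corpus. It ranks each document's terms by raw count and by TF-IDF. A word that occurs in every document (here "zimbabwe") is pushed down by TF-IDF but not by counting:

```python
import math
import re
from collections import Counter

# Assumed to match scikit-learn's default token_pattern.
TOKEN_RE = re.compile(r"(?u)\b\w\w+\b")


def top_terms(docs, k=3):
    """For each document, return (top-k terms by count, top-k terms by TF-IDF).

    TF-IDF uses the smoothed IDF ln((1 + n) / (1 + df)) + 1. L2 normalization is
    skipped because it rescales a whole row and so cannot change the ranking within it.
    """
    counts = [Counter(TOKEN_RE.findall(d.lower())) for d in docs]
    n = len(docs)
    df = Counter(t for c in counts for t in c)
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in df}
    result = []
    for c in counts:
        # Sort by score descending, then alphabetically so ties are deterministic.
        by_count = sorted(c, key=lambda t: (-c[t], t))[:k]
        by_tfidf = sorted(c, key=lambda t: (-c[t] * idf[t], t))[:k]
        result.append((by_count, by_tfidf))
    return result


docs = ["zimbabwe zimbabwe drought", "zimbabwe rain", "zimbabwe maize",
        "zimbabwe heat", "zimbabwe cattle"]
by_count, by_tfidf = top_terms(docs, k=2)[0]
print("BoW:", by_count, "TF-IDF:", by_tfidf)
# BoW: ['zimbabwe', 'drought'] TF-IDF: ['drought', 'zimbabwe']
```

With the notebook's own `text`, one option is to treat each paragraph as a document, e.g. `top_terms([p for p in text.split("\n\n") if p.strip()])`. This is an assumption for illustration, not part of the original analysis.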
