## Data import

After trying to identify the companies mentioned in the original texts, we ran into numerous problems. As a result, and with the agreement of our Stat_app supervisor, we chose to insert the names of certain companies into the original articles. This notebook applies that approach.

In [3]:
import pandas as pd

In [4]:
# Load the data from the pickle file
data = pd.read_pickle('data.pkl')
data.head(5)

Unnamed: 0,Article,Date,Auteur,Nombre de mots,Journal,Titre,ID
0,\nMetropolitan Desk; SECTMB\nCan an Ambitious ...,31 December 2023,Nick Tabor,529,New York Times,Copyright 2023 The New York Times Company. Al...,NYTF000020240104ejcv0000d
1,\n\nMagazine Desk; SECTMM\nWhen Jim Brown and ...,31 December 2023,Wesley Morris,422,New York Times,"When Jim Brown and Raquel Welch, Two Sexy Star...",NYTF000020231231ejcv0006h
2,\n\nMagazine Desk; SECTMK\nTalking During Movi...,31 December 2023,,179,New York Times,Talking During Movies: Totally Evil or Part of...,NYTF000020231231ejcv00064
3,\n\nMagazine Desk; SECTMK\nLet Kids Vote!\n\n4...,31 December 2023,,454,New York Times,Let Kids Vote!,NYTF000020231231ejcv00063
4,\n\nMagazine Desk; SECTMK\nAre We Doomed to Di...,31 December 2023,Christina Caron,428,New York Times,Are We Doomed to Disagree?,NYTF000020231231ejcv0005z


## Cleaning the articles

Now that the metadata for each text (Date, Auteur, Nombre de mots, etc.) is stored in separate columns, we can keep only the body of the article:

In [5]:
data.insert(1, 'Copy_Article', data['Article'])
data.head(5)

Unnamed: 0,Article,Copy_Article,Date,Auteur,Nombre de mots,Journal,Titre,ID
0,\nMetropolitan Desk; SECTMB\nCan an Ambitious ...,\nMetropolitan Desk; SECTMB\nCan an Ambitious ...,31 December 2023,Nick Tabor,529,New York Times,Copyright 2023 The New York Times Company. Al...,NYTF000020240104ejcv0000d
1,\n\nMagazine Desk; SECTMM\nWhen Jim Brown and ...,\n\nMagazine Desk; SECTMM\nWhen Jim Brown and ...,31 December 2023,Wesley Morris,422,New York Times,"When Jim Brown and Raquel Welch, Two Sexy Star...",NYTF000020231231ejcv0006h
2,\n\nMagazine Desk; SECTMK\nTalking During Movi...,\n\nMagazine Desk; SECTMK\nTalking During Movi...,31 December 2023,,179,New York Times,Talking During Movies: Totally Evil or Part of...,NYTF000020231231ejcv00064
3,\n\nMagazine Desk; SECTMK\nLet Kids Vote!\n\n4...,\n\nMagazine Desk; SECTMK\nLet Kids Vote!\n\n4...,31 December 2023,,454,New York Times,Let Kids Vote!,NYTF000020231231ejcv00063
4,\n\nMagazine Desk; SECTMK\nAre We Doomed to Di...,\n\nMagazine Desk; SECTMK\nAre We Doomed to Di...,31 December 2023,Christina Caron,428,New York Times,Are We Doomed to Disagree?,NYTF000020231231ejcv0005z


First, we remove the article ID, which is located at the end of the text.

In [6]:
import re

# Remove everything from the first match of any of the given patterns onward
def supprimer_texte_apres_motif(article, motifs):
    motif = "|".join(motifs)  # Combine the patterns into a single alternation
    match = re.search(motif, article)
    if match:
        return article[:match.start()]
    else:
        return article

# Apply supprimer_texte_apres_motif to the 'Copy_Article' column with a list of patterns
# (raw strings keep \d from being read as a string escape)
data['Copy_Article'] = data['Copy_Article'].apply(lambda x: supprimer_texte_apres_motif(x, [r"Document J\d+", r"Document NYTF\d+"]))
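To see the truncation step in isolation, here is a minimal sketch on a made-up string. The helper name `truncate_at_pattern` and the sample text are assumptions for illustration; only the two `Document` patterns are taken from the cell above.

```python
import re

def truncate_at_pattern(text, patterns):
    """Return the text up to the first match of any pattern (hypothetical helper)."""
    match = re.search("|".join(patterns), text)
    return text[:match.start()] if match else text

# Made-up sample that mimics the trailing ID line of an article
sample = "Body of the article.\n\nDocument NYTF000020231231ejcv0006h\n"
patterns = [r"Document J\d+", r"Document NYTF\d+"]

print(repr(truncate_at_pattern(sample, patterns)))
print(repr(truncate_at_pattern("No trailing ID here.", patterns)))
```

A text without any matching ID line is returned unchanged, which is why the function can be applied to the whole column without a prior filter.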

Check:

In [7]:
for i in range(1, 5):
    print("Last characters of", f"text_{i}", "before removal\n\n", data['Article'][i][-50:-1])
    print("Last characters of", f"text_{i}", "after removal\n\n", data['Copy_Article'][i][-50:-1])
    print("-----------------------------------------------------------------------------")

Last characters of text_1 before removal

 M21, MM22. 

Document NYTF000020231231ejcv0006h


Last characters of text_1 after removal

 cle appeared in print on page MM20, MM21, MM22. 

-----------------------------------------------------------------------------
Last characters of text_2 before removal

 K4, MK5. 

Document NYTF000020231231ejcv00064
 


Last characters of text_2 after removal

 his article appeared in print on page MK4, MK5. 

-----------------------------------------------------------------------------
Last characters of text_3 before removal

 age MK3. 

Document NYTF000020231231ejcv00063
 


Last characters of text_3 after removal

 o.

This article appeared in print on page MK3. 

-----------------------------------------------------------------------------
Last characters of text_4 before removal

 ge MK11. 

Document NYTF000020231231ejcv0005z
 


Last characters of text_4 after removal

 Š

This article 

Next, we remove everything up to and including "All Rights Reserved.", which corresponds to the metadata portion of the text (author, etc.).

In [8]:
# Remove everything up to and including the first match of the given pattern
def supprimer_texte_avant_motif(article, motif):
    match = re.search(motif, article)
    if match:
        return article[match.end():]
    else:
        return article

# Apply supprimer_texte_avant_motif to the 'Copy_Article' column
# (the final dot is escaped so that it matches a literal period)
data['Copy_Article'] = data['Copy_Article'].apply(lambda x: supprimer_texte_avant_motif(x, r"All Rights Reserved\."))
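The header removal can be tried on a made-up string as follows. This is only a sketch: the sample text is invented, and `re.escape` is used here as one way to make sure the final period of the marker is matched literally.

```python
import re

# Marker taken from the notebook; re.escape turns the final "." into a literal period
marker = re.escape("All Rights Reserved.")

# Invented sample: a metadata header followed by the body of the article
sample = (
    "By Someone\n529 words\n"
    "Copyright 2023 The New York Times Company. All Rights Reserved.\n\n"
    "Body text."
)

match = re.search(marker, sample)
body = sample[match.end():] if match else sample
print(repr(body))
```

Slicing from `match.end()` drops the marker itself along with the header, so only the whitespace and the body remain.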

Check:

In [9]:
for i in range(1, 3):
    print("First characters of", f"text_{i}", "before removal\n\n", data['Article'][i][0:100],"\n\n\n")
    print("First characters of", f"text_{i}", "after removal\n\n", data['Copy_Article'][i][0:100])
    print("-----------------------------------------------------------------------------")

First characters of text_1 before removal

 

Magazine Desk; SECTMM
When Jim Brown and Raquel Welch, Two Sexy Stars, Crossed Paths

By Wesley Mo 



First characters of text_1 after removal

  

In their one movie together, their chemistry was radical.

Jim Brown & Raquel Welch B. 1936 and 1
-----------------------------------------------------------------------------
First characters of text_2 before removal

 

Magazine Desk; SECTMK
Talking During Movies: Totally Evil or Part of the Fun?

179 words
31 Decemb 



First characters of text_2 after removal

  

debatethis

Talking during movies: Totally evil or part of the fun?

show of hands

The biggest p
-----------------------------------------------------------------------------


In [10]:
data.rename(columns={'Copy_Article': 'Coeur_Article'}, inplace=True)
data.head(5)

Unnamed: 0,Article,Coeur_Article,Date,Auteur,Nombre de mots,Journal,Titre,ID
0,\nMetropolitan Desk; SECTMB\nCan an Ambitious ...,"\n\nStony Brook University, one of two state ...",31 December 2023,Nick Tabor,529,New York Times,Copyright 2023 The New York Times Company. Al...,NYTF000020240104ejcv0000d
1,\n\nMagazine Desk; SECTMM\nWhen Jim Brown and ...,"\n\nIn their one movie together, their chemis...",31 December 2023,Wesley Morris,422,New York Times,"When Jim Brown and Raquel Welch, Two Sexy Star...",NYTF000020231231ejcv0006h
2,\n\nMagazine Desk; SECTMK\nTalking During Movi...,\n\ndebatethis\n\nTalking during movies: Tota...,31 December 2023,,179,New York Times,Talking During Movies: Totally Evil or Part of...,NYTF000020231231ejcv00064
3,\n\nMagazine Desk; SECTMK\nLet Kids Vote!\n\n4...,\n\nLET KIDS\n\nVOTE!\n\nby Katherine Cusuman...,31 December 2023,,454,New York Times,Let Kids Vote!,NYTF000020231231ejcv00063
4,\n\nMagazine Desk; SECTMK\nAre We Doomed to Di...,\n\nare we DOOMED TO DISAGREE?\n\nwhy it's so...,31 December 2023,Christina Caron,428,New York Times,Are We Doomed to Disagree?,NYTF000020231231ejcv0005z


## Modifying the articles

### Function for inserting strings into an article

In [11]:
import random

def insertion_phrase_dans_article(phrase, article):
    # Find the positions of all periods in the article
    emplacements_points = [i for i, char in enumerate(article) if char == '.']

    # Check whether the article contains any period
    if emplacements_points:
        # Pick one of the period positions at random
        indice_insertion = random.choice(emplacements_points)
        # Insert the sentence right after the chosen period
        article = article[:indice_insertion+1] + " " + phrase + article[indice_insertion+1:]
    else:
        # If there is no period, insert the sentence at the beginning of the article
        article = phrase + " " + article
    return article
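Because every `.` is a candidate position, the insertion point can fall inside an abbreviation such as "Calif.," in the middle of a sentence. The sketch below is one possible variant, not the function used in this notebook: it only considers periods followed by whitespace or by the end of the text, and it takes an optional random generator so that a run can be reproduced. The name `insert_after_sentence` and the `rng` parameter are assumptions for illustration.

```python
import random
import re

def insert_after_sentence(phrase, article, rng=None):
    """Insert phrase after a randomly chosen sentence-ending period (hypothetical variant)."""
    rng = rng or random.Random()
    # Candidate positions: index just after a period followed by whitespace or end of text
    positions = [m.end() for m in re.finditer(r"\.(?=\s|$)", article)]
    if not positions:
        # No usable period: fall back to inserting at the beginning
        return phrase + " " + article
    pos = rng.choice(positions)
    return article[:pos] + " " + phrase + article[pos:]

print(insert_after_sentence("X.", "One. Two.", random.Random(0)))
```

This heuristic is still imperfect, since an abbreviation followed by a space (for example "B. 1936") remains a candidate position.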

Check:

In [12]:
phrase_a_inserer = "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA."
article_insertion = data['Coeur_Article'][3]
article_insertion = insertion_phrase_dans_article(phrase_a_inserer, article_insertion)

# Print the article with the inserted sentence
print(article_insertion)

 

LET KIDS

VOTE!

by Katherine Cusumano

Julia Rottenberg, 17, spent the fall of last year knocking on doors. On Election Day 2022, people in Culver City, Calif., her hometown, would have a big decision to make: Should the voting age for local elections change from 18 to 16? Julia wanted them to vote yes. ''I think a vote is one of the most direct ways that you can express an opinion and actually have some change happen,'' says Julia, who is part of an organization called Vote16 Culver City.

What's the argument for giving kids the vote? Well, as you may have noticed, there are a lot of decisions being made (or not) about things that affect kids directly, like climate change or gun violence or school resources. Yes, young people are already leading political movements around these issues. AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA. But without the vote, they can't elect politicians who represe

### Method 1

This method starts from the English environmental dictionary, from which we build a series of fill-in-the-blank sentence templates. Each template is filled at random with a company name and with terms from the dictionary. The resulting sentences can carry a positive, negative, or neutral (mixed) sentiment from an environmental point of view. For each company, we generate a random number of sentences about it. All of these sentences (which concern a single company) are then inserted into one and the same article, so that 1 article = 1 identified company.
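On a very small scale, the template-filling principle can be sketched as follows. The vocabulary, the three templates, the company name "Acme Corp" and the helper `fill_template` are all illustrative assumptions; the notebook's own dictionary and templates are much larger.

```python
import random

# Toy vocabulary and templates (illustrative assumptions, not the notebook's full lists)
toy_terms = {"sustainable": 1, "renewable energy": 1, "pollution": -1, "deforestation": -1}
templates = {
    "positive": "{company} adopts a {pos} strategy.",
    "negative": "{company} is criticized for its failure to address {neg}.",
    "mixed": "{company} is researching {pos} solutions for {neg}.",
}

def fill_template(company, kind, rng):
    """Fill one template with the company name and randomly drawn terms."""
    pos = rng.choice([t for t, s in toy_terms.items() if s == 1])
    neg = rng.choice([t for t, s in toy_terms.items() if s == -1])
    # str.format ignores keyword arguments that a template does not use
    return templates[kind].format(company=company, pos=pos, neg=neg)

rng = random.Random(0)
for kind in templates:
    print(kind, "->", fill_template("Acme Corp", kind, rng))
```

Keeping the template kind next to each generated sentence would make it possible to retain the intended sentiment as a label, should that be needed later.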

#### Creating the sentences to insert

In [13]:
Dico_env_en = {
    
    "clean": 1,
    "ecological": 1,
    "sustainable": 1,
    "green": 1,
    "energy-efficient": 1,
    "renewable": 1,
    "responsible": 1,
    "conservation": 1,
    "biodiversity": 1,
    "healthy": 1,
    "organic": 1,
    "eco-friendly": 1,
    "environmentally friendly": 1,
    "efficient": 1,
    "innovative": 1,
    "ethical": 1,
    "fair": 1,
    "efficiency": 1,
    "social responsibility": 1,
    "solidarity": 1,
    "conscious spreading": 1,
    "clean energy": 1,
    "renewable energy": 1,
    "recycling": 1,
    "energy efficiency": 1,
    "circular economy": 1,
    "solar energy": 1,
    "wind energy": 1,
    "regeneration": 1,
    "preservation": 1,
    "restoration": 1,
    "rehabilitation": 1,
    "recovery": 1,
    "restorer": 1,
    "regenerator": 1,
    "revitalization": 1,
    "positive": 1,
    "beneficial": 1,
    "valorization": 1,
    "fulfillment": 1,
    "continuous improvement": 1,
    "prosperity": 1,
    "harmony": 1,
    "integrity": 1,
    "responsible consumption": 1,
    "eco-responsible": 1,
    "eco-conscious": 1,
    "sustainability": 1,
    "recoverable": 1,
    "green energy": 1,
    "greenhouse effect": 1,
    "eco-efficient": 1,
    "eco-innovation": 1,
    "well-being": 1,
    "eco-design": 1,
    "agroecology": 1,
    "permaculture": 1,
    "eco-citizen": 1,
    "carbon neutral": 1,
    "zero waste": 1,
    "eco-label": 1,
    "sustainable mobility": 1,
    "eco-tourism": 1,
    "eco-habitat": 1,
    "conscious consumption": 1,
    
    "pollution": -1,
    "waste": -1,
    "deforestation": -1,
    "greenhouse gas emissions": -1,
    "contamination": -1,
    "destructive": -1,
    "irresponsible": -1,
    "wasteful": -1,
    "harmful": -1,
    "toxic": -1,
    "deterioration": -1,
    "degradation": -1,
    "damaging": -1,
    "perilous": -1,
    "worrisome": -1,
    "catastrophic": -1,
    "catastrophe": -1,
    "dangerous": -1,
    "threat": -1,
    "risk": -1,
    "hazardous": -1,
    "inappropriate": -1,
    "inadequate": -1,
    "harm": -1,
    "damage": -1,
    "pollutant": -1,
    "pollute": -1,
    "deteriorate": -1,
    "disruption": -1,
    "disrespectful": -1,
    "malevolent": -1,
    "aggressive": -1,
    "ravager": -1,
    "spoil": -1,
    "disturb": -1,
    "irreparable": -1,
    "toxicity": -1,
    "unacceptable": -1,
    "ecological damage": -1,
    "illegal logging": -1,
    "overconsumption": -1,
    "resource plundering": -1,
    "environmental degradation": -1,
    "destroyed natural habitat": -1,
    "excessive exploitation": -1,
    "overexploitation": -1,
    "climate change": -1,
    "environmental denial": -1,
}

negation_list = ["not", "no", "never", "none", "nil", "nothing", "nobody", "negative", "without", "more", "less"]

negation_cancellation_list = ["responsible", "originally", "source"]

In [14]:
import pandas as pd
import random

# Lists of sentence templates
def generate_positive_structures(company, positive_terms):
    return [
        f"The company {company} is committed to a {random.choice(positive_terms)[0]} approach to promote {random.choice(positive_terms)[0]}.",
        f"Thanks to its {random.choice(positive_terms)[0]} initiative, {company} strengthens its commitment to {random.choice(positive_terms)[0]}.",
        f"{company} implements {random.choice(positive_terms)[0]} practices to support {random.choice(positive_terms)[0]}.",
        f"As a {random.choice(positive_terms)[0]} company, {company} takes measures to encourage {random.choice(positive_terms)[0]}.",
        f"{company} communicates about its {random.choice(positive_terms)[0]} commitment and its positive contribution to {random.choice(positive_terms)[0]}.",
        f"{company} is recognized for its {random.choice(positive_terms)[0]} approach and its positive impact on {random.choice(positive_terms)[0]}.",
        f"Through its {random.choice(positive_terms)[0]} actions, {company} aims to improve {random.choice(positive_terms)[0]}.",
        f"{company} adopts a {random.choice(positive_terms)[0]} strategy to promote {random.choice(positive_terms)[0]}.",
        f"The {random.choice(positive_terms)[0]} approach of {company} reflects its commitment to {random.choice(positive_terms)[0]}.",
        f"{company} values its {random.choice(positive_terms)[0]} commitment and its respect for {random.choice(positive_terms)[0]}."
    ]

def generate_negative_structures(company, negative_terms):
    return [
        f"The company {company} is criticized for its lack of commitment to {random.choice(negative_terms)[0]}.",
        f"{company} is singled out for its {random.choice(negative_terms)[0]}.",
        f"{company}'s {random.choice(negative_terms)[0]} practices have raised concerns among environmentalists.",
        f"{company} faces scrutiny for its {random.choice(negative_terms)[0]} approach.",
        f"Some question {company}'s commitment due to its {random.choice(negative_terms)[0]}.",
        f"{company} is under fire for its {random.choice(negative_terms)[0]} strategy.",
        f"Concerns are raised about {company}'s {random.choice(negative_terms)[0]} practices.",
        f"{company} is criticized for its failure to address {random.choice(negative_terms)[0]}.",
        f"{company}'s {random.choice(negative_terms)[0]} initiative is viewed with skepticism.",
        f"{company} is blamed for its {random.choice(negative_terms)[0]} impact."
    ]

def generate_mixed_structures(company, positive_terms, negative_terms):
    return [
        f"{company} is exploring {random.choice(positive_terms)[0]} initiatives to address {random.choice(negative_terms)[0]}.",
        f"The company {company} is researching {random.choice(positive_terms)[0]} solutions for {random.choice(negative_terms)[0]}.",
        f"{company} is developing {random.choice(positive_terms)[0]} practices while managing {random.choice(negative_terms)[0]}.",
        f"The approach of {company} involves {random.choice(positive_terms)[0]} methods to mitigate {random.choice(negative_terms)[0]}.",
        f"{company}'s {random.choice(positive_terms)[0]} efforts are focused on {random.choice(negative_terms)[0]}.",
        f"{company} is committed to {random.choice(positive_terms)[0]} actions and addressing {random.choice(negative_terms)[0]}.",
        f"{company} integrates {random.choice(positive_terms)[0]} strategies with {random.choice(negative_terms)[0]} management.",
        f"The company {company} emphasizes {random.choice(positive_terms)[0]} practices alongside {random.choice(negative_terms)[0]}.",
        f"{company} implements {random.choice(positive_terms)[0]} measures while considering {random.choice(negative_terms)[0]}.",
        f"{company} is dedicated to {random.choice(positive_terms)[0]} approaches and {random.choice(negative_terms)[0]} initiatives."
    ]


# Adjust a term's score when the term is a negation or negation-cancelling word
def handle_negation(term, score):
    if term in negation_list:
        return -score
    elif term in negation_cancellation_list:
        return 0
    else:
        return score

# Generate sentences about the environmental communication of each company
def generate_environmental_communication(company_list, env_dict, a, b):
    positive_terms = [(term, score) for term, score in env_dict.items() if score == 1]
    negative_terms = [(term, score) for term, score in env_dict.items() if score == -1]
    
    company_sentences = {}  # Dictionary grouping the sentences by company
    
    for company in company_list:
        num_sentences = random.randint(a, b)
        sentences = []
        
        for _ in range(num_sentences):
            if random.choice([True, False]):
                structures = generate_positive_structures(company, positive_terms)
            else:
                if negative_terms:
                    structures = generate_negative_structures(company, negative_terms) + generate_mixed_structures(company, positive_terms, negative_terms)
                else:
                    # No negative terms available: mixed templates would fail on an empty list, so use positive ones only
                    structures = generate_positive_structures(company, positive_terms)

            sentence = random.choice(structures)
            
            # Handle negation terms in the sentence
            sentence_words = sentence.split()
            for i, word in enumerate(sentence_words):
                if word.lower() in [term.lower() for term, _ in positive_terms + negative_terms]:
                    original_score = next((score for term, score in positive_terms + negative_terms if term.lower() == word.lower()), None)
                    if original_score:
                        new_score = handle_negation(word.lower(), original_score)
                        if new_score != original_score:
                            replacement = next((term for term, score in env_dict.items() if score == new_score), None)
                            if replacement:
                                sentence_words[i] = replacement

            # Rebuild the modified sentence
            modified_sentence = ' '.join(sentence_words)
            sentences.append(modified_sentence)
        
        company_sentences[company] = sentences
    
    return company_sentences

# We keep only the companies whose controversy score is non-zero from 2009 onward. To do so:

## Data import

In [21]:
!pip install openpyxl
import openpyxl

CON_E = pd.read_excel('CON_E.xlsx', engine='openpyxl')
CON_E.head()



Unnamed: 0,idEntreprise,2002-01,2002-02,2002-03,2002-04,2002-05,2002-06,2002-07,2002-08,2002-09,...,2022-07,2022-08,2022-09,2022-10,2022-11,2022-12,2023-01,2023-02,2023-03,2023-04
0,1,0.0,11.111111,8.888889,7.111111,5.688889,4.551111,3.640889,14.023822,11.219058,...,95.000456,90.878881,86.036438,81.975582,72.247132,74.625457,97.205522,90.921703,83.107733,80.06246
1,2,11.111111,20.0,16.0,12.8,10.24,8.192,17.664711,14.131769,11.305415,...,33.356441,26.685153,43.570345,45.967387,47.885021,38.308016,30.646413,46.739353,37.391482,29.913186
2,3,0.0,0.0,0.0,0.0,11.111111,8.888889,7.111111,5.688889,4.551111,...,24.922552,19.938041,15.950433,12.760347,10.208277,30.388844,24.311075,19.44886,26.670199,32.44727
3,4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,21.288006,17.030405,13.624324,10.899459,8.719567,6.975654,5.580523,4.464418,3.571535,2.857228
4,5,36.111111,68.888889,91.952381,91.555556,78.165333,90.417778,85.92,92.937543,86.077611,...,99.799505,98.728493,97.44602,96.017441,97.956816,91.698786,93.367594,99.630794,98.074889,96.40881


In [17]:
covalence_id_firms = pd.read_csv('Universe_Listed_Covalence_31.07.2023_v2_updated.csv',sep=';',on_bad_lines='skip')
covalence_id_firms.head()

Unnamed: 0.1,Unnamed: 0,idEntreprise,Company,ISIN,GICS industry group,GICS sub-industry,Headquarters Region,Headquarters Country
0,0,1,Pfizer Inc,US7170811035,"Pharmaceuticals, Biotechnology & Life Sciences",Pharmaceuticals,Americas,United States of America
1,1,2,Merck & Co Inc,US58933Y1055,"Pharmaceuticals, Biotechnology & Life Sciences",Pharmaceuticals,Americas,United States of America
2,2,3,GSK plc,GB0009252882,Unable to resolve all requested identifiers.,Unable to resolve all requested identifiers.,Unable to resolve all requested identifiers.,Unable to resolve all requested identifiers.
3,3,4,Eli Lilly and Co,US5324571083,"Pharmaceuticals, Biotechnology & Life Sciences",Pharmaceuticals,Americas,United States of America
4,4,5,Bayer AG,DE000BAY0017,"Pharmaceuticals, Biotechnology & Life Sciences",Pharmaceuticals,Europe,Germany


In [20]:
# Load the CSV file containing the list of companies
df = pd.read_csv('Firms.csv')
df.head()

Unnamed: 0,ISIN,Company,Country
0,US7170811035,Pfizer Inc,United States of America
1,US58933Y1055,Merck & Co Inc,United States of America
2,GB0009252882,GSK plc,Unable to resolve all requested identifiers.
3,US5324571083,Eli Lilly and Co,United States of America
4,DE000BAY0017,Bayer AG,Germany


## Merging the tables

In [32]:
df1 = pd.merge(covalence_id_firms, CON_E, on='idEntreprise')
df1.head()

Unnamed: 0.1,Unnamed: 0,idEntreprise,Company,ISIN,GICS industry group,GICS sub-industry,Headquarters Region,Headquarters Country,2002-01,2002-02,...,2022-07,2022-08,2022-09,2022-10,2022-11,2022-12,2023-01,2023-02,2023-03,2023-04
0,0,1,Pfizer Inc,US7170811035,"Pharmaceuticals, Biotechnology & Life Sciences",Pharmaceuticals,Americas,United States of America,0.0,11.111111,...,95.000456,90.878881,86.036438,81.975582,72.247132,74.625457,97.205522,90.921703,83.107733,80.06246
1,1,2,Merck & Co Inc,US58933Y1055,"Pharmaceuticals, Biotechnology & Life Sciences",Pharmaceuticals,Americas,United States of America,11.111111,20.0,...,33.356441,26.685153,43.570345,45.967387,47.885021,38.308016,30.646413,46.739353,37.391482,29.913186
2,2,3,GSK plc,GB0009252882,Unable to resolve all requested identifiers.,Unable to resolve all requested identifiers.,Unable to resolve all requested identifiers.,Unable to resolve all requested identifiers.,0.0,0.0,...,24.922552,19.938041,15.950433,12.760347,10.208277,30.388844,24.311075,19.44886,26.670199,32.44727
3,3,4,Eli Lilly and Co,US5324571083,"Pharmaceuticals, Biotechnology & Life Sciences",Pharmaceuticals,Americas,United States of America,0.0,0.0,...,21.288006,17.030405,13.624324,10.899459,8.719567,6.975654,5.580523,4.464418,3.571535,2.857228
4,4,5,Bayer AG,DE000BAY0017,"Pharmaceuticals, Biotechnology & Life Sciences",Pharmaceuticals,Europe,Germany,36.111111,68.888889,...,99.799505,98.728493,97.44602,96.017441,97.956816,91.698786,93.367594,99.630794,98.074889,96.40881
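`pd.merge` performs an inner join by default, so companies without a matching `idEntreprise` in the score table are dropped silently. The sketch below, on two made-up tables, shows one way to check for such cases with the `indicator` and `validate` options of `pd.merge`; the table contents are invented for illustration.

```python
import pandas as pd

# Made-up tables: the second company has no score row
firms = pd.DataFrame({"idEntreprise": [1, 2], "Company": ["Firm A", "Firm B"]})
scores = pd.DataFrame({"idEntreprise": [1], "2009-01": [11.1]})

# how="left" keeps every company; indicator adds a "_merge" column;
# validate raises an error if idEntreprise is duplicated on either side
merged = pd.merge(firms, scores, on="idEntreprise", how="left",
                  validate="one_to_one", indicator=True)

unmatched = merged.loc[merged["_merge"] == "left_only", "Company"].tolist()
print(unmatched)
```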


In [33]:
df1 = df1.drop(columns=df1.columns[0])
df1 = df1.drop(columns=['GICS industry group', 'GICS sub-industry', 'Headquarters Region', 'Headquarters Country'])
df1

Unnamed: 0,idEntreprise,Company,ISIN,2002-01,2002-02,2002-03,2002-04,2002-05,2002-06,2002-07,...,2022-07,2022-08,2022-09,2022-10,2022-11,2022-12,2023-01,2023-02,2023-03,2023-04
0,1,Pfizer Inc,US7170811035,0.000000,11.111111,8.888889,7.111111,5.688889,4.551111,3.640889,...,95.000456,90.878881,86.036438,81.975582,72.247132,74.625457,97.205522,90.921703,83.107733,80.062460
1,2,Merck & Co Inc,US58933Y1055,11.111111,20.000000,16.000000,12.800000,10.240000,8.192000,17.664711,...,33.356441,26.685153,43.570345,45.967387,47.885021,38.308016,30.646413,46.739353,37.391482,29.913186
2,3,GSK plc,GB0009252882,0.000000,0.000000,0.000000,0.000000,11.111111,8.888889,7.111111,...,24.922552,19.938041,15.950433,12.760347,10.208277,30.388844,24.311075,19.448860,26.670199,32.447270
3,4,Eli Lilly and Co,US5324571083,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,21.288006,17.030405,13.624324,10.899459,8.719567,6.975654,5.580523,4.464418,3.571535,2.857228
4,5,Bayer AG,DE000BAY0017,36.111111,68.888889,91.952381,91.555556,78.165333,90.417778,85.920000,...,99.799505,98.728493,97.446020,96.017441,97.956816,91.698786,93.367594,99.630794,98.074889,96.408810
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
8593,19239,F&G Annuities & Life Inc,US30190A1043,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
8594,19240,Gujarat Mineral Development Corporation Ltd,INE131A01031,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
8595,19247,Borosil Ltd,INE02PY01013,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
8596,19251,Healthcare Trust Inc,US42226B2043,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000


In [34]:
df1 = df1[df1['2009-01'] != 0]
df1

Unnamed: 0,idEntreprise,Company,ISIN,2002-01,2002-02,2002-03,2002-04,2002-05,2002-06,2002-07,...,2022-07,2022-08,2022-09,2022-10,2022-11,2022-12,2023-01,2023-02,2023-03,2023-04
0,1,Pfizer Inc,US7170811035,0.000000,11.111111,8.888889,7.111111,5.688889,4.551111,3.640889,...,9.500046e+01,9.087888e+01,8.603644e+01,8.197558e+01,7.224713e+01,7.462546e+01,9.720552e+01,9.092170e+01,8.310773e+01,8.006246e+01
1,2,Merck & Co Inc,US58933Y1055,11.111111,20.000000,16.000000,12.800000,10.240000,8.192000,17.664711,...,3.335644e+01,2.668515e+01,4.357034e+01,4.596739e+01,4.788502e+01,3.830802e+01,3.064641e+01,4.673935e+01,3.739148e+01,2.991319e+01
2,3,GSK plc,GB0009252882,0.000000,0.000000,0.000000,0.000000,11.111111,8.888889,7.111111,...,2.492255e+01,1.993804e+01,1.595043e+01,1.276035e+01,1.020828e+01,3.038884e+01,2.431108e+01,1.944886e+01,2.667020e+01,3.244727e+01
3,4,Eli Lilly and Co,US5324571083,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,2.128801e+01,1.703040e+01,1.362432e+01,1.089946e+01,8.719567e+00,6.975654e+00,5.580523e+00,4.464418e+00,3.571535e+00,2.857228e+00
4,5,Bayer AG,DE000BAY0017,36.111111,68.888889,91.952381,91.555556,78.165333,90.417778,85.920000,...,9.979950e+01,9.872849e+01,9.744602e+01,9.601744e+01,9.795682e+01,9.169879e+01,9.336759e+01,9.963079e+01,9.807489e+01,9.640881e+01
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7481,16645,Hindustan Copper Ltd,INE531E01026,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00
7685,17097,Fawaz Abdulaziz Alhokair Company SJSC,SA000A0LB2R6,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00
8233,18234,Iveco Group NV,NL0015000LU4,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,3.163846e+01,2.531077e+01,2.024861e+01,1.619889e+01,1.295911e+01,1.036729e+01,8.293831e+00,6.635065e+00,5.308052e+00,4.246442e+00
8290,18386,China Development Bank Financial Leasing Co Ltd,CNE1000027C9,0.000000,0.000000,0.000000,0.000000,0.000000,22.222222,17.777778,...,8.493048e-01,6.794438e-01,5.435551e-01,4.348441e-01,3.478752e-01,2.783002e-01,2.226402e-01,1.781121e-01,1.424897e-01,1.139918e-01
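The filter above keeps the rows whose `2009-01` value is non-zero. If the intent is instead to keep any company with a non-zero score in at least one month from 2009 onward, one possible formulation is sketched below on a made-up table. This is an assumption about the intended criterion, not the filter actually applied in this notebook.

```python
import pandas as pd

# Made-up scores: only company B has a non-zero value from 2009 onward
toy = pd.DataFrame({
    "Company": ["A", "B", "C"],
    "2008-12": [5.0, 0.0, 0.0],
    "2009-01": [0.0, 0.0, 0.0],
    "2009-02": [0.0, 3.0, 0.0],
})

# Monthly columns are named 'YYYY-MM', so string comparison orders them chronologically
cols_from_2009 = [c for c in toy.columns if c[:4].isdigit() and c >= "2009-01"]

# Keep the rows with at least one non-zero score in those columns
kept = toy[(toy[cols_from_2009] != 0).any(axis=1)]
print(kept["Company"].tolist())
```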


In [None]:
# Select 100 companies at random
random_companies = df1['Company'].sample(n=100, random_state=42).tolist()

# Use the function to generate the sentences
a = 3
b = 5
company_sentences = generate_environmental_communication(random_companies, Dico_env_en, a, b)

# Print the generated sentences for each company
for company, sentences in company_sentences.items():
    print(f"{company}:")
    for idx, sentence in enumerate(sentences, 1):
        print(f"  {idx}. {sentence}")

In [13]:
# Select 100 companies at random, together with their ISINs
random_companies = df[['Company', 'ISIN']].sample(n=100, random_state=42)

# Use the function to generate the sentences
a = 3
b = 5
company_sentences = generate_environmental_communication(random_companies['Company'].tolist(), Dico_env_en, a, b)

# Convert the company_sentences dictionary into a DataFrame
df_sentences = pd.DataFrame([(company, sentence) for company, sentences in company_sentences.items() for sentence in sentences], 
                            columns=['Company', 'Sentence'])

# Join to attach the ISIN to each company
df_final = pd.merge(df_sentences, random_companies, on='Company', how='left')

# Print the generated sentences for each company with its ISIN
for idx, row in df_final.iterrows():
    print(f"{row['Company']} (ISIN: {row['ISIN']}):")
    print(f"  {idx + 1}. {row['Sentence']}")
    print('-' * 50)

Kirloskar Brothers Ltd (ISIN: INE732A01036):
  1. Kirloskar Brothers Ltd is dedicated to sustainability approaches and pollution initiatives.
--------------------------------------------------
Kirloskar Brothers Ltd (ISIN: INE732A01036):
  2. Kirloskar Brothers Ltd communicates about its well-being commitment and its positive contribution to restoration.
--------------------------------------------------
Kirloskar Brothers Ltd (ISIN: INE732A01036):
  3. Kirloskar Brothers Ltd's harmony efforts are focused on overconsumption.
--------------------------------------------------
Seer Inc (ISIN: US81578P1066):
  4. Seer Inc is exploring continuous improvement initiatives to address worrisome.
--------------------------------------------------
Seer Inc (ISIN: US81578P1066):
  5. The recoverable approach of Seer Inc reflects its commitment to efficient.
--------------------------------------------------
Seer Inc (ISIN: US81578P1066):
  6. Seer Inc implements innovative practices to support co

We now want to insert a sentence into the text. The insertion point is chosen at random, and the sentence always follows a period. This lets us place a company name inside that sentence.
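Since several sentences are generated for each company, the insertion has to be repeated for every sentence of that company. A minimal sketch of this loop is given below; the helper `insert_sentences`, the sample article and the sample sentences are assumptions for illustration and follow the same "after a random period" rule as the insertion function defined earlier.

```python
import random

def insert_sentences(article, sentences, rng):
    """Insert each sentence after a randomly chosen period (hypothetical helper)."""
    for sentence in sentences:
        dots = [i for i, ch in enumerate(article) if ch == "."]
        if dots:
            pos = rng.choice(dots) + 1
            article = article[:pos] + " " + sentence + article[pos:]
        else:
            # No period in the text: put the sentence at the beginning
            article = sentence + " " + article
    return article

rng = random.Random(0)
print(insert_sentences("First. Second.", ["Acme is green.", "Acme is blamed."], rng))
```

Note that a later sentence may be placed right after a previously inserted one, because the period positions are recomputed on the updated text at each step.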

In [14]:
# Create a new column "Entreprise_Insérée_1" in the DataFrame
data['Entreprise_Insérée_1'] = None

# Fill the column with the company names, one company per article
for idx, (company, _) in enumerate(company_sentences.items()):
    data.at[idx, 'Entreprise_Insérée_1'] = company

# Display the first rows of the DataFrame as a check
data.head()

Unnamed: 0,Article,Coeur_Article,Date,Auteur,Nombre de mots,Journal,Titre,ID,Entreprise_Insérée_1
0,\nMetropolitan Desk; SECTMB\nCan an Ambitious ...,"\n\nStony Brook University, one of two state ...",31 December 2023,Nick Tabor,529,New York Times,Copyright 2023 The New York Times Company. Al...,NYTF000020240104ejcv0000d,Kirloskar Brothers Ltd
1,\n\nMagazine Desk; SECTMM\nWhen Jim Brown and ...,"\n\nIn their one movie together, their chemis...",31 December 2023,Wesley Morris,422,New York Times,"When Jim Brown and Raquel Welch, Two Sexy Star...",NYTF000020231231ejcv0006h,Seer Inc
2,\n\nMagazine Desk; SECTMK\nTalking During Movi...,\n\ndebatethis\n\nTalking during movies: Tota...,31 December 2023,,179,New York Times,Talking During Movies: Totally Evil or Part of...,NYTF000020231231ejcv00064,Samsung Life Insurance Co Ltd
3,\n\nMagazine Desk; SECTMK\nLet Kids Vote!\n\n4...,\n\nLET KIDS\n\nVOTE!\n\nby Katherine Cusuman...,31 December 2023,,454,New York Times,Let Kids Vote!,NYTF000020231231ejcv00063,Kontoor Brands Inc
4,\n\nMagazine Desk; SECTMK\nAre We Doomed to Di...,\n\nare we DOOMED TO DISAGREE?\n\nwhy it's so...,31 December 2023,Christina Caron,428,New York Times,Are We Doomed to Disagree?,NYTF000020231231ejcv0005z,Tauron Polska Energia SA


In [15]:
# Create a new "ISIN" column in the DataFrame
data['ISIN'] = None

# Fill the column with the ISIN matching each inserted company
for company, isin in zip(random_companies['Company'], random_companies['ISIN']):
    data.loc[data['Entreprise_Insérée_1'] == company, 'ISIN'] = isin

# Show the first rows of the DataFrame as a check
data.head()


Unnamed: 0,Article,Coeur_Article,Date,Auteur,Nombre de mots,Journal,Titre,ID,Entreprise_Insérée_1,ISIN
0,\nMetropolitan Desk; SECTMB\nCan an Ambitious ...,"\n\nStony Brook University, one of two state ...",31 December 2023,Nick Tabor,529,New York Times,Copyright 2023 The New York Times Company. Al...,NYTF000020240104ejcv0000d,Kirloskar Brothers Ltd,INE732A01036
1,\n\nMagazine Desk; SECTMM\nWhen Jim Brown and ...,"\n\nIn their one movie together, their chemis...",31 December 2023,Wesley Morris,422,New York Times,"When Jim Brown and Raquel Welch, Two Sexy Star...",NYTF000020231231ejcv0006h,Seer Inc,US81578P1066
2,\n\nMagazine Desk; SECTMK\nTalking During Movi...,\n\ndebatethis\n\nTalking during movies: Tota...,31 December 2023,,179,New York Times,Talking During Movies: Totally Evil or Part of...,NYTF000020231231ejcv00064,Samsung Life Insurance Co Ltd,KR7032830002
3,\n\nMagazine Desk; SECTMK\nLet Kids Vote!\n\n4...,\n\nLET KIDS\n\nVOTE!\n\nby Katherine Cusuman...,31 December 2023,,454,New York Times,Let Kids Vote!,NYTF000020231231ejcv00063,Kontoor Brands Inc,US50050N1037
4,\n\nMagazine Desk; SECTMK\nAre We Doomed to Di...,\n\nare we DOOMED TO DISAGREE?\n\nwhy it's so...,31 December 2023,Christina Caron,428,New York Times,Are We Doomed to Disagree?,NYTF000020231231ejcv0005z,Tauron Polska Energia SA,PLTAURN00011


In [16]:
# Create a new column for the articles with inserted sentences
data['Coeur_Article_Inséré_1'] = ""

# Insert the generated sentences into the articles
for company, sentences in company_sentences.items():
    masque = data['Entreprise_Insérée_1'] == company
    article = data.loc[masque, 'Coeur_Article'].iloc[0]  # Article associated with the company
    for sentence in sentences:
        article = insertion_phrase_dans_article(sentence, article)
    data.loc[masque, 'Coeur_Article_Inséré_1'] = article  # Store the modified article in the new column

data.head()


Unnamed: 0,Article,Coeur_Article,Date,Auteur,Nombre de mots,Journal,Titre,ID,Entreprise_Insérée_1,ISIN,Coeur_Article_Inséré_1
0,\nMetropolitan Desk; SECTMB\nCan an Ambitious ...,"\n\nStony Brook University, one of two state ...",31 December 2023,Nick Tabor,529,New York Times,Copyright 2023 The New York Times Company. Al...,NYTF000020240104ejcv0000d,Kirloskar Brothers Ltd,INE732A01036,"\n\nStony Brook University, one of two state ..."
1,\n\nMagazine Desk; SECTMM\nWhen Jim Brown and ...,"\n\nIn their one movie together, their chemis...",31 December 2023,Wesley Morris,422,New York Times,"When Jim Brown and Raquel Welch, Two Sexy Star...",NYTF000020231231ejcv0006h,Seer Inc,US81578P1066,"\n\nIn their one movie together, their chemis..."
2,\n\nMagazine Desk; SECTMK\nTalking During Movi...,\n\ndebatethis\n\nTalking during movies: Tota...,31 December 2023,,179,New York Times,Talking During Movies: Totally Evil or Part of...,NYTF000020231231ejcv00064,Samsung Life Insurance Co Ltd,KR7032830002,\n\ndebatethis\n\nTalking during movies: Tota...
3,\n\nMagazine Desk; SECTMK\nLet Kids Vote!\n\n4...,\n\nLET KIDS\n\nVOTE!\n\nby Katherine Cusuman...,31 December 2023,,454,New York Times,Let Kids Vote!,NYTF000020231231ejcv00063,Kontoor Brands Inc,US50050N1037,\n\nLET KIDS\n\nVOTE!\n\nby Katherine Cusuman...
4,\n\nMagazine Desk; SECTMK\nAre We Doomed to Di...,\n\nare we DOOMED TO DISAGREE?\n\nwhy it's so...,31 December 2023,Christina Caron,428,New York Times,Are We Doomed to Disagree?,NYTF000020231231ejcv0005z,Tauron Polska Energia SA,PLTAURN00011,\n\nare we DOOMED TO DISAGREE?\n\nwhy it's so...


In [17]:
# Save as a pickle file to preserve the data types
data.to_pickle('data_avec_entreprises_1.pkl')

### Method 2

Here we simply insert company names at random positions in an article. To do so, we write a function that draws the number of companies to insert into a given article, then we insert all of those companies into the article. Here, 1 article = several companies.

The number of companies to insert is chosen at random, but it is bounded: we set the lower and upper bounds ourselves.

In [18]:
import random

import pandas as pd

# List of company names
noms_entreprises = pd.read_csv('Firms.csv')['Company'].tolist()

# Randomly select between a and b companies (both bounds inclusive)
def choisir_entreprises(a, b):
    # Make sure b does not exceed the number of available company names
    b = min(b, len(noms_entreprises))
    # Draw the number of companies to insert, between a and b
    nb_entreprises = random.randint(a, b)
    return random.sample(noms_entreprises, nb_entreprises)

choisir_entreprises(1,2)

['City Bank Ltd', 'Fomento Economico Mexicano SAB de CV']

We insert all of these companies into one text.

In [19]:
entreprises_a_inserer = choisir_entreprises(1,2)
article_insertion = data['Coeur_Article'][8]

for entreprise in entreprises_a_inserer:
    article_insertion = insertion_phrase_dans_article(entreprise, article_insertion)
    
print(entreprises_a_inserer, article_insertion)

['Tricon Residential Inc']  

A Few Days Full of Trouble: Revelations on the Journey to Justice for My Cousin and Best Friend, Emmett Till, by the Rev. Wheeler Parker Jr. and Christopher Benson. (One World, 432 pp., $18.99.) ''I have known the truth,'' Parker writes again and again in this moving memoir, recounting his family's devastation at the 1955 lynching of his cousin, his life as a minister resisting racism and the aftermath of the F.B.I.'s 2018 "reawakening'' of Till's murder case.

Age of Vice, by Deepti Kapoor. (Riverhead, 560 pp., $20.) Kapoor's thriller ushers readers through the underbelly of contemporary New Delhi. It follows Ajay, a servant of a powerful crime family, charged with protecting their eldest son. But as a journalist narrows in on the family's misdeeds amid a deadly incident, Ajay must increasingly shield himself. Tricon Residential Inc

Roses, in the Mouth of a Lion, by Bushra Rehman. (Flatiron, 288 pp., $17.99.) In 1980s Queens, Razia bristles at the rigid 

#### Automating over all articles

In [20]:
nb_min_entreprise_par_article = 1
nb_max_entreprise_par_article = 2
liste_articles = data['Coeur_Article'].tolist()

# Collect the results in row order. This avoids list.index(), which returns the
# first match and is wrong for duplicate articles, and chained assignment,
# which triggered the warnings that were previously silenced.
entreprises_inserees = []
articles_avec_entreprises = []

for article in liste_articles:
    entreprises_a_inserer = choisir_entreprises(nb_min_entreprise_par_article, nb_max_entreprise_par_article)

    for entreprise in entreprises_a_inserer:
        article = insertion_phrase_dans_article(entreprise, article)

    entreprises_inserees.append(entreprises_a_inserer)
    articles_avec_entreprises.append(article)

# Add the new columns right after "Article"
data.insert(1, 'Entreprises_inserées_2', pd.Series(entreprises_inserees, index=data.index))
data.insert(2, 'Articles_avec_entreprises_2', pd.Series(articles_avec_entreprises, index=data.index))

data.head(5)

Unnamed: 0,Article,Entreprises_inserées_2,Articles_avec_entreprises_2,Coeur_Article,Date,Auteur,Nombre de mots,Journal,Titre,ID,Entreprise_Insérée_1,ISIN,Coeur_Article_Inséré_1
0,\nMetropolitan Desk; SECTMB\nCan an Ambitious ...,[Haohua Chemical Science & Technology Corp Ltd],"\n\nStony Brook University, one of two state ...","\n\nStony Brook University, one of two state ...",31 December 2023,Nick Tabor,529,New York Times,Copyright 2023 The New York Times Company. Al...,NYTF000020240104ejcv0000d,Kirloskar Brothers Ltd,INE732A01036,"\n\nStony Brook University, one of two state ..."
1,\n\nMagazine Desk; SECTMM\nWhen Jim Brown and ...,"[Sensys Gatso Group AB, Banque Cantonale Vaudo...","\n\nIn their one movie together, their chemis...","\n\nIn their one movie together, their chemis...",31 December 2023,Wesley Morris,422,New York Times,"When Jim Brown and Raquel Welch, Two Sexy Star...",NYTF000020231231ejcv0006h,Seer Inc,US81578P1066,"\n\nIn their one movie together, their chemis..."
2,\n\nMagazine Desk; SECTMK\nTalking During Movi...,"[Canfor Corp, Halfords Group PLC]",\n\ndebatethis\n\nTalking during movies: Tota...,\n\ndebatethis\n\nTalking during movies: Tota...,31 December 2023,,179,New York Times,Talking During Movies: Totally Evil or Part of...,NYTF000020231231ejcv00064,Samsung Life Insurance Co Ltd,KR7032830002,\n\ndebatethis\n\nTalking during movies: Tota...
3,\n\nMagazine Desk; SECTMK\nLet Kids Vote!\n\n4...,"[Victory Capital Holdings Inc, China Yuhua Edu...",\n\nLET KIDS\n\nVOTE!\n\nby Katherine Cusuman...,\n\nLET KIDS\n\nVOTE!\n\nby Katherine Cusuman...,31 December 2023,,454,New York Times,Let Kids Vote!,NYTF000020231231ejcv00063,Kontoor Brands Inc,US50050N1037,\n\nLET KIDS\n\nVOTE!\n\nby Katherine Cusuman...
4,\n\nMagazine Desk; SECTMK\nAre We Doomed to Di...,"[Princeton Bancorp Inc, Financial Institutions...",\n\nare we DOOMED TO DISAGREE?\n\nwhy it's so...,\n\nare we DOOMED TO DISAGREE?\n\nwhy it's so...,31 December 2023,Christina Caron,428,New York Times,Are We Doomed to Disagree?,NYTF000020231231ejcv0005z,Tauron Polska Energia SA,PLTAURN00011,\n\nare we DOOMED TO DISAGREE?\n\nwhy it's so...


In [21]:
data_avec_entreprises = data
data_avec_entreprises.head(5)

Unnamed: 0,Article,Entreprises_inserées_2,Articles_avec_entreprises_2,Coeur_Article,Date,Auteur,Nombre de mots,Journal,Titre,ID,Entreprise_Insérée_1,ISIN,Coeur_Article_Inséré_1
0,\nMetropolitan Desk; SECTMB\nCan an Ambitious ...,[Haohua Chemical Science & Technology Corp Ltd],"\n\nStony Brook University, one of two state ...","\n\nStony Brook University, one of two state ...",31 December 2023,Nick Tabor,529,New York Times,Copyright 2023 The New York Times Company. Al...,NYTF000020240104ejcv0000d,Kirloskar Brothers Ltd,INE732A01036,"\n\nStony Brook University, one of two state ..."
1,\n\nMagazine Desk; SECTMM\nWhen Jim Brown and ...,"[Sensys Gatso Group AB, Banque Cantonale Vaudo...","\n\nIn their one movie together, their chemis...","\n\nIn their one movie together, their chemis...",31 December 2023,Wesley Morris,422,New York Times,"When Jim Brown and Raquel Welch, Two Sexy Star...",NYTF000020231231ejcv0006h,Seer Inc,US81578P1066,"\n\nIn their one movie together, their chemis..."
2,\n\nMagazine Desk; SECTMK\nTalking During Movi...,"[Canfor Corp, Halfords Group PLC]",\n\ndebatethis\n\nTalking during movies: Tota...,\n\ndebatethis\n\nTalking during movies: Tota...,31 December 2023,,179,New York Times,Talking During Movies: Totally Evil or Part of...,NYTF000020231231ejcv00064,Samsung Life Insurance Co Ltd,KR7032830002,\n\ndebatethis\n\nTalking during movies: Tota...
3,\n\nMagazine Desk; SECTMK\nLet Kids Vote!\n\n4...,"[Victory Capital Holdings Inc, China Yuhua Edu...",\n\nLET KIDS\n\nVOTE!\n\nby Katherine Cusuman...,\n\nLET KIDS\n\nVOTE!\n\nby Katherine Cusuman...,31 December 2023,,454,New York Times,Let Kids Vote!,NYTF000020231231ejcv00063,Kontoor Brands Inc,US50050N1037,\n\nLET KIDS\n\nVOTE!\n\nby Katherine Cusuman...
4,\n\nMagazine Desk; SECTMK\nAre We Doomed to Di...,"[Princeton Bancorp Inc, Financial Institutions...",\n\nare we DOOMED TO DISAGREE?\n\nwhy it's so...,\n\nare we DOOMED TO DISAGREE?\n\nwhy it's so...,31 December 2023,Christina Caron,428,New York Times,Are We Doomed to Disagree?,NYTF000020231231ejcv0005z,Tauron Polska Energia SA,PLTAURN00011,\n\nare we DOOMED TO DISAGREE?\n\nwhy it's so...


## Exporting the new data table

In [22]:
# Save as a pickle file to preserve the data types
data_avec_entreprises.to_pickle('data_avec_entreprises_2.pkl')

# Text coherence

We pick one text and check whether the inserted sentences are coherent with each other and with the text.

In [23]:
print("The company identified for this text is:", data['Entreprise_Insérée_1'][3], data['Coeur_Article_Inséré_1'][3])

The company identified for this text is: Kontoor Brands Inc  

LET KIDS

VOTE!

by Katherine Cusumano

Julia Rottenberg, 17, spent the fall of last year knocking on doors. Kontoor Brands Inc is committed to organic actions and addressing malevolent. On Election Day 2022, people in Culver City, Calif. Concerns are raised about Kontoor Brands Inc's inappropriate practices. Kontoor Brands Inc is dedicated to energy efficiency approaches and greenhouse gas emissions initiatives., her hometown, would have a big decision to make: Should the voting age for local elections change from 18 to 16? Julia wanted them to vote yes. ''I think a vote is one of the most direct ways that you can express an opinion and actually have some change happen,'' says Julia, who is part of an organization called Vote16 Culver City.

What's the argument for giving kids the vote? Well, as you may have noticed, there are a lot of decisions being made (or not) about things that affect kids directly, like climate cha

The inserted sentences are:

  1. As a beneficial company, Kontoor Brands Inc takes measures to encourage eco-innovation.
  2. As a clean company, Kontoor Brands Inc takes measures to encourage energy-efficient.
  3. Kontoor Brands Inc is recognized for its sustainability approach and its positive impact on clean energy.
  4. The revitalization approach of Kontoor Brands Inc reflects its commitment to fulfillment.

or, in French:

   1. En tant qu'entreprise bénéfique, Kontoor Brands Inc prend des mesures pour encourager l'éco-innovation.
   2. En tant qu'entreprise propre, Kontoor Brands Inc prend des mesures pour encourager l'efficacité énergétique.
   3. Kontoor Brands Inc est reconnue pour son approche en matière de durabilité et son impact positif sur l'énergie propre.
   4. L'approche de revitalisation de Kontoor Brands Inc reflète son engagement envers l'accomplissement.

Analysis:

    1. The first part of the sentence does not seem to make sense. The second part, however, carries a clearly green message (here it reads more as an analysis than as a communication).
    
    2. The beginning of the sentence fits better, although it is not clear what exactly is being described. Is the company clean in the ecological sense, or in the hygienic sense? We cannot tell. As with the first sentence, the end of this one gives the company a green image.
    
    3. Still not a communication, more of an assertion, but the sentence makes complete sense and leaves a positive image of the company.
    
    4. Makes no sense at all.


Does inserting a sentence into a text like this one make sense? As a first step, we summarize the article, ignoring the inserted sentences:

The New York Times article "LET KIDS VOTE!" by Katherine Cusumano covers the debate over lowering the voting age to 16 for local elections in the United States. Julia Rottenberg, 17, campaigned for this cause in Culver City, California, with the organization Vote16 Culver City. The main argument is that young people are affected by major decisions without having a say, notably on issues such as climate change or gun violence. Although some cities have already lowered the voting age, the proposal was rejected in Culver City. Despite this, the young activists remain determined to keep fighting.

The first conclusion we can draw is that we identified no company in this text. The topic of climate change does come up ("What's the argument for giving kids the vote? Well, as you may have noticed, there are a lot of decisions being made (or not) about things that affect kids directly, like climate change or gun violence or school resources."), but only in passing: the main subject remains young people's right to vote, not the environmental question. Citing the green communication of a company such as Kontoor Brands Inc therefore seems of little relevance in an article like this one. Moreover, the sentences were inserted at random positions: they are neither attached to nor close to the passage that mentions the climate.

To summarize, we start from an article that cites no company, whose subject is the voting age, with only a very slight mention of climate change. We then transform this article by inserting sentences that show only moderate coherence and that tend to present the company's environmental credentials in a positive light.
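The remark that the inserted sentences are not close to the climate passage could be checked numerically. The following is a minimal sketch, assuming character distance is an acceptable proxy for proximity; the function `distances_au_mot_cle` and the shortened toy string `extrait` are hypothetical and only illustrate the idea.

```python
import re

def distances_au_mot_cle(article, entreprise, mot_cle):
    """For each mention of `entreprise`, return the character distance
    to the nearest occurrence of `mot_cle` (empty list if the keyword is absent)."""
    pos_mot_cle = [m.start() for m in re.finditer(re.escape(mot_cle), article, flags=re.IGNORECASE)]
    pos_entreprise = [m.start() for m in re.finditer(re.escape(entreprise), article)]
    if not pos_mot_cle:
        return []
    return [min(abs(p - k) for k in pos_mot_cle) for p in pos_entreprise]

# Toy excerpt (hypothetical, shortened), not the full article
extrait = (
    "Julia spent the fall knocking on doors. "
    "Kontoor Brands Inc is committed to organic actions. "
    "Many decisions affect kids directly, like climate change or gun violence."
)
print(distances_au_mot_cle(extrait, "Kontoor Brands Inc", "climate change"))
```

On the real data, the same function could be applied to `data['Coeur_Article_Inséré_1']` and `data['Entreprise_Insérée_1']` row by row; large distances would support the observation that the insertions are unrelated to the climate passage.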
