## Tutorial: interacting with the SSP Cloud S3 storage system (MinIO)

In [1]:
import os

import pandas as pd
import s3fs
import zipfile

### Retrieving the data for a challenge

In [2]:
# Create filesystem object
S3_ENDPOINT_URL = "https://" + os.environ["AWS_S3_ENDPOINT"]
fs = s3fs.S3FileSystem(client_kwargs={'endpoint_url': S3_ENDPOINT_URL})
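When `AWS_S3_ENDPOINT` is missing (typically outside an SSP Cloud service), the cell above fails with a bare `KeyError`. Below is a minimal sketch of a more explicit check. The helper name `endpoint_url` is hypothetical, and the example host is simply the one that appears in the direct-download link later in this tutorial.

```python
import os


def endpoint_url(env=os.environ):
    """Build the S3 endpoint URL from the environment (hypothetical helper)."""
    host = env.get("AWS_S3_ENDPOINT")
    if not host:
        # Fail with an explicit message instead of a bare KeyError
        raise RuntimeError(
            "AWS_S3_ENDPOINT is not set; are you running inside an SSP Cloud service?"
        )
    return "https://" + host


# Demo with an explicit mapping, so it also runs outside the SSP Cloud
print(endpoint_url({"AWS_S3_ENDPOINT": "minio.lab.sspcloud.fr"}))
# https://minio.lab.sspcloud.fr
```

The returned string can then be passed as `client_kwargs={'endpoint_url': ...}`, exactly as in the cell above.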

In [3]:
# List the challenges
fs.ls("gvimont/diffusion/hackathon-minarm-2024")

['gvimont/diffusion/hackathon-minarm-2024/AIVSAI',
 'gvimont/diffusion/hackathon-minarm-2024/Acoustique',
 'gvimont/diffusion/hackathon-minarm-2024/Similarité']

In [4]:
# List the files of a challenge
fs.ls("gvimont/diffusion/hackathon-minarm-2024/Similarité")

['gvimont/diffusion/hackathon-minarm-2024/Similarité/.keep',
 'gvimont/diffusion/hackathon-minarm-2024/Similarité/archive.zip']

In [5]:
# Download the data into the service
PATH_IN = 'gvimont/diffusion/hackathon-minarm-2024/Similarité/archive.zip'
fs.download(PATH_IN, 'data/archive.zip')

[None]

In [7]:
# Unzip the data
with zipfile.ZipFile("data/archive.zip","r") as zip_file:
    zip_file.extractall("data/")
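Before extracting a large archive, it can help to check its integrity and see what it contains. The sketch below wraps those steps in a hypothetical helper, `extract_archive`, and runs it on a tiny throwaway archive, so it needs neither S3 access nor the real challenge data. The sample file name and row are illustrative only.

```python
import tempfile
import zipfile
from pathlib import Path


def extract_archive(archive_path, target_dir):
    """Check the archive, extract it, and return the member names (hypothetical helper)."""
    with zipfile.ZipFile(archive_path, "r") as zip_file:
        # testzip() returns the first corrupt member name, or None if all are fine
        corrupt = zip_file.testzip()
        if corrupt is not None:
            raise zipfile.BadZipFile(f"Corrupt member in archive: {corrupt}")
        names = zip_file.namelist()
        zip_file.extractall(target_dir)
    return names


# Demo on a small temporary archive (illustrative content only)
with tempfile.TemporaryDirectory() as tmp:
    archive = Path(tmp) / "archive.zip"
    with zipfile.ZipFile(archive, "w") as zip_file:
        zip_file.writestr("anno_train.csv", "00001.jpg,39,116,569,375,14\n")
    extracted = extract_archive(archive, Path(tmp) / "data")
    print(extracted)
```

With the real data, the call would be `extract_archive("data/archive.zip", "data/")`.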

NB: the data can also be downloaded directly if needed, for use outside the SSP Cloud.
Example link for a file from the AIVSAI challenge (the other challenges use the same link format):

http://minio.lab.sspcloud.fr/gvimont/diffusion/hackathon-minarm-2024/AIVSAI/HC3.zip
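A direct link of this form can be derived from the S3 path used with `fs.ls` and `fs.download`. Here is a minimal sketch, assuming the host shown in the example link serves every challenge folder in the same way. The helper `public_url` is hypothetical; it percent-encodes non-ASCII characters such as the "é" in "Similarité".

```python
from urllib.parse import quote

# Host taken from the example link above
BASE_URL = "http://minio.lab.sspcloud.fr"


def public_url(s3_path):
    """Turn a 'bucket/key' S3 path into a direct-download URL (hypothetical helper)."""
    # quote() keeps '/' as-is and percent-encodes non-ASCII characters
    return f"{BASE_URL}/{quote(s3_path.lstrip('/'))}"


print(public_url("gvimont/diffusion/hackathon-minarm-2024/AIVSAI/HC3.zip"))
print(public_url("gvimont/diffusion/hackathon-minarm-2024/Similarité/archive.zip"))
```

The first call reproduces the example link exactly. The second shows the encoded form a browser or `curl` would request.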

### Exporting data

In [10]:
import pandas as pd

#df = pd.read_json('/data/HC3/medicine.jsonl', lines=True)
df_train = pd.read_csv(r"./data/anno_train.csv", header=None)
df_test = pd.read_csv(r"./data/anno_test.csv", header=None)

# Add headers to the DataFrame
headers = ['Imagefile', 'Bounding_boxe1','Bounding_boxe2' ,'Bounding_boxe3','Bounding_boxe4','class number']
df_train.columns = headers
df_test.columns = headers



df_train.head()
#df_test


| | Imagefile | Bounding_boxe1 | Bounding_boxe2 | Bounding_boxe3 | Bounding_boxe4 | class number |
|---|---|---|---|---|---|---|
| 0 | 00001.jpg | 39 | 116 | 569 | 375 | 14 |
| 1 | 00002.jpg | 36 | 116 | 868 | 587 | 3 |
| 2 | 00003.jpg | 85 | 109 | 601 | 381 | 91 |
| 3 | 00004.jpg | 621 | 393 | 1484 | 1096 | 134 |
| 4 | 00005.jpg | 14 | 36 | 133 | 99 | 106 |


In [11]:
df_train.isnull().sum()


Imagefile         0
Bounding_boxe1    0
Bounding_boxe2    0
Bounding_boxe3    0
Bounding_boxe4    0
class number      0
dtype: int64

In [19]:
# Export to a personal bucket
# NB: `df` must already exist, e.g. the HC3 medicine DataFrame from the commented-out read above
PATH_OUT = 'avouacr/diffusion/projet-mongroupe-hackathon/medicine.csv'

with fs.open(PATH_OUT, 'w') as file_out:
    df.to_csv(file_out, index=False)
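The `index=False` argument matters for the round trip: without it, the row index is written as an unnamed first column and comes back as `Unnamed: 0` when the file is read again. The sketch below illustrates this with an in-memory buffer instead of S3. The small DataFrame and the helper `round_trip` are made up for illustration.

```python
import io

import pandas as pd

# Illustrative data only, not the challenge data
df_demo = pd.DataFrame({"question": ["q1", "q2"], "answer": ["a1", "a2"]})


def round_trip(frame, **to_csv_kwargs):
    """Write a DataFrame to an in-memory CSV and read it back (hypothetical helper)."""
    buffer = io.StringIO()
    frame.to_csv(buffer, **to_csv_kwargs)
    buffer.seek(0)
    return pd.read_csv(buffer)


with_index = round_trip(df_demo)
without_index = round_trip(df_demo, index=False)

print(list(with_index.columns))     # ['Unnamed: 0', 'question', 'answer']
print(list(without_index.columns))  # ['question', 'answer']
```

The same reasoning applies when the target is a file opened through `fs.open`, since `to_csv` only sees a file-like object.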

In [20]:
# NB: the 'diffusion' folder grants read access to all members of the group!
# All members can therefore see it and use it in a service
fs.ls("avouacr/diffusion/projet-mongroupe-hackathon")

['avouacr/diffusion/projet-mongroupe-hackathon/medicine.csv']

In [21]:
with fs.open(PATH_OUT, mode="r") as file_in:
    df_test = pd.read_csv(file_in)

In [22]:
df_test.head()

| | question | human_answers | chatgpt_answers |
|---|---|---|---|
| 0 | Does Primolut N taken during pregnancy affect ... | ['Hi, Thanks for the query. I understand y... | ['It is not recommended to use Primolut N duri... |
| 1 | Bloating and pain on right lower abdomen. Shou... | ['Hello,Thanks for the query to H.C.M. Forum.P... | ['If you are experiencing abdominal pain and b... |
| 2 | Is chest pain related to intake of clindamycin... | ['Hello, The use of Clindamycin can cause stom... | ['It is possible that chest pain could be rela... |
| 3 | Q. Noticed a yellowish sag in the gums of my 1... | ['Hello. Revert back with the photos to a dent... | ["It is difficult to accurately diagnose a con... |
| 4 | Suggest remedy for low grade fever, hot and co... | ['Hi Dear,Welcome to Healthcaremagic Team.Unde... | ["I'm sorry to hear that you're feeling sick. ... |
