With this method, the pipeline consists of the following steps, in order:

1. Dataset download via Kaggle (ArtBench-10).
2. Creation of the new metadata dataframe (pulled from Kaggle if it already exists).
3. Selection of the image that will be used to create the deepfake.
4. Update the metadata dataframe row.
5. Creation of a prompt with an Image-to-Text model.
6. With this prompt generate a deepfake with a Text-to-Image model.
7. Update the metadata dataframe row.
8. Update metadata csv file on Kaggle.

First we install the needed libraries:

In [1]:
%%capture
!pip install kaggle
!pip install ipywidgets

## 1. Dataset download via Kaggle

In [2]:
!kaggle datasets download -d alexanderliao/artbench10
!unzip /content/artbench10.zip
None

import pandas as pd

# read artbench-10 csv
df = pd.read_csv('/content/ArtBench-10.csv')
df.head()

Dataset URL: https://www.kaggle.com/datasets/alexanderliao/artbench10
License(s): other
Downloading artbench10.zip to /content
 96% 319M/332M [00:02<00:00, 139MB/s]
100% 332M/332M [00:02<00:00, 116MB/s]
Archive:  /content/artbench10.zip
  inflating: ArtBench-10.csv         
  inflating: artbench-10-binary/._artbench-10-batches-bin  
  inflating: artbench-10-binary/artbench-10-batches-bin/._batches.meta.txt  
  inflating: artbench-10-binary/artbench-10-batches-bin/._data_batch_1.bin  
  inflating: artbench-10-binary/artbench-10-batches-bin/._data_batch_2.bin  
  inflating: artbench-10-binary/artbench-10-batches-bin/._data_batch_3.bin  
  inflating: artbench-10-binary/artbench-10-batches-bin/._data_batch_4.bin  
  inflating: artbench-10-binary/artbench-10-batches-bin/._data_batch_5.bin  
  inflating: artbench-10-binary/artbench-10-batches-bin/._test_batch.bin  
  inflating: artbench-10-binary/artbench-10-batches-bin/batches.meta.txt  
  inflating: artbench-10-binary/artbench-10-batches-b

| | name | artist | url | is_public_domain | length | width | label | split | cifar_index |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | frank-omeara_towards-night-and-winter.jpg | frank-omeara | https://uploads5.wikiart.org/00316/images/fran... | True | 800 | 657 | impressionism | train | 43186 |
| 1 | goldstein-grigoriy_morning.jpg | goldstein-grigoriy | https://uploads5.wikiart.org/images/grigoriy-g... | True | 521 | 499 | impressionism | train | 41151 |
| 2 | georges-lemmen_man-reading.jpg | georges-lemmen | https://uploads6.wikiart.org/images/georges-le... | True | 800 | 612 | impressionism | train | 9754 |
| 3 | theodor-aman_port-of-constantza-1882.jpg | theodor-aman | https://uploads6.wikiart.org/images/theodor-am... | True | 560 | 336 | impressionism | train | 44244 |
| 4 | niccolo-cannicci_il-passo-della-futa-1914.jpg | niccolo-cannicci | https://uploads3.wikiart.org/images/niccolo-ca... | True | 2400 | 2322 | impressionism | train | 46885 |


Before proceeding, we need to authenticate with Kaggle to use the Kaggle API. This requires a Kaggle API key:

1. Go to your Kaggle account settings (Kaggle API Token).
2. Scroll down to the API section and click on Create New API Token.
3. This downloads a kaggle.json file, which contains your username and API key.

We need to upload this file:

In [3]:
# upload kaggle json to authenticate

from google.colab import files
from PIL import Image
from IPython.display import display

# Upload the kaggle.json file
uploaded = files.upload()

Saving kaggle.json to kaggle.json


We move it to the location where the Kaggle API expects it:

In [4]:
!mkdir -p ~/.kaggle
!cp kaggle.json ~/.kaggle/
!chmod 600 ~/.kaggle/kaggle.json
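
The same step can also be done in Python, which makes it possible to check the file before the Kaggle API reads it. This is a minimal sketch, assuming the token file holds the `username` and `key` fields; `install_kaggle_token` is a hypothetical helper, not part of the kaggle package.

```python
import json
import os
import shutil
import stat


def install_kaggle_token(src="kaggle.json", dest_dir=None):
    """Copy kaggle.json into dest_dir (default ~/.kaggle) with owner-only permissions."""
    dest_dir = dest_dir or os.path.expanduser("~/.kaggle")

    # Fail early if the uploaded file is not a usable token.
    with open(src) as f:
        token = json.load(f)
    missing = {"username", "key"} - token.keys()
    if missing:
        raise ValueError(f"kaggle.json is missing fields: {sorted(missing)}")

    os.makedirs(dest_dir, exist_ok=True)
    dest = os.path.join(dest_dir, "kaggle.json")
    shutil.copyfile(src, dest)
    os.chmod(dest, stat.S_IRUSR | stat.S_IWUSR)  # equivalent to chmod 600
    return dest


# Usage (after uploading kaggle.json to the working directory):
# install_kaggle_token("kaggle.json")
```

Validating first means a wrong upload shows up here rather than as an authentication error later in the notebook.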

We define the dataset description beforehand; it will be used either to find the dataset on Kaggle or to create it.

(REPLACE USERNAME OR CHANGE IT AFTER WE DECIDE HOW TO PROCEED)

In [5]:
# Define metadata for the Kaggle dataset
dataset_metadata = {
    "title": "DeepFake-DL",  # Replace with your dataset title
    "id": "ciprianstricescu/deepfake-dl",  # Replace with your Kaggle username and dataset name
    "licenses": [{"name": "CC0-1.0"}]  # License for your dataset
}

## 2. Creation of the new metadata dataframe (pulled from Kaggle if it already exists)

Metadata structure:


| Field | Description |
| --- | --- |
| original_artwork | The title of the original artwork the deepfake is based on |
| artist | The artist of the original artwork |
| date | The period in which the original artwork was created |
| description | A brief description of the original artwork |
| image | The URL of the original artwork |
| original_style | The style of the original artwork |
| medium | The medium used to create the original artwork |
| AI model | The AI model used to generate the target image |
| deepfake_image | The URL of the generated target image |
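
The table maps directly onto a pandas schema. The sketch below builds a single row that always carries every field, assuming the column spellings used in the notebook code (`AI model` with a space, `deepfake_image` with an underscore). `METADATA_COLUMNS` and `new_metadata_row` are hypothetical names introduced for illustration.

```python
import pandas as pd

# Column names as spelled in the notebook code (assumed to be canonical).
METADATA_COLUMNS = [
    "original_artwork", "artist", "date", "description", "image",
    "original_style", "medium", "AI model", "deepfake_image",
]


def new_metadata_row(**fields):
    """Return a one-row DataFrame with every metadata column; unset fields stay None."""
    unknown = set(fields) - set(METADATA_COLUMNS)
    if unknown:
        raise KeyError(f"Unknown metadata fields: {sorted(unknown)}")
    row = {column: fields.get(column) for column in METADATA_COLUMNS}
    return pd.DataFrame([row], columns=METADATA_COLUMNS)


row = new_metadata_row(
    original_artwork="frank-omeara_towards-night-and-winter.jpg",
    artist="frank-omeara",
    original_style="impressionism",
)
print(row.iloc[0].to_dict())
```

Because `AI model` contains a space, it has to be passed as `new_metadata_row(**{"AI model": "..."})`.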


In [6]:
# Check if metadata.csv exists on Kaggle
import os
import json
import pandas as pd
from kaggle.api.kaggle_api_extended import KaggleApi

# Initialize the Kaggle API
api = KaggleApi()
api.authenticate()

# Function to check if a dataset exists on Kaggle
def dataset_exists_on_kaggle(dataset_id, dataset_title):
    try:
        # The call raises if the dataset cannot be found. As called here,
        # dataset-metadata.json ends up in a local folder named after
        # dataset_id (step 8 moves it from there).
        api.dataset_metadata(path=dataset_id, dataset=dataset_title)
        print("Dataset exists on Kaggle")
        return True
    except Exception:
        print("Dataset does not exist on Kaggle")
        return False

# Dataset info
dataset_id = dataset_metadata["id"]
dataset_title = dataset_metadata["title"]
csv_file_name = "metadata.csv"
metadata_columns = ["original_artwork", "artist", "date", "description", "image",
                    "original_style", "medium", "AI model", "deepfake_image"]

dataset_exists = dataset_exists_on_kaggle(dataset_id, dataset_title)

# Start from an empty dataframe so that `metadata` is always defined
metadata = pd.DataFrame(columns=metadata_columns)

if dataset_exists:
    api.dataset_download_files(dataset_id, path='./', unzip=True)

    # Read the metadata if it exists
    if os.path.exists(csv_file_name):
        metadata = pd.read_csv(csv_file_name)
        print(f"Metadata CSV loaded: {metadata.head()}")
    else:
        print("Metadata CSV not found.")

Dataset exists on Kaggle
Dataset URL: https://www.kaggle.com/datasets/ciprianstricescu/deepfake-dl
Metadata CSV loaded:                             original_artwork        artist  date  description  \
0  frank-omeara_towards-night-and-winter.jpg  frank-omeara   NaN          NaN   

                                               image original_style  medium  \
0  https://uploads5.wikiart.org/00316/images/fran...  impressionism     NaN   

   AI model  deepfake_image  
0       NaN             NaN  


## 3. Selection of the image that will be used to create the deepfake

In [7]:
import ipywidgets as widgets
from IPython.display import display

# Define the options for multiple choice
style_options = df['label'].unique().tolist()
chosen_style = None
artist_options = []
chosen_artist = None
painting_options = []
chosen_painting = None
url = None


# Create a dropdown widget
dropdown_style = widgets.Dropdown(
    options=style_options,
    description='Style choice:',
    disabled=False
)

dropdown_artist = widgets.Dropdown(
    options=artist_options,
    description='Artist choice:',
    disabled=True
)

dropdown_painting = widgets.Dropdown(
    options=painting_options,
    description='Painting choice:',
    disabled=True
)

# Display the widgets
display(dropdown_style)
display(dropdown_artist)
display(dropdown_painting)

# Function to capture the selected value
def on_value_change_style(change):
    print(f"You selected: {change['new']}")
    global chosen_style
    chosen_style = change['new']
    artist_options = df[df['label'] == chosen_style]['artist'].unique().tolist()
    dropdown_artist.options = artist_options
    dropdown_artist.disabled = False

def on_value_change_artist(change):
    print(f"You selected: {change['new']}")
    global chosen_artist
    chosen_artist = change['new']
    painting_options = df[df['artist'] == chosen_artist]['name'].unique().tolist()
    print(painting_options)
    dropdown_painting.options = painting_options
    dropdown_painting.disabled = False

def on_value_change_painting(change):
    print(f"You selected: {change['new']}")
    global chosen_painting
    chosen_painting = change['new']
    # do something with the selected value


# Attach the function to the dropdown
dropdown_style.observe(on_value_change_style, names='value')
dropdown_artist.observe(on_value_change_artist, names='value')
dropdown_painting.observe(on_value_change_painting, names='value')

# Create a download button
download_button = widgets.Button(description='Download')

# Display the download button
display(download_button)

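def on_button_press(button):
    # Read the selections from the widgets so they are set even when a
    # default value was kept and no change event fired
    global chosen_style, chosen_artist, chosen_painting, url
    chosen_style = dropdown_style.value
    chosen_artist = dropdown_artist.value
    chosen_painting = dropdown_painting.value
    if chosen_painting is None:
        print("Select a painting first")
        return
    # Get the URL for the selected painting
    url = df[df['name'] == chosen_painting]['url'].values[0]
    # Download the image
    !wget $url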

# Attach the function to the download button
download_button.on_click(on_button_press)

Dropdown(description='Style choice:', options=('impressionism', 'romanticism', 'expressionism', 'surrealism', …

Dropdown(description='Artist choice:', disabled=True, options=(), value=None)

Dropdown(description='Painting choice:', disabled=True, options=(), value=None)

Button(description='Download', style=ButtonStyle())

You selected: romanticism
You selected: bartolomeo-pinelli
['bartolomeo-pinelli_study.jpg', 'bartolomeo-pinelli_the-death-of-epaminondas-1812.jpg', 'bartolomeo-pinelli_bandits-kidnapping-a-woman-1834.jpg', 'bartolomeo-pinelli_ave-maria-at-tivoli-1808.jpg', 'bartolomeo-pinelli_in-tivoli-1808.jpg', 'bartolomeo-pinelli_return-from-the-vintage-1808-0.jpg', 'bartolomeo-pinelli_on-the-road-to-tivoli-1808.jpg', 'bartolomeo-pinelli_a-group-of-three-peasants-in-a-village-1808.jpg', 'bartolomeo-pinelli_rest-during-the-vintage-1808.jpg', 'bartolomeo-pinelli_litigation-of-trasteverini-people-1809.jpg', 'bartolomeo-pinelli_a-woman-and-two-men-with-guns-costumes-of-the-kingdom-of-naples-1808.jpg', 'bartolomeo-pinelli_woman-with-a-baby-praying-before-the-cross-marking-the-place-where-her-husband-was-killed-1808.jpg', 'bartolomeo-pinelli_a-domestic-dispute-in-tivoli-1808.jpg', 'bartolomeo-pinelli_courtship-1818.jpg', 'bartolomeo-pinelli_fight-of-women-in-rome-1808.jpg', 'bartolomeo-pinelli_butcher-of-
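
Stripped of the widgets, the selection comes down to three dataframe lookups. The sketch below shows them as plain functions, assuming the ArtBench-10 columns `label`, `artist`, `name` and `url` shown earlier. The function names are hypothetical and the toy rows are made up for illustration. Unlike the artist callback above, `paintings_for` also filters on the style.

```python
import pandas as pd


def artists_for_style(df, style):
    """Artists that have at least one painting in the given style."""
    return sorted(df.loc[df["label"] == style, "artist"].unique())


def paintings_for(df, style, artist):
    """File names of the paintings by `artist` within `style`."""
    mask = (df["label"] == style) & (df["artist"] == artist)
    return df.loc[mask, "name"].tolist()


def url_for(df, name):
    """URL of the painting with the given file name."""
    matches = df.loc[df["name"] == name, "url"]
    if matches.empty:
        raise KeyError(f"No painting named {name!r}")
    return matches.iloc[0]


# Toy data with the same columns as ArtBench-10.csv (values are invented).
toy = pd.DataFrame({
    "name": ["a_one.jpg", "a_two.jpg", "b_one.jpg"],
    "artist": ["a", "a", "b"],
    "label": ["romanticism", "impressionism", "romanticism"],
    "url": [
        "https://example.com/a_one.jpg",
        "https://example.com/a_two.jpg",
        "https://example.com/b_one.jpg",
    ],
})
print(artists_for_style(toy, "romanticism"))
print(paintings_for(toy, "romanticism", "a"))
print(url_for(toy, "a_one.jpg"))
```

Plain functions like these can be checked without a running widget front end, and the callbacks could delegate to them.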

Before proceeding, we define how to populate the metadata from the original image (STILL NEED TO DEFINE HOW TO GET MISSING INFO):

In [8]:
def populate_metadata(data, metadata):
    # data is [style, artist, painting file name, url]
    style, artist, painting, image_url = data
    # create a new row if it doesn't already exist
    if painting in metadata['original_artwork'].unique().tolist():
        # updating the existing row is not implemented yet
        print("row already exists")
    else:
        new_row = pd.DataFrame({'original_artwork': painting, 'artist': artist, 'date': None, 'description': None, 'image': image_url, 'original_style': style, 'medium': None, 'AI model': None, 'deepfake_image': None}, index=[0])
        metadata = pd.concat([metadata, new_row], ignore_index=True)
    return metadata
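
The branch for an existing row is still a placeholder. One possible way to fill it in is an "update or insert" helper, sketched below under the assumption that `original_artwork` uniquely identifies a row. `upsert_metadata` is a hypothetical name and not part of the notebook.

```python
import pandas as pd


def upsert_metadata(metadata, artwork, **updates):
    """Update the row for `artwork` if present, otherwise append a new row."""
    unknown = set(updates) - set(metadata.columns)
    if unknown:
        raise KeyError(f"Unknown metadata fields: {sorted(unknown)}")

    mask = metadata["original_artwork"] == artwork
    if mask.any():
        metadata = metadata.copy()  # leave the caller's dataframe untouched
        for column, value in updates.items():
            # Columns that were all-NaN in the CSV load as float64; cast so text fits.
            metadata[column] = metadata[column].astype(object)
            metadata.loc[mask, column] = value
        return metadata

    new_row = {column: None for column in metadata.columns}
    new_row["original_artwork"] = artwork
    new_row.update(updates)
    return pd.concat([metadata, pd.DataFrame([new_row])], ignore_index=True)
```

For example, `upsert_metadata(metadata, chosen_painting, **{"AI model": "some-model"})` would update the existing row instead of only printing "row already exists".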

## 4. Update the metadata dataframe row

In [9]:
data = [chosen_style, chosen_artist, chosen_painting, url]
metadata = populate_metadata(data, metadata)
metadata.head()

| | original_artwork | artist | date | description | image | original_style | medium | AI model | deepfake_image |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | frank-omeara_towards-night-and-winter.jpg | frank-omeara | | | https://uploads5.wikiart.org/00316/images/fran... | impressionism | | | |
| 1 | goldstein-grigoriy_portrait-of-a-woman.jpg | goldstein-grigoriy | | | https://uploads0.wikiart.org/images/grigoriy-g... | impressionism | | | |


# TO DEFINE

5. Creation of a prompt with an Image-to-Text model.
6. With this prompt generate a deepfake with a Text-to-Image model.
7. Update the metadata dataframe row.
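
These steps are not defined yet, so the sketch below only shows how they might plug into the metadata dataframe once the models are chosen. Everything in it is an assumption: `run_generation_step` is a hypothetical function, and `describe_image` and `generate_image` are hypothetical callables standing in for whichever Image-to-Text and Text-to-Image models end up being used. The stubs at the bottom are placeholders, not real models.

```python
import pandas as pd


def run_generation_step(metadata, artwork, image_path,
                        describe_image, generate_image, model_name):
    """Steps 5-7: caption the image, generate a deepfake, record it in the metadata."""
    mask = metadata["original_artwork"] == artwork
    if not mask.any():
        raise KeyError(f"No metadata row for {artwork!r}")

    prompt = describe_image(image_path)        # step 5: Image-to-Text
    deepfake_location = generate_image(prompt)  # step 6: Text-to-Image

    # Step 7: write the results into the existing row.
    metadata = metadata.copy()
    for column, value in (("AI model", model_name),
                          ("deepfake_image", deepfake_location)):
        metadata[column] = metadata[column].astype(object)
        metadata.loc[mask, column] = value
    return metadata, prompt


# Stub callables standing in for real models (placeholders only).
def fake_captioner(image_path):
    return f"a painting resembling {image_path}"


def fake_generator(prompt):
    return "deepfake_0001.png"


demo = pd.DataFrame([{"original_artwork": "a.jpg", "AI model": None,
                      "deepfake_image": None}])
demo, demo_prompt = run_generation_step(
    demo, "a.jpg", "a.jpg", fake_captioner, fake_generator, "stub-model")
print(demo_prompt)
print(demo.iloc[0].to_dict())
```

Passing the models in as callables keeps the metadata bookkeeping independent of the model choice, which is still open.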

## 8. Update metadata csv file on Kaggle

In [11]:
print(dataset_exists)

True


In [12]:
# upload metadata to kaggle as csv file
metadata.to_csv('metadata.csv', index=False)

if dataset_exists:
    # Move metadata.csv to our_data folder
    !mkdir -p our_data
    !mv /content/ciprianstricescu/deepfake-dl/dataset-metadata.json our_data/
    !mv metadata.csv our_data/
    !kaggle datasets version -p /content/our_data -m "Updated with new data"
else:
    # Save the metadata to a file
    with open('dataset-metadata.json', 'w') as f:
        json.dump(dataset_metadata, f)
    # Move metadata.csv to our_data folder
    !mkdir -p our_data
    !mv metadata.csv our_data/
    !mv dataset-metadata.json our_data/
    # Create a new dataset
    !kaggle datasets create -p /content/our_data

Starting upload for file metadata.csv
100% 411/411 [00:00<00:00, 623B/s]
Upload successful: metadata.csv (411B)
Dataset version is being created. Please check progress at https://www.kaggle.com/ciprianstricescu/deepfake-dl
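
Both branches need the same thing: a folder that contains `metadata.csv` and `dataset-metadata.json`. The sketch below writes both files straight into that folder instead of moving them with `mv`. It assumes that regenerating `dataset-metadata.json` from the `dataset_metadata` dictionary is acceptable for a new version as well. `stage_upload` is a hypothetical helper.

```python
import json
import os

import pandas as pd


def stage_upload(metadata, dataset_metadata, out_dir="our_data"):
    """Write metadata.csv and dataset-metadata.json into out_dir and return their paths."""
    os.makedirs(out_dir, exist_ok=True)
    csv_path = os.path.join(out_dir, "metadata.csv")
    json_path = os.path.join(out_dir, "dataset-metadata.json")
    metadata.to_csv(csv_path, index=False)
    with open(json_path, "w") as f:
        json.dump(dataset_metadata, f, indent=2)
    return csv_path, json_path
```

The `kaggle datasets version` or `kaggle datasets create` command from the cell above would then be pointed at that folder.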
