## Detailed Article Explanation

The detailed code explanation for this article is available at the following link:

https://www.daniweb.com/programming/computer-science/tutorials/541365/converting-pdf-image-to-csv-using-multimodal-google-gemini-pro
    
For my other articles on Daniweb.com, please see this link:

https://www.daniweb.com/members/1235222/usmanmalik57



## Installing and Importing Required Libraries 

In [None]:
# !pip install --upgrade google-cloud-aiplatform

In [1]:
import base64
import glob
import csv
import os
import re
from vertexai.preview.generative_models import GenerativeModel, Part

## Defining Helper Functions for Image Processing

In [9]:
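def get_jpg_file_paths(directory):
    """Return the absolute paths of all .jpg files in `directory` and its subfolders."""
    # '**' combined with recursive=True makes glob descend into nested folders
    jpg_file_paths = glob.glob(os.path.join(directory, '**', '*.jpg'), recursive=True)
    return [os.path.abspath(path) for path in jpg_file_paths]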
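
As a quick check of the recursive search, the sketch below runs the same `glob` pattern against a throwaway directory. The temporary directory and the file names in it are made up for illustration; only the function body comes from the notebook. Files with other extensions, such as `.png`, are skipped.

```python
import glob
import os
import tempfile


def get_jpg_file_paths(directory):
    # Same pattern as in the notebook: '**' with recursive=True descends into subfolders
    jpg_file_paths = glob.glob(os.path.join(directory, '**', '*.jpg'), recursive=True)
    return [os.path.abspath(path) for path in jpg_file_paths]


# Build a small throwaway tree (hypothetical file names)
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "2024", "january"))
    for rel in ("a.jpg", "notes.png", os.path.join("2024", "january", "b.jpg")):
        open(os.path.join(tmp, rel), "wb").close()

    found = sorted(os.path.basename(p) for p in get_jpg_file_paths(tmp))

print(found)  # ['a.jpg', 'b.jpg']
```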

In [10]:
def read_image(img_paths):
    """Read each JPEG file and wrap its base64-encoded contents in a Vertex AI Part."""
    imgs_b64 = []
    for img in img_paths:
        with open(img, "rb") as f:  # open the image file in binary mode
            img_data = f.read()  # read the image data as bytes

        # encode the bytes as base64 and convert the result to a string
        img_b64 = base64.b64encode(img_data).decode()
        imgs_b64.append(Part.from_data(data=img_b64, mime_type="image/jpeg"))

    return imgs_b64

## Extracting Information from PDF Using Google Gemini Pro

In [11]:
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = r"PATH_TO_VERTEX_AI_SERVICE_ACCOUNT JSON FILE"

model = GenerativeModel("gemini-pro-vision")
config = {
    "max_output_tokens": 2048,
    "temperature": 0,
    "top_p": 1,
    "top_k": 32
}

In [12]:
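def generate(img, prompt):
    """Send the image parts followed by the text prompt and return the model's full text response."""
    # `img` is a list of Part objects; the prompt goes last so it can refer to "the above receipts".
    # The name `contents` avoids shadowing the built-in input().
    contents = img + [prompt]

    responses = model.generate_content(
        contents,
        generation_config=config,
        stream=True,
    )

    # With stream=True the response arrives in chunks; concatenate their text
    full_response = ""
    for response in responses:
        full_response += response.text

    return full_response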

In [13]:
directory_path = r'D:\Receipts'
image_paths = get_jpg_file_paths(directory_path)
imgs_b64 = read_image(image_paths)


In [14]:
prompt = """I have the above receipts. Return a response that contains information from the receipts in a comma-separated file format where row fields are table columns,
whereas row values are column values. The output should contain (header + number of receipt rows).
The first row should contain all column headers, and the remaining rows should contain all column values from two receipts, one in each row.
Must use all field values in the receipt."""


In [15]:
full_response = generate(imgs_b64, prompt)

In [16]:
print(full_response)

 **Numéro de session,Date,Heure,Pass Easy n°,Fin de validité,Type,Quantité,Prix Unitaire,TVA,Montant total HT,Montant total TTC**
1,16/01/2024,09:32:32,3307837143,30/09/2023,Carnet de Ticket t+,10,17,35 €,10,00 %,15,77 €,17,35 €
1,16/01/2024,09:32:32,3307837143,30/09/2023,Carnet de Ticket t+,10,17,35 €,10,00 %,15,77 €,17,35 €
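
The header row in the response above is wrapped in Markdown bold markers (`**`), which would end up inside the first and last column names if written to a file unchanged. A minimal sketch for removing them follows; `strip_markdown_bold` is a hypothetical helper that is not part of the original notebook, and it assumes asterisks only ever appear at the very start or end of a line.

```python
def strip_markdown_bold(line):
    # Hypothetical helper: drop surrounding whitespace, then leading/trailing '*' characters
    return line.strip().strip('*')


header = " **Numéro de session,Date,Heure,Pass Easy n°,Fin de validité,Type,Quantité,Prix Unitaire,TVA,Montant total HT,Montant total TTC**"
clean_header = strip_markdown_bold(header)

print(clean_header.split(',')[0])   # Numéro de session
print(clean_header.split(',')[-1])  # Montant total TTC
```

Data rows pass through this helper unchanged, so it could be applied to every line before further processing.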


## Converting Google Gemini Pro Response to a CSV File 

In [18]:

lines = full_response.strip().split('\n')


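def process_line(line):
    """Split a response line on commas while keeping decimal-comma amounts such as '17,35 €' in one field."""
    # Amounts and percentages use a decimal comma followed by a space and a € or % sign
    special_patterns = re.compile(r'\d+,\d+\s[€%]')

    temp_replacement = "TEMP_CURRENCY"

    currency_matches = special_patterns.findall(line)

    # Swap each amount for a placeholder so its comma is not treated as a field separator
    for match in currency_matches:
        line = line.replace(match, temp_replacement, 1)

    parts = line.split(',')

    # Put the original amounts back in the order in which they were found
    for i, part in enumerate(parts):
        if temp_replacement in part:
            parts[i] = currency_matches.pop(0)

    return parts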
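
To see why the placeholder step is needed, the sketch below runs `process_line` on one of the data rows from the response shown earlier and compares it with a plain `split(',')`. The function is repeated here only so that the snippet runs on its own; the names `row`, `fields`, and `naive_fields` are introduced for illustration.

```python
import re


def process_line(line):
    special_patterns = re.compile(r'\d+,\d+\s[€%]')
    temp_replacement = "TEMP_CURRENCY"
    currency_matches = special_patterns.findall(line)
    for match in currency_matches:
        line = line.replace(match, temp_replacement, 1)
    parts = line.split(',')
    for i, part in enumerate(parts):
        if temp_replacement in part:
            parts[i] = currency_matches.pop(0)
    return parts


row = "1,16/01/2024,09:32:32,3307837143,30/09/2023,Carnet de Ticket t+,10,17,35 €,10,00 %,15,77 €,17,35 €"
fields = process_line(row)
naive_fields = row.split(',')

print(len(naive_fields))  # 15
print(len(fields))        # 11
print(fields[7:])         # ['17,35 €', '10,00 %', '15,77 €', '17,35 €']
```

The plain split breaks every amount at its decimal comma and yields 15 fields, whereas `process_line` returns 11, matching the 11 column headers.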

In [19]:
csv_file_path = r'D:\Receipts\receipts.csv'

# Open the CSV file for writing
with open(csv_file_path, mode='w', newline='', encoding='utf-8') as csv_file:
    writer = csv.writer(csv_file)
    
    # Process each line in the data list
    for line in lines:
        processed_line = process_line(line)
        writer.writerow(processed_line)
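
Because some fields still contain commas, `csv.writer` wraps them in double quotes under its default minimal quoting, and `csv.reader` restores them on the way back in. The sketch below illustrates this with an in-memory buffer instead of a file; the three sample values are taken from the response above, and the variable names are assumptions made for this example.

```python
import csv
import io

fields = ["10", "17,35 €", "10,00 %"]

# Write one row to an in-memory buffer instead of a file on disk
buffer = io.StringIO()
csv.writer(buffer).writerow(fields)
written = buffer.getvalue()
print(written.strip())  # 10,"17,35 €","10,00 %"

# Reading the row back recovers the original fields, commas included
read_back = next(csv.reader(io.StringIO(written)))
print(read_back == fields)  # True
```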