# Mad Libs to Python Expert Level 4: From Mad Libs to Simple Medical-Style User Database

## Welcome to Level 4!

### In this lesson, we will:

* Turn our mad libs into a user database.
* Load it in two different ways (the wrong way and the right way).
* Save it to a CSV and reload it again.
* Create a random 7-digit alphanumeric ID for each user.
* Make sure no two users ever share an ID.

## Here's how:

* First, we'll create a Python file that acts as a pre-existing database we can import from.
* Next, we will create a DataFrame like in Level 2.
* This time, though, we'll save the DataFrame to a CSV file, so we can load the information later and update the database after we've added more new users.

So let's jump in!

---

#### Level 3 Mad Libs Comparison

This is our final Mad Libs program from Level 3. If you have your own version, feel free to swap it in. It's here only so we can compare it with the next cell: notice how similar they are. The structure and logic are almost identical; mostly the prompts, names, and printed message change.

In [None]:
import pandas as pd

# Initialize a list to store all sets of answers
all_madlibs = []
madlib_counter = 0  # Initialize a counter for unique IDs

def create_madlib_entry():
    
    """Gathers madlib inputs and stores them with a unique ID."""
    global madlib_counter #access the global counter.
    madlib_counter += 1 #increment the counter.
    a1 = input('Choose A Type of Event/Place: ').upper()
    a2 = input('Pick An Adjective: ').upper()
    a3 = input('Pick Another Adjective: ').upper()
    a4 = input('Choose a Food Noun: ').upper()
    a5 = input('Pick a Plural Noun Related to Food: ').upper()
    a6 = input('Pick Another Food Noun: ').upper()

    input_dict = {
        "id": madlib_counter, #add the unique ID
        "event": a1,
        "adj1": a2,
        "adj2": a3,
        "food_noun": a4,
        "plural_food_noun": a5,
        "food_noun2": a6
    }
    all_madlibs.append(input_dict) #add the dictionary to the list of all madlibs.

    # Adjacent f-strings inside parentheses join into one string (no stray indentation)
    madlib2 = (
        f'One of the things most {a2} sports fans look forward to at American {a1} is eating a/an {a5}. '
        f'There is nothing more traditional than watching a/an {a3} act and eating {a4} drenched in mustard, relish, and {a6}.'
    )
    
    print(madlib2)
    print(f"Madlib entry {madlib_counter} created.") #feedback to user.

# Run the madlib creation multiple times
for _ in range(3):  # Change the range to get more or fewer sets of answers
    create_madlib_entry()

# Create a DataFrame from all collected madlibs
df_all_madlibs = pd.DataFrame(all_madlibs)

print("\nDataFrame of All Madlibs:")
df_all_madlibs

#### Database Initialization and User Input Handling

Just like in the cell above, we start with an empty list (`all_user_info`) that will hold our database, one dictionary per user.

---

We still define a function that asks for the inputs and collects each user's answers into a dictionary. It still keeps a basic ID counter AND adds that number to each entry.

---

Structurally, everything is the same, EXCEPT that it is no longer a mad lib; it is now my imaginary client's new database prototype. The one real addition is a confirmation step that shows users their answers before moving on.

---

This version still runs a `for` loop 3 times, but to give ourselves a break, the comments at the bottom of the script include some sample user info you can copy and paste (or type in).

---

## Let's Run It!

In [68]:
import pandas as pd

# Initialize a list to store all sets of answers
all_user_info = []
user_counter = 0  # Initialize a counter for unique IDs

def collect_user_info():
    """Gathers user information and stores it with a unique ID."""
    global user_counter
    user_counter += 1
    first_name = input("Enter your First Name: ").capitalize()
    last_name = input("Enter your Last Name: ").capitalize()
    age = input("Enter your Age: ")
    city = input("Enter your City: ").capitalize()
    occupation = input("Enter your Occupation: ").capitalize()
    favorite_food = input("Enter your Favorite Food: ").capitalize()

    user_dict = {
        "id": user_counter,
        "first_name": first_name,
        "last_name": last_name,
        "age": age,
        "city": city,
        "occupation": occupation,
        "favorite_food": favorite_food
    }
    all_user_info.append(user_dict)

    confirmation = f"Thanks {first_name}; we have your information, now let's double check it:\n"
    for key, value in user_dict.items():
        if key != "id":
            confirmation += f"{key}: {value}\n"
    confirmation += "Press Enter to continue."
    print(confirmation)
    input()
    print(f"User entry {user_counter} created.")

# Run the user info collection multiple times
for _ in range(3):
    collect_user_info()

# Create a DataFrame from all collected user info
df_user_info = pd.DataFrame(all_user_info)

print("\nDataFrame of User Information:")
print(df_user_info)

# Imaginary inputs for the students:
# 1. John Doe, 21, New York, Student, Pizza
# 2. Jane Smith, 35, Los Angeles, Teacher, pizza
# 3. David Lee, 35, New york, Doctor, Steak

Enter your First Name:  cheech
Enter your Last Name:  marin
Enter your Age:  75
Enter your City:  toronto
Enter your Occupation:  comedian
Enter your Favorite Food:  chile relleno


Thanks Cheech; we have your information, now let's double check it:
first_name: Cheech
last_name: Marin
age: 75
city: Toronto
occupation: Comedian
favorite_food: Chile relleno
Press Enter to continue.


 


User entry 1 created.


Enter your First Name:  jeff duhnam
Enter your Last Name:  mcclain
Enter your Age:  725
Enter your City:  cleveland
Enter your Occupation:  miner
Enter your Favorite Food:  nachos


Thanks Jeff duhnam; we have your information, now let's double check it:
first_name: Jeff duhnam
last_name: Mcclain
age: 725
city: Cleveland
occupation: Miner
favorite_food: Nachos
Press Enter to continue.


 


User entry 2 created.


Enter your First Name:  tommy
Enter your Last Name:  lee
Enter your Age:  65
Enter your City:  los angeles
Enter your Occupation:  drummer
Enter your Favorite Food:  lobster 


Thanks Tommy; we have your information, now let's double check it:
first_name: Tommy
last_name: Lee
age: 65
city: Los angeles
occupation: Drummer
favorite_food: Lobster 
Press Enter to continue.


 


User entry 3 created.

DataFrame of User Information:
   id   first_name last_name  age         city occupation  favorite_food
0   1       Cheech     Marin   75      Toronto   Comedian  Chile relleno
1   2  Jeff duhnam   Mcclain  725    Cleveland      Miner         Nachos
2   3        Tommy       Lee   65  Los angeles    Drummer       Lobster 
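Notice `Los angeles` and `Jeff duhnam` in the output: `str.capitalize()` uppercases only the first character of the whole string and lowercases the rest. If you'd rather capitalize every word, `str.title()` does that, and `str.strip()` removes stray spaces like the one typed after `lobster `. A quick comparison (note that `.title()` still can't know that `mcclain` should be `McClain`):

```python
city = "los angeles"

print(city.capitalize())  # Los angeles  (only the first letter)
print(city.title())       # Los Angeles  (first letter of every word)

food = "lobster "         # note the trailing space
print(repr(food.strip().capitalize()))  # 'Lobster'
```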


In [80]:
pd.DataFrame(all_user_info)

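|   | id | first_name | last_name | age | city | occupation | favorite_food |
|---|----|------------|-----------|-----|------|------------|---------------|
| 0 | 1 | Cheech | Marin | 75 | Toronto | Comedian | Chile relleno |
| 1 | 2 | Jeff duhnam | Mcclain | 725 | Cleveland | Miner | Nachos |
| 2 | 3 | Tommy | Lee | 65 | Los angeles | Drummer | Lobster |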


### Building a Python Database from Form Inputs

Creating a Python database from the inputs on our form is simple, and it's exactly the kind of thing that we, as developers, should be able to do!

So let's walk through the steps together.

First, a quick word about directories, and then we'll build this working prototype. On the left side of your notebook is a file browser tab with a folder icon.

<----- 📁

This will show you what directory you are working in now.

**Steps:**

1.  In the file browser, right-click in the folder where your notebook lives and choose "New File."
2.  Delete the entire default filename (including `untitled.txt`) and rename the file `user_db.py`.
3.  If it isn't open already, open `user_db.py` in your notebook.
4.  Inside `user_db.py`, paste the following code (it's our database as it stands so far):

In [None]:
# user_db.py

user_data = [
    {
        "id": 1,
        "first_name": "John",
        "last_name": "Doe",
        "age": "21",
        "city": "New York",
        "occupation": "Student",
        "favorite_food": "Pizza"
    },
    {
        "id": 2,
        "first_name": "Jane",
        "last_name": "Smith",
        "age": "35",
        "city": "Los Angeles",
        "occupation": "Teacher",
        "favorite_food": "Pizza"
    },
    {
        "id": 3,
        "first_name": "David",
        "last_name": "Lee",
        "age": "35",
        "city": "New York",
        "occupation": "Doctor",
        "favorite_food": "Steak"
    }
]

def add_user(user_entry):
    """Adds a user entry to the database."""
    user_data.append(user_entry)

def get_all_users():
    """Returns all user data."""
    return user_data

def find_user(user_id):
    """Finds a user by their ID."""
    for user in user_data:
        if user["id"] == user_id:
            return user
    return None

## Important: Database Code Execution

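Do not run the cell above in this notebook. It's shown here only so you can see what goes in `user_db.py`. Running it would define a second, separate `user_data` list (plus copies of the functions) in the notebook itself, which is easy to confuse with the real one inside the `user_db` module.

If you do run it by accident, right-click the cell and choose "Clear output of this cell" to tidy up the notebook; the `user_db.py` file itself is unaffected.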

## New User Entry Form and Database Update

HERE'S OUR NEW DATABASE ENTRY FORM THAT ONLY ASKS FOR ONE USER'S INPUT, ADDS THEM TO OUR DB, AND PULLS ALL USERS INTO A DATAFRAME.

LET'S TRY IT, and we can go over how it works.

In [35]:
import pandas as pd
import user_db  # Import the user_db.py file

def collect_user_info():
    """Collects user information and adds it to the user_db.py database."""
    first_name = input("Enter your First Name: ").capitalize()
    last_name = input("Enter your Last Name: ").capitalize()
    age = input("Enter your Age: ")
    city = input("Enter your City: ").capitalize()
    occupation = input("Enter your Occupation: ").capitalize()
    favorite_food = input("Enter your Favorite Food: ").capitalize()

    # Find the next available ID
    all_users = user_db.get_all_users()
    if all_users:
        next_id = all_users[-1]['id'] + 1
    else:
        next_id = 1

    user_entry = {
        "id": next_id,
        "first_name": first_name,
        "last_name": last_name,
        "age": age,
        "city": city,
        "occupation": occupation,
        "favorite_food": favorite_food
    }

    user_db.add_user(user_entry)  # Add the user to the database

    confirmation = f"Thanks {first_name}; we have your information, now let's double check it:\n"
    for key, value in user_entry.items():
        if key != "id":
            confirmation += f"{key}: {value}\n"
    confirmation += "Press Enter to continue."
    print(confirmation)
    input()
    print(f"User entry {next_id} created and saved.")

# Collect user info once
collect_user_info()

# Display all users in a DataFrame (including the new user)
df_all_users = pd.DataFrame(user_db.get_all_users())
print("\nAll User Data:")
print(df_all_users)

# Here's a fake person to add if you need one:
# Jenny Fromdablok, 55, Bronx, Dancer, Ham Sandwich

Enter your First Name:  jeff
Enter your Last Name:  tarkington
Enter your Age:  23
Enter your City:  las vegas
Enter your Occupation:  teacher
Enter your Favorite Food:  pizza


Thanks Jeff; we have your information, now let's double check it:
first_name: Jeff
last_name: Tarkington
age: 23
city: Las vegas
occupation: Teacher
favorite_food: Pizza
Press Enter to continue.


 Las Vegas


User entry 4 created and saved.

All User Data:
   id first_name   last_name age         city occupation favorite_food
0   1       John         Doe  21     New York    Student         Pizza
1   2       Jane       Smith  35  Los Angeles    Teacher         Sushi
2   3      David         Lee  42      Chicago     Doctor         Steak
3   4       Jeff  Tarkington  23    Las vegas    Teacher         Pizza


## Now our DataFrame includes the new entry, and we can display it as a nicely formatted table like this:

In [38]:
df_all_users  # leave off print() so the notebook renders a formatted table

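|   | id | first_name | last_name | age | city | occupation | favorite_food |
|---|----|------------|-----------|-----|------|------------|---------------|
| 0 | 1 | John | Doe | 21 | New York | Student | Pizza |
| 1 | 2 | Jane | Smith | 35 | Los Angeles | Teacher | Sushi |
| 2 | 3 | David | Lee | 42 | Chicago | Doctor | Steak |
| 3 | 4 | Jeff | Tarkington | 23 | Las vegas | Teacher | Pizza |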


# Adding Another Entry

Let's add another entry and see what happens.

Here's an example:

Homer Simpson, 38, Springfield, Nuclear Safety Inspector, Donuts

In [40]:
collect_user_info()

Enter your First Name:  jenny
Enter your Last Name:  fromdablok
Enter your Age:  23
Enter your City:  las vegas
Enter your Occupation:  dancer
Enter your Favorite Food:  egg rols


Thanks Jenny; we have your information, now let's double check it:
first_name: Jenny
last_name: Fromdablok
age: 23
city: Las vegas
occupation: Dancer
favorite_food: Egg rols
Press Enter to continue.


 


User entry 5 created and saved.


In [44]:
df_all_users

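|   | id | first_name | last_name | age | city | occupation | favorite_food |
|---|----|------------|-----------|-----|------|------------|---------------|
| 0 | 1 | John | Doe | 21 | New York | Student | Pizza |
| 1 | 2 | Jane | Smith | 35 | Los Angeles | Teacher | Sushi |
| 2 | 3 | David | Lee | 42 | Chicago | Doctor | Steak |
| 3 | 4 | Jeff | Tarkington | 23 | Las vegas | Teacher | Pizza |
| 4 | 5 | Jenny | Fromdablok | 23 | Las vegas | Dancer | Egg rols |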


# Data Persistence and Security

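You may notice that, depending on the order you ran your cells, the new entry (Homer, in our example) doesn't show up in `df_all_users`. Why? `df_all_users` is a snapshot: `pd.DataFrame(...)` copied the user list at the moment we created it. `collect_user_info()` does add the new user to `user_db.user_data`, but the DataFrame isn't rebuilt automatically.

Also note that the new user is never written back to the `user_db.py` file: `add_user()` only appends to a list in memory, so any users added this way disappear when the kernel restarts.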
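Here's a minimal, self-contained illustration of that snapshot behavior, using a hypothetical two-user list in place of `user_db`:

```python
import pandas as pd

# Hypothetical stand-in for user_db.user_data
users = [{"id": 1, "first_name": "John"}]

df_snapshot = pd.DataFrame(users)    # copies the data as it is right now

users.append({"id": 2, "first_name": "Homer"})
print(len(users), len(df_snapshot))  # 2 1  (the DataFrame didn't change)

df_snapshot = pd.DataFrame(users)    # rebuild to pick up the new entry
print(len(df_snapshot))              # 2
```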

To see the updated list, we simply rebuild the DataFrame from the in-memory list, like this:

In [59]:
# rebuild the DataFrame from the current in-memory list:
df_all_users = pd.DataFrame(user_db.get_all_users())

# display the updated table
df_all_users

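|   | id | first_name | last_name | age | city | occupation | favorite_food |
|---|----|------------|-----------|-----|------|------------|---------------|
| 0 | 1 | John | Doe | 21 | New York | Student | Pizza |
| 1 | 2 | Jane | Smith | 35 | Los Angeles | Teacher | Sushi |
| 2 | 3 | David | Lee | 42 | Chicago | Doctor | Steak |
| 3 | 4 | Jeff | Tarkington | 23 | Las vegas | Teacher | Pizza |
| 4 | 5 | Jenny | Fromdablok | 23 | Las vegas | Dancer | Egg rols |
| 5 | 6 | Jim | Carrey | 64 | Los angeles | Actor | Grilled cheese |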
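To see that `add_user()` only changes memory, here's a sketch that writes a tiny hypothetical module (`demo_db.py`, standing in for `user_db.py`) to a temporary folder, appends a user, and then uses `importlib.reload()` to re-run the file from disk. The appended user is gone, because the file itself never changed:

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Write a hypothetical mini database module to a temporary folder
folder = tempfile.mkdtemp()
Path(folder, "demo_db.py").write_text("user_data = [{'id': 1, 'first_name': 'John'}]\n")
sys.path.insert(0, folder)  # let Python find demo_db.py

import demo_db

demo_db.user_data.append({"id": 2, "first_name": "Jim"})
print(len(demo_db.user_data))  # 2: the new user exists in memory

importlib.reload(demo_db)      # re-runs demo_db.py, which still lists one user
print(len(demo_db.user_data))  # 1: the in-memory addition is gone
```

Restarting the kernel has the same effect: the next `import user_db` starts again from whatever the file contains.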


### Security Considerations for User Data

We also can't save new entries directly back to the `user_db.py` file when a user fills out the form. That file is Python code that runs every time it's imported, so writing user input into it would let someone inject code through a form answer, and we DO NOT want a user doing that.

Instead, we will save the data to a CSV file and load it back from there. A CSV is read as plain data and never executed, which is why this is considered the more secure way.

Let's check it out.

In [49]:
collect_user_info()

Enter your First Name:  Jim
Enter your Last Name:  carrey
Enter your Age:  64
Enter your City:  los angeles
Enter your Occupation:  actor
Enter your Favorite Food:  grilled cheese


Thanks Jim; we have your information, now let's double check it:
first_name: Jim
last_name: Carrey
age: 64
city: Los angeles
occupation: Actor
favorite_food: Grilled cheese
Press Enter to continue.


 


User entry 6 created and saved.


In [None]:
# rebuild the DataFrame from the current list (re-run after each new entry):
df_all_users = pd.DataFrame(user_db.get_all_users())
# display it and check the data
df_all_users

In [146]:
df_all_users.to_csv("user_data.csv", index=False)  # index=False: don't save the row numbers as an extra column
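By default, `DataFrame.to_csv()` also writes the row index (0, 1, 2, ...) as an unnamed first column, and `pd.read_csv()` reads it back as a column called `Unnamed: 0`, so it's usually worth passing `index=False`. This sketch round-trips a small hypothetical DataFrame through an in-memory buffer (`io.StringIO`) instead of a real file:

```python
import io
import pandas as pd

df = pd.DataFrame({"id": [1, 2], "first_name": ["John", "Jane"]})

# Default: the index is written as an extra, unnamed first column
with_index = io.StringIO()
df.to_csv(with_index)
with_index.seek(0)
cols_with_index = pd.read_csv(with_index).columns.tolist()
print(cols_with_index)     # ['Unnamed: 0', 'id', 'first_name']

# index=False: only the real columns are saved
without_index = io.StringIO()
df.to_csv(without_index, index=False)
without_index.seek(0)
cols_without_index = pd.read_csv(without_index).columns.tolist()
print(cols_without_index)  # ['id', 'first_name']
```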

## Enhancing User Data with Random Alphanumeric IDs

Now that we have a CSV file, let's take it up a notch and assign a random 7-digit alphanumeric ID to each user.

First, we can convert the DataFrame to a dictionary, which saves us from retyping our data.

Let's explore how that works real quick, and then we can implement with confidence.

Here's a random alphanumeric generator; each time you run the cell, it produces new random strings:

In [217]:
import random
import string

def generate_alphanumeric(length=7):
    """Generates a random alphanumeric string."""
    characters = string.ascii_letters + string.digits  # Letters (both cases) and digits
    result = ''.join(random.choice(characters) for i in range(length))
    return result

# Generate and print an alphanumeric string
random_string = generate_alphanumeric()
print(f"Generated Alphanumeric String: {random_string}")

#Generate a string of a different length.
random_short_string = generate_alphanumeric(length=4)
print(f"Generated Alphanumeric String of length 4: {random_short_string}")

Generated Alphanumeric String: VlMkZDr
Generated Alphanumeric String of length 4: DoLF
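The `random` module is fine for games like Mad Libs, but Python's documentation warns against using it for security purposes and points to the `secrets` module instead. If the IDs in a medical-style database shouldn't be guessable, a drop-in variant might look like this (the name `generate_secure_id` is our own, not part of any library):

```python
import secrets
import string

def generate_secure_id(length=7):
    """Random alphanumeric ID drawn from the operating system's secure random source."""
    characters = string.ascii_letters + string.digits  # same 62 characters as before
    return "".join(secrets.choice(characters) for _ in range(length))

print(generate_secure_id())           # a new random 7-character ID each run
print(generate_secure_id(length=12))  # longer IDs are a one-argument change
```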


## Probability of Duplicate 7-Digit Alphanumeric IDs

This analysis explores the likelihood of generating duplicate 7-digit alphanumeric IDs, assessing the practical implications for user management systems.

**1. Number of Possible IDs:**

* **Character Set:**
    * The ID generation uses lowercase letters (26), uppercase letters (26), and digits (10).
    * Total characters available: 26 + 26 + 10 = 62.
* **ID Length:**
    * Each ID is 7 characters long.
* **Total Possible IDs:**
    * The total number of unique IDs is calculated as 62 raised to the power of 7 (62^7).

**2. Calculating 62^7:**

* **Result:**
    * 62^7 = 3,521,614,606,208 (approximately 3.5 trillion).
    * This demonstrates an enormous number of possible unique IDs.

**3. Probability of a Duplicate:**

* **Birthday Paradox:**
    * This problem relates to the "birthday paradox": the probability of a duplicate grows roughly with the *square* of the number of IDs generated, so duplicates show up much sooner than intuition suggests.
    * The focus is on the probability of *any* two IDs being identical, not a specific ID being duplicated.
* **Low Probability (for Small User Bases):**
    * Given the massive number of potential IDs (3.5 trillion), the probability of a duplicate is extremely low for a small user base; for a very large one, it is not.

**4. Estimating the Probability (Simplified):**

* **Approximation:**
    * For n IDs drawn at random from N = 62^7 possible values, P(duplicate) ≈ 1 - e^(-n(n-1)/(2N)).
* **Small User Base (e.g., 10,000):**
    * About 0.0014%, which is negligible.
* **Large User Base (e.g., 1 Million):**
    * About 13%, which is NOT small. The odds pass 50% at roughly 2.2 million users.
* **Practical Implications:**
    * A 7-digit alphanumeric ID is plenty for a small application, but as the user base grows, each new ID should be checked against the existing ones (as we do at the end of this lesson), or the IDs should be made longer.
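The approximation is easy to compute yourself. This sketch plugs N = 62^7 into the birthday-paradox formula for a few user counts (`duplicate_probability` is a name chosen for this example):

```python
import math

ALPHABET_SIZE = 62  # 26 lowercase + 26 uppercase + 10 digits
ID_LENGTH = 7
TOTAL_IDS = ALPHABET_SIZE ** ID_LENGTH

def duplicate_probability(n_ids, total=TOTAL_IDS):
    """Approximate chance that at least two of n_ids random IDs are the same."""
    return 1 - math.exp(-n_ids * (n_ids - 1) / (2 * total))

print(f"{TOTAL_IDS:,} possible IDs")
for users in (10_000, 100_000, 1_000_000, 2_200_000):
    print(f"{users:>9,} users -> {duplicate_probability(users):.4%} chance of a duplicate")
```

With these inputs, the estimate climbs from about 0.0014% at 10,000 users to about 0.14% at 100,000, about 13% at 1 million, and roughly 50% near 2.2 million.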

**For Extremely High-Volume Systems:**

* **Considerations:**
    * For systems with millions of users or more, consider the following:
        * **Increasing ID Length:** Each extra character multiplies the number of possible IDs by 62.
        * **Using UUIDs (Universally Unique Identifiers):** Random (version 4) UUIDs contain 122 random bits, which makes a collision practically impossible, though not strictly guaranteed.
        * **Database Constraints:** Mark the ID column as unique (for example, as a primary key) so the database itself rejects a duplicate at insert time.
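If you ever outgrow short IDs, Python's standard `uuid` module generates random (version 4) UUIDs. A minimal sketch:

```python
import uuid

user_id = uuid.uuid4().hex  # 32 hexadecimal characters, no dashes
print(user_id)
print(uuid.uuid4())         # the same kind of value in the standard dashed form
```

The trade-off is length: 32 characters are much harder for a person to read out or type than 7.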

**Summary:**

* The probability of generating a duplicate 7-digit alphanumeric ID is very low for small user bases, but it grows with the square of the number of users and becomes significant in the millions.
* This ID generation method suits small and medium applications, especially when paired with a check against existing IDs, which we add at the end of this lesson.

## Converting a Pandas DataFrame to a Dictionary with List Values

The `to_dict()` method in Pandas allows for flexible conversions of DataFrames to dictionaries. When using `orient='list'`, the DataFrame's columns become the dictionary's keys, and the column values are stored as lists.

**Code:**

```python
df_dict = df.to_dict(orient='list')
```

In [158]:
# Convert DataFrame to a dictionary with lists as values
df_dict = df_all_users.to_dict(orient='list')
df_dict

{'id': [1, 2, 3, 4, 5, 6],
 'first_name': ['John', 'Jane', 'David', 'Jeff', 'Jenny', 'Jim'],
 'last_name': ['Doe', 'Smith', 'Lee', 'Tarkington', 'Fromdablok', 'Carrey'],
 'age': ['21', '35', '42', '23', '23', '64'],
 'city': ['New York',
  'Los Angeles',
  'Chicago',
  'Las vegas',
  'Las vegas',
  'Los angeles'],
 'occupation': ['Student', 'Teacher', 'Doctor', 'Teacher', 'Dancer', 'Actor'],
 'favorite_food': ['Pizza',
  'Sushi',
  'Steak',
  'Pizza',
  'Egg rols',
  'Grilled cheese']}
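`orient='list'` groups values by column. The CSV-based code later in this lesson uses `orient='records'` instead, which gives one dictionary per row: the same shape as `user_db.user_data`. A small side-by-side comparison with hypothetical data:

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 2], "first_name": ["John", "Jane"]})

by_column = df.to_dict(orient="list")  # {column name: list of values}
by_row = df.to_dict(orient="records")  # [one dict per row]

print(by_column)
print(by_row)
```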

## Replacing Simple Index Numbers in a CSV File

We'll start with a script that replaces the simple index-style ID numbers in our CSV file with random alphanumeric IDs.

To give our existing users their new IDs, copy the dictionary above and paste it into the cell below in place of the sample data (the `data = {...}` block). Heads up: running the cell overwrites `user_data.csv` with whatever is in `data`.

In [None]:
import pandas as pd
import random
import string
import os

CSV_FILE = "user_data.csv"

def generate_user_id():
    """Generates a random alphanumeric user ID."""
    characters = string.ascii_letters + string.digits
    return ''.join(random.choice(characters) for i in range(7))  # 7-character ID

def load_data():
    """Loads user data from CSV."""
    if os.path.exists(CSV_FILE):
        try:
            return pd.read_csv(CSV_FILE)
        except pd.errors.EmptyDataError:
            return pd.DataFrame() #return empty dataframe.
    else:
        return pd.DataFrame() #return empty dataframe.

def save_data(df):
    """Saves user data to CSV."""
    df.to_csv(CSV_FILE, index=False)

# Sample DataFrame with existing users (replace with your actual DataFrame)
data = {
    "id": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],  # Existing index-like IDs
    "first_name": ["Alice", "Bob", "Charlie", "David", "Eve", "Frank", "Grace", "Henry", "Iris", "Jack"],
    "last_name": ["Adams", "Brown", "Clark", "Davis", "Evans", "Ford", "Green", "Hill", "Irwin", "Jones"],
    # ... other columns ...
}
df = pd.DataFrame(data)

# Generate new IDs and replace the 'id' column
df['id'] = [generate_user_id() for _ in range(len(df))]

save_data(df) #save the data.

df = load_data() #reload the data.

print(df) #print the reloaded data.
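Copying and pasting the dictionary works, but you could also let the code read the existing CSV directly. Here's a sketch under a few assumptions: the file has an `id` column, any leftover `Unnamed: 0` index column should be dropped, and the function name `reassign_ids` is our own. Collecting the new IDs in a set guarantees that no two rows in this batch get the same one:

```python
import os
import random
import string
import tempfile

import pandas as pd

def generate_user_id(length=7):
    """Random alphanumeric ID (same idea as the generator above)."""
    characters = string.ascii_letters + string.digits
    return "".join(random.choice(characters) for _ in range(length))

def reassign_ids(csv_path):
    """Hypothetical helper: replace the 'id' column of an existing CSV with unique random IDs."""
    df = pd.read_csv(csv_path)
    # Drop the leftover index column if the file was saved without index=False
    df = df.drop(columns=["Unnamed: 0"], errors="ignore")
    new_ids = set()
    while len(new_ids) < len(df):  # a set silently ignores repeats
        new_ids.add(generate_user_id())
    df["id"] = list(new_ids)
    df.to_csv(csv_path, index=False)
    return df

# Usage with a throwaway file (point csv_path at "user_data.csv" for the real thing)
csv_path = os.path.join(tempfile.mkdtemp(), "user_data.csv")
pd.DataFrame({"id": [1, 2, 3], "first_name": ["John", "Jane", "David"]}).to_csv(csv_path)
print(reassign_ids(csv_path))
```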

## Persistent User Database with Alphanumeric IDs

From now on, we can load our user database from the `user_data.csv` file, and the code will also create a 7-digit alphanumeric ID for each new user.

It saves the new user to `user_data.csv` first and then, AFTERWARD, reloads the updated file to rebuild the DataFrame with all users.

Neat!

In [None]:
import pandas as pd
import os
import random
import string

CSV_FILE = "user_data.csv"

def generate_user_id():
    """Generates a random alphanumeric user ID."""
    characters = string.ascii_letters + string.digits
    return ''.join(random.choice(characters) for i in range(7))

def load_user_data():
    """Loads user data from the CSV file."""
    if os.path.exists(CSV_FILE):
        try:
            return pd.read_csv(CSV_FILE).to_dict(orient='records')
        except pd.errors.EmptyDataError:
            return []
    else:
        return []

def save_user_data(data):
    """Saves user data to the CSV file."""
    df = pd.DataFrame(data)
    df.to_csv(CSV_FILE, index=False)

def collect_user_info():
    """Collects user information and adds it to the CSV database."""
    first_name = input("Enter your First Name: ").capitalize()
    last_name = input("Enter your Last Name: ").capitalize()
    age = input("Enter your Age: ")
    city = input("Enter your City: ").capitalize()
    occupation = input("Enter your Occupation: ").capitalize()
    favorite_food = input("Enter your Favorite Food: ").capitalize()

    user_id = generate_user_id()

    user_entry = {
        "id": user_id,
        "first_name": first_name,
        "last_name": last_name,
        "age": age,
        "city": city,
        "occupation": occupation,
        "favorite_food": favorite_food
    }

    user_data = load_user_data() # Load data before adding the new entry
    user_data.append(user_entry)
    save_user_data(user_data) # Save the updated data

    confirmation = f"Thanks {first_name}; we have your information, now let's double check it:\n"
    for key, value in user_entry.items():
        confirmation += f"{key}: {value}\n"
    confirmation += "Press Enter to continue."
    print(confirmation)
    input()
    print(f"User entry {user_id} created and saved.")

# Collect user info once
collect_user_info()

# Display all users in a DataFrame (including the new user)
df_all_users = pd.DataFrame(load_user_data()) #load the most recent data.
print("\nAll User Data:")
print(df_all_users)
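One thing to watch when reading IDs back: `pd.read_csv()` guesses column types. If every value in the `id` column happens to be made of digits (for example, a random ID like `0012345`), pandas reads the column as numbers and the leading zeros are lost. Passing `dtype={"id": str}` keeps IDs as text. A minimal sketch with hypothetical data in an in-memory buffer:

```python
import io
import pandas as pd

csv_text = "id,first_name\n0012345,John\n0098765,Jane\n"

guessed = pd.read_csv(io.StringIO(csv_text))
as_text = pd.read_csv(io.StringIO(csv_text), dtype={"id": str})

print(guessed["id"].tolist())  # [12345, 98765]  (leading zeros lost)
print(as_text["id"].tolist())  # ['0012345', '0098765']
```

In `load_user_data()`, that would mean calling `pd.read_csv(CSV_FILE, dtype={"id": str})`.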



## Adding a New User Entry

Let's run the cell above and add another entry to see what happens.

Here's an example:

Jeff Tarkington, 23, Las Vegas, Teacher, Pizza

In [166]:
df_all_users

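|   | id | first_name | last_name | age | city | occupation | favorite_food |
|---|----|------------|-----------|-----|------|------------|---------------|
| 0 | 1 | John | Doe | 21 | New York | Student | Pizza |
| 1 | 2 | Jane | Smith | 35 | Los Angeles | Teacher | Sushi |
| 2 | 3 | David | Lee | 42 | Chicago | Doctor | Steak |
| 3 | 4 | Jeff | Tarkington | 23 | Las vegas | Teacher | Pizza |
| 4 | 5 | Jenny | Fromdablok | 23 | Las vegas | Dancer | Egg rols |
| 5 | 6 | Jim | Carrey | 64 | Los angeles | Actor | Grilled cheese |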


# LET'S UPGRADE THE DATABASE TO CHECK THE ID NUMBERS BEFORE SAVING THE NEW ONE
## To ensure the database always creates a unique ID (you know, because we want to scale up without ever hitting an error).
### Key Changes:

* **`generate_user_id(existing_ids)` Function:**
    * Takes a list of `existing_ids` as input.
    * Uses a `while True` loop to keep generating IDs until a unique one is found.
    * Checks if the generated `user_id` is already in the `existing_ids` list.
    * Returns the unique `user_id`.

* **`collect_user_info()` Function:**
    * `existing_ids = [user['id'] for user in user_data]`: Creates a list of all existing IDs from the loaded `user_data`.
    * `user_id = generate_user_id(existing_ids)`: Calls the `generate_user_id()` function, passing the list of existing IDs to ensure uniqueness.

### Efficiency:

* This approach guarantees uniqueness, but it gets slower as the user base grows: `user_id not in existing_ids` checks a list one item at a time, so every check takes longer as the list gets longer.
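A minimal sketch of the usual fix, assuming you're free to store `existing_ids` as a set instead of a list: membership tests on a Python `set` take roughly constant time on average, while `in` on a list scans it element by element. The sample IDs below are made up for illustration:

```python
import random
import string

def generate_user_id(existing_ids, length=7):
    """Return a random alphanumeric ID that is not already in existing_ids (a set)."""
    characters = string.ascii_letters + string.digits
    while True:
        user_id = "".join(random.choice(characters) for _ in range(length))
        if user_id not in existing_ids:  # average O(1) lookup for a set
            return user_id

# Hypothetical example data: build the set once from the loaded records
user_data = [{"id": "VlMkZDr"}, {"id": "DoLF123"}]
existing_ids = {user["id"] for user in user_data}

new_id = generate_user_id(existing_ids)
existing_ids.add(new_id)  # keep the set current if you add more users in the same run
print(new_id)
```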

### How it Works:

* The `generate_user_id()` function will now create a new ID and check it against any existing IDs. If that ID already exists, it will create a new ID, and check again, until it finds an ID that does not exist.
* This ensures that no duplicate IDs are created.

In [None]:
import pandas as pd
import os
import random
import string

CSV_FILE = "user_data.csv"

def generate_user_id(existing_ids):
    """Generates a random alphanumeric user ID, ensuring uniqueness."""
    while True:
        characters = string.ascii_letters + string.digits
        user_id = ''.join(random.choice(characters) for i in range(7))
        if user_id not in existing_ids:
            return user_id

def load_user_data():
    """Loads user data from the CSV file."""
    if os.path.exists(CSV_FILE):
        try:
            return pd.read_csv(CSV_FILE).to_dict(orient='records')
        except pd.errors.EmptyDataError:
            return []
    else:
        return []

def save_user_data(data):
    """Saves user data to the CSV file."""
    df = pd.DataFrame(data)
    df.to_csv(CSV_FILE, index=False)

def collect_user_info():
    """Collects user information and adds it to the CSV database."""
    first_name = input("Enter your First Name: ").capitalize()
    last_name = input("Enter your Last Name: ").capitalize()
    age = input("Enter your Age: ")
    city = input("Enter your City: ").capitalize()
    occupation = input("Enter your Occupation: ").capitalize()
    favorite_food = input("Enter your Favorite Food: ").capitalize()

    user_data = load_user_data()
    existing_ids = [user['id'] for user in user_data] # get a list of all existing ids

    user_id = generate_user_id(existing_ids)

    user_entry = {
        "id": user_id,
        "first_name": first_name,
        "last_name": last_name,
        "age": age,
        "city": city,
        "occupation": occupation,
        "favorite_food": favorite_food
    }

    user_data.append(user_entry)
    save_user_data(user_data)

    confirmation = f"Thanks {first_name}; we have your information, now let's double check it:\n"
    for key, value in user_entry.items():
        confirmation += f"{key}: {value}\n"
    confirmation += "Press Enter to continue."
    print(confirmation)
    input()
    print(f"User entry {user_id} created and saved.")

# Collect user info once
collect_user_info()

# Display all users in a DataFrame (including the new user)
df_all_users = pd.DataFrame(load_user_data())
print("\nAll User Data:")
print(df_all_users)

In [180]:
title = 'OUR FANCY DATAFRAME NOW LOOKS LIKE THIS: '
print(title)
df_all_users

OUR FANCY DATAFRAME NOW LOOKS LIKE THIS: 


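|   | id | first_name | last_name | age | city | occupation | favorite_food |
|---|----|------------|-----------|-----|------|------------|---------------|
| 0 | 1 | John | Doe | 21 | New York | Student | Pizza |
| 1 | 2 | Jane | Smith | 35 | Los Angeles | Teacher | Sushi |
| 2 | 3 | David | Lee | 42 | Chicago | Doctor | Steak |
| 3 | 4 | Jeff | Tarkington | 23 | Las vegas | Teacher | Pizza |
| 4 | 5 | Jenny | Fromdablok | 23 | Las vegas | Dancer | Egg rols |
| 5 | 6 | Jim | Carrey | 64 | Los angeles | Actor | Grilled cheese |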
