### Requirements
- 1 Nvidia T4 GPU
- Huggingface Access Token
- Applying for permission to Llama-3 Model from huggingface. Please use the instructions in the following link.
https://huggingface.co/meta-llama/Meta-Llama-3-8B

In [4]:
%pip install -q -U trl==0.8.5 peft==0.10.0 bitsandbytes==0.43.1 datasets==2.18.0 transformers==4.39.2 wandb==0.16.6

Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.


In [1]:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import torch
import wandb
from datasets import load_dataset
from scipy.special import softmax
from peft import prepare_model_for_kbit_training, LoraConfig, get_peft_model
from sklearn.metrics import accuracy_score, f1_score, log_loss, confusion_matrix
from transformers import set_seed, TrainingArguments, AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

In [6]:
from huggingface_hub import notebook_login
notebook_login()

VBox(children=(HTML(value='<center> <img\nsrc=https://huggingface.co/front/assets/huggingface_logo-noborder.sv…

### Load the model

In [2]:
# Hugging face login
token='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'

#quantization configurations - so you quantize the model while inferencing
bnb_config = BitsAndBytesConfig(
    load_in_4bit = True,
    bnb_4bit_qunat_type = "nf4",
    bnb_4bit_compute_dtype = torch.float16,
)

model_name = 'meta-llama/Meta-Llama-3-8B'
# model_name = 'robinsmits/Mistral-Instruct-7B-v0.2-ChatAlpacaV2-4bit'

tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config, 
    device_map="auto",
    trust_remote_code=True,
    token=token
    )
model.config.pad_token_id = tokenizer.pad_token_id

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]

In [3]:
type(model)

transformers.models.llama.modeling_llama.LlamaForCausalLM

In [4]:
type(tokenizer)

transformers.tokenization_utils_fast.PreTrainedTokenizerFast

In [5]:
def make_inference(prompt):
    inputs = tokenizer(prompt, return_tensors="pt", return_token_type_ids=False).to("cuda:0")
    outputs = model.generate(**inputs, max_new_tokens=150, pad_token_id=tokenizer.eos_token_id)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

In [6]:
def generate_prompt(instruction, user_query):
    return instruction.replace("{USER_QUERY}", user_query)

In [10]:
instruction = """
### Instructions ###
I want you to act as a principal software engineer. You'll be given a task to determine the parameters of an API call based on the "User's query" below, 
please modify the conditions and create the appropriate API filter. Provide your answer only for the part within ‘{}’. 
Do not include any other explanations in your response. Read the below instructions.

— Filter Options (Optional, so not required if not specified in the User's query) —

1. location
Format: STRING(lowercase)
Example: santa monica, ca

*If not specifically mentioned, use 'chicago, il'.
*Do not alter the format from the above example. 
*If you want to search for a specific neighborhood like Chicago's West Loop, please specify it in the Filter option '16. keywords', not here.
Incorrect example: "location": "west loop, chicago, il"
Correct example: "location": "chicago, il", "keywords": "west loop"


2. status_type
For purchase or for rent
Format: STRING
Example:
ForSale
ForRent

3. home_type
Property type comma-separated or empty for all types
Format: STRING
Example(For Rent):
Townhomes
Houses
Apartments_Condos_Co-ops

Example(For Sale):
Multi-family
Apartments
Houses
Manufactured
Condos
LotsLand
Townhomes

*If it cannot be determined from the User's query, specification is not necessary.

4. minPrice
If status_type = ForSale you can filter by min price.
Format: NUMBER
Example: 200000

5. maxPrice
If status_type = ForSale you can filter by max price.
Format: NUMBER
Example: 900000

6. rentMinPrice
If status_type = ForRent you can filter by min rent price.
Format: NUMBER
Example: 2000

7. rentMaxPrice
If status_type = ForRent you can filter by max rent price.
Format: NUMBER
Example: 3000

8. bathsMin
Bathrooms min count
Format: NUMBER
Example: 2
* If a specific number of baths is specified, rather than a range, please enter the same number for both bathsMin and bathsMax.

9. bathsMax
Bathrooms max count
Format: NUMBER
Example: 3

10. bedsMin
Bedrooms min count
Format: NUMBER
Example: 1
* If a specific number of bedrooms is specified, rather than a range, please enter the same number for both bedsMin and bedsMax.

11. bedsMax
Bedrooms max count
Format: NUMBER
Example: 3

12. sqftMin
Square Feet min value
Format: NUMBER
Example: 600

13. sqftMax
Square Feet max value
Format: NUMBER
Example: 1500

14. buildYearMin
Year Built min value
Format: NUMBER
Example: 1980

15. buildYearMax
Year Built max value
Format: NUMBER
Example: 2023

16. keywords
Filter with keywords.
Format: STRING
Example1: ‘gym’
Example2: ‘west loop’
Example3: ‘pet pool south loop‘
*Not necessary if already specified in another filter or if there is no clear specification in user’s query. Specify conditions that cannot be specified in other filters by using keywords.

Now, use the following query to determine the appropriate parameters.

### Example query ###
I am looking for a one-bedroom.

### Example Output ###
{"location":"chicago, il","bedsMin":"1","bedsMax":"1"}

### Example query ###
I am looking for a rental in South Loop, Chicago, with at least one bedroom and a rent of $3000 or less per month.

### Example Output ###
{
"location": "chicago, il",
"keywords": "south loop",
"status_type": "For Rent",
"rentMaxPrice": 3000,
"bedsMin": 1
}

### Example query ###
I am looking for a rental in West Loop, Chicago, with one bedroom and a rent between $2000 and $2500.

### Example Output ###
{
"location": "chicago, il",
"keywords": "south loop",
"status_type": "For Rent",
"rentMinPrice": 2500,
"rentMaxPrice": 3000,
"bedsMin": 1,
"bedsMax": 1
}

### Example query ###
I want to buy a house in West Loop with at least two bedrooms.

### Example Output ###
{"location":"chicago, il","keywords":"west loop","bedsMin":"2"}

### Example query ###
I am looking for a rental around Wicker Park in Chicago with a gym and a pool. It should be built within the last 10 years.

### Example Output ###
{
"location": "chicago, il",
"home_type": "Apartments_Condos_Co-ops",
"status_type": "For Rent",
"buildYearMin": 2013,
"keywords": "Wicker Park, gym, pool"
}

### User’s query ###
{USER_QUERY}

### Output ###
"""

### Test 1

In [11]:
%%time
user_query = "I'm looking for a condo for rent in South Loop Chicago with a minimum of 2 bathrooms and 2 bedrooms."
prompt = generate_prompt(instruction, user_query)
response = make_inference(prompt)

CPU times: user 10.6 s, sys: 108 ms, total: 10.7 s
Wall time: 10.7 s


In [12]:
print(response.split("### Output ###")[1].split("### User’s query ###")[0])


{
"location": "chicago, il",
"home_type": "Apartments_Condos_Co-ops",
"status_type": "For Rent",
"bathsMin": 2,
"bedsMin": 2
}




##### GPT-4 Output 
{
  "location": "chicago, il",
  "status_type": "ForRent",
  "home_type": "Condos",
  "bathsMin": 2,
  "bathsMax": 2,
  "bedsMin": 2,
  "bedsMax": 2
}


### Test 2

In [13]:
%%time
user_query = "I want to buy a house in Chicago with at least 3 bedrooms, built after 1990."
prompt = generate_prompt(instruction, user_query)
response = make_inference(prompt)

CPU times: user 10.3 s, sys: 145 ms, total: 10.4 s
Wall time: 10.4 s


In [14]:
print(response.split("### Output ###")[1].split("### User’s query ###")[0])


{
"location": "chicago, il",
"home_type": "Houses",
"status_type": "For Sale",
"bedsMin": 3,
"bedsMax": 3,
"buildYearMin": 1990
}




##### GPT-4 Output 
{
  "location": "chicago, il",
  "status_type": "ForSale",
  "home_type": "Houses",
  "bedsMin": 3,
  "buildYearMin": 1990
}


### Test 3

In [15]:
%%time
user_query = "Show me listings for multi-family homes for sale in West Loop Chicago, with a minimum square footage of 1200."
prompt = generate_prompt(instruction, user_query)
response = make_inference(prompt)

CPU times: user 10.4 s, sys: 189 ms, total: 10.6 s
Wall time: 10.6 s


In [16]:
print(response.split("### Output ###")[1].split("### User’s query ###")[0])


{
"location": "chicago, il",
"home_type": "Multi-family",
"status_type": "For Sale",
"sqftMin": 1200,
"keywords": "west loop"
}




##### GPT-4 Output 
{
  "location": "chicago, il",
  "status_type": "ForSale",
  "home_type": "Multi-family",
  "sqftMin": 1200,
  "keywords": "west loop"
}


### Test 4

In [17]:
%%time
user_query = "Find me a townhome in Chicago's River North area for rent with a gym in the building."
prompt = generate_prompt(instruction, user_query)
response = make_inference(prompt)

CPU times: user 10.5 s, sys: 102 ms, total: 10.7 s
Wall time: 10.6 s


In [18]:
print(response.split("### Output ###")[1].split("### User’s query ###")[0])


{
"location": "chicago, il",
"home_type": "Townhomes",
"status_type": "For Rent",
"keywords": "River North, gym"
}




##### GPT-4 Output 
{
  "location": "chicago, il",
  "status_type": "ForRent",
  "home_type": "Townhomes",
  "keywords": "River North, gym"
}


### Test 5

In [19]:
%%time
user_query = "Show listings for apartments in Chicago with at least 3 bedrooms and 2 bathrooms, for rent under $3000."
prompt = generate_prompt(instruction, user_query)
response = make_inference(prompt)

CPU times: user 10.3 s, sys: 153 ms, total: 10.5 s
Wall time: 10.5 s


In [20]:
print(response.split("### Output ###")[1].split("### User’s query ###")[0])


{
"location": "chicago, il",
"home_type": "Apartments_Condos_Co-ops",
"status_type": "For Rent",
"rentMaxPrice": 3000,
"bedsMin": 3,
"bedsMax": 3,
"bathsMin": 2,
"bathsMax": 2
}




##### GPT-4 Output 
{
  "location": "chicago, il",
  "status_type": "ForRent",
  "home_type": "Apartments",
  "bedsMin": 3,
  "bedsMax": 3,
  "bathsMin": 2,
  "bathsMax": 2,
  "rentMaxPrice": 3000
}

