In [1]:
%pip install google-generativeai langchain langchain_google_genai python-dotenv

Note: you may need to restart the kernel to use updated packages.





In [2]:
from dotenv import load_dotenv

# Load API key from .env file
load_dotenv()

True

In [3]:
import os
import json
import re
import google.generativeai as genai

# Step 1: Setup Gemini API
genai.configure(api_key=os.environ["GEMINI_API_KEY"])



In [19]:
problem_statement = """
Metro station wants to establish a TicketDistributor machine that issues tickets for
passengers travelling on metro rails. Travellers have options of selecting a ticket for a single
trip, round trips or multiple trips. They can also issue a metro pass for regular passengers or
a time card for a day, a week or a month according to their requirements. The discounts on
tickets will be provided to frequent travelling passengers. The machine is also supposed to
read the metro pass and time cards issued by the metro counters or machine. The ticket rates
differ based on whether the traveller is a child or an adult. The machine is also required to
recognize original as well as fake currency notes. The typical transaction consists of a user
using the display interface to select the type and quantity of tickets and then choosing a
payment method of either cash, credit/debit card or smartcard. The tickets are printed and
dispensed to the user. Also, the messaging facilities after every transaction are required on
the registered number. The system can also be operated comfortably by a touch-screen. A
large number of heavy components are to be used. We do not want our system to slow down,
and also the usability of the machine.
The TicketDistributor must be able to handle several exceptions, such as aborting the
transaction for incomplete transactions, the insufficient amount given by the travellers to the
machine, money return in case of an aborted transaction, change return after a successful
transaction, showing insufficient balance in the card, updated information printed on the
tickets e.g. departure time, date, time, price, valid from, valid till, validity duration, ticket
issued from and destination station. In case of exceptions, an error message is to be displayed.
We do not want user feedback after every development stage but after every two stages to
save time. The machine is required to work in a heavy load environment such that in the
morning and evening time on weekdays, and weekends performance and efficiency would
not be affected."""

### Step 1 - Identify Initial and Final States

In [21]:
def identify_initial_final_states(problem_statement):

    prompt = f"""
    You are an expert software analyst. Your task is to identify the initial state and the final state from the given problem statement in order to generate an activity diagram.

    Output Format (Strict JSON):
    Return only valid JSON in the following example format:
    {{
        "initial": ["Start transaction"],
        "final": ["End transaction"]
    }}

    Problem Statement:
    {problem_statement}

    Now extract the initial and final states.
    """

    model = genai.GenerativeModel("gemini-1.5-flash") 
    response = model.generate_content(prompt, generation_config={"temperature": 0.0, "top_p": 1, "top_k": 1})
    
    # Debugging: Print raw response
    response_text = response.text.strip()
    print("Raw Response from Gemini:\n", response_text)

    # Remove triple backticks and 'json' keyword
    cleaned_text = re.sub(r"```json|```", "", response_text).strip()

    # Additional Debugging: Print cleaned text before parsing
    print("Cleaned JSON:\n", repr(cleaned_text))  # Use repr() to detect hidden characters

    # Check if cleaned_text is empty
    if not cleaned_text:
        print("Error: Cleaned JSON is empty. Cannot parse.")
        return []

    # Parse JSON safely
    try:
        output = json.loads(cleaned_text)  # Convert string to a Python dict
        # Default to empty lists so the concatenation still works when a key is missing
        return output.get("initial", []) + output.get("final", [])
    except json.JSONDecodeError as e:
        print("JSON parsing error:", str(e))
        return []

# Store the result under its own name so it does not shadow the function
initial_final_states = identify_initial_final_states(problem_statement)
print(initial_final_states)

Raw Response from Gemini:
 ```json
{
  "initial": ["Start transaction"],
  "final": ["End transaction"]
}
```
Cleaned JSON:
 '{\n  "initial": ["Start transaction"],\n  "final": ["End transaction"]\n}'
['Start transaction', 'End transaction']
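
The cleanup step above deletes every code-fence marker with `re.sub`, which is enough for the reply shown but would break if the model wrapped the fence in extra prose. A minimal sketch of a stricter alternative follows. The helper name `extract_json_object` is hypothetical and not part of the notebook; it prefers the body of the first fenced block and otherwise slices from the first `{` to the last `}`.

```python
import json
import re

FENCE = "`" * 3  # Markdown code-fence marker, built this way to keep the source readable

# Hypothetical helper, not part of the original notebook.
FENCE_RE = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL)

def extract_json_object(text):
    """Return the JSON object embedded in a model reply, or None if nothing parses."""
    match = FENCE_RE.search(text)
    candidate = match.group(1) if match else text
    start, end = candidate.find("{"), candidate.rfind("}")
    if start == -1 or end < start:
        return None
    try:
        return json.loads(candidate[start:end + 1])
    except json.JSONDecodeError:
        return None

reply = (
    "Here you go:\n" + FENCE + "json\n"
    '{"initial": ["Start transaction"], "final": ["End transaction"]}\n' + FENCE
)
print(extract_json_object(reply))
```

Returning `None` on failure lets the caller decide which empty value (list or dict) fits the step.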


### Identify Actors & Actions

In [28]:
def identified_actions(problem_statement):

    prompt = f"""
    You are an expert software analyst. Your task is to identify the potential actors in the given problem statement and the actions each of them performs.
    Follow this detailed step-by-step process carefully to ensure accurate results:

    ### **Step-by-Step Approach:**
    1. **Identify Actions for Each Actor:**
       - Carefully analyze the problem statement for any **verbs** or **actions** linked to each actor.
       - If a verb is associated with an object or role, treat it as a candidate action for that actor.
       - Ensure that actions reflect the **core behavior** or responsibility of the actor.
       - Ignore vague or irrelevant actions.

    2. **Ensure Coherence Between Actors and Actions:**
       - Make sure that the Actions are relevant to the actor.

    3. **Ignore Unrelated or Redundant Terms:**
       - Ignore adjectives, adverbs, and irrelevant terms unless they provide meaningful context.
       - Focus on meaningful, domain-relevant terms only.

    Output Format (Strict JSON):
    Return only valid JSON in the following example format:
    {{
        "actions": {{
            "Actor1": ["action1", "action2"],
            "Actor2": ["action1", "action2"],
            ...
        }}
    }}

    **Problem Statement:**
    {problem_statement}

    Now extract the actions for each actor.
    """

    model = genai.GenerativeModel("gemini-1.5-flash") 
    response = model.generate_content(prompt, generation_config={"temperature": 0.0, "top_p": 1, "top_k": 1})
    
    # Debugging: Print raw response
    response_text = response.text.strip()
    print("Raw Response from Gemini:\n", response_text)

    # Remove triple backticks and 'json' keyword
    cleaned_text = re.sub(r"```json|```", "", response_text).strip()

    # Additional Debugging: Print cleaned text before parsing
    print("Cleaned JSON:\n", repr(cleaned_text))  # Use repr() to detect hidden characters

    # Check if cleaned_text is empty
    if not cleaned_text:
        print("Error: Cleaned JSON is empty. Cannot parse.")
        return {}

    # Parse JSON safely
    try:
        output = json.loads(cleaned_text)  # Convert string to JSON
        return output.get("actions", {})  
    except json.JSONDecodeError as e:
        print("JSON parsing error:", str(e))
        return {}

identified_actions = identified_actions(problem_statement)
print("\nIdentified Actions", identified_actions)

Raw Response from Gemini:
 ```json
{
  "actions": {
    "Traveller": [
      "select ticket type",
      "select ticket quantity",
      "choose payment method",
      "insert cash",
      "insert credit/debit card",
      "use smartcard"
    ],
    "TicketDistributor": [
      "issue tickets",
      "issue metro pass",
      "issue time card",
      "read metro pass",
      "read time card",
      "calculate fare",
      "apply discounts",
      "dispense tickets",
      "return change",
      "return money",
      "display error messages",
      "print tickets",
      "send SMS",
      "recognize currency",
      "abort transaction",
      "handle insufficient funds"
    ],
    "System": [
      "handle heavy load",
      "maintain performance",
      "maintain efficiency"
    ]
  }
}
```
Cleaned JSON:
 '{\n  "actions": {\n    "Traveller": [\n      "select ticket type",\n      "select ticket quantity",\n      "choose payment method",\n      "insert cash",\n      "insert credit/debit 
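
The control-flow response later in this notebook labels its nodes as `Actor: action` (for example `Traveller: select ticket type`). A small sketch of how the actions dictionary could be flattened into that form is shown here. The function name `qualified_actions` is an assumption made for illustration; it also skips entries that do not match the expected `{actor: [action, ...]}` shape.

```python
def qualified_actions(actions):
    """Flatten {actor: [action, ...]} into 'Actor: action' labels, skipping malformed entries."""
    labels = []
    if not isinstance(actions, dict):
        return labels
    for actor, actor_actions in actions.items():
        if not isinstance(actor_actions, list):
            continue  # the model returned something other than a list for this actor
        for action in actor_actions:
            if isinstance(action, str) and action.strip():
                labels.append(f"{actor}: {action.strip()}")
    return labels

# "System" is deliberately malformed (a string instead of a list) to show the skip.
sample = {
    "Traveller": ["select ticket type", "insert cash"],
    "System": "handle heavy load",
}
print(qualified_actions(sample))
```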

### Identify intermediate activities

In [25]:
def identified_activities(problem_statement, actions):

    prompt = f"""
    You are an expert software analyst. Your task is to identify the activities described in the given problem statement, using the actions already identified for each actor.

    Output Format (Strict JSON):
    Return only valid JSON in the following example format:
    {{
        "activities": ["activity1", "activity2"]
    }}

    **Actions Identified:**
    {actions}

    **Problem Statement:**
    {problem_statement}

    Now extract the activities.
    """

    model = genai.GenerativeModel("gemini-1.5-flash") 
    response = model.generate_content(prompt, generation_config={"temperature": 0.0, "top_p": 1, "top_k": 1})
    
    # Debugging: Print raw response
    response_text = response.text.strip()
    print("Raw Response from Gemini:\n", response_text)

    # Remove triple backticks and 'json' keyword
    cleaned_text = re.sub(r"```json|```", "", response_text).strip()

    # Additional Debugging: Print cleaned text before parsing
    print("Cleaned JSON:\n", repr(cleaned_text))  # Use repr() to detect hidden characters

    # Check if cleaned_text is empty
    if not cleaned_text:
        print("Error: Cleaned JSON is empty. Cannot parse.")
        return []

    # Parse JSON safely
    try:
        output = json.loads(cleaned_text)  # Convert string to a Python dict
        return output.get("activities", [])  # Activities are a list of names
    except json.JSONDecodeError as e:
        print("JSON parsing error:", str(e))
        return []

# Derive activities from the problem statement and the identified actions
identified_activities = identified_activities(problem_statement, identified_actions)
print("\nIdentified Activities", identified_activities)

Raw Response from Gemini:
 ```json
{
  "activities": ["Ticket Purchase", "Metro Pass Issuance", "Time Card Issuance", "Ticket Reading (Metro Pass/Time Card)", "Fare Calculation", "Discount Application", "Payment Processing (Cash, Credit/Debit Card, Smartcard)", "Ticket Dispensing", "Change Return", "Money Return (Aborted Transaction)", "Error Handling", "SMS Notification", "Currency Recognition", "Transaction Management (Abort/Complete)", "System Performance Maintenance under Heavy Load"]
}
```
Cleaned JSON:
 '{\n  "activities": ["Ticket Purchase", "Metro Pass Issuance", "Time Card Issuance", "Ticket Reading (Metro Pass/Time Card)", "Fare Calculation", "Discount Application", "Payment Processing (Cash, Credit/Debit Card, Smartcard)", "Ticket Dispensing", "Change Return", "Money Return (Aborted Transaction)", "Error Handling", "SMS Notification", "Currency Recognition", "Transaction Management (Abort/Complete)", "System Performance Maintenance under Heavy Load"]\n}'

Identified Activiti
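
The cells so far repeat the same request, fence-stripping, and parsing steps with only the prompt and the JSON key changing. One way to factor that out is sketched below. The name `ask_for_json` and its `generate` parameter are assumptions made for illustration: `generate` is any callable that takes a prompt and returns the reply text, which keeps the sketch runnable without an API key.

```python
import json
import re

FENCE = "`" * 3  # Markdown code-fence marker

def ask_for_json(generate, prompt, key, default):
    """Call generate(prompt), strip code fences, and return parsed[key] or default."""
    raw = generate(prompt).strip()
    cleaned = re.sub(FENCE + r"(?:json)?", "", raw).strip()
    if not cleaned:
        return default
    try:
        parsed = json.loads(cleaned)
    except json.JSONDecodeError as exc:
        print("JSON parsing error:", exc)
        return default
    if not isinstance(parsed, dict):
        return default
    return parsed.get(key, default)

# Stand-in for the Gemini call so the sketch runs offline.
def fake_generate(prompt):
    return FENCE + 'json\n{"activities": ["Ticket Purchase", "Change Return"]}\n' + FENCE

print(ask_for_json(fake_generate, "ignored", "activities", []))
```

In the notebook, `generate` could be a small wrapper around the existing `model.generate_content(prompt, generation_config=...)` call that returns `response.text`.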

### Identify Control Flow

In [29]:
def identify_controlflow(problem_statement, actions):

    prompt = f"""
    You are an expert software analyst. Your task is to identify **all types of control flows** between the given actions and actors based on the problem statement.
    Control flows are the logical transitions between actions, i.e. the order in which they execute in an activity diagram.

    Each flow should specify:
      Source : Starting activity
      Target : Next activity
      Condition : Decision criteria (if applicable)
      Type : Type of flow (control, decision, fork, join)
    ✅ Ensure the output follows **strict JSON format** as shown below:  
      {{
          "flows": [
                {{
                  "source": "<Source Activity>",
                  "target": "<Target Activity>",
                  "type": "<Type of Flow>",
                  "condition": "<Condition if any>"
                }}
              ]
      }}

    **Actions Identified:**
    {actions}

    **Problem Statement:**
    {problem_statement}

    Now extract **all types of control flows** between the actions.
    """

    model = genai.GenerativeModel("gemini-1.5-flash") 
    response = model.generate_content(prompt, generation_config={"temperature": 0, "top_p": 1, "top_k": 1})

    # Debugging: Print raw response
    response_text = response.text.strip()
    print("Raw Response from Gemini:\n", response_text)

    # Remove triple backticks and 'json' keyword
    cleaned_text = re.sub(r"```json|```", "", response_text).strip()

    # Additional Debugging: Print cleaned text before parsing
    print("Cleaned JSON:\n", repr(cleaned_text))  # Use repr() to detect hidden characters

    # Check if cleaned_text is empty
    if not cleaned_text:
        print("Error: Cleaned JSON is empty. Cannot parse.")
        return []

    # Parse JSON safely
    try:
        output = json.loads(cleaned_text)  # Convert string to a Python dict
        return output.get("flows", [])  # Flows are a list of records
    except json.JSONDecodeError as e:
        print("JSON parsing error:", str(e))
        return []

# Store the result under its own name so it does not shadow the function
control_flows = identify_controlflow(problem_statement, identified_actions)

print("ControlFlows Identified:", control_flows)

Raw Response from Gemini:
 ```json
{
  "flows": [
    {
      "source": "Traveller: select ticket type",
      "target": "TicketDistributor: calculate fare",
      "type": "control",
      "condition": null
    },
    {
      "source": "Traveller: select ticket quantity",
      "target": "TicketDistributor: calculate fare",
      "type": "control",
      "condition": null
    },
    {
      "source": "TicketDistributor: calculate fare",
      "target": "TicketDistributor: apply discounts",
      "type": "control",
      "condition": "Frequent traveller"
    },
    {
      "source": "TicketDistributor: calculate fare",
      "target": "Traveller: choose payment method",
      "type": "control",
      "condition": null
    },
    {
      "source": "Traveller: choose payment method",
      "target": "Traveller: insert cash",
      "type": "decision",
      "condition": "Payment method = Cash"
    },
    {
      "source": "Traveller: choose payment method",
      "target": "Traveller: inse

### Generate PlantUML Script
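
The flow records extracted in the previous step (`source`, `target`, `type`, `condition`) can be turned into an activity-diagram script directly. The sketch below is one possible mapping, assuming PlantUML's legacy activity syntax, in which `(*)` marks the start and end nodes and `-->[label]` puts a guard on an arrow. The function name `flows_to_activity_puml` and its `first` and `last` parameters are hypothetical, and the rendered result has not been checked here.

```python
def flows_to_activity_puml(flows, first=None, last=None):
    """Render control-flow records as PlantUML legacy activity-diagram text."""
    def quote(name):
        # Double quotes delimit activity names, so replace any that occur inside a name.
        return '"' + str(name).replace('"', "'") + '"'

    lines = ["@startuml"]
    if first:
        lines.append(f"(*) --> {quote(first)}")  # start node to first activity
    for flow in flows:
        source, target = flow.get("source"), flow.get("target")
        if not source or not target:
            continue  # skip incomplete records
        condition = flow.get("condition")
        arrow = f"-->[{condition}]" if condition else "-->"
        lines.append(f"{quote(source)} {arrow} {quote(target)}")
    if last:
        lines.append(f"{quote(last)} --> (*)")  # last activity to end node
    lines.append("@enduml")
    return "\n".join(lines)

# Two records copied from the control-flow response above.
sample_flows = [
    {"source": "Traveller: select ticket type",
     "target": "TicketDistributor: calculate fare",
     "type": "control", "condition": None},
    {"source": "Traveller: choose payment method",
     "target": "Traveller: insert cash",
     "type": "decision", "condition": "Payment method = Cash"},
]
print(flows_to_activity_puml(sample_flows,
                             first="Traveller: select ticket type",
                             last="Traveller: insert cash"))
```

The `type` field is ignored in this sketch; fork and join flows would need their own handling.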

In [23]:
import os
import json

def generate_plantuml(classes, attributes, operations, relationships, output_file="class_diagram2.puml"):
    """
    Generates a PlantUML class diagram from identified classes, attributes, operations, and relationships.
    """
    plantuml_code = "@startuml\n\n"

    # === Define Classes, Attributes, and Operations ===
    for cls in classes:
        plantuml_code += f"class {cls} {{\n"
        
        # Add attributes if available
        if cls in attributes:
            for attr in attributes[cls]:
                plantuml_code += f"  + {attr}\n"  # Public attribute

        # Add operations if available
        if cls in operations:
            for op in operations[cls]:
                plantuml_code += f"  + {op}()\n"  # Public method

        plantuml_code += "}\n\n"

    # === Convert Multiplicity to UML format ===
    def convert_multiplicity(multiplicity):
        if multiplicity == "many":
            return '"*"'
        elif multiplicity == "1":
            return '"1"'
        elif multiplicity == "0..1":
            return '"0..1"'
        else:
            return '""'  # Default for no multiplicity

    # === Define Relationships ===
    for rel in relationships:
        source = rel["source"]
        target = rel["target"]
        rel_type = rel["type"]
        description = rel["description"]
        multiplicity = convert_multiplicity(rel.get("multiplicity", ""))

        if rel_type == "Association":
            plantuml_code += f'{source} {multiplicity} --> {multiplicity} {target} : {description}\n'
        elif rel_type == "Generalization":
            # The source is the subclass, so the triangle must point at the target (superclass)
            plantuml_code += f"{source} --|> {target} : {description}\n"
        elif rel_type == "Aggregation":
            plantuml_code += f'{source} {multiplicity} o-- {multiplicity} {target} : {description}\n'
        elif rel_type == "Composition":
            plantuml_code += f'{source} {multiplicity} *-- {multiplicity} {target} : {description}\n'

    plantuml_code += "\n@enduml"

    # Save to file
    with open(output_file, "w") as file:
        file.write(plantuml_code)

    print(f"PlantUML file '{output_file}' generated successfully.")

# 🛠 Debugging Check
print("Classes Extracted:", identified_classes)
print("Attributes Extracted:", identified_attributes)
print("Operations Extracted:", identified_operations)
print("Relationships Extracted:", identified_relationships)

# Generate UML
generate_plantuml(identified_classes, identified_attributes, identified_operations, identified_relationships)

# To render, use: `plantuml class_diagram2.puml` in the terminal


Classes Extracted: ['TicketDistributor', 'Passenger', 'Ticket', 'MetroPass', 'TimeCard', 'Payment', 'Transaction', 'DisplayInterface', 'Currency', 'ErrorMessage', 'TouchScreen']
Attributes Extracted: {'TicketDistributor': ['type', 'status'], 'Passenger': ['type', 'ID', 'frequent_traveller_status'], 'Ticket': ['type', 'quantity', 'price', 'departure_time', 'date', 'time', 'valid_from', 'valid_till', 'validity_duration', 'ticket_issued_from', 'destination_station'], 'MetroPass': ['type', 'ID', 'validity_duration'], 'TimeCard': ['type', 'validity_duration'], 'Payment': ['method', 'amount'], 'Transaction': ['status', 'amount', 'type'], 'DisplayInterface': ['type'], 'Currency': ['type', 'status'], 'ErrorMessage': ['message', 'type'], 'TouchScreen': ['type', 'size']}
Operations Extracted: {'TicketDistributor': ['issueTicket', 'issueMetroPass', 'issueTimeCard', 'readMetroPass', 'readTimeCard', 'processPayment', 'dispenseTicket', 'returnChange', 'returnMoney', 'displayErrorMessage', 'abortTran

In [31]:
# type(identified_classes)
# type(identified_attributes)
# type(identified_operations)
type(identified_relationships)

list

#### Debugging

In [1]:
identified_classes = ['MetroStation', 'TicketDistributor', 'Traveller', 'Passenger', 'Child', 'Adult', 'MetroPass', 'TimeCard', 'Transaction', 'User', 'Machine', 'CurrencyNote']

identified_attributes = {'MetroStation': ['ID', 'name'], 'TicketDistributor': ['ID', 'type', 'status', 'capacity'], 'Traveller': ['ID', 'age', 'type'], 'Passenger': ['ID', 'type'], 'Child': ['ID', 'type'], 'Adult': ['ID', 'type'], 'MetroPass': ['ID', 'type', 'duration'], 'TimeCard': ['ID', 'type', 'duration'], 'Transaction': ['ID', 'status', 'amount'], 'User': ['ID', 'name', 'registered_number'], 'Machine': ['ID', 'type', 'load_environment'], 'DisplayInterface': ['ID', 'type'], 'PaymentMethod': ['ID', 'type'], 'Smartcard': ['ID', 'type'], 'CurrencyNote': ['ID', 'type'], 'TouchScreen': ['ID', 'type']}

identified_relationships = [
    {"source": "MetroStation", "target": "TicketDistributor", "type": "Aggregation", "description": "Metro station contains TicketDistributor"},
    {"source": "Traveller", "target": "Passenger", "type": "Generalization", "description": "Traveller is a subclass of Passenger"},
    {"source": "Child", "target": "Adult", "type": "Generalization", "description": "Child is a subclass of Adult"},
    {"source": "TicketDistributor", "target": "MetroPass", "type": "Aggregation", "description": "TicketDistributor contains MetroPass"},
    {"source": "TicketDistributor", "target": "TimeCard", "type": "Aggregation", "description": "TicketDistributor contains TimeCard"},
    {"source": "User", "target": "Machine", "type": "Aggregation", "description": "User contains Machine"},
    {"source": "Machine", "target": "DisplayInterface", "type": "Aggregation", "description": "Machine contains DisplayInterface"},
    {"source": "PaymentMethod", "target": "Smartcard", "type": "Generalization", "description": "PaymentMethod is a subclass of Smartcard"},
    {"source": "PaymentMethod", "target": "CurrencyNote", "type": "Generalization", "description": "PaymentMethod is a subclass of CurrencyNote"}
  ]
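
Because `generate_plantuml` only emits a `class` block for names in `identified_classes`, any relationship endpoint missing from that list gets no attribute block. A quick consistency check is sketched below; the function name `undeclared_endpoints` is an assumption made for illustration.

```python
def undeclared_endpoints(classes, relationships):
    """Return class names used as a relationship source or target but missing from classes."""
    known = set(classes)
    missing = []
    for rel in relationships:
        for name in (rel.get("source"), rel.get("target")):
            if name and name not in known and name not in missing:
                missing.append(name)
    return missing

# Small excerpt of the hand-entered data above.
sample_classes = ["User", "Machine", "CurrencyNote"]
sample_relationships = [
    {"source": "User", "target": "Machine", "type": "Aggregation"},
    {"source": "Machine", "target": "DisplayInterface", "type": "Aggregation"},
    {"source": "PaymentMethod", "target": "CurrencyNote", "type": "Generalization"},
]
print(undeclared_endpoints(sample_classes, sample_relationships))
```

Run against the full lists in this cell, the check would report `DisplayInterface`, `PaymentMethod`, and `Smartcard`, which appear in the relationships and attributes but not in `identified_classes`.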

In [None]:
import os
import json

def generate_plantuml(classes, attributes, relationships, output_file="class_diagram2.puml"):
    """
    Generates a PlantUML class diagram from identified classes, attributes, and relationships.
    """
    if not isinstance(classes, list):
        print("Error: 'classes' should be a list. Received:", type(classes))
        return
    
    plantuml_code = "@startuml\n\n"

    # Define Classes and Attributes
    for cls in classes:
        plantuml_code += f"class {cls} {{\n"
        if isinstance(attributes, dict) and cls in attributes:
            for attr in attributes[cls]:
                plantuml_code += f"  {attr}\n"
        plantuml_code += "}\n\n"

    # Define relationships
    for rel in relationships:
        source = rel["source"]
        target = rel["target"]
        rel_type = rel["type"]
        description = rel["description"]

        if rel_type == "Association":
            plantuml_code += f"{source} --> {target} : {description}\n"
        elif rel_type == "Generalization":
            plantuml_code += f"{source} --|> {target} : {description}\n"
        elif rel_type == "Aggregation":
            plantuml_code += f"{source} o-- {target} : {description}\n"
        elif rel_type == "Composition":
            plantuml_code += f"{source} *-- {target} : {description}\n"

    plantuml_code += "\n@enduml"

    # Save to file
    with open(output_file, "w") as file:
        file.write(plantuml_code)

    print(f"✅ PlantUML file '{output_file}' generated successfully.")

# 🛠 Debugging Check
print("Classes Extracted:", identified_classes)
print("Attributes Extracted:", identified_attributes)
print("Relationships Extracted:", identified_relationships)

# Generate UML
generate_plantuml(identified_classes, identified_attributes, identified_relationships)

# To render, use: `plantuml class_diagram2.puml` in the terminal


Classes Extracted: ['MetroStation', 'TicketDistributor', 'Traveller', 'Passenger', 'Child', 'Adult', 'MetroPass', 'TimeCard', 'Transaction', 'User', 'Machine', 'CurrencyNote']
Attributes Extracted: {'MetroStation': ['ID', 'name'], 'TicketDistributor': ['ID', 'type', 'status', 'capacity'], 'Traveller': ['ID', 'age', 'type'], 'Passenger': ['ID', 'type'], 'Child': ['ID', 'type'], 'Adult': ['ID', 'type'], 'MetroPass': ['ID', 'type', 'duration'], 'TimeCard': ['ID', 'type', 'duration'], 'Transaction': ['ID', 'status', 'amount'], 'User': ['ID', 'name', 'registered_number'], 'Machine': ['ID', 'type', 'load_environment'], 'DisplayInterface': ['ID', 'type'], 'PaymentMethod': ['ID', 'type'], 'Smartcard': ['ID', 'type'], 'CurrencyNote': ['ID', 'type'], 'TouchScreen': ['ID', 'type']}
Relationships Extracted: [{'source': 'MetroStation', 'target': 'TicketDistributor', 'type': 'Aggregation', 'description': 'Metro station contains TicketDistributor'}, {'source': 'Traveller', 'target': 'Passenger', 'typ

In [None]:
import json
import re

def identify_attributes(problem_statement, classes):
    """
    Identifies attributes for each class extracted from the problem statement using Gemini API.
    Ensures the output is a parsed dictionary.
    """
    prompt = f"""
    You are an expert software analyst. Your task is to extract **attributes** for each identified class from the given problem statement.

    **Rules:**
    - Attributes describe **properties** of a class (e.g., "User" has "name", "email").
    - Ignore verbs and relationships.
    - Output should be strictly in **JSON format**, like:
      {{
          "attributes": {{
              "Class1": ["attr1", "attr2"],
              "Class2": ["attr1", "attr2"]
          }}
      }}

    **Classes Identified:**
    {classes}

    **Problem Statement:**
    {problem_statement}

    Extract attributes for each class.
    """

    model = genai.GenerativeModel("gemini-1.5-flash") 
    response = model.generate_content(prompt, generation_config={"temperature": 0})

    # Debugging: Print raw response
    response_text = response.text.strip()
    print("Raw Response from Gemini:\n", response_text)

    # Remove triple backticks and 'json' keyword
    cleaned_text = re.sub(r"```json|```", "", response_text).strip()

    # Additional Debugging: Print cleaned text before parsing
    print("Cleaned JSON:\n", repr(cleaned_text))  # Use repr() to detect hidden characters

    # Check if cleaned_text is empty
    if not cleaned_text:
        print("Error: Cleaned JSON is empty. Cannot parse.")
        return {}

    # Parse JSON safely
    try:
        output = json.loads(cleaned_text)  # Convert string to JSON
        return output.get("attributes", {})  # Extract attributes dictionary
    except json.JSONDecodeError as e:
        print("JSON parsing error:", str(e))
        return {}

identified_attributes = identify_attributes(problem_statement, identified_classes)

print("Extracted Attributes:\n", identified_attributes)


Raw Response from Gemini:
 ```json
{
  "attributes": {
    "TicketDistributor": ["heavy components"],
    "Passenger": ["isChild", "isFrequentTraveler"],
    "Ticket": ["type", "quantity", "price", "departureTime", "date", "time", "validFrom", "validTill", "validityDuration", "ticketIssuedFrom", "destinationStation", "updatedInformation"],
    "MetroPass": [],
    "TimeCard": ["duration"], 
    "Transaction": ["type", "amount", "paymentMethod"],
    "DisplayInterface": ["type"],
    "PaymentMethod": ["type"],
    "CurrencyNote": ["isOriginal", "value"],
    "Error": [],
    "ErrorMessage": ["message"],
    "TouchScreen": []
  }
}
```
Cleaned JSON:
 '{\n  "attributes": {\n    "TicketDistributor": ["heavy components"],\n    "Passenger": ["isChild", "isFrequentTraveler"],\n    "Ticket": ["type", "quantity", "price", "departureTime", "date", "time", "validFrom", "validTill", "validityDuration", "ticketIssuedFrom", "destinationStation", "updatedInformation"],\n    "MetroPass": [],\n    "Tim

dict

In [34]:
json_input = '''{
  "attributes": {
    "TicketDistributor": ["heavy components"],
    "Passenger": ["isChild", "isFrequentTraveler"],
    "Ticket": ["type", "quantity", "price", "departureTime", "date", "time", "validFrom", "validTill", "validityDuration", "ticketIssuedFrom", "destinationStation", "updatedInformation"],
    "MetroPass": [],
    "TimeCard": ["duration"], 
    "Transaction": ["type", "amount", "paymentMethod"],
    "DisplayInterface": ["type"],
    "PaymentMethod": ["type"],
    "CurrencyNote": ["isOriginal", "value"],
    "Error": [],
    "ErrorMessage": ["message"],
    "TouchScreen": []
  }
}'''

try:
    output = json.loads(json_input)  # Directly parse JSON
    print(output.get("attributes", {}))  # Extract attributes dictionary
except json.JSONDecodeError as e:
    print("JSON parsing error:", str(e))
    print("No output...")

{'TicketDistributor': ['heavy components'], 'Passenger': ['isChild', 'isFrequentTraveler'], 'Ticket': ['type', 'quantity', 'price', 'departureTime', 'date', 'time', 'validFrom', 'validTill', 'validityDuration', 'ticketIssuedFrom', 'destinationStation', 'updatedInformation'], 'MetroPass': [], 'TimeCard': ['duration'], 'Transaction': ['type', 'amount', 'paymentMethod'], 'DisplayInterface': ['type'], 'PaymentMethod': ['type'], 'CurrencyNote': ['isOriginal', 'value'], 'Error': [], 'ErrorMessage': ['message'], 'TouchScreen': []}


In [59]:
import re
import json
import google.generativeai as genai

def identify_relationships(problem_statement, classes, attributes):
    """
    Identifies relationships between extracted classes using Gemini API with CoT prompting.
    Extracts Association, Generalization (Inheritance), Aggregation, and Composition.
    """
    prompt = f"""
    You are an expert software analyst. Your task is to identify **all types of relationships** between the given classes based on the problem statement.

    **Types of Relationships:**
    1️⃣ **Association** - One class interacts with another. Example: "User borrows Book".
    2️⃣ **Generalization (Inheritance)** - One class is a specialized form of another. Example: "Admin is a subclass of User".
    3️⃣ **Aggregation** - One class is made up of another, but they have independent lifecycles. Example: "Library has Books, but Books exist independently".
    4️⃣ **Composition** - A stronger form of Aggregation, where the part **cannot exist** without the whole. Example: "House has Rooms, Rooms cannot exist without a House".

    **Step-by-Step Process:**
    - Identify the **verbs** and **context clues** in the problem statement to find relationships.
    - Classify the relationship into one of the four types above.
    - Provide the relationships in **strict JSON format**, like this:
      ```json
      {{
          "relationships": [
              {{"source": "ClassA", "target": "ClassB", "type": "Generalization", "description": "ClassA is a subclass of ClassB"}},
              {{"source": "ClassC", "target": "ClassD", "type": "Aggregation", "description": "ClassC contains ClassD but ClassD can exist independently"}},
              {{"source": "ClassE", "target": "ClassF", "type": "Composition", "description": "ClassE owns ClassF and ClassF cannot exist without ClassE"}}
          ]
      }}
      ```

    **Classes Identified:**
    {classes}

    **Attributes Identified:**
    {attributes}

    **Problem Statement:**
    {problem_statement}

    Now extract **all types of relationships** between the classes.
    """

    model = genai.GenerativeModel("gemini-1.5-flash") 
    response = model.generate_content(prompt, generation_config={"temperature": 0, "top_p": 1, "top_k": 1})

    # Read the raw response text (not printed in this cell)
    response_text = response.text.strip()

    # Remove triple backticks and 'json' keyword
    cleaned_text = re.sub(r"```json|```", "", response_text).strip()

    # Debugging: Print cleaned JSON before parsing
    print("🛠 Cleaned JSON:\n", repr(cleaned_text))

    # Check if cleaned_text is empty
    if not cleaned_text:
        print("❌ Error: Cleaned JSON is empty. Cannot parse.")
        return []

    # Parse JSON safely
    try:
        output = json.loads(cleaned_text)  # Convert string to a Python dict
        return output.get("relationships", [])
    except json.JSONDecodeError as e:
        print("❌ JSON parsing error:", str(e))
        return []  # Same type as the success path, so callers can always iterate

# Call function and print results
identified_relationships = identify_relationships(problem_statement, identified_classes, identified_attributes)

print("✅ Relationships Identified:", identified_relationships)


🛠 Cleaned JSON:
 '{\n  "relationships": [\n    {\n      "source": "TicketDistributor",\n      "target": "Ticket",\n      "type": "Composition",\n      "description": "TicketDistributor issues Tickets, and Tickets cannot exist independently without being issued."\n    },\n    {\n      "source": "TicketDistributor",\n      "target": "MetroPass",\n      "type": "Association",\n      "description": "TicketDistributor reads MetroPass information."\n    },\n    {\n      "source": "TicketDistributor",\n      "target": "TimeCard",\n      "type": "Association",\n      "description": "TicketDistributor reads TimeCard information."\n    },\n    {\n      "source": "TicketDistributor",\n      "target": "Transaction",\n      "type": "Composition",\n      "description": "TicketDistributor creates and manages Transactions, which cannot exist independently."\n    },\n    {\n      "source": "TicketDistributor",\n      "target": "DisplayInterface",\n      "type": "Composition",\n      "description": "Tic
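
`generate_plantuml` indexes each relationship with `rel["source"]`, `rel["target"]`, `rel["type"]`, and `rel["description"]`, so a record that lacks one of these keys would raise `KeyError`, and a record with an unrecognised type is skipped without notice. A minimal pre-filter is sketched below; the helper name `keep_valid_relationships` is hypothetical.

```python
ALLOWED_TYPES = {"Association", "Generalization", "Aggregation", "Composition"}
REQUIRED_KEYS = ("source", "target", "type", "description")

def keep_valid_relationships(relationships):
    """Keep only records that have every required key and one of the four known types."""
    if not isinstance(relationships, list):
        return []
    valid = []
    for rel in relationships:
        if not isinstance(rel, dict):
            continue
        if any(not rel.get(key) for key in REQUIRED_KEYS):
            continue  # a key is missing or empty
        if rel["type"] not in ALLOWED_TYPES:
            continue  # a type the diagram generator does not handle
        valid.append(rel)
    return valid

sample = [
    {"source": "TicketDistributor", "target": "Ticket", "type": "Composition",
     "description": "TicketDistributor issues Tickets"},
    {"source": "TicketDistributor", "target": "MetroPass", "type": "Uses",
     "description": "unknown relationship type"},
    {"source": "TicketDistributor", "type": "Association", "description": "no target"},
]
print(keep_valid_relationships(sample))
```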