DOG CONDITION ANALYZER
1. Project Description

The Dog Condition Analyzer is a machine learning-based application designed to predict potential health conditions in dogs based on textual clinical notes or observed symptoms. This tool can assist veterinarians, pet owners, or animal care professionals in quickly identifying likely conditions, enabling timely intervention and care.

2. Problem Statement

Veterinary diagnosis often relies on the observation of symptoms and laboratory results. However:

Manual diagnosis can be time-consuming and error-prone.

Pet owners may struggle to interpret symptoms or decide when to consult a vet.

Problem: Automate the process of predicting possible dog health conditions from textual symptom descriptions using machine learning.

3. Objective

Build a text-based classifier that can predict dog health conditions from symptom descriptions.

Provide confidence scores for predictions.

Enable fast and reliable preliminary diagnosis support.

4. Tools & Libraries Used

Tool / Library	Purpose
Python	Implementation language
pandas	Data loading and manipulation
numpy	Numerical operations
re	Text cleaning with regular expressions (Python standard library)
scikit-learn	Train/test splitting, TF-IDF vectorization, model training, and evaluation
RandomForestClassifier	scikit-learn ensemble classifier used for prediction
TfidfVectorizer	scikit-learn transformer that converts text into numerical features
accuracy_score, classification_report	scikit-learn functions for evaluating model performance

5. Solution Approach

Data Collection:
Load a CSV dataset with two columns: the symptom description (text) and the corresponding health condition label (condition).

Data Cleaning & Preprocessing:

Convert text to lowercase.

Remove special characters, numbers, and extra spaces.

Handle missing values: drop rows that have no condition label and treat missing text as an empty string.
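
To make these cleaning steps concrete, here is a minimal sketch that applies the same rules to one made-up note. The sample string is hypothetical and not taken from the dataset, and the function checks for None instead of using pandas so that the snippet runs with the standard library alone.

```python
import re

def clean_text(text):
    """Lowercase, keep only letters and whitespace, collapse repeated spaces."""
    if text is None:
        return ""
    text = str(text).lower()
    text = re.sub(r"[^a-z\s]", "", text)       # drop digits, punctuation, symbols
    return re.sub(r"\s+", " ", text).strip()   # collapse runs of whitespace

sample = "Vomiting 3x/day!!  Lethargic."  # hypothetical note
cleaned = clean_text(sample)
print(cleaned)  # vomiting xday lethargic
```

Removing digits and punctuation fuses "3x/day" into "xday", so frequency or dosage details in a note do not survive this step.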

Feature Extraction:

Use TF-IDF Vectorization to transform text into numerical vectors suitable for machine learning.

Include unigrams and bigrams for better context understanding.

Model Training:

Use a Random Forest classifier to learn patterns from symptom descriptions.

Train on 80% of the data and evaluate on the held-out 20%, using a stratified split so that each condition keeps the same share in both sets.
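
The effect of stratifying the 80/20 split can be seen on a small example. This is a sketch on hypothetical toy labels (five placeholder conditions with 20 notes each), not on the real dataset.

```python
from collections import Counter
from sklearn.model_selection import train_test_split

# Hypothetical balanced toy data: 5 conditions x 20 notes each.
conditions = ["A", "B", "C", "D", "E"]
labels = [c for c in conditions for _ in range(20)]
texts = [f"note {i}" for i in range(len(labels))]

# stratify=labels keeps the class proportions identical in both splits.
X_tr, X_te, y_tr, y_te = train_test_split(
    texts, labels, test_size=0.2, random_state=42, stratify=labels
)

train_counts = Counter(y_tr)
test_counts = Counter(y_te)
print(train_counts)  # 16 notes per condition
print(test_counts)   # 4 notes per condition
```

The evaluation output in this notebook shows the same effect: each of the five conditions has a support of 80 in the 400-note test set.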

Model Evaluation:

Measure overall accuracy with accuracy_score and per-class precision, recall, and F1-score with classification_report.

Prediction Function:

Clean input text.

Transform it into TF-IDF vectors.

Predict the condition and report a confidence score, taken as the highest class probability returned by predict_proba.
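
The confidence score is the largest class probability, so the remaining probabilities are discarded. A possible variation, sketched below, ranks the top few conditions instead of returning only one. The helper rank_conditions and the probability vector are hypothetical; with the fitted model, the labels would come from rf.classes_ and the probabilities from rf.predict_proba(...)[0].

```python
def rank_conditions(class_labels, probabilities, k=3):
    """Return the k most probable (label, percent) pairs, highest first."""
    pairs = sorted(zip(class_labels, probabilities),
                   key=lambda pair: pair[1], reverse=True)
    return [(label, round(float(prob) * 100, 2)) for label, prob in pairs[:k]]

# Hypothetical probability vector for one note (sums to 1.0).
labels = ["Digestive Issues", "Ear Infections", "Mobility Problems",
          "Parasites", "Skin Irritations"]
probs = [0.10, 0.05, 0.08, 0.25, 0.52]

ranked = rank_conditions(labels, probs, k=2)
print(ranked)  # [('Skin Irritations', 52.0), ('Parasites', 25.0)]
```

Showing the runner-up makes it easier to see when two conditions are nearly tied.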

Usage:

Input: Text describing dog symptoms.

Output: Predicted condition with probability.

6. Advantages

Fast preliminary diagnosis support for dog owners and veterinarians.

Can handle multiple symptoms and long text descriptions.

Uses an ensemble model (Random Forest), which reached 70.50% accuracy on the held-out test set across five condition categories.


In [11]:
import pandas as pd
import numpy as np
import re
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report


# Load Dataset
df = pd.read_csv("/content/pet-health-symptoms-dataset.csv")
# Drop rows without a label; they cannot be used for training or evaluation
df.dropna(subset=["condition"], inplace=True)


# Text Cleaning
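def clean_text(text):
    """Lowercase the note, keep only letters and whitespace, and collapse spaces."""
    if pd.isnull(text):
        return ""
    text = str(text).lower()  # str() guards against non-string cells
    text = re.sub(r'[^a-z\s]', '', text)  # drop digits, punctuation, and symbols
    text = re.sub(r'\s+', ' ', text).strip()  # collapse repeated whitespace
    return text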

df["clean_text"] = df["text"].apply(clean_text)


# Train-Test Split

X = df["clean_text"]
y = df["condition"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)


# TF-IDF Vectorization

vectorizer = TfidfVectorizer(
    stop_words="english",
    ngram_range=(1, 2),
    max_features=7000,
    min_df=2
)
X_train_tfidf = vectorizer.fit_transform(X_train)
X_test_tfidf = vectorizer.transform(X_test)


# Model Training

rf = RandomForestClassifier(n_estimators=200, max_depth=20, random_state=42, class_weight="balanced")
rf.fit(X_train_tfidf, y_train)


# Model Evaluation

y_pred = rf.predict(X_test_tfidf)
accuracy = accuracy_score(y_test, y_pred)
print(f"Model Accuracy: {accuracy*100:.2f}%")
print("\n Classification Report:\n", classification_report(y_test, y_pred))


# Prediction Function

def analyze_dog_condition(note):
    """
    Predicts likely dog condition based on text input (clinical notes).
    """
    cleaned = clean_text(note)
    vectorized = vectorizer.transform([cleaned])
    prediction = rf.predict(vectorized)[0]
    probabilities = rf.predict_proba(vectorized)[0]
    confidence = np.max(probabilities) * 100

    return {
        "Predicted Condition": prediction,
        "Confidence (%)": round(confidence, 2)
    }


Model Accuracy: 70.50%

 Classification Report:
                    precision    recall  f1-score   support

 Digestive Issues       0.72      0.69      0.71        80
   Ear Infections       0.82      0.80      0.81        80
Mobility Problems       0.69      0.80      0.74        80
        Parasites       0.73      0.51      0.60        80
 Skin Irritations       0.60      0.72      0.66        80

         accuracy                           0.70       400
        macro avg       0.71      0.70      0.70       400
     weighted avg       0.71      0.70      0.70       400
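
The report can be cross-checked with a little arithmetic, since recall times support gives the number of correctly classified notes per condition. The sketch below uses only the recall and support values printed above; because those recalls are rounded to two decimals, the per-class counts are reconstructions rather than values read from the model.

```python
# (recall, support) copied from the classification report.
report = {
    "Digestive Issues": (0.69, 80),
    "Ear Infections": (0.80, 80),
    "Mobility Problems": (0.80, 80),
    "Parasites": (0.51, 80),
    "Skin Irritations": (0.72, 80),
}

# Recall = correct / support, so correct is roughly recall * support.
correct = {name: round(recall * support)
           for name, (recall, support) in report.items()}
total_correct = sum(correct.values())
total_notes = sum(support for _, support in report.values())

print(correct)
print(total_correct, total_notes, total_correct / total_notes)  # 282 400 0.705
```

The reconstructed total of 282 correct notes out of 400 agrees with the reported 70.50% accuracy. Parasites is the weakest class, with about 41 of 80 notes recognised.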



In [12]:

# Example Predictions
test_notes = [
    "Dog has yellow eyes, loss of appetite, vomiting frequently, and dark urine.",
    "Persistent cough, difficulty breathing, and lethargy observed.",
    "Severe itching, hair loss, and red skin patches."
]

for note in test_notes:
    result = analyze_dog_condition(note)
    print("\n🐾 Note:", note)
    print("Prediction:", result)



🐾 Note: Dog has yellow eyes, loss of appetite, vomiting frequently, and dark urine.
Prediction: {'Predicted Condition': 'Digestive Issues', 'Confidence (%)': np.float64(39.28)}

🐾 Note: Persistent cough, difficulty breathing, and lethargy observed.
Prediction: {'Predicted Condition': 'Skin Irritations', 'Confidence (%)': np.float64(21.11)}

🐾 Note: Severe itching, hair loss, and red skin patches.
Prediction: {'Predicted Condition': 'Skin Irritations', 'Confidence (%)': np.float64(52.29)}
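
The second note describes respiratory signs, which none of the five trained categories covers, and its 21.11% confidence is barely above the 20% that a uniform guess over five classes would give. One way to surface such cases is to flag predictions that fall close to chance. This is only a sketch: flag_low_confidence and the 10-point margin are hypothetical choices, not part of the original notebook.

```python
def flag_low_confidence(result, n_classes=5, margin=10.0):
    """Mark a prediction as uncertain when it barely beats a uniform guess."""
    chance = 100.0 / n_classes                  # 20% for five classes
    confidence = float(result["Confidence (%)"])
    flagged = dict(result)                      # leave the input untouched
    flagged["Uncertain"] = confidence < chance + margin
    return flagged

# Results copied from the example predictions above.
outputs = [
    {"Predicted Condition": "Digestive Issues", "Confidence (%)": 39.28},
    {"Predicted Condition": "Skin Irritations", "Confidence (%)": 21.11},
    {"Predicted Condition": "Skin Irritations", "Confidence (%)": 52.29},
]

flags = [flag_low_confidence(out)["Uncertain"] for out in outputs]
for out in outputs:
    print(flag_low_confidence(out))
```

With these assumed settings only the respiratory note is flagged. A suitable margin would need to be tuned on held-out data.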
