
# Task
Implement the Candidate Elimination Algorithm to find the Most Specific Hypothesis (S) and Most General Hypothesis (G) for the given dataset of Fever, Cough, and Fatigue attributes and the target Disease. Display the final version space.

## Define Data and Attributes

### Subtask:
Define the attributes (Fever, Cough, Fatigue) with their possible values and the target (Disease). Represent the given dataset as a list of examples.


**Reasoning**:
I need to define the attributes with their possible values and then represent the given dataset as a list of examples in Python.



In [None]:
attributes = {
    'Fever': ['Yes', 'No'],
    'Cough': ['Yes', 'No'],
    'Fatigue': ['Yes', 'No']
}

dataset = [
    {'Fever': 'Yes', 'Cough': 'No', 'Fatigue': 'Yes', 'Disease': 'Positive'},
    {'Fever': 'Yes', 'Cough': 'Yes', 'Fatigue': 'No', 'Disease': 'Negative'},
    {'Fever': 'No', 'Cough': 'Yes', 'Fatigue': 'Yes', 'Disease': 'Positive'},
    {'Fever': 'No', 'Cough': 'No', 'Fatigue': 'No', 'Disease': 'Negative'},
    {'Fever': 'Yes', 'Cough': 'Yes', 'Fatigue': 'Yes', 'Disease': 'Positive'}
]

print("Attributes and their possible values:")
for attr, values in attributes.items():
    print(f"  {attr}: {values}")

print("\nDataset (first 3 examples):")
for i, example in enumerate(dataset[:3]):
    print(f"  Example {i+1}: {example}")

## Initialize Hypotheses

### Subtask:
Initialize the Most Specific Hypothesis (S) with the first positive example and the Most General Hypothesis (G) with the most general possible hypothesis (all '?'s).


**Reasoning**:
I need to initialize the Most Specific Hypothesis (S) by finding the first positive example from the dataset and extracting its attribute values. Then, I will initialize the Most General Hypothesis (G) with '?' for each attribute. Finally, I will print both S and G.



In [None]:
initial_S = []
for example in dataset:
    if example['Disease'] == 'Positive':
        initial_S = [example[attr] for attr in attributes.keys()]
        break

initial_G = ['?' for _ in attributes.keys()]

print(f"Initial Most Specific Hypothesis (S): {initial_S}")
print(f"Initial Most General Hypothesis (G): {initial_G}")
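
Textbook presentations of Candidate Elimination usually start S from an "empty" hypothesis that matches nothing, rather than from the first positive example. The sketch below suggests why the two starting points agree here. It assumes `None` as a stand-in for the empty slot, and `generalize` is an illustrative helper, not a function from this notebook.

```python
# Illustrative sketch: None marks a slot that accepts no value (an assumption
# made for this example); generalize is a hypothetical helper.
def generalize(s, positive_values):
    """Minimally generalize s so that it covers the positive example."""
    result = []
    for sv, xv in zip(s, positive_values):
        if sv is None:      # empty slot: adopt the example's value
            result.append(xv)
        elif sv == xv:      # slot already matches the example
            result.append(sv)
        else:               # conflicting values: fall back to the wildcard
            result.append('?')
    return result

s = [None, None, None]                   # maximally specific starting point
s = generalize(s, ['Yes', 'No', 'Yes'])  # first positive example in the dataset
print(s)  # ['Yes', 'No', 'Yes']
```

Generalizing the empty hypothesis on the first positive example simply copies that example, which matches `initial_S` above.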

## Implement Candidate Elimination Algorithm

### Subtask:
Implement the candidate elimination algorithm. Iterate through the dataset examples. For each positive example, generalize S. For each negative example, specialize G, removing inconsistent hypotheses and adding new specialized ones.


**Reasoning**:
First, I will initialize the Most Specific Hypothesis (S) and Most General Hypothesis (G) lists, then define the `consistent` helper function as it's fundamental to the algorithm and will be used by other functions.



In [None]:
S = [initial_S]
G = [initial_G]
attributes_keys = list(attributes.keys())
attributes_values = attributes # Store for easy access to all possible values

def consistent(hypothesis, example):
    """Checks if a hypothesis covers an example: every non-'?' value must match (the label is ignored)."""
    for i, attr in enumerate(attributes_keys):
        if hypothesis[i] == '?':
            continue
        if hypothesis[i] != example[attr]:
            return False
    return True

print(f"Initial S: {S}")
print(f"Initial G: {G}")
print("Defined consistent function.")
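
As a quick sanity check of the matching rule, the sketch below restates the coverage test in self-contained form and applies it to the first two examples of the dataset. `ATTRS` and `covers` are illustrative names that are not part of the notebook; `covers` is assumed to behave like `consistent` above.

```python
# Illustrative, self-contained restatement of the coverage test.
# ATTRS and covers are hypothetical names, not part of the notebook.
ATTRS = ['Fever', 'Cough', 'Fatigue']

def covers(hypothesis, example):
    """Return True if every non-'?' slot of the hypothesis equals the example's value."""
    return all(h == '?' or h == example[a] for h, a in zip(hypothesis, ATTRS))

pos = {'Fever': 'Yes', 'Cough': 'No', 'Fatigue': 'Yes', 'Disease': 'Positive'}
neg = {'Fever': 'Yes', 'Cough': 'Yes', 'Fatigue': 'No', 'Disease': 'Negative'}

print(covers(['?', '?', '?'], neg))       # True: the all-'?' hypothesis matches everything
print(covers(['Yes', 'No', 'Yes'], pos))  # True: exact match on all three attributes
print(covers(['?', '?', 'Yes'], neg))     # False: Fatigue is 'No' in this example
```

Note that the test looks only at attribute values. Whether covering an example is good or bad depends on the example's label, which the main loop handles separately.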

**Reasoning**:
I need to define the `is_more_general` helper function, which is crucial for comparing hypotheses in the Candidate Elimination Algorithm. This function will be used later when updating the S and G boundaries to ensure proper generalization and specialization.



In [None]:
def is_more_general(h1, h2):
    """Checks if hypothesis h1 is more general than or equal to h2."""
    for i in range(len(h1)):
        if h1[i] == h2[i]:
            continue
        elif h1[i] == '?':
            continue
        else:
            return False
    return True

print("Defined is_more_general function.")
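
The generality relation is only a partial order: some pairs of hypotheses cannot be compared at all. The following sketch illustrates this with a compact, self-contained variant of the comparison. `more_general_or_equal` is an illustrative name and is assumed to be equivalent to `is_more_general` above.

```python
# Illustrative sketch; more_general_or_equal is a hypothetical stand-in
# for the notebook's is_more_general.
def more_general_or_equal(h1, h2):
    """True if each slot of h1 is '?' or equal to the matching slot of h2."""
    return all(a == '?' or a == b for a, b in zip(h1, h2))

top = ['?', '?', '?']
h_fatigue = ['?', '?', 'Yes']
h_cough = ['?', 'No', '?']
s0 = ['Yes', 'No', 'Yes']

print(more_general_or_equal(top, h_fatigue))      # True
print(more_general_or_equal(h_fatigue, s0))       # True
print(more_general_or_equal(s0, h_fatigue))       # False
# Neither of the next two hypotheses is more general than the other.
print(more_general_or_equal(h_fatigue, h_cough))  # False
print(more_general_or_equal(h_cough, h_fatigue))  # False
```

Incomparable pairs such as `h_fatigue` and `h_cough` are the reason G can hold several hypotheses at once.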

**Reasoning**:
Now I need to define the `generalize_S` helper function, which will be responsible for updating the Most Specific Hypothesis (S) boundary. This function will iterate through each hypothesis in S, and if it's not consistent with a positive example, it will generalize it minimally until it becomes consistent, then remove any redundant hypotheses.



In [None]:
def generalize_S(S, example, attributes_keys):
    """Minimally generalizes each hypothesis in S so that it covers a positive example."""
    updated_S = []
    for s_hypothesis in S:
        if not consistent(s_hypothesis, example):
            # Replace every mismatching attribute value with '?'
            new_s = list(s_hypothesis) # Work on a copy
            for i, attr in enumerate(attributes_keys):
                if new_s[i] != '?' and new_s[i] != example[attr]:
                    new_s[i] = '?'
            updated_S.append(new_s)
        else:
            updated_S.append(s_hypothesis)

    # Remove duplicates while preserving order
    deduped_S = []
    for h in updated_S:
        if h not in deduped_S:
            deduped_S.append(h)

    # Keep only the most specific hypotheses: drop any that is strictly more general than another
    final_S = [h1 for h1 in deduped_S
               if not any(h1 != h2 and is_more_general(h1, h2) for h2 in deduped_S)]

    return final_S

print("Defined generalize_S function.")
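
To make the generalization step concrete, the sketch below replays only the three positive examples of the dataset against the initial S. `minimal_generalization` is an illustrative helper working on plain value lists, not the notebook's `generalize_S`.

```python
# Illustrative sketch; minimal_generalization is a hypothetical helper that
# operates on lists of attribute values in the order Fever, Cough, Fatigue.
def minimal_generalization(s, positive_values):
    """Replace each slot of s that disagrees with the example by '?'."""
    return [sv if sv == xv else '?' for sv, xv in zip(s, positive_values)]

s = ['Yes', 'No', 'Yes']  # initial S, taken from the first positive example
positives = [
    ['Yes', 'No', 'Yes'],
    ['No', 'Yes', 'Yes'],
    ['Yes', 'Yes', 'Yes'],
]
for p in positives:
    s = minimal_generalization(s, p)
    print(p, '->', s)
# Final line printed: ['Yes', 'Yes', 'Yes'] -> ['?', '?', 'Yes']
```

Only the Fatigue slot survives, because it is the only attribute with the same value in every positive example.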

**Reasoning**:
I need to define the `specialize_G` helper function. For each hypothesis in G that covers a negative example, it generates the minimal specializations that exclude that example, keeps only those that are still more general than at least one hypothesis in `S`, and then discards any hypothesis in G that is more specific than another. The check against `S` must be wrapped in `any()`: a bare generator expression such as `is_more_general(new_g, s_hyp) for s_hyp in S` placed directly in an `if` condition is a `SyntaxError`.



In [None]:
def specialize_G(G, example, attributes_keys, attributes_values):
    updated_G = []
    for g_hypothesis in G:
        if consistent(g_hypothesis, example):
            # If g_hypothesis is consistent with a negative example, it must be specialized
            for i, attr_key in enumerate(attributes_keys):
                if g_hypothesis[i] == '?':
                    # Specialize '?' to all possible specific values for this attribute
                    for val in attributes_values[attr_key]:
                        new_g = list(g_hypothesis)
                        new_g[i] = val
                        # The new specialization new_g must be:
                        # 1. Inconsistent with the current negative example
                        # 2. More general than at least one hypothesis in S
                        if not consistent(new_g, example) and any(is_more_general(new_g, s_hyp) for s_hyp in S):
                            updated_G.append(new_g)
                # Positions that already hold a specific value cannot be specialized further,
                # so only '?' positions are expanded.
        else:
            # If g_hypothesis is already inconsistent with the negative example, keep it.
            updated_G.append(g_hypothesis)

    # Filter out hypotheses that are more specific than others in updated_G (keep only maximally general)
    final_G = []
    for h1 in updated_G:
        is_maximally_general = True
        for h2 in updated_G:
            # h2 is more general than h1, so h1 is redundant
            if h1 != h2 and is_more_general(h2, h1):
                is_maximally_general = False
                break
        if is_maximally_general:
            final_G.append(h1)

    # Remove duplicates
    unique_G = []
    for h in final_G:
        if h not in unique_G:
            unique_G.append(h)

    return unique_G

print("Defined specialize_G function.")
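
The specialization step can be traced by hand on the first negative example of the dataset. The sketch below is a self-contained illustration, assuming the hypothesis being specialized covers the negative example. `DOMAINS`, `more_general_or_equal`, and `minimal_specializations` are illustrative names rather than the notebook's own, and S is passed explicitly instead of being read from a global.

```python
# Illustrative sketch with hypothetical names; S is an explicit argument here.
DOMAINS = {'Fever': ['Yes', 'No'], 'Cough': ['Yes', 'No'], 'Fatigue': ['Yes', 'No']}
ATTRS = list(DOMAINS)

def more_general_or_equal(h1, h2):
    return all(a == '?' or a == b for a, b in zip(h1, h2))

def minimal_specializations(g, negative_values, S):
    """Specializations of g (which covers the negative example) that exclude it
    and are still more general than at least one hypothesis in S."""
    out = []
    for i, attr in enumerate(ATTRS):
        if g[i] != '?':
            continue  # a specific value cannot be specialized further
        for val in DOMAINS[attr]:
            if val == negative_values[i]:
                continue  # this choice would still cover the negative example
            candidate = g[:i] + [val] + g[i + 1:]
            if any(more_general_or_equal(candidate, s) for s in S):
                out.append(candidate)
    return out

S_now = [['Yes', 'No', 'Yes']]
result = minimal_specializations(['?', '?', '?'], ['Yes', 'Yes', 'No'], S_now)
print(result)  # [['?', 'No', '?'], ['?', '?', 'Yes']]
```

The candidate `['No', '?', '?']` also excludes the negative example, but it is dropped because it is not more general than the current S.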

**Reasoning**:
I have defined all the necessary helper functions (`consistent`, `is_more_general`, `generalize_S`, `specialize_G`). Now I need to implement the main loop of the Candidate Elimination Algorithm by iterating through each example in the dataset and updating S and G boundaries based on whether the example is positive or negative. After the loop, I will print the final S and G hypotheses.



In [None]:
for k, example_dict in enumerate(dataset):
    example = [example_dict[attr] for attr in attributes_keys]
    target = example_dict['Disease']

    print(f"\nProcessing Example {k+1}: {example_dict}")

    if target == 'Positive':
        # Remove any hypotheses from G that are not consistent with the positive example
        G = [g for g in G if consistent(g, example_dict)]

        # Generalize S hypotheses
        updated_S = []
        for s_hyp in S:
            if not consistent(s_hyp, example_dict):
                # Generalize s_hyp minimally to cover the positive example
                new_s = list(s_hyp)
                for i in range(len(new_s)):
                    if new_s[i] != '?' and new_s[i] != example[i]:
                        new_s[i] = '?'
                updated_S.append(new_s)
            else:
                updated_S.append(s_hyp)

        # Filter S: keep only the most specific hypotheses and drop duplicates
        final_S = []
        for h1 in updated_S:
            is_redundant = False
            for h2 in updated_S:
                if h1 != h2 and is_more_general(h1, h2): # h1 is strictly more general than h2, so h1 is redundant in S
                    is_redundant = True
                    break
            if not is_redundant and h1 not in final_S:
                final_S.append(h1)
        S = final_S

        # Prune G: drop any hypothesis that is not more general than at least one S hypothesis
        G = [g for g in G if any(is_more_general(g, s) for s in S)]

    elif target == 'Negative':
        # Remove any hypotheses from S that are consistent with the negative example
        S = [s for s in S if not consistent(s, example_dict)]

        # Specialize G hypotheses
        updated_G = []
        for g_hyp in G:
            if consistent(g_hyp, example_dict):
                # If g_hyp is consistent with a negative example, it must be specialized
                for i, attr_key in enumerate(attributes_keys):
                    if g_hyp[i] == '?':
                        for val in attributes_values[attr_key]:
                            if val != example[i]: # Only specialize to values that differ from the negative example
                                new_g = list(g_hyp)
                                new_g[i] = val
                                # Ensure new_g is still more general than at least one S hypothesis
                                if any(is_more_general(new_g, s_hyp) for s_hyp in S):
                                    updated_G.append(new_g)
            else:
                updated_G.append(g_hyp)

        # Filter G: keep only the maximally general hypotheses and drop duplicates
        final_G = []
        for h1 in updated_G:
            is_maximally_general = True
            for h2 in updated_G:
                if h1 != h2 and is_more_general(h2, h1): # h2 is strictly more general than h1, so h1 is not maximally general
                    is_maximally_general = False
                    break
            if is_maximally_general and h1 not in final_G:
                final_G.append(h1)
        G = final_G

        # Prune S: drop any hypothesis that is not covered by at least one G hypothesis (should not happen with correctly maintained G)
        S = [s for s in S if any(is_more_general(g, s) for g in G)]

    print(f"Current S: {S}")
    print(f"Current G: {G}")

# Final pruning and consistency checks (if needed, typically handled in loops)
final_S = []
for s_hyp in S:
    if any(is_more_general(g_hyp, s_hyp) for g_hyp in G):
        final_S.append(s_hyp)
S = final_S

final_G = []
for g_hyp in G:
    if any(is_more_general(g_hyp, s_hyp) for s_hyp in S):
        final_G.append(g_hyp)
G = final_G

# Remove any hypotheses from S that are more general than another in S
S = [h1 for h1 in S if not any(is_more_general(h1, h2) and h1 != h2 for h2 in S)]
# Remove any hypotheses from G that are more specific than another in G
G = [h1 for h1 in G if not any(is_more_general(h2, h1) and h1 != h2 for h2 in G)]

print("\nFinal Version Space:")
print(f"Most Specific Hypothesis (S): {S}")
print(f"Most General Hypothesis (G): {G}")
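
S and G are only the boundaries of the version space. One way to display every hypothesis between them is sketched below. `version_space` and `more_general_or_equal` are illustrative helpers, not part of the notebook, and the intermediate boundaries used in the second call are the ones obtained by tracing the algorithm by hand after the second example.

```python
from itertools import product

# Illustrative sketch; both helpers are hypothetical names.
def more_general_or_equal(h1, h2):
    return all(a == '?' or a == b for a, b in zip(h1, h2))

def version_space(S, G):
    """List every hypothesis h with g >= h >= s for some s in S and g in G."""
    members = []
    for s in S:
        # Each slot of h is either the value in s or the wildcard '?',
        # so h is at least as general as s by construction.
        slot_choices = [sorted({v, '?'}) for v in s]
        for h in product(*slot_choices):
            h = list(h)
            if h not in members and any(more_general_or_equal(g, h) for g in G):
                members.append(h)
    return members

# Final boundaries reported by the notebook: S and G coincide.
print(version_space([['?', '?', 'Yes']], [['?', '?', 'Yes']]))
# [['?', '?', 'Yes']]

# Boundaries after the second example (hand-traced): several hypotheses remain.
mid = version_space([['Yes', 'No', 'Yes']], [['?', 'No', '?'], ['?', '?', 'Yes']])
print(len(mid))  # 6
```

When S and G coincide, the enumeration collapses to a single hypothesis, which is what convergence means here.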

## Final Task

### Subtask:
Provide a summary of the implemented algorithm and the resulting final version space (S and G hypotheses).


## Summary:

### Q&A
The final version space determined by the Candidate Elimination Algorithm consists of:
*   Most Specific Hypothesis (S): `[['?', '?', 'Yes']]`
*   Most General Hypothesis (G): `[['?', '?', 'Yes']]`

### Data Analysis Key Findings
*   The dataset consists of 5 examples, each described by three attributes: 'Fever', 'Cough', and 'Fatigue', with a binary target 'Disease' ('Positive' or 'Negative').
*   The Candidate Elimination Algorithm was initialized with the Most Specific Hypothesis (S) as `['Yes', 'No', 'Yes']` (derived from the first positive example) and the Most General Hypothesis (G) as `['?', '?', '?']`.
*   The algorithm successfully processed each example, iteratively refining the S and G boundaries. For positive examples, S hypotheses were generalized, and G hypotheses inconsistent with the example were removed. For negative examples, S hypotheses consistent with the example were removed, and G hypotheses were specialized.
*   After processing all examples, the algorithm converged, and both the Most Specific Hypothesis (S) and the Most General Hypothesis (G) were identical: `[['?', '?', 'Yes']]`.
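
The convergence reported above can be cross-checked by brute force, since there are only 27 conjunctive hypotheses over three attributes (each slot is `'?'`, `'Yes'`, or `'No'`). The sketch below is an independent check rather than part of the algorithm; `DATA` and `covers` are illustrative names.

```python
from itertools import product

# Illustrative brute-force check; DATA and covers are hypothetical names.
# Each entry is (Fever, Cough, Fatigue) with True for a 'Positive' label.
DATA = [
    (('Yes', 'No', 'Yes'), True),
    (('Yes', 'Yes', 'No'), False),
    (('No', 'Yes', 'Yes'), True),
    (('No', 'No', 'No'), False),
    (('Yes', 'Yes', 'Yes'), True),
]

def covers(h, x):
    return all(a == '?' or a == b for a, b in zip(h, x))

# Keep hypotheses that cover every positive example and no negative example.
consistent_hypotheses = [
    list(h)
    for h in product(['?', 'Yes', 'No'], repeat=3)
    if all(covers(h, x) == label for x, label in DATA)
]
print(consistent_hypotheses)  # [['?', '?', 'Yes']]
```

Exactly one hypothesis survives, in agreement with the S and G boundaries found by the algorithm.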

### Insights or Next Steps
*   The convergence of S and G to `[['?', '?', 'Yes']]` indicates that, within the space of conjunctive hypotheses, exactly one hypothesis is consistent with all five training examples. This hypothesis states that 'Fatigue: Yes' is the sole condition for a 'Positive' disease diagnosis, while the 'Fever' and 'Cough' attributes are irrelevant.
*   Further validation could involve testing this learned hypothesis on new, unseen data to confirm its predictive accuracy and generalizability beyond the provided training set.
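
With three binary attributes there are eight possible attribute combinations, five of which appear in the training set. As a starting point for such validation, the sketch below lists what the learned hypothesis would predict for the three remaining combinations. No true labels are known for them, so this shows predictions only. `learned`, `seen`, and `covers` are illustrative names.

```python
from itertools import product

# Illustrative sketch; all names here are hypothetical.
ATTRS = ['Fever', 'Cough', 'Fatigue']
learned = ['?', '?', 'Yes']  # the converged hypothesis reported above

# Attribute combinations that already appear in the training set.
seen = {
    ('Yes', 'No', 'Yes'), ('Yes', 'Yes', 'No'), ('No', 'Yes', 'Yes'),
    ('No', 'No', 'No'), ('Yes', 'Yes', 'Yes'),
}

def covers(h, x):
    return all(a == '?' or a == b for a, b in zip(h, x))

unseen_predictions = {
    x: 'Positive' if covers(learned, x) else 'Negative'
    for x in product(['Yes', 'No'], repeat=3)
    if x not in seen
}
for x, label in unseen_predictions.items():
    print(dict(zip(ATTRS, x)), '->', label)
```

Only the combination with 'Fatigue: Yes' is predicted 'Positive'; confirming or refuting these predictions would require labeled data for those cases.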
