Naming variables in a way that indicates their data type and purpose can significantly improve the readability and maintainability of your code. Here are some suggestions for naming conventions that include both the type of data and its purpose:

1. **For Lists:** Use a plural noun or a `List` suffix, and include the type of elements the list contains. For example:
   - `topicNodeList` or `topicNodes` instead of `topic_excel_node_list` (indicating it's a list of `TopicNode` objects).
   - `topicHelperDataList` or `topicHelpers` instead of `topic_excel_helper_list` (indicating it's a list of helper data dictionaries).

2. **For Dictionaries:** Include `Dict` in the name, along with an indication of what the dictionary holds. For example:
   - `topicNodeDataDict` instead of `topic_node_data` (indicating it's a dictionary of data for a `TopicNode`).
   - `topicHelperDataDict` instead of `topic_excel_helper_data` (indicating it's a dictionary of helper data).

3. **For Processed Data:** Use a name that indicates both the original type and that it's been processed. For example:
   - `processedTopicNodeDataDict` instead of `topic_excel_node_data_processed` (indicating it's a processed dictionary of `TopicNode` data).

4. **For Combined Data:** Use a name that indicates it's a combined structure, along with what it contains. For example:
   - `topicNodeAndHelperDict` instead of `combined_data` (indicating it's a dictionary containing both a `TopicNode` and its helper data).

5. **For Default Values:** Specify what the defaults are for. For example:
   - `defaultTopicValuesDict` instead of `default_values` (indicating it's a dictionary of default values for topics).

Here's how your code might look with these naming conventions:

```python
topicNodeList = []
topicHelperDataList = []
defaultTopicValuesDict = {
    'topic_assessment_type': 'Null'
}

for index, row in topic_df.iterrows():
    topicNodeDataDict = {
        # your existing code to populate topicNodeDataDict
    }
    logging.pedantic(f"Processing row {index}: {topicNodeDataDict}")

    topicHelperDataDict = {
        # your existing code to populate topicHelperDataDict
    }

    processedTopicNodeDataDict = replace_nan_with_default(topicNodeDataDict, defaultTopicValuesDict)

    try:
        topicNode = TopicNode(**processedTopicNodeDataDict)
        topicNodeAndHelperDict = {
            'node': topicNode,
            'helper': topicHelperDataDict
        }
        topicNodeList.append(topicNodeAndHelperDict)
        logging.pedantic(f"Appended topicNodeAndHelperDict for row {index}: {topicNodeAndHelperDict}")
    except ValidationError as e:
        logging.error(f"Validation error for row {index}: {e}")
```

These names make it clearer what each variable is and what type of data it contains, which can be especially helpful in larger projects or when collaborating with other developers.



#Example usage
#topic_schema_instance = schemas.TopicSchema(**topic_data)
#topic_node_instance = schema_tools.convert_pydantic_to_neontology(topic_schema_instance, schemas.TopicNode)

#topic_lesson_schema_instance = schemas.TopicLessonSchema(**topic_lesson_data)
#topic_lesson_node_instance = schema_tools.convert_pydantic_to_neontology(topic_lesson_schema_instance, schemas.TopicLessonNode)

In [None]:


import logging  # logging.pedantic, used below, is assumed to be a custom level registered elsewhere in the project

from schemas import TopicNode, TopicLessonNode, LearningStatementNode
from schemas import SchoolOffersSubject, SubjectForKeyStage, KeyStageIncludesTopic, SubjectIncludesTopic, TopicIncludesTopicLesson, TopicIncludesLearningStatement, LearningStatementPartOfTopicLesson
import schema_tools
import planner
import neontology_tools as neon
import neo4j_driver_tools as neo4j

import pandas as pd
from pydantic import ValidationError, BaseModel
from neontology import BaseNode, BaseRelationship

In [None]:
neo4j_driver = neo4j.get_neo4j_driver()
neon.init_neo4j_connection()


In [None]:
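# Use a separate name so the `planner` module is not shadowed when the cell is re-run
planner_sheets = planner.get_excel_sheets('planner.xlsx')


In [None]:
topic_df = planner_sheets['topiclookup_df']
lesson_df = planner_sheets['lessonlookup_df']
statement_df = planner_sheets['statementlookup_df']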


In [None]:
def replace_nan_with_default(data, default_values):
    for key in default_values:
        if pd.isna(data.get(key, None)):
            logging.warning(f"Replacing NaN in {key} with default value '{default_values[key]}'")
            data[key] = default_values[key]
    return data
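
The helper above mutates the dictionary it is given and relies on `pd.isna`. As a minimal sketch of the same idea, the variant below returns a new dictionary and detects missing values with the standard library only, so it runs without pandas. The names `is_missing` and `with_defaults` are hypothetical and not part of the notebook, and the check covers only `None` and float NaN, whereas `pd.isna` also recognizes pandas-specific markers such as `NaT`.

```python
import math


def is_missing(value):
    """Return True for None or a float NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def with_defaults(data, default_values):
    """Return a copy of data with missing values replaced by their defaults."""
    result = dict(data)  # copy, so the caller's dictionary is left untouched
    for key, default in default_values.items():
        if is_missing(result.get(key)):
            result[key] = default
    return result


row = {'topic_id': 'T1', 'topic_assessment_type': float('nan')}
cleaned = with_defaults(row, {'topic_assessment_type': 'Null'})
print(cleaned)
```

Returning a copy keeps the raw row available for logging or debugging after the defaults have been applied.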


In [None]:
topic_excel_node_list = []
topic_excel_helper_list = []
default_values = {
    'topic_assessment_type': 'Null'
    }

for index, row in topic_df.iterrows():
    # Filter and map the row to TopicNode fields
    topic_node_data = {
        'topic_id': row.get('TopicID'),
        'topic_title': row.get('TopicTitle'),
        'total_number_of_lessons_for_topic': row.get('TotalNumberOfLessonsForTopic'),
        'topic_type': row.get('TopicType'),
        'topic_assessment_type': row.get('TopicAssessmentType')
    }
    logging.pedantic(f"Processing row {index}: {topic_node_data}")
    topic_excel_helper_data = {
        'topic_source': row.get('TopicSource'),
        'topic_department': row.get('TopicDepartment'),
        'topic_key_stage': row.get('TopicKeyStage'),
        'topic_year': row.get('TopicYear'),
        'topic_subject': row.get('TopicSubject'),
        'topic_sequence': row.get('TopicSequence')
    }
    # Replace NaN values with defaults
    topic_excel_node_data_processed = replace_nan_with_default(topic_node_data, default_values)
    logging.pedantic(f"Processed row {index}: {topic_excel_node_data_processed}")

    # Create a TopicNode instance for each row
    try:
        topic_node = TopicNode(**topic_excel_node_data_processed)
        logging.pedantic(f"TopicNode instance created for row {index}: {topic_node}")
        combined_data = {
            'node': topic_node,
            'helper': topic_excel_helper_data
        }
        logging.pedantic(f"Combined data for row {index}: {combined_data}")
        topic_excel_node_list.append(combined_data)
        logging.pedantic(f"Appended new combined data to topic_excel_node_list, current length: {len(topic_excel_node_list)}")
    except ValidationError as e:
        logging.error(f"Validation error for row {index}: {e}")



In [None]:
# Then, use the create_or_merge_neontology_node function to add these nodes to your Neo4j database
for node_data in topic_excel_node_list:
    try:
        logging.pedantic(f"Processing node: {node_data}")
        neon.create_or_merge_neontology_node(node_data['node'], operation='merge')
        logging.pedantic(f"Node processed: {node_data}")
    except Exception as e:
        logging.error(f"Error in processing node: {e}")

In [None]:
topic_lesson_excel_node_list = []
topic_lesson_excel_helper_list = []
default_values = {
    'topic_lesson_title': 'Null',
    'topic_lesson_type': 'Null',
    'topic_lesson_length': 1,
    'topic_lesson_suggested_activities': 'Null',
    'topic_lesson_skills_learned': 'Null',
    'topic_lesson_weblinks': 'Null',
}

for index, row in lesson_df.iterrows():
    # Filter and map the row to TopicLessonNode fields
    topic_lesson_node_data = {
        'topic_lesson_id': row.get('LessonID'),
        'topic_lesson_title': row.get('LessonTitle', default_values['topic_lesson_title']),
        'topic_lesson_type': row.get('TopicLessonType', default_values['topic_lesson_type']),
        'topic_lesson_length': row.get('SuggestedNumberOfPeriodsForLesson', default_values['topic_lesson_length']),
        'topic_lesson_suggested_activities': row.get('SuggestedActivities', default_values['topic_lesson_suggested_activities']),
        'topic_lesson_skills_learned': row.get('SkillsLearned', default_values['topic_lesson_skills_learned']),
        'topic_lesson_weblinks': row.get('TopicLessonWeblinks', default_values['topic_lesson_weblinks'])
        }
    logging.pedantic(f"topic_lesson_node_data: {topic_lesson_node_data}")
    topic_lesson_excel_helper_data = {
        'topic_lesson_source': row.get('TopicSource'),
        'topic_lesson_department': row.get('TopicDepartment'),
        'topic_lesson_key_stage': row.get('TopicKeyStage'),
        'topic_lesson_topic_id': row.get('TopicID'),
        'topic_lesson_topic_year': row.get('TopicYear'),
        'topic_lesson_topic_subject': row.get('TopicSubject'),
        'topic_lesson_topic_sequence': row.get('TopicSequence'),
        'topic_lesson_lesson_sequence': row.get('LessonSequence'),
        'topic_lesson_learning_objective': row.get('LessonLearningObjective'),
        }
    logging.pedantic(f"topic_lesson_excel_helper_data: {topic_lesson_excel_helper_data}")

    # Replace NaN values with defaults
    topic_lesson_excel_node_data_processed = replace_nan_with_default(topic_lesson_node_data, default_values)
    logging.pedantic(f"topic_lesson_excel_node_list_processed_data: {topic_lesson_excel_node_data_processed}")

    # Create a TopicLessonNode instance for each row
    try:
        topic_lesson_node = TopicLessonNode(**topic_lesson_excel_node_data_processed)
        logging.pedantic(f"topic_lesson_node: {topic_lesson_node}")
        combined_data = {
            'node': topic_lesson_node,
            'helper': topic_lesson_excel_helper_data
        }
        logging.pedantic(f"combined_data: {combined_data}")
        topic_lesson_excel_node_list.append(combined_data)
        logging.pedantic(f"Appended new combined data to topic_lesson_excel_node_list, current length: {len(topic_lesson_excel_node_list)}")
    except ValidationError as e:
        logging.error(f"Validation error for row {index}: {e}")
        


In [None]:
# Then, use the create_or_merge_neontology_node function to add these nodes to your Neo4j database
for node_data in topic_lesson_excel_node_list:
    try:
        neon.create_or_merge_neontology_node(node_data['node'], operation='merge')
        logging.pedantic(f"TopicLessonNode merged: {node_data['node']}")
    except Exception as e:
        logging.error(f"Error in processing node: {e}")

In [None]:
learning_statement_excel_node_list = []
learning_statement_excel_helper_list = []
default_values = {
    # Add default values for fields that might contain NaN
    'lesson_learning_statement': 'Null',
    'lesson_learning_statement_type': 'Student learning outcome'
}

for index, row in statement_df.iterrows():
    # Filter and map the row to LearningStatementNode fields
    learning_statement_node_data = {
        'lesson_learning_statement_id': row.get('LearningOutcomeID'),
        'lesson_learning_statement': row.get('LearningOutcomeStatement', default_values['lesson_learning_statement']),
        'lesson_learning_statement_type': row.get('LearningStatementType', default_values['lesson_learning_statement_type']),
    }
    logging.pedantic(f"learning_statement_node_data: {learning_statement_node_data}")
    learning_statement_excel_helper_data = {
        'lesson_learning_statement_source': row.get('TopicSource'),
        'lesson_learning_statement_department': row.get('TopicDepartment'),
        'lesson_learning_statement_key_stage': row.get('TopicKeyStage'),
        'lesson_learning_statement_topic_id': row.get('TopicID'),
        'lesson_learning_statement_lesson_id': row.get('LessonID'),
        'lesson_learning_statement_topic_year': row.get('TopicYear'),
        'lesson_learning_statement_topic_subject': row.get('TopicSubject'),
        'lesson_learning_statement_topic_sequence': row.get('TopicSequence'),
        'lesson_learning_statement_lesson_sequence': row.get('LessonSequence'),
        'lesson_learning_statement_sequence': row.get('LearningOutcomeSequence')            
    }
    logging.pedantic(f"learning_statement_excel_helper_data: {learning_statement_excel_helper_data}")

    # Replace NaN values with defaults
    learning_statement_excel_node_data_processed = replace_nan_with_default(learning_statement_node_data, default_values)
    logging.pedantic(f"processed_data: {learning_statement_excel_node_data_processed}")

    # Create a LearningStatementNode instance for each row
    try:
        learning_statement_node = LearningStatementNode(**learning_statement_excel_node_data_processed)
        logging.pedantic(f"learning_statement_node: {learning_statement_node}")
        combined_data = {
            'node': learning_statement_node,
            'helper': learning_statement_excel_helper_data
        }
        logging.pedantic(f"combined_data: {combined_data}")
        learning_statement_excel_node_list.append(combined_data)
        logging.pedantic(f"Appended new combined data to learning_statement_excel_node_list, current length: {len(learning_statement_excel_node_list)}")
    except ValidationError as e:
        logging.error(f"Validation error for row {index}: {e}")


In [None]:
# Then, use the create_or_merge_neontology_node function to add these nodes to your Neo4j database
for node_data in learning_statement_excel_node_list:
    try:
        logging.pedantic(f"Creating or merging node: {node_data}")
        neon.create_or_merge_neontology_node(node_data['node'], operation='merge')
    except Exception as e:
        logging.error(f"Error in processing node: {e}")
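
The three loading cells above follow the same pattern: map spreadsheet columns to node fields, fill in defaults, validate, and keep the node together with its helper data. A minimal sketch of that pattern as one reusable function follows. It is not the notebook's code: `build_nodes`, `StubNode`, and the plain-dict rows are hypothetical stand-ins (for `TopicNode` and the rows of `topic_df`), and `ValueError` stands in for pydantic's `ValidationError` so the sketch runs with the standard library alone.

```python
import logging
import math


class StubNode:
    """Hypothetical stand-in for a node class such as TopicNode."""

    def __init__(self, topic_id, topic_title):
        if topic_id is None:
            raise ValueError("topic_id is required")
        self.topic_id = topic_id
        self.topic_title = topic_title


def _is_missing(value):
    """Return True for None or a float NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def build_nodes(rows, node_cls, node_columns, helper_columns, default_values):
    """Map each row to a node plus helper data; skip rows that fail validation."""
    combined = []
    for index, row in enumerate(rows):
        # Rename spreadsheet columns to node field names.
        node_data = {field: row.get(column) for field, column in node_columns.items()}
        # Fill missing values with the configured defaults.
        for key, default in default_values.items():
            if _is_missing(node_data.get(key)):
                node_data[key] = default
        helper_data = {field: row.get(column) for field, column in helper_columns.items()}
        try:
            node = node_cls(**node_data)
        except ValueError as e:
            logging.error(f"Validation error for row {index}: {e}")
            continue
        combined.append({'node': node, 'helper': helper_data})
    return combined


rows = [
    {'TopicID': 'T1', 'TopicTitle': float('nan'), 'TopicYear': 7},
    {'TopicID': None, 'TopicTitle': 'Forces', 'TopicYear': 8},  # fails validation
]
result = build_nodes(
    rows,
    StubNode,
    node_columns={'topic_id': 'TopicID', 'topic_title': 'TopicTitle'},
    helper_columns={'topic_year': 'TopicYear'},
    default_values={'topic_title': 'Null'},
)
print(len(result), result[0]['node'].topic_title, result[0]['helper'])
```

With a helper of this shape, each of the three cells would reduce to one call that differs only in its column mappings and defaults.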

In [None]:
relationship_list = []
for topic_lesson_node in topic_lesson_excel_node_list:
    topic_lesson_node_id = topic_lesson_node['node'].topic_lesson_id
    lesson_nodes = neo4j.find_nodes_by_label_and_properties(neo4j_driver, 'TopicLessonNode', {'topic_lesson_id': topic_lesson_node_id})
    if not lesson_nodes:  # Check if the list is empty
        logging.error(f"No lesson node found for ID {topic_lesson_node_id}")
        continue

    topic_node_id = topic_lesson_node['helper']['topic_lesson_topic_id']
    topic_nodes = neo4j.find_nodes_by_label_and_properties(neo4j_driver, 'TopicNode', {'topic_id': topic_node_id})
    if not topic_nodes:  # Check if the list is empty
        logging.error(f"No topic node found for ID {topic_node_id}")
        continue
    
    # Assuming only one node is expected for each query
    lesson_node = lesson_nodes[0]
    topic_node = topic_nodes[0]

    # might not be needed
    relationship_data = {
        'from_node_id': topic_lesson_node_id,
        'to_node_id': topic_node_id,
        'relationship_type': 'HAS_TOPIC'
    }
    # needed: build the relationship from the single nodes selected above
    # (re-querying here would overwrite topic_node with a list)
    logging.pedantic(f"topic_node: {topic_node}")
    topic_has_lesson_relationship = TopicIncludesTopicLesson(source=topic_node, target=lesson_node)
    logging.pedantic(f"topic_has_lesson_relationship: {topic_has_lesson_relationship}")
    relationship_list.append(relationship_data) # maybe don't bother with this list


In [None]:
for relationship in relationship_list:
    try:
        logging.pedantic(f"Processing relationship: {relationship}")
        # relationship_data stores the lesson ID as 'from_node_id' and the topic ID as 'to_node_id'
        source_node = TopicNode(topic_id=relationship['to_node_id'])
        target_node = TopicLessonNode(topic_lesson_id=relationship['from_node_id'])
        relationship_node = TopicIncludesTopicLesson(source=source_node, target=target_node)
        logging.pedantic(f"relationship_node: {relationship_node}")
        neon.create_or_merge_neontology_relationship(relationship_node)
        logging.pedantic(f"Relationship processed: {relationship}")
    except Exception as e:
        logging.error(f"Error in processing relationship: {e}")
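
Each lesson's helper data already carries its parent topic's ID (`topic_lesson_topic_id`), so lessons can also be paired with topics in memory before any relationship is written. The sketch below shows that lookup with plain dictionaries standing in for the node objects (the notebook's nodes use attribute access instead). `pair_lessons_with_topics` and the sample data are hypothetical, and building the actual relationship is left to `TopicIncludesTopicLesson` as in the cells above.

```python
def pair_lessons_with_topics(topic_items, lesson_items):
    """Return (topic, lesson) pairs matched on topic ID, plus unmatched lesson IDs."""
    # Index topics by ID once so each lesson lookup is a dictionary access.
    topics_by_id = {item['node']['topic_id']: item['node'] for item in topic_items}
    pairs, unmatched = [], []
    for item in lesson_items:
        topic = topics_by_id.get(item['helper']['topic_lesson_topic_id'])
        if topic is None:
            unmatched.append(item['node']['topic_lesson_id'])
        else:
            pairs.append((topic, item['node']))
    return pairs, unmatched


topic_items = [{'node': {'topic_id': 'T1'}, 'helper': {}}]
lesson_items = [
    {'node': {'topic_lesson_id': 'L1'}, 'helper': {'topic_lesson_topic_id': 'T1'}},
    {'node': {'topic_lesson_id': 'L2'}, 'helper': {'topic_lesson_topic_id': 'T9'}},
]
pairs, unmatched = pair_lessons_with_topics(topic_items, lesson_items)
print(pairs)
print(unmatched)
```

Collecting the unmatched lesson IDs in one list makes gaps in the spreadsheet visible in a single log line rather than one error per lesson.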

To proceed with the next stage of your project, you need to define schemas for the relationships between your Neo4j nodes. Based on the nodes you have provided, it seems you're working with an educational or curricular data model. I'll suggest some potential relationships based on common educational structures, and then show you how to implement them in code.

### Suggested Relationship Schemas

1. **SchoolContainsSubject**: A relationship from `SchoolNode` to `SubjectNode`, indicating that a subject is taught in a school.
2. **SubjectIncludesKeyStage**: A relationship from `SubjectNode` to `KeyStageNode`, suggesting that a subject includes a particular key stage.
3. **KeyStageCoversTopic**: A relationship from `KeyStageNode` to `TopicNode`, indicating that a topic is covered in a certain key stage.
4. **TopicIncludesLesson**: A relationship from `TopicNode` to `TopicLessonNode`, showing that a lesson is part of a topic.
5. **LessonHasLearningStatement**: A relationship from `TopicLessonNode` to `LearningStatementNode`, indicating that a lesson includes a particular learning statement.
6. **LessonUtilizesScienceLab**: A relationship from `TopicLessonNode` to `ScienceLabNode`, suggesting that a lesson utilizes a certain science lab.

### Implementing Relationship Schemas

First, define the relationship classes in your `schemas.py` or a similar file:

```python
from neontology import BaseNode, BaseRelationship

class SchoolContainsSubject(BaseRelationship):
    source: SchoolNode
    target: SubjectNode

class SubjectIncludesKeyStage(BaseRelationship):
    source: SubjectNode
    target: KeyStageNode

class KeyStageCoversTopic(BaseRelationship):
    source: KeyStageNode
    target: TopicNode

class TopicIncludesLesson(BaseRelationship):
    source: TopicNode
    target: TopicLessonNode

class LessonHasLearningStatement(BaseRelationship):
    source: TopicLessonNode
    target: LearningStatementNode

class LessonUtilizesScienceLab(BaseRelationship):
    source: TopicLessonNode
    target: ScienceLabNode
```

### Writing Code to Run Them

You already have the `create_or_merge_neontology_relationship` function, which you can use to create or merge these relationships. Here's an example of how you might use it in your main notebook:

```python
# Example of creating a relationship between a school and a subject
school_node = SchoolNode(school_id="SCH123", school_name="Example School")  # ... plus any other required fields
subject_node = SubjectNode(subject_id="SUB456", subject="Mathematics")  # ... plus any other required fields

school_subject_rel = SchoolContainsSubject(source=school_node, target=subject_node)
neon.create_or_merge_neontology_relationship(school_subject_rel)

# Similarly, create other relationships
# ...

# Don't forget to handle exceptions and log appropriately
```

You will need to populate the nodes (`school_node`, `subject_node`, etc.) with actual data from your database or data source. The relationships should mirror the structure and connections of your educational model.

This setup allows for a flexible and expandable system that can be further customized or extended as your project grows. Remember to always test your code with a small subset of data before scaling up to your entire database to ensure everything works as expected.

In [None]:
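# neo4j_driver_tools is imported as `neo4j`; pass the node objects collected in the cells above
neo4j.create_nodes(neo4j_driver, [item['node'] for item in topic_excel_node_list])
neo4j.create_nodes(neo4j_driver, [item['node'] for item in topic_lesson_excel_node_list])
neo4j.create_nodes(neo4j_driver, [item['node'] for item in learning_statement_excel_node_list])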

In [None]:
# Now we create relationships between the nodes
relationship_list = []
topic_nodes = neo4j.find_nodes_by_label(neo4j_driver, 'Topic')
topic_lesson_nodes = neo4j.find_nodes_by_label(neo4j_driver, 'TopicLesson')
learning_statement_nodes = neo4j.find_nodes_by_label(neo4j_driver, 'LearningStatement')

logging.info(f"topic_nodes: {topic_nodes}")
logging.info(f"topic_lesson_nodes: {topic_lesson_nodes}")
logging.info(f"learning_statement_nodes: {learning_statement_nodes}")