## 📁 Project Setup and Structure

### Step 1: Project Template
- Start by executing the `template.py` file to create all the initial required files and folders.
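As a rough illustration of what such a template script might do, here is a minimal sketch. The file list and the `create_template` helper are assumptions made for this example, not the contents of the project's actual `template.py`.

```python
from pathlib import Path
from typing import Iterable, List

# Hypothetical layout for illustration; the real list lives in template.py.
TEMPLATE_FILES = (
    "src/__init__.py",
    "src/components/__init__.py",
    "src/components/data_ingestion.py",
    "requirements.txt",
    "setup.py",
    "pyproject.toml",
)


def create_template(root: str, files: Iterable[str] = TEMPLATE_FILES) -> List[Path]:
    """Create each file (and its parent folders) under root; skip existing files."""
    created = []
    for relative in files:
        path = Path(root) / relative
        # Make sure the folder exists before creating the empty file.
        path.parent.mkdir(parents=True, exist_ok=True)
        if not path.exists():
            path.touch()
            created.append(path)
    return created

# Example usage: create_template(".") creates the layout in the current folder.
```

Skipping files that already exist keeps the script safe to re-run after you have started adding code.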

### Step 2: Package Management
- Write the setup for importing local packages in `setup.py` and `pyproject.toml` files.
- **Tip**: Learn more about these files from `crashcourse.txt`.
- Install the required modules listed in `requirements.txt`:
    - `pip install -r requirements.txt`


## 📊 MongoDB Setup and Data Management

### Step 4: MongoDB Atlas Configuration
1. Sign up for [MongoDB Atlas](https://www.mongodb.com/cloud/atlas) and create a new project.
2. Set up a free M0 cluster and configure the username and password. Then go to "Network Access" and add the IP address `0.0.0.0/0` so that you can reach the cluster from any network.
3. Retrieve the MongoDB connection string for Python and save it (replace `<password>` with your password).
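If the password contains reserved characters such as `@` or `#`, it has to be percent-encoded before it goes into the connection string. The sketch below shows one way to do that with the standard library. The user name, password, and host are made-up placeholders, and `build_mongodb_url` is a hypothetical helper.

```python
from urllib.parse import quote_plus


def build_mongodb_url(username: str, password: str, host: str) -> str:
    """Return an SRV connection string with percent-encoded credentials."""
    return f"mongodb+srv://{quote_plus(username)}:{quote_plus(password)}@{host}/"


# Placeholder credentials and host, for illustration only.
url = build_mongodb_url("example_user", "p@ss#123", "cluster0.example.mongodb.net")
print(url)  # mongodb+srv://example_user:p%40ss%23123@cluster0.example.mongodb.net/
```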

### Step 5: Pushing Data to MongoDB
1. Create a folder named `notebook`, add the dataset, and create a notebook file `mongoDB_demo.ipynb`.
2. Use the notebook to push data to the MongoDB database.
3. Verify the data in MongoDB Atlas under Database > Browse Collections.
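The push step in the notebook could look roughly like the sketch below. It assumes the dataset is a CSV file and that `pymongo` is installed. The helper names, database name, and collection name are hypothetical, and the notebook itself may well use pandas instead of the `csv` module.

```python
import csv
from typing import Dict, List


def csv_to_records(csv_path: str) -> List[Dict[str, str]]:
    """Read a CSV file into a list of dicts, one per row (all values are strings)."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def push_records(records: List[Dict[str, str]], mongodb_url: str,
                 db_name: str, collection_name: str) -> int:
    """Insert the records into MongoDB and return how many were inserted."""
    # Imported here so the CSV helper works without pymongo installed.
    from pymongo import MongoClient

    client = MongoClient(mongodb_url)
    try:
        # insert_many raises an error on an empty list, so records must be non-empty.
        result = client[db_name][collection_name].insert_many(records)
        return len(result.inserted_ids)
    finally:
        client.close()

# Example usage (names are placeholders):
# push_records(csv_to_records("notebook/data.csv"), url, "example_db", "example_collection")
```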


## **Note**
### **-e .**
- `-e .` in `requirements.txt`: any directory that contains an `__init__.py` file is a package, and because these packages live on our local machine, we call them local packages.
- To import `src` from any other directory without import errors, these local packages must be installed in the environment.

- To install them in the venv, add `-e .` at the end of the `requirements.txt` file.
- The `setup.py` and `pyproject.toml` files work together to install local packages in the venv.
- After that, the `src` package can be imported from outside `src`, anywhere within this venv.
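One quick way to confirm that the editable install worked is to ask Python whether it can locate the package. The sketch below uses only the standard library. `is_package_importable` is a hypothetical helper name, and it is meant for top-level package names such as `src`.

```python
import importlib.util


def is_package_importable(name: str) -> bool:
    """Return True if a top-level package or module can be found by the import system."""
    return importlib.util.find_spec(name) is not None


# After `pip install -r requirements.txt` (with `-e .` in the file),
# is_package_importable("src") should return True inside the venv.
print(is_package_importable("json"))
```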


## **setup.py and pyproject.toml**

1. What is a pyproject.toml file?

TOML (Tom’s Obvious, Minimal Language) is a simple configuration file format, like JSON or YAML, designed to be easy to read and write.
`pyproject.toml`, written in TOML, is becoming the standard place for Python packaging metadata.

2. Why pyproject.toml is important:

> It was introduced with PEP 518 to modernize Python package building. Previously, everything was done using `setup.py`,
  but `pyproject.toml` now allows for more flexibility, better dependency management, and cleaner project configuration.
> It centralizes metadata about the project: project name, version, dependencies, authors, etc.
> It supports various build systems (like setuptools, poetry, etc.).

3. Explaining sections of pyproject.toml:

- `[project]`: Defines the basic project information (name, version, description, authors).
- `[tool.setuptools]`: Holds the setuptools-specific configuration used when building the project.
- `[tool.setuptools.dynamic]`: Links external files (like `requirements.txt`) so that dependencies are pulled in dynamically.

4. setup.py with the advent of pyproject.toml: Some tasks previously handled by setup.py (like metadata) are now managed 
   by pyproject.toml. However, setup.py can still be used, especially if you have complex build steps.

5. How do setup.py, pyproject.toml, and requirements.txt work together?

> `pyproject.toml`: It’s now the central place for project metadata. Instead of defining your dependencies and project 
  information in `setup.py`, you can define them in `pyproject.toml`.
  In this project, the entry `dependencies = {file = "requirements.txt"}` under `[tool.setuptools.dynamic]` links the `requirements.txt` 
  file to the TOML file, so when the project is built, the dependencies are read from `requirements.txt`.

> `setup.py`: While it’s still used for custom builds and configurations, most of the basic functionality (like metadata and dependencies) 
  is moving to `pyproject.toml`. You might still keep a minimal `setup.py` if you have custom build steps, but for many projects 
  it’s no longer necessary once `pyproject.toml` is in place.

> requirements.txt: It lists all project dependencies and their versions.

When you run pip install -r requirements.txt, it ensures that all dependencies are installed. The pyproject.toml file can reference 
it (as we did) so that package dependencies are automatically pulled from there.


# **Data Ingestion 📥**
- src.constant
    - Add code in the `constant` directory: define all constant variables here.
- src.configuration 
    - Add code to `mongo_db_connection.py` and define the function that establishes the connection to the MongoDB server.
    - `src.configuration.mongo_db_connection.py` reads the MongoDB URL with `os.getenv(MONGODB_URL_KEY)`, so the URL must be set as an environment variable in the venv:
        - set: `export MONGODB_URL="mongodb+srv://santosh4thmarch_db_user:santosh%40%239605@cluster0.5cjjjrf.mongodb.net/?appName=Cluster0"`
        - check: `echo $MONGODB_URL`
    - In the constants package, `MONGODB_URL_KEY = "MONGODB_URL"` must match the exported variable name, because `os.getenv(MONGODB_URL_KEY)` looks up the string `"MONGODB_URL"` that the `export` command sets.
    - The password contains `@#9605`, which must be URL-encoded as `%40%239605`. Exporting the raw form (`...:santosh@#9605@cluster0...`) is the real cause of the connection problem, so use the encoded URL shown in the `export` command.


- src.data_access
    - Add code in `proj1_data.py`. It uses `configuration.mongo_db_connection.py` to fetch data from MongoDB and convert it into a DataFrame.
- src.entity
    - Add code to `config_entity.py` up to the `DataIngestionConfig` class. It pulls values from the local `constant` package and stores the data ingestion variables. We use the `dataclass` decorator, which auto-generates essential methods such as `__init__`, so the variables only need to be declared as fields.
    - Add code to `Artifact_entity.py` up to the `DataIngestionArtifact` class. It returns the file paths of the output data files, such as the training file path.

- src.components
    - Add code in `data_ingestion.py`. This file uses the four pieces described above (constants, configuration, data access, and entity) and saves the train and test datasets at the end.

- src.pipeline
    - Add code in `training_pipeline.py`. It uses the modules above to fetch the MongoDB data and save it locally.
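The pieces listed above can be sketched together as follows. This is a simplified illustration under stated assumptions, not the project's code: the field names, default paths, split ratio, and the `get_mongodb_url` and `split_rows` helpers are all hypothetical. Only `MONGODB_URL_KEY = "MONGODB_URL"` and the class names come from the notes above.

```python
import os
import random
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# From the constants package: the NAME of the environment variable, not the URL itself.
MONGODB_URL_KEY = "MONGODB_URL"


def get_mongodb_url(env_key: str = MONGODB_URL_KEY) -> str:
    """Read the connection URL from the environment and fail early if it is missing."""
    url = os.getenv(env_key)
    if not url:
        raise EnvironmentError(f"Environment variable {env_key!r} is not set.")
    return url


@dataclass
class DataIngestionConfig:
    # Hypothetical fields and defaults; the real ones come from the constants package.
    training_file_path: str = os.path.join("artifact", "data_ingestion", "train.csv")
    testing_file_path: str = os.path.join("artifact", "data_ingestion", "test.csv")
    train_test_split_ratio: float = 0.25


@dataclass
class DataIngestionArtifact:
    # What the ingestion step hands to the next step: paths of the files it wrote.
    trained_file_path: str
    test_file_path: str


def split_rows(rows: Sequence, test_ratio: float, seed: int = 42) -> Tuple[List, List]:
    """Shuffle the rows reproducibly and return (train_rows, test_rows)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]
```

Because `@dataclass` generates `__init__`, `DataIngestionConfig()` can be created with no arguments and still carry every default.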
    







# **Data Validation, Data Transformation & Model Trainer**

## Data Validation
1. Complete the work on `src.utils.main_utils.py` (it has reusable functions that will be used in most MLOps projects) and `config.schema.yaml` (which stores the dataset's table schema details for data validation).
2. Follow the workflow below and add code in each of these:
    - constants : all variables and their values for data validation have already been added.
    - config_entity : create a `@dataclass` class that stores only the paths and other variable values related to data validation.
    - artifact_entity : define which values, and of what type, data validation returns at the end.
    - components : write the code in the `src.components.data_validation.py` file.
    - pipeline : add the data validation code to the `src.pipeline` training pipeline.
    - app.py/ demo.py
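As an illustration of a schema-based check, the sketch below compares a dataset's column names against a schema. The schema contents, the artifact fields, and `validate_columns` are hypothetical; in the project the schema would be loaded from `config.schema.yaml` rather than written inline.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence

# Hypothetical schema for illustration; the project reads it from config.schema.yaml.
SCHEMA: Dict[str, List[str]] = {"columns": ["id", "age", "target"]}


@dataclass
class DataValidationArtifact:
    validation_status: bool
    message: str


def validate_columns(actual_columns: Sequence[str],
                     schema: Dict[str, List[str]]) -> DataValidationArtifact:
    """Check that the dataset has exactly the columns the schema expects."""
    expected = list(schema["columns"])
    missing = [col for col in expected if col not in actual_columns]
    unexpected = [col for col in actual_columns if col not in expected]
    status = not missing and not unexpected
    if status:
        message = "All expected columns are present."
    else:
        message = f"Missing columns: {missing}; unexpected columns: {unexpected}"
    return DataValidationArtifact(validation_status=status, message=message)
```

Returning an artifact instead of raising lets the pipeline decide whether a failed validation should stop the run.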

## Data Transformation
### Data transformation workflow:
- constants : all variables and their values for data transformation have already been added.
- config_entity : create a `@dataclass` class that stores only the paths and other variable values related to data transformation.
- artifact_entity : define which values, and of what type, data transformation returns at the end.
- components : write the code in the `src.components.Transformation.py` file.
- pipeline : add the data transformation code to the `src.pipeline` training pipeline.
- app.py/ demo.py
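The same config, component, and artifact pattern can be sketched for this step. Everything below is an assumption made for illustration: the field names, the default path, and especially the min-max scaling, which only stands in for whatever transformation the project actually applies. The sketch also does not write any file.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class DataTransformationConfig:
    # Hypothetical output path; the real value comes from the constants package.
    transformed_train_file_path: str = "artifact/data_transformation/train.csv"


@dataclass
class DataTransformationArtifact:
    transformed_train_file_path: str
    row_count: int


def min_max_scale(values: Sequence[float]) -> List[float]:
    """Scale values to the range [0, 1]; a constant column maps to all zeros."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(value - low) / (high - low) for value in values]


class DataTransformation:
    def __init__(self, config: DataTransformationConfig, validation_status: bool):
        self.config = config
        self.validation_status = validation_status

    def initiate_data_transformation(
        self, values: Sequence[float]
    ) -> Tuple[List[float], DataTransformationArtifact]:
        # Refuse to transform data that did not pass validation.
        if not self.validation_status:
            raise ValueError("Data validation failed; transformation skipped.")
        scaled = min_max_scale(values)
        artifact = DataTransformationArtifact(
            transformed_train_file_path=self.config.transformed_train_file_path,
            row_count=len(scaled),
        )
        return scaled, artifact
```

In the pipeline, the validation step's result would be passed in as `validation_status`, so each stage consumes the artifact of the stage before it.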


