Assignment repository for building custom Python ETL data connectors (Kyureeus EdTech, SSN CSE). Students: Submit your ETL scripts here. Make sure your commit message includes your name and roll number.
Welcome to the official repository for submitting your Software Architecture assignment on building custom data connectors (ETL pipelines) in Python. This assignment is part of the Kyureeus EdTech program for SSN CSE students.
Guideline: Building and Managing Custom Data Connectors (ETL Pipeline) in Python

1. Setting Up the Connector Environment
   a. Choose Your API Provider: identify a data provider and understand its base URL, endpoints, and authentication.
   b. Understand the API Documentation: focus on headers, query parameters, pagination, rate limits, and response structure.
2. Secure API Authentication Using Environment Variables
   a. Create a `.env` File Locally: store API keys and secrets as `KEY=VALUE` pairs.
   b. Load Environment Variables in Code: use a library such as `dotenv` to load environment variables securely.
3. Design the ETL Pipeline
   - Extract: connect to the API, pass tokens/headers, and collect JSON data.
   - Transform: clean or reformat the data for MongoDB compatibility.
   - Load: store the transformed data in a MongoDB collection.
4. MongoDB Collection Strategy
   - Use one collection per connector, e.g., `connector_name_raw`.
   - Store ingestion timestamps to support audits and updates.
5. Iterative Testing & Validation
   - Test for invalid responses, empty payloads, rate limits, and connectivity errors.
   - Ensure consistent insertion into MongoDB.
6. Git and Project Structure Guidelines
   a. Use a Central Git Repository: clone the shared repo and create a new branch for your connector.
   b. Ignore Secrets: add `.env` to `.gitignore` before the first commit.
   c. Push and Document: write a `README.md` with endpoint details, API usage, and example output.
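In a real connector you would simply call `load_dotenv()` from the `python-dotenv` package; the stdlib sketch below shows roughly what that call does, so you can see why secrets never need to appear in your source code. The file contents and variable names here are illustrative only.

```python
import os
import tempfile

def load_env_file(path):
    """Minimal sketch of what python-dotenv's load_dotenv() does:
    read KEY=VALUE lines from a .env file into os.environ."""
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            # Skip blank lines and comments.
            if not line or line.startswith("#"):
                continue
            key, _, value = line.partition("=")
            # setdefault: real environment variables win over the file.
            os.environ.setdefault(key.strip(), value.strip())

# Demo with a throwaway .env file (illustrative keys, not real secrets).
with tempfile.NamedTemporaryFile("w", suffix=".env", delete=False) as fh:
    fh.write("API_KEY=abc123\nAPI_SECRET=s3cret\n")
    env_path = fh.name

load_env_file(env_path)
print(os.environ["API_KEY"])  # abc123
```

Your script then reads `os.environ["API_KEY"]` (or `os.getenv(...)`) instead of hard-coding credentials.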
Final Checklist for Students
- Understand API documentation
- Secure credentials in `.env`
- Build complete ETL script
- Validate MongoDB inserts
- Push code to your branch
- Include descriptive README
- Submit Pull Request
Goal:
Develop a Python script to connect with an API provider, extract data, transform it for compatibility, and load it into a MongoDB collection. Follow secure coding and project structure practices as outlined below.
- Choose a data provider (API) and understand its documentation
- Secure all API credentials using a `.env` file
- Build a complete ETL pipeline: Extract → Transform → Load (into MongoDB)
- Test and validate your pipeline (handle errors, invalid data, rate limits, etc.)
- Follow the provided Git project structure
- Write a clear and descriptive `README.md` in your folder with API details and usage instructions
- Include your name and roll number in your commit messages
- Push your code to your branch and submit a Pull Request
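The Extract → Transform → Load steps above can be sketched as three small functions. In this sketch, `extract` and `load` are stubs (a real connector would call the API with `requests` and write with `pymongo`, as the comments note); only `transform` is fully worked, since that logic is pure Python. All record fields shown are made up for illustration.

```python
from datetime import datetime, timezone

def extract():
    """Stub for the Extract step. A real connector would call the API
    with requests, passing auth headers and handling pagination."""
    return [{"id": 1, "Name": "Alice "}, {"id": 2, "Name": "Bob"}]

def transform(records):
    """Transform step: normalize keys, trim strings, and stamp each
    document with an ingestion timestamp for MongoDB."""
    out = []
    for rec in records:
        doc = {k.lower(): (v.strip() if isinstance(v, str) else v)
               for k, v in rec.items()}
        doc["ingested_at"] = datetime.now(timezone.utc).isoformat()
        out.append(doc)
    return out

def load(docs):
    """Stub for the Load step. With pymongo this would be roughly:
    MongoClient(uri)["etl"]["connector_name_raw"].insert_many(docs)"""
    print(f"would insert {len(docs)} docs")

docs = transform(extract())
load(docs)  # would insert 2 docs
```

Keeping `transform` free of network and database calls also makes it easy to unit-test before you wire up MongoDB.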
```
/your-branch-name/
├── etl_connector.py
├── .env
├── requirements.txt
├── README.md
└── (any additional scripts or configs)
```
- `.env`: store sensitive credentials; do not commit this file.
- `etl_connector.py`: your main ETL script.
- `requirements.txt`: list all Python dependencies.
- `README.md`: instructions for your connector.
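A typical `requirements.txt` for this assignment might look like the fragment below, assuming your connector uses `requests` for the API, `python-dotenv` for credentials, and `pymongo` for MongoDB (adjust, and pin versions, to match your actual imports):

```text
requests
python-dotenv
pymongo
```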
- Store all API keys/secrets in a local `.env` file.
- Load credentials using the `dotenv` Python library.
- Add `.env` to `.gitignore` before committing.
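For reference, a `.env` file is just plain `KEY=VALUE` lines. The keys below are placeholders; use whatever names your connector's code expects:

```ini
# .env — never commit this file (add ".env" to .gitignore first)
API_KEY=your_api_key_here
API_BASE_URL=https://api.example.com
MONGO_URI=mongodb://localhost:27017
```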
- Use one MongoDB collection per connector (e.g., `connector_name_raw`).
- Store ingestion timestamps for audit and update purposes.
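One way to apply this strategy is a pair of small helpers: one that derives the raw collection name from the connector name, and one that attaches audit fields before insertion. The `"shodan"` connector name and `ip` field are purely illustrative.

```python
from datetime import datetime, timezone

def collection_name(connector):
    """One raw collection per connector, e.g. 'shodan' -> 'shodan_raw'."""
    return f"{connector}_raw"

def with_ingestion_metadata(doc, connector):
    """Attach audit fields before insertion; a later run can use
    ingested_at to find stale records or drive incremental updates."""
    return {**doc,
            "connector": connector,
            "ingested_at": datetime.now(timezone.utc).isoformat()}

doc = with_ingestion_metadata({"ip": "1.2.3.4"}, "shodan")
print(collection_name("shodan"))  # shodan_raw
```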
- Check for invalid responses, empty payloads, rate limits, and connectivity issues.
- Ensure data is correctly inserted into MongoDB.
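Rate limits and flaky connectivity are easiest to handle if the retry logic is separated from the HTTP call itself. The sketch below shows one possible shape: a wrapper that retries a fetch callable with exponential backoff and rejects empty payloads. With `requests`, the `fetch` callable would wrap `session.get(...)` and raise `TransientAPIError` (a name invented here) on HTTP 429 or 5xx responses; the demo uses a fake fetcher so the logic is testable offline.

```python
import time

class TransientAPIError(Exception):
    """Raised for responses worth retrying (e.g. HTTP 429 or 5xx)."""

def fetch_with_retries(fetch, max_attempts=3, base_delay=0.01):
    """Call fetch(); on a transient error, back off exponentially and
    retry up to max_attempts times. Reject empty payloads."""
    for attempt in range(1, max_attempts + 1):
        try:
            payload = fetch()
        except TransientAPIError:
            if attempt == max_attempts:
                raise  # out of attempts, surface the error
            time.sleep(base_delay * 2 ** (attempt - 1))
            continue
        # Validate the payload before handing it to the Load step.
        if not payload:
            raise ValueError("empty payload from API")
        return payload

# Fake fetcher: fails twice with a rate-limit error, then succeeds.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientAPIError("HTTP 429")
    return [{"id": 1}]

result = fetch_with_retries(flaky)
print(result, calls["n"])  # [{'id': 1}] 3
```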
- Clone the repository and create your own branch.
- Add your code and documentation in your folder/branch.
- Do not commit your `.env` or secrets.
- Write clear commit messages (include your name and roll number).
- Submit a Pull Request when done.
- Post your queries in the KYUREEUS/SSN College WhatsApp group.
- Discuss issues, share progress, and help each other.
Happy coding!