---
title:  Project Card
subtitle: Project Name
version: v0.2
card version: v0.1
author: Soma S Dhavala
date: 05-Aug-2025
objective: >
    The purpose of a Project Card is twofold. During development, it helps the developer think about the problem in a structured way: framing the problem, assessing its business value and viability, and considering many other aspects.

    It can also serve as a document that gives a high-level overview of the system developed and deployed. With proper versioning, one can also see how the problem evolved. It is meant to be a high-level document; as details emerge, documents such as Model Cards and Data Cards can be linked.
tag: >
    This notebook uses tags to render the output. Each cell has a tag. There are three tags: objective, instruction, and response.
    The cell with the objective tag explains the purpose of this project card. Cells with the instruction tag are the key sections of the document that must be filled in. The cell immediately following each one has the response tag. Fill in only the cells with the response tag. DO NOT MODIFY the cells with the instruction tag. Of course, feel free to adapt the template to your needs, but once the format is agreed upon, stick to it.
format:
    html:
        code-fold: true
---

The purpose of a Project Card is twofold. During development, it helps the developer think about the problem in a structured way: framing the problem, assessing its business value and viability, and considering many other aspects.

It can also serve as a document that gives a high-level overview of the system developed and deployed. With proper versioning, one can also see how the problem evolved. It is meant to be a high-level document; as details emerge, documents such as Model Cards and Data Cards can be linked.

The following are the different sections of the Project Cards.

This notebook uses tags to render the output. Each cell has a tag. There are three tags: objective, instruction, and response.
The cell with the objective tag explains the purpose of this project card. Cells with the instruction tag are the key sections of the document that must be filled in. The cell immediately following each one has the response tag. Fill in only the cells with the response tag. DO NOT MODIFY the cells with the instruction tag. Of course, feel free to adapt the template to your needs, but once the format is agreed upon, stick to it.

# Business View

## Background
_Provide succinct **background** to the problem so that the reader can empathize with the problem._

The sinking of the RMS Titanic on its maiden voyage in 1912 is one of the most infamous shipwrecks in history. Out of the 2,224 passengers and crew on board, only 712 survived the catastrophic event. While there was an element of luck involved, it seems that certain groups of people were more likely to survive than others.

## Problem
_**What** is the problem being solved?_

The key problem is to build a predictive model that can accurately determine which passengers were more likely to survive the Titanic disaster. By analyzing the available passenger data, such as name, age, gender, socio-economic class, etc., the goal is to identify the factors that influenced the likelihood of survival.

## Customer
_**Who** is it for? Is that a user or a beneficiary?_

The primary customers for this project are data science students, statisticians, and enthusiasts who want to dive deeper into this significant historical event. The insights gained from this analysis could also be useful to maritime safety organizations and historians studying the Titanic disaster.

## Value Proposition
_Why does it need to be solved?_

Solving this problem will provide valuable insights into the factors that affected passenger survival during the Titanic sinking. The predictive model developed can be used to educate people about the dynamics of the disaster and potentially inform future maritime safety protocols. Additionally, this project serves as an excellent learning opportunity for those new to data science and machine learning such as myself.

## Product
_What does the solution look like? This is more about the experience than about how it will be developed._

The solution should provide a predictive model that can accurately identify which passengers are likely to have survived the Titanic disaster based on the available passenger data. This model can be used to make binary predictions (survived or not survived) for the test set of passengers.

## Objectives
_Break down the product into key (business) objectives that need to be delivered._
[SMART Goals](https://med.stanford.edu/content/dam/sm/s-spire/documents/How-to-write-SMART-Goals-v2.pdf) is a useful format for framing them.

1. Accuracy: Develop a machine learning model that can predict passenger survival with an accuracy of at least 80%.
2. Interpretability: Ensure the model's predictions are interpretable, providing insights into the key factors that influenced survival.
3. Generalization: Ensure the model can generalize well to new, unseen passenger data, not just the training set.
4. Scalability: Design the solution in a way that it can be easily extended to handle larger passenger datasets or similar maritime disaster scenarios.

## Risks & Challenges
_What challenges might one face, and how can they be overcome?_

1. Data Quality: The passenger data may contain missing values, inconsistencies, or biases that could impact the model's performance.
2. Feature Engineering: Determining the most relevant features from the available passenger information and engineering them appropriately will be a key challenge.
3. Model Complexity: Balancing model complexity to achieve high accuracy while maintaining interpretability may require careful experimentation.
4. Overfitting: Ensuring the model does not overfit to the training data and can generalize to new, unseen passengers will be crucial.

# ML View

## Task
_What type of prediction problem is this? Link [Model Card](https://arxiv.org/abs/1810.03993) when sufficient details become available (start small but early)_

This is a binary classification problem, where the goal is to predict whether each passenger survived the Titanic disaster or not. A Model Card will be created to document the details of the model as the project progresses.

## Metrics
_How will the solution be evaluated - What are the ML metrics? What are the business metrics? Link [Model Card](https://arxiv.org/abs/1810.03993) when sufficient details become available (start small but early)_

1. Classification Accuracy: The percentage of passengers correctly predicted as survived or not survived.
2. F1-Score: The harmonic mean of precision and recall, to balance the model's ability to correctly identify both survived and not survived passengers.
3. Feature Importance: A quantification of how much each factor contributed to passenger survival.
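
The first two metrics above can be sketched in a few lines of plain Python. This is a minimal illustration, not the project's evaluation code: the function names are hypothetical, the labels are toy values (1 = survived, 0 = did not survive), and the F1-score is computed for the "survived" class only.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for binary labels."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    """Fraction of passengers whose label was predicted correctly."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    total = tp + fp + fn + tn
    return (tp + tn) / total if total else 0.0


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall, written as 2TP / (2TP + FP + FN)."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred, positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy labels for illustration only; not real Titanic data.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
print(accuracy(y_true, y_pred), f1_score(y_true, y_pred))
```

If both classes matter equally, one option is to call `f1_score` once per class (`positive=1` and `positive=0`) and average the two values.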

## Evaluation
_How will the solution be evaluated (process)? Link [Model Card](https://arxiv.org/abs/1810.03993) when sufficient details become available (start small but early)_

The predictive model will be evaluated using a held-out test set, representing a sample of the passenger data. The model's performance on this test set will be used to estimate its real-world effectiveness in predicting passenger survival.
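
One way to carve out such a held-out set is a stratified split, which keeps the survived/not-survived ratio roughly the same in both parts. The sketch below is an assumption about how the split could be done, not a description of the actual process; `stratified_split`, the 20% test fraction, and the seed are all illustrative choices.

```python
import random


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx), approximately preserving the label ratio."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    by_label = {}
    for i, y in enumerate(labels):
        by_label.setdefault(y, []).append(i)
    train_idx, test_idx = [], []
    for indices in by_label.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy labels: 40 survivors and 60 non-survivors (illustrative only).
labels = [1] * 40 + [0] * 60
train_idx, test_idx = stratified_split(labels, test_fraction=0.2, seed=42)
print(len(train_idx), len(test_idx))
```

The test indices should be set aside before any tuning so that the reported score is not influenced by model selection.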

## Data
_What type of data is needed? How will it be collected - for training and for continuous improvement? Link  [Data Cards](https://arxiv.org/abs/2204.01075) when sufficient details become available (start small but early)_

The primary data source will be the Titanic passenger manifest, which includes information such as passenger ticket class, gender, age, number of siblings/spouses, number of parents/children aboard, ticket number, fare, cabin number, port of embarkation and survival status. This data will be used to train and test the predictive model.
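
As a rough sketch, the fields listed above could be loaded into a typed record like the one below. The CSV column names (`Pclass`, `SibSp`, `Parch`, and so on) are assumed here, following the commonly distributed version of the dataset, and should be adjusted to the actual file; the `Passenger` class, the helper names, and the sample rows are made up for illustration.

```python
import csv
import io
from dataclasses import dataclass
from typing import Optional


@dataclass
class Passenger:
    ticket_class: int
    sex: str
    age: Optional[float]        # None when missing
    siblings_spouses: int
    parents_children: int
    ticket: str
    fare: Optional[float]
    cabin: Optional[str]
    embarked: Optional[str]
    survived: Optional[int]     # None for unlabeled rows


def _opt(value, cast):
    """Cast a CSV field, mapping empty strings to None."""
    value = value.strip()
    return cast(value) if value else None


def parse_passengers(text):
    """Parse CSV text (assumed column names) into Passenger records."""
    rows = csv.DictReader(io.StringIO(text))
    return [
        Passenger(
            ticket_class=int(r["Pclass"]),
            sex=r["Sex"],
            age=_opt(r["Age"], float),
            siblings_spouses=int(r["SibSp"]),
            parents_children=int(r["Parch"]),
            ticket=r["Ticket"],
            fare=_opt(r["Fare"], float),
            cabin=_opt(r["Cabin"], str),
            embarked=_opt(r["Embarked"], str),
            survived=_opt(r.get("Survived", ""), int),
        )
        for r in rows
    ]


# Made-up rows for illustration only; not taken from the real manifest.
SAMPLE = """Survived,Pclass,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,1,female,30,0,0,T-001,80.0,C10,S
0,3,male,,1,0,T-002,7.5,,Q
"""
passengers = parse_passengers(SAMPLE)
print(passengers[1])
```

Keeping missing values as `None` at load time makes the later imputation step explicit instead of hiding it in the parser.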

## Plan/ Roadmap
_Provide a problem breakdown, tentative timelines, and deliverables. Use [PACT](https://nesslabs.com/smart-goals-pact) format if SMART is not suitable._

1. Data Exploration and Preprocessing

    - Analyze the quality and completeness of the passenger data
    - Engineer relevant features from the available information
    - Handle missing values and address any data inconsistencies


2. Model Development

    - Evaluate various classification algorithms (e.g., logistic regression, decision trees, random forests)
    - Tune model hyperparameters and select the best-performing approach
    - Implement the model to generate passenger survival predictions


3. Model Evaluation and Refinement

    - Assess the model's performance on the held-out test set
    - Analyze the model's feature importance and interpret the key factors affecting survival
    - Experiment with the model architecture and feature engineering as needed to improve accuracy


4. Deployment and Dissemination

    - Document the project findings and insights in a clear and accessible format
    - Share the project and its results with the broader data science community and among relevant enthusiasts
    - Explore opportunities to use the model for maritime safety initiatives
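
The preprocessing items in step 1 of the roadmap above could look roughly like the following. This is a sketch under stated assumptions: median imputation and one-hot encoding are common choices but are not prescribed by this card, and `family_size` is a hypothetical engineered feature used only as an example.

```python
from statistics import median


def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    fill = median(observed)
    return [fill if v is None else v for v in values]


def one_hot(values, categories):
    """Encode each value as a 0/1 list over a fixed category order."""
    return [[1 if v == c else 0 for c in categories] for v in values]


def family_size(siblings_spouses, parents_children):
    """Hypothetical engineered feature: relatives aboard plus the passenger."""
    return siblings_spouses + parents_children + 1


# Toy values for illustration only.
ages = [22.0, None, 38.0, 4.0, None]
print(impute_median(ages))
print(one_hot(["S", "C", "Q"], categories=["C", "Q", "S"]))
print(family_size(1, 2))
```

To avoid leaking information from the held-out set, the fill value would normally be computed on the training rows only and then reused for the test rows.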

## Continuous Improvement
_How will the system/model improve? Provide a plan and means._

To ensure the model remains accurate and effective over time, the following processes will be implemented:

1. Ongoing Data Collection: Monitor for any new or updated Titanic passenger data that could be used to further refine the model.
2. Periodic Retraining: The model will be retrained at regular intervals to update with the latest data and maintain its predictive accuracy.
3. Performance Monitoring: Monitor the model's performance, tracking key metrics such as classification accuracy and F1-score. Any significant change or degradation in performance may indicate that a model update or rework is necessary.
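
The performance-monitoring item could be backed by a simple check like the one below. It is only a sketch: the function name `needs_rework`, the 0.05 tolerance, and the metric values are assumptions chosen for illustration, and what counts as "significant" degradation would have to be agreed upon for the project.

```python
def needs_rework(baseline, current, tolerance=0.05):
    """Return names of metrics that dropped more than `tolerance` below baseline."""
    return sorted(
        name
        for name, base in baseline.items()
        if base - current.get(name, 0.0) > tolerance  # a missing metric counts as 0
    )


# Illustrative numbers only; not measured results.
baseline = {"accuracy": 0.80, "f1": 0.75}
current = {"accuracy": 0.79, "f1": 0.62}
print(needs_rework(baseline, current))
```

An empty list means every tracked metric is within tolerance; any returned name is a candidate trigger for retraining or rework.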

## Resources
_What resources are needed? Estimate the cost!_

TBD. Not sure what to put here yet.


### Human Resources
_What type of team, and of what strength, is needed?_

The project will require the following team members:

- 1 Machine Learning Engineer: To lead the model development and deployment efforts
- 1 Data Engineer: To manage the data pipeline and prepare the passenger data for modeling
- 1 Domain Expert: To provide subject matter expertise on the Titanic disaster and its historical context
- 1 Technical Writer: To document the project findings and insights in a clear and accessible format

### Compute Resources
_What type of compute resources are needed to train and serve?_

TBD: This is the first project of this kind for the author. The dataset is not very large, but more compute could help if a complex model turns out to be necessary for the best predictions. Resources may include cloud-based GPU/CPU instances, sufficient memory, and serverless infrastructure.