**Question:**

In a research experiment, you are tasked with analyzing the final marks of the students who took the "Introduction to Statistical Computing" course last year. 

Based on the given sample data, calculate the 95% confidence interval for the true mean final mark of all students in the course.

**Sample data: [8.4, 7.2, 9.0, 7.8, 9.5, 6.8, 8.9, 8.1, 9.2, 7.5, 8.8, 7.9, 9.6, 7.0, 8.5]**

Please write the code to calculate the confidence interval and include any necessary calculations and assumptions. Write the steps you have followed to solve the problem.

----

To calculate the 95% confidence interval for the true mean final mark of all students in the "Introduction to Statistical Computing" course, we use the t-distribution because:
- The sample size is small (n = 15).
- The population standard deviation is unknown.

This assumes the marks are a random sample from an approximately normal population.

Steps to calculate the confidence interval:
- Calculate the sample mean $\bar{x}$ of the final marks.
- Calculate the sample standard deviation $s$ of the final marks.
- Determine the sample size $n$.
- Find the t-value $t$ corresponding to a 95% confidence level and $n - 1$ degrees of freedom.
- Calculate the margin of error using the formula $ME = t \cdot s / \sqrt{n}$.
- Calculate the lower bound of the confidence interval: $\bar{x} - ME$.
- Calculate the upper bound of the confidence interval: $\bar{x} + ME$.
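
As a rough worked check of the steps above, taking the margin of error as $t \cdot s / \sqrt{n}$ and rounding intermediate values (so the last digits may differ slightly from a full-precision computation):

$$
\bar{x} = \frac{124.2}{15} = 8.28, \qquad s = \sqrt{\frac{11.324}{14}} \approx 0.899, \qquad t_{0.975,\,14} \approx 2.145
$$

$$
ME \approx 2.145 \cdot \frac{0.899}{\sqrt{15}} \approx 0.498, \qquad \bar{x} \pm ME \approx (7.78,\ 8.78)
$$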

```python
import numpy as np
from scipy import stats

# Sample data
marks = [8.4, 7.2, 9.0, 7.8, 9.5, 6.8, 8.9, 8.1, 9.2, 7.5, 8.8, 7.9, 9.6, 7.0, 8.5]

# Step 1: Calculate the sample mean
sample_mean = np.mean(marks)

# Step 2: Calculate the sample standard deviation
sample_std = np.std(marks, ddof=1)  # ddof=1 for sample standard deviation

# Step 3: Determine the sample size
n = len(marks)

# Step 4: Find the t-value for a 95% confidence level and n-1 degrees of freedom
t_value = stats.t.ppf(0.975, df=n - 1)  # 0.975 for two-tailed test

# Step 5: Calculate the margin of error
margin_of_error = t_value * (sample_std / np.sqrt(n))

# Step 6: Calculate the lower bound of the confidence interval
lower_bound = sample_mean - margin_of_error

# Step 7: Calculate the upper bound of the confidence interval
upper_bound = sample_mean + margin_of_error

# Print the confidence interval
print("95% Confidence Interval for the True Mean Final Mark:")
print("Lower Bound:", lower_bound)
print("Upper Bound:", upper_bound)
```

```
95% Confidence Interval for the True Mean Final Mark:
Lower Bound: 7.78194834403112
Upper Bound: 8.778051655968879
```
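
As a cross-check, the same interval can likely be obtained more compactly with SciPy's built-in helpers. The sketch below is an alternative to the step-by-step calculation, not a replacement for it; it passes the confidence level and degrees of freedom positionally to `stats.t.interval` and uses `stats.sem`, which computes $s / \sqrt{n}$ with `ddof=1` by default.

```python
import numpy as np
from scipy import stats

# Same sample data as in the question
marks = [8.4, 7.2, 9.0, 7.8, 9.5, 6.8, 8.9, 8.1, 9.2, 7.5, 8.8, 7.9, 9.6, 7.0, 8.5]

n = len(marks)
sample_mean = np.mean(marks)
standard_error = stats.sem(marks)  # s / sqrt(n), with ddof=1 by default

# 95% interval from the t-distribution with n - 1 degrees of freedom,
# centred on the sample mean and scaled by the standard error
lower_bound, upper_bound = stats.t.interval(
    0.95, n - 1, loc=sample_mean, scale=standard_error
)

print("Lower Bound:", lower_bound)
print("Upper Bound:", upper_bound)
```

Both bounds should agree with the manual calculation up to floating-point rounding.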


----

# Question 2

You are a new junior data statistics operational blockchain engineer in a marketing research firm, and your supervisor has handed you the results of an experiment comparing the effectiveness of two different advertising campaigns. 

* The sample data shows that **Campaign A** had an **average conversion rate of 12%** and a **standard deviation of 2%**, while **Campaign B** had an **average conversion rate of 14% and a standard deviation of 3%**. 

* Your supervisor wants you to decide which campaign is more effective. How might you proceed?

Provide more than one approach to answer this experiment and discuss the advantages and limitations of each approach.

### Approach 1: Hypothesis Testing
**Procedure**: Perform a two-sample t-test to compare the mean conversion rates of Campaign A and Campaign B.
- **Null Hypothesis (H0)**: There is no difference in conversion rates between the two campaigns ($\mu_A = \mu_B$).
- **Alternative Hypothesis (H1)**: There is a difference in conversion rates ($\mu_A \neq \mu_B$).

**Advantages**:
- Provides a clear statistical basis to reject, or fail to reject, the null hypothesis.
- Yields a p-value: the probability of observing a difference at least this large if there truly were no difference.

**Limitations**:
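- Assumes that the data (or at least the sample means) follow an approximately normal distribution.
- Sensitive to outliers and to sample size; the sample sizes are not given in the problem and are needed to compute the test statistic.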
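
A minimal sketch of this approach, working from the summary statistics alone. The sample sizes `n_a` and `n_b` are hypothetical (the problem does not state them), and Welch's unequal-variance version of the test is an assumption chosen because the two standard deviations differ.

```python
from scipy import stats

# Summary statistics from the problem, expressed as fractions
mean_a, std_a = 0.12, 0.02
mean_b, std_b = 0.14, 0.03

# ASSUMPTION: sample sizes are not given in the problem; 30 is a placeholder
n_a = n_b = 30

# Welch's two-sample t-test computed directly from summary statistics
t_stat, p_value = stats.ttest_ind_from_stats(
    mean1=mean_a, std1=std_a, nobs1=n_a,
    mean2=mean_b, std2=std_b, nobs2=n_b,
    equal_var=False,
)

print("t statistic:", t_stat)
print("p-value:", p_value)
```

With different sample sizes the conclusion may change, so the real values should be substituted before drawing any conclusion.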

### Approach 2: Confidence Interval Comparison
**Procedure**: Construct confidence intervals (CIs) for the mean conversion rates of both campaigns and compare them.
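- If the CI for Campaign A does not overlap with the CI for Campaign B, it suggests a statistically significant difference in conversion rates. The converse does not hold: overlapping intervals can still correspond to a significant difference, so a CI for the difference of the two means is the more reliable check.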

**Advantages**:
- Intuitive and visually easy to understand.
- Provides a range of plausible values for the mean conversion rates.

**Limitations**:
- Confidence level choice (commonly 95%) can affect the width of the interval, thus impacting the interpretation.
- Does not provide a specific p-value or a measure of the strength of evidence.
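
A minimal sketch of the interval comparison, again assuming hypothetical sample sizes of 30 per campaign (not given in the problem). The helper `mean_ci` is an illustrative name, not part of any library.

```python
import math
from scipy import stats

def mean_ci(mean, std, n, confidence=0.95):
    """t-based confidence interval for a mean from summary statistics."""
    t_value = stats.t.ppf(1 - (1 - confidence) / 2, df=n - 1)
    margin = t_value * std / math.sqrt(n)
    return mean - margin, mean + margin

# ASSUMPTION: sample sizes are not given in the problem; 30 is a placeholder
n_a = n_b = 30

ci_a = mean_ci(0.12, 0.02, n_a)  # Campaign A
ci_b = mean_ci(0.14, 0.03, n_b)  # Campaign B

# The intervals overlap unless one ends before the other begins
intervals_overlap = not (ci_a[1] < ci_b[0] or ci_b[1] < ci_a[0])

print("Campaign A 95% CI:", ci_a)
print("Campaign B 95% CI:", ci_b)
print("Intervals overlap:", intervals_overlap)
```

Smaller samples widen both intervals, so the overlap result depends directly on the assumed sample sizes.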

### Approach 3: Bayesian Analysis
**Procedure**: Use a Bayesian framework to estimate the posterior distributions of the conversion rates for both campaigns.
- Compare the posterior distributions to see which campaign is more likely to have a higher conversion rate.

**Advantages**:
- Incorporates prior knowledge into the analysis, if available.
- Provides a probability of one campaign being better than the other, rather than just testing for difference.

**Limitations**:
- Requires selection of prior distributions, which can influence results.
- Computationally more intensive and requires specialized software.
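
A rough sketch of the Bayesian comparison under strong simplifying assumptions: flat priors, a normal approximation to each posterior of the mean (centred on the sample mean with standard deviation $s / \sqrt{n}$), and hypothetical sample sizes of 30. A fuller analysis would model the raw conversion data, for example with a Beta-Binomial model, which the summary statistics alone do not allow.

```python
import numpy as np

# Summary statistics from the problem, expressed as fractions
mean_a, std_a = 0.12, 0.02
mean_b, std_b = 0.14, 0.03

# ASSUMPTION: sample sizes are not given in the problem; 30 is a placeholder
n_a = n_b = 30

rng = np.random.default_rng(seed=0)
draws = 100_000

# ASSUMPTION: approximate posterior of each mean as Normal(mean, std / sqrt(n))
posterior_a = rng.normal(mean_a, std_a / np.sqrt(n_a), size=draws)
posterior_b = rng.normal(mean_b, std_b / np.sqrt(n_b), size=draws)

# Monte Carlo estimate of P(mean_B > mean_A | data)
prob_b_better = np.mean(posterior_b > posterior_a)

print("Estimated P(B > A):", prob_b_better)
```

The output is a direct probability statement about which campaign has the higher mean conversion rate, which is the main practical appeal of this approach.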

----

## **Question 3**

Congratulations! You are now the Head Data Scientist of Invented Company S.L. You are now in charge of improving the performance of internal data analysis tools. The CEO has stated that it is extremely important to focus on that task, although your subordinates think that there are more urgent tasks to do. Which steps do you take and why? You can make any reasonable assumption to support your ideas.

As the newly appointed Head Data Scientist at Invented Company S.L., I need a strategic approach to balance the CEO's directive to improve the performance of internal data analysis tools against the priorities the team perceives as more urgent. Here are the steps I would take:


### Step 1: Assess Current Capabilities and Identify Gaps
**Action**: Conduct a thorough review of the existing data analysis tools and workflows. This includes understanding the technology stack, the efficiency of current processes, and any bottlenecks or pain points experienced by the team.

**Why**: This step ensures that any improvements are data-driven and targeted, addressing real issues rather than perceived deficiencies.

### Step 2: Engage the Team in Open Dialogue
**Action**: Hold discussions with the team to understand their concerns and the tasks they consider urgent. This could be structured as a series of one-on-one meetings or group brainstorming sessions.

**Why**: By involving the team in decision-making, you can gain valuable insights into unaddressed issues and foster a sense of collective ownership over the project's direction.

### Step 3: Prioritize Tasks with a Balanced Approach
**Action**: Combine insights from the initial assessment and team feedback to prioritize tasks. Create a roadmap that balances quick wins (to demonstrate progress) with long-term strategic improvements.

**Why**: Addressing both immediate and long-term needs can help maintain momentum and ensure sustained impact.

### Step 4: Implement a Pilot Project
**Action**: Select a critical part of the data analysis process identified during the assessment and start with a pilot project to improve it. This could involve integrating new tools, optimizing existing algorithms, or streamlining data workflows.

**Why**: A pilot project serves as a test case to demonstrate the potential benefits of broader changes and helps refine approaches based on practical outcomes.

### Step 5: Measure Outcomes and Gather Feedback
**Action**: After implementing changes, measure the outcomes against predefined metrics (e.g., processing time, user satisfaction, error rates). Gather feedback from the team on the impact of these changes.

**Why**: This step is crucial for validating the effectiveness of the improvements and ensuring they genuinely enhance performance.
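
As one illustration of measuring processing time against a baseline, the sketch below times two versions of a job and reports the relative change. The functions `baseline_job` and `pilot_job` are hypothetical stand-ins for a real analysis step before and after the pilot, and `median_runtime` is an illustrative helper name.

```python
import statistics
import time

def median_runtime(func, repeats=5):
    """Return the median wall-clock time (seconds) of calling func()."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)

# HYPOTHETICAL stand-ins for an analysis step before and after the pilot
def baseline_job():
    return sum([i * i for i in range(200_000)])

def pilot_job():
    return sum(i * i for i in range(200_000))

baseline = median_runtime(baseline_job)
pilot = median_runtime(pilot_job)

# Negative values mean the pilot version ran faster than the baseline
change_pct = 100 * (pilot - baseline) / baseline

print(f"Baseline: {baseline:.4f} s, pilot: {pilot:.4f} s")
print(f"Change in processing time: {change_pct:+.1f}%")
```

Using the median over several repeats reduces the influence of one-off slow runs; the same pattern extends to other predefined metrics such as error rates.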

### Step 6: Scale and Integrate
**Action**: Based on the success of the pilot, begin broader integration of successful strategies into other areas of the data analysis process.

**Why**: Scaling successful initiatives ensures that improvements have a company-wide impact, optimizing overall performance.

### Step 7: Continuous Improvement and Training
**Action**: Establish a cycle of continuous improvement that includes regular updates to tools, training for team members, and periodic reviews of tool effectiveness.

**Why**: Data science is a rapidly evolving field, and ongoing training and updates ensure that the company remains at the cutting edge and team skills stay relevant.

By taking these steps, I would aim to align the team's operational priorities with strategic business goals, thereby enhancing the performance of data analysis tools while also addressing the team's concerns about urgent tasks. This balanced approach should lead to sustained improvements and better alignment within the team.
