Basic Stats - 2

In [2]:
# Background
# In quality control, especially for high-value items, destructive sampling is a necessary but costly way to verify product quality. Because the test that determines whether an item meets the quality standards destroys the item, cost constraints force sample sizes to be small.

# Scenario
# A manufacturer of print-heads for personal computers wants to estimate the mean durability of its print-heads, measured as the number of characters printed before failure. Because the test is destructive, the manufacturer can study only a small sample of print-heads.

# Data
# A total of 15 print-heads were randomly selected and tested until failure. The durability of each print-head (in millions of characters) was recorded as follows:
# 1.13, 1.55, 1.43, 0.92, 1.25, 1.36, 1.32, 0.85, 1.07, 1.48, 1.20, 1.33, 1.18, 1.22, 1.29

# Assignment Tasks

# a. Build 99% Confidence Interval Using Sample Standard Deviation
# Assuming the sample is representative of the population, construct a 99% confidence interval for the mean number of characters printed before the print-head fails using the sample standard deviation. Explain the steps you take and the rationale behind using the t-distribution for this task.

# b. Build 99% Confidence Interval Using Known Population Standard Deviation
# If it were known that the population standard deviation is 0.2 million characters, construct a 99% confidence interval for the mean number of characters printed before failure.

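For reference, the two intervals requested in tasks (a) and (b) follow the standard textbook formulas below, written here as a sketch with $\alpha = 0.01$ and $n = 15$; the notation ($\bar{x}$ for the sample mean, $s$ for the sample standard deviation, $\sigma$ for the known population standard deviation) is introduced for this summary and is not in the original tasks.

```latex
\text{(a) } \sigma \text{ unknown:}\quad \bar{x} \pm t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}},
\qquad t_{0.005,\,14} \approx 2.977

\text{(b) } \sigma \text{ known:}\quad \bar{x} \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}},
\qquad z_{0.005} \approx 2.576
```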

In [3]:
import pandas as pd

In [4]:
BS2 = pd.DataFrame([1.13, 1.55, 1.43, 0.92, 1.25, 1.36, 1.32, 0.85, 1.07, 1.48, 1.20, 1.33, 1.18, 1.22, 1.29], columns=['Data'])
BS2.head()

   Data
0  1.13
1  1.55
2  1.43
3  0.92
4  1.25


In [5]:
# a. Build 99% Confidence Interval Using Sample Standard Deviation

In [6]:
import numpy as np
import scipy.stats as stats
# Step 1: Calculate the sample mean, sample standard deviation, and sample size
mean = BS2['Data'].mean()
print('mean: ',mean)
std_dev = BS2['Data'].std(ddof=1)  # Sample standard deviation (ddof=1)
print('stdev:',std_dev)
n = len(BS2)  # Sample size

# Step 2: Build the 99% t-interval (stats.t.interval finds the t-critical value internally)
confidence_level = 0.99
alpha = 1 - confidence_level
degrees_of_freedom = n - 1

confidence_interval = stats.t.interval(
    confidence_level,  # Confidence level (99%)
    degrees_of_freedom,  # Degrees of freedom (n-1)
    loc=mean,  # Sample mean
    scale=std_dev / np.sqrt(n)  # Standard error of the mean
)
confidence_interval

# Method-2: compute the same interval by hand
# # Step 2 (alternative): t-critical value for a two-sided 99% interval
# t_critical = stats.t.ppf(1 - alpha/2, degrees_of_freedom)
# # Step 3: Calculate the margin of error
# margin_of_error = t_critical * (std_dev / np.sqrt(n))
# # Step 4: Calculate the confidence interval
# lower_bound = mean - margin_of_error
# upper_bound = mean + margin_of_error
# print('confidence interval:\n', 'lower bound:', lower_bound, 'upper bound:', upper_bound)

mean:  1.2386666666666666
stdev: 0.19316412956959936


(1.0901973384384906, 1.3871359948948425)
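The t-interval assumes the durabilities come from an approximately normal population, which matters with only 15 observations. A minimal sketch of one common check, the Shapiro-Wilk test from `scipy.stats.shapiro`, is shown below; the choice of this test (rather than, say, a Q-Q plot) is an assumption of this sketch, not part of the original assignment.

```python
import numpy as np
import scipy.stats as stats

# Durability data (millions of characters) from the assignment
data = np.array([1.13, 1.55, 1.43, 0.92, 1.25, 1.36, 1.32, 0.85,
                 1.07, 1.48, 1.20, 1.33, 1.18, 1.22, 1.29])

# Shapiro-Wilk test: H0 = the sample comes from a normal distribution.
# A small p-value (e.g. below 0.05) would cast doubt on the normality assumption.
w_stat, p_value = stats.shapiro(data)
print(f"W = {w_stat:.3f}, p-value = {p_value:.3f}")
```

With so few points the test has limited power, so a non-significant result is weak support for normality rather than proof of it.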

In [7]:
# Explanation:
# A 99% confidence interval comes from a procedure that, over repeated samples, captures the true population mean 99% of the time.
# We used the t-distribution because the population standard deviation is unknown and has to be estimated by the sample standard deviation s,
# and the sample is small (n = 15). The t-distribution with n - 1 = 14 degrees of freedom has heavier tails than the normal distribution, so it
# widens the interval to account for the extra uncertainty in s. This assumes the durabilities are approximately normally distributed.
# Result: we are 99% confident that the true mean durability lies between about 1.090 and 1.387 million characters.
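The repeated-sampling meaning of "99% confident" can be illustrated with a small simulation. This is a sketch under assumed conditions: the "true" mean of 1.24 and standard deviation of 0.2 are hypothetical values chosen only for illustration, and samples are drawn from a normal population.

```python
import numpy as np
import scipy.stats as stats

rng = np.random.default_rng(seed=0)

# Hypothetical population parameters, chosen for illustration only
true_mean, true_sd = 1.24, 0.2
n, reps, confidence = 15, 5000, 0.99

# Draw `reps` independent samples of size n and build a t-interval for each
samples = rng.normal(true_mean, true_sd, size=(reps, n))
means = samples.mean(axis=1)
ses = samples.std(axis=1, ddof=1) / np.sqrt(n)          # standard errors
t_crit = stats.t.ppf(1 - (1 - confidence) / 2, df=n - 1)

# Fraction of intervals that contain the true mean
covered = (means - t_crit * ses <= true_mean) & (true_mean <= means + t_crit * ses)
coverage = covered.mean()
print(f"Empirical coverage: {coverage:.3f}")
```

The empirical coverage should land close to 0.99: any single interval either contains the mean or does not, and the 99% describes the long-run success rate of the procedure.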

In [8]:
# b. Build 99% Confidence Interval Using Known Population Standard Deviation

In [9]:
# Calculate the sample mean
mean = BS2['Data'].mean()

# Known population standard deviation
population_std_dev = 0.2  # Given

# Calculate the sample size
n = len(BS2)

# Calculate the 99% confidence interval using the built-in function
confidence_level = 0.99
confidence_interval = stats.norm.interval(confidence_level, loc=mean, scale=population_std_dev / np.sqrt(n))  # scale = standard error

confidence_interval

# Method-2
# # Determine the z-critical value for a 99% confidence interval
# confidence_level = 0.99
# z_critical = stats.norm.ppf(1 - (1 - confidence_level) / 2)  # Z-critical value for 99% confidence
# # Calculate the margin of error
# margin_of_error = z_critical * (population_std_dev / np.sqrt(n))
# # Calculate the confidence interval
# lower_bound = mean - margin_of_error
# upper_bound = mean + margin_of_error
# (lower_bound, upper_bound)

(1.1056514133957607, 1.3716819199375725)

In [10]:
# Explanation:
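# We are 99% confident that the true population mean number of characters printed before failure lies between about 1.106 and 1.372 million characters.
# This interval is narrower than the one in part (a) because the z-critical value (about 2.576) is smaller than the t-critical value with
# 14 degrees of freedom (about 2.977), which more than offsets sigma = 0.2 being slightly larger than the sample standard deviation (about 0.193).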
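A quick numeric comparison of the two intervals' critical values and widths, as a sketch that recomputes everything from the raw data rather than reusing the notebook's variables:

```python
import numpy as np
import scipy.stats as stats

# Durability data (millions of characters) from the assignment
data = np.array([1.13, 1.55, 1.43, 0.92, 1.25, 1.36, 1.32, 0.85,
                 1.07, 1.48, 1.20, 1.33, 1.18, 1.22, 1.29])
n = len(data)
s = data.std(ddof=1)   # sample SD, used in part (a)
sigma = 0.2            # known population SD, used in part (b)
confidence = 0.99
q = 1 - (1 - confidence) / 2   # upper quantile (0.995) for a two-sided interval

t_crit = stats.t.ppf(q, df=n - 1)
z_crit = stats.norm.ppf(q)

# Full width = 2 * critical value * standard error
width_t = 2 * t_crit * s / np.sqrt(n)
width_z = 2 * z_crit * sigma / np.sqrt(n)
print(f"t critical = {t_crit:.3f}, z critical = {z_crit:.3f}")  # t critical = 2.977, z critical = 2.576
print(f"width (a) = {width_t:.3f}, width (b) = {width_z:.3f}")   # width (a) = 0.297, width (b) = 0.266
```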