# Estimation and Confidence Intervals

## Scenario
A manufacturer of print-heads for personal computers is interested in estimating the mean durability of their print-heads in terms of the number of characters printed before failure. To assess this, the manufacturer conducts a study on a small sample of print-heads due to the destructive nature of the testing process.

## Data
A total of 15 print-heads were randomly selected and tested until failure. The durability of each print-head (in millions of characters) was recorded as follows:  
1.13, 1.55, 1.43, 0.92, 1.25, 1.36, 1.32, 0.85, 1.07, 1.48, 1.20, 1.33, 1.18, 1.22, 1.29


In [1]:
# Import necessary libraries
import numpy as np  # For numerical operations
import scipy.stats as st  # For statistical functions

## (A) Build a 99% Confidence Interval Using the Sample Standard Deviation
Assuming the sample is representative of the population, construct a 99% confidence interval for the mean number of characters printed before the print-head fails using the sample standard deviation. Explain the steps you take and the rationale behind using the t-distribution for this task.


In [2]:
# Given sample data (durability of 15 print-heads in millions of characters)
data = np.array([1.13, 1.55, 1.43, 0.92, 1.25, 1.36, 1.32, 0.85, 1.07, 1.48, 1.20, 1.33, 1.18, 1.22, 1.29])

In [3]:
# Step 1: Determine the sample size (n)
n = len(data)  # Number of observations in the sample
print(n)

15


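Sample Size (n = 15)

This is a small sample (fewer than 30 observations). Because the population standard deviation is also unknown and must be estimated from these 15 values, we use the t-distribution instead of the z-distribution.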
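To see why the sample size matters, the sketch below (assuming SciPy is installed, as in the imports above) compares the 99% t critical value with the 99% z critical value for a few sample sizes. Only n = 15 comes from this study; the other sizes are arbitrary illustrations. The t value is always larger than z and shrinks toward it as n grows.

```python
import scipy.stats as st

# The 99% two-sided z critical value is the 0.995 quantile of the standard normal
z_crit = st.norm.ppf(0.995)

# Illustrative sample sizes (n = 15 matches this study); df = n - 1 for the t-distribution
t_crits = {n: st.t.ppf(0.995, df=n - 1) for n in (5, 15, 30, 100)}

for n, t_crit in t_crits.items():
    print(f"n = {n:>3}: t = {t_crit:.4f}  vs  z = {z_crit:.4f}")
```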

In [4]:
# Step 2: Compute the sample mean (x̄) - the average durability of print-heads
sample_mean = np.mean(data)
print(sample_mean)

1.2386666666666666


Sample Mean (1.2387 million characters)

This is the point estimate of the true population mean durability of the print-heads.
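As a quick check on `np.mean`, the sample mean is simply the sum of the observations divided by n. This plain-Python sketch reuses the 15 values from the data cell:

```python
# Durability of the 15 print-heads (millions of characters)
durability = [1.13, 1.55, 1.43, 0.92, 1.25, 1.36, 1.32, 0.85,
              1.07, 1.48, 1.20, 1.33, 1.18, 1.22, 1.29]

total = sum(durability)          # sum of all observations
x_bar = total / len(durability)  # sample mean = sum / n
print(f"sum = {total:.2f}, n = {len(durability)}, mean = {x_bar:.4f}")
# prints: sum = 18.58, n = 15, mean = 1.2387
```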

In [5]:
# Step 3: Compute the sample standard deviation (s)
# ddof=1 divides by n - 1 (Bessel's correction), making the sample variance an unbiased estimate of the population variance
sample_std = np.std(data, ddof=1)
print(sample_std)

0.19316412956959936


Sample Standard Deviation (0.1932 million characters)

This measures how much individual durability values typically deviate from the sample mean.
Since it is an estimate (not the true population standard deviation), we must use the t-distribution.
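The only difference between `ddof=1` and NumPy's default `ddof=0` is the divisor. A stdlib-only sketch makes this explicit, cross-checking against `statistics.stdev` and `statistics.pstdev`, which divide by n - 1 and n respectively:

```python
import math
import statistics

# Durability of the 15 print-heads (millions of characters)
durability = [1.13, 1.55, 1.43, 0.92, 1.25, 1.36, 1.32, 0.85,
              1.07, 1.48, 1.20, 1.33, 1.18, 1.22, 1.29]

n = len(durability)
x_bar = statistics.fmean(durability)
ss = sum((x - x_bar) ** 2 for x in durability)  # sum of squared deviations

s_sample = math.sqrt(ss / (n - 1))  # divide by n - 1 (ddof=1, Bessel's correction)
s_pop = math.sqrt(ss / n)           # divide by n (ddof=0, NumPy's default)

print(f"ddof=1: {s_sample:.4f}  (statistics.stdev:  {statistics.stdev(durability):.4f})")
print(f"ddof=0: {s_pop:.4f}  (statistics.pstdev: {statistics.pstdev(durability):.4f})")
```

Dividing by n - 1 always gives a slightly larger value, which compensates for measuring spread around the sample mean rather than the unknown true mean.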

In [6]:
# Step 4: Set the confidence level (99%)
confidence_level = 0.99

# Compute alpha (significance level)
alpha = 1 - confidence_level
print(alpha)

0.010000000000000009
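The trailing `...009` in the output above is ordinary binary floating-point rounding (0.99 has no exact binary representation), not a different significance level. A small sketch showing how alpha is split between the two tails of the interval, and how to compare it safely with 0.01:

```python
import math

confidence_level = 0.99
alpha = 1 - confidence_level     # prints as 0.010000000000000009 because of float rounding
tail_area = alpha / 2            # probability in each tail of a two-sided interval
upper_quantile = 1 - tail_area   # quantile passed to ppf to get the critical value

print(math.isclose(alpha, 0.01))  # True: equal up to rounding error
print(f"alpha = {alpha:.4f}, each tail = {tail_area:.4f}, quantile = {upper_quantile:.4f}")
```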


In [7]:
# Step 5: Determine the t-critical value (t_{α/2})
# This is the (1 - alpha/2) quantile of the t-distribution with df = n - 1, the same value a t-table lists for this confidence level
t_Critical = st.t.ppf(1 - alpha / 2, df=n-1)
print(t_Critical)

2.976842734370834


t Critical Value (2.9768)

From the t-table, the critical value for 99% confidence with $df = n - 1 = 14$ is 2.9768.
A higher confidence level results in a wider confidence interval (more conservative estimate).
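To illustrate this trade-off, the sketch below (SciPy assumed, with df = 14 as in this study) computes the t critical value at three common confidence levels. The critical value, and therefore the margin of error, grows with the confidence level.

```python
import scipy.stats as st

df = 14  # n - 1 for the 15 print-heads

# Two-sided critical value: the (1 - alpha/2) quantile, where alpha = 1 - level
crit_by_level = {level: st.t.ppf(1 - (1 - level) / 2, df=df) for level in (0.90, 0.95, 0.99)}

for level, t_crit in crit_by_level.items():
    print(f"{level:.0%} confidence: t = {t_crit:.4f}")
```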

In [8]:
# Step 6: Compute the Margin of Error (ME)
# ME = t_Critical * (sample_std / sqrt(n))
margin_error_t = t_Critical * (sample_std / np.sqrt(n))
print(margin_error_t)

0.14846932822817596


Margin of Error (0.1485 million characters)

This is the half-width of the confidence interval, reflecting the uncertainty in using the sample mean as an estimate.
At 99% confidence, the true population mean is estimated to lie within ±0.1485 million characters of the sample mean.
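The margin of error can also be written out as a worked equation, substituting the rounded values from the preceding cells:

```latex
ME = t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}}
   = 2.9768 \times \frac{0.1932}{\sqrt{15}}
   \approx 2.9768 \times 0.0499
   \approx 0.1485
```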

In [9]:
# Step 7: Compute the confidence interval
t_Confidence_Interval = (sample_mean - margin_error_t, sample_mean + margin_error_t)
print(t_Confidence_Interval)

(1.0901973384384906, 1.3871359948948425)


Confidence Interval (1.0902, 1.3871) million characters

We are 99% confident that the true mean durability of print-heads falls between 1.0902 and 1.3871 million characters.
The interval is fairly wide because it combines a high confidence level, a small sample, and the extra uncertainty of estimating σ from the data.
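"99% confident" describes the procedure rather than this single interval: if the study were repeated many times, about 99% of the intervals built this way would contain the true mean. A minimal simulation sketch checks this coverage empirically, assuming a hypothetical normal population with mean 1.24 and standard deviation 0.2 (values chosen only for illustration, not taken from the study):

```python
import numpy as np
import scipy.stats as st

rng = np.random.default_rng(seed=0)   # fixed seed for reproducibility
true_mean, true_sd = 1.24, 0.2        # hypothetical population parameters (assumption)
n, trials, level = 15, 20_000, 0.99

# Draw `trials` independent samples of size n from the hypothetical population
samples = rng.normal(true_mean, true_sd, size=(trials, n))
means = samples.mean(axis=1)
sds = samples.std(axis=1, ddof=1)

# Build a 99% t-interval from every simulated sample
t_crit = st.t.ppf(1 - (1 - level) / 2, df=n - 1)
margins = t_crit * sds / np.sqrt(n)
covered = (means - margins <= true_mean) & (true_mean <= means + margins)

coverage = covered.mean()
print(f"Empirical coverage: {coverage:.4f} (nominal {level:.2f})")
```

The empirical coverage should land close to 0.99; it will not match exactly because of simulation noise.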

In [10]:
# Step 8: Print results in a readable format
print(f"The Sample mean is: {round(sample_mean, 4)}")
print(f"The Sample Standard Deviation is: {round(sample_std, 4)}")
print(f"The T Critical Value is: {round(t_Critical, 4)}")
print(f"The Margin of Error is: {round(margin_error_t, 4)}")
print(f"The Confidence Interval when population's Standard Deviation is unknown (Using t-distribution): ({t_Confidence_Interval[0]:.4f}, {t_Confidence_Interval[1]:.4f})")

The Sample mean is: 1.2387
The Sample Standard Deviation is: 0.1932
The T Critical Value is: 2.9768
The Margin of Error is: 0.1485
The Confidence Interval when population's Standard Deviation is unknown (Using t-distribution): (1.0902, 1.3871)


### Explanation of the Steps Taken:
  * Collected sample data and calculated the sample mean and sample standard deviation.
  * Used t-distribution to find the t-critical value for a 99% confidence interval.
  * Computed the margin of error and constructed the confidence interval.
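The steps above can be packaged into one function. In this sketch, `t_confidence_interval` is a hypothetical helper name (not part of any library). Its result is cross-checked against SciPy's existing `st.t.interval`, with `st.sem` supplying the standard error s/√n. The confidence level is passed to `st.t.interval` positionally to avoid depending on its keyword name, which differs between SciPy versions.

```python
import numpy as np
import scipy.stats as st

def t_confidence_interval(data, level=0.99):
    """Two-sided t-interval for the mean when sigma is unknown (hypothetical helper)."""
    x = np.asarray(data, dtype=float)
    n = x.size
    mean = x.mean()
    s = x.std(ddof=1)                                  # sample standard deviation
    t_crit = st.t.ppf(1 - (1 - level) / 2, df=n - 1)   # critical value, df = n - 1
    margin = t_crit * s / np.sqrt(n)                   # margin of error
    return mean - margin, mean + margin

data = [1.13, 1.55, 1.43, 0.92, 1.25, 1.36, 1.32, 0.85,
        1.07, 1.48, 1.20, 1.33, 1.18, 1.22, 1.29]

manual = t_confidence_interval(data, 0.99)
# SciPy's built-in equivalent: loc = sample mean, scale = standard error s / sqrt(n)
builtin = st.t.interval(0.99, len(data) - 1, loc=np.mean(data), scale=st.sem(data))

print(f"manual:  ({manual[0]:.4f}, {manual[1]:.4f})")
print(f"builtin: ({builtin[0]:.4f}, {builtin[1]:.4f})")
```

Both lines should reproduce the (1.0902, 1.3871) interval computed step by step.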

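### Rationale for Using the t-Distribution:
  * The sample size ($n = 15$) is small.
  * The population standard deviation (σ) is unknown, so we must estimate it from the sample.
  * The t-distribution has heavier tails than the standard normal distribution, which widens the interval to account for the extra variability introduced by estimating σ with s; using z here would produce an interval that is too narrow.
  * The t-interval assumes the durability values come from an approximately normal population, an assumption that matters for a sample this small.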
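The effect of the heavier tails can be quantified. Assuming, as the t-interval does, a normal population, the sketch below computes the probability that a t statistic with 14 degrees of freedom falls outside ±2.5758, the 99% z critical value. Because that probability exceeds 1%, an interval built from s but with the z critical value would cover the true mean less often than the nominal 99%.

```python
import scipy.stats as st

df = 14
z_crit = st.norm.ppf(0.995)   # 99% z critical value (about 2.5758)
t_crit = st.t.ppf(0.995, df)  # 99% t critical value for df = 14 (about 2.9768)

# Two-sided tail probability beyond +/- z_crit under the t-distribution
tail_prob = 2 * st.t.sf(z_crit, df)
actual_coverage = 1 - tail_prob

print(f"P(|T| > {z_crit:.4f}) = {tail_prob:.4f}")
print(f"Coverage when using z with s: {actual_coverage:.4f} (nominal 0.99)")
```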

Thus, we are 99% confident that the true mean durability of all print-heads lies in (1.0902, 1.3871) million characters, in the sense that 99% of intervals constructed this way from repeated samples would contain the true mean.  
Because the sample is small and the population standard deviation is unknown, we use the t-distribution instead of the z-distribution so that the interval attains its stated 99% confidence level.

_________________________________________________________________________________________________________________________________________________________________________

In [11]:
import numpy as np  # Importing NumPy for numerical operations
import scipy.stats as st  # Importing SciPy stats for statistical functions

## (B) Build a 99% Confidence Interval Using the Known Population Standard Deviation
If it were known that the population standard deviation is 0.2 million characters, construct a 99% confidence interval for the mean number of characters printed before failure.


In [12]:
# Step 1: Define the sample data (millions of characters printed before failure)
Z_data = np.array([1.13, 1.55, 1.43, 0.92, 1.25, 1.36, 1.32, 0.85, 1.07, 1.48, 1.20, 1.33, 1.18, 1.22, 1.29])

In [13]:
# Step 2: Calculate the sample size (number of observations)
n = len(Z_data)  # Count of data points
n
# OP: 15 (There are 15 data points in the dataset)

15

In [14]:
# Step 3: Compute the sample mean (average number of characters printed before failure)
sample_mean_Z = np.mean(Z_data)  # Mean of the dataset
sample_mean_Z
# OP: 1.2387 (This is the estimated mean from the sample)

1.2386666666666666

Sample Mean (1.2387): This is the average durability in the sample, in millions of characters printed before failure.

In [15]:
# Step 4: Given Population Standard Deviation
Z_std_dev = 0.2  # This is known from the problem statement

Population Standard Deviation (0.2): This is given in the problem and represents the true variability in the population.

In [16]:
# Step 5: Define the Confidence Level
confidence_level_Z = 0.99  # 99% confidence level
alpha_Z = 1 - confidence_level_Z  # alpha = 1 - confidence level
alpha_Z
# OP: 0.01 (significance level: a 1% chance that an interval built this way misses the true mean)

0.010000000000000009

In [17]:
# Step 6: Compute the Z Critical Value
z_critical = st.norm.ppf(1 - alpha_Z / 2)  # Critical value from the standard normal table
z_critical
# OP: 2.576 (This is the z-score corresponding to a 99% confidence level)

2.5758293035489004

Z Critical Value (2.576): This is the z-score corresponding to a 99% confidence level.
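The same critical value can be obtained without SciPy: Python's standard library class `statistics.NormalDist` provides the inverse CDF of the standard normal distribution. A minimal sketch:

```python
from statistics import NormalDist

confidence_level = 0.99
alpha = 1 - confidence_level

# Inverse CDF (quantile) of the standard normal at 1 - alpha/2
z_critical = NormalDist().inv_cdf(1 - alpha / 2)
print(f"z critical value: {z_critical:.4f}")  # prints: z critical value: 2.5758
```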

In [18]:
# Step 7: Compute Margin of Error
margin_error_z = z_critical * (Z_std_dev / np.sqrt(n))
margin_error_z
# OP: 0.1330 (the margin of error: the half-width of the interval around the sample mean)

0.13301525327090588

Margin of Error (0.1330): At 99% confidence, the true mean is estimated to lie within ±0.1330 million characters of the sample mean.
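As in part (A), the margin of error can be written out as a worked equation, now with the known σ and the z critical value:

```latex
ME = z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}
   = 2.5758 \times \frac{0.2}{\sqrt{15}}
   \approx 2.5758 \times 0.05164
   \approx 0.1330
```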


In [19]:
# Step 8: Compute the Confidence Interval
z_Confidence_Interval = (sample_mean_Z - margin_error_z, sample_mean_Z + margin_error_z)  # CI Formula
z_Confidence_Interval
# OP: (1.1057, 1.3717) (This is the interval within which the true mean is expected to lie)

(1.1056514133957607, 1.3716819199375725)

Confidence Interval (1.1057, 1.3717): This means we are 99% confident that the true population mean lies between 1.1057 million and 1.3717 million characters before failure.
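Because the z-interval needs only the standard normal quantile, it can be sketched with the Python standard library alone. `z_confidence_interval` below is a hypothetical helper name used for illustration; it mirrors Steps 3 to 8 of this part.

```python
import math
from statistics import NormalDist, fmean

def z_confidence_interval(data, sigma, level=0.99):
    """Two-sided z-interval for the mean when the population sigma is known (hypothetical helper)."""
    n = len(data)
    mean = fmean(data)
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)  # standard normal quantile
    margin = z_crit * sigma / math.sqrt(n)              # margin of error
    return mean - margin, mean + margin

durability = [1.13, 1.55, 1.43, 0.92, 1.25, 1.36, 1.32, 0.85,
              1.07, 1.48, 1.20, 1.33, 1.18, 1.22, 1.29]

low, high = z_confidence_interval(durability, sigma=0.2, level=0.99)
print(f"99% z-interval: ({low:.4f}, {high:.4f})")  # prints: 99% z-interval: (1.1057, 1.3717)
```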

In [20]:
# Step 9: Print results in a readable format
print(f"The Sample mean is: {round(sample_mean_Z, 4)}")
print(f"The Population Standard Deviation is: {round(Z_std_dev, 4)}")
print(f"The Z Critical Value is: {round(z_critical, 4)}")
print(f"The Margin of Error is: {round(margin_error_z, 4)}")
print(f"The Confidence Interval when population's Standard Deviation is known (Using z-distribution): ({z_Confidence_Interval[0]:.4f}, {z_Confidence_Interval[1]:.4f})")

The Sample mean is: 1.2387
The Population Standard Deviation is: 0.2
The Z Critical Value is: 2.5758
The Margin of Error is: 0.133
The Confidence Interval when population's Standard Deviation is known (Using z-distribution): (1.1057, 1.3717)


## We use the z-distribution instead of the t-distribution because:

* Population Standard Deviation is Known:  
  * The problem states that the true standard deviation (σ) is 0.2. When the population standard deviation is known, we use the z-distribution.

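* Sample Size Consideration:
  * By the Central Limit Theorem (CLT), the sample mean is approximately normally distributed for large samples (n ≥ 30 is a common rule of thumb). Here n = 15 is small, so the z-interval relies on the durability values themselves being approximately normal; under that assumption, and with σ known, the z-distribution is the correct choice.

* Z-Distribution for More Precise Estimates:
  * Because σ is known, there is no extra uncertainty from estimating the standard deviation, so the critical value comes from the standard normal (2.5758) rather than from the t-distribution with 14 degrees of freedom (2.9768), which gives a narrower interval. The t-distribution is used when σ is unknown and must be estimated by the sample standard deviation s, which introduces additional variability.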

Thus, we use the z-distribution here because the population standard deviation is given; in that case, with an approximately normal population, the normal (z) distribution yields an interval with the stated 99% confidence level.
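Placing parts (A) and (B) side by side makes the effect of knowing σ concrete. The sketch below recomputes both intervals from the same data. Note that the widths differ for two reasons: 2.5758 is smaller than 2.9768, and σ = 0.2 differs slightly from s ≈ 0.1932.

```python
import numpy as np
import scipy.stats as st

data = np.array([1.13, 1.55, 1.43, 0.92, 1.25, 1.36, 1.32, 0.85,
                 1.07, 1.48, 1.20, 1.33, 1.18, 1.22, 1.29])
n = data.size
mean = data.mean()
level = 0.99
q = 1 - (1 - level) / 2  # upper quantile for a two-sided interval

# (A) sigma unknown: sample SD with the t-distribution (df = n - 1)
s = data.std(ddof=1)
me_t = st.t.ppf(q, df=n - 1) * s / np.sqrt(n)

# (B) sigma known: population SD of 0.2 with the standard normal
sigma = 0.2
me_z = st.norm.ppf(q) * sigma / np.sqrt(n)

print(f"t-interval: ({mean - me_t:.4f}, {mean + me_t:.4f}), width {2 * me_t:.4f}")
print(f"z-interval: ({mean - me_z:.4f}, {mean + me_z:.4f}), width {2 * me_z:.4f}")
```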

__________________________________________________________________________________________________________