# Statistical Analysis: T-Test
Hypothesis / Assumption: Most companies replace departures with new hires, so small differences between the two in any time period are part of normal turnover and not meaningful. Large differences, however, can indicate a company's ability to attract and retain talent. We identify the companies for which the difference between departures and hires is real by using tests of statistical significance.

H0 (null hypothesis): There is no real (statistically significant) difference between departures and new hires (Departures = Hires).

H1 (alternative hypothesis): There is a real difference between departures and new hires (Departures ≠ Hires).
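
In symbols, writing $\mu_D$ and $\mu_H$ for a company's mean departures and mean hires per period (notation introduced here for illustration, not taken from the original analysis), the two hypotheses and the pooled-variance statistic that `scipy.stats.ttest_ind` computes with its default `equal_var=True` can be sketched as:

$$
H_0:\ \mu_D = \mu_H \qquad H_1:\ \mu_D \neq \mu_H
$$

$$
t = \frac{\bar{x}_D - \bar{x}_H}{s_p \sqrt{\frac{1}{n_D} + \frac{1}{n_H}}}, \qquad
s_p^2 = \frac{(n_D - 1)\,s_D^2 + (n_H - 1)\,s_H^2}{n_D + n_H - 2}
$$

Here $\bar{x}$, $s^2$, and $n$ denote the sample mean, sample variance, and number of observations for departures ($D$) and hires ($H$). H0 is rejected when the two-sided p-value of $t$, with $n_D + n_H - 2$ degrees of freedom, falls below the chosen threshold.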


In [None]:
import pandas as pd
from scipy.stats import ttest_ind 

# Load the Excel file
excel_file = 'C:/Users/zkarimib@volvocars.com/OneDrive - Volvo Cars/Zohreh/Consultant Supplier Quality/Counsultant Supplier/Excel File/Company Excel LTI reports/All Consultant Company/out/Combine.xlsx'

# Assume the sheet name is 'Company Movements'
sheet_name = 'Company Movements'

# Read the data into a DataFrame
df = pd.read_excel(excel_file, sheet_name=sheet_name)

# Get unique company names
unique_companies = df['Company_Name'].unique()

# Create a new column
df['Good Consultant2'] = ''

# Iterate over unique companies
for company in unique_companies:
    # Extract data for the current company
    company_data = df[df['Company_Name'] == company]
    departures = company_data['Departures']
    hires = company_data['Hires']
    
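    # Independent two-sample t-test (assumes roughly normal data and equal variances).
    # For non-normal data, scipy.stats.mannwhitneyu is a non-parametric alternative.
    stat, pvalue = ttest_ind(departures, hires)
    # Label the company using a 2.5% significance level
    result = 'Departures =! Hires' if pvalue < 0.025 else 'Departures = Hires'

    # Update the 'Good Consultant2' column
    df.loc[df['Company_Name'] == company, 'Good Consultant2'] = result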

# Display the DataFrame with the new column
print(df[['Company_Name', 'Good Consultant2']])


The script applies a T-test to each company's talent flow data to determine whether there is a significant difference between the number of departures and hires.
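
The comment inside the loop mentions the Mann-Whitney U test, a non-parametric test that does not assume normally distributed data. As a rough sketch of how the two tests could be compared for a single company, the snippet below runs both on made-up counts. The numbers are hypothetical and not taken from the workbook; `scipy.stats.mannwhitneyu` is swapped in here for illustration only.

```python
from scipy.stats import mannwhitneyu, ttest_ind

# Hypothetical per-period counts for one company (illustrative only)
departures = [2, 3, 2, 4, 3, 2, 3, 2]
hires = [8, 9, 10, 8, 9, 11, 9, 10]

ALPHA = 0.025  # same threshold as the script above

# Parametric: independent two-sample t-test (equal variances assumed by default)
t_stat, t_p = ttest_ind(departures, hires)

# Non-parametric: Mann-Whitney U test, two-sided
u_stat, u_p = mannwhitneyu(departures, hires, alternative="two-sided")

print(f"t-test:         statistic={t_stat:.3f}, p={t_p:.6f}, significant={t_p < ALPHA}")
print(f"Mann-Whitney U: statistic={u_stat:.1f}, p={u_p:.6f}, significant={u_p < ALPHA}")
```

When the two tests disagree for a company, that is usually a hint to look at the distribution of the counts before trusting either label.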

#### Analysis Workflow:

1. **Data Preparation:**

- The 'Company Movements' sheet is read into a DataFrame.
- Unique company names are extracted for analysis.

2. **Column Initialization:**

- A new column, 'Good Consultant2', is added to the DataFrame to store the results of the statistical test.

3. **Iterative T-Test:**

- For each company, the script extracts the data on departures and hires.
- A T-test (`ttest_ind`) is performed to compare these two datasets.
- The p-value is used to determine if there's a significant difference between departures and hires (with a significance level of 2.5%).

4. **Result Assignment:**

- Based on the p-value, the 'Good Consultant2' column is updated with either 'Departures =! Hires' (indicating a significant difference) or 'Departures = Hires' (no significant difference).

5. **Output Display:**

- The DataFrame, with the new column showing the test results for each company, is printed.
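
The steps above can be sketched end to end on a small, made-up dataset. The company names and counts below are hypothetical, and `groupby` replaces the explicit loop purely as a design choice for this illustration; the test and the 2.5% threshold are the same as in the script.

```python
import pandas as pd
from scipy.stats import ttest_ind

# Hypothetical data for two made-up companies (not from the original workbook)
df = pd.DataFrame({
    "Company_Name": ["A"] * 6 + ["B"] * 6,
    "Departures": [5, 6, 5, 7, 6, 5, 2, 3, 2, 3, 2, 3],
    "Hires": [5, 6, 6, 6, 5, 6, 9, 10, 11, 9, 10, 11],
})

ALPHA = 0.025  # significance threshold used in the script


def label_company(group: pd.DataFrame) -> str:
    """Run an independent two-sample t-test on one company's rows."""
    _, pvalue = ttest_ind(group["Departures"], group["Hires"])
    return "Departures =! Hires" if pvalue < ALPHA else "Departures = Hires"


# One test per company, then broadcast the label back to every row
labels = {name: label_company(group) for name, group in df.groupby("Company_Name")}
df["Good Consultant2"] = df["Company_Name"].map(labels)

print(df[["Company_Name", "Good Consultant2"]].drop_duplicates())
```

In this toy data, company A has identical mean departures and hires, while company B hires far more than it loses, so only B should be flagged as different.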

#### Purpose of the Analysis:
This method provides insights into whether consulting firms have a balanced talent flow (similar numbers of departures and hires) or if there are significant disparities.
The statistical approach reduces the risk of mistaking random fluctuation for a real difference, adding rigor to the analysis.
Such insights can inform strategic decisions in talent management, helping companies understand if they are losing more talent than they are gaining or vice versa.

# Exporting DataFrame with T-Test Results to Excel

In [4]:
# Save the DataFrame to Excel
df.to_excel('C:/Users/zkarimib@volvocars.com/OneDrive - Volvo Cars/Zohreh/Consultant Supplier Quality/Counsultant Supplier/Excel File/Company Excel LTI reports/All Consultant Company/out/statistical Methods.xlsx', index=False)