# Summary for Initial Setup and Imports


In [1]:
import pandas as pd
import os
import re
import numpy as np
import warnings

warnings.filterwarnings("ignore", category=UserWarning, module="openpyxl")

Description:

Library Imports: This section imports essential Python libraries needed for the project.

**pandas**: Used for data manipulation and analysis.

**os**: Provides access to operating-system functionality, used here to list directories and build file paths.

**re**: Allows for regular expression operations.

**numpy**: Adds support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays.


warnings.filterwarnings("ignore", category=UserWarning, module="openpyxl"): Suppresses `UserWarning` messages issued from openpyxl, the library pandas uses to read `.xlsx` files, so they do not clutter the notebook output.
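
The filter above stays active for the whole session. If narrower suppression is preferred, the same filter can be scoped to a single call with `warnings.catch_warnings()`. A minimal sketch; the helper name `call_without_openpyxl_warnings` is hypothetical and not part of the original notebook:

```python
import warnings


def call_without_openpyxl_warnings(func, *args, **kwargs):
    """Run func while ignoring UserWarnings attributed to openpyxl modules.

    Hypothetical helper: the previous filter state is restored on exit.
    """
    with warnings.catch_warnings():
        # `module` is a regular expression matched against the start of the
        # name of the module the warning is attributed to.
        warnings.filterwarnings("ignore", category=UserWarning, module="openpyxl")
        return func(*args, **kwargs)
```

It could be used as `call_without_openpyxl_warnings(pd.read_excel, some_file)`, where `some_file` stands for any workbook path.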

# Summary for File Path Configuration


In [2]:
path = "C:/Users/zkarimib@volvocars.com/OneDrive - Volvo Cars/Zohreh/Consultant Supplier Quality/Counsultant Supplier/Excel File/Company Excel LTI reports/All Consultant Company/231113/"
input_path = path + "in" + "/"
output_path = path + "out" + "/"
filter_path = path + "filter" + "/"
print(input_path)

C:/Users/zkarimib@volvocars.com/OneDrive - Volvo Cars/Zohreh/Consultant Supplier Quality/Counsultant Supplier/Excel File/Company Excel LTI reports/All Consultant Company/231113/in/



#### Base Path Setup
- **`path` Variable**: Defined to store the base directory where all project-related files are located. 

#### Sub-directory Paths
- **`input_path`**: Designated to hold the path for input files. 
- **`output_path`**: Set up for storing output files, which could include processed datasets, results from data analysis, or any exported data files.
- **`filter_path`**: Used for storing files related to filtering criteria or other specific processing needs.
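
The same directory layout can also be expressed with `pathlib`, which avoids manual `"/"` concatenation. This is a sketch under assumptions: `"project_root"` is a placeholder rather than the real project directory, and the helper `build_paths` is hypothetical.

```python
from pathlib import Path


def build_paths(base_dir):
    """Return the in/out/filter sub-directories under base_dir (hypothetical helper)."""
    base = Path(base_dir)
    return {
        "input": base / "in",
        "output": base / "out",
        "filter": base / "filter",
    }


paths = build_paths("project_root")  # placeholder base directory
print(paths["input"])
```

`Path` objects can be passed to `os.listdir` and `pd.read_excel` directly, but file names must then be appended with the `/` operator instead of string `+`.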


# Loading and Preparing Filter Data from Excel Files


**File Listing**: The script begins by listing all files in the input_path directory, helping to verify the available data files for processing.


In [3]:
file_list = os.listdir(input_path)
print(file_list)

['accenture.xlsx', 'afry.xlsx', 'akkodis.xlsx', 'Alten.xlsx', 'amaris.xlsx', 'AVL.xlsx', 'broccoli.xlsx', 'caevalue.xlsx', 'cap-consult.xlsx', 'cap-eng.xlsx', 'combitech.xlsx', 'condesign.xlsx', 'consid.xlsx', 'devport.xlsx', 'diadrom.xlsx', 'edag.xlsx', 'epam.xlsx', 'etteplan.xlsx', 'evidente.xlsx', 'expleo.xlsx', 'FEV.xlsx', 'globallogic.xlsx', 'hcltech.xlsx', 'hiq.xlsx', 'knowit.xlsx', 'l-and-t.xlsx', 'luxoft.xlsx', 'mca.xlsx', 'Netgroup.xlsx', 'nexer.xlsx', 'qrtech.xlsx', 'Quokka.xlsx', 'segula.xlsx', 'semcon.xlsx', 'sigma.xlsx', 'tata-consult.xlsx', 'tata-tech.xlsx', 'techmahindra.xlsx', 'tieto.xlsx', 'togethertech.xlsx', 'tribuit.xlsx', 'Vinngroup.xlsx']


**Consultant Companies Data Processing**:
Loads data from the "Consulting Firms.xlsx" file.
Extracts unique consulting companies, applies regex escaping for special characters, and converts them to lowercase for consistent comparison.

In [4]:
consultant_filter_file = pd.read_excel(filter_path + "Consulting Firms.xlsx")
consultant_filter_list = consultant_filter_file['Consulting Company'].unique().tolist()
consultant_filter_list = [re.escape(item) for item in consultant_filter_list]
consultant_filter_list = sorted([element.lower() for element in consultant_filter_list])
#print(consultant_filter_list)

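**Automotive Companies Data Processing:**
The same escaping and lowercasing is applied to the `Company` column of "Automotive Companies.xlsx" to prepare a sorted list of automotive companies. Unlike the consulting list, duplicates are not removed here.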

In [5]:
# Load the Excel file with the automotive company filter
automative_company_filter_file = pd.read_excel(filter_path + "Automotive Companies.xlsx")
automative_company_filter_list = automative_company_filter_file['Company'].tolist()
automative_company_filter_list = [re.escape(item) for item in automative_company_filter_list]
automative_company_filter_list = sorted([element.lower() for element in automative_company_filter_list])
#print(automative_company_filter_list)
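
Escaping matters because company names may contain characters such as `.`, `+` or `(` that have a special meaning in a regular expression. The sketch below applies the same steps to a small made-up list and joins the result into one case-insensitive pattern. The company names, the helper `build_filter_pattern`, and the way the pattern is used are assumptions for illustration; the cells above only build the lists.

```python
import re


def build_filter_pattern(names):
    """Escape, lowercase, de-duplicate and sort names, then join them with '|'.

    Hypothetical helper mirroring the list preparation above.
    """
    cleaned = sorted({re.escape(str(name)).lower() for name in names})
    return re.compile("|".join(cleaned), flags=re.IGNORECASE)


# Made-up example names, not taken from the filter files.
pattern = build_filter_pattern(["A.B. Consulting", "C++ Experts", "A.B. Consulting"])

print(bool(pattern.search("Worked at c++ experts for two years")))  # True
print(bool(pattern.search("Worked at AxB Consulting")))             # False
```

Without `re.escape`, the `.` in the first name would match any character and the `+` signs in the second would act as quantifiers, so the second test string would match by accident.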

# Exploratory Data Analysis (EDA)

### 1. Sheet Name Checking Script

##### Initial Setup
- The script starts by reading the first Excel file in the `input_path` directory and storing its sheet names in a set.

##### Iteration and Comparison
- It then iterates through the rest of the Excel files in the directory.
- For each file, it reads the sheet names and compares them with the sheet names of the first file.

##### Output
- **Matching Sheet Names**: If the sheet names match those in the first file, it prints a confirmation message.
- **Discrepancies in Sheet Names**: If there is a discrepancy in sheet names, it prints a message indicating which file has different sheets.


In [6]:
#Checking Sheet Names in Excel Files

first_file_path = os.path.join(input_path, file_list[0])
first_xls = pd.ExcelFile(first_file_path)
first_sheet_names = set(first_xls.sheet_names)

for file in file_list[1:]:
    file_path = os.path.join(input_path, file)
    xls = pd.ExcelFile(file_path)
    sheet_names = set(xls.sheet_names)
    
    if sheet_names == first_sheet_names:
        print("Sheets are the same as in the first file.")
    else:
        print(f"Sheets in {file} are different from the first file.")

Sheets are the same as in the first file.
Sheets are the same as in the first file.
Sheets are the same as in the first file.
Sheets are the same as in the first file.
Sheets are the same as in the first file.
Sheets are the same as in the first file.
Sheets are the same as in the first file.
Sheets are the same as in the first file.
Sheets are the same as in the first file.
Sheets are the same as in the first file.
Sheets are the same as in the first file.
Sheets are the same as in the first file.
Sheets are the same as in the first file.
Sheets are the same as in the first file.
Sheets are the same as in the first file.
Sheets are the same as in the first file.
Sheets are the same as in the first file.
Sheets are the same as in the first file.
Sheets are the same as in the first file.
Sheets are the same as in the first file.
Sheets are the same as in the first file.
Sheets are the same as in the first file.
Sheets are the same as in the first file.
Sheets are the same as in the first file.

### Results of Sheet Name Comparison

- **Consistency in Sheet Names**: It was found that all the Excel files have the same sheet names as the first file. This consistency was confirmed for each file processed by the script.
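
The check above only reports that a file differs. A variant based on set differences can also say which sheets are missing or unexpected. This is a sketch with hypothetical names (`diff_sheet_names`, the sample file names and sheet sets); in the notebook the sets would come from `pd.ExcelFile(...).sheet_names`.

```python
def diff_sheet_names(reference, sheets_by_file):
    """Return {file: (missing, extra)} for files whose sheets differ from reference."""
    reference = set(reference)
    report = {}
    for file_name, sheet_names in sheets_by_file.items():
        sheet_names = set(sheet_names)
        missing = sorted(reference - sheet_names)
        extra = sorted(sheet_names - reference)
        if missing or extra:
            report[file_name] = (missing, extra)
    return report


# Made-up input for illustration.
reference = {"Overview", "Locations", "Titles"}
sheets_by_file = {
    "a.xlsx": ["Overview", "Locations", "Titles"],
    "b.xlsx": ["Overview", "Titles", "Notes"],
}
print(diff_sheet_names(reference, sheets_by_file))
# {'b.xlsx': (['Locations'], ['Notes'])}
```

An empty result means every file matches the reference.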



### 2. Listing Sheet Names in an Excel File

The script segment below is designed to list the names of the sheets present in the first Excel file located in the `input_path`.


In [7]:
sheets_list = pd.ExcelFile(input_path + file_list[0])
print(sheets_list.sheet_names)
    

['Overview', 'Locations', 'Company Movements', 'Location Movements', 'Industry Movements', 'Titles', 'Skills', 'Attrition by Functions', 'Attrition by Locations', 'Schools', 'Degrees', 'Fields of Study']


### 3. Displaying Column Names for Each Sheet in an Excel File

This script section lists the names of all columns for each sheet in the first Excel file within the `input_path`.


In [8]:
for sheet_name in sheets_list.sheet_names:
    df = pd.read_excel(os.path.join(input_path, file_list[0]), sheet_name=sheet_name)
    #print(f"Columns for sheet '{sheet_name}': {df.columns.tolist()}")

### Functionality of the Column Name Listing Script

#### Looping Through Sheets
- The script iterates over each sheet name in `sheets_list.sheet_names`.

#### Reading Each Sheet
- For each sheet, it reads the data into a DataFrame `df` using `pd.read_excel`, specifying the sheet name.

#### Printing Column Names
- The column names of each DataFrame can then be printed by accessing the `columns` attribute of `df`, converting it to a list, and printing it along with the sheet name. The `print` call is commented out in the cell above; uncomment it to display the columns.

### Purpose of the Script

- This process is crucial for gaining an overview of the data structure within each sheet of the Excel file.
- It helps in identifying which columns are available for analysis or need processing in subsequent steps.


### 4. Displaying Column Names for Each Sheet in Every Excel File

This script section lists the names of all columns for each sheet in every Excel file located in the `input_path`.


In [9]:
# Column names for each sheet in each file


for file_name in file_list:
    excel_file = pd.ExcelFile(os.path.join(input_path, file_name))
    sheets_list = excel_file.sheet_names

    for sheet_name in sheets_list:
        sheet_temp = excel_file.parse(sheet_name)
        #print(f"Columns for sheet '{sheet_name}' in file '{file_name}': \n{sheet_temp.columns}\n")
    

### Purpose:

This process provides a comprehensive overview of the data structure in every sheet of each Excel file in the directory.
It's instrumental in understanding the variety and consistency of data columns across multiple files and sheets.
This level of detail is particularly useful in projects that involve complex data sets spread across numerous files, ensuring thoroughness in data analysis and preprocessing.
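
One way to turn the column listing into a consistency check is to group files by the column layout they use for each sheet. The sketch assumes a nested mapping `{file_name: {sheet_name: [columns]}}` that could be filled inside the loop above from `sheet_temp.columns`; the function name and the sample data are hypothetical.

```python
from collections import defaultdict


def group_files_by_columns(columns_by_file):
    """Return {sheet: {column_tuple: [files]}} to reveal differing layouts."""
    layouts = defaultdict(lambda: defaultdict(list))
    for file_name, sheets in columns_by_file.items():
        for sheet_name, columns in sheets.items():
            layouts[sheet_name][tuple(columns)].append(file_name)
    return {sheet: dict(variants) for sheet, variants in layouts.items()}


# Made-up sample: two files agree on "Titles", a third uses a different layout.
columns_by_file = {
    "a.xlsx": {"Titles": ["Titles", "Employees", "1y growth"]},
    "b.xlsx": {"Titles": ["Titles", "Employees", "1y growth"]},
    "c.xlsx": {"Titles": ["Titles", "Employees", "1 Year Growth"]},
}
layouts = group_files_by_columns(columns_by_file)
for sheet, variants in layouts.items():
    if len(variants) > 1:
        print(f"{sheet}: {len(variants)} different column layouts")
# Titles: 2 different column layouts
```

Sheets with a single layout across all files can be concatenated directly; sheets with several layouts need column renaming or alignment first.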

### 5. Displaying First Five Rows for Each Sheet in All Excel Files

The following script segment is designed to display the first five rows of data for each sheet in all Excel files within the `input_path`.


In [10]:
for file_name in file_list:
    excel_file = pd.ExcelFile(os.path.join(input_path, file_name))
    sheets_list = excel_file.sheet_names

    for sheet_name in sheets_list:
        sheet_temp = excel_file.parse(sheet_name)
       
        print(f"5 head row for '{sheet_name}' in file '{file_name}': \n{sheet_temp.head()}\n")

5 head row for 'Overview' in file 'accenture.xlsx': 
                              se-cr-accenture-231113 Unnamed: 1 Unnamed: 2
0                                     Company Report        NaN        NaN
1                             Created on: 11/13/2023        NaN        NaN
2                              Created by Megan Reif        NaN        NaN
3                                                NaN        NaN        NaN
4  For internal distribution only. All use subjec...        NaN        NaN

5 head row for 'Locations' in file 'accenture.xlsx': 
                               Location  Employees  1y growth  1y hires  \
0   Greater Stockholm Metropolitan Area        228  -0.017241        38   
1  Greater Gothenburg Metropolitan Area         34  -0.081081         6   
2       Greater Malmö Metropolitan Area         20   0.176471         5   
3        Täby, Stockholm County, Sweden          8   0.000000         1   
4     Greater Uppsala Metropolitan Area          5  -0.285714      

5 head row for 'Overview' in file 'akkodis.xlsx': 
                                se-cr-akkodis-231113 Unnamed: 1 Unnamed: 2
0                                     Company Report        NaN        NaN
1                             Created on: 11/13/2023        NaN        NaN
2                              Created by Megan Reif        NaN        NaN
3                                                NaN        NaN        NaN
4  For internal distribution only. All use subjec...        NaN        NaN

5 head row for 'Locations' in file 'akkodis.xlsx': 
                                  Location  Employees  1y growth  1y hires  \
0     Greater Gothenburg Metropolitan Area          8   0.333333         3   
1       Greater Västerås Metropolitan Area          6   0.500000         4   
2  Eskilstuna, Sodermanland County, Sweden          2   0.000000         1   
3          Greater Malmö Metropolitan Area          1        NaN         1   
4          Varberg, Halland County, Sweden          1   

5 head row for 'Overview' in file 'amaris.xlsx': 
                                 se-cr-amaris-231113 Unnamed: 1 Unnamed: 2
0                                     Company Report        NaN        NaN
1                             Created on: 11/13/2023        NaN        NaN
2                              Created by Megan Reif        NaN        NaN
3                                                NaN        NaN        NaN
4  For internal distribution only. All use subjec...        NaN        NaN

5 head row for 'Locations' in file 'amaris.xlsx': 
                               Location  Employees  1y growth  1y hires  \
0  Greater Gothenburg Metropolitan Area         29   -0.09375         8   
1   Greater Stockholm Metropolitan Area          3    0.50000         2   
2       Greater Malmö Metropolitan Area          2    0.00000         0   
3      Skellefteå, Västerbotten, Sweden          1        NaN         1   
4    Olofström, Blekinge County, Sweden          1        NaN         1  

5 head row for 'Fields of Study' in file 'broccoli.xlsx': 
                                     Fields of study  Employees  1y growth  \
0             Electrical and Electronics Engineering          8  -0.111111   
1  Mechatronics, Robotics, and Automation Enginee...          2  -0.333333   
2                                   Computer Science          2   0.000000   
3                              Computational Science          2   0.000000   
4                               Computer Engineering          1   0.000000   

   1y hires  % of employees    Your %  
0         0        0.421053  0.100890  
1         0        0.105263  0.064758  
2         0        0.105263  0.084133  
3         0        0.105263  0.071740  
4         0        0.052632  0.035783  

5 head row for 'Overview' in file 'caevalue.xlsx': 
                               se-cr-caevalue-231113 Unnamed: 1 Unnamed: 2
0                                     Company Report        NaN        NaN
1                            

5 head row for 'Schools' in file 'cap-consult.xlsx': 
                                         Schools  Employees  1y growth  \
0              KTH Royal Institute of Technology         54  -0.250000   
1                                Lund University         46  -0.115385   
2              Chalmers University of Technology         43  -0.140000   
3                           Stockholm University         40  -0.148936   
4  The Faculty of Engineering at Lund University         40  -0.148936   

   1y hires  % of employees    Your %  
0         7        0.054435  0.035608  
1         6        0.046371  0.010124  
2         3        0.043347  0.307209  
3         5        0.040323  0.006284  
4         3        0.040323  0.028801  

5 head row for 'Degrees' in file 'cap-consult.xlsx': 
                          Unnamed: 0  Capgemini  Industry  Your company
0  Master of Business Administration   0.112342  0.056010      0.060962
1                    Master's Degree   0.424051  0.563084     

5 head row for 'Skills' in file 'combitech.xlsx': 
                          Skills  Employees  1y growth  1y hires  Job posts  \
0           Software Development        334  -0.005952        40          2   
1  Python (Programming Language)        250   0.173709        69          1   
2                    Engineering        246   0.060345        56         13   
3                            C++        241   0.090498        53          2   
4                          Scrum        239  -0.020492        18          0   

   % of employees    Your %  
0        0.401442  0.333566  
1        0.300481  0.264444  
2        0.295673  0.549660  
3        0.289663  0.211381  
4        0.287260  0.214523  

5 head row for 'Attrition by Functions' in file 'combitech.xlsx': 
                         Function  Attrition  Your attrition  % of employees
0                     Engineering   0.142857        0.095893        0.442308
1          Information Technology   0.165138        0.137625        0.19

5 head row for 'Schools' in file 'consid.xlsx': 
                             Schools  Employees  1y growth  1y hires  \
0               Linköping University         93   0.000000        21   
1                 Uppsala University         47   0.068182        11   
2   Blekinge Institute of Technology         39  -0.025000         5   
3  KTH Royal Institute of Technology         38   0.225806        10   
4  Chalmers University of Technology         37   0.156250         8   

   % of employees    Your %  
0        0.133047  0.034910  
1        0.067239  0.015360  
2        0.055794  0.021295  
3        0.054363  0.035608  
4        0.052933  0.307209  

5 head row for 'Degrees' in file 'consid.xlsx': 
                          Unnamed: 0    Consid  Industry  Your company
0  Master of Business Administration  0.030201  0.056010      0.060962
1                    Master's Degree  0.489933  0.563084      0.614817
2                 Associate's Degree  0.003356  0.006071      0.003827
3   

5 head row for 'Overview' in file 'edag.xlsx': 
                             se-cr-edag-engsw-231113 Unnamed: 1 Unnamed: 2
0                                     Company Report        NaN        NaN
1                             Created on: 11/13/2023        NaN        NaN
2                              Created by Megan Reif        NaN        NaN
3                                                NaN        NaN        NaN
4  For internal distribution only. All use subjec...        NaN        NaN

5 head row for 'Locations' in file 'edag.xlsx': 
                                      Location  Employees  1y growth  \
0         Greater Gothenburg Metropolitan Area         48   0.043478   
1          Greater Stockholm Metropolitan Area          3   2.000000   
2              Varberg, Halland County, Sweden          1   0.000000   
3             Karlskoga, Orebro County, Sweden          1   0.000000   
4  Trollhättan, Vastra Gotaland County, Sweden          1  -0.500000   

   1y hires  Job po

5 head row for 'Fields of Study' in file 'etteplan.xlsx': 
                             Fields of study  Employees  1y growth  1y hires  \
0                     Mechanical Engineering         56   0.018182        12   
1  Design and Visual Communications, General         28   0.272727         7   
2     Electrical and Electronics Engineering         22   0.047619         9   
3                           Computer Science         14   0.076923         4   
4                      Computational Science         12   0.090909         3   

   % of employees    Your %  
0        0.152589  0.144702  
1        0.076294  0.041194  
2        0.059946  0.100890  
3        0.038147  0.084133  
4        0.032698  0.071740  

5 head row for 'Overview' in file 'evidente.xlsx': 
                               se-cr-evidente-231113 Unnamed: 1 Unnamed: 2
0                                     Company Report        NaN        NaN
1                             Created on: 11/13/2023        NaN        NaN
2 

5 head row for 'Location Movements' in file 'FEV.xlsx': 
                               Location  Departures  Hires  Net change  Ratio
0  Greater Gothenburg Metropolitan Area           1      4           3   0.25

5 head row for 'Industry Movements' in file 'FEV.xlsx': 
                               Industry  Departures  Hires  Net change  Ratio
0               Architecture & Planning           0      1           1      1
1                            Automotive           0      2           2      2
2                      Higher Education           0      1           1      1
3  Mechanical Or Industrial Engineering           0      1           1      1
4              Renewables & Environment           0      1           1      1

5 head row for 'Titles' in file 'FEV.xlsx': 
                       Titles  Employees  1y hires  Job posts  \
0                CAE Engineer          2         1          0   
1   Manager Department System          1         0          0   
2           Managing

5 head row for 'Titles' in file 'hcltech.xlsx': 
                     Titles  Employees  1y growth  1y hires  Job posts  \
0            Technical Lead         26  -0.161290         1          0   
1     Senior Technical Lead         21  -0.045455         1          0   
2       Solutions Architect         11   0.000000         0          0   
3  Senior Software Engineer          8  -0.333333         0          0   
4      Technical Specialist          8  -0.200000         0          0   

   % of employees    Your %  
0        0.090278  0.004189  
1        0.072917  0.000349  
2        0.038194  0.005760  
3        0.027778  0.029674  
4        0.027778  0.002269  

5 head row for 'Skills' in file 'hcltech.xlsx': 
                  Skills  Employees  1y growth  1y hires  Job posts  \
0   Software Development        189  -0.073529        12          5   
1    Agile Methodologies        114  -0.116279         9          0   
2                    SQL        112  -0.089431         6       

5 head row for 'Overview' in file 'knowit.xlsx': 
                                 se-cr-knowit-231113 Unnamed: 1 Unnamed: 2
0                                     Company Report        NaN        NaN
1                             Created on: 11/13/2023        NaN        NaN
2                              Created by Megan Reif        NaN        NaN
3                                                NaN        NaN        NaN
4  For internal distribution only. All use subjec...        NaN        NaN

5 head row for 'Locations' in file 'knowit.xlsx': 
                               Location  Employees  1y growth  1y hires  \
0   Greater Stockholm Metropolitan Area        277   0.007273        58   
1       Greater Malmö Metropolitan Area        134  -0.056338        16   
2  Greater Gothenburg Metropolitan Area         87  -0.112245        18   
3   Greater Linköping Metropolitan Area         58   0.094340        14   
4   Greater Jönköping Metropolitan Area         31  -0.031250         4  

5 head row for 'Skills' in file 'luxoft.xlsx': 
                          Skills  Employees  1y growth  1y hires  Job posts  \
0           Software Development         27  -0.035714         5          4   
1                          Linux         18   0.000000         2          2   
2                     Automotive         18   0.125000         4          0   
3                            C++         16  -0.058824         3          4   
4  Python (Programming Language)         15   0.000000         3          4   

   % of employees    Your %  
0        0.818182  0.333566  
1        0.545455  0.132309  
2        0.545455  0.434282  
3        0.484848  0.211381  
4        0.454545  0.264444  

5 head row for 'Attrition by Functions' in file 'luxoft.xlsx': 
      Function  Attrition  Your attrition  % of employees
0  Engineering   0.238095        0.095893        0.606061

5 head row for 'Attrition by Locations' in file 'luxoft.xlsx': 
                               Location  Attrition 

5 head row for 'Industry Movements' in file 'Netgroup.xlsx': 
               Industry  Departures  Hires  Net change     Ratio
0            Automotive          30     35           5  0.857143
1  Aviation & Aerospace           0      2           2  2.000000
2               Banking           3      1          -2 -3.000000
3         Biotechnology           1      2           1  0.500000
4     Civil Engineering           1      0          -1 -1.000000

5 head row for 'Titles' in file 'Netgroup.xlsx': 
                     Titles  Employees  1y hires  Job posts  \
0  Accessibility Consultant          1         0          0   
1               Agile Coach          2         1          0   
2         Analysis Engineer          1         0          0   
3     Application Developer          1         1          0   
4      Application Engineer          1         0          0   

   N Last Year Employee    Your %  1 Year Growth  % of employees  
0                   0.0  0.000000            NaN   

5 head row for 'Titles' in file 'nexer.xlsx': 
                              Titles  Employees  1y growth  1y hires  \
0                         Consultant        110   0.047619        31   
1                  Software Engineer         76  -0.037975        18   
2  Information Technology Consultant         51  -0.177419        13   
3                   System Developer         36  -0.076923         5   
4    Software Engineering Consultant         21  -0.125000         6   

   Job posts  % of employees    Your %  
0          1        0.171607  0.004189  
1          2        0.118565  0.078373  
2          0        0.079563  0.000524  
3          3        0.056162  0.003316  
4          0        0.032761  0.001222  

5 head row for 'Skills' in file 'nexer.xlsx': 
                 Skills  Employees  1y growth  1y hires  Job posts  \
0  Software Development        424  -0.049327        94         21   
1                   SQL        272  -0.074830        60          7   
2               

5 head row for 'Fields of Study' in file 'Quokka.xlsx': 
                                     Fields of study  Employees  1y hires  \
0  Agricultural Mechanics and Equipment/Machine T...          1         0   
1  Animation, Interactive Technology, Video Graph...          1         1   
2                                       Architecture          1         0   
3                            Artificial Intelligence          2         2   
4       Automotive Engineering Technology/Technician          1         1   

   N Last Year Employee    Your %  1 Year Growth  % of employees  
0                   1.0  0.000000            0.0        0.007634  
1                   0.0  0.000000            NaN        0.007634  
2                   1.0  0.003316            0.0        0.007634  
3                   0.0  0.003666            NaN        0.015267  
4                   0.0  0.032641            NaN        0.007634  

5 head row for 'Overview' in file 'segula.xlsx': 
                           

5 head row for 'Industry Movements' in file 'semcon.xlsx': 
                                Industry  Departures  Hires      Ratio  \
0                               Internet          25      0 -25.000000   
1   Mechanical Or Industrial Engineering          11     12   1.090909   
2      Information Technology & Services          14      5  -2.800000   
3                             Automotive           6      6   1.000000   
4  Electrical & Electronic Manufacturing           5      4  -1.250000   

   Net change  
0         -25  
1           1  
2          -9  
3           0  
4          -1  

5 head row for 'Titles' in file 'semcon.xlsx': 
                       Titles  Employees  1y growth  1y hires  Job posts  \
0             Design Engineer         35   0.093750         8          0   
1           Software Engineer         30  -0.062500         7          0   
2                  Consultant         26  -0.037037         6          0   
3             Project Manager         23  -0.2

5 head row for 'Location Movements' in file 'tata-consult.xlsx': 
                               Location  Departures  Hires      Ratio  \
0   Greater Stockholm Metropolitan Area          87      7 -12.428572   
1  Greater Gothenburg Metropolitan Area          26      0 -26.000000   
2       Greater Malmö Metropolitan Area           7      0  -7.000000   
3      Skellefteå, Västerbotten, Sweden           2      0  -2.000000   
4     Älmhult, Kronoberg County, Sweden           2      0  -2.000000   

   Net change  
0         -80  
1         -26  
2          -7  
3          -2  
4          -2  

5 head row for 'Industry Movements' in file 'tata-consult.xlsx': 
                            Industry  Departures  Hires  Ratio  Net change
0  Information Technology & Services          25      1    -25         -24
1                         Automotive          11      0    -11         -11
2                 Telecommunications          10      0    -10         -10
3                            Ban

5 head row for 'Schools' in file 'techmahindra.xlsx': 
                                 Schools  Employees  1y growth  1y hires  \
0  Visvesvaraya Technological University          4   0.000000         0   
1                      Örebro University          2   0.000000         0   
2               Bharathidasan University          2   0.000000         0   
3    Manipal Academy of Higher Education          2   0.000000         0   
4      Chalmers University of Technology          2  -0.333333         0   

   % of employees    Your %  
0            0.08  0.009949  
1            0.04  0.002967  
2            0.04  0.000524  
3            0.04  0.000698  
4            0.04  0.307209  

5 head row for 'Degrees' in file 'techmahindra.xlsx': 
                          Unnamed: 0  Tech Mahindra  Industry  Your company
0  Master of Business Administration       0.266667  0.056010      0.060962
1                    Master's Degree       0.400000  0.563084      0.614817
2                  Bache

5 head row for 'Attrition by Locations' in file 'togethertech.xlsx': 
                               Location  Attrition  Your attrition  \
0  Greater Gothenburg Metropolitan Area   0.131737        0.115924   

   % of employees  
0        0.688525  

5 head row for 'Schools' in file 'togethertech.xlsx': 
                             Schools  Employees  1y growth  1y hires  \
0  Chalmers University of Technology         29  -0.216216         2   
1               Linköping University         13  -0.071429         3   
2                    University West          6   0.200000         1   
3                University of Borås          4   0.000000         0   
4    Vellore Institute of Technology          3   0.500000         1   

   % of employees    Your %  
0        0.237705  0.307209  
1        0.106557  0.034910  
2        0.049180  0.015710  
3        0.032787  0.017804  
4        0.024590  0.002967  

5 head row for 'Degrees' in file 'togethertech.xlsx': 
                        

5 head row for 'Skills' in file 'Vinngroup.xlsx': 
                              Skills  Employees  1y hires  Job posts  \
0                          .NET Core          3         0          0   
1                     .NET Framework         12         1          0   
2  3D Computer Aided Design (3D CAD)          1         1          0   
3                        3D Printing          4         1          0   
4                   3D Visualization          3         1          0   

   N Last Year Employee    Your %  1 Year Growth  % of employees  
0                   3.0  0.007327       0.000000        0.000955  
1                  11.0  0.042742       0.090909        0.003818  
2                   0.0  0.002094            NaN        0.000318  
3                   3.0  0.010468       0.333333        0.001273  
4                   2.0  0.004710       0.500000        0.000955  

5 head row for 'Attrition by Functions' in file 'Vinngroup.xlsx': 
Empty DataFrame
Columns: [Function, Attrition,

#### Purpose:

This step is vital for a preliminary examination of the data. It allows for a quick check of data formatting, column names, and the type of data contained in each sheet.
Displaying the first few rows of each sheet helps in identifying any immediate data inconsistencies, missing values, or peculiarities that may require attention in the data cleaning or preprocessing stages.
This overview is particularly beneficial when dealing with large datasets spread across multiple files and sheets, as it aids in quickly assessing the data's nature and structure without needing to manually open and inspect each file.
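
In the output above, the `Overview` sheets and some `Degrees` sheets show columns named `Unnamed: N`, which pandas assigns when a header cell is empty. A small helper can flag such sheets for closer inspection. The function name is hypothetical and the check is a heuristic sketch, not part of the original notebook.

```python
def unnamed_columns(columns):
    """Return the column labels that pandas auto-generated as 'Unnamed: N'."""
    return [str(col) for col in columns if str(col).startswith("Unnamed:")]


# Column labels copied from the output above.
print(unnamed_columns(["se-cr-accenture-231113", "Unnamed: 1", "Unnamed: 2"]))
# ['Unnamed: 1', 'Unnamed: 2']
print(unnamed_columns(["Location", "Employees", "1y growth", "1y hires"]))
# []
```

A non-empty result suggests the sheet has a free-form header area or an unlabeled first column, so it likely needs a different `header` setting or an explicit rename before analysis.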

# Data Validation:

### Checking for Missing Values in DataFrames

This script section is designed to identify and report missing values in each sheet of every Excel file located in the `input_path`.
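
The check described here prints one block per sheet, which becomes long when there are many files. One alternative is to collect the counts into a single table and keep only the columns that actually have gaps. A minimal sketch, assuming the sheets are already available as DataFrames; the helper `summarize_missing` and the sample data are hypothetical.

```python
import numpy as np
import pandas as pd


def summarize_missing(frames):
    """Build one table of missing-value counts.

    frames maps (file_name, sheet_name) to a DataFrame; only columns with
    at least one missing value are kept.
    """
    rows = []
    for (file_name, sheet_name), df in frames.items():
        counts = df.isnull().sum()
        for column, n_missing in counts[counts > 0].items():
            rows.append(
                {"file": file_name, "sheet": sheet_name,
                 "column": column, "missing": int(n_missing)}
            )
    return pd.DataFrame(rows, columns=["file", "sheet", "column", "missing"])


# Made-up data for illustration.
frames = {
    ("a.xlsx", "Titles"): pd.DataFrame({"Titles": ["X", "Y"], "1 Year Growth": [0.1, np.nan]}),
    ("a.xlsx", "Skills"): pd.DataFrame({"Skills": ["S"], "Employees": [3]}),
}
summary = summarize_missing(frames)
print(summary)
```

The resulting table can be sorted or filtered like any other DataFrame, for example by `missing` in descending order.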


In [11]:
# Check for missing values in the DataFrame

for file_name in file_list:
    excel_file = pd.ExcelFile(os.path.join(input_path, file_name))
    sheets_list = excel_file.sheet_names

    for sheet_name in sheets_list:
        sheet_temp = excel_file.parse(sheet_name)
        
        missing_values = sheet_temp.isnull().sum()
        print(f"Missing values for sheet '{sheet_name}' in file '{file_name}': \n{missing_values}\n")

Missing values for sheet 'Overview' in file 'accenture.xlsx': 
se-cr-accenture-231113     8
Unnamed: 1                20
Unnamed: 2                24
dtype: int64

Missing values for sheet 'Locations' in file 'accenture.xlsx': 
Location          0
Employees         0
1y growth         0
1y hires          0
Job posts         0
% of employees    0
Your %            0
dtype: int64

Missing values for sheet 'Company Movements' in file 'accenture.xlsx': 
Company       0
Departures    0
Hires         0
Ratio         0
Net change    0
dtype: int64

Missing values for sheet 'Location Movements' in file 'accenture.xlsx': 
Location      0
Departures    0
Hires         0
Ratio         0
Net change    0
dtype: int64

Missing values for sheet 'Industry Movements' in file 'accenture.xlsx': 
Industry      0
Departures    0
Hires         0
Ratio         0
Net change    0
dtype: int64

Missing values for sheet 'Titles' in file 'accenture.xlsx': 
Titles             0
Employees          0
1y growth      

Missing values for sheet 'Degrees' in file 'AVL.xlsx': 
Degree          0
Company         0
Industry        0
Your company    0
dtype: int64

Missing values for sheet 'Fields of Study' in file 'AVL.xlsx': 
Fields of study          0
Employees                0
1y hires                 0
N Last Year Employee     0
Your %                   0
1 Year Growth           12
% of employees           0
dtype: int64

Missing values for sheet 'Overview' in file 'broccoli.xlsx': 
se-cr-broccoli-231113     8
Unnamed: 1               20
Unnamed: 2               24
dtype: int64

Missing values for sheet 'Locations' in file 'broccoli.xlsx': 
Location          0
Employees         0
1y growth         0
1y hires          0
Job posts         0
% of employees    0
Your %            0
dtype: int64

Missing values for sheet 'Company Movements' in file 'broccoli.xlsx': 
Company       0
Departures    0
Hires         0
Ratio         0
Net change    0
dtype: int64

Missing values for sheet 'Location Movements' in 

Missing values for sheet 'Locations' in file 'combitech.xlsx': 
Location          0
Employees         0
1y growth         7
1y hires          0
Job posts         0
% of employees    0
Your %            0
dtype: int64

Missing values for sheet 'Company Movements' in file 'combitech.xlsx': 
Company       0
Departures    0
Hires         0
Ratio         0
Net change    0
dtype: int64

Missing values for sheet 'Location Movements' in file 'combitech.xlsx': 
Location      0
Departures    0
Hires         0
Ratio         0
Net change    0
dtype: int64

Missing values for sheet 'Industry Movements' in file 'combitech.xlsx': 
Industry      0
Departures    0
Hires         0
Ratio         0
Net change    0
dtype: int64

Missing values for sheet 'Titles' in file 'combitech.xlsx': 
Titles            0
Employees         0
1y growth         2
1y hires          0
Job posts         0
% of employees    0
Your %            0
dtype: int64

Missing values for sheet 'Skills' in file 'combitech.xlsx': 
Skills

Missing values for sheet 'Fields of Study' in file 'devport.xlsx': 
Fields of study     0
Employees           0
1y growth          15
1y hires            0
% of employees      0
Your %              0
dtype: int64

Missing values for sheet 'Overview' in file 'diadrom.xlsx': 
se-cr-diadrom-231113     8
Unnamed: 1              20
Unnamed: 2              24
dtype: int64

Missing values for sheet 'Locations' in file 'diadrom.xlsx': 
Location          0
Employees         0
1y growth         0
1y hires          0
Job posts         0
% of employees    0
Your %            0
dtype: int64

Missing values for sheet 'Company Movements' in file 'diadrom.xlsx': 
Company       0
Departures    0
Hires         0
Ratio         0
Net change    0
dtype: int64

Missing values for sheet 'Location Movements' in file 'diadrom.xlsx': 
Location      0
Departures    0
Hires         0
Ratio         0
Net change    0
dtype: int64

Missing values for sheet 'Industry Movements' in file 'diadrom.xlsx': 
Industry      

Missing values for sheet 'Overview' in file 'expleo.xlsx': 
se-cr-expleo-231113     8
Unnamed: 1             20
Unnamed: 2             24
dtype: int64

Missing values for sheet 'Locations' in file 'expleo.xlsx': 
Location          0
Employees         0
1y growth         0
1y hires          0
Job posts         0
% of employees    0
Your %            0
dtype: int64

Missing values for sheet 'Company Movements' in file 'expleo.xlsx': 
Company       0
Departures    0
Hires         0
Ratio         0
Net change    0
dtype: int64

Missing values for sheet 'Location Movements' in file 'expleo.xlsx': 
Location      0
Departures    0
Hires         0
Ratio         0
Net change    0
dtype: int64

Missing values for sheet 'Industry Movements' in file 'expleo.xlsx': 
Industry      0
Departures    0
Hires         0
Ratio         0
Net change    0
dtype: int64

Missing values for sheet 'Titles' in file 'expleo.xlsx': 
Titles            0
Employees         0
1y growth         7
1y hires          0
Job 

Missing values for sheet 'Overview' in file 'knowit.xlsx': 
se-cr-knowit-231113     8
Unnamed: 1             20
Unnamed: 2             24
dtype: int64

Missing values for sheet 'Locations' in file 'knowit.xlsx': 
Location          0
Employees         0
1y growth         3
1y hires          0
Job posts         0
% of employees    0
Your %            0
dtype: int64

Missing values for sheet 'Company Movements' in file 'knowit.xlsx': 
Company       0
Departures    0
Hires         0
Ratio         0
Net change    0
dtype: int64

Missing values for sheet 'Location Movements' in file 'knowit.xlsx': 
Location      0
Departures    0
Hires         0
Ratio         0
Net change    0
dtype: int64

Missing values for sheet 'Industry Movements' in file 'knowit.xlsx': 
Industry      0
Departures    0
Hires         0
Ratio         0
Net change    0
dtype: int64

Missing values for sheet 'Titles' in file 'knowit.xlsx': 
Titles            0
Employees         0
1y growth         5
1y hires          0
Job 

Missing values for sheet 'Industry Movements' in file 'Netgroup.xlsx': 
Industry      0
Departures    0
Hires         0
Net change    0
Ratio         0
dtype: int64

Missing values for sheet 'Titles' in file 'Netgroup.xlsx': 
Titles                   0
Employees                0
1y hires                 0
Job posts                0
N Last Year Employee     0
Your %                   0
1 Year Growth           32
% of employees           0
dtype: int64

Missing values for sheet 'Skills' in file 'Netgroup.xlsx': 
Skills                   0
Employees                0
1y hires                 0
Job posts                0
N Last Year Employee     0
Your %                   0
1 Year Growth           40
% of employees           0
dtype: int64

Missing values for sheet 'Attrition by Functions' in file 'Netgroup.xlsx': 
Function                 0
Attrition                0
Your attrition           0
% of employees           0
Departures               0
Average Employee         0
Attrition by Fun

Missing values for sheet 'Overview' in file 'segula.xlsx': 
se-cr-segula-231113     8
Unnamed: 1             20
Unnamed: 2             24
dtype: int64

Missing values for sheet 'Locations' in file 'segula.xlsx': 
Location          0
Employees         0
1y growth         1
1y hires          0
Job posts         0
% of employees    0
Your %            0
dtype: int64

Missing values for sheet 'Company Movements' in file 'segula.xlsx': 
Company       0
Departures    0
Hires         0
Ratio         0
Net change    0
dtype: int64

Missing values for sheet 'Location Movements' in file 'segula.xlsx': 
Location      0
Departures    0
Hires         0
Ratio         0
Net change    0
dtype: int64

Missing values for sheet 'Industry Movements' in file 'segula.xlsx': 
Industry      0
Departures    0
Hires         0
Ratio         0
Net change    0
dtype: int64

Missing values for sheet 'Titles' in file 'segula.xlsx': 
Titles            0
Employees         0
1y growth         9
1y hires          0
Job 

Missing values for sheet 'Overview' in file 'tata-tech.xlsx': 
se-cr-tata-tech-231113     8
Unnamed: 1                20
Unnamed: 2                24
dtype: int64

Missing values for sheet 'Locations' in file 'tata-tech.xlsx': 
Location          0
Employees         0
1y growth         0
1y hires          0
Job posts         0
% of employees    0
Your %            0
dtype: int64

Missing values for sheet 'Company Movements' in file 'tata-tech.xlsx': 
Company       0
Departures    0
Hires         0
Ratio         0
Net change    0
dtype: int64

Missing values for sheet 'Location Movements' in file 'tata-tech.xlsx': 
Location      0
Departures    0
Hires         0
Ratio         0
Net change    0
dtype: int64

Missing values for sheet 'Industry Movements' in file 'tata-tech.xlsx': 
Industry      0
Departures    0
Hires         0
Ratio         0
Net change    0
dtype: int64

Missing values for sheet 'Titles' in file 'tata-tech.xlsx': 
Titles            0
Employees         0
1y growth        

Missing values for sheet 'Overview' in file 'Vinngroup.xlsx': 
se-cr-vinngr-actrify-231114      398
Unnamed: 1                       320
Unnamed: 2                       384
se-cr-vinngr-brimondo-231114     398
se-cr-vinngr-eclipse-231114      398
se-cr-vinngr-eyevinn-231114      398
se-cr-vinngr-goovinn-231114      398
se-cr-vinngr-hubbau-231114       398
se-cr-vinngr-humblebee-231114    398
se-cr-vinngr-pollen-231114       398
se-cr-vinngr-provinn-231114      398
se-cr-vinngr-qinq-231114         398
se-cr-vinngr-redigo-231114       398
se-cr-vinngr-vinnter-231114      398
se-cr-vinngr-xvii-231114         398
se-cr-vinngr-yalta-231114        398
se-cr-vinngr-yovinn-231114       398
se-cr-vinngrAB-231114            398
dtype: int64

Missing values for sheet 'Locations' in file 'Vinngroup.xlsx': 
Location                0
Employees               0
1y hires                0
Job posts               0
N Last Year Employee    0
Your %                  0
1 Year Growth           2
% of employ
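
The per-sheet printout above grows quickly with the number of files. As a hedged alternative, the same counts can be collected into one table that lists only the columns with gaps. This is a minimal sketch, not the notebook's own code: `summarize_missing` and the `demo_sheets` mapping are hypothetical names, and the values are made up for illustration.

```python
import pandas as pd

def summarize_missing(sheets):
    """Return one row per (file, sheet, column) that has at least one missing value.

    `sheets` maps (file_name, sheet_name) -> DataFrame. This mapping is a
    hypothetical stand-in for the frames parsed in the loop above.
    """
    rows = []
    for (file_name, sheet_name), frame in sheets.items():
        counts = frame.isnull().sum()
        # Keep only the columns that actually contain missing values
        for column, n_missing in counts[counts > 0].items():
            rows.append({"file": file_name, "sheet": sheet_name,
                         "column": column, "missing": int(n_missing)})
    return pd.DataFrame(rows, columns=["file", "sheet", "column", "missing"])

# Tiny in-memory example (made-up values, not taken from the reports)
demo_sheets = {
    ("a.xlsx", "Locations"): pd.DataFrame({"Location": ["X", "Y"], "1y growth": [0.1, None]}),
    ("b.xlsx", "Locations"): pd.DataFrame({"Location": ["Z"], "1y growth": [0.2]}),
}
missing_report = summarize_missing(demo_sheets)
print(missing_report)
```

A table like this can be sorted or filtered, which makes it easier to spot patterns such as growth columns being the main source of gaps.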

# Aggregating and Processing Data 


### - Consolidating and Enhancing Data from Multiple Excel Sheets

In [12]:
# Get the sheet names from the first file in the list
initial_sheets_list = pd.ExcelFile(input_path + file_list[0]).sheet_names

final_result = dict()

for sheet_slected in initial_sheets_list:
    final_sheet = pd.DataFrame()
    
    for file_name in file_list:
        temp_sheet = pd.read_excel(input_path + file_name, sheet_slected, index_col=False)
        
        # Harmonize the first two column headers of the 'Degrees' sheet
        if sheet_slected == 'Degrees':
            temp_sheet = temp_sheet.rename(columns={temp_sheet.columns[0]: 'Degree', temp_sheet.columns[1]: 'Company'})

        # Flag companies whose lower-cased name matches a filter list (1 = match, 0 = no match)
        if sheet_slected == 'Company Movements':
            temp_sheet['Consultant Company'] = temp_sheet['Company'].str.lower().str.contains("|".join(consultant_filter_list)).apply(lambda x: 1 if x else 0)
            temp_sheet['Automotive Company'] = temp_sheet['Company'].str.lower().str.contains("|".join(automative_company_filter_list)).apply(lambda x: 1 if x else 0)

            temp_sheet['company_lower'] = temp_sheet['Company'].str.lower()

        # Tag every row with the source file name (without extension) and append it to the aggregate
        temp_sheet = temp_sheet.assign(Company_Name=file_name.split(".")[0])
        final_sheet = pd.concat([final_sheet, temp_sheet], ignore_index=True)

    final_result[sheet_slected] = final_sheet



### Overview of Operations

#### Initial Sheet Name Extraction
- Retrieves the list of sheet names from the first file in the list and uses it as the template for every file; a sheet that exists only in a later file is therefore not processed.

#### Dataframe Initialization
- Initializes a dictionary to store the final aggregated data from all sheets.

#### Sheet-Wise Data Processing
- Iterates through each selected sheet, reading and processing data from all files.
- Applies specific transformations depending on the sheet type (e.g., renaming columns in 'Degrees', categorizing companies in 'Company Movements').

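#### Company Categorization
- For the 'Company Movements' sheet, flags each company as 'Consultant' and/or 'Automotive' by matching its lower-cased name against `consultant_filter_list` and `automative_company_filter_list`; the result is stored as 0/1 indicator columns. The cell shown here does not rank companies.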
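
The keyword-based flagging can be sketched on its own as follows. This is an illustrative variation rather than the notebook's exact code: `flag_matches`, `consultant_terms`, and `automotive_terms` are hypothetical names, and the company names are made up. The sketch escapes each term with `re.escape` so that characters such as `+` are matched literally, and passes `na=False` so that a missing company name counts as "no match".

```python
import re
import pandas as pd

# Hypothetical filter lists; the notebook loads its own from the filter files.
consultant_terms = ["consult", "a+b"]
automotive_terms = ["motor", "cars"]

def flag_matches(names, terms):
    """Return 1 where the lower-cased name contains any term, else 0."""
    pattern = "|".join(re.escape(term.lower()) for term in terms)
    # na=False treats missing names as non-matches instead of propagating NaN
    return names.str.lower().str.contains(pattern, regex=True, na=False).astype(int)

companies = pd.DataFrame({"Company": ["Alpha Consulting", "Beta Motors", None, "A+B Group"]})
companies["Consultant Company"] = flag_matches(companies["Company"], consultant_terms)
companies["Automotive Company"] = flag_matches(companies["Company"], automotive_terms)
print(companies)
```

Without escaping, a term like `a+b` would be read as a regular expression ("one or more `a` followed by `b`") and would not match the literal text `a+b`.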

#### Data Aggregation
- Combines data from individual files into a comprehensive DataFrame for each sheet.

### Purpose

- This script segment stacks the same-named sheet from every Excel file into one DataFrame per sheet and adds a `Company_Name` column that identifies the source file.
- The transformations and categorizations applied are specific to the project's focus on evaluating consulting suppliers in Sweden, for example the harmonized 'Degrees' headers and the consultant/automotive flags in 'Company Movements'.
- The final result is an aggregated, enriched dataset ready for further analysis or visualization, facilitating in-depth insights into supplier quality.
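
The aggregation step can be illustrated without any Excel files. The sketch below is a hedged variation on the loop above: it collects the per-company frames in a list and calls `pd.concat` once, instead of concatenating inside the loop. `combine_sheets` and the sample frames are hypothetical and use made-up values.

```python
import pandas as pd

def combine_sheets(frames_by_company):
    """Stack per-company frames for one sheet and tag each row with its source."""
    tagged = [frame.assign(Company_Name=company)
              for company, frame in frames_by_company.items()]
    # One concat over the whole list avoids re-copying the growing result on every iteration
    return pd.concat(tagged, ignore_index=True)

# Made-up frames standing in for the same sheet read from two files
frames = {
    "alpha": pd.DataFrame({"Location": ["X"], "1y growth": [0.10]}),
    "beta": pd.DataFrame({"Location": ["Y", "Z"], "1 Year Growth": [0.20, 0.30]}),
}
combined = combine_sheets(frames)
print(combined)
```

Because the two sample frames use different header variants for growth, the combined frame contains both columns, each with gaps for the rows that came from the other file.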


### - Modification of Location Data in 'Locations', 'Location Movements', and 'Attrition by Locations' Sheets

In [13]:
# Remove generic qualifiers from the 'Location' column of every location-based sheet.
# A single alternation pattern replaces the five chained .replace() calls.
location_qualifiers = 'Area|Greater|Metropolitan|City|County'

for sheet_name in ['Locations', 'Location Movements', 'Attrition by Locations']:
    final_result[sheet_name]['Location'] = final_result[sheet_name]['Location'].replace(location_qualifiers, '', regex=True)



#### Key Actions in Location Data Modification

**Regex-Based Replacement**
- The script employs regular expressions to remove specific words from the 'Location' column in all three location-based sheets.

**Target Words for Removal**
- Words like 'Area', 'Greater', 'Metropolitan', 'City', and 'County' are targeted for removal to streamline the location data.

**Application to Multiple Sheets**
- The same transformation is applied to the 'Locations', 'Location Movements', and 'Attrition by Locations' sheets.
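
Removing the words alone leaves stray spaces behind (the printed output later in this notebook contains `Kalmar, Kalmar , Sweden`). The following is a minimal sketch of a stricter cleanup, assuming that whole-word matching and whitespace normalization are acceptable for this data. `clean_location` and `QUALIFIERS` are hypothetical names, and the behavior differs slightly from the cell above because `\b` prevents matches inside longer words.

```python
import pandas as pd

# Qualifier words named in the text; \b keeps them from matching inside longer words
QUALIFIERS = r"\b(?:Area|Greater|Metropolitan|City|County)\b"

def clean_location(locations):
    """Remove qualifier words, then tidy the spacing left behind."""
    cleaned = locations.str.replace(QUALIFIERS, "", regex=True)
    cleaned = cleaned.str.replace(r"\s+", " ", regex=True)       # collapse runs of whitespace
    cleaned = cleaned.str.replace(r"\s*,\s*", ", ", regex=True)  # normalize comma spacing
    return cleaned.str.strip(" ,")                                # trim leftover edges

sample = pd.Series(["Greater Stockholm Metropolitan Area", "Kalmar, Kalmar County, Sweden"])
cleaned_sample = clean_location(sample)
print(cleaned_sample.tolist())
```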


In [14]:
# Normalize company names to lower case
final_result['Company Movements']['Company'] = final_result['Company Movements']['Company'].str.lower()

### - Splitting and Reformatting 'Location' Data in DataFrames

In [15]:
# Split 'Location' column and handle different cases
split_locations = final_result['Locations']['Location'].str.split(",", n=2, expand=True)

# Assign the result to new columns
final_result['Locations'][['City', 'County', 'Country']] = split_locations

# Fill the entire 'Country' column with 'Sweden'
final_result['Locations']['Country'] = 'Sweden'

In [16]:
# Split 'Location' column and handle different cases
split_locations = final_result['Location Movements']['Location'].str.split(",", n=2, expand=True)

# Assign the result to new columns
final_result['Location Movements'][['City', 'County', 'Country']] = split_locations

# Fill the entire 'Country' column with 'Sweden'
final_result['Location Movements']['Country'] = 'Sweden'


In [17]:
# Split 'Location' column and handle different cases
split_locations = final_result['Attrition by Locations']['Location'].str.split(",", n=2, expand=True)

# Assign the result to new columns
final_result['Attrition by Locations'][['City', 'County', 'Country']] = split_locations

# Fill the entire 'Country' column with 'Sweden'
final_result['Attrition by Locations']['Country'] = 'Sweden'

#### Detailed Breakdown of Location Data Processing

**Splitting Location Data**
- The 'Location' column in the 'Locations', 'Location Movements', and 'Attrition by Locations' sheets is split into separate components based on commas, using the `str.split` method with `expand=True`.
- The split is restricted to a maximum of two commas (three parts) to ensure consistency in the data format.

**Creating New Columns**
- The results of the split are then assigned to new columns: 'City', 'County', and 'Country'.
- This reformatting allows for a more structured and detailed representation of location data.

**Standardizing Country Data**
- In all three sheets, the 'Country' column is then overwritten with 'Sweden' for every row, standardizing this aspect of the dataset regardless of what the split produced.
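
The split can be tried on a small sample. The sketch below is an assumption-laden illustration, not the notebook's code: `split_location` is a hypothetical helper that splits on commas with optional surrounding whitespace and uses `reindex` so that three columns exist even when no row contains two commas (in that case a plain assignment to three column names would fail).

```python
import pandas as pd

def split_location(locations):
    """Split 'City, County, Country' strings into three named columns."""
    # A multi-character pattern is treated as a regular expression by str.split
    parts = locations.str.split(r"\s*,\s*", n=2, expand=True)
    # reindex guarantees three columns even if no row contains two commas
    parts = parts.reindex(columns=range(3))
    parts.columns = ["City", "County", "Country"]
    return parts

sample = pd.Series(["Stockholm", "Kalmar, Kalmar, Sweden"])
location_parts = split_location(sample)
print(location_parts)

# A series where every value has a single component still yields three columns
single_part = split_location(pd.Series(["Stockholm"]))
print(list(single_part.columns))
```

Rows with fewer than three components end up with missing values in 'County' and 'Country', which is why the notebook sets 'Country' explicitly afterwards.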


In [18]:
# Replace semicolons used as delimiters with a standard comma
final_result['Attrition by Locations']['Location'] = final_result['Attrition by Locations']['Location'].str.replace(r'[;]', ',', regex=True)

# Then try splitting again
split_locations = final_result['Attrition by Locations']['Location'].str.split(r'\s*,\s*', expand=True)


In [19]:
sheets_to_process = ['Attrition by Locations']


for sheet_name in sheets_to_process:
    if sheet_name in final_result:
        
        # Fill missing values in the 'Attrition by Locations' column with values from the 'Attrition' column
        final_result[sheet_name]['Attrition by Locations'] = final_result[sheet_name]['Attrition by Locations'].fillna(final_result[sheet_name]['Attrition'])

        # Fill missing values in the 'Average Employee' column with '% of employees' * 100
        final_result[sheet_name]['Average Employee'] = final_result[sheet_name]['Average Employee'].fillna(final_result[sheet_name]['% of employees']*100)

        # Fill missing values in the 'Departures' column with 'Attrition' * 'Average Employee'
        final_result[sheet_name]['Departures'] = final_result[sheet_name]['Departures'].fillna(final_result[sheet_name]['Attrition']*final_result[sheet_name]['Average Employee'])
    else:
        print(f"{sheet_name} not found in final_result dictionary")


print(final_result['Attrition by Locations'])
    

                    Location  Attrition  Your attrition  % of employees  \
0                Stockholm     0.173913        0.150495        0.745098   
1               Gothenburg     0.281690        0.115924        0.111111   
2                    Malmö     0.162162        0.132964        0.065359   
3                Stockholm     0.200131        0.150495        0.257606   
4               Gothenburg     0.179104        0.115924        0.225490   
..                       ...        ...             ...             ...   
147                Uppsala     0.098039             NaN        0.042345   
148  Kalmar, Kalmar , Sweden   0.098765             NaN        0.034202   
149             Gothenburg     0.131737        0.115924        0.688525   
150             Gothenburg     0.111280        0.116072        1.875000   
151              Stockholm     0.050000        0.150495        0.846154   

     Company_Name  Departures  Average Employee  Attrition by Locations  \
0       accenture   12.9

In [20]:
sheets_to_process = ['Attrition by Functions']


for sheet_name in sheets_to_process:
    if sheet_name in final_result:
        
        # Fill missing values in the 'Attrition by Function' column with values from the 'Attrition' column
        final_result[sheet_name]['Attrition by Function'] = final_result[sheet_name]['Attrition by Function'].fillna(final_result[sheet_name]['Attrition'])

        # Fill missing values in the 'Average Employee' column with '% of employees' * 100
        final_result[sheet_name]['Average Employee'] = final_result[sheet_name]['Average Employee'].fillna(final_result[sheet_name]['% of employees']*100)

        # Fill missing values in the 'Departures' column with 'Attrition' * 'Average Employee'
        final_result[sheet_name]['Departures'] = final_result[sheet_name]['Departures'].fillna(final_result[sheet_name]['Attrition']*final_result[sheet_name]['Average Employee'])
    else:
        print(f"{sheet_name} not found in final_result dictionary")


print(final_result['Attrition by Functions'])

                         Function  Attrition  Your attrition  % of employees  \
0                     Engineering   0.210526        0.095893        0.209150   
1          Information Technology   0.179104        0.137625        0.209150   
2            Business Development   0.052174        0.145985        0.199346   
3                      Consulting   0.285714        0.148148        0.068627   
4                      Operations   0.114286        0.165121        0.058824   
..                            ...        ...             ...             ...   
132  Customer Success and Support   0.100000        0.125000        0.016287   
133                      Research   0.125000        0.218905        0.011401   
134        Information Technology   0.265306        0.137625        0.401639   
135                   Engineering   0.194444        0.095893        0.286885   
136                         Sales   0.117647        0.198020        0.147541   

     Company_Name  Departures  Average 

In [21]:
sheets_to_process = ['Locations', 'Titles', 'Skills', 'Schools', 'Fields of Study']


for sheet_name in sheets_to_process:
    if sheet_name in final_result:
        
        # Fill missing values in the '1 Year Growth' column with values from '1y growth' column
        final_result[sheet_name]['1 Year Growth'] = final_result[sheet_name]['1 Year Growth'].fillna(final_result[sheet_name]['1y growth'])
    else:
        print(f"{sheet_name} not found in final_result dictionary")

# Print the updated final_result dictionary
print(final_result)

{'Overview':                                  se-cr-accenture-231113      Unnamed: 1  \
0                                        Company Report             NaN   
1                                Created on: 11/13/2023             NaN   
2                                 Created by Megan Reif             NaN   
3                                                   NaN             NaN   
4     For internal distribution only. All use subjec...             NaN   
...                                                 ...             ...   
2023                                                NaN             NaN   
2024                                                NaN             NaN   
2025                                                NaN             NaN   
2026                                                NaN            Type   
2027                                                NaN  Months of data   

     Unnamed: 2 Company_Name se-cr-afry-231113 se-cr-akkodis-231113  \
0           NaN

### Process Description for Data Preprocessing

#### Defining Sheets to Process
- The script defines a list of sheet names (`sheets_to_process`) that are to be processed. This includes 'Locations', 'Titles', 'Skills', 'Schools', and 'Fields of Study'.

#### Iterative Sheet Processing
- The script iterates through each sheet name in the list.
- It checks if the sheet exists in the `final_result` dictionary.

#### Filling Missing Values
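- For sheets present in `final_result`, the script fills missing values in the '1 Year Growth' column with corresponding values from the '1y growth' column.
- The two names are header variants of the same metric: some source files (for example 'AVL.xlsx') use '1 Year Growth' while others (for example 'combitech.xlsx') use '1y growth', so after concatenation each row has a value in at most one of them. Coalescing them yields a single growth column that covers every company.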

#### Handling Absent Sheets
- If any of the specified sheets are not found in `final_result`, the script prints a message indicating the absence of that sheet.

#### Updating and Printing Final Result
- The script then prints the updated `final_result` dictionary, showcasing the changes made.
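
The fill step described above can be sketched in isolation. This is a minimal, hypothetical helper (`coalesce_columns` is not part of the notebook) that also guards against a sheet lacking one of the two columns, a case in which the direct `fillna` call would raise a `KeyError`. The sample values are made up.

```python
import pandas as pd

def coalesce_columns(frame, target, fallback):
    """Fill gaps in `target` from `fallback`, creating `target` if it is absent."""
    result = frame.copy()
    if fallback not in result.columns:
        return result  # nothing to fill from
    if target not in result.columns:
        result[target] = result[fallback]
    else:
        result[target] = result[target].fillna(result[fallback])
    return result

demo = pd.DataFrame({
    "1y growth": [0.10, None, 0.30],
    "1 Year Growth": [None, 0.20, None],
})
filled = coalesce_columns(demo, "1 Year Growth", "1y growth")
print(filled["1 Year Growth"].tolist())
```

Working on a copy keeps the input frame unchanged, so the helper can be applied to each sheet in `final_result` and the result assigned back explicitly.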


# Exporting Processed Data to a Single Excel File

In [23]:
with pd.ExcelWriter(output_path+"Combine231113.xlsx") as writer:
    for sheet_slected in initial_sheets_list:
        final_result[sheet_slected].to_excel(writer, sheet_name=sheet_slected, index=False)

### Key Steps in Exporting Processed Data to Excel

#### Initialization of Excel Writer
- The `pd.ExcelWriter` is used to create an Excel file at the specified `output_path`. This allows for writing multiple DataFrames to a single Excel file across different sheets.

#### Iterating and Writing Data to Sheets
- The script iterates through each sheet name stored in `initial_sheets_list`.
- For each sheet, the corresponding DataFrame from `final_result` is written to the Excel file using the `to_excel` method.
- The parameter `index=False` ensures that DataFrame indices are not included in the output file, leading to cleaner data presentation.
