# Automated Optical Inspection at Variass
Help Variass predict the defect rates of assembled products based on their specifications.

Goal: predict the quality (rate of defects per defect type) for a given Valor code, or for a list of Valor codes, i.e. the bill of materials of a completely new product.

## Step 1: Introduction

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

## Step 2: Reading the data
Read the Excel files (AOI defects, order list and VPL list; this takes around 31 s) and convert them to CSV for faster loading. The cell below converts the order list and the VPL list.

In [2]:
file_path_order = "data/initial/Order picklists.xlsx"
file_path_vpl = "data/initial/VPL list.xlsx"
pd.read_excel(file_path_order).to_csv("data/orders.csv", index=False)
pd.read_excel(file_path_vpl).to_csv("data/vpl_codes.csv", index=False)

If the CSV files already exist, skip the conversion and read them directly into three dataframes.

In [2]:
df_aoi_defects = pd.read_csv("data/initial/AOI defects last year.csv", index_col="id")
df_orders = pd.read_csv("data/orders.csv")
df_vpl = pd.read_csv("data/vpl_codes.csv")
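The "convert once, then read the CSV" pattern described above could be wrapped in a small helper. This is a minimal sketch, not code from this notebook: the function name `read_excel_cached` is hypothetical, and it assumes the CSV is an acceptable copy of the Excel sheet (column dtypes may differ slightly after the round trip).

```python
from pathlib import Path

import pandas as pd


def read_excel_cached(xlsx_path, csv_path):
    """Return the data as a DataFrame, converting Excel to CSV only once.

    Hypothetical helper: reads `csv_path` if it exists; otherwise reads
    `xlsx_path` (slow), writes the CSV cache and returns the data.
    """
    csv_path = Path(csv_path)
    if csv_path.exists():
        return pd.read_csv(csv_path)
    df = pd.read_excel(xlsx_path)  # slow path, needs an Excel engine such as openpyxl
    csv_path.parent.mkdir(parents=True, exist_ok=True)
    df.to_csv(csv_path, index=False)
    return df
```

A call such as `read_excel_cached("data/initial/Order picklists.xlsx", "data/orders.csv")` would then replace the two separate cells.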

Additional info on the defects dataframe:
- *defecttypestring* = type of defect
- *reviewed* = 2 if the manual inspection approved the 3D defect; 1 if the manual inspection disapproved it (false call)
- *refid* = reference identification, which is the position on the PCBA
- *partnumber* = link to the order picklist and the VPL list

Additional info on the order list:
- *Order* = order number; each row lists one part needed for that order and its total amount
- *part number* = the specific part needed for that assembly order
- *total amount* = total number of parts needed to fulfil the order

In [3]:
df_merged_order_vpl = pd.merge(df_orders, df_vpl[["partnumber", "VPLpackage"]], left_on="part number", right_on="partnumber")
df_merged_vpl_and_defects = pd.merge(df_vpl[["partnumber", "VPLpackage", "Material type", "Position type", "Package type", "Lead type", "Pitch", "Subtype"]], df_aoi_defects[["reviewed", "defecttypestring", "refid", "partnumber"]], left_on="partnumber", right_on="partnumber")
print("merged orders and vpl")
print(df_merged_order_vpl.head(3))
print("merged vpl and defects")
print(df_merged_vpl_and_defects.head(3))

merged orders and vpl
     Order part number  total amount partnumber              VPLpackage
0  1036933   CAP-00693         504.0  CAP-00693  PZXC-L12/HH-L198W52T53
1  1038499   CAP-00693         200.0  CAP-00693  PZXC-L12/HH-L198W52T53
2  1038502   CAP-00693         200.0  CAP-00693  PZXC-L12/HH-L198W52T53
merged vpl and defects
  partnumber            VPLpackage Material type Position type Package type  \
0  VVH-00021  PDSO-J3/XX-L48W46T37             P             D           SO   
1  VVH-00021  PDSO-J3/XX-L48W46T37             P             D           SO   
2  VVH-00021  PDSO-J3/XX-L48W46T37             P             D           SO   

  Lead type Pitch Subtype  reviewed defecttypestring refid  
0        J3     X       X         1          Missing    R6  
1        J3     X       X         1          Missing    R6  
2        J3     X       X         1          Missing    R6  
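Both merges above are inner joins (the pandas default), so rows without a matching part number are dropped without notice. The following is a hedged sketch of how that loss could be measured with `indicator=True`. The tiny dataframes are made-up stand-ins, not Variass data.

```python
import pandas as pd

# Made-up stand-ins for df_aoi_defects and df_vpl
defects = pd.DataFrame({
    "partnumber": ["CAP-1", "CAP-1", "RES-9"],
    "defecttypestring": ["Missing", "Shift", "Bridge"],
})
vpl = pd.DataFrame({
    "partnumber": ["CAP-1", "IND-3"],
    "VPLpackage": ["CDXD-R2/XR-L16W9T5", "PDSO-J3/XX-L48W46T37"],
})

# A left join keeps every defect row; `_merge` records whether a VPL row matched
checked = defects.merge(vpl, on="partnumber", how="left", indicator=True)
unmatched = checked[checked["_merge"] == "left_only"]

print(len(defects), "defect rows,", len(unmatched), "without a VPL code")
```

Passing `validate="many_to_one"` to `merge` would additionally raise an error if a part number occurred more than once in the VPL list, which would otherwise multiply defect rows.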


# Step 3: Defect rates
In this step, you will make descriptive estimations of the defect rates without considering any product specifications.
- (a) Find the defect rate returned by the automated inspection system, overall and per defect type.
- (b) Find the true- and false-positive rates, overall and per defect type, based on the information from both automated and manual inspections.

The defects found by the 3D machine are reviewed manually, and the outcome is stored in the column 'reviewed'. Let's start with this column and use value_counts to check how often 2 (approved after inspection) and 1 (disapproved after inspection) occur in the dataset.

In [4]:
df_merged_vpl_and_defects.value_counts(subset=["reviewed"])

reviewed
2           686190
1           228998
0             6134
dtype: int64

The result of value_counts on the reviewed column shows something interesting: the column also contains 0 values. We assume that 0 means the manual inspection has not reviewed these parts yet. For the overall defect rate, the number of rows in the merged dataframe of VPL codes and defects is divided by the sum of the total amount column in df_orders and multiplied by 100.

In [4]:
total_amount_parts_ordered = df_orders["total amount"].sum()
print("Total amount of orders", total_amount_parts_ordered)
# overall defect rate reported by the automated inspection system;
# rows not yet reviewed (reviewed == 0) are counted as well
defect_rate_overall = (len(df_merged_vpl_and_defects) / total_amount_parts_ordered) * 100
print("Defect rate returned by the automated inspection system overall is", round(defect_rate_overall, 3), "%")

Total amount of orders 76993396.73099999
Defect rate returned by the automated inspection system overall is 1.197 %
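
As a cross-check, the overall rate can be reproduced from the numbers printed above. The sketch uses only the three `reviewed` counts and the total order amount taken from the outputs. The variable names are illustrative.

```python
# Counts per `reviewed` value, copied from the value_counts output above
reviewed_counts = {2: 686190, 1: 228998, 0: 6134}
total_parts_ordered = 76993396.731  # sum of "total amount", rounded to 3 decimals

flagged = sum(reviewed_counts.values())  # all rows, including unreviewed ones
defect_rate_overall = flagged / total_parts_ordered * 100

print(flagged, round(defect_rate_overall, 3))
```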


Based on the number of defects and the total order amount, the automated inspection system flags about 1.2% of the 76,993,396.73 parts ordered as defective. Next, value_counts is used on the defect type column to count the number of occurrences per defect type. The result is placed in a dataframe called df_defects. The defect rate per defect type is then calculated by dividing the count of each defect type by the total number of parts ordered and multiplying by 100.

In [6]:
defect_types = df_merged_vpl_and_defects.value_counts(subset=["defecttypestring"])
# source: https://stackoverflow.com/questions/47136436/python-pandas-convert-value-counts-output-to-dataframe
df_defects = pd.DataFrame(defect_types)
df_defects = df_defects.reset_index()
df_defects.columns = ["defect_type", "amount"]
# calculate the defect rate per defect type
df_defects["defect_rate"] = (df_defects["amount"] / total_amount_parts_ordered) * 100
df_defects

Unnamed: 0,defect_type,amount,defect_rate
0,InsufficientSolder,393691,0.511331
1,Missing,200502,0.260415
2,WrongPart,104381,0.135571
3,LiftedLead,59554,0.077349
4,Bridge,32489,0.042197
5,LiftedPackage,26018,0.033793
6,Polarity,22641,0.029406
7,Shift,21437,0.027843
8,Tomstone,15710,0.020404
9,NoSolder,11843,0.015382
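
The conversion from counts to a tidy dataframe can also be written without reassigning `columns`. This is a minimal sketch on made-up data. `rename_axis` plus `reset_index(name=...)` should give the same column names regardless of how a given pandas version labels the count column.

```python
import pandas as pd

# Made-up defect log and order total, only to make the sketch runnable
defect_log = pd.DataFrame({"defecttypestring": ["Missing", "Bridge", "Missing", "Shift"]})
total_parts = 200

defect_table = (
    defect_log["defecttypestring"]
    .value_counts()
    .rename_axis("defect_type")
    .reset_index(name="amount")
)
defect_table["defect_rate"] = defect_table["amount"] / total_parts * 100
print(defect_table)
```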


According to the AOI defect type Excel file there are 37 different defect types, of which 25 are observed here. Why 25? The full value counts result contains 27 distinct values, but at index 10 and 13 there are two Dutch entries, Tekort and tekort (roughly "shortage" or "too short"), and the AOI defect type Excel file has no defect type with that name or its English translation. Furthermore, the Excel file marks Missing, Shift, InsufficientSolder and LiftedPackage as the 'most used (in green)' defects, but in this dataframe WrongPart is also flagged quite often by the AOI machine. Next, we look at the true- and false-positive rates.

As stated on slide 14 of the Variass PowerPoint presentation, the AOI machine inspects every part. When a defect is flagged, the manual inspection checks whether the machine is right: if so, the defect is approved (reviewed = 2 = true positive); if not, it is disapproved (reviewed = 1 = false positive). As found above, the reviewed column also contains some zeros, which we assume mean that the manual inspection has not reviewed these parts yet. We therefore leave them out in the following block of code.

Let's start by finding the true- and false-positive rates overall.

In [7]:
# count the rows reviewed as 1 or 2, excluding 0
total_true_false_positives = len(df_merged_vpl_and_defects[df_merged_vpl_and_defects["reviewed"] != 0])
false_postives = len(df_merged_vpl_and_defects[df_merged_vpl_and_defects["reviewed"] == 1])
true_positives = len(df_merged_vpl_and_defects[df_merged_vpl_and_defects["reviewed"] == 2])
assert false_postives + true_positives == total_true_false_positives
false_positive_rate = (false_postives / total_true_false_positives) * 100
true_positive_rate = (true_positives / total_true_false_positives) * 100
print("False positive rate overall (reviewed = 1 = inspection disapproves):", round(false_positive_rate, 3), "%")
print("True positive rate overall (reviewed = 2 = inspection approves):", round(true_positive_rate, 3), "%")
print("Correct defects found by the machine", (total_amount_parts_ordered / 100) * true_positive_rate, "parts")
print("Incorrect defects found by the machine", (total_amount_parts_ordered / 100) * false_positive_rate, "parts")

False positive rate overall (reviewed = 1 = inspection disapproves): 25.022 %
True positive rate overall (reviewed = 2 = inspection approves): 74.978 %
Correct defects found by the machine 57728137.71907508 parts
Incorrect defects found by the machine 19265259.01192491 parts


The overall true- and false-positive rates of the AOI machine, about 75% and 25% respectively, are calculated by dividing the number of true or false positives by their sum and multiplying by 100. At first sight, a machine that is right in 75% of its calls across 25 defect types is fairly accurate. On the other hand, one in four flagged defects is a false call that still has to be inspected manually, which adds up to a lot of unnecessary inspections over time. Note that both rates are shares of the reviewed defect calls (686,190 approved plus 228,998 disapproved), not of all ordered parts: the last two printed lines multiply the rates by the total number of parts ordered and therefore do not represent numbers of defects. So the picture is mixed. Let's look further at the true and false positives per defect type.
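
The overall shares can be re-derived from the two `reviewed` counts alone, which also makes the denominators explicit. This is a small worked check with illustrative variable names. What the notebook calls the true-positive rate is the share of flagged defects that the manual inspection approved, a quantity often called precision.

```python
approved = 686190     # reviewed == 2, confirmed by manual inspection
disapproved = 228998  # reviewed == 1, false calls
reviewed_total = approved + disapproved  # reviewed == 0 is left out

true_positive_share = approved / reviewed_total * 100
false_positive_share = disapproved / reviewed_total * 100

print(round(true_positive_share, 3), round(false_positive_share, 3))
```

The number of confirmed defects is therefore `approved` itself (686,190), and the number of false calls is `disapproved` (228,998).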

In [8]:
false_positives_counts = df_merged_vpl_and_defects[df_merged_vpl_and_defects["reviewed"] == 1].value_counts(subset=["defecttypestring"])
df_false_positives = pd.DataFrame(false_positives_counts)
df_false_positives = df_false_positives.reset_index()
df_false_positives.columns = ["defect_type", "false_positives"]
df_false_positives["false_positive_rate"] = (df_false_positives["false_positives"] / false_postives) * 100 
print("Number of defect types having false positive reviews are", len(df_false_positives))
df_false_positives

Number of defect types having false positive reviews are 26


Unnamed: 0,defect_type,false_positives,false_positive_rate
0,Missing,64652,28.232561
1,InsufficientSolder,33265,14.526328
2,WrongPart,24872,10.86123
3,LiftedPackage,24297,10.610136
4,Shift,19698,8.601822
5,NoSolder,10152,4.433226
6,LiftedLead,9873,4.311391
7,Tekort,7915,3.456362
8,Tomstone,7886,3.443698
9,Bridge,7724,3.372955


Starting with the false positives per defect type: out of the 37 defect types in the AOI defect types Excel file, 26 have at least one false-positive review from the manual inspection. Note that false_positive_rate here is each defect type's share of all false positives. df_defects contains 27 defect types, so exactly one of them has no false-positive review. Let's outer-merge df_defects with df_false_positives on defect_type to find out which one. We use an outer merge because a defect type without a match in the other dataframe is kept, with its missing cells filled with NaN.

In [9]:
df_defects = pd.merge(df_defects, df_false_positives, on="defect_type", how="outer")
df_defects

Unnamed: 0,defect_type,amount,defect_rate,false_positives,false_positive_rate
0,InsufficientSolder,393691,0.511331,33265.0,14.526328
1,Missing,200502,0.260415,64652.0,28.232561
2,WrongPart,104381,0.135571,24872.0,10.86123
3,LiftedLead,59554,0.077349,9873.0,4.311391
4,Bridge,32489,0.042197,7724.0,3.372955
5,LiftedPackage,26018,0.033793,24297.0,10.610136
6,Polarity,22641,0.029406,5553.0,2.424912
7,Shift,21437,0.027843,19698.0,8.601822
8,Tomstone,15710,0.020404,7886.0,3.443698
9,NoSolder,11843,0.015382,10152.0,4.433226


After the merge we can indeed see that defect_type LiftedSolder does not have any false-positive reviews; its cells are filled with NaN (the row falls outside the 10 rows shown above). Next, we do the same for the true positives.
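
The defect type without a match can also be filtered out in code instead of being read off the merged table. This is a minimal sketch on made-up counts. In the notebook the two inputs would be `df_defects` and `df_false_positives`.

```python
import pandas as pd

all_types = pd.DataFrame({"defect_type": ["Missing", "Shift", "LiftedSolder"],
                          "amount": [10, 4, 1]})
false_pos = pd.DataFrame({"defect_type": ["Missing", "Shift"],
                          "false_positives": [3, 2]})

merged = all_types.merge(false_pos, on="defect_type", how="outer")
# Rows that exist only in `all_types` get NaN in the columns from `false_pos`
no_false_positives = merged.loc[merged["false_positives"].isna(), "defect_type"]
print(no_false_positives.tolist())
```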

In [10]:
true_positives_counts = df_merged_vpl_and_defects[df_merged_vpl_and_defects["reviewed"] == 2].value_counts(subset=["defecttypestring"])
df_true_positives = pd.DataFrame(true_positives_counts)
df_true_positives = df_true_positives.reset_index()
df_true_positives.columns = ["defect_type", "true_positives"]
df_true_positives["true_positive_rate"] = (df_true_positives["true_positives"] / true_positives) * 100 
print("Number of defect types having true positive reviews are", len(df_true_positives))
df_true_positives

Number of defect types having true positive reviews are 26


Unnamed: 0,defect_type,true_positives,true_positive_rate
0,InsufficientSolder,357939,52.163249
1,Missing,133689,19.482796
2,WrongPart,78715,11.471313
3,LiftedLead,49562,7.222781
4,Bridge,24699,3.59944
5,Polarity,16825,2.451945
6,Tomstone,7730,1.12651
7,Tilt,2932,0.427287
8,Shift,1726,0.251534
9,LiftedPackage,1715,0.249931


As with the false positives, true positives are found in 26 of the 27 defect types. Most true-positive reviews belong to the defect types InsufficientSolder, Missing and WrongPart. It is interesting that InsufficientSolder accounts for more than 52% of all true-positive reviews, so insufficient solder is a substantial problem in Variass's production and assembly line. Keep in mind that true_positive_rate here is each defect type's share of all true positives; it does not say how often an InsufficientSolder call by the machine is correct. Next, let's outer-merge the true positives with df_defects to finalize the descriptive estimations of the defect rates.

In [11]:
df_defects = pd.merge(df_defects, df_true_positives, on="defect_type", how="outer")
df_defects.fillna(0)

Unnamed: 0,defect_type,amount,defect_rate,false_positives,false_positive_rate,true_positives,true_positive_rate
0,InsufficientSolder,393691,0.511331,33265.0,14.526328,357939.0,52.163249
1,Missing,200502,0.260415,64652.0,28.232561,133689.0,19.482796
2,WrongPart,104381,0.135571,24872.0,10.86123,78715.0,11.471313
3,LiftedLead,59554,0.077349,9873.0,4.311391,49562.0,7.222781
4,Bridge,32489,0.042197,7724.0,3.372955,24699.0,3.59944
5,LiftedPackage,26018,0.033793,24297.0,10.610136,1715.0,0.249931
6,Polarity,22641,0.029406,5553.0,2.424912,16825.0,2.451945
7,Shift,21437,0.027843,19698.0,8.601822,1726.0,0.251534
8,Tomstone,15710,0.020404,7886.0,3.443698,7730.0,1.12651
9,NoSolder,11843,0.015382,10152.0,4.433226,1678.0,0.244539
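
The two rate columns above describe how the true and false positives are distributed over the defect types. A complementary view, sketched here with counts copied from the table, is the fraction of reviewed calls per defect type that the manual inspection approved. This per-type ratio is not computed in the notebook; it is added only as an illustration.

```python
# (true_positives, false_positives) per defect type, copied from the table above
reviewed = {
    "InsufficientSolder": (357939, 33265),
    "Missing": (133689, 64652),
    "LiftedPackage": (1715, 24297),
    "Shift": (1726, 19698),
}

approved_fraction = {
    defect: tp / (tp + fp) * 100 for defect, (tp, fp) in reviewed.items()
}

for defect, pct in sorted(approved_fraction.items(), key=lambda kv: kv[1]):
    print(f"{defect:20s} {pct:5.1f} % of reviewed calls approved")
```

On these counts, InsufficientSolder calls are mostly confirmed, while LiftedPackage and Shift calls are mostly false calls.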


# Step 4: Prediction task and performance reporting
Next, we will scrutinize the available data to make predictions on defect rates considering product specifications by proposing, implementing and showcasing a prediction model.

Steps for this part include:
<ol type="a">
 <li>Develop a single model that predicts the defect rate per defect type, based on the inspection data and product specifications captured in the VPL codes.</li>
 <li>Run your model using all inspection data for training and learning. Present your results. Provide a critical analysis of your findings.</li>
 <li>Run your model using 80% of the inspection data for training and testing and 20% for validation. Present your results. Provide a critical analysis of your findings.</li>
</ol>
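
The 80/20 task in item (c) could be set up as sketched below. Everything here is an assumption made for illustration: the data is synthetic, the "model" is a simple baseline (the mean training defect rate per defect type), and the mean absolute error is only one possible metric. It is not the model developed in this notebook.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic stand-in for a table with one row per (defect type, VPL package)
data = pd.DataFrame({
    "defect_type": rng.choice(["Missing", "Shift", "Bridge"], size=200),
    "defect_rate": rng.uniform(0.0, 1.0, size=200),
})

# 80/20 split: shuffle the row positions, then cut at 80 %
shuffled = rng.permutation(len(data))
cut = int(0.8 * len(data))
train = data.iloc[shuffled[:cut]]
valid = data.iloc[shuffled[cut:]]

# Baseline: predict the mean training defect rate of the same defect type
type_means = train.groupby("defect_type")["defect_rate"].mean()
fallback = train["defect_rate"].mean()  # for defect types unseen in training
predicted = valid["defect_type"].map(type_means).fillna(fallback)

mae = (predicted - valid["defect_rate"]).abs().mean()
print(f"train rows: {len(train)}, validation rows: {len(valid)}, MAE: {mae:.3f}")
```

Any real model should beat such a baseline on the validation part; if it does not, the product specifications add little beyond the defect type itself.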

In [5]:
df_merged_order_vpl.head(1)

Unnamed: 0,Order,part number,total amount,partnumber,VPLpackage
0,1036933,CAP-00693,504.0,CAP-00693,PZXC-L12/HH-L198W52T53


In [6]:
print(len(df_merged_vpl_and_defects))
df_merged_vpl_and_defects.head(1)

921322


Unnamed: 0,partnumber,VPLpackage,Material type,Position type,Package type,Lead type,Pitch,Subtype,reviewed,defecttypestring,refid
0,VVH-00021,PDSO-J3/XX-L48W46T37,P,D,SO,J3,X,X,1,Missing,R6


In [34]:
# source: https://www.pauldesalvo.com/how-to-extract-all-numbers-from-a-string-column-in-python-pandas/
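# take the part after the last "-" (e.g. "L48W46T37") and pull out the three numbers;
# the raw string avoids an invalid-escape warning for \d
l_w_t = df_merged_vpl_and_defects["VPLpackage"].str.split("-").str[-1].str.extractall(r"(\d+)").unstack()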

In [35]:
df_merged_vpl_and_defects["L"] = l_w_t[0][0].astype(int)
df_merged_vpl_and_defects["W"] = l_w_t[0][1].astype(int)
df_merged_vpl_and_defects["T"] = l_w_t[0][2].astype(int)
# df_merged_vpl_and_defects
df_merged_vpl_and_defects

Unnamed: 0,partnumber,VPLpackage,Material type,Position type,Package type,Lead type,Pitch,Subtype,reviewed,defecttypestring,refid,L,W,T
0,VVH-00021,PDSO-J3/XX-L48W46T37,P,D,SO,J3,X,X,1,Missing,R6,48,46,37
1,VVH-00021,PDSO-J3/XX-L48W46T37,P,D,SO,J3,X,X,1,Missing,R6,48,46,37
2,VVH-00021,PDSO-J3/XX-L48W46T37,P,D,SO,J3,X,X,1,Missing,R6,48,46,37
3,VVH-00021,PDSO-J3/XX-L48W46T37,P,D,SO,J3,X,X,1,Missing,R6,48,46,37
4,VVH-00021,PDSO-J3/XX-L48W46T37,P,D,SO,J3,X,X,1,Missing,R6,48,46,37
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
921317,KWO-00025,PDCY-L2/XE-L66W66T58,P,D,CY,L2,X,E,2,InsufficientSolder,C215,66,66,58
921318,KWO-00025,PDCY-L2/XE-L66W66T58,P,D,CY,L2,X,E,2,InsufficientSolder,C204,66,66,58
921319,KWO-00025,PDCY-L2/XE-L66W66T58,P,D,CY,L2,X,E,2,InsufficientSolder,C204,66,66,58
921320,KWO-00025,PDCY-L2/XE-L66W66T58,P,D,CY,L2,X,E,2,InsufficientSolder,C214,66,66,58


### Split the lead type into a letter and a number
Otherwise we would have too many combinations.

In [36]:
df_merged_vpl_and_defects["Lead type"].value_counts()

R2      252538
G8       71529
G3       66272
C2       41536
G5       29479
         ...  
B320         1
B449         1
N47          1
L63          1
X12          1
Name: Lead type, Length: 258, dtype: int64

In [None]:
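# everything after the first character is the number (e.g. "B320" -> 320);
# .str[1] would keep only a single digit. Assumes one letter followed by digits.
df_merged_vpl_and_defects["Lead type number"] = df_merged_vpl_and_defects["Lead type"].str[1:].astype(int)
df_merged_vpl_and_defects["Lead type letter"] = df_merged_vpl_and_defects["Lead type"].str[0]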
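
Splitting a lead type such as `B320` into its letter and its number can also be done with a regular expression, which tolerates values that do not fit the pattern. This is a minimal sketch on a few values taken from the value counts above. The column names `letter` and `number` are illustrative.

```python
import pandas as pd

lead_type = pd.Series(["R2", "G8", "B320", "N47"], name="Lead type")

# One capture group for the leading letter(s), one for the trailing digits
parts = lead_type.str.extract(r"^(?P<letter>[A-Za-z]+)(?P<number>\d+)$")
# Values that do not fit the pattern become NaN instead of raising an error
parts["number"] = pd.to_numeric(parts["number"])
print(parts)
```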

In [85]:
df = df_merged_vpl_and_defects[["defecttypestring", "Material type", "Position type", "Package type", "Lead type", "Pitch", "Subtype", "L", "W", "T", "VPLpackage"]].value_counts().reset_index()
df = df.rename(columns={0:"vpl_counts"})

# total number of parts ordered per VPL package; summing only "total amount"
# avoids aggregating the non-numeric columns
df_merged_order_vpl_grouped = df_merged_order_vpl.groupby("VPLpackage", as_index=False)["total amount"].sum()

df = pd.merge(df, df_merged_order_vpl_grouped[["VPLpackage", "total amount"]], on="VPLpackage")

df["defect_rate"] = df["vpl_counts"] / df["total amount"] * 100


Unnamed: 0,defecttypestring,Material type,Position type,Package type,Lead type,Pitch,Subtype,L,W,T,VPLpackage,vpl_counts,total amount,defect_rate
0,Missing,C,D,XD,R2,X,R,16,9,5,CDXD-R2/XR-L16W9T5,26817,3250034.0,0.825130
1,InsufficientSolder,C,D,XD,R2,X,R,16,9,5,CDXD-R2/XR-L16W9T5,13645,3250034.0,0.419842
2,Tomstone,C,D,XD,R2,X,R,16,9,5,CDXD-R2/XR-L16W9T5,1759,3250034.0,0.054123
3,WrongPart,C,D,XD,R2,X,R,16,9,5,CDXD-R2/XR-L16W9T5,725,3250034.0,0.022307
4,LiftedPackage,C,D,XD,R2,X,R,16,9,5,CDXD-R2/XR-L16W9T5,538,3250034.0,0.016554
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
6749,Tilt,P,B,GA,B320,P,X,190,190,20,PBGA-B320/PX-L190W190T20,1,246.0,0.406504
6750,InsufficientSolder,M,X,XH,N5,X,X,300,102,29,MXXH-N5/XX-L300W102T29,1,4.0,25.000000
6751,InsufficientSolder,P,D,SO,G4,F,X,28,44,21,PDSO-G4/FX-L28W44T21,1,18.0,5.555556
6752,InsufficientSolder,M,X,XC,N1,X,T,38,22,23,MXXC-N1/XT-L38W22T23,1,480.0,0.208333


In [93]:
df_dummies = pd.get_dummies(df[["defecttypestring", "Material type", "Position type", "Package type", "Lead type", "Pitch", "Subtype", "L", "W", "T"]])
# df_dummies["defect_rate"] = df["defect_rate"]
# df_dummies["defect_type"] = df["defecttypestring"]
# df_dummies.to_csv("data/janjan&tobi.csv", index=False)
df_dummies

Unnamed: 0,L,W,T,defecttypestring_Bridge,defecttypestring_ChipFlying,defecttypestring_ColdSolder,defecttypestring_Damage,defecttypestring_DoubleChip,defecttypestring_ExcessSolder,defecttypestring_ForeignMaterial,...,Subtype_M,Subtype_N,Subtype_O,Subtype_P,Subtype_R,Subtype_S,Subtype_T,Subtype_U,Subtype_X,Subtype_Y
0,16,9,5,0,0,0,0,0,0,0,...,0,0,0,0,1,0,0,0,0,0
1,16,9,5,0,0,0,0,0,0,0,...,0,0,0,0,1,0,0,0,0,0
2,16,9,5,0,0,0,0,0,0,0,...,0,0,0,0,1,0,0,0,0,0
3,16,9,5,0,0,0,0,0,0,0,...,0,0,0,0,1,0,0,0,0,0
4,16,9,5,0,0,0,0,0,0,0,...,0,0,0,0,1,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
6749,190,190,20,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,1,0
6750,300,102,29,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,1,0
6751,28,44,21,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,1,0
6752,38,22,23,0,0,0,0,0,0,0,...,0,0,0,0,0,0,1,0,0,0


In [14]:
df_product_specs = pd.merge(df_merged_vpl_and_defects[["defecttypestring", "Material type", "Position type", "Package type", "Lead type", "Pitch", "Subtype"]], df_defects[["defect_type", "defect_rate"]], left_on="defecttypestring", right_on="defect_type")
# drop returns a new dataframe, so the result has to be assigned back
df_product_specs = df_product_specs.drop(columns=["defect_type"])
df_product_specs = df_product_specs.rename(columns={"defect_rate": "defect_rate_per_defect_type"})
df_product_specs

Unnamed: 0,defecttypestring,Material type,Position type,Package type,Lead type,Pitch,Subtype,defect_rate_per_defect_type
0,Missing,P,D,SO,J3,X,X,0.260415
1,Missing,P,D,SO,J3,X,X,0.260415
2,Missing,P,D,SO,J3,X,X,0.260415
3,Missing,P,D,SO,J3,X,X,0.260415
4,Missing,P,D,SO,J3,X,X,0.260415
...,...,...,...,...,...,...,...,...
915942,DoubleChip,C,D,XD,R2,X,J,0.000069
915943,DoubleChip,C,D,XD,R2,X,R,0.000069
915944,DoubleChip,C,D,XD,R2,X,R,0.000069
915945,DoubleChip,C,D,XD,R2,X,R,0.000069


In [38]:
df_product_specifications = df_merged_vpl_and_defects[["defecttypestring", "Material type", "Position type", "Package type", "Lead type", "Pitch", "Subtype"]].value_counts().reset_index()
df_product_specifications = df_product_specifications.rename(columns={"defecttypestring": "defect_type", 0: "product_spec_count"})
df_product_specifications


Unnamed: 0,defect_type,Material type,Position type,Package type,Lead type,Pitch,Subtype,product_spec_count
0,InsufficientSolder,C,D,XD,R2,X,R,46807
1,Missing,C,D,XD,R2,X,R,46707
2,InsufficientSolder,C,D,XD,R2,X,C,33052
3,InsufficientSolder,P,D,SO,G3,X,T,28556
4,Missing,C,D,XD,R2,X,C,25783
...,...,...,...,...,...,...,...,...
4225,Damage,P,D,SO,G4,F,X,1
4226,Tomstone,P,D,SO,F6,M,T,1
4227,InsufficientSolder,P,D,CY,F2,X,X,1
4228,LiftedPackage,P,Q,FP,N21,M,X,1


In [16]:
df_merged_vpl_and_defects["Material type"].value_counts()

P    572469
C    240828
X     89731
M     13206
L      5088
Name: Material type, dtype: int64

In [39]:
df_material_type = df_merged_vpl_and_defects[["defecttypestring", "Material type"]].value_counts().reset_index()
df_material_type = df_material_type.rename(columns={"defecttypestring": "defect_type", 0: "material_type_defects"})
df_material_type = pd.merge(df_defects[["defect_type", "defect_rate", "amount"]], df_material_type, on="defect_type", how="outer")
df_material_type = df_material_type.rename(columns={"amount": "total_defects"})

df_material_type["probablity_material_defect"] = df_material_type["material_type_defects"] / df_material_type["total_defects"] * 100 
df_material_type["defect_rate_material_type"] = df_material_type["probablity_material_defect"] * df_material_type["defect_rate"]
df_material_type
# df_product_specifications = df_material_type[["defect_type", "Material type", "defect_rate_material_type"]]
# df_product_specifications

Unnamed: 0,defect_type,defect_rate,total_defects,Material type,material_type_defects,probablity_material_defect,defect_rate_material_type
0,InsufficientSolder,0.511331,393691,P,252573,64.155137,32.804502
1,InsufficientSolder,0.511331,393691,C,93495,23.748320,12.143249
2,InsufficientSolder,0.511331,393691,X,42122,10.699254,5.470859
3,InsufficientSolder,0.511331,393691,M,5229,1.328199,0.679149
4,InsufficientSolder,0.511331,393691,L,272,0.069090,0.035328
...,...,...,...,...,...,...,...
104,ChipFlying,0.000203,156,P,44,28.205128,0.005715
105,ChipFlying,0.000203,156,X,9,5.769231,0.001169
106,PinHole,0.000170,131,P,131,100.000000,0.017014
107,DoubleChip,0.000069,53,C,53,100.000000,0.006884
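
A note on the units of the last two columns. `probablity_material_defect` and `defect_rate` are both percentages, so their product is 100 times larger than a percentage of the parts ordered. The sketch below re-derives the first row (InsufficientSolder, material type P) from counts shown in this notebook. Dividing the product by 100 is a suggested correction, not something the notebook does.

```python
import math

total_parts_ordered = 76993396.731  # sum of "total amount"
insufficient_solder = 393691        # all InsufficientSolder defects
insufficient_solder_p = 252573      # of which material type P

share_pct = insufficient_solder_p / insufficient_solder * 100   # about 64.16 %
rate_pct = insufficient_solder / total_parts_ordered * 100      # about 0.511 %

product = share_pct * rate_pct                                  # about 32.80, as in the table
direct_pct = insufficient_solder_p / total_parts_ordered * 100  # about 0.328 %

print(round(product, 3), round(direct_pct, 3))
print(math.isclose(product / 100, direct_pct))
```

Even after the correction, the denominator is the number of all parts ordered, so the value is the contribution of material type P to the overall InsufficientSolder rate, not the defect rate among parts of material type P.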


In [40]:
df_position_type = df_merged_vpl_and_defects[["defecttypestring", "Position type"]].value_counts().reset_index()
df_position_type = df_position_type.rename(columns={"defecttypestring": "defect_type", 0: "position_type_defects"})
df_position_type = pd.merge(df_defects[["defect_type", "defect_rate", "amount"]], df_position_type, on="defect_type", how="outer")
df_position_type = df_position_type.rename(columns={"amount": "total_defects"})

df_position_type["probablity_position_type_defect"] = df_position_type["position_type_defects"] / df_position_type["total_defects"] * 100
df_position_type["defect_rate_position_type"] = df_position_type["probablity_position_type_defect"] * df_position_type["defect_rate"] 
df_position_type
# df_product_specifications = pd.merge(df_product_specifications, df_position_type[["defect_type", "Position type", "defect_rate_position_type"]], on="defect_type")
# df_product_specifications

Unnamed: 0,defect_type,defect_rate,total_defects,Position type,position_type_defects,probablity_position_type_defect,defect_rate_position_type
0,InsufficientSolder,0.511331,393691,D,309551,78.627909,40.204876
1,InsufficientSolder,0.511331,393691,S,47916,12.170967,6.223391
2,InsufficientSolder,0.511331,393691,Q,19816,5.033389,2.573727
3,InsufficientSolder,0.511331,393691,X,7227,1.835704,0.938652
4,InsufficientSolder,0.511331,393691,E,4098,1.040918,0.532253
...,...,...,...,...,...,...,...
144,ChipFlying,0.000203,156,Q,1,0.641026,0.000130
145,PinHole,0.000170,131,D,80,61.068702,0.010391
146,PinHole,0.000170,131,Q,51,38.931298,0.006624
147,DoubleChip,0.000069,53,D,53,100.000000,0.006884


In [41]:
df_package_type = df_merged_vpl_and_defects[["defecttypestring", "Package type"]].value_counts().reset_index()
df_package_type = df_package_type.rename(columns={"defecttypestring": "defect_type", 0: "package_type_defects"})
df_package_type = pd.merge(df_defects[["defect_type", "defect_rate", "amount"]], df_package_type, on="defect_type", how="outer")
df_package_type = df_package_type.rename(columns={"amount": "total_defects"})
df_package_type["probablity_package_type_defect"] = df_package_type["package_type_defects"] / df_package_type["total_defects"] * 100
df_package_type["defect_rate_package_type"] = df_package_type["probablity_package_type_defect"] * df_package_type["defect_rate"] 
df_package_type
# df_product_specifications = pd.merge(df_product_specifications, df_package_type[["defect_type", "Package type", "defect_rate_package_type"]], on="defect_type")
# df_product_specifications

Unnamed: 0,defect_type,defect_rate,total_defects,Package type,package_type_defects,probablity_package_type_defect,defect_rate_package_type
0,InsufficientSolder,0.511331,393691,SO,159007,40.388782,20.652031
1,InsufficientSolder,0.511331,393691,XD,110815,28.147710,14.392793
2,InsufficientSolder,0.511331,393691,XC,62542,15.886063,8.123034
3,InsufficientSolder,0.511331,393691,XS,23900,6.070751,3.104162
4,InsufficientSolder,0.511331,393691,FP,14306,3.633814,1.858081
...,...,...,...,...,...,...,...
216,PinHole,0.000170,131,XD,76,58.015267,0.009871
217,PinHole,0.000170,131,FP,51,38.931298,0.006624
218,PinHole,0.000170,131,SO,4,3.053435,0.000520
219,DoubleChip,0.000069,53,XD,53,100.000000,0.006884


In [20]:
df_merged_vpl_and_defects["Lead type"].value_counts()

R2      252538
G8       71529
G3       66272
C2       41536
G5       29479
         ...  
B320         1
B449         1
N47          1
L63          1
X12          1
Name: Lead type, Length: 258, dtype: int64

In [21]:
df_lead_type = df_merged_vpl_and_defects[["defecttypestring", "Lead type"]].value_counts().reset_index()
df_lead_type = df_lead_type.rename(columns={"defecttypestring": "defect_type", 0: "lead_type_defects"})
df_lead_type = pd.merge(df_defects[["defect_type", "defect_rate", "amount"]], df_lead_type, on="defect_type", how="outer")
df_lead_type = df_lead_type.rename(columns={"amount": "total_defects"})
df_lead_type["probability_lead_type_defect"] = df_lead_type["lead_type_defects"] / df_lead_type["total_defects"] * 100
df_lead_type["defect_rate_lead_type"] = df_lead_type["probability_lead_type_defect"] * df_lead_type["defect_rate"]
df_lead_type
# df_product_specifications = pd.merge(df_product_specifications, df_lead_type[["defect_type", "Lead type", "defect_rate_lead_type"]])
# df_product_specifications

Unnamed: 0,defect_type,Material type,defect_rate_material_type,Position type,defect_rate_position_type,Package type,defect_rate_package_type,Lead type,defect_rate_lead_type
0,InsufficientSolder,P,32.804502,D,40.204876,SO,20.652031,R2,12.389244
1,InsufficientSolder,P,32.804502,D,40.204876,SO,20.652031,G8,4.340632
2,InsufficientSolder,P,32.804502,D,40.204876,SO,20.652031,G3,4.201919
3,InsufficientSolder,P,32.804502,D,40.204876,SO,20.652031,F4,2.843880
4,InsufficientSolder,P,32.804502,D,40.204876,SO,20.652031,G5,1.680404
...,...,...,...,...,...,...,...,...,...
734483,PinHole,P,0.017014,Q,0.006624,SO,0.000520,R2,0.004026
734484,PinHole,P,0.017014,Q,0.006624,SO,0.000520,N12,0.000779
734485,PinHole,P,0.017014,Q,0.006624,SO,0.000520,C2,0.000520
734486,DoubleChip,C,0.006884,D,0.006884,XD,0.006884,R2,0.006884


In [22]:
df_pitch = df_merged_vpl_and_defects[["defecttypestring", "Pitch"]].value_counts().reset_index()
df_pitch = df_pitch.rename(columns={"defecttypestring": "defect_type", 0: "pitch_defects"})
df_pitch = pd.merge(df_defects[["defect_type", "defect_rate", "amount"]], df_pitch, on="defect_type", how="outer")
df_pitch = df_pitch.rename(columns={"amount": "total_defects"})
df_pitch["probability_pitch_defect"] = df_pitch["pitch_defects"] / df_pitch["total_defects"] * 100
df_pitch["defect_rate_pitch"] = df_pitch["probability_pitch_defect"] * df_pitch["defect_rate"]
df_pitch
# df_product_specifications = pd.merge(df_product_specifications, df_pitch[["defect_type", "Pitch", "defect_rate_pitch"]])
# df_product_specifications

Unnamed: 0,defect_type,Material type,defect_rate_material_type,Position type,defect_rate_position_type,Package type,defect_rate_package_type,Lead type,defect_rate_lead_type,Pitch,defect_rate_pitch
0,InsufficientSolder,P,32.804502,D,40.204876,SO,20.652031,R2,12.389244,X,32.695661
1,InsufficientSolder,P,32.804502,D,40.204876,SO,20.652031,R2,12.389244,N,4.180618
2,InsufficientSolder,P,32.804502,D,40.204876,SO,20.652031,R2,12.389244,M,4.096975
3,InsufficientSolder,P,32.804502,D,40.204876,SO,20.652031,R2,12.389244,H,3.572384
4,InsufficientSolder,P,32.804502,D,40.204876,SO,20.652031,R2,12.389244,F,3.556409
...,...,...,...,...,...,...,...,...,...,...,...
8162968,PinHole,P,0.017014,Q,0.006624,SO,0.000520,C2,0.000520,X,0.010391
8162969,PinHole,P,0.017014,Q,0.006624,SO,0.000520,C2,0.000520,H,0.005845
8162970,PinHole,P,0.017014,Q,0.006624,SO,0.000520,C2,0.000520,M,0.000779
8162971,DoubleChip,C,0.006884,D,0.006884,XD,0.006884,R2,0.006884,X,0.006884


In [24]:
df_subtype = df_merged_vpl_and_defects[["defecttypestring", "Subtype"]].value_counts().reset_index()
df_subtype = df_subtype.rename(columns={"defecttypestring": "defect_type", 0: "subtype_defects"})
df_subtype = pd.merge(df_defects[["defect_type", "defect_rate", "amount"]], df_subtype, on="defect_type", how="outer")
df_subtype = df_subtype.rename(columns={"amount": "total_defects"})
df_subtype["probability_subtype_defect"] = df_subtype["subtype_defects"] / df_subtype["total_defects"] * 100
df_subtype["defect_rate_subtype"] = df_subtype["probability_subtype_defect"] * df_subtype["defect_rate"]
df_subtype

Unnamed: 0,defect_type,defect_rate,total_defects,Subtype,subtype_defects,probability_subtype_defect,defect_rate_subtype
0,InsufficientSolder,0.511331,393691,X,125239,31.811497,16.266200
1,InsufficientSolder,0.511331,393691,T,56143,14.260677,7.291924
2,InsufficientSolder,0.511331,393691,R,55882,14.194381,7.258025
3,InsufficientSolder,0.511331,393691,H,40298,10.235946,5.233955
4,InsufficientSolder,0.511331,393691,C,38429,9.761209,4.991207
...,...,...,...,...,...,...,...
388,PinHole,0.000170,131,B,76,58.015267,0.009871
389,PinHole,0.000170,131,X,55,41.984733,0.007143
390,DoubleChip,0.000069,53,R,52,98.113208,0.006754
391,DoubleChip,0.000069,53,J,1,1.886792,0.000130
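
The subtype calculation above can be checked by hand on a small made-up frame. The sketch below is not the notebook's data: the `toy` frame and its values are hypothetical, and it assumes that `defect_rate` is each defect type's share of all recorded defects, which is a guess about how `df_defects` was built earlier in the notebook.

```python
import pandas as pd

# Hypothetical toy data: one row per recorded defect (not the Variass data).
toy = pd.DataFrame({
    "defecttypestring": ["InsufficientSolder"] * 3 + ["PinHole"],
    "Subtype": ["X", "X", "T", "B"],
})

# Total defects per defect type, plus the ASSUMED defect_rate (share of all defects).
totals = (
    toy.groupby("defecttypestring").size()
    .reset_index(name="total_defects")
    .rename(columns={"defecttypestring": "defect_type"})
)
totals["defect_rate"] = totals["total_defects"] / totals["total_defects"].sum()

# Defects per (defect type, Subtype) combination.
per_subtype = (
    toy.groupby(["defecttypestring", "Subtype"]).size()
    .reset_index(name="subtype_defects")
    .rename(columns={"defecttypestring": "defect_type"})
)

# Same two derived columns as in the cell above.
result = per_subtype.merge(totals, on="defect_type", how="outer")
result["probability_subtype_defect"] = result["subtype_defects"] / result["total_defects"] * 100
result["defect_rate_subtype"] = result["probability_subtype_defect"] * result["defect_rate"]
print(result)
```

Under that assumption, `probability_subtype_defect` is a percentage and `defect_rate` a fraction, so `defect_rate_subtype` reads as the percentage of all defects that belong to a given defect type and subtype. In this toy frame the column sums to 100.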


In [25]:
from functools import reduce

# df_pitch = df_merged_vpl_and_defects[["defecttypestring", "Pitch"]].value_counts().reset_index()
# df_pitch = df_pitch.rename(columns={"defecttypestring": "defect_type", 0: "pitch_defects"})

# df_subtype = df_merged_vpl_and_defects[["defecttypestring", "Subtype"]].value_counts().reset_index()
# df_subtype = df_subtype.rename(columns={"defecttypestring": "defect_type", 0: "subtype_defects"})

# # dfs = [df_defects[["defect_type", "amount"]], df_material_type, df_position_type, df_package_type, df_lead_type, df_pitch, df_subtype]
# # # cumulatively merge df_defects with the 6 product specifications value_counts results using functools.reduce
# # # source: https://stackoverflow.com/questions/44327999/how-to-merge-multiple-dataframes 
# # df_product_specifications = reduce(lambda left,right: pd.merge(left,right,on="defect_type"), dfs)
# # df_product_specifications = df_product_specifications.rename(columns={"amount": "total_defects"})



# df_position_type = pd.merge(df_defects[["defect_type", "amount"]], df_position_type, on="defect_type", how="outer")
# df_position_type = df_position_type.rename(columns={"amount": "total_defects"})
# df_position_type["defect_rate_position_type"] = df_position_type["position_type_defects"] / df_position_type["total_defects"] * 100
# df_position_type[["Position type", "defect_type", "total_defects", "position_type_defects", "defect_rate_position_type"]]

# df_package_type = pd.merge(df_defects[["defect_type", "amount"]], df_package_type, on="defect_type", how="outer")
# df_package_type = df_package_type.rename(columns={"amount": "total_defects"})
# df_package_type["defect_rate_package_type"] = df_package_type["package_type_defects"] / df_package_type["total_defects"] * 100
# df_package_type[["Package type", "defect_type", "total_defects", "package_type_defects", "defect_rate_package_type"]]

# df_product_specifications = pd.merge(df_defects[["defect_type", "amount"]], df_material_type, on="defect_type") # merge material_type with total defects
# df_product_specifications = pd.merge(df_product_specifications, df_position_type, on="defect_type") # merge position_type with total defects
# df_product_specifications = pd.merge(df_product_specifications, df_package_type, on="defect_type") # merge package_type with total defects
# df_product_specifications = pd.merge(df_product_specifications, df_lead_type, on="defect_type") # merge lead_type with total defects
# # df_product_specifications = pd.merge(df_product_specifications, df_pitch, on="defect_type", how="outer") # merge pitch with total defects
# # df_product_specifications = pd.merge(df_product_specifications, df_subtype, on="defect_type", how="outer") # merge subtype with total defects
# df_product_specifications = df_product_specifications.fillna(0) # fillna returns a new frame, so assign it back
# df_product_specifications = df_product_specifications.rename(columns={"amount": "total_defects"})

# # calculate the defect rate per defect type per product specification
# df_product_specifications

# df_material_type = pd.merge(df_defects[["defect_type", "amount"]], df_material_type, on="defect_type")
# df_material_type = df_material_type.rename(columns={"amount": "total_defects"})
# df_material_type["defect_rate_material_type"] = df_material_type["material_type_defects"] / df_material_type["total_defects"] * 100
# df_material_type[["Material type", "defect_type", "total_defects", "material_type_defects", "defect_rate_material_type"]]
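
The commented-out `functools.reduce` merge above can be tried on toy frames. The sketch below uses hypothetical frames and values (`df_totals`, `df_material`, `df_position` are made up for illustration). It also shows a side effect worth knowing: when each frame holds several rows per `defect_type`, merging on `defect_type` alone pairs every row of one frame with every row of the other for that key, so row counts multiply. This may explain the size of the long table printed earlier.

```python
from functools import reduce

import pandas as pd

# Hypothetical per-specification counts (toy values, not the Variass data).
df_totals = pd.DataFrame({"defect_type": ["PinHole"], "total_defects": [4]})
df_material = pd.DataFrame({
    "defect_type": ["PinHole", "PinHole"],
    "Material type": ["P", "C"],
    "material_type_defects": [3, 1],
})
df_position = pd.DataFrame({
    "defect_type": ["PinHole", "PinHole", "PinHole"],
    "Position type": ["D", "Q", "S"],
    "position_type_defects": [2, 1, 1],
})

# Cumulatively merge all frames on the shared key, left to right.
dfs = [df_totals, df_material, df_position]
merged = reduce(lambda left, right: pd.merge(left, right, on="defect_type"), dfs)

# One key with 1, 2 and 3 rows in the three frames gives 1 * 2 * 3 = 6 rows.
print(len(merged))
print(merged)
```

If one row per specification value is wanted instead, keeping each specification in its own frame (as the active `df_subtype` cell does) avoids the multiplication.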

In [26]:
# df_position_type = df_merged_vpl_and_defects[["defecttypestring", "Position type"]].value_counts().reset_index()
# df_position_type = df_position_type.rename(columns={"defecttypestring": "defect_type", 0: "position_type_defects"})
# df_position_type = pd.merge(df_defects[["defect_type", "amount"]], df_position_type, on="defect_type")
# df_position_type = df_position_type.rename(columns={"amount": "total_defects"})
# df_position_type["defect_rate_position_type"] = df_position_type["position_type_defects"] / df_position_type["total_defects"] * 100
# df_position_type[["Position type", "defect_type", "total_defects", "position_type_defects", "defect_rate_position_type"]]
# df_position_type