
# Automated Optical Inspection at Variass
Help Variass predict the defect rates of assembled products based on their specifications.

The goal is to predict the quality (rate of defects per defect type) for a given Valor code, or for a list of Valor codes, that is, the bill of materials of a completely new product.

## Step 1: Introduction

In [1]:
import pandas as pd
import numpy as np

Read the order picklist and VPL list Excel files (this takes around 31 s) and convert them to CSV for better performance. The AOI defects file is already a CSV and is read directly in the next cell.

In [None]:
file_path_order = "data/initial/Order picklists.xlsx"
file_path_vpl = "data/initial/VPL list.xlsx"
pd.read_excel(file_path_order).to_csv("data/orders.csv", index=False)
pd.read_excel(file_path_vpl).to_csv("data/vpl_codes.csv", index=False)

Read the new CSV files, together with the AOI defects CSV, into three dataframes.

In [2]:
df_aoi_defects = pd.read_csv("data/initial/AOI defects last year.csv", index_col="id")
df_orders = pd.read_csv("data/orders.csv")
df_vpl = pd.read_csv("data/vpl_codes.csv")

Additional info on the defects dataframe:
- *defecttypestring* = type of defect
- *reviewed* = result of the manual inspection: 2 means the 3D defect is approved, 1 means the 3D defect is disapproved (false call)
- *refid* = reference identification, which is the position on the PCBA
- *partnumber* = link to the order picklist and the VPL list

Additional info on the order list:
- *Order* = order number; each row links an order to a part needed for that order and its total amount
- *part number* = the specific part needed for that assembly order
- *total amount* = total number of parts needed to fulfil the order

In [3]:
df_merged_order_vpl = pd.merge(df_orders, df_vpl[["partnumber", "VPLpackage"]], left_on="part number", right_on="partnumber")
df_merged_vpl_and_defects = pd.merge(df_vpl[["partnumber", "VPLpackage"]], df_aoi_defects[["reviewed", "defecttypestring", "refid", "partnumber"]], left_on="partnumber", right_on="partnumber")
print("merged orders and vpl")
print(df_merged_order_vpl.head(3))
print("merged vpl and defects")
print(df_merged_vpl_and_defects.head(3))

merged orders and vpl
     Order part number  total amount partnumber              VPLpackage
0  1036933   CAP-00693         504.0  CAP-00693  PZXC-L12/HH-L198W52T53
1  1038499   CAP-00693         200.0  CAP-00693  PZXC-L12/HH-L198W52T53
2  1038502   CAP-00693         200.0  CAP-00693  PZXC-L12/HH-L198W52T53
merged vpl and defects
  partnumber            VPLpackage  reviewed defecttypestring refid
0  VVH-00021  PDSO-J3/XX-L48W46T37         1          Missing    R6
1  VVH-00021  PDSO-J3/XX-L48W46T37         1          Missing    R6
2  VVH-00021  PDSO-J3/XX-L48W46T37         1          Missing    R6
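
The merge above keeps both `part number` and `partnumber` in the result, and an inner join silently drops orders whose part has no VPL entry. The following is a minimal sketch on made-up toy data (the part numbers and package names are hypothetical) of one way to avoid the duplicate key column and to check that each part number occurs only once in the VPL list.

```python
import pandas as pd

# Toy stand-ins for df_orders and df_vpl (all values are made up for illustration).
orders = pd.DataFrame({
    "Order": [1001, 1002, 1003],
    "part number": ["CAP-1", "CAP-1", "RES-9"],
    "total amount": [504.0, 200.0, 50.0],
})
vpl = pd.DataFrame({
    "partnumber": ["CAP-1", "RES-7"],
    "VPLpackage": ["PKG-A", "PKG-B"],
})

# Rename the key so both frames share one column name; the result then
# carries a single "partnumber" column instead of two.
merged = orders.rename(columns={"part number": "partnumber"}).merge(
    vpl,
    on="partnumber",
    how="inner",             # the default: rows without a match are dropped
    validate="many_to_one",  # raises MergeError if a part number repeats in vpl
)

print(merged)
print("orders dropped by the inner join:", len(orders) - len(merged))
```

Whether dropped orders matter depends on how the merged frame is used later; the count is printed here only to make the effect of the inner join visible.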


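Value counts on the merged dataframe of VPL codes and defects show that *reviewed* takes the values 2, 1 and 0. The rows with *reviewed* = 0 are filtered out in the same cell.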

In [None]:
print(df_merged_vpl_and_defects.value_counts(subset=["reviewed"]))
df_merged_vpl_and_defects = df_merged_vpl_and_defects[df_merged_vpl_and_defects["reviewed"] != 0]
df_merged_vpl_and_defects.value_counts(subset=["reviewed"])

reviewed
2           686190
1           228998
0             6134
dtype: int64


reviewed
2           686190
1           228998
dtype: int64

## Step 3: Defect rates
In this step, you will make descriptive estimations of the defect rates without considering any product specifications.
- (a) Find the defect rate returned by the automated inspection system, overall and per defect type.
- (b) Find the true- and false-positive rates, overall and per defect type, based on the information from both automated and manual inspections.

The outcome of the manual review of the defects found by the 3D machine is stored in the column 'reviewed'. Starting with this column, let's use value_counts to check how often 2 (approved after inspection) and 1 (disapproved after inspection) occur in the dataset.

In [4]:
df_merged_vpl_and_defects.value_counts(subset=["reviewed"])

reviewed
2           686190
1           228998
0             6134
dtype: int64

The value counts show something interesting: the reviewed column also contains 0 values. We assume that 0 means the defect has not been reviewed yet. To calculate the overall defect rate, the length of the merged dataframe of VPL codes and defects is divided by the sum of the total amount column in df_orders and multiplied by 100.

In [94]:
total_amount_parts_ordered = df_orders["total amount"].sum()
print("Total amount of orders", total_amount_parts_ordered)
# overall defect rate reported by the automated inspection system (rows with reviewed = 0 are included)
defect_rate_overall = (len(df_merged_vpl_and_defects) / total_amount_parts_ordered) * 100
print("Defect rate returned by the automated inspection system overall is", round(defect_rate_overall, 3), "%")

Total amount of orders 76993396.73099999
Defect rate returned by the automated inspection system overall is 1.197 %
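
As a cross-check, the overall rate can be recomputed from the figures already printed in this notebook. The sketch below uses a small helper, `defect_rate_percent`, which is a hypothetical name introduced here for illustration. The defect count is the sum of the three `reviewed` counts, on the assumption that the rows with reviewed = 0 are still part of the dataframe when the rate is computed.

```python
def defect_rate_percent(n_defects: int, n_parts: float) -> float:
    """Return defects as a percentage of parts; rejects a non-positive denominator."""
    if n_parts <= 0:
        raise ValueError("n_parts must be positive")
    return n_defects / n_parts * 100


# Row counts from the value_counts output: reviewed = 2, 1 and 0.
n_defects = 686190 + 228998 + 6134
# Sum of the "total amount" column, as printed by the cell above.
n_parts = 76993396.731

overall = defect_rate_percent(n_defects, n_parts)
print(round(overall, 3))
```

With these inputs the helper gives 1.197 after rounding, which agrees with the printed result only when the 6134 unreviewed rows are counted as defects.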


Based on the number of defects and the total order amount, the automated inspection system flags about 1.2% of the 76,993,396.73 parts ordered as defective. Next, the value_counts method is applied to the defecttypestring column to count the number of occurrences per defect type. The result is placed in a dataframe called df_defects. The defect rate per defect type is then calculated by dividing the count for each defect type by the total number of parts ordered and multiplying by 100.

In [131]:
defect_types = df_merged_vpl_and_defects.value_counts(subset=["defecttypestring"])
# source: https://stackoverflow.com/questions/47136436/python-pandas-convert-value-counts-output-to-dataframe
df_defects = pd.DataFrame(defect_types)
df_defects = df_defects.reset_index()
df_defects.columns = ["defect_type", "amount"]
# calculate the defect rate per defect type
df_defects["defect_rate"] = (df_defects["amount"] / total_amount_parts_ordered) * 100
df_defects

|  | defect_type | amount | defect_rate |
|---|---|---|---|
| 0 | InsufficientSolder | 393691 | 0.511331 |
| 1 | Missing | 200502 | 0.260415 |
| 2 | WrongPart | 104381 | 0.135571 |
| 3 | LiftedLead | 59554 | 0.077349 |
| 4 | Bridge | 32489 | 0.042197 |
| 5 | LiftedPackage | 26018 | 0.033793 |
| 6 | Polarity | 22641 | 0.029406 |
| 7 | Shift | 21437 | 0.027843 |
| 8 | Tomstone | 15710 | 0.020404 |
| 9 | NoSolder | 11843 | 0.015382 |
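
The conversion from value counts to a dataframe with named columns can also be written as one chain. This is a sketch on toy data, not the notebook's own cell; `total_parts` and the counts are made-up stand-ins.

```python
import pandas as pd

# Toy defect log; the labels mirror the notebook, the rows are made up.
defects = pd.DataFrame({
    "defecttypestring": ["Missing", "Bridge", "Missing", "Shift", "Missing", "Bridge"],
})
total_parts = 1200  # hypothetical stand-in for total_amount_parts_ordered

rates = (
    defects["defecttypestring"]
    .value_counts()               # counts per label, sorted descending
    .rename_axis("defect_type")   # name the index before it becomes a column
    .reset_index(name="amount")   # Series -> DataFrame with an "amount" column
)
rates["defect_rate"] = rates["amount"] / total_parts * 100
print(rates)
```

Naming the index and the value column explicitly avoids relying on the default column names, which differ between pandas versions.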


According to the AOI defect type Excel file there are 37 different defect types, of which 25 are observed here. Why 25? The value counts contain 27 distinct labels, but at index 10 and 13 of the full value counts result we find the Dutch word for "shortage" in two spellings, Tekort and tekort, and the AOI defect type Excel file has no defect type with that name (or its English translation). In addition, the Excel file marks Missing, Shift, InsufficientSolder and LiftedPackage as the 'most used (in green)' defects, but in this dataframe WrongPart is also flagged quite often by the AOI machine. Next, we look at the true- and false-positive rates.
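
Labels that differ only in capitalisation, such as Tekort and tekort, are counted as separate defect types. A minimal sketch on toy data of how such variants could be folded together before counting, assuming the two spellings really denote the same defect:

```python
import pandas as pd

# Toy labels; only the Tekort/tekort spellings come from the notebook.
labels = pd.Series(["Tekort", "tekort", "Missing", "Shift", "tekort"])

# Case variants are counted separately when the labels are used as-is.
print(labels.nunique())

# Folding case merges "Tekort" and "tekort" into one label.
normalized = labels.str.casefold()
print(normalized.value_counts())
```

Note that folding case also changes the other labels (Missing becomes missing), so a mapping back to the official spelling would be needed before joining with the defect type file.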

As stated on slide 14 of the Variass PowerPoint presentation, the AOI machine inspects every part. When it observes a defect, a manual inspection checks whether the machine was right. If the machine is correct, the manual inspection approves the defect (reviewed = 2 = true positive); if the machine is incorrect, the manual inspection disapproves it (reviewed = 1 = false positive).

Let's start by finding the true- and false-positive rates overall.

In [132]:
# get the number of rows with reviewed 1 or 2, excluding the 0
total_true_false_positives = len(df_merged_vpl_and_defects[df_merged_vpl_and_defects["reviewed"] != 0])
false_postives = len(df_merged_vpl_and_defects[df_merged_vpl_and_defects["reviewed"] == 1])
true_positives = len(df_merged_vpl_and_defects[df_merged_vpl_and_defects["reviewed"] == 2])
assert false_postives + true_positives == total_true_false_positives
false_positive_rate = (false_postives / total_true_false_positives) * 100
true_positive_rate = (true_positives / total_true_false_positives) * 100
print("False positive rate overall (reviewed = 1 = inspection disapproves):", round(false_positive_rate, 3), "%")
print("True positive rate overall (reviewed = 2 = inspection approves):", round(true_positive_rate, 3), "%")
# report the reviewed counts directly: the rates are shares of flagged defects, not of ordered parts
print("Correct defects found by the machine", true_positives, "defects")
print("Incorrect defects found by the machine", false_postives, "defects")

False positive rate overall (reviewed = 1 = inspection disapproves): 25.022 %
True positive rate overall (reviewed = 2 = inspection approves): 74.978 %
Correct defects found by the machine 686190 defects
Incorrect defects found by the machine 228998 defects
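
The overall shares can be cross-checked from the counts alone. The sketch below rebuilds them from the two `reviewed` counts shown in the value_counts output earlier in the notebook; the variable names are chosen here for illustration.

```python
import pandas as pd

# Counts of manually reviewed flags, taken from the value_counts output
# (reviewed = 2: approved, reviewed = 1: disapproved).
reviewed_counts = pd.Series({2: 686190, 1: 228998}, name="n_flags")

# Share of each outcome among all reviewed flags, in percent.
share = (reviewed_counts / reviewed_counts.sum() * 100).round(3)
print(share)
```

Both percentages are taken over the defects the machine flagged, not over all inspected parts. In standard terminology they correspond to the precision of the machine and its complement (the false discovery rate), since parts that were never flagged do not appear in this dataset.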


The true- and false-positive rates of the AOI machine, about 75% and 25% respectively, are calculated by dividing the number of true or false positives by the number of reviewed defects and multiplying by 100. At first sight, a machine that is right about 75% of the defects it flags, across 25 defect types, looks reasonably accurate. On the other hand, one in four flagged defects (228,998 in this dataset) is a false call that still had to be inspected manually, which is a lot of unnecessary inspections. So the picture is mixed. Let's look further at the true and false positives per defect type.

In [133]:
false_positives_counts = df_merged_vpl_and_defects[df_merged_vpl_and_defects["reviewed"] == 1].value_counts(subset=["defecttypestring"])
df_false_positives = pd.DataFrame(false_positives_counts)
df_false_positives = df_false_positives.reset_index()
df_false_positives.columns = ["defect_type", "false_positives"]
df_false_positives["false_positive_rate"] = (df_false_positives["false_positives"] / false_postives) * 100 
print("Number of defect types having false positive reviews are", len(df_false_positives))
df_false_positives

Number of defect types having false positive reviews are 26


|  | defect_type | false_positives | false_positive_rate |
|---|---|---|---|
| 0 | Missing | 64652 | 28.232561 |
| 1 | InsufficientSolder | 33265 | 14.526328 |
| 2 | WrongPart | 24872 | 10.86123 |
| 3 | LiftedPackage | 24297 | 10.610136 |
| 4 | Shift | 19698 | 8.601822 |
| 5 | NoSolder | 10152 | 4.433226 |
| 6 | LiftedLead | 9873 | 4.311391 |
| 7 | Tekort | 7915 | 3.456362 |
| 8 | Tomstone | 7886 | 3.443698 |
| 9 | Bridge | 7724 | 3.372955 |


Starting with the false positives per defect type: of the 37 defect types in the AOI defect types Excel file, 26 have at least one defect that the manual inspection marked as a false positive. df_defects contains 27 defect types, so exactly one defect type in df_defects has no false-positive review. Note that false_positive_rate here is each defect type's share of all false positives, not the fraction of that type's flags that were false. Let's outer merge df_defects with df_false_positives on defect_type to find out which defect type it is. We use an outer merge because it keeps defect types that have no match in the other dataframe and fills their missing cells with NaN values.

In [134]:
df_defects = pd.merge(df_defects, df_false_positives, on="defect_type", how="outer")
df_defects

|  | defect_type | amount | defect_rate | false_positives | false_positive_rate |
|---|---|---|---|---|---|
| 0 | InsufficientSolder | 393691 | 0.511331 | 33265.0 | 14.526328 |
| 1 | Missing | 200502 | 0.260415 | 64652.0 | 28.232561 |
| 2 | WrongPart | 104381 | 0.135571 | 24872.0 | 10.86123 |
| 3 | LiftedLead | 59554 | 0.077349 | 9873.0 | 4.311391 |
| 4 | Bridge | 32489 | 0.042197 | 7724.0 | 3.372955 |
| 5 | LiftedPackage | 26018 | 0.033793 | 24297.0 | 10.610136 |
| 6 | Polarity | 22641 | 0.029406 | 5553.0 | 2.424912 |
| 7 | Shift | 21437 | 0.027843 | 19698.0 | 8.601822 |
| 8 | Tomstone | 15710 | 0.020404 | 7886.0 | 3.443698 |
| 9 | NoSolder | 11843 | 0.015382 | 10152.0 | 4.433226 |
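
Instead of scanning the merged table for NaN values, the unmatched defect type can be picked out programmatically. A minimal sketch on toy data, using the `indicator` option of `merge`; the frames and counts are made up.

```python
import pandas as pd

# Toy versions of df_defects and df_false_positives (counts are made up).
all_types = pd.DataFrame({"defect_type": ["Missing", "Shift", "Bridge"], "amount": [5, 3, 1]})
false_calls = pd.DataFrame({"defect_type": ["Missing", "Shift"], "false_positives": [2, 3]})

# indicator=True adds a "_merge" column that records where each row came from:
# "both", "left_only" or "right_only".
merged = all_types.merge(false_calls, on="defect_type", how="outer", indicator=True)

# Defect types present in all_types but absent from false_calls.
unmatched = merged.loc[merged["_merge"] == "left_only", "defect_type"].tolist()
print(unmatched)
```

In this toy example the only defect type without a false-positive count is Bridge, and its `false_positives` cell is NaN after the merge.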


After the merge we can indeed see that the defect_type LiftedSolder doesn't have any false-positive reviews and is filled with NaN values. Next, we do the same for the true positives.

In [135]:
true_positives_counts = df_merged_vpl_and_defects[df_merged_vpl_and_defects["reviewed"] == 2].value_counts(subset=["defecttypestring"])
df_true_positives = pd.DataFrame(true_positives_counts)
df_true_positives = df_true_positives.reset_index()
df_true_positives.columns = ["defect_type", "true_positives"]
df_true_positives["true_positive_rate"] = (df_true_positives["true_positives"] / true_positives) * 100 
print("Number of defect types having true positive reviews are", len(df_true_positives))
df_true_positives

Number of defect types having true positive reviews are 26


|  | defect_type | true_positives | true_positive_rate |
|---|---|---|---|
| 0 | InsufficientSolder | 357939 | 52.163249 |
| 1 | Missing | 133689 | 19.482796 |
| 2 | WrongPart | 78715 | 11.471313 |
| 3 | LiftedLead | 49562 | 7.222781 |
| 4 | Bridge | 24699 | 3.59944 |
| 5 | Polarity | 16825 | 2.451945 |
| 6 | Tomstone | 7730 | 1.12651 |
| 7 | Tilt | 2932 | 0.427287 |
| 8 | Shift | 1726 | 0.251534 |
| 9 | LiftedPackage | 1715 | 0.249931 |


As with the false positives, true positives are found in 26 of the 27 defect types. The defect types with the most true-positive reviews are InsufficientSolder, Missing and WrongPart. It is interesting that InsufficientSolder has a very high true_positive_rate (>52%), which means that more than half of all true-positive reviews concern insufficient solder. So insufficient solder is quite a big problem in Variass's production and assembly line. On the positive side, the machine is usually right when it flags this defect: 357,939 InsufficientSolder flags were confirmed against 33,265 false calls. Next, let's outer merge the true positives with df_defects to finalize the descriptive estimations of the defect rates.

In [136]:
df_defects = pd.merge(df_defects, df_true_positives, on="defect_type", how="outer")
df_defects

|  | defect_type | amount | defect_rate | false_positives | false_positive_rate | true_positives | true_positive_rate |
|---|---|---|---|---|---|---|---|
| 0 | InsufficientSolder | 393691 | 0.511331 | 33265.0 | 14.526328 | 357939.0 | 52.163249 |
| 1 | Missing | 200502 | 0.260415 | 64652.0 | 28.232561 | 133689.0 | 19.482796 |
| 2 | WrongPart | 104381 | 0.135571 | 24872.0 | 10.86123 | 78715.0 | 11.471313 |
| 3 | LiftedLead | 59554 | 0.077349 | 9873.0 | 4.311391 | 49562.0 | 7.222781 |
| 4 | Bridge | 32489 | 0.042197 | 7724.0 | 3.372955 | 24699.0 | 3.59944 |
| 5 | LiftedPackage | 26018 | 0.033793 | 24297.0 | 10.610136 | 1715.0 | 0.249931 |
| 6 | Polarity | 22641 | 0.029406 | 5553.0 | 2.424912 | 16825.0 | 2.451945 |
| 7 | Shift | 21437 | 0.027843 | 19698.0 | 8.601822 | 1726.0 | 0.251534 |
| 8 | Tomstone | 15710 | 0.020404 | 7886.0 | 3.443698 | 7730.0 | 1.12651 |
| 9 | NoSolder | 11843 | 0.015382 | 10152.0 | 4.433226 | 1678.0 | 0.244539 |
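
The false_positive_rate and true_positive_rate columns divide by the totals over all defect types, so each column describes how the false or true positives are distributed across types. A complementary view, sketched below, divides within each defect type instead. The counts are copied from three rows of the merged table; the column name `false_call_share` is introduced here for illustration.

```python
import pandas as pd

# Counts copied from three rows of the merged table.
counts = pd.DataFrame(
    {
        "false_positives": [33265, 24297, 19698],
        "true_positives": [357939, 1715, 1726],
    },
    index=pd.Index(["InsufficientSolder", "LiftedPackage", "Shift"], name="defect_type"),
)

# Share of reviewed flags of each type that turned out to be false calls, in percent.
reviewed = counts["false_positives"] + counts["true_positives"]
counts["false_call_share"] = (counts["false_positives"] / reviewed * 100).round(1)
print(counts)
```

Under this view fewer than one in ten reviewed InsufficientSolder flags is a false call, while more than nine in ten reviewed LiftedPackage and Shift flags are, which the cross-type percentages do not show directly.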
