# 0. Metric

For a picture of a bookshelf, the first step in trying to determine the accuracy of 
the api and algorithm is to find the names of all the books by the human eye(***Human Eye Book Name List***), 
Then, applied API and Algorithm to get all names of books by Computing(***Detected Book Name List***).
Finally, compare the two list of book names to get a ***confusion matrix***.
### OCR Confusion Matrix
Expect OCR to return only all the text that appears in the image, not all the book names

1. Matched By Both: Text exists in both ***Human Eye Book Name List*** and ***Detected Book Name List***
2. Only by OCR: Text exists in only ***Detected Book Name List*** but not ***Human Eye Book Name List***
3. OCR Book Name Total Amount: Total number of Text in ***Detected Book Name List***
4. Only by Human: Text exists in only ***Human Eye Book Name List*** but not ***Detected Book Name List***
5. Human Eye Book Name Total Amount: Total number of Text in ***Human Eye Book Name List***
### Text Group Confusion Matrix
Expect algorithm to only group book names without sorting

1. Matched By Both: Book Names exists in both ***Human Eye Book Name List*** and ***Detected Book Name List***
2. Only by Algorithm: Book Names exists in only ***Detected Book Name List*** but not ***Human Eye Book Name List***
3. Algorithm Book Name Total Amount: Total number of Book Names in ***Detected Book Name List***
4. Only by Human: Book Names exists in only ***Human Eye Book Name List*** but not ***Detected Book Name List***
5. Human Eye Book Name Total Amount: Total number of Book Names in ***Human Eye Book Name List***

Recall: ***Matched By Both*** / number of ***Detected Book Name List***
Precision: ***Matched By Both*** / number of ***Human Eye Book Name List***

# 1. Test Different OCR api

## Prepare Human detected data and Image paths

In [2]:
from ActivePyTools.grab_data import *

file_path1 = './data/bookshelves_data.xlsx'
file_path2 = './data/bookshelves_data_round2.xlsx'

mess_sheets_df = pd.read_excel(file_path1, sheet_name=None)
clean_sheets_df = pd.read_excel(file_path2, sheet_name=None)
shelf_1 = mess_sheets_df['Bookshelves_1']
shelf_4 = mess_sheets_df['Bookshelves_4']
shelf_5 = mess_sheets_df['Bookshelves_5']
shelf_6 = clean_sheets_df['Bookshelves_6']
actual_names_1_split, actual_names_1 = collect_human_eye_book_names(shelf_1)
actual_names_4_split, actual_names_4 = collect_human_eye_book_names(shelf_4)
actual_names_5_split, actual_names_5 = collect_human_eye_book_names(shelf_5)
actual_names_6_split, actual_names_6 = collect_human_eye_book_names(shelf_6)

actual_names = [[actual_names_6_split, actual_names_6],
                [actual_names_1_split, actual_names_1],
                [actual_names_4_split, actual_names_4],
                [actual_names_5_split, actual_names_5]]
img_paths = ['./pics/IMG_7940.jpeg', 
             './pics/Bookshelves_1.jpg', 
             './pics/Bookshelves_4.jpg', 
             './pics/Bookshelves_5.jpg']

## Prepare Google and Amazon API Client

In [3]:
from ActivePyTools.bookshelf_detection_pipeline import *

amazon_client, google_client = init_clients()

## 1.0 Crop Image into small pieces

In [7]:
cropped_imgs = get_cropped_images(img_paths)

No need to be cropped


In [45]:
cropped_imgs[0][0][0][0].shape

(1448, 4031, 3)

## 1.1 Text Detection and Lines Detection

In [4]:
half_way_data = get_text_location_from_api(img_paths, actual_names, cropped_imgs, google_client, 'google', enable_print=True)
# half_way_data2 = get_text_location_from_api(img_paths, actual_names, cropped_imgs, amazon_client, 'amazon', enable_print=True)

GOOGLE OCR: 
./pics/IMG_7940.jpeg GOOGLE Text Confusion Matrix:
Matched by both                259                 
Only by OCR                    165                 
OCR Text Total Amount          424                 
Only by Human                  16                  
Human Eye Book Name Total Amount 275                 

OCR Text Recall                0.9418181818181818  
OCR Text Precision             0.6108490566037735  

./pics/IMG_7940.jpeg GOOGLE Book Names Confusion Matrix:
./pics/Bookshelves_1.jpg GOOGLE Text Confusion Matrix:
Matched by both                270                 
Only by OCR                    275                 
OCR Text Total Amount          545                 
Only by Human                  22                  
Human Eye Book Name Total Amount 292                 

OCR Text Recall                0.9246575342465754  
OCR Text Precision             0.4954128440366973  

./pics/Bookshelves_1.jpg GOOGLE Book Names Confusion Matrix:
./pics/Bookshelves_4.jpg GO

## 1.2 Object Detection
After testing the picture in Google Vision demo and Amazon Rekognition demo, neither of them provide useful result.
Google Vision API cannot detect any book object's position
Amazon Rekognition can detect only 5 book object out of 22 books.

## 1.3 Conclusion
The test shows that Google OCR Text Detection has a relatively high accuracy (93.8%) when the books are placed neatly, 
and an acceptable accuracy (79%) when books are in mess. 
Amazon OCR Text Detection sometimes ignore a part of the picture therefore result in a low accuracy, neat (24%), mess (71%).
Neither lines detection for Google nor Amazon can extract correct book names (0% for all pictures)
Object Detection also not work in most case (Of the 22 books, 0 were detected by Google and 5 by Amazon)

Therefore, we only use Google OCR Text Detection for the later analysis.

## 1.4 Save df for future use

In [5]:
# save_df_dict(img_paths, half_way_data)

## 1.5 Load Previous Data

In [4]:
df_dict = load_df_dict(img_paths)

# 2. Text Group Algorithm

Since the position of books cannot be retrieved easily, due to limited number of training data,
(Training Data require: 1. picture of bookshelf; 2. True Book Names or True Book Objects Position;) 
the best way to group them is to develop an algorithm to calculate text positions.

In [56]:
from ActivePyTools.utils import select_slope, ImportVariables
from ActivePyTools.text_group_algorithms import *

def test_hyperparameter(df_collection: dict, actual_data: list, img_path_lst: list, cropped_pics: list, iv: ImportVariables, enable_print = False):
    """
    
    :param df_collection: 
    :param actual_data: list -- list of list [list of word, list of sentences] (True Data)
    :param img_path_lst: list -- list of image paths
    :param cropped_pics: list -- list of tuple (list_of_crop, original image)
            list_of_crop with shape (image_rows, image_columns, img),
            note: img is type of numpy.ndarray with shape (height, width, 3)
    :param iv: ImportVariables -- Necessary variables includes xdt, ydt, img_shape and metric.
    :param enable_print: bool -- print or not
    :return: 
    """
    tp_num_list = []
    dict_lst = []

    if enable_print:
        print(f"Analyzing text by {iv.metric.description} ...\n")
    
    for key, text_df in df_collection.items():
        # detected_texts = text_df.txt.tolist()
        dict_of_dict = {}
        total_TP, total_FP,  = 0, 0
        total_FN = len(actual_data[key][1])

        max_row = max(text_df.crop_idx.apply(lambda coord: coord[0]))
        max_col = max(text_df.crop_idx.apply(lambda coord: coord[1]))
        
        for row in range(max_row + 1):
            dict_of_dict[row] = {}
            for col in range(max_col + 1):
                filtered_df = text_df[text_df['crop_idx'] == (max_row, max_col)]
                
                print(f"df shape: {filtered_df.shape}")
                print(f"key: {key}, row: {row}, col: {col}")
                if filtered_df.shape[0] == 0:
                    continue
                iv.set_img_shape(cropped_pics[key][0][row][col].shape)
                print(f"img shape: {iv.img_shape}")
                # Step1: Relate text by selected criteria
                related_text_dict = group_related_text(filtered_df, iv)
                # Step2: Remove subsets
                unique_text_dict = remove_duplicates(related_text_dict)
                # Step3: Group highly related text groups 
                grouped_text_dict = combine_elements(unique_text_dict, text_df, iv)
                # Step4: Extract Book Names to measure performance
                book_names = [[e[1] for e in v] for k, v in grouped_text_dict.items()]

                TP, FP, FN = update_text_group_confusion_matrix(actual_data[key][1], book_names)
                total_TP += TP
                total_FP += FP
                total_FN -= TP
                # change this line to see different dict
                dict_of_dict[row][col] = related_text_dict
        if enable_print:
            if total_TP + total_FP == 0:
                print(f"{key} -- The OCR detected book names is none")
                print(f"df shape: {text_df.shape}")
                continue
            else:
                print_confusion_matrix(total_TP, total_FP, total_FN, img_path_lst[key])

        dict_lst.append(dict_of_dict)
        tp_num_list.append(total_TP)
        
    return tp_num_list, dict_lst

In [57]:
iv_obj = ImportVariables(xdt=0.7, ydt=1.5, metric=AreaCriteria.EDWA)
tp_list, t_dict = test_hyperparameter(df_dict, actual_names, img_paths, cropped_imgs, iv_obj, True)

Analyzing text by ellipse different weight area ...

df shape: (424, 12)
key: 0, row: 0, col: 0
img shape: (1448, 4031, 3)
./pics/IMG_7940.jpeg Book Name Group Larger than 80%:
Matched by both      0                   
Only by OCR          160                 
OCR Book Name Total Amount     160                 
Only by Human        37                  
Human Eye Book Name Total Amount 37                  

OCR Book Name Recall           0.0                 
OCR Book Name Precision        0.0                 
df shape: (142, 12)
key: 1, row: 0, col: 0
img shape: (1030, 1425, 3)
df shape: (142, 12)
key: 1, row: 0, col: 1
img shape: (1030, 1426, 3)
df shape: (142, 12)
key: 1, row: 1, col: 0
img shape: (774, 1425, 3)
df shape: (142, 12)
key: 1, row: 1, col: 1
img shape: (774, 1426, 3)
./pics/Bookshelves_1.jpg Book Name Group Larger than 80%:
Matched by both      0                   
Only by OCR          196                 
OCR Book Name Total Amount     196                 
Only by Human 

In [58]:
t_dict[0][0][0]

{0: [[0, 'ated']],
 1: [[1, '808.3'], (2, 'BUC', 0.6456068986841041)],
 2: [[2, 'BUC'], (1, '808.3', 0.3485509327044479)],
 3: [[3, 'A'], (4, "Writer's", 0.4989215266290625)],
 4: [[4, "Writer's"],
  (3, 'A', 1.0588235294117647),
  (5, 'Guide', 3.005777152726901),
  (6, 'To', 0.23784374275121847)],
 5: [[5, 'Guide'],
  (4, "Writer's", 2.4750961555069266),
  (6, 'To', 1.0),
  (7, 'ACTIVE', 1.1108546883405515)],
 6: [[6, 'To'],
  (5, 'Guide', 0.5118858848640175),
  (7, 'ACTIVE', 0.4258993267504168)],
 7: [[7, 'ACTIVE'],
  (5, 'Guide', 2.6690838504885654),
  (6, 'To', 1.0),
  (8, 'SETTING', 3.9798755311371643)],
 8: [[8, 'SETTING'],
  (1, '808.3', 2.8411211398522687),
  (2, 'BUC', 1.8224987264788541),
  (7, 'ACTIVE', 4.2960291382298506)],
 9: [[9, '808.3'], (10, 'CRA', 0.19381917691412537)],
 10: [[10, 'CRA'], (9, '808.3', 0.15198218789401505)],
 11: [[11, 'CRAFTING'], (12, 'DYNAMIC', 5.036869660060965)],
 12: [[12, 'DYNAMIC'],
  (11, 'CRAFTING', 4.384995607178266),
  (13, 'DIALOGUE', 4.3

In [32]:
df_dict[1].iloc[406]

txt                                                         Jason
confidence                                                    100
vertices             [(23, 392), (26, 432), (13, 433), (10, 393)]
boundBox        {'Width': 16, 'Height': 41, 'Left': 10, 'Top':...
slopes                           (-13.333, 0.077, -13.333, 0.077)
font                                                    13.038405
word_len                                                40.112342
direction                                                vertical
center_point                                        (18.0, 412.5)
crop_idx                                                   (1, 1)
Left                                                           10
Top                                                           392
Name: 406, dtype: object

In [36]:
t_dict[1][0][0]

{446: [[446, 'PARTY'], (450, 'BOX', 0.33460989191847923)],
 447: [[447, 'UGLYDOLL']],
 448: [[448, 'GOODIES'], (449, 'IN', 1.6845617461318005)],
 449: [[449, 'IN'], (448, 'GOODIES', 2.340754098281065)],
 450: [[450, 'BOX'], (446, 'PARTY', 0.10853580349030771)]}

In [17]:
import pickle
with open('data.pickle', 'wb') as pickle_file:
    pickle.dump(t_dict, pickle_file)

# Unpickling (reading) data
with open('data.pickle', 'rb') as pickle_file:
    data_loaded = pickle.load(pickle_file)
data_loaded

[{0: [[0, 'ated']],
  1: [[1, '808.3'], (2, 'BUC', 0.6456068986841041)],
  2: [[2, 'BUC'], (1, '808.3', 0.3485509327044479)],
  3: [[3, 'A'], (4, "Writer's", 0.4989215266290625)],
  4: [[4, "Writer's"],
   (3, 'A', 1.0588235294117647),
   (5, 'Guide', 3.005777152726901),
   (6, 'To', 0.23784374275121847)],
  5: [[5, 'Guide'],
   (4, "Writer's", 2.4750961555069266),
   (6, 'To', 1.0),
   (7, 'ACTIVE', 1.1108546883405515)],
  6: [[6, 'To'],
   (5, 'Guide', 0.5118858848640174),
   (7, 'ACTIVE', 0.4258993267504167)],
  7: [[7, 'ACTIVE'],
   (5, 'Guide', 2.6690838504885654),
   (6, 'To', 1.0),
   (8, 'SETTING', 3.9798755311371643)],
  8: [[8, 'SETTING'],
   (1, '808.3', 2.8411211398522687),
   (2, 'BUC', 1.8224987264788541),
   (7, 'ACTIVE', 4.2960291382298506)],
  9: [[9, '808.3'], (10, 'CRA', 0.19381917691412537)],
  10: [[10, 'CRA'], (9, '808.3', 0.15198218789401505)],
  11: [[11, 'CRAFTING'], (12, 'DYNAMIC', 5.036869660060965)],
  12: [[12, 'DYNAMIC'],
   (11, 'CRAFTING', 4.384995607178

In [12]:
unique_text_dict = remove_duplicates(t_dict[0])
grouped_text_dict = combine_elements(unique_text_dict, df_dict[0], iv_obj)

In [13]:
grouped_text_dict

{0: [[0, 'ated']],
 4: [[4, "Writer's"],
  (3, 'A', 1.0588235294117647),
  (5, 'Guide', 3.005777152726901),
  (6, 'To', 0.23784374275121847)],
 5: [[5, 'Guide'],
  (4, "Writer's", 2.4750961555069266),
  (6, 'To', 1.0),
  (7, 'ACTIVE', 1.1108546883405515)],
 7: [[7, 'ACTIVE'],
  (5, 'Guide', 2.6690838504885654),
  (6, 'To', 1.0),
  (8, 'SETTING', 3.9798755311371643)],
 8: [[8, 'SETTING'],
  (1, '808.3', 2.8411211398522687),
  (2, 'BUC', 1.8224987264788541),
  (7, 'ACTIVE', 4.2960291382298506)],
 12: [[12, 'DYNAMIC'],
  (11, 'CRAFTING', 4.384995607178266),
  (13, 'DIALOGUE', 4.3985263626241276)],
 13: [[13, 'DIALOGUE'],
  (9, '808.3', 1.7928597485890587),
  (10, 'CRA', 0.8139479254000898),
  (12, 'DYNAMIC', 5.095165987772275)],
 14: [[14, 'STORYVILLE'],
  (20, 'TO', 1.09375),
  (21, 'WRITING', 3.65625),
  (22, 'FICTION', 3.258488930988866),
  (23, 'JOHN', 0.1675822680996222),
  (24, '!', 0.2949286065977352),
  (25, 'AN', 0.8696996510210204),
  (26, 'ILLUSTRATED', 1.8473818792658554),
  (

## 2.1 Vertical Line Group

In [17]:
tp_list1, text_dict1 = test_hyperparameter(1, 5, 'vertical line', True)

./pics/IMG_7940.jpeg Book Name Group Larger than 80%:
Matched by both      0                   
Only by OCR          103                 
Only by Human        37                  
Recall               0.0                 
Precision            0.0                 


  ## 2.2 Same Weight Ellipse Group

In [18]:
tp_list2, text_dict2 = test_hyperparameter(1, 5, 'ellipse same weight', True)

./pics/IMG_7940.jpeg Book Name Group Larger than 80%:
Matched by both      8                   
Only by OCR          159                 
Only by Human        29                  
Recall               0.21621621621621623 
Precision            0.04790419161676647 
./pics/Bookshelves_1.jpg Book Name Group Larger than 80%:
Matched by both      9                   
Only by OCR          237                 
Only by Human        49                  
Recall               0.15517241379310345 
Precision            0.036585365853658534
./pics/Bookshelves_4.jpg Book Name Group Larger than 80%:
Matched by both      4                   
Only by OCR          156                 
Only by Human        91                  
Recall               0.042105263157894736
Precision            0.025               
./pics/Bookshelves_5.jpg Book Name Group Larger than 80%:
Matched by both      5                   
Only by OCR          105                 
Only by Human        150                 
Recall          

## 2.3 Different Weight Ellipse Group


In [19]:
tp_list3, text_dict3 = test_hyperparameter(1, 5, 'ellipse different weight', True)

./pics/IMG_7940.jpeg Book Name Group Larger than 80%:
Matched by both      7                   
Only by OCR          43                  
Only by Human        30                  
Recall               0.1891891891891892  
Precision            0.14                
./pics/Bookshelves_1.jpg Book Name Group Larger than 80%:
Matched by both      8                   
Only by OCR          64                  
Only by Human        50                  
Recall               0.13793103448275862 
Precision            0.1111111111111111  
./pics/Bookshelves_4.jpg Book Name Group Larger than 80%:
Matched by both      13                  
Only by OCR          62                  
Only by Human        82                  
Recall               0.1368421052631579  
Precision            0.17333333333333334 
./pics/Bookshelves_5.jpg Book Name Group Larger than 80%:
Matched by both      6                   
Only by OCR          60                  
Only by Human        149                 
Recall          

In [20]:
def show_distance_check(img2, ref_point, xdt, ydt, metric, x_offset:int = 0, y_offset:int = 0):
    mask = np.zeros_like(img2)
    temp_ref_point = ref_point.copy()
    mid_point = (ref_point.mid_point[0] - x_offset, ref_point.mid_point[1] - y_offset)
    temp_ref_point.mid_point = mid_point

    for y in range(img2.shape[0]):
        for x in range(img2.shape[1]):
            target_slope = select_slope(ref_point.slopes, ref_point.direction)
            temp_select_point = {
                "mid_point": (x, y)
            }
            distance = select_distance_metric(ref_point, temp_select_point, target_slope, xdt, ydt, metric)
            if metric == 'collapse area':
                if distance > 0:
                    mask[y, x] = [255, 0, 0]
            else:
                if distance < 1: # or inter_points >= 2:
                    mask[y, x] = [255, 0, 0]

    result = cv2.addWeighted(img2, 0.7, mask, 0.3, 0)

    plt.figure(figsize=(10, 10))
    plt.imshow(result)
    plt.axis('off')
    plt.show()

In [21]:
half_way_data1[0][0][0]

Unnamed: 0,txt,confidence,vertices,boundBox,slopes,width,height,direction,mid_point,crop_idx,Left,Top
0,ated,100,"[(0, 189), (72, 195), (69, 228), (0, 222)]","{'Width': 72, 'Height': 39, 'Left': 0, 'Top': ...","(0.03, -3.951)",33.136083,72.249567,horizontal,"(35.25, 208.5)","(0, 0)",0,189
1,FRAISE,100,"[(3493, 300), (3536, 296), (3538, 317), (3495,...","{'Width': 45, 'Height': 25, 'Left': 3493, 'Top...","(-0.033, 3.772)",21.095023,43.185646,horizontal,"(3515.5, 308.5)","(0, 0)",3493,296
2,37131,100,"[(3543, 320), (3577, 315), (3579, 334), (3546,...","{'Width': 36, 'Height': 24, 'Left': 3543, 'Top...","(-0.053, 3.413)",19.104973,34.365681,horizontal,"(3561.25, 327.0)","(0, 0)",3543,315
3,119,100,"[(3580, 315), (3602, 312), (3604, 331), (3583,...","{'Width': 24, 'Height': 22, 'Left': 3580, 'Top...","(-0.049, 3.413)",19.104973,22.203603,horizontal,"(3592.25, 323.0)","(0, 0)",3580,312
4,148,100,"[(3602, 312), (3627, 309), (3629, 327), (3605,...","{'Width': 27, 'Height': 22, 'Left': 3602, 'Top...","(-0.043, 3.233)",18.11077,25.179357,horizontal,"(3615.75, 319.75)","(0, 0)",3602,309
...,...,...,...,...,...,...,...,...,...,...,...,...
419,POE,100,"[(3334, 1248), (3370, 1252), (3368, 1266), (33...","{'Width': 37, 'Height': 18, 'Left': 3333, 'Top...","(0.04, -2.515)",14.142136,36.221541,horizontal,"(3351.25, 1257.0)","(0, 0)",3333,1248
420,POE,100,"[(3419, 1251), (3458, 1254), (3457, 1268), (34...","{'Width': 40, 'Height': 17, 'Left': 3418, 'Top...","(0.028, -5.029)",14.035669,39.115214,horizontal,"(3438.0, 1259.5)","(0, 0)",3418,1251
421,808.8N,100,"[(3730, 1237), (3802, 1239), (3802, 1257), (37...","{'Width': 72, 'Height': 20, 'Left': 3730, 'Top...","(0.01, -300.0)",18.0,72.027772,horizontal,"(3766.0, 1247.0)","(0, 0)",3730,1237
422,BOO,100,"[(3727, 1262), (3762, 1265), (3761, 1277), (37...","{'Width': 36, 'Height': 15, 'Left': 3726, 'Top...","(0.031, -4.311)",12.041595,35.128336,horizontal,"(3744.0, 1269.5)","(0, 0)",3726,1262


In [None]:
img = cv2.imread('./docs/crop_pics/google/IMG_7940.jpeg_0_0.jpg')
img2 = img[0:img.shape[0], 0:1000]
point = half_way_data1[0][0][0].iloc[356]
show_distance_check(img2, point, 1, 5, "ellipse same weight")

In [None]:
show_distance_check(img2, point, 1, 5, "ellipse different weight")

## 2.4 Clustering Group

In [84]:
!set OMP_NUM_THREADS=1

In [86]:
test_hyperparameter(0.7, 1.5, "clusters", True)



./pics/IMG_7940.jpeg Book Name Group Larger than 80%:
Matched by both      0                   
Only by OCR          40                  
Only by Human        37                  
Recall               0.0                 
Precision            0.0                 




./pics/Bookshelves_1.jpg Book Name Group Larger than 80%:
Matched by both      3                   
Only by OCR          77                  
Only by Human        55                  
Recall               0.0375              
Precision            0.05172413793103448 




./pics/Bookshelves_4.jpg Book Name Group Larger than 80%:
Matched by both      5                   
Only by OCR          155                 
Only by Human        90                  
Recall               0.03125             
Precision            0.05263157894736842 




./pics/Bookshelves_5.jpg Book Name Group Larger than 80%:
Matched by both      0                   
Only by OCR          120                 
Only by Human        155                 
Recall               0.0                 
Precision            0.0                 


([0, 3, 5, 0],
 {0: Empty DataFrame
  Columns: []
  Index: [],
  1: Empty DataFrame
  Columns: []
  Index: [],
  2: Empty DataFrame
  Columns: []
  Index: [],
  3: Empty DataFrame
  Columns: []
  Index: []})

# 3. Hyperparameter Tuning

In [None]:
from tqdm import tqdm

lst = []
threshold_lst = []
txt_dicts = []
for xt in tqdm(range(1, 21), desc='Processing xt'):
    xt = xt / 10
    for yt in tqdm(range(1, 11), desc='Processing yt'):
        yt = yt / 2
        tp_lst, txt_dict = test_hyperparameter(xt, yt, "ellipse different weight")
        lst.append(tp_lst)
        txt_dicts.append(txt_dict)
        threshold_lst.append((xt, yt))
        print(f'xt = {xt}, yt = {yt}') 

In [27]:
import json
data_to_save = {
    "tp_nums": lst,
    "threshold_lst": threshold_lst
}
filename = './diff_weight_lists.json'
with open(filename, 'w') as file:
    json.dump(data_to_save, file, indent=4)

In [29]:
filename = './diff_weight_dataframes.h5'

# Use HDFStore to save the DataFrames
with pd.HDFStore(filename, 'w') as store:
    for i, df in enumerate(txt_dicts, start=1):
        store.put(f'df{i}', df)

TypeError: value must be None, Series, or DataFrame

In [40]:
np_arr = np.array(lst)
formatted_labels = [f'{tup[0]}/{tup[1]}' for tup in threshold_lst]

def plot_tp(np_arr, img_name, formatted_labels, segment_idx, write, dpi=300):
    plt.figure(figsize=(20, 12))
    plt.plot(np_arr, marker='o', linestyle='-', color='b')
    imgname = img_name.split('/')[-1]
    save_picname = f'Line_Plot_for_{imgname}_Part_{segment_idx+1}'
    plt.title(save_picname)
    plt.xlabel('Hyperparameter')
    plt.ylabel('Matched By Both')

    plt.xticks(ticks=range(len(formatted_labels)), labels=formatted_labels, rotation=90, fontsize=5)
    plt.grid(True)
    if write:
        save_path = './docs/tuning/diff_weight/' + save_picname + '.jpg'
        plt.savefig(save_path, dpi=dpi)
    else:
        plt.show()
    plt.close()

max_tp_lst = []
max_tp_idx_lst = []
for i in range(len(img_paths)):
    for segment_idx in range(3):
        segment_length = np_arr.shape[0] // 3
        start_idx = segment_idx * segment_length
        end_idx = (segment_idx + 1) * segment_length if segment_idx < 2 else np_arr.shape[0]
        
        element_np = np_arr[start_idx:end_idx, i]
        labels = formatted_labels[start_idx:end_idx]
        
        plot_tp(element_np, img_paths[i], labels, segment_idx, write = True)
        
    max_tp = np.amax(np_arr[:, i])
    max_tp_lst.append(max_tp)
    max_tp_idx_lst.append(np.where(np_arr[:, i] == max_tp)[0].tolist())
max_tp_lst, max_tp_idx_lst

([17, 15, 18, 14], [[62, 72], [37, 62, 66, 67], [26, 34], [33]])

In [43]:
max_tp_idx_lst[0]

[62, 72]

In [52]:
for i in range(len(max_tp_idx_lst)):
    for j in max_tp_idx_lst[i]:
        print(f'{i} {j} : {np_arr[j][i]}  {threshold_lst[j]}')

0 62 : 17  (0.7, 1.5)
0 72 : 17  (0.8, 1.5)
1 37 : 15  (0.4, 4.0)
1 62 : 15  (0.7, 1.5)
1 66 : 15  (0.7, 3.5)
1 67 : 15  (0.7, 4.0)
2 26 : 18  (0.3, 3.5)
2 34 : 18  (0.4, 2.5)
3 33 : 14  (0.4, 2.0)


In [58]:
for i in range(len(max_tp_idx_lst)):
    candidate_idx = [62, 72, 37, 66, 67, 26, 34, 33]
    for idx in candidate_idx:
        print(f'{i} {idx} : {np_arr[idx][i]}  {threshold_lst[idx]}')
    print()

0 62 : 17  (0.7, 1.5)
0 72 : 17  (0.8, 1.5)
0 37 : 12  (0.4, 4.0)
0 66 : 9  (0.7, 3.5)
0 67 : 9  (0.7, 4.0)
0 26 : 11  (0.3, 3.5)
0 34 : 12  (0.4, 2.5)
0 33 : 15  (0.4, 2.0)

1 62 : 15  (0.7, 1.5)
1 72 : 14  (0.8, 1.5)
1 37 : 15  (0.4, 4.0)
1 66 : 15  (0.7, 3.5)
1 67 : 15  (0.7, 4.0)
1 26 : 10  (0.3, 3.5)
1 34 : 12  (0.4, 2.5)
1 33 : 10  (0.4, 2.0)

2 62 : 15  (0.7, 1.5)
2 72 : 13  (0.8, 1.5)
2 37 : 15  (0.4, 4.0)
2 66 : 15  (0.7, 3.5)
2 67 : 16  (0.7, 4.0)
2 26 : 18  (0.3, 3.5)
2 34 : 18  (0.4, 2.5)
2 33 : 15  (0.4, 2.0)

3 62 : 10  (0.7, 1.5)
3 72 : 10  (0.8, 1.5)
3 37 : 9  (0.4, 4.0)
3 66 : 8  (0.7, 3.5)
3 67 : 6  (0.7, 4.0)
3 26 : 10  (0.3, 3.5)
3 34 : 13  (0.4, 2.5)
3 33 : 14  (0.4, 2.0)


In [62]:
test_hyperparameter(0.7, 1.5, "ellipse different weight", True)

./pics/IMG_7940.jpeg Book Name Group Larger than 80%:
Matched by both      17                  
Only by OCR          120                 
Only by Human        20                  
Recall               0.12408759124087591 
Precision            0.4594594594594595  
./pics/Bookshelves_1.jpg Book Name Group Larger than 80%:
Matched by both      15                  
Only by OCR          157                 
Only by Human        43                  
Recall               0.0872093023255814  
Precision            0.25862068965517243 
./pics/Bookshelves_4.jpg Book Name Group Larger than 80%:
Matched by both      15                  
Only by OCR          116                 
Only by Human        80                  
Recall               0.11450381679389313 
Precision            0.15789473684210525 
./pics/Bookshelves_5.jpg Book Name Group Larger than 80%:
Matched by both      10                  
Only by OCR          89                  
Only by Human        145                 
Recall          

([17, 15, 15, 10],
 {0:           txt  confidence                                           vertices  \
  0        ated         100         [(0, 189), (72, 195), (69, 228), (0, 222)]   
  1       808.3         100  [(184, 1168), (241, 1170), (241, 1185), (184, ...   
  2         BUC         100  [(184, 1193), (218, 1194), (217, 1208), (184, ...   
  3           A         100   [(220, 347), (220, 383), (186, 383), (186, 347)]   
  4    Writer's         100   [(220, 397), (222, 557), (188, 557), (186, 397)]   
  ..        ...         ...                                                ...   
  419    POETRY         100  [(3870, 560), (3858, 639), (3839, 636), (3851,...   
  420        OF         100  [(3877, 517), (3873, 544), (3853, 541), (3857,...   
  421      BOOK         100  [(3887, 451), (3879, 504), (3860, 502), (3868,...   
  422       THE         100  [(3896, 395), (3890, 436), (3870, 433), (3876,...   
  423       Son         100  [(3962, 228), (4030, 228), (4030, 294), (3962,.

In [63]:
lst2 = []
threshold_lst2 = []
txt_dicts2 = []
for xt in tqdm(range(1, 51), desc='Processing xt'):
    xt = xt / 10
    tp_lst, txt_dict = test_hyperparameter(xt, xt, "ellipse same weight")
    lst2.append(tp_lst)
    txt_dicts2.append(txt_dict)
    threshold_lst2.append(xt)

Processing xt: 100%|██████████| 50/50 [37:27<00:00, 44.95s/it]


In [87]:
import json
data_to_save = {
    "tp_nums": lst2,
    "threshold_lst": threshold_lst2
}
filename = './same_weight_lists.json'
with open(filename, 'w') as file:
    json.dump(data_to_save, file, indent=4)

In [66]:
np_arr = np.array(lst2)

def plot_tp(np_arr, img_name, formatted_labels, segment_idx, write, dpi=300):
    plt.figure(figsize=(20, 12))
    plt.plot(np_arr, marker='o', linestyle='-', color='b')
    imgname = img_name.split('/')[-1]
    save_picname = f'Line_Plot_for_{imgname}_Part_{segment_idx+1}'
    plt.title(save_picname)
    plt.xlabel('Hyperparameter')
    plt.ylabel('Matched By Both')

    plt.xticks(ticks=range(len(formatted_labels)), labels=formatted_labels, rotation=90, fontsize=5)
    plt.grid(True)
    if write:
        save_path = './docs/tuning/same_weight/' + save_picname + '.jpg'
        plt.savefig(save_path, dpi=dpi)
    else:
        plt.show()
    plt.close()

max_tp_lst = []
max_tp_idx_lst = []
for i in range(len(img_paths)):
    for segment_idx in range(3):
        segment_length = np_arr.shape[0] // 3
        start_idx = segment_idx * segment_length
        end_idx = (segment_idx + 1) * segment_length if segment_idx < 2 else np_arr.shape[0]

        element_np = np_arr[start_idx:end_idx, i]
        labels = threshold_lst2[start_idx:end_idx]

        # Plot each segment
        plot_tp(element_np, img_paths[i], labels, segment_idx, True)

    max_tp = np.amax(np_arr[:, i])
    max_tp_lst.append(max_tp)
    max_tp_idx_lst.append(np.where(np_arr[:, i] == max_tp)[0].tolist())
max_tp_lst, max_tp_idx_lst

([14, 10, 12, 9], [[11], [10, 11], [11], [12]])

In [67]:
for i in range(len(max_tp_idx_lst)):
    candidate_idx = [10, 11, 12]
    for idx in candidate_idx:
        print(f'{i} {idx} : {np_arr[idx][i]}  {threshold_lst[idx]}')
    print()

0 10 : 11  (0.2, 0.5)
0 11 : 14  (0.2, 1.0)
0 12 : 11  (0.2, 1.5)

1 10 : 10  (0.2, 0.5)
1 11 : 10  (0.2, 1.0)
1 12 : 8  (0.2, 1.5)

2 10 : 10  (0.2, 0.5)
2 11 : 12  (0.2, 1.0)
2 12 : 10  (0.2, 1.5)

3 10 : 7  (0.2, 0.5)
3 11 : 8  (0.2, 1.0)
3 12 : 9  (0.2, 1.5)


In [68]:
test_hyperparameter(1.1, 1.1, "ellipse same weight", True)

./pics/IMG_7940.jpeg Book Name Group Larger than 80%:
Matched by both      11                  
Only by OCR          128                 
Only by Human        26                  
Recall               0.07913669064748201 
Precision            0.2972972972972973  
./pics/Bookshelves_1.jpg Book Name Group Larger than 80%:
Matched by both      10                  
Only by OCR          173                 
Only by Human        48                  
Recall               0.0546448087431694  
Precision            0.1724137931034483  
./pics/Bookshelves_4.jpg Book Name Group Larger than 80%:
Matched by both      10                  
Only by OCR          133                 
Only by Human        85                  
Recall               0.06993006993006994 
Precision            0.10526315789473684 
./pics/Bookshelves_5.jpg Book Name Group Larger than 80%:
Matched by both      7                   
Only by OCR          90                  
Only by Human        148                 
Recall          

([11, 10, 10, 7],
 {0:           txt  confidence                                           vertices  \
  0        ated         100         [(0, 189), (72, 195), (69, 228), (0, 222)]   
  1       808.3         100  [(184, 1168), (241, 1170), (241, 1185), (184, ...   
  2         BUC         100  [(184, 1193), (218, 1194), (217, 1208), (184, ...   
  3           A         100   [(220, 347), (220, 383), (186, 383), (186, 347)]   
  4    Writer's         100   [(220, 397), (222, 557), (188, 557), (186, 397)]   
  ..        ...         ...                                                ...   
  419    POETRY         100  [(3870, 560), (3858, 639), (3839, 636), (3851,...   
  420        OF         100  [(3877, 517), (3873, 544), (3853, 541), (3857,...   
  421      BOOK         100  [(3887, 451), (3879, 504), (3860, 502), (3868,...   
  422       THE         100  [(3896, 395), (3890, 436), (3870, 433), (3876,...   
  423       Son         100  [(3962, 228), (4030, 228), (4030, 294), (3962,..