<a href="https://colab.research.google.com/github/viniciusrpb/cic0193_machinelearning/blob/main/cap2_3_feature_extraction_sift.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Extração de Características de Imagens utilizando Bag-of-Features (Bag-of-visual-words) com Scale Invariant Feature Transform (SIFT)

Vamos utilizar a coleção de imagens Swedish Leaf Dataset disponível aqui:

https://www.cvl.isy.liu.se/en/research/datasets/swedish-leaf/

Particularmente, será utilizada a classe "1. Ulmus carpinifolia" e todas as 75 imagens foram convertidas de ".tif" para ".png" antes da execução desse notebook.


In [15]:
from google.colab import drive
drive.mount("/content/drive",force_remount=True)

In [16]:
!cp -r "/content/drive/My Drive/leafs" "leafs"

Como a implementação do SIFT não está disponível na versão mais atual do opencv devido a uma patente, é preciso instalar uma versão mais antiga do opencv que tinha permissão para ter essa implementação

In [17]:
!pip3 install opencv-python==3.4.2.17
!pip3 install opencv-contrib-python==3.4.2.17

Importando as bibliotecas necessárias


In [18]:
import numpy as np
import cv2
import os
from os import listdir
from sklearn.cluster import KMeans
import pandas as pd

Pega o nome dos arquivos associados às imagens das folhas

In [19]:
path_imgs = "leafs/leaf1/"

l_imgs = listdir(path_imgs)

## Implementação do Bag-of-Features

Primeiramente, deve-se inicializar uma lista relacionada a um dicionário, que vai armazenar os descritores de todos os "patches" (localidades) das imagens da entrada.

Em seguida, executa-se o SIFT, que retorna os keypoints (pontos locais) e o histograma das orientações para cada imagem da entrada. É importante salientar que cada imagem pode gerar um número diferentes de pontos-chave e o descritor sempre possui 128 posições.

Vale ressaltar que cada patch está associado com a região de um keypoint. Assim, o descritor de cada keypoint vai representar um vetor de características.

In [20]:
dictionary = []

for img_path in l_imgs:

  img = cv2.imread(path_imgs+img_path)

  sift = cv2.xfeatures2d.SIFT_create()

  keypoints,descriptor = sift.detectAndCompute(img,None)

  for d in descriptor:
    dictionary.append(d)

Demonstrar o SIFT: aula do dia 13/08/2021

In [21]:
# to do next lecture!

A próxima etapa consiste na geração das "visual words" que serão os bins (posições) do histograma, isto é, representações dos grupos de K "patches".

No caso específico abaixo, utiliza-se o K-Means para gerar grupos nos vetores de características dos patches. Poderíamos utilizar qualquer outro algoritmo de agrupamento para esse propósito.

No caso abaixo, vamos gerar um histograma para a coleção de imagens com 50 "visual words". O resultado do K-Means é um modelo representado pelos centróides de cada agrupamento, isto é, as "visual words".

In [22]:
n_visual_words = 50

kmeans_model = KMeans(n_clusters=n_visual_words, 
                      verbose=False, 
                      init='random',
                      n_init=3)

clusters = kmeans_model.fit(dictionary)

Agora temos que gerar o histograma (hist_img) de cada imagem da entrada considerando apenas as "visual words" obtidas do K-Means.

Para isso, geramos novamente os patches para cada imagem da entrada e contabilizamos a frequência de cada patch no histograma da coleção de imagens (histogram). Observe que cada patch é representado pelo descritor SIFT de 128 posições. O histograma será normalizado em relação ao número de keypoints (pontos locais).

Para sabermos em qual bin do histograma que o patch será contabilizado, utilizamos um classificador baseado distância mínima para pegar a label do bin que o patch é mais similar.

In [26]:
histogram = []

for img_path in l_imgs:

  img = cv2.imread(path_imgs+img_path)

  keypoints, descriptor = sift.detectAndCompute(img,None)

  hist_img = np.zeros(n_visual_words)
  number_keypoints = np.size(keypoints)

  for d in descriptor:
    label = clusters.predict([d])
    hist_img[label] += 1/number_keypoints
  
  histogram.append(histo_img)

Por fim, geramos o DataFrame contendo os vetores de características da coleção de imagens da entrada

In [27]:
X = np.array(histograma)

attributes = []

for x in range(1,n_visual_words+1):
  attributes.append("attrib_"+str(x))

df = pd.DataFrame(X,columns=attributes)

Resultado final:

In [28]:
df

Unnamed: 0,attrib_1,attrib_2,attrib_3,attrib_4,attrib_5,attrib_6,attrib_7,attrib_8,attrib_9,attrib_10,attrib_11,attrib_12,attrib_13,attrib_14,attrib_15,attrib_16,attrib_17,attrib_18,attrib_19,attrib_20,attrib_21,attrib_22,attrib_23,attrib_24,attrib_25,attrib_26,attrib_27,attrib_28,attrib_29,attrib_30,attrib_31,attrib_32,attrib_33,attrib_34,attrib_35,attrib_36,attrib_37,attrib_38,attrib_39,attrib_40,attrib_41,attrib_42,attrib_43,attrib_44,attrib_45,attrib_46,attrib_47,attrib_48,attrib_49,attrib_50
0,0.010568,0.003303,0.003303,0.029062,0.024439,0.007266,0.021797,0.003963,0.052180,0.003963,0.011889,0.023778,0.016513,0.018494,0.019155,0.023118,0.013871,0.036328,0.021136,0.002642,0.028402,0.035007,0.015852,0.013871,0.015192,0.066050,0.025760,0.011889,0.021136,0.001982,0.0,0.005945,0.016513,0.016513,0.005284,0.007926,0.002642,0.027081,0.015852,0.009247,0.056143,0.011889,0.051519,0.003963,0.015852,0.001321,0.007926,0.005945,0.005945,0.018494
1,0.014719,0.004600,0.004600,0.040478,0.034039,0.010120,0.030359,0.005520,0.072677,0.005520,0.016559,0.033119,0.022999,0.025759,0.026679,0.032199,0.019319,0.050598,0.029439,0.003680,0.039558,0.048758,0.022079,0.019319,0.021159,0.091996,0.035879,0.016559,0.029439,0.002760,0.0,0.008280,0.022999,0.022999,0.007360,0.011040,0.003680,0.037718,0.022079,0.012879,0.078197,0.016559,0.071757,0.005520,0.022079,0.001840,0.011040,0.008280,0.008280,0.025759
2,0.007207,0.002252,0.002252,0.019820,0.016667,0.004955,0.014865,0.002703,0.035586,0.002703,0.008108,0.016216,0.011261,0.012613,0.013063,0.015766,0.009459,0.024775,0.014414,0.001802,0.019369,0.023874,0.010811,0.009459,0.010360,0.045045,0.017568,0.008108,0.014414,0.001351,0.0,0.004054,0.011261,0.011261,0.003604,0.005405,0.001802,0.018468,0.010811,0.006306,0.038288,0.008108,0.035135,0.002703,0.010811,0.000901,0.005405,0.004054,0.004054,0.012613
3,0.009297,0.002905,0.002905,0.025567,0.021499,0.006392,0.019175,0.003486,0.045904,0.003486,0.010459,0.020918,0.014526,0.016270,0.016851,0.020337,0.012202,0.031958,0.018594,0.002324,0.024985,0.030796,0.013945,0.012202,0.013364,0.058106,0.022661,0.010459,0.018594,0.001743,0.0,0.005230,0.014526,0.014526,0.004648,0.006973,0.002324,0.023823,0.013945,0.008135,0.049390,0.010459,0.045322,0.003486,0.013945,0.001162,0.006973,0.005230,0.005230,0.016270
4,0.005581,0.001744,0.001744,0.015347,0.012905,0.003837,0.011510,0.002093,0.027555,0.002093,0.006278,0.012557,0.008720,0.009766,0.010115,0.012208,0.007325,0.019184,0.011161,0.001395,0.014998,0.018486,0.008371,0.007325,0.008022,0.034880,0.013603,0.006278,0.011161,0.001046,0.0,0.003139,0.008720,0.008720,0.002790,0.004186,0.001395,0.014301,0.008371,0.004883,0.029648,0.006278,0.027206,0.002093,0.008371,0.000698,0.004186,0.003139,0.003139,0.009766
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
70,0.009190,0.002872,0.002872,0.025273,0.021252,0.006318,0.018955,0.003446,0.045376,0.003446,0.010339,0.020678,0.014360,0.016083,0.016657,0.020103,0.012062,0.031591,0.018380,0.002298,0.024698,0.030442,0.013785,0.012062,0.013211,0.057438,0.022401,0.010339,0.018380,0.001723,0.0,0.005169,0.014360,0.014360,0.004595,0.006893,0.002298,0.023550,0.013785,0.008041,0.048823,0.010339,0.044802,0.003446,0.013785,0.001149,0.006893,0.005169,0.005169,0.016083
71,0.008299,0.002593,0.002593,0.022822,0.019191,0.005705,0.017116,0.003112,0.040975,0.003112,0.009336,0.018672,0.012967,0.014523,0.015041,0.018154,0.010892,0.028527,0.016598,0.002075,0.022303,0.027490,0.012448,0.010892,0.011929,0.051867,0.020228,0.009336,0.016598,0.001556,0.0,0.004668,0.012967,0.012967,0.004149,0.006224,0.002075,0.021266,0.012448,0.007261,0.044087,0.009336,0.040456,0.003112,0.012448,0.001037,0.006224,0.004668,0.004668,0.014523
72,0.005185,0.001620,0.001620,0.014258,0.011990,0.003564,0.010693,0.001944,0.025599,0.001944,0.005833,0.011666,0.008101,0.009073,0.009397,0.011342,0.006805,0.017822,0.010369,0.001296,0.013934,0.017174,0.007777,0.006805,0.007453,0.032404,0.012638,0.005833,0.010369,0.000972,0.0,0.002916,0.008101,0.008101,0.002592,0.003889,0.001296,0.013286,0.007777,0.004537,0.027544,0.005833,0.025275,0.001944,0.007777,0.000648,0.003889,0.002916,0.002916,0.009073
73,0.009270,0.002897,0.002897,0.025492,0.021437,0.006373,0.019119,0.003476,0.045771,0.003476,0.010429,0.020857,0.014484,0.016222,0.016802,0.020278,0.012167,0.031866,0.018540,0.002317,0.024913,0.030707,0.013905,0.012167,0.013326,0.057937,0.022596,0.010429,0.018540,0.001738,0.0,0.005214,0.014484,0.014484,0.004635,0.006952,0.002317,0.023754,0.013905,0.008111,0.049247,0.010429,0.045191,0.003476,0.013905,0.001159,0.006952,0.005214,0.005214,0.016222
