## DETECT FAKES

Here I walk through the steps I took and the reasoning behind each one. I wrote all of the code in Visual Studio 2017, whose debugging tools helped a great deal. What follows is meant as a demo: it loads the .pkl files and models that were pre-built in Visual Studio.



#### IMPORT FILES

In [1]:
import numpy as np
import pandas as pd
import scipy as sp
from sklearn.feature_extraction.text import CountVectorizer,TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel
import re

#### HELPER FUNCTIONS

* remove_punc : Removes all symbols and punctuation, since they carry little weight as features.
* alpha_order : Sorts the comma-separated values of the 'productFamily' column alphabetically.
* avg_word    : Returns the average word length (characters per word) of a sentence.
* get_jaccard_sim : Returns the Jaccard similarity between two strings: the number of words they share divided by the number of distinct words across both.
* ascii_rep       : Encodes the 'size' column as a single number derived from its ASCII codes. I didn't use this in the final code, since word2vec handles it nicely.
* word2vec_feat    : Looks up the word2vec vector (size 150) of every word and sums them into one vector that can then be fed to the model.
* normalize_vect : Scales the given vector to unit (L2) length, so every component falls between -1 and 1. The formulation is simple.
* common_feat    : Fits a TF-IDF (Term Frequency - Inverse Document Frequency) vectorizer on the given sentences and returns the cosine similarity of the first sentence to each of them.
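
For reference, the two formulas behind get_jaccard_sim and normalize_vect can be written out as follows. This is a sketch derived from the helper code in this notebook; the symbols $A$ and $B$ (the word sets of the two strings) and $v$ (a feature vector) are notation introduced here, not names used in the code.

$$
J(A, B) = \frac{|A \cap B|}{|A \cup B|} = \frac{|A \cap B|}{|A| + |B| - |A \cap B|},
\qquad
\hat{v} = \frac{v}{\lVert v \rVert_2} = \frac{v}{\sqrt{\sum_i v_i^2}}
$$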

In [2]:
#-- REMOVES THE PUNCTUATIONS AND SYMBOLS --#
def remove_punc(inp):
    # Cast first so that non-string cells (e.g. NaN) do not break .replace()
    inp=str(inp).replace(';',' ')
    return re.sub(r'[^\w\s]','',inp)

#-- ARRANGES THE PRODUCT FAMILY IN ALPHABETICAL ORDER --#
def alpha_order(inp):
    return ' '.join(sorted(inp.split(',')))

#-- AVERAGE WORD LENGTH --#
def avg_word(sentence):
    words = sentence.split()
    # Guard against empty input to avoid a division by zero
    return (sum(len(word) for word in words)/len(words)) if words else 0.0

#-- THIS FUNCTION RETRIEVES THE JACCARD SIMILARITY BETWEEN TWO STRINGS --#
def get_jaccard_sim(str1, str2): 
    a = set(str1.split()) 
    b = set(str2.split())
    c = a.intersection(b)
    return float(len(c)) / (len(a) + len(b) - len(c))

#-- ASCII REPRESENTATION OF SIZE --#
def ascii_rep(x):
    lst=np.array([ord(ch) for ch in x])
    return ((lst/ord('z')).sum())/len(lst)

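#-- GET WORD TO VEC FEATURE VECTOR --#
def word2vec_feat(lst_words,model):
    fin_vec=[]
    for wrd in lst_words:
        try:
            fin_vec.append(model[wrd])
        except KeyError:
            # Words missing from the vocabulary contribute a zero vector
            fin_vec.append(np.zeros(150).tolist())
    fin_vec=np.array(fin_vec)
    # Sum the word vectors into a single 150-dimensional feature vector
    wd_vec_feat=fin_vec.sum(axis=0)
    return wd_vec_feat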
 

#-- NORMALIZES THE VECTOR --#
def normalize_vect(lst):
    return lst/np.sqrt((lst**2).sum())  #v/np.sqrt((v**2).sum())

#-- COSINE SIMILARITY BETWEEN TWO SENTENCES --#
def common_feat(sentences):
    vectorizer=TfidfVectorizer()
    tfidf=vectorizer.fit_transform(sentences)
    cosine_similarities = linear_kernel(tfidf[0:1], tfidf).flatten()
    return cosine_similarities
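
To make two of the helpers concrete, here is a small worked example. It repeats get_jaccard_sim and normalize_vect so that it runs on its own; the two product strings and the vector [3, 4] are made up for illustration and do not come from the dataset.

```python
import numpy as np

def get_jaccard_sim(str1, str2):
    # Same logic as the helper above: shared words / distinct words overall
    a = set(str1.split())
    b = set(str2.split())
    c = a.intersection(b)
    return float(len(c)) / (len(a) + len(b) - len(c))

def normalize_vect(lst):
    # Divide by the Euclidean (L2) norm so the result has unit length
    return lst / np.sqrt((lst ** 2).sum())

# Hypothetical product strings, made up for this example
jac = get_jaccard_sim("red cotton top", "blue cotton top")
unit = normalize_vect(np.array([3.0, 4.0]))

print(jac)    # 2 shared words out of 4 distinct words -> 0.5
print(unit)   # [0.6 0.8]
```

Note that normalize_vect rescales the vector as a whole rather than each feature separately, so large entries such as the row length dominate the normalized vector.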

#### LETS DIVE IN

* data_file_pd.pkl is the original Excel data, converted to .pkl.
* I'm considering only the subcategory Tops (Kids), which contains 311051 rows.
* I'm not considering images here; the goal is to fit a model using words and similarities as features.
* Some sample outputs follow.
* I used pandas, which let me get results quickly without explicit 'for' loops.

In [3]:
#fields=['productId','title','description','mrp','sellingPrice','specialPrice','categories','productBrand','productFamily','size','color','keySpecsStr','sellerName']
#df = pd.read_excel('machine_learn_data_xls.xlsx',converters={'productId':str,'title':str,'description':str,'mrp':str,'sellingPrice':str,'specialPrice':str,'categories':str,'productBrand':str,'productFamily':str,'size':str,'color':str,'keySpecsStr':str,'sellerName':str})
df =pd.read_pickle("data_file_pd.pkl")
df['productFamily']=df['productFamily'].apply(alpha_order)
df['train_set'] = df[['title','description','productBrand','productFamily','size','color','keySpecsStr','sellerName']].apply(lambda x: ' '.join(map(str, x)), axis=1)
df['Removed_punc']=df['train_set'].apply(remove_punc)
df['Removed_punc'][0]

'Citrine Casual Short Sleeve Printed Womens Pink White Top This beautiful printed modal top from Citrine is soft against the skin It features mock pockets and a pleat at the back Pair with jeans for a cool and casual look Citrine TOPE9ABBBTJYDSQE TOPE9ABBHJ8HGGGK TOPE9ABBPDAN7VCH S Pink Off White Round Neck Short Sleeve Fabric Modal Pattern Printed Pack of 1 Shweta Mathur'

In [4]:
## SAMPLE 
df.head(3)

Unnamed: 0,productId,title,description,imageUrlStr,mrp,sellingPrice,specialPrice,productUrl,categories,productBrand,...,Unnamed: 45,Unnamed: 46,Unnamed: 47,Unnamed: 48,Unnamed: 49,Unnamed: 50,Unnamed: 51,Unnamed: 52,train_set,Removed_punc
0,TOPE9ABBZU3HZRHN,Citrine Casual Short Sleeve Printed Women's Pi...,This beautiful printed modal top from Citrine ...,http://img.fkcdn.com/image/top/r/h/n/1-1-wwtpw...,1099,329,329,http://dl.flipkart.com/dl/citrine-casual-short...,"Apparels>Women>Western Wear>Shirts, Tops & Tun...",Citrine,...,,,,,,,,,Citrine Casual Short Sleeve Printed Women's Pi...,Citrine Casual Short Sleeve Printed Womens Pin...
1,TOPE9ABBBTJYDSQE,Citrine Casual Short Sleeve Printed Women's Pi...,This beautiful printed modal top from Citrine ...,http://img.fkcdn.com/image/top/r/h/n/1-1-wwtpw...,1099,329,329,http://dl.flipkart.com/dl/citrine-casual-short...,"Apparels>Women>Western Wear>Shirts, Tops & Tun...",Citrine,...,,,,,,,,,Citrine Casual Short Sleeve Printed Women's Pi...,Citrine Casual Short Sleeve Printed Womens Pin...
2,TOPE6XZPUVT9C7RU,Butterfly Wears Casual Short Sleeve Solid Wome...,,http://img.fkcdn.com/image/top/y/h/c/5245-butt...,799,799,799,http://dl.flipkart.com/dl/butterfly-wears-casu...,"Apparels>Women>Western Wear>Shirts, Tops & Tun...",Butterfly Wears,...,,,,,,,,,Butterfly Wears Casual Short Sleeve Solid Wome...,Butterfly Wears Casual Short Sleeve Solid Wome...


In [5]:
#length
df['len_row']= df['Removed_punc'].apply(lambda x: len(str(x)))
'LENGTH:',df['len_row'][0]

('LENGTH:', 373)

In [6]:
## NUMBER OF UNIQUE CHARACTERS
df['len_char']=df['Removed_punc'].apply(lambda x : len(''.join(set(str(x).replace(' ','')))))
'LENGTH OF CHAR:',df['len_char'][0]

('LENGTH OF CHAR:', 48)

In [7]:
df['len_word']=df['Removed_punc'].apply(lambda x :len(str(x).split()))
'WORD COUNT',df['len_word'][0]

('WORD COUNT', 61)
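
The three counts above can be checked by hand on a short string. The sketch below uses a made-up cleaned row (not one from the dataset) and plain Python instead of pandas; the variable names simply mirror the column names.

```python
text = "Pink Top Size S"  # hypothetical cleaned row, for illustration only

len_row = len(text)                          # all characters, spaces included
len_char = len(set(text.replace(' ', '')))   # distinct non-space characters
len_word = len(text.split())                 # whitespace-separated words

print(len_row, len_char, len_word)   # 15 10 4
```

The character count is case-sensitive, so 'P' and 'p' are counted as two different characters.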

In [8]:
#COSINE SIMILARITY OF FIRST SENTENCE WITH RESPECT TO ITSELF AND TO THE SECOND. AVERAGE OF THESE WILL BE USED
'COSINE SIMILARITY :',common_feat([df['Removed_punc'][0],df['Removed_punc'][45]])

('COSINE SIMILARITY :', array([1.        , 0.18434916]))
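
Because common_feat compares the first sentence with every sentence passed in, including itself, the first entry is always 1.0. The small sketch below shows what averaging the two values does to the number that ends up in the feature vector; it only reuses the output printed above.

```python
# Output of common_feat for rows 0 and 45, copied from the cell above
cosine_similarities = [1.0, 0.18434916]

# The first entry is the sentence compared with itself, so it is always 1.0
sim_feature = sum(cosine_similarities) / len(cosine_similarities)
print(round(sim_feature, 8))   # 0.59217458
```

Since cosine similarity on TF-IDF vectors is never negative, the averaged feature always lies between 0.5 and 1.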

In [9]:
'JACCARD SIMILARITY :',get_jaccard_sim(df['Removed_punc'][0],df['Removed_punc'][45])

('JACCARD SIMILARITY :', 0.1746031746031746)

#### BASIC FEATURE EXTRACTION

These are some examples of basic, common feature extraction. Now let's build the positive and negative datasets for training.

new_df.pkl file:
    This file stores all the extracted basic features for each row. They are not normalized yet, and they describe individual rows only. The common (pairwise) features are still to be built and added.
    
 ISSUE : The 'gensim' library did not work properly in my Jupyter notebook, and despite trying hard I could not figure out why. So I trained the word2vec model and extracted the features in Visual Studio, and I make use of those files here. The functions I used to extract the features and train the model are:
 
 

In [10]:



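import gensim
from nltk.corpus import stopwords  # assumes NLTK with its 'stopwords' corpus downloaded

def feature_extract(df):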
   # print('---------------------------------------------------------------------')
    print('---------------BASE FEATURES EXTRACTION------------------------------')
    df['productFamily']=df['productFamily'].apply(alpha_order)
    df['train_set'] = df[['title','description','productBrand','productFamily','size','color','keySpecsStr','sellerName']].apply(lambda x: ' '.join(map(str, x)), axis=1)
    df['Removed_punc']=df['train_set'].apply(remove_punc)
    df['len_row']= df['Removed_punc'].apply(lambda x: len(str(x)))
    print('Length of rows DONE')
    df['len_char']=df['Removed_punc'].apply(lambda x : len(''.join(set(str(x).replace(' ','')))))
    print('Length of characters DONE')
    df['len_word']=df['Removed_punc'].apply(lambda x :len(str(x).split()))
    print('Length of words DONE')
    df['avg_word']=df['Removed_punc'].apply(lambda x: avg_word(x))
    print('Average of words DONE')
    df['numerics'] = df['Removed_punc'].apply(lambda x: len([x for x in x.split() if x.isdigit()]))
    print('Check Numerics DONE')
    df['upper_count'] = df['Removed_punc'].apply(lambda x: len([x for x in x.split() if x.isupper()]))
    print('Upper case counts DONE')
    df['lower_case']=df['Removed_punc'].apply(lambda x: " ".join(x.lower() for x in x.split()))
    print('Check Lowercases DONE')
    #freqH= pd.Series(' '.join(df['lower_case']).split()).value_counts()[:10]
    #df['rmv_most_occuring_words'] = df['lower_case'].apply(lambda x: " ".join(x for x in x.split() if x not in freqH))  
    stop = stopwords.words('english')
    df['rmv_stop_word'] = df['lower_case'].apply(lambda x: " ".join(x for x in x.split() if x not in stop))
    print('Remove stop words DONE')
    df['pre_process']=df['rmv_stop_word'].apply(lambda x : gensim.utils.simple_preprocess(x))
    print('Preprocessing DONE')
    return df

#REFERENCE: TRAINING OF WORD2VEC MODEL - ACTUALLY DONE EXTERNALLY (not in jupyter notebook)
#NOTE: 'size' is the gensim 3.x argument name; gensim 4.x renamed it to 'vector_size'
def train_word2vec(new_df):
    model = gensim.models.Word2Vec(
                    new_df['pre_process'],
                    size=150,
                    window=10,
                    min_count=2,
                    workers=10)
    model.train(new_df['pre_process'], total_examples=len(new_df['pre_process']), epochs=10)
    model.save('model.bin')

In [11]:
#The new_df file is the output of above function
df_basic =pd.read_pickle('new_df.pkl')

In [12]:
df_basic.head(3)

Unnamed: 0,productId,title,description,imageUrlStr,mrp,sellingPrice,specialPrice,productUrl,categories,productBrand,...,Removed_punc,len_row,len_char,len_word,avg_word,numerics,upper_count,lower_case,rmv_stop_word,pre_process
0,TOPE9ABBZU3HZRHN,Citrine Casual Short Sleeve Printed Women's Pi...,This beautiful printed modal top from Citrine ...,http://img.fkcdn.com/image/top/r/h/n/1-1-wwtpw...,1099,329,329,http://dl.flipkart.com/dl/citrine-casual-short...,"Apparels>Women>Western Wear>Shirts, Tops & Tun...",Citrine,...,Citrine Casual Short Sleeve Printed Womens Pin...,373,48,61,5.131148,1,4,citrine casual short sleeve printed womens pin...,citrine casual short sleeve printed womens pin...,"[citrine, casual, short, sleeve, printed, wome..."
1,TOPE9ABBBTJYDSQE,Citrine Casual Short Sleeve Printed Women's Pi...,This beautiful printed modal top from Citrine ...,http://img.fkcdn.com/image/top/r/h/n/1-1-wwtpw...,1099,329,329,http://dl.flipkart.com/dl/citrine-casual-short...,"Apparels>Women>Western Wear>Shirts, Tops & Tun...",Citrine,...,Citrine Casual Short Sleeve Printed Womens Pin...,373,49,61,5.131148,1,4,citrine casual short sleeve printed womens pin...,citrine casual short sleeve printed womens pin...,"[citrine, casual, short, sleeve, printed, wome..."
2,TOPE6XZPUVT9C7RU,Butterfly Wears Casual Short Sleeve Solid Wome...,,http://img.fkcdn.com/image/top/y/h/c/5245-butt...,799,799,799,http://dl.flipkart.com/dl/butterfly-wears-casu...,"Apparels>Women>Western Wear>Shirts, Tops & Tun...",Butterfly Wears,...,Butterfly Wears Casual Short Sleeve Solid Wome...,216,45,29,6.482759,1,4,butterfly wears casual short sleeve solid wome...,butterfly wears casual short sleeve solid wome...,"[butterfly, wears, casual, short, sleeve, soli..."


#### ASSEMBLING MULTIPLE FEATURES

Once everything is done, we have to combine all these features into a single vector. The output is stored as fin_df.pkl.
This file contains one feature vector combining all features, plus the sentence with its stop words removed.
The function that does this is:

In [13]:
#INPUTS are basic feature extracted df and word_to_vec model.
def assemble_features(new_df,model):
   
    print('-----------------ASSEMBLING FEATURES STARTED-------------------------')
    fin_df = pd.DataFrame(columns=['lst_vect','rmv_word'])
    percent=10
    for i in range(new_df.shape[0]):
        lst_feature=[]
        lst_feature.append(new_df['len_row'][i])
        lst_feature.append(new_df['len_char'][i])
        lst_feature.append(new_df['len_word'][i])
        lst_feature.append(new_df['avg_word'][i])
        lst_feature.append(new_df['numerics'][i])
        lst_feature.append(new_df['upper_count'][i])
        lst_feature+=word2vec_feat(new_df['pre_process'][i],model).tolist()
        fin_df.loc[i] = [lst_feature,new_df['rmv_stop_word'][i]]
        if int(i%(new_df.shape[0]*10/100))==0:
            print(percent,'%','completed..')
            percent+=10
    print('100%','completed..')
    print('---------------------------------------------------------------------')
    return fin_df
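
Each assembled vector therefore holds the 6 basic features followed by the 150 summed word2vec entries, 156 numbers in total. The sketch below shows the same assembly on a toy scale. The dictionary standing in for the trained model, the 3-dimensional embeddings, the helper name sum_word_vectors and the sample values are all assumptions made for illustration.

```python
import numpy as np

DIM = 3  # the notebook uses 150; 3 keeps the example readable

# Hypothetical stand-in for the trained word2vec model: word -> vector
toy_model = {
    "pink": np.array([1.0, 0.0, 2.0]),
    "top":  np.array([0.5, 1.0, 0.0]),
}

def sum_word_vectors(words, model, dim=DIM):
    # Unknown words contribute a zero vector, as in word2vec_feat
    vecs = [model.get(w, np.zeros(dim)) for w in words]
    return np.sum(vecs, axis=0) if vecs else np.zeros(dim)

# len_row, len_char, len_word, avg_word, numerics, upper_count (made-up values)
basic = [15, 10, 4, 3.0, 0, 1]
words = ["pink", "top", "unseen"]

feature = basic + sum_word_vectors(words, toy_model).tolist()
print(len(feature))   # 6 basic + DIM embedding entries -> 9
print(feature[6:])    # [1.5, 1.0, 2.0]
```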

In [14]:
df =pd.read_pickle('fin_df.pkl')
df=pd.concat([df_basic['productId'],df],axis=1)
df.head(5)

Unnamed: 0,productId,lst_vect,rmv_word
0,TOPE9ABBZU3HZRHN,"[373, 48, 61, 5.131147540983607, 1, 4, -14.944...",citrine casual short sleeve printed womens pin...
1,TOPE9ABBBTJYDSQE,"[373, 49, 61, 5.131147540983607, 1, 4, -15.067...",citrine casual short sleeve printed womens pin...
2,TOPE6XZPUVT9C7RU,"[216, 45, 29, 6.482758620689655, 1, 4, -15.808...",butterfly wears casual short sleeve solid wome...
3,TOPE6Y7HSDDXPHZN,"[238, 49, 32, 6.46875, 1, 5, -8.65605640411377...",butterfly wears casual short sleeve solid wome...
4,TOPE6XZPXBP5APH9,"[220, 46, 30, 6.366666666666666, 1, 6, -4.3993...",butterfly wears casual short sleeve solid wome...


#### POSITIVE DATA COLLECTION

As you can see, we now have a single feature vector per row. The common features still have to be added and the vector has to be normalized. With that we can build the training data for our model.

I shifted the rows by two positions and stacked each row next to its shifted counterpart, so every pair holds two different products. These are the positive samples, i.e. non-duplicates (labelled 0 later on).

In [15]:
except_top_two_df = df.iloc[2:]
top_two=df.head(2)
two_stepped_down_df=pd.concat([except_top_two_df,top_two],ignore_index=True)  # DataFrame.append was removed in pandas 2.0
df3=pd.DataFrame()
df3['lst_vect2']=two_stepped_down_df['lst_vect']
df3['rmv_word2']=two_stepped_down_df['rmv_word']
no_dup_df=pd.concat([df,df3],axis=1)

In [16]:
no_dup_df.head(5)

Unnamed: 0,productId,lst_vect,rmv_word,lst_vect2,rmv_word2
0,TOPE9ABBZU3HZRHN,"[373, 48, 61, 5.131147540983607, 1, 4, -14.944...",citrine casual short sleeve printed womens pin...,"[216, 45, 29, 6.482758620689655, 1, 4, -15.808...",butterfly wears casual short sleeve solid wome...
1,TOPE9ABBBTJYDSQE,"[373, 49, 61, 5.131147540983607, 1, 4, -15.067...",citrine casual short sleeve printed womens pin...,"[238, 49, 32, 6.46875, 1, 5, -8.65605640411377...",butterfly wears casual short sleeve solid wome...
2,TOPE6XZPUVT9C7RU,"[216, 45, 29, 6.482758620689655, 1, 4, -15.808...",butterfly wears casual short sleeve solid wome...,"[220, 46, 30, 6.366666666666666, 1, 6, -4.3993...",butterfly wears casual short sleeve solid wome...
3,TOPE6Y7HSDDXPHZN,"[238, 49, 32, 6.46875, 1, 5, -8.65605640411377...",butterfly wears casual short sleeve solid wome...,"[220, 46, 30, 6.366666666666666, 1, 6, -4.6318...",butterfly wears casual short sleeve solid wome...
4,TOPE6XZPXBP5APH9,"[220, 46, 30, 6.366666666666666, 1, 6, -4.3993...",butterfly wears casual short sleeve solid wome...,"[236, 47, 32, 6.40625, 1, 5, -18.8782730102539...",butterfly wears casual full sleeve solid women...
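
The pairing above amounts to moving the first two rows to the end and zipping the result with the original order. Here is a minimal sketch of that idea in plain Python; the product ids are invented for the example.

```python
rows = ["A1", "A2", "B1", "B2", "C1"]   # hypothetical product ids

# Move the first two rows to the end, mirroring iloc[2:] followed by head(2)
shifted = rows[2:] + rows[:2]

pairs = list(zip(rows, shifted))
print(pairs)
# [('A1', 'B1'), ('A2', 'B2'), ('B1', 'C1'), ('B2', 'A1'), ('C1', 'A2')]
```

A pair built this way is only as different as rows i and i+2 happen to be, so it may be worth spot-checking a few pairs when near-identical listings sit close together in the file.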


#### NEGATIVE DATA COLLECTION

Here I'm stacking the vectors side by side without shifting, so each row is paired with itself and we end up with a dataset of duplicates. You can see that the two halves are identical.

In [17]:
df_duplicate=pd.DataFrame()
df_duplicate['lst_vect2']=df['lst_vect']
df_duplicate['rmv_word2']=df['rmv_word']
df_dup_fin=pd.concat([df,df_duplicate],axis=1)
df_dup_fin.head(5)

Unnamed: 0,productId,lst_vect,rmv_word,lst_vect2,rmv_word2
0,TOPE9ABBZU3HZRHN,"[373, 48, 61, 5.131147540983607, 1, 4, -14.944...",citrine casual short sleeve printed womens pin...,"[373, 48, 61, 5.131147540983607, 1, 4, -14.944...",citrine casual short sleeve printed womens pin...
1,TOPE9ABBBTJYDSQE,"[373, 49, 61, 5.131147540983607, 1, 4, -15.067...",citrine casual short sleeve printed womens pin...,"[373, 49, 61, 5.131147540983607, 1, 4, -15.067...",citrine casual short sleeve printed womens pin...
2,TOPE6XZPUVT9C7RU,"[216, 45, 29, 6.482758620689655, 1, 4, -15.808...",butterfly wears casual short sleeve solid wome...,"[216, 45, 29, 6.482758620689655, 1, 4, -15.808...",butterfly wears casual short sleeve solid wome...
3,TOPE6Y7HSDDXPHZN,"[238, 49, 32, 6.46875, 1, 5, -8.65605640411377...",butterfly wears casual short sleeve solid wome...,"[238, 49, 32, 6.46875, 1, 5, -8.65605640411377...",butterfly wears casual short sleeve solid wome...
4,TOPE6XZPXBP5APH9,"[220, 46, 30, 6.366666666666666, 1, 6, -4.3993...",butterfly wears casual short sleeve solid wome...,"[220, 46, 30, 6.366666666666666, 1, 6, -4.3993...",butterfly wears casual short sleeve solid wome...


#### COMMON FEATURES AND NORMALIZATION

Using both of these datasets, the function below builds the complete feature vector: it normalizes each row vector and adds the common features. This is a time-consuming process, so I stored the resulting NumPy files and reuse them for training.

In [18]:
data_df = pd.DataFrame(columns=['features','is_duplicate'])
def complete_feature(inp_df):
    data_df = pd.DataFrame(columns=['features'])
    for indx in range(inp_df.shape[0]):
        lst_vect1=normalize_vect(np.array(inp_df['lst_vect'][indx])).tolist()
        #print(lst_vect1)
        lst_vect2=normalize_vect(np.array(inp_df['lst_vect2'][indx])).tolist()
        #print(lst_vect2)
        #print(inp_df['rmv_word'][indx])
        #print(inp_df['rmv_word2'][indx])
        cosine_sim=common_feat([inp_df['rmv_word'][indx],inp_df['rmv_word2'][indx]]).tolist()
        #print(cosine_sim)
        sim_vect3=sum(cosine_sim)/len(cosine_sim)
        #print(sim_vect3)
        jac_vect4=get_jaccard_sim(inp_df['rmv_word'][indx],inp_df['rmv_word2'][indx])
        #print(jac_vect4)
        feat_vect=lst_vect1+lst_vect2
        feat_vect.append(sim_vect3)
        feat_vect.append(jac_vect4)
        #print(feat_vect)
        data_df.loc[indx] = [feat_vect]
        if indx%(len(inp_df)*10/100)==0:
            print(int(indx+1), 'rows completed')
    print('------------------------------------------------')
    feature_set=np.array([i for i in data_df['features']])
    return feature_set
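
The layout of one training row can be sketched as follows: two normalized 156-entry row vectors, followed by the averaged cosine similarity and the Jaccard similarity, for 314 entries in total. The helper names unit and pair_features, the random vectors and the two similarity values are assumptions used only to show the shape.

```python
import numpy as np

def unit(v):
    # L2-normalize, as normalize_vect does
    return v / np.sqrt((v ** 2).sum())

def pair_features(vec1, vec2, mean_cosine, jaccard):
    # [normalized row 1 | normalized row 2 | mean cosine | Jaccard]
    return np.concatenate([unit(vec1), unit(vec2), [mean_cosine, jaccard]])

# Random stand-ins for two 156-entry row vectors (6 basic + 150 word2vec)
rng = np.random.default_rng(0)
v1 = rng.normal(size=156)
v2 = rng.normal(size=156)

x = pair_features(v1, v2, mean_cosine=0.6, jaccard=0.25)
print(x.shape)   # (314,)
```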

In [19]:
a=np.load('feat_no_dup.npy')
b=np.load('feat_dup.npy')
X_train=np.concatenate((a,b),axis=0)

In [20]:
X_train[0:20]

array([[ 0.50825622,  0.06540563,  0.08311965, ..., -0.10151488,
         0.60715074,  0.25      ],
       [ 0.50812909,  0.06675154,  0.08309886, ..., -0.08546899,
         0.62910789,  0.27272727],
       [ 0.36518946,  0.07608114,  0.04903007, ..., -0.09684292,
         0.80363031,  0.51724138],
       ...,
       [ 0.49552698,  0.06684528,  0.07992371, ..., -0.08505097,
         0.55479937,  0.14285714],
       [ 0.56873839,  0.06410241,  0.08865227, ..., -0.11193999,
         0.57072952,  0.14814815],
       [ 0.52277434,  0.10735545,  0.06301298, ..., -0.08371695,
         0.55938964,  0.21212121]])

We get a complete feature matrix, one row per pair, and adding the labels is simple:

In [35]:
y = np.hstack((np.zeros(len(a)), np.ones(len(b))))
#Here 1 represents duplicates and 0 represents non-duplicates


#### TRAINING SVM

I chose an SVM for classification because, for this kind of data and a binary decision, SVMs tend to work well and yield high accuracy.

I used a plain LinearSVC() as it fits the data well. Since the extracted features are already informative, an SVM is enough to produce the predictions. I felt that using CNNs here was not necessary.

In [34]:
from sklearn.svm import LinearSVC
from sklearn.model_selection import train_test_split
import time
svc=LinearSVC()
rand_state = np.random.randint(0, 100)
X_t, X_test, y_t, y_test = train_test_split(X_train, y, test_size=0.3, random_state=rand_state,shuffle=True)
t=time.time()
svc.fit(X_t, y_t)
t2 = time.time()
fittingTime = round(t2 - t, 2)
accuracy = round(svc.score(X_test, y_test),4)

print(fittingTime)
print(accuracy)

27.52
0.984
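
Accuracy is a single number, so it can help to also look at how the two classes are handled separately. The sketch below counts true and false positives by hand with NumPy. The labels are made up for illustration and are not results from this model; in the notebook one would pass y_test and svc.predict(X_test) instead.

```python
import numpy as np

# Made-up labels for illustration only: 1 = duplicate, 0 = not duplicate
y_true = np.array([1, 1, 1, 0, 0, 0, 0, 1])
y_pred = np.array([1, 1, 0, 0, 0, 1, 0, 1])

tp = int(((y_pred == 1) & (y_true == 1)).sum())  # duplicates found
fp = int(((y_pred == 1) & (y_true == 0)).sum())  # false alarms
fn = int(((y_pred == 0) & (y_true == 1)).sum())  # duplicates missed
tn = int(((y_pred == 0) & (y_true == 0)).sum())  # non-duplicates kept apart

accuracy = (tp + tn) / len(y_true)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
print(accuracy, precision, recall)   # 0.75 0.75 0.75
```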


#### I got around 98.4% accuracy

In [36]:
import pickle
with open('svc_model.pkl', 'wb') as f:
    pickle.dump(svc, f)

# # and later you can load it
# with open('filename.pkl', 'rb') as f:
#     clf = pickle.load(f)

* I tested the model on various datasets and it works reasonably well. I also tried visualizing a decision tree classifier, which showed the tree and the vector indexes it relied on to separate the classes. This suggests the model is not overfit.

* The extracted features also hold up well: the model predicts the class correctly even when the colour and size of a record are altered manually.

* The accuracy might drop if more columns are considered or if the dataset becomes huge. In that case the loss can be countered by using images, which help the most.

### MAKING A USER-CENTRIC APPLICATION:

#### * The entire Python script has been included with this work
#### * To trigger the program, just specify:
   * 'excel_file_name.xlsx' : the file on which you need to train or test
   * mode : test or train
    
#### If 'train' is specified, the program automatically reads in the file, extracts the features and trains the model.
#### If 'test' is specified along with the file name,
#### the program picks up the file, compares the rows with each other and returns a dictionary of the corresponding duplicates
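
A command-line entry point of this kind could be sketched with argparse as shown below. This is not the script shipped with this work: the argument names excel_file and mode, the function build_parser and the help texts are hypothetical and only illustrate the two inputs described above.

```python
import argparse

def build_parser():
    # Hypothetical command-line interface; the real script may differ
    parser = argparse.ArgumentParser(description="Detect duplicate product rows")
    parser.add_argument("excel_file", help="path to the .xlsx file to read")
    parser.add_argument("mode", choices=["train", "test"],
                        help="'train' fits a model, 'test' reports duplicates")
    return parser

# Parse an example argument list instead of sys.argv so this runs anywhere
args = build_parser().parse_args(["excel_file_name.xlsx", "train"])
print(args.excel_file, args.mode)   # excel_file_name.xlsx train
```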

## So just feed it an Excel file and watch it play!

## THANKS

#### LEARNING AND CONCLUSIONS:

This was a great learning opportunity: I came across a lot of new things, from word2vec to the inner workings of SVMs.

I thought of adding an extra feature to the vector, described below:

Finding duplicates is essentially about the relation between two rows, so the vector should be built relative to both of them. Assuming we have 9 different columns, combine every column of one row with every column of the other to form a 9x9 matrix. Running a 1D CNN (convolutional neural network) over it could boost the accuracy, since the relative parameters are captured during vector formation and the CNN helps preserve them.
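
One possible reading of this idea is sketched below: entry (i, j) of the matrix compares column i of the first row with column j of the second. Using Jaccard similarity as the comparison is an assumption, since the text does not fix how two columns should be combined, and the example uses 3 made-up columns per row instead of 9 to stay short. The CNN step is not shown.

```python
import numpy as np

def jaccard(s1, s2):
    a, b = set(s1.split()), set(s2.split())
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def pairwise_matrix(row1, row2):
    # Entry (i, j) compares column i of row1 with column j of row2
    return np.array([[jaccard(c1, c2) for c2 in row2] for c1 in row1])

# Hypothetical rows with 3 columns each (title, brand, colour);
# 9 columns per row would give the 9x9 matrix described above
row_a = ["casual printed top", "citrine", "pink white"]
row_b = ["casual solid top", "butterfly wears", "pink"]

m = pairwise_matrix(row_a, row_b)
print(m.shape)    # (3, 3)
print(m[0, 0])    # 0.5
print(m[2, 2])    # 0.5
```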

### Looking forward to work with you!