# Project Overview

## About Dataset

### Context
This dataset has been sourced from a study on violence recognition titled "Violence Recognition from Videos using Deep Learning Techniques" by M. Soliman, M. Kamal, M. Nashed, Y. Mostafa, B. Chawky, D. Khattab. The study was presented at the 9th International Conference on Intelligent Computing and Information Systems (ICICIS'19) in Cairo, with the following citation:
> M. Soliman, M. Kamal, M. Nashed, Y. Mostafa, B. Chawky, D. Khattab, “Violence Recognition from Videos using Deep Learning Techniques”, Proc. 9th International Conference on Intelligent Computing and Information Systems (ICICIS'19), Cairo, pp. 79-84, 2019. 
Please cite this paper when using the dataset for research or engineering purposes.

### Dataset Origin
While working on our graduation project on violence recognition from videos, we found few available datasets that focus on violence between individuals. To address this gap, we curated a new dataset covering a diverse range of scenes.

## Content

The dataset comprises 1000 violence and 1000 non-violence videos, all collected from YouTube. The violence videos show real street fights recorded in various environments and conditions. The non-violence videos cover a broad range of human actions, including sports, eating, and walking.


#### Import Necessary Dependencies

In [2]:
import cv2
import os
import tensorflow as tf
import numpy as np
from tensorflow.keras.applications import InceptionV3
from tensorflow.keras.models import Model
from tensorflow.keras import Input
from tqdm import tqdm



# Feature Extraction using InceptionV3

In this section, we use InceptionV3, pre-trained on ImageNet, as a fixed feature extractor. Instead of its final classification output, we take the activations of the second-to-last layer, which gives a 2048-dimensional feature vector for each input frame. These vectors serve as compact frame representations for the video classifier trained later.


In [3]:
# Load InceptionV3 with its default ImageNet weights (classification head included)
pretrained_model = InceptionV3()
# Create a new model for feature extraction that outputs the
# second-to-last layer (global average pooling, 2048 values per image)
pretrained_model = Model(inputs=pretrained_model.input, outputs=pretrained_model.layers[-2].output)
pretrained_model.summary()

Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/inception_v3/inception_v3_weights_tf_dim_ordering_tf_kernels.h5
96112376/96112376 ━━━━━━━━━━━━━━━━━━━━ 3s 0us/step


### List the Videos in the Kaggle Dataset

In [9]:
violence = os.listdir('/kaggle/input/real-life-violence-situations-dataset/Real Life Violence Dataset/Violence')
nonviolence = os.listdir('/kaggle/input/real-life-violence-situations-dataset/Real Life Violence Dataset/NonViolence')

In [10]:
violence[0]

'V_465.mp4'

In [11]:
violence_path = [os.path.join('/kaggle/input/real-life-violence-situations-dataset/Real Life Violence Dataset/Violence',name) for name in violence]
nonviolence_path = [os.path.join('/kaggle/input/real-life-violence-situations-dataset/Real Life Violence Dataset/NonViolence',name) for name in nonviolence]

In [12]:
violence_path[0]

'/kaggle/input/real-life-violence-situations-dataset/Real Life Violence Dataset/Violence/V_465.mp4'

In [13]:
nonviolence_path[0]

'/kaggle/input/real-life-violence-situations-dataset/Real Life Violence Dataset/NonViolence/NV_759.mp4'
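`os.listdir` returns entries in arbitrary order, so which videos end up in a slice such as `[:500]` can differ between environments. A minimal sketch of a deterministic listing follows, assuming only `.mp4` files are wanted; `list_videos` is a hypothetical helper and not part of the original notebook.

```python
import os

def list_videos(directory, extension=".mp4"):
    """Return the sorted full paths of the video files in `directory`.

    Hypothetical helper, not part of the original notebook.
    """
    names = sorted(
        name for name in os.listdir(directory)
        if name.lower().endswith(extension)
    )
    return [os.path.join(directory, name) for name in names]
```

Sorting makes the first 500 paths reproducible across runs, at the cost of no longer matching the exact subset used for the results reported in this notebook.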

# Frame Feature Extraction Function

In this section, we define a function for extracting features from an individual frame using the previously configured feature extraction model.


In [14]:
def feature_extractor(frame):
    # Expand the dimensions of the frame for model compatibility
    img = np.expand_dims(frame, axis=0)
    
    # Use the pre-trained feature extraction model to obtain the feature vector
    feature_vector = pretrained_model.predict(img, verbose=0)
    
    # Return the extracted feature vector
    return feature_vector
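One detail worth checking is how pixel values are scaled before they reach `feature_extractor`. Keras provides `tf.keras.applications.inception_v3.preprocess_input`, which maps pixels from [0, 255] to [-1, 1], whereas this notebook divides by 255 and feeds values in [0, 1]. The sketch below only compares the two scalings in NumPy; the function names are illustrative. The results reported here were obtained with the [0, 1] scaling, so switching would mean re-extracting the features and retraining.

```python
import numpy as np

def scale_unit(frame):
    """Scaling used in this notebook: [0, 255] -> [0, 1]."""
    return frame.astype(np.float32) / 255.0

def scale_inception(frame):
    """Same arithmetic as Keras' InceptionV3 preprocess_input: [0, 255] -> [-1, 1]."""
    return frame.astype(np.float32) / 127.5 - 1.0

# A dummy 299x299 RGB frame containing the extreme pixel values 0 and 255
frame = np.zeros((299, 299, 3), dtype=np.uint8)
frame[0, 0] = 255

print(scale_unit(frame).min(), scale_unit(frame).max())
print(scale_inception(frame).min(), scale_inception(frame).max())
```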

# Video Frames Extraction Function

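In this section, we define a function that extracts features from sampled video frames. It takes a list of video paths, the desired sequence length, the image width and height, and the number of videos to process (`total_video`). Because `total_video` defaults to 0, it must be passed explicitly; otherwise no video is processed. For each video, the function samples `SEQUENCE_LENGTH` frames at a fixed stride, resizes and normalizes them, and extracts one feature vector per frame.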

In [16]:
def frames_extraction(video_path, SEQUENCE_LENGTH=16, IMAGE_WIDTH=299, IMAGE_HEIGHT=299, total_video=0):
    # List to store features for all videos
    all_video_features = []
    
    # Loop through each video
    for pos in tqdm(range(total_video)):
        frames_list = []
        
        # Open the video file for reading
        video_reader = cv2.VideoCapture(video_path[pos])
        
        # Get the total number of frames in the video
        video_frames_count = int(video_reader.get(cv2.CAP_PROP_FRAME_COUNT))
        
        # Calculate the number of frames to skip in order to achieve the desired sequence length
        skip_frames_window = max(int(video_frames_count / SEQUENCE_LENGTH), 1)

        # Loop through each frame in the sequence
        for frame_counter in range(SEQUENCE_LENGTH):
            # Set the position of the video reader to the current frame
            video_reader.set(cv2.CAP_PROP_POS_FRAMES, frame_counter * skip_frames_window)
            
            # Read the frame
            success, frame = video_reader.read()

            # Break if unable to read the frame
            if not success:
                break
            
            # Convert the frame to RGB and resize it (cv2.resize expects (width, height))
            frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
            resized_frame = cv2.resize(frame_rgb, (IMAGE_WIDTH, IMAGE_HEIGHT))
            
            # Scale pixel values to the range [0, 1]
            normalized_frame = resized_frame / 255
            
            # Extract features using the previously defined feature extraction function
            features = feature_extractor(normalized_frame)
            
            # Append the features to the list
            frames_list.append(features)
        
        # Append the list of features for the current video to the overall list
        all_video_features.append(frames_list)

        # Release the video reader
        video_reader.release()
    
    # Convert the list of features to a numpy array
    return np.array(all_video_features)
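The sampling arithmetic inside `frames_extraction` can be checked in isolation. The sketch below reproduces only the index computation; `sampled_frame_indices` is a hypothetical helper introduced for illustration. It also shows what happens for a clip with fewer frames than `SEQUENCE_LENGTH`: the later positions do not exist, so `read()` would fail there and the loop would stop early.

```python
def sampled_frame_indices(frame_count, sequence_length=16):
    """Frame positions visited by the sampling loop in `frames_extraction`."""
    skip = max(frame_count // sequence_length, 1)
    return [i * skip for i in range(sequence_length)]

long_clip = sampled_frame_indices(160)   # stride 10
short_clip = sampled_frame_indices(10)   # stride 1, but only 10 frames exist

# Positions at or beyond the frame count cannot be read
unreadable = [i for i in short_clip if i >= 10]
print(long_clip)
print(unreadable)
```

Note that with this stride the last sampled frame can fall well before the end of the video (frame 150 of 160 in the first example), so the tail of each clip is never seen.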

#### Extract features from the first 500 videos of each class

In [17]:
violence_features = frames_extraction(violence_path[:500],total_video=len(violence_path[:500]))
non_violence_features = frames_extraction(nonviolence_path[:500],total_video=len(nonviolence_path[:500]))

100%|██████████| 500/500 [17:58<00:00,  2.16s/it]
 99%|█████████▉| 497/500 [13:39<00:07,  2.36s/it][h264 @ 0x558aae234b00] mb_type 104 in P slice too large at 98 31
[h264 @ 0x558aae234b00] error while decoding MB 98 31
100%|██████████| 500/500 [13:47<00:00,  1.65s/it]
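Decoder messages like the ones in the log do not necessarily make a read fail, but they are a reminder that some clips are damaged. When `video_reader.read()` does fail, the loop in `frames_extraction` stops early, so that video contributes fewer than `SEQUENCE_LENGTH` feature vectors and the final `np.array(...)` call no longer yields a regular array. One possible safeguard is sketched below, under the assumption that repeating the last valid vector is acceptable; `pad_sequence` is a hypothetical helper and not part of the notebook.

```python
import numpy as np

def pad_sequence(frame_features, sequence_length=16, feature_dim=2048):
    """Return an array of shape (sequence_length, feature_dim).

    Short sequences are padded by repeating the last valid vector;
    an empty sequence becomes all zeros.
    """
    if len(frame_features) == 0:
        return np.zeros((sequence_length, feature_dim), dtype=np.float32)
    stacked = np.asarray(frame_features, dtype=np.float32).reshape(-1, feature_dim)
    stacked = stacked[:sequence_length]
    missing = sequence_length - len(stacked)
    if missing > 0:
        padding = np.repeat(stacked[-1:], missing, axis=0)
        stacked = np.concatenate([stacked, padding], axis=0)
    return stacked

# Example: only 3 of 16 frames could be read; each vector has shape (1, 2048)
partial = [np.full((1, 2048), i, dtype=np.float32) for i in range(3)]
padded = pad_sequence(partial)
print(padded.shape)
```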


In [18]:
# Save the violence features to disk so they can be reused without re-extraction
np.save('/kaggle/working/violence_features.npy', violence_features)

In [19]:
# Save the non-violence features to disk so they can be reused without re-extraction
np.save('/kaggle/working/non_violence_features.npy', non_violence_features)
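Since extraction takes well over ten minutes per class, it can help to skip it whenever the saved file already exists. The following is a minimal sketch of that caching pattern; `load_or_compute` is a hypothetical helper and not part of the original notebook.

```python
import os
import numpy as np

def load_or_compute(cache_path, compute_fn):
    """Load an array from `cache_path`, or compute and save it if the file is missing."""
    if os.path.exists(cache_path):
        return np.load(cache_path)
    features = compute_fn()
    np.save(cache_path, features)
    return features
```

In this notebook it could wrap the extraction call, for example `load_or_compute('/kaggle/working/violence_features.npy', lambda: frames_extraction(violence_path[:500], total_video=500))`.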

# Loading Non-Violence and Violence Feature Data

In this section, we load the precomputed feature data for non-violence and violence videos. The features are stored in NumPy arrays.

In [43]:
non_violence_data = np.load('/kaggle/working/non_violence_features.npy')
violence_data = np.load('/kaggle/working/violence_features.npy')

In [44]:
violence_data[0].shape

(16, 1, 2048)
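The middle axis of size 1 comes from `feature_extractor`: each call to `predict` on a single frame returns an array of shape `(1, 2048)`, and 16 of these are stacked per video. A small NumPy sketch with placeholder zeros (not the real features) shows the shapes involved and one way to drop the extra axis.

```python
import numpy as np

# Placeholder standing in for the features of 5 videos
features = np.zeros((5, 16, 1, 2048), dtype=np.float32)

# Remove the singleton axis left over from per-frame prediction
squeezed = np.squeeze(features, axis=2)

print(features[0].shape)  # (16, 1, 2048)
print(squeezed.shape)     # (5, 16, 2048)
```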

# Creating LSTM Model and Preparing Data

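In this section, we import the layers needed for a Bidirectional LSTM (Long Short-Term Memory) video classifier and prepare the data for training: we create the labels, combine the two classes, and split the result into training and test sets.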

In [45]:
from tensorflow.keras.layers import LSTM, Dense, Bidirectional, BatchNormalization, Dropout
from sklearn.model_selection import train_test_split

# Create labels: 0 = violence, 1 = non-violence
violence_labels = np.zeros(len(violence_data))
nonviolence_labels = np.ones(len(non_violence_data))

# Combine features and labels
X = np.concatenate([violence_data, non_violence_data], axis=0)
y = np.concatenate([violence_labels, nonviolence_labels], axis=0)

In [46]:
len(X)  # total number of samples

1000

In [47]:
X[0].shape  # shape of each sample

(16, 1, 2048)

In [48]:
y[0:20]  # first 20 labels

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0.])

In [49]:
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=32)

# Drop the singleton axis: (N, 16, 1, 2048) -> (N, 16, 2048)
X_train_reshaped = X_train.reshape((X_train.shape[0], 16, 2048))
X_test_reshaped = X_test.reshape((X_test.shape[0], 16, 2048))

# LSTM Model Definition using Keras Functional API

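In this section, we define a Bidirectional LSTM (Long Short-Term Memory) model for video classification using the Keras Functional API. The model takes a sequence of 16 frame feature vectors (2048 values each) and outputs a single sigmoid probability.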

In [70]:
# Define the input layer
inputs = Input(shape=(16, 2048))

# Build the LSTM model using Functional API
x = Bidirectional(LSTM(200, return_sequences=True))(inputs)
x = BatchNormalization()(x)
x = Dropout(0.3)(x)
x = Bidirectional(LSTM(100))(x)
x = BatchNormalization()(x)
x = Dropout(0.3)(x)
x = Dense(200, activation='relu')(x)
outputs = Dense(1, activation='sigmoid')(x)

# Create the model
model = Model(inputs=inputs, outputs=outputs)

In [71]:
model.summary()
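The output of `model.summary()` is not reproduced here. As a cross-check, the parameter counts can be worked out by hand from the standard weight shapes of Keras LSTM, Dense, and BatchNormalization layers. The helper names and dictionary keys below are illustrative and do not match the layer names Keras would print.

```python
def lstm_params(input_dim, units):
    """Weights of one LSTM: 4 gates, each with input, recurrent, and bias terms."""
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim, units):
    """Weight matrix plus bias vector."""
    return input_dim * units + units

def batchnorm_params(features):
    """gamma, beta, moving mean, moving variance (the last two are not trained)."""
    return 4 * features

layers = {
    "bidirectional_lstm_200": 2 * lstm_params(2048, 200),  # two directions
    "batchnorm_400": batchnorm_params(400),
    "bidirectional_lstm_100": 2 * lstm_params(400, 100),
    "batchnorm_200": batchnorm_params(200),
    "dense_200": dense_params(200, 200),
    "dense_1": dense_params(200, 1),
}
total = sum(layers.values())
print(layers)
print(total)
```

Most of the roughly 4 million parameters sit in the first Bidirectional LSTM, because each direction maps the 2048-dimensional input into four gates.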

### Compiling and Training the Model

In [72]:
# Compile the model with binary cross-entropy loss and the Adam optimizer
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Train the model; the held-out test split doubles as the validation set
model.fit(X_train_reshaped, y_train, validation_data=(X_test_reshaped, y_test), epochs=5, batch_size=32)

Epoch 1/5
25/25 ━━━━━━━━━━━━━━━━━━━━ 6s 40ms/step - accuracy: 0.7781 - loss: 0.4628 - val_accuracy: 0.9250 - val_loss: 0.5014
Epoch 2/5
25/25 ━━━━━━━━━━━━━━━━━━━━ 0s 17ms/step - accuracy: 0.9535 - loss: 0.1409 - val_accuracy: 0.9700 - val_loss: 0.3292
Epoch 3/5
25/25 ━━━━━━━━━━━━━━━━━━━━ 0s 17ms/step - accuracy: 0.9704 - loss: 0.0770 - val_accuracy: 0.8600 - val_loss: 0.3208
Epoch 4/5
25/25 ━━━━━━━━━━━━━━━━━━━━ 0s 17ms/step - accuracy: 0.9867 - loss: 0.0478 - val_accuracy: 0.9550 - val_loss: 0.1896
Epoch 5/5
25/25 ━━━━━━━━━━━━━━━━━━━━ 0s 17ms/step - accuracy: 0.9990 - loss: 0.0128 - val_accuracy: 0.9600 - val_loss: 0.1318


<keras.src.callbacks.history.History at 0x79ca9a3ff220>

### Model evaluation

In [73]:
# Evaluate the model on the test set; evaluate() returns [loss, accuracy]
loss, accuracy = model.evaluate(X_test_reshaped, y_test)
print("Test Accuracy:", accuracy)

7/7 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.9562 - loss: 0.1453
Test Accuracy: 0.9599999785423279
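The validation metrics in the training log are worth a second look before relying on the final accuracy. The test split was also passed as `validation_data`, so the 0.96 reported here matches the epoch 5 validation accuracy and is not an independent estimate. The log also shows validation accuracy swinging from 0.97 to 0.86 between epochs 2 and 3 while validation loss keeps falling. The sketch below simply reads the best epochs off the logged values.

```python
# Validation metrics copied from the training log (epochs 1 to 5)
val_loss = [0.5014, 0.3292, 0.3208, 0.1896, 0.1318]
val_accuracy = [0.9250, 0.9700, 0.8600, 0.9550, 0.9600]

# Epoch numbers are 1-based
best_loss_epoch = min(range(len(val_loss)), key=val_loss.__getitem__) + 1
best_acc_epoch = max(range(len(val_accuracy)), key=val_accuracy.__getitem__) + 1

print("Lowest validation loss at epoch", best_loss_epoch)
print("Highest validation accuracy at epoch", best_acc_epoch)
```

The two criteria disagree (epoch 5 by loss, epoch 2 by accuracy), which suggests that five epochs on 800 training samples may be too few to judge convergence.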


### Let's Test with Unseen Videos

In [74]:
violence_features_test = frames_extraction(violence_path[500:510],total_video=len(violence_path[500:510]))
non_violence_features_test = frames_extraction(nonviolence_path[500:510],total_video=len(nonviolence_path[500:510]))

100%|██████████| 10/10 [00:23<00:00,  2.38s/it]
100%|██████████| 10/10 [00:17<00:00,  1.77s/it]


In [77]:
test_violence = violence_features_test.reshape((violence_features_test.shape[0], 16, 2048))
test_non_violence = non_violence_features_test.reshape((non_violence_features_test.shape[0], 16, 2048))

In [79]:
test_violence[0].shape

(16, 2048)

In [81]:
np.expand_dims(test_violence[0], axis=0).shape  # predicting on a single video requires adding a batch dimension

(1, 16, 2048)

In [82]:
class_names = ['violence','non_violence']# class names
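Because non-violence was labelled 1 during training, the sigmoid output is the model's estimate of the probability of the non-violence class, and values at or below 0.5 map to `'violence'`. The sketch below shows the mapping with made-up probabilities; the values are illustrative and are not model output, and `to_class_names` is a hypothetical helper.

```python
import numpy as np

class_names = ['violence', 'non_violence']

def to_class_names(probabilities, threshold=0.5):
    """Map sigmoid outputs of shape (N, 1) to class-name strings."""
    indices = (np.asarray(probabilities).ravel() > threshold).astype(int)
    return [class_names[i] for i in indices]

# Made-up outputs with the same (N, 1) shape that `model.predict` returns
example = np.array([[0.03], [0.97], [0.50]])
print(to_class_names(example))
```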

### Model Testing

In [92]:
# Map sigmoid outputs to class names (an output above 0.5 means non_violence)
predicted_non_violence = [class_names[1] if i > 0.5 else class_names[0] for i in model.predict(test_non_violence)]  # non-violence videos
predicted_violence = [class_names[1] if i > 0.5 else class_names[0] for i in model.predict(test_violence)]  # violence videos

1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 20ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step


In [93]:
predicted_non_violence

['violence',
 'non_violence',
 'non_violence',
 'non_violence',
 'non_violence',
 'non_violence',
 'non_violence',
 'non_violence',
 'non_violence',
 'non_violence']

In [94]:
predicted_violence

['violence',
 'violence',
 'violence',
 'violence',
 'violence',
 'violence',
 'violence',
 'violence',
 'violence',
 'violence']
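The two prediction lists can be summarized as a single accuracy figure. The sketch below copies the predictions shown in the two cells and counts the matches; with only 20 clips, it is a rough sanity check and not a reliable estimate.

```python
# Predictions copied from the two output cells
predicted_non_violence = ['violence'] + ['non_violence'] * 9
predicted_violence = ['violence'] * 10

correct = (
    sum(p == 'non_violence' for p in predicted_non_violence)
    + sum(p == 'violence' for p in predicted_violence)
)
total = len(predicted_non_violence) + len(predicted_violence)
accuracy = correct / total
print(f"{correct}/{total} correct, accuracy = {accuracy:.2f}")
```

The single error is a non-violence clip predicted as violence, the same direction of error that dominates the test split.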

### Classification Report For The Model Prediction

In [95]:
from sklearn.metrics import classification_report

y_pred = model.predict(X_test_reshaped)
y_preds = [1 if i > 0.5 else 0 for i in y_pred]
# Generate a classification report
report = classification_report(y_test, y_preds)

# Print the classification report
print("Classification Report:\n", report)

7/7 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step
Classification Report:
               precision    recall  f1-score   support

         0.0       0.92      1.00      0.96        98
         1.0       1.00      0.92      0.96       102

    accuracy                           0.96       200
   macro avg       0.96      0.96      0.96       200
weighted avg       0.96      0.96      0.96       200
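The rounded figures in the report, together with the supports and the 0.96 accuracy, are consistent with the following confusion counts: all 98 violence clips classified correctly, and 8 of the 102 non-violence clips predicted as violence. These counts are inferred from the report and are not printed by the notebook. The sketch recomputes the metrics from them as a check.

```python
# Confusion counts inferred from the report (label 0 = violence, 1 = non-violence)
violence_as_violence, violence_as_non = 98, 0
non_as_violence, non_as_non = 8, 94

precision0 = violence_as_violence / (violence_as_violence + non_as_violence)
recall0 = violence_as_violence / (violence_as_violence + violence_as_non)
precision1 = non_as_non / (non_as_non + violence_as_non)
recall1 = non_as_non / (non_as_non + non_as_violence)
accuracy = (violence_as_violence + non_as_non) / 200

def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

print(round(precision0, 2), round(recall0, 2), round(f1(precision0, recall0), 2))
print(round(precision1, 2), round(recall1, 2), round(f1(precision1, recall1), 2))
print(round(accuracy, 2))
```

In other words, under these counts the model never misses a violent clip in this split but raises a false alarm on about 8% of the non-violent ones.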

