# File loading and plotting
In this notebook, we will demonstrate how to load and plot the types of data that we are principally interested in for this course.

In [None]:
# Imports
import matplotlib.pyplot as plt
import numpy as np
import cv2
# For fancy HTML movie embedding
from IPython.display import HTML

# VEDB-specific code imports
import file_io
import plot_utils

# Simple class for loading
%run ../../code/sesssion_standalone.py

# Data location 
We will be working with data in a shared folder called `/data/` on phi. (TEMP NOTE for the VEDB team: this data can also be found on `<vedbcloud0>/data/summer_workshop/data/`.) Let's have a look at what's there:

In [None]:
ls /home/data/

First we will load data collected for the VEDB project, in the `/data/vedb/` folder. Each session in this dataset is stored in a folder named for the date and time it was collected; within each folder there are a number of video and other files.

In [None]:
ls /home/data/vedb/

In [None]:
ls /home/data/vedb/2021_01_11_16_33_39/

Let's load a video first!

# Simple video loading 
First we will show you what it would look like to load a video with off-the-shelf tools. Here, we will use OpenCV, but a few other libraries have similar mechanisms to load video. 

In [None]:
# Name the file you want to load
video_file = '/data/vedb/2021_01_11_16_33_39/world.mp4'
# Create a video capture object
vid = cv2.VideoCapture(video_file)

What we have now is a Python object - a representation of that video. It does NOT actually contain the pixels of the video, so we can't get at them as an array to do computations on yet. We have to use this object to retrieve frames of the video, one by one.

In [None]:
# First, set the index of the first frame you want to read (0 = the start of the video).
vid.set(cv2.CAP_PROP_POS_FRAMES, 0)
# Then, capture a frame. This reads the frame at the current position and advances to the next one.
success, frame = vid.read()

In [None]:
# the `success` variable here is True if the frame loaded correctly:
print(success)

In [None]:
# The `frame` variable contains an array for the image, which has a resolution of 1536 x 2048 pixels
print(frame.shape)

In [None]:
# Show the image!
plt.imshow(frame)

... wait, what's wrong with the colors? Remember that an image is represented by (at least) three layers of data, representing the red, green, and blue components of each pixel. Many image formats - and libraries - store and load these as R, G, B (Red, Green, Blue in that order). OpenCV, for whatever reason, loads them as B, G, R. Thus, we have to switch around the order of the color channels to make the image look sensible.

This is a really common issue in real data analysis: conventions often differ - even in the same field! - between data sets and code libraries. This will come up in our odometry data, too, and is a common source of confusion in graphics, geospatial analysis, and many other fields. 

In [None]:
# switch the image color channels:
frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
# Show it again!
plt.imshow(frame_rgb)

Much better!
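
As a side note, the channel swap is nothing more than reversing the last axis of the array, so it can also be written as plain NumPy indexing. The sketch below uses a tiny made-up two-pixel image rather than a real video frame:

```python
import numpy as np

# A tiny 1 x 2 "image" in BGR order: one pure-blue pixel, one pure-red pixel
bgr = np.array([[[255, 0, 0], [0, 0, 255]]], dtype=np.uint8)

# Reverse the last (color) axis: B, G, R -> R, G, B
rgb = bgr[..., ::-1]

print(rgb[0, 0])  # the blue pixel, now in RGB order
print(rgb[0, 1])  # the red pixel, now in RGB order
```

Note that this indexing returns a view of the original array rather than a copy, whereas `cv2.cvtColor` returns a new array.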

# Loading with a more complex wrapper function
This process of opening a movie with OpenCV works just fine: it's flexible and functional, but it takes at least three steps, which then have to be run in a loop to load more than one frame. It also doesn't provide any additional functionality, like resizing the frames of the movie at load time (so you don't overrun the RAM of your poor computer with a bajillion pixels).
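
For reference, such a loop might look like the minimal sketch below. The helper name `read_frames` is hypothetical (it is not part of OpenCV or the course code). It works with any object that has a `read()` method returning `(success, frame)`, as `cv2.VideoCapture` does. A small stand-in class is used here so the example runs without a video file.

```python
import numpy as np


def read_frames(capture, n_frames):
    """Read up to `n_frames` frames from a capture-like object.

    `capture` is anything with a `read()` method that returns
    `(success, frame)`, e.g. a `cv2.VideoCapture`.
    """
    frames = []
    for _ in range(n_frames):
        success, frame = capture.read()
        if not success:
            # End of video (or a read error): stop early
            break
        frames.append(frame)
    # Stack into one array with frames as the first axis
    return np.stack(frames) if frames else np.empty((0,))


class FakeCapture:
    """Stand-in for cv2.VideoCapture that yields a few tiny black frames."""

    def __init__(self, n_available=3):
        self.remaining = n_available

    def read(self):
        if self.remaining == 0:
            return False, None
        self.remaining -= 1
        return True, np.zeros((4, 6, 3), dtype=np.uint8)


# Ask for 5 frames from a "video" that only has 3
frames_demo = read_frames(FakeCapture(), n_frames=5)
print(frames_demo.shape)
```

With a real video, you would pass the `vid` object created above instead of `FakeCapture()`.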

Next we will demonstrate a more convenient way to load video frames, using a library written by the instructors.

In [None]:
# Again, specify the video file
video_file = '/data/vedb/2021_01_11_16_33_39/world.mp4'
# Specify also how many frames you want to load - and that's it! 
frames = file_io.load_mp4(video_file, frames=(0, 100))

In [None]:
# Note that this loads a whole spatiotemporal chunk of the movie, with frames as the first axis. 
frames.shape

In [None]:
# we can index into this to show a frame:
plt.imshow(frames[0, :, :, : ])

Same thing! 

In [None]:
# Side note: technically, those trailing `:` indices are not necessary in Python.
# So you can show the next frame like this:
plt.imshow(frames[1])

Very slightly different from the first frame! This syntax is nice and clean; it will be used in other notebooks, too. 
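
If you want to convince yourself that the short and long forms of indexing are equivalent, here is a small check on a made-up array with the same layout (frames, rows, columns, color channels):

```python
import numpy as np

# Stand-in for a loaded movie: 5 frames of 4 x 6 pixels with 3 color channels
movie = np.arange(5 * 4 * 6 * 3).reshape(5, 4, 6, 3)

explicit = movie[1, :, :, :]   # every axis spelled out
short = movie[1]               # trailing axes implied
ellipsis = movie[1, ...]       # `...` stands for "all remaining axes"

print(explicit.shape, short.shape, ellipsis.shape)
print(np.array_equal(explicit, short), np.array_equal(short, ellipsis))
```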

`file_io.load_mp4` provides a few more options too. You can see them by calling help on this function: 

In [None]:
file_io.load_mp4?

In [None]:
# Load grayscale, downsampled version of images:
gray_frames = file_io.load_mp4(video_file, frames=(0, 10), size=(300,400), color='gray')

In [None]:
# Notice that the color dimension is gone!
gray_frames.shape
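
To see why downsampling at load time can matter, here is a rough back-of-the-envelope calculation of memory use for 100 frames. It assumes one byte per value (`uint8`), which is typical for video frames but is an assumption about what `file_io.load_mp4` returns:

```python
# Bytes needed for 100 frames, assuming one byte (uint8) per value
n_frames = 100
full_color_bytes = n_frames * 1536 * 2048 * 3   # full resolution, 3 color channels
small_gray_bytes = n_frames * 300 * 400         # downsampled, no color channels

print(full_color_bytes / 1e9, "GB")
print(small_gray_bytes / 1e6, "MB")
print("ratio:", full_color_bytes / small_gray_bytes)
```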

In [None]:
# Show the image
plt.imshow(gray_frames[0])

Wait, what happened now?? Did we go back to BGR? No - this is just matplotlib's default color map applied to an image with no color channels. To see regular grayscale colors, you just have to specify a grayscale colormap:

In [None]:
plt.imshow(gray_frames[0], cmap='gray')

In [None]:
# You can specify other color maps too!
plt.imshow(gray_frames[0], cmap='inferno')

2D images can be mapped to any color scheme. To see all the colormaps available in matplotlib, check out [this link](https://matplotlib.org/stable/tutorials/colors/colormaps.html).

Color mapping will be important when we talk about 2D histograms and other quantities computed from images or other data that manifest as arrays!
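
To make the idea of a colormap concrete: a colormap is essentially a rule that turns a single number per pixel into a color. The sketch below builds a tiny, made-up three-color lookup table (not a real matplotlib colormap) and applies it to a 2D array by hand.

```python
import numpy as np

# A 2 x 3 grayscale "image" with values between 0 and 1
gray = np.array([[0.0, 0.5, 1.0],
                 [1.0, 0.5, 0.0]])

# Hypothetical 3-entry lookup table: low -> black, middle -> red, high -> yellow
lut = np.array([[0.0, 0.0, 0.0],
                [1.0, 0.0, 0.0],
                [1.0, 1.0, 0.0]])

# Convert each value to a table index (0, 1, or 2), then look up its color
idx = np.round(gray * (len(lut) - 1)).astype(int)
colored = lut[idx]

# The result has a color axis again: (rows, columns, 3)
print(colored.shape)
```

Real colormaps work the same way in spirit, just with many more entries and smooth transitions between them.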


# VEDB-specific loading
Finally, we have so many different quantities to load in our VEDB data that it's useful to have a single loader for all the data types. This loader can exploit the fact that within each directory, there is a very regular organization of files. It also can return time-synced data from each stream, as we demonstrate below.

In [None]:
# Create a "Session" object from a data folder. Note that you pass the whole folder; the code will figure
# out what is inside. This is a bit like the cv2.VideoCapture object above - it isn't the data, it's just
# an object that provides a way to load it. 
ses = Session(folder='/data/vedb/2021_01_11_16_33_39/')

Here, `ses` is an *instance* of the Session class. It's an object, with properties attached to it and methods that can be called.

In [None]:
# One property is `paths`:
ses.paths

In [None]:
# Like vid.read() above, this object has a method to load different streams of data:
ses.load?

In [None]:
# Load world camera data from seconds 8 to 9 in the video:
world_time, world_frames = ses.load('world_camera', time_idx=(8,9))

In [None]:
# The first data point returned is the timestamp for each frame. For some analyses, this will be important!
world_time

In [None]:
# Note that our intended frame rate for our camera is 30 frames per second, but we don't always hit that 
# for a variety of reasons. If we did, there would be 30 values for world_time, and 30 frames for world_frames:
print(world_time.shape)
print(world_frames.shape)

So: not 30 frames per second, but there is one timestamp per frame, so we can compute fps later if we need to. 
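
A minimal sketch of that fps computation is shown here on made-up timestamps. It assumes the timestamps are in seconds; with real data you would use `world_time` in place of `timestamps`.

```python
import numpy as np

# Made-up timestamps (assumed to be in seconds) for 5 frames
timestamps = np.array([8.00, 8.04, 8.08, 8.12, 8.16])

# Time elapsed between consecutive frames
frame_intervals = np.diff(timestamps)

# The median is less sensitive than the mean to an occasional dropped frame
estimated_fps = 1.0 / np.median(frame_intervals)
print(estimated_fps)
```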

Under the hood, this is using `file_io`, which in turn is using OpenCV! Each of these is one more layer of abstraction, providing a little more convenience and a little less flexibility. They are more and more tailored to our uses here.

Since this uses `file_io.load_mp4` for movies, it can take extra arguments just like `file_io.load_mp4` does:

In [None]:
# Load 10 seconds of downsampled frames
world_time, gray_frames = ses.load('world_camera', time_idx=(30,40), color='gray', size=(300, 400))

In [None]:
print(world_time.shape)
print(gray_frames.shape)

In [None]:
plt.imshow(gray_frames[0], cmap='gray')

The nice thing about this is that we can load the eye data for the same time range! 

In [None]:
eye_left_time, eye_left_frames = ses.load('eye_left', time_idx=(30,40), color='gray')

In [None]:
# Note that there are many more frames for the same time interval! 
# The eye camera has a much faster frame rate than the world camera. 
print(eye_left_time.shape)
print(eye_left_frames.shape)

In [None]:
# And here's the eye:
plt.imshow(eye_left_frames[0], cmap='gray')
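
Because the two cameras run at different rates, a common next step is to find, for each world frame, the eye frame closest in time. The sketch below shows one way to do this with `np.searchsorted` on made-up timestamps. The function name `nearest_index` is hypothetical, and this is not necessarily how the `Session` class handles synchronization.

```python
import numpy as np


def nearest_index(reference_times, query_times):
    """For each query time, return the index of the closest reference time.

    Both inputs are 1D arrays of timestamps; `reference_times` must be
    sorted and contain at least two values.
    """
    reference_times = np.asarray(reference_times)
    query_times = np.asarray(query_times)
    # Index of the first reference time >= each query time
    right = np.searchsorted(reference_times, query_times)
    right = np.clip(right, 1, len(reference_times) - 1)
    left = right - 1
    # Pick whichever neighbor (left or right) is closer in time
    dist_left = query_times - reference_times[left]
    dist_right = reference_times[right] - query_times
    return np.where(dist_left <= dist_right, left, right)


# Made-up timestamps: a slow "world" stream and a faster "eye" stream
world_t = np.array([0.00, 0.10, 0.20])
eye_t = np.array([0.00, 0.03, 0.06, 0.09, 0.12, 0.15, 0.18, 0.21])

eye_idx = nearest_index(eye_t, world_t)
print(eye_idx)
```

With real data, `eye_left_frames[eye_idx]` would then give one eye frame per world frame.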

For fun, let's show a movie of the eye data while we're here. This relies on another library written by the instructors, which wraps some matplotlib functions:

In [None]:
# Make a short 3-second plot, but since the monitor you're on probably can't actually display 200 fps, 
# show it at a slower rate. Here, 3s worth of eye data will be shown in ~10s
fps = 200
display_fps = 60
seconds = 3
anim = plot_utils.make_image_animation(eye_left_frames[:fps * seconds], fps=display_fps, cmap='gray')
HTML(anim.to_html5_video())
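
The arithmetic behind the "~10s" in the comment above, spelled out (the 200 fps capture rate is the approximate value used in that cell):

```python
capture_fps = 200   # approximate eye camera frame rate
display_fps = 60    # playback rate of the animation
seconds = 3         # seconds of recording to show

n_frames_shown = capture_fps * seconds            # number of frames in the clip
playback_seconds = n_frames_shown / display_fps   # how long the clip takes to play
slowdown = capture_fps / display_fps              # how much slower than real time

print(n_frames_shown, playback_seconds, round(slowdown, 2))
```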

# Loading odometry data
The same object and the same method can be used to load odometry data. Under the hood, odometry data is a very different beast from video data: each frame of odometry data is stored as a dictionary of values in a special file format called MessagePack (msgpack). The `load` method here saves you the trouble of googling around to find an appropriate library to read this data. Note that by default we won't get an array out:

In [None]:
odometry_time, odometry_all = ses.load('odometry', time_idx=(30,40))

In [None]:
# Note that this isn't an array, so we can't get its shape - it's a list:
print(type(odometry_all))
# Note also that odometry is sampled fast, too - nearly 200 fps
print(len(odometry_all))

In [None]:
odometry_all[0]

Each frame has all of these values. It might be more useful for a given analysis to get an array of only one of these out - and the loading function provides such syntax: 

In [None]:
odometry_time, odometry_linear_velocity = ses.load('odometry:linear_velocity', time_idx=(30,40))

In [None]:
# Now we have an array, which we can plot:
print(odometry_linear_velocity.shape)

In [None]:
plt.plot(odometry_linear_velocity)

The same can be done for any of the other odometry parameters, e.g. angular_velocity, angular_acceleration, and position.
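
To give a rough idea of what pulling one parameter out of a list of dictionaries involves, here is a minimal sketch on made-up samples. The keys and the three-component vectors are assumptions for illustration; the real odometry dictionaries may be laid out differently, and this is not necessarily how `ses.load` does it.

```python
import numpy as np

# Made-up odometry samples; real dictionaries may use different keys and hold more fields
samples = [
    {"linear_velocity": [0.0, 0.0, 0.0], "position": [0.0, 0.0, 0.0]},
    {"linear_velocity": [3.0, 4.0, 0.0], "position": [0.1, 0.0, 0.0]},
    {"linear_velocity": [0.0, 0.0, 2.0], "position": [0.2, 0.1, 0.0]},
]

# Pull one field out of every sample and stack into an (n_samples, 3) array
linear_velocity = np.array([s["linear_velocity"] for s in samples])

# Speed is the length of the velocity vector at each time point
speed = np.linalg.norm(linear_velocity, axis=1)

print(linear_velocity.shape)
print(speed)
```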

Pogen: not quite sure what to do with this file, let's consult in the morning.

In [None]:
import pandas

In [None]:
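# Assumption: the file is whitespace-delimited with no header row; check the raw text with `head` to confirm
hrm = pandas.read_csv('/data/odometry/total_acc_x_train.txt', sep=r'\s+', header=None)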

In [None]:
hrm.shape

In [None]:
!head /data/odometry/total_acc_x_train.txt