# Introduction to Python for Image Analysis

Darvin Yi (darvinyi[at]Stanford.EDU)

0. Installing Python
1. Python data structures
2. Reading in images
3. Plotting
4. Introduction to DICOM
5. Loops
6. Functions

## Installing Python

For this course, just use Anaconda.  It's very easy.

[Anaconda](https://www.continuum.io/downloads) is a great platform built on top of Python.

## Python data structures

The main ones to know:
1. list
2. tuple <- like an immutable list, use as function 
3. dictionary
4. sets

In [None]:
test_list = [] #creating empty list
print('Empty list: ' + str(test_list))
test_list.append('a') #adding a to the list
print('List with a: ' + str(test_list))
print('Popping a: ' + str(test_list.pop()) + ', and now the list is ' + str(test_list))

In [None]:
test_list = [[1,2,3], [4,5,6], [7,8,9]] #list of lists
print('List of lists: ' + str(test_list))
print('First list in list of lists: ' + str(test_list[0]))
print('First element of first list in list of lists: ' + str(test_list[0][0]))

Tuples are like immutable lists.  You can still use them to store data, but once a tuple is created you can't change its length or replace its elements.
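
As a small illustration of the point above, here is a sketch of creating, unpacking, and (unsuccessfully) modifying a tuple. The names `point` and `image_size` are made up for this example and are not part of the course material.

```python
# A tuple is created with parentheses (or just commas).
point = (3, 4)
print('Tuple: ' + str(point))
print('First element: ' + str(point[0]))

# Tuples can be unpacked into separate names.
row, col = point

# Trying to replace an element raises a TypeError.
try:
    point[0] = 10
    modified = True
except TypeError:
    modified = False
print('Could the tuple be modified? ' + str(modified))

# A common use: returning several values from a function.
def image_size(rows, cols):
    """Hypothetical helper: returns (number of pixels, aspect ratio)."""
    return rows * cols, float(cols) / rows

n_pixels, aspect = image_size(512, 256)
print('Pixels: ' + str(n_pixels) + ', aspect ratio: ' + str(aspect))
```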

Dictionaries are like hash tables.  They hold (key,value) pairs.  When you give a dictionary a key, you get back the value stored under it.

In [None]:
test_dict = {}
print('Empty Dictionary: ' + str(test_dict))
test_dict[3] = 'dog'
test_dict[1] = 'cat'
test_dict[0] = 'rat'
test_dict[2] = 'pigeon'
print('Discount Zoo: ' + str(test_dict))

In [None]:
# Let's order the zoo.
for key in sorted(test_dict):
    print (key, test_dict[key])

Sets allow you to do some really cool stuff too.  You shouldn't need them for this class, but do look them up on your own.
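
For the curious, here is a minimal sketch of what sets offer: automatic removal of duplicates plus intersection, union, and difference. The variable names and values are invented for illustration.

```python
# Two sets of (made-up) scan types.
scans_alice = {'ct', 'mri', 'xray'}
scans_bob = {'ct', 'ultrasound'}

both = scans_alice & scans_bob        # intersection: in both sets
either = scans_alice | scans_bob      # union: in at least one set
only_alice = scans_alice - scans_bob  # difference: in the first set only

# Building a set from a list drops the duplicates.
unique_labels = set([1, 2, 2, 3, 3, 3])

print('In both: ' + str(sorted(both)))
print('In either: ' + str(sorted(either)))
print('Only in the first: ' + str(sorted(only_alice)))
print('Unique labels: ' + str(sorted(unique_labels)))
```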

## Reading in Images

For the very basics, you need `numpy`, `scipy`, and `scikit-image`.

You can install them with:

```
sudo pip install numpy
sudo pip install scipy
sudo pip install scikit-image
```

In [None]:
import numpy as np
from skimage import io  # scipy.misc.imread was removed in SciPy 1.2

path_img = '20170407_figures/baboon.png'
img      = io.imread(path_img)  # returns the image as a numpy array
print(img)

## Plotting

We'll mainly be using `matplotlib.pyplot`.

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline

plt.imshow(img)
plt.axis('off')

You can also change how big the plot is or how many things you want to plot.

In [None]:
plt.rcParams['figure.figsize'] = (12, 12)
plt.imshow(img)
plt.axis('off')

In [None]:
fig,ax = plt.subplots(5,5)
for i in range(5):
    for j in range(5):
        ax[i,j].imshow(img)
        ax[i,j].axis('off')

Your data is stored as a numpy array.  If you've used `R` or `MATLAB`, accessing the array should feel very familiar.  If not, don't worry.  It's pretty intuitive.

A color image is stored as an R x C x 3 array (rows x columns x channels).  The third dimension has size 3 because of the three color channels (red, green, blue).

You can access values in two ways:
1. giving matrix coordinates
2. giving a binary (boolean) matrix with the same number of rows and columns
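
Both access styles can be tried on a tiny hand-made array before moving to a real image. This is only a sketch: the array `tiny` and its pixel values are invented so that the shapes are easy to check by eye.

```python
import numpy as np

# A tiny "color image": 2 rows, 3 columns, 3 channels (R, G, B).
tiny = np.zeros((2, 3, 3), dtype=np.uint8)
tiny[0, 0] = (255, 0, 0)   # top-left pixel is pure red
tiny[1, 2] = (0, 0, 255)   # bottom-right pixel is pure blue
print('Shape: ' + str(tiny.shape))

# 1. Coordinates: both rows, the first two columns, all channels.
patch = tiny[0:2, 0:2, :]
print('Patch shape: ' + str(patch.shape))

# 2. Binary matrix: True wherever the red channel is above 128.
mask = tiny[:, :, 0] > 128
red_pixels = tiny[mask]    # one row of (R, G, B) per True entry
print('Selected pixels: ' + str(red_pixels.tolist()))
```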

In [None]:
# Coordinates
img_crop = img[10:100, 5:500, :]
plt.imshow(img_crop)

In [None]:
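# Binary (boolean) matrix
# Where is my image most red?
img = img.astype(np.float32)  # convert to float so that 2*img cannot overflow an integer type
# Red must be more than twice green, more than twice blue, and above 128.
img_red = (img[:,:,0] > 2*img[:,:,1]) & (img[:,:,0] > 2*img[:,:,2]) & (img[:,:,0] > 128)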
plt.cla()
plt.imshow(img_red)
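
The cast to `np.float32` in the cell above matters if the image was loaded as 8-bit integers, which is an assumption here (it is the usual case for PNG files). A minimal sketch of what goes wrong without the cast, using a single made-up pixel value:

```python
import numpy as np

# One made-up 8-bit pixel value.
pixel = np.array([200], dtype=np.uint8)

# Doubling in uint8 wraps around modulo 256 instead of giving 400.
doubled_uint8 = 2 * pixel

# Converting to float first gives the expected result.
doubled_float = 2 * pixel.astype(np.float32)

print('uint8 result: ' + str(doubled_uint8[0]))
print('float32 result: ' + str(doubled_float[0]))
```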

In [None]:
pixels_red = img[img_red]
_ = plt.hist(pixels_red,color=('red','green','blue'))

## Introduction to DICOM

DICOM properties:
- .dcm files
- has many fields (like an object)
- holds a lot of patient data (e.g. name, DoB, doctor name, etc...)
- holds a lot of scan information (e.g. model of scanner, angle of scan, etc...)
- holds pixel data

We can read in DICOM in Python with pydicom.  Install using

```
sudo pip install pydicom
```

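Import using

```
import dicom
```

This import name applies to pydicom releases before 1.0.  From pydicom 1.0 onward the package is imported as `pydicom`, so `import pydicom as dicom` keeps the names used in this notebook, and `dcmread` is the newer name for `read_file`.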

Let's take a look at the Kaggle data:

![](20170407_figures/sample_data.png)

Inside each of these folders is a large number of `.dcm` files.

![](20170407_figures/sample_data_2.png)


In [None]:
try:
    import pydicom as dicom  # pydicom >= 1.0
except ImportError:
    import dicom             # pydicom < 1.0
import numpy as np
import matplotlib.pyplot as plt
from os import listdir
from os.path import join

%matplotlib inline

# Define Filepaths <- CHANGE THIS FOR YOUR COMPUTER
path_dcms = 'C:\\Users\\yidar\\Desktop\\kaggle_sample_data\\00cba091fa4ad62cc3200a657aeb957e'

# Print out all the .dcm's
list_dcms = listdir(path_dcms)
print(list_dcms)

In [None]:
# let's just read in the first one.
name_dcm = list_dcms[0]
path_dcm = join(path_dcms, name_dcm)
dcm = dicom.read_file(path_dcm)

# Let's now display the image.
img = dcm.pixel_array
plt.rcParams['figure.figsize'] = (12, 12)
plt.imshow(img)

In [None]:
# Let's also see what's stored in the full dicom file.
print(dcm)

## Using Loops

In [None]:
#Let's read in all the images and save the pixel data with InstanceNumber

dict_dcms = {}                          #initialize dictionary for dicoms
for name_dcm in list_dcms:              #iterate over all dicoms in folder
    path_dcm = join(path_dcms,name_dcm) #define path of dicom image
    dcm = dicom.read_file(path_dcm)     #read in dicom image
    key = dcm.InstanceNumber            #save instance number as key
    val = dcm.pixel_array               #save image as value
    dict_dcms[key] = val                #input (key,val) into dictionary

In [None]:
plt.imshow(dict_dcms[3])

## Using Functions

A better way to organize our data would be to have a single 3D array.

In [None]:
# Let's create a function to do this.

def dict2array(dict_dcms, path_save=None):
    """
    Converts a dictionary with (InstanceNumber,img) key-value pairs
    into a 3D numpy.ndarray that has lower instance numbers
    at the top and higher instance numbers at the bottom.
    INPUTS:
    - dict_dcms: (dictionary) holds (InstanceNumber,img) for chestCT
    - path_save: (string) directory to save the images.
    OUTPUTS:
    - volCT: (3d.array) width x height x z-axis for chestCT
    """
    # Initialize the array.
    # Get sorted indices.
    # Loop over sorted indices to put slice into 3d.array.
    # Return array.
    
    
path_save = 'C:\\Users\\yidar\\Desktop\\temp_save'
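
One possible way to fill in the four commented steps is sketched below. It is not necessarily the intended solution: it assumes every slice has the same shape, it leaves out `path_save` because the notebook does not say how the result should be saved, and it uses a toy dictionary (`toy_dcms`, invented here) in place of real DICOM data so that it runs on its own.

```python
import numpy as np

def dict2array(dict_dcms):
    """Stack (InstanceNumber, img) pairs into one 3D array, ordered by key."""
    # Get sorted indices.
    keys = sorted(dict_dcms)
    # Initialize the array from the shape and dtype of the first slice.
    first = dict_dcms[keys[0]]
    volCT = np.zeros(first.shape + (len(keys),), dtype=first.dtype)
    # Loop over sorted indices to put each slice into the 3D array.
    for z, key in enumerate(keys):
        volCT[:, :, z] = dict_dcms[key]
    # Return array.
    return volCT

# Toy stand-in for dict_dcms: three 4 x 5 "slices", inserted out of order.
toy_dcms = {
    3: np.full((4, 5), 30, dtype=np.int16),
    1: np.full((4, 5), 10, dtype=np.int16),
    2: np.full((4, 5), 20, dtype=np.int16),
}
vol = dict2array(toy_dcms)
print('Volume shape: ' + str(vol.shape))
print('Values along the z-axis: ' + str(vol[0, 0, :].tolist()))
```

With the real data, the same call would be `dict2array(dict_dcms)`, and an individual slice is then `vol[:, :, z]`.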