# Purpose:

2015-02-12 (Thursday)

Use pandas to build summary tables of the _G. pallidipes_ sample information that are closer to what Gisella asked for:

- data types:
    - location
    - symbols when present (_I assume this means location symbol?_)
    - number of individuals
    - date range
    - is tissue?
    - is extraction?
    - analysis status

__Notes:__

- group by location
- filter on Kenya locations (not sure which locations are in Kenya when the country is not explicit)
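
A minimal sketch of the two notes above, run on a few made-up rows rather than the real sheet. The column names `Village` and `Date` are taken from the table loaded below; the output names `n_individuals`, `first_date` and `last_date` are only illustrative. It assumes a location counts as Kenyan only when the string says so, which is exactly the open question noted above.

```python
import pandas as pd

# Hypothetical rows standing in for the real sample catalogue.
samples = pd.DataFrame({
    "Village": ["Busia, Kenya", "Busia, Kenya", "Arba Minch, Ethiopia", "Campsite"],
    "Date": ["2003-11-02", "2003-12-12", "1997-01", "UNK"],
})

# Keep only rows whose location string names Kenya explicitly.
# A location such as "Campsite" carries no country, so this rule drops it.
is_kenya = samples["Village"].str.contains("Kenya", case=False, na=False)
kenya = samples[is_kenya]

# One row per location: number of individuals and first/last recorded date.
# min/max compare the date strings lexically, which is only safe for ISO-style dates.
summary = kenya.groupby("Village").agg(
    n_individuals=("Date", "size"),
    first_date=("Date", "min"),
    last_date=("Date", "max"),
)
print(summary)
```

Locations without an explicit country would need a hand-made lookup before a filter like this could be trusted.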




# Implementation:

## Imports:

In [1]:
# imports
import pandas as pd
import numpy as np

import qgrid as qg

In [2]:
qg.nbinstall()

In [3]:
qshow = qg.show_grid

## File paths:

In [4]:
# define paths to files

xl_path = '/home/gus/Documents/YalePostDoc/project_stuff/G_pallidipes_kenya/Collection_data_updated_Feb_9_2015.xlsx'

## Load data

In [5]:
xl = pd.ExcelFile(xl_path)

In [6]:
xl.sheet_names

[u'cold room',
 u'idaho boxes',
 u'Idaho strip tube boxes',
 u'gisella freezer',
 u'-80 freezer',
 u'dissection_sheet_template']

In [7]:
sample_cat = xl.parse(sheetname=u'dissection_sheet_template', 
         header=0, 
         skiprows=None, skip_footer=0, 
         index_col=None, parse_cols=None, 
         parse_dates=False, date_parser=None, 
         na_values=['NA'], 
         thousands=None, chunksize=None, 
         convert_float=False, 
         has_index_names=False, converters=None)

In [8]:
qshow(sample_cat,remote_js=True)

## Manipulate table to facilitate pivot operations

In [9]:
# add a number column to represent that each row is ONE sample
sample_cat['Count'] = 1

### Convert Gpd to Gp

In [10]:
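# Build the mask only once: re-running this cell after the relabelling below
# would otherwise recompute it as all False.
try:
    gpd_mask.head()
except NameError:
    gpd_mask = sample_cat.Species == 'Gpd'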


In [11]:
sample_cat.loc[gpd_mask,'Species'] = 'Gp'

In [12]:
sample_cat[gpd_mask].head(1)

Unnamed: 0,Lab_Source,Village,Trap_No,Date,Species,Sex,Teneral,Dead,Fly_Number,Hunger_stage,...,midgut,sal_gland,Kept_in,Comment,Tube_or_box,Tissue,Derivative,Method_of_prep,Box_number_id,Count
2135,EPH Idaho Samples (Tube Boxes),"Galana (?) D1, Kenya",0,Unk.,Gp,M,,,18,,...,,,Unk.,"CTAB, color coded aliquots, poorly labelled an...",box,,gDNA,,50,1
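
The output above shows `Gp` even though the rows were selected with `gpd_mask`, because the mask was computed before the relabelling. A small sketch on made-up data (the species code `Gmm` is only a placeholder for "some other species"):

```python
import pandas as pd

# Hypothetical species column.
df = pd.DataFrame({"Species": ["Gpd", "Gp", "Gmm", "Gpd"]})

# Compute the mask once, before relabelling.
gpd_mask = df["Species"] == "Gpd"
df.loc[gpd_mask, "Species"] = "Gp"

# The saved mask still marks the rows that were changed ...
print(df[gpd_mask])

# ... whereas recomputing the comparison now finds nothing.
remaining = int((df["Species"] == "Gpd").sum())
print(remaining)
```

This is why the mask is worth keeping around: it is the only record of which rows were originally labelled `Gpd`.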


### Convert Date: Unk, Unk., etc. to a single value (UNK)

In [13]:
def date_unk_func(x):
    try:
        if x.upper().startswith('UNK'):
            return 'UNK'
        else:
            return x
    except AttributeError:
        return x
    
def standardize_date_unk(df):
    new = df.Date.apply(date_unk_func)
    df.Date = new

In [14]:
standardize_date_unk(sample_cat)
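
A quick check of what the helper does to different kinds of values, using made-up inputs. The function is restated in condensed form so the snippet runs on its own. Note that any string starting with "unk" in any case is collapsed, so a value such as "unknown" would become `UNK` as well.

```python
import datetime


def date_unk_func(x):
    """Return 'UNK' for strings starting with 'unk' (any case); pass other values through."""
    try:
        return "UNK" if x.upper().startswith("UNK") else x
    except AttributeError:
        # Non-string values (datetime objects, None, NaN) have no .upper().
        return x


# Hypothetical mix of the value types a Date column read from Excel might hold.
values = ["Unk", "Unk.", "unknown", "2003-11-02", datetime.datetime(1997, 1, 11), None]
cleaned = [date_unk_func(v) for v in values]
print(cleaned)
```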

### Add tissue_or_extraction column

In [15]:
sample_cat['tissue_or_extraction'] = 'none'

In [16]:
# generate masks of which rows match tissue or derivative
try:
    is_tissue_mask.head()
except NameError:
    is_tissue_mask =  sample_cat.Tissue.apply(lambda x: isinstance(x, (unicode, str)))
    
try:
    is_derivative_mask.head()
except NameError:
    is_derivative_mask =  sample_cat.Derivative.apply(lambda x: isinstance(x, (unicode, str)))

In [17]:
sample_cat.loc[is_tissue_mask, 'tissue_or_extraction'] = 'tissue'
sample_cat.loc[is_derivative_mask, 'tissue_or_extraction'] = 'extraction'
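
One consequence of the assignment order above: a row with both a `Tissue` and a `Derivative` entry ends up labelled `extraction`, because the second assignment overwrites the first. A sketch on made-up rows (the tissue names are placeholders, and `str` stands in for the Python 2 `(unicode, str)` check):

```python
import numpy as np
import pandas as pd

# Hypothetical rows: tissue only, derivative only, both, neither.
df = pd.DataFrame({
    "Tissue": ["leg", np.nan, "midgut", np.nan],
    "Derivative": [np.nan, "gDNA", "gDNA", np.nan],
})

df["tissue_or_extraction"] = "none"

# A cell counts as "filled" when it holds a string; empty cells are NaN floats.
is_tissue = df["Tissue"].apply(lambda x: isinstance(x, str))
is_derivative = df["Derivative"].apply(lambda x: isinstance(x, str))

# Same order as above: tissue first, then extraction on top of it.
df.loc[is_tissue, "tissue_or_extraction"] = "tissue"
df.loc[is_derivative, "tissue_or_extraction"] = "extraction"

labels = df["tissue_or_extraction"].tolist()
print(labels)
```

If the overlap matters for the summary, swapping the two assignments or adding a third label for "both" would be the place to change it.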

## Begin Pivots

In [18]:
ptab = pd.pivot_table(sample_cat,index=['Village', 'Lab_Source','Date', 'Species'],
                      values=['Count'],
                      columns=['tissue_or_extraction'],
                      fill_value=0,
                      aggfunc=[np.sum])

In [19]:
ptab.query('Species == ["Gp"]')

Village,Lab_Source,Date,Species,sum/Count/extraction,sum/Count/tissue
"""Africa"" P.M.S.",EPH Idaho Samples (Tube Boxes),2002-11/12,Gp,46,54
"Arba Minch, Ethiopia",EPH -80,1997-01-11 00:00:00,Gp,18,32
"Arba Minch, Ethiopia",EPH Idaho Samples (Strip Tube Boxes),1997-01,Gp,24,0
"Arba Minch, Ethiopia",EPH Idaho Samples (Strip Tube Boxes),UNK,Gp,48,0
"Arba Minch, Ethiopia",EPH Idaho Samples (Tube Boxes),UNK,Gp,32,0
"Arba Minch, Ethiopia, P3",EPH Idaho Samples (Tube Boxes),UNK,Gp,24,0
"Arba Minch, Ethiopia, P6",EPH Idaho Samples (Tube Boxes),UNK,Gp,24,0
"Block A (Ruma?), Kenya",EPH Idaho Samples (Strip Tube Boxes),2003-11-02,Gp,400,0
"Busia, Kenya",EPH Idaho Samples (Strip Tube Boxes),UNK,Gp,48,0
Campsite,EPH Idaho Samples (Strip Tube Boxes),2003-12-12,Gp,34,0
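
The same pivot-then-query pattern on a tiny made-up table, to make the mechanics easier to see. This is a sketch rather than the call used above: it passes `values` and `aggfunc` as plain strings instead of lists, which (as far as I can tell) keeps the resulting columns to a single level, and `Gmm` is a placeholder species.

```python
import pandas as pd

# Hypothetical catalogue: one row per sample.
df = pd.DataFrame({
    "Village": ["Busia, Kenya", "Busia, Kenya", "Busia, Kenya", "Campsite"],
    "Species": ["Gp", "Gp", "Gp", "Gmm"],
    "tissue_or_extraction": ["extraction", "extraction", "tissue", "extraction"],
})
df["Count"] = 1  # each row is ONE sample

# Rows: location and species. Columns: sample type. Cells: number of samples.
ptab = pd.pivot_table(
    df,
    index=["Village", "Species"],
    values="Count",
    columns="tissue_or_extraction",
    fill_value=0,
    aggfunc="sum",
)

# query() can refer to index levels by name.
gp_only = ptab.query('Species == "Gp"')
print(gp_only)
```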
