# Sample definitions
This notebook starts from the BASS DR 1 catalog as the parent sample and whittles it down to the sample used in our analysis. The cleaned sample is saved as a CSV file in the ../data directory.

In [1]:
# Standard module imports
import numpy as np
import pandas as pd

# Useful Directory paths
bass_dir = '/Users/ttshimiz/Dropbox/Research/BASS/'

We start with the entire BASS DR 1 catalog. The measurements we need for our analysis are the intrinsic X-ray luminosity, the broad H$\alpha$ luminosity, and the X-ray absorbing column density. We also need more general information such as the Seyfert type, distance, and redshift.

In [5]:
# Load the catalog and measurements
bass_general = pd.read_csv(bass_dir+'bass_general_dr1.csv', index_col=0, skiprows=[1])   # File with general info for every source
bass_xray = pd.read_csv(bass_dir+'ricci_xray_fits.csv', index_col=0)                     # File with the X-ray spectral fits from Ricci+16
bass_sy_class = pd.read_csv(bass_dir+'bass_seyfert_class.csv', index_col=0)              # File with the Winkler 1992 Seyfert class types
bass_halpha = pd.read_csv(bass_dir+'bass_halpha_dr1.csv', index_col=1)                   # File with the H-alpha measurements

For now, our sample is defined as the sources classified as Sy 1, 1.2, 1.5, 1.8, 1.9, or 2 that have both an intrinsic X-ray luminosity and a broad H$\alpha$ measurement. They must also have a distance measurement.

In [6]:
# Get all sources with a valid 'Best distance'
bass_dist = bass_general['Best distance']
bass_dist = bass_dist[bass_dist > 0]

print('Number of sources with distance measurement = {}'.format(len(bass_dist)))

Number of sources with distance measurement = 789


Grab the Seyfert types from bass_sy_class. These types were determined using the Winkler 1992 classification scheme. Throw out all blazars and any other types.

In [11]:
bass_type = bass_sy_class['new Winkler mk']
keep = ((bass_type == '1') | (bass_type == '1.2') | (bass_type == '1.5') |
        (bass_type == '1.8') | (bass_type == '1.9') | (bass_type == '2'))
bass_type = bass_type[keep]

print('Number of sources with Sy Type 1 - 2 = {}'.format(len(bass_type)))

Number of sources with Sy Type 1 - 2 = 594
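
The chained comparisons above can also be written with `Series.isin`, which is easier to maintain if the list of accepted types changes. A minimal sketch on a toy Series (the source names and the `'BZQ'` label are made up for illustration and are not taken from the BASS catalog):

```python
import pandas as pd

# Hypothetical classifications, for illustration only
toy_type = pd.Series(['1', '1.9', 'BZQ', '2', None],
                     index=['src_a', 'src_b', 'src_c', 'src_d', 'src_e'])

# Seyfert types kept in the sample
seyfert_types = ['1', '1.2', '1.5', '1.8', '1.9', '2']

# isin returns a boolean mask; missing values evaluate to False
keep = toy_type.isin(seyfert_types)
toy_selected = toy_type[keep]
```

Here `toy_selected` retains only `src_a`, `src_b`, and `src_d`.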


Let's get all of the sources with a valid (positive) intrinsic X-ray flux measurement. This is the intrinsic 14-150 keV flux in units of $10^{-12}$ erg/s/cm$^{2}$.

In [13]:
bass_fx = bass_xray[' Intrinsic F14-150']
bass_fx = bass_fx[bass_fx > 0]

print('Number of sources with intrinsic X-ray measurement = {}'.format(len(bass_fx)))

Number of sources with intrinsic X-ray measurement = 832


Let's get the absorbing column density values ($\log N_{\rm H}$, the `log NH` column) from Claudio's spectral fitting.

In [25]:
bass_nh = bass_xray['log NH']

Finally, let's get the broad H$\alpha$ flux, which is in units of $10^{-15}$ erg/s/cm$^{2}$.

In [14]:
bass_broad_halpha = bass_halpha['F(Ha broad) [e-15 erg/s/cm2]']
bass_broad_halpha = bass_broad_halpha[bass_broad_halpha > 0]

print('Number of sources with broad H-alpha measurement = {}'.format(len(bass_broad_halpha)))

Number of sources with broad H-alpha measurement = 324


Based on the number of sources with each measurement available, the broad H$\alpha$ measurement will be the limiting factor. This is expected, since about half of the sources should be Sy 2s, which do not have broad lines.

Now let's combine all of the measurements into one dataframe and calculate luminosities. First, though, we need to remove duplicates in the X-ray data, because for some reason the same source can appear with two measurements.

#### ***The duplicate sources are sources 249 and 923. I need to ask Mike and Claudio about these.***
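
As an aside, repeated index labels can be located and dropped directly on the index with `Index.duplicated`, without a `reset_index`/`set_index` round trip. A sketch on made-up numbers (the flux values are placeholders; only the repeated labels 249 and 923 mirror the note above):

```python
import pandas as pd

# Toy Series with repeated index labels; the values are placeholders
toy_fx = pd.Series([10.0, 11.0, 5.0, 7.0, 7.5],
                   index=[249, 249, 300, 923, 923])

# List the labels that occur more than once
dup_labels = toy_fx.index[toy_fx.index.duplicated(keep=False)].unique()

# Keep only the last entry for each label (same choice as keep='last')
toy_fx_unique = toy_fx[~toy_fx.index.duplicated(keep='last')]
```

Printing `dup_labels` before dropping anything is a cheap way to record exactly which sources were affected.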

In [28]:
bass_fx = bass_fx.reset_index().drop_duplicates(subset='index', keep='last').set_index('index')
bass_fx = bass_fx[' Intrinsic F14-150']
bass_nh = bass_nh.reset_index().drop_duplicates(subset='index', keep='last').set_index('index')
bass_nh = bass_nh['log NH']

df = pd.DataFrame({'Intrinsic 14-150 Flux':bass_fx, 'Broad Halpha Flux':bass_broad_halpha,
                   'Distance':bass_dist, 'NH':bass_nh, 'Type':bass_type})

# Remove all sources with missing information
df = df.dropna()

print('Number of sources in final sample = {}'.format(len(df)))

Number of sources in final sample = 311
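
The selection above works because building a DataFrame from a dict of Series aligns them on the union of their indices and fills the gaps with NaN, so `dropna()` leaves only the sources present in every input. A small illustration with invented values and source IDs:

```python
import pandas as pd

# Three toy measurements with partially overlapping source IDs
flux = pd.Series({1: 20.0, 2: 35.0, 3: 12.0})
halpha = pd.Series({2: 400.0, 3: 150.0, 4: 90.0})
dist = pd.Series({1: 50.0, 2: 80.0, 4: 120.0})

# Rows are the union of all indices; missing entries become NaN
toy = pd.DataFrame({'Flux': flux, 'Halpha': halpha, 'Distance': dist})
n_union = len(toy)

# Only rows with every measurement survive
toy_clean = toy.dropna()
```

In this toy case the union has four rows and only source 2 survives, since it is the only one with all three measurements.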


In [29]:
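# Convert to X-ray and Halpha luminosities in erg/s using L = 4*pi*d^2*F
# The factor 10**6*3.086e18 converts the distance from Mpc to cm
dist_cm = df['Distance']*10**6*3.086e18
df['Intrinsic X-ray Luminosity'] = 4*np.pi*dist_cm**2*df['Intrinsic 14-150 Flux']*10**(-12)   # flux in 1e-12 erg/s/cm2
df['Broad Halpha Luminosity'] = 4*np.pi*dist_cm**2*df['Broad Halpha Flux']*10**(-15)          # flux in 1e-15 erg/s/cm2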
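
The conversion above is the usual inverse-square relation $L = 4\pi d^{2} F$, with the factor `10**6*3.086e18` implying that the catalog distances are in Mpc. A sketch of the same calculation wrapped in a helper; `flux_to_luminosity` and `MPC_TO_CM` are hypothetical names introduced here, not part of the notebook, and the example values are made up:

```python
import numpy as np

MPC_TO_CM = 1e6 * 3.086e18   # cm per Mpc, same constant as in the cell above

def flux_to_luminosity(flux, distance_mpc, flux_unit):
    """Return L = 4*pi*d^2*F in erg/s.

    flux is given in multiples of flux_unit (erg/s/cm^2) and
    distance_mpc is in Mpc. Works on floats, arrays, or pandas Series.
    """
    d_cm = distance_mpc * MPC_TO_CM
    return 4 * np.pi * d_cm**2 * flux * flux_unit

# Example: a flux of 10 x 1e-12 erg/s/cm^2 at 100 Mpc (made-up values)
lum_example = flux_to_luminosity(10.0, 100.0, 1e-12)
```

Keeping the flux unit as an explicit argument makes it harder to mix up the $10^{-12}$ scaling of the X-ray flux with the $10^{-15}$ scaling of the H$\alpha$ flux.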

In [30]:
# Save to a CSV file
df.to_csv('../data/cleaned_sample.csv')
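
One thing to watch when reading this file back: the Seyfert types are stored as strings such as '1.5' and '2', and `pd.read_csv` will by default parse such a column as floats. A sketch of a round trip through an in-memory buffer (toy rows, not the real sample) that keeps the types as strings via the `dtype` argument:

```python
import io
import pandas as pd

# Toy table standing in for the cleaned sample
toy = pd.DataFrame({'Distance': [50.0, 80.0], 'Type': ['1.5', '2']},
                   index=[10, 20])

# Write to an in-memory buffer instead of a file on disk
buf = io.StringIO()
toy.to_csv(buf)

# Default parsing turns the Type column into floats
buf.seek(0)
as_default = pd.read_csv(buf, index_col=0)

# Forcing the dtype preserves the original string labels
buf.seek(0)
as_string = pd.read_csv(buf, index_col=0, dtype={'Type': str})
```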