# Sample definitions
This notebook takes the BASS DR 1 catalog as the parent sample and whittles it down to the sample we want to use for our analysis. The cleaned sample is saved as a CSV file in the ../data directory.

In [1]:
# Standard module imports
import numpy as np
import pandas as pd

# Useful Directory paths
bass_dir = '/Users/ttshimiz/Dropbox/Research/BASS/'

We start with the entire BASS DR 1 catalog. The measurements we want for our analysis are the intrinsic X-ray luminosity, the broad H$\alpha$ luminosity, and the X-ray absorbing column. We will also need more general information such as the Seyfert type, distance, and redshift.

In [2]:
# Load the catalog and measurements
bass_general = pd.read_csv(bass_dir+'bass_general_dr1.csv', index_col=0, skiprows=[1])  # General info for every source
bass_xray = pd.read_csv(bass_dir+'ricci_xray_fits.csv', index_col=0)                    # X-ray spectral fits from Ricci+16
bass_sy_class = pd.read_csv(bass_dir+'bass_seyfert_class_v2.csv', index_col=0)          # Winkler 1992 Seyfert class types
bass_halpha = pd.read_csv(bass_dir+'bass_halpha_dr1.csv', index_col=1)                  # H-alpha measurements

For now our sample will be defined as those sources that are Sy 1, 1.2, 1.5, 1.8, 1.9, or 2 that have intrinsic X-ray luminosity and broad H-alpha measurements. They further need to have a distance measurement.

In [3]:
# Get all sources with a valid 'Best distance'
bass_dist = bass_general['Best distance']
bass_dist = bass_dist[bass_dist > 0]

print 'Number of sources with distance measurement =', len(bass_dist)

Number of sources with distance measurement = 789


Grab the Seyfert types from bass_sy_class. These types were determined using the Winkler 1992 classification scheme. Throw out all blazars and any other non-Seyfert types.

In [5]:
bass_type = bass_sy_class['Winkler']
keep = ((bass_type == '1') | (bass_type == '1.2') | (bass_type == '1.5') |
        (bass_type == '1.8') | (bass_type == '1.9') | (bass_type == '2'))
bass_type = bass_type[keep]

print 'Number of sources with Sy Type 1 - 2 =', len(bass_type)

Number of sources with Sy Type 1 - 2 = 594
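
The chained `==` comparisons above can also be written more compactly with `Series.isin`. The snippet below is only a sketch on a toy Series; the index values and the non-Seyfert labels (`'BZQ'`, `'LINER'`) are invented for illustration and are not taken from the catalog.

```python
import pandas as pd

# Toy classification column; index values and labels are made up for illustration.
toy_type = pd.Series(['1', '1.9', 'BZQ', '2', '1.5', 'LINER'],
                     index=[10, 11, 12, 13, 14, 15], name='Winkler')

# The Seyfert sub-types kept in this notebook, stored as strings like the catalog column.
seyfert_types = ['1', '1.2', '1.5', '1.8', '1.9', '2']

# isin builds the same boolean mask as chaining == comparisons with |.
toy_keep = toy_type.isin(seyfert_types)
toy_seyferts = toy_type[toy_keep]
```

Keeping the allowed types in one list also makes it easy to reuse the same selection later.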


Let's get all of the sources with a valid intrinsic X-ray flux measurement. This is the intrinsic 14-150 keV flux in units of $10^{-12}$ erg/s/cm$^{2}$.

In [6]:
bass_fx = bass_xray[' Intrinsic F14-150']
bass_fx = bass_fx[bass_fx > 0]

print 'Number of sources with intrinsic X-ray measurement =', len(bass_fx)

Number of sources with intrinsic X-ray measurement = 832


Let's get the NH values (stored as log NH) from Claudio's X-ray spectral fitting.

In [7]:
bass_nh = bass_xray['log NH']

Finally, let's get the broad H$\alpha$ flux.

In [8]:
# Select the broad H-alpha flux and the column used as its error ('err.7')
bass_broad_halpha = bass_halpha[['F(Ha broad) [e-15 erg/s/cm2]', 'err.7']]
# Keep sources with positive flux and positive error; each comparison gets its own parentheses
bass_broad_halpha = bass_broad_halpha[(bass_broad_halpha['F(Ha broad) [e-15 erg/s/cm2]'] > 0) & (bass_broad_halpha['err.7'] > 0)].copy()
bass_broad_halpha['Broad Halpha S/N'] = bass_broad_halpha['F(Ha broad) [e-15 erg/s/cm2]']/bass_broad_halpha['err.7']
print 'Number of sources with broad H-alpha measurement =', len(bass_broad_halpha)

Number of sources with broad H-alpha measurement = 297
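
When two comparisons are combined with `&`, each one needs its own parentheses, because `&` binds more tightly than `>` in Python. The sketch below shows the pattern on a toy table, followed by a signal-to-noise cut. The column names `flux`, `err`, and `snr` and all values are made up for illustration.

```python
import pandas as pd

# Toy fluxes and errors (made-up values, arbitrary units).
toy = pd.DataFrame({'flux': [10.0, 5.0, -1.0, 8.0],
                    'err': [1.0, 0.0, 0.5, 4.0]},
                   index=[1, 2, 3, 4])

# Each comparison is parenthesized on its own before combining with &.
valid = (toy['flux'] > 0) & (toy['err'] > 0)

# .copy() gives an independent frame, so adding a column does not touch `toy`.
toy_valid = toy[valid].copy()
toy_valid['snr'] = toy_valid['flux'] / toy_valid['err']

# Rows below an illustrative S/N threshold of 5.
low_snr = toy_valid[toy_valid['snr'] < 5.0]
```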


In [9]:
bass_broad_halpha[bass_broad_halpha['Broad Halpha S/N'] < 5.0]

BAT Index,F(Ha broad) [e-15 erg/s/cm2],err.7,Broad Halpha S/N
397,157.194961,40.479372,3.883335
988,37.660594,18.783547,2.004978
1205,34.902484,12.00696,2.906854


Based on the number of sources with each measurement available, the broad H-alpha measurement looks like the limiting factor. This is expected, since about half of the sources should be Sy 2s, which don't have broad lines.

Now let's combine all of the measurements into one dataframe and calculate luminosities. Originally we first had to remove duplicates in the X-ray data, because for some reason there were two measurements of the same source. This is not necessary anymore.

#### ***The duplicate sources are sources 249 and 923. I need to ask Mike and Claudio about these.***  (Resolved: got rid of the 249 row with log NH = 23.11 and the 923 row with log NH = 22.05. Claudio said these are old fits that used a different baseline model. See Slack chat.)
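
For reference, repeated index entries like these can also be found and dropped directly with `Index.duplicated`. This is only a sketch on invented values. Keeping the last occurrence is an arbitrary choice here, and which row is the right one to keep depends on the data.

```python
import pandas as pd

# Toy log NH values where index 249 appears twice (values invented for the example).
toy_nh = pd.Series([23.0, 22.5, 21.0], index=[249, 249, 300], name='log NH')

# Index.duplicated(keep='last') flags every repeat except the final occurrence.
is_repeat = toy_nh.index.duplicated(keep='last')
toy_nh_unique = toy_nh[~is_repeat]
```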

In [17]:
#bass_fx = bass_fx.reset_index().drop_duplicates(subset='index', keep='last').set_index('index')
#bass_fx = bass_fx[' Intrinsic F14-150']
#bass_nh = bass_nh.reset_index().drop_duplicates(subset='index', keep='last').set_index('index')
#bass_nh = bass_nh['log NH']

df = pd.DataFrame({'Intrinsic 14-150 Flux':bass_fx, 'Broad Halpha Flux':bass_broad_halpha['F(Ha broad) [e-15 erg/s/cm2]'],
                   'Distance':bass_dist, 'NH':bass_nh, 'Type':bass_type})

# Remove all sources with missing information
df = df.dropna()

print 'Number of sources in final sample =', len(df)

Number of sources in final sample = 288
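
The combination step relies on pandas index alignment: a DataFrame built from a dict of Series takes the union of their indices, and `dropna()` then reduces it to the sources present in every input. A minimal sketch with toy Series (all names and values here are invented):

```python
import pandas as pd

# Three toy measurements with partially overlapping indices (made-up values).
toy_flux = pd.Series([1.0, 2.0, 3.0], index=[1, 2, 3])
toy_dist = pd.Series([50.0, 60.0], index=[2, 3])
toy_kind = pd.Series(['1', '2', '1.9'], index=[1, 3, 4])

# A dict of Series is aligned on the union of the indices; gaps become NaN.
toy_df = pd.DataFrame({'Flux': toy_flux, 'Distance': toy_dist, 'Type': toy_kind})

# dropna() keeps only rows present in every input, i.e. the index intersection.
toy_clean = toy_df.dropna()
```

Only the source with index 3 appears in all three toy Series, so it is the only row that survives.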


In [18]:
# Convert to X-ray and Halpha luminosities: L = 4*pi*d^2*F, with d converted to cm via 10**6 * 3.086e18
df['Intrinsic X-ray Luminosity'] = 4*np.pi*(df['Distance']*10**6*3.086e18)**2*df['Intrinsic 14-150 Flux']*10**(-12)
df['Broad Halpha Luminosity'] = 4*np.pi*(df['Distance']*10**6*3.086e18)**2*df['Broad Halpha Flux']*10**(-15)
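
Both conversions apply $L = 4\pi d^2 F$. A small helper makes the unit handling explicit. This sketch assumes the distance is in Mpc, which is what the factor `10**6*3.086e18` (pc to cm, times $10^6$) implies; the function name `flux_to_luminosity` and the constant `MPC_TO_CM` are hypothetical and not part of the notebook.

```python
import numpy as np

MPC_TO_CM = 3.086e24  # 1 Mpc = 10**6 pc * 3.086e18 cm/pc

def flux_to_luminosity(flux_cgs, distance_mpc):
    """Return L = 4 pi d^2 F in erg/s for F in erg/s/cm^2 and d in Mpc (assumed unit)."""
    distance_cm = np.asarray(distance_mpc, dtype=float) * MPC_TO_CM
    return 4.0 * np.pi * distance_cm**2 * np.asarray(flux_cgs, dtype=float)

# A hypothetical source at 100 Mpc with a flux of 1e-11 erg/s/cm^2.
example_lum = flux_to_luminosity(1e-11, 100.0)
```

The catalog fluxes would first need to be scaled to erg/s/cm$^2$ (by $10^{-12}$ for the X-ray flux and $10^{-15}$ for the H$\alpha$ flux) before being passed to such a helper.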

In [19]:
# Save to a CSV file
df.to_csv('../data/cleaned_sample.csv')

I also want to create a sample with all Seyferts regardless of whether they have a broad H-alpha measurement so we can look at population fractions.

In [23]:
df_all = pd.DataFrame(index=bass_type.index, data={'Intrinsic 14-150 Flux':bass_fx,'Distance':bass_dist, 'NH':bass_nh, 'Type':bass_type,
                       'Broad Halpha Flux':bass_broad_halpha['F(Ha broad) [e-15 erg/s/cm2]']})
df_all = df_all[df_all['Distance'] > 0]
df_all['Intrinsic X-ray Luminosity'] = 4*np.pi*(df_all['Distance']*10**6*3.086e18)**2*df_all['Intrinsic 14-150 Flux']*10**(-12)
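
Passing `index=` to the DataFrame constructor along with a dict of Series keeps exactly the rows in that index and leaves NaN where a measurement is missing. This is what lets every Seyfert stay in the table whether or not it has a broad H-alpha flux. A toy sketch with invented values:

```python
import pandas as pd

# Toy type and flux Series with different coverage (made-up values).
toy_kind = pd.Series(['1', '2', '1.9'], index=[1, 3, 4])
toy_flux = pd.Series([1.0, 2.0, 3.0], index=[1, 2, 3])

# Rows are restricted to toy_kind's index; a missing flux is left as NaN.
toy_all = pd.DataFrame(index=toy_kind.index,
                       data={'Type': toy_kind, 'Flux': toy_flux})
```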

In [25]:
# Save to a CSV file
df_all.to_csv('../data/cleaned_sample_all_seyferts.csv')

In [31]:
# Look at distribution of Types within the whole sample
print '# Type 1 =', sum(df_all['Type'] == '1')
print '# Type 1.2 =', sum(df_all['Type'] == '1.2')
print '# Type 1.5 =', sum(df_all['Type'] == '1.5')
print '# Type 1.8 =', sum(df_all['Type'] == '1.8')
print '# Type 1.9 =', sum(df_all['Type'] == '1.9')
print '# Type 2 =', sum(df_all['Type'] == '2')

# Type 1 = 53
# Type 1.2 = 107
# Type 1.5 = 102
# Type 1.8 = 0
# Type 1.9 = 100
# Type 2 = 227


In [32]:
# Look at distribution of Types with a broad Halpha and X-ray luminosity measurement.
print '# Type 1 =', sum(df['Type'] == '1')
print '# Type 1.2 =', sum(df['Type'] == '1.2')
print '# Type 1.5 =', sum(df['Type'] == '1.5')
print '# Type 1.8 =', sum(df['Type'] == '1.8')
print '# Type 1.9 =', sum(df['Type'] == '1.9')
print '# Type 2 =', sum(df['Type'] == '2')

# Type 1 = 33
# Type 1.2 = 88
# Type 1.5 = 82
# Type 1.8 = 0
# Type 1.9 = 79
# Type 2 = 6
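
The per-type tallies can also be produced in one step with `value_counts`. The sketch below uses a toy Type column with invented labels; `reindex` fixes the display order and reports types that are absent as 0.

```python
import pandas as pd

# Toy Type column (labels invented for illustration).
toy_types = pd.Series(['1', '2', '1.9', '2', '1.5', '2'])

type_order = ['1', '1.2', '1.5', '1.8', '1.9', '2']

# value_counts tallies each label; reindex sets the order and fills absent types with 0.
toy_counts = toy_types.value_counts().reindex(type_order, fill_value=0)
```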


In [36]:
bass_general.loc[df_all[pd.isnull(df_all['Broad Halpha Flux']) & (df_all['Type'] == '1.5')].index]

BAT ID,BAT Name,CTPT Name,Other Name,CTPT RA [deg],CTPT Dec [deg],TYPE,REDSHIFT,BAT Measurements,Unnamed: 9,Unnamed: 10,...,L_BAT (from best redshift ),log L_bol,low log L_bol err,high log L_bol err,Hmag,K mag,Unnamed: 32,Unnamed: 33,Unnamed: 34,Unnamed: 35
36.0,SWIFT J0051.9+1724,Mrk 1148,,12.9783,17.4329,Sy1,0.064,15.73,29.7,25.89,...,44.42,45.32,0.06,0.06,,,,0.057,,1.5
90.0,SWIFT J0142.0+3922,B2 0138+39B,,25.4906,39.3914,Sy1,0.08,6.57,13.61,9.58,...,44.34,45.24,0.13,0.15,12.489,11.851,,0.0601,9.0,1.5
162.0,SWIFT J0300.0-1048,MCG -02-08-038,,45.018,-10.8246,Sy1,0.032589,8.23,17.63,13.64,...,43.65,44.55,0.1,0.11,11.61,11.276,,0.0688,,1.5
226.0,SWIFT J0433.0+0521,3C 120,,68.2962,5.3543,Sy1,0.03301,36.14,94.36,89.44,...,44.38,45.29,0.02,0.02,11.087,10.241,,0.297,24.0,1.5
318.0,SWIFT J0606.0-2755,2MASX J06054896-2754398,,91.454,-27.9112,Sy1.5,0.089228,7.31,11.76,8.6,...,44.38,45.29,0.12,0.13,13.087,12.185,,0.0303,,1.5
389.0,SWIFT J0747.5+6057,Mrk 10,,116.8714,60.9335,Sy1.2,0.029255,7.92,14.8,11.3,...,43.48,44.38,0.1,0.11,10.852,10.512,,0.0466,,1.5
394.0,SWIFT J0752.2+1937,2MASX J07521780+1935423,,118.0743,19.5951,"QSO, Sy1",0.117217,6.77,15.71,11.38,...,44.76,45.66,0.12,0.13,13.689,12.715,,0.0448,,1.5
418.0,SWIFT J0830.1+4154,2MASX J08294266+4154366,,127.4277,41.9102,Sy1,0.126327,6.11,13.03,9.11,...,44.75,45.65,0.13,0.15,14.764,13.439,,0.0395,,1.5
507.0,SWIFT J1038.8-4942,2MASX J10384520-4946531,,159.6883,-49.7816,Sy1,0.06,10.4,26.21,21.74,...,44.37,45.27,0.07,0.08,12.639,11.908,,0.4968,,1.5
572.0,SWIFT J1148.3+0901,2MASX J11475508+0902284,,176.9795,9.0413,Sy1.5,0.068831,6.85,11.85,8.3,...,44.15,45.05,0.13,0.15,13.356,12.622,,0.0268,,1.5
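
The lookup above combines a null check with a type selection and then uses the resulting index to pull rows from a second table. A minimal sketch of that pattern on toy frames (the `Name` column, the index values, and all entries are invented):

```python
import numpy as np
import pandas as pd

# Toy sample table and a toy general-information table sharing the same index.
toy_all = pd.DataFrame({'Type': ['1.5', '1.5', '2'],
                        'Broad Halpha Flux': [np.nan, 3.0, np.nan]},
                       index=[1, 2, 3])
toy_general = pd.DataFrame({'Name': ['A', 'B', 'C']}, index=[1, 2, 3])

# Sources of a given type that lack a broad H-alpha flux.
missing = toy_all['Broad Halpha Flux'].isnull() & (toy_all['Type'] == '1.5')

# Use their index labels to look up the matching rows in the other table.
toy_lookup = toy_general.loc[toy_all[missing].index]
```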
