# Sample definitions
This notebook takes the BASS DR 1 catalog as the parent sample and whittles it down to the sample we want to use for our analysis. The cleaned sample will be saved in the ../data directory as a CSV file. 

In [28]:
# Standard module imports
import numpy as np
import pandas as pd

# Useful Directory paths
bass_dir = '/Users/ttshimiz/Dropbox/Research/BASS/'

We start with the entire BASS DR 1 catalog. The measurements we want for our analysis are the intrinsic X-ray luminosity, the broad H$\alpha$ luminosity, and the X-ray absorbing column density (NH). We will also need more general information, such as the Seyfert type, distance, and redshift.

In [30]:
# Load the catalog and measurements
bass_general = pd.read_csv(bass_dir+'bass_general_dr1.csv', index_col=0, skiprows=[1])       # File with general info for every source
bass_xray = pd.read_csv(bass_dir+'ricci_xray_fits.csv', index_col=0)           # File with the X-ray spectral fits from Ricci+16
bass_sy_class = pd.read_csv(bass_dir+'bass_seyfert_class_v2.csv', index_col=0)     # File with the Winkler 1992 Seyfert class types
bass_halpha = pd.read_csv(bass_dir+'bass_halpha_dr1.csv', index_col=1)         # File with the H-alpha measurements
bass_mbh = pd.read_csv(bass_dir+'bass_mbh_best_dr1.csv', comment='#', index_col=0)   # File with the best MBH estimates and Eddington Ratios

In [3]:
bass_halpha.columns

Index([u'Unnamed: 0', u'BAT Name', u'CTPT Name', u'spec. source',
       u'FWHM narrow [km/s]', u'err', u'F(OI 6300) [e-15 erg/s/cm2]', u'err.1',
       u'F(NII red) [e-15 erg/s/cm2]', u'err.2',
       u'F(NII total) [e-15 erg/s/cm2]', u'err.3',
       u'F(Ha narrow) [e-15 erg/s/cm2]', u'err.4',
       u'F(SII blue) [e-15 erg/s/cm2]', u'err.5',
       u'F(SII red) [e-15 erg/s/cm2]', u'err.6',
       u'F(Ha broad) [e-15 erg/s/cm2]', u'err.7', u'FWHM(Ha broad) [km/s]',
       u'err.8', u'Unnamed: 23', u'quality flag',
       u'6DF flag: flux calibration problem', u'EQW(Ha broad) [A', u'high z',
       u'comments', u'Two broad components', u'table',
       u'Corrected F(OI 6300) [e-15 erg/s/cm2]',
       u'Corrected F(NII red) [e-15 erg/s/cm2]'],
      dtype='object')

For now, our sample is defined as the sources classified as Sy 1, 1.2, 1.5, 1.8, 1.9, or 2 that have both an intrinsic X-ray luminosity and a broad H-alpha measurement. They must also have a distance measurement and an NH value from the X-ray fits.

In [31]:
# Get all sources with a valid 'Best distance'
bass_dist = bass_general['Best distance']
bass_dist = bass_dist[bass_dist > 0]

print 'Number of sources with distance measurement =', len(bass_dist)

Number of sources with distance measurement = 789


Grab the Seyfert types from bass_sy_class. These types were determined using the Winkler (1992) classification scheme. Throw out all blazars and other non-Seyfert types, keeping only Sy 1 through Sy 2.

In [32]:
bass_type = bass_sy_class['Winkler']
# Keep only the Sy 1-2 classes; blazars and all other classes are dropped
keep = bass_type.isin(['1', '1.2', '1.5', '1.8', '1.9', '2'])
bass_type = bass_type[keep]

print 'Number of sources with Sy Type 1 - 2 =', len(bass_type)

Number of sources with Sy Type 1 - 2 = 594


Let's get all of the sources with a valid (positive) intrinsic X-ray flux measurement. This is the intrinsic 14-150 keV flux in units of $10^{-12}$ erg/s/cm$^{2}$.

In [33]:
bass_fx = bass_xray['Intrinsic F14-150']
bass_fx = bass_fx[bass_fx > 0]

print 'Number of sources with intrinsic X-ray measurement =', len(bass_fx)

Number of sources with intrinsic X-ray measurement = 830


Next, let's get the absorbing column densities (log NH) from Claudio's spectral fits (the Ricci+16 file loaded above).

In [34]:
bass_nh = bass_xray['log NH']

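Finally, let's get the broad H$\alpha$ flux, keeping only measurements with a positive flux and error, a quality flag of 1 or 2, and no 6dF flux calibration problem.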

In [35]:
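# Keep the narrow/broad H-alpha fluxes, their errors, and the quality flags
bass_broad_halpha = bass_halpha[['F(Ha narrow) [e-15 erg/s/cm2]', 'err.4', 'F(Ha broad) [e-15 erg/s/cm2]', 'err.7', 'quality flag', '6DF flag: flux calibration problem']]
# Require a positive broad flux and error, quality flag 1 or 2, and no 6dF flux calibration problem (flag = 6).
# .copy() makes an independent frame, avoiding pandas' SettingWithCopyWarning when the S/N column is added.
bass_broad_halpha = bass_broad_halpha[(bass_broad_halpha['F(Ha broad) [e-15 erg/s/cm2]'] > 0) & (bass_broad_halpha['err.7'] > 0) &
                                      ((bass_broad_halpha['quality flag'] == 1) | (bass_broad_halpha['quality flag'] == 2)) &
                                      (bass_broad_halpha['6DF flag: flux calibration problem'] != 6)].copy()
bass_broad_halpha['Broad Halpha S/N'] = bass_broad_halpha['F(Ha broad) [e-15 erg/s/cm2]']/bass_broad_halpha['err.7']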
print 'Number of sources with broad H-alpha measurement =', len(bass_broad_halpha)

Number of sources with broad H-alpha measurement = 226


Based on the number of sources with each measurement (789 with distances, 594 Sy 1-2, 830 with intrinsic X-ray fluxes, but only 226 with broad H-alpha), the broad H-alpha measurement is the limiting factor. This is expected, since roughly half of the sources should be Sy 2s, which don't have broad lines.

Now let's combine all of the measurements into one DataFrame and calculate luminosities. Originally we first had to remove duplicates from the X-ray data, because some sources had two measurements. This is no longer necessary, so the de-duplication code below is commented out.

#### ***The duplicate sources are sources 249 and 923. I need to ask Mike and Claudio about these.*** (Resolved: I removed the 249 row with NH = 23.11 and the 923 row with NH = 22.05. Claudio said these were old fits that used a different baseline model; see the Slack chat.)
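If duplicate rows ever reappear in the X-ray table, one alternative to the commented-out reset_index/drop_duplicates lines in the next cell is to filter on `Index.duplicated` directly, which does not depend on what the reset index column is called. This is a minimal sketch with a hypothetical toy table; keeping the last row is only correct if the newer fit is listed last, which is an assumption (the fix described above removed the specific old rows by hand).

```python
import pandas as pd

# Hypothetical toy X-ray table: BAT ID 249 appears twice, mimicking the
# duplicate fits described above. The NH values are illustrative only.
xray = pd.DataFrame({'log NH': [23.11, 22.40, 21.50]},
                    index=pd.Index([249, 249, 300], name='BAT ID'))

# Index.duplicated(keep='last') flags every repeat except the last one,
# so negating it keeps one row (the last) per BAT ID.
deduped = xray[~xray.index.duplicated(keep='last')]
print(deduped)
```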

In [36]:
#bass_fx = bass_fx.reset_index().drop_duplicates(subset='index', keep='last').set_index('index')
#bass_fx = bass_fx[' Intrinsic F14-150']
#bass_nh = bass_nh.reset_index().drop_duplicates(subset='index', keep='last').set_index('index')
#bass_nh = bass_nh['log NH']

df = pd.DataFrame({'Intrinsic 14-150 Flux':bass_fx, 'Broad Halpha Flux':bass_broad_halpha['F(Ha broad) [e-15 erg/s/cm2]'],
                   'Narrow Halpha Flux':bass_broad_halpha['F(Ha narrow) [e-15 erg/s/cm2]'],'Distance':bass_dist, 'NH':bass_nh, 'Type':bass_type})

# Remove all sources with missing information
df = df.dropna()

# Add in MBH and Lbol/LEdd info
df['logMBH'] = bass_mbh['log M']
df['logEddRatio'] = bass_mbh['log (Lbol/Ledd)']
print 'Number of sources in final sample =', len(df)

Number of sources in final sample = 221
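The combination step relies on pandas index alignment: building a DataFrame from a dict of Series takes the union of all BAT IDs and fills gaps with NaN, and `dropna()` then keeps only the IDs present in every column. A minimal sketch with hypothetical toy values:

```python
import pandas as pd

# Hypothetical per-measurement Series keyed by BAT ID
fx = pd.Series([10.0, 5.0, 2.0], index=[1, 2, 3])        # X-ray flux
halpha = pd.Series([300.0, 150.0], index=[2, 3])        # broad H-alpha flux
dist = pd.Series([50.0, 120.0, 80.0], index=[1, 2, 4])  # distance

# The dict constructor aligns on the index: rows = union of all BAT IDs
combined = pd.DataFrame({'Flux': fx, 'Halpha': halpha, 'Distance': dist})
# dropna() keeps only the BAT IDs that have every measurement
sample = combined.dropna()
print(sample)
```

Note that in the cell above, `logMBH` and `logEddRatio` are attached after `dropna()`, so sources without a black hole mass estimate stay in the sample with NaN in those columns.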


In [37]:
# Convert to X-ray and Halpha luminosities
df['Intrinsic X-ray Luminosity'] = 4*np.pi*(df['Distance']*10**6*3.086e18)**2*df['Intrinsic 14-150 Flux']*10**(-12)
df['Broad Halpha Luminosity'] = 4*np.pi*(df['Distance']*10**6*3.086e18)**2*df['Broad Halpha Flux']*10**(-15)
df['Narrow Halpha Luminosity'] = 4*np.pi*(df['Distance']*10**6*3.086e18)**2*df['Narrow Halpha Flux']*10**(-15)
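The three lines above all apply $L = 4\pi d^2 F$, with the distance converted from Mpc to cm ($1~{\rm pc} = 3.086\times10^{18}$ cm) and the flux scaled by its tabulated unit ($10^{-12}$ for the BAT fluxes, $10^{-15}$ for H-alpha). A minimal sketch of a hypothetical helper that factors out the repeated expression; it uses only scalar arithmetic, so it should also work elementwise on pandas Series columns.

```python
import math

PC_CM = 3.086e18  # centimetres per parsec, the value used in the cell above


def flux_to_luminosity(flux, distance_mpc, flux_unit):
    """Hypothetical helper: return L = 4*pi*d^2*F in erg/s.

    flux         -- flux in units of flux_unit erg/s/cm^2
    distance_mpc -- distance in Mpc
    flux_unit    -- 1e-12 for the 14-150 keV fluxes, 1e-15 for H-alpha
    """
    d_cm = distance_mpc * 1e6 * PC_CM
    return 4.0 * math.pi * d_cm**2 * flux * flux_unit


# A 14-150 keV flux of 1 (x 1e-12 erg/s/cm^2) at 100 Mpc
L = flux_to_luminosity(1.0, 100.0, 1e-12)
print('%.3e erg/s' % L)
```

With this helper, the X-ray line above would read `flux_to_luminosity(df['Intrinsic 14-150 Flux'], df['Distance'], 1e-12)`.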

In [38]:
# Save to a CSV file
df.to_csv('../data/cleaned_sample.csv')

I also want to create a sample containing all Seyferts, regardless of whether they have a broad H-alpha measurement, so we can look at population fractions.

In [12]:
df_all = pd.DataFrame(index=bass_type.index, data={'Intrinsic 14-150 Flux':bass_fx,'Distance':bass_dist, 'NH':bass_nh, 'Type':bass_type,
                       'Broad Halpha Flux':bass_broad_halpha['F(Ha broad) [e-15 erg/s/cm2]']})
df_all = df_all[df_all['Distance'] > 0]
df_all['Intrinsic X-ray Luminosity'] = 4*np.pi*(df_all['Distance']*10**6*3.086e18)**2*df_all['Intrinsic 14-150 Flux']*10**(-12)

In [13]:
# Save to a CSV file
df_all.to_csv('../data/cleaned_sample_all_seyferts.csv')

In [14]:
# Look at distribution of Types within the whole sample
for t in ['1', '1.2', '1.5', '1.8', '1.9', '2']:
    print '# Type %s =' % t, sum(df_all['Type'] == t)

# Type 1 = 53
# Type 1.2 = 107
# Type 1.5 = 102
# Type 1.8 = 0
# Type 1.9 = 100
# Type 2 = 227
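For the population fractions themselves, `value_counts(normalize=True)` returns the fraction of each class directly. A minimal sketch on a hypothetical toy `Type` column (in the notebook it would be `df_all['Type']`); reindexing keeps the classes in order and reports empty ones, like Sy 1.8 here, as 0 instead of dropping them.

```python
import pandas as pd

# Hypothetical toy 'Type' column standing in for df_all['Type']
types = pd.Series(['1', '1.2', '1.2', '2', '2', '2', '1.9', '1.5'])

order = ['1', '1.2', '1.5', '1.8', '1.9', '2']
# normalize=True divides each count by the total number of rows
fractions = types.value_counts(normalize=True).reindex(order, fill_value=0)
print(fractions)
```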


In [15]:
# Look at distribution of Types with a broad Halpha and X-ray luminosity measurement.
for t in ['1', '1.2', '1.5', '1.8', '1.9', '2']:
    print '# Type %s =' % t, sum(df['Type'] == t)

# Type 1 = 22
# Type 1.2 = 67
# Type 1.5 = 68
# Type 1.8 = 0
# Type 1.9 = 60
# Type 2 = 4
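Dividing the two sets of counts gives the fraction of each type in the all-Seyfert sample that also made it into the final sample (which needs broad H-alpha, X-ray, and NH measurements). A minimal sketch using the counts printed above; in the notebook the two Series could instead come from `df['Type'].value_counts()` and `df_all['Type'].value_counts()`.

```python
import pandas as pd

order = ['1', '1.2', '1.5', '1.8', '1.9', '2']
# Counts copied from the two outputs above
n_all = pd.Series([53, 107, 102, 0, 100, 227], index=order)
n_final = pd.Series([22, 67, 68, 0, 60, 4], index=order)

# Division aligns on the type labels; 0/0 for Sy 1.8 gives NaN, not an error
frac = n_final / n_all
print(frac.round(3))
```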


I need to look at the Type 1 (Sy 1-1.9) sources that don't seem to have a broad Halpha component and figure out why. I'll start with the Sy 1 sources.

In [16]:
# Save the H-alpha measurements of the Sy 1.9 sources to a CSV for inspection
t1 = df_all[df_all['Type'] == '1.9']
select_halpha = bass_halpha.loc[t1.index]
select_halpha.to_csv('../data/all_type1_9s.csv')

So it looks like for 3 of the sources it's simply a S/N issue: there is a flux measurement, but I'm guessing it's an upper limit since the error is 0. There are also some sources with a flux of 0 but a nonzero error, so what are these objects? I'll wait for Mike's response to get an answer.
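To keep the per-type bookkeeping below consistent, a minimal sketch of a hypothetical helper that sorts each broad H-alpha entry into the cases just described. It assumes, as guessed above, that a positive flux with zero error is an upper limit; that is not a confirmed BASS convention. The toy rows are copied from the outputs further down (BAT IDs 45, 318, and 16).

```python
import math

import pandas as pd


def broad_halpha_status(flux, err):
    """Hypothetical helper: label one broad H-alpha (flux, error) pair."""
    if math.isnan(flux) or math.isnan(err):
        return 'missing'
    if flux > 0 and err > 0:
        return 'measured'
    if flux > 0 and err == 0:
        return 'upper limit?'              # the guessed upper-limit pattern
    if flux == 0 and err > 0:
        return 'zero flux, nonzero error'  # the unexplained case above
    return 'no measurement'


rows = pd.DataFrame({'F(Ha broad) [e-15 erg/s/cm2]': [92.538227, 294.894692, 0.0],
                     'err.7': [0.579317, 0.0, 0.0]},
                    index=pd.Index([45.0, 318.0, 16.0], name='BAT ID'))
status = pd.Series([broad_halpha_status(f, e) for f, e in
                    zip(rows['F(Ha broad) [e-15 erg/s/cm2]'], rows['err.7'])],
                   index=rows.index)
print(status)
```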

The other problem could be that I've selected some blazars. Above, I only removed objects whose 'Winkler' classification was a blazar, but there is also a separate 'Blazars' flag column.
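One way to apply both criteria at once is to combine the Winkler class check with the 'Blazars' flag in a single boolean mask. A minimal sketch on toy rows copied from the outputs below (BAT IDs 113, 208, 656, 36); treating a missing flag as "not a blazar" via `fillna(0)` is an assumption.

```python
import pandas as pd

# Toy rows in the layout of bass_sy_class, copied from the outputs below
sy_class = pd.DataFrame({'Winkler': ['1', '1', '1.2', '1.5'],
                         'Blazars': [0.0, 1.0, 1.0, 0.0]},
                        index=pd.Index([113.0, 208.0, 656.0, 36.0], name='BAT ID'))

seyfert_types = ['1', '1.2', '1.5', '1.8', '1.9', '2']
# Keep Sy 1-2 by Winkler class AND require the 'Blazars' flag to be unset
keep = sy_class['Winkler'].isin(seyfert_types) & (sy_class['Blazars'].fillna(0) != 1)
clean_types = sy_class.loc[keep, 'Winkler']
print(clean_types)
```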

In [17]:
bass_sy_class.loc[df_all[pd.isnull(df_all['Broad Halpha Flux']) & (df_all['Type'] == '1')].index]

| BAT ID | Name | CounterName | Winkler | Blazars | Unnamed: 5 | Unnamed: 6 |
|---|---|---|---|---|---|---|
| 113.0 | SWIFTJ0209 | SWIFT J0209.7+5226 | 1 | 0.0 | | |
| 164.0 | g0307353-725003 | ESO 031- G 008 | 1 | 0.0 | | |
| 189.0 | g0342037-211440 | ESO 548-G081 | 1 | 0.0 | | |
| 194.0 | g0351417-402759 | Fairall 1116 | 1 | 0.0 | | |
| 208.0 | g0405340-130814 | 2MASX J04053399-1308135 | 1 | 1.0 | | |
| 268.0 | g0516212-103342 | MCG -02-14-009 | 1 | 0.0 | | |
| 270.0 | PICTORA_single | PICTOR A | 1 | 0.0 | | |
| 301.0 | specsumswij0543.7 | MCG-05-14-012 | 1 | 0.0 | | |
| 314.0 | PKS0558-504_single | PKS 0558-504 | 1 | 0.0 | | |
| 338.0 | PBCJ0635.0-7441 | 2MASSJ06340353-7446377 | 1 | 0.0 | | |


That also seems to be the case for 6 of the objects, which are flagged in the 'Blazars' column. That leaves 10 Sy 1s whose missing broad Halpha component we still need to explain.

Let's move to the Sy 1.2s.

In [20]:
bass_halpha.loc[df_all[pd.isnull(df_all['Broad Halpha Flux']) & (df_all['Type'] == '1.2')].index, ['F(Ha broad) [e-15 erg/s/cm2]', 'err.7']]

| BAT ID | F(Ha broad) [e-15 erg/s/cm2] | err.7 |
|---|---|---|
| 16.0 | 0.0 | 0.0 |
| 39.0 | 0.0 | 0.0 |
| 45.0 | 92.538227 | 0.579317 |
| 85.0 | 108.886168 | 1.62992 |
| 95.0 | 239.797911 | 3.869275 |
| 111.0 | 406.32555 | 1.151953 |
| 124.0 | 0.0 | 0.0 |
| 126.0 | 132.122533 | 3.73825 |
| 138.0 | 383.457807 | 1.636808 |
| 147.0 | 0.0 | 0.0 |


Here only 2 sources have an upper limit for the broad Halpha component. 

In [65]:
bass_sy_class.loc[df_all[pd.isnull(df_all['Broad Halpha Flux']) & (df_all['Type'] == '1.2')].index]

| BAT ID | Name | CounterName | Winkler | Blazars | Unnamed: 5 | Unnamed: 6 |
|---|---|---|---|---|---|---|
| 16.0 | J00292+1316 | [HB89] 0026+129 | 1.2 | 0.0 | | |
| 39.0 | J00548+2525 | [HB89] 0052+251 | 1.2 | 0.0 | | |
| 124.0 | g0225028-231248 | PKS 0222-23 | 1.2 | 0.0 | | |
| 147.0 | J02449+6228 | [HB89] 0241+622 | 1.2 | 0.0 | | |
| 214.0 | 3C111_n08_dered | 3C 111.0 | 1.2 | 0.0 | | |
| 223.0 | PBCJ0429.7-6703 | 2MASXJ04294735-6703205 | 1.2 | 0.0 | | |
| 285.0 | pbc0532.0003 | 2MASXJ05325752+1345092 | 1.2 | 0.0 | | |
| 363.0 | J07142+4541 | Mrk 376 | 1.2 | 0.0 | | |
| 576.0 | J11520-1122 | PG 1149-110 | 1.2 | 0.0 | | |
| 656.0 | J13055-1033 | PKS 1302-102 | 1.2 | 1.0 | | |


And 3 are also classified as blazars, leaving 14 objects we need to explain.

Now the Sy 1.5s.

In [67]:
bass_halpha.loc[df_all[pd.isnull(df_all['Broad Halpha Flux']) & (df_all['Type'] == '1.5')].index, ['F(Ha broad) [e-15 erg/s/cm2]', 'err.7']]

| BAT ID | F(Ha broad) [e-15 erg/s/cm2] | err.7 |
|---|---|---|
| 36.0 | 0.0 | 0.0 |
| 90.0 | 0.0 | 0.0 |
| 162.0 | 0.0 | 0.0 |
| 226.0 | 0.0 | 0.0 |
| 318.0 | 294.894692 | 0.0 |
| 389.0 | 341.382027 | 0.0 |
| 394.0 | 0.0 | 0.0 |
| 418.0 | 165.495195 | 0.0 |
| 507.0 | 0.0 | 0.0 |
| 572.0 | 392.17048 | 0.0 |


7 Sy 1.5s have what I think are upper limits on their broad Halpha measurements.

In [68]:
bass_sy_class.loc[df_all[pd.isnull(df_all['Broad Halpha Flux']) & (df_all['Type'] == '1.5')].index]

| BAT ID | Name | CounterName | Winkler | Blazars | Unnamed: 5 | Unnamed: 6 |
|---|---|---|---|---|---|---|
| 36.0 | J00519+1725 | Mrk 1148 | 1.5 | 0.0 | | |
| 90.0 | J01419+3923 | B2 0138+39B | 1.5 | 1.0 | | |
| 162.0 | MCG-02-08-038 | MCG -02-08-038 | 1.5 | 0.0 | | |
| 226.0 | J04331+0521 | 3C 120 | 1.5 | 1.0 | | |
| 318.0 | igr06058 | 2MASXJ06054896-2754398 | 1.5 | 0.0 | | |
| 389.0 | Mrk10_single | Mrk 10 | 1.5 | 0.0 | | |
| 394.0 | spec-1582-52939-0612 | 2MASX J07521780+1935423 | 1.5 | 0.0 | | |
| 418.0 | spec-0761-54524-0071 | 2MASX J08294266+4154366 | 1.5 | 0.0 | | |
| 507.0 | swi1038.0003 | 2MASXJ10384520-4946531 | 1.5 | 0.0 | | |
| 572.0 | spec-1225-52760-0009 | 2MASX J11475508+0902284 | 1.5 | 0.0 | | |


And 3 are also classified as blazars. That leaves 9 objects that need to be explained.

Finally, the Sy 1.9s.

In [18]:
bass_halpha.loc[df_all[pd.isnull(df_all['Broad Halpha Flux']) & (df_all['Type'] == '1.9')].index, ['F(Ha broad) [e-15 erg/s/cm2]', 'err.7']]

| BAT ID | F(Ha broad) [e-15 erg/s/cm2] | err.7 |
|---|---|---|
| 2.0 | 173.174449 | 0.858336 |
| 5.0 | 72.438746 | 0.194586 |
| 10.0 | 0.0 | 0.0 |
| 14.0 | 234.088352 | 3.803568 |
| 135.0 | 1756.857173 | 2.830005 |
| 136.0 | 0.0 | 0.0 |
| 199.0 | 25.187739 | 1.982539 |
| 205.0 | 63.391663 | 0.0 |
| 207.0 | 194.84988 | 3.669043 |
| 246.0 | 0.0 | 0.0 |


In [22]:
bass_sy_class.loc[df_all[pd.isnull(df_all['Broad Halpha Flux']) & (df_all['Type'] == '1.9')].index]

| BAT ID | Name | CounterName | Winkler | Blazars | Unnamed: 5 | Unnamed: 6 |
|---|---|---|---|---|---|---|
| 2.0 | g0001461-765714 | Fairall 1203 | 1.9 | 0.0 | | |
| 5.0 | igr00040 | 2MASXJ00040192+7019185 | 1.9 | 0.0 | | |
| 10.0 | g0021075-191006 | 2MASX J00210753-1910056 | 1.9 | 0.0 | | |
| 14.0 | g0026407-530948 | 2MASX J00264073-5309479 | 1.9 | 0.0 | | |
| 135.0 | g0235135-293617 | 2MASX J02351345-2936166 | 1.9 | 0.0 | | |
| 136.0 | ESO198-024_single | ESO 198-024 | 1.9 | 0.0 | | |
| 199.0 | g0356200-625139 | 2MASX J03561995-6251391 | 1.9 | 0.0 | | |
| 205.0 | g0402257-180251 | ESO 549- G 049 | 1.9 | 0.0 | | |
| 207.0 | g0405017-371115 | ESO 359- G 019 | 1.9 | 0.0 | | |
| 246.0 | g0454429-431423 | 2MASX J04544295-4314231 | 1.9 | 0.0 | | |


In this case, 9 objects have upper limits on their broad Halpha components and none are blazars. So 10 objects need to be explained.
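The per-type counting above can also be tallied in one step. A minimal sketch, assuming a table of the objects missing from the broad H-alpha sample with their type, broad H-alpha flux and error, and 'Blazars' flag (in the notebook it could be assembled from `df_all`, `bass_halpha`, and `bass_sy_class`); the toy rows are copied from the Sy 1.5 outputs above, and the upper-limit test is the guessed pattern (positive flux, zero error).

```python
import pandas as pd

# Toy stand-in for the missing objects; rows copied from the Sy 1.5 outputs above
missing = pd.DataFrame({
    'Type': ['1.5'] * 4,
    'F(Ha broad) [e-15 erg/s/cm2]': [0.0, 0.0, 294.894692, 0.0],
    'err.7': [0.0, 0.0, 0.0, 0.0],
    'Blazars': [0.0, 1.0, 0.0, 0.0],
}, index=pd.Index([36.0, 90.0, 318.0, 394.0], name='BAT ID'))

flux = missing['F(Ha broad) [e-15 erg/s/cm2]']
upper_limit = (flux > 0) & (missing['err.7'] == 0)  # guessed upper-limit pattern
blazar = missing['Blazars'] == 1
unexplained = ~upper_limit & ~blazar                # neither explanation applies

by_type = missing['Type']
summary = pd.DataFrame({
    'missing': by_type.groupby(by_type).size(),
    'upper limit': upper_limit.astype(int).groupby(by_type).sum(),
    'blazar': blazar.astype(int).groupby(by_type).sum(),
    'unexplained': unexplained.astype(int).groupby(by_type).sum(),
})
print(summary)
```

Counting "unexplained" as neither an upper limit nor a blazar avoids double-counting any object that falls into both groups, which simple subtraction would not.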