# A simple ASCII manipulation...

It's time to read my data set in Python. The data set is the output of my long Supermongo (SM) macro. The columns **type**, **class**, **sigobs** and **photobs**, which carry readable labels, were added using Power BI. The columns (features) are described in detail elsewhere. This is the source data set needed to reproduce the main results of Bordalo & Telles 2011, ApJ, 735, 52 (http://adsabs.harvard.edu/abs/2011ApJ...735...52B).

In [1]:
from pandas import read_csv

data = read_csv('../data/lsigma.csv')
data.head(10)

Unnamed: 0,name,lum,sig,oh,ewhb,ion,te,ne,chb,z,typ,typn,ref,type,class,sigobs,photobs
0,UM238,40.024,1.27,7.891,1.554,0.52,4.186,2.938,0.233,0.01427,2,0,1,Gaussian Profile,G,FEROS,B&C
1,mrk557,40.668,1.761,8.697,0.996,-0.715,4.146,2.573,0.383,0.01328,1,1,1,Irregular Profile,I,COUDÉ,B&C
2,UM304,41.546,1.893,0.0,0.0,0.0,4.146,2.309,0.0,0.0157,0,0,14,Profile with Components,C,COUDÉ,Others
3,cts1001,40.81,1.683,7.961,1.775,0.059,4.173,2.927,0.189,0.02263,1,1,1,Irregular Profile,I,FEROS,B&C
4,UM306,40.245,1.282,8.184,1.375,0.344,4.065,1.423,0.082,0.01649,2,0,1,Gaussian Profile,G,FEROS,B&C
5,UM307,41.196,1.693,8.432,1.353,-0.174,4.146,2.993,0.253,0.02249,2,2,1,Gaussian Profile,G,COUDÉ,B&C
6,UM323,39.781,1.254,7.918,1.32,-0.046,4.245,1.423,0.852,0.00648,2,2,1,Gaussian Profile,G',FEROS,B&C
7,Tol0127-397,40.721,1.548,0.0,0.0,0.0,4.146,2.309,0.0,0.01735,2,2,14,Gaussian Profile,G',FEROS,Others
8,Tol0140-420,40.348,1.422,8.057,1.748,0.297,4.106,1.423,0.0,0.02205,0,0,1,Profile with Components,C,FEROS,B&C
9,UM137,39.127,1.162,8.25,0.635,-0.257,4.146,1.423,0.371,0.00591,4,4,1,Gaussian Profile,G,FEROS,B&C


In [2]:
# Removing the old columns typ and typn
data = data.drop(columns=['typ', 'typn'])
data.head(10)

Unnamed: 0,name,lum,sig,oh,ewhb,ion,te,ne,chb,z,ref,type,class,sigobs,photobs
0,UM238,40.024,1.27,7.891,1.554,0.52,4.186,2.938,0.233,0.01427,1,Gaussian Profile,G,FEROS,B&C
1,mrk557,40.668,1.761,8.697,0.996,-0.715,4.146,2.573,0.383,0.01328,1,Irregular Profile,I,COUDÉ,B&C
2,UM304,41.546,1.893,0.0,0.0,0.0,4.146,2.309,0.0,0.0157,14,Profile with Components,C,COUDÉ,Others
3,cts1001,40.81,1.683,7.961,1.775,0.059,4.173,2.927,0.189,0.02263,1,Irregular Profile,I,FEROS,B&C
4,UM306,40.245,1.282,8.184,1.375,0.344,4.065,1.423,0.082,0.01649,1,Gaussian Profile,G,FEROS,B&C
5,UM307,41.196,1.693,8.432,1.353,-0.174,4.146,2.993,0.253,0.02249,1,Gaussian Profile,G,COUDÉ,B&C
6,UM323,39.781,1.254,7.918,1.32,-0.046,4.245,1.423,0.852,0.00648,1,Gaussian Profile,G',FEROS,B&C
7,Tol0127-397,40.721,1.548,0.0,0.0,0.0,4.146,2.309,0.0,0.01735,14,Gaussian Profile,G',FEROS,Others
8,Tol0140-420,40.348,1.422,8.057,1.748,0.297,4.106,1.423,0.0,0.02205,1,Profile with Components,C,FEROS,B&C
9,UM137,39.127,1.162,8.25,0.635,-0.257,4.146,1.423,0.371,0.00591,1,Gaussian Profile,G,FEROS,B&C


In [3]:
# Creating a new column to tag the outliers identified in Bordalo & Telles (2011)
# Assigning 1 to the outliers and 0 to the inliers
# There must be an easier way to do this...

data['out'] = 0
data.loc[data['name'] == 'UM417', 'out'] = 1
data.loc[data['name'] == 'Tol0505-387', 'out'] = 1
data.loc[data['name'] == 'MRK1201', 'out'] = 1
data.loc[data['name'] == 'MRK1318', 'out'] = 1
data.loc[data['name'] == 'Tol1008-286', 'out'] = 1
data.loc[data['name'] == 'UM463', 'out'] = 1
data.loc[data['name'] == 'UM559', 'out'] = 1
data.loc[data['name'] == 'TOL2138-397', 'out'] = 1
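
One possibly shorter way to build the same flag is `Series.isin`, which tests every name against a list in a single call. The sketch below uses a small stand-in frame (`toy`, a hypothetical name) rather than the real table, and the `outliers` list simply collects the eight names used in the cell above.

```python
import pandas as pd

# Toy frame standing in for the real table (hypothetical); only 'name' matters here.
toy = pd.DataFrame({'name': ['UM238', 'UM417', 'UM463', 'UM137']})

# The eight galaxy names tagged as outliers in the cell above.
outliers = ['UM417', 'Tol0505-387', 'MRK1201', 'MRK1318',
            'Tol1008-286', 'UM463', 'UM559', 'TOL2138-397']

# isin() returns a boolean mask; astype(int) turns True/False into 1/0.
toy['out'] = toy['name'].isin(outliers).astype(int)
print(toy)
```

Like the `==` comparisons, `isin` matches names exactly, including upper and lower case, so the spelling in the list has to agree with the spelling in the **name** column.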

Lastly, I use **astropy** instead of **to_csv** from **pandas** to write the CSV and fix the column formats. Strings and integers are identified automatically and can be omitted from the dictionary. Float formats must be set explicitly to guarantee the precision of the original data set. **pandas.DataFrame.to_csv** seems to accept only one float format for all columns, which is why I decided to use **astropy.io**.

In [4]:
from astropy.table import Table
from astropy.io import ascii
tab = Table.from_pandas(data)
ascii.write(tab, '../data/lsigma_new.csv', delimiter=',',
            overwrite=True, formats={
                'lum':'%.3f','sig':'%.3f',
                'oh':'%.3f','ewhb':'%.3f','ion':'%.3f',
                'te':'%.3f','ne':'%.3f','chb':'%.3f','z':'%.5f'})
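
For comparison, a pandas-only workaround is to turn each float column into already-formatted strings before calling `to_csv`. This is only a sketch on a toy frame, not what this notebook does: the names `toy`, `formats` and `out` are hypothetical, and the text is written to an in-memory buffer instead of a file.

```python
import io
import pandas as pd

# Toy frame with a few of the real column names (values taken from the first rows).
toy = pd.DataFrame({'name': ['UM238', 'UM304'],
                    'lum': [40.024, 41.546],
                    'z': [0.01427, 0.0157],
                    'ref': [1, 14]})

# Per-column float formats, in the same style as the astropy dictionary.
formats = {'lum': '%.3f', 'z': '%.5f'}

# Replace each float column by its formatted string representation.
out = toy.copy()
for col, fmt in formats.items():
    out[col] = out[col].map(lambda v, f=fmt: f % v)

# Write to an in-memory buffer so the example leaves no file behind.
buf = io.StringIO()
out.to_csv(buf, index=False)
text = buf.getvalue()
print(text)
```

The drawback is that the formatted columns hold strings rather than floats afterwards, so this step should be applied to a copy right before writing.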

In [5]:
# Checking the new file
data = read_csv('../data/lsigma_new.csv')
data.head(10)

Unnamed: 0,name,lum,sig,oh,ewhb,ion,te,ne,chb,z,ref,type,class,sigobs,photobs,out
0,UM238,40.024,1.27,7.891,1.554,0.52,4.186,2.938,0.233,0.01427,1,Gaussian Profile,G,FEROS,B&C,0
1,mrk557,40.668,1.761,8.697,0.996,-0.715,4.146,2.573,0.383,0.01328,1,Irregular Profile,I,COUDÉ,B&C,0
2,UM304,41.546,1.893,0.0,0.0,0.0,4.146,2.309,0.0,0.0157,14,Profile with Components,C,COUDÉ,Others,0
3,cts1001,40.81,1.683,7.961,1.775,0.059,4.173,2.927,0.189,0.02263,1,Irregular Profile,I,FEROS,B&C,0
4,UM306,40.245,1.282,8.184,1.375,0.344,4.065,1.423,0.082,0.01649,1,Gaussian Profile,G,FEROS,B&C,0
5,UM307,41.196,1.693,8.432,1.353,-0.174,4.146,2.993,0.253,0.02249,1,Gaussian Profile,G,COUDÉ,B&C,0
6,UM323,39.781,1.254,7.918,1.32,-0.046,4.245,1.423,0.852,0.00648,1,Gaussian Profile,G',FEROS,B&C,0
7,Tol0127-397,40.721,1.548,0.0,0.0,0.0,4.146,2.309,0.0,0.01735,14,Gaussian Profile,G',FEROS,Others,0
8,Tol0140-420,40.348,1.422,8.057,1.748,0.297,4.106,1.423,0.0,0.02205,1,Profile with Components,C,FEROS,B&C,0
9,UM137,39.127,1.162,8.25,0.635,-0.257,4.146,1.423,0.371,0.00591,1,Gaussian Profile,G,FEROS,B&C,0


Ready to go...