# NGC Dataset filtering

I would like to extend my DSO (deep-sky object) target list beyond a 'top 20' from Google, into something more *personal*.

So, using the pyongc Python package, it is time for some *data shaping*.

In [68]:
import pandas as pd
from pyongc import ongc
from pyongc import data

## My Data Conversion functions

I am not sure why the data columns are of type *object* when int would be more useful.
This is how I will convert these objects:

In [70]:
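def myint(a) -> int:
    """Convert a to int; fall back to 0 when the value is missing or not numeric."""
    try:
        return int(a)
    except (TypeError, ValueError, OverflowError):
        return 0


def mystring(a) -> str:
    """Convert a to str; fall back to None if the conversion fails."""
    try:
        return str(a)
    except Exception:
        return None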
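As an alternative to a hand-written converter, pandas can do the same coercion in one vectorised step. The sketch below uses a made-up toy column rather than the real catalogue, and assumes that a fallback of 0 for missing values is acceptable, as it is in `myint`.

```python
import pandas as pd

# Toy column mimicking object-typed catalogue numbers (made-up values).
raw = pd.Series(["31", None, "42", "n/a"], dtype="object")

# errors="coerce" turns anything unparseable into NaN;
# fillna(0) then mirrors the fallback used by myint.
as_int = pd.to_numeric(raw, errors="coerce").fillna(0).astype(int)
print(as_int.tolist())
```

The trade-off is the same as with `myint`: a 0 can no longer be told apart from "no value", which is fine here because no catalogue number is 0.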
    

In [71]:
# Pull all the NGC data into a pandas DataFrame
ngc = data.all()
# Convert messier and ngc to integers, and commonnames to strings
ngc['messier'] = ngc['messier'].apply(myint)
ngc['ngc'] = ngc['ngc'].apply(myint)
ngc['commonnames'] = ngc['commonnames'].apply(mystring)

## Object type occurrences

Whilst the NGC has a lot of galaxies, it has plenty of other objects.
If you want to see how many times each type occurs, this is how to get the counts.

You could then use these values to filter for your own specific targets.


These are the occurrences of each object 'type' in the NGC catalog.
Full description at [NGC_guide.txt](https://github.com/mattiaverga/OpenNGC/blob/master/NGC_guide.txt)


In [72]:
ngc['type'].value_counts(dropna=False)

type
G         10521
OCl         663
Dup         652
*           546
Other       419
**          244
GPair       231
GCl         208
PN          130
Neb          94
HII          83
Cl+N         67
*Ass         64
RfN          38
GTrpl        26
GGroup       13
SNR          11
NonEx        10
EmN           8
Nova          3
DrkN          2
Name: count, dtype: int64
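To act on these counts, one option is to keep only the types you care about with `isin`. This is a minimal sketch on a toy frame; the `wanted` list is a hypothetical personal selection, and the toy names are illustrative rather than real catalogue entries.

```python
import pandas as pd

# Toy stand-in for the catalogue; names are illustrative only.
toy = pd.DataFrame({
    "name": ["A", "B", "C", "D", "E"],
    "type": ["G", "OCl", "Dup", "PN", "NonEx"],
})

# Hypothetical selection: galaxies, open and globular clusters, planetary nebulae.
wanted = ["G", "OCl", "GCl", "PN"]
targets = toy[toy["type"].isin(wanted)]
print(targets["name"].tolist())
```

The same pattern should work on the full frame by swapping `toy` for `ngc`, which also drops the `Dup` and `NonEx` entries.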

## Messier Data

We will start with the much loved *Messier* collection, and then use that as a base.

In [44]:
messier_df=ngc[ngc.messier>0]
# Yes that's it !!

In [74]:
# Average size of a Messier object 
# Which are considered "big" and to an extent *bright*
# You could adjust this as you see fit
#
messier_mean_size=messier_df.majax.mean()
messier_mean_bright=messier_df.bmag.mean()
print(f"Mean Size   is {messier_mean_size} arcmin")
print(f"Mean Bright is {messier_mean_bright} magnitude ")


Mean Size   is 18.167169811320758 arcmin
Mean Bright is 8.197222222222223 magnitude 
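If you want to adjust the threshold, the median or a quantile is one alternative to the mean, since a few very large objects can pull the mean upwards. The sketch below uses made-up sizes, not the real Messier values, purely to show the difference.

```python
import pandas as pd

# Made-up major-axis sizes in arcmin, with one very large outlier.
sizes = pd.Series([2.0, 4.0, 6.0, 8.0, 180.0])

mean_size = sizes.mean()            # pulled up by the outlier
median_size = sizes.median()        # middle value, robust to the outlier
q75_size = sizes.quantile(0.75)     # stricter cut: larger than 75% of objects

print(mean_size, median_size, q75_size)
```

On the real data the equivalent calls would be `messier_df.majax.median()` or `messier_df.majax.quantile(0.75)`.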


## Create New Datasets 

If we take the mean **size** and **brightness** of the Messier catalog:

What should be there, but was missed?

We create 3 datasets:

  - large_bright
  - large
  - bright

No object that already has a Messier id will be in any of these datasets.

Finally we get all objects which have a *commonname*, remove the Messier items, and call this **common_named_objects**.

A slight warning: magnitude is a positive value in this dataset, and a smaller magnitude means a brighter object, so we need objects whose magnitude is less than or equal to the mean.

In [96]:
large_bright_df=ngc[(ngc.majax>messier_mean_size)&(ngc.messier==0)&(ngc.bmag<=messier_mean_bright)]
large_df=ngc[(ngc.majax>messier_mean_size)&(ngc.messier==0)]
bright_df=ngc[(ngc.bmag<=messier_mean_bright)&(ngc.messier==0)]

# Create a Mask of matching records 
mask = (ngc['commonnames'].str.len()>0)
#Apply the Must Have commonname filter
common_named_objects=ngc.loc[mask]
#Only select records which have no messier number
common_named_objects=common_named_objects[common_named_objects.messier==0]
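One side effect of these comparisons is worth knowing: any comparison against NaN is False, so objects with a missing magnitude or size silently fall out of the filtered sets. A minimal sketch on toy data follows; the threshold is roughly the Messier mean printed earlier, and the names are made up.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "name": ["bright", "faint", "unknown"],
    "bmag": [6.5, 12.0, np.nan],   # NaN stands for a missing magnitude
})

threshold = 8.2  # roughly the Messier mean brightness printed earlier

# Lower magnitude means brighter, hence <=; the NaN row compares as False.
bright = toy[toy["bmag"] <= threshold]
print(bright["name"].tolist())
```

If you would rather keep objects with an unknown magnitude, combining the mask with `toy["bmag"].isna()` using `|` is one way to do it.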

## Data Grouping 

At this point we have 5 dataframes of targets:

  - messier_df
  - large_bright_df (no Messier objects in it)
  - large_df (no Messier objects in it)
  - bright_df (no Messier objects in it)
  - common_named_objects (no Messier objects, but its records could also be in the other datasets)

We want all records in 1 DataFrame, without duplicates.

In [97]:
mytargets= pd.concat([messier_df, large_bright_df, large_df,bright_df,common_named_objects],ignore_index=True).drop_duplicates()
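The concat-then-deduplicate step can be seen in miniature below. This sketch uses two small made-up frames, and it assumes a `name` column uniquely identifies an object; deduplicating on that column alone is a variation on the whole-row comparison used above.

```python
import pandas as pd

# Two overlapping toy datasets; "X2" appears in both.
a = pd.DataFrame({"name": ["X1", "X2"], "bmag": [5.0, 6.0]})
b = pd.DataFrame({"name": ["X2", "X3"], "bmag": [6.0, 7.0]})

combined = pd.concat([a, b], ignore_index=True)

# Keep the first occurrence of each name and renumber the index 0..n-1.
unique = combined.drop_duplicates(subset="name", ignore_index=True)
print(len(combined), len(unique))
```

Passing `ignore_index=True` to `drop_duplicates` leaves a clean, gap-free index, which is convenient before saving the result.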

In [98]:
print(f"large_bright_df has {len(large_bright_df)} records")
print(f"large_df has {len(large_df)} records")
print(f"bright_df has {len(bright_df)} records")
print(f"common_named_objects has {len(common_named_objects)} records")
print("\n\n")


print(f"We have {len(mytargets)} potential objects now")
print(f"Instead of {len(messier_df)} messier objects")
print(f"There are {len(ngc)} ngc objects in total")


large_bright_df has 29 records
large_df has 85 records
bright_df has 176 records
common_named_objects has 122 records



We have 412 potential objects now
Instead of 110 messier objects
There are 14033 ngc objects in total


In [99]:
filename='extended_messier.feather'
mytargets.to_feather(filename)
print(f"Dataset saved as {filename}")

Dataset saved as extended_messier.feather
