### DataSet structure
The main thing to know about the PinkRigs organisation is that we store two types of data about the experiments: 
- metadata (CSVs) summarising each animal and experimental session
- experimental session data

Typically, when you run analysis on the PinkRigs data you will need to do the following:
1) query the metadata to make sure you include all the data that fit your requirements
2) load the details (events, spikes, cameras) of the datasets that you have selected


### Querying experiments
You can query the experiments using the `queryCSV` function from `dataset.query`, e.g.:

In [1]:
from pinkrigs_tools.dataset.query import queryCSV

exp = queryCSV(
  subject='AV043',
  expDate='2024-03-14:2024-03-24', 
  expDef = 'multiSpaceWorld_checker_training',
  )
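
The `expDate` argument above takes a `start:end` string. As a rough illustration of how such a range could be interpreted, the sketch below uses a hypothetical helper, `in_date_range`, which is not part of `pinkrigs_tools`. It assumes both end dates are included and does not handle other keywords; check `queryCSV` itself for the exact behaviour.

```python
from datetime import date


def in_date_range(exp_date: str, date_range: str) -> bool:
    """Hypothetical helper: True if exp_date falls within 'YYYY-MM-DD:YYYY-MM-DD'.

    Assumes both endpoints are inclusive; a single date matches only itself.
    """
    start_str, _, end_str = date_range.partition(':')
    start = date.fromisoformat(start_str)
    # with no ':' in the string, treat it as a one-day range
    end = date.fromisoformat(end_str) if end_str else start
    return start <= date.fromisoformat(exp_date) <= end


session_dates = ['2024-03-10', '2024-03-14', '2024-03-20', '2024-03-25']
selected = [d for d in session_dates if in_date_range(d, '2024-03-14:2024-03-24')]
print(selected)  # ['2024-03-14', '2024-03-20']
```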



### Loading the data
You can also directly query and then load the ONE folder content in one line using `load_data`. To specify the ONE folder content to load, pass a nested dictionary to the `data_name_dict` argument of `load_data`. The nesting follows the ONE data structure `{collection:{'object':'attribute'}}`. For example:
#### Events data

In [1]:
from pinkrigs_tools.dataset.query import load_data

# define parameters of your query
exp_kwargs = {
    'subject': ['AV043'],
    'expDate': '2024-03-14:2024-03-15',
    }

# define the ONE data to load
data_name_dict = { 'events': {'_av_trials': 'table'}}
recordings = load_data(data_name_dict=data_name_dict,**exp_kwargs)

#### Spikes data
(This operation is the slowest! To avoid loading unwanted data, query the data first and then load spike data only for the datasets that you are sure you want to use in your analysis.)

In [3]:
ephys_dict = {'spikes':'all','clusters':'all'}
# both probes 
data_name_dict = {'probe0':ephys_dict,'probe1':ephys_dict} 
recordings = load_data(data_name_dict=data_name_dict,**exp_kwargs)

#### Camera data

In [4]:
cameras = ['frontCam','sideCam','eyeCam']
data_name_dict = {cam:{'camera':['times','ROIMotionEnergy']} for cam in cameras}
recordings = load_data(data_name_dict=data_name_dict,**exp_kwargs)
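
The three dictionaries above share the same `{collection: {object: attribute}}` nesting, so they can be assembled programmatically. The sketch below builds one dictionary from the entries used in this section. `build_data_name_dict` is a hypothetical helper, not part of `pinkrigs_tools`, and it assumes that `load_data` accepts events, probe and camera collections together in a single dictionary.

```python
def build_data_name_dict(events=True, probes=('probe0', 'probe1'),
                         cameras=('frontCam', 'sideCam', 'eyeCam')):
    """Hypothetical helper: assemble a {collection: {object: attribute}} dict."""
    data_name_dict = {}
    if events:
        data_name_dict['events'] = {'_av_trials': 'table'}
    for probe in probes:
        # each probe gets its own dict so later edits do not leak across probes
        data_name_dict[probe] = {'spikes': 'all', 'clusters': 'all'}
    for cam in cameras:
        data_name_dict[cam] = {'camera': ['times', 'ROIMotionEnergy']}
    return data_name_dict


# events plus one probe, no cameras
data_name_dict = build_data_name_dict(probes=('probe0',), cameras=())
print(data_name_dict)
```

The resulting dictionary would then be passed as `load_data(data_name_dict=data_name_dict, **exp_kwargs)`, exactly as in the cells above.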

Alternatively, you can first query the data using `queryCSV`, subset your DataFrame as you wish, and load the ONE objects only for your subset using `load_data`.

In [5]:
recordings = load_data(recordings=exp.iloc[0:1], data_name_dict = {'events':{'_av_trials':'all'}})

Or just load all the data together by passing `'all-default'` as the `data_name_dict`! This will load `events`, `probe0`, `probe1`, `frontCam`, `eyeCam` and `sideCam`.

In [8]:
# define which data you need
recordings = load_data(
    subject = 'AV043',
    expDate  = '2024-03-14',
    data_name_dict='all-default')

#### Utility functions
There are numerous utility functions to process both event data (`utils.ev_utils`) and spike data (`utils.spk_utils`). Please browse those functions freely, but be aware that at the moment they may still change.
For example, you can use `format_events` to convert the `ev` structure of the audiovisual task into a `pd.DataFrame`, which is more convenient to process in Python.
For spike data you can use `format_cluster_data`, which parses the anatomical location and the bombcell quality metrics of your units.
(Sorry, it currently emits some warning messages!)



In [13]:
from pinkrigs_tools.utils import ev_utils
from pinkrigs_tools.utils import spk_utils

example_active_session  = recordings[recordings.expDef=='multiSpaceWorld_checker_training'].iloc[0]


ev = example_active_session.events._av_trials
spikes = example_active_session.probe0.spikes
clusters = example_active_session.probe0.clusters


formatted_events = ev_utils.format_events(ev)

formatted_cluster_data = spk_utils.format_cluster_data(clusters) 




#### Neurometric criteria of data selection
(takes a long time!)
Sometimes you only want to call experimental sessions with neural data from a specific brain region, or to ensure that each session you call is from a separate brain region. `load_data` can handle that for you through the arguments that specifically relate to neural data. More broadly, we deal with:
- several probes per recording:
    - you can use the `unwrap_probes` argument to flatten the recordings DataFrame such that each probe is a separate row. In this case the neural data is merged under the `probe` column, and the `probeID` column records which probe each row corresponds to (`probe0` or `probe1` in the ONE folder)
    - you can also use `merge_probes`, which does not create a separate row but just re-IDs the clusters (adding 1000 to the probe1 cluster IDs)
- chronic recordings:
    - `filter_unique_shank_positions`, where we only allow each botrow position to be sampled once
- region selection:
    - `region_selection`, to load experiments only when a minimum number of neurons (e.g. 10) are in a particular brain region, defined by Allen acronyms
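
To make the two probe options concrete, the sketch below mimics them on toy data with plain Python containers. It is an illustration only: the function names and the record layout are made up here, and the actual `load_data` output is a DataFrame whose columns may differ.

```python
PROBE1_OFFSET = 1000  # offset added to probe1 cluster IDs, as described above


def merge_probe_clusters(probe0_ids, probe1_ids):
    """Toy version of merge_probes: one ID list, probe1 IDs shifted by 1000."""
    return list(probe0_ids) + [cid + PROBE1_OFFSET for cid in probe1_ids]


def unwrap_probe_rows(recording):
    """Toy version of unwrap_probes: one output row per probe."""
    return [
        {'expDate': recording['expDate'], 'probeID': probe_id,
         'probe': recording[probe_id]}
        for probe_id in ('probe0', 'probe1')
        if probe_id in recording
    ]


recording = {'expDate': '2024-03-14',
             'probe0': {'cluster_ids': [0, 1, 2]},
             'probe1': {'cluster_ids': [0, 1]}}

merged = merge_probe_clusters(recording['probe0']['cluster_ids'],
                              recording['probe1']['cluster_ids'])
print(merged)  # [0, 1, 2, 1000, 1001]

rows = unwrap_probe_rows(recording)
print([row['probeID'] for row in rows])  # ['probe0', 'probe1']
```

Merging keeps one row per recording and makes cluster IDs unique across probes, whereas unwrapping keeps the original IDs and tells the probes apart by row.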

For example, the code below loads all the data with a minimum of 20 neurons in MRN in `AV030`:

In [5]:
exp_kwargs = {
    'subject': ['AV030'],
    'expDate': 'postImplant',
    'expDef': 'multiSpaceWorld'
    }
recordings = load_data(
    data_name_dict='all-default',
    unwrap_probes=False,
    merge_probes=True,
    filter_unique_shank_positions=False,
    region_selection={
        'region_name': 'MRN',
        'framework': 'Beryl',
        'min_fraction': 20,
        'goodOnly': True,
        'min_spike_num': 300,
    },
    **exp_kwargs
)

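
As a rough picture of what a region-based criterion involves, the sketch below counts the usable units of one session in a given region and compares the count with a threshold. Everything in it is hypothetical: the `passes_region_selection` function, the unit layout and the `min_units` threshold are invented for this example. The exact meaning of `min_fraction` is not spelled out here, so the sketch deliberately does not reuse that name.

```python
def passes_region_selection(units, region_name, min_units,
                            good_only=True, min_spike_num=300):
    """Hypothetical check: does a session have enough usable units in a region?

    `units` is a list of dicts with keys 'region', 'is_good' and 'n_spikes'
    (a made-up layout for this example).
    """
    n_selected = sum(
        1 for unit in units
        if unit['region'] == region_name
        and (unit['is_good'] or not good_only)
        and unit['n_spikes'] >= min_spike_num
    )
    return n_selected >= min_units


units = [
    {'region': 'MRN', 'is_good': True, 'n_spikes': 1200},
    {'region': 'MRN', 'is_good': True, 'n_spikes': 150},   # too few spikes
    {'region': 'MRN', 'is_good': False, 'n_spikes': 900},  # fails quality check
    {'region': 'SCm', 'is_good': True, 'n_spikes': 2000},  # different region
]

print(passes_region_selection(units, 'MRN', min_units=1))  # True
print(passes_region_selection(units, 'MRN', min_units=2))  # False
print(passes_region_selection(units, 'MRN', min_units=2, good_only=False))  # True
```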


#### Call and save out pre-curated datasets 
Oftentimes I use neurometric criteria, but because applying them takes a long time, you will want to compute the set of experiments to analyse once and then load just those experiments. For this, I also wrote a function (`dataset.pre_cured.call_`) that calls predetermined datasets for which I have already set up the selection criteria. With this, you save your selection in your `analysis_folder` and load the summary data with the latest timestamp. You can recompute your selection using the `recompute_data_selection` argument. For example, the code below calls all the data where mice were recorded in the forebrain while doing the audiovisual task.

(not ready) You can also use `extract. ...` to save out the trial data with spiking and movement. 

(not ready) You can also use `extract. ...` to save out the binned time series, which contains binned neural,camera and event data and event triggered toeplitz matrices. 


In [None]:
from pathlib import Path
from pinkrigs_tools.dataset.pre_cured import call_


analysis_folder = Path(r'path_to_analysis_folder')

recordings = call_(
    subject_set='forebrain',
    dataset_type='active',
    spikeToInclde=True,
    camToInclude=False,
    recompute_data_selection=False,
    unwrap_probes=True,
    merge_probes=False,
    region_selection=None,
    filter_unique_shank_positions=True,
    analysis_folder=analysis_folder,
)
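
The "save once, then reload the latest selection" idea described in this section can be pictured with the minimal sketch below. It is not how `call_` is implemented: the `latest_selection_file` helper and the `selection_YYYYMMDD_HHMMSS.csv` naming scheme are assumptions made purely for illustration.

```python
import tempfile
from pathlib import Path


def latest_selection_file(analysis_folder: Path, pattern: str = 'selection_*.csv'):
    """Hypothetical helper: newest saved selection by timestamp in the file name.

    Relies on names like selection_YYYYMMDD_HHMMSS.csv, which sort
    chronologically when sorted as strings. Returns None if nothing is saved.
    """
    candidates = sorted(analysis_folder.glob(pattern), key=lambda p: p.name)
    return candidates[-1] if candidates else None


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    # pretend three selections were saved on different days
    for stamp in ('20240314_101500', '20240320_093000', '20240316_180000'):
        (folder / f'selection_{stamp}.csv').write_text('subject,expDate\n')
    latest_name = latest_selection_file(folder).name
    empty_result = latest_selection_file(folder, pattern='missing_*.csv')

print(latest_name)  # selection_20240320_093000.csv
```

Putting the timestamp in the file name, rather than relying on file modification times, keeps the choice of "latest" stable when files are copied between machines.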