### Project checklist

- [ ] Title
- [ ] Abstract (max 300 words)
- [ ] env.yml (include both full and cross-platform)
  - _If time..._
    - [ ] Set up container environment to run Notebook (Binder?)
- [ ] Package motivations
- [ ] Include rich text (equations/tables/links/images/videos)
- **I/O**
  - [ ] Use `pandas` to read large data _or_ `numpy` to load from files
  - [ ] Save processed/generated data to disk with `pandas`
- **DATA MANIPULATION**
  - [ ] Include numerical operations (`numpy`, `scipy`, `pandas`) or data transformation (`pandas`)
- **VISUALIZATION**
  - [ ] At least one composite plot (multi-panel or inset)
  - "[Publication ready figures](https://pubs.acs.org/doi/10.1021/jz500997e)"
    - [ ] The figures are 89 mm wide (single column) or 183 mm wide (double column)
    - [ ] The axes are labeled
    - [ ] The font sizes are sufficiently large
    - [ ] The figures are saved as ~~rasterized images (300 dpi) or~~ **vector art**
- [ ] [Repo Zenodo DOI](https://docs.github.com/en/repositories/archiving-a-github-repository/referencing-and-citing-content)
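
The figure widths in the checklist are given in millimetres, whereas Matplotlib's `figsize` is specified in inches. A minimal sketch of the conversion follows; the helper names `mm_to_inches` and `figsize_for`, and the 0.75 height ratio, are hypothetical choices rather than anything the checklist prescribes.

```python
MM_PER_INCH = 25.4  # exact by definition

# Column widths from the checklist, in millimetres
SINGLE_COLUMN_MM = 89
DOUBLE_COLUMN_MM = 183


def mm_to_inches(mm):
    """Convert a length in millimetres to inches."""
    return mm / MM_PER_INCH


def figsize_for(width_mm, height_ratio=0.75):
    """Return (width, height) in inches for a figure of the given width.

    height_ratio is an illustrative default (height = ratio * width).
    """
    width_in = mm_to_inches(width_mm)
    return (width_in, width_in * height_ratio)


single = figsize_for(SINGLE_COLUMN_MM)
double = figsize_for(DOUBLE_COLUMN_MM)
print(f"single column: {single[0]:.2f} x {single[1]:.2f} in")
print(f"double column: {double[0]:.2f} x {double[1]:.2f} in")
```

The resulting tuple can be passed as `figsize` to `matplotlib.pyplot.subplots`, and saving through `Figure.savefig` with a `.pdf` or `.svg` file name produces vector output.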

# Project title

## Abstract

Abstract text...

## Index

- [Notebook instructions](#notebook-instructions)
- [Packages](#packages)
  - [Package](#package)

## Notebook instructions

_Information on how to use/run the notebook_.

## Packages

### Package

_Reason for inclusion_.

In [74]:
import sqlite3 as sql
from pathlib import Path

import pandas as pd

In [66]:
base_dir = Path('data').resolve()

sample_dirs = {v.name: [*v.resolve().iterdir()] for v in base_dir.iterdir()}
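
`Path.iterdir()` yields entries in arbitrary order and also returns plain files, so indexing into these lists (for example with `[0]`) may not be reproducible between machines, and a stray file inside `data` would raise an error. A minimal sketch of a more defensive variant is shown here against a throwaway directory tree. The function name `map_sample_dirs` and the folder names (`sample_a`, `run_1`, ...) are hypothetical stand-ins for the real `data` layout, and keeping only sub-directories is an assumption about what the lists should contain.

```python
import tempfile
from pathlib import Path


def map_sample_dirs(base_dir):
    """Map each sample directory name to a sorted list of its sub-directories."""
    return {
        sample.name: sorted(p for p in sample.iterdir() if p.is_dir())
        for sample in sorted(base_dir.iterdir())
        if sample.is_dir()  # skip stray files such as notes or .DS_Store
    }


# Build a small hypothetical tree to demonstrate the behaviour
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp).resolve()
    layout = {"sample_a": ["run_2", "run_1"], "sample_b": ["run_1"]}
    for sample, runs in layout.items():
        for run in runs:
            (base / sample / run).mkdir(parents=True)
    (base / "notes.txt").write_text("not a sample directory")

    demo = map_sample_dirs(base)
    demo_names = {k: [p.name for p in v] for k, v in demo.items()}
    print(demo_names)
```

Sorting makes positional indexing deterministic, at the cost of no longer matching whatever order the file system happened to return.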

In [90]:
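con = sql.connect(sample_dirs['m1-b_rep'][0] / 'ms2_results.sql')
df = pd.read_sql('SELECT * FROM MS2data', con)
con.close()

# Read the identifiers listed in top_xls.txt, one per line
d = sample_dirs['m1-b_rep'][0] / 'top_xls.txt'
top_xls = d.read_text().splitlines()

# Keep only the rows whose XL value appears in top_xls
df.query('XL in @top_xls')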

| Unnamed: 0 | XL | mgf_file | spectrum_id | spectrum_num | delta | pre_charge | H_L | fragSc | coverage | covered_Frags | covered_Mz | covered_int | main_Mz | main_int | count |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | -.ATALEKELEEK(6)--QSNNKYMASSYLTLTAR(5).- | /srv/data1/home/jo0348st/projects/2023-heusel_... | controllerType=0 controllerNumber=1 scan=44913 | 242 | 0.01 | 4 | Heavy | 30 | 3.529412 | AT+,ATA+,ATAL+,ATALE+,ELEEK+,LEEK+,EEK+,Q+,QS+... | 173.09206871989,244.1291825046,357.21324648173... | 267146.25,171657.25,78572.367,55976.23,287251.... | 100.07611,101.07134,101.10809,102.05514,105.06... | 128613.781,242081.547,26213.936,38480.988,3794... | 15 |
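
The cell above loads the whole `MS2data` table and then filters it in pandas. When the table is large, the same selection can be pushed into SQLite with a parameterised `IN` clause. A minimal sketch using only the standard library and an in-memory database follows; the two-column schema and the `xl_a`/`xl_b`/`xl_c` values are hypothetical placeholders, not the real table contents.

```python
import sqlite3

# Hypothetical stand-in for ms2_results.sql: an in-memory table with two columns
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE MS2data (XL TEXT, fragSc INTEGER)")
con.executemany(
    "INSERT INTO MS2data VALUES (?, ?)",
    [("xl_a", 30), ("xl_b", 12), ("xl_c", 25)],
)

# Identifiers that would normally be read from top_xls.txt, one per line
top_xls = ["xl_a", "xl_c"]

# One "?" placeholder per identifier keeps the query parameterised
placeholders = ", ".join("?" for _ in top_xls)
query = f"SELECT XL, fragSc FROM MS2data WHERE XL IN ({placeholders}) ORDER BY XL"

rows = con.execute(query, top_xls).fetchall()
con.close()
print(rows)
```

The same `query` string and parameter list can be handed to `pd.read_sql(query, con, params=top_xls)` to obtain a DataFrame directly, so only the matching rows are loaded into memory.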


In [93]:
xl_dict = {}

for k, v in sample_dirs.items():
    for path in v:
        print(k, path)

m1-c_rep /home/jstrobaek/Projects/2023-compute_jupyter_course/data/m1-c_rep/LUIGSeq_m11_ctrl-tail03_r0
m1-c_rep /home/jstrobaek/Projects/2023-compute_jupyter_course/data/m1-c_rep/LUIGSeq_m11_top04_r0
m1-c_rep /home/jstrobaek/Projects/2023-compute_jupyter_course/data/m1-c_rep/LUIGSeq_m11_ctrl-tail02_r0
m1-c_rep /home/jstrobaek/Projects/2023-compute_jupyter_course/data/m1-c_rep/LUIGSeq_m11_ctrl-tail01_r0
m1-c_rep /home/jstrobaek/Projects/2023-compute_jupyter_course/data/m1-c_rep/LUIGSeq_m11_top05_r0
m1-c_rep /home/jstrobaek/Projects/2023-compute_jupyter_course/data/m1-c_rep/LUIGSeq_m11_ctrl-igs01_r0
m1-c_rep /home/jstrobaek/Projects/2023-compute_jupyter_course/data/m1-c_rep/LUIGSeq_m11_top01_r0
m1-c_rep /home/jstrobaek/Projects/2023-compute_jupyter_course/data/m1-c_rep/LUIGSeq_m11_ctrl-pls01_r0
m1-c_rep /home/jstrobaek/Projects/2023-compute_jupyter_course/data/m1-c_rep/LUIGSeq_m11_top02_r0
m1-c_rep /home/jstrobaek/Projects/2023-compute_jupyter_course/data/m1-c_rep/LUIGSeq_m11_top03_r0
m1