# Fruit Tree kraken-biom

## Install the Jupyter 'h5py' custom kernel in an Interactive Terminal Session

- Python Modules https://hpcdocs.static.arizona.edu/software/popular_software/python
- Open On Demand-Jupyter https://uarizona.atlassian.net/wiki/x/bIaHB
- Using Python & Python Packages https://uarizona.atlassian.net/wiki/x/N4eHB
- HPC Containers https://uarizona.atlassian.net/wiki/x/QoaHB

The command `python3 -m venv --system-site-packages /path/to/virtual/env` creates a new virtual environment for managing Python packages.

- `python3`: Runs the Python 3 interpreter made available by `module load python`.
- `-m venv`: Runs the standard-library `venv` module, which creates lightweight "virtual environments". Each has its own site-packages directory and can optionally be isolated from the system site directories.
- `--system-site-packages`: Gives the virtual environment access to packages installed in the system (module) Python, in addition to the packages installed in the environment itself. By default, a virtual environment is isolated and cannot see system-wide packages.
- `/path/to/virtual/env`: Directory where the virtual environment is created.
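For reference, the same kind of environment can be created from Python with the standard-library `venv.create` function, where `system_site_packages=True` corresponds to the `--system-site-packages` flag. This is a minimal sketch: the helper names are hypothetical, the target directory is a throwaway example rather than the `~/virtual/h5py_env/` path used in this notebook, and `with_pip=False` is set only to keep the example fast.

```python
import tempfile
import venv
from pathlib import Path


def create_env(env_dir: Path, system_site_packages: bool = True) -> Path:
    """Create a virtual environment and return the path to its pyvenv.cfg."""
    # Roughly equivalent to: python3 -m venv --system-site-packages <env_dir>
    # symlinks=True mirrors the command-line default on Linux;
    # with_pip=False skips bootstrapping pip so the sketch runs quickly.
    venv.create(env_dir, system_site_packages=system_site_packages,
                symlinks=True, with_pip=False)
    return env_dir / "pyvenv.cfg"


def reads_system_site_packages(cfg_path: Path) -> bool:
    """Return True if pyvenv.cfg enables access to system site-packages."""
    for line in cfg_path.read_text().splitlines():
        key, _, value = line.partition("=")
        if key.strip() == "include-system-site-packages":
            return value.strip().lower() == "true"
    return False


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cfg = create_env(Path(tmp) / "demo_env")
        print(reads_system_site_packages(cfg))
```

The `pyvenv.cfg` file written into the environment is where the `--system-site-packages` choice is recorded, so inspecting it is a quick way to confirm how an existing environment was created.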

### In the interactive terminal:

In [None]:
module load python

### Create 'h5py' virtual environment

In [None]:
python3 -m venv --system-site-packages ~/virtual/h5py_env/ 

### Activate the 'h5py_env' environment

In [None]:
source ~/virtual/h5py_env/bin/activate

### Update package installer for Python

In [None]:
pip install --upgrade pip

### Install the `h5py` package in this virtual environment:

In [None]:
pip install h5py

### Install the `jupyter` package in the virtual environment

In [None]:
pip install jupyter --force-reinstall

### Register the 'h5py_env' virtual environment as a Jupyter kernel

In [None]:
ipython kernel install --name h5py_env --user

### (Optional) Create a second kernel, 'h5py_kernel', with a custom display name

In [None]:
python -m ipykernel install --user --name=h5py_kernel --display-name="Python (h5py)"
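The two commands above register two separate kernel specifications (`h5py_env` and `h5py_kernel`), each stored as a folder containing a `kernel.json`. A minimal sketch for listing what is registered, assuming the default Linux user location `~/.local/share/jupyter/kernels` (the same directory used below); the helper name `list_kernels` is hypothetical. From the terminal, `jupyter kernelspec list` reports similar information.

```python
import json
from pathlib import Path

# Assumed default user-level kernel directory on Linux.
DEFAULT_KERNEL_DIR = Path.home() / ".local" / "share" / "jupyter" / "kernels"


def list_kernels(kernel_dir: Path = DEFAULT_KERNEL_DIR) -> dict:
    """Map each kernel folder name to the display_name in its kernel.json."""
    kernels = {}
    if not kernel_dir.is_dir():
        return kernels
    for spec_file in sorted(kernel_dir.glob("*/kernel.json")):
        spec = json.loads(spec_file.read_text())
        kernels[spec_file.parent.name] = spec.get("display_name", "")
    return kernels


if __name__ == "__main__":
    for name, display in list_kernels().items():
        print(f"{name}: {display}")
```

The folder name is what `kernel.json` paths are built from, while `display_name` is what appears in the 'Change kernel' menu, so the two can differ.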

### (Optional) Install any other packages needed in this virtual environment

In [None]:
pip install <package_name>

### Open the hidden kernel configuration file `kernel.json`, which sets up the virtual environment at runtime

In [None]:
nano $HOME/.local/share/jupyter/kernels/h5py_env/kernel.json  # any text editor works

In [None]:
# - Make a note of (1) the path `</path/to/your/environment>/bin/python` and (2) the <kernel_name> to use in the edited file.
# - Replace `<your_modules_here>` with the modules you would like to load.
# - Replace the contents of 'kernel.json' with the following (these comment lines are instructions, not part of the JSON):
{
 "argv": [
 "bash",
 "-c",
 "module load <your_modules_here> ; </path/to/your/environment>/bin/python -m ipykernel_launcher -f {connection_file}"
 ],
 "display_name": "<kernel_name>",
 "language": "python",
 "metadata": {
 "debugger": true
 }
}
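Instead of editing the file by hand, the same template can be written with Python's `json` module, which guarantees the result is valid JSON. A minimal sketch; the function name `write_kernel_json` is hypothetical, and its arguments stand in for the placeholders in the template above.

```python
import json
from pathlib import Path


def write_kernel_json(kernel_dir: Path, modules: str, env_python: str,
                      display_name: str) -> Path:
    """Write a kernel.json that loads HPC modules before starting the kernel."""
    # {connection_file} must stay literal: Jupyter substitutes it at launch,
    # so it is appended as a plain string rather than inside the f-string.
    launch = (f"module load {modules} ; {env_python} -m ipykernel_launcher "
              + "-f {connection_file}")
    spec = {
        "argv": ["bash", "-c", launch],
        "display_name": display_name,
        "language": "python",
        "metadata": {"debugger": True},
    }
    kernel_dir.mkdir(parents=True, exist_ok=True)
    path = kernel_dir / "kernel.json"
    path.write_text(json.dumps(spec, indent=1) + "\n")
    return path
```

For example, `write_kernel_json(Path.home() / ".local/share/jupyter/kernels/h5py_env", "python", "~/virtual/h5py_env/bin/python3", "h5py_env")` would produce the example configuration for this notebook, overwriting the existing `kernel.json`.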

In [None]:
# Example kernel.json for the 'h5py_env' environment
{
 "argv": [
 "bash",
 "-c",
 "module load python ; ~/virtual/h5py_env/bin/python3 -m ipykernel_launcher -f {connection_file}"
 ],
 "display_name": "h5py_env",
 "language": "python",
 "metadata": {
 "debugger": true
 }
}
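A missing bracket or brace in `kernel.json` prevents the kernel from being listed or started, so it can help to check the edited file before restarting Jupyter. A minimal sketch using only the standard library; the helper name `check_kernel_json` and the specific checks are assumptions about what is worth verifying, not Jupyter's full list of requirements.

```python
import json
from pathlib import Path

# Keys this sketch treats as required (an assumption, not Jupyter's full schema).
REQUIRED_KEYS = ("argv", "display_name", "language")


def check_kernel_json(path) -> list:
    """Return a list of problems found in a kernel.json file (empty if none)."""
    try:
        spec = json.loads(Path(path).read_text())
    except json.JSONDecodeError as err:
        return [f"invalid JSON: {err}"]
    if not isinstance(spec, dict):
        return ["top level must be a JSON object"]
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in spec]
    argv = spec.get("argv", [])
    # Jupyter replaces {connection_file} at launch; without it the kernel cannot connect.
    if not isinstance(argv, list) or not any("{connection_file}" in str(a) for a in argv):
        problems.append("argv does not pass {connection_file} to the kernel")
    return problems


if __name__ == "__main__":
    target = Path.home() / ".local/share/jupyter/kernels/h5py_env/kernel.json"
    if target.exists():
        print(check_kernel_json(target) or "kernel.json looks OK")
```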

### Save the `kernel.json` file and restart the Jupyter notebook session.

'Kernel' menu > 'Change kernel' > 'h5py_env'
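After switching kernels, a quick check in a notebook cell can confirm that the session is running the virtual environment's interpreter and can import `h5py`. A minimal sketch relying on the fact that `sys.prefix` differs from `sys.base_prefix` inside a `venv`; the function name `kernel_report` is hypothetical. In the 'h5py_env' kernel, both flags should be `True`.

```python
import importlib.util
import sys


def kernel_report() -> dict:
    """Summarize which interpreter this kernel runs and whether h5py is importable."""
    return {
        "executable": sys.executable,
        # Inside a venv, sys.prefix points at the env; sys.base_prefix at the base install.
        "in_virtualenv": sys.prefix != sys.base_prefix,
        # find_spec returns None when the package cannot be imported.
        "h5py_available": importlib.util.find_spec("h5py") is not None,
    }


print(kernel_report())
```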

## Run kraken-biom

- kraken-biom github https://github.com/smdabdoub/kraken-biom
- kraken-biom biocontainers https://biocontainers.pro/tools/kraken-biom
- Pulling Docker Containers https://hpcdocs.static.arizona.edu/software/containers/pulling_containers/#pulling-docker-containers

Options used:

- `--kraken_reports_fp`: Path to the input directory of Kraken2 report files. ENSURE THAT THIS DIRECTORY ONLY CONTAINS 'reports.txt' FILES.*
- `--max` and `--min`: Set the broadest and the most specific taxonomic ranks for which counts are recorded. Reads assigned above the `--max` rank are not recorded, and reads assigned below the `--min` rank are counted at the `--min` rank. By default, `--max` is Order (`O`) and `--min` is Species (`S`). For this project we keep the defaults (`--max O --min S`), which suit community analyses that need species-level detail alongside higher ranks, while collapsing sub-species assignments that would add granularity without adding useful information.
- `--output_fp`: Specifies the output file path for the BIOM-format table.
- `--fmt hdf5`: Writes the table in the HDF5-based BIOM 2.x format, which is compact and efficient for large datasets. Phyloseq can import BIOM files through the R package `biomformat`, which supports both BIOM 1.0 (JSON) and BIOM 2.x (HDF5).
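To make the `--max O --min S` bounds concrete, the sketch below lists which of the rank codes kraken-biom accepts (D, P, C, O, F, G, S, from Domain to Species) fall inside the recorded range. It is a simplified illustration of the bounds only, not kraken-biom's code; the function name `recorded_ranks` is hypothetical, and the second call uses a hypothetical setting not used in this project.

```python
# Rank codes in order from broadest (Domain) to most specific (Species).
RANKS = ["D", "P", "C", "O", "F", "G", "S"]


def recorded_ranks(max_rank: str = "O", min_rank: str = "S") -> list:
    """Return the rank codes between --max and --min, inclusive."""
    hi, lo = RANKS.index(max_rank), RANKS.index(min_rank)
    if hi > lo:
        raise ValueError("--max must be a broader rank than (or equal to) --min")
    return RANKS[hi:lo + 1]


print(recorded_ranks())          # ['O', 'F', 'G', 'S']  (defaults used here)
print(recorded_ranks("P", "G"))  # ['P', 'C', 'O', 'F', 'G']
```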

#### Navigate to working directory

In [18]:
cd /xdisk/kcooper/caparicio/tree-fruit

/xdisk/kcooper/caparicio/tree-fruit


### *Move 'output.txt' files out of the Kraken2 reports directory

Create '04a_reads_kraken2_output' directory

In [19]:
import os # Library to interact with operating system
os.makedirs('04a_reads_kraken2_output', exist_ok=True)

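Add execute permissions to the files before moving them (strictly, moving a file requires write and execute permission on the source and target directories, not on the file itself)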

In [22]:
import stat  # Constants and functions for interpreting the results of os.stat()

dir_path = "/xdisk/kcooper/caparicio/tree-fruit/04a_reads_kraken2"

# Ensure the directory path is absolute or adjust as necessary
for filename in os.listdir(dir_path):
    file_path = os.path.join(dir_path, filename)
    if os.path.isfile(file_path):  # Check if it's a file
        # Add execute permissions for owner, group, and others
        os.chmod(file_path, os.stat(file_path).st_mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
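To see what the loop above changes, `stat.filemode` renders a mode as an `ls -l` style string, and `os.access` checks the directory permissions that actually govern moving files. A minimal sketch on a temporary file; the helper names `add_execute_bits` and `can_move_out_of` are hypothetical.

```python
import os
import stat
import tempfile


def add_execute_bits(path: str) -> str:
    """Add user/group/other execute bits (as in the loop above); return the new mode string."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return stat.filemode(os.stat(path).st_mode)


def can_move_out_of(directory: str) -> bool:
    """Moving files out of a directory needs write and execute permission on it."""
    return os.access(directory, os.W_OK | os.X_OK)


with tempfile.TemporaryDirectory() as tmp:
    demo = os.path.join(tmp, "k2_demo_output.txt")
    open(demo, "w").close()
    os.chmod(demo, 0o644)
    print(add_execute_bits(demo))  # -rwxr-xr-x
    print(can_move_out_of(tmp))    # True for a directory you own
```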

Move 'output.txt' files from '/04a_reads_kraken2' to '/04a_reads_kraken2_output'

In [25]:
import shutil  # Shell utilities module

source_dir = "/xdisk/kcooper/caparicio/tree-fruit/04a_reads_kraken2"
target_dir = "/xdisk/kcooper/caparicio/tree-fruit/04a_reads_kraken2_output"

# Iterate through files in the source directory
for filename in os.listdir(source_dir):
    if filename.endswith('output.txt'):  # Check if the file ends with 'output.txt'
        source_path = os.path.join(source_dir, filename)
        target_path = os.path.join(target_dir, filename)
        
        # Move the file
        shutil.move(source_path, target_path)
        print(f"Moved: {filename} to {target_dir}")

Moved: k2_oranges415_output.txt to /xdisk/kcooper/caparicio/tree-fruit/04a_reads_kraken2_output
Moved: k2_oranges368_output.txt to /xdisk/kcooper/caparicio/tree-fruit/04a_reads_kraken2_output
Moved: k2_oranges412_output.txt to /xdisk/kcooper/caparicio/tree-fruit/04a_reads_kraken2_output
Moved: k2_apples340_output.txt to /xdisk/kcooper/caparicio/tree-fruit/04a_reads_kraken2_output
Moved: k2_oranges363_output.txt to /xdisk/kcooper/caparicio/tree-fruit/04a_reads_kraken2_output
Moved: k2_peaches289_output.txt to /xdisk/kcooper/caparicio/tree-fruit/04a_reads_kraken2_output
Moved: k2_peaches202_output.txt to /xdisk/kcooper/caparicio/tree-fruit/04a_reads_kraken2_output
Moved: k2_oranges378_output.txt to /xdisk/kcooper/caparicio/tree-fruit/04a_reads_kraken2_output
Moved: k2_oranges407_output.txt to /xdisk/kcooper/caparicio/tree-fruit/04a_reads_kraken2_output
Moved: k2_apples335_output.txt to /xdisk/kcooper/caparicio/tree-fruit/04a_reads_kraken2_output
Moved: k2_oranges422_output.txt to /xdisk/
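Because the options list above requires the reports directory to contain only report files, it is worth confirming that nothing else remains after the move. A minimal sketch; it assumes the report files share a common filename suffix, and the default `report.txt` is an assumption, so set `suffix` to whatever your Kraken2 report files actually end with. The helper name `unexpected_files` is hypothetical.

```python
import os


def unexpected_files(report_dir: str, suffix: str = "report.txt") -> list:
    """Return names of regular files in report_dir that do not end with suffix."""
    return sorted(
        name for name in os.listdir(report_dir)
        if os.path.isfile(os.path.join(report_dir, name))
        and not name.endswith(suffix)
    )


reports_dir = "/xdisk/kcooper/caparicio/tree-fruit/04a_reads_kraken2"
if os.path.isdir(reports_dir):
    leftovers = unexpected_files(reports_dir)
    print("Only report files remain." if not leftovers else f"Move or remove: {leftovers}")
```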

### Pull the kraken-biom container image (version 1.2.0)

In [4]:
%%bash
apptainer pull --name kraken-biom.sif docker://quay.io/biocontainers/kraken-biom:1.2.0--pyh5e36f6f_0

INFO:    Converting OCI blobs to SIF format
INFO:    Starting build...
Getting image source signatures
Copying blob sha256:581f44a8dfce1bf79f4da15b1397143379bf83612ad2a68f417a2e8e9f9bbcdd
Copying blob sha256:c1a16a04cedd950c541fa85e64b62b17eb3b73a7f7e29ea3db23dc9b83dfcade
Copying blob sha256:4ca545ee6d5db5c1170386eeb39b2ffe3bd46e5d4a73a9acbebc805f19607eb3
Copying config sha256:f2b8ab953b50f8d8932f214ab7c5cd9208cc8e2a1365bfd0556de6a2cb7a51f8
Writing manifest to image destination
2024/05/08 11:28:52  info unpack layer: sha256:c1a16a04cedd950c541fa85e64b62b17eb3b73a7f7e29ea3db23dc9b83dfcade
2024/05/08 11:28:52  info unpack layer: sha256:4ca545ee6d5db5c1170386eeb39b2ffe3bd46e5d4a73a9acbebc805f19607eb3
2024/05/08 11:28:52  info unpack layer: sha256:581f44a8dfce1bf79f4da15b1397143379bf83612ad2a68f417a2e8e9f9bbcdd
INFO:    Creating SIF file...


### Run kraken-biom using the pulled container

In [27]:
%%bash
apptainer exec kraken-biom.sif kraken-biom \
  --kraken_reports_fp 04a_reads_kraken2/ \
  --output_fp 05_kraken-biom/tree-fruit_reads.biom \
  --max O --min S --fmt hdf5

# Check if the file was created and echo a confirmation
if [ -f "05_kraken-biom/tree-fruit_reads.biom" ]; then
    echo "BIOM file was successfully produced: 05_kraken-biom/tree-fruit_reads.biom"
else
    echo "Failed to produce BIOM file."
fi  

BIOM file was successfully produced: 05_kraken-biom/tree-fruit_reads.biom
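The shell check above only confirms that a file exists. Because `--fmt hdf5` was requested, the output should also be an HDF5 file rather than JSON or TSV text. A minimal standard-library check, based on the fact that HDF5 files normally begin with the 8-byte signature `\x89HDF\r\n\x1a\n`; files with a user block place the signature at a later offset, which this sketch does not handle. The helper name `looks_like_hdf5` is hypothetical.

```python
import os

# The 8-byte HDF5 format signature.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def looks_like_hdf5(path: str) -> bool:
    """Return True if the file starts with the HDF5 format signature."""
    with open(path, "rb") as handle:
        return handle.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE


# Relative to the working directory set at the top of this section.
biom_path = "05_kraken-biom/tree-fruit_reads.biom"
if os.path.exists(biom_path):
    print("HDF5 BIOM file" if looks_like_hdf5(biom_path) else "Not HDF5: check the --fmt option")
```

A file that passes this check can then be opened with `h5py` from the 'h5py_env' kernel or imported into R with `biomformat`.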
