### Step 1: Retrieve homologs of Gloeobacter rhodopsin (GR)

To identify homologs of **Gloeobacter rhodopsin (GR)**, I ran a BLASTp search against the UniRef90 database using the [`scripts/blast_uniref.py`](../scripts/blast_uniref.py) helper script.

- Query: `data/GR.fasta` (accession BAC88139.1)
- Database: UniRef90
- Program: BLASTp
- Date run: 2025-08-22
- Top hits requested: 150 (`--max_hits 150`); the saved file contains 144 sequences
- Output: `data/BAC88139.1_top150_uniref90.fasta`

The search was run once, on 2025-08-22, and its saved output is the input for all downstream analysis.
To avoid querying the EBI servers repeatedly, the command is documented below but not re-executed.
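
One way to make the "run once, then reuse" rule explicit is to guard the call with a check for the saved output. The following is a minimal sketch, not part of the repository: `run_blast_once` is a hypothetical helper, and it assumes that a non-empty output FASTA means the search already completed.

```python
from pathlib import Path
import subprocess
import sys


def run_blast_once(out_fasta, command, force=False):
    """Run `command` only if `out_fasta` is missing or empty.

    Hypothetical helper (not in scripts/). Returns True if the command
    was executed and False if cached results were reused.
    """
    out_path = Path(out_fasta)
    if not force and out_path.exists() and out_path.stat().st_size > 0:
        print(f"Found cached results at {out_path}; skipping BLAST.")
        return False
    # check=True raises CalledProcessError if the script exits with an error
    subprocess.run(command, check=True)
    return True


# Example call, mirroring the command documented below (left commented out
# so that it is not sent to the EBI servers by accident):
# run_blast_once(
#     "../data/BAC88139.1_top150_uniref90.fasta",
#     [sys.executable, "../scripts/blast_uniref.py", "../data/GR.fasta",
#      "oakley@ucsb.edu", "--max_hits", "150",
#      "--out", "../data/BAC88139.1_top150_uniref90.fasta",
#      "--log", "../logs/blast_runs.log"],
# )
```

With a guard like this the cell could stay executable: re-running the notebook would only print the "cached" message unless `force=True` is passed.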

In [15]:
import sys, os
from Bio import SeqIO

# Add scripts/ to path so we can import utils
sys.path.append(os.path.abspath(os.path.join("..", "scripts")))
from utils import get_path


In [16]:
# Command used to generate results on 2025-08-22:
# (not executed here to avoid re-running BLAST queries against EBI servers)
# !python ../scripts/blast_uniref.py ../data/GR.fasta oakley@ucsb.edu --max_hits 150 --out ../data/BAC88139.1_top150_uniref90.fasta --log ../logs/blast_runs.log


In [25]:
# Load the saved BLAST results
fasta_file = get_path("data", "BAC88139.1_top150_uniref90.fasta")
print(f"Loading: {fasta_file}")

records = list(SeqIO.parse(fasta_file, "fasta"))
print(f"Loaded {len(records)} sequences")

# Preview a few IDs
for record in records[:5]:
    print(record.id)


Loading: /home/likewise-open/ADS/oakley/labdata/users/Oakley/GitHub/cyano_rhodopsins/data/BAC88139.1_top150_uniref90.fasta
Loaded 144 sequences
UniRef90_Q7NP59
UniRef90_A0A969T0G4
UniRef90_A0A2W7ARY7
UniRef90_A0A969FEC7
UniRef90_A0A925M2S9
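
The file holds 144 sequences although 150 hits were requested. The output above does not say why, so a quick check of the record IDs is a reasonable first step. The sketch below assumes only a list of ID strings; `summarize_ids` is a hypothetical helper, and the `requested=150` default is taken from the `--max_hits 150` flag.

```python
from collections import Counter


def summarize_ids(ids, requested=150):
    """Report record count, unique IDs, shortfall vs. `requested`, and duplicates.

    Hypothetical helper for a quick sanity check; it does not explain
    why fewer records than requested were saved.
    """
    counts = Counter(ids)
    duplicates = sorted(seq_id for seq_id, n in counts.items() if n > 1)
    return {
        "n_records": len(ids),
        "n_unique": len(counts),
        "shortfall": requested - len(ids),
        "duplicates": duplicates,
    }


# In the notebook, this could be applied to the loaded records:
# summary = summarize_ids([record.id for record in records])
# print(summary)
```

A positive `shortfall` with no duplicates would suggest the difference arose upstream (in the search or in sequence retrieval) rather than from repeated entries in the saved file.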
