[Pocket is shutting down](https://support.mozilla.org/en-US/kb/future-of-pocket)

I don't trust that any of the existing alternatives will be around forever, so I guess I have to build it myself.

# Requirements
So here's what I want:

## Goals
- A browser extension that I can trigger on any link
- An Android app that I can share links to
- The ability to tag links
- The ability to search stored links across both tags and text
- Keep costs low. We can use AWS or something, but I want to stay within the free tier as much as possible
- Some sort of monitoring

## Nice-to-haves
- Store the full article text
- Ability to pierce paywalls somewhat
- Use the archive link if the full link has a paywall
- AI-enabled search via embeddings
- CI/CD

## Non-goals
- Dealing with the app store, I'll side-load the app if need be
- Multi-tenancy, I'm ok with just a single account for the entire instance
- Scale


# Setup

## Project Setup
To set up the development environment:

This project uses [uv](https://docs.astral.sh/uv/getting-started/installation/) to manage all things Python, including notebooks, largely so that tests and notebooks can be run locally. Start the Jupyter server from this folder.

## AWS Account Configuration

### SSO
Optional but recommended: follow the steps in [this Jupyter notebook](./aws-configuration.ipynb) to set it up.

# Architecture

## V0

### Brainstorming
Here's how this is going to work:

When triggered, a Chrome plugin will show an optional popup to prefill tags, then send the tags, the URL, and the full content of the current web page as HTML or text (and maybe some images) to our "index-document" web service.
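As a rough sketch of what the service might accept from the plugin, the request body could be parsed and validated like this. The field names (`url`, `content`, `content_type`, `tags`) and the tag normalization are assumptions for illustration, not a settled contract.

```python
import json
from dataclasses import dataclass, field


@dataclass
class IndexDocumentRequest:
    """Hypothetical request body sent by the browser plugin."""

    url: str
    content: str
    content_type: str = "text/html"  # assumed values: "text/html" or "text/plain"
    tags: list[str] = field(default_factory=list)


def parse_request(body: str) -> IndexDocumentRequest:
    """Parse a JSON body and raise ValueError if it is not usable."""
    data = json.loads(body)
    if not isinstance(data, dict):
        raise ValueError("body must be a JSON object")

    url = data.get("url")
    if not isinstance(url, str) or not url.startswith(("http://", "https://")):
        raise ValueError("url must be an http(s) URL")

    content = data.get("content")
    if not isinstance(content, str) or not content:
        raise ValueError("content must be a non-empty string")

    tags = data.get("tags", [])
    if not isinstance(tags, list) or not all(isinstance(t, str) for t in tags):
        raise ValueError("tags must be a list of strings")

    # Trim and lowercase tags, drop empties, and de-duplicate while keeping order.
    cleaned = list(dict.fromkeys(t.strip().lower() for t in tags if t.strip()))

    return IndexDocumentRequest(
        url=url,
        content=content,
        content_type=data.get("content_type", "text/html"),
        tags=cleaned,
    )
```

Images are left out here since it is still undecided whether they get sent at all.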

The service is implemented as an API Gateway REST endpoint that triggers an AWS Lambda function. The Lambda will be named `just-my-links-index-document`, implemented in `python3.13`, and deployed as a container image stored in an ECR registry.

The index-document service will use AWS Lambda Powertools for its good logging defaults, typing, REST API handling structures, and so on.

Logs need to be written to S3 for backup. There should be an alert that emails me if that bucket grows too large.

The service receives the sent documents and uses the `chromadb` Python package to add them to the database. The database will be stored in S3 and downloaded to Lambda's /tmp directory at startup, then uploaded back to S3 after processing.
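One possible shape for the download/process/upload cycle is a context manager, sketched below. It assumes the database directory is kept in S3 as a single zip archive (a packaging choice this README does not specify) and that `s3` is a boto3 S3 client or anything else exposing `download_file(bucket, key, filename)` and `upload_file(filename, bucket, key)`. The function name and paths are hypothetical, and a first run with no archive in the bucket is not handled.

```python
import contextlib
import os
import shutil


@contextlib.contextmanager
def synced_db_dir(s3, bucket, key, workdir="/tmp"):
    """Download and unpack the database, yield its directory, then re-upload it.

    The upload only happens if the body of the `with` block finishes
    without raising, so a failed run does not overwrite the stored copy.
    """
    db_dir = os.path.join(workdir, "chromadb")
    archive_base = os.path.join(workdir, "chromadb-archive")
    archive = archive_base + ".zip"

    # A warm Lambda container may still hold data from a previous invocation.
    shutil.rmtree(db_dir, ignore_errors=True)
    os.makedirs(db_dir, exist_ok=True)

    s3.download_file(bucket, key, archive)
    shutil.unpack_archive(archive, db_dir)

    yield db_dir

    # Re-pack the (possibly modified) directory and push it back to S3.
    shutil.make_archive(archive_base, "zip", root_dir=db_dir)
    s3.upload_file(archive, bucket, key)
```

Inside the `with` block, the yielded directory would be the path handed to `chromadb.PersistentClient(path=...)`.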

Authentication will simply check a bearer token header against a value stored in AWS Secrets Manager.
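A minimal sketch of that check, assuming the token arrives in a standard `Authorization: Bearer <token>` header and that the expected value has already been fetched from Secrets Manager. The function name is hypothetical.

```python
import hmac


def is_authorized(headers: dict[str, str], expected_token: str) -> bool:
    """Return True only if the Authorization header carries the expected bearer token."""
    if not expected_token:
        return False  # never accept requests if the secret failed to load

    # HTTP header names are case-insensitive, so look the header up loosely.
    auth = next((v for k, v in headers.items() if k.lower() == "authorization"), "")
    scheme, _, token = auth.partition(" ")
    if scheme.lower() != "bearer" or not token:
        return False

    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(token.strip().encode(), expected_token.encode())


# The expected token could be loaded once per container, for example with
# boto3.client("secretsmanager").get_secret_value(SecretId=...)["SecretString"];
# the secret's name is not decided in this README.
```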

Only one of these Lambdas should run at a time.

There should also be an EventBridge event bus named `just-my-links`. When the Lambda successfully finishes adding a document, it should publish an event with a `type` property of "Document stored".
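To make the event concrete, here is a sketch of an entry that could be passed to the EventBridge `put_events` call, assuming `just-my-links` names an event bus. The `Source` value, the extra `url` and `tags` fields in the detail, and the function name are assumptions. Only the `type` property comes from the notes above.

```python
import json


def build_document_stored_entry(url: str, tags: list[str]) -> dict:
    """Build one entry for EventBridge put_events describing a stored document."""
    return {
        "EventBusName": "just-my-links",
        "Source": "just-my-links.index-document",  # assumed source name
        "DetailType": "Document stored",
        # Detail must be a JSON string; `type` is the property called for above.
        "Detail": json.dumps({"type": "Document stored", "url": url, "tags": tags}),
    }


# Publishing would then look roughly like:
#   boto3.client("events").put_events(Entries=[build_document_stored_entry(url, tags)])
```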

Then create a command-line tool (a Python script is fine) that lets me query my documents using boto3 or some other SDK. With it I should be able to query my ChromaDB store using natural language and get URLs back.
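A minimal sketch of such a script, assuming the database has already been downloaded from S3 to a local directory and that each stored document carries its URL in a `url` metadata field. The collection name `documents`, the metadata key, and the flag names are all hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Query stored links with natural language.")
    parser.add_argument("query", help="natural language query text")
    parser.add_argument("--db-path", default="./chromadb", help="local copy of the database")
    parser.add_argument("-n", "--num-results", type=int, default=5)
    return parser


def urls_from_result(result: dict) -> list[str]:
    """Extract unique URLs, in ranked order, from a Chroma query result."""
    urls: list[str] = []
    # Chroma returns one list of metadatas per query text; we send a single query.
    for meta in result["metadatas"][0]:
        url = (meta or {}).get("url")
        if url and url not in urls:
            urls.append(url)
    return urls


def main(argv=None) -> None:
    args = build_parser().parse_args(argv)

    import chromadb  # third-party; imported here so the helpers above stay stdlib-only

    client = chromadb.PersistentClient(path=args.db_path)
    collection = client.get_or_create_collection("documents")  # hypothetical name
    result = collection.query(query_texts=[args.query], n_results=args.num_results)
    for url in urls_from_result(result):
        print(url)
```

Calling `main(["articles about rust async", "-n", "3"])` would print up to three URLs, one per line.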

### Data Flow:

- Chrome plugin captures page content and sends it to API Gateway
- API Gateway triggers the Lambda function
- Lambda authenticates by checking the bearer token header against the value in Secrets Manager
- Lambda downloads ChromaDB from S3 to /tmp, processes and stores documents, then uploads back to S3
- Lambda publishes success event to EventBridge
- CLI tool downloads ChromaDB from S3 to query locally

### Key Features:

- Single Lambda Execution: The Lambda is configured for single concurrent execution to avoid S3 concurrency conflicts
- S3-based Storage: ChromaDB data stored in S3, downloaded to Lambda /tmp for processing, then uploaded back
- Logging & Monitoring: CloudWatch logs backed up to S3 with size alerts
- Event Notifications: Lambda publishes "Document stored" events to EventBridge
- Container Deployment: Lambda deployed from ECR container registry
- CLI Tool: Downloads ChromaDB from S3 and queries it locally using natural language, returning URLs from indexed documents