# wordslab-notebooks-lib.context

> Functions to gather context for your LLM from the current notebook cells, files, documents, urls.

In [1]:
#| default_exp context

In [2]:
#| export
import asyncio
from ipykernel.comm import Comm

import nbformat
from fastcore.utils import *
from fastcore.xml import to_xml, Src, Source,Out,Outs,Cell

## Notebook cells

A Jupyter notebook is a convenient way to build context for an LLM one cell after the other: you are working in a fully editable conversation, where each conversation step can be generated by code if needed.

To collect the text of all previous cells from within a Python code cell of the current notebook, the kernel needs to cooperate with a Jupyterlab frontend extension:
- the Jupyterlab frontend extension is Javascript code compiled from `src/index.ts` in this package, which runs in the web browser of the user
- it calls the Jupyterlab Javascript API to access the current notebook contents and track which cell is running
- it listens to messages sent by the notebook kernel, and sends back the current notebook contents and active cell id when requested

The Python function `get_notebook_data()` can be called in any code cell of the notebook:
- it runs in the kernel on the Jupyterlab server machine
- it sends a message to the Jupyterlab frontend extension
- it gets back the "notebook" contents (dict) and the current "cell_id" (string)
- it returns these two values as a Python dictionary

This is the raw notebook data: you need to convert it to a compact string representation before passing it to your LLM.

The Python function `get_notebook_context()`:
- gets the notebook contents and the current cell id by calling `get_notebook_data()`
- truncates the notebook contents just above the current cell (to mimic an ongoing conversation)
- converts everything to a single compact string, delimiting cells with XML tags that are easy for the LLM to parse
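The truncation step can be illustrated without a running frontend. The sketch below is a simplified stand-in, not the library code: `cells_before` is a hypothetical helper, and the notebook is a hand-written dict with made-up cell ids that only mimics the layout of real notebook data.

```python
def cells_before(notebook: dict, cell_id: str) -> list:
    """Return the cells located above the cell whose id is `cell_id`."""
    kept = []
    for cell in notebook["cells"]:
        if cell.get("id") == cell_id:
            break  # stop just above the executing cell
        kept.append(cell)
    return kept


# Hand-written example data (ids are made up for illustration)
notebook = {
    "cells": [
        {"id": "a1", "cell_type": "markdown", "source": "# Title"},
        {"id": "b2", "cell_type": "code", "source": "x = 1"},
        {"id": "c3", "cell_type": "code", "source": "await get_notebook_context()"},
        {"id": "d4", "cell_type": "markdown", "source": "Not part of the context"},
    ]
}

previous = cells_before(notebook, "c3")
print([cell["id"] for cell in previous])  # ['a1', 'b2']
```

The executing cell `c3` and everything below it are left out, which is what makes the result read like a conversation that stops at the current step.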

If you want to use either of these two functions, you first need to install the Jupyterlab frontend extension:
- activate your Jupyterlab Python virtual environment
- run `pip install wordslab-notebooks-lib`
- restart your Jupyterlab server

The extension is already pre-installed if you work in the wordslab-notebooks environment.

The Jupyterlab extension is reloaded and re-initialized each time you refresh your browser page:
- to check if the extension is installed and running, open the browser console and look for the message 'Wordslab notebooks extension activated'
- hit the refresh button if you encounter a bug and the extension stops working

### Jupyterlab frontend extension

The source code of the Jupyterlab frontend extension can be found in the following files:

Typescript source code, dependencies, and compilation config:

- `src/index.ts`
- `package.json`
- `tsconfig.json`
- `.yarnrc.yml`

Python package integration:

- `MANIFEST.in` -> graft wordslab_notebooks_lib/labextension

This is how the compiled Javascript extension is embedded in the Python package.

- `wordslab_notebooks_lib/__init__.py` -> def _jupyter_labextension_paths()

This is what Jupyterlab uses at extension development time to find the Jupyterlab extension in a python module.

- `pyproject.toml` -> classifiers = [ "Framework :: Jupyter :: JupyterLab :: Extensions :: Prebuilt" ]

This is what Jupyterlab uses at extension deployment time to find the python packages with Jupyterlab extensions.

#### Understand Jupyterlab kernels and frontend extensions

Jupyter kernels technical implementation details

https://chatgpt.com/share/692bea08-4510-8004-b9ab-c02feeb97c08

Jupyterlab extension development tutorial

https://jupyterlab.readthedocs.io/en/latest/extension/extension_tutorial.html

#### Build the Jupyterlab frontend extension

Open a Terminal

```bash
cd $WORDSLAB_WORKSPACE/wordslab-notebooks-lib

source ../../jupyterlab/.venv/bin/activate

# Install dependencies
jlpm clean
jlpm install

# Build TypeScript extension
jlpm build
```

#### Test the extension in development mode

```bash
# Install in development mode
uv pip install -e .

source ../../jupyterlab/.venv/bin/activate

# Register the extension with JupyterLab during development
jupyter labextension develop . --overwrite

# Verify extension is found
jupyter labextension list

# Start JupyterLab
jupyter lab
```

After installing the extension in development mode once, you can iterate very fast:
- update the code in `src/index.ts`
- build the extension with `jlpm build`
- refresh the Jupyterlab single page app in your browser
- test the updated extension

No need to reinstall the extension or to restart Jupyterlab itself, just refresh your browser page.

#### Publish the extension to PyPI when ready

```bash
source ../../jupyterlab/.venv/bin/activate

# Make sure the frontend extension is built
jlpm build

source .venv/bin/activate

# Export notebooks to Python modules
nbdev_export

# Prepare for release
nbdev_prepare

# Publish to PyPI
nbdev_pypi
```

### Python extension client

#### Get notebook contents

In [3]:
#| export
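async def get_notebook_data(timeout=0.5):
    "Ask the frontend extension for the notebook contents and the id of the executing cell"
    future = asyncio.Future()
    
    def on_msg(msg):
        # Resolve the future with the first reply only, ignore any later message
        if not future.done():
            future.set_result(msg['content']['data'])
    
    # Open a comm channel to the frontend extension and send the request
    comm = Comm(target_name='wordslab_notebook_comm', show_warning=False)
    comm.on_msg(on_msg)
    comm.send({'request': 'get_notebook_data'})

    try:
        # Wait for the reply, but give up after `timeout` seconds
        return await asyncio.wait_for(future, timeout=timeout)
    except asyncio.TimeoutError:
        raise TimeoutError("Failed to receive notebook context from frontend")

# The first call always fails, I didn't find a way around it
try:
    await get_notebook_data()
except Exception:
    ...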

In [4]:
data =  await get_notebook_data()
data["cell_id"]

'21bd5abf-7507-4f84-874e-52bfdf53ba1c'

In [5]:
data =  await get_notebook_data()
data["cell_id"]

'7d2d4d5c-c7f4-4325-a4dc-60eca3090763'

In [6]:
data =  await get_notebook_data()
notebook_content_dict = data["notebook"]
executing_cell_id = data["cell_id"]
executing_cell_id

'51a7be30-8758-4fdb-ae44-af3e1db751c6'
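The request/reply pattern used by `get_notebook_data()` (a future resolved by a callback, guarded by a timeout) can be tried without a kernel or a frontend. The sketch below is only an illustration under stated assumptions: `request_with_timeout`, `fast_frontend` and `silent_frontend` are hypothetical names, and the fake frontends stand in for the real comm channel.

```python
import asyncio


async def request_with_timeout(send, timeout=0.5):
    """Send a request and wait for a single reply delivered through a callback."""
    future = asyncio.get_running_loop().create_future()

    def on_reply(data):
        # Only the first reply resolves the future
        if not future.done():
            future.set_result(data)

    send(on_reply)
    try:
        return await asyncio.wait_for(future, timeout=timeout)
    except asyncio.TimeoutError:
        raise TimeoutError("No reply received before the timeout")


async def main():
    loop = asyncio.get_running_loop()

    # Fake frontend that answers after 10 ms
    def fast_frontend(on_reply):
        loop.call_later(0.01, on_reply, {"cell_id": "abc"})

    # Fake frontend that never answers
    def silent_frontend(on_reply):
        pass

    reply = await request_with_timeout(fast_frontend)
    try:
        await request_with_timeout(silent_frontend, timeout=0.05)
        timed_out = False
    except TimeoutError:
        timed_out = True
    return reply, timed_out


if __name__ == "__main__":
    print(asyncio.run(main()))  # ({'cell_id': 'abc'}, True)
```

Run it as a plain script. Inside a notebook an event loop is already running, so call `await main()` instead of `asyncio.run(main())`.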

#### Explore notebook format

https://nbformat.readthedocs.io/en/latest/format_description.html
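As a quick reminder of the structure described at that link, the sketch below builds a minimal version 4 notebook as a plain Python dict and lists the output types of each cell. The cell ids and contents are made up for illustration, and `output_types` is a hypothetical helper, not part of this library.

```python
import json

# Minimal nbformat v4 notebook: one markdown cell and one executed code cell
notebook = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {"language_info": {"name": "python"}},
    "cells": [
        {"id": "intro", "cell_type": "markdown", "metadata": {}, "source": "# Demo"},
        {
            "id": "calc",
            "cell_type": "code",
            "metadata": {},
            "execution_count": 1,
            "source": "print('hi')\n1 + 1",
            "outputs": [
                {"output_type": "stream", "name": "stdout", "text": "hi\n"},
                {
                    "output_type": "execute_result",
                    "execution_count": 1,
                    "metadata": {},
                    "data": {"text/plain": "2"},
                },
            ],
        },
    ],
}


def output_types(nb: dict) -> dict:
    """Map each cell id to the list of its output types (empty for markdown cells)."""
    return {
        cell["id"]: [out["output_type"] for out in cell.get("outputs", [])]
        for cell in nb["cells"]
    }


print(output_types(notebook))  # {'intro': [], 'calc': ['stream', 'execute_result']}
```

Only code cells carry `outputs` and `execution_count`, and `execution_count` is `null` (`None` in Python) for a code cell that has not been run yet.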

In [7]:
nb = nbformat.from_dict(notebook_content_dict)

code_language = nb.metadata.language_info.name
print("> " + code_language + " notebook")

for cell in nb.cells:
    if cell.id == executing_cell_id: break
        
    is_markdown = cell.cell_type == "markdown"
    is_code = cell.cell_type == "code"
    is_raw = cell.cell_type == "raw"

    print("---------------------")
    print("cell", cell.id, cell.cell_type)
    print("---------------------")
    if is_markdown:
        print(cell.source[:100])
    elif is_code:
        print(f"```{code_language}\n" + cell.source[:100] + "\n```")
    elif is_raw:
        print(cell.source[:100])
    if is_code and cell.execution_count and len(cell.outputs)>0:
        print("---------------------")
        print("cell outputs", cell.id, cell.execution_count)
        print("---------------------")
        for output in cell.outputs:
            if output.output_type == "stream":
                print(f"<{output.name}>")
                print(output.text[:100])
                print(f"</{output.name}>")
            elif output.output_type == "display_data":
                print("<display>")
                if "data" in output:
                    print("  <data>")
                    print(output.data)
                    print("  </data>")
                if "metadata" in output and len(output.metadata)>0:
                    print("  <metadata>")
                    print(output.metadata)
                    print("  </metadata>")
                print("</display>")
            elif output.output_type == "execute_result":
                print("<result>")
                if "data" in output:
                    print("  <data>")
                    print(output.data)
                    print("  </data>")
                if "metadata" in output and len(output.metadata)>0:
                    print("  <metadata>")
                    print(output.metadata)
                    print("  </metadata>")
                print("</result>")
            elif output.output_type == "error":
                print("<error>")
                print(output.ename)
                print(output.evalue)
                for frame in output.traceback:
                    print(frame)
                print("</error>")
        print("---------------------")

> python notebook
---------------------
cell 9d8a6aa0-8f58-4860-bcc1-2bfbdcb438b6 markdown
---------------------
# wordslab-notebooks-lib.context

> Functions to gather context for your LLM from the current notebo
---------------------
cell 845d409b-9d48-4b97-9af6-63143b61e9fa code
---------------------
```python
#| default_exp context
```
---------------------
cell ece4d545-8f78-4232-82fb-e837ea0185e4 code
---------------------
```python
#| export
import asyncio
from ipykernel.comm import Comm

import nbformat
from fastcore.utils import
```
---------------------
cell 0ff6fbdc-4a54-4e29-acbb-07529df8cfdd markdown
---------------------
## Notebook cells

A Jupyter notebook is a convenient way to build context for a LLM one cell after 
---------------------
cell da7ecd61-80f6-4a00-a795-6866d62b32bb markdown
---------------------
### Jupyterlab frontend extension

The source code of the Jupyterlab frontend extension can be found
---------------------
cell 4178ac20-4612-4c8d-8d48-0fd2d2605

#### Format notebook content for LLMs

Convert notebook contents to compact XML - code and format copied from **toolslm by AnswerDotAI**:

https://github.com/AnswerDotAI/toolslm/blob/main/00_xml.ipynb

In [8]:
#| exports
def get_mime_text(data):
    "Get text from MIME bundle, preferring markdown over plain"
    if 'text/markdown' in data: return ''.join(list(data['text/markdown']))
    if 'text/plain' in data: return ''.join(list(data['text/plain']))

In [9]:
#| exports
def cell2out(o):
    "Convert single notebook output to XML format"
    if hasattr(o, 'data'): 
        txt = get_mime_text(o.data)
        if txt: return Out(txt, mime='markdown' if 'text/markdown' in o.data else 'plain')
    if hasattr(o, 'text'):
        txt = o.text if isinstance(o.text, str) else ''.join(o.text)
        return Out(txt, type='stream', name=o.get('name', 'stdout'))
    if hasattr(o, 'ename'): return Out(f"{o.ename}: {o.evalue}", type='error')

In [10]:
#| exports
def cell2xml(cell):
    "Convert notebook cell to concise XML format"
    cts = Source(''.join(cell.source)) if hasattr(cell, 'source') and cell.source else None
    out_items = L(getattr(cell,'outputs',[])).map(cell2out).filter()
    outs = []
    if out_items: outs = Outs(*out_items)
    parts = [p for p in [cts, outs] if p]
    return Cell(*parts, type=cell.cell_type)

In [11]:
#| exports
def nb2xml(nb, until_cell_id):
    "Convert the code and markdown cells located above `until_cell_id` to XML strings joined by newlines"
    cells_xml = []
    for c in nb.cells:
        if c.id == until_cell_id: break
        if c.cell_type in ('code','markdown'):
            cells_xml.append(to_xml(cell2xml(c), do_escape=False))
    return '\n'.join(cells_xml)
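The idea behind these conversion functions can be shown with the standard library alone. The sketch below is a simplified illustration, not the `fastcore.xml` implementation: `cell_to_xml` is a hypothetical helper that works on plain dicts, handles only stream and `text/plain` outputs, and does not escape special characters, so its markup may differ in details from what `nb2xml` produces.

```python
def cell_to_xml(cell: dict) -> str:
    """Render one notebook cell (plain dict) as a compact XML-like string."""
    parts = [f'<source>{cell["source"]}</source>']
    outs = []
    for out in cell.get("outputs", []):
        if "text" in out:
            # Stream output (stdout / stderr)
            outs.append(f'<out type="stream">{out["text"]}</out>')
        elif "text/plain" in out.get("data", {}):
            # Rich output: keep only the plain text representation
            outs.append(f'<out mime="plain">{out["data"]["text/plain"]}</out>')
    if outs:
        parts.append("<outs>" + "".join(outs) + "</outs>")
    return f'<cell type="{cell["cell_type"]}">' + "".join(parts) + "</cell>"


markdown_cell = {"cell_type": "markdown", "source": "# Hi"}
code_cell = {
    "cell_type": "code",
    "source": "print(1)",
    "outputs": [{"output_type": "stream", "name": "stdout", "text": "1\n"}],
}

print(cell_to_xml(markdown_cell))  # <cell type="markdown"><source># Hi</source></cell>
print(cell_to_xml(code_cell))
```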

In [19]:
nb2xml(nb, executing_cell_id)[:3000]

'<cell type="markdown"><source># wordslab-notebooks-lib.context\n\n> Functions to gather context for your LLM from the current notebook cells, files, documents, urls.</cell>\n<cell type="code"><source>#| default_exp context</cell>\n<cell type="code"><source>#| export\nimport asyncio\nfrom ipykernel.comm import Comm\n\nimport nbformat\nfrom fastcore.utils import *\nfrom fastcore.xml import to_xml, Src, Source,Out,Outs,Cell</cell>\n<cell type="markdown"><source>## Notebook cells\n\nA Jupyter notebook is a convenient way to build context for a LLM one cell after the other: you are working in a fully editable conversation, where each conversation step can be generated by code if needed.\n\nTo collect the text of all previous cells from within a python code cell of the current notebook, we need to collaborate with a Jupyterlab frontend extension:\n- the Jupyterlab frontend extension is a Javascript function defined in this package: `src\\index.ts`, which runs in the web browser of the user\

In [13]:
#| exports
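async def get_notebook_context(timeout=0.5):
    "Return the cells above the executing cell as a compact XML string"
    data = await get_notebook_data(timeout=timeout)
    notebook_content = data["notebook"]
    # Parse the raw dict received from the frontend into an nbformat notebook
    nb = nbformat.from_dict(notebook_content)
    cell_id = data["cell_id"]
    return nb2xml(nb, cell_id)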

In [20]:
await get_notebook_context(timeout=0.5)



You can see that the content of this cell, which is below the call to `get_notebook_context()`, doesn't appear in the context.