# wordslab-notebooks-lib.context

> Functions to gather context for your LLM from the current notebook cells, files, documents, urls.

In [1]:
#| default_exp context

In [2]:
#| export
import asyncio
from ipykernel.comm import Comm

import nbformat
from fastcore.utils import *
from fastcore.xml import to_xml, Src, Source,Out,Outs,Cell

## Access all notebook cells from within a code cell

A Jupyter notebook is a convenient way to build context for an LLM one cell at a time: you work in a fully editable conversation, where each conversation step can be generated by code if needed.

To collect the text of all previous cells from within a Python code cell of the current notebook, we need to collaborate with a Jupyterlab frontend extension:
- the Jupyterlab frontend extension is a Javascript function defined in this package: `src/index.ts`, which runs in the user's web browser
- it calls the Jupyterlab Javascript API to access the current notebook contents and track which cell is running
- it listens to messages sent by the notebook kernel, and sends back the current notebook contents and active cell id when requested

The Python function `get_notebook_data()` can be called in any code cell of the notebook:
- it runs in the kernel on the Jupyterlab server machine
- it sends a message to the Jupyterlab frontend extension
- it gets back the "notebook" contents (dict) and the current "cell_id" (string)
- it returns these two values as a Python dictionary

This is the raw notebook data: you need to convert it to a compact string representation before passing it to your LLM.

The Python function `get_notebook_context()`:
- gets the notebook contents and current cell id by calling `get_notebook_data()`
- truncates the notebook contents just above the current cell (to mimic an ongoing conversation)
- converts everything to a single compact string, delimiting cells with XML tags which are easy for the LLM to parse
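
The truncation and tagging steps above can be illustrated without a running frontend. The following is a minimal sketch, assuming the notebook is a plain dictionary in the nbformat v4 layout. The helper name `context_before` and the sample cell ids are hypothetical and only serve the illustration; they are not part of the library, and the tags are simpler than the ones the library produces.

```python
def context_before(notebook: dict, cell_id: str) -> str:
    """Join the sources of all cells located above `cell_id`, one tag per cell."""
    parts = []
    for cell in notebook["cells"]:
        if cell["id"] == cell_id:
            break  # stop just above the executing cell
        # nbformat allows the source to be a string or a list of lines
        source = cell["source"]
        if not isinstance(source, str):
            source = "".join(source)
        parts.append(f'<cell type="{cell["cell_type"]}">{source}</cell>')
    return "\n".join(parts)

# Hypothetical sample data mimicking the dictionary sent back by the frontend
sample = {
    "notebook": {"cells": [
        {"id": "a1", "cell_type": "markdown", "source": "# Title"},
        {"id": "b2", "cell_type": "code", "source": ["x = 1\n", "x + 1"]},
        {"id": "c3", "cell_type": "code", "source": "await get_notebook_context()"},
        {"id": "d4", "cell_type": "markdown", "source": "Below the call"},
    ]},
    "cell_id": "c3",
}

context = context_before(sample["notebook"], sample["cell_id"])
print(context)
```

Only the two cells above `c3` end up in the string: the executing cell and everything below it are left out.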

If you want to use either of these two functions, you first need to install the Jupyterlab frontend extension:
- activate your Jupyterlab Python virtual environment
- run `pip install wordslab-notebooks-lib`
- restart your Jupyterlab server

The extension is already pre-installed if you work in the wordslab-notebooks environment.

The Jupyterlab extension is reloaded and re-initialized each time you refresh your browser page:
- to check if the extension is installed and running, open the browser console and look for the message 'Wordslab notebooks extension activated'
- hit the refresh button if you encounter a bug and the extension stops working

### Develop a Jupyterlab frontend extension

#### Understand Jupyterlab kernels and frontend extensions

Jupyter kernels technical implementation details

https://chatgpt.com/share/692bea08-4510-8004-b9ab-c02feeb97c08

Jupyterlab extension development tutorial

https://jupyterlab.readthedocs.io/en/latest/extension/extension_tutorial.html

#### Initialize the components of a frontend extension

The source code of the Jupyterlab frontend extension can be found in the following files:

Typescript source code, dependencies, and compilation config:

- `src/index.ts`
- `package.json`
- `tsconfig.json`
- `.yarnrc.yml`

Extension manifest and compiled Javascript code:

- wordslab_notebooks_lib/labextension
  - package.json
  - static/remoteEntry.97d57e417eaf8ebadeb6.js 

This is how the extension files are included in the python package:

- `MANIFEST.in` 

```
include install.json
include package.json
recursive-include wordslab_notebooks_lib/labextension *

graft wordslab_notebooks_lib/labextension
graft src
```

This is how the extension files are installed in Jupyterlab extensions directory when the python package is installed:

- `pyproject.toml`

```toml
[tool.setuptools]
include-package-data = true 

[tool.setuptools.data-files]
"share/jupyter/labextensions/wordslab-notebooks-lib" = [
  "wordslab_notebooks_lib/labextension/package.json",
  "install.json"
]
"share/jupyter/labextensions/wordslab-notebooks-lib/static" = [
  "wordslab_notebooks_lib/labextension/static/*"
]
```
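
To check where these `data-files` end up after installation, here is a minimal sketch. It assumes the package was installed in the active virtual environment, so that the files land under `sys.prefix`; other install schemes (for example a user install) may use a different base directory. The helper name `labextension_dir` is hypothetical.

```python
import sys
from pathlib import Path

def labextension_dir(prefix: str = sys.prefix) -> Path:
    """Return the directory where the prebuilt extension is expected to be installed."""
    return Path(prefix) / "share" / "jupyter" / "labextensions" / "wordslab-notebooks-lib"

ext_dir = labextension_dir()
print(ext_dir)
# The manifest is the first file listed in the data-files table above
print("package.json present:", (ext_dir / "package.json").is_file())
```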

This is how the command `jupyter labextension develop` finds the directory where the extension files live:

- `wordslab_notebooks_lib\__init__.py`

```python
def _jupyter_labextension_paths():
    return [{
        "src": "labextension",
        "dest": "wordslab-notebooks-lib"
    }]
```

This is how the python package is identified as a Jupyterlab extension in pypi:

- `pyproject.toml`

```toml
classifiers = [ "Framework :: Jupyter :: JupyterLab :: Extensions :: Prebuilt" ]
```

#### Install the Jupyterlab frontend extension in development mode

Open a Terminal

```bash
cd $WORDSLAB_WORKSPACE/wordslab-notebooks-lib
source $JUPYTERLAB_ENV/.venv/bin/activate

# Install Javascript dependencies
jlpm install

# Build TypeScript extension
jlpm build

# Register the extension with JupyterLab during development
# (manual alternative to: jupyter labextension develop . --overwrite)
# Remove any previously installed copy or link, then symlink the build output
rm -rf $JUPYTERLAB_ENV/.venv/share/jupyter/labextensions/wordslab-notebooks-lib
ln -s $WORDSLAB_WORKSPACE/wordslab-notebooks-lib/wordslab_notebooks_lib/labextension/ $JUPYTERLAB_ENV/.venv/share/jupyter/labextensions/wordslab-notebooks-lib

# Verify extension is found
jupyter labextension list
```

#### Test the Jupyterlab frontend extension 

After installing the extension in development mode once, you can iterate fast:
- update the code in `src/index.ts`
- build the extension with `jlpm build`

```bash
cd $WORDSLAB_WORKSPACE/wordslab-notebooks-lib
source $JUPYTERLAB_ENV/.venv/bin/activate

# Build TypeScript extension
jlpm build
```
- **refresh** the Jupyterlab single page app in your browser
- test the updated extension

No need to reinstall the extension or to restart Jupyterlab itself, just refresh your browser page.

#### Install the python client library in development mode

```bash
cd $WORDSLAB_WORKSPACE/wordslab-notebooks-lib
source .venv/bin/activate

# Install nbdev and twine
# Install the wordslab-notebooks-lib python library in editable mode
uv sync --dev
```

#### Generate the python library from the source notebooks

```bash
cd $WORDSLAB_WORKSPACE/wordslab-notebooks-lib
source .venv/bin/activate

# Export notebooks to Python modules
nbdev_export

# Clean the notebooks before commit in git
nbdev_clean
```

#### Test the python client library

After installing the client library in development mode once, you can iterate fast:
- create a notebook using the kernel "wordslab-notebooks-lib"
- restart the kernel if needed
- import wordslab_notebooks_lib
- use the functions defined in the library

#### Publish the extension to pypi when ready

Create a file called ~/.pypirc with your token details. It should have these contents:

```ini
[pypi]
username = __token__
password = your_pypi_token
```
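
Since `~/.pypirc` uses the INI format, its contents can be checked before publishing. Below is a minimal sketch with the standard `configparser` module. The helper name `has_pypi_token` is hypothetical, and the check only confirms that token authentication is configured, not that the token itself is valid.

```python
import configparser

def has_pypi_token(text: str) -> bool:
    """Return True if the [pypi] section declares token authentication."""
    # Disable interpolation so that a '%' in the password is read literally
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    if not parser.has_section("pypi"):
        return False
    username = parser.get("pypi", "username", fallback="")
    password = parser.get("pypi", "password", fallback="")
    return username == "__token__" and bool(password)

# Same contents as the file described above, with a placeholder token
sample = """
[pypi]
username = __token__
password = your_pypi_token
"""
print(has_pypi_token(sample))
```

To check your real file, pass `(Path.home() / ".pypirc").read_text()` to the function, after importing `Path` from `pathlib`.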

Then execute the following commands:

```bash
cd $WORDSLAB_WORKSPACE/wordslab-notebooks-lib
source .venv/bin/activate

# Bump the version number
nbdev_bump_version

# Publish to PyPI
nbdev_pypi
```

### Develop a python client for the extension

#### Get notebook cells

In [3]:
#| export
def _notebook_data():
    "Send a request to the frontend extension and return a future resolved with its reply"
    future = asyncio.Future()

    def on_msg(msg):
        # Ignore late or duplicate replies once the future is resolved or cancelled
        if not future.done():
            future.set_result(msg['content']['data'])

    comm = Comm(target_name='wordslab_notebook_comm', show_warning=False)
    comm.on_msg(on_msg)
    comm.send({'request': 'get_notebook_data'})

    return future

async def get_notebook_data(timeout=1):
    "Return the notebook contents and the executing cell id sent by the frontend, retrying once on timeout"
    for _ in range(2):
        try:
            return await asyncio.wait_for(_notebook_data(), timeout=timeout)
        except asyncio.TimeoutError:
            pass  # no reply in time: send the request again
    raise TimeoutError("Failed to receive notebook context from Jupyterlab frontend: install the wordslab-notebooks-lib extension, increase the timeout parameter (in seconds), or try refreshing the web page.")

In [4]:
data =  await get_notebook_data()
data["cell_id"]

'21bd5abf-7507-4f84-874e-52bfdf53ba1c'

In [5]:
data =  await get_notebook_data()
data["cell_id"]

'7d2d4d5c-c7f4-4325-a4dc-60eca3090763'

In [6]:
data =  await get_notebook_data()
notebook_content_dict = data["notebook"]
executing_cell_id = data["cell_id"]
executing_cell_id

'51a7be30-8758-4fdb-ae44-af3e1db751c6'
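
The retry logic of `get_notebook_data()` can be exercised outside Jupyterlab. The sketch below replaces the `Comm` round trip with a hypothetical `fake_request` function that resolves a future after a delay; `fake_request`, `request_with_retry` and the delays are assumptions made for the illustration. It follows the same pattern: wait for the reply with `asyncio.wait_for`, send the request once more on timeout, then give up with an explicit error.

```python
import asyncio

def fake_request(delay: float) -> asyncio.Future:
    """Stand-in for the Comm round trip: the reply arrives after `delay` seconds."""
    loop = asyncio.get_running_loop()
    future = loop.create_future()

    def reply():
        # The waiter may already have been cancelled by a timeout
        if not future.done():
            future.set_result({"notebook": {"cells": []}, "cell_id": "demo"})

    loop.call_later(delay, reply)
    return future

async def request_with_retry(delays, timeout=0.05):
    """Send one request per delay and return the first reply received in time."""
    for delay in delays:
        try:
            return await asyncio.wait_for(fake_request(delay), timeout=timeout)
        except asyncio.TimeoutError:
            continue  # no reply in time: send the request again
    raise TimeoutError("no reply from the frontend")

# The first attempt is too slow (0.2 s > 0.05 s), the retry answers immediately
data = asyncio.run(request_with_retry([0.2, 0.0]))
print(data["cell_id"])
```

Inside a notebook cell an event loop is already running, so you would write `await request_with_retry([0.2, 0.0])` instead of calling `asyncio.run`.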

#### Explore the notebook format

https://nbformat.readthedocs.io/en/latest/format_description.html

In [7]:
nb = nbformat.from_dict(notebook_content_dict)

code_language = nb.metadata.language_info.name
print("> " + code_language + " notebook")

for cell in nb.cells:
    if cell.id == executing_cell_id: break
        
    is_markdown = cell.cell_type == "markdown"
    is_code = cell.cell_type == "code"
    is_raw = cell.cell_type == "raw"

    print("---------------------")
    print("cell", cell.id, cell.cell_type)
    print("---------------------")
    if is_markdown:
        print(cell.source[:100])
    elif is_code:
        print(f"```{code_language}\n" + cell.source[:100] + "\n```")
    elif is_raw:
        print(cell.source[:100])
    if is_code and cell.execution_count and len(cell.outputs)>0:
        print("---------------------")
        print("cell outputs", cell.id, cell.execution_count)
        print("---------------------")
        for output in cell.outputs:
            if output.output_type == "stream":
                print(f"<{output.name}>")
                print(output.text[:100])
                print(f"</{output.name}>")
            elif output.output_type == "display_data":
                print("<display>")
                if "data" in output:
                    print("  <data>")
                    print(output.data)
                    print("  </data>")
                if "metadata" in output and len(output.metadata)>0:
                    print("  <metadata>")
                    print(output.metadata)
                    print("  </metadata>")
                print("</display>")
            elif output.output_type == "execute_result":
                print("<result>")
                if "data" in output:
                    print("  <data>")
                    print(output.data)
                    print("  </data>")
                if "metadata" in output and len(output.metadata)>0:
                    print("  <metadata>")
                    print(output.metadata)
                    print("  </metadata>")
                print("</result>")
            elif output.output_type == "error":
                print("<error>")
                print(output.ename)
                print(output.evalue)
                for frame in output.traceback:
                    print(frame)
                print("</error>")
        print("---------------------")

> python notebook
---------------------
cell 9d8a6aa0-8f58-4860-bcc1-2bfbdcb438b6 markdown
---------------------
# wordslab-notebooks-lib.context

> Functions to gather context for your LLM from the current notebo
---------------------
cell 845d409b-9d48-4b97-9af6-63143b61e9fa code
---------------------
```python
#| default_exp context
```
---------------------
cell ece4d545-8f78-4232-82fb-e837ea0185e4 code
---------------------
```python
#| export
import asyncio
from ipykernel.comm import Comm

import nbformat
from fastcore.utils import
```
---------------------
cell 0ff6fbdc-4a54-4e29-acbb-07529df8cfdd markdown
---------------------
## Access all notebook cells from within a code cell

A Jupyter notebook is a convenient way to buil
---------------------
cell 9843c2c4-ac54-46d6-9725-0e957e944e3a markdown
---------------------
### Develop a Jupyterlab frontend extension
---------------------
cell 4178ac20-4612-4c8d-8d48-0fd2d2605aa9 markdown
---------------------
#### Understand Jupyte

#### Format the notebook cells for LLMs

Convert notebook contents to compact XML - code and format copied from **toolslm by AnswerDotAI**:

https://github.com/AnswerDotAI/toolslm/blob/main/00_xml.ipynb

In [8]:
#| exports
def get_mime_text(data):
    "Get text from MIME bundle, preferring markdown over plain"
    if 'text/markdown' in data: return ''.join(list(data['text/markdown']))
    if 'text/plain' in data: return ''.join(list(data['text/plain']))

In [9]:
#| exports
def cell2out(o):
    "Convert single notebook output to XML format"
    if hasattr(o, 'data'): 
        txt = get_mime_text(o.data)
        if txt: return Out(txt, mime='markdown' if 'text/markdown' in o.data else 'plain')
    if hasattr(o, 'text'):
        txt = o.text if isinstance(o.text, str) else ''.join(o.text)
        return Out(txt, type='stream', name=o.get('name', 'stdout'))
    if hasattr(o, 'ename'): return Out(f"{o.ename}: {o.evalue}", type='error')
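
The three kinds of outputs handled by `cell2out` can be illustrated with plain dictionaries. This is a simplified stand-in, not the exported function: the helper `output_summary` is hypothetical, it returns a `(kind, text)` tuple instead of an XML element, and the sample outputs are made up following the nbformat v4 output layout.

```python
def output_summary(o: dict):
    """Follow the dispatch order of cell2out: rich data first, then streams, then errors."""
    data = o.get("data", {})
    for mime in ("text/markdown", "text/plain"):  # markdown is preferred over plain text
        if mime in data:
            return mime, "".join(data[mime])
    if "text" in o:
        return o.get("name", "stdout"), "".join(o["text"])
    if "ename" in o:
        return "error", f'{o["ename"]}: {o["evalue"]}'
    return None  # nothing textual to keep, for example an image-only output

outputs = [
    {"output_type": "execute_result", "data": {"text/plain": ["42"], "text/markdown": ["**42**"]}},
    {"output_type": "stream", "name": "stderr", "text": ["warn\n", "ing"]},
    {"output_type": "error", "ename": "ValueError", "evalue": "bad input", "traceback": []},
    {"output_type": "display_data", "data": {"image/png": "..."}},
]
for o in outputs:
    print(output_summary(o))
```

Outputs without any text representation yield `None`, which is why `cell2xml` filters the converted outputs before wrapping them.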

In [10]:
#| exports
def cell2xml(cell):
    "Convert notebook cell to concise XML format"
    cts = Source(''.join(cell.source)) if hasattr(cell, 'source') and cell.source else None
    out_items = L(getattr(cell,'outputs',[])).map(cell2out).filter()
    outs = []
    if out_items: outs = Outs(*out_items)
    parts = [p for p in [cts, outs] if p]
    return Cell(*parts, type=cell.cell_type)

In [11]:
#| exports
def nb2xml(nb, until_cell_id):
    "Convert the code and markdown cells above `until_cell_id` to XML strings joined by newlines"
    cells_xml = []
    for c in nb.cells:
        if c.id == until_cell_id: break
        if c.cell_type in ('code','markdown'):
            cells_xml.append(to_xml(cell2xml(c), do_escape=False))
    return '\n'.join(cells_xml)

In [12]:
nb2xml(nb, executing_cell_id)[:3000]

'<cell type="markdown"><source># wordslab-notebooks-lib.context\n\n> Functions to gather context for your LLM from the current notebook cells, files, documents, urls.</cell>\n<cell type="code"><source>#| default_exp context</cell>\n<cell type="code"><source>#| export\nimport asyncio\nfrom ipykernel.comm import Comm\n\nimport nbformat\nfrom fastcore.utils import *\nfrom fastcore.xml import to_xml, Src, Source,Out,Outs,Cell</cell>\n<cell type="markdown"><source>## Access all notebook cells from within a code cell\n\nA Jupyter notebook is a convenient way to build context for a LLM one cell after the other: you are working in a fully editable conversation, where each conversation step can be generated by code if needed.\n\nTo collect the text of all previous cells from within a python code cell of the current notebook, we need to collaborate with a Jupyterlab frontend extension:\n- the Jupyterlab frontend extension is a Javascript function defined in this package: `src\\index.ts`, which r

In [15]:
#| exports
async def get_notebook_context(timeout=1):
    "Return the notebook cells located above the executing cell as a compact XML string"
    data = await get_notebook_data(timeout=timeout)
    notebook_content = data["notebook"]
    nb = nbformat.from_dict(notebook_content)
    cell_id = data["cell_id"]
    return nb2xml(nb, cell_id)

In [16]:
await get_notebook_context(timeout=1)



You can see that the content of this cell, which is below the call to `get_notebook_context()`, doesn't appear in the context.