**Our LLM Model**

One of the first things we had to do was decide which LLM to use. We chose Ollama over GPT and Gemini because Ollama lets us host open models (here, Mistral) entirely ourselves, which makes usage free and flexible, with no API charges.

Here we download and install Ollama on the host. Ollama can be hosted remotely on Google Colab thanks to Colab's built-in GPU runtime options.

In [None]:
# Install Ollama on the host (-fsSL: fail on HTTP errors, stay quiet, follow redirects)
!curl -fsSL https://ollama.ai/install.sh | sh

In [None]:
# Sanity check: with no arguments, ollama prints its usage, confirming the binary is on PATH
!ollama


**Introduction to Ngrok:**

“Ngrok is a tool that creates a secure tunnel from the public internet to your local machine, allowing your app to be accessed remotely. It’s useful when deploying apps or services that are running on your local machine but need to be accessed over the internet.”

In [None]:
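# Install dependencies for the Python script (pyngrok also puts the `ngrok` command on PATH)
!pip install aiohttp pyngrok

# Register with ngrok: replace YOUR_NGROK_AUTHTOKEN with the authtoken from your ngrok dashboard
!ngrok config add-authtoken YOUR_NGROK_AUTHTOKEN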



**How does this code work?**

Ngrok lets us expose the Ollama instance running remotely on Colab to the public internet. After running the next notebook cell:
1. Ollama starts serving the Mistral model.
2. Ngrok generates a public URL that can be shared as a way to use the service (Ollama's Mistral running remotely), which meant we could access it from within VS Code, where the app runs.
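The app reaches the model over plain HTTP through that public URL. As a minimal sketch of what the app side might look like, the Python below sends one prompt to Ollama's `/api/generate` endpoint through the tunnel using only the standard library. `NGROK_URL`, `build_generate_request`, and `generate` are hypothetical names, the URL is a placeholder for the one ngrok prints, and the actual app may use a different client (for example `aiohttp`, which this notebook installs).

```python
import json
import urllib.request

# Hypothetical placeholder: paste the https://... URL that ngrok prints
NGROK_URL = "https://example.ngrok-free.app"


def build_generate_request(base_url: str, prompt: str,
                           model: str = "mistral") -> urllib.request.Request:
    """Build a POST request for Ollama's /api/generate endpoint."""
    # stream=False asks Ollama for one JSON object instead of a stream of chunks
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(base_url: str, prompt: str, model: str = "mistral",
             timeout: float = 120.0) -> str:
    """Send a prompt through the tunnel and return the model's text reply."""
    req = build_generate_request(base_url, prompt, model)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.load(resp)
    # With stream=False, the generated text is in the "response" field
    return body["response"]
```

Once the tunnel is up, `generate(NGROK_URL, "Hello")` should return Mistral's reply as a string. Turning streaming off gives up token-by-token output in exchange for a single JSON response, which keeps the client simple.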

The same notebook cell also automates starting Ollama's Mistral model.

For this code to work, the main user needs an ngrok account. Ngrok made our setup possible, but a more professional deployment would call for investing in AWS or another cloud hosting platform.

The authentication token used when this notebook was written belonged to Yvette Roos from Tufts University; anyone else running it should substitute their own.

In [None]:
import os
import asyncio

# Set LD_LIBRARY_PATH so the system NVIDIA library is preferred over the one bundled with Ollama (matters on Colab's GPU runtime)
os.environ.update({'LD_LIBRARY_PATH': '/usr/lib64-nvidia'})

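async def run_process(cmd):
  """Start `cmd` as a subprocess and echo its stdout/stderr into the notebook."""
  print('>>> starting', *cmd)
  p = await asyncio.create_subprocess_exec(
      *cmd,
      stdout=asyncio.subprocess.PIPE,
      stderr=asyncio.subprocess.PIPE,
  )

  async def pipe(lines):
    # Print each line as soon as the process emits it
    async for line in lines:
      print(line.decode('utf-8', errors='replace').rstrip())

  await asyncio.gather(
      pipe(p.stdout),
      pipe(p.stderr),
  )
  # Reap the process once both output streams have closed
  await p.wait()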

# Register an account at ngrok.com, create an authtoken, and place it here
# (this repeats the `ngrok config add-authtoken` step from the install cell)
await run_process(['ngrok', 'config', 'add-authtoken', 'YOUR_NGROK_AUTHTOKEN'])

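async def run_after(delay_s, cmd):
  # Give `ollama serve` time to start listening before the CLI client connects
  await asyncio.sleep(delay_s)
  await run_process(cmd)

# Start the Ollama server, load Mistral, and tunnel Ollama's default port (11434).
# ngrok logs the public URL in its `url=` field; the server and tunnel run until the cell is stopped.
await asyncio.gather(
    run_process(['ollama', 'serve']),
    run_after(5, ['ollama', 'run', 'mistral']),  # 5 s is an arbitrary grace period
    run_process(['ngrok', 'http', '--log', 'stderr', '11434', '--host-header', 'localhost:11434'])
)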
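Any step that talks to the Ollama server, whether `ollama run mistral` in the cell above or a prompt sent from the app, only succeeds once `ollama serve` is listening. A minimal readiness check using only the Python standard library is sketched below. `wait_for_ollama` is a hypothetical helper, and it assumes the server answers HTTP 200 on its root URL (Ollama's default local address is `http://localhost:11434`).

```python
import http.client
import time
import urllib.request


def wait_for_ollama(base_url: str = "http://localhost:11434",
                    timeout: float = 30.0,
                    interval: float = 0.5) -> bool:
    """Return True once base_url answers with HTTP 200, or False after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A running Ollama server replies to GET / with "Ollama is running"
            with urllib.request.urlopen(base_url, timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except (OSError, http.client.HTTPException):
            # Connection refused, timeouts, and HTTP error statuses
            # (URLError/HTTPError subclass OSError) all mean "not ready yet"
            pass
        time.sleep(interval)
    return False
```

From the app side, the same helper could be called with the public ngrok URL in place of the default before sending the first prompt. Polling with a short per-request timeout avoids guessing how long the server needs to start.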
