<a href="https://colab.research.google.com/github/theaidran/AI/blob/main/simple_ui_4bit_textgen_gdrive.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# LLM text generation notebook for Google Colab

This notebook uses [text-generation-webui](https://github.com/oobabooga/text-generation-webui) to run conversational models in chat mode. Find my latest version [here](https://github.com/eucdee/AI).

▶⏩ Run all the cells and a public gradio URL will appear at the bottom in around 5 minutes. 🤞🐱‍👤

https://status.gradio.app/

## Parameters

* **save_logs_to_google_drive**: saves your chat logs, characters, and softprompts to Google Drive automatically, so that they will persist across sessions.
* **text_streaming**: streams the text output in real time instead of waiting for the full response to be completed.
* **cai_chat**: makes the interface look like Character.AI. Otherwise, it looks like a standard WhatsApp-like chat.
* **load_in_8bit**: loads the model with 8-bit precision, reducing the GPU memory usage by half. This allows you to use the full 2048 prompt length without running out of memory, at a small accuracy and speed cost.
* **activate_silero_text_to_speech**: responses will be played as audio instead of shown as text. There are 118 voices available (`en_0` to `en_117`), which can be set in the "Extensions" tab of the interface. You can find samples here: [Silero samples](https://oobabooga.github.io/silero-samples/).
* **activate_sending_pictures**: adds a menu for sending pictures to the bot, which are automatically captioned using BLIP.
* **activate_character_bias**: an extension that adds a user-defined, hidden string at the beginning of the bot's reply with the goal of biasing the rest of the response.
* **chat_language**: if different than English, activates automatic translation using Google Translate, allowing you to communicate with the bot in a different language.
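
As a rough illustration of the memory saving described for **load_in_8bit**, the sketch below estimates the space taken by the model weights alone at different precisions. The 13-billion parameter count is an assumed example, and the estimate ignores activations, the attention cache, and quantization overhead, so real GPU usage will be higher.

```python
def weight_memory_gib(n_params: float, bits_per_weight: int) -> float:
    """Approximate memory needed to hold the model weights, in GiB."""
    bytes_total = n_params * bits_per_weight / 8
    return bytes_total / 1024**3


# Assumed example: a model with 13 billion parameters.
N_PARAMS = 13e9

for bits in (16, 8, 4):
    print(f"{bits:>2}-bit: {weight_memory_gib(N_PARAMS, bits):.1f} GiB")
```

Halving the bits per weight halves this estimate, which is where the "by half" figure for 8-bit loading comes from.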

## Updates

* Check the [README](https://github.com/eucdee/AI/blob/main/README.md) on GitHub for updates.


## Characters

You can use the following websites to create characters compatible with this web UI:

* [JSON character creator](https://oobabooga.github.io/character-creator.html)
* [AI Character Editor](https://zoltanai.github.io/character-editor/)
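
For orientation, here is a minimal sketch of writing a character file by hand. The character itself is hypothetical, and the field names are an assumption about the JSON layout these tools produce; compare them with a file exported from one of the creators above before relying on them.

```python
import json

# Hypothetical character. The field names below are an assumption and
# should be checked against a file exported by the creator tools.
character = {
    "char_name": "Ada",
    "char_persona": "Ada is a patient assistant who explains things step by step.",
    "char_greeting": "Hello! What would you like to work on today?",
    "world_scenario": "",
    "example_dialogue": "You: Hi\nAda: Hello! How can I help?",
}


def save_character(data: dict, path: str) -> None:
    """Write a character definition as UTF-8 encoded JSON."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=4, ensure_ascii=False)


save_character(character, "Ada.json")
```

The resulting file would then go into the `characters` folder of the web UI, the same place the optional last cell of this notebook moves `llama.json` to.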

## Credits

Based on the [original notebook by 81300](https://colab.research.google.com/github/81300/AI-Notebooks/blob/main/Colab-TextGen-GPU.ipynb).

Forked from [Philio](https://github.com/pcrii/Philo-Colab-Collection/blob/main/4bit_TextGen_Gdrive.ipynb). 



In [None]:
#@title 1. Keep this tab alive to prevent Colab from disconnecting you { display-mode: "form" }

#@markdown Press play on the music player that will appear below:
%%html
<audio src="https://oobabooga.github.io/silence.m4a" controls>

In [None]:
#@title 2. Install the web UI
#remember gradio is currently held back
save_logs_to_google_drive = False #@param {type:"boolean"} 
save_everything_to_google_drive = False #@param {type:"boolean"} 
#@markdown Remember that these models are large and the free Google Drive tier is only 15 GB. <br>
install_gptq = True #@param {type:"boolean"}
#@markdown Install GPTQ-for-LLaMa for 4bit quantized models requiring --wbits 4
from IPython.display import clear_output
if save_logs_to_google_drive:
  import os
  import shutil
  from google.colab import drive
  drive.mount('/content/drive')
  base_folder = '/content/drive/MyDrive'

if save_everything_to_google_drive:
    import os
    import shutil
    from google.colab import drive
    drive.mount('/content/drive')
    base_folder = '/content/drive/MyDrive'
    repo_dir = '/content/drive/MyDrive/text-generation-webui'
    model_dir = '/content/drive/MyDrive/text-generation-webui/models'
    gptq_dir = '/content/drive/MyDrive/text-generation-webui/repositories/GPTQ-for-LLaMa'
    if os.path.exists(repo_dir):
        %cd {repo_dir}
        !git pull
    else:
        %cd /content/drive/MyDrive/
        !git clone https://github.com/oobabooga/text-generation-webui

else:
    model_dir = '/content/text-generation-webui/models'
    repo_dir = '/content/text-generation-webui'
    %cd /content
    !git clone https://github.com/oobabooga/text-generation-webui



if save_logs_to_google_drive:
  if not os.path.exists(f"{base_folder}/oobabooga-data"):
    os.mkdir(f"{base_folder}/oobabooga-data")
  if not os.path.exists(f"{base_folder}/oobabooga-data/logs"):
    os.mkdir(f"{base_folder}/oobabooga-data/logs")
  if not os.path.exists(f"{base_folder}/oobabooga-data/softprompts"):
    os.mkdir(f"{base_folder}/oobabooga-data/softprompts")
  if not os.path.exists(f"{base_folder}/oobabooga-data/characters"):
    shutil.move("text-generation-webui/characters", f"{base_folder}/oobabooga-data/characters")
  else:
    !rm -r "text-generation-webui/characters"
    
  !rm -r "text-generation-webui/softprompts"
  !ln -s "$base_folder/oobabooga-data/logs" "text-generation-webui/logs"
  !ln -s "$base_folder/oobabooga-data/softprompts" "text-generation-webui/softprompts"
  !ln -s "$base_folder/oobabooga-data/characters" "text-generation-webui/characters"

else:
  !mkdir text-generation-webui/logs

!ln -s text-generation-webui/logs .
!ln -s text-generation-webui/characters .
!ln -s text-generation-webui/models .
%rm -r sample_data
%cd text-generation-webui
!wget https://raw.githubusercontent.com/pcrii/Philo-Colab-Collection/main/settings-colab-template.json -O settings-colab-template.json

# Install requirements
!pip install -r requirements.txt
!pip install -r extensions/google_translate/requirements.txt
!pip install -r extensions/silero_tts/requirements.txt
print(f"\033[1;32;1m\n --> If you see a warning about \"pydevd_plugins\", just ignore it and move on to Step 3. There is no need to restart the runtime.\n\033[0;37;0m")

if install_gptq:
    if save_everything_to_google_drive:
        if os.path.exists(gptq_dir):
            %cd {gptq_dir}
            !git pull
            !pip install ninja
            !pip install -r requirements.txt
            !python setup_cuda.py install

        else:
            !mkdir /content/drive/MyDrive/text-generation-webui/repositories
            %cd /content/drive/MyDrive/text-generation-webui/repositories
            !git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa
            !ln -s GPTQ-for-LLaMa text-generation-webui/repositories/GPTQ-for-LLaMa
            %cd GPTQ-for-LLaMa
            !pip install ninja
            !pip install -r requirements.txt
            !python setup_cuda.py install
    else:
        %mkdir /content/text-generation-webui/repositories/
        %cd /content/text-generation-webui/repositories/
        !git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa
        !mkdir -p text-generation-webui/repositories
        !ln -s GPTQ-for-LLaMa text-generation-webui/repositories/GPTQ-for-LLaMa
        %cd GPTQ-for-LLaMa
        !pip install ninja
        !pip install -r requirements.txt
        !python setup_cuda.py install
clear_output()
print("Finished")
# os is only imported above when Drive is mounted, so import it unconditionally here
import os
drive_NOT_mounted = not (save_logs_to_google_drive or save_everything_to_google_drive)
print("Available Models")
print(os.listdir(model_dir))

Finished
Available Models
['config.yaml', 'place-your-models-here.txt']
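
The repeated `os.path.exists` / `os.mkdir` checks in step 2 can be condensed with `os.makedirs(..., exist_ok=True)`. The sketch below shows the idea; `ensure_data_folders` is a hypothetical helper that is not part of the notebook, and a temporary directory stands in for the Drive mount point so that the example runs anywhere.

```python
import os
import tempfile


def ensure_data_folders(base_folder, names=("logs", "softprompts")):
    """Create <base_folder>/oobabooga-data/<name> for each name, if missing."""
    data_root = os.path.join(base_folder, "oobabooga-data")
    created = []
    for name in names:
        path = os.path.join(data_root, name)
        os.makedirs(path, exist_ok=True)  # no error if it already exists
        created.append(path)
    return created


# A temporary directory stands in for '/content/drive/MyDrive'.
with tempfile.TemporaryDirectory() as base:
    paths = ensure_data_folders(base)
    print([os.path.relpath(p, base) for p in paths])
```

The `characters` folder is left out on purpose, because step 2 moves the existing folder to Drive instead of creating an empty one.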


In [None]:
#@title 3. Download Model
#@markdown you can insert any huggingface model in Organization/model format
model_download = "TheBloke/vicuna-13B-1.1-GPTQ-4bit-128g" #@param [ "TheBloke/stable-vicuna-13B-GPTQ", "anon8231489123/gpt4-x-alpaca-13b-native-4bit-128g", "anon8231489123/vicuna-13b-GPTQ-4bit-128g", "TheBloke/vicuna-13B-1.1-GPTQ-4bit-128g", "reeducator/vicuna-13b-free", "OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5", "Aitrepreneur/wizardLM-7B-GPTQ-4bit-128g", "TheBloke/wizardLM-7B-GPTQ", "gozfarb/oasst-llama13b-4bit-128g", "catalpa/codecapybara-4bit-128g-gptq", "mzedp/dolly-v2-12b-GPTQ-4bit-128g", "autobots/pythia-12b-gptqv2-4bit", "TheBloke/medalpaca-13B-GPTQ-4bit", "TheBloke/gpt4-alpaca-lora-13B-GPTQ-4bit-128g"] {allow-input: true}
#@markdown Remember that these models are large and the free Google Drive tier is only 15 GB. <br>

#@markdown  Model type: Safetensors or Pickletensors
Safetensor = False #@param {type:"boolean"}

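# Use repo_dir so this also works when the repository lives on Google Drive
%cd {repo_dir}
# The patched line appears to decide which file format download-model.py skips
# when a repository offers both (inferred from the strings replaced below).
with open("download-model.py") as r:
  if not Safetensor:
    # Skip safetensors files so that the pickle-format weights are downloaded
    text = r.read().replace("if classifications[i] in ['pytorch', 'pt']:", "if classifications[i] in ['safetensors']:")
  else:
    # Restore the original line so that safetensors files are downloaded
    text = r.read().replace("if classifications[i] in ['safetensors']:", "if classifications[i] in ['pytorch', 'pt']:")
with open("download-model.py", "w") as w:
  w.write(text)

#@markdown If the box is not checked, the pickle-format model is downloaded when available. The web UI always loads safetensors by default.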
%cd {repo_dir}
!python download-model.py {model_download}
# The listing printed at the end of this cell shows the directories in your model folder; copy the name of the model you want to use in the next cell
!rm {model_dir}/place-your-models-here.txt
#clear_output()
# os is only imported in step 2 when Drive is mounted, so import it unconditionally here
import os
drive_NOT_mounted = not (save_logs_to_google_drive or save_everything_to_google_drive)
print("Available Models")
print(os.listdir(model_dir))
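
The value to paste into step 4 is the folder name, not the Hugging Face repository id. Judging by the default values of `model_download` and `model_load` in this notebook, the folder name is the repository id with `/` replaced by `_`. The helper below is a hypothetical sketch of that mapping, not code taken from the download script.

```python
def model_folder_name(repo_id: str) -> str:
    """Turn 'Organization/model' into the folder name expected in step 4.

    Assumption: '/' becomes '_', as the notebook's default values of
    model_download and model_load suggest.
    """
    return repo_id.strip().replace("/", "_")


print(model_folder_name("TheBloke/vicuna-13B-1.1-GPTQ-4bit-128g"))
# prints TheBloke_vicuna-13B-1.1-GPTQ-4bit-128g
```

If the printed name does not match an entry in the "Available Models" listing, use the name from the listing.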

In [None]:
#@title 4. Launch
import json

#Close server if is running
!pkill -f -e -c server.py

#@markdown If you don't know what to enter, the previous cell should have printed the available model names. <br> Paste one of them here.
model_load = "TheBloke_vicuna-13B-1.1-GPTQ-4bit-128g" #@param {type:"string"}
# Parameters
#auto_devices = False #@param {type:"boolean"}
load_4bit_models = False #@param {type:"boolean"}

groupsize_128 = False #@param {type:"boolean"}
load_in_8bit = False #@param {type:"boolean"}
chat = True #@param {type:"boolean"}

text_streaming = True #@param {type:"boolean"}
activate_silero_text_to_speech = False #@param {type:"boolean"}
activate_sending_pictures = False #@param {type:"boolean"}
activate_character_bias = False #@param {type:"boolean"}
chat_language = "English" # @param ['Afrikaans', 'Albanian', 'Amharic', 'Arabic', 'Armenian', 'Azerbaijani', 'Basque', 'Belarusian', 'Bengali', 'Bosnian', 'Bulgarian', 'Catalan', 'Cebuano', 'Chinese (Simplified)', 'Chinese (Traditional)', 'Corsican', 'Croatian', 'Czech', 'Danish', 'Dutch', 'English', 'Esperanto', 'Estonian', 'Finnish', 'French', 'Frisian', 'Galician', 'Georgian', 'German', 'Greek', 'Gujarati', 'Haitian Creole', 'Hausa', 'Hawaiian', 'Hebrew', 'Hindi', 'Hmong', 'Hungarian', 'Icelandic', 'Igbo', 'Indonesian', 'Irish', 'Italian', 'Japanese', 'Javanese', 'Kannada', 'Kazakh', 'Khmer', 'Korean', 'Kurdish', 'Kyrgyz', 'Lao', 'Latin', 'Latvian', 'Lithuanian', 'Luxembourgish', 'Macedonian', 'Malagasy', 'Malay', 'Malayalam', 'Maltese', 'Maori', 'Marathi', 'Mongolian', 'Myanmar (Burmese)', 'Nepali', 'Norwegian', 'Nyanja (Chichewa)', 'Pashto', 'Persian', 'Polish', 'Portuguese (Portugal, Brazil)', 'Punjabi', 'Romanian', 'Russian', 'Samoan', 'Scots Gaelic', 'Serbian', 'Sesotho', 'Shona', 'Sindhi', 'Sinhala (Sinhalese)', 'Slovak', 'Slovenian', 'Somali', 'Spanish', 'Sundanese', 'Swahili', 'Swedish', 'Tagalog (Filipino)', 'Tajik', 'Tamil', 'Telugu', 'Thai', 'Turkish', 'Ukrainian', 'Urdu', 'Uzbek', 'Vietnamese', 'Welsh', 'Xhosa', 'Yiddish', 'Yoruba', 'Zulu']

activate_google_translate = (chat_language != "English")

language_codes = {'Afrikaans': 'af', 'Albanian': 'sq', 'Amharic': 'am', 'Arabic': 'ar', 'Armenian': 'hy', 'Azerbaijani': 'az', 'Basque': 'eu', 'Belarusian': 'be', 'Bengali': 'bn', 'Bosnian': 'bs', 'Bulgarian': 'bg', 'Catalan': 'ca', 'Cebuano': 'ceb', 'Chinese (Simplified)': 'zh-CN', 'Chinese (Traditional)': 'zh-TW', 'Corsican': 'co', 'Croatian': 'hr', 'Czech': 'cs', 'Danish': 'da', 'Dutch': 'nl', 'English': 'en', 'Esperanto': 'eo', 'Estonian': 'et', 'Finnish': 'fi', 'French': 'fr', 'Frisian': 'fy', 'Galician': 'gl', 'Georgian': 'ka', 'German': 'de', 'Greek': 'el', 'Gujarati': 'gu', 'Haitian Creole': 'ht', 'Hausa': 'ha', 'Hawaiian': 'haw', 'Hebrew': 'iw', 'Hindi': 'hi', 'Hmong': 'hmn', 'Hungarian': 'hu', 'Icelandic': 'is', 'Igbo': 'ig', 'Indonesian': 'id', 'Irish': 'ga', 'Italian': 'it', 'Japanese': 'ja', 'Javanese': 'jw', 'Kannada': 'kn', 'Kazakh': 'kk', 'Khmer': 'km', 'Korean': 'ko', 'Kurdish': 'ku', 'Kyrgyz': 'ky', 'Lao': 'lo', 'Latin': 'la', 'Latvian': 'lv', 'Lithuanian': 'lt', 'Luxembourgish': 'lb', 'Macedonian': 'mk', 'Malagasy': 'mg', 'Malay': 'ms', 'Malayalam': 'ml', 'Maltese': 'mt', 'Maori': 'mi', 'Marathi': 'mr', 'Mongolian': 'mn', 'Myanmar (Burmese)': 'my', 'Nepali': 'ne', 'Norwegian': 'no', 'Nyanja (Chichewa)': 'ny', 'Pashto': 'ps', 'Persian': 'fa', 'Polish': 'pl', 'Portuguese (Portugal, Brazil)': 'pt', 'Punjabi': 'pa', 'Romanian': 'ro', 'Russian': 'ru', 'Samoan': 'sm', 'Scots Gaelic': 'gd', 'Serbian': 'sr', 'Sesotho': 'st', 'Shona': 'sn', 'Sindhi': 'sd', 'Sinhala (Sinhalese)': 'si', 'Slovak': 'sk', 'Slovenian': 'sl', 'Somali': 'so', 'Spanish': 'es', 'Sundanese': 'su', 'Swahili': 'sw', 'Swedish': 'sv', 'Tagalog (Filipino)': 'tl', 'Tajik': 'tg', 'Tamil': 'ta', 'Telugu': 'te', 'Thai': 'th', 'Turkish': 'tr', 'Ukrainian': 'uk', 'Urdu': 'ur', 'Uzbek': 'uz', 'Vietnamese': 'vi', 'Welsh': 'cy', 'Xhosa': 'xh', 'Yiddish': 'yi', 'Yoruba': 'yo', 'Zulu': 'zu'}

%cd {repo_dir}
# Applying the selected language and setting the prompt size to 2048
# if 8bit mode is selected
with open('settings-colab-template.json', 'r') as f:
  j = json.load(f)
j["google_translate-language string"] = language_codes[chat_language]
if load_in_8bit:
  j["chat_prompt_size"] = 2048
with open('settings-colab.json', 'w') as f:
  json.dump(j, f, indent=4)

params = set()
if chat:
  params.add('--chat')

if load_in_8bit:
  params.add('--load-in-8bit')
#if auto_devices:
#  params.add('--auto-devices')
if load_4bit_models:
  params.add('--wbits 4')

if groupsize_128:
  params.add('--groupsize 128')

active_extensions = []
if activate_sending_pictures:
  active_extensions.append('send_pictures')
if activate_character_bias:
  active_extensions.append('character_bias')
if activate_google_translate:
  active_extensions.append('google_translate')
if activate_silero_text_to_speech:
  active_extensions.append('silero_tts')
active_extensions.append('gallery')

if len(active_extensions) > 0:
  params.add(f'--extensions {" ".join(active_extensions)}')

if not text_streaming or activate_google_translate or activate_silero_text_to_speech:
  params.add('--no-stream')
if activate_character_bias:
  params.add('--verbose')

# Starting the web UI with tmux
cmd = f"tmux new -d python server.py --share  --api --autogptq --model {model_load}  --model_type LLaMa --settings settings-colab.json {' '.join(params)} "#>/content/logs.txt    #2>&1 
print(cmd) 
#for guanaco --quant_type  nf4  fp4
#for falcon model --autogptq --trust-remote-code --groupsize 64
!$cmd
!rm -f /tmp/tmuxpipe && mkfifo /tmp/tmuxpipe && tmux pipe-pane -t 0 -o 'cat >> /tmp/tmuxpipe'
!cat /tmp/tmuxpipe > /content/log.txt 2>&1 &
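
Step 4 collects the launch flags in a `set`, so their order in the printed command can differ between runs. The sketch below builds the same flags as a list, which keeps a fixed order and makes the command easier to compare across sessions. `build_params` is a hypothetical helper; the flag strings are the ones used in the cell above.

```python
def build_params(chat=True, load_in_8bit=False, load_4bit_models=False,
                 groupsize_128=False, extensions=("gallery",), no_stream=False):
    """Return the launch flags as a list, in a fixed order."""
    params = []
    if chat:
        params.append("--chat")
    if load_in_8bit:
        params.append("--load-in-8bit")
    if load_4bit_models:
        params.append("--wbits 4")
    if groupsize_128:
        params.append("--groupsize 128")
    if extensions:
        params.append("--extensions " + " ".join(extensions))
    if no_stream:
        params.append("--no-stream")
    return params


print(" ".join(build_params(load_4bit_models=True, groupsize_128=True)))
```

The joined string can be dropped into the `cmd` f-string in place of `' '.join(params)`.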


In [None]:
#@title 5. Logs - server is starting
# update and wait until server is fully started

import psutil
from time import sleep
from IPython.display import clear_output
clear_output()

# Wait until the API port (5000) is listening, printing the latest log line meanwhile
while 5000 not in [c.laddr.port for c in psutil.net_connections()]:
  sleep(5)
  !tail -n 1  /content/log.txt

!tail -n 10  /content/log.txt

Running on local URL:  http://127.0.0.1:7860
INFO:Loading TheBloke_vicuna-13B-1.1-GPTQ-4bit-128g...
INFO:The AutoGPTQ params are: {'model_basename': 'vicuna-13B-1.1-GPTQ-4bit-128g.compat.no-act-order', 'device': 'cuda:0', 'use_triton': False, 'use_safetensors': False, 'trust_remote_code': False, 'max_memory': None, 'quantize_config': None}
INFO:Loaded the model in 10.23 seconds.
INFO:Loading the extension "gallery"...
Starting streaming server at ws://127.0.0.1:5005/api/v1/stream
Starting API at http://127.0.0.1:5000/api
Running on local URL:  http://127.0.0.1:7860
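
The wait loop in step 5 never ends if the server crashes before it opens port 5000. A minimal sketch of the same wait with a timeout is shown below. It probes the port with a TCP connection instead of `psutil`, and `wait_for_port` is a hypothetical helper that is not part of the notebook.

```python
import socket
import time


def wait_for_port(port, host="127.0.0.1", timeout=600.0, interval=5.0):
    """Poll until host:port accepts TCP connections; return False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            time.sleep(interval)
    return False


# Demo: a throwaway listening socket stands in for the API server.
with socket.socket() as server:
    server.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    server.listen()
    print(wait_for_port(server.getsockname()[1], timeout=5, interval=0.1))
```

In the notebook, `wait_for_port(5000)` would take the place of the `while` loop, with a look at `/content/log.txt` when it returns `False`.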


In [None]:
#@title 6. Simple UI 

import IPython
from IPython.display import clear_output 

try: 
  import flask, flask_socketio
except ImportError:
  !pip install Flask flask-socketio eventlet gunicorn
  clear_output()

import requests
import asyncio
import json
import sys
import os
import threading
import secrets
import flask
from flask import Flask, request, jsonify, render_template_string, session
import flask_socketio
from flask_socketio import SocketIO

iport = 5001 # interface port
from google.colab.output import eval_js
print("External link:",end=" ")
print(eval_js(f"google.colab.kernel.proxyPort({iport})"))
from google.colab import output
output.serve_kernel_port_as_iframe(iport)

app = Flask(__name__)
app.logger.info("Starting...") 
socketio = SocketIO(app)
app.secret_key = secrets.token_hex(16)

def log_line() :
  with os.popen('tail -n 1 /content/log.txt') as pse:
    for line in pse:
      return line

try:
    import websockets
except ImportError:
    print("Websockets package not found. Make sure it's installed.")

!fuser -k {iport}/tcp  # close UI port   


#api1
# For local streaming, the websockets are hosted without ssl - ws://
HOST_stream = 'localhost:5005'
URI_stream = f'ws://{HOST_stream}/api/v1/stream'

# For reverse-proxied streaming, the remote will likely host with ssl - wss://
# URI = 'wss://your-uri-here.trycloudflare.com/api/v1/stream'

#api2 for a single-block (non-streamed) response
HOST = 'localhost:5000'
URI = f'http://{HOST}/api/v1/generate'


def generate(prompt, temperature, top_p, typical_p, repetition_penalty, top_k):
    request = {
        'prompt': prompt,
        'max_new_tokens': 1000,
        'do_sample': True,
        'temperature': temperature,
        'top_p': top_p,
        'typical_p': typical_p,
        'repetition_penalty': repetition_penalty,
        'top_k': top_k,
        'min_length': 0,
        'no_repeat_ngram_size': 0,
        'num_beams': 1,
        'penalty_alpha': 0,
        'length_penalty': 1,
        'early_stopping': False,
        'seed': -1,
        'add_bos_token': True,
        'truncation_length': 2048,
        'ban_eos_token': False,
        'skip_special_tokens': True,
        'stopping_strings': []
      # 'custom_stopping_strings': "You:" ##for example 
    }

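    response = requests.post(URI, json=request)

    if response.status_code == 200:
        return response.json()['results'][0]['text']

    # Without this fallback, a failed request would raise UnboundLocalError
    return f"Request failed with HTTP status {response.status_code}"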

async def run(context, temperature, top_p, typical_p, repetition_penalty, top_k):
    request = {
        'prompt': context,
        'max_new_tokens': 1000,
        'do_sample': True,
        'temperature': temperature,
        'top_p': top_p,
        'typical_p': typical_p,
        'repetition_penalty': repetition_penalty,
        'top_k': top_k,
        'min_length': 0,
        'no_repeat_ngram_size': 0,
        'num_beams': 1,
        'penalty_alpha': 0,
        'length_penalty': 1,
        'early_stopping': False,
        'seed': -1,
        'add_bos_token': True,
        'truncation_length': 2048,
        'ban_eos_token': False,
        'skip_special_tokens': True,
        'stopping_strings': []
    }

    async with websockets.connect(URI_stream, ping_interval=None) as websocket:
        await websocket.send(json.dumps(request))

        #yield context  # Remove this if you just want to see the reply

        while True:
            incoming_data = await websocket.recv()
            incoming_data = json.loads(incoming_data)

            match incoming_data['event']:
                case 'text_stream':
                    yield incoming_data['text']
                case 'stream_end':
                    return 

response_apistream =""

async def print_response_stream(prompt, temperature, top_p, typical_p, repetition_penalty, top_k):
    global response_apistream
    async for response in run(prompt, temperature, top_p, typical_p, repetition_penalty, top_k):
        response_apistream += response

def stop_stream():
    stop_url = f'http://{HOST}/api/v1/stop'
    response = requests.post(stop_url)
    if response.status_code == 200:
        print("Stream stopped successfully.")
    else:
        print("Failed to stop the stream.")        

is_stream_running = False

def api_stream(temperature, top_p, typical_p, repetition_penalty, top_k):
    global question_text, is_stream_running
    asyncio.run(print_response_stream(question_text, temperature, top_p, typical_p, repetition_penalty, top_k))
    #is_stream_running = False


def start_api_stream(temperature, top_p, typical_p, repetition_penalty, top_k):
    global is_stream_running
    if not is_stream_running:
        t1 = threading.Thread(target=api_stream, args=(temperature, top_p, typical_p, repetition_penalty, top_k))
        t1.start()
        is_stream_running = True

#example prompt  
question_text ="This is a conversation with your Assistant. The Assistant is very helpful and is eager to chat with you and answer your questions. You: 4 + 102= ? Assistant:"
answer_text = ""

@app.route('/', methods=['GET', 'POST'])
def index():

    if request.method == 'POST':
        button_clicked = 'button_status' in request.form
        stream_enabled = 'stream_enable' in request.form
        temperature = float(request.form.get('temperature', session.get('temperature', '0.7')))
        top_p = float(request.form.get('top_p', session.get('top_p', '0.1')))
        typical_p = float(request.form.get('typical_p', session.get('typical_p', '1')))
        repetition_penalty = float(request.form.get('repetition_penalty', session.get('repetition_penalty', '1.18')))
        top_k = int(request.form.get('top_k', session.get('top_k', '40')))

        global question_text, answer_text, response_apistream, is_stream_running
        question_text = request.form.get('prompt', '')

        if button_clicked and not stream_enabled:
            answer_text = generate(question_text, temperature, top_p, typical_p, repetition_penalty, top_k)

        if response_apistream == answer_text and response_apistream != "":
            button_clicked = False  # Stream end
            is_stream_running = False
            response_apistream = ""

        if button_clicked and stream_enabled and not is_stream_running:
            start_api_stream(temperature, top_p, typical_p, repetition_penalty, top_k)  # Start stream

        #if button_clicked and not stream_enabled and is_stream_running:
          # stop_stream()
          # button_clicked = False  # Stream end
           # is_stream_running = False
           # response_apistream = ""
        
        if button_clicked and stream_enabled:
            
            if response_apistream == "":
                answer_text = "Generating..."
            else:
                answer_text = response_apistream  # Copy stream chunk

        log_text = log_line()

        session['temperature'] = temperature
        session['top_p'] = top_p
        session['typical_p'] = typical_p
        session['repetition_penalty'] = repetition_penalty
        session['top_k'] = top_k

    else:
        #question_text = request.form.get('prompt', '')
        answer_text = ''
        log_text = ''
        stream_enabled = True
        button_clicked = False
        temperature = float(session.get('temperature', '0.7'))
        top_p = float(session.get('top_p', '0.1'))
        typical_p = float(session.get('typical_p', '1'))
        repetition_penalty = float(session.get('repetition_penalty', '1.18'))
        top_k = int(session.get('top_k', '40'))

    return render_template_string('''
<!DOCTYPE html>
<html>
<head>
    <title>Simple Colab UI</title>
    <script src="https://ajax.googleapis.com/ajax/libs/jquery/3.5.1/jquery.min.js"></script>
    <script>
        function updateSliderValue(slider) {
            var sliderId = slider.id;
            var valueElement = document.getElementById(sliderId + 'Value');
            valueElement.innerText = slider.value;
            }

        function simulateButtonClick() {         // Workaround to get data through the Colab web URL, which lacks support for data requests such as JSON.
           if ($("#stream_enable").is(":checked")) {
                document.forms[0].submit();  // Send POST in loop
            }
        }

        function startTimer() {
            setInterval(simulateButtonClick, 2500); // 2.5 seconds
                 // document.forms[0].submit();  // Send POST first time after enabling stream
        }

        $(document).ready(function(){

            $("#loader").hide();   

            $("form").submit(function(event){  // Send Text button function definition 

                $("#button_status").prop("checked", true); // button click    
              //      event.preventDefault(); // Disable sending POST

                $("#loader").show();
                startTime = new Date();
                $("#answer").val("Generating...  0s");
                timer = setInterval(function() {
                    var currentTime = new Date();
                    var elapsedTime = Math.round((currentTime - startTime) / 1000);
                    $("#answer").val("Generating...  " + elapsedTime + "s");                   
                }, 1000);
  
            });

            if ($("#stream_enable").is(":checked") && $("#button_status").is(":checked")) {
                 startTimer(); // run cyclic request for streamdata 
                   } 
        });
    </script>
    <style>
        #loader {
            border: 5px solid #f3f3f3;
            border-top: 5px solid #3498db;
            border-right: 5px solid #3498db;
            border-bottom: 5px solid #f3f3f3;
            border-left: 5px solid #f3f3f3;
            border-radius: 50%;
            width: 10px;
            height: 10px;
            animation: spin 1.5s linear infinite;
        }

        @keyframes spin {
            0% { transform: rotate(0deg); }
            100% { transform: rotate(360deg); }
        }

        .hidden-checkbox {
        position: absolute;
        left: -9999px;
    }
    </style>
</head>
<body>
    <form method="POST">
           <label for="question">Question:</label><br>
        <textarea id="question" name="prompt" cols="160" rows="3">{{ question_text }}</textarea><br>
        <label for="answer">Answer:</label><br>
        <textarea id="answer" name="answer" cols="160" rows="16">{{ answer_text }}</textarea><br>

        <input type="submit" id="sendTextButton" value="Send Text"><br><br>
        <input type="checkbox" id="stream_enable" name="stream_enable" {% if stream_enabled %}checked{% endif %}>
        <label for="stream_enable">Enable Streaming</label><br><br>&nbsp&nbsp

        <label for="temperature">Temperature:</label>
        <input type="range" id="temperature" name="temperature" min="0" max="2" step="0.01" value="{{ temperature }}" oninput="updateSliderValue(this)">
        <span id="temperatureValue">{{ temperature }}</span>&nbsp&nbsp
        <label for="top_p">Top P:</label>
        <input type="range" id="top_p" name="top_p" min="0" max="1" step="0.01" value="{{ top_p }}" oninput="updateSliderValue(this)">
        <span id="top_pValue">{{ top_p }}</span>&nbsp&nbsp
        <label for="typical_p">Typical P:</label>
        <input type="range" id="typical_p" name="typical_p" min="0" max="1" step="0.01" value="{{ typical_p }}" oninput="updateSliderValue(this)">
        <span id="typical_pValue">{{ typical_p }}</span>&nbsp&nbsp
        <label for="repetition_penalty">Repetition Penalty:</label>
        <input type="range" id="repetition_penalty" name="repetition_penalty" min="0" max="1.5" step="0.01" value="{{ repetition_penalty }}" oninput="updateSliderValue(this)">
        <span id="repetition_penaltyValue">{{ repetition_penalty }}</span>&nbsp&nbsp
        <label for="top_k">Top K:</label>
        <input type="range" id="top_k" name="top_k" min="1" max="200" step="1" value="{{ top_k }}" oninput="updateSliderValue(this)">
        <span id="top_kValue">{{ top_k }}</span>&nbsp&nbsp
        <input type="checkbox" id="button_status" name="button_status" class="hidden-checkbox" {% if button_clicked %}checked{% endif %}> <br>   
        <p>{{ log_text }}</p>    
        <div id="loader"></div>
    </form>
</body>
</html>
    ''', question_text=question_text, answer_text=answer_text, log_text=log_text, stream_enabled=stream_enabled, button_clicked=button_clicked, temperature=temperature, top_p=top_p, typical_p=typical_p, repetition_penalty=repetition_penalty, top_k=top_k)

socketio.run(app, port=iport)


External link: https://tn69zi7ee0l-496ff2e9c6d22116-5001-colab.googleusercontent.com/


<IPython.core.display.Javascript object>
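
`generate()` and `run()` in step 6 each spell out the same request dictionary. One way to keep the two in sync is to hold the shared values in one place and merge per-call overrides into a copy. The sketch below assumes the keys and defaults used in the cell above; `DEFAULT_PARAMS` and `build_request` are hypothetical names, not part of the web UI API.

```python
import copy

# Defaults copied from the request dictionaries and form defaults in step 6.
DEFAULT_PARAMS = {
    'max_new_tokens': 1000,
    'do_sample': True,
    'temperature': 0.7,
    'top_p': 0.1,
    'typical_p': 1,
    'repetition_penalty': 1.18,
    'top_k': 40,
    'min_length': 0,
    'no_repeat_ngram_size': 0,
    'num_beams': 1,
    'penalty_alpha': 0,
    'length_penalty': 1,
    'early_stopping': False,
    'seed': -1,
    'add_bos_token': True,
    'truncation_length': 2048,
    'ban_eos_token': False,
    'skip_special_tokens': True,
    'stopping_strings': [],
}


def build_request(prompt, **overrides):
    """Return a fresh request dict: defaults, then overrides, then the prompt."""
    unknown = set(overrides) - set(DEFAULT_PARAMS)
    if unknown:
        raise KeyError(f"Unknown generation parameters: {sorted(unknown)}")
    request = copy.deepcopy(DEFAULT_PARAMS)  # avoid sharing the mutable list
    request.update(overrides)
    request['prompt'] = prompt
    return request


payload = build_request("You: 4 + 102= ? Assistant:", temperature=0.2, top_k=20)
print(payload['temperature'], payload['top_k'], payload['max_new_tokens'])
```

Both the blocking call (`requests.post(URI, json=payload)`) and the websocket call (`websocket.send(json.dumps(payload))`) could then take the same payload.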

In [None]:
#@title  Close main server
#Close main server
!pkill -f -e -c server.py # stop server
!fuser -k 5001/tcp  # close UI port 


In [None]:
#@title Optional: install the "LLaMa" character file I found on Reddit
!wget https://github.com/pcrii/Philo-Colab-Collection/raw/main/llama.json 
!mv llama.json {repo_dir}/characters
