Merge pull request #3 from nassimberrada/main

Restructure docs
unifyai · May 10, 2024 · be975e0 · be975e0
2 parents 98b81b7 + 8218595
commit be975e0

Showing 40 changed files with 802 additions and 876 deletions.
diff --git a/docs/api/deploy_router.rst b/docs/api/deploy_router.rst
@@ -0,0 +1,131 @@
+Deploying a router
+==================
+
+In this section, we'll learn how to use the Unify router through the API.
+
+.. note::
+    If you haven't done so, we recommend you learn how to `make a request <https://unify.ai/docs/api/first_request.html>`_ first to get familiar with using the Unify API.
+
+Using the base router
+---------------------
+
+Optimizing a metric
+^^^^^^^^^^^^^^^^^^^
+
+When making requests, you can leverage the information from the `benchmark interface <https://unify.ai/docs/concepts/benchmarks.html>`_
+to automatically route to the best performing provider for the metric you choose. 
+
+Benchmark values change over time, so dynamically routing ensures you always get the best option without having to monitor the data yourself.
+
+To use the base router, you only need to change the provider name to one of the supported configurations. Currently, we support the following configs:
+
+- :code:`lowest-input-cost` / :code:`input-cost`
+- :code:`lowest-output-cost` / :code:`output-cost`
+- :code:`lowest-itl` / :code:`itl`
+- :code:`lowest-ttft` / :code:`ttft`
+- :code:`highest-tks-per-sec` / :code:`tks-per-sec`
+
+For example, with the Python package, we can route to the endpoint with the lowest TTFT as follows:
+
+.. code-block:: python
+
+    import os
+    from unify import Unify
+
+    # Assuming you added "UNIFY_KEY" to your environment variables. Otherwise you would specify the api_key argument.
+    unify = Unify("mistral-7b-instruct-v0.2@lowest-ttft")
+
+    response = unify.generate("Explain who Newton was and his entire theory of gravitation. Give a long detailed response please and explain all of his achievements")
+
+
+Defining thresholds
+^^^^^^^^^^^^^^^^^^^
+
+Additionally, you have the option to include multiple thresholds for other metrics in each configuration.
+
+This feature enables you to get, for example, the highest tokens per second (:code:`highest-tks-per-sec`) for any provider whose :code:`ttft` is lower than a specific threshold. To set this up, just append :code:`<[float][metric]` to your preferred mode when specifying a provider. To keep things simple, we have added aliases for :code:`output-cost` (:code:`oc`), :code:`input-cost` (:code:`ic`) and :code:`output-tks-per-sec` (:code:`ots`). 
+
+Let's illustrate this with some examples:
+
+- :code:`lowest-itl<0.5input-cost` - In this case, the request will be routed to the provider with the lowest
+  Inter-Token-Latency that has an Input Cost smaller than 0.5 credits per million tokens.
+- :code:`highest-tks-per-sec<1output-cost` - Likewise, in this scenario, the request will be directed to the provider
+  offering the highest Output Tokens per Second, provided its Output Cost is below 1 credit per million tokens.
+- :code:`ttft<0.5ic<15itl` - This is similar to the first example, but it routes on :code:`ttft`, uses :code:`ic` as
+  an alias for :code:`input-cost`, and adds :code:`<15itl` to only consider endpoints
+  that have an Inter-Token-Latency of less than 15 ms.
+
+Depending on the specified threshold, there might be scenarios where no providers meet the criteria,
+rendering the request unfulfillable. In such cases, the API response will be a 404 error with the corresponding
+explanation. You can detect this and fall back to a different policy by doing something like:
+
+
+.. code-block:: python
+
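+    import requests
+
+    # Query the OpenAI-compatible endpoint directly so the HTTP status code is visible.
+    url = "https://api.unify.ai/v0/chat/completions"
+    headers = {"Authorization": "Bearer YOUR_UNIFY_KEY"}
+
+    prompt = "Explain who Newton was and his entire theory of gravitation. Give a long detailed response please and explain all of his achievements"
+
+    payload = {
+        # This won't work since no provider has this price! (yet?)
+        "model": "mistral-7b-instruct-v0.2@lowest-itl<0.001ic",
+        "messages": [{"role": "user", "content": prompt}],
+    }
+
+    response = requests.post(url, json=payload, headers=headers)
+
+    if response.status_code == 404:
+        # We'll get the cheapest endpoint as a fallback
+        payload["model"] = "mistral-7b-instruct-v0.2@lowest-input-cost"
+        response = requests.post(url, json=payload, headers=headers)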
+
+
+.. raw:: html
+
+    <table align="center">
+      <tr>
+        <td>
+          <div>
+              <iframe width="420" height="315" allow="fullscreen;"
+              src="https://www.youtube.com/embed/6T3jMwKfM7k?si=8bcLPXN1yUXjS4ND" class="video">
+              </iframe>
+          </div>
+        </td>
+        <td>
+          <div>
+              <iframe width="420" height="315" allow="fullscreen;"
+                src="https://www.youtube.com/embed/pul7fklQTZQ?si=HQwOm8C31ASuIC8o" class="video">
+              </iframe>
+          </div>
+        </td>
+        <td>
+          <div>
+            <iframe width="420" height="315" allow="fullscreen;"
+              src="https://www.youtube.com/embed/SBwr32iSU8Q?si=Rj3xknJEg0765Psb" class="video">
+            </iframe>            
+          </div>
+        </td>
+      </tr>
+    </table>
+
+
+Using a custom router
+---------------------
+
+If you `trained a custom router <https://unify.ai/docs/interfaces/build_router.html>`_, you can deploy it with the Unify API much like any other endpoint. To send prompts to the custom router we trained before, pass its configuration Id to the same client call as follows:
+
+.. code-block:: python
+
+    import os
+    from unify import Unify
+
+    # Assuming you added "UNIFY_KEY" to your environment variables. Otherwise you would specify the api_key argument.
+    unify = Unify("gpt-claude-llama3-calls->no-anthropic_8.28e-03_4.66e-0.4_1.00e-06@custom")
+
+    response = unify.generate("Explain who Newton was and his entire theory of gravitation. Give a long detailed response please and explain all of his achievements")
+
+.. note::
+    You can also query the API with a cURL request, among others, just as explained on the first request page.
+
+Round Up
+--------
+
+That’s it! You now know how to deploy a router to send your prompts to the best endpoints for the metrics or tasks you care about. You can now start optimizing your LLM applications!
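A minimal sketch of how the router strings described in ``deploy_router.rst`` above could be assembled programmatically. The helper ``build_router_endpoint`` is hypothetical and not part of the ``unifyai`` package; it only concatenates the model Id, the routing mode, and any ``<[float][metric]`` thresholds into the string that the docs pass to ``Unify(...)``.

```python
# Hypothetical helper for illustration only; it is not provided by unifyai.
def build_router_endpoint(model, mode, thresholds=()):
    """Build a 'model@mode<value metric' string following the documented syntax.

    thresholds is an iterable of (value, metric) pairs, for example
    [(0.5, "ic"), (15, "itl")]. Metric names and aliases are passed through
    unchanged; this sketch does not validate them.
    """
    # Each threshold becomes "<" + number + metric, e.g. "<0.5ic".
    suffix = "".join(f"<{value:g}{metric}" for value, metric in thresholds)
    return f"{model}@{mode}{suffix}"


# Strings matching the examples given in the docs.
no_threshold = build_router_endpoint("mistral-7b-instruct-v0.2", "lowest-ttft")
one_threshold = build_router_endpoint(
    "mistral-7b-instruct-v0.2", "lowest-itl", [(0.5, "input-cost")]
)
two_thresholds = build_router_endpoint(
    "mistral-7b-instruct-v0.2", "ttft", [(0.5, "ic"), (15, "itl")]
)
```

The resulting string would then be used wherever the docs pass an endpoint, for example as the argument to ``Unify(...)`` or as the ``model`` field of a request payload.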
diff --git a/docs/home/make_your_first_request.rst → docs/api/first_request.rst b/docs/home/make_your_first_request.rst → docs/api/first_request.rst
@@ -1,52 +1,74 @@
-Make your First Request
-=======================
+Making your first request
+=========================
 
-To make a request, you will need a:
+In this section, you will learn how to use the Unify API to query and route across LLM endpoints. If you haven't done so already, start by `Signing Up <https://console.unify.ai>`_ to get your API key.
 
-#. **Unify API Key**. If you don't have one yet, log in to the `console <https://console.unify.ai/>`_ to get yours.
+Getting a key
+-------------
 
-#. **Model and Provider ID**. Used to identify an endpoint. You can find both in the `benchmark interface. <https://unify.ai/hub>`_ 
+When opening the console, you will first be greeted with the :code:`API` page. This is where you'll find your API key. There, you will also find useful links to our interfaces, where you can interact with the endpoints and the benchmarks, in no-code environments.
 
-For this example, we'll use the :code:`llama-2-70b-chat` model, hosted on :code:`anyscale`. We grabbed both IDs from the corresponding `model page <https://unify.ai/hub/llama-2-70b-chat>`_
+.. image:: ../images/console_api.png
+  :align: center
+  :width: 650
+  :alt: Console API.
+
+.. note::
+    If you suspect your API key was leaked in some way, you can safely regenerate it through this page. You then only need to replace the old key with the new one in your workflows; your balance and account settings stay the same.
+
+Finding a model and provider
+----------------------------
+
+To query an endpoint you will need to specify the model Id and provider Id, both used to identify the endpoint. You can find the Ids for a given model and provider through the model pages on the `benchmark interface <https://unify.ai/benchmarks>`_.
+
+Going through one of the pages, the model Id can be copied from the model name at the top, and the provider Id can be copied from the corresponding row of the table. For example, the model page for **Mistral 7B V2** below shows that the model Id is :code:`mistral-7b-instruct-v0.2`. If you wanted to query the **Fireworks AI** endpoint you would then use :code:`fireworks-ai` as the provider name.
+
+.. image:: ../images/benchmarks_model_page.png
+  :align: center
+  :width: 650
+  :alt: Benchmarks Model Page.
+
+.. note::
+    If you `uploaded a custom endpoint <https://unify.ai/docs/interfaces/connecting_stack.html>`_ then you should be able to query it through the API using the name as the model Id and the provider name as the provider Id. 
+
+Querying an endpoint
+--------------------
 
 Using the Python Package
-------------------------------------
-The easiest way to query these endpoints is using the `unifyai <https://pypi.org/project/unifyai/>`_ Python package.  You can install it doing:
+^^^^^^^^^^^^^^^^^^^^^^^^
 
+The easiest way to use the Unify API is through the `unifyai <https://pypi.org/project/unifyai/>`_ Python package.  You can install it by doing:
 
 .. code-block:: bash
 
     pip install unifyai
 
-To use it in your script, import the package and initialize a :code:`Unify` client with your :code:`UNIFY API KEY`.
-You can then query any endpoint through the :code:`.generate` method.
-To specify the endpoint, you will need a :code:`model` and a :code:`provider`. 
+To use it in your script, import the package and initialize a :code:`Unify` client with your :code:`UNIFY API KEY`. You can then query any endpoint through the :code:`.generate` method. To specify the endpoint, you can use the model and provider Ids from above. 
 
 .. code-block:: python
 
     import os
     from unify import Unify
 
-    unify = Unify(
-        api_key=os.environ.get("UNIFY_KEY"),
-        endpoint="llama-2-7b-chat@anyscale",
-    )
+    # Assuming you added "UNIFY_KEY" to your environment variables. Otherwise you would specify the api_key argument.
+    unify = Unify("mistral-7b-instruct-v0.2@fireworks-ai")
 
-    response = unify.generate(user_prompt="Explain who Newton was and his entire theory of gravitation. Give a long detailed response please and explain all of his achievements")
+    response = unify.generate("Explain who Newton was and his entire theory of gravitation. Give a long detailed response please and explain all of his achievements")
 
 This will return a string containing the model's response.
 
-The Python package supports both synchronous and asynchronous clients, as well as streaming responses.
-Check out the `package repo <https://github.com/unifyai/unify-llm-python?tab=readme-ov-file#unify-python-api-library>`_ for more information!
-
+.. note::
+    The Python package also lets you list the available models, and the providers for a given model, with a couple of lines of code. You just need to run
+    :code:`unify.list_models()` to get a list of models and :code:`unify.list_providers("mistral-7b-instruct-v0.2")` to get the providers for a given model.
 
+In addition, the Python package supports both synchronous and asynchronous clients, as well as streaming responses. Check out the `package repo <https://github.com/unifyai/unify-llm-python?tab=readme-ov-file#unify-python-api-library>`_ to learn more!
 
 Using the :code:`inference` Endpoint
-------------------------------------
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
-All models can be queried through the :code:`inference` endpoint, which requires a :code:`model`, :code:`provider`, and model :code:`arguments` that may vary across models. 
+All models can be queried through the :code:`inference` endpoint, which also requires a :code:`model` Id, :code:`provider` Id, and model :code:`arguments` that may vary across models. 
 
-In the header, you will need to include the **Unify API Key** that is associated with your account.
+In the header, you will need to include your :code:`Unify API Key`.
 
 .. note::
     Like any HTTP POST request, you can interact with the API using your preferred language!
@@ -60,8 +82,8 @@ Using **cURL**, the request would look like this:
         -H "Authorization: Bearer YOUR_UNIFY_KEY" \
         -H "Content-Type: application/json" \
         -d '{
-            "model": "llama-2-70b-chat",
-            "provider": "anyscale",
+            "model": "mistral-7b-instruct-v0.2",
+            "provider": "fireworks-ai",
             "arguments": {
                 "messages": [{
                     "role": "user",
@@ -85,8 +107,8 @@ If you are using **Python**, you can use the :code:`requests` library to query t
     }
 
     payload = {
-        "model": "llama-2-70b-chat",
-        "provider": "anyscale",
+        "model": "mistral-7b-instruct-v0.2",
+        "provider": "fireworks-ai",
         "arguments": {
             "messages": [{
                 "role": "user",
@@ -109,18 +131,17 @@ If you are using **Python**, you can use the :code:`requests` library to query t
     else:
         print(response.text)
 
-Check out the API reference `here. <https://unify.ai/docs/hub/reference/endpoints.html#post-query>`_ to learn more.
+Check out the `API reference <https://unify.ai/docs/api/reference.html>`_ to learn more.
 
 Using the OpenAI API Format
----------------------------
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
-We also support the OpenAI API format for :code:`text-generation` models. More specifically, the :code:`/chat/completions` endpoint.
+We also support the OpenAI API format for :code:`text-generation` models. Specifically, the :code:`/chat/completions` endpoint.
 
 This API format wouldn't normally allow you to choose between providers for a given model. To bypass this limitation, the model
 name should have the format :code:`<uploaded_by>/<model_name>@<provider_name>`. 
 
-For example, if :code:`john_doe` uploads a :code:`llama-2-70b-chat` model and we want to query the endpoint that has been deployed in replicate, we would have to use :code:`john_doe/llama-2-70b-chat@replicate` as the model id in the OpenAI API. In this case, there is no username, so we will
-simply use :code:`llama-2-70b-chat@replicate`.
+For example, if :code:`john_doe` uploads a :code:`mistral-7b-instruct-v0.2` model and we want to query the endpoint that has been deployed on :code:`fireworks-ai`, we would have to use :code:`john_doe/mistral-7b-instruct-v0.2@fireworks-ai` as the model Id in the OpenAI API. In this case, there is no username, so we will simply use :code:`mistral-7b-instruct-v0.2@fireworks-ai`.
 
 This is again just an HTTP endpoint, so you can query it using any language or tool. For example, **cURL**:
 
@@ -132,7 +153,7 @@ This is again just an HTTP endpoint, so you can query it using any language or t
         -H 'Authorization: Bearer YOUR_UNIFY_KEY' \
         -H 'Content-Type: application/json' \
         -d '{
-        "model": "llama-2-70b-chat@anyscale",
+        "model": "mistral-7b-instruct-v0.2@fireworks-ai",
             "messages": [{
                 "role": "user",
                 "content": "Explain who Newton was and his entire theory of gravitation. Give a long detailed response please and explain all of his achievements"
@@ -152,52 +173,7 @@ Or **Python**:
     }
 
     payload = {
-        "model": "llama-2-70b-chat@anyscale",
-        "messages": [
-            {
-                "role": "user",
-                "content": "Explain who Newton was and his entire theory of gravitation. Give a long detailed response please and explain all of his achievements"
-            }],
-        "stream": True
-    }
-
-    response = requests.post(url, json=payload, headers=headers, stream=True)
-
-    print(response.status_code)
-
-    if response.status_code == 200:
-        for chunk in response.iter_content(chunk_size=1024):
-            if chunk:
-                print(chunk.decode("utf-8"))
-    else:
-        print(response.text)
-
-The docs for this endpoint are available `here. <https://unify.ai/docs/hub/reference/endpoints.html#post-chat-completions>`_
-
-Runtime Dynamic Routing
------------------------
-
-When making requests, you can also leverage the information from the `benchmarks <https://unify.ai/docs/hub/concepts/benchmarks.html>`_
-to automatically route to the best performing provider for the metric you choose. 
-
-Benchmark values change over time, so dynamically routing ensures you always get the best option without having to monitor the data yourself.
-
-To use the router, you only need to change the provier name to one of the supported configurations, including :code:`lowest-input-cost`, :code:`highest-tks-per-sec` or :code:`lowest-ttft`. You can check out the full list `here <https://unify.ai/docs/hub/concepts/runtime_routing.html#available-modes>`_.
-
-If you are using the :code:`chat/completions` endpoint, this will look like:
-
-.. code-block:: python
-    :emphasize-lines: 9
-
-    import requests
-
-    url = "https://api.unify.ai/v0/chat/completions"
-    headers = {
-        "Authorization": "Bearer YOUR_UNIFY_KEY",
-    }
-
-    payload = {
-        "model": "llama-2-70b-chat@lowest-input-cost",
+        "model": "mistral-7b-instruct-v0.2@fireworks-ai",
         "messages": [
             {
                 "role": "user",
@@ -217,11 +193,10 @@ If you are using the :code:`chat/completions` endpoint, this will look like:
     else:
         print(response.text)
 
-You can learn more about about dynamic routing in the corresponding `page of the docs <https://unify.ai/docs/hub/concepts/runtime_routing.html>`_.
+The docs for this endpoint are available `here <https://unify.ai/docs/api/reference.html>`_.
 
 Compatible Tools
-----------------
-
+^^^^^^^^^^^^^^^^
 Thanks to the OpenAI-compatible endpoint, you can easily integrate with lots of LLM tools. For example:
 
 OpenAI SDK
@@ -240,7 +215,7 @@ If your code is using the `OpenAI SDK <https://github.com/openai/openai-python>`
     )
 
     stream = client.chat.completions.create(
-        model="llama-2-70b-chat@anyscale",
+        model="mistral-7b-instruct-v0.2@fireworks-ai",
         messages=[{"role": "user", "content": "Can you say that this is a test? Use some words to showcase the streaming function"}],
         stream=True,
     )
@@ -263,8 +238,13 @@ Let's take a look at this code snippet:
     interpreter.offline = True
     interpreter.llm.api_key = "YOUR_UNIFY_KEY"
     interpreter.llm.api_base = "https://api.unify.ai/v0/"
-    interpreter.llm.model = "openai/llama-2-70b-chat@anyscale"
+    interpreter.llm.model = "openai/mistral-7b-instruct-v0.2@fireworks-ai"
 
     interpreter.chat()
 
 In this case, in order to use the :code:`/chat/completions` format, we simply need to set the model as :code:`openai/<insert_model>`!
+
+Round Up
+--------
+
+You now know how to query LLM endpoints through the Unify API. In the next section, you will learn how to use the API to route across endpoints.
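As a small illustration of the ``<uploaded_by>/<model_name>@<provider_name>`` naming rule described in ``first_request.rst`` above, the sketch below builds the model Id expected by the OpenAI-compatible endpoint. The function ``openai_model_name`` is a hypothetical helper introduced here for illustration; it is not part of the ``unifyai`` package or the OpenAI SDK.

```python
# Hypothetical helper for illustration only.
def openai_model_name(model_name, provider_name, uploaded_by=None):
    """Return the model Id in '<uploaded_by>/<model_name>@<provider_name>' form.

    When there is no uploader username, the prefix is omitted and the
    result is simply '<model_name>@<provider_name>'.
    """
    endpoint = f"{model_name}@{provider_name}"
    if uploaded_by:
        return f"{uploaded_by}/{endpoint}"
    return endpoint


# The two cases discussed in the docs.
public_name = openai_model_name("mistral-7b-instruct-v0.2", "fireworks-ai")
uploaded_name = openai_model_name(
    "mistral-7b-instruct-v0.2", "fireworks-ai", uploaded_by="john_doe"
)
```

Either string could then be supplied as the ``model`` value in the ``/chat/completions`` payloads shown in the docs.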
diff --git a/docs/reference/images.rst → docs/api/images.rst b/docs/reference/images.rst → docs/api/images.rst
diff --git a/docs/reference/endpoints.rst → docs/api/reference.rst b/docs/reference/endpoints.rst → docs/api/reference.rst
@@ -1,17 +1,16 @@
-Endpoints
-=========
+API Reference
+=============
 
 Welcome to the Endpoints API reference!
 This page is your go-to resource when it comes to learning about the different Unify API endpoints you can interact with.
 
 .. note::
-  To use the endpoints you will need an API Key. If you don't have one yet, you can go through the instructions in
-  `this page <https://unify.ai/docs/hub/index.html>`_.
+  If you don't have an API key yet, `Sign Up <https://console.unify.ai>`_ first to get one.
 
 -----
 
 GET /get_credits
------------
+----------------
 
 **Get Current Credit Balance**