diff --git a/docs/api/deploy_router.rst b/docs/api/deploy_router.rst new file mode 100644 index 0000000..a7eb731 --- /dev/null +++ b/docs/api/deploy_router.rst @@ -0,0 +1,131 @@ +Deploying a router +================== + +In this section, we'll learn how to use the Unify router through the API. + +.. note:: + If you haven't done so, we recommend you learn how to `make a request `_ first to get familiar with using the Unify API. + +Using the base router +--------------------- + +Optimizing a metric +^^^^^^^^^^^^^^^^^^^ + +When making requests, you can leverage the information from the `benchmark interface `_ +to automatically route to the best performing provider for the metric you choose. + +Benchmark values change over time, so dynamically routing ensures you always get the best option without having to monitor the data yourself. + +To use the base router, you only need to change the provier name to one of the supported configurations. Currently, we support the following configs: + +- :code:`lowest-input-cost` / :code:`input-cost` +- :code:`lowest-output-cost` / :code:`output-cost` +- :code:`lowest-itl` / :code:`itl` +- :code:`lowest-ttft` / :code:`ttft` +- :code:`highest-tks-per-sec` / :code:`tks-per-sec` + +For e.g, with the Python package, we can route to the lowest TTFT endpoints as follows: + +.. code-block:: python + + import os + from unify import Unify + + # Assuming you added "UNIFY_KEY" to your environment variables. Otherwise you would specify the api_key argument. + unify = Unify("mistral-7b-instruct-v0.2@lowest-ttft") + + response = unify.generate("Explain who Newton was and his entire theory of gravitation. Give a long detailed response please and explain all of his achievements") + + +Defining thresholds +^^^^^^^^^^^^^^^^^^^ + +Additionally, you have the option to include multiple thresholds for other metrics in each configuration. + +This feature enables you to get, for example, the highest tokens per second (:code:`highest-tks-per-sec`) for any provider whose :code:`ttft` is lower than a specific threshold. To set this up, just append :code:`<[float][metric]` to your preferred mode when specifying a provider. To keep things simple, we have added aliases for :code:`output-cost` (:code:`oc`), :code:`input-cost` (:code:`ic`) and :code:`output-tks-per-sec` (:code:`ots`). + +Let's illustrate this with some examples: + +- :code:`lowest-itl<0.5input-cost` - In this case, the request will be routed to the provider with the lowest + Inter-Token-Latency that has an Input Cost smaller than 0.5 credits per million tokens. +- :code:`highest-tks-per-sec<1output-cost` - Likewise, in this scenario, the request will be directed to the provider + offering the highest Output Tokens per Second, provided their cost is below 1 credit per million tokens. +- :code:`ttft<0.5ic<15itl` - Now we have something similar to the first example, but we are using :code:`ic` as + an alias to :code:`input-cost`, and we have also added :code:`<15itl` to only consider endpoints + that have an Inter-Token-Latency of less than 15 ms. + +Depending on the specified threshold, there might be scenarios where no providers meet the criteria, +rendering the request unfulfillable. In such cases, the API response will be a 404 error with the corresponding +explanation. You can detect this and change your policy doing something like: + + +.. code-block:: python + + import os + from unify import Unify + + prompt = "Explain who Newton was and his entire theory of gravitation. 
Give a long detailed response please and explain all of his achievements"
+
+    # This won't work since no provider has this price! (yet?)
+    unify = Unify("mistral-7b-instruct-v0.2@lowest-itl<0.001ic")
+
+    try:
+        response = unify.generate(prompt)
+    except Exception:
+        # No provider satisfied the constraint, so the API returned a 404 error
+        # (assuming the client surfaces this as an exception).
+        # We fall back to the cheapest endpoint instead.
+        unify = Unify("mistral-7b-instruct-v0.2@lowest-input-cost")
+        response = unify.generate(prompt)
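+
+If you are calling the HTTP API directly rather than using the Python client, you can check the 404 status code explicitly. Below is a minimal sketch using the :code:`requests` library against the :code:`chat/completions` endpoint, mirroring the payload format shown in the first request guide.
+
+.. code-block:: python
+
+    import requests
+
+    url = "https://api.unify.ai/v0/chat/completions"
+    headers = {"Authorization": "Bearer YOUR_UNIFY_KEY"}
+
+    payload = {
+        # This won't work since no provider has this price! (yet?)
+        "model": "mistral-7b-instruct-v0.2@lowest-itl<0.001ic",
+        "messages": [{"role": "user", "content": "Hello!"}],
+    }
+
+    response = requests.post(url, json=payload, headers=headers)
+    if response.status_code == 404:
+        # No provider met the threshold; fall back to the cheapest endpoint.
+        payload["model"] = "mistral-7b-instruct-v0.2@lowest-input-cost"
+        response = requests.post(url, json=payload, headers=headers)
+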
+ + +Using a custom router +--------------------- + +If you `trained a custom router `_, you can deploy it with the Unify API much like using any other endpoint. Assuming we want to deploy the custom router we trained before, we can use the configuration Id in the same API call code to send our prompts to our custom router as follows: + +.. code-block:: python + + import os + from unify import Unify + + # Assuming you added "UNIFY_KEY" to your environment variables. Otherwise you would specify the api_key argument. + unify = Unify("gpt-claude-llama3-calls->no-anthropic_8.28e-03_4.66e-0.4_1.00e-06@custom”) + + response = unify.generate("Explain who Newton was and his entire theory of gravitation. Give a long detailed response please and explain all of his achievements") + +.. note:: + You can also query the API with a CuRL request, among others. Just like explained in the first request page. + +Round Up +-------- + +That’s it! You now know how to deploy a router to send your prompts to the best endpoints for the metrics or tasks you care about. You can now start optimizing your LLM applications! diff --git a/docs/home/make_your_first_request.rst b/docs/api/first_request.rst similarity index 53% rename from docs/home/make_your_first_request.rst rename to docs/api/first_request.rst index 041b312..3ac4a5b 100644 --- a/docs/home/make_your_first_request.rst +++ b/docs/api/first_request.rst @@ -1,52 +1,74 @@ -Make your First Request -======================= +Making your first request +========================= -To make a request, you will need a: +In this section, you will learn how to use the Unify API to query and route across LLM endpoints. If you haven't done so already, start by `Signing Up `_ to get your API key. -#. **Unify API Key**. If you don't have one yet, log in to the `console `_ to get yours. +Getting a key +------------- -#. **Model and Provider ID**. Used to identify an endpoint. You can find both in the `benchmark interface. `_ +When opening the console, you will first be greeted with the :code:`API` page. This is where you'll find your API key. There, you will also find useful links to our interfaces, where you can interact with the endpoints and the benchmarks, in no-code environments. -For this example, we'll use the :code:`llama-2-70b-chat` model, hosted on :code:`anyscale`. We grabbed both IDs from the corresponding `model page `_ +.. image:: ../images/console_api.png + :align: center + :width: 650 + :alt: Console API. + +.. note:: + If you suspect your API key was leaked in some way, you can safely regenerate it through this page. You would then only need to replace the old key with the new one in your workflows with the same balance and account settings as before. + +Finding a model and provider +---------------------------- + +To query an endpoint you will need to specify the model Id and provider Id, both used to identify the endpoint. You can find the Ids for a given model and provider through the model pages on the `benchmark interface. `_ + +Going through one of the pages, the model Id can be copied from the model name at the top, and the provider Id can be copied from the corresponding rows on the table. For e.g, the model page for **Mistral 7B V2** below shows that the model Id is :code:`mistral-7b-instruct-v0.2`. If you wanted to query the **Fireworks AI** endpoint you would then use :code:`fireworks-ai` as the provider name. + +.. image:: ../images/benchmarks_model_page.png + :align: center + :width: 650 + :alt: Benchmarks Model Page. + +.. 
note:: + If you `uploaded a custom endpoint `_ then you should be able to query it through the API using the name as the model Id and the provider name as the provider Id. + +Querying an endpoint +-------------------- Using the Python Package ------------------------------------- -The easiest way to query these endpoints is using the `unifyai `_ Python package. You can install it doing: +^^^^^^^^^^^^^^^^^^^^^^^^ +The easiest way to use the Unify API is through the `unifyai `_ Python package. You can install it by doing: .. code-block:: bash pip install unifyai -To use it in your script, import the package and initialize a :code:`Unify` client with your :code:`UNIFY API KEY`. -You can then query any endpoint through the :code:`.generate` method. -To specify the endpoint, you will need a :code:`model` and a :code:`provider`. +To use it in your script, import the package and initialize a :code:`Unify` client with your :code:`UNIFY API KEY`. You can then query any endpoint through the :code:`.generate` method. To specify the endpoint, you can use the model and provider Ids from above. .. code-block:: python import os from unify import Unify - unify = Unify( - api_key=os.environ.get("UNIFY_KEY"), - endpoint="llama-2-7b-chat@anyscale", - ) + # Assuming you added "UNIFY_KEY" to your environment variables. Otherwise you would specify the api_key argument. + unify = Unify("mistral-7b-instruct-v0.2@fireworks-ai") - response = unify.generate(user_prompt="Explain who Newton was and his entire theory of gravitation. Give a long detailed response please and explain all of his achievements") + response = unify.generate("Explain who Newton was and his entire theory of gravitation. Give a long detailed response please and explain all of his achievements") This will return a string containing the model's response. -The Python package supports both synchronous and asynchronous clients, as well as streaming responses. -Check out the `package repo `_ for more information! - +.. note:: + The Python package also lets you access the list of models and providers for a given model with a couple lines of code. You just need to run + :code:`unify.list_models()` to get a list of models and :code:`unify.list_providers("mistral-7b-instruct-v0.2")` to get the providers for a given model. +In addition, the Python package supports both synchronous and asynchronous clients, as well as streaming responses. Check out the `package repo `_ to learn more! Using the :code:`inference` Endpoint ------------------------------------- +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -All models can be queried through the :code:`inference` endpoint, which requires a :code:`model`, :code:`provider`, and model :code:`arguments` that may vary across models. +All models can be queried through the :code:`inference` endpoint, which also requires a :code:`model` Id, :code:`provider` Id, and model :code:`arguments` that may vary across models. -In the header, you will need to include the **Unify API Key** that is associated with your account. +In the header, you will need to include your :code:`Unify API Key`. .. note:: Like any HTTP POST request, you can interact with the API using your preferred language! 
@@ -60,8 +82,8 @@ Using **cURL**, the request would look like this: -H "Authorization: Bearer YOUR_UNIFY_KEY" \ -H "Content-Type: application/json" \ -d '{ - "model": "llama-2-70b-chat", - "provider": "anyscale", + "model": "mistral-7b-instruct-v0.2", + "provider": "fireworks-ai", "arguments": { "messages": [{ "role": "user", @@ -85,8 +107,8 @@ If you are using **Python**, you can use the :code:`requests` library to query t } payload = { - "model": "llama-2-70b-chat", - "provider": "anyscale", + "model": "mistral-7b-instruct-v0.2", + "provider": "fireworks-ai", "arguments": { "messages": [{ "role": "user", @@ -109,18 +131,17 @@ If you are using **Python**, you can use the :code:`requests` library to query t else: print(response.text) -Check out the API reference `here. `_ to learn more. +Check out the `API reference `_ to learn more. Using the OpenAI API Format ---------------------------- +^^^^^^^^^^^^^^^^^^^^^^^^^^^ -We also support the OpenAI API format for :code:`text-generation` models. More specifically, the :code:`/chat/completions` endpoint. +We also support the OpenAI API format for :code:`text-generation` models. Specifically, the :code:`/chat/completions` endpoint. This API format wouldn't normally allow you to choose between providers for a given model. To bypass this limitation, the model name should have the format :code:`/@`. -For example, if :code:`john_doe` uploads a :code:`llama-2-70b-chat` model and we want to query the endpoint that has been deployed in replicate, we would have to use :code:`john_doe/llama-2-70b-chat@replicate` as the model id in the OpenAI API. In this case, there is no username, so we will -simply use :code:`llama-2-70b-chat@replicate`. +For example, if :code:`john_doe` uploads a :code:`mistral-7b-instruct-v0.2` model and we want to query the endpoint that has been deployed in :code:`fireworks-ai` replicate, we would have to use :code:`john_doe/mistral-7b-instruct-v0.2@fireworks-ai` as the model Id in the OpenAI API. In this case, there is no username, so we will simply use :code:`mistral-7b-instruct-v0.2@fireworks-ai`. This is again just an HTTP endpoint, so you can query it using any language or tool. For example, **cURL**: @@ -132,7 +153,7 @@ This is again just an HTTP endpoint, so you can query it using any language or t -H 'Authorization: Bearer YOUR_UNIFY_KEY' \ -H 'Content-Type: application/json' \ -d '{ - "model": "llama-2-70b-chat@anyscale", + "model": "mistral-7b-instruct-v0.2@fireworks-ai", "messages": [{ "role": "user", "content": "Explain who Newton was and his entire theory of gravitation. Give a long detailed response please and explain all of his achievements" @@ -152,52 +173,7 @@ Or **Python**: } payload = { - "model": "llama-2-70b-chat@anyscale", - "messages": [ - { - "role": "user", - "content": "Explain who Newton was and his entire theory of gravitation. Give a long detailed response please and explain all of his achievements" - }], - "stream": True - } - - response = requests.post(url, json=payload, headers=headers, stream=True) - - print(response.status_code) - - if response.status_code == 200: - for chunk in response.iter_content(chunk_size=1024): - if chunk: - print(chunk.decode("utf-8")) - else: - print(response.text) - -The docs for this endpoint are available `here. `_ - -Runtime Dynamic Routing ------------------------ - -When making requests, you can also leverage the information from the `benchmarks `_ -to automatically route to the best performing provider for the metric you choose. 
- -Benchmark values change over time, so dynamically routing ensures you always get the best option without having to monitor the data yourself. - -To use the router, you only need to change the provier name to one of the supported configurations, including :code:`lowest-input-cost`, :code:`highest-tks-per-sec` or :code:`lowest-ttft`. You can check out the full list `here `_. - -If you are using the :code:`chat/completions` endpoint, this will look like: - -.. code-block:: python - :emphasize-lines: 9 - - import requests - - url = "https://api.unify.ai/v0/chat/completions" - headers = { - "Authorization": "Bearer YOUR_UNIFY_KEY", - } - - payload = { - "model": "llama-2-70b-chat@lowest-input-cost", + "model": "mistral-7b-instruct-v0.2@fireworks-ai", "messages": [ { "role": "user", @@ -217,11 +193,10 @@ If you are using the :code:`chat/completions` endpoint, this will look like: else: print(response.text) -You can learn more about about dynamic routing in the corresponding `page of the docs `_. +The docs for this endpoint are available `here. ` ) stream = client.chat.completions.create( - model="llama-2-70b-chat@anyscale", + model="mistral-7b-instruct-v0.2@fireworks-ai", messages=[{"role": "user", "content": "Can you say that this is a test? Use some words to showcase the streaming function"}], stream=True, ) @@ -263,8 +238,13 @@ Let's take a look at this code snippet: interpreter.offline = True interpreter.llm.api_key = "YOUR_UNIFY_KEY" interpreter.llm.api_base = "https://api.unify.ai/v0/" - interpreter.llm.model = "openai/llama-2-70b-chat@anyscale" + interpreter.llm.model = "openai/mistral-7b-instruct-v0.2@fireworks-ai" interpreter.chat() In this case, in order to use the :code:`/chat/completions` format, we simply need to set the model as :code:`openai/`! + +Round Up +-------- + +You now know how to query LLM endpoints through the Unify API. In the next section, you will learn how to use the API to route across endpoints. diff --git a/docs/reference/images.rst b/docs/api/images.rst similarity index 100% rename from docs/reference/images.rst rename to docs/api/images.rst diff --git a/docs/reference/endpoints.rst b/docs/api/reference.rst similarity index 96% rename from docs/reference/endpoints.rst rename to docs/api/reference.rst index ad5192e..b4d395d 100644 --- a/docs/reference/endpoints.rst +++ b/docs/api/reference.rst @@ -1,17 +1,16 @@ -Endpoints -========= +API Reference +============= Welcome to the Endpoints API reference! This page is your go-to resource when it comes to learning about the different Unify API endpoints you can interact with. .. note:: - To use the endpoints you will need an API Key. If you don't have one yet, you can go through the instructions in - `this page `_. + If you don't have one yet, `Sign Up `_ first to get your API key. ----- GET /get_credits ------------ +---------------- **Get Current Credit Balance** diff --git a/docs/concepts/benchmarks.rst b/docs/concepts/benchmarks.rst index 946aae1..50c4ed5 100644 --- a/docs/concepts/benchmarks.rst +++ b/docs/concepts/benchmarks.rst @@ -1,16 +1,73 @@ Benchmarks ========== -One of the fundamental missions of Unify is to help you navigate through the maze of LLM deployment options and find the best solution for your needs. Because this is a complex decision, it needs to be made based on data. For this data to be reliable, it should also result from transparent and objective measurements, which we outline in this section. +In this section, we explain our process for benchmarking LLM endpoints. 
We discuss quality and runtime benchmarks separately. + +Quality Benchmarks +------------------ + +Finding the best LLM(s) for a given application can be challenging. The performance of a model can vary significantly depending on the task, dataset, and evaluation metrics used. Existing benchmarks attempt to compare models based on standardized approaches, but biases inevitably creep in as models learn to do well on these targeted assessments. + +Practically, the LLM community still heavily relies on testing models manually to build an intuition around their expected behavior for a given use-case. While this generally works better, hand-crafted testing isn't sustainable as one's needs evolve and new LLMs emerge at a rapid pace. +Our LLM assessment pipeline is based on the method outlined below. + +Design Principles +^^^^^^^^^^^^^^^^^ + +Our quality benchmarks are based on a set of guiding principles. Specifically, we strive to make our pipeline: + +- **Systematized:** A rigorous benchmarking pipeline should be standardized across assessments, repeatable, and scalable. We make sure to benchmark all LLMs identically to with a well-defined approach we outline in the next passage. + +- **Task-centric:** Models perform differently on various tasks. Some might do better at coding, others are well suited for summarizing content, etc. These broad task categories can also be refined into specific subtasks. For e.g summarizing technical content to generate product documentation is radically different from summarizing news. This should be reflected in assessments. For this reason, we allow you to upload your custom prompt dataset, that you believe reflects the intended task, to use as a reference for running benchmarks. + +- **Customizable:** Assessments should reflect the unique needs of the assessor. Depending on your application requirements, you may need to strictly include / exclude some models from the benchmarks. We try to strike a balance between standardization and modularity such that you can run the benchmarks that are relevant to your needs. + +Methodology +^^^^^^^^^^^ + +Overview +******** +We benchmark models using the LLM-as-a-judge approach. This relies on using a powerful language model to generate assessments on the outputs of other models, using a standard reviewing procedure. LLM-as-a-judge is sometimes used to run experiments at scale when generating human assessments isn't an option or to avoid introducing human biases. + +Given a dataset of user prompts, each prompt is sent to all endpoints to generate an output. Then, we ask GPT-4 to review each output and give a final assessment based on how helpful and accurate the response is relative to either (a) the user prompt, in the case of unlabelled datasets, or (b) the prompt and the reference answer, in the case of labelled datasets. + +Scoring +******* + +The assessor LLM reviews the output of an endpoint which it categorizes as :code:`irrelevant`, :code:`bad`, :code:`satisfactory`, :code:`very good`, or :code:`excellent`. Each of these labels is then mapped to a numeric score ranging from 0.0 to 1.0. We repeat the same proces for all prompts in the dataset to get the endpoint's performance score on each prompt. The overall endpoint's score is then the average of these prompt-specific scores. + +Visualizing Results +******************* + +In addition to the list of model scores, we also compute runtime performance for the endpoint (as explained in the section below). 
Doing so allows us to plot the quality performance versus runtime to assess the quality-to-performance of the endpoints, instead of relying on the quality scores alone. + +.. image:: ../images/console_dashboard.png + :align: center + :width: 650 + :alt: Console Dashboard. .. note:: - Our benchmarking code is openly available in `this repository `_. + Because quality scores are model-specific, they are the same across the different endpoints exposed for a given model. As a result, all the endpoints for a model will plot horizontally at the same quality level, with only the runtime metric setting them apart. + +Considerations and Limitations +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Despite having a well-defined benchmarking approach, it also inevitably comes with its own issues. Using an LLM to judge outputs may introduce a different kind of bias through the data used to train the assessor model. We are currently looking at ways to mitigate this with more diversified and / or customized judge LLM selection. +Runtime Benchmarks +------------------ + +Finding the best model(s) for a task is just the first step to optimize LLM pipelines. Given the plethora of endpoint providers offering the same models, true optimization requires considering performance discrepancies across endpoints and time. + +Because this is a complex decision, it needs to be made based on data. For this data to be reliable, it should also result from transparent and objective measurements, which we outline in this below. + +.. note:: + Our benchmarking code is openly available in `this repository `_. Design Principles ------------------ +^^^^^^^^^^^^^^^^^ -Our benchmarks are based on a set of guiding principles. Specifically, we believe benchmarks should be: +Our runtime benchmarks are based on a set of guiding principles. Specifically, we believe benchmarks should be: - **Community-driven:** We invite everyone to audit or improve the logic and the code. We are building these benchmarks for the community, so contributions and discussions around them are more than welcome! @@ -20,15 +77,15 @@ Our benchmarks are based on a set of guiding principles. Specifically, we believ Methodology ------------ +^^^^^^^^^^^ Tokenizer -^^^^^^^^^ +********* To avoid biases towards any model-specific tokenizer, we calculate all metrics using the same tokenizer across different models. We have chosen the `cl100k_base` tokenizer from OpenAI's `tiktoken `_ library for this since it’s MIT licensed and already widely adopted by the community. Inputs and Outputs -^^^^^^^^^^^^^^^^^^ +****************** To fairly assess optimizations such as speculative decoding, we use real text as the input and avoid using randomly generated data. The length of the input affects prefill time and therefore can affect the responsiveness of the system. To account for this, we run the benchmark with two input regimes. @@ -42,7 +99,7 @@ For the outputs, we use randomized discrete values from the same distributions ( When running one benchmark across different endpoints, we seed each runner with the same initial value, so that the inputs are the same for all endpoints. Computation -^^^^^^^^^^^ +*********** To execute the benchmarks, we run three processes periodically from three different regions: **Hong Kong, Belgium and Iowa**. Each one of these processes is triggered every three hours and benchmarks every available endpoint. 
@@ -50,7 +107,7 @@ Accounting for the different input policies, we run a total of 4 benchmarks for Metrics -^^^^^^^ +******* Several key metrics are captured and calculated during the benchmarking process: @@ -67,25 +124,33 @@ Several key metrics are captured and calculated during the benchmarking process: - **Cost**: Last but not least, we present information about the cost of querying the model. This is usually different for the input tokens and the response tokens, so it can be beneficial to choose different models depending on the end task. As an example, to summarize a document, a provider with lower price in the input tokens would be better, even if it comes with a slightly higher price in the output. On the other hand, if you want to generate long-format content, a provider with a lower price per generated token will be the most appropriate option. Data Presentation -^^^^^^^^^^^^^^^^^ +***************** When aggregating metrics, particularly in benchmark regimes with multiple concurrent requests, we calculate and present the P90 (90th percentile) value from the set of measurements. We choose the P90 to reduce the influence of extreme values and provide a reliable snapshot of the model's performance. When applicable, aggregated data is shown both in the plots and the benchmark tables. -Additionally, we also include a MA5 view (Moving Average of the last 5 measurements) in the graphs. This smoothing technique helps mitigate short-term fluctuations and should provide a clearer trend representation over time. +.. image:: ../images/benchmarks_model_page.png + :align: center + :width: 650 + :alt: Benchmarks Model Page. -Not computed / No metrics are available yet -******************************************* +Additionally, we also include a MA5 view (Moving Average of the last 5 measurements) in the graphs. This smoothing technique helps mitigate short-term fluctuations and should provide a clearer trend representation over time. -In some cases, you will find :code:`Not computed` instead of a value, or even a :code:`No metrics are available yet` message instead of the benchmark data. This basically means that we don't have valid data to show you. Most of the time, this means we have hit a rate limit or there is an internal issue. We try to stay on top of these messages and we are probably working on (1) getting our quotas increased for the specific endpoint/provider or (2) fixing the problem. We'll try to get you the data ASAP! +.. note:: + In some cases, you will find :code:`Not computed` instead of a value, or even a :code:`No metrics are available yet` message instead of the benchmark data. This is typically due to an internal issue or a rate limit, which we'll be quickly fixing. Considerations and Limitations ------------------------------- +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ We try to tackle some of the more significant limitations of benchmarking inference endpoints. For example, network latency, by running the benchmarks in different regions; or unreliable point-measurements, by continuously benchmarking the endpoints and plotting their trends over time. However, there are still some relevant considerations to have in mind. Our methodology at the moment is solely focused on performance, which means that we don't look at the output of the models. 
Nonetheless, even accounting for the public-facing nature of these endpoints (no gibberish allowed!), there might be some implementation differences that affect the output quality, such as quantization/compression of the models, different context window sizes, or different speculative decoding models, among others. We are working towards mitigating this as well, so stay tuned! + +Round Up +-------- + +You are now familiar with how we run our benchmarks. Next, you can explore how to `use the benchmarks, or run your own `_ through the benchmarks interface! diff --git a/docs/concepts/endpoints.rst b/docs/concepts/endpoints.rst index 74e1638..6f9f167 100644 --- a/docs/concepts/endpoints.rst +++ b/docs/concepts/endpoints.rst @@ -24,11 +24,10 @@ You can explore our list of supported models through the `benchmarks interface < .. If you prefer programmatic access, you can also use the - `List Models Endpoint `_ in our API to obtain a list of models. + `List Models Endpoint `_, we discussed how different models perform better at different tasks, and how appropriate performance benchmarks can help steer and inform model selection for a given use-case. + +Given the diversity of prompts you can send to an LLM, it can quickly become tedious to manually swap between models for every single prompt, even when they pertain to the same broad category of tasks. + +Motivated by this, LLM routing aims to make optimal model selection automatic. With a router, each prompt is assessed individually and sent to the best model, without having to tweak the LLM pipeline. +With routing, you can focus on prompting and ensure that the best model is always on the receiving end! + +Quality routing +--------------- + +By routing to the best LLM on every prompt, the objective is to consistently achieve better outputs than using a single, all-purpose, powerful mode, at a fraction of the cost. The idea is that smaller models can be leveraged for some simpler tasks, only using larger models to handle complex queries. + +Using several datasets to benchmark the router (star-shaped datapoints) reveals that it can perform better than individual endpoints on average, without compromising on other metrics like runtime performance for e.g, as illustrated below. + +.. image:: ../images/console_dashboard.png + :align: center + :width: 650 + :alt: Console Dashboard. + +You may notice that there are more than one star-shaped datapoints on the plot. This is because the *Router* can actually take all sorts of configurations, depending on the specified constraints in terms which endpoints can be routed to, the minimum acceptable performance level for a given metric, etc. As a result, a virtually infinite number of routers can be constructed by changing these parameters, allowing you to customize the routing depending on your requirements! + +Runtime routing +--------------- + +When querying endpoints, other metrics beyond quality can be critical depending on the use-case. For e.g, cost may be important when prototyping an application, latency when building a bot where responsiveness is key, or output tokens per second if we want to generate responses as fast as possible. + +However, endpoint providers are inherently transient (You can read more about this `here `_), which means they are affected by factors like traffic, available devices, changes in the software or hardware stack, and so on. + +Ultimately, this results in a landscape where it's usually not possible to conclude that one provider is *the best*. 
Let's take a look at this graph from our benchmarks. + +.. image:: ../images/mixtral-providers.png + :align: center + :width: 650 + :alt: Mixtral providers. + +In this image we can see the :code:`output tokens per second` of different providers hosting a :code:`Mixtral-8x7b` public endpoint. We can see how depending on the time of the day, the *best* provider changes. + +With runtime routing, your requests are automatically redirected to the provider outperforming the other services at that very moment. This ensures the best possible value for a given metric across endpoints. + +.. image:: ../images/mixtral-router.png + :align: center + :width: 650 + :alt: Mixtral performance routing. + +Round Up +-------- + +You are now familiar with routing. Next, you can `learn to use the router `_, or `build your custom router `_. diff --git a/docs/concepts/runtime_routing.rst b/docs/concepts/runtime_routing.rst deleted file mode 100644 index b068cd7..0000000 --- a/docs/concepts/runtime_routing.rst +++ /dev/null @@ -1,119 +0,0 @@ -Dynamic Routing -=============== - -.. raw:: html - - - -You can see more `Videos`_ at the bottom of this page! - -Introduction ------------- - -When querying models, we usually care for one metric over the rest. This can be cost if prototyping an application, TTFT if building a bot where responsiveness is key, or output tokens per second if we want to generate responses as fast as possible. Being able to compare these metrics among providers mitigates this issue (and that's why we run our `benchmarks! `_). - -However, these providers are inherently transient (You can read more about this `here `_), which means that they are affected by things like traffic, available devices, changes in the software or hardware stack, and so on. - -Ultimately, this results in a landscape where it's usually not possible to conclude that one provider is *the best*. - -Let's take a look at this graph from our benchmarks. - -.. image:: ../images/mixtral-providers.png - :align: center - :width: 650 - :alt: Mixtral providers. - -In this image we can see the **output tokens per second** of different providers hosting a Mixtral-8x7b public endpoint. We can see how depending on the time of the day, the "best" provider changes. - -When you use runtime dynamic routing, we automatically redirect your request to the provider that is outperforming the other services at that very moment! You don't need to do anything else ⬇️ - -.. image:: ../images/mixtral-router.png - :align: center - :width: 650 - :alt: Mixtral performance routing. - -How to route ------------- - -You can quickly try the routing yourself with `this `_ -example. Spoiler: All you need to do is replacing the provider in your query with one of the available routing modes! - -Available Modes -^^^^^^^^^^^^^^^ - -Currently, we support a set of predefined configurations for the routing: - -- :code:`lowest-input-cost` / :code:`input-cost` -- :code:`lowest-output-cost` / :code:`output-cost` -- :code:`lowest-itl` / :code:`itl` -- :code:`lowest-ttft` / :code:`ttft` -- :code:`highest-tks-per-sec` / :code:`tks-per-sec` - -For example, you can query the endpoint :code:`llama-2-7b-chat@itl` to get the provider with the -lowest Inter-Token-Latency. - -Thresholds -^^^^^^^^^^ - -Additionally, you have the option to include multiple thresholds for other metrics in each configuration. 
- -This feature enables you to get, for example, the highest tokens per second (:code:`highest-tks-per-sec`) for any provider whose :code:`ttft` is lower than a specific threshold. To set this up, just append :code:`<[float][metric]` to your preferred mode when specifying a provider. To keep things simple, we have added aliases for :code:`output-cost` (:code:`oc`), :code:`input-cost` (:code:`ic`) and :code:`output-tks-per-sec` (:code:`ots`). - -Let's illustrate this with some examples: - -- :code:`lowest-itl<0.5input-cost` - In this case, the request will be routed to the provider with the lowest - Inter-Token-Latency that has an Input Cost smaller than 0.5 credits per million tokens. -- :code:`highest-tks-per-sec<1output-cost` - Likewise, in this scenario, the request will be directed to the provider - offering the highest Output Tokens per Second, provided their cost is below 1 credit per million tokens. -- :code:`ttft<0.5ic<15itl` - Now we have something similar to the first example, but we are using :code:`ic` as - an alias to :code:`input-cost`, and we have also added :code:`<15itl` to only consider endpoints - that have an Inter-Token-Latency of less than 15 ms. - -Depending on the specified threshold, there might be scenarios where no providers meet the criteria, -rendering the request unfulfillable. In such cases, the API response will be a 404 error with the corresponding -explanation. You can detect this and change your policy doing something like: - -.. code-block:: python - :emphasize-lines: 9, 10 - - import requests - - url = "https://api.unify.ai/v0/chat/completions" - headers = { - "Authorization": "Bearer YOUR_UNIFY_KEY", - } - - payload = { - # This won't work since no provider has this price! (yet?) - "model": "llama-2-70b-chat@lowest-itl<0.001ic", - "messages": [{ - "role": "user", - "content": "Hello!" - }], - } - - response = requests.post(url, json=payload, headers=headers) - if response.status_code == 404: - # We'll get the cheapest endpoint as a fallback - payload["model"] = "llama-2-70b-chat@lowest-input-cost" - response = requests.post(url, json=payload, headers=headers) - - -That's about it! We will be making these modes more flexible in the coming weeks, allowing you to -define more specific and fine-grained rules 🔎 - -Videos -------- - -.. raw:: html - - - - + Getting Started --------------- -To start querying our endpoints, generate a Unify API key by :code:`Signing Up` through the `console `_ . Grab your key through the dashboard, and let's see how you can `make your first request! `_ +We recommend you give the concepts section a quick read to get familiar with routing and benchmarking. Once you're ready to go, start by `Signing In `_. From this point you can either learn how to: + +* **Use our interfaces (Recommended)**: The interfaces guides explain how to interact with endpoints and deploy your custom router in a no-code environment. -For a deeper Walkthrough on how you can use the Unify console, check out the `walkthrough `_ section. +* **Make your first request**: The `quickstart `_ guide explains how to start querying endpoints with our API, -.. note:: - If you encounter any issue, have any suggestion, feature you'd like to see, or anything you think can be improved please get in touch with us on - `Discord `_ or via :code:`hub@unify.ai` \ No newline at end of file +.. warning:: + Throughout the guides, you'll notice some sections marked as (Beta). 
**Any section marked as Beta is currently not available** and only illustrate planned features we are currently working on. We're constantly iterating on our roadmap so if you'd like to leave some feedback or suggestion on features you'd like to see, `we'd love to discuss `_ this with you! \ No newline at end of file diff --git a/docs/home/pricing.rst b/docs/home/pricing.rst deleted file mode 100644 index 61a0225..0000000 --- a/docs/home/pricing.rst +++ /dev/null @@ -1,24 +0,0 @@ -Pricing and Credits -=================== - -Credits are consumed when using the API. Each credit corresponds to 1 USD and there are **no charges on top of provider costs**; as a result, consumed credits directly reflect the cost of a request. - -We’re currently integrating a payment system to purchase additional credits. Meanwhile, we’re granting each user the equivalent to $50 in free credits when creating an account. - -You will soon be able to check this out properly in a dashboard. In the meantime, you can query the `Get Credits Endpoint `_ of the API to get your current credit balance. - -Top-up Code ------------ - -You may have received a code to increase your number credits, if that's the case, you can -activate it doing a request to this endpoint: - -.. code-block:: bash - - curl -X 'POST' \ - 'https://api.unify.ai/v0/promo?code=' \ - -H 'accept: application/json' \ - -H 'Authorization: Bearer ' - -Simply replace :code:`` with your top up code and :code:`` with your API Key and -do the request 🚀 diff --git a/docs/home/walkthrough.rst b/docs/home/walkthrough.rst deleted file mode 100644 index 33d570e..0000000 --- a/docs/home/walkthrough.rst +++ /dev/null @@ -1,377 +0,0 @@ -Unify Walkthrough -================= - -This section will guide you through the different aspects of the :code:`Unify Console`, and help you get up and running in no time. Feel free to skip any sections which aren't relevant to you. - -Creating your profile ---------------------- - -Depending on the sign-up method you chose, some of the entries in the :code:`Profile` sections will already be populated. Regardless, you can use this page to change your email address, add your personal information, and sign out in case you’d like to use another account. - -.. image:: ../images/console_profile.png - :align: center - :width: 650 - :alt: Console Profile. - -Using the Unify API -------------------- - -When opening the console, you will first be greeted with the :code:`API` page. This is where you'll find your API key which gives you access to all the LLMs available through Unify, as well as the router and the benchmarking API. - -You will also find useful links to the documentation and various applications, where you can interact with the endpoints and the benchmarks, in no-code environments. - -.. image:: ../images/console_api.png - :align: center - :width: 650 - :alt: Console API. - -.. note:: - If you suspect your API key was leaked in some way, you can safely regenerate it through this page. You would then only need to replace the old key with the new one in your workflows with the same balance and account settings as before. - -Setting up Billing ------------------- - -The :code:`Billing` page is where you can set-up your payment information to recharge your account, and track your spending. By default, you can only top-up manually by clicking on **Buy Credits**. We recommend you set-up automatic refill to avoid any disruption to your workflows when your credits run out. - -.. 
image:: ../images/console_billing_no_payment.png - :align: center - :width: 650 - :alt: Console Billing No Payment. - -As specified on the page, activating automatic refill requires you go through the dedicated :code:`Billing Portal`. Going to the :code:`Billing Portal` shows you the screen below where you can add your preferred payment method, update your billing information, and download your invoices. - -.. image:: ../images/console_portal_welcome.png - :align: center - :width: 650 - :alt: Console Portal Welcome. - -Clicking on **Add payment method** then lets you introduce your card information. - -.. image:: ../images/console_portal_setup.png - :align: center - :width: 650 - :alt: Console Portal Setup. - -With your payment information set-up, you can now toggle automatic refill on and off as needed on the main billing page. The automatic refill lets you specify the cut-off amount at which your account is automatically refilled by the specified amount when it reaches it. - -.. image:: ../images/console_billing_payment.png - :align: center - :width: 650 - :alt: Console Billing Payment. - - -.. warning:: - The sections below illustrate some of the planned features we're currently working on. All visuals, metrics and functionalities outlined are only presented for illustrative purposes. As such, **any section marked as Beta is currently not available**. We're constantly iterating on our roadmap so if you'd like to leave some feedback or suggestion on features you'd like to see, `we'd love to discuss `_ this with you! - - -Adding custom endpoints (Beta) ------------------------------ - -Prerequisite -^^^^^^^^^^^^ -Firstly, you’ll need to set up your own LLM endpoints. One option is to use off-the-shelf endpoints, such as those available in the `Azure ML Model Catalog `_, `Vertex AI Model Garden `_ and `AWS Bedrock `_. - -Alternatively, you can create and host your own LLM endpoint. There are a whole variety of ways to do this, but again it’s most common to do so via one of the major cloud providers. Feel free to check out these tutorials for creating custom LLM endpoints on `Azure ML `_, `AWS Bedrock `_ and `Vertex AI `_. - -Regardless of how you set up your LLM endpoints, it’s important that you expose an API for this, and that this API **adheres to** the `OpenAI standard `_. This is necessary in order to integrate with Unify. We intended to broaden support to other API formats at a later date. - -Adding the endpoints -^^^^^^^^^^^^^^^^^^^^ - -Once you’ve got your custom LLM endpoints set up, the next step is to add these to the :code:`Endpoints` section of your :code:`Unify Console`. - -.. image:: ../images/console_custom_endpoints.png - :align: center - :width: 650 - :alt: Console Custom Endpoints. - -That’s it, you now know how to add your own custom keys and custom endpoints! You can query your custom endpoint via the Unify API by specifying **@custom**, with *name* being the model name and *custom* being the provider. - -Adding custom datasets (Beta) ------------------------------ - -To add a custom dataset, first head to the :code:`Datasets` section of the console. Then, specify a local file to upload, containing the prompts you would like to benchmark on. Then, click the **Add Dataset** button. - -.. image:: ../images/console_datasets_start.png - :align: center - :width: 650 - :alt: Console Dataset Start. - -The resulting screen lets you specify the local :code:`.jsonl` file to upload, based on the specified format. - -.. 
image:: ../images/console_datasets_add.png - :align: center - :width: 650 - :alt: Console Dataset Add. - -Once your dataset is uploaded, you can click on the dataset and view it in the preview section. - -.. image:: ../images/console_datasets_preview.png - :align: center - :width: 650 - :alt: Console Dataset Preview. - -.. note:: - Datasets do not contain train, validation and test splits internally. If you would like to upload training, validation and test splits for a dataset, then these should each be uploaded and named independently. In the future, we plan to enable grouping datasets together and creating folder structures etc. - -That’s it, you now know how to add your own custom keys and custom endpoints! - -Running benchmarks on your custom datasets (Beta) -------------------------------------------------- - -To open up the benchmarking interface, first click on :code:`Benchmarks`. - -Runtime Benchmarks -^^^^^^^^^^^^^^^^^^ -In the benchmarks page, you can see all of the current and previous benchmark jobs you triggered, and you can also specify which endpoints you would like to include for runtime benchmarking. - -If you have various private endpoints deployed across various servers, each with varying latencies, it can be useful to track these speeds across time, to ensure you’re always sending your requests to the fastest servers. - -To trigger periodic runtime benchmarking for a custom endpoint, simply add it to the list under the heading **Runtime Benchmarks**. You also need to specify at least one IP address from where you would like to test this endpoint, and also at least one prompt dataset against which you would like to perform the benchmarking. - -.. image:: ../images/console_runtime_benchmarks.png - :align: center - :width: 650 - :alt: Console Runtime Benchmarks. - -Once all endpoints are added, you can then go to the `Benchmarks _` page, and you’ll find your model listed. Note the **lock icon**, which indicates that this benchmark is private (only accessible from your own account). - -.. image:: ../images/custom_benchmarks.png - :align: center - :width: 650 - :alt: Custom Benchmarks. - -You can open up the benchmark page like any other endpoint, and view the performance for various metrics plotted across time. - -.. image:: ../images/custom_benchmarks_model.png - :align: center - :width: 650 - :alt: Custom Benchmarks Model. - -That’s it! You now know how to set up periodic benchmarking for your custom endpoints. If you have several versions of the same model, you can use options such as :code:`lowest-itl`, explained `here `_, to route to the faster deployment based on the latest benchmarking data. - -We’ll next explore how to run quality benchmarks. - -Quality benchmarks -^^^^^^^^^^^^^^^^^^ - -This time, going to to the **Quality Benchmarks** subsection. We can click on **SUBMIT JOB** to trigger a new quality benchmark run. - -You need to specify the endpoints and the datasets you would like to benchmark. All endpoints will be tested on all datasets. If you only want to test some endpoints on some datasets, then you should submit multiple jobs. - -.. image:: ../images/console_quality_benchmarks.png - :align: center - :width: 650 - :alt: Console Quality Benchmarks. - -Once you are happy with the selection, press **Submit** and then the job will appear in the **Running Jobs** section, as shown below. - -.. image:: ../images/console_benchmarks_quality_submitted.png - :align: center - :width: 650 - :alt: Console Benchmarks Quality Submitted. 
- -The job can be expanded, to see each endpoint and dataset pair, and check the progress. - -.. image:: ../images/console_benchmarks_quality_jobs.png - :align: center - :width: 650 - :alt: Console Benchmarks Quality Quality Jobs. - -The entire history of benchmarking jobs can also be viewed by clicking on **History**, like so. - -.. image:: ../images/console_benchmarks_quality_history.png - :align: center - :width: 650 - :alt: Console Benchmarks Quality History. - -That’s it, you now know how to submit quality benchmarking jobs! In the next section, we’ll explain how to visualize these benchmarking results. - -Visualize Benchmark Results (Beta) ----------------------------------- - -Once the benchmarking is complete, we can then visualize the benchmarking results in the dashboard. First, click :code:`Dashboard` on the left hand pane. - -By default, all endpoints will be plotted on the :code:`Open Hermes` dataset, and the default foundation router will also be plotted, with various configurations of this router plotted as stars. - -.. image:: ../images/console_dashboard.png - :align: center - :width: 650 - :alt: Console Dashboard. - -On the dataset dropdown at the top, you can select any dataset of prompts to benchmark each model and provider against in the graph. - -.. image:: ../images/console_dashboard_dataset.png - :align: center - :width: 650 - :alt: Console Dashboard Dataset. - -When clicked, the scatter graph will be replotted, on your own custom prompts in your dataset. If no quality benchmarks have been run, then the scatter graph will be empty. In this case, let's plot the benchmarks for the custom dataset **Customer Calls 1**. - -.. image:: ../images/console_dashboard_custom_dataset.png - :align: center - :width: 650 - :alt: Console Dashboard Custom Dataset. - -We can see that the custom endpoints :code:`mixtral-tuned-finances`, :code:`llama-3-tuned-calls1` and :code:`llama-3-tuned-calls2` are plotted, alongside the foundation router, which is always plotted by default. If there is a model not plotted, but you would like it to be, then you can simply head over to the :code:`Benchmarks` page and trigger a quality benchmark job. Once the job completes, the model will then be visible in this dashboard. - -Let’s return to the default view of all models and providers plotted on the :code:`Open Hermes` dataset. We can change the metric plotted on the x axis from cost to something else, by clicking **Metric**. This will let us plot the score against time-to-first-token (TTFT) for e.g. - -.. image:: ../images/console_dashboard_metric.png - :align: center - :width: 650 - :alt: Console Dashboard Metric. - -You can remove any of these points by simply clicking on the model names on the key to the right of the graph. That model will then be removed from the graph, and the router points will be updated. - -.. image:: ../images/console_dashboard_filtered.png - :align: center - :width: 650 - :alt: Console Dashboard Filtered. - -That’s it! You now know how to visualize benchmark results across different models and providers on different datasets, including your own custom endpoints on your own custom datasets. - -Train a custom router (Beta) ----------------------------- - -Going to the :code:`Routers` page, we can click on any router and see the models it was trained to route between and the datasets used for training, both on the right hand side. - -.. image:: ../images/console_routers_start.png - :align: center - :width: 650 - :alt: Console Routers. 
- -To add a new router, first click **Add Router**. The upload window enables you to name the router, and specify the endpoints to route between and datasets to train on. -We'll name the router :code:`gpt4-llama3-calls`, as we intend to train on our custom call datasets and use GPT4 as well as the base llama3 and our fine tuned variants. - -The models are those which the router will be able to select between, and the datasets will be used as the input prompts to the router system to train which models to use, based on the quality of the output, with GPT4-as-a-judge responsible for the scoring. You can select the included models and / or datasetsfrom the corresponding dropdowns. Your custom model endpoints and datasets are included in the lists. - -.. image:: ../images/console_routers_train.png - :align: center - :width: 650 - :alt: Console Routers Train. - -.. note:: - You can notice that the endpoint providers are not listed. This is because the router training does not depend on the provider, only the model. - -Finally, clicking the **Train** button will submit a training job! Your router configuration will be grayed out while the training is being performed. While the benchmarks are being performed. - -.. image:: ../images/console_routers_benchmarking.png - :align: center - :width: 650 - :alt: Console Routers Benchmarking. - -In order to train a router, it’s necessary to first evaluate the performance of each model on each prompt in each dataset. This is exactly what happens when we submit quality benchmarks as explained in the quality benchmarking section above. - -If you go to the **Benchmarks** page, you’ll see that the router training job has automatically scheduled some quality benchmarks on your behalf. For any quality benchmarks which have already been performed ahead of time, the work will not be duplicated. - -For example, we previously benchmarked :code:`llama-3-tuned-calls1` and :code:`llama-3-tuned-call2` on the datasets :code:`customer-calls1` and :code:`customer-calls2`, so this will not be repeated (see above). However, we have not yet benchmarked :code:`llama-3-70b-chat` and :code:`gpt-4` on these datasets, so these are automatically triggered by the router training request. - -.. note:: - You will receive an email, so no need to manually track the progress! - -With the benchmarks done, the router training is then triggered, and the status of your router is updated accordingly. - -.. image:: ../images/console_routers_training.png - :align: center - :width: 650 - :alt: Console Routers Training. - -Once the router training is complete, you will receive a second email. The router is ready to be deployed, and is ready to be visualized on the dashboard tab for selecting the best configuration of the router (see next section). - -That’s it, you’ve now trained your own custom router, to route between your own custom models, trained directly on your own prompt data, to reflect the task you care about! - -Deploy your custom router (Beta) -------------------------------- - -Now that we have a custom trained router, the next step is to explore the various possible configurations for this router, each trading off quality, speed and cost in different variations. These various options can be visualized in the **Dashboard**. - -As before, we first choose the dataset to benchmark on. We will choose the custom dataset **Customer Calls 1**. 
- -After selecting the dataset, all data points which have been benchmarked on this dataset will automatically be plotted, which now also includes the custom trained router. - -.. image:: ../images/console_dashboard_custom_router_plotting.png - :align: center - :width: 650 - :alt: Console Dashboard Custom Router Plotting. - -The base router and all custom routers can also be further configured, by clicking on **Router**, and then clicking on the router which you’d like to customize. - -.. image:: ../images/console_dashboard_router.png - :align: center - :width: 650 - :alt: Console Dashboard Router. - -Then, the following window appears, from where a router view can be created. A router view takes a router and constrains the search space in some way. This can be useful if you only have access to certain models or providers in the deployment environment, or if you want to ensure each model routed to is guaranteed to meet certain quality and performance requirements. - -.. image:: ../images/console_dashboard_custom_view.png - :align: center - :width: 650 - :alt: Console Dashboard Custom View. - -Of course, only the models the router has been trained on will be visible in the dropdown. However, you can remove some of these models from the search space. Let's presume we don’t want to use anthropic models, as we don’t have them properly configured to run in our deployment environment yet. - -We don’t want to save the router view to our account, we’re only testing at the moment. We therefore click **Apply**. - -.. note:: - Alternatively, if we had clicked **Save** then it would have simply overwritten the router :code:`openai-llama3-calls` in place, again limited to this dashboard session only. - -In the key, the router view is displayed nested underneath the router which it is a view of. - -.. image:: ../images/console_dashboard_no_anthropic.png - :align: center - :width: 650 - :alt: Console Dashboard No Anthropic. - -We can see that removing the anthropic models slightly reduced the performance of the router, but not by a noticeable amount. Let’s assume we decide to stick with this decision, to avoid the need to set up Anthropic in our deployment environment in the immediate future. - -The next task is to choose the data point which best balances quality, speed and cost for our application. If a point is selected, its details will appear below the legend. Details include the full id of the configuration, as well as the endpoints it routes to with the routing frequency per endpoint. - -.. image:: ../images/console_dashboard_custom_selected.png - :align: center - :width: 650 - :alt: Console Dashboard Custom Selected. - -We select a data point that looks balanced. We can see that this router configuration makes use of :code:`gpt4` 42% of the time, :code:`llama-3-tuned-calls1` 29% of the time, :code:`llama-3-tuned-calls2` 18% of the time, and :code:`llama-3-70b-chat` 11% of the time. - -.. note:: - Once the point is selected, that same selected point will be visible across all three graphs, with x axes for cost, inter-token-latency and time-to-first-token, so that any specific router configuration can be thoroughly examined across all metrics of importance. - -As with the router views, we can save this router configuration either to the dashboard session, or to our user account. Let’s assume we’re very happy with this configuration, and we don’t want to forget it. We’ll therefore save it to our account, by clicking **Save As** - -.. 
image:: ../images/console_dashboard_custom_configuration.png - :align: center - :width: 650 - :alt: Console Dashboard Custom Selected. - -.. note:: - This router configuration depends on the router view **openai-llama3-calls->no-anthropic**, which has not yet been saved to the account. We are therefore informed that this will also save the router view to the account. - -Once saved, the new router view and router configuration are then both visible from the **Routers** page of your account. You can delete router views and router configurations anytime from that page. Pressing the copy button beside the configuration will copy the full configuration to the clipboard, being :code:`gpt4-llama3-calls->no-anthropic_8.28e-03_4.66e-0.4_1.00e-06@unify`. - -.. image:: ../images/console_routers_configurations_views.png - :align: center - :width: 650 - :alt: Console Routers Configurations Views. - -.. note:: - You can also copy the configuration from the dashboard which will now show it (along with the parent view) by default. - -With the configuration copied to the clipboard, all you now need to do is pass this into the Unify instructor if using the Python client, like so: - -.. code-block:: python - - import os - from unify import Unify - - unify = Unify( - api_key=os.environ.get("UNIFY_KEY"), - endpoint="openai-llama3-calls->no-anthropic_8.28e-03_4.66e-0.4_1.00e-06@unify”", - ) - - response = unify.generate(user_prompt="Explain who Newton was and his entire theory of gravitation. Give a long detailed response please and explain all of his achievements") - -.. note:: - You can also query the API with a CuRL request, among others. For more details head to the next section to learn how you can make your first request. - -That’s it! You now know how to explore the various configurations of your custom trained router, and get it deployed in your own application. 
diff --git a/docs/images/benchmarks_main.png b/docs/images/benchmarks_main.png new file mode 100644 index 0000000..14e893b Binary files /dev/null and b/docs/images/benchmarks_main.png differ diff --git a/docs/images/benchmarks_model_page.png b/docs/images/benchmarks_model_page.png new file mode 100644 index 0000000..3f8e7e7 Binary files /dev/null and b/docs/images/benchmarks_model_page.png differ diff --git a/docs/images/benchmarks_model_page_changed_setup.png b/docs/images/benchmarks_model_page_changed_setup.png new file mode 100644 index 0000000..0d0da87 Binary files /dev/null and b/docs/images/benchmarks_model_page_changed_setup.png differ diff --git a/docs/images/console_benchmarks_quality.png b/docs/images/console_benchmarks_quality.png new file mode 100644 index 0000000..a885289 Binary files /dev/null and b/docs/images/console_benchmarks_quality.png differ diff --git a/docs/images/console_becnhmarks_quality_history.png b/docs/images/console_benchmarks_quality_history.png similarity index 91% rename from docs/images/console_becnhmarks_quality_history.png rename to docs/images/console_benchmarks_quality_history.png index 244e599..8f48ef4 100644 Binary files a/docs/images/console_becnhmarks_quality_history.png and b/docs/images/console_benchmarks_quality_history.png differ diff --git a/docs/images/console_benchmarks_quality_jobs.png b/docs/images/console_benchmarks_quality_jobs.png index 9df8c92..c9288bd 100644 Binary files a/docs/images/console_benchmarks_quality_jobs.png and b/docs/images/console_benchmarks_quality_jobs.png differ diff --git a/docs/images/console_benchmarks_quality_submitted.png b/docs/images/console_benchmarks_quality_submitted.png index 7cdd403..acc67b4 100644 Binary files a/docs/images/console_benchmarks_quality_submitted.png and b/docs/images/console_benchmarks_quality_submitted.png differ diff --git a/docs/images/console_dashboard_custom_configuration.png b/docs/images/console_dashboard_custom_configuration.png index 01c41ac..54c4da0 100644 Binary files a/docs/images/console_dashboard_custom_configuration.png and b/docs/images/console_dashboard_custom_configuration.png differ diff --git a/docs/images/console_dashboard_custom_router_plotting.png b/docs/images/console_dashboard_custom_router_plotting.png index ce36e52..45b7919 100644 Binary files a/docs/images/console_dashboard_custom_router_plotting.png and b/docs/images/console_dashboard_custom_router_plotting.png differ diff --git a/docs/images/console_dashboard_custom_selected.png b/docs/images/console_dashboard_custom_selected.png index 70ef6dd..108e1b9 100644 Binary files a/docs/images/console_dashboard_custom_selected.png and b/docs/images/console_dashboard_custom_selected.png differ diff --git a/docs/images/console_dashboard_custom_view.png b/docs/images/console_dashboard_custom_view.png index 0806180..ab3a302 100644 Binary files a/docs/images/console_dashboard_custom_view.png and b/docs/images/console_dashboard_custom_view.png differ diff --git a/docs/images/console_dashboard_no_anthropic.png b/docs/images/console_dashboard_no_anthropic.png index 5ecbe86..a08d85b 100644 Binary files a/docs/images/console_dashboard_no_anthropic.png and b/docs/images/console_dashboard_no_anthropic.png differ diff --git a/docs/images/console_dashboard_router.png b/docs/images/console_dashboard_router.png index 0461016..58ebb41 100644 Binary files a/docs/images/console_dashboard_router.png and b/docs/images/console_dashboard_router.png differ diff --git a/docs/images/console_datasets_add.png 
b/docs/images/console_datasets_add.png index 7ad3652..ac60af5 100644 Binary files a/docs/images/console_datasets_add.png and b/docs/images/console_datasets_add.png differ diff --git a/docs/images/console_datasets_preview.png b/docs/images/console_datasets_preview.png index dbe851c..a0236e4 100644 Binary files a/docs/images/console_datasets_preview.png and b/docs/images/console_datasets_preview.png differ diff --git a/docs/images/console_datasets_start.png b/docs/images/console_datasets_start.png index 49878ca..bc41e56 100644 Binary files a/docs/images/console_datasets_start.png and b/docs/images/console_datasets_start.png differ diff --git a/docs/images/console_quality_benchmarks.png b/docs/images/console_quality_benchmarks.png deleted file mode 100644 index e1adc05..0000000 Binary files a/docs/images/console_quality_benchmarks.png and /dev/null differ diff --git a/docs/images/console_routers_benchmarking.png b/docs/images/console_routers_benchmarking.png index b19c4c7..5775f26 100644 Binary files a/docs/images/console_routers_benchmarking.png and b/docs/images/console_routers_benchmarking.png differ diff --git a/docs/images/console_routers_configurations_views.png b/docs/images/console_routers_configurations_views.png index c0045ed..a5a4359 100644 Binary files a/docs/images/console_routers_configurations_views.png and b/docs/images/console_routers_configurations_views.png differ diff --git a/docs/images/console_routers_start.png b/docs/images/console_routers_start.png index 0773e46..72cdf5a 100644 Binary files a/docs/images/console_routers_start.png and b/docs/images/console_routers_start.png differ diff --git a/docs/images/console_routers_train.png b/docs/images/console_routers_train.png index a183097..51d3002 100644 Binary files a/docs/images/console_routers_train.png and b/docs/images/console_routers_train.png differ diff --git a/docs/images/console_routers_training.png b/docs/images/console_routers_training.png index 9a725c4..2b1a6bf 100644 Binary files a/docs/images/console_routers_training.png and b/docs/images/console_routers_training.png differ diff --git a/docs/images/console_runtime_benchmarks.png b/docs/images/console_runtime_benchmarks.png index 87cb2be..fcb877b 100644 Binary files a/docs/images/console_runtime_benchmarks.png and b/docs/images/console_runtime_benchmarks.png differ diff --git a/docs/index.rst b/docs/index.rst index 11c152e..866acfd 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -7,27 +7,26 @@ :maxdepth: -1 Welcome to Unify! - home/walkthrough.rst - home/make_your_first_request.rst - home/build_a_chatbot.rst - home/pricing.rst .. toctree:: :hidden: :maxdepth: -1 - :caption: Concepts + :caption: Interfaces - concepts/endpoints.rst - concepts/benchmarks.rst - concepts/runtime_routing.rst -.. concepts/on_prem_images.rst + interfaces/setting_up.rst + interfaces/connecting_stack.rst + interfaces/running_benchmarks.rst + interfaces/building_router.rst .. toctree:: :hidden: :maxdepth: -1 - :caption: API reference + :caption: API + + api/first_request.rst + api/deploy_router.rst + api/reference.rst - reference/endpoints.rst .. reference/images.rst .. toctree:: @@ -49,3 +48,12 @@ tools/openapi.rst tools/python_library.rst +.. toctree:: + :hidden: + :maxdepth: -1 + :caption: Concepts + + concepts/endpoints.rst + concepts/benchmarks.rst + concepts/routing.rst +.. 
concepts/on_prem_images.rst
diff --git a/docs/interfaces/building_router.rst b/docs/interfaces/building_router.rst
new file mode 100644
index 0000000..089954c
--- /dev/null
+++ b/docs/interfaces/building_router.rst
@@ -0,0 +1,126 @@
+Building a custom router
+========================
+
+In this section, you'll learn how to train and customize a router through the console.
+
+Training a custom router
+------------------------
+
+Going to the :code:`Routers` page, you can add a new router by first clicking on :code:`Add Router`. The upload window enables you to name the router, and specify the endpoints to route between and the datasets to train on.
+
+For this example, we'll name our router :code:`gpt-claude-llama3-calls`, as we intend to train on the custom call datasets we uploaded earlier and use GPT-4 as well as Claude 3, the base Llama 3 model, and our fine-tuned variants as endpoints.
+
+.. note::
+    You can learn how to add a custom endpoint and dataset `here `_.
+
+You can select the included models and / or datasets from the corresponding dropdowns. Your custom model endpoints and datasets will be included in the lists.
+
+.. image:: ../images/console_routers_train.png
+    :align: center
+    :width: 650
+    :alt: Console Routers Train.
+
+.. note::
+    You may notice that the endpoint providers are not listed. This is because the router training does not depend on the provider, only the model.
+
+Finally, clicking the :code:`Train` button will submit a training job. Your router configuration will be grayed out during the benchmarking and training processes.
+
+.. image:: ../images/console_routers_benchmarking.png
+    :align: center
+    :width: 650
+    :alt: Console Routers Benchmarking.
+
+If you go to the :code:`Benchmarks` page, you'll see that the router training job has automatically scheduled some quality benchmarks on your behalf. Any quality benchmarks which have already been performed ahead of time will not be duplicated.
+
+For example, we previously benchmarked :code:`llama-3-tuned-calls1` and :code:`llama-3-tuned-calls2` on the datasets :code:`customer-calls1` and :code:`customer-calls2`, so this will not be repeated. However, we have not yet benchmarked :code:`llama-3-70b-chat` and :code:`gpt-4` on these datasets, so these benchmarks are automatically triggered by the router training request.
+
+.. note::
+    You will receive an email when the results are ready, so there is no need to manually track the progress!
+
+With the benchmarks done, the router moves to the training stage and training is triggered.
+
+.. image:: ../images/console_routers_training.png
+    :align: center
+    :width: 650
+    :alt: Console Routers Training.
+
+Once the router training is complete, you will receive a second email. The router performance can now be visualized on the :code:`Dashboard` page.
+
+Customizing your router
+-----------------------
+
+Now that we have a trained router, the next step is to explore the various possible configurations for this router, each trading off quality, speed and cost in different ways. These options can be visualized in the :code:`Dashboard`.
+
+As before, we first choose the dataset to benchmark on. After selecting the dataset, all data points which have been benchmarked on this dataset will automatically be plotted, including the custom router.
+
+.. image:: ../images/console_dashboard_custom_router_plotting.png
+    :align: center
+    :width: 650
+    :alt: Console Dashboard Custom Router Plotting.
+
+The base router and all custom routers can be further configured by clicking on :code:`Router`, and then clicking on the router which you'd like to customize.
+
+.. image:: ../images/console_dashboard_router.png
+    :align: center
+    :width: 650
+    :alt: Console Dashboard Router.
+
+The next window allows you to create a router view. A router view takes a router and constrains the search space in some way. This can be useful if you only have access to certain models or providers in the deployment environment, or if you want to ensure each model routed to is guaranteed to meet certain quality and performance requirements.
+
+.. image:: ../images/console_dashboard_custom_view.png
+    :align: center
+    :width: 650
+    :alt: Console Dashboard Custom View.
+
+Of course, only the models the router has been trained on will be visible in the dropdown. However, you can remove some of these models from the search space. Let's presume we don't want to use Anthropic models, as we don't have them properly configured to run in our deployment environment yet.
+
+We don't want to save the router view to our account since we're only testing at the moment. We therefore click :code:`Apply`.
+
+.. note::
+    Alternatively, had we clicked on :code:`Save`, this would have overwritten the :code:`gpt-claude-llama3-calls` router in place.
+
+In the legend, the router view is displayed underneath its parent router.
+
+.. image:: ../images/console_dashboard_no_anthropic.png
+    :align: center
+    :width: 650
+    :alt: Console Dashboard No Anthropic.
+
+We can see that removing the Anthropic models slightly reduced the performance of the router, though not by a significant amount. Let's assume we decide to stick with this decision, to avoid the need to set up Anthropic in our deployment environment in the immediate future.
+
+The next task is to choose the data point which best balances quality, speed and cost for our application. If a point is selected, its details will appear below the legend. Details include the full ID of the configuration, as well as the routing frequency per endpoint on this dataset.
+
+.. image:: ../images/console_dashboard_custom_selected.png
+    :align: center
+    :width: 650
+    :alt: Console Dashboard Custom Selected.
+
+We select a data point that looks balanced. We can see that, on this dataset, this router configuration makes use of :code:`gpt-4` 42% of the time, :code:`llama-3-tuned-calls1` 29% of the time, :code:`llama-3-tuned-calls2` 18% of the time, and :code:`llama-3-70b-chat` 11% of the time.
+
+.. note::
+    Once a point is selected, that same point will be visible across all metric graphs, with x-axes for cost, inter-token-latency and time-to-first-token. This lets you verify how the configuration performs for the other metrics.
+
+As with router views, we can either save this router configuration for the current session by clicking on :code:`Apply`, or save it permanently to our user account. Let's assume we're very happy with this configuration and we don't want to forget it. We'll therefore save it to our account by clicking :code:`Save As`. This opens a window where we can change the name of our configuration before saving it.
+
+.. image:: ../images/console_dashboard_custom_configuration.png
+    :align: center
+    :width: 650
+    :alt: Console Dashboard Custom Configuration.
+
+.. note::
+    This router configuration depends on the router view :code:`gpt-claude-llama3-calls->no-anthropic`, which has not yet been saved to the account. We are therefore informed that saving the configuration will also save the router view to the account.
+
+Once saved, the new router view and router configuration are both visible on the :code:`Routers` page of your account. You can delete views and configurations anytime from that page. Pressing the copy button beside the configuration will copy the full configuration to the clipboard, in this case :code:`gpt-claude-llama3-calls->no-anthropic_8.28e-03_4.66e-0.4_1.00e-06@unify`.
+
+.. image:: ../images/console_routers_configurations_views.png
+    :align: center
+    :width: 650
+    :alt: Console Routers Configurations Views.
+
+.. note::
+    You can also copy the configuration from the dashboard, which will now show it (along with the parent view) by default.
+
+Round Up
+--------
+
+That's it! You have now trained your first custom router, ready to be used through our API. You can now `deploy it `_, or learn `how to query endpoints `_ first.
diff --git a/docs/interfaces/connecting_stack.rst b/docs/interfaces/connecting_stack.rst
new file mode 100644
index 0000000..83ed7a7
--- /dev/null
+++ b/docs/interfaces/connecting_stack.rst
@@ -0,0 +1,60 @@
+Connecting your stack
+=====================
+
+In this section, you'll learn how to add your own endpoints and datasets to the console.
+
+Custom endpoints
+----------------
+
+Prerequisite
+^^^^^^^^^^^^
+
+In this section, we'll assume you have already set up your own LLM endpoints.
+
+If not, one option is to use off-the-shelf endpoints, such as those available in the `Azure ML Model Catalog `_, `Vertex AI Model Garden `_ and `AWS Bedrock `_. Alternatively, you can create and host your own LLM endpoint. There are a variety of ways to do this, but it's most common to do so via one of the major cloud providers.
+
+Regardless of how you set up your LLM endpoints, you'll need to expose an API for them. The API should **adhere to** the `OpenAI standard `_ to integrate with Unify.
+
+Adding the endpoints
+^^^^^^^^^^^^^^^^^^^^
+
+Once you've got your custom LLM endpoints set up, the next step is to add these to the :code:`Endpoints` section of the console.
+
+Click on :code:`Add Endpoint` to upload a new endpoint. You'll have to specify a name and the cloud provider used for the endpoint. You will also need to include your API key for said provider so we can query your endpoint on your behalf.
+
+.. image:: ../images/console_custom_endpoints.png
+    :align: center
+    :width: 650
+    :alt: Console Custom Endpoints.
+
+That's all! Your custom endpoints are now available through the Unify API as well as our interfaces, ready to be benchmarked and routed across.
+
+Custom datasets
+---------------
+
+You can add a dataset in the :code:`Datasets` section of the console. There, click the :code:`Add Dataset` button.
+
+.. image:: ../images/console_datasets_start.png
+    :align: center
+    :width: 650
+    :alt: Console Dataset Start.
+
+The resulting screen lets you specify the local :code:`.jsonl` file to upload, containing the prompts you would like to benchmark on.
+
+.. image:: ../images/console_datasets_add.png
+    :align: center
+    :width: 650
+    :alt: Console Dataset Add.
+
+Note that the screen above is only for unlabelled datasets, which are fine for benchmarking endpoints. If you want to train a custom router, you'll need to upload a list of prompts along with reference answers.
+
+Once your dataset is uploaded, you can click on it to preview the prompts. For example, the image below shows the preview for a labelled dataset.
+
+.. image:: ../images/console_datasets_preview.png
+    :align: center
+    :width: 650
+    :alt: Console Dataset Preview.
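+
+As a rough illustration, the snippet below builds a small labelled :code:`.jsonl` file of this kind in Python. The field names (:code:`prompt` and :code:`ref_answer`) are placeholders for this sketch only; check the upload screen or the API reference for the exact schema the console expects.
+
+.. code-block:: python
+
+    import json
+
+    # Illustrative records only: the exact field names expected by the console may differ.
+    labelled_examples = [
+        {
+            "prompt": "Summarize the customer call transcript below: ...",
+            "ref_answer": "The customer called to dispute a duplicate charge ...",
+        },
+        {
+            "prompt": "Draft a follow-up email for the call below: ...",
+            "ref_answer": "Hi Alex, thanks for taking the time to speak with us ...",
+        },
+    ]
+
+    # The .jsonl format stores one JSON object per line.
+    with open("customer-calls1.jsonl", "w") as f:
+        for example in labelled_examples:
+            f.write(json.dumps(example) + "\n")
+
+For an unlabelled dataset, you would simply omit the reference answer field from each record.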
+
+Round Up
+--------
+
+That's it, you now know how to upload your own endpoints and datasets! You can now `run custom benchmarks `_, `build a custom router `_, or `query your endpoint `_ with the Unify API.
\ No newline at end of file
diff --git a/docs/interfaces/running_benchmarks.rst b/docs/interfaces/running_benchmarks.rst
new file mode 100644
index 0000000..e99ee4d
--- /dev/null
+++ b/docs/interfaces/running_benchmarks.rst
@@ -0,0 +1,164 @@
+Benchmarking endpoints
+======================
+
+In this section, you'll learn how to navigate our benchmarks and run your own custom ones.
+
+.. note::
+    You can learn about the methodology behind our benchmarks, and the various metrics used, in the `benchmarks design `_ section.
+
+
+Quality benchmarks
+------------------
+
+To compare the quality of different LLMs, head to the :code:`Dashboard` page on the console.
+
+By default, all endpoints will be plotted on six datasets, with the :code:`OpenHermes` dataset shown first. The default router will also be plotted, with various configurations of this router plotted as stars.
+
+.. image:: ../images/console_dashboard.png
+    :align: center
+    :width: 650
+    :alt: Console Dashboard.
+
+On the dataset dropdown at the top, you can select any dataset of prompts to benchmark each model and provider against. The scatter graph will then be replotted for the selected dataset.
+
+.. image:: ../images/console_dashboard_dataset.png
+    :align: center
+    :width: 650
+    :alt: Console Dashboard Dataset.
+
+Similarly, you can change the metric plotted on the x-axis from cost to another metric by clicking on the :code:`Metric` dropdown. This lets us plot the score against time-to-first-token (TTFT), for example.
+
+.. image:: ../images/console_dashboard_metric.png
+    :align: center
+    :width: 650
+    :alt: Console Dashboard Metric.
+
+You can remove any of these points by simply clicking on the model name in the legend. That model will then be removed from the graph, and the router points will be updated to only account for the remaining endpoints.
+
+.. image:: ../images/console_dashboard_filtered.png
+    :align: center
+    :width: 650
+    :alt: Console Dashboard Filtered.
+
+Runtime benchmarks
+------------------
+
+The benchmarks displayed on the console allow you to compare the average quality and runtime performance of LLM endpoints. As explained in the `benchmarks design `_ section, runtime metrics tend to change over time.
+
+For granular runtime benchmarks, head to the `benchmarks interface `_ outside of the console. There, you can find a list of popular LLM endpoints, periodically updated with new models and providers.
+
+.. image:: ../images/benchmarks_main.png
+    :align: center
+    :width: 650
+    :alt: Benchmarks Page.
+
+Each page contains a suite of runtime benchmarks providing timely information on the speed, cost and latency of the endpoints exposed by different endpoint providers.
+
+.. note::
+    You can learn more about endpoint providers in the dedicated `endpoints `_ section.
+
+For example, the image below corresponds to the benchmark page for :code:`mistral-7b-instruct-v0.2`.
+
+.. image:: ../images/benchmarks_model_page.png
+    :align: center
+    :width: 650
+    :alt: Benchmarks Model Page.
+
+The plot displays how the metric selected in the table changes across time and providers, for the specified region and sequence length. The table displays the latest values for all metrics across providers, and lets you sort the providers by any metric.
+
+You can plot a different metric on the graph by clicking on the graph icon next to the metric's column label. For example, the image below shows how the plot for :code:`TTFT` reveals different performance patterns than the default :code:`Output Tokens / Sec` plot.
+
+.. image:: ../images/benchmarks_model_page_changed_setup.png
+    :align: center
+    :width: 650
+    :alt: Benchmarks Model Page TTFT.
+
+Running your own benchmarks (Beta)
+----------------------------------
+
+If you are using custom endpoints or need to compare endpoints for a specific task, you can customize the benchmarks to fit your needs.
+
+.. note::
+    If you haven't done so, we recommend you learn how to `add your own datasets and endpoints `_ to the console before resuming.
+
+Once you've added your endpoints and / or datasets, head to the :code:`Benchmarks` page on the console. There, you can see all of the current and previous benchmark jobs you triggered, and you can also specify which endpoints you would like to include for benchmarking.
+
+Runtime benchmarks
+^^^^^^^^^^^^^^^^^^
+
+If you have several private endpoints deployed across different servers, each with its own latency, it can be useful to track their speeds over time, to ensure you're always sending your requests to the fastest servers.
+
+To trigger periodic runtime benchmarking for a custom endpoint, simply add it to the list under :code:`Runtime Benchmarks`. You also need to specify at least one IP address from which you would like to test this endpoint, and at least one prompt dataset against which you would like to perform the benchmarking.
+
+.. image:: ../images/console_runtime_benchmarks.png
+    :align: center
+    :width: 650
+    :alt: Console Runtime Benchmarks.
+
+Once all endpoints are added, you can then go to the `benchmarks interface `_ where you'll find your model listed with a lock icon, indicating that the benchmark is private (only accessible from your account).
+
+.. image:: ../images/custom_benchmarks.png
+    :align: center
+    :width: 650
+    :alt: Custom Benchmarks.
+
+You can then open the benchmark page as you would for any other model, and view the performance for various metrics plotted across time.
+
+.. image:: ../images/custom_benchmarks_model.png
+    :align: center
+    :width: 650
+    :alt: Custom Benchmarks Model.
+
+Quality benchmarks
+^^^^^^^^^^^^^^^^^^
+
+In order to train a router, or just compare the quality of endpoints, it's necessary to first evaluate the performance of each model on each prompt of a dataset.
+
+In the :code:`Quality Benchmarks` subsection, you can click on :code:`Submit Job` to trigger a new benchmark comparing the output quality across different LLMs.
+
+You need to specify the endpoints and datasets you would like to benchmark.
+
+.. note::
+    All selected endpoints will be tested on all selected datasets. So, if you only want to test some endpoints on some datasets, you should submit separate jobs.
+
+.. image:: ../images/console_benchmarks_quality.png
+    :align: center
+    :width: 650
+    :alt: Console Benchmarks Quality.
+
+Once you are happy with the selection, press :code:`Submit` and the job will appear under :code:`Running Jobs`, as shown below.
+
+.. image:: ../images/console_benchmarks_quality_submitted.png
+    :align: center
+    :width: 650
+    :alt: Console Benchmarks Quality Submitted.
+
+The job can be expanded to see each endpoint and dataset pair and check the progress.
+
+.. image:: ../images/console_benchmarks_quality_jobs.png
+    :align: center
+    :width: 650
+    :alt: Console Benchmarks Quality Jobs.
+
+The entire history of benchmarking jobs can also be viewed by clicking on :code:`History`, like so.
+
+.. image:: ../images/console_benchmarks_quality_history.png
+    :align: center
+    :width: 650
+    :alt: Console Benchmarks Quality History.
+
+Once the benchmarking is complete, you can visualize the results on the :code:`Dashboard` page.
+
+Like before, we can select the dataset through the :code:`Dataset` dropdown. In this case, we'll plot the benchmarks for the custom dataset we uploaded.
+
+.. image:: ../images/console_dashboard_custom_dataset.png
+    :align: center
+    :width: 650
+    :alt: Console Dashboard Custom Dataset.
+
+We can see that the custom endpoints :code:`mixtral-tuned-finances`, :code:`llama-3-tuned-calls1` and :code:`llama-3-tuned-calls2` we added earlier are all plotted, alongside the foundation router, which is always plotted by default.
+
+Round Up
+--------
+
+That's it! You now know how to compare LLM endpoints on quality and runtime metrics, and run benchmarks on your own endpoints and datasets. In the next section, we'll learn how to train and deploy a custom router.
diff --git a/docs/interfaces/setting_up.rst b/docs/interfaces/setting_up.rst
new file mode 100644
index 0000000..527443d
--- /dev/null
+++ b/docs/interfaces/setting_up.rst
@@ -0,0 +1,80 @@
+Setting Up
+==========
+
+In this section, we'll go through how you can set up your account to get started querying endpoints.
+
+Billing
+-------
+
+The :code:`Billing` page is where you can set up your payment information to recharge your account, and track your spending.
+
+By default, you can only top up manually by clicking on :code:`Buy Credits`. **We recommend you set up automatic refill** to avoid any disruption to your workflows when your credits run out.
+
+.. image:: ../images/console_billing_no_payment.png
+    :align: center
+    :width: 650
+    :alt: Console Billing No Payment.
+
+Automatic refill
+^^^^^^^^^^^^^^^^
+
+Activating automatic refill requires going through the dedicated :code:`Billing Portal`, where you can add your preferred payment method, update your billing information, and download your invoices.
+
+.. image:: ../images/console_portal_welcome.png
+    :align: center
+    :width: 650
+    :alt: Console Portal Welcome.
+
+Clicking on :code:`Add Payment Methods` then lets you enter your card information.
+
+.. image:: ../images/console_portal_setup.png
+    :align: center
+    :width: 650
+    :alt: Console Portal Setup.
+
+With your payment information set up, you can now toggle automatic refill on and off as needed on the main billing page.
+
+Automatic refill lets you specify a cut-off amount; when your balance reaches it, your account is automatically topped up by the refill amount you set.
+
+.. image:: ../images/console_billing_payment.png
+    :align: center
+    :width: 650
+    :alt: Console Billing Payment.
+
+
+Pricing and credits
+^^^^^^^^^^^^^^^^^^^
+
+Credits are consumed when you query endpoints. Because we **don't apply any charge on top of provider costs**, consumed credits directly reflect the cost of a request. Upon signing up, you are automatically granted **$50 in free credits**!
+
+.. note::
+    You can check your current balance through the billing page, or directly from your terminal by querying the `Get Credits Endpoint `_ of the API.
+
+Top-up codes
+^^^^^^^^^^^^
+
+You may have received a code to increase your number of credits. If that's the case, you can activate it by making a request to this endpoint:
+
+.. code-block:: bash
+
+    curl -X 'POST' \
+        'https://api.unify.ai/v0/promo?code=<CODE>' \
+        -H 'accept: application/json' \
+        -H 'Authorization: Bearer <API_KEY>'
+
+Simply replace :code:`<CODE>` with your top-up code and :code:`<API_KEY>` with your API key and make the request 🚀
+
+(Optional) Customizing your profile
+-----------------------------------
+
+Depending on the sign-up method you chose, some of the entries in the :code:`Profile` section will already be populated. Regardless, you can use this page to change your email address, add your personal information, and sign out in case you'd like to use another account.
+
+.. image:: ../images/console_profile.png
+    :align: center
+    :width: 650
+    :alt: Console Profile.
+
+Round Up
+--------
+
+You're all set! In the next section, you will learn how to upload your own endpoints and datasets on the console.