README links and text edits #2

Merged: 5 commits, Nov 28, 2018

Changes from all commits
15 changes: 7 additions & 8 deletions README.rst
Original file line number Diff line number Diff line change
Expand Up @@ -45,26 +45,25 @@ inference service via an HTTP or gRPC endpoint, allowing remote
clients to request inferencing for any model being managed by the
server. TRTIS provides the following features:

* Multiple model support. The server can manage any number and mix of
* `Multiple framework support <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-master-branch-guide/docs/model_repository.html#model-definition>`_. The server can manage any number and mix of
models (limited by system disk and memory resources). Supports
TensorRT, TensorFlow GraphDef, TensorFlow SavedModel and Caffe2
NetDef model formats. Also supports TensorFlow-TensorRT integrated
models.
* Multi-GPU support. The server can distribute inferencing across all
system GPUs.
* Multi-tenancy support. Multiple models (or multiple instances of the
* `Concurrent model execution support <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-master-branch-guide/docs/model_configuration.html?highlight=batching#instance-groups>`_. Multiple models (or multiple instances of the
same model) can run simultaneously on the same GPU.
* Batching support. For models that support batching, the server can
accept requests for a batch of inputs and respond with the
corresponding batch of outputs. The server also supports *dynamic
batching* where individual inference requests are dynamically
corresponding batch of outputs. The server also supports `dynamic
batching <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-master-branch-guide/docs/model_configuration.html?highlight=batching#dynamic-batching>`_ where individual inference requests are dynamically
combined together to improve inference throughput. Dynamic batching
is transparent to the client requesting inference.
* Model repositories may reside on a locally accessible file system or
* `Model repositories <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-master-branch-guide/docs/model_repository.html#>`_ may reside on a locally accessible file system (e.g. NFS) or
in Google Cloud Storage.
* Readiness and liveness health endpoints suitable for
Kubernetes-style orchestration.
* Metrics indicating GPU utiliization, server throughput, and server
* Readiness and liveness `health endpoints <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-master-branch-guide/docs/http_grpc_api.html#health>`_ suitable for any orchestration or deployment framework, such as Kubernetes.
* `Metrics <https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-master-branch-guide/docs/metrics.html>`_ indicating GPU utiliization, server throughput, and server
latency.

.. overview-end-marker-do-not-remove
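The model-repository, concurrent-execution, and dynamic-batching bullets in the diff above all map to settings in a per-model configuration file. Below is a minimal sketch, not part of this PR, that lays out a local repository skeleton with a hypothetical config.pbtxt; the model name, tensor shapes, and all values are illustrative, and the fields follow the ModelConfig text format described in the linked model_configuration documentation.

# Minimal sketch of a local model repository for TRTIS. All names and values
# are hypothetical; only the field names follow the documented ModelConfig
# text format.
from pathlib import Path

MODEL_NAME = "example_model"     # hypothetical model name
REPO = Path("model_repository")  # local repository root; the README notes GCS is also supported

CONFIG_PBTXT = """\
name: "example_model"
platform: "tensorrt_plan"
max_batch_size: 8
input [
  { name: "input0", data_type: TYPE_FP32, dims: [ 3, 224, 224 ] }
]
output [
  { name: "output0", data_type: TYPE_FP32, dims: [ 1000 ] }
]
# Run two execution instances on each available GPU (concurrent model execution).
instance_group [
  { count: 2, kind: KIND_GPU }
]
# Let the server transparently combine individual requests into larger batches.
dynamic_batching {
  preferred_batch_size: [ 4, 8 ]
  max_queue_delay_microseconds: 100
}
"""

if __name__ == "__main__":
    # Layout: <repository>/<model-name>/config.pbtxt plus a numbered version
    # directory that holds the serialized model (e.g. model.plan for TensorRT).
    model_dir = REPO / MODEL_NAME
    (model_dir / "1").mkdir(parents=True, exist_ok=True)
    (model_dir / "config.pbtxt").write_text(CONFIG_PBTXT)
    print(f"wrote repository skeleton under {REPO}/")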
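The readiness/liveness and metrics bullets can be exercised against a running server. The following is a small, hypothetical smoke test assuming the default ports (8000 for the HTTP API, 8002 for Prometheus metrics) and the /api/health/live and /api/health/ready paths from the linked HTTP API reference; adjust endpoints for your deployment.

# Hypothetical smoke test for a locally running TRTIS instance.
# Assumes default ports and the documented health paths; not part of this PR.
import urllib.error
import urllib.request

HTTP_ENDPOINT = "http://localhost:8000"     # assumed default HTTP API port
METRICS_ENDPOINT = "http://localhost:8002"  # assumed default metrics port


def status_of(url: str) -> int:
    """Return the HTTP status code of a GET request, even for error responses."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


if __name__ == "__main__":
    # Liveness indicates the server process is up; readiness indicates models
    # are loaded and inference requests can be accepted.
    print("live: ", status_of(f"{HTTP_ENDPOINT}/api/health/live"))
    print("ready:", status_of(f"{HTTP_ENDPOINT}/api/health/ready"))

    # Metrics are exposed in Prometheus text format; show the first few lines.
    with urllib.request.urlopen(f"{METRICS_ENDPOINT}/metrics", timeout=5) as resp:
        for line in resp.read().decode("utf-8").splitlines()[:5]:
            print(line)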