
AIknowlEDGE is a desktop application that enables users to deploy and manage their disconnected Azure AI services with ease.
View Demo · Explore the docs · Report Bug · Request Feature
Welcome to the AI-KnowlEDGE project! The goal of this repository is to provide an accelerator that enables users to quickly and efficiently set up and run Azure AI services in a disconnected environment. By following our comprehensive set of instructions and utilizing the provided sample project, users can deploy and manage their disconnected services with ease. Our aim is to simplify the setup process, reduce deployment time, and ensure a smooth operational experience for all users.
Azure AI services offer a variety of Docker containers that allow you to utilize the same APIs available in Azure, but on your own premises. By using these containers, you gain the flexibility to position Azure AI services closer to your data, which can be beneficial for compliance, security, or other operational needs. Currently, container support is offered for a limited number of Azure AI services.
Containerization is a method of software distribution where an application or service, along with its dependencies and configuration, is bundled together into a container image. This container image can be deployed on a container host with minimal or no alterations. Containers are isolated from one another and from the underlying operating system, and they have a smaller footprint compared to virtual machines. They can be created from container images for temporary tasks and removed when they are no longer needed.
- Immutable Infrastructure: DevOps teams can utilize a consistent and reliable set of known system parameters while remaining adaptable to changes. Containers offer the flexibility to pivot within a stable ecosystem and prevent configuration drift.
- Control Over Data: You can determine where your data is processed by Azure AI services, which is crucial if you cannot send data to the cloud but still need access to Azure AI services APIs. This approach supports consistency in hybrid environments across data, management, identity, and security.
- Control Over Model Updates: You have the flexibility to version and update models deployed in your solutions as needed.
- Portable Architecture: Containers enable the creation of a portable application architecture that can be deployed on Azure, on-premises, and at the edge. They can be deployed directly to Azure Kubernetes Service, Azure Container Instances, or a Kubernetes cluster on Azure Stack. For more details, see Deploy Kubernetes to Azure Stack.
- High Throughput / Low Latency: Containers allow Azure AI services to run close to your application logic and data, meeting high throughput and low latency requirements. They do not limit transactions per second (TPS) and can scale both vertically and horizontally to meet demand, given sufficient hardware resources.
- Scalability: With the growing popularity of containerization and container orchestration software like Kubernetes, scalability is a key focus. Building on a scalable cluster foundation allows for application development that supports high availability.
Follow these steps to run the project locally.
- Python 3.12+
- Docker
- VS Code
- Ollama
- Clone the repository:

  ```
  git clone https://github.com/Azure-Samples/AI-knowlEDGE.git
  cd AI-knowlEDGE
  ```
- Install and start Docker. Then open a command prompt and pull the container by running the following command:

  ```
  docker pull mcr.microsoft.com/azure-cognitive-services/textanalytics/summarization:cpu
  ```
- Get the Azure Cognitive Services keys and endpoints for Azure Document Intelligence and Azure AI Language. Before using Cognitive Services containers in disconnected environments, you must complete a request form and purchase a commitment plan. Next, provision a new resource in the portal. Select the DC0 option for the Pricing tier to enable disconnected containers (for Document Intelligence, additionally choose a custom, read, or prebuilt commitment tier).

  Note: For a “semi-disconnected” mode, you can provision a Document Intelligence and a Language resource with the S0 commitment plan, without filling in any request form.

  For your convenience, create a txt file and save the keys and endpoints in the following format:

  ```
  AZURE_DOCUMENT_ANALYSIS_ENDPOINT = <document-intelligence-endpoint>
  AZURE_DOCUMENT_ANALYSIS_KEY = <document-intelligence-key>
  LANGUAGE_ENDPOINT = <ai-language-endpoint>
  LANGUAGE_KEY = <ai-language-key>
  ```
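
  If you want to sanity-check the keys before going further, you can create the SDK clients in Python. This is a minimal sketch, not code from this repository; it assumes the `azure-ai-formrecognizer` and `azure-ai-textanalytics` packages are installed:

  ```python
  # Minimal sketch: verify the cloud keys/endpoints work before moving on to
  # the containers. Illustrative only, not taken from this repository.
  from azure.core.credentials import AzureKeyCredential
  from azure.ai.formrecognizer import DocumentAnalysisClient
  from azure.ai.textanalytics import TextAnalyticsClient

  doc_client = DocumentAnalysisClient(
      endpoint="<document-intelligence-endpoint>",
      credential=AzureKeyCredential("<document-intelligence-key>"),
  )
  lang_client = TextAnalyticsClient(
      endpoint="<ai-language-endpoint>",
      credential=AzureKeyCredential("<ai-language-key>"),
  )

  # A cheap call that fails fast if a key or endpoint is wrong.
  print(lang_client.detect_language(documents=["hello world"])[0].primary_language.name)
  ```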
- Create a folder on your C: drive named `ExtractiveModel`.
- Download the SLMs for the Summarization Service. Start Docker and run the following, substituting your Language endpoint and key:

  ```
  docker run -v C:\ExtractiveModel:/models mcr.microsoft.com/azure-cognitive-services/textanalytics/summarization:cpu downloadModels=ExtractiveSummarization billing=<ai-language-endpoint> apikey=<ai-language-key>
  ```
- Now open the cloned repo in a command line or VS Code, and set up the Python environment and install project dependencies:

  ```
  python -m venv venv
  venv\Scripts\activate   # On Linux use `source venv/bin/activate`
  pip install -r requirements.txt
  ```
- Create a docker-compose.yml (in the root folder) with the following code:

  ```
  version: "3.9"
  services:
    azure-form-recognizer-read:
      container_name: azure-form-recognizer-read
      image: mcr.microsoft.com/azure-cognitive-services/form-recognizer/read-3.1
      environment:
        - EULA=accept
        - billing=<document-intelligence-endpoint>
        - apiKey=<document-intelligence-key>
      ports:
        - "5000:5000"
      networks:
        - ocrvnet
    textanalytics:
      image: mcr.microsoft.com/azure-cognitive-services/textanalytics/summarization:cpu
      environment:
        - eula=accept
        - rai_terms=accept
        - billing=<language-endpoint>
        - apikey=<language-key>
      volumes:
        - "C:\\ExtractiveModel:/models"
      ports:
        - "5001:5000"
  networks:
    ocrvnet:
      driver: bridge
  ```
- Start the containers by running:

  ```
  docker-compose up
  ```
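
  Once the containers are up, you can probe them before pointing the app at them. Azure AI service containers expose `/ready` and `/status` routes; the following probe is a sketch that assumes the port mappings from the docker-compose.yml above:

  ```python
  # Sketch: check that both local containers answer on their readiness routes.
  import requests

  for name, url in [
      ("document-intelligence (read)", "http://localhost:5000/ready"),
      ("summarization", "http://localhost:5001/ready"),
  ]:
      try:
          status = requests.get(url, timeout=5).status_code
          print(f"{name}: HTTP {status}")
      except requests.ConnectionError:
          print(f"{name}: not reachable yet")
  ```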
- Make a .env file:

  ```
  AZURE_DOCUMENT_ANALYSIS_ENDPOINT=http://localhost:5000
  AZURE_DOCUMENT_ANALYSIS_KEY=<document-intelligence-key>
  LANGUAGE_ENDPOINT=http://localhost:5001
  LANGUAGE_KEY=<language-key>
  ```
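
  The app can then pick these values up from the environment. As a minimal sketch (assuming `python-dotenv`; the repo may load them differently):

  ```python
  # Sketch: load the .env values; assumes python-dotenv is installed.
  import os
  from dotenv import load_dotenv

  load_dotenv()  # reads .env from the current working directory
  print(os.environ["AZURE_DOCUMENT_ANALYSIS_ENDPOINT"])  # http://localhost:5000
  print(os.environ["LANGUAGE_ENDPOINT"])                 # http://localhost:5001
  ```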
- Download Ollama and install at least one SLM and one embedding model:

  ```
  ollama pull phi3
  ```

  We also use an embedding model to vectorize document chunks and build a local RAG solution with ChromaDB, so additionally ensure that the required embedding model is installed:

  ```
  ollama pull nomic-embed-text
  ```
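
  To illustrate how the two models fit together in the local RAG flow mentioned above, here is a small end-to-end sketch using the `ollama` and `chromadb` Python packages. It is illustrative only; the chunking, prompts, and collection names in the actual app may differ:

  ```python
  # Illustrative local RAG round trip with Ollama + ChromaDB (not repo code).
  import chromadb
  import ollama

  chunks = [
      "AIknowlEDGE runs Azure AI containers locally.",
      "Ollama serves the phi3 model for generation.",
  ]

  # Embed each chunk with nomic-embed-text and store it in ChromaDB.
  collection = chromadb.Client().create_collection("docs_demo")
  for i, chunk in enumerate(chunks):
      emb = ollama.embeddings(model="nomic-embed-text", prompt=chunk)["embedding"]
      collection.add(ids=[str(i)], embeddings=[emb], documents=[chunk])

  # Retrieve the closest chunk for a question and let phi3 answer from it.
  question = "What serves the generation model?"
  q_emb = ollama.embeddings(model="nomic-embed-text", prompt=question)["embedding"]
  context = collection.query(query_embeddings=[q_emb], n_results=1)["documents"][0][0]

  answer = ollama.chat(model="phi3", messages=[
      {"role": "user", "content": f"Context: {context}\n\nQuestion: {question}"},
  ])
  print(answer["message"]["content"])
  ```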
- Start the FastAPI backend (from the root folder):

  ```
  uvicorn backend.main:app --host 0.0.0.0 --port 8000
  ```

  If you're using VS Code, simply press F5 or go to Run > Start Debugging. The launch.json is already configured.
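
  Before launching the frontend, you can confirm the backend is reachable. The `/get_models/` route comes from the error message in the troubleshooting notes below; the probe itself is just a sketch:

  ```python
  # Sketch: confirm the FastAPI backend answers before starting the frontend.
  import requests

  resp = requests.get("http://localhost:8000/get_models/", timeout=5)
  print(resp.status_code, resp.text[:200])
  ```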
- Open another terminal and start the Streamlit frontend:

  ```
  streamlit run frontend/app.py --server.port=8501
  ```
- If you see the following error:

  ```
  requests.exceptions.ConnectionError: HTTPConnectionPool(host='localhost', port=8000): Max retries exceeded with url: /get_models/ (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x124209be0>: Failed to establish a new connection: [Errno 61] Connection refused'))
  ```

  make sure the FastAPI backend is running. Check its output if needed.
- If you're able to start the Streamlit app but see the following error message:

  ```
  No Ollama models found. Please ensure Ollama is running and models are installed.
  ```

  it means the Ollama application is not running.
- You can also run the unit tests from the tests folder to debug the FastAPI backend.
- Other Containers Integration
  - Speech service
  - Translation service
- Other Use Cases Integration
- Packaging
  - Single-click installation
  - Cross-platform installation
See the open issues for proposed features and known issues.
Contributions are what make open source great, and every contribution is appreciated.
- Fork this repo
- Create a Feature Branch (`git checkout -b feature/AmazingFeature`)
- Commit your changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
Distributed under the MIT License. See LICENSE.txt for more info.
- Raoui Lassoued LinkedIn
- Serge Retkowsky LinkedIn
- Farid El Attaoui LinkedIn
- Alibek Jakupov @ajakupov1 LinkedIn
Project Link: AIKnowlEDGE
- Microsoft France