A Flask-based API that provides shopping assistance through audio and image inputs, powered by Google's Gemini AI model.
- Audio input processing with speech recognition
- Image analysis for product identification
- Structured product information responses
- Support for multiple product categories (Food, Clothes, Shoes)
- Detailed size and pricing information
- Rack location tracking
- Python 3.8 or higher
- Google Cloud API key for Gemini AI
- Internet connection for API calls
- Clone the repository:
git clone <repository-url>
cd <repository-name>
- Install dependencies:
pip install -r requirements.txt
- Set up environment variables:
Create a .env file in the root directory with:
GOOGLE_API_KEY=your_api_key_here
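As a rough illustration, the key could be read at startup as sketched below. This assumes the value reaches the process environment (for example through a .env loader; whether app.py does this is not confirmed here). The function name get_api_key is hypothetical.

```python
import os


def get_api_key(environ=None):
    """Return GOOGLE_API_KEY, or raise a clear error if it is missing.

    `environ` defaults to os.environ; passing a dict makes the function
    easy to exercise without touching the real environment.
    """
    env = os.environ if environ is None else environ
    key = env.get("GOOGLE_API_KEY", "").strip()
    if not key:
        raise RuntimeError("GOOGLE_API_KEY is not set; add it to the .env file")
    return key
```

Failing early with an explicit message makes a missing key easier to spot than an authentication error on the first request.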
python app.py
The server binds to 0.0.0.0 (all network interfaces) on port 5000, so it is reachable at http://localhost:5000 from the same machine.
- Method: POST
- Input: Audio file (WAV format)
- Output Format:
{
  "transcription": "user's speech text",
  "text_response": {
    "Section": "section name",
    "Rack": "rack position",
    "Name": "product name",
    "Price": "price",
    "Size": "size information"
  }
}
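A client may want to check that a response has this shape before using it. The following is a minimal sketch; the function name parse_audio_response and the sample values are made up for illustration and are not part of the project.

```python
import json

REQUIRED_FIELDS = ("Section", "Rack", "Name", "Price", "Size")


def parse_audio_response(body):
    """Parse the JSON body and return (transcription, text_response).

    Raises ValueError listing whatever expected keys are absent.
    """
    data = json.loads(body)
    missing = []
    if "transcription" not in data:
        missing.append("transcription")
    text = data.get("text_response") or {}
    missing += [f"text_response.{f}" for f in REQUIRED_FIELDS if f not in text]
    if missing:
        raise ValueError("response is missing: " + ", ".join(missing))
    return data["transcription"], text


# Illustrative sample only; real values come from the server.
sample = json.dumps({
    "transcription": "where are the biscuits",
    "text_response": {
        "Section": "Food", "Rack": "A3", "Name": "Biscuits",
        "Price": "2.50", "Size": "Medium Pack (500g)",
    },
})
transcription, product = parse_audio_response(sample)
print(transcription, "->", product["Name"], "on rack", product["Rack"])
```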
- Method: POST
- Input: Base64 encoded image
- Output Format:
{
  "text_response": {
    "Section": "section name",
    "Rack": "rack position",
    "Name": "product name",
    "Price": "price",
    "Size": "size information"
  }
}
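Preparing the Base64 input on the client side could look roughly like this. The JSON field name "image" and the helper build_image_payload are assumptions; check app.py or test_image.py for the field the server actually expects.

```python
import base64
import json


def build_image_payload(image_bytes, field="image"):
    """Encode raw image bytes as Base64 text and wrap them in a JSON body.

    The default field name "image" is an assumption, not taken from the API.
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return json.dumps({field: encoded})


# The 8-byte PNG signature stands in for a real image file here.
payload = build_image_payload(b"\x89PNG\r\n\x1a\n")
print(payload)
```

For a real image, read the file in binary mode (open(path, "rb")) and pass its bytes to the helper.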
- Food
  - Available sizes: Small Pack (200g), Medium Pack (500g), Large Pack (1kg)
  - Products: Biscuits, Snacks, etc.
- Clothes
  - Available sizes: S, M, L, XL, XXL
  - Products: T-shirts, Shirts, etc.
- Shoes
  - Available sizes: US 6-12, UK 5-11, EU 39-45
  - Products: Leather Shoes, Formal Shoes, Loafers
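The size rules above can be expressed as a small lookup. This is only a sketch: the function is_valid_size is hypothetical, the "US 9" style of shoe size string is an assumption about formatting, and only whole-number shoe sizes are handled.

```python
import re

FOOD_SIZES = {"Small Pack (200g)", "Medium Pack (500g)", "Large Pack (1kg)"}
CLOTHES_SIZES = {"S", "M", "L", "XL", "XXL"}
# Inclusive (low, high) ranges per sizing system.
SHOE_RANGES = {"US": (6, 12), "UK": (5, 11), "EU": (39, 45)}


def is_valid_size(category, size):
    """Return True if `size` is one of the listed sizes for `category`."""
    if category == "Food":
        return size in FOOD_SIZES
    if category == "Clothes":
        return size.upper() in CLOTHES_SIZES
    if category == "Shoes":
        # Assumed format: system prefix, a space, then a whole number.
        match = re.fullmatch(r"(US|UK|EU) (\d+)", size)
        if not match:
            return False
        low, high = SHOE_RANGES[match.group(1)]
        return low <= int(match.group(2)) <= high
    return False
```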
All responses follow this structure:
Section: [section name]
Rack: [rack position]
Name: [product name]
Price: [price]
Size: [size information]
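Text in this "Key: value" layout can be turned into the text_response dictionary with a few lines of Python. The sketch below is one possible approach, not necessarily what app.py does; parse_product_text and the sample text are hypothetical.

```python
EXPECTED_KEYS = ("Section", "Rack", "Name", "Price", "Size")


def parse_product_text(text):
    """Collect the expected 'Key: value' lines into a dict.

    Lines without a colon or with an unknown key are ignored. Only the
    first colon splits the line, so values may contain colons themselves.
    """
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        if sep and key in EXPECTED_KEYS:
            fields[key] = value.strip()
    return fields


# Illustrative input only.
sample_text = """Section: Shoes
Rack: B2
Name: Loafers
Price: 49.99
Size: US 6-12"""
print(parse_product_text(sample_text))
```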
The API handles various error cases:
- Invalid audio input
- Unsupported image format
- Missing API key
- Network errors
- Speech recognition errors
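How the server reports these cases is not documented here. As one illustration, a client could catch bad image input before sending it. The sketch assumes PNG and JPEG are the accepted formats (an assumption) and uses a hypothetical helper, detect_image_format, that inspects the leading bytes of the decoded data.

```python
import base64
import binascii

# Leading "magic" bytes for the formats assumed to be supported.
MAGIC_BYTES = {b"\x89PNG\r\n\x1a\n": "png", b"\xff\xd8\xff": "jpeg"}


def detect_image_format(encoded):
    """Decode Base64 text and return 'png' or 'jpeg'.

    Raises ValueError for malformed Base64 or an unrecognized format.
    """
    try:
        raw = base64.b64decode(encoded, validate=True)
    except (binascii.Error, ValueError) as exc:
        raise ValueError("image is not valid Base64") from exc
    for magic, name in MAGIC_BYTES.items():
        if raw.startswith(magic):
            return name
    raise ValueError("unsupported image format")
```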
Use the provided test scripts:
- test_audio.py for testing audio input
- test_image.py for testing image input
- Fork the repository
- Create a feature branch
- Commit your changes
- Push to the branch
- Create a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.