Skip to content

PixelScribe: Integrating Python with Google Gemini API for Image Capturing and Description" is a comprehensive guide designed to bridge the gap between capturing live images using Python and leveraging the powerful Google Gemini API for image description.

Notifications You must be signed in to change notification settings

yassienshaalan/PixelScribe

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

6 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

PixelScribe

PixelScribe: Integrating Python with Google Gemini API for Image Capturing and Description" is a comprehensive guide designed to bridge the gap between capturing live images using Python and leveraging the powerful Google Gemini API for image description. This repository serves as a practical extension of the foundational concepts introduced in the earlier article on Google's Gemini APIs, tailored for both novices and seasoned programmers. Through a step-by-step approach, this guide demystifies the process of interfacing with APIs and handling image data in Python, ensuring a smooth and accessible learning experience for individuals eager to explore the synergy between cutting-edge API technology and Python programming.

Features

Live Image Capturing: Utilizes your webcam to capture live images, providing a continuous stream of visual data for analysis. Real-Time Descriptions: Leverages Google's Gemini API to generate descriptive text for the captured images, offering insights into the visual content. User-Friendly Interface: Features a simple GUI built with Tkinter, making it easy for users of all levels to operate the application. Flexible User Input: Allows users to input custom descriptions or queries, tailoring the AI's focus to specific elements within the images. Asynchronous Processing: Employs threading to ensure that image capturing and description generation processes run smoothly without interrupting the user experience.

Prerequisites

Before you begin, ensure you have met the following requirements:

Python 3.6 or later An active internet connection A valid Google Cloud Platform account with access to the Gemini API The following Python packages: opencv-python, Pillow, tkinter, google-cloud, dotenv

Installation

To install PixelScribe, follow these steps:

Clone the repository to your local machine: bash Copy code

git clone https://github.com/yourusername/pixelscribe.git

Navigate to the project directory and install the required Python packages: bash Copy code

cd pixelscribe
pip install -r requirements.txt

Set up your Google API credentials and export them as environment variables or store them in a .env file at the root of the project directory.

Usage

To use PixelScribe, run the main Python script from the command line:

bash Copy code

python app.py

The GUI will launch, displaying the live feed from your webcam. Use the "Describe the frame" button to generate descriptions for the current image, and the "Stop" button to halt the video stream.

Customization

PixelScribe is designed with customization in mind. Feel free to dive into the code to adjust the image capture settings, modify the GUI layout, or tweak the interaction with the Google Gemini API to better suit your specific use case.

Contributing

Contributions to PixelScribe are welcome! If you have suggestions for improvements or encounter any issues, please open an issue or submit a pull request.

License

PixelScribe is released under the MIT License. Feel free to use, modify, and distribute the code in accordance with the license terms.

Acknowledgments

Thanks to Google Cloud Platform for providing the powerful Gemini API that powers the core functionality of this application. Appreciation to the open-source community for the invaluable Python packages that make projects like this possible.

About

PixelScribe: Integrating Python with Google Gemini API for Image Capturing and Description" is a comprehensive guide designed to bridge the gap between capturing live images using Python and leveraging the powerful Google Gemini API for image description.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages