# Image captioning app 🖼️📝

Load your Hugging Face (HF) API key and the relevant Python libraries.

In [1]:
pip install python-dotenv



In [None]:
import os
from dotenv import load_dotenv, find_dotenv

# Load environment variables from .env file
_ = load_dotenv(find_dotenv())

# If HF_API_KEY was not loaded from .env, fall back to setting it here.
# setdefault() leaves a key that .env already provided untouched.
os.environ.setdefault('HF_API_KEY', 'hf_XXXX')  # Replace with your actual API key

# Check if the variable is set (without printing the secret itself)
if os.environ.get('HF_API_KEY'):
    print("HF_API_KEY has been set.")
else:
    print("HF_API_KEY is not set in the environment variables.")

# Now you can use hf_api_key in your code
hf_api_key = os.environ['HF_API_KEY']
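Printing a full API key into notebook output exposes it to anyone who sees the notebook or its saved outputs. A minimal sketch of a safer check, assuming you only need to confirm the key exists; `mask_secret` and `require_env` are hypothetical helpers, not part of `python-dotenv`:

```python
import os

def mask_secret(value: str, visible: int = 4) -> str:
    """Return the value with all but the last `visible` characters replaced by '*'."""
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]

def require_env(name: str) -> str:
    """Fetch an environment variable, failing loudly if it is missing or empty."""
    value = os.environ.get(name, "")
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    return value

# Usage (hypothetical): confirm the key is loaded without revealing it.
# hf_api_key = require_env("HF_API_KEY")
# print("HF_API_KEY loaded:", mask_secret(hf_api_key))
```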


The `get_completion` function below calls a Hugging Face image-to-text inference endpoint. It uses the `requests` library to send a POST request with the given inputs (and optional parameters) plus an authorization header, then returns the decoded JSON response.

Free sample images for testing are available at https://free-images.com/.

In [3]:
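import os
import json
import requests

# Image-to-text inference endpoint URL, assumed to be defined in your .env file.
# https://free-images.com/ is only a source of sample images, not an inference
# API, so it must not be used as the endpoint.
HF_API_ITT_BASE = os.environ.get('HF_API_ITT_BASE', '')

# Image-to-text endpoint
def get_completion(inputs, parameters=None, ENDPOINT_URL=HF_API_ITT_BASE):
    """POST `inputs` (and optional `parameters`) to the endpoint and return the parsed JSON."""
    headers = {
        "Authorization": f"Bearer {hf_api_key}",
        "Content-Type": "application/json"
    }
    data = {"inputs": inputs}
    if parameters is not None:
        data.update({"parameters": parameters})
    response = requests.request("POST",
                                ENDPOINT_URL,
                                headers=headers,
                                data=json.dumps(data))
    response.raise_for_status()  # raise on HTTP 4xx/5xx instead of returning the error body
    return json.loads(response.content.decode("utf-8"))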
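The request body that `get_completion` sends is a JSON object with an `inputs` key and, optionally, a `parameters` key. A minimal sketch that builds the same body without making a network call, so you can inspect it before sending; `build_payload` is a hypothetical helper, and `max_new_tokens` is only an illustrative parameter (check which parameters your endpoint actually accepts):

```python
import json

def build_payload(inputs, parameters=None):
    """Mirror the body get_completion() sends: {"inputs": ..., "parameters": ...}."""
    data = {"inputs": inputs}
    if parameters is not None:
        data["parameters"] = parameters
    return json.dumps(data)

# "inputs" would typically be an image URL or a base64-encoded image string.
body = build_payload("https://free-images.com/sm/9596/dog_animal_greyhound_983023.jpg",
                     parameters={"max_new_tokens": 20})
print(body)
```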


## Building an image captioning app

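Here we'll be using `Salesforce/blip-image-captioning-base`, a base-size BLIP image-captioning model. The `get_completion` helper above targets an [Inference Endpoint](https://huggingface.co/inference-endpoints) serving such a model, while the cells below run the same model locally.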

The code looks very similar when you run the model locally instead of calling an API; see the [Pipelines](https://huggingface.co/docs/transformers/main_classes/pipelines) documentation page for details.

```py
from transformers import pipeline

get_completion = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base")

def summarize(image):
    output = get_completion(image)
    return output[0]['generated_text']
```

In [4]:
pip install transformers



In [5]:
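from transformers import pipeline

# Load BLIP locally; this rebinds get_completion from the API helper to the pipeline
get_completion = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base")

def summarize(image):
    # The pipeline returns a list like [{'generated_text': '...'}]
    output = get_completion(image)
    return output[0]['generated_text']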

In [6]:
import io
import IPython
from PIL import Image
import base64
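The `io`, `base64` and `PIL.Image` imports are what you need to send a local image (rather than an image URL) to the API-based `get_completion` defined earlier; note that the pipeline cell above rebinds that name. A minimal sketch, assuming the endpoint accepts a base64-encoded image string as `inputs`; `image_to_base64_str` and `base64_to_bytes` are hypothetical helper names:

```python
import base64
import io

def image_to_base64_str(pil_image, fmt="PNG"):
    """Save a PIL image into an in-memory buffer and return it base64-encoded."""
    buffer = io.BytesIO()
    pil_image.save(buffer, format=fmt)  # PIL's Image.save accepts file-like objects
    return base64.b64encode(buffer.getvalue()).decode("utf-8")

def base64_to_bytes(b64_str):
    """Decode a base64 string back to raw image bytes."""
    return base64.b64decode(b64_str)

# Usage with Pillow (assumes a local file "dog.jpg" exists):
# from PIL import Image
# b64 = image_to_base64_str(Image.open("dog.jpg"))
# restored = Image.open(io.BytesIO(base64_to_bytes(b64)))
```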

In [7]:
image_url = "https://free-images.com/sm/9596/dog_animal_greyhound_983023.jpg"
display(IPython.display.Image(url=image_url))
get_completion(image_url)



[{'generated_text': 'a dog wearing a santa hat and a red scarf'}]
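The pipeline returns a list with one dict per generated caption, as in the output above, which is why the code indexes `output[0]['generated_text']`. A small defensive sketch of that extraction, assuming the same output shape; `extract_caption` is a hypothetical helper, and the `{"error": ...}` branch assumes a hosted endpoint may answer with an error object instead of a list:

```python
def extract_caption(output):
    """Return the first generated caption from an image-to-text result."""
    if isinstance(output, dict) and "error" in output:
        # Hosted endpoints may return an error object instead of a list.
        raise RuntimeError(f"Captioning failed: {output['error']}")
    if not output:
        raise ValueError("Empty captioning result")
    return output[0]["generated_text"]

print(extract_caption([{'generated_text': 'a dog wearing a santa hat and a red scarf'}]))
```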

## Captioning with `gr.Interface()`

In [8]:
pip install gradio




Next, we use the Gradio library to build a graphical interface for image captioning around the `get_completion` function defined earlier. Users can upload an image and receive a caption generated by the model.

In [11]:
import gradio as gr

def captioner(image):
    result = get_completion(image)
    return result[0]['generated_text']

gr.close_all()
demo = gr.Interface(fn=captioner,
                    inputs=[gr.Image(label="Upload image", type="pil")],
                    outputs=[gr.Textbox(label="Caption")],
                    title="Image Captioning with BLIP",
                    description="Caption any image using the BLIP model",
                    allow_flagging="never")  # in Gradio 5+, use flagging_mode="never" instead
                    # examples=["christmas_dog.jpeg", "bird_flight.jpeg", "cow.jpeg"])

demo.launch(share=True, server_port=9991)

Closing server running on port: 9990
Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
Running on public URL: https://bcf90c887c3a00f624.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)
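Because `captioner` only wraps whatever `get_completion` is bound to, you can check the callback logic without loading BLIP or launching Gradio by passing in a stand-in model. A minimal sketch of that idea; `make_captioner` and `fake_model` are hypothetical names used only for illustration:

```python
def make_captioner(model_fn):
    """Build a Gradio-style callback around any image-to-text callable."""
    def captioner(image):
        result = model_fn(image)  # expected shape: [{'generated_text': ...}]
        return result[0]["generated_text"]
    return captioner

def fake_model(image):
    # Stand-in for the BLIP pipeline: ignores the image and returns a fixed caption.
    return [{"generated_text": "a placeholder caption"}]

captioner_under_test = make_captioner(fake_model)
print(captioner_under_test(None))
```

In the notebook, `make_captioner(get_completion)` would behave like the `captioner` above and could be passed as `fn=` to `gr.Interface`.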




In [None]:
gr.close_all()