I have a list of Dribbble shot URLs (e.g. https://dribbble.com/shots/21444048-Sweet-7). Give me a Python method that takes a shot URL as input and returns the images as well as the metadata. If an API exists (whether public or reverse-engineered), use it; otherwise, use whatever the most efficient approach is. Be Pythonic, highly efficient and elegant.

In [1]:
import requests
from bs4 import BeautifulSoup

def get_shot_data(url):
    # Send a request to the URL (with a timeout so a stalled request cannot hang)
    response = requests.get(url, timeout=10)

    # If the GET request is successful, the status code will be 200
    if response.status_code != 200:
        return None

    # Create a BeautifulSoup object from the raw bytes and specify the parser
    soup = BeautifulSoup(response.content, 'html.parser')

    # Get metadata; find() returns None for a missing tag, so guard each lookup
    title_tag = soup.find('title')
    title = title_tag.get_text() if title_tag else None
    description_tag = soup.find('meta', attrs={'name': 'description'})
    description = description_tag.get('content') if description_tag else None
    tags = [tag.get_text() for tag in soup.find_all('a', attrs={'class': 'dribbble-overview__link'})]

    # Get image URL (None when no matching <img> is present in the static HTML)
    image_tag = soup.find('img', attrs={'class': 'shot-media-shot'})
    image_url = image_tag.get('src') if image_tag else None

    return {
        'title': title,
        'description': description,
        'tags': tags,
        'image_url': image_url
    }
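Because the goal is to process a whole list of shot URLs, reusing one `requests.Session` avoids a fresh TCP/TLS handshake for every request and lets a retry policy apply to all of them. A minimal sketch; `make_session` is a hypothetical helper, the retry settings are arbitrary, and the browser-like `User-Agent` is an assumption (it may make the static response closer to what a browser receives, but that is not verified for Dribbble):

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def make_session(retries: int = 3, backoff: float = 0.5) -> requests.Session:
    """Build a reusable session (hypothetical helper) that retries transient failures."""
    session = requests.Session()
    # Assumption: an illustrative browser-like User-Agent, not a documented requirement
    session.headers["User-Agent"] = (
        "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko)"
    )
    retry = Retry(
        total=retries,
        backoff_factor=backoff,
        status_forcelist=(429, 500, 502, 503, 504),
    )
    adapter = HTTPAdapter(max_retries=retry)
    # Mount the adapter for both schemes so every request shares the retry policy
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    return session
```

Inside `get_shot_data`, `session.get(url, timeout=10)` would then replace `requests.get(url, timeout=10)`, with one session created once and shared across the list.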


In [2]:
get_shot_data("https://dribbble.com/shots/21448724-Bot-Detection-Website-Hero")

{'title': 'Bot Detection Website - Hero by Awsmd on Dribbble',
 'description': 'Bot Detection Website - Hero designed by Awsmd. Connect with them on Dribbble; the global community for designers and creative professionals.',
 'tags': [],
 'image_url': None}
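The empty `tags` and the `None` image mean those selectors matched nothing in the HTML that `requests` receives. One cheap fallback, assuming the shot page embeds Open Graph `<meta property="og:...">` tags (common for shareable pages, but not verified for Dribbble here), is to collect every `og:` property into a dict. `get_open_graph` is a hypothetical helper and the markup below is made up:

```python
from bs4 import BeautifulSoup


def get_open_graph(html: str) -> dict:
    """Collect <meta property="og:*" content="..."> pairs into a dict (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    return {
        # Strip the "og:" prefix so keys read as "title", "image", ...
        meta["property"][3:]: meta.get("content")
        for meta in soup.select('meta[property^="og:"]')
    }


# Made-up sample markup, for illustration only
sample = """
<head>
  <meta property="og:title" content="Sweet 7">
  <meta property="og:image" content="https://cdn.example.com/shot.png">
  <meta name="description" content="not an og tag">
</head>
"""
print(get_open_graph(sample))
```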

The "app" that contains the images is located at `//*[@id="ssr-app"]/div[1]`, and the div id changes on reload. On an example post, an actual image is located at `//*[@id="ssr-app"]/div[1]/div/div/a`, with the content:
```
skipped
```
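Since the inner ids are regenerated on reload, a selector can anchor only on the stable `id="ssr-app"` and keep the positional steps. The XPath `//*[@id="ssr-app"]/div[1]/div/div/a` translates to the CSS selector `#ssr-app > div:nth-of-type(1) > div > div > a` (XPath `div[1]` is the first `div` child). A sketch against made-up markup that mirrors that structure; whether the HTML served to `requests` actually contains these nodes is an assumption, and `find_shot_links` is a hypothetical helper:

```python
from bs4 import BeautifulSoup

# CSS equivalent of //*[@id="ssr-app"]/div[1]/div/div/a
SHOT_LINK_SELECTOR = "#ssr-app > div:nth-of-type(1) > div > div > a"


def find_shot_links(html: str) -> list[str]:
    """Return the href of every <a> matched by the positional selector."""
    soup = BeautifulSoup(html, "html.parser")
    return [a.get("href") for a in soup.select(SHOT_LINK_SELECTOR)]


# Made-up markup: the inner ids are random, only "ssr-app" is stable
sample = """
<div id="ssr-app">
  <div id="x9f2">
    <div><div><a href="https://cdn.example.com/a.png">a</a></div></div>
  </div>
  <div id="k3p1">
    <div><div><a href="/not-the-first-div">b</a></div></div>
  </div>
</div>
"""
print(find_shot_links(sample))
```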

In [3]:
import requests
from bs4 import BeautifulSoup

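def get_shot_data(url):
    # Send a request to the URL (with a timeout so a stalled request cannot hang)
    response = requests.get(url, timeout=10)

    # If the GET request is successful, the status code will be 200
    if response.status_code != 200:
        return None

    # Create a BeautifulSoup object from the raw bytes and specify the parser
    soup = BeautifulSoup(response.content, 'html.parser')

    # Get metadata; find() returns None for a missing tag, so guard each lookup
    title_tag = soup.find('title')
    title = title_tag.get_text() if title_tag else None
    description_tag = soup.find('meta', attrs={'name': 'description'})
    description = description_tag.get('content') if description_tag else None
    tags = [tag.get_text() for tag in soup.find_all('a', attrs={'class': 'dribbble-overview__link'})]

    # Get image URLs
    # NOTE: with Beautiful Soup 4.x (up to at least 4.12), an empty-string value also
    # matches <a> tags that lack the attribute entirely, so this collects every link
    # on the page; attrs={'data-photoswipe-image': True} would require the attribute.
    image_tags = soup.find_all('a', attrs={'data-photoswipe-image': ''})
    image_urls = [tag.get('href') for tag in image_tags]

    return {
        'title': title,
        'description': description,
        'tags': tags,
        'image_urls': image_urls
    }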


In [4]:
get_shot_data("https://dribbble.com/shots/21448724-Bot-Detection-Website-Hero")

{'title': 'Bot Detection Website - Hero by Awsmd on Dribbble',
 'description': 'Bot Detection Website - Hero designed by Awsmd. Connect with them on Dribbble; the global community for designers and creative professionals.',
 'tags': [],
 'image_urls': ['/',
  '/shots',
  '/shots/popular',
  '/shots/recent',
  '/shots?list=playoffs',
  '/stories',
  '/shots/popular/animation',
  '/shots/popular/branding',
  '/shots/popular/illustration',
  '/shots/popular/mobile',
  '/shots/popular/print',
  '/shots/popular/product-design',
  '/shots/popular/typography',
  '/shots/popular/web-design',
  '/jobs',
  '/jobs',
  '/freelance-jobs',
  '/projects',
  '/pro/pitch',
  '/learn',
  '/courses/product-design',
  '/courses/ui-design',
  '/courses/design-systems',
  '/courses/career-prep',
  '/learn',
  '/pro',
  '/hiring',
  '/designers',
  '/jobs/new',
  '/freelance-jobs',
  '/',
  '/search',
  '/session/new?return_to=%2Fshots%2F21448724-Bot-Detection-Website-Hero',
  '/signup/new',
  '/shots/popula
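
The `attrs={'data-photoswipe-image': ''}` filter is the likely cause of the navigation links above: in Beautiful Soup 4.x releases up to at least 4.12, an empty-string value also matches tags that do not carry the attribute at all. Requiring the attribute's presence, either with `True` in `find_all` or with a CSS attribute selector, restricts the result to the intended anchors. This only fixes the selector; if the anchors are injected by JavaScript, the static HTML may still yield an empty list. `get_photoswipe_links` is a hypothetical helper and the markup is made up:

```python
from bs4 import BeautifulSoup


def get_photoswipe_links(html: str) -> list[str]:
    """Return hrefs of <a> tags that carry data-photoswipe-image (any value)."""
    soup = BeautifulSoup(html, "html.parser")
    # Presence selector; attrs={'data-photoswipe-image': True} is the find_all equivalent
    return [a["href"] for a in soup.select("a[data-photoswipe-image][href]")]


# Made-up markup: two navigation links plus two image anchors
sample = """
<nav><a href="/shots">Shots</a><a href="/jobs">Jobs</a></nav>
<a data-photoswipe-image href="https://cdn.example.com/1.png"></a>
<a data-photoswipe-image="" href="https://cdn.example.com/2.png"></a>
"""
print(get_photoswipe_links(sample))
```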

It's not consistent. When using it on a shot page, I get:
```
{'title': 'Bot Detection Website - Hero by Awsmd on Dribbble',
 'description': 'Bot Detection Website - Hero designed by Awsmd. Connect with them on Dribbble; the global community for designers and creative professionals.',
 'tags': [],
 'image_urls': ['/',
  '/shots',
  '/shots/popular',
  '/shots/recent',
  '/shots?list=playoffs',
....
```
Other than using Selenium, is there another way to get the actual URL?
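One browser-free option worth trying first, assuming the image URLs appear somewhere in the raw HTML (for example inside an inline JSON blob), is to scan the response text for Dribbble's CDN upload URLs, which in the outputs further down have the form `https://cdn.dribbble.com/userupload/<id>/file/...`. The pattern is inferred from those outputs, not from any documented format, and `find_cdn_images` is a hypothetical helper. If the URLs are only produced by client-side JavaScript, or are JSON-escaped as `https:\/\/...`, this returns an empty list and a headless browser is still needed.

```python
import html
import re

# Matches cdn.dribbble.com/userupload/... up to whitespace, a quote, a bracket or a backslash.
# Pattern inferred from the URLs printed in this notebook, not from documentation.
CDN_URL = re.compile(r"https://cdn\.dribbble\.com/userupload/[^\s\"'<>\\)]+")


def find_cdn_images(page: str) -> list[str]:
    """Return unique CDN upload URLs found in raw page text (hypothetical helper)."""
    # html.unescape turns "&amp;" inside attribute values back into "&"
    urls = (html.unescape(m) for m in CDN_URL.findall(page))
    # dict.fromkeys de-duplicates while keeping first-seen order
    return list(dict.fromkeys(urls))
```

Usage would be along the lines of `find_cdn_images(requests.get(url, timeout=10).text)`.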



In [11]:
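!pip install pyppeteer
# note: pyppeteer 1.0.2 pins websockets<11.0, so this upgrade causes the resolver conflict reported below
!pip install websockets --upgrade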

Collecting websockets
  Using cached websockets-11.0.3-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (129 kB)
Installing collected packages: websockets
  Attempting uninstall: websockets
    Found existing installation: websockets 10.4
    Uninstalling websockets-10.4:
      Successfully uninstalled websockets-10.4
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
pyppeteer 1.0.2 requires websockets<11.0,>=10.0, but you have websockets 11.0.3 which is incompatible.
Successfully installed websockets-11.0.3

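The resolver error states that pyppeteer 1.0.2 needs `websockets>=10.0,<11.0`, so the `--upgrade` step breaks that pin; reinstalling a compatible version (`pip install "websockets>=10.0,<11.0"`) should restore it. A small stdlib-only check of the installed version, as a sketch: the helper names are hypothetical, and the plain numeric comparison ignores pre-release suffixes.

```python
from importlib.metadata import PackageNotFoundError, version


def _release(v: str) -> tuple[int, ...]:
    # "11.0.3" -> (11, 0, 3); non-numeric parts are dropped in this simple sketch
    return tuple(int(p) for p in v.split(".") if p.isdigit())


def websockets_ok_for_pyppeteer(v: str) -> bool:
    """True if v lies in pyppeteer 1.0.2's websockets>=10.0,<11.0 range."""
    return (10, 0) <= _release(v) < (11, 0)


try:
    installed = version("websockets")
    print(installed, "compatible" if websockets_ok_for_pyppeteer(installed) else "incompatible")
except PackageNotFoundError:
    print("websockets is not installed")
```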

In [12]:
# !sudo apt-get update
!sudo apt-get install -yq gconf-service libasound2 libatk1.0-0 libc6 libcairo2 libcups2 libdbus-1-3 libexpat1 libfontconfig1 libgcc1 libgconf-2-4 libgdk-pixbuf2.0-0 libglib2.0-0 libgtk-3-0 libnspr4 libpango-1.0-0 libpangocairo-1.0-0 libstdc++6 libx11-6 libx11-xcb1 libxcb1 libxcomposite1 libxcursor1 libxdamage1 libxext6 libxfixes3 libxi6 libxrandr2 libxrender1 libxss1 libxtst6 ca-certificates fonts-liberation libappindicator1 libnss3 lsb-release xdg-utils wget


Reading package lists...
Building dependency tree...
Reading state information...
fonts-liberation is already the newest version (1:1.07.4-11).
fonts-liberation set to manually installed.
libatk1.0-0 is already the newest version (2.35.1-1ubuntu2).
libatk1.0-0 set to manually installed.
libcairo2 is already the newest version (1.16.0-4ubuntu1).
libcairo2 set to manually installed.
libfontconfig1 is already the newest version (2.13.1-2ubuntu3).
libfontconfig1 set to manually installed.
libnspr4 is already the newest version (2:4.25-1).
libnspr4 set to manually installed.
libpango-1.0-0 is already the newest version (1.44.7-2ubuntu4).
libpango-1.0-0 set to manually installed.
libpangocairo-1.0-0 is already the newest version (1.44.7-2ubuntu4).
libpangocairo-1.0-0 set to manually installed.
libxcb1 is already the newest version (1.14-2).
libxcb1 set to manually installed.
libxcomposite1 is already the newest version (1:0.4.5-1).
libxcomposite1 set to manually installed.
libxcursor1 is alr

In [9]:
# running stand-alone
import asyncio
from pyppeteer import launch

async def get_image_urls(url):
    browser = await launch()
    try:
        page = await browser.newPage()
        await page.goto(url)

        # Select the image anchors and read their hrefs in a single in-page call
        # (passing a Python list of ElementHandles to page.evaluate does not work)
        image_urls = await page.querySelectorAllEval(
            'a[data-photoswipe-image]',
            'elements => elements.map(element => element.href)',
        )
    finally:
        # Always close the browser, even if navigation fails
        await browser.close()

    return image_urls

# Works in a plain script; inside Jupyter an event loop is already running,
# so run_until_complete raises the RuntimeError below (use `await` there instead)
image_urls = asyncio.get_event_loop().run_until_complete(get_image_urls('https://dribbble.com/shots/21444048-Sweet-7'))
print(image_urls)


RuntimeError: This event loop is already running
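The `RuntimeError` comes from calling `run_until_complete` while Jupyter's own event loop is already running. A sketch of a hypothetical `run_sync` helper that uses `asyncio.run` only when no loop is running, and otherwise fails early with a hint to `await` the coroutine instead:

```python
import asyncio
from typing import Any, Coroutine, TypeVar

T = TypeVar("T")


def loop_is_running() -> bool:
    """True when called from code already running inside an asyncio event loop."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        return False
    return True


def run_sync(coro: Coroutine[Any, Any, T]) -> T:
    """Run a coroutine from synchronous code (hypothetical helper)."""
    if loop_is_running():
        coro.close()  # avoid a "coroutine was never awaited" warning
        raise RuntimeError("An event loop is already running (e.g. Jupyter); use `await` instead.")
    return asyncio.run(coro)
```

In a script this would be `image_urls = run_sync(get_image_urls(url))`; in a notebook, `image_urls = await get_image_urls(url)`.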

In [15]:
# jupyter: working
from pyppeteer import launch

async def get_image_urls(url):
    browser = await launch()
    try:
        page = await browser.newPage()
        await page.goto(url)

        # Select the image elements and extract each href
        image_elements = await page.querySelectorAll('a[data-photoswipe-image]')
        image_urls = []
        for element in image_elements:
            # Use a separate name so the `url` parameter is not shadowed
            href = await page.evaluate('(element) => element.href', element)
            image_urls.append(href)
    finally:
        # Always close the browser, even if navigation fails
        await browser.close()
    return image_urls

# Jupyter already runs an event loop, so the coroutine can be awaited directly
image_urls = await get_image_urls('https://dribbble.com/shots/21444048-Sweet-7')
print(image_urls)

['https://cdn.dribbble.com/userupload/6956233/file/original-6ea950c8004e9d39cb757ce3f30e17b1.jpg?compress=1&resize=752x']
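The original request covers a whole list of shot URLs. Since each `get_image_urls` call spends most of its time waiting on the network, several calls can overlap; a semaphore caps how many run at once so that launching a headless browser per call does not exhaust memory. A sketch with a hypothetical `gather_limited` helper (the default limit of 3 is an arbitrary assumption):

```python
import asyncio
from typing import Awaitable, Callable, Iterable, TypeVar

T = TypeVar("T")
R = TypeVar("R")


async def gather_limited(
    func: Callable[[T], Awaitable[R]], items: Iterable[T], limit: int = 3
) -> list[R]:
    """Apply an async func to every item, at most `limit` at a time, keeping input order."""
    semaphore = asyncio.Semaphore(limit)

    async def run_one(item: T) -> R:
        async with semaphore:
            return await func(item)

    # gather returns results in the same order as the awaitables passed in
    return await asyncio.gather(*(run_one(item) for item in items))
```

In the notebook this would be `results = await gather_limited(get_image_urls, urls, limit=3)`. Sharing one browser and opening a page per URL would be lighter still, but needs `get_image_urls` restructured to accept a browser.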


In [17]:
url = input("url:")
image_urls = await get_image_urls(url)
print(image_urls)

url: https://dribbble.com/shots/21448800-Tabato-Calendar-app-Bookings


['https://cdn.dribbble.com/userupload/6968465/file/original-6ab2b1fe5280cbf36527d1bcd0ec0f9a.png?compress=1&resize=752x', 'https://cdn.dribbble.com/userupload/6968466/file/original-63ce67fcb0551eaa6e8cba233ab383c7.png?compress=1&resize=752x', 'https://cdn.dribbble.com/userupload/6968467/file/original-f63b71daa147c67281d72645a3d45b55.png?compress=1&resize=752x']
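Every returned URL carries `?compress=1&resize=752x`, which looks like a request for a resized rendition of a file whose name starts with `original-`. Assuming (not verified) that the query string only controls server-side resizing, dropping it should request the full upload. A sketch using `urllib.parse`; `original_image_url` is a hypothetical helper:

```python
from urllib.parse import urlsplit, urlunsplit


def original_image_url(url: str) -> str:
    """Drop the query string and fragment from a CDN image URL (hypothetical helper)."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


urls = [
    "https://cdn.dribbble.com/userupload/6956233/file/original-6ea950c8004e9d39cb757ce3f30e17b1.jpg?compress=1&resize=752x",
]
print([original_image_url(u) for u in urls])
```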
