# X-Raying LinkedIn and email discovery - InfoSec Jupyterthon 2024

## Introduction
In this notebook, we use Google dorking (advanced search operators) through the Google Custom Search JSON API to find employees associated with a given organization on LinkedIn and to discover email addresses on the organization's domain.

## Setup
First, we'll define the required imports, API keys and the target.

In [None]:
import requests
import json
import re

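# Google Custom Search JSON API
# https://developers.google.com/custom-search/v1/introduction
API_KEY = ""  # API key for the Custom Search JSON API
ID = ""       # Programmable Search Engine ID, sent as the "cx" parameter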

TARGET = "microsoft"
TARGET_DOMAIN = "microsoft.com"

## Function Definitions
Next, we'll define the functions used in our scraper:
- `execute_dorks`: Retrieves search results from the Google Custom Search API.

In [None]:
def execute_dorks(query, type):
    start = 1
    total_results = 0
    total_gathered = 0
    limit = False
    results = True
    info = []

    while results and start<100 and not limit:
        payload = {"key":API_KEY,"cx":ID,"start":start,"q":query}
        res = requests.get("https://www.googleapis.com/customsearch/v1",params=payload)
        data = json.loads(res.text)
        if "error" in data:
            print(data["error"]["status"])
            limit = True
        else:
            if start == 1:
                total_results = data["searchInformation"]["totalResults"]
            if "items" in data:
                for item in data["items"]:
                    try:
                        if type == "names":
                            l = item["link"].split("?")[0] if "?" in item["link"] else item["link"]
                            first_name = item["pagemap"]["metatags"][0]["profile:first_name"]
                            last_name = item["pagemap"]["metatags"][0]["profile:last_name"]
                            name = f"{first_name} {last_name}"
                            info.append((name,l))
                        elif type == "emails":
                            l = item["link"].split("?")[0] if "?" in item["link"] else item["link"]
                            # Escape the domain so "." matches a literal dot only.
                            regex = r"[%a-zA-Z.0-9_\-+]+@" + re.escape(TARGET_DOMAIN)
                            text = json.dumps(item)
                            # Strip the highlight tags Google wraps around matched terms.
                            for tag in ("<em>", "</em>", "<strong>", "</strong>", "<b>", "</b>"):
                                text = text.replace(tag, "")
                            emails = re.findall(regex, text)
                            info += emails
                        total_gathered = total_gathered + 1
                    except KeyError as e:
                        pass
                    except Exception as e:
                        print(f"Unexpected error: {str(e)}")
            else:
                results = False

        start = start + 10
    return (info,total_results,total_gathered,limit)
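
The paging logic in `execute_dorks` can be looked at in isolation. The following is a minimal sketch, not the notebook's implementation: `page_starts`, `collect_items` and `fake_fetch` are hypothetical names, and the fetch function is injected so the loop can be tried without an API key or network access.

```python
def page_starts(max_results=100, page_size=10):
    """Return the 1-based `start` offsets for each page: 1, 11, ..., 91."""
    return range(1, max_results + 1, page_size)


def collect_items(fetch_page, query):
    """Gather result items page by page.

    `fetch_page(query, start)` is a hypothetical callable that returns the
    decoded JSON response as a dict, so no network access is needed here.
    """
    items = []
    for start in page_starts():
        data = fetch_page(query, start)
        if "error" in data or "items" not in data:
            break  # API error (for example quota) or no more results
        items.extend(data["items"])
    return items


# Stand-in for the real API: serves 25 made-up results, 10 per page.
def fake_fetch(query, start):
    links = [f"https://example.com/{n}" for n in range(start, min(start + 10, 26))]
    return {"items": [{"link": link} for link in links]} if links else {}


results = collect_items(fake_fetch, "site:example.com")
print(len(results))
```

Passing the fetch function as an argument is only a convenience for experimenting; in the notebook the request is made inline with `requests.get`.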

- `linkedin_xray`: Searches Google for LinkedIn profiles that mention the organization name (an "X-ray" search), optionally filtered by role.

In [None]:
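def linkedin_xray(role=""):
    dork = f'site:linkedin.com -inurl:jobs -inurl:company -inurl:posts -inurl:pulse -inurl:learning "{TARGET}"'
    # Only append the role term when one is given, to avoid an empty "" in the query.
    if role:
        dork += f' "{role}"'
    info, total_results, total_gathered, limit = execute_dorks(dork, "names")
    for employee in info:
        # ANSI escape codes print the "*" marker in green.
        print("[\033[92m*\033[00m] " + employee[0] + ": " + employee[1])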
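
To make the name extraction easier to follow, here is a small offline sketch of the two steps involved: composing the dork and reading the `profile:first_name` / `profile:last_name` metatags from a result item. `build_dork` and `extract_name` are hypothetical helper names, and the sample item is made up to mirror the fields the scraper reads.

```python
def build_dork(target, role=""):
    """Compose the X-ray query; the role term is added only when given."""
    dork = (
        "site:linkedin.com -inurl:jobs -inurl:company -inurl:posts "
        f'-inurl:pulse -inurl:learning "{target}"'
    )
    return f'{dork} "{role}"' if role else dork


def extract_name(item):
    """Return (full name, clean URL), or None if the profile metatags are missing."""
    try:
        tags = item["pagemap"]["metatags"][0]
        name = f'{tags["profile:first_name"]} {tags["profile:last_name"]}'
    except (KeyError, IndexError):
        return None
    # Drop the query string (tracking parameters) from the profile URL.
    return name, item["link"].split("?")[0]


# Made-up item shaped like a search result for a public profile.
sample = {
    "link": "https://www.linkedin.com/in/jane-doe?trk=public",
    "pagemap": {"metatags": [{"profile:first_name": "Jane",
                              "profile:last_name": "Doe"}]},
}
print(build_dork("microsoft", "Engineer"))
print(extract_name(sample))
print(extract_name({"link": "https://www.linkedin.com/company/x"}))
```

Items without profile metatags (company pages, posts) return `None` here, which corresponds to the `KeyError` branch that the scraper silently skips.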

## Main Execution
Now we'll run `linkedin_xray` without a role to list the profiles that match the target.

In [None]:
linkedin_xray()

The Custom Search JSON API returns at most 100 results per query, so adding role keywords to the dork is a simple way to surface additional profiles.

In [None]:
roles = [
    "CTO",
    "Manager",
    "Engineer",
    "Developer",
    "Administrator"
]

for role in roles:
    linkedin_xray(role)
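
Role queries overlap, so the same profile can show up more than once. The sketch below shows one way to merge the batches by profile URL. It assumes a variant of `linkedin_xray` that returns its `(name, url)` list instead of printing it; `merge_unique` is a hypothetical helper and the sample data is made up.

```python
def merge_unique(batches):
    """Merge (name, url) batches, keeping the first entry seen for each URL."""
    seen = {}
    for batch in batches:
        for name, url in batch:
            # Treat URLs that differ only by case or a trailing slash as equal.
            key = url.rstrip("/").lower()
            seen.setdefault(key, (name, url))
    return list(seen.values())


# Made-up results for two role queries; Jane appears in both.
engineers = [("Jane Doe", "https://www.linkedin.com/in/jane-doe"),
             ("John Roe", "https://www.linkedin.com/in/john-roe")]
managers = [("Jane Doe", "https://www.linkedin.com/in/jane-doe/")]

employees = merge_unique([engineers, managers])
print(len(employees))
```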

- `find_emails`: Runs a set of email-focused dorks and prints every address found on the target domain.

In [None]:
def find_emails():
    dorks = [f'{TARGET_DOMAIN} "e-mail"',
             f'{TARGET_DOMAIN} *@{TARGET_DOMAIN}',
             f'intext:"@{TARGET_DOMAIN}" (site:linkedin.com OR site:github.com)',
             f'intext:"@{TARGET_DOMAIN}" (site:twitter.com OR site:facebook.com OR site:instagram.com)',
             f'intext:"@{TARGET_DOMAIN}" (filetype:log OR filetype:sql OR site:pastebin.com)',
             f'intext:"@{TARGET_DOMAIN}" (filetype:pdf OR filetype:doc OR filetype:docx)',
             f'intext:"@{TARGET_DOMAIN}" (inurl:email OR inurl:contacts OR inurl:about)']
    for dork in dorks:
        (info, total, gathered, limit) = execute_dorks(dork, "emails")
        for email in info:
            print(email)

Now we'll execute the function to retrieve emails with these dorks.

In [None]:
find_emails()
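
Several dorks tend to return the same address, sometimes with different casing or URL-encoded. As a rough sketch of how the matches could be normalized and de-duplicated, the hypothetical helper `extract_emails` below decodes percent-escapes, strips highlight tags, and lowercases the results. The snippet text and addresses are placeholders.

```python
import re
from urllib.parse import unquote


def extract_emails(text, domain):
    """Return the sorted, de-duplicated addresses at `domain` found in `text`."""
    # Decode percent-escapes first so "%40" becomes "@".
    decoded = unquote(text)
    # Drop the highlight tags that search snippets wrap around matched terms.
    decoded = re.sub(r"</?(?:em|strong|b)>", "", decoded)
    # re.escape keeps the "." in the domain from matching any character.
    pattern = r"[A-Za-z0-9._+\-]+@" + re.escape(domain)
    return sorted({match.lower() for match in re.findall(pattern, decoded)})


# Made-up snippet text; the addresses are placeholders.
snippet = ("Contact <b>Jane.Doe@example.com</b> or jane.doe@example.com, "
           "see mailto:support%40example.com, not admin@exampleXcom")
print(extract_emails(snippet, "example.com"))
```

Lowercasing is a pragmatic choice for de-duplication; it assumes the target's mail system treats addresses case-insensitively, which is common but not guaranteed.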

## Conclusions
Dorks are powerful not only for discovering names and emails but also for finding other types of entities, including vulnerable websites. The key is to identify the search operators and keywords that narrow the results down to exactly what you are looking for.