# Telegram Analysis - Getting the Data

This notebook demonstrates how to query Telegram for different types of data:
* groups you are a member of
* participants of these groups
* messages sent in these groups, including action messages (like members joining or leaving)

As always, we start by importing libraries. We also load the credentials required to log in to your
Telegram account from a config file.

In [None]:
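from pathlib import Path
import configparser

# The cells below use Telethon's async API directly (Jupyter supports
# top-level `async with` / `async for`), so the telethon.sync shim is not needed
from telethon import TelegramClient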

In [None]:
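# Reading configs: an INI-style file with a [Telegram] section
env_path = Path.cwd().parent / ".env"
config = configparser.ConfigParser()
if not config.read(env_path):  # read() silently skips missing files
    raise FileNotFoundError(f"Could not read config file {env_path}")

# Setting configuration values (configparser already returns strings)
api_id = config.getint('Telegram', 'api_id')
api_hash = config['Telegram']['api_hash']

phone = config['Telegram']['phone']
# Telethon treats the first TelegramClient argument as the session name,
# so this value names the local .session file
username = config['Telegram']['username']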
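The config cell reads `.env` with `configparser`, so the file must be INI-style with a `[Telegram]` section header; a plain dotenv file of bare `KEY=value` lines makes `configparser` raise `MissingSectionHeaderError`. A minimal sketch of the expected layout follows. The key names come from the cell above, but the values are made-up placeholders (real `api_id`/`api_hash` values come from https://my.telegram.org), and the `sample_` variable names are hypothetical so the notebook's real credentials are not overwritten if you run it.

```python
import configparser

# Hypothetical placeholder credentials in the layout the notebook expects
SAMPLE_ENV = """\
[Telegram]
api_id = 123456
api_hash = 0123456789abcdef0123456789abcdef
phone = +10000000000
username = my_session
"""

sample_config = configparser.ConfigParser()
sample_config.read_string(SAMPLE_ENV)

# getint() parses the value as an integer; other values stay strings
sample_api_id = sample_config.getint('Telegram', 'api_id')
sample_api_hash = sample_config['Telegram']['api_hash']

# A dotenv-style file without a section header is rejected by configparser
try:
    configparser.ConfigParser().read_string("API_ID=123456\n")
    missing_header_rejected = False
except configparser.MissingSectionHeaderError:
    missing_header_rejected = True

print(sample_api_id, missing_header_rejected)  # 123456 True
```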

We query Telegram for all of your dialogs, i.e. every conversation in your chat list. This includes
group chats and channels as well as private conversations.

The `.to_dict()` method converts Telethon objects to dictionaries. This is not the cleanest approach
computationally, but dictionaries are easier to explore when you don't yet know much about the data structure.

In [None]:
dialogs = []
async with TelegramClient(username, api_id, api_hash) as client:
    async for dialog in client.iter_dialogs():
        dialogs.append(dialog.to_dict())

dialogs
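`to_dict()` output can be nested (a service message, for example, holds its `action` as a further dict), so a raw printout quickly becomes hard to scan. A minimal sketch of a helper that lists every key path together with the type of its value; `key_paths` is a hypothetical helper, and the sample dict is made-up illustration data in the shape of a `to_dict()` result, not real Telethon output.

```python
from typing import Any, Iterator, Tuple

def key_paths(obj: Any, prefix: str = "") -> Iterator[Tuple[str, str]]:
    """Yield (path, type name) for every leaf of nested dicts/lists."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            yield from key_paths(value, f"{prefix}.{key}" if prefix else str(key))
    elif isinstance(obj, list) and obj:
        # Assume list elements share one structure and inspect only the first
        yield from key_paths(obj[0], f"{prefix}[0]")
    else:
        # Scalars, empty lists and non-dict objects are reported as leaves
        yield prefix, type(obj).__name__

# Hypothetical nested dict in the shape of a to_dict() result
sample_message_dict = {
    "_": "MessageService",
    "id": 42,
    "date": None,
    "action": {"_": "MessageActionChatAddUser", "users": [1001, 1002]},
}

for path, type_name in key_paths(sample_message_dict):
    print(path, type_name)
```

In the notebook, `for path, t in key_paths(dialogs[0]): print(path, t)` gives a quick overview of one dialog.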

Next we keep only the dialogs we are interested in. Here I filter for dialogs whose name contains a given substring.

In [None]:
name_filter = 'red'  # substring to look for, matched case-insensitively
dialog_names_of_interest = [
    d['name'] for d in dialogs if name_filter in (d.get('name') or '').lower()
]
dialog_names_of_interest
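Substring matching is loose: searching for "red" also matches names such as "Shredders", so it is worth checking the resulting list by eye. A minimal sketch of the same filter as a reusable function that takes the substring as a parameter, uses `casefold()` for case-insensitive comparison, and skips dialogs without a usable name; `names_matching` and the example dialogs are hypothetical.

```python
from typing import Dict, List, Optional

def names_matching(dialogs: List[Dict], substring: str) -> List[str]:
    """Return the names of dialogs containing `substring`, ignoring case."""
    needle = substring.casefold()
    matches = []
    for d in dialogs:
        name: Optional[str] = d.get('name')
        # Skip missing or empty names instead of failing on None
        if name and needle in name.casefold():
            matches.append(name)
    return matches

# Hypothetical dialog dicts; only the 'name' key is used
example_dialogs = [{'name': 'Red Team'}, {'name': 'Family'}, {'name': None}, {'name': 'Shredders'}]
print(names_matching(example_dialogs, 'red'))  # ['Red Team', 'Shredders']
```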

Next we collect all messages from every dialog we are interested in. You can refer to a dialog by either
its name or its id.

We convert the Telethon objects to dictionaries and collect all of the data into a single dictionary,
keyed by dialog name, for ease of processing.

In [None]:
messages_of_interest = {}
# Open one connection and reuse it for every dialog
async with TelegramClient(username, api_id, api_hash) as client:
    for dialog in dialog_names_of_interest:
        ms = []
        # iter_messages yields messages newest first by default
        async for message in client.iter_messages(dialog):
            ms.append(message.to_dict())
        messages_of_interest[dialog] = ms
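For tabular analysis it can help to flatten each message dict into one row. A minimal sketch, assuming the key names of Telethon's `Message.to_dict()` output (`id`, `date`, `from_id` holding a peer dict with `user_id`, `message` for the text, `action` for service messages); verify them against your own data first, e.g. with `messages_of_interest[name][0].keys()`. `to_rows`, `sender_id` and the example messages are hypothetical.

```python
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional

def sender_id(message: Dict[str, Any]) -> Optional[int]:
    """Return the user id from an assumed 'from_id' peer dict, if any."""
    peer = message.get('from_id') or {}
    return peer.get('user_id')

def to_rows(messages: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Flatten message dicts into one flat row per message."""
    rows = []
    for m in messages:
        action = m.get('action') or {}
        rows.append({
            'id': m.get('id'),
            'date': m.get('date'),
            'sender_id': sender_id(m),
            'text': m.get('message'),        # absent for service messages
            'action_type': action.get('_'),  # None for regular messages
        })
    return rows

# Hypothetical messages in the assumed to_dict() shape
example_messages = [
    {'_': 'Message', 'id': 2, 'date': datetime(2023, 5, 2, tzinfo=timezone.utc),
     'from_id': {'_': 'PeerUser', 'user_id': 1001}, 'message': 'hello'},
    {'_': 'MessageService', 'id': 1, 'date': datetime(2023, 5, 1, tzinfo=timezone.utc),
     'from_id': {'_': 'PeerUser', 'user_id': 1001},
     'action': {'_': 'MessageActionChatJoinedByLink', 'inviter_id': 1000}},
]
example_rows = to_rows(example_messages)
```

If pandas is installed, `pandas.DataFrame(to_rows(messages_of_interest[name]))` turns the rows into a table.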

We do the same for participant information.

In [None]:
participants = {}
# Open one connection and reuse it for every dialog
async with TelegramClient(username, api_id, api_hash) as client:
    for dialog in dialog_names_of_interest:
        ps = []
        async for participant in client.iter_participants(dialog):
            ps.append(participant.to_dict())
        participants[dialog] = ps
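The join analysis at the end of this notebook reports user ids only, so a lookup from id to a readable name helps interpret it. A minimal sketch, assuming each participant dict carries the `id`, `first_name`, `last_name` and `username` keys of a Telegram `User`; `display_names` and the example participants are hypothetical.

```python
from typing import Any, Dict, List

def display_names(participants: List[Dict[str, Any]]) -> Dict[int, str]:
    """Map user id -> first/last name, falling back to username, then id."""
    names = {}
    for p in participants:
        full = " ".join(part for part in (p.get('first_name'), p.get('last_name')) if part)
        names[p['id']] = full or p.get('username') or str(p['id'])
    return names

# Hypothetical participant dicts
example_participants = [
    {'_': 'User', 'id': 1001, 'first_name': 'Ada', 'last_name': 'Lovelace', 'username': None},
    {'_': 'User', 'id': 1002, 'first_name': None, 'last_name': None, 'username': 'grace'},
    {'_': 'User', 'id': 1003, 'first_name': None, 'last_name': None, 'username': None},
]
print(display_names(example_participants))
# {1001: 'Ada Lovelace', 1002: 'grace', 1003: '1003'}
```

Users who have since left a group are not among its current participants, so look ids up with `names.get(uid, str(uid))` rather than `names[uid]`.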

One aspect I am particularly interested in is the joining history: when did which users join?
Service messages carry an "action" field that describes events such as members joining or leaving.
Regular messages have no action, so `m.get('action')` returns `None` for them.

In [None]:
# I am interested in the "joined" events. Joins are service messages, so start by
# listing every message with a defined action and inspecting the action types.
group = dialog_names_of_interest[1]  # the second matching dialog; adjust the index as needed
print(f"Analyzing joining behaviour for group {group}")
for i, m in enumerate(messages_of_interest[group]):
    action = m.get('action')
    if action is not None:
        # action['_'] names the event type; only some types carry a 'users' list
        print(i, m.get('date'), action.get('_'), action.get('users'))
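Once the action types in a group are known, the join events can be pulled out and counted per day to get the joining history. A minimal sketch under stated assumptions: it treats `MessageActionChatAddUser` (joined user ids in `action['users']`) and `MessageActionChatJoinedByLink` / `MessageActionChatJoinedByRequest` (joined user is the message sender, read from an assumed `from_id` peer dict) as joins. Check these names against the action types printed above. `join_events`, `joins_per_day` and the example messages are hypothetical.

```python
from collections import Counter
from datetime import datetime, timezone
from typing import Any, Dict, List, Tuple

# Assumed action type names for "someone joined" events
JOIN_IDS_IN_ACTION = {'MessageActionChatAddUser'}
JOIN_BY_SENDER = {'MessageActionChatJoinedByLink', 'MessageActionChatJoinedByRequest'}

def join_events(messages: List[Dict[str, Any]]) -> List[Tuple[datetime, int]]:
    """Return chronologically sorted (date, user_id) pairs for every join."""
    events = []
    for m in messages:
        action = m.get('action') or {}
        kind = action.get('_')
        if kind in JOIN_IDS_IN_ACTION:
            events.extend((m['date'], uid) for uid in action.get('users', []))
        elif kind in JOIN_BY_SENDER:
            sender = (m.get('from_id') or {}).get('user_id')
            if sender is not None:
                events.append((m['date'], sender))
    # iter_messages yields newest first, so sort oldest first
    return sorted(events)

def joins_per_day(events: List[Tuple[datetime, int]]) -> Counter:
    """Count join events per calendar day."""
    return Counter(date.date() for date, _ in events)

# Hypothetical messages, newest first as iter_messages returns them
example_join_messages = [
    {'date': datetime(2023, 5, 2, 9, tzinfo=timezone.utc), 'message': 'hi'},
    {'date': datetime(2023, 5, 2, 8, tzinfo=timezone.utc),
     'action': {'_': 'MessageActionChatAddUser', 'users': [1002, 1003]}},
    {'date': datetime(2023, 5, 1, 13, tzinfo=timezone.utc),
     'action': {'_': 'MessageActionChatDeleteUser', 'user_id': 1004}},
    {'date': datetime(2023, 5, 1, 12, tzinfo=timezone.utc),
     'from_id': {'_': 'PeerUser', 'user_id': 1001},
     'action': {'_': 'MessageActionChatJoinedByLink', 'inviter_id': 1000}},
]
events = join_events(example_join_messages)
print(joins_per_day(events))
```

In the notebook, `joins_per_day(join_events(messages_of_interest[group]))` gives per-day join counts ready for plotting. Leave events (`MessageActionChatDeleteUser` above) are deliberately ignored.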