# Basics of Using Python on the WRDS Platform

This notebook walks you through the essentials of connecting to WRDS and querying data using Python. It covers installation, listing databases and tables, querying with SQL, joining datasets, and exporting results.

## 0. Setup

Install the `wrds` package before running any code below. See [pypi.org/project/wrds](https://pypi.org/project/wrds/) for details.

If the cell below does not work, run `pip install wrds` manually in your terminal (Mac) or Anaconda prompt (Windows).

In [None]:
# %pip installs into the environment of the running notebook kernel
%pip install wrds

## 1. Import the WRDS Package

In [None]:
import wrds

## 2. Connect to the WRDS Server

On first use, you will be prompted for your WRDS username and password. The package then offers to create a `.pgpass` file that stores your credentials locally, so later connections do not prompt you again.

In [None]:
conn = wrds.Connection()

## 3. List Available Libraries

A **library** in WRDS corresponds to a database (e.g., `comp` for Compustat, `ff` for Fama-French). Use `list_libraries()` to see all databases your institution subscribes to.

> **Note:** Utrecht University does not have a subscription to every WRDS database (e.g., we do not have CRSP). Queries against databases we are not subscribed to will not work. See the [WRDS overview](README.md) for the full list of databases available to us.

In [None]:
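# list_libraries() returns a plain Python list; sorted() gives an alphabetized copy.
# (list.sort() sorts in place and returns None, so it would display nothing.)
libraries = sorted(conn.list_libraries())
print(type(libraries))
libraries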
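
Because queries against unsubscribed libraries fail, it can help to check the list before querying. The following is a minimal sketch: `has_library` is a hypothetical helper (it is not part of the `wrds` package), and `example_libraries` is a stand-in for the output of `conn.list_libraries()`.

```python
def has_library(libraries, name):
    """Return True if `name` appears in `libraries` (case-insensitive)."""
    wanted = name.strip().lower()
    return any(lib.lower() == wanted for lib in libraries)


# Stand-in for conn.list_libraries(); the real list depends on your subscription.
example_libraries = ["comp", "ff"]

print(has_library(example_libraries, "comp"))  # True
print(has_library(example_libraries, "crsp"))  # False
```

With a live connection you would pass `conn.list_libraries()` as the first argument instead of the stand-in list.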

## 4. List Tables within a Library

Each library contains one or more **tables** (datasets). Use `list_tables()` to see what is available within a given library.

In [None]:
conn.list_tables(library='comp')

## 5. Query Data with `get_table()`

`get_table()` is the simplest way to pull data from a single table. You can limit rows with `obs` and select specific columns with `columns`.

In [None]:
# Extract the first 5 observations from comp.company

company = conn.get_table(library='comp', table='company', obs=5)
print(company.shape)  # print explicitly: a cell displays only its last expression

company

In [None]:
# Restrict the extract to specific columns

company_narrow = conn.get_table(library='comp', table='company',
                                columns=['conm', 'gvkey', 'cik'], obs=5)
print(company_narrow.shape)  # print explicitly: a cell displays only its last expression

company_narrow

## 6. Query Data with `raw_sql()`

For more control — filtering rows, specifying date ranges, or joining tables — use `raw_sql()` with standard SQL syntax. The `date_cols` parameter automatically parses date columns into `datetime` format.

In [None]:
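# Select one firm's annual fundamentals (total assets, sales, net income)
# for fiscal periods ending in 2019 or later.
# gvkey is a zero-padded 6-character string; '001690' is Apple Inc.

apple = conn.raw_sql("""select gvkey, datadate, fyear, at, sale, ni
                        from comp.funda
                        where gvkey = '001690'
                        and datadate >= '2019-01-01'""",
                     date_cols=['datadate'])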

apple 
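
If the firm or start date changes between runs, the values can be passed separately from the SQL text through the `params` argument of `raw_sql()`. The sketch below assumes the `%(name)s` placeholder style used in the package's docstring examples, which is worth verifying against your installed version. `FUNDA_SQL` and `fetch_fundamentals` are hypothetical names introduced here for illustration.

```python
# Query text with named placeholders instead of hard-coded values.
FUNDA_SQL = """
    select gvkey, datadate, fyear, at, sale, ni
    from comp.funda
    where gvkey = %(gvkey)s
      and datadate >= %(start)s
"""


def fetch_fundamentals(conn, gvkey, start):
    """Run FUNDA_SQL on an open WRDS connection with bound parameters."""
    return conn.raw_sql(
        FUNDA_SQL,
        params={"gvkey": gvkey, "start": start},
        date_cols=["datadate"],
    )
```

A call would then look like `fetch_fundamentals(conn, gvkey, '2019-01-01')`, where `gvkey` holds the identifier of the firm you want.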

In [None]:
apple.dtypes
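
To see why a proper date column matters, the sketch below converts a string column by hand, which is roughly what `date_cols` arranges at query time. The frame `toy` and all its values are made up for illustration and do not come from WRDS.

```python
import pandas as pd

# Toy frame with made-up values; the dates start out as plain strings.
toy = pd.DataFrame({
    "gvkey": ["000001", "000001"],
    "datadate": ["2019-12-31", "2020-12-31"],
    "at": [100.0, 110.0],
})

# Convert the string column to a datetime column.
toy["datadate"] = pd.to_datetime(toy["datadate"])

# The .dt accessor is only available on datetime columns.
print(toy["datadate"].dt.year.tolist())  # [2019, 2020]
```

Once the column is a datetime, date comparisons, resampling, and the `.dt` accessor work without further parsing.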

## 7. Join Multiple Tables

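You can join tables directly in your SQL query, just as you would in SAS `proc sql`. The example below joins Compustat annual fundamentals (`comp.funda`) to the monthly security file (`comp.secm`) on `gvkey`, `iid`, and `datadate`, so each annual record picks up the price and shares outstanding reported for the same date.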

In [None]:
apple_fund = conn.raw_sql("""select a.gvkey, a.iid, a.datadate, a.tic, a.conm,
                            a.at, b.prccm, b.cshoq 
                            
                            from comp.funda a 
                            inner join comp.secm b 
                            
                            on a.gvkey = b.gvkey
                            and a.iid = b.iid
                            and a.datadate = b.datadate
                        
                            where a.tic = 'AAPL' 
                            and a.datadate >= '2010-01-01'
                            and a.datafmt = 'STD' 
                            and a.consol = 'C' 
                            and a.indfmt = 'INDL'
                            """, date_cols=['datadate'])

print(apple_fund.shape)  # (rows, columns); print explicitly so both outputs appear
apple_fund
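
An inner join keeps only the rows whose keys match in both tables. The sketch below reproduces that behavior locally, using Python's built-in `sqlite3` module in place of the WRDS server. The table names mirror the query above, but the tables are cut down to a few columns and every value is made up.

```python
import sqlite3

db = sqlite3.connect(":memory:")  # local scratch database, not WRDS
db.executescript("""
    create table funda (gvkey text, iid text, datadate text, at real);
    create table secm  (gvkey text, iid text, datadate text, prccm real);

    -- made-up rows: only the 2020-12-31 record exists in both tables
    insert into funda values ('000001', '01', '2020-12-31', 100.0);
    insert into funda values ('000001', '01', '2021-12-31', 120.0);
    insert into secm  values ('000001', '01', '2020-12-31', 10.0);
    insert into secm  values ('000001', '01', '2021-11-30', 12.0);
""")

rows = db.execute("""
    select a.gvkey, a.datadate, a.at, b.prccm
    from funda a
    inner join secm b
      on a.gvkey = b.gvkey
     and a.iid = b.iid
     and a.datadate = b.datadate
""").fetchall()
db.close()

print(rows)  # [('000001', '2020-12-31', 100.0, 10.0)]
```

The 2021 annual record drops out because no monthly row carries the same `datadate`. If a join against WRDS returns fewer rows than expected, unmatched keys are a likely place to look first.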

## 8. Saving Output

Pandas DataFrames can be exported to many formats. Replace `/your local directory/` with your actual file path.

In [None]:
import pandas as pd

In [None]:
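# export the DataFrame to CSV format (index=False omits the row-number column)

apple_fund.to_csv('/your local directory/apple_fund.csv', index=False)

# export the DataFrame to xlsx format (requires an Excel writer such as openpyxl)

apple_fund.to_excel('/your local directory/apple_fund.xlsx', index=False)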

In [None]:
# pickle the dataframe

apple_fund.to_pickle("/your local directory/apple_fund.pkl")

In [None]:
# export the DataFrame to dta format for Stata

apple_fund.to_stata('/your local directory/apple_fund.dta')
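
The formats differ in how much type information survives a round trip. The sketch below writes a toy frame to a temporary directory and reads it back. All values are made up, and `toy`, `from_csv`, and `from_pkl` are illustrative names only.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Toy frame with made-up values standing in for a query result.
toy = pd.DataFrame({
    "gvkey": ["000001", "000001"],
    "datadate": pd.to_datetime(["2019-12-31", "2020-12-31"]),
    "at": [100.0, 110.0],
})

with tempfile.TemporaryDirectory() as tmp:
    out_dir = Path(tmp)

    # CSV is plain text: request the dtypes you need when reading back,
    # otherwise '000001' would be parsed as the integer 1.
    toy.to_csv(out_dir / "toy.csv", index=False)
    from_csv = pd.read_csv(
        out_dir / "toy.csv", dtype={"gvkey": str}, parse_dates=["datadate"]
    )

    # Pickle stores the DataFrame as-is, dtypes included.
    toy.to_pickle(out_dir / "toy.pkl")
    from_pkl = pd.read_pickle(out_dir / "toy.pkl")

print(from_csv["gvkey"].tolist())  # ['000001', '000001']
print(from_pkl.equals(toy))        # True
```

Pickle is convenient for picking work up again in Python, while CSV is the safer choice for sharing with other tools, provided identifier columns such as `gvkey` are read back as strings.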