# Loading example data sets from the pyrsm package

In [1]:
import pandas as pd
import pyrsm as rsm

The pyrsm package contains all example datasets included with the family of Radiant R packages. If an example dataset is mentioned in any of the help files listed under the `data`, `design`, `basics`, `model`, or `multivariate` dropdown menus at https://radiant-rstats.github.io/docs/, you can load that same dataset as a pandas DataFrame using commands like those shown below.

For example, the command below loads the `catalog` dataset used in example 1 (https://radiant-rstats.github.io/docs/model/regress.html) together with its data description.

In [2]:
catalog, catalog_description = rsm.load_data(pkg="model", name="catalog")
catalog

Unnamed: 0,id,Sales,Income,HH_size,Age
0,1,178.63,93.0,4,55.0
1,2,338.59,79.0,5,35.0
2,3,210.26,70.0,4,64.0
3,4,378.64,95.0,2,39.0
4,5,227.09,119.0,2,43.0
...,...,...,...,...,...
195,196,234.85,42.0,2,38.0
196,197,138.20,94.0,2,58.0
197,198,340.74,88.0,2,35.0
198,199,496.48,110.0,5,37.0


To view a description of the `catalog` dataset use:

In [3]:
rsm.md(catalog_description)

## Catalog sales

### Description

Data from a company selling men's and women's apparel through mail-order catalogs. The company maintains a database on past and current customers' value and characteristics. Value is determined as the total $ sales to the customer in the last year. The data are a random sample of 200 customers from the company's database.

### Variables

A data frame with 200 observations on 4 variables

- id = Customer id
- Sales = Total sales (in $) to a household in the past year
- Income = Household income ($1000)
- HH.size = Size of the household (# of people)
- Age = Age of the head of the household

If you know a dataset's name but not which menu it comes from, omit the `pkg` argument as in the code below. The `load_data` function will then search through all available datasets to find a match. If no match is found, an empty dictionary will be returned.

In [4]:
rndnames, rndnames_description = rsm.load_data(name="rndnames")
rndnames

Unnamed: 0,Names,Gender
0,Ervin Escalona,Male
1,Allan Ammerman,Male
2,Milton Mothershed,Male
3,Deshawn Dawn,Male
4,Jc Julius,Male
...,...,...
95,Marylee Malatesta,Female
96,Janna Jacob,Female
97,Alita Aikin,Female
98,Junko Jungers,Female


Finally, if you would like to load all example datasets available in the Radiant R packages, call the `load_data` function without either the `pkg` or `name` arguments. To add them all to the global environment, we will use the `dct` argument as follows:

In [5]:
rsm.load_data(dct=globals())
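
Because `globals()` is an ordinary Python dictionary, the `dct` argument presumably works by adding one key per dataset to whatever dictionary it receives. The sketch below mimics that mechanism with a hypothetical stand-in, `fake_load_data`, which is not part of pyrsm and uses made-up toy data. It only illustrates why passing your own dictionary instead of `globals()` would keep the datasets out of the global namespace.

```python
import pandas as pd


def fake_load_data(dct):
    """Hypothetical stand-in for a loader that fills a dictionary.

    Not pyrsm's implementation: it adds one toy DataFrame and one
    description string, following the `<name>_description` naming
    pattern used for the descriptions in this notebook.
    """
    dct["toy"] = pd.DataFrame({"x": [1, 2, 3]})
    dct["toy_description"] = "## Toy data\n\nThree made-up values."


# Collect the results in a separate dictionary instead of globals()
datasets = {}
fake_load_data(datasets)

print(sorted(datasets))       # names that were added to the dictionary
print(datasets["toy"].shape)  # access a dataset by name
```

With pyrsm itself, the analogous call would be `rsm.load_data(dct=datasets)`. This is assumed to behave the same way as with `globals()`, but it is not demonstrated in this notebook.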

In [6]:
diamonds

Unnamed: 0,price,carat,clarity,cut,color,depth,table,x,y,z,date
0,580,0.32,VS1,Ideal,H,61.0,56.0,4.43,4.45,2.71,2012-02-26
1,650,0.34,SI1,Very Good,G,63.4,57.0,4.45,4.42,2.81,2012-02-26
2,630,0.30,VS2,Very Good,G,63.1,58.0,4.27,4.23,2.68,2012-02-26
3,706,0.35,VVS2,Ideal,H,59.2,56.0,4.60,4.65,2.74,2012-02-26
4,1080,0.40,VS2,Premium,F,62.6,58.0,4.72,4.68,2.94,2012-02-26
...,...,...,...,...,...,...,...,...,...,...,...
2995,4173,1.14,SI1,Very Good,J,63.3,55.0,6.60,6.67,4.20,2015-12-01
2996,8396,1.51,SI1,Ideal,I,61.2,60.0,7.39,7.37,4.52,2015-12-01
2997,449,0.32,VS2,Premium,I,62.6,58.0,4.37,4.42,2.75,2015-12-01
2998,4370,0.91,VS1,Very Good,H,62.1,59.0,6.17,6.20,3.84,2015-12-01


In [7]:
rsm.md(diamonds_description)

## Diamond prices

Prices of 3,000 round cut diamonds

### Description

A dataset containing the prices and other attributes of a sample of 3000 diamonds. The variables are as follows:

### Variables

- price = price in US dollars ($338--$18,791)
- carat = weight of the diamond (0.2--3.00)
- clarity = a measurement of how clear the diamond is (I1 (worst), SI2, SI1, VS2, VS1, VVS2, VVS1, IF (best))
- cut = quality of the cut (Fair, Good, Very Good, Premium, Ideal)
- color = diamond color, from J (worst) to D (best)
- depth = total depth percentage = z / mean(x, y) = 2 * z / (x + y) (54.2--70.80)
- table = width of top of diamond relative to widest point (50--69)
- x = length in mm (3.73--9.42)
- y = width in mm (3.71--9.29)
- z = depth in mm (2.33--5.58)
- date = shipment date

### Additional information

[Diamond search engine](http://www.diamondse.info/diamonds-clarity.asp)


List all pandas DataFrames in the global Python environment:

In [8]:
[key for key, value in globals().items() if not key.startswith("_") and isinstance(value, pd.DataFrame)]

['catalog',
 'rndnames',
 'consider',
 'salary',
 'demand_uk',
 'newspaper',
 'computer',
 'carpet',
 'city',
 'tpbrands',
 'retailers',
 'movie',
 'mp3',
 'city2',
 'shopping',
 'toothpaste',
 'ketchup',
 'ratings',
 'titanic',
 'diamonds',
 'ideal',
 'houseprices',
 'direct_marketing',
 'fraud_data',
 'dvd',
 'avengers',
 'superheroes',
 'publishers']
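
The list comprehension above can be wrapped in a small helper that also reports the shape of each DataFrame. The function name `list_dataframes` and the toy namespace below are illustrative assumptions, not part of pyrsm. In a notebook you would pass `globals()` instead of the toy dictionary.

```python
import pandas as pd


def list_dataframes(namespace):
    """Return {name: (rows, columns)} for every public DataFrame in a dict."""
    return {
        key: value.shape
        for key, value in namespace.items()
        if not key.startswith("_") and isinstance(value, pd.DataFrame)
    }


# Toy namespace standing in for globals(); the data are made up
namespace = {
    "sales": pd.DataFrame({"id": [1, 2], "amount": [10.0, 20.0]}),
    "_hidden": pd.DataFrame({"a": [1]}),  # skipped: name starts with "_"
    "note": "not a DataFrame",            # skipped: wrong type
}

print(list_dataframes(namespace))
```

Seeing the shapes next to the names makes it easier to spot a dataset that loaded with an unexpected number of rows or columns.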