# Creating Random Samples

To create a random sample from a Pandas DataFrame, we use the **`sample()`** method.

It lets us control the sample size (`n` or `frac`), whether rows can repeat (`replace`), reproducibility (`random_state`), and the selection probability of each row (`weights`), which makes it useful for building random subsets of data for analysis or testing.

In [2]:
# importing pandas
import pandas as pd

# csv file location
url = 'https://dq-content.s3.amazonaws.com/291/f500.csv'

# load the CSV into a DataFrame, using the company name as the index
data = pd.read_csv(url, index_col='company')

**Key parameters:**
* `n` : Specifies the exact number of random rows to select.

In [None]:
# select 10 random rows
data_selected = data.sample(n=10)

# show dataframe
data_selected

company,rank,revenues,revenue_change,profits,assets,profit_change,ceo,industry,sector,previous_rank,country,hq_location,website,years_on_global_500_list,employees,total_stockholder_equity
Emirates Group,480,22799,0.3,340.3,33096,-82.5,Sheikh Ahmed bin Saeed Al Maktoum,Airlines,Transportation,472,U.A.E,"Dubai, U.A.E",http://www.theemiratesgroup.com,2,64768,9395
Aflac,483,22559,8.1,2659.0,129819,5.0,Daniel P. Amos,"Insurance: Life, Health (stock)",Financials,0,USA,"Columbus, GA",http://www.aflac.com,10,10212,20482
Hindustan Petroleum,384,28166,-2.3,1228.1,12370,63.4,Mukesh Kumar Surana,Petroleum Refining,Energy,367,India,"Mumbai, India",http://www.hindustanpetroleum.com,14,10422,3245
BASF,126,63641,-18.6,4485.3,80675,1.4,Kurt W. Bock,Chemicals,Chemicals,88,Germany,"Ludwigshafen, Germany",http://www.basf.com,23,109543,33545
Nestle,64,90814,-1.6,8659.2,129824,-8.1,Ulf Mark Schneider,Food Consumer Products,"Food, Beverages & Tobacco",66,Switzerland,"Vevey, Switzerland",http://www.nestle.com,23,328000,63573
American Express,315,33823,-1.8,5408.0,158893,4.7,Kenneth I. Chenault,Diversified Financials,Financials,302,USA,"New York, NY",http://www.americanexpress.com,23,56400,20501
L’Oreal,379,28572,2.0,3434.5,37577,-6.1,Jean-Paul Agon,Household and Personal Products,Household Products,378,France,"Clichy, France",http://www.loreal.com,23,89331,25840
National Australia Bank,405,26958,-21.1,259.0,594967,-94.8,Andrew G. Thorburn,Banks: Commercial and Savings,Financials,304,Australia,"Docklands, Australia",http://www.nab.com.au,22,34263,39244
China Shipbuilding Industry,233,42149,17.0,485.8,69621,-62.9,Hu Wenming,Industrial Machinery,Industrials,281,China,"Beijing, China",http://www.csic.com.cn,7,182129,17789
Sears Holdings,489,22138,-12.0,-2221.0,9362,,Edward S. Lampert,General Merchandisers,Retailing,425,USA,"Hoffman Estates, IL",http://www.searsholdings.com,23,140000,-3824


**Key parameters:**
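* `frac` : Specifies the fraction of rows to select, as a float (for example, `0.2` for 20% of the rows). It cannot be combined with `n`, and values above 1 require `replace=True`.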

In [None]:
# sample 20% of the rows
data_selected = data.sample(frac=0.2)

# show dataframe
data_selected

company,rank,revenues,revenue_change,profits,assets,profit_change,ceo,industry,sector,previous_rank,country,hq_location,website,years_on_global_500_list,employees,total_stockholder_equity
Aluminum Corp. of China,248,40278,6.0,-282.5,75089,,Yu Dehui,Metals,Materials,262,China,"Beijing, China",http://www.chalco.com.cn,10,121146,2669
George Weston,293,36211,-1.3,414.9,28299,0.6,Galen G. Weston,Food and Drug Stores,Food & Drug Stores,274,Canada,"Toronto, Ontario, Canada",http://www.weston.ca,23,195000,5790
Rite Aid,325,32845,6.9,4.1,11594,-97.6,John T. Standley,Food and Drug Stores,Food & Drug Stores,340,USA,"Camp Hill, PA",http://www.riteaid.com,19,70430,614
Iberdrola,332,32308,-7.3,2991.3,112536,11.4,Jose Ignacio Sanchez Galan,Utilities,Energy,295,Spain,"Bilbao, Spain",http://www.iberdrola.com,13,28389,38695
Bank of America Corp.,62,93662,0.7,17906.0,2187702,12.7,Brian T. Moynihan,Banks: Commercial and Savings,Financials,64,USA,"Charlotte, NC",http://www.bankofamerica.com,23,208024,266840
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Banco Santander,73,82801,-2.5,6860.7,1412281,3.7,Jose Antonio Alvarez,Banks: Commercial and Savings,Financials,75,Spain,"Madrid, Spain",http://www.santander.com,23,185606,95906
UnitedHealth Group,13,184840,17.7,7017.0,122810,20.7,Stephen J. Hemsley,Health Care: Insurance and Managed Care,Health Care,17,USA,"Minnetonka, MN",http://www.unitedhealthgroup.com,21,230000,38274
Beijing Automotive Group,137,61130,11.3,1260.6,57783,14.9,Xu Heyi,Motor Vehicles and Parts,Motor Vehicles & Parts,160,China,"Beijing, China",http://www.baicgroup.com.cn,5,134765,8037
Rio Tinto Group,316,33781,-3.0,4617.0,89263,,Jean-Sebastien Jacques,"Mining, Crude-Oil Production",Energy,296,Britain,"London, Britain",http://www.riotinto.com,12,51029,39290
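
To make the row count behind `frac` concrete, here is a minimal sketch on a small synthetic DataFrame. The `toy` frame and its `value` column are illustrative assumptions, not part of the Fortune 500 data; they simply mimic a 500-row table so the example runs without downloading anything.

```python
import pandas as pd

# hypothetical stand-in data: 500 rows, like the f500 dataset
toy = pd.DataFrame({'value': range(500)})

# frac=0.2 returns 20% of the rows
subset = toy.sample(frac=0.2)
print(len(subset))  # 100

# n and frac are alternatives; passing both raises a ValueError
try:
    toy.sample(n=10, frac=0.2)
except ValueError as err:
    print('ValueError:', err)
```

Because the default is sampling without replacement, every row in `subset` appears at most once.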


**Key parameters:**
* `replace` : A boolean value (default `False`). If `True`, rows are drawn with replacement, so the same row can be selected more than once. This is required when `frac > 1` or when `n` exceeds the number of rows.

In [None]:
# sample with replacement, potentially selecting the same row multiple times
data_selected = data.sample(n=10, replace=True)

# show dataframe
data_selected

company,rank,revenues,revenue_change,profits,assets,profit_change,ceo,industry,sector,previous_rank,country,hq_location,website,years_on_global_500_list,employees,total_stockholder_equity
China Railway Construction,58,94877,-0.8,1192.4,109968,7.8,Meng Fengchao,"Engineering, Construction",Engineering & Construction,62,China,"Beijing, China",http://www.crcc.cn,6,336872,10146
Nationwide,254,40074,-0.4,334.3,197790,-42.4,Stephen S. Rasmussen,Insurance: Property and Casualty (Mutual),Financials,241,USA,"Columbus, OH",http://www.nationwide.com,23,34320,15537
Jizhong Energy Group,320,33366,-11.8,-153.9,30819,,Yang GuoZhan,"Mining, Crude-Oil Production",Energy,267,China,"Xingtai, China",http://www.jznyjt.com,7,127298,2042
La Poste,427,25760,0.8,938.9,257526,33.3,Philippe Wahl,"Mail, Package, and Freight Delivery",Transportation,418,France,"Paris, France",http://www.laposte.fr,22,240407,11513
State Farm Insurance Cos.,85,76132,0.6,350.3,256030,-94.4,Michael L. Tipsord,Insurance: Property and Casualty (Mutual),Financials,93,USA,"Bloomington, IL",http://www.statefarm.com,23,68234,87592
Pemex,152,57774,-21.4,-10256.3,113115,,Jose Antonio Gonzalez Anaya,"Mining, Crude-Oil Production",Energy,98,Mexico,"Mexico City, Mexico",http://www.pemex.com,23,125689,-59909
Comcast,79,80403,7.9,8695.0,180500,6.5,Brian L. Roberts,Telecommunications,Telecommunications,96,USA,"Philadelphia, PA",http://www.comcastcorporation.com,15,159000,53943
US Foods Holding,475,22919,-0.9,209.8,8945,25.2,Pietro Satriano,Wholesalers: Food and Grocery,Wholesalers,461,USA,"Rosemont, IL",http://www.usfoods.com,2,25000,2538
Sinopec Group,3,267518,-9.1,1257.9,310726,-65.0,Wang Yupu,Petroleum Refining,Energy,4,China,"Beijing, China",http://www.sinopec.com,19,713288,106523
HBIS Group,221,43769,-3.3,-146.8,51858,,Yu Yong,Metals,Materials,201,China,"Shijiazhuang, China",http://www.hbisco.com,9,125552,7693
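
With 500 rows and only 10 draws, repeated rows are unlikely to show up in the output above. The following sketch makes the effect of `replace` easier to see by using a deliberately tiny DataFrame. The `toy` frame is a hypothetical stand-in, not the notebook's data.

```python
import pandas as pd

# hypothetical stand-in data with only 5 rows
toy = pd.DataFrame({'value': [10, 20, 30, 40, 50]})

# without replacement, asking for more rows than exist fails
try:
    toy.sample(n=8)
except ValueError as err:
    print('ValueError:', err)

# with replacement, the same request succeeds and must repeat some rows
oversampled = toy.sample(n=8, replace=True)
print(oversampled.index.duplicated().any())  # True: 8 draws from 5 rows

# frac > 1 likewise requires replace=True
doubled = toy.sample(frac=2.0, replace=True)
print(len(doubled))  # 10
```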


**Key parameters:**
* `random_state` : An integer seed or a `numpy.random.RandomState` object. Setting `random_state` makes the sample reproducible: running the code on the same data with the same `random_state` yields the same sample.

In [5]:
# select 10 random rows, seeded so the result is reproducible
data_selected = data.sample(n=10, random_state=1)

# show dataframe
data_selected

company,rank,revenues,revenue_change,profits,assets,profit_change,ceo,industry,sector,previous_rank,country,hq_location,website,years_on_global_500_list,employees,total_stockholder_equity
Accenture,305,34798,5.7,4111.9,20609,34.7,Pierre Nanterme,Information Technology Services,Technology,312,Ireland,"Dublin, Ireland",http://www.accenture.com,16,384000,7555
China Poly Group,341,31508,18.1,744.1,95657,-11.4,Zhang Zhengao,Real estate,Financials,401,China,"Beijing, China",http://www.poly.com.cn,3,76425,7676
J.P. Morgan Chase,48,105486,4.4,24733.0,2490972,1.2,James Dimon,Banks: Commercial and Savings,Financials,55,USA,"New York, NY",http://www.jpmorganchase.com,23,243355,254190
Dongfeng Motor,68,86194,4.1,1415.0,59532,-4.4,Li Shaozhu,Motor Vehicles and Parts,Motor Vehicles & Parts,81,China,"Wuhan, China",http://www.dfmc.com.cn,8,189795,11464
Emirates Group,480,22799,0.3,340.3,33096,-82.5,Sheikh Ahmed bin Saeed Al Maktoum,Airlines,Transportation,472,U.A.E,"Dubai, U.A.E",http://www.theemiratesgroup.com,2,64768,9395
GS Caltex,486,22207,-11.4,1221.1,15969,42.1,Jin-Soo Huh,Petroleum Refining,Energy,431,South Korea,"Seoul, South Korea",http://www.gscaltex.com,6,2949,8150
Alimentation Couche-Tard,311,34145,-1.1,1193.5,12304,27.9,Brian P. Hannasch,Food and Drug Stores,Food & Drug Stores,301,Canada,"Laval, Quebec, Canada",http://www.couche-tard.com,4,105000,5044
Verizon,32,125980,-4.3,13127.0,244180,-26.6,Lowell C. McAdam,Telecommunications,Telecommunications,30,USA,"New York, NY",http://www.verizon.com,23,160900,22524
Manulife Financial,250,40238,49.4,2209.7,537461,28.9,Donald A. Guloien,"Insurance: Life, Health (stock)",Financials,394,Canada,"Toronto, Ontario, Canada",http://www.manulife.com,15,34500,31197
Uniper,91,74407,,-3557.5,51541,,Klaus Schafer,Energy,Energy,0,Germany,"Dusseldorf, Germany",http://www.uniper.energy,1,12890,12889
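
One way to check the reproducibility claim is to draw the same sample twice and compare the results. This is a minimal sketch on a synthetic DataFrame; the `toy` frame is an assumption made for illustration, and the seed value `1` is arbitrary.

```python
import pandas as pd

# hypothetical stand-in data
toy = pd.DataFrame({'value': range(100)})

# two independent calls with the same seed
first = toy.sample(n=10, random_state=1)
second = toy.sample(n=10, random_state=1)

# same seed, same data: identical rows in identical order
print(first.equals(second))  # True
```

Without `random_state`, the two calls would very likely return different rows.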


**Key parameters:**
* `weights` : A column name or array-like of non-negative numeric values. Allows for biased sampling, where rows with higher weights are more likely to be selected. Weights that do not sum to 1 are normalized automatically.

In [None]:
# sample with weights: rows with a higher value in the `rank` column (lower-ranked companies) are more likely to be selected
data_selected = data.sample(n=10, weights='rank')

# show dataframe
data_selected

company,rank,revenues,revenue_change,profits,assets,profit_change,ceo,industry,sector,previous_rank,country,hq_location,website,years_on_global_500_list,employees,total_stockholder_equity
Roche Group,169,53427,2.0,9719.9,75609,5.5,Severin Schwan,Pharmaceuticals,Health Care,167,Switzerland,"Basel, Switzerland",http://www.roche.com,23,94052,23534
LafargeHolcim,398,27308,11.4,1817.9,68521,,Eric Olsen,"Building Materials, Glass",Materials,438,Switzerland,"Jona, Switzerland",http://www.lafargeholcim.com,9,90903,30337
HBIS Group,221,43769,-3.3,-146.8,51858,,Yu Yong,Metals,Materials,201,China,"Shijiazhuang, China",http://www.hbisco.com,9,125552,7693
Mitsubishi Heavy Industries,294,36122,7.2,809.6,49205,52.3,Shunichi Miyanaga,Industrial Machinery,Industrials,307,Japan,"Tokyo, Japan",http://www.mhi.com,23,82728,15074
Noble Group,205,46528,-30.3,8.7,12285,,William J. Randall,Trading,Wholesalers,116,China,"Hong Kong, China",http://www.thisisnoble.com,10,1000,3974
Inditex,428,25733,11.6,3485.0,21203,9.9,Pablo Isla Alvarez de Tejera,Specialty Retailers,Retailing,463,Spain,"Arteixo, Spain",http://www.inditex.com,2,162450,13738
China Guodian,397,27315,-10.5,268.7,114611,-67.2,Qiao Baoping,Energy,Energy,345,China,"Beijing, China",http://www.cgdc.com.cn,8,124056,7496
SNCF Mobilites,317,33747,3.8,565.1,39993,,Guillaume Pepy,Railroads,Transportation,319,France,"St. Denis, France",http://www.sncf.com,23,193718,4696
BASF,126,63641,-18.6,4485.3,80675,1.4,Kurt W. Bock,Chemicals,Chemicals,88,Germany,"Ludwigshafen, Germany",http://www.basf.com,23,109543,33545
ConocoPhillips,444,24360,-21.3,-3615.0,89772,,Ryan M. Lance,"Mining, Crude-Oil Production",Energy,339,USA,"Houston, TX",http://www.conocophillips.com,23,13300,34974
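
The bias introduced by `weights` is hard to see in a single 10-row sample. The sketch below uses a tiny synthetic DataFrame with obvious weights and many draws with replacement, so the proportions become visible. The `toy` frame, its `weight` column, and the row labels are hypothetical and chosen only for illustration.

```python
import pandas as pd

# hypothetical stand-in data: two rows with zero weight, two with positive weight
toy = pd.DataFrame(
    {'weight': [0, 0, 1, 3]},
    index=['a', 'b', 'c', 'd'],
)

# draw many rows with replacement so the weighting shows up in the counts
picks = toy.sample(n=1000, weights='weight', replace=True, random_state=0)

# rows 'a' and 'b' have weight 0, so they are never drawn;
# 'd' should appear roughly three times as often as 'c'
print(picks.index.value_counts())
```

The weights here sum to 4 rather than 1, so the sketch also relies on pandas normalizing them before sampling.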
