## 14. How do I create a pandas Series and DataFrame from another object?
We often need to create a DataFrame or Series just to test a function or method. In this post we will look at several ways of creating a DataFrame or Series from other Python objects, such as lists, tuples, dictionaries and NumPy arrays.

In [1]:
import pandas as pd

### 14.1. Creating Series from other objects

#### 14.1.1. Creating Series from list

We can pass a list or tuple to the ‘Series( )’ constructor to create a pandas Series. By default, the index is a sequence of integers starting at 0. We can, however, specify our own index by passing a list or tuple of labels to the ‘index’ parameter. We can also give the Series a name using the ‘name’ parameter. The name plays the same role as a column name in a DataFrame.

In [2]:
pd.Series([100, 101, 102])

0    100
1    101
2    102
dtype: int64

In [3]:
pd.Series([3000000, 85000], index=["Albania", "Andorra"], name="population")

Albania    3000000
Andorra      85000
Name: population, dtype: int64
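
As a small sketch of the points above, the snippet below builds the same Series from a tuple and then converts it to a one-column DataFrame to show where the name ends up. The variable names `population`, `albania` and `frame` are only illustrative.

```python
import pandas as pd

# A tuple of values works the same way as a list.
population = pd.Series((3000000, 85000), index=["Albania", "Andorra"], name="population")

# Values can be looked up by index label.
albania = population["Albania"]

# to_frame() returns a one-column DataFrame; the Series name becomes the column name.
frame = population.to_frame()
print(frame.columns.tolist())
```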

### 14.2. Creating DataFrame from other objects

#### 14.2.1. Creating DataFrame from a Dictionary

This is probably the most common way of creating a small DataFrame. We will use the ‘DataFrame( )’ constructor and pass it a dictionary whose keys are the column names and whose values are the data for the respective columns. Note that recent versions of pandas (0.23 and later, on Python 3.6 or later) keep the columns in the order of the dictionary keys; in older versions the column order could differ from the order in which the keys were written. In either case, we can use the ‘columns’ parameter to put the columns in a particular order. Similarly, we can use the ‘index’ parameter to provide an index for the DataFrame.

In [4]:
pd.DataFrame({"id":[100, 101, 102], "color":["red", "blue", "red"]})

| | id | color |
|---|---|---|
| 0 | 100 | red |
| 1 | 101 | blue |
| 2 | 102 | red |


In [5]:
df = pd.DataFrame({"id":[100, 101, 102], "color":["red", "blue", "red"]}, columns=["id", "color"], index=["a", "b", "c"])
df

| | id | color |
|---|---|---|
| a | 100 | red |
| b | 101 | blue |
| c | 102 | red |
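
The following sketch shows how the ‘columns’ parameter interacts with a dictionary. It assumes a reasonably recent pandas and Python, where dictionary key order is preserved; the names `data`, `default_order` and `reordered` are made up for the example.

```python
import pandas as pd

data = {"id": [100, 101, 102], "color": ["red", "blue", "red"]}

# Without `columns`, the column order follows the order of the dictionary keys.
default_order = pd.DataFrame(data)

# With `columns`, we choose the order ourselves.
reordered = pd.DataFrame(data, columns=["color", "id"], index=["a", "b", "c"])

print(default_order.columns.tolist())  # ['id', 'color']
print(reordered.columns.tolist())      # ['color', 'id']
```

With a dictionary, ‘columns’ selects and orders existing keys; it does not rename them.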


#### 14.2.2. Creating DataFrame from a list/tuple of lists/tuples

We will pass a list of lists to the ‘DataFrame( )’ constructor. Each inner list corresponds to one row of the DataFrame. When we create a DataFrame from a dictionary, the ‘columns’ parameter specifies the order of the columns. When we create a DataFrame from other data structures, such as a list, tuple or array, the ‘columns’ parameter specifies the names of the columns instead.

In [6]:
pd.DataFrame([[100, "red"], [101, "blue"], [102, "red"]])

| | 0 | 1 |
|---|---|---|
| 0 | 100 | red |
| 1 | 101 | blue |
| 2 | 102 | red |


In [7]:
pd.DataFrame([[100, "red"], [101, "blue"], [102, "red"]], columns=["id", "color"])

| | id | color |
|---|---|---|
| 0 | 100 | red |
| 1 | 101 | blue |
| 2 | 102 | red |
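
The heading of this section also mentions tuples, so here is a minimal sketch of the same DataFrame built from a list of tuples. The names `rows` and `df_rows` are only illustrative.

```python
import pandas as pd

# One tuple per row, in the same order as the column names.
rows = [(100, "red"), (101, "blue"), (102, "red")]
df_rows = pd.DataFrame(rows, columns=["id", "color"])

print(df_rows.shape)             # (3, 2)
print(df_rows.columns.tolist())  # ['id', 'color']
```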


#### 14.2.3. Creating DataFrame from a Numpy array

NumPy has numerous functions in its ‘random’ module that can be used to generate an array of random numbers. NumPy's ‘rand(4,2)’ function creates an array with four rows and two columns, filled with numbers between 0 (inclusive) and 1 (exclusive). We will assign the array to the ‘arr’ variable and pass it to the ‘DataFrame( )’ constructor to create our DataFrame. We tend to use this method when we need a relatively bigger DataFrame.

In [8]:
import numpy as np

In [9]:
arr = np.random.rand(4,2)
arr

array([[0.80024121, 0.43763435],
       [0.44176776, 0.58811895],
       [0.53734048, 0.99858865],
       [0.64173363, 0.87990439]])

In [10]:
pd.DataFrame(arr)

| | 0 | 1 |
|---|---|---|
| 0 | 0.800241 | 0.437634 |
| 1 | 0.441768 | 0.588119 |
| 2 | 0.537340 | 0.998589 |
| 3 | 0.641734 | 0.879904 |


In [11]:
pd.DataFrame(arr, columns=["one", "two"]).head(2)

| | one | two |
|---|---|---|
| 0 | 0.800241 | 0.437634 |
| 1 | 0.441768 | 0.588119 |
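
Because the numbers are random, the values above will differ on every run. If reproducible test data is needed, one option (not used in the cells above) is NumPy's seeded generator. This is only a sketch; the seed value and the names `rng`, `arr_seeded` and `df_arr` are arbitrary.

```python
import numpy as np
import pandas as pd

# A seeded generator returns the same numbers on every run.
rng = np.random.default_rng(seed=0)

# The shape is given as (rows, columns): 4 rows and 2 columns.
arr_seeded = rng.random((4, 2))

df_arr = pd.DataFrame(arr_seeded, columns=["one", "two"])
print(arr_seeded.shape)  # (4, 2)
print(df_arr.shape)      # (4, 2)
```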


NumPy’s top-level function ‘arange(100, 110, 1)’, short for array range, is similar to ‘range’ in Python. The first input (inclusive) specifies the first element and the second input (exclusive) specifies the end of the range. We can also specify the step as the third input, here 1. Another function from the ‘random’ module, ‘randint(60, 101, 10)’, generates random integers between the first input (inclusive) and the second input (exclusive). The last input specifies how many integers we want to generate.



In [12]:
pd.DataFrame({"student":np.arange(100, 110, 1), "test":np.random.randint(60, 101, 10)})

| | student | test |
|---|---|---|
| 0 | 100 | 61 |
| 1 | 101 | 65 |
| 2 | 102 | 95 |
| 3 | 103 | 93 |
| 4 | 104 | 79 |
| 5 | 105 | 85 |
| 6 | 106 | 68 |
| 7 | 107 | 64 |
| 8 | 108 | 100 |
| 9 | 109 | 76 |
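
To make the inclusive and exclusive bounds concrete, here is a short check of both functions. The names `students` and `scores` are only illustrative, and the scores themselves will vary from run to run.

```python
import numpy as np

# 100, 101, ..., 109: the end value 110 is excluded.
students = np.arange(100, 110, 1)

# Ten random integers from 60 up to and including 100 (101 is excluded).
scores = np.random.randint(60, 101, 10)

print(students[0], students[-1], len(students))  # 100 109 10
print(scores.min() >= 60, scores.max() <= 100)   # True True
```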


#### 14.2.4. Extending DataFrame with a Series

When we have to add a new Series to our DataFrame, we generally use “DataFrame_name[‘column_name’]=Series”. This method modifies our original DataFrame. There may be instances when we don’t want to modify the original DataFrame, but still want to combine a Series and a DataFrame. We can use the pandas top-level function ‘concat( )’ for this task. We create a list containing the DataFrame and the Series and pass it to ‘concat( )’. The ‘axis=1’ parameter indicates that we want to place them side by side, so the Series becomes a new column. Note that pandas aligns the rows on the index when combining along the columns, so an index label that is missing from the Series gets NaN in the new column. We will learn more about the function in the next blog.

In [13]:
s = pd.Series(["round", "square"], index=["c", "b"], name="shape")
s

c     round
b    square
Name: shape, dtype: object

In [14]:
df

| | id | color |
|---|---|---|
| a | 100 | red |
| b | 101 | blue |
| c | 102 | red |


In [15]:
pd.concat([df,s], axis=1)

| | id | color | shape |
|---|---|---|---|
| a | 100 | red | NaN |
| b | 101 | blue | square |
| c | 102 | red | round |
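
The sketch below repeats the example in a self-contained form to check two things: ‘concat( )’ returns a new DataFrame and leaves the original alone, and rows are matched by index label rather than by position. The names `df_items`, `shape` and `combined` are chosen for the example only.

```python
import pandas as pd

df_items = pd.DataFrame({"id": [100, 101, 102], "color": ["red", "blue", "red"]},
                        index=["a", "b", "c"])
shape = pd.Series(["round", "square"], index=["c", "b"], name="shape")

# concat returns a new object; df_items itself is not changed.
combined = pd.concat([df_items, shape], axis=1)

print(df_items.columns.tolist())        # ['id', 'color']
print(combined.columns.tolist())        # ['id', 'color', 'shape']
print(combined.loc["c", "shape"])       # round (matched by label, not position)
print(combined["shape"].isna().sum())   # 1 (row 'a' has no matching label)
```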


#### 14.2.5. Expanding a Series of lists/tuples into a DataFrame

We will first create a DataFrame with the help of a dictionary. Note that the second column is a Series of lists; we want to break each list into its individual elements and attach them to our DataFrame as separate columns. There are several ways to do this. Here we call ‘apply( )’ on the Series and pass ‘pd.Series’ as the function. The function converts each list into its own Series, and the result is a DataFrame with one column per list element.

In [16]:
df = pd.DataFrame({"col_one":["a", "b", "c"], "col_two":[[10, 40], [20, 50], [30,60]]})
df

| | col_one | col_two |
|---|---|---|
| 0 | a | [10, 40] |
| 1 | b | [20, 50] |
| 2 | c | [30, 60] |


In [17]:
df_new = df.col_two.apply(pd.Series)
df_new

| | 0 | 1 |
|---|---|---|
| 0 | 10 | 40 |
| 1 | 20 | 50 |
| 2 | 30 | 60 |


In [18]:
df_final = pd.concat([df, df_new], axis="columns")
df_final

| | col_one | col_two | 0 | 1 |
|---|---|---|---|---|
| 0 | a | [10, 40] | 10 | 40 |
| 1 | b | [20, 50] | 20 | 50 |
| 2 | c | [30, 60] | 30 | 60 |


Having used ‘concat( )’ to join the new DataFrame to our original DataFrame, we can modify the result as per our need. Here we drop the original column of lists and rename the remaining columns.

In [19]:
df_final.drop('col_two', axis=1, inplace=True)

In [20]:
df_final.columns = ['col_one', 'col_two', 'col_three']

In [21]:
df_final

| | col_one | col_two | col_three |
|---|---|---|---|
| 0 | a | 10 | 40 |
| 1 | b | 20 | 50 |
| 2 | c | 30 | 60 |
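
As noted above, there are several ways to reach this result. One alternative, sketched here and not the method used in the cells above, builds the new columns directly from the lists with ‘tolist( )’ and avoids modifying anything in place. It assumes every list has the same length; the names `df_lists`, `expanded` and `result` are made up for the example.

```python
import pandas as pd

df_lists = pd.DataFrame({"col_one": ["a", "b", "c"],
                         "col_two": [[10, 40], [20, 50], [30, 60]]})

# Build the new columns straight from the lists, reusing the original index
# so that the rows line up when we concatenate.
expanded = pd.DataFrame(df_lists["col_two"].tolist(),
                        index=df_lists.index,
                        columns=["col_two", "col_three"])

# drop() without inplace=True returns a new DataFrame and leaves df_lists as it was.
result = pd.concat([df_lists.drop(columns="col_two"), expanded], axis=1)

print(result.columns.tolist())  # ['col_one', 'col_two', 'col_three']
```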
