# Pandas: Data Structure, Operations, and Handling Missing Data

- Pandas is built on top of NumPy and provides an efficient implementation of the DataFrame.
- DataFrames are essentially multidimensional arrays with attached row and column labels.
- DataFrames often hold heterogeneous types and/or missing data.
- NumPy works very well with clean, well-organized data.
- NumPy's limitations become clearer when we need more flexibility (attaching labels to data, working with missing data, etc.) and when we attempt operations such as groupings and pivots to analyze the less structured data found in the real world.

In [4]:
import numpy as np
import pandas as pd

In [None]:
pd.show_versions()

# Introducing Pandas Objects

- Three fundamental data structures
    - Series
    - DataFrame
    - Index

## The Pandas Series Object

- A Pandas Series is a one-dimensional array of indexed data
- It can be created from a list or an array

In [28]:
sdata = pd.Series([0.25, 0.5, 0.75, 1.0])

In [33]:
# The Series data structure provides a sequence of index labels and a sequence of values
print(sdata)

0    0.25
1    0.50
2    0.75
3    1.00
dtype: float64


In [30]:
# We can access the elements of a Series the same way as an ndarray, using square bracket [] notation
sdata[1]

0.5

In [36]:
# We can access the values and the index through the values and index attributes

print("Series Values:", sdata.values)
print("Series Index:", sdata.index)

Series Values: [0.25 0.5  0.75 1.  ]
Series Index: RangeIndex(start=0, stop=4, step=1)


In [26]:
npdata = np.arange(11, 20)

In [32]:
print(npdata)

[11 12 13 14 15 16 17 18 19]


In [27]:
npdata[1]

12

**- Pandas Series is much more general and flexible than the one-dimensional NumPy array.**

### Series as generalized NumPy array

- From the example above we see that a Series and a NumPy array are both basically one-dimensional arrays.
- The essential difference is the index.
- A NumPy array has an *implicitly defined* integer index used to access the values.
- A Pandas Series has an *explicitly defined* index associated with the values.

- This *explicit index* definition gives the Series object additional capabilities.
- For example, the index need not be an integer; it can consist of values of any desired type.

In [50]:
sdata = pd.Series([0.25, 0.5, 0.75, 1.0], index=['a', 'b', 'c', 'd'])

In [51]:
sdata

a    0.25
b    0.50
c    0.75
d    1.00
dtype: float64

In [52]:
print(sdata['b'])

0.5


In [43]:
# We can use noncontiguous or nonsequential indices

sdata = pd.Series([0.25, 0.5, 0.75, 1.0], index=[2,5,3,7])

In [45]:
sdata[5]

0.5

### Series as specialized dictionary

In [66]:
# Create a Python dictionary

population_dict = {'Taipei': 7871900,
                   'Banqiao': 543342,
                   'Kaohsiung': 1519711,
                   'Taichung': 1040725,
                   'Tainan': 771235}

In [67]:
population_dict

{'Taipei': 7871900,
 'Banqiao': 543342,
 'Kaohsiung': 1519711,
 'Taichung': 1040725,
 'Tainan': 771235}

In [70]:
# A Pandas Series can be sorted by its index
population = pd.Series(population_dict).sort_index()
population

Banqiao       543342
Kaohsiung    1519711
Taichung     1040725
Tainan        771235
Taipei       7871900
dtype: int64

In [71]:
# A Pandas Series can also be sorted by its values

population = pd.Series(population_dict).sort_values()
print(population)

Banqiao       543342
Tainan        771235
Taichung     1040725
Kaohsiung    1519711
Taipei       7871900
dtype: int64


In [73]:
# Access the element using the index

population['Kaohsiung']

1519711

In [75]:
# Series also supports array-style operations such as slicing
# A label slice follows the current row order (here sorted by value) and includes the end label

population['Taichung': 'Taipei']

Taichung     1040725
Kaohsiung    1519711
Taipei       7871900
dtype: int64

### Constructing Series Objects

In [None]:
# General form for creating a Pandas Series (a template, not runnable as-is):
#     pd.Series(data, index=index)
# index is optional and defaults to an integer sequence 0, 1, 2, ...

In [78]:
pd.Series([2,4,6])

0    2
1    4
2    6
dtype: int64

In [79]:
# Data can be scalar, which is repeated to fill the specified index

pd.Series(5, index=[100,200,300])

100    5
200    5
300    5
dtype: int64

In [81]:
# Data can be a dictionary

pd.Series({2:'a', 1:'b', 3:'c'})

1    b
2    a
3    c
dtype: object

In [82]:
# Index can be explicitly set if a different result is preferred

pd.Series({2:'a', 1:'b', 3:'c'}, index=[3,2])

3    c
2    a
dtype: object
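
The cell above selects a subset of the dictionary keys through `index`. A minimal sketch of the opposite case, assuming a hypothetical label `4` that is not a key of the dictionary, suggests how the constructor fills the gap:

```python
import pandas as pd

# Hypothetical variation of the cell above: label 4 is not a key of the dictionary
partial = pd.Series({2: 'a', 1: 'b', 3: 'c'}, index=[3, 2, 4])

print(partial)
# Label 4 has no matching key, so its value is missing (NaN)
print(partial.isna())
```

The result keeps the order given in `index`, and only the unmatched label is marked as missing.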

## The Pandas DataFrame Object

- DataFrame is a two-dimensional array with flexible row indices and flexible column names.

In [4]:
# Population of five states

population_dict = {'California': 38332521,
                    'Texas': 26448193,
                    'New York': 19651127,
                    'Florida': 19552860,
                    'Illinois': 12882135}

In [5]:
population = pd.Series(population_dict)

In [6]:
population

California    38332521
Texas         26448193
New York      19651127
Florida       19552860
Illinois      12882135
dtype: int64

In [7]:
# Area of five states

area_dict = {'California': 423967, 
             'Texas': 695662, 
             'New York': 141297,
             'Florida': 170312, 
             'Illinois': 149995}

In [8]:
area = pd.Series(area_dict)

In [9]:
area

California    423967
Texas         695662
New York      141297
Florida       170312
Illinois      149995
dtype: int64

In [14]:
# We can construct a single two-dimensional object containing the information of population and area

states = pd.DataFrame({'population': population,
                      'area': area})

In [15]:
states

            population    area
California    38332521  423967
Texas         26448193  695662
New York      19651127  141297
Florida       19552860  170312
Illinois      12882135  149995


In [16]:
# Like the Series object, a DataFrame has an index attribute that gives access to the index labels

states.index

Index(['California', 'Texas', 'New York', 'Florida', 'Illinois'], dtype='object')

In [17]:
# Additionally, a DataFrame has a columns attribute, which is an Index object holding the column labels

states.columns

Index(['population', 'area'], dtype='object')

- In a DataFrame, both the rows and the columns have a generalized index for accessing the data

### DataFrame as specialized dictionary

- A dictionary maps a key to a value.
- A DataFrame maps a column name to a Series of column data.
- For example, asking for the 'area' key returns the Series object containing the areas.

In [18]:
states['area']

California    423967
Texas         695662
New York      141297
Florida       170312
Illinois      149995
Name: area, dtype: int64

In [21]:
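# Potential point of confusion: in a two-dimensional NumPy array, npdata[0] returns the first row,
# whereas for a DataFrame, states['area'] returns a column
npdata = np.arange(9).reshape(3,3)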

In [22]:
npdata[0]

array([0, 1, 2])

In [25]:
states

            population    area
California    38332521  423967
Texas         26448193  695662
New York      19651127  141297
Florida       19552860  170312
Illinois      12882135  149995


### Constructing DataFrame objects

- [ ] From a single Series object
- [ ] From a list of dictionaries
- [ ] From a dictionary of Series objects
- [ ] From a two-dimensional NumPy array
- [ ] From a NumPy structured array

In [30]:
print(type(population))
print(type(states))

<class 'pandas.core.series.Series'>
<class 'pandas.core.frame.DataFrame'>


In [32]:
# From a single Series object

pd.DataFrame(population, columns=['population'])

            population
California    38332521
Texas         26448193
New York      19651127
Florida       19552860
Illinois      12882135


In [36]:
# From a list of dictionaries

data = [{'a': i, 'b': 2*i}
       for i in range(3)]

In [37]:
data

[{'a': 0, 'b': 0}, {'a': 1, 'b': 2}, {'a': 2, 'b': 4}]

In [38]:
pd.DataFrame(data)

   a  b
0  0  0
1  1  2
2  2  4


In [41]:
# A DataFrame can handle missing values
# When some keys are absent from a dictionary in the list, Pandas fills them with NaN (not a number)

pd.DataFrame([{'a':1, 'b':2}, {'b':3, 'c':4}])

     a  b    c
0  1.0  2  NaN
1  NaN  3  4.0


In [39]:
# From a dictionary of Series objects

pd.DataFrame({'population': population,
             'area': area})

            population    area
California    38332521  423967
Texas         26448193  695662
New York      19651127  141297
Florida       19552860  170312
Illinois      12882135  149995


In [43]:
# From a two-dimensional NumPy array

pd.DataFrame(np.random.rand(3,2),
            columns=['foo', 'bar'],
            index=['a', 'b', 'c'])

        foo       bar
a  0.035777  0.720662
b  0.280876  0.401226
c  0.974344  0.525917
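
The list of construction routes above also mentions NumPy structured arrays, which the cells do not demonstrate. A minimal sketch, assuming illustrative field names `A` and `B` that are not part of the original notebook:

```python
import numpy as np
import pandas as pd

# Structured array with two named fields; the names 'A' and 'B' are illustrative
arr = np.zeros(3, dtype=[('A', 'i8'), ('B', 'f8')])

# Each field becomes a column; rows get the default integer index
df_struct = pd.DataFrame(arr)
print(df_struct)
print(df_struct.dtypes)
```

Each field keeps its own dtype, which is one way a DataFrame ends up holding heterogeneous column types.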


## The Pandas Index Object

- Series and DataFrame objects contain an explicit index that lets us reference and modify data.
- An Index object can be thought of either as an immutable array or as an ordered set.
- Technically it is a multiset, as Index objects may contain repeated values.

In [44]:
# An Index object containing repeated values

df = pd.DataFrame(np.random.rand(3,2),
            columns=['foo', 'bar'],
            index=['a', 'a', 'c'])

print(df)

        foo       bar
a  0.973775  0.254463
a  0.102351  0.369807
c  0.239225  0.866873


In [47]:
sdata = pd.Series(5, index=[100,100,200,300])

print(sdata)

100    5
100    5
200    5
300    5
dtype: int64
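
Repeated labels affect what a lookup returns. The following sketch, which uses hypothetical values so the rows can be told apart, suggests how a duplicated label behaves compared with a unique one:

```python
import pandas as pd

# Hypothetical values; the index repeats the label 100 as in the cell above
dup = pd.Series([10, 20, 30, 40], index=[100, 100, 200, 300])

print(dup.index.is_unique)   # the index contains a repeated label
print(dup.loc[100])          # a repeated label selects every matching row
print(dup.loc[200])          # a unique label selects a single value
```

Checking `index.is_unique` before relying on scalar lookups is a cheap safeguard when the index comes from external data.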


In [49]:
# Let's construct an Index from a list of integers

ind = pd.Index([2,3,5,7,11])
ind

Int64Index([2, 3, 5, 7, 11], dtype='int64')

### Index as immutable array

- An Index object operates in many ways like an array
- For example, we can use standard indexing notation to retrieve values or slices

In [51]:
ind[1]

3

In [52]:
ind[::2]

Int64Index([2, 5, 11], dtype='int64')

- Index objects also have many of the attributes familiar from NumPy arrays

In [53]:
print(ind.size, ind.shape, ind.ndim, ind.dtype)

5 (5,) 1 int64


- One difference between Index objects and NumPy arrays is that Index objects are immutable; that is, they cannot be modified by ordinary assignment.

In [54]:
ind[1]=0

TypeError: Index does not support mutable operations

### Index as ordered set

- Pandas objects are designed to facilitate operations such as joins across datasets, which rely on set arithmetic: unions, intersections, differences, and other combinations

In [55]:
indA = pd.Index([1,3,5,7,9])
indB = pd.Index([2,3,5,7,11])

In [56]:
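# Intersection
# Note: using & as a set operation on Index objects is deprecated (newer Pandas versions
# treat it as an element-wise operation); the intersection() method below is preferred

indA & indB

Int64Index([3, 5, 7], dtype='int64')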

In [57]:
indA.intersection(indB)

Int64Index([3, 5, 7], dtype='int64')

In [58]:
# Union (the | operator is deprecated for set operations; the union() method below is preferred)

indA | indB

Int64Index([1, 2, 3, 5, 7, 9, 11], dtype='int64')

In [59]:
indA.union(indB)

Int64Index([1, 2, 3, 5, 7, 9, 11], dtype='int64')

In [60]:
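# Symmetric difference: labels that appear in exactly one of the two indices
# (the ^ operator is deprecated for set operations; the symmetric_difference() method below is preferred)

indA ^ indB

Int64Index([1, 2, 9, 11], dtype='int64')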

In [61]:
indA.symmetric_difference(indB)

Int64Index([1, 2, 9, 11], dtype='int64')
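
The `^` operator and `symmetric_difference()` return the labels found in exactly one of the two indices. For a one-sided difference, the `difference()` method can be used; a minimal sketch with the same two indices:

```python
import pandas as pd

indA = pd.Index([1, 3, 5, 7, 9])
indB = pd.Index([2, 3, 5, 7, 11])

only_in_A = indA.difference(indB)   # labels in indA that are not in indB
only_in_B = indB.difference(indA)   # labels in indB that are not in indA

print(list(only_in_A))
print(list(only_in_B))
```

Taking the union of the two one-sided results gives the same labels as the symmetric difference.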

# Data Indexing and Selection

- In NumPy we looked at methods and tools to access and modify array values:
- indexing (arr[2, 1]), slicing (arr[:, 1:5]), masking (arr[arr > 0]), fancy indexing (arr[0, [1, 5]]), and combinations of these (arr[:, [1, 5]])
- We will now look at similar means of accessing and modifying values in Pandas Series and DataFrame objects

## Data Selection in Series

- A Series object acts in many ways like a one-dimensional array and in many ways like a standard Python dictionary.
- Keeping these two analogies in mind helps us understand the patterns of data indexing and selection in Series.

In [1]:
import pandas as pd

### Series as dictionary

In [2]:
# Like a dictionary, the Series object provides a mapping from a collection of keys to a collection of values.

data = pd.Series([0.25, 0.5, 0.75, 1.0],
                index=['a', 'b', 'c', 'd'])

In [3]:
data

a    0.25
b    0.50
c    0.75
d    1.00
dtype: float64

In [4]:
# We can also use dictionary-like Python expressions and methods to examine the keys/indices and values

'a' in data

True

In [5]:
data.keys()

Index(['a', 'b', 'c', 'd'], dtype='object')

In [7]:
list(data.items())

[('a', 0.25), ('b', 0.5), ('c', 0.75), ('d', 1.0)]

In [11]:
# A Series object can be modified with dictionary-like syntax
# Just as we can extend a dictionary by assigning to a new key, we can extend a Series by assigning to a new index value

data['e'] = 1.25
data

a    0.25
b    0.50
c    0.75
d    1.00
e    1.25
dtype: float64

### Series as one-dimensional array

- A Series builds on this dictionary-like interface
- It provides array-style item selection via the same basic mechanisms as NumPy arrays
- For example: slicing, masking, and fancy indexing

In [12]:
# Slicing by explicit index: the final index is included 

data['a':'c']

a    0.25
b    0.50
c    0.75
dtype: float64

In [13]:
# Slicing by implicit integer index: the final index is excluded

data[0:2]

a    0.25
b    0.50
dtype: float64

In [16]:
# Masking: put a Boolean condition inside the index brackets

data[(data>0.3) & (data<0.8)]

b    0.50
c    0.75
dtype: float64

In [17]:
# Fancy indexing: a list of indices

data[['a', 'c']]

a    0.25
c    0.75
dtype: float64

- What if the Series has an explicit integer index?
- In that case an indexing operation such as data[1] will use the explicit indices
- While a slicing operation like data[1:3] will use the implicit Python-style index

### Indexers: loc and iloc

In [27]:
data = pd.Series(['a', 'b', 'c'],
                index=[1,3,5])

In [28]:
data

1    a
3    b
5    c
dtype: object

In [29]:
# Indexing operation using explicit indices

data[1]

'a'

In [30]:
# Slicing operation using implicit indices

data[1:3]

3    b
5    c
dtype: object

- To resolve this confusion, Pandas provides special indexer attributes.
- These are not methods but attributes that expose a particular slicing interface to the data in the Series
1. The loc attribute allows indexing and slicing that always reference the explicit index

In [31]:
data.loc[1]

'a'

In [32]:
data.loc[1:3]

1    a
3    b
dtype: object

2. The iloc attribute allows indexing and slicing that always reference the implicit Python-style index

In [33]:
data.iloc[1]

'b'

In [34]:
data.iloc[1:3]

3    b
5    c
dtype: object

- These two attributes loc and iloc make code easier to read and understand, especially in the case of integer indexes
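
The difference between the two indexers can be checked side by side. A minimal sketch that rebuilds the same three-element Series and applies the same slice bounds through `loc` and `iloc`:

```python
import pandas as pd

data = pd.Series(['a', 'b', 'c'], index=[1, 3, 5])

by_label = data.loc[1:3]      # labels 1 through 3, end label included
by_position = data.iloc[1:3]  # positions 1 and 2, end position excluded

print(list(by_label.index))
print(list(by_position.index))
```

The same `1:3` bounds select different rows, which is why spelling out `loc` or `iloc` tends to be safer than bare brackets when the index holds integers.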

## Data Selection in DataFrame

- A DataFrame acts in many ways like a two-dimensional or structured array
- In other ways it acts like a dictionary of Series structures sharing the same index

### DataFrame as a dictionary

In [35]:
# For example:

area = pd.Series({'California': 423967, 
             'Texas': 695662, 
             'New York': 141297,
             'Florida': 170312, 
             'Illinois': 149995})

In [36]:
pop = pd.Series({'California': 38332521,
                    'Texas': 26448193,
                    'New York': 19651127,
                    'Florida': 19552860,
                    'Illinois': 12882135})

In [38]:
data = pd.DataFrame({'area': area,
                    'pop': pop})

In [39]:
data

              area       pop
California  423967  38332521
Texas       695662  26448193
New York    141297  19651127
Florida     170312  19552860
Illinois    149995  12882135


- The individual Series that make up the columns of the DataFrame can be accessed via dictionary-style indexing of the column name

In [50]:
dict1 = {'California': 423967, 
             'Texas': 695662, 
             'New York': 141297,
             'Florida': 170312, 
             'Illinois': 149995}

In [51]:
dict1['California']

423967

In [41]:
# Dictionary-style indexing

data['area']

California    423967
Texas         695662
New York      141297
Florida       170312
Illinois      149995
Name: area, dtype: int64

In [44]:
# Additionally, we can use attribute-style access with column names that are strings

data.area

California    423967
Texas         695662
New York      141297
Florida       170312
Illinois      149995
Name: area, dtype: int64

In [53]:
# Attribute-style access returns the same object as dictionary-style access

data.area is data['area']

True

- Attribute-style access doesn't work when the column names are not strings
- It also fails when a column name conflicts with a method of the DataFrame, such as pop

In [54]:
data.pop is data['pop']

False
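
One way to see why the comparison above is False is to inspect what each expression refers to. A minimal sketch, assuming a small two-row frame named `frame` built only for illustration:

```python
import pandas as pd

# Hypothetical two-row frame with the same column names as above
frame = pd.DataFrame({'area': [423967, 695662],
                      'pop': [38332521, 26448193]},
                     index=['California', 'Texas'])

print(callable(frame.pop))                  # frame.pop resolves to the DataFrame.pop method
print(isinstance(frame['pop'], pd.Series))  # bracket access returns the 'pop' column
```

Because the attribute lookup finds the method first, bracket access is the reliable way to reach a column with such a name.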

In [55]:
# Also avoid column assignment via attribute; use dictionary-style assignment instead
# Avoid:  data.pop = z
# Prefer: data['pop'] = z

In [56]:
data

              area       pop
California  423967  38332521
Texas       695662  26448193
New York    141297  19651127
Florida     170312  19552860
Illinois    149995  12882135


- Dictionary-style syntax can also be used to modify the object 
- For example to add a new column

In [57]:
data['density'] = data['pop']/data['area']

In [58]:
data

              area       pop     density
California  423967  38332521   90.413926
Texas       695662  26448193   38.018740
New York    141297  19651127  139.076746
Florida     170312  19552860  114.806121
Illinois    149995  12882135   85.883763


### DataFrame as Two-dimensional array

In [59]:
# DataFrame as an enhanced two dimensional array

data.values

array([[4.23967000e+05, 3.83325210e+07, 9.04139261e+01],
       [6.95662000e+05, 2.64481930e+07, 3.80187404e+01],
       [1.41297000e+05, 1.96511270e+07, 1.39076746e+02],
       [1.70312000e+05, 1.95528600e+07, 1.14806121e+02],
       [1.49995000e+05, 1.28821350e+07, 8.58837628e+01]])

In [60]:
# We can perform array-like operations on the DataFrame, such as transposing it to swap rows and columns

data.T

           California         Texas      New York       Florida      Illinois
area     4.239670e+05  6.956620e+05  1.412970e+05  1.703120e+05  1.499950e+05
pop      3.833252e+07  2.644819e+07  1.965113e+07  1.955286e+07  1.288214e+07
density  9.041393e+01  3.801874e+01  1.390767e+02  1.148061e+02  8.588376e+01


In [61]:
# Passing a single index to an array accesses a row:

data.values[0]

array([4.23967000e+05, 3.83325210e+07, 9.04139261e+01])

In [63]:
data

              area       pop     density
California  423967  38332521   90.413926
Texas       695662  26448193   38.018740
New York    141297  19651127  139.076746
Florida     170312  19552860  114.806121
Illinois    149995  12882135   85.883763


In [62]:
# Passing a single index to a Dataframe accesses a column:

data['area']

California    423967
Texas         695662
New York      141297
Florida       170312
Illinois      149995
Name: area, dtype: int64

- For array-style indexing, Pandas again uses the loc and iloc indexers.
- Using the iloc indexer, we can index the underlying array as if it were a simple NumPy array (using the implicit Python-style index)
- The DataFrame index and column labels are maintained in the result

In [64]:
# Implicit (positional) indexing excludes the final index
data.iloc[:3, :2]

              area       pop
California  423967  38332521
Texas       695662  26448193
New York    141297  19651127


In [67]:
# Explicit (label-based) indexing includes the final index

data.loc[:'Illinois', :'pop']

              area       pop
California  423967  38332521
Texas       695662  26448193
New York    141297  19651127
Florida     170312  19552860
Illinois    149995  12882135


- We can use NumPy-style data access patterns within the loc and iloc indexers
- The loc and iloc indexers can combine masking and fancy indexing

In [73]:
data.loc[data.density>100, ['pop', 'density']]

               pop     density
New York  19651127  139.076746
Florida   19552860  114.806121
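
The `loc` example above combines a Boolean mask with a list of column labels. A comparable selection with `iloc` might look like the sketch below. It rebuilds the same frame so that it runs on its own, and it assumes the mask is first converted to a NumPy array, since `iloc` works with positions rather than labels:

```python
import pandas as pd

data = pd.DataFrame({'area': [423967, 695662, 141297, 170312, 149995],
                     'pop': [38332521, 26448193, 19651127, 19552860, 12882135]},
                    index=['California', 'Texas', 'New York', 'Florida', 'Illinois'])
data['density'] = data['pop'] / data['area']

# iloc expects a positional mask, so the Boolean Series is converted to a NumPy array
mask = (data['density'] > 100).to_numpy()
dense = data.iloc[mask, [1, 2]]   # columns at positions 1 and 2: 'pop' and 'density'

print(dense)
```

The row and column labels are carried into the result even though the selection itself was positional.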


In [74]:
data

              area       pop     density
California  423967  38332521   90.413926
Texas       695662  26448193   38.018740
New York    141297  19651127  139.076746
Florida     170312  19552860  114.806121
Illinois    149995  12882135   85.883763


In [75]:
# Modifying DataFrame value using iloc

data.iloc[0,2] = 90
data

              area       pop     density
California  423967  38332521   90.000000
Texas       695662  26448193   38.018740
New York    141297  19651127  139.076746
Florida     170312  19552860  114.806121
Illinois    149995  12882135   85.883763


# Operating on Data in Pandas

- Whatever operation we perform, Pandas preserves and aligns the index and column labels in the output
- This means that keeping the context of data and combining data from different sources, both potentially error-prone tasks with raw NumPy arrays, become essentially foolproof with Pandas

## Index Preservation

- Pandas is designed to work with NumPy, so any NumPy ufunc will work on Pandas Series and DataFrame objects.

In [5]:
# Series

rng = np.random.RandomState(42)
# randint(low, high, size): low is inclusive, high is exclusive
ser = pd.Series(rng.randint(0, 10, 4))

ser

0    6
1    3
2    7
3    4
dtype: int64

In [80]:
# Dataframe

df = pd.DataFrame(rng.randint(0,10,(3,4)),
                 columns=['a','b','c','d'])
df

   a  b  c  d
0  1  7  5  1
1  4  0  9  5
2  8  0  9  2


- If we apply a NumPy ufunc to either of these objects, the result will be another Pandas object with the indices preserved

In [83]:
# Index preservation in Series

print(np.exp(ser))
print('Type:',type(np.exp(ser)))

0     403.428793
1      20.085537
2    1096.633158
3      54.598150
dtype: float64
Type: <class 'pandas.core.series.Series'>


In [90]:
# Index preservation in DataFrame

print(np.sin(df * np.pi / 4)) 
print('Type:', type(np.sin(df * np.pi / 4)))

              a         b         c         d
0  7.071068e-01 -0.707107 -0.707107  0.707107
1  1.224647e-16  0.000000  0.707107 -0.707107
2 -2.449294e-16  0.000000  0.707107  1.000000
Type: <class 'pandas.core.frame.DataFrame'>


## Index Alignment in Series

- For binary operations on Series or DataFrame objects, Pandas aligns the indices while performing the operation.
- This is convenient when we work with incomplete data.

- For example, suppose we are combining two different data sources and have only the top three US states by area and the top three US states by population.

In [91]:
area = pd.Series({'Alaska': 1723337,
                 'Texas': 695662,
                 'California': 423967},
                name='area')

In [92]:
population = pd.Series({'California': 38332521,
                       'Texas': 26448193,
                       'New York': 19651127}, name='population')

- Let's see what happens when we divide these to compute the population density:

In [93]:
# The resulting index is the union of the indices of the two inputs
# Any label missing from either input is filled with NaN (not a number) by default
population/area

Alaska              NaN
California    90.413926
New York            NaN
Texas         38.018740
dtype: float64

In [95]:
area.index.union(population.index)  # confirm the union of the two input indices

Index(['Alaska', 'California', 'New York', 'Texas'], dtype='object')

- If NaN is not the desired result, the method form of the operator accepts a fill value to use in place of missing elements

In [96]:
A = pd.Series([2,4,6], index=[0,1,2])
B = pd.Series([1,3,5], index=[1,2,3])

In [97]:
A+B

0    NaN
1    5.0
2    9.0
3    NaN
dtype: float64

In [98]:
A.add(B, fill_value=0)

0    2.0
1    5.0
2    9.0
3    5.0
dtype: float64
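
The same `fill_value` idea is not limited to `add`. A minimal sketch with the two Series from above, assuming the method counterparts `sub` and `mul` of the `-` and `*` operators:

```python
import pandas as pd

A = pd.Series([2, 4, 6], index=[0, 1, 2])
B = pd.Series([1, 3, 5], index=[1, 2, 3])

# The operator form leaves NaN wherever a label is missing from one operand
diff_plain = A - B

# The method forms accept fill_value, which stands in for the missing operand
diff_filled = A.sub(B, fill_value=0)
prod_filled = A.mul(B, fill_value=1)

print(diff_plain)
print(diff_filled)
print(prod_filled)
```

The fill value is usually chosen as the neutral element of the operation (0 for addition and subtraction, 1 for multiplication) so that the unmatched entry passes through unchanged.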

## Index Alignment in DataFrame

- A similar type of alignment takes place for both the index and the columns when operations are performed on DataFrames

In [121]:
A = pd.DataFrame(rng.randint(0, 20, (2,2)),
                columns=list('xy'))

A

    x   y
0   9   3
1  17  11


In [122]:
B = pd.DataFrame(rng.randint(0,10, (3,3)),
                columns=list('xyz'))

In [123]:
B

   x  y  z
0  1  9  3
1  7  6  8
2  7  4  1


In [124]:
A + B

      x     y   z
0  10.0  12.0 NaN
1  24.0  17.0 NaN
2   NaN   NaN NaN


In [125]:
A.stack()

0  x     9
   y     3
1  x    17
   y    11
dtype: int64

In [126]:
fill = A.stack().mean()

In [127]:
fill

10.0

In [128]:
A.add(B, fill_value=fill)

      x     y     z
0  10.0  12.0  13.0
1  24.0  17.0  18.0
2  17.0  14.0  11.0
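
In the result above every cell had a value in at least one of the two frames. A minimal sketch with two hypothetical frames, `left` and `right`, suggests what happens when a cell is missing on both sides:

```python
import numpy as np
import pandas as pd

# Hypothetical frames: row 1 of column 'x' is missing in both
left = pd.DataFrame({'x': [1.0, np.nan]})
right = pd.DataFrame({'x': [10.0, np.nan], 'y': [5.0, 6.0]})

total = left.add(right, fill_value=0)
print(total)
```

Column `y` exists only in `right`, so the fill value stands in for `left`; the cell missing from both frames stays NaN.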


In [129]:
print(A)

    x   y
0   9   3
1  17  11


In [130]:
print(B)

   x  y  z
0  1  9  3
1  7  6  8
2  7  4  1


## Operations Between Series and DataFrame

- Operations between a Series and a DataFrame are similar to operations between a one-dimensional and a two-dimensional NumPy array.

In [6]:
A = rng.randint(10, size=(3,4))

In [7]:
A

array([[6, 9, 2, 6],
       [7, 4, 3, 7],
       [7, 2, 5, 4]])

In [8]:
A[0]

array([6, 9, 2, 6])

In [10]:
print(A.shape)
print(A[0].shape)

(3, 4)
(4,)


In [9]:
# Subtracting a 1D array from a 2D array follows NumPy's broadcasting rules: the row is subtracted from every row

A-A[0]

array([[ 0,  0,  0,  0],
       [ 1, -5,  1,  1],
       [ 1, -7,  3, -2]])

In [12]:
df = pd.DataFrame(A, columns=list('QRST'))
df

   Q  R  S  T
0  6  9  2  6
1  7  4  3  7
2  7  2  5  4


In [14]:
df.iloc[0]

Q    6
R    9
S    2
T    6
Name: 0, dtype: int64

In [15]:
# In Pandas, an operation between a DataFrame and a Series is applied row-wise by default
df-df.iloc[0]

   Q  R  S  T
0  0  0  0  0
1  1 -5  1  1
2  1 -7  3 -2


In [16]:
# To operate column-wise, use the method form and specify the axis keyword

df.subtract(df['R'], axis=0)

   Q  R  S  T
0 -3  0 -7 -3
1  3  0 -1  3
2  5  0  3  2
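
The two directions can be compared by spelling out the axis in both cases. A minimal sketch that rebuilds the same frame from literal values so it runs on its own:

```python
import pandas as pd

df = pd.DataFrame([[6, 9, 2, 6], [7, 4, 3, 7], [7, 2, 5, 4]],
                  columns=list('QRST'))

# Default behaviour: the Series index is matched against the column labels
row_wise = df.sub(df.iloc[0], axis='columns')

# axis='index' (same as axis=0): the Series index is matched against the row labels
col_wise = df.sub(df['R'], axis='index')

print(row_wise.equals(df - df.iloc[0]))
print(col_wise)
```

The `axis` argument names the labels that the Series is aligned with, which can be easier to remember than thinking in terms of rows and columns.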


In [18]:
# These operations automatically align indices between the two operands

halfrow = df.iloc[0,::2]
halfrow

Q    6
S    2
Name: 0, dtype: int64

In [19]:
df-halfrow

     Q   R    S   T
0  0.0 NaN  0.0 NaN
1  1.0 NaN  1.0 NaN
2  1.0 NaN  3.0 NaN


In [20]:
df

   Q  R  S  T
0  6  9  2  6
1  7  4  3  7
2  7  2  5  4


**- This preservation and alignment of indices and columns means that operations on data in Pandas will always maintain the data context**