5 months ago

The general form for selecting items from a DataFrame looks like this:

df.loc[row_index, col_index] ## or

df.iloc[row_index, col_index]

When both row_index and col_index are specified, the methods loc and iloc differ in the following ways:

  • loc selects by label, so integer positions (a single integer, a slice or a list of integers) do not work for the col_index when the column labels are strings.
  • iloc selects by integer position, so labels (i.e. strings) do not work for the col_index.
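To make the two restrictions above concrete, here is a minimal sketch. The DataFrame and its column names (col0, col1, col2) are made up for illustration and are not from the original examples.

```python
import pandas as pd

# Hypothetical sample data; the column names are assumptions for illustration.
df = pd.DataFrame(
    {"col0": [10, 20, 30], "col1": [1.5, 2.5, 3.5], "col2": ["a", "b", "c"]}
)

# loc selects the column by label, iloc by integer position.
by_label = df.loc[0, "col1"]
by_position = df.iloc[0, 1]

# With string column labels, an integer is not a valid col_index for loc.
try:
    df.loc[0, 1]
    loc_failed = False
except KeyError:
    loc_failed = True

# iloc rejects a label as the col_index.
try:
    df.iloc[0, "col1"]
    iloc_failed = False
except (ValueError, TypeError, IndexError):
    iloc_failed = True

print(by_label, by_position, loc_failed, iloc_failed)
```

Both lookups reach the same cell, and both mixed-style calls should fail on this data.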

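You can use slicing or a list of integers for the row_index in both loc and iloc. With loc the integers are index labels, which match the row positions only when the DataFrame has the default 0, 1, 2, ... index. The two methods also treat the end of a slice differently: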

df.loc[0:2,] ## returns rows 0, 1, 2 and all columns

df.iloc[0:2,] ## returns rows 0, 1 and all columns
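The difference can be checked on a small DataFrame. This is a sketch with made-up data, assuming the default integer index:

```python
import pandas as pd

# Hypothetical sample data with the default 0..3 index.
df = pd.DataFrame({"col0": [10, 20, 30, 40], "col1": [1, 2, 3, 4]})

loc_rows = df.loc[0:2]    # label slice: both endpoints are included
iloc_rows = df.iloc[0:2]  # position slice: the stop value is excluded

print(list(loc_rows.index))   # [0, 1, 2]
print(list(iloc_rows.index))  # [0, 1]
```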

Returned data type

Depending on what you pass to these methods, you get different data types back:

df.iloc[0, 1] ## returns the value stored in the cell at row 0 and col 1


df.iloc[[0], 1] ## returns a Series of row 0 and col 1

df.iloc[0:1, 1] ## returns a Series of row 0 and col 1


df.iloc[0, [1]] ## returns a Series of row 0 and col 1

df.iloc[0, 1:2] ## returns a Series of row 0 and col 1


df.iloc[[0], [1]] ## returns a DataFrame of row 0 and col 1

df.iloc[0:1,1:2]  ## returns a DataFrame of row 0 and col 1

The same rule applies to the loc method. So to sum it up:

  • If both row_index and col_index are lists or slices, you get a DataFrame back
  • If exactly one of row_index and col_index is a list or slice, you get a Series back
  • If neither row_index nor col_index is a list or slice, you get whatever is stored in that cell back
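The three rules can be verified by inspecting the type of each result. A minimal sketch, with an illustrative DataFrame whose column names are assumptions:

```python
import pandas as pd

# Hypothetical sample data; the column names are assumptions for illustration.
df = pd.DataFrame(
    {"col0": [10, 20, 30], "col1": [1.5, 2.5, 3.5], "col2": ["a", "b", "c"]}
)

results = {
    "scalar, scalar": df.iloc[0, 1],
    "list, scalar": df.iloc[[0], 1],
    "scalar, slice": df.iloc[0, 1:2],
    "list, list": df.iloc[[0], [1]],
    "slice, slice": df.iloc[0:1, 1:2],
}

# Print the kind of object each combination returns.
for combination, value in results.items():
    print(combination, "->", type(value).__name__)
```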

If only the row_index is specified, the same rule applies: an integer returns a Series, while a list or a slice returns a DataFrame:

df.iloc[0] ## returns a Series of row 0

df.iloc[[0]] ## returns a DataFrame of row 0

df.iloc[0:1] ## returns a DataFrame of row 0
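A short sketch of the row-only case, again on made-up data. It also shows what the shapes look like: the Series is indexed by the column names, while the DataFrame keeps a single row.

```python
import pandas as pd

# Hypothetical sample data; the column names are assumptions for illustration.
df = pd.DataFrame(
    {"col0": [10, 20, 30], "col1": [1.5, 2.5, 3.5], "col2": ["a", "b", "c"]}
)

row_series = df.iloc[0]     # integer -> Series indexed by column name
row_frame = df.iloc[[0]]    # list    -> DataFrame with one row
row_slice = df.iloc[0:1]    # slice   -> DataFrame with one row

print(type(row_series).__name__, list(row_series.index))
print(type(row_frame).__name__, row_frame.shape)
print(type(row_slice).__name__, row_slice.shape)
```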

When to use which method

  • If all of your indexes are integers, slices or lists of integers, use iloc (that is precisely what the i stands for)
  • If you want to use strings for the column names, use loc

The only caveat is that a row_index slice behaves differently in the two methods: in loc, 0:2 yields 0, 1, 2, but in iloc it yields 0, 1 (which is how slicing works everywhere else in Python).
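The reason for the different endpoints is that loc treats the slice bounds as index labels while iloc treats them as positions. A minimal sketch, assuming a DataFrame whose index labels (10, 20, 30, chosen for illustration) do not match the row positions:

```python
import pandas as pd

# Hypothetical data with a non-default integer index.
df = pd.DataFrame({"col0": [1, 2, 3]}, index=[10, 20, 30])

by_label = df.loc[10:20]     # labels 10 through 20, both included
by_position = df.iloc[0:2]   # positions 0 and 1

# No labels fall between 0 and 2, so the label slice selects nothing here.
no_match = df.loc[0:2]

print(list(by_label.index), list(by_position.index), len(no_match))
```

On a default 0, 1, 2, ... index, labels and positions coincide, which is why only the endpoint handling differs in the earlier examples.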

Examples

Selecting rows 0, 1, 2 and all columns

df.loc[0:2,] ## note it is 2 here, not 3

df.iloc[0:3,]

Selecting rows 0 and 2 and all columns

df.loc[[0,2],]

df.iloc[[0,2],]

Selecting rows 0 and 2, and columns 1 and 2

df.loc[[0,2], ['col1', 'col2']] ## has to use the column names for columns 1 and 2

df.iloc[[0,2], 1:3]

Selecting row 0 as a Series

df.iloc[0]

df.loc[0]

Selecting row 0 as a DataFrame

df.iloc[[0]]

df.loc[[0]]

Selecting the cell located at row 0 and column 2

df.iloc[0,2]

df.loc[0,'col2']
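To try these pairs out, the following sketch builds a small DataFrame and checks that each loc call returns the same result as its iloc counterpart. The data and the column names col0, col1 and col2 are assumptions made for illustration, and the default integer index is assumed.

```python
import pandas as pd

# Hypothetical sample data; columns 1 and 2 are named col1 and col2.
df = pd.DataFrame(
    {
        "col0": [10, 20, 30, 40],
        "col1": [1.5, 2.5, 3.5, 4.5],
        "col2": ["a", "b", "c", "d"],
    }
)

# Each entry pairs a loc selection with the equivalent iloc selection.
pairs = {
    "rows 0, 1, 2": (df.loc[0:2], df.iloc[0:3]),
    "rows 0 and 2": (df.loc[[0, 2]], df.iloc[[0, 2]]),
    "rows 0 and 2, columns 1 and 2": (
        df.loc[[0, 2], ["col1", "col2"]],
        df.iloc[[0, 2], 1:3],
    ),
    "row 0 as Series": (df.loc[0], df.iloc[0]),
    "row 0 as DataFrame": (df.loc[[0]], df.iloc[[0]]),
}

for description, (from_loc, from_iloc) in pairs.items():
    print(description, "->", from_loc.equals(from_iloc))

# A single cell comes back as a plain value, so compare it directly.
cell_matches = df.loc[0, "col2"] == df.iloc[0, 2]
print("cell at row 0, column 2 ->", cell_matches)
```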