How do you work with a DataFrame's index in pandas? Read this article and you'll understand it all

Tech 08-01 Source: 张大鹏520

Setting the row index

Assigning an iterable to df.index sets the row index of a DataFrame. The iterable must contain exactly as many elements as the DataFrame has rows.
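The length requirement can be checked directly. The following is a minimal sketch, assuming a small three-row DataFrame made up for illustration (`df`, `labels_too_short`, and `mismatch_rejected` are hypothetical names, not part of the original example). pandas raises a ValueError when the number of labels differs from the number of rows.

```python
import pandas as pd

# Hypothetical three-row DataFrame used only for this illustration
df = pd.DataFrame({"score": [57, 81, 60]})

# Two labels for three rows: the lengths do not match
labels_too_short = ["Student0", "Student1"]

try:
    df.index = labels_too_short
    mismatch_rejected = False
except ValueError as exc:
    # pandas refuses an index whose length differs from the row count
    mismatch_rejected = True
    print("Rejected:", exc)

# A list with one label per row is accepted
df.index = ["Student" + str(i) for i in range(df.shape[0])]
print(df)
```

Building the labels from df.shape[0], as in the last step, keeps the label count in step with the row count.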

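Example code:

import numpy as np
import pandas as pd

# Create from a random array; the row and column indexes default to sequential integers
score = np.random.randint(40, 100, (10, 5))
df2 = pd.DataFrame(score)

# Row and column labels (subjects is defined but not applied here, so the columns stay 0-4)
subjects = ["Chinese", "Math", "English", "Politics", "PE"]
# df2.shape[0] gives the number of rows
stu = ["Student" + str(i) for i in range(df2.shape[0])]

# Set the row index
df2.index = stu
print(df2)
print("================================")

Output (the scores are random, so the values differ on every run):

           0   1   2   3   4
Student0  57  60  73  40  64
Student1  81  77  57  65  52
Student2  60  93  99  41  79
Student3  60  89  94  99  76
Student4  81  96  40  74  96
Student5  64  63  81  61  59
Student6  98  94  94  92  74
Student7  65  89  86  60  77
Student8  70  85  91  50  70
Student9  58  45  63  55  50
================================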
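The example above defines `subjects` but never applies it, so the columns keep their default integer labels. As a sketch of the analogous step, assigning a list to `df.columns` sets the column labels, with the same one-label-per-column requirement. The English label strings here are illustrative stand-ins.

```python
import numpy as np
import pandas as pd

# Same shape as the example above: 10 students, 5 subjects
score = np.random.randint(40, 100, (10, 5))
df2 = pd.DataFrame(score)

subjects = ["Chinese", "Math", "English", "Politics", "PE"]
stu = ["Student" + str(i) for i in range(df2.shape[0])]

# One label per column, just as df.index needs one label per row
df2.columns = subjects
df2.index = stu
print(df2)
```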

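Resetting the index

df.reset_index() resets the index and returns a new DataFrame. After the reset, the original index becomes an ordinary column, and the new index is a sequence of integers starting from 0.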

Example code:

import numpy as np
import pandas as pd

# Create from a random array; the row and column indexes default to sequential integers
score = np.random.randint(40, 100, (10, 5))
df2 = pd.DataFrame(score)

# Row and column labels (subjects is defined but not applied here, so the columns stay 0-4)
subjects = ["Chinese", "Math", "English", "Politics", "PE"]
# df2.shape[0] gives the number of rows
stu = ["Student" + str(i) for i in range(df2.shape[0])]

# Set the row index
df2.index = stu
print(df2)
print("================================")

# Reset the index
df2 = df2.reset_index()
print(df2)
print("================================")

Output (the scores are random, so the values differ on every run):

           0   1   2   3   4
Student0  95  63  46  94  77
Student1  72  93  68  47  66
Student2  49  41  90  41  49
Student3  52  83  45  66  87
Student4  91  60  79  81  59
Student5  41  96  71  62  74
Student6  43  91  49  73  81
Student7  72  85  71  77  75
Student8  86  49  48  41  52
Student9  76  90  79  58  89
================================
      index   0   1   2   3   4
0  Student0  95  63  46  94  77
1  Student1  72  93  68  47  66
2  Student2  49  41  90  41  49
3  Student3  52  83  45  66  87
4  Student4  91  60  79  81  59
5  Student5  41  96  71  62  74
6  Student6  43  91  49  73  81
7  Student7  72  85  71  77  75
8  Student8  86  49  48  41  52
9  Student9  76  90  79  58  89
================================
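In the output above the restored column is called "index" because the row index had no name. A small sketch, using made-up data and a hypothetical index name "student", suggests how a named index carries its name into the new column, and that reset_index returns a new DataFrame rather than changing the original.

```python
import pandas as pd

# Hypothetical data; the index is given the name "student"
df = pd.DataFrame(
    {"score": [95, 72]},
    index=pd.Index(["Student0", "Student1"], name="student"),
)

restored = df.reset_index()

# The old index becomes a column named after the index ("student", not "index")
print(restored)

# reset_index returned a new object; df itself still has its labelled index
print(df.index.tolist())
```

This is why the examples in this article assign the result back with df2 = df2.reset_index().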

Resetting the index and dropping the original index

To reset the index without keeping the original one as a column, call df.reset_index(drop=True).

Example code:

import numpy as np
import pandas as pd

# Create from a random array; the row and column indexes default to sequential integers
score = np.random.randint(40, 100, (10, 5))
df2 = pd.DataFrame(score)

# Row and column labels (subjects is defined but not applied here, so the columns stay 0-4)
subjects = ["Chinese", "Math", "English", "Politics", "PE"]
# df2.shape[0] gives the number of rows
stu = ["Student" + str(i) for i in range(df2.shape[0])]

# Set the row index
df2.index = stu
print(df2)
print("================================")

# Reset the index and drop the original index
df2 = df2.reset_index(drop=True)
print(df2)
print("================================")

Output (the scores are random, so the values differ on every run):

           0   1   2   3   4
Student0  50  50  54  82  48
Student1  89  72  89  44  50
Student2  48  83  41  96  63
Student3  85  74  96  74  94
Student4  84  84  92  45  60
Student5  48  74  58  80  41
Student6  72  68  64  62  49
Student7  50  66  90  45  83
Student8  45  55  65  44  72
Student9  49  85  50  84  49
================================
    0   1   2   3   4
0  50  50  54  82  48
1  89  72  89  44  50
2  48  83  41  96  63
3  85  74  96  74  94
4  84  84  92  45  60
5  48  74  58  80  41
6  72  68  64  62  49
7  50  66  90  45  83
8  45  55  65  44  72
9  49  85  50  84  49
================================
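One situation where drop=True may be handy is after filtering rows, since the surviving rows keep their old labels and leave gaps. The sketch below uses a made-up score column and a hypothetical pass mark of 60; neither comes from the original example.

```python
import pandas as pd

# Hypothetical scores, indexed 0..3 by default
df = pd.DataFrame({"score": [50, 89, 48, 85]})

# Filtering keeps the original row labels, so only labels 1 and 3 remain
passed = df[df["score"] >= 60]
print(passed.index.tolist())

# drop=True discards those labels and renumbers the rows from 0
passed = passed.reset_index(drop=True)
print(passed)
```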

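Setting a multi-level index

A pandas DataFrame supports multi-level indexes, which are similar to composite indexes in MySQL. To set one, pass a list of column names to set_index: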

df = df.set_index(["year", "month"])

Example code:

import pandas as pd

df = pd.DataFrame({'month': [1, 4, 7, 10],
                   'year': [2012, 2014, 2013, 2014],
                   'sale': [55, 40, 84, 31]})

# Set a multi-level index
df = df.set_index(["year", "month"])
print(df)

Output:

            sale
year month
2012 1        55
2014 4        40
2013 7        84
2014 10       31
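Once the multi-level index is set, rows can be selected by one or both levels with .loc. This is a minimal sketch built on the same data; sorting the index first is a choice made for this illustration, and the names `sales_2014` and `sale_2014_04` are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'month': [1, 4, 7, 10],
                   'year': [2012, 2014, 2013, 2014],
                   'sale': [55, 40, 84, 31]})
df = df.set_index(["year", "month"])

# Sort by (year, month) before doing label lookups (a choice made for this sketch)
df = df.sort_index()

# A label from the outer level selects every row for that year;
# the result is indexed by the remaining level, month
sales_2014 = df.loc[2014]
print(sales_2014)

# A (year, month) tuple addresses a single row
sale_2014_04 = df.loc[(2014, 4), "sale"]
print(sale_2014_04)
```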

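Common attributes of a multi-level index

To view the names of the index levels:

# View the names of the multi-level index
print(df.index.names)
print("================================================")

To view the levels of the index, that is, the distinct values stored in each level:

# View the levels of the multi-level index
print(df.index.levels)
print("================================================")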

Example code:

import pandas as pd

df = pd.DataFrame({'month': [1, 4, 7, 10],
                   'year': [2012, 2014, 2013, 2014],
                   'sale': [55, 40, 84, 31]})

# Set a multi-level index
df = df.set_index(["year", "month"])
print(df)
print("================================================")

# View the multi-level index
print(df.index)
print("================================================")

# View the names of the multi-level index
print(df.index.names)
print("================================================")

# View the levels of the multi-level index
print(df.index.levels)
print("================================================")

Output:

            sale
year month
2012 1        55
2014 4        40
2013 7        84
2014 10       31
================================================
MultiIndex([(2012,  1),
            (2014,  4),
            (2013,  7),
            (2014, 10)],
           names=['year', 'month'])
================================================
['year', 'month']
================================================
[[2012, 2013, 2014], [1, 4, 7, 10]]
================================================
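Note that levels lists each distinct value once, in sorted order, which is why 2014 appears a single time in the output above even though two rows belong to it. The sketch below contrasts that with get_level_values, which returns one label per row, and shows reset_index moving both levels back into ordinary columns. The variable names are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'month': [1, 4, 7, 10],
                   'year': [2012, 2014, 2013, 2014],
                   'sale': [55, 40, 84, 31]})
df = df.set_index(["year", "month"])

# levels holds the distinct values of each level, sorted
year_levels = list(df.index.levels[0])
print(year_levels)

# get_level_values gives the label of every row, in row order
year_per_row = list(df.index.get_level_values("year"))
print(year_per_row)

# reset_index turns both index levels back into ordinary columns
flat = df.reset_index()
print(flat.columns.tolist())
```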