4.1 Introduction to Pandas

pandas

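It is a third-party Python library that excels at collecting, cleaning, and analyzing data.

It was first used mainly for time series analysis, especially in finance, and has since spread to virtually any setting that involves data processing.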

1. The Birth of Pandas

Wes McKinney, the principal creator of pandas, described his own background as follows:


"I’m an American computer programmer and the Director of Ursa Labs. I studied theoretical mathematics at MIT (graduating in late 2006) before becoming very interested in programming and tools for data analysis, especially for industry use cases, in 2007.

From August 2007 to July 2010, I worked on the front office quant research team at AQR Capital Management, a large quantitative investment manager in Greenwich, CT. During this time, I led a very successful effort to migrate research and production model building and research processes to the Python programming language. I started building pandas on April 6, 2008, as part of a skunkworks effort to reproduce some econometric research in Python. As part of my work, we formed a new Research Development team for the global macro group to drive software innovation in the front office.

I joined the PhD program in the Statistical Science Department at Duke University before taking leave in Summer 2011 to explore ways to develop open source software (such as pandas) in a sustainable way. I discovered that entrepreneurship often makes more sense than consulting to fund open source with more leverage."

--- Wes McKinney, the creator of pandas.

Exercise: The Birth of Pandas

After reading the passage, can you answer the following questions?

  1. When was pandas created?

  2. What was pandas originally built to do?




2. Differences Between Pandas and NumPy

Pandas and NumPy are two widely used Python libraries for data analysis and scientific computing.

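Pandas focuses on processing and analyzing structured data, such as tabular data. It introduces the DataFrame data structure, which resembles a database table or an Excel spreadsheet. A DataFrame can hold columns of different data types and supports flexible selection and manipulation of rows and columns. Pandas offers several ways to select and filter data, including selection by label, by position, and by condition, so that users can work with exactly the data they need. Pandas also ships with a rich set of statistical and analytical functions that make data processing and analysis more convenient and efficient.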
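The selection styles described above can be sketched as follows. This is a minimal illustration that assumes pandas is installed; the column names, index labels, and values are made up for the example.

```python
import pandas as pd

# Hypothetical sample data: each column holds a different data type.
df = pd.DataFrame(
    {
        "name": ["Alice", "Bob", "Carol"],
        "age": [30, 25, 35],
        "score": [88.5, 92.0, 79.5],
    },
    index=["a", "b", "c"],
)

by_label = df.loc["b", "age"]     # selection by label -> 25
by_position = df.iloc[0, 2]       # selection by position (row 0, column 2) -> 88.5
over_28 = df[df["age"] > 28]      # selection by condition: rows "a" and "c"
mean_score = df["score"].mean()   # one of the built-in statistical functions

print(by_label, by_position, mean_score)
print(over_28)
```

Here `loc` works with the labels you assigned, `iloc` works with integer positions, and a boolean expression inside the brackets keeps only the rows where the condition holds.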

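NumPy, by contrast, is used mainly for numerical and scientific computing. It provides the array data structure, a multidimensional container for numerical values. Because a NumPy array normally stores its elements in one contiguous block of memory, numerical computations and operations on it can be carried out efficiently.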
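A short sketch of a NumPy array, assuming NumPy is installed; the values are arbitrary and chosen only for illustration.

```python
import numpy as np

# A 2-D array: every element shares a single numeric type.
arr = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])

print(arr.shape)                  # (2, 3)
print(arr.dtype)                  # float64
print(arr.flags["C_CONTIGUOUS"])  # True for a freshly created array

doubled = arr * 2                 # element-wise arithmetic, no explicit loop
col_sums = arr.sum(axis=0)        # sum down each column -> [5. 7. 9.]
print(doubled)
print(col_sums)
```

Note that slices and transposed views of an array are not always contiguous, so the `C_CONTIGUOUS` flag is worth checking rather than assuming.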

In short, Pandas is designed for processing and analyzing structured data, while NumPy concentrates on high-performance numerical computing.

3. Your pandas Version

How do you load pandas?

import pandas as pd

To find out which version of pandas you have, enter the following command:

pd.__version__
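The two steps above can also be combined into one guarded check that does not raise an error when pandas is missing. This is only a sketch: `describe_pandas` is a hypothetical helper name introduced here, not part of pandas.

```python
import importlib.util


def describe_pandas():
    """Return the installed pandas version, or None if pandas is not installed."""
    # find_spec looks the package up without importing it.
    if importlib.util.find_spec("pandas") is None:
        return None
    import pandas as pd
    return pd.__version__


version = describe_pandas()
if version is None:
    print("pandas is not installed")
else:
    print("pandas version:", version)
```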

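If the import fails with a message that pandas cannot be found, it is probably not installed. In that case, open a new Terminal in JupyterLab.

In the terminal, enter the command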

pip install pandas --user

If the download is slow, press Ctrl+C to interrupt the command above, then specify a PyPI mirror (https://mirrors.aliyun.com/pypi/simple) with the following command:

pip install pandas --user -i https://mirrors.aliyun.com/pypi/simple

Exercise: Installing Pandas