In [1]:
import pandas as pd

# 8.1 Parsing Unix timestamps

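**Dealing with Unix timestamps in pandas isn't very intuitive; it took me quite a while to figure this out. The file I'm using here is a popularity-contest file.**

**[Here](http://popcon.ubuntu.com/README) is an explanation of how this file works.**

**I hope there's nothing sensitive in it.**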

In [2]:
# Read the file, dropping the last row
popcon = pd.read_csv('data/popularity-contest', sep=' ')[:-1]
popcon.columns = ['atime', 'ctime', 'package-name', 'mru-program', 'tag']

In [3]:
popcon.head()

Unnamed: 0,atime,ctime,package-name,mru-program,tag
0,1387295797,1367633260,perl-base,/usr/bin/perl,
1,1387295796,1354370480,login,/bin/su,
2,1387295743,1354341275,libtalloc2,/usr/lib/x86_64-linux-gnu/libtalloc.so.2.0.7,
3,1387295743,1387224204,libwbclient0,/usr/lib/x86_64-linux-gnu/libwbclient.so.0,<RECENT-CTIME>
4,1387295742,1354341253,libselinux1,/lib/x86_64-linux-gnu/libselinux.so.1,


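**The magical part about parsing timestamps in pandas is that numpy datetimes are already stored as Unix timestamps, that is, as integer offsets from 1970-01-01. So all we need to do is tell pandas that these integers are actually datetimes; the values themselves need essentially no conversion.**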
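
**As a minimal numpy sketch of that idea (not part of the original notebook; the variable names are only illustrative), the first `atime` value from the table can be reinterpreted as a `datetime64[s]` and then turned back into the same integer:**

```python
import numpy as np

# First atime value from the table, in seconds since 1970-01-01 UTC.
raw = np.array([1387295797], dtype='int64')

# Reinterpret the integers as second-resolution datetimes.
as_datetime = raw.astype('datetime64[s]')
print(as_datetime[0])  # 2013-12-17T15:56:37

# Casting back to int64 recovers the original timestamp.
round_trip = as_datetime.astype('int64')
print(round_trip[0])  # 1387295797
```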

In [5]:
popcon['atime'] = popcon['atime'].astype('int')
popcon['ctime'] = popcon['ctime'].astype('int')

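**Every numpy array and pandas Series has a dtype, usually int64, float64, or object.**


**Some of the useful time types are datetime64[s], datetime64[ms], and datetime64[us]. There are also matching timedelta types.**

**We can use the pd.to_datetime function to convert our integer timestamps into datetimes. This is a cheap operation: the underlying data stays a column of integers, and we are mostly telling pandas how to treat it (depending on the pandas version, the values may be rescaled to its internal resolution).**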
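
**A short sketch of these dtypes, using the `atime` and `ctime` of the first row in the table (the variable names are made up for this example). Subtracting two `datetime64[s]` values gives a `timedelta64[s]`:**

```python
import numpy as np
import pandas as pd

# atime and ctime of the first row, as plain integers.
ints = pd.Series([1387295797, 1367633260], dtype='int64')
print(ints.dtype)  # int64

# The same numbers viewed as second-resolution datetimes.
stamps = ints.to_numpy().astype('datetime64[s]')
print(stamps.dtype)  # datetime64[s]

# The difference of two datetimes is a timedelta with the same unit.
gap = stamps[0] - stamps[1]
print(gap.dtype)  # timedelta64[s]
gap_seconds = int(gap.astype('int64'))
print(gap_seconds)  # 19662537
```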

In [6]:
popcon['atime'] = pd.to_datetime(popcon['atime'], unit='s')
popcon['ctime'] = pd.to_datetime(popcon['ctime'], unit='s')
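
**The `unit` argument is what tells pandas how to read the integers. The sketch below is an illustration with hypothetical variable names: the same instants are expressed in seconds and in milliseconds, and passing the wrong unit silently lands you in January 1970.**

```python
import pandas as pd

# Two timestamps from the table, in seconds since the epoch.
seconds = pd.Series([1387295797, 1367633260], dtype='int64')
millis = seconds * 1000  # the same instants, expressed in milliseconds

from_seconds = pd.to_datetime(seconds, unit='s')
from_millis = pd.to_datetime(millis, unit='ms')

# Both routes describe the same points in time.
same = bool((from_seconds == from_millis).all())
print(same)  # True

# Reading second-valued data as milliseconds gives dates in 1970.
wrong = pd.to_datetime(seconds, unit='ms')
print(wrong.dt.year.tolist())  # [1970, 1970]
```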

In [7]:
popcon.head()

Unnamed: 0,atime,ctime,package-name,mru-program,tag
0,2013-12-17 15:56:37,2013-05-04 02:07:40,perl-base,/usr/bin/perl,
1,2013-12-17 15:56:36,2012-12-01 14:01:20,login,/bin/su,
2,2013-12-17 15:55:43,2012-12-01 05:54:35,libtalloc2,/usr/lib/x86_64-linux-gnu/libtalloc.so.2.0.7,
3,2013-12-17 15:55:43,2013-12-16 20:03:24,libwbclient0,/usr/lib/x86_64-linux-gnu/libwbclient.so.0,<RECENT-CTIME>
4,2013-12-17 15:55:42,2012-12-01 05:54:13,libselinux1,/lib/x86_64-linux-gnu/libselinux.so.1,


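**Now suppose we want to look at all packages that aren't libraries.**

**First, we want to get rid of every row whose timestamp is 0. Notice how we can just use a string in the comparison, even though the column holds datetimes. pandas really is amazing.**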

In [8]:
popcon = popcon[popcon['atime'] > '1970-01-01']
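
**The string in that comparison is parsed as a date, so it should behave like comparing against an explicit `pd.Timestamp`. A small sketch on a toy frame (the `never-used` package name is hypothetical):**

```python
import pandas as pd

# Toy frame: one row with timestamp 0 and one real atime from the table.
frame = pd.DataFrame({
    'package-name': ['never-used', 'perl-base'],  # 'never-used' is made up
    'atime': pd.to_datetime([0, 1387295797], unit='s'),
})

# Comparing against a date string ...
by_string = frame['atime'] > '1970-01-01'
# ... gives the same mask as comparing against an explicit Timestamp.
by_timestamp = frame['atime'] > pd.Timestamp('1970-01-01')

kept = frame[by_string]
print(kept['package-name'].tolist())  # ['perl-base']
```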

**Now we can use pandas' magical string abilities to look at just the rows where the package name doesn't contain 'lib'.**

In [9]:
nonlibraries = popcon[~popcon['package-name'].str.contains('lib')]
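
**One caveat worth sketching: `str.contains('lib')` matches 'lib' anywhere in the name, so it also drops packages that merely contain those letters. If only names that begin with 'lib' should count as libraries, `str.startswith` is a possible alternative. The package list below is illustrative, not taken from the data file.**

```python
import pandas as pd

# Illustrative package names; 'calibre' contains 'lib' but is not a library.
names = pd.Series(['libtalloc2', 'perl-base', 'calibre', 'login'])

# Substring match: excludes anything with 'lib' anywhere in the name.
not_containing = names[~names.str.contains('lib', regex=False)]
print(not_containing.tolist())  # ['perl-base', 'login']

# Prefix match: excludes only names that begin with 'lib'.
not_prefixed = names[~names.str.startswith('lib')]
print(not_prefixed.tolist())  # ['perl-base', 'calibre', 'login']
```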

In [10]:
nonlibraries.sort_values('ctime', ascending=False)[:10]

Unnamed: 0,atime,ctime,package-name,mru-program,tag
57,2013-12-17 04:55:39,2013-12-17 04:55:42,ddd,/usr/bin/ddd,<RECENT-CTIME>
450,2013-12-16 20:03:20,2013-12-16 20:05:13,nodejs,/usr/bin/npm,<RECENT-CTIME>
454,2013-12-16 20:03:20,2013-12-16 20:05:04,switchboard-plug-keyboard,/usr/lib/plugs/pantheon/keyboard/options.txt,<RECENT-CTIME>
445,2013-12-16 20:03:20,2013-12-16 20:05:04,thunderbird-locale-en,/usr/lib/thunderbird-addons/extensions/langpac...,<RECENT-CTIME>
396,2013-12-16 20:08:27,2013-12-16 20:05:03,software-center,/usr/sbin/update-software-center,<RECENT-CTIME>
449,2013-12-16 20:03:20,2013-12-16 20:05:00,samba-common-bin,/usr/bin/net.samba3,<RECENT-CTIME>
397,2013-12-16 20:08:25,2013-12-16 20:04:59,postgresql-client-9.1,/usr/lib/postgresql/9.1/bin/psql,<RECENT-CTIME>
398,2013-12-16 20:08:23,2013-12-16 20:04:58,postgresql-9.1,/usr/lib/postgresql/9.1/bin/postmaster,<RECENT-CTIME>
452,2013-12-16 20:03:20,2013-12-16 20:04:55,php5-dev,/usr/include/php5/main/snprintf.h,<RECENT-CTIME>
440,2013-12-16 20:03:20,2013-12-16 20:04:54,php-pear,/usr/share/php/XML/Util.php,<RECENT-CTIME>


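**Okay, cool! This says that I recently installed ddd and postgresql! And I do remember installing those things.**

**The whole message here is that if you have a timestamp in seconds (s), milliseconds (ms), or microseconds (us), you can cast it to datetime64[the-right-thing], and pandas will take care of the rest.**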
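
**As a closing sketch of that "cast to the right unit" idea in plain numpy (variable names are hypothetical), the same instant can be written in seconds, milliseconds, or microseconds and cast with the matching `datetime64` unit:**

```python
import numpy as np

# One timestamp from the table, in seconds since the epoch.
seconds = np.array([1387295797], dtype='int64')

# Cast each representation with the unit that matches its scale.
as_s = seconds.astype('datetime64[s]')
as_ms = (seconds * 1_000).astype('datetime64[ms]')
as_us = (seconds * 1_000_000).astype('datetime64[us]')

# All three describe the same instant.
all_equal = bool((as_s == as_ms).all() and (as_ms == as_us).all())
print(all_equal)  # True
```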