Chapter 2: Essential DataFrame Operations
When selecting multiple columns, pass a list, not a tuple.
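```python
# Wrong: a bare comma-separated key is a single tuple key, so pandas
# looks for one column named by that tuple and raises KeyError
movie['actor_1_name', 'actor_2_name', 'actor_3_name', 'director_name']
# The line above is equivalent to:
movie[('actor_1_name', 'actor_2_name', 'actor_3_name', 'director_name')]

# Right: pass a list of column names
cols = ['actor_1_name', 'actor_2_name', 'actor_3_name', 'director_name']
movie[cols]
```

You can also select multiple columns by dtype or by column name.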
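Both points can be tried on a made-up toy DataFrame before running the book's calls on `movie`. This is only a sketch: the data is invented and the column names merely mimic the movie dataset. It reproduces the tuple `KeyError` and the dtype/name-based selections.

```python
import pandas as pd

# Made-up toy frame; column names only imitate the book's movie dataset
movie = pd.DataFrame({
    'movie_title': ['A', 'B'],
    'actor_1_name': ['x', 'y'],
    'actor_1_facebook_likes': [10.0, 20.0],
    'movie_facebook_likes': [1, 2],
    'budget': [100.0, 200.0],
})

# A comma-separated key is one tuple key, so pandas looks for a single
# column literally named ('movie_title', 'budget') and raises KeyError
try:
    movie['movie_title', 'budget']
    tuple_failed = False
except KeyError:
    tuple_failed = True

# A list of names selects several columns
by_list = movie[['movie_title', 'budget']]

# Number of columns per dtype
dtype_counts = movie.dtypes.value_counts()

numeric = movie.select_dtypes(include=['number'])           # numeric columns only
facebook = movie.filter(like='facebook')                    # name contains substring
with_digit = movie.filter(regex=r'\d')                      # name matches regex
subset = movie.filter(items=['actor_1_name', 'asdf'])       # unknown 'asdf' is ignored
```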
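```python
movie.dtypes.value_counts()  # first, count the columns of each dtype (get_dtype_counts() was removed in pandas 1.0)
movie.select_dtypes(include=['number']).head()  # numeric columns only
movie.filter(like='facebook').head()  # column name contains the substring
movie.filter(regex=r'\d').head()  # column name matches the regex
movie.filter(items=['actor_1_name', 'asdf']).head()  # intersected with the columns; unknown names such as 'asdf' raise no error
```

Best practices for ordering columns:
* discrete (categorical) columns on the left, continuous ones on the right
* more important columns further left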
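A minimal sketch of these two guidelines as a reusable helper, assuming you have already split the columns into discrete and continuous groups, each group list ordered from most to least important. `reorder_columns` is a hypothetical name, not a pandas API, and the toy data is made up.

```python
import pandas as pd

def reorder_columns(df, discrete_groups, continuous_groups):
    """Hypothetical helper: all discrete groups first, then continuous ones.

    Each argument is a list of column-name lists, ordered from most to
    least important. Raises ValueError unless every column of df is
    listed exactly once.
    """
    order = [col for group in discrete_groups + continuous_groups for col in group]
    if len(order) != len(set(order)):
        raise ValueError('a column is listed more than once')
    missing = set(df.columns) - set(order)
    unknown = set(order) - set(df.columns)
    if missing or unknown:
        raise ValueError(f'missing={sorted(missing)}, unknown={sorted(unknown)}')
    return df[order]

# Made-up toy frame with columns in an arbitrary order
toy = pd.DataFrame({
    'gross': [5.0], 'movie_title': ['A'], 'imdb_score': [7.1],
    'director_name': ['D'], 'budget': [3.0],
})

ordered = reorder_columns(
    toy,
    discrete_groups=[['movie_title'], ['director_name']],
    continuous_groups=[['budget', 'gross'], ['imdb_score']],
)
```

Besides missing columns, the duplicate check also catches a column listed twice, which a plain `set(...) == set(...)` comparison cannot detect.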
```python
# Order the columns following the best practices above
disc_core = ['movie_title', 'title_year', 'content_rating', 'genres']
disc_people = ['director_name', 'actor_1_name', 'actor_2_name', 'actor_3_name']
disc_other = ['color', 'country', 'language', 'plot_keywords', 'movie_imdb_link']
cont_fb = ['director_facebook_likes', 'actor_1_facebook_likes', 'actor_2_facebook_likes',
           'actor_3_facebook_likes', 'cast_total_facebook_likes', 'movie_facebook_likes']
cont_finance = ['budget', 'gross']
cont_num_reviews = ['num_voted_users', 'num_user_for_reviews', 'num_critic_for_reviews']
cont_other = ['imdb_score', 'duration', 'aspect_ratio', 'facenumber_in_poster']

# Concatenate into one big list to serve as the new column order
new_col_order = (disc_core + disc_people + disc_other +
                 cont_fb + cont_finance + cont_num_reviews + cont_other)

# Check that no column was left out
set(movie.columns) == set(new_col_order)

# Reorder the columns by selecting them
movie2 = movie[new_col_order]
```

pandas statistical methods skip missing values by default (`skipna=True`), silently dropping NaN, which is somewhat risky. You can force the opposite behavior with `movie.min(skipna=False)`.

When writing unit tests, the `pandas.testing` module comes in handy:
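```python
from pandas.testing import assert_frame_equal

# Returns None if the frames match, raises AssertionError otherwise
assert_frame_equal(college_ugds_, college_ugds_)

# Note that the eq method is different: it compares element-wise
# and returns a boolean DataFrame of the same shape
college_ugds_.eq(.0019).head()
```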
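A small unit-test-style sketch on made-up toy data, contrasting `DataFrame.equals` (a single boolean), `.eq` (an element-wise boolean DataFrame) and `assert_frame_equal` (raises on mismatch). It also uses `assert_series_equal` to pin down the `skipna` default mentioned earlier.

```python
import numpy as np
import pandas as pd
from pandas.testing import assert_frame_equal, assert_series_equal

# Made-up toy data with one missing value in column 'a'
df = pd.DataFrame({'a': [1.0, np.nan, 3.0], 'b': [4.0, 5.0, 6.0]})

# .equals returns one bool; NaNs in the same position count as equal
same = df.equals(df.copy())

# .eq compares cell by cell and returns a same-shaped boolean DataFrame;
# NaN never equals NaN, so that cell is False
mask = df.eq(df.copy())

# assert_frame_equal returns None on success, raises AssertionError otherwise
assert_frame_equal(df, df.copy())
try:
    assert_frame_equal(df, df.fillna(0))
    caught = False
except AssertionError:
    caught = True

# Default skipna=True silently ignores the NaN in 'a';
# skipna=False lets it propagate into the result
assert_series_equal(df.min(), pd.Series({'a': 1.0, 'b': 4.0}))
assert_series_equal(df.min(skipna=False), pd.Series({'a': np.nan, 'b': 4.0}))
```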
Notes by koyo922.