Authors: Donald Miner / Adam Shook
Publisher: O'Reilly Media
Subtitle: Building Effective Algorithms and Analytics for Hadoop and Other Systems
Publication date: 2012-12-22
Pages: 230
Price: USD 44.99
Binding: Paperback
ISBN: 9781449327170
Design patterns for the MapReduce framework, until now, have been scattered among various research papers, blogs, and books. This handy guide brings together a unique collection of valuable MapReduce patterns that will save you time and effort regardless of the domain, language, or development framework you're using. Each pattern is explained in context, with pitfalls and caveats clearly identified - so you can avoid some of the common design mistakes when modeling your Big Data architecture. This book also provides a complete overview of MapReduce that explains its origins and implementations, and why design patterns are so important. Hadoop MapReduce code is provided to help you learn how to apply the design patterns by example.
Topics include:
• Basic patterns, including map-only filter, group by, aggregation, distinct, and limit
• Joins: traditional reduce-side join, reduce-side join with Bloom filter, replicated join with distributed cache, merge join, Cartesian products, and intersections
• Binning, sharding for other systems, sorting, sampling, unions, and other patterns for organizing data
• Job optimization patterns, including multi-job map-only job folding, and overloading the key grouping to perform two jobs at once
After iteration, the comment lengths are sorted to find the median value. If the list has an odd number of entries, the median value is set to the middle value. If the number is even, the middle two values are averaged.
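The median rule quoted above can be sketched in a few lines of plain Python. This is only an illustration of the odd/even logic, not the book's Hadoop code; the function name `median_length` and the sample values are assumptions made for the example.

```python
def median_length(comment_lengths):
    """Return the median of a list of comment lengths.

    Hypothetical helper: sorts the values, then takes the middle entry
    (odd count) or the average of the two middle entries (even count).
    """
    ordered = sorted(comment_lengths)
    count = len(ordered)
    if count == 0:
        raise ValueError("median of an empty list is undefined")
    middle = count // 2
    if count % 2 == 1:
        # Odd number of entries: the middle value is the median.
        return float(ordered[middle])
    # Even number of entries: average the two middle values.
    return (ordered[middle - 1] + ordered[middle]) / 2.0


odd_case = median_length([5, 1, 3])
even_case = median_length([4, 1, 3, 2])
```

Sorting the full list requires holding every value for a key in memory at once, which is the cost this straightforward approach pays.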
2013-06-21 10:38:52
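Description:
Join large data sets using only the map phase.
Intent:
Avoid the reduce phase to improve performance.
Prerequisites:
Data sets A and B must both be sorted and partitioned by the foreign key used for the join, and any given foreign key must fall in the same partition of A and of B. Hadoop supports this with its composite input format (CompositeInputFormat).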
A composite join should be used when:
• An inner or full outer join is desired.
• All the data sets are sufficiently large.
• All data sets can be read with the foreign key as the input key to the mapper.
• All data sets have the same number of partitions.
• Each partition is sorted by foreign key, and all the foreign keys reside in the associated partition of each data set. That is, partition X of data sets A and B contain the same foreign keys and these foreign keys are present only in partition X. For a visualization of this partitioning and sorting key, refer to Figure 5-3.
• The data sets do not change often (if they have to be prepared).
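To make the conditions above concrete, here is a minimal sketch in plain Python of why they allow a join without a reduce phase: when partition i of A and partition i of B are sorted by foreign key and hold the same key range, each pair of partitions can be joined with a single forward merge. This is not Hadoop's API. The function names `merge_join_partition` and `composite_join`, the `(key, value)` tuple layout, and the sample data are all assumptions for illustration, and only the inner join is shown.

```python
from itertools import groupby
from operator import itemgetter


def merge_join_partition(part_a, part_b):
    """Inner-join two lists of (key, value) pairs, each already sorted by key."""
    groups_a = groupby(part_a, key=itemgetter(0))
    groups_b = groupby(part_b, key=itemgetter(0))
    group_a = next(groups_a, None)
    group_b = next(groups_b, None)
    joined = []
    while group_a is not None and group_b is not None:
        key_a, rows_a = group_a
        key_b, rows_b = group_b
        if key_a < key_b:
            # Key only in A so far: advance A.
            group_a = next(groups_a, None)
        elif key_a > key_b:
            # Key only in B so far: advance B.
            group_b = next(groups_b, None)
        else:
            # Matching key: emit every pairing of A's and B's values.
            values_b = [value for _, value in rows_b]
            for _, value_a in rows_a:
                for value_b in values_b:
                    joined.append((key_a, value_a, value_b))
            group_a = next(groups_a, None)
            group_b = next(groups_b, None)
    return joined


def composite_join(parts_a, parts_b):
    """Join partition i of A with partition i of B; no shuffle is needed."""
    if len(parts_a) != len(parts_b):
        raise ValueError("both data sets need the same number of partitions")
    result = []
    for part_a, part_b in zip(parts_a, parts_b):
        result.extend(merge_join_partition(part_a, part_b))
    return result


# Two partitions per data set; keys 1-2 live in partition 0, keys 5-6 in partition 1.
parts_a = [[(1, "a1"), (2, "a2")], [(5, "a5")]]
parts_b = [[(2, "b2")], [(5, "b5"), (6, "b6")]]
joined_rows = composite_join(parts_a, parts_b)
```

Each pair of partitions is handled independently, which is what would let every mapper work on its own slice. If the partition counts differ or a key appears in mismatched partitions, the sketch either fails fast or silently drops matches, which is why the preparation requirements matter.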
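0 helpful tomsheep 2014-09-10 20:24:24
That so many of these "patterns" had to be written up only shows how clumsy Hadoop is.
0 helpful Hanyu💤 2016-04-10 15:47:48
Thinking it over slowly; it still needs time to sink in…
0 helpful Stanley 2013-09-04 22:55:16
I spent about 3-4 hours skimming it. It was a refresher on Input/OutputFormat, RecordReader/Writer, and InputSplit, but I gained almost nothing new. It is better suited as a quick read for programmers who have just learned to write MapReduce.
0 helpful WeiLu 2017-04-30 11:18:04
I read this book around 2013 and felt I got a great deal out of it at the time; it covers most of the common ways to process data with MR. These days, though, it looks like Hive is enough.
0 helpful 长脸方 2016-02-14 10:13:44
It simply shows you how to implement SQL JOINs, aggregate functions, and the like in MR.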
0 helpful wscanf 2016-02-23 11:42:38
It got me started, but it drags a little.
0 helpful 磁爆步兵杨永信 2015-02-17 17:49:08
The author has some nerve. A handful of MR exercises, and that is enough for a whole book.