Ready to unleash the power of your massive dataset? With the latest edition of this comprehensive resource, you'll learn how to use Apache Hadoop to build and maintain reliable, scalable, distributed systems. It's ideal for programmers looking to analyze datasets of any size, and for administrators who want to set up and run Hadoop clusters.

This third edition covers recent changes to Hadoop, including new material on the new MapReduce API, as well as version 2 of the MapReduce runtime (YARN) and its more flexible execution model. You'll also find illuminating case studies that demonstrate how Hadoop is used to solve specific problems.

* Store large datasets with the Hadoop Distributed File System (HDFS), then run distributed computations with MapReduce
* Use Hadoop's data and I/O building blocks for compression, data integrity, serialization (including Avro), and persistence
* Discover common pitfalls and advanced features for writing real-world MapReduce programs
* Design, build, and administer a dedicated Hadoop cluster, or run Hadoop in the cloud
* Use Pig, a high-level query language for large-scale data processing
* Analyze datasets with Hive, Hadoop's data warehousing system
* Load data from relational databases into HDFS, using Sqoop
* Take advantage of HBase, the database for structured and semi-structured data
* Use ZooKeeper, the toolkit for building distributed systems
MapReduce is a programming model for data processing. MapReduce works by breaking the processing into two phases: the map phase and the reduce phase. Each phase has key-value pairs as input and output, the types of which may be chosen by the programmer. The programmer also specifies two functions: the map function and the reduce function.
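The two-phase model described in this excerpt can be sketched in a few lines of plain Python. This is only an illustration of the idea, not Hadoop's API: the names `map_fn`, `reduce_fn`, and `run_mapreduce` are hypothetical, the word-count task is an assumed example, and everything runs in a single process with an in-memory grouping step standing in for the shuffle.

```python
from collections import defaultdict

def map_fn(offset, line):
    """Map phase: take one (key, value) input pair, emit (word, 1) pairs."""
    for word in line.lower().split():
        yield word, 1

def reduce_fn(word, counts):
    """Reduce phase: take a key and all of its values, emit (word, total)."""
    yield word, sum(counts)

def run_mapreduce(records, mapper, reducer):
    """Hypothetical single-process driver: map, group by key, then reduce."""
    groups = defaultdict(list)
    # Map every input pair and collect the emitted values under their key.
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)
    # Reduce each key's values, visiting keys in sorted order.
    results = []
    for key in sorted(groups):
        results.extend(reducer(key, groups[key]))
    return results

lines = ["the quick brown fox", "the lazy dog", "The fox"]
records = list(enumerate(lines))  # (line number, line text) input pairs
word_counts = run_mapreduce(records, map_fn, reduce_fn)
print(word_counts)
```

Both functions consume and produce key-value pairs, and the programmer chooses the types: here the map input is (int, str) and its output, like the reduce output, is (str, int).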
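Chinese edition, p. 412: So in theory, anything can be represented in binary form and then converted into a string of long integers, or a data structure can be serialized directly, to serve as the key.
Original, p. 460: ..., so theoretically anything can serve as row key, from strings to binary representations of long or even serialized ...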
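0 helpful George Sun 2013-03-20 22:15:07
Finally finished it... This is already the third edition in the series, and it has rave reviews, so there is no need for me to say more.
1 helpful 天色已晚 2014-05-13 14:46:51
Very detailed; I followed along for a first pass as an introduction. For a technology that evolves this quickly, it is better to just read the original edition. The Chinese edition lags too far behind, and by the time a new translation comes out it will be far too late.
2 helpful Monkey.D.Law 2015-08-05 10:15:52
Reading the Chinese and English editions together works well, though I would still suggest relying mainly on the Chinese edition, with the English as a supplement.
0 helpful jps 2014-08-29 10:36:45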
Big Data systems all focus on scalability, fault tolerance, scheduling, and shuffle.
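0 helpful bernie 2013-01-04 21:17:11
Third edition.
0 helpful FE 2022-04-03 21:55:59
Very good; reading it gives a brief overview of the whole Hadoop architecture. After that, it comes down to reading the code. Part of my one-book-a-year series.
0 helpful Gnillor 2022-01-23 00:25:42
It's the year 2202, and this book is still the best way to understand the Hadoop ecosystem. I especially like the second part, which covers Spark, HBase, and Hive; it is incisive.
0 helpful 暴风之翼 2021-02-01 12:10:34
A very good introduction to Hadoop; it tied together a lot of scattered knowledge I already had. I only read parts of Parts 1, 2, and 5.
0 helpful 深海之蓝 2019-10-22 10:26:45
It can serve as an overview.
0 helpful BBHMM 2019-05-15 18:42:31
I recently finished studying big data, and going through this book as a review was great.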