Apache Hadoop is ideal for organizations with a growing need to store and process massive application datasets. Hadoop: The Definitive Guide is a comprehensive resource for using Hadoop to build reliable, scalable, distributed systems. Programmers will find details for analyzing large datasets with Hadoop, and administrators will learn how to set up and run Hadoop clusters. The book includes case studies that illustrate how Hadoop solves specific problems.
Organizations large and small are adopting Apache Hadoop to deal with huge application datasets. Hadoop: The Definitive Guide provides you with the key for unlocking the wealth this data holds. Hadoop is ideal for storing and processing massive amounts of data, but until now, information on this open-source project has been lacking -- especially with regard to best practices. This comprehensive resource demonstrates how to use Hadoop to build reliable, scalable, distributed systems. Programmers will find details for analyzing large datasets with Hadoop, and administrators will learn how to set up and run Hadoop clusters.
With case studies that illustrate how Hadoop solves specific problems, this book helps you:
* Learn the Hadoop Distributed File System (HDFS), including ways to use its many APIs to transfer data
* Write distributed computations with MapReduce, Hadoop's most vital component
* Become familiar with Hadoop's data and IO building blocks for compression, data integrity, serialization, and persistence
* Learn the common pitfalls and advanced features for writing real-world MapReduce programs
* Design, build, and administer a dedicated Hadoop cluster
* Use HBase, Hadoop's database for structured and semi-structured data
And more. Hadoop: The Definitive Guide is still in progress, but you can get started on this technology with the Rough Cuts edition, which lets you read the book online or download it in PDF format as the manuscript evolves.
* The architecture of HDFS is described in “The Hadoop Distributed File System” by Konstantin Shvachko, Hairong Kuang, Sanjay Radia, and Robert Chansler (Proceedings of MSST2010, May 2010, http://storageconference.org/2010/Papers/MSST/Shvachko.pdf).
† “Scaling Hadoop to 4000 nodes at Yahoo!,” http://developer.yahoo.net/blogs/hadoop/2008/09/scaling_hadoop_to_4000_nodes_a.html.
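Chinese edition, p. 412: "So in theory, anything can be represented in binary form and then converted into a string of long integers, or the data structure can be serialized directly, to serve as the key." Original, p. 460: ..., so theoretically anything can serve as row key, from strings to binary representations of long or even serialized ...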
0 helpful Julian 2015-02-03 14:19:33
A must-read introduction to Hadoop.
0 helpful optman 2012-10-15 18:13:42
Too much detail. English third edition.
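0 helpful 散关清渭 2013-03-05 13:52:50
This book works as an introductory handbook for Hadoop. Its coverage is fairly detailed, touching on Hadoop, HDFS, HBase, Hive, Pig, Cascading, ZooKeeper, and more. The drawback is that the treatment stays fairly shallow and does not reach the core internals; the remaining capabilities are left for readers to explore on their own.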
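0 helpful jason 2011-01-27 15:59:18
I skimmed the points that matter to me at this stage. For anyone who wants to understand the design and use of the Hadoop system, it is a good introductory text.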
0 helpful Asura 2014-10-21 09:18:43
For getting to grips with how Hadoop works.
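0 helpful 阿凡达弟弟 2022-02-23 13:13:14
MapReduce is covered in considerable detail; for the other components and frameworks you may need to find dedicated books to go deeper. It serves as an introduction to big data frameworks.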
0 helpful Валия 2020-06-28 14:26:53
Third edition.
0 helpful herihe 2020-06-11 08:36:18
Could the ratings for the English and Chinese editions be separated? The end of an era.
0 helpful memex 2020-03-29 08:50:10
I read it around 2013. What I learned from it proved useful in my later work.
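0 helpful ren 2019-11-29 18:13:03
Skimmed through it. Hadoop is the open-source implementation of Google's "troika" of papers, and the book explains it in more detail than the papers do.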