作者:
Martin Kleppmann 出版社: O'Reilly Media 副标题: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems 出版年: 2017-4-2 页数: 614 定价: USD 44.99 装帧: Paperback ISBN: 9781449373320
Data is at the center of many challenges in system design today. Difficult issues need to be figured out, such as scalability, consistency, reliability, efficiency, and maintainability. In addition, we have an overwhelming variety of tools, including relational databases, NoSQL datastores, stream or batch processors, and message brokers. What are the right choices for your appl...
Data is at the center of many challenges in system design today. Difficult issues need to be figured out, such as scalability, consistency, reliability, efficiency, and maintainability. In addition, we have an overwhelming variety of tools, including relational databases, NoSQL datastores, stream or batch processors, and message brokers. What are the right choices for your application? How do you make sense of all these buzzwords?
In this practical and comprehensive guide, author Martin Kleppmann helps you navigate this diverse landscape by examining the pros and cons of various technologies for processing and storing data. Software keeps changing, but the fundamental principles remain the same. With this book, software engineers and architects will learn how to apply those ideas in practice, and how to make full use of data in modern applications.
Peer under the hood of the systems you already use, and learn how to use and operate them more effectively
Make informed decisions by identifying the strengths and weaknesses of different tools
Navigate the trade-offs around consistency, scalability, fault tolerance, and complexity
Understand the distributed systems research upon which modern databases are built
Peek behind the scenes of major online services, and learn from their architectures
作者简介
· · · · · ·
Martin is a researcher in distributed systems at the University of Cambridge. Previously he was a software engineer and entrepreneur at Internet companies including LinkedIn and Rapportive, where he worked on large-scale data infrastructure. In the process he learned a few things the hard way, and he hopes this book will save you from repeating the same mistakes.
Martin is a researcher in distributed systems at the University of Cambridge. Previously he was a software engineer and entrepreneur at Internet companies including LinkedIn and Rapportive, where he worked on large-scale data infrastructure. In the process he learned a few things the hard way, and he hopes this book will save you from repeating the same mistakes.
Martin is a regular conference speaker, blogger, and open source contributor. He believes that profound technical ideas should be accessible to everyone, and that deeper understanding will help us develop better software.
UDP is a good choice in situations where delayed data is worthless. For example, in a VoIP phone call, there probably isn’t enough time to retransmit a lost packet before its data is due to be played over the loudspeakers. In this case, there’s no point in retransmitting the packet—the application must instead fill the missing packet’s time slot with silence (causing a brief interruption in the second) and move on in the stream. The retry happens at the human layer instead. (“Could you repeat that please? The sound just cut out for a moment.”) (查看原文)
For data warehouse queries that need to scan over millions of rows, a big bottleneck is the bandwidth for getting data from disk into memory. However, that is not the only bottleneck. Developers of analytical databases also worry about efficiently using the bandwidth from main memory into the CPU cache, avoiding branch mispredictions and bubbles in the CPU instruction processing pipeline, and making use of single-instruction-multi-data (SIMD) instructions in modern CPUs.
Besides reducing the volume of data that needs to be loaded from disk, columnoriented storage layouts are also good for making efficient use of CPU cycles. For example, the query engine can take a chunk of compressed column data that fits comfortably in the CPU’s L1 cache and iterate through it in a tight loop (that is, w... (查看原文)
0 有用 hoterran 2018-11-13 13:10:03
蛮好的,大数据、分布式系统的基础书,都琢磨透了架构师妥妥的 线性一致性这章需要深入研究一下。 准备再读一遍
4 有用 𝒜𝓇𝒾𝑒𝓈𝒟𝑒𝓋𝒾𝓁 2018-03-14 16:48:24
广度有了,深度不够,不过给出了好多引用,够看两年...
2 有用 旺三 2018-03-28 20:26:39
对大数据系统有了一个整体的认识,以后遇到问题之前能知道解决方向。
1 有用 硅胶鱼 2018-05-27 15:51:34
值得再读一遍。分布式数据系统 真•big picture
10 有用 白杨树 2016-05-04 14:12:28
我靠,这本书实在太牛了。 赶紧读!赶紧读!赶紧读!