Publisher: O'Reilly Media
Subtitle: A deep-dive into how distributed data systems work
Publication date: 2019-10-22
Pages: 376
Price: USD 59.99
Binding: Paperback
ISBN: 9781492040347
Description
Have you ever wanted to learn more about databases but did not know where to start? This is a book just for you.
We can treat databases and other infrastructure components as black boxes, but it doesn’t have to be that way. Sometimes we have to take a closer look at what’s going on because of performance issues. Sometimes databases misbehave, and we need to find out what exactly is going on. Some of us want to work in infrastructure and develop databases. This book’s main intention is to introduce you to the cornerstone concepts and help you understand how databases work.
The book consists of two parts, Storage Engines and Distributed Systems, since those two areas account for most of the differences between the vast majority of databases.
About the Author
Alex is an infrastructure engineer and Apache Cassandra committer, working on building data infrastructure and processing pipelines. He is interested in CS theory, algorithms, and distributed systems, and in understanding how things work and sharing that understanding with others.
Table of Contents
Preface
How to Contact Us
I. Storage Engines
1. Introduction and Overview
DBMS Architecture
Memory- Versus Disk-Based DBMS
Durability in Memory-Based Stores
Column- Versus Row-Oriented DBMS
Row-Oriented Data Layout
Column-Oriented Data Layout
Distinctions and Optimizations
Wide Column Stores
Data Files and Index Files
Data Files
Index Files
Primary Index as an Indirection
Buffering, Immutability, and Ordering
Summary
2. B-Tree Basics
Binary Search Trees
Tree Balancing
Trees for Disk-Based Storage
Disk-Based Structures
Hard Disk Drives
Solid State Drives
On-Disk Structures
Ubiquitous B-Trees
B-Tree Hierarchy
Separator Keys
B-Tree Lookup Complexity
B-Tree Lookup Algorithm
Counting Keys
B-Tree Node Splits
B-Tree Node Merges
Summary
3. File Formats
Motivation
Binary Encoding
Primitive Types
Strings and Variable-Size Data
Bit-Packed Data: Booleans, Enums, and Flags
General Principles
Page Structure
Slotted Pages
Cell Layout
Combining Cells into Slotted Pages
Managing Variable-Size Data
Versioning
Checksumming
Summary
4. Implementing B-Trees
Page Header
Magic Numbers
Sibling Links
Rightmost Pointers
Node High Keys
Overflow Pages
Binary Search
Binary Search with Indirection Pointers
Propagating Splits and Merges
Breadcrumbs
Rebalancing
Right-Only Appends
Bulk Loading
Compression
Vacuum and Maintenance
Fragmentation Caused by Updates and Deletes
Page Defragmentation
Summary
5. Transaction Processing and Recovery
Buffer Management
Caching Semantics
Cache Eviction
Locking Pages in Cache
Page Replacement
Recovery
Log Semantics
Operation Versus Data Log
Steal and Force Policies
ARIES
Concurrency Control
Serializability
Transaction Isolation
Read and Write Anomalies
Isolation Levels
Optimistic Concurrency Control
Multiversion Concurrency Control
Pessimistic Concurrency Control
Lock-Based Concurrency Control
Summary
6. B-Tree Variants
Copy-on-Write
Implementing Copy-on-Write: LMDB
Abstracting Node Updates
Lazy B-Trees
WiredTiger
Lazy-Adaptive Tree
FD-Trees
Fractional Cascading
Logarithmic Runs
Bw-Trees
Update Chains
Taming Concurrency with Compare-and-Swap
Structural Modification Operations
Consolidation and Garbage Collection
Cache-Oblivious B-Trees
van Emde Boas Layout
Summary
7. Log-Structured Storage
LSM Trees
LSM Tree Structure
Updates and Deletes
LSM Tree Lookups
Merge-Iteration
Reconciliation
Maintenance in LSM Trees
Read, Write, and Space Amplification
RUM Conjecture
Implementation Details
Sorted String Tables
Bloom Filters
Skiplist
Disk Access
Compression
Unordered LSM Storage
Bitcask
WiscKey
Concurrency in LSM Trees
Log Stacking
Flash Translation Layer
Filesystem Logging
LLAMA and Mindful Stacking
Open-Channel SSDs
Summary
Part I Conclusion
II. Distributed Systems
8. Introduction and Overview
Concurrent Execution
Shared State in a Distributed System
Fallacies of Distributed Computing
Processing
Clocks and Time
State Consistency
Local and Remote Execution
Need to Handle Failures
Network Partitions and Partial Failures
Cascading Failures
Distributed Systems Abstractions
Links
Two Generals’ Problem
FLP Impossibility
System Synchrony
Failure Models
Crash Faults
Omission Faults
Arbitrary Faults
Handling Failures
Summary
9. Failure Detection
Heartbeats and Pings
Timeout-Free Failure Detector
Outsourced Heartbeats
Phi-Accrual Failure Detector
Gossip and Failure Detection
Reversing Failure Detection Problem Statement
Summary
10. Leader Election
Bully Algorithm
Next-In-Line Failover
Candidate/Ordinary Optimization
Invitation Algorithm
Ring Algorithm
Summary
11. Replication and Consistency
Achieving Availability
Infamous CAP
Use CAP Carefully
Harvest and Yield
Shared Memory
Ordering
Consistency Models
Strict Consistency
Linearizability
Sequential Consistency
Causal Consistency
Session Models
Eventual Consistency
Tunable Consistency
Witness Replicas
Strong Eventual Consistency and CRDTs
Summary
12. Anti-Entropy and Dissemination
Read Repair
Digest Reads
Hinted Handoff
Merkle Trees
Bitmap Version Vectors
Gossip Dissemination
Gossip Mechanics
Overlay Networks
Hybrid Gossip
Partial Views
Summary
13. Distributed Transactions
Making Operations Appear Atomic
Two-Phase Commit
Cohort Failures in 2PC
Coordinator Failures in 2PC
Three-Phase Commit
Coordinator Failures in 3PC
Distributed Transactions with Calvin
Distributed Transactions with Spanner
Database Partitioning
Consistent Hashing
Distributed Transactions with Percolator
Coordination Avoidance
Summary
14. Consensus
Broadcast
Atomic Broadcast
Virtual Synchrony
Zookeeper Atomic Broadcast (ZAB)
Paxos
Paxos Algorithm
Quorums in Paxos
Failure Scenarios
Multi-Paxos
Fast Paxos
Egalitarian Paxos
Flexible Paxos
Generalized Solution to Consensus
Raft
Leader Role in Raft
Failure Scenarios
Byzantine Consensus
PBFT Algorithm
Recovery and Checkpointing
Summary
Part II Conclusion
A. Bibliography
Index
Reviews of Database Internals (2 total)

I am the translator of this book. Everyone is welcome to post errata suggestions in this thread.
Other editions of this book (2 total)
China Machine Press (2020), rated 8.0, read by 144 people
0 helpful · 宋四月 · 2020-08-14 16:12:40
Excellent.
0 helpful · 豆友62491293 · 2020-09-03 11:11:24
A high-level outline of how databases work under the hood.
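0 helpful · ò.⒏㈢ ㄧ° · 2022-03-24 22:06:33
Halfway through, I felt this was not a good book; after finishing it, I felt it was even worse. Those who already understand the material don't need it, and those who don't will gain little from it. The explanations of concepts depend too heavily on context, yet the book never abstracts or fills in that context. The diagrams don't help either, and relying almost entirely on text that readers must reconstruct in their heads makes it poor at getting ideas across. Explaining deep material simply is genuinely hard: with too much detail, readers get lost among irrelevant specifics, in which case they might as well read the papers and the code directly; with too light a touch, readers fail to build a systematic, causal mental model and get lost in the missing details. One thing that struck me while reading: in the saying "premature optimization is the root of all evil", the word "premature" should be dropped. Performance optimization is more a matter of piercing through abstractions and finding local optima as the problem's value domain shifts, until some opportunity brings about a paradigm shift; otherwise it really is a bottomless pit. Without care, it easily turns into a kind of mysticism that can intimidate people.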
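7 helpful · 弗格孙 · 2019-10-27 01:04:00
Its style and technique of exposition fall far short of DDIA. As a result, you can only follow the parts you already understood beforehand (so there is basically no need to read it), and the parts you didn't understand before, you still won't understand afterward. One extra star because it mentions TiDB in passing.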
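0 helpful · 出租车司机 · 2020-03-17 14:43:41
Not as well written as 《大规模分布式存储系统》 (Large-Scale Distributed Storage Systems); too abstract and not practical enough.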
0 helpful · huyan00 · 2023-10-14 22:02:33 · Beijing
Very thin on substance.
0 helpful · 3点一直线 · 2023-09-13 13:43:39 · United States
Pretty good and fairly detailed. But I find that if you only read the book, it is easy to forget things.
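0 helpful · 邻家の躺平人 · 2023-03-02 23:23:05 · Zhejiang
This book is a joke, of marginal value at best; it is less practical than https://book.douban.com/subject/25723658/. As for using it as an introduction, the content jumps all over the place and is not introductory at all. At first the pattern felt familiar, but I couldn't say why. Only when I reached the distributed systems part did it suddenly dawn on me: this is exactly the style of the notes/memos I jot down for myself after reading something....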
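0 helpful · 五小聊 · 2022-07-08 21:37:29
This book is essentially a summary. It covers introductory material on databases and distributed systems and helps build a systematic framework of knowledge. But to understand how a specific database is implemented, you still need to read its source code or dedicated books and papers. The translation is passable and easy enough to understand. It can be read alongside DDIA.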
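0 helpful · 映天蓝 · 2022-06-20 19:55:45
A more fitting title would be something like "database system, a short introduction". If you don't want to read the classic black-cover textbooks, you can read this one instead: it is a bit lighter, the examples are more intuitive, and it is aimed at engineers without a solid CS foundation. In terms of coverage, it touches on all the key topics, making it a good starting point for serious study.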