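Basic Information
Title: Designing Data-Intensive Applications (Reprint Edition)
Price: 99.00 yuan
Author: Martin Kleppmann
Publisher: Southeast University Press
Publication date: 2017-10-01
ISBN: 9787564173852
Word count:
Pages:
Edition: 1
Binding: Paperback (perfect bound)
Format: 16mo
Weight: 0.4 kg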
Editor
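Content Summary
The book covers the following:
Take an in-depth look at the systems you already use, and learn how to use and operate them more effectively
Make better-informed decisions by identifying the strengths and weaknesses of different tools
Understand the trade-offs among consistency, scalability, fault tolerance, and complexity
Understand the distributed systems research on which modern databases are built
Go behind the scenes of some major online services and learn from their architectures
Table of Contents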
Part I. Foundations of Data Systems
1. Reliable, Scalable, and Maintainable Applications
Thinking About Data Systems
Reliability
Hardware Faults
Software Errors
Human Errors
How Important Is Reliability?
Scalability
Describing Load
Describing Performance
Approaches for Coping with Load
Maintainability
Operability: Making Life Easy for Operations
Simplicity: Managing Complexity
Evolvability: Making Change Easy
Summary
2. Data Models and Query Languages
Relational Model Versus Document Model
The Birth of NoSQL
The Object-Relational Mismatch
Many-to-One and Many-to-Many Relationships
Are Document Databases Repeating History?
Relational Versus Document Databases Today
Query Languages for Data
Declarative Queries on the Web
MapReduce Querying
Graph-Like Data Models
Property Graphs
The Cypher Query Language
Graph Queries in SQL
Triple-Stores and SPARQL
The Foundation: Datalog
Summary
3. Storage and Retrieval
Data Structures That Power Your Database
Hash Indexes
SSTables and LSM-Trees
B-Trees
Comparing B-Trees and LSM-Trees
Other Indexing Structures
Transaction Processing or Analytics?
Data Warehousing
Stars and Snowflakes: Schemas for Analytics
Column-Oriented Storage
Column Compression
Sort Order in Column Storage
Writing to Column-Oriented Storage
Aggregation: Data Cubes and Materialized Views
Summary
4. Encoding and Evolution
Formats for Encoding Data
Language-Specific Formats
JSON, XML, and Binary Variants
Thrift and Protocol Buffers
Avro
The Merits of Schemas
Modes of Dataflow
Dataflow Through Databases
Dataflow Through Services: REST and RPC
Message-Passing Dataflow
Summary
Part II. Distributed Data
5. Replication
Leaders and Followers
Synchronous Versus Asynchronous Replication
Setting Up New Followers
Handling Node Outages
Implementation of Replication Logs
Problems with Replication Lag
Reading Your Own Writes
Monotonic Reads
Consistent Prefix Reads
Solutions for Replication Lag
Multi-Leader Replication
Use Cases for Multi-Leader Replication
Handling Write Conflicts
Multi-Leader Replication Topologies
Leaderless Replication
Writing to the Database When a Node Is Down
Limitations of Quorum Consistency
Sloppy Quorums and Hinted Handoff
Detecting Concurrent Writes
Summary
6. Partitioning
Partitioning and Replication
Partitioning of Key-Value Data
Partitioning by Key Range
Partitioning by Hash of Key
Skewed Workloads and Relieving Hot Spots
Partitioning and Secondary Indexes
Partitioning Secondary Indexes by Document
Partitioning Secondary Indexes by Term
Rebalancing Partitions
Strategies for Rebalancing
Operations: Automatic or Manual Rebalancing
Request Routing
Parallel Query Execution
Summary
7. Transactions
The Slippery Concept of a Transaction
The Meaning of ACID
Single-Object and Multi-Object Operations
Weak Isolation Levels
Read Committed
Snapshot Isolation and Repeatable Read
Preventing Lost Updates
Write Skew and Phantoms
Serializability
Actual Serial Execution
Two-Phase Locking (2PL)
Serializable Snapshot Isolation (SSI)
Summary
8. The Trouble with Distributed Systems
Faults and Partial Failures
Cloud Computing and Supercomputing
Unreliable Networks
Network Faults in Practice
Detecting Faults
Timeouts and Unbounded Delays
Synchronous Versus Asynchronous Networks
Unreliable Clocks
Monotonic Versus Time-of-Day Clocks
Clock Synchronization and Accuracy
Relying on Synchronized Clocks
Process Pauses
Knowledge, Truth, and Lies
The Truth Is Defined by the Majority
Byzantine Faults
System Model and Reality
Summary
9. Consistency and Consensus
Consistency Guarantees
Linearizability
What Makes a System Linearizable?
Relying on Linearizability
Implementing Linearizable Systems
The Cost of Linearizability
Ordering Guarantees
Ordering and Causality
Sequence Number Ordering
Total Order Broadcast
Distributed Transactions and Consensus
Atomic Commit and Two-Phase Commit (2PC)
Distributed Transactions in Practice
Fault-Tolerant Consensus
Membership and Coordination Services
Summary
Part III. Derived Data
10. Batch Processing
Batch Processing with Unix Tools
Simple Log Analysis
The Unix Philosophy
MapReduce and Distributed Filesystems
MapReduce Job Execution
Reduce-Side Joins and Grouping
Map-Side Joins
The Output of Batch Workflows
Comparing Hadoop to Distributed Databases
Beyond MapReduce
Materialization of Intermediate State
Graphs and Iterative Processing
High-Level APIs and Languages
Summary
11. Stream Processing
Transmitting Event Streams
Messaging Systems
Partitioned Logs
Databases and Streams
Keeping Systems in Sync
Change Data Capture
Event Sourcing
State, Streams, and Immutability
Processing Streams
Uses of Stream Processing
Reasoning About Time
Stream Joins
Fault Tolerance
Summary
12. The Future of Data Systems
Data Integration
Combining Specialized Tools by Deriving Data
Batch and Stream Processing
Unbundling Databases
Composing Data Storage Technologies
Designing Applications Around Dataflow
Observing Derived State
Aiming for Correctness
The End-to-End Argument for Databases
Enforcing Constraints
Timeliness and Integrity
Trust, but Verify
Doing the Right Thing
Predictive Analytics
Privacy and Tracking
Summary
Glossary
Index
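About the Author
Martin Kleppmann is a distributed systems researcher at the University of Cambridge, UK. Before that he was a software engineer and entrepreneur, working at LinkedIn and Rapportive on large-scale data infrastructure. Martin is a regular conference speaker, blogger, and open source contributor.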
Excerpt
Preface