Summary of the book "Designing Data-Intensive Applications" (Part 1)


Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems



Part I. Foundations of Data Systems

Chapter 1. Reliable, Scalable, and Maintainable Applications

These are the book's definitions of its three key evaluation criteria: Reliability, Scalability, and Maintainability.


Reliability
The application performs the function that the user expected. It can tolerate the user making mistakes or using the software in unexpected ways. Its performance is good enough for the required use case, under the expected load and data volume. The system prevents any unauthorized access and abuse.


Scalability is the term we use to describe a system’s ability to cope with increased load. Note, however, that it is not a one-dimensional label that we can attach to a system: it is meaningless to say “X is scalable” or “Y doesn’t scale.” Rather, discussing scalability means considering questions like “If the system grows in a particular way, what are our options for coping with the growth?” and “How can we add computing resources to handle the additional load?”




Maintainability is broken down into three design principles.

Operability
Make it easy for operations teams to keep the system running smoothly.

Simplicity
Make it easy for new engineers to understand the system, by removing as much complexity as possible from the system. (Note this is not the same as simplicity of the user interface.)

Evolvability
Make it easy for engineers to make changes to the system in the future, adapting it for unanticipated use cases as requirements change. Also known as extensibility, modifiability, or plasticity. As previously with reliability and scalability, there are no easy solutions for achieving

Chapter 2. Data Models and Query Languages


Relational Model Versus Document Model


However, if your application does use many-to-many relationships, the document model becomes less appealing.

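The chapter also covers other topics, such as the difference between schema-on-read and schema-on-write. Without knowing this material, you cannot really take part in discussions like "can we change the data schema after the service goes live, or not?"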

Query Languages for Data


In a declarative query language, like SQL or relational algebra, you just specify the pattern of the data you want — what conditions the results must meet, and how you want the data to be transformed (e.g., sorted, grouped, and aggregated) — but not how to achieve that goal. It is up to the database system’s query optimizer to decide which indexes and which join methods to use, and in which order to execute various parts of the query.

Graph-Like Data Models


As is typical for a declarative query language, you don’t need to specify such execution details when writing the query: the query optimizer automatically chooses the strategy that is predicted to be the most efficient, so you can get on with writing the rest of your application.

Chapter 3. Storage and Retrieval


Data Structures That Power Your Database

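The following is apparently the world's simplest database. It is easy to follow, and simple enough that even I could build it. Maybe I should publish it on GitHub under the name "the world's simplest KVS." Beyond that, the material on SSTables and LSM-trees is also interesting, and B-trees are covered too, of course.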

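#!/bin/bash

# Append "key,value" to the end of the database file.
db_set () {
    echo "$1,$2" >> database
}

# Print the most recently written value for the given key.
db_get () {
    grep "^$1," database | sed -e "s/^$1,//" | tail -n 1
}
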
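Because db_set only ever appends, the file keeps every old value for a key, and db_get has to scan all of them. Log-structured engines deal with this by compaction: rewriting the log so that only the latest value per key remains. The following is a rough sketch of that idea for the same "key,value" file format. The db_compact function and the DB_FILE variable are hypothetical names of my own and do not appear in the book, and the sketch assumes keys contain no commas.

```shell
#!/bin/bash
# Hypothetical helper (not from the book): compact the append-only log used
# by db_set/db_get so that each key keeps only its most recent line.
# Assumption: keys contain no commas; values may contain them.
DB_FILE="${DB_FILE:-database}"

db_compact () {
    local tmp
    tmp="$(mktemp)"
    # Remember the first-seen order of keys and the last line seen per key,
    # then write one line per key in that order.
    awk -F, '
        { key = $1 }
        !(key in latest) { order[++n] = key }
        { latest[key] = $0 }
        END { for (i = 1; i <= n; i++) print latest[order[i]] }
    ' "$DB_FILE" > "$tmp" && mv "$tmp" "$DB_FILE"
}
```

After a few db_set calls on the same key, running db_compact should leave a single line for that key, and db_get returns the same value as before. A real engine would compact segment files in the background rather than rewrite the live file in place.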
Transaction Processing or Analytics?




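When it comes to big data, column-oriented storage is the usual choice. Analytic queries mostly aggregate along columns, so the data format is laid out to match. It sounds obvious once stated, but it is still interesting. At recent conferences I have also heard talks saying that adopting Parquet cut storage usage considerably.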
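To make the column-oriented idea concrete, here is a small sketch that splits a row-oriented CSV file into one file per column, so that an aggregate only has to read the column it needs. The function names, the ".col" file suffix, and the sample column names are all my own assumptions for illustration. Real formats such as Parquet add encoding, compression, and metadata on top of this basic layout.

```shell
#!/bin/bash
# Hypothetical sketch: column-oriented layout built from a row-oriented CSV.
# Assumptions: no header row, no quoted fields, no commas inside values.

# rows_to_columns ROW_FILE OUT_DIR NAME1 NAME2 ...
# Writes the i-th CSV field of every row to OUT_DIR/NAMEi.col.
rows_to_columns () {
    local row_file="$1" out_dir="$2" i=1 name
    shift 2
    mkdir -p "$out_dir"
    for name in "$@"; do
        cut -d, -f"$i" "$row_file" > "$out_dir/$name.col"
        i=$((i + 1))
    done
}

# sum_column OUT_DIR NAME
# Adds up the values of one column; the other column files are never opened.
sum_column () {
    awk '{ total += $1 } END { print total + 0 }' "$1/$2.col"
}
```

For example, `rows_to_columns sales.csv cols day product quantity` followed by `sum_column cols quantity` reads only cols/quantity.col. Keeping values of the same column together is also what tends to make them compress well.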