Choosing the Right Database using CAP Theorem

avs sridhar
3 min readJan 24, 2022

In this article let us understand how to chose the right database for our project.We also talk about CAP theorem and how it is used to decide the right database based on our requirements.

Firstly let us briefly look at the different between SQL and NOSQL databases.

SQL Databases

These are RELATIONAL DATABASE MANAGEMENT SYSTEM (RDBMS).

These databases have fixed or static or predefined schema

These databases are not suited for hierarchical data storage.

These databases are best suited for complex queries

Vertically Scalable

Follow ACID property.

Examples:MySQL,MariaDB,Oracle.

NO SQL Databases

Non-relational or distributed database system.

They have dynamic schema

These databases are best suited for hierarchical data storage.

These databases are not so good for complex queries

Horizontally scalable

They compromise on ACID properties.

Example:MongoDB,Redis,Cassandra

Requirements while Choosing a database

When we are in high level design phase and need to decide on database below are few points which should be considered to identify SQL vs NoSQL database and if its is NoSQL database which one.

Functional Requirement ​- This will help you to identify “nature of schema” I.e. fixed or dynamic

Non-Functional Requirement​

  • Scaling ​- What are scaling requirement, Read vs Write ratio?
  • Consistency — Do we need ACID compliance​?
  • Availability — Does every request gets response?​
  • Partition Tolerance — Should out system continue to work despite message loss or partial failure?

Now let us also understand CAP theorem before jumping into how the decision is made when choosing the database.

CAP Theorem

CAP stands for Consistency,Availability,Partition Tolerance. These three are desirable properties of distributed systems.

Consistency

Consistency means that the nodes will have the same copies of a replicated data item visible for various transactions. A guarantee that every node in a distributed cluster returns the same, most recent, successful write. Consistency refers to every client having the same view of the data.

Availability

Availability means that each read or write request for a data item will either be processed successfully or will receive a message that the operation cannot be completed. Every non-failing node returns a response for all read and write requests in a reasonable amount of time. The key word here is every. To be available, every node on (either side of a network partition) must be able to respond in a reasonable amount of time.

Partition Tolerant

Partition tolerance means that the system can continue operating if the network connecting the nodes has a fault that results in two or more partitions, where the nodes in each partition can only communicate among each other. That means, the system continues to function and upholds its consistency guarantees in spite of network partitions. Network partitions are a fact of life. Distributed systems guaranteeing partition tolerance can gracefully recover from partitions once the partition heals

The CAP theorem states that it is not possible to guarantee all three of the desirable properties — consistency, availability, and partition tolerance at the same time in a distributed system with data replication.

Decision Making

  • Consistent and Partition tolerant: Stop serving request/Read from out of date partition I.e. User can only read data once write is committed in S2. Limitation: Not 100% available​
  • Available and Partition tolerant: User gets response instantly I.e. latest data from S1 where old or no data from S2. Now system is 100% Available and Partition tolerant but READ is not consistent. Limitation: Not consistent​
  • Consistent and Available: Write data for on same server which will be fetched/READ i.e. Consistent and Available. Limitation: Compromised partition tolerance/failure.

Examples

  • Financial System : Consistent & Available​
  • Chat Applications : Consistent & Partition tolerant​
  • Cache : Redis — Consistent & partition tolerant​
  • Social Site (Analytics Metadata as likes, feeds, views) — Available and Partition tolerant

--

--