System design for Sharding a Database

Database sharding is the process of splitting data across multiple servers so that an application can scale.





The above diagram gives a clear idea of sharding a database. The first database is a single database holding all the customers from 1 to 8; in the diagram below it, the same data is split into shards of four customers each.

Sharding has many advantages: it makes the database easier to maintain and easier to scale.

Let us design this database in detail. Our system has a few requirements; let's start with those and design the system with these assumptions in mind.

Requirements

  • Data size: Let's assume we have around 100 TB of data

  • Data partition: Data can be partitioned in many ways; the right choice depends on the problem. For example, we could partition by customer_id, location, items, inventory, and so on.

Estimations

In this part, we will work out rough estimates for our system.

  • Data part: Here we estimate how large the data is and at what rate new data gets inserted into the database.

    • Size: We have already assumed that the data might be around 100 TB initially

    • Rate: Let's assume we operate at a Facebook- or Google-like scale, with roughly 1 TB of new data generated and stored in our database per day.

    • Machine: Let's assume we have machines with 128 GB RAM and a 10 TB hard disk. With this, we would need 100 TB / 10 TB per machine = 10 machines.

    • Threshold: If we keep the utilization threshold at 80%, each machine offers 8 TB of usable capacity, so at 1 TB per day of growth we need to add a new machine roughly every 8 days (a quick calculation is sketched after this list).


  • Sharding Policy: Here we assume that each shard is independent of the others; in no case should a query have to look into two shards to fetch its data. We need to store the data in such a way that the shards remain independent of each other.
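
To make these numbers concrete, here is a minimal back-of-the-envelope sketch in Python using the assumptions above (100 TB of initial data, 1 TB per day of growth, 10 TB disks, an 80% threshold); the figures are illustrative, not a definitive capacity plan.

```python
import math

# Back-of-the-envelope numbers taken from the assumptions above (illustrative only).
initial_data_tb = 100        # starting data size
growth_tb_per_day = 1        # new data generated per day
disk_tb_per_machine = 10     # hard disk per machine
utilization_threshold = 0.8  # fill each machine to at most 80%

# First cut: 100 TB / 10 TB per machine = 10 machines.
machines_ignoring_threshold = math.ceil(initial_data_tb / disk_tb_per_machine)

# With the 80% threshold, each new machine gives 8 TB of usable headroom,
# so at 1 TB/day of growth we add a machine roughly every 8 days.
usable_tb_per_machine = disk_tb_per_machine * utilization_threshold
days_between_new_machines = usable_tb_per_machine / growth_tb_per_day

print(machines_ignoring_threshold, days_between_new_machines)  # 10 8.0
```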

CAP Theorem Check

Understanding the CAP Theorem is very important when designing a system. It states that out of Consistency, Availability, and Partition Tolerance, only two can be achieved successfully at a time.

  • Consistency: Every read receives the most recent write or an error

  • Availability:  Every request receives a (non-error) response, without the guarantee that it contains the most recent write

  • Partition Tolerance: The system continues to operate despite an arbitrary number of messages being dropped (or delayed) by the network between nodes

The CAP theorem implies that in the presence of a network partition, one has to choose between consistency and availability.

When choosing consistency over availability, the system will return an error or time out if it cannot guarantee that a particular piece of information is up to date.

Whereas, when choosing availability over consistency, the system will always process the query and try to return the most recent available version of the information, even if it cannot guarantee that it is up to date.

Let's check which of these points fit our problem:

  • Consistency: Since we are dealing with a database, every read should see the most recent write, so consistency is a must

  • Availability: Every request should still receive a (non-error) response, even if it does not contain the most recent write

  • Partition Tolerance: Shards communicate over a network, so the system must continue to operate despite messages being dropped (or delayed) between nodes

In-Depth Analysis

Let us discuss in depth all the scenarios and aspects of our system.

  • Fixed No. of Shards: Data within a shard should fit completely on a single machine. In our case the data is growing quickly (about 1 TB per day), so a 10 TB machine would be exhausted within days or months. Hence we cannot have a fixed number of shards; the number of shards needs to increase over time.

  • Variable No. of Shards: If we give this some thought, shards may need to be added per day, per week, or even per month; however, once we know that frequency, it is easy to scale up the deployment.

  • Distribution of Data: Let us say we have S shards. One way of distributing the data is to compute a hash H for each record, typically from the sharding key (for example customer_id), and store the record on shard (H % S).

    In the above scenario, there is a problem when the number of shards grows from S to (S + 1): the hash function becomes H % (S + 1), which forces us to relocate almost all keys and values according to the new function (see the sketch below). This makes growing the database extremely expensive and highly undesirable.
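
To see how costly this is, here is a small sketch (assuming hash(key) % S sharding and made-up customer keys) that counts how many keys change shard when we go from 4 to 5 shards:

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard with simple modulo hashing."""
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return h % num_shards


keys = [f"customer_{i}" for i in range(10_000)]

# Compare shard assignments before and after adding one shard (S -> S + 1).
before = {k: shard_for(k, 4) for k in keys}
after = {k: shard_for(k, 5) for k in keys}

moved = sum(1 for k in keys if before[k] != after[k])
print(f"{moved / len(keys):.0%} of keys change shard")  # typically ~80% move
```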

Consistent Hashing

In the earlier section, we saw that adding a new shard forces us to relocate almost all of the keys and values. Consistent hashing, which most major distributed databases follow, makes these corner cases much easier to deal with. Let us deep-dive into consistent hashing.

  • Let's say we calculate a 32/64-bit integer hash for every key and map it onto a ring. We start with X shards, and each shard is assigned a position on the ring as well. Each key maps to the first shard found on the ring in a clockwise direction.





  • What happens if we need to add another shard? Or what if one of the shards goes down and we need to redistribute its data among the remaining shards?





  • Similarly, there is a problem of cascading failure when a shard goes down.



  • The above hashing technique solves the majority of our problems; however, as we can see from the above diagram, if one of the shards fails then the entire load of that shard falls on its clockwise neighbour (S3 in the diagram), whose load increases drastically. To overcome this we can use a modified version of consistent hashing.

  • We slightly change the ring so that, instead of one copy per shard, we place multiple copies (virtual nodes) of the same shard spread over the ring; a small sketch of this ring follows after the list.



  • When a new shard is added



  • When there are failures
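
Tying the ring idea together, below is a minimal sketch of a consistent-hash ring with multiple copies (virtual nodes) per shard, as described above. The shard names, replica count, and MD5-based hash are assumptions made for illustration, not what any particular database uses.

```python
import bisect
import hashlib


def ring_hash(value: str) -> int:
    """Map a string to a 32-bit position on the ring."""
    return int(hashlib.md5(value.encode()).hexdigest(), 16) % (2 ** 32)


class ConsistentHashRing:
    def __init__(self, shards, replicas=100):
        # Place `replicas` copies (virtual nodes) of each shard on the ring
        # so the load spreads out and a failure does not dump everything
        # onto a single clockwise neighbour.
        self.replicas = replicas
        self._ring = []  # sorted list of (position, shard)
        for shard in shards:
            self.add_shard(shard)

    def add_shard(self, shard: str) -> None:
        for i in range(self.replicas):
            bisect.insort(self._ring, (ring_hash(f"{shard}#{i}"), shard))

    def remove_shard(self, shard: str) -> None:
        self._ring = [(pos, s) for pos, s in self._ring if s != shard]

    def shard_for(self, key: str) -> str:
        # Walk clockwise from the key's position to the first shard marker.
        idx = bisect.bisect(self._ring, (ring_hash(key), ""))
        return self._ring[idx % len(self._ring)][1]


ring = ConsistentHashRing(["S1", "S2", "S3"])
keys = [f"customer_{i}" for i in range(10_000)]
before = {k: ring.shard_for(k) for k in keys}

ring.add_shard("S4")  # only keys that now fall before S4's markers should move
after = {k: ring.shard_for(k) for k in keys}
moved = sum(1 for k in keys if before[k] != after[k])
print(f"{moved / len(keys):.0%} of keys moved")  # roughly 1/4, not ~80%
```

With the extra copies, when a shard fails its keys are scattered across several remaining shards instead of landing entirely on one clockwise neighbour.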



From the above scenarios and hashing schemes, we can say that we have designed our sharded database. For a good read, please check this engineering blog from Pinterest, where they scaled their MySQL database to support billions of boards and pins.

Comment if you have any feedback on this article.









