Chapter 1: Computer Architecture
Why not just use bigger and bigger CPUs? Mostly because single-CPU speeds have reached a plateau.
Chapter 2: Application Architecture
What should you do when a server is not capable of serving your current load?
-
We can make the server a bit better: a better CPU, more RAM, etc. This is called vertical scaling.
-
We can take our server and make copies of it. This is called horizontal scaling.
Metric Service: provides all the metrics we care about, i.e., how the server's resources are being utilized. Sometimes these are actually log-based metrics.
- When something goes wrong, we wouldn't want to manually go and look at the metrics to figure out that something is wrong!
- We feed the metrics to another service, called an alerting service, which automatically notifies us when some metric crosses a configured threshold.
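The alerting idea above can be sketched in a few lines. This is a minimal illustration, not a real alerting service's API; the function name and the 90% CPU threshold are made up.

```python
# Minimal sketch of a threshold-based alerting check (names are illustrative).
def check_alert(metric_value: float, threshold: float) -> bool:
    """Return True when the metric breaches its configured threshold."""
    return metric_value > threshold

# e.g., alert when CPU utilization exceeds 90%
assert check_alert(95.0, threshold=90.0) is True
assert check_alert(40.0, threshold=90.0) is False
```

A real alerting service would evaluate such rules continuously against the metric stream and notify on-call engineers instead of returning a boolean.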
Chapter 3: Design Requirements
High Level requirements:
- Move Data
- Store Data:
- Usually on database
- Blob store
- File systems or distributed file systems
- Transform Data
NOTE: Bad design choices for application architecture are very difficult to correct later. For example, a badly designed database schema will hurt for a long time.
Availability:
If a system has 99% availability, it experiences 1% downtime. Increasing availability to 99.9% cuts downtime from 1% to just 0.1%, a tenfold improvement. This is substantial: even a small rise in availability dramatically reduces potential disruptions. 99% is two 9s! :D
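The tenfold claim is easy to check numerically. A small sketch converting an availability percentage into downtime per year:

```python
# Downtime per year implied by an availability percentage (illustrative).
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600

def downtime_minutes_per_year(availability_pct: float) -> float:
    return (1 - availability_pct / 100) * MINUTES_PER_YEAR

two_nines = downtime_minutes_per_year(99.0)    # ~5,256 minutes (~3.65 days)
three_nines = downtime_minutes_per_year(99.9)  # ~525.6 minutes (~8.8 hours)
assert abs(two_nines / three_nines - 10) < 1e-6  # a tenfold improvement
```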
SLO: Service Level Objective, an internal target for metrics like availability or latency. SLA: Service Level Agreement, which states how the customer will be compensated (e.g., a partial refund) if the company's service is interrupted.
Throughput
- For user-facing traffic, it is # of requests/second
- For databases, throughput is usually measured in queries/second (QPS)
- For pipelines where large data flows through different stages, bytes/second is the metric
IPv4: 32-bit IPv6: 128-bit Port Number is a 16-bit value.
Sequence number in the TCP header
When sending a request to https://google.com, the browser reserves a port on your client machine on which you will receive the data back from the backend server.
TCP is reliable
- IP by itself does not solve reliability
- Missing packets will be retransmitted
- It is a connection-oriented protocol: a (logical, not physical) connection must be established before data can be exchanged
- Full-Duplex
- 3-way handshake
- Packets can arrive out of order and are reordered by the receiver
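The points above can be seen in a tiny localhost round-trip: a connection is established first (the 3-way handshake happens inside `connect`/`accept`), then bytes flow reliably in both directions. The echo behavior and port choice here are just for illustration.

```python
# A minimal localhost TCP echo round-trip, sketching the connection-oriented,
# reliable byte stream described above.
import socket
import threading

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))       # let the OS pick a free port
server.listen(1)
port = server.getsockname()[1]

def echo_once():
    conn, _ = server.accept()       # completes the 3-way handshake
    conn.sendall(conn.recv(1024))   # echo the bytes back (full-duplex)
    conn.close()

t = threading.Thread(target=echo_once)
t.start()

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(("127.0.0.1", port))  # SYN / SYN-ACK / ACK
client.sendall(b"hello")
reply = client.recv(1024)
client.close()
t.join()
server.close()
assert reply == b"hello"
```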
UDP
- User Datagram Protocol
- Connection-Less
- Some packets may be lost or arrive out of order: no guarantees
- Lower overhead and latency than TCP
- UDP is used for DNS because a query and its response each fit in a single datagram, and speed matters more than delivery guarantees (DNS retries on its own)
DNS
- Domain Name System
- ICANN: Internet Corporation for Assigned Names and Numbers
- Non-Profit Organization: Does not sell the domains
- Domain Registrar resell domains.
- An A record (Address record) is the main record type: it maps a domain name to an IPv4 address
- On a URL: https://domains.google.com/get-started
- com: is called TLD (Top Level Domain)
- google: domain name or primary domain name
- www (here, "domains"): a subdomain; www does not serve a technical purpose and was mainly a convention
- /get-started: is a path
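The URL anatomy above can be taken apart with the standard library:

```python
# Splitting the example URL into the parts named above.
from urllib.parse import urlparse

url = "https://domains.google.com/get-started"
parts = urlparse(url)

assert parts.scheme == "https"
assert parts.hostname == "domains.google.com"
assert parts.path == "/get-started"

# The hostname itself splits into subdomain / domain / TLD:
subdomain, domain, tld = parts.hostname.split(".")
assert (subdomain, domain, tld) == ("domains", "google", "com")
```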
RPC: Remote Procedure Call, executing code that does not reside on the local machine.
HTTP:
- Client and Server do not need to know anything about each other.
- No state management
- Anything needed for that request and response, is handled/included in the request/response.
Endpoint: A URL and a method like GET, PUT, ... is defined as an endpoint.
HTTP Responses:
- Informational Responses: (100-199)
- Successful Responses: (200-299)
- 201 (created) which is a reasonable response for a POST request
- Redirection Responses: (300-399)
- Client error responses: (400-499)
- Server error responses: (500-599)
SSL/TLS:
- SSL: Secure Socket Layer
- It came before TLS
- It is deprecated, but the name is still widely used (e.g., "SSL certificates")
- TLS: Transport Layer Security
HTTPS:
- Enhances HTTP with encryption (TLS) to protect against Man-in-the-Middle (MITM) attacks.
WebSocket:
- It resolves an issue you hit when using plain HTTP
- Let's say you need to retrieve all the chat messages in a group
- With HTTP, you would need to frequently send requests asking the server for new messages
- That is polling. It might work, but it is not optimal:
- Every request may need to make a new connection
- Client sends a http request to establish a WebSocket connection
- Going forward there is a persistent connection between the client and server
- Now, either the client or the server can initiate a message.
- For example a client starts typing a message. This will be delivered to everybody in the channel.
- For other clients, the server is the initiator.
HTTP/2 added streams to the protocol; some predicted this would make WebSocket obsolete, but WebSocket remains widely used.
API: Application Programming Interface
Three of the modern APIs:
- REST
- GraphQL
- gRPC
REST API:
- REpresentational State Transfer
- Stateless: no client state is stored on the server
- For example let's say we need to query all the comments for a Youtube video:
- If we use pagination, we just send something like: https://youtube.com/videos?id=abc&offset=0&limit=10 which gives us the first 10 comments.
- Being stateless helps when we have multiple servers: any server can handle any request, since no per-client data needs to be saved!
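The pagination example above can be sketched as a handler that relies only on what the request carries, so any replica can answer it. The comment data and function name are made up.

```python
# A stateless pagination sketch: every request carries its own offset/limit.
comments = [f"comment {i}" for i in range(25)]  # pretend this is the database

def get_comments(offset=0, limit=10):
    """Serve one page; no per-client state is kept between calls."""
    return comments[offset:offset + limit]

first_page = get_comments(offset=0, limit=10)
second_page = get_comments(offset=10, limit=10)
assert first_page[0] == "comment 0"
assert second_page[0] == "comment 10"
assert len(get_comments(offset=20, limit=10)) == 5  # last, partial page
```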
JSON: JavaScript Object Notation
- Readable
- From a performance perspective it is not the best (text-based and verbose)
GraphQL:
- It is built on top of HTTP
- It typically uses only POST requests, so the query describing which resources and fields are needed can be sent in the body
- With plain REST we often end up with overfetching: for example, for a particular user we may only need the profile picture and the username, not everything. GraphQL lets the client ask for exactly the fields it needs.
gRPC:
- Built on top of HTTP2 because it needs certain features of HTTP2
- Cannot be used natively from a browser
- You need a proxy
- It is usually used for server-to-server communication
- It is generally faster than REST
- Instead of JSON, it sends data using Protocol Buffers
- This leads to sending smaller payloads
- Instead of HTTP status codes as in REST, it has its own status codes and error messages: custom error handling
Good API
Let's say you have a POST endpoint which requires a userId and the content. After a while you might need to add another field in the body.
- You can make that extra field optional so that your endpoint is backward-compatible.
- You can use versioning.
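The optional-field idea can be sketched as follows. The field names (`userId`, `content`, `tags`) are illustrative, not from any real API:

```python
# Backward compatibility via an optional field: old clients omit it,
# new clients can opt in (field names are made up).
def create_post(user_id, content, tags=None):
    return {"userId": user_id, "content": content, "tags": tags or []}

old_client = create_post("u1", "hello")             # old payload still works
new_client = create_post("u1", "hello", ["intro"])  # new field is opt-in
assert old_client["tags"] == []
assert new_client["tags"] == ["intro"]
```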
Let's talk about an API for tweets!
- POST: https://api.twitter.com/v1.0/tweet and pass tweet text and userId
- GET (to retrieve one tweet): https://api.twitter.com/v1.0/tweet/:id
- DELETE (to delete one tweet): https://api.twitter.com/v1.0/tweet/:id
- GET (to retrieve a list of a user's tweets): We already used https://api.twitter.com/v1.0/tweet/:id. Therefore we need another path: https://api.twitter.com/v1.0/users/:id/tweets?limit=10&offset=0 (for pagination). We make limit and offset optional.
Caching
- The response header called Cache-Control is what tells the browser to cache a file
- x-cache: HIT means the response was served from a cache
Write-Around Cache:
- We write the data directly to the primary store, which is disk, bypassing the cache.
- The first read of that data is gonna be a miss. Then we copy the data into the memory (cache).
Write-Through Cache:
- Data is written to the cache and to the primary store at the same time, so both stay in sync (at the cost of slower writes).
Write-Back Cache:
- Data is written to the cache only, and flushed to disk later (e.g., periodically or on eviction).
- Difference from write-through: writes are faster, but cached data can be lost if the machine crashes before the flush.
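The three write strategies can be modeled with two dicts standing in for cache and disk. This is a toy model, not how a real storage engine is built:

```python
# Toy model of the three cache write strategies (dicts stand in for
# cache memory and disk).
class Store:
    def __init__(self):
        self.cache, self.disk = {}, {}

    def write_around(self, k, v):
        self.disk[k] = v              # skip the cache; the first read misses

    def write_through(self, k, v):
        self.cache[k] = v             # write both places synchronously
        self.disk[k] = v

    def write_back(self, k, v):
        self.cache[k] = v             # fast: cache only, flush later

    def flush(self):
        self.disk.update(self.cache)  # write-back's deferred disk write

s = Store()
s.write_back("a", 1)
assert "a" not in s.disk   # not durable until flushed
s.flush()
assert s.disk["a"] == 1
```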
CDN: Content Delivery Network
- CDNs are used for static content that rarely or never changes
- Pull CDNs and Push CDNs
Proxies
Load Balancers are examples of reverse proxy.
Load balancing Strategies:
- Round Robin
- Weighted Round Robin
- Least Number of Connections
- Location
- Hashing
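The simplest of these, round robin, just cycles through the server list. A minimal sketch (server names are made up):

```python
# A minimal round-robin picker over a server list.
from itertools import cycle

servers = ["server-1", "server-2", "server-3"]
rr = cycle(servers)  # endlessly cycles: 1, 2, 3, 1, 2, 3, ...

picked = [next(rr) for _ in range(5)]
assert picked == ["server-1", "server-2", "server-3", "server-1", "server-2"]
```

Weighted round robin would repeat heavier servers in the rotation; least-connections would instead pick the server with the fewest active connections.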
There are two types of load balancers: Layer 4 (transport layer) and Layer 7 (application layer)
- Layer 4 is faster since it only looks at IPs and ports; it has no visibility into the higher layers
- That's why Layer 7 is more powerful. It is also more expensive.
Note: Look at a paper by Google called Maglev.
Consistent Hashing (for Load Balancing)
Let's say you have multiple servers, each with its own in-memory cache. How can we make the best use of the cached data, for example when a server goes down or we add more servers? How can we keep mapping requests to the server that already has their data cached?
Note: You need to research this more thoroughly.
Note: Learn Rendezvous Hashing
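A bare-bones sketch of the idea, pending the deeper research the note above calls for: servers and keys hash onto the same ring, and a key belongs to the first server clockwise from it. Adding or removing one server then only remaps the keys between it and its neighbor, so most cached data stays put. The server names are made up.

```python
# A minimal consistent-hash ring.
import bisect
import hashlib

def _h(s: str) -> int:
    """Stable hash onto the ring (md5 used only for stable distribution)."""
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, servers):
        # Each server occupies one point on the ring, sorted by hash.
        self._points = sorted((_h(s), s) for s in servers)

    def server_for(self, key: str) -> str:
        hashes = [h for h, _ in self._points]
        # First server clockwise from the key's hash, wrapping around.
        i = bisect.bisect(hashes, _h(key)) % len(self._points)
        return self._points[i][1]

ring = Ring(["cache-1", "cache-2", "cache-3"])
owner = ring.server_for("user:42")
assert owner in {"cache-1", "cache-2", "cache-3"}
assert ring.server_for("user:42") == owner  # same key, same server, every time
```

Production implementations add many virtual nodes per server to even out the distribution; this sketch uses one point per server for clarity.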
SQL
RDBMS: Relational DataBase Management System
- SQL databases store their data on disk
- Efficient read and write operations are necessary
- They use B+ trees. TODO: study this
Tradeoffs in using an RDBMS (the ACID guarantees):
- Atomicity
- We have transactions in database
- Transactions should be atomic
- If a portion of a transaction fails, the whole transaction should fail
- For example, you need to take money from one customer and add it to another. What happens if you update the first customer's balance, but the database crashes before updating the second customer's?
- Consistency
- Some columns should not be NULL.
- Certain columns should be unique for example.
- Isolation:
- Transactions should not have any impact on each other when running in parallel (side effect).
- We might need to enforce a sequence on transactions
- Durability:
- Data should be persisted
- Redis (an in-memory store) is not durable by default, and therefore not ACID-compliant
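Atomicity, the money-transfer example above, can be demonstrated with the stdlib sqlite3 module: when a transaction fails mid-way, everything in it rolls back. The schema and amounts are made up.

```python
# Atomicity demo: a failed transfer rolls back both updates, so money
# is never half-moved.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INT)")
db.executemany("INSERT INTO accounts VALUES (?, ?)",
               [("alice", 100), ("bob", 0)])
db.commit()

try:
    with db:  # one transaction: commits on success, rolls back on error
        db.execute("UPDATE accounts SET balance = balance - 50 "
                   "WHERE name = 'alice'")
        raise RuntimeError("crash before crediting bob")
except RuntimeError:
    pass

balances = dict(db.execute("SELECT name, balance FROM accounts"))
assert balances == {"alice": 100, "bob": 0}  # the debit was rolled back
```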
NoSQL (Not Only SQL)
The biggest limitation that NoSQL overcomes is scale.
Examples of NoSQL:
- Key-Value Stores: Redis, Memcached, etcd
- Usually used for cache
- Document Store:
- Data is stored as documents (e.g., JSON) grouped into collections
- The benefit is that it is a lot more flexible
- The most popular is MongoDB
- Wide-Column:
- Suited to workloads with a lot of writes and fewer reads and updates, like time series
- Might have a schema as well
- The most optimized for a massive amount of writes
- Cassandra and Google's BigTable.
- Graph DB:
- Some graph databases are built on top of SQL databases in some ways
- Efficient, for example, for Facebook-style friendship graphs
SQL vs NoSQL:
- NoSQL dbs can generally scale a lot better than SQL dbs
- Mostly because of the constraints on SQL: ACID
- Scaling horizontally for SQL is quite difficult because of ACID
- joining data across two servers while enforcing the constraints is really difficult
- For NoSQL we don't have the constraints and therefore they can scale better
- ACID for SQL is like BASE (Basically Available, Soft state, Eventual consistency) for NoSQL
- Eventual Consistency
- Leader-Follower Concept:
- We have two databases that are supposed to be replicas of each other
- Writes should go to the leader database
- The leader will eventually make sure the same data reaches the follower(s)
- For a short window, requests that go to the followers might see stale values
- We can handle a higher read capacity
- Partitioning data is available for SQL databases as well: Sharding
- not implemented by default in most SQL databases
- it is much more complicated there (joins and constraints across shards); that's why it is not built in
Replication
Leader-Follower Replication: We talked about it a bit earlier.
- Asynchronous vs. Synchronous Replication
Leader-Leader Replications:
- Pros:
- We can read from and write to every single node, scaling reads as well as writes.
- Cons:
- Keeping the nodes consistent is difficult (write conflicts)
- Each leader usually serves a different part of the world, independently.
Sharding
- When you have a massive amount of data, a single query can take seconds to complete.
- We put half of the rows in a database on machine 1 and the second half in another database on machine 2.
- How do we divide the data, then?
- We use a shard key, e.g., gender (female/male)
- Or range-based: last names A-L go to machine 1 and M-Z go to machine 2
- Running joins between two machines is gonna be complicated
- Hash-based Sharding
- We hash the shard key and use the hash to pick the machine, which spreads rows evenly
- SQL databases like MySQL and Postgres do not provide sharding out of the box; we would have to implement it ourselves.
- RDBMSs were simply not designed with sharding in mind.
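Hash-based sharding is essentially one line: hash the shard key, take it modulo the shard count. Note that Python's built-in `hash()` is randomized per process for strings, so a stable hash must be used; `crc32` here is just one convenient choice.

```python
# Hash-based sharding: hash the shard key, mod the shard count.
import zlib

def shard_for(key: str, num_shards: int) -> int:
    # crc32 is stable across runs, unlike Python's built-in hash() on str.
    return zlib.crc32(key.encode()) % num_shards

s = shard_for("user:42", 4)
assert 0 <= s < 4
assert shard_for("user:42", 4) == s  # same key always lands on the same shard
```

The catch, as noted under consistent hashing: with plain modulo, changing `num_shards` remaps almost every key, which is why consistent hashing exists.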
CAP Theorem
- Consistency
- Availability
- Partition Tolerance
It is like having two servers, a leader and a follower. A (network) partition is when the two servers cannot talk to each other; partition tolerance means the system keeps working despite this. When a partition happens, we have two options:
- Prioritize consistency, which means we make the followers unavailable (no stale reads)
- Prioritize availability, which means users can still send requests to the followers, even though some data is not up to date
PACELC: Given P (a network partition), choose A or C. Else (no partition, the network between servers is working), favor Latency or Consistency.
Object Storage
The predecessor to object storage was blob storage.
BLOB: Binary Large OBjects.
- Once we put files there, we cannot edit them in place
- Optimized for storing large amounts of data
- Similar to hashmaps, your files' IDs/keys should be globally unique
Use Cases:
- When you're storing large files that your applications need long-term
- Metadata can still be saved in databases
- Storing large files in a database does not make sense
- You can easily send one simple GET HTTP request and retrieve your file
Message Queues
- When a large number of events hits a server and the server cannot keep up, message queues come into the picture.
- The data will be stored on disk
- Usually events will be processed FIFO
Data transportation between queue and server:
- Server can pull from the queue
- Queue can push the data to the servers
- Sometimes the server responds with an acknowledgment (ack)
Pub/Sub:
- There are multiple publishers that produce the events
- A middle layer of topics feeds into different subscriptions
Note:
- The benefit is that if an application or server is added, the architecture easily scales and adapts by defining more topics and subscribers.
- Popular message queues include RabbitMQ and Kafka
- Popular cloud-based one is Google's pub/sub
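The topic fan-out described above can be sketched in-process with callbacks standing in for subscriptions (topic and message names are made up; real systems like Kafka or Google Pub/Sub do this durably across machines):

```python
# An in-process pub/sub sketch: topics fan events out to every subscriber.
from collections import defaultdict

subscribers = defaultdict(list)   # topic -> list of subscriber callbacks

def subscribe(topic, callback):
    subscribers[topic].append(callback)

def publish(topic, message):
    for callback in subscribers[topic]:
        callback(message)          # every subscriber receives the event

received_a, received_b = [], []
subscribe("orders", received_a.append)
subscribe("orders", received_b.append)
publish("orders", "order-123 created")

assert received_a == ["order-123 created"]
assert received_b == ["order-123 created"]
```

Adding a new consumer is just another `subscribe` call, which is the scaling/adaptability benefit the notes mention.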
MapReduce
- We have a large amount of personal data
- For each record, we want to redact the personal information
- If we need to process it quickly, we may need multiple servers working in parallel
Now we have two types of processing:
- Batch processing:
- We have all the data upfront
- Micro-batch processing is like processing a bunch of data every 30 seconds.
- Streaming
- We are processing data in real time, as it arrives
Note:
- Spark and Hadoop are examples of MapReduce-style frameworks; probably Flink too (though Flink focuses on streaming)
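The MapReduce shape itself fits in a few lines, shown here on the classic word-count example rather than the redaction use case (the input records are made up):

```python
# MapReduce in miniature: map each record to key/value pairs, shuffle
# (group) by key, then reduce each group.
from collections import defaultdict

records = ["big data", "big compute", "data"]

# Map: emit (word, 1) pairs
mapped = [(word, 1) for record in records for word in record.split()]

# Shuffle: group values by key
groups = defaultdict(list)
for key, value in mapped:
    groups[key].append(value)

# Reduce: sum each group
counts = {key: sum(values) for key, values in groups.items()}
assert counts == {"big": 2, "data": 2, "compute": 1}
```

Frameworks like Spark and Hadoop run the map and reduce phases on many machines in parallel, with the shuffle moving data between them over the network.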