Requirements
Functional
- sendMessage(messageBody)
- receiveMessage()
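Here is a minimal, single-process sketch of the two functional operations, assuming plain string message bodies. The class and the Python-style names (`send_message`, `receive_message`) are hypothetical stand-ins for sendMessage/receiveMessage. The sketch deliberately ignores the non-functional requirements: it has no replication and no persistence.

```python
from collections import deque
from typing import Optional


class InMemoryQueue:
    """Hypothetical single-host stand-in for the queue API."""

    def __init__(self) -> None:
        self._messages = deque()  # message bodies in FIFO order

    def send_message(self, message_body: str) -> None:
        # Producers append to the tail.
        self._messages.append(message_body)

    def receive_message(self) -> Optional[str]:
        # Consumers take from the head; None means the queue is empty.
        return self._messages.popleft() if self._messages else None
```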
Non-functional
- scalable (handles increasing load, more queues, and more messages)
- highly available (survives hardware/network failures)
- performant (single-digit latency for the main operations)
- durable (once submitted, data is not lost)
High-level architecture
VIP and load balancer
A VIP (virtual IP) can be a single point of failure (SPOF), so VIP partitioning is required: traffic is spread across several VIPs instead of one.
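One way to do VIP partitioning, which is an assumption here and not something the notes state, is to register several VIPs under one DNS name and hand them out in rotation. That way, losing one VIP affects only part of the clients. The toy resolver below models only that DNS step. The class, the hostname, and the addresses (taken from the 192.0.2.0/24 documentation range) are hypothetical.

```python
from __future__ import annotations

import itertools


class RoundRobinDns:
    """Toy DNS: maps a service name to several VIPs and rotates through them."""

    def __init__(self, records: dict[str, list[str]]) -> None:
        # One endless iterator over the VIPs of each service name.
        self._cycles = {name: itertools.cycle(vips) for name, vips in records.items()}

    def resolve(self, name: str) -> str:
        # Each lookup returns the next VIP registered for that name.
        return next(self._cycles[name])


dns = RoundRobinDns({"queue.example.com": ["192.0.2.10", "192.0.2.11", "192.0.2.12"]})
```

A real setup would also health-check each VIP and stop returning any VIP whose load balancer is down.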
FrontEnd Service
- a lightweight web service
- stateless service deployed across several data centers
Functions
- request validation
  - required parameters are present
  - data falls within an acceptable range
- Authentication/Authorization
  - authentication validates the identity of the caller; authorization checks what that caller is allowed to do
- TLS (SSL) termination
  - the point where incoming TLS traffic is decrypted before the request is processed
  - SSL on the load balancer is expensive
  - termination is …
- Server-side encryption
- Caching
- Rate limiting (throttling)
  - leaky bucket algorithm
- request dispatching
  - circuit breaker pattern: prevents an application from repeatedly trying to execute an operation that is likely to fail
  - bulkhead pattern: isolates elements of an application into pools so that if one fails, the others continue to function
- request deduplication
  - duplicates may occur when the response to a successful sendMessage fails to reach the client, and the client retries
- usage data collection
  - billing / real-time usage
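This is a sketch of rate limiting with the leaky bucket algorithm, used as a meter. Each request pours one unit into a bucket that drains at a constant rate, and any request that would overflow the bucket is rejected. The class name, the parameters, and the injectable clock (added to make the example testable) are assumptions.

```python
from __future__ import annotations

import time
from typing import Callable


class LeakyBucket:
    """Leaky bucket as a meter: +1 unit per request, drained at a constant rate."""

    def __init__(self, capacity: float, leak_rate_per_sec: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.leak_rate = leak_rate_per_sec
        self._clock = clock
        self._level = 0.0
        self._last = clock()

    def allow(self) -> bool:
        now = self._clock()
        # Drain whatever leaked out since the previous call.
        self._level = max(0.0, self._level - (now - self._last) * self.leak_rate)
        self._last = now
        if self._level + 1 <= self.capacity:
            self._level += 1
            return True
        return False  # bucket would overflow: throttle this request
```

On a FrontEnd host you would keep one bucket per client, for example in a dict keyed by client id. Enforcing one limit across many stateless FrontEnd hosts would need a shared store, which is not shown.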
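Below is a minimal circuit breaker for request dispatching, assuming the usual closed / open / half-open states. After `failure_threshold` consecutive failures the breaker opens and fails fast. Once `reset_timeout_sec` has passed, it lets one trial call through. All names are hypothetical, and the sketch is single-threaded.

```python
from __future__ import annotations

import time
from typing import Callable, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised instead of calling the backend while the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int, reset_timeout_sec: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout_sec
        self._clock = clock
        self._failures = 0
        self._opened_at: float | None = None  # None means the circuit is closed

    def call(self, operation: Callable[[], T]) -> T:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_timeout:
                raise CircuitOpenError("backend marked unhealthy; failing fast")
            # Timeout elapsed: half-open, so this call goes through as a trial.
        try:
            result = operation()
        except Exception:
            self._failures += 1
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()  # open (or re-open) the circuit
            raise
        # Any success closes the circuit and resets the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```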
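Here is request deduplication, sketched under the assumption that the client attaches a unique request id to each sendMessage and reuses it on retries. The FrontEnd remembers which ids it has already forwarded. For a repeat, it returns the original message id instead of storing the message twice. `DedupingSender`, `request_id`, and the TTL are hypothetical.

```python
from __future__ import annotations

import time
from typing import Callable


class DedupingSender:
    """Wraps a backend send so retries carrying the same request id are stored once."""

    def __init__(self, send: Callable[[str], str], ttl_sec: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._send = send  # forwards a body to the backend, returns a message id
        self._ttl = ttl_sec
        self._clock = clock
        self._seen: dict[str, tuple[str, float]] = {}  # request id -> (message id, expiry)

    def send_message(self, request_id: str, body: str) -> str:
        now = self._clock()
        cached = self._seen.get(request_id)
        if cached is not None and cached[1] > now:
            return cached[0]  # duplicate retry: answer with the original message id
        message_id = self._send(body)
        self._seen[request_id] = (message_id, now + self._ttl)
        return message_id
```

Expired entries are never evicted here. A real FrontEnd would keep the ids in a cache with eviction.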
Metadata service
- caching layer between the FrontEnd and the storage
- many reads, few writes
- strongly consistent storage preferred
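Because the metadata workload is read-heavy, one plausible way to build this layer is a read-through cache with invalidation on write. In the sketch below, a dict stands in for the strongly consistent store. The record shape (`{"cluster": ...}`) is a hypothetical example of what the FrontEnd would look up to find a queue's backend.

```python
from __future__ import annotations


class MetadataService:
    """Read-through cache over a metadata store (a dict stands in for the database)."""

    def __init__(self, store: dict[str, dict]) -> None:
        self._store = store
        self._cache: dict[str, dict] = {}
        self.store_reads = 0  # counts cache misses, for illustration

    def get_queue(self, queue_name: str) -> dict | None:
        if queue_name in self._cache:
            return self._cache[queue_name]  # hit: no storage access
        self.store_reads += 1
        record = self._store.get(queue_name)
        if record is not None:
            self._cache[queue_name] = record
        return record

    def update_queue(self, queue_name: str, record: dict) -> None:
        # Rare write: update the store first, then drop the stale cache entry.
        self._store[queue_name] = record
        self._cache.pop(queue_name, None)
```

With several Metadata service hosts, each holding its own cache, an update would have to invalidate every copy. That is not modeled here.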
Backend service
- Where and how do we store messages? In RAM and on local disk.
- How do we replicate data?
- How does the FrontEnd select a backend host to send data to? Via the Metadata service.
- How does the FrontEnd know where to retrieve data from? Via the Metadata service.
Option A: leader-follower relationship (managed by an in-cluster manager)
Option B: multiple clusters managed by an out-cluster manager
Comparison of Option A and Option B:
in-cluster manager | out-cluster manager
---|---
manages queue assignment within the cluster | manages queue assignment among clusters
maintains a list of hosts in the cluster | maintains a list of clusters
monitors heartbeats from hosts | monitors the health of each cluster
deals with leader and follower failures | deals with overheated clusters
splits a queue between cluster nodes (partitioning) | splits a queue between clusters
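The table says the in-cluster manager monitors heartbeats from hosts and deals with leader and follower failures. Below is a minimal sketch of that duty. It assumes timeout-based failure detection and a "promote the first live follower" rule; both rules are hypothetical, as are the host names.

```python
from __future__ import annotations

import time
from typing import Callable


class InClusterManager:
    """Tracks heartbeats for one queue's leader and followers (hypothetical design)."""

    def __init__(self, leader: str, followers: list[str], heartbeat_timeout_sec: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.leader = leader
        self.followers = list(followers)
        self._timeout = heartbeat_timeout_sec
        self._clock = clock
        now = clock()
        self._last_seen = {host: now for host in [leader, *followers]}

    def heartbeat(self, host: str) -> None:
        self._last_seen[host] = self._clock()

    def _alive(self, host: str) -> bool:
        return self._clock() - self._last_seen[host] <= self._timeout

    def check(self) -> None:
        # Follower failure: drop dead followers so the leader stops replicating to them.
        self.followers = [h for h in self.followers if self._alive(h)]
        if not self._alive(self.leader) and self.followers:
            # Leader failure: promote the first live follower.
            self.leader = self.followers.pop(0)
```

If replication is asynchronous, a promoted follower may be missing the last messages the old leader accepted.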
What else is important
- Queue creation and deletion
- message deletion
  - do not delete a message when it is received; it can be deleted later by a batch job
  - the consumer needs to call deleteMessage
- message replication
  - async replication: low latency, but how does a host catch up after it was down?
  - sync replication: higher latency, but stronger consistency
- exactly-once delivery is hard to achieve
- push vs. pull
- FIFO: strict ordering of messages is not guaranteed
- security: encrypt messages
- monitoring
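This sketches the "consumer needs to call deleteMessage" approach. The visibility timeout is borrowed from Amazon SQS and is an assumption here. Receiving a message only hides it for a while. If the consumer never deletes it, the message becomes visible again and is redelivered. Class and method names are hypothetical.

```python
from __future__ import annotations

import itertools
import time
from typing import Callable


class VisibilityQueue:
    """receive_message hides a message instead of deleting it; the consumer must call
    delete_message, otherwise the message reappears after the visibility timeout."""

    def __init__(self, visibility_timeout_sec: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._timeout = visibility_timeout_sec
        self._clock = clock
        self._ids = itertools.count(1)
        # message id -> (body, invisible until); dict insertion order keeps FIFO order
        self._messages: dict[int, tuple[str, float]] = {}

    def send_message(self, body: str) -> int:
        message_id = next(self._ids)
        self._messages[message_id] = (body, float("-inf"))  # visible immediately
        return message_id

    def receive_message(self) -> tuple[int, str] | None:
        now = self._clock()
        for message_id, (body, invisible_until) in self._messages.items():
            if invisible_until <= now:
                # Hide it rather than delete it.
                self._messages[message_id] = (body, now + self._timeout)
                return message_id, body
        return None

    def delete_message(self, message_id: int) -> None:
        self._messages.pop(message_id, None)
```

This redelivery path is also why the approach gives at-least-once rather than exactly-once delivery.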
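Here is a toy comparison of the two replication modes, using hypothetical `Replica` and `Leader` classes. In sync mode, the write returns only after every follower holds a copy. In async mode, the leader acknowledges immediately and keeps a backlog that a background `flush()` retries. That backlog is one possible answer to catching up a host that was down.

```python
from __future__ import annotations


class Replica:
    def __init__(self, name: str) -> None:
        self.name = name
        self.log: list[str] = []
        self.up = True

    def append(self, message: str) -> None:
        if not self.up:
            raise ConnectionError(f"{self.name} is unreachable")
        self.log.append(message)


class Leader(Replica):
    def __init__(self, name: str, followers: list[Replica], sync: bool) -> None:
        super().__init__(name)
        self.followers = followers
        self.sync = sync
        self.pending: list[tuple[Replica, str]] = []  # async backlog

    def write(self, message: str) -> None:
        self.append(message)
        for follower in self.followers:
            if self.sync:
                # Sync: a down follower makes append raise, so the producer gets no ack.
                follower.append(message)
            else:
                # Async: ack right away and ship the copy later.
                self.pending.append((follower, message))

    def flush(self) -> None:
        """Background step for async mode: retry copies, keep those that still fail."""
        remaining = []
        for follower, message in self.pending:
            try:
                follower.append(message)
            except ConnectionError:
                remaining.append((follower, message))
        self.pending = remaining
```

The trade-off shows up directly. Async writes succeed even with a follower down, but anything still in `pending` is lost if the leader dies. Sync writes fail when a follower is unreachable; rolling back copies already written is not modeled.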
final look
DB selection
https://www.youtube.com/watch?v=cODCpXtPHbQ
- Structured data?
  - yes -> need ACID?
    - yes -> RDBMS: MySQL, Oracle, SQL Server, Postgres
- Need ACID?
  - no
    - ever-increasing data + a finite set of queries -> columnar DB: Cassandra, HBase
    - many data types + many queries -> document DB: MongoDB, Couchbase
Interesting example: on an e-commerce site, we shouldn't sell more than the remaining quantity. That step needs ACID guarantees, so we can use a SQL database when placing the order. Once the order is created, its data can be saved in MongoDB.
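The "don't sell more than the remaining quantity" step can be made safe with a single conditional `UPDATE` inside a transaction. The stock check and the decrement then happen in one statement, so a concurrent order cannot slip in between them. This sketch uses Python's built-in `sqlite3` as the SQL database. The table and column names are hypothetical, and handing the created order over to MongoDB is not shown.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE inventory (product_id INTEGER PRIMARY KEY, remaining INTEGER NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, "
        "product_id INTEGER NOT NULL, qty INTEGER NOT NULL)"
    )


def place_order(conn: sqlite3.Connection, product_id: int, qty: int) -> bool:
    """Reserve stock and record the order atomically; returns False instead of overselling."""
    with conn:  # one transaction: commit on normal exit, roll back on exception
        cur = conn.execute(
            "UPDATE inventory SET remaining = remaining - ? "
            "WHERE product_id = ? AND remaining >= ?",
            (qty, product_id, qty),
        )
        if cur.rowcount == 0:
            return False  # not enough stock (or unknown product): nothing changed
        conn.execute(
            "INSERT INTO orders (product_id, qty) VALUES (?, ?)", (product_id, qty)
        )
        return True
```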