Rate limiting Service

2 minute read

This section covers proposed solutions for problems when designing rate limiting or throttling service

  1. Why do we need Rate limiting service?
    • It helps to protect service again abusive behaviors such as overwhelming the service and to enhance security
  2. What are the requirements of the service?
    • Highly available: it should always operational
    • Low latency: we do not want the service to introduce to much additional latency to entire system
  3. What is the logical workflow of throttling service?
    • Each service/API have different limit number of request for a fixed/dynamic time window. When a particular limit is crossed, the server returns status ā€œ429 - Too many requestsā€
  4. What do we store in database?
    • We need to map user identity (UserId) to start timestamp of the time window and current number of counts within (current_time-start_time). If (current_time-start_time) exceeds our limit, we simply reset the count
    • The rate limiter data is not critical and can afford to lose, we dont really need persistant storage, instead in-memory such as Redis or memchached can be used
  5. What are Rate limiting types?
    • Hard throttling: number of requests cannot exceed a defined limit
    • Soft throttling: we set the request limit to exceed a certain percentage
    • Dynamic throttling: the number of requests exceeding the limit depends on available resources
  6. what are Rate limiting algorithms?
    • Bucket algorithm
    • Fixed window algorithm: the time window is defined from from a fixed start time to a fixed end time. However, we could potentially see lots of request within short time span, i.e. when many requests are sent within 2nd half of a window and 1st half of next window. This problem can be handled with sliding window algorithm.
    • Rolling window algorithm: the time window starts from the time at which the request is made plus the time window length.
    • Sliding window algorithm: instead of storing only start timestamp, we can store list of timestamps where requests are made. The max length of the list is our limit. When a new request is made, we check if the first timestamp of the list is still within our time window and if the limit has crossed, then we reject the new request, else we can insert new request timestamp to end of the list. If our fixed time window is large, i.e. hourly, we can split the window into multiple smaller fixed time window and then modify our sliding window algorithm by storing number of counts per minutes
  7. Which entity should we throttle on?
    • IP: There are huge number of IPv6 addresses and we will run out of memory to keep track of all IPv6
    • User account: we can only limit based on user account after authentication, but we cant do it on login API
    • We can do hybrid depending on API/service

Leave a comment