Task Scheduling Service


  1. How do we maintain a set of prioritized jobs so that we can efficiently find the highest-priority job?
    • For a single machine, we can maintain a min-heap where entries are ordered by their priority. An additional hash table maps each job to its entry in the min-heap, making deletions fast (see the first sketch after this list).
    • A more scalable solution is to distribute the data across multiple machines. One approach is to apply a hash function to the job ids and partition the resulting hash codes into ranges, one per machine (see the partitioning sketch after this list). Insertion and deletion then require communication with just one server.
    • For extract-min, we can send a lookup-minimum message to all machines, infer the global minimum from their responses, and then delete it. However, if many clients attempt this at the same time, most of them will find that the minimum they are trying to extract has already been deleted. If a single machine can handle the service's throughput, we can instead make one server solely responsible for responding to all requests. This server prefetches the top hundred or so jobs from each machine and keeps them in a local heap (see the frontend sketch after this list).
    • In many applications, we do not need strong consistency guarantees. A client could pick one of the machines at random and request its highest-priority job. This works well for a distributed crawler, but it is not suited to event-driven simulation, where jobs have ordering dependencies.
    • If a node fails, all the work queued on that node is lost. To tolerate this, nodes can store overlapping lists, and the dispatching node then handles the resulting duplicates. Consistent hashing can be used to reduce the number of jobs that migrate when a node is lost (see the hash-ring sketch after this list).
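
Below is a minimal Python sketch of the single-machine design: a min-heap keyed on priority, plus a hash table from job id to heap entry so deletions avoid a linear scan. Lazy deletion (marking an entry removed and discarding it when it reaches the top) is one common way to realize this; the `JobQueue` name and `REMOVED` sentinel are illustrative, not from any library.

```python
import heapq
import itertools

REMOVED = "<removed>"  # sentinel marking a lazily deleted heap entry

class JobQueue:
    def __init__(self):
        self._heap = []                 # entries: [priority, seq, job_id]
        self._entries = {}              # job_id -> its entry in the heap
        self._seq = itertools.count()   # tie-breaker for equal priorities

    def add(self, job_id, priority):
        if job_id in self._entries:     # re-adding a job updates its priority
            self.delete(job_id)
        entry = [priority, next(self._seq), job_id]
        self._entries[job_id] = entry
        heapq.heappush(self._heap, entry)

    def delete(self, job_id):
        # O(1): mark the entry removed; it is discarded when it surfaces.
        entry = self._entries.pop(job_id)
        entry[2] = REMOVED

    def extract_min(self):
        # Pop entries until we find one that has not been deleted.
        while self._heap:
            priority, _, job_id = heapq.heappop(self._heap)
            if job_id is not REMOVED:
                del self._entries[job_id]
                return job_id, priority
        raise KeyError("extract_min from an empty job queue")
```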
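A sketch of the range-partitioning idea: hash the job id, then binary-search a sorted list of range boundaries to find the single machine responsible for it, so insertion and deletion each touch one server. The boundary values and server names here are made up for illustration.

```python
import bisect
import hashlib

# Upper bounds of contiguous ranges over the 32-bit hash space, one per machine.
BOUNDARIES = [2**30, 2**31, 3 * 2**30, 2**32]
MACHINES = ["server-a", "server-b", "server-c", "server-d"]

def machine_for(job_id: str) -> str:
    # Map the job id to a 32-bit hash code, then find its range.
    h = int.from_bytes(hashlib.sha256(job_id.encode()).digest()[:4], "big")
    return MACHINES[bisect.bisect_right(BOUNDARIES, h)]
```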
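A sketch of the single-frontend optimization: the dedicated server prefetches the top k jobs from each partition into one merged heap and serves extract-min locally. `fetch_top_k` stands in for a hypothetical RPC to a partition server; refilling an exhausted partition's prefetch is left to the caller.

```python
import heapq

def build_frontend_heap(partitions, fetch_top_k, k=100):
    # Prefetch the k smallest (priority, job_id) pairs from each partition
    # and merge them into one local heap.
    merged = []
    for p in partitions:
        for priority, job_id in fetch_top_k(p, k):
            heapq.heappush(merged, (priority, p, job_id))
    return merged

def next_job(merged):
    # Serve the globally smallest prefetched entry; the caller should
    # refill from partition p once its prefetched entries run dry.
    priority, p, job_id = heapq.heappop(merged)
    return job_id, priority, p
```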
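Finally, a minimal consistent-hashing ring with virtual nodes: when a server is removed, only the jobs that hashed to its arcs migrate to the next server on the ring, rather than nearly every job being remapped as with modulo hashing. The class name and vnode count are assumptions for the sketch.

```python
import bisect
import hashlib

def _hash(key: str) -> int:
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

class HashRing:
    def __init__(self, servers, vnodes=100):
        # Place several virtual nodes per server around the ring to
        # spread each server's load over many small arcs.
        self._ring = sorted(
            (_hash(f"{s}#{i}"), s) for s in servers for i in range(vnodes)
        )
        self._keys = [h for h, _ in self._ring]

    def server_for(self, job_id: str) -> str:
        # A job belongs to the first virtual node clockwise from its hash.
        i = bisect.bisect(self._keys, _hash(job_id)) % len(self._keys)
        return self._ring[i][1]

    def remove(self, server: str):
        # Only jobs that hashed to this server's arcs need to migrate.
        self._ring = [(h, s) for h, s in self._ring if s != server]
        self._keys = [h for h, _ in self._ring]
```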
