The feed is usually the central column of the screen on social sites such as Facebook, Twitter, and Pinterest. Its purpose is to increase user engagement. It is almost always personalized to an individual's interests, based on either their network (friends or followers) or the interests they follow. For example, Quora and Pinterest let you follow topics, and Glassdoor lets you follow companies and jobs.
At Hoovada.com, the largest Vietnamese knowledge-exchange social site, we set out to develop a highly personalized feed based on the topics that users are fond of. The service is very read-heavy, since the whole idea is to display a ranked feed to each user.
There are 2 main problems when building activity feeds:
- Collecting, in real time, the events/activities that users generate
- Quickly displaying relevant activities in users’ feeds
The two problems are often in conflict. The well-known Justin Bieber problem demonstrates this: when a user with a very large number of followers publishes an activity, it is hard to update all of their followers’ feeds quickly. Of the two, we usually prioritize fast display over strict freshness, because it is acceptable for a user to miss a few new items. The feed service is therefore a good example of a highly available service: the data may be temporarily inconsistent, and users do not always get the latest feed right away.
We discuss ideas for the two main tasks in building a feed service: feed generation and feed publishing.
Feed Generation and Ranking
Feeds are ranked differently depending on the application. The ranking can be simply chronological, as in the original Twitter and Instagram timelines, or based on more complicated criteria. Facebook, for example, uses many ranking signals, such as whether the post has images/videos, the number of likes, comments, and shares, the time of the update, and who the user typically interacts with.
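To make the difference between the two ranking styles concrete, here is a minimal sketch. The `FeedItem` fields, the weights, the media boost, and the 24-hour half-life are all hypothetical values chosen for illustration; they are not the formula used by any of the sites mentioned above.

```python
from dataclasses import dataclass


@dataclass
class FeedItem:
    item_id: str
    created_at: float        # Unix timestamp in seconds
    likes: int = 0
    comments: int = 0
    shares: int = 0
    has_media: bool = False  # True if an image or video is attached


def chronological(items):
    """Newest first, ignoring every other signal."""
    return sorted(items, key=lambda it: it.created_at, reverse=True)


def score(item, now, half_life_hours=24.0):
    """Hypothetical engagement score with exponential time decay."""
    # Assumed weights: a share counts more than a comment, a comment more than a like.
    engagement = 1.0 + item.likes + 2.0 * item.comments + 3.0 * item.shares
    if item.has_media:
        engagement *= 1.5  # assumed boost for posts with images/videos
    age_hours = max(0.0, now - item.created_at) / 3600.0
    decay = 0.5 ** (age_hours / half_life_hours)  # halves every half_life_hours
    return engagement * decay


def ranked(items, now):
    """Highest score first."""
    return sorted(items, key=lambda it: score(it, now), reverse=True)
```

With these assumed weights, a two-day-old post with 100 likes still outranks a fresh post with no engagement, whereas the chronological ordering puts the fresh post first.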
- What is the logical flow of getting feeds?
get(user_id, , current_timestamp, number_of_feeds=10) Return: list of feed items
- The client sends a get request to the backend server for a number of feed items (we can set a default value). The server looks up all the groups and people that the user follows, collects their feed items, ranks them, and returns the top items to the user.
- What do we store in the database?
- Mapping of feedId to its metadata, e.g. content, location, creation date, etc.
- Mapping of userId to the user's metadata, e.g. email, name, creationDate, etc.
- Mapping of userId to their follower IDs, the groupIds that they belong to, and the feedIds that they like
- What do we store in storage?
- We need to store different types of feed content: text, videos, and images. They can be stored in an object store or a distributed file system.
- Should we perform live or offline feed generation?
- Live generation can increase latency if users have a huge number of friends, follows, etc. It also causes a heavy load for people or pages that have a lot of followers.
- We can perform offline generation, store the results on a server that holds pre-generated feeds, and send them to clients whenever they request them. However, some users are not very active, so we cannot keep their feeds for very long; we need to remove them to free up space for others.
- Should we always notify users if there are new posts available?
- Yes, it can be useful for users to be notified. However, on mobile devices, where data usage is relatively expensive, pushing data can consume bandwidth unnecessarily. We can therefore choose not to push the data and instead let users “Pull to Refresh” to get new posts.
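The get flow and the pre-generated feed store discussed above can be sketched as follows. Everything here is an assumption made for illustration: the `FOLLOWS` and `POSTS` dictionaries stand in for the database mappings, `FeedCache` is a hypothetical least-recently-used store for pre-generated feeds, and ranking is plain chronological order to keep the sketch short.

```python
import heapq
from collections import OrderedDict

# Hypothetical in-memory stand-ins for the database mappings.
FOLLOWS = {"u1": ["alice", "topic:python"]}   # userId -> followed people/groups
POSTS = {                                     # sourceId -> [(timestamp, feedId)]
    "alice": [(100, "f1"), (300, "f3")],
    "topic:python": [(200, "f2")],
    "bob": [(400, "f4")],
}


class FeedCache:
    """Bounded store of pre-generated feeds; the least recently read user is evicted."""

    def __init__(self, max_users=1000):
        self.max_users = max_users
        self._feeds = OrderedDict()  # userId -> list of feedIds

    def get(self, user_id):
        if user_id not in self._feeds:
            return None
        self._feeds.move_to_end(user_id)  # mark as recently used
        return self._feeds[user_id]

    def put(self, user_id, feed):
        self._feeds[user_id] = feed
        self._feeds.move_to_end(user_id)
        while len(self._feeds) > self.max_users:
            self._feeds.popitem(last=False)  # drop the least recently used user


def generate_feed(user_id, current_timestamp, number_of_feeds=10):
    """Live generation: gather items from everything the user follows, rank, cut."""
    candidates = []
    for source in FOLLOWS.get(user_id, []):
        for ts, feed_id in POSTS.get(source, []):
            if ts <= current_timestamp:
                candidates.append((ts, feed_id))
    top = heapq.nlargest(number_of_feeds, candidates)  # newest first
    return [feed_id for _, feed_id in top]


def get(user_id, current_timestamp, number_of_feeds=10, cache=None):
    """Serve a pre-generated feed when one is cached, otherwise generate it live."""
    if cache is not None:
        cached = cache.get(user_id)
        if cached is not None:
            return cached[:number_of_feeds]
    feed = generate_feed(user_id, current_timestamp, number_of_feeds)
    if cache is not None:
        cache.put(user_id, feed)
    return feed
```

Note that the cached feed is never refreshed in this sketch, so a cached user keeps seeing the older list. That matches the trade-off accepted earlier (a slightly stale feed is tolerable), but a real service would also need a refresh or expiry policy.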
Feed Publishing or Fan-out
Fan-out-on-load or pull mode: the feed is loaded every time the user requests it, e.g. by scrolling down their feed page, as in the case of Facebook. As a result, new data is not shown to users until they issue a pull request, and if there is no new data, pulling wastes resources.
Fan-out-on-write or push mode: the system pushes new feed items to users as soon as they are available. Twitter is known for this approach. To do this, we need to maintain a persistent connection with the client, e.g. with WebSocket or long polling. In Twitter's case, pushing updates from a celebrity user with a lot of followers creates a heavy load. An improvement on push mode is to load a celebrity's feed items only when their followers pull their feeds.
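The improvement described above, pushing for ordinary users and pulling for celebrities, can be sketched as below. This is an illustrative model only: the `HybridFanout` class, its method names, and the follower threshold are hypothetical, and a "push" is modelled as a write into each follower's in-memory inbox rather than a real delivery over a persistent connection.

```python
from collections import defaultdict

CELEBRITY_THRESHOLD = 3  # hypothetical cutoff; a real system would use a far larger number


class HybridFanout:
    def __init__(self, threshold=CELEBRITY_THRESHOLD):
        self.threshold = threshold
        self.followers = defaultdict(set)  # author -> follower ids
        self.following = defaultdict(set)  # follower -> author ids
        self.inbox = defaultdict(list)     # follower -> [(timestamp, post_id)], filled on write
        self.outbox = defaultdict(list)    # author -> [(timestamp, post_id)], read on pull

    def follow(self, follower, author):
        self.followers[author].add(follower)
        self.following[follower].add(author)

    def is_celebrity(self, author):
        return len(self.followers[author]) >= self.threshold

    def publish(self, author, post_id, timestamp):
        """Store the post; push it to followers unless the author is a celebrity.

        Returns the number of inboxes written, i.e. the fan-out cost of this post.
        """
        self.outbox[author].append((timestamp, post_id))
        if self.is_celebrity(author):
            return 0  # no fan-out on write: followers pull this post when they read
        for follower in self.followers[author]:
            self.inbox[follower].append((timestamp, post_id))
        return len(self.followers[author])

    def read(self, user, limit=10):
        """Merge pushed items with celebrity posts pulled at read time, newest first."""
        items = list(self.inbox[user])
        for author in self.following[user]:
            if self.is_celebrity(author):
                items.extend(self.outbox[author])
        seen, result = set(), []
        for _, post_id in sorted(items, reverse=True):
            if post_id not in seen:  # a post may sit in both inbox and outbox
                seen.add(post_id)
                result.append(post_id)
        return result[:limit]
```

The write cost of a celebrity post stays constant regardless of follower count, at the price of extra work on each follower's read. The deduplication step covers authors who cross the threshold after some of their posts were already pushed.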