A recommender system is made up of five core components (Figure 1). This post is intended to give the big picture. In future posts we’ll jump into details.
Arguably, the core component is the one that generates recommendations for users: the recommender model (2). It is responsible for taking data, such as user preferences and descriptions of the items that can be recommended, and predicting which items will be of interest to a given set of users. By far the largest share of the work reported in the RecSys field concentrates on the recommender model, so it can be easy to forget that an end-to-end system also has other important components.
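To make the model's job concrete, here is a deliberately tiny sketch of one possible approach: scoring unseen items by how much a user's likes overlap with other users' likes. The function name `recommend` and the `likes` dictionary are assumptions made for illustration; real recommender models are usually far more sophisticated.

```python
from collections import Counter


def recommend(user, likes, top_n=3):
    """Suggest items liked by users whose tastes overlap with `user`.

    `likes` is a hypothetical dict mapping each user id to the set of
    item ids that user liked.
    """
    seen = likes.get(user, set())
    scores = Counter()
    for other, other_items in likes.items():
        if other == user:
            continue
        overlap = len(seen & other_items)
        if overlap == 0:
            continue  # no shared taste, so this user tells us nothing
        for item in other_items - seen:
            scores[item] += overlap  # weight by the amount of shared taste
    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]


likes = {
    "ann": {"a", "b"},
    "bob": {"a", "b", "c"},
    "cat": {"b", "d"},
    "dan": {"e"},
}
print(recommend("ann", likes))
```

Whatever model you choose, the shape is the same: user and item data go in, and a ranked list of candidate items per user comes out.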
Recommenders are very much garbage-in-garbage-out systems, so it’s worth investing time in developing a suitable data collection and processing component (1). The inner workings of this component are use-case specific, but it’s common to have some data cleansing and normalisation steps plus some feature generation and selection capabilities. Because the recommender model depends on this upstream data, the quality of the recommendations it generates is constrained by the quality of the input data.
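As a rough illustration of what cleansing and normalisation can look like, the sketch below drops incomplete or out-of-range ratings, keeps only the latest rating per user and item, and mean-centres ratings per user. The row format (dicts with `user`, `item` and numeric `rating` keys) and the 1 to 5 rating scale are assumptions for this example, not a prescription.

```python
from collections import defaultdict


def clean_and_normalise(raw_ratings, min_rating=1.0, max_rating=5.0):
    """Return {(user, item): mean-centred rating} from raw rating rows."""
    latest = {}
    for row in raw_ratings:
        user, item, rating = row.get("user"), row.get("item"), row.get("rating")
        if user is None or item is None or rating is None:
            continue  # cleansing: drop incomplete rows
        if not (min_rating <= rating <= max_rating):
            continue  # cleansing: drop out-of-range ratings
        latest[(user, item)] = float(rating)  # later rows overwrite earlier ones

    # Normalisation: subtract each user's mean rating so that generous
    # and harsh raters become comparable.
    per_user = defaultdict(list)
    for (user, _item), rating in latest.items():
        per_user[user].append(rating)
    user_mean = {u: sum(r) / len(r) for u, r in per_user.items()}
    return {(u, i): r - user_mean[u] for (u, i), r in latest.items()}


raw = [
    {"user": "a", "item": "x", "rating": 5},
    {"user": "a", "item": "y", "rating": 3},
    {"user": "a", "item": "x", "rating": 4},   # replaces the earlier rating
    {"user": "b", "item": "x", "rating": 9},   # out of range, dropped
    {"user": None, "item": "x", "rating": 3},  # incomplete, dropped
]
print(clean_and_normalise(raw))
```

Which steps you actually need depends on your data; the point is that these decisions are made once, upstream of the model.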
The recommendations generated by the recommender model often need some post-processing (3) before being shown to users. At this point, it’s common for some recommendations to be filtered out and some reranking to be applied. This component is typically responsible for making sure that the recommender doesn’t look stupid. It may implement some business logic, such as not recommending certain types of items to particular users, or it may try to increase the diversity of the recommendations to give users a more varied selection of items to choose from. Post-processing can be done in batch mode (i.e. offline), in real-time mode (i.e. online) or as a combination of the two, depending upon the system’s requirements.
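Filtering and reranking can be sketched in a few lines. The example below removes blocked items (a stand-in for business rules) and caps how many items from one category can appear, which is one simple way to increase diversity. The `(item_id, category, score)` tuple format and the parameter names are assumptions for illustration.

```python
def post_process(candidates, blocked_items, max_per_category=2, top_n=5):
    """Filter and rerank model output.

    `candidates` is a list of (item_id, category, score) tuples.
    """
    ranked = sorted(candidates, key=lambda c: c[2], reverse=True)
    per_category = {}
    result = []
    for item_id, category, _score in ranked:
        if item_id in blocked_items:
            continue  # business rule: never show this item
        if per_category.get(category, 0) >= max_per_category:
            continue  # diversity: this category is already well represented
        per_category[category] = per_category.get(category, 0) + 1
        result.append(item_id)
        if len(result) == top_n:
            break
    return result


candidates = [
    ("m1", "action", 0.9),
    ("m2", "action", 0.8),
    ("m3", "action", 0.7),
    ("m4", "drama", 0.6),
    ("m5", "comedy", 0.5),
]
print(post_process(candidates, blocked_items={"m2"}, top_n=3))
```

The same function could run in a nightly batch job or at request time; only where it is called changes.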
After the recommendations have been post-processed, a set of online modules (4) is responsible for serving them and tracking their use. It’s here that you’ll define what needs to be stored in your logs, both to report how your system is performing and, perhaps, to learn from usage and interactions. If you want to perform any online testing (e.g. A/B testing) of different methods for generating recommendations, this capability also tends to live here.
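A minimal sketch of two things that often live in this layer: deterministic A/B assignment and a structured log record for each set of recommendations shown. The function names, the event fields and the hash-based bucketing scheme are all assumptions for illustration; your own logging schema will depend on what you need to report and learn from.

```python
import hashlib
import json
import time


def assign_variant(user_id, experiment, variants=("control", "treatment")):
    """Deterministically map a user to a variant for a given experiment."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    return variants[int(digest, 16) % len(variants)]


def impression_event(user_id, experiment, item_ids, timestamp=None):
    """Build a JSON log line recording what was shown and under which variant."""
    return json.dumps(
        {
            "event": "impression",
            "user_id": user_id,
            "experiment": experiment,
            "variant": assign_variant(user_id, experiment),
            "item_ids": list(item_ids),
            "ts": time.time() if timestamp is None else timestamp,
        },
        sort_keys=True,
    )


print(impression_event("user-42", "rerank-v2", ["m1", "m3", "m4"]))
```

Hashing the experiment name together with the user id keeps a user in the same variant across sessions without storing any assignment state, and logging the variant alongside each impression is what later lets you compare methods.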
Once you have generated recommendations, you need a way to show them to users. The user interface component (5) defines what users see and how they can interact with the recommender. It’s probably no surprise that the interface also has a big impact on the usefulness of a recommender system. In the future we’ll write about some good practices to follow here and some pitfalls to watch out for. For example, it’s good practice to explain to users why they are being recommended an item (e.g. “You may like watching this movie because you liked movies X and Y”), as this makes the recommender’s decisions more transparent.
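An explanation like the one above can be produced from a simple template, assuming the system can supply the titles that drove the recommendation. The `explain` helper below is a hypothetical sketch, not a complete explanation method.

```python
def explain(recommended_title, liked_titles, max_reasons=2):
    """Build a short, human-readable reason for a recommendation."""
    reasons = liked_titles[:max_reasons]
    if not reasons:
        # No supporting evidence available, so make no causal claim.
        return f"You may like watching {recommended_title}."
    return (
        f"You may like watching {recommended_title} "
        f"because you liked {' and '.join(reasons)}."
    )


print(explain("Z", ["X", "Y"]))
```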
These five components can be developed in parallel or sequentially, so you can structure their development however it suits your team and your goals. In practice, you’ll often find that you want to develop some of the components more fully than others. For example, once you have the basic components in place, you may want to spend more time early on making sure that the data collection and processing is done well, and settle for a shallow implementation of the post-processing component. We’ll discuss the pros and cons of these approaches in future blog posts, along with many practical examples of how to build these components up.