When recommending items to users, it's not a good idea to recommend the same ones over and over again if the user isn't interacting with them. In a previous post, we discussed a technique called dithering that lets us shuffle the order of recommended items, creating an illusion of freshness and reducing the chances of the list looking stale. That's a good starting point, but what we really want is a recommender system that is a bit smarter: one that learns that when a user is repeatedly shown a recommendation and doesn't interact with it, that's an implicit signal that they aren't interested in it. Such items should be penalised, perhaps by bumping them down the list of recommended items or by removing them entirely. In this post, we'll look at a technique that aims to do just that, called impression discounting.
Impression discounting is a technique described by Lei et al. and implemented at LinkedIn for recommending people to follow and skills to attribute to connections. So what's with the strange name, impression discounting? Well, an impression is another way of saying that an item has been displayed to a user. In LinkedIn's People You May Know case, for example, a profile is impressed when it is displayed to a user. If an impressed item is not engaged with (i.e. it is neither accepted nor rejected), then we want to reduce (i.e. discount) the likelihood that it will be pushed out as a recommendation again. It turns out that applying impression discounting can increase the acceptance rate of recommendations.
Under the Hood
Impression discounting relies on learning a model for each user based on their historical impressions of, and engagement with, recommendations over a certain period of time. You can think of it as a re-ranking plugin for a recommender, implemented in the Recommendation Post-Processing or Online Modules components, or even in both. The model, when applied to a list of recommendations for a user, penalises (i.e. discounts) items that have not been engaged with, changing the ordering of the items in the list. Items that have previously been impressed but not interacted with get pushed down the list, allowing other items to bubble up. The hard part of this process is training the model so that it discounts items by the correct amount, being neither too harsh nor too lenient.
More formally, when you apply impression discounting to a ranked list of recommendations, you generate new scores for the items based on the product of their original scores and a discounting factor:
new_score = orig_score * f(g(X))
where X is a set of features describing the user's impressions of the item (e.g. the number of times the item was impressed, or how long since the item was last impressed), g is a discounting function applied to individual features, and f is a function that combines the discounted features into an aggregated discounting factor. Different types of discounting functions g can be used, such as linear, exponential and quadratic ones. The authors of the impression discounting paper found that exponential decay worked best for them, and this is what we've also found in our experience.
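To make the shapes of these discounting functions concrete, here is a minimal Python sketch of linear, quadratic and exponential variants. The coefficients a and b below are made-up illustrative values; in practice they would be fitted to your own impression logs.

```python
import math

# Hypothetical discounting functions g(x). The coefficients a and b are
# placeholders chosen so that g(0) = 1 (no impressions, no discount) and
# g shrinks as the feature value grows; real values come from training.

def g_linear(x, a=-0.1, b=1.0):
    return a * x + b

def g_quadratic(x, a=-0.01, b=1.0):
    return a * x**2 + b

def g_exponential(x, a=-0.5, b=0.0):
    # exp(a*x + b): with a < 0 this decays as the feature grows,
    # the form the paper's authors found to work best.
    return math.exp(a * x + b)

# The more often an item has been impressed, the smaller the factor:
print(round(g_exponential(0), 3))  # 1.0
print(round(g_exponential(3), 3))  # 0.223
```

Note that all three shapes agree at x = 0 but penalise repeated impressions at very different rates, which is why the choice of g (and its coefficients) is worth validating against your own data.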
For example, suppose we use only two features, impCount and lastSeen, and discount each with an exponential function, g(x) = exp(a*x + b). We can then aggregate the discounted features with a linear model, f = w1 * g(impCount) + w2 * g(lastSeen). The weights, w1 and w2, can be learned by training a linear regression model on your log data. So the new score can be computed as follows:
new_score = orig_score * (w1 * g(impCount) + w2 * g(lastSeen))
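Putting the pieces together, here is a sketch in Python of how this re-ranking step might look. The weights w1, w2, the exponential coefficients and the candidate items are all invented for illustration; in a real system the weights and coefficients would be learned from engagement logs, as described above.

```python
import math

def exp_discount(x, a=-0.3, b=0.0):
    """Exponential discounting function g(x) = exp(a*x + b)."""
    return math.exp(a * x + b)

def discounted_score(orig_score, imp_count, last_seen, w1=0.5, w2=0.5):
    """new_score = orig_score * (w1*g(impCount) + w2*g(lastSeen)).

    The weights and coefficients here are illustrative placeholders,
    not learned values.
    """
    factor = w1 * exp_discount(imp_count) + w2 * exp_discount(last_seen)
    return orig_score * factor

# Toy candidates: (item, orig_score, impCount, lastSeen).
candidates = [
    ("item_a", 0.90, 5, 2),   # top original score, but impressed 5 times
    ("item_b", 0.80, 0, 0),   # never shown before: no discount applied
    ("item_c", 0.85, 1, 1),
]

# Re-rank by the discounted score instead of the original score.
reranked = sorted(
    candidates,
    key=lambda c: discounted_score(c[1], c[2], c[3]),
    reverse=True,
)
print([item for item, *_ in reranked])  # → ['item_b', 'item_c', 'item_a']
```

Notice how item_a, despite having the best original score, drops to the bottom after five unengaged impressions, letting the never-shown item_b bubble up to the top.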
Is it surprising that impression discounting has a positive effect on the acceptance of recommendations? Not really. In fact, it's pretty intuitive that items a user doesn't engage with aren't that interesting to them, so it's reasonable to penalise those items over time and make way for other items that may be more interesting. It's one of those sensible ideas that feel right and actually work when put into practice. It's also a common finding in recommender systems that incorporating implicit feedback into your models can improve the quality of the recommendations generated; this was one of the key findings to come out of the Netflix Prize. The technique itself is reasonably straightforward to put in place, but keeping a history of user-item interactions can be burdensome for your system to maintain, so take that into consideration before throwing this one in.
We have really just given the high-level ideas behind this technique here. For far more detail, take a look at the presentation Modeling Impression Discounting in Large-Scale Recommender Systems given at KDD 2014, or check out the original paper.