In this post we’ll look at a relatively low-cost but high-value technique for improving the quality of your recommendations, called dithering. It’s a technique that re-orders a list of recommendations by slightly shuffling them around, and it is typically implemented in the recommendation post-processing component. Despite its value, it doesn’t seem to have been widely adopted in the RecSys community, with the notable exception of Ted Dunning, who has recently championed it in several of his presentations and dedicated a section to it in his book with Ellen Friedman, “Practical Machine Learning”.
The goal of a recommender is to find relevant items that a user may be interested in. The list of recommended items is typically sorted by the items’ relevance scores (highest to lowest) and presented to the user in that order. These scores may not change much over time, meaning that the list of recommendations for an individual user doesn’t change much either. Depending on your user interface, you’ll also be restricted to showing only the top n recommendations on a single screen at a time, even though you may have generated a much larger list of recommendations. This means that when users return, many will see pretty much the same top n recommendations on every visit, and you’ll miss the opportunity to show them ‘less relevant’, but perhaps very interesting, recommendations.
Dithering provides a solution to this problem. The idea behind dithering is to re-order the recommendations list by adding random noise to the original relevance-based ordering. This results in surfacing some of the items that are further down the list (e.g. from the second page or even later pages) to the first page. In doing so, it’s possible to create the illusion of freshness in your list of recommendations as they appear to change regularly between visits although they may actually have been generated from a single run of a recommender model. What’s more, there are also benefits to recommending more items to users as it increases the variety of interactions from which your system can learn. We’ll look more into learning from such interactions in future posts.
The implementation of dithering, as described by Ted Dunning, is quite simple. The idea is to add normally distributed random noise to the initial rank of the recommendations (based on relevance scores: from highest to lowest), and re-order the results based on the new dithering score (from lowest to highest) as defined in Equation 1. How much the recommendations in the list get re-shuffled depends on the amount of random noise being added (i.e. values of epsilon).
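Equation 1 itself isn’t reproduced in this excerpt, but the dither score Dunning describes adds Gaussian noise, with standard deviation log(ε), to the log of an item’s one-based rank. A minimal Scala sketch, assuming that form of the equation:

```scala
import scala.util.Random

// Sketch of the dither score (Equation 1), assuming Dunning's form:
//   ditherScore = log(rank) + N(0, log(epsilon))
// where rank is the one-based position in the relevance-sorted list
// and log(epsilon) is the standard deviation of the Gaussian noise.
def ditherScore(rank: Int, epsilon: Double, rng: Random): Double = {
  require(rank >= 1, "rank is one-based")
  val sd = math.log(epsilon) // epsilon > 1.0 gives a positive sd
  math.log(rank.toDouble) + rng.nextGaussian() * sd
}
```

With ε only slightly above 1.0 the standard deviation is close to zero, so the scores reduce to log(rank) and sorting by them preserves the original order; larger values of ε shuffle more aggressively.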
We’ve included a simple Spark implementation of dithering (Code Snippet 1), and our language of choice is Scala. We define a function, ditherRecommendations, which takes in an RDD of Recommendation objects and re-orders the recommendations for each user based on the desired amount of shuffling, defined by the parameter epsilon. Note that epsilon needs to be greater than 1.0; otherwise we fall back to a value just above 1.0, which results in almost no change to the original rank order. When dithering the recommendations for a given user, we first make sure they are sorted by relevance score (highest to lowest), then assign an index (i.e. zero-based rank) to each recommendation. Once we have this, we compute the ditherScore (Equation 1), and finally re-order the recommendations based on it (lowest to highest).
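Since Code Snippet 1 isn’t reproduced in this excerpt, here is a rough non-Spark sketch of the per-user logic just described (the Recommendation fields and the fallback constant are our assumptions; the Spark version would apply the same transformation to each user’s group of recommendations):

```scala
import scala.util.Random

// Hypothetical Recommendation shape, matching the description in the text.
case class Recommendation(user: String, item: String, score: Double)

// Dither one user's recommendations, following the steps described above.
def ditherUserRecs(recs: Seq[Recommendation],
                   epsilon: Double,
                   rng: Random): Seq[Recommendation] = {
  // Guard: epsilon must exceed 1.0; otherwise fall back to a value just
  // above 1.0 so the noise is negligible and the order barely changes.
  val eps = if (epsilon > 1.0) epsilon else 1.0 + 1e-10
  val sd  = math.log(eps)
  recs
    .sortBy(r => -r.score)            // highest relevance first
    .zipWithIndex                     // zero-based rank
    .map { case (rec, idx) =>
      // Equation 1, applied to the one-based rank
      val dither = math.log((idx + 1).toDouble) + rng.nextGaussian() * sd
      (rec, dither)
    }
    .sortBy(_._2)                     // lowest dither score first
    .map(_._1)
}
```

Calling ditherUserRecs with epsilon at or below 1.0 leaves a relevance-sorted list effectively untouched, while larger values of epsilon produce increasingly shuffled permutations of the same items.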
For completeness, we’ve also added code (Code Snippet 2) showing how to create an RDD of Recommendation objects (the required input to the ditherRecommendations function) from a tab-separated input file containing (user, item, score) triples, one per line. We then dither the input recommendations as described above and store the final dithered recommendations in another tab-separated file in the same format.
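Code Snippet 2 isn’t reproduced here either; a plain-Scala sketch of the parsing and formatting it describes (the Spark version would presumably wrap the same logic in sc.textFile and saveAsTextFile; the exact field handling is an assumption) might look like:

```scala
// Hypothetical Recommendation shape, matching the description in the text.
case class Recommendation(user: String, item: String, score: Double)

// Parse one tab-separated "user<TAB>item<TAB>score" line.
def parseLine(line: String): Recommendation = {
  val Array(user, item, score) = line.split("\t")
  Recommendation(user, item, score.toDouble)
}

// Format a Recommendation back into the same tab-separated layout.
def formatLine(rec: Recommendation): String =
  s"${rec.user}\t${rec.item}\t${rec.score}"
```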
The effect of dithering is best explained with an example. In Table 1, we demonstrate how an ordered list of 30 recommended items can change when different levels of dithering (i.e. values of epsilon) are applied to it. Each column shows a list of recommended items, and each row corresponds to the rank (i.e. position) of the item in the list. The numbers in the cells give the rank of each item as it appears in the original list, with no dithering applied. So, in the first column, the cells are ordered from 1 to 30, indicating that this column shows the original ordering of the recommended items. The colour of the cells forms a gradient from red to blue with respect to the original ordering of the items. That is, a cell will always have the same colour for a given value of rank, regardless of the column in which it appears, and the first column, which preserves the original ranking, shows a smooth gradient.
The second column shows how the order of the recommended items changes when dithering is applied with a noise level of 1.25. In this case, the order of the first three items remains unchanged, but the fourth item in the dithered list is the item that appeared seventh in the original, undithered list. With this low level of noise, the ordering changes somewhat, but items that appeared near the top of the list still tend to stay near the top, and those near the bottom still tend to stay near the bottom. As the level of noise increases, the amount by which individual item rankings change also increases; the noise level can reasonably be interpreted as the amount of shuffling that takes place in the lists. By the time the noise level has reached 5, the top 10 items in the list include five items that originally appeared in the top 10 and five that did not.
If you have a perfectly ordered list of recommendations, dithering is guaranteed to reduce the correctness of the item ordering. In practice, however, recommenders are not perfect, so you tend to start with a reasonably ordered set of recommendations where some of the items will be relevant to the user and others will not. When you apply dithering, there is a tradeoff between ranking quality (how much users appreciate the ordering) and freshness (how much the list changes between visits), as shown in Figure 1. If you have set up a good experimentation and evaluation framework, you can find the level of dithering that hits the sweet spot between user appreciation and item freshness. In practice, we’ve found that a small amount of dithering can actually increase user engagement and satisfaction with recommender products.
By introducing dithering, the recommendation page changes each time the seed for sampling the random noise changes, whether that is every day, a few times a day, or every hour. A recommendation list that feels fresh and changing is more interesting to users, so they are more likely to come back to the page to check for “new” recommendations. Dithering also makes it possible to get feedback on items that users may not otherwise have seen, as those items would not have been displayed on the first or even second page. As a result, the recommendation engine can learn from more complete training data (i.e. more items to choose from, and broader feedback and user activity), which often results in higher user engagement.
Dithering is one of those techniques that can seem counter-intuitive. If you invest lots of time in building a super recommender model that can generate a great ranked list of recommendations then why spoil it by playing with the ordering? It helps to understand that for most problems, even the best recommender models may have trouble generating high quality lists of recommendations for users. At best, they can get it reasonably right, so the starting point isn’t a great ranked list of recommendations but something less great. Dithering introduces freshness into the list so that the user doesn’t see the same top n recommendations every visit. It does take a little away from the quality of the recommendations but the opportunity created to visit items that appear further down the list often makes it a price worth paying. The intuition behind dithering isn’t a million miles from that behind explore/exploit algorithms. In future posts, we’ll look at some of these, like Bayesian Bandits (e.g. Thompson Sampling), and show how they too can deliver value for users of recommender systems.