I recently took part in a panel at the Berlin Data Days. During the hour-and-a-half-long discussion on the role of 'Relevance' in Big Data, an interesting question arose: why do recommendation engines suck? The general consensus was that the recommendations made by the algorithms are often rather unimpressive, if not just plain 'dumb'. Amazon’s recommendation engine (the ‘people that bought this product also bought' feature) was repeatedly cited as a case in point.
Back home, I went onto Amazon to see whether the engine was as bad as everyone had suggested. A quick test using one of the books I had recently read (Extremely Loud and Incredibly Close by Jonathan Safran Foer) did indeed fail to blow me away. The first two suggestions were books by the same author. The third suggestion was, rather to my surprise, the Koran. The fourth suggestion and those that followed were completely obscure... I had never heard of the books or their authors, and, moreover, they didn’t seem particularly well-regarded by reviewers. I logged into my account to conduct a second test. Given the continuous stream of books in my viewing and purchasing history, this time I expected something to happen. To my disappointment, there was no change at all in the recommendation results. However, I do recall that in some cases I have gone on to purchase recommended books - mostly non-fiction ones, if memory serves.
Here’s my take on the matter....
Despite the promise and capabilities offered by Big Data technologies, the problem remains hugely complex. Determining the tastes and interests of millions of individuals from a limited number of data points per individual is very difficult.
At minimum, we only have a few data points for each visitor to the website: the item(s) in their shopping basket, the browsing history for their session, and potentially their geo-location and operating system. A large proportion of web traffic, especially in eCommerce, is unidentified. We therefore don’t have access to much information that could serve as a basis for algorithmic optimization. A simple statistical rule of thumb: the more complex the algorithm, the more input data it requires. In our example we are thus confined to very simple suggestions; Amazon ends up suggesting books written by the same author.
Once the visitor has been identified, and, ideally, once he or she has already built up a history of interactions with the website (and/or a purchase history), the case becomes more interesting. Now a lot more input data for the individual is available, and more sophisticated recommendations may be made. More complex models can be applied that leverage the purchase history of all customers. Regression analysis can help derive recommendations from the behavior of all customers by identifying those whose tastes and preferences resemble our visitor's. In sum, the quality of recommendations (e.g. as measured by conversion rates) can be significantly improved.
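As an illustration of this 'customers with similar tastes' idea, here is a minimal user-based collaborative filtering sketch over binary purchase histories. The data, names, and scoring scheme are invented for illustration; this is not Amazon's actual algorithm, just a sketch of the general technique.

```python
from collections import Counter
from math import sqrt

# Illustrative purchase histories: customer -> set of purchased item IDs.
purchases = {
    "alice": {"book_a", "book_b", "book_c"},
    "bob":   {"book_a", "book_b", "book_d"},
    "carol": {"book_e", "book_f"},
}

def cosine_similarity(items_a, items_b):
    """Cosine similarity between two binary purchase vectors."""
    if not items_a or not items_b:
        return 0.0
    return len(items_a & items_b) / sqrt(len(items_a) * len(items_b))

def recommend(customer, purchases, top_n=3):
    """Score items bought by similar customers but not yet by this one."""
    own = purchases[customer]
    scores = Counter()
    for other, items in purchases.items():
        if other == customer:
            continue
        sim = cosine_similarity(own, items)
        if sim == 0.0:
            continue  # ignore customers with no overlap in taste
        for item in items - own:
            scores[item] += sim
    return [item for item, _ in scores.most_common(top_n)]

print(recommend("alice", purchases))  # → ['book_d']
```

Because Bob shares two of Alice's three purchases, his remaining book is recommended to her, while Carol's entirely disjoint history contributes nothing.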
However, we ought to remain realistic: accounting for the individual tastes of millions of human beings, even with more complex algorithms, will still require a great deal of simplification. In practice, the issue is often that improvements in prediction accuracy for one customer group degrade accuracy for another group, or even harm the overall accuracy of the predictive system. Addressing one issue often creates a new one.
Depending on the business sector, a great deal of emphasis needs to be placed on making 'safe' recommendations. In Amazon’s case, the site offers plenty of content that could prove offensive to customers. As an example, Brian imagined what might happen if Rushdie's Satanic Verses were recommended: customers might be offended or alienated. While one might argue that certain content could simply be excluded from recommendations, this would pose a complex problem in itself given the diversity of Amazon’s consumer audience, not to mention the effort and cost needed to create and maintain such tags. It is therefore much cheaper to implement algorithms that inherently ‘play it safe'.
Recommendations necessarily rely upon the trust of the individual at whom they are directed: the more unexpected the recommendation, the harder it is to convince that individual to accept it, so they must trust the person or engine making it. The previously discussed Amazon recommendation may seem obvious and uninspired, but a list of other books by the same author will intuitively make sense to most clients. Compare that with a selection of authors largely unknown to the individual: in the latter case, the client has little way to verify the relevance or quality of the recommendation short of taking the plunge, buying the recommended book, and then investing the time required to read it. Trust must be established over time, which I believe is a systemic issue that limits recommendation engines to 'safer' recommendations, at least at the beginning of each client relationship.
An interesting criticism leveled not only against today's recommendation engines but against the concept itself is grounded in the 'filter bubble'. When people are served content personalized to their previous interactions and their inferred or stated preferences, they receive less exposure to conflicting viewpoints and become intellectually isolated in their own informational bubble; this in turn reduces opportunities for discovery and systematically removes serendipity. While this may well be true of particular implementations, I do not believe it is a systemic problem inherent to the technology. There is no reason why technology could not allow for serendipity. The simplest 'algorithm' for achieving this would be a purely random choice of (a portion of) the content recommended to the individual.
Although this seems like a crude method for fostering serendipity, since it risks generating irrelevant recommendations, more sophisticated approaches will eventually emerge. One approach our company successfully implements is learning what other clients with similar preferences like in order to derive a recommendation for the client at hand. Clearly, identifying clients with 'similar' interests is complicated, but multi-dimensional regression analysis already allows us to achieve more relevant results.
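The simplest version of the random approach sketched above would reserve a fraction of recommendation slots for random picks from the rest of the catalogue. A minimal sketch follows; the function name, the slot count, and the 20% random fraction are illustrative assumptions, not a description of any production system.

```python
import random

def blend_with_serendipity(ranked_recs, catalogue,
                           n_slots=10, random_fraction=0.2, seed=None):
    """Fill most slots with ranked recommendations, but reserve a
    fraction for random picks from the catalogue to allow discovery."""
    rng = random.Random(seed)
    n_random = max(1, int(n_slots * random_fraction))
    n_ranked = n_slots - n_random
    recs = list(ranked_recs[:n_ranked])
    # Random picks come from items not already recommended.
    pool = [item for item in catalogue if item not in recs]
    recs.extend(rng.sample(pool, min(n_random, len(pool))))
    return recs
```

A real system would likely bias the 'random' slots toward reasonably popular or well-reviewed items rather than a uniform draw, but even this naive blend guarantees that the customer occasionally sees something outside their inferred bubble.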
Consider a rather thought-provoking rejoinder to the criticism of current recommendation engines: what if they are actually fine for the time being?
Development efforts in analytics produce non-linear output and are hardly predictable. For days, or even weeks and months, there might be no progress at all, and then all of a sudden a single idea can foment a drastic change in performance. Additional incremental improvements to the overall system performance will thereafter become harder to achieve. Improving a completely random selection of recommendations is rather easy, but as the sophistication of the recommendation engine increases, improvement will become increasingly difficult. Let’s also not forget that the recommendation engine needs to work for a large number of people. An idea that might improve recommendation quality for a certain group of people should not degrade the recommendation quality for other clients.
Faced with what is typically a steeply diminishing curve of marginal improvements, it rapidly becomes unprofitable to seek further gains. Private companies funding the development will scrutinize the business model and the gains that can reasonably be expected from costly and lengthy system development. Many companies, when reviewing their recommendation engines, acknowledge the engines’ imperfections but ultimately decide that they can’t make a case for further major development.
For example, Netflix famously offered 1 million USD to any team that could improve its recommendation engine (aimed at fostering 'discovery' and monetizing its vast long tail of lesser-known movies). In the most recent competition, only one team managed to improve the accuracy, and then only marginally, using a very complex set of algorithms - which were never brought into production.
Lastly, I believe that whether current recommendation engines 'suck' or 'perform' is purely a matter of expectations. The Big Data hype has certainly contributed to high expectations of what can be achieved in the personalization field, and it is unsurprising that the technology available today fails to match these ambitions. From the perspective of a company seeking to increase relevance for individual customers, however, the solutions we can build today with a limited amount of effort do create a lot of value compared to very naive recommendation approaches, or to having no recommendation engine at all.
Ultimately, the business case IS attractive, despite its potential failure to 'wow' clients with very high expectations. If a recommendation engine increases conversion by several percent, the impact on the bottom line is often significant. And if the engine can be tested and implemented within weeks, with little to no upfront investment, it constitutes a highly attractive business case.