Online learning algorithms with application in ranked recommendations
Master Thesis
Author
Panagiotopoulou, Evgenia
Date
2019
Advisor
Telelis, Orestis
Keywords
Online learning; Multi-armed bandit problem; Contextual bandits; Ranked recommendations; Linear rewards; LinUCB; RBA; IBA
Abstract
In this work we study online learning and its application to ranked recommendation systems that use context. Modern platforms, websites, and applications create a growing need for recommendation systems that offer useful content suggestions. Online learning is well suited to this purpose: it can keep the customer, or user, satisfied while requiring minimal computational resources, it demands no training phase or historical data, and it adapts quickly to new data. Furthermore, by introducing relevant context (side information) into an online learning recommendation system, we can expect to produce content suggestions that are appealing and tailored to each user's needs and interests.
Specifically, over the course of this study we survey the literature on the multi-armed bandit (MAB) problem, contextual bandits, rankings of recommendations, and the corresponding algorithms. To delve deeper into online learning recommendations, we design and conduct experiments on artificially generated datasets of our own, using the algorithms we found most interesting. Our idea was to combine the recommendation meta-algorithms RBA and IBA with instances of LinUCB, a contextual algorithm with linear rewards. Accordingly, the two cases we compare are RBA-LinUCB, a single-click, diverse-rankings algorithm that has been tested experimentally before, and IBA-LinUCB, a multiple-clicks algorithm that, to our knowledge, is tested for the first time in this work.
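To make the building block concrete, the following is a minimal sketch of the disjoint-models variant of LinUCB: one ridge-regression estimate per arm, with an upper-confidence bonus scaled by an exploration parameter alpha. The class name, method names, and the shared-context simplification are illustrative choices, not the thesis's exact implementation.

```python
import numpy as np

class LinUCB:
    """Disjoint LinUCB sketch: one linear model per arm.

    For each arm a it maintains A_a = I + sum(x x^T) and b_a = sum(r x),
    estimates theta_a = A_a^{-1} b_a, and scores an arm by
    theta_a . x + alpha * sqrt(x^T A_a^{-1} x).
    """

    def __init__(self, n_arms, dim, alpha=1.0):
        self.alpha = alpha
        self.A = [np.eye(dim) for _ in range(n_arms)]    # per-arm design matrices
        self.b = [np.zeros(dim) for _ in range(n_arms)]  # per-arm reward vectors

    def select(self, context):
        # context: the round's feature vector x_t (shared by all arms here)
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b
            ucb = theta @ context + self.alpha * np.sqrt(context @ A_inv @ context)
            scores.append(ucb)
        return int(np.argmax(scores))

    def update(self, arm, context, reward):
        # Rank-one update of the chosen arm's sufficient statistics
        self.A[arm] += np.outer(context, context)
        self.b[arm] += reward * context
```

In the meta-algorithms studied here, one such LinUCB instance would drive each recommendation slot.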
Our experimental results show that RBA-LinUCB increasingly outperforms IBA-LinUCB as the standard deviation of the arm rewards (SDR) of the MAB grows: a higher SDR leads to a higher cumulative average regret for IBA-LinUCB, while RBA-LinUCB remains unaffected. From another viewpoint, however, IBA-LinUCB yields increasingly more clicks than RBA-LinUCB as the average rate of rewards (ARR) of the arms increases. Finally, by monitoring how the bandit instances in the recommendation slots learn, we find that the IBA-LinUCB slots learn much faster and more accurately than those of RBA-LinUCB. These observations suggest that IBA-LinUCB can be expected to offer more substantial results and yield more clicks than RBA-LinUCB, and thus constitutes a more effective solution for online contextual recommendation systems.
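The slot-learning behaviour discussed above can be illustrated with a sketch of one RBA round. Each ranking slot is driven by its own bandit; if a lower slot proposes an item already shown above, an arbitrary unused item is displayed instead, and a slot is credited only when its own (non-substituted) choice is the clicked item. The `SlotBandit` stub below (empirical means with a crude optimism bonus, context ignored) is a hypothetical stand-in for a per-slot LinUCB instance, not the thesis's implementation.

```python
import numpy as np

class SlotBandit:
    """Illustrative per-slot bandit stub: empirical means + optimism bonus."""

    def __init__(self, n_arms):
        self.n_arms = n_arms
        self.counts = np.zeros(n_arms)
        self.sums = np.zeros(n_arms)

    def select(self, context):
        means = self.sums / np.maximum(self.counts, 1)
        bonus = 1.0 / np.maximum(self.counts, 1)  # untried arms look attractive
        return int(np.argmax(means + bonus))

    def update(self, arm, context, reward):
        self.counts[arm] += 1
        self.sums[arm] += reward

def rba_round(slot_bandits, context, clicked_item):
    """One round of the RBA meta-algorithm (sketch).

    Slot i's bandit proposes an item; duplicates of higher slots are
    replaced by an arbitrary unused item. A slot's bandit is rewarded
    only when its own, non-substituted proposal was the clicked item.
    """
    n_arms = slot_bandits[0].n_arms
    proposals, displayed, shown = [], [], set()
    for bandit in slot_bandits:
        arm = bandit.select(context)
        substituted = arm in shown
        display = arm if not substituted else next(
            a for a in range(n_arms) if a not in shown)
        shown.add(display)
        proposals.append((bandit, arm, substituted))
        displayed.append(display)
    for bandit, arm, substituted in proposals:
        reward = 1.0 if (not substituted and arm == clicked_item) else 0.0
        bandit.update(arm, context, reward)
    return displayed
```

Under IBA, by contrast, each slot's bandit runs fully independently and receives its own click feedback, so several slots can be credited in the same round; this is the multiple-clicks setting compared against RBA-LinUCB in our experiments.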