Smarter: Born from Large-Scale Data and State-of-the-Art AI Techniques
By analyzing more than 100k reviews from 30k customers of ~10k restaurants with a state-of-the-art AI technique, Hierarchical Latent Tree Analysis (HLTA), presented at AAAI 2016, Recomen can understand every restaurant and every customer more precisely than ever before. Recomen builds a profile for every restaurant and every customer from the public reviews given by customers.
First, Recomen builds a vector for each restaurant, with one dimension per HLTA topic, by analyzing the sentiment of the reviews related to each topic. If the sentiment of the reviews on a topic is positive, the restaurant is good in that field, and vice versa. The sentiment score is generated with the NLTK toolkit. Then we analyze customers by counting how often each topic appears across all of their reviews. If a customer mentions a topic many times, the customer probably cares about that topic. (We apply sublinear scaling to the counts so that very frequent topics do not dominate.) Finally, to recommend a restaurant, we just need to find the restaurant whose vector has the maximum inner product with the customer's vector. More importantly, Recomen achieves a 5% improvement compared to a collaborative filtering recommendation system.
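The profile-and-match pipeline described above can be sketched in a few lines of Python. This is only an illustrative sketch, not Recomen's actual code: the function names, the toy data, averaging sentiment per topic, and using log(1 + count) as the sublinear scaling are all assumptions. Sentiment scores are taken as precomputed inputs (for example, from an NLTK sentiment analyzer) rather than computed here.

```python
import math

def restaurant_vector(topic_sentiments, num_topics):
    """Restaurant profile: mean sentiment per topic (averaging is an assumption).

    topic_sentiments: list of (topic_id, sentiment_score) pairs, one per
    review passage about that topic. Scores are assumed to be precomputed.
    """
    totals = [0.0] * num_topics
    counts = [0] * num_topics
    for topic, score in topic_sentiments:
        totals[topic] += score
        counts[topic] += 1
    # Topics that are never mentioned get a neutral 0.0.
    return [totals[t] / counts[t] if counts[t] else 0.0 for t in range(num_topics)]

def customer_vector(topic_mentions, num_topics):
    """Customer profile: sublinearly scaled topic counts.

    log(1 + count) is one common sublinear scaling; the exact formula
    used by Recomen is not stated, so this choice is an assumption.
    """
    counts = [0] * num_topics
    for topic in topic_mentions:
        counts[topic] += 1
    return [math.log1p(c) for c in counts]

def inner_product(u, v):
    return sum(a * b for a, b in zip(u, v))

def recommend(customer, restaurants):
    """Linear scan: return the restaurant with the maximum inner product."""
    return max(restaurants, key=lambda name: inner_product(customer, restaurants[name]))

# Toy data with hypothetical topics: 0 = service, 1 = price, 2 = taste.
NUM_TOPICS = 3
restaurants = {
    "A": restaurant_vector([(0, 0.8), (0, 0.6), (2, -0.2)], NUM_TOPICS),
    "B": restaurant_vector([(1, 0.5), (2, 0.9)], NUM_TOPICS),
}
# This customer writes mostly about taste and once about price.
customer = customer_vector([2, 2, 2, 1], NUM_TOPICS)
best = recommend(customer, restaurants)
print(best)  # prints "B": strong taste sentiment matches what the customer cares about
```

Because both profiles live in the same topic space, matching needs no customer-to-customer comparison, only one inner product per restaurant.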
Faster: 1000x faster than traditional recommendation systems
Since Recomen only needs a linear scan to find the maximum inner product, it is much faster than traditional recommendation techniques such as collaborative filtering. Furthermore, we accelerate the linear scan with an advanced data mining technique, Locality Sensitive Hashing, which improves the speed by another 10x (even 100x for larger datasets), bringing the total speedup to 1000x!
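To illustrate how hashing can shrink the scan, here is a minimal sketch using sign random projections, one well-known LSH family. The text does not say which LSH scheme Recomen uses, so the class name, parameters, and the choice of hash family are assumptions. Note also that sign random projections group vectors by angle (cosine similarity) rather than by inner product directly, so the sketch rescores the candidates in the matching bucket with the exact inner product and falls back to a full scan when the bucket is empty.

```python
import random
from collections import defaultdict

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

class SignLSHIndex:
    """Hypothetical LSH index based on random hyperplanes."""

    def __init__(self, dim, num_bits=8, seed=0):
        rng = random.Random(seed)
        # Each random hyperplane contributes one bit of the signature.
        self.planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                       for _ in range(num_bits)]
        self.buckets = defaultdict(list)  # signature -> restaurant names
        self.items = {}                   # restaurant name -> vector

    def _signature(self, vec):
        # Vectors pointing in similar directions tend to share a signature.
        return tuple(dot(plane, vec) >= 0.0 for plane in self.planes)

    def add(self, name, vec):
        self.items[name] = vec
        self.buckets[self._signature(vec)].append(name)

    def query(self, vec):
        # Scan only the matching bucket; fall back to all items if it is empty.
        candidates = self.buckets.get(self._signature(vec)) or list(self.items)
        return max(candidates, key=lambda name: dot(self.items[name], vec))

index = SignLSHIndex(dim=3, num_bits=4, seed=1)
index.add("A", [0.7, 0.0, -0.2])
index.add("B", [0.0, 0.5, 0.9])
result = index.query([0.0, 0.5, 0.9])
print(result)
```

More bits mean smaller buckets and faster queries, at the cost of a higher chance of missing the true best match; a production system would typically use several hash tables to recover that accuracy.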
Robuster: stop suffering from the matrix sparsity problem
Recomen never suffers from the matrix sparsity problem, so Recomen will never lose her mind when a customer enters a new environment, for example while traveling. Traditional models like collaborative filtering always rely on finding similar customers and restaurants, so they cannot work when a customer comes to a new environment.