Whenever recommendation algorithms come up, someone always brings up machine learning. I've been thinking about how to make a better recommendation algorithm, and here's my idea:
First, we ask the user to select topics he likes from a given list.
Second, we ask the user to select topics he dislikes from the same given list.
Then we present recommendations of 4 types:
(Familiar): Content from topics the user clearly likes.
(Fresh): Content that combines topics the user likes with topics the user does not dislike.
(Novel): Content from topics that the user does not dislike.
(Hit-or-miss): Content that combines liked topics with disliked topics, or liked topics with both not-disliked and disliked topics. (More briefly: [liked and disliked] or [liked and not-disliked and disliked].)
We present those 4 types with the following ratio:
(Familiar) 50%
(Fresh) 30%
(Novel) 18%
(Hit-or-miss) 2%
Each time the user watches a video of type (Fresh), (Novel) or (Hit-or-miss), we give them the opportunity to move any of the video's not-disliked or disliked topics into the liked or not-disliked category. If the user does nothing, the topics stay in their current categories.
At any time the user can change his preferences in the settings.
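A minimal sketch of the proposal above, in Python. It assumes each video carries a set of topics, and it calls topics that are neither liked nor disliked "neutral". The exact reading of the four types, the names `classify` and `build_feed`, and the sample topics are assumptions made for illustration, not something the post specifies.

```python
import random

# Target share of each recommendation type, taken from the post.
RATIOS = {"familiar": 0.50, "fresh": 0.30, "novel": 0.18, "hit_or_miss": 0.02}


def classify(topics, liked, disliked):
    """Return the recommendation type for a video's topic set, or None.

    Assumed reading of the four types:
      hit_or_miss: at least one liked topic and at least one disliked topic
      familiar:    every topic is liked
      fresh:       liked topics mixed with neutral ones, nothing disliked
      novel:       only neutral topics
    """
    topics = set(topics)
    has_liked = bool(topics & liked)
    has_disliked = bool(topics & disliked)
    has_neutral = bool(topics - liked - disliked)
    if has_disliked:
        # Videos with disliked topics and no liked topic are never shown.
        return "hit_or_miss" if has_liked else None
    if has_liked:
        return "fresh" if has_neutral else "familiar"
    return "novel" if has_neutral else None


def build_feed(videos, liked, disliked, size, rng=random):
    """Pick `size` videos, drawing each slot's type according to RATIOS."""
    buckets = {name: [] for name in RATIOS}
    for title, topics in videos.items():
        kind = classify(topics, liked, disliked)
        if kind is not None:
            buckets[kind].append(title)
    # Only draw from types that actually have content available.
    available = [name for name in RATIOS if buckets[name]]
    if not available:
        return []
    weights = [RATIOS[name] for name in available]
    feed = []
    for _ in range(size):
        kind = rng.choices(available, weights=weights, k=1)[0]
        feed.append((kind, rng.choice(buckets[kind])))
    return feed
```

With liked = {"cooking", "chess"} and disliked = {"politics"}, a video tagged {"cooking", "travel"} would land in (Fresh) and one tagged {"chess", "politics"} in (Hit-or-miss). The sketch samples with replacement, so a real feed would also need to skip videos already shown.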
I wonder whether others think this is a good way to handle recommendations. Do you think it is better to do it this way, as opposed to relying on machine learning or the TikTok 5-second rule?
I was inspired by this discussion specifically: https://news.ycombinator.com/item?id=24578603
Honestly, read up on the articles discussing the Netflix Prize; they tended to mention a lot of unexpected gotchas. For instance, it turns out ratings aren’t really independent. If you rate something good, it will affect how you rate the next thing. Stuff like that.
One type of recommendation system is user-user recommendation: it takes the stuff you've looked at (A), compares it to what others have seen (B), then tries to find items from B that you might like, weighting the features of each candidate item to find the one you would like most (and that would therefore get you to do the action the system wants: view, buy, etc.).
Say you view Video 1 (action), Video 3 (action) and Video 4 (comedy).
And someone else views Video 4 (comedy), Video 2 (comedy) and Video 3 (action).
If we are only using genre as a feature, the system would want to recommend you Video 2 (a comedy you haven't seen) and it would recommend the other user Video 1 (an action video they haven't seen).
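The example above can be sketched in a few lines of Python. As a simplification, this version measures similarity by overlap in viewed videos (Jaccard similarity) rather than by genre features, and the user names `you` and `other`, along with the `recommend` function, are placeholders chosen for illustration.

```python
from collections import Counter

# Viewing history from the example above; the user names are placeholders.
views = {
    "you": {"Video 1", "Video 3", "Video 4"},
    "other": {"Video 4", "Video 2", "Video 3"},
}


def jaccard(a, b):
    """Overlap between two sets of viewed videos, from 0.0 to 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(user, views, top_n=1):
    """Rank videos the user has not seen by how similar their viewers are."""
    seen = views[user]
    scores = Counter()
    for other, other_seen in views.items():
        if other == user:
            continue
        similarity = jaccard(seen, other_seen)
        # Every unseen video inherits the similarity of the user who watched it.
        for video in other_seen - seen:
            scores[video] += similarity
    return [video for video, _ in scores.most_common(top_n)]


print(recommend("you", views))    # ['Video 2']
print(recommend("other", views))  # ['Video 1']
```

With only two users the result is trivial, but the same loop extends to many users: videos watched by several similar users accumulate a higher score.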
Your system is introducing new features that could be useful.
In my opinion, recommendation systems suck for one of three reasons: people learn how to game the system and get their stuff recommended when it shouldn't be, the system doesn't have enough data, or the system just isn't great at figuring out what you like due to bad weights.
EDIT: Fully re-reading this post, it seems my post is a bit off-topic.
With that said, ML is not a dirty word, nor does it have to be some magical thing. It is applying stats to a large dataset. It's nothing fancy, nor does it have to use the new shiny thing.
ML can be as simple as: go through this list of users and videos, given these weights, what videos would each user likely like most?
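As a rough illustration of that idea, here is a sketch that scores every video for every user as a weighted sum of genre features. The users, weights and feature values are all made up for the example. In a real system the weights would be fitted from data, which is where the "learning" comes in.

```python
# Hypothetical per-user weights for each genre (higher means stronger preference).
user_weights = {
    "alice": {"action": 0.9, "comedy": 0.2},
    "bob": {"action": 0.1, "comedy": 0.8},
}

# Hypothetical videos described by how strongly they belong to each genre.
video_features = {
    "Video 1": {"action": 1.0},
    "Video 2": {"comedy": 1.0},
    "Video 5": {"action": 0.5, "comedy": 0.5},
}


def score(weights, features):
    """Weighted sum of a video's features under one user's weights."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


def best_video(user):
    """Return the video with the highest score for this user."""
    weights = user_weights[user]
    return max(video_features, key=lambda video: score(weights, video_features[video]))


for user in user_weights:
    print(user, "->", best_video(user))
```

Here `alice` gets Video 1 and `bob` gets Video 2, because those videos line up best with each user's weights.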
Instead of presenting a list of recommendations (like YouTube, Netflix or Instagram does), TikTok infers what it thinks you will like and directly shows it to you. Then based on how you react: if you like the video, follow the creator, watch it until the end, or swipe away, they know with much more accuracy the type of content you enjoy. This allows for more testing on their part too.
On the other hand, on platforms like YouTube, there are a lot of factors beyond the video itself that influence whether a person even clicks on it (like the thumbnail, the title, the number of views, who the creator is), so if someone doesn't click on a video, that doesn't tell you whether they would have liked it or not. Because of that, the data they use to make recommendations isn't as accurate.
SELECT * FROM videos WHERE NOT EXISTS (SELECT 1 FROM votes WHERE video_id = videos.id AND user_id = $user_id) AND NOT EXISTS (SELECT 1 FROM not_interested WHERE video_id = videos.id AND user_id = $user_id);
Aka don't recommend anything I've already voted on or already explicitly told you I'm not interested in. That would remove at least 98% of my recommendations.
Even without that improvement, though, YouTube is an amazing service, and the reliability is first class. I can't remember the last time it was actually down. Personally I suspect the YouTube team is just tasked with keeping things ticking over, and that they're too busy putting out fires and coping with tech debt to actually develop useful features.
1) for the user to be able to manage those percentages directly or through various pre-made profiles, maybe with some randomness mixed in;
and
2) "exploration mode" where topics with content that is "disliked but not conflicting with the core values", "disliked but they may try to convince me here" and "initially neutral, unknown or not entirely irrelevant to them" are suggested.
I also think that attempting to extract the core values as hidden variables and using them to predict, or guess from other users and external news, the future direction of their evolution can also be useful.
My question would be how do you get all of your content classified in a reasonable way?