With the continuous development of network technology and the ever-expanding scale of e-commerce, the number and variety of goods grow rapidly and users need to spend a lot of time to find the goods they want to buy. This is information overload. To solve this problem, the recommendation system came into being.
The recommendation system is a subset of the Information Filtering System, which can be used in a range of areas such as movies, music, e-commerce, and Feed stream recommendations. The recommendation system discovers the user’s personalized needs and interests by analyzing and mining user behaviors and recommends information or products that may be of interest to the user. Unlike search engines, recommendation systems do not require users to accurately describe their needs but model their historical behavior to proactively provide information that meets user interests and needs.
In this article we use PaddlePaddle, a deep learning platform from Baidu, to build a model and combine Milvus, a vector similarity search engine, to build a personalized recommendation system that can quickly and accurately provide users with information that might be of interest to them.
We take MovieLens Million Dataset (ml-1m)  as an example. The ml-1m dataset contains 1,000,000 reviews of 4,000 movies by 6,000 users, collected by the GroupLens Research lab. The original data includes feature data of the movie, user feature, and user rating of the movie, you can refer to ml-1m-README .
ml-1m dataset includes 3 .dat articles: movies.dat、users.dat and ratings.dat.movies.dat includes movie’s features, see example below:
This means that the movie id is 1, and the title is 《Toy Story》, which is divided into three categories. These three categories are animation, children, and comedy.
users.dat includes user’s features, see example below:
This means that the user ID is 1, female, and younger than 18 years old. The occupation ID is 10.
ratings.dat includes the feature of movie rating, see example below:
That is, the user 1 evaluates the movie 1193 as 5 points.
In the film personalized recommendation system, we used the Fusion Recommendation Model  which PaddlePaddle has implemented. This model is created from its industrial practice.
a. The user features incorporate four attribute information: user ID, gender, occupation, and age.
b. The movie feature incorporates three attribute information: movie ID, movie type ID, and movie name.
Combined with PaddlePaddle’s fusion recommendation model, the movie feature vector generated by the model is stored in the Milvus vector similarity search engine, and the user feature is used as the target vector to be searched. A similarity search is performed in Milvus to obtain the query result as the recommended movies for the user.
The inner product (IP) method is provided in Milvus to calculate the vector distance. After normalizing the data, the inner product similarity is consistent with the cosine similarity result in the fusion recommendation model.
There are three steps in building a movie recommendation system with Milvus, details on how to operate please refer to Milvus Bootcamp .
Running this command will generate a model recommender_system.inference.model in the directory, which can convert movie data and user data into feature vectors, and generate application data for Milvus to store and retrieve.
# Data preprocessing, -f followed by the parameter raw movie data file name
$ python get_movies_data.py -f movies_origin.txt
Running this command will generate test data movies_data.txt in the directory to achieve the pre-processing of movie data.
# Implementing personal recommender system based on user conditions
$ python infer_milvus.py -a <age>-g <gender>-j <job>[-i]
Running this command will implement personalized recommendations for specified users.
The main process is:
Prediction of the top five movies that the user is interested in:
By inputting user information and movie information to the fusion recommendation model we can get matching scores, and then sort the scores of all movies based on the user to recommend movies that may be of interest to the user. This article combines Milvus and PaddlePaddle to build a personalized movie recommendation system. Milvus, a vector search engine, is used to store all movie feature data, and then similarity retrieval is performed on user features in Milvus. The search result is the movie ranking recommended by the system to the user.
Milvus  vector similarity search engine is compatible with various deep learning platforms, searching billions of vectors with only millisecond response. You can explore more possibilities of AI applications with Milvus with ease!
 MovieLens Million Dataset (ml-1m): http://files.grouplens.org/datasets/movielens/ml-1m.zip
 ml-1m-README: http://files.grouplens.org/datasets/movielens/ml-1m-README.txt
 Fusion Recommendation Model by PaddlePaddle: https://www.paddlepaddle.org.cn/documentation/docs/zh/beginners_guide/basics/recommender_system/index.html#id7
 Milvus: https://milvus.io/en/