content_based_algo.py

class ucas_dm.prediction_algorithms.content_based_algo.ContentBasedAlgo(item_vector, dimension)[source]

Bases: ucas_dm.prediction_algorithms.base_algo.BaseAlgo

Content-based prediction algorithm

__init__(item_vector, dimension)[source]
Parameters:
  • item_vector – A pd.DataFrame containing each item’s id (integer) and its vector (list of floats), with columns | id | vector |
  • dimension – Dimensionality of the item vectors.
_build_user_model(user_ids)[source]

Calculates the user model for every user in user_ids.

Parameters: user_ids – list of user ids
Returns: A dict mapping each user’s id to that user’s vector.
static _calc_dim_average(vectors_array)[source]

Calculates the average value of each dimension of vectors_array, counting only non-zero values.

Parameters: vectors_array – np.array containing a list of vectors.
Returns: A vector holding the average value of each dimension.
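
The non-zero averaging described above can be sketched as follows. This is only an illustration in plain Python lists, not the library's implementation, which takes an np.array. The function name calc_dim_average and the choice to return 0.0 for a dimension with no non-zero entries are assumptions; the documentation does not say how that case is handled.

```python
def calc_dim_average(vectors):
    """Average each dimension over its non-zero entries only (illustrative sketch)."""
    if not vectors:
        return []
    dimension = len(vectors[0])
    averages = []
    for d in range(dimension):
        # Zeros are treated as "no value" and left out of the mean.
        non_zero = [v[d] for v in vectors if v[d] != 0]
        # Assumption: a dimension with no non-zero entries averages to 0.0.
        averages.append(sum(non_zero) / len(non_zero) if non_zero else 0.0)
    return averages


vectors = [
    [1.0, 0.0, 4.0],
    [3.0, 0.0, 0.0],
    [0.0, 0.0, 2.0],
]
result = calc_dim_average(vectors)
print(result)  # [2.0, 0.0, 3.0]
```

A plain column mean of the same input would divide every column by 3 and give a smaller first and third component, which is the difference the "non-zero only" rule makes.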
_generate_retrieval_model()[source]

Uses a retrieval model (faiss) to speed up vector indexing.

Returns: A ready-to-use retrieval model from faiss.
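
To show what the retrieval model speeds up, here is a brute-force top-k search written in plain Python as a stand-in for faiss. It is not the faiss API and not this class's code: the function name top_k_by_inner_product and the use of the inner product as the similarity score are assumptions, since the documentation does not state which index type or metric is used.

```python
import heapq


def top_k_by_inner_product(query, item_vectors, k):
    """Return the k (item_id, score) pairs with the largest inner product.

    Exhaustive search over every item; a vector index exists to avoid
    repeating this full scan for each query.
    """
    scored = (
        (sum(q * x for q, x in zip(query, vec)), item_id)
        for item_id, vec in item_vectors.items()
    )
    best = heapq.nlargest(k, scored)
    return [(item_id, score) for score, item_id in best]


# Hypothetical item vectors and user vector, for illustration only.
items = {101: [1.0, 0.0], 102: [0.6, 0.8], 103: [0.0, 1.0]}
user_vector = [0.8, 0.6]
hits = top_k_by_inner_product(user_vector, items, 2)
print([item_id for item_id, _ in hits])  # [102, 101]
```

When the vectors have unit length, ranking by inner product is the same as ranking by cosine similarity.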
static _vector_normalize(vectors_array)[source]
classmethod load(fname)[source]

See BaseAlgo.load for more details.

save(fname, ignore=None)[source]

See BaseAlgo.save for more details.

to_dict()[source]

Converts the algorithm model to a dict containing the algorithm’s type and its main hyper-parameters.

Returns: A dict containing the type and hyper-parameters.
top_k_recommend(u_id, k)[source]

See BaseAlgo.top_k_recommend for more details.

train(train_set)[source]

The main job is to calculate the user model for every user. Multiple processes are used to speed up training.

See BaseAlgo.train for more details.
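
The split-and-merge pattern behind this kind of parallel training might look like the sketch below. Every name in it (split_into_chunks, build_user_model, train, history, map_fn) is hypothetical, and the user vector is assumed to be a plain mean of the vectors of the items in the user's history; the documentation does not spell out either the training-set format or the exact aggregation.

```python
from functools import partial


def split_into_chunks(user_ids, n_chunks):
    """Split user ids into at most n_chunks roughly equal slices."""
    n_chunks = max(1, min(n_chunks, len(user_ids)))
    return [user_ids[i::n_chunks] for i in range(n_chunks)]


def build_user_model(user_ids, history, item_vectors):
    """Assumed user model: mean of the vectors of items in the user's history."""
    models = {}
    for user_id in user_ids:
        vectors = [item_vectors[item_id] for item_id in history[user_id]]
        if not vectors:
            continue  # no history, so no vector can be built for this user
        dimension = len(vectors[0])
        models[user_id] = [
            sum(v[d] for v in vectors) / len(vectors) for d in range(dimension)
        ]
    return models


def train(history, item_vectors, n_workers=2, map_fn=map):
    """Build partial user models per chunk, then merge them into one dict."""
    chunks = split_into_chunks(sorted(history), n_workers)
    worker = partial(build_user_model, history=history, item_vectors=item_vectors)
    user_model = {}
    for partial_model in map_fn(worker, chunks):
        user_model.update(partial_model)
    return user_model


# Hypothetical data: item id -> vector, user id -> ids of items the user read.
item_vectors = {1: [1.0, 0.0], 2: [0.0, 1.0], 3: [1.0, 1.0]}
history = {10: [1, 3], 11: [2], 12: [1, 2]}
user_model = train(history, item_vectors, n_workers=2)
print(user_model[10])  # [1.0, 0.5]
```

The default map_fn runs the chunks sequentially. To run them in separate processes, the map method of a multiprocessing.Pool can be passed as map_fn, with the call placed under an `if __name__ == "__main__":` guard; the worker is a functools.partial over a module-level function so that it can be pickled.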