topic_based_algo.py
-
class ucas_dm.prediction_algorithms.topic_based_algo.InitialParams(**kwargs)
Bases: object
Holds the data required to initialize TopicBasedAlgo.
-
class ucas_dm.prediction_algorithms.topic_based_algo.TopicBasedAlgo(initial_params, topic_n=100, chunksize=100, topic_type='lda', power_iters=2, extra_samples=100, passes=1)
Bases: ucas_dm.prediction_algorithms.base_algo.BaseAlgo
Content-based algorithm that uses a topic model (LSI or LDA). It follows a delegation strategy.
-
__init__(initial_params, topic_n=100, chunksize=100, topic_type='lda', power_iters=2, extra_samples=100, passes=1)
Parameters:
- initial_params – An instance of InitialParams generated by preprocess.
- topic_n – The number of latent topics to extract from the training corpus.
- chunksize – The number of documents in each training chunk.
- topic_type – Either 'lsi' or 'lda'.
- power_iters – (LSI parameter) The number of power iteration steps. More power iterations improve accuracy but slow down training.
- extra_samples – (LSI parameter) The number of extra samples used in addition to the rank k. Can improve accuracy.
- passes – (LDA parameter) The number of passes through the corpus during training.
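The split between LSI-only and LDA-only parameters above can be made explicit with a small helper. This is only a sketch and not part of ucas_dm: `select_model_kwargs` is a hypothetical name, and the returned keys assume the underlying model accepts gensim-style keyword names (`num_topics`, `chunksize`, `power_iters`, `extra_samples`, `passes`), which this page does not confirm.

```python
def select_model_kwargs(topic_type, topic_n=100, chunksize=100,
                        power_iters=2, extra_samples=100, passes=1):
    """Return only the keyword arguments relevant to the chosen topic model.

    Hypothetical helper: the key names are an assumption, not ucas_dm API.
    """
    # Parameters shared by both model types.
    kwargs = {"num_topics": topic_n, "chunksize": chunksize}
    if topic_type == "lsi":
        # LSI-only parameters.
        kwargs.update(power_iters=power_iters, extra_samples=extra_samples)
    elif topic_type == "lda":
        # LDA-only parameter.
        kwargs.update(passes=passes)
    else:
        raise ValueError("topic_type must be 'lsi' or 'lda', got %r" % (topic_type,))
    return kwargs
```

Under this reading, `power_iters` and `extra_samples` have no effect when `topic_type='lda'`, and `passes` has no effect when `topic_type='lsi'`.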
-
_generate_item_vector()
Applies the LDA or LSI model to the TF-IDF vectors and generates new item vectors.
Returns: A DataFrame containing each item id and its new vector.
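For readers unfamiliar with the TF-IDF vectors that this method takes as input, the following plain-Python sketch shows one common weighting (term frequency times log(N / document frequency)). It is an illustration only: `tfidf_vectors` is a hypothetical name, and the exact weighting and tokenization used by ucas_dm are not stated on this page.

```python
import math
from collections import Counter


def tfidf_vectors(tokenized_docs):
    """Map each document (a list of tokens) to a {term: weight} dict."""
    n_docs = len(tokenized_docs)
    # Document frequency: the number of documents that contain each term.
    doc_freq = Counter(term for doc in tokenized_docs for term in set(doc))
    vectors = []
    for doc in tokenized_docs:
        counts = Counter(doc)
        total = len(doc)
        # Weight = relative term frequency * log(N / document frequency).
        vectors.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors
```

A term that occurs in every document gets weight 0 under this formula, so only terms that distinguish documents contribute to the vectors passed on to the topic model.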
-
classmethod load(fname)
Loads a previously saved object from a file.
Parameters: fname – file path
Returns: The object loaded from the file.
-
classmethod preprocess(raw_data)
Call this method to process the raw data, which contains item ids and their content, before initializing a TopicBasedAlgo instance.
Parameters: raw_data – A pandas.DataFrame with two columns: | id | content |
Returns: An InitialParams instance, which is required to initialize TopicBasedAlgo.
-
save(fname, *args)
Saves the object to a file.
Parameters:
- fname – file path
- ignore – A set of attributes that shouldn't be saved by the superclass; a subclass may have to handle these attributes itself.
-
top_k_recommend(u_id, k)
Calculates the top-k recommended items.
Parameters:
- u_id – The user's id.
- k – The number of items the recommender should return.
Returns: (v, id), where v is a list of predicted ratings or distances and id is a list of the k highest-rated or nearest items.
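The (v, id) return shape can be illustrated with a plain-Python top-k selection over item vectors. This is a sketch under assumptions, not the library's implementation: it ranks by cosine similarity (the page only says "predict rate or distance"), `top_k_similar` and `cosine_similarity` are hypothetical names, and how the user vector is built is not described here.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_similar(user_vector, item_vectors, k):
    """Return (scores, ids) for the k items most similar to user_vector.

    item_vectors maps item id -> vector. Both lists are ordered best-first.
    """
    scored = (
        (cosine_similarity(user_vector, vec), item_id)
        for item_id, vec in item_vectors.items()
    )
    # Compare on the score only, so item ids never need to be orderable.
    best = heapq.nlargest(k, scored, key=lambda pair: pair[0])
    scores = [score for score, _ in best]
    ids = [item_id for _, item_id in best]
    return scores, ids
```

The two returned lists are aligned by position, so the score at index i belongs to the item id at index i.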
-
train(train_set)
Does train-set-dependent work, for example calculating similarities between users or items.
Parameters: train_set – A pandas.DataFrame with two attributes, user_id and item_id, which represents the users' view records over a period of time.
Returns: A model that is ready to give recommendations.
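Taken together, the methods on this page suggest the call order preprocess, constructor, train, top_k_recommend. The sketch below wires them up in that order. `recommend_for_user` is a hypothetical helper, not part of ucas_dm; it takes the algorithm class as an argument so that it does not depend on the package being importable, and it assumes that train returns the trained model, as the description above states.

```python
def recommend_for_user(algo_cls, raw_data, train_set, u_id, k, **model_options):
    """Run the documented call sequence and return (v, id) for one user.

    algo_cls is expected to behave like TopicBasedAlgo (assumption).
    """
    # 1. Turn the raw | id | content | table into an InitialParams instance.
    initial_params = algo_cls.preprocess(raw_data)
    # 2. Build the algorithm; model_options may carry topic_n, topic_type, passes, ...
    algo = algo_cls(initial_params, **model_options)
    # 3. Fit on the view records (user_id, item_id); train is documented to
    #    return a model that is ready to give recommendations.
    model = algo.train(train_set)
    # 4. Ask the trained model for the user's top-k items.
    return model.top_k_recommend(u_id, k)
```

With the package installed, a call would look like `recommend_for_user(TopicBasedAlgo, raw_data, train_set, u_id, 10, topic_type='lsi')`, where `raw_data` and `train_set` are the DataFrames described under preprocess and train.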
-