Constructing a Hierarchical Topical Concept Taxonomy of Tourist Attractions
Abstract
- A hierarchical co-clustering module that uses non-negative matrix tri-factorization to allocate attractions and things of interest (ToI) to topics when splitting a coarse topic into fine-grained subtopics.
- A concept extraction module that extracts, for every topic, concepts that retain strong discriminative power at each level of the taxonomy.
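The abstract does not say how discriminative power is measured. As a minimal sketch, assuming a topic's candidate concepts are its things of interest and that they are scored against sibling topics by a popularity times concentration heuristic (an illustrative assumption, not necessarily the scoring this method uses), ranking concepts per topic could look like this. The function name `discriminative_concepts` and the `counts` input layout are hypothetical.

```python
import numpy as np


def discriminative_concepts(counts, top_k=3):
    """Rank ToI per sibling topic by a hypothetical popularity x concentration score.

    counts: (n_topics, n_toi) array where counts[t, j] is how often ToI j occurs
    in the attractions assigned to sibling topic t (assumed input format).
    Returns the indices of the top_k ToI for each topic, best first.
    """
    counts = np.asarray(counts, dtype=float)
    # Popularity: log-scaled frequency of each ToI inside each topic.
    popularity = np.log1p(counts)
    # Concentration: share of a ToI's occurrences (across siblings) that fall in this topic.
    totals = counts.sum(axis=0, keepdims=True)
    concentration = np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)
    score = popularity * concentration
    # Sort descending by score; "stable" keeps ties in column order.
    return np.argsort(-score, axis=1, kind="stable")[:, :top_k]


# Two sibling topics over four ToI (hypothetical counts).
counts = np.array([[10, 0, 5, 5],
                   [0, 8, 5, 5]])
print(discriminative_concepts(counts, top_k=2))
```

Here ToI 0 and ToI 1 each occur in only one sibling, so they rank first for topic 0 and topic 1 respectively; ToI 2 and 3 are split evenly between the siblings, so their concentration halves their score and they fall behind.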
Mathematical Formulation
Non-negative Matrix Tri-Factorization
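The model approximates the input attraction-ToI matrix with three factor matrices that simultaneously assign cluster labels to tourist attractions and things of interest (ToI), by solving the following optimization problem:

$$\min_{U \ge 0,\; H \ge 0,\; V \ge 0} \; \|X - U H V^{\top}\|_F^2 \quad \text{s.t.} \quad U^{\top}U = I,\; V^{\top}V = I$$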
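where $X \in \mathbb{R}_{+}^{m\times n}$ is the input attraction-ToI content matrix (rows are the $m$ attractions, columns are the $n$ things of interest), $c$ is the number of clusters, and $U \in \mathbb{R}_{+}^{m\times c}$ and $V \in \mathbb{R}_{+}^{n\times c}$ are orthogonal nonnegative matrices giving low-dimensional representations of attractions and things of interest, respectively. Together, the orthogonality and nonnegativity constraints on $U$ and $V$ force a hard cluster assignment: two nonnegative columns can be orthogonal only if their nonzero entries never overlap, so each row of $U$ (and of $V$) has at most one nonzero entry, and the column holding it is the cluster label of that attraction (or thing of interest). $H \in \mathbb{R}_{+}^{c\times c}$ provides a condensed, cluster-level view of $X$.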
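The text does not state how the optimization problem is solved. A minimal sketch, assuming the multiplicative update rules for orthogonal NMTF from Ding et al. (KDD 2006), follows; these updates push $U$ and $V$ toward orthogonality rather than enforcing it exactly, so hard labels are read off with a row-wise argmax. The names `ortho_nmtf` and `hard_labels`, the iteration count, and the toy matrix are illustrative assumptions.

```python
import numpy as np


def ortho_nmtf(X, c, n_iter=300, eps=1e-9, seed=0):
    """Approximate nonnegative X (m x n) as U @ H @ V.T with U, H, V >= 0.

    Sketch of orthogonal NMTF multiplicative updates (Ding et al., 2006);
    orthogonality of U and V is encouraged by the updates, not enforced exactly.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    # Strictly positive random start; multiplicative updates keep factors nonnegative.
    U = rng.random((m, c)) + eps
    V = rng.random((n, c)) + eps
    H = rng.random((c, c)) + eps
    for _ in range(n_iter):
        # U <- U * sqrt(X V H^T / (U U^T X V H^T))
        XVHt = X @ V @ H.T
        U *= np.sqrt(XVHt / (U @ (U.T @ XVHt) + eps))
        # V <- V * sqrt(X^T U H / (V V^T X^T U H))
        XtUH = X.T @ U @ H
        V *= np.sqrt(XtUH / (V @ (V.T @ XtUH) + eps))
        # H <- H * sqrt(U^T X V / (U^T U H V^T V))
        UtXV = U.T @ X @ V
        H *= np.sqrt(UtXV / (U.T @ U @ H @ V.T @ V + eps))
    return U, H, V


def hard_labels(U, V):
    """Hard cluster label per attraction (row of U) and per ToI (row of V)."""
    return U.argmax(axis=1), V.argmax(axis=1)


# Hypothetical toy attraction-ToI matrix with two blocks:
# attractions 0-2 mention ToI 0-1, attractions 3-5 mention ToI 2-4.
X = np.zeros((6, 5))
X[:3, :2] = 1.0
X[3:, 2:] = 1.0

U, H, V = ortho_nmtf(X, c=2)
attraction_labels, toi_labels = hard_labels(U, V)
print(attraction_labels, toi_labels)
```

On a clean two-block matrix like this one would expect the labels to separate the blocks, but multiplicative updates only reach a local optimum, so in practice several random seeds are typically tried and the factorization with the lowest reconstruction error $\|X - UHV^{\top}\|_F$ is kept.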