Constructing Topical Concept Hierarchical Taxonomy of Tourist Attraction

Abstract

  • A hierarchical co-clustering module that uses non-negative matrix tri-factorization to allocate attractions and things of interest to topics when splitting a coarse topic into fine-grained ones.
  • A concept extraction module that extracts, for every topic, concepts that maintain strong discriminative power at different levels of the taxonomy.

Theoretical Mathematical Formulation

Non-negative Matrix Tri-Factorization

The model approximates the input attraction-ToI matrix with three factor matrices that simultaneously assign cluster labels to tourist attractions and Things of Interest (ToI), by solving the following optimization problem:

$$\min_{U, H, V \ge 0} \; \lVert X - U H V^{\top} \rVert_F^2 \quad \text{s.t.} \quad U^{\top} U = I, \; V^{\top} V = I$$

where $X \in \mathbb{R}_{+}^{m \times n}$ is the input attraction-ToI content matrix, and $U \in \mathbb{R}_{+}^{m \times c}$ and $V \in \mathbb{R}_{+}^{n \times c}$ are orthogonal nonnegative matrices giving low-dimensional representations of attractions and things of interest, respectively. Together, the orthogonality and nonnegativity constraints on $U$ and $V$ leave at most one nonzero entry in each row, so the model produces a hard assignment of cluster labels for attractions and things of interest. $H \in \mathbb{R}_{+}^{c \times c}$ provides a condensed view of $X$.
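
The factorization can be illustrated with a small NumPy sketch. It assumes the objective $\lVert X - U H V^{\top} \rVert_F^2$ and uses the multiplicative update rules commonly associated with orthogonal non-negative tri-factorization; the post does not state which solver it uses, so the function name `nmtf`, the toy matrix, and all parameter values here are illustrative assumptions. Orthogonality is only encouraged by these updates, not enforced exactly, so cluster labels are read off with an argmax over each row.

```python
import numpy as np

def nmtf(X, c, n_iter=200, eps=1e-9, seed=0):
    """Approximate X (m x n, nonnegative) as U @ H @ V.T with c clusters.

    Hypothetical helper: multiplicative updates that keep all factors
    nonnegative and push U and V toward orthogonal columns.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    U = rng.random((m, c)) + eps   # attraction factors
    V = rng.random((n, c)) + eps   # ToI factors
    H = rng.random((c, c)) + eps   # condensed cluster-to-cluster view
    for _ in range(n_iter):
        XVHt = X @ V @ H.T
        U *= np.sqrt(XVHt / (U @ (U.T @ XVHt) + eps))
        XtUH = X.T @ U @ H
        V *= np.sqrt(XtUH / (V @ (V.T @ XtUH) + eps))
        UtXV = U.T @ X @ V
        H *= np.sqrt(UtXV / (U.T @ U @ H @ V.T @ V + eps))
    return U, H, V

# Toy attraction-ToI matrix (made-up counts): 4 attractions x 4 ToIs.
X = np.array([[5.0, 4.0, 0.0, 0.0],
              [4.0, 5.0, 0.0, 0.0],
              [0.0, 0.0, 3.0, 4.0],
              [0.0, 0.0, 4.0, 3.0]])

U, H, V = nmtf(X, c=2)
attraction_labels = U.argmax(axis=1)  # hard cluster label per attraction
toi_labels = V.argmax(axis=1)         # hard cluster label per ToI
error = np.linalg.norm(X - U @ H @ V.T)
```

Taking the argmax of each row turns the near-orthogonal factors into the hard assignment described above, and `H` summarizes how strongly each attraction cluster is associated with each ToI cluster.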


Simulation Analysis of Hedging with Stock Index Futures

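For institutional investors such as securities firms, establishing a complete hedging process and hedging strategy is an important part of improving investment returns and avoiding risk. Compared with spot stocks, the two most important features of stock index futures are leverage and short selling. Leverage improves capital efficiency, and short selling makes hedging possible.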
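
A short numerical sketch may make the two features concrete. It uses the textbook beta-weighted hedge (number of contracts = beta × portfolio value / (futures price × contract multiplier)), which the excerpt itself does not spell out. Every number below, including the portfolio value, beta, futures price, multiplier, and margin rate, is a hypothetical input, and the sketch assumes the futures price moves one-for-one with the index and that the portfolio tracks the index through its beta.

```python
def hedge_contracts(portfolio_value, beta, futures_price, multiplier):
    """Number of futures contracts to short for a beta-weighted hedge."""
    return round(beta * portfolio_value / (futures_price * multiplier))

def hedged_pnl(portfolio_value, beta, futures_price, multiplier,
               n_short, market_return):
    """Profit and loss of the stock portfolio plus the short futures position."""
    stock_pnl = portfolio_value * beta * market_return
    futures_pnl = -n_short * futures_price * multiplier * market_return
    return stock_pnl + futures_pnl

# Hypothetical inputs, chosen only for illustration.
portfolio_value = 10_000_000   # value of the stock portfolio
beta = 1.2                     # portfolio beta against the index
futures_price = 4000           # index futures price in points
multiplier = 300               # currency units per index point
margin_rate = 0.12             # assumed margin requirement

n_short = hedge_contracts(portfolio_value, beta, futures_price, multiplier)

# Leverage: only the margin is posted, not the full notional value.
notional = n_short * futures_price * multiplier
margin = notional * margin_rate

# Short selling: a 5% market fall is offset by the short futures position.
net = hedged_pnl(portfolio_value, beta, futures_price, multiplier,
                 n_short, market_return=-0.05)
```

Under these assumptions the loss on the stocks and the gain on the short futures cancel, while the margin posted is only a fraction of the hedged notional. In practice basis risk, beta estimation error, and contract rounding leave a residual exposure.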


Study Notes of LaTeX

Study notes on LaTeX, covering special symbols, common usage, font settings, spacing, figures, tables, floats, math formulas, and bibliographies with BibTeX.


Convolutional Neural Networks

Attention

  • This article is intended for readers who already understand backpropagation (BP) neural networks and want to go on to convolutional neural networks (CNNs). Readers who find it difficult may want to study BP first and then come back.

Insight into Word2Vec

A comprehensive understanding of Word2Vec, including background, development, and formula derivation

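Natural language processing (NLP) is an important direction in computer science and artificial intelligence. It studies the theories and methods that enable effective communication between people and computers in natural language. NLP is a science that combines linguistics, computer science, and mathematics. Research in this field therefore involves natural language, that is, the language people use every day, so it is closely related to linguistics while differing from it in important ways. NLP does not study natural language in general; it aims to develop computer systems, and in particular software systems, that can communicate effectively in natural language. It is therefore a part of computer science.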

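The most basic part of NLP is enabling computers to recognize human language, which is why word vectors came into being. As the name suggests, a word vector represents a word in the form of a vector.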
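
To make "a word as a vector" concrete, the sketch below contrasts one-hot vectors with small dense vectors and compares them by cosine similarity. The vocabulary and the dense values are hand-picked for illustration only; they are not the output of Word2Vec or any trained model.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

vocab = ["king", "queen", "apple"]

# One-hot: each word gets its own axis, so all distinct words are orthogonal.
one_hot = {w: np.eye(len(vocab))[i] for i, w in enumerate(vocab)}

# Dense vectors (hypothetical values): related words can point the same way.
dense = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.8, 0.9, 0.1]),
    "apple": np.array([0.1, 0.0, 0.9]),
}

one_hot_sim = cosine(one_hot["king"], one_hot["queen"])
dense_related = cosine(dense["king"], dense["queen"])
dense_unrelated = cosine(dense["king"], dense["apple"])
```

One-hot vectors carry no notion of similarity, since every pair of distinct words has cosine similarity zero. Dense vectors can place related words close together, which is the property that learned embeddings aim for.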
