LAMBADA Method: How to use Data Augmentation in NLU?
Not Enough Data? Deep Learning to the Rescue!

Frontiers of Natural Language Processing

Natural language processing (NLP) gives semantic meaning to information extracted from documents, including text produced by optical character recognition (OCR) engines. In this article, we move from a review of the recent history of NLP toward a deeper look at how NLP supports information understanding.

We will look at the history, the biggest open problems, and frontier methodologies.


Constructing Topical Concept Hierarchical Taxonomy of Tourist Attraction

Abstract

  • A hierarchical co-clustering module that uses non-negative matrix tri-factorization to allocate attractions and things of interest (ToI) to topics when splitting a coarse topic into fine-grained ones.
  • A concept extraction module that extracts, for every topic, concepts that maintain strong discriminative power at different levels of the taxonomy.

Theoretical Mathematical Formulation

Non-negative Matrix Factorization

The model approximates the input attraction-ToI matrix with three factor matrices that simultaneously assign cluster labels to tourist attractions and Things of Interest (ToI), by solving the following optimization problem:
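A standard way to write this objective is the orthogonal non-negative matrix tri-factorization form shown here. This is a sketch that assumes $X$ has $m$ attractions as rows and $n$ things of interest as columns; the full article may include additional terms or constraints.

$$
\min_{U \ge 0,\; H \ge 0,\; V \ge 0} \; \left\lVert X - U H V^{\top} \right\rVert_F^2
\quad \text{s.t.} \quad U^{\top} U = I, \;\; V^{\top} V = I
$$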

where $X$ is the input attraction-word content matrix, and $U \in \mathbb{R}^{m \times c}_{+}$ and $V \in \mathbb{R}^{n \times c}_{+}$ are orthogonal non-negative matrices giving low-dimensional representations of attractions and things of interest, respectively. The orthogonality and non-negativity constraints on $U$ and $V$ force the model to produce a hard cluster assignment for attractions and things of interest. $H \in \mathbb{R}^{c \times c}_{+}$ provides a condensed view of $X$.
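To make the roles of $U$, $H$ and $V$ concrete, here is a minimal NumPy sketch of orthogonal non-negative matrix tri-factorization. It uses the commonly cited multiplicative update rules for this family of models, which may not match the solver used in the article. The function name `orthogonal_nmtf`, the toy matrix, and the parameters `n_iter` and `eps` are all assumptions made for illustration.

```python
import numpy as np


def orthogonal_nmtf(X, c, n_iter=200, eps=1e-9, seed=0):
    """Approximate X (m x n, non-negative) as U @ H @ V.T.

    U is m x c, V is n x c, H is c x c. Multiplicative updates keep
    every factor non-negative; orthogonality is only encouraged by the
    update rule, not enforced exactly.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    U = rng.random((m, c)) + eps
    V = rng.random((n, c)) + eps
    H = rng.random((c, c)) + eps

    for _ in range(n_iter):
        # Update row (attraction) factor U.
        XVHt = X @ V @ H.T
        U *= np.sqrt(XVHt / (U @ (U.T @ XVHt) + eps))
        # Update column (thing-of-interest) factor V.
        XtUH = X.T @ U @ H
        V *= np.sqrt(XtUH / (V @ (V.T @ XtUH) + eps))
        # Update the c x c summary matrix H.
        UtXV = U.T @ X @ V
        H *= np.sqrt(UtXV / ((U.T @ U) @ H @ (V.T @ V) + eps))

    return U, H, V


# Toy attraction-by-ToI matrix (made-up counts, not data from the article).
X = np.array(
    [[5, 4, 0, 0],
     [4, 5, 0, 0],
     [0, 0, 3, 4],
     [0, 0, 4, 3]],
    dtype=float,
)

U, H, V = orthogonal_nmtf(X, c=2)

# Hard cluster labels: the largest entry in each row of U and V.
attraction_labels = U.argmax(axis=1)
toi_labels = V.argmax(axis=1)
print(attraction_labels, toi_labels)
```

Taking the `argmax` of each row is how the near-orthogonal, non-negative factors are read as a hard assignment: each row of `U` or `V` ends up dominated by a single cluster column.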


Convolutional Neural Networks

Attention

  • This article is intended for readers who already understand backpropagation (BP) neural networks and want to go on to study convolutional neural networks (CNNs). Readers who find it difficult can study BP on their own first and then return to this article.

Insight into Word2Vec

A comprehensive understanding of Word2Vec, including background, development, and formula derivation

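Natural language processing is an important direction within computer science and artificial intelligence. It studies the theories and methods that enable effective communication between humans and computers in natural language. It is a discipline that brings together linguistics, computer science, and mathematics. Research in this field involves natural language, that is, the language people use every day, so it is closely related to linguistics, yet it differs from linguistics in important ways. Natural language processing does not study natural language in general. Instead, it aims to build computer systems, and especially software systems, that can communicate effectively in natural language. It is therefore a part of computer science.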

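The most basic part of natural language processing is enabling computers to recognize human language, and word vectors emerged to meet that need. As the name suggests, a word vector represents a word in the form of a vector.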
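As a small illustration of what representing a word as a vector means, the sketch below contrasts one-hot vectors with dense vectors. The vocabulary and the dense values are made up for illustration and are not the output of a trained Word2Vec model; `cosine` is a helper defined here.

```python
import numpy as np

vocab = ["king", "queen", "apple"]

# One-hot vectors: one dimension per word, so every pair of distinct
# words is orthogonal and carries no notion of similarity.
one_hot = {word: np.eye(len(vocab))[i] for i, word in enumerate(vocab)}

# Dense vectors (hypothetical hand-picked values): related words are
# placed close together in a shared low-dimensional space.
dense = {
    "king": np.array([0.8, 0.6, 0.1]),
    "queen": np.array([0.7, 0.7, 0.1]),
    "apple": np.array([0.1, 0.0, 0.9]),
}


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


print(cosine(one_hot["king"], one_hot["queen"]))
print(cosine(dense["king"], dense["queen"]))
print(cosine(dense["king"], dense["apple"]))
```

With one-hot vectors, every pair of different words has similarity zero, whereas dense vectors can place "king" nearer to "queen" than to "apple". Models such as Word2Vec learn dense vectors of this kind from text.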
