Frontiers of Natural Language Processing

Natural language processing (NLP) is the set of techniques for attaching meaning to text, whether that text is written in documents or extracted by optical character recognition (OCR) engines. In this article, we move from a review of the recent history of NLP towards a deeper look at how NLP turns raw text into information we can understand.

We will look at the recent history of the field, its biggest open problems, and the methods at its frontiers.

A Review of the Recent History of NLP

2001 • Neural language models

  • First neural language models: feed-forward neural networks that take the previous n words as input
  • The initial look-up layer is commonly known as the word embedding matrix, since each word corresponds to one vector (see the sketch after this list)
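
For intuition, here is a minimal PyTorch sketch of such a feed-forward language model. The class name, vocabulary size, context length, and layer sizes are illustrative assumptions, not details from the original work.

```python
import torch
import torch.nn as nn

class FeedForwardLM(nn.Module):
    """Feed-forward language model: predict the next word from the previous n words."""
    def __init__(self, vocab_size, emb_dim=64, context_size=4, hidden_dim=128):
        super().__init__()
        # Look-up layer: one embedding vector per word (the word embedding matrix).
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.hidden = nn.Linear(context_size * emb_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, context_ids):
        # context_ids: (batch, context_size) indices of the n previous words
        embs = self.embed(context_ids)          # (batch, n, emb_dim)
        flat = embs.view(embs.size(0), -1)      # concatenate the n word vectors
        h = torch.tanh(self.hidden(flat))
        return self.out(h)                      # unnormalized scores for the next word

# Score the next word given a batch of 4-word contexts.
model = FeedForwardLM(vocab_size=10000)
logits = model(torch.randint(0, 10000, (2, 4)))  # shape: (2, 10000)
```

The embedding layer here is exactly the look-up matrix mentioned above: row i of its weight matrix is the vector for word i.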

2013 • Word embeddings

  • Main innovation: pretraining the word embedding look-up matrix on a large unlabeled corpus
  • Popularized by word2vec, an efficient approximation to language modeling
  • word2vec comes in two variants: skip-gram and CBOW (both sketched after this list)
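
To make the two variants concrete, below is a schematic PyTorch sketch (not the original word2vec C implementation): skip-gram predicts a context word from the center word, while CBOW predicts the center word from the averaged context. Negative sampling and the other tricks that make word2vec efficient are omitted, and all names and dimensions are illustrative.

```python
import torch
import torch.nn as nn

class SkipGram(nn.Module):
    """Skip-gram: predict a context word given the center word."""
    def __init__(self, vocab_size, dim=100):
        super().__init__()
        self.in_embed = nn.Embedding(vocab_size, dim)           # the word embedding matrix
        self.out_embed = nn.Linear(dim, vocab_size, bias=False)

    def forward(self, center_ids):                              # (batch,)
        return self.out_embed(self.in_embed(center_ids))        # scores over context words

class CBOW(nn.Module):
    """CBOW: predict the center word from the average of its context embeddings."""
    def __init__(self, vocab_size, dim=100):
        super().__init__()
        self.in_embed = nn.Embedding(vocab_size, dim)
        self.out_embed = nn.Linear(dim, vocab_size, bias=False)

    def forward(self, context_ids):                             # (batch, window)
        mean = self.in_embed(context_ids).mean(dim=1)           # average the context vectors
        return self.out_embed(mean)                             # scores for the center word

sg_scores = SkipGram(10000)(torch.randint(0, 10000, (8,)))      # (8, 10000)
cbow_scores = CBOW(10000)(torch.randint(0, 10000, (8, 4)))      # (8, 10000)
```

After training, only the input embedding matrix (`in_embed.weight`) is kept as the pretrained word embeddings.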

2013 • Neural networks for NLP

  • Recurrent neural networks
    • Long short-term memory (LSTM) networks are the model of choice
  • Convolutional neural networks
    • Focus on local features
    • Can be extended with wider receptive fields (dilated convolutions) to capture wider context, as sketched after this list
    • Convolutions can be used to speed up an LSTM
  • Recursive neural networks
    • Natural language is inherently hierarchical
    • Treat the input as a tree rather than as a sequence
    • LSTMs can also be extended to operate over trees
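
As a rough illustration of the recurrent and convolutional flavours above, this PyTorch sketch encodes the same sentence once with an LSTM and once with a dilated 1D convolution; all sizes are arbitrary, and recursive (tree-structured) models are not shown.

```python
import torch
import torch.nn as nn

vocab_size, emb_dim, hidden = 10000, 64, 128
embed = nn.Embedding(vocab_size, emb_dim)

# Recurrent encoder: an LSTM reads the sentence word by word.
lstm = nn.LSTM(emb_dim, hidden, batch_first=True)

# Convolutional encoder: local features; dilation=2 widens the receptive field.
conv = nn.Conv1d(emb_dim, hidden, kernel_size=3, dilation=2, padding=2)

tokens = torch.randint(0, vocab_size, (1, 12))            # one sentence of 12 word ids
x = embed(tokens)                                          # (1, 12, emb_dim)

lstm_states, _ = lstm(x)                                   # (1, 12, hidden): one state per word
conv_states = conv(x.transpose(1, 2)).transpose(1, 2)      # (1, 12, hidden): local n-gram features
```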

2014 • Sequence-to-sequence models

General framework for applying neural networks to tasks where the output is a sequence (a minimal sketch follows the list below)

  • Typically RNN-based, but other encoders and decoders can be used
  • New architectures mainly coming out of work in machine translation
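
A minimal encoder-decoder sketch in PyTorch, assuming both sides are single-layer LSTMs and training uses teacher forcing; real systems add attention, beam search, subword vocabularies, and more. Class and variable names are illustrative.

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Encoder-decoder: compress the source sequence, then generate the target sequence."""
    def __init__(self, src_vocab, tgt_vocab, dim=128):
        super().__init__()
        self.src_embed = nn.Embedding(src_vocab, dim)
        self.tgt_embed = nn.Embedding(tgt_vocab, dim)
        self.encoder = nn.LSTM(dim, dim, batch_first=True)
        self.decoder = nn.LSTM(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, tgt_vocab)

    def forward(self, src_ids, tgt_ids):
        # Encode the source and keep only its final hidden/cell state.
        _, state = self.encoder(self.src_embed(src_ids))
        # Decode the target conditioned on that state (teacher forcing).
        dec_states, _ = self.decoder(self.tgt_embed(tgt_ids), state)
        return self.out(dec_states)      # per-step scores over the target vocabulary

model = Seq2Seq(src_vocab=8000, tgt_vocab=8000)
logits = model(torch.randint(0, 8000, (2, 10)),  # source: 2 sentences of 10 tokens
               torch.randint(0, 8000, (2, 9)))   # target: 2 sentences of 9 tokens
```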

2015 • Attention

One of the core innovations in Neural Machine Translation

  • Computes a weighted average of the source-sentence hidden states
  • Mitigates the bottleneck of compressing the source sentence into a single vector (see the sketch below)
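
At its core, attention is a softmax-weighted average. Here is a sketch of simple dot-product attention over encoder hidden states in PyTorch; dimensions are arbitrary, and real NMT systems often use learned scoring functions (additive or bilinear attention) instead of a plain dot product.

```python
import torch

def attention(decoder_state, encoder_states):
    """Return a weighted average of encoder states, weighted by relevance to the decoder state."""
    # decoder_state:  (batch, dim)          current decoder hidden state
    # encoder_states: (batch, src_len, dim) one hidden state per source word
    scores = torch.bmm(encoder_states, decoder_state.unsqueeze(-1)).squeeze(-1)  # (batch, src_len)
    weights = torch.softmax(scores, dim=-1)           # attention distribution over source words
    context = torch.bmm(weights.unsqueeze(1), encoder_states).squeeze(1)         # (batch, dim)
    return context, weights

context, weights = attention(torch.randn(2, 128), torch.randn(2, 7, 128))
```

The resulting context vector is what spares the model from squeezing the whole source sentence into one fixed vector.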

2018 • Pretrained language models

  • Language models pretrained on a large corpus capture a lot of additional information
  • Language model embeddings can be used as features in a target model, or a language model can be fine-tuned on target task data (both options are sketched below)
  • Enables learning models with significantly less data
  • Additional benefit: Language models only require unlabeled data
  • Enables application to low-resource languages where labeled data is scarce
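
The two ways of using a pretrained language model can be sketched generically in PyTorch. `ToyLM` below is only a stand-in for a real pretrained encoder (for example an ELMo- or ULMFiT-style model); loading actual pretrained weights and the full training loop are omitted, and all names and sizes are assumptions.

```python
import torch
import torch.nn as nn

class ToyLM(nn.Module):
    """Stand-in for a pretrained language-model encoder (would be loaded with pretrained weights)."""
    def __init__(self, vocab_size=10000, dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.lstm = nn.LSTM(dim, dim, batch_first=True)

    def forward(self, token_ids):
        states, _ = self.lstm(self.embed(token_ids))
        return states                                    # (batch, len, dim) contextual vectors

pretrained_lm = ToyLM()                 # in practice: load pretrained weights here
classifier = nn.Linear(128, 2)          # small task-specific head (e.g. sentiment)

# Option 1: use the LM as a frozen feature extractor -- train only the head.
for p in pretrained_lm.parameters():
    p.requires_grad = False
optimizer = torch.optim.Adam(classifier.parameters(), lr=1e-3)

# Option 2: fine-tune -- unfreeze the LM and update it together with the head.
for p in pretrained_lm.parameters():
    p.requires_grad = True
optimizer = torch.optim.Adam(
    list(pretrained_lm.parameters()) + list(classifier.parameters()), lr=1e-4)

tokens = torch.randint(0, 10000, (4, 20))
logits = classifier(pretrained_lm(tokens).mean(dim=1))   # pool over time, then classify
```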

The biggest open problems in NLP

Problem 1: Natural Language Understanding and Reasoning

  • Almost none of our current models have “real” understanding
  • Models should incorporate common sense

Problem 2: NLP for low-resource scenarios

  • Generalization beyond the training data
  • Domain-transfer, transfer learning, multi-task learning
  • Learning from small amounts of data
  • Unsupervised learning; semi-supervised, weakly-supervised, “Wiki-ly” supervised, distantly-supervised, lightly-supervised, and minimally-supervised learning

Problem 3: Datasets, problems and evaluation

Perhaps the biggest problem is to properly define the problems themselves. By properly defining a problem, I mean building datasets and evaluation procedures that are appropriate for measuring our progress towards concrete goals. Things would be easier if we could reduce everything to Kaggle-style competitions!


Frontiers of Natural Language Processing

Neural networks for NLP

  • Can be extended with wider receptive fields (dilated convolutions) to capture wider context [Kalchbrenner et al., ’17]
  • CNNs and LSTMs can be combined and stacked [Wang et al., ACL ’16]
  • Convolutions can be used to speed up an LSTM [Bradbury et al., ICLR ’17]
  • CNNs over a graph (trees), e.g. graph-convolutional neural networks [Bastings et al., EMNLP ’17]

Sequence-to-sequence models

  • Deep LSTM [Wu et al., ’16]
  • Convolutional encoders [Kalchbrenner et al., arXiv ’16; Gehring et al., arXiv ’17]
  • Transformer [Vaswani et al., NIPS ’17]
  • Combination of LSTM and Transformer [Chen et al., ACL ’18]

Attention

  • Different forms of attention available [Luong et al., EMNLP ’15]
  • Constituency parsing [Vinyals et al., NIPS ’15]
  • Reading comprehension [Hermann et al., NIPS ’15]
  • One-shot learning [Vinyals et al., NIPS ’16]
  • Image captioning [Xu et al., ICML ’15]
  • Used in the Transformer [Vaswani et al., NIPS ’17], the state-of-the-art architecture for machine translation

Pretrained language models

  • Language model embeddings can be used as features in a target model [Peters et al., NAACL ’18]
  • Can be fine-tuned on target task data [Howard & Ruder, ACL ’18]

Author: Haojun (Vincent) Gao
Posted on: 2020-12-25
Updated on: 2022-02-22
