Frontiers of Natural Language Processing

Natural language processing (NLP) is the set of techniques for attaching meaning to text, whether that text is written in documents or extracted by optical character recognition (OCR) engines. In this article, we move from a review of the recent history of NLP towards a deeper look at how NLP turns raw text into information we can understand.

We will look at the recent history of the field, its biggest open problems, and the methods at its frontiers.

A Review of the Recent History of NLP

2001 • Neural language models

  • First neural language models: feed-forward neural networks that take the previous n words as input
  • The initial look-up layer is commonly known as the word embedding matrix, since each word corresponds to one vector (see the sketch after this list)
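
For intuition, here is a minimal PyTorch sketch of such a feed-forward language model. The class name, vocabulary size, context length, and layer sizes are illustrative assumptions, not details from the original work.

```python
import torch
import torch.nn as nn

class FeedForwardLM(nn.Module):
    """Feed-forward language model: predict the next word from the previous n words."""
    def __init__(self, vocab_size, emb_dim=64, context_size=4, hidden_dim=128):
        super().__init__()
        # Look-up layer: one embedding vector per word (the word embedding matrix).
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.hidden = nn.Linear(context_size * emb_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, context_ids):
        # context_ids: (batch, context_size) indices of the n previous words
        embs = self.embed(context_ids)          # (batch, n, emb_dim)
        flat = embs.view(embs.size(0), -1)      # concatenate the n word vectors
        h = torch.tanh(self.hidden(flat))
        return self.out(h)                      # unnormalized scores for the next word

# Score the next word given a batch of 4-word contexts.
model = FeedForwardLM(vocab_size=10000)
logits = model(torch.randint(0, 10000, (2, 4)))  # shape: (2, 10000)
```

The embedding layer here is exactly the look-up matrix mentioned above: row i of its weight matrix is the vector for word i.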

2013 • Word embeddings

  • Main innovation: pretraining the word embedding look-up matrix on a large unlabeled corpus
  • Popularized by word2vec, an efficient approximation to language modeling
  • word2vec comes in two variants: skip-gram and CBOW (both sketched after this list)
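
To make the two variants concrete, below is a schematic PyTorch sketch (not the original word2vec C implementation): skip-gram predicts a context word from the center word, while CBOW predicts the center word from the averaged context. Negative sampling and the other tricks that make word2vec efficient are omitted, and all names and dimensions are illustrative.

```python
import torch
import torch.nn as nn

class SkipGram(nn.Module):
    """Skip-gram: predict a context word given the center word."""
    def __init__(self, vocab_size, dim=100):
        super().__init__()
        self.in_embed = nn.Embedding(vocab_size, dim)           # the word embedding matrix
        self.out_embed = nn.Linear(dim, vocab_size, bias=False)

    def forward(self, center_ids):                              # (batch,)
        return self.out_embed(self.in_embed(center_ids))        # scores over context words

class CBOW(nn.Module):
    """CBOW: predict the center word from the average of its context embeddings."""
    def __init__(self, vocab_size, dim=100):
        super().__init__()
        self.in_embed = nn.Embedding(vocab_size, dim)
        self.out_embed = nn.Linear(dim, vocab_size, bias=False)

    def forward(self, context_ids):                             # (batch, window)
        mean = self.in_embed(context_ids).mean(dim=1)           # average the context vectors
        return self.out_embed(mean)                             # scores for the center word

sg_scores = SkipGram(10000)(torch.randint(0, 10000, (8,)))      # (8, 10000)
cbow_scores = CBOW(10000)(torch.randint(0, 10000, (8, 4)))      # (8, 10000)
```

After training, only the input embedding matrix (`in_embed.weight`) is kept as the pretrained word embeddings.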

2013 • Neural networks for NLP

  • Recurrent neural networks
    • Long short-term memory (LSTM) networks are the model of choice
  • Convolutional neural networks
    • Focus on local features
    • Can be extended with wider receptive fields (dilated convolutions) to capture wider context, as sketched after this list
    • Convolutions can be used to speed up an LSTM
  • Recursive neural networks
    • Natural language is inherently hierarchical
    • Treat the input as a tree rather than as a sequence
    • LSTMs can also be extended to operate over trees
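
As a rough illustration of the recurrent and convolutional flavours above, this PyTorch sketch encodes the same sentence once with an LSTM and once with a dilated 1D convolution; all sizes are arbitrary, and recursive (tree-structured) models are not shown.

```python
import torch
import torch.nn as nn

vocab_size, emb_dim, hidden = 10000, 64, 128
embed = nn.Embedding(vocab_size, emb_dim)

# Recurrent encoder: an LSTM reads the sentence word by word.
lstm = nn.LSTM(emb_dim, hidden, batch_first=True)

# Convolutional encoder: local features; dilation=2 widens the receptive field.
conv = nn.Conv1d(emb_dim, hidden, kernel_size=3, dilation=2, padding=2)

tokens = torch.randint(0, vocab_size, (1, 12))            # one sentence of 12 word ids
x = embed(tokens)                                          # (1, 12, emb_dim)

lstm_states, _ = lstm(x)                                   # (1, 12, hidden): one state per word
conv_states = conv(x.transpose(1, 2)).transpose(1, 2)      # (1, 12, hidden): local n-gram features
```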

2014 • Sequence-to-sequence models

General framework for applying neural networks to tasks where the output is a sequence (a minimal sketch follows the list below)

  • Typically RNN-based, but other encoders and decoders can be used
  • New architectures mainly coming out of work in machine translation
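
A minimal encoder-decoder sketch in PyTorch, assuming both sides are single-layer LSTMs and training uses teacher forcing; real systems add attention, beam search, subword vocabularies, and more. Class and variable names are illustrative.

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Encoder-decoder: compress the source sequence, then generate the target sequence."""
    def __init__(self, src_vocab, tgt_vocab, dim=128):
        super().__init__()
        self.src_embed = nn.Embedding(src_vocab, dim)
        self.tgt_embed = nn.Embedding(tgt_vocab, dim)
        self.encoder = nn.LSTM(dim, dim, batch_first=True)
        self.decoder = nn.LSTM(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, tgt_vocab)

    def forward(self, src_ids, tgt_ids):
        # Encode the source and keep only its final hidden/cell state.
        _, state = self.encoder(self.src_embed(src_ids))
        # Decode the target conditioned on that state (teacher forcing).
        dec_states, _ = self.decoder(self.tgt_embed(tgt_ids), state)
        return self.out(dec_states)      # per-step scores over the target vocabulary

model = Seq2Seq(src_vocab=8000, tgt_vocab=8000)
logits = model(torch.randint(0, 8000, (2, 10)),  # source: 2 sentences of 10 tokens
               torch.randint(0, 8000, (2, 9)))   # target: 2 sentences of 9 tokens
```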

2015 • Attention

One of the core innovations in Neural Machine Translation

  • Computes a weighted average of the source-sentence hidden states
  • Mitigates the bottleneck of compressing the source sentence into a single vector (see the sketch below)
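
At its core, attention is a softmax-weighted average. Here is a sketch of simple dot-product attention over encoder hidden states in PyTorch; dimensions are arbitrary, and real NMT systems often use learned scoring functions (additive or bilinear attention) instead of a plain dot product.

```python
import torch

def attention(decoder_state, encoder_states):
    """Return a weighted average of encoder states, weighted by relevance to the decoder state."""
    # decoder_state:  (batch, dim)          current decoder hidden state
    # encoder_states: (batch, src_len, dim) one hidden state per source word
    scores = torch.bmm(encoder_states, decoder_state.unsqueeze(-1)).squeeze(-1)  # (batch, src_len)
    weights = torch.softmax(scores, dim=-1)           # attention distribution over source words
    context = torch.bmm(weights.unsqueeze(1), encoder_states).squeeze(1)         # (batch, dim)
    return context, weights

context, weights = attention(torch.randn(2, 128), torch.randn(2, 7, 128))
```

The resulting context vector is what spares the model from squeezing the whole source sentence into one fixed vector.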

2018 • Pretrained language models

  • Language models pretrained on a large corpus capture a lot of additional information
  • Language model embeddings can be used as features in a target model, or a language model can be fine-tuned on target task data (both options are sketched below)
  • Enables learning models with significantly less data
  • Additional benefit: Language models only require unlabeled data
  • Enables application to low-resource languages where labeled data is scarce
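
The two ways of using a pretrained language model can be sketched generically in PyTorch. `ToyLM` below is only a stand-in for a real pretrained encoder (for example an ELMo- or ULMFiT-style model); loading actual pretrained weights and the full training loop are omitted, and all names and sizes are assumptions.

```python
import torch
import torch.nn as nn

class ToyLM(nn.Module):
    """Stand-in for a pretrained language-model encoder (would be loaded with pretrained weights)."""
    def __init__(self, vocab_size=10000, dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.lstm = nn.LSTM(dim, dim, batch_first=True)

    def forward(self, token_ids):
        states, _ = self.lstm(self.embed(token_ids))
        return states                                    # (batch, len, dim) contextual vectors

pretrained_lm = ToyLM()                 # in practice: load pretrained weights here
classifier = nn.Linear(128, 2)          # small task-specific head (e.g. sentiment)

# Option 1: use the LM as a frozen feature extractor -- train only the head.
for p in pretrained_lm.parameters():
    p.requires_grad = False
optimizer = torch.optim.Adam(classifier.parameters(), lr=1e-3)

# Option 2: fine-tune -- unfreeze the LM and update it together with the head.
for p in pretrained_lm.parameters():
    p.requires_grad = True
optimizer = torch.optim.Adam(
    list(pretrained_lm.parameters()) + list(classifier.parameters()), lr=1e-4)

tokens = torch.randint(0, 10000, (4, 20))
logits = classifier(pretrained_lm(tokens).mean(dim=1))   # pool over time, then classify
```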

The biggest open problems in NLP

Problem 1: Natural Language Understanding and Reasoning

  • Almost none of our current models have “real” understanding
  • Models should incorporate common sense

Problem 2: NLP for low-resource scenarios

  • Generalization beyond the training data
  • Domain-transfer, transfer learning, multi-task learning
  • Learning from small amounts of data
  • Unsupervised learning; semi-supervised, weakly-supervised, “Wiki-ly” supervised, distantly-supervised, lightly-supervised, and minimally-supervised learning

Problem 3: Datasets, problems and evaluation

Perhaps the biggest problem is to properly define the problems themselves. By properly defining a problem, I mean building datasets and evaluation procedures that are appropriate for measuring our progress towards concrete goals. Things would be easier if we could reduce everything to Kaggle-style competitions!


Frontiers of Natural Language Processing

Neural networks for NLP

  • Can be extended with wider receptive fields (dilated convolutions) to capture wider context [Kalchbrenner et al., ’17]
  • CNNs and LSTMs can be combined and stacked [Wang et al., ACL ’16]
  • Convolutions can be used to speed up an LSTM [Bradbury et al., ICLR ’17]
  • CNNs over a graph (trees), e.g. graph-convolutional neural networks [Bastings et al., EMNLP ’17]

Sequence-to-sequence models

  • Deep LSTM [Wu et al., ’16]
  • Convolutional encoders [Kalchbrenner et al., arXiv ’16; Gehring et al., arXiv ’17]
  • Transformer [Vaswani et al., NIPS ’17]
  • Combination of LSTM and Transformer [Chen et al., ACL ’18]

Attention

  • Different forms of attention available [Luong et al., EMNLP ’15]
  • Constituency parsing [Vinyals et al., NIPS ’15]
  • Reading comprehension [Hermann et al., NIPS ’15]
  • One-shot learning [Vinyals et al., NIPS ’16]
  • Image captioning [Xu et al., ICML ’15]
  • Used in the Transformer [Vaswani et al., NIPS ’17], the state-of-the-art architecture for machine translation

Pretrained language models

  • Language model embeddings can be used as features in a target model [Peters et al., NAACL ’18]
  • Can be fine-tuned on target task data [Howard & Ruder, ACL ’18]

Author: Haojun (Vincent) Gao
Posted on: 2020-12-25
Updated on: 2022-02-22
