Frontiers of Natural Language Processing
Natural language processing (NLP) is the set of techniques used to attach semantics to information extracted from documents, such as the raw text produced by optical character recognition engines. In this article, we move from a review of the recent history of natural language processing towards a deeper look at how NLP enables information understanding.
We will look at the field's recent history, its biggest open problems, and the frontiers of its methodology.
A Review of the Recent History of NLP
2001 • Neural language models
- First neural language models: feed-forward neural networks that take the n previous words into account
- The initial look-up layer is commonly known as the word embedding matrix, as each word corresponds to one vector (see the sketch below)
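As a concrete illustration, here is a minimal sketch of such a feed-forward language model in PyTorch (my choice of framework; the 2001 models of course predate it). The class name, vocabulary size and layer sizes are placeholders.

```python
import torch
import torch.nn as nn

class FeedForwardLM(nn.Module):
    """Bengio-style feed-forward language model over the n previous words."""
    def __init__(self, vocab_size, embed_dim=64, context_size=3, hidden_dim=128):
        super().__init__()
        # Look-up layer: the word embedding matrix, one vector per word.
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.hidden = nn.Linear(context_size * embed_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, context_ids):            # context_ids: (batch, context_size)
        e = self.embed(context_ids)            # (batch, context_size, embed_dim)
        e = e.view(e.size(0), -1)              # concatenate the n previous words
        h = torch.tanh(self.hidden(e))
        return self.out(h)                     # logits over the next word

# Predict the next word from a 3-word context for a batch of 8 examples.
model = FeedForwardLM(vocab_size=1000)
logits = model(torch.randint(0, 1000, (8, 3)))
```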
2013 • Word embeddings
- Main innovation: pretraining word embedding look-up matrix on a large unlabeled corpus
- Popularized by word2vec, an efficient approximation to language modeling
- word2vec comes in two variants, skip-gram and CBOW (both shown in the sketch below)
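For illustration, the gensim library exposes both variants through a single class (an assumed toolkit, not the original C implementation); the toy corpus and parameter values are placeholders.

```python
from gensim.models import Word2Vec

# Toy corpus: each sentence is a list of tokens.
sentences = [
    ["natural", "language", "processing", "is", "fun"],
    ["word", "embeddings", "capture", "word", "similarity"],
]

# sg=1 selects skip-gram; sg=0 selects CBOW.
skipgram = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)
cbow = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=0)

print(skipgram.wv["language"].shape)           # (50,) pretrained word vector
print(cbow.wv.most_similar("word", topn=2))    # nearest neighbours in embedding space
```

The resulting look-up matrix is exactly the pretrained embedding layer that downstream models can initialize from.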
2013 • Neural networks for NLP
- Recurrent neural networks (contrasted with a convolutional encoder in the sketch after this list)
- Long short-term memory (LSTM) networks are the model of choice
- Convolutional neural networks
- Focus on local features
- Can be extended with wider receptive fields (dilated convolutions) to capture wider context
- Convolutions can be used to speed up an LSTM
- Recursive neural networks
- Natural language is inherently hierarchical
- Treat input as tree rather than as a sequence
- Can also be extended to LSTMs
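A minimal PyTorch sketch contrasting the recurrent and convolutional views of the same embedded sequence; all dimensions are arbitrary placeholders, and the dilation argument shows how a convolution widens its receptive field.

```python
import torch
import torch.nn as nn

vocab_size, embed_dim, seq_len, batch = 1000, 64, 20, 8
embed = nn.Embedding(vocab_size, embed_dim)
tokens = torch.randint(0, vocab_size, (batch, seq_len))
x = embed(tokens)                                   # (batch, seq_len, embed_dim)

# Recurrent view: an LSTM reads the sequence left to right.
lstm = nn.LSTM(embed_dim, 128, batch_first=True)
lstm_out, _ = lstm(x)                               # (batch, seq_len, 128)

# Convolutional view: filters over local 3-word windows;
# dilation > 1 widens the receptive field at no extra parameter cost.
conv = nn.Conv1d(embed_dim, 128, kernel_size=3, padding=2, dilation=2)
conv_out = conv(x.transpose(1, 2))                  # (batch, 128, seq_len)
```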
2014 • Sequence-to-sequence models
- General framework for applying neural networks to tasks where the output is a sequence
- Typically RNN-based, but other encoders and decoders can be used (a minimal RNN-based sketch follows this list)
- New architectures mainly coming out of work in Machine Translation
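A minimal RNN-based encoder-decoder sketch in PyTorch, assuming teacher forcing with the gold target sequence; the GRU cells, class name and sizes are illustrative choices rather than any particular published system.

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Minimal encoder-decoder: compress the source, then generate the target."""
    def __init__(self, src_vocab, tgt_vocab, embed_dim=64, hidden_dim=128):
        super().__init__()
        self.src_embed = nn.Embedding(src_vocab, embed_dim)
        self.tgt_embed = nn.Embedding(tgt_vocab, embed_dim)
        self.encoder = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.decoder = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, tgt_vocab)

    def forward(self, src_ids, tgt_ids):
        # The encoder compresses the whole source into its final hidden state.
        _, state = self.encoder(self.src_embed(src_ids))
        # The decoder generates the target conditioned on that state.
        dec_out, _ = self.decoder(self.tgt_embed(tgt_ids), state)
        return self.out(dec_out)                   # logits per target position

model = Seq2Seq(src_vocab=1000, tgt_vocab=1200)
logits = model(torch.randint(0, 1000, (4, 12)), torch.randint(0, 1200, (4, 10)))
print(logits.shape)                                # torch.Size([4, 10, 1200])
```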
2015 • Attention
- One of the core innovations in Neural Machine Translation
- Weighted average of source sentence hidden states (computed explicitly in the sketch below)
- Mitigates bottleneck of compressing source sentence into a single vector
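The core computation is small enough to write out directly. This sketch assumes simple dot-product scoring (one of several scoring functions in the literature) and uses random tensors as placeholders.

```python
import torch
import torch.nn.functional as F

# One decoder state attends over the encoder states of a 12-word source sentence.
encoder_states = torch.randn(12, 128)       # (source_len, hidden_dim)
decoder_state = torch.randn(128)            # current decoder hidden state

scores = encoder_states @ decoder_state     # one dot-product score per source word
weights = F.softmax(scores, dim=0)          # attention distribution over the source
context = weights @ encoder_states          # weighted average of source hidden states
print(weights.shape, context.shape)         # torch.Size([12]) torch.Size([128])
```

The context vector is recomputed at every decoding step, so the model no longer has to squeeze the whole source sentence into a single fixed vector.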
2018 • Pretrained language models
- Language models pretrained on a large corpus capture a lot of additional information
- Language model embeddings can be used as features in a target model, or the language model can be fine-tuned on target-task data (a fine-tuning sketch follows this list)
- Enables learning models with significantly less data
- Additional benefit: Language models only require unlabeled data
- Enables application to low-resource languages where labeled data is scarce
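As one possible modern illustration (my choice of toolkit, not something prescribed by the work above), the Hugging Face transformers library makes the fine-tuning recipe explicit; the checkpoint name, toy labels and learning rate are placeholders.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load a pretrained language model and attach a fresh classification head.
name = "bert-base-uncased"                        # illustrative checkpoint
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=2)

batch = tokenizer(["great movie", "dull plot"], padding=True, return_tensors="pt")
labels = torch.tensor([1, 0])

# One fine-tuning step on labeled target-task data.
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
loss = model(**batch, labels=labels).loss
loss.backward()
optimizer.step()
```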
The biggest open problems in NLP
Problem 1: Natural Language Understanding and Reasoning
- Almost none of our current models have “real” understanding
- Models should incorporate common sense
Problem 2: NLP for low-resource scenarios
- Generalization beyond the training data
- Domain-transfer, transfer learning, multi-task learning
- Learning from small amounts of data
- Unsupervised learning; semi-supervised, weakly-supervised, “Wiki-ly” supervised, distantly-supervised, lightly-supervised, minimally-supervised learning
Problem 3: Datasets, problems and evaluation
Perhaps the biggest problem is to properly define the problems themselves. And by properly defining a problem, I mean building datasets and evaluation procedures that are appropriate for measuring our progress towards concrete goals. Things would be easier if we could reduce everything to Kaggle-style competitions!
Frontiers of Natural Language Processing
Neural networks for NLP
- Can be extended with wider receptive fields (dilated convolutions) to capture wider context [Kalchbrenner et al., ’17]
- CNNs and LSTMs can be combined and stacked [Wang et al., ACL ’16]
- Convolutions can be used to speed up an LSTM [Bradbury et al., ICLR ’17]
- CNNs over a graph (trees), e.g. graph-convolutional neural networks [Bastings et al., EMNLP ’17]
Sequence-to-sequence models
- Deep LSTM [Wu et al., ’16]
- Convolutional encoders [Kalchbrenner et al., arXiv ’16; Gehring et al., arXiv ’17]
- Transformer [Vaswani et al., NIPS ’17] (its core attention operation is sketched after this list)
- Combination of LSTM and Transformer [Chen et al., ACL ’18]
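The Transformer's core operation, scaled dot-product attention, can be sketched in a few lines of PyTorch; the batch size, sequence length and model dimension below are arbitrary placeholders.

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v):
    """Core Transformer operation: softmax(QK^T / sqrt(d_k)) V."""
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5   # (..., len_q, len_k)
    return F.softmax(scores, dim=-1) @ v

# Self-attention: queries, keys and values all come from the same sequence.
x = torch.randn(2, 10, 64)                          # (batch, seq_len, d_model)
out = scaled_dot_product_attention(x, x, x)
print(out.shape)                                    # torch.Size([2, 10, 64])
```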
Attention
- Different forms of attention available [Luong et al., EMNLP ’15]
- Constituency parsing [Vinyals et al., NIPS ’15]
- Reading comprehension [Hermann et al., NIPS ’15]
- One-shot learning [Vinyals et al., NIPS ’16]
- Image captioning [Xu et al., ICML ’15]
- Used in the Transformer [Vaswani et al., NIPS ’17], the state-of-the-art architecture for machine translation
Pretrained language models
- Language model embeddings can be used as features in a target model [Peters et al., NAACL ’18] (see the feature-extraction sketch below)
- Can be fine-tuned on target task data [Howard & Ruder, ACL ’18]
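A sketch of the feature-based usage, with a Hugging Face checkpoint standing in for the biLM of Peters et al. (an assumption made purely for illustration); the checkpoint name and input sentence are placeholders.

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Use a frozen pretrained language model as a feature extractor.
name = "distilbert-base-uncased"                  # illustrative checkpoint
tokenizer = AutoTokenizer.from_pretrained(name)
lm = AutoModel.from_pretrained(name)

with torch.no_grad():                             # frozen: no gradients needed
    batch = tokenizer(["Attention is all you need"], return_tensors="pt")
    features = lm(**batch).last_hidden_state      # (1, seq_len, hidden_dim)

# These contextual vectors can now feed a small task-specific model.
print(features.shape)
```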