Data Augmentation in NLP

Word Substitution

 

  1. Synonym-based substitution

  2. Word embedding substitution

  3. Masked language model

  4. TF-IDF-based word substitution

The basic idea is that words with a low TF-IDF score carry little information, so they can be replaced without changing the ground-truth label of the sentence.
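
The idea can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the method of any particular library: the three-sentence corpus, the `threshold` value, and the replacement `pool` (here, words that occur in every document) are all hypothetical choices made for the example.

```python
import math
import random
from collections import Counter

def tfidf_scores(corpus):
    """Return one {word: tf-idf} dict per tokenized, non-empty document."""
    n_docs = len(corpus)
    # Document frequency: number of documents that contain each word.
    df = Counter(word for doc in corpus for word in set(doc))
    scores = []
    for doc in corpus:
        tf = Counter(doc)
        scores.append({
            w: (count / len(doc)) * math.log(n_docs / df[w])
            for w, count in tf.items()
        })
    return scores

def tfidf_substitute(doc, scores, pool, threshold, rng):
    """Replace every word whose TF-IDF score is below `threshold`."""
    return [rng.choice(pool) if scores[w] < threshold else w for w in doc]

# Toy corpus (an assumption for this example).
corpus = [
    "the movie was a great watch".split(),
    "the plot was a boring mess".split(),
    "the acting was a delight".split(),
]
scores = tfidf_scores(corpus)
# Hypothetical replacement pool: words found in every document (IDF of 0).
pool = sorted(w for w in scores[0] if scores[0][w] == 0.0)
rng = random.Random(0)
augmented = tfidf_substitute(corpus[0], scores[0], pool, threshold=0.05, rng=rng)
print(" ".join(augmented))
```

In this toy corpus only "the", "was", and "a" fall below the threshold, so the label-bearing words ("movie", "great", "watch") stay in place while the filler words are resampled.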

Back Translation

Text Surface Transformation

Random Noise Injection

 

  1. Misspelling injection

  2. QWERTY keyboard error injection

  3. Blank noise injection

  4. Random insertion

Choose a random word in the sentence that is not a stop word. Then find one of its synonyms and insert it at a random position in the sentence.
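
A minimal sketch of this procedure follows. The stop-word list and the synonym table are toy assumptions made up for the example (a real pipeline would likely take them from a lexical resource), and `random_insertion` and `n_inserts` are hypothetical names.

```python
import random

# Toy stop-word list and synonym table: both are assumptions for illustration.
STOP_WORDS = {"the", "a", "is", "was", "on", "in", "of"}
SYNONYMS = {
    "quick": ["fast", "rapid"],
    "happy": ["glad", "cheerful"],
    "dog": ["hound"],
}

def random_insertion(words, n_inserts, rng):
    """Insert synonyms of randomly chosen non-stop words at random positions."""
    augmented = list(words)
    # Only words that are not stop words and have a known synonym qualify.
    candidates = [w for w in words if w not in STOP_WORDS and w in SYNONYMS]
    if not candidates:
        return augmented
    for _ in range(n_inserts):
        source = rng.choice(candidates)
        synonym = rng.choice(SYNONYMS[source])
        # randint is inclusive, so the synonym may also land at the very end.
        position = rng.randint(0, len(augmented))
        augmented.insert(position, synonym)
    return augmented

sentence = "the quick dog is happy".split()
result = random_insertion(sentence, n_inserts=2, rng=random.Random(42))
print(" ".join(result))
```

The original words keep their relative order; only extra synonyms are added, so the sentence grows by `n_inserts` tokens. A sentence with no eligible word is returned unchanged.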

  5. Sentence reorganization

Syntax Tree

Reference

https://blog.****.net/lqfarmer/article/details/107006551