Data Augmentation for Deep Learning Models

Neural networks need large datasets for training. However, collecting enough data is often expensive. One way to deal with this is data augmentation, which increases the number of training data points by deriving new examples from existing ones.


It works when we can find appropriate invariance properties that the model should possess, that is, transformations of the input that should not change the output. For example:


  • rescaling or applying affine distortions to images (translating, scaling, rotating, or flipping the input image)
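
The image transformations listed above can be sketched in a few lines. This is a minimal illustration, assuming a grayscale image stored as a list of rows. The helper names (flip_horizontal, rotate_90, translate, augment) are made up for this example and do not come from any library; a real pipeline would normally use an image library instead.

```python
import random

def flip_horizontal(image):
    """Mirror each row of a 2-D image given as a list of lists."""
    return [row[::-1] for row in image]

def rotate_90(image):
    """Rotate the image 90 degrees clockwise."""
    return [list(col) for col in zip(*image[::-1])]

def translate(image, dx, fill=0):
    """Shift the image dx pixels right (left if dx < 0), padding with `fill`."""
    width = len(image[0])
    dx = max(-width, min(width, dx))  # clamp so the row length stays constant
    shifted = []
    for row in image:
        if dx >= 0:
            shifted.append([fill] * dx + row[:width - dx])
        else:
            shifted.append(row[-dx:] + [fill] * (-dx))
    return shifted

def augment(image, rng):
    """Apply a random flip, a random number of quarter turns, and a small shift.

    The label attached to `image` is reused unchanged for the result, which is
    only valid if the task really is invariant to these transformations.
    """
    out = [list(row) for row in image]
    if rng.random() < 0.5:
        out = flip_horizontal(out)
    for _ in range(rng.randrange(4)):
        out = rotate_90(out)
    return translate(out, rng.randint(-1, 1))

if __name__ == "__main__":
    rng = random.Random(0)
    image = [[1, 2, 3],
             [4, 5, 6],
             [7, 8, 9]]
    for _ in range(3):
        print(augment(image, rng))
```

Which transformations are safe depends on the task: for handwritten digits, a small shift keeps the label, but a 180-degree rotation can turn a 6 into a 9.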



Unlike for images and speech, augmenting text with signal transformations is not reasonable, because the exact order of characters can carry strict syntactic and semantic meaning.

Best way:

  • human rephrasing of sentences -> unrealistic and expensive


  • synonym replacement: replace words or phrases with synonyms
  • back-translation: translate English -> intermediate language -> English [2]
  • data noising: [3]
  • contextual augmentation: [5]
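
Two of the text techniques above are simple enough to sketch without a trained model. The example below is illustrative only: the SYNONYMS table is a toy dictionary invented for this example (a real system would draw on a thesaurus), and drop_words shows just one simple form of noising, random word deletion, which is not necessarily the scheme described in [3]. Back-translation and contextual augmentation need a translation or language model and are not shown.

```python
import random

# Toy synonym table, invented for illustration only.
SYNONYMS = {
    "quick": ["fast", "rapid"],
    "happy": ["glad", "cheerful"],
    "movie": ["film"],
}

def replace_synonyms(sentence, rng, p=0.5, synonyms=SYNONYMS):
    """Replace each word that has a known synonym with probability p."""
    out = []
    for word in sentence.split():
        options = synonyms.get(word.lower())
        if options and rng.random() < p:
            out.append(rng.choice(options))
        else:
            out.append(word)
    return " ".join(out)

def drop_words(sentence, rng, p=0.1):
    """Simple noising: delete each word with probability p, keeping at least one."""
    words = sentence.split()
    kept = [w for w in words if rng.random() >= p]
    return " ".join(kept if kept else words[:1])

if __name__ == "__main__":
    rng = random.Random(0)
    sentence = "a quick and happy movie"
    print(replace_synonyms(sentence, rng))
    print(drop_words(sentence, rng, p=0.3))
```

Both functions reuse the original label for the new sentence, so the replacement and deletion rates should stay low enough that the meaning is unlikely to change.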


