Data augmentation is an important component in the robustness evaluation of models in natural language processing (NLP) and in enhancing the diversity of the data they are trained on. In this paper, we present NL-Augmenter, a new participatory Python-based natural language augmentation framework that supports the creation of both transformations (modifications to the data) and filters (data splits according to specific features). We describe the framework and an initial set of 117 transformations and 23 filters for a variety of natural language tasks.
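To make the distinction between transformations and filters concrete, the following is a minimal sketch in plain Python. It does not use or reproduce the NL-Augmenter API; every function name here (`uppercase_transformation`, `short_sentence_filter`, `augment`, `split_by_filter`) is a hypothetical stand-in chosen for illustration, and the two operations are deliberately trivial.

```python
from typing import Callable, List, Tuple


def uppercase_transformation(sentence: str) -> List[str]:
    """Hypothetical transformation: return modified variants of the input."""
    return [sentence.upper()]


def short_sentence_filter(sentence: str, max_tokens: int = 5) -> bool:
    """Hypothetical filter: True if the sentence has at most max_tokens tokens."""
    return len(sentence.split()) <= max_tokens


def augment(data: List[str], transformation: Callable[[str], List[str]]) -> List[str]:
    """Keep every original example and append its transformed variants."""
    augmented: List[str] = []
    for sentence in data:
        augmented.append(sentence)
        augmented.extend(transformation(sentence))
    return augmented


def split_by_filter(
    data: List[str], keep: Callable[[str], bool]
) -> Tuple[List[str], List[str]]:
    """Split the data into examples that satisfy the filter and the rest."""
    selected = [s for s in data if keep(s)]
    rest = [s for s in data if not keep(s)]
    return selected, rest


data = ["The cat sat.", "A much longer sentence about augmentation in NLP."]
augmented = augment(data, uppercase_transformation)
selected, rest = split_by_filter(data, short_sentence_filter)
```

Under these assumptions, a transformation enlarges or perturbs the data, while a filter only partitions it, so a model can be evaluated separately on the subset that has a given feature.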
2021: Kaustubh D. Dhole, Varun Gangal, Sebastian Gehrmann, Aadesh Gupta, Zhenhao Li, Saad Mahamood, Abinaya Mahendiran, Simon Mille, Ashish Srivastava, Samson Tan, Tongshuang (Sherry) Wu, J. Sohl-Dickstein, Jinho D. Choi, E. Hovy, Ondrej Dusek, Sebastian Ruder, Sajant Anand, Nagender Aneja, Rabin Banjade, Lisa Barthe, Hanna Behnke, Ian Berlot-Attwell, Connor Boyle, C. Brun, Marco Antonio Sobrevilla Cabezudo, Samuel Cahyawijaya, E. Chapuis, Wanxiang Che, Mukund Choudhary, C. Clauss, Pierre Colombo, Filip Cornell, Gautier Dagan, Mayukh Das, Tanay Dixit, Thomas Dopierre, Paul-Alexis Dray, Suchitra Dubey, Tatiana Ekeinhor, Marco Di Giovanni, Rishabh Gupta, Louanes Hamla, Sang Han, Fabrice Harel-Canada, Antoine Honoré, Ishan Jindal, Przemyslaw K. Joniak, Denis Kleyko, Venelin Kovatchev, Kalpesh Krishna, Ashutosh Kumar, Stefan Langer, Seungjae Ryan Lee, Corey J. Levinson, Hualou Liang, Kaizhao Liang, Zhexiong Liu, Andrey Lukyanenko, V. Marivate, Gerard de Melo, Simon Meoni, Maxime Meyer, Afnan Mir, N. Moosavi, Niklas Muennighoff, Timothy Sum Hon Mun, Kenton W. Murray, M. Namysl, Maria Obedkova, Priti Oli, Nivranshu Pasricha, J. Pfister, R. Plant, Vinay Uday Prabhu, V. Pais, Libo Qin, Shahab Raji, Pawan Kumar Rajpoot, Vikas Raunak, Roy Rinberg, Nicolas M. Roberts, Juan Diego Rodriguez, C. Roux, S. Vasconcellos P.H., Ananya B. Sai, Robin M. Schmidt, Thomas Scialom, T. Sefara, Saqib Shamsi, Xu-dong Shen, Haoyue Shi, Yiwen Shi, Anna V. Shvets, Nick Siegel, Damien Sileo, Jamie Simon, Chandan Singh, Roman Sitelew, P. Soni, Taylor M Sorensen, W. Soto, Aman Srivastava, K V Aditya Srivatsa, Tony Sun, T. Mukund Varma, A. Tabassum, Fiona Anting Tan, Ryan Teehan, Monalisa Tiwari, Marie Tolkiehn, Athena Wang, Zijian Wang, Gloria Wang, Zijie Jay Wang, Fuxuan Wei, Bryan Wilie, Genta Indra Winata, Xinyi Wu, Witold Wydmański, Tianbao Xie, Usama Yaseen, M. Yee, Jing Zhang, Yue Zhang
https://arxiv.org/abs/2112.02721v1