Local Post-hoc Explainable Methods for Adversarial Text Attacks
December 2021 • Yidong Chai, Ruicheng Liang, Hongyi Zhu, Sagar Samtani, Meng Wang, Yezheng Liu, Yuanchun Jiang
Deep learning models have significantly advanced various natural language processing tasks. However, they are strikingly vulnerable to adversarial text attacks, even in the black-box setting, where attackers have no access to model internals. Such attacks are conducted with a two-phase framework: 1) a sensitivity estimation phase to evaluate each element's sensitivity to the target model's prediction, and 2) a perturbation execution phase to craft the adversarial examples based on estimated element sensitivity. Thi…
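To make the two-phase framework concrete, below is a minimal, self-contained Python sketch. Everything in it is illustrative rather than the paper's actual method: sensitivity is estimated by occlusion (deleting each token and measuring the change in the target-class score, a common black-box heuristic), and perturbation greedily swaps the most sensitive tokens for synonyms. The toy classifier, synonym table, and all function names are hypothetical stand-ins.

```python
from typing import Callable, Dict, List


def estimate_sensitivity(tokens: List[str],
                         predict: Callable[[List[str]], float]) -> List[float]:
    """Phase 1: score each token by how much removing it changes the
    target-class score (occlusion-based sensitivity estimation)."""
    base = predict(tokens)
    return [base - predict(tokens[:i] + tokens[i + 1:])
            for i in range(len(tokens))]


def perturb(tokens: List[str],
            scores: List[float],
            synonyms: Dict[str, List[str]],
            predict: Callable[[List[str]], float],
            budget: int = 2) -> List[str]:
    """Phase 2: replace the most sensitive tokens with synonyms, keeping a
    swap only if it lowers the target-class score; stop at the swap budget."""
    adv = list(tokens)
    order = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    swaps = 0
    for i in order:
        if swaps >= budget:
            break
        for cand in synonyms.get(adv[i], []):
            trial = adv[:i] + [cand] + adv[i + 1:]
            if predict(trial) < predict(adv):
                adv = trial
                swaps += 1
                break
    return adv


# Hypothetical stand-ins for a real target model and synonym lexicon.
def toy_predict(tokens: List[str]) -> float:
    """Toy sentiment score: fraction of tokens in a 'positive' word set."""
    positive = {"great", "excellent", "good"}
    return sum(t in positive for t in tokens) / max(len(tokens), 1)


SYNONYMS: Dict[str, List[str]] = {"great": ["fine"], "good": ["okay"]}

if __name__ == "__main__":
    text = "the movie was great and the acting was good".split()
    scores = estimate_sensitivity(text, toy_predict)
    print(" ".join(perturb(text, scores, SYNONYMS, toy_predict)))
```

Note that only the model's output scores are queried, never its gradients or parameters, which is what makes this framework applicable in the black-box setting the abstract describes.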