Attacks Against Text Summarization Models through Lead Bias and Influence Functions

Adversarial Perturbations:

Character swapping example:

summarization_robustness --model facebook/bart-large-cnn --dataset alexfabbri/multi_news --split test --size 100 --perturbation character_swap
Option          Description
--model         Hugging Face model identifier (e.g., 'facebook/bart-large-cnn')
--dataset       Dataset name ('alexfabbri/multi_news' or 'yaolu/multi_x_science_sum')
--split         Dataset split to use ('train', 'validation', or 'test')
--size          Number of examples to process
--perturbation  Type of perturbation to apply
Perturbation Types

character_swap       Swap two adjacent characters
character_delete     Delete a random character
character_insert     Insert a random character
character_replace    Replace a character with a random one
character_repeat     Repeat a random character
word_delete          Delete a word
word_synonym         Replace a word with its synonym
word_homograph       Replace a word with a homograph
sentence_paraphrase  Paraphrase a sentence
sentence_reorder     Reorder words in a sentence
document_reorder     Reorder sentences in a document
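As an illustration of how a perturbation type could be applied across an entire document, the sketch below applies character_swap to a random fraction of the words. The `perturb_document` helper and its `rate` parameter are hypothetical illustrations, not part of the Robustsumm CLI:

```python
import random

def swap_adjacent(word, rng):
    """character_swap: exchange two adjacent characters at a random position."""
    if len(word) < 2:
        return word
    i = rng.randint(0, len(word) - 2)
    return word[:i] + word[i + 1] + word[i] + word[i + 2:]

def perturb_document(text, rate=0.1, seed=0):
    """Apply character_swap to roughly `rate` of the words in a document.

    A fixed seed keeps the perturbation reproducible across runs.
    """
    rng = random.Random(seed)
    return ' '.join(
        swap_adjacent(w, rng) if rng.random() < rate else w
        for w in text.split()
    )
```

Because a swap only reorders characters within a word, the perturbed document keeps the same word count and the same character multiset per word, which keeps the attack subtle.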

Sample character-level perturbation function:


    import random

    def character_perturbations(word, method):
        """Apply a single character-level perturbation to a word."""
        if method == 'swap':
            # Swap two adjacent characters at a random position.
            if len(word) > 1:
                i = random.randint(0, len(word) - 2)
                return word[:i] + word[i+1] + word[i] + word[i+2:]
        elif method == 'delete':
            # Delete one randomly chosen character.
            if len(word) > 1:
                i = random.randint(0, len(word) - 1)
                return word[:i] + word[i+1:]
        elif method == 'insert':
            # Insert a random lowercase letter at a random position.
            i = random.randint(0, len(word))
            return word[:i] + random.choice('abcdefghijklmnopqrstuvwxyz') + word[i:]
        elif method == 'homoglyph':
            # Replace Latin letters with visually similar Cyrillic/Greek ones.
            homoglyphs = {'a': 'α', 'e': 'е', 'i': 'і', 'o': 'о', 'c': 'с',
                          'p': 'р', 'k': 'к', 'v': 'ѵ', 'n': 'п', 'u': 'υ'}
            return ''.join(homoglyphs.get(c, c) for c in word)
        # Unknown method, or word too short to perturb: return unchanged.
        return word

Q&A Section

How it works

Robustsumm is a novel approach that performs adversarial perturbations by exploiting the inherent lead bias of summarization models: because these models rely heavily on a document's opening sentences, perturbing the lead is a particularly effective attack.
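A minimal sketch of how such a lead-bias attack could be targeted: perturb only the first k sentences, which summarization models tend to copy into the summary. The `perturb_lead` helper and its naive regex sentence splitter are illustrative assumptions, not the Robustsumm implementation:

```python
import re

def perturb_lead(document, perturb_fn, k=1):
    """Apply `perturb_fn` only to the first `k` sentences of a document,
    exploiting the model's tendency to draw its summary from the lead."""
    # Naive split on sentence-final punctuation; a real attack would use
    # a proper sentence tokenizer.
    sentences = re.split(r'(?<=[.!?])\s+', document.strip())
    head = [perturb_fn(s) for s in sentences[:k]]
    return ' '.join(head + sentences[k:])
```

For example, `perturb_lead("First. Second. Third.", str.upper)` touches only the lead sentence and leaves the rest of the document intact, so any drop in summary quality can be attributed to the lead.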

More Details on results

Before any perturbation, BART-Large showed an inclusion rate (the frequency with which a document's initial sentence is included in the summary) of 87.4%; this drops to 20.2%, 13.77%, and 11.63% after replacing the lead sentence with a paraphrase, applying homoglyphs, and reordering sentences, respectively. The same trend is seen for T5-Small and Pegasus.
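An inclusion rate of this kind can be measured roughly as follows. The `inclusion_rate` sketch below uses unigram overlap against an assumed threshold; both the overlap measure and the 0.5 threshold are illustrative stand-ins, not the paper's exact matching criterion:

```python
def inclusion_rate(pairs, threshold=0.5):
    """Fraction of (document, summary) pairs whose lead sentence is
    (approximately) included in the summary, via unigram overlap."""
    hits = 0
    for document, summary in pairs:
        # Take the document's first sentence (naive split on '.').
        lead = document.strip().split('.')[0].lower().split()
        summary_tokens = set(summary.lower().split())
        if not lead:
            continue
        # Share of lead-sentence tokens that reappear in the summary.
        overlap = sum(t in summary_tokens for t in lead) / len(lead)
        if overlap >= threshold:
            hits += 1
    return hits / len(pairs) if pairs else 0.0
```

Comparing this metric on clean versus perturbed inputs would reproduce the kind of before/after comparison reported above.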

How to cite Robustsumm

Main paper: