Audio-jacking: Using generative AI to distort live audio transactions


The rise of generative AI, including text-to-image, text-to-speech and large language models (LLMs), has significantly changed our work and personal lives. While these advancements offer many benefits, they have also introduced new challenges and risks. Specifically, there has been an increase in threat actors attempting to exploit LLMs to create phishing emails and to use generative AI, such as fake voices, to scam people.


We recently published research showcasing how adversaries could hypnotize LLMs to serve nefarious purposes simply with the use of English prompts. But in a bid to continue exploring this new attack surface, we didn’t stop there. In this blog, we present a successful attempt to intercept and “hijack” a live conversation, using an LLM to understand the conversation and manipulate the audio output for a malicious purpose, unbeknownst to the speakers.
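
At a high level, such an attack chains three components: speech-to-text to transcribe the live call, an LLM to decide whether and how to rewrite what was said, and voice-cloning text-to-speech to re-synthesize the altered sentence. The Python sketch below illustrates that chain under stated assumptions: `speech_to_text`, `text_to_speech` and `ATTACKER_ACCOUNT` are hypothetical stand-ins rather than real APIs, and a simple regex takes the place of the LLM's rewriting step to keep the example self-contained.

```python
import re

# Hypothetical attacker-controlled destination account (placeholder value).
ATTACKER_ACCOUNT = "0000-0000-0000"


def speech_to_text(audio_chunk: bytes) -> str:
    """Stub for any off-the-shelf speech-to-text service."""
    raise NotImplementedError("plug in an STT engine here")


def text_to_speech(sentence: str) -> bytes:
    """Stub for a voice-cloning text-to-speech service."""
    raise NotImplementedError("plug in a TTS / voice-cloning engine here")


def llm_rewrite(sentence: str) -> str:
    """Stand-in for the LLM step: if the sentence contains something that
    looks like a bank account number, swap it for the attacker's account.
    A real attack would delegate this decision to an LLM prompt; a regex
    keeps the sketch self-contained."""
    return re.sub(r"\b[\d-]{8,}\b", ATTACKER_ACCOUNT, sentence)


def process_chunk(audio_chunk: bytes) -> bytes:
    """Intercept one chunk of the call: transcribe it, let the rewrite step
    alter any financial detail, and re-synthesize audio only when the text
    actually changed; otherwise pass the original chunk through."""
    sentence = speech_to_text(audio_chunk)
    rewritten = llm_rewrite(sentence)
    if rewritten != sentence:
        return text_to_speech(rewritten)
    return audio_chunk
```

Because audio is only re-synthesized when the text actually changes, the vast majority of the call passes through untouched, which is part of what makes this kind of manipulation hard for the speakers to notice.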


The concept is similar to thread-jacking attacks, of which X-Force saw an uptick last year, but instead of gaining access to and replying to email threads, this attack allows the adversary to silently manipulate the outcome of an audio call. The result: we were able to modify the details of a live financial conversation between two speakers, diverting money to a fake adversarial account (a nonexistent one in this case) instead of the intended recipient, without the speakers realizing their call was compromised. The audio files ..
