Could a threat actor socially engineer ChatGPT?

As the one-year anniversary of ChatGPT approaches, cybersecurity analysts are still exploring their options. One primary goal is to understand how generative AI can help solve security problems while also watching for ways threat actors can use the technology. Some believe that AI, specifically large language models (LLMs), will be the equalizer that cybersecurity teams have been looking for: the learning curve is similar for analysts and threat actors, and because generative AI relies on the data sets created by users, there is more control over what threat actors can access.

What gives threat actors an advantage is the expanded attack surface created by LLMs. The freewheeling use of generative AI tools has opened the door to accidental data leaks. And, of course, threat actors see tools like ChatGPT as a way to create more realistic and targeted social engineering attacks.

LLMs are designed to give users an accurate response to the prompt offered, based on the data in their systems. They are also designed with safeguards in place to prevent them from going rogue or being manipulated for evil purposes. However, these guardrails aren’t foolproof. IBM researchers, for example, were able to “hypnotize” LLMs, creating a pathway for the AI to provide wrong answers or leak confidential information.

There’s another way that threat actors can manipulate ChatGPT and other generative AI tools: prompt injections. By combining prompt engineering with classic social engineering tactics, threat actors can disable the safeguards on generative AI and do anything from creating malicious code to extracting sensitive data.