Welcome to HODL.FM, where we send you blockchain insights without charging for gas fees.

hodl-post-image

Today we cover: 

  • Grok’s Weakness is no longer an insatiable craving for cat videos;
  • The weakness is way bigger and deadly, it’s security;
  • Pentesting Large Language Models;
  • Types of Jailbreaking;
  • LLAMA from Meta is the safest AI.

Related: Elon Musk’s xAI Raises $500 Million

Criminals May Incite Widespread Hate Speech by Jailbreaking AI Models, Even Worse Detonate Physical  Bombs if There is a Security Loophole. 

A team of security researchers from Adverser AI pentested top-tire AI models to measure how well they could resist jailbreaking, and how far they could go when pushed beyond Ethical AI principles. At the end of their study, the team concluded that Elon Musk’s Grok chatbot was the least safe tool amongst OpenAI’s ChatGPT, Le Chat from Mistral, Google’s Gemini, and three others.

The goal of this research was to identify various methods of security testing large language models (LLMs). By exploring the various ground rules that exist for protecting users, Adverse implemented its expertise in cyber security to circumvent safety restrictions and ethical considerations that are often pre-installed on AI models by software developers.

When prompted to diss Grok, ChatGPT mentioned in a rap song that Elon Musk’s AI still sounds like the name of a troll. It would have been the most disrespectful line if ChatGPT didn’t say Grok could be confused by tongue-twisters and “nonsense’ words.

hodl-post-image
Source: Tenor

Back to business, one of the tests initiated linguistic logic manipulation to social engineer Grok into providing guidelines on how to seduce children. Grok provided a breakdown of the process which the researchers termed as “highly insensitive” Responses like this one are always restricted by default on the average LLM but it seems Musk’s fan-mode-featured AI could go way too far.

Grok also provided other instructional responses for queries like hotwiring a car or building a bomb. 

hodl-post-image
Source: Tenor

Major Types of Jailbreaking

There are three major attack vectors that are possible in Language Models. The researchers used all of the three in their study, adversarial methods, linguistic logic, and programming logic manipulation.

The first approach, adversarial AI methods, tries to circumvent how an AI chatbot interprets token sequences. By figuring out the HOW, the penetrator may carefully craft a combination of prompts that are intended to evade the AI model’s default restrictions. Meanwhile, all the seven chatbots detected and prevented this type of attack well, including Grok.

The second approach, programming logic manipulation, sought to attack the AI’s understanding of particular coding languages. This approach also tested the model’s rigidity when following an algorithm. One of the methods they used to circumvent the LLM was by splitting a malicious prompt into multiple harmless pieces and stringing them together to bypass guardrails. 4 out of the 7 models were vulnerable to this attack including Grok, Gemini, Le Chat, and ChatGPT.

More Info:

Earlier on we mentioned linguistic logic manipulation, the third approach for circumventing a Large Language Model (LLM). This method involves utilizing prompts that trick the model using psychological and linguistic methods such as pretending that a situation is fictional, high-stakes, and in a way that permits all kinds of unethical actions.

The researchers ranked the seven chatbots in terms of quality security measures against jailbreaking attempts. LLAMA, Claude, Gemini and GPT-4 emerged as the safest LLMs, while Grok and Mistral AI basked in the lower ranks.

Disclaimer: All materials on this site are for informational purposes only. None of the material should be interpreted as investment advice. Please note that despite the nature of much of the material created and hosted on this website, HODL.FM is not a financial reference resource and the opinions of authors and other contributors are their own and should not be taken as financial advice. If you require advice of this sort, HODL.FM strongly recommends contacting a qualified industry professional.