Code Mirage

July 5, 2023

Topics

Cybersecurity

Introduction

The landscape of cybercrime continues to evolve, and cybercriminals are constantly seeking new methods to compromise software projects and systems. In a disconcerting development, cybercriminals are now capitalizing on AI-generated unpublished package names also known as “AI-Hallucinated packages” to publish malicious packages under commonly hallucinated package names. It should be noted that artificial hallucination is not a new phenomenon as discussed in [4]. This article sheds light on this emerging threat, wherein unsuspecting developers inadvertently introduce malicious packages into their projects through the code generated by AI.

AI-Hallucinations

Artificial intelligence (AI) hallucinations, as described [2], refer to confident responses generated by AI systems that lack justification based on their training data. Similar to human psychological hallucinations, AI hallucinations involve the AI system providing information or responses that are not supported by the available data. However, in the context of AI, hallucinations are associated with unjustified responses or beliefs rather than false percepts. This phenomenon gained attention around 2022 with the introduction of large language models like ChatGPT[3], where users observed instances of seemingly random but plausible-sounding falsehoods being generated. By 2023, it was acknowledged that frequent hallucinations in AI systems posed a significant challenge for the field of language models.

The Exploitative Process

Cybercriminals begin by deliberately publishing malicious packages under commonly hallucinated names produced by large language machines (LLMs) such as ChatGPT[3] within trusted repositories. These package names closely resemble legitimate and widely used libraries or utilities, such as the legitimate package ‘arangojs’ vs the hallucinated package ‘arangodb’ as shown in the research done by Vulcan [1].

The Trap Unfolds

When developers, unaware of the malicious intent, utilize AI-based tools or large language models (LLMs) to generate code snippets for their projects, they inadvertently can fall into a trap. The AI-generated code snippets can include imaginary unpublished libraries, enabling cybercriminals to publish commonly used AI-generated imaginary package names. As a result, developers unknowingly import malicious packages into their projects, introducing vulnerabilities, backdoors, or other malicious functionalities that compromise the security and integrity of the software and possibly other projects.

Implications for Developers

The exploitation of AI-generated hallucinated package names poses significant risks to developers and their projects. Here are some key implications:

Trusting Familiar Package Names: Developers commonly rely on package names they recognize to introduce code snippets into their projects. The presence of malicious packages under commonly hallucinated names makes it increasingly difficult to distinguish between legitimate and malicious options when relying on the trust from AI generated code.

Blind Trust in AI-generated Code: Many developers embrace the efficiency and convenience of AI-powered code generation tools. However, blind trust in these tools without proper verification can lead to unintentional integration of malicious code into projects

Mitigating Risks

To protect themselves and their projects from the risks associated with AI-generated code hallucinations, developers should consider the following measures:

Code Review and Verification: Developers must meticulously review and verify code snippets generated by AI tools, even if they appear to be similar to well-known packages. Comparing the generated code with authentic sources and scrutinizing the code for suspicious or malicious behavior is essential.

Independent Research: Conduct independent research to confirm the legitimacy of the package. Visit official websites, consult trusted communities, and review the reputation and feedback associated with the package before integration.

Vigilance and Reporting: Developers should maintain a proactive stance in reporting suspicious packages to the relevant package managers and security communities. Promptly reporting potential threats helps mitigate risks and protect the wider developer community.

Conclusion

The exploitation of commonly hallucinated package names through AI generated code is a concerning development in the realm of cybercrime. Developers must remain vigilant and take necessary precautions to safeguard their projects and systems. By adopting a cautious approach, conducting thorough code reviews, and independently verifying the authenticity of packages, developers can mitigate the risks associated with AI-generated hallucinated package names.

Furthermore, collaboration between developers, package managers, and security researchers is crucial in detecting and combating this evolving threat. Sharing information, reporting suspicious packages, and collectively working towards maintaining the integrity and security of repositories are vital steps in thwarting the efforts of cybercriminals.

As the landscape of cybersecurity continues to evolve, staying informed about emerging threats and implementing robust security practices will be paramount. Developers play a crucial role in maintaining the trust and security of software ecosystems, and by remaining vigilant and proactive, they can effectively counter the risks posed by AI-generated hallucinated packages.

Remember, the battle against cybercrime is an ongoing one, and the collective efforts of the software development community are essential in ensuring a secure and trustworthy environment for all.

References

[1] Lanyado, B. (2023, June 15). Can you trust chatgpt’s package recommendations? Vulcan Cyber. https://vulcan.io/blog/ai-hallucinations-package-risk

[2] Wikimedia Foundation. (2023, June 22). Hallucination (Artificial Intelligence)1. Wikipedia. https://en.wikipedia.org/wiki/Hallucination_(artificial_intelligence)

[3] Website [Internet]. Chat GPT. (2023 June 23). https://chat.openai.com/chat

[4] Ji Z, Lee N, Frieske R, Yu T, Su D, Xu Y, et al. Survey of hallucination in natural language generation. ACM Comput Surv. (2023 June 23). https://doi.org/10.1145/3571730