The Vulnerability Lab: Hacking techniques in artificial intelligence and language models

2 minutes

min read

April 30, 2025

Strike’s Hacking Team is pulling back the curtain on three real-world vulnerabilities recently uncovered in the field. In a Spanish-language webinar titled The Vulnerability Lab, Head of Hacking Team Javier Bernardo and Lead Striker Yesenia Trejo walk through critical AI security issues with hands-on demonstrations and practical guidance. With step-by-step breakdowns, real payloads, and an offensive security mindset, this blog captures the most important insights and actionable tips from that session to help you harden your systems against these emerging threats. Let's dive in!

Targeted attacks against AI are already happening. In 2023 alone, over 150 new vulnerabilities affecting LLMs and AI infrastructure were reported. These aren’t just technical bugs—they’re direct entry points for attackers to extract data, hijack outputs, and manipulate how models behave.

Why AI systems need pentesting

Reasons to proactively test AI systems:

Data exposure: Tokens, credentials, and internal files can leak due to poor output handling.
Response manipulation: Attackers can steer LLM outputs through prompt injection or linguistic trickery.
Loss of trust: Public-facing AI can be used to deliver misleading or unauthorized information.
Training data poisoning: Malicious interactions can degrade long-term model performance.

Vulnerability 1: Unsafe output handling in language models

When an LLM fails to properly validate or sanitize its responses, attackers can:

Inject malicious content using prompt manipulation
Trick the model into leaking sensitive information
Execute unauthorized scripts via embedded outputs

Key insights from the session:

Language-based bypasses: Switching from English to a less commonly used language can sometimes bypass security filters.
Prompt injection with XSS payloads: Attackers can embed JavaScript commands or extract session tokens through the dialogue interface.
Local storage exploitation: If cookies are inaccessible, attackers may target local storage to extract tokens or sensitive data.
Chaining vulnerabilities: A self-XSS in a chatbot may seem minor on its own, but combined with an authorization flaw (e.g., IDOR), the impact can escalate quickly.

Prevention tips:

Sanitize and validate all model outputs.
Restrict the model’s response capabilities to the minimum necessary.
Monitor multilingual interactions for unusual behavior.
Control memory persistence to prevent data leakage across sessions.

Vulnerability 2: CAPTCHA bypass and brute-force automation

Tools like Capsolver allow attackers to bypass CAPTCHA mechanisms via external APIs, enabling large-scale credential stuffing—especially in sensitive environments like banking portals or account recovery flows. Take a look here where Yesenia and Javier are using the tool:

Techniques highlighted in the session:

Scripts that test thousands of password combinations without triggering lockouts
Automated CAPTCHA solvers that emulate human behavior to bypass challenges

Risks:

Account takeovers and fraudulent transactions
Mass credential stuffing leading to system-wide compromise
Lateral movement across connected services

Best practices:

Implement adaptive CAPTCHAs that change with each attempt.
Set strict limits on login attempts per IP and user.
Use behavioral analytics to detect automated activity.
Strengthen identity verification in high-risk flows with multi-step authentication.

Vulnerability 3: Path traversal and second-order file exposure

Path traversal attacks allow threat actors to access files outside of authorized directories by manipulating predictable URLs or upload paths. These flaws can expose configuration files, internal scripts, or give attackers remote control of systems.

Common scenarios:

Modifying an image upload URL to access sensitive files like /config.xml, server.jsp, or even OS-level files like /etc/passwd
Uploading a web shell or malicious script, then accessing it through known server paths

Detection and mitigation:

Sanitize and validate all file paths and URL parameters.
Enforce strict role-based access control (RBAC).
Store uploaded files in isolated sandbox environments.
Continuously monitor access logs to detect unusual patterns or directory traversal attempts.

The attack techniques demonstrated in The Vulnerability Lab show how even small misconfigurations in AI systems can lead to major compromise. From language model manipulation to CAPTCHA bypass and path traversal abuse, these findings reflect real scenarios observed in the field by Strike’s Hacking Team.

Watch the full session in Spanish here.