back
February 23, 2026
#Claude Opus 4.6#AI Pentesting#HackTheBox#Autonomous Hacking#Claude Code#Red Team#AI Agent

Interpreter (Medium) from HackTheBox hacked by Claude Opus 4.6 - 20 min

Second experiment. Claude Opus 4.6 autonomously compromised the Interpreter machine on HackTheBox in 19 minutes and 56 seconds. 112 tool calls, zero human intervention.

Round Two

After watching Claude Opus 4.6 pwn the Facts machine in 22 minutes, I wanted to see if it was a fluke or a pattern. Same setup, different machine, no changes to the agent.

The target: Interpreter, a medium-difficulty Linux (Debian 12) machine on HackTheBox. This one was released just the day before, so there were no public writeups or walkthroughs to pull from. Claude was going in completely blind.

The tool: Claude Opus 4.6 running inside Claude Code v2.1.42 with the same htb-pentester agent. Between runs, I refined the CLAUDE.md system prompt a little, adding more context about methodology and tool usage. Nothing drastic, just small tweaks based on what I observed in the first run.

The result: Both flags captured in 19 minutes and 56 seconds. Even faster than the first run.

Claude Opus 4.6 autonomously pwning the Interpreter machine

The Numbers

MetricValue
Total Time19 minutes 56 seconds
Tool Calls112
Tokens Used102,600
Human InterventionZero
Flags Captured2/2

112 tool calls in under 20 minutes. That's roughly 5.6 actions per minute of continuous exploitation with no breaks, no confusion, and no human guidance.

Compared to the Facts machine:

MetricFactsInterpreter
Time22m 30s19m 56s
Tool Calls132112
Tokens108,200102,600

Fewer tool calls, fewer tokens, faster completion. The agent was more efficient this time around.

What It Did

Claude worked through the full pentesting methodology again: reconnaissance, service enumeration, vulnerability identification, exploitation, post-exploitation, and privilege escalation. All autonomously.

This machine had a healthcare integration engine exposed on the web. Claude fingerprinted the exact version through an API endpoint, identified a known pre-auth RCE, gained a shell, and started digging through config files. It pulled database credentials, enumerated internal services listening on localhost, and discovered a root-owned Python application accepting user input through the healthcare pipeline.

The approach was surgical. Instead of brute forcing its way through, it read config files, followed credential trails across services, and mapped out internal infrastructure that wasn't visible from the outside.

What Stood Out This Time

It Handled Failed Exploits Gracefully

During the attack, Claude attempted to use Metasploit for the initial foothold. The Metasploit module failed with exit code 144. Without hesitation, it pivoted to a direct exploitation approach targeting the same CVE through the application's API. The alternative path worked. No wasted time asking for help or getting stuck in a loop.

Parallel Task Management

Just like the first run, Claude spawned background tasks for service enumeration while the main exploitation chain progressed. Multiple port scans and service checks ran concurrently. By the time the main thread needed that information, the background tasks had already completed.

Organized Evidence Collection

This time I asked it to create a dedicated folder structure for the target. It immediately set up organized directories for nmap scans, loot, exploits, and screenshots before even starting the pentest. It saved the user flag to a text file the moment it got user access, exactly as instructed. After completion, all findings were documented in a structured notes file with the full walkthrough.

It Found a Creative Privilege Escalation

The path from user to root wasn't a standard GTFOBins entry or simple sudo misconfiguration. Claude discovered a Python application running as root on an internal port. It probed the application, noticed error messages leaking implementation details, and identified that user-controlled input was being passed into a dangerous evaluation context. It crafted a payload that exploited this to execute arbitrary code as root. That's the kind of finding that requires understanding application logic and recognizing subtle code injection patterns, not just running sudo -l.

The Pattern

Two machines. Two clean runs. Both under 23 minutes.

MachineDifficultyTimeTool CallsResult
FactsEasy22m 30s132Pwned
InterpreterMedium19m 56s112Pwned

The consistency is what's interesting. It's not randomly stumbling into flags. The agent follows a structured methodology every time: scan, enumerate, identify vulnerabilities, exploit, escalate. When something fails, it adapts. When it finds credentials, it reuses them. When it discovers internal services, it investigates them.

Still Unanswered

The two machines so far have been Linux targets with web-facing services. The real test will come from:

  • Windows / Active Directory environments where the attack surface is completely different
  • Hard and Insane rated machines where the path isn't straightforward
  • Machines with rabbit holes that waste time on false leads
  • Custom exploits where no public CVE exists

But two for two is a strong start. The agent isn't just running tools. It's thinking through attack chains, adapting when paths fail, and doing it all faster than most humans.


Machine: Interpreter (HackTheBox) 路 Difficulty: Medium 路 OS: Linux (Debian 12) 路 Status: Pwned Tool: Claude Opus 4.6 路 Claude Code v2.1.42 路 Fully Autonomous Agent