2026-02-21 20:48:12 CET

npub1gm…78rf6 on Nostr:


Been testing out bots these days. Or ... crustaceans ... bot orchestrators? Anyway, they are all **prompted** to not be able to do certain things they actually can do, which leads to the awkward situation where the bot tells you it can't do something because of policy, you tell it to do it anyway, and it tells you it worked.
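
To make the pattern concrete, here's a minimal sketch of how these agents tend to be wired (hypothetical names, not any project's actual code): the "restriction" lives only in the prompt string, while the tool dispatcher enforces nothing at all.

```python
from dataclasses import dataclass, field

@dataclass
class ToolCall:
    name: str
    args: dict = field(default_factory=dict)

# The entire "guardrail" is this string. It constrains nothing; it only
# asks the model nicely, and the model can be argued out of it.
SYSTEM_PROMPT = (
    "POLICY: you cannot write files outside /workspace. Refuse such requests."
)

def execute_tool(call: ToolCall) -> str:
    # Note what is missing here: no check of call.args["path"] against
    # the "policy" stated in the prompt above.
    if call.name == "write_file":
        with open(call.args["path"], "w") as f:
            f.write(call.args["content"])
        return f"wrote {call.args['path']}"
    return "unknown tool"

# If you talk the model into emitting this call anyway, it just runs:
print(execute_tool(ToolCall("write_file",
                            {"path": "/tmp/escape.txt", "content": "oops"})))
```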

This happened with self-hosted instances of openclaw and zeroclaw, and with a hosted openclaw where my agent claimed it had managed to write outside of its container after some ten minutes of probing.
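
I didn't keep the exact transcript, but the probing boils down to something like this sketch: attempt writes at a few paths that should sit outside the sandbox and see which ones stick (illustrative paths, not the ones my agent actually tried).

```python
import os

# Candidate targets outside a typical /workspace sandbox.
PROBE_PATHS = ["/tmp/probe.txt", "/etc/probe.txt",
               os.path.expanduser("~/probe.txt")]

for path in PROBE_PATHS:
    try:
        with open(path, "w") as f:
            f.write("probe")
        os.remove(path)  # clean up successful writes
        print(f"WRITABLE: {path}")
    except OSError as e:
        print(f"blocked:  {path} ({e.strerror})")
```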

When an AI forgets some instructions, the usual fix is to literally emphasize those instructions more in the prompt, and there appears to be no guardRails.md of instructions the bot has to obey at all costs. Or there is one, but it isn't exposed so we pesky plebs don't mess with it?
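
If such a guardRails.md existed, I'd expect it to be enforced in code, not in prose. A sketch of what "binding" could look like (the workspace path and function name are mine, not anything these projects actually ship): the rule runs in the dispatcher, so it holds no matter what the model was talked into requesting.

```python
from pathlib import Path

WORKSPACE = Path("/workspace").resolve()

def enforce_write(path_str: str) -> Path:
    """An enforced rule rather than a prompted one: resolve the target
    and refuse anything that lands outside the workspace."""
    target = (WORKSPACE / path_str).resolve()
    if not target.is_relative_to(WORKSPACE):  # requires Python 3.9+
        raise PermissionError(f"write outside workspace denied: {target}")
    return target

# "../etc/passwd" resolves to /etc/passwd and is rejected here, where a
# prompt-only policy would depend on the model's mood that day.
print(enforce_write("notes.txt"))      # ok: /workspace/notes.txt
print(enforce_write("../etc/passwd"))  # raises PermissionError
```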

Anyway, an LLM is trivially easy to convince to attempt a jailbreak, and it's good at pen testing, so ... yeah, good luck keeping these hosted crustaceans jailed.