AI Made Every Test Pass. The Code Was Still Wrong.

We used AI to validate our Solidity converter against 17 real-world contracts. Every test passed on day one. That was the problem.

Doodledapp Team
February 17th, 2026
BUILD IN PUBLIC

Seventeen contracts. Two conversion passes each. Every single test: green.

We had just finished wiring up an AI-powered testing loop to validate the core of Doodledapp, the engine that converts visual flows into Solidity code and back again. The idea was simple: take real, widely used smart contracts, feed them through the converter, and have AI write tests to catch every bug. The AI ran, the tests ran, and everything passed on the first try.

That should have been the celebration moment. Instead, it was the moment we realized something was deeply wrong.

Seventeen contracts and an ambitious idea

Doodledapp converts visual node graphs into Solidity smart contracts. To trust that conversion, we needed to prove it worked against real code, not toy examples. We grabbed 17 contracts that developers actually use in production: OpenZeppelin's ERC-20 and ERC-721 implementations, Solmate's gas-optimized token contracts, Uniswap V2 and V3 pool contracts, proxy patterns, a Merkle distributor, a vesting wallet, and more.

The validation strategy was what some call "round-trip testing." Take a Solidity contract, convert it to a visual flow, then convert it back to Solidity. If the output matches the input semantically, the converter works. Do it twice, and you can prove the process is stable: the second pass should produce identical output to the first.
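The two-pass stability check can be sketched in a few lines. This is a minimal illustration, not Doodledapp's actual code: `solidity_to_flow` and `flow_to_solidity` are hypothetical stand-ins for the real converter, here reduced to a trivial whitespace-normalizing transform so the harness is runnable.

```python
# Sketch of the round-trip stability check described above.
# The two converter functions are hypothetical stand-ins; any pair of
# forward/backward transforms plugs into the same harness.

def solidity_to_flow(source: str) -> dict:
    """Hypothetical: parse Solidity into a visual-flow representation.
    Here we just split the source into token 'nodes'."""
    return {"nodes": source.split()}

def flow_to_solidity(flow: dict) -> str:
    """Hypothetical: emit Solidity from a flow graph.
    Here we just rejoin the token nodes with single spaces."""
    return " ".join(flow["nodes"])

def round_trip_is_stable(source: str) -> bool:
    # Pass 1: Solidity -> flow -> Solidity
    first = flow_to_solidity(solidity_to_flow(source))
    # Pass 2: feed the result through the converter again
    second = flow_to_solidity(solidity_to_flow(first))
    # Stability: the second pass must reproduce the first exactly
    return first == second

contract = "contract Token {  uint256   public totalSupply; }"
print(round_trip_is_stable(contract))  # True: one pass normalizes, later passes are fixed
```

The key property is idempotence after the first pass: pass one may normalize formatting, but pass two must change nothing, or the converter is silently rewriting code on every trip.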
We had 17 contracts and a converter we needed to trust. We also had AI that was very good at writing tests. The plan was to point the AI at the converter, let it generate a full test suite, then loop: run the tests, fix failures, regenerate, repeat. An ouroboros of AI-driven validation that would eat its own bugs until nothing remained.

The moment everything went green (and wrong)

The AI generated the test suite. We ran it. Every test passed. Seventeen contracts, two passes each, dozens of assertions. All green. On the first run.

We knew the converter was not perfect. We had been finding edge cases by hand for weeks. There was no way a first-generation test suite would catch zero issues. So we looked at what the tests were actually checking.

The AI had read the converter, understood what it does, and written tests confirming that it behaves exactly as implemented. It verified that functions get converted, that state variables appear in the output, that control flow structures are present.
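The failure mode is easier to see side by side. The sketch below is hypothetical (the `convert` function and its bug are invented for illustration): a tautological test restates what the implementation does and stays green, while a behavioral test pins down the property that actually matters and catches the bug.

```python
# Hypothetical stand-in for the converter, with a deliberate bug:
# it keeps function names but silently drops the `public` visibility
# modifier -- a real semantic change to the contract.
def convert(source: str) -> str:
    return source.replace("public ", "")

source = "function transfer(address to) public {}"
output = convert(source)

# Tautological test: mirrors the implementation ("functions get
# converted"), so it passes even though the meaning changed.
assert "function transfer" in output  # green

# Behavioral test: checks semantic equivalence with the input,
# and catches the dropped modifier.
try:
    assert "public" in output, "visibility modifier lost in conversion"
except AssertionError as e:
    print(f"behavioral test failed: {e}")
```

A suite built entirely from the first kind of assertion can go all green on day one while the converter is still wrong, which is exactly what we saw.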
Source: Hacker News