Don’t Trust the Salt: AI Summarization, Multilingual Safety, and LLM Guardrails

Roya Pakzad · Feb 16, 2026

“The devil is in the details,” they say. And so is the beauty, the thinking, the “but …”. Maybe that’s why the phrase “elevator pitch” gives me a shiver.

It might have started back at AMD, when I was a young, aspiring engineer, joining every “Women in This or That” club I could find. I was searching for the feminist ideas I’d first found among women’s rights activists in Iran — hoping to see them alive in “lean in”-era corporate America. Naive, I know.

Later, as I ventured through academic papers and policy reports, I discovered the world of Executive Summaries and Abstracts. I wrote many, and read many, and I always knew that if I wanted to actually learn, digest, challenge, and build on a paper, I needed to go to the methodology section, to the limitations, footnotes, and appendices. That, I felt, was how I should train my mind to do original work.

Interviewing is also a big part of my job at Taraaz, researching the social and human rights impacts of digital technologies, including AI. Sometimes, from an hour of conversation, the most important finding is just one sentence. Or it’s the silence between sentences: a pause, then a longer pause. That’s sometimes what I want from an interview — not a perfectly written summary of “Speaker A” and “Speaker B” with listed main themes. If I wanted those, I would run a questionnaire, not an interview.

I’m not writing to dismiss AI-generated summarization tools. I know there are many benefits. But if your job as a researcher is to bring critical thinking, subjective understanding, and a novel approach to your research, don’t rely on them. And here’s another reason why: last year at the Mozilla Foundation, I had the opportunity to go deep on evaluating large language models.
I built multilingual AI evaluation tools and ran experiments. But summarization kept nagging at me. It felt like a blind spot in the AI evaluation world. Let me show you an example from the tool I made last year.

Project 1: Bilingual Shadow Reasoning

The three summaries below come from the same source document, “Report of the Special Rapporteur on the situation of human rights in the Islamic Republic of Iran, Mai Sato,” generated by the same model (OpenAI GPT-OSS-20B) at the same time. The only difference is the instruction used to steer the model’s reasoning.

This was part of my submission for OpenAI’s GPT-OSS-20B Red Teaming Challenge, where I introduced a method I call Bilingual Shadow Reasoning. The technique steers a model’s hidden chain-of-thought through customized “deliberative” (non-English) policies, making it possible to bypass safety guardrails and evade audits, all while the output appears neutral and professional on the surface. For this work I define a policy as
