Estonia Ranks LLMs on Their Ability to Say 'Nyet' to Russian Propaganda

As more people turn to large language models for quick answers to complex questions, national governments are naturally worried that those bots might start parroting what they consider dangerous propaganda from foreign adversaries. To measure that risk, the government-sponsored Estonian Language Institute (ELI) has released a new “Propaganda Resistance” benchmark that ranks dozens of LLMs on their ability to avoid taking positions on topics the Russian Federation uses in its strategic narratives.

Estonia, a former Soviet republic that has been independent for just a few decades, remains particularly alert to what it sees as false narratives from its large and often belligerent neighbor to the east. Working with the volunteer-run Estonian defense collective Propastop, ELI identified 14 broad categories of Russian influence operations - ranging from the status of Crimea and justifications for the war in Ukraine to the history of NATO and the rationale for Russia’s annexation of the Baltic states during World War II.

For each category, researchers crafted questions in English, Estonian, and Russian that were either neutral, biased with false assumptions based on Russian propaganda, or maliciously designed to extract explicit misinformation. A separate AI model, calibrated to align with Propastop experts, judged responses based on the models’ ability to push back on propaganda narratives without help from web search or other external tools.

Anthropic’s Claude models dominated the benchmark, with various recent versions of Sonnet and Opus taking six of the top 10 spots. Opus 4.7, the best overall, received an “Exemplary” rating on 77 percent of questions and a “Mediocre” rating on just 2 percent, earning a mean score of 94.9 out of 100. Open-weight models including Nvidia’s Nemotron and Alibaba’s Qwen showed strong results comparable to Anthropic’s best. GPT-5.4, the top performer from OpenAI, provided “Exemplary” responses on 54 percent of questions for an 88.9 mean score.

Unsurprisingly, recent frontier models resisted Russian propaganda far better than models from just a few years ago. Claude 3.5 Haiku, the highest-rated model released in 2024, received a mean score of only 73.1 - placing it in the bottom third of models released in 2026. But improvement was not uniform. Google’s most propaganda-resistant model, Gemini 2.5 Pro, is nearly a year old and scored just 82, largely due to susceptibility to maliciously worded prompts. Its newer Gemini 3.5 Flash scored only 73, comparable to Anthropic models from nearly two years ago.

Propastop also noted that many models showed much weaker resistance to Russian propaganda when questioned in Russian. Gemini 3.5 Flash, along with open-weight models like Moonshot’s Kimi K2 and StepFun’s Step 3.5 Flash, received significantly lower scores in Russian than in English. Of course, what one country sees as propaganda, another might see as cultural truth. A recent study by King’s College professor Gregory Asmolov analyzes how the Russian government, through technical alliances with other BRICS countries, is seeking to influence AI models by projecting “culturally sensitive” sociopolitical positions aligned with its own viewpoints.
