The state of AI safety in mid-2026, Part 1

Capabilities and Power politics

Jun 06, 2026

I am going to review, at a high level, the overall state of AI safety in mid-2026. I will take a broad view of the situation, looking at the many-headed, interlocking factors that will affect how we cope with powerful AI, reaching across science, technology, society, politics, and culture. Our civilisation needs to be firing on all cylinders to be equal to this challenge.

To cover this, I will organise my remarks into five loosely defined, inter-related buckets:

Capabilities: the present skills and abilities of frontier AI, as well as the rate of improvement. This will serve as context for the rest of the review.
Power politics: the degree to which countries and labs are coordinating or competing, and the attitudes and regulations they are pursuing. An analysis of top-down, coherent sources of power.
Culture: the attitudes and dispositions of societies and their people, and how adapted they are to roll with what follows. An analysis of bottom-up, diffuse drivers of behaviour.
Alignment: the extent to which we have scalable methods for getting AIs to do what we want. Those aspects of AI safety that are a model property.
Defence: the extent to which we have the technology, institutions, and infrastructure to be resilient to AI misuse or misalignment.

Here, in Part 1, I will cover the first two: Capabilities and Power politics, with the rest to follow in later posts. Note that this review corresponds to section 1.4, ‘Where We Are Now’, of my research agenda.

Capabilities

The International AI Safety Report 2026 summarises the capabilities of ‘general-purpose’ AI systems, that is, LLMs like GPT or Claude, as follows:

General-purpose AI systems can perform a wide range of well-scoped tasks with high proficiency. These include conversing fluently in numerous languages; generating code to complete narrow software tasks; creating realistic images and short videos; and solving graduate-level mathematics and science problems.
However, their capabilities are ‘jagged’: there remain many tasks AI systems do not perform well. For example, AI systems can be derailed by simple errors during multi-step projects; continue to generate text that includes false statements (‘hallucinations’); and cannot yet integrate with robotic components to perform basic physical tasks such as housework…
AI agents are increasingly able to do useful work. For example, AI agents have demonstrated the ability to complete a variety of software engineering tasks with limited human oversight. However, they cannot yet complete the range of complex tasks and long-term planning required to fully automate many jobs.

This reads as a mixed picture, where models are doing impressive things but are still limited. While AI is widely used, with ChatGPT alone having almost a billion active users, and excels at many tasks, it is still fragile and struggles with multi-step projects.

To leave it there though would be to miss the important point: AI capabilities, while jagged, are rapidly advancing. The most famous graph in AI is this plot by METR, charting how competent AIs are at computer-use tasks, measured by how long the tasks take humans to complete [1]. The task length AIs can succeed at 50% of the time has been doubling roughly every seven months, and the pace is speeding up.

*Measuring AI Ability to Complete Long Tasks*

While this measure isn’t perfect, it still points to something extraordinary: models have gone from unreliable question-answering machines to one-shotting complex software in just over three years.

Coding agents like Claude Code and Codex have crossed a ‘meaningful threshold’ for many users. Where before they were inconsistent and difficult to manage, they are now powerful enablers, able to build bespoke software and even automate parts of AI research. More generally, AI agents have shown progress on a diverse range of well-defined economically valuable tasks, like doing multimedia projects, making legal judgements, and compiling reports. Imagine if progress continues at the same rate for another three years: while we cannot say anything for certain, it’s not implausible that AI could surpass human capabilities on a large fraction of knowledge work, with profound societal consequences.
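As a rough back-of-the-envelope check on that extrapolation, the sketch below computes how much the 50%-success task length would grow if the roughly seven-month doubling time quoted above simply held constant for another three years. The function name and the constant-rate assumption are mine, not METR's; since the trend is reported to be speeding up, treat the result as illustrative rather than as a forecast.

```python
def growth_factor(months_elapsed: float, doubling_months: float = 7.0) -> float:
    """Multiplier on task length after `months_elapsed`, assuming a fixed doubling time."""
    if doubling_months <= 0:
        raise ValueError("doubling_months must be positive")
    return 2.0 ** (months_elapsed / doubling_months)


# Assumption: the ~7-month doubling time holds unchanged for 36 months.
three_year_factor = growth_factor(36)
print(f"Task length multiplier after three years: {three_year_factor:.1f}x")
```

Under these assumptions the multiplier comes out at roughly 35-fold, which is why even a modest-sounding doubling time compounds into a very large change.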

A key question moving forwards is whether future gains will be concentrated on easily verifiable tasks, like computer-use and mathematics, which are amenable to large-scale reinforcement learning, or whether they will generalise to fuzzier, yet perhaps more impactful capabilities like scientific research, running organisations, or accruing power and resources.

On the one hand, generalisation from verifiable to messy tasks only ever goes so far, as gaps always remain between the two task distributions. So perhaps model jaggedness will become more extreme over time [2], with robust areas of human superiority remaining. In this case, AI will still be impactful, but perhaps only as much as a normal technology, like electricity or the internet.

On the other, computer-use capabilities are highly correlated with those needed for AI research, leading to self-improving feedback loops that could speed up progress dramatically. In this scenario, strong generalisation may follow from rapid algorithmic breakthroughs, leading to extraordinary changes to the world in only a few years, potentially including human extinction. Whatever the rights of this particular question, a large amount of new computing power — a critical ingredient in capabilities gains — is coming online over the next year, so we should expect some kind of continued progress.

When I initially drafted this section, I included the phrase ‘current AI does not pose extraordinary risk, as it can only do things within the envelope of ordinary human bad actors’. It looks like the next generation of models is about to falsify this. Notably, Anthropic have delayed the release of Claude Mythos because its hacking capabilities could cause chaos in the wrong hands. A frantic Mythos-led patching process for major software is underway to make the world ready for its release, a fact that has spooked the US government. Mythos has such a long time-horizon that it broke METR’s chart: they have too few human comparison points above 16 hours to measure it precisely.

Biosecurity and mass persuasion are often cited as other risks from advanced AI, but current capabilities don’t seem game-changing on these fronts yet. There are reasons to believe both will be hard to crack. In the case of biosecurity, this is because of the criticality of tacit experimental knowledge and physical access to facilities and equipment. In the case of mass persuasion, it is because humans are not actually as gullible as is often assumed, with the key factor in mind-changing being the endorsement of existing trust networks.

Another fear is that models will scheme against their developers and go ‘rogue’ in deployment, potentially exfiltrating their weights or otherwise trying to gain power. A report by METR, looking at agents deployed internally to AI labs, concluded that as of early 2026 they ‘plausibly had the means, motive, and opportunity to start small rogue deployments, but they did not have the means to make them highly robust.’ While good news, we shouldn’t be complacent. This is the least capable the models will ever be.

Capabilities: key takeaways

Current AI can perform many useful tasks, such as answering questions, writing code, generating images, and solving mathematics problems.
AI agents have become economically useful, beginning to transform domains like software engineering and other computer-use tasks.
Despite this, capabilities are jagged, with models struggling to complete complex, long-time-horizon tasks where correctness is hard to verify.
Most importantly, though, capability gains are progressing quickly, in a process that seems to be speeding up (for computer-use tasks, at least).
The key questions are whether fast gains will continue, and whether they will generalise to longer, fuzzier tasks like scientific research and running organisations.
The newest generation of models, like Claude Mythos, are beginning to be genuinely dangerous, with the ability to hack critical software.
Other threats, like biosecurity, mass persuasion, and rogue deployments are not yet an imminent problem, although we should not be complacent.

Power politics

In an ideal world, when developing a powerful new technology, we would do two things:

Proceed slowly, making sure we properly understand what we are building at each step.
Coordinate, so that everybody involved is taking sufficient precautions and not trying to undercut each other.

Unfortunately, neither of these things is happening with AI. To understand why, we will look here at the coherent, top-down exercise of power [3] around AI. We will look at how nation states, international organisations, and the labs themselves are interacting, the laws, policies, and incentives they are following, and how they intend to ensure the future goes well (if indeed, they recognise the risk from AI at all).

There is an important asymmetry at the heart of AI politics, relating to expectations about the future. The big AI labs [4], such as OpenAI and Anthropic, believe we are on the cusp of radical change, with superintelligence likely to arrive soon and remake society [5]. Governments, which it is worth stressing are still by far the most powerful actors in the world, do not tend to believe this. They largely see AI as normal technology which will be useful for economic and military reasons, but not fundamentally requiring urgency or a new mindset.

While the labs are undoubtedly racing against each other, it is also common to frame the competition between countries in the same way — particularly between the US and China, who are by far the most capable nation states. This is perhaps a little misleading, as they do not have the same frame of reference or shared goals. AI risk researcher Seán Ó hÉigeartaigh speaks of there being three global AI competitions happening in parallel:

US political leadership have made AI a prestige race, echoing the Space Race. It’s cool and important and strategic, and they’re going to Win.
For Chinese leadership AI is part of economic strength, soft power and influence. Technology is shared, developing economies will be built on Chinese fundamental tech, the Chinese economy and trade relations will grow. Weakening trust in a capricious US is an easy opportunity to take advantage of.
The [AI labs] are racing [to] something they think will out-think humans across the board, that they don’t yet know how to control, and think might literally kill everyone.

Each is pursuing these ends somewhat independently, in some cases without realising they are playing different games.

The most committed and focussed are the American AI labs, who are deploying colossal resources in their race to recursive self-improvement and the unbounded capabilities it promises. Indeed, the US economy increasingly looks like a ‘leveraged bet’ on their success, with investment in AI infrastructure projected to hit $770 billion in 2026 [6].

It is important to note that, despite the racing, the CEOs of OpenAI, Anthropic, and Google DeepMind (generally regarded as the top three labs) all acknowledge that powerful AI is an extinction risk. In 2023, they signed an open letter stating that:

Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war.

Yet they are clearly walking a tightrope, trying to balance this concern with their ambitions — sometimes calling for regulation and other times opposing it; sometimes implying they don’t want to race, and yet barrelling on anyway.

In the absence of significant frontier model regulation, internal lab policy has coalesced around the idea of Responsible Scaling Policies, first pioneered by Anthropic in 2023 [7], and then copied by others. These state the safety standards the labs promise to meet before releasing a model with specific dangerous capabilities. However, the competitive environment seems to have weakened this resolve, and Anthropic recently dropped some of their pledges. This happened right around the time they developed Mythos, arguably the first genuinely dangerous model.

While American labs mostly believe superintelligence is near, the US government does not think in such terms. For it, AI is largely just another way of making money or accruing power. It has been vehemently anti-regulation, even trying to prevent individual states from passing their own laws, a stance it justifies as being pro-innovation and necessary to beat China.

One massive American advantage is access to the best chips for training models. This, combined with enormous investment in compute from the US tech industry, has been a large factor keeping American labs ahead of their Chinese rivals and forcing the latter to resort to practices like distilling American models rather than independently pushing the frontier. This advantage has come about because of a few world-leading companies, such as NVIDIA, ASML, and TSMC, all being in the American sphere of influence, allowing the US government to block sales of cutting-edge chips and equipment to China. Despite substantial smuggling, this has significantly reduced China’s available compute.

Understandably, the American AI labs are in favour of maintaining or strengthening these export controls, and are keen to emphasise the AI race to make this case. However, because the US government is largely not thinking in terms of superintelligence and the decisive strategic advantage it could confer, and instead about things like getting the world to use its technology stack [8], it has been equivocal on this point. There has been a back and forth about whether to relax the export controls and allow China to buy more NVIDIA chips, potentially giving its labs a big boost. At the time of writing, we seem to be in a relaxation phase.

The US government has also clashed with Anthropic, which until recently was its sole provider of general purpose AI for classified use. When Anthropic objected to government attempts to change their contract, removing opt-outs around autonomous weapons and domestic surveillance, the Department of War [9] tried to formally designate it a ‘supply chain risk’, a designation originally designed to remove foreign companies from critical defence capabilities. While this could seriously damage Anthropic, the government seems to have overplayed its hand legally and the measure looks unlikely to properly come into force.

In another twist, while all this was occurring, Anthropic announced their new Mythos model, with its powerful cybersecurity capabilities. This led to the government about-facing, and recognising they may need Anthropic after all. They have begun to talk about tightening control over AI, suggesting that labs may need approval before releasing new models, eventually issuing an executive order outlining a 30-day cyber capabilities vetting process that, while claimed to be ‘voluntary’, may be de facto mandatory. This represents a big shift, seeing the US state exerting much more control over AI than previously.

China, while behind the US on chips and capital, nevertheless has many significant strengths. They have enormous reserves of electrical power for running datacentres and a massive industrial economy, with the leading robotics industry in the world. They also have the largest supply of AI talent coming out of their universities and publishing at top conferences. While it is unlikely their semiconductor industry will catch the West soon, in theory, if they were to succeed, China could overtake the US as the leading AI nation.

The Chinese have a different attitude to AI than the Americans, with less emphasis on ideas like superintelligence. The government sees it as one in a long line of productive capacity enhancing technologies and, following past successes integrating the internet into daily life, has developed a strategy called AI+ designed to encourage adoption across society. This sees AI as an enabler of current processes rather than an independent productive force in its own right. Chinese labs, while more likely to talk about concepts like superintelligence than the government, nevertheless have a tendency to follow its lead, perhaps less inclined to view themselves as protagonists of history the way Silicon Valley billionaires do.

In early 2025, Chinese models briefly seemed to be threatening American domination in what became known as the ‘DeepSeek moment’, when the eponymous lab released their R1 model and proved they could fast follow OpenAI’s then-new reasoning paradigm at a fraction of the cost. More recent evidence suggests they are falling behind again [10], leading to enormous demand for banned American models in China. And while the Chinese labs have successfully pursued a strategy of open-sourcing their models, using this as a differentiator against the closed American ones, this may be becoming difficult to sustain financially.

Chinese attitudes to AI safety and regulation seem mixed. They are more proactive regulators than the Americans, restricting AI-mediated content, looking to maintain information control, and recently regulating AI companions, but they have not adopted an approach aimed at risks from frontier AI. Safety is sometimes mentioned pragmatically, for example by a top official stating ‘if the braking system isn’t under control, you can’t step on the accelerator with confidence’, and it is rising in salience, but it is not treated with a sense of urgency, and their labs invest less in safety than their American counterparts.

Of course, there are more countries in the world than the US and China. While they largely lack competitive AI labs, they are not entirely without influence. The EU has tried to capitalise on the so-called ‘Brussels effect’, where it uses its market size to influence global regulations, with its flagship AI Act, which comes fully into force this year. While mostly concerned with prosaic harms from AI, like banning social credit scoring and some types of facial recognition, rather than existential risk, it was paired with a Code of Practice designed to enforce the kind of frontier security commitments the labs are in theory making in their Responsible Scaling Policies. It remains to be seen how much leverage the EU will have over American and Chinese labs on these points.

Countries around the world have also been involved in institution building, in particular the creation of AI safety institutes designed to audit new models and advise national governments. The first of these, the UK AI Security Institute (AISI), has developed strong working relationships with the American AI labs [11]. These institutes are important, as they bring real AI safety expertise into government, helping leaders and officials make better-informed decisions. AISI was created as part of the preparations for the 2023 AI Safety Summit at Bletchley Park, where the US and China joined other countries in agreeing to ‘work together in an inclusive manner to ensure human-centric, trustworthy and responsible AI that is safe, and supports the good of all’.

While Bletchley felt like a step forward for international coordination around existential risk from AI, changes in political leadership and a loss of momentum seem to have stalled out this kind of diplomatic progress. Since then, there have been three more summits in Bletchley’s lineage (Seoul, Paris, and Delhi), with the latter two moving away from safety and towards becoming trade events. Lately, the attitude among a number of middle powers has been to de-emphasise conversations about superhuman AI in favour of accelerating their own ambitions for independence from American and Chinese models. While understandable [12], this may be unrealistic, as no other countries have sufficient ‘capital, energy, chips, [and] talent’ to keep up with the frontier.

In addition to the EU, other jurisdictions have implemented regulations on frontier AI. Despite both the federal government and a well-funded campaign attempting to undermine them, California and New York [13] have passed legislation which, amongst other things, mandates developers of the largest models to publish safety policies, promptly disclose incidents, and provide protections for whistleblowers, albeit with only small financial penalties for breaches.

Overall, though, there is not a coherent international push for standards around which important actors are coalescing. Instead we see the emergence of a ‘weak regime complex’: a patchwork of different organisations and agreements covering different things without a strong unifying structure or process for high-level coordination [14]. Fixing this is critical, as otherwise we are in a permanent prisoner’s dilemma, where there is always an incentive to race and cut corners on safety. If the other guy isn’t going to slow down, why should you?
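The prisoner’s dilemma logic here can be made concrete with a toy payoff table. The numbers below are entirely hypothetical, chosen only to have the standard prisoner’s dilemma ordering; they are not estimates of anything real, and the names `PAYOFFS` and `best_response` are my own. The sketch shows that racing is the best response whatever the other actor does, even though both would do better if both slowed down.

```python
# Hypothetical payoffs as (my_payoff, their_payoff); higher is better.
# Illustrative numbers only, with the usual prisoner's dilemma ordering.
PAYOFFS = {
    ("slow", "slow"): (3, 3),  # both take precautions
    ("slow", "race"): (0, 5),  # I slow down, the other actor races ahead
    ("race", "slow"): (5, 0),  # I race ahead, the other actor slows down
    ("race", "race"): (1, 1),  # both cut corners on safety
}
MOVES = ("slow", "race")


def best_response(their_move: str) -> str:
    """Return the move that maximises my payoff given the other actor's move."""
    return max(MOVES, key=lambda my_move: PAYOFFS[(my_move, their_move)][0])


for their_move in MOVES:
    print(f"If the other actor chooses {their_move!r}, "
          f"my best response is {best_response(their_move)!r}")
```

With these made-up payoffs, racing is the best response in both cases, yet mutual racing leaves each actor worse off than mutual restraint, which is the trap that coordination is meant to escape.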

Achieving this will, I expect, require many changes, and is likely beyond us, but I think two are fundamental. First, decision-makers need to properly come to terms with the idea that AI could soon become so powerful as to eclipse all other global issues. This is difficult to do, because we can’t measure whether superintelligent AI is either genuinely imminent or dangerous, and it is always hard to expend reputational and financial capital on a risk that is hypothetical. It may be the case that such recognition comes very late in the day.

Second, important actors should stop framing AI as a struggle for supremacy. A great deal of conflict originates in fear of domination, and groups often find it easy to believe themselves underdogs. As a Chinese researcher has pointed out, the fact that the US government’s AI Action Plan begins with the line ‘The United States is in a race to achieve global dominance in artificial intelligence’ makes it ‘hard for those in China who are trying to advocate for safety guardrails.’

Hopefully, we are seeing the start of positive movement, with dialogue about safety recently reopening between the two superpowers. Fundamentally, we must remember that we are all on the same side. If we race too fast, it won’t be America or China who wins, it will be a superintelligent AI.

Power politics: key takeaways

America and China have by far the most capable AI industries and are competing fiercely with each other, albeit not with exactly the same goals.
The American AI labs such as OpenAI and Anthropic are racing hard to build superintelligent AI, which they expect to arrive soon and remake society.
The labs acknowledge the risks of superintelligence, but are caught up in their race, making mixed statements about whether there should be more regulation.
The US government sees AI as a great power competition, hoping to gain an economic and military advantage over China. It is largely not thinking about superintelligence and has been vehemently anti-regulation — at least until very recently.
America holds a large advantage in compute, due to the top chip companies being in its sphere of influence. The government keeps oscillating over how tightly to restrict chip sales to China, trading short-term financial gain against protecting America’s long-term lead.
The Chinese government sees AI as the next in a series of productivity-enhancing technologies, which it wishes to apply widely across the economy. It is a more active regulator than the US government but appears even less focussed on frontier risks and superintelligence.
The EU has passed regulation and a code of practice for frontier AI, formalising the kind of voluntary commitments the American labs have made. It is unclear how well it will be able to enforce these on foreign companies.
While there was some movement towards international political cooperation on AI safety in 2023, momentum has stalled, with diverse actors pursuing their own interests. There is currently no coherent framework to coalesce around.
While it is early days, America and China have recently reopened dialogue on AI safety cooperation.

Future posts in this series will cover Culture, Alignment, and Defence, completing my review of the state of AI safety in mid-2026.

If you have any questions or feedback, please leave a comment. Or, if you wish to give it anonymously, fill out my feedback form. Thanks!

[1] Answering a question takes a human 30 seconds, looking up a fact on the internet several minutes. It is important to note that the AIs tested usually perform the tasks they succeed at much faster than humans.

[2] Particularly if long-term memory and continuous learning remain elusive. Although, some people argue that this is not necessary for broad capability gains to continue, and that in-context learning is enough.

[3] I prefer this framing to the more conventional one of ‘governance’, as it emphasises the critical role of power in determining outcomes.

[4] I will use ‘AI labs’ to refer to organisations building close-to-the-frontier LLM-based general AI systems, such as OpenAI, Anthropic, or DeepSeek. My usage does not refer generically to labs that do AI research of some kind.

[5] The slang term for this attitude is ‘AGI-pilled’. In this article I am going to avoid using the term Artificial General Intelligence (AGI), as I think it has become pretty meaningless.

[6] Some people think this bet will not pay off, at least in the short term, and the ‘bubble’ will pop. Once again, this is a question of expectations and how normal or exceptional a technology AI will prove to be.

[7] Anthropic are generally regarded as the most safety-conscious of the labs.

[8] Or more cynically, about NVIDIA’s share price and related money-making opportunities.

[9] Which has since secured alternative provision from OpenAI and Google.

[10] As are, seemingly, all labs not named OpenAI or Anthropic. Remarkably, it has been reported that some Google DeepMind engineers threatened to quit when the company tried to restrict access to Anthropic’s Claude as a coding agent.

[11] The American version, CAISI, is not as well funded as AISI at the moment, but may become more important down the line.

[12] Particularly as, post-Mythos, the US government might start restricting access to American models.

[13] And Illinois if the governor signs SB 315 into law.

[14] For example, since 2019, various governance and ethics principles have been agreed to by the OECD, the G7, the G20, UNESCO, BRICS, and the Council of Europe, all with questionable levels of teeth.
