Microsoft AI CEO Mustafa Suleyman says the next chapter of artificial intelligence will be defined not by model intelligence but by the cost of compute.
For years, the emphasis was on training large foundation models. In 2026, the challenge is serving those models to millions of users in real time. Deloitte’s TMT Predictions 2026 report states that inference workloads currently account for about two-thirds of total AI compute spending.
Lead times for GPUs are now about a year, and high-bandwidth memory is sold out through 2026. Of the 16 GW of data center capacity planned globally for 2026, only 5 GW is currently under construction.
Suleyman’s flywheel logic prioritizes higher-margin products such as Microsoft 365 Copilot, enterprise legal software, and healthcare SaaS. Premium inference for these products can lower costs, reduce latency, increase user retention, generate proprietary data, and improve model tuning.
This loop compounds, driving further adoption and revenue. According to Microsoft, paid Copilot seats are set to reach 15 million in the second quarter of fiscal 2026, an increase of 160% year over year.
Cash-strapped AI companies and consumer apps may not be able to afford premium inference, and that could hurt them. If they cannot pay for tokens, response quality and user retention may suffer, and the flywheel cannot turn. It could be argued that better intelligence per dollar or open-source models could help such companies cope, but Suleyman’s focus is on scale and financial strength. Microsoft is investing more than $80 billion annually in AI infrastructure.
