There are about 20 years left until AGI and ASI: why an AI uprising is still far away
Remember that phrase?
"About 20 years left until controlled thermonuclear fusion"
And no matter how many years pass, we’re still exactly 20 years away from it :)
Unfortunately, AGI (an AI model that can perform all the tasks an average human can, equally well or better, including programming itself) hasn’t arrived yet, so this article was written by yours truly, the author of the Server Mall blog.
In my view, even the most advanced models are still far from producing high-quality, living, and genuinely interesting texts — the kind that don’t make you fall asleep from the very first lines. Where the structure is fragmented and unique; where a typo slips in somewhere so you can write to the author about their illiteracy (I’m all for it, go ahead); where the theses don’t match yours, and you can start a flame war in the comments. By the way, those discussions are often more interesting than the article itself.
You wouldn’t try to prove to ChatGPT that it’s talking nonsense — telling it to go study the basics, damn neural network. No, you’d just close the chat or change the prompt. People like arguing with each other; we like imperfections, because we are made of them entirely. Our entire species, Homo sapiens, may itself be the result of an imperfection — a genetic mutation that gave rise to consciousness.
But the paradox is that today’s generative models can’t do things perfectly, and also can’t do them imperfectly in a human way (like Fleming’s messy lab, from which penicillin-producing fungi emerged).
All generative models have problems with fact-checking (yes, humans do too), and without a well-crafted prompt the answer will miss the mark. That’s why for a technical copywriter it’s faster to write the text themselves than to design a prompt, verify facts, fix mistakes, and then “bring it to life” with their own style. And an experienced specialist in their field knows practical nuances that AI simply doesn’t — ChatGPT can help with editing, but the actual substance still has to be gathered manually.
But there is a big fat HOWEVER in all of this: AI models are progressing extremely quickly. People far removed from technology are already thinking in terms of The Matrix and Terminator. Slightly more informed techno-optimists are waiting for AGI or the singularity so they can receive universal basic income and stop working, while techno-phobic conservatives argue that everything should be restricted and banned (haha). Local bans in individual countries will only deepen their global lag, while countries without restrictions will gain an advantage. So the development of neural networks is irreversible.
But in this article I have to disappoint the techno-optimists and please the techno-phobes — humanity has hard limits: chips and electricity generation. They are not insurmountable, but they are extremely serious. This is exactly what I will talk about: the physical side of progress in generative AI.
Words that can stop humanity's enslavement: GPUs and electricity generation

First, a few words about the architecture of GPT — a generative pre-trained transformer.
Transformers, unlike recurrent neural networks (RNNs), do not process data sequentially. Instead, they use an attention mechanism, which allows the model to focus on all elements of the input sequence simultaneously. The transformer model uses matrix operations that are easily parallelized on GPUs.
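To make the "all elements at once" idea concrete, here is a minimal sketch of scaled dot-product attention in plain Python. It is an illustration only: the three toy token vectors are made up, and real GPT models add learned projection matrices, multiple heads, and masking, none of which appear here. The point is that the whole computation reduces to matrix multiplications, which is exactly the kind of work a GPU parallelizes well.

```python
import math

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def attention(q, k, v):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = len(q[0])
    k_t = [list(col) for col in zip(*k)]
    scores = matmul(q, k_t)  # every token is scored against every token
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, v), weights

# Three toy tokens with 2-dimensional embeddings (made-up numbers).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
output, weights = attention(x, x, x)
for row in weights:
    print([round(w, 3) for w in row])
```

Each row of `weights` shows how strongly one token attends to all the others, and every row is computed independently of the rest, so nothing forces the work to happen one token at a time as in an RNN.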


With the basics covered, let’s move on. At the end of May 2024, NVIDIA’s market capitalization reached $2.76 trillion — almost 10 times more than in October 2022.
Where did such explosive growth come from?
In timing, it lags only about four months behind the appearance of publicly available GPT-3 from OpenAI (June 11, 2020).
Side note! I will mostly talk about NVIDIA technologies, because its CEO Jensen Huang said he understands this field well — that a new industrial revolution has begun, and future NVIDIA Rubin GPUs will change the AI industry the way Vera Rubin changed humanity’s understanding of the cosmos (by making significant contributions to the understanding of galaxy morphology and dynamics, dark matter, and much more).
In general, NVIDIA is the locomotive of the AI chip market with about a 90% share, while all competitors lag years behind technologically.
So. One might assume that the popularity and development of AI catalyzed GPU sales and advancement, which is why NVIDIA’s stock is skyrocketing. But in reality, it is the development of GPUs that made deep learning accessible to almost everyone.
It’s a classic chicken-and-egg situation, except we actually know the chicken came first this time :)

An excerpt from the paper “Deep learning with COTS HPC systems” by the Department of Computer Science at Stanford University (2013):
In this paper, we present an alternative approach to training such networks, which leverages low-cost computational resources in the form of graphics processing units and introduces the use of a high-speed communication infrastructure for tight coordination of distributed gradient computations. Our system trains neural networks at scales comparable to DistBelief using only 3 machines. We demonstrate the ability to train a network with more than 11 billion parameters, 6.5 times larger than the model in (Dean et al., 2012), in just a few days using only about 2% as many machines.
Expert note: DistBelief is a first-generation machine learning system developed by Google.
| Parameter | DistBelief (Google) | GPU system (Stanford AI Laboratory) |
| --- | --- | --- |
| Compute base | 1,000 CPU servers | 3 GPU servers |
| Processors / accelerators | 2,000 processors | 12 graphics accelerators |
| Cores | 16,000 cores | 18,432 cores |
| Power consumption | 600 kilowatts | 4 kilowatts |
| Cost | $5,000,000 | $33,000 |
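A quick back-of-the-envelope check of what those figures imply. The numbers are copied from the comparison above; the dictionary and variable names are just illustrative.

```python
# Figures copied from the comparison above.
distbelief = {"power_kw": 600, "cost_usd": 5_000_000, "cores": 16_000}
gpu_system = {"power_kw": 4, "cost_usd": 33_000, "cores": 18_432}

power_ratio = distbelief["power_kw"] / gpu_system["power_kw"]
cost_ratio = distbelief["cost_usd"] / gpu_system["cost_usd"]

print(f"Power draw: about {power_ratio:.0f}x lower on the GPU system")
print(f"Hardware cost: about {cost_ratio:.0f}x lower on the GPU system")
```

Both ratios land at roughly two orders of magnitude, with a comparable core count on each side.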
As early as 2013, researchers observed that deep learning runs far more efficiently on graphics accelerators than on general-purpose CPUs (thanks to the thousands of CUDA cores in NVIDIA GPUs and their massive parallelism).
The relatively low cost of GPUs made research and development of advanced AI models, such as GPT-3 from OpenAI, accessible to almost everyone.
And this is where the realization comes: the further development of neural networks is constrained by the limited capabilities of GPUs. There are two ways to bypass these limitations: first — innovation in architecture; second — increasing performance.
Let’s start with the first — refinement, overcoming challenges, and mastery

With architectural innovation, nobody is in a particular rush; the same x86 architecture has been with us since 1978. Yes, it has become 64-bit and gained support for modern technologies such as virtualization and improved SIMD (Single Instruction, Multiple Data) instructions. Still, the core of x86 remains the same, carrying a large legacy burden — which is both an advantage (backward compatibility) and a disadvantage (complexity and architectural heaviness).
ARM, in its time, was not an evolution but a revolution — I wrote a separate article about this on Habr.
For AI, an important architectural innovation has been tensor cores: specialized hardware blocks that sit alongside the general-purpose CUDA cores in a GPU and accelerate linear algebra operations such as matrix multiplication and vector operations.
Google has another technology that has been evolving since 2016: the TPU (Tensor Processing Unit), a specialized integrated circuit designed for machine learning tasks and used with Google’s own open-source TensorFlow library. Google’s TPUs are available as a cloud service and can be rented on an hourly basis.
Returning to tensor cores. They first appeared in the Volta architecture (2017), where they supported the FP16 (half-precision) number format.
Next came the Turing architecture (2018), where second-generation tensor cores additionally supported INT8, INT4, and INT1, which are integer formats.
Then came the third generation, Ampere (2020), which introduced support for new numeric formats: TF32 (Tensor Float 32), which improves single-precision computation performance; BF16 (Brain Floating Point 16), a floating-point format that is essentially a shortened version of the 32-bit floating-point standard (FP32). It also supports FP64 — 64-bit floating point, the standard for high-precision numerical representation.
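The "shortened FP32" description of BF16 can be shown in a few lines. The sketch below keeps only the top 16 bits of a float's 32-bit pattern: the sign, the full 8-bit exponent, and 7 bits of mantissa. It is a simplification, since it truncates rather than rounding the way real hardware typically does, and the function names are illustrative.

```python
import struct

def float32_bits(value):
    """Return the 32-bit IEEE 754 pattern of a float as an integer."""
    return struct.unpack(">I", struct.pack(">f", value))[0]

def to_bf16(value):
    """Simplified BF16 conversion: keep the top 16 bits of the FP32 pattern.

    Assumption: plain truncation, not the rounding real hardware usually applies.
    """
    truncated = float32_bits(value) & 0xFFFF0000
    return struct.unpack(">f", struct.pack(">I", truncated))[0]

original = 3.14159265
approx = to_bf16(original)
print(f"FP32 bits: {float32_bits(original):032b}")
print(f"BF16 value: {approx}")
print(f"Large values survive: {to_bf16(1e38)}")
```

Because the exponent field is untouched, BF16 covers the same range of magnitudes as FP32 and only gives up precision, which is why it is convenient for training.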
And the fourth generation of tensor cores in the Hopper architecture (2022) introduced FP8 (8-bit floating point). The Transformer Engine technology can convert higher-precision data down to FP8 where the loss of precision is tolerable. This significantly reduces memory usage and increases computational speed with minimal loss of accuracy. FP8 is particularly effective for inference tasks, where FP16 or FP32 precision is often unnecessary.
In the future, the Blackwell architecture is expected to introduce FP4 and FP6 support via an updated Transformer Engine, with FP4 effectively doubling throughput compared with FP8. NVIDIA has already announced 20 petaflops of FP4 performance with the dual GB100 accelerator.
Such innovations must continue if AI models are to scale dramatically in parameter count, just as GPT-2 grew from 1.5 billion parameters to 175 billion in GPT-3 (and GPT-4 reportedly reaches into the trillions).
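To see why shrinking numeric formats matters at this scale, here is a rough sketch of how much memory the raw weights of a 175-billion-parameter model would occupy in each format mentioned above. It counts weights only; activations, optimizer state, and any per-format scaling metadata are deliberately ignored, so treat the output as a lower bound.

```python
# Bits per stored value for the formats discussed in the text.
BITS_PER_VALUE = {"FP32": 32, "FP16": 16, "BF16": 16, "FP8": 8, "FP6": 6, "FP4": 4}

def weights_size_gb(parameters, fmt):
    """Size of the raw weights in gigabytes (10**9 bytes), nothing else counted."""
    return parameters * BITS_PER_VALUE[fmt] / 8 / 1e9

gpt3_parameters = 175_000_000_000  # GPT-3, as cited in the text
for fmt in BITS_PER_VALUE:
    print(f"{fmt}: {weights_size_gb(gpt3_parameters, fmt):,.1f} GB")
```

Halving the bits per value halves the footprint, so each new format lets the same accelerator memory hold a model roughly twice as large.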
For now, however, NVIDIA is following an evolutionary path — specializing GPUs for AI as much as possible and pushing numerical precision down to its minimum viable limits.
Further fundamental change would require a revolution, or otherwise continued incremental improvements along the existing trajectory — refining manufacturing processes and improving energy efficiency (where hard physical limits already begin to appear).
And now the second path — the samurai’s way: there is no goal, only the path.

It’s simple here: horizontal and vertical scaling. We just buy tens of millions of NVIDIA accelerators, build a massive data center with compute clusters, and train AI systems. All that’s left is to somehow find trillions of dollars for it :)
The CEO of OpenAI is reportedly trying to convince investors to invest $5–7 trillion in building new chip manufacturing capacity required for AI development, according to sources cited by the Wall Street Journal. According to Sam Altman, shortages and the high cost of processors make it impossible for OpenAI to build AI systems that, in his words, will be smarter than humans.
On top of that, a data center with millions of GPUs requires massive amounts of equipment: storage systems, interconnects, power supplies, networking, patch cables, cooling (which can easily account for 10% or more of total power use), air filtration, fire suppression systems, backup power supplies, and all the other joys of data centers. It also needs highly qualified specialists: IT architects, engineers, construction teams, and system administrators.
Infinite scaling is not possible. There is neither enough manufacturing capacity nor budgets of that scale, although governments and tech giants are already investing enormous sums.
Leopold Aschenbrenner, a former OpenAI employee, claims in his 165-page memo that NVIDIA, Microsoft, Amazon, and Google are already planning spending of $100 billion per quarter just on AI.
The problem is not only clusters, but also the generation of enormous amounts of electricity.

Dynamics of global electricity demand by region (left) and regional shares (right), 1990–2025. Source: International Energy Agency (IEA).
Data centers need to be powered, models need to be trained, and then used. The more people use a trained AI model, the more electricity it consumes. In addition, compute resources are needed not only for training the main AI model, but also for risk reduction (derisking), test runs (many of which fail), other auxiliary models, and so on.
According to 2024 forecasts, AI tools will be used by only 314 million people (slightly less than 4% of the world’s population). But we are aiming for a much more energy-intensive AGI “in every home.” Let’s assume one AGI replaces an average worker, and then multiply that model by the global population. The result becomes something almost beyond comprehension.
Any precise calculations would be speculative, because if AGI replaces humans, it won’t be sitting in front of an office computer. It may be that AGI integrated into robots will eventually replace manual labor as well. People will stop commuting to work or sitting at office PCs, but will instead spend more time watching TikTok and streaming services, playing games, traveling, going to cafés, and so on (all of which also consumes electricity). And we still don’t know hundreds of other variables.
But one thing we do know for sure — humanity will need enormous electricity generation capacity. This is something worth investing in, and necessarily so.
There are some estimates based purely on servers, without accounting for supporting infrastructure. According to the SemiAnalysis blog (an interesting article), integrating ChatGPT into Google Search would require 512,820 servers based on NVIDIA A100 HGX systems, which is more than 4 million GPUs. At a power consumption of 6.5 kW per server, this results in 80 GWh of daily electricity use and 29.2 TWh annually. That is roughly equivalent to the annual output of the Balakovo nuclear power plant, with a capacity of 4,000 MW (30.0 TWh).
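The arithmetic behind that estimate is easy to reproduce. The server count and per-server power come from the figures quoted above; the 8 GPUs per server is my assumption, chosen because it matches the "more than 4 million GPUs" total.

```python
servers = 512_820        # SemiAnalysis estimate quoted above
gpus_per_server = 8      # assumption: 8 GPUs per HGX A100 server
kw_per_server = 6.5      # per-server power draw quoted above

total_gpus = servers * gpus_per_server
power_gw = servers * kw_per_server / 1e6   # kW -> GW
daily_gwh = power_gw * 24                  # GW sustained for 24 hours
annual_twh = daily_gwh * 365 / 1000        # GWh -> TWh

print(f"GPUs: {total_gpus:,}")
print(f"Continuous draw: {power_gw:.2f} GW")
print(f"Daily: {daily_gwh:.1f} GWh, annual: {annual_twh:.1f} TWh")
```

The continuous draw works out to a little over 3 GW, which is why the comparison with a 4,000 MW nuclear plant is a fair one.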
Now let’s look a bit into the future. Microsoft will build the world’s largest data center, Stargate, for OpenAI, at a cost of $100 billion. The plan is to reach full capacity by 2030 — consisting of “millions of specialized server chips” in each supercomputer cluster. We don’t know how many supercomputers or how many millions of chips there will be, but it seems to be at least tens of millions, and possibly over 100 million GPUs.

NVIDIA H100 is a computational monster for AI tasks.
Now things get really interesting: the power consumption of a flagship NVIDIA H100 is 700 watts. But as mentioned earlier, additional infrastructure is required. SemiAnalysis estimates around 1,400 watts per H100 when accounting for full system overhead.
Let’s assume that in the future, thanks to improved energy efficiency, advanced process nodes, and innovations in data center design, we reach 1 kW per GPU (the successor to H100). A 10 GW cluster (10 million chips) with near 100% uptime would consume 87.6 TWh annually.
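The same estimate as a small reusable sketch, so the assumptions can be varied. The function name and the `uptime` parameter are illustrative; the 1,000 W and 1,400 W inputs are the per-GPU figures from the text.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760

def annual_twh(gpus, watts_per_gpu, uptime=1.0):
    """Yearly consumption in TWh for a cluster at the given all-in power per GPU."""
    power_gw = gpus * watts_per_gpu / 1e9
    return power_gw * HOURS_PER_YEAR * uptime / 1000

future = annual_twh(10_000_000, 1_000)  # the optimistic 1 kW-per-GPU scenario
today = annual_twh(10_000_000, 1_400)   # SemiAnalysis all-in figure for an H100

print(f"At 1.0 kW per GPU: {future:.1f} TWh per year")
print(f"At 1.4 kW per GPU: {today:.1f} TWh per year")
```

At today's all-in figure of about 1,400 W per H100, the same 10 million chips would need roughly 40% more energy than the optimistic scenario.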
And this problem has to be solved somehow.
For example, the Hinkley Point C nuclear power plant (3.2 GW) has been under construction since March 2017 and is still not completed. Or consider Units 3 and 4 of the Vogtle nuclear plant in the United States, two AP1000 reactors with a total capacity of 2.2 GW: planning began in 2006, construction started in 2009, and Unit 4 entered commercial operation on April 29, 2024. Nearly 18 years passed from planning to completion.
Even under ideal conditions, building a single nuclear reactor in 5–10 years is considered a major success. Naturally, nuclear power plant construction also requires billions in investment.
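To put the construction timelines next to the demand, here is a rough count of how many Vogtle-sized reactors a 10 GW cluster would need. The reactor size is derived from the 2.2 GW quoted for the two units; the 0.9 capacity factor in the second call is purely an illustrative assumption, not a figure from the text.

```python
import math

VOGTLE_PAIR_GW = 2.2   # Units 3 and 4 combined, from the text
CLUSTER_GW = 10        # the hypothetical 10 GW cluster discussed earlier

def reactors_needed(demand_gw, reactor_gw, capacity_factor=1.0):
    """Whole reactors required to cover a constant demand."""
    return math.ceil(demand_gw / (reactor_gw * capacity_factor))

single_reactor_gw = VOGTLE_PAIR_GW / 2
ideal = reactors_needed(CLUSTER_GW, single_reactor_gw)
realistic = reactors_needed(CLUSTER_GW, single_reactor_gw, capacity_factor=0.9)

print(f"Reactors at 100% output: {ideal}")
print(f"Reactors at an assumed 90% capacity factor: {realistic}")
```

Even in the ideal case, that is about ten reactors of a type that took roughly 18 years from planning to completion for a single pair.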
Do insider claims from former OpenAI employees change everything?

When I was writing this article, I came across a report that former OpenAI researcher Leopold Aschenbrenner published a 165-page document on the future of AI.
Here are the claims that spread through media channels:
- AGI by 2027 is a reality.
- AGI is already a key geopolitical resource. Forget nuclear weapons; that’s outdated. Any country will do everything possible to achieve AGI first, just as happened with the atomic bomb.
- Building AGI will require a unified compute cluster costing around a trillion dollars. Microsoft is already building something similar for OpenAI.
- This machine would consume more electricity than the entire United States produces.
- By 2030, $8 trillion per year will be invested into AI.
- AGI is only the beginning. After its creation, there will be an almost immediate transition to ASI (superintelligence). AGI will be so intelligent that it will rapidly self-improve.
As I said, everything ultimately breaks down against GPU manufacturing capacity, current technological limits, trillion-dollar data centers, and the enormous electricity generation required to power them. Yes, all of this is technically possible — it is already being attempted — but 2027 still sounds unrealistic. And so does 2030. A $7 trillion data center cannot be built in a couple of years and immediately produce something like “GPT-6 AGI edition.”
Even if Microsoft and its vast resources complete Stargate by 2030, it will still take time before any meaningful results appear.
What concerns me most is this: AGI and ASI are not guaranteed outcomes. If researchers truly knew the exact recipe for creating them (with only compute and technology being the limiting factors), then we could simply take the most advanced cluster, train a model for 50 years with periodic upgrades, and by 2075 AGI would be guaranteed. But even with trillion-dollar clusters, there is still no guarantee that machines will rise from nuclear ashes and start a war of extermination — or that AGI/ASI will even emerge at all.
By the way, even with a theoretical arrival of AGI, governments will likely maintain human reserve pools of critical specialists, so that disasters like the Carrington Event (which occurs once every few hundred years) do not plunge civilization into darkness and chaos. But that is just the author’s speculation.
That’s it. If you enjoyed the article, make sure to follow us on LinkedIn so you don’t miss the release of new materials.