GPT-5.4 has been released, and the direction of AI iteration is becoming clear. The field is evolving from chat boxes into system-level intelligent agents, moving toward a human-machine collaborative workflow in which humans are responsible for aesthetics and AI for implementation.


➤ Key Upgrades of GPT-5.4
1. Combines the general reasoning capabilities of GPT-5.2 with the top-tier programming skills of GPT-5.3-Codex
2. Supports a 1 million token context window (roughly 5,000 pages of documents), addressing the pain point of long texts being forgotten mid-conversation
3. Native computer operation: the model can directly see the screen, use the mouse, and type on the keyboard like a human. Its 75.0% success rate on the OSWorld benchmark surpasses the average human level
4. Introduces mid-interruption. Conversations are no longer rigidly turn-based; users can insert new requests at any time while the model is thinking or responding
5. Efficiency and cost optimization via the Tool Search mechanism. The model no longer needs to preload all tool definitions but searches for them on demand, reportedly cutting token consumption by 47%.
➤ Why is this happening?
Top AI labs worldwide are hitting a data wall: by 2026 at the latest, virtually all high-quality text, code, and books produced by humanity may have been collected for model training, so text-only training is reaching a bottleneck. Meanwhile, models such as Claude Code, Codex, and openclaw are deeply integrated with today's operating systems, replacing some human operations by calling system tools and completing tasks with a degree of autonomy.
Another little-known fact is that the Codex series models are trained together with the Codex framework. In other words, the Codex series models and the Codex framework are native to each other, allowing models to naturally call all development tools within Codex.
➤ In-Depth Analysis of Future AI Development Directions
1. From API stitching to native OS-level integration
The Computer Use capability demonstrated by GPT-5.4 marks a move from dialog boxes to the entire operating system.
Previously, models only wrote code inside a limited sandbox. After this upgrade they effectively gain hands: they understand not only code logic but also the visual feedback from clicks, drags, and terminal errors.
The new framework layer will no longer be a set of preset utility functions but will have deep awareness of the OS. During training, the model learns how to observe the screen and provide feedback, enabling it to debug and modify code like an experienced engineer while viewing UI changes in the browser, achieving end-to-end self-loop development, as already demonstrated with Codex.
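The observe-screen, act, observe-again cycle described above can be sketched as a simple loop. This is a minimal toy, not a real GPT-5.4 or Codex API: `plan_next_action` is a hypothetical stand-in for the model call, and screen capture and input dispatch are left as placeholders.

```python
# Minimal sketch of an observe-act loop for an OS-level agent.
from dataclasses import dataclass

@dataclass
class Action:
    kind: str          # "click", "type", or "done"
    x: int = 0
    y: int = 0
    text: str = ""

def plan_next_action(screen: bytes, goal: str) -> Action:
    """Stand-in for the model: look at the pixels, decide one step."""
    # A real system would send the screenshot and goal to the model here.
    return Action(kind="done")

def run_agent(goal: str, max_steps: int = 50) -> int:
    """Run the observe-act loop; return how many steps were taken."""
    steps = 0
    for _ in range(max_steps):
        screen = b"<screenshot bytes>"   # capture_screen() in a real system
        action = plan_next_action(screen, goal)
        steps += 1
        if action.kind == "done":
            break
        # Otherwise dispatch the click/type to the OS and loop to observe
        # the resulting screen change, closing the feedback loop.
    return steps
```

The key design point is that every iteration re-observes the screen, so the agent reacts to UI changes and errors rather than executing a fixed script.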
2. Million-token context + long-term task architecture + memory system = all-round architect
In Codex’s three-layer architecture, the model layer provides structured reasoning. The 1 million token context brought by GPT-5.4 essentially offers a broader canvas for this reasoning.
OpenAI’s memory system has always been leading, especially with the release of lossless and infinite memory. When models and frameworks are native to each other, models can instantly retrieve the entire codebase (millions of tokens), and frameworks can precisely apply modifications across dozens of related files.
Now, full architecture rewrites and precise understanding of code semantics are already possible within Codex.
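Retrieving an entire codebase into a million-token window still means deciding what fits first. A minimal sketch of that packing step, under assumed heuristics (a rough 4-characters-per-token estimate and keyword-count relevance; neither reflects OpenAI's actual memory system):

```python
# Sketch: packing codebase files into a large context window under a budget.
def approx_tokens(text: str) -> int:
    """Crude heuristic: roughly 4 characters per token."""
    return max(1, len(text) // 4)

def pack_context(files: dict[str, str], query_terms: list[str],
                 budget: int = 1_000_000) -> list[str]:
    """Return file names, most relevant first, that fit within the budget."""
    def relevance(body: str) -> int:
        return sum(body.count(t) for t in query_terms)

    ranked = sorted(files, key=lambda name: relevance(files[name]),
                    reverse=True)
    picked, used = [], 0
    for name in ranked:
        cost = approx_tokens(files[name])
        if used + cost <= budget:
            picked.append(name)
            used += cost
    return picked
```

With a genuinely million-token budget, most repositories fit whole, which is what makes full-architecture rewrites tractable.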
3. Search-based tool invocation and dynamic expansion
GPT-5.4 introduces the Tool Search mechanism, which lets the framework better align tool definitions with the model's output while freeing context for the information the model actually needs to operate precisely.
Future development will avoid preloading thousands of tool libraries (preventing token waste). Instead, when the model needs, say, a data visualization component, it fetches the definition and loads it in real time via Tool Search. This suggests that today's hand-assembled skills may be intermediate products, with more tools embedded in the model's reach and the model autonomously choosing which to call.
The benefit is extremely high token efficiency. It resolves the paradox that more tools make the model slower, allowing the agent's skill tree to extend indefinitely, optimize itself, and surface the best paths for training next-generation models.
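The on-demand pattern above can be sketched in a few lines: a registry keyed by keywords, searched per request, so only matching definitions ever enter the prompt. The registry entries here are made-up examples, not real GPT-5.4 tools.

```python
# Sketch of on-demand tool loading: instead of preloading every tool
# definition into the prompt, search a registry and load only the matches.
TOOL_REGISTRY = {
    "plot_chart": {"keywords": ["chart", "visualization", "plot"],
                   "definition": "plot_chart(data, kind) -> image"},
    "query_db":   {"keywords": ["sql", "database", "query"],
                   "definition": "query_db(sql) -> rows"},
}

def tool_search(request: str) -> list[str]:
    """Return only the tool definitions whose keywords match the request."""
    words = request.lower().split()
    hits = []
    for name, meta in TOOL_REGISTRY.items():
        if any(keyword in words for keyword in meta["keywords"]):
            hits.append(meta["definition"])
    return hits
```

Because unmatched tools contribute zero tokens, the registry can grow without slowing the model down, which is exactly the paradox this mechanism resolves.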
4. Real-time interaction, shifting from turn-based to anytime interruption and modification
GPT-5.4 introduces mid-interruption, breaking the black-box state of AI generation and allowing for timely adjustments.
At the collaboration level, more human decision-making is kept in the loop rather than handing everything to fully autonomous AI operation, achieving white-box collaboration: humans handle aesthetics, requirement definition, and solution selection, while AI handles implementation.
The real-time intervention capability transforms AI from a one-time delivery “black box” into a collaborative engineering partner that can modify requirements at any time.
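The shift from turn-based delivery to anytime interruption amounts to checking for new user input between output chunks instead of only between turns. A synchronous toy version (a production system would use async streaming; the chunk list and interrupt queue are assumptions for illustration):

```python
# Sketch of mid-generation interruption: the agent checks an interrupt
# queue between output chunks instead of finishing a whole turn first.
from queue import Queue, Empty

def generate(chunks: list[str], interrupts: Queue) -> list[str]:
    """Emit chunks one at a time, stopping early if the user interrupts."""
    emitted = []
    for chunk in chunks:
        try:
            new_request = interrupts.get_nowait()
            emitted.append(f"[interrupted: {new_request}]")
            break                      # re-plan around the new request
        except Empty:
            emitted.append(chunk)
    return emitted
```

The design choice that matters is granularity: the finer the chunks, the sooner a user correction takes effect, which is what turns a one-shot black box into a steerable partner.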
Simply put, the new AI Native mode (Codex + GPT-5.4) is like building an F1 race car from scratch, with the engine, chassis, and tires designed from day one for maximum speed.
In the future, we may no longer need to seek more powerful models but focus on systems that integrate more deeply with development environments.