NVIDIA's GPU-Accelerated Architecture: How Hardware Scheduling Powers the Inference Revolution at CES 2026
At CES 2026, NVIDIA CEO Jensen Huang delivered a sweeping keynote that reframed the AI infrastructure conversation around a single organizing principle: intelligent hardware acceleration and GPU scheduling as the foundation for the inference economy. Over roughly 90 minutes, he unveiled eight major developments that together mark a shift from training-centric AI to inference-optimized systems. The thread connecting the announcements is how sophisticated GPU scheduling—from compute distribution to resource allocation—enables cost-effective, high-throughput AI deployment at scale.
System-Level GPU Acceleration: The Vera Rubin Platform’s Revolutionary Design
The centerpiece of NVIDIA’s strategy is the Vera Rubin AI supercomputer, a six-chip co-designed system that reimagines how GPU acceleration operates at the rack level. The platform’s architecture—comprising Vera CPU, Rubin GPU, NVLink 6 Switch, ConnectX-9 SuperNIC, BlueField-4 DPU, and Spectrum-X CPO—represents a departure from modular designs toward deeply integrated hardware acceleration.
The Rubin GPU introduces the Transformer engine and achieves up to 50 PFLOPS of NVFP4 inference performance, a 5x leap over Blackwell. More critically, the GPU’s 3.6TB/s NVLink interconnect bandwidth and support for hardware-accelerated tensor operations enable unprecedented GPU scheduling efficiency. The NVLink 6 Switch, operating at 400Gbps per lane, coordinates GPU-to-GPU communication with 28.8TB/s aggregate bandwidth, allowing the system to schedule computation across GPUs with minimal latency overhead.
Integrated into a single-rack Vera Rubin NVL72 system, this hardware acceleration achieves 3.6 EFLOPS of inference performance—a 5x improvement over the previous generation. The system packs 2 trillion transistors and incorporates 100% liquid cooling, enabling dense GPU scheduling without thermal constraints. Assembly time has dropped to five minutes, 18 times faster than predecessor generations, reflecting how standardized GPU acceleration frameworks simplify deployment.
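The per-GPU and rack-level figures quoted above are consistent with each other, assuming the "NVL72" name follows NVIDIA's convention of 72 GPUs per rack (an inference on our part, not stated in the keynote summary):

```python
# Sanity check of the quoted numbers: 72 Rubin GPUs per NVL72 rack
# (assumed from the NVL72 naming convention), each delivering up to
# 50 PFLOPS of NVFP4 inference throughput.
gpus_per_rack = 72      # assumption based on the "NVL72" name
pflops_per_gpu = 50     # per-GPU NVFP4 figure from the keynote

rack_pflops = gpus_per_rack * pflops_per_gpu
rack_eflops = rack_pflops / 1000

print(rack_eflops)  # 3.6 EFLOPS, matching the quoted rack-level figure
```

The arithmetic working out exactly suggests the 3.6 EFLOPS figure is the aggregate peak across all GPUs in the rack, not a measured end-to-end throughput.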
Inference Efficiency Through Intelligent GPU Scheduling and Resource Allocation
NVIDIA’s three new inference products directly address the GPU scheduling challenge at different system layers. The Spectrum-X Ethernet co-packaged optics (CPO) optimize the switching fabric between GPUs. By embedding optics directly into the switching silicon, CPO achieves 5x better energy efficiency and 5x improved application uptime. This architectural choice ensures that GPU-to-GPU scheduling decisions incur minimal power overhead.
The NVIDIA Inference Context Memory Storage Platform tackles a different scheduling problem: context management. As AI models shift toward agentic reasoning with multi-million-token windows, storing and retrieving context becomes the primary bottleneck. This new storage tier, accelerated by BlueField-4 DPU and integrated with NVLink infrastructure, allows GPUs to offload key-value cache computation to dedicated storage nodes. The result is 5x better inference performance and 5x lower energy consumption—achieved not through faster GPUs alone, but through intelligent scheduling of compute and storage resources.
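The core scheduling idea here—keep hot key-value caches near the GPU, offload cold ones to a slower context tier, and fetch rather than recompute on reuse—can be illustrated with a toy two-tier cache. All class and method names below are illustrative stand-ins, not NVIDIA APIs:

```python
from collections import OrderedDict

class TieredKVCache:
    """Toy two-tier KV cache: hot entries live in (simulated) GPU memory,
    LRU-evicted entries are offloaded to a slower context-storage tier.
    Illustrative only; not an NVIDIA API."""

    def __init__(self, hot_capacity):
        self.hot = OrderedDict()  # session_id -> KV tensor (any object here)
        self.cold = {}            # offloaded entries (context storage tier)
        self.hot_capacity = hot_capacity

    def put(self, session_id, kv):
        self.hot[session_id] = kv
        self.hot.move_to_end(session_id)
        while len(self.hot) > self.hot_capacity:
            victim, tensor = self.hot.popitem(last=False)  # evict LRU entry
            self.cold[victim] = tensor  # offload instead of discarding

    def get(self, session_id):
        if session_id in self.hot:
            self.hot.move_to_end(session_id)
            return self.hot[session_id]
        if session_id in self.cold:
            kv = self.cold.pop(session_id)  # fetch beats recomputing prefill
            self.put(session_id, kv)        # promote back to the hot tier
            return kv
        return None  # true miss: the expensive prefill must be recomputed
```

The win comes from the third case being rare: as long as a session's context survives in *some* tier, a resumed conversation pays a transfer cost rather than a full prefill recomputation.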
The NVIDIA DGX SuperPOD, built on eight Vera Rubin NVL72 systems, demonstrates how GPU scheduling scales across a pod-level deployment. By using NVLink 6 for vertical scaling and Spectrum-X Ethernet for horizontal scaling, the SuperPOD reduces token costs for large mixture-of-experts (MoE) models to 1/10 of the prior generation. This 10x cost reduction reflects the compounding returns of optimized GPU scheduling: fewer compute cycles wasted, lower data movement overhead, and better resource utilization.
Multi-Tier Storage and GPU Context Management: Solving the New Inference Bottleneck
The transition from training to inference fundamentally changes how GPU resources should be scheduled. During training, GPU utilization is predictable and steady. During inference, especially long-context inference, request patterns are irregular, and context reuse is critical. NVIDIA’s new storage platform addresses this by introducing a memory hierarchy optimized for inference: GPU HBM4 memory for active computation, the new context memory tier for key-value cache management, and traditional storage for persistent data.
GPU scheduling now must balance compute tasks with context scheduling decisions. BlueField-4 DPU accelerates context movements between these tiers, while intelligent software schedules GPU kernel launches to overlap with context prefetching. This collaborative design—spanning GPU compute, DPU acceleration, and network efficiency—eliminates the redundant KV cache recalculations that previously plagued long-context inference.
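The overlap pattern described above—staging the next request's context while the GPU works on the current one—is classic double buffering. A minimal host-side sketch, using a background thread and a small staging queue in place of real DPU transfers and GPU kernels:

```python
import queue
import threading
import time

def prefetch(contexts, staged):
    """Background loader: stage the next context while the consumer
    (standing in for the GPU) processes the current one."""
    for ctx in contexts:
        time.sleep(0.01)  # simulated tier-to-tier context transfer
        staged.put(ctx)
    staged.put(None)      # sentinel: no more work

def process(ctx):
    time.sleep(0.01)      # simulated GPU kernel work on a staged context

contexts = [f"req-{i}" for i in range(8)]
staged = queue.Queue(maxsize=2)  # small staging buffer (double buffering)
loader = threading.Thread(target=prefetch, args=(contexts, staged))
loader.start()

done = []
while (ctx := staged.get()) is not None:
    process(ctx)          # overlaps with the loader fetching the next context
    done.append(ctx)
loader.join()
```

Because the transfer and the compute run concurrently, total time approaches max(transfer, compute) per request rather than their sum; on real hardware the same overlap is achieved with asynchronous copies and separate CUDA streams rather than Python threads.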
Open Models and GPU-Optimized Frameworks: Building the Physical AI Ecosystem
NVIDIA’s expanded open-source strategy reflects a recognition that GPU acceleration only delivers value within a thriving software ecosystem. In 2025, NVIDIA became the largest contributor to open-source models on Hugging Face, releasing 650 models and 250 datasets. These models are increasingly optimized for NVIDIA’s GPU scheduling architecture—they exploit Transformer engines, utilize NVFP4 precision, and align with NVLink memory hierarchies.
The new “Blueprints” framework enables developers to compose multi-model, hybrid-cloud AI systems. These systems intelligently schedule inference tasks across local GPUs and cloud-based frontier models based on latency and cost. The release of Alpamayo, a 10-billion-parameter reasoning model for autonomous driving, exemplifies this approach. Alpamayo runs efficiently on inference-optimized GPUs, demonstrating how thoughtful GPU scheduling—paired with model architecture—enables sophisticated reasoning on consumer-grade hardware.
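A latency- and cost-aware routing policy of the kind Blueprints describes can be sketched in a few lines. All constants and parameter names here are illustrative assumptions, not values from NVIDIA's framework:

```python
def route(request_tokens, local_queue_depth,
          local_ms_per_tok=0.5,      # assumed local GPU decode speed
          cloud_ms_per_tok=0.2,      # assumed frontier-model decode speed
          cloud_rtt_ms=150.0,        # assumed network round trip to cloud
          cloud_cost_per_tok=2e-6,   # assumed cloud price per token ($)
          max_cloud_cost=0.01):      # per-request cloud budget ($)
    """Choose local GPU vs. cloud frontier model by estimated latency,
    subject to a per-request cloud spending cap. Illustrative only."""
    local_latency = local_queue_depth * 20.0 + request_tokens * local_ms_per_tok
    cloud_latency = cloud_rtt_ms + request_tokens * cloud_ms_per_tok
    cloud_cost = request_tokens * cloud_cost_per_tok
    if cloud_latency < local_latency and cloud_cost <= max_cloud_cost:
        return "cloud"
    return "local"

# Short request, idle local GPU: the network round trip dominates,
# so the request stays local. Long request on a congested local queue:
# the cloud's faster decode wins despite the round trip.
print(route(100, local_queue_depth=0))
print(route(4000, local_queue_depth=10))
```

The design point is that the decision inputs (queue depth, token count, budget) are all cheap to observe at dispatch time, so routing can happen per request rather than per deployment.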
Siemens’ integration of NVIDIA CUDA-X, AI models, and Omniverse into industrial digital twins extends GPU acceleration into manufacturing and operations. This partnership illustrates how GPU scheduling frameworks become infrastructure for entire industries.
Strategic Vision: From GPU Compute Power to Complete System Acceleration
NVIDIA’s announcement sequence reveals a deliberate strategy: each new product layer—from GPU core design through network switching to storage architecture—has been reconsidered for inference workloads. The result is a system where GPU scheduling is no longer a secondary concern but the central design principle.
Jensen Huang’s observation that the “ChatGPT moment for physical AI has arrived” is grounded in this infrastructure foundation. Autonomous vehicles equipped with Alpamayo models require GPUs that can schedule real-time inference under unpredictable conditions. Robots operating via GR00T frameworks demand GPUs that efficiently schedule multi-modal perception and reasoning. These physical AI applications are only possible because NVIDIA has reimagined GPU acceleration from the silicon level to the software stack.
The competitive moat NVIDIA is constructing combines three elements: continuously advancing GPU scheduling efficiency (5x improvements generation-to-generation), opening software to incentivize adoption (650 models, 250 datasets), and making hardware-software integration progressively harder to replicate. Each announcement at CES 2026—from Vera Rubin’s co-designed chips to the context memory platform—deepens GPU acceleration capabilities while simultaneously raising the bar for competing architectures.
As the AI industry transitions from training scarcity to inference abundance, GPU scheduling emerges as the primary constraint on cost and performance. NVIDIA's full-stack approach positions its hardware acceleration capabilities to define the infrastructure layer for the next decade of AI development.