E-Commerce at Scale: How a Software Engineer Sorts Millions of Chaotic Product Attributes
Most debates about e-commerce scaling revolve around glamorous topics: distributed search systems, live inventory management, recommendation algorithms. Behind them lurks a quieter, more persistent problem: managing attribute values. It is technical noise present in every large online store.
The Silent Problem: Why Attribute Values Complicate Everything
Product attributes are fundamental to the customer experience. They drive filters, comparisons, and search rankings. In theory, this sounds simple. In practice, raw values are chaotic.
A simple set might look like: “XL”, “Small”, “12cm”, “Large”, “M”, “S”. Colors? “RAL 3020”, “Crimson”, “Red”, “Dark Red”. Material? “Steel”, “Carbon Steel”, “Stainless”, “Stainless Steel”.
Individually, these inconsistencies seem harmless. But multiply them across 3 million SKUs, each with dozens of attributes, and the problem becomes systemic. Filters behave unpredictably. Search engines lose relevance. Customers experience slower, more frustrating browsing. And backend teams drown in manual data cleaning.
A software engineer at Zoro faced exactly this challenge: a problem that is easy to overlook but affects every product page.
The Path to Intelligent Automation Without Losing Control
The first principle was clear: no black-box AI. Such systems are hard to trust, debug, or scale.
Instead, a hybrid pipeline was developed that:
The result combined the contextual thinking of modern language models with fixed rules and controls. AI with guardrails, not AI out of control.
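One way to picture "AI with guardrails" is a wrapper that accepts a model's proposed ordering only when it is an exact reordering of the input values, and otherwise falls back to a deterministic sort. The sketch below is an illustration under assumptions, not the pipeline's actual code: `sort_with_guardrail`, `propose_order`, and the alphabetical fallback are hypothetical names and choices.

```python
from collections import Counter
from typing import Callable, Sequence


def sort_with_guardrail(
    values: Sequence[str],
    propose_order: Callable[[Sequence[str]], Sequence[str]],
    fallback: Callable[[Sequence[str]], list[str]] = sorted,
) -> list[str]:
    """Accept the proposed order only if it is an exact reordering of the input."""
    try:
        # propose_order stands in for the language model call (hypothetical).
        proposed = list(propose_order(values))
    except Exception:
        # Any failure in the model call falls through to the deterministic path.
        return list(fallback(values))
    # Comparing multisets rejects dropped, invented, or duplicated values.
    if Counter(proposed) != Counter(values):
        return list(fallback(values))
    return proposed
```

The check is deliberately strict: a model that renames "Large" to "L" or silently drops a value is treated as a failure rather than a fix, which keeps the output auditable.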
Architecture Overview: How It All Fits
The entire process runs in offline background jobs, not in real time. This was not a compromise; it was architecturally necessary.
Real-time pipelines may sound tempting, but lead to:
Offline processing enables:
The architecture works as follows:
The Four Layers of the Solution
Layer 1: Data Preparation
Before any intelligence was applied, the data went through a clear preprocessing step: trimming whitespace, deduplicating values, flattening category breadcrumbs into structured context strings, and removing empty entries.
This may seem basic, but it significantly improved the model's performance. Garbage in, garbage out: at this scale, small errors can cause big problems later.
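The preparation steps can be sketched in a few lines of Python. This is a minimal illustration, not the original implementation: the function names, the case-insensitive deduplication, and the " > " breadcrumb separator are all assumptions.

```python
from typing import Iterable, Optional


def clean_values(raw_values: Iterable[Optional[str]]) -> list[str]:
    """Trim whitespace, drop empty entries, and deduplicate in first-seen order."""
    seen: set[str] = set()
    cleaned: list[str] = []
    for value in raw_values:
        if value is None:
            continue
        # Collapse internal runs of whitespace and trim both ends.
        text = " ".join(value.split())
        if not text:
            continue
        # Case-insensitive deduplication is an assumption; exact match may be safer.
        key = text.casefold()
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(text)
    return cleaned


def breadcrumb_to_context(breadcrumb: Iterable[str]) -> str:
    """Join the non-empty parts of a category breadcrumb into one string."""
    parts = [part.strip() for part in breadcrumb if part and part.strip()]
    return " > ".join(parts)
```

For example, `clean_values(["  XL ", "Small", "", None, "xl"])` keeps only "XL" and "Small", and `breadcrumb_to_context([" Tools", "", "Hand Tools "])` yields "Tools > Hand Tools".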
Layer 2: Intelligent Sorting with Context
The language model was not just a sorting tool. It reasoned about the values.
The service received:
With this context, the model could understand:
The model returned:
Layer 3: Deterministic Fallbacks
Not every attribute needs intelligence. Numeric ranges, unit-based values, and simple sets benefit from:
The pipeline automatically recognized these cases and used deterministic logic. This kept the system efficient and avoided unnecessary LLM calls.
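A rough sketch of such a deterministic path, assuming length values like "12cm" or "5 mm": if every value in a set parses as a number with a known unit, sort by magnitude; otherwise signal that another strategy is needed. The unit table, the regular expression, and the function names are illustrative assumptions.

```python
import re
from typing import Optional, Sequence

# Conversion factors to millimetres; the unit list is illustrative, not exhaustive.
_TO_MM = {"mm": 1.0, "cm": 10.0, "m": 1000.0, "in": 25.4}
_MEASURE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*([a-zA-Z]+)\s*$")


def parse_length_mm(value: str) -> Optional[float]:
    """Return the value in millimetres, or None if it is not a recognised length."""
    match = _MEASURE.match(value)
    if match is None:
        return None
    number, unit = match.groups()
    factor = _TO_MM.get(unit.lower())
    return None if factor is None else float(number) * factor


def try_deterministic_sort(values: Sequence[str]) -> Optional[list[str]]:
    """Sort by magnitude when every value is a length; otherwise return None."""
    parsed = [(parse_length_mm(v), v) for v in values]
    if not parsed or any(mm is None for mm, _ in parsed):
        return None  # Mixed or non-numeric set: leave it to another strategy.
    return [v for _, v in sorted(parsed, key=lambda pair: pair[0])]
```

Returning `None` instead of guessing keeps the routing decision explicit: the caller sends the set to the language model only when the deterministic path declines it.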
Layer 4: Human Override
Each category could be tagged as:
This dual system allowed humans to make final decisions while intelligence handled the heavy lifting. It also built trust – merchants could override the model at any time.
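The override logic can be pictured as a small resolver that prefers a human-curated order whenever a category is tagged for manual control. The mode names `MODEL` and `MANUAL`, and the shape of the lookup tables, are placeholders assumed for illustration.

```python
from enum import Enum
from typing import Mapping, Sequence


class SortMode(Enum):
    # Placeholder mode names assumed for illustration.
    MODEL = "model"
    MANUAL = "manual"


def resolve_order(
    category_id: str,
    model_order: Sequence[str],
    modes: Mapping[str, SortMode],
    manual_orders: Mapping[str, Sequence[str]],
) -> list[str]:
    """Return the manual order for manually tagged categories, else the model's."""
    mode = modes.get(category_id, SortMode.MODEL)
    if mode is SortMode.MANUAL and category_id in manual_orders:
        # A human-curated order always wins over the model's proposal.
        return list(manual_orders[category_id])
    return list(model_order)
```

Defaulting untagged categories to the model path means merchants only need to intervene where they disagree with the result.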
From Chaos to Clarity: Practical Results
The pipeline transformed chaotic raw data:
These examples show how combining contextual understanding with clear rules works.
Persistence and Control Across the Entire Chain
All results were stored directly in a Product MongoDB. MongoDB became the single source of truth for:
This simplified reviews, overrides, reprocessing categories, and synchronization with other systems.
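As a sketch of what one stored record might look like, the helper below builds a filter and update pair keyed by category and attribute. Every field name here (`category_id`, `attribute`, `sorted_values`, `strategy`, `updated_at`) is a hypothetical choice; the actual schema is not described in the article.

```python
from datetime import datetime, timezone
from typing import Any, Sequence


def build_upsert(
    category_id: str,
    attribute: str,
    sorted_values: Sequence[str],
    strategy: str,
) -> tuple[dict[str, Any], dict[str, Any]]:
    """Build the (filter, update) pair for one category/attribute record."""
    # One record per category and attribute (assumed key design).
    query = {"category_id": category_id, "attribute": attribute}
    update = {
        "$set": {
            "sorted_values": list(sorted_values),
            "strategy": strategy,  # e.g. "llm" or "deterministic" (assumed labels)
            "updated_at": datetime.now(timezone.utc),
        }
    }
    return query, update
```

With PyMongo, the pair could be passed to `collection.update_one(query, update, upsert=True)`, so reprocessing a category overwrites its previous result instead of creating a duplicate.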
After sorting, values flowed into:
This ensured filters displayed in logical order, product pages showed consistent attributes, and search engines ranked products more accurately.
Why Not Just Use Real-Time?
Real-time processing would mean:
Offline jobs offered:
The trade-off was a slight delay between data ingestion and display. The benefit was consistency at scale – which customers value much more.
Measurable Impact
The solution delivered:
This was not just a technical win – it was also a victory for user experience and business results.
Key Takeaways for E-Commerce Software Engineers
Conclusion
Sorting attribute values sounds simple. But when it involves millions of products, it becomes a real challenge.
By combining language model intelligence with clear rules, contextual understanding, and human control, a complex, hidden problem was transformed into a clean, scalable system.
It reminds us that some of the greatest successes come from solving boring problems – those that are easy to overlook but appear on every product page.