Product Deduplication Strategies
Deduplication is essential for maintaining clean, efficient product feeds. This guide explains how deduplication works, walks through common strategies, and covers what to check when the results are not what you expect.
Why Deduplication Matters
- Marketplace Compliance: Many platforms reject feeds with duplicate products
- Better User Experience: Customers see each product only once
- Cost Efficiency: Reduce advertising spend on duplicate listings
- Inventory Accuracy: Prevent overselling due to duplicate entries
How Deduplication Works
Identify Duplicates
Products are grouped by the "Match Field" you specify
Compare Priority
Within each group, products are ranked by the "Priority Field"
Keep Best Match
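Only the product with the best priority value (the lowest or the highest, depending on the "Priority Direction") is retained; the other products in the group are removed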
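The three steps above can be sketched in a few lines of Python. This is an illustration only, not the tool's actual implementation: the function name deduplicate, the dictionary-based product records, and the tie-breaking behaviour (the first product seen wins a tie) are all assumptions made for the example.

```python
from typing import Any, Dict, Iterable, List

Product = Dict[str, Any]


def deduplicate(
    products: Iterable[Product],
    match_field: str,
    priority_field: str,
    direction: str = "lowest",
) -> List[Product]:
    """Keep one product per match-field value, chosen by priority.

    Hypothetical helper: ties keep the first product seen.
    """
    if direction not in ("lowest", "highest"):
        raise ValueError("direction must be 'lowest' or 'highest'")

    best: Dict[Any, Product] = {}
    for product in products:
        key = product[match_field]            # Step 1: identify the group
        current = best.get(key)
        if current is None:
            best[key] = product
            continue
        # Step 2: compare priority values within the group
        candidate = product[priority_field]
        incumbent = current[priority_field]
        wins = candidate < incumbent if direction == "lowest" else candidate > incumbent
        if wins:
            best[key] = product               # Step 3: keep the best match
    return list(best.values())


offers = [
    {"gtin": "0001", "seller": "A", "price": 19.99},
    {"gtin": "0001", "seller": "B", "price": 17.49},
    {"gtin": "0002", "seller": "A", "price": 5.00},
]
kept = deduplicate(offers, match_field="gtin", priority_field="price", direction="lowest")
```

Here the two offers sharing a GTIN form one group, and only the cheaper one survives alongside the product that had no duplicate.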
Common Deduplication Strategies
Strategy 1: Price-Based Deduplication
Scenario: Multiple sellers offer the same product
- Match Field: gtin (or mpn)
- Priority Field: price
- Priority Direction: lowest
Result: Keep only the cheapest offer for each unique product
Strategy 2: Stock-Based Deduplication
Scenario: Same product in multiple warehouses
- Match Field: sku
- Priority Field: quantity
- Priority Direction: highest
Result: Keep only the entry from the location with the most stock
Strategy 3: Quality-Based Deduplication
Scenario: Products with varying data quality
- Match Field: title
- Priority Field: description_length
- Priority Direction: highest
Result: Keep the product with the longest, most detailed description
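If your feed has no description_length field, it would have to be derived before it can serve as a priority field. The sketch below shows one way to do that and then apply Strategy 3. The description field name and both helper functions are assumptions made for this example, and titles are matched case-insensitively to mirror the behaviour described under "Things to Watch Out For".

```python
from typing import Any, Dict, List

Product = Dict[str, Any]


def add_description_length(products: List[Product]) -> List[Product]:
    """Return copies of the products with a derived description_length field."""
    return [
        {**p, "description_length": len((p.get("description") or "").strip())}
        for p in products
    ]


def keep_most_detailed(products: List[Product]) -> List[Product]:
    """Group by title (case-insensitive) and keep the longest description."""
    best: Dict[str, Product] = {}
    for p in add_description_length(products):
        key = p["title"].casefold()
        if key not in best or p["description_length"] > best[key]["description_length"]:
            best[key] = p
    return list(best.values())


listings = [
    {"title": "Trail Shoe", "description": "Shoe."},
    {"title": "trail shoe", "description": "Lightweight trail shoe with a grippy outsole."},
]
kept = keep_most_detailed(listings)
```

Description length is only a rough proxy for data quality, so it is worth spot-checking which listings this rule keeps.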
Strategy 4: Variant Consolidation
Scenario: Show only one variant per product group
- Match Field: parent_id
- Priority Field: is_default
- Priority Direction: highest
Result: Display only the default variant
Advanced Deduplication Techniques
Multi-Stage Deduplication
Apply multiple deduplication rules in sequence for complex scenarios:
- Stage 1: Remove exact SKU duplicates (keep highest stock)
- Stage 2: Remove GTIN duplicates (keep lowest price)
- Stage 3: Remove title duplicates (keep best rated)
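A multi-stage setup can be pictured as feeding the output of one deduplication pass into the next. The sketch below is a minimal illustration, assuming products are plain dictionaries; the helper names dedupe and run_stages and the rating field used for "best rated" are hypothetical.

```python
from typing import Any, Dict, List, Tuple

Product = Dict[str, Any]
Stage = Tuple[str, str, str]  # (match_field, priority_field, direction)


def dedupe(products: List[Product], match_field: str,
           priority_field: str, direction: str) -> List[Product]:
    """Single pass: keep the best product per match-field value."""
    best: Dict[Any, Product] = {}
    for p in products:
        key = p[match_field]
        cur = best.get(key)
        if cur is None:
            best[key] = p
        elif direction == "lowest" and p[priority_field] < cur[priority_field]:
            best[key] = p
        elif direction == "highest" and p[priority_field] > cur[priority_field]:
            best[key] = p
    return list(best.values())


def run_stages(products: List[Product], stages: List[Stage]) -> List[Product]:
    """Apply each stage to the survivors of the previous one."""
    for match_field, priority_field, direction in stages:
        products = dedupe(products, match_field, priority_field, direction)
    return products


STAGES: List[Stage] = [
    ("sku", "quantity", "highest"),   # Stage 1
    ("gtin", "price", "lowest"),      # Stage 2
    ("title", "rating", "highest"),   # Stage 3 ("rating" is an assumed field)
]

feed = [
    {"sku": "S1", "gtin": "G1", "title": "Blue Mug", "quantity": 3, "price": 9.0, "rating": 4.1},
    {"sku": "S1", "gtin": "G1", "title": "Blue Mug", "quantity": 8, "price": 9.5, "rating": 4.1},
    {"sku": "S2", "gtin": "G1", "title": "Blue Mug 350ml", "quantity": 5, "price": 8.0, "rating": 4.6},
    {"sku": "S3", "gtin": "G3", "title": "Blue Mug 350ml", "quantity": 2, "price": 7.0, "rating": 3.9},
]
result = run_stages(feed, STAGES)
```

Because each stage only sees what the previous stage kept, the order of stages affects the outcome. In this sample, four entries are reduced to the single S2 listing.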
Conditional Deduplication
Combine deduplication with conditional rules to apply different strategies to different parts of the feed:
IF Category = "Electronics" AND Brand = "Samsung"
THEN Deduplicate by model_number keeping lowest price
ELSE Deduplicate by title keeping highest margin
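One way to read the rule above is as a split of the feed into two parts, each deduplicated with its own settings. The following sketch takes that interpretation, which is an assumption, as are the lowercase field names (category, brand, margin) and the helper functions.

```python
from typing import Any, Callable, Dict, List

Product = Dict[str, Any]


def dedupe(products: List[Product], match_field: str,
           priority_field: str, direction: str) -> List[Product]:
    """Keep the best product per match-field value."""
    best: Dict[Any, Product] = {}
    for p in products:
        key = p[match_field]
        cur = best.get(key)
        if cur is None:
            best[key] = p
        elif direction == "lowest" and p[priority_field] < cur[priority_field]:
            best[key] = p
        elif direction == "highest" and p[priority_field] > cur[priority_field]:
            best[key] = p
    return list(best.values())


def conditional_dedupe(products: List[Product],
                       condition: Callable[[Product], bool]) -> List[Product]:
    """Apply one rule to products matching the condition, another to the rest."""
    matching = [p for p in products if condition(p)]
    others = [p for p in products if not condition(p)]
    return (
        dedupe(matching, "model_number", "price", "lowest")   # THEN branch
        + dedupe(others, "title", "margin", "highest")        # ELSE branch
    )


def is_samsung_electronics(p: Product) -> bool:
    return p["category"] == "Electronics" and p["brand"] == "Samsung"


feed = [
    {"category": "Electronics", "brand": "Samsung", "model_number": "QN55",
     "title": "TV 55", "price": 799, "margin": 0.10},
    {"category": "Electronics", "brand": "Samsung", "model_number": "QN55",
     "title": "TV 55 inch", "price": 749, "margin": 0.08},
    {"category": "Home", "brand": "Acme", "model_number": "K1",
     "title": "Kettle", "price": 30, "margin": 0.20},
    {"category": "Home", "brand": "Acme", "model_number": "K2",
     "title": "Kettle", "price": 35, "margin": 0.30},
]
result = conditional_dedupe(feed, is_samsung_electronics)
```

With this split, a product in one branch is never compared against a product in the other, even if their titles or model numbers happen to match.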
Important Considerations
Things to Watch Out For
- Case Sensitivity: Match field values are compared case-insensitively, so "ABC-1" and "abc-1" count as duplicates
- Empty Values: Products with empty match fields are skipped
- Processing Order: Deduplication happens after all other rules
- Performance: Large feeds may take longer with complex deduplication
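The first two points can be illustrated with a small grouping sketch. It is not the tool's own logic: the helper names are hypothetical, trimming surrounding whitespace is an extra assumption, and products with empty match fields are simply returned in a separate list so the example does not presume what the tool does with them afterwards.

```python
from typing import Any, Dict, List, Optional, Tuple

Product = Dict[str, Any]


def normalize_key(value: Any) -> Optional[str]:
    """Return a case-insensitive grouping key, or None for empty values."""
    if value is None:
        return None
    text = str(value).strip()          # assumption: ignore surrounding spaces
    return text.casefold() if text else None


def group_by_match_field(
    products: List[Product], match_field: str
) -> Tuple[Dict[str, List[Product]], List[Product]]:
    """Split products into duplicate groups and products that were skipped."""
    groups: Dict[str, List[Product]] = {}
    skipped: List[Product] = []
    for p in products:
        key = normalize_key(p.get(match_field))
        if key is None:
            skipped.append(p)          # empty match field: not grouped
        else:
            groups.setdefault(key, []).append(p)
    return groups, skipped


feed = [
    {"id": 1, "sku": "AB-1"},
    {"id": 2, "sku": "ab-1"},
    {"id": 3, "sku": ""},
    {"id": 4, "sku": None},
]
groups, skipped = group_by_match_field(feed, "sku")
```

In this sample, products 1 and 2 land in the same group despite the different casing, while products 3 and 4 are not grouped at all.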
Measuring Success
Track these metrics to ensure effective deduplication:
- Reduction in total products (typically 10-30%)
- Improved feed acceptance rates
- Higher click-through rates (less customer confusion)
- Better conversion rates (showing best options)
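The first metric is simple to compute from product counts before and after deduplication. A minimal sketch, with a hypothetical function name and made-up sample counts:

```python
def reduction_percent(before: int, after: int) -> float:
    """Percentage of products removed by deduplication."""
    if before <= 0:
        raise ValueError("before must be a positive product count")
    return (before - after) * 100 / before


# Example: a feed of 12,000 products shrinks to 9,600 after deduplication.
pct = reduction_percent(12_000, 9_600)  # 20.0
```

A result far above the typical range can be a hint that the match field is too broad.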
Troubleshooting
Too many products being removed?
Check if your match field is too broad. For example, matching by "category" might remove many unique products.
Wrong product being kept?
Verify your priority field contains the expected values and the sort direction is correct.
Deduplication not working?
Ensure the match field exists and has values. Check the "Excluded Products" tab for details.