April 10, 2026 · AI · 10 min read
Building an Image-Based Clothing Recommendation System
An end-to-end image-based recommendation system that combines classification, feature extraction, and similarity ranking to generate meaningful clothing recommendations from user-uploaded images.
Introduction
This project focuses on building an image-based clothing recommendation system designed to help users find visually and semantically similar products from large-scale e-commerce datasets.
At a high level, the system takes an input image and returns similar clothing items. In practice, the problem is less about similarity itself and more about combining several components into a coherent system.
Problem
E-commerce platforms contain a massive number of clothing items. As product diversity increases, users struggle to find visually similar or relevant items efficiently.
A naive approach would be:
- extract features from an image
- compute similarity
- return nearest neighbors
This approach fails in practice.
Similarity alone often produces:
- visually similar but semantically incorrect items
- matches across different categories
- irrelevant recommendations due to shared textures or colors
The real problem is not just similarity: it is similarity under constraints, such as staying within the query item's category.
System Overview
The system is built as a multi-stage pipeline:
Image → Classification → Feature Extraction → Similarity → Filtering → Recommendations
Each stage solves a different failure mode.
Dataset
The dataset was constructed by scraping product images from e-commerce platforms.
- ~54,000 images
- 22 clothing categories
- images include real-world noise (models, backgrounds, lighting)
Key challenges:
- inconsistent labeling
- multiple objects per image
- background noise affecting feature extraction
Classification Layer (YOLOv8)
To prevent semantically incorrect matches, a classification step was introduced.
YOLOv8 was used to:
- detect clothing category
- assign class labels to images
Why classification first?
Without classification:
- similarity search returns cross-category results
- ranking becomes meaningless
With classification:
- search space is constrained
- results become more relevant
Feature Extraction (ResNet50)
ResNet50 was used to extract high-dimensional feature vectors from images.
Key choices:
- pretrained weights (ImageNet)
- transfer learning
- feature extraction without retraining
The output is a dense vector representing visual features of the image.
Beyond CNN Features: Color Features
CNN features alone were not sufficient.
Observed issue:
- items with similar visual structure are not necessarily perceived as similar clothing
To improve results, two handcrafted color features were added:
- color mean
- color histogram
Final feature vector:
[ResNet Features + Color Mean + Color Histogram]
This significantly improved recommendation quality.
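As an illustration of the two color features, here is a minimal sketch that computes a per-channel mean and a per-channel histogram from an RGB image array. The function name `color_features`, the choice of 8 bins per channel, and the scaling to [0, 1] are assumptions made for this example; the article does not state the exact settings that were used.

```python
import numpy as np


def color_features(image: np.ndarray, bins: int = 8) -> np.ndarray:
    """Return [color mean (3 values) + color histogram (3 * bins values)].

    `image` is assumed to be an H x W x 3 RGB array with values in 0..255.
    """
    if image.ndim != 3 or image.shape[2] != 3:
        raise ValueError("expected an H x W x 3 RGB array")

    # Flatten to a list of pixels and scale every channel to [0, 1].
    pixels = image.reshape(-1, 3).astype(np.float64) / 255.0

    # Color mean: one average value per channel.
    mean = pixels.mean(axis=0)

    # Color histogram: one histogram per channel, concatenated.
    hist = np.concatenate(
        [np.histogram(pixels[:, c], bins=bins, range=(0.0, 1.0))[0] for c in range(3)]
    ).astype(np.float64)

    # Normalize so the histogram does not depend on image size.
    hist /= hist.sum()

    return np.concatenate([mean, hist])
```

The returned vector can then be appended to the CNN features to form the combined representation described above.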
Similarity Metrics
Different similarity algorithms were tested:
Euclidean Distance
- unstable in high-dimensional space
- sensitive to magnitude differences
- rejected
Cosine Similarity
- better than Euclidean
- measures angle between vectors
- still inconsistent in some cases
Pearson Correlation
- measures linear relationship
- more stable across combined features
- selected as final approach
Final similarity:
similarity(A, B) = PearsonCorrelation(A, B)
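To make the differences between the three metrics concrete, the sketch below compares them on toy vectors. It is an illustration only, not the project's code; the function names and sample vectors are made up for the example. It relies on one well-known identity: Pearson correlation is the cosine similarity of the mean-centered vectors.

```python
import numpy as np


def euclidean(a: np.ndarray, b: np.ndarray) -> float:
    """Straight-line distance; grows with any difference in magnitude."""
    return float(np.linalg.norm(a - b))


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between the vectors; ignores overall scale."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def pearson(a: np.ndarray, b: np.ndarray) -> float:
    """Pearson correlation: cosine similarity after subtracting each mean."""
    return cosine(a - a.mean(), b - b.mean())


a = np.array([1.0, 2.0, 3.0, 4.0])
scaled = 10.0 * a   # same direction, larger magnitude
shifted = a + 5.0   # same pattern, constant offset added

print("euclidean(a, scaled): ", euclidean(a, scaled))   # large, despite same pattern
print("cosine(a, scaled):    ", cosine(a, scaled))      # unaffected by scaling
print("cosine(a, shifted):   ", cosine(a, shifted))     # changed by the offset
print("pearson(a, shifted):  ", pearson(a, shifted))    # unaffected by the offset
```

Euclidean distance reacts to scale, cosine similarity ignores scale but not a constant offset, and Pearson correlation ignores both.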
Recommendation Logic
The recommendation process:
1. User uploads an image
2. YOLOv8 predicts category
3. ResNet50 extracts features
4. Color features are extracted
5. Combined feature vector is created
6. Similarity is computed against dataset
7. Results are filtered by category
8. Top-N items are returned
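The ranking part of these steps can be sketched as follows, assuming the query's category and combined feature vector have already been produced by the earlier stages. The names `recommend`, `features`, `categories`, and `item_ids` are hypothetical. The sketch applies the category filter before scoring rather than after, which yields the same ranked list with less computation.

```python
import numpy as np


def recommend(query_vec, query_category, features, categories, item_ids, top_n=5):
    """Rank same-category items by Pearson correlation with the query vector.

    features:   (n_items, dim) array of stored feature vectors
    categories: length n_items sequence of category labels
    item_ids:   length n_items sequence of item identifiers
    """
    categories = np.asarray(categories)

    # Keep only items whose label matches the predicted category.
    idx = np.flatnonzero(categories == query_category)
    if idx.size == 0:
        return []

    # Pearson correlation = cosine similarity of mean-centered vectors.
    q = np.asarray(query_vec, dtype=np.float64)
    q = q - q.mean()
    c = np.asarray(features, dtype=np.float64)[idx]
    c = c - c.mean(axis=1, keepdims=True)

    denom = np.linalg.norm(c, axis=1) * np.linalg.norm(q)
    # Constant vectors have zero variance; give them a score of 0.
    scores = (c @ q) / np.where(denom == 0, np.inf, denom)

    # Highest correlation first, truncated to the top N.
    order = np.argsort(-scores)[:top_n]
    return [(item_ids[idx[i]], float(scores[i])) for i in order]
```

Each result is an `(item_id, score)` pair, so the caller can decide whether to apply a minimum score before showing items to the user.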
System Design Decisions
1. Classification before similarity
Improves semantic correctness
2. Hybrid feature representation
Combining deep features + handcrafted features improves ranking
3. Pearson over cosine
Better stability for this dataset
4. No database, file-based storage
Features stored as serialized vectors (pickle)
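A file-based store of this kind can be as small as the sketch below. The dictionary layout, the function names `save_index` and `load_index`, and the `float32` dtype are assumptions for illustration; the article only states that vectors were serialized with pickle.

```python
import pickle

import numpy as np


def save_index(path, item_ids, categories, features):
    """Write ids, labels, and the feature matrix to a single pickle file."""
    payload = {
        "item_ids": list(item_ids),
        "categories": list(categories),
        # float32 halves the file size compared with float64.
        "features": np.asarray(features, dtype=np.float32),
    }
    with open(path, "wb") as fh:
        pickle.dump(payload, fh, protocol=pickle.HIGHEST_PROTOCOL)


def load_index(path):
    """Load the index back into memory.

    Only unpickle files you created yourself: loading a pickle can
    execute arbitrary code.
    """
    with open(path, "rb") as fh:
        return pickle.load(fh)
```

Loading the whole index into memory keeps the system simple, at the cost of the scalability and incremental updates a database or vector index would provide.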
Performance
- YOLOv8 Top-1 Accuracy: ~0.87
- Dataset size: ~54k images
- High recommendation relevance in controlled tests
Challenges
1. Noisy images
Images contain:
- models
- backgrounds
- irrelevant objects
This affects both classification and feature extraction.
2. Feature alignment problem
Different feature types (CNN + color) behave differently.
Combining them required:
- normalization
- careful weighting
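One common way to handle this is to L2-normalize each feature block separately and then scale it by a weight before concatenating, so that the much longer CNN block does not dominate by sheer size. The sketch below shows that idea; the weights (1.0, 0.5, 0.5) and the function names are hypothetical and are not the values used in the project.

```python
import numpy as np


def l2_normalize(v: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale a vector to unit length (eps guards against division by zero)."""
    return v / max(np.linalg.norm(v), eps)


def combine_features(cnn_vec, color_mean, color_hist,
                     w_cnn=1.0, w_mean=0.5, w_hist=0.5):
    """Normalize each block, apply its weight, and concatenate.

    The weights are illustrative placeholders that would need tuning.
    """
    blocks = [(cnn_vec, w_cnn), (color_mean, w_mean), (color_hist, w_hist)]
    return np.concatenate(
        [w * l2_normalize(np.asarray(b, dtype=np.float64)) for b, w in blocks]
    )
```

After this step each block contributes a norm equal to its weight, which makes the relative influence of CNN and color features an explicit, tunable choice.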
3. Similarity is not enough
Pure similarity systems fail without constraints.
This was the biggest realization.
Key Insight
The biggest takeaway from this project:
Recommendation systems are not model problems.
They are system design problems.
Conclusion
This project demonstrates that building a real-world recommendation system requires more than applying a single model.
It requires:
- structured pipelines
- multiple representations
- careful decision-making across components
Even for a seemingly simple task like "find similar items", complexity emerges quickly.
The final system is not just a model; it is a combination of decisions.
Notes
This project was developed as part of a team effort.
Collaborator:
- Güneşsu Açık
Advisor:
- Prof. Dr. Banu DİRİ
This article focuses on my approach to system design and the decisions made during development.