---
name: recommendation-systems
description: Design and build recommendation systems. Use when adding "you might also like", personalization, content ranking, or collaborative filtering to a product.
---

# Recommendation Systems Skill

Use when: building personalized recommendations, content ranking, "similar items", or any system that suggests things to users.

## Choose the Right Approach First

| Approach | When to use | Handles cold start? | Needs user data? |
|----------|-------------|---------------------|------------------|
| Popularity/Trending | New product, no data | ✅ Yes | No |
| Content-Based | Rich item metadata | ✅ Yes | Minimal |
| Collaborative Filtering | Established user base | ❌ No | Yes (lots) |
| Hybrid | Combining the above; usually the best results | Partial | Yes |

## Approach 1: Popularity-Based (Start Here)
Good for: new products, anonymous users, trending sections.
```sql
-- Top 20 items by interaction count over the last 7 days
SELECT item_id, COUNT(*) AS interactions
FROM user_interactions
WHERE created_at > NOW() - INTERVAL '7 days'
GROUP BY item_id
ORDER BY interactions DESC
LIMIT 20;
```
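
The query above counts every interaction in the 7-day window equally, so an item that was popular six days ago can outrank one that is taking off right now. As a minimal sketch of one alternative, the ranking below weights each interaction by an exponential time decay. The function name `trending_items`, the `(item_id, created_at)` input shape, and the 24-hour half-life are assumptions chosen for illustration, not part of the schema or query in this document.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

def trending_items(interactions, now, half_life_hours=24.0, n=20):
    """Rank items by time-decayed interaction counts.

    interactions: iterable of (item_id, created_at) pairs (hypothetical shape).
    Each interaction contributes 0.5 ** (age_hours / half_life_hours).
    """
    scores = defaultdict(float)
    for item_id, created_at in interactions:
        age_hours = (now - created_at).total_seconds() / 3600.0
        if age_hours < 0:
            continue  # ignore events stamped in the future
        scores[item_id] += 0.5 ** (age_hours / half_life_hours)
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [item_id for item_id, _ in ranked[:n]]

now = datetime(2024, 1, 8, tzinfo=timezone.utc)
events = [
    ("a", now - timedelta(hours=72)),  # three old interactions
    ("a", now - timedelta(hours=72)),
    ("a", now - timedelta(hours=72)),
    ("b", now - timedelta(hours=1)),   # one recent interaction
]
print(trending_items(events, now))  # ['b', 'a']
```

With a 24-hour half-life the single recent interaction on `b` outweighs three interactions from three days ago on `a`. A very long half-life approaches the plain count and puts `a` first again.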

## Approach 2: Content-Based Filtering
Match items by their attributes. Good when you have item metadata.

Key steps:
1. Vectorize item features (category, tags, text embeddings)
2. Build a user profile from their interaction history
3. Score unvisited items by cosine similarity to the user profile

```python
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np

def get_recommendations(user_profile, item_vectors, item_ids, n=10):
    # Pass only unvisited items in item_vectors / item_ids;
    # this function does not filter out items the user has already seen.
    # Cosine similarity between the user profile and every item vector
    scores = cosine_similarity([user_profile], item_vectors)[0]
    # Indices of the n highest-scoring items, best first
    top_indices = np.argsort(scores)[::-1][:n]
    return [item_ids[i] for i in top_indices]
```
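
Steps 2 and 3 can be sketched with NumPy alone. This is one simple way to do it, assuming the user profile is the plain mean of the vectors of items the user interacted with; weighting by recency or event type is a common refinement not shown here. The helper names `build_user_profile` and `rank_unseen` and the toy catalogue are hypothetical.

```python
import numpy as np

def build_user_profile(item_vectors, item_ids, interacted_ids):
    """Average the vectors of the items a user interacted with."""
    index = {item_id: i for i, item_id in enumerate(item_ids)}
    rows = [index[i] for i in interacted_ids if i in index]
    if not rows:
        return None  # cold-start user: fall back to popularity
    return item_vectors[rows].mean(axis=0)

def rank_unseen(user_profile, item_vectors, item_ids, seen_ids, n=10):
    """Cosine-score every item, then drop the ones the user has already seen."""
    norms = np.linalg.norm(item_vectors, axis=1) * np.linalg.norm(user_profile)
    # Guard against zero vectors so the division never produces NaN
    scores = item_vectors @ user_profile / np.where(norms == 0, 1.0, norms)
    order = np.argsort(scores)[::-1]
    seen = set(seen_ids)
    return [item_ids[i] for i in order if item_ids[i] not in seen][:n]

# Toy catalogue: columns are [sci-fi, romance, documentary]
item_ids = ["alien", "dune", "notebook", "planet_earth"]
item_vectors = np.array([
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.2, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])
profile = build_user_profile(item_vectors, item_ids, ["alien"])
print(rank_unseen(profile, item_vectors, item_ids, seen_ids=["alien"], n=2))
# ['dune', 'notebook']
```

Returning `None` for a user with no history makes the cold-start case explicit, so the caller can switch to the popularity baseline instead of scoring against an empty profile.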

## Approach 3: Collaborative Filtering
"Users like you also liked...": needs significant interaction data (roughly 1,000+ users).

**Matrix Factorization (SVD):**
```python
from surprise import SVD, Dataset, Reader
from surprise.model_selection import cross_validate

# df: a pandas DataFrame with user_id, item_id, and rating columns
reader = Reader(rating_scale=(1, 5))
data = Dataset.load_from_df(df[['user_id', 'item_id', 'rating']], reader)
model = SVD(n_factors=50, n_epochs=20)

# Check held-out error before training on the full dataset
cross_validate(model, data, measures=['RMSE', 'MAE'], cv=5, verbose=True)

model.fit(data.build_full_trainset())
# prediction.est holds the estimated rating for this (user, item) pair
prediction = model.predict(user_id, item_id)
```

**Simpler: item-item similarity from co-occurrence:**
```sql
-- Items frequently viewed together, counted once per session
SELECT a.item_id AS item_a, b.item_id AS item_b, COUNT(DISTINCT a.session_id) AS co_views
FROM user_interactions a
JOIN user_interactions b ON a.session_id = b.session_id AND a.item_id != b.item_id
GROUP BY a.item_id, b.item_id
HAVING COUNT(DISTINCT a.session_id) > 10
ORDER BY co_views DESC;
```
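
The same co-occurrence idea can be sketched in plain Python, which is handy for unit tests or small catalogues. This version counts each session at most once per item pair and keeps both orderings of a pair so lookups by either item work. The function names, the `(session_id, item_id)` input shape, and the `min_count` parameter are assumptions for illustration.

```python
from collections import Counter, defaultdict
from itertools import permutations

def co_occurrence(events, min_count=1):
    """Count, per ordered item pair, the sessions in which both items appear.

    events: iterable of (session_id, item_id) pairs (hypothetical shape).
    """
    sessions = defaultdict(set)
    for session_id, item_id in events:
        sessions[session_id].add(item_id)  # a set ignores repeat views
    counts = Counter()
    for items in sessions.values():
        counts.update(permutations(sorted(items), 2))
    return {pair: c for pair, c in counts.items() if c >= min_count}

def similar_items(counts, item_id, n=5):
    """Items most often seen in the same session as item_id."""
    neighbours = [(b, c) for (a, b), c in counts.items() if a == item_id]
    neighbours.sort(key=lambda bc: (-bc[1], bc[0]))  # count desc, then name
    return [b for b, _ in neighbours[:n]]

events = [
    ("s1", "tent"), ("s1", "stove"), ("s1", "tent"),
    ("s2", "tent"), ("s2", "stove"), ("s2", "lamp"),
    ("s3", "tent"), ("s3", "lamp"),
    ("s4", "tent"), ("s4", "stove"),
]
counts = co_occurrence(events)
print(similar_items(counts, "tent"))  # ['stove', 'lamp']
```

Raw counts favour globally popular items. Normalizing by each item's own frequency is a common next step and is left out of this sketch.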

## Approach 4: Embedding-Based (Modern, Scalable)
Store item embeddings in a vector index such as pgvector or Pinecone and retrieve semantically similar items by nearest-neighbor search.

```sql
-- pgvector: find the 10 items nearest to the query embedding $1
-- (<-> is Euclidean distance; <=> is cosine distance)
SELECT id, title, embedding <-> $1::vector AS distance
FROM items
ORDER BY embedding <-> $1::vector
LIMIT 10;
```
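
For checking results on a small catalogue, the same lookup can be reproduced as a brute-force search in NumPy. The sketch below uses Euclidean distance, which is what pgvector's `<->` operator computes. The function name `nearest_items` and the toy data are hypothetical, and a real deployment would rely on the database's index rather than scanning every row.

```python
import numpy as np

def nearest_items(query, embeddings, ids, n=10):
    """Brute-force nearest neighbours by Euclidean (L2) distance."""
    distances = np.linalg.norm(embeddings - query, axis=1)
    order = np.argsort(distances)[:n]  # smallest distance first
    return [(ids[i], float(distances[i])) for i in order]

ids = ["a", "b", "c"]
embeddings = np.array([[0.0, 0.0], [3.0, 4.0], [1.0, 0.0]])
print(nearest_items(np.array([0.0, 0.0]), embeddings, ids, n=2))
# [('a', 0.0), ('c', 1.0)]
```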

## System Design Checklist
- [ ] **Logging first**: instrument all interactions (views, clicks, purchases, skips) BEFORE building recommendations
- [ ] **Baseline first**: ship popularity-based before ML models
- [ ] **A/B test**: measure CTR, conversion, and engagement against a control group
- [ ] **Filter seen items**: don't recommend what the user already has or has viewed
- [ ] **Diversity**: avoid echo chambers; add exploration (epsilon-greedy or MMR)
- [ ] **Freshness**: decay old interactions, boost new content
- [ ] **Cold start plan**: what do new users/items see before data exists?
- [ ] **Explainability**: "Because you watched X" improves trust
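
The Diversity item mentions MMR (Maximal Marginal Relevance). As a minimal sketch, assuming you already have a relevance score per candidate and a pairwise similarity matrix, a greedy MMR re-ranker can look like this. The function name `mmr`, the parameter name `lambda_`, and the toy numbers are illustrative assumptions.

```python
import numpy as np

def mmr(relevance, similarity, k=10, lambda_=0.7):
    """Greedy Maximal Marginal Relevance re-ranking.

    relevance: shape (n,), score of each candidate for the user.
    similarity: shape (n, n), pairwise item similarity in [0, 1].
    lambda_: 1.0 means pure relevance, 0.0 means pure diversity.
    Returns the selected candidate indices in pick order.
    """
    remaining = list(range(len(relevance)))
    selected = []
    while remaining and len(selected) < k:
        def mmr_score(i):
            # Penalize similarity to the closest item already selected
            redundancy = max((similarity[i][j] for j in selected), default=0.0)
            return lambda_ * relevance[i] - (1.0 - lambda_) * redundancy
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected

relevance = np.array([0.9, 0.85, 0.5])
similarity = np.array([
    [1.0, 0.95, 0.1],
    [0.95, 1.0, 0.1],
    [0.1, 0.1, 1.0],
])
print(mmr(relevance, similarity, k=2, lambda_=0.5))  # [0, 2]
```

Candidates 0 and 1 are near-duplicates, so after picking 0 the re-ranker prefers the less relevant but different candidate 2. With `lambda_=1.0` it falls back to plain relevance order.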

## Postgres Schema for Interaction Logging
```sql
CREATE TABLE user_interactions (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  user_id UUID REFERENCES users(id),
  item_id UUID REFERENCES items(id),
  event_type TEXT NOT NULL CHECK (event_type IN ('view','click','purchase','skip','rate')),
  rating NUMERIC(3,2), -- NULL unless event_type = 'rate'
  session_id UUID,
  context JSONB DEFAULT '{}', -- page, position, query, etc.
  created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE INDEX idx_interactions_user_id ON user_interactions(user_id);
CREATE INDEX idx_interactions_item_id ON user_interactions(item_id);
CREATE INDEX idx_interactions_created_at ON user_interactions(created_at DESC);
-- Supports the session-based co-occurrence join
CREATE INDEX idx_interactions_session_id ON user_interactions(session_id);
```