diff --git a/claude-skills/recommendation-systems.md b/claude-skills/recommendation-systems.md
new file mode 100644
index 0000000..8ee425d
--- /dev/null
+++ b/claude-skills/recommendation-systems.md
@@ -0,0 +1,111 @@
+---
+name: recommendation-systems
+description: Design and build recommendation systems. Use when adding "you might also like", personalization, content ranking, or collaborative filtering to a product.
+---
+
+# Recommendation Systems Skill
+
+Use when: building personalized recommendations, content ranking, "similar items", or any system that suggests things to users.
+
+## Choose the Right Approach First
+
+| Approach | When to use | Cold start? | Needs user data? |
+|----------|------------|-------------|-----------------|
+| Popularity/Trending | New product, no data | ✅ Works | No |
+| Content-Based | Rich item metadata | ✅ Works | Minimal |
+| Collaborative Filtering | Established user base | ❌ Problem | Yes (lots) |
+| Hybrid | Best results | Partial | Yes |
+
+## Approach 1: Popularity-Based (Start Here)
+Good for: new products, anonymous users, trending sections.
+```sql
+SELECT item_id, COUNT(*) as interactions
+FROM user_interactions
+WHERE created_at > NOW() - INTERVAL '7 days'
+GROUP BY item_id
+ORDER BY interactions DESC
+LIMIT 20;
+```
+
+## Approach 2: Content-Based Filtering
+Match items by their attributes. Good when you have item metadata.
+
+Key steps:
+1. Vectorize item features (category, tags, text embeddings)
+2. Build user profile from their interaction history
+3. Score unvisited items by cosine similarity to user profile
+
+```python
+from sklearn.metrics.pairwise import cosine_similarity
+import numpy as np
+
+def get_recommendations(user_profile, item_vectors, item_ids, n=10):
+    scores = cosine_similarity([user_profile], item_vectors)[0]
+    top_indices = np.argsort(scores)[::-1][:n]
+    return [item_ids[i] for i in top_indices]
+```
+
+## Approach 3: Collaborative Filtering
+"Users like you also liked..." This approach needs significant interaction data (1000+ users).
+
+**Matrix Factorization (SVD):**
+```python
+from surprise import SVD, Dataset, Reader
+from surprise.model_selection import cross_validate
+
+reader = Reader(rating_scale=(1, 5))
+data = Dataset.load_from_df(df[['user_id', 'item_id', 'rating']], reader)
+model = SVD(n_factors=50, n_epochs=20)
+model.fit(data.build_full_trainset())
+prediction = model.predict(user_id, item_id)  # prediction.est holds the estimated rating
+```
+
+**Simpler: item-item similarity from co-occurrence:**
+```sql
+-- Items frequently viewed together (each pair is returned in both directions)
+SELECT a.item_id as item_a, b.item_id as item_b, COUNT(*) as co_views
+FROM user_interactions a
+JOIN user_interactions b ON a.session_id = b.session_id AND a.item_id != b.item_id
+GROUP BY a.item_id, b.item_id
+HAVING COUNT(*) > 10
+ORDER BY co_views DESC;
+```
+
+## Approach 4: Embedding-Based (Modern, Scalable)
+Use vector embeddings + pgvector or Pinecone for semantic similarity.
+
+```sql
+-- pgvector: find similar items (<-> is L2 distance; <=> is cosine distance)
+SELECT id, title, embedding <-> $1::vector AS distance
+FROM items
+ORDER BY embedding <-> $1::vector
+LIMIT 10;
+```
+
+## System Design Checklist
+- [ ] **Logging first**: instrument all interactions (views, clicks, purchases, skips) BEFORE building recommendations
+- [ ] **Baseline first**: ship popularity-based before ML models
+- [ ] **A/B test**: measure CTR, conversion, engagement vs control
+- [ ] **Filter seen items**: don't recommend what the user already has or has viewed
+- [ ] **Diversity**: avoid echo chambers; add exploration (epsilon-greedy or MMR)
+- [ ] **Freshness**: decay old interactions, boost new content
+- [ ] **Cold start plan**: what do new users/items see before data exists?
+- [ ] **Explainability**: "Because you watched X" improves trust
+
+## Postgres Schema for Interaction Logging
+```sql
+CREATE TABLE user_interactions (
+    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
+    user_id UUID REFERENCES users(id),
+    item_id UUID REFERENCES items(id),
+    event_type TEXT NOT NULL CHECK (event_type IN ('view','click','purchase','skip','rate')),
+    rating NUMERIC(3,2), -- NULL unless event_type = 'rate'
+    session_id UUID,
+    context JSONB DEFAULT '{}', -- page, position, query, etc.
+    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
+);
+
+CREATE INDEX idx_interactions_user_id ON user_interactions(user_id);
+CREATE INDEX idx_interactions_item_id ON user_interactions(item_id);
+CREATE INDEX idx_interactions_created_at ON user_interactions(created_at DESC);
+```
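
The checklist's Diversity item names MMR (maximal marginal relevance) as one way to avoid near-duplicate recommendations. Below is a minimal sketch of greedy MMR re-ranking, assuming you already have a relevance score and a feature vector for each candidate. The names `cosine`, `mmr_rerank`, and `lambda_` are illustrative and not taken from any library, and the default of 0.7 is an arbitrary starting point rather than a recommended value.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mmr_rerank(relevance, item_vectors, n=10, lambda_=0.7):
    """Greedy MMR: return candidate indices in the order they are selected.

    relevance    -- one relevance score per candidate
    item_vectors -- one feature vector per candidate, in the same order
    lambda_      -- 1.0 ranks by relevance only; lower values penalize
                    similarity to items that were already selected
    """
    selected = []
    remaining = list(range(len(relevance)))
    while remaining and len(selected) < n:

        def mmr_score(i):
            # Redundancy is the highest similarity to anything already picked.
            redundancy = max(
                (cosine(item_vectors[i], item_vectors[j]) for j in selected),
                default=0.0,
            )
            return lambda_ * relevance[i] - (1 - lambda_) * redundancy

        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Candidates 0 and 1 are identical; candidate 2 is less relevant but different.
order = mmr_rerank([0.9, 0.85, 0.3], [[1, 0], [1, 0], [0, 1]], n=3, lambda_=0.5)
print(order)
```

In this toy input the duplicate of the top item is pushed behind the less relevant but dissimilar one. In practice you would likely filter already-seen items out of the candidate list before re-ranking, and tune `lambda_` through the A/B tests the checklist calls for.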