chore: add skills library

2026-05-29 20:13:20 -04:00 · 2026-05-29 20:13:20 -04:00 · 3e9c9a6672
commit 3e9c9a6672
parent cf358053f0
1 changed file with 111 additions and 0 deletions
--- a/claude-skills/recommendation-systems.md
+++ b/claude-skills/recommendation-systems.md
@@ -0,0 +1,111 @@
+---
+name: recommendation-systems
+description: Design and build recommendation systems. Use when adding "you might also like", personalization, content ranking, or collaborative filtering to a product.
+---
+
+# Recommendation Systems Skill
+
+Use when: building personalized recommendations, content ranking, "similar items", or any system that suggests things to users.
+
+## Choose the Right Approach First
+
+| Approach | When to use | Cold start? | Needs user data? |
+|----------|------------|-------------|-----------------|
+| Popularity/Trending | New product, no data | ✅ Works | No |
+| Content-Based | Rich item metadata | ✅ Works | Minimal |
+| Collaborative Filtering | Established user base | ❌ Problem | Yes (lots) |
+| Hybrid | Best results | Partial | Yes |
+
+## Approach 1: Popularity-Based (Start Here)
+Good for: new products, anonymous users, trending sections.
+```sql
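+-- Skips are logged in the same table, so count only positive signals
+SELECT item_id, COUNT(*) AS interactions
+FROM user_interactions
+WHERE created_at > NOW() - INTERVAL '7 days'
+  AND event_type IN ('view', 'click', 'purchase')
+GROUP BY item_id
+ORDER BY interactions DESC
+LIMIT 20;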
+```
+
+## Approach 2: Content-Based Filtering
+Match items by their attributes. Good when you have item metadata.
+
+Key steps:
+1. Vectorize item features (category, tags, text embeddings)
+2. Build a user profile from the user's interaction history
+3. Score unvisited items by cosine similarity to the user profile
+
+```python
+from sklearn.metrics.pairwise import cosine_similarity
+import numpy as np
+
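+def get_recommendations(user_profile, item_vectors, item_ids, n=10):
+    """Return ids of the n items most similar to user_profile.
+    item_vectors and item_ids are row-aligned; pass only unvisited items."""
+    scores = cosine_similarity([user_profile], item_vectors)[0]
+    top_indices = np.argsort(scores)[::-1][:n]  # highest similarity first
+    return [item_ids[i] for i in top_indices]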
+```
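+
+The three steps above can be sketched end to end with NumPy alone. This is a minimal illustration, not a production pipeline: the helper names (`vectorize_items`, `build_user_profile`, `score_unseen`), the tag vocabulary, and the choice of a plain mean for the profile are all assumptions made for the example.
+
+```python
+import numpy as np
+
+def vectorize_items(item_tags, vocabulary):
+    """Step 1: multi-hot encode each item's tags against a fixed vocabulary."""
+    index = {tag: i for i, tag in enumerate(vocabulary)}
+    vectors = np.zeros((len(item_tags), len(vocabulary)))
+    for row, tags in enumerate(item_tags):
+        for tag in tags:
+            vectors[row, index[tag]] = 1.0
+    return vectors
+
+def build_user_profile(item_vectors, interacted_rows):
+    """Step 2: average the vectors of the items the user interacted with."""
+    return item_vectors[interacted_rows].mean(axis=0)
+
+def score_unseen(user_profile, item_vectors, seen_rows):
+    """Step 3: cosine similarity to the profile, with seen items masked out."""
+    norms = np.linalg.norm(item_vectors, axis=1) * np.linalg.norm(user_profile)
+    scores = item_vectors @ user_profile / np.where(norms == 0, 1.0, norms)
+    scores[list(seen_rows)] = -np.inf  # never recommend what was already seen
+    return scores
+
+# Hypothetical catalogue: four items described by tags.
+vocabulary = ["sci-fi", "drama", "comedy"]
+item_tags = [["sci-fi"], ["sci-fi", "drama"], ["comedy"], ["drama"]]
+
+vectors = vectorize_items(item_tags, vocabulary)
+profile = build_user_profile(vectors, [0])        # user interacted with item 0
+scores = score_unseen(profile, vectors, seen_rows=[0])
+best = int(np.argmax(scores))                     # row index of the top unseen item
+```
+
+Masking seen rows inside the scoring step keeps the "filter seen items" rule close to the ranking logic, so it is harder to forget.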
+
+## Approach 3: Collaborative Filtering
+"Users like you also liked..." — needs significant interaction data (1000+ users).
+
+**Matrix Factorization (SVD):**
+```python
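+import pandas as pd
+from surprise import SVD, Dataset, Reader
+
+# Illustrative ratings; in practice load rows where event_type = 'rate'.
+df = pd.DataFrame({
+    'user_id': ['u1', 'u1', 'u2', 'u2', 'u3'],
+    'item_id': ['a', 'b', 'a', 'c', 'b'],
+    'rating':  [5, 3, 4, 2, 5],
+})
+
+reader = Reader(rating_scale=(1, 5))
+data = Dataset.load_from_df(df[['user_id', 'item_id', 'rating']], reader)
+model = SVD(n_factors=50, n_epochs=20)
+model.fit(data.build_full_trainset())
+# predict() returns a Prediction; its .est field is the estimated rating.
+prediction = model.predict('u3', 'a')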
+```
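+
+To make the idea behind matrix factorization concrete, here is a simplified sketch that learns user and item factors by stochastic gradient descent. It is an assumption-laden illustration, not the `surprise` implementation: it omits per-user and per-item bias terms, and the function names, integer ids, and toy ratings are invented for the example.
+
+```python
+import numpy as np
+
+def factorize(ratings, n_users, n_items, n_factors=2, n_epochs=300,
+              lr=0.02, reg=0.02, seed=0):
+    """Learn factor matrices P (users) and Q (items) from (user, item, rating) triples."""
+    rng = np.random.default_rng(seed)
+    P = rng.normal(scale=0.1, size=(n_users, n_factors))
+    Q = rng.normal(scale=0.1, size=(n_items, n_factors))
+    mean = float(np.mean([r for _, _, r in ratings]))  # global average rating
+    for _ in range(n_epochs):
+        for u, i, r in ratings:
+            err = r - (mean + P[u] @ Q[i])   # prediction error for this rating
+            p_u = P[u].copy()                # keep the old value for the Q update
+            P[u] += lr * (err * Q[i] - reg * P[u])
+            Q[i] += lr * (err * p_u - reg * Q[i])
+    return mean, P, Q
+
+def rmse(ratings, mean, P, Q):
+    """Root-mean-square error of the model on the given ratings."""
+    errors = [r - (mean + P[u] @ Q[i]) for u, i, r in ratings]
+    return float(np.sqrt(np.mean(np.square(errors))))
+
+# Hypothetical ratings with integer user and item ids.
+ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0),
+           (1, 2, 1.0), (2, 1, 4.0), (2, 2, 2.0)]
+
+untrained = factorize(ratings, n_users=3, n_items=3, n_epochs=0)
+trained = factorize(ratings, n_users=3, n_items=3)
+error_before = rmse(ratings, *untrained)
+error_after = rmse(ratings, *trained)
+```
+
+Comparing `error_before` with `error_after` shows the training error on the observed ratings; on real data, evaluate on held-out ratings instead.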
+
+**Simpler: item-item similarity from co-occurrence:**
+```sql
+-- Items frequently viewed together (each session counts once per pair)
+SELECT a.item_id AS item_a, b.item_id AS item_b, COUNT(DISTINCT a.session_id) AS co_views
+FROM user_interactions a
+JOIN user_interactions b ON a.session_id = b.session_id AND a.item_id != b.item_id
+WHERE a.event_type = 'view' AND b.event_type = 'view'
+GROUP BY a.item_id, b.item_id
+HAVING COUNT(DISTINCT a.session_id) > 10
+ORDER BY co_views DESC;
+```
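+
+The same co-occurrence idea can be prototyped in Python before committing to a query. This is a sketch under assumptions: `session_items` is a hypothetical mapping from session id to the item ids viewed in that session, and the function names are illustrative.
+
+```python
+from collections import Counter, defaultdict
+from itertools import permutations
+
+def co_view_counts(session_items):
+    """Count, for every ordered item pair, the sessions in which both were viewed."""
+    counts = defaultdict(Counter)
+    for items in session_items.values():
+        # set() makes repeat views within one session count only once
+        for a, b in permutations(set(items), 2):
+            counts[a][b] += 1
+    return counts
+
+def similar_items(counts, item_id, n=5, min_co_views=1):
+    """Return up to n items most often co-viewed with item_id."""
+    ranked = [(other, c) for other, c in counts.get(item_id, Counter()).most_common()
+              if c >= min_co_views]
+    return [other for other, _ in ranked[:n]]
+
+# Hypothetical sessions.
+session_items = {
+    "s1": ["a", "b", "c"],
+    "s2": ["a", "b"],
+    "s3": ["a", "b", "a"],
+}
+counts = co_view_counts(session_items)
+top_for_a = similar_items(counts, "a", n=1)
+```
+
+Raising `min_co_views` plays the same role as the `HAVING` threshold: it drops pairs supported by too few sessions to be trusted.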
+
+## Approach 4: Embedding-Based (Modern, Scalable)
+Use vector embeddings + pgvector or Pinecone for semantic similarity.
+
+```sql
+-- pgvector: find similar items (<-> is L2 distance; <=> is cosine distance)
+SELECT id, title, embedding <-> $1::vector AS distance
+FROM items
+ORDER BY embedding <-> $1::vector
+LIMIT 10;
+```
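+
+For intuition, the query above amounts to a nearest-neighbour search by L2 distance. A brute-force NumPy equivalent, useful for small catalogues or for checking results, might look like the sketch below; the function name and sample embeddings are assumptions for illustration.
+
+```python
+import numpy as np
+
+def nearest_items(query, embeddings, item_ids, k=10):
+    """Return the k (item_id, distance) pairs closest to query by L2 distance."""
+    distances = np.linalg.norm(embeddings - query, axis=1)
+    order = np.argsort(distances)[:k]  # smallest distance first
+    return [(item_ids[i], float(distances[i])) for i in order]
+
+# Hypothetical 2-dimensional embeddings.
+embeddings = np.array([[0.0, 0.0], [3.0, 4.0], [1.0, 0.0]])
+item_ids = ["a", "b", "c"]
+neighbours = nearest_items(np.array([0.0, 0.0]), embeddings, item_ids, k=2)
+```
+
+Brute force scans every row, so a vector index becomes worthwhile as the catalogue grows.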
+
+## System Design Checklist
+- [ ] **Logging first** — instrument all interactions (views, clicks, purchases, skips) BEFORE building recommendations
+- [ ] **Baseline first** — ship popularity-based before ML models
+- [ ] **A/B test** — measure CTR, conversion, engagement vs control
+- [ ] **Filter seen items** — don't recommend what the user already has or has viewed
+- [ ] **Diversity** — avoid echo chambers; add exploration (epsilon-greedy or MMR)
+- [ ] **Freshness** — decay old interactions, boost new content
+- [ ] **Cold start plan** — what do new users/items see before data exists?
+- [ ] **Explainability** — "Because you watched X" improves trust
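+
+For the diversity item, a minimal sketch of MMR (maximal marginal relevance) re-ranking is shown here. It assumes relevance scores and a pairwise similarity matrix are already available; the function name, the `lambda_` parameter name, and the sample numbers are illustrative.
+
+```python
+def mmr_rerank(relevance, similarity, k, lambda_=0.7):
+    """Pick k item indices, trading relevance against similarity to earlier picks."""
+    selected = []
+    candidates = list(range(len(relevance)))
+    while candidates and len(selected) < k:
+        def mmr_score(i):
+            # Penalize items that resemble something already selected.
+            max_sim = max((similarity[i][j] for j in selected), default=0.0)
+            return lambda_ * relevance[i] - (1 - lambda_) * max_sim
+        best = max(candidates, key=mmr_score)
+        selected.append(best)
+        candidates.remove(best)
+    return selected
+
+# Hypothetical scores: items 0 and 1 are near-duplicates, item 2 is different.
+relevance = [0.9, 0.85, 0.5]
+similarity = [
+    [1.0, 0.95, 0.1],
+    [0.95, 1.0, 0.1],
+    [0.1, 0.1, 1.0],
+]
+diverse_order = mmr_rerank(relevance, similarity, k=3, lambda_=0.5)
+relevance_only = mmr_rerank(relevance, similarity, k=3, lambda_=1.0)
+```
+
+With `lambda_=1.0` the ranking falls back to pure relevance; lower values push near-duplicates down the list.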
+
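+For the freshness item, one common option is to weight each interaction by an exponential decay on its age. The sketch below assumes a seven-day half-life and `(item_id, timestamp)` event tuples; both are illustrative choices, not recommendations.
+
+```python
+from datetime import datetime, timedelta, timezone
+
+def decayed_weight(event_time, now, half_life_days=7.0):
+    """Weight that halves every half_life_days of age."""
+    age_days = (now - event_time).total_seconds() / 86400
+    return 0.5 ** (age_days / half_life_days)
+
+def decayed_popularity(events, now, half_life_days=7.0):
+    """Sum decayed weights per item from (item_id, event_time) tuples."""
+    scores = {}
+    for item_id, event_time in events:
+        weight = decayed_weight(event_time, now, half_life_days)
+        scores[item_id] = scores.get(item_id, 0.0) + weight
+    return scores
+
+# Hypothetical events: three old views of one item, one fresh view of another.
+now = datetime(2026, 1, 15, tzinfo=timezone.utc)
+events = [
+    ("old", now - timedelta(days=14)),
+    ("old", now - timedelta(days=14)),
+    ("old", now - timedelta(days=14)),
+    ("new", now),
+]
+scores = decayed_popularity(events, now)
+```
+
+Here a single fresh view outweighs three views from two half-lives ago, which is the behaviour the checklist item asks for.
+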
+## Postgres Schema for Interaction Logging
+```sql
+CREATE TABLE user_interactions (
+    id          UUID PRIMARY KEY DEFAULT gen_random_uuid(),
+    user_id     UUID REFERENCES users(id),
+    item_id     UUID REFERENCES items(id),
+    event_type  TEXT NOT NULL CHECK (event_type IN ('view','click','purchase','skip','rate')),
+    rating      NUMERIC(3,2),  -- NULL unless event_type = 'rate'
+    session_id  UUID,
+    context     JSONB DEFAULT '{}',  -- page, position, query, etc.
+    created_at  TIMESTAMPTZ NOT NULL DEFAULT NOW()
+);
+
+CREATE INDEX idx_interactions_user_id ON user_interactions(user_id);
+CREATE INDEX idx_interactions_item_id ON user_interactions(item_id);
+CREATE INDEX idx_interactions_created_at ON user_interactions(created_at DESC);
+```