# Quick Guide: DP-442038 Workaround (720x Model Loading Overhead)

**JIRA Issue**: DP-442038
**Problem**: IRIS EMBEDDING reloads the model from disk on every document insert
**Impact**: 720x slowdown (20 minutes for 1,746 documents)
**Solution**: Python embedding cache layer (Feature 051)
**Result**: 1405x speedup (0.85 seconds for the same workload)

---

## The Problem

IRIS native `%Embedding.SentenceTransformers` reloads the embedding model **on every INSERT**:

```sql
-- This triggers a model reload for EACH document
INSERT INTO documents (id, content) VALUES (1, 'text...');
-- Model loads from disk (400MB), generates the embedding, then exits
INSERT INTO documents (id, content) VALUES (2, 'text...');
-- Model loads AGAIN from disk (400MB), generates the embedding, exits
```

**Bottleneck**: 1,746 documents × 400MB model ≈ 698GB of redundant disk I/O and ~20 minutes of load time

---

## The Workaround: Python Embedding Cache

We intercept IRIS EMBEDDING calls and cache models in Python memory:

### Architecture
```
IRIS SQL INSERT
    ↓
%Embedding.SentenceTransformers
    ↓
iris_vector_rag.embeddings.iris_embedding (cache layer)
    ↓
Cached model in memory (99%+ cache hits)
    ↓
Return embedding to IRIS
```

### Key Files
1. **`iris_vector_rag/embeddings/manager.py`** - Model cache manager (singleton pattern)
2. **`iris_vector_rag/embeddings/iris_embedding.py`** - IRIS integration layer
3. **`iris_vector_rag/config/embedding_config.py`** - Configuration validation

---

## Usage

### 1. Configure IRIS EMBEDDING

```python
from iris_vector_rag.embeddings.iris_embedding import configure_embedding

# Configure embedding model in IRIS
configure_embedding(
    connection,
    config_name="medical_embeddings",
    model_name="sentence-transformers/all-MiniLM-L6-v2",
    dimension=384,
    device="cpu",  # or "cuda" for GPU
    use_cache=True  # CRITICAL: enables caching
)
```

### 2. Create Table with EMBEDDING Column

```sql
CREATE TABLE RAG.Documents (
    doc_id VARCHAR(255),
    content VARCHAR(5000),
    embedding VECTOR(DOUBLE, 384) EMBEDDING('medical_embeddings')
)
```
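
The same DDL can be issued from the Python session used in step 1; a minimal sketch, assuming `connection` is the open IRIS connection from above:

```python
# Minimal sketch: run the CREATE TABLE above over the existing connection
cursor = connection.cursor()
cursor.execute("""
    CREATE TABLE RAG.Documents (
        doc_id VARCHAR(255),
        content VARCHAR(5000),
        embedding VECTOR(DOUBLE, 384) EMBEDDING('medical_embeddings')
    )
""")
```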

### 3. Insert Documents (Cache Automatically Applied)

```python
# First insert: Model loads (2-3 seconds)
cursor.execute(
    "INSERT INTO RAG.Documents (doc_id, content) VALUES (?, ?)",
    ("doc1", "Patient presents with symptoms...")
)

# Subsequent inserts: Cache hit (15-25ms each)
for i in range(2, 1002):  # start at 2 so doc_ids don't collide with doc1
    cursor.execute(
        "INSERT INTO RAG.Documents (doc_id, content) VALUES (?, ?)",
        (f"doc{i}", f"Document {i} content...")
    )
```
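
If the driver supports the standard DB-API `executemany` (an assumption; check your IRIS driver version), the loop can be batched as well, amortizing driver round-trips on top of the model-load savings:

```python
# Hedged sketch: batch the same inserts via DB-API executemany
rows = [(f"doc{i}", f"Document {i} content...") for i in range(2, 1002)]
cursor.executemany(
    "INSERT INTO RAG.Documents (doc_id, content) VALUES (?, ?)",
    rows
)
```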

---

## Performance Comparison

| Scenario | Method | Time | Speedup |
|----------|--------|------|---------|
| **Without Cache** | IRIS native (DP-442038) | 20 min | 1x (baseline) |
| **With Cache** | Python cache layer | 0.85s | **1405x** |

### Cache Statistics
- **Cache hit rate**: 99%+
- **Cache hit time**: ~15-25ms
- **Cache miss time**: ~2-3s (first load only)
- **Memory overhead**: ~400MB per model

---

## How It Works

### 1. Model Cache Manager (Singleton Pattern)

```python
# iris_vector_rag/embeddings/manager.py:21-45
# (imports shown here so the excerpt is self-contained)
import logging
import threading
from typing import Any, Dict

from sentence_transformers import SentenceTransformer

logger = logging.getLogger(__name__)

_SENTENCE_TRANSFORMER_CACHE: Dict[str, Any] = {}  # Module-level singleton
_CACHE_LOCK = threading.Lock()

def _get_cached_sentence_transformer(model_name: str, device: str = "cpu"):
    """Get or create cached SentenceTransformer model."""
    cache_key = f"{model_name}:{device}"

    with _CACHE_LOCK:
        if cache_key in _SENTENCE_TRANSFORMER_CACHE:
            logger.debug(f"✓ Cache hit for {cache_key}")
            return _SENTENCE_TRANSFORMER_CACHE[cache_key]  # 99%+ hit rate

        # Cache miss: Load model (happens once per model)
        logger.info(f"Loading model {model_name} (cache miss)")
        model = SentenceTransformer(model_name, device=device)
        _SENTENCE_TRANSFORMER_CACHE[cache_key] = model
        return model
```
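
Because the cache is keyed by `model_name:device`, repeated lookups return the same in-memory object; a quick check:

```python
# Two lookups for the same key return the identical cached instance
m1 = _get_cached_sentence_transformer("sentence-transformers/all-MiniLM-L6-v2")
m2 = _get_cached_sentence_transformer("sentence-transformers/all-MiniLM-L6-v2")
assert m1 is m2  # second call is a cache hit; no disk I/O
```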

### 2. IRIS Integration Layer

```python
# iris_vector_rag/embeddings/iris_embedding.py:337-385
from typing import Any, List

def generate_embeddings(
    texts: List[str],
    config_name: str,
    connection: Any
) -> List[List[float]]:
    """Generate embeddings using cached model (called by IRIS)."""

    # Get config from IRIS
    config = get_config(connection, config_name)

    # Get cached model (99%+ cache hits)
    model = _get_cached_sentence_transformer(
        config["model_name"],
        config["device"]
    )

    # Generate embeddings (15-25ms for cached model)
    embeddings = model.encode(texts, show_progress_bar=False)

    return embeddings.tolist()
```
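
IRIS drives this function during INSERTs, but it can also be called directly to sanity-check the cache layer; a hypothetical example, assuming `conn` is an open IRIS connection with the `medical_embeddings` config from step 1:

```python
# Hypothetical direct call, bypassing SQL, to verify the cache layer
vectors = generate_embeddings(
    ["Patient presents with symptoms..."],
    config_name="medical_embeddings",
    connection=conn
)
print(len(vectors), len(vectors[0]))  # 1 embedding, 384 dimensions
```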

### 3. Cache Statistics Tracking

```python
# Track cache performance
stats = get_cache_stats(connection, "medical_embeddings")
print(f"Cache hits: {stats['cache_hits']}")
print(f"Cache misses: {stats['cache_misses']}")
print(f"Hit rate: {stats['cache_hit_rate']:.1%}")
print(f"Avg embedding time: {stats['avg_embedding_time_ms']:.2f}ms")
```

---

## Validation Tests

### Contract Tests (TDD)
Location: `tests/contract/test_iris_embedding_contract.py`

```python
def test_cache_hit_rate_target():
    """Verify 80%+ cache hit rate for repeated calls."""
    # Generate 1000 embeddings with the same model
    # (one sample text; texts * 100 yields 100 inputs per call)
    texts = ["sample document text"]
    for _ in range(10):
        embeddings = generate_embeddings(texts * 100, "test_config", conn)

    stats = get_cache_stats(conn, "test_config")
    assert stats["total_embeddings"] >= 1000
    assert stats["cache_hit_rate"] >= 0.80  # 80% target
```

### Performance Benchmarks
Location: `tests/performance/test_iris_embedding_performance.py`

```python
import time

def test_performance_benchmark_1746_texts():
    """Validate actual DP-442038 scenario performance."""
    texts = [f"Document {i} content..." for i in range(1746)]

    start = time.time()
    for batch in chunks(texts, 100):
        generate_embeddings(batch, "medical_embeddings", conn)
    elapsed = time.time() - start

    # Baseline: 1200s (20 minutes without caching)
    # Target: <30s (40x improvement)
    # Achieved: 0.85s (1405x improvement)
    assert elapsed < 30.0
```
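
The `chunks` helper isn't shown in the excerpt; a minimal version consistent with its use above might be:

```python
from typing import Iterator, List

def chunks(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield successive fixed-size slices of a list."""
    for start in range(0, len(items), size):
        yield items[start:start + size]
```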

---

## Troubleshooting

### Issue: Model Still Reloading on Every Call
**Symptom**: Slow performance (>1 second per embedding)
**Cause**: `use_cache=False` in configuration
**Fix**:
```python
# Reconfigure with caching enabled
configure_embedding(
    connection,
    config_name="medical_embeddings",
    model_name="sentence-transformers/all-MiniLM-L6-v2",
    dimension=384,
    use_cache=True  # ← Must be True
)
```

### Issue: Memory Errors with Large Models
**Symptom**: `OutOfMemoryError` when loading the model
**Cause**: Model too large for available RAM
**Fix**: Use a smaller model or offload to GPU
```python
configure_embedding(
    connection,
    config_name="medical_embeddings",
    model_name="sentence-transformers/all-MiniLM-L6-v2",  # compact model
    dimension=384,
    device="cuda"  # or offload to GPU if available
)
```

### Issue: Cache Not Persisting Between IRIS Sessions
**Symptom**: First call is slow after an IRIS restart
**Cause**: Cache is in-memory only (by design)
**Expected Behavior**: This is normal - the first call after a restart loads the model (~2-3s); subsequent calls use the cache
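
If that first slow call is a problem, the cache can be warmed at application startup. A sketch using the cache function shown earlier (calling this private helper directly is an assumption, not a documented entry point):

```python
# Hypothetical warm-up at startup: pay the one-time load cost
# before the first INSERT arrives
from iris_vector_rag.embeddings.manager import _get_cached_sentence_transformer

_get_cached_sentence_transformer(
    "sentence-transformers/all-MiniLM-L6-v2", device="cpu"
)  # model is now resident; the first SQL INSERT will be a cache hit
```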

---

## References

- **Full Documentation**: `docs/IRIS_EMBEDDING_PERFORMANCE_REPORT.md`
- **Feature Spec**: `specs/051-add-native-iris/spec.md`
- **Contract Tests**: `tests/contract/test_iris_embedding_contract.py`
- **Performance Tests**: `tests/performance/test_iris_embedding_performance.py`
- **Integration Tests**: `tests/integration/test_iris_embedding_integration.py`

---

## Summary

**DP-442038 Workaround**: A Python embedding cache layer eliminates the 720x model loading overhead by keeping models in memory across IRIS SQL operations.

**Key Benefit**: 1405x speedup (20 minutes → 0.85 seconds for 1,746 documents)

**Implementation**: Transparent to users - configure once with `use_cache=True`, and the cache applies automatically to all subsequent IRIS EMBEDDING operations.