Shkumbin Sherifi

Shkumbin Sherifi: Blog
https://shkumbins.dev/blog
Technical writing on AI infrastructure, memory systems, and local-first agent architecture.
Language: en-us
Sat, 06 Jun 2026 00:00:00 +0000

Building a 5-Layer Memory System for an Autonomous AI Agent
https://shkumbins.dev/blog/memory-layer
Thu, 04 Jun 2026 00:00:00 +0000
How I built a proper memory hierarchy with consolidation for a local-first AI agent on Apple Silicon. It is not just a vector DB bolted on, but a layered system with a session store, an MCP companion, durable facts, synthesized knowledge, and full-text search.

Local Model Benchmarks: Running LLMs on Apple Silicon via MLX
https://shkumbins.dev/blog/local-model-benchmarks
Thu, 04 Jun 2026 00:00:00 +0000
Real benchmark data from running quantized LLMs locally on an M4 Pro (48 GB). Covers three models, two benchmark types, production stats, and the memory architecture that makes it work.