Tag: llm | DevRel Field Notes

Post 17 Jun 2026 3 min read

How Plan Caching Reduces LLM Agent Costs

Plan caching reuses planning templates across similar agent tasks, cutting cost and latency without sacrificing accuracy.

Post 5 Jun 2026 3 min read

Stop filling your agent's context window just because you can

Bigger context windows do not remove failure modes. They create new ones when we stop being intentional about what goes into an agent's context.

Post 21 May 2026 3 min read

I benchmarked 5 embedding models across 4 datasets

I benchmarked five embedding models across four NanoBEIR datasets and found that bigger embeddings did not always produce better retrieval.

Post 19 May 2026 3 min read

Why reranking matters with cross-encoders

Bi-encoders make retrieval fast, but cross-encoders expose why reranking matters when meaning depends on the query.

Post 11 May 2026 8 min read

A beginner-friendly guide to the GGUF model format

GGUF made local LLM inference feel practical by packaging model weights, vocabulary, hyperparameters, and architecture metadata into one runnable format.

Post 27 Mar 2026 3 min read

Semantic Caching in Production

Repeated user intents can quietly inflate LLM cost and latency. Semantic caching helps, but production use comes with trade-offs.