Semantic Caching in Production
Repeated user intents can quietly inflate LLM cost and latency. Semantic caching helps, but production use comes with trade-offs.