Articles
A collection of articles about machine learning, quantitative trading, software development, and other technical topics. Here I share my experiences, insights, and learnings from working in these fields.
How the Codex Team Builds Now
Some thoughts after using Codex more than Claude Code recently, and after listening to how the Codex team builds with Codex.
Running Claude Code as a Production Harness Agent
How to run Claude Code as a production harness agent for a recurring operations workflow, with notes on cron, permissions, secrets, MCP, cost, and failure handling.
How Claude Code Designs Prompt Caching
In Claude Code, prompt caching and context compaction are designed together so that caching stays efficient across long-running conversations and tasks.
LLM Inference Caching
An explanation of caching techniques in LLM inference, from the hardware layer to the application layer.
Claude with MCP
Productivity
How AK Uses LLMs
Productivity
AI Agent
Agent: Tool & Planning
Scaling Law
What is the scaling law, and will it end?
Serve LLMs with Ollama
Run LLMs locally with ChatBot UI.
LangChain: Retriever
Query and retrieve documents
LangChain: Tools
How to enable tool use in LangChain
LangChain: OutputParser
Parsing structured outputs from LLMs
LangChain: Fundamentals
A fundamental view of LangChain
Parameter-Efficient Fine-Tuning: LoRA and QLoRA
Parameter Efficient Fine-Tuning (PEFT) is a technique designed to fine-tune models while minimizing the need for extensive resources and cost. PEFT is a great choice when dealin...
Memory Optimization for LLM Inference
Reducing memory usage during inference
How much memory do we need for an LLM?
Memory requirements for LLMs