Articles
A collection of articles about machine learning, quantitative trading, software development, and other technical topics. Here I share my experiences, insights, and learnings from working in these fields.
LLM Inference Caching
Caching techniques in LLM inference, explained from the hardware layer to the application layer
Claude with MCP
Productivity
How AK uses LLM
Productivity
AI Agent
Agent: Tool & Planning
Scaling Law
What is a scaling law, and will it end?
Serve LLMs with Ollama
Run an LLM locally with ChatBot UI.
LangChain: Retriever
Query and retrieve documents
LangChain: Tools
How to enable tool use in LangChain
LangChain: OutputParser
Parsing structured LLM outputs
LangChain: Fundamentals
A fundamental view of LangChain
Parameter Efficient Fine Tuning: LoRA and QLoRA
Parameter Efficient Fine-Tuning (PEFT) is a technique designed to fine-tune models while minimizing the need for extensive resources and cost. PEFT is a great choice when dealin...
Memory Optimization for LLM Inference
Less memory for inference
How much memory do we need for an LLM?
Memory requirements for LLMs