7 RAG Retrieval Strategies, Benchmarked
Which retrieval method actually wins? I built DocMind AI to find out.
Beyond Transformers - State Space Models (SSMs), Mamba, Linear Attention, RWKV, and hybrid architectures that challenge the attention paradigm.
The small details that matter - bias removal, tied embeddings, parallel attention and FFN computation, and initialization schemes in modern LLMs.
Production deployment techniques - KV-cache for avoiding redundant computation, quantization for memory efficiency, speculative decoding for faster generation, and continuous batching for throughput.
Modern training techniques for LLMs - AdamW optimizer, learning rate schedules, mixed precision training (FP16/BF16), gradient checkpointing, and distributed training strategies.
Evolution of the feed-forward network - from ReLU to GELU, SwiGLU gated activations, and Mixture of Experts for scaling to trillion-parameter models.
Evolution of the attention mechanism - from sinusoidal positional encodings to RoPE, Multi-Query Attention, Grouped-Query Attention, and the revolutionary FlashAttention algorithm.
How modern LLMs evolved from the original Transformer - decoder-only architecture, Pre-Layer Normalization, and RMSNorm. The fundamental architectural shifts that power GPT, LLaMA, and Mistral.
A deep dive into the original Transformer architecture from 'Attention Is All You Need' - the encoder-decoder structure, scaled dot-product attention, multi-head attention, and the design decisions that revolutionized NLP.