Contents
Preface
1. Motivation today
2. Scaling in practice
3. Maximum update parametrization – in depth
4. CerebrasGPT
5. MiniCPM
5.1 Technique 1: muP to stabilize scaling
5.2 Optimal batch size and LR
5.3 What remains – model size vs data tradeoffs
5.4 (partial) solutio…