Weight decay induces low-rank attention layers