Interesting AF. Promoting the best 6 layers to Q4 versus the last 6 layers to Q4 (Q2 DS4 Flash quants) has different effects depending on what you check. The full logits error is smaller in the "last" variant, but the "best" variant (layers 32, 25, 15, 27, 23, 31) actually has better top-token agreement with Q4. This makes sense, actually: the last layers likely improve the precision of the logits distribution, while the other layers make the decisions more similar.
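
To see how the two checks can pull apart, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the function names `logits_mse` and `top1_agreement` are hypothetical, mean squared error stands in for whatever "full logits error" was actually measured, and the logits are made-up toy numbers, not results from these quants.

```python
def logits_mse(ref, test):
    """Mean squared error between two lists of logit vectors."""
    total = 0.0
    count = 0
    for r_row, t_row in zip(ref, test):
        for r, t in zip(r_row, t_row):
            total += (r - t) ** 2
            count += 1
    return total / count


def top1_agreement(ref, test):
    """Fraction of positions where both models pick the same argmax token."""
    def argmax(row):
        return max(range(len(row)), key=row.__getitem__)

    same = sum(argmax(r) == argmax(t) for r, t in zip(ref, test))
    return same / len(ref)


# Toy reference logits: 2 positions, 3-token vocabulary (made-up numbers).
q4 = [[2.0, 1.9, -3.0], [0.5, 0.4, -2.0]]

# Variant A: tiny error everywhere, but it flips the near-tied top token.
variant_a = [[1.9, 2.0, -3.0], [0.4, 0.5, -2.0]]

# Variant B: larger error on the unlikely tail token, top token preserved.
variant_b = [[2.0, 1.9, -2.0], [0.5, 0.4, -1.0]]

for name, variant in (("A", variant_a), ("B", variant_b)):
    print(name, logits_mse(q4, variant), top1_agreement(q4, variant))
```

In this toy case variant A has the smaller logits error yet never agrees on the top token, while variant B has the larger error and always agrees, so a lower full-distribution error does not by itself imply more similar decisions.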