Keep in mind that there is usually more than one way to route DDR critical signals. The processor manufacturer often publishes guidelines that explain the options and recommendations. Note that these guides are frequently kept behind access controls, so you usually need to request access.
Beyond that, keep in mind that length matching generally works better if you match propagation delay (time) instead of physical length. This becomes really important when you route some nets of a group on outer layers and other nets of the same group on inner layers. Delay per unit length differs between inner and outer layers (signals on inner layers travel slower), and this needs to be compensated for. This is why many recommendations suggest routing all of a group on a single layer, so that physical variances apply equally across the complete group. Do not forget that vias also have “length”, so make sure your layout tool accounts for it based on the portion of the via the signal actually travels through, not just the overall via length.
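To put rough numbers on the inner vs. outer layer point, here is a minimal sketch. The ps/mm figures and the two nets are made-up illustrative values, not taken from any guide; real values should come from your stackup or a field solver.

```python
# Illustrative delays in ps/mm (assumed values, not from any vendor guide).
# Take real numbers from your stackup or field solver.
DELAY_PS_PER_MM = {"outer": 5.9, "inner": 6.9, "via": 6.9}

def net_delay_ps(segments):
    """Sum the delay of (kind, length_mm) segments along the signal path."""
    return sum(DELAY_PS_PER_MM[kind] * length_mm for kind, length_mm in segments)

# Two hypothetical nets with the same 40 mm physical length.
net_a = [("outer", 40.0)]
# The via entry is only the span the signal travels between the two layers
# (0.4 mm here), not the full barrel length of the via.
net_b = [("outer", 5.0), ("via", 0.4), ("inner", 34.6)]

skew_ps = net_delay_ps(net_b) - net_delay_ps(net_a)
print(f"net_a: {net_delay_ps(net_a):.1f} ps")
print(f"net_b: {net_delay_ps(net_b):.1f} ps")
print(f"skew:  {skew_ps:.1f} ps")
```

With these assumed numbers the two nets are identical in physical length but differ by roughly 35 ps in delay, which is the mismatch that physical length matching alone would miss.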
Another item to note is that within the processor, each net has its own internal delay that depends on the internal configuration. These delays should also be included in the overall delay calculations to improve the accuracy of length matching.
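As a small sketch of why the internal delays matter, the snippet below adds a per-net internal delay to the board delay before comparing nets. The net names and all delay values are hypothetical; the real internal delays come from the processor manufacturer's data.

```python
# Hypothetical values in ps: (board routing delay, internal processor delay).
nets = {
    "DQ0": (250.0, 12.0),
    "DQ1": (255.0, 6.0),
    "DQ2": (248.0, 15.0),
}

def spread(values):
    """Difference between the slowest and fastest value."""
    values = list(values)
    return max(values) - min(values)

board_only = {name: board for name, (board, _) in nets.items()}
total = {name: board + internal for name, (board, internal) in nets.items()}

print(f"board-only spread: {spread(board_only.values()):.1f} ps")
print(f"total spread:      {spread(total.values()):.1f} ps")
```

In this made-up case the board traces look 7 ps apart on their own, yet the total paths are within 2 ps once the internal delays are included. The opposite can just as easily happen, which is why matching on board delay alone can mislead.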
It is best to use solid GND reference layers adjacent to critical signals. Avoid using PWR layers as return references; if you do it anyway, do NOT route over splits/voids, and be sure the signals are adjacent to the plane carrying the voltage that powers those signals.
I highly recommend Beeker’s “Billion Dollar Mistake” seminar.
One last thought. PCB fabrication has tolerances, so the best bet is to length match nets as tightly as possible instead of to the largest tolerance the guides allow. You do NOT want to design at the edge of the length tolerances, only to find that PCB fabrication tolerances (which are out of the designer’s control) push the result over the limit.
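One way to think about that margin is to reserve part of the allowed skew for fabrication variation and design to what is left. This is only a sketch; the function names and the numbers are hypothetical, and the actual fabrication variation has to come from your PCB vendor.

```python
def design_budget_ps(guide_limit_ps, fab_variation_ps):
    """Skew left for the layout after reserving room for fabrication variation."""
    return max(guide_limit_ps - fab_variation_ps, 0.0)

def passes_with_margin(routed_skew_ps, guide_limit_ps, fab_variation_ps):
    """True if the routed skew still fits once fabrication variation is reserved."""
    return routed_skew_ps <= design_budget_ps(guide_limit_ps, fab_variation_ps)

# Hypothetical numbers: guide allows 10 ps, fabrication may add up to 4 ps.
guide_limit = 10.0
fab_variation = 4.0

print(design_budget_ps(guide_limit, fab_variation))         # budget left for routing
print(passes_with_margin(9.0, guide_limit, fab_variation))  # inside the guide, but no margin
print(passes_with_margin(3.0, guide_limit, fab_variation))  # leaves room for the fab
```

A layout routed to 9 ps would pass a check against the guide limit alone, but it fails here because it leaves nothing for the fabricator's variation.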