Thanks for confirming, and for filing a problem report. I'm not very familiar with OpenACC, but according to -Minfo, the 13.3 compiler generates a scalar kernel because it can't establish ...
The following generates a stream of out-of-bounds global reads, according to cuda-memcheck, when compiled with pgf95 versions 12.5 and 13.3. The errors go away with m=128, with mod in place of iand, or with both changes.