[mlir] Fix the ROCm runtime wrapper to account for CUDA / ROCm API differences