Understanding the PyCUDA memory model with matrix manipulation