It depends on the application.
For 2d work, X is horizontal and Y is vertical. This should remain unchanged.
When moving to 3d work, the statement above should apply to a top-down view of the object or scene. For example, in a game like Super Mario 64 (or any other "simple 3d game"), the X and Y axes should describe planar movement (forward, back, sideways). The Z axis would be vertical (up).
When people say height/width/depth, they are usually describing an object they are looking at (for example, a cube or a figure). For representing a game world, a north/east/up convention for XYZ makes more sense. Maps can then be drawn on the XY plane, with Z values used wherever height is needed (such as building structures in Minecraft).
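To make the map-plus-height idea concrete, here is a minimal Python sketch. It assumes X points east, Y points north and Z points up (the exact X/Y assignment is my assumption), and the names Vec3, to_map and z_up_to_y_up are made up for illustration. It also shows one common way to hand a Z-up position to a right-handed Y-up renderer, where "forward" is taken to be negative Z.

```python
from typing import NamedTuple, Tuple


class Vec3(NamedTuple):
    """World position in an assumed Z-up convention."""
    x: float  # east (assumption)
    y: float  # north (assumption)
    z: float  # up


def to_map(p: Vec3) -> Tuple[float, float]:
    """Project onto the 2D map by dropping the height component."""
    return (p.x, p.y)


def z_up_to_y_up(p: Vec3) -> Tuple[float, float, float]:
    """Convert to a right-handed Y-up triple (x, y, z).

    Height moves into the second slot, and north becomes negative z
    so that handedness is preserved.
    """
    return (p.x, p.z, -p.y)


player = Vec3(x=3.0, y=4.0, z=10.0)  # 3 east, 4 north, 10 above ground
print(to_map(player))        # position on the overhead map
print(z_up_to_y_up(player))  # same point for a Y-up renderer
```

The map code never touches Z, which is the practical benefit of keeping height out of the XY plane.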
FYI for the OP: "Y is the standard for math" isn't true. Y is the conventional up axis in a two-coordinate system; Z is the conventional up axis in a three-coordinate system. Since 3d gaming needs a 3d system, I vote "Z".