Derivatives, Convection, Projection

Derivatives

Primary files: derivatives.f90, test_filtermodule.f90, and shared helpers in functions.f90.

The derivative routines were GPU-enabled by moving the loop-heavy x/y/z derivative operations into CUF kernels. The main optimization lesson was to prefer a single full 3D kernel over many small kernel launches issued from inside a do k loop. For the validation case, derivative timing now scales well from 1 GPU to 2 GPUs.
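
The difference between the two launch patterns can be illustrated with a minimal sketch. It assumes that "CUF kernels" refers to CUDA Fortran's !$cuf kernel do loop directive, and it uses a simple backward difference in z purely as a stand-in. The program name, array names, and grid sizes are hypothetical and are not taken from derivatives.f90.

```fortran
! Sketch only: compares per-plane launches with one full 3D launch.
! Assumed to be built with a CUDA Fortran compiler, e.g. nvfortran -cuda.
program ddz_cuf_sketch
    use cudafor
    implicit none
    integer, parameter :: rp = kind(1.0d0)
    integer, parameter :: nx = 64, ny = 64, nz = 32   ! hypothetical slab size
    real(rp), parameter :: dz = 0.1_rp
    real(rp), allocatable :: f(:, :, :), d_many(:, :, :), d_one(:, :, :)
    real(rp), allocatable, device :: f_d(:, :, :), dfdz_d(:, :, :)
    real(rp) :: inv_dz
    integer :: i, j, k

    allocate(f(nx, ny, nz), d_many(nx, ny, nz), d_one(nx, ny, nz))
    allocate(f_d(nx, ny, nz), dfdz_d(nx, ny, nz))
    inv_dz = 1.0_rp / dz

    ! Test field that is linear in z.
    do k = 1, nz
        f(:, :, k) = 2.0_rp * dz * real(k - 1, rp)
    end do
    f_d = f   ! host-to-device copy

    ! Pattern to avoid: one small 2D launch per k plane (nz - 1 launches).
    dfdz_d = 0.0_rp
    do k = 2, nz
        !$cuf kernel do(2) <<<*, *>>>
        do j = 1, ny
            do i = 1, nx
                dfdz_d(i, j, k) = (f_d(i, j, k) - f_d(i, j, k - 1)) * inv_dz
            end do
        end do
    end do
    d_many = dfdz_d   ! device-to-host copy

    ! Preferred pattern: a single launch covering the full 3D index space.
    dfdz_d = 0.0_rp
    !$cuf kernel do(3) <<<*, *>>>
    do k = 2, nz
        do j = 1, ny
            do i = 1, nx
                dfdz_d(i, j, k) = (f_d(i, j, k) - f_d(i, j, k - 1)) * inv_dz
            end do
        end do
    end do
    d_one = dfdz_d

    ! Both variants compute the same values; only the launch count differs.
    print *, 'max difference between variants:', maxval(abs(d_one - d_many))
end program ddz_cuf_sketch
```

Both loops do the same arithmetic. The second form pays the kernel launch overhead once instead of once per plane, and gives the GPU the whole slab to schedule at once.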

Convection

Primary production file: convec.f90. The older 1D/2D FFT reference files were removed from the release because they were temporary comparison material.

The production convection path is GPU-enabled and uses the validated GPU implementation by default. Convection remains one of the largest single-GPU costs in the current validation case, but it scales reasonably when the slab is split across two GPUs.
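
As a rough illustration of what splitting the slab across two GPUs involves, the following host-only sketch computes the index range each rank would own. It assumes a block split along the vertical index with one rank per GPU; the split direction, the routine name slab_bounds, and the grid size are assumptions and do not come from convec.f90.

```fortran
! Sketch only: block distribution of nz_global planes over a few ranks.
program slab_split_sketch
    implicit none
    integer, parameter :: nz_global = 128   ! hypothetical global vertical size
    integer, parameter :: nranks = 2        ! one rank per GPU (assumption)
    integer :: rank, k_lo, k_hi

    do rank = 0, nranks - 1
        call slab_bounds(nz_global, nranks, rank, k_lo, k_hi)
        print '(a,i0,a,i0,a,i0)', 'rank ', rank, ': k = ', k_lo, ' .. ', k_hi
    end do

contains

    ! The first mod(n, np) ranks receive one extra plane each.
    subroutine slab_bounds(n, np, rank, k_lo, k_hi)
        integer, intent(in)  :: n, np, rank
        integer, intent(out) :: k_lo, k_hi
        integer :: base, extra

        base  = n / np
        extra = mod(n, np)
        k_lo  = rank * base + min(rank, extra) + 1
        k_hi  = k_lo + base - 1
        if (rank < extra) k_hi = k_hi + 1
    end subroutine slab_bounds

end program slab_split_sketch
```

In a multi-GPU run each rank would typically also select its device, for example with cudaSetDevice, before allocating device arrays for its own range.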

Projection

Projection is GPU-enabled and now accounts for only a small fraction of the timestep. The LESGO_PROJECT_STAGE_TIMING switch is retained solely for detailed timing attribution. The optimized path uses packed/overlapped halo handling by default instead of exposing a web of small public switches.
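
One common reading of "packed" halo handling is that the boundary planes of several fields are copied into one contiguous buffer, so that a single exchange replaces one exchange per field. The host-only sketch below illustrates that idea under this assumption. The field names, buffer layout, and sizes are hypothetical, and a local copy stands in for the actual exchange.

```fortran
! Sketch only: pack three halo planes into one buffer, then unpack.
program halo_pack_sketch
    implicit none
    integer, parameter :: rp = kind(1.0d0)
    integer, parameter :: nx = 8, ny = 6, nz = 5   ! hypothetical local slab
    real(rp) :: u(nx, ny, nz), v(nx, ny, nz), w(nx, ny, nz)
    real(rp) :: sendbuf(nx, ny, 3), recvbuf(nx, ny, 3)

    call random_number(u)
    call random_number(v)
    call random_number(w)

    ! Pack: the top interior plane of each field goes into one buffer.
    sendbuf(:, :, 1) = u(:, :, nz - 1)
    sendbuf(:, :, 2) = v(:, :, nz - 1)
    sendbuf(:, :, 3) = w(:, :, nz - 1)

    ! Stand-in for the exchange. A real code would send sendbuf to the
    ! neighbouring rank in one message and receive into recvbuf.
    recvbuf = sendbuf

    ! Unpack: fill the bottom halo plane (k = 1) of each field.
    u(:, :, 1) = recvbuf(:, :, 1)
    v(:, :, 1) = recvbuf(:, :, 2)
    w(:, :, 1) = recvbuf(:, :, 3)

    print *, 'halo matches source planes:', &
        all(u(:, :, 1) == u(:, :, nz - 1)) .and. &
        all(v(:, :, 1) == v(:, :, nz - 1)) .and. &
        all(w(:, :, 1) == w(:, :, nz - 1))
end program halo_pack_sketch
```

Overlap would additionally require posting the exchange as nonblocking and computing interior planes while it is in flight; that part is not shown here.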

Boundary And Auxiliary Flow Modules

Files such as wallstress.f90, inflow.f90, shifted_inflow.f90, sponge.f90, coriolis.f90, and rmsdiv.f90 have GPU coverage for timestep-relevant loops. Initialization, parsing, and output remain CPU-side where they do not affect timestep performance.