
SGS and Stresses

The SGS (subgrid-scale) module was one of the largest GPU porting targets because it performs repeated full-domain tensor operations as well as MPI halo exchanges.

Primary Files

| File | Role |
| --- | --- |
| sgs_stag_util.f90 | SGS tensor construction, dynamic model helpers, calc_Sij, Nu_t, tau handling |
| divstress_uv.f90 | Divergence of horizontal stress components |
| divstress_w.f90 | Divergence of vertical stress components |
| std_dynamic.f90, scaledep_dynamic.f90 | Dynamic SGS model support |
| interpolag_Ssim.f90, interpolag_Sdep.f90, lagrange_Ssim.f90, lagrange_Sdep.f90 | Lagrangian dynamic model support |
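The calc_Sij, Nu_t, and tau names in sgs_stag_util.f90 map onto the standard eddy-viscosity closure. In that closure, S_ij = (du_i/dx_j + du_j/dx_i)/2, |S| = sqrt(2 S_ij S_ij), nu_t = Cs^2 Delta^2 |S|, and tau_ij = -2 nu_t S_ij. The sketch below evaluates these quantities at a single point. It rests on three assumptions:

- The velocity-gradient tensor is collocated.
- Cs^2 is given. In the dynamic models it comes from the dynamic procedure.
- The staggered-grid interpolation that the Fortran code performs can be ignored.

The function names are hypothetical and are not LESGO routines.

```python
import math
from typing import List

Matrix = List[List[float]]


def strain_rate(grad_u: Matrix) -> Matrix:
    """S_ij = 0.5 * (du_i/dx_j + du_j/dx_i), where grad_u[i][j] = du_i/dx_j."""
    return [[0.5 * (grad_u[i][j] + grad_u[j][i]) for j in range(3)] for i in range(3)]


def strain_magnitude(S: Matrix) -> float:
    """|S| = sqrt(2 * S_ij * S_ij), summed over both indices."""
    return math.sqrt(2.0 * sum(S[i][j] * S[i][j] for i in range(3) for j in range(3)))


def eddy_viscosity(s_mag: float, cs2: float, delta: float) -> float:
    """nu_t = Cs^2 * Delta^2 * |S|; cs2 is the squared (dynamic) coefficient."""
    return cs2 * delta * delta * s_mag


def sgs_stress(grad_u: Matrix, cs2: float, delta: float) -> Matrix:
    """tau_ij = -2 * nu_t * S_ij at one collocated point (hypothetical helper)."""
    S = strain_rate(grad_u)
    nu_t = eddy_viscosity(strain_magnitude(S), cs2, delta)
    return [[-2.0 * nu_t * S[i][j] for j in range(3)] for i in range(3)]


# Usage: pure shear du/dz = 2.0 (index 0 is x, index 2 is z).
grad_u = [[0.0, 0.0, 2.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
tau = sgs_stress(grad_u, cs2=0.01, delta=0.5)
print(tau[0][2], tau[2][0])
```

Every output point needs only local gradients and one coefficient, so the operation is a natural fit for one GPU thread per grid point.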

Implemented GPU Changes

| Area | Change |
| --- | --- |
| calc_Sij | GPU path with an explicit CUDA Fortran interior-kernel option |
| Nu_t and tau loops | GPU kernels for the repeated full-domain operations |
| Tau halo | Combined contiguous device halo path for nproc == 2 |
| dwdz halo | Device-buffer path audited; further micro-variants gave no benefit |
| Stage timing | Non-strict timing mode avoids charging queued work to the wrong stage |
| Default half-channel SGS path | Validation with sgs_model=5 exercises the Lagrangian scale-dependent path, including the F_NN, F_QN, F_MM, and F_LM updates |
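A small host-side sketch can show the idea behind the combined tau halo. A separate exchange per stress component would send one message for each of the six tau components. Instead, the boundary plane of each component is copied into a single contiguous buffer. That buffer is exchanged once per direction and then scattered back into the components.

The following details are assumptions made for illustration:

- Planes 0 and nz-1 are ghost planes.
- The component names are as shown in the code.
- The helper functions are hypothetical.

The real path presumably does the packing on device buffers in CUDA Fortran before the MPI call.

```python
from typing import Dict, List

Plane = List[float]
Field = List[Plane]  # field[k] is one flattened x-y plane

COMPONENTS = ["txx", "txy", "txz", "tyy", "tyz", "tzz"]


def pack_halo(fields: Dict[str, Field], k: int) -> List[float]:
    """Copy plane k of every tau component into one contiguous buffer."""
    buf: List[float] = []
    for name in COMPONENTS:
        buf.extend(fields[name][k])
    return buf


def unpack_halo(fields: Dict[str, Field], k: int, buf: List[float]) -> None:
    """Scatter a contiguous buffer back into plane k of every component."""
    n = len(buf) // len(COMPONENTS)
    for c, name in enumerate(COMPONENTS):
        fields[name][k] = buf[c * n:(c + 1) * n]


def make_rank(offset: float, nz: int, nxy: int) -> Dict[str, Field]:
    """Fill each component/plane with a distinct value so copies are traceable."""
    return {name: [[offset + 10 * c + k] * nxy for k in range(nz)]
            for c, name in enumerate(COMPONENTS)}


# Two ranks stacked in z; assumed layout: planes 0 and nz-1 are ghost planes.
nz, nxy = 4, 3
lower, upper = make_rank(0.0, nz, nxy), make_rank(100.0, nz, nxy)

# One message per direction instead of one per component.
up_msg = pack_halo(lower, nz - 2)     # lower rank's top interior plane
down_msg = pack_halo(upper, 1)        # upper rank's bottom interior plane
unpack_halo(upper, 0, up_msg)         # fills upper rank's bottom ghost plane
unpack_halo(lower, nz - 1, down_msg)  # fills lower rank's top ghost plane
```

In a z-decomposition with two ranks, each rank has a single neighbor. Combining the messages therefore replaces six small messages per direction with one. Those small messages are latency-bound, which is plausibly why the combined path pays off for nproc == 2.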

Default Half-Channel Validation Note

The 128^3 default half-channel case switches to the Lagrangian scale-dependent SGS model once the dynamic model has initialized. For that validation, the GPU path covers the runtime loops in sgs_stag_util.f90, scaledep_dynamic.f90, interpolag_Sdep.f90, and lagrange_Sdep.f90. The optimizations retained as defaults are the explicit calc_Sij interior kernel and the combined two-rank tau halo. Diagnostic timers stay off unless explicitly enabled.
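The F_LM, F_MM, F_QN, and F_NN updates are pointwise operations applied over the full domain. The sketch below shows the standard Lagrangian time-relaxation step of Meneveau, Lund and Cabot (1996) for one grid point. It makes three assumptions:

- The previous-step values have already been interpolated to the upstream position x - u*dt, which is the role of the interpolag routines.
- The time scale is T = theta * Delta * (F_LM * F_MM)^(-1/8) with the commonly used theta = 1.5.
- A small floor guards the product F_LM * F_MM.

Whether LESGO uses exactly these constants is not stated here. The F_QN and F_NN pair would be updated analogously from Q_ij N_ij and N_ij N_ij at the larger test-filter scale.

```python
from typing import Tuple


def relax_lagrangian(f_lm_up: float, f_mm_up: float, lm: float, mm: float,
                     dt: float, delta: float, theta: float = 1.5,
                     tiny: float = 1e-24) -> Tuple[float, float]:
    """One time-relaxation step for F_LM and F_MM at a single point.

    f_lm_up, f_mm_up: previous-step values already interpolated upstream.
    lm, mm: current contractions L_ij M_ij and M_ij M_ij.
    """
    # Lagrangian time scale; the floor on the product is an assumed safeguard.
    T = theta * delta * max(f_lm_up * f_mm_up, tiny) ** (-0.125)
    eps = (dt / T) / (1.0 + dt / T)  # relaxation weight in (0, 1)
    f_lm = eps * lm + (1.0 - eps) * f_lm_up
    f_mm = eps * mm + (1.0 - eps) * f_mm_up
    return f_lm, f_mm


f_lm, f_mm = relax_lagrangian(f_lm_up=0.02, f_mm_up=1.0, lm=0.03, mm=1.2,
                              dt=0.01, delta=0.1)
coeff_ratio = f_lm / f_mm  # ratio that feeds the dynamic coefficient
```

Each point depends only on its own upstream-interpolated values, so the loop maps directly onto a GPU kernel once the interpolation has run.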

Retained Controls

| Switch | Purpose |
| --- | --- |
| LESGO_SGS_HALO_COMBINED | Fallback-safe control for combined tau halo |
| LESGO_SGS_CALCSIJ_EXPLICIT | Fallback-safe control for explicit calc_Sij interior path |
| LESGO_SGS_STAGE_TIMING | Enable SGS diagnostic timing |
| LESGO_SGS_STRICT_SYNC | Restore strict sync behavior for debugging |
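The switch names read like environment variables. This sketch assumes that they are, and that they take the values 0 and 1; neither the mechanism nor the accepted values are specified here. Under those assumptions, a fallback-safe reader could work as follows:

- An unset or unrecognized value falls back to the default.
- The defaults keep the optimized paths on and the diagnostics off, as described above.

The function and the DEFAULTS table are hypothetical.

```python
import os
from typing import Mapping, Optional

# Assumed defaults: optimized paths on, diagnostics and strict sync off.
DEFAULTS = {
    "LESGO_SGS_HALO_COMBINED": True,
    "LESGO_SGS_CALCSIJ_EXPLICIT": True,
    "LESGO_SGS_STAGE_TIMING": False,
    "LESGO_SGS_STRICT_SYNC": False,
}


def read_switch(name: str, env: Optional[Mapping[str, str]] = None) -> bool:
    """Return the switch value; anything other than "0"/"1" yields the default."""
    source = os.environ if env is None else env
    raw = source.get(name, "").strip()
    if raw == "1":
        return True
    if raw == "0":
        return False
    return DEFAULTS[name]  # fallback-safe: unset or unparseable keeps the default


timing_on = read_switch("LESGO_SGS_STAGE_TIMING")
```

Failing toward the default means a mistyped value can never silently disable the validated optimized path.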

Do not change SGS formulas or array bounds while optimizing communication. After any SGS change, validate the following on 2 MPI ranks / 2 GPUs: divergence, kinetic energy, bottom-wall stress, mean velocity, Reynolds stresses, SGS model/stress build timing, divstress timing, and tau halo behavior.
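The physics side of that checklist can be sketched as a comparison of scalar diagnostics from a reference run against the optimized run, within a tolerance. The following are all hypothetical placeholders:

- the metric names
- the tolerances
- the numbers in the usage example, which are made up purely to exercise the function

Timing entries are left out because they are expected to change between runs.

```python
import math
from typing import Dict, List


def compare_diagnostics(baseline: Dict[str, float], candidate: Dict[str, float],
                        rtol: float = 1e-6, atol: float = 1e-12) -> List[str]:
    """Return names of diagnostics that are missing or differ beyond tolerance."""
    failures: List[str] = []
    for name, ref in baseline.items():
        if name not in candidate:
            failures.append(name)
        elif not math.isclose(candidate[name], ref, rel_tol=rtol, abs_tol=atol):
            failures.append(name)
    return failures


# Made-up values for illustration only.
baseline = {"kinetic_energy": 0.5, "wall_stress": 1.0, "mean_u_centerline": 20.0}
candidate = {"kinetic_energy": 0.5000000001, "wall_stress": 1.0, "mean_u_centerline": 20.1}
failures = compare_diagnostics(baseline, candidate)
```

A per-metric absolute tolerance may suit quantities that should stay near zero, such as divergence, better than a relative one.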