Pressure Solver

The pressure solver is the most communication-sensitive part of the multi-GPU branch. The current implementation keeps the pressure equation and zero-mode handling intact while moving the dominant work to GPU kernels, cuFFT, and a pressure-specific transpose-Thomas path.

Primary Files

| File | Role |
| --- | --- |
| `press_stag_array.f90` | Pressure RHS, pressure halos, cuFFT orchestration |
| `tridag_array.f90` | Tridiagonal solve and pressure transpose-Thomas helper |
| `mpi_transpose_mod.f90` | MPI transpose support |

Implemented GPU Changes

| Area | Change |
| --- | --- |
| Forward/inverse FFT | cuFFT batched plans |
| RHS assembly | GPU kernels with combined RHS halo path |
| Tridiagonal solve | GPU Thomas path with cached coefficients |
| Multi-GPU pressure | `nproc==2` specialized pressure transpose-Thomas helper |
| Output path | Direct Thomas output avoids a separate pack-out stage where possible |
| Timing | Clean production timing separated from detailed diagnostic timing |
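
The "cached coefficients" entry can be illustrated with a small sketch. It is written in Python for readability and is not taken from `tridag_array.f90`. It assumes the cached quantities are the parts of the Thomas forward elimination that depend only on the matrix (not on the right-hand side), so they can be computed once and reused for every solve. The names `factor_tridiagonal` and `solve_with_cached` are hypothetical.

```python
def factor_tridiagonal(a, b, c):
    """Precompute the RHS-independent part of the Thomas algorithm.

    a: sub-diagonal (a[0] unused), b: main diagonal,
    c: super-diagonal (c[-1] unused). All have length n.
    Returns (c_prime, inv_denom), reusable for any right-hand side.
    """
    n = len(b)
    c_prime = [0.0] * n
    inv_denom = [0.0] * n
    inv_denom[0] = 1.0 / b[0]
    c_prime[0] = c[0] * inv_denom[0]
    for i in range(1, n):
        inv_denom[i] = 1.0 / (b[i] - a[i] * c_prime[i - 1])
        c_prime[i] = c[i] * inv_denom[i]
    return c_prime, inv_denom


def solve_with_cached(a, c_prime, inv_denom, d):
    """Solve the tridiagonal system for RHS d using cached factors."""
    n = len(d)
    x = [0.0] * n
    # Forward sweep: only multiplications, the divisions were cached.
    x[0] = d[0] * inv_denom[0]
    for i in range(1, n):
        x[i] = (d[i] - a[i] * x[i - 1]) * inv_denom[i]
    # Back substitution.
    for i in range(n - 2, -1, -1):
        x[i] -= c_prime[i] * x[i + 1]
    return x


# Usage: factor once, then solve for several right-hand sides.
a = [0.0, -1.0, -1.0, -1.0]
b = [2.0, 2.0, 2.0, 2.0]
c = [-1.0, -1.0, -1.0, 0.0]
c_prime, inv_denom = factor_tridiagonal(a, b, c)
x1 = solve_with_cached(a, c_prime, inv_denom, [0.0, 0.0, 0.0, 5.0])
x2 = solve_with_cached(a, c_prime, inv_denom, [1.0, 0.0, 0.0, 1.0])
```

Caching in this style removes the divisions from the per-solve sweep while leaving the Thomas recurrence itself unchanged, which is consistent with the goal of keeping the original solve intact.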

Retained Controls

| Switch | Purpose |
| --- | --- |
| `LESGO_PRESS_RHS_HALO_COMBINED` | Fallback-safe control for combined RHS halo |
| `LESGO_PRESS_TRANSPOSE_GENERIC` | Force old generic transpose helper |
| `LESGO_PRESS_DIRECT_THOMAS_OUT` | Control direct Thomas output path |
| `LESGO_PRESS_STAGE_TIMING` | Enable pressure stage timing |
| `LESGO_PRESS_TRANSPOSE_TIMING` | Enable transpose helper timing |

Do not replace the Thomas solve with parallel cyclic reduction (PCR), cyclic reduction (CR), or SPIKE-style solvers unless that is a deliberate new algorithmic project. The current production goal is to preserve the original Thomas solve and to optimize data ownership and layout around it.
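
To make "optimize ownership/layout around it" concrete, here is a minimal Python sketch of the general transpose idea. It does not reproduce the helper in `tridag_array.f90` or `mpi_transpose_mod.f90`. It assumes each rank initially owns a slab of vertical levels for all Fourier modes, and that the transpose hands each rank complete vertical columns for a subset of modes so the sequential Thomas recurrence can run locally. The rank exchange is simulated with in-memory lists instead of MPI, the function names are hypothetical, and the number of modes is assumed to divide evenly among ranks.

```python
def slab_to_column(slabs):
    """slabs[r][k_local][m] -> columns[r][m_local][k_global].

    Each rank starts with a z-slab (all modes, some levels) and ends
    with full z-columns for its share of the modes.
    """
    nranks = len(slabs)
    nz_local = len(slabs[0])
    nmodes = len(slabs[0][0])
    m_per = nmodes // nranks  # assumes nmodes is divisible by nranks
    columns = []
    for dest in range(nranks):
        owned = []
        for m in range(dest * m_per, (dest + 1) * m_per):
            # Gather the pieces of column m from every source rank in
            # rank order, so the z index runs from bottom to top.
            col = [slabs[src][k][m]
                   for src in range(nranks)
                   for k in range(nz_local)]
            owned.append(col)
        columns.append(owned)
    return columns


def column_to_slab(columns, nz_local):
    """Inverse of slab_to_column: return to the z-slab layout."""
    nranks = len(columns)
    m_per = len(columns[0])
    slabs = []
    for dest in range(nranks):
        slab = []
        for k in range(dest * nz_local, (dest + 1) * nz_local):
            row = [columns[src][m][k]
                   for src in range(nranks)
                   for m in range(m_per)]
            slab.append(row)
        slabs.append(slab)
    return slabs


# Usage: two ranks, four levels, four modes; the value encodes (k, m).
slabs = [
    [[10 * k + m for m in range(4)] for k in range(0, 2)],  # rank 0
    [[10 * k + m for m in range(4)] for k in range(2, 4)],  # rank 1
]
columns = slab_to_column(slabs)
restored = column_to_slab(columns, nz_local=2)
```

After `slab_to_column`, every column a rank holds spans all vertical levels, so an unmodified Thomas sweep can be applied to it without further communication; `column_to_slab` then restores the original decomposition.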