deisa.ray.comm module

class deisa.ray.comm.Comm(*args, **kwargs)[source]

Bases: Protocol

barrier() → None[source]

Block until all ranks reach this barrier.

rank: int

world_size: int

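Since Comm is a Protocol, any class with a matching `rank`, `world_size`, and `barrier()` satisfies it structurally; no inheritance is needed. The following is a minimal sketch of that idea using `typing.Protocol` (the `SingleRank` class and the `runtime_checkable` decorator are illustrative assumptions, not part of the real module):

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class Comm(Protocol):
    """Structural communicator interface, mirroring the documented members."""

    rank: int
    world_size: int

    def barrier(self) -> None:
        """Block until all ranks reach this barrier."""
        ...


class SingleRank:
    """Hypothetical implementor: matches the protocol without subclassing it."""

    rank = 0
    world_size = 1

    def barrier(self) -> None:
        # Only one rank exists, so there is nothing to wait for.
        pass


# runtime_checkable allows isinstance checks against the protocol's members.
print(isinstance(SingleRank(), Comm))
```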
class deisa.ray.comm.MPICommAdapter(comm)[source]

Bases: object

Adapter exposing an MPI communicator via the shared Comm protocol.

barrier() → None[source]

Block until all MPI ranks reach this barrier.
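An adapter like this typically delegates each protocol call to the wrapped communicator. Below is a hedged sketch of that delegation pattern; the attribute names and the use of mpi4py-style methods (`Get_rank`, `Get_size`, `Barrier`) are assumptions about the wrapped object, and `_FakeMPIComm` is a stand-in so the snippet runs without MPI installed:

```python
class MPICommAdapter:
    """Sketch: expose an mpi4py-style communicator via the Comm protocol."""

    def __init__(self, comm):
        self._comm = comm
        # Cache rank/size so the adapter satisfies the protocol's attributes.
        self.rank = comm.Get_rank()
        self.world_size = comm.Get_size()

    def barrier(self) -> None:
        # Delegate to the underlying MPI barrier.
        self._comm.Barrier()


class _FakeMPIComm:
    """Hypothetical stand-in for mpi4py's MPI.COMM_WORLD, for demonstration."""

    def Get_rank(self):
        return 0

    def Get_size(self):
        return 1

    def Barrier(self):
        pass


adapter = MPICommAdapter(_FakeMPIComm())
adapter.barrier()
print(adapter.rank, adapter.world_size)
```

In real use the constructor would receive `mpi4py.MPI.COMM_WORLD` (or a sub-communicator) instead of the stand-in.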

class deisa.ray.comm.NoOpComm(rank: int = 0, world_size: int = 1)[source]

Bases: object

Fallback communicator that no-ops synchronization calls.

barrier() → None[source]

No-op barrier for single-process setups.
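The no-op communicator lets the same calling code run unchanged in a single process. A minimal sketch of the documented shape (this mirrors the signature above but is not the real source):

```python
from dataclasses import dataclass


@dataclass
class NoOpComm:
    """Sketch of a do-nothing communicator for single-process setups."""

    rank: int = 0
    world_size: int = 1

    def barrier(self) -> None:
        # No peers to synchronize with: return immediately.
        return None


comm = NoOpComm()
comm.barrier()  # returns instantly
```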

class deisa.ray.comm.TorchDistComm(*, rank: int, world_size: int)[source]

Bases: object

Torch distributed communicator implementing the Comm protocol.

barrier() → None[source]

Block until all Torch distributed ranks reach this barrier.

deisa.ray.comm.init_gloo_comm(world_size: int, rank: int, master_addr: str = '127.0.0.1', master_port: int = 29500, timeout_s: int = 120) → TorchDistComm[source]

Set up a Gloo communicator backed by a TCP store.

Parameters:
  • world_size (int) – Number of ranks participating in the communicator.

  • rank (int) – Rank ID of the current process.

  • master_addr (str, optional) – Hostname or IP address of the master rendezvous node. Defaults to "127.0.0.1".

  • master_port (int, optional) – Port of the master rendezvous node. Defaults to 29500.

  • timeout_s (int, optional) – Timeout (seconds) for rendezvous setup. Defaults to 120.

Returns:

Wrapper around the initialized PyTorch process group.

Return type:

TorchDistComm
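A typical call site picks the communicator based on world size, since a rendezvous is pointless with a single rank. The sketch below assumes the documented signatures; `make_comm` is a hypothetical helper, and an inline stand-in replaces `deisa.ray.comm.NoOpComm` so the snippet is self-contained (the multi-rank branch is shown only as a comment because it requires torch.distributed at runtime):

```python
class NoOpComm:
    """Minimal stand-in mirroring the documented NoOpComm defaults."""

    def __init__(self, rank: int = 0, world_size: int = 1):
        self.rank = rank
        self.world_size = world_size

    def barrier(self) -> None:
        pass


def make_comm(world_size: int, rank: int):
    """Hypothetical factory: fall back to a no-op comm for one rank."""
    if world_size <= 1:
        return NoOpComm(rank=rank, world_size=world_size)
    # With multiple ranks, real code would rendezvous via Gloo, e.g.:
    # return init_gloo_comm(world_size=world_size, rank=rank,
    #                       master_addr="10.0.0.1", master_port=29500)
    raise NotImplementedError("multi-rank path requires torch.distributed")


comm = make_comm(world_size=1, rank=0)
comm.barrier()
```

All ranks must call the factory with the same `world_size`, `master_addr`, and `master_port` for the rendezvous to succeed; `timeout_s` bounds how long rank 0 waits for the others to connect.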