deisa.ray.comm module

class deisa.ray.comm.Comm(*args, **kwargs)[source]

Bases: Protocol

barrier() → None[source]

Block until all ranks reach this barrier.

rank: int

world_size: int

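Since Comm is a Protocol, any class with a matching `rank`, `world_size`, and `barrier()` satisfies it structurally; no inheritance is needed. The following is a minimal sketch of that idea using `typing.Protocol` (the `SingleRank` class and the `runtime_checkable` decorator are illustrative assumptions, not part of the real module):

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class Comm(Protocol):
    """Structural communicator interface, mirroring the documented members."""

    rank: int
    world_size: int

    def barrier(self) -> None:
        """Block until all ranks reach this barrier."""
        ...


class SingleRank:
    """Hypothetical implementor: matches the protocol without subclassing it."""

    rank = 0
    world_size = 1

    def barrier(self) -> None:
        # Only one rank exists, so there is nothing to wait for.
        pass


# runtime_checkable allows isinstance checks against the protocol's members.
print(isinstance(SingleRank(), Comm))
```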
class deisa.ray.comm.MPICommAdapter(comm)[source]

Bases: object

Adapter exposing an MPI communicator via the shared Comm protocol.

barrier() → None[source]

Block until all MPI ranks reach this barrier.
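An adapter like this typically delegates each protocol call to the wrapped communicator. Below is a hedged sketch of that delegation pattern; the attribute names and the use of mpi4py-style methods (`Get_rank`, `Get_size`, `Barrier`) are assumptions about the wrapped object, and `_FakeMPIComm` is a stand-in so the snippet runs without MPI installed:

```python
class MPICommAdapter:
    """Sketch: expose an mpi4py-style communicator via the Comm protocol."""

    def __init__(self, comm):
        self._comm = comm
        # Cache rank/size so the adapter satisfies the protocol's attributes.
        self.rank = comm.Get_rank()
        self.world_size = comm.Get_size()

    def barrier(self) -> None:
        # Delegate to the underlying MPI barrier.
        self._comm.Barrier()


class _FakeMPIComm:
    """Hypothetical stand-in for mpi4py's MPI.COMM_WORLD, for demonstration."""

    def Get_rank(self):
        return 0

    def Get_size(self):
        return 1

    def Barrier(self):
        pass


adapter = MPICommAdapter(_FakeMPIComm())
adapter.barrier()
print(adapter.rank, adapter.world_size)
```

In real use the constructor would receive `mpi4py.MPI.COMM_WORLD` (or a sub-communicator) instead of the stand-in.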

class deisa.ray.comm.NoOpComm(rank: int = 0, world_size: int = 1)[source]

Bases: object

Fallback communicator that no-ops synchronization calls.

barrier() → None[source]

No-op barrier for single-process setups.
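The no-op communicator lets the same calling code run unchanged in a single process. A minimal sketch of the documented shape (this mirrors the signature above but is not the real source):

```python
from dataclasses import dataclass


@dataclass
class NoOpComm:
    """Sketch of a do-nothing communicator for single-process setups."""

    rank: int = 0
    world_size: int = 1

    def barrier(self) -> None:
        # No peers to synchronize with: return immediately.
        return None


comm = NoOpComm()
comm.barrier()  # returns instantly
```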

class deisa.ray.comm.TorchDistComm(*, rank: int, world_size: int)[source]

Bases: object

Torch distributed communicator implementing the Comm protocol.

barrier() → None[source]

Block until all Torch distributed ranks reach this barrier.

deisa.ray.comm.init_gloo_comm(world_size: int, rank: int, master_addr: str = '127.0.0.1', master_port: int = 29500, timeout_s: int = 120) → TorchDistComm[source]

Set up a Gloo communicator backed by a TCP store.

Parameters:
  • world_size (int) – Number of ranks participating in the communicator.

  • rank (int) – Rank ID of the current process.

  • master_addr (str, optional) – Hostname or IP address of the master rendezvous node. Defaults to "127.0.0.1".

  • master_port (int, optional) – Port of the master rendezvous node. Defaults to 29500.

  • timeout_s (int, optional) – Timeout (seconds) for rendezvous setup. Defaults to 120.

Returns:

Wrapper around the initialized PyTorch process group.

Return type:

TorchDistComm
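A typical call site picks the communicator based on world size, since a rendezvous is pointless with a single rank. The sketch below assumes the documented signatures; `make_comm` is a hypothetical helper, and an inline stand-in replaces `deisa.ray.comm.NoOpComm` so the snippet is self-contained (the multi-rank branch is shown only as a comment because it requires torch.distributed at runtime):

```python
class NoOpComm:
    """Minimal stand-in mirroring the documented NoOpComm defaults."""

    def __init__(self, rank: int = 0, world_size: int = 1):
        self.rank = rank
        self.world_size = world_size

    def barrier(self) -> None:
        pass


def make_comm(world_size: int, rank: int):
    """Hypothetical factory: fall back to a no-op comm for one rank."""
    if world_size <= 1:
        return NoOpComm(rank=rank, world_size=world_size)
    # With multiple ranks, real code would rendezvous via Gloo, e.g.:
    # return init_gloo_comm(world_size=world_size, rank=rank,
    #                       master_addr="10.0.0.1", master_port=29500)
    raise NotImplementedError("multi-rank path requires torch.distributed")


comm = make_comm(world_size=1, rank=0)
comm.barrier()
```

All ranks must call the factory with the same `world_size`, `master_addr`, and `master_port` for the rendezvous to succeed; `timeout_s` bounds how long rank 0 waits for the others to connect.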