Torch Distributed Environment Variables

#misc

As will become much clearer later, a few environment variables glue everything together: WORLD_SIZE, WORLD_RANK and LOCAL_RANK. Think of them as fancy names for “the total number of GPUs in your cluster”, “the ID of a GPU at the cluster level”, and “the ID of a GPU at the node level”. Strictly speaking, they count and identify processes rather than GPUs; the two coincide in the usual setup of one process per GPU. Note that torchrun exports the cluster-level rank as RANK rather than WORLD_RANK. As you might guess, these values identify the processes so that they can keep communicating with each other for the lifetime of your training job.
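To make the three values concrete, here is a minimal sketch of how a training script might read them. It assumes the job is launched with torchrun, which sets WORLD_SIZE, RANK and LOCAL_RANK for each process; the `DistEnv` type and the `read_dist_env` helper are hypothetical names used only for illustration, and the single-process defaults are an assumption so the script also runs without a launcher.

```python
import os
from typing import Mapping, NamedTuple, Optional


class DistEnv(NamedTuple):
    """Hypothetical container for the three launcher-provided values."""
    world_size: int  # total number of processes in the job
    rank: int        # ID of this process across the whole cluster
    local_rank: int  # ID of this process on its own node


def read_dist_env(env: Optional[Mapping[str, str]] = None) -> DistEnv:
    """Read the distributed settings from environment variables.

    Falls back to a single-process setup (world size 1, rank 0) when the
    variables are missing, e.g. when the script is run without a launcher.
    """
    if env is None:
        env = os.environ
    return DistEnv(
        world_size=int(env.get("WORLD_SIZE", "1")),
        rank=int(env.get("RANK", "0")),
        local_rank=int(env.get("LOCAL_RANK", "0")),
    )
```

For example, on a hypothetical cluster of 2 nodes with 4 GPUs each, a process might see WORLD_SIZE=8, RANK=5 and LOCAL_RANK=1, and `read_dist_env()` would return `DistEnv(world_size=8, rank=5, local_rank=1)`. The local rank is typically what you would use to pick the GPU on the current node, while the global rank identifies the process to its peers.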