public class NvidiaGPUPluginForRuntimeV2 extends Object implements DevicePlugin, DevicePluginScheduler
| Modifier and Type | Class and Description | 
|---|---|
| static class  | NvidiaGPUPluginForRuntimeV2.DeviceLinkType: Different types of link. | 
| class  | NvidiaGPUPluginForRuntimeV2.NvidiaCommandExecutor: A shell wrapper class that makes testing easier. | 
| Modifier and Type | Field and Description | 
|---|---|
| static org.slf4j.Logger | LOG | 
| static String | NV_RESOURCE_NAME | 
| static String | TOPOLOGY_POLICY_ENV_KEY: The container can set this environment variable. | 
| static String | TOPOLOGY_POLICY_PACK: Schedule policy that prefers faster GPU-GPU communication. | 
| static String | TOPOLOGY_POLICY_SPREAD: Schedule policy that prefers faster CPU-GPU communication. | 
| Constructor and Description | 
|---|
| NvidiaGPUPluginForRuntimeV2() | 
| Modifier and Type | Method and Description | 
|---|---|
| Set<Device> | allocateDevices(Set<Device> availableDevices, int count, Map<String,String> envs): Called when allocating devices. | 
| void | basicSchedule(Set<Device> allocation, int count, Set<Device> availableDevices) | 
| int | computeCostOfDevices(Device[] devices): The cost function used to calculate the cost of a subset of devices. | 
| Map<Integer,List<Map.Entry<Set<Device>,Integer>>> | getCostTable() | 
| Map<String,Integer> | getDevicePairToWeight() | 
| Set<Device> | getDevices(): Called when updating node resources. | 
| DeviceRegisterRequest | getRegisterRequestInfo(): Called first when the device plugin framework wants to register. | 
| void | initCostTable() | 
| boolean | isTopoInitialized() | 
| DeviceRuntimeSpec | onDevicesAllocated(Set<Device> allocatedDevices, YarnRuntimeType yarnRuntime): Asks how these devices should be prepared or used before or at container launch. | 
| void | onDevicesReleased(Set<Device> releasedDevices): Called after devices are released. | 
| void | parseTopo(String topo, Map<String,Integer> deviceLinkToWeight): A typical sample topo output:
     GPU0  GPU1  GPU2  GPU3  CPU Affinity
 GPU0  X  PHB  SOC  SOC  0-31
 GPU1 PHB  X   SOC  SOC  0-31
 GPU2 SOC SOC  X    PHB  0-31
 GPU3 SOC SOC  PHB   X   0-31
 Legend:
   X   = Self
   SOC  = Connection traversing PCIe as well as the SMP link between
   CPU sockets(e.g. | 
| void | setPathOfGpuBinary(String pOfGpuBinary) | 
| void | setShellExecutor(NvidiaGPUPluginForRuntimeV2.NvidiaCommandExecutor shellExecutor) | 
| void | topologyAwareSchedule(Set<Device> allocation, int count, Map<String,String> envs, Set<Device> availableDevices, Map<Integer,List<Map.Entry<Set<Device>,Integer>>> cTable): Topology-aware scheduling algorithm. | 
public static final org.slf4j.Logger LOG
public static final String NV_RESOURCE_NAME
public static final String TOPOLOGY_POLICY_ENV_KEY
public static final String TOPOLOGY_POLICY_PACK
public static final String TOPOLOGY_POLICY_SPREAD
public DeviceRegisterRequest getRegisterRequestInfo() throws Exception
Description copied from interface: DevicePlugin
Called first when the device plugin framework wants to register.
Specified by: getRegisterRequestInfo in interface DevicePlugin
Returns: DeviceRegisterRequest
Throws: Exception

public Set<Device> getDevices() throws Exception
Description copied from interface: DevicePlugin
Called when updating node resources.
Specified by: getDevices in interface DevicePlugin
Returns: a set of Device; TreeSet recommended
Throws: Exception

public DeviceRuntimeSpec onDevicesAllocated(Set<Device> allocatedDevices, YarnRuntimeType yarnRuntime) throws Exception
Description copied from interface: DevicePlugin
Asks how these devices should be prepared or used before or at container launch. The returned spec can include, for example, a VolumeSpec to let the framework create the volume before running the container.
Specified by: onDevicesAllocated in interface DevicePlugin
Parameters:
allocatedDevices - A set of allocated Device.
yarnRuntime - Indicates which runtime YARN will use. Could be RUNTIME_DEFAULT or RUNTIME_DOCKER in the DeviceRuntimeSpec constants. The default means YARN's non-Docker container runtime is used; the Docker value means YARN's Docker container runtime is used.
Returns: a DeviceRuntimeSpec description of the environment, VolumeSpec, MountVolumeSpec, etc.
Throws: Exception

public void onDevicesReleased(Set<Device> releasedDevices) throws Exception
Description copied from interface: DevicePlugin
Called after devices are released.
Specified by: onDevicesReleased in interface DevicePlugin
Parameters:
releasedDevices - A set of released devices.
Throws: Exception

public Set<Device> allocateDevices(Set<Device> availableDevices, int count, Map<String,String> envs)
Description copied from interface: DevicePluginScheduler
Called when allocating devices.
Specified by: allocateDevices in interface DevicePluginScheduler
Parameters:
availableDevices - Devices allowed to be chosen from.
count - Number of devices to be allocated.
envs - Environment variables of the container.
Returns: the set of Device allocated

@VisibleForTesting public void initCostTable() throws IOException
Throws: IOException

@VisibleForTesting public int computeCostOfDevices(Device[] devices)
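
The page does not spell out how the cost of a device subset is computed. A minimal sketch of one plausible cost function is shown here, assuming the cost is the sum of the link weights over every pair in the subset. `CostSketch`, `costOf`, `sampleWeights`, the `"0-1"` pair-key format, and the weights 1 and 4 are all hypothetical and are not taken from the plugin.

```java
import java.util.HashMap;
import java.util.Map;

class CostSketch {
    // Hypothetical pair weights for a 4-GPU node: GPUs 0/1 and 2/3 share a
    // cheap link (weight 1); every other pair crosses a costly link (weight 4).
    static Map<String, Integer> sampleWeights() {
        Map<String, Integer> w = new HashMap<>();
        w.put("0-1", 1);
        w.put("2-3", 1);
        w.put("0-2", 4);
        w.put("0-3", 4);
        w.put("1-2", 4);
        w.put("1-3", 4);
        return w;
    }

    // Assumed cost model: add up the weight of every unordered pair in the subset.
    static int costOf(int[] gpuIndexes, Map<String, Integer> pairToWeight) {
        int cost = 0;
        for (int i = 0; i < gpuIndexes.length; i++) {
            for (int j = i + 1; j < gpuIndexes.length; j++) {
                int a = Math.min(gpuIndexes[i], gpuIndexes[j]);
                int b = Math.max(gpuIndexes[i], gpuIndexes[j]);
                Integer weight = pairToWeight.get(a + "-" + b);
                if (weight == null) {
                    throw new IllegalArgumentException("No weight for pair " + a + "-" + b);
                }
                cost += weight;
            }
        }
        return cost;
    }

    public static void main(String[] args) {
        Map<String, Integer> w = sampleWeights();
        System.out.println(costOf(new int[] {0, 1}, w));    // 1
        System.out.println(costOf(new int[] {0, 1, 2}, w)); // 1 + 4 + 4 = 9
    }
}
```

Under this model a single device costs 0, and a lower cost means the chosen GPUs are more tightly connected to each other.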
@VisibleForTesting public void topologyAwareSchedule(Set<Device> allocation, int count, Map<String,String> envs, Set<Device> availableDevices, Map<Integer,List<Map.Entry<Set<Device>,Integer>>> cTable)
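
To illustrate how the two topology policies could differ, the sketch below enumerates every group of `count` GPUs and picks the cheapest group for a pack-style policy and the most expensive group for a spread-style policy. This is only one reading of the policy descriptions on this page, not the plugin's algorithm; `TopoScheduleSketch`, `pick`, the boolean `pack` flag, the `"0-1"` pair keys, and the weights are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TopoScheduleSketch {
    // Hypothetical weights: pairs 0-1 and 2-3 are cheap, all others are costly.
    static Map<String, Integer> sampleWeights() {
        Map<String, Integer> w = new HashMap<>();
        w.put("0-1", 1);
        w.put("2-3", 1);
        w.put("0-2", 4);
        w.put("0-3", 4);
        w.put("1-2", 4);
        w.put("1-3", 4);
        return w;
    }

    // Sum of pair weights inside one candidate group (assumes every pair has a weight).
    static int cost(List<Integer> gpus, Map<String, Integer> w) {
        int total = 0;
        for (int i = 0; i < gpus.size(); i++) {
            for (int j = i + 1; j < gpus.size(); j++) {
                int a = Math.min(gpus.get(i), gpus.get(j));
                int b = Math.max(gpus.get(i), gpus.get(j));
                total += w.get(a + "-" + b);
            }
        }
        return total;
    }

    // Collect every combination of `count` elements from `pool`, in index order.
    static void combos(List<Integer> pool, int start, int count,
                       List<Integer> current, List<List<Integer>> out) {
        if (current.size() == count) {
            out.add(new ArrayList<>(current));
            return;
        }
        for (int i = start; i < pool.size(); i++) {
            current.add(pool.get(i));
            combos(pool, i + 1, count, current, out);
            current.remove(current.size() - 1); // remove by index (backtrack)
        }
    }

    // pack == true: lowest-cost group; pack == false: highest-cost group.
    // Returns null when fewer than `count` devices are available.
    static List<Integer> pick(List<Integer> available, int count, boolean pack,
                              Map<String, Integer> w) {
        List<List<Integer>> all = new ArrayList<>();
        combos(available, 0, count, new ArrayList<>(), all);
        List<Integer> best = null;
        int bestCost = 0;
        for (List<Integer> candidate : all) {
            int c = cost(candidate, w);
            if (best == null || (pack ? c < bestCost : c > bestCost)) {
                best = candidate;
                bestCost = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<Integer> available = Arrays.asList(0, 1, 2, 3);
        System.out.println(pick(available, 2, true, sampleWeights()));  // [0, 1]
        System.out.println(pick(available, 2, false, sampleWeights())); // [0, 2]
    }
}
```

Exhaustive enumeration is fine for the handful of GPUs on one node. The `getCostTable()` return type, keyed by an Integer and holding device sets paired with an Integer, suggests the plugin precomputes costs instead, but that is an inference from the signature only.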
@VisibleForTesting public void basicSchedule(Set<Device> allocation, int count, Set<Device> availableDevices)
public void parseTopo(String topo, Map<String,Integer> deviceLinkToWeight)
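
The matrix format shown in the parseTopo summary can be read row by row. The following is a minimal sketch rather than the plugin's implementation: `TopoParseSketch`, `parsePairLinks`, and the `"0-1"` pair-key format are hypothetical. It assumes the first non-empty line is the header and that each GPU row lists one link label per GPU column.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class TopoParseSketch {
    // Sample matrix copied from the parseTopo summary (legend shortened).
    static final String SAMPLE =
        "     GPU0  GPU1  GPU2  GPU3  CPU Affinity\n"
        + "GPU0  X  PHB  SOC  SOC  0-31\n"
        + "GPU1 PHB  X   SOC  SOC  0-31\n"
        + "GPU2 SOC SOC  X    PHB  0-31\n"
        + "GPU3 SOC SOC  PHB   X   0-31\n"
        + "Legend:\n"
        + "  X   = Self\n";

    // Maps each unordered GPU pair (e.g. "0-1") to its link label (e.g. "PHB").
    static Map<String, String> parsePairLinks(String topo) {
        Map<String, String> pairToLink = new LinkedHashMap<>();
        int gpuCount = -1; // unknown until the header row is seen
        for (String raw : topo.split("\n")) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue;
            }
            String[] cells = line.split("\\s+");
            if (gpuCount < 0) {
                // Header row: count the GPU<n> column titles.
                gpuCount = 0;
                for (String cell : cells) {
                    if (cell.matches("GPU\\d+")) {
                        gpuCount++;
                    }
                }
                continue;
            }
            // Skip legend and any other non-GPU rows.
            if (!cells[0].matches("GPU\\d+") || cells.length < gpuCount + 1) {
                continue;
            }
            int row = Integer.parseInt(cells[0].substring(3));
            // The matrix is symmetric, so only the upper triangle is read.
            for (int col = row + 1; col < gpuCount; col++) {
                pairToLink.put(row + "-" + col, cells[col + 1]);
            }
        }
        return pairToLink;
    }

    public static void main(String[] args) {
        // Prints {0-1=PHB, 0-2=SOC, 0-3=SOC, 1-2=SOC, 1-3=SOC, 2-3=PHB}
        System.out.println(parsePairLinks(SAMPLE));
    }
}
```

A map from link label to weight, like the `deviceLinkToWeight` parameter, could then turn each label into a number, giving a pair-to-weight map of the shape `getDevicePairToWeight()` returns.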
@VisibleForTesting public void setPathOfGpuBinary(String pOfGpuBinary)
@VisibleForTesting public void setShellExecutor(NvidiaGPUPluginForRuntimeV2.NvidiaCommandExecutor shellExecutor)
@VisibleForTesting public boolean isTopoInitialized()
@VisibleForTesting public Map<Integer,List<Map.Entry<Set<Device>,Integer>>> getCostTable()
Copyright © 2008–2024 Apache Software Foundation. All rights reserved.