kongen kongen
首页
导航站
  • 学习教程

    • Opencv教程
    • C++基础教程
    • C++_Primer教程
    • CUDA编程
  • Opencv
  • CNN
  • 技术文档
  • GitHub技巧
  • Nodejs
  • 博客搭建
  • 面试题库

    • HTML
    • CSS
    • jQuery
    • Vue
    • 零碎
  • 面试心得

    • 杂言碎语
  • 十架道路

    • 十架七言系列
    • 基督徒生活观
    • 上帝的蓝图
  • 摘抄收录

    • ☆ 励志鸡汤
    • ❀ 人间烟火
  • 读书笔记

    • 《小狗钱钱》
    • 《穷爸爸富爸爸》
    • 《聪明人使用方格笔记本》
  • 学习
  • 面试
  • 心情杂货
  • 友情链接
关于
  • 网站
  • 资源
  • Vue资源
  • 分类
  • 标签
  • 归档
GitHub (opens new window)

Kongen

你好呀(✪ω✪)
首页
导航站
  • 学习教程

    • Opencv教程
    • C++基础教程
    • C++_Primer教程
    • CUDA编程
  • Opencv
  • CNN
  • 技术文档
  • GitHub技巧
  • Nodejs
  • 博客搭建
  • 面试题库

    • HTML
    • CSS
    • jQuery
    • Vue
    • 零碎
  • 面试心得

    • 杂言碎语
  • 十架道路

    • 十架七言系列
    • 基督徒生活观
    • 上帝的蓝图
  • 摘抄收录

    • ☆ 励志鸡汤
    • ❀ 人间烟火
  • 读书笔记

    • 《小狗钱钱》
    • 《穷爸爸富爸爸》
    • 《聪明人使用方格笔记本》
  • 学习
  • 面试
  • 心情杂货
  • 友情链接
关于
  • 网站
  • 资源
  • Vue资源
  • 分类
  • 标签
  • 归档
GitHub (opens new window)
  • 第一章-CUDA简介
  • 第二章CUDA编程模型概述
  • 第三章编程接口
  • 第四章硬件实现
  • 第五章性能指南
  • 附录A支持CUDA的设备列表
  • 附录B对C++语言扩展的详细描述
  • 附录C协作组
  • 附录D-CUDA动态并行
  • 附录E虚拟内存管理
  • 附录F流序内存分配
  • 附录G图内存结点
  • 附录H数学方法
  • 附录I_C++语言支持
  • 附录J纹理获取
  • 附录K_CUDA计算能力
  • 附录L_CUDA底层驱动API
  • 附录M_CUDA环境变量
  • 附录N_CUDA的统一内存
  • Readme
  • CUDA编程教程
kongen
2025-02-08

附录M_CUDA环境变量

# 附录M_CUDA环境变量

下表列出了 CUDA 环境变量。 与多进程服务相关的环境变量记录在 GPU 部署和管理指南的多进程服务部分。

Table 18. CUDA Environment Variables
Variable Values Description
Device Enumeration and Properties
CUDA_VISIBLE_DEVICES A comma-separated sequence of GPU identifiers

MIG support: MIG-<GPU-UUID>/<GPU instance ID>/<compute instance ID>
GPU identifiers are given as integer indices or as UUID strings. GPU UUID strings should follow the same format as given by nvidia-smi, such as GPU-8932f937-d72c-4106-c12f-20bd9faed9f6. However, for convenience, abbreviated forms are allowed; simply specify enough digits from the beginning of the GPU UUID to uniquely identify that GPU in the target system. For example, CUDA_VISIBLE_DEVICES=GPU-8932f937 may be a valid way to refer to the above GPU UUID, assuming no other GPU in the system shares this prefix.

Only the devices whose index is present in the sequence are visible to CUDA applications and they are enumerated in the order of the sequence. If one of the indices is invalid, only the devices whose index precedes the invalid index are visible to CUDA applications. For example, setting CUDA_VISIBLE_DEVICES to 2,1 causes device 0 to be invisible and device 2 to be enumerated before device 1. Setting CUDA_VISIBLE_DEVICES to 0,2,-1,1 causes devices 0 and 2 to be visible and device 1 to be invisible.

MIG format starts with MIG keyword and GPU UUID should follow the same format as given by nvidia-smi. For example, MIG-GPU-8932f937-d72c-4106-c12f-20bd9faed9f6/1/2. Only single MIG instance enumeration is supported.
CUDA_MANAGED_FORCE_DEVICE_ALLOC 0 or 1 (default is 0) Forces the driver to place all managed allocations in device memory.
CUDA_DEVICE_ORDER FASTEST_FIRST, PCI_BUS_ID, (default is FASTEST_FIRST) FASTEST_FIRST causes CUDA to enumerate the available devices in fastest to slowest order using a simple heuristic. PCI_BUS_ID orders devices by PCI bus ID in ascending order.
Compilation
CUDA_CACHE_DISABLE 0 or 1 (default is 0) Disables caching (when set to 1) or enables caching (when set to 0) for just-in-time-compilation. When disabled, no binary code is added to or retrieved from the cache.
CUDA_CACHE_PATH filepath Specifies the folder where the just-in-time compiler caches binary codes; the default values are:
  • on Windows, %APPDATA%\NVIDIA\ComputeCache
  • on Linux, ~/.nv/ComputeCache
CUDA_CACHE_MAXSIZE integer (default is 268435456 (256 MiB) and maximum is 4294967296 (4 GiB)) Specifies the size in bytes of the cache used by the just-in-time compiler. Binary codes whose size exceeds the cache size are not cached. Older binary codes are evicted from the cache to make room for newer binary codes if needed.
CUDA_FORCE_PTX_JIT 0 or 1 (default is 0) When set to 1, forces the device driver to ignore any binary code embedded in an application (see Application Compatibility) and to just-in-time compile embedded PTX code instead. If a kernel does not have embedded PTX code, it will fail to load. This environment variable can be used to validate that PTX code is embedded in an application and that its just-in-time compilation works as expected to guarantee application forward compatibility with future architectures (see Just-in-Time Compilation).
CUDA_DISABLE_PTX_JIT 0 or 1 (default is 0) When set to 1, disables the just-in-time compilation of embedded PTX code and use the compatible binary code embedded in an application (see Application Compatibility). If a kernel does not have embedded binary code or the embedded binary was compiled for an incompatible architecture, then it will fail to load. This environment variable can be used to validate that an application has the compatible SASS code generated for each kernel.(see Binary Compatibility).
Execution
CUDA_LAUNCH_BLOCKING 0 or 1 (default is 0) Disables (when set to 1) or enables (when set to 0) asynchronous kernel launches.
CUDA_DEVICE_MAX_CONNECTIONS 1 to 32 (default is 8) Sets the number of compute and copy engine concurrent connections (work queues) from the host to each device of compute capability 3.5 and above.
CUDA_AUTO_BOOST 0 or 1 Overrides the autoboost behavior set by the --auto-boost-default option of nvidia-smi. If an application requests via this environment variable a behavior that is different from nvidia-smi's, its request is honored if there is no other application currently running on the same GPU that successfully requested a different behavior, otherwise it is ignored.
cuda-gdb (on Linux platform)
CUDA_DEVICE_WAITS_ON_EXCEPTION 0 or 1 (default is 0) When set to 1, a CUDA application will halt when a device exception occurs, allowing a debugger to be attached for further debugging.
MPS service (on Linux platform)
CUDA_DEVICE_DEFAULT_PERSISTING_L2_CACHE_PERCENTAGE_LIMIT Percentage value (between 0 - 100, default is 0) Devices of compute capability 8.x allow, a portion of L2 cache to be set-aside for persisting data accesses to global memory. When using CUDA MPS service, the set-aside size can only be controlled using this environment variable, before starting CUDA MPS control daemon. I.e., the environment variable should be set before running the command nvidia-cuda-mps-control -d.

编辑 (opens new window)
附录L_CUDA底层驱动API
附录N_CUDA的统一内存

← 附录L_CUDA底层驱动API 附录N_CUDA的统一内存→

最近更新
01
附录L_CUDA底层驱动API
02-08
02
附录K_CUDA计算能力
02-08
03
附录J纹理获取
02-08
更多文章>
Theme by Vdoing | Copyright © 2024-2025 Kongen | MIT License
  • 跟随系统
  • 浅色模式
  • 深色模式
  • 阅读模式
×