{"id":1161315,"date":"2025-01-13T19:15:07","date_gmt":"2025-01-13T11:15:07","guid":{"rendered":"https:\/\/docs.pingcode.com\/ask\/ask-ask\/1161315.html"},"modified":"2025-01-13T19:15:09","modified_gmt":"2025-01-13T11:15:09","slug":"%e5%a6%82%e4%bd%95%e7%94%a8gpu%e8%b7%91python%e7%a8%8b%e5%ba%8f","status":"publish","type":"post","link":"https:\/\/docs.pingcode.com\/ask\/1161315.html","title":{"rendered":"How to Run Python Programs on a GPU"},"content":{"rendered":"<p style=\"text-align:center;\" ><img decoding=\"async\" src=\"https:\/\/cdn-kb.worktile.com\/kb\/wp-content\/uploads\/2024\/04\/25202605\/505d6436-8fd4-4569-99ba-8175036e86ec.webp\" alt=\"How to run Python programs on a GPU\" \/><\/p>\n<p><p> <strong>How to run Python programs on a GPU: use a deep learning library, set up the CUDA environment, accelerate parallel computation on the GPU, and optimize code performance<\/strong><\/p>\n<\/p>\n<p><p>Running Python programs on a GPU can significantly improve computational performance, especially for deep learning and large-scale numerical workloads. The key steps are <strong>using a deep learning library<\/strong> such as TensorFlow or PyTorch, <strong>setting up the CUDA environment<\/strong>, <strong>accelerating parallel computation on the GPU<\/strong>, and <strong>optimizing code performance<\/strong>. This article covers how to configure and use a GPU for Python programs, with particular attention to setting up CUDA.<\/p>\n<\/p>\n<p><h3>I. Set Up the CUDA Environment<\/h3>\n<\/p>\n<p><p>CUDA is NVIDIA's parallel computing platform, which lets developers use NVIDIA GPUs for general-purpose computing. Setting up the CUDA environment is the first step toward GPU-accelerated Python.<\/p>\n<\/p>\n<ol>\n<li>\n<p><strong>Install the CUDA Toolkit<\/strong>: Download and install, from NVIDIA's official website, an NVIDIA driver and a CUDA Toolkit version compatible with your GPU and operating system. The default installer options are usually fine. cuDNN, NVIDIA's library of GPU-accelerated deep learning primitives, is a separate download; install the version that matches your CUDA release. Note that the pip packages of PyTorch (and TensorFlow's <code>and-cuda<\/code> extra on Linux) bundle the CUDA runtime libraries they need, so for those frameworks a recent NVIDIA driver is often the only system-level requirement.<\/p>\n<\/p>\n<\/li>\n<li>\n<p><strong>Set environment variables<\/strong>: After installation, add the CUDA paths to your system's environment variables. On Windows the installer normally does this; you can check or edit the values under System Properties -&gt; Advanced system settings -&gt; Environment Variables. On Linux, add the following lines to <code>~\/.bashrc<\/code> or <code>~\/.bash_profile<\/code>:<\/p>\n<\/p>\n<p><pre><code class=\"language-shell\">export PATH=\/usr\/local\/cuda\/bin:$PATH<\/p>\n<p>export LD_LIBRARY_PATH=\/usr\/local\/cuda\/lib64:$LD_LIBRARY_PATH<\/p>\n<p><\/code><\/pre>\n<\/p>\n<\/li>\n<li>\n<p><strong>Verify the installation<\/strong>: Open a terminal and run <code>nvcc --version<\/code>. If it prints CUDA version information, the toolkit is installed. You can also run <code>nvidia-smi<\/code> to see the driver version and current GPU utilization.<\/p>\n<\/p>\n<\/li>\n<\/ol>\n<p><h3>II. Use a Deep Learning Library<\/h3>\n<\/p>\n<p><p>Deep learning libraries such as TensorFlow and PyTorch provide simple ways to run computations on the GPU. The basic usage of each is shown below.<\/p>\n<\/p>\n<p><h4>1. TensorFlow<\/h4>\n<\/p>\n<p><p>TensorFlow is an open-source deep learning framework developed by Google and widely used in <a href=\"https:\/\/docs.pingcode.com\/ask\/59192.html\" target=\"_blank\">machine learning<\/a> research and production. To run TensorFlow programs on a GPU, make sure you have installed a GPU-enabled build of TensorFlow.<\/p>\n<\/p>\n<ol>\n<li>\n<p><strong>Install TensorFlow with GPU support<\/strong>: The separate <code>tensorflow-gpu<\/code> package is deprecated; since TensorFlow 2.1 the standard <code>tensorflow<\/code> package includes GPU support. On Linux, the <code>and-cuda<\/code> extra also installs matching CUDA and cuDNN libraries. On native Windows, TensorFlow 2.10 was the last release with GPU support; newer versions require WSL2.<\/p>\n<\/p>\n<p><pre><code class=\"language-shell\">pip install tensorflow[and-cuda]<\/p>\n<p><\/code><\/pre>\n<\/p>\n<\/li>\n<li>\n<p><strong>List the available GPU devices<\/strong>:<\/p>\n<\/p>\n<p><pre><code class=\"language-python\">import tensorflow as tf<\/p>\n<p>print(&quot;Num GPUs Available: &quot;, len(tf.config.list_physical_devices(&#39;GPU&#39;)))<\/p>\n<p><\/code><\/pre>\n<\/p>\n<\/li>\n<li>\n<p><strong>Simple example<\/strong>:<\/p>\n<\/p>\n<p><pre><code class=\"language-python\">import tensorflow as tf<\/p>\n<p># Create two constant tensors, add them, and print the result<\/p>\n<p>a = tf.constant(2.0)<\/p>\n<p>b = tf.constant(3.0)<\/p>\n<p>c = a + 
b<\/p>\n<p>print(c)<\/p>\n<p><\/code><\/pre>\n<\/p>\n<\/li>\n<\/ol>\n<p><p>TensorFlow automatically detects available GPUs and, by default, places operations on the first GPU. To pin computation to a specific GPU, use the <code>tf.device<\/code> context manager, for example <code>with tf.device(&#39;\/GPU:1&#39;):<\/code>.<\/p>\n<\/p>\n<p><h4>2. PyTorch<\/h4>\n<\/p>\n<p><p>PyTorch is an open-source deep learning framework originally developed by Facebook (now Meta), popular for its dynamic computation graphs and ease of use.<\/p>\n<\/p>\n<ol>\n<li>\n<p><strong>Install PyTorch with GPU support<\/strong>: On Linux the default pip wheel includes CUDA support, while on Windows the default wheel is CPU-only. The selector on the official PyTorch website generates the exact install command for your OS and CUDA version.<\/p>\n<\/p>\n<p><pre><code class=\"language-shell\">pip install torch<\/p>\n<p><\/code><\/pre>\n<\/p>\n<\/li>\n<li>\n<p><strong>Check the available GPU devices<\/strong>:<\/p>\n<\/p>\n<p><pre><code class=\"language-python\">import torch<\/p>\n<p>print(&quot;Is CUDA available: &quot;, torch.cuda.is_available())<\/p>\n<p>print(&quot;CUDA device count: &quot;, torch.cuda.device_count())<\/p>\n<p><\/code><\/pre>\n<\/p>\n<\/li>\n<li>\n<p><strong>Simple example<\/strong>:<\/p>\n<\/p>\n<p><pre><code class=\"language-python\">import torch<\/p>\n<p># Create tensors and move them to the GPU (requires a CUDA-enabled build)<\/p>\n<p>x = torch.tensor([1.0, 2.0, 3.0]).cuda()<\/p>\n<p>y = torch.tensor([4.0, 5.0, 6.0]).cuda()<\/p>\n<p>z = x + 
y<\/p>\n<p>print(z)<\/p>\n<p><\/code><\/pre>\n<\/p>\n<\/li>\n<\/ol>\n<p><p>PyTorch's <code>cuda()<\/code> method (or the more general <code>to(&#39;cuda&#39;)<\/code>) moves tensors and models to the GPU. All tensors involved in an operation must generally be on the same device; mixing CPU and GPU tensors raises an error.<\/p>\n<\/p>\n<p><h3>III. Accelerate Parallel Computation on the GPU<\/h3>\n<\/p>\n<p><p>Many scientific computing and data processing tasks run much faster when their parallel parts are executed on a GPU. The following libraries are commonly used for this.<\/p>\n<\/p>\n<p><h4>1. CuPy<\/h4>\n<\/p>\n<p><p>CuPy is a GPU-accelerated array library for Python whose API is largely compatible with NumPy, which makes it straightforward to run NumPy-style operations on the GPU.<\/p>\n<\/p>\n<ol>\n<li>\n<p><strong>Install CuPy<\/strong>: Install the prebuilt wheel that matches your CUDA major version (<code>cupy-cuda12x<\/code> for CUDA 12.x, <code>cupy-cuda11x<\/code> for CUDA 11.x). The plain <code>cupy<\/code> package builds from source, which requires the CUDA Toolkit and takes a long time.<\/p>\n<\/p>\n<p><pre><code class=\"language-shell\">pip install cupy-cuda12x<\/p>\n<p><\/code><\/pre>\n<\/p>\n<\/li>\n<li>\n<p><strong>Simple example<\/strong>:<\/p>\n<\/p>\n<p><pre><code class=\"language-python\">import cupy as cp<\/p>\n<p># Create arrays directly in GPU memory<\/p>\n<p>x = cp.array([1.0, 2.0, 3.0])<\/p>\n<p>y = cp.array([4.0, 5.0, 6.0])<\/p>\n<p>z = x + y<\/p>\n<p>print(z)<\/p>\n<p><\/code><\/pre>\n<\/p>\n<\/li>\n<\/ol>\n<p><p>Because CuPy's API closely mirrors NumPy's, existing NumPy code can often be ported to the GPU largely by replacing <code>np<\/code> with <code>cp<\/code>. Use <code>cp.asnumpy()<\/code> to copy a result back to host memory as a NumPy array.<\/p>\n<\/p>\n<p><h4>2. 
Numba<\/h4>\n<\/p>\n<p><p>Numba is a just-in-time (JIT) compiler that translates Python functions into machine code and can also compile kernels for NVIDIA GPUs through CUDA.<\/p>\n<\/p>\n<ol>\n<li>\n<p><strong>Install Numba<\/strong>:<\/p>\n<\/p>\n<p><pre><code class=\"language-shell\">pip install numba<\/p>\n<p><\/code><\/pre>\n<\/p>\n<\/li>\n<li>\n<p><strong>Simple example<\/strong>:<\/p>\n<\/p>\n<p><pre><code class=\"language-python\">from numba import cuda<\/p>\n<p>import numpy as np<\/p>\n<p># Define a CUDA kernel: each thread adds one pair of elements<\/p>\n<p>@cuda.jit<\/p>\n<p>def add_kernel(x, y, out):<\/p>\n<p>    idx = cuda.grid(1)  # global thread index<\/p>\n<p>    if idx &lt; x.size:  # guard against threads past the end of the array<\/p>\n<p>        out[idx] = x[idx] + y[idx]<\/p>\n<p># Host code<\/p>\n<p>N = 1000000<\/p>\n<p>x = np.arange(N, dtype=np.float32)<\/p>\n<p>y = np.arange(N, dtype=np.float32)<\/p>\n<p>out = np.zeros_like(x)<\/p>\n<p># Copy the inputs to the device and allocate the output there<\/p>\n<p>d_x = cuda.to_device(x)<\/p>\n<p>d_y = cuda.to_device(y)<\/p>\n<p>d_out = cuda.device_array_like(x)<\/p>\n<p># Launch the kernel with enough blocks to cover all N elements<\/p>\n<p>threads_per_block = 256<\/p>\n<p>blocks_per_grid = (x.size + (threads_per_block - 1)) \/\/ threads_per_block<\/p>\n<p>add_kernel[blocks_per_grid, threads_per_block](d_x, d_y, d_out)<\/p>\n<p># Copy the result back to the host<\/p>\n<p>d_out.copy_to_host(out)<\/p>\n<p>print(out)<\/p>\n<p><\/code><\/pre>\n<\/p>\n<\/li>\n<\/ol>\n<p><p>Numba lets you write custom CUDA kernels in Python, which suits parallel algorithms that ready-made array operations do not cover.<\/p>\n<\/p>\n<p><h3>IV. Optimize Code Performance<\/h3>\n<\/p>\n<p><p>To get full performance out of a GPU, the surrounding code usually needs tuning as well. Some common techniques follow.<\/p>\n<\/p>\n<p><h4>1. Data Preprocessing and Loading<\/h4>\n<\/p>\n<p><p>In deep learning, data preprocessing and loading are frequently the performance bottleneck. Loading and preprocessing data in parallel, and preparing the next batch while the GPU works on the current one, can significantly improve training throughput.<\/p>\n<\/p>\n<ol>\n<li>\n<p><strong>Data loading in TensorFlow<\/strong>:<\/p>\n<\/p>\n<p><pre><code class=\"language-python\">import tensorflow as tf<\/p>\n<p>filenames = [&quot;train.tfrecord&quot;]  # hypothetical path: replace with your TFRecord files<\/p>\n<p>batch_size = 32<\/p>\n<p># Hypothetical feature spec: adapt it to the features stored in your records<\/p>\n<p>feature_spec = {&quot;x&quot;: tf.io.FixedLenFeature([], tf.float32), &quot;y&quot;: tf.io.FixedLenFeature([], tf.int64)}<\/p>\n<p>def parse_function(proto):<\/p>\n<p>    # Decode one serialized tf.train.Example into a dict of tensors<\/p>\n<p>    return tf.io.parse_single_example(proto, feature_spec)<\/p>\n<p>dataset = tf.data.TFRecordDataset(filenames)<\/p>\n<p>dataset = dataset.map(parse_function, num_parallel_calls=tf.data.AUTOTUNE)  # parse in parallel<\/p>\n<p>dataset = dataset.batch(batch_size)<\/p>\n<p>dataset = dataset.prefetch(buffer_size=tf.data.AUTOTUNE)  # overlap loading with training<\/p>\n<p><\/code><\/pre>\n<\/p>\n<\/li>\n<li>\n<p><strong>Data loading in PyTorch<\/strong>:<\/p>\n<\/p>\n<p><pre><code class=\"language-python\">from torch.utils.data import DataLoader, Dataset<\/p>\n<p>class CustomDataset(Dataset):<\/p>\n<p>    def __init__(self, data):<\/p>\n<p>        self.data = data<\/p>\n<p>    def __len__(self):<\/p>\n<p>        return len(self.data)<\/p>\n<p>    def __getitem__(self, idx):<\/p>\n<p>        return self.data[idx]<\/p>\n<p>data = list(range(1000))  # placeholder samples: replace with your own data<\/p>\n<p>batch_size = 32<\/p>\n<p>dataset = CustomDataset(data)<\/p>\n<p># num_workers loads batches in background processes; pin_memory speeds up host-to-GPU copies<\/p>\n<p>dataloader = DataLoader(dataset, batch_size=batch_size, shuffle=True, num_workers=4, pin_memory=True)<\/p>\n<p><\/code><\/pre>\n<\/p>\n<\/li>\n<\/ol>\n<p><p>Asynchronous, parallel data loading overlaps data preparation with GPU computation, so the GPU does not sit idle waiting for the next batch.<\/p>\n<\/p>\n<p><h4>2. Mixed-Precision Training<\/h4>\n<\/p>\n<p><p>Mixed-precision training speeds up deep learning by performing most computations in half precision (FP16) while keeping numerically sensitive parts, such as the weight updates, in FP32. It can substantially reduce GPU memory use and training time with little or no loss of model accuracy; the speedup is largest on GPUs with Tensor Cores.<\/p>\n<\/p>\n<ol>\n<li>\n<p><strong>Mixed precision in TensorFlow<\/strong>:<\/p>\n<\/p>\n<p><pre><code class=\"language-python\">from tensorflow.keras import mixed_precision<\/p>\n<p>policy = mixed_precision.Policy(&#39;mixed_float16&#39;)<\/p>\n<p>mixed_precision.set_global_policy(policy)<\/p>\n<p># create_model() and train_dataset are placeholders for your own model and data;<\/p>\n<p># the TensorFlow guide recommends keeping the final softmax output in float32<\/p>\n<p>model = create_model()<\/p>\n<p>model.compile(optimizer=&#39;adam&#39;, loss=&#39;sparse_categorical_crossentropy&#39;, metrics=[&#39;accuracy&#39;])<\/p>\n<p>model.fit(train_dataset, epochs=10)<\/p>\n<p><\/code><\/pre>\n<\/p>\n<\/li>\n<li>\n<p><strong>Mixed precision in PyTorch<\/strong>:<\/p>\n<\/p>\n<p><pre><code class=\"language-python\">import torch<\/p>\n<p>from torch.cuda.amp import autocast, GradScaler<\/p>\n<p># create_model() and dataloader are placeholders for your own model and data<\/p>\n<p>model = create_model().cuda()<\/p>\n<p>criterion = torch.nn.CrossEntropyLoss()<\/p>\n<p>optimizer = torch.optim.Adam(model.parameters())<\/p>\n<p>scaler = GradScaler()  # scales the loss to avoid FP16 gradient underflow<\/p>\n<p>for data, target in dataloader:<\/p>\n<p>    optimizer.zero_grad()<\/p>\n<p>    with autocast():  # run the forward pass in mixed precision<\/p>\n<p>        output = model(data.cuda())<\/p>\n<p>        loss = criterion(output, target.cuda())<\/p>\n<p>    scaler.scale(loss).backward()<\/p>\n<p>    scaler.step(optimizer)<\/p>\n<p>    scaler.update()<\/p>\n<p><\/code><\/pre>\n<\/p>\n<\/li>\n<\/ol>\n<p><p>Mixed-precision training can noticeably increase training speed while reducing GPU memory usage. In recent PyTorch releases, <code>torch.amp.autocast(&#39;cuda&#39;)<\/code> and <code>torch.amp.GradScaler(&#39;cuda&#39;)<\/code> are the preferred spellings of the <code>torch.cuda.amp<\/code> APIs used above.<\/p>\n<\/p>\n<p><h3>V. Debugging and Profiling<\/h3>\n<\/p>\n<p><p>Debugging and profiling are also important when running Python programs on a GPU. The following tools are commonly used.<\/p>\n<\/p>\n<p><h4>1. NVIDIA Nsight<\/h4>\n<\/p>\n<p><p>NVIDIA Nsight is a family of tools for profiling and debugging CUDA applications: Nsight Systems shows a system-wide timeline of CPU and GPU activity, and Nsight Compute analyzes individual CUDA kernels in detail.<\/p>\n<\/p>\n<ol>\n<li>\n<p><strong>Install NVIDIA Nsight<\/strong>: Download Nsight Systems and Nsight Compute from NVIDIA's website; recent CUDA Toolkit installers also include them.<\/p>\n<\/p>\n<\/li>\n<li>\n<p><strong>Profile with Nsight<\/strong>: <code>nsys<\/code> is the Nsight Systems command-line tool and <code>ncu<\/code> is the Nsight Compute command-line tool.<\/p>\n<\/p>\n<p><pre><code class=\"language-shell\">nsys profile python your_script.py<\/p>\n<p>ncu --target-processes all python your_script.py<\/p>\n<p><\/code><\/pre>\n<\/p>\n<\/li>\n<\/ol>\n<p><p>Nsight helps you locate performance bottlenecks in CUDA kernels and produces detailed performance reports.<\/p>\n<\/p>\n<p><h4>2. 
TensorBoard<\/h4>\n<\/p>\n<p><p>TensorBoard is TensorFlow's visualization toolkit for monitoring the training process and performance metrics.<\/p>\n<\/p>\n<ol>\n<li>\n<p><strong>Use TensorBoard<\/strong>:<\/p>\n<\/p>\n<p><pre><code class=\"language-python\">import datetime<\/p>\n<p>import tensorflow as tf<\/p>\n<p>log_dir = &quot;logs\/fit\/&quot; + datetime.datetime.now().strftime(&quot;%Y%m%d-%H%M%S&quot;)<\/p>\n<p># profile_batch=(10, 20) records a profile of batches 10 to 20 (viewed with the tensorboard-plugin-profile package)<\/p>\n<p>tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=log_dir, histogram_freq=1, profile_batch=(10, 20))<\/p>\n<p># model and train_dataset are placeholders for your own compiled model and data<\/p>\n<p>model.fit(train_dataset, epochs=10, callbacks=[tensorboard_callback])<\/p>\n<p><\/code><\/pre>\n<\/p>\n<\/li>\n<li>\n<p><strong>Launch TensorBoard<\/strong>:<\/p>\n<\/p>\n<p><pre><code class=\"language-shell\">tensorboard --logdir=logs\/fit<\/p>\n<p><\/code><\/pre>\n<\/p>\n<\/li>\n<\/ol>\n<p><p>TensorBoard offers rich visualizations that help you track metrics such as loss and accuracy throughout training.<\/p>\n<\/p>\n<p><h3>VI. Common Problems and Solutions<\/h3>\n<\/p>\n<p><p>You may run into some common problems when running Python programs on a GPU. Several of them, and how to address them, are described below.<\/p>\n<\/p>\n<p><h4>1. CUDA Version Compatibility<\/h4>\n<\/p>\n<p><p>When installing the CUDA Toolkit and deep learning libraries, make sure their versions are compatible with each other and with your NVIDIA driver. NVIDIA's documentation and each library's installation guide list the supported version combinations. The CUDA Version shown in the <code>nvidia-smi<\/code> header is the newest CUDA version your driver supports.<\/p>\n<\/p>\n<p><h4>2. 
Out-of-Memory Errors<\/h4>\n<\/p>\n<p><p>When training large models, you may run out of GPU memory. Try the following:<\/p>\n<\/p>\n<ol>\n<li><strong>Reduce the batch size<\/strong>: Smaller batches reduce the memory needed for each training step.<\/li>\n<li><strong>Use mixed-precision training<\/strong>: Mixed precision can significantly reduce memory usage.<\/li>\n<li><strong>Prune the model<\/strong>: Pruning removes model parameters to reduce memory usage. Only structured pruning (removing whole channels or layers) reliably saves memory, because zeroed-out weights in a dense tensor still occupy memory.<\/li>\n<\/ol>\n<p><h4>3. Multi-GPU Training<\/h4>\n<\/p>\n<p><p>With multiple GPUs, you can speed up training with data parallelism (each GPU processes a different part of the data) or model parallelism (the model itself is split across GPUs).<\/p>\n<\/p>\n<ol>\n<li>\n<p><strong>Multi-GPU training in TensorFlow<\/strong>:<\/p>\n<\/p>\n<p><pre><code class=\"language-python\">import tensorflow as tf<\/p>\n<p># MirroredStrategy replicates the model on every visible GPU (data parallelism)<\/p>\n<p>strategy = tf.distribute.MirroredStrategy()<\/p>\n<p>with strategy.scope():<\/p>\n<p>    # create_model() and train_dataset are placeholders; build and compile the model inside the scope<\/p>\n<p>    model = create_model()<\/p>\n<p>    model.compile(optimizer=&#39;adam&#39;, loss=&#39;sparse_categorical_crossentropy&#39;, metrics=[&#39;accuracy&#39;])<\/p>\n<p>model.fit(train_dataset, epochs=10)<\/p>\n<p><\/code><\/pre>\n<\/p>\n<\/li>\n<li>\n<p><strong>Multi-GPU training in PyTorch<\/strong>:<\/p>\n<\/p>\n<p><pre><code class=\"language-python\">import torch<\/p>\n<p># DataParallel splits each input batch across all visible GPUs; create_model() is a placeholder<\/p>\n<p>model = torch.nn.DataParallel(create_model())<\/p>\n<p>model = 
model.cuda()<\/p>\n<p><\/code><\/pre>\n<\/p>\n<\/li>\n<\/ol>\n<p><p>Multi-GPU training can significantly speed up training. For PyTorch, the official documentation recommends <code>torch.nn.parallel.DistributedDataParallel<\/code> over <code>DataParallel<\/code>, even on a single machine, because it scales better.<\/p>\n<\/p>\n<p><h3>VII. Summary<\/h3>\n<\/p>\n<p><p>Running Python programs on a GPU can significantly improve computational performance, especially for deep learning and large-scale numerical workloads. By setting up the CUDA environment, using a deep learning library, accelerating parallel computation on the GPU, and optimizing your code, you can make full use of the GPU's computing power. In practice, you will also need to debug and profile your specific workload to make sure the code runs efficiently. We hope this article serves as a useful reference for running Python programs on a GPU.<\/p>\n<\/p>\n<h2><strong>Related FAQs:<\/strong><\/h2>\n<p> <strong>How do I choose a suitable GPU for running Python programs?<\/strong><br \/>The right GPU depends mainly on your computational needs and budget. For deep learning or large-scale data processing, an NVIDIA GPU with CUDA support is recommended, because the libraries covered here are built primarily around CUDA. Pay particular attention to the amount of GPU memory (VRAM), since it limits the model and batch sizes you can use. For smaller projects, an entry-level card such as one from the NVIDIA GTX series may be sufficient. Compare the performance and price of different GPUs to find one that meets your needs.<\/p>\n<p><strong>How do I configure GPU-enabled libraries in Python?<\/strong><br \/>To use a GPU from Python, you usually install a GPU-enabled library such as TensorFlow or PyTorch; both provide detailed documentation on running code on a GPU. Make sure the installed CUDA and cuDNN versions match what the library expects, and select the device in your code, for example by calling <code>torch.cuda.is_available()<\/code> to check whether CUDA is available before choosing the device.<\/p>\n<p><strong>How do I fix out-of-GPU-memory errors when running Python programs?<\/strong><br 
\/>Running out of GPU memory is a common problem with large models. You can address it by reducing the batch size, using a lighter model architecture, or loading data in batches with a data generator or data loader instead of keeping the whole dataset in memory. To see where the memory goes, monitor usage with <code>nvidia-smi<\/code>, framework utilities such as <code>torch.cuda.memory_allocated()<\/code>, or the memory profile view of the TensorBoard Profiler.<\/p>\n","protected":false},"excerpt":{"rendered":"How to run Python programs on a GPU: use a deep learning library, set up the CUDA environment, accelerate parallel computation on the GPU, and optimize code performance. Running [&hellip;]","protected":false},"author":3,"featured_media":1161324,"comment_status":"closed","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[37],"tags":[],"acf":[],"_links":{"self":[{"href":"https:\/\/docs.pingcode.com\/wp-json\/wp\/v2\/posts\/1161315"}],"collection":[{"href":"https:\/\/docs.pingcode.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/docs.pingcode.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/docs.pingcode.com\/wp-json\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/docs.pingcode.com\/wp-json\/wp\/v2\/comments?post=1161315"}],"version-history":[{"count":"1","href":"https:\/\/docs.pingcode.com\/wp-json\/wp\/v2\/posts\/1161315\/revisions"}],"predecessor-version":[{"id":1161327,"href":"https:\/\/docs.pingcode.com\/wp-json\/wp\/v2\/posts\/1161315\/revisions\/1161327"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/docs.pingcode.com\/wp-json\/wp\/v2\/media\/1161324"}],"wp:attachment"
:[{"href":"https:\/\/docs.pingcode.com\/wp-json\/wp\/v2\/media?parent=1161315"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/docs.pingcode.com\/wp-json\/wp\/v2\/categories?post=1161315"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/docs.pingcode.com\/wp-json\/wp\/v2\/tags?post=1161315"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}