PyTorch Tutorial - 6.2. Parameter Management - 电子发烧友网

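Once we have chosen an architecture and set our hyperparameters, we proceed to the training loop, where our goal is to find parameter values that minimize our loss function. After training, we will need these parameters in order to make future predictions. Additionally, we will sometimes wish to extract the parameters to reuse them in some other context, to save our model to disk so that it may be executed in other software, or to examine them in the hope of gaining scientific understanding.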

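Most of the time, we will be able to ignore the nitty-gritty details of how parameters are declared and manipulated, relying on deep learning frameworks to do the heavy lifting. However, when we move away from stacked architectures with standard layers, we will sometimes need to get into the weeds of declaring and manipulating parameters. In this section, we cover the following: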

Accessing parameters for debugging, diagnostics, and visualizations.

Sharing parameters across different model components.

# PyTorch
import torch
from torch import nn

# MXNet
from mxnet import init, np, npx
from mxnet.gluon import nn

npx.set_np()

# JAX
import jax
from flax import linen as nn
from jax import numpy as jnp
from d2l import jax as d2l

No GPU/TPU found, falling back to CPU. (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)

# TensorFlow
import tensorflow as tf

We start by focusing on an MLP with one hidden layer.

# PyTorch
net = nn.Sequential(nn.LazyLinear(8),
          nn.ReLU(),
          nn.LazyLinear(1))

X = torch.rand(size=(2, 4))
net(X).shape

torch.Size([2, 1])

# MXNet
net = nn.Sequential()
net.add(nn.Dense(8, activation='relu'))
net.add(nn.Dense(1))
net.initialize() # Use the default initialization method

X = np.random.uniform(size=(2, 4))
net(X).shape

(2, 1)

# JAX
net = nn.Sequential([nn.Dense(8), nn.relu, nn.Dense(1)])

X = jax.random.uniform(d2l.get_key(), (2, 4))
params = net.init(d2l.get_key(), X)
net.apply(params, X).shape

(2, 1)

# TensorFlow
net = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(),
  tf.keras.layers.Dense(4, activation=tf.nn.relu),
  tf.keras.layers.Dense(1),
])

X = tf.random.uniform((2, 4))
net(X).shape

TensorShape([2, 1])

6.2.1. Parameter Access

Let's start with how to access parameters from the models that you already know.

When a model is defined via the Sequential class, we can first access any layer by indexing into the model as though it were a list. Each layer's parameters are conveniently located in its attribute.

Flax and JAX decouple the model and the parameters as you might have observed in the models defined previously. When a model is defined via the Sequential class, we first need to initialize the network to generate the parameters dictionary. We can access any layer’s parameters through the keys of this dictionary.

We can inspect the parameters of the second fully connected layer as follows.

# PyTorch
net[2].state_dict()

OrderedDict([('weight',
       tensor([[-0.2523, 0.2104, 0.2189, -0.0395, -0.0590, 0.3360, -0.0205, -0.1507]])),
       ('bias', tensor([0.0694]))])

# MXNet
net[1].params

dense1_ (
 Parameter dense1_weight (shape=(1, 8), dtype=float32)
 Parameter dense1_bias (shape=(1,), dtype=float32)
)

# JAX
params['params']['layers_2']

FrozenDict({
  kernel: Array([[-0.20739523],
      [ 0.16546965],
      [-0.03713543],
      [-0.04860032],
      [-0.2102929 ],
      [ 0.163712 ],
      [ 0.27240783],
      [-0.4046879 ]], dtype=float32),
  bias: Array([0.], dtype=float32),
})

# TensorFlow
net.layers[2].weights

We can see that this fully connected layer contains two parameters, corresponding to that layer's weights and biases, respectively.
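As a small illustration of the access pattern described above, the following PyTorch sketch rebuilds the one-hidden-layer MLP, indexes into it like a list, and records the shape of each entry in the layer's state_dict. The variable name layer_shapes is introduced here only for illustration.

```python
import torch
from torch import nn

# Same one-hidden-layer MLP as in the text.
net = nn.Sequential(nn.LazyLinear(8), nn.ReLU(), nn.LazyLinear(1))
X = torch.rand(size=(2, 4))
net(X)  # one forward pass lets the lazy layers infer their input sizes

# Index the model like a list, then walk the layer's state_dict.
layer_shapes = {name: tuple(t.shape) for name, t in net[2].state_dict().items()}
print(layer_shapes)
```

The dictionary should contain exactly two entries, weight with shape (1, 8) and bias with shape (1,), which matches the two parameters of the output layer.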

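6.2.1.1. Targeted Parameters

Note that each parameter is represented as an instance of the parameter class. To do anything useful with the parameters, we first need to access the underlying numerical values. There are several ways to do this. Some are simpler while others are more general. The following code extracts the bias from the second neural network layer, which returns a parameter class instance, and further accesses that parameter's value.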

# PyTorch
type(net[2].bias), net[2].bias.data

(torch.nn.parameter.Parameter, tensor([0.0694]))

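Parameters are complex objects, containing values, gradients, and additional information. That is why we need to request the value explicitly.

In addition to the value, each parameter also allows us to access the gradient. Because we have not invoked backpropagation for this network yet, it is in its initial state.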

net[2].weight.grad is None

True
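To see the gradient leave its initial state, one possible check is to run a forward and a backward pass and inspect .grad again. This is only a sketch: summing the network outputs stands in for a real loss, and the names grad_before and grad_after are illustrative rather than part of the tutorial.

```python
import torch
from torch import nn

net = nn.Sequential(nn.LazyLinear(8), nn.ReLU(), nn.LazyLinear(1))
X = torch.rand(size=(2, 4))

out = net(X)                      # forward pass also materializes the lazy parameters
grad_before = net[2].weight.grad  # no backward pass yet, so no gradient is stored

out.sum().backward()              # stand-in scalar "loss" (an assumption of this sketch)
grad_after = net[2].weight.grad   # gradient tensor shaped like the weight

print(grad_before is None, tuple(grad_after.shape))
```

After the backward pass, the gradient has the same shape as the weight it belongs to, here (1, 8).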

# MXNet
type(net[1].bias), net[1].bias.data()

(mxnet.gluon.parameter.Parameter, array([0.]))

net[1].weight.grad()

array([[0., 0., 0., 0., 0., 0., 0., 0.]])

# JAX
bias = params['params']['layers_2']['bias']
type(bias), bias

(jaxlib.xla_extension.Array, Array([0.], dtype=float32))

Unlike the other frameworks, JAX does not keep track of gradients on the neural network parameters; the parameters and the network are decoupled. Instead, the user expresses the computation as a Python function and applies the grad transformation to obtain gradients.

# TensorFlow
type(net.layers[2].weights[1]), tf.convert_to_tensor(net.layers[2].weights[1])

(tensorflow.python.ops.resource_variable_ops.ResourceVariable, ...)

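6.2.1.2. All Parameters at Once

When we need to perform operations on all parameters, accessing them one by one can grow tedious. The situation can grow especially unwieldy when we work with more complex modules (e.g., nested modules), since we would need to recurse through the entire tree to extract each sub-module's parameters. Below we demonstrate accessing the parameters of all layers.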

# PyTorch
[(name, param.shape) for name, param in net.named_parameters()]

[('0.weight', torch.Size([8, 4])),
 ('0.bias', torch.Size([8])),
 ('2.weight', torch.Size([1, 8])),
 ('2.bias', torch.Size([1]))]

# MXNet
net.collect_params()

sequential0_ (
 Parameter dense0_weight (shape=(8, 4), dtype=float32)
 Parameter dense0_bias (shape=(8,), dtype=float32)
 Parameter dense1_weight (shape=(1, 8), dtype=float32)
 Parameter dense1_bias (shape=(1,), dtype=float32)
)

# JAX
jax.tree_util.tree_map(lambda x: x.shape, params)

FrozenDict({
  params: {
    layers_0: {
      bias: (8,),
      kernel: (4, 8),
    },
    layers_2: {
      bias: (1,),
      kernel: (8, 1),
    },
  },
})

# TensorFlow
net.get_weights()

[array([[-0.42006454, 0.6094975 , -0.30087888, 0.42557293],
    [-0.26464057, -0.5518195 , 0.5476741 , 0.31728595],
    [-0.5571538 , -0.33794886, -0.05885679, 0.05435681],
    [ 0.28541476, 0.8276871 , -0.7665834 , 0.5791599 ]],
    dtype=float32),
 array([0., 0., 0., 0.], dtype=float32),
 array([[-0.52124995],
    [-0.22314149],
    [ 0.20780373],
    [ 0.6839919 ]], dtype=float32),
 array([0.], dtype=float32)]
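For the nested-module case mentioned at the start of this subsection, a minimal PyTorch sketch shows that named_parameters already recurses through the module tree, so no manual traversal is needed. The names block and nested are hypothetical, and nn.Linear with explicit input sizes is used here (instead of the lazy layers in the text) so that shapes are known without a forward pass.

```python
import torch
from torch import nn

# An inner block nested inside an outer Sequential.
block = nn.Sequential(nn.Linear(4, 8), nn.ReLU())
nested = nn.Sequential(block, nn.Linear(8, 1))

# named_parameters walks the whole tree; names are dotted paths of child indices.
names = [name for name, _ in nested.named_parameters()]
# Total number of scalar parameters: 4*8 + 8 for the inner layer, 8 + 1 for the outer.
total = sum(p.numel() for p in nested.parameters())

print(names)  # ['0.0.weight', '0.0.bias', '1.weight', '1.bias']
print(total)  # 49
```

The dotted names encode the position of each parameter in the tree: '0.0.weight' is the weight of the first layer inside the first child.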

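6.2.2. Tied Parameters

Often, we want to share parameters across multiple layers. Let's see how to do this elegantly. In the following we allocate a fully connected layer and then use its parameters specifically to set those of another layer. Here we need to run the forward propagation net(X) before accessing the parameters.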

# PyTorch
# We need to give the shared layer a name so that we can refer to its
# parameters
shared = nn.LazyLinear(8)
net = nn.Sequential(nn.LazyLinear(8), nn.ReLU(),
          shared, nn.ReLU(),
          shared, nn.ReLU(),
          nn.LazyLinear(1))

net(X)
# Check whether the parameters are the same
print(net[2].weight.data[0] == net[4].weight.data[0])
net[2].weight.data[0, 0] = 100
# Make sure that they are actually the same object rather than just having the
# same value
print(net[2].weight.data[0] == net[4].weight.data[0])

tensor([True, True, True, True, True, True, True, True])
tensor([True, True, True, True, True, True, True, True])

# MXNet
net = nn.Sequential()
# We need to give the shared layer a name so that we can refer to its
# parameters
shared = nn.Dense(8, activation='relu')
net.add(nn.Dense(8, activation='relu'),
    shared,
    nn.Dense(8, activation='relu', params=shared.params),
    nn.Dense(10))
net.initialize()

X = np.random.uniform(size=(2, 20))

net(X)
# Check whether the parameters are the same
print(net[1].weight.data()[0] == net[2].weight.data()[0])
net[1].weight.data()[0, 0] = 100
# Make sure that they are actually the same object rather than just having the
# same value
print(net[1].weight.data()[0] == net[2].weight.data()[0])

[ True True True True True True True True]
[ True True True True True True True True]

# JAX
# We need to give the shared layer a name so that we can refer to its
# parameters
shared = nn.Dense(8)
net = nn.Sequential([nn.Dense(8), nn.relu,
           shared, nn.relu,
           shared, nn.relu,
           nn.Dense(1)])

params = net.init(jax.random.PRNGKey(d2l.get_seed()), X)

# Check whether the parameters are different
print(len(params['params']) == 3)

True

# TensorFlow
# tf.keras behaves a bit differently. It removes the duplicate layer
# automatically
shared = tf.keras.layers.Dense(4, activation=tf.nn.relu)
net = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(),
  shared,
  shared,
  tf.keras.layers.Dense(1),
])

net(X)
# Check whether the parameters are different
print(len(net.layers) == 3)

True

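This example shows that the parameters of the second and third layer are tied. They are not just equal, they are represented by the same exact tensor. Thus, if we change one of the parameters, the other one changes, too.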

You might wonder, when parameters are tied what happens to the gradients? Since the model parameters contain gradients, and both tied layers refer to the same parameter, the gradients of the second hidden layer and the third hidden layer are added together during backpropagation.
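The gradient accumulation for tied parameters can be checked by hand on a deliberately tiny case. The sketch below is an illustrative construction, not taken from the tutorial: a single bias-free 1x1 linear layer with weight w = 3 is applied twice to x = 2, so the output is y = w * (w * x) and dy/dw = 2 * w * x = 12, which is the sum of the two per-use contributions (6 from each application).

```python
import torch
from torch import nn

# One shared layer used twice; 1x1 and bias-free so the math is easy to verify.
shared = nn.Linear(1, 1, bias=False)
with torch.no_grad():
    shared.weight.fill_(3.0)  # w = 3

net = nn.Sequential(shared, shared)
x = torch.tensor([[2.0]])     # x = 2

# Both positions hold the very same Parameter object.
same_object = net[0].weight is net[1].weight
# parameters() reports the shared parameter only once.
num_unique = len(list(net.parameters()))

net(x).sum().backward()           # y = w * (w * x) = 18
grad = shared.weight.grad.item()  # 6 (outer use) + 6 (inner use) = 12

print(same_object, num_unique, grad)
```

Because the optimizer sees one parameter with one accumulated gradient, a single update step moves both uses of the layer together.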

6.2.3. Summary

We have several ways of accessing and tying model parameters.

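6.2.4. Exercises

Use the NestMLP model defined in Section 6.1 and access the parameters of the various layers.

Construct an MLP containing a shared parameter layer and train it. During the training process, observe the model parameters and gradients of each layer.

Why is sharing parameters a good idea?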
