# 19.7. d2l API Document

The implementations of the following members of the d2l package and sections where they are defined and explained can be found in the source file.

class d2l.mxnet.Accumulator(n)[source]

For accumulating sums over n variables.
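As an illustration of what "accumulating sums over n variables" can look like, here is a minimal sketch. The method names `add`, `reset`, and `__getitem__` are assumptions for this example, since this page lists only the constructor; consult the linked source for the actual members.

```python
class Accumulator:
    """Keep running sums of n quantities (illustrative sketch)."""

    def __init__(self, n):
        self.data = [0.0] * n

    def add(self, *args):
        # Add one value to each running sum, position by position.
        self.data = [a + float(b) for a, b in zip(self.data, args)]

    def reset(self):
        self.data = [0.0] * len(self.data)

    def __getitem__(self, idx):
        return self.data[idx]


metric = Accumulator(2)  # e.g. (number correct, number seen)
metric.add(3, 4)
metric.add(1, 4)
accuracy_so_far = metric[0] / metric[1]
```

Keeping the sums separate and dividing only at the end avoids averaging per-batch averages, which would be biased when batches differ in size.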

class d2l.mxnet.AddNorm(dropout, **kwargs)[source]
forward(X, Y)[source]

Overrides to implement forward computation using NDArray. Only accepts positional arguments.

*args : list of NDArray

Input tensors.

class d2l.mxnet.AdditiveAttention(num_hiddens, dropout, **kwargs)[source]

Additive attention.

forward(queries, keys, values, valid_lens)[source]

Overrides to implement forward computation using NDArray. Only accepts positional arguments.

*args : list of NDArray

Input tensors.

class d2l.mxnet.Animator(xlabel=None, ylabel=None, legend=None, xlim=None, ylim=None, xscale='linear', yscale='linear', fmts=('-', 'm--', 'g-.', 'r:'), nrows=1, ncols=1, figsize=(3.5, 2.5))[source]

For plotting data in animation.

class d2l.mxnet.AttentionDecoder(**kwargs)[source]

The base attention-based decoder interface.

class d2l.mxnet.BERTEncoder(vocab_size, num_hiddens, ffn_num_hiddens, num_heads, num_layers, dropout, max_len=1000, **kwargs)[source]
forward(tokens, segments, valid_lens)[source]

Overrides to implement forward computation using NDArray. Only accepts positional arguments.

*args : list of NDArray

Input tensors.

class d2l.mxnet.BERTModel(vocab_size, num_hiddens, ffn_num_hiddens, num_heads, num_layers, dropout, max_len=1000)[source]
forward(tokens, segments, valid_lens=None, pred_positions=None)[source]

Overrides to implement forward computation using NDArray. Only accepts positional arguments.

*args : list of NDArray

Input tensors.

class d2l.mxnet.BPRLoss(weight=None, batch_axis=0, **kwargs)[source]
forward(positive, negative)[source]

Defines the forward computation. Arguments can be either NDArray or Symbol.

class d2l.mxnet.BananasDataset(is_train)[source]

class d2l.mxnet.CTRDataset(data_path, feat_mapper=None, defaults=None, min_threshold=4, num_feat=34)[source]

class d2l.mxnet.Decoder(**kwargs)[source]

The base decoder interface for the encoder-decoder architecture.

forward(X, state)[source]

Overrides to implement forward computation using NDArray. Only accepts positional arguments.

*args : list of NDArray

Input tensors.

class d2l.mxnet.DotProductAttention(dropout, **kwargs)[source]

Scaled dot product attention.

forward(queries, keys, values, valid_lens=None)[source]

Overrides to implement forward computation using NDArray. Only accepts positional arguments.

*args : list of NDArray

Input tensors.
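To make "scaled dot product attention" concrete, the following sketch computes softmax(QKᵀ/√d)V with standard NumPy. It is not the d2l implementation: dropout and `valid_lens` masking are left out, and the function name and shapes are assumptions for this example.

```python
import numpy as np


def scaled_dot_product_attention(queries, keys, values):
    """queries: (batch, n_q, d); keys: (batch, n_k, d); values: (batch, n_k, d_v)."""
    d = queries.shape[-1]
    # Similarity of every query with every key, scaled by sqrt(d).
    scores = queries @ keys.transpose(0, 2, 1) / np.sqrt(d)
    # Numerically stable softmax over the key axis.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ values, weights


rng = np.random.default_rng(0)
q = rng.normal(size=(2, 1, 4))
k = np.ones((2, 3, 4))  # identical keys, so attention weights are uniform
v = np.arange(2 * 3 * 5, dtype=float).reshape(2, 3, 5)
out, w = scaled_dot_product_attention(q, k, v)
```

With identical keys every score is equal, so each query attends uniformly and the output is the mean of the values.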

class d2l.mxnet.Encoder(**kwargs)[source]

The base encoder interface for the encoder-decoder architecture.

forward(X, *args)[source]

Overrides to implement forward computation using NDArray. Only accepts positional arguments.

*args : list of NDArray

Input tensors.

class d2l.mxnet.EncoderBlock(num_hiddens, ffn_num_hiddens, num_heads, dropout, use_bias=False, **kwargs)[source]
forward(X, valid_lens)[source]

Overrides to implement forward computation using NDArray. Only accepts positional arguments.

*args : list of NDArray

Input tensors.

class d2l.mxnet.EncoderDecoder(encoder, decoder, **kwargs)[source]

The base class for the encoder-decoder architecture.

forward(enc_X, dec_X, *args)[source]

Overrides to implement forward computation using NDArray. Only accepts positional arguments.

*args : list of NDArray

Input tensors.

class d2l.mxnet.HingeLossbRec(weight=None, batch_axis=0, **kwargs)[source]
forward(positive, negative, margin=1)[source]

Defines the forward computation. Arguments can be either NDArray or Symbol.

class d2l.mxnet.MaskLM(vocab_size, num_hiddens, **kwargs)[source]
forward(X, pred_positions)[source]

Overrides to implement forward computation using NDArray. Only accepts positional arguments.

*args : list of NDArray

Input tensors.

class d2l.mxnet.MaskedSoftmaxCELoss(axis=-1, sparse_label=True, from_logits=False, weight=None, batch_axis=0, **kwargs)[source]

The softmax cross-entropy loss with masks.

forward(pred, label, valid_len)[source]

Defines the forward computation. Arguments can be either NDArray or Symbol.

class d2l.mxnet.MultiHeadAttention(num_hiddens, num_heads, dropout, use_bias=False, **kwargs)[source]
forward(queries, keys, values, valid_lens)[source]

Overrides to implement forward computation using NDArray. Only accepts positional arguments.

*args : list of NDArray

Input tensors.

class d2l.mxnet.NextSentencePred(**kwargs)[source]
forward(X)[source]

Overrides to implement forward computation using NDArray. Only accepts positional arguments.

*args : list of NDArray

Input tensors.

class d2l.mxnet.PositionWiseFFN(ffn_num_hiddens, ffn_num_outputs, **kwargs)[source]
forward(X)[source]

Overrides to implement forward computation using NDArray. Only accepts positional arguments.

*args : list of NDArray

Input tensors.

class d2l.mxnet.PositionalEncoding(num_hiddens, dropout, max_len=1000)[source]
forward(X)[source]

Overrides to implement forward computation using NDArray. Only accepts positional arguments.

*args : list of NDArray

Input tensors.

class d2l.mxnet.RNNModel(rnn_layer, vocab_size, **kwargs)[source]

The RNN model.

forward(inputs, state)[source]

Overrides to implement forward computation using NDArray. Only accepts positional arguments.

*args : list of NDArray

Input tensors.

class d2l.mxnet.RNNModelScratch(vocab_size, num_hiddens, device, get_params, init_state, forward_fn)[source]

An RNN Model implemented from scratch.

class d2l.mxnet.RandomGenerator(sampling_weights)[source]

Draw a random int in [0, n] according to n sampling weights.

class d2l.mxnet.Residual(num_channels, use_1x1conv=False, strides=1, **kwargs)[source]

The Residual block of ResNet.

forward(X)[source]

Overrides to implement forward computation using NDArray. Only accepts positional arguments.

*args : list of NDArray

Input tensors.

class d2l.mxnet.SNLIDataset(dataset, num_steps, vocab=None)[source]

A customized dataset to load the SNLI dataset.

class d2l.mxnet.Seq2SeqEncoder(vocab_size, embed_size, num_hiddens, num_layers, dropout=0, **kwargs)[source]

The RNN encoder for sequence to sequence learning.

forward(X, *args)[source]

Overrides to implement forward computation using NDArray. Only accepts positional arguments.

*args : list of NDArray

Input tensors.

class d2l.mxnet.SeqDataLoader(batch_size, num_steps, use_random_iter, max_tokens)[source]

An iterator to load sequence data.

class d2l.mxnet.Timer[source]

Record multiple running times.

avg()[source]

Return the average time.

cumsum()[source]

Return the accumulated time.

start()[source]

Start the timer.

stop()[source]

Stop the timer and record the time in a list.

sum()[source]

Return the sum of time.
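The five methods above suggest a small stopwatch that stores each measured interval in a list. A minimal sketch under that assumption follows; starting the timer in the constructor and using `time.perf_counter` are choices made for this example, not confirmed details of the d2l source.

```python
import time


class Timer:
    """Record multiple running times (illustrative sketch)."""

    def __init__(self):
        self.times = []
        self.start()  # assumption: timing begins on construction

    def start(self):
        self.tik = time.perf_counter()

    def stop(self):
        # Record the elapsed interval and return it.
        self.times.append(time.perf_counter() - self.tik)
        return self.times[-1]

    def avg(self):
        return sum(self.times) / len(self.times)

    def sum(self):
        return sum(self.times)

    def cumsum(self):
        running, totals = 0.0, []
        for t in self.times:
            running += t
            totals.append(running)
        return totals


timer = Timer()
for _ in range(3):
    timer.start()
    _ = [i * i for i in range(10_000)]
    timer.stop()
```

After the loop, `timer.times` holds three intervals, and the last entry of `timer.cumsum()` equals `timer.sum()`.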

class d2l.mxnet.TokenEmbedding(embedding_name)[source]

Token Embedding.

class d2l.mxnet.TransformerEncoder(vocab_size, num_hiddens, ffn_num_hiddens, num_heads, num_layers, dropout, use_bias=False, **kwargs)[source]
forward(X, valid_lens, *args)[source]

Overrides to implement forward computation using NDArray. Only accepts positional arguments.

*args : list of NDArray

Input tensors.

class d2l.mxnet.VOCSegDataset(is_train, crop_size, voc_dir)[source]

A customized dataset to load VOC dataset.

filter(imgs)[source]

Returns a new dataset with samples filtered by the filter function fn.

Note that if the Dataset is the result of a lazily transformed one with transform(lazy=False), the filter is eagerly applied to the transformed samples without materializing the transformed result. That is, the transformation will be applied again whenever a sample is retrieved after filter().

fn : callable

A filter function that takes a sample as input and returns a boolean. Samples that return False are discarded.

Dataset

The filtered dataset.

class d2l.mxnet.Vocab(tokens=None, min_freq=0, reserved_tokens=None)[source]

Vocabulary for text.

d2l.mxnet.abs(x, out=None, **kwargs)

Calculate the absolute value element-wise.

x : ndarray or scalar

Input array.

out : ndarray or None, optional

A location into which the result is stored. If provided, it must have a shape that the inputs broadcast to. If not provided or None, a freshly-allocated array is returned.

absolute : ndarray

An ndarray containing the absolute value of each element in x. This is a scalar if x is a scalar.

>>> x = np.array([-1.2, 1.2])
>>> np.abs(x)
array([1.2, 1.2])

d2l.mxnet.accuracy(y_hat, y)[source]

Compute the number of correct predictions.
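Note that the description says "number", not "fraction". A sketch of that behaviour with standard NumPy is shown below; treating a 2-D `y_hat` as per-class scores and taking the arg-max over axis 1 is an assumption for this example.

```python
import numpy as np


def accuracy(y_hat, y):
    """Return the count of rows whose predicted class equals the label."""
    if y_hat.ndim > 1 and y_hat.shape[1] > 1:
        # Turn per-class scores into predicted class indices.
        y_hat = y_hat.argmax(axis=1)
    return float((y_hat.astype(y.dtype) == y).sum())


y_hat = np.array([[0.1, 0.3, 0.6], [0.3, 0.2, 0.5]])
y = np.array([0, 2])
num_correct = accuracy(y_hat, y)
```

Dividing `num_correct` by `len(y)` gives the accuracy as a fraction.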

d2l.mxnet.arange(start, stop=None, step=1, dtype=None, ctx=None)

Return evenly spaced values within a given interval.

Values are generated within the half-open interval [start, stop) (in other words, the interval including start but excluding stop). For integer arguments the function is equivalent to the Python built-in range function, but returns an ndarray rather than a list.

start : number, optional

Start of interval. The interval includes this value. The default start value is 0.

stop : number

End of interval. The interval does not include this value, except in some cases where step is not an integer and floating point round-off affects the length of out.

step : number, optional

Spacing between values. For any output out, this is the distance between two adjacent values, out[i+1] - out[i]. The default step size is 1. If step is specified as a positional argument, start must also be given.

dtype : dtype

The type of the output array. The default is float32.

arange : ndarray

Array of evenly spaced values.

For floating point arguments, the length of the result is ceil((stop - start)/step). Because of floating point round-off, this rule may result in the last element of out being greater than stop.

>>> np.arange(3)
array([0., 1., 2.])

>>> np.arange(3.0)
array([0., 1., 2.])

>>> np.arange(3,7)
array([3., 4., 5., 6.])

>>> np.arange(3,7,2)
array([3., 5.])
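The length rule for floating point arguments can be checked directly. The sketch below uses standard NumPy rather than the MXNet `np` module documented here (an assumption; default dtypes differ between the two), and compares the array length with ceil((stop - start)/step).

```python
import math

import numpy as np

# A non-integer step: round-off in (stop - start) / step decides the length.
start, stop, step = 1.0, 1.3, 0.1
out = np.arange(start, stop, step)
expected_len = math.ceil((stop - start) / step)

# Integer arguments behave like Python's range().
ints = np.arange(3, 7, 2)
```

Because the length is fixed by the ceiling formula, the final element of `out` may land at or slightly above `stop`; `np.linspace` is usually the safer choice when the number of points matters.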

d2l.mxnet.bbox_to_rect(bbox, color)[source]

Convert bounding box to matplotlib format.

d2l.mxnet.bleu(pred_seq, label_seq, k)[source]

Compute the BLEU.
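One common formulation of BLEU multiplies a brevity penalty by n-gram precisions weighted by 1/2^n. The sketch below implements that formulation in plain Python; whether it matches the linked source term by term is an assumption, and it requires the prediction to contain at least `k` tokens.

```python
import collections
import math


def bleu(pred_seq, label_seq, k):
    """BLEU with n-grams up to length k; inputs are space-separated strings."""
    pred_tokens, label_tokens = pred_seq.split(' '), label_seq.split(' ')
    len_pred, len_label = len(pred_tokens), len(label_tokens)
    # Brevity penalty: predictions shorter than the label are penalized.
    score = math.exp(min(0, 1 - len_label / len_pred))
    for n in range(1, k + 1):
        num_matches = 0
        label_ngrams = collections.Counter(
            ' '.join(label_tokens[i:i + n]) for i in range(len_label - n + 1))
        for i in range(len_pred - n + 1):
            ngram = ' '.join(pred_tokens[i:i + n])
            if label_ngrams[ngram] > 0:
                # Each label n-gram can be matched at most once.
                num_matches += 1
                label_ngrams[ngram] -= 1
        precision = num_matches / (len_pred - n + 1)
        score *= math.pow(precision, math.pow(0.5, n))
    return score


perfect = bleu("a b c d", "a b c d", k=2)
too_short = bleu("a b", "a b c d", k=2)
disjoint = bleu("x y z", "a b c", k=2)
```

A perfect match scores 1, a prediction with no overlapping unigrams scores 0, and a correct but too-short prediction is scaled down only by the brevity penalty.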

d2l.mxnet.box_center_to_corner(boxes)[source]

Convert from (center, width, height) to (upper_left, bottom_right)

d2l.mxnet.box_corner_to_center(boxes)[source]

Convert from (upper_left, bottom_right) to (center, width, height)
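The two conversions are inverses of each other. A sketch with standard NumPy follows; the column orders (x1, y1, x2, y2) and (cx, cy, w, h) are assumptions for this example, since the descriptions above do not spell out the ordering within each box.

```python
import numpy as np


def box_corner_to_center(boxes):
    """(x1, y1, x2, y2) -> (cx, cy, w, h), one box per row."""
    x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    return np.stack([(x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1], axis=-1)


def box_center_to_corner(boxes):
    """(cx, cy, w, h) -> (x1, y1, x2, y2), one box per row."""
    cx, cy, w, h = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    return np.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], axis=-1)


corners = np.array([[60.0, 45.0, 378.0, 516.0], [400.0, 112.0, 655.0, 493.0]])
centers = box_corner_to_center(corners)
round_trip = box_center_to_corner(centers)
```

Converting to the center form and back should reproduce the original corner coordinates.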

d2l.mxnet.box_iou(boxes1, boxes2)[source]

Compute IOU between two sets of boxes of shape (N,4) and (M,4).
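Pairwise IoU for N and M boxes yields an (N, M) matrix. The following NumPy sketch shows one way to compute it with broadcasting; corner-format rows (x1, y1, x2, y2) are an assumption for this example.

```python
import numpy as np


def box_iou(boxes1, boxes2):
    """Return an (N, M) matrix of intersection-over-union values."""
    def area(b):
        return (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])

    areas1, areas2 = area(boxes1), area(boxes2)
    # Corners of the intersection rectangle for every pair: shape (N, M, 2).
    upper_left = np.maximum(boxes1[:, None, :2], boxes2[None, :, :2])
    lower_right = np.minimum(boxes1[:, None, 2:], boxes2[None, :, 2:])
    # Negative extents mean no overlap, so clip them to zero.
    wh = np.clip(lower_right - upper_left, 0, None)
    inter = wh[..., 0] * wh[..., 1]
    union = areas1[:, None] + areas2[None, :] - inter
    return inter / union


b1 = np.array([[0.0, 0.0, 2.0, 2.0]])
b2 = np.array([[1.0, 1.0, 3.0, 3.0], [0.0, 0.0, 2.0, 2.0], [5.0, 5.0, 6.0, 6.0]])
iou = box_iou(b1, b2)
```

Here the first pair overlaps in a unit square (IoU 1/7), the second pair is identical (IoU 1), and the third pair is disjoint (IoU 0).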

d2l.mxnet.build_array_nmt(lines, vocab, num_steps)[source]

Transform text sequences of machine translation into minibatches.

d2l.mxnet.build_colormap2label()[source]

Build an RGB color to label mapping for segmentation.

d2l.mxnet.concat(seq, axis=0, out=None)

Join a sequence of arrays along an existing axis.

a1, a2, … : sequence of array_like

The arrays must have the same shape, except in the dimension corresponding to axis (the first, by default).

axis : int, optional

The axis along which the arrays will be joined. If axis is None, arrays are flattened before use. Default is 0.

out : ndarray, optional

If provided, the destination to place the result. The shape must be correct, matching that of what concatenate would have returned if no out argument were specified.

res : ndarray

The concatenated array.

split : Split array into a list of multiple sub-arrays of equal size.
hsplit : Split array into multiple sub-arrays horizontally (column wise).
vsplit : Split array into multiple sub-arrays vertically (row wise).
dsplit : Split array into multiple sub-arrays along the 3rd axis (depth).
stack : Stack a sequence of arrays along a new axis.
hstack : Stack arrays in sequence horizontally (column wise).
vstack : Stack arrays in sequence vertically (row wise).
dstack : Stack arrays in sequence depth wise (along third dimension).

>>> a = np.array([[1, 2], [3, 4]])
>>> b = np.array([[5, 6]])
>>> np.concatenate((a, b), axis=0)
array([[1., 2.],
       [3., 4.],
       [5., 6.]])

>>> np.concatenate((a, b.T), axis=1)
array([[1., 2., 5.],
       [3., 4., 6.]])

>>> np.concatenate((a, b), axis=None)
array([1., 2., 3., 4., 5., 6.])

d2l.mxnet.copyfile(filename, target_dir)[source]

Copy a file into a target directory.

d2l.mxnet.corr2d(X, K)[source]

Compute 2D cross-correlation.
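For reference, 2D cross-correlation slides the kernel K over X and sums the element-wise products at each position. A minimal NumPy sketch (no padding or stride, which is an assumption about the intended variant):

```python
import numpy as np


def corr2d(X, K):
    """Cross-correlate 2-D input X with 2-D kernel K (valid positions only)."""
    h, w = K.shape
    Y = np.zeros((X.shape[0] - h + 1, X.shape[1] - w + 1))
    for i in range(Y.shape[0]):
        for j in range(Y.shape[1]):
            # Window of X under the kernel, multiplied element-wise and summed.
            Y[i, j] = (X[i:i + h, j:j + w] * K).sum()
    return Y


X = np.array([[0.0, 1.0, 2.0], [3.0, 4.0, 5.0], [6.0, 7.0, 8.0]])
K = np.array([[0.0, 1.0], [2.0, 3.0]])
Y = corr2d(X, K)  # [[19., 25.], [37., 43.]]
```

A 3×3 input with a 2×2 kernel gives a 2×2 output; for example, the top-left entry is 0·0 + 1·1 + 3·2 + 4·3 = 19.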

d2l.mxnet.cos(x, out=None, **kwargs)

Cosine, element-wise.

x : ndarray or scalar

Angle, in radians ($$2 \pi$$ rad equals 360 degrees).

out : ndarray or None

A location into which the result is stored. If provided, it must have a shape that the inputs broadcast to. If not provided or None, a freshly-allocated array is returned. The dtype of the output is the same as that of the input if the input is an ndarray.

y : ndarray or scalar

The corresponding cosine values. This is a scalar if x is a scalar.

This function only supports input type of float.

>>> np.cos(np.array([0, np.pi/2, np.pi]))
array([ 1.000000e+00, -4.371139e-08, -1.000000e+00])
>>> # Example of providing the optional output parameter
>>> out1 = np.array([0], dtype='f')
>>> out2 = np.cos(np.array([0.1]), out1)
>>> out2 is out1
True

d2l.mxnet.cosh(x, out=None, **kwargs)

Hyperbolic cosine, element-wise. Equivalent to 1/2 * (np.exp(x) + np.exp(-x)) and np.cos(1j*x).

x : ndarray or scalar

Input array or scalar.

out : ndarray or None

A location into which the result is stored. If provided, it must have a shape that the inputs broadcast to. If not provided or None, a freshly-allocated array is returned. The dtype of the output is the same as that of the input if the input is an ndarray.

y : ndarray or scalar

The corresponding hyperbolic cosine values. This is a scalar if x is a scalar.

This function only supports input type of float.

>>> np.cosh(0)
1.0

d2l.mxnet.count_corpus(tokens)[source]

Count token frequencies.
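Counting token frequencies maps naturally onto `collections.Counter`. The sketch below accepts either a flat list of tokens or a list of token lists; that flattening behaviour is an assumption for this example.

```python
from collections import Counter


def count_corpus(tokens):
    """Return a Counter of token frequencies."""
    # Flatten a list of token lists (one list per line) into a single list.
    if len(tokens) > 0 and isinstance(tokens[0], list):
        tokens = [token for line in tokens for token in line]
    return Counter(tokens)


counter = count_corpus([["the", "time", "machine"], ["the", "time"]])
```

`counter.most_common()` then lists tokens from most to least frequent, which is the ordering a vocabulary builder typically needs.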

class d2l.mxnet.defaultdict

defaultdict(default_factory[, …]) --> dict with default factory

The default factory is called without arguments to produce a new value when a key is not present, in __getitem__ only. A defaultdict compares equal to a dict with the same items. All remaining arguments are treated the same as if they were passed to the dict constructor, including keyword arguments.

copy() → a shallow copy of D.

default_factory

Factory for default value called by __missing__().
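The "in __getitem__ only" detail is easy to miss, so here is a short standard-library illustration: indexing a missing key calls the factory and stores the result, whereas `.get()` and `in` do not.

```python
from collections import defaultdict

counts = defaultdict(int)  # int() returns 0, used as the default value
for token in ["a", "b", "a"]:
    counts[token] += 1  # a missing key is created with value 0 first

missing = counts.get("z")  # .get() bypasses the factory and returns None
created_by_get = "z" in counts  # still False: nothing was inserted
value = counts["z"]  # indexing calls the factory and inserts 0
created_by_index = "z" in counts  # now True
```

This is why a `defaultdict` still compares equal to a plain `dict` holding the same items.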

d2l.mxnet.download(name, cache_dir='../data')[source]

Download a file inserted into DATA_HUB, return the local filename.

d2l.mxnet.download_all()[source]

Download all files in the DATA_HUB.

d2l.mxnet.download_extract(name, folder=None)[source]

Download and extract a zip/tar file.

d2l.mxnet.evaluate_accuracy(net, data_iter)[source]

Compute the accuracy for a model on a dataset.

d2l.mxnet.evaluate_accuracy_gpu(net, data_iter, device=None)[source]

Compute the accuracy for a model on a dataset using a GPU.

d2l.mxnet.evaluate_loss(net, data_iter, loss)[source]

Evaluate the loss of a model on the given dataset.

d2l.mxnet.exp(x, out=None, **kwargs)

Calculate the exponential of all elements in the input array.

x : ndarray or scalar

Input values.

out : ndarray or None, optional

A location into which the result is stored. If provided, it must have a shape that the inputs broadcast to. If not provided or None, a freshly-allocated array is returned.

out : ndarray or scalar

Output array, element-wise exponential of x. This is a scalar if x is a scalar.

>>> np.exp(1)
2.718281828459045
>>> x = np.array([-1, 1, -2, 2])
>>> np.exp(x)
array([0.36787945, 2.7182817 , 0.13533528, 7.389056  ])

d2l.mxnet.eye(N, M=None, k=0, dtype=<class 'numpy.float32'>, **kwargs)

Return a 2-D array with ones on the diagonal and zeros elsewhere.

N : int

Number of rows in the output.

M : int, optional

Number of columns in the output. If None, defaults to N.

k : int, optional

Index of the diagonal: 0 (the default) refers to the main diagonal, a positive value refers to an upper diagonal, and a negative value to a lower diagonal.

dtype : data-type, optional

Data-type of the returned array.

I : ndarray of shape (N,M)

An array where all elements are equal to zero, except for the k-th diagonal, whose values are equal to one.

>>> np.eye(2, dtype=int)
array([[1, 0],
       [0, 1]], dtype=int64)
>>> np.eye(3, k=1)
array([[0., 1., 0.],
       [0., 0., 1.],
       [0., 0., 0.]])

class d2l.mxnet.float32

Single-precision floating-point number type, compatible with C float. Character code: 'f'. Canonical name: np.single. Alias on this platform: np.float32: 32-bit-precision floating-point number type: sign bit, 8 bits exponent, 23 bits mantissa.

as_integer_ratio()

Return a pair of integers, whose ratio is exactly equal to the original floating point number, and with a positive denominator. Raise OverflowError on infinities and a ValueError on NaNs.

>>> np.single(10.0).as_integer_ratio()
(10, 1)
>>> np.single(0.0).as_integer_ratio()
(0, 1)
>>> np.single(-.25).as_integer_ratio()
(-1, 4)

d2l.mxnet.get_dataloader_workers()[source]

Use 4 processes to read the data except for Windows.

d2l.mxnet.get_fashion_mnist_labels(labels)[source]

Return text labels for the Fashion-MNIST dataset.

d2l.mxnet.grad_clipping(net, theta)[source]

Clip the gradient.
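A widely used form of gradient clipping rescales all gradients so that their combined L2 norm does not exceed a threshold theta. The sketch below shows that idea on a plain list of NumPy arrays; the function name, its signature, and the choice of a global L2 norm are assumptions for this example and differ from the `grad_clipping(net, theta)` signature listed above.

```python
import numpy as np


def clip_gradients(grads, theta):
    """Scale gradients so that their global L2 norm is at most theta."""
    norm = np.sqrt(sum((g ** 2).sum() for g in grads))
    if norm > theta:
        # Shrink every gradient by the same factor to preserve direction.
        grads = [g * theta / norm for g in grads]
    return grads, norm


grads = [np.array([3.0, 0.0]), np.array([[0.0, 4.0]])]
clipped, norm = clip_gradients(grads, theta=1.0)
clipped_norm = np.sqrt(sum((g ** 2).sum() for g in clipped))
```

In this example the original norm is 5, so every gradient is scaled by 1/5 and the clipped norm is 1.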

class d2l.mxnet.int32

Signed integer type, compatible with C int. Character code: 'i'. Canonical name: np.intc. Alias on this platform: np.int32: 32-bit signed integer (-2147483648 to 2147483647).

d2l.mxnet.linreg(X, w, b)[source]

The linear regression model.

d2l.mxnet.linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None, axis=0, ctx=None)

Return evenly spaced numbers over a specified interval.

Returns num evenly spaced samples, calculated over the interval [start, stop]. The endpoint of the interval can optionally be excluded.

start : real number

The starting value of the sequence.

stop : real number

The end value of the sequence, unless endpoint is set to False. In that case, the sequence consists of all but the last of num + 1 evenly spaced samples, so that stop is excluded. Note that the step size changes when endpoint is False.

num : int, optional

Number of samples to generate. Default is 50. Must be non-negative.

endpoint : bool, optional

If True, stop is the last sample. Otherwise, it is not included. Default is True.

retstep : bool, optional

If True, return (samples, step), where step is the spacing between samples.

dtype : dtype, optional

The type of the output array. If dtype is not given, infer the data type from the other input arguments.

axis : int, optional

The axis in the result to store the samples. Relevant only if start or stop are array-like. By default (0), the samples will be along a new axis inserted at the beginning. Use -1 to get an axis at the end.

samples : ndarray

There are num equally spaced samples in the closed interval [start, stop] or the half-open interval [start, stop) (depending on whether endpoint is True or False).

step : float, optional

Only returned if retstep is True. Size of spacing between samples.

arange : Similar to linspace, but uses a step size (instead of the number of samples).

>>> np.linspace(2.0, 3.0, num=5)
array([2.  , 2.25, 2.5 , 2.75, 3.  ])
>>> np.linspace(2.0, 3.0, num=5, endpoint=False)
array([2. , 2.2, 2.4, 2.6, 2.8])
>>> np.linspace(2.0, 3.0, num=5, retstep=True)
(array([2.  , 2.25, 2.5 , 2.75, 3.  ]), 0.25)


Graphical illustration:

>>> import matplotlib.pyplot as plt
>>> N = 8
>>> y = np.zeros(N)
>>> x1 = np.linspace(0, 10, N, endpoint=True)
>>> x2 = np.linspace(0, 10, N, endpoint=False)
>>> plt.plot(x1.asnumpy(), y.asnumpy(), 'o')
[<matplotlib.lines.Line2D object at 0x...>]
>>> plt.plot(x2.asnumpy(), (y + 0.5).asnumpy(), 'o')
[<matplotlib.lines.Line2D object at 0x...>]
>>> plt.ylim([-0.5, 1])
(-0.5, 1)
>>> plt.show()


This function differs from the original numpy.linspace in the following aspects:

• start and stop do not support list, numpy ndarray and mxnet ndarray

• axis could only be 0

• There could be an additional ctx argument to specify the device, e.g. the i-th GPU.
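The remark that the step size changes when endpoint is False can be verified with `retstep=True`. This sketch uses standard NumPy (an assumption; the MXNet variant documented here has the restrictions listed above): with the endpoint included the step is (stop - start)/(num - 1), otherwise (stop - start)/num.

```python
import numpy as np

closed, step_closed = np.linspace(2.0, 3.0, num=5, endpoint=True, retstep=True)
half_open, step_open = np.linspace(2.0, 3.0, num=5, endpoint=False, retstep=True)

# Expected spacings from the two formulas.
expected_closed = (3.0 - 2.0) / (5 - 1)
expected_open = (3.0 - 2.0) / 5
```

Both calls return five samples; only the closed variant ends exactly at `stop`.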

d2l.mxnet.load_array(data_arrays, batch_size, is_train=True)[source]

Construct a Gluon data iterator.

d2l.mxnet.load_corpus_time_machine(max_tokens=-1)[source]

Return token indices and the vocabulary of the time machine dataset.

d2l.mxnet.load_data_bananas(batch_size)[source]

Load the bananas dataset.

d2l.mxnet.load_data_fashion_mnist(batch_size, resize=None)[source]

Download the Fashion-MNIST dataset and then load it into memory.

d2l.mxnet.load_data_nmt(batch_size, num_steps, num_examples=600)[source]

Return the iterator and the vocabularies of the translation dataset.

d2l.mxnet.load_data_snli(batch_size, num_steps=50)[source]

Download the SNLI dataset and return data iterators and vocabulary.

d2l.mxnet.load_data_time_machine(batch_size, num_steps, use_random_iter=False, max_tokens=10000)[source]

Return the iterator and the vocabulary of the time machine dataset.

d2l.mxnet.load_data_voc(batch_size, crop_size)[source]

Download and load the VOC2012 semantic dataset.

d2l.mxnet.log(x, out=None, **kwargs)

Natural logarithm, element-wise. The natural logarithm log is the inverse of the exponential function, so that log(exp(x)) = x. The natural logarithm is logarithm in base e.

x : ndarray

Input value. Elements must be of real value.

out : ndarray or None, optional

A location into which the result is stored. If provided, it must have the same shape and dtype as input ndarray. If not provided or None, a freshly-allocated array is returned.

y : ndarray

The natural logarithm of x, element-wise. This is a scalar if x is a scalar.

Currently only supports data of real values and inf as input. Returns data of real value, inf, -inf and nan according to the input. This function differs from the original numpy.log in the following aspects:

• Does not support complex numbers for now.

• Input type does not support Python native iterables (list, tuple, …).

• out param: cannot perform auto broadcasting. The out ndarray’s shape must be the same as the expected output.

• out param: cannot perform auto type cast. The out ndarray’s dtype must be the same as the expected output.

• out param does not support the scalar input case.

>>> a = np.array([1, np.exp(1), np.exp(2), 0], dtype=np.float64)
>>> np.log(a)
array([  0.,   1.,   2., -inf], dtype=float64)
>>> # Using the default float32 dtype leads to slightly different behavior
>>> a = np.array([1, np.exp(1), np.exp(2), 0])
>>> np.log(a)
array([  0.,  0.99999994,   2., -inf])
>>> np.log(1)
0.0

d2l.mxnet.masked_softmax(X, valid_lens)[source]

Perform softmax operation by masking elements on the last axis.
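Masking on the last axis can be pictured as replacing scores beyond each sequence's valid length with a large negative number before the softmax. The NumPy sketch below supports only a 1-D `valid_lens` with one length per batch element; that restriction and the fill value of -1e6 are assumptions for this example.

```python
import numpy as np


def masked_softmax(X, valid_lens):
    """X: (batch, n_q, n_k) scores; valid_lens: (batch,) ints, each >= 1."""
    positions = np.arange(X.shape[-1])
    # True where the key position is within the valid length of its batch item.
    mask = positions[None, None, :] < valid_lens[:, None, None]
    scores = np.where(mask, X, -1e6)
    # Numerically stable softmax over the last axis.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    return weights / weights.sum(axis=-1, keepdims=True)


X = np.zeros((2, 2, 4))
weights = masked_softmax(X, np.array([2, 3]))
```

With equal scores, the first batch element spreads its weight over two positions and the second over three; masked positions receive weight 0.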

d2l.mxnet.match_anchor_to_bbox(ground_truth, anchors, device, iou_threshold=0.5)[source]

Assign ground-truth bounding boxes to anchor boxes similar to them.

d2l.mxnet.matmul(a, b, out=None)

Dot product of two arrays. Specifically,

• If both a and b are 1-D arrays, it is inner product of vectors

• If both a and b are 2-D arrays, it is matrix multiplication,

• If either a or b is 0-D (scalar), it is equivalent to multiply() and using np.multiply(a, b) or a * b is preferred.

• If a is an N-D array and b is a 1-D array, it is a sum product over the last axis of a and b.

• If a is an N-D array and b is a 2-D array, it is a sum product over the last axis of a and the second-to-last axis of b:

dot(a, b)[i,j,k] = sum(a[i,j,:] * b[:,k])

a : ndarray

First argument.

b : ndarray

Second argument.

out : ndarray, optional

Output argument. It must have the same shape and type as the expected output.

output : ndarray

Returns the dot product of a and b. If a and b are both scalars or both 1-D arrays then a scalar is returned; otherwise an array is returned. If out is given, then it is returned.

>>> a = np.array(3)
>>> b = np.array(4)
>>> np.dot(a, b)
array(12.)


For 2-D arrays it is the matrix product:

>>> a = np.array([[1, 0], [0, 1]])
>>> b = np.array([[4, 1], [2, 2]])
>>> np.dot(a, b)
array([[4., 1.],
       [2., 2.]])

>>> a = np.arange(3*4*5*6).reshape((3,4,5,6))
>>> b = np.arange(5*6)[::-1].reshape((6,5))
>>> np.dot(a, b)[2,3,2,2]
array(29884.)
>>> np.sum(a[2,3,2,:] * b[:,2])
array(29884.)

d2l.mxnet.meshgrid(*xi, **kwargs)[source]

Return coordinate matrices from coordinate vectors.

Make N-D coordinate arrays for vectorized evaluations of N-D scalar/vector fields over N-D grids, given one-dimensional coordinate arrays x1, x2,…, xn.

x1, x2,…, xn : ndarrays

1-D arrays representing the coordinates of a grid.

indexing : {‘xy’, ‘ij’}, optional

Cartesian (‘xy’, default) or matrix (‘ij’) indexing of output. See Notes for more details.

sparse : bool, optional

If True a sparse grid is returned in order to conserve memory. Default is False. Please note that sparse=True is currently not supported.

copy : bool, optional

If False, a view into the original arrays is returned in order to conserve memory. Default is True. Please note that copy=False is currently not supported.

X1, X2,…, XN : ndarray

For vectors x1, x2,…, xn with lengths Ni=len(xi), return (N1, N2, N3,...Nn) shaped arrays if indexing='ij' or (N2, N1, N3,...Nn) shaped arrays if indexing='xy', with the elements of xi repeated to fill the matrix along the first dimension for x1, the second for x2 and so on.

This function supports both indexing conventions through the indexing keyword argument. Giving the string ‘ij’ returns a meshgrid with matrix indexing, while ‘xy’ returns a meshgrid with Cartesian indexing. In the 2-D case with inputs of length M and N, the outputs are of shape (N, M) for ‘xy’ indexing and (M, N) for ‘ij’ indexing. In the 3-D case with inputs of length M, N and P, outputs are of shape (N, M, P) for ‘xy’ indexing and (M, N, P) for ‘ij’ indexing. The difference is illustrated by the following code snippet:

xv, yv = np.meshgrid(x, y, sparse=False, indexing='ij')
for i in range(nx):
    for j in range(ny):
        # treat xv[i,j], yv[i,j]

xv, yv = np.meshgrid(x, y, sparse=False, indexing='xy')
for i in range(nx):
    for j in range(ny):
        # treat xv[j,i], yv[j,i]


In the 1-D and 0-D case, the indexing and sparse keywords have no effect.
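The shape rule for the two indexing conventions can be confirmed with a small runnable example. It uses standard NumPy (an assumption; the MXNet variant documented here does not support sparse=True or copy=False), with inputs of length M = 3 and N = 2.

```python
import numpy as np

x = np.linspace(0, 1, 3)  # M = 3 points along x
y = np.linspace(0, 1, 2)  # N = 2 points along y

xv_xy, yv_xy = np.meshgrid(x, y, indexing='xy')  # Cartesian: shape (N, M)
xv_ij, yv_ij = np.meshgrid(x, y, indexing='ij')  # matrix: shape (M, N)
```

In the 2-D case the two conventions are transposes of each other, so `xv_xy[j, i]` and `xv_ij[i, j]` both equal `x[i]`.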

d2l.mxnet.normal(loc=0.0, scale=1.0, size=None, dtype=None, ctx=None, out=None)[source]

Draw random samples from a normal (Gaussian) distribution.

Samples are distributed according to a normal distribution parametrized by loc (mean) and scale (standard deviation).

loc : float, optional

Mean (centre) of the distribution.

scale : float, optional

Standard deviation (spread or “width”) of the distribution.

size : int or tuple of ints, optional

Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. If size is None (default), a scalar tensor containing a single value is returned if loc and scale are both scalars. Otherwise, np.broadcast(loc, scale).size samples are drawn.

dtype : {‘float16’, ‘float32’, ‘float64’}, optional

Data type of output samples. Default is ‘float32’.

ctx : Context, optional

Device context of output, default is current context.

out : ndarray, optional

Store output to an existing ndarray.

out : ndarray

Drawn samples from the parameterized normal distribution.

The probability density for the Gaussian distribution is

$p(x) = \frac{1}{\sqrt{ 2 \pi \sigma^2 }} e^{ - \frac{ (x - \mu)^2 } {2 \sigma^2} },$ (19.7.1)

where $$\mu$$ is the mean and $$\sigma$$ the standard deviation. The square of the standard deviation, $$\sigma^2$$, is called the variance.

The function has its peak at the mean, and its “spread” increases with the standard deviation (the function reaches 0.607 times its maximum at $$\mu + \sigma$$ and $$\mu - \sigma$$ [2]). This implies that numpy.random.normal is more likely to return samples lying close to the mean, rather than those far away.

[1] Wikipedia, “Normal distribution”, https://en.wikipedia.org/wiki/Normal_distribution

[2] P. R. Peebles Jr., “Central Limit Theorem” in “Probability, Random Variables and Random Signal Principles”, 4th ed., 2001, pp. 51, 51, 125.

>>> mu, sigma = 0, 0.1 # mean and standard deviation
>>> s = np.random.normal(mu, sigma, 1000)


Verify the mean:

>>> np.abs(mu - np.mean(s)) < 0.01
array(True)
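The 0.607 figure quoted above is exp(-1/2), the ratio of the density one standard deviation from the mean to the density at the mean. The sketch below checks that ratio and also the sample spread, using standard NumPy's `default_rng` (an assumption; it is not the MXNet random module documented here).

```python
import numpy as np

mu, sigma = 0.0, 0.1


def pdf(x):
    """Gaussian density with mean mu and standard deviation sigma."""
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)


# Density one standard deviation from the mean, relative to the peak.
ratio = pdf(mu + sigma) / pdf(mu)

rng = np.random.default_rng(0)
s = rng.normal(mu, sigma, 100_000)
sample_mean, sample_std = s.mean(), s.std()
```

With 100,000 samples, both the sample mean and the sample standard deviation should sit well within 0.01 of `mu` and `sigma`.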

d2l.mxnet.ones(shape, dtype=<class 'numpy.float32'>, order='C', ctx=None)

Return a new array of given shape and type, filled with ones. This function currently only supports storing multi-dimensional data in row-major (C-style).

shape : int or tuple of int

The shape of the new array.

dtype : str or numpy.dtype, optional

An optional value type. Default is numpy.float32. Note that this behavior is different from NumPy’s ones function where float64 is the default value, because float32 is considered as the default data type in deep learning.

order : {‘C’}, optional, default: ‘C’

How to store multi-dimensional data in memory, currently only row-major (C-style) is supported.

ctx : Context, optional

An optional device context (default is the current default context).

out : ndarray

Array of ones with the given shape, dtype, and ctx.

>>> np.ones(5)
array([1., 1., 1., 1., 1.])

>>> np.ones((5,), dtype=int)
array([1, 1, 1, 1, 1], dtype=int64)

>>> np.ones((2, 1))
array([[1.],
       [1.]])

>>> s = (2,2)
>>> np.ones(s)
array([[1., 1.],
       [1., 1.]])

d2l.mxnet.plot(X, Y=None, xlabel=None, ylabel=None, legend=None, xlim=None, ylim=None, xscale='linear', yscale='linear', fmts=('-', 'm--', 'g-.', 'r:'), figsize=(3.5, 2.5), axes=None)[source]

Plot data points.

d2l.mxnet.predict_ch3(net, test_iter, n=6)[source]

Predict labels (defined in Chapter 3).

d2l.mxnet.predict_ch8(prefix, num_preds, net, vocab, device)[source]

Generate new characters following the prefix.

d2l.mxnet.predict_seq2seq(net, src_sentence, src_vocab, tgt_vocab, num_steps, device, save_attention_weights=False)[source]

Predict for sequence to sequence.

d2l.mxnet.preprocess_nmt(text)[source]

Preprocess the English-French dataset.

d2l.mxnet.rand(*size, **kwargs)[source]

Random values in a given shape.

Create an array of the given shape and populate it with random samples from a uniform distribution over [0, 1).

d0, d1, …, dn : int, optional

The dimensions of the returned array, should be all positive. If no argument is given a single Python float is returned.

out : ndarray

Random values.

>>> np.random.rand(3,2)
array([[ 0.14022471,  0.96360618],  #random
       [ 0.37601032,  0.25528411],  #random
       [ 0.49313049,  0.94909878]]) #random

d2l.mxnet.read_csv_labels(fname)[source]

Read fname to return a name to label dictionary.

d2l.mxnet.read_data_bananas(is_train=True)[source]

Read the bananas dataset images and labels.

d2l.mxnet.read_data_nmt()[source]

Load the English-French dataset.

d2l.mxnet.read_snli(data_dir, is_train)[source]

Read the SNLI dataset into premises, hypotheses, and labels.

d2l.mxnet.read_time_machine()[source]

Load the time machine dataset into a list of text lines.

d2l.mxnet.read_voc_images(voc_dir, is_train=True)[source]

Read all VOC feature and label images.

d2l.mxnet.resnet18(num_classes)[source]

A slightly modified ResNet-18 model.

d2l.mxnet.seq_data_iter_random(corpus, batch_size, num_steps)[source]

Generate a minibatch of subsequences using random sampling.

d2l.mxnet.seq_data_iter_sequential(corpus, batch_size, num_steps)[source]

Generate a minibatch of subsequences using sequential partitioning.

d2l.mxnet.set_axes(axes, xlabel, ylabel, xlim, ylim, xscale, yscale, legend)[source]

Set the axes for matplotlib.

d2l.mxnet.set_figsize(figsize=(3.5, 2.5))[source]

Set the figure size for matplotlib.

d2l.mxnet.sgd(params, lr, batch_size)[source]

Minibatch stochastic gradient descent.

d2l.mxnet.show_bboxes(axes, bboxes, labels=None, colors=None)[source]

Show bounding boxes.

d2l.mxnet.show_images(imgs, num_rows, num_cols, titles=None, scale=1.5)[source]

Plot a list of images.

d2l.mxnet.show_trace_2d(f, results)[source]

Show the trace of 2D variables during optimization.

d2l.mxnet.sin(x, out=None, **kwargs)

Trigonometric sine, element-wise.

xndarray or scalar

Angle, in radians ($$2 \pi$$ rad equals 360 degrees).

outndarray or None

A location into which the result is stored. If provided, it must have a shape that the inputs broadcast to. If not provided or None, a freshly-allocated array is returned. The dtype of the output is the same as that of the input if the input is an ndarray.

y : ndarray or scalar

The sine of each element of x. This is a scalar if x is a scalar.

This function only supports input type of float.

>>> np.sin(np.pi/2.)
1.0
>>> np.sin(np.array((0., 30., 45., 60., 90.)) * np.pi / 180.)
array([0.        , 0.5       , 0.70710677, 0.86602545, 1.        ])

d2l.mxnet.sinh(x, out=None, **kwargs)

Hyperbolic sine, element-wise. Equivalent to 1/2 * (np.exp(x) - np.exp(-x)) or -1j * np.sin(1j*x).

x : ndarray or scalar

Input array or scalar.

out : ndarray or None

A location into which the result is stored. If provided, it must have a shape that the inputs broadcast to. If not provided or None, a freshly-allocated array is returned. The dtype of the output is the same as that of the input if the input is an ndarray.

y : ndarray or scalar

The corresponding hyperbolic sine values. This is a scalar if x is a scalar.

This function only supports input type of float.

>>> np.sinh(0)
0.0
>>> # Example of providing the optional output parameter
>>> out1 = np.array([0], dtype='f')
>>> out2 = np.sinh(np.array([0.1]), out1)
>>> out2 is out1
True
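
The two equivalent forms of the hyperbolic sine quoted above can be checked numerically. The sketch uses the standard-library `math` and `cmath` modules rather than `d2l.mxnet`, because the MXNet functions documented here accept only float input and the second identity needs complex arithmetic.

```python
import cmath
import math

x = 0.75
# First form: 1/2 * (exp(x) - exp(-x))
via_exp = 0.5 * (math.exp(x) - math.exp(-x))
# Second form: -1j * sin(1j * x), evaluated with complex arithmetic
via_sin = -1j * cmath.sin(1j * x)
```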

d2l.mxnet.split_batch(X, y, devices)[source]

Split X and y into multiple devices.

d2l.mxnet.split_batch_multi_inputs(X, y, devices)[source]

Split multi-input X and y into multiple devices.

d2l.mxnet.split_data_ml100k(data, num_users, num_items, split_mode='random', test_ratio=0.1)[source]

Split the dataset in random mode or seq-aware mode.

d2l.mxnet.squared_loss(y_hat, y)[source]

Squared loss.
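
A small sketch of an element-wise squared loss on Python lists. The factor of 1/2 is an assumption based on the usual definition in the book; the name `squared_loss_sketch` is hypothetical.

```python
def squared_loss_sketch(y_hat, y):
    """Element-wise (y_hat - y)^2 / 2, without reduction."""
    return [(p - t) ** 2 / 2 for p, t in zip(y_hat, y)]


losses = squared_loss_sketch([2.5, 0.0], [1.5, 2.0])
```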

d2l.mxnet.stack(arrays, axis=0, out=None)

Join a sequence of arrays along a new axis.

The axis parameter specifies the index of the new axis in the dimensions of the result. For example, if axis=0 it will be the first dimension and if axis=-1 it will be the last dimension.

arrays : sequence of array_like

Each array must have the same shape.

axis : int, optional

The axis in the result array along which the input arrays are stacked.

out : ndarray, optional

If provided, the destination to place the result. The shape must be correct, matching that of what stack would have returned if no out argument were specified.

stacked : ndarray

The stacked array has one more dimension than the input arrays.

concatenate : Join a sequence of arrays along an existing axis.

split : Split array into a list of multiple sub-arrays of equal size.

>>> arrays = [np.random.rand(3, 4) for _ in range(10)]
>>> np.stack(arrays, axis=0).shape
(10, 3, 4)

>>> np.stack(arrays, axis=1).shape
(3, 10, 4)

>>> np.stack(arrays, axis=2).shape
(3, 4, 10)

>>> a = np.array([1, 2, 3])
>>> b = np.array([2, 3, 4])
>>> np.stack((a, b))
array([[1., 2., 3.],
       [2., 3., 4.]])

>>> np.stack((a, b), axis=-1)
array([[1., 2.],
       [2., 3.],
       [3., 4.]])

d2l.mxnet.synthetic_data(w, b, num_examples)[source]

Generate y = Xw + b + noise.

d2l.mxnet.tanh(x, out=None, **kwargs)

Compute hyperbolic tangent element-wise. Equivalent to np.sinh(x)/np.cosh(x).

x : ndarray or scalar

Input array.

out : ndarray or None

A location into which the result is stored. If provided, it must have a shape that the inputs broadcast to. If not provided or None, a freshly-allocated array is returned. The dtype of the output and input must be the same.

y : ndarray or scalar

The corresponding hyperbolic tangent values.

• If out is provided, the function writes the result into it and returns a reference to out (see the examples).

• Input x does not support complex computation (such as imaginary numbers):

>>> np.tanh(np.pi*1j)
TypeError: type <type 'complex'> not supported

>>> np.tanh(np.array([0, np.pi]))
array([0.       , 0.9962721])
>>> np.tanh(np.pi)
0.99627207622075
>>> # Example of providing the optional output parameter illustrating
>>> # that what is returned is a reference to said parameter
>>> out1 = np.array(1)
>>> out2 = np.tanh(np.array(0.1), out1)
>>> out2 is out1
True

d2l.mxnet.tensor(object, dtype=None, ctx=None)

Create an array.

object : array_like or numpy.ndarray or mxnet.numpy.ndarray

An array, any object exposing the array interface, an object whose __array__ method returns an array, or any (nested) sequence.

dtype : data-type, optional

The desired data-type for the array. Default is float32.

ctx : device context, optional

Device context on which the memory is allocated. Default is mxnet.context.current_context().

out : ndarray

An array object satisfying the specified requirements.

>>> np.array([1, 2, 3])
array([1., 2., 3.])

>>> np.array([[1, 2], [3, 4]])
array([[1., 2.],
       [3., 4.]])

>>> np.array([[1, 0], [0, 1]], dtype=bool)
array([[ True, False],
       [False,  True]])

d2l.mxnet.tokenize(lines, token='word')[source]

Split text lines into word or character tokens.
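
The following sketch shows one way such tokenization can work on plain strings: whitespace splitting for `'word'` and per-character splitting for `'char'`. The name `tokenize_sketch` and the error handling are assumptions for illustration, not the d2l source.

```python
def tokenize_sketch(lines, token='word'):
    """Split each text line into word or character tokens."""
    if token == 'word':
        return [line.split() for line in lines]
    if token == 'char':
        return [list(line) for line in lines]
    raise ValueError(f'unknown token type: {token}')


words = tokenize_sketch(['the time machine', 'by h g wells'])
chars = tokenize_sketch(['hi'], token='char')
```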

d2l.mxnet.tokenize_nmt(text, num_examples=None)[source]

Tokenize the English-French dataset.

d2l.mxnet.train_2d(trainer, steps=20)[source]

Optimize a 2-dim objective function with a customized trainer.

d2l.mxnet.train_ch3(net, train_iter, test_iter, loss, num_epochs, updater)[source]

Train a model (defined in Chapter 3).

d2l.mxnet.train_ch6(net, train_iter, test_iter, num_epochs, lr, device=gpu(0))[source]

Train a model with a GPU (defined in Chapter 6).

d2l.mxnet.train_ch8(net, train_iter, vocab, lr, num_epochs, device, use_random_iter=False)[source]

Train a model (defined in Chapter 8).

d2l.mxnet.train_epoch_ch3(net, train_iter, loss, updater)[source]

Train a model within one epoch (defined in Chapter 3).

d2l.mxnet.train_epoch_ch8(net, train_iter, loss, updater, device, use_random_iter)[source]

Train a model within one epoch (defined in Chapter 8).

d2l.mxnet.train_seq2seq(net, data_iter, lr, num_epochs, tgt_vocab, device)[source]

Train a model for sequence to sequence.

d2l.mxnet.transpose_output(X, num_heads)[source]

Reverse the operation of transpose_qkv.

d2l.mxnet.truncate_pad(line, num_steps, padding_token)[source]

Truncate or pad sequences.
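
A minimal sketch of truncating or padding a token list to a fixed length. The name `truncate_pad_sketch` is hypothetical; the argument order mirrors the signature above.

```python
def truncate_pad_sketch(line, num_steps, padding_token):
    """Return a list of exactly num_steps tokens."""
    if len(line) > num_steps:
        return line[:num_steps]                                  # truncate
    return line + [padding_token] * (num_steps - len(line))      # pad


padded = truncate_pad_sketch([47, 4], 5, 1)
truncated = truncate_pad_sketch([47, 4, 9, 3, 8, 6], 5, 1)
```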

d2l.mxnet.try_all_gpus()[source]

Return all available GPUs, or [cpu()] if no GPU exists.

d2l.mxnet.try_gpu(i=0)[source]

Return gpu(i) if it exists, otherwise return cpu().

d2l.mxnet.update_D(X, Z, net_D, net_G, loss, trainer_D)[source]

Update discriminator.

d2l.mxnet.update_G(Z, net_D, net_G, loss, trainer_G)[source]

Update generator.

d2l.mxnet.use_svg_display()[source]

Use the svg format to display a plot in Jupyter.

d2l.mxnet.voc_label_indices(colormap, colormap2label)[source]

Map an RGB color to a label.

d2l.mxnet.voc_rand_crop(feature, label, height, width)[source]

Randomly crop for both feature and label images.

d2l.mxnet.zeros(shape, dtype=None, order='C', ctx=None)

Return a new array of given shape and type, filled with zeros. This function currently only supports storing multi-dimensional data in row-major (C-style).

shape : int or tuple of int

The shape of the new array.

dtype : str or numpy.dtype, optional

An optional value type (default is numpy.float32). Note that this behavior is different from NumPy’s zeros function where float64 is the default value, because float32 is considered as the default data type in deep learning.

order : {'C'}, optional, default: 'C'

How to store multi-dimensional data in memory, currently only row-major (C-style) is supported.

ctx : Context, optional

An optional device context (default is the current default context).

out : ndarray

Array of zeros with the given shape, dtype, and ctx.

>>> np.zeros(5)
array([0., 0., 0., 0., 0.])

>>> np.zeros((5,), dtype=int)
array([0, 0, 0, 0, 0], dtype=int64)

>>> np.zeros((2, 1))
array([[0.],
       [0.]])

class d2l.torch.Accumulator(n)[source]

For accumulating sums over n variables.
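
One possible shape for such an accumulator is sketched below: it keeps `n` running sums and adds one value to each per call. The class name `AccumulatorSketch` and the methods `add`, `reset`, and `__getitem__` are assumptions for illustration, since only the constructor is listed here.

```python
class AccumulatorSketch:
    """Keep n running sums."""

    def __init__(self, n):
        self.data = [0.0] * n

    def add(self, *args):
        # Add one new value to each of the n running sums.
        self.data = [a + float(b) for a, b in zip(self.data, args)]

    def reset(self):
        self.data = [0.0] * len(self.data)

    def __getitem__(self, idx):
        return self.data[idx]


metric = AccumulatorSketch(2)   # e.g. (correct predictions, total predictions)
metric.add(3, 4)
metric.add(1, 6)
accuracy = metric[0] / metric[1]
```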

class d2l.torch.AddNorm(normalized_shape, dropout, **kwargs)[source]
forward(X, Y)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
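
The note above can be illustrated with a toy class that mimics the mechanism. `TinyModule` is a hypothetical stand-in written for this sketch, not `torch.nn.Module`: calling the instance runs `forward` and then the registered hooks, whereas calling `forward` directly skips the hooks.

```python
class TinyModule:
    """Toy stand-in for a module with forward hooks."""

    def __init__(self):
        self.forward_hooks = []

    def forward(self, x):
        return 2 * x

    def __call__(self, x):
        out = self.forward(x)
        for hook in self.forward_hooks:   # hooks run only on instance calls
            hook(self, x, out)
        return out


calls = []
module = TinyModule()
module.forward_hooks.append(lambda mod, inp, out: calls.append((inp, out)))

via_call = module(3)              # runs forward and the hook
via_forward = module.forward(3)   # same result, but the hook is skipped
```

Both calls return the same value, yet only the instance call is recorded by the hook.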

class d2l.torch.AdditiveAttention(key_size, query_size, num_hiddens, dropout, **kwargs)[source]
forward(queries, keys, values, valid_lens)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class d2l.torch.Animator(xlabel=None, ylabel=None, legend=None, xlim=None, ylim=None, xscale='linear', yscale='linear', fmts=('-', 'm--', 'g-.', 'r:'), nrows=1, ncols=1, figsize=(3.5, 2.5))[source]

For plotting data in animation.

class d2l.torch.AttentionDecoder(**kwargs)[source]

The base attention-based decoder interface.

class d2l.torch.BERTEncoder(vocab_size, num_hiddens, norm_shape, ffn_num_input, ffn_num_hiddens, num_heads, num_layers, dropout, max_len=1000, key_size=768, query_size=768, value_size=768, **kwargs)[source]
forward(tokens, segments, valid_lens)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class d2l.torch.BERTModel(vocab_size, num_hiddens, norm_shape, ffn_num_input, ffn_num_hiddens, num_heads, num_layers, dropout, max_len=1000, key_size=768, query_size=768, value_size=768, hid_in_features=768, mlm_in_features=768, nsp_in_features=768)[source]
forward(tokens, segments, valid_lens=None, pred_positions=None)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class d2l.torch.BananasDataset(*args, **kwds)[source]

class d2l.torch.Decoder(**kwargs)[source]

The base decoder interface for the encoder-decoder architecture.

forward(X, state)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class d2l.torch.DotProductAttention(dropout, **kwargs)[source]

Scaled dot product attention.

forward(queries, keys, values, valid_lens=None)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class d2l.torch.Encoder(**kwargs)[source]

The base encoder interface for the encoder-decoder architecture.

forward(X, *args)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class d2l.torch.EncoderBlock(key_size, query_size, value_size, num_hiddens, norm_shape, ffn_num_input, ffn_num_hiddens, num_heads, dropout, use_bias=False, **kwargs)[source]
forward(X, valid_lens)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class d2l.torch.EncoderDecoder(encoder, decoder, **kwargs)[source]

The base class for the encoder-decoder architecture.

forward(enc_X, dec_X, *args)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class d2l.torch.MaskLM(vocab_size, num_hiddens, num_inputs=768, **kwargs)[source]
forward(X, pred_positions)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class d2l.torch.MaskedSoftmaxCELoss(weight: Optional[torch.Tensor] = None, size_average=None, ignore_index: int = -100, reduce=None, reduction: str = 'mean')[source]

The softmax cross-entropy loss with masks.

forward(pred, label, valid_len)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class d2l.torch.MultiHeadAttention(key_size, query_size, value_size, num_hiddens, num_heads, dropout, bias=False, **kwargs)[source]
forward(queries, keys, values, valid_lens)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class d2l.torch.NextSentencePred(num_inputs, **kwargs)[source]
forward(X)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class d2l.torch.PositionWiseFFN(ffn_num_input, ffn_num_hiddens, ffn_num_outputs, **kwargs)[source]
forward(X)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class d2l.torch.PositionalEncoding(num_hiddens, dropout, max_len=1000)[source]
forward(X)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class d2l.torch.RNNModel(rnn_layer, vocab_size, **kwargs)[source]

The RNN model.

forward(inputs, state)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class d2l.torch.RNNModelScratch(vocab_size, num_hiddens, device, get_params, init_state, forward_fn)[source]

An RNN model implemented from scratch.

class d2l.torch.RandomGenerator(sampling_weights)[source]

Draw a random int in [0, n] according to n sampling weights.

class d2l.torch.Residual(input_channels, num_channels, use_1x1conv=False, strides=1)[source]

The Residual block of ResNet.

forward(X)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class d2l.torch.SNLIDataset(*args, **kwds)[source]

A customized dataset to load the SNLI dataset.

class d2l.torch.Seq2SeqEncoder(vocab_size, embed_size, num_hiddens, num_layers, dropout=0, **kwargs)[source]

The RNN encoder for sequence to sequence learning.

forward(X, *args)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class d2l.torch.SeqDataLoader(batch_size, num_steps, use_random_iter, max_tokens)[source]

An iterator to load sequence data.

class d2l.torch.Timer[source]

Record multiple running times.

avg()[source]

Return the average time.

cumsum()[source]

Return the accumulated time.

start()[source]

Start the timer.

stop()[source]

Stop the timer and record the time in a list.

sum()[source]

Return the sum of time.
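
A compact sketch of a timer with the five methods listed above. The class name `TimerSketch`, the `times` attribute, and the use of `time.perf_counter()` are choices made for this illustration and may differ from the d2l source.

```python
import time


class TimerSketch:
    """Record multiple running times."""

    def __init__(self):
        self.times = []
        self.start()

    def start(self):
        self.tik = time.perf_counter()

    def stop(self):
        # Record the elapsed time since the last start() and return it.
        self.times.append(time.perf_counter() - self.tik)
        return self.times[-1]

    def avg(self):
        return sum(self.times) / len(self.times)

    def sum(self):
        return sum(self.times)

    def cumsum(self):
        total, out = 0.0, []
        for t in self.times:
            total += t
            out.append(total)
        return out


timer = TimerSketch()
workload = sum(i * i for i in range(1000))
first = timer.stop()
timer.start()
second = timer.stop()
```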

class d2l.torch.TokenEmbedding(embedding_name)[source]

Token Embedding.

class d2l.torch.TransformerEncoder(vocab_size, key_size, query_size, value_size, num_hiddens, norm_shape, ffn_num_input, ffn_num_hiddens, num_heads, num_layers, dropout, use_bias=False, **kwargs)[source]
forward(X, valid_lens, *args)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class d2l.torch.VOCSegDataset(*args, **kwds)[source]

A customized dataset to load VOC dataset.

class d2l.torch.Vocab(tokens=None, min_freq=0, reserved_tokens=None)[source]

Vocabulary for text.

d2l.torch.abs(input, *, out=None) → Tensor

Computes the absolute value of each element in input.

$\text{out}_{i} = |\text{input}_{i}|$ (19.7.2)

Args:

input (Tensor): the input tensor.

Keyword args:

out (Tensor, optional): the output tensor.

Example:

>>> torch.abs(torch.tensor([-1, -2, 3]))
tensor([ 1,  2,  3])

d2l.torch.accuracy(y_hat, y)[source]

Compute the number of correct predictions.

d2l.torch.arange(start=0, end, step=1, *, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False) → Tensor

Returns a 1-D tensor of size $$\left\lceil \frac{\text{end} - \text{start}}{\text{step}} \right\rceil$$ with values from the interval [start, end) taken with common difference step beginning from start.

Note that non-integer step is subject to floating point rounding errors when comparing against end; to avoid inconsistency, we advise adding a small epsilon to end in such cases.

$\text{out}_{i+1} = \text{out}_{i} + \text{step}$ (19.7.3)

Args:

start (Number): the starting value for the set of points. Default: 0.

end (Number): the ending value for the set of points.

step (Number): the gap between each pair of adjacent points. Default: 1.

Keyword args:

out (Tensor, optional): the output tensor.

dtype (torch.dtype, optional): the desired data type of returned tensor. Default: if None, uses a global default (see torch.set_default_tensor_type()). If dtype is not given, infer the data type from the other input arguments. If any of start, end, or step are floating-point, the dtype is inferred to be the default dtype, see get_default_dtype(). Otherwise, the dtype is inferred to be torch.int64.

layout (torch.layout, optional): the desired layout of returned Tensor. Default: torch.strided.

device (torch.device, optional): the desired device of returned tensor. Default: if None, uses the current device for the default tensor type (see torch.set_default_tensor_type()). device will be the CPU for CPU tensor types and the current CUDA device for CUDA tensor types.

requires_grad (bool, optional): If autograd should record operations on the returned tensor. Default: False.

Example:

>>> torch.arange(5)
tensor([ 0,  1,  2,  3,  4])
>>> torch.arange(1, 4)
tensor([ 1,  2,  3])
>>> torch.arange(1, 2.5, 0.5)
tensor([ 1.0000,  1.5000,  2.0000])
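
The size formula for arange can be worked through in plain Python. The helpers `arange_size` and `arange_values` are hypothetical names used only for this sketch; generating values as `start + i * step` is one simple choice and not necessarily how PyTorch computes them.

```python
import math


def arange_size(start, end, step):
    """Number of elements: ceil((end - start) / step)."""
    return math.ceil((end - start) / step)


def arange_values(start, end, step):
    """Values in the half-open interval [start, end) with spacing step."""
    return [start + i * step for i in range(arange_size(start, end, step))]


sizes = [arange_size(0, 5, 1), arange_size(1, 4, 1), arange_size(1, 2.5, 0.5)]
values = arange_values(1, 2.5, 0.5)
```

The three sizes correspond to the three calls in the example above, and the end point 2.5 is excluded from the last one.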

d2l.torch.bbox_to_rect(bbox, color)[source]

Convert bounding box to matplotlib format.

d2l.torch.bleu(pred_seq, label_seq, k)[source]

Compute the BLEU.
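
A sketch of a BLEU-style score on whitespace-separated strings. It assumes the definition used in the book: a brevity penalty `exp(min(0, 1 - len_label / len_pred))` multiplied by the n-gram precisions `p_n` raised to the power `(1/2)^n` for `n = 1..k`. The name `bleu_sketch` is hypothetical, and the sketch expects the prediction to contain at least `k` tokens.

```python
import collections
import math


def bleu_sketch(pred_seq, label_seq, k):
    """BLEU-style score using n-grams up to length k."""
    pred_tokens, label_tokens = pred_seq.split(' '), label_seq.split(' ')
    len_pred, len_label = len(pred_tokens), len(label_tokens)
    # Brevity penalty: predictions shorter than the label are penalized.
    score = math.exp(min(0, 1 - len_label / len_pred))
    for n in range(1, k + 1):
        num_matches = 0
        label_subs = collections.defaultdict(int)
        for i in range(len_label - n + 1):
            label_subs[' '.join(label_tokens[i:i + n])] += 1
        for i in range(len_pred - n + 1):
            ngram = ' '.join(pred_tokens[i:i + n])
            if label_subs[ngram] > 0:       # each label n-gram matches once
                num_matches += 1
                label_subs[ngram] -= 1
        score *= math.pow(num_matches / (len_pred - n + 1), math.pow(0.5, n))
    return score


perfect = bleu_sketch('a b c d', 'a b c d', k=2)
short = bleu_sketch('a b', 'a b c d', k=1)
```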

d2l.torch.box_center_to_corner(boxes)[source]

Convert from (center, width, height) to (upper_left, bottom_right).

d2l.torch.box_corner_to_center(boxes)[source]

Convert from (upper_left, bottom_right) to (center, width, height).
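
The two box conversions are inverses of each other. The sketch below shows them for a single box given as a 4-tuple; the function names are hypothetical, and the documented functions operate on batches of boxes rather than single tuples.

```python
def center_to_corner(box):
    """(cx, cy, w, h) -> (x1, y1, x2, y2)."""
    cx, cy, w, h = box
    return (cx - 0.5 * w, cy - 0.5 * h, cx + 0.5 * w, cy + 0.5 * h)


def corner_to_center(box):
    """(x1, y1, x2, y2) -> (cx, cy, w, h)."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1)


corner = (1.0, 2.0, 5.0, 8.0)
center = corner_to_center(corner)
round_trip = center_to_corner(center)
```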

d2l.torch.box_iou(boxes1, boxes2)[source]

Compute IOU between two sets of boxes of shape (N,4) and (M,4).
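
A pure-Python sketch of pairwise intersection over union. It assumes boxes are in corner format `(x1, y1, x2, y2)`; the names `iou` and `pairwise_iou` are hypothetical. For `N` boxes in the first set and `M` in the second, the result has `N` rows and `M` columns.

```python
def iou(box_a, box_b):
    """Intersection over union of two corner-format boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap, clamped at zero when disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    return inter / (area_a + area_b - inter)


def pairwise_iou(boxes1, boxes2):
    """Return an N x M nested list of IoU values."""
    return [[iou(a, b) for b in boxes2] for a in boxes1]


ious = pairwise_iou([(0, 0, 2, 2)], [(1, 1, 3, 3), (0, 0, 2, 2), (5, 5, 6, 6)])
```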

d2l.torch.build_array_nmt(lines, vocab, num_steps)[source]

Transform text sequences of machine translation into minibatches.

d2l.torch.build_colormap2label()[source]

Build an RGB color to label mapping for segmentation.

d2l.torch.concat()

cat(tensors, dim=0, *, out=None) -> Tensor

Concatenates the given sequence of tensors in the given dimension. All tensors must either have the same shape (except in the concatenating dimension) or be empty.

torch.cat() can be seen as an inverse operation for torch.split() and torch.chunk().

torch.cat() can be best understood via examples.

Args:

tensors (sequence of Tensors): any python sequence of tensors of the same type. Non-empty tensors provided must have the same shape, except in the cat dimension.

dim (int, optional): the dimension over which the tensors are concatenated

Keyword args:

out (Tensor, optional): the output tensor.

Example:

>>> x = torch.randn(2, 3)
>>> x
tensor([[ 0.6580, -1.0969, -0.4614],
        [-0.1034, -0.5790,  0.1497]])
>>> torch.cat((x, x, x), 0)
tensor([[ 0.6580, -1.0969, -0.4614],
        [-0.1034, -0.5790,  0.1497],
        [ 0.6580, -1.0969, -0.4614],
        [-0.1034, -0.5790,  0.1497],
        [ 0.6580, -1.0969, -0.4614],
        [-0.1034, -0.5790,  0.1497]])
>>> torch.cat((x, x, x), 1)
tensor([[ 0.6580, -1.0969, -0.4614,  0.6580, -1.0969, -0.4614,  0.6580,
         -1.0969, -0.4614],
        [-0.1034, -0.5790,  0.1497, -0.1034, -0.5790,  0.1497, -0.1034,
         -0.5790,  0.1497]])

d2l.torch.copyfile(filename, target_dir)[source]

Copy a file into a target directory.

d2l.torch.corr2d(X, K)[source]

Compute 2D cross-correlation.
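
A nested-list sketch of 2D cross-correlation: the kernel `K` slides over the input `X`, and each output entry is the sum of element-wise products over the current window. The name `corr2d_sketch` is hypothetical; the documented function works on tensors.

```python
def corr2d_sketch(X, K):
    """Cross-correlate 2D input X with 2D kernel K (no padding, stride 1)."""
    h, w = len(K), len(K[0])
    out_h, out_w = len(X) - h + 1, len(X[0]) - w + 1
    Y = [[0.0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            # Sum of element-wise products over the h x w window at (i, j).
            Y[i][j] = sum(X[i + a][j + b] * K[a][b]
                          for a in range(h) for b in range(w))
    return Y


X = [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0], [6.0, 7.0, 8.0]]
K = [[0.0, 1.0], [2.0, 3.0]]
Y = corr2d_sketch(X, K)
```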

d2l.torch.cos(input, *, out=None) → Tensor

Returns a new tensor with the cosine of the elements of input.

$\text{out}_{i} = \cos(\text{input}_{i})$ (19.7.4)

Args:

input (Tensor): the input tensor.

Keyword args:

out (Tensor, optional): the output tensor.

Example:

>>> a = torch.randn(4)
>>> a
tensor([ 1.4309,  1.2706, -0.8562,  0.9796])
>>> torch.cos(a)
tensor([ 0.1395,  0.2957,  0.6553,  0.5574])

d2l.torch.cosh(input, *, out=None) → Tensor

Returns a new tensor with the hyperbolic cosine of the elements of input.

$\text{out}_{i} = \cosh(\text{input}_{i})$ (19.7.5)

Args:

input (Tensor): the input tensor.

Keyword args:

out (Tensor, optional): the output tensor.

Example:

>>> a = torch.randn(4)
>>> a
tensor([ 0.1632,  1.1835, -0.6979, -0.7325])
>>> torch.cosh(a)
tensor([ 1.0133,  1.7860,  1.2536,  1.2805])


Note

When input is on the CPU, the implementation of torch.cosh may use the Sleef library, which rounds very large results to infinity or negative infinity.

d2l.torch.count_corpus(tokens)[source]

Count token frequencies.

class d2l.torch.defaultdict

defaultdict(default_factory[, …]) --> dict with default factory

The default factory is called without arguments to produce a new value when a key is not present, in __getitem__ only. A defaultdict compares equal to a dict with the same items. All remaining arguments are treated the same as if they were passed to the dict constructor, including keyword arguments.

copy() → a shallow copy of D.

default_factory

Factory for default value called by __missing__().
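
A short standard-library example of the behaviour described above: the factory is invoked only by `__getitem__` on a missing key, so `.get()` and membership tests do not create entries, and the result compares equal to a plain dict with the same items.

```python
from collections import defaultdict

counts = defaultdict(int)          # int() returns 0 for missing keys
for token in ['a', 'b', 'a']:
    counts[token] += 1             # __getitem__ triggers the factory

missing = counts.get('z')          # .get() does not call the factory
has_z = 'z' in counts              # so 'z' was never inserted
```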

d2l.torch.download(name, cache_dir='../data')[source]

Download a file inserted into DATA_HUB, return the local filename.

d2l.torch.download_all()[source]

Download all files in the DATA_HUB.

d2l.torch.download_extract(name, folder=None)[source]

Download and extract a zip/tar file.

d2l.torch.evaluate_accuracy(net, data_iter)[source]

Compute the accuracy for a model on a dataset.

d2l.torch.evaluate_accuracy_gpu(net, data_iter, device=None)[source]

Compute the accuracy for a model on a dataset using a GPU.

d2l.torch.evaluate_loss(net, data_iter, loss)[source]

Evaluate the loss of a model on the given dataset.

d2l.torch.exp(input, *, out=None) → Tensor

Returns a new tensor with the exponential of the elements of the input tensor input.

$y_{i} = e^{x_{i}}$ (19.7.6)

Args:

input (Tensor): the input tensor.

Keyword args:

out (Tensor, optional): the output tensor.

Example:

>>> torch.exp(torch.tensor([0, math.log(2.)]))
tensor([ 1.,  2.])

d2l.torch.eye(n, m=None, *, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False) → Tensor

Returns a 2-D tensor with ones on the diagonal and zeros elsewhere.

Args:

n (int): the number of rows

m (int, optional): the number of columns with default being n

out (Tensor, optional): the output tensor.

dtype (torch.dtype, optional): the desired data type of returned tensor. Default: if None, uses a global default (see torch.set_default_tensor_type()).

layout (torch.layout, optional): the desired layout of returned Tensor. Default: torch.strided.

device (torch.device, optional): the desired device of returned tensor. Default: if None, uses the current device for the default tensor type (see torch.set_default_tensor_type()). device will be the CPU for CPU tensor types and the current CUDA device for CUDA tensor types.

requires_grad (bool, optional): If autograd should record operations on the returned tensor. Default: False.

Returns:

Tensor: A 2-D tensor with ones on the diagonal and zeros elsewhere

Example:

>>> torch.eye(3)
tensor([[ 1.,  0.,  0.],
        [ 0.,  1.,  0.],
        [ 0.,  0.,  1.]])

d2l.torch.get_dataloader_workers()[source]

Use 4 processes to read the data.

d2l.torch.get_fashion_mnist_labels(labels)[source]

Return text labels for the Fashion-MNIST dataset.

d2l.torch.grad_clipping(net, theta)[source]

Clip the gradient.
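
A sketch of gradient clipping by global norm on a flat list of gradient values: if the L2 norm exceeds `theta`, every gradient is rescaled so that the norm equals `theta`. The name `clip_gradients` is hypothetical, and treating the gradients as one flat list is a simplification of what a network's parameters look like.

```python
import math


def clip_gradients(grads, theta):
    """Rescale grads so that their L2 norm is at most theta."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm > theta:
        return [g * theta / norm for g in grads]
    return list(grads)


clipped = clip_gradients([3.0, 4.0], theta=1.0)      # norm 5 -> rescaled
unchanged = clip_gradients([0.3, 0.4], theta=1.0)    # norm 0.5 -> kept
```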

d2l.torch.linreg(X, w, b)[source]

The linear regression model.

d2l.torch.linspace(start, end, steps, *, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False) → Tensor

Creates a one-dimensional tensor of size steps whose values are evenly spaced from start to end, inclusive. That is, the values are:

$(\text{start}, \text{start} + \frac{\text{end} - \text{start}}{\text{steps} - 1}, \ldots, \text{start} + (\text{steps} - 2) * \frac{\text{end} - \text{start}}{\text{steps} - 1}, \text{end})$ (19.7.7)

Warning

Not providing a value for steps is deprecated. For backwards compatibility, not providing a value for steps will create a tensor with 100 elements. Note that this behavior is not reflected in the documented function signature and should not be relied on. In a future PyTorch release, failing to provide a value for steps will throw a runtime error.

Args:

start (float): the starting value for the set of points

end (float): the ending value for the set of points

steps (int): size of the constructed tensor

out (Tensor, optional): the output tensor.

dtype (torch.dtype, optional): the desired data type of returned tensor. Default: if None, uses a global default (see torch.set_default_tensor_type()).

layout (torch.layout, optional): the desired layout of returned Tensor. Default: torch.strided.

device (torch.device, optional): the desired device of returned tensor. Default: if None, uses the current device for the default tensor type (see torch.set_default_tensor_type()). device will be the CPU for CPU tensor types and the current CUDA device for CUDA tensor types.

requires_grad (bool, optional): If autograd should record operations on the returned tensor. Default: False.

Example:

>>> torch.linspace(3, 10, steps=5)
tensor([  3.0000,   4.7500,   6.5000,   8.2500,  10.0000])
>>> torch.linspace(-10, 10, steps=5)
tensor([-10.,  -5.,   0.,   5.,  10.])
>>> torch.linspace(start=-10, end=10, steps=5)
tensor([-10.,  -5.,   0.,   5.,  10.])
>>> torch.linspace(start=-10, end=10, steps=1)
tensor([-10.])
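
The value sequence of linspace can be reproduced in plain Python. The helper `linspace_sketch` is a hypothetical name, and the sketch assumes `steps >= 1`; it follows the formula term by term and appends `end` exactly as the last value.

```python
def linspace_sketch(start, end, steps):
    """Evenly spaced values from start to end, inclusive."""
    if steps < 1:
        raise ValueError('this sketch assumes steps >= 1')
    if steps == 1:
        return [float(start)]
    delta = (end - start) / (steps - 1)
    # start, start + delta, ..., start + (steps - 2) * delta, end
    return [start + i * delta for i in range(steps - 1)] + [float(end)]


five_points = linspace_sketch(3, 10, steps=5)
one_point = linspace_sketch(-10, 10, steps=1)
```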

d2l.torch.load_array(data_arrays, batch_size, is_train=True)[source]

Construct a PyTorch data iterator.

d2l.torch.load_corpus_time_machine(max_tokens=-1)[source]

Return token indices and the vocabulary of the time machine dataset.

d2l.torch.load_data_bananas(batch_size)[source]

Load the bananas dataset.

d2l.torch.load_data_fashion_mnist(batch_size, resize=None)[source]

Download the Fashion-MNIST dataset and then load it into memory.

d2l.torch.load_data_nmt(batch_size, num_steps, num_examples=600)[source]

Return the iterator and the vocabularies of the translation dataset.

d2l.torch.load_data_snli(batch_size, num_steps=50)[source]

Download the SNLI dataset and return data iterators and vocabulary.

d2l.torch.load_data_time_machine(batch_size, num_steps, use_random_iter=False, max_tokens=10000)[source]

Return the iterator and the vocabulary of the time machine dataset.

d2l.torch.load_data_voc(batch_size, crop_size)[source]

Download and load the VOC2012 semantic dataset.

d2l.torch.log(input, *, out=None) → Tensor

Returns a new tensor with the natural logarithm of the elements of input.

$y_{i} = \log_{e} (x_{i})$ (19.7.8)

Args:

input (Tensor): the input tensor.

Keyword args:

out (Tensor, optional): the output tensor.

Example:

>>> a = torch.randn(5)
>>> a
tensor([-0.7168, -0.5471, -0.8933, -1.4428, -0.1190])
>>> torch.log(a)
tensor([ nan,  nan,  nan,  nan,  nan])

d2l.torch.masked_softmax(X, valid_lens)[source]

Perform softmax operation by masking elements on the last axis.
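
A single-row sketch of the masking idea: positions at or beyond the valid length are replaced by a very negative score before the softmax, so they receive (numerically) zero weight. The name `masked_softmax_row` and the mask value of `-1e6` are assumptions for this illustration; the documented function operates on batched tensors.

```python
import math


def masked_softmax_row(scores, valid_len, mask_value=-1e6):
    """Softmax over scores, ignoring positions >= valid_len."""
    masked = [s if i < valid_len else mask_value
              for i, s in enumerate(scores)]
    peak = max(masked)                       # subtract max for stability
    exps = [math.exp(s - peak) for s in masked]
    total = sum(exps)
    return [e / total for e in exps]


probs = masked_softmax_row([1.0, 1.0, 5.0], valid_len=2)
```

Although the third score is the largest, it is masked out and the first two positions share the probability mass equally.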

d2l.torch.match_anchor_to_bbox(ground_truth, anchors, device, iou_threshold=0.5)[source]

Assign ground-truth bounding boxes to anchor boxes similar to them.

d2l.torch.matmul(input, other, *, out=None) → Tensor

Matrix product of two tensors.

The behavior depends on the dimensionality of the tensors as follows:

• If both tensors are 1-dimensional, the dot product (scalar) is returned.

• If both arguments are 2-dimensional, the matrix-matrix product is returned.

• If the first argument is 1-dimensional and the second argument is 2-dimensional, a 1 is prepended to its dimension for the purpose of the matrix multiply. After the matrix multiply, the prepended dimension is removed.

• If the first argument is 2-dimensional and the second argument is 1-dimensional, the matrix-vector product is returned.

• If both arguments are at least 1-dimensional and at least one argument is N-dimensional (where N > 2), then a batched matrix multiply is returned. If the first argument is 1-dimensional, a 1 is prepended to its dimension for the purpose of the batched matrix multiply and removed after. If the second argument is 1-dimensional, a 1 is appended to its dimension for the purpose of the batched matrix multiply and removed after. The non-matrix (i.e. batch) dimensions are broadcasted (and thus must be broadcastable). For example, if input is a $$(j \times 1 \times n \times n)$$ tensor and other is a $$(k \times n \times n)$$ tensor, out will be a $$(j \times k \times n \times n)$$ tensor.

Note that the broadcasting logic only looks at the batch dimensions when determining if the inputs are broadcastable, and not the matrix dimensions. For example, if input is a $$(j \times 1 \times n \times m)$$ tensor and other is a $$(k \times m \times p)$$ tensor, these inputs are valid for broadcasting even though the final two dimensions (i.e. the matrix dimensions) are different. out will be a $$(j \times k \times n \times p)$$ tensor.

This operator supports TensorFloat32.

Note

The 1-dimensional dot product version of this function does not support an out parameter.

Arguments:

input (Tensor): the first tensor to be multiplied

other (Tensor): the second tensor to be multiplied

Keyword args:

out (Tensor, optional): the output tensor.

Example:

>>> # vector x vector
>>> tensor1 = torch.randn(3)
>>> tensor2 = torch.randn(3)
>>> torch.matmul(tensor1, tensor2).size()
torch.Size([])
>>> # matrix x vector
>>> tensor1 = torch.randn(3, 4)
>>> tensor2 = torch.randn(4)
>>> torch.matmul(tensor1, tensor2).size()
torch.Size([3])
>>> # batched matrix x broadcasted vector
>>> tensor1 = torch.randn(10, 3, 4)
>>> tensor2 = torch.randn(4)
>>> torch.matmul(tensor1, tensor2).size()
torch.Size([10, 3])
>>> # batched matrix x batched matrix
>>> tensor1 = torch.randn(10, 3, 4)
>>> tensor2 = torch.randn(10, 4, 5)
>>> torch.matmul(tensor1, tensor2).size()
torch.Size([10, 3, 5])
>>> # batched matrix x broadcasted matrix
>>> tensor1 = torch.randn(10, 3, 4)
>>> tensor2 = torch.randn(4, 5)
>>> torch.matmul(tensor1, tensor2).size()
torch.Size([10, 3, 5])
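
The dimensionality rules above can be sketched as a small shape calculator. This is an illustrative pure-Python sketch, not PyTorch code: the helper names `broadcast_batch` and `matmul_shape` are hypothetical, and the function only predicts the result shape that the rules describe.

```python
from itertools import zip_longest


def broadcast_batch(a, b):
    """Broadcast two batch shapes, aligning them from the right."""
    out = []
    for x, y in zip_longest(reversed(a), reversed(b), fillvalue=1):
        if x != y and x != 1 and y != 1:
            raise ValueError(f"batch shapes {a} and {b} are not broadcastable")
        out.append(y if x == 1 else x)
    return tuple(reversed(out))


def matmul_shape(a, b):
    """Predict the result shape of a matmul of shapes a and b (hypothetical helper)."""
    if len(a) == 0 or len(b) == 0:
        raise ValueError("both arguments must be at least 1-dimensional")
    a_is_vector, b_is_vector = len(a) == 1, len(b) == 1
    if a_is_vector:
        a = (1,) + a  # prepend a 1 to the first argument
    if b_is_vector:
        b = b + (1,)  # append a 1 to the second argument
    if a[-1] != b[-2]:
        raise ValueError("inner matrix dimensions do not match")
    out = broadcast_batch(a[:-2], b[:-2]) + (a[-2], b[-1])
    if a_is_vector:
        out = out[:-2] + out[-1:]  # drop the prepended dimension
    if b_is_vector:
        out = out[:-1]  # drop the appended dimension
    return out


print(matmul_shape((10, 3, 4), (4,)))          # (10, 3)
print(matmul_shape((2, 1, 3, 4), (5, 4, 6)))   # (2, 5, 3, 6)
```

Only the batch dimensions take part in broadcasting; the last two dimensions are checked as ordinary matrix dimensions.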

d2l.torch.meshgrid(*tensors)[source]

Take $$N$$ tensors, each of which can be either scalar or 1-dimensional vector, and create $$N$$ N-dimensional grids, where the $$i$$ th grid is defined by expanding the $$i$$ th input over dimensions defined by other inputs.

Args:
tensors (list of Tensor): list of scalars or 1-dimensional tensors. Scalars will be treated as tensors of size $$(1,)$$ automatically.

Returns:

seq (sequence of Tensors): If the input has $$k$$ tensors of size $$(N_1,), (N_2,), \ldots , (N_k,)$$, then the output would also have $$k$$ tensors, where all tensors are of size $$(N_1, N_2, \ldots , N_k)$$.

Example:

>>> x = torch.tensor([1, 2, 3])
>>> y = torch.tensor([4, 5, 6])
>>> grid_x, grid_y = torch.meshgrid(x, y)
>>> grid_x
tensor([[1, 1, 1],
        [2, 2, 2],
        [3, 3, 3]])
>>> grid_y
tensor([[4, 5, 6],
        [4, 5, 6],
        [4, 5, 6]])

d2l.torch.normal(mean, std, *, generator=None, out=None) → Tensor

Returns a tensor of random numbers drawn from separate normal distributions whose mean and standard deviation are given.

The mean is a tensor with the mean of each output element’s normal distribution.

The std is a tensor with the standard deviation of each output element’s normal distribution.

The shapes of mean and std don’t need to match, but the total number of elements in each tensor needs to be the same.

Note

When the shapes do not match, the shape of mean is used as the shape for the returned output tensor

Args:

mean (Tensor): the tensor of per-element means

std (Tensor): the tensor of per-element standard deviations

Keyword args:

generator (torch.Generator, optional): a pseudorandom number generator for sampling

out (Tensor, optional): the output tensor.

Example:

>>> torch.normal(mean=torch.arange(1., 11.), std=torch.arange(1, 0, -0.1))
tensor([  1.0425,   3.5672,   2.7969,   4.2925,   4.7229,   6.2134,
          8.0505,   8.1408,   9.0563,  10.0566])

d2l.torch.normal(mean=0.0, std, *, out=None) → Tensor

Similar to the function above, but the means are shared among all drawn elements.

Args:

mean (float, optional): the mean for all distributions

std (Tensor): the tensor of per-element standard deviations

Keyword args:

out (Tensor, optional): the output tensor.

Example:

>>> torch.normal(mean=0.5, std=torch.arange(1., 6.))
tensor([-1.2793, -1.0732, -2.0687,  5.1177, -1.2303])

d2l.torch.normal(mean, std=1.0, *, out=None) → Tensor

Similar to the function above, but the standard deviations are shared among all drawn elements.

Args:

mean (Tensor): the tensor of per-element means

std (float, optional): the standard deviation for all distributions

Keyword args:

out (Tensor, optional): the output tensor

Example:

>>> torch.normal(mean=torch.arange(1., 6.))
tensor([ 1.1552,  2.6148,  2.6535,  5.8318,  4.2361])

d2l.torch.normal(mean, std, size, *, out=None) → Tensor

Similar to the function above, but the means and standard deviations are shared among all drawn elements. The resulting tensor has size given by size.

Args:

mean (float): the mean for all distributions

std (float): the standard deviation for all distributions

size (int…): a sequence of integers defining the shape of the output tensor.

Keyword args:

out (Tensor, optional): the output tensor.

Example:

>>> torch.normal(2, 3, size=(1, 4))
tensor([[-1.3987, -1.9544,  3.6048,  0.7909]])

d2l.torch.ones(*size, *, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False) → Tensor

Returns a tensor filled with the scalar value 1, with the shape defined by the variable argument size.

Args:
size (int…): a sequence of integers defining the shape of the output tensor. Can be a variable number of arguments or a collection like a list or tuple.

out (Tensor, optional): the output tensor.

dtype (torch.dtype, optional): the desired data type of returned tensor. Default: if None, uses a global default (see torch.set_default_tensor_type()).

layout (torch.layout, optional): the desired layout of returned Tensor. Default: torch.strided.

device (torch.device, optional): the desired device of returned tensor. Default: if None, uses the current device for the default tensor type (see torch.set_default_tensor_type()). device will be the CPU for CPU tensor types and the current CUDA device for CUDA tensor types.

requires_grad (bool, optional): If autograd should record operations on the returned tensor. Default: False.

Example:

>>> torch.ones(2, 3)
tensor([[ 1.,  1.,  1.],
        [ 1.,  1.,  1.]])

>>> torch.ones(5)
tensor([ 1.,  1.,  1.,  1.,  1.])

d2l.torch.plot(X, Y=None, xlabel=None, ylabel=None, legend=None, xlim=None, ylim=None, xscale='linear', yscale='linear', fmts=('-', 'm--', 'g-.', 'r:'), figsize=(3.5, 2.5), axes=None)[source]

Plot data points.

d2l.torch.predict_ch3(net, test_iter, n=6)[source]

Predict labels (defined in Chapter 3).

d2l.torch.predict_ch8(prefix, num_preds, net, vocab, device)[source]

Generate new characters following the prefix.

d2l.torch.predict_seq2seq(net, src_sentence, src_vocab, tgt_vocab, num_steps, device, save_attention_weights=False)[source]

Predict for sequence to sequence.

d2l.torch.preprocess_nmt(text)[source]

Preprocess the English-French dataset.

d2l.torch.rand(*size, *, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False) → Tensor

Returns a tensor filled with random numbers from a uniform distribution on the interval $$[0, 1)$$.

The shape of the tensor is defined by the variable argument size.

Args:
size (int…): a sequence of integers defining the shape of the output tensor. Can be a variable number of arguments or a collection like a list or tuple.

Keyword args:

out (Tensor, optional): the output tensor.

dtype (torch.dtype, optional): the desired data type of returned tensor. Default: if None, uses a global default (see torch.set_default_tensor_type()).

layout (torch.layout, optional): the desired layout of returned Tensor. Default: torch.strided.

device (torch.device, optional): the desired device of returned tensor. Default: if None, uses the current device for the default tensor type (see torch.set_default_tensor_type()). device will be the CPU for CPU tensor types and the current CUDA device for CUDA tensor types.

requires_grad (bool, optional): If autograd should record operations on the returned tensor. Default: False.

Example:

>>> torch.rand(4)
tensor([ 0.5204,  0.2503,  0.3525,  0.5673])
>>> torch.rand(2, 3)
tensor([[ 0.8237,  0.5781,  0.6879],
        [ 0.3816,  0.7249,  0.0998]])

d2l.torch.read_csv_labels(fname)[source]

Read fname to return a name to label dictionary.

d2l.torch.read_data_bananas(is_train=True)[source]

Read the bananas dataset images and labels.

d2l.torch.read_data_nmt()[source]

Load the English-French dataset.

d2l.torch.read_snli(data_dir, is_train)[source]

Read the SNLI dataset into premises, hypotheses, and labels.

d2l.torch.read_time_machine()[source]

Load the time machine dataset into a list of text lines.

d2l.torch.read_voc_images(voc_dir, is_train=True)[source]

Read all VOC feature and label images.

d2l.torch.resnet18(num_classes, in_channels=1)[source]

A slightly modified ResNet-18 model.

d2l.torch.seq_data_iter_random(corpus, batch_size, num_steps)[source]

Generate a minibatch of subsequences using random sampling.

d2l.torch.seq_data_iter_sequential(corpus, batch_size, num_steps)[source]

Generate a minibatch of subsequences using sequential partitioning.

d2l.torch.sequence_mask(X, valid_len, value=0)[source]

Mask irrelevant entries in sequences.
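
As a rough illustration of what masking irrelevant entries can mean, the sketch below works on plain nested lists rather than tensors. It assumes that `valid_len` gives, per row, how many leading items are kept and that everything after them is overwritten with `value`; the helper name `mask_sequences` is hypothetical and this is not the d2l implementation.

```python
def mask_sequences(rows, valid_lens, value=0):
    """Keep the first valid_lens[i] items of row i and overwrite the rest."""
    return [
        [item if j < valid_len else value for j, item in enumerate(row)]
        for row, valid_len in zip(rows, valid_lens)
    ]


masked = mask_sequences([[1, 2, 3], [4, 5, 6]], [1, 2])
print(masked)  # [[1, 0, 0], [4, 5, 0]]
```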

d2l.torch.set_axes(axes, xlabel, ylabel, xlim, ylim, xscale, yscale, legend)[source]

Set the axes for matplotlib.

d2l.torch.set_figsize(figsize=(3.5, 2.5))[source]

Set the figure size for matplotlib.

d2l.torch.sgd(params, lr, batch_size)[source]

Minibatch stochastic gradient descent.
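
A minimal sketch of one minibatch SGD update on plain Python floats follows. It assumes the gradients were summed over the minibatch, so the step divides by `batch_size`; the function name `sgd_step` and the explicit `grads` argument are illustrative choices, not the d2l signature.

```python
def sgd_step(params, grads, lr, batch_size):
    """Return updated parameters: p - lr * g / batch_size for each pair."""
    return [p - lr * g / batch_size for p, g in zip(params, grads)]


new_params = sgd_step([1.0, 2.0], [4.0, -2.0], lr=0.5, batch_size=2)
print(new_params)  # [0.0, 2.5]
```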

d2l.torch.show_bboxes(axes, bboxes, labels=None, colors=None)[source]

Show bounding boxes.

d2l.torch.show_images(imgs, num_rows, num_cols, titles=None, scale=1.5)[source]

Plot a list of images.

d2l.torch.show_trace_2d(f, results)[source]

Show the trace of 2D variables during optimization.

d2l.torch.sin(input, *, out=None) → Tensor

Returns a new tensor with the sine of the elements of input.

$$\text{out}_{i} = \sin(\text{input}_{i})$$ (19.7.9)

Args:

input (Tensor): the input tensor.

Keyword args:

out (Tensor, optional): the output tensor.

Example:

>>> a = torch.randn(4)
>>> a
tensor([-0.5461,  0.1347, -2.7266, -0.2746])
>>> torch.sin(a)
tensor([-0.5194,  0.1343, -0.4032, -0.2711])

d2l.torch.sinh(input, *, out=None) → Tensor

Returns a new tensor with the hyperbolic sine of the elements of input.

$$\text{out}_{i} = \sinh(\text{input}_{i})$$ (19.7.10)

Args:

input (Tensor): the input tensor.

Keyword args:

out (Tensor, optional): the output tensor.

Example:

>>> a = torch.randn(4)
>>> a
tensor([ 0.5380, -0.8632, -0.1265,  0.9399])
>>> torch.sinh(a)
tensor([ 0.5644, -0.9744, -0.1268,  1.0845])


Note

When input is on the CPU, the implementation of torch.sinh may use the Sleef library, which rounds very large results to infinity or negative infinity. See the Sleef documentation for details.

d2l.torch.split_batch(X, y, devices)[source]

Split X and y into multiple devices.

d2l.torch.squared_loss(y_hat, y)[source]

Squared loss.

d2l.torch.stack(tensors, dim=0, *, out=None) → Tensor

Concatenates a sequence of tensors along a new dimension.

All tensors need to be of the same size.

Arguments:

tensors (sequence of Tensors): sequence of tensors to concatenate

dim (int): dimension to insert. Has to be between 0 and the number of dimensions of concatenated tensors (inclusive)

Keyword args:

out (Tensor, optional): the output tensor.

d2l.torch.synthetic_data(w, b, num_examples)[source]

Generate y = Xw + b + noise.
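
The formula can be sketched with the standard library alone. The function name `make_linear_data`, the standard-normal features, the `noise_std` default, and the `seed` argument are all assumptions made for this illustration; the source only states the form y = Xw + b + noise.

```python
import random


def make_linear_data(w, b, num_examples, noise_std=0.01, seed=0):
    """Generate features X and labels y = Xw + b + Gaussian noise."""
    rng = random.Random(seed)
    # Each row of X has one standard-normal feature per weight.
    X = [[rng.gauss(0, 1) for _ in w] for _ in range(num_examples)]
    y = [
        sum(wi * xi for wi, xi in zip(w, row)) + b + rng.gauss(0, noise_std)
        for row in X
    ]
    return X, y


features, labels = make_linear_data([2.0, -3.4], 4.2, num_examples=5)
print(len(features), len(features[0]), len(labels))  # 5 2 5
```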

d2l.torch.tanh(input, *, out=None) → Tensor

Returns a new tensor with the hyperbolic tangent of the elements of input.

$$\text{out}_{i} = \tanh(\text{input}_{i})$$ (19.7.11)

Args:

input (Tensor): the input tensor.

Keyword args:

out (Tensor, optional): the output tensor.

Example:

>>> a = torch.randn(4)
>>> a
tensor([ 0.8986, -0.7279,  1.1745,  0.2611])
>>> torch.tanh(a)
tensor([ 0.7156, -0.6218,  0.8257,  0.2553])

d2l.torch.tensor(data, *, dtype=None, device=None, requires_grad=False, pin_memory=False) → Tensor

Constructs a tensor with data.

Warning

torch.tensor() always copies data. If you have a Tensor data and want to avoid a copy, use torch.Tensor.requires_grad_() or torch.Tensor.detach(). If you have a NumPy ndarray and want to avoid a copy, use torch.as_tensor().

Warning

When data is a tensor x, torch.tensor() reads out ‘the data’ from whatever it is passed, and constructs a leaf variable. Therefore torch.tensor(x) is equivalent to x.clone().detach() and torch.tensor(x, requires_grad=True) is equivalent to x.clone().detach().requires_grad_(True). The equivalents using clone() and detach() are recommended.

Args:
data (array_like): Initial data for the tensor. Can be a list, tuple, NumPy ndarray, scalar, and other types.

Keyword args:

dtype (torch.dtype, optional): the desired data type of returned tensor. Default: if None, infers data type from data.

device (torch.device, optional): the desired device of returned tensor. Default: if None, uses the current device for the default tensor type (see torch.set_default_tensor_type()). device will be the CPU for CPU tensor types and the current CUDA device for CUDA tensor types.

requires_grad (bool, optional): If autograd should record operations on the returned tensor. Default: False.

pin_memory (bool, optional): If set, returned tensor would be allocated in the pinned memory. Works only for CPU tensors. Default: False.

Example:

>>> torch.tensor([[0.1, 1.2], [2.2, 3.1], [4.9, 5.2]])
tensor([[ 0.1000,  1.2000],
        [ 2.2000,  3.1000],
        [ 4.9000,  5.2000]])

>>> torch.tensor([0, 1])  # Type inference on data
tensor([ 0,  1])

>>> torch.tensor([[0.11111, 0.222222, 0.3333333]],
...              dtype=torch.float64,
...              device=torch.device('cuda:0'))  # creates a torch.cuda.DoubleTensor
tensor([[ 0.1111,  0.2222,  0.3333]], dtype=torch.float64, device='cuda:0')

>>> torch.tensor(3.14159)  # Create a scalar (zero-dimensional tensor)
tensor(3.1416)

>>> torch.tensor([])  # Create an empty tensor (of size (0,))
tensor([])

d2l.torch.tokenize(lines, token='word')[source]

Split text lines into word or character tokens.

d2l.torch.tokenize_nmt(text, num_examples=None)[source]

Tokenize the English-French dataset.

d2l.torch.train_2d(trainer, steps=20)[source]

Optimize a 2-dim objective function with a customized trainer.

d2l.torch.train_ch3(net, train_iter, test_iter, loss, num_epochs, updater)[source]

Train a model (defined in Chapter 3).

d2l.torch.train_ch6(net, train_iter, test_iter, num_epochs, lr, device=device(type='cuda', index=0))[source]

Train a model with a GPU (defined in Chapter 6).

d2l.torch.train_ch8(net, train_iter, vocab, lr, num_epochs, device, use_random_iter=False)[source]

Train a model (defined in Chapter 8).

d2l.torch.train_epoch_ch3(net, train_iter, loss, updater)[source]

The training loop defined in Chapter 3.

d2l.torch.train_epoch_ch8(net, train_iter, loss, updater, device, use_random_iter)[source]

Train a net within one epoch (defined in Chapter 8).

d2l.torch.train_seq2seq(net, data_iter, lr, num_epochs, tgt_vocab, device)[source]

Train a model for sequence to sequence.

d2l.torch.transpose_output(X, num_heads)[source]

Reverse the operation of transpose_qkv.

d2l.torch.truncate_pad(line, num_steps, padding_token)[source]

Truncate or pad sequences.
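
A minimal sketch of truncating or padding a token list to a fixed length, assuming `line` is a Python list and that padding is appended on the right. The name `truncate_or_pad` is used here only for illustration.

```python
def truncate_or_pad(line, num_steps, padding_token):
    """Return a list of exactly num_steps items."""
    if len(line) > num_steps:
        return line[:num_steps]  # truncate
    return line + [padding_token] * (num_steps - len(line))  # pad


print(truncate_or_pad([7, 8, 9, 10], 2, 0))  # [7, 8]
print(truncate_or_pad([7], 3, 0))            # [7, 0, 0]
```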

d2l.torch.try_all_gpus()[source]

Return all available GPUs, or [cpu(),] if no GPU exists.

d2l.torch.try_gpu(i=0)[source]

Return gpu(i) if exists, otherwise return cpu().

d2l.torch.update_D(X, Z, net_D, net_G, loss, trainer_D)[source]

Update discriminator.

d2l.torch.update_G(Z, net_D, net_G, loss, trainer_G)[source]

Update generator.

d2l.torch.use_svg_display()[source]

Use the svg format to display a plot in Jupyter.

d2l.torch.voc_label_indices(colormap, colormap2label)[source]

Map an RGB color to a label.

d2l.torch.voc_rand_crop(feature, label, height, width)[source]

Randomly crop for both feature and label images.

d2l.torch.zeros(*size, *, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False) → Tensor

Returns a tensor filled with the scalar value 0, with the shape defined by the variable argument size.

Args:
size (int…): a sequence of integers defining the shape of the output tensor. Can be a variable number of arguments or a collection like a list or tuple.

Keyword args:

out (Tensor, optional): the output tensor.

dtype (torch.dtype, optional): the desired data type of returned tensor. Default: if None, uses a global default (see torch.set_default_tensor_type()).

layout (torch.layout, optional): the desired layout of returned Tensor. Default: torch.strided.

device (torch.device, optional): the desired device of returned tensor. Default: if None, uses the current device for the default tensor type (see torch.set_default_tensor_type()). device will be the CPU for CPU tensor types and the current CUDA device for CUDA tensor types.

requires_grad (bool, optional): If autograd should record operations on the returned tensor. Default: False.

Example:

>>> torch.zeros(2, 3)
tensor([[ 0.,  0.,  0.],
        [ 0.,  0.,  0.]])

>>> torch.zeros(5)
tensor([ 0.,  0.,  0.,  0.,  0.])

class d2l.tensorflow.Accumulator(n)[source]

For accumulating sums over n variables.
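
One way such an accumulator could look is sketched below. The class name `SumAccumulator` and its `add`, `reset`, and indexing methods are assumptions chosen for illustration; the source states only the purpose of the class.

```python
class SumAccumulator:
    """Keep n running sums that are updated together."""

    def __init__(self, n):
        self.data = [0.0] * n

    def add(self, *args):
        # Add one value to each of the n running sums.
        self.data = [total + float(value) for total, value in zip(self.data, args)]

    def reset(self):
        self.data = [0.0] * len(self.data)

    def __getitem__(self, idx):
        return self.data[idx]


metric = SumAccumulator(2)  # e.g. number of correct predictions, number of examples
metric.add(3, 4)
metric.add(2, 4)
print(metric[0] / metric[1])  # 0.625
```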

class d2l.tensorflow.Animator(xlabel=None, ylabel=None, legend=None, xlim=None, ylim=None, xscale='linear', yscale='linear', fmts=('-', 'm--', 'g-.', 'r:'), nrows=1, ncols=1, figsize=(3.5, 2.5))[source]

For plotting data in animation.

class d2l.tensorflow.RNNModelScratch(vocab_size, num_hiddens, init_state, forward_fn)[source]

An RNN model implemented from scratch.

class d2l.tensorflow.Residual(*args, **kwargs)[source]

The Residual block of ResNet.

call(X)[source]

Calls the model on new inputs.

In this case call just reapplies all ops in the graph to the new inputs (e.g. build a new computational graph from the provided inputs).

Arguments:

inputs: A tensor or list of tensors.

training: Boolean or boolean scalar tensor, indicating whether to run the Network in training mode or inference mode.

mask: A mask or list of masks. A mask can be either a tensor or None (no mask).

Returns:

A tensor if there is a single output, or a list of tensors if there is more than one output.

class d2l.tensorflow.SeqDataLoader(batch_size, num_steps, use_random_iter, max_tokens)[source]

An iterator to load sequence data.

class d2l.tensorflow.Timer[source]

Record multiple running times.

avg()[source]

Return the average time.

cumsum()[source]

Return the accumulated time.

start()[source]

Start the timer.

stop()[source]

Stop the timer and record the time in a list.

sum()[source]

Return the sum of time.
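
The five methods above could fit together as in the following sketch. The class name `StopWatch`, the use of `time.perf_counter`, and the choice to start timing in the constructor are assumptions for illustration, not details confirmed by the source.

```python
import time
from itertools import accumulate


class StopWatch:
    """Record multiple running times."""

    def __init__(self):
        self.times = []
        self.start()

    def start(self):
        """Start the timer."""
        self.tik = time.perf_counter()

    def stop(self):
        """Stop the timer and record the elapsed time in a list."""
        self.times.append(time.perf_counter() - self.tik)
        return self.times[-1]

    def avg(self):
        """Return the average time."""
        return sum(self.times) / len(self.times)

    def sum(self):
        """Return the sum of time."""
        return sum(self.times)

    def cumsum(self):
        """Return the accumulated time after each recorded interval."""
        return list(accumulate(self.times))


timer = StopWatch()
sum(range(100_000))  # some work to time
print(f"{timer.stop():.6f} sec")
```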

class d2l.tensorflow.TrainCallback(net, train_iter, test_iter, num_epochs, device_name)[source]

A callback to visualize the training progress.

on_epoch_begin(epoch, logs=None)[source]

Called at the start of an epoch.

Subclasses should override for any actions to run. This function should only be called during TRAIN mode.

Arguments:

epoch: Integer, index of epoch.

logs: Dict. Currently no data is passed to this argument for this method but that may change in the future.

on_epoch_end(epoch, logs)[source]

Called at the end of an epoch.

Subclasses should override for any actions to run. This function should only be called during TRAIN mode.

Arguments:

epoch: Integer, index of epoch.

logs: Dict, metric results for this training epoch, and for the validation epoch if validation is performed. Validation result keys are prefixed with val_.

class d2l.tensorflow.Updater(params, lr)[source]

For updating parameters using minibatch stochastic gradient descent.

class d2l.tensorflow.Vocab(tokens=None, min_freq=0, reserved_tokens=None)[source]

Vocabulary for text.

d2l.tensorflow.abs(*args, **kwargs)[source]

Computes the absolute value of a tensor.

Given a tensor of integer or floating-point values, this operation returns a tensor of the same type, where each element contains the absolute value of the corresponding element in the input.

Given a tensor x of complex numbers, this operation returns a tensor of type float32 or float64 that is the absolute value of each element in x. For a complex number $$a + bj$$, its absolute value is computed as $$\sqrt{a^2 + b^2}$$. For example:

>>> x = tf.constant([[-2.25 + 4.75j], [-3.25 + 5.75j]])
>>> tf.abs(x)
<tf.Tensor: shape=(2, 1), dtype=float64, numpy=
array([[5.25594901],
       [6.60492241]])>

Args:
x: A Tensor or SparseTensor of type float16, float32, float64, int32, int64, complex64 or complex128.

name: A name for the operation (optional).

Returns:
A Tensor or SparseTensor of the same size, type and sparsity as x, with absolute values. Note, for complex64 or complex128 input, the returned Tensor will be of type float32 or float64, respectively.

If x is a SparseTensor, returns SparseTensor(x.indices, tf.math.abs(x.values, …), x.dense_shape)

d2l.tensorflow.accuracy(y_hat, y)[source]

Compute the number of correct predictions.

d2l.tensorflow.arange(*args, **kwargs)

Creates a sequence of numbers.

Creates a sequence of numbers that begins at start and extends by increments of delta up to but not including limit.

The dtype of the resulting tensor is inferred from the inputs unless it is provided explicitly.

Like the Python builtin range, start defaults to 0, so that range(n) = range(0, n).

For example:

>>> start = 3
>>> limit = 18
>>> delta = 3
>>> tf.range(start, limit, delta)
<tf.Tensor: shape=(5,), dtype=int32,
numpy=array([ 3,  6,  9, 12, 15], dtype=int32)>

>>> start = 3
>>> limit = 1
>>> delta = -0.5
>>> tf.range(start, limit, delta)
<tf.Tensor: shape=(4,), dtype=float32,
numpy=array([3. , 2.5, 2. , 1.5], dtype=float32)>

>>> limit = 5
>>> tf.range(limit)
<tf.Tensor: shape=(5,), dtype=int32,
numpy=array([0, 1, 2, 3, 4], dtype=int32)>

Args:
start: A 0-D Tensor (scalar). Acts as first entry in the range if limit is not None; otherwise, acts as range limit and first entry defaults to 0.

limit: A 0-D Tensor (scalar). Upper limit of sequence, exclusive. If None, defaults to the value of start while the first entry of the range defaults to 0.

delta: A 0-D Tensor (scalar). Number that increments start. Defaults to 1.

dtype: The type of the elements of the resulting tensor.

name: A name for the operation. Defaults to “range”.

Returns:

A 1-D Tensor of type dtype.

NumPy compatibility: equivalent to np.arange.

d2l.tensorflow.argmax(*args, **kwargs)[source]

Returns the index with the largest value across axes of a tensor.

In case of ties, returns the smallest index.

For example:

>>> A = tf.constant([2, 20, 30, 3, 6])
>>> tf.math.argmax(A)  # A[2] is maximum in tensor A
<tf.Tensor: shape=(), dtype=int64, numpy=2>
>>> B = tf.constant([[2, 20, 30, 3, 6], [3, 11, 16, 1, 8],
...                  [14, 45, 23, 5, 27]])
>>> tf.math.argmax(B, 0)
<tf.Tensor: shape=(5,), dtype=int64, numpy=array([2, 2, 0, 2, 2])>
>>> tf.math.argmax(B, 1)
<tf.Tensor: shape=(3,), dtype=int64, numpy=array([2, 2, 1])>
>>> C = tf.constant([0, 0, 0, 0])
>>> tf.math.argmax(C) # Returns smallest index in case of ties
<tf.Tensor: shape=(), dtype=int64, numpy=0>

Args:

input: A Tensor.

axis: An integer, the axis to reduce across. Defaults to 0.

output_type: An optional output dtype (tf.int32 or tf.int64). Defaults to tf.int64.

name: An optional name for the operation.

Returns:

A Tensor of type output_type.
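
The tie-breaking rule (the smallest index wins) can be sketched in plain Python. The helper name `argmax_first` is hypothetical; the data is taken from the example above, with reduction over axis 0 and axis 1 emulated by iterating over columns and rows.

```python
def argmax_first(values):
    """Return the index of the largest value; on ties, the smallest index."""
    best = 0
    for i, v in enumerate(values):
        if v > values[best]:  # strict comparison keeps the earliest maximum
            best = i
    return best


B = [[2, 20, 30, 3, 6], [3, 11, 16, 1, 8], [14, 45, 23, 5, 27]]
print([argmax_first(col) for col in zip(*B)])  # axis 0: [2, 2, 0, 2, 2]
print([argmax_first(row) for row in B])        # axis 1: [2, 2, 1]
print(argmax_first([0, 0, 0, 0]))              # 0
```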

d2l.tensorflow.astype(*args, **kwargs)

Casts a tensor to a new type.

The operation casts x (in case of Tensor) or x.values (in case of SparseTensor or IndexedSlices) to dtype.

For example:

>>> x = tf.constant([1.8, 2.2], dtype=tf.float32)
>>> tf.dtypes.cast(x, tf.int32)
<tf.Tensor: shape=(2,), dtype=int32, numpy=array([1, 2], dtype=int32)>


The operation supports data types (for x and dtype) of uint8, uint16, uint32, uint64, int8, int16, int32, int64, float16, float32, float64, complex64, complex128, bfloat16. In case of casting from complex types (complex64, complex128) to real types, only the real part of x is returned. In case of casting from real types to complex types (complex64, complex128), the imaginary part of the returned value is set to 0. The handling of complex types here matches the behavior of numpy.

Args:
x: A Tensor or SparseTensor or IndexedSlices of numeric type. It could be uint8, uint16, uint32, uint64, int8, int16, int32, int64, float16, float32, float64, complex64, complex128, bfloat16.

dtype: The destination type. The list of supported dtypes is the same as x.

name: A name for the operation (optional).

Returns:
A Tensor or SparseTensor or IndexedSlices with same shape as x and same type as dtype.

Raises:

TypeError: If x cannot be cast to the dtype.

d2l.tensorflow.bbox_to_rect(bbox, color)[source]

Convert bounding box to matplotlib format.

d2l.tensorflow.box_center_to_corner(boxes)[source]

Convert from (center, width, height) to (upper_left, bottom_right)

d2l.tensorflow.box_corner_to_center(boxes)[source]

Convert from (upper_left, bottom_right) to (center, width, height)
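
The two box conversions are inverses of each other. The sketch below handles a single box as a plain tuple, assuming the corner form is (x1, y1, x2, y2) and the center form is (cx, cy, w, h); the function names are illustrative and the batch handling of the real functions is not shown.

```python
def corner_to_center(box):
    """(x1, y1, x2, y2) -> (cx, cy, w, h)."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1)


def center_to_corner(box):
    """(cx, cy, w, h) -> (x1, y1, x2, y2)."""
    cx, cy, w, h = box
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


center = corner_to_center((0, 0, 4, 2))
print(center)                    # (2.0, 1.0, 4, 2)
print(center_to_corner(center))  # (0.0, 0.0, 4.0, 2.0)
```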

d2l.tensorflow.build_array_nmt(lines, vocab, num_steps)[source]

Transform text sequences of machine translation into minibatches.

d2l.tensorflow.concat(*args, **kwargs)[source]

Concatenates tensors along one dimension.

See also tf.tile, tf.stack, tf.repeat.

Concatenates the list of tensors values along dimension axis. If values[i].shape = [D0, D1, … Daxis(i), …Dn], the concatenated result has shape

[D0, D1, … Raxis, …Dn]

where

Raxis = sum(Daxis(i))

That is, the data from the input tensors is joined along the axis dimension.

The number of dimensions of the input tensors must match, and all dimensions except axis must be equal.

For example:

>>> t1 = [[1, 2, 3], [4, 5, 6]]
>>> t2 = [[7, 8, 9], [10, 11, 12]]
>>> tf.concat([t1, t2], 0)
<tf.Tensor: shape=(4, 3), dtype=int32, numpy=
array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12]], dtype=int32)>

>>> tf.concat([t1, t2], 1)
<tf.Tensor: shape=(2, 6), dtype=int32, numpy=
array([[ 1,  2,  3,  7,  8,  9],
       [ 4,  5,  6, 10, 11, 12]], dtype=int32)>


As in Python, the axis can also be negative. A negative axis is interpreted as counting from the end of the rank, i.e., it refers to the (axis + rank(values))-th dimension.

For example:

>>> t1 = [[[1, 2], [2, 3]], [[4, 4], [5, 3]]]
>>> t2 = [[[7, 4], [8, 4]], [[2, 10], [15, 11]]]
>>> tf.concat([t1, t2], -1)
<tf.Tensor: shape=(2, 2, 4), dtype=int32, numpy=
array([[[ 1,  2,  7,  4],
        [ 2,  3,  8,  4]],
       [[ 4,  4,  2, 10],
        [ 5,  3, 15, 11]]], dtype=int32)>


Note: If you are concatenating along a new axis consider using stack. E.g.

tf.concat([tf.expand_dims(t, axis) for t in tensors], axis)

can be rewritten as

tf.stack(tensors, axis=axis)

Args:

values: A list of Tensor objects or a single Tensor.

axis: 0-D int32 Tensor. Dimension along which to concatenate. Must be in the range [-rank(values), rank(values)). As in Python, indexing for axis is 0-based. A positive axis in the range [0, rank(values)) refers to the axis-th dimension, and a negative axis refers to the (axis + rank(values))-th dimension.

name: A name for the operation (optional).

Returns:

A Tensor resulting from concatenation of the input tensors.
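
The shape rule for concatenation, including negative axes, can be checked with a small shape calculator. This is a pure-Python sketch; `concat_shape` is a hypothetical helper that predicts only the result shape and does not touch any tensor data.

```python
def concat_shape(shapes, axis):
    """Predict the shape obtained by concatenating tensors of the given shapes."""
    rank = len(shapes[0])
    if not -rank <= axis < rank:
        raise ValueError(f"axis {axis} is out of range for rank {rank}")
    if axis < 0:
        axis += rank  # a negative axis counts from the end
    for shape in shapes:
        if len(shape) != rank:
            raise ValueError("all inputs must have the same number of dimensions")
        if any(d != d0 for i, (d, d0) in enumerate(zip(shape, shapes[0])) if i != axis):
            raise ValueError("all dimensions except axis must be equal")
    out = list(shapes[0])
    out[axis] = sum(shape[axis] for shape in shapes)
    return tuple(out)


print(concat_shape([(2, 3), (2, 3)], 0))         # (4, 3)
print(concat_shape([(2, 3), (2, 3)], 1))         # (2, 6)
print(concat_shape([(2, 2, 2), (2, 2, 2)], -1))  # (2, 2, 4)
```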

d2l.tensorflow.corr2d(X, K)[source]

Compute 2D cross-correlation.

d2l.tensorflow.cos(x, name=None)[source]

Computes cos of x element-wise.

Given an input tensor, this function computes cosine of every element in the tensor. Input range is (-inf, inf) and output range is [-1,1]. If input lies outside the boundary, nan is returned.

x = tf.constant([-float("inf"), -9, -0.5, 1, 1.2, 200, 10000, float("inf")])
tf.math.cos(x) ==> [nan -0.91113025 0.87758255 0.5403023 0.36235774 0.48718765 -0.95215535 nan]

Args:

x: A Tensor. Must be one of the following types: bfloat16, half, float32, float64, complex64, complex128.

name: A name for the operation (optional).

Returns:

A Tensor. Has the same type as x.

d2l.tensorflow.cosh(x, name=None)[source]

Computes hyperbolic cosine of x element-wise.

Given an input tensor, this function computes hyperbolic cosine of every element in the tensor. Input range is [-inf, inf] and output range is [1, inf].

x = tf.constant([-float("inf"), -9, -0.5, 1, 1.2, 2, 10, float("inf")])
tf.math.cosh(x) ==> [inf 4.0515420e+03 1.1276259e+00 1.5430807e+00 1.8106556e+00 3.7621956e+00 1.1013233e+04 inf]

Args:

x: A Tensor. Must be one of the following types: bfloat16, half, float32, float64, complex64, complex128.

name: A name for the operation (optional).

Returns:

A Tensor. Has the same type as x.

d2l.tensorflow.count_corpus(tokens)[source]

Count token frequencies.
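
Counting token frequencies can be sketched with `collections.Counter`. The sketch assumes `tokens` is either a flat list of tokens or a list of token lists (one per line), which is flattened first; the name `count_tokens` is illustrative.

```python
from collections import Counter


def count_tokens(tokens):
    """Count how often each token occurs in a flat or nested token list."""
    if tokens and isinstance(tokens[0], list):
        # Flatten a list of token lists into a single list of tokens.
        tokens = [token for line in tokens for token in line]
    return Counter(tokens)


counts = count_tokens([["the", "time", "machine"], ["the", "time"]])
print(counts.most_common(2))  # [('the', 2), ('time', 2)]
```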

class d2l.tensorflow.defaultdict

defaultdict(default_factory[, …]) --> dict with default factory

The default factory is called without arguments to produce a new value when a key is not present, in __getitem__ only. A defaultdict compares equal to a dict with the same items. All remaining arguments are treated the same as if they were passed to the dict constructor, including keyword arguments.

copy() → a shallow copy of D.
default_factory

Factory for default value called by __missing__().

d2l.tensorflow.download(name, cache_dir='../data')[source]

Download a file inserted into DATA_HUB, return the local filename.

d2l.tensorflow.download_all()[source]

Download all files in the DATA_HUB.

d2l.tensorflow.download_extract(name, folder=None)[source]

Download and extract a zip/tar file.

d2l.tensorflow.evaluate_accuracy(net, data_iter)[source]

Compute the accuracy for a model on a dataset.

d2l.tensorflow.evaluate_loss(net, data_iter, loss)[source]

Evaluate the loss of a model on the given dataset.

d2l.tensorflow.exp(*args, **kwargs)[source]

Computes exponential of x element-wise. $$y = e^x$$.

This function computes the exponential of the input tensor element-wise. i.e. math.exp(x) or $$e^x$$, where x is the input tensor. $$e$$ denotes Euler’s number and is approximately equal to 2.718281. Output is positive for any real input.

>>> x = tf.constant(2.0)
>>> tf.math.exp(x)
<tf.Tensor: shape=(), dtype=float32, numpy=7.389056>

>>> x = tf.constant([2.0, 8.0])
>>> tf.math.exp(x)
<tf.Tensor: shape=(2,), dtype=float32,
numpy=array([   7.389056, 2980.958   ], dtype=float32)>


For complex numbers, the exponential value is calculated as $$e^{x+iy} = e^x e^{iy} = e^x(\cos(y)+i\sin(y))$$

For 1+1j the value would be computed as: $$e^1(\cos(1)+i\sin(1)) = 2.7182817 \times (0.5403023+0.84147096j)$$

>>> x = tf.constant(1 + 1j)
>>> tf.math.exp(x)
<tf.Tensor: shape=(), dtype=complex128,
numpy=(1.4686939399158851+2.2873552871788423j)>

Args:
x: A tf.Tensor. Must be one of the following types: bfloat16, half, float32, float64, complex64, complex128.

name: A name for the operation (optional).

Returns:

A tf.Tensor. Has the same type as x.
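
The complex case can be cross-checked with the Python standard library instead of TensorFlow. The sketch evaluates Euler's formula by hand for 1 + 1j and compares it with `cmath.exp`; the variable names are illustrative.

```python
import cmath
import math

z = 1 + 1j
# e^(x + iy) = e^x * (cos(y) + i*sin(y))
via_euler = math.exp(z.real) * complex(math.cos(z.imag), math.sin(z.imag))
direct = cmath.exp(z)

print(via_euler)
print(abs(via_euler - direct) < 1e-12)  # True
```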

NumPy compatibility: equivalent to np.exp.

d2l.tensorflow.eye(*args, **kwargs)[source]

Construct an identity matrix, or a batch of matrices.

See also tf.ones, tf.zeros, tf.fill, tf.one_hot.

# Construct one identity matrix.
tf.eye(2) ==> [[1., 0.],
               [0., 1.]]

# Construct a batch of 3 identity matrices, each 2 x 2.
# batch_identity[i, :, :] is a 2 x 2 identity matrix, i = 0, 1, 2.
batch_identity = tf.eye(2, batch_shape=[3])

# Construct one 2 x 3 "identity" matrix
tf.eye(2, num_columns=3) ==> [[ 1.,  0.,  0.],
                              [ 0.,  1.,  0.]]

Args:
num_rows: Non-negative int32 scalar Tensor giving the number of rows in each batch matrix.

num_columns: Optional non-negative int32 scalar Tensor giving the number of columns in each batch matrix. Defaults to num_rows.

batch_shape: A list or tuple of Python integers or a 1-D int32 Tensor. If provided, the returned Tensor will have leading batch dimensions of this shape.

dtype: The type of an element in the resulting Tensor.

name: A name for this Op. Defaults to “eye”.

Returns:

A Tensor of shape batch_shape + [num_rows, num_columns]

d2l.tensorflow.get_fashion_mnist_labels(labels)[source]

Return text labels for the Fashion-MNIST dataset.

d2l.tensorflow.grad_clipping(grads, theta)[source]

Clip the gradient.
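
One common form of gradient clipping rescales the gradient so that its L2 norm does not exceed a threshold. The sketch below assumes that this is what `theta` controls, which the one-line description does not confirm; the name `clip_by_norm` and the flat list of floats are illustrative.

```python
import math


def clip_by_norm(grads, theta):
    """Rescale grads so that their L2 norm is at most theta."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm > theta:
        return [g * theta / norm for g in grads]
    return list(grads)


clipped = clip_by_norm([3.0, 4.0], theta=1.0)  # norm 5.0 is scaled down to 1.0
print(clipped)
```

The direction of the gradient is unchanged; only its length is capped.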

d2l.tensorflow.linreg(X, w, b)[source]

The linear regression model.

d2l.tensorflow.linspace(*args, **kwargs)

Generates evenly-spaced values in an interval along a given axis.

A sequence of num evenly-spaced values is generated beginning at start along a given axis. If num > 1, the values in the sequence increase by (stop - start) / (num - 1), so that the last one is exactly stop. If num <= 0, ValueError is raised.

Matches [np.linspace](https://docs.scipy.org/doc/numpy/reference/generated/numpy.linspace.html)’s behaviour except when num == 0.

For example:

 tf.linspace(10.0, 12.0, 3, name="linspace") => [ 10.0  11.0  12.0] 

Start and stop can be tensors of arbitrary size:

>>> tf.linspace([0., 5.], [10., 40.], 5, axis=0)
<tf.Tensor: shape=(5, 2), dtype=float32, numpy=
array([[ 0.  ,  5.  ],
[ 2.5 , 13.75],
[ 5.  , 22.5 ],
[ 7.5 , 31.25],
[10.  , 40.  ]], dtype=float32)>


axis is the dimension along which the values are generated; the corresponding dimension of the returned tensor has size num.

>>> tf.linspace([0., 5.], [10., 40.], 5, axis=-1)
<tf.Tensor: shape=(2, 5), dtype=float32, numpy=
array([[ 0.  ,  2.5 ,  5.  ,  7.5 , 10.  ],
[ 5.  , 13.75, 22.5 , 31.25, 40.  ]], dtype=float32)>

Args:

start: A Tensor. Must be one of the following types: bfloat16, float32, float64. N-D tensor. First entry in the range.

stop: A Tensor. Must have the same type and shape as start. N-D tensor. Last entry in the range.

num: A Tensor. Must be one of the following types: int32, int64. 0-D tensor. Number of values to generate.

name: A name for the operation (optional).

axis: Axis along which the operation is performed (used only when N-D tensors are provided).

Returns:

A Tensor. Has the same type as start.
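The spacing rule for linspace, a step of (stop - start) / (num - 1) with the last value pinned to stop, can be checked with a scalar-only sketch in plain Python. linspace_sketch is a hypothetical helper for illustration and is not how TensorFlow computes the result.

```python
def linspace_sketch(start, stop, num):
    """Scalar-only illustration of the spacing rule; not TensorFlow's code."""
    if num <= 0:
        raise ValueError("num must be positive")
    if num == 1:
        return [start]
    step = (stop - start) / (num - 1)
    # Pin the final value to `stop` so rounding cannot move the endpoint.
    return [start + i * step for i in range(num - 1)] + [stop]


print(linspace_sketch(10.0, 12.0, 3))  # [10.0, 11.0, 12.0]
print(linspace_sketch(5.0, 40.0, 5))   # [5.0, 13.75, 22.5, 31.25, 40.0]
```

The second call reproduces the second column of the axis=0 example above.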

d2l.tensorflow.load_array(data_arrays, batch_size, is_train=True)[source]

Construct a TensorFlow data iterator.

d2l.tensorflow.load_corpus_time_machine(max_tokens=-1)[source]

Return token indices and the vocabulary of the time machine dataset.

d2l.tensorflow.load_data_fashion_mnist(batch_size, resize=None)[source]

Download the Fashion-MNIST dataset and then load it into memory.

d2l.tensorflow.load_data_nmt(batch_size, num_steps, num_examples=600)[source]

Return the iterator and the vocabularies of the translation dataset.

d2l.tensorflow.load_data_time_machine(batch_size, num_steps, use_random_iter=False, max_tokens=10000)[source]

Return the iterator and the vocabulary of the time machine dataset.

d2l.tensorflow.matmul(*args, **kwargs)[source]

Multiplies matrix a by matrix b, producing a * b.

The inputs must, following any transpositions, be tensors of rank >= 2 where the inner 2 dimensions specify valid matrix multiplication dimensions, and any further outer dimensions specify matching batch size.

Both matrices must be of the same type. The supported types are: float16, float32, float64, int32, complex64, complex128.

Either matrix can be transposed or adjointed (conjugated and transposed) on the fly by setting one of the corresponding flags to True. These are False by default.

If one or both of the matrices contain a lot of zeros, a more efficient multiplication algorithm can be used by setting the corresponding a_is_sparse or b_is_sparse flag to True. These are False by default. This optimization is only available for plain matrices (rank-2 tensors) with datatypes bfloat16 or float32.

A simple 2-D tensor matrix multiplication:

>>> a = tf.constant([1, 2, 3, 4, 5, 6], shape=[2, 3])
>>> a  # 2-D tensor
<tf.Tensor: shape=(2, 3), dtype=int32, numpy=
array([[1, 2, 3],
[4, 5, 6]], dtype=int32)>
>>> b = tf.constant([7, 8, 9, 10, 11, 12], shape=[3, 2])
>>> b  # 2-D tensor
<tf.Tensor: shape=(3, 2), dtype=int32, numpy=
array([[ 7,  8],
[ 9, 10],
[11, 12]], dtype=int32)>
>>> c = tf.matmul(a, b)
>>> c  # a * b
<tf.Tensor: shape=(2, 2), dtype=int32, numpy=
array([[ 58,  64],
[139, 154]], dtype=int32)>


A batch matrix multiplication with batch shape [2]:

>>> a = tf.constant(np.arange(1, 13, dtype=np.int32), shape=[2, 2, 3])
>>> a  # 3-D tensor
<tf.Tensor: shape=(2, 2, 3), dtype=int32, numpy=
array([[[ 1,  2,  3],
[ 4,  5,  6]],
[[ 7,  8,  9],
[10, 11, 12]]], dtype=int32)>
>>> b = tf.constant(np.arange(13, 25, dtype=np.int32), shape=[2, 3, 2])
>>> b  # 3-D tensor
<tf.Tensor: shape=(2, 3, 2), dtype=int32, numpy=
array([[[13, 14],
[15, 16],
[17, 18]],
[[19, 20],
[21, 22],
[23, 24]]], dtype=int32)>
>>> c = tf.matmul(a, b)
>>> c  # a * b
<tf.Tensor: shape=(2, 2, 2), dtype=int32, numpy=
array([[[ 94, 100],
[229, 244]],
[[508, 532],
[697, 730]]], dtype=int32)>


Since Python 3.5, the @ operator is supported (see [PEP 465](https://www.python.org/dev/peps/pep-0465/)). In TensorFlow, it simply calls the tf.matmul() function, so the following lines are equivalent:

>>> d = a @ b @ [[10], [11]]
>>> d = tf.matmul(tf.matmul(a, b), [[10], [11]])

Args:

a: tf.Tensor of type float16, float32, float64, int32, complex64, complex128 and rank > 1.

b: tf.Tensor with same type and rank as a.

transpose_a: If True, a is transposed before multiplication.

transpose_b: If True, b is transposed before multiplication.

adjoint_a: If True, a is conjugated and transposed before multiplication.

adjoint_b: If True, b is conjugated and transposed before multiplication.

a_is_sparse: If True, a is treated as a sparse matrix. Note that this does not support tf.sparse.SparseTensor; it only applies optimizations that assume most values in a are zero. See tf.sparse.sparse_dense_matmul for some support for tf.sparse.SparseTensor multiplication.

b_is_sparse: If True, b is treated as a sparse matrix. Note that this does not support tf.sparse.SparseTensor; it only applies optimizations that assume most values in b are zero. See tf.sparse.sparse_dense_matmul for some support for tf.sparse.SparseTensor multiplication.

name: Name for the operation (optional).

Returns:

A tf.Tensor of the same type as a and b where each inner-most matrix is the product of the corresponding matrices in a and b, e.g. if all transpose or adjoint attributes are False:

output[…, i, j] = sum_k (a[…, i, k] * b[…, k, j]), for all indices i, j.

Note: This is matrix product, not element-wise product.

Raises:

ValueError: If transpose_a and adjoint_a, or transpose_b and adjoint_b are both set to True.
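The formula output[i, j] = sum_k (a[i, k] * b[k, j]) and the effect of the transpose flags can be followed in a rank-2, plain-Python sketch. matmul_sketch and transpose_2d are hypothetical helpers; batching, adjoints and the sparsity hints are deliberately left out.

```python
def transpose_2d(m):
    """Swap rows and columns of a list-of-rows matrix."""
    return [list(col) for col in zip(*m)]


def matmul_sketch(a, b, transpose_a=False, transpose_b=False):
    """Rank-2 illustration of output[i][j] = sum_k a[i][k] * b[k][j]."""
    if transpose_a:
        a = transpose_2d(a)
    if transpose_b:
        b = transpose_2d(b)
    if len(a[0]) != len(b):
        raise ValueError("inner dimensions do not match")
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


a = [[1, 2, 3], [4, 5, 6]]
b = [[7, 8], [9, 10], [11, 12]]
print(matmul_sketch(a, b))                    # [[58, 64], [139, 154]]
print(matmul_sketch(a, a, transpose_b=True))  # [[14, 32], [32, 77]]
```

The first result matches the 2-D example above; the second shows that transpose_b=True lets a (2, 3) matrix be multiplied by itself.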

d2l.tensorflow.meshgrid(*args, **kwargs)[source]

Broadcasts parameters for evaluation on an N-D grid.

Given N one-dimensional coordinate arrays *args, returns a list outputs of N-D coordinate arrays for evaluating expressions on an N-D grid.

Notes:

meshgrid supports cartesian (‘xy’) and matrix (‘ij’) indexing conventions. When the indexing argument is set to ‘xy’ (the default), the broadcasting instructions for the first two dimensions are swapped.

Examples:

Calling X, Y = meshgrid(x, y) with the tensors

    x = [1, 2, 3]
    y = [4, 5, 6]
    X, Y = tf.meshgrid(x, y)
    # X = [[1, 2, 3],
    #      [1, 2, 3],
    #      [1, 2, 3]]
    # Y = [[4, 4, 4],
    #      [5, 5, 5],
    #      [6, 6, 6]]

Args:

*args: Tensors with rank 1.

**kwargs:

• indexing: Either ‘xy’ or ‘ij’ (optional, default: ‘xy’).

• name: A name for the operation (optional).

Returns:

outputs: A list of N Tensors with rank N.

Raises:

TypeError: When an unsupported keyword argument is passed.

ValueError: When the indexing keyword argument is not one of ‘xy’ or ‘ij’.
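The difference between the ‘xy’ and ‘ij’ conventions is easiest to see when the two inputs have different lengths. The sketch below handles exactly two inputs with nested lists; meshgrid_sketch is a hypothetical helper and not the TensorFlow implementation.

```python
def meshgrid_sketch(x, y, indexing="xy"):
    """Two-input illustration of the 'xy' and 'ij' conventions."""
    if indexing == "xy":
        # Cartesian: outputs have shape (len(y), len(x)).
        X = [list(x) for _ in y]
        Y = [[yj] * len(x) for yj in y]
    elif indexing == "ij":
        # Matrix: outputs have shape (len(x), len(y)).
        X = [[xi] * len(y) for xi in x]
        Y = [list(y) for _ in x]
    else:
        raise ValueError("indexing must be 'xy' or 'ij'")
    return X, Y


X, Y = meshgrid_sketch([1, 2, 3], [4, 5], indexing="xy")
print(X)  # [[1, 2, 3], [1, 2, 3]]
print(Y)  # [[4, 4, 4], [5, 5, 5]]

X, Y = meshgrid_sketch([1, 2, 3], [4, 5], indexing="ij")
print(X)  # [[1, 1], [2, 2], [3, 3]]
print(Y)  # [[4, 5], [4, 5], [4, 5]]
```

With ‘xy’ the first input varies along columns, while with ‘ij’ it varies along rows, so the first two dimensions of the outputs are swapped.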

d2l.tensorflow.normal(*args, **kwargs)

Outputs random values from a normal distribution.

Example that generates a new set of random values every time:

>>> tf.random.set_seed(5);
>>> tf.random.normal([4], 0, 1, tf.float32)
<tf.Tensor: shape=(4,), dtype=float32, numpy=..., dtype=float32)>


Example that outputs a reproducible result:

>>> tf.random.set_seed(5);
>>> tf.random.normal([2,2], 0, 1, tf.float32, seed=1)
<tf.Tensor: shape=(2, 2), dtype=float32, numpy=
array([[-1.3768897 , -0.01258316],
[-0.169515   ,  1.0824056 ]], dtype=float32)>


In this case, we are setting both the global and operation-level seed to ensure this result is reproducible. See tf.random.set_seed for more information.

Args:

shape: A 1-D integer Tensor or Python array. The shape of the output tensor.

mean: A Tensor or Python value of type dtype, broadcastable with stddev. The mean of the normal distribution.

stddev: A Tensor or Python value of type dtype, broadcastable with mean. The standard deviation of the normal distribution.

dtype: The type of the output.

seed: A Python integer. Used to create a random seed for the distribution. See tf.random.set_seed for behavior.

name: A name for the operation (optional).

Returns:

A tensor of the specified shape filled with random normal values.

d2l.tensorflow.ones(*args, **kwargs)[source]

Creates a tensor with all elements set to one (1).

See also tf.ones_like, tf.zeros, tf.fill, tf.eye.

This operation returns a tensor of type dtype with shape shape and all elements set to one.

>>> tf.ones([3, 4], tf.int32)
<tf.Tensor: shape=(3, 4), dtype=int32, numpy=
array([[1, 1, 1, 1],
[1, 1, 1, 1],
[1, 1, 1, 1]], dtype=int32)>

Args:

shape: A list of integers, a tuple of integers, or a 1-D Tensor of type int32.

dtype: Optional DType of an element in the resulting Tensor. Default is tf.float32.

name: Optional string. A name for the operation.

Returns:

A Tensor with all elements set to one (1).

d2l.tensorflow.plot(X, Y=None, xlabel=None, ylabel=None, legend=None, xlim=None, ylim=None, xscale='linear', yscale='linear', fmts=('-', 'm--', 'g-.', 'r:'), figsize=(3.5, 2.5), axes=None)[source]

Plot data points.

d2l.tensorflow.predict_ch3(net, test_iter, n=6)[source]

Predict labels (defined in Chapter 3).

d2l.tensorflow.predict_ch8(prefix, num_preds, net, vocab, params)[source]

Generate new characters following the prefix.

d2l.tensorflow.preprocess_nmt(text)[source]

Preprocess the English-French dataset.

d2l.tensorflow.rand(*args, **kwargs)

Outputs random values from a uniform distribution.

The generated values follow a uniform distribution in the range [minval, maxval). The lower bound minval is included in the range, while the upper bound maxval is excluded.

For floats, the default range is [0, 1). For ints, at least maxval must be specified explicitly.

In the integer case, the random integers are slightly biased unless maxval - minval is an exact power of two. The bias is small for values of maxval - minval significantly smaller than the range of the output (either 2**32 or 2**64).

Examples:

>>> tf.random.uniform(shape=[2])
<tf.Tensor: shape=(2,), dtype=float32, numpy=array([..., ...], dtype=float32)>
>>> tf.random.uniform(shape=[], minval=-1., maxval=0.)
<tf.Tensor: shape=(), dtype=float32, numpy=-...>
>>> tf.random.uniform(shape=[], minval=5, maxval=10, dtype=tf.int64)
<tf.Tensor: shape=(), dtype=int64, numpy=...>


The seed argument produces a deterministic sequence of tensors across multiple calls. To repeat that sequence, use tf.random.set_seed:

>>> tf.random.set_seed(5)
>>> tf.random.uniform(shape=[], maxval=3, dtype=tf.int32, seed=10)
<tf.Tensor: shape=(), dtype=int32, numpy=2>
>>> tf.random.uniform(shape=[], maxval=3, dtype=tf.int32, seed=10)
<tf.Tensor: shape=(), dtype=int32, numpy=0>
>>> tf.random.set_seed(5)
>>> tf.random.uniform(shape=[], maxval=3, dtype=tf.int32, seed=10)
<tf.Tensor: shape=(), dtype=int32, numpy=2>
>>> tf.random.uniform(shape=[], maxval=3, dtype=tf.int32, seed=10)
<tf.Tensor: shape=(), dtype=int32, numpy=0>


If a seed argument is specified without calling tf.random.set_seed, small changes to function graphs or previously executed operations will change the returned value. See tf.random.set_seed for details.

Args:

shape: A 1-D integer Tensor or Python array. The shape of the output tensor.

minval: A Tensor or Python value of type dtype, broadcastable with shape (for integer types, broadcasting is not supported, so it needs to be a scalar). The lower bound on the range of random values to generate (inclusive). Defaults to 0.

maxval: A Tensor or Python value of type dtype, broadcastable with shape (for integer types, broadcasting is not supported, so it needs to be a scalar). The upper bound on the range of random values to generate (exclusive). Defaults to 1 if dtype is floating point.

dtype: The type of the output: float16, float32, float64, int32, or int64.

seed: A Python integer. Used in combination with tf.random.set_seed to create a reproducible sequence of tensors across multiple calls.

name: A name for the operation (optional).

Returns:

A tensor of the specified shape filled with random uniform values.

Raises:

ValueError: If dtype is integral and maxval is not specified.
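The remark about integer bias can be made concrete with a small counting experiment. The sketch below assumes that raw fixed-width integers are mapped onto the range by taking a remainder, which the text above does not state; modulo_counts is a hypothetical helper, and the 3-bit and 16-bit widths are chosen only to keep the numbers readable.

```python
from collections import Counter


def modulo_counts(num_raw_values, span):
    """Count how often each output occurs when raw values are reduced modulo span."""
    counts = Counter(raw % span for raw in range(num_raw_values))
    return [counts[i] for i in range(span)]


# span is a power of two: every output is equally likely.
print(modulo_counts(2 ** 3, 4))   # [2, 2, 2, 2]
# span is not a power of two: the low outputs are favored.
print(modulo_counts(2 ** 3, 3))   # [3, 3, 2]
# The same span with many more raw values: the imbalance is one count in ~21845.
print(modulo_counts(2 ** 16, 3))  # [21846, 21845, 21845]
```

Under this assumption the bias vanishes when maxval - minval divides the number of raw values and shrinks as the span becomes small relative to 2**32 or 2**64, which is consistent with the statement above.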

d2l.tensorflow.read_data_nmt()[source]

Load the English-French dataset.

d2l.tensorflow.read_time_machine()[source]

Load the time machine dataset into a list of text lines.

d2l.tensorflow.reduce_sum(*args, **kwargs)[source]

Computes the sum of elements across dimensions of a tensor.

Reduces input_tensor along the dimensions given in axis. Unless keepdims is true, the rank of the tensor is reduced by 1 for each entry in axis. If keepdims is true, the reduced dimensions are retained with length 1.

If axis is None, all dimensions are reduced, and a tensor with a single element is returned.

For example:

>>> # x has a shape of (2, 3) (two rows and three columns):
>>> x = tf.constant([[1, 1, 1], [1, 1, 1]])
>>> x.numpy()
array([[1, 1, 1],
[1, 1, 1]], dtype=int32)
>>> # sum all the elements
>>> # 1 + 1 + 1 + 1 + 1 + 1 = 6
>>> tf.reduce_sum(x).numpy()
6
>>> # reduce along the first dimension
>>> # the result is [1, 1, 1] + [1, 1, 1] = [2, 2, 2]
>>> tf.reduce_sum(x, 0).numpy()
array([2, 2, 2], dtype=int32)
>>> # reduce along the second dimension
>>> # the result is [1, 1] + [1, 1] + [1, 1] = [3, 3]
>>> tf.reduce_sum(x, 1).numpy()
array([3, 3], dtype=int32)
>>> # keep the original dimensions
>>> tf.reduce_sum(x, 1, keepdims=True).numpy()
array([[3],
[3]], dtype=int32)
>>> # reduce along both dimensions
>>> # the result is 1 + 1 + 1 + 1 + 1 + 1 = 6
>>> # or, equivalently, reduce along rows, then reduce the resultant array
>>> # [1, 1, 1] + [1, 1, 1] = [2, 2, 2]
>>> # 2 + 2 + 2 = 6
>>> tf.reduce_sum(x, [0, 1]).numpy()
6

Args:

input_tensor: The tensor to reduce. Should have numeric type.

axis: The dimensions to reduce. If None (the default), reduces all dimensions. Must be in the range [-rank(input_tensor), rank(input_tensor)).

keepdims: If true, retains reduced dimensions with length 1.

name: A name for the operation (optional).

Returns:

The reduced tensor, of the same dtype as the input_tensor.

NumPy compatibility: Equivalent to np.sum, except that NumPy upcasts uint8 and int32 to int64 while TensorFlow returns the same dtype as the input.
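The roles of axis and keepdims in reduce_sum can be traced on a list-of-rows matrix in plain Python. The sketch is restricted to rank-2 input and a single integer axis; reduce_sum_2d is a hypothetical helper rather than the TensorFlow implementation.

```python
def reduce_sum_2d(x, axis=None, keepdims=False):
    """Rank-2 illustration; axis may be None, 0, 1, -1 or -2."""
    if axis is None:
        total = sum(sum(row) for row in x)
        return [[total]] if keepdims else total
    if axis not in (-2, -1, 0, 1):
        raise ValueError("axis out of range for a rank-2 input")
    axis %= 2  # negative axes count from the end
    if axis == 0:
        sums = [sum(col) for col in zip(*x)]       # collapse the rows
        return [sums] if keepdims else sums
    sums = [sum(row) for row in x]                 # collapse the columns
    return [[s] for s in sums] if keepdims else sums


x = [[1, 1, 1], [1, 1, 1]]
print(reduce_sum_2d(x))                    # 6
print(reduce_sum_2d(x, 0))                 # [2, 2, 2]
print(reduce_sum_2d(x, 1))                 # [3, 3]
print(reduce_sum_2d(x, 1, keepdims=True))  # [[3], [3]]
```

With keepdims=True the reduced dimension survives with length 1, so the result can still be broadcast against the input.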

d2l.tensorflow.reshape(*args, **kwargs)[source]

Reshapes a tensor.

Given tensor, this operation returns a new tf.Tensor that has the same values as tensor in the same order, except with a new shape given by shape.

>>> t1 = [[1, 2, 3],
...       [4, 5, 6]]
>>> print(tf.shape(t1).numpy())
[2 3]
>>> t2 = tf.reshape(t1, [6])
>>> t2
<tf.Tensor: shape=(6,), dtype=int32,
numpy=array([1, 2, 3, 4, 5, 6], dtype=int32)>
>>> tf.reshape(t2, [3, 2])
<tf.Tensor: shape=(3, 2), dtype=int32, numpy=
array([[1, 2],
[3, 4],
[5, 6]], dtype=int32)>


tf.reshape does not change the order or the total number of elements in the tensor, so it can reuse the underlying data buffer. This makes it a fast operation regardless of the size of the tensor it operates on. For the same reason, the requested shape must contain exactly as many elements as the input:

>>> tf.reshape([1, 2, 3], [2, 2])
Traceback (most recent call last):
...
InvalidArgumentError: Input to reshape is a tensor with 3 values, but the
requested shape has 4


To instead reorder the data to rearrange the dimensions of a tensor, see tf.transpose.

>>> t = [[1, 2, 3],
...      [4, 5, 6]]
>>> tf.reshape(t, [3, 2]).numpy()
array([[1, 2],
[3, 4],
[5, 6]], dtype=int32)
>>> tf.transpose(t, perm=[1, 0]).numpy()
array([[1, 4],
[2, 5],
[3, 6]], dtype=int32)


If one component of shape is the special value -1, the size of that dimension is computed so that the total size remains constant. In particular, a shape of [-1] flattens into 1-D. At most one component of shape can be -1.

>>> t = [[1, 2, 3],
...      [4, 5, 6]]
>>> tf.reshape(t, [-1])
<tf.Tensor: shape=(6,), dtype=int32,
numpy=array([1, 2, 3, 4, 5, 6], dtype=int32)>
>>> tf.reshape(t, [3, -1])
<tf.Tensor: shape=(3, 2), dtype=int32, numpy=
array([[1, 2],
[3, 4],
[5, 6]], dtype=int32)>
>>> tf.reshape(t, [-1, 2])
<tf.Tensor: shape=(3, 2), dtype=int32, numpy=
array([[1, 2],
[3, 4],
[5, 6]], dtype=int32)>


tf.reshape(t, []) reshapes a tensor t with one element to a scalar.

>>> tf.reshape([7], []).numpy()
7


More examples:

>>> t = [1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> print(tf.shape(t).numpy())
[9]
>>> tf.reshape(t, [3, 3])
<tf.Tensor: shape=(3, 3), dtype=int32, numpy=
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]], dtype=int32)>

>>> t = [[[1, 1], [2, 2]],
...      [[3, 3], [4, 4]]]
>>> print(tf.shape(t).numpy())
[2 2 2]
>>> tf.reshape(t, [2, 4])
<tf.Tensor: shape=(2, 4), dtype=int32, numpy=
array([[1, 1, 2, 2],
[3, 3, 4, 4]], dtype=int32)>

>>> t = [[[1, 1, 1],
...       [2, 2, 2]],
...      [[3, 3, 3],
...       [4, 4, 4]],
...      [[5, 5, 5],
...       [6, 6, 6]]]
>>> print(tf.shape(t).numpy())
[3 2 3]
>>> # Pass '[-1]' to flatten 't'.
>>> tf.reshape(t, [-1])
<tf.Tensor: shape=(18,), dtype=int32,
numpy=array([1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6],
dtype=int32)>
>>> # -- Using -1 to infer the shape --
>>> # Here -1 is inferred to be 9:
>>> tf.reshape(t, [2, -1])
<tf.Tensor: shape=(2, 9), dtype=int32, numpy=
array([[1, 1, 1, 2, 2, 2, 3, 3, 3],
[4, 4, 4, 5, 5, 5, 6, 6, 6]], dtype=int32)>
>>> # -1 is inferred to be 2:
>>> tf.reshape(t, [-1, 9])
<tf.Tensor: shape=(2, 9), dtype=int32, numpy=
array([[1, 1, 1, 2, 2, 2, 3, 3, 3],
[4, 4, 4, 5, 5, 5, 6, 6, 6]], dtype=int32)>
>>> # -1 is inferred to be 3:
>>> tf.reshape(t, [ 2, -1, 3])
<tf.Tensor: shape=(2, 3, 3), dtype=int32, numpy=
array([[[1, 1, 1],
[2, 2, 2],
[3, 3, 3]],
[[4, 4, 4],
[5, 5, 5],
[6, 6, 6]]], dtype=int32)>

Args:

tensor: A Tensor.

shape: A Tensor. Must be one of the following types: int32, int64. Defines the shape of the output tensor.

name: Optional string. A name for the operation.

Returns:

A Tensor. Has the same type as tensor.
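The rule for the special value -1 in reshape amounts to a small calculation: the missing dimension is the total element count divided by the product of the known dimensions. The following sketch resolves a requested shape in plain Python (Python 3.8+ for math.prod); infer_shape is a hypothetical helper and only mirrors the behavior described above.

```python
from math import prod


def infer_shape(num_elements, shape):
    """Resolve a single -1 entry so that the total element count is preserved."""
    shape = list(shape)
    if shape.count(-1) > 1:
        raise ValueError("at most one component of shape can be -1")
    if -1 in shape:
        known = prod(d for d in shape if d != -1)
        if known == 0 or num_elements % known != 0:
            raise ValueError("cannot infer the -1 dimension")
        shape[shape.index(-1)] = num_elements // known
    if prod(shape) != num_elements:
        raise ValueError(
            f"input has {num_elements} values, requested shape has {prod(shape)}")
    return shape


print(infer_shape(18, [2, -1, 3]))  # [2, 3, 3]
print(infer_shape(18, [-1]))        # [18]
print(infer_shape(1, []))           # []
```

The three calls correspond to the [2, -1, 3], [-1] and scalar examples above.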

d2l.tensorflow.seq_data_iter_random(corpus, batch_size, num_steps)[source]

Generate a minibatch of subsequences using random sampling.

d2l.tensorflow.seq_data_iter_sequential(corpus, batch_size, num_steps)[source]

Generate a minibatch of subsequences using sequential partitioning.

d2l.tensorflow.set_axes(axes, xlabel, ylabel, xlim, ylim, xscale, yscale, legend)[source]

Set the axes for matplotlib.

d2l.tensorflow.set_figsize(figsize=(3.5, 2.5))[source]

Set the figure size for matplotlib.

d2l.tensorflow.sgd(params, grads, lr, batch_size)[source]

Minibatch stochastic gradient descent.
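The signature suggests the usual minibatch update, in which each parameter moves against its gradient by lr * grad / batch_size. The sketch below assumes that the gradients are summed over the minibatch (hence the division) and returns new values instead of updating in place; sgd_sketch is a hypothetical stand-in operating on plain floats.

```python
def sgd_sketch(params, grads, lr, batch_size):
    """Return updated parameters: param - lr * grad / batch_size.

    Assumption: `grads` holds gradients summed over the minibatch.
    """
    return [p - lr * g / batch_size for p, g in zip(params, grads)]


print(sgd_sketch([1.0, 2.0], [4.0, -8.0], lr=0.5, batch_size=2))  # [0.0, 4.0]
```

Dividing by batch_size makes the step size independent of how many examples contributed to the summed gradient.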

d2l.tensorflow.show_images(imgs, num_rows, num_cols, titles=None, scale=1.5)[source]

Plot a list of images.

d2l.tensorflow.show_trace_2d(f, results)[source]

Show the trace of 2D variables during optimization.

d2l.tensorflow.sin(x, name=None)[source]

Computes sine of x element-wise.

Given an input tensor, this function computes sine of every element in the tensor. Input range is (-inf, inf) and output range is [-1,1].

    x = tf.constant([-float("inf"), -9, -0.5, 1, 1.2, 200, 10, float("inf")])
    tf.math.sin(x) ==> [nan -0.4121185 -0.47942555 0.84147096 0.9320391 -0.87329733 -0.54402107 nan]

Args:

x: A Tensor. Must be one of the following types: bfloat16, half, float32, float64, complex64, complex128.

name: A name for the operation (optional).

Returns:

A Tensor. Has the same type as x.

d2l.tensorflow.sinh(x, name=None)[source]

Computes hyperbolic sine of x element-wise.

Given an input tensor, this function computes hyperbolic sine of every element in the tensor. Input range is [-inf,inf] and output range is [-inf,inf].

    x = tf.constant([-float("inf"), -9, -0.5, 1, 1.2, 2, 10, float("inf")])
    tf.math.sinh(x) ==> [-inf -4.0515420e+03 -5.2109528e-01 1.1752012e+00 1.5094614e+00 3.6268604e+00 1.1013232e+04 inf]

Args:

x: A Tensor. Must be one of the following types: bfloat16, half, float32, float64, complex64, complex128.

name: A name for the operation (optional).

Returns:

A Tensor. Has the same type as x.

d2l.tensorflow.squared_loss(y_hat, y)[source]

Squared loss.

d2l.tensorflow.stack(*args, **kwargs)[source]

Stacks a list of rank-R tensors into one rank-(R+1) tensor.

See also tf.concat, tf.tile, tf.repeat.

Packs the list of tensors in values into a tensor with rank one higher than each tensor in values, by packing them along the axis dimension. Given a list of length N of tensors of shape (A, B, C):

• if axis == 0, the output tensor will have the shape (N, A, B, C);

• if axis == 1, the output tensor will have the shape (A, N, B, C);

• and so on.

For example:

>>> x = tf.constant([1, 4])
>>> y = tf.constant([2, 5])
>>> z = tf.constant([3, 6])
>>> tf.stack([x, y, z])
<tf.Tensor: shape=(3, 2), dtype=int32, numpy=
array([[1, 4],
[2, 5],
[3, 6]], dtype=int32)>
>>> tf.stack([x, y, z], axis=1)
<tf.Tensor: shape=(2, 3), dtype=int32, numpy=
array([[1, 2, 3],
[4, 5, 6]], dtype=int32)>


This is the opposite of unstack. The NumPy equivalent is np.stack:

>>> np.array_equal(np.stack([x, y, z]), tf.stack([x, y, z]))
True

Args:

values: A list of Tensor objects with the same shape and type.

axis: An int. The axis to stack along. Defaults to the first dimension. Negative values wrap around, so the valid range is [-(R+1), R+1).

name: A name for this operation (optional).

Returns:

output: A stacked Tensor with the same type as values.

Raises:

ValueError: If axis is out of the range [-(R+1), R+1).
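The shape rule for stack, including the wrap-around of negative axes, fits in a few lines of plain Python. stacked_shape is a hypothetical helper that computes only the resulting shape; it does not move any data.

```python
def stacked_shape(num_tensors, shape, axis=0):
    """Shape obtained by stacking num_tensors tensors of the given shape."""
    rank = len(shape)
    if not -(rank + 1) <= axis < rank + 1:
        raise ValueError("axis is out of the range [-(R+1), R+1)")
    if axis < 0:
        axis += rank + 1  # negative values wrap around
    return tuple(shape[:axis]) + (num_tensors,) + tuple(shape[axis:])


print(stacked_shape(3, (2,)))           # (3, 2)
print(stacked_shape(3, (2,), axis=1))   # (2, 3)
print(stacked_shape(3, (2,), axis=-1))  # (2, 3)
print(stacked_shape(4, (7, 8, 9), 1))   # (7, 4, 8, 9)
```

The first two calls give the shapes of the two tf.stack([x, y, z]) examples above.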

d2l.tensorflow.synthetic_data(w, b, num_examples)[source]

Generate y = Xw + b + noise.

d2l.tensorflow.tanh(x, name=None)[source]

Computes hyperbolic tangent of x element-wise.

Given an input tensor, this function computes hyperbolic tangent of every element in the tensor. Input range is [-inf, inf] and output range is [-1,1].

    x = tf.constant([-float("inf"), -5, -0.5, 1, 1.2, 2, 3, float("inf")])
    tf.math.tanh(x) ==> [-1. -0.99990916 -0.46211717 0.7615942 0.8336547 0.9640276 0.9950547 1.]

Args:

x: A Tensor. Must be one of the following types: bfloat16, half, float32, float64, complex64, complex128.

name: A name for the operation (optional).

Returns:

A Tensor. Has the same type as x.

If x is a SparseTensor, returns SparseTensor(x.indices, tf.math.tanh(x.values, …), x.dense_shape)

d2l.tensorflow.tensor(value, dtype=None, shape=None, name='Const')

Creates a constant tensor from a tensor-like object.

Note: All eager tf.Tensor values are immutable (in contrast to tf.Variable). There is nothing especially _constant_ about the value returned from tf.constant. This function is not fundamentally different from tf.convert_to_tensor. The name tf.constant comes from the symbolic APIs (like tf.data or Keras functional models) where the value is embedded in a Const node in the tf.Graph. tf.constant is useful for asserting that the value can be embedded that way.

If the argument dtype is not specified, then the type is inferred from the type of value.

>>> # Constant 1-D Tensor from a python list.
>>> tf.constant([1, 2, 3, 4, 5, 6])
<tf.Tensor: shape=(6,), dtype=int32,
numpy=array([1, 2, 3, 4, 5, 6], dtype=int32)>
>>> # Or a numpy array
>>> a = np.array([[1, 2, 3], [4, 5, 6]])
>>> tf.constant(a)
<tf.Tensor: shape=(2, 3), dtype=int64, numpy=
array([[1, 2, 3],
[4, 5, 6]])>


If dtype is specified the resulting tensor values are cast to the requested dtype.

>>> tf.constant([1, 2, 3, 4, 5, 6], dtype=tf.float64)
<tf.Tensor: shape=(6,), dtype=float64,
numpy=array([1., 2., 3., 4., 5., 6.])>


If shape is set, the value is reshaped to match. Scalars are expanded to fill the shape:

>>> tf.constant(0, shape=(2, 3))
<tf.Tensor: shape=(2, 3), dtype=int32, numpy=
array([[0, 0, 0],
[0, 0, 0]], dtype=int32)>
>>> tf.constant([1, 2, 3, 4, 5, 6], shape=[2, 3])
<tf.Tensor: shape=(2, 3), dtype=int32, numpy=
array([[1, 2, 3],
[4, 5, 6]], dtype=int32)>


tf.constant has no effect if an eager Tensor is passed as the value; it even transmits gradients:

>>> v = tf.Variable([0.0])
>>> with tf.GradientTape() as g:
...     loss = tf.constant(v + v)
>>> g.gradient(loss, v).numpy()
array([2.], dtype=float32)


But, since tf.constant embeds the value in the tf.Graph this fails for symbolic tensors:

>>> i = tf.keras.layers.Input(shape=[None, None])
>>> t = tf.constant(i)
Traceback (most recent call last):
...
NotImplementedError: ...


tf.constant will _always_ create CPU (host) tensors. In order to create tensors on other devices, use tf.identity. (If the value is an eager Tensor, however, the tensor will be returned unmodified as mentioned above.)

Related Ops:

• tf.convert_to_tensor is similar, but:

  * It has no shape argument.

  * Symbolic tensors are allowed to pass through.

>>> i = tf.keras.layers.Input(shape=[None, None])
>>> t = tf.convert_to_tensor(i)

• tf.fill differs in a few ways:

  * tf.constant supports arbitrary constants, not just uniform scalar Tensors like tf.fill.

  * tf.fill creates an Op in the graph that is expanded at runtime, so it can efficiently represent large tensors.

  * Since tf.fill does not embed the value, it can produce dynamically sized outputs.

Args:

value: A constant value (or list) of output type dtype.

dtype: The type of the elements of the resulting tensor.

shape: Optional dimensions of resulting tensor.

name: Optional name for the tensor.

Returns:

A Constant Tensor.

Raises:

TypeError: if shape is incorrectly specified or unsupported.

ValueError: if called on a symbolic tensor.

d2l.tensorflow.tokenize(lines, token='word')[source]

Split text lines into word or character tokens.
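One plausible reading of this description is that each line is split on whitespace for token='word' and into individual characters for token='char'. The sketch below follows that reading; tokenize_sketch is a hypothetical stand-in, and raising ValueError for other values of token is an assumption made here, not documented behavior.

```python
def tokenize_sketch(lines, token="word"):
    """Split each text line into word or character tokens."""
    if token == "word":
        return [line.split() for line in lines]
    if token == "char":
        return [list(line) for line in lines]
    # Assumption: unknown token types are rejected with an exception.
    raise ValueError("unknown token type: " + token)


lines = ["the time machine", "by h g wells"]
print(tokenize_sketch(lines))
# [['the', 'time', 'machine'], ['by', 'h', 'g', 'wells']]
print(tokenize_sketch(["ab c"], token="char"))
# [['a', 'b', ' ', 'c']]
```

Note that character tokenization keeps spaces as tokens, whereas word tokenization discards them.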

d2l.tensorflow.tokenize_nmt(text, num_examples=None)[source]

Tokenize the English-French dataset.

d2l.tensorflow.train_2d(trainer, steps=20)[source]

Optimize a 2-dim objective function with a customized trainer.

d2l.tensorflow.train_ch3(net, train_iter, test_iter, loss, num_epochs, updater)[source]

Train a model (defined in Chapter 3).

d2l.tensorflow.train_ch6(net_fn, train_iter, test_iter, num_epochs, lr, device=<tensorflow.python.eager.context._EagerDeviceContext object>)[source]

Train a model with a GPU (defined in Chapter 6).

d2l.tensorflow.train_ch8(net, train_iter, vocab, num_hiddens, lr, num_epochs, strategy, use_random_iter=False)[source]

Train a model (defined in Chapter 8).

d2l.tensorflow.train_epoch_ch3(net, train_iter, loss, updater)[source]

The training loop defined in Chapter 3.

d2l.tensorflow.train_epoch_ch8(net, train_iter, loss, updater, params, use_random_iter)[source]

Train a model within one epoch (defined in Chapter 8).

d2l.tensorflow.transpose(*args, **kwargs)[source]

Transposes a, where a is a Tensor.

Permutes the dimensions according to the value of perm.

The returned tensor’s dimension i will correspond to the input dimension perm[i]. If perm is not given, it is set to (n-1…0), where n is the rank of the input tensor. Hence by default, this operation performs a regular matrix transpose on 2-D input Tensors.

If conjugate is True and a.dtype is either complex64 or complex128 then the values of a are conjugated and transposed.

NumPy compatibility: In NumPy, transposes are memory-efficient constant-time operations, as they simply return a new view of the same data with adjusted strides. TensorFlow does not support strides, so transpose returns a new tensor with the items permuted.

For example:

>>> x = tf.constant([[1, 2, 3], [4, 5, 6]])
>>> tf.transpose(x)
<tf.Tensor: shape=(3, 2), dtype=int32, numpy=
array([[1, 4],
[2, 5],
[3, 6]], dtype=int32)>


Equivalently, you could call tf.transpose(x, perm=[1, 0]).

If x is complex, setting conjugate=True gives the conjugate transpose:

>>> x = tf.constant([[1 + 1j, 2 + 2j, 3 + 3j],
...                  [4 + 4j, 5 + 5j, 6 + 6j]])
>>> tf.transpose(x, conjugate=True)
<tf.Tensor: shape=(3, 2), dtype=complex128, numpy=
array([[1.-1.j, 4.-4.j],
[2.-2.j, 5.-5.j],
[3.-3.j, 6.-6.j]])>


‘perm’ is more useful for n-dimensional tensors where n > 2:

>>> x = tf.constant([[[ 1,  2,  3],
...                   [ 4,  5,  6]],
...                  [[ 7,  8,  9],
...                   [10, 11, 12]]])


As above, simply calling tf.transpose will default to perm=[2,1,0].

To take the transpose of the matrices in dimension-0 (such as when you are transposing matrices where 0 is the batch dimension), you would set perm=[0,2,1].

>>> tf.transpose(x, perm=[0, 2, 1])
<tf.Tensor: shape=(2, 3, 2), dtype=int32, numpy=
array([[[ 1,  4],
[ 2,  5],
[ 3,  6]],
[[ 7, 10],
[ 8, 11],
[ 9, 12]]], dtype=int32)>


Note: This has a shorthand, tf.linalg.matrix_transpose.

Args:

a: A Tensor.

perm: A permutation of the dimensions of a. This should be a vector.

conjugate: Optional bool. Setting it to True is mathematically equivalent to tf.math.conj(tf.transpose(input)).

name: A name for the operation (optional).

Returns:

A transposed Tensor.
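The statement that output dimension i corresponds to input dimension perm[i] translates directly into a shape computation. The sketch below covers only shapes and the default perm; transposed_shape is a hypothetical helper, not part of TensorFlow.

```python
def transposed_shape(shape, perm=None):
    """Shape of the transposed tensor: out_shape[i] = shape[perm[i]]."""
    rank = len(shape)
    if perm is None:
        perm = list(range(rank - 1, -1, -1))  # default: reverse all dimensions
    if sorted(perm) != list(range(rank)):
        raise ValueError("perm must be a permutation of the dimensions")
    return tuple(shape[p] for p in perm)


print(transposed_shape((2, 3)))                  # (3, 2)
print(transposed_shape((2, 2, 3)))               # (3, 2, 2)
print(transposed_shape((2, 2, 3), [0, 2, 1]))    # (2, 3, 2)
```

The last call gives the shape of the perm=[0, 2, 1] example above, where the batch dimension stays in place and only the two matrix dimensions are swapped.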

d2l.tensorflow.truncate_pad(line, num_steps, padding_token)[source]

Truncate or pad sequences.
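Judging from the signature, a sequence longer than num_steps is cut to that length and a shorter one is filled up with padding_token. The sketch below implements that reading on Python lists; truncate_pad_sketch is a hypothetical stand-in, and the token values in the example are made up.

```python
def truncate_pad_sketch(line, num_steps, padding_token):
    """Return a list of exactly num_steps tokens."""
    if len(line) > num_steps:
        return line[:num_steps]  # truncate
    return line + [padding_token] * (num_steps - len(line))  # pad


print(truncate_pad_sketch([47, 4], 5, 1))             # [47, 4, 1, 1, 1]
print(truncate_pad_sketch([1, 2, 3, 4, 5, 6], 4, 0))  # [1, 2, 3, 4]
```

Bringing every sequence to the same length is what allows a minibatch of sequences to be stored in a single rectangular tensor.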

d2l.tensorflow.try_all_gpus()[source]

Return all available GPUs, or [cpu(),] if no GPU exists.

d2l.tensorflow.try_gpu(i=0)[source]

Return gpu(i) if it exists, otherwise return cpu().

d2l.tensorflow.use_svg_display()[source]

Use the svg format to display a plot in Jupyter.

d2l.tensorflow.zeros(*args, **kwargs)[source]

Creates a tensor with all elements set to zero.

See also tf.zeros_like, tf.ones, tf.fill, tf.eye.

This operation returns a tensor of type dtype with shape shape and all elements set to zero.

>>> tf.zeros([3, 4], tf.int32)
<tf.Tensor: shape=(3, 4), dtype=int32, numpy=
array([[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0]], dtype=int32)>

Args:

shape: A list of integers, a tuple of integers, or a 1-D Tensor of type int32.

dtype: The DType of an element in the resulting Tensor.

name: Optional string. A name for the operation.

Returns:

A Tensor with all elements set to zero.