The Design Principles of the PyTorch Architecture

This article introduces the principles that drive the implementation of PyTorch and how they are reflected in its architecture.

Design principles

PyTorch’s success stems from weaving previous ideas into a design that balances speed and ease of use. There are four main principles behind its design choices:

Be Pythonic: Data scientists are familiar with the Python language, its programming model, and its tools.

Put researchers first: PyTorch strives to make writing models, data loaders, and optimizers as easy and productive as possible.

Provide pragmatic performance: To be useful, PyTorch needs to deliver compelling performance, although not at the expense of simplicity and ease of use.

Worse is better: Given a fixed amount of engineering resources, and all else being equal, the time saved by keeping the internal implementation of PyTorch simple can be used to implement additional features, adapt to new situations, and keep up with the fast pace of progress in the field of AI. Therefore, it is better to have a simple but slightly incomplete solution than a comprehensive but complex and hard-to-maintain design.

Usability-Centric Design

Deep learning models are just Python programs

Neural networks themselves evolved rapidly from simple sequences of feed-forward layers into incredibly varied numerical programs, often composed of many loops and recursive functions. To support this growing complexity, PyTorch forgoes the potential benefits of a graph-metaprogramming-based approach to preserve the imperative programming model of Python.

PyTorch extends this to all aspects of deep learning workflows. Defining layers, composing models, loading data, running optimizers, and parallelizing the training process are all expressed using the familiar concepts developed for general-purpose programming. This approach ensures that any potential new neural network architecture can be easily implemented with PyTorch.
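
As a minimal sketch of this idea (the module name, its sizes, and the loop count below are invented for illustration), a model whose forward pass uses ordinary Python control flow is written as a plain Python class:

import torch
import torch.nn as nn

# Hypothetical example: the forward pass uses a plain Python loop,
# which the imperative (eager) execution model handles naturally.
class TinyRecurrentNet(nn.Module):
    def __init__(self, dim=16, steps=3):
        super().__init__()
        self.linear = nn.Linear(dim, dim)
        self.steps = steps

    def forward(self, x):
        for _ in range(self.steps):          # ordinary Python control flow
            x = torch.relu(self.linear(x))
        return x

model = TinyRecurrentNet()
output = model(torch.randn(4, 16))           # runs eagerly, like any Python code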

Interoperability and extensibility

Easy and efficient interoperability is one of the top priorities for PyTorch because it opens up the possibility of leveraging the rich ecosystem of Python libraries as part of user programs. Hence, PyTorch allows for bidirectional exchange of data with external libraries.

  • For example, it provides a mechanism to convert between NumPy arrays and PyTorch tensors using the torch.from_numpy() function and the .numpy() tensor method, as shown in the sketch after this list.
  • Similar functionality is also available to exchange data stored using the DLPack format.
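
A minimal sketch of that round trip (the array values are arbitrary). Note that torch.from_numpy() and .numpy() share the underlying memory, so an in-place change on one side is visible on the other:

import numpy as np
import torch

ndarray = np.array([1.0, 2.0, 3.0])
tensor = torch.from_numpy(ndarray)   # np.ndarray -> torch.Tensor (shares memory)
back = tensor.numpy()                # torch.Tensor -> np.ndarray (shares memory)

tensor.add_(1.0)                     # in-place update is visible through all three views
print(ndarray)                       # [2. 3. 4.]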

Automatic differentiation

In its current implementation, PyTorch performs reverse-mode automatic differentiation, which computes the gradient of a scalar output with respect to a multivariate input. Differentiating functions with more outputs than inputs is more efficiently executed using forward-mode automatic differentiation, but this use case is less common for machine learning applications.
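
A small sketch of what this looks like in practice: a scalar output is computed from a multivariate input, and calling .backward() populates the gradient of that scalar with respect to the input.

import torch

x = torch.randn(3, requires_grad=True)   # multivariate input
y = (x ** 2).sum()                       # scalar output
y.backward()                             # reverse-mode automatic differentiation
print(x.grad)                            # gradient dy/dx, equal to 2 * x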


Common Code Snippets

Configuration

import os
import torch

# PyTorch version
torch.__version__ # PyTorch version
# conda update pytorch torchvision -c pytorch # Update PyTorch (shell command)

# CUDA
torch.version.cuda # Corresponding CUDA version
torch.cuda.is_available() # Check whether CUDA support is available
torch.cuda.empty_cache() # Manually release cached GPU memory after a run finishes
# CUDA_VISIBLE_DEVICES=0,1 python train.py # Run on specific GPUs: command line (shell)
os.environ['CUDA_VISIBLE_DEVICES'] = '0,1' # Run on specific GPUs: in code (set before CUDA is initialized)

# cuDNN
torch.backends.cudnn.version() # Corresponding cuDNN version
torch.backends.cudnn.benchmark = True # Speed up convolutions by auto-tuning algorithms; results may fluctuate slightly between runs
torch.backends.cudnn.deterministic = True # Use deterministic algorithms to avoid such fluctuations

# GPU type
torch.cuda.get_device_name(0) # Name of GPU 0

# Fix random seeds
torch.manual_seed(0)
torch.cuda.manual_seed_all(0)

Tensor

# Tensor Information
tensor.type() # Data type
tensor.size() # Shape of the tensor. It is a subclass of Python tuple
tensor.dim() # Number of dimensions.

# Type conversions (float is much faster than double in PyTorch)
torch.set_default_tensor_type(torch.FloatTensor) # Set default tensor type.
tensor = tensor.cuda()
tensor = tensor.cpu()
tensor = tensor.float()
tensor = tensor.long()

# torch.Tensor <==> np.ndarray
ndarray = tensor.cpu().numpy() # torch.Tensor -> np.ndarray
tensor = torch.from_numpy(ndarray).float() # np.ndarray -> torch.Tensor
# If the ndarray has a negative stride (e.g. it was flipped with image = image[..., ::-1]),
# torch.from_numpy() raises an error because it cannot share memory with such an array.
# Solution: copy the array first to obtain a contiguous buffer.
tensor = torch.from_numpy(ndarray.copy()).float()

# Reshape
tensor = torch.reshape(tensor, shape)
# Shuffle the first dimension
tensor = tensor[torch.randperm(tensor.size(0))]
# Reverse the last dimension (tensor[..., ::-1] is not supported directly). Assume tensor has shape N*D*H*W.
tensor = tensor[:, :, :, torch.arange(tensor.size(3) - 1, -1, -1).long()]

# Replication Operation | New/Shared memory | Still in computation graph |
tensor.clone() # | New | Yes |
tensor.detach() # | Shared | No |
tensor.detach().clone() # | New | No |

# Concatenation / stacking
tensor = torch.cat(list_of_tensors, dim=0) # Concatenate along an existing dimension: 3 tensors of 10×5 -> 30×5
tensor = torch.stack(list_of_tensors, dim=0) # Stack along a new dimension: 3 tensors of 10×5 -> 3×10×5

# One-hot encoding
N = tensor.size(0)
one_hot = torch.zeros(N, num_classes).long()
one_hot.scatter_(dim=1, index=torch.unsqueeze(tensor, dim=1), src=torch.ones(N, num_classes).long())

# Zero / non-zero elements
torch.nonzero(tensor) # Index of non-zero elements
torch.nonzero(tensor == 0) # Index of zero elements
torch.nonzero(tensor).size(0) # Number of non-zero elements
torch.nonzero(tensor == 0).size(0) # Number of zero elements

# Equality check
torch.allclose(tensor1, tensor2) # float tensor
torch.equal(tensor1, tensor2) # int tensor

# Expand
torch.reshape(tensor, (64, 512, 1, 1)).expand(64, 512, 7, 7) # Expand tensor of shape 64*512 to shape 64*512*7*7.

# Matrix Multiplication
result = torch.mm(tensor1, tensor2) # (m*n) * (n*p) -> (m*p)
result = torch.bmm(tensor1, tensor2) # Batch matrix multiplication: (b*m*n) * (b*n*p) -> (b*m*p)
result = tensor1 * tensor2 # Element-wise multiplication

# Euclidean Distance
dist = torch.sqrt(torch.sum((X1[:,None,:] - X2) ** 2, dim=2)) # X1: m*d, X2: n*d.

Figure from the Zhihu article 一文理解 PyTorch: 附代码实例 (Understanding PyTorch in One Article, with Code Examples).

