Quick Overview of Neural Architectures
I know it has been done a thousand times before, but a Deep Learning forum could hardly do without such a topic. At the same time, it will serve as an explanation of what the sub-categories mean.
Convolutional Neural Networks (CNNs)
As the name suggests, all networks which employ the operation of convolution belong to this category (unless a more specific one is available, e.g. Graph Neural Networks). To a first approximation, one can say that a characteristic trait of CNNs is that sets of convolution kernels are applied indiscriminately to each element of the input/intermediate layers. These sets usually vary between the layers, aiming to detect more abstract regularities in the data as the depth increases. The three most typical parameters of a CNN layer are the kernel size, dilation and stride. The latter two go beyond the approximation above and spread out the kernels in such a way that some elements might be skipped in individual convolutions (a small sketch below illustrates all three). CNNs are best known for, and used profusely in, image classification, object detection and image segmentation. Examples of famous CNNs include ResNet, VGG and AlexNet.
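Here is a minimal sketch (assuming PyTorch) of a small CNN showing kernel size, stride and dilation in action; the channel counts and input size are arbitrary and just for illustration:

```python
import torch
import torch.nn as nn

cnn = nn.Sequential(
    # 3-channel input (e.g. an RGB image), 3x3 kernels slid over every position
    nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1),
    nn.ReLU(),
    # stride=2 moves the kernel two pixels at a time, halving the spatial resolution
    nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1),
    nn.ReLU(),
    # dilation=2 spreads the 3x3 kernel out so that every other element is skipped
    nn.Conv2d(32, 32, kernel_size=3, dilation=2, padding=2),
    nn.ReLU(),
)

x = torch.randn(1, 3, 32, 32)   # a batch with one 32x32 RGB image
print(cnn(x).shape)             # torch.Size([1, 32, 16, 16])
```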
Transformers (NLP)
This and the next category might be a bit tricky. Transformers are essentially based on the usage of attention to selectively weight different portions of information and produce new representations in consecutive layers. They are most renowned for their use in Natural Language Processing (NLP), with the flagship for a number of years being BERT. The input in this use case consists of encoded representations of words, word parts and special tokens - the so-called embeddings. They are then transformed across consecutive layers and - based on their aggregation or a selected token - classification/regression-type tasks can be performed. Alternatively, the output representations can be fed to a decoder which acts in the opposite direction and can, for example, output a translation into a different language than the original. In the Transformers category I would like to stick to this type of language models, whereas the Attention-based category is dedicated to other uses of the attention mechanism.
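To make the attention mechanism a bit more concrete, here is a minimal sketch of scaled dot-product attention, the core operation inside a Transformer layer (again assuming PyTorch; the tensor shapes are illustrative only):

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v):
    # q, k, v: (batch, seq_len, d_model)
    d_k = q.size(-1)
    # similarity of every token (query) with every other token (key)
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5   # (batch, seq_len, seq_len)
    weights = F.softmax(scores, dim=-1)             # attention weights sum to 1 per token
    # each token's new representation is a weighted mix of all value vectors
    return weights @ v                              # (batch, seq_len, d_model)

# a toy "sentence" of 5 token embeddings with dimension 8
x = torch.randn(1, 5, 8)
out = scaled_dot_product_attention(x, x, x)   # self-attention: q = k = v
print(out.shape)                              # torch.Size([1, 5, 8])
```

In a full Transformer layer the queries, keys and values are produced by learned linear projections of the embeddings, and several such attention "heads" run in parallel, but the weighting idea is exactly the one above.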
Attention-based
- TODO
Recurrent Neural Networks
- TODO
Autoencoders
- TODO
Bayesian Networks
- TODO
Generative Adversarial Networks
- TODO
Graph Neural Networks
- TODO