
Image GPT, developed by OpenAI's research team, is a framework that learns to generate images. It applies GPT-2, the automatic natural-language text generation model presented by the same team last year, to images. OpenAI is a US-based AI research company, founded as a non-profit and originally co-chaired by Elon Musk.
The leftmost column is the input image (half of the original), the rightmost column is the original image, and the four columns in the center are the images output by Image GPT.
GPT-2 is a model that takes a short piece of text as input and automatically generates a plausible longer passage from it. Its accuracy has been a topic of some discussion. The model was trained on a dataset of 8 million web pages, and the network consists of 48 layers with 1.5 billion parameters. Instead of a Recurrent Neural Network (RNN) or a Convolutional Neural Network (CNN), it uses a Transformer, which relies only on attention and involves no recurrence or convolution.
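To make "only attention" more concrete, the following is a minimal sketch of scaled dot-product self-attention with a causal mask, the core operation of a Transformer of this kind. It is a simplification under stated assumptions: a real model applies learned projections to obtain queries, keys, and values and uses multiple heads, whereas this sketch passes the inputs in directly. The function names are illustrative, not taken from OpenAI's code.

```python
import math


def softmax(scores):
    """Turn a list of scores into weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def causal_self_attention(queries, keys, values):
    """Each position attends only to itself and to earlier positions."""
    dim = len(queries[0])
    outputs = []
    for i, query in enumerate(queries):
        # Scaled dot products against positions 0..i (the causal mask).
        scores = [
            sum(q * k for q, k in zip(query, keys[j])) / math.sqrt(dim)
            for j in range(i + 1)
        ]
        weights = softmax(scores)
        # The output is the attention-weighted average of the visible values.
        outputs.append([
            sum(w * values[j][d] for j, w in enumerate(weights))
            for d in range(len(values[0]))
        ])
    return outputs


# Assumption: the same vectors serve as queries, keys, and values.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out = causal_self_attention(x, x, x)
```

The first position can only see itself, so its output equals its own value vector; later positions blend information from everything before them. There is no recurrent state and no convolution kernel anywhere in the computation.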
When the team applied this model to images, they found that it could generate plausible sample images through image completion. Given half of an image as input, the model automatically generates the rest, producing a consistent sample image.
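The completion procedure described above can be sketched as a loop that starts from the known half of the pixel sequence and appends one predicted pixel at a time. This is only an illustration: the predictor here is a hypothetical stand-in that copies the pixel one row above, where the real system would use the trained Transformer, and the names `complete_image` and `make_copy_row_above` are invented for this example.

```python
def complete_image(prefix, total_length, predict_next):
    """Extend a pixel sequence one value at a time until it is full length."""
    sequence = list(prefix)
    while len(sequence) < total_length:
        # Each new pixel is predicted from everything generated so far.
        sequence.append(predict_next(sequence))
    return sequence


def make_copy_row_above(width):
    """Hypothetical stand-in for a trained model: repeat the pixel one row up."""
    def predict_next(sequence):
        return sequence[len(sequence) - width]
    return predict_next


width, height = 4, 4
# The top half of a 4x4 grayscale image, in raster (row-by-row) order.
top_half = [10, 20, 30, 40,
            50, 60, 70, 80]

full = complete_image(top_half, width * height, make_copy_row_above(width))
rows = [full[i:i + width] for i in range(0, len(full), width)]
```

With a trained model, the prediction step would sample from a learned distribution instead, which is why the same input half can yield several different completions.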
The team trained Transformers of three sizes on ImageNet, a large image dataset: "iGPT-S" with 76 million parameters, "iGPT-M" with 455 million parameters, and "iGPT-L" with 1.4 billion parameters. They also trained "iGPT-XL", which has 6.8 billion parameters, on a dataset that combines ImageNet with images from the web. As pre-processing, each input image is resized to a low resolution and its pixels are converted into a single one-dimensional sequence, which is then fed to the Transformer for training.
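The pre-processing step can be illustrated with a small sketch: shrink the image, then unroll it row by row into one sequence. The assumptions here are mine, not the article's: the image is grayscale, resizing is done by averaging non-overlapping blocks, and the pixels are read in raster order. The function names `downsample` and `flatten_raster` are hypothetical.

```python
def downsample(image, factor):
    """Shrink a 2D grayscale image by averaging factor x factor blocks."""
    height, width = len(image), len(image[0])
    small = []
    # Rows or columns that do not fill a whole block are dropped.
    for top in range(0, height - height % factor, factor):
        row = []
        for left in range(0, width - width % factor, factor):
            block = [
                image[top + dy][left + dx]
                for dy in range(factor)
                for dx in range(factor)
            ]
            row.append(sum(block) // len(block))
        small.append(row)
    return small


def flatten_raster(image):
    """Unroll a 2D image into a 1D sequence, row by row."""
    return [pixel for row in image for pixel in row]


image = [
    [0, 0, 100, 100],
    [0, 0, 100, 100],
    [200, 200, 50, 50],
    [200, 200, 50, 50],
]
small = downsample(image, 2)       # 4x4 -> 2x2
sequence = flatten_raster(small)   # 2x2 -> length-4 sequence
```

Lowering the resolution matters because the sequence length grows with the number of pixels, and attention compares every position with every earlier one.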
The team quantitatively evaluated the trained models on other image datasets and found that the results were better than those of supervised and unsupervised image learning models such as ResNet and SimCLR. However, some issues remain, such as being limited to low resolution.
