No one can explain it better than OpenAI
There are a few important aspects:
Loss Function (or discriminator)
Classic DCGAN has a single discriminator. IN: image. OUT: real/fake. The generator is simple too. IN: random vector (noise). OUT: image.
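A minimal sketch of the two losses behind this setup, assuming the discriminator outputs the probability that an image is real and that binary cross-entropy is the loss. The function names and the example scores are made up for illustration.

```python
import math

def bce(prediction, label, eps=1e-7):
    """Binary cross-entropy for one probability and one 0/1 label."""
    p = min(max(prediction, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))

def discriminator_loss(d_real, d_fake):
    """D wants real images scored as 1 and generated images scored as 0."""
    real_term = sum(bce(p, 1) for p in d_real) / len(d_real)
    fake_term = sum(bce(p, 0) for p in d_fake) / len(d_fake)
    return real_term + fake_term

def generator_loss(d_fake):
    """G wants the discriminator to score its images as 1."""
    return sum(bce(p, 1) for p in d_fake) / len(d_fake)

# Hypothetical discriminator outputs (probability of "real").
d_on_real = [0.9, 0.8]
d_on_fake = [0.2, 0.1]
print("discriminator loss:", discriminator_loss(d_on_real, d_on_fake))
print("generator loss:", generator_loss(d_on_fake))
```

The generator loss drops only when the discriminator is fooled, which is the whole game.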
pix2pix: Take colorizing a greyscale image as an example.
Discriminator: IN: a pair of images (grey + color). OUT: real (match) or fake (no match). A real pair is the grey and color versions of the same image. A fake pair is the grey image plus the synthetic color image the generator produced from it.
Generator: IN: image. OUT: image.
* They also add an L1 term that keeps the generator output close to the ground-truth target image, weighted by a coefficient lambda (the paper sets lambda = 100). The main objective is still to fool the discriminator.
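The combined generator objective can be sketched as an adversarial term plus a weighted L1 term. This is an illustration on flat lists of pixel values, not the pix2pix code. The names l1_distance, pix2pix_generator_loss and lam are hypothetical, and the default lam=100.0 is taken from the value reported in the pix2pix paper.

```python
import math

def l1_distance(generated, target):
    """Mean absolute difference between two equally sized pixel lists."""
    return sum(abs(g - t) for g, t in zip(generated, target)) / len(target)

def pix2pix_generator_loss(d_fake_pair, generated, target, lam=100.0):
    """Adversarial term plus lam * L1.

    d_fake_pair is the discriminator's probability that the
    (grey, synthetic color) pair is a real match.
    """
    eps = 1e-7
    p = min(max(d_fake_pair, eps), 1.0 - eps)  # clamp to avoid log(0)
    adversarial = -math.log(p)  # small when D is fooled
    return adversarial + lam * l1_distance(generated, target)

# Hypothetical pixel values in [0, 1].
target_color = [0.2, 0.5, 0.9]
synthetic_color = [0.1, 0.5, 1.0]
print(pix2pix_generator_loss(0.3, synthetic_color, target_color))
```

The adversarial term alone would accept any plausible coloring. The L1 term ties the output to this particular target image.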
Pixel-level domain transfer: Take as an example a photo of a man wearing a sweater and a photo of the sweater alone.
Generator: IN: image of the fashion model. OUT: image of the sweater.
Real/Fake Discriminator: IN: sweater image. OUT: real/fake.
Domain Discriminator: IN: two images, the sweater and the fashion model. OUT: match/no match.
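A rough sketch of how the two discriminators could be fed and how the generator could combine their feedback. Everything here is an assumption for illustration: the function names are hypothetical, images are represented by placeholder strings, and the two adversarial terms are simply summed with equal weight.

```python
import math

def real_fake_examples(real_sweater, generated_sweater):
    """Training examples for the real/fake discriminator: (image, label)."""
    return [(real_sweater, 1), (generated_sweater, 0)]

def domain_examples(model_photo, real_sweater, generated_sweater):
    """Training examples for the domain discriminator: ((model, sweater), label)."""
    return [
        ((model_photo, real_sweater), 1),       # sweater really worn in the photo
        ((model_photo, generated_sweater), 0),  # synthetic sweater
    ]

def neg_log(p, eps=1e-7):
    """-log(p) with clamping to avoid log(0)."""
    return -math.log(min(max(p, eps), 1.0 - eps))

def domain_transfer_generator_loss(p_real, p_match):
    """The generator must fool both discriminators at once.

    p_real:  real/fake discriminator's probability that the sweater is real.
    p_match: domain discriminator's probability that model and sweater match.
    """
    return neg_log(p_real) + neg_log(p_match)

print(domain_examples("model.jpg", "sweater.jpg", "generated.jpg"))
print(domain_transfer_generator_loss(0.6, 0.4))
```

A realistic-looking sweater that does not match the photo still pays through the p_match term.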
Network Architectures
As with all CNNs, the network's size, depth and structure matter for the quality of the output. pix2pix uses U-Net based generators (encoder-decoder with skip connections). U-Net was originally designed for image segmentation, and it is a good fit here.
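The skip-connection idea can be sketched on a tiny 1-D signal. This is only an illustration: downsample, upsample and decode_with_skip are hypothetical stand-ins for learned convolution layers, not code from pix2pix.

```python
def downsample(features):
    """Halve the resolution by averaging neighbouring pairs (stand-in for a strided conv)."""
    return [(features[i] + features[i + 1]) / 2 for i in range(0, len(features) - 1, 2)]

def upsample(features):
    """Double the resolution by repeating each value (stand-in for a transposed conv)."""
    return [value for value in features for _ in range(2)]

def decode_with_skip(bottleneck, skip):
    """Upsample, then pair each position with the encoder feature of the same resolution."""
    upsampled = upsample(bottleneck)
    return list(zip(upsampled, skip))  # two "channels" per position

signal = [1.0, 3.0, 5.0, 7.0]
encoded = downsample(signal)                 # coarse summary, fine detail is lost
decoded = decode_with_skip(encoded, signal)  # skip connection brings the detail back
print(encoded)
print(decoded)
```

Without the skip, the decoder only sees the coarse values. With it, each output position also sees the original fine-grained input, which helps when input and output share structure such as edges.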
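Discriminators are patch-based (PatchGAN): they classify local image patches as real or fake rather than judging the whole image at once.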
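A simplified sketch of a patch-based decision: score each patch on its own, then average. It assumes non-overlapping tiles for brevity, whereas a convolutional discriminator slides over overlapping regions. mean_brightness is a hypothetical placeholder for the learned per-patch classifier.

```python
def split_into_patches(image, patch):
    """Cut a 2-D list of pixels into non-overlapping patch x patch tiles."""
    height, width = len(image), len(image[0])
    tiles = []
    for top in range(0, height - patch + 1, patch):
        for left in range(0, width - patch + 1, patch):
            tiles.append([row[left:left + patch] for row in image[top:top + patch]])
    return tiles

def patch_discriminator(image, patch, score_patch):
    """Score every tile independently, then average the scores."""
    scores = [score_patch(tile) for tile in split_into_patches(image, patch)]
    return sum(scores) / len(scores)

def mean_brightness(tile):
    """Hypothetical stand-in for a learned per-patch real/fake classifier."""
    pixels = [p for row in tile for p in row]
    return sum(pixels) / len(pixels)

image = [
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
]
print(patch_discriminator(image, 2, mean_brightness))
```

Because each score depends only on a small region, the discriminator focuses on local texture and detail.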
Original articles and code links
Generative adversarial networks have been vigorously explored in the last two years, and many conditional variants have been proposed. Please see the discussion of related work in our paper. Below we point out two papers that especially influenced this work: the original GAN paper from Goodfellow et al., and the DCGAN framework, from which our code is derived.
2014: Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative Adversarial Networks. NIPS, 2014. [PDF]
Code (Theano)
2015: DCGAN (Alec Radford)
Paper: Alec Radford, Luke Metz, Soumith Chintala. Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks. ICLR, 2016. [PDF]
Code:
Theano
Torch (Soumith)
Keras (170 lines)
From the Keras version, in each batch:
1. generator: Input: noise. Output: image. Predicts a batch of images (in the first epoch they are totally random).
2. discriminator: Input: image. Output: boolean (real/fake). Trained on X = batch_size real images (MNIST) + batch_size images generated in the previous step. Y is [1...1, 0...0].
3. discriminator_on_generator: a sequential model of the generator followed by the discriminator (trainable=False). Input: noise. Output: true/false. X = new random noise, Y = [1...1]. Training pushes the output toward 1. Because the discriminator is frozen, it cannot simply change to always output 1, so the generator must improve.
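The batch construction in steps 2 and 3 can be sketched without any framework. This is an illustration, not the Keras code: the function names are hypothetical, images are placeholder strings, and drawing the noise uniformly from [-1, 1] is an assumption.

```python
import random

def make_discriminator_batch(real_images, generated_images):
    """Step 2: X = real images followed by generated ones, Y = [1..1, 0..0]."""
    x = list(real_images) + list(generated_images)
    y = [1] * len(real_images) + [0] * len(generated_images)
    return x, y

def make_generator_batch(batch_size, noise_dim, rng):
    """Step 3: fresh noise, every label set to 1.

    The batch goes to the stacked generator -> frozen discriminator model,
    so only the generator's weights can move the output toward 1.
    """
    x = [[rng.uniform(-1.0, 1.0) for _ in range(noise_dim)] for _ in range(batch_size)]
    y = [1] * batch_size
    return x, y

rng = random.Random(0)
d_x, d_y = make_discriminator_batch(["real_0", "real_1"], ["fake_0", "fake_1"])
g_x, g_y = make_generator_batch(batch_size=2, noise_dim=3, rng=rng)
print(d_y)
print(g_y)
```

Note the asymmetry: the discriminator sees honest labels, while the generator step labels its own fakes as real on purpose.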
2016: pix2pix (applied, based on DCGAN)
Article: Image-to-Image Translation with Conditional Adversarial Networks (covers many different image types: night/day, color/greyscale, aerial photo/road map).
Code: Torch (original), TensorFlow
2016: Improved Techniques for Training GANs (Goodfellow).
Code: TensorFlow (original)
2016: model-based domain transfer (applied, based on DCGAN)
Code: Torch (original)