Discussion

VDSR Successful Use Cases:

Context: For larger scale factors, the information contained in small patches is not sufficient to recover detail, which makes the problem ill-posed. VDSR overcomes this because it draws on context from a wider area of the image, although with less precise perception.[19]
Convergence: Since low-resolution (LR) and high-resolution (HR) images share much of their content, it is advantageous to use residual-based training as in VDSR. Residual learning and gradient clipping allow an initial learning rate 10^4 times higher than that of SRCNN, which speeds up convergence.[20]
Scale: Depending on the need, one might want to enlarge a specific region or resize to arbitrary dimensions, which makes training and storing a separate model for every such scenario impractical. A single VDSR model is effective for multi-scale-factor super-resolution.[19]
VDSR reconstructs edges and textures in space objects better than SRCNN and FSRCNN.[22]
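The context and convergence points above can be made concrete with a little arithmetic. The following is a minimal sketch in plain Python, not the authors' implementation. It assumes stacked stride-1 3x3 convolutions (so a 20-layer network sees a 41x41 window), clipping of each gradient to the range [-theta/lr, theta/lr] as the learning rate lr changes, and an output formed by adding a predicted residual to the interpolated input. The function names and the value of theta are hypothetical.

```python
def receptive_field(num_layers, kernel_size=3):
    """Side length, in pixels, of the input window seen by one output pixel
    after `num_layers` stacked stride-1 convolutions."""
    return num_layers * (kernel_size - 1) + 1


def clip_gradients(grads, theta, learning_rate):
    """Adjustable clipping: limit each gradient to [-theta/lr, theta/lr].

    The bound widens as the learning rate decays, so the effective step
    size (lr * gradient) stays bounded by theta throughout training.
    """
    bound = theta / learning_rate
    return [max(-bound, min(bound, g)) for g in grads]


def reconstruct(interpolated, residual):
    """Residual learning: the network predicts only the missing detail,
    which is added back to the interpolated LR input."""
    return [x + r for x, r in zip(interpolated, residual)]


print(receptive_field(20))                                  # 20 layers of 3x3
print(clip_gradients([5.0, -5.0, 0.1], theta=0.1, learning_rate=0.1))
print(reconstruct([0.5, 0.25], [0.25, -0.25]))
```

Because the LR and HR images are so similar, most residual values are close to zero, which is one reason the residual is easier to learn than the full HR image.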

VDSR Possible Failures:

VDSR is trained with a pixel-wise L2 loss and does not provide an upscaling factor greater than 8.
When VDSR is applied three times to an LR face image with a scale factor of 2 each time (8x overall), it fails to generate authentic facial details and the results appear blurry.
Noise robustness: In applications such as the reconstruction of space objects, images can contain Gaussian, salt-and-pepper, or Poisson noise, so the model must be robust to it. In “A Comparable Study of CNN-Based Single Image Super-Resolution for Space-Based Imaging Sensors” by Haopeng Zhang et al.[22], the results[23] and figure[24] show that VDSR does not produce good results on noisy inputs. This is expected, because deep-learning-based reconstruction methods are trained on noise-free data, so the inclusion of noise has a significant impact.
Deeper networks with 30 convolutional layers do not perform better than VDSR with 20 layers.[25] The extra depth only reduces the initial learning rate, not the PSNR/SSIM values.[26] This paper by Vadim Romanuke also states that 20 layers is not optimal: truncating the number of layers while also adjusting the learning rate drop factor can lead to better results, with reduced memory requirements and faster speed for medium-sized images.
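To make the noise-robustness point easier to probe, the sketch below shows one way to corrupt a test image and measure the damage with PSNR. It is an illustration only and is not taken from the cited study: the image is a hypothetical flattened list of intensities in [0, 1], the noise levels are arbitrary, and Poisson noise is left out to keep the example within the standard library.

```python
import math
import random


def add_gaussian_noise(pixels, sigma, rng):
    """Add zero-mean Gaussian noise and clamp to the [0, 1] intensity range."""
    return [min(1.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in pixels]


def add_salt_and_pepper(pixels, amount, rng):
    """Replace each pixel with 0 or 1 with probability `amount`."""
    noisy = []
    for p in pixels:
        if rng.random() < amount:
            noisy.append(rng.choice((0.0, 1.0)))
        else:
            noisy.append(p)
    return noisy


def psnr(reference, test, peak=1.0):
    """Peak signal-to-noise ratio in dB; infinite for identical inputs."""
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / mse)


rng = random.Random(0)                 # fixed seed so the run is repeatable
clean = [i / 255 for i in range(256)]  # hypothetical flattened test image
for name, noisy in (
    ("gaussian", add_gaussian_noise(clean, 0.05, rng)),
    ("salt-and-pepper", add_salt_and_pepper(clean, 0.05, rng)),
):
    print(f"{name}: {psnr(clean, noisy):.2f} dB")
```

In a real comparison, the noisy image would be downsampled, passed through the super-resolution model, and the PSNR computed between the model output and the clean HR reference.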

Possible implementation changes

Architecture optimality is one possible area of optimization in a deep neural network. A VDSR-style architecture requires a bicubic-interpolated image as input, which leads to higher computation time and memory use than architectures with a scale-specific upsampling method. Instead of using a bicubic-interpolated image as input, it is also possible to train upsampling modules at the very end of the network.
VDSR can handle multiple scales of SISR, but scale-specific weaknesses may need to be considered if a scale-specific upsampling method is used instead of bicubic interpolation.
ResNet-style architectures may be used, but since they were designed for image classification tasks, their use here may be suboptimal.
A single multi-scale model may be better than multiple single-scale models.
Although deep networks generally give good results, they do not work optimally when the images are degraded in some way and little data is available in the field, for example in medical imaging. In such cases, information about the sensors used, the imaged object or scene, and the acquisition conditions can be used to design useful priors for obtaining better resolution.[27]
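The cost of feeding a bicubic-interpolated image through the whole network can be estimated by counting multiply-accumulate operations. The sketch below is a rough illustration under stated assumptions: 18 hidden layers of 64 3x3 filters with stride 1 and same padding, a hypothetical 64x64 LR input, and a scale factor of 4. It ignores the first and last layers and the cost of the upsampling module itself.

```python
def conv_macs(height, width, in_channels, out_channels, kernel_size=3):
    """Multiply-accumulate count of one stride-1, same-padded conv layer."""
    return height * width * in_channels * out_channels * kernel_size ** 2


def body_macs(height, width, num_layers=18, channels=64):
    """Cost of the hidden layers only; first and last layers are ignored."""
    return num_layers * conv_macs(height, width, channels, channels)


lr_height, lr_width, scale = 64, 64, 4  # hypothetical LR size and scale factor

# Pre-upsampling (VDSR style): every layer runs on the bicubic-enlarged grid.
pre = body_macs(lr_height * scale, lr_width * scale)
# Post-upsampling: layers run on the LR grid, upsampling happens at the end.
post = body_macs(lr_height, lr_width)

print(f"pre-upsampling:  {pre:,} MACs")
print(f"post-upsampling: {post:,} MACs")
print(f"ratio: {pre // post}")  # equals scale ** 2
```

Under these assumptions the convolutional cost grows with the square of the scale factor, and activation memory grows in the same way, which is the motivation for moving the upsampling step to the end of the network.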
