Image and Video Processing:

Scene Statistics and Visual Neuroscience at work!

Controversies: don't use (1) the Mean Squared Error or (2) the image of Lena Söderberg. Here we only address the MSE issue.

MPEG-like video compression is based on predictive coding, i.e. motion compensation and residual quantization. Motion compensation consists of optical flow estimation (M module) and motion-based prediction (P module). Residuals (prediction errors) are transformed to a different domain (T module) and quantized there (Q module). The modules in this framework will continue to benefit from image representations coming from Vision Science and from prediction and regression techniques based on Statistical Learning.
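As a rough illustration of how the four modules fit together, the sketch below codes one frame from the previous one with NumPy. It is a toy under stated assumptions, not the scheme of the cited papers or of any MPEG standard: exhaustive block matching stands in for optical flow estimation, an 8x8 DCT plays the role of the T module, and a single uniform step stands in for the quantizer. All function names and parameters (estimate_motion, predict, code_residual, decode_frame, block, search, step) are hypothetical, and frame sizes are assumed to be multiples of the block size.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis as an n x n matrix (rows are basis functions)."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] /= np.sqrt(2.0)
    return c

def estimate_motion(prev, curr, block=8, search=2):
    """M module (toy): exhaustive block matching, one integer vector per block."""
    h, w = curr.shape
    flow = np.zeros((h // block, w // block, 2), dtype=int)
    for by in range(h // block):
        for bx in range(w // block):
            y, x = by * block, bx * block
            target = curr[y:y + block, x:x + block]
            best = np.inf
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    yy, xx = y + dy, x + dx
                    if yy < 0 or xx < 0 or yy + block > h or xx + block > w:
                        continue  # candidate block falls outside the frame
                    candidate = prev[yy:yy + block, xx:xx + block]
                    err = np.sum((target - candidate) ** 2)
                    if err < best:
                        best = err
                        flow[by, bx] = (dy, dx)
    return flow

def predict(prev, flow, block=8):
    """P module: build the predicted frame by copying displaced blocks."""
    pred = np.zeros_like(prev)
    for by in range(flow.shape[0]):
        for bx in range(flow.shape[1]):
            y, x = by * block, bx * block
            dy, dx = flow[by, bx]
            pred[y:y + block, x:x + block] = prev[y + dy:y + dy + block,
                                                  x + dx:x + dx + block]
    return pred

def code_residual(residual, step=8.0, block=8):
    """T + Q modules: blockwise DCT, uniform quantization, and reconstruction."""
    c = dct_matrix(block)
    decoded = np.zeros_like(residual)
    for y in range(0, residual.shape[0], block):
        for x in range(0, residual.shape[1], block):
            coeffs = c @ residual[y:y + block, x:x + block] @ c.T  # T module
            quantized = np.round(coeffs / step)                    # Q module
            decoded[y:y + block, x:x + block] = c.T @ (quantized * step) @ c
    return decoded

def decode_frame(prev, curr, step=8.0):
    """Reconstruct curr from prev: prediction plus the quantized residual."""
    flow = estimate_motion(prev, curr)
    pred = predict(prev, flow)
    return pred + code_residual(curr - pred, step)
```

In this toy the only quality knob is step: a larger value means coarser residuals and more distortion. A perceptual quantizer would instead adapt the step to each coefficient and its context.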

Efficient coding of visual information and efficient inference of missing information in images depend on two factors: (1) the statistical structure of photographic images, and (2) the nature of the observer that will analyze the result. Interestingly, these two factors (image regularities and human vision) are deeply related since the evolution of biological sensors seems to be guided by statistical learning (see our work on the Efficient Coding Hypothesis in Visual Neuroscience). However, the simultaneous consideration of these two factors is unusual in the image processing community, particularly beyond Gaussian image models and linear models of the observer.
Our work in image and video processing has run in parallel with our investigation of the non-Gaussian nature of visual scenes and the nonlinear behavior of visual cortex. This parallel approach is sensible since these are two sides of the same issue in vision (the Efficient Coding Hypothesis again!). Specifically, the core algorithm used in many applications has been Divisive Normalization, a canonical computation in sensory neurons with interesting statistical effects (see [Neur.Comp.10]). We have used this perceptual (and also statistical) model to propose novel solutions in bit allocation, to identify perceptually relevant motion, to smooth image representations and to compute distances between images.
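For readers unfamiliar with Divisive Normalization, a minimal sketch of its generic textbook form may help: each linear coefficient is rectified and divided by a weighted pool of the energies of its neighbours. The function names, the Gaussian interaction kernel and the parameter values (beta, gamma, width) below are illustrative assumptions, not the fitted model of [Neur.Comp.10].

```python
import numpy as np

def gaussian_kernel(n, width=1.5):
    """Hypothetical interaction kernel: each row pools nearby coefficients."""
    idx = np.arange(n)
    h = np.exp(-0.5 * ((idx[:, None] - idx[None, :]) / width) ** 2)
    return h / h.sum(axis=1, keepdims=True)  # rows sum to one

def divisive_normalization(x, kernel, beta=1.0, gamma=2.0):
    """Generic form: y_i = sign(x_i) |x_i|^gamma / (beta + sum_j H_ij |x_j|^gamma)."""
    x = np.asarray(x, dtype=float)
    energy = np.abs(x) ** gamma
    pool = beta + kernel @ energy  # activity of the neighbourhood
    return np.sign(x) * energy / pool

# A coefficient presented alone versus surrounded by strong neighbours.
h = gaussian_kernel(8)
alone = np.zeros(8)
alone[3] = 1.0
masked = np.full(8, 2.0)
masked[3] = 1.0
print(divisive_normalization(alone, h)[3], divisive_normalization(masked, h)[3])
```

The response to the same coefficient drops when its neighbours are active, which is the masking behaviour that makes this representation useful for bit allocation and distance computation.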

Image Coding:

Image statistics, feature extraction and transform coding.
Perceptually optimal bit allocation and quantization.
Texture perception models in image coding: JPEG and beyond.

Motion Estimation:

Perceptually relevant optical flow.
Motion-based video segmentation.

Video Coding:

Motion compensation and transform coding: the MPEG framework.
Perceptual relevance of prediction and quantization.

Image Restoration:

Denoising and deblurring using Perceptual Regularization.
Denoising through regression in the wavelet domain.

Image and Video Distortion Metrics:

Metrics induced by vision models.
Metrics induced by non-Gaussian scene statistics.

Color Constancy and White Balance:

Color constancy through chromatic adaptation models.
Automatic White Balance through statistical adaptation models.
Color constancy (reflectance estimation) using spatio-spectral information.

Classification and Understanding:

Nonlinear feature extraction for image representation.
Image classification from nonlinear features.
Knowledge retrieval (regression, estimation of physical parameters) from nonlinear features.

Decoded sequences for various settings of Motion Estimation (M module) and Residual Quantization (T+Q modules). See, for example, [Electr.Lett.00a] or [IEEE TIP01].

Image and Video Processing

Low-level image processing (coding, restoration, synthesis, white balance, color and texture editing, etc.) is all about image statistics in a domain where the metric is non-Euclidean (i.e. induced by the data or the observer).

We proposed original image processing techniques using both perception models and image statistics, including (i) improvements to the JPEG standard for image coding through nonlinear texture vision models [Electr.Lett.95, Electr.Lett.99, IEEE TNN05, IEEE TIP06a, JMLR08, RPSP12, Patent08]; (ii) improvements to the MPEG standard for video coding with a new perceptual quantization scheme and new motion estimation focused on perceptually relevant optical flow [LNCS97, Electr.Lett.98, Electr.Lett.00a, Electr.Lett.00b, IEEE TIP01, Redund.Reduct.99]; (iii) new image restoration techniques based on nonlinear contrast perception models and the image statistics in local frequency domains [IEEE TIP06b, JMLR10]; (iv) new approaches to color constancy based on relative chromatic descriptors [Vis.Res.97, J.Opt.96], statistically-based chromatic adaptation models [Neur.Comp.12, PLoS-ONE14], or Bayesian estimation of surface reflectance [IEEE-TGRS14]; (v) new subjective image and video distortion measures using nonlinear perception models [Im.Vis.Comp.97, Disp.99, IEEE ICIP02, JOSA10, Proc.SPIE15]; and (vi) image classification and knowledge extraction (or regression) based on our feature extraction techniques [IEEE-TNN11, IEEE-TGRS13, Int.J.Neur.Syst.14, IEEE-JSTSP15]. See CODE for image and video processing applications here.

Image Coding: using nonlinear perceptual image representations is critical to improving JPEG (see the gain in visual quality at 1 bit/pix). Measuring subjective distortion (the numbers at the bottom) is another vision-related problem we addressed (see below).

Video Coding: improving bit allocation according to a nonlinear perception model (right vs left) matters more for MPEG video coding than improving the optical flow computation (bottom vs top).

Image Restoration: regularization functionals based on nonlinear perception models and signal smoothing according to image statistics in the wavelet domain help in image restoration.

Color Constancy and White Balance: these adaptation problems reduce to manifold matching in different illumination conditions. We proposed linear and nonlinear solutions to this geometric problem.
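To make the geometric view concrete, the sketch below shows one simple linear way of matching two color distributions: an affine map that moves the mean and covariance of the colors seen under one illuminant onto those seen under a reference illuminant. This moment-matching transform is a stand-in chosen for illustration, not the method of the cited papers; the function names and the N x 3 pixel layout are assumptions, and no pixel correspondence between the two sets is required.

```python
import numpy as np

def _sym_power(cov, power):
    """Power of a symmetric positive-definite matrix via eigendecomposition."""
    w, v = np.linalg.eigh(cov)
    w = np.maximum(w, 1e-12)  # guard against numerically tiny eigenvalues
    return (v * w ** power) @ v.T

def match_color_statistics(source, target):
    """Affine map sending the mean and covariance of source onto those of target.

    source, target: arrays of shape (N, 3) and (M, 3) holding pixel colors.
    """
    source = np.asarray(source, dtype=float)
    target = np.asarray(target, dtype=float)
    mu_s, mu_t = source.mean(axis=0), target.mean(axis=0)
    cov_s = np.cov(source, rowvar=False)
    cov_t = np.cov(target, rowvar=False)
    transform = _sym_power(cov_t, 0.5) @ _sym_power(cov_s, -0.5)
    return (source - mu_s) @ transform.T + mu_t

# Toy data: the same surfaces seen under a reddish illuminant (diagonal scaling).
rng = np.random.default_rng(1)
canonical = rng.uniform(0.0, 1.0, (500, 3))
observed = canonical * np.array([1.4, 1.0, 0.6])
balanced = match_color_statistics(observed, canonical)
```

The result matches the first- and second-order statistics of the reference set, but it does not in general recover each original color exactly; nonlinear solutions address the curvature that such a linear map ignores.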

Subjective image/video distortion metrics: Observers' opinion (ground truth on the vertical axis) correlates better with our Euclidean distance in nonlinear perceptual domains (right) than with the widely used Structural Similarity Index (left).
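The idea of a Euclidean distance in a nonlinear perceptual domain, and why it can disagree with the MSE, can be sketched on a toy 1-D vector of transform coefficients. The response model below is a generic divisive normalization with made-up parameters (beta, gamma, width) and hypothetical function names; it is not the calibrated metric of the cited papers.

```python
import numpy as np

def perceptual_response(x, beta=0.01, gamma=2.0, width=2.0):
    """Toy nonlinear response: energy divided by a Gaussian-weighted local pool."""
    x = np.asarray(x, dtype=float)
    idx = np.arange(x.size)
    kernel = np.exp(-0.5 * ((idx[:, None] - idx[None, :]) / width) ** 2)
    kernel /= kernel.sum(axis=1, keepdims=True)
    energy = np.abs(x) ** gamma
    return np.sign(x) * energy / (beta + kernel @ energy)

def mse(a, b):
    """Mean squared error in the original (linear) domain."""
    return float(np.mean((np.asarray(a, float) - np.asarray(b, float)) ** 2))

def perceptual_distance(a, b):
    """Euclidean distance between the nonlinear responses to a and b."""
    return float(np.linalg.norm(perceptual_response(a) - perceptual_response(b)))

# Reference: a flat (zero-contrast) half followed by a high-contrast half.
reference = np.concatenate([np.zeros(16), np.ones(16)])
on_flat = reference.copy()
on_flat[4] += 0.2      # same perturbation placed on the flat region
on_texture = reference.copy()
on_texture[24] += 0.2  # and on the high-contrast region

print("MSE:       ", mse(reference, on_flat), mse(reference, on_texture))
print("Perceptual:", perceptual_distance(reference, on_flat),
      perceptual_distance(reference, on_texture))
```

Both distortions have the same MSE, yet the perceptual distance is larger for the one on the flat region, where nothing masks it. This is the kind of behaviour the MSE cannot capture.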

Image classification: Classifiers based on flexible features adapted to the data (such as RBIG, SPCA, PPA, DRR) are robust to changes in acquisition conditions (adaptivity implies no retraining is needed).