This plot shows how the interpolation process responds to spatial frequencies in the original data. The horizontal axes show cycles per sampling interval, and the vertical axis gives the relative strength with which those frequency components are passed on to the interpolated signal. The ideal frequency response is a box of height 1.0 that spans the interval between -1/2 and +1/2 cycles per sample. Note that the frequency response plots have a logarithmic scale on the vertical axis.
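As a rough illustration, a frequency response can be estimated numerically by taking the Fourier transform of the interpolation weights. The sketch below assumes a simple linear-interpolation (triangle) kernel, which is not necessarily one of the kernels plotted here; the names `frequency_response` and `triangle` are hypothetical.

```python
import numpy as np

def frequency_response(kernel, half_width, freqs, n=4001):
    """Numerically evaluate the Fourier transform of a symmetric kernel.

    kernel     -- function of x (in sampling intervals) returning a weight
    half_width -- kernel is assumed to be zero outside [-half_width, half_width]
    freqs      -- frequencies in cycles per sampling interval
    """
    x = np.linspace(-half_width, half_width, n)
    dx = x[1] - x[0]
    w = kernel(x)
    # The kernel is symmetric, so only the cosine term of the transform survives.
    return np.array([np.sum(w * np.cos(2 * np.pi * f * x)) * dx for f in freqs])

def triangle(x):
    # Linear-interpolation weights (assumed kernel): 1 at the centre, 0 one sample away.
    return np.clip(1.0 - np.abs(x), 0.0, None)

freqs = np.array([0.0, 0.25, 0.5, 1.0])
response = frequency_response(triangle, 1.0, freqs)

# Ideal response: a box of height 1.0 between -1/2 and +1/2 cycles per sample.
ideal = (np.abs(freqs) <= 0.5).astype(float)
```

For this assumed kernel the response falls below the ideal box well before 1/2 cycle per sample, so part of the representable signal is attenuated.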
Errors in determining position on the Earth.
Square root of the sum of the squares.
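A minimal sketch of the calculation, using made-up component values purely for illustration (the function name `root_sum_square` is hypothetical):

```python
import math

def root_sum_square(values):
    """Return the square root of the sum of the squares of the values."""
    return math.sqrt(sum(v * v for v in values))

# Hypothetical example: two components of 3.0 and 4.0 combine to 5.0.
combined = root_sum_square([3.0, 4.0])
```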
The gradient is the rate of change (i.e., the magnitude of the first derivative) at a point. Here it specifically refers to the rate at which the brightness temperatures (Tb) change with respect to distance. For the swath data, the gradient at a specific sample is approximated by taking the geometric mean of the Tb difference divided by distance for the adjacent samples along scan and across scan. For the gridded data the calculation is the same, except that the adjacent column and row are used.
Not signal. Aliasing is a specific kind of noise where the energy from frequencies greater than the Nyquist critical frequency is spuriously reflected back into the sampled signal.
It takes two pixels to describe a cycle, so the critical frequency, Fc = 1/2 cycle per pixel. A digital signal cannot represent frequencies greater than Fc.
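The following sketch illustrates the point with arbitrarily chosen values: a cosine at 0.7 cycles per pixel, which is above Fc, produces exactly the same samples as a cosine at 0.3 cycles per pixel. Once sampled, the two cannot be told apart, which is how aliasing arises.

```python
import numpy as np

n = np.arange(64)          # pixel indices
f_true = 0.7               # cycles per pixel, above Fc = 0.5 (arbitrary example)
f_alias = 1.0 - f_true     # frequency the energy is reflected back to

high = np.cos(2 * np.pi * f_true * n)
low = np.cos(2 * np.pi * f_alias * n)

# The sampled values agree to within floating-point rounding.
indistinguishable = np.allclose(high, low)
```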
Smoothing suppresses high frequencies in the signal. If these high frequencies are noise, then this is a good thing. If they are signal, then smoothing is (probably) a bad thing. Averaging is a good example of smoothing. If the original data contain random errors (or, in a digital signal, aliasing), then averaging will reduce the errors. If, on the other hand, the fluctuations in the sampled data are due to actual fluctuations in the measured signal, averaging will cause you to lose this information.
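Both effects can be seen in a small sketch that uses a 9-point moving average. The signal, noise level, and window width are arbitrary assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = np.arange(400)

width = 9
box = np.ones(width) / width          # equal weights: a plain average
interior = slice(width, -width)       # ignore edge effects of the convolution

# Case 1: slowly varying signal (0.01 cycles per sample) plus random noise.
signal = np.sin(2 * np.pi * 0.01 * n)
noise = rng.normal(0.0, 0.5, size=n.size)
smoothed = np.convolve(signal + noise, box, mode="same")
err_before = np.std(noise[interior])
err_after = np.std((smoothed - signal)[interior])

# Case 2: the fluctuations ARE the signal (0.5 cycles per sample, alternating +1/-1).
fast = np.cos(np.pi * n)
fast_smoothed = np.convolve(fast, box, mode="same")
fast_amplitude = np.max(np.abs(fast_smoothed[interior]))
```

In the first case the error after averaging is smaller than before. In the second, the averaged signal retains only about one ninth of its original amplitude, so the information is largely lost.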
We usually think of frequency as the rate of oscillation per time. Spatial frequency is the rate of oscillation per distance. Here's an example of an image that varies at the rate of 1 cycle per 100 samples (or 0.01 cycles per sample) in x and 1 cycle per 50 samples (or 0.02 cycles per sample) in y:
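An image with these spatial frequencies can be generated as follows. This is only one possible form (a single cosine of the combined phase), and the image size is an arbitrary assumption.

```python
import numpy as np

nx, ny = 200, 100            # image size in samples (assumed)
fx, fy = 0.01, 0.02          # cycles per sample in x and y

x = np.arange(nx)
y = np.arange(ny)
xx, yy = np.meshgrid(x, y)   # both arrays have shape (ny, nx)

# Brightness repeats every 100 samples along x and every 50 samples along y.
image = np.cos(2 * np.pi * (fx * xx + fy * yy))
```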

The plot of the transfer function shows the distribution of weights used to interpolate the original data. The point to be interpolated is at the center of the horizontal axes. The vertical scale gives the relative weight to be given to each original data point according to its (horizontal) spatial relationship to the interpolation point.
Gonzalez and Wintz, Digital Image Processing.
Teukolsky et al., Numerical Recipes.