Spline Fit Inspector


This inspector controls the spline fit function in Plot.

The spline fit is based on spline2, written by Barend J. Thijsse. He kindly allowed his excellent program to be integrated into Plot, and he also wrote the documentation below.

For complete reference information, see the Sources section.

The spline fit function marvelously enhances Plot and I have to thank Barend for this contribution.

Introduction

Purpose: The purpose of spline fit is to separate the data into signal (underlying trend) and noise, by letting the fitted spline represent the underlying trend and the residuals of the fit represent the noise. The noise in the data can have any origin: measurement errors, statistical sampling, or even tiny roundoff errors.

Freestyle: The main difference with curve fitting is that with spline fitting you don’t have to select a particular mathematical functional form to fit. Spline fit automatically determines a function that is flexible enough to represent the underlying trend, yet smooth enough not to fit the noise.

Method: Knowing nothing about the mathematical functional form of the underlying trend, spline fit has to base its judgement completely on the noise. It does so by constructing a great number of trial functions, fitting them in a least-squares sense to the data, and applying specialized statistical tests to the residuals of each fit. From all trial functions that pass these tests, spline fit finally selects the simplest one, i.e. the spline with the fewest knots.
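The selection principle described above can be illustrated with a small Python sketch. It is not the spline2 algorithm: piecewise-constant fits over equal-count intervals stand in for the trial splines, and a simple window on the classical Durbin-Watson statistic (1.9 to 2.2) stands in for the specialized statistical tests. All function names are hypothetical.

```python
def durbin_watson(r):
    """Classical Durbin-Watson statistic of a residual sequence."""
    num = sum((r[i + 1] - r[i]) ** 2 for i in range(len(r) - 1))
    den = sum(v * v for v in r)
    return num / den

def piecewise_constant_residuals(ys, n_intervals):
    """Least-squares fit of one constant per interval (the interval mean).

    Intervals hold equal numbers of points; assumes n_intervals <= len(ys).
    """
    n = len(ys)
    groups = [[] for _ in range(n_intervals)]
    for k, y in enumerate(ys):
        groups[k * n_intervals // n].append(y)
    means = [sum(g) / len(g) for g in groups]
    return [y - means[k * n_intervals // n] for k, y in enumerate(ys)]

def simplest_passing_fit(ys, max_intervals, dw_low=1.9, dw_high=2.2):
    """Return the smallest number of intervals whose residuals pass the test."""
    for n_intervals in range(1, max_intervals + 1):
        r = piecewise_constant_residuals(ys, n_intervals)
        if dw_low <= durbin_watson(r) <= dw_high:
            return n_intervals
    return None  # no trial function passed

# Toy data: a step from 0 to 1 plus small deterministic "noise".
pattern = [1, -1, -1, 1]
ys = [(0.0 if k < 10 else 1.0) + 0.01 * pattern[k % 4] for k in range(20)]
selected = simplest_passing_fit(ys, max_intervals=5)
print(selected)
```

With a single interval the residuals still contain the step, so the statistic is far below the window; two intervals are the simplest choice that leaves only noise in the residuals.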

User choices: Spline fit is good but not perfect. Since spline fit has so little to go on, there is no guarantee that the result is always correct. Although experience has shown that the default settings lead to good results in the majority of cases, the user will sometimes want to try other settings.

Options

Autocorrelation in data: Data sometimes contain autocorrelation, meaning that the errors in neighboring data points are correlated. This happens, for example, if some sort of averaging or filtering process has operated after the noise originated. Autocorrelation in data is very often invisible to the eye, and it may seriously mislead spline fit if not properly handled. There are three ways to deal with possible autocorrelation:

Ignore: Often the best way to start. Spline fit assumes that the error in each Y value is independent of the error in the next Y value. If the fitted spline looks too wiggly, choose another option.
Detect: Spline fit compares the residual of each Y value with the residuals in a range of subsequent data points and calculates from that the mean autocorrelation function. The mean autocorrelation function is statistically tested against the assumed theoretical autocorrelation function selected in the next section. The best match yields a value for the autocorrelation length, and the corresponding spline fit is shown.
Manual: Same as Detect, but the autocorrelation length is set by the user as Factor, expressing the ratio between the autocorrelation length L and the average data spacing ΔX_av. The best spline fit under the condition of fixed autocorrelation length is shown. Use this option if the autocorrelation length is known, or if you want to experiment by trial and error.

Factor: Used for the Manual option.
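As a worked illustration of the Factor setting, the following Python sketch converts a factor into an autocorrelation length measured on the X axis, using L = Factor × ΔX_av. The function names are hypothetical, and the average spacing is assumed to be the X range divided by the number of gaps between sorted data points.

```python
def average_spacing(xs):
    """Mean distance between consecutive X values (xs sorted ascending)."""
    return (xs[-1] - xs[0]) / (len(xs) - 1)

def autocorrelation_length(xs, factor):
    """Autocorrelation length L implied by a given Factor."""
    return factor * average_spacing(xs)

xs = [0.0, 0.5, 1.0, 1.5, 2.0]                 # average spacing 0.5
length = autocorrelation_length(xs, factor=3.0)
print(length)                                   # 1.5
```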

Assumed Correlation Function: The correlation function expresses the expectation value of (r_{i+n} r_i)/(r_i)^2, with n ≥ 0, where r_i are the weighted residuals for the fitted spline S(X): r_i = (Y_i – S(X_i))/u_i. Here u_i is the error (uncertainty) in the value Y_i, or an estimate of this error. What is known about these errors can be indicated in the next section. The correlation function is a function of n, the difference in the indices of the datapoints, or in more general terms, a function of ΔX, the distance between two datapoints along the X axis. The autocorrelation length L is a parameter in this function.
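The definition above can be made concrete with a short Python sketch that computes weighted residuals and an empirical autocorrelation value at index lag n. The function names are hypothetical, and the expectation values are estimated here by plain averages over the available data points, which is an assumption and not necessarily how spline fit estimates them.

```python
def weighted_residuals(ys, fitted, errors):
    """r_i = (Y_i - S(X_i)) / u_i for each data point."""
    return [(y - s) / u for y, s, u in zip(ys, fitted, errors)]

def empirical_autocorrelation(r, n):
    """Average of r[i+n] * r[i], divided by the average of r[i]**2."""
    if n < 0 or n >= len(r):
        raise ValueError("lag n must satisfy 0 <= n < len(r)")
    num = sum(r[i + n] * r[i] for i in range(len(r) - n)) / (len(r) - n)
    den = sum(v * v for v in r) / len(r)
    return num / den

r = weighted_residuals([2.0, 0.0, 2.0, 0.0], [1.0, 1.0, 1.0, 1.0], [1.0] * 4)
lag0 = empirical_autocorrelation(r, 0)   # always 1 by construction
lag1 = empirical_autocorrelation(r, 1)   # alternating residuals give -1
print(lag0, lag1)
```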

There are four choices available for the correlation function:

Exponential: exp(–|ΔX|/L). In many cases this is a practical choice.
Gaussian: exp(–|ΔX|^2/(2L^2)). This type of correlation results from Gaussian smoothing.
Linear: 1 – |ΔX|/(2L) for |ΔX| < 2L and 0 otherwise. This type of correlation results from taking running averages.
Sinc: sin(2|ΔX|/L)/(2|ΔX|/L). The only function with a negative part.
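For reference, the four functions listed above can be written out in Python as follows. This is a sketch with hypothetical function names; the Gaussian and Linear forms assume the denominators are 2L^2 and 2L respectively, and the Sinc function is given its limiting value 1 at ΔX = 0.

```python
import math

def exponential(dx, L):
    return math.exp(-abs(dx) / L)

def gaussian(dx, L):
    return math.exp(-(dx * dx) / (2.0 * L * L))

def linear(dx, L):
    a = abs(dx)
    return 1.0 - a / (2.0 * L) if a < 2.0 * L else 0.0

def sinc(dx, L):
    t = 2.0 * abs(dx) / L
    return 1.0 if t == 0.0 else math.sin(t) / t

# All four equal 1 at zero distance; only sinc can become negative.
for f in (exponential, gaussian, linear, sinc):
    print(f.__name__, f(0.0, 1.0), f(2.0, 1.0))
```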

Info On Errors In Y: One of the best properties of spline fit is that it almost always produces excellent results even if the user has no information on the errors u_i. This is a small miracle, given that spline fit has only the residuals (noise) available to base its decisions on. It is possible because spline fit uses special statistical tests, which are insensitive to user misjudgements of the errors in the data and even immune to a misjudgement in the form of a common scaling factor. For example, if all errors are estimated to be one quarter of what they are in reality, spline fit does not care.

None: The user has no particular information on the data errors. This is a very common situation. Spline fit assumes that all errors are equal, by setting u_i equal to 1 for all i.
From Error Data: User-estimates of the data errors u_i are supplied in the third column of the datafile. The user-estimates may be wrong by any common factor. Only the relative errors, from one datapoint to the next, count.
Fixed Value: The user sets the error in the Y data to a fixed value. With this choice, spline fit no longer uses the special statistical tests for the goodness-of-fit, but the common χ^2 test.
Signif. Digits: The user specifies the number of significant digits in the Y data as a fixed value. With this choice, spline fit no longer uses the special statistical tests for the goodness-of-fit, but the common χ^2 test.

Value: Used for the Fixed Value and Signif. Digits options.
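The difference between a scale-sensitive and a scale-immune quantity can be seen in a small Python sketch. It is only an illustration with made-up numbers and hypothetical function names: the sum of squared weighted residuals (the quantity a χ^2 test looks at) changes by a factor of 16 when all errors are quartered, while residuals divided by their own root mean square do not change at all. The self-normalized residuals merely stand in for the special tests, whose actual form is not described here.

```python
import math

def weighted_residuals(ys, fitted, errors):
    """r_i = (Y_i - S(X_i)) / u_i for each data point."""
    return [(y - s) / u for y, s, u in zip(ys, fitted, errors)]

def chi_square(r):
    """Sum of squared weighted residuals."""
    return sum(v * v for v in r)

def self_normalized(r):
    """Residuals divided by their own rms; a common factor cancels out."""
    rms = math.sqrt(sum(v * v for v in r) / len(r))
    return [v / rms for v in r]

ys = [1.0, 2.1, 2.9, 4.2, 4.8]        # made-up data
fitted = [1.1, 2.0, 3.0, 4.0, 5.0]    # made-up fitted values
r_true = weighted_residuals(ys, fitted, [0.2] * 5)
r_quarter = weighted_residuals(ys, fitted, [0.05] * 5)  # errors quartered

chi_ratio = chi_square(r_quarter) / chi_square(r_true)
print(chi_ratio)  # close to 16
print(self_normalized(r_true))
print(self_normalized(r_quarter))
```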

Spline Order: The order of the spline function is one more than the degree of the polynomial pieces that make up the spline. A common choice is 4, which generates cubic splines. These have continuous first and second derivatives. The lowest order permitted is 1, which generates a histogram-like approximation.
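The relation between order, degree and smoothness can be summarized in a few lines of Python. This is a sketch with a hypothetical function name; it assumes simple (non-repeated) knots, for which a spline of a given order has continuous derivatives up to order minus 2.

```python
def spline_properties(order):
    """Degree and smoothness of a spline of the given order (simple knots)."""
    if order < 1:
        raise ValueError("the lowest order permitted is 1")
    degree = order - 1
    return {
        "degree": degree,
        # Order 1 is piecewise constant, so the function itself has jumps.
        "function_continuous": degree >= 1,
        "continuous_derivatives": max(degree - 1, 0),
    }

print(spline_properties(4))  # cubic: first and second derivatives continuous
print(spline_properties(1))  # histogram-like: piecewise constant
```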

Exclude Data In Range (X min, X max): With these two values, a data range can be specified that is excluded from the fit. This option can be used, for example, to exclude a peak from the fit, so that a curved baseline can be fitted to the remaining data.
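The effect of the exclusion range can be pictured as a simple filter applied before fitting, sketched here in Python. The function name is hypothetical, and treating both limits as inclusive is an assumption.

```python
def exclude_range(points, x_min, x_max):
    """Drop all (x, y) points with x_min <= x <= x_max."""
    return [(x, y) for x, y in points if not (x_min <= x <= x_max)]

# Made-up data: a flat baseline with a peak around x = 2 to 3.
points = [(0, 1.0), (1, 1.1), (2, 5.0), (3, 4.8), (4, 1.2), (5, 1.3)]
baseline = exclude_range(points, 2, 3)
print(baseline)  # only the baseline points remain
```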

No. of Points: Defines the number of points in the newly generated spline result and derivative buffers. If the From Input button is checked, the number of points is the same as in the input data.

Show Minima/Maxima: Minima and maxima of the spline are shown as separate points.

Show Derivative: The derivative of the spline is added to the plot.

Set Defaults: Restores the default spline fit parameters.

Spline Fit: Executes the spline fit and generates new buffers with the result.

Return Values

The spline fit returns some values to the Data Inspector and Report Panel:

rms: The quantity rms is the root mean square value of the noise amplitude in the data.

dws: The quantity dws is the generalized Durbin-Watson statistic for the fitted spline. A value in the range 1.9-2.2 usually indicates a good fit. Larger values are suspect, since they may indicate that some of the noise is fitted. Smaller values, which are very rare, definitely point to a systematic misfit.
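To give a feeling for these ranges, the following Python sketch computes the classical Durbin-Watson statistic for three artificial residual sequences. The generalized statistic reported by spline fit may be defined differently; the function names and the wording of the verdicts are hypothetical, and only the limits 1.9 and 2.2 are taken from the text above.

```python
def durbin_watson(r):
    """Classical Durbin-Watson statistic: sum of squared successive
    differences divided by the sum of squared residuals."""
    num = sum((r[i + 1] - r[i]) ** 2 for i in range(len(r) - 1))
    den = sum(v * v for v in r)
    return num / den

def interpret(dws, low=1.9, high=2.2):
    if dws < low:
        return "possible systematic misfit"
    if dws > high:
        return "suspect: some noise may be fitted"
    return "consistent with a good fit"

uncorrelated = [1, -1, -1, 1] * 5   # no lag-1 correlation
alternating = [1, -1] * 10          # strongly anti-correlated
drifting = [1] * 10 + [-1] * 10     # long runs of equal sign

for name, r in [("uncorrelated", uncorrelated),
                ("alternating", alternating),
                ("drifting", drifting)]:
    print(name, durbin_watson(r), interpret(durbin_watson(r)))
```

The three sequences give values of 2.0, 3.8 and 0.2 respectively, one inside and two outside the range quoted above.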

l: The parameter l is the number of intervals of the fitted spline. The number of internal knots is one less than this. Unless your data are extremely complicated or sparse, l should only be a fraction of the number of data points.

ksi: The autocorrelation length ξ reported for the spline fit is expressed as a number measured on the X axis. A value of zero or much smaller than the average data spacing ΔX_av indicates that the data are essentially uncorrelated.

acffit: The quantity acffit measures how closely the autocorrelation function of the fit residuals matches the assumed autocorrelation function with autocorrelation length ξ (the previous number).