Curve Estimation
- Models
are types of linear and nonlinear curves which may
be fitted to the data. PASW/SPSS supports these models: linear,
logarithmic, inverse, quadratic, cubic, power, compound, S-curve,
logistic, growth, and exponential. The PASW/SPSS menu choice
Analyze, Legacy Dialogs, Scatter/Dot allows the researcher to plot
the dependent against the independent variable, which may aid in
selecting a suitable model to fit. However, before selecting a more
complex model the researcher should first consider whether a
transformation of the data might enable a simpler one to be used,
even linear regression.
- Residual models
. The PASW/SPSS Curve Estimation module only
supports one dependent and one independent variable. While this is
suitable for bivariate analysis, for multivariate analysis it is at
best a "quick and dirty" tool for assessing if one of multiple
independent variables is related to the dependent in one of the 10
supported nonlinear manners. An alternative strategy is to use OLS,
ordinal, multinomial, or some other form of multivariate regression to
regress a given independent variable on all the other independents,
then save the residuals. The residuals then represent the variance in
the given independent once all other independents are controlled. One
may then use these residuals as the independent variable in the
PASW/SPSS Curve Estimation module, using it to predict the dependent under
any of the supported linear and nonlinear models.
The choice between a regular (raw data) and a residual model depends on whether the researcher is interested in uncontrolled or in controlled relationships. Put another way, the standardized b coefficients in the uncontrolled, bivariate raw data approach are whole coefficients, equal to the correlation of the independent with the dependent. The standardized b coefficients in the controlled, multivariate residual approach are partial coefficients, partialling out the effect of other independent variables. Generally, partial coefficients are preferred for most multivariate analysis purposes.
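The residualization strategy described above can be sketched in a few lines of Python with numpy (not SPSS); the dataset, with dependent y and independents x1, x2, x3, is invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
x2 = rng.normal(size=n)
x3 = rng.normal(size=n)
x1 = 0.5 * x2 + 0.3 * x3 + rng.normal(size=n)   # x1 correlates with the others
y = 2.0 + 1.5 * x1 + rng.normal(size=n)

# Regress x1 on the other independents and save the residuals.
X = np.column_stack([np.ones(n), x2, x3])
coef, *_ = np.linalg.lstsq(X, x1, rcond=None)
x1_resid = x1 - X @ coef

# x1_resid is x1 with x2 and x3 partialled out; it would now serve as the
# single independent in the Curve Estimation module. Its correlation with
# x2 and x3 is zero by construction (OLS residuals are orthogonal to the
# regressors).
print(float(np.corrcoef(x1_resid, x2)[0, 1]))
```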
- Time series models . In the Curve Estimation dialog, if the "Time" radio button is turned on, PASW/SPSS assumes time series data with a uniform time interval separating cases in the series. That is, each data row is assumed to represent observations at sequential times which are uniformly spaced. It is assumed, of course, that the dependent variable is also a time series variable. A "Sequence" variable is created automatically and is used as the independent (other predictor variables cannot be used if the "Time" option is selected). If the "Time" option is selected, the time variable, t, replaces the independent variable, x, in the equations given below, and one can specify a forecast period past the end of the time series.
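The mechanics of the "Time" option can be sketched in Python with numpy: the sequence 1..n replaces x, and the fitted curve is extrapolated past the end of the series. The series values here are invented for illustration:

```python
import numpy as np

y = np.array([5.1, 5.6, 6.2, 6.5, 7.1, 7.4, 8.0, 8.3])  # uniform time intervals
t = np.arange(1, len(y) + 1)                             # the auto "Sequence" variable

b1, b0 = np.polyfit(t, y, 1)                             # linear model Y = b0 + b1*t
forecast = b0 + b1 * np.arange(len(y) + 1, len(y) + 4)   # forecast 3 periods ahead
print(forecast.round(2))
```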
- Linear models.
Y = b0 + (b1 * x), where b0 is the
constant, b1 the regression coefficient for x, the independent
variable. Note: in this and the figures below, the exact shape of the curve
(line) is greatly affected by the parameters; each figure represents
only one particular set of parameters. In the figure
below, b0 is 4.818 and b1 is .436 in the "Model Summary and Parameter
Estimates" output table.
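For readers working outside PASW/SPSS, the same linear fit can be reproduced in a few lines of Python with numpy; the data values here are invented for illustration, not the data behind the figure:

```python
import numpy as np

x = np.array([1., 2., 3., 4., 5., 6.])
y = np.array([5.2, 5.7, 6.1, 6.6, 7.0, 7.5])

# np.polyfit returns coefficients highest power first: (b1, b0)
b1, b0 = np.polyfit(x, y, 1)
print(round(b0, 3), round(b1, 3))
```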
-
Logarithmic models.
Y = b0 + (b1 * ln(x)) where
ln() is the natural log function. In the figure below, b0 is 5.422 and
b1 is 1.113, in the "Model Summary and Parameter Estimates" output
table.
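Because the logarithmic model is linear in ln(x), it can be fitted by ordinary least squares after transforming x. A Python sketch on made-up data (not the figure's data):

```python
import numpy as np

x = np.array([1., 2., 4., 8., 16.])
y = 5.0 + 1.1 * np.log(x) + np.array([0.05, -0.03, 0.02, -0.04, 0.01])  # small noise

# Regress Y on ln(x): Y = b0 + b1*ln(x)
b1, b0 = np.polyfit(np.log(x), y, 1)
print(round(b0, 2), round(b1, 2))
```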
- Inverse models.
Y = b0 + (b1 / x). In the figure
below, b0 is 7.194 and b1 is -1.384, in the "Model Summary and
Parameter Estimates" output table.
- Quadratic models.
Y = b0 + (b1 * x) + (b2 * x**2)
where ** is the exponentiation operator. If b2 is positive, the curve
opens upward; if negative, downward. In the figure below, b0 is 4.065, b1
is 1.389, and b2 is -.141, in the "Model Summary and Parameter
Estimates" output table.
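A quadratic fit amounts to a degree-2 polynomial regression, which np.polyfit handles directly. A Python sketch on made-up, noise-free data (so the parameters are recovered exactly):

```python
import numpy as np

x = np.linspace(1, 10, 10)
y = 4.0 + 1.4 * x - 0.14 * x**2    # b2 < 0: the curve opens downward

# polyfit returns (b2, b1, b0), highest power first
b2, b1, b0 = np.polyfit(x, y, 2)
print(round(b0, 2), round(b1, 2), round(b2, 2))
```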
- Cubic models.
Y = b0 + (b1 * x) + (b2 * x**2) +
(b3 * x**3). If b3 is positive, the curve eventually rises; if negative,
it eventually falls. In the figure below, b0 is 3.409, b1 is 2.609, b2 is -.598,
and b3 is .043, in the "Model Summary and Parameter Estimates" output
table.
- Power models.
Y = b0 * (x**b1). Given positive b0, if b1 is
positive the slope is upward; if negative, downward. Also, ln(Y) = ln(b0) + (b1
* ln(x)). In the figure below, b0 is 4.84 and b1 is .263, in the "Model
Summary and Parameter Estimates" output table.
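The power model's log-log linearization, ln(Y) = ln(b0) + b1*ln(x), means its parameters can be recovered by OLS on the transformed variables. A Python sketch on made-up, noise-free data:

```python
import numpy as np

x = np.array([1., 2., 4., 8., 16.])
y = 4.8 * x**0.26                   # power model with b0 = 4.8, b1 = 0.26

# Regress ln(Y) on ln(x); the intercept is ln(b0), the slope is b1
slope, intercept = np.polyfit(np.log(x), np.log(y), 1)
b0, b1 = np.exp(intercept), slope
print(round(b0, 2), round(b1, 2))
```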
- Compound models.
Y = b0 * (b1**x). Given positive b0, if b1 is
greater than 1 the slope is upward; if b1 is between 0 and 1, downward. Also, ln(Y) =
ln(b0) + (ln(b1) * x). Below b0 is 4.260 and b1 is 1.05, reported in
the "Model Summary and Parameter Estimates" table:
- S-curve models.
Y = e**(b0 + (b1/x)), where e is
the base of the natural logarithm. If b1 is negative, the curve rises
toward the asymptote e**b0; if b1 is positive, it falls. Also, ln(Y) = b0 + (b1/x). Below b0 is
2.009 and b1 is -.331, reported in the "Model Summary and Parameter
Estimates" table:
- Logistic models.
Y = 1 / (1/u + (b0 * (b1**x)))
where u is the upper boundary value. After selecting Logistic, specify
the upper boundary value to use in the regression equation. The value
must be a positive number that is greater than the largest dependent
variable value. Given positive b0 and b1, if b1 is less than 1 the slope is
upward; if greater than 1, downward. Also, ln((1/Y) - (1/u)) = ln(b0) + (ln(b1) * x). Below b0 is .113
and b1 is .822 in the "Model Summary and Parameter Estimates" table
output.
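With a known upper bound u, the logistic model linearizes as ln((1/Y) - (1/u)) = ln(b0) + ln(b1)*x, so it too can be fitted by OLS on a transformed dependent. A Python sketch on made-up, noise-free data (u must exceed the largest Y value):

```python
import numpy as np

u = 10.0                              # known upper boundary value
b0_true, b1_true = 0.11, 0.82
x = np.arange(1., 9.)
y = 1.0 / (1.0 / u + b0_true * b1_true**x)   # logistic model data

# Regress ln(1/Y - 1/u) on x; intercept = ln(b0), slope = ln(b1)
slope, intercept = np.polyfit(x, np.log(1.0 / y - 1.0 / u), 1)
b0, b1 = np.exp(intercept), np.exp(slope)
print(round(b0, 3), round(b1, 3))
```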
- Growth models.
Y = e**(b0 + (b1 * x)). If b1 is
negative, the slope is downward; if positive, upward. Also ln(Y) = b0 +
(b1 * x). Below b0 is 1.449 and b1 is .100 in the "Model Summary and
Parameter Estimates" table in output.
- Exponential models.
Y = b0 * (e**(b1 * x)). Given positive b0, if b1
is negative, the slope is downward; if positive, upward. Also ln(Y) =
ln(b0) + (b1 * x). Below b0 is 4.260 and b1 is .100 in the "Model
Summary and Parameter Estimates" table in output.
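The exponential model linearizes as ln(Y) = ln(b0) + b1*x, a semilog regression. A Python sketch on made-up, noise-free data:

```python
import numpy as np

x = np.arange(0., 10.)
y = 4.26 * np.exp(0.10 * x)         # exponential model with b0 = 4.26, b1 = 0.10

# Regress ln(Y) on x; the slope is b1, the intercept is ln(b0)
b1, log_b0 = np.polyfit(x, np.log(y), 1)
b0 = np.exp(log_b0)
print(round(b0, 2), round(b1, 2))
```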
- Statistics
. PASW/SPSS statistical output for the Curve Estimation module includes these:
- Comparative fit plots
of the type above can be displayed to
compare any of the supported models. For example, for the data in the
foregoing models, the table and plot below compare a linear model with
an inverse model:
Example 2. A second example uses the PASW/SPSS sample dataset, virus.sav, which tracks the spread of a computer virus in messages over time. A comparison of the linear model with the quadratic model shows an even more marked contrast:
- Regression coefficients are the b0 (constant) and other b terms in the model equations listed above.
- R2 measures , including multiple R, R-square, adjusted R-square, and standard error of the estimate. R-square is interpreted as the percent of variance in the dependent explained by the model. The "sig" column gives an F-test of the overall significance of the model. If the significance shown in the "Model Summary and Parameter Estimates" table for a given model (ex., Inverse) is, say, .032, this means there is a 3.2% chance that if a different random sample were taken, one would get an R2 as strong or stronger simply by chance of random sampling. Since this 3.2% chance of Type I error (false positives) is less than the customary 5% level, the researcher concludes that the computed R2 is significant (truly different from 0).
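The F statistic behind the "sig" column can be computed from R-square directly: for a model with k predictors and n cases, F = (R2/k) / ((1 - R2)/(n - k - 1)), with (k, n - k - 1) degrees of freedom. A sketch with hypothetical values:

```python
# Hypothetical sample size, predictor count, and R-square
n, k, r2 = 30, 1, 0.40

# Overall-model F test: explained variance per df over unexplained per df
f = (r2 / k) / ((1 - r2) / (n - k - 1))
print(round(f, 2))  # compared against F(1, 28); p here is well under .05
```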
- Analysis of variance table
. If the "Display ANOVA table" checkbox is checked (not
the default) in PASW/SPSS, the "Model Summary" table will contain
R-square and the standard error of estimate but not the parameter
estimates nor the comparison across models. Rather, one will get
separate ANOVA output for each model. That for the quadratic model in
the computer virus example is illustrated below.
The "Model Summary" table is followed by the "ANOVA" table, which contains the regression, residual, and total sums of squares used in computing the F test of overall model significance, along with the significance level. Then the "Coefficients" table contains the unstandardized B coefficient for the one independent, its standard error, its standardized (beta weight) value, and a t test of its significance with the significance level. The constant, its standard error, and its significance are also reported. In the case of a quadratic model, illustrated below, the one independent is entered twice on the predictor side, once as hours and again as hours-squared, for this example.
- Save variables . The "Save" button in PASW/SPSS Curve Estimation allows the researcher to save predicted values, residuals, and prediction intervals (upper and lower bounds) back to the dataset for further analysis - for instance, for residual analysis, perhaps using the menu choice Graphs, Legacy Dialogs, Scatter/Dot to plot residuals on the Y axis against the dependent on the X axis (as one example).
- Data dimensions . In PASW/SPSS Curve Estimation, only a single independent (or a time variable) predicting a single dependent can be modeled, meaning that only two-dimensional curves may be fitted. Other curve-fitting software supports three-dimensional curve-fitting.
- Data level . All models require quantitative dependent and independent variables. If both independent and dependent are dichotomous, the fit line will be linear even when a nonlinear (ex., quadratic) fit is requested; in such a case the linear and quadratic solutions will be identical. If one variable is dichotomous and the other is continuous, regardless of causal direction, the linear and nonlinear fit lines will not necessarily be identical.
- Randomly distributed residuals characterize well-fitting models.
- Independence . Observations should be independent.
- Linear models require multivariate normality (normal distribution of the dependent for each value of the independent or combinations of independent values). Also, the dependent must have constant variance across the ranges of the independent variables. The dependent must be related to the independent variables in a linear manner.
- Validation . While selecting the model with the highest R-squared is tempting, it is not the recommended method. For instance, a cubic model will always have a higher R-squared than a quadratic model. The recommended method for selecting which model is best is cross-validation. That is, the formulas for each model based on the estimation dataset are applied to the hold-out dataset, then the R-squares are compared based on output for the hold-out dataset. Alternatively, the determination may be made graphically by overlaying sequence plots of both models for the hold-out dataset.
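The cross-validation procedure just described can be sketched in Python with numpy: fit each candidate model on an estimation half, then compare R-squares computed on the hold-out half. The data and the split are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 10, 80)
y = 4.0 + 1.4 * x - 0.14 * x**2 + rng.normal(scale=0.3, size=x.size)  # truly quadratic

idx = rng.permutation(x.size)
est, hold = idx[:40], idx[40:]          # estimation half / hold-out half

def holdout_r2(degree):
    coefs = np.polyfit(x[est], y[est], degree)   # fit on the estimation half only
    pred = np.polyval(coefs, x[hold])            # apply the formula to the hold-out half
    ss_res = np.sum((y[hold] - pred) ** 2)
    ss_tot = np.sum((y[hold] - y[hold].mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Compare quadratic vs. cubic on hold-out data rather than training data
print(round(holdout_r2(2), 3), round(holdout_r2(3), 3))
```

On training data the cubic's R-square can never be lower than the quadratic's; on hold-out data the extra cubic term earns no such free advantage.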
- Other . Other assumptions are discussed separately according to model, as in the linear regression or logistic regression sections.
- How can I significance test the difference between the R2's of two models in a single sample?
- Unfortunately, PASW/SPSS does not support this obvious need. However, the Steiger and Browne (1984) procedure for comparing interdependent correlations (see the references below) may be adapted for this purpose.
- I want to use, from the Curve Estimation module, the
two best functions of my independent in a regression equation, but will
this introduce multicollinearity?
- Yes, if you have two related terms like x and x**2 in the same equation, multicollinearity is likely, since powers of the same variable tend to be highly correlated.
- What other curve-fitting packages are available?
- There are many, including many with richer input and output options than the PASW/SPSS Curve Estimation module. These include SigmaPlot and TableCurve 2D and 3D, also from SigmaPlot.com. CurveExpert will fit some 30 different models. DataFit supports hundreds of two- and three-dimensional models. LabFit is another package supporting hundreds of functions for two- and three-dimensional curve fitting. And there are many more.
- What is the command structure if I prefer to use the syntax window rather than the menu system in PASW/SPSS?
- The command takes this form:
TSET MXNEWVAR=4.
CURVEFIT
/VARIABLES=accident WITH age
/CONSTANT
/MODEL=LINEAR LOGARITHMIC
/PRINT ANOVA
/PLOT FIT
/SAVE=PRED RESID .
The TSET command sets aside space for the 4 new variables created by the SAVE subcommand (a predicted-values variable and a residuals variable for each of the two models). The VARIABLES subcommand asks that number of accidents be predicted from age. The CONSTANT subcommand requires a constant to be in the equation. MODEL requests that both a linear and a logarithmic model be fitted. PRINT ANOVA puts an ANOVA table in the output. PLOT FIT produces a plot of number of accidents on the Y axis against age on the X axis, with points representing observed values and lines for the linear and logarithmic fit curves. The SAVE subcommand saves variables back to the dataset.
The full general syntax is as follows:
CURVEFIT VARIABLES= varname [WITH varname]
[/MODEL= [LINEAR**] [LOGARITHMIC] [INVERSE]
[QUADRATIC] [CUBIC] [COMPOUND]
[POWER] [S] [GROWTH] [EXPONENTIAL]
[LGSTIC] [ALL]]
[/CIN={95** }]
{value}
[/UPPERBOUND={NO**}]
{n }
[/{CONSTANT† }
{NOCONSTANT}
[/PLOT={FIT**}]
{NONE }
[/ID = varname]
[/PRINT=ANOVA]
[/SAVE=[PRED] [RESID] [CIN]]
[/APPLY [='model name'] [{SPECIFICATIONS}]]
{FIT }
**Default if the subcommand is omitted.
†Default if the subcommand is omitted and there is no corresponding specification on the TSET command.
- Daniel, Cuthbert and Fred S. Wood (1999). Fitting equations to data: Computer analysis of multifactor data, 2nd edition. NY: Wiley-Interscience. A leading text on curve estimation, going beyond the capabilities of the PASW/SPSS Curve Estimation module.
- Steiger, J.H. & Browne, M.W. (1984). The comparison of interdependent correlations between optimal linear composites. Psychometrika , 49, 11-21.