A Study on Fractional Polynomial Regression

CHAPTER ONE

Purpose of the Study

Most existing work on fractional polynomial models has focused on fitting models to psychological and pharmacokinetic experimental data. Little has been done on agronomic data, although Nelder (1966) introduced the inverse polynomial model and applied it to fertilizer trials, while Salawu (2007) applied the inverse polynomial model with a quadratic variable to the fertilizer response of three rice varieties.

This research focuses on fitting the full power set of a fractional polynomial model on P… The main aim is to observe how well the fractional polynomial model fits the data using normal errors regression analysis when the covariates are continuous or grouped.

CHAPTER TWO

LITERATURE REVIEW

This chapter reviews the work of different scholars on techniques applied in conventional polynomial regression models, fractional polynomial regression models and the assessment of model adequacy.

Polynomial Regression Models for Continuous Covariates

A basic choice in modeling is between parametric and non-parametric models. Parametric models such as polynomials are easy to fit and the risk function may be written down concisely, but they may fit the data badly and give misleading inference. Non-parametric models, on the other hand, may fit the data well but are difficult to interpret because of fluctuations in the fitted curves, and the risk function is usually impossible to write down concisely (Royston et al., 1999). Polynomial regression entails an inherent trade-off between accuracy and efficiency: as the degree of the polynomial increases, the accuracy of the model increases up to a certain point, but the time and space needed increase as well (Stronger and Stone, 2006).

Bremer (2012) stated that in practice we usually start with models of degree one, and if transformations of the predictor or the response are insufficient we then consider models of degree two. Higher-degree models should be avoided unless the context from which the data come explicitly calls for them. To decide on the appropriate degree of a polynomial regression model, two strategies are possible. One can start with a linear model and include higher-order terms one by one until the highest-order term becomes non-significant; this is generally called forward variable selection. Alternatively, one can start with a high-order model and exclude the non-significant highest-order terms one by one until the remaining highest-order term is significant; this is generally referred to as backward variable selection. In general, the two methods may not lead to the same model. For polynomial models these methods are likely more elaborate than necessary, since attention can be restricted to first- and second-order polynomial models (Bremer, 2012). It is possible to select the predictor functions more carefully as curvilinear functions of X to avoid this problem.
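The forward strategy for choosing the degree can be illustrated with a minimal Python sketch. It assumes NumPy is available, uses made-up data, and replaces a formal significance test with a fixed cut-off on the partial F statistic; the function names and the cut-off `f_crit` are hypothetical choices for illustration and do not come from the cited authors.

```python
import numpy as np


def residual_sum_of_squares(x, y, degree):
    """Least-squares fit of a polynomial of the given degree; returns its RSS."""
    design = np.vander(x, degree + 1, increasing=True)  # columns 1, x, ..., x^degree
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    return float(residuals @ residuals)


def forward_select_degree(x, y, max_degree=4, f_crit=4.0):
    """Start from a straight line and add one higher power at a time.

    f_crit is a hypothetical cut-off for the partial F statistic. It is close
    to the 5% critical value of F(1, v) for moderate v, but it is an
    assumption of this sketch rather than an exact test.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    degree = 1
    rss_current = residual_sum_of_squares(x, y, degree)
    while degree < max_degree:
        rss_next = residual_sum_of_squares(x, y, degree + 1)
        df_resid = n - (degree + 2)  # n minus the number of fitted coefficients
        f_stat = (rss_current - rss_next) / (rss_next / df_resid)
        if f_stat < f_crit:  # the new highest-order term is not significant
            break
        degree, rss_current = degree + 1, rss_next
    return degree


# Illustrative data: an exact quadratic plus a small alternating disturbance.
x = np.arange(-10, 11)
y = 1 + 2 * x + 3 * x**2 + 0.5 * (-1.0) ** np.arange(21)
print(forward_select_degree(x, y))
```

Backward selection would run the same comparison in the opposite direction, starting from `max_degree` and dropping the highest-order term while it is not significant.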

One problem that is always encountered in regression model building is nonlinearity in the relation between the outcome variable and continuous or ordered predictors. Traditionally, such predictors are entered into stepwise selection procedures as linear terms or as dummy variables obtained after grouping, though the assumption of linearity may be incorrect (Royston and Sauerbrei, 2008). Categorization introduces problems of defining cutpoint(s) (Altman et al., 1994), overparametrization and loss of efficiency (Lagakos, 1988). In any case, a cutpoint model is an unrealistic way to describe a smooth relationship between a predictor and a response variable. An alternative approach is to keep the variable continuous and allow some form of nonlinearity. Hitherto, quadratic or cubic polynomials have been used, but the range of curve shapes afforded by conventional low-order polynomials is limited (Royston and Sauerbrei, 2008). Box and Tidwell (1962) proposed a method of determining a power transformation of a predictor. A more general family of parametric models, proposed by Royston and Altman (1994), is based on fractional polynomial (FP) functions and can be traced back to Box and Tidwell's (1962) approach. Royston and Altman (1994) presented the FP functions, which encompass conventional polynomials as a special case, where one, two or more terms of the form x^p are fitted, the exponents p being chosen from a small, preselected set of integer and non-integer values.
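As an illustration of fitting a single term of the form x^p, the sketch below tries each power in the set {-2, -1, -0.5, 0, 0.5, 1, 2, 3} proposed by Royston and Altman (1994), with power 0 taken to mean ln(x), and keeps the power with the smallest residual sum of squares. Under normal errors the smallest residual sum of squares corresponds to the smallest deviance. The data are made up, and the function names are hypothetical; this is a one-term (first-degree) sketch only, not the full procedure.

```python
import math

# Power set proposed by Royston and Altman (1994); power 0 denotes ln(x).
FP_POWERS = (-2, -1, -0.5, 0, 0.5, 1, 2, 3)


def fp_transform(x, p):
    """Fractional polynomial transformation of a positive value x."""
    return math.log(x) if p == 0 else x ** p


def fit_line(u, y):
    """Ordinary least squares for y = b0 + b1 * u; returns (b0, b1, rss)."""
    n = len(u)
    u_bar = sum(u) / n
    y_bar = sum(y) / n
    s_uu = sum((ui - u_bar) ** 2 for ui in u)
    s_uy = sum((ui - u_bar) * (yi - y_bar) for ui, yi in zip(u, y))
    b1 = s_uy / s_uu
    b0 = y_bar - b1 * u_bar
    rss = sum((yi - b0 - b1 * ui) ** 2 for ui, yi in zip(u, y))
    return b0, b1, rss


def best_fp1(x, y):
    """Try every power in FP_POWERS and keep the one with the smallest RSS."""
    fits = {p: fit_line([fp_transform(xi, p) for xi in x], y) for p in FP_POWERS}
    best_power = min(fits, key=lambda p: fits[p][2])
    return best_power, fits[best_power]


# Illustrative, made-up data generated from y = 2 + 3 / x (that is, power -1).
x = [1, 2, 3, 4, 5, 6, 8, 10]
y = [2 + 3 / xi for xi in x]
power, (b0, b1, rss) = best_fp1(x, y)
print(power, round(b0, 3), round(b1, 3))
```

A two-term model would search over pairs of powers from the same set in the same way, at the cost of a larger design matrix per fit.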

Non-parametric regression and scatter-plot smoothing are other methods for modeling continuous covariates besides linear and FP functions. For a function of x with the global-influence property, the fit at a given value x₀ of x may be relatively unaffected by local perturbations of the response at x₀, but the fit at points distant to x₀ may be affected, perhaps considerably. This property may be regarded by proponents of local-influence models as a fatal flaw (Royston and Sauerbrei, 2008). Local polynomial regression is a popular nonparametric regression technique due to its attractive asymptotic properties, in particular at the border of the support. For fully observed responses, a local polynomial regression estimate of m(x₀) is obtained by estimating a polynomial in x with weighted ordinary least squares. Each unit is weighted depending on its distance in x to the design point of interest (focal value) x₀, thereby making the procedure local (Karlsson et al., 2009).

According to Royston and Sauerbrei (2008), a rigorous definition of the global-influence property has not been framed, but such models are usually parametric in nature. Examples include polynomials, nonlinear models such as exponential and logistic functions, and fractional polynomials developed by Royston and Altman (1994). By contrast, functions with the local-influence property, including regression splines (de Boor, 2001), smoothing splines (Green and Silverman, 1994), and kernel-based scatter-plot smoothers such as the locally weighted scatter-plot smoother, LOWESS (Cleveland and Devlin, 1988), are typically nonparametric in character. Perturbation of the response at x₀ usually greatly affects the fit at x₀ but hardly affects it at points distant to x₀. One key argument favoring functions with global influence is their potential for use in future applications and datasets (Royston and Sauerbrei, 2008). Without such an aim, functions with local influence might appear the more attractive (Hand and Vinciotti, 2003). According to Royston and Sauerbrei (2008), fractional polynomial functions retain the global-influence property while being much more flexible than conventional polynomials. Further, they stated that low-dimensional fractional polynomial curves may provide a satisfactory fit where high-order polynomials fail (Royston and Altman, 1994). Fractional polynomials are intermediate between polynomials and nonlinear curves. They may be seen as a good compromise between ultra-flexible but potentially unstable local-influence models and the relatively inflexible conventional polynomials (Royston and Sauerbrei, 2008).
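The local-influence behaviour of a local polynomial regression estimate can be seen in a small Python sketch. It fits a weighted straight line around the focal value x₀; the Gaussian kernel, the bandwidth of 1.0 and the function name are assumptions made for illustration, since the text only requires that the weights depend on the distance to x₀. The data are made up.

```python
import math


def local_linear_fit(x, y, x0, bandwidth=1.0):
    """Estimate m(x0) by weighted least squares on a straight line.

    Gaussian kernel weights and the bandwidth value are assumptions of this
    sketch; any weight that decreases with distance to x0 would serve.
    """
    w = [math.exp(-0.5 * ((xi - x0) / bandwidth) ** 2) for xi in x]
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    s_xx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    s_xy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    slope = s_xy / s_xx
    return y_bar + slope * (x0 - x_bar)


# Made-up data on a straight line; one response is then perturbed at x = 2.
x = [float(i) for i in range(11)]
y = [1.0 + 0.5 * xi for xi in x]
y_perturbed = list(y)
y_perturbed[2] += 5.0

# Change in the fitted value near the perturbation and far away from it.
near = local_linear_fit(x, y_perturbed, 2.0) - local_linear_fit(x, y, 2.0)
far = local_linear_fit(x, y_perturbed, 9.0) - local_linear_fit(x, y, 9.0)
print(near, far)
```

In this sketch the perturbation shifts the fit at x₀ = 2 noticeably while leaving the fit at x₀ = 9 essentially unchanged, which is the opposite of what a globally fitted polynomial would show.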

CHAPTER THREE

METHODOLOGY

This chapter explains the methodology used in the research. The study presents the fractional polynomial for normal error regression models, the median method for categorizing continuous covariates and the deviance method for parameter estimation and for checking the adequacy of the fitted model.

Normal Error Model

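For an individual with response y, the multiple linear regression model with normal errors $\varepsilon \sim N(0, \sigma^2)$ and covariate vector $X = (x_1, x_2, \ldots, x_k)$ with k variables may be written as

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon = \beta_0 + X\beta + \varepsilon \qquad (3.1)$$

where $\beta_0$ is the intercept and $\beta = (\beta_1, \ldots, \beta_k)^{\top}$ is the vector of regression coefficients. The linear predictor or 'index', $X\beta$, is an important quantity in multivariable modeling, and equation (3.1) is called the normal error model (Royston and Sauerbrei, 2008).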
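A minimal Python sketch of fitting the normal error model (3.1) by ordinary least squares is given here for illustration. It assumes NumPy is available and uses made-up data with two covariates. The deviance is computed on the assumption that it means minus twice the maximized log-likelihood with the error variance estimated as RSS/n; the function name is hypothetical.

```python
import numpy as np


def fit_normal_error_model(X, y):
    """OLS fit of y = b0 + X b + e; returns coefficients, RSS and deviance."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    design = np.column_stack([np.ones(n), X])  # prepend the intercept column
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    rss = float(residuals @ residuals)
    # Assumption: deviance = -2 * maximized log-likelihood, sigma^2 = RSS / n.
    deviance = n * (1.0 + np.log(2.0 * np.pi * rss / n))
    return coef, rss, deviance


# Made-up data: y = 1 + 2*x1 - x2 plus a small disturbance chosen to be
# orthogonal to the intercept and to both covariates.
x1 = [0, 1, 2, 3, 0, 1, 2, 3]
x2 = [0, 0, 0, 0, 1, 1, 1, 1]
e = [0.1, -0.1, -0.1, 0.1, -0.1, 0.1, 0.1, -0.1]
X = np.column_stack([x1, x2])
y = 1 + 2 * np.array(x1) - np.array(x2) + np.array(e)

coef, rss, deviance = fit_normal_error_model(X, y)
print(np.round(coef, 3), round(rss, 3))
```

Comparing the deviance of two nested fits on the same data is then a matter of subtracting the two returned values.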

CHAPTER FOUR

ANALYSIS AND DISCUSSION OF RESULTS

Introduction

This chapter presents the results of the data analyses and their interpretation for the normal error fractional polynomial regression, for the data sets described in chapter three and presented in the appendix. The data sets were analyzed as a generalized linear model (GLM) using two different approaches.

CHAPTER FIVE

SUMMARY, CONCLUSION AND RECOMMENDATION

Introduction

This chapter presents the summary, conclusion and recommendations based on the results obtained in chapter four.

Summary

The main objective of this research work is to fit a fractional polynomial regression model with a continuous covariate and with a grouped covariate, in order to compare the performance of the fit in the two cases. The generalized linear model was fitted for the fractional polynomials using two different approaches. The first approach is the Royston and Altman method, while the second is the ordinary fractional polynomial fit.

From the fitted fractional polynomial regression models, it was observed that the proposed median algorithm method for grouping continuous covariates gave better results than the ungrouped continuous covariate. For the experimental design data on the effect of nitrogen fertilizer, manure and cowpea variety on cowpea yield, presented in table 4.1 through table 4.10, when the multivariable fractional polynomial (MFP) regression proposed by Royston and Altman was applied, the algorithm for selecting factors with significant effects converged at φ(1,_, 1) with a final deviance of 127.97. Both fertilizer and manure rates were significant at the 5% level, with P-values of 0.029 and 0.001 respectively, whereas variety was not significant at the 5% level. When fertilizer was considered as the first independent variable (x₁), the algorithm for selecting factors with significant effects converged at φ(x_1, 3) with a model terms deviance of 127.08. For manure, the algorithm converged at φ(x_2, -2) with a model terms deviance of 130.17.

Conclusion

Based on the observations above, the study concludes that the median grouping method (polytomous) for grouping continuous covariates did not perform badly, since it gave the most significant results compared with the ungrouped and the grouped (dichotomous) continuous covariates. For the fractional polynomial regression, the continuous covariates produced a gain (G) of 3.09. When multivariable fractional polynomial regression was used, gains (G) of 6.20 and 25.85 were produced. In fact, most data analysts group their treatment levels before analysis unless there is a reason not to. Therefore, grouping can be done adequately, depending on the method used to obtain the cutpoints or to carry out the grouping of the continuous covariate.
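The gain (G) quoted above is not defined in this chapter. Assuming it follows the definition of Royston and Altman (1994), namely the deviance of the straight-line model minus the deviance of the best-fitting fractional polynomial model of degree m, it can be written as follows. Here D denotes the deviance and $\tilde{p}$ the selected power vector; both symbols are notation assumed for illustration.

```latex
G = D(1, 1) - D(m, \tilde{p})
```

Under this reading, a larger G indicates a greater improvement in fit over the straight line.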

Recommendation

Based on our observations from chapter four, the study recommends the following:

Fractional polynomial regression model fits the data well because it is open in the sense that a set of pre-defined powers is available from which the best powers among other contending powers can be
The use of FP modeling in experimental design data too, since programs for interaction effect have been developed in the latest versions of STATA, though this was not studied in this research work so as to extend its
The median algorithm method of grouping continuous covariates is recommended because in this research work it showed a contending strength with the covariate that is not grouped.

Contribution to Knowledge

The contributions to knowledge from this research work are as follows:

The comparison between the fractional polynomial regression model with continuous covariate and with grouped covariate is achieved.
Median algorithm method of grouping continuous covariates has
Fitting a fractional polynomial regression model in analyzing experimental design data has been successfully

Suggestion for Further Research

Extension of fractional polynomial regression model in fitting experimental design data with interaction effects is suggested.
