
Implementation of Derivative-Free Optimization Methods

CHAPTER ONE

OBJECTIVE OF THE STUDY
The overall aim of the study is to implement derivative-free algorithms for unconstrained optimization problems.

The specific objectives of this study are:

  • to implement the finite-difference approximation of derivatives in the Quasi-Newton algorithm
  • to implement the derivative-free trust-region method using finite-difference methods
  • to determine the accuracy of the methods

CHAPTER TWO

LITERATURE REVIEW

INTRODUCTION

This section introduces the different algorithms that will be compared in this project. Consider an unconstrained optimization problem

min f(x), x ∈ R^n.        (2.1)

Several strategies can be considered when we are faced with problems that do not allow direct use of derivatives or gradients. The first is to apply existing direct-search optimization methods, like the well-known and widely used simplex reflection algorithm or its modern variants, or the parallel direct-search algorithm. This first approach has the advantage of requiring little effort from the user; however, it may require substantial computing resources, such as a great number of function evaluations (Nocedal and Wright 2014). This is because it does not exploit the smoothness of the objective function well enough.
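For illustration only, the following Python snippet (not part of the project's MATLAB implementation; the test function and starting point are arbitrary) runs SciPy's Nelder-Mead simplex reflection method, which uses only function values and no derivatives, on a simple two-variable function.

import numpy as np
from scipy.optimize import minimize

# Nelder-Mead uses only function values (no gradients), typically at the cost of
# many function evaluations.
f = lambda x: (x[0] - 1.0)**2 + (x[1] + 2.0)**2
res = minimize(f, x0=np.array([0.0, 0.0]), method='Nelder-Mead')
print(res.x, res.nfev)   # approximate minimizer (1, -2) and the number of evaluations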

The second approach is to use automatic differentiation tools. Automatic differentiation is used to build computer programs which calculate the derivatives of a function by systematic procedures. These programs calculate the Jacobian of vector-valued functions mapping n-dimensional to m-dimensional Euclidean space, i.e., from R^n to R^m. If the function is scalar-valued, i.e., from R^n to R, then the program calculates the gradient (and Hessian) of the function. However, such tools are not suitable for the problems we consider. This is mainly because automatic differentiation requires the function to be differentiated to be given as a callable program that cannot be treated as a black box (Xiaogang, Dong, Xingdong and Lu, 2006).

A third possibility is to use finite-difference approximations of the derivatives (gradients and possibly Hessian matrices). In general, given the cost of evaluating the objective function, evaluating its Hessian by finite differences is expensive; one can use quasi-Newton Hessian approximation techniques instead. Incorporating finite differences for computing gradients in conjunction with quasi-Newton Hessian approximations has proved to be helpful and sometimes surprisingly efficient. However, the additional function evaluations required to approximate the derivatives can be very costly and, most importantly, finite differencing can be unreliable in the presence of noise if the differentiation step is not adapted to the noise level. The objective functions in the problems we consider are obtained from simulation procedures; therefore, automatic differentiation tools are not applicable, as mentioned above. This forces one to consider algorithms that do not rely on prior approximation of the derivatives of the objective function at a given point. We will look at discrete gradients from non-smooth optimization, where the approach can be interpreted as approximating or mimicking derivatives.

There are two important components of derivative-free methods. The first is sampling better points in the iteration procedure. The other is searching appropriate subspaces where the chance of finding a minimum is relatively high. In order to use the extensive convergence theory for derivative-based methods, derivative-free methods need to satisfy certain properties. For instance, to guarantee the convergence of a derivative-free method, we need to ensure that the error in the gradient converges to zero when the trust-region or line-search steps are reduced. Hence, a descent step will be found if the gradient of the true function is not zero at the current iterate. To show this, one needs to prove that the linear or quadratic approximation models satisfy Taylor-like error bounds on the function value and the gradient. Finally, for our approach to derivative-free optimization via non-smooth optimization, we shall use so-called discrete gradients.
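As a minimal sketch of the finite-difference idea discussed above, the Python function below approximates the gradient by forward differences at the cost of n extra function evaluations; the name grad_fd and the step size h are illustrative choices, not part of the project's MATLAB code.

import numpy as np

def grad_fd(f, x, h=1e-6):
    # Forward-difference approximation of the gradient of f at x:
    # the i-th component is (f(x + h*e_i) - f(x)) / h.
    x = np.asarray(x, dtype=float)
    g = np.zeros_like(x)
    fx = f(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - fx) / h
    return g

# Example: the gradient of f(x) = x1^2 + 3*x2^2 at (1, 2) is approximately (2, 12)
print(grad_fd(lambda x: x[0]**2 + 3.0 * x[1]**2, [1.0, 2.0]))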

Nesterov (2011) worked on derivative-free optimization and cited Dixon, Himmelblau and Polyak for extensive discussions and references. These methods come essentially in four different classes, a classification strongly influenced by Conn and Toint (1996). The first class contains algorithms which use finite-difference approximations of the objective function's derivatives in the context of a gradient-based method, such as nonlinear conjugate gradients or quasi-Newton methods (see, for instance, Nash and Sofer (1996)). He notes that derivative-free optimization is a class of nonlinear optimization methods which usually comes to mind when one needs to apply optimization to complex systems. The complexity of those systems manifests itself in the lack of derivative information (exact or approximate) for the functions under consideration. The lack of derivative information usually arises because the function values are the result of a black-box simulation process or a physical experiment. The situation is often aggravated by the high cost of the function evaluations and the numerical noise in the resulting values; thus the use of finite-difference derivative approximations is typically prohibitive. Numerous applications of derivative-free optimization can be found in engineering design, geological modeling, finance, manufacturing, biomedical applications and many other fields. As the available computational power grows, simulation becomes routine and optimization of complex systems becomes possible and desirable.

 

 CHAPTER THREE

DERIVATIVE BASED OPTIMIZATION

Quasi-Newton Method

If the function being minimized, f(z), is not available in closed form or is difficult to differentiate, the approximations in equations 1.8, 1.9, 1.10, 1.11 and 1.12 can be used to represent the derivatives. We illustrate this for the case where f is a function of just two variables.
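Since equations 1.8 to 1.12 are not reproduced here, the short Python snippet below is only a stand-in showing the standard central-difference approximations of the two partial derivatives of a function of two variables; the name partials_fd and the step size h are illustrative assumptions.

def partials_fd(f, z1, z2, h=1e-5):
    # Central-difference approximations of df/dz1 and df/dz2 at (z1, z2).
    df_dz1 = (f(z1 + h, z2) - f(z1 - h, z2)) / (2.0 * h)
    df_dz2 = (f(z1, z2 + h) - f(z1, z2 - h)) / (2.0 * h)
    return df_dz1, df_dz2

# Example: for f(z1, z2) = z1^2 * z2 the partial derivatives at (2, 3) are about (12, 4)
print(partials_fd(lambda z1, z2: z1**2 * z2, 2.0, 3.0))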

Quasi-Newton Algorithm

min f(z), for f(·) twice continuously differentiable

Step 1:  Select an initial point z_0.

Step 2:  Compute the gradient g_k = ∇f(z_k) (or its finite-difference approximation).

Step 3:  If g_k = 0, stop; else go to Step 4.

Step 4:  Compute the Hessian approximation B_k ≈ ∇²f(z_k) (by finite differences or a quasi-Newton update).

Step 5:  If B_k is invertible, compute the search direction d_k by solving B_k d_k = -g_k and go to Step 6; else set d_k = -g_k and go to Step 6.

Step 6:  Compute the step length α_k, set z_{k+1} = z_k + α_k d_k, and return to Step 2.

Comment

When B_k is singular, the Newton-type step cannot be computed, and when the Hessian approximation B_k is not positive definite, the search direction may fail to be a descent direction, so the step may not lead towards a minimum. In these cases we revert to the gradient (steepest-descent) direction in R^n. We can then compute the step size α_k.
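The following Python sketch mirrors Steps 1 to 6 above, replacing the gradient and Hessian by finite-difference approximations and reverting to the steepest-descent direction when the Hessian approximation is singular or not positive definite. It is an illustrative sketch only; the helper names, tolerances and backtracking rule are assumptions, and the project's own implementation was written in MATLAB.

import numpy as np

def grad_fd(f, z, h=1e-6):
    # Step 2: forward-difference approximation of the gradient.
    g = np.zeros_like(z)
    fz = f(z)
    for i in range(z.size):
        e = np.zeros_like(z)
        e[i] = h
        g[i] = (f(z + e) - fz) / h
    return g

def hess_fd(f, z, h=1e-4):
    # Step 4: finite-difference Hessian built from gradient differences.
    n = z.size
    H = np.zeros((n, n))
    g0 = grad_fd(f, z)
    for i in range(n):
        e = np.zeros_like(z)
        e[i] = h
        H[:, i] = (grad_fd(f, z + e) - g0) / h
    return 0.5 * (H + H.T)   # symmetrize

def quasi_newton_fd(f, z0, tol=1e-6, max_iter=200):
    z = np.asarray(z0, dtype=float)            # Step 1: initial point
    for _ in range(max_iter):
        g = grad_fd(f, z)                      # Step 2
        if np.linalg.norm(g) < tol:            # Step 3: stop at (approximate) stationarity
            break
        B = hess_fd(f, z)                      # Step 4
        try:
            d = np.linalg.solve(B, -g)         # Step 5: Newton-type direction
            if g @ d >= 0.0:                   # B not positive definite: not a descent direction
                d = -g
        except np.linalg.LinAlgError:          # B singular: revert to steepest descent
            d = -g
        alpha = 1.0                            # Step 6: backtracking step length
        while f(z + alpha * d) > f(z) and alpha > 1e-12:
            alpha *= 0.5
        z = z + alpha * d
    return z

# Example on Rosenbrock's function in two variables (minimizer at (1, 1))
rosen = lambda z: 100.0 * (z[1] - z[0]**2)**2 + (1.0 - z[0])**2
print(quasi_newton_fd(rosen, [-1.2, 1.0]))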

CHAPTER FOUR

METHODOLOGY AND RESULTS

INTRODUCTION

In this section, we consider the unconstrained minimization of an objective function f : R^n → R that is at least once continuously differentiable and bounded from below, and for which gradients are neither available for use nor can be accurately approximated. We are concerned with two methods, the quasi-Newton method and the trust-region method, applying them to Rosenbrock's function. We present results of numerical experiments comparing these two widely used algorithms, and we choose three simple problems to illustrate some of the differences between them for derivative-free optimization. Both algorithms are implemented in the MATLAB environment.
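As a rough Python analogue of the MATLAB experiments (the SciPy calls and options below are assumptions for illustration, not the project's code), the snippet minimizes Rosenbrock's function with a quasi-Newton method that uses finite-difference gradients and with a trust-region method that uses finite-difference gradients together with a BFGS Hessian approximation.

import numpy as np
from scipy.optimize import minimize, BFGS

def rosenbrock(x):
    return 100.0 * (x[1] - x[0]**2)**2 + (1.0 - x[0])**2

x0 = np.array([-1.2, 1.0])

# Quasi-Newton (BFGS); no gradient is supplied, so SciPy falls back to finite differences.
res_qn = minimize(rosenbrock, x0, method='BFGS')

# Trust-region method with finite-difference gradients and a BFGS Hessian approximation.
res_tr = minimize(rosenbrock, x0, method='trust-constr', jac='2-point', hess=BFGS())

print('quasi-Newton :', res_qn.x, 'function evaluations:', res_qn.nfev)
print('trust-region :', res_tr.x, 'function evaluations:', res_tr.nfev)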

CHAPTER FIVE

SUMMARY AND CONCLUSION

Summary.

We here briefly summarize the approaches that the quasi-Newton method and the trust-region method use for finding the solution of the problem. The quasi-Newton algorithm chooses a search direction d_k at the current iterate x_k and then takes a step of a suitable length along this direction to a new iterate with a lower function value.

This step has length α_k. The quasi-Newton method finds an approximate value of α_k by generating a limited number of trial step lengths, and it replaces the gradient and the Hessian matrix with finite-difference and quasi-Newton approximations.

Trust-region algorithms construct a model function m_k to be used instead of the actual objective function f. This model, which is easier to handle than the objective function itself, is constructed by means of the gradient and Hessian information of f and the function values already in our hands. The step d_k which minimizes the model m_k is sought within some region around the current iterate x_k. The size of this region is adjusted according to the sufficiency of the decrease in f at every iteration step.
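To make the ratio test and the radius update concrete, here is a minimal Python sketch of a trust-region iteration built on a finite-difference gradient and a Cauchy-point step for the quadratic model; the names fd_grad and trust_region and all parameter values are illustrative assumptions, not the project's MATLAB code.

import numpy as np

def fd_grad(f, x, h=1e-6):
    # Forward-difference gradient of f at x.
    g = np.zeros_like(x)
    fx = f(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - fx) / h
    return g

def trust_region(f, x0, radius=1.0, max_radius=10.0, eta=0.15, tol=1e-6, max_iter=2000):
    x = np.asarray(x0, dtype=float)
    B = np.eye(x.size)   # model Hessian, kept fixed here for brevity (a quasi-Newton update could be used)
    for _ in range(max_iter):
        g = fd_grad(f, x)
        gnorm = np.linalg.norm(g)
        if gnorm < tol:
            break
        # Cauchy-point step for the quadratic model m_k(p) = f(x) + g'p + 0.5 p'Bp
        gBg = g @ B @ g
        tau = 1.0 if gBg <= 0.0 else min(1.0, gnorm**3 / (radius * gBg))
        p = -tau * (radius / gnorm) * g
        predicted = -(g @ p + 0.5 * p @ B @ p)   # decrease predicted by the model
        actual = f(x) - f(x + p)                 # decrease actually achieved by f
        rho = actual / predicted if predicted > 0.0 else -1.0
        if rho < 0.25:
            radius *= 0.25                       # poor agreement: shrink the region
        elif rho > 0.75 and np.isclose(np.linalg.norm(p), radius):
            radius = min(2.0 * radius, max_radius)   # good step on the boundary: enlarge
        if rho > eta:
            x = x + p                            # accept the step; otherwise keep x
    return x

# Example on a simple quadratic; the iterates approach the minimizer at the origin
print(trust_region(lambda x: x[0]**2 + 3.0 * x[1]**2, [2.0, 1.0]))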

Findings.

  1. The methods converge and the level of accuracy is high.
  2. The methods are fast, but the quasi-Newton method is cheaper than the trust-region method.
  3. We wrote MATLAB code for the implementation of the derivative-free quasi-Newton and trust-region methods.
  4. The finite-difference approach works faster than the quadratic interpolation method.

Contribution to knowledge
In this study we were able to:

(a). implement the derivative-free Quasi-Newton method.
(b). implement the Trust-Region method for Derivative-Free Optimization.
(c). use the methods to minimize the Rosenbrock’s function and others.

Conclusion

Many algorithms exist in derivative-based and derivative-free optimization. It is known that these algorithms are suited to specific classes of problems. The purpose of this thesis is to determine the range of problems solvable by some of these algorithms.

In this work we’ve examine two popular methods, Quasi Newton and Trust Region.

We’ve considered problems in two variables. Problems in more variables would be considered in subsequent study.

Also, other approaches within the trust-region framework, such as interpolation and regression methods, will be considered. The problems we considered performed well with both derivative-based and derivative-free methods.

REFERENCES

  • A. R. Conn, K. Scheinberg and L. N. Vicente (2009); Introduction to Derivative-Free Optimization. SIAM, Philadelphia.
  • A. R. Conn, K. Scheinberg and Ph. L. Toint (1998); On the convergence of derivative-free methods for unconstrained optimization. In: A. Iserles and M. Buhmann (eds.), Approximation Theory and Optimization: Tributes to M. J. D. Powell, pp. 83–108. Cambridge University Press.
  • A. R. Conn, N. I. M. Gould and Ph. L. Toint (2000); Trust-Region Methods. MPS-SIAM Series on Optimization. SIAM, Philadelphia.
  • J. A. Nelder and R. Mead (1965); A simplex method for function minimization. Comput. J., vol. 7, pp. 308–313.
  • J. C. Lagarias, J. A. Reeds, M. H. Wright and P. E. Wright (1998); Convergence properties of the Nelder–Mead simplex method in low dimensions. SIAM J. Optim., vol. 9, pp. 112–147.
  • C. T. Kelley (1999); Iterative Methods for Optimization. SIAM, Philadelphia.