Outline of the Available Methods

Let y(x,a) be a function where $x = (x_1,\ldots,x_n) \in \mathbb{R}^n$ are the independent variables and $a \in A \subset \mathbb{R}^p$ are the p parameters lying in the domain A. If A is not the whole space $\mathbb{R}^p$, the problem is said to be constrained.

If a situation can be observed through a set of events $(y^{(i)},x^{(i)}),\ i=1,\ldots,m$, i.e. a set of couples representing the measured dependent and independent variables, it is possible to deduce the values of the parameters of the model y(x,a) corresponding to that situation. As the measurements are generally affected by errors, it is impossible to obtain the exact values of the parameters, only an estimate of them. Estimating means, in some sense, finding the most likely values of the parameters. In general, many more events than parameters are necessary.

In a linear problem, if the errors on the observations have a Gaussian distribution, the ``Maximum Likelihood Principle'' gives the ``best estimate'' of the parameters as the solution of the so-called ``Least Squares Minimization'' that follows:

\begin{displaymath}\min_{a\in A}\ \chi^2(a)\end{displaymath}

with

\begin{displaymath}\chi^2(a) = \sum_{i=1}^{m} w^{(i)}\,[\,y^{(i)} - y(x^{(i)},a)\,]^2\end{displaymath}
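As a purely illustrative sketch (not MIDAS code), the following Python fragment evaluates $\chi^2(a)$ for a hypothetical non-linear model $y(x,a) = a_1 e^{-a_2 x}$ on a toy set of weighted observations; the model, the data and all names used here are assumptions made only for this example.

import numpy as np

def model(x, a):
    # hypothetical non-linear model: y(x, a) = a1 * exp(-a2 * x)
    return a[0] * np.exp(-a[1] * x)

def chi2(a, x, y, w):
    # chi^2(a) = sum_i w_i * [y_i - y(x_i, a)]^2
    return np.sum(w * (y - model(x, a)) ** 2)

# toy weighted observations, assumed purely for illustration
rng = np.random.default_rng(0)
x = np.linspace(0.0, 5.0, 20)
y_obs = 2.0 * np.exp(-0.7 * x) + rng.normal(scale=0.05, size=x.size)
w = np.full(x.size, 1.0 / 0.05 ** 2)   # weight = 1 / variance of each observation

print(chi2(np.array([2.0, 0.7]), x, y_obs, w))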

The expected variance of the estimator computed in this way is the minimum among all estimation methods; such an estimator is therefore called, in statistics, an ``efficient estimator''.

The quantities

\begin{displaymath}r^{(i)}(a) = \sqrt{w^{(i)}}\,[\,y^{(i)} - y(x^{(i)},a)\,]\end{displaymath}

are called the residuals, and $w^{(i)}$ is the weight of the $i$-th observation, which can be taken, for instance, as the inverse of the estimated variance of that observation.
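Continuing the illustrative Python sketch above (and reusing its model, x, y_obs and w), the weighted residuals can be collected into a vector, and $\chi^2(a)$ is then simply the sum of their squares.

def residuals(a, x, y, w):
    # r_i(a) = sqrt(w_i) * [y_i - y(x_i, a)], so that chi^2(a) = sum_i r_i(a)^2
    return np.sqrt(w) * (y - model(x, a))

a = np.array([2.0, 0.7])
r = residuals(a, x, y_obs, w)
assert np.isclose(np.sum(r ** 2), chi2(a, x, y_obs, w))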

If y(x,a) depends linearly on each parameter $a_j$, the problem is also known as a Linear Regression and is solved in MIDAS by the command REGRESSION. This chapter deals with functions y(x,a) that have a non-linear dependence on a.
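The distinction can be illustrated in the same hypothetical Python setting (this is not the implementation behind REGRESSION): when the model is linear in the parameters, the weighted least-squares estimate follows from the normal equations in a single step, whereas a non-linear dependence requires an iterative minimizer.

# Linear case: y(x, a) = a1 + a2 * x depends linearly on the parameters,
# so the weighted least-squares estimate is obtained in one linear solve.
X = np.column_stack([np.ones_like(x), x])          # design matrix
W = np.diag(w)
a_lin = np.linalg.solve(X.T @ W @ X, X.T @ W @ y_obs)

# Non-linear case (e.g. the exponential model above): no closed form,
# an iterative minimizer is required.
from scipy.optimize import least_squares
fit = least_squares(residuals, x0=np.array([1.0, 1.0]), args=(x, y_obs, w))
print(a_lin, fit.x)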

Let us now introduce some mathematical notations. Let g(a) and H(a) be respectively the gradient and the Hessian matrix of the function $\chi^2(a)$. They can be expressed by

\begin{displaymath}g(a) = 2\,J(a)^T r(a)\end{displaymath}

and

\begin{displaymath}H(a) = 2\,(J(a)^T J(a) + B(a))\end{displaymath}

where r(a) is the residuals vector

\begin{displaymath}r(a)~=~(r^{(1)}(a),\ldots,r^{(m)}(a))~~,\end{displaymath}

J(a) is the Jacobian matrix of r(a), i.e.

\begin{displaymath}J(a)_{ij} = \frac{\partial r^{(i)}}{\partial a_j}\end{displaymath}

and B(a) is

\begin{displaymath}B(a)~=~\sum_i~r^{(i)}(a)~H_i(a)\end{displaymath}

where $H_i(a)$ is the Hessian matrix of $r^{(i)}(a)$.
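Still within the illustrative Python sketch, the gradient and the first part of the Hessian can be built from a Jacobian of the residual vector; here the Jacobian is approximated by forward differences, which is an assumption of the example and not part of the methods described in this chapter. Neglecting B(a), which contains the second derivatives of the residuals, gives the Gauss-Newton approximation of H(a).

def jacobian(a, x, y, w, eps=1.0e-6):
    # forward-difference approximation of J_ij = d r_i / d a_j
    r0 = residuals(a, x, y, w)
    J = np.empty((r0.size, a.size))
    for j in range(a.size):
        da = np.zeros_like(a)
        da[j] = eps
        J[:, j] = (residuals(a + da, x, y, w) - r0) / eps
    return J

a0 = np.array([2.0, 0.7])
r0 = residuals(a0, x, y_obs, w)
J0 = jacobian(a0, x, y_obs, w)
g = 2.0 * J0.T @ r0           # gradient of chi^2:  g(a) = 2 J(a)^T r(a)
H_gn = 2.0 * (J0.T @ J0)      # Hessian with B(a) neglected (Gauss-Newton approximation)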

In the rest of this chapter, all functions to which a differentiation operator is applied are assumed to be differentiable, even when this condition is not necessary for the convergence of the algorithm.

A number of numerical methods have been developed to solve the non-linear least-squares problem; four of them have so far been implemented in MIDAS. A complete description of these algorithms can be found in [1] and [3]; the present document gives only a basic introduction.



 