Research Article

Update Quasi-Newton Algorithm for Training ANN

[version 1; peer review: awaiting peer review]
PUBLISHED 16 Jan 2026

This article is included in the Fallujah Multidisciplinary Science and Innovation gateway.

Abstract

The neural network design proposed in this article is based on a new, more accurate training approach built on unconstrained optimization; among general-purpose algorithms for this task, quasi-Newton updates are perhaps the most popular. A limited-memory BFGS algorithm is presented for solving large-scale symmetric nonlinear equations, using a line-search technique that requires no derivative information. At each iteration, the updated Hessian approximation satisfies a quasi-Newton equation, which has traditionally served as the basis for quasi-Newton methods. On the basis of the quadratic model used in this article, we add a new quasi-Newton update. One innovative feature of this update is its ability to approximate the energy (performance) function with high-order precision, capturing second-order curvature while employing only the available function values and gradients. The global convergence of the proposed algorithm is established under suitable conditions. Numerical trials illustrate that the updated approach can be more efficient than comparable traditional methods and is competitive with the standard BFGS method. We show that solving a partial differential equation can be formulated as a multi-objective optimization problem, and we use this formulation to propose several modifications to existing methods. The proposed algorithm is also used to approximate the optimal scaling parameter, which eliminates the need to optimize this parameter separately. Our proposed update is tested on a variety of partial differential equations and compared with existing methods. These include a fourth-order three-dimensional nonlinear equation, which we solve in up to four dimensions, and the convection-diffusion equation; in all cases, the proposed update leads to enhanced accuracy.

Keywords

Robust quasi-Newton methods, convergence analysis, numerical experiments, ANNs, unconstrained optimization.

1. Introduction

In recent years, artificial neural networks (ANNs) have become an important technique for solving many real-world problems because of their useful properties. Several authors have used ANNs to solve different types of differential equations; the concept of solving differential equations with ANNs by formulating a trial solution was first proposed in Refs. 1 and 3. The authors tested the applicability and accuracy of their method not only for single differential equations but also for systems of coupled differential equations. Furthermore, they compared their results with those obtained using other numerical methods and reported that the developed ANN was superior in terms of memory requirements and accuracy.4-6 For this reason, researchers have aimed to develop this technique further to obtain the best results. One of these developments concerns the training rules, particularly the quasi-Newton method, because of its second-order convergence. Many authors7-12 have proposed modifications of the training algorithm, and others13-20 have suggested rules to improve the speed of convergence. Several attempts have been made to solve different types of differential equations using feed-forward neural networks. In Ref. 21, a hybrid method was reported that combines optimization techniques with neural networks to solve high-order differential equations.

The quasi-Newton method is one of the most effective methods for minimizing a smooth function of $n$ variables:

(1)
$\min f(x), \quad x \in \mathbb{R}^n$
where $f:\mathbb{R}^n \to \mathbb{R}$ is continuously differentiable.22 Instead of using the exact Hessian or its inverse, the proposed update employs a symmetric positive definite estimate $B$ of the Hessian or $H$ of its inverse. The iteration takes the following form:
(2)
$x_{k+1} = x_k + \alpha_k d_k, \qquad d_k = -H_k g_k = -B_k^{-1} g_k$

If the matrix is not invertible, its pseudoinverse is used instead.
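For illustration only (this is not the authors' code), the direction computation of Eq. (2) with a pseudoinverse fallback for a singular matrix might be sketched in Python as follows; the function name is hypothetical:

import numpy as np

def search_direction(B, g):
    """d_k = -B_k^{-1} g_k, falling back to the pseudoinverse if B_k is singular."""
    try:
        return -np.linalg.solve(B, g)
    except np.linalg.LinAlgError:
        return -np.linalg.pinv(B) @ g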

Wolfe conditions are used to determine the step length $\alpha_k$ along the search direction $d_k$, as follows:

(3)
$f(x_k + \alpha_k d_k) \le f(x_k) + \delta \alpha_k g_k^T d_k$
(4)
$d_k^T g(x_k + \alpha_k d_k) \ge \sigma d_k^T g_k$
where $0 < \delta < \sigma < 1$ is typically used. For more details, refer to Ref. 23. The parameter $\alpha_k$ can also be computed by an exact line search of the following form:
(5)
$\alpha_k = -\,g_k^T d_k / \left(d_k^T Q d_k\right)$
For more details, please refer to Ref. 24. The search direction is computed by solving:
(6)
$B_k d_k + g_k = 0$
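A minimal Python sketch of the two Wolfe tests in Eqs. (3)-(4) and the exact step of Eq. (5) is given below. This is not the authors' implementation; the callables f and g, and the default values of delta and sigma, are assumptions.

import numpy as np

def satisfies_wolfe(f, g, x_k, d_k, alpha, delta=1e-4, sigma=0.9):
    """Return True if alpha satisfies both Wolfe conditions (3) and (4)."""
    g_k = g(x_k)
    armijo = f(x_k + alpha * d_k) <= f(x_k) + delta * alpha * (g_k @ d_k)   # Eq. (3)
    curvature = d_k @ g(x_k + alpha * d_k) >= sigma * (d_k @ g_k)           # Eq. (4)
    return armijo and curvature

def exact_quadratic_step(g_k, d_k, Q):
    """Step length of Eq. (5) for a quadratic model with matrix Q."""
    return -(g_k @ d_k) / (d_k @ Q @ d_k)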

For each iteration, $B_k$ is the updated Hessian estimate. The Broyden-Fletcher-Goldfarb-Shanno (BFGS) approach is now one of the most effective training methods. In the BFGS technique, the matrix $B_{k+1}$ is updated using the following formula:

(7)
$B_{k+1}^{BFGS} = B_k - \dfrac{B_k s_k s_k^T B_k}{s_k^T B_k s_k} + \dfrac{y_k y_k^T}{s_k^T y_k}$

Let $H_k$ be the inverse of $B_k$. The corresponding update, given in Eq. (8), is the well-known inverse BFGS formula:

(8)
$H_{k+1}^{BFGS} = H_k - \dfrac{H_k y_k s_k^T + s_k y_k^T H_k}{s_k^T y_k} + \left[1 + \dfrac{y_k^T H_k y_k}{s_k^T y_k}\right]\dfrac{s_k s_k^T}{s_k^T y_k}$

See Refs. 25 and 26 for further details. For the update process, we require:

(9)
$B_{k+1} s_k = y_k$
where $s_k = x_{k+1} - x_k = \alpha_k d_k$ and $y_k = g_{k+1} - g_k$ (see Ref. 27). Numerical experiments have shown that the BFGS technique outperforms the other training approaches. Convex minimization using this update has been investigated extensively; see, for example, Refs. 1, 2, and 28. To demonstrate that the update approach with a Wolfe line search may fail for non-convex functions, Dai constructed an example with six cycling points.29 Many improvements have been suggested, including changes to the standard BFGS technique, and a modified BFGS algorithm (MBFGS) has been devised to improve and accelerate the global convergence of the BFGS method.30,31 The authors demonstrated that this approach converges globally for nonconvex optimization problems. A novel quasi-Newton methodology with global convergence that outperforms the BFGS method computationally is described in Refs. 32 and 33. In practice, the modified BFGS technique is typically preferred because it efficiently maintains a symmetric positive definite approximation of the Hessian (or its inverse). Whereas the standard method employs only gradient values, the modified approach uses both function and gradient values. Global convergence was demonstrated without any convexity assumption on the objective function.34
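As a further illustration (again, not the authors' code), the standard inverse BFGS update of Eq. (8) can be written compactly as:

import numpy as np

def bfgs_inverse_update(H, s, y):
    """Return H_{k+1} from Eq. (8), assuming the curvature condition s^T y > 0."""
    sy = s @ y
    Hy = H @ y
    return (H
            - (np.outer(Hy, s) + np.outer(s, Hy)) / sy
            + (1.0 + (y @ Hy) / sy) * np.outer(s, s) / sy)

The same formula is reused later with the modified vector in place of y_k.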

2. Derivation of the suggested update

A new update is derived from a quadratic model of the objective function. The quadratic model of the objective function is given as

(10)
$f_{k+1} = f_k + s_k^T g_k + \frac{1}{2}s_k^T Q(x_k)s_k$
where $Q(x_k)$ is the Hessian matrix. Differentiating the above model gives:
(11)
$g_{k+1} = g_k + Q(x_k)s_k$

Thus, the curvature information in Eq. (10) can be approximated by

(12)
$s_k^T Q(x_k)s_k = \frac{2}{3}(f_k - f_{k+1}) + \frac{2}{3}s_k^T Q(x_k)s_k$

Because the update $B_{k+1}$ is intended to approximate $Q(x_k)$, it is reasonable to require

(13)
$s_k^T B_{k+1}s_k = \frac{2}{3}(f_k - f_{k+1}) + \frac{2}{3}s_k^T Q(x_k)s_k$

Using (11) in (13), we obtain:

(14)
$s_k^T B_{k+1}s_k = \frac{2}{3}s_k^T y_k + \frac{2}{3}(f_k - f_{k+1})$

The new quasi-Newton (QN) equation is therefore given by:

(15)
$s_k^T \tilde{y}_k = \frac{2}{3}s_k^T y_k + \frac{2}{3}(f_k - f_{k+1})$

From the above equation, the modified gradient difference can be written as

(16)
$B_{k+1}s_k = \tilde{y}_k, \qquad \tilde{y}_k = \frac{2}{3}y_k + \frac{\frac{2}{3}(f_k - f_{k+1})}{s_k^T u_k}\,u_k$
where $u_k$ is a vector such that $s_k^T u_k \ne 0$. The BFGS update is modified based on this revised quasi-Newton equation. The possible choices of the vector $u_k$ in Equation (16) give, for example (see the sketch below):

$\tilde{y}_k = \frac{2}{3}y_k + \frac{\frac{2}{3}(f_k - f_{k+1})}{s_k^T y_k}\,y_k$,

$\tilde{y}_k = \frac{2}{3}y_k + \frac{\frac{2}{3}(f_k - f_{k+1})}{s_k^T g_k}\,g_k$,

$\tilde{y}_k = \frac{2}{3}y_k + \frac{\frac{2}{3}(f_k - f_{k+1})}{s_k^T g_{k+1}}\,g_{k+1}$.
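The modified difference vector of Eq. (16) and its three choices of u_k can be sketched as follows (illustrative only; names are not taken from the paper):

import numpy as np

def modified_y(s, y, f_k, f_k1, u):
    """Eq. (16): y_tilde = (2/3) y + (2/3)(f_k - f_{k+1}) / (s^T u) * u, assuming s^T u != 0."""
    return (2.0 / 3.0) * y + (2.0 / 3.0) * (f_k - f_k1) / (s @ u) * u

# the three choices discussed above:
#   u = y_k, u = g_k, or u = g_{k+1}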

Based on the above, the new algorithm can be written as follows:

Stage 1: Let $x_0 \in \mathbb{R}^n$, $k = 0$, and $H_0 = I$.

Stage 2: If $g_k = 0$, stop.

Stage 3: Evaluate $d_k = -H_k g_k$.

Stage 4: Determine the learning rate (step size) $\alpha_k$ using Eqs. (4) and (5).

Stage 5: Let $x_{k+1} = x_k + \alpha_k d_k$. Update $H_{k+1}$ using Equations (9) and (16) if $s_k^T \tilde{y}_k > 0$; otherwise, set $H_{k+1} = H_k$.

Stage 6: Set $k = k+1$ and go to Stage 2.
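A compact Python sketch of Stages 1-6 is given below. It is not the authors' code: it combines the inverse BFGS formula of Eq. (8) with the modified vector of Eq. (16) (choice u_k = g_{k+1}) and, for simplicity, replaces the line search of Stage 4 with a basic backtracking search; f and grad are user-supplied callables and all other names are illustrative.

import numpy as np

def train(f, grad, x0, eps=1e-6, max_iter=500):
    x = np.asarray(x0, dtype=float)
    H = np.eye(x.size)                                   # Stage 1
    g = grad(x)
    for _ in range(max_iter):
        if np.linalg.norm(g) <= eps:                     # Stage 2
            break
        d = -H @ g                                       # Stage 3
        alpha, delta = 1.0, 1e-4                         # Stage 4 (backtracking in place of Eqs. (4)-(5))
        while f(x + alpha * d) > f(x) + delta * alpha * (g @ d):
            alpha *= 0.5
        x_new = x + alpha * d                            # Stage 5
        g_new = grad(x_new)
        s, y = x_new - x, g_new - g
        u = g_new                                        # choice u_k = g_{k+1}
        y_t = (2.0 / 3.0) * y + (2.0 / 3.0) * (f(x) - f(x_new)) / (s @ u) * u   # Eq. (16)
        if s @ y_t > 0:                                  # keep H positive definite
            sy, Hy = s @ y_t, H @ y_t
            H = (H - (np.outer(Hy, s) + np.outer(s, Hy)) / sy
                   + (1.0 + (y_t @ Hy) / sy) * np.outer(s, s) / sy)
        x, g = x_new, g_new                              # Stage 6
    return x

The guard s @ y_t > 0 mirrors Stage 5: the inverse approximation is updated only when the modified curvature condition holds, otherwise the previous matrix is kept.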

The following theorem illustrates the theoretical benefits of the new quasi-Newton Equation (16). To ensure that the matrix $B_{k+1}$ is positive definite, it suffices to prove that $s_k^T \tilde{y}_k > 0$ holds.

Theorem 1.

Let the matrix sequence $B_{k+1}$ be generated using Equation (16). Then the sequence $B_{k+1}$ is positive definite.

Proof.

From the definition of the modified gradient difference, we have:

(17)
$s_k^T \tilde{y}_k = \frac{2}{3}s_k^T y_k + \frac{2}{3}(f_k - f_{k+1})$

Applying the Wolfe condition (3) to the previous equation, we obtain:

(18)
$s_k^T \tilde{y}_k \ge \frac{2}{3}\left(s_k^T y_k - \delta\, g_k^T s_k\right)$

Because $s_k^T y_k > 0$ and $-\delta\, g_k^T s_k > 0$ (since $d_k$ is a descent direction), Eq. (18) yields

(19)
$s_k^T \tilde{y}_k > 0$

Therefore, $B_{k+1}$ is positive definite.

3. Convergence analysis

We establish the global convergence of the proposed approach under relatively mild assumptions.

  • 1. The level set $L_0 = \{x \in \mathbb{R}^n : f(x) \le f(x_0)\}$ is convex.

  • 2. The gradient is Lipschitz continuous; that is, there exists a positive constant $L > 0$ such that:

(20)
$\|\nabla f(\bar{x}) - \nabla f(\hat{x})\| \le L \|\bar{x} - \hat{x}\|, \quad \forall\, \bar{x}, \hat{x} \in L_0.$

The sequence $\{x_k\}$ generated by the new algorithm remains in $L_0$ because $\{f_k\}$ is a decreasing sequence, so there exists a constant $f^*$ such that

(21)
$\lim_{k \to \infty} f_k = f^*$
  • 3. Let $Q$ be the matrix of second derivatives (the Hessian) of $f$. Then there exist constants $R \ge r > 0$ such that:

(22)
$r\|z\|^2 \le z^T Q z \le R\|z\|^2$

for all $z \in \mathbb{R}^n$; for more details, see Refs. 12-14.

Theorem 2.

Let $\{x_k\}$ be generated using the proposed algorithm. Then we have:

(23)
$r\|s_k\|^2 \le s_k^T \tilde{y}_k \le R\|s_k\|^2,$
and
(24)
$\|\tilde{y}_k\| \le \left(\tfrac{4}{3}L + R\right)\|s_k\|.$

Proof:

From the definition of the modified gradient difference $\tilde{y}_k$ and combining Eq. (10) with Eq. (16), we obtain:

(25)
$s_k^T \tilde{y}_k = s_k^T Q(x_k) s_k = \frac{2}{3}s_k^T y_k + \frac{2}{3}(f_k - f_{k+1}) = 2(f_{k+1} - f_k) - 2 s_k^T g_k.$

Using the mean value theorem and a Taylor expansion, we obtain:

(26)
$f_{k+1} = f_k + s_k^T g_k + \frac{1}{2}s_k^T Q(\eta_k)s_k$
where $\xi \in (0,1)$ and $\eta_k = x_k + \xi(x_{k+1} - x_k)$. Combining Eqs. (25) and (26) gives:
(27)
$s_k^T \tilde{y}_k = 2\left(s_k^T g_k + \tfrac{1}{2}s_k^T Q(\eta_k)s_k\right) - 2 s_k^T g_k = s_k^T Q(\eta_k)s_k$

Under Assumption 3, it follows immediately that:

(28)
$r\|s_k\|^2 \le s_k^T \tilde{y}_k \le R\|s_k\|^2$

Next, a direct calculation using the definition of $\tilde{y}_k$ gives:

(29)
$\|\tilde{y}_k\| = \left\| \frac{2}{3}y_k + \frac{\frac{2}{3}(f_k - f_{k+1})}{s_k^T u_k}\,u_k \right\| \le \frac{2}{3}\|y_k\| + \frac{\left| s_k^T Q(\eta_k)s_k - \frac{2}{3}s_k^T y_k \right|}{\|s_k\|} \le \frac{4}{3}\|y_k\| + \frac{\left| s_k^T Q(\eta_k)s_k \right|}{\|s_k\|} \le \frac{4}{3}L\|s_k\| + R\|s_k\| \le \left(\frac{4}{3}L + R\right)\|s_k\|$

The proof is finished.

Theorem 3.

Suppose that there exist constants $a_1 > 0$ and $a_2 > 0$ such that the following inequalities hold:

(30)
$s_k^T B_k s_k \ge a_2 \|s_k\|^2, \quad \text{and} \quad \|B_k s_k\| \le a_1 \|s_k\|$
for any $k$. Then, for the sequence $\{x_k\}$ obtained using the new algorithm, we have:
(31)
$\liminf_{k \to \infty} \|g_k\| = 0.$

Proof:

The proof is straightforward and similar to the proof of Theorem 3 in Ref. 6.

In this study, we prove a global convergence theorem for non-convex problems and suggest a cautious updating strategy comparable to that mentioned previously. For motivation, we state a lemma related to Powell's result.15

Lemma 1.

Suppose the BFGS technique is applied to a smooth function $f$ that is bounded below. If there exists a constant $M > 0$ such that the inequality

(32)
$\|\tilde{y}_k\|^2 / \left(s_k^T \tilde{y}_k\right) \le M$
holds, then:
(33)
$\liminf_{k \to \infty} \|g_k\| = 0.$

Theorem 4.

Suppose the above Assumptions hold and $\{x_k\}$ is generated by the new algorithm. Then Eq. (32) holds.

Proof:

If Eq. (33) fails to hold, then there exists a constant $\varepsilon > 0$ such that:

(34)
$\|g_k\| \ge \varepsilon.$

Therefore, a constant $r > 0$ exists such that:

(35)
$r\|s_k\|^2 \le s_k^T \tilde{y}_k.$

Combining Eqs. (29) and (35) implies that:

(36)
$\|\tilde{y}_k\|^2 / \left(s_k^T \tilde{y}_k\right) \le M.$

The proof is finished.

4. Numerical experiments

In this section, we present a numerical comparison of quasi-Newton techniques and the suggested modifications for solving fourth-order nonlinear partial differential equations.

Example 1:

Consider the nonlinear fourth-order equation of the form:

$u_{xt} + u_{xxxy} + 2u_{xx}u_y + 4u_x u_{xy} = 0; \quad u(x,y,0) = \frac{1}{2}\operatorname{sech}^2\!\left(\frac{1}{2}(x+y)\right), \quad \text{exact solution: } u(x,y,t) = \tanh\!\left(\frac{1}{2}(x+y-t)\right)$

The results of solving the above equation at different times t are presented in Table 1. The neural solution for this equation is shown in Figure 1.

We stopped the algorithms using Himmelblau's test18:

If $|f(x_k)| > 10^{-5}$, the iteration is terminated when $\dfrac{|f(x_k) - f(x_{k+1})|}{|f(x_k)|} < 10^{-5}$; otherwise, it is terminated when $|f(x_k) - f(x_{k+1})| < 10^{-5}$. For every problem, the program is also terminated if $\|g_k\| < \varepsilon$.
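As a small illustration (assuming the 10^-5 tolerance above), the stopping test can be written as:

def himmelblau_stop(f_k, f_k1, tol=1e-5):
    """Himmelblau-style stopping test on successive objective values."""
    if abs(f_k) > tol:
        return abs(f_k - f_k1) / abs(f_k) < tol
    return abs(f_k - f_k1) < tol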

Quasi-Newton approaches perform better when an appropriate quasi-Newton equation is employed. The new update with $u_k = g_{k+1}$ performed best among the three choices, whereas the new update with $u_k = y_k$ and with $u_k = g_k$ performed somewhat better than the standard BFGS technique. As a result, among the quasi-Newton procedures for unconstrained problems, the new update with $u_k = g_{k+1}$ is the most efficient.

Example 2:

Consider the nonlinear fourth-order equation of the form:

$u_{tt} - u_{xx} - u_{xxxx} - u_{yy} - u_{zz} - 3(u^2)_{xx} = 0, \quad u(x,y,z,0) = \frac{1}{2}\operatorname{sech}^2\!\left(\frac{1}{2}(x+y+z)\right), \quad u_t(x,y,z,0) = \tanh\!\left(\frac{1}{2}(x+y+z)\right)\operatorname{sech}^2\!\left(\frac{1}{2}(x+y+z)\right)$

Exact solution:

$u(x,y,z,t) = \frac{1}{2}\operatorname{sech}^2\!\left(\frac{1}{2}(x+y+z-2t)\right)$
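As a quick consistency check of the equation and exact solution as reconstructed above (not part of the paper), one can verify with SymPy that the residual of the stated solution vanishes:

import sympy as sp

x, y, z, t = sp.symbols('x y z t')
u = sp.Rational(1, 2) * sp.sech(sp.Rational(1, 2) * (x + y + z - 2 * t))**2   # stated exact solution

# residual of u_tt - u_xx - u_xxxx - u_yy - u_zz - 3*(u^2)_xx
residual = (sp.diff(u, t, 2) - sp.diff(u, x, 2) - sp.diff(u, x, 4)
            - sp.diff(u, y, 2) - sp.diff(u, z, 2) - 3 * sp.diff(u**2, x, 2))

# evaluate at a few arbitrary points; each value should be ~0
for pt in [(0.3, -0.1, 0.7, 0.2), (1.1, 0.4, -0.5, 0.9)]:
    print(float(residual.subs(dict(zip((x, y, z, t), pt)))))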

The neural solution of this equation for z = -0.5 is shown in Figure 2. The accuracy in terms of epochs and training time is presented in Table 2, and Table 3 illustrates the results of the neural solution of the equation.

Table 1. Results of the suggested algorithm for different values of time t.

x = y | Exact | Suggested update: t = 0.001 | t = 0.01 | t = 0.05 | t = 0.25 | t = 0.5
0   | -0.000499999958333338 | -0.000048659724380 | -0.00499995832713615 | -0.025004418506876 | -0.124353001771672 | -0.244918662401479
0.1 | 0.0991729368500791 | 0.099174522493650 | 0.0947152247011525 | 0.074859690643595 | -0.0249947929685649 | -0.148885033624227
0.2 | 0.196894751347250 | 0.196894751347288 | 0.192565398608004 | 0.173235732159165 | 0.0748596906873580 | -0.0499583749589804
0.3 | 0.290854977351376 | 0.290854977351250 | 0.286730291373398 | 0.268271182008229 | 0.173235157834554 | 0.0499583749579298
0.4 | 0.379521061607639 | 0.379521061607816 | 0.375662661174346 | 0.358357398344881 | 0.268271160988048 | 0.148885033623492
0.5 | 0.461723842547565 | 0.461723842547454 | 0.458175852175461 | 0.442230453940485 | 0.358357335349861 | 0.244918662402002
0.6 | 0.536693682582613 | 0.536693686709420 | 0.533482128457157 | 0.519021833904887 | 0.442230290513323 | 0.336375352939167
0.7 | 0.604050311415608 | 0.604050311415511 | 0.601184473121516 | 0.588259256403465 | 0.519021833898177 | 0.421898609908564
0.8 | 0.663757149868171 | 0.663757149868364 | 0.661232203097477 | 0.649827607630977 | 0.588259256398005 | 0.500520211189160
0.9 | 0.716054324313046 | 0.716054380560282 | 0.713854553039899 | 0.703905603862037 | 0.649827419353020 | 0.571668985813867
1   | 0.761384088809508 | 0.761384088809337 | 0.759486275064505 | 0.750893283626045 | 0.703905603936521 | 0.635140845030389

Figure 1. Illustration of the results obtained with the new algorithm for different times t.


Figure 2. Solution for z = -1/2.

Table 2. Properties of the proposed algorithm for solving Example 1.

Train function "trainbfg" | Training performance | Epochs | Time | Msereg
t = 0.001 | 4.72e-27 | 818 | 0:00:02 | 1.4903e-11
t = 0.01  | 7.27e-23 | 404 | 0:00:00 | 3.0524e-17
t = 0.05  | 9.34e-24 | 33  | 0:00:00 | 7.6100e-12
t = 0.25  | 2.64e-27 | 909 | 0:00:01 | 8.7302e-16
t = 0.5   | 1.59e-24 | 593 | 0:00:01 | 5.4723e-12

Table 3. MSE and performance for training, validation, and testing for the solution of Example 2.

 | LM | Suggested update (BFGS) | CG | RP
Time | 00:00:39 | 00:00:08 | 00:00:44 | 00:00:12
Best epoch | 1000 | 810 | 1000 | 1000
MSE | 2.61912e-12 | 6.9328543e-17 | 2.9424106e-07 | 5.9553091e-06
Best training perf. | 2.694601e-12 | 6.694813e-14 | 2.21545518e-07 | 5.9894044e-06
Best validation perf. | 2.334575e-12 | 7.2694735e-16 | 1.996644e-07 | 5.7156087e-06
Best test perf. | 2.5514463e-12 | 7.7070942e-15 | 2.254638e-07 | 6.0358983e-06

5. Conclusions

In this study, we constructed improved BFGS quasi-Newton updating formulae by using the proposed robust quasi-Newton equation. Second-order curvature information from the Hessian of the objective function is used to develop the novel quasi-Newton equation. Two nonlinear fourth-order examples are provided to illustrate the accuracy of the suggested update. The numerical results are consistent with the theory and confirm that the suggested update is globally convergent.
