Talk:Rate of convergence
This article is rated C-class on Wikipedia's content assessment scale. It is of interest to the following WikiProjects: | |||||||||||
|
old and untitled
[edit]I think that for linear convergence we need to be strictly smaller than one. Indeed, the sublinearly convergent seguence has . -- Jitse Niesen 18:00, 27 Dec 2004 (UTC)
--
- In the book
- Richard L. Burden, J. Douglas Faires (2000), "Numerical Analysis, (7th Ed)", Brooks/Cole. ISBN 0534382169
- they just say for linear convergence.
- I agree with you, intuitively converges sublinearly. On the other hand, the sequence converges linearly, but with Same for for . There are other strange sequences. For example, does not converge with rate faster than linear (that is with ), however if we say it converges linearly, one gets
- So, the rate of convergence is not a perfect indicator of how fast a sequence converges. Probably ultimately it is a matter of convention. Do you have any references where for linear convergence? --Oleg Alexandrov 19:12, 27 Dec 2004 (UTC).
I am abroad, so I cannot check any references at the moment (I'll probably be back in the second week of January), but I probably got it either from the Gautschi book mentioned in the article, or from Süli and Mayers' An Introduction to Numerical Analysis.
According to the definition currently in the article, the sequence eps_k = 1/k does NOT converge linearly and eps_k = 1/2^(k^2) converges sublinearly, but not with order > 1. This is the correct definition in the framework of root-finding algorithms (unless I'm wildly wrong): the error of the bisection method is 1/2^k, which converges linearly, and the error of Newton's method is something like 1/2^(2^k), which converges quadratically.
- I think you meant eps_k = 1/2^(k^2) converges superlinearly. You are right about the errors of the bisection method and Newton's method. We need to get the defintion of rate of convergence right, but ultimately, we discuss technicalities, the point of all numerical textbooks is that quadratic convergence is what we want, everything else is not important. Oleg Alexandrov 20:37, 27 Dec 2004 (UTC)
I now realize that there is another definition in numerical ordinary differential equations: a method has order q if the error is like 1/N^q where N = number of variables. A method with error 1/2^N (like the spectral method) would be called exponentially convergent in this context. This can be confusing, and so it should be explained better in the article.
- I hope introducing an alternate defintion will not make things more confusing. Oleg Alexandrov 20:37, 27 Dec 2004 (UTC)
By the way, how do you like the book by Burden and Faires? I have never read it, but I saw it references in several Wikipedia articles, so perhaps I should give it a try. However, I still think we need μ < 1 because every convergent sequence satisfies the condition with μ = 1. -- Jitse Niesen 20:05, 27 Dec 2004 (UTC)
- This is my wife's favorite numerical analysis book (that should matter a lot! :), it is also the standard textbook at UCLA, where I teach. Recently I was contacted to review a textbook in numerical analysis, and the editors wanted to know how their book compares to Burden and Faires. I would say it is a very nice book, with lots of problems, and good explanations. I think I put that book as references in the several places you saw it. Oleg Alexandrov 20:37, 27 Dec 2004 (UTC)
- PS Burden and Faires is also very expensive ($140.95 list price in the US), it is also 7th edition. This probably says that the book is in demand.
Burden and Faires definition again
[edit]That book, unfortunately not only forgets that one must have μ<1, they actually have exercises where they ask to prove x_n=1/n converges linearly . So I think that book is overall confused on that topic. Oleg Alexandrov 20:36, 16 Jan 2005 (UTC)
- Cheers, I didn't notice that, so it is a deliberate choice of Burden and Faires. All other books require μ < 1 though, so I'm keeping that definition. It's a pity, since the Burden and Faires book seems pretty nice overall, and apparently also used as a textbook at my university. -- Jitse Niesen 11:55, 17 Jan 2005 (UTC)
- I think the point of Burden and Faires is that they are really after quadratic convergence. Therefore, anything slower than that they think is all and the same damn slow linear convergence. :) Oleg Alexandrov 16:11, 17 Jan 2005 (UTC)
This book provides a formal definition on Rate of Convergence of functions that I think gives us a better understanding on this topic. Also, from this definition I find that using Taylor's theorem is the "safest" way to find the rate of convergence of a function. The definition is stated as follow:
Suppose that and . If a positive constant K exists with , for sufficiently small , then we write .
The functions we use for comparison generally have the form , where . (Definition 1.19, P37, Burden and Faires, 7th Edition)
By using Taylor series, we will want to be the largest value so that .Usually we can stop at the first non-zero term after the limit in the Taylor series. -- Hoai Tran 4:12, 28 Sep 2008 (OU)
Further tweaks to extended definition
[edit]In reference to my simultaneous edit to the article: The new formulation with "converges with at least order q" was taken from the Suli and Mayers book. The previous formulation ("the order of convergence is the largest q") has the disadvantage that it is not obvious that such a q exists. Feel free to revert as it is a bit nit-picking. (PS: Removing Burden and Faires from the references was the right solution) -- Jitse Niesen 12:21, 20 Jan 2005 (UTC)
- I like your change. Cheers, Oleg Alexandrov | talk 05:41, 21 Jan 2005 (UTC)
Problem with the convergence examples
[edit]I'm fairly sure that sequence d doesn't converge at all... so can we use it as an example of sublinear convergence? -- EPAstor, 22:39 April 10, 2005 (UTC)
- I don't understand. Are you refering to the sequence
- This sequence certainly converges to zero (for every positive ε, one can find an N such that |dn| < ε for n ≥ N). Or perhaps you mean that the series
- does not converge? That is true, but it is not what the article says. -- Jitse Niesen 12:25, 11 Apr 2005 (UTC)
I thought the examples given here were a little sketchy and didn't include examples of finding the rate of convergence under the extended definition. I'm currently taking a Numerical Analysis class and using the Burden and Faires text. The examples given in that book for the extended definition involve using the Taylor series and reduce the problem to one of finding the first non-zero term in the Taylor series after the limit. For example, one exercise from Chapter 1 of the Burden and Faires book asked us to find the rate of convergence for
Since we know that the Talor series for sin(h) about zero is:
dividing by h gives us
Since 1 is the limit and the next non-zero term is a constant times h2, our rate of convergence is on the order of h2. --Salley Hyatt 15:23 28 Sept 2008 (OU)
Question regarding extended definition
[edit]In the basic definition, it seems that what one is really after is not the lim, but the lim sup. I think that this would actually be equivalent to the extended definition. Does that make any sense? --128.100.151.121 22:58, 11 February 2006 (UTC)
- Yes, using "lim sup" instead of "lim" would make the basic definition equivalent to the extended definition. But I find it simpler that way, without lim sum. No? Oleg Alexandrov (talk) 00:25, 12 February 2006 (UTC)
Swapping Pages
[edit]Hello everyone, first of all, I don't understand why the order of converging page was directing everyone to the rate of convergence page. They are clearly two different things. They both measure how fast a sequence converges but they are two different way of measuring convergence. Order of convergence is how fast the terms of a sequence get smaller compared to its previous terms. Rate of convergence measures how fast a sequence/function converges compared to a known sequence/function. In addition, this page previously held information about order of convergence as opposed to rate of convergence. So I created the Order of convergence page and swapped both pages with each other because the info on the page titled rate of convergence was actually about the order of convergence. It took me a while to figure out the confusion as well. It is not easy. I have also added a few more examples. In case you are wondering, I used the Burden & Faires 8th edition. I understand from previous discussions on the rate of convergence page that some of you are already familiar with this book. The two definitions are entirely different with the rate of convergence being defined on page 35 and order of convergence being defined on page 75.
A Real Kaiser 06:38, 28 October 2007 (UTC)
- I am familiar with the Burden & Faires book. I think it is rather non-standard in how it defines things. I undid your edits and I made order of convergence redirect here. Oleg Alexandrov (talk) 15:29, 4 April 2008 (UTC)
Rate of Convergence "Big Oh" notation.
[edit]There is a page on the "Big Oh" notation in Wikipedia but the page on rates of convergence does not have the definition using the "Big Oh" notation. I propose we add a section to the rates of convergence page that gives a definition using "Big Oh" notation and then link to the "Big Oh" notation page for more information. The definition would be as follows:
Suppose {Bn}n=1infinity is a sequence known to converge to zero, and {An}n=1infinity converges to a number A. If a positive constant k exists with
abs(An-A) <= k*abs(Bn), for large n
then we say that {An}n=1infinity converges to A with rate of convergence O(Bn). It is indicated by An = A + O(Bn).
Reference: Numerical Analysis by Richard L. Burden and J. Douglas Faires. —Preceding unsigned comment added by 132.235.37.25 (talk) 18:35, 26 September 2008 (UTC)
Emphasize O( ) method
[edit]On top of talking about Big O notation, I would also like to propose a section that emphasizes O( ) convergence, which is concerned in finding largest number p such that for some sequence {an} | ( 1 <= n <= infinity ) which converges to a, an=a + O( ).
This is similar to above definition, but is more useful in some cases.
For example if an = , bn = it can be demonstrated that although both {an} and {bn} converge to zero as n -> infinity, {bn} converges with rate bn = 0 + O( ), while an = 0 + O( ). This means bn goes to zero much faster than an.
Reference: Numerical Analysis by Richard L. Burden and J. Douglas Faires.
- There are two different contexts in which you can talk about the speed of convergence. The first one is iterative methods where n denotes the iteration number, the second one is numerical integration where n denotes the number of grid points. In the first context, "linear convergence" means roughly O(C−n), and in the second context, "linear convergence" means O(1/n). The first context is what's treated in the article now, the second context is what you're talking about. If you can address all this without confusing the reader, please go ahead. -- Jitse Niesen (talk) 21:38, 29 September 2008 (UTC)
- I wasn't very happy in how the two contexts were separated, so I rewrote bits with the hope that it's clearer now. -- Jitse Niesen (talk) 00:58, 30 November 2008 (UTC)
Discretization
[edit]I think the information in this article concerning convergence speed for discretization methods is incomplete and a little misleading. The article states (in the second paragraph) that the solution of the discretized problem converges to the solution of the continuous problem as the grid size goes to zero, which is true in a theoretical sense, but practically speaking, the rate of convergence is increased by improving the efficiency and accuracy of the algorithm. Decreasing grid size increases accuracy, but also the number of calculations required, which can very quickly become prohibitive. The section entitled, "Convergence Speed For Discretization Methods" is also somewhat misleading. It is not completely clear whether "n" refers to the number of grid points or the number of grid spaces, although I think the author is referring to the latter. There is no mention of LTE (local truncation error) and GTE (global truncation error), and its effect on convergence. I propose to add the following information: When discretizing a continuous problem, such as a differential equation with an initial value, it is often impractical to increase the rate of convergence by decreasing the step size; a better solution is to improve the efficiency of the algorithm. A better algorithm can give the same accuracy with fewer steps, or better accuracy with the same number of steps. Associated with any algorithm is a local truncation error (LTE), which is the error inherent in the method at each step. The global truncation error (GTE) is the total error, n×LTE, where n is the number of steps (time steps, in an initial value problem). The order of a method is based on its GTE. For instance, the Euler Method of solving IVPs (initial value problems) is of order O(h), meaning the total error is of the same magnitude as "h" (one time step); i.e., the GTE = C×h where C is some constant. The Euler Method uses the Forward Difference approximation (not the most accurate approximation of a derivative) and uses one value of the function at each step. In contrast, The Runge-Kutta 4 Method uses a weighted average of four values of the function at each step and achieves a GTE of order O(h^4). Since h is a fraction, h^4 is a much smaller number than h (h = (b-a)/n, where (b-a) is the total interval and n is the number of subintervals), meaning the total error for the same number of steps is much smaller. Alternatively, the Runge-Kutta Method achieves the same level of accuracy as the Euler Method with much fewer steps, which is what is normally desired in a computer algorithm. Gc953496 (talk) 00:04, 10 September 2012 (UTC)
- Yes and no. Yes, that section is somewhat underformulated. It is only consistent for one-dimensional discretizations. And no, this article is not about comparing discretization or integration methods but strictly on the narrow topic of the definition of "rate of convergence" and probably also "order of convergence". And even in the extended sense your approach is wrong, since one would like to, as you do too in the end, speak about the rate or order of convergence of the same method under refinement of the discretization. Also your error globalization formula is not entirely correct.--LutzL (talk) 09:53, 10 September 2012 (UTC)
- LTE and GTE are discussed in Truncation error (numerical integration) and belong there better than here. A sentence with that link might be good.
- Mjmohio (talk) 00:55, 19 September 2012 (UTC)
Correction of Citation for Convergence of Secant Method;
[edit]The citation which references the convergence order of the secant method is incorrect. In particular, instead of providing a 1.681... decimal value (should be 1.618... at the least), it should note that the order is equal to the golden ratio. Thus, the last part of the citation would be more specific and correct if it were written as:
- Corrected the numerical value. Please note that the letter phi is already a link to the golden ratio.--LutzL (talk) 15:14, 13 September 2012 (UTC)
regular root
[edit]“ | q may be non-integer. For example, the secant method has, in the case of convergence to a regular root, convergence order φ=1.618.... | ” |
What's a regular root? —Tamfang (talk) 03:58, 19 April 2014 (UTC)
- A root where the function is regular. Since the function has to be at least 1,1-smooth to get this result, it means that the function value is zero and the derivative value different from zero.--LutzL (talk) 06:30, 19 April 2014 (UTC)
"Finally, the sequence {dk} converges sublinearly."
[edit]This sentence should be strengthened to "converges logarithmically" because (1/n -1/[n+1])/(1/[n+1]-1/[n+2])=(n+2)/n, which converges to 1. I will make the change now.144.35.45.98 (talk) 16:43, 6 November 2015 (UTC)
Definition of convergence with order .
[edit]In the definition the meaning of the constant is not explained. Joel Brennan (talk) 22:20, 23 April 2018 (UTC)
-- I'd like to add that Nocedal and Wright do not restrict M to be in (0,1). They explicitly say that M is "not necessarily less than 1". Check out page 619 of Nocedal and Wright, 2nd Edition. I'll make this change on the page. Jwd0023 (talk) 16:37, 26 February 2019 (UTC)
yes — Preceding unsigned comment added by 194.250.110.49 (talk) 09:04, 19 November 2023 (UTC)
Linear convergence is wrong
[edit]The definition for linear convergence should have a "limsup" not a "lim", because the lim does not necessarily exist. For example, you could have a sequence where the distance to the optimium decreases by a factor of 1/2 at odd iterations, but by 1/3 at even iterations. That would still be linearly convergent with rate 1/2, but the lim does not exist. 2001:16B8:60E3:6200:AC65:A419:7775:7CFF (talk) 11:42, 23 April 2020 (UTC)
- 2001 16B8 194.250.110.49 (talk) 08:59, 19 November 2023 (UTC)
- This should now be clear in the article text: Q-convergence is not defined for all sequences, and in situations like the one you've proposed here, R-convergence is the better analytical tool. The sequence you've suggested converges R-linearly with rate 1/sqrt(6), which is faster than rate 1/2 and slower than rate 1/3. RowanElder (talk) 02:45, 16 September 2024 (UTC)
Differening definitions of Convergence Order
[edit]There appears to be some disagreement among sources about whether or not is allowed for the order of convergence. In https://link.springer.com/content/pdf/10.1007/BF00939805.pdf they specify that q>1 (in their paper q is called ). But in http://fourier.eng.hmc.edu/e176/lectures/NM/node3.html it is given as . Should we include a note that says both definitions are in use? The-erinaceous-one (talk) 05:20, 1 August 2020 (UTC)
- I went ahead and included note in the article that clarifies the difference with a reference to https://link.springer.com/content/pdf/10.1007/BF00939805.pdf. The-erinaceous-one (talk) 07:04, 1 August 2020 (UTC)
limit or limit superior?
[edit]I know the definition of order of convergence with lim sup rather than lim, because in many cases the limit does not exist, and one is mainly interested in getting the inequality |x(n+1) - L| ≤ C |x(n) - L|q, which BTW can also (and actually should) be used to define convergence of order q. In case the sequence might jump to the exact limit, which happens, e.g., when the bisection method is applied to find a root of the form r = m/2^n, one cannot divide by (x(n) - L). — MFH:Talk 14:02, 28 March 2022 (UTC)
- I hope the new edits resolve this: the lead definition is given clear caveats and later a more technical alternative is introduced, R-convergence, that incorporates these inequalities. Still needs work, for sure, but I hope this now seems covered. RowanElder (talk) 02:50, 16 September 2024 (UTC)
do we assume that mew I. the first paragraph is any non-zero number?
[edit]what is mew?if it can include 0 and infinity then there can be any order of convergence? is this true? can someone clarify within this page. thanks Jarfuls of Tweed (talk) 11:48, 13 December 2022 (UTC)
- nvm, it's clear, it is just explained farther down Jarfuls of Tweed (talk) 11:54, 13 December 2022 (UTC)
hijacked citation?
[edit]- The terms Q-linear and R-linear are used in; The Big O definition when using Taylor series is used in Nocedal, Jorge; Wright, Stephen J. (2006), Numerical Optimization (2nd ed.), Berlin, New York: Springer-Verlag, pp. 619+620, ISBN 978-0-387-30303-1.
The bolded bit was inserted anonymously in October 2008, leaving the first "used in" with no citation to call its own. (The citation itself was added by Jitse Niesen half a year earlier.) I'm going to remove the addition; if you check the book and find that it also supports The Big O definition when using Taylor series, do add it back. —Tamfang (talk) 23:59, 10 November 2023 (UTC)
Dead links, questionable material
[edit]Aiming to do a quick copyedit of this page, I was surprised that the citations are largely to class notes and that important ones of these are now dead links, especially around some of the more doubtful claims of the article. It looks from this talk page like Nocedal and Wright has been the core reliable-source reference up to now and it's a great standard choice; I'll find my copy wherever it's lying around and will begin adding more inline citations to Nocedal and Wright, removing the dead links, and removing questionable claims contradicted by N&W and not supported by reliable sources. RowanElder (talk) 15:39, 12 September 2024 (UTC)
- I began this today with adding and correcting material; dead links remain. RowanElder (talk) 19:33, 13 September 2024 (UTC)
- I've now looked through the "discretization methods" section again and it's a mess. It probably merits a total rewrite. The opening paragraph implicitly sets up a very specific class of discretization method (unidimensional forward solvers) without explicitly declaring that as its scope, rather declaring a much more general scope, and the specific discussion does not generalize very naturally to the proper general scope. The "n" vs "h" issue is also serious but is shallow compared to this deeper set-up issue and it isn't worth solving alone without solving the deeper issues. RowanElder (talk) 19:52, 19 September 2024 (UTC)
- Having looked at this properly now, it looks like the problems with the discretization section root in a series of well-justified but only partially-executed simplifications that focused the rest of the article on numerical analysis and on Q-convergence at the expense of rates of convergence in real analysis, dynamical systems, o-notation (o(), O(), Omega(), etc.), etc. However, it appears to me that the article cannot really make proper sense now without either (a) dropping the discretization section entirely or (b) better contextualizing Q-convergence with the dropped material. The latter seems preferable to me for fitting the general title of the article.
- My proposal for executing that is to restructure the page in the following way:
- (1) In the lead section, begin with a more informal definition in the first paragraph, introducing the contrast of asymptotic and pre-asymptotic rates, then use a short second paragraph to introduce the two main classes of general technical definitions of asymptotic convergence rates: the log-linear iterate-indexed class used for the Q-convergence section and the linear-linear grid-parameter-indexed class used for the discretization methods section. In a third paragraph, describe both as special cases of the general analytic apparatus of asymptotically bounding sequences, currently only given the barest introduction in the R-convergence subsection of the article, and o-notation.
- (2) Retain the "for numerical iterative methods" section mostly as-is, but bring the "recurrences" and "acceleration" sections into this section as subsections. Add a little on o-notation, but mostly as a pointer to the analytical section.
- (3) Revise the "for numerical discretization methods" section without changing the nature of the current content much. Add a small amount on contrasts of L2, pointwise, and uniform rates of convergence. Let the current unexplained use of o-notation point to the analytical section.
- (4) Add a "for analytical proof" section (hopefully a better title will come to me later) that discusses o-notation properly and discusses the use of bounding sequences to characterize asymptotic rates, complementing the current short use in the "R-convergence" subsection.
- (5) Add a "pre-asymptotic rates of convergence" stub section, introducing both practical "number of iterates to desired accuracy"-type empirical approaches and analytical tools like Markov chain mixing times.
- The result should hopefully be something that is less technical up-front and then more correct in the technical information it does cover deeper in the article where it is motivated. RowanElder (talk) 23:27, 26 September 2024 (UTC)
- I now plan to name the planned "for analytical proof" section described here with the title "comparing asymptotic rates of convergence" instead. RowanElder (talk) 16:31, 7 October 2024 (UTC)
- This is now effectively done. I chose not to add Markov chain stopping time type pre-asymptotics because this article does not otherwise contain any stochastics. It would be a better article if it did, since these days stochastic gradient descent is an overwhelmingly important iterative numerical method, for instance, but it doesn't seem right to fold that work into this current, narrower revision. RowanElder (talk) 15:19, 12 October 2024 (UTC)