Calculus of variations

== History ==


The calculus of variations began with the work of [[Isaac Newton]], such as with [[Newton's minimal resistance problem]], which he formulated and solved in 1685, and later published in his ''[[Philosophiæ Naturalis Principia Mathematica|Principia]]'' in 1687,<ref name=":2">{{Cite book |last=Goldstine |first=Herman H. |url=https://books.google.com/books?id=_iTnBwAAQBAJ&pg=PA7 |title=A History of the Calculus of Variations from the 17th Through the 19th Century |date=1980 |publisher=Springer New York |isbn=978-1-4613-8106-8 |pages=7–21}}</ref> which was the first problem in the field to be formulated and correctly solved,<ref name=":2" /> and was also one of the most difficult problems tackled by variational methods prior to the twentieth century.<ref name=":0">{{Citation |last=Ferguson |first=James |title=A Brief Survey of the History of the Calculus of Variations and its Applications |date=2004 |arxiv=math/0402357 |bibcode=2004math......2357F }}</ref><ref name=":1">{{Cite book |last=Rowlands |first=Peter |url=https://books.google.com/books?id=ipA4DwAAQBAJ&pg=PA36 |title=Newton and the Great World System |date=2017 |publisher=[[World Scientific Publishing]] |isbn=978-1-78634-372-7 |pages=36–39 |language=en |doi=10.1142/q0108}}</ref><ref>{{Cite journal |last=Torres |first=Delfim F. M. |date=2021-07-29 |title=On a Non-Newtonian Calculus of Variations |journal=Axioms |language=en |volume=10 |issue=3 |pages=171 |doi=10.3390/axioms10030171 |doi-access=free |issn=2075-1680|arxiv=2107.14152 }}</ref> This problem was followed by the [[brachistochrone curve]] problem raised by [[Johann Bernoulli]] (1696),<ref name=GelfandFominP3>{{cite book| last1=Gelfand|first1=I. M.|author-link1=Israel Gelfand|last2=Fomin|first2=S. V.|author-link2=Sergei Fomin|title=Calculus of variations | year=2000|publisher=Dover Publications|location=Mineola, New York|isbn=978-0486414485|page=3| url=https://books.google.com/books?id=YkFLGQeGRw4C|edition=Unabridged repr.|editor1-last=Silverman| editor1-first=Richard A.}}</ref> which was similar to one raised by [[Galileo Galilei]] in 1638, though Galileo neither solved the problem explicitly nor used methods based on calculus.<ref name=":0" /> Bernoulli solved the problem using the principle of least time, but not the calculus of variations; Newton solved it in 1697 using variational techniques, and as a result he pioneered the field with his work on the two problems.<ref name=":1" /> The problem would immediately occupy the attention of [[Jacob Bernoulli]] and the [[Guillaume de l'Hôpital|Marquis de l'Hôpital]], but [[Leonhard Euler]] first elaborated the subject, beginning in 1733. [[Joseph-Louis Lagrange]] was influenced by Euler's work to contribute greatly to the theory. After Euler saw the 1755 work of the 19-year-old Lagrange, Euler dropped his own partly geometric approach in favor of Lagrange's purely analytic approach and renamed the subject the ''calculus of variations'' in his 1756 lecture ''Elementa Calculi Variationum''.<ref name=Thiele>{{cite book |last=Thiele |first=Rüdiger |editor-last1=Bradley |editor-first1=Robert E. |editor-last2=Sandifer |editor-first2=C. Edward |title=Leonhard Euler: Life, Work and Legacy |publisher=Elsevier |year=2007 |page=249 |chapter=Euler and the Calculus of Variations |chapter-url=https://books.google.com/books?id=75vJL_Y-PvsC&pg=PA249 |isbn=9780080471297}}</ref><ref name=Goldstine>{{cite book |last=Goldstine |first=Herman H. |year=2012 |title=A History of the Calculus of Variations from the 17th through the 19th Century |url=https://books.google.com/books?id=_iTnBwAAQBAJ&q=%22Indeed+after%22&pg=110 |publisher=Springer Science & Business Media |page=110 |isbn=9781461381068 |author-link=Herman Goldstine }}</ref>{{efn|"Euler waited until Lagrange had published on the subject in 1762 ... before he committed his lecture ... to print, so as not to rob Lagrange of his glory. Indeed, it was only Lagrange's method that Euler called Calculus of Variations."<ref name=Thiele/>}}


[[Adrien-Marie Legendre]] (1786) laid down a method, not entirely satisfactory, for the discrimination of maxima and minima. [[Isaac Newton]] and [[Gottfried Leibniz]] also gave some early attention to the subject.<ref name="brunt">{{cite book |last=van Brunt |first=Bruce |title=The Calculus of Variations |publisher=Springer |year=2004 |isbn=978-0-387-40247-5}}</ref> To this discrimination [[Vincenzo Brunacci]] (1810), [[Carl Friedrich Gauss]] (1829), [[Siméon Denis Poisson|Siméon Poisson]] (1831), [[Mikhail Ostrogradsky]] (1834), and [[Carl Gustav Jacob Jacobi|Carl Jacobi]] (1837) have been among the contributors. An important general work is that of [[Pierre Frédéric Sarrus]] (1842) which was condensed and improved by [[Augustin-Louis Cauchy]] (1844). Other valuable treatises and memoirs have been written by [[Strauch]]{{which|date=October 2024}} (1849), [[John Hewitt Jellett]] (1850), [[Otto Hesse]] (1857), [[Alfred Clebsch]] (1858), and Lewis Buffett Carll (1885), but perhaps the most important work of the century is that of [[Karl Weierstrass]]. His celebrated course on the theory is epoch-making, and it may be asserted that he was the first to place it on a firm and unquestionable foundation. The [[Hilbert's twentieth problem|20th]] and the [[Hilbert's twenty-third problem|23rd]] [[Hilbert problems|Hilbert problems]] published in 1900 encouraged further development.<ref name="brunt" />


The calculus of variations is concerned with the maxima or minima (collectively called '''extrema''') of functionals. A functional maps [[Function (mathematics)|functions]] to [[scalar (mathematics)|scalars]], so functionals have been described as "functions of functions."  Functionals have extrema with respect to the elements <math>y</math> of a given [[function space]] defined over a given [[Domain of a function|domain]]. A functional <math>J[y]</math> is said to have an extremum at the function <math>f</math> if <math>\Delta J = J[y] - J[f]</math> has the same [[Sign (mathematics)|sign]] for all <math>y</math> in an arbitrarily small neighborhood of <math>f.</math>{{efn|The neighborhood of <math>f</math> is the part of the given function space where <math>|y - f| < h</math> over the whole domain of the functions, with <math>h</math> a positive number that specifies the size of the neighborhood.<ref name='CourHilb1953P169'>{{cite book |last1=Courant |first1=R |author-link1=Richard Courant |last2=Hilbert |first2=D |author-link2=David Hilbert |title = Methods of Mathematical Physics |volume=I |edition=First English |publisher=Interscience Publishers, Inc. |year=1953 |location=New York |page=169 |isbn=978-0471504474}}</ref>}} The function <math>f</math> is called an '''extremal''' function or extremal.{{efn|name=ExtremalVsExtremum| Note the difference between the terms extremal and extremum. An extremal is a function that makes a functional an extremum.}} The extremum <math>J[f]</math> is called a local maximum if <math>\Delta J \leq 0</math> everywhere in an arbitrarily small neighborhood of <math>f,</math> and a local minimum if <math>\Delta J \geq 0</math> there. For a function space of continuous functions, extrema of corresponding functionals are called '''strong extrema''' or '''weak extrema''', depending on whether the first derivatives of the continuous functions are respectively all continuous or not.<ref name='GelfandFominPP12to13'>{{harvnb|Gelfand|Fomin|2000|pp=12–13}}</ref>
 
[[File:Examples of Euler-Lagrange equation.jpg|thumb|300x300px|Examples where calculus of variations can be applied: finding minimal surfaces, finding geodesics, deriving Snell's law of refraction, and obtaining an equation to solve the double pendulum problem numerically]]
Both strong and weak extrema of functionals are for a space of continuous functions but strong extrema have the additional requirement that the first derivatives of the functions in the space be continuous. Thus a strong extremum is also a weak extremum, but the [[Converse (logic)|converse]] may not hold.  Finding strong extrema is more difficult than finding weak extrema.<ref name="GelfandFominP13">{{harvnb | Gelfand|Fomin| 2000 | p=13 }}</ref> An example of a [[Necessity and sufficiency|necessary condition]] that is used for finding weak extrema is the [[Euler–Lagrange equation]].<ref name="GelfandFominPP14to15">{{harvnb | Gelfand|Fomin| 2000 | pp=14–15 }}</ref>{{efn|name=SectionVarSuffCond| For a sufficient condition, see section [[#Variations and sufficient condition for a minimum|Variations and sufficient condition for a minimum]].}}


== Euler–Lagrange equation ==


Consider the functional
<math display="block">J[y] = \int_{x_1}^{x_2} L\left(x,y(x),y'(x)\right)\, dx,</math>
where
*<math>x_1, x_2</math> are [[Constant (mathematics)|constants]],
*<math>y(x)</math> is twice continuously differentiable,
*<math>y'(x) = \frac{dy}{dx},</math>
*<math>L\left(x, y(x), y'(x)\right)</math> is twice continuously differentiable with respect to its arguments <math>x,</math> <math>y,</math> and <math>y'.</math>


If the functional <math>J[y]</math> attains a [[local minimum]] at <math>f,</math> and <math>\eta(x)</math> is an arbitrary function that has at least one derivative and vanishes at the endpoints <math>x_1</math> and <math>x_2,</math> then for any number <math>\varepsilon</math> close to 0,
<math display="block">J[f] \le J[f + \varepsilon \eta] \, .</math>


The term <math>\varepsilon \eta</math> is called the variation of the function <math>f.</math> Substituting <math>f + \varepsilon \eta</math> for <math>y</math> in the functional <math>J[y],</math> the result is a function of <math>\varepsilon,</math>
<math display="block">\Phi(\varepsilon) = J[f+\varepsilon\eta] \, .</math>
<math display="block">\Phi(\varepsilon) = J[f+\varepsilon\eta] \, .</math>
Since the functional <math>J[y]</math> has a minimum for <math>y = f</math> the function <math>\Phi(\varepsilon)</math> has a minimum at <math>\varepsilon = 0</math> and thus,{{efn|The product <math>\varepsilon \Phi'(0)</math> is called the first variation of the functional <math>J</math> and is denoted by <math>\delta J.</math>  Some references define the [[first variation]] differently by leaving out the <math>\varepsilon</math> factor.}}
Since the functional <math>J[y]</math> has a minimum for <math>y = f</math> the function <math>\Phi(\varepsilon)</math> has a minimum at <math>\varepsilon = 0</math> and thus,{{efn|The product <math>\varepsilon \Phi'(0)</math> is called the first variation of the functional <math>J</math> and is denoted by <math>\delta J.</math>  Some references define the [[first variation]] differently by leaving out the <math>\varepsilon</math> factor.}}
<math display="block">\Phi'(0) \equiv \left.\frac{d\Phi}{d\varepsilon}\right|_{\varepsilon = 0} = \int_{x_1}^{x_2} \left.\frac{dL}{d\varepsilon}\right|_{\varepsilon = 0} dx = 0 \, .</math>
<math display="block">\Phi'(0) \equiv \left.\frac{d\Phi}{d\varepsilon}\right|_{\varepsilon = 0} = \int_{x_1}^{x_2} \left.\frac{dL}{d\varepsilon}\right|_{\varepsilon = 0} dx = 0 \, .</math>


Taking the [[total derivative]] of <math>L\left[x, y, y'\right],</math> where <math>y = f + \varepsilon \eta</math> and <math>y' = f' + \varepsilon \eta'</math> are considered as functions of <math>\varepsilon</math> rather than <math>x,</math> yields
<math display="block">\frac{dL}{d\varepsilon}=\frac{\partial L}{\partial y}\frac{dy}{d\varepsilon} + \frac{\partial L}{\partial y'}\frac{dy'}{d\varepsilon}</math>
and because <math>\frac{dy}{d \varepsilon} = \eta</math> and <math>\frac{d y'}{d \varepsilon} = \eta',</math>
<math display="block">\frac{dL}{d\varepsilon}=\frac{\partial L}{\partial y}\eta + \frac{\partial L}{\partial y'}\eta'.</math>


Therefore,
<math display="block">\begin{align}
\int_{x_1}^{x_2} \left.\frac{dL}{d\varepsilon}\right|_{\varepsilon = 0} dx
  & = \int_{x_1}^{x_2} \left(\frac{\partial L}{\partial f} \eta + \frac{\partial L}{\partial f'} \eta'\right)\, dx \\
  & = \int_{x_1}^{x_2} \frac{\partial L}{\partial f} \eta \, dx + \left.\frac{\partial L}{\partial f'} \eta \right|_{x_1}^{x_2} - \int_{x_1}^{x_2} \eta \frac{d}{dx}\frac{\partial L}{\partial f'} \, dx \\
  & = \int_{x_1}^{x_2} \left(\frac{\partial L}{\partial f} \eta - \eta \frac{d}{dx}\frac{\partial L}{\partial f'} \right)\, dx\\
\end{align}</math>
where <math>L\left[x, y, y'\right] \to L\left[x, f, f'\right]</math> when <math>\varepsilon = 0</math> and we have used [[integration by parts]] on the second term.  The second term on the second line vanishes because <math>\eta = 0</math> at <math>x_1</math> and <math>x_2</math> by definition.  Also, as previously mentioned the left side of the equation is zero so that
<math display="block">\int_{x_1}^{x_2} \eta (x) \left(\frac{\partial L}{\partial f} - \frac{d}{dx}\frac{\partial L}{\partial f'} \right) \, dx = 0 \, .</math>


According to the [[fundamental lemma of calculus of variations]], the fact that this equation holds for any choice of <math>\eta</math> implies that the part of the integrand in parentheses is zero, i.e.
<math display="block">\frac{\partial L}{\partial f} -\frac{d}{dx} \frac{\partial L}{\partial f'}=0</math>
<math display="block">\frac{\partial L}{\partial f} -\frac{d}{dx} \frac{\partial L}{\partial f'}=0</math>
which is called the '''Euler–Lagrange equation'''.  The left hand side of this equation is called the [[functional derivative]] of <math>J[f]</math> and is denoted <math>\delta J</math> or <math>\delta f(x).</math>
which is called the '''Euler–Lagrange equation'''.  The left hand side of this equation is called the [[functional derivative]] of <math>J[f]</math> and is denoted <math>\delta J</math> or <math>\delta f(x).</math>
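The Euler–Lagrange equation also lends itself to symbolic computation. The following minimal sketch (assuming a standard [[SymPy]] installation; <code>euler_equations</code> is provided by <code>sympy.calculus.euler</code>) derives the equation for the arc-length Lagrangian used in the example below:
<syntaxhighlight lang="python">
# Minimal sketch: derive an Euler-Lagrange equation symbolically with SymPy.
import sympy as sp
from sympy.calculus.euler import euler_equations

x = sp.Symbol('x')
y = sp.Function('y')

# Arc-length Lagrangian L = sqrt(1 + y'(x)^2) from the example below.
L = sp.sqrt(1 + y(x).diff(x)**2)

# Returns a list with one equation; it reduces to y''(x) = 0 after
# multiplying through by the positive factor (1 + y'(x)^2)**(3/2).
print(euler_equations(L, y(x), x))
</syntaxhighlight>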


=== Example ===
In order to illustrate this process, consider the problem of finding the extremal function <math>y = f(x),</math> which is the shortest curve that connects two points <math>\left(x_1, y_1\right)</math> and <math>\left(x_2, y_2\right).</math> The [[arc length]] of the curve is given by
<math display="block">A[y] = \int_{x_1}^{x_2} \sqrt{1 + [ y'(x) ]^2} \, dx \, ,</math>
with
<math display="block">y'(x) = \frac{dy}{dx} \, , \ \ y_1=f(x_1) \, , \ \ y_2=f(x_2) \, .</math>
Note that assuming {{mvar|y}} is a function of {{mvar|x}} loses generality; ideally both should be functions of some other parameter. This parametrization is adopted solely for instructive purposes.


The Euler–Lagrange equation will now be used to find the extremal function <math>f(x)</math> that minimizes the functional <math>A[y].</math>
<math display="block">\frac{\partial L}{\partial f} -\frac{d}{dx} \frac{\partial L}{\partial f'}=0</math>
with
<math display="block">L = \sqrt{1 + [ f'(x) ]^2} \, .</math>

Since <math>f</math> does not appear explicitly in <math>L,</math> the first term in the Euler–Lagrange equation vanishes for all <math>f(x)</math> and thus,
<math display="block">\frac{d}{dx} \frac{\partial L}{\partial f'} = 0 \, .</math>
Substituting for <math>L</math> and taking the derivative,
<math display="block">\frac{d}{dx} \ \frac{f'(x)} {\sqrt{1 + [f'(x)]^2}} \ = 0 \, .</math>


Thus
<math display="block">\frac{f'(x)}{\sqrt{1+[f'(x)]^2}} = c \, ,</math>
for some constant <math>c.</math> Then
<math display="block">\frac{[f'(x)]^2}{1+[f'(x)]^2} = c^2 \, ,</math>
where
<math display="block">0 \le c^2<1.</math>
Solving, we get
<math display="block">[f'(x)]^2=\frac{c^2}{1-c^2}</math>
which implies that
<math display="block">f'(x)=m</math>
is a constant and therefore that the shortest curve that connects two points <math>\left(x_1, y_1\right)</math> and <math>\left(x_2, y_2\right)</math> is
<math display="block">f(x) = m x + b \qquad \text{with} \ \ m = \frac{y_2 - y_1}{x_2 - x_1} \quad \text{and} \quad b = \frac{x_2 y_1 - x_1 y_2}{x_2 - x_1}</math>
and we have thus found the extremal function <math>f(x)</math> that minimizes the functional <math>A[y]</math> so that <math>A[f]</math> is a minimum. The equation for a straight line is <math>y = mx+b.</math> In other words, the shortest distance between two points is a straight line.{{efn|name=ArchimedesStraight| As a historical note, this is an axiom of [[Archimedes]]. See e.g. Kelland (1843).<ref>{{cite book |last=Kelland |first=Philip |author-link=Philip Kelland| title=Lectures on the principles of demonstrative mathematics |year=1843 |page=58 |url=https://books.google.com/books?id=yQCFAAAAIAAJ&pg=PA58 |via=Google Books}}</ref>}}
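The same conclusion can be checked numerically by a direct method: discretize the curve, approximate <math>A[y]</math> by a finite sum, and minimize over the interior node values. A minimal sketch (the endpoint values, grid size, and starting curve are illustrative assumptions; it relies on <code>scipy.optimize.minimize</code>):
<syntaxhighlight lang="python">
import numpy as np
from scipy.optimize import minimize

# Illustrative endpoints and grid.
x1, y1, x2, y2 = 0.0, 0.0, 1.0, 2.0
x = np.linspace(x1, x2, 52)                    # 50 interior points

def arc_length(y_interior):
    """Discrete approximation of A[y] = int sqrt(1 + y'^2) dx."""
    y = np.concatenate(([y1], y_interior, [y2]))   # fixed boundary values
    dy = np.diff(y) / np.diff(x)
    return np.sum(np.sqrt(1.0 + dy**2) * np.diff(x))

# Start from a deliberately wiggly curve and minimize.
y0 = np.interp(x[1:-1], [x1, x2], [y1, y2]) + 0.5 * np.sin(5 * np.pi * x[1:-1])
result = minimize(arc_length, y0)

# The minimizer should approach the straight chord y = m*x + b.
m = (y2 - y1) / (x2 - x1)
b = (x2 * y1 - x1 * y2) / (x2 - x1)
print(np.max(np.abs(result.x - (m * x[1:-1] + b))))   # close to 0
</syntaxhighlight>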


== Beltrami's identity ==
In physics problems it may be the case that <math>\frac{\partial L}{\partial x} = 0,</math> meaning the integrand is a function of <math>f(x)</math> and <math>f'(x)</math> but <math>x</math> does not appear separately. In that case, the Euler–Lagrange equation can be simplified to the [[Beltrami identity]]<ref>{{cite web |author=Weisstein, Eric W. | url=http://mathworld.wolfram.com/Euler-LagrangeDifferentialEquation.html |title=Euler–Lagrange Differential Equation | website=mathworld.wolfram.com |publisher=Wolfram |at=Eq.&nbsp;(5)}}</ref>
<math display="block">L - f' \frac{\partial L}{\partial f'} = C \, ,</math>
where <math>C</math> is a constant. The left hand side is the [[Legendre transformation]] of <math>L</math> with respect to <math>f'(x).</math>
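For example, for the arc-length Lagrangian <math>L = \sqrt{1 + [f'(x)]^2}</math> of the preceding section, the identity gives
<math display="block">\sqrt{1 + f'^2} - \frac{f'^2}{\sqrt{1 + f'^2}} = \frac{1}{\sqrt{1 + f'^2}} = C \, ,</math>
so <math>f'</math> is constant, and the straight line of the previous example is recovered without solving a second-order differential equation.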




== Euler–Poisson equation ==
If <math>S</math> depends on higher derivatives of <math>y(x)</math>, that is, if
<math display="block">S = \int_{a}^{b} f(x, y(x), y'(x), \dots, y^{(n)}(x)) dx,</math>
then <math>y</math> must satisfy the Euler–[[Siméon Denis Poisson|Poisson]] equation,<ref>{{Cite book |last=Kot |first=Mark |title=A First Course in the Calculus of Variations | publisher=American Mathematical Society | year=2014 |isbn=978-1-4704-1495-5 | chapter=Chapter 4: Basic Generalizations}}</ref>
<math display="block">\frac{\partial f}{\partial y} - \frac{d}{dx} \left( \frac{\partial f}{\partial y'} \right) + \dots + (-1)^{n} \frac{d^n}{dx^n} \left[ \frac{\partial f}{\partial y^{(n)}} \right]= 0.</math>


== Du Bois-Reymond's theorem ==

The discussion thus far has assumed that extremal functions possess two continuous derivatives, although the existence of the integral <math>J</math> requires only first derivatives of trial functions. The condition that the first variation vanishes at an extremal may be regarded as a '''weak form''' of the Euler–Lagrange equation. The theorem of Du Bois-Reymond asserts that this weak form implies the strong form. If <math>L</math> has continuous first and second derivatives with respect to all of its arguments, and if
<math display="block">\frac{\partial^2 L}{\partial f'^2} \ne 0,</math>
then <math>f</math> has two continuous derivatives, and it satisfies the Euler–Lagrange equation.




However [[Mikhail Lavrentyev|Lavrentiev]] in 1926 showed that there are circumstances where there is no optimum solution but one can be approached arbitrarily closely by increasing numbers of sections. The Lavrentiev Phenomenon identifies a difference in the infimum of a minimization problem across different classes of admissible functions. For instance, consider the following problem, presented by Manià in 1934:<ref>{{Cite journal|last=Manià|first=Bernard|date=1934|title=Sopra un esempio di Lavrentieff| journal=Bollenttino dell'Unione Matematica Italiana|volume=13|pages=147–153}}</ref>
<math display="block">L[x] = \int_0^1 (x^3-t)^2 x'^6 \, dt,</math>
<math display="block">{A} = \{x \in W^{1,1}(0,1) : x(0)=0,\ x(1)=1\}.</math>


Clearly, <math>x(t) = t^{\frac{1}{3}}</math> minimizes the functional, but we find that any function <math>x \in W^{1, \infty}</math> gives a value bounded away from the infimum.
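The gap can be illustrated numerically (this is an illustration, not a proof; the one-parameter family of Lipschitz competitors below is an assumed choice for demonstration):
<syntaxhighlight lang="python">
import numpy as np
from scipy.integrate import quad

def mania_value(x, xp, eps):
    """Value of Mania's functional for a competitor x with derivative xp."""
    integrand = lambda t: (x(t)**3 - t)**2 * xp(t)**6
    val, _ = quad(integrand, 0.0, 1.0, points=[eps])
    return val

# Lipschitz (W^{1,inf}) competitors: linear on [0, eps], then t^(1/3).
# Analytically their value is 8/(105*eps); it never approaches the
# infimum 0 attained by t^(1/3), which lies in W^{1,1} but not W^{1,inf}.
def competitor(eps):
    x  = lambda t: t * eps**(-2/3) if t <= eps else t**(1/3)
    xp = lambda t: eps**(-2/3)     if t <= eps else t**(-2/3) / 3.0
    return x, xp

for eps in (0.5, 0.1, 0.01):
    x, xp = competitor(eps)
    print(eps, mania_value(x, xp, eps), 8.0 / (105.0 * eps))
</syntaxhighlight>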


Examples (in one dimension) are traditionally manifested across <math>W^{1,1}</math> and <math>W^{1,\infty},</math> but Ball and Mizel<ref>{{Cite journal|last=Ball & Mizel|date=1985|title=One-dimensional Variational problems whose Minimizers do not satisfy the Euler-Lagrange equation.|journal=Archive for Rational Mechanics and Analysis|volume=90|issue=4|pages=325–388| doi=10.1007/BF00276295|bibcode=1985ArRMA..90..325B|s2cid=55005550}}</ref> procured the first functional that displayed Lavrentiev's Phenomenon across <math>W^{1,p}</math> and <math>W^{1,q}</math> for <math>1 \leq p < q < \infty.</math> There are several results that give criteria under which the phenomenon does not occur - for instance 'standard growth', a Lagrangian with no dependence on the second variable, or an approximating sequence satisfying Cesari's Condition (D) - but results are often particular, and applicable to a small class of functionals.


Connected with the Lavrentiev Phenomenon is the repulsion property: any functional displaying Lavrentiev's Phenomenon will display the weak repulsion property.<ref>{{Cite journal|last=Ferriero|first=Alessandro|date=2007|title=The Weak Repulsion property | journal=Journal de Mathématiques Pures et Appliquées|volume=88|issue=4|pages=378–388| doi=10.1016/j.matpur.2007.06.002 | doi-access=}}</ref>


For example, if <math>\varphi(x, y)</math> denotes the displacement of a membrane above the domain <math>D</math> in the <math>x,y</math> plane, then its potential energy is proportional to its surface area:
<math display="block">U[\varphi] = \iint_D \sqrt{1 +\nabla \varphi \cdot \nabla \varphi} \,dx\,dy.</math>
[[Plateau's problem]] consists of finding a function that minimizes the surface area while assuming prescribed values on the boundary of <math>D</math>; the solutions are called '''minimal surfaces'''. The Euler–Lagrange equation for this problem is nonlinear:
<math display="block">\varphi_{xx}(1 + \varphi_y^2) + \varphi_{yy}(1 + \varphi_x^2) - 2\varphi_x \varphi_y \varphi_{xy} = 0.</math>
See Courant (1950) for details.
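For nearly flat membranes, where the partial derivatives <math>\varphi_x</math> and <math>\varphi_y</math> are small, the quadratic terms can be neglected, and the equation reduces to [[Laplace's equation]],
<math display="block">\varphi_{xx} + \varphi_{yy} = 0,</math>
which is the small-displacement situation treated in the next subsection.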


=== Dirichlet's principle ===
It is often sufficient to consider only small displacements of the membrane, whose energy difference from no displacement is approximated by
<math display="block">V[\varphi] = \frac{1}{2}\iint_D \nabla \varphi \cdot \nabla \varphi \, dx\, dy.</math>
The functional <math>V</math> is to be minimized among all trial functions <math>\varphi</math> that assume prescribed values on the boundary of <math>D</math>. If <math>u</math> is the minimizing function and <math>v</math> is an arbitrary smooth function that vanishes on the boundary of <math>D</math>, then the first variation of <math>V[u + \varepsilon v]</math> must vanish:
<math display="block">\left.\frac{d}{d\varepsilon} V[u + \varepsilon v]\right|_{\varepsilon=0} = \iint_D \nabla u \cdot \nabla v \, dx\,dy = 0.</math>
Provided that <math>u</math> has two derivatives, we may apply the divergence theorem to obtain
<math display="block">\iint_D \nabla \cdot (v \nabla u) \,dx\,dy =
\iint_D \nabla u \cdot \nabla v + v \nabla \cdot \nabla u \,dx\,dy = \int_C v \frac{\partial u}{\partial n} \, ds,</math>
where <math>C</math> is the boundary of <math>D,</math> <math>s</math> is arclength along <math>C</math> and <math>\partial u / \partial n</math> is the normal derivative of <math>u</math> on <math>C.</math> Since <math>v</math> vanishes on <math>C</math> and the first variation vanishes, the result is
<math display="block">\iint_D v\nabla \cdot \nabla u \,dx\,dy =0 </math>
for all smooth functions <math>v</math> that vanish on the boundary of <math>D</math>. The proof for the case of one-dimensional integrals may be adapted to this case to show that
<math display="block">\nabla \cdot \nabla u= 0 </math> in <math>D.</math>


The difficulty with this reasoning is the assumption that the minimizing function <math>u</math> must have two derivatives. Riemann argued that the existence of a smooth minimizing function was assured by the connection with the physical problem: membranes do indeed assume configurations with minimal potential energy. Riemann named this idea the [[Dirichlet principle]] in honor of his teacher [[Peter Gustav Lejeune Dirichlet]]. However, Weierstrass gave an example of a variational problem with no solution: minimize
<math display="block">W[\varphi] = \int_{-1}^{1} (x\varphi')^2 \, dx</math>
among all functions <math>\varphi</math> that satisfy <math>\varphi(-1)=-1</math> and <math>\varphi(1)=1.</math>
<math>W</math> can be made arbitrarily small by choosing piecewise linear functions that make a transition between −1 and 1 in a small neighborhood of the origin. However, there is no function that makes <math>W=0.</math>{{efn|The resulting controversy over the validity of Dirichlet's principle is explained by Turnbull.<ref>{{cite web |url=http://turnbull.mcs.st-and.ac.uk/~history/Biographies/Riemann.html |title=Riemann biography |publisher=U. St. Andrew |place=UK |author=Turnbull }}{{Dead link|date=August 2025 |bot=InternetArchiveBot |fix-attempted=yes }}</ref>}} Eventually it was shown that Dirichlet's principle is valid, but it requires a sophisticated application of the regularity theory for [[elliptic partial differential equation]]s; see Jost and Li–Jost (1998).
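To see concretely how <math>W</math> can be made small, let <math>\varphi_\varepsilon</math> be the piecewise linear function that equals <math>x/\varepsilon</math> on <math>[-\varepsilon, \varepsilon]</math> and <math>\pm 1</math> outside that interval. Then
<math display="block">W[\varphi_\varepsilon] = \int_{-\varepsilon}^{\varepsilon} \left( \frac{x}{\varepsilon} \right)^2 dx = \frac{2\varepsilon}{3},</math>
which tends to zero with <math>\varepsilon.</math> On the other hand, <math>W[\varphi] = 0</math> would force <math>\varphi' = 0</math> wherever <math>x \neq 0,</math> so <math>\varphi</math> would have to be constant on each side of the origin and could not meet both boundary values continuously.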


=== Generalization to other boundary value problems ===
A more general expression for the potential energy of a membrane is
<math display="block">V[\varphi] = \iint_D \left[ \frac{1}{2} \nabla \varphi \cdot \nabla \varphi + f(x,y) \varphi \right] \, dx\,dy \, + \int_C \left[ \frac{1}{2} \sigma(s) \varphi^2 + g(s) \varphi \right] \, ds.</math>
This corresponds to an external force density <math>f(x,y)</math> in <math>D,</math> an external force <math>g(s)</math> on the boundary <math>C,</math> and elastic forces with modulus <math>\sigma(s)</math> acting on <math>C</math>. The function that minimizes the potential energy '''with no restriction on its boundary values''' will be denoted by <math>u</math>. Provided that <math>f</math> and <math>g</math> are continuous, regularity theory implies that the minimizing function <math>u</math> will have two derivatives. In taking the first variation, no boundary condition need be imposed on the increment <math>v</math>. The first variation of <math>V[u + \varepsilon v]</math> is given by
<math display="block">\iint_D \left[ \nabla u \cdot \nabla v + f v \right] \, dx\, dy + \int_C \left[ \sigma u v + g v \right] \, ds = 0. </math>
If we apply the divergence theorem, the result is
<math display="block">\iint_D \left[ -v \nabla \cdot \nabla u + v f \right] \, dx \, dy + \int_C v \left[ \frac{\partial u}{\partial n} + \sigma u + g \right] \, ds =0. </math>
If we first set <math>v = 0</math> on <math>C,</math> the boundary integral vanishes, and we conclude as before that
<math display="block">- \nabla \cdot \nabla u + f =0 </math>
in <math>D</math>. Then if we allow <math>v</math> to assume arbitrary boundary values, this implies that <math>u</math> must satisfy the boundary condition
<math display="block">\frac{\partial u}{\partial n} + \sigma u + g =0, </math>
on <math>C</math>. This boundary condition is a consequence of the minimizing property of <math>u</math>: it is not imposed beforehand. Such conditions are called '''natural boundary conditions'''.

The preceding reasoning is not valid if <math>\sigma</math> vanishes identically on <math>C.</math> In such a case, we could allow a trial function <math>\varphi \equiv c</math>, where <math>c</math> is a constant. For such a trial function,
<math display="block">V[c] = c\left[ \iint_D f \, dx\,dy + \int_C g \, ds \right].</math>
By appropriate choice of <math>c</math>, <math>V</math> can assume any value unless the quantity inside the brackets vanishes. Therefore, the variational problem is meaningless unless
<math display="block">\iint_D f \, dx\,dy + \int_C g \, ds =0.</math>
This condition implies that net external forces on the system are in equilibrium. If these forces are in equilibrium, then the variational problem has a solution, but it is not unique, since an arbitrary constant may be added. Further details and examples are in Courant and Hilbert (1953).




Both one-dimensional and multi-dimensional '''eigenvalue problems''' can be formulated as variational problems.


=== Sturm–Liouville problems ===
{{See also|Sturm–Liouville theory}}
The Sturm–Liouville [[eigenvalue problem]] involves a general quadratic form
<math display="block">Q[y] = \int_{x_1}^{x_2} \left[ p(x) y'(x)^2 + q(x) y(x)^2 \right] \, dx, </math>
where <math>y</math> is restricted to functions that satisfy the boundary conditions
<math display="block">y(x_1)=0, \quad y(x_2)=0. </math>
Let <math>R</math> be a normalization integral
<math display="block">R[y] =\int_{x_1}^{x_2} r(x)y(x)^2 \, dx.</math>
The functions <math>p(x)</math> and <math>r(x)</math> are required to be everywhere positive and bounded away from zero. The primary variational problem is to minimize the ratio <math>Q/R</math> among all <math>y</math> satisfying the endpoint conditions, which is equivalent to minimizing <math>Q[y]</math> under the constraint that <math>R[y]</math> is constant. It is shown below that the Euler–Lagrange equation for the minimizing <math>u</math> is
<math display="block">-(p u')' +q u -\lambda r u = 0, </math>
where <math>\lambda</math> is the quotient
<math display="block">\lambda = \frac{Q[u]}{R[u]}. </math>
It can be shown (see Gelfand and Fomin 1963) that the minimizing <math>u</math> has two derivatives and satisfies the Euler–Lagrange equation. The associated <math>\lambda</math> will be denoted by <math>\lambda_1</math>; it is the lowest eigenvalue for this equation and boundary conditions. The associated minimizing function will be denoted by <math>u_1(x)</math>. This variational characterization of eigenvalues leads to the [[Rayleigh–Ritz method]]: choose an approximating <math>u</math> as a linear combination of basis functions (for example trigonometric functions) and carry out a finite-dimensional minimization among such linear combinations. This method is often surprisingly accurate.
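A minimal numerical sketch of the Rayleigh–Ritz method follows (the data <math>p = r = 1,</math> <math>q = 0</math> on <math>(0, \pi)</math> and the polynomial basis are illustrative assumptions; the exact eigenvalues of <math>-u'' = \lambda u</math> with these boundary conditions are <math>1, 4, 9, \dots</math>):
<syntaxhighlight lang="python">
import numpy as np
from scipy.integrate import trapezoid
from scipy.linalg import eigh

# -u'' = lambda*u on (0, pi) with u(0) = u(pi) = 0: p = r = 1, q = 0.
a, b, n = 0.0, np.pi, 5
xs = np.linspace(a, b, 2001)

# Polynomial basis functions u_k vanishing at both endpoints.
u  = [lambda x, k=k: (x - a) * (b - x) * x**k for k in range(n)]
du = [lambda x, k=k: (a + b - 2*x) * x**k
      + ((x - a) * (b - x) * k * x**(k - 1) if k else 0.0)
      for k in range(n)]

# Q[i,j] = int(p u_i' u_j' + q u_i u_j) dx and R[i,j] = int(r u_i u_j) dx.
Q = np.array([[trapezoid(du[i](xs) * du[j](xs), xs) for j in range(n)]
              for i in range(n)])
R = np.array([[trapezoid(u[i](xs) * u[j](xs), xs) for j in range(n)]
              for i in range(n)])

# Minimizing Q/R over this subspace is the generalized eigenproblem
# Q c = lambda R c; the higher eigenvalues correspond to the constrained
# minimizations described below.
print(eigh(Q, R, eigvals_only=True)[:3])   # approximately [1, 4, 9]
</syntaxhighlight>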


The next smallest eigenvalue and eigenfunction can be obtained by minimizing <math>Q</math> under the additional constraint
<math display="block">\int_{x_1}^{x_2} r(x) u_1(x) y(x) \, dx = 0. </math>
This procedure can be extended to obtain the complete sequence of eigenvalues and eigenfunctions for the problem.


The variational problem also applies to more general boundary conditions. Instead of requiring that <math>y</math> vanish at the endpoints, we may not impose any condition at the endpoints, and set
<math display="block">Q[y] = \int_{x_1}^{x_2} \left[ p(x) y'(x)^2 + q(x)y(x)^2 \right] \, dx + a_1 y(x_1)^2 + a_2 y(x_2)^2, </math>
where <math>a_1</math> and <math>a_2</math> are arbitrary. If we set <math>y = u + \varepsilon v</math>, the first variation for the ratio <math>Q/R</math> is
<math display="block">V_1 = \frac{2}{R[u]} \left( \int_{x_1}^{x_2} \left[ p(x) u'(x)v'(x) + q(x)u(x)v(x) -\lambda r(x) u(x) v(x) \right] \, dx + a_1 u(x_1)v(x_1) + a_2 u(x_2)v(x_2) \right), </math>
where λ is given by the ratio <math>Q[u]/R[u]</math> as previously.
 
where <math>\lambda</math> is given by the ratio <math>Q[u]/R[u]</math> as previously.
After integration by parts,
After integration by parts,
<math display="block">\frac{R[u]}{2} V_1 = \int_{x_1}^{x_2} v(x) \left[ -(p u')' + q u -\lambda r u \right] \, dx + v(x_1)[ -p(x_1)u'(x_1) + a_1 u(x_1)] + v(x_2) [p(x_2) u'(x_2) + a_2 u(x_2)]. </math>
<math display="block">\frac{R[u]}{2} V_1 = \int_{x_1}^{x_2} v(x) \left[ -(p u')' + q u -\lambda r u \right] \, dx + v(x_1)[ -p(x_1)u'(x_1) + a_1 u(x_1)] + v(x_2) [p(x_2) u'(x_2) + a_2 u(x_2)]. </math>
If we first require that <math>v</math> vanish at the endpoints, the first variation will vanish for all such <math>v</math> only if
If we first require that <math>v</math> vanish at the endpoints, the first variation will vanish for all such <math>v</math> only if
<math display="block">-(p u')' + q u -\lambda r u =0 \quad \hbox{for} \quad x_1 < x < x_2.</math>
<math display="block">-(p u')' + q u -\lambda r u =0 \quad \hbox{for} \quad x_1 < x < x_2.</math>
If <math>u</math> satisfies this condition, then the first variation will vanish for arbitrary <math>v</math> only if
If <math>u</math> satisfies this condition, then the first variation will vanish for arbitrary <math>v</math> only if
<math display="block">-p(x_1)u'(x_1) + a_1 u(x_1)=0, \quad \hbox{and} \quad p(x_2) u'(x_2) + a_2 u(x_2)=0.</math>
<math display="block">-p(x_1)u'(x_1) + a_1 u(x_1)=0, \quad \hbox{and} \quad p(x_2) u'(x_2) + a_2 u(x_2)=0.</math>
These latter conditions are the '''natural boundary conditions''' for this problem, since they are not imposed on trial functions for the minimization, but are instead a consequence of the minimization.
These latter conditions are the '''natural boundary conditions''' for this problem, since they are not imposed on trial functions for the minimization, but are instead a consequence of the minimization.


=== Eigenvalue problems in several dimensions ===
Eigenvalue problems in higher dimensions are defined in analogy with the one-dimensional case. For example, given a domain <math>D</math> with boundary <math>B</math> in three dimensions we may define
<math display="block">Q[\varphi] = \iiint_D p(X) \nabla \varphi \cdot \nabla \varphi + q(X) \varphi^2 \, dx \, dy \, dz + \iint_B \sigma(S) \varphi^2 \, dS, </math>
and
<math display="block">R[\varphi] = \iiint_D r(X) \varphi(X)^2 \, dx \, dy \, dz.</math>
Let <math>u</math> be the function that minimizes the quotient <math>Q[\varphi] / R[\varphi]</math>, with no condition prescribed on the boundary <math>B.</math> The Euler–Lagrange equation satisfied by <math>u</math> is
<math display="block">-\nabla \cdot (p(X) \nabla u) + q(X) u - \lambda r(X) u=0,</math>
where
<math display="block">\lambda = \frac{Q[u]}{R[u]}.</math>
The minimizing <math>u</math> must also satisfy the natural boundary condition
<math display="block">p(S) \frac{\partial u}{\partial n} + \sigma(S) u = 0,</math>
on the boundary <math>B.</math> This result depends upon the regularity theory for elliptic partial differential equations; see Jost and Li–Jost (1998) for details. Many extensions, including completeness results, asymptotic properties of the eigenvalues and results concerning the nodes of the eigenfunctions are in Courant and Hilbert (1953).


=== Optics ===
[[Fermat's principle]] states that light takes a path that (locally) minimizes the optical length between its endpoints. If the <math>x</math>-coordinate is chosen as the parameter along the path, and <math>y=f(x)</math> along the path, then the optical length is given by
<math display="block">A[f] = \int_{x_0}^{x_1} n(x,f(x)) \sqrt{1 + f'(x)^2} \, dx, </math>
where the refractive index <math>n(x,y)</math> depends upon the material.
If we try <math>f(x) = f_0 (x) + \varepsilon f_1 (x)</math> then the [[first variation]] of <math>A</math> (the derivative of <math>A</math> with respect to <math>\varepsilon</math>) is
<math display="block">\delta A[f_0,f_1] = \int_{x_0}^{x_1} \left[ \frac{ n(x,f_0) f_0'(x) f_1'(x)}{\sqrt{1 + f_0'(x)^2}} + n_y (x,f_0) f_1 \sqrt{1 + f_0'(x)^2} \right] dx.</math>


After integration by parts of the first term within brackets, we obtain the Euler–Lagrange equation
<math display="block">-\frac{d}{dx} \left[\frac{ n(x,f_0) f_0'}{\sqrt{1 + f_0'^2}} \right] + n_y (x,f_0) \sqrt{1 + f_0'(x)^2} = 0. </math>


==== Snell's law ====
There is a discontinuity of the refractive index when light enters or leaves a lens. Let
<math display="block">n(x,y) = \begin{cases}
n_{(-)} & \text{if} \quad x<0, \\
n_{(+)} & \text{if} \quad x>0,
\end{cases}</math>
where <math>n_{(-)}</math> and <math>n_{(+)}</math> are constants. Then the Euler–Lagrange equation holds as before in the region where <math>x < 0</math> or <math>x > 0</math>, and in fact the path is a straight line there, since the refractive index is constant. At <math>x = 0</math>, <math>f</math> must be continuous, but <math>f'</math> may be discontinuous. After integration by parts in the separate regions and using the Euler–Lagrange equations, the first variation takes the form
<math display="block">\delta A[f_0,f_1] = f_1(0)\left[ n_{(-)}\frac{f_0'(0^-)}{\sqrt{1 + f_0'(0^-)^2}} - n_{(+)}\frac{f_0'(0^+)}{\sqrt{1 + f_0'(0^+)^2}} \right].</math>


==== Fermat's principle in three dimensions ====
It is expedient to use vector notation: let <math>X = (x_1,x_2,x_3),</math> let <math>t</math> be a parameter, let <math>X(t)</math> be the parametric representation of a curve <math>C,</math> and let <math>\dot X(t)</math> be its tangent vector. The optical length of the curve is given by
<math display="block">A[C] = \int_{t_0}^{t_1} n(X) \sqrt{ \dot X \cdot \dot X} \, dt. </math>

Note that this integral is invariant with respect to changes in the parametric representation of <math>C.</math> The Euler–Lagrange equations for a minimizing curve have the symmetric form
<math display="block">\frac{d}{dt} P = \sqrt{ \dot X \cdot \dot X} \, \nabla n, </math>
where
<math display="block">P = \frac{n(X) \dot X}{\sqrt{\dot X \cdot \dot X} }.</math>

It follows from the definition that <math>P</math> satisfies
<math display="block">P \cdot P = n(X)^2. </math>

Therefore, the integral may also be written as
<math display="block">A[C] = \int_{t_0}^{t_1} P \cdot \dot X \, dt.</math>

This form suggests that if we can find a function <math>\psi</math> whose gradient is given by <math>P,</math> then the integral <math>A</math> is given by the difference of <math>\psi</math> at the endpoints of the interval of integration. Thus the problem of studying the curves that make the integral stationary can be related to the study of the level surfaces of <math>\psi</math>. In order to find such a function, we turn to the wave equation, which governs the propagation of light. This formalism is used in the context of [[Lagrangian optics]] and [[Hamiltonian optics]].


===== Connection with the wave equation =====
The [[wave equation]] for an inhomogeneous medium is
<math display="block">u_{tt} = c^2 \nabla \cdot \nabla u, </math>
where <math>c</math> is the velocity, which generally depends upon <math>X</math>. Wave fronts for light are characteristic surfaces for this partial differential equation: they satisfy
<math display="block">\varphi_t^2 = c(X)^2 \, \nabla \varphi \cdot \nabla \varphi. </math>

We may look for solutions in the form
<math display="block">\varphi(t,X) = t - \psi(X). </math>

In that case, <math>\psi</math> satisfies
<math display="block">\nabla \psi \cdot \nabla \psi = n^2, </math>
where <math>n=1/c</math>. According to the theory of [[first-order partial differential equation]]s, if <math>P = \nabla \psi,</math> then <math>P</math> satisfies
<math display="block">\frac{dP}{ds} = n \, \nabla n,</math>
along a system of curves ('''the light rays''') that are given by
<math display="block">\frac{dX}{ds} = P. </math>

These equations for solution of a first-order partial differential equation are identical to the Euler–Lagrange equations if we make the identification
<math display="block">\frac{ds}{dt} = \frac{\sqrt{ \dot X \cdot \dot X} }{n}. </math>


=== Mechanics ===
{{main|Action (physics)}}
In classical mechanics, the action, <math>S,</math> is defined as the time integral of the Lagrangian, <math>L</math>. The Lagrangian is the difference of energies,
<math display="block">L = T - U, </math>
where <math>T</math> is the [[kinetic energy]] of a mechanical system and <math>U</math> its [[potential energy]]. [[Hamilton's principle]] (or the action principle) states that the motion of a conservative holonomic (integrable constraints) mechanical system is such that the action integral
<math display="block">S = \int_{t_0}^{t_1} L(x, \dot x, t) \, dt</math>
is stationary with respect to variations in the path <math>x(t)</math>.
The Euler–Lagrange equations for this system are known as Lagrange's equations:
<math display="block">\frac{d}{dt} \frac{\partial L}{\partial \dot x} = \frac{\partial L}{\partial x}, </math>
and they are equivalent to Newton's equations of motion (for such systems).
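Hamilton's principle can be checked numerically by discretizing the action and minimizing it over paths with fixed endpoints. In the sketch below, the harmonic oscillator with <math>m = k = 1</math> and the short time interval are illustrative choices of this example (on a short enough interval the stationary path is an actual minimum); the minimizer tracks the exact trajectory <math>x(t) = \sin t</math>:
<syntaxhighlight lang="python">
# Discretized action for the harmonic oscillator (m = k = 1): minimize over
# interior path values with fixed endpoints and compare with x(t) = sin(t).
import numpy as np
from scipy.optimize import minimize

t = np.linspace(0.0, 1.0, 101)  # short interval: the stationary path is a minimum
dt = t[1] - t[0]
x0, x1 = 0.0, np.sin(1.0)       # endpoints taken on the exact trajectory

def action(interior):
    x = np.concatenate([[x0], interior, [x1]])
    v = np.diff(x) / dt                          # velocities on the subintervals
    xm = (x[:-1] + x[1:]) / 2                    # midpoint values for U
    return np.sum((v**2 / 2 - xm**2 / 2) * dt)   # integral of T - U

guess = np.linspace(x0, x1, t.size)[1:-1]
path = minimize(action, guess, method='L-BFGS-B').x
print(np.max(np.abs(path - np.sin(t[1:-1]))))   # small discretization error
</syntaxhighlight>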


The conjugate momenta <math>p</math> are defined by
<math display="block">p = \frac{\partial L}{\partial \dot x}. </math>
For example, if
<math display="block">T = \frac{1}{2} m \dot x^2, </math>
then <math display="block">p = m \dot x. </math>
[[Hamiltonian mechanics]] results if the conjugate momenta are introduced in place of <math>\dot x</math> by a Legendre transformation of the Lagrangian <math>L</math> into the Hamiltonian <math>H</math> defined by
<math display="block">H(x, p, t) = p \,\dot x - L(x,\dot x, t).</math>
The Hamiltonian is the total energy of the system: <math>H = T + U</math>.
Analogy with Fermat's principle suggests that solutions of Lagrange's equations (the particle trajectories) may be described in terms of level surfaces of some function of <math>X</math>. This function is a solution of the [[Hamilton–Jacobi equation]]:
<math display="block">\frac{\partial \psi}{\partial t} + H\left(x,\frac{\partial \psi}{\partial x},t\right) = 0.</math>


== Variations and sufficient condition for a minimum ==


Calculus of variations is concerned with variations of functionals, which are small changes in the functional's value due to small changes in the function that is its argument.  The '''first variation'''{{efn|name=AltFirst| The first variation is also called the variation, differential, or first differential.}} is defined as the linear part of the change in the functional, and the '''[[second variation]]'''{{efn|name=AltSecond| The second variation is also called the second differential.}} is defined as the quadratic part.<ref name='GelfandFominP11–12,99'>{{harvnb|Gelfand|Fomin|2000|pp=11–12, 99}}</ref>

For example, if <math>J[y]</math> is a functional with the function <math>y = y(x)</math> as its argument, and there is a small change in its argument from <math>y</math> to <math>y + h,</math> where <math>h = h(x)</math> is a function in the same function space as <math>y,</math> then the corresponding change in the functional is{{efn|name=SimplifyNotation|Note that <math>\Delta J[h]</math> and the variations below, depend on both <math>y</math> and <math>h.</math> The argument <math>y</math> has been left out to simplify the notation. For example, <math>\Delta J[h]</math> could have been written <math>\Delta J[y; h].</math><ref name='GelfandFominP12FN6'>{{harvnb | Gelfand|Fomin|2000 | p=12, footnote 6}}</ref>}}
<math display="block">\Delta J[h] = J[y+h] - J[y].</math>

The functional <math>J[y]</math> is said to be '''differentiable''' if
<math display="block">\Delta J[h] = \varphi [h] + \varepsilon \|h\|,</math>
where <math>\varphi[h]</math> is a linear functional,{{efn|name=Linear|A functional <math>\varphi[h]</math> is said to be '''linear''' if <math>\varphi[\alpha h] = \alpha \varphi[h]</math> &nbsp; and &nbsp; <math>\varphi\left[h + h_2\right] = \varphi[h] + \varphi\left[h_2\right],</math> where <math>h, h_2</math> are functions and <math>\alpha</math> is a real number.<ref name='GelfandFominP8'>{{harvnb | Gelfand|Fomin| 2000 | p=8 }}</ref>}} <math>\|h\|</math> is the norm of <math>h,</math>{{efn|name=Norm| For a function <math>h = h(x)</math> that is defined for <math>a \leq x \leq b,</math> where <math>a</math> and <math>b</math> are real numbers, the norm of <math>h</math> is its maximum absolute value, i.e. <math>\|h\| = \displaystyle\max_{a \leq x \leq b} |h(x)|.</math><ref name='GelfandFominP6'>{{harvnb | Gelfand|Fomin| 2000 | p=6 }}</ref>}} and <math>\varepsilon \to 0</math> as <math>\|h\| \to 0.</math> The linear functional <math>\varphi[h]</math> is the first variation of <math>J[y]</math> and is denoted by,<ref name='GelfandFominP11–12'>{{harvnb | Gelfand|Fomin| 2000 | pp=11–12}}</ref>
<math display="block">\delta J[h] = \varphi[h].</math>

The functional <math>J[y]</math> is said to be '''twice differentiable''' if
<math display="block">\Delta J[h] = \varphi_1 [h] + \varphi_2 [h] + \varepsilon \|h\|^2,</math>
where <math>\varphi_1[h]</math> is a linear functional (the first variation), <math>\varphi_2[h]</math> is a quadratic functional,{{efn|name=Quadratic| A functional is said to be '''quadratic''' if it is a bilinear functional with two argument functions that are equal. A '''bilinear functional''' is a functional that depends on two argument functions and is linear when each argument function in turn is fixed while the other argument function is variable.<ref name='GelfandFominP97–98'>{{harvnb | Gelfand|Fomin| 2000 | pp=97–98 }}</ref>}} and <math>\varepsilon \to 0</math> as <math>\|h\| \to 0.</math> The quadratic functional <math>\varphi_2[h]</math> is the second variation of <math>J[y]</math> and is denoted by,<ref name='GelfandFominP99'>{{harvnb | Gelfand|Fomin| 2000 | p=99 }}</ref>
<math display="block">\delta^2 J[h] = \varphi_2[h].</math>

The second variation <math>\delta^2 J[h]</math> is said to be '''strongly positive''' if
<math display="block">\delta^2J[h] \ge k \|h\|^2,</math>
for all <math>h</math> and for some constant <math>k > 0</math>.<ref name='GelfandFominP100'>{{harvnb | Gelfand|Fomin| 2000 | p=100 }}</ref>
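For a concrete functional these parts can be computed exactly. With <math>J[y] = \int y'^2 \, dx</math> (an illustrative example, not from the text), expanding <math>J[y+h]</math> shows that <math>\Delta J[h]</math> splits into a linear part <math>2\int y'h' \, dx</math> and a quadratic part <math>\int h'^2 \, dx,</math> with no remainder; the sketch below confirms the split numerically:
<syntaxhighlight lang="python">
# For J[y] = integral of y'^2: Delta J[h] = 2*int(y'h') + int(h'^2) exactly,
# i.e. first variation plus second variation.
import numpy as np

x = np.linspace(0.0, 1.0, 10001)
dx = x[1] - x[0]
w = np.full(x.size, dx); w[0] = w[-1] = dx / 2   # trapezoidal weights

y = np.sin(np.pi * x)
h = 1e-3 * x**2 * (1 - x)        # a small variation vanishing at the endpoints

d = lambda f: np.gradient(f, x)
J = lambda f: np.sum(w * d(f)**2)

delta = J(y + h) - J(y)
first = 2 * np.sum(w * d(y) * d(h))   # linear part: the first variation
second = np.sum(w * d(h)**2)          # quadratic part: the second variation
print(delta - (first + second))       # zero up to floating-point round-off
</syntaxhighlight>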


== Further reading ==
* Giaquinta, Mariano; Hildebrandt, Stefan: Calculus of Variations I and II, Springer-Verlag, {{ISBN|978-3-662-03278-7}} and {{ISBN|978-3-662-06201-2}}
* Jost, J. and X. Li-Jost: [https://books.google.com/books?id=QN8Iw7fUA-8C Calculus of Variations]. Cambridge University Press, 1998.
* Lanczos, Cornelius: The Variational Principles of Mechanics (dedicated to Albert Einstein), University of Toronto Press, {{ISBN|0-8020-1743-6}}, followed by 1962, 1966, 1970 editions. {{ISBN|0-486-65067-7}}
* Lebedev, L.P. and Cloud, M.J.: [https://books.google.com/books?id=_T3ez-32YVsC The Calculus of Variations and Functional Analysis with Optimal Control and Applications in Mechanics], World Scientific, 2003, pages 1–98.
* Logan, J. David: [https://books.google.com/books?id=nUk_AQAAIAAJ Applied Mathematics], 3rd edition. Wiley-Interscience, 2006

Latest revision as of 21:37, 10 November 2025


The calculus of variations (or variational calculus) is a field of mathematical analysis that uses variations, which are small changes in functions and functionals, to find maxima and minima of functionals: mappings from a set of functions to the real numbers. Functionals are often expressed as definite integrals involving functions and their derivatives. Functions that maximize or minimize functionals may be found using the Euler–Lagrange equation of the calculus of variations.

A simple example of such a problem is to find the curve of shortest length connecting two points. If there are no constraints, the solution is a straight line between the points. However, if the curve is constrained to lie on a surface in space, then the solution is less obvious, and possibly many solutions may exist. Such solutions are known as geodesics. A related problem is posed by Fermat's principle: light follows the path of shortest optical length connecting two points, which depends upon the material of the medium. One corresponding concept in mechanics is the principle of least/stationary action.

Many important problems involve functions of several variables. Solutions of boundary value problems for the Laplace equation satisfy Dirichlet's principle. Plateau's problem requires finding a surface of minimal area that spans a given contour in space: a solution can often be found by dipping a frame in soapy water. Although such experiments are relatively easy to perform, their mathematical formulation is far from simple: there may be more than one locally minimizing surface, and they may have non-trivial topology.

History

The calculus of variations began with the work of Isaac Newton, such as with Newton's minimal resistance problem, which he formulated and solved in 1685, and later published in his Principia in 1687,[1] which was the first problem in the field to be formulated and correctly solved,[1] and was also one of the most difficult problems tackled by variational methods prior to the twentieth century.[2][3][4] This problem was followed by the brachistochrone curve problem raised by Johann Bernoulli (1696),[5] which was similar to one raised by Galileo Galilei in 1638, but he did not solve the problem explicitly nor did he use the methods based on calculus.[2] Bernoulli solved the problem using the principle of least time in the process, but not calculus of variations. In 1697 Newton solved the problem using variational techniques, and as a result, he pioneered the field with his work on the two problems.[3] The problem would immediately occupy the attention of Jacob Bernoulli and the Marquis de l'Hôpital, but Leonhard Euler first elaborated the subject, beginning in 1733. Joseph-Louis Lagrange was influenced by Euler's work to contribute greatly to the theory. After Euler saw the 1755 work of the 19-year-old Lagrange, Euler dropped his own partly geometric approach in favor of Lagrange's purely analytic approach and renamed the subject the calculus of variations in his 1756 lecture Elementa Calculi Variationum.[6][7]

Adrien-Marie Legendre (1786) laid down a method, not entirely satisfactory, for the discrimination of maxima and minima. Isaac Newton and Gottfried Leibniz also gave some early attention to the subject.[8] To this discrimination Vincenzo Brunacci (1810), Carl Friedrich Gauss (1829), Siméon Poisson (1831), Mikhail Ostrogradsky (1834), and Carl Jacobi (1837) have been among the contributors. An important general work is that of Pierre Frédéric Sarrus (1842) which was condensed and improved by Augustin-Louis Cauchy (1844). Other valuable treatises and memoirs have been written by Strauch (1849), John Hewitt Jellett (1850), Otto Hesse (1857), Alfred Clebsch (1858), and Lewis Buffett Carll (1885), but perhaps the most important work of the century is that of Karl Weierstrass. His celebrated course on the theory is epoch-making, and it may be asserted that he was the first to place it on a firm and unquestionable foundation. The 20th and the 23rd Hilbert problem published in 1900 encouraged further development.[8]

In the 20th century David Hilbert, Oskar Bolza, Gilbert Ames Bliss, Emmy Noether, Leonida Tonelli, Henri Lebesgue and Jacques Hadamard among others made significant contributions.[8] Marston Morse applied calculus of variations in what is now called Morse theory.[9] Lev Pontryagin, Ralph Rockafellar and F. H. Clarke developed new mathematical tools for the calculus of variations in optimal control theory.[9] The dynamic programming of Richard Bellman is an alternative to the calculus of variations.[10][11][12]

Extrema

The calculus of variations is concerned with the maxima or minima (collectively called extrema) of functionals. A functional maps functions to scalars, so functionals have been described as "functions of functions." Functionals have extrema with respect to the elements <math>y</math> of a given function space defined over a given domain. A functional <math>J[y]</math> is said to have an extremum at the function <math>f</math> if <math>\Delta J = J[y] - J[f]</math> has the same sign for all <math>y</math> in an arbitrarily small neighborhood of <math>f.</math> The function <math>f</math> is called an extremal function or extremal. The extremum <math>J[f]</math> is called a local maximum if <math>\Delta J \leq 0</math> everywhere in an arbitrarily small neighborhood of <math>f,</math> and a local minimum if <math>\Delta J \geq 0</math> there. For a function space of continuous functions, extrema of corresponding functionals are called strong extrema or weak extrema, depending on whether the first derivatives of the continuous functions are respectively all continuous or not.[13]

[Image: Examples where calculus of variations can be applied: finding minimal surfaces, finding geodesics, deriving Snell's law of refraction, and obtaining an equation to solve the double pendulum problem numerically.]

Both strong and weak extrema of functionals are for a space of continuous functions but strong extrema have the additional requirement that the first derivatives of the functions in the space be continuous. Thus a strong extremum is also a weak extremum, but the converse may not hold. Finding strong extrema is more difficult than finding weak extrema.[14] An example of a necessary condition that is used for finding weak extrema is the Euler–Lagrange equation.[15]

Euler–Lagrange equation

Script error: No such module "Labelled list hatnote". Finding the extrema of functionals is similar to finding the maxima and minima of functions. The maxima and minima of a function may be located by finding the points where its derivative vanishes (i.e., is equal to zero). The extrema of functionals may be obtained by finding functions for which the functional derivative is equal to zero. This leads to solving the associated Euler–Lagrange equation.Template:Efn

Consider the functional
<math display="block">J[y] = \int_{x_1}^{x_2} L(x, y(x), y'(x)) \, dx,</math>
where
* <math>x_1, x_2</math> are constants,
* <math>y(x)</math> is twice continuously differentiable,
* <math>y'(x) = \frac{dy}{dx},</math>
* <math>L(x, y(x), y'(x))</math> is twice continuously differentiable with respect to its arguments <math>x, y,</math> and <math>y'.</math>

If the functional <math>J[y]</math> attains a local minimum at <math>f,</math> and <math>\eta(x)</math> is an arbitrary function that has at least one derivative and vanishes at the endpoints <math>x_1</math> and <math>x_2,</math> then for any number <math>\varepsilon</math> close to 0,
<math display="block">J[f] \leq J[f + \varepsilon \eta].</math>
The term <math>\varepsilon \eta</math> is called the variation of the function <math>f</math> and is denoted by <math>\delta f.</math>[16]

Substituting <math>f + \varepsilon \eta</math> for <math>y</math> in the functional <math>J[y],</math> the result is a function of <math>\varepsilon,</math>
<math display="block">\Phi(\varepsilon) = J[f + \varepsilon \eta].</math>
Since the functional <math>J[y]</math> has a minimum for <math>y = f,</math> the function <math>\Phi(\varepsilon)</math> has a minimum at <math>\varepsilon = 0</math> and thus,
<math display="block">\Phi'(0) \equiv \left. \frac{d\Phi}{d\varepsilon} \right|_{\varepsilon = 0} = \int_{x_1}^{x_2} \left. \frac{dL}{d\varepsilon} \right|_{\varepsilon = 0} dx = 0.</math>

Taking the total derivative of <math>L[x, y, y'],</math> where <math>y = f + \varepsilon \eta</math> and <math>y' = f' + \varepsilon \eta'</math> are considered as functions of <math>\varepsilon</math> rather than <math>x,</math> yields
<math display="block">\frac{dL}{d\varepsilon} = \frac{\partial L}{\partial y} \frac{dy}{d\varepsilon} + \frac{\partial L}{\partial y'} \frac{dy'}{d\varepsilon}</math>
and because <math>\frac{dy}{d\varepsilon} = \eta</math> and <math>\frac{dy'}{d\varepsilon} = \eta',</math>
<math display="block">\frac{dL}{d\varepsilon} = \frac{\partial L}{\partial y} \eta + \frac{\partial L}{\partial y'} \eta'.</math>
Therefore,
<math display="block">\begin{align}
\int_{x_1}^{x_2} \left. \frac{dL}{d\varepsilon} \right|_{\varepsilon = 0} dx
 &= \int_{x_1}^{x_2} \left( \frac{\partial L}{\partial f} \eta + \frac{\partial L}{\partial f'} \eta' \right) dx \\
 &= \int_{x_1}^{x_2} \frac{\partial L}{\partial f} \eta \, dx + \left. \frac{\partial L}{\partial f'} \eta \right|_{x_1}^{x_2} - \int_{x_1}^{x_2} \eta \frac{d}{dx} \frac{\partial L}{\partial f'} \, dx \\
 &= \int_{x_1}^{x_2} \left( \frac{\partial L}{\partial f} \eta - \eta \frac{d}{dx} \frac{\partial L}{\partial f'} \right) dx
\end{align}</math>
where <math>L[x, y, y'] \to L[x, f, f']</math> when <math>\varepsilon = 0,</math> and we have used integration by parts on the second term. The second term on the second line vanishes because <math>\eta = 0</math> at <math>x_1</math> and <math>x_2</math> by definition. Also, as previously mentioned the left side of the equation is zero so that
<math display="block">\int_{x_1}^{x_2} \eta(x) \left( \frac{\partial L}{\partial f} - \frac{d}{dx} \frac{\partial L}{\partial f'} \right) dx = 0.</math>

According to the fundamental lemma of calculus of variations, the fact that this equation holds for any choice of <math>\eta</math> implies that the part of the integrand in parentheses is zero, i.e.
<math display="block">\frac{\partial L}{\partial f} - \frac{d}{dx} \frac{\partial L}{\partial f'} = 0,</math>
which is called the Euler–Lagrange equation. The left hand side of this equation is called the functional derivative of <math>J[f]</math> and is denoted <math>\delta J / \delta f(x).</math>

In general this gives a second-order ordinary differential equation which can be solved to obtain the extremal function f(x). The Euler–Lagrange equation is a necessary, but not sufficient, condition for an extremum J[f]. A sufficient condition for a minimum is given in the section Variations and sufficient condition for a minimum.
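SymPy ships a helper, <code>euler_equations</code>, that mechanizes exactly this computation. A minimal sketch follows; the arc-length integrand anticipates the example below, and everything else is a standard SymPy call:
<syntaxhighlight lang="python">
# Derive the Euler-Lagrange equation for the arc-length integrand
# L = sqrt(1 + f'(x)^2) symbolically.
import sympy as sp

x = sp.symbols('x')
f = sp.Function('f')
L = sp.sqrt(1 + f(x).diff(x)**2)

eq = sp.euler_equations(L, [f(x)], [x])[0]
print(sp.simplify(eq))  # equivalent to f''(x) = 0, i.e. straight lines
</syntaxhighlight>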

Example

In order to illustrate this process, consider the problem of finding the extremal function <math>y = f(x),</math> which is the shortest curve that connects two points <math>(x_1, y_1)</math> and <math>(x_2, y_2).</math> The arc length of the curve is given by
<math display="block">A[y] = \int_{x_1}^{x_2} \sqrt{1 + [y'(x)]^2} \, dx,</math>
with
<math display="block">y'(x) = \frac{dy}{dx}, \quad y_1 = f(x_1), \quad y_2 = f(x_2).</math>

Note that assuming <math>y</math> is a function of <math>x</math> loses generality; ideally both should be a function of some other parameter. This approach is good solely for instructive purposes.

The Euler–Lagrange equation will now be used to find the extremal function <math>f(x)</math> that minimizes the functional <math>A[y],</math>
<math display="block">\frac{\partial L}{\partial f} - \frac{d}{dx} \frac{\partial L}{\partial f'} = 0</math>
with
<math display="block">L = \sqrt{1 + [f'(x)]^2}.</math>

Since <math>f</math> does not appear explicitly in <math>L,</math> the first term in the Euler–Lagrange equation vanishes for all <math>f(x)</math> and thus,
<math display="block">\frac{d}{dx} \frac{\partial L}{\partial f'} = 0.</math>
Substituting for <math>L</math> and taking the derivative,
<math display="block">\frac{d}{dx} \ \frac{f'(x)}{\sqrt{1 + [f'(x)]^2}} \ = 0.</math>

Thus
<math display="block">\frac{f'(x)}{\sqrt{1 + [f'(x)]^2}} = c,</math>
for some constant <math>c.</math> Then
<math display="block">\frac{[f'(x)]^2}{1 + [f'(x)]^2} = c^2,</math>
where
<math display="block">0 \le c^2 < 1.</math>
Solving, we get
<math display="block">[f'(x)]^2 = \frac{c^2}{1 - c^2},</math>
which implies that
<math display="block">f'(x) = m</math>
is a constant and therefore that the shortest curve that connects two points <math>(x_1, y_1)</math> and <math>(x_2, y_2)</math> is
<math display="block">f(x) = m x + b \qquad \text{with} \ \ m = \frac{y_2 - y_1}{x_2 - x_1} \quad \text{and} \quad b = \frac{x_2 y_1 - x_1 y_2}{x_2 - x_1},</math>
and we have thus found the extremal function <math>f(x)</math> that minimizes the functional <math>A[y]</math> so that <math>A[f]</math> is a minimum. The equation for a straight line is <math>y = mx + b.</math> In other words, the shortest distance between two points is a straight line.
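The same conclusion can be reached by direct minimization: discretize the arc-length functional and minimize over the interior values of the curve. The endpoints below are arbitrary illustrative choices:
<syntaxhighlight lang="python">
# Minimize the discretized arc length of curves joining (0, 0) and (1, 2);
# the minimizer should approach the straight line y = 2x.
import numpy as np
from scipy.optimize import minimize

x = np.linspace(0.0, 1.0, 51)
y0, y1 = 0.0, 2.0

def arc_length(interior):
    y = np.concatenate([[y0], interior, [y1]])
    return np.sum(np.hypot(np.diff(x), np.diff(y)))

interior = minimize(arc_length, np.zeros(x.size - 2), method='L-BFGS-B').x
print(np.max(np.abs(interior - 2.0 * x[1:-1])))  # near zero
</syntaxhighlight>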

Beltrami's identity

In physics problems it may be the case that <math>\frac{\partial L}{\partial x} = 0,</math> meaning the integrand is a function of <math>f(x)</math> and <math>f'(x)</math> but <math>x</math> does not appear separately. In that case, the Euler–Lagrange equation can be simplified to the Beltrami identity[17]
<math display="block">L - f' \frac{\partial L}{\partial f'} = C,</math>
where <math>C</math> is a constant. The left hand side is the Legendre transformation of <math>L</math> with respect to <math>f'(x).</math>

The intuition behind this result is that, if the variable <math>x</math> is actually time, then the statement <math>\frac{\partial L}{\partial x} = 0</math> implies that the Lagrangian is time-independent. By Noether's theorem, there is an associated conserved quantity. In this case, this quantity is the Hamiltonian, the Legendre transform of the Lagrangian, which (often) coincides with the energy of the system. This is (minus) the constant in Beltrami's identity.

Euler–Poisson equation

If <math>S</math> depends on higher-derivatives of <math>y(x),</math> that is, if
<math display="block">S = \int_a^b f(x, y(x), y'(x), \dots, y^{(n)}(x)) \, dx,</math>
then <math>y</math> must satisfy the Euler–Poisson equation,[18]
<math display="block">\frac{\partial f}{\partial y} - \frac{d}{dx} \left( \frac{\partial f}{\partial y'} \right) + \dots + (-1)^n \frac{d^n}{dx^n} \left[ \frac{\partial f}{\partial y^{(n)}} \right] = 0.</math>
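The <code>euler_equations</code> helper used earlier also handles higher derivatives, so the Euler–Poisson equation can be generated mechanically. For the illustrative integrand <math>y''(x)^2</math> (a choice made here for demonstration) it returns the beam equation <math>y''''(x) = 0</math>:
<syntaxhighlight lang="python">
# Euler-Poisson equation for the integrand y''(x)^2 via SymPy.
import sympy as sp

x = sp.symbols('x')
y = sp.Function('y')
print(sp.euler_equations(y(x).diff(x, 2)**2, [y(x)], [x])[0])
# Eq(2*Derivative(y(x), (x, 4)), 0), i.e. y''''(x) = 0
</syntaxhighlight>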

Du Bois-Reymond's theorem

The discussion thus far has assumed that extremal functions possess two continuous derivatives, although the existence of the integral <math>J</math> requires only first derivatives of trial functions. The condition that the first variation vanishes at an extremal may be regarded as a weak form of the Euler–Lagrange equation. The theorem of Du Bois-Reymond asserts that this weak form implies the strong form. If <math>L</math> has continuous first and second derivatives with respect to all of its arguments, and if
<math display="block">\frac{\partial^2 L}{\partial f'^2} \neq 0,</math>
then <math>f</math> has two continuous derivatives, and it satisfies the Euler–Lagrange equation.

Lavrentiev phenomenon

Hilbert was the first to give good conditions for the Euler–Lagrange equations to give a stationary solution. Within a convex area and a positive thrice differentiable Lagrangian the solutions are composed of a countable collection of sections that either go along the boundary or satisfy the Euler–Lagrange equations in the interior.

However Lavrentiev in 1926 showed that there are circumstances where there is no optimum solution but one can be approached arbitrarily closely by increasing numbers of sections. The Lavrentiev Phenomenon identifies a difference in the infimum of a minimization problem across different classes of admissible functions. For instance the following problem, presented by Manià in 1934:[19]

<math display="block">L[x] = \int_0^1 (x^3 - t)^2 x'^6 \, dt,</math>
<math display="block">A = \{x \in W^{1,1}(0,1) : x(0) = 0,\ x(1) = 1\}.</math>
Clearly, <math>x(t) = t^{\frac{1}{3}}</math> minimizes the functional, but we find that any function <math>x \in W^{1,\infty}</math> gives a value bounded away from the infimum.

Examples (in one dimension) are traditionally manifested across <math>W^{1,1}</math> and <math>W^{1,\infty},</math> but Ball and Mizel[20] procured the first functional that displayed Lavrentiev's Phenomenon across <math>W^{1,p}</math> and <math>W^{1,q}</math> for <math>1 \leq p < q < \infty.</math> There are several results that give criteria under which the phenomenon does not occur - for instance 'standard growth', a Lagrangian with no dependence on the second variable, or an approximating sequence satisfying Cesari's Condition (D) - but results are often particular, and applicable to a small class of functionals.

Connected with the Lavrentiev Phenomenon is the repulsion property: any functional displaying Lavrentiev's Phenomenon will display the weak repulsion property.[21]

Functions of several variables

For example, if <math>\varphi(x,y)</math> denotes the displacement of a membrane above the domain <math>D</math> in the <math>x,y</math> plane, then its potential energy is proportional to its surface area:
<math display="block">U[\varphi] = \iint_D \sqrt{1 + \nabla \varphi \cdot \nabla \varphi} \, dx \, dy.</math>

Plateau's problem consists of finding a function that minimizes the surface area while assuming prescribed values on the boundary of <math>D</math>; the solutions are called minimal surfaces. The Euler–Lagrange equation for this problem is nonlinear:
<math display="block">\varphi_{xx} \left(1 + \varphi_y^2\right) + \varphi_{yy} \left(1 + \varphi_x^2\right) - 2 \varphi_x \varphi_y \varphi_{xy} = 0.</math>

See Courant (1950) for details.

Dirichlet's principle

It is often sufficient to consider only small displacements of the membrane, whose energy difference from no displacement is approximated by
<math display="block">V[\varphi] = \frac{1}{2} \iint_D \nabla \varphi \cdot \nabla \varphi \, dx \, dy.</math>
The functional <math>V</math> is to be minimized among all trial functions <math>\varphi</math> that assume prescribed values on the boundary of <math>D.</math> If <math>u</math> is the minimizing function and <math>v</math> is an arbitrary smooth function that vanishes on the boundary of <math>D,</math> then the first variation of <math>V[u + \varepsilon v]</math> must vanish:
<math display="block">\left. \frac{d}{d\varepsilon} V[u + \varepsilon v] \right|_{\varepsilon = 0} = \iint_D \nabla u \cdot \nabla v \, dx \, dy = 0.</math>

Provided that <math>u</math> has two derivatives, we may apply the divergence theorem to obtain
<math display="block">\iint_D \nabla \cdot (v \nabla u) \, dx \, dy = \iint_D \nabla u \cdot \nabla v + v \nabla \cdot \nabla u \, dx \, dy = \int_C v \frac{\partial u}{\partial n} \, ds,</math>
where <math>C</math> is the boundary of <math>D,</math> <math>s</math> is arclength along <math>C</math> and <math>\partial u / \partial n</math> is the normal derivative of <math>u</math> on <math>C.</math> Since <math>v</math> vanishes on <math>C</math> and the first variation vanishes, the result is
<math display="block">\iint_D v \nabla \cdot \nabla u \, dx \, dy = 0</math>
for all smooth functions <math>v</math> that vanish on the boundary of <math>D.</math> The proof for the case of one dimensional integrals may be adapted to this case to show that
<math display="block">\nabla \cdot \nabla u = 0 \quad \text{in } D.</math>

The difficulty with this reasoning is the assumption that the minimizing function <math>u</math> must have two derivatives. Riemann argued that the existence of a smooth minimizing function was assured by the connection with the physical problem: membranes do indeed assume configurations with minimal potential energy. Riemann named this idea the Dirichlet principle in honor of his teacher Peter Gustav Lejeune Dirichlet. However Weierstrass gave an example of a variational problem with no solution: minimize
<math display="block">W[\varphi] = \int_{-1}^{1} (x \varphi')^2 \, dx</math>
among all functions <math>\varphi</math> that satisfy <math>\varphi(-1) = -1</math> and <math>\varphi(1) = 1.</math> <math>W</math> can be made arbitrarily small by choosing piecewise linear functions that make a transition between −1 and 1 in a small neighborhood of the origin. However, there is no function that makes <math>W = 0.</math> Eventually it was shown that Dirichlet's principle is valid, but it requires a sophisticated application of the regularity theory for elliptic partial differential equations; see Jost and Li–Jost (1998).
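The following numerical sketch (the mesh and transition widths are illustrative choices) makes Weierstrass's point concrete: the piecewise linear trial functions drive <math>W</math> toward zero, even though no admissible function attains <math>W = 0</math>:
<syntaxhighlight lang="python">
# W for piecewise linear trial functions that ramp from -1 to 1 on (-eps, eps);
# for these trial functions W = 2*eps/3 analytically, so W shrinks with eps.
import numpy as np

def W(eps, num=200001):
    x = np.linspace(-1.0, 1.0, num)
    dx = x[1] - x[0]
    phi = np.clip(x / eps, -1.0, 1.0)     # phi(-1) = -1 and phi(1) = 1 hold
    dphi = np.gradient(phi, x)
    f = (x * dphi)**2
    return dx * (np.sum(f) - (f[0] + f[-1]) / 2)   # trapezoidal rule

for eps in (0.5, 0.1, 0.02):
    print(eps, W(eps))   # decreases toward zero; the infimum 0 is not attained
</syntaxhighlight>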

Generalization to other boundary value problems

A more general expression for the potential energy of a membrane is
<math display="block">V[\varphi] = \iint_D \left[ \frac{1}{2} \nabla \varphi \cdot \nabla \varphi + f(x,y) \varphi \right] \, dx \, dy + \int_C \left[ \frac{1}{2} \sigma(s) \varphi^2 + g(s) \varphi \right] \, ds.</math>
This corresponds to an external force density <math>f(x,y)</math> in <math>D,</math> an external force <math>g(s)</math> on the boundary <math>C,</math> and elastic forces with modulus <math>\sigma(s)</math> acting on <math>C.</math> The function that minimizes the potential energy with no restriction on its boundary values will be denoted by <math>u.</math> Provided that <math>f</math> and <math>g</math> are continuous, regularity theory implies that the minimizing function <math>u</math> will have two derivatives. In taking the first variation, no boundary condition need be imposed on the increment <math>v.</math> The first variation of <math>V[u + \varepsilon v]</math> is given by
<math display="block">\iint_D \left[ \nabla u \cdot \nabla v + f v \right] \, dx \, dy + \int_C \left[ \sigma u v + g v \right] \, ds = 0.</math>
If we apply the divergence theorem, the result is
<math display="block">\iint_D \left[ -v \nabla \cdot \nabla u + v f \right] \, dx \, dy + \int_C v \left[ \frac{\partial u}{\partial n} + \sigma u + g \right] \, ds = 0.</math>

If we first set <math>v = 0</math> on <math>C,</math> the boundary integral vanishes, and we conclude as before that
<math display="block">-\nabla \cdot \nabla u + f = 0</math>
in <math>D.</math> Then if we allow <math>v</math> to assume arbitrary boundary values, this implies that <math>u</math> must satisfy the boundary condition
<math display="block">\frac{\partial u}{\partial n} + \sigma u + g = 0,</math>
on <math>C.</math> This boundary condition is a consequence of the minimizing property of <math>u</math>: it is not imposed beforehand. Such conditions are called natural boundary conditions.

The preceding reasoning is not valid if <math>\sigma</math> vanishes identically on <math>C.</math> In such a case, we could allow a trial function <math>\varphi \equiv c,</math> where <math>c</math> is a constant. For such a trial function,
<math display="block">V[c] = c \left[ \iint_D f \, dx \, dy + \int_C g \, ds \right].</math>
By appropriate choice of <math>c,</math> <math>V</math> can assume any value unless the quantity inside the brackets vanishes. Therefore, the variational problem is meaningless unless
<math display="block">\iint_D f \, dx \, dy + \int_C g \, ds = 0.</math>
This condition implies that net external forces on the system are in equilibrium. If these forces are in equilibrium, then the variational problem has a solution, but it is not unique, since an arbitrary constant may be added. Further details and examples are in Courant and Hilbert (1953).

Eigenvalue problems

Both one-dimensional and multi-dimensional eigenvalue problems can be formulated as variational problems.

Sturm–Liouville problems

Script error: No such module "Labelled list hatnote". The Sturm–Liouville eigenvalue problem involves a general quadratic form

Q[y]=x1x2[p(x)y(x)2+q(x)y(x)2]dx,

where y is restricted to functions that satisfy the boundary conditions

y(x1)=0,y(x2)=0.

Let R be a normalization integral

R[y]=x1x2r(x)y(x)2dx.



Applications

Optics


The light rays may be determined by integrating the Euler–Lagrange equation for the optical path length derived above. This formalism is used in the context of Lagrangian optics and Hamiltonian optics.

Snell's law


In the first variation of the optical length given above, the factor multiplying <math>n_{(-)}</math> is the sine of the angle of the incident ray with the <math>x</math> axis, and the factor multiplying <math>n_{(+)}</math> is the sine of the angle of the refracted ray with the <math>x</math> axis. Snell's law for refraction requires that these terms be equal. As this calculation demonstrates, Snell's law is equivalent to vanishing of the first variation of the optical path length.


Connection with the wave equation


We conclude that the function <math>\psi</math> is the value of the minimizing integral <math>A</math> as a function of the upper end point. That is, when a family of minimizing curves is constructed, the values of the optical length satisfy the characteristic equation corresponding to the wave equation. Hence, solving the associated partial differential equation of first order is equivalent to finding families of solutions of the variational problem. This is the essential content of the Hamilton–Jacobi theory, which applies to more general variational problems.



Variations and sufficient condition for a minimum


Using the above definitions, especially the definitions of first variation, second variation, and strongly positive, the following sufficient condition for a minimum of a functional can be stated.

<templatestyles src="Template:Quote_box/styles.css" />

Sufficient condition for a minimum:

Template:Block indent

Script error: No such module "Check for unknown parameters".



Script error: No such module "Check for unknown parameters".
