{{short description|Linear feedforward neural network model}}
The '''generalized Hebbian algorithm''', also known in the literature as '''Sanger's rule''', is a linear [[feedforward neural network]] for [[unsupervised learning]] with applications primarily in [[principal components analysis]]. First defined in 1989,<ref name="Sanger89">{{cite journal |last=Sanger |first=Terence D. |author-link=Terence Sanger |year=1989 |title=Optimal unsupervised learning in a single-layer linear feedforward neural network |journal=Neural Networks |volume=2 |issue=6 |pages=459–473 |url=http://courses.cs.washington.edu/courses/cse528/09sp/sanger_pca_nn.pdf |access-date=2007-11-24 |doi=10.1016/0893-6080(89)90044-0 |citeseerx=10.1.1.128.6893}}</ref> it is similar to [[Oja's rule]] in its formulation and stability, except that it can be applied to networks with multiple outputs. The name originates from the similarity between the algorithm and a hypothesis made by [[Donald Hebb]]<ref name="Hebb 1949">{{Cite book|last=Hebb|first=D.O.|author-link=Donald Olding Hebb|title=The Organization of Behavior|publisher=Wiley & Sons|location=New York|year=1949|isbn=9781135631918|url=https://books.google.com/books?id=uyV5AgAAQBAJ}}</ref> about the way in which synaptic strengths in the brain are modified in response to experience: that changes are proportional to the correlation between the firing of pre- and post-synaptic [[neurons]].<ref name="Hertz, Krough, and Palmer, 1991">{{Cite book|last=Hertz|first=John|author2=Anders Krough|author3=Richard G. Palmer|title=Introduction to the Theory of Neural Computation|publisher=Addison-Wesley Publishing Company|location=Redwood City, CA|year=1991|isbn=978-0201515602}}</ref>

==Theory==
Consider the problem of learning a linear code for some data. Each data point is a multidimensional vector <math>x \in \R^n</math> and can be (approximately) represented as a linear sum of linear code vectors <math>w_1, \dots, w_m</math>. When <math>m = n</math>, the data can be represented exactly; when <math>m < n</math>, only approximately. To minimize the L2 loss of the representation, <math>w_1, \dots, w_m</math> should be the highest principal component vectors.
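
Explicitly, writing the latent responses as <math>y_i = w_i^{\mathrm{T}} x</math>, the code vectors should minimize the expected reconstruction loss (a minimal restatement of the objective, assuming zero-mean data):

:<math>E\left[\left\| x - \sum_{i=1}^{m} y_i w_i \right\|^2\right].</math>

This loss is minimized when <math>w_1, \dots, w_m</math> span the subspace of the top <math>m</math> principal components; the asymmetric update rule below further recovers the individual components in order.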

The generalized Hebbian algorithm is an iterative procedure for finding the highest principal component vectors, in a form that resembles [[Unsupervised learning|unsupervised]] Hebbian learning in neural networks.

Consider a one-layered neural network with <math>n</math> input neurons and <math>m</math> output neurons <math>y_1, \dots, y_m</math>. The linear code vectors are the connection strengths, that is, <math>w_{ij}</math> is the [[synaptic weight]] or connection strength between the <math>j</math>-th input and <math>i</math>-th output neurons.

The generalized Hebbian algorithm learning rule is of the form

:<math>\,\Delta w_{ij} ~ = ~ \eta y_i \left(x_j - \sum_{k=1}^{i} w_{kj} y_k \right)</math>

where <math>\eta</math> is the ''[[learning rate]]'' parameter.<ref>{{Citation |last=Gorrell |first=Genevieve |title=Generalized Hebbian Algorithm for Incremental Singular Value Decomposition in Natural Language Processing. |journal=EACL |year=2006 |citeseerx=10.1.1.102.2084}}</ref>
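
For concreteness, here is a minimal NumPy sketch of a single update step of this rule (the function name and streaming interface are illustrative, not from the cited sources):

<syntaxhighlight lang="python">
import numpy as np

def gha_step(w, x, eta):
    """One generalized Hebbian update of the m-by-n weight matrix w
    for a single input vector x, following
    Delta w_ij = eta * y_i * (x_j - sum_{k<=i} w_kj * y_k)."""
    y = w @ x                                # linear outputs, y_i = sum_j w_ij x_j
    delta = np.zeros_like(w)
    residual = np.array(x, dtype=float)
    for i in range(w.shape[0]):
        residual = residual - y[i] * w[i]    # x minus reconstructions of outputs 1..i
        delta[i] = eta * y[i] * residual
    return w + delta
</syntaxhighlight>

Repeating this step over a stream of zero-mean samples with a slowly decaying <math>\eta</math> drives the rows of <math>w</math> toward the leading principal component vectors, as described below.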

===Derivation===
In matrix form, Oja's rule can be written

:<math>\,\frac{\text{d} w(t)}{\text{d} t} ~ = ~ w(t) Q - \mathrm{diag} [w(t) Q w(t)^{\mathrm{T}}] w(t)</math>,

and the Gram–Schmidt algorithm is

:<math>\,\Delta w(t) ~ = ~ -\mathrm{lower} [w(t) w(t)^{\mathrm{T}}] w(t)</math>,

where {{math|''w''(''t'')}} is any matrix, in this case representing synaptic weights, {{math|''Q'' {{=}} ''η'' '''x''' '''x'''<sup>T</sup>}} is the autocorrelation matrix, simply the outer product of inputs, {{math|diag}} is the function that sets all matrix elements off the diagonal equal to 0, and {{math|lower}} is the function that sets all matrix elements on or above the diagonal equal to 0. We can combine these equations to get our original rule in matrix form,

:<math>\,\Delta w(t) ~ = ~ \eta(t) \left(\mathbf{y}(t) \mathbf{x}(t)^{\mathrm{T}} - \mathrm{LT}[\mathbf{y}(t)\mathbf{y}(t)^{\mathrm{T}}] w(t)\right)</math>,

where the function {{math|LT}} sets all matrix elements above the diagonal equal to 0, and note that the output {{math|'''y'''(''t'') {{=}} ''w''(''t'') '''x'''(''t'')}} is that of a linear neuron.<ref name="Sanger89"/>
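
In code, the matrix form is a two-line update; the following NumPy sketch (illustrative names, not from the cited sources) uses <code>numpy.tril</code> for the {{math|LT}} operator:

<syntaxhighlight lang="python">
import numpy as np

def gha_step_matrix(w, x, eta):
    """Matrix form of the update:
    Delta w = eta * (y x^T - LT[y y^T] w),
    where LT zeroes every element above the diagonal."""
    y = w @ x
    lt = np.tril(np.outer(y, y))    # LT[y y^T]: lower triangle, diagonal included
    return w + eta * (np.outer(y, x) - lt @ w)
</syntaxhighlight>

Row <math>i</math> of <math>\mathrm{LT}[\mathbf{y}\mathbf{y}^{\mathrm{T}}]w</math> equals <math>y_i \textstyle\sum_{k \le i} y_k w_k</math>, so this reproduces the component-wise rule above.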

===Stability and Principal Components Analysis===
[[Oja's rule]] is the special case where <math>m = 1</math>.<ref name="Oja82">{{cite journal |last=Oja |first=Erkki |author-link=Erkki Oja |date=November 1982 |title=Simplified neuron model as a principal component analyzer |journal=Journal of Mathematical Biology |volume=15 |issue=3 |pages=267–273 |doi=10.1007/BF00275687 |pmid=7153672 |s2cid=16577977 |id=BF00275687}}</ref> One can think of the generalized Hebbian algorithm as iterating Oja's rule.<ref name="Haykin98">{{cite book |last=Haykin |first=Simon |author-link=Simon Haykin |title=Neural Networks: A Comprehensive Foundation |edition=2 |year=1998 |publisher=Prentice Hall |isbn=978-0-13-273350-2}}</ref>

With Oja's rule, <math>w_1</math> is learned: it converges to the direction of the largest principal component vector, with length determined by <math>E[x_j] = E[w_{1j} y_1]</math> for all <math>j</math>, where the expectation is taken over all input–output pairs. In other words, the length of the vector <math>w_1</math> is such that we have an [[autoencoder]], with the latent code <math>y_1 = \sum_i w_{1i} x_i</math>, such that <math>E[\| x - y_1 w_1 \|^2]</math> is minimized.

When <math>m = 2</math>, the first neuron in the hidden layer of the autoencoder still learns as described, since it is unaffected by the second neuron. So, after the first neuron and its vector <math>w_1</math> have converged, the second neuron is effectively running another instance of Oja's rule on the modified input vectors, defined by <math>x' = x - y_1 w_1</math>, which is the input vector with the first principal component removed. Therefore, the second neuron learns to code for the second principal component.

By induction, this results in finding the top <math>m</math> principal components for arbitrary <math>m</math>.
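
As a numerical sanity check, the step function sketched above can be iterated on synthetic zero-mean data; with a decaying learning rate the rows of <math>w</math> line up with the principal axes of the data (the data distribution, step count, and learning-rate schedule here are illustrative):

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)

# Synthetic zero-mean data whose covariance is diag(scales**2),
# so the principal components are the coordinate axes in order.
n, m = 8, 3
scales = np.array([5.0, 3.0, 2.0, 1.0, 0.5, 0.3, 0.2, 0.1])
w = 0.1 * rng.normal(size=(m, n))

for t in range(200_000):
    x = scales * rng.normal(size=n)
    w = gha_step(w, x, eta=1.0 / (1_000 + t))   # slowly decaying learning rate

for i in range(m):
    print(i, np.round(np.abs(w[i] / np.linalg.norm(w[i])), 2))
# each row ~ the i-th standard basis vector, i.e. the i-th principal direction
</syntaxhighlight>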

==Applications==
The generalized Hebbian algorithm is used in applications where a [[self-organizing map]] is necessary, or where feature extraction or [[principal components analysis]] can be used. Examples of such cases include [[artificial intelligence]] and speech and image processing.

Its importance comes from the fact that learning is a single-layer process; that is, a synaptic weight changes only depending on the response of the inputs and outputs of that layer, thus avoiding the multi-layer dependence associated with the [[backpropagation]] algorithm. It also has a simple and predictable trade-off between learning speed and accuracy of convergence as set by the [[learning rate]] parameter {{math|η}}.<ref name="Haykin98"/>

[[File:Generalized Hebbian algorithm on 8-by-8 patches of Caltech101.png|thumb|Features learned by the generalized Hebbian algorithm running on 8-by-8 patches of [[Caltech 101]].]]
[[File:Principal component analysis of Caltech101.png|thumb|Features found by [[Principal Component Analysis]] on the same Caltech 101 dataset.]]

As an example, Olshausen and Field (1996)<ref>{{Cite journal |last1=Olshausen |first1=Bruno A. |last2=Field |first2=David J. |date=June 1996 |title=Emergence of simple-cell receptive field properties by learning a sparse code for natural images |url=https://www.nature.com/articles/381607a0 |journal=Nature |language=en |volume=381 |issue=6583 |pages=607–609 |doi=10.1038/381607a0 |pmid=8637596 |issn=1476-4687 |url-access=subscription}}</ref> ran the generalized Hebbian algorithm on 8-by-8 patches of photos of natural scenes and found that it produces Fourier-like features. As expected, the features match the principal components found by principal components analysis, and are determined by the <math>64\times 64</math> covariance matrix of the samples of 8-by-8 patches; in other words, they are determined by the second-order statistics of the pixels in images. They criticized this as insufficient to capture the higher-order statistics which are necessary to explain the Gabor-like features of simple cells in the [[primary visual cortex]].

==See also==
*[[Hebbian learning]]
*[[Factor analysis]]
*[[Contrastive Hebbian learning]]
*[[Oja's rule]]

==References==
{{reflist}}

{{Hebbian learning}}

[[Category:Hebbian theory]]
[[Category:Artificial neural networks]]