Link rot: Difference between revisions

imported>AntiCompositeNumber
(Rescuing 4 sources and tagging 0 as dead.) #IABot (v2.0.9.5)
imported>Lp0 on fire
m Reverting dubious external links added by Lillybrooks753 (CVPI)
 
Line 3: Line 3:
{{for|link rot on Wikipedia|Wikipedia:Link rot|selfref=y}}
[[Image:404 not found.png|alt=Page Not Found|thumb|upright=1.3|A broken link usually leads to an error message.]]
'''Link rot''' (also called '''link death''', '''link breaking''', or '''reference rot''') is the phenomenon of [[hyperlink]]s tending over time to cease to point to their originally targeted [[computer file|file]], [[web page]], or [[Server (computing)|server]] due to that resource being relocated to a new address or becoming permanently unavailable. A link that no longer points to its target may be called ''broken'', ''dead'', or ''orphaned''.
'''Link rot''' (also called '''link death''', '''link breaking''', or '''reference rot''') is the phenomenon of [[hyperlink]]s tending over time to cease to point to their originally targeted [[computer file|file]], [[web page]], or [[Server (computing)|server]] due to that resource being relocated to a new address or becoming permanently unavailable. A link that no longer resolves at the intended target may be called ''broken'' or ''dead''.


The rate of link rot is a subject of study and research due to its significance to the [[internet]]'s ability to preserve information. Estimates of that rate vary dramatically between studies. Information professionals have warned that link rot could make important archival data disappear, potentially impacting the legal system and scholarship.
Line 14: Line 14:
A 2002 study suggested that link rot within digital libraries is considerably slower than on the web. The article found that about 3% of the objects were no longer accessible after one year,<ref name=Nelson2002>{{cite journal | first1 = Michael L. | last1 = Nelson | first2 = B. Danette | last2 = Allen | year = 2002 | title = Object Persistence and Availability in Digital Libraries | doi = 10.1045/january2002-nelson | journal = D-Lib Magazine | volume = 8 | issue = 1 | url = https://digitalcommons.odu.edu/cgi/viewcontent.cgi?article=1008&context=computerscience_fac_pubs | doi-access = free | access-date = 2019-09-24 | archive-date = 2020-07-19 | archive-url = https://web.archive.org/web/20200719044311/https://digitalcommons.odu.edu/cgi/viewcontent.cgi?article=1008&context=computerscience_fac_pubs | url-status = live }}</ref> equating to a [[half-life]] of nearly 23 years.


A 2003 study found that on the Web, about one link out of every 200 broke each week,<ref name=Fetterly2003>{{cite conference | first1 = Dennis | last1 = Fetterly | first2 = Mark | last2 = Manasse | first3 = Marc | last3 = Najork | first4 = Janet | last4 = Wiener | year = 2003 | title = A large-scale study of the evolution of web pages | url = http://www2003.org/cdrom/papers/refereed/p097/P97%20sources/p97-fetterly.html | book-title = Proceedings of the 12th international conference on World Wide Web | access-date = 14 September 2010 | conference =  | archive-date = 9 July 2011 | archive-url = https://web.archive.org/web/20110709175020/http://www2003.org/cdrom/papers/refereed/p097/P97%20sources/p97-fetterly.html | url-status = live }}</ref> suggesting a [[half-life]] of 138 weeks. This rate was largely confirmed by a 2016–2017 study of links in [[Yahoo! Directory]] (which had stopped updating in 2014 after 21 years of development) that found the half-life of the directory's links to be two years.<ref>{{cite web |last=van der Graaf |first=Hans |title=The half-life of a link is two year |url=http://blog.zomdir.com/2017/10/the-half-life-of-link-is-two-year.html |website=ZOMDir's blog |access-date=2019-01-31 |url-status=live |archive-url=https://web.archive.org/web/20171017041901/http://blog.zomdir.com/2017/10/the-half-life-of-link-is-two-year.html |archive-date=2017-10-17}}</ref>
A 2003 study found that on the Web, about one link out of every 200 broke each week,<ref name=Fetterly2003>{{cite conference | first1 = Dennis | last1 = Fetterly | first2 = Mark | last2 = Manasse |author-link3= Marc Najork | first3 = Marc | last3 = Najork | first4 = Janet | last4 = Wiener | year = 2003 | title = A large-scale study of the evolution of web pages | url = http://www2003.org/cdrom/papers/refereed/p097/P97%20sources/p97-fetterly.html | book-title = Proceedings of the 12th international conference on World Wide Web | access-date = 14 September 2010 | conference =  | archive-date = 9 July 2011 | archive-url = https://web.archive.org/web/20110709175020/http://www2003.org/cdrom/papers/refereed/p097/P97%20sources/p97-fetterly.html | url-status = live }}</ref> suggesting a [[half-life]] of 138 weeks. This rate was largely confirmed by a 2016–2017 study of links in [[Yahoo! Directory]] (which had stopped updating in 2014 after 21 years of development) that found the half-life of the directory's links to be two years.<ref>{{cite web |last=van der Graaf |first=Hans |title=The half-life of a link is two year |url=http://blog.zomdir.com/2017/10/the-half-life-of-link-is-two-year.html |website=ZOMDir's blog |access-date=2019-01-31 |url-status=live |archive-url=https://web.archive.org/web/20171017041901/http://blog.zomdir.com/2017/10/the-half-life-of-link-is-two-year.html |archive-date=2017-10-17}}</ref>


A 2004 study showed that subsets of Web links (such as those targeting specific file types or those hosted by academic institutions) could have dramatically different half-lives.<ref name=Koehler2004>{{cite journal | first = Wallace | last = Koehler | year = 2004 | title = A longitudinal study of web pages continued: a consideration of document persistence | url = http://www.informationr.net/ir/9-2/paper174.html | journal = Information Research | volume = 9 | issue = 2 | access-date = 2019-01-31 | archive-url = https://web.archive.org/web/20170911062629/http://www.informationr.net/ir/9-2/paper174.html | archive-date = 2017-09-11 | url-status = live}}</ref> The URLs selected for publication appear to have greater longevity than the average URL. A 2015 study by Weblock analyzed more than 180,000 links from references in the full-text corpora of three major open access publishers and found a half-life of about 14 years,<ref>{{cite web | title = All-Time Weblock Report | date = August 2015 | url = https://weblock.io/report?id=all | access-date = 12 January 2016 | url-status = dead | archive-url = https://web.archive.org/web/20160304081204/https://weblock.io/report?id=all | archive-date = 4 March 2016}}</ref> generally confirming a 2005 study that found that half of the [[Uniform Resource Locator|URLs]] cited in ''[[D-Lib Magazine]]'' articles were active 10 years after publication.<ref name=McCown2005>{{cite conference | first1 = Frank | last1 = McCown | first2 = Sheffan | last2 = Chan | first3 = Michael L. 
| last3 = Nelson | first4 = Johan | last4 = Bollen | year = 2005 | title = The Availability and Persistence of Web References in D-Lib Magazine | url = http://www.iwaw.net/05/papers/iwaw05-mccown1.pdf | url-status = dead | book-title = Proceedings of the 5th International Web Archiving Workshop and Digital Preservation (IWAW'05) | access-date = 2005-10-12 | archive-url = https://web.archive.org/web/20120717000118/http://www.iwaw.net/05/papers/iwaw05-mccown1.pdf | archive-date = 2012-07-17 }}</ref> Other studies have found higher rates of link rot in academic literature but typically suggest a half-life of four years or greater.<ref name=Spinellis2003>{{cite journal | author-link = Diomidis Spinellis | first = Diomidis | last = Spinellis | year = 2003 | title = The Decay and Failures of Web References | url = http://www.spinellis.gr/pubs/jrnl/2003-CACM-URLcite/html/urlcite.html | journal = Communications of the ACM | volume = 46 | issue = 1 | pages = 71–77 | doi = 10.1145/602421.602422 | citeseerx = 10.1.1.12.9599 | s2cid = 17750450 | access-date = 2007-09-29 | archive-date = 2020-07-23 | archive-url = https://web.archive.org/web/20200723030709/https://www.spinellis.gr/pubs/jrnl/2003-CACM-URLcite/html/urlcite.html | url-status = live }}</ref><ref name=Lawrence2001>{{Cite Q | Q21012586 }}</ref> A 2013 study in ''[[BMC Bioinformatics]]'' analyzed nearly 15,000 links in abstracts from Thomson Reuters's [[Web of Science]] citation index and found that the median lifespan of web pages was 9.3 years, and just 62% were archived.<ref>{{Cite journal | last1 = Hennessey | first1 = Jason | last2 = Xijin Ge | first2 = Steven | title = A Cross Disciplinary Study of Link Decay and the Effectiveness of Mitigation Techniques | journal = BMC Bioinformatics | volume = 14 | pages = S5 | date = 2013 | issue = Suppl 14 | doi = 10.1186/1471-2105-14-S14-S5 | pmid = 24266891 | pmc = 3851533 | doi-access = free }}</ref> A 2021 study of external links in ''[[New York Times]]'' articles 
published between 1996 and 2019 found a half-life of about 15 years (with significant variance among content topics) but noted that 13% of functional links no longer lead to the original content—a phenomenon called ''content drift''.<ref>{{Cite web|title=What the ephemerality of the Web means for your hyperlinks|url=https://www.cjr.org/analysis/linkrot-content-drift-new-york-times.php|access-date=2021-08-02|website=Columbia Journalism Review|language=en|archive-date=2021-08-02|archive-url=https://web.archive.org/web/20210802134941/https://www.cjr.org/analysis/linkrot-content-drift-new-york-times.php|url-status=live}}</ref>


A 2013 study found that 49% of links in U.S. Supreme court opinions are dead.<ref>{{Cite web |last=Garber |first=Megan |date=2013-09-23 |title=49% of the Links Cited in Supreme Court Decisions Are Broken |url=https://www.theatlantic.com/technology/archive/2013/09/49-of-the-links-cited-in-supreme-court-decisions-are-broken/279901/ |access-date=2024-01-10 |website=The Atlantic |language=en |archive-date=2024-01-10 |archive-url=https://web.archive.org/web/20240110023856/https://www.theatlantic.com/technology/archive/2013/09/49-of-the-links-cited-in-supreme-court-decisions-are-broken/279901/ |url-status=live }}</ref>
A 2013 study found that 49% of links in U.S. Supreme Court opinions are dead.<ref>{{Cite web |last=Garber |first=Megan |date=2013-09-23 |title=49% of the Links Cited in Supreme Court Decisions Are Broken |url=https://www.theatlantic.com/technology/archive/2013/09/49-of-the-links-cited-in-supreme-court-decisions-are-broken/279901/ |access-date=2024-01-10 |website=The Atlantic |language=en |archive-date=2024-01-10 |archive-url=https://web.archive.org/web/20240110023856/https://www.theatlantic.com/technology/archive/2013/09/49-of-the-links-cited-in-supreme-court-decisions-are-broken/279901/ |url-status=live }}</ref>


A 2023 study looking at United States [[COVID-19]] dashboards found that 23% of the state dashboards available in February 2021 were no longer available at the previous URLs in April 2023.<ref name="Adams1">{{cite journal |last1=Adams |first1=Aaron M. |last2=Chen |first2=Xiang |last3=Li |first3=Weidong |last4=Chuanrong |first4=Zhang |title=Normalizing the pandemic: exploring the cartographic issues in state government COVID-19 dashboards |journal=Journal of Maps |date=27 July 2023 |volume=19 |issue=5 |pages=1–9 |doi=10.1080/17445647.2023.2235385|doi-access=free |bibcode=2023JMaps..19Q...1A }}</ref>
Line 25: Line 25:


== Causes ==
Link rot can result for several reasons. A target web page may be removed. The server that hosts the target page could fail, be removed from service, or relocate to a new [[domain name]]. As far back as 1999, it was noted that with the amount of material that can be stored on a hard drive, "a single disk failure could be like the burning of the [[Library of Alexandria|library at Alexandria]]."<ref name="McGranaghan1999">{{cite journal |last1=McGranaghan |first1=Matthew |date=1999 |title=The Web, Cartography and Trust |url=https://cartographicperspectives.org/index.php/journal/article/view/cp32-mcgranaghan |journal=Cartographic Perspectives |volume= |issue=32 |pages=3–5 |doi=10.14714/CP32.624 |doi-access=free |url-access=subscription |archive-date=2023-12-07 |access-date=2023-12-07 |archive-url=https://web.archive.org/web/20231207182933/https://cartographicperspectives.org/index.php/journal/article/view/cp32-mcgranaghan |url-status=live }}</ref>  A domain name's registration may lapse or be transferred to another party. Some causes will result in the link failing to find any target and returning an error such as [[HTTP 404]]. Other causes will cause a link to target content other than what was intended by the link's author.
Link rot can result for several reasons. A target web page may be removed. The server that hosts the target page could fail, be removed from service, or relocate to a new [[domain name]]. As far back as 1999, it was noted that with the amount of material that can be stored on a hard drive, "a [[Hard disk drive failure|single disk failure]] could be like the burning of the [[Library of Alexandria|library at Alexandria]]."<ref name="McGranaghan1999">{{cite journal |last1=McGranaghan |first1=Matthew |date=1999 |title=The Web, Cartography and Trust |url=https://cartographicperspectives.org/index.php/journal/article/view/cp32-mcgranaghan |journal=Cartographic Perspectives |volume= |issue=32 |pages=3–5 |doi=10.14714/CP32.624 |doi-access=free |url-access=subscription |archive-date=2023-12-07 |access-date=2023-12-07 |archive-url=https://web.archive.org/web/20231207182933/https://cartographicperspectives.org/index.php/journal/article/view/cp32-mcgranaghan |url-status=live }}</ref>  A domain name's registration may lapse or be transferred to another party. Some causes will result in the link failing to find any target and returning an error such as [[HTTP 404]]. Other causes will cause a link to target content other than that which was intended by the link's author.


Other reasons for broken links include:
Line 44: Line 44:


Strategies pertaining to the authorship of links include:
* linking to primary rather than secondary sources and prioritizing stable sites<ref name="Koehler2004" />
* {{cns|linking to primary rather than secondary sources|date=August 2025}} and prioritizing stable sites<ref name="Koehler2004" />
* avoiding links that point to resources on researchers' personal pages<ref name=McCown2005/>
* using [[clean URL]]s or otherwise employing [[URL normalization]] or [[URL canonicalization]]<ref name=Kille2014>{{cite web | last = Kille | first = Leighton Walter | title = The Growing Problem of Internet "Link Rot" and Best Practices for Media and Online Publishers | publisher = Journalist's Resource, Harvard Kennedy School | date = 8 November 2014 | url = http://journalistsresource.org/studies/society/internet/website-linking-best-practices-media-online-publishers | access-date = 16 January 2015 | url-status = live | archive-url = https://web.archive.org/web/20150112034707/http://journalistsresource.org/studies/society/internet/website-linking-best-practices-media-online-publishers | archive-date = 12 January 2015}}</ref>
Line 59: Line 59:


== See also ==
* [[Archive Team]], web archiving team
* {{anl|Archive Team}}
* [[Dead Internet theory]]
* [[Dead Internet theory]]
* [[Digital preservation]]
* {{anl|Digital preservation}}
* [[Infodemic]]
* {{anl|Infodemic}}
* [[Software rot]]
* {{anl|Lost media}}
* [[Lost media]]
* {{anl|Software rot}}


== References ==

Latest revision as of 12:22, 8 December 2025


[Image: "Page Not Found" error page. Caption: A broken link usually leads to an error message.]

Link rot (also called link death, link breaking, or reference rot) is the phenomenon of hyperlinks tending over time to cease to point to their originally targeted file, web page, or server due to that resource being relocated to a new address or becoming permanently unavailable. A link that no longer resolves at the intended target may be called broken or dead.

The rate of link rot is a subject of study and research due to its significance to the internet's ability to preserve information. Estimates of that rate vary dramatically between studies. Information professionals have warned that link rot could make important archival data disappear, potentially impacting the legal system and scholarship.

Prevalence

A number of studies have examined the prevalence of link rot within the World Wide Web, in academic literature that uses URLs to cite web content, and within digital libraries.

A 2023 study of the external links on the Million Dollar Homepage found that 27% of the links loaded a site without redirects, 45% were redirected, and 28% returned various error messages.[1]

A 2002 study suggested that link rot within digital libraries is considerably slower than on the web. The article found that about 3% of the objects were no longer accessible after one year,[2] equating to a half-life of nearly 23 years.

A 2003 study found that on the Web, about one link out of every 200 broke each week,[3] suggesting a half-life of 138 weeks. This rate was largely confirmed by a 2016–2017 study of links in Yahoo! Directory (which had stopped updating in 2014 after 21 years of development) that found the half-life of the directory's links to be two years.[4]
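
The half-life figures in the two preceding paragraphs can be reproduced by assuming that a constant fraction of links is lost in each period (exponential decay). That assumption, and the helper name `half_life`, are introduced here for illustration and are not taken from the cited studies. A minimal Python sketch:

```python
import math


def half_life(loss_rate_per_period: float) -> float:
    """Periods until half the links survive, assuming a constant
    fraction `loss_rate_per_period` is lost in every period."""
    if not 0.0 < loss_rate_per_period < 1.0:
        raise ValueError("loss rate must be strictly between 0 and 1")
    # Survival after t periods is (1 - r) ** t; solve (1 - r) ** t == 0.5.
    return math.log(0.5) / math.log(1.0 - loss_rate_per_period)


# About 3% of digital-library objects lost per year (2002 study).
print(round(half_life(0.03), 1))   # 22.8 (years)

# About one link in 200 broken per week (2003 study).
print(round(half_life(1 / 200)))   # 138 (weeks)
```

Under this model a loss of 3% per year gives roughly 22.8 years, in line with "nearly 23 years", and a loss of 0.5% per week gives roughly 138 weeks, a little over two and a half years.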

A 2004 study showed that subsets of Web links (such as those targeting specific file types or those hosted by academic institutions) could have dramatically different half-lives.[5] The URLs selected for publication appear to have greater longevity than the average URL. A 2015 study by Weblock analyzed more than 180,000 links from references in the full-text corpora of three major open access publishers and found a half-life of about 14 years,[6] generally confirming a 2005 study that found that half of the URLs cited in D-Lib Magazine articles were active 10 years after publication.[7] Other studies have found higher rates of link rot in academic literature but typically suggest a half-life of four years or greater.[8][9] A 2013 study in BMC Bioinformatics analyzed nearly 15,000 links in abstracts from Thomson Reuters's Web of Science citation index and found that the median lifespan of web pages was 9.3 years, and just 62% were archived.[10] A 2021 study of external links in New York Times articles published between 1996 and 2019 found a half-life of about 15 years (with significant variance among content topics) but noted that 13% of functional links no longer lead to the original content—a phenomenon called content drift.[11]

A 2013 study found that 49% of links in U.S. Supreme Court opinions are dead.[12]

A 2023 study looking at United States COVID-19 dashboards found that 23% of the state dashboards available in February 2021 were no longer available at the previous URLs in April 2023.[13]

Pew Research found that, as of 2023, 38% of web pages that existed in 2013 were no longer accessible. It also found that, in 2023, 54% of English Wikipedia articles had at least one dead link in the 'references' section and 23% of news articles linked to at least one dead URL.[14]

Causes

Link rot can occur for several reasons. A target web page may be removed. The server that hosts the target page could fail, be removed from service, or relocate to a new domain name. As far back as 1999, it was noted that with the amount of material that can be stored on a hard drive, "a single disk failure could be like the burning of the library at Alexandria."[15] A domain name's registration may lapse or be transferred to another party. Some causes result in the link failing to find any target and returning an error such as HTTP 404. Others lead a link to target content other than that which was intended by the link's author.

Other reasons for broken links include:

  • the restructuring of websites that causes changes in URLs (e.g. domain.net/pine_tree might be moved to domain.net/tree/pine)
  • relocation of formerly free content to behind a paywall[13]
  • a change in server architecture that results in code such as PHP functioning differently
  • dynamic page content such as search results that changes by design
  • deletion of the target page and/or its content
  • the presence of user-specific information (such as a login name) within the link
  • deliberate blocking by content filters or firewalls
  • the expiration of a domain name registration

Prevention and detection

Strategies for preventing link rot can focus on placing content where its likelihood of persisting is higher, authoring links that are less likely to be broken, taking steps to preserve existing links, or repairing links whose targets have been relocated or removed.

The creation of URLs that will not change with time is the fundamental method of preventing link rot. Preventive planning has been championed by Tim Berners-Lee and other web pioneers.[16]

Strategies pertaining to the authorship of links include:

  • linking to primary rather than secondary sources and prioritizing stable sites[5]
  • avoiding links that point to resources on researchers' personal pages[7]
  • using clean URLs or otherwise employing URL normalization or URL canonicalization

Strategies pertaining to the protection of existing links include:

  • using redirection mechanisms such as HTTP 301 to automatically refer browsers and crawlers to relocated content
  • using content management systems which can automatically update links when content within the same site is relocated or automatically replace links with canonical URLs[24]
  • integrating search resources into HTTP 404 pages[25]
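
The redirection strategy in the list above can be sketched with Python's standard `http.server` module. This is a minimal illustration, not a production setup: the `MOVED` table, the names `redirect_target`, `RedirectHandler` and `serve`, and the port are all assumptions made for the example, and the paths reuse the domain.net/pine_tree illustration from the Causes section.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Optional

# Hypothetical table mapping relocated paths to their new locations.
MOVED = {"/pine_tree": "/tree/pine"}


def redirect_target(path: str) -> Optional[str]:
    """Return the new location for a relocated path, or None."""
    return MOVED.get(path)


class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        target = redirect_target(self.path)
        if target is not None:
            # 301 tells browsers and crawlers that the move is permanent.
            self.send_response(301)
            self.send_header("Location", target)
            self.end_headers()
        else:
            self.send_error(404, "Not Found")

    do_HEAD = do_GET  # answer HEAD requests the same way


def serve(port: int = 8000) -> None:
    """Run the sketch locally until interrupted."""
    HTTPServer(("127.0.0.1", port), RedirectHandler).serve_forever()
```

Calling `serve()` starts the server on the local machine; a request for /pine_tree then receives a 301 response whose Location header is /tree/pine, while any other path receives a 404.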

The detection of broken links may be done manually or automatically. Automated methods include plug-ins for content management systems as well as standalone broken-link checkers such as Xenu's Link Sleuth. Automatic checking may not detect links that return a soft 404 or links that return a 200 OK response but point to content that has changed.[26]
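
A minimal sketch of an automated checker, using only the Python standard library, is given below. The function names, the injectable `fetch` parameter and the four labels are assumptions made for illustration and do not describe how any particular tool works. Because it looks only at status codes, it has the limitation described above: a soft 404 or a page whose content has changed is still reported as "ok".

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, Iterable


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the final HTTP status code for `url` (redirects are followed)."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # urllib raises 4xx/5xx responses as exceptions


def classify(status: int) -> str:
    """Map a status code to a coarse label."""
    if 200 <= status < 300:
        return "ok"        # may still be a soft 404 or changed content
    if status in (404, 410):
        return "dead"
    return "suspect"       # e.g. 403, 405, 429, 5xx: worth a manual look


def check_links(
    urls: Iterable[str],
    fetch: Callable[[str], int] = fetch_status,
) -> Dict[str, str]:
    """Classify each URL; `fetch` can be swapped out for testing."""
    report: Dict[str, str] = {}
    for url in urls:
        try:
            report[url] = classify(fetch(url))
        except OSError:
            # URLError (DNS failure, refused connection) and timeouts
            # are both subclasses of OSError.
            report[url] = "unreachable"
    return report
```

Passing a stand-in for `fetch` allows the classification logic to be exercised without network access. Some servers reject HEAD requests, so a "suspect" result is a prompt for a manual check rather than proof that a link is broken.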

See also

  • Archive Team
  • Dead Internet theory
  • Digital preservation
  • Infodemic
  • Lost media
  • Software rot

References


  1. Script error: No such module "citation/CS1".
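  2. Nelson, Michael L.; Allen, B. Danette (2002). "Object Persistence and Availability in Digital Libraries". D-Lib Magazine. 8 (1). doi:10.1045/january2002-nelson.
  3. Fetterly, Dennis; Manasse, Mark; Najork, Marc; Wiener, Janet (2003). "A large-scale study of the evolution of web pages". Proceedings of the 12th international conference on World Wide Web. http://www2003.org/cdrom/papers/refereed/p097/P97%20sources/p97-fetterly.html
  4. van der Graaf, Hans. "The half-life of a link is two year". ZOMDir's blog. http://blog.zomdir.com/2017/10/the-half-life-of-link-is-two-year.html. Retrieved 2019-01-31.
  5. a b Koehler, Wallace (2004). "A longitudinal study of web pages continued: a consideration of document persistence". Information Research. 9 (2). http://www.informationr.net/ir/9-2/paper174.html
  6. "All-Time Weblock Report". August 2015. https://weblock.io/report?id=all. Archived from the original on 4 March 2016.
  7. a b McCown, Frank; Chan, Sheffan; Nelson, Michael L.; Bollen, Johan (2005). "The Availability and Persistence of Web References in D-Lib Magazine". Proceedings of the 5th International Web Archiving Workshop and Digital Preservation (IWAW'05). http://www.iwaw.net/05/papers/iwaw05-mccown1.pdf
  8. Spinellis, Diomidis (2003). "The Decay and Failures of Web References". Communications of the ACM. 46 (1): 71–77. doi:10.1145/602421.602422.
  9. Cited as "Lawrence2001"; Wikidata item Q21012586.
  10. Hennessey, Jason; Xijin Ge, Steven (2013). "A Cross Disciplinary Study of Link Decay and the Effectiveness of Mitigation Techniques". BMC Bioinformatics. 14 (Suppl 14): S5. doi:10.1186/1471-2105-14-S14-S5. PMID 24266891.
  11. "What the ephemerality of the Web means for your hyperlinks". Columbia Journalism Review. https://www.cjr.org/analysis/linkrot-content-drift-new-york-times.php. Retrieved 2021-08-02.
  12. Garber, Megan (2013-09-23). "49% of the Links Cited in Supreme Court Decisions Are Broken". The Atlantic. https://www.theatlantic.com/technology/archive/2013/09/49-of-the-links-cited-in-supreme-court-decisions-are-broken/279901/. Retrieved 2024-01-10.
  13. a b Adams, Aaron M.; Chen, Xiang; Li, Weidong; Chuanrong, Zhang (27 July 2023). "Normalizing the pandemic: exploring the cartographic issues in state government COVID-19 dashboards". Journal of Maps. 19 (5): 1–9. doi:10.1080/17445647.2023.2235385.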
  14. Script error: No such module "citation/CS1".
  15. McGranaghan, Matthew (1999). "The Web, Cartography and Trust". Cartographic Perspectives (32): 3–5. doi:10.14714/CP32.624.
  16. Script error: No such module "citation/CS1".
  17. a b Script error: No such module "citation/CS1".
  18. Sicilia, Miguel-Angel, et al. "Decentralized Persistent Identifiers: a basic model for immutable handlers." Procedia computer science 146 (2019): 123-130.
  19. Script error: No such module "citation/CS1".
  20. Script error: No such module "Citation/CS1".
  21. Script error: No such module "Citation/CS1".
  22. Script error: No such module "citation/CS1".
  23. Script error: No such module "citation/CS1".
  24. Script error: No such module "citation/CS1".
  25. Script error: No such module "citation/CS1".
  26. Script error: No such module "citation/CS1".
