Computational biology: Difference between revisions

From Wikipedia, the free encyclopedia
imported>Ayan.siv
m Typo fix
 
imported>Citation bot
Add: bibcode, doi-access, issue, article-number, pages. Removed parameters. Some additions/deletions were parameter name changes. | Use this bot. Report bugs. | Suggested by Losipov | Category:CS1 maint: unflagged free DOI | #UCB_Category 35/56
 
(One intermediate revision by one other user not shown)
Line 1: Line 1:
{{short description|Branch of biology}}
{{short description|Branch of biology}}
{{Distinguish|Bioinformatics|Biological computing}}
{{Distinguish|Bioinformatics|Biological computing}}
{{AI-generated|date=October 2025}}
[[File:Human Genome Project Timeline (26964377742).jpg|thumb|548x548px|This timeline displays the year-by-year progress of the [[Human Genome Project]] in the context of genetics since 1865. Starting in 1990, by 1999, [[Chromosome 22]] became the first human chromosome to be completely sequenced.]]
[[File:Human Genome Project Timeline (26964377742).jpg|thumb|548x548px|This timeline displays the year-by-year progress of the [[Human Genome Project]] in the context of genetics since 1865. Starting in 1990, by 1999, [[Chromosome 22]] became the first human chromosome to be completely sequenced.]]
'''Computational biology''' refers to the use of techniques in [[computer science]], [[data analysis]], [[mathematical modeling]] and [[Computer simulation|computational simulations]] to understand [[biological system]]s and relationships.<ref name="nih" /> An intersection of [[computer science]], [[biology]], and [[data science]], the field also has foundations in [[applied mathematics]], [[molecular biology]], [[cell biology]], [[chemistry]], and [[genetics]].<ref name="brown" />
'''Computational biology''' refers to the use of techniques in [[computer science]], [[data analysis]], [[mathematical modeling]] and [[Computer simulation|computational simulations]] to understand [[biological system]]s and relationships.<ref name="nih" /> An intersection of [[computer science]], [[biology]], and [[data science]], the field also has foundations in [[applied mathematics]], [[molecular biology]], [[cell biology]], [[chemistry]], and [[genetics]].<ref name="brown" />


== History ==
== History ==
[[Bioinformatics]], the analysis of informatics processes in [[biological system]]s, began in the early 1970s. At this time, research in [[artificial intelligence]] was using [[network model]]s of the human brain in order to generate new [[algorithms]]. This use of [[biological data]] pushed biological researchers to use computers to evaluate and compare large data sets in their own field.<ref name="Hogeweg 2011">{{cite journal |last=Hogeweg |first=Paulien |date=7 March 2011 |title=The Roots of Bioinformatics in Theoretical Biology |journal=PLOS Computational Biology |series=3 |volume=7 |issue=3 |pages=e1002021 |bibcode=2011PLSCB...7E2021H |doi=10.1371/journal.pcbi.1002021 |pmc=3068925 |pmid=21483479 |doi-access=free }}</ref>
[[Bioinformatics]], the analysis of informatics processes in [[biological system]]s, began in the early 1970s. At this time, research in [[artificial intelligence]] was using [[network model]]s of the human brain in order to generate new [[algorithms]]. This use of [[biological data]] pushed biological researchers to use computers to evaluate and compare large data sets in their own field.<ref name="Hogeweg 2011">{{cite journal |last=Hogeweg |first=Paulien |date=7 March 2011 |title=The Roots of Bioinformatics in Theoretical Biology |journal=PLOS Computational Biology |series=3 |volume=7 |issue=3 |article-number=e1002021 |bibcode=2011PLSCB...7E2021H |doi=10.1371/journal.pcbi.1002021 |pmc=3068925 |pmid=21483479 |doi-access=free }}</ref>


By 1982, researchers shared information via [[Punched card|punch cards]]. The amount of data grew exponentially by the end of the 1980s, requiring new computational methods for quickly interpreting relevant information.<ref name="Hogeweg 2011"/>
By 1982, researchers shared information via [[Punched card|punch cards]]. The amount of data grew exponentially by the end of the 1980s, requiring new computational methods for quickly interpreting relevant information.<ref name="Hogeweg 2011"/>


Perhaps the best-known example of computational biology, the [[Human Genome Project]], officially began in 1990.<ref name=":0">{{Cite web |date=22 December 2020 |title=The Human Genome Project |url=https://www.genome.gov/human-genome-project |access-date=13 April 2022 |website=The Human Genome Project}}</ref> By 2003, the project had mapped around 85% of the human genome, satisfying its initial goals.<ref>{{Cite web |title=Human Genome Project FAQ |url=https://www.genome.gov/human-genome-project/Completion-FAQ |access-date=2022-04-20 |website=National Human Genome Research Institute |language=en |date=February 24, 2020 |url-status=deviated |archive-url= https://web.archive.org/web/20220423192726/https://www.genome.gov/human-genome-project/Completion-FAQ |archive-date= Apr 23, 2022 }}</ref> Work continued, however, and by 2021 the assembly had reached the "complete genome" level, with only 0.3% of bases still covered by potential issues.<ref>{{Cite web |title=T2T-CHM13v1.1 - Genome - Assembly |url=https://www.ncbi.nlm.nih.gov/assembly/GCA_009914755.3 |access-date=2022-04-20 |website=NCBI |url-status=live |archive-url=https://web.archive.org/web/20230629231010/http://www.ncbi.nlm.nih.gov/assembly/GCA_009914755.3/ |archive-date= Jun 29, 2023 }}</ref><ref>{{Cite web |title=Genome List - Genome |url=https://www.ncbi.nlm.nih.gov/genome/browse/#!/eukaryotes/51/ |access-date=2022-04-20 |website=NCBI}}</ref> The missing Y [[chromosome]] was added in January 2022.
Perhaps the best-known example of computational biology, the [[Human Genome Project]], officially began in 1990.<ref name=":0">{{Cite web |date=22 December 2020 |title=The Human Genome Project |url=https://www.genome.gov/human-genome-project |access-date=13 April 2022 |website=The Human Genome Project}}</ref> By 2003, the project had mapped around 85% of the human genome, satisfying its initial goals.<ref>{{Cite web |title=Human Genome Project FAQ |url=https://www.genome.gov/human-genome-project/Completion-FAQ |access-date=2022-04-20 |website=National Human Genome Research Institute |language=en |date=February 24, 2020 |url-status=deviated |archive-url= https://web.archive.org/web/20220423192726/https://www.genome.gov/human-genome-project/Completion-FAQ |archive-date= Apr 23, 2022 }}</ref> Work continued, however, and by 2021 the assembly had reached the "complete genome" level, with only 0.3% of bases still covered by potential issues.<ref>{{Cite web |title=T2T-CHM13v1.1 - Genome - Assembly |url=https://www.ncbi.nlm.nih.gov/datasets/genome/GCA_009914755.3/ |access-date=2022-04-20 |website=NCBI |url-status=live |archive-url=https://web.archive.org/web/20230629231010/http://www.ncbi.nlm.nih.gov/assembly/GCA_009914755.3/ |archive-date= Jun 29, 2023 }}</ref><ref>{{Cite web |title=Genome List - Genome |url=https://www.ncbi.nlm.nih.gov/genome/browse/#!/eukaryotes/51/ |archive-url=https://web.archive.org/web/20111128200211/http://www.ncbi.nlm.nih.gov/genome/browse/#!/eukaryotes/51/ |archive-date=November 28, 2011 |access-date=2022-04-20 |website=NCBI}}</ref> The missing Y [[chromosome]] was added in January 2022.


Since the late 1990s, computational biology has become an important part of biology, leading to numerous subfields.<ref name=":1">{{cite journal|last=Bourne|first=Philip|title=Rise and Demise of Bioinformatics? Promise and Progress|doi=10.1371/journal.pcbi.1002487|volume=8|issue=4|journal=PLOS Computational Biology|pages=e1002487|pmid=22570600|pmc=3343106|year=2012|bibcode=2012PLSCB...8E2487O |doi-access=free }}</ref> Today, the [[International Society for Computational Biology]] recognizes 21 different 'Communities of Special Interest', each representing a slice of the larger field.<ref>{{Cite web |title=COSI Information |url=https://www.iscb.org/cms_addon/cosi_reporting_system/COSIs |access-date=2022-04-21 |website=www.iscb.org |archive-date=2022-04-21 |archive-url=https://web.archive.org/web/20220421205504/https://www.iscb.org/cms_addon/cosi_reporting_system/COSIs |url-status=dead }}</ref> In addition to helping sequence the human genome, computational biology has helped create accurate [[model]]s of the [[human brain]], [[Genome architecture mapping|map the 3D structure of genomes]], and model biological systems.<ref name="Hogeweg 2011" />
Since the late 1990s, computational biology has become an important part of biology, leading to numerous subfields.<ref name=":1">{{cite journal|last=Bourne|first=Philip|title=Rise and Demise of Bioinformatics? Promise and Progress|doi=10.1371/journal.pcbi.1002487|volume=8|issue=4|journal=PLOS Computational Biology|article-number=e1002487|pmid=22570600|pmc=3343106|year=2012|bibcode=2012PLSCB...8E2487O |doi-access=free }}</ref> Today, the [[International Society for Computational Biology]] recognizes 21 different 'Communities of Special Interest', each representing a slice of the larger field.<ref>{{Cite web |title=COSI Information |url=https://www.iscb.org/cms_addon/cosi_reporting_system/COSIs |access-date=2022-04-21 |website=www.iscb.org |archive-date=2022-04-21 |archive-url=https://web.archive.org/web/20220421205504/https://www.iscb.org/cms_addon/cosi_reporting_system/COSIs }}</ref> In addition to helping sequence the human genome, computational biology has helped create accurate [[model]]s of the [[human brain]], [[Genome architecture mapping|map the 3D structure of genomes]], and model biological systems.<ref name="Hogeweg 2011" /> Much of the original progress in computational biology emerged from the [[United States]] and [[Western Europe]], due to their large computational infrastructures.  Recent decades have seen growing contributions from less-wealthy nations, however.  
For example, [[Colombia]] has had an international computational biology effort since 1998, focusing on genomics and disease in nationally-important crops like [[coffee]] and [[potatoes]].<ref name=":6">{{Cite journal | last1 = Restrepo | first1 = Silvia | author-link1= Silvia Restrepo|last2 = Pinzón | first2 = Andrés | last3 = Rodríguez-R | first3 = Luis Miguel | last4 = Sierra | first4 = Roberto | last5 = Grajales | first5 = Alejandro | last6 = Bernal | first6 = Adriana | last7 = Barreto | first7 = Emiliano | last8 = Moreno | first8 = Pedro | last9 = Zambrano | first9 = María Mercedes | last10 = Cristancho | first10 = Marco | last11 = González | first11 = Andrés | last12 = Castro | first12 = Harold | title = Computational Biology in Colombia | journal = PLOS Computational Biology | volume = 5 | issue = 10 | article-number = e1000535 | date = October 2009 | doi = 10.1371/journal.pcbi.1000535 | doi-access = free | pmid = 19876381 | pmc = 2762518 | bibcode = 2009PLSCB...5E0535R | url = https://www.researchgate.net/publication/282134718 | access-date = 6 October 2024 }}</ref> [[Poland]], similarly, has recently been a leader in biomolecular simulations and macromolecular sequence analysis.<ref name=":4">{{Cite journal |last1=Bujnicki |first1=Janusz M. |last2=Tiuryn |first2=Jerzy |date=2013-05-02 |title=Bioinformatics and Computational Biology in Poland |journal=PLOS Computational Biology |language=en |volume=9 |issue=5 |article-number=e1003048 |doi=10.1371/journal.pcbi.1003048 |doi-access=free |issn=1553-7358 |pmc=3642042 |pmid=23658507|bibcode=2013PLSCB...9E3048B }}</ref>
 
== Global contributions ==
 
=== Colombia ===
In 2000, despite a lack of initial expertise in programming and data management, Colombia began applying computational biology from an industrial perspective, focusing on plant diseases. This research has contributed to understanding how to counteract diseases in crops like potatoes and studying the genetic diversity of coffee plants.<ref name=":6">{{Cite journal | last1 = Restrepo | first1 = Silvia | author-link1= Silvia Restrepo|last2 = Pinzón | first2 = Andrés | last3 = Rodríguez-R | first3 = Luis Miguel | last4 = Sierra | first4 = Roberto | last5 = Grajales | first5 = Alejandro | last6 = Bernal | first6 = Adriana | last7 = Barreto | first7 = Emiliano | last8 = Moreno | first8 = Pedro | last9 = Zambrano | first9 = María Mercedes | last10 = Cristancho | first10 = Marco | last11 = González | first11 = Andrés | last12 = Castro | first12 = Harold | title = Computational Biology in Colombia | journal = PLOS Computational Biology | volume = 5 | issue = 10 | pages = e1000535 | date = October 2009 | doi = 10.1371/journal.pcbi.1000535 | doi-access = free | pmid = 19876381 | pmc = 2762518 | bibcode = 2009PLSCB...5E0535R | url = https://www.researchgate.net/publication/282134718 | access-date = 6 October 2024 }}</ref> By 2007, concerns about alternative energy sources and global climate change prompted biologists to collaborate with systems and computer engineers. Together, they developed a robust computational network and database to address these challenges. In 2009, in partnership with the University of Los Angeles, Colombia also created a [[Virtual Learning Environment (VLE)]] to improve the integration of computational biology and bioinformatics.<ref name=":6" />
 
=== Poland ===
In Poland, computational biology is closely linked to mathematics and computational science, serving as a foundation for bioinformatics and biological physics. The field is divided into two main areas: one focusing on physics and simulation and the other on biological sequences.<ref name=":4">{{Cite journal |last1=Bujnicki |first1=Janusz M. |last2=Tiuryn |first2=Jerzy |date=2013-05-02 |title=Bioinformatics and Computational Biology in Poland |journal=PLOS Computational Biology |language=en |volume=9 |issue=5 |pages=e1003048 |doi=10.1371/journal.pcbi.1003048 |doi-access=free |issn=1553-7358 |pmc=3642042 |pmid=23658507|bibcode=2013PLSCB...9E3048B }}</ref> The application of statistical models in Poland has advanced techniques for studying proteins and RNA, contributing to global scientific progress. Polish scientists have also been instrumental in evaluating protein prediction methods, significantly enhancing the field of computational biology. Over time, they have expanded their research to cover topics such as protein-coding analysis and hybrid structures, further solidifying Poland's influence on the development of bioinformatics worldwide.<ref name=":4" />


==Applications==
==Applications==
Line 31: Line 24:
=== Data and modeling ===
=== Data and modeling ===
{{main|Bioinformatics}}
{{main|Bioinformatics}}
Mathematical biology is the use of mathematical models of living organisms to examine the systems that govern structure, development, and behavior in [[biological system]]s. This entails a more theoretical approach to problems, rather than its more empirically-minded counterpart of [[experimental biology]].<ref>{{Cite web |title=Mathematical Biology {{!}} Faculty of Science |url=https://www.ualberta.ca/science/mathematical-biology.html |access-date=2022-04-18 |website=www.ualberta.ca}}</ref> Mathematical biology draws on [[discrete mathematics]], [[topology]] (also useful for computational modeling), [[Bayesian statistics]], [[linear algebra]] and [[Boolean algebra]].<ref name="nlcb.wordpress.com" />
Mathematical biology is the use of mathematical models of living organisms to examine the systems that govern structure, development, and behavior in [[biological system]]s. This entails a more theoretical approach to problems, rather than its more empirically minded counterpart of [[experimental biology]].<ref>{{Cite web |title=Mathematical Biology {{!}} Faculty of Science |url=https://www.ualberta.ca/science/mathematical-biology.html |access-date=2022-04-18 |website=www.ualberta.ca}}</ref> Mathematical biology draws on [[discrete mathematics]], [[topology]] (also useful for computational modeling), [[Bayesian statistics]], [[linear algebra]] and [[Boolean algebra]].<ref name="nlcb.wordpress.com" />


These mathematical approaches have enabled the creation of [[database]]s and other methods for storing, retrieving, and analyzing biological data, a field known as [[bioinformatics]]. Usually, this process involves [[genetics]] and analyzing [[gene]]s.
These mathematical approaches have enabled the creation of [[database]]s and other methods for storing, retrieving, and analyzing biological data, a field known as [[bioinformatics]]. Usually, this process involves [[genetics]] and analyzing [[gene]]s.


Gathering and analyzing large datasets have made room for growing research fields such as [[data mining]],<ref name="nlcb.wordpress.com">{{Cite web |date=2013-02-18 |title=The Sub-fields of Computational Biology |url=https://nlcb.wordpress.com/2013/02/17/the-sub-fields-of-computational-biology/ |access-date=2022-04-18 |website=Ninh Laboratory of Computational Biology |language=en}}{{self-published inline|date=August 2024}}</ref> and computational biomodeling, which refers to building [[computer model]]s and [[Augmented reality|visual simulations]] of biological systems. This allows researchers to predict how such systems will react to different environments, which is useful for determining if a system can "maintain their state and functions against external and internal perturbations".<ref name="Kitano 2002 206–10">{{cite journal |last=Kitano |first=Hiroaki |date=14 November 2002 |title=Computational systems biology |journal=Nature |volume=420 |issue=6912 |pages=206–10 |bibcode=2002Natur.420..206K |doi=10.1038/nature01254 |pmid=12432404 |id={{ProQuest|204483859}} |s2cid=4401115}}</ref> While current techniques focus on small biological systems, researchers are working on approaches that will allow for larger networks to be analyzed and modeled. A majority of researchers believe this will be essential in developing modern medical approaches to creating new drugs and gene [[therapy]].<ref name="Kitano 2002 206–10" /> A useful modeling approach is to use [[Petri nets]] via tools such as [[esyN]].<ref name="Bean 2014">{{cite journal |last=Favrin |first=Bean |date=2 September 2014 |title=esyN: Network Building, Sharing and Publishing. |journal=PLOS ONE |volume=9 |issue=9 |pages=e106035 |bibcode=2014PLoSO...9j6035B |doi=10.1371/journal.pone.0106035 |pmc=4152123 |pmid=25181461 |doi-access=free}}</ref>
Gathering and analyzing large datasets have made room for growing research fields such as [[data mining]],<ref name="nlcb.wordpress.com">{{Cite web |date=2013-02-18 |title=The Sub-fields of Computational Biology |url=https://nlcb.wordpress.com/2013/02/17/the-sub-fields-of-computational-biology/ |access-date=2022-04-18 |website=Ninh Laboratory of Computational Biology |language=en}}{{self-published inline|date=August 2024}}</ref> and computational biomodeling, which refers to building [[computer model]]s and [[Augmented reality|visual simulations]] of biological systems. This allows researchers to predict how such systems will react to different environments, which is useful for determining if a system can "maintain their state and functions against external and internal perturbations".<ref name="Kitano 2002 206–10">{{cite journal |last=Kitano |first=Hiroaki |date=14 November 2002 |title=Computational systems biology |journal=Nature |volume=420 |issue=6912 |pages=206–10 |bibcode=2002Natur.420..206K |doi=10.1038/nature01254 |pmid=12432404 |id={{ProQuest|204483859}} |s2cid=4401115}}</ref> While current techniques focus on small biological systems, researchers are working on approaches that will allow for larger networks to be analyzed and modeled. A majority of researchers believe this will be essential in developing modern medical approaches to creating new drugs and gene [[therapy]].<ref name="Kitano 2002 206–10" /> A useful modeling approach is to use [[Petri nets]] via tools such as [[esyN]].<ref name="Bean 2014">{{cite journal |last=Favrin |first=Bean |date=2 September 2014 |title=esyN: Network Building, Sharing and Publishing. |journal=PLOS ONE |volume=9 |issue=9 |article-number=e106035 |bibcode=2014PLoSO...9j6035B |doi=10.1371/journal.pone.0106035 |pmc=4152123 |pmid=25181461 |doi-access=free}}</ref>


Along similar lines, until recent decades [[theoretical ecology]] has largely dealt with [[Analytic function|analytic]] models that were detached from the [[statistical model]]s used by [[Empirical evidence|empirical]] ecologists. However, computational methods have aided in developing ecological theory via [[simulation]] of ecological systems, in addition to increasing application of methods from [[computational statistics]] in ecological analyses.
Along similar lines, until recent decades [[theoretical ecology]] has largely dealt with [[Analytic function|analytic]] models that were detached from the [[statistical model]]s used by [[Empirical evidence|empirical]] ecologists. However, computational methods have aided in developing ecological theory via [[simulation]] of ecological systems, in addition to increasing application of methods from [[computational statistics]] in ecological analyses.
Line 60: Line 53:
One of the main ways that genomes are compared is by [[sequence homology]]. Homology is the study of biological structures and nucleotide sequences in different organisms that come from a common [[ancestor]]. Research suggests that between 80 and 90% of genes in newly sequenced [[Prokaryote|prokaryotic]] genomes can be identified this way.<ref name="Koonin 2001 155–158"/>
One of the main ways that genomes are compared is by [[sequence homology]]. Homology is the study of biological structures and nucleotide sequences in different organisms that come from a common [[ancestor]]. Research suggests that between 80 and 90% of genes in newly sequenced [[Prokaryote|prokaryotic]] genomes can be identified this way.<ref name="Koonin 2001 155–158"/>


[[Sequence alignment]] is another process for comparing and detecting similarities between biological sequences or genes. Sequence alignment is useful in a number of bioinformatics applications, such as computing the [[Longest common subsequence problem|longest common subsequence]] of two genes or comparing variants of certain [[disease]]s.{{fact|date=August 2024}}
[[Sequence alignment]] is another process for comparing and detecting similarities between biological sequences or genes. Sequence alignment is useful in a number of bioinformatics applications, such as computing the [[Longest common subsequence problem|longest common subsequence]] of two genes or comparing variants of certain [[disease]]s.{{citation needed|date=August 2024}}
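As an illustration of the longest common subsequence computation mentioned above, the following is a minimal sketch using the standard dynamic-programming approach. The function name and the example sequences are assumptions chosen for illustration; production alignment tools use more elaborate scoring schemes with gap penalties and substitution matrices.

```python
def longest_common_subsequence(a: str, b: str) -> str:
    """Return one longest common subsequence of sequences a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    # table[i][j] holds the LCS length of the prefixes a[:i] and b[:j]
    table = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        for j in range(1, cols):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])

    # Trace back from the bottom-right corner to recover one subsequence
    result = []
    i, j = len(a), len(b)
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            result.append(a[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(result))


# Illustrative sequences, not real gene data
print(longest_common_subsequence("AGGTAB", "GXTXAYB"))
```

The table takes time and memory proportional to the product of the two sequence lengths, which is why long genomic sequences are usually compared with heuristic or banded methods rather than the full table.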


A largely unexplored area in computational genomics is the analysis of intergenic regions, which comprise roughly 97% of the human genome.<ref name="Koonin 2001 155–158" /> Researchers are working to understand the functions of non-coding regions of the human genome through the development of computational and statistical methods and via large consortia projects such as [[ENCODE]] and the [[Epigenome#Roadmap epigenomics project|Roadmap Epigenomics Project]].
A largely unexplored area in computational genomics is the analysis of intergenic regions, which comprise roughly 97% of the human genome.<ref name="Koonin 2001 155–158" /> Researchers are working to understand the functions of non-coding regions of the human genome through the development of computational and statistical methods and via large consortia projects such as [[ENCODE]] and the [[Epigenome#Roadmap epigenomics project|Roadmap Epigenomics Project]].
Line 69: Line 62:


=== Biomarker discovery ===
=== Biomarker discovery ===
Computational biology also plays a pivotal role in identifying [[Biomarker|biomarkers]] for diseases such as cardiovascular conditions. By integrating various '[[Omics|Omic]]' data - such as [[genomics]], [[proteomics]], and [[metabolomics]] - researchers can uncover potential biomarkers that aid in disease diagnosis, prognosis, and treatment strategies. For instance, metabolomic analyses have identified specific metabolites capable of distinguishing between [[coronary artery disease]] and [[myocardial infarction]], thereby enhancing diagnostic precision.<ref>{{Cite journal |last1=Batta |first1=Irene |last2=Patial |first2=Ritika |last3=C Sobti |first3=Ranbir |last4=K Agrawal |first4=Devendra |date=2024 |title=Computational Biology in the Discovery of Biomarkers in the Diagnosis, Treatment and Management of Cardiovascular Diseases |url=https://doi.org/10.26502/fccm.92920400 |journal=Cardiology and Cardiovascular Medicine |volume=8 |issue=5 |doi=10.26502/fccm.92920400 |pmid=39328401 |issn=2572-9292|url-access=subscription }}</ref>  
Computational biology also plays a pivotal role in identifying [[biomarker]]s for diseases such as cardiovascular conditions. By integrating various '[[Omics|Omic]]' data - such as [[genomics]], [[proteomics]], and [[metabolomics]] - researchers can uncover potential biomarkers that aid in disease diagnosis, prognosis, and treatment strategies. For instance, metabolomic analyses have identified specific metabolites capable of distinguishing between [[coronary artery disease]] and [[myocardial infarction]], thereby enhancing diagnostic precision.<ref>{{Cite journal |last1=Batta |first1=Irene |last2=Patial |first2=Ritika |last3=C Sobti |first3=Ranbir |last4=K Agrawal |first4=Devendra |date=2024 |title=Computational Biology in the Discovery of Biomarkers in the Diagnosis, Treatment and Management of Cardiovascular Diseases |journal=Cardiology and Cardiovascular Medicine |volume=8 |issue=5 |pages=405–414 |doi=10.26502/fccm.92920400 |pmid=39328401 |issn=2572-9292|pmc=11426419 }}</ref>


===Neuroscience===
===Neuroscience===
Line 89: Line 82:
=== Oncology ===
=== Oncology ===
{{Main|Oncology}}
{{Main|Oncology}}
Computational biology plays a crucial role in discovering signs of new, previously unknown living creatures and in [[cancer]] research. This field involves large-scale measurements of cellular processes, including [[RNA]], [[DNA]], and proteins, which pose significant computational challenges. To overcome these, biologists rely on computational tools to accurately measure and analyze biological data.<ref name=":5">{{cite journal |last=Yakhini |first=Zohar |year=2011 |title=Cancer Computational Biology |journal=BMC Bioinformatics |volume=12 |pages=120 |doi=10.1186/1471-2105-12-120 |pmc=3111371 |pmid=21521513 |doi-access=free}}</ref> In cancer research, computational biology aids in the complex analysis of [[Neoplasm|tumor]] samples, helping researchers develop new ways to characterize tumors and understand various cellular properties. The use of high-throughput measurements, involving millions of data points from DNA, RNA, and other biological structures, helps in diagnosing cancer at early stages and in understanding the key factors that contribute to cancer development. Areas of focus include analyzing molecules that are deterministic in causing cancer and understanding how the human genome relates to tumor causation.<ref name=":5" /><ref>{{cite journal |last1=Barbolosi |first1=Dominique |last2=Ciccolini |first2=Joseph |last3=Lacarelle |first3=Bruno |last4=Barlesi |first4=Fabrice |last5=Andre |first5=Nicolas |year=2016 |title=Computational oncology--mathematical modelling of drug regimens for precision medicine |journal=Nature Reviews Clinical Oncology |volume=13 |issue=4 |pages=242–254 |doi=10.1038/nrclinonc.2015.204 |pmid=26598946 |s2cid=22492353}}</ref>
Computational biology plays a crucial role in discovering signs of new, previously unknown living creatures and in [[cancer]] research. This field involves large-scale measurements of cellular processes, including [[RNA]], [[DNA]], and proteins, which pose significant computational challenges. To overcome these, biologists rely on computational tools to accurately measure and analyze biological data.<ref name=":5">{{cite journal |last=Yakhini |first=Zohar |year=2011 |title=Cancer Computational Biology |journal=BMC Bioinformatics |volume=12 |article-number=120 |doi=10.1186/1471-2105-12-120 |pmc=3111371 |pmid=21521513 |doi-access=free}}</ref> In cancer research, computational biology aids in the complex analysis of [[Neoplasm|tumor]] samples, helping researchers develop new ways to characterize tumors and understand various cellular properties. The use of high-throughput measurements, involving millions of data points from DNA, RNA, and other biological structures, helps in diagnosing cancer at early stages and in understanding the key factors that contribute to cancer development. Areas of focus include analyzing molecules that are deterministic in causing cancer and understanding how the human genome relates to tumor causation.<ref name=":5" /><ref>{{cite journal |last1=Barbolosi |first1=Dominique |last2=Ciccolini |first2=Joseph |last3=Lacarelle |first3=Bruno |last4=Barlesi |first4=Fabrice |last5=Andre |first5=Nicolas |year=2016 |title=Computational oncology--mathematical modelling of drug regimens for precision medicine |journal=Nature Reviews Clinical Oncology |volume=13 |issue=4 |pages=242–254 |doi=10.1038/nrclinonc.2015.204 |pmid=26598946 |s2cid=22492353}}</ref>


=== Toxicology ===
=== Toxicology ===
Line 95: Line 88:
Computational toxicology is a multidisciplinary area of study, which is employed in the early stages of drug discovery and development to predict the safety and potential toxicity of drug candidates.
Computational toxicology is a multidisciplinary area of study, which is employed in the early stages of drug discovery and development to predict the safety and potential toxicity of drug candidates.


=== Drug Discovery ===
=== Drug discovery ===
Computational biology has become instrumental in revolutionizing [[drug discovery]] processes. By integrating computational systems biology approaches, researchers can model complex biological systems, facilitating the identification of novel drug targets and the prediction of drug responses. These methodologies enable the simulation of [[intracellular]] and [[intercellular signaling]] events using data from genomic, proteomic, or metabolomic experiments, thereby streamlining the drug development pipeline and reducing associated costs.<ref>{{Cite journal |last1=Materi |first1=Wayne |last2=Wishart |first2=David S. |date=April 2007 |title=Computational systems biology in drug discovery and development: methods and applications |url=https://linkinghub.elsevier.com/retrieve/pii/S1359644607000943 |journal=Drug Discovery Today |language=en |volume=12 |issue=7–8 |pages=295–303 |doi=10.1016/j.drudis.2007.02.013|pmid=17395089 |url-access=subscription }}</ref>
A growing application of computational biology is [[drug discovery]]. For example, simulations of [[intracellular]] and [[intercellular signaling]] events, using data from proteomic or metabolomic experiments, may reduce dependence on experimentation in elucidating [[pharmacokinetics]] and [[pharmacodynamics]] of drug candidates in living organisms.<ref>{{Cite journal |last1=Materi |first1=Wayne |last2=Wishart |first2=David S. |date=April 2007 |title=Computational systems biology in drug discovery and development: methods and applications |url=https://linkinghub.elsevier.com/retrieve/pii/S1359644607000943 |journal=Drug Discovery Today |language=en |volume=12 |issue=7–8 |pages=295–303 |doi=10.1016/j.drudis.2007.02.013|pmid=17395089 |url-access=subscription }}</ref>


Moreover, the convergence of computational biology with artificial intelligence (AI) has further accelerated drug design. AI-driven models can analyze vast datasets to predict molecular behavior, optimize lead compounds, and anticipate potential side effects, thereby enhancing the efficiency and effectiveness of drug discovery.<ref>{{Cite journal |last1=Zhang |first1=Yue |last2=Luo |first2=Mengqi |last3=Wu |first3=Peng |last4=Wu |first4=Song |last5=Lee |first5=Tzong-Yi |last6=Bai |first6=Chen |date=November 2022 |title=Application of Computational Biology and Artificial Intelligence in Drug Design |journal=International Journal of Molecular Sciences |language=en |volume=23 |issue=21 |pages=13568 |doi=10.3390/ijms232113568 |doi-access=free |issn=1422-0067 |pmc=9658956 |pmid=36362355}}</ref>
Increasingly, artificial intelligence plays a central role in the drug discovery process. Using chemical structures of known pharmaceutical agents as inputs, AI models can suggest structures of lead compounds or predict novel modes of drug-protein binding.  AI is also used for [[virtual screening]] of candidate molecules, avoiding the need to synthesize large numbers of molecules for screening.<ref>{{Cite journal |last1=Zhang |first1=Yue |last2=Luo |first2=Mengqi |last3=Wu |first3=Peng |last4=Wu |first4=Song |last5=Lee |first5=Tzong-Yi |last6=Bai |first6=Chen |date=November 2022 |title=Application of Computational Biology and Artificial Intelligence in Drug Design |journal=International Journal of Molecular Sciences |language=en |volume=23 |issue=21 |article-number=13568 |doi=10.3390/ijms232113568 |doi-access=free |issn=1422-0067 |pmc=9658956 |pmid=36362355}}</ref><ref>{{Cite journal |last1=Qureshi |first1=Rizwan |last2=Irfan |first2=Muhammad |last3=Gondal |first3=Taimoor Muzaffar |last4=Khan |first4=Sheheryar |last5=Wu |first5=Jia |last6=Hadi |first6=Muhammad Usman |last7=Heymach |first7=John |last8=Le |first8=Xiuning |last9=Yan |first9=Hong |last10=Alam |first10=Tanvir |date=2023 |title=AI in drug discovery and its clinical relevance |journal=Heliyon |language=en |volume=9 |issue=7 |article-number=e17575 |doi=10.1016/j.heliyon.2023.e17575 |doi-access=free |pmc=10302550  |pmid=37396052 |bibcode=2023Heliy...917575Q }}</ref>


== Techniques ==
== Techniques ==
Line 121: Line 114:
Graph analytics, or [[Network theory|network analysis]], is the study of graphs that represent connections between different objects. Graphs can represent all kinds of networks in biology such as [[Protein–protein interaction|protein-protein interaction]] networks, regulatory networks, Metabolic and biochemical networks and much more. There are many ways to analyze these networks. One of which is looking at [[centrality]] in graphs. Finding centrality in graphs assigns nodes rankings to their popularity or centrality in the graph. This can be useful in finding which nodes are most important. For example, given data on the activity of genes over a time period, degree centrality can be used to see what genes are most active throughout the network, or what genes interact with others the most throughout the network. This contributes to the understanding of the roles certain genes play in the network.
Graph analytics, or [[Network theory|network analysis]], is the study of graphs that represent connections between different objects. Graphs can represent all kinds of networks in biology such as [[Protein–protein interaction|protein-protein interaction]] networks, regulatory networks, Metabolic and biochemical networks and much more. There are many ways to analyze these networks. One of which is looking at [[centrality]] in graphs. Finding centrality in graphs assigns nodes rankings to their popularity or centrality in the graph. This can be useful in finding which nodes are most important. For example, given data on the activity of genes over a time period, degree centrality can be used to see what genes are most active throughout the network, or what genes interact with others the most throughout the network. This contributes to the understanding of the roles certain genes play in the network.


There are many ways to calculate centrality in graphs all of which can give different kinds of information on centrality. Finding centralities in biology can be applied in many different circumstances, some of which are gene regulatory, protein interaction and metabolic networks.<ref name=":3">{{Cite journal |last1=Koschützki |first1=Dirk |last2=Schreiber |first2=Falk |date=2008-05-15 |title=Centrality Analysis Methods for Biological Networks and Their Application to Gene Regulatory Networks |journal=Gene Regulation and Systems Biology |volume=2 |pages=193–201 |doi=10.4137/grsb.s702 |issn=1177-6250 |pmc=2733090 |pmid=19787083}}</ref>
There are many ways to calculate centrality in graphs all of which can give different kinds of information on centrality. Finding centralities in biology can be applied in many different circumstances, some of which are gene regulatory, protein interaction and metabolic networks.<ref name=":3">{{Cite journal |last1=Koschützki |first1=Dirk |last2=Schreiber |first2=Falk |date=2008-05-15 |title=Centrality Analysis Methods for Biological Networks and Their Application to Gene Regulatory Networks |journal=Gene Regulation and Systems Biology |volume=2 |pages=193–201 |article-number=GRSB.S702 |doi=10.4137/grsb.s702 |issn=1177-6250 |pmc=2733090 |pmid=19787083}}</ref>


===Supervised Learning===
===Supervised Learning===
Line 127: Line 120:


[[File:Random forest explain.png|thumb|350x350px|Diagram showing a simple random forest]]
[[File:Random forest explain.png|thumb|350x350px|Diagram showing a simple random forest]]
A common supervised learning algorithm is the [[random forest]], which uses numerous [[Decision tree learning|decision trees]] to train a model to classify a dataset. Forming the basis of the random forest, a decision tree is a structure which aims to classify, or label, some set of data using certain known features of that data. A practical biological example of this would be taking an individual's genetic data and predicting whether or not that individual is predisposed to develop a certain disease or cancer. At each internal node the algorithm checks the dataset for exactly one feature, a specific gene in the previous example, and then branches left or right based on the result. Then at each leaf node, the decision tree assigns a class label to the dataset. So in practice, the algorithm walks a specific root-to-leaf path based on the input dataset through the decision tree, which results in the classification of that dataset. Commonly, decision trees have target variables that take on discrete values, like yes/no, in which case it is referred to as a [[Classification chart|classification tree]], but if the target variable is continuous then it is called a [[regression tree]]. To construct a decision tree, it must first be trained using a training set to identify which features are the best predictors of the target variable.
A common supervised learning algorithm is the [[random forest]], which uses numerous [[Decision tree learning|decision trees]] to train a model to classify a dataset. Forming the basis of the random forest, a decision tree is a structure which aims to classify, or label, some set of data using certain known features of that data. A practical biological example of this would be taking an individual's genetic data and predicting whether or not that individual is predisposed to develop a certain disease or cancer. At each internal node the algorithm checks the dataset for exactly one feature, a specific gene in the previous example, and then branches left or right based on the result. Then at each leaf node, the decision tree assigns a class label to the dataset. So in practice, the algorithm walks a specific root-to-leaf path based on the input dataset through the decision tree, which results in the classification of that dataset. Commonly, decision trees have target variables that take on discrete values, like yes/no, in which case it is referred to as a [[Classification chart|classification tree]], but if the target variable is continuous then it is called a [[regression tree]]. To construct a decision tree, it must first be trained using a training set to identify which features are the best predictors of the target variable.{{citation needed|date=October 2025}}


===Open source software===
===Open source software===
[[Open source software]] provides a platform for computational biology where everyone can access and benefit from software developed in research.<ref>{{Cite journal |last1=Boudreau |first1=Mathieu |last2=Poline |first2=Jean-Baptiste |last3=Bellec |first3=Pierre |last4=Stikov |first4=Nikola |date=2021-02-11 |title=On the open-source landscape of PLOS Computational Biology |journal=PLOS Computational Biology |language=en |volume=17 |issue=2 |pages=e1008725 |doi=10.1371/journal.pcbi.1008725 |doi-access=free |issn=1553-7358 |pmc=7877734 |pmid=33571204|bibcode=2021PLSCB..17E8725B }}</ref> [[PLOS]] cites four main reasons for the use of open source software:
[[Open source software]] provides a platform for computational biology where everyone can access and benefit from software developed in research.<ref>{{Cite journal |last1=Boudreau |first1=Mathieu |last2=Poline |first2=Jean-Baptiste |last3=Bellec |first3=Pierre |last4=Stikov |first4=Nikola |date=2021-02-11 |title=On the open-source landscape of PLOS Computational Biology |journal=PLOS Computational Biology |language=en |volume=17 |issue=2 |article-number=e1008725 |doi=10.1371/journal.pcbi.1008725 |doi-access=free |issn=1553-7358 |pmc=7877734 |pmid=33571204|bibcode=2021PLSCB..17E8725B }}</ref> [[PLOS]] cites four main reasons for the use of open source software:
* [[Reproducibility]]: This allows for researchers to use the exact methods used to calculate the relations between biological data.
* [[Reproducibility]]: This allows for researchers to use the exact methods used to calculate the relations between biological data.
* Faster development: developers and researchers do not have to reinvent existing code for minor tasks. Instead they can use pre-existing programs to save time on the development and implementation of larger projects.
* Faster development: developers and researchers do not have to reinvent existing code for minor tasks. Instead they can use pre-existing programs to save time on the development and implementation of larger projects.
* Increased quality: Having input from multiple researchers studying the same topic provides a layer of assurance that errors will not be in the code.
* Increased quality: Having input from multiple researchers studying the same topic provides a layer of assurance that errors will not be in the code.
* Long-term availability: Open source programs are not tied to any businesses or patents. This allows for them to be posted to multiple [[web page]]s and ensure that they are available in the future.<ref>{{cite journal|journal=PLOS Computational Biology| doi=10.1371/journal.pcbi.1002799 | volume=8| issue=11 |title=The PLOS Computational Biology Software Section|pages=e1002799|year=2012|last1=Prlić|first1=Andreas| last2=Lapp | first2=Hilmar |pmc=3510099| bibcode=2012PLSCB...8E2799P | doi-access=free }}</ref>
* Long-term availability: Open source programs are not tied to any businesses or patents. This allows for them to be posted to multiple [[web page]]s and ensure that they are available in the future.<ref>{{cite journal|journal=PLOS Computational Biology| doi=10.1371/journal.pcbi.1002799 | volume=8| issue=11 |title=The PLOS Computational Biology Software Section|article-number=e1002799|year=2012|last1=Prlić|first1=Andreas| last2=Lapp | first2=Hilmar |pmc=3510099| bibcode=2012PLSCB...8E2799P | doi-access=free }}</ref>


==Research==
==Research==
Line 153: Line 146:
While each field is distinct, there may be significant overlap at their interface,<ref name="nih"/> so much so that to many, bioinformatics and computational biology are terms that are used interchangeably.
While each field is distinct, there may be significant overlap at their interface,<ref name="nih"/> so much so that to many, bioinformatics and computational biology are terms that are used interchangeably.


The terms computational biology and [[evolutionary computation]] have a similar name, but are not to be confused. Unlike computational biology, evolutionary computation is not concerned with modeling and analyzing biological data. It instead creates algorithms based on the ideas of evolution across species. Sometimes referred to as [[genetic algorithm]]s, the research of this field can be applied to computational biology. While evolutionary computation is not inherently a part of computational biology, computational evolutionary biology is a subfield of it.<ref>{{cite journal |last=Foster |first=James |date=June 2001 |title=Evolutionary Computation |journal=Nature Reviews Genetics |volume=2 |issue=6 |pages=428–436 |doi=10.1038/35076523 |pmid=11389459 |s2cid=205017006}}</ref>
The terms computational biology and [[evolutionary computation]] appear similar but are not identical. Evolutionary computation is a field of computer science comprising algorithms inspired by evolution in biology. Algorithms from within the field of evolutionary computation can be applied to computational biology.<ref>{{cite journal |last=Foster |first=James |date=June 2001 |title=Evolutionary Computation |journal=Nature Reviews Genetics |volume=2 |issue=6 |pages=428–436 |doi=10.1038/35076523 |pmid=11389459 |s2cid=205017006}}</ref>


==See also==
==See also==
Line 171: Line 164:
* [[International Society for Computational Biology]]
* [[International Society for Computational Biology]]
* [[List of bioinformatics institutions]]
* [[List of bioinformatics institutions]]
* [[List of bioinformatics software]]
* [[List of biological databases]]
* [[List of biological databases]]
* [[Mathematical biology]]
* [[Mathematical biology]]
Line 193: Line 187:
| access-date          = 18 August 2012
| access-date          = 18 August 2012
| publisher          = Biomedical Information Science and Technology Initiative
| publisher          = Biomedical Information Science and Technology Initiative
| url-status          = dead
| archive-url          = https://web.archive.org/web/20120905155331/http://www.bisti.nih.gov/docs/CompuBioDef.pdf
| archive-url          = https://web.archive.org/web/20120905155331/http://www.bisti.nih.gov/docs/CompuBioDef.pdf
| archive-date          = 5 September 2012
| archive-date          = 5 September 2012
Line 201: Line 194:
<ref name="brown">
<ref name="brown">
{{cite web
{{cite web
| url          = http://www.brown.edu/research/projects/computational-molecular-biology/
| url          = https://www.brown.edu/research/projects/computational-molecular-biology/
| title        = About the CCMB
| title        = About the CCMB
| access-date  = 18 August 2012
| access-date  = 18 August 2012

Latest revision as of 20:59, 26 October 2025


File:Human Genome Project Timeline (26964377742).jpg
This timeline displays the year-by-year progress of the Human Genome Project in the context of genetics since 1865. Starting in 1990, by 1999, Chromosome 22 became the first human chromosome to be completely sequenced.

Computational biology refers to the use of techniques in computer science, data analysis, mathematical modeling and computational simulations to understand biological systems and relationships.[1] An intersection of computer science, biology, and data science, the field also has foundations in applied mathematics, molecular biology, cell biology, chemistry, and genetics.[2]

History

Bioinformatics, the analysis of informatics processes in biological systems, began in the early 1970s. At this time, research in artificial intelligence was using network models of the human brain in order to generate new algorithms. This use of biological data pushed biological researchers to use computers to evaluate and compare large data sets in their own field.[3]

By 1982, researchers shared information via punch cards. The amount of data grew exponentially by the end of the 1980s, requiring new computational methods for quickly interpreting relevant information.[3]

Perhaps the best-known example of computational biology, the Human Genome Project, officially began in 1990.[4] By 2003, the project had mapped around 85% of the human genome, satisfying its initial goals.[5] Work continued, however, and by 2021 a "complete genome" level was reached, with only 0.3% of bases still covered by potential issues.[6][7] The missing Y chromosome was added in January 2022.

Since the late 1990s, computational biology has become an important part of biology, leading to numerous subfields.[8] Today, the International Society for Computational Biology recognizes 21 different 'Communities of Special Interest', each representing a slice of the larger field.[9] In addition to helping sequence the human genome, computational biology has helped create accurate models of the human brain, map the 3D structure of genomes, and model biological systems.[3] Much of the original progress in computational biology emerged from the United States and Western Europe, due to their large computational infrastructures. Recent decades have seen growing contributions from less-wealthy nations, however. For example, Colombia has had an international computational biology effort since 1998, focusing on genomics and disease in nationally-important crops like coffee and potatoes.[10] Poland, similarly, has recently been a leader in biomolecular simulations and macromolecular sequence analysis.[11]

Applications

Anatomy

Computational anatomy is the study of anatomical shape and form at the visible or gross anatomical 50–100 μm scale of morphology. It involves the development of computational mathematical and data-analytical methods for modeling and simulating biological structures. It focuses on the anatomical structures being imaged, rather than the medical imaging devices. Due to the availability of dense 3D measurements via technologies such as magnetic resonance imaging, computational anatomy has emerged as a subfield of medical imaging and bioengineering for extracting anatomical coordinate systems at the morphome scale in 3D.

The original formulation of computational anatomy is as a generative model of shape and form from exemplars acted upon via transformations.[12] The diffeomorphism group is used to study different coordinate systems via coordinate transformations as generated via the Lagrangian and Eulerian velocities of flow from one anatomical configuration in ℝ³ to another. It relates to shape statistics and morphometrics, with the distinction that diffeomorphisms are used to map coordinate systems, whose study is known as diffeomorphometry.

Data and modeling

Mathematical biology is the use of mathematical models of living organisms to examine the systems that govern structure, development, and behavior in biological systems. This entails a more theoretical approach to problems, rather than its more empirically minded counterpart of experimental biology.[13] Mathematical biology draws on discrete mathematics, topology (also useful for computational modeling), Bayesian statistics, linear algebra and Boolean algebra.[14]

These mathematical approaches have enabled the creation of databases and other methods for storing, retrieving, and analyzing biological data, a field known as bioinformatics. Usually, this process involves genetics and analyzing genes.

Gathering and analyzing large datasets have made room for growing research fields such as data mining,[14] and computational biomodeling, which refers to building computer models and visual simulations of biological systems. This allows researchers to predict how such systems will react to different environments, which is useful for determining if a system can "maintain their state and functions against external and internal perturbations".[15] While current techniques focus on small biological systems, researchers are working on approaches that will allow for larger networks to be analyzed and modeled. A majority of researchers believe this will be essential in developing modern medical approaches to creating new drugs and gene therapy.[15] A useful modeling approach is to use Petri nets via tools such as esyN.[16]

Along similar lines, until recent decades theoretical ecology has largely dealt with analytic models that were detached from the statistical models used by empirical ecologists. However, computational methods have aided in developing ecological theory via simulation of ecological systems, in addition to increasing application of methods from computational statistics in ecological analyses.

Systems biology


Systems biology consists of computing the interactions between various biological systems ranging from the cellular level to entire populations with the goal of discovering emergent properties. This process usually involves networking cell signaling and metabolic pathways. Systems biology often uses computational techniques from biological modeling and graph theory to study these complex interactions at cellular levels.[14]

Evolutionary biology


Computational biology has assisted evolutionary biology by:

Genomics


File:Genome viewer screenshot small.png
A partially sequenced genome

Computational genomics is the study of the genomes of cells and organisms. The Human Genome Project is one example of computational genomics. This project looks to sequence the entire human genome into a set of data. Once fully implemented, this could allow for doctors to analyze the genome of an individual patient.[18] This opens the possibility of personalized medicine, prescribing treatments based on an individual's pre-existing genetic patterns. Researchers are looking to sequence the genomes of animals, plants, bacteria, and all other types of life.[19]

One of the main ways that genomes are compared is by sequence homology. Homology is the study of biological structures and nucleotide sequences in different organisms that come from a common ancestor. Research suggests that between 80 and 90% of genes in newly sequenced prokaryotic genomes can be identified this way.[19]

Sequence alignment is another process for comparing and detecting similarities between biological sequences or genes. Sequence alignment is useful in a number of bioinformatics applications, such as computing the longest common subsequence of two genes or comparing variants of certain diseases.
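
As an illustration of the longest-common-subsequence computation mentioned above, the following is a minimal Python sketch using standard dynamic programming. The function name and the example sequences are hypothetical, chosen only for illustration; practical alignment tools use scoring schemes with gap penalties rather than a plain subsequence match.

```python
def longest_common_subsequence(a: str, b: str) -> str:
    """Return one longest common subsequence of sequences a and b."""
    m, n = len(a), len(b)
    # dp[i][j] holds the LCS length of the prefixes a[:i] and b[:j].
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])

    # Walk back through the table to recover the subsequence itself.
    out = []
    i, j = m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


# Made-up example sequences, not real gene data.
shared = longest_common_subsequence("AGGTAB", "GXTXAYB")
```

The table has one cell per pair of prefixes, so time and memory both grow with the product of the two sequence lengths.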

A largely open problem in computational genomics is the analysis of intergenic regions, which comprise roughly 97% of the human genome.[19] Researchers are working to understand the functions of non-coding regions of the human genome through the development of computational and statistical methods and via large consortia projects such as ENCODE and the Roadmap Epigenomics Project.

Understanding how individual genes contribute to the biology of an organism at the molecular, cellular, and organism levels is known as gene ontology. The Gene Ontology Consortium's mission is to develop an up-to-date, comprehensive, computational model of biological systems, from the molecular level to larger pathways, cellular, and organism-level systems. The Gene Ontology resource provides a computational representation of current scientific knowledge about the functions of genes (or, more properly, the protein and non-coding RNA molecules produced by genes) from many different organisms, from humans to bacteria.[20]

3D genomics is a subsection in computational biology that focuses on the organization and interaction of genes within a eukaryotic cell. One method used to gather 3D genomic data is through Genome Architecture Mapping (GAM). GAM measures 3D distances of chromatin and DNA in the genome by combining cryosectioning, the process of cutting a strip from the nucleus to examine the DNA, with laser microdissection. A nuclear profile is simply this strip or slice that is taken from the nucleus. Each nuclear profile contains genomic windows, which are certain sequences of nucleotides - the base unit of DNA. GAM captures a genome network of complex, multi enhancer chromatin contacts throughout a cell.[21]

Biomarker Discovery

Computational biology also plays a pivotal role in identifying biomarkers for diseases such as cardiovascular conditions. By integrating various 'Omic' data - such as genomics, proteomics, and metabolomics - researchers can uncover potential biomarkers that aid in disease diagnosis, prognosis, and treatment strategies. For instance, metabolomic analyses have identified specific metabolites capable of distinguishing between coronary artery disease and myocardial infarction, thereby enhancing diagnostic precision.[22]

Neuroscience

Computational neuroscience is the study of brain function in terms of the information processing properties of the nervous system. A subset of neuroscience, it looks to model the brain to examine specific aspects of the neurological system.[23] Models of the brain include:

  • Realistic Brain Models: These models look to represent every aspect of the brain, including as much detail at the cellular level as possible. Realistic models provide the most information about the brain, but also have the largest margin for error. More variables in a brain model create the possibility for more error to occur. These models do not account for parts of the cellular structure that scientists do not know about. Realistic brain models are the most computationally heavy and the most expensive to implement.[24]
  • Simplifying Brain Models: These models look to limit the scope of a model in order to assess a specific physical property of the neurological system. This allows for the intensive computational problems to be solved, and reduces the amount of potential error from a realistic brain model.[24]

It is the work of computational neuroscientists to improve the algorithms and data structures currently used to increase the speed of such calculations.

Computational neuropsychiatry is an emerging field that uses mathematical and computer-assisted modeling of brain mechanisms involved in mental disorders. Several initiatives have demonstrated that computational modeling contributes substantially to understanding neuronal circuits that could generate mental functions and dysfunctions.[25][26][27]

Pharmacology


Computational pharmacology is "the study of the effects of genomic data to find links between specific genotypes and diseases and then screening drug data".[28] The pharmaceutical industry requires a shift in methods to analyze drug data. Pharmacologists were able to use Microsoft Excel to compare chemical and genomic data related to the effectiveness of drugs. However, the industry has reached what is referred to as the Excel barricade. This arises from the limited number of cells accessible on a spreadsheet. This development led to the need for computational pharmacology. Scientists and researchers develop computational methods to analyze these massive data sets. This allows for an efficient comparison between the notable data points and allows for more accurate drugs to be developed.[29]

Analysts project that if major medications fail due to patents, computational biology will be necessary to replace current drugs on the market. Doctoral students in computational biology are being encouraged to pursue careers in industry rather than take postdoctoral positions. This is a direct result of major pharmaceutical companies needing more qualified analysts of the large data sets required for producing new drugs.[29]

Oncology

Computational biology plays a crucial role in discovering signs of new, previously unknown living creatures and in cancer research. This field involves large-scale measurements of cellular processes, including RNA, DNA, and proteins, which pose significant computational challenges. To overcome these, biologists rely on computational tools to accurately measure and analyze biological data.[30] In cancer research, computational biology aids in the complex analysis of tumor samples, helping researchers develop new ways to characterize tumors and understand various cellular properties. The use of high-throughput measurements, involving millions of data points from DNA, RNA, and other biological structures, helps in diagnosing cancer at early stages and in understanding the key factors that contribute to cancer development. Areas of focus include analyzing molecules that are deterministic in causing cancer and understanding how the human genome relates to tumor causation.[30][31]

Toxicology

Computational toxicology is a multidisciplinary area of study, which is employed in the early stages of drug discovery and development to predict the safety and potential toxicity of drug candidates.

Drug discovery

A growing application of computational biology is drug discovery. For example, simulations of intracellular and intercellular signaling events, using data from proteomic or metabolomic experiments, may reduce dependence on experimentation in elucidating pharmacokinetics and pharmacodynamics of drug candidates in living organisms.[32]

Increasingly, artificial intelligence plays a central role in the drug discovery process. Using chemical structures of known pharmaceutical agents as inputs, AI models can suggest structures of lead compounds or predict novel modes of drug-protein binding. AI is also used for virtual screening of candidate molecules, avoiding the need to synthesize large numbers of molecules for screening.[33][34]

Techniques

Computational biologists use a wide range of software and algorithms to carry out their research.

Unsupervised Learning

Unsupervised learning is a type of algorithm that finds patterns in unlabeled data. One example is k-means clustering, which aims to partition n data points into k clusters, in which each data point belongs to the cluster with the nearest mean. Another version is the k-medoids algorithm, which, when selecting a cluster center or cluster centroid, will pick one of its data points in the set, and not just an average of the cluster.

File:Jmatrix.png
A heat-map of the Jaccard distances of nuclear profiles

The k-medoids algorithm follows these steps:

  1. Randomly select k distinct data points. These are the initial clusters.
  2. Measure the distance between each point and each of the k cluster centers.
  3. Assign each point to the nearest cluster.
  4. Find the center of each cluster (medoid).
  5. Repeat until the clusters no longer change.
  6. Assess the quality of the clustering by adding up the variation within each cluster.
  7. Repeat the processes with different values of k.
  8. Pick the best value for k by finding the "elbow" in the plot of within-cluster variation against k, the point beyond which increasing k reduces the variation only slightly.
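
The steps above can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: the function name, the use of Euclidean distance, the fixed random seed, and the toy points are all assumptions made for the example, and the points are assumed to be distinct.

```python
import random
from math import dist


def k_medoids(points, k, seed=0, max_iter=100):
    """Cluster distinct points around k medoids.

    Returns (clusters, cost), where clusters maps each medoid's index to
    the indices of its members and cost is the total distance of all
    points to their medoid.
    """
    rng = random.Random(seed)
    # Step 1: randomly select k distinct data points as initial centers.
    medoids = rng.sample(range(len(points)), k)
    clusters = {}
    for _ in range(max_iter):
        # Steps 2-3: assign every point to its nearest medoid.
        clusters = {m: [] for m in medoids}
        for i, p in enumerate(points):
            nearest = min(medoids, key=lambda m: dist(p, points[m]))
            clusters[nearest].append(i)
        # Step 4: the new medoid of a cluster is the member with the
        # smallest total distance to the other members.
        new_medoids = [
            min(members,
                key=lambda c: sum(dist(points[c], points[o]) for o in members))
            if members else m
            for m, members in clusters.items()
        ]
        # Step 5: stop once the medoids no longer change.
        if set(new_medoids) == set(medoids):
            break
        medoids = new_medoids
    # Step 6: total within-cluster variation for the final assignment.
    cost = sum(dist(points[i], points[m])
               for m, members in clusters.items() for i in members)
    return clusters, cost


# Steps 7-8: repeat for several values of k and compare the costs.
toy_points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
              (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
cost_by_k = {k: k_medoids(toy_points, k)[1] for k in range(1, 5)}
```

Because the starting medoids are random, different seeds can converge to different clusterings; running the sketch with several seeds and keeping the lowest-cost result is a common precaution.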

One example of this in biology is used in the 3D mapping of a genome. Information of a mouse's HIST1 region of chromosome 13 is gathered from Gene Expression Omnibus.[35] This information contains data on which nuclear profiles show up in certain genomic regions. With this information, the Jaccard distance can be used to find a normalized distance between all the loci.
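
A minimal Python sketch of the Jaccard distance follows, assuming each locus is represented by the set of nuclear profiles in which it was detected. The locus names and profile labels are made up for illustration and are not taken from the Gene Expression Omnibus data.

```python
def jaccard_distance(a: set, b: set) -> float:
    """Return 1 - |a & b| / |a | b|, a value between 0 and 1."""
    union = a | b
    if not union:
        # Convention assumed here: two empty sets count as identical.
        return 0.0
    return 1.0 - len(a & b) / len(union)


# Hypothetical loci, each mapped to the nuclear profiles containing it.
profiles_by_locus = {
    "locus_1": {"NP1", "NP2", "NP3"},
    "locus_2": {"NP2", "NP3", "NP4"},
    "locus_3": {"NP5"},
}

# Pairwise distance matrix, the kind of data a heat-map would display.
loci = sorted(profiles_by_locus)
distance_matrix = [
    [jaccard_distance(profiles_by_locus[x], profiles_by_locus[y]) for y in loci]
    for x in loci
]
```

A distance of 0 means two loci appear in exactly the same nuclear profiles, and a distance of 1 means they never appear together.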

Graph Analytics

Graph analytics, or network analysis, is the study of graphs that represent connections between different objects. Graphs can represent all kinds of networks in biology, such as protein-protein interaction networks, regulatory networks, and metabolic and biochemical networks. There are many ways to analyze these networks, one of which is to look at centrality. Centrality measures rank nodes by their popularity or importance in the graph, which can be useful in finding which nodes matter most. For example, given data on the activity of genes over a time period, degree centrality can be used to see which genes are most active throughout the network, or which genes interact with others the most. This contributes to the understanding of the roles certain genes play in the network.

There are many ways to calculate centrality in graphs, each of which gives a different kind of information about a node's importance. In biology, centrality measures have been applied to gene regulatory, protein interaction and metabolic networks, among others.[36]
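
Degree centrality, the simplest of these measures, can be sketched as below. The version shown normalizes each node's number of neighbors by the largest possible number (n - 1 for a graph with n nodes); the gene names and interactions are hypothetical, and the function name is an assumption made for this example.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Fraction of the other nodes that each node is directly connected to."""
    neighbors = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops in this sketch
            neighbors[u].add(v)
            neighbors[v].add(u)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(nbrs) / (n - 1) for node, nbrs in neighbors.items()}


# Hypothetical undirected interaction network between four genes.
interactions = [
    ("geneA", "geneB"),
    ("geneA", "geneC"),
    ("geneA", "geneD"),
    ("geneB", "geneC"),
]

centrality = degree_centrality(interactions)
for gene, score in sorted(centrality.items(), key=lambda item: -item[1]):
    print(gene, round(score, 2))
```

In this toy network geneA interacts with every other gene and so receives the highest score. Other measures, such as betweenness or closeness centrality, would rank nodes by different criteria.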

Supervised learning

Supervised learning is a type of algorithm that learns from labeled data how to assign labels to future, unlabeled data. In biology, supervised learning is helpful when some data have already been categorized and further data need to be sorted into the same categories.

File:Random forest explain.png
Diagram showing a simple random forest

A common supervised learning algorithm is the random forest, which combines numerous decision trees into a single model to classify a dataset. Forming the basis of the random forest, a decision tree is a structure that aims to classify, or label, a set of data using certain known features of that data. A practical biological example would be taking an individual's genetic data and predicting whether that individual is predisposed to develop a certain disease or cancer. At each internal node the algorithm checks the data for exactly one feature (a specific gene, in the previous example) and then branches left or right based on the result. At each leaf node, the decision tree assigns a class label. In practice, the algorithm walks a specific root-to-leaf path through the decision tree based on the input, which results in its classification. A decision tree whose target variable takes discrete values, such as yes/no, is called a classification tree; if the target variable is continuous, it is called a regression tree. Before it can be used, a decision tree must be trained on a training set to identify which features are the best predictors of the target variable.
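
The root-to-leaf walk described above, and one common way of combining several trees (a majority vote), can be sketched as follows. The trees here are built by hand rather than learned from a training set, and the feature names (variant_1 and so on), class labels and function names are hypothetical; the sketch only illustrates how classification proceeds once trees exist.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    feature: Optional[str] = None    # feature tested at an internal node
    left: Optional["Node"] = None    # branch taken when the feature is False
    right: Optional["Node"] = None   # branch taken when the feature is True
    label: Optional[str] = None      # class label, set only on leaf nodes


def classify(node, sample):
    """Walk from the root to a leaf, testing one feature per internal node."""
    while node.label is None:
        node = node.right if sample[node.feature] else node.left
    return node.label


def forest_predict(trees, sample):
    """Classify with every tree and return the most common label."""
    votes = Counter(classify(tree, sample) for tree in trees)
    return votes.most_common(1)[0][0]


# Hand-built, hypothetical trees; a real forest would learn these from data.
yes = Node(label="predisposed")
no = Node(label="not predisposed")
tree_1 = Node("variant_1", left=Node("variant_2", left=no, right=yes), right=yes)
tree_2 = Node("variant_2", left=no, right=yes)
tree_3 = Node("variant_3", left=no, right=yes)
forest = [tree_1, tree_2, tree_3]

# Hypothetical individual: which gene variants are present.
sample = {"variant_1": False, "variant_2": True, "variant_3": False}
print(forest_predict(forest, sample))
```

Using an odd number of trees with two classes avoids tied votes in this sketch. Training, which chooses the feature to test at each node, is omitted here.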

Open source software

Open source software provides a platform for computational biology where everyone can access and benefit from software developed in research.[37] PLOS cites four main reasons for the use of open source software:

  • Reproducibility: Researchers can use the exact methods that were used to calculate the relations between biological data.
  • Faster development: Developers and researchers do not have to reinvent existing code for minor tasks. Instead, they can use pre-existing programs to save time on the development and implementation of larger projects.
  • Increased quality: Input from multiple researchers studying the same topic makes it more likely that errors in the code are found and corrected.
  • Long-term availability: Open source programs are not tied to any business or patent. This allows them to be posted to multiple web pages, which helps ensure that they remain available in the future.[38]

Research

There are several large conferences that are concerned with computational biology. Some notable examples are Intelligent Systems for Molecular Biology, European Conference on Computational Biology and Research in Computational Molecular Biology.

There are also numerous journals dedicated to computational biology. Notable examples include the Journal of Computational Biology and PLOS Computational Biology, a peer-reviewed open access journal that publishes many notable research projects in the field of computational biology. The journal provides reviews of software and tutorials for open source software, and displays information on upcoming computational biology conferences. Other journals relevant to this field include Bioinformatics, Computers in Biology and Medicine, BMC Bioinformatics, Nature Methods, Nature Communications, Scientific Reports and PLOS One.

Related fields

Computational biology, bioinformatics and mathematical biology are all interdisciplinary approaches to the life sciences that draw from quantitative disciplines such as mathematics and information science. The NIH describes computational/mathematical biology as the use of computational/mathematical approaches to address theoretical and experimental questions in biology and, by contrast, bioinformatics as the application of information science to understand complex life-sciences data.[1]

Specifically, the NIH defines


Computational biology: The development and application of data-analytical and theoretical methods, mathematical modeling and computational simulation techniques to the study of biological, behavioral, and social systems.[1]


Bioinformatics: Research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data, including those to acquire, store, organize, archive, analyze, or visualize such data.[1]


While each field is distinct, there may be significant overlap at their interface,[1] so much so that many people use the terms bioinformatics and computational biology interchangeably.

The terms computational biology and evolutionary computation appear similar but are not identical. Evolutionary computation is a field of computer science comprising algorithms inspired by evolution in biology. Algorithms from within the field of evolutionary computation can be applied to computational biology.[39]
