Distributed database

Latest revision as of 15:44, 25 August 2025


A distributed database is a database in which data is stored across different physical locations.[1] It may be stored in multiple computers located in the same physical location (e.g. a data centre), or may be dispersed over a network of interconnected computers. Unlike parallel systems, in which the processors are tightly coupled and constitute a single database system, a distributed database system consists of loosely coupled sites that share no physical components.

System administrators can distribute collections of data (e.g. in a database) across multiple physical locations. A distributed database can reside on organised network servers or decentralised independent computers on the Internet, on corporate intranets or extranets, or on other organisation networks. Because they store data across multiple computers, distributed databases may improve performance at end-user worksites by allowing transactions to be processed on many machines, instead of being limited to one.[2]

Two processes keep distributed databases up to date: replication[3] and duplication.

  1. Replication uses specialized software that looks for changes in the distributed database. Once the changes have been identified, the replication process makes all the databases look the same. Depending on the size and number of the distributed databases, replication can be complex and can consume considerable time and computer resources.
  2. Duplication, on the other hand, is less complex. It identifies one database as a master and then duplicates that database. Duplication is normally done at a set time after hours, to ensure that each distributed location has the same data. Users may change only the master database, which ensures that local data will not be overwritten.

Both replication and duplication can keep the data current in all distributed locations.[2]
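The two processes can be contrasted with a minimal Python sketch. Everything in it is hypothetical and for illustration only: the `Site` class, its `changes` log and the functions `replicate` and `duplicate` are not taken from any real product, and conflicting writes are resolved simply by the order in which changes are collected.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """One location holding a full copy of the data (hypothetical model)."""
    name: str
    rows: dict = field(default_factory=dict)
    changes: list = field(default_factory=list)  # local writes not yet propagated

    def write(self, key, value):
        self.rows[key] = value
        self.changes.append((key, value))


def replicate(sites):
    """Look for changes at every site, then apply them everywhere.

    Changes are applied to all sites in one common order so that the copies
    converge; a later change to the same key overwrites an earlier one
    (a simplification, not a real conflict-resolution policy).
    """
    pending = [change for site in sites for change in site.changes]
    for key, value in pending:
        for site in sites:
            site.rows[key] = value
    for site in sites:
        site.changes.clear()


def duplicate(master, copies):
    """Overwrite every copy with a snapshot of the master database."""
    for copy in copies:
        copy.rows = dict(master.rows)
        copy.changes.clear()


london, tokyo = Site("london"), Site("tokyo")
london.write("order-1", "shipped")
tokyo.write("order-2", "pending")
replicate([london, tokyo])          # both sites now hold both orders
duplicate(london, [tokyo])          # tokyo becomes an exact copy of london
```

In this sketch replication has to inspect every site's change log, whereas duplication only copies one database, which mirrors the difference in complexity described above.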

Besides distributed database replication and fragmentation, there are many other distributed database design technologies, for example local autonomy and synchronous and asynchronous distributed database technologies. How these technologies are implemented depends on the needs of the business, the sensitivity and confidentiality of the data stored in the database, and the price the business is willing to pay to ensure data security, consistency and integrity.
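As a rough illustration of the synchronous and asynchronous approaches, the Python sketch below assumes one primary site that forwards writes to replicas. The `Primary` and `Replica` classes and their methods are hypothetical names chosen for this example, and network failures are not modelled.

```python
from collections import deque


class Replica:
    """A remote copy that applies writes it receives (hypothetical model)."""

    def __init__(self):
        self.rows = {}

    def apply(self, key, value):
        self.rows[key] = value


class Primary:
    """The site that accepts writes and forwards them to its replicas."""

    def __init__(self, replicas):
        self.rows = {}
        self.replicas = replicas
        self.backlog = deque()  # writes accepted locally but not yet shipped

    def write_sync(self, key, value):
        # Synchronous: the write returns only after every replica has it.
        self.rows[key] = value
        for replica in self.replicas:
            replica.apply(key, value)

    def write_async(self, key, value):
        # Asynchronous: the write returns at once; replicas catch up later.
        self.rows[key] = value
        self.backlog.append((key, value))

    def flush(self):
        # Ship the queued writes to the replicas in their original order.
        while self.backlog:
            key, value = self.backlog.popleft()
            for replica in self.replicas:
                replica.apply(key, value)
```

Under these assumptions, a synchronous write keeps all copies consistent at the cost of waiting for every replica, while an asynchronous write is faster but leaves replicas temporarily out of date until `flush` runs.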

When discussing access to distributed databases, Microsoft favors the term distributed query, which it defines in a protocol-specific manner as "[a]ny SELECT, INSERT, UPDATE, or DELETE statement that references tables and rowsets from one or more external OLE DB data sources".[4] Oracle provides a more language-centric view in which distributed queries and distributed transactions form part of distributed SQL.[5]

Architecture

There are three main architecture types for distributed databases: shared-memory, shared-disk, and shared-nothing.

In the shared-memory and shared-disk architectures, the data is not partitioned, whereas in a shared-nothing architecture it has to be partitioned.
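One common way to partition data across shared-nothing nodes is to hash each key and assign it to a node. The Python sketch below assumes this hash-based scheme purely for illustration; the source does not prescribe a partitioning method, and `node_for` and `ShardedStore` are hypothetical names.

```python
import hashlib


def node_for(key: str, node_count: int) -> int:
    """Map a key to a node index using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % node_count


class ShardedStore:
    """Each node owns a private dict; no node sees another node's data."""

    def __init__(self, node_count: int):
        self.nodes = [dict() for _ in range(node_count)]

    def put(self, key: str, value):
        self.nodes[node_for(key, len(self.nodes))][key] = value

    def get(self, key: str):
        return self.nodes[node_for(key, len(self.nodes))].get(key)


store = ShardedStore(node_count=4)
store.put("customer-17", {"city": "Lyon"})
owner = node_for("customer-17", 4)   # index of the only node holding this row
```

Because every key lives on exactly one node, a lookup touches a single partition. A simple modulo scheme like this one would move most keys if the number of nodes changed, which is one reason real systems often use more elaborate schemes.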

Shared-disk architecture is more common for cloud databases than for on-premises ones.[6]

Historically, shared-nothing was the first architecture to be implemented on the cloud, before the advent of shared cloud storage made shared-disk possible.

In practice, different layers of the database can have different architectures. It is now common to have a compute layer with a shared-nothing architecture and a storage layer with a shared-disk architecture. This is the case, for instance, with Snowflake[7] and AWS Aurora.[8]
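The layered arrangement can be pictured with a small Python sketch. It is a toy model under stated assumptions, not a description of how Snowflake or AWS Aurora are implemented: `SharedStorage` and `ComputeNode` are hypothetical classes, and cache invalidation between nodes is deliberately left out.

```python
class SharedStorage:
    """Storage layer: one store that every compute node can reach."""

    def __init__(self):
        self.blocks = {}

    def read(self, key):
        return self.blocks.get(key)

    def write(self, key, value):
        self.blocks[key] = value


class ComputeNode:
    """Compute layer: each node has a private cache and shares no memory."""

    def __init__(self, storage: SharedStorage):
        self.storage = storage
        self.cache = {}  # private to this node

    def get(self, key):
        # Serve from the private cache, falling back to shared storage.
        if key not in self.cache:
            self.cache[key] = self.storage.read(key)
        return self.cache[key]

    def put(self, key, value):
        # Write through to shared storage so other nodes can read the value.
        self.storage.write(key, value)
        self.cache[key] = value


storage = SharedStorage()
node_a, node_b = ComputeNode(storage), ComputeNode(storage)
node_a.put("report-2024", "ready")
value_seen_by_b = node_b.get("report-2024")  # read via the shared storage layer
```

In this sketch a node that has already cached a key would not see a later update made through another node; a real system needs some coordination mechanism for that case.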

List of shared-nothing databases

List of shared-disk databases

See also

References


Further reading


  1. Script error: No such module "citation/CS1".
  2. a b O'Brien, J. & Marakas, G.M. (2008). Management Information Systems (pp. 185–189). New York, NY: McGraw-Hill Irwin.
  3. Ozsu, M.T. & Valduriez, P. (1991). "Distributed database systems: where are we now?". Computer. 24 (8): 68–78. doi:10.1109/2.84879.
  4. Script error: No such module "citation/CS1".
  5. Script error: No such module "citation/CS1".
  6. a b Script error: No such module "citation/CS1".
  7. Script error: No such module "citation/CS1".
  8. Script error: No such module "citation/CS1".