Debian Data team - package data
This page holds notes for the Debian Data Team. For uses of the various data, please see or edit the other wiki pages outside Teams/Data.
Sources
Canonical:
- Package lists
Aggregations:
- UDD
Prior work
PTS service
The Package Tracker (PTS) generates RDF data (additional notes at RDF).
The PTS service is deprecated (seemingly frozen since 2024-05-22).
Metadata is provided for each package, e.g. at https://packages.qa.debian.org/a/apache2.ttl
All metadata is also available via SSH, for example:
rsync -av packages.qa.debian.org:/srv/packages.qa.debian.org/www/web/full-dump.tar.bz2 .
The full-dump dataset is serialized as RDF/Turtle, ~30 MB compressed and 300 MB uncompressed. It can be loaded into a local store and served as a SPARQL endpoint, for example:
sudo apt install oxigraph
tar xfO full-dump.tar.bz2 | pv | oxigraph load --location oxigraph.db --format ttl
oxigraph serve --location oxigraph.db
The on-disk database created above requires ~1.2 GB.
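Once the server is running, it can be queried over the SPARQL protocol. The following is a minimal sketch, assuming the server listens on localhost:7878 and exposes a /query endpoint (believed to be the Oxigraph defaults; adjust ENDPOINT if your setup differs). The function names are hypothetical, and the query only counts triples, since it makes no assumption about the vocabulary used in the dump.

```shell
#!/bin/sh
# Assumed address of the local SPARQL endpoint; override via the environment.
ENDPOINT="${ENDPOINT:-http://localhost:7878/query}"

# Print a SPARQL query that counts all triples in the store.
count_triples_query() {
    printf '%s\n' 'SELECT (COUNT(*) AS ?triples) WHERE { ?s ?p ?o }'
}

# Send the query given as first argument to the endpoint (form-encoded POST,
# as defined by the SPARQL protocol) and print the results as JSON.
run_query() {
    curl --silent --fail \
        --header 'Accept: application/sparql-results+json' \
        --data-urlencode "query=$1" \
        "$ENDPOINT"
}

# Usage, with the server started as above:
#   run_query "$(count_triples_query)"
```

Nothing is sent until run_query is called, so the snippet can be sourced safely into an interactive shell.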
RDF service
A dedicated RDF service served RDF representations based on UDD aggregations (additional notes at RDF).
The service is gone (possibly live only in 2015, according to an archive.org snapshot).
Archived pages:
Debtags service
A dedicated service accepted updates to DebTags, which were manually pushed to the Debian archive by the Debtags team (notes on semantizing Debtags data are at RDF).
The service is gone.
TODO
Data directly from package lists
The Perl tool distro-delta resolves data directly from package lists.
A benefit over UDD-based tools is support for custom systems, including derivatives and local collections of packages.
Currently, distro-delta supports only a superficial comparison of two sets, but it should be relatively easy to extend it to produce RDF for a dataset.
Currently, distro-delta covers only source packages, but it should be relatively easy to extend it to cover binary packages, including relationships between packages.
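To illustrate what resolving data directly from package lists can look like, here is a rough sketch in shell. It is not how distro-delta is implemented. It only assumes that a Sources index is a sequence of stanzas separated by blank lines, each with Package: and Version: fields. The function names list_sources and delta are hypothetical.

```shell
#!/bin/sh
# Use byte-wise collation so that sort and comm agree on ordering.
LC_ALL=C
export LC_ALL

# Print one sorted "package version" line per stanza of a Sources-style
# index read from standard input.
list_sources() {
    awk '
        /^Package:/ { pkg = $2 }
        /^Version:/ { ver = $2 }
        /^$/        { if (pkg != "") print pkg, ver; pkg = ""; ver = "" }
        END         { if (pkg != "") print pkg, ver }
    ' | sort
}

# Superficially compare two uncompressed Sources files: entries found only
# in the first file are printed unindented, entries found only in the
# second file are printed after a leading tab.
delta() {
    a=$(mktemp)
    b=$(mktemp)
    list_sources < "$1" > "$a"
    list_sources < "$2" > "$b"
    comm -3 "$a" "$b"
    rm -f "$a" "$b"
}

# Usage, with two locally available, uncompressed indices (file names are
# placeholders):
#   delta old.Sources new.Sources
```

Because the input is just a local file, the same approach works for derivatives and local package collections, which is the benefit noted above over UDD-based tools.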
