Debian Data team - package data

{i} This page contains notes for the Debian Data Team. For uses of the various data, please see or edit the other wiki pages that are not below Teams/Data.

Sources

Canonical:

Aggregations:

Prior work

PTS service

The Package Tracking System (PTS) generates RDF data (additional notes at RDF).

/!\ The PTS service is deprecated (seemingly frozen since 2024-05-22).

Metadata is provided for each package, e.g. at https://packages.qa.debian.org/a/apache2.ttl
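The per-package URL pattern above can be sketched as a small helper. This is only an illustration: the function name `pts_turtle_url` is hypothetical, and the rule that `lib*` packages use a four-character directory prefix is an assumption borrowed from the Debian pool layout, not something stated on this page. Only the `apache2` example is confirmed by the text above.

```python
def pts_turtle_url(source_package: str) -> str:
    """Return the per-package Turtle URL on the (deprecated) PTS host."""
    if not source_package:
        raise ValueError("empty package name")
    # Assumption: the directory prefix follows the Debian pool layout,
    # i.e. "lib*" packages use their first four characters and all
    # other packages use their first character.
    if source_package.startswith("lib") and len(source_package) > 3:
        prefix = source_package[:4]
    else:
        prefix = source_package[0]
    return f"https://packages.qa.debian.org/{prefix}/{source_package}.ttl"


print(pts_turtle_url("apache2"))
# prints https://packages.qa.debian.org/a/apache2.ttl
```

Since the service appears frozen, any data fetched from such a URL should be treated as a historical snapshot.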

All metadata is also available via SSH, e.g. like this:

The full-dump dataset is serialized as RDF/Turtle, ~30 MB compressed and 300 MB uncompressed, and can be bootstrapped and served as a SPARQL endpoint, e.g. like this:

The on-file database above requires ~1.2 GB on disk.

RDF service

A dedicated RDF service serves RDF representations based on UDD aggregations (additional notes at RDF).

/!\ The service is gone (possibly live only in 2015, according to an archive.org snapshot).

Archived pages:

Debtags service

A dedicated service accepts updates to Debtags, which are manually pushed to the Debian archive by the Debtags team. Notes on semantizing Debtags data are at RDF.

/!\ The service is gone.

TODO

Data directly from package lists

The Perl tool distro-delta resolves data directly from package lists.

A benefit over UDD-based tools is support for custom systems, including derivatives and local collections of packages.

Currently, distro-delta supports only a superficial comparison of two sets, but it should be relatively easy to extend it to produce RDF of a dataset.

Currently, distro-delta covers only source packages, but it should be relatively easy to extend it to cover binary packages, including relationships between packages.
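The general idea of comparing two sets of source packages straight from package lists, without going through UDD, can be sketched as follows. This is not distro-delta's implementation (which is written in Perl and is not shown here). The function names `parse_sources` and `compare`, and the sample stanzas with their version strings, are made up for illustration. The input is assumed to be the stanza format used by Sources index files, reduced to the `Package` and `Version` fields.

```python
def parse_sources(text: str) -> dict[str, str]:
    """Map source package name to version from Sources-style stanzas."""
    packages = {}
    # Stanzas are separated by blank lines.
    for stanza in text.strip().split("\n\n"):
        fields = {}
        for line in stanza.splitlines():
            # Skip continuation lines (leading whitespace) and malformed lines.
            if line[:1].isspace() or ":" not in line:
                continue
            # Split on the first colon only, so epochs like "1:2.3-1" survive.
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
        if "Package" in fields:
            packages[fields["Package"]] = fields.get("Version", "")
    return packages


def compare(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Superficial comparison: names only in one set, or with differing versions."""
    return {
        "removed": sorted(old.keys() - new.keys()),
        "added": sorted(new.keys() - old.keys()),
        "changed": sorted(p for p in old.keys() & new.keys() if old[p] != new[p]),
    }


# Hypothetical sample data: an upstream list and a derivative's list.
UPSTREAM = """Package: apache2
Version: 2.4.62-1

Package: hello
Version: 2.10-3
"""

DERIVATIVE = """Package: apache2
Version: 2.4.62-1+custom1

Package: local-tool
Version: 0.1-1
"""

print(compare(parse_sources(UPSTREAM), parse_sources(DERIVATIVE)))
# prints {'removed': ['hello'], 'added': ['local-tool'], 'changed': ['apache2']}
```

Because the input is just a package list, the same comparison works for derivatives and local package collections that UDD does not track.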