Project Proposal
Name: Ishan Jayawardena
Contact Details: email udeshike@gmail.com, ishan on irc.debian.org
Background:
I am pursuing a degree in Computer Science and Engineering at the University of Moratuwa, Sri Lanka. My interests include Linux, experimenting with C/C++ code bases, XML and SOA. I was enthusiastic about contributing to open source software even before my first Summer of Code project in 2010. The most important result that I expect from SoC is to be introduced to a mature, well-known project like Debian and to contribute actively to its development. Last year, I successfully completed a SoC project under The Apache Software Foundation in which I implemented the W3C candidate recommendation of Schema Component Designators (SCD) for the Xerces2-J XML processor. In addition, I have development experience with the Eclipse and Mozilla projects. With this background, I am determined to complete another successful project this time, under the Debian project.
I had my first Linux experience with Ubuntu a few years ago and have been using apt for various purposes ever since. I started to experiment with apt's code base only a couple of months ago; I found it enjoyable and its design unique, so I started to look into it further. Recently, I tried to fix an apt bug (#335925) and received some feedback from the community. With this project I expect to learn the concepts of apt, to become familiar with its code base and its other uses, to become a member of the development team, and to help improve its user experience and its support for other front ends.
Project title: Debdelta Integration
Synopsis: Integrating debdelta natively into the apt downloader/installer
Benefits to Debian:
- Improves user experience of apt and its front ends by speeding up upgrade processes (particularly useful for users with slow Internet connections)
- Provides a unified interface for handling debdelta features
- Provides a good solution for stable and for security upgrades
- Can be combined with future enhancements like parallel installations to make the total process even smoother
- All front ends and other related libraries of apt (such as synaptic, aptitude, packagekit and python-apt at the very least) could benefit, and they would be able to provide a consistent experience to their users
Deliverables:
- Suitable modifications/additions to the libapt and debdelta APIs (this might include C++ and Python source files and possibly patches for existing code)
- Documentation
- A set of test cases to verify the functionality
Project details:
Debdelta is a collection of applications that can compute changes (deltas) between Debian packages (debs). A typical delta can be viewed as a set of binary differences (similar to the output of the diff program) between two versions of a deb, and it can be used to store and transmit only the changes between those two debs. Currently, every time 'apt-get upgrade' is run, apt downloads complete debs, regardless of their sizes and of how little the newer versions differ from the installed ones. This results in excess data being downloaded and in longer download and installation times. If it is possible to download only the actual changes between two debs and to compute the newer version of the deb from those differences, the download time and the total upgrade time would be reduced considerably.
The normal practice to upgrade a system with debdelta is a three step process. First, the user must run 'apt-get update', then 'debdelta-upgrade' followed by 'apt-get upgrade'. In this process, after updating the packages list, the debdelta-upgrade program downloads deltas and creates bit-identical debs from them, in parallel. Then, 'apt-get upgrade' takes in these debs for the installation stage.
But, there are several drawbacks in this approach. Firstly, an intermediate step for 'debdelta-upgrade' is required before the actual upgrade (or dist-upgrade) step. Secondly, there is no means for apt's existing front ends to use debdelta features since it has not been integrated with any of them, and, therefore, the users of these front ends do not receive the true advantages that debdelta provides. In addition to that, there is always the requirement of maintaining a single unified program/library to manage all lower level operations so that higher level applications can be built/customized on top of apt conveniently.
Therefore, the goal of this project is to modify apt's downloader/installer so that it actually supports and uses debdelta in its operations. This is referred to as the native integration. After the integration, apt handles package upgrades in the following steps:
- debdelta creates a patch index file listing the available patches for debs. This file is similar to the package index that apt currently uses, but smaller, since it contains only the additional information that matters, such as the sizes and the hashes of the deltas. apt can then pass this information to its front ends, which can in turn use it to report download progress, apply local policies on download sizes, and verify the integrity of the downloaded deltas. Since debdelta does not currently create this delta index, it will have to be extended to do so.
- apt consumes the above index file and, based on it, creates the delta download queue and initiates the downloads. This is where apt decides how many deltas to download, because it will sometimes have to download complete debs when no delta is available for a package. In addition, apt creates an optimized download order for the deltas by considering parameters such as the sizes of the deltas and the debs, and the speeds of downloading and patching.
- In its commit step, apt reassembles the debs from the downloaded deltas and installs them. This also includes proper progress reporting to the user.
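The first two steps above can be illustrated with a minimal sketch. It is not the proposed implementation: the index format is still to be determined, so the stanza layout and the field names (Package, Old-Version, New-Version, Delta-Size, SHA256), the names parse_delta_index and plan_downloads, and the smallest-first ordering are all assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class DeltaEntry:
    """One stanza of the hypothetical delta index."""
    package: str
    old_version: str
    new_version: str
    delta_size: int  # bytes
    sha256: str


@dataclass
class Upgrade:
    """A pending upgrade as apt would know it (illustrative fields)."""
    package: str
    installed: str
    candidate: str
    deb_size: int  # bytes


@dataclass
class Download:
    package: str
    kind: str  # "delta" or "deb"
    size: int  # bytes


def parse_delta_index(text: str) -> Dict[Tuple[str, str, str], DeltaEntry]:
    """Parse blank-line separated 'Field: value' stanzas (assumed format)."""
    index = {}
    for stanza in text.strip().split("\n\n"):
        fields = dict(line.split(": ", 1) for line in stanza.splitlines() if line.strip())
        entry = DeltaEntry(
            package=fields["Package"],
            old_version=fields["Old-Version"],
            new_version=fields["New-Version"],
            delta_size=int(fields["Delta-Size"]),
            sha256=fields["SHA256"],
        )
        index[(entry.package, entry.old_version, entry.new_version)] = entry
    return index


def plan_downloads(upgrades: List[Upgrade],
                   index: Dict[Tuple[str, str, str], DeltaEntry],
                   download_bps: float,
                   patch_bps: float) -> List[Download]:
    """Choose delta or full deb per package, then order the queue.

    A delta is chosen only when downloading it and rebuilding the deb is
    estimated to be faster than downloading the full deb.
    """
    queue = []
    for up in upgrades:
        entry = index.get((up.package, up.installed, up.candidate))
        deb_time = up.deb_size / download_bps
        if entry is not None:
            delta_time = entry.delta_size / download_bps + up.deb_size / patch_bps
            if delta_time < deb_time:
                queue.append(Download(up.package, "delta", entry.delta_size))
                continue
        # No delta available, or the delta would not save time.
        queue.append(Download(up.package, "deb", up.deb_size))
    # Assumed heuristic: fetch small items first so patching can start early.
    queue.sort(key=lambda d: d.size)
    return queue
```

The SHA256 field is carried along so that a front end could verify each downloaded delta; the sketch itself does not perform that verification.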
Project schedule:
Community Bonding Period
April 26 - May 22
- Getting to know the mentor and the community
- Learning more about the deliverables including the APIs, features, test cases, documentation etc.
- Familiarizing myself with libapt, debdelta and the required programming skills
May 23 - July 10
- Identifying the important components of apt, such as the downloader and the installer, and learning their internals and how they can be used in the project
- Identifying the stages of the development process with the help of the mentor
- Determining the format of the index file
- Finalizing the design and starting to code
- Modifying debdelta to create the delta index (i.e. the first of the steps described in the Project details section above)
- Starting to implement the second step
July 11 - July 15
- Submitting mid-term evaluations and continuing the development
July 15 - August 15
- Completing the main development tasks, i.e. implementing apt's support for the second and third steps by modifying apt's downloader and installer
- Writing test cases and carrying out tests
- Preparing suitable documentation for the work
August 30
- Submitting final code to Google
Other summer plans: I have no specific plans for this summer and therefore, I am able to work on the project full time.
Exams and other commitments: I do not have any exams during the GSoC period.
My Plans for Debian After the Summer:
Initially, I was interested in an idea to speed up apt's installation process. I discussed this idea with both the dpkg and apt communities and gained some insight into it. Installing debs in self-contained batches while downloading in parallel is one suggestion that came up; if implemented, it would address issues such as installing on devices with limited bandwidth or disk space, while considerably decreasing the total installation time. There have been many discussions about this (such as [3], and the merged bugs #30505, #40438, #53152, #135637, #165558, and #185201). The integration of debdelta can solve only part of this problem: it reduces the size of downloads, which is ideal for users with limited-bandwidth connections, and it also contributes somewhat to reducing the installation time. The part of the problem that debdelta cannot solve is running installations in parallel with downloads. This requires correctly identifying and grouping self-contained batches of debs (or minimal subgroups) and feeding each batch to dpkg for installation as soon as it has been downloaded. Although it seems simple, this requires a careful design that always leaves the system in a consistent state despite download or installation failures, among other things. Therefore, I plan to continue contributing to apt with these two ideas, combining the integration of debdelta with the parallel installation of self-contained batches, in the hope that it will ultimately produce useful results for a variety of users.
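To make the grouping idea concrete, here is a rough sketch of one coarse way to find self-contained batches. It is only an illustration under stated assumptions, not how apt or dpkg work today: the function name self_contained_batches and the input mapping are hypothetical, dependencies outside the upgrade set are assumed to be already installed, and the batches it returns are connected groups of the dependency graph, which are self-contained but not necessarily minimal.

```python
from typing import Dict, List, Set


def self_contained_batches(depends: Dict[str, Set[str]]) -> List[List[str]]:
    """Group packages so that no batch depends on a package in another batch.

    `depends` maps each package in the upgrade set to the names it depends on.
    Dependencies that are not keys of `depends` are assumed to be installed
    already and are ignored.
    """
    # Build an undirected graph restricted to the packages being upgraded.
    neighbours: Dict[str, Set[str]] = {pkg: set() for pkg in depends}
    for pkg, deps in depends.items():
        for dep in deps:
            if dep in neighbours:
                neighbours[pkg].add(dep)
                neighbours[dep].add(pkg)

    batches: List[List[str]] = []
    seen: Set[str] = set()
    for start in sorted(neighbours):
        if start in seen:
            continue
        # Depth-first walk collecting everything connected to `start`.
        stack = [start]
        seen.add(start)
        batch = []
        while stack:
            pkg = stack.pop()
            batch.append(pkg)
            for other in neighbours[pkg]:
                if other not in seen:
                    seen.add(other)
                    stack.append(other)
        batches.append(sorted(batch))
    return batches
```

Each returned batch could in principle be handed to dpkg once all of its debs are downloaded; handling download or installation failures, which the paragraph above identifies as the hard part, is outside this sketch.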
References and Resources:
[1] Debdelta support in libapt: https://wiki.ubuntu.com/DebdeltaAptIntegration
[2] Xerces2 Java Parser Readme: http://xerces.apache.org/xerces2-j/
[3] Bug#32919: apt: wish: when not enough disk space, incremental install : speedup!: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=32919
[4] Bug#498778: debdelta: better integration with apt, aptitude, etc: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=498778
[5] The idea page of 'Debdelta Integration': http://wiki.debian.org/SummerOfCode2011/AptDebdeltaIntegration
[6] http://lists.debian.org/deity/2009/06/msg00038.html
[7] debdelta FAQ: http://debdelta.debian.net/FAQ.txt
[8] debdelta readme: http://debdelta.debian.net/README.txt
[9] Ubuntu debdelta man page: http://manpages.ubuntu.com/manpages/lucid/man1/debdelta.1.html
[10] AptSyncInKarmicSpec: https://wiki.ubuntu.com/AptSyncInKarmicSpec
- Discussions on apt, dpkg mailing lists, #debian-apt, #debian-dpkg and #debian-soc IRC channels on irc.debian.org.
