
Mathematical backing for Open Source software.

nickleberry | 30.09.2004 16:23 | Technology | Cambridge

The open source community may now have more than ideology on their side, with researchers showing mathematically that their "release early, release often" software development model is the quickest way to bug-free code.

What follows is a re-posted article (from a website which is part of the Millennium Maths Project) describing this research. The original article is at:
 http://plus.maths.org/issue25/news/programs/

Software is usually created in one of two ways. Eric Raymond coined names for the two approaches: the CATHEDRAL approach, where a team of programmers work as an isolated group creating the software, only releasing it to their users every so often after substantial changes; and the BAZAAR approach, where new versions of the software are released as often as possible, perhaps as a result of work done by a community of user-programmers. Most proprietary software, such as Microsoft Windows, falls into the cathedral category, while the Linux operating system and Apache, the most popular web server, are examples of the bazaar approach.

In a paper currently being refereed, Damien Challet and Yann Le Du from the University of Oxford characterised open source software as that developed in the bazaar style, and closed source as that created with the cathedral approach. They developed a mathematical model of how bugs, or errors in the software, are detected and resolved for the two types of software development.

The model assumes that each software program consists of a number of independent parts, and that users will spot bugs present in a part of the program with a certain probability. Once a bug is reported, the chance that some programmer will be able to fix it is given by another probability. The model of open source software assumes that at each step, the users are using a new version of the software containing all the changes made by the programmers in the previous step. In contrast, for the closed source model, new versions of the software are released to users less frequently.
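
The description above can be turned into a small simulation. What follows is only a toy sketch, not the researchers' actual model: the function names, the parameter values and, in particular, the rule that a failed fix goes unnoticed until users receive a release containing it are all assumptions made here for illustration.

```python
import random


def steps_to_bug_free(n_parts, p_find, p_fix, release_every, rng):
    """Count time steps until every part of a toy program is bug-free.

    release_every=1 mimics the bazaar (a release at every step);
    a larger value mimics the cathedral (infrequent releases).
    All rules here are illustrative assumptions, not the published model.
    """
    buggy = [True] * n_parts      # programmers' working copy
    released = list(buggy)        # version the users are running
    seen = [False] * n_parts      # already reported against this release
    pending = [False] * n_parts   # reports waiting for a programmer
    step = 0
    while any(buggy):
        step += 1
        # A release hands users the current working copy.
        if (step - 1) % release_every == 0:
            released = list(buggy)
            seen = [False] * n_parts
        # Users spot each unreported bug in the release with probability p_find.
        for i in range(n_parts):
            if released[i] and not seen[i] and rng.random() < p_find:
                seen[i] = True
                pending[i] = True
        # Programmers try each reported part once; the attempt
        # succeeds with probability p_fix (assumed rule).
        for i in range(n_parts):
            if pending[i]:
                pending[i] = False
                if rng.random() < p_fix:
                    buggy[i] = False
    return step


def mean_steps(release_every, runs=200, seed=0):
    """Average time to bug-free over many runs (50 parts, both probabilities 0.5)."""
    rng = random.Random(seed)
    total = sum(steps_to_bug_free(50, 0.5, 0.5, release_every, rng)
                for _ in range(runs))
    return total / runs


if __name__ == "__main__":
    print("release every step:    ", mean_steps(1))
    print("release every 10 steps:", mean_steps(10))
```

With these assumed rules, a fix that fails stays hidden until the next release, so the run with infrequent releases should take noticeably more steps on average than the run that releases at every step.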

The study shows that under the same conditions - the same number of users with the same chance of spotting bugs, the same number of equally able programmers, and the same initial level of bugs - open source software will always become bug-free more quickly than closed source software. For closed source software to become completely reliable in the same amount of time, its programmers must be more able, or more numerous, than those working on open source software.

The researchers explain that this is a result of the release cycles for closed source software. When a new version is released, after an initial rush of bug finds by users, the programmers are left to find and correct errors with little feedback until the next release. In contrast, the frequent releases of open source software allow for continuous testing by users, and feedback to programmers.

The researchers were also able to use their model to investigate the dynamics of real life projects, such as the development of the open source project Linux, by using records of the number of users and programmers, and the size of the program measured in lines of code. Since its birth in 1991, the growth of the Linux code has been approximately quadratic - much faster than the linear growth of other software projects such as Mozilla - and yet it has a reputation for being very reliable. According to the researchers, "this leaves the question of how Linux could grow at such a pace without compromising its quality".

The answer appears to lie in the quality of Linux programmers. The researchers had to put a lower bound on their ability - the probability that they would successfully fix a bug - to make the model emulate the development of Linux. The researchers concluded that "rapid software growth can indeed lead to high quality software, even in adverse conditions, provided that the programmers' quality is high enough."

The researchers are aware that there are limitations to their current model, including the assumption that a program is made up of independent parts. The next step is to apply the model to a scale free network, which Ricard Solé from the Complex Systems Lab, Barcelona, and his colleagues from the Santa Fe Institute, New Mexico, have shown to be a more accurate description of the structure of software.

A scale free network is a network of points, called nodes, joined together in such a way that most of the nodes are linked directly to only a few others, while an important few nodes are linked to many others. These highly linked nodes act like hubs, and a consequence is that the average distance between any two nodes in the network (the number of links you have to follow to get from one to the other) is small. Many real-world networks have recently been shown to be scale free, including the World Wide Web, with sites like Yahoo acting as hubs, and the internet, the physical network of routers and wires behind the WWW. Even some metabolic networks and food webs appear to be scale free.
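
To see how hubs can emerge, here is a minimal sketch that grows a network by "preferential attachment", where each new node links to an existing node chosen in proportion to how many links that node already has. This is one standard way of generating a network of this kind, chosen here for illustration; the article does not say how the researchers plan to build theirs, and the function names are hypothetical.

```python
import random
from collections import Counter


def preferential_attachment(n_nodes, rng):
    """Grow a network one node at a time (illustrative sketch).

    Each new node makes a single link to an existing node picked with
    probability proportional to that node's current number of links.
    """
    edges = [(0, 1)]      # start from two linked nodes
    # Every node appears in this list once per link it has, so a
    # uniform pick from the list is a degree-proportional pick.
    endpoints = [0, 1]
    for new in range(2, n_nodes):
        target = rng.choice(endpoints)
        edges.append((new, target))
        endpoints.extend([new, target])
    return edges


def degrees(edges):
    """Return a Counter mapping each node to its number of links."""
    deg = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg


if __name__ == "__main__":
    deg = degrees(preferential_attachment(2000, random.Random(1)))
    ranked = sorted(deg.values())
    print("typical (median) links per node:", ranked[len(ranked) // 2])
    print("links on the biggest hub:", ranked[-1])
```

In a network grown this way, most nodes end up with a single link while a handful collect many, which is the hub pattern described above.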

So as research continues, many in the programming community will be waiting to see if it provides more ammunition for the open source debate. Meanwhile, software users will be hoping that it will lead to more reliable software, as we head towards an ever more computer-dependent future.

nickleberry