The tragic death of open source research software (v2)


This is an updated version of the tragic death post, prepared for the EuroBioc2026 conference.

Setting the stage

We can certainly all agree that research software has become a central player in scientific research. Concomitantly, it has also become a single point of failure, partly due to the way it is valued as a research deliverable and how it is funded. More about this later.

Use case

Our use case is as follows:

Imagine that 6 months ago, you, a brilliant and motivated early career researcher in biomedical sciences, defined the ideal experiment to answer an important biological question in your domain. After several months of hard work and thousands of euros of consumables, you have acquired the precious multi-omics data.

You have even identified a research paper that tackles a similar question using exactly the same technologies and type of data. That paper describes a data analysis method and provides a piece of software, both ideally suited to answer your question with your data.

    Experimental design + data + software = results

You have generated good quality data and found the right software.

Your results are at arm’s length, aren’t they?

What could go wrong?

Software collapse

What could go wrong?

  • Here’s what could go wrong: software rot, software collapse, software death!
  • Software collapse is the fact that software will stop working at some point if it is not actively maintained.

The software doesn’t work

  • Software collapse (or software rot) is the fact that software will stop working at some point if it is not actively maintained. Collapse can be the result of bugs, accidental changes or deliberate breaking changes (i.e. changes that don’t guarantee backward compatibility) in the software itself, changes in software (and service) dependencies, …

  • Or simply the disappearance of the software (or, more generally, of the page where it was available), or the lack of response to requests when it was originally only available on request.

  • Or maybe the “software” was never meant to last beyond that one use case/paper. In such cases, it should have been clearly labelled as a prototype, not as a tool/software that others can reuse.

Or the software ‘works’ but

  • There is no example data, and it’s not clear what the input should look like.
  • There is no documentation - the software works (with the example/test data or with yours), but the commands and/or output don’t make any sense.
  • Even though the software (correctly?) runs, the lack of documentation or its inadequacy makes it too difficult to use.

Really?

Now you might wonder whether this is true, or whether I’m hallucinating all this. Researchers working with software will most likely agree with me based on their experience. I don’t know of any research that has tried to quantify the collapse of research software, but I recently came across Weekend at Bernie’s, which shows that a considerable fraction of the most-depended-on open source packages are indeed dead, and that there are different ways for software to end up that way. Andrew Nesbitt documents the following:

  • The maintainer left: the person has disappeared; the company or the team that released the software has disappeared; the software was built as part of a thesis that is now over (very relevant for academic research software); the funding stopped (also very relevant for academic research software); the maintainer has been hired away (or the researcher changed lab/topic); …
  • The maintainer is still here, but they are too busy or overwhelmed with other things (burnout plateau); the activity is purely bot-driven and every commit is a bot (benevolent zombie - is this becoming even more relevant with AI agents?); there is a custody battle between multiple maintainers, or toxic gate-keeping by the one in charge; …
  • Sabotage and capture: maintenance is taken over by a hostile maintainer, or the legitimate maintainer deliberately breaks their own package.
  • The release pipeline broke: occasional fixes are not shipped/released upstream (think of local changes that are never pushed to CRAN or Bioconductor); the main branch may even have drifted so far that releasing/merging it would break the software; or the repo was deleted, made private or moved, or the hosting service it was on shut down; …
  • The world has moved on: the software depends on software that has been updated, but the maintainer hasn’t followed the changes; the software’s dependencies (or remote resources) have themselves died or disappeared; or the software has simply been superseded and the maintainer moved on, but many researchers keep relying on it; …
  • The project split, got forked or got re-licensed (closed-sourced - this does happen for research software when maintainers want to create a spin-off based on their successful software); …

Andrew Nesbitt counted projects that were active (regular non-bot commits to the default branch), dormant (little or no development), dead (the repo is archived or there is no visible maintenance) or of unknown status (responsiveness hasn’t been tested) across different package managers (npm, rubygems, cargo, pypi, … but not R, unfortunately for us). And the numbers aren’t negligible, with dead projects ranging from 5 to 20%, and many more dormant ones.

Why would it be any better with research software?

What can we do?

There exist many steps that one can take to minimise the risks described above and make software survive longer. These steps are either technical, to write better and more maintainable software, or non-technical, to grow and foster a community and support around the software and its developers. There is no silver bullet, and different situations and constraints will define what is possible. But if there’s one thing to take away, it is not to stay alone in the development and maintenance of a piece of software, especially for junior researchers/developers.

We can think along three directions:

  1. Do we even need new research software?
  2. How can we facilitate long term maintenance to avoid software collapse?
  3. How can we properly retire our software rather than let it rot, collapse and die?

Do we need new software?

The first question is whether we need new research software. In many cases yes, of course. But in many other cases the answer could be no, because the ‘new’ package/software essentially re-invents the wheel.

Investment

So the first question that one should ask before embarking on a new software development is whether something similar already exists, in some form or another. I think there’s a case for publishing software review articles that compare some of the existing solutions on the market, without necessarily benchmarking them.

If such software already exists, would it be possible to contribute to that existing open source software and join its community of users and developers?

If none of the options work out, consider what investment a new software entails: the short term development, the slow feature creep and the long term maintenance burden/effort.

Open source licensing

Assuming you embark on the new adventure, consider some administrative aspects of the project, which include any legal constraints or limitations, intellectual property, authorship and copyright, funding obligations, licensing, academia vs industry, policies and regulations, …

Needless to say, if possible, make your software widely available under an open source license: this increases usage, contributions and visibility (see below). Choose an open source license to publish your software, and archive it (on Zenodo or Software Heritage, for example).

How to facilitate long term maintenance?

Good software development is paramount to minimise software collapse. But everybody starts at some point, and sharing code is a good way to move towards the next steps. Even if one feels that the code isn’t ready for prime time because of a lack of ‘formal’ training (many lack it and still become respected developers and contributors), it is much better in the short and long run to share code.

Here are some tips:

  • Design for modularity, for instance to limit the impact of software collapse: it is much easier to maintain and extend small independent components than a large monolithic code base.
  • Do learn and follow best practices when it comes to research software development. These include automation of manual tasks, unit and integration testing, version control, continuous integration, … Finding a well-meaning community will help with this.
  • Avoid reinventing the wheel, and try to re-use existing and robust infrastructure when possible/available - stand on the shoulders of giants. But beware of fragile dependencies, even though spotting them is difficult without experience.
  • Document your code and your software. Forget the silly myth that real developers focus on writing code and not documenting it - that certainly holds for bad developers. Writing documentation forces one to take the position of a user, which is very often enlightening regarding the usability of what is produced. There are many types of documentation: manuals, tutorials, example data, installation, user and developer guides, slides, videos, web pages, … There’s no need to have all of them - focus on a few high quality ones.
  • Focus on traceability and reproducibility when analysing data and developing software to do so. Without traceability and reproducibility, there’s no science, only anecdotal evidence, at best.
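To illustrate the testing advice above: a unit test checks one small, well-defined behaviour of a single function, and running such tests automatically (for instance in continuous integration) helps detect breakage early, including breakage caused by changes in dependencies. The sketch below uses only base R's stopifnot() so that it runs without extra packages; R and Bioconductor packages more commonly rely on a testing framework such as testthat, RUnit or tinytest. The function normaliseCounts() is hypothetical and only serves as an example.

```r
## Hypothetical helper (not from any package): scale each column
## of a non-negative numeric matrix so that it sums to 1.
normaliseCounts <- function(m) {
    stopifnot(is.matrix(m), is.numeric(m), all(m >= 0))
    sweep(m, 2, colSums(m), "/")
}

## Unit tests: each line checks one small, expected behaviour.
m <- matrix(c(1, 3, 2, 2), nrow = 2)
res <- normaliseCounts(m)
stopifnot(isTRUE(all.equal(colSums(res), c(1, 1))))  # columns sum to 1
stopifnot(identical(dim(res), dim(m)))               # shape is preserved
## Invalid input (negative values) must raise an error.
stopifnot(inherits(tryCatch(normaliseCounts(-m), error = function(e) e),
                   "error"))
```

Keeping each check small makes it obvious which behaviour broke when a test fails.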

Plan for retirement rather than collapse?

Software life cycle

This is a point of particular interest to more senior developers and PIs. There’s pressure to produce new features and software, but planning beyond that is important.

  • Think of your software’s life cycle: maintenance, new features (if necessary) and feature creep, onboarding new developers, …
  • Also consider removing features/functions (see below).
  • Plan for sunsetting your software. Consider ending, pausing, or handing off.
  • Also consider disaster planning, for when funding suddenly gets cut off: make a threat model considering social, financial and technical vulnerabilities.

Deprecate and Defunct

See also Chapter 28 Deprecation Guidelines of the Bioconductor developer guidelines. There are two important functions in R, namely .Defunct() and .Deprecated().

Deprecating a function signals, by throwing a warning, that it will be removed from the package in the future, and tells the user which alternative function to use instead.

> data(msnset)
> msnset <- filterZero(msnset)
Warning message:
In filterZero(msnset) : 'filterZero' is deprecated.
Use 'filterNA (after replace 0s by NAs)' instead.
See help("Deprecated")
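On the developer side, deprecating a function can be as simple as calling .Deprecated() at the top of the old function and then forwarding the call to its replacement, so that existing scripts keep working while users are warned. The sketch below is a minimal illustration, not code taken from MSnbase; oldMean() and newMean() are hypothetical names.

```r
## Hypothetical replacement function (the new, recommended API).
newMean <- function(x, na.rm = FALSE) {
    mean(x, na.rm = na.rm)
}

## Hypothetical old function, kept for backward compatibility.
## .Deprecated() signals a warning naming the old function and
## suggesting 'newMean'; the call is then forwarded so that
## existing code still gets the same result.
oldMean <- function(x, na.rm = FALSE) {
    .Deprecated("newMean")
    newMean(x, na.rm = na.rm)
}

## Usage: returns 1.5, together with a "deprecated" warning.
oldMean(c(1, 2, NA), na.rm = TRUE)
```

If the package argument of .Deprecated() is supplied, the message additionally points to help("<package>-deprecated"), a help page the package itself is expected to provide.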

When a function has been removed from a package, it is defunct: calling it throws an error that suggests an alternative.

> library("MSnbase")
> readMSData2("file.mzmL")
Error in readMSData2("file.mzmL") : 'readMSData2' is defunct.
Use 'readMSData' instead.
See help("Defunct")
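A defunct function can be kept as a stub whose body only calls .Defunct(). A minimal sketch, with hypothetical oldMean() and newMean() functions rather than actual MSnbase code:

```r
## Hypothetical replacement function (the recommended API).
newMean <- function(x, na.rm = FALSE) {
    mean(x, na.rm = na.rm)
}

## Hypothetical old function, now defunct. Its body only calls
## .Defunct(), which throws an error naming 'oldMean' and
## suggesting 'newMean'. Keeping this stub, rather than deleting
## the function outright, gives users an informative error
## instead of a bare "could not find function".
oldMean <- function(x, na.rm = FALSE) {
    .Defunct("newMean")
}
```

Only once the defunct stage is over would the stub itself be deleted from the package.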

Shutting down

Bioconductor also has an end-of-life policy, documented in Chapter 29 Package End of Life Policy of the developer guide. Packages are deprecated if they have errors that aren’t fixed and/or if the maintainer is unresponsive. Packages can also be removed at the developer’s request. Bioconductor keeps an up-to-date list of removed packages.

It is possible to officially hand over a package. The Help wanted page lists packages in various stages that may need assistance to stay in Bioconductor.

As mentioned above, it is essential to archive software to maintain a trace of the packages.

Making software live longer

The best way to extend the life of our software is to improve its quality. This can be achieved by nurturing a welcoming and supportive community, and by providing training to help developers with their goals. Some memorable examples are courses offered by the Carpentries, Galaxy training, the ELIXIR training platform, and Learning and teaching in the Bioconductor community.

Incentives, recognition and career paths aren’t always aligned with software development and maintenance. The incentives for software, and the funding that supports it, generally target novelty rather than maintenance and support. And many of the suggestions above are difficult to anticipate or implement for junior developers/researchers, and/or when working in isolation.

Conclusions

When developing software, we should account for software sustainability, which is as much a technical as a community issue. Even if, as researchers, we face incentives that play against long term sustainability, becoming aware of software collapse and of the means to limit it is a first step in the right direction.

References