What A Concept

If you’re going to do good science, release the computer code:

…if you are publishing research articles that use computer programs, if you want to claim that you are engaging in science, the programs are in your possession and you will not release them then I would not regard you as a scientist; I would also regard any papers based on the software as null and void.

So would, and do, I. A large part of the gullibility of the general public and the media on this subject is that they don’t understand how computers and programming work.

I also find it ironic that econometrics is much more rigorous about this, in terms of the requirement to present code for publication, than climate “science.”

[Update a few minutes later]

There’s a discussion at Slashdot about this. FWIW.

30 thoughts on “What A Concept”

  1. Wouldn’t it suffice to release the algorithm? The algorithm is what should be tested, ideally by writing new code. Really, if you want to avoid misconceptions because of particular bugs or particular idiosyncratic coding choices (i.e., “really poorly written code”), wouldn’t it be better not to release the code, and instead just publish the algorithm to see if others can replicate the results using new code which implements the algorithm in question?

  2. see if others can replicate the results using new code which implements the algorithm in question?

    …and if they get a different result, compare them to Holocaust deniers.

  3. I’m puzzled by some of the arguments in favor of AGW.

    Finally, the codes are only one line of argument indicating that the current increase of CO2 is likely the cause of global warming. CO2 is a greenhouse gas, period. Doubling its concentration by itself would increase the Earth’s temperature only by about 1.1 to 1.4 °C. What matters is the sensitivity of the climate. Study of the last Ice Ages (which does not require models) indicates that the sensitivity is around 3 or more. This is the factor by which one multiplies the warming due to CO2 alone to get the total warming. You can check those numbers in Hansen et al. (2008), and you don’t need any code to understand the paper. It is pretty technical, on the other hand, and I am still struggling with some aspects of it. Things are explained using simpler concepts at the skeptics website. Enjoy, and don’t be alarmed by the current hysteria in the Guardian.

    The problem with that argument is that one would expect higher sensitivity during the Ice Age for two reasons. First, the ice sheets covered a significant portion of the Northern Hemisphere, so albedo would change significantly as the Earth warmed up. This effect is far less pronounced now, since Antarctica and Greenland are the only two significant remaining ice sheets. Second, methane clathrates would have been a more significant feedback generator than now: sea level was much lower, meaning both that there was less pressure on methane clathrate deposits than today and that there was less liquid water to act as a heat sink, so heating of clathrate deposits would occur faster (and faster heating releases higher concentrations of methane, producing a more pronounced heating effect).

    In other words, it makes no sense to me to discuss sensitivity of climate to CO2 warming when comparing two climates with very different behavior. It is apples and oranges. But I imagine this is part of the desire to show AGW in as bad a light as possible.
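
    To make the arithmetic in that quoted argument concrete, here is a minimal back-of-the-envelope sketch; the 1.1 to 1.4 °C and factor-of-3 figures come from the comment above, not from any model output:

    ```python
    # Back-of-the-envelope sketch: total warming = sensitivity factor * CO2-only warming.
    # The figures are the ones quoted in the comment above, not model output.
    co2_only_warming_c = (1.1, 1.4)  # direct effect of doubling CO2, in degrees C
    sensitivity_factor = 3.0         # multiplier inferred from Ice Age studies, per the comment

    for base in co2_only_warming_c:
        print(f"CO2-only warming {base:.1f} C -> total warming {sensitivity_factor * base:.1f} C")
    # prints 3.3 C and 4.2 C: the contested feedback factor, not the raw CO2
    # effect, dominates the projected total
    ```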

    Bob-1, you wrote:

    Wouldn’t it suffice to release the algorithm?

    No. This is not hard to understand. The algorithm is not sufficient to reproduce their exact results (which is a reasonable first step to verify their research). Only running the same code on the same input to generate exactly the same output would do; a sketch of that check follows below.
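
    A minimal sketch of that exact-output check, assuming made-up stand-in data rather than any real published file:

    ```python
    import hashlib

    def digest(data: bytes) -> str:
        # hash the raw bytes so even a one-digit difference is caught
        return hashlib.sha256(data).hexdigest()

    published = b"1850,-0.37\n1851,-0.22\n"  # stand-in for the published output file
    rerun = b"1850,-0.37\n1851,-0.22\n"      # stand-in for a rerun of the released code on the released input
    assert digest(rerun) == digest(published), "not a bit-for-bit reproduction"
    print("bit-for-bit match:", digest(rerun)[:12], "...")
    ```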

  4. No, I don’t think it’s sufficient to just release the algorithm, although doing so is a good idea. Reimplementing the algorithm can be very expensive. I don’t have the current numbers, but a number of years ago, I came across the statistic that the average professional programmer produced about 5,000 lines of debugged, documented code per year. Using that number (which may be way off for current programmers), to reproduce a one-million-LOC program you’d need roughly 200 programmer-years of effort, costing millions of dollars. Who is going to pay for that?

    Software errors are often very subtle and hard to detect. A rigorous code review and testing process can detect a lot of them, such as the degradation of significant digits (round-off error) mentioned in the article; see the sketch after the quotes below. It’s essentially impossible to detect all errors in any significant application (Dijkstra*), but modern software testing practice can detect many of them.

    *”We should not introduce errors through sloppiness but systematically keep them out.”

    “Program testing can convincingly show the presence of bugs but it is hopelessly inadequate to show their absence.”
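
    A minimal sketch (in Python, purely illustrative, not anyone’s climate code) of the round-off degradation mentioned above:

    ```python
    import math

    # Ten copies of 0.1 do not sum to exactly 1.0 under naive left-to-right
    # addition, because 0.1 has no exact binary representation.
    values = [0.1] * 10
    naive = sum(values)              # accumulates representation error
    compensated = math.fsum(values)  # compensated summation tracks the lost digits

    print(naive == 1.0)        # False: naive sum is 0.9999999999999999
    print(compensated == 1.0)  # True: fsum returns the correctly rounded result
    ```

    Ten additions already lose a digit; a model looping over millions of grid cells and time steps gives round-off far more room to accumulate.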

  5. The algorithm is not sufficient to reproduce their exact results (which is a reasonable first step to verify their research).

    Karl, I don’t follow you — the algorithm should be sufficient unless the code doesn’t implement the published algorithm.

    Larry, it is true that in the worst case it would be horribly expensive, but is the typical case often like the worst case? I would (naively) expect that these climate studies use software modules and libraries that everyone agrees on, and that the amount of truly novel code that must either be analyzed or, as per my suggestion, rewritten would be relatively small. Maybe that’s wrong — I haven’t followed the issue closely enough to know how complex these climate models are, and how much of the code is truly novel.

  6. Wouldn’t it suffice to release the algorithm? The algorithm is what should be tested, ideally

    No.
    For the same reason I don’t present only the algorithm when we conduct peer reviews of my code: what I MEANT to do is unimportant; what is important is what I DID do. And only with the code can you check for bugs; writing an entire program on your own and seeing if you get the same results is a different exercise.

    Though that is also something that needs to be done, and is how you conduct the same “experiment” to reproduce results. So you are on the right track, but it is not enough.

    By presenting the code, you can compile and run it and check for errors within days. If you present only the algorithm, it can take weeks, months, or even years of coding. And then you don’t know whether the errors you found are due to bugs in your own code. Then it’s an interminable war of he said, she said.

    With the code, you know exactly what they did.

    These are two different purposes: one is finding flaws in what they did, the other is reproducing what they did using only the algorithms. The sketch below shows how two faithful-looking implementations of the same algorithm can still disagree.
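
    A hypothetical illustration of that last point: the two routines below implement the same textbook algorithm (population variance), yet they disagree once floating point gets involved, so the algorithm alone does not pin down the output.

    ```python
    def variance_two_pass(xs):
        # Var(x) = mean of squared deviations from the mean
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    def variance_one_pass(xs):
        # Algebraically identical: Var(x) = mean(x^2) - mean(x)^2,
        # but this form subtracts two huge, nearly equal numbers.
        n = len(xs)
        return sum(x * x for x in xs) / n - (sum(xs) / n) ** 2

    data = [1e8 + x for x in (4.0, 7.0, 13.0, 16.0)]  # large offset, small spread
    print(variance_two_pass(data))  # 22.5, the exact answer
    print(variance_one_pass(data))  # about 22.0 on a typical IEEE-754 machine; cancellation ate the low digits
    ```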

  7. the algorithm should be sufficient unless the code doesn’t implement the published algorithm

    sorry for the double post, but yes, the problem is that the code NEVER implements the algorithm. This is basically a fact you can count on. It’s the very definition of a “bug,” and it also creeps into design.

    Once you actually work in computer programming for a while, you realize it never happens as planned. And no matter how strict your environment is, there will be little changes developers introduced and forgot to carry back into the design. That is why, on the entire planet, there were only a handful of SEI Level 5 development houses. If you had to look that up, then trust us: it is NEVER the same as your design. 🙂

  8. I recall the debugged number as 17 lines of code a day, which matches pretty well with 5,000 lines a year.

    Availability of data and methodology are what separate science from anecdote.

    I think the Guardian writer did a pretty good job – but to anyone who has been following Steve McIntyre over at Climate Audit this is old news. His post on Due Diligence and Disclosure, ca. 2005, was what opened my eyes to the fraud going on.

  9. The same journals would never accept papers from other labs that didn’t make their results completely repeatable. If I were to go to Nature or Science or anywhere else and refuse to share my DNA or my protein, they would reject my research.

    That’s reason #87 I stopped getting Nature.

  10. Pluto, for the reasons you state, the researcher’s claim should be the algorithm, not the code. Challenge the claim.

  11. Still, point taken. It is better to use both approaches when possible, and sometimes my suggestion won’t be practical.

  12. I don’t know. I’m a software developer, and I’m thought to be pretty good, and Bob-1 has a point. Of course you’re going to get the same answer if you compile and run the same code with the same data; you’re not testing the compiler. You want to inspect the algorithm (why are they adding data to the raw data to account for a greater heat-island effect, which raises the temperatures?), but you also want to see if the algorithm is implemented properly, and you can do that by 1. inspecting the code and hoping against hope you’re not making the same mistake; or 2. approving the algorithm, implementing it yourself, and seeing whether you get the same trends. After all, it’s not the exact numbers you want to verify; it’s the trends.

    After all, when you are attempting to peer review a physics experiment, you don’t get the actual equipment from the study. You attempt to re-create the experiment based on the description in the study. If you use the same exact equipment, you may be incorporating assumptions unnoticed and unstated in the study.

    The problem in the Climategate thing, after all, wasn’t that the code didn’t compile anymore; as I understand it, they threw out the code and couldn’t reproduce the numbers or trends because they hadn’t documented the algorithms. They had different algorithms for each subset of the data because of local adjustments to make them all comparable, and they hadn’t documented all of those adjustments; is that right?

    What bothers me about what you said, plutosdad, is that the algorithm is the hypothesis of the experiment. If you didn’t implement the algorithm, it’s irrelevant whether it worked perfectly. You haven’t proven your hypothesis, because the algorithm you proposed didn’t predict results that match the actual measurements.

    I don’t think the purpose of releasing the code of computer models for peer review is QA; we should have an expectation that people write code that works, and anything less isn’t peer review; it’s mockery. The point is to reproduce the experiment’s effectiveness, which means it should survive multiple (working) implementations of the same algorithm. A sketch of that kind of cross-check follows below.
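
    A hypothetical sketch of that kind of cross-check: fit the same made-up temperature series with two independent least-squares implementations and compare the trends, not the digits. The data, the pairwise identity, and the tolerance are all invented for illustration.

    ```python
    import math

    years = list(range(1980, 2010))
    # made-up series: a 0.02 C/year trend plus a deterministic wiggle standing in for noise
    temps = [14.0 + 0.02 * (y - 1980) + 0.1 * math.sin(y) for y in years]

    def slope_normal_equations(xs, ys):
        # textbook least squares: cov(x, y) / var(x)
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        var = sum((x - mx) ** 2 for x in xs)
        return cov / var

    def slope_pairwise(xs, ys):
        # independent reimplementation via an algebraically equivalent pairwise form
        num = sum((ys[i] - ys[j]) * (xs[i] - xs[j])
                  for i in range(len(xs)) for j in range(i))
        den = sum((xs[i] - xs[j]) ** 2
                  for i in range(len(xs)) for j in range(i))
        return num / den

    s1 = slope_normal_equations(years, temps)
    s2 = slope_pairwise(years, temps)
    assert math.isclose(s1, s2, rel_tol=1e-9)  # the trends agree even if low digits differ
    print(f"independent implementations agree: {s1:.4f} vs {s2:.4f} C/year")
    ```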

  13. J.C.

    The folks at CA tried to replicate, using R and other languages instead of the original Fortran (66?), but there were a lot of “steps” left out of the code descriptions by Jones (and others).

    You’d need a fairly detailed pseudo-code as well as a good algorithm to “replicate” results.

    But I agree, in theory the results should be robust to slightly different implementations.

  14. and they hadn’t documented all of those adjustments; is that right?

    It is never documented. That is the problem with ALL computer code. AFAIK, only a few groups at NASA and Motorola India were ever SEI Level 5 compliant. That means (well, as far as anyone who has tried to get certified can tell) no one else on the planet can claim their code implements their design.

    As you point out, there is a difference between 1. making sure they implement the algorithm correctly, and 2. reproducing the results.

    2 is accomplished by creating your own program, as Bob-1 suggests. 1 is usually done by reviewing their practices, but in the case of computer code it is not sufficient to say “we did this,” because we know from experience that not a single software shop in the world can implement exactly what they say. Which is why you need the code.

    You peer review code, not your intentions. Maybe we’re getting into the difference between peer review (checking for errors) and reproducing results (which happens after publication).

  15. After all, it’s not the exact numbers you want to verify; it’s the trends.

    I disagree. You want the ability to verify the research at as many levels as is reasonable. That’s why research is published as it is, with explanations not just of the results but of how the research was carried out. Suppose I implement your algorithm and get a different result? Who is right and who is wrong? If we can’t compare code, then it’s just a matter of opinion until we get enough implementations in the literature to make statistical characterizations. That’s a terrible way to go, BTW.

    If we have access to each other’s code, then it’s a matter of going through the code to see how our computations differ. If you’re hiding your code and I’m hiding my code, then we can’t do that.

    Finally, there’s always the possibility that the code incorrectly implements the algorithm but still correctly computes the result. In other words, the algorithm may be flawed while the code isn’t. I actually had that happen to me once with a simple number theory problem I was toying with. Serendipity could be hiding in the code.

  16. Even if a computer model has reproducible results, that isn’t proof that it actually replicates or predicts what the actual climate is doing.

    It doesn’t even prove that we understand all of the variables in the climate.

    Larry, it is true that in the worst case it would be horribly expensive, but is the typical case often like the worst case? I would (naively) expect that these climate studies use software modules and libraries that everyone agrees on, and that the amount of truly novel code that must either be analyzed or, as per my suggestion, rewritten would be relatively small. Maybe that’s wrong — I haven’t followed the issue closely enough to know how complex these climate models are, and how much of the code is truly novel.

    There are standard libraries that are widely used, carefully documented, and vetted. So long as they’re used properly, the outcome is acceptable. However, having the complete source code allows you to determine whether they used the libraries properly. It allows you to perform unit testing to determine whether the code is functioning properly. Software testing is a rigorous approach to finding errors. There are “black box” tests, where you test only across a module’s interfaces without looking inside, and “white box” tests, where you go deep into the code as written; a small sketch of both follows this comment.

    Professional code – even that written to the highest standards – still has errors. A lot of the scientific code out there is written by educated amateurs and it shows.
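
    A small sketch of those two testing styles, applied to a made-up anomaly function; the function, data, and tests are illustrative, not from any real climate package:

    ```python
    def temperature_anomaly(readings, baseline):
        # return each reading minus the mean of the baseline period
        base_mean = sum(baseline) / len(baseline)
        return [r - base_mean for r in readings]

    def test_black_box():
        # black box: exercise only the public interface against a known answer
        assert temperature_anomaly([15.0, 16.0], [14.0, 14.0]) == [1.0, 2.0]

    def test_white_box_empty_baseline():
        # white box: knowing the code divides by len(baseline), probe the
        # edge case that knowledge exposes
        try:
            temperature_anomaly([15.0], [])
        except ZeroDivisionError:
            pass  # current behavior; a reviewer might demand a clearer error
        else:
            raise AssertionError("expected ZeroDivisionError on empty baseline")

    test_black_box()
    test_white_box_empty_baseline()
    print("both tests pass")
    ```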

  18. Bob-1 Says:

    February 9th, 2010 at 12:56 pm
    Pluto, for the reasons you state, the researcher’s claim should be the algorithm, not the code. Challenge the claim.

    As any legitimate software developer posting at this site knows, that’s simply not how peer review works.

    If I sent out an email to schedule a code review without attaching a copy of the code (or the location in Source Safe), I’d get half a dozen responses within the hour reminding me that I forgot to attach the code.

    And if I tried to say to my co-workers that what they really need is just my algorithms, they’d laugh and say “Ok, really, where’s the code printout.”

    Nobody has EVER asked me just for the algorithm.

  19. Kayawanee,

    The claims relevant here are about science, not software development. When people use an algorithm to model the climate, they are making a claim about the climate. The code must implement the algorithm faithfully to test the claim, but the claim is not that the code is correct — that’s just experimental method. Unlike software development, in climate science correct code is necessary, but it isn’t scientifically interesting, just as in chemistry clean test tubes are necessary but are not scientifically interesting. Jonathan Card explained this very well above, so I won’t belabor the point.

  20. When people use an algorithm to model the climate, they are making a claim about the climate. The code must implement the algorithm faithfully to test the claim, but the claim is not that the code is correct — that’s just experimental method.

    If the code isn’t correct, neither are the results produced by the code. Scientific code may be “uninteresting” to the scientists but without coding discipline and testing, they might as well just make shit up.

  21. I feel like we are talking past each other, yet we both agree that results should be independently reproduced, and that scientists must share enough information for this to easily happen. We agree that if chemists don’t clean their test tubes, they might as well just make shit up. We’re just disagreeing over whether one chemist needs to inspect another chemist’s test tubes versus simply using her own test tubes as she reproduces the result.

  22. It makes no sense to claim that one need not include computer code because one cannot include the experiment. Remember that code can be used and inspected to a degree that lab experiments cannot. If I could as easily encode a lab experiment right up to the dirty test tubes and my precise motions, then we would be calling for the experiment to be just as accessible as computer code.

    It’d be the same as claiming $10,000 in hundred dollar bills is impossible to pick up because $10,000 in pennies is impossible to pick up.

  23. Karl, I don’t follow you — the algorithm should be sufficient unless the code doesn’t implement the published algorithm.

    Bob, you answered your own question.

    Of course, there are many reasons a scientist might not want to release their code. Two that come to mind are fraud, and the fact that scientists generally write poor code relative to professional programmers.

    True science cannot be a hidden discipline… that’s for priests and shamans.

  24. If you can write new code that implements the same algorithm, you’re better off. If that’s not practical, of course it is good to test the other guy’s code. Just understand that the algorithm is the claim.

  25. Bob-1, thank you for your kind words above. I seem to have missed the battle while going to class. I liked your statement, “Just understand that the algorithm is the claim.”

    For a slight change of topic, is this a completely new discussion in scientific circles? Obviously, peer review and the reproducibility of experiments are old hat in most of science, but is peer review of computer models of complex dynamic systems a well-understood practice? I assume that systems like orbital mechanics have these kinds of software systems, but many of the governing principles were understood before there were computers (and orbital mechanical systems are arguably less complicated than climate prediction). But I wonder if this is the route that such systems will go in the future, as things like solar wind pressure, the varying gravitational influences of other orbiting bodies not yet discovered, and the Grblgrbl* effect are identified and understood.

    * I can’t remember the name of this. I attended a lecture by Dr. Lauretta of the University of Arizona and the OSIRIS REx project last week and it’s something about the radiation from the heated side of a rotating body providing thrust in the direction of local evening as the body cools.

  26. For a slight change of topic, is this a completely new discussion in scientific circles?

    It’s been a problem since the 70s. The article gave the example of the proof of the Four Color conjecture in 1977. I don’t recall the full details, but the two researchers, Appel and Haken, reduced the problem to a similar conjecture over planar graphs. They then computationally tested a couple thousand graphs to rule out possible exceptions to the conjecture. The code had to be included as part of the proof, and it has been vetted and refined considerably since then. For example, Ulrich Schmidt (who did some work for a master’s thesis in the 80s) found an error in the original proof and was able to reduce the number of cases that needed to be tested. Wikipedia has a decent outline of subsequent efforts to refine the proof further. So even in the early days of computer-aided math proofs, they included the full code.

    If your research depends on computer results, then in many fields it is standard to include the code, and has been for some time. A notable exception is the IT field. It is common for researchers to publish non-peer-reviewed results about proprietary hardware or software (for example, the traditionally notorious benchmark comparison) without revealing important details about them. Frankly, I think that’s pretty unscientific, since it allows the researchers in question to distort their research by claiming different performance for the hardware or software in question (to someone’s advantage).

  27. The global warming debate has so many passionate (okay, hysterical) advocates and critics that releasing the source code AND the algorithms is definitely what must happen. Given the very large number of contributors to Linux worldwide, you will have plenty of people across the planet willing to donate their time comparing algorithm to code so that they can be the one who gets the credit for either settling the accuracy of AGW or finding fault with what has been built so far. Either way, you will wind up with a far better algorithm and a significantly improved code base. Plus you get the benefit of bypassing much of the current tainted ‘peer review’ process.

    Hey, all of our governments used our tax dollars to pay for this effort so far. It is reasonable to expect that this effort, which reasonably should not be considered classified in a national security sense, should be completely public. Sunlight makes a great disinfectant, and public disclosure of the algorithm, the code, and the data likely would have avoided the situation we find ourselves in now.

  28. I don’t think it’s normal to publish the code in other fields of modeling. I recall that with combustion modeling and CFD, the objective was to report the algorithm, the results of the model, and validation of the results. The reason for this was that the code was valuable, either commercially or for obtaining funding for further research in the field. Supplying the code for validation of results is analogous to permitting others to use your laboratory and experimental set-up to replicate experimental results. Maybe the advent of digital experiments opens up a new venue for peer review, but it’s certainly not an approach that’s taken hold so far.

  29. Hey, all of our governments used our tax dollars to pay for this effort so far. It is reasonable to expect that this effort, which reasonably should not be considered classified in a national security sense, should be completely public.

    This. If NOAA is going to be the new climate guru, it should be at least as open as NIST, for example.
