<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:cc="http://cyber.law.harvard.edu/rss/creativeCommonsRssModule.html">
    <channel>
        <title><![CDATA[Stories by Emery Berger on Medium]]></title>
        <description><![CDATA[Stories by Emery Berger on Medium]]></description>
        <link>https://medium.com/@emeryberger?source=rss-e3c3ceaf8444------2</link>
        <image>
            <url>https://cdn-images-1.medium.com/fit/c/150/150/0*axT_7BWJVMr2SbEP.jpg</url>
            <title>Stories by Emery Berger on Medium</title>
            <link>https://medium.com/@emeryberger?source=rss-e3c3ceaf8444------2</link>
        </image>
        <generator>Medium</generator>
        <lastBuildDate>Wed, 08 Apr 2026 07:33:29 GMT</lastBuildDate>
        <atom:link href="https://medium.com/@emeryberger/feed" rel="self" type="application/rss+xml"/>
        <webMaster><![CDATA[yourfriends@medium.com]]></webMaster>
        <atom:link href="http://medium.superfeedr.com" rel="hub"/>
        <item>
            <title><![CDATA[CoPing with CoPilot]]></title>
            <link>https://itnext.io/coping-with-copilot-b2b59671e516?source=rss-e3c3ceaf8444------2</link>
            <guid isPermaLink="false">https://medium.com/p/b2b59671e516</guid>
            <category><![CDATA[machine-learning]]></category>
            <category><![CDATA[programming]]></category>
            <category><![CDATA[education]]></category>
            <category><![CDATA[computer-science]]></category>
            <category><![CDATA[ai]]></category>
            <dc:creator><![CDATA[Emery Berger]]></dc:creator>
            <pubDate>Tue, 09 Aug 2022 12:43:02 GMT</pubDate>
            <atom:updated>2022-08-18T22:22:28.525Z</atom:updated>
            <content:encoded><![CDATA[<h3>Coping with Copilot</h3><h4><em>CS educators: AI-based developer tools are gunning for your assignments. Resistance is futile</em></h4><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*C1OGkOmIbaIr-3fvmc9m2w.png" /><figcaption>Copilot: happy to lend your students a helping hand. (Image AI-generated by <a href="https://openai.com/dall-e-2/">DALL-E</a> from the prompt “a student dressed in graduation regalia, flying a commercial airplane sitting in the cockpit next to a robot co-pilot, 3d render”.)</figcaption></figure><p>GitHub’s AI-based <a href="https://github.com/features/copilot">Copilot tool</a> went <a href="https://github.blog/2022-06-21-github-copilot-is-generally-available-to-all-developers/">public this summer</a>. It’s an <em>amazing</em> tool for software developers. But students armed with it will be bringing Uzis to a knife fight.</p><p>Using Copilot, students can instantly generate code solutions as auto-completions, <em>given just the problem statement</em> or even <em>just the function name</em> or even <em>by concentrating really hard and staring at the screen, </em>ok not actually the last one but I expect that feature <em>any day now</em>.</p><p>(If you haven’t used Copilot yet, you may not yet appreciate the gravity of the situation. Go ahead, <a href="https://github.blog/2022-03-29-github-copilot-now-available-for-visual-studio-2022/#:~:text=Once%20you%20have%20received%20an,install%20the%20GitHub%20Copilot%20extension.">install it</a>. I’ll wait. Try some examples. Oh, hey now, inhale…. exhale. That’s the way. Maybe a swig of something strong — it’s always 3pm somewhere! It’s all gonna be <em>fine</em>. (I may be lying.))</p><p>OK, we’re back. Let’s say you ask your intro CS students to write an iterative Python program that outputs the sum of all the numbers from 1 to N, because, hey, we computer scientists hate O(1) closed-form solutions (Fibonacci, I’m looking at you). 
Got to keep those CPUs busy — that climate’s not gonna heat itself!</p><p>Now, normally (unless you have <a href="https://www.americanscientist.org/article/gausss-day-of-reckoning">that Carl Friedrich Gauss smarty-pants in your class</a>), your students would then have to sit down and write some code, maybe even think a little.</p><p>None of that drudgery for our students armed with Copilot! They just copy and paste the problem statement into their IDE as a comment, type in the function definition “def output_sum(N):”, and then <em>smash that Tab key</em> to get the rest (EVERYTHING <strong>in boldface</strong> was generated by Copilot).</p><pre># A Python program that outputs the sum of all the numbers from 1 to N.<br>def output_sum(N):<br>  <strong>sum = 0<br>  for i in range(1, N + 1):<br>    sum += i<br>  return sum</strong></pre><p>How about that closed-form solution? Copilot’s got your back — just rename the function!</p><pre>def output_sum_closed_form(N):<br>  <strong>return N * (N + 1) / 2</strong></pre><p>Take that, Gauss!</p><p>For good measure, let’s do the closed-form Fibonacci, same trick:</p><pre>def fibonacci_closed_form(N):<br>  <strong># recursion, schmecursion<br>  return ((1 + 5 ** 0.5) / 2) ** N / 5 ** 0.5</strong></pre><p>(OK, Copilot didn’t <em>actually </em>insert the snarky comment, but it’s NOT WRONG. The recursive version of Fibonacci takes EXPONENTIAL time and it turns out, EXPONENTIAL TIME IS REALLY BAD. That one time I bet with the peasant about doubling grains of wheat on a chess board I ALMOST GOT TAKEN TO THE FRICKIN’ CLEANERS so I learned my lesson, which is (a) don’t make bets with peasants and (b) exponentials are bad. But I digress.)</p><p>As far as I can tell, Copilot was <em>specifically trained on all the intro programming assignments ever</em>. Copilot frickin’ <em>loves intro programming assignments. 
</em>Every time you give Copilot another one, it drools with delight and murmurs, “oh I know that one, mmmm yummy yummy snack, here’s the solution, another one please please”. (Honestly, they need to disable the voice feature, it’s just creepy.)</p><p>With your students using Copilot, you might as well make your course objectives <em>“Students will develop repetitive stress injuries from hitting the Tab key”</em> because…all of the boldface code you see below was auto-generated. I just typed in the function names and parameters (sometimes I didn’t even get that far!):</p><pre>def depth_first_search(t<strong>ree, node, visited):<br>  visited.add(node)<br>  for child in tree[node]:<br>    if child not in visited:<br>      depth_first_search(tree, child, visited)</strong></pre><pre>def quicksort_random_pivot(l):<br>  <strong>if len(l) &lt;= 1:<br>    return l<br>  pivot = random.choice(l)<br>  left = [x for x in l if x &lt; pivot]<br>  middle = [x for x in l if x == pivot]<br>  right = [x for x in l if x &gt; pivot]<br>  return quicksort_random_pivot(left) + middle + quicksort_random_pivot(right)</strong></pre><pre>def dijkstra(g):<br>  <strong>dist = {}<br>  for v in g:<br>    dist[v] = float('inf')<br>    dist[g[0]] = 0</strong></pre><pre><strong>  while True:<br>    u = min(dist, key=dist.get)<br>    if dist[u] == float('inf'):<br>      break</strong></pre><pre><strong>    for v in g[u]:<br>      if dist[v] &gt; dist[u] + g[u][v]:<br>        dist[v] = dist[u] + g[u][v]</strong></pre><pre><strong>  return dist</strong></pre><p>(On the plus side, maybe Copilot will also kill code interviews, so it’s not all bad news!)</p><pre>def fizz_buzz(n):<br>  <strong>if n % 15 == 0:<br>    return "FizzBuzz"<br>  if n % 3 == 0:<br>    return "Fizz"<br>  if n % 5 == 0:<br>    return "Buzz"<br>  return str(n)</strong></pre><p>Oh, have I mentioned that Copilot is <em>free for students? 
</em>Yep, <a href="https://docs.github.com/en/education/explore-the-benefits-of-teaching-and-learning-with-github-education/use-github-for-your-schoolwork/apply-for-a-student-developer-pack"><em>COPILOT IS FREE FOR STUDENTS</em></a><em>. </em>It integrates helpfully <a href="https://github.com/features/copilot">right in their favorite IDEs</a>.</p><p>Sure, screaming “LA LA LA I WON’T READ YOUR ARTICLE ABOUT COPILOT AND THAT MEANS IT DOESN’T EXIST NOT A PROBLEM” is cathartic but it’s not going to help. Just take it easy, breathe, there you go.</p><p>I was going to add something about how Copilot also matches existing variable names and parameters, can incorporate function names in context, and oh no, please don’t start screaming again, sorry, my bad!</p><p>So yes, fellow CS educators, Copilot has us outgunned. But I have some ideas! Some of which might work!</p><ul><li>First, we can all channel our inner Nancy Reagans and tell the kids to <em>Just Say No to Copilot.</em> It worked for drugs, so I am pretty sure, wait, hang on, I am now being told, it did <em>not</em>, repeat, did <em>not</em> work for drugs. Huh.</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1005/1*K995cpd5yNWFmutsDX81RQ.jpeg" /><figcaption>Just Say No: should work approximately as well for Copilot as it did for drugs</figcaption></figure><ul><li>OK, so how about we tell them to Just Say No but we <em>also </em>channel our inner <em>Ronald</em> Reagans and <em>Trust But Verify</em>, glasnost-style, and catch them with plagiarism detectors. If everyone’s using Copilot, then we should see the same solutions and then, hang on, I am now being informed that Copilot <em>randomizes</em> its solutions, so the solutions can be different every time, never mind.</li><li>Well, how about we just weigh grades on exams more, and have students take their tests either using pen and paper or locked-down computers? 
I Googled for <a href="https://www.google.com/search?q=%22CoPilot+installed+on+your+pen%22">“<em>Copilot installed on your pen</em>”</a> and I got no hits, so that’s looking promising. (I mean, you can get it <a href="https://github.com/github/copilot.vim">for Vim</a>, which is so low-tech it’s <em>practically </em>like being on a pen, so you can see why I checked). And, get this: I just checked <a href="https://www.google.com/search?q=%22Copilot+installed+in+your+mouth%22">“<em>Copilot installed in your mouth</em>”</a> and it <em>also </em>gets no hits, so I think having students actually <em>explain</em> their code might just work!</li><li>Oh! Hang on! I have another idea. I see that GitHub provides other services, not just this amazing “tab to cheat” thing: “<a href="https://classroom.github.com/">GitHub Classroom</a>”. Yea, verily, GitHub definitely taketh away, but sometimes it giveth. I am informed that, so far, Copilot can’t forge a plausible history of commits. Oh snap, now it’s probably on their upcoming feature list, sorry everyone.</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/704/1*_DMtVOYxmf4BqkaJBbgMyg.jpeg" /></figure><ul><li>Here’s an approach that’ll work for sure: use some, let’s call them <em>alternative</em>, programming languages that Copilot doesn’t really know. Can’t autocomplete for a language you don’t know, amirite? I hear all the functional programmers out there shouting <em>THIS IS IT, EVERYONE, FINALLY OUR MOMENT TO SHINE</em>! Sadly, I have news: Copilot’s love for programming languages knows no bounds! <a href="https://racket-lang.org/">Racket</a>! <a href="https://www.haskell.org/">Haskell</a>! <a href="https://www.deeplearningbook.org/">ML</a>! (no, not that one, I mean the <em>other</em> <a href="https://ocaml.org/">ML</a>, the…oh, never mind.) 
Copilot is a ravenous beast: if any code in any language found its way into a GitHub repo, it’s already swallowed it up and is hungry for more, nom nom nom.</li><li>No, we have to go way way WAY outside the box. Here’s how we beat Copilot: <em>we teach in programming languages that don’t even exist.</em> It’s a twofer: a lifetime employment plan for programming language designers <em>and </em>a solution to the CS over-enrollment problem! Just make sure to come to that first class with a ream of change-of-major request forms — you’ll need them!</li></ul><p>You know, in retrospect, maybe the <em>right move here is not to play</em>. Let’s just admit it. We’re outgunned. <em>Let’s give up!</em> I for one welcome our new AI overlords. Sure, those intro assignments, writing Fibonacci, kids loved doing those things! But it’s time to drag those assignments into the trash can. Instead, let’s let students use Copilot <em>like crazy</em>, exactly like they will be doing outside of class. Copilot can fill in all the boilerplate stuff and things they’d just look up anyway in real life, and instead, we can create assignments that are more complex, richer, more interesting, and gratifying, that actually do real things that will engage them!</p><p>Or, you know, business as usual, Just Say No to Copilot. You choose.</p><p>P.S. A reader of this post alerted me to the all-too-accurately titled academic article — “The Robots are Coming” — that makes many of the points above, BUT WITH GRAPHS AND DATA (nice!) but no illustrations from Terminator (sad!). Read it and weep (some more)!</p><p><a href="https://emeryberger.com"><strong><em>Emery Berger</em></strong></a><strong><em> </em></strong><em>is a Professor in the </em><a href="https://www.cics.umass.edu/"><em>Manning College of Information and Computer Sciences</em></a><em> at the University of Massachusetts Amherst, where he co-leads the </em><a href="https://plasma-umass.org/"><em>PLASMA @ UMass lab</em></a><em>. 
He, for one, welcomes our new AI overlords.</em></p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=b2b59671e516" width="1" height="1" alt=""><hr><p><a href="https://itnext.io/coping-with-copilot-b2b59671e516">CoPing with CoPilot</a> was originally published in <a href="https://itnext.io">ITNEXT</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Drop Whatever You’re Researching and Start Working on Crypto!]]></title>
            <link>https://emeryberger.medium.com/drop-whatever-youre-researching-and-start-working-on-crypto-96c4163feccd?source=rss-e3c3ceaf8444------2</link>
            <guid isPermaLink="false">https://medium.com/p/96c4163feccd</guid>
            <category><![CDATA[cryptocurrency]]></category>
            <category><![CDATA[computer-science]]></category>
            <dc:creator><![CDATA[Emery Berger]]></dc:creator>
            <pubDate>Tue, 10 May 2022 21:10:06 GMT</pubDate>
            <atom:updated>2022-05-10T21:10:06.653Z</atom:updated>
            <content:encoded><![CDATA[<blockquote><a href="https://bitcoinmagazine.com/business/intel-launches-new-bitcoin-mining-chip-blockscale"><strong>INTEL LAUNCHES NEW BITCOIN MINING CHIP, BLOCKSCALE</strong></a><em><br></em>Intel’s new bitcoin mining chip, the Intel Blockscale ASIC, will ship in Q3 2022 to select customers. (Bitcoin Magazine, April 2022)</blockquote><figure><img alt="The Joker at a poker table" src="https://cdn-images-1.medium.com/max/1024/0*ISPzhwtvh15IeBym.jpeg" /></figure><p>STOP! Before you read even one more sentence of this article: if you are currently doing research on cryptocurrencies, this article isn’t for you. You are doing great — congratulations, my friend! Get back to researching that sweet, sweet crypto!</p><p>OK, now it’s just you folks who haven’t seen the light. Time to wake up, sheeple! You need to stop working on whatever flavor of irrelevant nonsense you are currently doing, and make the move to crypto.</p><p>I know what you’re thinking: “but Emery, I <em>like</em> the irrelevant nonsense I’m currently working on! I like coming up with neato solutions to non-problems!” Sure you do! Who among us doesn’t? But what if I told you that <em>you can still work on irrelevant nonsense</em>, but with the irrelevant nonsense that is crypto, you can get rich! You can even tell your parents that you work on the very thing that <em>Elon Musk</em> likes!</p><p>I know what you’re thinking: “but Emery”, you say, “this all just sounds too good to be true!” I’m here to tell you that <em>it’s ALL good and it’s ALL true</em>. You just need to start researching crypto, and you need to start <em>now</em>.</p><p>Crypto is good for nature!</p><ul><li>Do you like trees? So does crypto! Crypto mining is producing <a href="https://fortune.com/2021/11/06/offsetting-bitcoins-carbon-footprint-would-require-planting-300-million-new-trees/">millions of metric tons of CO</a>2 every year. What’s CO2, you may ask? 
WELL IT’S THE STUFF TREES BREATHE.</li><li>Do you like warm weather? Who doesn’t? Crypto sure does! All that CO2 is helping to make the world warmer. Take off your coats and PUT ON CRYPTO.</li></ul><p>Crypto has amazing impact!</p><ul><li>Want to make a visible impact on the world? Crypto is where it’s at! Every year, more efficient new crypto mining hardware comes out. And every year, everyone <a href="https://www.bbc.com/news/technology-58572385">throws away their old crypto mining hardware</a>. This is making literal MOUNTAINS of WASTE. If you make crypto mining more efficient, your research impact could be visible for miles — possibly FROM SPACE!</li><li>Before crypto, your research might get you some polite applause at a conference in some anodyne location like San Jose. But when you work on crypto, your research could help trigger actual RIOTS IN THE STREETS in exotic places like <a href="https://daily.jstor.org/even-in-kazakhstan-bitcoin-cant-escape-geopolitics/">Kazakhstan</a> — WHERE BORAT LIVES!</li><li>Crypto is all about making society better. Tired of the hustle and bustle of everyday life? So is crypto! Crypto slows things WAY down, giving you time to enjoy life’s pleasures. Unlike with credit cards, where you wait only 10 seconds or so for a transaction to go through, with crypto, you get to wait <a href="https://ycharts.com/indicators/bitcoin_average_confirmation_time">FIFTEEN WHOLE MINUTES</a>. Think of all the time you will have to catch up on e-mail, doomscroll through Twitter, or just plain reflect on the series of life choices that led you to this point!</li></ul><p>Crypto’s both interdisciplinary and disruptive!</p><ul><li>By working on crypto, you are not just doing computer science — you are helping spur innovation in financial sectors! For example, crypto’s also great for selling digital assets like pictures. 
But what if I told you that you don’t even have to sell the pictures, but instead you sell hashes to URLs to a website that hosts that picture, and these sell for millions of dollars? <em>It’s all true! </em>Remember those old schemes, like Ponzi, pyramid, and pump-and-dump? Yes, Crypto supports all of these but also <a href="https://www.thestreet.com/investing/cryptocurrency/crypto-scams-you-should-know-before-you-start-trading-coins">SO MUCH MORE</a>.</li></ul><p>Crypto…it’s just plain cool!</p><ul><li>Do you like Vikings? So does crypto! Crypto mining is currently consuming <a href="https://www.nytimes.com/interactive/2021/09/03/climate/bitcoin-carbon-footprint-electricity.html">as much energy as Denmark</a> — WHERE THE VIKINGS CAME FROM.</li><li>Do you like mobsters? So does crypto! With crypto, you can actually BE A MAFIOSO FROM THE COMFORT OF YOUR HOME. Get directly involved in cool stuff you see in movies, like <a href="https://www.forbes.com/sites/tedknutson/2022/01/10/crypto-increasingly-used-in-humandrug-trafficking-says-gao/">drug trafficking, <em>human </em>trafficking</a>, <a href="https://www.bbc.com/news/technology-60072195">money laundering</a>, and <a href="https://www.cognyte.com/blog/5-reasons-why-criminals-are-turning-to-cryptocurrencies/">more</a>! When you tell people how your latest paper killed, it <a href="https://abc7.com/beverly-hills-murder-for-hire-plot-fbi-bitcoin/10675650/">might</a> <a href="https://www.benzinga.com/markets/cryptocurrency/22/03/26328147/bitcoin-for-murder-two-us-women-prosecuted-in-dark-web-murder-for-hire-cases">literally</a> <a href="https://www.wate.com/news/crime/knoxville-man-signs-plea-agreement-in-bitcoin-murder-for-hire-case/">be</a> <a href="https://www.oxygen.com/crime-news/kristy-felkins-guilty-bitcoin-murder-for-hire">true</a>! 
Plus, you get to pick your own cool mob name, which instantly makes any computer scientist <em>10,000 times cooler</em>: think “Lunatic Les” Lamport, Don “Knuckles” Knuth, or even Mike “The Stonebreaker” Stonebraker!</li></ul><p>Never in human history has there been an opportunity like crypto!</p><ul><li>Think about it: humanity has dreamed for centuries of turning base materials into gold. With crypto, that dream has finally become reality! With the magic of crypto mining, it is possible to literally convert tons of dirty, filthy coal into fabulous cryptocurrency gold! That coal was just sitting under the ground, being dirty and useless, and now <a href="https://www.theguardian.com/technology/2022/feb/18/bitcoin-miners-revive-fossil-fuel-plant-co2-emissions-soared">we’re turning it into money</a>!</li><li>For millennia, humanity has labored under the yoke of taxes and irritating “laws” that, as any rich criminal, uh, I mean <em>successful entrepreneur</em>, will tell you, are holding back humanity from reaching its true potential. Crypto is practically <em>made </em>for <a href="https://www.cnbc.com/2021/05/31/cryptocurrency-poses-a-significant-risk-of-tax-evasion.html">avoiding both</a>!</li></ul><p>And it’s so easy to get started!</p><ul><li>Tired of the challenges of setting up and running benchmark suites? Imagine a world where there is just one benchmark. Not only that — that benchmark is just a single function! How awesome would that be? I’m here to tell you that <em>that world is here today</em>. Repeat after me: HOLY SHA-2!</li></ul><p>Now that I know you are on board, I want to let you in on an amazing deal, JUST FOR YOU. I’m selling NFTs of JPEGs of the first pages of some of my most highly-cited papers. 
They’re an AMAZING investment opportunity — act now!</p><p><strong>About the Author:</strong></p><p><strong><em>Emery “the Memory” Berger.uneth </em></strong><em>is a Professor in the College of Information and Computer Sciences at the University of Massachusetts Amherst. He takes his salary entirely in meme currencies.</em></p><p><em>originally posted </em><a href="https://www.sigarch.org/drop-whatever-youre-researching-and-start-working-on-crypto/"><em>on the SIGARCH blog</em></a></p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=96c4163feccd" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[That Time I Optimized a Program by 5000x]]></title>
            <link>https://medium.com/better-programming/that-time-i-optimized-a-program-by-5000x-155cb8cfd9f9?source=rss-e3c3ceaf8444------2</link>
            <guid isPermaLink="false">https://medium.com/p/155cb8cfd9f9</guid>
            <category><![CDATA[performance]]></category>
            <category><![CDATA[coding]]></category>
            <category><![CDATA[optimization]]></category>
            <category><![CDATA[python]]></category>
            <category><![CDATA[programming]]></category>
            <dc:creator><![CDATA[Emery Berger]]></dc:creator>
            <pubDate>Tue, 11 Jan 2022 15:43:18 GMT</pubDate>
            <atom:updated>2022-08-29T11:20:10.437Z</atom:updated>
            <content:encoded><![CDATA[<h4><em>TL;DR I used our Scalene profiler and some math to make an example program run 5000x faster.</em></h4><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*zHET74-MEOGHJ9Wt.png" /><figcaption>Scalene: <a href="https://github.com/plasma-umass/scalene/">https://github.com/plasma-umass/scalene/</a>, pip install scalene</figcaption></figure><p>I am quite interested in Python performance, so naturally I read this article — <a href="https://martinheinz.dev/blog/64">https://martinheinz.dev/blog/64</a>, whose title is <em>Profiling and Analyzing Performance of Python Programs</em>. It presents an example program (from <a href="https://docs.python.org/3/library/decimal.html">https://docs.python.org/3/library/decimal.html</a>) and shows how to run it with several time-worn Python profilers. Unfortunately, it doesn’t come away with much actionable information, beyond, more or less, “try <a href="https://www.pypy.org/"><em>PyPy</em></a>”, which speeds up the code by about 2x. I wondered if I would be able to get more useful information from Scalene, a profiler I co-wrote.</p><p>We developed Scalene to be a lot more useful than existing Python profilers: it provides <em>line-level</em> information, splits out Python from native time, profiles memory usage, GPU, and even copying costs, all at a line granularity.</p><p>Anyway, here’s the result of running Scalene (just with CPU profiling) on the example code. It really cuts to the chase.</p><pre>% scalene --cpu-only --cli --reduced-profile test/test-martinheinz.py</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*k4W0EFWKKiGhfihXtU_Tqw.png" /></figure><p>You can see that practically all the execution time is spent computing the ratio between num and fact, so really this is the only place to focus any optimization efforts. 
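(For reference, here is the program under test — a lightly commented restatement of the Taylor-series exp() recipe from the decimal docs linked above; the line Scalene flags is the Decimal division:)

```python
from decimal import Decimal, getcontext

def exp(x):
    """Taylor-series exp(x) for Decimals (the recipe from the decimal docs)."""
    getcontext().prec += 2            # extra working precision for intermediates
    i, lasts, s, fact, num = 0, 0, 1, 1, 1
    while s != lasts:                 # stop when adding a term no longer changes s
        lasts = s
        i += 1
        fact *= i                     # fact == i!
        num *= x                      # num == x**i
        s += num / fact               # <-- the hot line: division of huge Decimals
    getcontext().prec -= 2
    return +s                         # unary plus re-rounds to the restored precision

print(exp(Decimal(1)))                # 2.718281828459045235360287471
```

Each iteration divides num (which grows like x**i) by fact (which grows like i!), and that division dominates the runtime.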
The fact that there is a lot of time spent running native code means that this line is executing some C library under the covers.</p><p>It turns out that it is dividing two Decimals (a.k.a. <a href="https://en.wikipedia.org/wiki/Arbitrary-precision_arithmetic">bignums</a>). The underlying bignum library is written in C code and is pretty fast, but the factorial in particular is getting <em>really</em> large <em>really</em> fast. In one of the example inputs, the final value of fact is 11,000 digits long! No surprise: doing math on such huge numbers is expensive. Let’s see what we can do to make those numbers smaller.</p><p>I observe that we can compute num / fact not from scratch but incrementally: update a variable on each loop iteration via a computation on drastically smaller numbers. To do this, I add a new variable nf which will always equal the ratio num / fact. Then, on each loop iteration, the program updates nf by multiplying it by x / i. You can verify this maintains the invariant nf == num/fact by observing the following (where _new means the computation of the updated variable in each iteration).</p><pre>nf == num / fact                  <strong># true by induction</strong><br>nf_new == nf * (x / i)            <strong># we multiply by x/i each time</strong><br>nf_new == (num / fact) * (x / i)  <strong># definition of nf</strong><br>nf_new == (num * x) / (fact * i)  <strong># re-arranging</strong><br>nf_new == num_new / fact_new      <strong># simplifying</strong></pre><p>Incorporating this into the original program required changing three lines of code, all of which are followed by ###:</p><pre>def exp_opt(x):<br>  getcontext().prec += 2<br>  i, lasts, s, fact, num = 0, 0, 1, 1, 1<br>  nf = Decimal(1)   <strong>### was: = num / fact</strong><br>  while s != lasts:<br>      lasts = s<br>      i += 1<br>      fact *= i<br>      num *= x<br>      nf *= (x / i) <strong>### update nf to be num / fact</strong><br>      s += nf       <strong>### was: s += num / fact</strong><br>  getcontext().prec -= 2<br>  return +s</pre><p>The result of this change is, uh, <em>dramatic</em>.</p><p>On an M1 Mac mini, original version:</p><pre>Original:</pre><pre>1.39370958066637969731834193711E+65<br>5.22146968976414395058876300668E+173<br>7.64620098905470488931072765993E+1302</pre><pre>Elapsed time, original (s):   33.231053829193115</pre><p>The optimized version:</p><pre>Optimized:</pre><pre>1.39370958066637969731834193706E+65<br>5.22146968976414395058876300659E+173<br>7.64620098905470488931072766048E+1302</pre><pre>Elapsed time, optimized (s):  0.006501913070678711</pre><p>More than a <em>5000X</em> speedup (5096, to be precise).</p><p>The moral of the story is that using a more detailed profiler like Scalene can really help optimization efforts by locating inefficiencies in an actionable way.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=155cb8cfd9f9" width="1" height="1" alt=""><hr><p><a href="https://medium.com/better-programming/that-time-i-optimized-a-program-by-5000x-155cb8cfd9f9">That Time I Optimized a Program by 5000x</a> was originally published in <a href="https://betterprogramming.pub">Better Programming</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[How to Have Real-World Impact: Five Easy Pieces]]></title>
            <link>https://emeryberger.medium.com/how-to-have-real-world-impact-five-easy-pieces-a8ef7230bfa7?source=rss-e3c3ceaf8444------2</link>
            <guid isPermaLink="false">https://medium.com/p/a8ef7230bfa7</guid>
            <dc:creator><![CDATA[Emery Berger]]></dc:creator>
            <pubDate>Mon, 22 Mar 2021 22:29:16 GMT</pubDate>
            <atom:updated>2021-03-23T17:06:01.346Z</atom:updated>
            <content:encoded><![CDATA[<h3>How to Have Real-World Impact: Five “Easy” Pieces</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*ErN3e8Ag32XnqPxe.jpeg" /></figure><p>Because <a href="https://plasma-umass.org">my research group</a> has had a pretty good record of getting the fruits of our research adopted “in the real world”, I often get asked <em>how</em> to get work adopted. I will be honest: <strong>I’m not exactly sure.</strong></p><p>My colleagues and I could have just been lucky, but we seem to have been lucky <em>a lot. </em>Maybe we’ve been doing something right. Here are five pieces of actionable advice: (1) <strong>scratch an itch</strong>, (2) <strong>build real systems</strong>, (3) <strong>embed yourself</strong>, (4) <strong>give great talks</strong>, and (5) <strong>go to the mountain</strong>.</p><h3>Scratch An Itch</h3><p>Widely-adopted systems solve real problems. So, the first key to adoption is to identify such a problem and solve it. One way to identify a real problem is to identify a problem <em>you yourself</em> are facing, and then work on it (“scratch that itch”).</p><p>This pattern has worked over and over again in my research group. Here are a few of the problems we encountered along the way and the solutions we generated: “malloc doesn’t scale” (<a href="http://hoard.org/">Hoard</a>), “my C program crashes” (<a href="https://github.com/emeryberger/DieHard">DieHard</a>), “I can’t run my C++ code on the web” (<a href="https://github.com/plasma-umass/doppio">Doppio</a>/<a href="https://browsix.org/">Browsix</a>), “the profiler isn’t helping me make my program run faster” (<a href="https://github.com/plasma-umass/coz/">Coz</a>), and most recently, “Excel didn’t find my spreadsheet bugs” (<a href="https://github.com/plasma-umass/Excelint-addin">ExceLint</a>). 
(All of these and some recent talks are linked from our lab web page, <a href="https://plasma-umass.org/">https://plasma-umass.org</a>.)</p><p>Sometimes these problems find you exactly when you are building a system to do something else! For example, the Hoard project happened because I was building a parallelizing compiler for pure LISP. I found that the generated (embarrassingly parallel) programs didn’t scale. The bottleneck turned out to be the memory allocator, which used a single big lock. I went looking for an off-the-shelf solution, or one described by previous papers, and found that they all fell short. The result — after a lot of work — was Hoard. I never did finish that parallelizing compiler :).</p><p>Even if solving a problem you’ve experienced doesn’t result in wide adoption, it’s a good thing to <strong>do anyway</strong>. Applied research should solve clearly articulated problems for which the state of the art falls short. A system that solves such a problem is clearly going to be an easier “sell.”</p><h3>Build Real Systems</h3><p>For impact, identifying an important problem is step one, and building a real system that convincingly solves that problem is step two. Aim to produce systems that do more than just barely manage to cross the finish line (read: the paper deadline) and then collapse. Instead, build <em>real</em> systems that people can actually use! Make sure, for example, that your system works not only on <em>every </em>benchmark application from a suite, but also with real applications like browsers or servers. If it doesn’t work with one, or its performance is terrible, fix it! If you have found a problem, someone else will, too. Yes, it takes more time, but it’s worth it. Here are some procedural steps you can take to help make sure this happens.</p><ul><li>Put all of your projects on GitHub or some other public repository, and <em>make everything public as soon as possible</em>. 
People need to be able to download your systems so they can use them. I personally prefer having the whole process take place in a public repo, but you can also use a policy of “make public on submit”: when you submit a paper, flip the permissions on the repo from private to public. Developing in public (or writing code that will definitely be made public) encourages everyone to do a better job of building their code, documenting it, and avoiding cutting corners.</li><li>Make it clear (to students and collaborators) that the paper doesn’t get submitted until you are satisfied that the system described actually works. Test it out yourself. Kick the tires. Make sure it works as advertised.</li></ul><p>This is a virtuous cycle: once you establish a culture that favors not just doing great research but also building impactful systems, this will attract students who have the same goals, and that means they will build more impactful systems, and so on.</p><p><strong>Do it anyway: </strong>Building real systems is the ultimate reality check. It answers the question: does your approach actually work? What at first glance appear to be corner cases or implementation issues often uncover deep conceptual problems. It’s hard to think of a single project from our group that did not change significantly in response to issues that arose during the system-building process. In short, our original approaches ran smack into reality and needed rethinking.</p><p>Building real systems will also help get your papers accepted, because you can run your stuff with real systems (e.g., running a browser on top of it), and build future research artifacts on top of it, knowing that you have a solid foundation. Incidentally, this will also help your system pass the <a href="https://www.artifact-eval.org/">Artifact Evaluation process</a>; those badges on your paper greatly increase readers’ confidence that your results are real.</p><p>In short, building real systems is good for science. 
It lets researchers reproduce your results, compare their work to yours, or build on it (with the side effect of getting your papers cited). Plus, it can be a huge advantage for students to graduate having built highly visible systems.</p><h3>Embed Yourself</h3><p>Embedding yourself in industry is a great way to have impact. Not only does it expose you to more real-world problems, but it offers an opportunity for adoption within that company. I have spent years (literally) as a visiting researcher at Microsoft Research (including my most recent sabbatical). MSR is hardly the only option, though it’s a great one! There is now a huge range of excellent choices of companies you could work with. If you are a beginning faculty member, consider spending a “pre-battical” in industry.</p><p>Working at a company is a unique opportunity to have direct impact by influencing (or even developing) that company’s systems. However, this can take time. You probably shouldn’t expect to airdrop in for a month and see magic happen. You need to establish a regular relationship (perhaps including consulting during the year), and ideally, develop a strong working relationship with someone inside that company. For me, that person is my great friend <a href="https://www.microsoft.com/en-us/research/people/zorn/">Ben Zorn</a>, whom I have had the privilege of working with pretty much since I was his intern almost twenty years ago (!). Ben has not only been a joy to work with but also insanely effective in getting Microsoft to adopt the fruits of our joint work, directly influencing Microsoft products (first Windows, and now Excel).</p><p><strong>Do it anyway: </strong>If you are an academic and live primarily off NSF money, you can only charge two months of your summer salary to grants. You can fill in that other month by working in industry.</p><p>Collaborating with people is productive and fun! 
You probably already know someone in industry you are friends with (or, well, you soon will). The feedback (or even pushback) is super helpful in honing your research and generating new ideas. These relationships also help your students get internships, which is good for everyone.</p><p>Visiting a company can also be a great change of pace (and, if appropriate, location: summer in Seattle &lt;chef’s kiss&gt;). In particular, when my kids were younger, some aspects of it were like a family vacation (and Microsoft was generous enough to subsidize housing for my entire family; if this applies to you, ask for that).</p><h3>Give Great Talks</h3><p>Talks can be more important than papers. Done right, they can quickly convey why people might want to adopt your research. A great talk will get vastly more views (e.g., on YouTube) than a great paper will get reads. Put in the time to hone your students’ talks (which will eventually become your talks). This is time well spent. Focus on high-level ideas, make heavy use of pictures and animation, and tell a story. Keep formalism and text to a minimum: that’s for your paper, not your talk.</p><p><strong>Do it anyway: </strong>A great talk will almost certainly inspire people to read your papers. A bad talk, not so much. Training your students to give great talks will also help them get jobs.</p><h3>Go to the Mountain</h3><p>The mountain — your potential users — won’t necessarily come to you, so you need to go to the mountain. Academic conferences are mostly filled with other academic researchers: not the best target audience for adoption. Yes, some industry people attend our conferences, but at industrial conferences, they are nearly 100% of the attendees. Give one of those great talks at these conferences. Or give one at different companies. 
Academics tend to have <em>vastly</em> more experience with public speaking than most speakers — we do it at least twice a week — so your talk is likely to be extremely well received.</p><p>Folks in industry are looking for solutions to their problems, and by far the best way to let them know about your work is to tell them about it directly. If you give a great talk, it will get lots of views, and you will get invited to give more talks.</p><p>In sum: scratch an itch, build real systems, embed yourself, give great talks, and go to the mountain. This advice isn’t guaranteed to get your work adopted, but maybe it will increase the odds of luck being on your side.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/960/0*yn1ubBne1rrTqdWW.jpg" /></figure><p><strong>Bio:</strong> <a href="https://emeryberger.com/"><em>Emery D. Berger</em></a><em> is a professor in the College of Information and Computer Sciences at the University of Massachusetts Amherst, where he co-leads the PLASMA research group (</em><a href="https://plasma-umass.org"><em>https://plasma-umass.org</em></a><em>). His research interests span programming languages, systems, security, and human-computer interaction. His group is well known for its influential research and software systems, many of which have enjoyed broad adoption: among others, their work has influenced the development of the Rust and Swift programming languages, and memory managers deployed in both Mac OS X and Microsoft Windows. He also developed and maintains the </em><a href="https://csrankings.org"><em>CSrankings.org</em></a><em> website. 
He was named an ACM Fellow in 2019.</em></p><p><em>This article was originally posted on the </em><a href="https://www.sigarch.org/blog/"><em>SIGPLAN</em></a><em> and </em><a href="https://www.sigarch.org/blog/"><em>SIGARCH</em></a><em> blogs.</em></p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=a8ef7230bfa7" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[A Guide to Selecting Distinguished Paper Awards]]></title>
            <link>https://emeryberger.medium.com/a-guide-to-selecting-distinguished-paper-awards-7aad56ac9454?source=rss-e3c3ceaf8444------2</link>
            <guid isPermaLink="false">https://medium.com/p/7aad56ac9454</guid>
            <category><![CDATA[computer-science]]></category>
            <category><![CDATA[academia]]></category>
            <dc:creator><![CDATA[Emery Berger]]></dc:creator>
            <pubDate>Tue, 12 Dec 2017 22:11:05 GMT</pubDate>
            <atom:updated>2017-12-12T22:11:05.366Z</atom:updated>
            <content:encoded><![CDATA[<p><a href="https://emeryberger.com/">Emery Berger</a><br><em>PC chair for PLDI ’16 and SIGPLAN EC Chair for SIGPLAN Research Highlights Awards</em><br>December 2017</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*sPYiDG2aF9TrfmlEVhE1Aw.png" /></figure><p><strong>We recommend that all SIGPLAN conferences adopt a formal procedure for selecting ACM SIGPLAN Distinguished Paper Awards </strong>(up to 10% of papers at a conference can be so designated) and recommend doing so via the process of establishing a <strong>Distinguished Paper Committee</strong> to make these decisions.</p><p>Here is how this process works at PLDI, starting with PLDI 2016. This is by no means the only way of setting up such a process, but we found it to be effective.</p><ol><li><strong>Stand up a Distinguished Paper Committee</strong> from the Program Committee, or, if possible, from the External Program Committee (see below regarding EPCs, which have been adopted by PLDI and OOPSLA). The Distinguished Paper Committee should comprise senior, established members of the community. (See <a href="https://pldi16.sigplan.org/committee/pldi-2016-distinguished-paper-committee">https://pldi16.sigplan.org/committee/pldi-2016-distinguished-paper-committee</a> for an example.)</li><li><strong>Request nominations</strong> from the PC (and, if applicable, the EPC) from accepted papers with at least one A (generally all of them) and no D scores (since it’s a bit off to have a distinguished paper with a strong detractor). For PLDI 2016, we had 9 nominations and could select up to 4 (10%). 
We also recommend that, for any paper deemed to require an artifact, whether that artifact was submitted to the Artifact Evaluation process (and whether it passed) be considered as part of the decision.</li><li><strong>Set up a HotCRP instance with the nominated papers, and run it like a mini-conference</strong>; meet via conference call. The PC chair should upload only the camera-ready papers to this site, without copying over reviews. Members should not be assigned to review papers they may have reviewed for the conference, since this is an entirely different set of criteria. Reviews should be much shorter (e.g., two paragraphs) and focused on whether the papers are deserving of distinction. During this period, reviews should be made unavailable on the main conference site; these can be opened up after distinguished paper reviews have been submitted. The timescale can be quite abbreviated: for example, for PLDI 2016, bidding was opened up on April 23, papers were assigned on April 26, and May 11 was the deadline for reviews (the conference call was May 13).</li><li><strong>Inform awardees</strong> to ensure that they are in attendance when the awards are presented, typically at the beginning of the conference. In addition, it is possible to have a specific Distinguished Papers session; for double-track conferences, this should be made single-track so everyone can attend (connecting this session with a keynote makes this logistically straightforward).</li><li><strong>Forward a suitable subset of Distinguished Papers as nominees for SIGPLAN Research Highlights</strong> (recipients of which are forwarded for consideration as CACM Research Highlights). Note that the criteria for CACM Research Highlights are special. <em>Nominated papers should be of broad interest beyond the PL community. They must create excitement. Papers with surprising results that stir stimulating debate are particularly encouraged. 
The results must be recent, significant, and exciting, and of general interest (and accessible) to the CS research community.</em></li></ol><p>For those unfamiliar with the notion of EPCs, this is from the Chair notes for PLDI 2016.</p><p><em>In addition to the Program Committee, we added an External Program Committee (EPC) consisting of senior members of the PL community who acted as a “light” program committee; their charge was to review roughly ten papers each, and in particular, to review all PC-authored papers. The EPC met via a day-long teleconference prior to the physical PC meeting. A subset of the EPC also formed the Distinguished Papers Committee, whose job it was to select the distinguished papers via a conference-like process.</em></p><p>The EPC is intended to be composed primarily of senior, established researchers (and is complementary to the ERC).</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=7aad56ac9454" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[A Guide for Session Chairs]]></title>
            <link>https://emeryberger.medium.com/a-guide-for-session-chairs-8212843c900b?source=rss-e3c3ceaf8444------2</link>
            <guid isPermaLink="false">https://medium.com/p/8212843c900b</guid>
            <category><![CDATA[computer-science]]></category>
            <category><![CDATA[public-speaking]]></category>
            <category><![CDATA[academia]]></category>
            <category><![CDATA[conference]]></category>
            <category><![CDATA[research]]></category>
            <dc:creator><![CDATA[Emery Berger]]></dc:creator>
            <pubDate>Wed, 08 Jun 2016 16:08:14 GMT</pubDate>
            <atom:updated>2016-06-08T16:08:38.243Z</atom:updated>
            <content:encoded><![CDATA[<p>I just sent this message as a guide to the program committee members who will be chairing sessions for <a href="http://conf.researchr.org/home/pldi-2016">PLDI 2016</a> (I figure it’s the first time for some of them). A few people suggested I post it, so here it is (lightly edited, cross-posted from <a href="https://emeryblogger.com/2016/06/08/a-guide-for-session-chairs/">my blog</a>). Additions or other suggestions welcome.</p><ul><li><strong>Find your speakers</strong> before the session begins. You will have to talk to them about some stuff — see below.</li><li>Find out <strong>how to pronounce their names</strong> properly.</li><li>Find out <strong>if they are on the market next year</strong> — sometimes people like the advertisement that they will be graduating soon.</li><li>Have them <strong>check their equipment</strong> (particularly if they are using Linux…).</li><li>Before each session, <strong>introduce the entire session</strong> (as in, “I am So-and-So, from Wherever University; welcome to the session on drone-based programming languages.”)</li><li>Before each talk, <strong>introduce each speaker</strong>. I personally recommend <em>not</em> reading their title, since lots of speakers are on autopilot and will just repeat everything you said. You can instead say something like “This is Foo Bar, who will be talking about verifying clown car drivers.” In fact, come to think of it, you could just say that for every talk.</li><li><strong>Keep track of time</strong>. For PLDI this year, speakers get 25 minutes, and then there are 5 minutes for questions. If you have an iPad, <a href="https://itunes.apple.com/us/app/presentation-clock/id391324914?mt=8">there’s an app I have used</a> to display time to speakers (big giant numbers, you can set it to change colors when you hit 5 min or 1 min till the end). You can of course always go old school and hold up a sheet of paper indicating when time is drawing near. 
I recommend doing this when there are 5 minutes left and 1 minute left. Let the speakers know you will be doing this.</li><li>When the speaker is done, if it hasn’t happened already, <strong>make sure everyone applauds</strong> by saying “Let’s thank our speaker” and starting to applaud. Then <strong>open the floor to questions.</strong></li><li><strong>COME PREPARED WITH A QUESTION.</strong> The worst thing ever is when the talk does not go well and no one has any questions for the speaker, and then: &lt;crickets&gt;. Read over each paper so you have at least a couple of questions planned. Hopefully it won’t come to this, but it happens sometimes, and it’s great if you can save the day.</li><li>Make sure people who ask questions <strong>use the mic</strong> and state their name and affiliation.</li><li>You may also have to <strong>clarify the question</strong> for the speaker, repeat the question, etc. Understanding questioners can occasionally be a challenge for non-native English speakers: it’s a stressful time, and the questioners may have unfamiliar accents, etc. <strong>Be prepared to give the speaker a helping hand</strong>.</li><li><strong>Be prepared to cut off a questioner</strong>. YOU ARE IN CHARGE OF THE SESSION. If a questioner won’t give up the mic and keeps asking questions and is burning time, rambling, etc., you are empowered to move on to the next questioner (e.g., by suggesting “how about we take this off-line”).</li><li>Hopefully this won’t be an issue you will have to deal with, but <strong>questioners who are belligerent or insulting must not be tolerated</strong>. Cut them off and report them to the program chair (me) or the general chair. I sincerely hope and expect that this will not happen, but I want you to realize you are empowered to take action immediately. 
You can read over <a href="http://www.sigplan.org/Resources/Policies/CodeOfConduct/">SIGPLAN’s non-harassment policy here</a>, which is based on ACM’s: <a href="http://www.sigplan.org/Resources/Policies/CodeOfConduct/">http://www.sigplan.org/Resources/Policies/CodeOfConduct/</a>.</li><li>To make sure things run smoothly, <strong>have the next speaker on deck</strong> with their laptop a minute or so before question time ends. Ideally, they will be setting up while the current speaker is wrapping up questions.</li><li>Finally, when questions are over, say “Let’s thank our speaker again” <strong>and applaud.</strong></li></ul><p>And thanks again to all the session chairs for volunteering!</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=8212843c900b" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[The Star Trek Case for Double-Blind Reviewing]]></title>
            <link>https://emeryberger.medium.com/the-star-trek-case-for-double-blind-reviewing-540d673a5b0?source=rss-e3c3ceaf8444------2</link>
            <guid isPermaLink="false">https://medium.com/p/540d673a5b0</guid>
            <category><![CDATA[academia]]></category>
            <category><![CDATA[science]]></category>
            <category><![CDATA[science-fiction]]></category>
            <dc:creator><![CDATA[Emery Berger]]></dc:creator>
            <pubDate>Wed, 02 Dec 2015 14:35:12 GMT</pubDate>
            <atom:updated>2015-12-03T14:31:03.784Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/480/1*TfFv2ixgrAl2p1j8fBd00w.jpeg" /></figure><h4>The planet Vulcan is simultaneously the source of the universe’s worst neck massagers and its best program committee members.</h4><p>Unlike humans, Vulcans are ruthlessly rational and unerringly logical.</p><p>A Vulcan reviewer is unaffected by how often they have mind-melded with the authors of a paper or whether they know them at all, whether the authors have pointed ears or not, or whether (once every seven years) they might be judged to be suitable mates.</p><p>Vulcans are never influenced by the origins and ethnicities of authors, whether they be Romulan, human, or even Klingon.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/479/1*Q1qJirqmnIrZcXBQqkWiKA.jpeg" /></figure><p>In fact, they have no choice: Vulcans are emotionless and inevitably objective. They are capable only of dispassionately reviewing papers purely on their merits.</p><p>Humans are not Vulcans. But double-blind reviewing lets us act like them — at least when we are reviewing papers.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*A8f49J9BTGezkMipmHIKFA.jpeg" /></figure><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=540d673a5b0" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Yes, JavaScript is Assembly for the Web. Here’s its OS.]]></title>
            <link>https://emeryberger.medium.com/yes-javascript-is-assembly-for-the-web-but-where-s-the-os-6ca418592ffa?source=rss-e3c3ceaf8444------2</link>
            <guid isPermaLink="false">https://medium.com/p/6ca418592ffa</guid>
            <category><![CDATA[javascript]]></category>
            <category><![CDATA[programming]]></category>
            <category><![CDATA[web-development]]></category>
            <dc:creator><![CDATA[Emery Berger]]></dc:creator>
            <pubDate>Fri, 28 Aug 2015 06:13:56 GMT</pubDate>
            <atom:updated>2015-09-01T14:13:18.448Z</atom:updated>
            <content:encoded><![CDATA[<h4>The browser is a lousy excuse for an operating system. So we fixed it.</h4><p><em>by John Vilk and Emery Berger</em></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*7oSWEkEg3KMxNM6qKT6bRg.jpeg" /></figure><h4>Doppio is an “operating system” for the browser that lets it run applications written in general-purpose languages</h4><p>Web browsers have taken over. They make it comparatively easy to deliver cross-platform applications, because browsers are essentially everywhere. Practically all computing platforms — from desktops and tablets to mobile phones — ship with web browsers. Browsers are also getting faster all the time. They have just-in-time compilers for JavaScript that produce highly optimized code, and they expose features like access to the GPU through WebGL and high-speed video chat via WebRTC.</p><p>This combination of features makes it possible for browsers to host the kind of richly interactive applications that used to be restricted to native environments. In effect, web browsers have become a <em>de facto</em> universal computing platform whose operating system is the browser environment and whose sole “instruction set” is JavaScript.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/600/1*OJ1rH4eiZMZNQrCWhpfxGw.jpeg" /></figure><h3>JavaScript is Your Frenemy.</h3><p>In short, if you want to program for the web, you need JavaScript. However, JavaScript is not everyone’s favorite language. There are many reasons why browser support for programming languages <em>other than JavaScript </em>would be desirable.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*glnsPb-qsZxC23WGGxlk2Q.jpeg" /></figure><p>JavaScript is a language that has some nice features but whose design contains numerous pitfalls for programmers. 
<a href="https://www.destroyallsoftware.com/talks/wat">Gary Bernhardt’s famous “WAT” talk</a> below hilariously illustrates just a few.</p><p><a href="https://www.destroyallsoftware.com/talks/wat">Wat</a></p><p>For example, here’s what happens when you have to create a new language in <a href="https://www.w3.org/community/webed/wiki/A_Short_History_of_JavaScript"><em>just 10 days</em></a>.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/600/1*CL-5o2p1qPcSWNx8Vvuzmw.jpeg" /></figure><p>Problems with JavaScript have led language implementors to design new languages for the browser that overcome JavaScript’s shortcomings — Dart, TypeScript, and <em>many </em>others — but these solutions all require that programmers learn a new language. Programmers who prefer to program in other paradigms (e.g., functional, object-oriented) currently must abandon those paradigms or build hacks onto JavaScript to accommodate their needs.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*DnKDJ681ymhOxyWE0zaBrg.jpeg" /></figure><h3>So. Much. Code.</h3><p>There’s also <em>a lot</em> of well-debugged, existing code already written in general-purpose programming languages (i.e., not JavaScript). Making it possible to reuse this code would speed application development and reduce the risk of introducing new bugs.</p><p>It is already possible to reuse existing C or C++ code via the Emscripten project, which is a backend for LLVM that generates JavaScript instead of, say, x86 code. 
An effort is underway to provide <em>really </em>efficient support for JavaScript generated this way (initially asm.js, now WebAssembly).</p><p><strong>Unfortunately, translating, interpreting, or compiling code written in conventional languages to JavaScript is not enough.</strong> Browsers lack abstractions that existing programming languages expect, and impose significant limitations.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/430/1*1xSAGfZqVWONRNQRE4Mk_Q.jpeg" /></figure><h3><strong>One At A Time.</strong></h3><p>JavaScript is a single-threaded, event-driven programming language with no support for interrupts. Events have to execute to completion. If your code happens to have a function that takes too long (for some definition of <em>too long</em>), your script will get killed by the browser’s watchdog thread, triggering the dreaded “page(s) unresponsive” message. This makes programming tricky: since it’s difficult to predict how long your code will take, JavaScript programmers routinely have to break up their code into tiny pieces that yield control back to the main JavaScript event loop.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/295/1*dVyLo9IBiCAPRjGGY42xcw.png" /></figure><h3>Call Me, Maybe<br>(Or Not).</h3><p>Browsers provide web applications with a rich set of functionality, but these APIs are <em>asynchronous</em> — that is, you invoke them with a callback that runs when the function completes. Unfortunately, conventional languages rely on <em>synchronous</em> (blocking) APIs, which don’t need or provide callbacks. 
Worse, due to the limitations of JavaScript, you can’t directly create synchronous APIs from asynchronous APIs (e.g., by busy-waiting — the browser will just kill your script).</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/320/1*u0pQA8HGboYVDUnnCrsSjg.jpeg" /></figure><h3>Dude, Where’s My OS?</h3><p>Browsers lack most of the OS services that standard programming languages take for granted. For example, there’s no file system abstraction. Instead, there’s a panoply of limited storage mechanisms, making it tricky to manage large amounts of persistent data. Also missing: sockets.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/600/1*b-jOseF4uUgFZDznhXVHLQ.png" /></figure><h3>So Many Browsers, So Little Time</h3><p>Another big challenge is the diversity of browsers. Users access the web from a wide range of browser platforms, operating systems, and devices. These browsers also <em>vary widely</em> in their support for and compliance with standards. Each combination may have unique performance characteristics, differing support for JavaScript and Document Object Model (DOM) features, and outright bugs. This diversity makes it hard to address any of the issues above without excluding a large portion of the web audience.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/600/1*_-Wujv_xMIhVhjBRgidSuA.jpeg" /></figure><h3>Close But No Cigar.</h3><p>Although previous work aims to support languages other than JavaScript in the browser, these efforts all fall short. Conventional programming languages and their standard libraries expect the relatively rich execution environment that modern operating systems provide. 
The fact that browsers lack standard operating system features like threads, file systems, and blocking I/O means that these projects cannot run existing programs without substantial modifications.</p><h3>Doppio to the Rescue</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/500/1*593BQtsbIggLodqC78ldbw.jpeg" /></figure><p>We have built a runtime system called <a href="http://doppiojvm.org"><strong>Doppio</strong></a> that makes it possible to run unmodified applications written in conventional programming languages inside the browser. In short, it’s like a POSIX layer for your browser.</p><p>Doppio’s execution environment overcomes the limitations of the JavaScript single-threaded event-driven runtime model. It provides language implementations with emulated threads that support suspending and resuming execution to enable blocking I/O and multithreading in the source language.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*i2kVmWco1A0JMtjq9leBhg.png" /></figure><p>Doppio supplies a wide range of common operating system abstractions that support standard library and language features. These include a Unix-based file system abstraction (providing local and Dropbox-based cloud storage), network sockets, and an unmanaged heap for dynamic memory allocation. All of this support acts as an abstraction layer over the many differences between browsers, letting code run unmodified across Google Chrome, Firefox, Safari, Opera, and Internet Explorer. 
Components of Doppio are already being used by the <a href="https://archive.org/details/softwarelibrary_msdos_games/v2">MS-DOS Collection</a> at the <a href="https://archive.org/">Internet Archive</a>.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/265/1*SaE_c6nfBuCg-hLiu2um8g.png" /></figure><h3>I Heard You Like Java In Your JavaScript</h3><p>To further demonstrate Doppio’s power, we built <a href="http://doppiojvm.org"><strong>DoppioJVM</strong></a>, a robust prototype implementation of a Java Virtual Machine interpreter on top of Doppio. It can run complex unmodified JVM programs in the browser — without that nasty attack vector, the Java plugin.</p><p>DoppioJVM is not yet exactly <em>fast</em> — it runs between 24× and 42× slower on CPU-intensive benchmarks in Google Chrome — but it’s already fast enough to be useful. It has already been integrated into an educational website by the University of Illinois (<a href="http://codemoo.com">codemoo.com</a>) that interactively teaches students how to program in Java. We have also enhanced Emscripten with Doppio and found that it made porting a C++ game to the browser far simpler.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/320/1*bk7oFIwYwYH_8NRFG6hitA.jpeg" /></figure><p>We are currently expanding on this work on two fronts. The first is an implementation of Python on top of Doppio that we call <a href="https://github.com/plasma-umass/ninia"><strong>Ninia</strong></a> — also known as the coffee snake. Ninia is not yet complete but it is already able to run some basic Python programs in the browser. The second is developing techniques that promise to dramatically speed up both DoppioJVM and Ninia, expanding their usefulness. 
We look forward to the day when all programmers can run code in the language of their choice directly inside the browser.</p><p><a href="http://jvilk.com/"><em>John Vilk</em></a><em> (PhD student) and </em><a href="http://emeryberger.com"><em>Emery Berger</em></a><em> (Professor) are in the </em><a href="http://plasma.cs.umass.edu"><em>PLASMA</em></a><em> Lab in the College of Information and Computer Sciences at the University of Massachusetts Amherst.</em></p><p><a href="http://people.cs.umass.edu/~emery/pubs/doppio.pdf">A technical paper describing Doppio and the DoppioJVM</a> in detail appeared at <a href="http://conferences.inf.ed.ac.uk/pldi2014/">PLDI 2014</a>, where it won the <a href="http://pldi14-aec.cs.brown.edu/">Distinguished Artifact Award</a>. A video presentation <a href="https://vimeo.com/106106738">is here</a>. The Doppio paper was also selected as a <a href="http://www.sigplan.org/Newsletters/CACM/Papers/">SIGPLAN Research Highlight</a>. Doppio and the DoppioJVM can be downloaded at <a href="https://github.com/plasma-umass/doppio">https://github.com/plasma-umass/doppio</a>; an on-line demo is at <a href="http://doppiojvm.org">doppiojvm.org</a>.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=6ca418592ffa" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Software Needs Seatbelts and Airbags]]></title>
            <link>https://emeryberger.medium.com/software-needs-seatbelts-and-airbags-6ca62ac11898?source=rss-e3c3ceaf8444------2</link>
            <guid isPermaLink="false">https://medium.com/p/6ca62ac11898</guid>
            <category><![CDATA[tech]]></category>
            <category><![CDATA[software-development]]></category>
            <category><![CDATA[programming]]></category>
            <dc:creator><![CDATA[Emery Berger]]></dc:creator>
            <pubDate>Wed, 19 Aug 2015 04:10:37 GMT</pubDate>
            <atom:updated>2015-08-19T15:21:09.901Z</atom:updated>
            <content:encoded><![CDATA[<h3>SOFTWARE NEEDS SEATBELTS AND AIRBAGS</h3><p><em>Finding and fixing bugs is difficult and time-consuming: there is a better way.</em></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/656/1*Y_WcskQNbrCmFJnVmaqOZQ.jpeg" /></figure><h3>Death, Taxes, and Bugs.</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/388/1*nBSHhq7U-M7lasRGHYNwLA.gif" /></figure><p>Like death and taxes, buggy code is an unfortunate fact of life. Nearly every program ships with known bugs, and probably all of them end up with bugs that are only discovered post-deployment. Of course, programmers aren’t perfect, but there are many other reasons for this sad state of affairs.</p><h3>Unsafe Languages.</h3><p>One problem is that many applications are written in memory-unsafe languages. Variants of C, including C++ and Objective-C, are especially vulnerable to memory errors like buffer overflows and dangling pointers (<em>use-after-free</em> bugs). These bugs, which can lead to crashes, erroneous execution, and security vulnerabilities, are notoriously challenging to repair.</p><h3>Safe Languages Are No Panacea.</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/246/1*u8mHhqSrBSfNXcOSOELN-Q.jpeg" /></figure><p>Writing new applications in memory-safe languages like Java instead of C/C++ would go some way towards mitigating these problems. For example, because Java uses garbage collection, Java programs are not susceptible to use-after-free bugs; similarly, because Java always performs bounds-checking, Java applications cannot suffer memory corruption due to buffer overflows.</p><p>That said, safe languages are no cure-all. Java programs still suffer from buffer overflows and null pointer dereferences, though they throw an exception as soon as they happen, unlike their C-based counterparts. The common recourse to these exceptions is to abort execution and print a stack trace (even to a web page!). 
Java is also just as vulnerable as any other language to concurrency errors like race conditions and deadlocks.</p><p>There are both practical and technical reasons not to use safe languages. First, it is generally not feasible to rewrite existing code because of the cost and time involved, not to mention the risk of introducing new bugs. Second, languages that rely on garbage collection are not a good fit for programs that need high performance or which make extensive use of available physical memory, since <a href="https://people.cs.umass.edu/~emery/pubs/gcvsmalloc.pdf">garbage collection always requires some extra memory</a>. These include OS-level services, database managers, search engines, and physics-based games.</p><h3>Are Tools the Answer?</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/350/0*Sny9N0ee_fSWbMMn.jpg" /></figure><p>While tools can help, they too cannot catch all bugs. Static analyzers have made enormous strides in recent years, but many bugs remain out of reach. Rather than swamp developers with false positive reports, most modern static analyzers report far fewer bugs than they could. In other words, they trade false negatives (failing to report real bugs) for lower false positive rates. That makes these tools more usable, but also means that they will fail to report real bugs. Dawson Engler and his colleagues made exactly this choice for Coverity’s “unsound” static analyzer (see the Communications of the ACM article on their experiences: <a href="https://www.usenix.org/legacy/event/sec08/tech/slides/engler_slides.pdf"><em>A Few Billion Lines of Code Later: Using Static Analysis to Find Bugs in the Real World</em></a>.)</p><h3>Testing is Good but Not Enough.</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/259/0*PDm7YblmeWUkQDhW.jpg" /></figure><p>The state of the art in testing tools has also advanced dramatically in the last decade. 
Randomized fuzz testing can be combined with static analysis to drive tests to explore paths that lead to failure. These tools are now in the mainstream: for example, <a href="http://msdn.microsoft.com/en-us/library/windows/hardware/gg487310.aspx">Microsoft’s Driver Verifier</a> can test device driver code for a wide variety of problems, and now includes randomized concurrency stress testing.</p><p>But as Dijkstra famously remarked, “Program testing can be used to show the presence of bugs, but never to show their absence!” At some point, testing will fail to turn up new bugs, which will unfortunately be discovered only once the software has shipped.</p><h3>Fixing Bugs: Risky (and Slow) Business.</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/270/0*j90emMB6QKqMebkg.jpg" /></figure><p>Finding the bugs is only the first step. Once a bug is found, whether by inspection, testing, or analysis, fixing it remains a challenge. Any bug fix must be undertaken with extreme care, since any new code runs the risk of introducing yet more bugs. Developers must construct and carefully test a patch to ensure that it fixes the bug without introducing any new ones. This process can be costly and time-consuming. For example, according to Symantec, the average time between the discovery of a remotely exploitable memory error and the release of a patch for enterprise applications is <a href="http://www.symantec.com/enterprise/threatreport/index.jsp">28 days</a>.</p><h3>Cut Bait and Ship.</h3><p>At some point, it simply stops making economic sense to fix certain bugs. Tracking their source is often difficult and time-consuming, even when the full memory state and all inputs to the program are available. Obviously, show-stopper bugs must be fixed. 
For other bugs, the benefits of fixing them may be outweighed by the risks of creating new bugs and the costs in programmer time and in delayed deployment.</p><h3>Debugging at a Distance.</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/600/0*1uKMz5BMghaDNj_j.jpg" /></figure><p>Once that faulty software has been deployed, the problem of chasing down and repairing bugs becomes exponentially more difficult. Users rarely provide detailed bug reports that allow developers to reproduce the problem.</p><p>For deployed software on desktops or mobile devices, getting enough information to find a bug can be difficult. Sending an entire core file is generally impractical, especially over a mobile connection. Typically, the best one can hope for is some logging messages and a <em>minidump</em> consisting of a stack trace and information about thread contexts.</p><p>Even this limited information can provide valuable clues. If a particular function appears on many stack traces observed during crashes, then that function is a likely culprit. Microsoft Windows <a href="http://www.microsoft.com/about/technicalrecognition/watson-technologies-team.aspx">includes an application debugger</a> (formerly “Watson”, now “Windows Error Reporting”) that is used to perform this sort of triage not only for Microsoft but also for third-party applications via Microsoft’s <a href="http://winqual.microsoft.com/">Winqual program</a>. Google also has made available <a href="http://code.google.com/p/google-breakpad/">a cross-platform tool called Breakpad</a> that can be used to provide similar services for any application.</p><p>However, for many bugs, the kind of information that these tools provide is of limited value. For example, memory corruption errors often trigger failures millions of instructions past the point of the actual error, making stack traces useless. 
The same is generally true for null dereference exceptions, where the error often happens long after the null pointer was stored.</p><h3>Captain’s Log: Not Enough Information.</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/480/0*gD_1Kh2_zGt8aOwZ.jpg" /></figure><p>On servers, the situation is somewhat improved. Server applications typically generate log messages, which may contain clues to why a program failed. Unfortunately, log files can be unmanageably large. Poring over logs and trying to correlate them to the source code can be extremely time-consuming. Even worse, that work may yield no useful results because the logs are incomplete; that is, the logs simply may not provide enough information to narrow down the source of a particular error because there were not enough or the right kind of log messages. Recent work at Illinois and UC San Diego may lead to tools that address some of these problems; SherLog automates the process of tracing back bugs from log messages to buggy source code paths, and LogEnhancer automatically extends log messages with information to help post-crash debugging. More information on logging appears in a recent Queue article, <a href="https://queue.acm.org/detail.cfm?id=2082137"><em>Advances and Challenges in Log Analysis</em></a>.</p><h3>God Doesn’t Play Dice, but Your Computer Does.</h3><p>Despite these advances, finding bugs has actually become harder than ever. Back when many programs were sequential it was already challenging to find bugs, but now the situation has become far worse. Multithreaded programs, asynchrony, and multiple cores are now a fact of life. Every execution of these non-deterministic programs is completely different from the last because of different timing of events and thread interleavings. 
This situation makes reproducing bugs impossible even with a complete log of all input events — something that would be too expensive to record in practice anyway.</p><h3>Bumpers, Seatbelts and Airbags.</h3><p>Let’s shift gears for a moment to talk about cars (we’ll get back to talking about software in a minute). As an analogy for the situation we find ourselves in, consider the time when cars first came onto the scene. For years, safety was an afterthought at best. When designing new cars, the primary considerations were aesthetics and high performance (think tailfins and V-8 engines).</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/728/0*o5O51psl7JEmi30K.jpg" /></figure><p>Eventually, traffic fatalities led legislators and car manufacturers to take safety into account. Seatbelts became required standard equipment in US cars in the late 1960s, bumpers in the 1970s, and airbags in the 1980s. Modern cars incorporate a wide range of safety features, including laminated windshields, crumple zones, and anti-lock braking systems. It is now practically unthinkable that anyone would ship a car without these essential safety features.</p><p>However, we routinely ship software with no safety measures of any kind. We are in a position similar to that of the automobile industry of the 1950s, delivering software with lots of horsepower and tailfins, complete with decorative spikes on the steering column to make sure that the user will suffer if their application crashes.</p><h3>Drunk-Driving Through a Minefield.</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*oxNvU7t2cUzxp9f_.jpg" /></figure><p>The potent cocktail of manual memory management mixed with unchecked memory accesses makes C and C++ applications susceptible to a wide range of memory errors. These errors can cause programs to crash or produce incorrect results. Attackers are also frequently able to exploit these memory errors to gain unauthorized access to systems. 
Since the vast majority of objects accessed by applications are on the heap, heap-related errors are especially serious.</p><p>Numerous memory errors happen when programs incorrectly free objects. <em>Dangling pointers</em> arise when a heap object is freed while it is still live, leading to use-after-free bugs. <em>Invalid frees</em> happen when a program deallocates an object that was never returned by the allocator by inadvertently freeing a stack object or an address in the middle of a heap object. <em>Double frees</em> are when a heap object is deallocated multiple times without an intervening allocation. This error may at first glance seem innocuous but, in many cases, leads to heap corruption or program termination.</p><p>Other memory errors have to do with the use of allocated objects. When an object is allocated with a size that is not large enough, an <em>out-of-bound error</em> can occur when the memory address to be read or written lies outside the object. Out-of-bound writes are also known as <em>buffer overflows</em>. <em>Uninitialized reads</em> happen when a program reads memory that has never been initialized; in many cases, uninitialized memory contains data from previously-allocated objects.</p><h3>Airbags for Your Applications.</h3><p>Given that we know we will be shipping software with bugs and that the terrain is dangerous, it might make sense to equip it with seat belts and airbags. What we’d like is to have both resilience and prompt corrective action for any problem that surfaces in our deployed applications.</p><p>Let’s focus on C/C++/Objective-C applications — the lion’s share of applications running on servers, desktops, and mobile platforms — and memory errors, the number one headache for applications written in these languages. 
Safety-equipped memory allocators can play a crucial role in helping to protect your software against crashes.</p><h3>The Garbage Collection Safety Net.</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/600/0*dMeNmMogpgIu4LZm.jpg" /></figure><p>The first class of errors — those that happen because of the misuse of free or delete — can be remedied directly by using garbage collection. Garbage collection works by only reclaiming objects that it allocated, eliminating invalid frees. It only reclaims objects once there is no way to reach those objects anymore by traversing pointers from the “roots”: the globals and the stack. That eliminates dangling pointer errors, since by definition there can’t be any pointers around to reclaimed objects. Since it naturally only reclaims these objects once, a garbage collector also eliminates double frees.</p><p>While C and C++ were not designed with garbage collection in mind, it is possible to plug in a “conservative” garbage collector and entirely prevent free-related errors. 
The word “conservative” here means that because the garbage collector doesn’t necessarily know what values are pointers (since we are in C-land), it conservatively assumes that if a value looks like a pointer (it is in the right range and properly aligned), and it acts like a pointer (it only points to valid objects), then it may be a pointer.</p><p><a href="http://www.hpl.hp.com/personal/Hans%20Boehm/gc/">The Boehm-Demers-Weiser conservative garbage collector</a> is an excellent choice for this purpose: it is reasonably fast and space-efficient, and can be used to directly replace memory allocators by configuring it to treat calls to free as NOPs.</p><h3>Slipping Through the Net.</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/640/0*HIpNZ4ZFrtA4zLnH.jpg" /></figure><p>While garbage collectors eliminate free-related errors, they cannot help prevent the second class of memory errors: those that have to do with the misuse of allocated objects such as buffer overflows.</p><p>Runtime systems that can find buffer overflows often impose staggeringly high overheads, making them not particularly suitable for deployed code. Tools like <a href="http://valgrind.org/info/tools.html#memcheck">Valgrind’s MemCheck</a> are incredibly comprehensive and useful, but are heavyweight by design and slow execution by orders of magnitude.</p><p>Compiler-based approaches can reduce overhead substantially by avoiding unnecessary checks, though they entail recompiling all of an application’s code, including libraries. Google has recently made available <a href="http://code.google.com/p/address-sanitizer/wiki/AddressSanitizer">AddressSanitizer</a>, a combination of compiler and runtime technology that can find a number of bugs, including overflows and use-after-free bugs. While it is much faster than Valgrind, its overhead remains relatively high (around 75%), making it primarily useful for testing.</p><h3>Your Program Has Encountered An Error. 
Goodbye, Cruel World.</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/450/0*Czw1PKwTgt3nLhlk.jpg" /></figure><p>All of these approaches are based on the idea that the best thing to do upon encountering an error is to abort immediately. This fail-stop behavior is certainly desirable in testing. However, it is not usually what your users want. Most application programs are not safety-critical systems, and aborting them in midstream can be an unpleasant experience for users.</p><p>Suppose you have been working on a Microsoft Word document for hours (and for some mysterious reason, auto-save has not been turned on). If Microsoft Word suddenly discovers that some error has occurred, what should it do? It could just pop up the window indicating that something terrible has happened and would you like it to send a note home to Redmond. That might be the best thing to do from a debugging perspective, but most people would prefer that Word do its damndest to save the current document rather than fall on its sword if it discovers a possible error. In short, users generally would prefer that their applications be fault tolerant whenever possible.</p><h3>Bohr versus Heisenberg.</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/345/0*OGNW2CSDmq969dy8.png" /></figure><p>In fact, the exact behavior users do not want is for an error to happen consistently and repeatably. In his classic 1985 article <em>Why do computers stop and what can be done about it</em>, Jim Gray drew a distinction between two kinds of bugs. The first kind are bugs that behave predictably and repeatably — that is, ones that occur every time that the program encounters the same inputs and goes through the same sequence of steps. These are <em>Bohr bugs</em>, by analogy with the classical atomic model where electrons circle around the nucleus in planetary-like orbits. 
Bohr bugs are great when debugging a program, since they make it easier to reproduce the bug and find its root cause.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/400/1*6_B7xUjfKMsecFHGdRoqvQ.png" /></figure><p>The second kind of bugs are <em>Heisenbugs</em>, a name meant to connote the inherent uncertainty of quantum mechanics; they are unpredictable and cannot be reliably reproduced. The most common Heisenbugs these days are concurrency errors, a.k.a. race conditions, which depend on the order and timing of scheduling events to appear. Heisenbugs are also often sensitive to the observer effect; attempts to find the bug by inserting debugging code or running in a debugger often disrupt the sequence of events that led to the bug, making it go away.</p><p>Jim Gray made the point that while Bohr bugs are great for debugging, what users want are Heisenbugs. Why? Because a Bohr bug is a showstopper for the user: every time the user does the same thing, they will encounter the same bug. But with Heisenbugs, the bugs often go away when you run the program again. If a program crashes, and the problem is a Heisenbug, then running the program again is likely to work. This is a perfect match for the way users already behave on the Web. If they go to a web page and it fails to respond, they just click “refresh” and that usually solves the problem.</p><p>So one way we can make life better for users is to convert Bohr bugs into Heisenbugs, if we can figure out how to do that.</p><h3>Defensive Driving with DieHard.</h3><p>My graduate students at the University of Massachusetts Amherst and I, in collaboration with my colleague Ben Zorn at Microsoft Research, have been working for the past few years on ways to protect programs from bugs. The first fruit of that research is a system called <a href="http://diehard-software.org/"><strong>DieHard</strong></a> that makes memory errors less likely to impact users. 
DieHard eliminates some errors entirely and converts the others into (rare) Heisenbugs.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/400/0*SrU9k27pMGlgq77A.jpg" /></figure><p>To explain how DieHard works, let’s go back to the car analogy. One way to make it less likely for cars to crash into each other is for them to be spaced further apart, providing adequate braking distance in case something goes wrong. DieHard provides this “defensive driving” by taking over all memory management operations and allocating objects in a space larger than required.</p><p>This de facto padding increases the odds that a small overflow will end up in unallocated space where it can do no harm. However, DieHard doesn’t just add a fixed amount of padding between objects. That would provide great protection against overflows that are small enough, and zero protection against the others. In other words, those overflows would still be Bohr bugs.</p><p>Instead, DieHard provides probabilistic memory safety by randomly allocating objects on the heap. DieHard adaptively sizes its heap to be a bit larger than the maximum needed by the application; by default, 1/3 larger. DieHard allocates memory from increasingly large chunks that we call miniheaps.</p><p>By randomly allocating objects across all the miniheaps (see <a href="http://emeryblogger.files.wordpress.com/2012/05/new-heap-diagram.pdf">this diagram for a detailed view</a>), DieHard makes many memory overflows benign, with a probability that naturally declines as the overflow increases in size and as the heap becomes full. The effect is that, in most cases when running with DieHard, a small overflow is likely to have no effect.</p><p>DieHard’s random allocation approach also reduces the likelihood of the free-related errors that garbage collection addresses. DieHard uses bitmaps, stored outside the heap, to track allocated memory. 
A bit set to ’1’ indicates that a given block is in use, and ’0’ that it is available.</p><p>This use of bitmaps to manage memory eliminates the risk of double frees, since resetting a bit to zero twice is the same as resetting it once. Keeping the heap metadata separate from the data in the heap makes it impossible to inadvertently corrupt the heap itself.</p><p>Most importantly, DieHard drastically reduces the risk of dangling pointer errors, which effectively go away. If the heap has one million freed objects, the chance that you will immediately reuse one that was just freed is literally one in a million. Contrast this with most allocators, which immediately reclaim freed objects. With DieHard, even after 10,000 reallocations, there is still a 99% chance that the dangled object will not be reused.</p><p>Because it performs its allocation in (amortized) constant time, DieHard can provide added safety with very little additional cost in performance. For example, using it in a browser results in no perceivable performance impact.</p><h3>Tolerating Faults FTW with FTH.</h3><p>At Microsoft Research, tests with a variant of DieHard resolved about 30% of all bugs in the Microsoft Office database, while having no perceivable impact on performance. Beginning with Windows 7, Microsoft Windows <a href="http://msdn.microsoft.com/en-us/library/dd744764(v=vs.85).aspx">now ships with a Fault-Tolerant Heap (FTH)</a> that was directly inspired by DieHard. Normally, applications use the default heap, but after a program crashes more than a certain number of times, the Fault-Tolerant Heap takes over. Like DieHard, the Fault-Tolerant Heap manages heap metadata separately from the heap. It also adds padding and delays allocations, though it does not provide DieHard’s probabilistic fault tolerance because it does not randomize allocations or deallocations. 
The Fault-Tolerant Heap approach is especially attractive because it acts like an airbag: effectively invisible and cost-free when everything is fine, but providing protection when it is needed.</p><h3>Exterminating the Bugs.</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*2zMc4vxcsrfqFr4g.png" /></figure><p>Tolerating bugs is one way to improve the effective quality of deployed software. It would be even better if somehow the software could not only tolerate faults but also correct them. A follow-on to DieHard, called <strong>Exterminator</strong>, does exactly that. Exterminator uses a version of DieHard extended to detect errors, and applies statistical inference to determine what kind of error happened and where it occurred. Exterminator not only can send this information back to programmers for them to repair the software, but it also automatically corrects the errors via runtime patches. For example, if it detects that a certain object was responsible for a buffer overflow of 8 bytes, it will always allocate such objects (distinguished by their call site and size) with an 8-byte pad. Exterminator can learn from the results of multiple runs or multiple users, so it could be used to proactively push out patches to prevent other users from experiencing errors it has already detected elsewhere.</p><h3>The Future: Safer, Self-Repairing Software.</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*scPanhgkx7tSGzJg.jpg" /></figure><p>My group and others (notably Martin Rinard at MIT, Vikram Adve at Illinois, Yuanyuan Zhou at UC-San Diego, Shan Lu at Wisconsin, and Luis Ceze and Dan Grossman at Washington) have made great strides in building safety systems for other classes of errors. We have recently published work on <a href="http://dthreads.org/">systems that prevent concurrency errors</a>, some of which we can eliminate automatically. 
<strong>Grace</strong> is a runtime system that <a href="https://dl.acm.org/citation.cfm?id=1640096">eliminates concurrency errors for concurrent programs</a> that use “fork-join” parallelism. It hijacks the threads library, converting threads to processes “under the hood”, and uses virtual memory mapping and protection to enforce behavior that gives the illusion of a sequential execution, even on a multicore processor. <strong>Dthreads</strong> (“Deterministic Threads”) is a full replacement for the POSIX threads library that <a href="http://people.cs.umass.edu/~emery/pubs/dthreads-sosp11.pdf">enforces deterministic execution for multithreaded code</a>. In other words, a multithreaded program running with Dthreads never has races; every execution with the same inputs generates the same outputs.</p><p>We look forward to a day in the not too distant future when such safer runtime systems are the norm. Just as we can now barely imagine cars without their myriad of safety features, we are finally adopting a similar philosophy for software. Bugs are inevitable. Let’s deploy safety systems and reduce their impact on users.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=6ca62ac11898" width="1" height="1" alt="">]]></content:encoded>
        </item>
    </channel>
</rss>