Dave's Data

Showing posts with the label research

Stealing Google's Coding Practices for Academia

April 27, 2016

I'm spending the year in Google's Visiting Faculty program. I had a few goals for my experience here:

Learn learn learn! I hoped to get a different perspective from the inside of the largest collection of computing & distributed systems that the world has ever seen, and to learn enough about machine learning to think better about providing systems support for it. I haven't been disappointed.

Do some real engineering. I spend most of my time as a faculty member teaching & mentoring my Ph.D. students in research. I love this - it's terribly fun and working with fantastic students is an incredibly rewarding experience. But I also get a lot of creative satisfaction from coding, and I can only carve out a bit of my faculty time to dedicate to it. I haven't written large amounts of production code since I was 21 - and the world has changed a lot since then.

Contribute something useful to Google while I was here....

Experience with ePaxos: Systems Research using Go

October 24, 2013

Writing our to-appear SOSP'13 paper on Egalitarian Paxos ("There is More Consensus in Egalitarian Parliaments") was a journey made more interesting because of our choice to use Go as the implementation language. It rocked, and it let us do some things in the evaluation that we likely wouldn't have done in C++. It had a few drawbacks that we had to deal with, mostly around performance variation and optimization. Our community wasn't used to it, and we got yelled at once by a reviewer (!). [Note: While I (Dave) am writing this post, please realize that the standard professorial disclaimer applies here: When I say "we", I really mean "the student who did all the work", who in this case is Iulian Moraru, a CS Ph.D. student at Carnegie Mellon. If you think "woah, that's cool work", he's the one who should be credited. But if you want to yell at someone for the strong opinions expressed here or the way they're expressed,...

Rank & Select for Systems Folks + our minor contribution to the area

April 02, 2013

Last Spring, I had the opportunity to teach a "special topics" course. I use that as shorthand for "Dave wants to learn about X, so I'll inflict my learning process upon some willing victims and collaborators called students." I roped in Michael Kaminsky to co-teach a course on "memory and resource efficient big data computing," which we pretty much defined as we went along. In fact, one of the first assignments for the students was to help come up with topics to discuss. CMU students are an awesome resource that way. There's been a lot of exciting progress in memory-efficient data structures from the theory side of things in the last decade or so. Not all of it is usable yet, but some of it has that "this is almost there" feel to it that gets me salivating for the chance to refactor how I think about some aspects of system design. There are two themes that I think are particularly worth considering: The ideas behind succinct d...

Optimistic Cuckoo Hashing for concurrent, read-intensive applications

March 11, 2013

Our FAWN team has been spending a lot of time looking at memory-efficient data structures and algorithms for various things (with a lot of emphasis on key-value stores, as per our first and second papers on FAWN-KV and SILT, respectively). Coming up at NSDI'13, Bin Fan has a new paper in which we substantially improve the throughput of the memcached distributed DRAM cache. One of the core techniques we use is a new multiple-reader, single-writer concurrent variant of cuckoo hashing that we call optimistic cuckoo hashing. It combines a refinement of a technique we introduced in SILT ("partial-key cuckoo hashing"), a new way of moving items during cuckoo insertion, and an optimistic variant of lock-striping to create a hash table that is extremely compact and supports extremely high read throughput, while still allowing one thread to update it at high speed (about 2M updates/second in our tests). We've released the code for this on GitHub and you should...

Caring about Causality - now in Cassandra

February 25, 2013

Over the past few years, we've spent a bunch of time thinking about and designing scalable systems that provide causally-consistent wide-area replication. (Here, "we" means the team of Wyatt Lloyd, Michael Freedman, Michael Kaminsky, and myself; but if you know academia, you wouldn't be surprised that about 90% of the project was accomplished by Wyatt, who's a graduating Ph.D. student at the time of this writing.) I'm posting this because we've finally entered the realm of the practical, with the release of both the paper (to appear at NSDI'13) and code for our new implementation of causally-consistent replication (we call it Eiger) within the popular Cassandra key-value store. Why do we care about consistency in wide-area replication? Because there's a fundamental, unavoidable trade-off between having guaranteed low-latency access (meaning not having to send packets back-and-forth across the country) and making sure that every client sees ...

Two examples from the computer science review and publication process

January 30, 2013

A few days ago, I posted about getting the Ph.D. in computer science. As part of that post, I mentioned that I'd publish & discuss some of the reviews my papers have received. These aren't so nasty that they're funny, but I decided to post the full submission/review/submission/review/final as a way to also help show what the publication and revision process looks like. I probably owe a better explanation of the delta between the papers, but - hey, the SIGCOMM, USENIX ATC, and SEA (algorithms) deadlines are all this week. :) I've deliberately picked two papers with very different initial reviews. Both were eventually accepted, and neither was accepted upon first submission. One received a best paper award. Both have over 100 citations. I've picked these not because they're representative -- I've written papers with low citation counts also -- but because they illustrate the wide disparity in review constructiveness and tone that eve...