Digithead's Lab Notebook: seattle

Monday, March 09, 2015

Extended Lake Union Loop

The standard running loop around Lake Union is a touch over 6 miles. With the addition of a side loop around Portage Bay, you can bring it up to 8 and a half, taking in a bit of UW's campus and crossing over the cut into Montlake. Sticking to the water's edge keeps the terrain nice and flat, but if you want some climbing, head up into Capitol Hill via Interlaken Park.

Here, I've factored in a stop at PCC for a cold drink.

Tuesday, June 17, 2014

Ada's Technical Books

If you're in Seattle, you owe it to yourself to spend some time in Ada's Technical Books in my old neighborhood. It's on the top of Capitol Hill on 15th between Republican and Harrison.

The shop features books on all sorts of techy topics - science, math, engineering and, of course, computers along with various nifty maker-type gadgets and a cafe full of tasty treats. Geek heaven!

Wednesday, May 29, 2013

Shiny talk by Joe Cheng

Shiny is a framework for creating web applications with R. Joe Cheng of RStudio, Inc. presented on Shiny last evening in Zillow's offices 30 stories up in the former WaMu Center. Luckily, the talk was interesting enough to compete with the view of Elliott Bay aglow with late evening sunlight streaming through breaks in the clouds over the Olympics.

Shiny is very slick, achieving interactive and pleasant-looking web UIs with Node.js, WebSockets and Bootstrap under the hood. It's designed on a reactive programming model (like Bacon and Ember) that eliminates a lot of the boilerplate code associated with listeners or observers in UI coding.
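To make the reactive idea a bit more concrete, here's a minimal sketch in Python. It is not Shiny's implementation or API; the names `ReactiveValue` and `Observer` are made up for illustration. The point is that an observer never registers a listener by hand: simply reading a value while the observer runs is enough to subscribe it.

```python
# Hypothetical, minimal reactive core. Not Shiny's code, just the general idea.

_active = []  # stack of observers that are currently being evaluated


class ReactiveValue:
    """A value that remembers which observers have read it."""

    def __init__(self, value):
        self._value = value
        self._observers = set()

    def get(self):
        # Whoever is running right now becomes a dependent of this value.
        if _active:
            self._observers.add(_active[-1])
        return self._value

    def set(self, value):
        self._value = value
        # Re-run every observer that read this value. Iterate over a copy,
        # because re-running an observer re-registers it.
        for observer in list(self._observers):
            observer.run()


class Observer:
    """A side-effecting function that re-runs when its inputs change."""

    def __init__(self, func):
        self._func = func
        self.run()  # the first run discovers the dependencies

    def run(self):
        _active.append(self)
        try:
            self._func()
        finally:
            _active.pop()


# Usage: the "plot" is just a log line here.
log = []
bins = ReactiveValue(10)
title = ReactiveValue("Histogram")
Observer(lambda: log.append(f"{title.get()} with {bins.get()} bins"))

bins.set(20)  # no listener wiring needed; the observer re-runs on its own
```

After `bins.set(20)`, `log` holds two entries, one for the initial run and one for the update. A real framework would also handle things like batching updates and dropping stale dependencies, which this sketch skips.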

Shiny comes in two parts, the shiny R package for developing Shiny apps and Shiny server for deploying them. The RStudio company intends to create a paid tier consisting of an enterprise server and a paid hosting service, Glimmer, which is free for now.

Among several demos were a plot of TV Show Rankings over time and a neat integration with Google's Geochart library to map World Bank health, nutrition and population statistics. There are also some examples of combining D3 with Shiny (G3Plot).

Possibly the coolest demo was a tutorial on reactive programming in the form of an R console in a browser. Chunks of code could be dragged and dropped around as in "live document" systems like IPython notebooks or Chris Granger's Clojure IDE Light Table.

Monday, April 08, 2013

HiveBio: a DIY biology lab for Seattle

Seattle is one of the few cities with a big biotech industry lacking a community lab space. Katriona Guthrie-Honea and Bergen McMurray are going to fix that by creating a DIY bioscience lab. The Seattle HiveBio Community Lab will be a community supported Do-It-Yourself (DIY) biology hacker-space or maker-space.

Katriona Guthrie-Honea is a student at Ingraham High and an intern at Fred Hutch. Bergen McMurray is a neuroscience student and an alumna of the Allen Brain Institute and Jigsaw Renaissance, a maker-space in Seattle's International District.

Worrying about an "innovational stagnation period" because not enough people are learning and playing with biotech, Guthrie-Honea wants to provide a place where people of all ages can do just that.

Synthetic Biology was founded on the idea of bringing an engineering mindset to biotechnology, with one result being BioBricks, the beginnings of a set of modular components. The iGEM competitions drive education and open community around synthetic biology.

But, one could argue that a standard engineer wouldn't make a centrifuge out of a salad spinner or a ceiling fan. To do that, what you need is a hacker.

I love the idea of bringing the hacker mentality to life sciences. Just like we should all take the lids off our computers and root our phones, we should be hacking the yeast in our beer like mad scientist Belgian monks.

Anticipating a May opening, Guthrie-Honea and McMurray are seeking funding from Microryza, which is like a Kickstarter for science, and a great idea in itself.

Do you love the idea, too? Want to help? Just like Kickstarter, Microryza is a crowdfunding platform. Check out their project and kick in a few bucks.

Friday, April 27, 2012

Sage Bionetworks Synapse

Michael Kellen, Director of Technology at Sage Bionetworks, is trying to build a GitHub for science. It's called Synapse and Kellen described it in a talk at the Sage Bionetworks Commons Congress 2012, this past weekend: 'Synapse' Pilot for Building an 'Information Commons'.

To paraphrase Kellen's intro:

Science works better when people build off of each other's work. Every great advance is preceded by a multitude of smaller advances. It's no accident that the invention of the printing press and the emergence of the first scientific journals coincide with the many great scientific discoveries of the Age of Enlightenment. But scientific journals are stuck in a paradigm revolving around the printing press. In other domains, namely open source software, people are more radically reinventing systems for sharing information with each other. GitHub is a collaborative environment for the domain of software. Synapse aims to be a similar environment for medical and genomic research.

The Synapse concept of a project packages together data and the code to process it. I tried to download the R script shown in the contents and couldn't, either because I'm a knucklehead or because Synapse is a work in progress. On the plus side, they give you a helpful cut-n-paste snippet of R code in the lower right corner to access the project through their R API. When this is fully implemented, it could provide a key piece of computing infrastructure for reproducible data-driven science.

Sage intends to explore ways of connecting to traditional scientific journals. Picture figures that link to interactive visualizations or computational methods that link to code. I'm a big fan of the "live document" concept and it would be great to see journal articles evolve in that direction.

An unintended consequence of NGS, Robert Gentleman points out, is that the data is too big for existing pipes. Any concept of a GitHub for science will have to incorporate processing biological data in the cloud. I could imagine a Synapse project containing data sets, code and a recipe for standing up an EC2 instance (or several). At a click, a scripted process would run, bootstrapping the machines, installing software and dependencies, running a processing pipeline, and visualizing the results in a browser. How would that be for reproducible science?

Michael Kellen's blog has a bunch of interesting stuff about why building a GitHub for biology is more fun than selling sheets. I bet it is.

Thursday, December 08, 2011

Effective Data Visualizations

Noah Iliinsky spoke at UW yesterday on the topic of Effective Visualization. Iliinsky has a new book out, Designing Data Visualization (review), and served as editor of Beautiful Visualization, both from O'Reilly. And, yay Seattle, he lives here in town and has a degree from UW.

If you had to sum up the talk in a sentence, it would be this: Take the advice from your college technical writing class and apply it to data visualization. Know your audience. Have a goal. Consider the needs, interests and prior knowledge of your readers / viewers. Figure out what you want them to take away. Ask, “who is my audience, and what do they need?” I guess that's more than a sentence.

Encoding data

The human eye is great at perceiving small differences in position. Use position for your most salient features.

Color is often used poorly. Question: Is orange higher or lower than purple? Answer: No! Color is not ordered. However, brightness and saturation are and can be used effectively to convey quantitative information. Temperature is something of an exception, since it is widely understood that blue is cold and red is hot. Also, color is often loaded with cultural meanings - think of black hats and white hats or the political meanings of red, orange or green, boy/girl = blue/pink, etc.
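Here's a rough sketch of that advice about brightness, in Python. It's my own illustration rather than anything from the talk: the function name `lightness_ramp`, the fixed hue of 0.6 (a blue) and the lightness range are all arbitrary assumptions. The hue stays put and only lightness varies, so bigger numbers read as darker.

```python
import colorsys


def lightness_ramp(value, lo, hi, hue=0.6):
    """Map a number in [lo, hi] to a hex color of one hue; larger is darker.

    Hypothetical helper for illustration. Out-of-range values are clamped.
    """
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    t = (value - lo) / (hi - lo)
    t = min(1.0, max(0.0, t))          # clamp to [0, 1]
    lightness = 0.9 - 0.6 * t          # 0.9 (pale) down to 0.3 (dark)
    r, g, b = colorsys.hls_to_rgb(hue, lightness, 0.8)
    return "#{:02x}{:02x}{:02x}".format(
        round(r * 255), round(g * 255), round(b * 255)
    )


# Usage: three quantities, one hue, ordered from pale to dark.
colors = [lightness_ramp(v, 0, 100) for v in (0, 50, 100)]
```

Because only one perceptual channel changes, a reader can order the swatches without consulting a legend, which is exactly what a rainbow of hues can't do.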

Appropriate encodings by data type

[Chart: appropriate encodings by data type]

As an example of how to do it right, Iliinsky points to Hipmunk, which crams an enormous amount of data into a simple chart of flights from Seattle to Phuket, Thailand.

We can see departure and arrival time and duration, in both the absolute and relative senses, plus layovers, airline, airport and price. And, you can sort by "Agony", which is cool. They've encoded lengths (of time) as lengths, used text (sparingly) for exact amounts, color to show categorical variables (airline) and iconography to indicate the presence or absence of wireless internet on flights.

The cool chart and the quote about encoding were expropriated from the slides from Iliinsky's talk at Strata. If you want more, there's a video of a related talk on YouTube and a podcast on Letting Data Tell the Story. Tools recommended by Iliinsky include R and ggplot2, D3 and Protovis, and Tableau.

Saturday, July 02, 2011

Running in Queen Anne

The crown of Queen Anne is a great place for running. It's mostly level along tree-lined streets and has great views of Elliott Bay and the Seattle skyline. The Queen Anne Boulevard route is 4.1 miles, according to Google. Jen and I often do a slightly shorter 3.7 mile run, cutting the northwest loop short by staying on 7th West past Coe Elementary School.

More Seattle running routes

Monday, December 27, 2010

Cloud bioinformatics

Personally, the thing I love about cloud computing is never having to ask permission. There's no ops guy or pointy-haired boss between me and the launch-instance button. As lovely as that is, the cloud is also a powerful tool for scientific computing, particularly bioinformatics.

Next-gen sequencing, which can produce gigabytes per day, is one factor pushing bioinformatics into the cloud. Data analysis is now the major bottleneck for sequencing-based experiments. Labs are finding out that generating sequencing data is getting to be cheaper than processing it. According to Dave O’Connor's lab at the University of Wisconsin's Department of Pathology and Laboratory Medicine, "There is a real disconnect between the ability to collect next-generation sequence data (easy) and the ability to analyze it meaningfully (hard)."

O'Connor's group works with LabKey Software, a Seattle-based bioinformatics software company founded by the Fred Hutchinson Cancer Research Center. LabKey develops open-source data management software for proteomics, flow cytometry, plate-based assay, and HIV vaccine study data, described in a presentation by Lead Developer Adam Rauch. Their technology stack seems to include: Java, Spring, GWT, Lucene and Guava (aka Google Collections). LabKey integrates with the impressive Galaxy genomics workflow system and the Trans-Proteomic Pipeline (TPP).

A good part of modern biology boils down to mining biological data, with the goal of correlating sequence, transcription or peptides to outputs like function, phenotype or disease. Machine learning and statistical modeling tend toward long-running CPU-intensive jobs that get run intermittently as new data arrives, making them ideal candidates for the cloud.

Amazon's EC2 seems to be better positioned than either Microsoft's Azure or Google's AppEngine for scientific computing. Amazon has been ahead of the curve in seeing the opportunity in genomic data overload. Microsoft has made some welcome efforts to attract scientific computing, including the Microsoft Biology Foundation and grants for scientific computing in Azure. But they're fighting a headwind arising from proprietary licensing and a closed ecosystem. Oddly, considering Google's reputation for openness, AppEngine looks surprisingly restrictive. Research computing typically involves building and installing binaries, programming in an odd patchwork of languages and long-running CPU-intensive tasks, none of which are particularly welcome on AppEngine. Maybe Google has a better offering in the works?

It's worth noting that open-source works without friction in cloud environments while many proprietary vendors have been slow to adapt their licensing models to on-demand scaling. For example, lots of folks are using R for machine learning in the cloud, while MATLAB is still bogged down in licensing issues. With proprietary licenses, the not-having-to-ask-permission aspect is lost.

According to Xconomy, Seattle has a growing advantage in the cloud. There are several Seattle companies operating in the bioinformatics and cloud spaces. Sage Bionetworks, also linked to FHCRC, was founded by Eric Schadt, also of Pacific Biosciences, and Stephen Friend, founder of Rosetta Inpharmatics. Revolution Analytics sells a scalable variant of R for all kinds of applications including life sciences. Seattle hosts a lot of activity in analytics, cloud computing and biotechnology, which will keep Seattle on the technology map for some time to come.

Thursday, January 28, 2010

Analytics in Seattle

Seattle hosts a lot of activity in analytics (aka data analysis, visualization, and business intelligence).

Robert Gentleman, one of the principals of the R project, was for a long time associated with Fred Hutchinson Cancer Research Center, just across the lake from where I sit. He got hired in September of 2009 by Genentech (now a subsidiary of Roche), and was also recently named to the board of REvolution Computing. REvolution's product development office is here in Seattle.

R is an open source implementation of the S language, which originated at Bell Labs. S-plus, a commercial offspring of S, was introduced in 1988 by UW professor R. Douglas Martin's company Statistical Sciences, Inc. That was bought in 1993 by MathSoft, maker of MathCAD, then sold off again in 2001 as Insightful Corporation. MathSoft continued to struggle and was itself acquired in 2006 by PTC Corporation. TIBCO bought Insightful in 2008. TIBCO, as of 2007, also owns Spotfire.

Tableau, maker of analytics and visualization software that competes with Spotfire, is next to Google in Fremont. It probably doesn't hurt that UW has strong statistics, biostats, and computer science departments.

Just last year, Microsoft nabbed former South Lake Union residents Rosetta Biosoftware for integration into the spooky-sounding Microsoft Amalga Unified Intelligence System.

Vaguely related

Seattle Xconomy tech/biotech business news
bioinformatics at genentech
selling visual analysis
Interview w/ Revolution Analytics CEO Norman Nie
