
Wednesday, September 17, 2014

Separating concerns: Data vs Algorithm

I'm by no stretch of the imagination a Knuth, but I was pairing the other day with a newer developer, and we started looking at some fairly complex code - at least from their perspective:
[
  { key: :known_issue_ids,       data: Something::KnownIssue,       map_to: :known_issues },
  { key: :depth_of_market_ids,   data: Something::DepthOfMarket,    map_to: :depth_of_market },
  { key: :developer_history_ids, data: Something::DeveloperHistory, map_to: :developer_histories },
].each do |mapping|
  ids = property.send(mapping[:key])
  property_params[mapping[:map_to]] = create_data_mapping(mapping[:data].all, ids) if ids.any?
end

Obviously, there's a lot going on there, and it seemed quite impenetrable. I couldn't seem to convey that the importance of the technique lay in splitting the input data from the actual operation/algorithm, even if that resulted in slightly more abstract code.

For example, it's often easy to look at duplicated, similar code and realise that introducing a method would greatly reduce issues; but we often fail to do the same with conditionals or iterators. Languages with lambdas/procs/blocks/etc. make this a little easier to pull off - markedly so when you're translating from A to B with a lot of exceptions to the rules.

Here, it becomes a lot more obvious that mapping is really just a set of lookups; compared to other code like

if a.foo
  b.foo_translated = a.foo
end
if a.foo2
  b.foo2_translated = a.foo2
end
if a.foo_catfish
  b.blah_translated = a.foo_catfish
end

The above even looks simpler for simple operations, but once you start performing really convoluted logic the chance of mistakes increases - particularly if you end up doing copy/paste/replace scutwork.
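For comparison, the conditional ladder collapses into the same table-driven shape - a minimal sketch, assuming the made-up foo/foo_translated field names from the snippet, with OpenStruct standing in for the real objects:

```ruby
require 'ostruct'

# Source-to-target field names; these are the hypothetical ones above.
FIELD_MAP = {
  foo:         :foo_translated,
  foo2:        :foo2_translated,
  foo_catfish: :blah_translated
}.freeze

# Walk the table instead of repeating the if/assign pair per field.
def translate(a, b)
  FIELD_MAP.each do |source, target|
    value = a.send(source)
    b.send("#{target}=", value) if value
  end
  b
end

a = OpenStruct.new(foo: 1, foo_catfish: 'x')
b = translate(a, OpenStruct.new)
# b.foo_translated => 1, b.blah_translated => "x"
```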
I'm curious - how widespread is this practice?

Is it just a variation on certain DRY refactorings; or is it just an idiom I've picked up inadvertently from other languages like C?

Apart from the minor cognitive load put on the next maintainer (I should have really added a one line comment explaining what it boiled down to); are there any inherent negatives?

Tuesday, July 15, 2014

Shaprgram: a Govhack entry for crowdsourcing photos of spatial data

#Govhack is over, and I'm still recovering from it. I didn't know what to expect, but ran into Kieran and Chris from the AdelaideRB meetup.

We decided to team up, and kicked around a few ideas.


Friday

We got to talking through the ideas and datasets. Some we didn't love; some we just didn't have time for. Things got a bit weird with all of the teams murmuring in hushed tones, protecting their ideas. We decided the point of govhack was openness, so we shared our non-starters. I was surprised to see the huge rush of people within milliseconds, obviously stalking the #govhack twitter feeds.

It came down to doing a visualisation/data exploration tool or making a reusable tool that improves data. I managed to sway the rest of the team towards the reusable tool, pointing out that schema and data production was a complete pain, and that anything we built might not survive beyond the underlying data being regenerated.

We spent the rest of the time setting up on #Slack, Trello, Github and more. I sketched out some of the worst looking paper prototypes, then at home early into the AM bashed out a bunch of placeholders.

Saturday

Exhausted from the early morning work, I crawled in at around 11am to find the terrible scaffold turning into a much nicer Rails app.
We made a few choices that would come back to haunt us, including using https://github.com/mutablestate/ink3-rails - works great in dev, but absolutely hates being compiled in production mode, as we later found out.

We wrote some CSV and Shapefile importers, working with rgeo and postgis to add collections of points. Chris pointed out postgres' JSON field support solved one of the flexible schema problems we had (I was dreaming up nasty hacks and ways to kick ActiveRecord square in the pants), and by mid afternoon we had a pretty good prototype working.

A stampede took place, and all of the other teams vanished before we knew what was going on. 

I took the chance to catch up with Alex, and we got talking to some of the data.sa.gov.au folks. 

I found this pretty valuable - we touched on some of the reluctance to open up data that exists, the need for productive narratives & collaborations to highlight the value of open data, and the need for data set requests.

Chris was bitten by the urge to work into the early hours, and pushed us a lot further along.

When I got home, I thought we were looking pretty good - at least until I demo'd it to a non developer. Nothing like a live demo to find bugs.

Sunday

The final day - it came down to polish and deploy, which turned out to be a lot harder than it should have been.

Due to the previously mentioned asset compilation problems, there was frenzied cursing and kicking of the asset pipeline; as well as a fair bit of effort being put into infrastructure. I'd started out with a bit of chef to cook a machine, but found the change/deploy/test loop was tedious.

A few cosmetic and usability fixes were done, our team video recorded, and then disaster: most of the data sets we chose were in arbitrary reference systems (GDA94, etc.), and we'd accidentally broken our transformations into WGS84.

Literally in the last 10 minutes, we ended up live editing to strongly encourage WGS84-friendly data sets to be added, and to prove our code was happy end to end.

Results

Shaprgram is live, if a little buggy. It lets you take any dataset in WGS84 hosted on a data.*.gov.au site and crowdsource photos.

Here's our entry - don't forget to vote!

Who's this for


Shaprgram is aimed at government data producers who may not have the full resources of a geospatial team at their beck and call, who need to crowdsource information about their assets; and anyone that can benefit from visual imagery to supplement raw data.

Use cases


Adelaide City Council produces a number of data sets regarding picnic sites, which are published on their website and as open data. Unfortunately no one was able to snap a few photos at the time, and Google Street View doesn't reach these places.

Using this tool, anyone with a shapefile or CSV can put up a list of assets that need photographs, and download a simple .zip file with an enhanced dataset.

Asset tracking


Just spent a few hard earned rate payer dollars on nice new benches? Use this tool to capture a visual representation of the asset at the time of installation; and during routine tasks - letting you see how your assets survive in the wild, when they need repair or more.

Much more


  • Parking spaces & restricted parking - all parking inspectors carry cameras, making it trivial to document your city
  • Signs - need to capture the signed speed limits across multiple authorities? Use this tool to complete your dataset
  • Tree health - Let members of your community tell you when a tree is ill, growing too much or more - find out *before* it becomes a costly issue to rectify.

Wednesday, April 23, 2014

Rails, how developers grow, and what's wrong with it

Rails is a great tool for automating the heck out of pumping out a quick and dirty site in short order.

It's a terrible tool for learning. The community isn't like the IRC communities I've been involved in. It's catering for a lot of developers early in their journey.

This is what I see on /r/rails or hacker news or being actively practised far too often:

  • I just discovered service objects
  • Test Driven Development/Unit Tests are dead, because it's too hard to stub out ActiveRecord.
  • Uh, guys, I started using after_save hooks and now my persistence layer is working with Sidekiq and the Mail server to kill all humans.
  • DHH said something outrageous! 60% of us are rushing to take his word as gospel right now; cause like, uh, nuts to the status quo!
  • Oh neat I discovered modules, and that classes are objects, so... like, I'm not going try to do evil... but... here's a gem I wrote that adds a search to all of the defined classes, because it's not like that could cause conflicts; right?

and it makes me cranky. It's actively harmful for people who don't have seasoned peers near them, ready to slide over a polite link on SOLID; or talk to you about the Principle of least astonishment.

Surely we can do better.

Where are the articles on

  • If you are testing your active record instances, that's not a unit test. Go extract your domain models and test those please.
  • Here's 10 terrible ways to write a component/gem/etc; because it's the people using your code that have to put up with your mistakes when your github repo becomes abandonware.
  • How to open up a github project to collaborators when you don't have the time to maintain your code anymore.
  • How to write a decent readme that sums up your gem/repo/etc.
  • What is Single Responsibility Principle? How does it apply within the context of ActiveRecord, Controllers and triggering after-create/after-save events?
  • Why are we OK to use FactoryGirl for tests (obviously useful), but much less willing to use the Factory Pattern to assemble our object graph; when convention breaks down in the face of complexity?
  • KISS: Why it's often worth writing code you might feel is imperfect if it helps your team collectively understand what the heck you are doing.
  • Why version control systems have a blame feature, and the startling reality the next guy looking after your system can track you down and beat you to death with your own severed legs.
  • Stop reading the ember.js documentation now, and just go look at the source code, it'll save us all a load of time.
  • Magic is generally bad unless you are all wizards; but like Lord of the Rings it's freakin' easy for one or more of you to turn evil.

Interestingly, many of these are framework and language agnostic - in my experience, though, many framework-centric types tend to only know their framework and community, so it distorts their understanding.
It's not their fault - otherwise smart people have set out to solve a number of problems, then tried to generalise them having already learnt a significant amount of things the hard way - they just keep forgetting to share their learnings.


The best resources I had when I was learning were a mix of IRC (there is nothing like asking a stupid question and realising you just did so in front of the guy who invented the web, or the language you are using right now) and wikis like the Portland Pattern Repository.



Wednesday, January 01, 2014

Samsung air conditioners - direct interaction

A few days ago, a Brisbane based chap named Shannon dusted off my blog posts from last year, and pointed out that directly interacting with a Samsung A/C only required a bit of SSL.

We seem to have slightly different models, but the underlying principles are all roughly the same. Given my web based authentication method stopped working, and given the Australian Copyright Council says this, with regard to reverse engineering:

Making interoperable products
Software may be reproduced or adapted in order to get information necessary to enable an
interoperable product to be made. The relevant provision also allows the person making the interoperable product to reproduce or adapt the original software in the interoperable product, but only to the extent necessary to enable interoperability either with that or any other software.
I'm happy to publish the direct method now via bitbucket.


Speaking with Shannon a bit, he's likely to write a full perl or other language version as well; and had some strong ideas around home automation - I've suggested integrating into the thing system.

Sunday, December 22, 2013

Weekend mapping experiments, misc small but fulfilling development

This week I was working from home quite a lot, and given the holiday period, managed to get into the zone a fair bit.

It seems like this has continued into the open source work I'm doing. Most of the actual work is small at the end of the day, but has a larger impact.

So far; I've:



Monday, September 30, 2013

Dog of a week... and it's only monday?


Feeling a bit overwhelmed already
  1. I don't know why I have to argue about DI, but I find I do. This neatly gets away from fat controllers and fat models, and finally gets to the stuff that matters.
  2. I pulled together a quick ABN input on the weekend. I figured I've built the thing enough times to do it right, and other Australian web devs might have the same underlying issue.
  3. Many, many pull requests this week, contributing to Fat Free CRM, Chef recipes and even my much unloved fork of tipsy.
  4. All of Adelaide, most of Melbourne, Tasmania, Brisbane and Sydney have had a lot of attention from keepright.ipax.at - Adelaide has the cleanest map data of the lot, but Melbourne isn't far behind.
    My main focus: unclosed ways, redaction errors, sport tags without a physical associated tag - mostly things that won't show up in the rendering.
  5. Took the opportunity to work from home today, so spent my lunch break constructing a gravel path.

Thursday, March 21, 2013

Consuming SOAP services with Ruby / Rails

Savon is a Ruby SOAP library, and I mostly like it. I've been using the latest Savon to do integrations with SSRS at work.

Getting started
As practical advice, I strongly recommend you go and find a decent pre-built SOAP IDE. I've used XMLSpy and Oxygen before, which have a slightly wider focus than just pure SOAP; but in recent times I've come to like SoapUI quite a lot.

Compared to other tools, it will:

  • Take a WSDL
  • Generate a sample request for each action
  • Generate code for you if you happen to be writing in Java - and stub services
  • Allow you to easily set different endpoints (ie: prod, staging, dev) without stuffing around with WSDLs.
  • Allow you to develop a number of premade requests which will return expected responses.
In short, it's a heck of a tool with a low price point: free.

SSRS
If you aren't developing in C# or similar, MSDN isn't very helpful. Not having had to... enjoy... windows servers for some time, I found a lot of time was spent:
  • Working out if I was authenticating correctly (browser to web UI, browser to WSDL, SOAP UI to WSDL, SOAP UI to SOAP Action, Actual Code to Soap Action)
  • Blundering around looking for configuration files in a file manager, instead of using google to find a likely file path and using vim until success was achieved.
  • Finding all kinds of Authentication that should work simply don't. In the end, if the SOAP MSDN articles had simply pointed me to Configure Basic Authentication on the Report Server, I think I'd have achieved more.
  • Discovering the docs were incorrect re what WSDLs were actually available vs what the server published, and what worked vs the configuration of 'Native Mode'.

Once you get over the successive hurdles of Auth, identifying the correct service, and guessing that ListChildren is the method you want to get a list of reports, it's fairly straightforward.

Gotchas with Savon
A few of the soap calls wanted specific soap headers. Savon only supports global soap headers, and once you've built the object it's hard to mutate it.

Tips:
1) To get more control, ditch the Savon::client() factory method - it's not really doing too much you can't do yourself.
https://github.com/savonrb/savon/blob/master/lib/savon.rb

2) If you need to monkey with the global options, you can often extend your client to make them accessible.
https://github.com/savonrb/savon/blob/master/lib/savon/client.rb

class ExtendedSoapClient < Savon::Client
  attr_accessor :globals
end

3) It's worth stealing from Javaland in most cases - for each configured Savon Client, do up a quick FooServiceProxy. Inject your client into the constructor, and for all of the management of WSDLs, go stick that in your environment config.
If needed, go build yourself a SSRSServiceFactory to handle all of this wiring up.
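A rough sketch of what such a proxy might look like - the class name, the :list_children operation and the message keys follow the SSRS example above, but the exact contract is an assumption for illustration:

```ruby
# Hypothetical proxy: translate plain Ruby calls into SOAP and back.
# The injected client is assumed to be an already-configured Savon client
# (anything responding to call(operation, message:)).
class ReportServiceProxy
  def initialize(client)
    @client = client
  end

  # Returns the raw response body as a hash - no business logic here.
  def reports(folder = '/')
    response = @client.call(:list_children, message: { 'ItemPath' => folder })
    response.body
  end
end
```

Because the client is injected, the proxy is trivially testable with a stub in place of a live SSRS endpoint.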

The primary advantage - you get a reasonably clean API that can return either hashes, or if you go a little further you can generate models for the SOAP responses and have those appropriately populated.

It's important to remember these proxy classes exist only to translate a SOAP request from your code, to the transport layer, and to return a simple, meaningful result to your code.

That means avoiding a lot of logic in favour of making an API explicit.


My recommended pathway into Ruby / Rails


I was asked what I'd encourage a complete outsider to RoR to look into.

Here's my list - what would you add?

- Linux intro
  - User accounts
  - File permissions
  - su / sudo
  - Installing programs (apt)
  - What's a distro? (Ubuntu, Debian, etc)
  - Gnome desktop
    - Alternatives
  - Bourne Again SHell (BASH)
    - cd, ls, touch, vim, top, kill, pgrep, pkill, mkdir, chmod, chown
  - Basic bash scripts
- Git intro
  - Cloning
  - Commits
  - Log
  - Push
  - Pull
  - Fetch, Merge
  - Branches
  - Tags
  - Revert
  - Reset
- Ruby intro
  - Hello world
  - Functions, Control flow, Loops
  - Classes
  - Modules
  - Blocks! Lambdas! Currying!
    - Chunky Bacon and Foxes!
  - DSLs
  - Libraries (gems)
  - Require / include path
  - Metaprogramming
  - OOP Intro
  - OOP: SOLID principles
  - OOP: TDD
- HTML intro
  - HTML5
    - what is it?
    - differences to HTML4/xhtml?
    - Data attributes
    - New inputs
  - Javascript basics
    - JQuery
    - AJAX
    - Backbone, Ember, etc
    - D3
  - CSS basics
    - CSS3
    - Responsive design
  - SVG basics
- JSON! YAML! Data structures & serializations.
- Rails intro
  - Generators
  - Rake
  - Organisation
  - Scaffolds
  - Convention, not configuration
  - Rspec
- Sinatra intro
- SASS / Less
  - What is it?
  - Bootstrap/Compass
- Continuous Integration
  - Jenkins
  - Travis CI
  - Post commit hooks
  - Rake tasks
- DevOps!
  - Chef
  - Puppet

Thursday, February 28, 2013

Adelaide Ruby Hack Day

There's an effort to pull together a Ruby Hack Day

We're aiming to start up an open coding session on the Saturday following the monthly meetups. The main goal is to complement the presentations with a more hands on component in which some of the more experienced guys can make themselves available to help out newbies working their way into the Ruby / Rails world. 
The guys at the Majoran Distillery have been fantastic and offered to host this, so we've locked in the first one for the 9th of March from 10am-1pm. 

I'd like to go, but have travel planned.


A decent article on what tends to upset me about rails

From http://collectiveidea.com/blog/archives/2012/06/28/wheres-your-business-logic/

 I put some of the blame on Rails itself, which has guided developers to use Controllers, Models, or Libraries, and nothing else. 
and
 Are your tests slow (>1s to run a single unit test)? Do your tests have large ungainly setup? Are you using factory_girl in your unit tests? Are you mocking implementation instead of interfaces? 

I'm only really into my 3rd or 4th rails application of appreciable size, and they all seem to have this kind of problem - the tight coupling of ActiveRecord to the model, and the model mixing persistence with business logic, is rampant.

Enter the Interactor. An Interactor handles a use case. It pulls together the models and libraries it needs to process a single business rule, and then it’s done. These objects are very easy to test and use and in proper OO fashion can be used anywhere the app needs to apply the use case or business rule.
I don't think you need to call it an Interactor - it's just a Model. Not the relational-persistence kind, but the Model of your business logic, expressly applying SRP.


Tuesday, February 26, 2013

Ruby, Currying

One of the better, clearer explanations of currying.

http://khelll.com/blog/ruby/ruby-currying/

Obviously useful for math, or scenarios where you inject your own lambdas (ie: sorting).
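A tiny sketch of the sorting case - curry a three-argument comparator, then specialise it per key before handing it to sort:

```ruby
# A generic comparator, curried so the key can be fixed up front.
by_key = lambda { |key, x, y| x[key] <=> y[key] }.curry

people = [{ name: 'b' }, { name: 'a' }]
sorted = people.sort(&by_key[:name])
# sorted.first[:name] => "a"
```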

Monday, February 25, 2013

Keyword arguments.

Thank god.

# Ruby 1.9 (from action_view/helpers/text_helper.rb):
def cycle(first_value, *values)
  options = values.extract_options!
  name = options.fetch(:name, 'default')
  # ...
end

# Ruby 2.0:
def cycle(first_value, *values, name: 'default')
  # ...
end

More http://blog.marc-andre.ca/2013/02/23/ruby-2-by-example/
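A quick runnable illustration of the 2.0 form (the return value here is just for demonstration):

```ruby
# Keyword argument with a default, alongside a splat - no extract_options! needed.
def cycle(first_value, *values, name: 'default')
  [name, [first_value, *values]]
end

cycle('a', 'b')             # => ["default", ["a", "b"]]
cycle('a', 'b', name: 'x')  # => ["x", ["a", "b"]]
```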



Tuesday, February 19, 2013

Tree structures in relational data stores

http://fungus.teststation.com/~jon/treehandling/TreeHandling.htm

Handy to read, as it lets you run queries over collections when tree relationships don't matter as much - i.e. cascading security or calculating totals.

Friday, February 01, 2013

Capistrano frustrates me


Here's why:

1) It defines its own class path. If you want to load a customised extension, good luck - it is seemingly unaware of gems you add via Bundler pointing at a git repository.

2) Task definition and execution are not separated. require is replaced with load - and load means "load and run".

For example - https://github.com/capistrano/capistrano/blob/master/lib/capistrano/recipes/deploy/assets.rb
You cannot "just" load the code and do something like compile assets locally by drawing on commands generated by someone else.

3) The methods defined are too rigid. I would refactor every capistrano recipe out there into behaving in two ways. First, there's a lot of methods that generate commands to be run. Second, there's business logic which decides where those commands should be run.

Looking at the recipe above:

    task :clean, :roles => assets_role, :except => { :no_release => true } do
      run "cd #{latest_release} && #{rake} RAILS_ENV=#{rails_env} #{asset_env} assets:clean"
    end

You get no control there - the run() method will execute that remotely.
More frustrating: the commands are dumb strings. You don't really have the input escaping or knowledge of what's valid to put into the command in the code itself.

There needs to be flexibility, I agree, but some kind of basic input checking or modelling of what's happening should be more widely used.

4) The recipes tend to have no idea of where they are - you might as well write shell scripts to do most of it. Every time you see 'cd {absolute path}' is a good indication of this - smarter recipes would likely check current working directory (pwd); or keep a pointer to that.

The only things that capistrano does better than pure bash scripts:

  • There's a lot pre-written, so you don't reinvent the wheel.
  • It has a dependency solver (before this do that, after that do this)


Sunday, January 27, 2013

Air con remote control success

After yet more hacking, I've stuck together just enough code to think I want to go and find a proper home automation framework.

I've got weather-util installed on a cron job doing:
0 * * * * weather ypad > ~/butler/weather.txt

This gets me a small text file:
$ cat weather.txt 
Current conditions at Adelaide Airport, Australia (YPAD) 34-56S 138-31E 4M
Last updated Jan 26, 2013 - 11:30 PM EST / 2013.01.27 0430 UTC
   Temperature: 73 F (23 C)
   Relative Humidity: 46%
   Wind: from the SW (230 degrees) at 14 MPH (12 KT)
   Sky conditions: mostly clear
The first argument is an airport identifier. Next, I've used Blather to pull together a quick XMPP bot. It supports commands like
!ac on
!ac cool
!ac temp 23
!ac off
!weather

!weather just pipes the weather.txt back to whomever messaged the bot.

The !ac commands invoke some of the code to send commands to the air conditioner I talked about earlier.

Finally, the bot checks the configured owners for all Auth, and if it hasn't seen you in 8 hours, says hello.

It's far from the cleanest code I've ever written, and Blather is good but needs that little bit more to become "the rails of XMPP". Most importantly here though, I'm a little bit of regex away from writing some simple rules (IFTTT style), and adding the relevant cronjobs. The rules are going to be fairly terse:
while temp > 30; turn on AC

while temp < 25; turn off AC
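That "little bit of regex" might be as small as this sketch - the grammar matches the two rules above, and the command strings are whatever the bot already understands:

```ruby
# One rule per line: "while temp <op> <threshold>; <command>"
RULE = /\Awhile temp (<|>) (\d+); (.+)\z/

# Returns the command to fire, or nil if the rule doesn't apply.
def evaluate(rule_text, current_temp)
  op, threshold, command = rule_text.match(RULE).captures
  current_temp.public_send(op, threshold.to_i) ? command : nil
end

evaluate('while temp > 30; turn on AC', 32)  # => "turn on AC"
evaluate('while temp > 30; turn on AC', 28)  # => nil
```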

I'd say if I did much more I'd want to refactor a lot of stuff to be more componentised - i.e. the Jabber notification service for when a decision is made doesn't need to sit right next to the Jabber command service.

I can't decide if I'd want to bring in something like Whenbot, or deal with something as complicated-seeming as rhouse. Rhouse (LinuxMCE) has a lot of UPnP stuff built in, but annoyingly the Samsung smart devices only use a subset of it - my phone notifies, the air con notifies back - rather than properly using M-SEARCHes.

Having spent a lot of time on this, I suspect that would be quite annoying to work with.

The next step I'll probably do is a further cleanup of my nmap code - when my phone's mac address walks in the door, the butler-bot should message me a greeting / fire up the AC decision engine.

Finally, I have to stick this on its production server - my new Raspberry Pi.

More than happy to share the code I've got with anyone interested, it's in a few private bitbucket repos for now - I figured there would be few people with the same setup as I have; so kept it private (at least till I can make it into a few gems)...

Thursday, January 17, 2013

Activerelation, stop being so clever.


I nearly flipped my lid today, when doing what should have been a simple ActiveRecord relation query to get the 'last record'.

Foo.order('column')
SELECT * FROM foo ORDER BY column ASC;


Foo.order('column ASC').last
SELECT TOP 1 * FROM foo ORDER BY column DESC;



Foo.order('column DESC').last
SELECT TOP 1 * FROM foo ORDER BY column ASC;


Foo.order('column DESC').first
SELECT TOP 1 * FROM foo ORDER BY column DESC;




Foo.order('column ASC').limit(1)
SELECT TOP 1 * FROM foo ORDER BY column ASC;

Foo.order('column DESC').limit(1)
SELECT TOP 1 * FROM foo ORDER BY column DESC;

Something tells me I'll never like SQL server's TOP 1 syntax, as I spent ages just staring at the end of the queries and wondering "why the heck isn't it choosing the last record, I told it to sort DESC!"

I miss my LIMITs.


Saturday, November 10, 2012

Flattening your models in JSON

You have a graph of objects, with some of those external lookup codes.
class ViciousAnimal < ActiveRecord::Base
  has_one :classification
end

# A simple code/description pair that captures the Smithson Nasty Bite Classification (WEAK_BITE, OW, WHERE_IS_MY_LEG)
class Classification < ActiveRecord::Base
end

When you are using something like Knockout, it's often more useful to have application.classification.description present; but unless you do two serialisations to JSON and build the relationship between the two JSON objects, that's a bit annoying. In this scenario, I'll typically create an implementation like:
class ViciousAnimal < ActiveRecord::Base
  has_one :classification

  def classification_description
     classification.description
  end

  def to_json(options = {})
    super(options.merge({:methods => :classification_description}))
  end
end
This allows you to begin to flatten your object graph into something easier to work with on the UI.

What are the gotchas?

When you customise your to_json method, don't always expect rails to invoke it. For example, if you are passing back a number of objects, you'd typically do:

render :json => {
  :something => true,
  :vicious_animal => @vicious_animal
}

However, you'll be surprised to find that none of your own implementation of to_json appears to be called - only whatever the Hash's to_json method supports.

To work around this, you often have to dumb your ActiveRecord instance down to a plain old hash:

render :json => {
  :something => true,
  :vicious_animal => ActiveSupport::JSON.decode(@vicious_animal.to_json)
}

Hardly the prettiest scenario.

How do you make that scale?

The above approach is only really good for a handful of descriptors - if you suddenly find half of your model is devoted to flattening out the graph, I would recommend extracting said code into a ViciousAnimalJSONSerializer, which knows how to orchestrate the mapping between the object graph and flattened view.

Rails provides ActiveModel, which gives you a good starting point for cleaner mapping code on plain objects. The benefit of getting yourself familiar with these sorts of mapping libraries is fairly high - for example, it would be fairly trivial to implement an n-triples mapper, to/from text serialization, or just about anything else you can imagine.
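As a plain-Ruby sketch of that extraction (no Rails required here; the attribute names are the hypothetical ones from above):

```ruby
require 'json'

# Knows how to flatten a ViciousAnimal object graph into a simple hash.
class ViciousAnimalJSONSerializer
  # Flattened attribute => how to derive it from the object graph.
  FLATTENED = {
    classification_description: ->(animal) { animal.classification.description }
  }.freeze

  def initialize(animal)
    @animal = animal
  end

  def as_json
    FLATTENED.each_with_object({ name: @animal.name }) do |(key, fn), hash|
      hash[key] = fn.call(@animal)
    end
  end

  def to_json(*args)
    as_json.to_json(*args)
  end
end
```

The model stays free of presentation concerns, and the serializer can grow as many derived attributes as the UI needs.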

Friday, November 02, 2012

How to put a semantically enabled autocomplete control into your applications

One of the most common application design patterns is to implement a lookup table - some piece of business data has been given a description, and possibly a code or identifier.

When creating new data, a user often needs to select a code/identifier for a piece of information. This is usually done as a dropdown; or, if there are many entries, an autocomplete control is often used.

This works well - some people will just make hashes storing the key/value pair in their code, others will ensure it's published into their relational data store.

Where it starts to fall down is in multiple applications working together - who can agree on the meaning of a code?
Your code of CASE_NIGHTMARE_GREEN is applied by a user and treated by one application as the coming of Cthulhu; but after an ETL, CSV export or webservices message, the next application treats it as something different - users not up to date with the latest Lovecraftian spy thrillers start to misinterpret the data and apply it to anything involving green suitcases in horrible colours.

How do you fix this?
The next logical step often becomes to add a description, so that a UI can explain the term, but in a non services oriented environment, that's trapped in your datastore.

This won't work in a multiple vendor scenario, at least not unless you want to share your DB with them.

Another approach is the Code Table service - a service that has a focus on only retrieving data about a given input identifier.

I've seen this done in at least one SOA, and it's not a terrible pattern - but each vendor still has to stand up their own code table services, and there's a lot of repetition.

What else can you do?
Soon it becomes obvious that you want a decent way to find a code and the related data, but you also want to support aliases - my CASE_NIGHTMARE_GREEN is your WALK_IN_THE_PARK.

This gets tricky, quickly, as 1-1 mappings are difficult - and either a collection of vendors pull together and standardise on a list and the mappings, or no one really collaborates and fragile mapping code is introduced.

By this point, fear of change often sets in as the interfaces between parties are fragile, or to push changes through the consortium of vendors becomes a nightmare of project management and communication.

If you haven't had to roll out minor enhancements to a standard with a number of other parties who just aren't quite interested, take my word for it - it's painful.

All is not lost, there is another way - and it's simple.


What's the way forward?

My recommendation here is to push your codes into a triplestore. It doesn't fix everything, but it becomes trivial to relate information to the code - aliases, for example, or descriptions.

A triplestore typically exposes a RESTful service that allows you to execute queries - if you can deal with MongoDB or MySQL, you should be able to comprehend what's going on.

Don't just take my word for it - here's one prepared earlier - SNOMED, SPARQL powered autocomplete UI components. Pretty neat stuff.

Here's what wikipedia has to say about SNOMED, if you haven't heard of it.
SNOMED CT Concepts are representational units that categorize all the things that characterize health care processes and need to be recorded therein. In 2011, SNOMED CT includes more than 311,000 concepts, which are uniquely identified by a concept ID, i.e. the concept 22298006 refers to Myocardial infarction. All SNOMED CT concepts are organized into acyclic taxonomic (is-a) hierarchies; for example, Viral pneumonia IS-A Infectious pneumonia IS-A Pneumonia IS-A Lung disease. Concepts may have multiple parents, for example Infectious pneumonia is also a child of Infectious disease. The taxonomic structure allows data to be recorded and later accessed at different levels of aggregation. SNOMED CT concepts are linked by approximately 1,360,000 links, called relationships
That's one big code table, and you can see it's grown beyond just code/name pairing to include more data.

One of the key things highlighted by the Freebase folks and a few other places is a common problem: from a bunch of user input, go and locate an object or identifier related to that term.

The moment you have an autocomplete control like these, it instantly kicks your application from "user is entering data into a text field" into "user is describing a semantic object, and I can grab all of the information about it that is relevant to my user".

Unlike standard, relational-backed applications, SKOS + SPARQL make this trivial - you simply write out a preferred label (skos:prefLabel) and as many alternative labels (skos:altLabel) as you need.
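As a rough sketch (the URI and labels below are invented for illustration), a code table entry carrying a preferred label plus aliases is just a few lines of Turtle:

```turtle
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .

<http://example.org/codes/CASE_NIGHTMARE_GREEN>
    a skos:Concept ;
    skos:prefLabel "Case Nightmare Green"@en ;
    skos:altLabel  "Walk in the Park"@en , "CNG"@en .
```

Adding another vendor's alias is a one-line change - no schema migration in sight.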
What does that look like? Here's a sample query showing a user searching for... ear wax.

Note the URIs (try clicking on them to find out more information), and the preferred label/aliased label in the resultset, and try the resultset as JSON.

Even if no other part of your application is aware of linked data, you can see how this graph of information can be flattened and pushed into a standard data store for later use.




How can I build myself one of these?

Installing 4store


For this exercise, let's install some of the requirements:
$ sudo apt-get install 4store

Now we'll instantiate a new store (think database):

$ sudo 4s-backend-setup reference_store
4store[5196]: backend-setup.c:185 erased files for KB reference_store
4store[5196]: backend-setup.c:310 created RDF metadata for KB reference_store

Fire up the backend service (think of it like /etc/init.d/mysql start):
$ sudo 4s-backend reference_store

Populate some data - we'll use something I've prepared earlier, in Turtle format. It helps to think of Turtle as YAML, but with URIs and a bit more magic.


$ git clone git://github.com/CloCkWeRX/4store-reference-service.git
$ cd 4store-reference-service
$ 4s-import reference_store --format turtle data.ttl



We're good to go - let's put the endpoint up
$ 4s-httpd -p 8000 reference_store



Now there's a RESTful endpoint living at
http://127.0.0.1:8000/sparql/

and you can run queries on it via http://127.0.0.1:8000/test/ - though until Issue #93 is solved, you probably just want to open the test-query.html page - this query will bring back both sets of data.

$ chrome test-query.html

From here, you can see the plain text, csv, JSON or XML results.

How do I do this in PHP, Rails, etc?

There are a lot of client libraries out there - I'd suggest a quick read through of http://www.jenitennison.com/blog/node/152 for most Rails developers, or looking at the sparql-client gem.

Failing that, peruse the ClientLibraries.
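Failing even that, remember the endpoint is just HTTP. Here's a minimal sketch in plain Ruby (no gems) that builds a query URL against the endpoint we stood up above - the `output=json` parameter is an assumption about the endpoint's supported formats, so check yours:

```ruby
# Minimal sketch: the SPARQL protocol is a plain HTTP GET with the query
# URL-encoded into a "query" parameter. Endpoint URL and output=json are
# assumptions matching the 4store setup above.
require 'uri'
require 'cgi'

endpoint = "http://127.0.0.1:8000/sparql/"

query = <<-SPARQL
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?code ?label WHERE {
  ?code skos:prefLabel ?label .
}
SPARQL

uri = URI.parse(endpoint)
uri.query = "query=#{CGI.escape(query)}&output=json"

# Net::HTTP.get(uri) would then return the JSON resultset.
puts uri
```

From there, any HTTP client in any language can consume the results - which is rather the point.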


Where can I learn more about SPARQL?


Step 1, learn Turtle. If you can comprehend YAML, you should feel fairly comfortable.

Step 2, I'd try SPARQL by example. There's a good chance that if there's an SQL concept you want, such as LIKE matching, there's a SPARQL equivalent (FILTER with regex()).
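For instance, a LIKE-style lookup might be sketched like this (the data shape is assumed to follow the SKOS labels used earlier):

```sparql
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

SELECT ?code ?label
WHERE {
  ?code skos:prefLabel ?label .
  # roughly LIKE '%wax%', case-insensitively
  FILTER regex(?label, "wax", "i")
}
```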

Luckily, 95% of what you learned with Turtle is simply reused by SPARQL - it introduces variables, WHERE clauses/graph patterns, filters, and a few other things... but that's really all that's new.

Where to from here?

If you were to deploy this internally within an organisation, your service is pretty much good to go. You may want to look at Graph Access Control to add in some security, and the related SparqlServer/Update APIs.

Was this easy enough?

In comparison to the other approaches I have seen, it's fairly good.
  • It's trivial to put a front-end on your triplestore.
    You can roll your own with a minimum of fuss, or use things like https://github.com/kurtjx/SNORQL to provide an 'expert user' ability to inspect your data.
  • Adding, removing, etc aliases is trivial - there's no schema to migrate or anything else troublesome, and you can add in extra data at the drop of a hat - even if it's unrelated to your core set.
  • It's trivial to relate concepts to each other.
  • Your ontology (schema) is already there for code tables - http://www.w3.org/TR/skos-reference/ - you'll never have to reinvent that
  • There are products available that let you tie your application behaviour/code tables right into Confluence or other platforms.

Saturday, September 08, 2012

rspec, let(), let!(), activerecord and global state

Yesterday at work I was beating my head against a brick wall. The setup we have is machinist, rspec, and devise in an otherwise standard rails application.

We have blueprints for Users and Profiles, which are all pretty straightforward. When doing coverage of security-related scenarios, we've got a helper

def login_user
   @user = User.make!

   # .. and bits to make a faux current_user
end

All fairly straightforward so far. A typical security test might look like this:

describe "cat" do
  login_user

  let(:profile) {
    Profile.make!(:user_id => @user.id)
  }


  let(:cat) {
    Cat.make!(:profile_id => profile.id)
  }
  
  let(:brush) {
    Brush.make!(:way => "wrong", :owner => @user.id)
  }

  it "should not allow users to rub them the wrong way" do
    should_not be_able_to :brush, cat
  end


  it "should allow users to feed them" do
    should be_able_to :feed, cat
  end
end

Seen the problem? Of course, let() is being used as a kind of setup(), and if the guts of our code actually rely on the relationships created by executing the above code, you'd expect a User, Profile, Cat and Brush to be made for each it() block.

You'd be wrong, like I was - so, helpfully, a colleague tells you to always use let!() to ensure it's executed and instantiated.


describe "cat" do
  login_user


  let!(:profile) {
    Profile.make!(:user_id => @user.id)
  }


  let!(:cat) {
    Cat.make!(:profile_id => profile.id)
  }
  
  let!(:brush) {
    Brush.make!(:way => "wrong", :owner => @user.id)
  }

  it "should not allow users to rub them the wrong way" do
    should_not be_able_to :brush, cat
  end


  it "should allow users to feed them" do
    should be_able_to :feed, cat
  end
end

... and the test still doesn't work as you'd expect, even using let!() as a setup().


describe "cat" do
  login_user


  let!(:profile) {
    Profile.make!(:user_id => @user.id)
  }


  let!(:cat) {
    Cat.make!(:profile_id => profile.id)
  }
  
  let!(:brush) {
    Brush.make!(:way => "wrong", :owner => @user.id)
  }

  it "should allow users to feed them" do
    should be_able_to :feed, cat
  end

  it "should not allow users to rub them the wrong way" do
    should_not be_able_to :brush, cat
  end

end


But this version does.
What?

So after much further staring, I remembered that let() and let!() are all about caching - you call it once and the result is persisted across tests.

That's not the case with login_user, which executes every time - so all of a sudden @user.id is different in multiple places, and it's completely non-obvious: let() is creating global state.
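The difference is easier to see stripped of RSpec entirely. Here's a simplified plain-Ruby sketch (the real RSpec machinery is more involved) contrasting a memoised let()-style block with a before()-style block that re-runs every time:

```ruby
# Simplified sketch of let()-style memoisation vs before()-style re-execution.
def make_let(&block)
  memo = nil
  run = false
  lambda do
    unless run
      memo = block.call   # runs the block the first time only
      run = true
    end
    memo                  # every later call reuses the cached value
  end
end

let_calls    = 0
before_calls = 0

number = make_let { let_calls += 1; rand }
setup  = lambda  { before_calls += 1; rand }

number.call; number.call   # memoised: the block only ran once
setup.call;  setup.call    # re-runs every time

let_calls     # => 1
before_calls  # => 2
```

Mix the two styles in one spec and you get exactly the stale-value surprises described above.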

Why would anyone want to add global state to testing frameworks? It just causes Spooky Action at a Distance, like above.
What do we want to do instead? There's before(), which should almost always be used, as it's the direct analogue of setup() from the xUnit family of tools.
But let() is there, so what's it actually for?

Dan's rules for let()

  • Only use this for otherwise dynamic values or complex calculations that you do want to fix at a point in time.
    let(:time) { Time.now }
    let!(:number) { rand }
  • If you have to use let() as a setup type thing, try to make One Big Setup
  • Don't listen to co-workers unless they are offering to buy you a beer


Tuesday, August 21, 2012

Ruby 1.8.*, UTF-8 and Oracle / Oracle Enhanced / OCI-8

... is a mess.

For future reference:
  1. Check that select value from nls_database_parameters where parameter = 'NLS_CHARACTERSET'; returns AL32UTF8 or similar (AMERICAN_AMERICA.AL32UTF8 is the NLS_LANG form, not the database character set)
  2. In your boot.rb, prior to any gems being loaded, add
    require 'rubygems'
    
    # Set up gems listed in the Gemfile.
    ENV['BUNDLE_GEMFILE'] ||= File.expand_path('../../Gemfile', __FILE__)
    ENV['NLS_LANG'] = 'AMERICAN_AMERICA.AL32UTF8'
    
    require 'bundler/setup' if File.exists?(ENV['BUNDLE_GEMFILE'])
    
  3. Curse repeatedly at Oracle's database-wide character sets, rather than MySQL's table-based character sets - if you are migrating, you have to migrate every schema being used.
  4. Export your data, change the character set with ALTER DATABASE CHARACTER SET, and re-import.