Stories by Bruce Becker on Medium

Ansible Style Guide in Action

Bruce Becker — Thu, 26 Jul 2018 15:09:23 GMT

Step into the ring…

A few weeks ago, I announced a style guide for developing Ansible roles. The intended audience is the developers of middleware components, and the aim of the guide is to improve our ability to collaborate and to deliver products smoothly and reliably, without breaking the infrastructure in general.

A typical case would be an existing product which performs some specific function e.g., a storage management front-end service. Another case would be the one I want to use as an example here — the so-called “worker-node” function.

Case: Worker Node

The worker node is essentially a composition of clients which interact with infrastructure components:

“validate user token”
“get data”
“submit workload request to your local resource manager”
“check how that’s going”
“send accounting data”

etc. If we were starting out now, these functions might well be built as serverless endpoints, but as with all things infrastructure-related, one has to deal with the legacy of what came before.

The worker node function has typically been distributed as a meta-package in OS repositories — an RPM or DEB which expresses all of the necessary dependencies. A site wishing to provide the worker node function could therefore easily ensure that this was present by simply installing the metapackage. That is, if the prerequisite state is assured.

That’s a big if and a big ask in 2018.

Building the worker node now

If we had the case of a totally new site in the federation wishing to participate by offering compute resources, we would probably want this site to be integrated and functional with as little demand on the site itself as possible. If we start from this position, we might well consider the site resources and the layer of middleware necessary to federate it as separated by a well-defined contract: we (the federation) give you a bunch of endpoints to send data to, and you (the resource provider) send the data. We’ll go one step further and provide you with the functions which send that data, so that there is zero interference with your setup.

This separation of function from platform is why containers were developed.

Modelling services

How would things be different if we approached this from a 12-factor point of view? We have to deal not only with the installation of binaries and other files, but also with their configuration around site-specific setups, procedures and policies. The last thing a product team wants to deliver to the site which will eventually use it is a product which doesn’t play nicely with the rest of the environment. This could include, for example, hard-coding certain paths, asserting the presence of particular users, or using the network in a specific way. All of these would be examples of “bad behaviour”, since the integration of the site into the federation is not done according to a central prescription, but according to an OLA (Operational Level Agreement) agreed to by both parties.

We therefore need to deliver not only products, but also strategies for deploying those products, which are flexible enough to respect local site policies. If we are to be fluid, we also need a high degree of trust that the final result will not only perform as advertised (i.e., works, and does what it needs to do), but also won’t break local setups. The Ansible Style Guide describes aspects of developing, testing, documenting and delivering the role. It is more about how than what, because the overriding, big-picture goal is to solve problems and have them stay solved.

The way to do this is, as with most engineering problems, to factor out the big problem into smaller ones in some logical way. Doing things in this way, we come to have a sort of “dependency tree” of roles, so that infrastructure engineers can separate problems and solve them permanently.
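To make the idea of a dependency tree a little more concrete, here is a minimal sketch in Python. It is only an illustration, not part of the style guide: the role names are the ones used in the playbook later in this post, but the edges between them are assumed for the example. It uses graphlib from the standard library (Python 3.9 or later) to work out an order in which roles could be applied so that every dependency comes first.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each role lists the roles it depends on.
# The edges are assumptions made for illustration only.
ROLE_DEPENDENCIES = {
    "EGI-Foundation.umd": set(),
    "EGI-Foundation.voms-client": {"EGI-Foundation.umd"},
    "ansible-role-wn": {"EGI-Foundation.umd", "EGI-Foundation.voms-client"},
}


def application_order(dependencies):
    """Return the roles in an order where dependencies always come first."""
    return list(TopologicalSorter(dependencies).static_order())


order = application_order(ROLE_DEPENDENCIES)
print(order)
```

A cycle in the map raises graphlib.CycleError, which is a handy early warning that two roles have become tangled together.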

Figure 1: A simple representation of the dependency graph of Ansible roles for the UMD User Interface.

This has the happy consequence that end users (typically site administrators) can re-use these products with confidence at their site, know where to go for support, and understand how to contribute back. Looking at the simple case of building a User Interface, shown in Figure 1, this is quite easy to understand. We can even link the roles themselves to various actions and outputs, as shown in Figure 2.

Figure 2: Representation of the expression of the Ansible role and the resulting product — container images in EGI's Quay Organisation. The vertical axis describes the dependency graph of the Ansible roles for UMD products, while the horizontal axis shows how these are expressed in various environments by applying them. The final products (container images in this case) are immediately re-usable.

In this way, we can continue modelling individual roles and map events in source code to artifacts in production. The final touches to our modelling flow are added in Figure 3, where we add the links to the respective GitHub repositories and the all-important testing phase — more on that in a later section.

Figure 3: Schematic diagram of the full continuous integration and delivery of UMD configurations, as well as the dependency tree of the respective Ansible roles, for the simple case of the User Interface. In this case, we deliver pre-built Docker images to the Quay registry. Testing is done with TestInfra, a Python-based infrastructure spec tool.

Action

Now that we have a clear idea of how to go about modelling our roles, and putting the tools in place for our continuous integration and delivery pipeline, we can take a closer look at using the EGI Ansible Style Guide to get started.

Getting started

The first thing you need to do is get the style guide, and use it to create a new Ansible role. Ansible roles are usually generated with the Ansible Galaxy CLI command init, but this uses a role skeleton which doesn’t cover many of EGI’s bases. We therefore use the egi-galaxy-template in the Style Guide repo to generate a better one:

git clone https://github.com/EGI-Foundation/ansible-style-guide
ansible-galaxy init --role-skeleton=ansible-style-guide/egi-galaxy-template ansible-role-wn

We now have a shiny new Ansible role: ansible-role-wn. Before we go about implementing it, we need a means of writing tests and generating test scenarios. Typically we use Molecule for this, which is great for generating a full set of test scenarios and strategies.

Install Molecule with pip, and generate a scenario, using a virtualenv:

$ virtualenv style
$ source style/bin/activate
(style)$ pip install molecule
(style)$ molecule init scenario -r ansible-role-wn

Initial Commit

At this point we have an empty (but stylish) role in a clean environment and a default testing scenario.

Running the test strategy should result in all of it passing:

molecule lint
molecule dependency
molecule create
molecule converge
molecule verify

This means absolutely nothing, of course: we need to start adding some failing tests!

Tests and Development of Roles

The EGI UMD follows something similar to an Acceptance Test Driven Development pattern.

There are several products, each of which is tested independently upstream by its owners, and candidates for inclusion in the distribution are then communicated to the release co-ordination team. This team then checks whether the UMD Quality Criteria are respected by the product, and whether the new version breaks anything already in production. There are several strategies for doing this, and the one which makes the most sense varies from product to product. Then, of course, there is the expected functionality of the product as it would be in production. Finally, there is the consideration that we expect these roles to be deployed into production, which means that the configurations should be hardened and secure by design. Deploying faulty configurations into production environments, even with fully-patched software, can lead to serious degradation in operational security.

We therefore need to implement tests for each of these, as far as we can.

Test-Driven Development, from Extreme Programming, suggests that engineering proceed on a “Red, Green, Refactor” cadence.

Red

Considering we are developing the functionality of a worker node here, the first thing we could check for is that the relevant packages are actually present. Using TestInfra’s package module, we can write this assertion.

def test_packages(host, pkg):
    # pkg is supplied by a fixture or by parametrising the test
    assert host.package(pkg).is_installed

Seems simple, right? All we need to do is pass the correct fixtures to the function test_packages, to see whether the host we will provision with molecule is in the desired state.

It is important to remember what we are testing for here. We are not testing whether the Ansible playbook has run correctly — or even whether an Ansible playbook has run at all — we are simply making assertions about the host. These assertions should be true no matter how the host arrived at its current state, and of course should reflect the desired state in production environments.
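The point about testing state rather than process can be sketched without a real machine. The Python below is only an illustration under stated assumptions: StubHost and StubPackage are hypothetical stand-ins for what TestInfra’s host fixture provides, and the package names are placeholders rather than the real worker node list.

```python
from dataclasses import dataclass

# Hypothetical package names standing in for the real worker node list.
WN_PACKAGES = ["wn_pkg_1", "wn_pkg_2"]


@dataclass
class StubPackage:
    is_installed: bool


class StubHost:
    """Minimal stand-in for TestInfra's host fixture, for illustration only."""

    def __init__(self, installed):
        self._installed = set(installed)

    def package(self, name):
        return StubPackage(is_installed=name in self._installed)


def missing_packages(host, required):
    """List the required packages which the host does not have installed."""
    return [pkg for pkg in required if not host.package(pkg).is_installed]


# The check only looks at the state of the host, not at how it got there.
fresh_host = StubHost(installed=[])
provisioned_host = StubHost(installed=["wn_pkg_1", "wn_pkg_2", "vim"])

print(missing_packages(fresh_host, WN_PACKAGES))
print(missing_packages(provisioned_host, WN_PACKAGES))
```

In a real Molecule scenario the stub disappears: TestInfra supplies host, and the package names can be fed to the test with pytest.mark.parametrize.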

We therefore need to consult the source of truth for the worker node package requirements (the same repository that the product team maintains, and which the UMD team has run its QC tests against) to write the fixtures for this test.

We can still converge the role with no problems (nothing has been implemented yet), but when it comes to running the tests (molecule verify), we will be duly informed that they are all failing.

Great success. Go ahead and add that test to the scenario:

git add molecule/default/test_packages.py
git commit -m "Added failing test for packages"
git push

Note: using the EGI Ansible Style Guide, there is a .travis.yml already set up for you if you want to do CI on Travis. All you need to do is enable the repository and Travis will take care of the rest.

Green

The next step in TDD is to implement just enough code to make that test pass. With Ansible, this is almost too easy:

First, create a variable in defaults/main.yml to hold the packages that need to be present, taking into account differences across operating systems and OS releases:

---
# defaults/main.yml
# Keys follow ansible_os_family (lower-cased) and ansible_distribution_major_version
packages:
  redhat:
    '6':
      - wn_pkg_1
      - wn_pkg_2
    '7':
      - WN_1
      - WN_2
  debian:
    '8':  # jessie
      - worker_node
    '9':  # stretch
      - worker_node

Next, add a task which ensures that those packages are present:

---
# tasks/main.yml
- name: Ensure worker node packages are present
  package:
    name: "{{ item }}"
    state: present
  loop: "{{ packages[ansible_os_family|lower][ansible_distribution_major_version] }}"

Here, we take advantage of the facts gathered by Ansible identifying the host OS and version — which of course is why we crafted the variable packages in the way we did.
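For readers less familiar with Jinja expressions, the lookup in the loop can be mirrored in plain Python. This is a sketch, not how Ansible evaluates it internally: packages_for is a hypothetical helper, the package map is cut down to a single family, and the fact names used (ansible_os_family and ansible_distribution_major_version) are the ones Ansible reports when it gathers facts.

```python
# Hypothetical package map with the same shape as defaults/main.yml.
PACKAGES = {
    "redhat": {
        "6": ["wn_pkg_1", "wn_pkg_2"],
        "7": ["WN_1", "WN_2"],
    },
}


def packages_for(facts, packages=PACKAGES):
    """Pick the package list for a host from its gathered facts."""
    family = facts["ansible_os_family"].lower()
    major = facts["ansible_distribution_major_version"]
    try:
        return packages[family][major]
    except KeyError:
        # No entry for this platform: fail loudly rather than install nothing.
        raise LookupError(f"no package list for {family} {major}") from None


centos7 = {"ansible_os_family": "RedHat", "ansible_distribution_major_version": "7"}
print(packages_for(centos7))
```

The explicit error for an unknown platform is a design choice worth copying into the role itself: a host which is not covered by the variable should make the play fail, not pass silently.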

Of course, these tasks need to be applied in an actual playbook. Molecule creates the simplest possible playbook for the scenario for you:

---
# molecule/default/playbook.yml
- name: Converge 
  hosts: all
  roles:
    - role: ansible-role-wn

This playbook is used during the converge stage. If any dependencies are required (which are now clear from our dependency tree!), they can be added before the role you are working on:

---
# molecule/default/playbook.yml
- name: Converge
  hosts: all
  roles:
    - {role: EGI-Foundation.umd, release: 4, tags: "UMD"}
    - {role: EGI-Foundation.voms-client, tags: "VOMS"}
    - {role: ansible-role-wn, tags: "wn"}

Once we have implemented the functionality, we repeat the converge and verify until the tests are passing.

Refactor, repeat

Figure 4: Schematic representation of a Test-Driven Development of an Ansible role.

Once the tests are passing, we take another look over our code and tests and try to ascertain whether the tests are really doing what we want them to do and whether that part of the role has been implemented in the best possible way. Figure 4 shows a general workflow of how this should be done.

Conclusions

Clearly, we are not done with the development of the worker node role. However we can be sure that application of this role to any production site will not break the site — a very important point! We now have a solid base from which to step to the next iteration, adding tests for desired behaviour and functionality to achieve it as we go. We also have the means to express this role in arbitrary environments — be they bare metal, hypervisor virtualisation, or Linux containers — all from a single well-maintained role.

As discussed above, the worker node needs to be able to perform many functions, and we should try to implement tests for as many of these functions as we can. Similarly, as many of the EGI Quality Criteria as possible should be included in our test coverage, so that we can assure sites that by applying these roles off-the-shelf, they will be increasing the stability of their site and decreasing their day-to-day operations load.

Furthermore, by using a common style guide for developing these roles, we make it easier for others who want to contribute to get started. The style guide helps peers and collaborators do code review when features or developments are proposed via pull request, and gives clear guidelines for how these contributions should be recognised.

All in all, this is a small step towards improving the stability of sites in the EGI federation, without compromising agility and quality, and reducing the friction in the middleware delivery pipeline.

Originally published at brucellino.github.io.

e-Infrastructure Components that are Built to Last

Bruce Becker — Sat, 14 Apr 2018 00:00:00 GMT

Fashions come and go, but style never goes out of style.

TL;DR We have started work on an Ansible Style Guide. We hope it will encourage re-use and collaboration in our community, increase the velocity of delivery, and ultimately improve the quality of our infrastructure.

The dark art of turning computers into science

“e-Infrastructures” composed of many different ICT services underpin much of modern research activity. Despite their importance, they are rarely seen or interacted with directly by researchers themselves, which makes them difficult to talk about. Just like any other kind of research infrastructure, however, they have to be built, and the way in which they are built and delivered informs aspects of their usability, fitness for purpose, scalability, sustainability, cost-effectiveness, and more. There are many ways to describe what e-Infrastructures are, but I like to think of them as:

e-Infrastructures: the dark art of turning computers into science.

There are of course as many ways to turn computers into science as there are computers (or scientists, for that matter) — but there’s only so far you can get by yourself. At some point, you need to scale — either by accessing more computer resources, or by accessing more people through services which enable collaboration. This is where you start moving from doing research which needs computers, to research which needs infrastructures. This is a qualitatively different undertaking.

The same thing that makes this research (and hence the e-Infrastructures which underpin it) so tricky also makes it powerful: you can’t do it in isolation. The value of e-Infrastructures lies not in their components, but in the interaction between these components, and the most important component in them has always been people.

Now, if we want to wield the magic that turns computers into science, we’d best do it at the pace which research itself requires, preferably without breaking things. We’ve not been around long enough to see where “Move Fast and Break Things” leads eventually. Spoiler alert: it’s not good (i.e., everything is broken).

Based purely on my own experience, however, infrastructures often lag considerably behind the pace of development of their components. This makes sense: there is always more rapid development at the edges of technology than at its centre; it’s called the “Bleeding Edge” for a reason, after all: that’s where most of the pain is! But why do we have to make a choice between a decaying, if stable, infrastructure and one which is so brittle that it hurts to use it? There is perhaps a sweet spot in terms of proximity to the bleeding edge: a situation where we can move fast enough for infrastructures to adopt relevant new technologies (before communities get fed up with the slow pace and branch off on their own), but not so fast that we break things.

Keeping pace, together

How can we balance the need for change with the need for collaboration and co-operation?

I want to discuss one specific case here, related to the smooth delivery of e-infrastructure service components. What I’ll describe is a style guide for developing the code which builds our services, and how that guide can be an expression of our operations methodology and our culture of collaboration, mutual support and desire to build something that our children will still be using. I will be using the case of Ansible roles, but hopefully the points discussed here, as well as the later implementation, will be generic enough to cover other use cases too.

Infrastructure as code

The “Infrastructure as Code” pattern has come to some maturity recently, but refers mostly to “Managing Servers in the Cloud”. What about “Managing the actual cloud”? Well, this may be well covered by a similar pattern (or more precisely, job description): the Site Reliability Engineer, or SRE. To quote Google:

SRE is what you get when you treat operations as if it’s a software problem

While we’re a long way away from that, there is a quiet shift happening in the e-Infrastructure world. There is a recognition that infrastructures are complex, interacting systems, but that these systems can indeed be described by software.

We do have a pretty good framework for managing services in EGI, based on a few industry standards. These standards require certain procedures and documentation to be in place in order to comply with them, and help to ensure that services are delivered in a consistent, reliable way to customers and peers in a federation.

Compliance, however, can be attained in many ways: the standards make few statements about how requirements are met, only that they are. This leaves a lot of room for interpretation, which is a good thing if the standard is to apply in the widest possible set of cases. However, it can also leave room for confusion and conflicting styles to take hold, with some negative consequences we describe below.

Code as Community

Style is almost always a matter of opinion. As the old saying goes, there’s no accounting for taste. Modern configuration management tools use high-level languages such as YAML to describe what they do and how they do it, allowing developers and operators to communicate their work almost in plain English. The great irony is that while this makes it easier for individuals to read and write code, it can make it quite difficult for communities or even teams to do so, since individuals are prone to expressing their individual style. Differences in style can be a really positive thing, allowing freer-thinking, more creativity and ultimately more satisfaction in working together, as long as there is consensus along broad lines as to what constitutes good style, and more importantly what constitutes bad style.

If a community is to truly be a community, the individuals which comprise it must have values and ideas in common, beyond what is required by the language, the framework and the standards adopted.

Part of the security that comes with adopting a tool like Ansible is the huge community that comes along with it. The fact that it is in use in so many different environments, with so many different goals and usage patterns, is in a way a vaccine against bias. This diversity can be channelled into some form of common understanding of what constitutes good style, and into highlighting where the flexibility of the tool or language is being abused (as well as whether that abuse is justified).

Almost all languages have their linters, and Ansible is no different. There are in fact two different style checkers for Ansible:

The jury is currently out on whether the latter is still alive, but given the tenfold ratio of almost all the metrics, I think it’s safe to say that at the time of writing ansible-lint wins.

Contributing to the Commons

In a service federation like EGI, there is a strong temptation for individual service providers to develop these roles themselves — a symptom of the “Not Invented Here” syndrome. The barrier to creating these roles is particularly low, especially if we consider the case where the community using these roles is empowered with solid knowledge of how the tool works. Much of the impetus to “rediscover the wheel” derives from the quality and reliability of the other wheels which have already been built. Instead of a robust design for “a wheel”, which can be re-used by anyone who wants to build a car, we end up with many flimsy wheels which just barely work. This is clearly a suboptimal situation — there is little to be gained by having such duplication of work, and the individual effort required to produce high-quality work is high. It is, however, a situation which nevertheless persists in part due to the lack of ownership of the products created.

How then can we create useful bits of infrastructure as a community, where these things are owned by the community itself? Ownership need not be restricted to the mere authorship of code — there are other ways to “own”, for example code reviews, bug fixes, contributing to the style guide, and of course ownership through usage, i.e. reporting issues, helping developers produce high-quality work, talking about the work at meetings, etc.

The main point here is that there are many roles to play, beyond the mere authorship of code, and each of them is important.

A guide, not a standard

Finally, a style guide is not a standard. It can be treated as one, but then it mostly ceases to offer the benefits of creativity described above. A guide is most useful when it is the sincere expression of consensus, based on the experience of a community of practice, of a better way of conducting an activity: not the only way, nor indeed the best way. A guide should be more descriptive than prescriptive, describing how one should go about doing something rather than what one should be doing.

We’ve hit this problem so many times that the time has come to address it.

Reducing “Not Invented Here” Syndrome

Let’s say you’re starting work on the development of a new role. This could be either an existing service that doesn’t have a configuration management repository, or perhaps you’re working on a whole new service. The chances are that this role already exists — but the only easy way to check that is to see if it’s on Galaxy. Let’s see:

ansible-galaxy search umd

Found 3 roles matching your search:

 Name             Description
 ----             -----------
 brucellino.UMD3  UMD3 repository for CentOS 6.x
 egi-qc.umd       UMD distribution repository deployment
 AAROC.UMD-role   Configures the Unified Middleware Distribution and Cloud Middleware Distrbution Stacks on your host

Uh-oh…

OK — but middleware components will be there, right? I’ll save you a lot of frustration, dear reader — they are not. This is not to say that a lot of work has not been done in our community in writing roles for “domestic use”. The tragedy is however that all this effort usually doesn’t produce a result of sufficient quality and scope that it’s reusable. Now, this is usually a problem with the role metadata, meaning that either it’s not enabled on Galaxy, or metadata doesn’t parse properly — but a larger problem is when roles are written to be so specific to a given use case or site that they cannot be re-used elsewhere.

Improving re-usability

For a role to be re-used, it has to be absolutely trustworthy, and this means putting some more effort into developing these infrastructure components, with a broader appreciation of its benefit to the wider community.

All of these problems could be entirely avoided, and transparently to the developer, by slightly changing the environment and making the development process a little more frictionless.

A better generator

Ansible, like almost any good tool out there, provides a neat way to generate a skeleton for a new project: ansible-galaxy init. It’s clear that many of the roles for EGI infrastructure that have been produced so far have not taken advantage of this, from the missing directories, files, etc., but even those that have been generated with Ansible Galaxy have conflicting or missing metadata, resulting in them failing to show up in the Galaxy search.

But why should we be fiddling about with metadata in the first place?

In terms of the middleware, we only have a few options, and these should be automatically added to the supported platforms in meta/main.yml. By the same token, if you’re developing a piece of infrastructure, it’s probably a good idea to have your role cover the possible platforms, and not just one specific option.
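As a rough sketch of what “automatically added” could look like, the Python below builds the platforms list in the shape that a Galaxy role keeps under galaxy_info in its metadata file (meta/main.yml in a standard role layout). The platform matrix and the galaxy_platforms helper are hypothetical, chosen for illustration; a real generator would take the matrix from wherever the supported platforms are agreed.

```python
import json

# Hypothetical platform matrix for a middleware role; adjust to what is
# really supported.
SUPPORTED_PLATFORMS = {
    "EL": ["6", "7"],
    "Debian": ["jessie", "stretch"],
}


def galaxy_platforms(supported):
    """Build the list stored under galaxy_info.platforms in the role metadata."""
    return [
        {"name": name, "versions": list(versions)}
        for name, versions in sorted(supported.items())
    ]


metadata = {"galaxy_info": {"platforms": galaxy_platforms(SUPPORTED_PLATFORMS)}}
print(json.dumps(metadata, indent=2))
```

Generating this block from one shared matrix means that no individual developer has to remember to fill it in by hand.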

Testing by default

Another sore point in re-use is knowing whether the role actually works. Sure, the documentation can express the limits of what the role is designed for, but again we hit the bias implicit in the developer’s mind. The only way to know if a role really does what it says it does is:

Apply it to various initial states.
Make assertions on the final state.
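These two steps can be sketched with a toy model. In the Python below, apply_role is a hypothetical stand-in for applying a real role, the “host” is just a set of package names, and the names themselves are placeholders. The shape of the check is the point: several starting states, the same assertions on where each one ends up.

```python
# Hypothetical desired state: the packages a role promises to deliver.
REQUIRED = frozenset({"wn_pkg_1", "wn_pkg_2"})

# A few different starting points a real site might be in.
INITIAL_STATES = [
    set(),
    {"wn_pkg_1"},
    {"vim", "wn_pkg_1", "wn_pkg_2"},
]


def apply_role(installed):
    """Toy stand-in for applying a role: make sure REQUIRED is present."""
    return set(installed) | REQUIRED


def check_final_state(initial):
    """Apply the toy role and make assertions on the result."""
    final = apply_role(initial)
    assert REQUIRED <= final, "desired state not reached"
    assert initial <= final, "something already present was removed"
    assert apply_role(final) == final, "second run changed the host"
    return final


final_states = [check_final_state(state) for state in INITIAL_STATES]
print(final_states)
```

The third assertion checks idempotence: applying the role a second time should change nothing.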

This borrows a lot from the Test-Driven Development (TDD) paradigm that Agilists know and love. Which now begs the question:

Where are all the tests?

There are two things we can do to improve both the re-usability of the role and the life of the developer:

Write the tests first
Generate appropriate test coverage along with the role skeleton

The former needs a whole post on infrastructure spec tests, which is in the pipeline. For the latter, we can easily include at least a default testing scenario with molecule, as well as a .travis.yml so that the role can have continuous integration.

This won’t solve all of our problems and certainly doesn’t guarantee re-use of existing roles, but laying this groundwork and making it easy to write solid, widely applicable roles will help. Infrastructure components should be reliable, do what they say they do, satisfy the needs of the community rather than the individual, and not introduce any vulnerabilities!

Stay tuned…


Originally published at brucellino.github.io on April 14, 2018.

DevSecOps

Bruce Becker — Sun, 08 Apr 2018 00:00:00 GMT

You can’t heal what you can’t see

I have been thinking about how to include a vulnerability scan in a pipeline to deliver applications to EGI FedCloud sites. It goes a little like this.

The big picture

A CSIRT’s job is never done! A distributed computing platform is inherently open to risks, even more so a collaborative platform. Ensuring that the platform is safe and secure for users is a thankless, full-time job. The attack surface can be very large indeed when one considers a platform like FedCloud — there are several layers to it which may provide vectors for exploits. While the majority of these can be locked down by the operators of these sites, at the end of the day, they are still used by … well, users.

Whose cloud is it anyway?

There is the usual thin line to tread between usability, ease of access, and security. Much of the appeal of a thing like FedCloud is the freedom of users — and the communities which they belong to — to define their own applications and workload scenarios, molding the basic infrastructure into something they are comfortable with. In essence, by providing a common IaaS or PaaS layer, FedCloud allows users to deploy arbitrary applications, at their own speed, under their own control. Of course, with great freedom comes great responsibility.

There is an inherent difference between users and operators: the former are trying to optimise their usage of an infrastructure, while the latter are trying to optimise its stability. It’s not that either of these players is malicious per se, but their different priorities generate a natural conflict, one which perhaps cannot be removed entirely, but which can be mediated.

Something to talk about

How could subtle changes to the environment improve the relationship between operators and users? Perhaps the first positive step is to surface issues before they become a problem. The second is to provide a common language for announcing vulnerabilities and clear, easy-to-execute instructions for mitigating them when detected. With Dev (the users), Sec (the CSIRT) and Ops (the infrastructure) all on the same page, any conflicts can be discussed in an objective manner.

Prevention is better than cure

Currently, EGI CSIRT does a great job of scanning the endpoints of the infrastructure to detect vulnerabilities. This is necessary, since vulnerabilities are not a static set: new ones are being found continuously. Fixing machines that are already deployed is a necessity in order to provide a secure platform, but what about the new applications that are built and deployed by user communities? Wouldn’t it be nice if these applications could be checked before they were deployed? In a perfect world, there would be a more-or-less well-defined pipeline through which applications could flow, before landing in the production environment.

Typically, this would include all the great things that you can imagine in a pipeline: continuous integration, testing, and delivery. It would be awesome if we could surface the vulnerabilities or security risks at the same time as surfacing run-time or deploy-time issues. Put differently, we wouldn’t deploy a broken application, so why deploy an insecure one?

How to spot a vulnerability

That of course begs the question:

How do we know that an application is insecure?

A naive answer appears to be “Just scan the damn things for known vulnerabilities”. Indeed, there are several tools out there for doing this, including Pakiti, clair and others. There are also tools for delivering “compliance as code”, particularly InSpec and TestInfra. Vulnerability scanners typically compare installed packages against a vulnerability database. They are designed to check OS packages. They may also work with some language ecosystems like Ruby and Node (maybe not so much with Python or Go), but that relies on specific packages which can be matched against a vulnerability database. These scanners do not actually do penetration testing, unlike network scanning systems or more advanced penetration testing systems.
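The package-matching approach is simple enough to sketch. The Python below is an illustration only and does not describe how Pakiti or clair are implemented: the advisory identifiers, package names and versions are all made up, and the version comparison assumes plain dotted numbers, which real package versions often are not.

```python
# Hypothetical advisories: every identifier, name and version here is made up.
ADVISORIES = [
    {"id": "EXAMPLE-0001", "package": "libfoo", "fixed_in": "1.4.2"},
    {"id": "EXAMPLE-0002", "package": "barutils", "fixed_in": "2.0.0"},
]


def parse(version):
    """Turn a dotted numeric version like '1.4.2' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def scan(installed):
    """Report advisories which apply to the installed package versions."""
    findings = []
    for advisory in ADVISORIES:
        version = installed.get(advisory["package"])
        if version is not None and parse(version) < parse(advisory["fixed_in"]):
            findings.append((advisory["id"], advisory["package"], version))
    return findings


# Packages as a package manager would report them on some host.
installed = {"libfoo": "1.4.0", "barutils": "2.1.3", "bazd": "0.9.1"}
findings = scan(installed)
print(findings)
```

Note what the scanner can see: only what the package manager knows about. Anything built from source never appears in the installed map, so it can never be matched.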

And herein lies the catch: what if your applications are not delivered as packages? This is indeed the case with, e.g., CODE-RADE, where everything is built from scratch and no “packages” are installed. We could, for example, tag builds according to the versions of the source code built, and then match those against the CVE databases, perhaps.

Although this is a design feature of CODE-RADE, it may be a mere convenience for many use cases. Users may simply hack their application into shape until it works, then tag a VM or container and call it a day. Detecting vulnerabilities introduced in such applications is going to be a risky business unless a true penetration testing suite can be introduced to the delivery pipeline. There are some tools, again mostly language specific, which can be called to the cause of keeping applications in the production infrastructure safe, e.g. OWASP Dependency Check.

Adding Sec to the DevOps pipeline

Let’s face it, we’re not running a Fortune 500 company here. We can’t have 100% secure applications, but we can do a damn sight better than what we’ve got right now! I propose a shift from vulnerability monitoring of the infrastructure to vulnerability testing of the applications before they even get there.

If development of research applications follows a continuous integration model, adding compliance and vulnerability testing to the pipeline represents just another step for the application to pass. Sure, there is a conceptual leap to make, and the one from “Trust me, Ops ! This opaque blob of data is totally benign!” to “Ok Ops, you’ve got your tests, I’ve got mine and they’re all passing” is perhaps a big one for many. In order to have people adopt this way of working, Ops needs to deliver a smoother experience and better support than they are currently delivering to users : builds for arbitrary applications, arbitrary environments and configurations and what Devs love about 12 Factor Apps : Dev/Prod Parity 5.

Something we can all trust

Once we have a pipeline, we can and should raise security or vulnerability issues wherever we can, along with all the infrastructure tests. Furthermore, these tests should be separated from the application tests themselves. In other words, if Ops provides a tested and immutable environment for Dev to build on, then the application should:

Ensure that it can build — Are the compilers and dependencies available ? Are the relevant infrastructure services available ?
Ensure that it is correct — Have errors been introduced into the environment in recent commits ? Does the application maintain internal consistency in the build and execution environment provided by Ops ?
Ensure that it will run — Does the execution environment permit the proper execution of the application, with access to relevant backing 6 services ?
Ensure that infrastructure remains immutable — Has the application made detectable changes to our infrastructure ?

That last point is key. The rest of the tests (integration tests, unit or functional tests) are up to Dev. But trusting Dev to ensure that Prod is ok is like trusting the fox with the chickens — there’s an inherent conflict of interest, even if there is, as is most often the case, no malice. No, these tests need to be maintained by Prod, in collaboration with Sec.
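As a rough illustration of the immutability check, one option is to fingerprint the environment before and after the application's pipeline step and compare the two. The sketch below assumes that “infrastructure” can be approximated by a directory tree and that SHA-256 digests of file contents are a good enough fingerprint; both are simplifications chosen for the example, and the file names are placeholders.

```python
import hashlib
import tempfile
from pathlib import Path


def snapshot(root: Path) -> dict:
    """Map each file under root (by relative path) to the SHA-256 of its content."""
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digests[str(path.relative_to(root))] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def detect_changes(before: dict, after: dict) -> dict:
    """Compare two snapshots and list added, removed and modified files."""
    return {
        "added": sorted(set(after) - set(before)),
        "removed": sorted(set(before) - set(after)),
        "modified": sorted(p for p in before.keys() & after.keys() if before[p] != after[p]),
    }


# Demonstration on a throwaway directory standing in for the environment.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "site.conf").write_text("setting = original\n")
    before = snapshot(root)

    # Stand-in for an application build or test step that misbehaves.
    (root / "site.conf").write_text("setting = changed\n")
    (root / "extra.bin").write_bytes(b"\x00")

    changes = detect_changes(before, snapshot(root))

print(changes)
```

A pipeline step owned by Ops could fail the build whenever any of the three lists is non-empty, which keeps the test independent of whatever tests Dev maintains.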

Asserting Compliance

We then come full-circle. EGI has an extensive list of security policies which can be used as a basis for writing compliance as code. They need to get out of whatever format they’re in now, and into something that can be executed. To quote the Chef pitch for Inspec:

Transform your requirements into versioned, executable, human-readable code.

Detect Fleet-wide Issues and Prioritize Their Remediation

Reduce Ambiguity and Miscommunication Around Rules

Keep up with Rapidly Changing Threat and Compliance Landscapes
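To make “something that can be executed” a little more concrete, here is a minimal sketch of policy rules written as code. It does not use InSpec; it is plain Python, and the rule identifiers, the settings checked and the shape of the `host` dictionary are all placeholders invented for the example rather than actual EGI policy text.

```python
# Hypothetical rules: identifiers and settings are placeholders, not EGI policy.
RULES = [
    {
        "id": "EXAMPLE-01",
        "title": "Remote root login is disabled",
        "check": lambda host: host.get("sshd", {}).get("PermitRootLogin") == "no",
    },
    {
        "id": "EXAMPLE-02",
        "title": "No world-writable files in the application tree",
        "check": lambda host: not host.get("world_writable_files"),
    },
]


def evaluate(host: dict, rules=RULES) -> list:
    """Run every rule against a description of a host; return (id, title, passed)."""
    return [(rule["id"], rule["title"], bool(rule["check"](host))) for rule in rules]


def compliant(host: dict, rules=RULES) -> bool:
    """A host is compliant only if every rule passes."""
    return all(passed for _, _, passed in evaluate(host, rules))


# Illustrative host facts; in practice these would be collected from the system.
host = {"sshd": {"PermitRootLogin": "yes"}, "world_writable_files": []}
for rule_id, title, passed in evaluate(host):
    print(f"{'PASS' if passed else 'FAIL'} {rule_id}: {title}")
```

Because the rules live in a file, they can be versioned and reviewed like any other code, which is the point of the pitch quoted above.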

Originally published at brucellino.github.io on April 8, 2018.

Update: Thanks to Baptiste Grenier for useful comments and proof-reading.

Memento mori

Bruce Becker — Tue, 05 Dec 2017 22:39:34 GMT

Even this shall pass

The African Data Infrastructure Community Map

Bruce Becker — Thu, 31 Aug 2017 08:01:01 GMT

Where the hell am I ?

Last week, we published a story on making sense of the data infrastructures community. There was a question on the table of how various initiatives and projects related to each other, and who was involved in what — questions which could best be answered by providing some visual representation of the stream of emails that tried to describe the situation. I started a map of relationships, which quickly spread out into a map of a wider community.

The response has been positively overwhelming, and we’d like to respond to a few common questions and issues raised.

I tried to summarise the effort as follows:

The map is a network of entities.
These entities are people and things.
Things are projects, organisations or subdivisions thereof.
People and things can have attributes, like tags. You can add any tag you like. Tags should describe the thing or the person itself, not its relation to anything else.
Things and people are connected via … wait for it… connections.
Connections can be directed. Connections must describe the relationship between things and people. Usually this takes the form of a verb, e.g. directs, employs, builds, etc.
Things are connected to other things via people. There should be few or no thing-to-thing connections, since such a connection is an abstract one. Sometimes it is necessary, when say you have an MoU or an institutional peering arrangement, but these should be really, really well-defined. The only obvious exception is the relationship between organisations and organisational units — e.g. The University of the Free State and the Library of the University of the Free State.
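The rules above amount to a small data model, which can be sketched as follows. This is not how Kumu stores the map; the class names, the `kind` field and the verb “includes” are assumptions made for the example, and the person is a placeholder rather than anyone on the real map.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """A node on the map: either a person or a thing."""
    name: str
    kind: str  # "person" or "thing"
    tags: set = field(default_factory=set)  # describe the entity itself

    def __post_init__(self):
        if self.kind not in ("person", "thing"):
            raise ValueError(f"unknown kind: {self.kind!r}")


@dataclass
class Connection:
    """A directed edge; the verb describes the relationship."""
    source: Entity
    verb: str
    target: Entity


def thing_to_thing(connections):
    """Return connections that link two things directly, for closer review."""
    return [c for c in connections if c.source.kind == "thing" and c.target.kind == "thing"]


# The person is a placeholder; the university and its library are the
# organisation and organisational unit used as the example in the list above.
person = Entity("A. Librarian", "person", tags={"data curator"})
university = Entity("University of the Free State", "thing", tags={"organisation"})
library = Entity("Library of the University of the Free State", "thing", tags={"organisational unit"})

connections = [
    Connection(library, "employs", person),
    Connection(university, "includes", library),
]

for c in thing_to_thing(connections):
    print(f"review: {c.source.name} --{c.verb}--> {c.target.name}")
```

Flagging thing-to-thing connections instead of rejecting them matches the spirit of the last rule: they are allowed, but each one deserves a second look.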

Questions

What is this ?

This is a map.

A map is a symbolic depiction emphasizing relationships between elements of some space, such as objects, regions, or themes. — Wikipedia

It is also data. This is a map of relationships between entities. See above for a deeper description of what’s on the map.

What is this for ?

It’s what all maps are for: to find out where you are, what’s around you and how to get where you want to go. To be more concrete, this serves a variety of purposes, depending on who is using it:

A resource for describing relationships between relevant entities in the e-Infrastructure communities
A tool to answer questions about the nature of the community (how well-connected are various national initiatives? What degree of separation is there between libraries and national initiatives ?), or specific entities in it (who do I need to contact at institute X about project Y? Who is funding initiative Z?).

Who is this for?

Every map has a perspective. No map is universal. Image: “The world from 9th Avenue, Saul Steinberg” (New Yorker, March 29 1976 cover)

Who knows? In a very limited sense, it’s for people who are directly involved in the construction of data infrastructure. However, data infrastructure has a lot of edges, and interfaces with a lot of other things — libraries, scientific consortia, government agencies, elements of the e-Infrastructure Commons.

If you are at all involved with scientific data in Africa, you will probably find yourself in this map.

Who is this not for?

We consider Open Data initiatives like that of the African Development Bank to be on the periphery of our map. The same goes for peer infrastructures across the world like EGI, EUDAT, OSG, OpenAIRE, etc. Every map needs a hic sunt leones and ours will have quite a few, at least initially.

What exactly do you mean ?

Kindly let me know what you mean by ‘data infrastructures’ and ‘involvement’.
- Moses Thiga

Great question… my tentative response is :

“Data Infrastructures” means

resources (storage, caches, archives, course material)
services (persistence and uniqueness, publication, movement, replication)
actors which extract knowledge from data — things and people which process and publish data: librarians, researchers, science gateways, automated agents.

Data Infrastructures are probably best not considered as a binary thing — either they are or they are not — they are best considered on a spectrum. The more they are used by and for more than one actor, the more they are infrastructure.

So, let’s say you have a university with a researcher who has a data set on their desk and does research with it. This is clearly not data infrastructure.

If that same dataset is shared with other researchers via something simple, it is still not data infrastructure, but it’s closer.

If that dataset is stored in an OAI-PMH-compliant data repository, indexed by a metadata harvester, published with a DOI, and that repository is connected to a national federation and a national compute infrastructure, then you are close to 100% a data infrastructure.

“Involvement” means that the person performs one or other function in this system — they are connected to an entity in the map (a library, an HPC site) via some verb (data curator, data movement service developer).

When in doubt, just shout out and we’ll deal with things on a case-by-case basis.

How can I use this ?

The data is published under a CC-BY-ND-4.0 license:

Attribution-NoDerivatives 4.0 International

You can re-use the data, as long as you cite the original data set. You may not change the data, unless through the agreed mechanism for contributing.

How do I contribute ?

Would you mind please to elaborate the how I can help.
- Fireweini Gebreegziabiher. And literally everyone else.

This seems like the obvious question, very politely put. One might rephrase it as

OK NOW WHAT !?

Perhaps we need to answer a different question first : what can I contribute ? There are two things one can contribute:

Knowledge.
Questions.

Knowledge is things you know — independently verifiable facts about entities in this map. This is the only hard criterion: whatever you think should be on this map — a person, an institute, a project, a connection between any of these — it needs to be an objective statement of reality.

Contributing and sharing data

If you have been doing similar work in the past, building up databases of people and institutes, let’s share our notes ! Kumu can ingest various data formats. If you would like to contribute your data, you would be added as an author of the data, and a co-owner of the license. Of course, this has no impact on existing conditions of your original dataset.

If you have any doubts about the validity of a particular piece of data, e.g. “Should this project be included?”, then likely it should; we just need to figure out how.

This means we’re in for a lot of discussion… I look forward to that.

Feedback

The easiest way to start these discussions is right in the map itself — Open an issue and we’ll take it from there.

Of course, you’re welcome to engage right here — leave us notes, share this article, let your network know. I look forward to seeing this map take ever more shape.

Thanks to all who got in touch, in and out of band, in no particular order:

Peter van Heusden
Kasandra Pillay
Ina Smith
Ronald Osure
Moses Thiga
Fireweini Gebreegziabiher
Uli Horn

The African Data Infrastructure Community Map was originally published in 🛠️ Building Data-Intensive Research Initiative for South Africa on Medium, where people are continuing the conversation by highlighting and responding to this story.

Making sense of the data infrastructure community map

Bruce Becker — Fri, 25 Aug 2017 10:59:27 GMT

Image credit : Pieter Goos [Public domain], via Wikimedia Commons

Splendid Isolation

Building something is easy when you’re doing it for yourself. You get to make all the decisions, and execute all the tasks, bound only by your own limits. Just focus on that burning idea in your head, and do it. However, as soon as you have to build together, things get a little tricky.

Data infrastructure is one of those tricky things. In terms of mere complexity and scale, it’s not what one would call a “pet project” — good infrastructure is built for and is usable by the widest possible group of people. This implies a certain amount of consideration into what gets built. It may help to add more people to the project, to speed things up… but that may complicate how things get built. It also doesn’t address why things get built or done in the way that they are.

I think it’s fair to say that it’s not the scale or complexity of the technology that makes it so difficult, but rather that of the community. Doing things with people is different to doing them for people. Projects become ecosystems, and making sure that one has a coherent understanding of the implications of any particular action down the line requires a fine understanding of the subtle connections in that ecosystem. Making sense of this community — especially how you or others fit into it — can be confusing, to say the least. But that doesn’t mean we shouldn’t try.

Community and connections

During the Sci-GaIA Final Conference, we had a presentation from Mario Marais, on “Communal Sensemaking using Social Mapping”.

https://medium.com/media/8a58590c6ecaec8aed8fcfb1902be8e4/href

This presentation inspired and educated me about how one could slowly build up a picture of the relationships, connections and other features of an ecosystem of stakeholders, and eventually make sense of it. Mario demonstrated how Kumu could be used to enrich the nodes and metadata of the network. Mapping tools like Kumu can really help to visualise the shape of a social network, as well as many more aspects besides.

Making sense

Now, I have recently started working in the data infrastructure community. As a newcomer, I have a lot of catching up to do, to make sense of what I’m being asked to do, how it impacts others, and vice-versa. One of the exciting developments which has recently started was the Figshare pilot, announced during a few workshops held across the country. As a developer of the infrastructure which eventually needs to inter-operate with these repositories, I need to know which institutes are participating, who is involved, and what state the deployment is in.

Sure, I could have spent the rest of my life sending emails around, but…

https://medium.com/media/67954b979849715f1c59bab4c7da3e1f/href

This information — these connections and meaning and context — that’s all data. Precious data. I want to do data things to it, like preserve it, protect it, version control it, analyse it, use it to tell stories to convince people of things I don’t even know yet.

I want to make sure this data is there for future me, but also for everyone else in the ecosystem, so that they can help make sense of it, improve, add to and correct it, as we go.

Never Alone

There’s an internet meme which people use to express their solitude and sadness in a hyperconnected world. It’s particularly tragic when, in a sea of potential collaborators, we feel isolated and disconnected. There is a natural tendency amongst many people to see connections, and a temptation to connect the dots. However that internet meme also expresses a kind of subtle acceptance of the disconnected status quo. We simply cannot afford to ignore the connections that exist between us.

It is inevitable that subjective impressions are created (does this person really connect to that project ? In what way, specifically, does this person interact with this person ?) so we need to be careful in constructing such community maps. It’s very easy to misrepresent others, and indeed to create a distorted view of reality.

https://medium.com/media/e930d2b00c032a698af134062b0c2efd/href

As you can see, I started with the simple spoke-like map of the Figshare pilot implementation in South Africa. Each participating institute got a representative (which I gleaned from an email chain). This immediately allowed me to start adding metadata to enrich the nodes, such as the node type (person, institute or project), as well as relationship type (e.g. employed by, directs, advisor to, funds, etc).

This map immediately answered the one question I had had —

How is the figshare project linked to DIRISA?

Answer : through Xolani Nkosi ! Note that this is a person, but also a node in a complex network.

Next question: Where are you ?

This map doesn’t have to be perfect immediately — it can and should evolve. But now everyone with an interest, or a connection to data infrastructure in South Africa or our region can stake and claim their place.

I look forward to making sense of this jungle together.

Making sense of the data infrastructure community map was originally published in 🛠️ Building Data-Intensive Research Initiative for South Africa on Medium, where people are continuing the conversation by highlighting and responding to this story.

What’s in a name?

Bruce Becker — Tue, 01 Aug 2017 06:47:20 GMT

O, be some other name!
What’s in a name? that which we call a rose
By any other name would smell as sweet;

Perhaps this happens in every contemporary man’s life. At some point, he stops, looks around and wonders what strange carriage brought him to this particular station.

Banalities of the waiting room make it difficult to concentrate, distract my attention and derail my thoughts. Recovering from the intrusion, I take another look at my surroundings. Vending machines, rebellious announcements scribbled on an otherwise bare yellow wall, half a family tree seated beside me.

The sky darkens. After weeks of oppressive heat and squinting at blue skies, the thunder outside induces a counter-intuitive calm.

We have arrived. Today is the day… whoever you are, today we’ll be meeting you, in this familiar, crowded, noisy hospital.

We seem ready. We’ve been here before, for your brother, then for your sister. Your mother and I (well, perhaps just your mother) have done everything, taken every precaution, measured every risk and considered every angle. All of the aspects of adulthood are here, all of the trappings of experienced parents.

Except one.

You don’t yet have a name

You are number three, the third. We couldn’t rightly call you “Bruce Becker III”, pronounced “Bruce Becker the third”, as awesome as that sounds, with its air of inheritance and lineage. For one thing you’ve not much inheritance to speak of, and what little is there you will have to share with your brother and sister who came before you. What is more, your lineage only goes back so far. You are being born into a new family, one which stretches back just as far as the eye can see, but no further. Perhaps this is just the view from where I stand though, with my obstinate inclination to the future. Where I come from, you’re better in the future, kid.

No, we can’t very well call you anything to do with this line. I had had a name chosen for you, to tell the truth, depending on who you turned out to be. So long ago, ages, a lifetime, I remember a song, heard through a dream as barely more than a teenager…

Elle vole le velours et la soie
Qu’offre la guitare à l’infante
Pour se les poser dans la voix
Belle Isabelle quand elle chante

In the gardens of the Alhambra, still naive, I decided that should I have a daughter, she should be named Isabelle. That name spoke to me through the carved corridors and discreet fountains of the palace, and through the image of a woman who would be the inspiration for mirth, art and music.

Isabelle. What poetry lay in that name.

You are already late. It’s been more than a week since you were due, and now we’re getting nervous. No matter what, today is your day, though. Your mother and I move to the delivery room, your grandparents go for lunch and a walk. The sky finally breaks and the water held within its towering clouds comes crashing down all at once. While your mother attempts desperately to induce her contractions and convince herself that you are getting settled into position, I’m reminded of the long periods of our separation, while I lived in Pretoria, and of the summer downpours there. Another song makes its way into my sub-conscious.

Amìala ch’â l’arìa amìa cum’â l’é
Amiala cum’â l’aria ch’â l’è lê ch’â l’è lê

Dolcenera… The names of all the women in songs I’ve ever heard come to me… Caroline. Elizabeth. Eve. Rose. Matilda. Louise. Lucille… Isabelle.

The doctor comes in, has a look around, helps your mother break the water, tells us he’s giving us an hour or so to see if labour will start, then it’s into the operating room for a C-section. It’s not that things are looking bad, just that this is not what we had planned for. Of course, we all had this suspicion that there was something not quite right with how you were sitting in that comfortable vessel of yours. Too high, facing the wrong direction; stubbornly and brazenly with your face in our direction, almost as if to say “Here I am, look at me”. Yes, we did look at you, and with some elegantly Fourier transformed sounds got to see your face through the protective shell of your mother’s womb… but that didn’t detract from the fact that you were facing the wrong way.

“Bruce, che lo chiamiamo se è maschio ?”. Your mother is starting to get worried. I am calm, though. I’ve been here before.

I can’t remember exactly when it was that I decided that my first born would be called Duncan, but I had no doubts as to why. Perhaps it was when I walked into that office, now empty, with his last message chalked up on the board for me. Perhaps it was during the endless late night chats with Alison. Perhaps it was when the message came that day that something terrible, something final had happened on Huascaràn down in Peru. I can’t remember — I can’t be expected to remember, at this distance. I just know that at some point, I looked around for a reference and found Duncan. This man had lived and above all, that is what I wanted for myself and for whoever came after me. A conscious, daily act of volition.

Duncan (the name, not the man), followed me from woman to woman, alighting in my mind as nothing more than a possibility. Nobody but your mother had the — shall we say “good fortune” — to be with me in the right place, at the right time, to start a family.

When finally that day came, it was nothing like I had imagined, and in a place I barely knew, much less understood. When I first proposed that name, I was taken aback by the derision that it inspired. After a time, since I was deeply in love with your mother and treasured her desires, I chose to let her choose. She chose “Federico”. A good name, no doubt, for a very good boy.

When your sister came along I was no less prepared than I am today. Again, we did not want to know whether we were awaiting a boy or a girl and again, the list of names started out long and became fractious. The story of your sister’s name is one for another time — this is about you, number 3 — suffice to say that I still longed to bring Isabelle the muse into the world. Again, I loved your mother, and out of love chose to give her the gift of choosing your sister’s name. She was no less in love with “Costanza” than I was with Isabelle, after all.

Time is up. Despite the insistent, painful exertions of your mother, despite your dried and hollowed-out capsule, despite your overdue stay, you will not budge. I laugh, as I always do when the situation is bad, and there’s nothing you can do about it. Your mother never understood this laugh, I suspect, and I do not have the words to explain it to her. How can I explain to her that it is my love for what is over what we desire that provokes it. She does not understand what I mean by objectivity, she comes from a world where the most important parts are deeply human. Although I am separated from that world, there has always been a bridge across it, between us. Something which allowed us to share thoughts and emotions without the least bit of translation necessary. A bridge wrought of the music of a dead poet…

We have a few minutes to prepare before she is led off into the operating room. Whoever comes out of that door with her, it will be you. And now, you have a name.

While we wait for the news, I hum to myself.

ricorda Signore questi servi disobbedienti
alle leggi del branco
non dimenticare il loro volto
che dopo tanto sbandare
è appena giusto che la fortuna li aiuti
come una svista
come un’anomalia
come una distrazione
come un dovere

- “Smisurata Preghiera”

Music invades my inner voice. There are no more useless diatribes on who and why in the past. There are no more names floating around, demanding to be heard, demanding to be given their time in the light.

What’s in a name ?

“È nato ! È nato !

Benvenuto, Fabrizio !”

This is our life. It is not in the Alhambra, nor on Huascaràn. We are in this ancient, provincial, rich, welcoming, corrupted, bigoted, beautiful, damaged island.

And you are my son, named for the poet who sang the bridges between your mother and I.

Per chi viaggia in direzione ostinata e contraria.

Benvenuto, Fabrizio.

Test-Driven Teaching

Bruce Becker — Tue, 29 Mar 2016 00:00:00 GMT

TL;DR — Can we really get the bots to help us in an online school, or are we just going to create more work for ourselves ?

We are facing a small dilemma in the development of a short technology-heavy course which we are going to run online for the Sci-GaIA project winter school which starts on the 1st of April. The dilemma goes to the heart of why there is so much inertia in our methods — whether those be training, writing documentation, or actually designing new services and tools — whilst we on the other hand continually strive to stay at the cutting edge of e-Science. It’s almost easy to stay up to date with products out there — many of the great tools that we use daily are designed to be easy to use and adopt, but it’s worthwhile remembering that we are also in the game of developing tools and services which we want research communities to adopt. The case in point is the Science Gateway framework. This is a framework for developing web portals which allow researchers to conduct their workflows by exposing user interfaces to their respective applications, and which is properly integrated into a federated computing backend 1. But how appealing, how useful, how functional, how relevant is this approach really, in the real world ? Are research groups really able to build their environments 2 around the Science Gateway concept, with the science gateway 3 ? It’s always good to be skeptical about the greatness of your own ideas and the usefulness of your own tools. And I think the only way we can really be unbiased is by putting our philosophy into action, and testing it in the crucible of independent adoption.

A lab in a school ?

I hope to be able to use this winter school as a testing ground for some ideas that I have about how we should be teaching research software engineers to use tools. What exactly we teach them to use — which specific tool is the right one for a particular job — is determined by the scope of the course, but I’m referring rather to how we actually use the tools we’re talking about.

I would like to focus on two things which I consider as fundamental disruptions to the status quo :

Testing
Automation

These are at the heart of the philosophy and practice of Continuous Integration 4 5 which will, for the first time at least in my experience, take centre-stage in a course.

Automated Testing

I touched briefly and prematurely perhaps on the benefits of adopting continuous integration tools during the CHAIN-REDS/RECAS Summer School in 2014. This was the last time we’ve actually run a training event on developing science gateways, and I hope a lot has changed in the course of the last two years…

https://medium.com/media/6aa644c79c7894022dcc9990ba185c8c/href

Collaboration and Code from the start

A fundamental difference between the “old” way of doing things and the “new” way of doing things is the expression of everything as code. The idea that the development environment can be expressed as code, and can be created via the execution of that code, is a way of reinforcing the principle :

If something is worth doing, it’s worth keeping

“Keeping” in this case means putting your work into a change-controlled repository and working in a methodical way with this, using the version and change control tools at your disposal. In my opinion, this means “Use Git, duh.”, which has the corollary of “Put your repo on Github” 6. This automatically provides you with an environment conducive to collaboration. Who knows, chances are that nobody but you, the author of the code, will ever look at — much less use ! — the code that you write, but working from the start in a manner which makes future re-use likely is a very good insurance policy.

Cruise Missiles for Miggies 7?

It may seem like a lot of overhead to get started — perhaps a whole day may need to be spent on setting up the various tools necessary for working on a portlet. However, I reject this point of view. In my opinion, getting to work on a project like this without an environment that is conducive to, and actively supports, future use is probably only going to create future headaches. It’s better, in my opinion, to spend time on a supportive environment than to introduce Technical Debt at a later stage.

These science gateways start off as small, standalone portlets, with just a few lines of JavaScript, Java and HTML to define them, so it may seem like total overkill to insist on the creation of a repo, integration of that repo with automation tools, etc — akin to using a cruise missile to kill a fly. However, introducing these things at some point down the line — once bad habits, quick fixes and ugly hacks have become the norm — can be very, very painful. It’s better to make a time investment at the beginning of the project (or course in this case), to become comfortable with the mechanics of developing these science gateways, whilst learning about the underlying infrastructure and frameworks, than to put off the payment in technical debt for later, when the interest on this debt has grown too large.

Automated Feedback

This is the first time I will be running a course entirely online. We will be using the EdX open courseware platform. We will not have the ability to work side-by-side with the course students and see what they are seeing most of the time, as was the case in the previous face-to-face schools. How are we going to be “on the same page” as the students in this case ? Either we could be on call during the entire duration of the school, over videoconference, and fall back to screen-sharing, and real-time information exchange… Guess how that’s going to go ! 8

Or… we could provide a tool as part of the school which conducts the same kinds of checks which we human “teachers” would do. Except, these checks would be done automatically and consistently, every time the student requested them. It would allow both sets of humans (the teachers and the students) to look at the same code and resulting artifacts (or errors), even asynchronously. Asynchronicity in this case is important, since time is at a premium and needs to be dedicated when it’s available — which is probably not going to be synchronous between the student and the teacher.

Indeed, this is what we will do, with Jenkins — a dedicated instance of Jenkins will be used to run tests and compilation checks on the portlets developed by the students.
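The kind of check a build job could run on every student submission might look something like the sketch below. It is not the Jenkins configuration used for the school: the required file names and the two checks are hypothetical stand-ins, chosen only to show how the same checks can be applied consistently and produce a report that teacher and student can both read later.

```python
import tempfile
from pathlib import Path

# Hypothetical file names: the real set depends on the portlet template used.
REQUIRED_FILES = ["pom.xml", "README.md"]


def check_required_files(project: Path) -> list:
    """Return one message per required file that is missing."""
    return [f"missing file: {name}" for name in REQUIRED_FILES if not (project / name).is_file()]


def check_no_empty_sources(project: Path) -> list:
    """Return one message per Java source file that is empty."""
    return [
        f"empty source file: {path.relative_to(project)}"
        for path in sorted(project.rglob("*.java"))
        if path.stat().st_size == 0
    ]


CHECKS = [check_required_files, check_no_empty_sources]


def run_checks(project: Path) -> list:
    """Run every check in order and collect the messages into one report."""
    report = []
    for check in CHECKS:
        report.extend(check(project))
    return report


# Demonstration on a throwaway directory standing in for a student's repository.
with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp)
    (project / "pom.xml").write_text("<project/>\n")
    (project / "Hello.java").write_text("")
    report = run_checks(project)

print(report or "all checks passed")
```

A script like this could be called from a build job and its report archived with the build, so that the feedback is there whenever either party has time to look at it.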

Contained tutorials.

Another piece of shiny which we’ll be bringing to the table during the development of the course is the use of Docker. Docker will be used to provide the students with preconfigured development environments — which have been tested ! — the mission which was previously fulfilled by virtual machine images. This in itself will not make a substantial difference to the students, I think. However, the AUFS filesystem and overlay capabilities of Docker will provide us with a means to make atomic changes to the environment which can be expressed as code 9. We will also be using Docker to reproduce various stages of the course, and compare student work with the reference material more easily. I think that using containers to express differences in the environment in a more atomic way, instead of providing the students with a perfectly configured, pre-prepared virtual machine, will not only increase their confidence (since they will have to do some of the work, as tutorials, in getting to these states), but also make the lessons far more transparent and expose holes in their understanding of the various components of the framework. This will help not only them to learn, but also us to teach better.

Automated Deployment

The Jenkins instance will also be used to run the deployment into the testing environment of the new portlets, to estimate any negative impact on a production environment, should they eventually be deployed. It’s important that all of this extra “infrastructure” is itself reproducible, should we or someone else want to reproduce the course. For this reason, I’ve been working on the Ansible role for the Jenkins installation and configuration. Ansible is very capable when it comes to orchestrating services on new infrastructure, and we’re going to be using it to set up and maintain server and build slaves which will be used by the student projects.

In terms of running the tests, the idea is to run them in the containers that have been provided for the school. After the initial material has been covered, and the practical fundamentals have been taught, the development/hacking phase can begin. Running continuous integration on their code in an environment which is as similar as possible to the real-world portals where these portlets will be deployed will allow bugs and errors to be caught early.

This will be a learning curve for all involved. Hopefully, we can stick to our philosophy of doing things right even when the temptation is strongest to just get a dirty hack out of the way.

Discussion and critical thinking will be very important during the course of the school… either way, it’s going to be fun !

Originally published at brucellino.github.io on March 29, 2016.

Test-Driven Teaching was originally published in Open Science in Africa on Medium, where people are continuing the conversation by highlighting and responding to this story.

Announcing CODE-RADE Foundation Release 1

Bruce Becker — Mon, 07 Dec 2015 00:00:00 GMT

We are pleased to announce the first Foundation Release of the CODE-RADE project :

Continuous Delivery of Research Applications in a Distributed Environment.

This release has been the result of the efforts of several people, and in particular the input of Sakhile Masoka, Fanie Riekert and Dane Kennedy is warmly acknowledged.

Goal and Design

Simply put, the goal of the CODE-RADE project is to provide a user-driven, high-quality, continuous delivery platform for research applications to any site which wants them. The platform has been designed based on widely-used components and tools :

Source code repositories : Github
Continuous Integration : Jenkins
Continuous Delivery : CVMFS
Service Orchestration : Ansible

CODE-RADE fills a gap between the user or research software engineer who wants to have their applications available on a distributed computing infrastructure, and the site administrator who is tasked with keeping resources available and properly configured. This user- and test-driven approach allows for continuous integration and delivery of new research applications into a common repository which can be permanently configured at all participating sites.

Actors and workflow in CODE-RADE

CODE-RADE is Open Source

The platform itself is expressible in terms of Jenkins configurations and build, test and deploy scripts. The entire platform can be reproduced, although this is not currently automated. Source configuration for the platform, as well as other supporting configurations, can be found at the CODE-RADE repo in AAROC on Github.

Targets

CODE-RADE aims to deliver pre-built applications to any site that wishes to execute them. In order to do this, the applications need to be compiled for a range of target sites, which may vary in operating system, CPU architecture, or in other ways such as network interconnect, availability of GPUs, etc. Each of these aspects is an axis in the build system, and together they define a build matrix as follows:

SITE : A code used to describe the particular nature of the site. Currently only generic is used for all sites.

ARCH: The CPU architecture used at the site. Currently only x86_64 is used for all sites.

OS: The operating system used at the sites. Currently only sl6 and u1404 are used, for RedHat6 and derivatives and for Ubuntu 14.04 LTS respectively.
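
The matrix described above can be sketched as a simple enumeration in shell. This is only an illustration using the axis values named in this post; the function name build_matrix and the SITE/OS/ARCH output layout are assumptions, not something taken from the CODE-RADE Jenkins configuration.

```shell
# Illustrative sketch only: enumerate the build matrix from the axis
# values named in this post. The function name and the output layout
# (SITE/OS/ARCH) are assumptions made for this example.
SITES="generic"
ARCHES="x86_64"
OSES="sl6 u1404"

# Print one line per combination of SITE, OS and ARCH.
build_matrix() {
  local site os arch
  for site in $SITES; do
    for os in $OSES; do
      for arch in $ARCHES; do
        echo "${site}/${os}/${arch}"
      done
    done
  done
}

build_matrix
```

With the current axis values this gives two combinations, one for each operating system. Adding a value to any axis multiplies the number of builds, which is where the tradeoff between coverage and build time comes from.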

Artefacts

Foundation Release 1 publishes three CVMFS repositories and several base libraries necessary for compiling other applications.

For this reason, the release is called a “Foundation” — this provides you stuff to build other stuff.

Several projects have been built:

Libraries are compiled with their dependencies expressed using module files and, where possible, several versions of the library are built in order to check for consistency and provide as wide a coverage as possible. A tradeoff has had to be made between coverage and build times due to the limited resources of the build slaves.

Details of the projects and builds can be seen on the Foundation Release 1 page.

Dependencies

Dependencies are managed using the bash modules tool, and by relevant build triggers in Jenkins. Software dependencies have been expressed using these build triggers and inspiration has been taken from the Easybuild project. Other information on the dependencies of applications has been obtained from the project description pages.

The artefacts are built from the lowest-lying dependencies, up through the dependency chain, to end-user applications. Each build is responsible for creating its own modulefile.
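
The idea of building from the lowest-lying dependencies upwards can be illustrated with the standard tsort utility. This is a sketch of the ordering only: CODE-RADE itself relies on Jenkins build triggers, and the package names below are hypothetical examples rather than the contents of the release.

```shell
# Illustrative sketch only: derive a build order from dependency pairs.
# Each input line is "dependency dependent"; tsort prints an order in
# which every dependency appears before the things that need it.
# The package names are hypothetical examples.
build_order() {
  tsort <<'EOF'
zlib hdf5
hdf5 netcdf
zlib netcdf
EOF
}

build_order
```

For this chain the order is zlib, then hdf5, then netcdf, printed one per line.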

Build Configurations

All build configurations are kept in change control. CODE-RADE has been designed to execute builds and tests of user-provided configuration on every commit to the source code management system. As the default SCM, we have chosen git and rely on the Github API to trigger most builds in Jenkins. While we do not have an explicit internal API, there is a working model of how builds should be configured, relying on three separate scripts executed in conditional sequence :

build.sh — This starts the configuration and compilation of the application, with relevant dependencies added
check-build.sh — if the compilation has been performed without error, this script performs application-specific checks, usually provided by the project itself (typically running make check or similar)
deploy.sh — Once the application has been checked and staged to the continuous integration environment, this script reconfigures the build to the actual CVMFS target. This usually just involves a cleanup of the configuration and a re-installation to a different prefix.
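
The conditional sequence of the three scripts can be sketched as a small shell function. The script names come from the list above; the function name run_stages and the idea of passing the project directory as an argument are assumptions for this example, and in CODE-RADE the sequencing is done by Jenkins rather than by a wrapper like this.

```shell
# Illustrative sketch only: run the three stages in order from a project
# directory, stopping at the first stage that fails.
# The function name run_stages is hypothetical.
run_stages() {
  local dir="$1" stage
  for stage in build.sh check-build.sh deploy.sh; do
    echo "running ${stage}"
    sh "${dir}/${stage}" || { echo "${stage} failed" >&2; return 1; }
  done
}

# Usage, from a directory containing the three scripts:
#   run_stages "$PWD"
```

Because each stage only runs if the previous one succeeded, a failing check-build.sh means deploy.sh is never reached.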

There are several limitations to this approach, not least the capacity for user error, since there are almost no checks on the content of the scripts. The possibility of using a domain-specific language or a build-flow approach has been considered, but

Repositories

CVMFS is used to distribute the artifacts to the sites. There is a Stratum-0 CVMFS server at the University of the Free State at apprepo.sagrid.ac.za. Three repositories have been created for differing use cases :

For more information on the repositories and how to use them, see the AfricaGrid webpage

Using CODE-RADE

With this release, the tools necessary for integrating new applications from any user are available. They can be used by mounting the repositories directly on your laptop or cluster, using the Ansible playbook or scripts provided. The repositories are usually mounted under /cvmfs, e.g. /cvmfs/fastrepo.sagrid.ac.za

Adding and using modules

Modules can be used by adding them to your module library :

module use /cvmfs/fastrepo.sagrid.ac.za/compilers 
module avail 
---- /cvmfs/fastrepo.sagrid.ac.za/modules/compilers ----
gcc/4.9.2 gcc/5.1.0
---- /usr/share/modules/versions ---- 
3.2.10 
---- /usr/share/modules/modulefiles ---- 
dot module-git module-info modules null use.own

The environment variables SITE, OS, ARCH and CVMFS_DIR are needed to use the modules properly on your site. You can set them by hand or use the set script in the CODE-RADE repo.
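
Setting the variables by hand might look like the sketch below. The values for SITE, OS and ARCH are taken from the build matrix axes listed earlier; pointing CVMFS_DIR at the repository mount point is an assumption, and the helper check_code_rade_env is hypothetical and not part of the CODE-RADE repo.

```shell
# Illustrative sketch only: set the four variables by hand.
# The CVMFS_DIR value is an assumption based on the mount point
# mentioned in this post.
export SITE="generic"
export OS="u1404"
export ARCH="x86_64"
export CVMFS_DIR="/cvmfs/fastrepo.sagrid.ac.za"

# Hypothetical helper: report any of the four variables that is unset
# or empty, and return non-zero if at least one is missing.
check_code_rade_env() {
  local name value missing=0
  for name in SITE OS ARCH CVMFS_DIR; do
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "$name is not set" >&2
      missing=1
    fi
  done
  return $missing
}

check_code_rade_env && echo "environment looks complete"
```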

Contributing

Should you be interested in contributing to the CODE-RADE project, you can :

Roadmap and planning

This is the first release of the CODE-RADE platform; new applications and builds are continuously added to the repositories. Further releases will contain more tooling (different compilers, application libraries and application dependencies). The Release Milestone issues have all been closed.

We use Slack to keep in touch while developing, and automation talks to us in #code-rade. If you would like to help us deliver software better to researchers, get in touch and come hang out with us (and our trusty bots).

Tagged with blog • CODE-RADE • Release

Originally published at www.africa-grid.org on December 7, 2015.

Announcing CODE-RADE Foundation Release 1 was originally published in Open Science in Africa on Medium, where people are continuing the conversation by highlighting and responding to this story.

Back home

Bruce Becker — Thu, 19 Nov 2015 00:00:00 GMT

Cathrin Stöver on Twitter

Here are the Ladies of the Ubuntunet Alliance celebrating 10 years of success!! @UbuntuNet @AfricaConnect2 pic.twitter.com/9qs2DHbkBb

Back home

Everybody has a place that they like to call home. It may be a place, or it may be a group of people — and coming back to an Ubuntunet Connect meeting felt a bit like this for me — coming home. It’s a community meeting of people who are sincerely engaged with their constituents and are committed to making a difference to research communities in the long run. Most of us there know how little positive feedback there is in developing an NREN — when the bandwidth is slow and the services don’t work, we hear complaints and murmurings of discontent, but when everything works, we hardly ever get a pat on the back for a job well done. When the network works, it disappears (as does most well-functioning technology). Yet, despite the often-lacking feeling of appreciation from user communities, the NREN community is really close and there’s plenty of support from within.

So, this was the first time in Mozambique for me, and the n-th time “back home” in Ubuntunet’s annual meeting. The conference was held over 19–20 November in Maputo.

Looking back…

This event was an especially important one for two reasons : for one thing, it’s Ubuntunet’s 10-year birthday !

Now, you can write your own “where was I …” anecdote in the comments; in my case, ten years ago, I was in the thick of my Ph.D. at the University of Cape Town, working on the ALICE experiment at CERN. Part of my job was developing software for the alignment of the Dimuon spectrometer of the ALICE experiment, but we were also tasked with building part of the Worldwide LHC Computing Grid. Turns out it’s quite hard to do that without a network…

Skip ahead 10 years and in South Africa we’re running jobs like a boss at several sites in the WLCG, including ALICE jobs at the Centre for High-Performance Computing.

It’s easy to see things taking off around 2012. The availability of adequate bandwidth to South Africa via undersea cables around this period played a huge part in this continued success. This is just one case where NRENs are directly enabling research which couldn’t be done without them; several more can be found at http://www.casefornrens.org. It’s worthwhile to remember this in times of plenty (which we in South Africa are experiencing now) as well as in times of scarcity, which many of our colleagues in Africa are living through. It is as a community that we grow and mutually support each other, and it’s important to remember that this is how we achieve the greatest goals: through collective action.

Seeing continued investment in the network is one of the most satisfying feelings we can have as infrastructure developers, and it was therefore with great excitement that we heard that the AfricaConnect 2 project agreement had been signed :

AfricaConnect2 sets out to extend the success story initiated by AfricaConnect and EUMEDCONNECT to the whole African continent, thus accelerating the development of the Information Society in Africa. Whilst the connectivity boost will improve the lives of millions of Africans through accelerated research and education, it will equally benefit collaborative scientific research the world over, in areas such as climate change, biodiversity, crop research, malaria and other infectious diseases. AfricaConnect2 is expected to commence in July 2015 and will have a duration of 3.5 years.

— AfricaConnect Project closing statement

There were several speakers of note on the first day of the meeting, who commented on these milestones. We had representatives of the Association of African Universities, the African Academy of Sciences, GÉANT, the regional NRENs of course (Ubuntunet, WACREN and ASREN), as well as the Network Startup Resource Center, which has been there for the African RENs amongst others for … well, since the beginning ! A few things stood out as they gave their speeches.

First of all, the realisation that Africa is a first-class citizen of the research world. Perhaps one with a tiny voice, but still a full citizen, participating and contributing to efforts at every scale and in every area. Research networking is needed more than ever now and this was noted by the AAU as well as the AAS. This may sound obvious to the casual observer, but for those of us who have been around 10 years or longer, we can appreciate how much of a change this is. Research networking and advanced services were scoffed at not so long ago as a luxury that could not be afforded (in the best cases), or as an outside influence (possibly even a corrupting one at that) of dubious value.

Words of encouragement came from the regional NRENs in the North of Europe (Nordunet), Latin America (Clara) and the Caribbean (CKLN). Of course GÉANT’s presence — in the person of Cathrin Stöver — was also warmly felt in the room.

ORCID provides persistent and unique identifiers for researchers and institutes

Highlights

The first highlight was that ORCID made a huge showing at this meeting. Whether this was a coincidence, due to the fact that it is objectively gaining traction, or the fact that we had organised a pre-conference workshop on Open Science wherein ORCID and DataCite featured heavily, is hard to tell. At the end of the day, it doesn’t matter — what matters is that the audience was hearing the same message consistently :

Unique identifiers and persistence are key to research output.

DataCite provides persistent digital object identifiers to research products and other advanced services

Good to know we’re all on the same page.

Service and Identity Federations

The second big thing was the breakthrough of Identity Federations. I would perhaps be going out on a limb to say that they have “arrived” in Africa — and that’s certainly not true across the board — but everyone in the room understood that Identity Federations were something that the research and education communities needed — and, what is more, needed to develop amongst themselves. Federated services are being heavily promoted by all three cluster projects (MAGIC, TANDEM and Sci-GaIA) and I’m proud to say that we’ve helped to bring new identity providers and services online across the continent. If the old saying goes that “You can take a horse to water, but you can’t make it drink” then perhaps in this case, the horses are starting to get thirsty !

With the work that Roberto Barbera and team are doing around the Catch-All Identity Federation, and the effort we’re putting into the development of infrastructure services for these federations in African countries, we might well see a few new African Federations in eduGAIN by the end of 2016. Hey, maybe it’ll even be SAFIRE !

South African Federated Identity for Research and Education

e-Infrastructure and Collaboration Platforms

The third big theme that came out of the conference was discussion around those phantomatic “advanced services”. Since the official theme of the Conference was “The Road to Maturity”, this was on-topic, but what struck me was the maturity of the discussion and opinions. Gone is the euphoria of expectation for what “the almighty network” will bring us. “The Grid” is no longer referred to in capital letters, and we’re getting a pretty good idea of what collaboration actually is : a long slog of a thousand dancers all dancing to a slightly different tune, continually stepping on each other’s toes and only apologising when it suits them. The real world is a harsh place ! But it’s the only one we’ve got for now. Making it any better, unfortunately, means making it better for all — and that means working together, collectively where possible, and as I’ve said above, trying to support each other wherever we can.

It was very encouraging however to see that there are new computing resources coming to light, and new investments in data capacity and people in areas of the region. Mozambique, Zambia, Kenya, Sudan, DRC, Ghana, Nigeria, all bringing systems online. I hope that we will be able to coalesce around some common goals and work together to build a platform which will benefit us all. This is clearly a pipe dream — there’s no way we could even design something that broad — but I think it’s a worthy ideal. In order to get anything done in the long run, we will have to pick our battles carefully — especially when the opponents are our own brothers and sisters, academically speaking !

Videoconferencing, Security and Open Science

There was, as usual, a very good selection of presentations from across the NREN world at the conference. I was particularly happy to see presentations on very mature videoconferencing platforms, as well as a much-needed presentation on the development of a CSIRT in Kenya. There was a lot of good talk about Open Access and Open Science activities in Mozambique and beyond. There were also some demonstrations of mature services such as Collaboratorio and the Open Science Commons Platform being developed by Sci-GaIA. More about these in the future…

Thoughts

As I sat through the presentations, I considered how all ideas and technologies sort of have their time. When things come ahead of their time, they fail because of lack of perceived need or a general comprehension of how things work… Nikola Tesla might have something to say about that. When ideas come to fruition after their time has passed, for whatever reason, they also fail because they are born obsolete.

Well, distributed computing is such an idea — particularly this idea that we can build common platforms for different kinds of research. It all makes sense on paper, but it’s not paper that decides what works, it’s the communities out there. We started building massive community-based distributed computing facilities in preparation for the LHC data… and hey — they worked very well ! Then, they took a look around and said “hey, everyone’s like us — they need this too”. And so the general solution of grid computing was born.

To be fair, a lot of people believed this over the years and many hundreds of millions of euros were spent. Some communities got productive use out of it, but not many. Especially, this was not a solution for the “generalised case”… but was it because the technology of grid computing was premature, or that the factors behind its success in a few fields were not generally applicable ? There is no answer, this is just a point to reflect on as we embark on the construction of yet another panacea — the academic cloud. I’m willing to bet that the architects will make many of the same logical connections drawn from successful use cases and eventually convince themselves that they are doing the right thing. User communities will also be bombarded with the message :

“You’re doing it wrong”

If you take umbrage, dear reader, at this crude analysis, I certainly can’t fault you. However, the point of this discussion is not to find fault with the past, but to learn from it; and, even more to the point, the discussion is aimed entirely at yours truly — there’s no greater critic than yourself, they say.

So, as we embark on the development of new infrastructure in our region, moving on and learning from the good old grid paradigm, trying to make things cloudier and user-friendlier and more flexible and whatnot, I would like to keep in mind a few things :

If it ain’t broke don’t fix it.

There’s a lot of juice to be squeezed from the “good old grid” yet, and indeed, the Africa-Arabia Regional Operations Centre still has resources to consume. So, don’t decide that we need to build something new just because it’s new.

Let’s build with users, not for users

Whether it’s Science Gateway user interfaces, new publishing platforms, or distributed computing and data infrastructure, user experience needs to be included from the start. Sure, many communities don’t know what’s possible and may feel constricted in their approach, and a new analysis of the situation may bring them more productivity, but let’s be on the lookout for that ever present, oh-so-tempting “You’re doing it wrong.”

Building community is better than building impunity

Perhaps “impunity” is the wrong word, but hey, it rhymes with community. There has to be a symbiotic relationship between research communities and research infrastructures. e-Infrastructures should probably not be built by individual research communities, since that would result in huge duplication of effort, which means that true e-Infrastructures (as opposed to, say, an ICT component of a research infrastructure) need to be built with researchers, not simply for researchers. As we need to understand their way of doing things, they need to understand why we can’t just do anything they want; they — like we — would be entwined in an ecosystem. This would emphasise the long-term benefits of a healthy community over the shorter term benefits of specific project outputs. This may be hard for research groups to hear, but it may be even harder for funders to swallow, since they want to know that their money is going to the cause they want to support right now.

There’s no easy fix to this and not every case would follow this co-design route. We can’t run a workshop or send an email or make a position paper that will make things better, because this is about a culture of sharing and collaboration and an emphasis on the “big picture”, rather than a technical challenge.

Perhaps what we can aim for is a common understanding, better and more open communication, and most of all, some empathy between those building and those using the tools of the knowledge economy.

On that note, I leave you. Until the next meeting, somewhere warm.

— Bruce Becker

(with tweets from Cathrin Stöver)

Cathrin Stöver on Twitter

Back home (sweet home) from the exciting UbuntunetConnect conference in Maputo! @UbuntuNet @AfricaConnect2 pic.twitter.com/hkfv7cNsGb

Tagged with Ubuntunet Connect • conference • commentary • community

Originally published at www.africa-grid.org on November 19, 2015.

Back home was originally published in Open Science in Africa on Medium, where people are continuing the conversation by highlighting and responding to this story.