<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:cc="http://cyber.law.harvard.edu/rss/creativeCommonsRssModule.html">
    <channel>
        <title><![CDATA[Octue - Medium]]></title>
        <description><![CDATA[You don’t need to be a materials scientist to build a LEGO® model. So why should someone modelling climate or energy systems need to be an expert coder? Scientists waste 95% of their days being general programmers, API architects and DevOps engineers… We help get that time back. - Medium]]></description>
        <link>https://medium.octue.com?source=rss----e4237d4e4607---4</link>
        <image>
            <url>https://cdn-images-1.medium.com/proxy/1*TGH72Nnw24QL3iV9IOm4VA.png</url>
            <title>Octue - Medium</title>
            <link>https://medium.octue.com?source=rss----e4237d4e4607---4</link>
        </image>
        <generator>Medium</generator>
        <lastBuildDate>Tue, 07 Apr 2026 19:20:29 GMT</lastBuildDate>
        <atom:link href="https://medium.octue.com/feed" rel="self" type="application/rss+xml"/>
        <webMaster><![CDATA[yourfriends@medium.com]]></webMaster>
        <atom:link href="http://medium.superfeedr.com" rel="hub"/>
        <item>
            <title><![CDATA[Achieving Scalability with Digital Twins]]></title>
            <description><![CDATA[<div class="medium-feed-item"><p class="medium-feed-image"><a href="https://medium.octue.com/achieving-scalability-with-digital-twins-2d8ede6b2c6c?source=rss----e4237d4e4607---4"><img src="https://cdn-images-1.medium.com/max/2600/1*xfJQRRspeHsdBgjSKNVBWA.jpeg" width="8000"></a></p><p class="medium-feed-snippet">The engineering industry is alight with the phrase &#x201C;Digital Twin&#x201D;&#x200A;&#x2014;&#x200A;but in a world of complex systems, how do we do that in a scalable way?</p><p class="medium-feed-link"><a href="https://medium.octue.com/achieving-scalability-with-digital-twins-2d8ede6b2c6c?source=rss----e4237d4e4607---4">Continue reading on Octue »</a></p></div>]]></description>
            <link>https://medium.octue.com/achieving-scalability-with-digital-twins-2d8ede6b2c6c?source=rss----e4237d4e4607---4</link>
            <guid isPermaLink="false">https://medium.com/p/2d8ede6b2c6c</guid>
            <category><![CDATA[api]]></category>
            <category><![CDATA[wind-energy]]></category>
            <category><![CDATA[data-science]]></category>
            <category><![CDATA[digital-twin]]></category>
            <dc:creator><![CDATA[Tom Clark]]></dc:creator>
            <pubDate>Mon, 17 Feb 2025 12:54:23 GMT</pubDate>
            <atom:updated>2020-03-11T16:25:14.631Z</atom:updated>
        </item>
        <item>
            <title><![CDATA[72 Hours at WindEurope: Using the service]]></title>
            <link>https://medium.octue.com/72-hours-at-windeurope-using-the-service-eb080e2e9963?source=rss----e4237d4e4607---4</link>
            <guid isPermaLink="false">https://medium.com/p/eb080e2e9963</guid>
            <dc:creator><![CDATA[Tom Clark]]></dc:creator>
            <pubDate>Mon, 17 Feb 2025 12:41:53 GMT</pubDate>
            <atom:updated>2023-04-29T06:55:03.686Z</atom:updated>
            <content:encoded><![CDATA[<p>Well, it’s all reappeared now and people at the conference are looking at it, evaluating the tools and figuring out the processes.</p><p>The last of the six steps is to use the product, then start mapping out the chess game of what to do next.</p><p>I’ll come back here once we’ve had a decent length of time to do so, and update you with what went on and what we think the next steps are!</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=eb080e2e9963" width="1" height="1" alt=""><hr><p><a href="https://medium.octue.com/72-hours-at-windeurope-using-the-service-eb080e2e9963">72 Hours at WindEurope: Using the service</a> was originally published in <a href="https://medium.octue.com">Octue</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[72 Hours at WindEurope: Delivering the service.]]></title>
            <link>https://medium.octue.com/72-hours-at-windeurope-delivering-the-service-8fcd50b61649?source=rss----e4237d4e4607---4</link>
            <guid isPermaLink="false">https://medium.com/p/8fcd50b61649</guid>
            <category><![CDATA[energy]]></category>
            <category><![CDATA[wind]]></category>
            <category><![CDATA[digitalization]]></category>
            <dc:creator><![CDATA[Tom Clark]]></dc:creator>
            <pubDate>Mon, 17 Feb 2025 12:41:47 GMT</pubDate>
            <atom:updated>2023-04-26T09:35:44.121Z</atom:updated>
            <content:encoded><![CDATA[<p>It’s Wednesday Afternoon in Copenhagen, and it’s time to deliver both a service and a presentation. This article is for the fifth of our <em>“Six Steps” </em>— you can read the overview <a href="https://thclark.medium.com/72-hours-of-digitalisation-at-windeurope-7aa786be729d">here</a>.</p><p><em>This article is aimed at engineers, researchers and execs in the Wind Industry, to help understand the process of digitalisation. If you’re qualified in Systems Architecture or Data/Software Engineering, you’re way ahead of this; just get stuck in already!!</em></p><h3>About the ‘Deliver’ Step</h3><p>‘Delivering’ is all about how easy you make it for the consumers (end users) of the data to get it. Essentially, we’re talking about User Experience.</p><blockquote>User Experience (UX) is King. And not just for webapps.</blockquote><p>User Experience (UX) is the way that you, and people in your team, will work with your tool. UX isn’t just for webapps — it’s how anyone interacts with your tool… and <strong>your project will live or die on UX</strong>. If tools are difficult to use, people avoid them like the plague, no matter how smart those tools are.</p><p>And this right here is why you, a humble engineer somewhere in the vast Wind Industry, are the perfect person for this bit. <strong>You know what you need</strong>. So don’t be afraid to build something to make your own life easy!</p><p>Some things to think about:</p><ul><li>Should there be a Command Line Utility? <em>Check out the Click python library for making user-friendly CLIs.</em></li><li>Will you be centralising/storing data somewhere?<em> (See below.)</em></li><li>Can you wrap complicated stuff in a helper library?</li><li>If so, what language(s) are needed?<em> Try to stick to just one for the first few iterations. Python is pretty popular, or if you know it then Rust is fast and extremely versatile.</em></li><li>Will you need an API? 
<em>Cloud Functions are a great way of creating ultra-simple API endpoints.</em></li></ul><h3>Where we’re at</h3><p>We decided that we’d do two things to make it ultra easy to interface with the service. Sure, users can make requests directly to the API, but we wanted it to feel familiar to basic users of python, and accessible to non-technical users.</p><ul><li><strong>Python client (on pypi):</strong> <a href="https://github.com/octue/windeurope72hours-elevations-client-python">https://github.com/octue/windeurope72hours-elevations-client-python</a></li><li><strong>Map (React/JavaScript using DeckGL):</strong> <a href="https://je3mob.csb.app/">https://je3mob.csb.app/</a></li></ul><h4>Try it</h4><pre>pip install windeurope72hours<br><br>python<br><br>&gt;&gt;&gt; from windeurope72hours import get_coordinate_elevations<br>&gt;&gt;&gt; elevations, later, estimated_wait_time = get_coordinate_elevations(<br>...     [[54.53097, 5.96836]],<br>...     resolution=12,<br>... )<br>&gt;&gt;&gt; print(elevations)<br>{(54.53097, 5.96836): 0.0}</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*qdbmIuCCrQTvhUm0KczfMQ.png" /><figcaption>As you pan the map, it automatically fetches elevations near the centre of the view :)</figcaption></figure><h3>Wrapping up</h3><p>Now it’s over to YOU! Please use the API or the python client, and let us know how it goes!!</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=8fcd50b61649" width="1" height="1" alt=""><hr><p><a href="https://medium.octue.com/72-hours-at-windeurope-delivering-the-service-8fcd50b61649">72 Hours at WindEurope: Delivering the service.</a> was originally published in <a href="https://medium.octue.com">Octue</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[72 hours at WindEurope: Populating data]]></title>
            <link>https://medium.octue.com/72-hours-at-windeurope-populating-data-5637871596a?source=rss----e4237d4e4607---4</link>
            <guid isPermaLink="false">https://medium.com/p/5637871596a</guid>
            <category><![CDATA[software]]></category>
            <category><![CDATA[digitalisation]]></category>
            <category><![CDATA[geospatial]]></category>
            <category><![CDATA[energy]]></category>
            <category><![CDATA[wind]]></category>
            <dc:creator><![CDATA[Tom Clark]]></dc:creator>
            <pubDate>Mon, 17 Feb 2025 12:41:42 GMT</pubDate>
            <atom:updated>2023-04-26T07:01:46.422Z</atom:updated>
            <content:encoded><![CDATA[<p>It’s Wednesday Morning in Copenhagen, and we’re looking at how to populate data into our database. This article is for the fourth of our <em>“Six Steps” </em>— you can read the overview <a href="https://thclark.medium.com/72-hours-of-digitalisation-at-windeurope-7aa786be729d">here</a>.</p><p><em>This article is aimed at engineers, researchers and execs in the Wind Industry, to help understand the process of digitalisation. If you’re qualified in Systems Architecture or Data/Software Engineering, you’re way ahead of this; just get stuck in already!!</em></p><h3>About the ‘Populate’ step</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*e88XMz1imeE5Sc2-8MH75A.png" /></figure><p>There’s really not a lot we can say about populating your data store, because this is the bit that gets totally different each time. In some cases, this might not even be necessary — for example if you’re tying into an existing store. You can safely skip this step if that’s the case.</p><h3>Just show me the code!</h3><p><a href="https://github.com/octue/windeurope72hours-elevations-populator-private">With pleasure :)</a></p><h3>Populating Elevations Data</h3><h4>Ensuring provenance</h4><p>We should always be delivering data with clear sourcing. I can’t stand it when you get data from Google or somewhere, and it seems legitimate but isn’t sourced. You can’t use that for science!</p><p>We’d forgotten this in our brainstorming, so added an extension to the database graph we developed earlier. 
This allows us to specify the source of the dataset with a proper reference:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*rhjJoOxedABLlnb2juUCPQ.png" /><figcaption>Adding a graph node to preserve scientific provenance of the data</figcaption></figure><h4>Data sources</h4><p>The underlying dataset we used to provide the elevations is the Copernicus DEM — Global and European Digital Elevation Model (COP-DEM) GLO-30 dataset:</p><ul><li>DOI: <a href="https://doi.org/10.5270/ESA-c5d3d65">https://doi.org/10.5270/ESA-c5d3d65</a></li><li>Direct link: <a href="https://spacedata.copernicus.eu/collections/copernicus-digital-elevation-model">https://spacedata.copernicus.eu/collections/copernicus-digital-elevation-model</a></li></ul><p>We accessed it via the AWS S3 mirror, which provides easy access to the dataset’s GeoTIFF files:</p><ul><li>Information: <a href="https://copernicus-dem-30m.s3.amazonaws.com/readme.html">https://copernicus-dem-30m.s3.amazonaws.com/readme.html</a></li><li>URL: <a href="https://copernicus-dem-30m.s3.amazonaws.com/">https://copernicus-dem-30m.s3.amazonaws.com</a></li><li>S3 URI: s3://copernicus-dem-30m/</li></ul><p>While developing the populator, for cross-validation purposes, we developed a <a href="https://github.com/octue/windeurope72hours-elevations-populator-private/blob/main/scripts/plot_elevations.py">short script to plot data over a map on a Plotly chart </a>— if we get time, we’ll refactor that into an Observable for people to use.</p><h4>About Resolution</h4><p>Elevations in GLO-30 are sampled at 1 arcsecond spatial resolution; the corresponding ground distance varies depending on where you are on the globe, but is broadly about 30m.</p><p>Looking up the H3 cell statistics, we see that Level 12 hexagons have an edge length of ~10m, making Level 12 the first level that’s finer than the spatial resolution of the data itself. 
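As a back-of-envelope sanity check of that ~30m figure (this little sketch is ours, not part of the populator code): one arcsecond on a spherical Earth of mean radius 6371km works out to roughly 31m on the ground.

```python
import math

# Back-of-envelope check (ours, not from the populator): what ground distance
# does an angle of one arcsecond subtend on a spherical Earth?
EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def arcsec_ground_distance_m(arcseconds: float) -> float:
    """Ground distance (metres) subtended by an angle given in arcseconds."""
    return EARTH_RADIUS_M * math.radians(arcseconds / 3600)


print(round(arcsec_ground_distance_m(1), 1))  # roughly 30.9
```

That lines up with the ~30m grid spacing of GLO-30, and with Level 12 hexagons (~10m edges, so ~20m across) being the first H3 level finer than the data.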
So there’s no point going finer than this; we don’t populate any cells finer than L12.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*7dscq5QXGeVX3OHFDpsutQ.png" /><figcaption>These Level 12 hexagons have a 10m edge length. You can see the underlying resolution of the data in this view over some rough terrain. Note that the slight oversampling of the dataset is apparent in the pattern here, but not perfectly because of the nearest neighbour sampling and the different grids.</figcaption></figure><h4>How we populate higher levels</h4><p>The L12 cells are populated by nearest-neighbour sampling the original TIFF files using the centre of the hexagon cell. That’s ideal, because L12 cells are smaller than the grid size.</p><p>But Level 12 is a lot of data to render if you want to cover a whole country! What if we need something coarser? Rather than attempting to analyse the raw data, we were able to take advantage of the graph structure to aggregate values up to coarser resolutions:</p><ul><li>Populate all L12 hexagons in an area.</li><li>To populate a parent L11 cell, take the seven L12 hexagons inside it and average them.</li><li>Keep going until reaching Level 8 (simply because we didn’t think coarser cells would be very useful).</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*AsOWXRmBX5YxG_kRHegi_w.png" /><figcaption>Aggregate up to coarser levels by averaging seven values per parent hexagon</figcaption></figure><p>There are some very powerful ways of doing aggregation in databases, but we stayed simple and just wrote some python code inside the populator service. 
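That averaging step is simple enough to sketch in a few lines — this is illustrative only (the cell indices here are made up, and the real code lives in the populator repo):

```python
from statistics import mean

# Illustrative sketch of the aggregation step (the real implementation lives in
# the populator repo). Elevations in metres for the seven hypothetical L12
# children of one L11 parent cell — these indices and values are made up:
l12_child_elevations = {
    "8c1f59c0000001": 101.0,
    "8c1f59c0000002": 103.5,
    "8c1f59c0000003": 99.0,
    "8c1f59c0000004": 104.0,
    "8c1f59c0000005": 100.5,
    "8c1f59c0000006": 102.0,
    "8c1f59c0000007": 98.0,
}


def parent_elevation(child_elevations: dict) -> float:
    """Populate a parent cell by averaging its children's elevations."""
    return mean(child_elevations.values())


print(round(parent_elevation(l12_child_elevations), 1))  # roughly 101.1
```

Repeat the same averaging on the L11 results to get L10, and so on up to L8.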
Easy wins the day!</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*tlkcBeQ6tzB_xbF5xaJoRg.png" /><figcaption>Levels 9 through 12 of refinement in the same region used for cross-checking above</figcaption></figure><h4>Engineering the Populator service</h4><p>We used <a href="https://octue-python-sdk.readthedocs.io/en/latest/">our own SDK to create and deploy a service</a> to Google Cloud Run. The point of our SDK is to help wrap scientific code and provide it as a data service, so we’re constantly working on features to give the extra helping hand.</p><p>It works on a “question-answer” model. Some features we’re proud of:</p><ul><li>Under the hood it uses an event stream to initiate a “question”, and a second (question-specific) event stream to manage communication (for logs, monitor metrics, progress updates, and optionally a final “answer”).</li><li>It handles errors by capturing all the inputs of a question, so an error can be reproduced for investigation with a single line of code.</li><li>Services using the framework can ask each other questions.</li><li>Extra tools for querying and managing files in cloud object stores.</li></ul><p>The populator service only makes use of the most basic aspects of the framework. It sits there waiting; the API can ask it to add a list of hexagons to the database.</p><h4>Security</h4><p>We added a “Secret” to the service (you can see the Secret Manager resources in our terraform config) to provide credentials for accessing the database. Never commit credentials to source code!</p><h4>Cross checking</h4><p>We started without using the database at all — just plotting directly, 
to make sure we were pulling out regions correctly.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*0EG6ocOjYPhcRHwRlDNIJQ.png" /><figcaption>We used a variety of different locations and maps, plotting hexagons with non-zero opacity to check that we were correctly correlating values with features in the landscape.</figcaption></figure><h4>Lazy Loading</h4><p>The populator is set up to load just a region around a selected area. This allows us to lazily-load data on demand by sending a question to the populator (from the API service).</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=5637871596a" width="1" height="1" alt=""><hr><p><a href="https://medium.octue.com/72-hours-at-windeurope-populating-data-5637871596a">72 hours at WindEurope: Populating data</a> was originally published in <a href="https://medium.octue.com">Octue</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[72 hours at WindEurope: Engineering the cloud bits]]></title>
            <link>https://medium.octue.com/72-hours-at-windeurope-engineering-the-cloud-bits-f7ce57733704?source=rss----e4237d4e4607---4</link>
            <guid isPermaLink="false">https://medium.com/p/f7ce57733704</guid>
            <category><![CDATA[cloud-infrastructure]]></category>
            <category><![CDATA[wind]]></category>
            <category><![CDATA[wind-energy]]></category>
            <category><![CDATA[digitalisation]]></category>
            <category><![CDATA[energy]]></category>
            <dc:creator><![CDATA[Tom Clark]]></dc:creator>
            <pubDate>Mon, 17 Feb 2025 12:41:37 GMT</pubDate>
            <atom:updated>2023-04-25T12:01:38.215Z</atom:updated>
            <content:encoded><![CDATA[<p>It’s Tuesday Afternoon in Copenhagen, and it’s about time to show our hand with some code. This article is for the third of our <em>“Six Steps” </em>— you can read the overview <a href="https://thclark.medium.com/72-hours-of-digitalisation-at-windeurope-7aa786be729d">here</a>.</p><p><em>This article is aimed at engineers, researchers and execs in the Wind Industry, to help understand the process of digitalisation. If you’re qualified in Systems Architecture or Data/Software Engineering, you’re way ahead of this; just get stuck in already!!</em></p><h3>About the ‘Engineer’ step</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*uURDEo4xz2grRVk09QUf9Q.png" /></figure><p>Partly because we’re keen to just show you what we’re up to already, and partly because examples are much more useful than essays, I’m going to only provide a short general background.</p><p>This step is where you get stuck in and actually do something other than fun diagrams and fantasising about worlds covered in hexagons. If you’re not a pro, Cloud Engineering can seem really daunting at first, but follow some tutorials (or have a go at re-implementing our exact setup below) to get started.</p><p>We can’t resist mentioning <strong>just THREE things</strong> you’ll see us do this week. Get these set up for your team and you’ll save a TON of time, we promise.</p><h4>Use terraform (if you do nothing else, do this)</h4><p>We were a bit late to the party with this, and recently started using <a href="https://www.terraform.io/">Terraform</a> to specify our Infrastructure As Code (IAC). It works with all the Cloud Providers like GCP, AWS and Azure.</p><blockquote>Terraform has been a revelation. Start now. Not later. Don’t even create a single bucket using the console.</blockquote><p>Terraform is SO simple to set up that you shouldn’t think of it as “something to do later”. You’ll recoup the setup and learning time within your first day of work. 
We especially like that the identity/access permissions (the most difficult and most important part to get right!) are clear.</p><p>There are plenty of alternatives to Terraform (for example <a href="https://www.pulumi.com/">Pulumi</a> syncs nicely with existing infrastructure). We wouldn’t recommend a Provider-specific solution — one of the strengths of Terraform is being provider-agnostic.</p><h4>Version control and conventional commits</h4><p>Even if you’re working alone, having code on GitHub and using the git workflow lets you remember, check and compare things easily.</p><p>Adopting a <a href="https://www.conventionalcommits.org/en/v1.0.0/">conventional commits</a> pattern means you not only get a great engineering logbook, but you can auto-generate releases and version numbers. At Octue, <a href="https://github.com/octue?q=conventional&amp;type=all&amp;language=&amp;sort=">we built a whole system of open source tools to help</a> (see an <a href="https://github.com/octue/octue-sdk-python/releases">example of the autogenerated releases here</a>).</p><h4>Continuous Delivery</h4><p>In each of the repositories we release in this task, look in the .github/workflows folder. These “actions” deploy code to the cloud systems, create release notes, or run other automations every time we merge code into the main branch. We don’t ever think about how to get code onto servers; it’s just there a couple of minutes after we finish. HUGE time saver and great for quality control.</p><h3>Just show me the code!!!</h3><p>Here you go. The <a href="https://github.com/orgs/octue/repositories?q=windeurope72hours-elevations&amp;type=all&amp;language=&amp;sort=">windeurope72hours-elevations-api repository</a> defines an API service that queries a database. All the Cloud Infrastructure is also defined in the same repo. 
There will be two more repositories coming up soon.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*CbmS9viHNXxNRH4MYRmkaA.png" /><figcaption><a href="https://github.com/orgs/octue/repositories?q=windeurope72hours-elevations&amp;type=all&amp;language=&amp;sort=">Code is on GitHub</a></figcaption></figure><p>Remember, this event is NOT a hackathon — we didn’t write this all in the hall, although we are refining it throughout the week. For transparency, it’s taken about 4 days to produce this basic service, and there’s lots of room for improvement — look at <a href="https://github.com/octue/windeurope72hours-elevations-api-private/commits/main">the commits</a> to see a detailed history of how it evolved.</p><h4>What it does</h4><p>The API comprises basically one function. It:</p><ul><li>Accepts a request (POSTed data complying with a schema we’ve published)</li><li>Checks the input isn’t outside some basic sensible ranges</li><li>Queries the database for cell contents</li><li>For hexagon cells that aren’t in the database, asks a “question” to a scientific data service (more about that in our next step!), in order to populate the database</li><li>Responds with results from the database, and an “ask again later” for any cells still being populated</li></ul><h4>Figuring out the database</h4><p>We spent some time reading a variety of material around Graph databases, and how to store tree-like data such as this. In the end, we opted for (almost) the simplest arrangement we could:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*XYstqpi66-0-NNHt8_4b0w.png" /></figure><p>By storing elevations on a separate node, we doubled the number of nodes in the database. Why do such a thing? 
We were bearing in mind the ability to federate additional data in the future — having one set of nodes representing the position allows us to query for those nodes and all types of data associated with them more straightforwardly than if each data type had its own mesh.</p><p>We were inspired a lot by this paper, which is worth a read:</p><p><a href="https://infinitegraph.com/h3/">Massively Scalable Geographic Graph Analytics Using InfiniteGraph and Uber&#39;s Hexagonal Hierarchical Spatial Index</a></p><h4>Choosing a Cloud Provider</h4><p>We use GCP for everything at Octue; we find that it has some subtle technical details (like the atomic clock synchronising all the cloud stores globally) that make things run more smoothly overall. And the console has a pretty consistent interface between all their different offerings, which makes a big difference to the learning curve.</p><p>But <strong>mostly, we use it because we’re used to it</strong>. If Azure or AWS are your thing, there’s nothing here that doesn’t have an equivalent in those providers.</p><h4>Single-endpoint API</h4><p>The API that we laid out in our architecture is super simple — it has a single endpoint. So rather than spin up a whole server to handle that, we’ve gone with a “serverless” Cloud Function which is nice and easy to deploy:</p><p><a href="https://cloud.google.com/functions">Cloud Functions | Google Cloud</a></p><h4>Cloud Infrastructure — Terraform</h4><p>The entire infrastructure is defined in our <a href="https://github.com/octue/windeurope72hours-elevations-api-private/tree/main/terraform">terraform files here</a>. We’ve added notes to each entry describing why and what it’s for. You’ll notice there are some things we haven’t talked about yet (like a Cloud Run service)... 
We’ll come back to those!</p><p>To reproduce this entire project for yourself, you would:</p><ul><li>Create an account on GCP</li><li><a href="https://cloud.google.com/resource-manager/docs/creating-managing-projects">Create a new project</a> in your GCP account</li><li>Fork the code repository to your GitHub account</li><li>Check out code to your laptop and <a href="https://developer.hashicorp.com/terraform/tutorials/aws-get-started/install-cli">install terraform</a></li><li><a href="https://cloud.google.com/iam/docs/service-accounts-create">Create a service account</a> in your GCP project, named terraform, with Editor permissions. Save the JSON key file to terraform/gcp-credentials.json</li><li>Change the <a href="https://github.com/octue/windeurope72hours-elevations-api-private/blob/main/terraform/variables.tf">variables in variables.tf</a> to match your project, then run terraform apply</li></ul><p>You’ll be asked to enable a lot of APIs, which is a chore, but once you’ve clicked the links and enabled them all you’re done.</p><blockquote>Pro tip:<a href="https://cloud.google.com/blog/topics/sustainability/pick-the-google-cloud-region-with-the-lowest-co2"> We always choose our regions to have lowest CO2 impact.</a> Here, we’ve set everything up in europe-west1 — there’s no need to worry about global query speed or redundancy for a little demo service like this.</blockquote><h4>Cloud Infrastructure — Database</h4><p>The only thing we didn’t provision using Terraform is the database. Because we prefer to use managed database services, we signed up for a paid tier on <a href="https://neo4j.com/cloud/platform/aura-graph-database/">AuraDB, hosted by neo4j</a>. 
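To give a flavour of what the API does with that graph database, here’s a hypothetical sketch of building a parameterised Cypher upsert for one cell — the real node labels and properties are defined in the populator repo, so treat these names as made up:

```python
# Hypothetical sketch only: the real labels/properties live in the populator
# repo, and the query would be executed with the neo4j Python driver against
# AuraDB. MERGE creates the node/relationship if it's missing, else matches it.
def upsert_cell(h3_index: str, elevation: float):
    """Build a parameterised Cypher upsert for one H3 cell's elevation."""
    query = (
        "MERGE (c:Cell {h3_index: $h3_index}) "
        "MERGE (c)-[:HAS_DATUM]->(e:Elevation) "
        "SET e.value = $elevation"
    )
    parameters = {"h3_index": h3_index, "elevation": elevation}
    return query, parameters


query, params = upsert_cell("8c1f59c0000001", 101.0)
```

Keeping the elevation on a separate node (rather than a property of the cell) mirrors the two-node-per-cell arrangement described in the previous article.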
The main motivation for this selection was simply that we wanted to try it.</p><blockquote>Pro tip: Moving from the free to the paid tier on AuraDB got us a dedicated instance in the same region as our cloud function is deployed, which sped up API access calls from 5s to 1s!</blockquote><blockquote>Pro tip: Managing databases requires a dedicated expert. Unless you have access to one of those people in your organisation, always use one of the managed database services from your cloud provider. It costs less in time, and carries much lower risk (unless your data is easily reproducible).</blockquote><h4>Limiting Costs</h4><p>The bright-eyed among you will have realised by now that to handle the trillions of nodes we’d need an EXTREMELY large and costly database!<br>So, rather than populating the entire planet we chose to “Lazily Load” data. That means, on request for a particular location / h3 cell, we populate that area on demand.</p><blockquote>Pro tip: Provided you’ve wrapped complexity up behind a straightforward API, you can always change the mechanics later…</blockquote><blockquote>If this takes off, we can use all sorts of strategies to reduce storage cost. We could use advanced heuristics to remove rarely accessed data, or use more advanced tree-walking to cover areas of ocean without storing all the fine level cells, or ditch the database altogether and pre-process raw TIFF images into a much quicker-to-access custom binary format, just to name a few ideas.</blockquote><p>We’ve also heavily limited the number of simultaneous attempts to access the API. This is purely to keep our costs down (we’re providing this for free, after all!).</p><p>The cloud function keeps a cache of all the cells it’s asked to have populated, so it doesn’t ask for the same ones again.</p><blockquote>Something to ponder: What do you think is wrong with our caching strategy? Could it be improved, and how much would that cost? 
Answers on a postcard!</blockquote><h3>Wrapping Up</h3><p>We made a really simple API and deployed a free tier database, which we later upgraded to the (cheapest!) paid tier for performance reasons.</p><p>We knew that we couldn’t store all the data, so we developed a way of lazy-loading it on demand, which can be improved and streamlined later.</p><p>But that’s useless right now. The next step? Let’s get some data into it!</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=f7ce57733704" width="1" height="1" alt=""><hr><p><a href="https://medium.octue.com/72-hours-at-windeurope-engineering-the-cloud-bits-f7ce57733704">72 hours at WindEurope: Engineering the cloud bits</a> was originally published in <a href="https://medium.octue.com">Octue</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[72 hours at WindEurope: Architecture]]></title>
            <link>https://medium.octue.com/72-hours-at-windeurope-architecture-f5adf4a5788b?source=rss----e4237d4e4607---4</link>
            <guid isPermaLink="false">https://medium.com/p/f5adf4a5788b</guid>
            <category><![CDATA[digitalization]]></category>
            <category><![CDATA[wind-energy]]></category>
            <category><![CDATA[wind]]></category>
            <category><![CDATA[energy]]></category>
            <category><![CDATA[data-engineering]]></category>
            <dc:creator><![CDATA[Tom Clark]]></dc:creator>
            <pubDate>Mon, 17 Feb 2025 12:41:32 GMT</pubDate>
            <atom:updated>2023-04-25T08:02:48.558Z</atom:updated>
            <content:encoded><![CDATA[<p>It’s Tuesday Morning in Copenhagen, and we’re talking about Architecture with people in the hall. This article is for the second of our <em>“Six Steps” </em>— you can read the overview <a href="https://thclark.medium.com/72-hours-of-digitalisation-at-windeurope-7aa786be729d">here</a>.</p><p><em>This article is aimed at engineers, researchers and execs in the Wind Industry, to help understand the process of digitalisation. If you’re qualified in Systems Architecture or Data/Software Engineering, you’re way ahead of this; just get stuck in already!!</em></p><h3>About the ‘Architect’ step</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*ljioBSYSwwmKOeS2Xy5rXQ.png" /><figcaption>In the ‘Architect’ step we aim to design a ‘Minimum Viable Product‘ to deliver the data.</figcaption></figure><p>Product and/or data architecture is very cool sounding. It brings to mind extremely highly paid people at Google, finessing the perfect solution in their mind whilst having a conversation over a perfectly-brewed latte. The reality is different. It’s rough and ready, and subject to rapid change.</p><blockquote>The objective is to write down roughly what the key components of your system will be, and roughly how they connect.</blockquote><p>The golden rule for this step is that <strong>it’ll never be right the first time. </strong>Accept that, and you won’t get roadblocked. This is why the <em>“Six Steps”</em> are a cycle, not a straight line. Each time around the cycle, you improve and iterate on the architecture.</p><h4>Who should do this?</h4><p>Pretty much anyone with a general background in data/science/tech <em>can</em> approach it, although whether you <em>should</em> is really up to you and how you want to spend your time. 
If you tackle it, run your ideas past someone experienced — even a 30-minute call will give you peace of mind.</p><h4>What tools are available?</h4><p>No kidding: at Octue, we always — and I mean <em>always</em> — use paper and pens. This is <strong>somewhat</strong> about creativity, but <strong>mostly</strong> because it’s the fastest way of getting diagrams down: using colours allows you to quickly differentiate components / sidenotes / complete tangents where you teach your team about an obscure aspect of computational geometry.</p><p>If you have to do it remotely, we’ve tried a lot of whiteboarding solutions (like Jamboard and so on) but the best one by far is <a href="https://slides.google.com">Google Slides</a>. It’s not marketed for this purpose but it probably has the least glitchy interface of all the collaborative drawing tools out there, plus you can export slides to SVG, meaning your diagrams stay useful and are modifiable elsewhere.</p><h4>DON’T DO: Systems-level diagrams</h4><p>Controversial opinion: <strong>we think it’s not worth bothering with systems-level architecture diagrams (at this stage):</strong></p><ul><li>Remember the previous step: keep the scope tiny, and isolated from your other efforts.</li><li>As soon as you iterate even slightly, they’re out of date.</li><li>In a subsequent step (‘Engineering’) we’ll use a tool called ‘Terraform’. You can quickly <a href="https://medium.com/vmacwrites/tools-to-visualize-your-terraform-plan-d421c6255f9f">generate system diagrams from Terraform</a>.</li><li>You’ll quickly develop experience — when you need to get serious about these things, you’ll know.</li></ul><p>That said, if it helps to clarify what’s going on in the first instance, then get stuck in. 
Here’s a good tool to help:</p><p><a href="https://cloud.google.com/blog/topics/developers-practitioners/introducing-google-cloud-architecture-diagramming-tool">Introducing a Google Cloud architecture diagramming tool | Google Cloud Blog</a></p><h3>Getting started</h3><p>How are you supposed to write down the entire system for a new data service?! To get started I always think about three things, in this order:</p><ol><li>Data structure</li><li>Usability</li><li>Storage</li></ol><p>With these three thought through, you’ll be able to sketch a really basic diagram of your system.</p><h4><strong>Always, always, ALWAYS start with data structure</strong></h4><p>Even if you think you won’t use it later, this process will clarify in your mind the data that goes into and out of the system.</p><p>Create or collect some examples of the raw data that you’re processing. Make sure they’re not commercially sensitive (so you can share them later). Then write down a clear <em>schema </em>for that data — use JSONSchema (for general data exchange) or Avro (for extremely high-throughput pipelines).</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*jPctdFl3T4YQfLs8UckprA.png" /></figure><blockquote>Pro tip / Shameless plug: Octue are building a <a href="https://strands.octue.com">repository for JSONSchema</a> to help you do this. At the time of writing it’s in very early alpha, but you’ll soon be able to find examples from across the industry — or publish your own so your whole team are on the same page.</blockquote><h4>Then think about usability</h4><p>We’ll come back to this in much more detail later in the <em>“Six Steps”</em>, so I won’t duplicate material here. Come and meet us in the hall (or if you’re catching up after the event, then read ahead) to see what we have in mind.</p><h4>Finally, think about storage</h4><p><em>Sorry for the length of this one! 
Skip to the next section if you don’t care about data stores.</em></p><p><em>I’ve also excluded a lengthy topic about data lakes and data warehouses and data meshes and data &lt;trendy term here&gt;. These are the purview of the data and knowledge engineers. For the grassroots digitalisation work we’re talking about here, don’t worry about them. If you’re doing the six steps, and the result gets some traction with your colleagues, any solution you’ve chosen can be evolved to be part of such things.</em></p><p>You may not have to store data, but if you do, here are a few considerations with tips on tools or examples:</p><p><strong>Object stores. </strong>For big binary files like audio, video or specialised instrument data, the chances are that dumping them into a cloud object store is the way to go.</p><blockquote>Pro tip / Shameless plug: We built a really powerful way of creating a data lake from a mass of legacy files on a hard drive… Octue’s ‘django-twined’ and ‘octue-sdk-python’ libraries are designed to upload/download files and metadata in cloud storage, synchronising entries between an object store, a SQL database and your laptop. That means you can filter and query for cloud files straightforwardly, then download just the ones you need!!</blockquote><p><strong>Timeseries / Event databases. </strong>For data which arrives in high-volume streams of small “events”, consider solutions like BigQuery or InfluxDB. Their best application is where data slowly gets less useful over time (e.g. you need to run some daily analysis or aggregation, after which the raw data rarely gets touched).</p><blockquote>Pro tip: Beware!! If you frequently need to query across these whole datasets to select subsets, querying can get very expensive. 
In that case, be aware of the need to cluster tables, or switch to PostgreSQL with a JSON column to contain the event data.</blockquote><blockquote>Pro tip: <a href="https://docs.google.com/presentation/d/1o1k8_e73v4GmjRBRUbVCo-gKXj-O0V_HT6UhTlIdeHE/edit?usp=sharing">This talk, “From Blade to BigQuery”,</a> shows you, complete with full open-source code, how to get events from a wind turbine to the cloud.</blockquote><p><strong>Graph databases like neo4j. </strong>These are great where you have highly relational data (although you trade off data integrity) and need to fetch many things at once through the relations. There are two stellar use cases:</p><ol><li>Federation. You can straightforwardly connect graphs across databases, so when you get to that stage, you’ll be able to join up your own private data with public (or other private) data securely.</li><li>Scalability. You can scale these things <a href="https://neo4j.com/press-releases/neo4j-scales-trillion-plus-relationship-graph/">to trillions of nodes</a>, so if your dataset is going to get mega quickly, you can avoid all sorts of troubles maintaining a conventional SQL instance.</li></ol><blockquote>Pro tip: A comment from Octue’s Senior Software Engineer, Marcus Lugg, on using neo4j for the first time last week: “I thought it was weird at first but this query language is really beautiful; [this query] would be a nightmare of joins in SQL”</blockquote><p><strong>NoSQL databases like MongoDB. </strong>These are probably not the answer. For our industry sector, other than the event/timeseries streams and graphs mentioned above, I’ve never ever seen a use case that couldn’t be covered with PostgreSQL (see below). If you disagree and have a good use case, please comment below; I’d love to hear it!</p><p><strong>Last but not least, SQL databases are a universal starter’s workbench.</strong> If you’re choosing a SQL database these days, it’s PostgreSQL or nothing. 
<a href="https://www.aptuz.com/blog/is-postgres-nosql-database-better-than-mongodb/">Postgres has powerful NoSQL capability</a> built in, and is just a really versatile workhorse.</p><p>Here’s the rub: if you need a database and don’t KNOW that one of the above DB types is right for you, starting with PostgreSQL is a safe bet. Remember, this isn’t about getting it right, it’s about learning quickly:</p><blockquote>If you start with PostgreSQL, you’ll either know within one iteration of the Six Steps that your choice was super wrong, or have something that’ll tide you over for a while.</blockquote><h3>Architecting a solution for 72 hours at WindEurope</h3><p>And now for the good bit! We’ve long wanted to try out a geospatial solution called ‘h3’, which uses a hexagonal mesh covering the globe.</p><p>The system is incredibly elegant and was invented by some engineers at Uber, to help manage the widely varying spatial density of their data points (outside a rail station you’ll have many data points per square metre, in the countryside you’ll have few or none, so you need to manage different spatial resolutions).</p><p>They open-sourced it (thanks Uber!); read their beautiful blog post here:</p><p><a href="https://www.uber.com/en-GB/blog/h3/">H3: Uber&#39;s Hexagonal Hierarchical Spatial Index</a></p><p>The mesh has successive refinement levels, meaning that with a single integer, you can represent not only location on the earth but <strong>also the spatial resolution inherent to the data</strong>.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*VXwEhfrTE2dY4CsAEB8PmA.png" /></figure><p>So, we’ll use that as a really compact way of expressing data.</p><h4>Brainstorming</h4><p>We sat down for about 3 hours with some pens and some paper. 
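</p><p><em>Aside: the “single integer” trick is easy to demystify with a toy model. The real h3 cell ids use a carefully designed bit layout (this sketch is entirely our own invention, not the actual encoding), but the essence, a root-to-leaf path of base-7 digits behind a leading sentinel, fits in a few lines of Python:</em></p>

```python
# Toy "hierarchical cell index", inspired by h3's idea of packing location
# AND resolution into a single integer. This is NOT the real h3 bit layout:
# here, a cell is just a root-to-leaf path of base-7 digits (each hexagon
# has seven children), stored behind a leading sentinel digit so that the
# number of digits (i.e. the resolution) is recoverable.

def encode(path):
    """Pack a tuple of child indices (each 0-6) into one integer."""
    code = 1  # sentinel, so paths with leading zeros aren't lost
    for child in path:
        code = code * 7 + child
    return code

def resolution(code):
    """Count how many refinement levels the integer encodes."""
    levels = 0
    while code > 1:
        code //= 7
        levels += 1
    return levels

def parent(code):
    """Drop the finest digit to get the containing (coarser) cell."""
    return code // 7

cell = encode((3, 0, 6))          # a resolution-3 cell
assert resolution(cell) == 3      # the resolution comes along for free
assert parent(cell) == encode((3, 0))
```

<p><em>Notice that dropping the final base-7 digit is all it takes to coarsen a cell; the real index gives you that same property, with proper geospatial meaning attached.</em></p><p>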
Sorry, we can’t really capture it, but for those of you following along live, come and talk; we’ll work with you in the hall to go through this stage, or help you through your own:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*UqSSf38qYm63rS9DQ16tIw.png" /><figcaption>Rabid brainstorming during a wide-ranging conversation.</figcaption></figure><h4>Thinking about data structure</h4><p>We thought about the ways people might need to fetch data. We concluded that to get started, we’d want to fetch elevations for a single point, a collection of points, or for a region (if we’re displaying on a map).</p><p>From our brainstorming, we already knew we’d need an API, which is where our service hits the outside world. We talked roughly about this at first, but later published the definitions of what data looks like at the boundary:</p><ul><li><a href="https://jsonschema.registry.octue.com/octue/h3-elevations-input/0.1.0.json">https://jsonschema.registry.octue.com/octue/h3-elevations-input/0.1.0.json</a></li><li><a href="https://jsonschema.registry.octue.com/octue/h3-elevations-output/0.1.0.json">https://jsonschema.registry.octue.com/octue/h3-elevations-output/0.1.0.json</a></li></ul><h4>Thinking about usability</h4><p>Most of the uses we can think of involve either putting data onto a map or loading it into Python. JavaScript developers are pretty familiar with fetching, and the fetch pattern is tied intimately to the use case — so JavaScript is covered.</p><p>On the Python side, we wanted to make it as easy as possible for non-developers. So we figured that a very lightweight Python library would allow you to <strong>get elevations with a single line of code</strong>. More on this later.</p><h4>Database selection and graph structure</h4><p>Luckily, our above statement about PostgreSQL holds true (or we’d be quite embarrassed!). 
We looked into it and… <a href="https://blog.rustprooflabs.com/2022/04/postgis-h3-intro">yes, you can totally store h3 data efficiently in Postgres!!</a> Postgres has PostGIS, a highly advanced geospatial extension which really complements the use case too.</p><p>But we work with Postgres all the time at Octue, and this should be a learning experience for us too. So, because the hexagonal data structure is a heptree graph (each node divides into seven child nodes), we’ve decided to try out a technology that’s new to us — <a href="https://neo4j.com/">the neo4j graph database</a>. The idea is to:</p><ul><li>Efficiently traverse up or down the tree, to aggregate data up from the finer resolutions or zoom in from coarser data.</li><li>Bind any number of data sources later (starting with elevations, but thinking big!)</li><li>Federate databases, so if a customer has confidential, high-resolution measurements on site, we can easily join them.</li></ul><p>Because our data is so simple, the graph is both straightforward and beautiful:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*XYstqpi66-0-NNHt8_4b0w.png" /></figure><h3>Wrapping Up</h3><p>Here’s where we’re at, with a little explanation:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*rU0OkbxLG6zNAQAKx4UZlw.png" /></figure><blockquote>This is all you need for the first iteration. If the architecture diagram has more than just a few clear elements, you may struggle to deliver.</blockquote><p>Remember, you can always grow and adapt, but to get something in place, start simple. 
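</p><p><em>As a final aside, that “aggregate up the tree” idea is simple enough to sketch in a few lines of Python. Everything here is a toy: cells are invented paths of child indices (0–6, one digit per refinement level), not real h3 ids, and parents just take the mean of their children:</em></p>

```python
from collections import defaultdict

# Toy heptree aggregation: cells are root-to-leaf paths of child indices
# (0-6, because each cell divides into seven children). Leaf elevations
# are averaged upward, one resolution level at a time.

def aggregate_up(cells):
    """Map {path: elevation} at one resolution to the next-coarser one."""
    groups = defaultdict(list)
    for path, elevation in cells.items():
        groups[path[:-1]].append(elevation)  # parent = path minus last digit
    return {parent: sum(v) / len(v) for parent, v in groups.items()}

# Four leaf cells under two parents (a full tree would have seven each)
leaves = {(2, 0): 10.0, (2, 1): 14.0, (5, 3): 100.0, (5, 4): 120.0}
assert aggregate_up(leaves) == {(2,): 12.0, (5,): 110.0}
```

<p><em>In the planned service this traversal becomes a neo4j graph query rather than a Python dict, but the shape of the computation is the same.</em></p><p>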
See you next time!</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=f5adf4a5788b" width="1" height="1" alt=""><hr><p><a href="https://medium.octue.com/72-hours-at-windeurope-architecture-f5adf4a5788b">72 hours at WindEurope: Architecture</a> was originally published in <a href="https://medium.octue.com">Octue</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[72 hours at WindEurope: Defining a use case.]]></title>
            <link>https://medium.octue.com/72-hours-at-windeurope-defining-a-use-case-648f1bdbc7ef?source=rss----e4237d4e4607---4</link>
            <guid isPermaLink="false">https://medium.com/p/648f1bdbc7ef</guid>
            <category><![CDATA[data-science]]></category>
            <category><![CDATA[wind-energy]]></category>
            <category><![CDATA[digitalization]]></category>
            <category><![CDATA[data-engineering]]></category>
            <dc:creator><![CDATA[Tom Clark]]></dc:creator>
            <pubDate>Mon, 17 Feb 2025 12:41:25 GMT</pubDate>
            <atom:updated>2023-04-24T20:03:10.716Z</atom:updated>
            <content:encoded><![CDATA[<p>It’s Monday evening in Copenhagen, and the challenge starts here: at WindEurope 2023 we’ll be deploying a live data service. This article is for the first of our <em>“Six Steps”</em> — you can read the overview <a href="https://thclark.medium.com/72-hours-of-digitalisation-at-windeurope-7aa786be729d">here</a>.</p><p><em>This article is aimed at engineers, researchers and execs in the Wind Industry, to help understand the process of digitalisation. If you’re qualified in Systems Architecture or Data/Software Engineering, you’re way ahead of this; just get stuck in already!!</em></p><h3>About the ‘Define’ step</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*scnwxpx1xKqBzLll6_j-fg.png" /><figcaption>In the ‘Define’ step we aim to find a simple, well-scoped use case with clear value.</figcaption></figure><p>Thinking of digitalisation as a wide-ranging and nebulous exercise is a trap: the sheer breadth of work, technologies and processes involved is overwhelming. The golden rule is to <strong>start small, with clear boundaries.</strong></p><p>That doesn’t mean that a whole organisation can’t be working toward digitalisation at once; it’s just that each effort is extremely prone to failure (especially if it’s a large and complex one). Rather than risk it all, you sow many small seeds and see which ones take root.</p><p>Remember, the objective is to improve how effective you (or your team) are. It almost doesn’t matter where you start. Find a little problem that takes up some time…</p><h4>Frustration is your best friend</h4><p>If you’re frustrated with something during the work day, chances are that’s a good candidate. Especially if it’s something that keeps coming up (even if it doesn’t take much time, routine handle-turning tasks are a distraction). Here are some examples:</p><ul><li>Logging onto a web portal and downloading data. 
<em>You might prepare geospatial data for site assessment, but for each site you end up logging into the ESA web portal, downloading and stitching images together. Whenever a customer asks about a new site, you breathe a heavy sigh…</em></li><li>Distributing data to more than one place in your organisation.<em> You might get power curves in PDF form, then enter values into your internal file format. Great, until colleagues start repeatedly asking for it, and each time you have a back-and-forth about exactly which one they mean…</em></li></ul><h4>Leave the gnarliest problem for later</h4><p>It’s best not to start with that one issue that’s super complex, requires dozens of stakeholders and is bugging the whole team. Chances are, if you work around the fringes of it you’ll start coming up with partial solutions that ease the way. You’ll also have more experience of what works and what doesn’t.</p><p>Once you’ve got a few simpler solutions under your belt, that’ll help draw a clear boundary around the really difficult one.</p><h4>Check it’s valuable</h4><p>Do not [repeat: <strong>DO NOT</strong>] get caught up in writing a business case, analysing addressable market sizes, running competitor analyses and costing services. All of these things are critical before launching a digital <em>product</em>, but that’s not what you’re doing. You’re planting a small seed.</p><p>By the time you’ve done all that stuff you could have built the thing already! The trick is to do the first couple of cycles around the <em>“Six Steps”</em> and <em>then</em> figure out, with your colleagues and industry partners, whether this deserves to be a full-blown product or service.</p><p>But you <em>should</em> sanity-check up-front that what you’re thinking about is basically not a bad idea. Have a few conversations — phone relevant people up and ask “What if?”. Ask them how much time they spend doing a job, and how frequently they do it. 
Informal estimates aren’t super reliable, but they should be enough to give you a sense that something is worthwhile.</p><blockquote>If you can save a team member half an hour every two weeks, you save about 1% of their time. If it’s a frustrating or boring job, that feels like much more!</blockquote><p>Tiny time savings seem ridiculous, but add up a few of them and suddenly your team is much happier, quicker and more reliable.</p><h3>Defining our problem for 72 hours at WindEurope</h3><h4>Our best friend: Frustration</h4><p>At Octue, we’ve long been looking to help our customers improve their workflows around fetching geospatial data. We’ve done a bit ourselves to help out at times. The example above mentioned the sheer <em>grind </em>of fetching geospatial data by logging into a web portal, downloading images, stitching them together, then re-cropping…</p><p>…and that’s before you even try to maintain correct metadata (like data sources/provenance). Don’t get us started.</p><h4>Avoiding gnarly problems</h4><p>Part of our vision is to help ‘data providers’ (organisations that generate measured or simulated geospatial data) give their users (like resource assessment engineers or operations planners) a really easy, intuitive and reliable way to fetch it. Handling aspects like authentication, permissions, metadata and integration of multiple private/public data stores is integral to that.</p><p>But that’s far too gnarly. It involves lots of stakeholders, lots of effort, lots of edge cases and a really broad problem statement. And frankly, more funding than we have.</p><h4>Narrowing it down</h4><p>Let’s follow our own advice: <strong>“Small and well-scoped”.</strong> What if we chose <em>just one</em> data source, from <em>just one </em>provider? 
If that were <em>public</em> data then we could do away with authentication and permissions.</p><p>Of course we want to maximise impact for the industry; so what’s the <em>one</em> data source, that’s <em>public</em>, that just about <em>everyone</em> in the industry uses at some stage?</p><blockquote>Elevations data. Of course!</blockquote><p>Surface elevations of the planet (and many other amazing data sets) are provided both by NASA’s Earthdata archive and by the European Space Agency (ESA), to the public, totally for free (requiring attribution, quite reasonably)! You can browse ESA’s <a href="https://earth.esa.int/eogateway/catalog">entire catalogue here</a>, but we’re all about the Copernicus Elevations Dataset:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1011/1*nInwoa6dCBX8zGWLr3cipg.png" /></figure><p><a href="https://spacedata.copernicus.eu/collections/copernicus-digital-elevation-model">Copernicus Digital Elevation Model - Copernicus Contributing Missions Online</a></p><h4>Checking it’s valuable</h4><p>We asked around a few Wind Engineers. Most were quite laid-back about the issue. As something that takes only a small fraction of your time, that’s quite understandable. We asked them to put a time on it, and got answers ranging between 15 and 45 minutes per site (that they prepared for very early stage assessment, so quite a few candidate sites). And they were preparing between 1 and 2 sites per fortnight.</p><p>So about 1% of their time. Not the biggest, most challenging part of the job, but not nothing — and quite impactful across the thousands of engineers globally.</p><p>That’s <em>perfect</em>. It seems really tiny, but in a few weeks of work, we can make every Wind Engineer in the world 1% more effective. 
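</p><p><em>If you want to check that 1% for yourself, the arithmetic is tiny. Taking mid-range answers from our straw poll (30 minutes per site, 1.5 sites per fortnight) against a nominal 75-hour working fortnight (that hours figure is our assumption):</em></p>

```python
# Back-of-envelope check on the "about 1%" claim, using mid-range answers
# from the straw poll above. The 75-hour working fortnight is an assumption.
minutes_per_site = 30
sites_per_fortnight = 1.5
working_hours_per_fortnight = 75

saved_hours = minutes_per_site / 60 * sites_per_fortnight
fraction = saved_hours / working_hours_per_fortnight
print(f"{fraction:.1%}")  # → 1.0%
```

<p>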
Done alongside many other such exercises with similar impact, we can easily double the effectiveness of our workforce over the next couple of years.</p><h3><strong>Wrapping up</strong></h3><p>So we have our use case definition:</p><blockquote>We’re going to provide a solution that makes it easy and quick for anyone to access elevations data.</blockquote><p>Let’s be a bit more specific:</p><ul><li>In the Wind Industry there’s a drive for better visualisation, mapping and planning/optimisation tools. Being able to integrate directly into those would get us an A+ on the report card, although doing those integrations is out of scope here.</li><li>Being able to demonstrate that we’d massively sped up the process is a clear metric for success.</li><li>Part of the pain is ensuring that there’s data available in a particular location, or stitching data when the desired location is near a boundary. So <strong>this should work anywhere in the world straightforwardly.</strong></li><li>A point elevation is useful to some extent, but generally if we’re doing resource assessment or planning a wind farm installation, we need to <strong>get data for an area, not just a point</strong>.</li><li>To be useful in design tools (e.g. when panning a map in a web application or investigating site potential globally), fetching data must be quick (“quasi real time” — fast enough not to really notice it).</li><li>With 10 s of lag, user interfaces start to feel very slow, but sub-second performance is likely unnecessary. Ballparking a requirement, <strong>we should fetch data with a &lt;3 s round-trip time</strong>.</li></ul><p>And that’s our ‘Define’ step finished! 
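</p><p><em>Before we go, here’s a trivial helper we can use later to hold ourselves to that &lt;3 s budget. It’s only a sketch; the fetch callable is deliberately left abstract (the hypothetical get_elevations in the comment doesn’t exist yet):</em></p>

```python
import time

ROUND_TRIP_BUDGET_S = 3.0  # from the round-trip requirement above

def within_budget(fetch, *args, budget_s=ROUND_TRIP_BUDGET_S):
    """Time a fetch callable; return its result, the elapsed seconds,
    and whether the call came in under budget."""
    start = time.perf_counter()
    result = fetch(*args)
    elapsed = time.perf_counter() - start
    return result, elapsed, elapsed < budget_s

# Later, something like: elevations, t, ok = within_budget(get_elevations, points)
```

<p>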
I’ll drop back and add a link when the next step is published, but in the meantime you can follow along on <a href="https://www.linkedin.com/company/octue/">LinkedIn</a> — it’ll be great to see you!</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=648f1bdbc7ef" width="1" height="1" alt=""><hr><p><a href="https://medium.octue.com/72-hours-at-windeurope-defining-a-use-case-648f1bdbc7ef">72 hours at WindEurope: Defining a use case.</a> was originally published in <a href="https://medium.octue.com">Octue</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[72 hours of digitalisation at WindEurope]]></title>
            <link>https://medium.octue.com/72-hours-of-digitalisation-at-windeurope-7aa786be729d?source=rss----e4237d4e4607---4</link>
            <guid isPermaLink="false">https://medium.com/p/7aa786be729d</guid>
            <category><![CDATA[wind]]></category>
            <category><![CDATA[digitalisation]]></category>
            <category><![CDATA[science]]></category>
            <category><![CDATA[energy]]></category>
            <category><![CDATA[technology]]></category>
            <dc:creator><![CDATA[Tom Clark]]></dc:creator>
            <pubDate>Mon, 17 Feb 2025 12:41:08 GMT</pubDate>
            <atom:updated>2023-09-15T11:02:34.528Z</atom:updated>
            <content:encoded><![CDATA[<p>Working through the six steps to deliver a new geospatial service for the Wind Industry…</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*QaWVnmiLdFKkpgbY4WCO9w.png" /><figcaption>The six steps of digitalisation</figcaption></figure><h3>Welcome to the 72-hour challenge…</h3><p>We’re doing something special at WindEurope — rolling out a live data service, usable by the industry, there in the room. The purpose is to provide an example of implementing what we call the “Six Steps” — a tactic for progressing your organisation toward digitalisation.</p><blockquote>At WindEurope right now? <a href="https://www.linkedin.com/company/octue/">Follow us on LinkedIn</a> or join us at our stand beside the poster session!!!</blockquote><h4>The articles</h4><ul><li>Introduction — this article. Read on!</li><li><a href="https://thclark.medium.com/72-hours-at-windeurope-defining-a-use-case-648f1bdbc7ef">Step 1 — defining a use case</a></li><li><a href="https://thclark.medium.com/72-hours-at-windeurope-architecture-f5adf4a5788b">Step 2 — architecting a system</a></li><li><a href="https://thclark.medium.com/72-hours-at-windeurope-engineering-the-cloud-bits-f7ce57733704">Step 3 — engineering the cloud infrastructure</a></li><li><a href="https://medium.com/@thclark/72-hours-at-windeurope-populating-data-5637871596a">Step 4 — populating data from the European Space Agency</a></li><li><a href="https://thclark.medium.com/72-hours-at-windeurope-delivering-the-service-8fcd50b61649">Step 5 — delivering data with a map and a Python client</a></li><li><a href="https://medium.com/@thclark/72-hours-at-windeurope-using-the-service-eb080e2e9963">Step 6 — using the service and gathering feedback</a></li></ul><h4>The presentation</h4><ul><li><a href="https://docs.google.com/presentation/d/1kXY8gLo-555yhsLE3Lk_jpDaYugQ6cghjPVzUWxWmms/edit?usp=sharing">Slides for the presentation given at WindEurope</a></li></ul><h4>The map</h4><ul><li><a 
href="https://codesandbox.io/s/up-and-down-the-world-in-72-hours-je3mob">https://codesandbox.io/s/up-and-down-the-world-in-72-hours-je3mob</a></li></ul><h4>The code</h4><ul><li><a href="https://github.com/octue/windeurope72hours-elevations-client-python">Python client (helper to fetch data for you)</a></li><li><a href="https://github.com/octue/windeurope72hours-elevations-api">API and Cloud Infrastructure definitions</a></li><li><a href="https://github.com/octue/windeurope72hours-elevations-populator">On-demand service to populate regions</a></li></ul><h3>Why?</h3><p>I mean, seriously, <em>why bother with digitalisation? And what does it mean to me? </em>I think our main motivation is to help the industry scale — a major part of that is overcoming staffing problems:</p><ul><li>People in data-heavy roles lose 45% of their time in low-value data management tasks (Anaconda Foundation 2022).</li><li>Only 20% of technological/R&amp;D effort has significant organisational outcomes (Gartner 2019) — the <a href="https://medium.com/@shivayogiks/what-is-technology-adoption-life-cycle-and-chasm-e07084e7991f">so-called “technology chasm”</a>.</li><li>It takes 10+ years to train specialist technical staff.</li><li>To meet the 1.5°C target by 2030, installation rate needs to grow by 5x very quickly (<a href="https://gwec.net/globalwindreport2023/">GWEC 2023</a>).</li><li>Installation rate is constrained by (among other things) staff availability.</li></ul><p>Think about it. Your staff are wildly talented, but are only 55% × 20% = 11% efficient in their core roles, because of reasons that could be solved* with effective digitalisation. It takes ~10 years to train more. You need 5x more staff in the next 3 years.</p><blockquote>This is what I mean when I say “Digitalisation”. I mean “Make everyone in the industry 10x better and quicker at what they do. 
FAST.”</blockquote><p>Not to mention that all those <strong>wasted staff wages will cost £3.9b/year by 2025 </strong>(based on 132,000 professionals, a 55% efficiency, and a £44k wage +50% overhead). But somehow, that almost seems insignificant by comparison.</p><p><em>*Or at least substantively improved. In particular, automated and easy data interchange will minimise data management overhead, while modularisation helps new technologies cross the chasm — particularly in the area of software and analytics.</em></p><h3>Levels of digitalisation</h3><p>Digitalisation happens at three levels:</p><ul><li>Strategic: What effect should this have on your core mission?</li><li>Tactical: What conditions and processes will you put in place, to cultivate success at an organisational level?</li><li>Implementation: What actions must be taken at a team or individual level?</li></ul><p><strong>The Six Steps we’re presenting here are a tactic for digitalisation.</strong> They’re designed to get small cogs turning — in the next 72 hours we’ll be giving an example implementation following the Six Steps.</p><p>You can apply the tactic many times simultaneously across your business units… you’ll also need higher-level tactics to start meshing everybody’s efforts together, and a clear strategy to point everyone in the right direction. 
But those will be the topic of other articles and other challenges!</p><h3>Get involved</h3><p><a href="https://www.linkedin.com/company/octue/">Follow us on LinkedIn</a> or read through the articles linked above to get going!</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=7aa786be729d" width="1" height="1" alt=""><hr><p><a href="https://medium.octue.com/72-hours-of-digitalisation-at-windeurope-7aa786be729d">72 hours of digitalisation at WindEurope</a> was originally published in <a href="https://medium.octue.com">Octue</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
    </channel>
</rss>