<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:cc="http://cyber.law.harvard.edu/rss/creativeCommonsRssModule.html">
    <channel>
        <title><![CDATA[Gett Tech - Medium]]></title>
        <description><![CDATA[Stories, tips, thoughts, code, product, design, methodologies, experimentations from the day-to-day work of our teams. - Medium]]></description>
        <link>https://medium.com/gett-engineering?source=rss----a4a2445acb3d---4</link>
        <image>
            <url>https://cdn-images-1.medium.com/proxy/1*TGH72Nnw24QL3iV9IOm4VA.png</url>
            <title>Gett Tech - Medium</title>
            <link>https://medium.com/gett-engineering?source=rss----a4a2445acb3d---4</link>
        </image>
        <generator>Medium</generator>
        <lastBuildDate>Fri, 24 Apr 2026 05:17:40 GMT</lastBuildDate>
        <atom:link href="https://medium.com/feed/gett-engineering" rel="self" type="application/rss+xml"/>
        <webMaster><![CDATA[yourfriends@medium.com]]></webMaster>
        <atom:link href="http://medium.superfeedr.com" rel="hub"/>
        <item>
            <title><![CDATA[Improving Navigation Performance Using a Hierarchical Graph]]></title>
            <link>https://medium.com/gett-engineering/improving-navigation-performance-using-a-hierarchical-graph-93d0c40e7bd4?source=rss----a4a2445acb3d---4</link>
            <guid isPermaLink="false">https://medium.com/p/93d0c40e7bd4</guid>
            <category><![CDATA[data-science]]></category>
            <category><![CDATA[openstreetmap]]></category>
            <category><![CDATA[algorithms]]></category>
            <category><![CDATA[geospatial]]></category>
            <category><![CDATA[gett]]></category>
            <dc:creator><![CDATA[Or Nachmias]]></dc:creator>
            <pubDate>Mon, 26 Jan 2026 09:37:35 GMT</pubDate>
            <atom:updated>2026-01-26T09:37:34.651Z</atom:updated>
            <content:encoded><![CDATA[<p>Mirror: <a href="https://ornachmias.github.io/2026/01/21/hierarchical-graph-performance.html">https://ornachmias.github.io/2026/01/21/hierarchical-graph-performance.html</a></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*9GkoPX6szl1VxaiCMr4NJg.png" /></figure><h3>Introduction</h3><p>If you’ve ever faced the challenge of writing your own navigation system, you’ve likely considered one of two solutions: either using an open-source system such as OSRM or Valhalla, or using OSM data to generate a graph and applying graph theory algorithms for navigation. While I’m a major advocate of open-source solutions, they sometimes have significant limitations (e.g., they cannot dynamically change edge weights, which set the actual time it takes to pass through a specific road, and instead rely on predefined weights based on time windows). In most cases, you can overlook those limitations or make some small changes to the code base, but if those are non-negotiable, or if you just feel like writing your own thing, the main issue you’re going to face when writing a production-level service is performance.</p><p>In this post I’ll show you how we can improve the performance of navigation on a graph, using simple algorithms provided by the osmnx and igraph packages to modify our original graph, and then build a new graph made from multiple modified graphs to gain the performance boost needed for a production-grade service, with minimal data loss.</p><p>(A quick note about the plots in this post — I’m only going to show a very small part of the graphs each time to make the visualizations as clear as possible; later on, when testing performance, I’ll increase the size of the graph.)</p><h3>OSM Graph Structure</h3><p>Before diving into the question of how to improve performance, let’s take a look at the graph provided by OSM. 
Basically, assuming you’re looking at the driving graph, every road available for driving will be an edge, and those edges are always straight lines. From that, we can understand that every intersection or curve of the road is a vertex. The OSM data file israel-and-palestine-latest.osm.pbf is a graph with 2,314,075 vertices and 4,069,559 edges.</p><pre>from pyrosm import OSM<br>import igraph as ig<br><br>osm = OSM(map_path)<br>nodes, edges = osm.get_network(nodes=True, network_type=&#39;driving&#39;)<br>nx_graph = osm.to_graph(nodes, edges, graph_type=&#39;networkx&#39;)<br>raw_graph = ig.Graph.from_networkx(nx_graph)</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/1000/0*NGi1IHqjoLx2NDdR.png" /></figure><p>Now, at this point, finding the shortest path between two points is very easy, and will provide us with an extremely detailed route. We’ll simply use igraph’s get_shortest_path method between our desired vertices, and a route will be available based on the vertices or edges in the path.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1000/0*dTsHoD0nyq-izBOI.png" /></figure><p>So, let’s take a look at the performance of such navigation when looking at a larger territory. I randomized 1,000 sources and destinations, with some limitation on the number of steps in the path so we won’t accidentally use vertices that are on the same road, which would make the pathfinding trivial.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/841/0*UD1TG5WWbbbhiMWy.png" /></figure><p>Looking at the performance histogram, things are looking pretty good — we got ourselves a mean time of ~50 milliseconds, and a max time of ~120 milliseconds, which is not bad for a production service. But let’s think about that for a second — I ran the performance test on my laptop, making sure I was running almost nothing except for that performance test. The requests were made sequentially, so the CPU was fully available for each request. 
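</p><p>Under the hood, get_shortest_path runs a Dijkstra-style search over the weighted edges. To make the cost model concrete, here is a minimal stdlib sketch of that search on a toy weighted graph (the adjacency data is made up for illustration and stands in for the real road network):</p>

```python
import heapq

def dijkstra(adj, source, target):
    """Shortest weighted path; adj maps vertex -> list of (neighbor, weight)."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    # Reconstruct the vertex path, similar to what get_shortest_path returns
    path, node = [], target
    while node != source:
        path.append(node)
        node = prev[node]
    path.append(source)
    return path[::-1], dist[target]

# Toy road graph: vertices are intersections, weights are travel seconds
adj = {"a": [("b", 4), ("c", 1)], "c": [("b", 2), ("d", 5)], "b": [("d", 1)]}
path, cost = dijkstra(adj, "a", "d")  # path ["a", "c", "b", "d"], cost 4
```

<p>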
When deploying such a thing to production, unless you have unlimited resources, those conditions won’t hold. As an example, when trying to deploy this naive architecture as a service, we reached multiple seconds for each request. So from now on, we’ll continue the post assuming we care more about relative performance (comparing improvement between steps) rather than absolute values (evaluating success based on the millisecond values).</p><h3>Graph Simplification</h3><p>The first thing you’ve probably noticed is that the graph is highly inefficient — every road is separated into multiple edges, and thus contains much more data than needed. The bigger the graph, the more time it takes to find the optimal shortest path.</p><p>While the raw graph has its advantages (<a href="https://medium.com/gett-engineering/map-matching-part-3-matching-algorithms-4fe8413220af">e.g. map matching is so much easier</a>), in navigation this level of detail is redundant — the drivers won’t care about every slight curve of the road, or whether the sub-tag of the road changed. For data scientists, handling such a detailed graph is more difficult: when calculating durations from GPS data, there is much more room for error when the roads are short, and we could also accidentally match points incorrectly to road segments.</p><p>So the first step in our hierarchical graph is to simplify the graph topology, without changing any of the geometries. That can easily be done using the osmnx package, specifically the method simplify_graph, which is based on the publication “Topological Graph Simplification Solutions to the Street Intersection Miscount Problem” by G. 
This algorithm removes all nodes that are neither intersections nor dead-ends, and replaces them with a single edge connecting the remaining nodes directly, while keeping the road shape using the geometry property.</p><pre>nx_graph_simplified = ox.simplify_graph(nx_graph, track_merged=True)</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/1000/0*3LbXxPt5LCjTS-MF.png" /></figure><p>Our graph now has 427,001 vertices and 916,380 edges, which reduces its size by about 80%. Let’s take a look at the performance of get_shortest_path for this simplified graph.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/850/0*0KCN3UZtQXqg4yPf.png" /></figure><p>Just like that, we improved the time to ~18 milliseconds, a 60% improvement compared to the raw graph, without any loss of necessary data.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1000/0*NnzN_vMV1YUWHX9v.png" /></figure><p>From this point forward, this layer will be our baseline. We’ll create additional layers, where each layer will lose some of the information but will gain us a performance boost.</p><h3>Consolidate Intersections</h3><p>Did you notice in the previous visualizations how roundabouts and intersections are represented using multiple nodes and edges? It makes sense, since we have a specific set of roads that intersect in a slightly different way in each case, based on the directions of the roads and the type of intersection.</p><p>But when we want to navigate through a road network to calculate distance and ETA, we usually don’t care about such details, since as far as we are concerned — it’s just an intersection. When calculating ETA, I don’t really care how the driver crossed the intersection, only that he did.</p><p>This is where consolidate_intersections in the osmnx package comes into play — it merges those intersections into a single node (based on a provided tolerance value) and recreates the graph accordingly. The algorithm itself is also part of the G. 
Boeing (2025) paper we mentioned earlier, and it works by identifying and merging clusters of nearby nodes, often created by divided roads or traffic circles, which represent a single intersection in the real world. This consolidation algorithm reduces our graph to 321,243 vertices and 748,246 edges, which is 24% smaller than the simplified graph.</p><p>Since we’re handling distance measurements, the graph should be projected before being consolidated (<a href="https://medium.com/gett-engineering/coordinates-and-projections-of-geographical-data-77c880371135">you can read about it here</a>), which is also a mandatory requirement of the package’s implementation of the algorithm.</p><pre>nx_graph_consolidated = ox.project_graph(nx_graph_simplified, to_crs=&#39;EPSG:2039&#39;)<br>nx_graph_consolidated = ox.consolidate_intersections(nx_graph_consolidated, rebuild_graph=True, reconnect_edges=True)<br>nx_graph_consolidated = ox.project_graph(nx_graph_consolidated, to_crs=&#39;EPSG:4326&#39;)</pre><p>While the route ETA and distance can be safely calculated using this graph, the main issue with this simplification is that we do lose information here — the actual physical route. Meaning, if you wanted to create a nice UI to show the user exactly how to navigate to the target, the nodes and edges won’t correlate with the underlying map. Let’s set this problem aside for now; we’ll discuss it further when we build the full solution.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1000/0*9di859vpKzNo5wCu.png" /></figure><p>Let’s take a look at the performance of such a graph. 
Same territory, same map, only simplified and then consolidated.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/841/0*7Jd23VKoswQmzdOu.png" /></figure><p>We improved from a mean of ~18 milliseconds to ~8 milliseconds, more than 50%, but at the cost of losing the actual geospatial route between the points.</p><h3>Communities Graph</h3><p>With the consolidated graph in hand, which supplies the performance we needed, you might think we should stop and deploy. But as I’ve mentioned before, this performance isn’t exactly what you’re going to see when handling a high volume of requests, so we need to add some levels that will filter out unnecessary parts of the graph.</p><p>Let’s look at a real example of a navigation request — a driver wants to get from one point in Tel-Aviv, Israel to another. Do we care about edges in other cities? Other countries? Even most of the edges in the same city are usually irrelevant to the route we’re looking for. Removing those parts of the graph will increase the performance of our overall algorithm, with minimal impact.</p><p>There are many methods to build this kind of level — you can do it based on city boundaries [1], based on previous data you have [2], or even join multiple layers into several filtering levels. While you can do many combinations of filtering at this point, you have to be careful — too much filtering and you’ll lose data necessary to find the optimal path, too little and the overhead of the mapping will overcome the performance benefit.</p><p>As the title suggests, for this post I’ve chosen to generate communities using the Leiden algorithm. The Leiden algorithm discovers communities that have full connectivity within the community. While traditional methods sometimes made the mistake of loosely lumping items together just to form a group, Leiden introduces a refinement phase that ensures every discovered cluster is genuinely and densely connected. 
For our purposes, it’ll make sure that every point in the community has access to the other locations in the same community, ensuring we don’t accidentally create islands of inaccessible vertices.</p><p>Even within community discovery there are many algorithms to choose from, but I’ve decided on Leiden due to its “safe” community generation. The major disadvantage is that the path might not be optimal. Some algorithms, such as Infomap, might find a more optimal community separation, but that would require implementing a fallback in case the filtering fails, which I wish to keep out of scope for now.</p><pre># la is the leidenalg package; vertex_combiners and edge_combiners<br># map attribute names to merge functions<br>import leidenalg as la<br><br>partition = la.find_partition(consolidated_graph, la.CPMVertexPartition, resolution_parameter=0.05)<br>membership = partition.membership<br>communities_graph = consolidated_graph.copy()<br>communities_graph.contract_vertices(membership, combine_attrs=vertex_combiners)<br>communities_graph.simplify(combine_edges=edge_combiners, loops=True)</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/1000/0*879dHhQ9oYDufMP6.png" /></figure><p>Looking at the graph we created, we can see that it has almost nothing to do with the consolidated graph, but each vertex here represents an entire set of vertices. I’ve chosen a resolution of 0.05, which seems to satisfy the performance gain I wished to achieve. The output graph has 50,597 vertices and 142,989 edges, only 15% of the size of our consolidated graph.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/841/0*5HVtH50aC3B3kwdw.png" /></figure><p>At this point the performance gain starts to become obvious, but I wanted to have a full picture of the steps taken. 
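</p><p>To make the contraction step above concrete, here is a stdlib sketch of what contracting vertices by community membership does: every vertex is replaced by its community id, intra-community edges become self-loops and are dropped, and parallel edges between communities are merged. The toy data and the choice of min as the edge combiner are illustrative assumptions, not the real road graph or combiners:</p>

```python
def contract_by_membership(edges, membership):
    """Collapse a vertex-level edge list into a community-level one.

    edges: iterable of (u, v, weight); membership: vertex -> community id.
    Parallel community edges are merged by taking the minimum weight.
    """
    contracted = {}
    for u, v, w in edges:
        cu, cv = membership[u], membership[v]
        if cu == cv:
            continue  # intra-community edge becomes a self-loop; drop it
        key = (cu, cv)
        contracted[key] = min(w, contracted.get(key, float("inf")))
    return contracted

# Toy data: four vertices split into communities "A" and "B"
edges = [(0, 1, 2.0), (1, 2, 3.0), (2, 3, 1.0), (0, 2, 4.0)]
membership = {0: "A", 1: "A", 2: "B", 3: "B"}
community_edges = contract_by_membership(edges, membership)  # {("A", "B"): 3.0}
```

<p>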
As you can see, we’ve dropped to an average performance of 1 millisecond, an 8x improvement compared to the consolidated graph, meaning this kind of filtering will do the trick for now.</p><h3>Building the Hierarchy</h3><p>Up until now we’ve generated multiple graphs at different detail resolutions, where in one direction we improved the performance of navigating through them, while in the other we increased the number of details available through the found route. Our next step is to bind all of them together, so we can project one layer onto the other, allowing us to move between them based on the level of detail we need at the moment.</p><p>The igraph package identifies edges and vertices using an index, but unfortunately this index cannot remain consistent once we add or remove parts of the graph, so we need to overcome this challenge by creating our own mapping between those layers. The only requirements we’re facing are: 1. Achieving the mapping in O(1) time, otherwise the overhead of moving between the graphs will cost us time instead of saving it. 2. Ensuring it supports vectorized operations, which, while not critical per se, will be vital for performance when deploying the service to handle multiple simultaneous navigation requests. Those two requirements are great, since the old scriptures mandate all data scientists to use only one data structure their entire career — pandas.DataFrame, which satisfies both of them.</p><p>We’ll create multiple DataFrames, two for each movement we need (one way and the other): community to consolidated and consolidated to simplified, where the index will be our current vertex indices, and the values are the next vertex indices.</p><p>To track the transformation of the indices we’ll store a “stable_index” on the graph at each step; when the graph is transformed we’ll aggregate those properties together, and finally index them into a DataFrame. 
Let’s look at an example:</p><pre># Assuming we’ve just generated the community_graph<br># and the consolidated graph had the same call to `set_stable_indices`<br>def set_stable_indices(graph, prefix):<br>   for v in graph.vs:<br>       v[f&#39;{prefix}_stable_index&#39;] = v.index<br><br>   for e in graph.es:<br>       e[f&#39;{prefix}_stable_index&#39;] = e.index<br><br><br>def flatten_if_needed(value):<br>   if isinstance(value, list) and len(value) &gt; 0 and isinstance(value[0], list):<br>       return [item for sublist in value for item in sublist]<br>   return value<br><br>set_stable_indices(communities_graph, &#39;community&#39;)<br><br># Unpack merged list of lists in indices and edges<br>stable_id_props = [&#39;simplified_stable_index&#39;, &#39;consolidated_stable_index&#39;]<br>for v in communities_graph.vs:<br>   for prop in stable_id_props:<br>       if prop in v.attributes():<br>           v[prop] = flatten_if_needed(v[prop])<br><br>for e in communities_graph.es:<br>   for prop in stable_id_props:<br>       if prop in e.attributes():<br>           e[prop] = flatten_if_needed(e[prop])<br><br>def create_mapping(graph, current_stable_name, next_stable_name):<br>   current_vertices = graph.vs[current_stable_name]<br>   next_vertices = graph.vs[next_stable_name]<br><br><br>   current_to_next_vertices = pd.DataFrame(data={current_stable_name: current_vertices, next_stable_name: next_vertices}).set_index(current_stable_name)<br>   next_to_current_vertices = current_to_next_vertices.explode(next_stable_name)<br>   next_to_current_vertices = next_to_current_vertices.reset_index()<br>   next_to_current_vertices = next_to_current_vertices.groupby(next_stable_name)[current_stable_name].agg(list)<br><br>   return {&#39;vertices&#39;: {&#39;to_children&#39;: next_to_current_vertices, &#39;from_children&#39;: current_to_next_vertices}}<br><br>community_consolidated_mapping = create_mapping(communities_graph, &#39;community_stable_index&#39;, 
&#39;consolidated_stable_index&#39;)</pre><p>This heavy (and slightly deformed) preprocessing is done only when we create the graph; then we can save it aside and load it into the service. Notice that I kept the stable ids for both simplified and consolidated, so technically I can jump over layers in the architecture — from communities straight to simplified.</p><h3>Navigating in Hierarchy</h3><p>We created multiple graphs on top of each other, and we mapped the indices between those graphs. All that is left is to actually navigate through this entire pyramid of graphs. To do so, let’s assume we got as input the source and target from the simplified graph, and then we’ll perform the following:</p><ol><li>Map the simplified source/target to consolidated source/target</li><li>Map the consolidated source/target to community source/target</li><li>Navigate in the communities graph and get the communities path</li><li>Map the communities path into consolidated vertices</li><li>Create a consolidated subgraph based on those vertices</li><li>Find the consolidated subgraph source/target</li><li>Navigate through the consolidated subgraph and get the consolidated subgraph path</li><li>Map the consolidated subgraph path into consolidated graph vertices</li><li>Map the consolidated graph vertices to simplified graph vertices</li><li>Create a simplified subgraph from those vertices</li><li>Find the simplified source/target in the subgraph</li><li>Find a path in the simplified subgraph</li></ol><p>Pretty long, right? Intuitively it seems like much more work than simply navigating through the simplified graph and hoping for the best, but let’s take a look.</p><p>There are three main steps: 1. Navigate through communities, 2. Navigate through consolidated vertices, 3. Navigate through simplified graph vertices. 
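</p><p>Gluing these steps together can be sketched for two levels (community and detailed) in plain stdlib Python; the helper names and toy data below are hypothetical, and an unweighted BFS stands in for the real weighted igraph search:</p>

```python
from collections import deque

def bfs_path(adj, source, target, allowed=None):
    """Unweighted shortest path, optionally restricted to `allowed` vertices."""
    prev = {source: None}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if u == target:
            path = []
            while u is not None:
                path.append(u)
                u = prev[u]
            return path[::-1]
        for v in adj.get(u, []):
            if v not in prev and (allowed is None or v in allowed):
                prev[v] = u
                queue.append(v)
    return None

def hierarchical_route(detailed_adj, community_adj, to_community, to_detailed, src, dst):
    # Map the detailed source/target up to their communities
    c_src, c_dst = to_community[src], to_community[dst]
    # Navigate in the (small) community graph
    c_path = bfs_path(community_adj, c_src, c_dst)
    # Expand the community path into the set of detailed vertices it covers
    allowed = {v for c in c_path for v in to_detailed[c]}
    # Navigate the detailed graph restricted to that subgraph
    return bfs_path(detailed_adj, src, dst, allowed=allowed)

# Toy data: two communities "A" and "B"
detailed_adj = {1: [2], 2: [3], 3: [4], 4: []}
community_adj = {"A": ["B"], "B": []}
to_community = {1: "A", 2: "A", 3: "B", 4: "B"}
to_detailed = {"A": [1, 2], "B": [3, 4]}
route = hierarchical_route(detailed_adj, community_adj, to_community, to_detailed, 1, 4)
```

<p>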
All the rest are steps that help us remove unused vertices at the next level.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1000/0*BMRFhqADJGbrkp2y.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1000/0*WIMZC67lLmhhhWxw.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1000/0*SRdy7HY_ogx2S4-2.png" /></figure><p>In terms of performance, let’s see how well we do using a set of 1,000 routes, with a minimum of 200 vertices in the simplified path.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/841/0*hDuXiLxl_55l1bQQ.png" /></figure><p>I’ve removed the two outliers to get a better look at the distribution, but even with them we average ~11 milliseconds, which is almost a 40% improvement over the simplified graph navigation, and almost the same as the consolidated graph navigation, but with all the additional geospatial details. In cases where we want only a consolidated path, for ETA and distance calculations, this average drops to ~7 milliseconds, an improvement of 12% compared to navigating through the consolidated graph alone.</p><h3>Summary</h3><p>Scaling a custom navigation engine requires more than just efficient algorithms; it requires efficient data structures. By transforming a raw OSM network into a hierarchical graph, we successfully balanced the need for geospatial detail with the requirement for low-latency performance. Through the use of osmnx for simplification and consolidation and igraph for community detection, we reduced the search space by orders of magnitude. The final architecture resulted in a robust system capable of calculating detailed routes at low latency, proving that you don’t have to sacrifice detail for speed in production services.</p><h3>Resources</h3><h4>Obligatory XKCD</h4><p><a href="https://xkcd.com/461/">https://xkcd.com/461/</a></p><h4>References</h4><p>[1] Jung, Sungwon, and Sakti Pramanik. 
“An efficient path computation model for hierarchically structured topographical road maps.”</p><p>[2] Yang, Xiaobo. “Directed-edge-based mining of regular routes for enhanced traffic pattern recognition from travel trajectories.”</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=93d0c40e7bd4" width="1" height="1" alt=""><hr><p><a href="https://medium.com/gett-engineering/improving-navigation-performance-using-a-hierarchical-graph-93d0c40e7bd4">Improving Navigation Performance Using a Hierarchical Graph</a> was originally published in <a href="https://medium.com/gett-engineering">Gett Tech</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[From Guessing to Knowing: Our Data-Driven Hiring Process]]></title>
            <link>https://medium.com/gett-engineering/from-guessing-to-knowing-our-data-driven-hiring-process-496ab5124671?source=rss----a4a2445acb3d---4</link>
            <guid isPermaLink="false">https://medium.com/p/496ab5124671</guid>
            <category><![CDATA[gett]]></category>
            <category><![CDATA[hiring]]></category>
            <category><![CDATA[management]]></category>
            <category><![CDATA[hire-top-talent]]></category>
            <category><![CDATA[data-driven-decisions]]></category>
            <dc:creator><![CDATA[Noa Rechnitz]]></dc:creator>
            <pubDate>Wed, 14 Jan 2026 08:19:25 GMT</pubDate>
            <atom:updated>2026-01-14T08:19:11.235Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*a1x1KyRNN2Z8k3VbGxK9Ag.png" /></figure><p>In the fast-paced world of tech, a new hire is one of the most impactful decisions a company can make. It shapes the product, the culture, and our ability to innovate and move fast. That’s why we believe hiring should never be based on intuition or a “gut feeling.” It needs to be a structured, data-driven process that’s transparent, fair, and designed to find the best fit for both you and us.</p><p>At Gett, we’ve adopted the WHO methodology, a proven framework that shifts hiring from guesswork to a structured process based on facts and behavioral patterns. Inspired by <em>Who: The A Method for Hiring</em> by Geoff Smart and Randy Street, it focuses on identifying top-performing individuals through an in-depth look at their past performance and real achievements rather than relying solely on resumes or traditional interviews.</p><p>So, what does this mean for you?</p><h3>Step 1: Defining Success Before We Start</h3><p>Before we even publish a job description, we hold a kickoff meeting with the hiring team. The goal is simple: define what success looks like in the first 90 days. These SMART outcomes (Specific, Measurable, Achievable, Relevant, Time-bound) guide everything — what we screen for, what we ask in interviews, and how we make decisions.</p><p>For example, let’s imagine we’re hiring a Backend Engineer. In the kickoff meeting, the hiring manager might define success for the first 90 days as: <em>own, lead, and deliver one key feature to production</em>. These measurable outcomes ensure that our interview questions are targeted toward finding someone who can realistically achieve them.</p><h3>Step 2: From Screening to Professional Interviews</h3><p>Your journey with us starts with a quick chat with our Talent Acquisition team. 
These steps help us identify candidates whose motivations and experience align with the role.</p><p>You’ll then move on to a series of professional interviews. These aren’t just technical Q&amp;A sessions. We use them to dive deep into your past projects and experiences. These interviews are conducted with the SMART outcomes in mind, meaning we assess not just technical knowledge, but whether you can realistically deliver on the specific goals defined in the kickoff meeting.</p><p>In some cases, we may ask you to complete a short FactSheet, a Google Sheets file. This isn’t about extra work; it’s about making our interviews more valuable. This brief exercise helps us gather key data about a specific topic beforehand. The benefit? Instead of a Q&amp;A session focused on facts, the interview becomes a deeper, more meaningful discussion about your decision-making, problem-solving, and the real impact of your work. For you, it’s a chance to organize your thoughts and reflect on your accomplishments, ensuring you can showcase your expertise and value in the most effective way.</p><h3>Step 3: The HR Interview — Your Story Beyond the CV</h3><p>At Gett, the HR interview isn’t about personality tests or a quick check on cultural fit. It’s a structured, in-depth process designed to uncover behavioral patterns that predict future success. We’re not just looking at <em>what</em> you’ve done, but <em>how</em> you did it and <em>why</em>.</p><p>Our approach is different: we walk through your career chronologically, starting from your very first job. Yes, even that internship you did years ago matters. We want to know how you got into each role, who you worked with, and what feedback you received. We also ask why you chose to leave each position. Why do we do this? 
Because careers tell stories, and when you look closely, you see patterns.</p><p>These questions may sound simple, but they are your chance to tell the real story behind the bullet points on your CV — not just the highlights, but the choices, the impact you created, and the lessons that shaped your journey. We’re looking for evidence of excellence, which is rarely accidental. When we look at your career history through this lens, we can identify patterns of growth-oriented behavior: people who consistently take initiative, go beyond their job description, and add value no matter where they are.</p><p>For candidates, this is your opportunity to bring your CV to life. It’s where we move beyond a list of skills and dive into the behaviors that make a great professional. We believe this is the most effective way to ensure we’re making a thoughtful, data-driven decision, and that you’re joining a company where your contributions will be valued.</p><h3>Step 4: The Leadership Interview</h3><p>Following your professional and HR interviews, you’ll meet with a senior leader, such as a Head of or VP. This conversation is about mutual alignment. Beyond assessing technical and behavioral patterns, this is where we evaluate how your strengths align with our company values. It’s also an opportunity for you to ask strategic questions about team dynamics, company vision, and how you can contribute to our broader goals.</p><h3>Step 5: Decision Making: A Data-Driven Discussion</h3><p>Before we make an offer, we hold a final meeting with everyone who interviewed you. This is where all the data points we collected come together. Each interviewer shares their observations and feedback, focusing on the specific behaviors and achievements they observed. 
The goal is to ensure all perspectives are considered and that the final decision is based on a complete, data-rich picture of your capabilities.</p><p>Ultimately, the hiring manager makes the final call, but only after weighing all the feedback and insights gathered. This collaborative, data-driven approach ensures a fair, transparent, and thoughtful decision.</p><h3>What This Means for You</h3><p>If you’re interviewing with us, expect a process that values structure, transparency, and real conversations. We don’t make decisions based on gut feelings. Every step is designed to uncover evidence from your past experiences — the decisions you made, the challenges you tackled, and the impact you delivered, because we believe past behavior is the best predictor of future success.</p><p>Our goal is simple: to make hiring thoughtful, data-driven, and fair, while giving you the opportunity to showcase your real strengths and the value you can bring to our future.</p><p>Ready to tell your story? Explore our <a href="https://www.gett.com/careers/#open-positions">open roles</a> and find out how you can be a part of our team.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=496ab5124671" width="1" height="1" alt=""><hr><p><a href="https://medium.com/gett-engineering/from-guessing-to-knowing-our-data-driven-hiring-process-496ab5124671">From Guessing to Knowing: Our Data-Driven Hiring Process</a> was originally published in <a href="https://medium.com/gett-engineering">Gett Tech</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[How We Put the Company’s Key Metrics in Everyone’s Pocket: 
Israeli Main KPIs Dashboard]]></title>
            <link>https://medium.com/gett-engineering/how-we-put-the-companys-key-metrics-in-everyone-s-pocket-israeli-main-kpis-dashboard-a077658377de?source=rss----a4a2445acb3d---4</link>
            <guid isPermaLink="false">https://medium.com/p/a077658377de</guid>
            <category><![CDATA[gett]]></category>
            <category><![CDATA[metrics]]></category>
            <category><![CDATA[kpi-dashboard]]></category>
            <category><![CDATA[data-science]]></category>
            <dc:creator><![CDATA[Ani Mezhlumyan]]></dc:creator>
            <pubDate>Thu, 23 Oct 2025 13:36:36 GMT</pubDate>
            <atom:updated>2025-10-23T13:37:03.263Z</atom:updated>
            <content:encoded><![CDATA[<h3><strong>How We Put the Company’s Key Metrics in Everyone’s Pocket: <br>Israeli Main KPIs Dashboard</strong></h3><p>In today’s fast-paced business environment, our company’s management needs instant access to reliable data to make informed decisions.</p><p>Although mobile dashboards already existed in our company, they were primarily global and not tailored to local needs. There was, however, a clear demand for local tools. Together with the Israeli analytics team, we decided to create a mobile dashboard specifically for the Israeli part of Gett, focusing on its interests, metrics, and needs.</p><p>This article tells the story of how we built the Main KPIs Dashboard: a mobile, simple, and intuitive tool that consolidates key metrics, standardises calculations, and puts actionable insights directly in the hands of top management — anytime, anywhere.</p><p><strong>The problem</strong></p><p>For our Israeli top management, data is an everyday decision-making tool. But until recently, there was no single mobile-friendly dashboard where they could track all strategic KPIs in one place.</p><p>Globally, the company already had a mobile dashboard. However, it wasn’t tailored to the local Israeli management’s needs and didn’t always reflect the definitions agreed with our local analytics team. 
Different teams often used different formulas.</p><p>In short:</p><ul><li>No unified mobile dashboard specifically for Israeli executives</li><li>No single place for key metrics</li><li>No quick, mobile-first way to see the big picture</li></ul><p><strong>Goals when building the dashboard</strong></p><p>From the very beginning, our priorities were clear:</p><ol><li>Give top management access to main KPIs in one place<br>A mobile dashboard covering only what really matters at the top level.</li><li>Centralise and validate metric logic with Israeli analytics<br>Every KPI calculation is officially aligned and approved by our local analytics team.</li><li>Keep it simple and mobile-first<br>The dashboard had to be intuitive, fast and easy to read on mobile devices.</li></ol><p><strong>How we built it</strong></p><ul><li><strong>Collecting the metrics</strong></li></ul><p>At Gett, our process is structured so that the Data Visualization team develops dashboards in close collaboration with analysts. Following this workflow, analysts provided us with the metrics and the calculation logic.</p><p>Each analyst in the team is responsible for a specific business process, so every analyst contributed the SQL queries for the area they work in.</p><p>All these smaller SQL pieces were then combined into one big query, which became the single data source for the dashboard.</p><ul><li><strong>Designing the dashboard</strong></li></ul><p><em>Simplicity first.</em></p><p>Following Tableau best practices, we decided to keep the dashboard simple and minimalistic.</p><p>For the main visualizations, we chose KPI tiles that show the core value we care about — the number for yesterday.</p><p>But looking at just one day isn’t enough for context. To provide a better picture, we added a bar chart showing the values from the past five same weekdays.</p><p>We also wanted to highlight whether we were up or down compared to the previous week. 
The free space on the right side of the screen turned out to be a perfect spot for this comparison.</p><p>The result was a clean KPI block combining yesterday’s value, short-term history, and a week-over-week comparison.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/506/1*fbE0JDWCTQMKlZCkY8IaTg.png" /><figcaption>*The data shown in the visualizations is dummy data.</figcaption></figure><p>We use color coding for the latest day: blue indicates growth compared to last week, while orange shows a decline. The colors in the dashboard were also chosen following Tableau best practices.</p><p><em>Handling multiple domains on mobile.</em></p><p>One challenge was fitting different domains (B2C, B2B, General) into a single mobile dashboard. That significantly increased the number of elements on screen.</p><p>On web dashboards, our typical solution is to display all main KPIs and let users apply filters to switch between domains. But when we tested this approach on mobile, loading times for filters were too long, creating a poor user experience.</p><p>Since our goal was to optimize the dashboard for mobile, we decided to use tabs instead, allowing users to quickly switch between domains.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/512/1*ZD_TZb4eJ6s1iGdhcaFbAg.png" /><figcaption>*The data shown in the visualizations is dummy data.</figcaption></figure><p><em>Organising metrics within each tab.</em></p><p>Each tab contains several groups of metrics. For example, on the B2C tab we have:</p><ul><li>B2C Taxi</li><li>B2C Delivery</li><li>B2C General (contains metrics like App Downloads, Registrations, etc. 
which are shared across Taxi and Delivery and therefore counted in total.)</li></ul><p>Adding another layer of tabs would have made the interface overly complex and unintuitive.</p><p>Instead, we chose to organise metrics into blocks within the same tab.</p><p>This way, when a user opens the dashboard, they first see Taxi KPIs, and by scrolling down, they can access Delivery and General metrics.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/986/1*6-lrKyvsIW9Qd1r8sCsyjQ.png" /><figcaption>*The data shown in the visualizations is dummy data.</figcaption></figure><p>In the end, we focused on clarity — a simple layout, mobile-friendly navigation, and intuitive grouping to make the dashboard easy to read and act on.</p><p><strong>The Outcome</strong></p><p>The result was a clean, mobile-friendly dashboard that brings all key metrics together in one place.</p><p>Israeli top management can now track strategic KPIs anytime, anywhere — with consistent metric logic approved by the analytics team.</p><p>The simple design and clear navigation made adoption easy, turning the dashboard into an everyday tool for quick, data-driven decisions.</p><p><strong>What’s Next</strong></p><p>Looking ahead, we plan to continue developing our mobile analytics ecosystem. In the near future, we’ll enable daily email delivery of the dashboard to users, introduce weekly and monthly analytics pages, and expand the set of available metrics.</p><p>At Gett, we see huge potential in the development of mobile dashboards — tools that make data accessible, actionable, and always within reach.</p><p><strong>Conclusion</strong></p><p>Building the Main KPIs Dashboard was more than just a technical project — it was about creating a single, trusted place for decision-making.</p><p>By focusing on simplicity, consistency, and a mobile-first experience, we made key company metrics accessible to top management anytime, anywhere. 
And while this dashboard was designed for executives, its transparency and usability make it valuable for everyone in the company.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=a077658377de" width="1" height="1" alt=""><hr><p><a href="https://medium.com/gett-engineering/how-we-put-the-companys-key-metrics-in-everyone-s-pocket-israeli-main-kpis-dashboard-a077658377de">How We Put the Company’s Key Metrics in Everyone’s Pocket: 
Israeli Main KPIs Dashboard</a> was originally published in <a href="https://medium.com/gett-engineering">Gett Tech</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[From A to B to Everywhere: Understanding the World of Vehicle Routing]]></title>
            <link>https://medium.com/gett-engineering/from-a-to-b-to-everywhere-understanding-the-world-of-vehicle-routing-f95883430c66?source=rss----a4a2445acb3d---4</link>
            <guid isPermaLink="false">https://medium.com/p/f95883430c66</guid>
            <category><![CDATA[optimization]]></category>
            <category><![CDATA[transportation]]></category>
            <category><![CDATA[algorithms]]></category>
            <category><![CDATA[data-science]]></category>
            <dc:creator><![CDATA[Yishay Shapira]]></dc:creator>
            <pubDate>Mon, 08 Sep 2025 06:01:44 GMT</pubDate>
            <atom:updated>2025-09-08T06:01:43.007Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*Z6pA1__5TJXMGboxXq-sLg.jpeg" /></figure><p>Ever wondered how your online shopping arrives so quickly, or how on-demand services manage to coordinate countless daily deliveries with such precision? Behind these everyday conveniences lies a complex challenge known as the <strong>Vehicle Routing Problem (VRP)</strong>.</p><p>In simple terms, the VRP is the puzzle of finding the most efficient routes for a fleet of vehicles to serve a group of customers. Solving this puzzle is a huge deal. Getting it right means companies can dramatically save money, cut fuel costs, reduce the load on our roads, and ultimately <strong>keep customers happy</strong> with fast and reliable service.</p><p>In this post, we’ll break down this fascinating topic. We’ll start with the basics and build our way up, exploring the three main types of routing problems: <strong>one-to-one</strong>, <strong>one-to-many</strong>, and the complex <strong>many-to-many</strong>. These are the exact kinds of challenges we tackle every day at <strong>Gett</strong>. Let’s get started.</p><h3>The Building Block: One-to-One Problems</h3><p>Before we can tackle a whole fleet of vehicles, we need to start with the simplest journey: getting one vehicle from a single origin to a single destination. This is the <strong>one-to-one problem</strong>, the fundamental building block of all routing. Think of a taxi taking you to the airport or a courier making a single, direct delivery. The goal is simple: find the absolute best path.</p><p>So, how do computers solve this? First, they translate a real-world map into a structure they can understand: a <strong>graph</strong>. In this graph, every location (your starting point, your destination, and every intersection in between) is a <strong>node</strong> (a point). 
The roads connecting these locations are represented as <strong>edges</strong> (lines), each assigned a <em>cost</em> or <em>weight</em>, such as travel time, distance, or monetary expense — depending on the objective we want to optimize.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/395/1*zmhex401wKO8WfMJ19KkSw.gif" /><figcaption>One-to-One graph</figcaption></figure><p>With the map now represented as a network of nodes and edges, we can use a classic algorithm to find the optimal route.</p><h4>Key Algorithm: Dijkstra’s Algorithm</h4><p>The most famous tool for solving this problem is <strong>Dijkstra’s Algorithm</strong>. You can think of it as a methodical explorer. Starting from your origin node, Dijkstra’s algorithm explores the graph step-by-step, always choosing the path with the lowest total cost from the start. It keeps track of the “cheapest” way to reach every node it has visited until it finally arrives at the destination, guaranteeing that the path it found is the shortest or fastest possible one.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/283/1*15KkonMRnHdbzGhFw0PXCA.gif" /><figcaption><a href="https://commons.wikimedia.org/wiki/File:Dijkstra_Animation.gif">https://commons.wikimedia.org/wiki/File:Dijkstra_Animation.gif</a></figcaption></figure><p>This animation shows Dijkstra’s algorithm finding the shortest path from the start (node 1) to the destination (node 5).</p><p>The algorithm begins by setting the start node’s cost to 0 and all others to infinity. It then methodically explores the graph, always moving to the unvisited node with the lowest known cost. From each new node it visits, it looks at its neighbors and calculates if the path through this new node is cheaper than any previously found.</p><p>You can see this happen when the cost to reach node 6 is first set to 14 (directly from node 1), but is later updated to a cheaper cost of 11 once a better path is found through node 3. 
This process repeats, locking in the lowest cost for each node it visits, until the destination is reached. The algorithm determines the final cost to reach node 5 is 20. The optimal route is then found by tracing the path backward from the destination, giving the final result: 1 → 3 → 6 → 5.</p><p>The time complexity of Dijkstra’s algorithm depends on its implementation. Using a simple array, the complexity is typically <em>O(V²)</em>, where V is the number of vertices (nodes). This is because the algorithm must iterate through all V nodes, and for each one, search the entire list of remaining nodes to find the next one with the lowest cost, which takes <em>O(V)</em> time.</p><p>However, this performance can be significantly improved. By using a more advanced data structure, such as a binary heap (a type of priority queue), the complexity becomes <em>O((V+E) </em>⋅ <em>log V)</em>, where V is the number of vertices and E is the number of edges. This version is significantly faster for most graphs because finding the next node to visit requires only <em>O(log V)</em> time, compared to <em>O(V)</em>.</p><h3><strong>The Classic Challenge: One-to-Many &amp; Many-to-One Problems</strong></h3><p>This is where routing gets really interesting. The <strong>one-to-many</strong> problem involves a single starting point and multiple destinations. Common examples include a delivery truck leaving one warehouse to drop off packages at many different homes, or a large vehicle taking multiple employees from a central office to their individual homes after work. The reverse, known as <strong>many-to-one</strong>, involves multiple starting points and a single destination, such as employee shuttles picking up people from their homes to begin a shift at a factory. 
From an algorithmic perspective, one-to-many and many-to-one problems are functionally equivalent; you can treat many-to-one as one-to-many by simply reversing the direction of the edges in the graph (using the travel cost from B→A instead of A→B).</p><p>Each individual “leg” of these complex tours (the trip from the depot to customer A, or from customer C to customer D) is fundamentally a <strong>one-to-one problem</strong>, as we saw in the last section. The real challenge emerges when you have to string these legs together into the most efficient sequence, creating a full tour. This puzzle introduces new layers of complexity, such as vehicle capacity (a truck can only hold a certain number of packages) or time windows (a delivery must arrive within a specific timeframe).</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/362/1*I5lgbVxkJgtATlpB8dLihQ.gif" /><figcaption>One-to-Many graph</figcaption></figure><p>In the example above, the route begins at the hub (marked with a star), which could represent an office, school, or similar location, picks up all the passengers or parcels, and then proceeds to drop them off at the various destinations.</p><p>Finding the single, perfect solution to such a complex problem is computationally difficult; it belongs to a class of problems known as “NP-hard.” For real-world applications, we use clever, fast algorithms that provide excellent and practical solutions.</p><h4><strong>Key Algorithms</strong></h4><p><strong>Heuristics: The “Good Enough” Approach</strong></p><p>Heuristics are smart, efficient methods that build practical, high-quality solutions without the massive processing time required to find the single perfect answer.</p><ul><li><strong>Nearest Neighbor:</strong> A vehicle starts from the depot and goes to the closest customer. From that customer, it travels to the next closest unvisited customer, and so on. 
This continues until the vehicle’s capacity is full, and a new vehicle starts the process for the remaining requests. The time complexity of this algorithm is <em>O(V²)</em>, as for each of the V nodes, it must search through all remaining nodes to find the closest one.</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/393/1*GEsQxH2qqR3V2POykSlJ5g.gif" /><figcaption><a href="https://commons.wikimedia.org/wiki/File:Nearestneighbor.gif">https://commons.wikimedia.org/wiki/File:Nearestneighbor.gif</a></figcaption></figure><ul><li><strong>Clarke and Wright Savings Algorithm:</strong> This classic method starts by assuming every customer gets a separate, direct trip. It then intelligently merges two routes into a single tour if doing so creates the biggest “saving” in total distance traveled. It repeats this greedy process until no more savings can be found. Its time complexity is also <em>O(V²)</em>, primarily because it must first calculate the potential “saving” for every possible pair of customers.</li></ul><p><strong>Metaheuristics: The “Smarter” Approach</strong></p><p>Metaheuristics are higher-level strategies used to guide simpler heuristics toward even better solutions and to avoid common pitfalls.</p><ul><li><strong>GRASP (Greedy Randomized Adaptive Search Procedure):</strong> Instead of always going to the very closest node, this algorithm creates a short candidate list of the top few closest nodes. It then chooses from this list based on a weighted probability. By running this randomized process hundreds of times, we can generate many different, high-quality solutions and then simply pick the best one overall. 
Time Complexity: The complexity is <em>O(i </em>⋅ <em>V²)</em>, where i is the number of iterations (e.g., “hundreds of times”) you decide to run the process, and <em>O(V²)</em> is the complexity of the greedy construction phase, like Nearest Neighbor.</li><li><strong>Tabu Search:</strong> To avoid getting stuck in a rut or reversing its own progress, this method keeps a “memory” or “tabu list” of recent moves it has made. By making these moves temporarily forbidden, it forces the algorithm to explore new and different solutions. Time Complexity: The complexity is typically <em>O(i </em>⋅ <em>V²)</em>, where i is the chosen number of iterations. The <em>O(V²)</em> term comes from the process of exploring the “neighborhood” of the current solution at each step (e.g., evaluating all possible two-edge swaps in a route).</li></ul><h3>The Ultimate Puzzle: Many-to-Many Problems 🧩</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*BlcbbjqpMb_tnqq3OMCbWA.png" /></figure><p>We’ve now arrived at the most complex category of routing challenges: <strong>many-to-many problems</strong>. Here, we’re dealing with a tangled web of multiple origins and multiple destinations that all need to be connected efficiently. This is often called the <strong>Pickup and Delivery Problem (PDP)</strong>, where items are moved between various points.</p><p>Real-world examples of this puzzle are everywhere. In ride-sharing pools, a single vehicle coordinates a route to pick up multiple passengers from different locations and drop them off at their unique destinations. In courier services, a courier picks up packages from various clients and delivers them to other clients across a city, creating a tangled web of routes.</p><p>The core problem here is coordinating this complex, interconnected network of movements. This often introduces a critical new rule: precedence constraints. This simply means a pickup must happen before its corresponding drop-off can occur. 
You can’t drop off a package before it has been picked up.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/363/1*bnZUaCFOvGmMO-3C8ybURQ.gif" /><figcaption>Many-to-Many graph</figcaption></figure><p>In the example shown, there are four pickup–drop-off pairs. Pickup points are marked with ‘P’ and their corresponding drop-off points with ‘D,’ with each pair shaded in a unique color. The route begins at one of the pickup locations and then visits the remaining points, following the rule that for each pair, the pickup must occur before the drop-off. This is, of course, a simplified illustration; real-world scenarios often involve dozens of pickups and deliveries, tight time windows, and entire fleets of vehicles that must be coordinated efficiently.</p><h4>Key Algorithms</h4><p>Solving these problems often involves adapting the heuristics from the previous section, but they typically require more advanced and powerful techniques to handle the added complexity.</p><ul><li><strong>Genetic Algorithms</strong>: Inspired by natural selection, this method starts by creating a “population” of many different, valid solutions (routes). It then takes the best solutions and “breeds” them together, combining parts of good routes to create new, potentially even better “offspring” routes. This process is repeated over many generations, allowing the solutions to evolve toward a high-quality result. The time complexity is generally <em>O(g ⋅ p ⋅ V)</em>, where g is the number of generations the algorithm runs for, p is the size of the population, and V is the number of nodes, which influences the cost of evaluating each route’s “fitness.”</li><li><strong>Large Neighborhood Search (LNS)</strong>: This is a popular modern technique with a simple but powerful idea. It takes a good, existing solution and intentionally “destroys” a part of it, for example, by removing a handful of related pickups and deliveries. 
It then uses other methods (like a greedy heuristic) to “rebuild” that part of the solution in a smarter, more optimal way. This allows the algorithm to explore a large “neighborhood” of similar solutions to find improvements. Its time complexity can be described as <em>O(i </em>⋅ <em>d </em>⋅ <em>V)</em>, where i is the number of iterations, d is the number of customers destroyed in each step, and V is the total number of nodes (customers), which influences the cost of the “rebuild” phase.</li></ul><h3>Conclusion: Putting the Puzzle Together</h3><p>From a simple trip between two points to a complex city-wide network, the Vehicle Routing Problem is a fascinating puzzle with a major real-world impact. We’ve journeyed from the basic one-to-one challenge of finding the single best path to the classic one-to-many problem where constraints like vehicle capacity and time windows come into play. Finally, we explored the ultimate logistical puzzle: the many-to-many problem, where a tangled web of pickups and drop-offs must be perfectly coordinated.</p><p>Solving these challenges is the key to the efficiency of modern logistics and on-demand services. The algorithms and heuristics we’ve discussed, from the straightforward Nearest Neighbor to the evolutionary approach of Genetic Algorithms, are the tools that turn this operational chaos into a synchronized dance of efficiency. Getting it right saves money, reduces the load on our roads, and ultimately keeps customers happy with fast, reliable service. 
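To give a feel for how lightweight the entry-level tools in this toolbox are, here is the Nearest Neighbor heuristic from earlier in the post as a toy sketch (plain 2-D points with made-up coordinates, and no capacity or time-window constraints):

```python
import math

def nearest_neighbor_tour(depot, customers):
    """Greedy tour: repeatedly drive to the closest unvisited customer."""
    tour, current = [depot], depot
    remaining = list(customers)
    while remaining:
        # O(V) scan for the closest remaining customer -> O(V^2) overall.
        nxt = min(remaining, key=lambda c: math.dist(current, c))
        remaining.remove(nxt)
        tour.append(nxt)
        current = nxt
    return tour

tour = nearest_neighbor_tour((0, 0), [(5, 5), (1, 0), (2, 1), (7, 4)])
```

Everything beyond this, from Clarke and Wright to LNS, is essentially about escaping the myopic choices a greedy tour like this one makes.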
At <strong>Gett</strong>, these are the exact kinds of challenges that drive us every day as we continue to optimize the world of transport.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=f95883430c66" width="1" height="1" alt=""><hr><p><a href="https://medium.com/gett-engineering/from-a-to-b-to-everywhere-understanding-the-world-of-vehicle-routing-f95883430c66">From A to B to Everywhere: Understanding the World of Vehicle Routing</a> was originally published in <a href="https://medium.com/gett-engineering">Gett Tech</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Map Matching Part 3 — Matching Algorithms]]></title>
            <link>https://medium.com/gett-engineering/map-matching-part-3-matching-algorithms-4fe8413220af?source=rss----a4a2445acb3d---4</link>
            <guid isPermaLink="false">https://medium.com/p/4fe8413220af</guid>
            <category><![CDATA[geodata]]></category>
            <category><![CDATA[data-science]]></category>
            <category><![CDATA[data]]></category>
            <dc:creator><![CDATA[Or Nachmias]]></dc:creator>
            <pubDate>Wed, 03 Sep 2025 06:52:12 GMT</pubDate>
            <atom:updated>2025-09-03T06:49:33.541Z</atom:updated>
            <content:encoded><![CDATA[<h3><strong>Map Matching Part 3 — Matching Algorithms</strong></h3><p>Mirror: <a href="https://ornachmias.github.io/2025/08/27/map-matching-3.html">https://ornachmias.github.io/2025/08/27/map-matching-3.html</a></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*7b2cJ8pM5YUWlTUz" /></figure><p>This series of articles will look into the subject of map matching, providing foundational knowledge. It examines the core concepts and inherent challenges within this domain, and details a comprehensive process for generating a synthetic dataset. The discussion then extends to exploring various map matching algorithms and a comparative evaluation of their performance, utilizing the dataset constructed.</p><ul><li>Part 1: <a href="https://medium.com/gett-engineering/map-matching-part-1-gps-maps-and-graphs-80a5edb79ae1">Medium</a>, <a href="https://ornachmias.github.io/2025/05/26/map-matching-1.html">Mirror</a></li><li>Part 2: <a href="https://medium.com/gett-engineering/map-matching-part-2-creating-synthetic-dataset-cd232b3d4d92">Medium</a>, <a href="https://ornachmias.github.io/2025/07/06/map-matching-2.html">Mirror</a></li></ul><h3>Map Matching Algorithms</h3><p>With our dataset generated, we can now implement and evaluate several map-matching algorithms to understand their respective strengths and weaknesses. We will explore four distinct approaches, starting from a simple baseline and progressively building in complexity:</p><ul><li>Baseline Matching</li><li>Weighted Matching</li><li>Topological Matching</li><li>Hybrid Matching</li></ul><p>This sequence was chosen because the algorithms are straightforward, grounded in solid theoretical principles, and build upon one another, creating a clear and logical progression. 
Importantly, these methods are known to produce strong results in practice [1].</p><p>Before diving into the solutions, let’s formally redefine the problem:</p><p>Annotations:</p><ul><li>X — longitude</li><li>Y — latitude</li><li>B — bearing</li><li>H — heading</li><li>A — accuracy</li><li>E — edge in the graph</li></ul><p>Given GPS samples that contain: X_noisy, Y_noisy, B_noisy, H_noisy, A_noisy.</p><p>Our objective is to find the corrected coordinates: X_pred, Y_pred, and the most likely road segment: E_pred.</p><p>The primary goals are:</p><ul><li>To assign the point to the correct road segment in the graph:</li></ul><pre>E_pred = E_actual</pre><ul><li>To minimize the geometric distance between the predicted point and the true point:</li></ul><pre>Min(((X_pred - X_actual) ** 2 + (Y_pred - Y_actual) ** 2) ** 0.5)</pre><p>In simple terms, our mission is to accurately snap each noisy GPS point to the correct road on the map, ensuring the corrected location is as close as possible to the vehicle’s actual position. Easy peasy.</p><h3>Evaluation</h3><p>To evaluate our algorithms, we’ll use a multi-metric approach that assesses performance at both the individual point level and the overall route level. Our three key metrics:</p><ul><li>Point-level Accuracy (Classification): This is the percentage of GPS samples correctly matched to their true edge. It tells us how often the algorithm gets the right answer.</li><li>Point-level Error (Regression): This measures the average geometric distance (in meters) between the algorithm’s predicted point and the actual location. This quantifies how wrong the matches are.</li><li>Route Similarity (Structural): Since a trajectory is a connected sequence, we also need a metric that evaluates the entire route’s structural integrity. 
Point-level metrics alone can be misleading; an algorithm might get 99% of points right but place one on a completely different highway.</li></ul><p>For route-level evaluation, we’ll use a modified Levenshtein Distance [2]. This algorithm measures the difference between two sequences by counting the minimum number of single-edge “edits” (insertions, deletions, or substitutions) required to change one sequence into the other.</p><p>Before applying the metric, we perform a crucial preprocessing step: we compress both the actual and predicted routes by removing consecutive duplicates. For example, a route (1, 1, 1, 2, 2, 3) becomes (1, 2, 3). This is necessary because a vehicle generates multiple GPS samples while traversing a single long road segment.</p><p>Let’s walk through a quick example. Assume the actual route is: 1, 1, 1, 2, 2, 2, 3, 2, 4.</p><p>And our predicted route is: 1, 1, 2, 2, 1, 3, 3, 4.</p><p>Removing consecutive duplicates compresses these to: 1, 2, 3, 2, 4 and 1, 2, 1, 3, 4.</p><p>The first two edges (1 and 2) match perfectly. (0 edits)</p><p>At the third position, the actual route has edge 3 while the predicted has 1. This requires one substitution. (1 edit)</p><p>At the fourth position, the actual route has edge 2 while the predicted has 3. This requires a second substitution. (2 edits)</p><p>The final edge (4) matches.</p><p>The total Levenshtein distance is 2. To make this value comparable across routes of different lengths, we normalize it into a score where 1 indicates a perfect match:</p><pre>L_score = 1 - (L_distance / max(Len_predicted, Len_actual))</pre><p>We evaluate every algorithm on 1000 routes from the augmented dataset we created in the previous article.</p><h3>Base Class</h3><p>To ensure a consistent structure, all algorithms are implemented on top of a single abstract base class called MatchingAlgorithm. 
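As a quick aside, the route-similarity metric from the Evaluation section above fits in a few lines. A minimal sketch that reproduces the worked example (compression, an edit distance of 2, and a score of 0.6):

```python
from itertools import groupby

def compress(route):
    """Collapse consecutive duplicate edge IDs: (1, 1, 1, 2, 2, 3) -> [1, 2, 3]."""
    return [edge for edge, _ in groupby(route)]

def levenshtein(a, b):
    """Minimum number of single-edge insertions/deletions/substitutions."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # substitution
        prev = curr
    return prev[-1]

def l_score(actual, predicted):
    a, p = compress(actual), compress(predicted)
    return 1 - levenshtein(a, p) / max(len(a), len(p))

# Worked example from the text: distance 2 over length 5 -> score 0.6.
score = l_score([1, 1, 1, 2, 2, 2, 3, 2, 4], [1, 1, 2, 2, 1, 3, 3, 4])
```
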
This class manages two essential components: a graph attribute, holding the igraph.Graph road network we’ve discussed, and the index. The index is an STRtree provided by the shapely package, a data structure that enables highly efficient spatial queries on geometric objects, allowing us to quickly find nearby road segments for any given GPS point [3].</p><p>Each sample is matched using the following procedure:</p><ol><li>Extract the sample’s details: longitude, latitude, heading/bearing, and accuracy.</li><li>Project the point from geographic coordinates to a local Cartesian system for accurate distance calculations.</li><li>Identify a set of candidate road edges that are most likely to be the correct match.</li><li>Score and select the single best candidate from this set based on specific criteria.</li><li>Calculate the point’s perpendicular projection onto the selected edge geometry.</li><li>Determine the final matched coordinates on that edge.</li><li>Revert the final coordinates from the local system back to the original geographic system.</li></ol><p>While most of these steps are shared, the “secret sauce” of each algorithm lies in Steps 3 and 4. The unique logic used to find, score, and rank candidate edges is what distinguishes the map-matching algorithms in this article.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*--EfkUt0vZpPATqV" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*CYpACfgYtjr9F2ll" /></figure><h3>Baseline Algorithm</h3><p>Every good analysis starts with a benchmark. A baseline algorithm serves as a fundamental point of comparison, allowing us to objectively measure the performance improvements offered by more sophisticated solutions. 
For our baseline, we’ll implement the most intuitive approach: a purely geometric match.</p><p>Translating this logic into our base implementation is just as simple: the get_candidates method is overridden to find and return only the single closest edge.</p><p>Since the list of candidates will always contain exactly one item, the find_best_match method simply selects it without needing any complex scoring logic.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*Zx0dNgqkBx8kn19H" /></figure><pre>Accuracy: 0.47599<br>Distance: 16.01573<br>Levenshtein: 0.50584</pre><h3>Weighted Matching</h3><p>Moving beyond our simple baseline, the first real technique we’ll explore is Weighted Matching. The core idea is to create a more intelligent scoring system by combining multiple metrics. Instead of relying solely on distance, we’ll evaluate each candidate edge based on a weighted score that considers both the distance to the edge and the agreement of its bearing with the vehicle’s movement.</p><p>First, we select a pool of potential candidates using a dynamic search radius around the noisy GPS point. 
A robust method is to set this radius to three times the reported accuracy, as this covers over 99.7% of the probable locations for the vehicle (see the previous map matching article for more information):</p><pre>D_candidates = A_noisy * 3</pre><p>Next, each candidate in this pool is scored using a formula that combines two key factors: a Distance Score and a Bearing Score.</p><pre>D_diff = |(Y_2 - Y_1)X_noisy - (X_2 - X_1)Y_noisy + X_2Y_1 - Y_2X_1| / ( (Y_2 - Y_1) ** 2 + (X_2 - X_1) ** 2 ) ** 0.5<br>B_diff = Min(|B_noisy - B_edge|, 360 - |B_noisy - B_edge|)<br>Score_pred = 0.5 * (1 / (D_diff + 1)) + 0.5 * ((cos(B_diff) + 1) / 2)</pre><p>D_diff: This is the shortest perpendicular distance from the GPS point to the candidate edge (don’t worry, shapely has an implementation for this).</p><p>B_diff: This measures the smallest angle, in degrees, between the vehicle’s bearing and the direction of the road segment, correctly handling the 360-degree wrap-around (e.g., the difference between 359° and 1° is 2°, not 358°).</p><p>These two components are then normalized and combined into a final score. For this implementation, we’ll assign them equal importance with a weight of 0.5 each.</p><p>The candidate with the highest Score_pred is selected as the best match.</p><p>One of the greatest strengths of this weighted system is its flexibility. You could easily incorporate another metric, like the difference between vehicle speed and the road’s speed limit, and adjust the weights to 0.33 for each. If you believe distance is more critical than bearing for your use case, you could change their respective weights to 0.7 and 0.3.
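</p><p>As a sketch, the scoring formula above translates to Python like this (the function and parameter names are mine; bearings are in degrees and the perpendicular distance is in meters):</p>

```python
import math

def weighted_score(d_diff, b_noisy, b_edge, w_dist=0.5, w_bearing=0.5):
    """Score a candidate edge from its distance and bearing agreement."""
    # Smallest angle between the two bearings, handling the 360-degree
    # wrap-around (the difference between 359 and 1 degrees is 2, not 358).
    b_diff = abs(b_noisy - b_edge) % 360
    b_diff = min(b_diff, 360 - b_diff)
    distance_score = 1 / (d_diff + 1)                         # 1 at zero distance, decays with distance
    bearing_score = (math.cos(math.radians(b_diff)) + 1) / 2  # 1 when aligned, 0 when opposite
    return w_dist * distance_score + w_bearing * bearing_score
```

<p>A perfect candidate (zero distance, fully aligned bearing) scores exactly 1.0.</p><p>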
This adaptability allows the algorithm to be fine-tuned for specific datasets and requirements.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*o9EGbH0Y44uQGSNY" /></figure><pre>Accuracy: 0.658677<br>Distance: 13.40313<br>Levenshtein: 0.68891</pre><h3>Topological Matching</h3><p>Our Weighted Matching algorithm is a significant step up from the baseline, but it has a critical limitation: it treats every GPS point in isolation. Topological Matching remedies this by introducing the concept of memory. It leverages the fundamental structure of the road network to understand that a vehicle’s next position is highly dependent on its current one. Simply put, a car on edge A is far more likely to next be on an adjacent edge B than on a disconnected edge Z across town.</p><p>To implement it, we first maintain the state of each route by remembering the most recently matched road segment. For each new GPS point, we calculate the Weighted Score for all candidate edges as before, but then we add a topology_bias to reward candidates that are topologically sensible.</p><p>The rules for the bias are as follows:</p><ul><li>If a candidate edge is the same as the previous edge, it receives a high bias: + 0.2</li><li>If a candidate edge is directly connected to the previous edge, it receives a medium bias: + 0.1</li><li>If a candidate is disconnected from the previous edge, it receives no bias: + 0.0</li></ul><p>The final score is then calculated as:</p><p>Final Score = Weighted Score + topology_bias</p><p>The candidate with the highest final score is chosen as the best match.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*3ab7m4SYEPBt_t6V" /></figure><pre>Accuracy: 0.57710<br>Distance: 14.35921<br>Levenshtein: 0.71596</pre><h4>A Note on Design: Bias vs. Factor</h4><p>I initially experimented with a multiplicative topology_factor instead of an additive bias (e.g., multiplying the score by 1.5, 1.2, or 1.0, respectively).
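</p><p>For illustration, here is a sketch of both designs: the additive bias described above, and the multiplicative factor variant. The connected predicate stands in for an adjacency check against the igraph topology, and the handling of a missing previous edge is my assumption:</p>

```python
def topology_bias(candidate, prev_edge, connected):
    """Additive nudge: +0.2 same edge, +0.1 adjacent, +0.0 otherwise."""
    if prev_edge is None:
        return 0.0  # no history yet for this route (my assumption)
    if candidate == prev_edge:
        return 0.2
    if connected(prev_edge, candidate):
        return 0.1
    return 0.0

def topology_factor(candidate, prev_edge, connected):
    """Multiplicative variant: x1.5 same edge, x1.2 adjacent, x1.0 otherwise."""
    if prev_edge is None:
        return 1.0
    if candidate == prev_edge:
        return 1.5
    if connected(prev_edge, candidate):
        return 1.2
    return 1.0
```

<p>The final score is then weighted_score + topology_bias(...) in the additive design, versus weighted_score * topology_factor(...) in the multiplicative one.</p><p>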
However, I found the multiplicative approach to be too aggressive. A multiplicative factor can disproportionately amplify already high scores, potentially locking the algorithm onto a nearby highway even if a topologically correct local road is a better fit. An additive bias provides a more gentle and stable “nudge” toward the correct path, leading to more robust results.</p><h3>Results</h3><pre>+---------------+----------------+---------------+---------------+<br>|   Algorithm   |   Accuracy     |    Distance   |   Levenshtein |<br>+---------------+----------------+---------------+---------------+<br>| Baseline      |     0.47599    |   16.01573    |     0.50584   |<br>| Weighted      |     0.65867    |   13.40313    |     0.68891   |<br>| Topological   |     0.57710    |   14.35921    |     0.71596   |<br>+---------------+----------------+---------------+---------------+</pre><h3>Summary</h3><p>In this series, we embarked on a comprehensive journey to build an effective map-matching solution from the ground up. We moved from theory and synthetic data generation to implementing and evaluating a progression of algorithms. This process revealed a critical insight: the “best” approach is not one-size-fits-all. The optimal strategy depends on the context, particularly the quality of the GPS signal, and requires balancing the trade-offs between different evaluation metrics like point-level accuracy and overall route integrity.</p><p>While this series provides a thorough foundation, the world of location intelligence is vast. For instance, performance can be enhanced by incorporating additional sensor data, such as signals from nearby WiFi networks or readings from a phone’s accelerometer and magnetometer.
You can also implement even more advanced techniques, such as Incremental Look-Ahead algorithms or Sliding-Window Hidden Markov Models (HMMs).</p><p>But for now, this article concludes the map matching series.</p><h3>Appendix: Results Analysis &amp; Bonus Algorithm</h3><p>We can see that the collected results match our predictions regarding the success of each algorithm, with one key exception: the performance of the Weighted algorithm compared to the Topological algorithm in certain metrics. This outcome could be explained by several reasons: a bug in the implementation, a flaw in the algorithm’s logic, or some special cases where the topological approach fails.</p><p>Since I’m the author of this article, obviously there are no bugs in my code (although reality is often disappointing), and the theory says that providing more structural information should lead to better results. So we’re left to explore the specific cases where this algorithm might perform worse than the simpler Weighted algorithm.</p><p>Let’s re-think the implementation. There are two main differences between our “advanced” algorithms and the baseline: candidate selection and scoring.</p><p>Candidate selection is based on distance, but it uses a dynamic parameter: each GPS sample’s accuracy defines the search radius.</p><p>Scoring first combines the bearing and distance scores (used in both Weighted and Topological) and then, for the Topological algorithm, adds the topology_bias.
We can work under the assumption that this bias, which rewards physically sensible paths, shouldn’t “hurt” our score.</p><p>Let’s take another look at the metrics, broken down by accuracy:</p><h4>High Accuracy (3m — 15m error)</h4><ul><li>Weighted Matching: Distance: 7.68, Accuracy: 0.7295</li><li>Incremental Topological: Distance: 8.96, Accuracy: 0.6472</li></ul><h4>Medium Accuracy (15m — 30m error)</h4><ul><li>Weighted Matching: Distance: 21.05, Accuracy: 0.5291</li><li>Incremental Topological: Distance: 23.88, Accuracy: 0.4291</li></ul><h4>Low Accuracy (30m — 60m error)</h4><ul><li>Weighted Matching: Distance: 48.69, Accuracy: 0.3327</li><li>Incremental Topological: Distance: 40.07, Accuracy: 0.3430</li></ul><h4>Very Low Accuracy (60m — 90m error)</h4><ul><li>Weighted Matching: Distance: 85.87, Accuracy: 0.1969</li><li>Incremental Topological: Distance: 55.03, Accuracy: 0.2767</li></ul><p>You’ll notice a clear trend: when accuracy is high, Weighted Matching outperforms the Topological algorithm. When accuracy is low, the reverse is true. Let’s try to explain this:</p><p>When accuracy is high → The candidate set is small, and the distance differences between them are minimal. This gives the bearing difference a higher impact. In this delicate balance, adding the topology_bias can be too influential, causing the algorithm to repeat the same edge when it should be making a turn.</p><p>When accuracy is low → The candidate set is large and scattered. The distance differences are significant. Here, the topology_bias is extremely useful, acting as a powerful filter to eliminate candidates that are not connected to our current route.</p><p>In other words, the topology_bias helps filter down a large, noisy set of candidates to only those that follow the route’s logic.</p><p>Therefore, I’ve implemented one more algorithm: HybridMatching.
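</p><p>A sketch of that hybrid dispatch is below. The 30 m threshold is my assumption, read off the accuracy breakdown above (Weighted wins below roughly 30 m of reported accuracy, Topological above it), and the two matcher objects are assumed to expose a common match method:</p>

```python
class HybridMatcher:
    """Dispatch each sample to Weighted or Topological matching by its accuracy."""

    def __init__(self, weighted_matcher, topological_matcher, threshold_m=30.0):
        self.weighted = weighted_matcher        # better on high-accuracy fixes
        self.topological = topological_matcher  # better on noisy fixes
        self.threshold_m = threshold_m          # assumed cut-off, see the breakdown above

    def match(self, sample):
        if sample["accuracy"] <= self.threshold_m:
            return self.weighted.match(sample)
        return self.topological.match(sample)
```

<p>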
This approach simply selects which matching logic to use based on the provided accuracy for each sample.</p><figure><img alt="" src="https://cdn-images-1.medium.com/proxy/0*8ZwM10L0H2FFH6qV" /></figure><pre>Accuracy: 0.65990<br>Distance: 12.75588<br>Levenshtein: 0.69737</pre><p>We can see that the hybrid approach outperforms all the other algorithms on point-level accuracy and distance error. Interestingly, the Levenshtein score dropped slightly. This is because we deliberately allowed non-topological logic back into our algorithm for high-accuracy points, trading a small amount of perfect route structure for better point-by-point precision.</p><h3>Resources</h3><h4>Code</h4><p><a href="https://github.com/ornachmias/map_matching">https://github.com/ornachmias/map_matching</a></p><h4>References</h4><p>[1] Quddus, Mohammed A., Washington Y. Ochieng, and Robert B. Noland. “Current map-matching algorithms for transport applications: State-of-the art and future research directions.” <em>Transportation research part c: Emerging technologies</em> 15.5 (2007): 312–328</p><p>[2] <a href="https://en.wikipedia.org/wiki/Levenshtein_distance">https://en.wikipedia.org/wiki/Levenshtein_distance</a></p><p>[3] Leutenegger, Scott T., Mario A. Lopez, and Jeffrey Edgington. “STR: A simple and efficient algorithm for R-tree packing.” <em>Proceedings 13th international conference on data engineering</em>. IEEE, 1997.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=4fe8413220af" width="1" height="1" alt=""><hr><p><a href="https://medium.com/gett-engineering/map-matching-part-3-matching-algorithms-4fe8413220af">Map Matching Part 3 — Matching Algorithms</a> was originally published in <a href="https://medium.com/gett-engineering">Gett Tech</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[What is Work-Life Balance and How Does It Depend on Your Life Situation?]]></title>
            <link>https://medium.com/gett-engineering/what-is-work-life-balance-and-how-does-it-depend-on-your-life-situation-6aaffe146ee6?source=rss----a4a2445acb3d---4</link>
            <guid isPermaLink="false">https://medium.com/p/6aaffe146ee6</guid>
            <category><![CDATA[work-life-balance]]></category>
            <category><![CDATA[management]]></category>
            <category><![CDATA[gett]]></category>
            <dc:creator><![CDATA[Vladimir Ressin]]></dc:creator>
            <pubDate>Tue, 12 Aug 2025 09:37:29 GMT</pubDate>
            <atom:updated>2025-08-12T09:37:29.265Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*zLzltwhAMzaGj52yH-Ke9Q.png" /></figure><p>The topic of balancing work and personal life has become especially relevant in recent years. Today, <em>work-life balance</em> isn’t just a trendy phrase — it’s a real factor that impacts both your quality of life and your effectiveness at work.</p><p>But what does this concept actually mean, and why is it perceived differently depending on where you are in life?</p><blockquote><strong><em>Balance isn’t about “working less”</em></strong></blockquote><p>Many people still think that work-life balance just means spending fewer hours in front of a screen. But it’s not really about the number of hours — it’s about flexibility and mutual understanding.</p><p>At its core, it’s about respect: for your own time, for other people’s boundaries, and for the stage of life you’re currently in.</p><blockquote><strong><em>How does your life situation affect it?</em></strong></blockquote><p>When you’re single and free of major commitments, you have maximum flexibility and mobility. You can shift your schedule around, take initiative, and lean into challenges — and you enjoy it.</p><p>When you’re in a relationship, your priorities start to merge. It’s no longer just about getting tasks done — it’s about making time for shared evenings, conversations, and being truly present in your partner’s life.</p><p>When you become a parent, balance takes on a whole new meaning. It’s not just about being physically present, but emotionally present too. Your work schedule needs to reflect that reality — not fight against it.</p><p>And this is where the company you work for makes a real difference.</p><blockquote><strong><em>What company support for work-life balance really looks like</em></strong></blockquote><p>The difference between companies that talk about work-life balance and those that actually <strong>enable</strong> it is night and day. 
Real support goes beyond slogans — it shows up in how people are treated, day to day.</p><p>Here’s what that can look like:</p><ul><li><strong>Flexible hours and asynchronous work</strong><br> → You’re trusted to manage your own time — whether that means starting late after a rough morning or taking a break in the middle of the day to recharge.</li><li><strong>No-meeting blocks and focus hours</strong><br> → Time is protected for actual work, not just status updates.</li><li><strong>Parental support that reflects real life</strong><br> → Parental leave that’s not just offered, but encouraged. Understanding when a sick kid derails your day. Flexibility for school pickups without guilt.</li><li><strong>Time off that’s actually respected</strong><br> → PTO isn’t performative. You log off, and no one expects you to “just check Slack real quick.”</li><li><strong>Leaders who model balance</strong><br> → They don’t just talk about it — they <em>live</em> it. They leave on time, take vacations, and create space for others to do the same.</li></ul><p><em>At the heart of it all is </em><strong><em>trust</em></strong><em>:<br> Trust that you’ll deliver — and trust that you’re a whole person, not just a worker.</em></p><blockquote><strong><em>Balance is a partnership</em></strong></blockquote><p>True work-life balance isn’t about doing the bare minimum. It’s a two-way street: the company takes your life circumstances into account, and in return, you’re not afraid to take responsibility.</p><p>You’re not counting down the minutes to the end of the day, because you know no one is taking advantage of your time.</p><blockquote><strong><em>How to know if you’re actually in balance</em></strong></blockquote><p>Work-life balance looks different for everyone — but there are some clear signs when you’ve found a rhythm that works <em>for you</em>. 
And just as importantly, there are red flags that something might be off.</p><p>✅ <strong>Signs you’re in balance:</strong></p><ul><li>You can unplug from work without guilt</li><li>You still have energy at the end of the day for things that matter to you</li><li>You don’t feel anxious when you’re not “available”</li><li>You have time for rest, connection, and personal growth</li><li>You feel trusted — not micromanaged or monitored</li></ul><p><strong>🚩 Red flags to watch for:</strong></p><ul><li>You feel constantly behind or overwhelmed, even after long hours</li><li>You check emails or Slack late at night, not out of urgency but out of habit or fear</li><li>You feel guilty for taking breaks or using vacation time</li><li>You have no buffer between work and personal life — they blend into one</li><li>You tell yourself, “I’ll rest after this project” — but that moment never comes</li></ul><p><em>Balance isn’t about perfection — it’s about sustainability.<br> The question isn’t “Am I doing everything?”<br> It’s: “Can I keep living like this and still feel like myself?”</em></p><blockquote><strong><em>In conclusion</em></strong></blockquote><p>Work-life balance isn’t a fixed formula. It shifts as life changes, and it’s shaped by many factors. But at the heart of it is respect and trust.</p><p>When a company intentionally creates an environment where you can bring your full self — with your family, your plans, your worries — you become genuinely engaged and motivated. 
Because you feel like more than just a resource — you feel like a person.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=6aaffe146ee6" width="1" height="1" alt=""><hr><p><a href="https://medium.com/gett-engineering/what-is-work-life-balance-and-how-does-it-depend-on-your-life-situation-6aaffe146ee6">What is Work-Life Balance and How Does It Depend on Your Life Situation?</a> was originally published in <a href="https://medium.com/gett-engineering">Gett Tech</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Map Matching Part 2 — Creating Synthetic Dataset]]></title>
            <link>https://medium.com/gett-engineering/map-matching-part-2-creating-synthetic-dataset-cd232b3d4d92?source=rss----a4a2445acb3d---4</link>
            <guid isPermaLink="false">https://medium.com/p/cd232b3d4d92</guid>
            <category><![CDATA[gett]]></category>
            <category><![CDATA[synthetic-data]]></category>
            <category><![CDATA[dataset]]></category>
            <category><![CDATA[data-science]]></category>
            <category><![CDATA[geo]]></category>
            <dc:creator><![CDATA[Or Nachmias]]></dc:creator>
            <pubDate>Thu, 10 Jul 2025 10:14:59 GMT</pubDate>
            <atom:updated>2025-07-10T10:14:59.302Z</atom:updated>
            <content:encoded><![CDATA[<h3>Map Matching Part 2 — Creating Synthetic Dataset</h3><p>Mirror: <a href="https://ornachmias.github.io/2025/07/06/map-matching-2.html">https://ornachmias.github.io/2025/07/06/map-matching-2.html</a></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*LL_KiMZDPPKKY7nk" /></figure><p>This series of articles will look into the subject of map matching, providing foundational knowledge. It examines the core concepts and inherent challenges within this domain, and details a comprehensive process for generating a synthetic dataset. The discussion then extends to exploring various map matching algorithms and a comparative evaluation of their performance, utilizing the dataset constructed.</p><h3>Generating a Dataset</h3><p>Having established what map matching is and why it’s necessary, our next step is to explore the algorithms that perform this task. However, before evaluating these algorithms, we first require a suitable dataset. The following section details the code used to generate both augmented noisy GPS data and its corresponding ‘actual’ non-noisy counterpart.</p><h3>Synthetic Data</h3><p>Following the initial graph construction, the next phase focuses on generating augmented data based on this graph. The dataset file encapsulates all logic for this data generation process. 
While it might be more efficient to work in a vectorized form on the data, I’ve decided to work iteratively for several reasons:</p><ul><li>The code is more readable, which is important since I assume no prior knowledge is required to read this article.</li><li>Intuitively, it simulates the way we expect the GPS to work.</li><li>Performance is not much of a concern at this point.</li><li>If you decide to take it to production and generate points over an entire country, I recommend that you take the time to re-write it in vectorized form to improve performance.</li></ul><h3>Select a Route</h3><p>The first step is to generate a realistic route from the graph. This route serves as the ground truth that our algorithms will try to reconstruct.<br>The process begins with the selection of source and target vertices. These are chosen randomly from the graph, with the limitation that they should be separated by a minimum distance of 100 meters. This constraint is implemented to prevent trivial routes, such as those where the origin and destination are on the same edge. For this distance calculation, here and in the next steps, the coordinates are projected to an appropriate local CRS to ensure geographic accuracy, a methodology detailed in my previous post.<br>Once suitable source and target vertices are identified, the get_shortest_path method, provided by the igraph package, is employed to determine the sequence of edges forming the optimal path between them.<br>Some of you might notice a problem with those sources and targets — they will never be at the middle point of an edge. While data points are generated across all of the route’s edges, this characteristic of start and end points being on nodes is considered acceptable for our purposes.
The rationale is that the point-matching process applies to all generated data points along the route, and the specific location of the route’s two endpoints has a limited impact on the overall augmented dataset.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/787/0*L_EBVaXUopKAd5w7" /></figure><h3>Points’ Timestamps</h3><p>With the route itself established, the arguably trickier task is to determine the timestamps for our simulated GPS sampling points along this path. While many applications are configured to send location data at fixed intervals (say, every X seconds), the reality of server-side data is often less predictable. To create a dataset that reflects this, we need to simulate how far a user would travel in variable time-frames, rather than strictly uniform ones.</p><p>The challenge is that real-world data isn’t always clockwork. An app might be set to send GPS updates every few seconds, but several factors can lead to data being missing, delayed, or ill-synchronized. For example, driving through an area with no cellular reception, like a tunnel, can interrupt transmission. In other instances, a user might switch to another app (hopefully Spotify and not iMessages), and when background data collection isn’t permitted, it’ll temporarily halt updates. On a more technical note, slight drifts between satellite and receiver clocks can also cause minor deviations in the actual sampling intervals [1]. Our dataset needs to reflect these common scenarios.</p><p>To model these variations, I’ve chosen to generate the time intervals between samples using a randomized approach based on a normal distribution. We’re using a mean interval of 4 seconds with a standard deviation (sigma) of 3 seconds. This setup means most time gaps between samples will naturally fall between 1 and 7 seconds.
I’ve also implemented a minimum interval of 1 second to prevent multiple samples from effectively occurring at the same instant.</p><h3>Driving Speed</h3><p>Beyond just the route, another key variable influencing both a driver’s location over time and the perceived accuracy of GPS bearing measurements is the driving speed. Since we’re not working with live measurements, we need to simulate these speeds. The most sensible starting point for this is the posted speed limit for each road segment.<br>We take this information from OpenStreetMap’s maxspeed attribute. (For segments where this data is missing, we already have the fallback strategy discussed in the previous post.) This maxspeed then becomes the basis for our speed simulation. For each edge in our graph, we model the driving speed by drawing a random value from a normal distribution. The mean of this distribution is set to the edge’s maxspeed value, and I’ve chosen a standard deviation equal to half of that mean.</p><p>This touch of randomization is our way of mimicking real-world driving: the flow of traffic, individual driving styles, and other on-the-road variables. As an example, consider a road segment with a maxspeed of 90 km/h. Our model uses a mean of 90 km/h and a sigma of 45 km/h, which can generate realistic speeds. This includes everything from very slow travel (e.g., a traffic jam) up to speeds around the 90 km/h limit. It also allows for drivers reasonably exceeding this limit — for instance, speeds up to around 135 km/h are within one standard deviation. While the model will mostly generate speeds closer to the mean, it doesn’t entirely rule out those more… enthusiastic drivers (like the 232 km/h scenarios that occasionally make headlines [2]), as these would represent rarer values further out in the distribution’s tail.</p><h3>Coordinates</h3><p>Alright, with our route charted and speeds for each segment determined, we’re ready to generate the actual sequence of GPS points along this path.
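</p><p>The interval and speed sampling described above can be sketched as follows (function names are mine; the 1-second interval floor is from the text, while the positive floor on speed is my assumption to avoid zero or negative draws):</p>

```python
import random

def sample_interval(mean_s=4.0, sigma_s=3.0, min_s=1.0):
    """Time gap to the next GPS sample: N(4, 3) seconds, floored at 1 s."""
    return max(min_s, random.gauss(mean_s, sigma_s))

def sample_edge_speed(maxspeed_kmh, min_kmh=1.0):
    """Driving speed on an edge: N(maxspeed, maxspeed / 2) km/h.
    The positive floor is my assumption, not part of the article's model."""
    return max(min_kmh, random.gauss(maxspeed_kmh, maxspeed_kmh / 2))
```

<p>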
We begin our journey with a relative timestamp of zero. This means we’re not tied to a specific real-world start time; instead, the absolute “when” of the route can be decided by whatever application eventually uses this data. It keeps things nice and flexible.<br>To place each data point, we effectively “drive” along our route, edge by edge. For each segment, we interpolate points based on two main ingredients: the simulated speed for that particular edge and our series of (deliberately varied) sampling timestamps. The geometric magic for this is handled by the line_interpolate_point function from the shapely package. Essentially, you provide this function with the start and end coordinates of an edge, the speed assigned to it, and a specific time offset (from our sampling timestamps, representing time elapsed along that edge), and it calculates the precise coordinates where that sample point would fall.</p><h3>Accuracy</h3><p>Now, let’s talk about GPS accuracy. For each simulated location point, I also need an associated accuracy value. This figure usually indicates a radius in meters around the reported coordinates, representing a 68% confidence level [3]. Essentially, how much wiggle room there is in that fix.<br>As you might guess, a whole host of factors play into how good (or bad) this accuracy figure can be. We’re talking satellite positions, how tall the buildings are around the device, signal strength, atmospheric conditions, and more [4].
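</p><p>The per-edge interpolation just described can be sketched like this (a minimal version assuming a metric CRS, speeds in m/s, and shapely 2.x, where line_interpolate_point is a top-level function):</p>

```python
from shapely import line_interpolate_point
from shapely.geometry import LineString

def point_along_edge(edge: LineString, speed_ms: float, elapsed_s: float):
    """Place a sample on an edge: elapsed time -> driven distance -> coordinates."""
    distance_m = speed_ms * elapsed_s       # how far we've driven along this edge
    return line_interpolate_point(edge, distance_m)
```

<p>For a straight 100 m edge driven at 10 m/s, 3 seconds of elapsed time lands the sample 30 m along the edge.</p><p>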
While diving deep into all those variables is a fascinating journey (though perhaps a longer one than you’d initially sign up for), our aim here is to generate values that feel realistic for our dataset.<br>To achieve this, I’ve set up a system based on four predefined accuracy ranges, reflecting different real-world conditions:</p><ul><li>3–15 meters: Pretty good signal, representing optimal or near-optimal conditions.</li><li>15–30 meters: Decent, but perhaps some minor interference — let’s call these sub-optimal conditions.</li><li>30–60 meters: Now we’re getting into noticeably noisy values.</li><li>60–90 meters: This represents more significant signal degradation, the kind of interference I’ve occasionally spotted in production databases.</li></ul><p>To decide which of these ranges a particular sample’s accuracy will come from, I use a weighted selection approach with the following probabilities: [0.7, 0.25, 0.04, 0.01]. So, most of the time (70%) we’ll be in optimal conditions, but we’ll still get a sprinkle of the less ideal scenarios. Once a range is selected based on these weights, an actual accuracy value is then picked uniformly at random from within that specific range.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/695/0*7YPQZYN7OeMUXmkF" /></figure><h3>Coordinates Noise</h3><p>
With our sample points and their GPS accuracy established, it’s time to introduce some realistic positional noise. The amount of noise added to each coordinate is directly tied to its pre-assigned accuracy value. We model this uncertainty by assuming the error around the initially calculated location follows a normal distribution. Specifically, for each coordinate, we generate noise from a normal distribution with a mean of zero and a standard deviation that matches the point’s specific accuracy value (in meters). Interestingly, if you looked at all the applied noise collectively, it might resemble a single normal distribution. However, it’s more nuanced than that. When you consider the noise based on our defined accuracy tiers (optimal, sub-optimal, etc.), it becomes clear that the overall effect is a composite of multiple distinct normal distributions. Each tier contributes a distribution with a different spread (its standard deviation), shaping a more varied and ultimately more realistic scatter of GPS points.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/704/0*3NgpZoipRWnSSJbb" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/708/0*FTRlIWyrtSiC4Pem" /></figure><h3>Bearing Noise</h3><p>Now for the final step — the bearing. The bearing is calculated from successive position fixes, so if the device is moving slowly, the distance between these fixes is tiny, making the direction calculation quite sensitive to small GPS position errors [5].
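</p><p>As a brief aside, the tiered accuracy draw and the accuracy-scaled coordinate noise described in the previous two sections might be sketched like this (helper names are mine; coordinates are assumed to be in a metric CRS):</p>

```python
import random

ACCURACY_RANGES = [(3, 15), (15, 30), (30, 60), (60, 90)]  # meters
RANGE_WEIGHTS = [0.7, 0.25, 0.04, 0.01]                    # tier probabilities

def sample_accuracy():
    """Pick a tier by weight, then draw uniformly inside it."""
    low, high = random.choices(ACCURACY_RANGES, weights=RANGE_WEIGHTS, k=1)[0]
    return random.uniform(low, high)

def add_noise(x_m, y_m, accuracy_m):
    """Zero-mean normal noise with sigma equal to the point's accuracy."""
    return x_m + random.gauss(0, accuracy_m), y_m + random.gauss(0, accuracy_m)
```

<p>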
Reflecting this, our model applies a more substantial random noise when the simulated speed drops below 1.5 m/s, drawing from a normal distribution with a mean of 0 and a hefty standard deviation of 45 degrees.<br>Once the speed picks up beyond that 1.5 m/s threshold, GPS bearing typically becomes more stable. The accuracy generally improves as the vehicle moves faster because the distance between position fixes increases, making the direction calculation less susceptible to minor positional noise [6]. Our simulation mirrors this by calculating the noise’s standard deviation as inversely proportional to the speed (using a scaling factor of 30.0 degrees*m/s). However, since no GPS is perfect, even at higher speeds, we ensure there’s always a touch of uncertainty by setting a minimum noise standard deviation of 2.0 degrees. This generated noise is then added to the true bearing, and the result is neatly wrapped to stay within the 0–360 degree range, giving us our final, realistically noisy, bearing.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/704/0*wP3nV0v4tSjj3Gei" /></figure><h3><strong>Summary</strong></h3><p>In this part of our series on map matching, we’ve walked through the essential concepts of generating a realistic synthetic dataset.
This foundational understanding, coupled with the carefully prepared dataset, paves the way for the next part of our series, where we will delve into various map matching algorithms and assess their performance.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/527/0*mPx3SMMsUrxxTCn-" /></figure><h3>Resources</h3><h3>Code</h3><p><a href="https://github.com/ornachmias/map_matching">https://github.com/ornachmias/map_matching</a></p><h3>References</h3><p>[1] <a href="https://www.researchgate.net/post/GPS_data_recording_at_equal_time_intervals_impossible">https://www.researchgate.net/post/GPS_data_recording_at_equal_time_intervals_impossible</a></p><p>[2] <a href="https://www.ynet.co.il/news/article/SkJ4tTdaw">https://www.ynet.co.il/news/article/SkJ4tTdaw</a></p><p>[3] <a href="https://developer.android.com/reference/android/location/Location#getAccuracy()">https://developer.android.com/reference/android/location/Location#getAccuracy()</a></p><p>[4] <a href="https://web.stanford.edu/group/scpnt/pnt/PNT15/2015_Presentation_Files/I15-vanDiggelen-GPS_MOOC-Smartphones.pdf">https://web.stanford.edu/group/scpnt/pnt/PNT15/2015_Presentation_Files/I15-vanDiggelen-GPS_MOOC-Smartphones.pdf</a></p><p>[5] <a href="https://www.ardusimple.com/how-gps-can-help-you-measure-the-real-heading-of-your-vehicle/">https://www.ardusimple.com/how-gps-can-help-you-measure-the-real-heading-of-your-vehicle/</a></p><p>[6] <a href="https://medium.com/anello-photonics/mastering-true-north-5-ways-to-determine-your-absolute-heading-d63fc3543c0">https://medium.com/anello-photonics/mastering-true-north-5-ways-to-determine-your-absolute-heading-d63fc3543c0</a></p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=cd232b3d4d92" width="1" height="1" alt=""><hr><p><a href="https://medium.com/gett-engineering/map-matching-part-2-creating-synthetic-dataset-cd232b3d4d92">Map Matching Part 2 — Creating Synthetic Dataset</a> was originally published in <a 
href="https://medium.com/gett-engineering">Gett Tech</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Disrupting the Disruptions: How We Engineered Through GPS Chaos]]></title>
            <link>https://medium.com/gett-engineering/disrupting-the-disruptions-how-we-engineered-through-gps-chaos-d45986a4de29?source=rss----a4a2445acb3d---4</link>
            <guid isPermaLink="false">https://medium.com/p/d45986a4de29</guid>
            <category><![CDATA[software-development]]></category>
            <category><![CDATA[navigation]]></category>
            <category><![CDATA[mobile]]></category>
            <category><![CDATA[gett]]></category>
            <category><![CDATA[gps]]></category>
            <dc:creator><![CDATA[Gil Goldenberg]]></dc:creator>
            <pubDate>Mon, 30 Jun 2025 17:54:54 GMT</pubDate>
            <atom:updated>2025-06-30T17:54:54.426Z</atom:updated>
<content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*5mDDKudNYd3EdhILKpY3Yg.png" /></figure><p>GPS disruptions occur when the satellite signals used for location tracking and navigation are interrupted, manipulated, or blocked. This can happen due to jamming, where signals are intentionally overwhelmed; spoofing, where fake signals mislead receivers; or physical obstructions, such as buildings or terrain, that block the signal. These disruptions can severely impact systems relying on GPS, including ride-hailing apps, logistics, and navigation tools.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*O7ovif4S8AyhpybqC-RhHQ.png" /></figure><p>In this article I’ll share how the GPS disruptions in Israel following October 7th, 2023 presented us at Gett with a business challenge, one we ended up solving technically.</p><p>Gett is an on-demand ride-hailing app serving millions of users in Israel and the UK, and it is heavily dependent on GPS signals, for riders and especially for drivers.</p><p>Riders get a smoother, more accurate experience when ordering a taxi, as their current location is suggested to them when choosing where to order the taxi from or to.</p><p>Drivers, on the other hand, are strictly dependent on GPS signals.
Retrieving an inaccurate GPS signal essentially results in a bad experience for both the rider and the driver:</p><p>In the good case, drivers might be offered irrelevant rides, causing longer ETAs, higher cancellation rates and a sharp drop in satisfaction rates - a bad user experience for everyone.</p><p>In the bad case, the inability to get GPS signals can shut the business down: offers, which are distributed to drivers according to their location, simply never reach the disrupted drivers - so instead of providing bad service, they end up providing no service at all!</p><p>Time went by, and we realized this issue was not going anywhere. In parallel, our customer support steadily increased the pressure to find a solution that would provide relief to the suffering drivers and allow them to work (thereby reducing the load on the customer care department). More importantly, it would stop the company’s bleeding: a driver who cannot work is a frustrated driver, indeed, but also a driver who does not contribute to the company’s ecosystem.<br> <br>This business stress and these expectations translated into hours of discussions, brainstorming, numerous POCs and what not.
Eventually we came up with a pretty straightforward solution: one that would allow drivers to work, while not compromising the quality of service Gett aspires to provide to both drivers and riders.</p><p><strong>The requirements</strong></p><ul><li>Identify when a driver is in disrupted mode</li><li>Let the driver know this is currently the case</li><li>Allow drivers to temporarily set their location (ignoring the GPS signals while in this state)</li><li>Relax some strict location-based rules while the driver is in an active ride (such as allowing them to mark that they arrived at the pickup only when they are actually close to the pickup coordinate)</li></ul><p>To achieve this, we drew the following flow chart representing what the app should do with each location update it receives from the OS.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*9EqvEdsZIW6hCZCNfY5bdA.png" /></figure><p>The main points to pay attention to are:</p><ol><li>Each location is persistently saved to a local cache as the last valid location, but only if it is a valid location. This cached location is then used as the driver’s location, ignoring invalid reports.</li><li>A “grace” mechanism avoids UI flickering and annoying the driver with frequent disruption-mode changes.</li><li>A clear distinction is made between free mode and in-ride mode: these are totally different use cases requiring different handling. For example, for a driver in ride mode, we want to skip some in-ride validations to allow a smoother workflow.</li><li>The feature is controlled remotely using a feature flag. Different parts of Israel experienced different disruption characteristics, making it important to have an easy way to control which driver gets the feature, when, and to what degree of customizability: some will just use the last valid location and see an indication of disruption mode on the main screen.
Others will also have the ability to manually set their location temporarily while disruption mode is active.</li></ol><p>The first obstacle we hit was the ability to test this. <br>The mechanism is indeed straightforward and quite simple to implement - however, how do you verify it functions as expected? We don’t control the GPS disruptions (“This is military level sh*t…”).</p><p>We understood we could only play with what we control - and we control not only the mechanism we defined, but also our definition of a valid location.</p><p>We noticed that every time GPS disruptions were active, we found ourselves in one of two places: Cairo or Beirut Airport.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*Yj3IxfOoizuSxI030YoTIQ.png" /></figure><p>Gett does not operate in those places, so we could create two simple polygons around the spoofed locations and define a valid point as any point NOT falling within these polygons.</p><p>Nice. Now we can define a polygon wherever we want and test the functionality whenever we need to (and not only when disruptions are on). To make this easy, we added another feature setting called `<em>gps_disruption_polygons</em>`, an array containing the invalid polygons; every GPS signal we receive is validated against these polygons.</p><p>This worked great, but soon enough we realized GPS disruptions come in many shapes and forms:</p><p>For instance, we noticed that sometimes the device’s location services reported new locations that did not fall into our predefined “invalid polygons”, causing a disrupted location to be considered valid.
This was easily addressed using the `<em>gps_disruption_polygons</em>` array described above, by dynamically defining new invalid polygons as we encountered new disruption locations.</p><p>Another example: we realized a disruption can sometimes surface as an error from the OS location services - compromising our solution entirely, since it was based on checking whether a reported location falls in a polygon, while in this scenario there was no location reported AT ALL.</p><p>Luckily we had analytics events covering the entire flow, and we learned this is quite an edge case, happening only for a few seconds each time. Our mechanism could handle the vast majority of these cases since it prioritized the persisted last valid location.</p><h4>Post-production, feedback and learnings</h4><p>During the following weeks, the decision to keep a dynamic list of invalid polygons proved itself. More disrupted locations joined the party, and riders and drivers found themselves not only in Egypt or Lebanon, but also in Jordan and even in the middle of the sea.</p><p>Our main concern with this entire approach was (and to some degree still is) fraudsters trying to manipulate the mechanism to position themselves in places they are not actually at, in order to catch the “hot” rides (for example, to or from the airport).</p><p>For this, point 4 above proved key as well, allowing us to disable the ability to manually set the location for identified fraudsters or drivers who tried to abuse it.</p><p>This simple yet sophisticated solution proved itself quite quickly.
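For the curious, the heart of the validity check against the `gps_disruption_polygons` list is just a point-in-polygon test. A minimal ray-casting sketch (illustrative only, not Gett’s production code; the polygon coordinates are hypothetical):

```python
def point_in_polygon(lat, lon, polygon):
    """Ray-casting test: is (lat, lon) inside polygon (a list of (lat, lon) vertices)?"""
    inside = False
    j = len(polygon) - 1
    for i in range(len(polygon)):
        lat_i, lon_i = polygon[i]
        lat_j, lon_j = polygon[j]
        # Does a horizontal ray from the point cross the edge (j -> i)?
        if (lat_i > lat) != (lat_j > lat) and \
           lon < (lon_j - lon_i) * (lat - lat_i) / (lat_j - lat_i) + lon_i:
            inside = not inside
        j = i
    return inside

def is_valid_location(lat, lon, gps_disruption_polygons):
    """A location is valid only if it falls inside none of the invalid polygons."""
    return not any(point_in_polygon(lat, lon, p) for p in gps_disruption_polygons)
```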
We received great feedback from drivers, thanking us for allowing them to work again after a long time in which they simply couldn’t.</p><p>Everybody was happy with the relatively quick and robust solution we created, and recently this feature won a very pleasing award at the internal “Feature Of The Year Awards” ceremony our department has held at the end of each of the last 3 years - the <strong>Highest business impact with least effort</strong> award.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=d45986a4de29" width="1" height="1" alt=""><hr><p><a href="https://medium.com/gett-engineering/disrupting-the-disruptions-how-we-engineered-through-gps-chaos-d45986a4de29">Disrupting the Disruptions: How We Engineered Through GPS Chaos</a> was originally published in <a href="https://medium.com/gett-engineering">Gett Tech</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Archiving Automatisation on Tableau Server: Tableau + Python]]></title>
            <link>https://medium.com/gett-engineering/archiving-automatisation-on-tableau-server-tableau-python-7b21047cd2b3?source=rss----a4a2445acb3d---4</link>
            <guid isPermaLink="false">https://medium.com/p/7b21047cd2b3</guid>
            <category><![CDATA[gett]]></category>
            <category><![CDATA[self-service]]></category>
            <category><![CDATA[data-science]]></category>
            <category><![CDATA[tableau]]></category>
            <category><![CDATA[automation]]></category>
            <dc:creator><![CDATA[Maria Konstantinova]]></dc:creator>
            <pubDate>Mon, 30 Jun 2025 06:01:57 GMT</pubDate>
            <atom:updated>2025-06-30T06:01:56.631Z</atom:updated>
<content:encoded><![CDATA[<p>Changes occur rapidly in a fast-paced business environment, impacting the content that business users expect to see in reports. Therefore, ensuring a clear and up-to-date reporting system is crucial for saving user time and delivering impactful insights and analytics. An automated content archiving process is essential for this purpose. In this article, I will outline the steps we took to implement and automate this process on Tableau Server at Gett.</p><p>Developing a scalable and automated archiving process involved the following phases:</p><ul><li>Content lifespan study &amp; rules definition</li><li>Monitoring tool &amp; organising archiving on the Tableau server</li><li>Automation via Tableau REST API and notifications to content owners</li></ul><p><strong>Phase 1. A study of content lifespan on Tableau Server</strong></p><p>The development of the server clean-up process started with simple research into the content on our Tableau Server. We took data from the Tableau Server repository (it holds dashboard usage information for the last 90 days) and used two metrics: <strong>Days of Use</strong> (total number of days when a dashboard was opened at least once) and <strong>Dashboard Age</strong> (number of days since the dashboard was created).</p><p>We defined a group of “young dashboards” (dashboards with a Dashboard Age below the 1st quartile) and excluded them from the archiving process, because in reality dashboard development is an iterative process that starts with the publication of a simple MVP version and, after some time, grows into a well-developed one. Excluding this group gives these dashboards a chance ‘to be adopted’. We then defined the group of ‘candidates for archiving’: dashboards with usage below the 10th percentile of Days of Use.
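These rules translate to a few lines of pandas (a sketch with hypothetical column names and toy data, not the actual repository query):

```python
import pandas as pd

# Hypothetical extract from the Tableau Server repository
df = pd.DataFrame({
    "dashboard":     ["a", "b", "c", "d", "e", "f", "g", "h"],
    "days_of_use":   [0, 1, 2, 40, 55, 3, 80, 90],
    "dashboard_age": [10, 400, 300, 500, 250, 600, 700, 20],
})

# Exclude "young" dashboards: Dashboard Age below the 1st quartile
young_cutoff = df["dashboard_age"].quantile(0.25)
mature = df[df["dashboard_age"] >= young_cutoff]

# Candidates for archiving: usage below the 10th percentile of Days of Use
usage_cutoff = mature["days_of_use"].quantile(0.10)
candidates = mature[mature["days_of_use"] < usage_cutoff]
```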
We don’t use static thresholds (like Days of Use = 0); percentiles keep the process scalable, because after living with regular clean-ups the threshold might change.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*i0-C8Zxndiq-NNoD" /></figure><p>But it is not so simple to just take dashboards with low usage and delete them from the server. Of course, we have some exclusions. We excluded dashboards with Tableau subscriptions (some of our business users prefer to get data via email rather than open a dashboard), dashboards with automated crosstab/PDF downloads (some Gett employees prefer to play with the data in Excel, so we automated delivery of these files for them), and Quarterly &amp; Yearly reports. Educational examples for newcomers were also excluded.</p><p><strong>Phase 2. Monitoring tool &amp; organising archiving on the Tableau server</strong></p><p>2.1. Our Tableau Server archive structure &amp; storage concept</p><p>In Gett we follow Tableau best practices and set up access permissions at the project and group level. All content is organised into top-level projects by business domain. Each domain has two top-level projects: one available to everyone in the company, with dashboards that do not contain sensitive data, and one with restricted access.
Each top-level project has the same structure and contains:</p><ul><li>Country folders with dashboards in production (business users can view, developers can publish &amp; view)</li><li>Domain data source folder with published data sources (business users can view and interact with them, developers can publish &amp; view)</li><li>Domain sandbox (no access for business users, developers can publish &amp; view)</li><li>Domain Archive (no access for business users, developers can view without access to publishing and editing)</li></ul><p>Note: Each second-level project can have an unlimited number of subprojects.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*y7wVBeaRKWTf8kbY" /></figure><p>2.2. Monitoring tool — Tableau dashboard for everyone</p><p>The second phase involved making the process transparent and accessible to everyone in the company. It is easy to do this by creating a Tableau Server content management dashboard available to every Tableau user in Gett. The dashboard is built on Tableau Server repository data and consists of 3 pages:</p><ul><li>Overview page</li><li>Dashboards to archive</li><li>Archived dashboards</li></ul><p>The Overview page provides information about the number of dashboards on Tableau Server, categorised by type and developer. This page is a useful tool for analytical managers who need to know the situation with Tableau dashboards in their team (e.g. a lot of dashboards in the Sandbox folder might be a sign that something is going wrong with planning and processes in the team).</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*dejPoicrY0NK8GWR" /></figure><p>The Dashboards to archive page shows the detailed list of dashboards that will be archived this month and explains the archiving criteria.
To make our process more scalable, we added a dashboard age threshold parameter (after living with the process for half a year we are going to review it, and this parameter will help us change the threshold easily in the future).</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*1N6TLxqQylyFqvHw" /></figure><p>From the Archived dashboards page you can get information about the dashboards that were archived, reach the dashboard developer and click on the dashboard link (access is restricted for business users, but the VisTeam receives an automatic access request, so we can explain the process if a user would like to return an archived dashboard to production).</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*6Gtqo3HdjsDFp92UJtqJMQ.gif" /></figure><p>2.3. The process of restoring an archived dashboard</p><p>In Gett, the developer of a dashboard is its content owner. This means the developer is responsible for the data quality and metric logic of the dashboard. Business users should contact the developer to restore an archived dashboard instead of contacting a Tableau Admin. The owner then guides the requester on what to use instead of the archived dashboard, or, if the archived dashboard needs to be returned to production, the owner reviews it and requests restoration from the VisTeam (Tableau admins).</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*YvVJ22UY2CNXcqbv" /></figure><p><strong>Phase 3. Automatisation</strong></p><p>The third step involves automating the process using the Tableau REST API and well-known Python libraries like pandas, os, requests, datetime, xml, smtplib and email. The automation includes six main steps.</p><p>Step 1. Connect to Tableau Server</p><p>First, create a Personal Access Token on Tableau Server (in your personal space).
Use it to connect to Tableau Server via the Tableau REST API, e.g.</p><pre>requests.post(f&#39;https://{server}/api/{api_version}/auth/signin&#39;, headers=headers, json=json_signin)</pre><p>Step 2. Get the list of dashboards that should be archived</p><p>We already have this list on the “Dashboards to Archive” page of the Tableau content management dashboard, so to get it we just download the Excel table from the respective view using the Download View Crosstab Excel method and open it as a pandas DataFrame.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*z7xfKfZRF9BYVeF7" /></figure><p>Tableau hack: this method does not let you select which view on the page to download as a crosstab; it downloads the first view by default. To download a specific view, rename it to “A sheet_name”, as Tableau downloads views in alphabetical order.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/906/0*2jFyl0o33bdV96pD" /></figure><p>However, the workbook data is incomplete. We need the workbook IDs from the API to archive them (note: these IDs are different from the numerical IDs in URLs; they are longer strings like ‘a115ab6-ca9c-492e-a5d5–2ed67da5’). We enrich our workbooks data using the <strong>Query Workbooks for Site</strong> method and left join this information to the data frame.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*1F7WFzrOE0Rb9E35" /></figure><p>Step 3. Get archive project ids</p><p>Now that we have information about our workbooks, we need to determine the projects to which we should archive the dashboards.
To do this we use the analogous Tableau REST API method for projects, <strong>Query Projects for Site</strong>.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*KGYVsJBdYPl6PlZ7" /></figure><p>Then for each project we find the respective archive project (find the domain project id -&gt; find the domain archive project id, using a recursive function).</p><p>Step 4. Unschedule dashboard refreshes</p><p>To optimise Tableau server performance, we should stop refreshing archived dashboards. Interestingly, Tableau extract refreshes are separate entities called ‘refresh tasks’. To stop refreshing an archived dashboard, we must delete its corresponding refresh task. We use the ‘<strong>Delete Extract Refresh Task</strong>’ method. Note that this requires Tableau Server Admin permissions.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*4sawG2b-3aG4IvVc" /></figure><p>Step 5. Archive workbooks (move them to the appropriate archive folder and add the archive date to the name).</p><p>We use the ‘Update Workbook’ method to move workbooks to the domain archive (new project id = domain archive project id). The same method is used to rename the workbook with the date of archiving.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*Sy_MFTHsWFqHWZam" /></figure><p>We deliberately store the archive date in the name rather than relying on the last modified date: if ownership changes (e.g., due to an employee leaving the company), the archive date remains unchanged.</p><p>Step 6. Notify content owners</p><p>The last part of the process is an automatic notification to content owners that some of their dashboards have just been archived.
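The notification step can be sketched with Python’s standard email and smtplib modules (a minimal sketch; the addresses, host and wording are hypothetical placeholders, not our actual script):

```python
import smtplib
from email.message import EmailMessage

def build_notification(owner_email, dashboards):
    """Compose the 'your dashboards were archived' email."""
    msg = EmailMessage()
    msg["Subject"] = "Tableau Server: your dashboards were archived"
    msg["From"] = "tableau-admin@example.com"  # placeholder sender
    msg["To"] = owner_email
    body = "The following dashboards were moved to the Archive project:\n"
    body += "\n".join(f"- {name}" for name in dashboards)
    msg.set_content(body)
    return msg

def send_notification(msg, smtp_host="localhost"):
    # Send via the company SMTP relay (host is a placeholder)
    with smtplib.SMTP(smtp_host) as server:
        server.send_message(msg)
```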
Our developers received emails like this:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*7IXPV1GBCS5dKKc8" /></figure><p>The implementation is simple:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*jHZW1oBFzpevbGD_" /></figure><p><strong>Conclusion</strong></p><p>Regular report archiving is essential for maintaining insightful and up-to-date reporting systems. In Gett we archive content on Tableau Server on a monthly basis, which helps us maintain a clean self-service environment for our business users. Automating archiving with the Tableau REST API and Python saves time for Tableau Admins, and the same approach is used for other automation tasks. Stay tuned! In the next articles I will share other places where we use Tableau REST API automatisation 🙂</p><p><strong>References</strong></p><p>Tableau REST API documentation — <a href="https://help.tableau.com/current/api/rest_api/en-us/REST/rest_api.htm">https://help.tableau.com/current/api/rest_api/en-us/REST/rest_api.htm</a></p><p>Tableau Server Repository documentation — <a href="https://help.tableau.com/current/server-linux/en-us/perf_collect_server_repo.htm">https://help.tableau.com/current/server-linux/en-us/perf_collect_server_repo.htm</a></p><p><a href="https://tableau.github.io/tableau-data-dictionary/2024.2/data_dictionary.htm">https://tableau.github.io/tableau-data-dictionary/2024.2/data_dictionary.htm</a></p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=7b21047cd2b3" width="1" height="1" alt=""><hr><p><a href="https://medium.com/gett-engineering/archiving-automatisation-on-tableau-server-tableau-python-7b21047cd2b3">Archiving Automatisation on Tableau Server: Tableau + Python</a> was originally published in <a href="https://medium.com/gett-engineering">Gett Tech</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Map Matching Part 1 — GPS, Maps and Graphs]]></title>
            <link>https://medium.com/gett-engineering/map-matching-part-1-gps-maps-and-graphs-80a5edb79ae1?source=rss----a4a2445acb3d---4</link>
            <guid isPermaLink="false">https://medium.com/p/80a5edb79ae1</guid>
            <category><![CDATA[navigation]]></category>
            <category><![CDATA[gett]]></category>
            <category><![CDATA[geodata]]></category>
            <category><![CDATA[data-science]]></category>
            <dc:creator><![CDATA[Or Nachmias]]></dc:creator>
            <pubDate>Mon, 23 Jun 2025 12:04:30 GMT</pubDate>
            <atom:updated>2025-06-25T09:49:49.140Z</atom:updated>
<content:encoded><![CDATA[<h3>Map Matching Part 1 — GPS, Maps and Graphs</h3><p>Mirror: <a href="https://ornachmias.github.io/2025/05/26/map-matching-1.html">https://ornachmias.github.io/2025/05/26/map-matching-1.html</a></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*BTMrDnWb7ji-oKO3" /></figure><p>This series of articles looks into the subject of map matching, providing foundational knowledge. It examines the core concepts and inherent challenges within this domain, and details a comprehensive process for generating a synthetic dataset. The series then extends to exploring various map matching algorithms and a comparative evaluation of their performance, utilizing the constructed dataset.</p><h3>GPS Data</h3><p>While we often take for granted the availability of GPS on most of our mobile devices, in the background this is actually quite a complex feat that is somehow available to us, essentially for free.</p><p>Originating from U.S. military research in 1973 and made open to civilians in 1980 [1], GPS, short for Global Positioning System, accurately describes itself: “Global” means it’s available worldwide, “Positioning” means it’s designed to determine locations, and “System” signifies that it requires multiple components.</p><p>Let’s explain it in a hand-wavy way: there are satellites in space whose locations are known, and some ground stations on Earth whose positions are also known. These two communicate to ensure both satellites and ground stations are precisely where we expect them to be. Now for the last part: the users’ devices, which contain a GPS receiver chip.
This receiver is able to obtain signals from the satellites and triangulate its location based on multiple parameters received from them.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/600/0*ZXie-zy8g5HY23aM" /><figcaption><a href="https://spaceplace.nasa.gov/gps/en/">https://spaceplace.nasa.gov/gps/en/</a> [2]</figcaption></figure><p>GPS receivers can provide all kinds of useful information, but for our discussion, only five values matter: longitude, latitude, bearing (or heading), accuracy, and timestamp. Longitude, latitude, and timestamp are pretty self-explanatory. Bearing, or heading, is the angle measured in degrees in a clockwise direction from true north [3]. Usually, bearing indicates an angle to a specific place, while heading is the angle of the device itself. Accuracy is more of a probability field — it’s the radius (in meters) within which there is a 68% chance that the actual position is located [4]. This means if we receive an accuracy value of 3 meters alongside latitude and longitude, there is a 68% chance that the actual location is within 3 meters of the received coordinates.</p><h3>Noisy Measurements</h3><p>GPS measurements contain noise due to multiple factors: atmospheric interference, signals bouncing off objects, the geometric arrangement of visible satellites, satellite clock inaccuracies, and minor errors in orbital data all contribute to measurement noise. Receiver quality and internal processing algorithms further influence the level of noise in the final output. This means that even if we stand still, our GPS measurements may suggest we are moving all the time.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/647/0*8-f8id0LNy8_UAAs" /></figure><p>Latitude and longitude typically have errors in the range of 5–10 meters for standard receivers under open sky conditions; high-accuracy receivers can achieve around 1 meter.
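As an aside, the 68% definition of accuracy from the previous section can be sanity-checked numerically: model the horizontal error as independent Gaussian noise per axis and pick the per-axis sigma so that the 68th-percentile error radius equals the reported accuracy (a back-of-the-envelope sketch, not a full GPS error model):

```python
import numpy as np

def simulate_fix_errors(accuracy_m, n=100_000, seed=0):
    """Sample horizontal position errors whose 68th-percentile radius ~= accuracy_m."""
    # With Gaussian noise on each axis, the error radius is Rayleigh-distributed;
    # its 68th percentile is sigma * sqrt(-2 * ln(1 - 0.68)) ~= 1.51 * sigma.
    sigma = accuracy_m / np.sqrt(-2.0 * np.log(1.0 - 0.68))
    rng = np.random.default_rng(seed)
    dx = rng.normal(0.0, sigma, n)
    dy = rng.normal(0.0, sigma, n)
    return np.hypot(dx, dy)

errors = simulate_fix_errors(accuracy_m=3.0)
frac_within = (errors <= 3.0).mean()  # close to 0.68 by construction
```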
Altitude measurements are generally less precise, with errors potentially reaching 10–20 meters or more. Heading accuracy can vary significantly, ranging from a few degrees to tens of degrees, especially at low speeds. Bearing accuracy depends on the accuracy of the start and end points and the distance between them. Speed accuracy is generally better than positional accuracy, potentially within a few tenths of a meter per second [3].</p><h3>Maps and Graphs</h3><p>Geographic maps visually represent spatial data, displaying features like points, lines, and polygons with associated coordinates and attributes. Think of a map showing buildings (polygons), rivers (lines), and landmarks (points). Graphs, however, model the relationships between these geographic entities. In a graph, nodes represent discrete locations or points of interest, while edges represent the connections or relationships between them. For instance, a road network is perfectly suited for graph representation: intersections become nodes, and the roads connecting them become edges. This graph structure is then overlaid on a geographic map. This powerful combination allows us to apply advanced graph algorithms to solve spatial problems directly on geographic data, enabling applications like efficient routing or complex network analysis.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*igO7vrmutwznjmtJ" /></figure><p>Now, while maps provide the visual context and graphs offer the relational structure, there’s a crucial step to bridge the gap between noisy, real-world GPS measurements and the clean, structured digital road network. This is where map matching comes in handy.</p><h3>What is Map Matching</h3><p>Map matching is a crucial process in location-based services, addressing the inherent inaccuracies present in raw GPS data by aligning recorded coordinates with a road network.
At its core lies the concept of “snapping,” where individual GPS points are associated with the most probable road segment. This association is not based solely on proximity; different algorithms consider a multitude of factors, including the spatial closeness of the point to potential road segments, the direction and speed of travel inferred from the sequence of points, and the topological connectivity of the road network. By intelligently “snapping” these noisy measurements to the underlying infrastructure, map matching algorithms generate a more accurate and coherent representation of the traveled path, enabling reliable navigation, traffic analysis, and various other applications.</p><p>A simple example of the importance of such a thing is your everyday scenario — you’re driving and your navigation app loses GPS signal in a short tunnel. Without map matching, the software would struggle, likely misplacing your vehicle by hundreds of meters. However, map matching would have already aligned your position to a specific road. Assuming the tunnel has no intersections, the system predicts your continued movement along that same road, using your car’s speed and some simple calculations to maintain an accurate location estimate.</p><h3>Implementation First Step — Map to Graph</h3><p>In order to properly implement and learn map matching algorithms, we first need a map and a matching graph.
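Before that, the “snapping” idea from the previous section can be made concrete with the basic building block most matchers share: projecting a GPS point onto a candidate road segment. A planar sketch (ignoring Earth’s curvature, which is reasonable over tens of meters):

```python
import math

def snap_to_segment(p, a, b):
    """Project point p onto segment a-b; all inputs are (x, y) tuples in meters.
    Returns (snapped_point, distance_to_segment)."""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        t = 0.0  # degenerate segment: snap to its single point
    else:
        # Projection parameter, clamped so the snap stays on the segment
        t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    sx, sy = ax + t * dx, ay + t * dy
    return (sx, sy), math.hypot(px - sx, py - sy)
```

A matcher would run this against every edge near the point and keep the best-scoring candidate, also weighing bearing and road connectivity as described above.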
Let’s have a quick look at how we can accomplish that.</p><p>We kick things off using the pyrosm library, which is handy for working with OpenStreetMap (OSM) data (basically a massive, free map of the world).</p><p>Here’s how we pull out the raw network parts:</p><pre>import os<br>from pyrosm import OSM<br>import igraph as ig<br>import numpy as np<br><br># Assuming map_path and Config.crop_bounding_box are set up earlier<br># map_path points to our downloaded OSM file<br># Config.crop_bounding_box lets us focus on a specific area if needed<br><br>osm = OSM(map_path, bounding_box=Config.crop_bounding_box)<br># Get all the road segments and their intersections<br>nodes, edges = osm.get_network(nodes=True, network_type=&#39;driving&#39;)</pre><p>What’s happening above: OSM loads our map file, and then get_network sifts through it to find all the bits that make up the “driving” network — that’s our roads and where they meet. The nodes are essentially the intersections or endpoints, and the edges are the actual road segments connecting them. (Also, don’t worry, the code in the repo also includes downloading the map and setting the configuration value for the bounding box. It’s just a bit boring to add that in here as well.)</p><p>Once we have these nodes and edges from the raw map data, we hand them over to igraph, chosen mostly for performance. This step builds the actual graph structure we need:</p><pre># Convert the extracted nodes and edges into an igraph object<br>graph = osm.to_graph(nodes, edges, graph_type=&#39;igraph&#39;)</pre><p>This line is the core of the transformation. It takes those raw map elements and creates a proper graph where our intersections are vertices and the road segments are edges. The cool part is that pyrosm automatically carries over useful information from the OSM data, like the road’s name or its geometry (the actual shape of the road segment).
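For instance, a calculate_bearing helper can be built from the standard forward-azimuth formula between two consecutive points of the edge geometry (a sketch; not necessarily the repo’s exact implementation):

```python
import math

def bearing_degrees(lat1, lon1, lat2, lon2):
    """Forward azimuth from point 1 to point 2, in degrees clockwise from true north."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    y = math.sin(dlon) * math.cos(phi2)
    x = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlon)
    return math.degrees(math.atan2(y, x)) % 360.0
```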
We then take that geometry and use it to calculate things like the road’s bearing, and we also clean up other attributes like maxspeed, ensuring every edge has the data we need for later calculations.</p><pre># Loop through every edge (road segment) in our new graph<br>for edge in graph.es: <br>    # Calculate the bearing (direction) of the road segment<br>    edge[&#39;bearing&#39;] = self.calculate_bearing(edge)<br>    # Set a default maxspeed if it&#39;s missing, otherwise convert to float<br>    if edge[&#39;maxspeed&#39;] is None:<br>        edge[&#39;maxspeed&#39;] = 35.<br>    else:<br>        edge[&#39;maxspeed&#39;] = float(edge[&#39;maxspeed&#39;])</pre><p>By adding bearing and cleaning up maxspeed, we’re essentially enriching our graph. It’s not just a bunch of connected lines anymore; it’s a smart representation of the road network that understands directionality and speed limits. This robust graph is then ready for all sorts of advanced uses, especially when we get to the tricky business of map matching.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/868/1*FfmsY6G_N87J0rn76Wz2aQ.png" /></figure><h3>Summary</h3><p>Now that our graph is enriched and ready, we face a classic chicken-and-egg problem: we can’t test our map matching algorithms without data, and we can’t validate our data without algorithms. 
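The calculate_bearing helper called in the enrichment loop above isn’t shown in the post; one plausible implementation, applying the standard great-circle initial-bearing formula to a segment’s endpoint coordinates, looks like this (my sketch, the repo’s version may differ):

```python
import math

def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial bearing from point 1 to point 2, in degrees clockwise from north."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    x = math.sin(dlon) * math.cos(phi2)
    y = (math.cos(phi1) * math.sin(phi2)
         - math.sin(phi1) * math.cos(phi2) * math.cos(dlon))
    # atan2 gives (-180, 180]; normalize to [0, 360)
    return (math.degrees(math.atan2(x, y)) + 360.0) % 360.0

# A segment running due east along the equator bears 90 degrees
print(round(bearing_deg(0.0, 0.0, 0.0, 1.0)))  # 90
```

For a road segment, one would feed in the first and last coordinates of its geometry (or successive pairs, if per-vertex bearings are needed).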
We’ll tackle this head-on by first constructing our own realistic, synthetic GPS dataset, paving the way for us to finally implement and benchmark various map matching approaches.</p><h3>Resources</h3><h4>Code</h4><p><a href="https://github.com/ornachmias/map_matching">https://github.com/ornachmias/map_matching</a></p><h4>References</h4><p>[1] <a href="https://en.wikipedia.org/wiki/Global_Positioning_System">https://en.wikipedia.org/wiki/Global_Positioning_System</a></p><p>[2] <a href="https://spaceplace.nasa.gov/gps/en/">https://spaceplace.nasa.gov/gps/en/</a></p><p>[3] <a href="https://www.lifewire.com/what-is-bearing-in-gps-1683320">https://www.lifewire.com/what-is-bearing-in-gps-1683320</a></p><p>[4] <a href="https://developer.android.com/reference/android/location/Location#getAccuracy()">https://developer.android.com/reference/android/location/Location#getAccuracy()</a></p><hr><p><a href="https://medium.com/gett-engineering/map-matching-part-1-gps-maps-and-graphs-80a5edb79ae1">Map Matching Part 1 — GPS, Maps and Graphs</a> was originally published in <a href="https://medium.com/gett-engineering">Gett Tech</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
    </channel>
</rss>