Stories by Skyscanner Engineering on Medium

Skyscanner’s journey to effective observability

Skyscanner Engineering — Fri, 14 Mar 2025 10:34:23 GMT

The year was 2020 and Skyscanner, like the entire travel industry, faced unprecedented challenges due to the global COVID-19 pandemic. Yet, this difficult year also provided an opportunity for introspection, prompting us to enhance our tools and processes to emerge more resilient than ever, to be the world’s number one travel ally. This is where our journey to completely revolutionise our approach to observability begins.

The image below shows a “simplified” view of what our internal observability platform looked like at the time. As you can see, there was some room for simplification. This platform contained a mix of specialised vendors for RUM, tracing, or synthetics, and a large number of internal systems based on open-source backends like OpenTSDB, Prometheus, or multiple ELK stacks.

However, our challenges were not simply related to cost, or the complexity of running this platform with a small team. We understood that our most important problem to solve was improving the confidence of all engineers to understand and operate their services, to reliably connect more than 110 million users to over 1,200 flight, hotel and car hire partners each month. This required an observability platform that would…

Reduce cognitive load and context switching for engineers, with one single platform, and one single telemetry language.
Correlate traces, metrics, logs, and events, across multiple services and frameworks, from client devices all the way down to Kubernetes containers. Our components don’t work in isolation, and neither should the signals we use to observe them.
Optimise the cost and quality of the data produced, storing the data we need to operate our systems reliably, and no more. Meaningful, contextual data can be cheaper than low-quality, verbose data.
Implement open standards to future-proof our instrumentation and transport layers, to ready our tech stack for changes in the overall industry while reducing maintenance overhead.

If you know anything about observability, you’ll know that none of those are trivial problems to solve. We didn’t need a lift-and-shift, we needed a mindset shift.

OpenTelemetry and New Relic, central pieces in our cloud native strategy

Solving the challenges above required a long-term strategy and a set of guiding policies. So, we got to work, and after weeks of PoCs, reviews, and prototypes, we defined a set of individual strategies based on two major principles:

One single standard to instrument our services and transport our data, OpenTelemetry.
One single backend to store and analyse our data, New Relic.

Skyscanner runs on cloud native tooling. We have a team of world-class platform engineers contributing to several CNCF projects like Kubernetes, Istio, or Argo. When a new kid on the block called OpenTelemetry started to make noise back in 2019, we listened. All our distributed tracing was already based on OpenTracing, now deprecated in favour of OpenTelemetry along with OpenCensus and other standards like ECS, the Elastic Common Schema (take that, XKCD 927: we’re a net 2 positive!). This was a logical next step for tracing, aligned with our cloud native ethos.

But we didn’t stop there. With OpenTracing, we had previously experienced the benefits of decoupling cross-cutting APIs from their implementation, allowing us to switch vendors with no changes to instrumentation code. OpenTelemetry follows this same client design principle. So we knew OpenTelemetry was not only going to be the next step for tracing; it was going to change the whole observability industry. We’re all in when it comes to the vision of high-quality, standardised, portable, and ubiquitous telemetry provided by OpenTelemetry. We decided to double down, and to base our strategy on the use of these open standards for context propagation, semantic conventions, transport protocols, and APIs across traces, metrics, logs, and baggage. The future is OTel-native, not APM agents, and we’re ready for it.

We wanted a single observability vendor that could support all the capabilities we required on one platform, but we didn’t want to compromise our stack’s readiness for the future of observability. This was one of the main reasons we chose New Relic as our observability platform and began a partnership. It could ingest telemetry using the standard OTLP protocol, and use OpenTelemetry semantic conventions to provide enhanced analytics on top of our data. This allowed us to start relying on open standards while OpenTelemetry stabilised tracing, metrics, and logs. We complemented them with New Relic instrumentation in the limited places where we felt OpenTelemetry was not yet ready at the time, like mobile or browser. We’re now closer than ever to achieving our North Star architecture illustrated below.

Mindful migrations for higher return-on-investment

Our motto for Skyscanner platform engineering is the following: make the golden path the path of least resistance. If the easiest way is also the one that follows best practices and engineering standards, why would you do anything different?

To help with this, at Skyscanner, we provide a set of core libraries to automatically configure certain aspects out of the box, like security, identity, or telemetry SDKs. This is where we implement opinionated defaults. For instance, how we aggregate data — cumulative temporality for Prometheus or delta temporality for New Relic — or how to export that data, which in our case is a centralised Collector Gateway.
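The two temporalities differ in what each data point means. A cumulative counter reports the running total since the process started, which is the Prometheus convention. A delta counter reports only the change since the previous export, which is what the text describes using for New Relic.

The following minimal Python sketch converts cumulative samples to deltas. It is not Skyscanner’s core-library code: the function name and the counter-reset rule are assumptions made for illustration.

```python
from typing import List, Optional, Tuple

Sample = Tuple[float, float]  # (timestamp_seconds, value)


def cumulative_to_delta(samples: List[Sample]) -> List[Sample]:
    """Convert cumulative counter samples into per-interval deltas.

    Assumption for this sketch: a decrease in the cumulative value means the
    counter was reset (for example, the process restarted), so the new value
    itself is taken as the delta for that interval. The first sample only
    establishes a baseline and produces no delta.
    """
    deltas: List[Sample] = []
    previous: Optional[float] = None
    for timestamp, value in samples:
        if previous is not None:
            delta = value - previous if value >= previous else value
            deltas.append((timestamp, delta))
        previous = value
    return deltas


if __name__ == "__main__":
    # A request counter exported every 10 seconds, with a restart before t=30.
    cumulative = [(0, 100), (10, 130), (20, 175), (30, 12)]
    print(cumulative_to_delta(cumulative))  # [(10, 30), (20, 45), (30, 12)]
```

In practice this choice is not hand-written. In the OpenTelemetry SDKs it is configured as the metric exporter’s temporality preference, for example through the `OTEL_EXPORTER_OTLP_METRICS_TEMPORALITY_PREFERENCE` environment variable. Setting it in a shared core library is exactly the kind of opinionated default described above.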

To execute a migration, an API design like OpenTracing’s (and OpenTelemetry’s) makes things a lot easier. You can simply swap the underlying implementation with no changes to instrumentation code. This allowed us to migrate over 300 microservices in a matter of weeks, only requiring service owners to bump the version on one of those core libraries.
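A minimal Python sketch can illustrate why this API/implementation split makes migrations cheap. Instrumentation code depends only on a small tracing interface and looks up the active implementation at call time. Swapping the implementation (here, from a no-op to an in-memory recorder) therefore requires no change to the instrumented function.

All names below are hypothetical. In OpenTelemetry, the equivalent roles are played by the API’s global tracer provider and an SDK registered at startup.

```python
from typing import List, Protocol, Tuple


class Tracer(Protocol):
    """The 'API': the only thing instrumentation code is allowed to depend on."""

    def record_span(self, name: str, duration_ms: float) -> None: ...


class NoOpTracer:
    """Default implementation: safe to call, does nothing (like an unconfigured API)."""

    def record_span(self, name: str, duration_ms: float) -> None:
        pass


class InMemoryTracer:
    """A stand-in 'SDK' that keeps spans in memory instead of exporting them."""

    def __init__(self) -> None:
        self.spans: List[Tuple[str, float]] = []

    def record_span(self, name: str, duration_ms: float) -> None:
        self.spans.append((name, duration_ms))


_active_tracer: Tracer = NoOpTracer()


def set_tracer(tracer: Tracer) -> None:
    """Called once by a core library at startup to choose the implementation."""
    global _active_tracer
    _active_tracer = tracer


def get_tracer() -> Tracer:
    return _active_tracer


def search_flights(origin: str, destination: str) -> list:
    """Instrumented service code: identical no matter which tracer is active."""
    get_tracer().record_span(f"search {origin}->{destination}", duration_ms=42.0)
    return []


if __name__ == "__main__":
    search_flights("EDI", "BCN")  # goes nowhere: the no-op tracer is active
    recorder = InMemoryTracer()
    set_tracer(recorder)  # the "version bump": a new implementation is registered
    search_flights("EDI", "BCN")
    print(recorder.spans)  # [('search EDI->BCN', 42.0)]
```

Because `search_flights` calls `get_tracer()` on every invocation rather than caching a tracer at import time, the implementation can change underneath it.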

Other migrations, like logging, required some extra config to be changed in each service. Here, we used one of our own open-sourced tools, Turbolift, to automatically create over 1,000 pull requests to different repos and change their appender settings. Finally, in cases where we still had to rely on New Relic SDKs, we did so by providing very thin wrappers, with the intention of facilitating a final move to OpenTelemetry APIs when ready.

However, we knew that a lot of the data produced by our services was not being used by their owners. This is where a lesser-known benefit of OpenTelemetry semantic conventions for Resource attributes helped. We now know exactly the service, account, namespace, or cluster that telemetry is produced from, because those attributes are now standard across all signals. With this, we can distribute telemetry costs back to service owners, so they can understand the cost of the telemetry they produce, and we can provide guidance for better return-on-investment.
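Because every signal carries the same Resource attributes, attributing telemetry volume to owners reduces to a simple group-by. The sketch below is a hypothetical illustration, not Skyscanner’s cost pipeline. The record shape, the per-GB price and the choice of grouping keys are all assumptions. The attribute names `service.name`, `k8s.namespace.name` and `k8s.cluster.name` are standard OpenTelemetry resource semantic conventions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping, Tuple

DEFAULT_KEYS = ("service.name", "k8s.namespace.name", "k8s.cluster.name")


def cost_by_owner(
    records: Iterable[Mapping],
    price_per_gb: float,
    keys: Tuple[str, ...] = DEFAULT_KEYS,
) -> Dict[Tuple[str, ...], float]:
    """Sum ingest cost per combination of resource attributes.

    Each record is assumed (for this sketch) to look like:
        {"resource": {"service.name": ..., ...}, "bytes": <ingested bytes>}
    Telemetry missing an attribute is grouped under "unknown", which is itself
    a useful hint that some instrumentation is not following conventions.
    """
    totals: Dict[Tuple[str, ...], float] = defaultdict(float)
    for record in records:
        resource = record["resource"]
        owner = tuple(resource.get(key, "unknown") for key in keys)
        totals[owner] += record["bytes"] / 1e9 * price_per_gb
    return dict(totals)
```

For example, `cost_by_owner(records, price_per_gb=0.25)` returns a dictionary keyed by `(service, namespace, cluster)` tuples, ready to be reported back to each owning team.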

And then, something magical happened: we had teams that wanted to find more optimal ways of using telemetry. We taught them about distributed tracing, how it provides much more granular and high-quality insights into our systems than logging or metrics, and we explained the advanced ways of sampling that it can enable, to store only data about the transactions that matter — the slowest ones, or those that had an error somewhere in the stack. This allows us to store about 4% of the 2M spans and 80K traces we produce every second, without losing any of the data important for debugging. When they saw the advantages, they were convinced, and started to rely on tracing rather than verbose logging or high-cardinality metrics. This made some teams reduce their telemetry costs by over 90%!
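Keeping only “the transactions that matter” is usually done with tail-based sampling. The decision is made after a whole trace has been collected, so it can take into account errors and latency anywhere in the stack. For scale, keeping 4% of 2M spans and 80K traces per second works out to roughly 80K spans and 3,200 traces stored per second.

Below is a minimal sketch of such a policy. The 2-second latency cut-off and the 1% baseline rate are hypothetical values, not Skyscanner’s settings. In practice this logic typically runs in an OpenTelemetry Collector, whose contrib `tail_sampling` processor offers comparable `status_code`, `latency` and `probabilistic` policies.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Span:
    span_id: str
    parent_id: Optional[str]  # None for the root span
    duration_ms: float
    error: bool = False


def keep_trace(
    spans: List[Span],
    latency_threshold_ms: float = 2000.0,
    baseline_rate: float = 0.01,
    rand: Callable[[], float] = random.random,
) -> bool:
    """Tail-sampling decision for one complete trace (hypothetical thresholds)."""
    # 1. Keep any trace with an error anywhere in the stack.
    if any(span.error for span in spans):
        return True
    # 2. Keep slow transactions, judged by the end-to-end (root span) duration.
    root = next((span for span in spans if span.parent_id is None), None)
    if root is not None and root.duration_ms >= latency_threshold_ms:
        return True
    # 3. Keep a small random baseline of healthy traffic for comparison.
    return rand() < baseline_rate
```

The `rand` parameter is injectable so the decision can be tested deterministically. A real collector typically derives the probabilistic decision from the trace ID instead, so that every collector instance makes the same choice for the same trace.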

Because this required a cultural change in how we approach telemetry instrumentation, not everything could be a version bump or an automated pull request. We asked service owners to evaluate the telemetry they produce, to use automatic instrumentation provided via OpenTelemetry where possible and, only if needed, to rely directly on the OpenTelemetry API to instrument custom aspects, using the right signal (metrics, traces, or logs) for each use case.

Finally, when thinking about dashboards and alerts, we also applied the principles of reusability and modularity we would apply to any software we write, reducing cognitive load and maintenance toil for service owners. Thanks to Terraform modules and common resources, we can provide pre-canned alert definitions, or standard dashboards that every engineer can re-use for their own service. Paired with the use of Atlantis, this allowed us to follow standard CI/CD workflows, with changes to alerts and critical dashboards being reviewed as code. This has improved the quality of our alerting and dashboards, with no more unwanted changes.

From the technical to the sociotechnical

I could (and I did) write pages and pages about the different technical decisions we made, and why we made them. However, the most difficult aspect to change in a system is not technology, it’s humans. All this observability data would be fairly useless if it’s not being used by people to ultimately improve how travellers experience Skyscanner.

One aspect of this cultural shift is changing how engineers approach the monitoring and debugging of a system, so that they use observability tooling effectively. Humans are creatures of habit. When a team has been operating with a certain style of runbooks for years, it’s difficult to show them that there may be better ways, and that unknown unknowns lurking in the system can render their runbooks useless when they’re paged in the middle of the night. Using tooling provided by observability platforms, like New Relic in our case, can give engineers access to more advanced analytics to automatically correlate anomalies, forecast trends, or profile errors. This can help them find regressions, or optimisations, relying on facts rather than intuition and prior knowledge.

To tackle this cultural change, we kick-started an initiative called Observability Ambassadors. We wanted to bring the best practices in observability to the application domain. These ambassadors are proficient in the use of the OpenTelemetry API and instrumentation packages. They help bridge the gap between observability engineers (experts in transport pipelines and SDK config) and the developers in their own teams.

Observability Ambassadors help to advise others and drive initiatives. This becomes a lot easier if you can make it fun! Last year we started hosting Observability Game Days using the official OpenTelemetry Demo, to gamify system debugging and show engineers how OpenTelemetry instrumentation, and advanced tooling, can help them to understand their systems better.

The final piece of the puzzle, and one that cannot be overlooked, is connecting telemetry back to Skyscanner travellers, and to business value. This is where SLOs come into the picture. At Skyscanner, our approach to SLOs had previously been based purely on RED (Rate, Errors, Duration) metrics. But in reality, do travellers care about API response codes? Not really. With access to client-side telemetry, we can drive SLOs from signals that relate directly to our users, like “how many flight searches displayed valid results?”. Most importantly, thanks to distributed tracing, we can now understand which services are part of the critical path for a given experience. This helps us set appropriate SLO targets, aligned across services, to meet our overall commitments to our travellers. All part of an interconnected system.
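The arithmetic behind user-centric SLOs can be sketched in a few lines. The function names and numbers below are hypothetical.

- **SLI:** the fraction of valid events that were good, for example flight searches that displayed valid results out of all searches.
- **Error budget:** how many bad events the SLO still allows.
- **Critical path:** this is where tracing-derived knowledge matters. Assume a journey needs every service on its path to succeed, and that their failures are independent. The journey’s success rate is then roughly the product of theirs, so each service needs a stricter target than the journey itself.

```python
import math
from typing import Sequence


def sli(good_events: int, valid_events: int) -> float:
    """Fraction of valid events that were good (e.g. searches with valid results)."""
    return good_events / valid_events


def error_budget_remaining(good_events: int, valid_events: int, slo_target: float) -> float:
    """Share of the error budget still unspent (negative once the SLO is breached)."""
    allowed_bad = (1 - slo_target) * valid_events
    actual_bad = valid_events - good_events
    return 1 - actual_bad / allowed_bad


def serial_success_rate(service_rates: Sequence[float]) -> float:
    """Success rate of a journey that needs every service on its critical path,
    assuming failures are independent."""
    return math.prod(service_rates)


def per_service_target(journey_target: float, services_on_path: int) -> float:
    """Equal per-service target whose product matches the journey target."""
    return journey_target ** (1 / services_on_path)


if __name__ == "__main__":
    print(round(sli(99_950, 100_000), 5))                            # 0.9995
    print(round(error_budget_remaining(99_950, 100_000, 0.999), 3))  # 0.5
    print(round(serial_success_rate([0.999] * 5), 4))                # 0.995
    print(round(per_service_target(0.995, 5), 5))                    # 0.999
```

The last two lines show the alignment problem in miniature. Five services at 99.9% each only deliver about 99.5% for the traveller, so a 99.5% journey commitment implies roughly 99.9% for each of them.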

This has started a fundamental change in how we look at cross-domain dependencies, and how we approach discussions between the different teams required to provide a reliable service. We’re using observability not only as a technical tool, but also as a sociotechnical tool, to help us reason about our system and make data-driven decisions. We base our commitments on evidence, not intuition.

Not Sure What to Do After Graduation?

Skyscanner Engineering — Wed, 04 Sep 2024 08:59:32 GMT

Not Sure What to Do After Graduation? Here’s how Skyscanner’s Grad Program kick-started our careers in tech.

Hello everyone! I’m Gergana, and I joined the Software Engineering Graduate Program in 2022. I hold a degree in Mathematics and Computer Science from the University of Glasgow.

And I’m Michal. Like Gergana, I also joined Skyscanner in 2022. I studied Software Engineering at the University of Sheffield.

Gergana & Michal

We first met during the Discovery Day while interviewing for our positions, and we’re delighted to have continued our journey as colleagues. Both of us were equally excited and nervous about finding our place in the tech world.

We’re thrilled to share our experiences with you and offer a glimpse into the Graduate Program from our perspective. Sounds interesting? Keep on reading to find out how we found it!

Tracing Our Journey To Skyscanner

Gergana: I was in my second year of university when I decided that Skyscanner was a company I wanted to work for. I was aware that its culture was very well aligned with my idea of the perfect workplace, and the thought of working for a travel tech company was very exciting and motivating. So, I decided to apply for an internship. I got through the coding exercise and the culture interview, and received the great news that I was invited to the final Discovery Day stage. It was all going just the way I wanted. Unfortunately, that same year COVID-19 reached its peak. As we all know, this was particularly damaging for the travel industry, and as a consequence the internship was cancelled. As disappointing as that moment was, I knew I would keep striving to be in a company like Skyscanner one day.

And so it happened. During my final year at university, I learnt that Skyscanner was reopening its Graduate Program. Even though I already had an offer from another big corporation, I knew where I wanted to be. So, I took my chance and went for the position with full energy and motivation. I am very happy to say that I have just graduated from the 2022 Graduate Program, which was a wonderful experience, and I am in a happier place than I could have imagined.

Michal: In my final year, I knew I wanted to dive into the world of software engineering, but the specifics? Not so much. One day, I asked myself, what companies would it be exciting to work for? What companies work on interesting problems? I actually happened to be on Skyscanner’s page while planning a short trip to Italy. I thought I’d check their Careers page, and since they were hiring graduates, I thought it was a no-brainer and had to apply.

The application process was challenging but exceptionally well-communicated. I enjoyed the coding challenges, which weren’t the usual LeetCode questions but had an interesting travel twist. After passing that stage, I had a call to assess my cultural fit with the company. Throughout the process, the team was incredibly responsive; I always received quick answers to my questions and never felt ghosted. The third and final step was the Discovery Day in Edinburgh. I went in with the right mindset, aiming to be my best self, and found the experience highly engaging. Collaborating with other candidates on real problems made me feel both relaxed and excited about the opportunity.

I was accepted into the 2022 cohort in my city of choice, Edinburgh. Despite holding other offers, including one from a large hotel search company, I chose to go with Skyscanner. What truly convinced me was attending the Discovery Day. I never felt the same way during my internships at university; the atmosphere at Skyscanner is unique. Everyone acts as an owner, people are relaxed yet reliable and high-performing, trust is abundant, and the product is genuinely impressive.

I joined 20 other individuals spread across three locations. I really appreciated how diverse the group was, consisting of people from all walks of life, including recent graduates, career changers, and individuals from multiple countries.

Exploring Diverse Roles

One of the coolest things about Skyscanner’s program is the rotation system. We got to dive into four different squads, each with its own vibe and focus.

Michal: Oh dear, where do I even start? In short, I tackled CI/CD projects, built new flight components as an iOS engineer, and revamped the Profile page and Price Alerts services as a front-end engineer. During the rotations, I became an advocate for accessibility. I absolutely enjoyed learning how to ensure that everything we create can be used by everyone. I feel like this is quite overlooked at universities, but it’s absolutely core to an inclusive user experience.

Apart from hard skills, I improved my soft skills the most. I was already somewhat confident and decent at pitching my ideas, but thanks to the diversity of the teams, I learned how to work on a wide range of projects with an equally diverse group of people.

Some of my teams were split across different offices, which wasn’t always the most accommodating. Having your team in your office is great; you always have someone to talk to and bounce ideas off. When you’re an ‘island’, it can be hard to progress quickly and can feel a bit alienating. On the flip side, I loved the trips to London, where I got to see my team and socialize with them. I was very lucky during my graduate scheme, as I got to visit all the UK offices, as well as the offices in Barcelona and Shenzhen.

Gergana: I worked in various teams and disciplines as well. I have to be honest: I didn’t thrive in every area and team, and not everything felt right for me. However, they all taught me a lot. From some teams, I learnt the difficulty and importance of working with various personality types: people with different energy levels, life views, and approaches from my own. From others, I learnt how to write clear, reusable and functional JavaScript (React Native) code, how not to be afraid of Xcode and mobile development, how to facilitate meetings seamlessly, and how to ask questions without overthinking. The most important skill all these rotations gave me was confidence: confidence that even if I don’t know everything, I can figure it out, be resilient, and overcome whatever is thrown at me. This was an absolute game changer for the way I approach my work, and I feel a lot happier and more motivated now.

Skyscanner really trusts its grads, letting us work on the same projects as senior developers and encouraging us to pitch and pursue our own ideas.

Discovering New Passions

Gergana: The essence of the whole program for me was about discovering new passions, as well as pain points: things I became sure I do not find energizing, interesting or motivating. From my software engineering experience at university, however small it was, I had convinced myself that I did not like front-end engineering because I found it too frustrating and distracting. As luck would have it, I started the program with a front-end rotation, which, to be completely blunt, was quite a bummer at the beginning. It turned out, however, that front-end work could be very engaging and involved a lot of logical and analytical thinking. Learning JavaScript and understanding how React hooks work was also challenging and brain-twisting. This experience definitely challenged and disproved my beliefs, and I am glad I had it.

A similar thing happened with mobile engineering, but this time I had learnt my lesson, and I went into the rotation with a lot more positive thoughts and motivation. I still believed that this was not something I wanted to pursue as a career, but I was curious to learn and soak up knowledge from the clever people around me. I must say this was one of my best rotations: I had a lot of fun and I am absolutely thrilled I got to experience it.

One main passion stayed constant throughout the program for me, and was even rediscovered on several occasions: the love for Machine Learning and Data Science. With the help of all the supportive people around me, I managed to get closer to different teams working in this area, and I was more and more certain that this is the career I want to strive for. This all led me to the luckiest moment, when I saw a Slack post saying that a few data science teams had open positions and were looking to interview internally. I did not think twice!

Michal: At university I was an all-rounder. I wasn’t sure what I liked, so I would start each rotation without any preconceptions. I enjoyed working on back-end services and building APIs, but I found working on traveller-facing products more interesting, as there were more moving pieces and the work was more rewarding. Apart from building the logic, it’s important to know how to improve the product’s performance, localize it, add observability, and ensure inclusive designs. This was definitely true for my front-end rotation working on the Profile page, but, to my surprise, it also applied to my mobile rotation. Mobile turned out to be quite ‘in the middle’, combining building interfaces with handling loads of data and powering the screens with it. By the time I left the squads, I knew I could see myself working with pixel-perfect designs and building experiences for travellers.

Just as I was finishing my mobile rotation, I discovered a role I had never considered before: Product Owner. Thrown into the deep end, I thrived with tons of support. I started analyzing how travellers interacted with our booking page and spotted a potential improvement. I rallied our squad and a designer, and together we built the new experience and launched an experiment. We split the traffic between the original and a modified page to see how the new version would perform. After just a week, my tweak was estimated to bring a £1.3 million per year increase in flight revenue. Each rotation was a blast, but getting to play the role of PO in such a dynamic company? That was a game-changer.

Final Thoughts

Our cohort recently “graduated.”

Gergana: I can very happily say now that I am part of one of the most exciting user-facing data science teams as a Data Scientist myself and I am quite proud that the graduate program gave me the confidence to go after my dreams, and grateful for everyone I got to meet, interact with and learn from along the way.

Michal: I decided to join a front-end team to gain more experience in traveller-facing space before switching to a product position in the future. I joined a team responsible for adding new features to the search page, which is visited by over 100 million travellers each month. It’s definitely a high-impact space but I am up for a challenge!

So, if you’re in your final year and still figuring things out, don’t stress. Opportunities like Skyscanner’s grad program might be the perfect launchpad for your career. Here are a few tips we picked up along the way:

Take Your Time: The culture in the company is unparalleled. Everyone’s on your side, so make sure you get the fundamentals right.
Lean on Others: Your colleagues and mentors are there to support you. Don’t be afraid to ask for help.
Be Proactive: Take initiative and don’t wait for work to be handed to you. If you see an opportunity to make a difference, go for it.
Acknowledge Imposter Syndrome: It’s real, and it’s okay to feel it. Remember, you’re not alone, and everyone starts somewhere.
Don’t Stress and Overthink: It’s easier said than done, we know! But this is your opportunity to enjoy every step of the way without too much pressure and with a lot of time on your hands.

Take the leap — you never know where it might lead!

WISE: Skyscanner’s Bayesian AB experimentation library and decision engine

Skyscanner Engineering — Thu, 25 Apr 2024 06:22:08 GMT

By Dhanush Kishore, with Jose Parreño, May Alexander, Robert Shepherd, and the Skyscanner experimentation squad

At Skyscanner, product decisions are driven by experimentation. As a global leader in travel, with 110 million users every month, we rely on a data-driven approach to growing and improving our product. Product managers, engineers, and data scientists run hundreds of experiments on our in-house AB experimentation platform, Dr Jekyll, to help make decisions about what features to ship for travelers and partners.

When a team develops a new feature, the next step is to run a randomized controlled experiment. Every eligible user is randomly assigned either to the test variant B, where they interact with the new feature, or to the control variant A. After running the experiment for a certain duration, we test key metrics in variant B against variant A to determine whether the feature will be beneficial if rolled out to all users. This ensures that we follow an objective, scientifically rigorous approach to making product decisions.

Although we already had a strong experimentation culture and an excellent experimentation platform, we identified the following needs as we scaled our experimentation program:

Reduce manual effort in analyzing experiment results: While our experimentation platform, Dr Jekyll, reported test results for various metrics in an experiment, our data analytics team had to spend a considerable amount of time interpreting the results and making a decision on what variant to ship, which was becoming a bottleneck to scaling our experimentation program
Need for standardization: Different teams at Skyscanner used different approaches to AB experimentation, and inconsistencies in methodologies meant it was difficult to compare results and share learnings across teams
Consistency in adherence to experimentation best practices: While many teams maintained high standards of statistical rigor, others required additional support in designing and interpreting experiments

To solve these issues, the experimentation squad at Skyscanner built WISE.

What is WISE?

‘WISE Is a Statistics Engine’

WISE is Skyscanner’s centralized Bayesian AB experimentation Python library that now powers our experimentation platform. It is also used extensively by data scientists and analysts in the company to perform custom analysis in notebooks.

The fundamental goal of WISE is to analyze an experiment and decide what variant to ship. To do so, WISE leverages Bayesian statistics to evaluate test variants against the control variant in terms of pre-registered primary and guardrail metrics. Based on the performance of each variant in terms of these metrics, WISE uses a decision logic to identify a recommended variant to launch, once it determines that sufficient data has been collected to make this decision confidently.

Key features of WISE

Turns Dr Jekyll into a decision-making platform, by providing the experimenter with a clear and transparent decision regarding which variant to launch
Eliminates the need for manual analysis of experimentation results, so we can ship faster
Enforces experimentation best practices and statistical rigor, and standardizes the approach used for experimentation and decision-making across all teams at Skyscanner
Centralizes various experimentation tools, such as the sample ratio mismatch check and the estimation of the long-term revenue impact of an experimented feature
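A sample ratio mismatch (SRM) check asks whether the observed number of users per variant is consistent with the configured traffic split. A large discrepancy usually points to a bug in assignment or logging rather than a real effect. A common way to run this check is a chi-square goodness-of-fit test.

The sketch below illustrates that test using only the standard library; it is not WISE’s implementation. The function names and the 0.001 alert threshold are assumptions. The chi-square tail probability uses the closed forms that exist for integer degrees of freedom.

```python
import math
from typing import Sequence, Tuple


def chi2_sf(x: float, df: int) -> float:
    """P(X > x) for a chi-square distribution with a positive integer df."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{r=0}^{df/2-1} (x/2)^r / r!
        term = total = 1.0
        for r in range(1, df // 2):
            term *= (x / 2) / r
            total += term
        return math.exp(-x / 2) * total
    # Odd df: normal tail plus a finite correction series.
    total = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return total


def srm_check(
    observed: Sequence[int],
    expected_ratios: Sequence[float],
    alpha: float = 0.001,
) -> Tuple[float, float, bool]:
    """Chi-square goodness-of-fit test of observed users per variant.

    Returns (chi-square statistic, p-value, mismatch_flagged).
    """
    total_users = sum(observed)
    ratio_sum = sum(expected_ratios)
    chi2 = 0.0
    for users, ratio in zip(observed, expected_ratios):
        expected = total_users * ratio / ratio_sum
        chi2 += (users - expected) ** 2 / expected
    p_value = chi2_sf(chi2, df=len(observed) - 1)
    return chi2, p_value, p_value < alpha


if __name__ == "__main__":
    # An intended 1:1:1 split where variant C lost about 1% of its users.
    print(srm_check([1_000_000, 1_000_000, 990_000], [1, 1, 1]))
```

A strict threshold such as 0.001 is often chosen because the check runs on every experiment, and a flagged mismatch means the results should not be trusted until the cause is found.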

An example

Let us consider an example experiment to understand how WISE works.

As you may already know, Skyscanner is a meta-search engine that helps users find and compare flight, hotel, and car hire options, searching through various airlines and online travel agencies to find the best deals. Skyscanner does not charge users any commission, and one way it makes money is by charging the travel agency or airline a small referral fee. However, users are sometimes concerned that the prices might include a commission to Skyscanner.

To address this concern, the team at Skyscanner believed that showing users a message clearly stating that Skyscanner does not charge any commission could help build their confidence. To validate this hypothesis, they decided to run an experiment:

Variant A would display the list of available airlines and travel agencies to buy a ticket from as usual
Variant B would show a pop-up, mentioning that ‘Skyscanner never takes a cut’
Variant C would show a pop-up, with a similar message phrased differently, that the prices are directly from our partners, with no added fees

Figure 1: Screenshots from an example experiment

For our example, let us suppose that the metrics of interest for the above experiment are:

Primary metric: percentage of users who visited an airline or travel agency website. We refer to this as ‘redirector rate’
Guardrail metric: percentage of users who booked a flight
Monitoring metric: mean revenue per user

Primary metrics are metrics that an experiment is aiming to improve, and guardrail metrics are metrics we do not want to be hurting while improving the primary metric. Monitoring metrics are additional metrics that an experimenter may want to keep an eye on in the experiment.

An experiment can have only one primary metric but can have multiple guardrail and monitoring metrics.

Suppose the experiment was launched and ran for a week, with 1 million users in each variant. Below is how we would use WISE to analyze the experiment (note that all numbers are placeholders):

```python
from WISE import WISE

# Placeholder data: three variants with 1 million users each.
experimentation_data = {
    "variant_names": ["A", "B", "C"],
    "metrics": [
        {
            # Primary metric: the one the experiment aims to improve.
            "metric_name": "redirector rate",
            "metric_category": "primary",
            "model": "beta_binomial",
            "priors": {"success_rate": 0.5, "confidence": 0.01},
            "aggregated_observations": {
                "total_counts": {"A": 1000000, "B": 1000000, "C": 1000000},
                "success_counts": {"A": 200000, "B": 205000, "C": 204000},
            }
        },
        {
            # Guardrail metric: must not get worse while the primary improves.
            "metric_name": "book rate",
            "metric_category": "guardrail",
            "model": "beta_binomial",
            "priors": {"success_rate": 0.5, "confidence": 0.01},
            "aggregated_observations": {
                "total_counts": {"A": 1000000, "B": 1000000, "C": 1000000},
                "success_counts": {"A": 30000, "B": 30100, "C": 28000},
            }
        },
        {
            # Monitoring metric: reported, but not used for the decision.
            "metric_name": "revenue per user",
            "metric_category": "monitoring",
            "model": "hurdle_gamma_exponential",
            "aggregated_observations": {
                "total_counts": {"A": 1000000, "B": 1000000, "C": 1000000},
                "total_sums": {"A": 400000, "B": 400100, "C": 400200},
                "success_counts": {"A": 200000, "B": 201000, "C": 202000}
            }
        }
    ]
}

wise = WISE(experimentation_data=experimentation_data)
wise.run_analysis()
```

As shown above, WISE expects its input as a dictionary. The dictionary contains the variant names; the list of metrics the experimenter cares about, each categorized as primary, guardrail, or monitoring; the Bayesian model used to evaluate each metric; and the aggregated values of the data collected so far. Notice also that priors can be passed for each metric.
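The rules stated earlier lend themselves to a simple pre-flight check on a dictionary shaped like the one above: exactly one primary metric, any number of guardrail and monitoring metrics, and observations for every variant. The helper below is hypothetical and not part of WISE’s API. It only illustrates those rules using the keys shown in the example.

```python
from collections import Counter
from typing import List, Mapping

ALLOWED_CATEGORIES = {"primary", "guardrail", "monitoring"}


def validate_experiment(data: Mapping) -> List[str]:
    """Return a list of problems with a WISE-style input dict (empty if none).

    Hypothetical check based on the rules described in the article: exactly
    one primary metric, only the three known categories, and aggregated
    observations for every variant.
    """
    problems: List[str] = []
    variants = set(data["variant_names"])
    categories = Counter(metric["metric_category"] for metric in data["metrics"])

    unknown = sorted(set(categories) - ALLOWED_CATEGORIES)
    if unknown:
        problems.append(f"unknown metric categories: {unknown}")
    if categories["primary"] != 1:
        problems.append(f"expected exactly one primary metric, found {categories['primary']}")

    for metric in data["metrics"]:
        for field, per_variant in metric["aggregated_observations"].items():
            missing = sorted(variants - set(per_variant))
            if missing:
                problems.append(f"{metric['metric_name']}: {field} has no data for {missing}")
    return problems
```

Running `validate_experiment(experimentation_data)` on the example dictionary would return an empty list.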

Once WISE is instantiated, you can call the following method to display an experimentation scorecard that has test results for each metric:

```python
wise.display_scorecard(html_version=True)
```

Running the above code in a notebook would display a scorecard table as below:

Figure 2: WISE’s scorecard shows the expected value and 90% credible interval for each metric, the relative lift over control, and test results for each variant

WISE also provides the experimenter with a decision on the recommended winning variant to ship, which we can retrieve by running:

decision = wise.decision(verbose=True)
print(decision)

Running the above would print the following:

Figure 3: A WISE decision

In the above example, WISE recommends shipping variant B. As the scorecard in figure 2 shows, both B and C beat control in the primary metric, but C was found to be worse than control in the guardrail metric, whereas B did not hurt the guardrail metric.

Monitoring metrics are not taken into account while making the final recommendation.

One can also visualize the distributions for the estimated values of each metric and relative lift by calling the method:

wise.visualise()

The above would display the following plots:

Figure 4: Plots of posterior probability distributions from WISE

The above plots allow the experimenter to visually study the posterior probability distributions of estimated values and relative lift.

When an experiment is launched and we begin collecting data, WISE can be used to evaluate the experiment. If sufficient data has not yet been collected, WISE’s results will say ‘keep testing’. The experiment can be stopped when sufficient data has been collected and WISE is able to provide a recommendation on the variant to ship.

In the next sections, we deep dive into how WISE works under the hood.

How WISE estimates values of metrics

An AB experiment typically runs for a short duration, during which a sample of users is allocated to each variant. Based on the measurements of metrics within these samples, we aim to make inferences about the broader population parameters. To do so, WISE uses the Bayesian approach: we model our belief about the underlying parameter values as probability distributions, which we update as we observe new data from the experiment.

Building a product involves making decisions under uncertainty. Being Bayesian allows us to quantify and communicate the uncertainty in our estimates to business stakeholders by reporting credible intervals, which are more intuitive than confidence intervals.

At present, WISE implements the following Bayesian models to estimate the value of parameters:

For conversion metrics, the beta-binomial model
For revenue per user metrics, a hurdle gamma-exponential model

The beta-binomial model

Let us suppose we run an experiment, with three variants A, B, and C, and we are interested in estimating the conversion rate of each variant.

For a given variant, let the probability of a user converting be θ

Suppose we were to run the experiment for a few days, exposing N users to the variant. Let Y represent the count of converted users in the variant. Statistically, Y follows a Binomial distribution, expressed as

Y | θ ∼ Binomial(N, θ)

We choose to model the conversion rate θ ∈ [0, 1] as a Beta distribution

θ ∼ Beta(α, β)

The Beta distribution is a convenient prior for θ because it’s the conjugate prior to the Binomial likelihood, which means the posterior distribution will also be a Beta distribution (source).

Thus, if we ran an experiment exposing a variant to N = n users, of which Y = y users were observed to convert, we can calculate the updated posterior distribution using Bayes’ rule as

θ | (Y = y) ∼ Beta(α + y, β + n − y)

WISE allows the experimenter to set prior values for α and β based on past knowledge, or it will use uninformative priors of α = 1 and β = 1 by default.
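The conjugate update above is just arithmetic on α and β, so it can be sketched in a few lines. This is a minimal illustration, not WISE's code: the `BetaPosterior` class and `update_beta_binomial` function are hypothetical names, and the default α = β = 1 mirrors the uninformative prior described above.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaPosterior:
    alpha: float
    beta: float

    @property
    def mean(self) -> float:
        # Expected conversion rate under Beta(alpha, beta)
        return self.alpha / (self.alpha + self.beta)

    @property
    def sd(self) -> float:
        # Standard deviation of Beta(alpha, beta); it shrinks as data accumulates
        a, b = self.alpha, self.beta
        return math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))


def update_beta_binomial(n: int, y: int, alpha: float = 1.0, beta: float = 1.0) -> BetaPosterior:
    """Hypothetical helper: Beta(alpha, beta) prior + y conversions out of n users
    gives the posterior Beta(alpha + y, beta + n - y)."""
    if not 0 <= y <= n:
        raise ValueError("need 0 <= y <= n")
    return BetaPosterior(alpha + y, beta + n - y)


# 30 conversions among 1,000 users, starting from the uniform Beta(1, 1) prior
posterior = update_beta_binomial(n=1_000, y=30)
print(posterior.alpha, posterior.beta, round(posterior.mean, 4))
```

Comparing `update_beta_binomial(100, 20).sd` with `update_beta_binomial(10_000, 2_000).sd` shows the narrowing of the posterior that the next paragraph and figure 5 describe.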

A Bayesian views probabilities as degrees of belief, which are updated based on new data. The probability distribution of θ represents our belief of what the conversion rate of a variant is. As the experiment progresses, we use the above formula to obtain the posterior distribution of θ separately in each variant, thereby updating our belief about the true conversion rates in variants A, B, and C.

Figure 5: As the experiment progresses, we update the beta distribution that represents our belief of the conversion rate. Notice that as we collect more data, the distributions get narrower, reflecting a reduction in uncertainty

The revenue per user model (hurdle gamma-exponential model)

For modeling revenue per user, we use the approach outlined by Chris Stucchio.

Revenue per user data at Skyscanner is characterized by the following:

It is zero-inflated, as not every user will exit our site to book and thereby produce referral revenue
Of the non-zero values, the revenue per user distribution is highly right-skewed

To model the data generation process of revenue per user, we set up the below Bayesian model:

Let αᵢ ∈ {0,1} be a Bernoulli random variable that refers to whether a user i converts to produce revenue or not, with conversion probability θ

αᵢ ∼ Bernoulli(θ)

Let rᵢ ∈ (0, ∞) be an exponential random variable that represents the value of the revenue if the user does produce revenue, with rate parameter 𝜆:

rᵢ ∼ Expon(𝜆)

Under this model, the revenue generated by each user i is a random variable vᵢ modeled as:

vᵢ = αᵢ · rᵢ

We assume a beta prior for the conversion probability θ ∈ [0, 1]

θ ∼ Beta(α₁, β₁)

and a gamma prior for the rate parameter 𝜆 ∈ (0, ∞)

𝜆 ∼ 𝐺𝑎𝑚𝑚𝑎(α₂, β₂)

The beta distribution is the conjugate prior to the Bernoulli distribution, and the gamma distribution is a conjugate prior to the exponential distribution.

Suppose a variant was exposed to n users, of whom c were observed to convert (generate non-zero revenue), and suppose the observed mean revenue per converted user is s.

It can be shown that posterior distributions are given by:

θ ∼ Beta(α₁ + c, β₁ + n − c) (source)

𝜆 ∼ 𝐺𝑎𝑚𝑚𝑎(α₂ + c, β₂ + cs) (source)

As the expected value (mean) of an exponential distribution is the inverse of its rate parameter, we can estimate mean revenue per converted user as follows:

mean revenue per converted user ∼ 1 / 𝜆

Further, we can estimate the overall mean revenue per user µ (including both users who did and did not convert) as follows:

µ ∼ θ / 𝜆

µ is a probability distribution that represents our belief about the mean revenue per user. As the experiment progresses, we update the parameters of the distributions of θ and 𝜆, and subsequently, µ for each variant, to obtain its posterior distribution. This represents our updated belief about the true mean revenue per user in that variant. We do this for each of the three variants A, B, and C separately.
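Because µ = θ / 𝜆 has no convenient closed form, one simple way to obtain its posterior is Monte Carlo: draw θ and 𝜆 from their conjugate posteriors and divide. The sketch below assumes the posteriors given above; the function name and the default prior values (all set to 1) are illustrative assumptions, not WISE's defaults. Note that Python's `random.gammavariate` is parameterised by shape and *scale*, so the posterior rate β₂ + cs is passed as its reciprocal.

```python
import random
import statistics


def sample_revenue_per_user(n, c, s, alpha1=1.0, beta1=1.0, alpha2=1.0, beta2=1.0,
                            draws=20_000, seed=0):
    """Hypothetical helper: Monte Carlo draws of mu = theta / lambda, using
    theta ~ Beta(alpha1 + c, beta1 + n - c) and lambda ~ Gamma(alpha2 + c, rate=beta2 + c*s)."""
    rng = random.Random(seed)
    samples = []
    for _ in range(draws):
        theta = rng.betavariate(alpha1 + c, beta1 + n - c)
        # gammavariate(shape, scale): scale is the reciprocal of the rate
        lam = rng.gammavariate(alpha2 + c, 1.0 / (beta2 + c * s))
        samples.append(theta / lam)
    return samples


# 1,000,000 users, 200,000 converters, mean revenue of 2.0 per converter
mu_samples = sample_revenue_per_user(n=1_000_000, c=200_000, s=2.0)
print(round(statistics.fmean(mu_samples), 3))
```

With these inputs the draws centre on roughly 0.2 × 2.0 = 0.4 revenue per user, as expected from θ ≈ c / n and 1/𝜆 ≈ s.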

How WISE evaluates variants by each metric

As described above, we use Bayesian models to estimate the posterior distribution of each metric for each variant.

For each metric, we estimate the posterior distribution of the relative lift of each variant over control as follows:

Draw a random sample a₁ from the posterior distribution of the control variant A
Draw a random sample b₁ from the posterior distribution of the test variant B
Calculate relative lift l₁= ( b₁- a₁) / a₁
Repeat the above 3 steps thousands of times, generating samples l₁, l₂, l₃ … , which represent samples from the estimated posterior distribution of relative lift of B over A

WISE uses NumPy arrays to vectorize the above operations. It then evaluates the 90% highest density interval (HDI) of the relative lift to provide a result. In simple terms, the 90% HDI is the narrowest range of values that contains 90% of the probability mass of the distribution; every value inside it is more probable than any value outside it.
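A NumPy sketch of the sampling procedure and of an HDI estimate might look like the following. The function names are hypothetical, and the HDI uses the common sorted-window method (the narrowest window covering 90% of sorted samples); WISE's exact implementation is not described in this post. The usage draws come from Beta posteriors with counts like the example's first metric and assumed uniform priors.

```python
from typing import Tuple

import numpy as np


def relative_lift_samples(control_draws: np.ndarray, variant_draws: np.ndarray) -> np.ndarray:
    # Vectorized: one lift sample (b - a) / a per pair of posterior draws
    return (variant_draws - control_draws) / control_draws


def hdi(samples: np.ndarray, mass: float = 0.9) -> Tuple[float, float]:
    """Narrowest interval containing `mass` of the samples (sorted-window method)."""
    x = np.sort(samples)
    n = x.size
    k = int(np.ceil(mass * n))            # number of samples the interval must cover
    widths = x[k - 1:] - x[: n - k + 1]   # width of every window of k consecutive samples
    i = int(np.argmin(widths))
    return float(x[i]), float(x[i + k - 1])


rng = np.random.default_rng(0)
a = rng.beta(200_001, 800_001, size=100_000)  # control A: 200,000 / 1,000,000
b = rng.beta(205_001, 795_001, size=100_000)  # variant B: 205,000 / 1,000,000
lower, upper = hdi(relative_lift_samples(a, b))
print(round(lower, 4), round(upper, 4))
```

Here the whole interval lies above 0, around a lift of roughly 2.5%.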

Before providing a result, we need to ensure that sufficient data has been collected to make a decision confidently. To do so, we use a custom stopping rule.

As we run an experiment and collect more data, the uncertainty in the value of the relative lift decreases, and its posterior distribution becomes narrower. Consequently, the width of the 90% HDI, measured as (upper HDI bound − lower HDI bound), decreases.

Through Monte Carlo simulations of thousands of experiments, we determined a set of bespoke target 90% HDI widths to serve as our stopping criterion, which ensured an appropriate balance between the duration of experiments and the correct decision rate.

We arrived at this approach after several iterations, having evaluated various other Bayesian stopping rules, and found that it worked best for our case. When the 90% HDI of the relative lift becomes narrower than the target HDI width, WISE provides a result.

WISE provides the following possible results for each test:

Beats control: If the 90% HDI of relative lift is completely above 0
Worse than control: If the 90% HDI of relative lift is completely below 0
No conclusive difference: If the 90% HDI of relative lift contains 0
Keep testing: If the width of the 90% HDI of relative lift has not reached the target width yet

Figure 6: Tests are evaluated based on the posterior distribution of the relative lift over control
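The stopping rule and the four possible results can be expressed as a small function. This is a sketch: `evaluate_test` is a hypothetical name, and `target_width` stands in for WISE's bespoke target HDI widths, whose actual values are not published here.

```python
from typing import Tuple


def evaluate_test(hdi_bounds: Tuple[float, float], target_width: float) -> str:
    """Hypothetical helper: map the 90% HDI of relative lift to a test result."""
    lower, upper = hdi_bounds
    if upper - lower >= target_width:
        return "keep testing"              # not yet narrower than the target width
    if lower > 0:
        return "beats control"             # interval entirely above 0
    if upper < 0:
        return "worse than control"        # interval entirely below 0
    return "no conclusive difference"      # interval contains 0


print(evaluate_test((0.020, 0.030), target_width=0.02))
```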

As WISE uses the Bayesian approach for parameter estimation, we can provide clear, intuitive results for each test, unlike frequentist null hypothesis significance testing, which yields p-values that require more careful interpretation.

How WISE makes a decision on what variant to ship

To identify the recommended variant to ship, WISE first categorizes each test variant as either a winning or a losing variant.

Winning variant: A variant is considered a winning variant if:

It ‘beats control’ in the primary metric
It is either evaluated as ‘beats control’ or ‘no conclusive difference’ in all guardrail metrics

Losing variant: A variant is considered a losing variant if:

It is found to be ‘worse than control’ or ‘no conclusive difference’ in the primary metric
Or it is found to be ‘worse than control’ in any guardrail metric

For a variant to be considered a winning variant, it must not only beat control in the primary metric, but also not be worse than control in any guardrail metric. This is crucial because Skyscanner is a complex two-sided marketplace, with verticals such as flights, hotels, and car hire, and growth in one area of the product must not come at the cost of another.

As the experiment progresses, and tests meet the stopping rule criteria and get evaluated, variants are added to either of the above two categories.

Recommending a variant to ship

If there are multiple winning variants, the recommended variant to ship is the winning variant with the highest expected relative lift in the primary metric.
If there is only one winning variant, that variant is the recommended variant to ship.
If there are no winning variants, the control variant A is recommended.
If some variants have not yet been categorized as winning or losing, because the stopping rule has not yet been met for some of their metrics, the recommendation is to ‘keep testing’.
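The categorization and recommendation rules above can be sketched as follows. The function names and the input shape are hypothetical, and checking for uncategorized variants before picking a winner is our reading of the rules rather than a documented detail of WISE. Monitoring metrics are simply left out of the input, since they do not affect the decision.

```python
from typing import Dict, List

BEATS, WORSE = "beats control", "worse than control"
NO_DIFF, KEEP = "no conclusive difference", "keep testing"


def categorize(primary: str, guardrails: List[str]) -> str:
    if primary in (WORSE, NO_DIFF) or WORSE in guardrails:
        return "losing"
    if primary == BEATS and all(g in (BEATS, NO_DIFF) for g in guardrails):
        return "winning"
    return "uncategorized"  # at least one test is still at "keep testing"


def recommend(results: Dict[str, dict], control: str = "A") -> str:
    """Hypothetical helper: results maps each test variant to its primary result,
    guardrail results, and expected relative lift in the primary metric."""
    categories = {v: categorize(r["primary"], r["guardrails"]) for v, r in results.items()}
    if "uncategorized" in categories.values():
        return KEEP
    winners = [v for v, cat in categories.items() if cat == "winning"]
    if not winners:
        return control
    return max(winners, key=lambda v: results[v]["primary_lift"])


# Mirrors the worked example: C beats control but hurts the guardrail (lifts are illustrative)
example = {
    "B": {"primary": BEATS, "guardrails": [NO_DIFF], "primary_lift": 0.025},
    "C": {"primary": BEATS, "guardrails": [WORSE], "primary_lift": 0.020},
}
print(recommend(example))  # B
```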

By providing an automated decision on the recommended variant to ship, along with a clear scorecard that makes the reason behind the decision transparent, we have been able to reduce the manual effort needed to analyze experiments and unblock product managers to make decisions on shipping features faster.

From an experimentation platform to a decision making platform

Dr Jekyll, our experimentation platform, was revamped to be powered by WISE. When an experiment is set up on Dr Jekyll, experimenters pre-register their primary and guardrail metrics, and WISE is then used in the back-end to evaluate the experiment and recommend a variant to launch.

Figure 7: A screenshot of an experiment’s results page on Dr Jekyll. Notice that there is a clear decision on the recommended variant to launch provided to the experimenter

By requiring metrics to be categorized as primary, guardrail, or monitoring, we ensure that the goals and success criteria of every experiment are clearly defined, and prevent cherry-picking of results. It also nudges experimenters to think more deeply about the experiment design and the product features they are testing.

WISE has seen widespread adoption across teams at Skyscanner. Since we use the same library to power our experimentation platform and for custom analysis in notebooks, it has helped standardize experiment analysis and product decision-making across the company.

WISE also implements additional tools, such as wise.balanced_allocation_check(), which checks for sample ratio mismatch, and wise.long_term_incremental_revenue(), which provides a simple extrapolated estimate of the long-term revenue impact. Both have also been incorporated into Dr Jekyll.
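The post does not say how wise.balanced_allocation_check() works internally. A common way to detect sample ratio mismatch is a chi-square goodness-of-fit test of the observed allocation counts against the intended split, sketched below with the standard library only. The function names and the 0.001 threshold are assumptions for illustration; the chi-square tail probability uses the recurrence Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) e^(−x/2) / Γ(k/2 + 1).

```python
import math
from typing import Sequence, Tuple


def chi2_sf(x: float, df: int) -> float:
    """P(X > x) for X ~ chi-square(df), built up two degrees of freedom at a time."""
    if df < 1:
        raise ValueError("df must be >= 1")
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        q, k = math.exp(-x / 2), 2              # Q(x; 2) = exp(-x/2)
    else:
        q, k = math.erfc(math.sqrt(x / 2)), 1   # Q(x; 1) = erfc(sqrt(x/2))
    while k < df:
        q += math.exp((k / 2) * math.log(x / 2) - x / 2 - math.lgamma(k / 2 + 1))
        k += 2
    return q


def srm_check(observed: Sequence[int], expected_ratios: Sequence[float],
              alpha: float = 0.001) -> Tuple[float, bool]:
    """Hypothetical SRM check: returns (p_value, mismatch_detected)."""
    total = sum(observed)
    ratio_sum = sum(expected_ratios)
    expected = [total * r / ratio_sum for r in expected_ratios]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p_value = chi2_sf(stat, df=len(observed) - 1)
    return p_value, p_value < alpha


print(srm_check([1_000_000, 1_000_000, 1_000_000], [1, 1, 1]))  # (1.0, False)
```

A very small p-value means the observed split is unlikely under the intended allocation, which usually points to a bug in assignment or logging rather than a real treatment effect.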

Using the Bayesian approach also allows us to monitor and interpret the metrics from an ongoing AB experiment, with reduced risk of issues such as peeking.

If your company is considering transitioning to a Bayesian approach for AB experimentation, it’s important to note that while there are many benefits, there are challenges as well. There is a lack of standardization in Bayesian methodologies, requiring time investment to evaluate various approaches for stopping and decision rules. Additionally, despite Bayesian methods being hailed as more interpretable, confusion can arise due to stakeholders’ reliance on frequentist terminology, requiring efforts to educate and ensure clear communication. Moreover, accommodating new metrics can be time-consuming, as custom Bayesian models tailored to each metric’s data generation process are often required, as compared to traditional frequentist methods where standard techniques can be applied.

Acknowledgments

WISE was conceived, designed, and built by Dhanush Kishore, under the guidance of Jose Parreño. It was further refined and improved by May Alexander and Robert Shepherd. We thank Joan Freire, who led the effort to revamp Dr Jekyll to be powered by WISE, along with Emile Gill, Zoltan Bacskai, Annalisa Magrì, Tiago Salema, Akash Rajput, Lynne Wallace and Nitish Pandey. We would also like to thank Jack Crowson for his work of refactoring WISE to bring it up to engineering production standards. We thank our product manager Hristina Racheva for her efforts in driving a widespread adoption of the revamped Dr Jekyll.

Externally, we are grateful to Chris Stucchio, on whose work our revenue per user model is built, and to John Kruschke for all his work in Bayesian inference, which we heavily relied upon.

What Happened?

Skyscanner Engineering — Tue, 21 Nov 2023 11:36:51 GMT

How recursion brought down flight search at Skyscanner

On 14 September 2023 at 8:05 AM UTC (all timestamps are in UTC), a critical bug in the output of our geo data pipeline resulted in a number of geo locations being set as parents of themselves. This disrupted our service, and for this we’re sorry. It also gave us the opportunity to evaluate what went wrong, what we learned, and how we could prevent a situation like this from happening again.

Let’s take a deep dive and explain things further.

What is Geo Data?

Geo data is a key dataset at Skyscanner, used to provide systems, industry partners and travellers with a complete and accurate representation of the world. In simpler terms, any time you see an airport, city, region or country on Skyscanner, it originates from this dataset. The most visible example is flight search, where you specify an airport, city or country for your journey.

We use the geo data to populate origins/destinations and look for flights

What was the issue?

Skyscanner has been on a journey to upgrade our geo dataset. At this time, there are two versions (two geo models) running in parallel. Flights generally need to know about airports, cities and countries, unlike other parts of the business, where we need to model more complex relations involving districts, countries, islands, etc. These relationships form a complex graph in which locations are related to each other as parents and children.

For this reason, we kept our original “heritage” dataset and merged it with our canonical dataset, our source of truth. Every day at 8am UTC, we then generate (or reconstruct) the heritage dataset from the canonical data. This generation step is referred to as materialisation.

The Materialisation Process

On 14 September, a bug in the materialisation process set some locations as their own parents. For example, Scotland became the parent of Scotland, creating a loop in the geo hierarchy.

You can immediately see how this is a problem when our libraries and systems expect some structure in the parent/children relationships.
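To see why a self-parent is so damaging, consider a simple walk up the hierarchy. The sketch below is illustrative only: the dictionary representation and both function names are hypothetical and are not Skyscanner's geo libraries (which the post implies run on the JVM). The naive walk assumes a tree and never terminates on a loop; the guarded walk remembers visited locations and turns the hang into a fast, visible error.

```python
from typing import Dict, List, Optional


def ancestors_naive(parent_of: Dict[str, Optional[str]], place: str) -> List[str]:
    # Assumes a tree: if a place is its own parent, this loop never ends
    chain = []
    current = parent_of.get(place)
    while current is not None:
        chain.append(current)
        current = parent_of.get(current)
    return chain


def ancestors_safe(parent_of: Dict[str, Optional[str]], place: str) -> List[str]:
    # Tracks visited locations and fails fast if the walk revisits one
    chain = []
    seen = {place}
    current = parent_of.get(place)
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle in geo hierarchy at {current!r}")
        seen.add(current)
        chain.append(current)
        current = parent_of.get(current)
    return chain


healthy = {"Edinburgh": "Scotland", "Scotland": "United Kingdom", "United Kingdom": None}
corrupted = {"Edinburgh": "Scotland", "Scotland": "Scotland"}
print(ancestors_safe(healthy, "Edinburgh"))  # ['Scotland', 'United Kingdom']
```

Calling `ancestors_naive(corrupted, "Edinburgh")` would spin forever, tying up the calling thread, which is the failure mode described in the next section.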

How did this impact flight search?

The flight stack (the critical systems involved in flight search) loads this geo data and uses it for every search coming in from travellers. Because we serve lots of travellers, we scale the stack across multiple Kubernetes clusters in multiple regions, totalling several hundred pods worldwide.

Any time a search involving an affected entity came in, a thread got stuck in an infinite loop searching for the parent entity. This happened in every pod that had loaded the corrupted dataset.

Every request with an affected entity stole one thread from the service

Every stuck thread kept the CPU busy and held on to memory it never released, and the thread itself became unavailable to serve other requests. This led to CPU throttling, which in turn caused our autoscaler to provision more pods to cope with new requests.

CPU Throttling

To make matters worse, an upstream service was using an affected location as part of a warmup request, giving our systems no chance to even start up and serve requests successfully.

All of this created the perfect storm, and the strain was felt throughout our systems, which were either fighting for resources or depending on one of the affected systems.

What was the impact?

Our flight search was degraded for travellers. The error profile looked like this. In total, we served over one million errors to travellers searching for flights.

How did we fix it?

Once the data issue was identified, we were able to output the correct data by re-running the pipeline. The key problem, however, was how long it took us to diagnose the root cause. Let’s talk about that.

It took us a long time to diagnose the root cause …

Bias
We have a bias towards suspecting performance issues, which led us into a range of investigations away from the actual cause.
A particular service was a source of previous incidents and all our efforts were concentrated on this service through the addition of more computing resources to alleviate the pressure.

Telemetry

We relied heavily on existing telemetry (metrics, traces, logs) which is the right thing to do but …
… it can only get you so far when things don’t crash. We ended up attaching a JVM profiler to obtain thread dumps and finally identify the infinite loop!

What did we learn?

Treat data changes like code changes: We did not immediately audit all the data changes which might have affected production. This should have been included in our analysis at step 1.
Beware of recency bias: Our recent incidents had originated from a single service, so our attention immediately focused on recovering that service, which was also in distress.
Practice running large incidents frequently: As our platform has matured, our services have become more resilient, and this reliability has resulted in fewer, less impactful incidents. While our engineers frequently run their own wargames, their scope is often limited by experience and imagination.
Look at all the signals: Thinking about mitigation is correct but don’t increase compute resources without establishing if you’ve experienced a genuine increase in requests.
Recap: For longer-running incidents, take regular time-outs to step back and recap what we’ve learnt.

What it’s really like to head into the virtual office

Skyscanner Engineering — Wed, 09 Aug 2023 12:34:59 GMT

by Anthony Byledbal

In the virtual office — iStock

Note: At Skyscanner, our teams work using a hybrid model. When not in the office, we want to optimise our experience and keep a close connection with our team members.

Video calls are usually meetings: an invitation is sent to participants and, on the scheduled day and time, everyone joins for a limited period. However, in my team at Skyscanner, we have decided to go a step further by introducing an all-day video call. The operating instructions are short: the first person to start their day starts the call; the last person to finish closes it. This online tool has proven very successful with regard to collaboration, training, morale and the overall sense of engagement and positive team spirit. I was not used to working this way, but I very quickly noticed the positive impact it had on me and my development.

Before changing careers from archaeology to IT, I was unfamiliar with the different kinds of online tools used in a working environment. I naively saw them as a sign of poor management: I thought they were a way to track what my colleagues and I were doing. However, my experience of joining this call proved me wrong. Members of my squad created this online headquarters to connect colleagues located in different cities around the United Kingdom, in the hope of improving communication, morale and collaboration within the team.

A Shared Working Environment

The call brings colleagues together. It is our virtual office where we feel less apart from each other. More importantly, this online space facilitates our daily work and knowledge sharing.

It is good to see more experienced colleagues coding. Our team shares screens all day long, which immerses everyone in the code. From new joiners to senior engineers, everyone values this precious time because it allows us to ask questions and raise concerns. Most of the time, this gives us insight into a specific concept or notion. There is always someone happy to answer and share their knowledge. No matter who asks a question, the entire team can follow the discussion and jump in to add more substance.

It is especially good for mob or pair programming. Working together on a task is the perfect way to challenge each other and find answers to a problem. It gives less experienced members of the team a chance to follow the investigation and make sure every step is understood. We often alternate who drives throughout the day. New joiners are reassured to see that even experienced engineers encounter difficulties when writing code and don’t know the answer to every question. It gives them more confidence when it is their turn to drive and teaches them working methods for solving various issues.

It is also a great means of assistance: if anyone has any concerns, help is always available. The method is simple: start sharing your screen and your thoughts, and colleagues will engage to help you and guide you in the right direction. This is definitely better than talking to a dull rubber duck, the debugging method that consists of explaining your code to the classic bathroom toy. If the main room is already busy, people can use the chat to ask a quick question, and if this leads nowhere, we can jump into a breakout room for a more in-depth discussion. Getting early feedback definitely raises the quality of our work.

Social Time In The Multiverse

The virtual office also plays an important social role. It doesn’t have to be just work and industry updates, so the call expands the work environment into different universes:

The casual universe is represented by chatter and banter. With people available on the other side of the screen, it is easy to start a conversation, and this is even encouraged. It creates a relaxed and friendly atmosphere, improves morale and strengthens team bonds. Movies have proven to be the main topic of interest in our space, especially when it comes to ranking the very worst of them. It is such a detailed subject of research that we meet in person every now and then to watch some of them together and discuss their worst parts;
The playful universe is our online social events scene. Events are often organised to celebrate successes and holiday seasons. We play games like “skribbl”, a drawing and guessing game, or “Gartic Phone”, a combination of drawing and the telephone game (in which the first player whispers a phrase into someone’s ear, and each person passes it on until it makes its way back to the original speaker). We have even played funny astronauts on a dangerous mission to find saboteurs aboard a lightspeed-travelling spaceship in the “Among Us” game. As the multiverse is infinite, our team meets in many locations and worlds, and everyone has the opportunity to have fun and relax while playing many different characters.

All these universes help remote work feel less remote. They make colleagues who are further away feel more connected to the rest of the team. Even colleagues from outside the team whom we work closely with jump on and off our call, just because they feel alone at some point during their day and want to chat to somebody.

Full Control Over Online Collaboration

Video calls are interactive live tools by nature. They ensure synchronous interaction at all times and encourage everyone to participate in different ways. An engaged speaker can easily invite participants to ask questions or share opinions. Quick reactions are an easy way to get attention by displaying an easily understood emoji, like a raised hand to ask a question or a slow-down sign if someone starts feeling pressured by a speaker talking too fast. The chat is another option for writing down questions, which can be answered at the time or discussed later. It can also be used to share links or further information without disturbing the main action on screen.

If recorded, our meetings, ceremonies or demonstrations are rewatchable. They are automatically saved for a few weeks before being deleted, but can easily be moved to a more permanent space if needed. This is very useful for colleagues who are absent from the call for any reason or people who want to review the discussions at a later stage and come back to verify points raised or ask some follow up questions. The video call can essentially become a virtual library, which any member of the team can browse to find details on a discussed topic and refer back to when needed.

Joining the virtual office is also easy, as everyone can use any kind of device, from laptop to mobile. This isn’t meant to encourage colleagues to join while commuting or running late to one of our ceremonies; it is more about testing functionality on devices other than a computer or laptop. The video call has the option to share a phone screen, which is useful for showcasing new features or accessibility functionality on mobile web to the rest of the team and collecting feedback.

The online controls therefore don’t restrict our interactions to verbal ones; they provide helpful tools that assist us in our work.

Growing Together

The virtual office is used by other teams at Skyscanner. Nonetheless, it still attracts attention and questions from colleagues who haven’t tried it yet. The main concern is about always having to be on the call, but this is a misconception. It is fine to be off the call: our communication tools are varied enough to reach a colleague working outside our virtual office. It is also fine to simply turn off the camera, since not everyone feels great every day. If so, a funny avatar can easily be used instead.

The immersive call has proven fruitful. It helps our new joiners break down barriers and speeds up their training. More widely, it increases familiarity within the team and encourages involvement. It leads to fewer knowledge silos, as people collaborate on tasks daily. The number of people contributing to any particular task increases, so anyone is able to take over a task if someone is off sick or on annual leave. The virtual office creates an experience where our team listens and participates. It encourages sharing and growing together for better career development.

Building systems at scale: how Skyscanner approaches engineering design reviews

Skyscanner Engineering — Mon, 31 Oct 2022 10:04:46 GMT

By Tom Butterwith, Engineering Manager

Designing systems at scale can be a daunting task at the best of times and when you add in the complexity of managing stakeholders, the whole process can get complicated fast.

One of our engineering principles at Skyscanner is that we peer-review every change and we apply this principle to everything from one-line code changes to system design. We accomplish this by conducting a design review: a process centered around a document capturing the what, why, and how of a problem and its proposed solution.

What are design reviews, and why do we use them?

When designing systems, there are a few challenges we try to overcome with our design review process. The first is developing a shared context. In order to provide appropriate feedback, we need to ensure that everyone reading the document understands the “why” and the “what” of the problem. The “why” can be understood as why we are thinking about this in the first place, and why this work is important to the business. The “what” is the problem’s parameters: what issue are you trying to solve with this design, and what are you explicitly not doing? It’s also important to think deeply about who should be involved in the process and who should be informed of the changes. Often the most difficult part of building systems at scale is aligning your stakeholders, and setting the scene up front in a design document is a great way to bring people along in your process.

Given that we have over 80 engineering teams working on a wide variety of services, a large change such as replacing our web framework of choice naturally affects a wide number of people. The team responsible for this conducts a series of design reviews to outline their thoughts and how this will affect the wider business. This gives each of the teams involved a place to contribute to and challenge the proposal while highlighting any assumptions that might not be correct.

The design review process usually involves a single author but many, varied readers. As such, design review documents are optimised for the reader, with clear section headings and a summary highlighted for each section throughout the document. This way, each person can pinpoint the information most relevant to them and provide feedback on areas within their specialism. For example, all design reviews have a section for security considerations, and our fantastic in-house security team (including Maria) monitors this and provides guidance across the whole business.

To keep the document as readable as possible we use sub-documents to dig into the detail and provide additional information for those that want it. Keeping the detail separate but close to the main document allows us to cover complex topics without overwhelming the reader with too much information at once.

The second challenge we’re trying to solve is working asynchronously with a distributed team. Skyscanner has engineers split across eight offices and each one has a wealth of talent and expertise. We have designed our process to enable engineers to review the document in their own time, ask questions in the form of comments directly onto the document, and digest it at their own pace. We then follow up with a session scheduled at least a week in advance so the author can walk through some of the larger questions and comments on the document. This session is recorded and the link is embedded directly into the document.

Lastly, our design review documents are intended to be living documents. By this we mean they are kept up to date with changes to the system and act as the first port of call for anyone who wants to learn about a service. We have found that an up-to-date set of design reviews for a team’s services is a great starting point for anyone joining a new team at Skyscanner, providing both a technical overview of a service and the context behind its inception.

This has been particularly useful for us during our move to a cell-based architecture, something very new to us. Through a series of design reviews on each part of the system, the teams involved built a large collection of knowledge and context about the system design, which comes in very handy when things don’t work out as planned.

How’s this working out for us?

Design reviews have been a fantastic tool for us: not only for ensuring we’re building the best solutions available to us at the time, but also for compiling a back-catalogue of system designs and solved problems with all of the context behind them. This library of information is so useful when redesigning parts of our system, troubleshooting an issue, or onboarding new engineers to one of our teams. Each reader can see the reasoning that led up to the current system and understand the constraints the system is working under. We’re currently running around 10 design reviews a month, and a quick search of our Confluence shows a count of about 1,350 design review documents: that’s a huge repository of knowledge and a history of Skyscanner.

Recently we’ve started to apply this process to how we think about the data and metrics our systems produce. Rather than emitting as much information as possible and trying to make sense of it once the system is live, we’re trying to be more holistic in our approach and apply the same rigour that we apply to system designs. Although it’s early days, we’re already seeing the benefits of having clearly mapped-out data sets from which we can calculate KPIs consistently.

As a learning org, we’d love to hear how you tackle design reviews in your engineering teams — let us know in the comments.

About the author

Tom Butterwith is an Engineering Manager at Skyscanner. He’s been at Skyscanner for seven years, originally starting as a graduate in 2015. He’s worked all over the business including in sustainability, payments and data. Tom currently leads one of our web teams looking after the Skyscanner Homepage.

Setting your engineers up for success: how Skyscanner created greater clarity in our competencies and pathways to progression

Skyscanner Engineering — Thu, 22 Sep 2022 12:03:23 GMT

By Emanuel Müller Ramos, Skyscanner Engineering Director

For many companies, this time of year is often appraisal or performance review season. Feedback is really important as we each seek to learn, progress and improve generally. However, making sure everyone is rated fairly and consistently can often be tricky if companies don’t have robust and clear frameworks in place. This is something we’ve worked hard to ensure we address here at Skyscanner, and today we’d like to share our journey of making competency frameworks, and performance expectations, really clear within our Engineering org.

Supporting our engineers’ growth is one of the most important responsibilities of our Engineering Managers. Part of this includes helping engineers understand the expectations of their role and level, while also having clarity as to what the next stage of someone’s career might look like. To do so, Skyscanner has a competency framework outlining skills and behaviours for each level, from entry level roles all the way to CEO.

However, while the competency framework was regularly shared and discussed, a few years ago we noticed occasional gaps in understanding. For instance:

Some Engineering Managers had a gut feeling that an engineer had a gap in their performance, but couldn’t relate it to the competency framework expectations
Some Engineering Managers felt uncomfortable delivering feedback because they were not sure whether they were being over-demanding or whether their ask was within the expectations of that level
Some engineers felt frustrated because they were not fully sure how a gap highlighted by their manager connected to the expectations in the competency framework

While cases like this were isolated, leaving it unchecked could have consequences on how we set our engineers up for success in their current role and in their career progression. In response, we decided to make the competency framework simpler to understand and created dedicated training around it.

Making the competency framework second nature for everyone

In the tech industry, most progression frameworks are organised in competency pillars. Examples of pillars commonly seen in the industry are:

Technical skills
Delivery and impact
Culture and behaviours
Scope of the work
etc

Normally, the expectations of these pillars are organised by level, with each level described in isolation:

Competency frameworks normally present the expectations for each role individually, one after the other. But this makes it difficult to identify how delivery expectations differ between levels, something that’s helpful to know when we think about personal progression.

While in many ways presenting the information in this manner is valuable, it brings a few challenges. As the description of each level can take one or more pages, the connection between what changes across competency pillars and levels might get lost. For instance, to clearly understand what changes in the expectations around Technical Skills between a Graduate Engineer and a Senior Engineer, one needs to jump between multiple pages and carefully read their descriptions. When engineers rightly want to understand their progression path to the next level and beyond, that’s not the most helpful way to take this information in. We knew there had to be an easier way to do this.

Easy comparison

As engineers, looking for differences between pieces of code is part of our daily routine. For instance, when reviewing a Pull Request in Git, we’re often looking at what changed from the previous change-set to the new one. Wouldn’t it be great if we could do the same with the competency framework? This is exactly what we did!

We’ve built a tool which we call the ‘Levels Visualisation Guide’. It is essentially a pivot of our competency framework that compares the competency pillars level by level and highlights the differences.

In the same way that GitHub allows you to compare two files side by side, we did the same with the competency framework

Let’s take a concrete look at how this works, considering the Scope competency pillar at Skyscanner:

Example of a side-by-side comparison of the Scope competency pillar. Same colours indicate the expectation is the same between two levels, while different colours highlight differences and unique expectations. The descriptions in the table give further clarification and examples of what the competency framework description means in practice.

Training the whole engineering management community

After the Levels Visualisation Guide was built, the next step was to make sure the message landed and all Engineering Managers were equipped to use it. We devised live training in which the competency framework is presented using this format:

Firstly, one of the Competency Pillars is presented and the differences between each level outlined (e.g. Scope)
Then, the audience is split into breakout groups, and each group is given an example. Using the Levels Visualisation Guide, they need to assess the performance of the engineer in that specific example
After a few minutes, the audience regroups and shares their findings. The exercises usually include some pitfalls, which are then revealed to the audience
The same is repeated with the other competency pillars

Here is a concrete exercise:

Example of a practical exercise. On the left, a hypothetical scenario description for a given engineer. In the middle, the expectations for that level (Software Engineer) considering two Competency Pillars (Scope and Expertise). On the right, a list of common mistakes that engineers and engineering managers fall into when assessing performance.

Results and next steps

Since this initiative was piloted in 2021, training has rolled out to all Engineering Managers. The feedback and benefits outlined are very positive:

During our regular Performance Calibration process, we observed that Engineering Managers are significantly more confident in their assessments, using the Levels Visualisation Guide as an additional tool to help understand performance
Feedback collected after training indicates a 34% improvement in understanding of the competency framework

This feedback also helped to surface areas of confusion and areas lacking clarity in our Competency Framework, which has allowed us to iterate on and improve the training and tools again.

Now, we hope to expand this initiative even further, to the whole engineering community at Skyscanner, whether or not individuals are managers. This means that everyone will have a deeper understanding of roles, expectations and progression pathways. A pilot has already been delivered, and the training is also being made available offline, so engineers can work through it at their own pace if they can’t make the live session.

I like the idea! How can I do the same in my company?

As with any agile project, consider starting small and iterating:

Take a Competency Pillar which is very clear and known in your company. For instance, Expertise/Skills is normally less ambiguous than something such as Impact/Delivery
Attempt to build a side-by-side comparison of each level for this pillar. In the beginning, focus on levels up to Senior Engineer, leaving other levels such as Principal/Staff Engineer for later
Present this comparison to a pilot group to collect early feedback and buy-in
As traction is gained, add more competency pillars, add the remaining levels and start expanding to other tribes
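For teams that want to try the side-by-side comparison in the steps above, a minimal Python sketch of the pivot might look like the following. It is not Skyscanner’s Levels Visualisation Guide: the level names, the single Scope pillar and the expectation wording are hypothetical placeholders, and an expectation is marked “same” only when its wording exactly matches one at the previous level.

```python
# Hypothetical sketch: pivot a level-by-level competency framework into a
# pillar-by-pillar view, marking which expectations carry over from the
# previous level ("same") and which are introduced at this level ("new").
# All level names, pillars and expectation texts below are made up.

FRAMEWORK = {
    "Graduate Engineer": {
        "Scope": ["Delivers well-defined tasks", "Asks for help when blocked"],
    },
    "Software Engineer": {
        "Scope": ["Delivers well-defined tasks", "Owns small features end to end"],
    },
    "Senior Engineer": {
        "Scope": ["Owns small features end to end", "Leads projects across a squad"],
    },
}


def pivot(framework, pillar):
    """Return [(level, [(expectation, status), ...]), ...] for one pillar.

    Levels are taken in dict insertion order (guaranteed since Python 3.7),
    so the framework must list them from most junior to most senior.
    """
    rows = []
    previous = set()
    for level, pillars in framework.items():
        current = pillars.get(pillar, [])
        # Exact text match against the previous level decides "same" vs "new".
        marked = [(e, "same" if e in previous else "new") for e in current]
        rows.append((level, marked))
        previous = set(current)
    return rows


if __name__ == "__main__":
    for level, marked in pivot(FRAMEWORK, "Scope"):
        print(level)
        for expectation, status in marked:
            print(f"  [{status}] {expectation}")
```

Exact string matching keeps the sketch short; a real framework where expectations are reworded from level to level would likely need a hand-maintained mapping between levels instead.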

Let me know in the comments how this went, I would love to hear your feedback!

Emanuel Müller Ramos (aka Manu) is an Engineering Director at Skyscanner. Manu has been with us for five years and leads the Flights Booking Tribe. He spends his time focussing on growing engineering teams and helping the engineering discipline to adopt a metrics-driven culture.

My Career Pivot: From IT recruiter to information security

Skyscanner Engineering — Fri, 24 Jun 2022 08:08:19 GMT

Senior Security Engineer Maria Sepulveda

As we celebrate International Women in Engineering Day this week, we’re profiling female-identifying engineers across our business. Maria Sepulveda is a Senior Security Engineer at Skyscanner. An expert within information security, Maria led the recertification of Skyscanner’s PCI DSS (Payment Card Industry Data Security Standard) compliance. Maria took an unconventional path to information security, starting her career in customer service and IT recruitment. Here, she discusses her journey, imposter syndrome and what her role looks like today.

Maria, as a Senior Security Engineer, what does your role involve?

Well, I only recently joined Skyscanner, so a typical day for me today might look different to a typical day in a couple of months’ time! Having said that, I have already been given responsibility to lead the recertification of our PCI DSS compliance. It’s great that I can be trusted so early on in my Skyscanner journey. My day typically starts with following up on tasks that are due in the coming week. During the day I meet with people and various teams to understand what they do. I’ll also attend internal events, often to better understand the Skyscanner culture and the way things work here, as well as to make connections. Focus Time in the afternoon allows me to re-read my notes and absorb information I obtained during the day. The day might end with a recap with my team, allowing me to ask questions I haven’t already asked or just generally talk about how the day went.

What was your career journey to this point?

I took an unconventional path into information security. Originally from Australia, I worked in customer service roles and IT recruitment. I arrived in London and found a job in an in-house recruitment team for an online betting company. I knew that I wanted to move into a more tech-focussed role but still interface with the business. My colleague who was recruiting for the security team at the time suggested that I apply for a junior role that had opened. The role was supporting the team who managed compliance activities and security awareness. It was the perfect stepping-stone, as I got to shadow the more experienced team members as well as apply my business acumen to the security awareness programme. I also took the opportunity to learn from other members of the security team. I have great memories of my time there, and throughout my career journey I have bumped into people from that first job at various industry events. Never underestimate the power of connections early on in your career!

Why do you think you didn’t go into your current field when you first joined the workforce?

I didn’t know how to get into Security or even that it was a field. Networking opportunities didn’t exist at the time and any that did exist were more of a ‘boys club’. It just didn’t feel like an accessible place for me, or for women generally.

And were there any preconceptions about what you do now that held you back?

I suffered hugely from imposter syndrome. In my first role everyone in the Security team had a security-related degree, came from a bank or police/military background, or had worked in the industry forever. Fortunately, I’m quite stubborn, and so I wanted to prove that I could also be successful.

What advice would you give your younger self — or others considering a career change?

Don’t underestimate the skills you learn and the people you meet from any job that you do, no matter how unrelated they might seem to the career you want to move to.

If you’re considering a change into a male-dominated industry, like engineering, remember that you can bring a fresh perspective. Personally I feel I bring a holistic point of view. I consider the views of different groups and encourage collaboration. This is useful in my role because the same solution can’t always be applied to the same problem in every organisation. We all have our strengths, so play to yours.

International Women in Engineering Day: reflections from our VP of Engineering, Myra Fulton

Skyscanner Engineering — Thu, 23 Jun 2022 08:08:17 GMT

Skyscanner VP of Engineering, Myra Fulton

She is Skyscanner’s most senior female engineer, but Myra Fulton’s career as she knows it almost didn’t happen: a last-minute decision to change her university course from hospitality to engineering set her on a completely different path to the one she’d intended. This International Women in Engineering Day, we sat down with Myra to hear about her journey to the top.

Myra, what led you to engineering originally? Was it always the plan?

Engineering was very much not the plan: I wanted to get into hospitality, and had dreams of running a hotel group. But fate, in the form of an amazing teacher called Mr Haggerty, intervened. He convinced me to take up computer science and basically changed the course of my career. Several years later, he did the same for a fellow engineer here at Skyscanner, Mhairi McClair! So thank you, Mr Haggerty.

The way Mr Haggerty taught computer science brought the subject to life — I loved how you could solve problems in really interesting ways, and doing that with technology was exciting. It suited my analytical brain, so I decided to switch my uni course from Hospitality Management to Computer Studies.

What happened next — what was your journey to Skyscanner?

I graduated with first class honours and spent a year’s placement with IBM. After graduating, I got on the grad programme with a bank and spent nine years there. My career started in a number of first and second line support roles, where I learned the importance of good handover documentation and run-books. I then had the privilege of leading the mortgage development team, where I started to learn the other side of software development. It was during my time there that the bank announced it had undercharged people for mortgages due to a bug in its services. While I wasn’t involved in that bug, it really taught me the fragility of software, the importance of code quality and testing, and the impact that can be made on real people when something goes wrong.

After nine years in the financial sector, I was starting to itch for a new challenge, and happened to spy a job with Skyscanner on LinkedIn. It was a total culture shock. I’d come from a business of 10,000 people. At the time Skyscanner had around 200 members of staff, and the approach couldn’t have been more different. Back in the day, banks, like many other institutions, kind of saw technology as a necessary evil: a cost code the business had to suffer and excuse to shareholders. Here, the business was built on technology, the CEO was the company’s first software engineer, and technology was front and centre of everything. It was (and still is) super people-centric: travellers and staff are key, in a way that feels very different from being one of thousands of employees.

I used to think the bank I worked for really matched my values. Don’t get me wrong, it was an incredible place to learn, full of brilliant people. But I quickly started to appreciate that Skyscanner was aligned to my own personal values, and that was something quite powerful. I think that’s a massive driver of the longevity of my career here, from manager all those years ago, to VP now — it’s always felt like home, and I’ve always felt proud of the decisions we’ve made as a business.

What’s been the most rewarding project or piece of work you’ve delivered in your Skyscanner engineering career?

A few come to mind: a major IT project removing friction for our people internally, where we created a virtual and physical tech bar, created Slack channels and linked service tags, and sped up delivery significantly. It was staffed with engineers of all levels, and issues were answered on average within 60 seconds. I was given autonomy to spot a problem and fix it for our people across global offices.

More recently, I was asked to head up a team solving the problem of how we at Skyscanner approach hybrid working. Not just how we approach it technically, but how we do it from a cultural point of view, and how we do it in a way that protects and enhances our flexibility and enjoyment. On a personal level, that project came at a time of flux: my long-term manager (Peter) was retiring, the pandemic had taken its toll on many of us, and I was questioning whether I had the energy and enthusiasm to take on something so big. In short, the pandemic fatigue was real. Happily, the more I got stuck into that project, the more I understood that I was exactly where I wanted to be. The way we’ve approached hybrid working and returning to work has reaffirmed a lot of what I love about Skyscanner: the autonomy, the trust, the collaboration, and the empathy and care shown to people as individuals.

What piece of advice would you give to your younger self?

Always own your chair. No matter which table you’re at, you’re there for a reason. Own the chair, be comfortable and confident about being in that chair. If someone doesn’t hear you? Repeat yourself, without embarrassment. Make your voice heard (harder to do than say, I know).

I was lucky that our graduate sponsor at the bank was a senior leader in the business. But I know it’s not always the case that people are able to see those who look like them in leadership. Female examples in engineering leadership still aren’t common enough, and that’s even more true for women of colour. As someone who now has a leadership role in the engineering profession, my advice would always be to have courage and build a strong network of people to support you, champion you and encourage you along. It is hard to be one of the few, but things are changing, and as a female engineer you can build those paths and be that role model for future women in our profession.

While the tide is changing, there are still far fewer women in engineering than men. What sort of challenges have you come up against that feel unique to being a woman in a historically male profession?

I’ve definitely had a few — it’d be disingenuous to say otherwise.

One instance that probably stands out most clearly for me was when I was sat in an engineering meeting, the only woman at the table. It occurred to me within that meeting that some of these people didn’t know how to interact with me because of my gender. I had a strong sense of not belonging. It became clear to me that if I was going to continue in this team, I was going to have to build allies. I realised that while not everyone was going to interact with me in the same way, I had the power to slowly change their perceptions of me by building allies across the group. I started to build personal and professional relationships in order to better understand each person, and with that came mutual respect. It was hard, but it worked.

What books, podcasts or other learning resources do you recommend for women working in tech?

I’m not a big podcast listener, but I have a few books I keep on my desk and come back to frequently:

Daniel Pink’s Drive
Sophie Devonshire’s Lead at Speed
Viv Groskop’s How to Own the Room (there’s also a podcast version)
Spencer Johnson’s Who Moved My Cheese (so simple — I have it on audiobook and listen to it regularly on my commute to refocus)
Richard P. Rumelt’s The Crux: How Leaders Become Strategists
Harvard Business Review’s Women at Work has great content

I’d love to hear from anyone reading this as to their favourite networks for women working in the tech sector — please do comment!

My Career Pivot: From Retail to Engineering

Skyscanner Engineering — Wed, 22 Jun 2022 09:59:02 GMT

Louise Reid, Software Engineer

As we celebrate International Women in Engineering Day this week, we’re profiling female-identifying engineers across Skyscanner’s business. Here, Louise Reid tells us how she took the leap from retail to software engineering, what her day to day looks like, her advice for anyone looking to career change, and how the perception of engineering being male-dominated is changing.

Louise, you didn’t set out to be an engineer — what was the path that took you from retail to where you are now?

I didn’t: I have an honours degree in Business and Management. Honestly, going through school, the concept of working as a software developer was so far off my radar. During my degree I worked in hospitality and really enjoyed it, so for a while I saw myself working in hospitality management in some shape or form.

After uni I travelled in Australia for a bit and then went back to my role in retail with a large well-known shoe retailer, before moving to head office there and working in system support for the stores. At 29 I found myself in a job that didn’t excite me and wasn’t really going anywhere. I was really good at my job, but there wasn’t much transferable knowledge I could take from it into other roles, as a lot of it was in-house-built systems. I was also at a level where any progression relied on someone above me leaving or the company expanding in some way. It wasn’t until I started working with developers there that I started to think of software engineering as a path I could take too.

I dipped my toe in the water a little, doing some free online courses with Code Academy, which I really enjoyed. I was then pointed in the direction of CodeClan by my line manager and introduced to their 16 week software development bootcamp. After completing the course I still had a sense of “I don’t know what I want to do” and whilst I got offered a few development roles, none of them really excited me. I ended up staying at CodeClan as a classroom assistant, working up to instructor for just short of two years. In that time I realised my passion was in front end development — having a better idea then of where my interest lay, I took the dive into the industry and after a couple of years finding my feet I ended up joining Skyscanner. My previous roles gave me a good understanding of what sort of environment I was looking for, and something that was really important to me was collaboration. There’s a misconception that software developers sit with their headphones on all the time and don’t talk to each other but that’s not the case. I’d read some blog pieces of what it was like to work for Skyscanner and all of them talked of the great culture. Something else that drew me to the company was their recent partnership with CodeClan. I liked that they were investing the time to help career changers into the industry.

What does your day to day involve now?

Now, I’m a software engineer within Shuttle Squad, working within Search Experience. A typical day might include a morning check-in with the team (where we’ll also share our Wordle scores that day), then either picking up work in progress from the previous day or looking at the board for a new ticket. We’ll generally discuss recently completed, blocked and in-progress tickets with the whole team at the morning stand-up, and the rest of the day is spent continuing sprint work (whether that’s alone, paired or mobbing). I might also try to catch up on recordings of knowledge sharing sessions that interest me.

The day typically ends with another quick social catch-up for anyone who wants to take part: a really nice way of bedding into a team and getting to know people, especially in a hybrid working environment.

What stereotype about engineers do you think needs busting?

I think software developers have in the past had a reputation of people who put their headphones on and sit away in a corner not talking to anyone. That’s definitely not true: we do so much collaborative work. The industry moves so fast that you can never stop being open to learn new things or ways of working. And working with others, both junior and senior, helps you do that.

Were there any preconceptions about what you do now that held you back from going in?

Yes, definitely: at school this area was seen as ‘for boys’. I think my Higher computing class at school had five or six people in it, and they were all male. I think also at school, when you’re trying so hard to be accepted, you want to do what the “cool” kids are doing, and computer science wasn’t that. Looking back, that is such a warped view, and if I’m a geek now, so be it!

What advice would you give your younger self, or others considering a career change?

You’re never too old to change your career. While working at CodeClan I taught people who were career changers in their 50s. Also, what you pick as your subjects at school are likely to have no impact on what you do as an adult. I can remember thinking when I was 15/16 that the subjects I was picking were going to be so important to how my life was going to shape out. I can safely say they were definitely not. My advice would be pick the subjects you enjoy and are good at, the rest will work itself out.

Engineering is still an industry dominated by men. Is that changing?

It’s still dominated by men, yes, but it’s changing: programmes like CodeClan and Code First Girls, both of which we partner with at Skyscanner, are helping change that. In my team we have three female developers out of 11, which still seems like a massive difference, but when you compare it to the zero females in the Higher computer science class in my year at school, it’s a big improvement. Any form of diversity can only be good for teams: different perspectives and ways of thinking will only make engineering teams stronger.

Can you see yourself pivoting again and changing careers in the future? If so, why, and what would you do?

Yes, it’s definitely something I’d consider again. Potentially into engineering management or even into a product-based role. It feels like that is a natural progression path of developers after they have been writing code for x amount of years, and I’m always open to learning and growing in whatever way I can.