<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:cc="http://cyber.law.harvard.edu/rss/creativeCommonsRssModule.html">
    <channel>
        <title><![CDATA[StackAdapt Tech Blog - Medium]]></title>
        <description><![CDATA[StackAdapt is a leader in AdTech handling billions of ad requests every day. This blog shares the experiences of their technology team in building a scalable, high performance DSP with world class user experience and AI. - Medium]]></description>
        <link>https://stackadapt.tech?source=rss----20d84872b28e---4</link>
        <image>
            <url>https://cdn-images-1.medium.com/proxy/1*TGH72Nnw24QL3iV9IOm4VA.png</url>
            <title>StackAdapt Tech Blog - Medium</title>
            <link>https://stackadapt.tech?source=rss----20d84872b28e---4</link>
        </image>
        <generator>Medium</generator>
        <lastBuildDate>Sun, 12 Apr 2026 16:53:46 GMT</lastBuildDate>
        <atom:link href="https://stackadapt.tech/feed" rel="self" type="application/rss+xml"/>
        <webMaster><![CDATA[yourfriends@medium.com]]></webMaster>
        <atom:link href="http://medium.superfeedr.com" rel="hub"/>
        <item>
            <title><![CDATA[StackHack: Powering Bottom-Up Innovation]]></title>
            <link>https://stackadapt.tech/stackhack-powering-bottom-up-innovation-9e93dddb9c8b?source=rss----20d84872b28e---4</link>
            <guid isPermaLink="false">https://medium.com/p/9e93dddb9c8b</guid>
            <category><![CDATA[software-engineering]]></category>
            <category><![CDATA[hackathons]]></category>
            <category><![CDATA[stackadapt]]></category>
            <category><![CDATA[engineering]]></category>
            <category><![CDATA[artificial-intelligence]]></category>
            <dc:creator><![CDATA[Julia Ren]]></dc:creator>
            <pubDate>Tue, 03 Feb 2026 01:20:12 GMT</pubDate>
            <atom:updated>2026-02-03T01:20:11.345Z</atom:updated>
            <content:encoded><![CDATA[<p><em>This article was co-authored by Julia Ren and Max Woghiren.</em></p><h3>StackHack</h3><p>Since 2022, StackAdapt has hosted our annual internal hackathon, StackHack, during our December code freeze. StackHack has become a massive driver for bottom-up innovation and cross-team collaboration, and it has emerged as a marquee event featuring prizes, awards, and swag, and a judging panel including our CEO, Vitaly Pecherskiy, CTO, Yang Han, and Engineering SVP, Tishan Mills.</p><p>StackAdapt is a remote-first company, so StackHack makes sure to accommodate all of our employees, but for those near our hub in Toronto, the event culminates in a full-blown closing ceremony and on-site social!</p><h3>More Than Just a Fun Event</h3><p>StackHack is more than just a fun event. For one, the hackathon strongly embodies our <a href="https://www.stackadapt.com/careers#site-values-anchor-link">recently-updated values</a>. For instance, we get a chance to <strong>walk in our customer’s shoes</strong>, with many of our ideas focused directly on addressing our users’ pain points. We <strong>operate like underdogs</strong> during StackHack, with humility and ambition. Many of our hackathon teams bring together StackAdapters from a variety of teams…<strong>we only win together</strong>!</p><p>Beyond exemplifying our company’s values, StackHack provides opportunities for StackAdapters to explore and sharpen skills they don’t get to exercise regularly. Everyone has an opportunity to take on leadership and product/project management roles, put together mocks and designs, and fire up an IDE.</p><p>StackHack also gives StackAdapters the opportunity to explore emerging technologies. 
2025 was another year of AI, and this buzz inspired some of our hackathon themes (which doubled as award categories), including <strong>Best Asset Auto Creation </strong>and <strong>Best GenAI Integration</strong>.</p><p>Every year, the hackathon inspires a wave of innovation. It shapes our roadmap and empowers StackAdapters across the company to directly and tangibly contribute to the company’s direction. While only selected projects win, it’s always a joy to see how many hacks were approved to be added to our product roadmap for the following year. The momentum is real!</p><figure><img alt="StackHack award ceremony for in-person participants." src="https://cdn-images-1.medium.com/max/1024/1*0kFPBHfi7f-1kL44vmtlJA.jpeg" /><figcaption>StackHack award ceremony for in-person participants.</figcaption></figure><h3>Looking Back at StackHack 2024</h3><p>In 2024, we had <strong>10 projects</strong> begin official development, many of which were added to our product and released. <strong>Transforming Ad Targeting with AI-Powered Precision</strong> is a shipped project that uses Generative AI to supercharge polygon targeting, a type of precise geographic targeting. Here is what the project team (kudos to Aamir, Hassaan, Mavelyn, and Ros!) pitched:</p><blockquote>Imagine transforming a request like “Target all independent coffee shops in Toronto’s Entertainment District within 2.5km of Rogers Centre, excluding major chains like Starbucks” into optimized polygon clusters in seconds. Using generative AI and machine learning, we will create a precise and scalable solution tailored to the platform.</blockquote><p>Not all hackathon projects need to be complex. Sometimes, huge impact comes from small changes. One of our sharpest engineers, Yuya, noticed the StackAdapt loading spinner was a legacy <strong>GIF</strong> contributing to <strong>~95% of our index.html </strong>file size, and to top it off, this file cannot be cached locally. 
Yuya took care of this by redesigning the spinner using nothing but beautiful, light-as-a-feather <strong>CSS</strong>!</p><h3>A Look at StackHack 2025</h3><p>Following StackHack 2025, <strong>14 projects</strong> were promptly green-lit and added to our 2026 roadmap.</p><p>Many of these ideas align with our emphasis on responsibly leveraging AI to better serve our users. One of the award-winning projects, <strong>Creative Intelligence</strong>, explored the application of AI to the creation and curation of the ads themselves. Incorporating this into our core offering has massive potential to improve our users’ results. Given the fast-moving nature of AI, it is essential to build proofs of concept and swiftly integrate them into our current product offerings.</p><p>As with 2024, however, we had plenty of high-impact projects that weren’t necessarily complex. One award-winning hack explored upgrading TypeScript across our codebase, outlining the difficulty and measuring the impact. Perhaps less glamorous than AI, but exploring the nuts and bolts of our coding environments is hugely beneficial, and we’re proud to recognize these kinds of projects in our judging.</p><figure><img alt="StackHack opening ceremony for in-person participants." src="https://cdn-images-1.medium.com/max/1024/1*pIjLvEgeHRg_uTIBi9C_EA.jpeg" /><figcaption>StackHack opening ceremony for in-person participants.</figcaption></figure><h3>Lessons Learned</h3><p>We’re fortunate that StackAdapt champions bottom-up innovation and advocates for company-wide hackathons. Of course, we didn’t nail everything on the first try. Iteration isn’t just for software; it’s for events too. Here are some valuable lessons we’ve learned over the years.</p><h4>Broaden the scope</h4><p>We moved from narrow, platform engineering-specific categories to open-ended themes. 
This encouraged participants from all departments to find their own problems to solve, leading to more diverse innovation.</p><h4>Provide access to necessary resources</h4><p>An engineer is only as good as their tools! We provide temporary elevated AWS access, API keys, and OpenAI tokens upfront, ensuring no one is blocked by permissions. We also learned that cleanup is vital. To avoid any unwelcome billing surprises or security risks after the hackathon, we make sure to deactivate any temporary resources allocated during the event.</p><h4>Be flexible</h4><p>To accommodate 30+ teams and remote participants, we introduced the option to pre-record presentations or present live. This kept the last day on schedule and better accommodated different presentation styles, all while giving everyone a moment in the spotlight!</p><figure><img alt="StackHack presentation day for in-person participants." src="https://cdn-images-1.medium.com/max/1024/1*_EeVYGiRx1Ibf-S8yoXTow.jpeg" /><figcaption>StackHack presentation day for in-person participants.</figcaption></figure><h3>Raising the Bar: StackHack 2026 and Beyond</h3><p>This was StackHack’s fourth year, and it’s been a great opportunity for us to live out our values, try out new roles in the development cycle, explore new tech, and actively contribute to our company roadmap. It has become a cornerstone of development at StackAdapt, and it continues to grow and improve. We’re already excited to <strong>raise the bar</strong> for a bigger and better next edition!</p><p>Ready to turn your passion project into a paid reality? Want to have some productive fun with your fellow coworkers near the holiday season? 
If you have innovative ideas and the skills to execute them, StackAdapt is the place for you.</p><p><a href="https://www.stackadapt.com/careers/engineering"><em>Apply today</em></a><em> and we’ll see you at StackHack 2026!</em></p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=9e93dddb9c8b" width="1" height="1" alt=""><hr><p><a href="https://stackadapt.tech/stackhack-powering-bottom-up-innovation-9e93dddb9c8b">StackHack: Powering Bottom-Up Innovation</a> was originally published in <a href="https://stackadapt.tech">StackAdapt Tech Blog</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[GenAI for Ad Creatives]]></title>
            <link>https://stackadapt.tech/genai-for-ad-creatives-84127a997ba0?source=rss----20d84872b28e---4</link>
            <guid isPermaLink="false">https://medium.com/p/84127a997ba0</guid>
            <category><![CDATA[ad-tech-industry]]></category>
            <category><![CDATA[video-diffusion-models]]></category>
            <category><![CDATA[generative-ai-tools]]></category>
            <category><![CDATA[video-ad-creatives]]></category>
            <category><![CDATA[generative-ai-use-cases]]></category>
            <dc:creator><![CDATA[Abhishek Tanpure]]></dc:creator>
            <pubDate>Thu, 16 Jan 2025 18:48:54 GMT</pubDate>
            <atom:updated>2025-01-16T18:48:54.800Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*TOu0-epCjthgCB2xRrmwYw.png" /></figure><p>The advertising landscape is undergoing a remarkable transformation, largely driven by advancements in generative AI (GenAI). By automating creative tasks, optimizing performance, and enabling hyper-personalization, GenAI is helping brands produce captivating, cost-effective campaigns at scale. As brands and marketers increasingly adopt these technologies, they are discovering new ways to engage consumers and enhance their creative processes. In this blog, we delve into the latest technical advancements and how they’re revolutionizing video ad production.</p><h3>Generative AI in Advertising</h3><p>Generative AI refers to a subset of artificial intelligence that focuses on creating new content. This can include text, images, videos, and more. By leveraging large datasets and sophisticated algorithms, GenAI can generate content that is not only high-quality but also tailored to specific audiences. This capability is particularly valuable in advertising, where personalized and engaging content can significantly impact campaign success.</p><p>One of the key benefits of GenAI is its ability to automate the creative process. Traditional ad creation can be time-consuming and resource-intensive, requiring input from multiple creative professionals. GenAI streamlines this process by generating content quickly and efficiently, allowing brands to produce more campaigns in less time. Additionally, GenAI can optimize ad performance by analyzing data and making real-time adjustments to improve engagement and conversion rates.</p><h3>Diffusion models</h3><p>Diffusion models are a type of generative model known for their ability to create high-quality visual and video content. These models are particularly useful in video ad production, where realism and visual appeal are crucial. 
The core idea behind diffusion models is to start with simple, easily generated data, often random noise, and gradually refine it into complex, realistic outputs. This process involves a series of steps that simulate the evolution of data, allowing the model to generate new samples that closely resemble the original training data.</p><h4>1. Overview of Diffusion Models</h4><p>Diffusion models are based on a process of iteratively denoising data. This involves:</p><ul><li>Forward Diffusion Process: In this step, Gaussian noise is progressively added to data (e.g., an image) over multiple steps until the data becomes completely noisy and unrecognizable.</li><li>Reverse Diffusion Process: Starting from the noisy data, the model learns to reverse the noise step-by-step and reconstruct the original data distribution. This is done using a neural network trained to predict noise at each timestep.</li></ul><p>Compared to traditional generative models like GANs (Generative Adversarial Networks), diffusion models often produce superior image quality and coherence. They are less prone to overfitting due to their structured approach to generating data. This makes them particularly effective for applications where high fidelity and consistency are important.</p><h4>2. Video Diffusion Models</h4><p>Recent advancements have extended diffusion models to video generation. <a href="https://video-diffusion.github.io/">Ho et al.</a> introduced a framework for video generation that extends existing image generation architectures. This architecture allows for both unconditional and conditional video generation, achieving state-of-the-art results in sample quality. This model generates videos in blocks of fixed frames, which can then be combined to create longer sequences.</p><p>The conditioning technique enhances the model’s ability to generate videos based on specific prompts or inputs. 
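The forward and reverse processes described above can be sketched in a few lines. This is an illustrative NumPy toy with a linear noise schedule and a single DDPM-style reverse step, where the true noise stands in for a trained network’s prediction; it is not production model code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Linear noise schedule: beta_t grows from 1e-4 to 0.02 over T steps.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def forward_diffuse(x0, t):
    """Sample x_t ~ q(x_t | x_0): scale the clean data and add Gaussian noise."""
    noise = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * noise
    return x_t, noise

def reverse_step(x_t, t, predicted_noise):
    """One DDPM reverse step: subtract the predicted noise component."""
    coef = betas[t] / np.sqrt(1.0 - alpha_bars[t])
    mean = (x_t - coef * predicted_noise) / np.sqrt(alphas[t])
    if t > 0:  # all but the final step re-inject a small amount of noise
        mean = mean + np.sqrt(betas[t]) * rng.standard_normal(x_t.shape)
    return mean

x0 = rng.standard_normal((8, 8))            # stand-in for an image
x_t, true_noise = forward_diffuse(x0, t=T - 1)
# With a perfect noise prediction, one reverse step moves x_t back toward x_0;
# a trained model would supply predicted_noise instead of true_noise.
x_prev = reverse_step(x_t, T - 1, true_noise)
```

By the final timestep the cumulative product of the alphas is tiny, so x_t is almost pure noise, which is exactly the property the reverse process learns to undo.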
This means that the model can create videos that follow a particular storyline or respond to user-defined criteria, making it highly versatile for various applications, including targeted advertising and personalized content creation. The ability to conditionally generate videos opens up new possibilities for creating customized and engaging video ads that can capture the viewer’s attention more effectively.</p><p>As brands seek to leverage GenAI for video content, significant advancements have emerged. Meta’s introduction of MarDini, a next-generation video diffusion model, exemplifies this trend. MarDini offers advanced capabilities for video generation, including seamless frame interpolation and dynamic scene creation. It specifically uses Masked Autoregressive (MAR) techniques, which allow for flexible handling of various video tasks, such as interpolating frames or converting images into videos.</p><h4>3. Key Innovations in Diffusion Models</h4><p>Recent advancements in diffusion models have made them particularly well-suited for creative tasks like video generation:</p><ul><li>Guided Diffusion: Techniques like <a href="https://openaccess.thecvf.com/content/CVPR2022/papers/Kim_DiffusionCLIP_Text-Guided_Diffusion_Models_for_Robust_Image_Manipulation_CVPR_2022_paper.pdf">CLIP-guided diffusion</a> integrate text prompts or class labels to control the output generation, allowing for highly specific and creative results.</li><li>Improved Noise Schedules: Models like <a href="https://arxiv.org/abs/2010.02502">DDIM</a> (Denoising Diffusion Implicit Models) reduce the number of reverse steps while maintaining quality, speeding up the generation process.</li><li>Temporal Consistency in Video: For video content, diffusion models maintain <a href="https://openaccess.thecvf.com/content/CVPR2024/papers/Zhou_Upscale-A-Video_Temporal-Consistent_Diffusion_Model_for_Real-World_Video_Super-Resolution_CVPR_2024_paper.pdf">temporal coherence across frames</a> by conditioning 
generation on previous frames, ensuring smooth transitions.</li></ul><h4>4. Applications in Video Ad Creatives</h4><ul><li>Video Super-Resolution: Diffusion models upscale low-resolution videos while adding realistic details.</li><li>Video Generation from Text Prompts: Models like RunwayML’s Gen-2 use diffusion principles to generate entire video sequences from simple textual descriptions.</li><li>Style and Content Control: Users can control aesthetics (e.g., color grading or animation styles) by guiding diffusion with reference images or specific attributes.</li></ul><p>Integrating diffusion models into generative AI workflows enhances the overall creative potential of tools used for advertising. Below, we explore broader applications and advancements in generative AI for ad creatives.</p><h3>Personalized and Localized Ad Content</h3><p>AI can adapt video scripts to specific languages and even synchronize localized audio with realistic lip movements using AI video dubbing. This creates tailored ads for global markets without expensive manual work. GenAI tools use demographic data to tweak ad visuals, such as background colors, text placements, and image overlays, ensuring relevance to the target audience.</p><h3>Creative Content Generation with AI</h3><p>Emerging platforms like Runway and Sora can transform simple textual prompts into engaging video sequences, making video creation accessible to everyone. Adobe Premiere Pro integrates AI tools like Pika to extend shot durations and enhance videos with seamless 3D effects, such as parallax movements, perfect for immersive ads.</p><p>Amazon Nova Reel significantly enhances the ability of brands and influencers to create high-quality video content for social media platforms. By leveraging natural language prompts, users can effortlessly generate visually appealing videos that capture audience attention, drive engagement, and increase reach. 
The built-in safety features, such as watermarking and content moderation, ensure responsible use and maintain the integrity of the content shared across social media channels.</p><p>Veo 2 stands out for its ability to generate longer clips of over two minutes while maintaining exceptional quality. Users can easily customize their videos by specifying angles and styles, allowing for a high degree of creative control. With its advanced understanding of cinematography, Veo 2 not only enhances the visual experience but also empowers creators to bring their unique visions to life in ways that were previously unimaginable.</p><h3>Automated Video Resizing and Multi-Platform Adaptation</h3><p>Tools like Meta’s Image Expansion automatically optimize video dimensions for different feeds and screens, saving hours of manual editing. AI-based tools analyze the focal point of a video and ensure key elements remain intact across all formats, maintaining the creative’s integrity while improving usability.</p><h3><strong>Opportunities and Challenges</strong></h3><p>There are, however, notable pitfalls of generative AI that brands must navigate. One significant concern is the lack of clarity around copyright and ownership, which has hindered broader adoption of generative AI in marketing. Many companies worry about potential intellectual property violations when using AI-generated content. To address these issues, platforms like Amazon’s Titan and Adobe’s models emphasize training on licensed data and implementing watermarking solutions, which could provide a pathway to mitigate copyright concerns. Additionally, ethical considerations such as bias and fairness in AI outputs remain critical challenges that need ongoing attention to ensure responsible use in advertising.</p><h3>Conclusion</h3><p>The integration of generative AI, particularly diffusion models, into advertising is transforming the creative landscape. 
By automating and optimizing the creative process, GenAI enables brands to produce high-quality, personalized content at scale. As these technologies continue to evolve, they will unlock new possibilities for engaging consumers and driving campaign success. Brands that embrace these advancements will be well-positioned to lead in the competitive world of digital advertising.</p><p><em>Interested in learning more about working at StackAdapt? Explore our </em><a href="https://go.stackadapt.com/tech-blog-engineering-jobs"><em>Engineering career path</em></a><em>!</em></p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=84127a997ba0" width="1" height="1" alt=""><hr><p><a href="https://stackadapt.tech/genai-for-ad-creatives-84127a997ba0">GenAI for Ad Creatives</a> was originally published in <a href="https://stackadapt.tech">StackAdapt Tech Blog</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[In-app Advertising and SKAdNetwork]]></title>
            <link>https://stackadapt.tech/in-app-advertising-and-skadnetwork-ffb871d231d8?source=rss----20d84872b28e---4</link>
            <guid isPermaLink="false">https://medium.com/p/ffb871d231d8</guid>
            <category><![CDATA[data-privacy]]></category>
            <category><![CDATA[in-app-ads]]></category>
            <category><![CDATA[skadnetwork]]></category>
            <category><![CDATA[programmatic-advertising]]></category>
            <category><![CDATA[mobile-app-marketing]]></category>
            <dc:creator><![CDATA[Frank Yan]]></dc:creator>
            <pubDate>Tue, 27 Jun 2023 20:46:51 GMT</pubDate>
            <atom:updated>2023-06-28T19:36:45.487Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="Image showing crowd attribution for mobile ads" src="https://cdn-images-1.medium.com/max/1024/0*voVLFKKsjyDCHZoQ" /><figcaption>Image showing crowd attribution for mobile ads</figcaption></figure><p>In-app advertising is a revenue stream for mobile apps to leverage their real estate to show ads to their users. In other words, ad buyers pay them for displaying ads within their app.</p><p>Genres such as Gaming, Social, Utility, and Entertainment rely heavily on IAA (In-App Advertising).</p><p>In 2022, users on average spent 5 hours a day on their phones and mobile app ad spend surpassed $336 billion. The trend of consumer time and transactions in apps continues to grow, with projections that the market will reach an <a href="https://www.data.ai/en/insights/market-data/2023-mobile-forecast/">estimated $362 billion in 2023</a>.</p><p>Let’s explore how you can develop a strong capability to execute on app install and in-app conversion campaigns for both Android and iOS. Below, we will go through in-app advertising for iOS and the SKAdNetwork framework.</p><h3>What is SKAdNetwork</h3><p>StoreKit Ad Network, otherwise known as SKAdNetwork or SKAN, is Apple’s API-based, privacy-centric framework for attribution and ad measurement. It provides aggregated ad activity measurements to advertisers with no user-level data.</p><p>Since the iOS 14.5 release, under the App Tracking Transparency (ATT) framework, users have the choice to opt out of any app tracking their user-level data via Apple’s Identifier for Advertisers (IDFA). 
SKAdNetwork became the only way for advertisers to measure the success of ad campaigns while maintaining user privacy.</p><h3>How Does SKAdNetwork Work</h3><p>SKAdNetwork leverages 3 main components:</p><ol><li>Ad network: signs ads and receives install-validation postbacks after ads result in conversions.</li><li>Publishing app: the app that displays the ad.</li><li>Advertised app: the app being advertised.</li></ol><p>There are 2 types of ad engagements:</p><ol><li>Views: ad shown for at least 3 seconds.</li><li>StoreKit renders (a miniature version of the advertised app’s App Store page): an engagement rule controls when the StoreKit page renders, for example when the user clicks on the ad.</li></ol><p>SKAdNetwork Flow:</p><figure><img alt="SKAdNetwork flow diagram" src="https://cdn-images-1.medium.com/max/1024/0*-sEAJd30mKjV2nej" /><figcaption>SKAdNetwork flow diagram</figcaption></figure><p>When an ad is clicked and the store is opened, the publishing app and the ad network provide the store with some basic information such as the ad network, publisher, and campaign ID. The app store will then send a notification of successful conversion to the ad network.</p><p>If the user launches the app within an attribution time-window, the ad impression is eligible for install-attribution postbacks. The attribution time-window can be up to 35 days depending on the ad type. 
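On the advertised app’s side, developers summarize post-install engagement in the postback’s conversion value, a single integer from 0 to 63 (six bits). A hypothetical Python sketch of such an encoding follows; the event names and bit layout are invented for illustration, and in a real app the value would be reported through Apple’s StoreKit API rather than computed like this.

```python
# Hypothetical bit layout for a 6-bit SKAdNetwork conversion value:
# bits 0-2 flag individual post-install events, bits 3-5 hold a revenue bucket.
EVENT_BITS = {
    "completed_onboarding": 0,  # bit 0
    "added_to_cart": 1,         # bit 1
    "made_purchase": 2,         # bit 2
}

def conversion_value(events, revenue_bucket):
    """Pack event flags into the low bits and a 0-7 revenue bucket into bits 3-5."""
    value = 0
    for event in events:
        value |= 1 << EVENT_BITS[event]
    value |= (revenue_bucket & 0b111) << 3
    assert 0 <= value <= 63  # must fit in six bits
    return value

cv = conversion_value({"completed_onboarding", "made_purchase"}, revenue_bucket=2)
```

Because only 64 distinct values exist, advertisers have to decide upfront which handful of signals matter most for a campaign.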
As the user engages with the app, the app updates the conversion value.</p><p>SKAdNetwork postbacks are delayed by a minimum of 24 hours, ensuring the install is not tied to a specific user and thus preserving privacy.</p><h3>Challenges</h3><ol><li><strong>No real ROI: </strong>SKAN provides a very limited set of data, mostly installs, conversion values, and post-install activity.</li><li><strong>Limited granularity: </strong>No device or creative-level data, only allows up to 100 campaigns and six bits of post-install conversion data.</li><li><strong>Hard to optimize campaigns immediately: </strong>The postback delay makes it very hard to make decisions in a short amount of time.</li><li><strong>No re-engagement attributions: </strong>No device or user-level data is collected or shared, thus it’s impossible to target users for re-engagement.</li></ol><h3>SKAdNetwork 4.0 (SKAN 4.0)</h3><p>On October 24, 2022, Apple released the next version of SKAdNetwork (4.0), which introduces significant changes that allow advertisers and ad networks to measure more while maintaining user privacy. At StackAdapt, we are using SKAN 4.0.</p><h3>Here are the benefits of using SKAN 4.0:</h3><ol><li><strong>Three postbacks instead of one: </strong>Advertisers can now receive up to 3 postbacks, each based on a specific activity window (0–2 days, 3–7 days and 8–35 days). This allows advertisers to understand how users engage with their app over time.</li><li><strong>LockWindow: </strong>Developers can lock a measurement window early, stopping conversion measurement in order to receive postbacks sooner.</li><li><strong>Crowd anonymity: </strong>Each postback is assigned a crowd anonymity level: low, medium, or high. 
When crowd anonymity is low, postbacks contain masked conversion values; when it is medium, a coarse value is returned; and a fine-grained conversion value is returned only when crowd anonymity is high.</li><li><strong>Web-to-app support: </strong>In SKAN 4.0, web-to-app attribution for Safari is also supported.</li></ol><p>Image showing crowd anonymity based on campaign, location and placement:</p><figure><img alt="Image showing crowd anonymity based on campaign, location and placement" src="https://cdn-images-1.medium.com/max/1024/0*QDd7bfjHo_BOb_8m" /><figcaption><strong><em>Source: Apple’s </em></strong><a href="https://developer.apple.com/videos/play/wwdc2022/10038"><strong><em>What’s new with SKAdNetwork — WWDC22</em></strong></a><strong><em> video</em></strong></figcaption></figure><h3>Summary</h3><p>As the advertising industry continues to adapt to a privacy-centric reality, it is important for us to understand the benefits and limitations of using different frameworks, and to create a more innovative ecosystem to make the most of them.</p><p><em>Interested in learning more about working at StackAdapt? Explore our </em><a href="https://go.stackadapt.com/tech-blog-engineering-career-path"><em>Engineering career path</em></a><em>!</em></p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=ffb871d231d8" width="1" height="1" alt=""><hr><p><a href="https://stackadapt.tech/in-app-advertising-and-skadnetwork-ffb871d231d8">In-app Advertising and SKAdNetwork</a> was originally published in <a href="https://stackadapt.tech">StackAdapt Tech Blog</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Natural Language Processing in Contextual Advertising]]></title>
            <link>https://stackadapt.tech/natural-language-processing-in-contextual-advertising-63509f982bf0?source=rss----20d84872b28e---4</link>
            <guid isPermaLink="false">https://medium.com/p/63509f982bf0</guid>
            <category><![CDATA[programmatic-advertising]]></category>
            <category><![CDATA[artificial-intelligence]]></category>
            <category><![CDATA[data-science]]></category>
            <category><![CDATA[naturallanguageprocessing]]></category>
            <category><![CDATA[data-scientist]]></category>
            <dc:creator><![CDATA[Panteha Naderian]]></dc:creator>
            <pubDate>Wed, 19 Apr 2023 19:45:02 GMT</pubDate>
            <atom:updated>2023-04-21T12:58:14.770Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="Graphic showing an example of an ad on a mobile phone powered by StackAdapt." src="https://cdn-images-1.medium.com/max/1024/1*sOhonDgIf-T4wUCPfaJDzw.png" /><figcaption>Graphic showing an example of an ad on a mobile phone powered by StackAdapt.</figcaption></figure><p>Contextual advertising models analyze the content on web pages and determine where to place the most suitable and relevant ads on websites. The primary assumption behind contextual advertising is that users consume content on topics they are interested in. For example, if a user is reading about the latest fashion trends in high heel shoes, then it’s likely that they are interested in purchasing a new pair of shoes. <a href="https://www.mediapost.com/publications/article/383059/subterranean-context-shoppers-prefer-ads-related.html?utm_source=newsletter&amp;utm_medium=email&amp;utm_content=headline&amp;utm_campaign=129462&amp;hashid=PFQn8SGnQCKEgKx1XJL6JQ">Studies</a> have shown that people engage far more frequently with ads that appear in relevant contexts. Furthermore, with the rise of privacy concerns around browser cookies, it has become imperative for DSPs to invest in contextual advertising.</p><p>The central technology behind contextual advertising is natural language processing (NLP). 
This technology helps to better model content found on a web page and work with a bidding algorithm to ensure that a DSP wins the auction to place relevant ads in high-quality context.</p><p>At StackAdapt, we regularly explore the latest natural language processing techniques, and with recent advances, including transformers, large pre-trained models, and few-shot learners, the sky’s the limit.</p><p>Below, I explore three NLP publications that can potentially be useful in building contextual targeting models:</p><p><a href="https://aclanthology.org/2022.emnlp-main.619.pdf"><strong>Intriguing Properties of Compression on Multilingual Models</strong></a></p><p>Multilingual models are powerful tools that can analyze and operate across several languages, eliminating the need to train separate models for each language. This approach offers several attractive benefits, including higher performance on low-resource languages, reduced maintenance, and cost savings. This is especially helpful in contextual advertising as it significantly accelerates the process of expanding to new languages and countries.</p><p>This paper investigates the impact of compression and sparsification on multilingual models. As multilingual models grow in size and parameter count, they become increasingly challenging to deploy in resource-constrained environments. Specifically, the authors focus on pruning-based sparsification methods, in which all weights below a pre-specified threshold are eliminated from the model.</p><p>The study experimented with various compression parameters, revealing some interesting insights. First, low-resource languages typically suffer from lower performance with extreme sparsification; however, medium-range compressions may improve their performance. 
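The threshold-based pruning the paper studies can be illustrated with a small NumPy sketch. Magnitude pruning of a single weight matrix is shown here; the layer shape and sparsity level are arbitrary examples, not values from the paper.

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude."""
    threshold = np.quantile(np.abs(weights), sparsity)
    mask = np.abs(weights) >= threshold   # keep only the largest-magnitude weights
    return weights * mask, mask

rng = np.random.default_rng(42)
w = rng.standard_normal((512, 512))       # stand-in for one layer's weight matrix
pruned, mask = magnitude_prune(w, sparsity=0.9)

# Roughly 90% of the entries are now exactly zero; sparse storage formats can
# then shrink the model and speed up inference on supporting hardware.
```

In practice this is applied layer by layer, often followed by a short fine-tuning pass to recover accuracy lost to pruning.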
Second, it’s possible that sparsification can improve robustness by reducing overfitting.</p><p><a href="https://arxiv.org/pdf/2203.15556.pdf"><strong>Training Compute-Optimal Large Language Models</strong></a></p><p>In recent years, researchers have observed promising improvements on a variety of NLP tasks by increasing the size of language models. As a result, ever-larger language models have been trained over the past few years, such as GPT-3 with 175 billion parameters, Gopher with 280 billion parameters, and MT-NLG with 530 billion parameters. A natural progression of this research is to discover methods that make better use of computational resources. Such optimizations can result in cost savings and more effective use of resources in contextual advertising.</p><p>In this paper, the authors aimed to find a compute-optimal language model given a specific resource constraint. Specifically, they sought the number of parameters (N) and the number of tokens (D) that minimize model loss under a pre-specified computational budget. They trained over 400 different models to empirically estimate the optimal values for N and D. Interestingly, they discovered that compute-optimal models should be trained on far more tokens, with far fewer parameters, than current pre-trained models.</p><p>For instance, for the same budget used to train Gopher, the optimal model should have four times more tokens and one quarter the number of parameters. Using these estimates, the authors introduced Chinchilla, trained on 1.4 trillion tokens with 70 billion parameters. This approach not only produced better results, but also reduced inference costs due to the lower number of model parameters.</p><p><a href="https://aclanthology.org/2022.acl-long.220.pdf"><strong>Learned Incremental Representations for Parsing</strong></a></p><p>Syntactic parsing can improve language comprehension by extracting grammatical dependencies in a sentence. 
Specifically, this paper focuses on incremental syntactic parsing, a process where the model processes a sentence word by word, gradually extracting grammatical dependencies and attaching meaning and structure to each word. This contrasts with approaches where the model waits for the entire sentence before beginning its analysis. The authors remind us that this is very similar to how humans comprehend language, processing sentences incrementally rather than waiting for the complete sentence to be spoken.</p><p>The key challenge the paper aims to address is false committing in incremental processing, where the model commits to an incorrect structure at a point of ambiguity, an error that only becomes apparent once the full sentence has been revealed. A simple way to mitigate this problem is beam search, where the model considers multiple plausible analyses simultaneously and selects the most accurate structure once the sentence is complete.</p><p>The authors offer a solution to false committing by training an end-to-end model. The first half of the model is a GPT-2 encoder, followed by a discretization step in which continuous vectors are collapsed into a small set of discrete symbols. The second half is a bidirectional read-out network that reads the discretized symbols and produces the final syntactic structure for the entire sentence.</p><p><strong>Conclusion</strong></p><p>We have explored several recent publications that could potentially be useful in contextual advertising. Multilingual models can assist in analyzing web pages across various countries and languages, compute-optimal models can help us better manage our computational resources, and syntactic parsing can lead to more accurate language understanding.</p><p><em>Interested in learning more about working at StackAdapt? 
Explore our </em><a href="https://go.stackadapt.com/tech-blog-engineering-jobs"><em>Engineering career path</em></a><em>!</em></p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=63509f982bf0" width="1" height="1" alt=""><hr><p><a href="https://stackadapt.tech/natural-language-processing-in-contextual-advertising-63509f982bf0">Natural Language Processing in Contextual Advertising</a> was originally published in <a href="https://stackadapt.tech">StackAdapt Tech Blog</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[4 Requisites For Building Accessible Digital Products]]></title>
            <link>https://stackadapt.tech/4-requisites-for-building-accessible-digital-products-c957787d1e92?source=rss----20d84872b28e---4</link>
            <guid isPermaLink="false">https://medium.com/p/c957787d1e92</guid>
            <category><![CDATA[engineering]]></category>
            <category><![CDATA[accessibility]]></category>
            <category><![CDATA[aria]]></category>
            <category><![CDATA[wcag]]></category>
            <category><![CDATA[design]]></category>
            <dc:creator><![CDATA[Hok Laam Cheng]]></dc:creator>
            <pubDate>Wed, 22 Feb 2023 12:02:37 GMT</pubDate>
            <atom:updated>2023-02-28T22:03:48.419Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="A wireframe for a page with semantic HTML, distinct button states and accessible colours" src="https://cdn-images-1.medium.com/max/1024/0*vLFDJrp0D8Ur0gNk" /><figcaption><em>A wireframe for a page with semantic HTML, distinct button states and accessible colours</em></figcaption></figure><p>It’s common to think about accessibility in the context of physical spaces, but it’s an important aspect of digital spaces too. Building accessible digital products ensures inclusivity, creating an opportunity for these products to reach all users in a fair and beneficial way.</p><p>The best digital products, and the teams that build them, leverage important characteristics that make them truly accessible. In this article, we’re exploring the processes and technical tools that make building an accessible digital product possible.</p><h3>1. Understand Why Accessibility Is Important</h3><p>The first requisite for building accessible digital products is to understand why we care about accessibility.</p><p>Inaccessible design is found all throughout the digital world. For example, you’ll still find digital products on the web that do not support assistive technologies like screen readers and magnification, which make web content more accessible. Accessible design makes a product equally usable by everyone, and in many countries there are <a href="https://www.w3.org/WAI/policies/">digital accessibility laws</a> in place to ensure this.</p><p>Digital accessibility is most often measured using the Web Content Accessibility Guidelines (WCAG). The WCAG defines three levels of conformance: A (lowest), AA, and AAA (highest), with each level requiring a product to meet a set of success criteria, each building upon the previous. 
For example, in order to achieve the AA conformance level, a product must meet all the success criteria in both the A and AA levels.</p><p>According to a 2021 <a href="https://webaim.org/projects/million/2021">report by WebAIM</a>, analysis of the top 1,000,000 home pages showed that 97.4% of analyzed web pages failed to meet one or more of the criteria from the Web Content Accessibility Guidelines (WCAG) 2.0 Level AA, the consensus standard for digital accessibility. The analyzed websites had an average of 51.4 errors per page. Clearly, there is still more work to do.</p><p><em>Note: At the time of writing in February 2023, the published WCAG version is </em><a href="https://www.w3.org/TR/WCAG21/"><em>WCAG 2.1</em></a><em>; however, a new version, </em><a href="https://www.w3.org/TR/WCAG22/"><em>WCAG 2.2</em></a><em>, is in the draft stage and is set to be published later in the year.</em></p><p>The WCAG defines the accessibility goal that a digital product is trying to achieve. Now that there is a goal, how would an organization work towards that goal?</p><h3>2. Get Everyone’s Buy-In</h3><p>The responsibility for an accessible digital product does not lie with a single person, role, or even team, but requires the buy-in of the entire organization, including:</p><ul><li>Leadership</li><li>Product</li><li>Design</li><li>Engineering</li></ul><p>A focus on accessibility starts with leadership. Dedicating resources towards educating an organization about accessibility and hiring people with knowledge in this area is often one of the first steps towards building an accessible digital product. These top-level actions enable the rest of the organization to focus on accessibility.</p><p>It then continues at the product level. 
The<a href="https://info.digital.ai/rs/981-LQX-968/images/AR-SA-2022-16th-Annual-State-Of-Agile-Report.pdf"> 2022 State of Agile report</a> highlights that 80% of respondents are predominantly using agile methodologies (although this statistic should be taken with a grain of salt, as the survey population skews towards agile users).</p><p>As agile methodologies that emphasize continuous iteration become more popular, a focus on accessibility must likewise increasingly be a continual, iterative effort. Not only should a digital product be <strong>built</strong> for accessibility, but it should also be <strong>maintained</strong> for accessibility. It is the responsibility of product and project managers to ensure that accessibility is prioritized on the product roadmap.</p><p>A digital product should also be designed such that it is accessible to all. Designers should treat accessibility as a first-class requirement that is equally as important as usability or aesthetics. In fact, good accessibility often promotes good UX as well. We will cover tools designers can use to ensure accessible designs in the next section.</p><p>Last but not least, the engineers that are bringing the designs to life should not only adhere to the (hopefully) accessible designs, but also proactively diagnose potential accessibility issues. It is also often their responsibility, alongside quality assurance, to test the accessibility of the product, with the aid of automated tools and manual testing.</p><p>Building accessible digital products is impossible without organization-wide buy-in. 
If any one of the leadership, product, design, or engineering teams fails to buy in, then the goal of building and maintaining an accessible product will become incredibly difficult to achieve.</p><p>That being said, the ultimate execution of accessible design and development lies in the hands of designers and engineers, so the next section will cover a range of tools that can be used to ensure a product is accessible during the design, development, and testing phases.</p><h3>3. Use The Tools at Your Disposal</h3><p>When it comes to accessibility, designers and engineers are the hands and the feet that make an accessible digital product a reality. Therefore, it is important for designers and developers to know, understand, and use the tools available to them to ensure that a product is accessible to all.</p><p>Below are several categories of tools that are useful for ensuring accessibility in design and development, along with examples for each category.</p><h4>Contrast Ratio</h4><p>Contrast ratio is the difference in perceived luminance between two colours. Contrast ratios range from 1 to 21, commonly written as 1:1 to 21:1, and are calculated using the following formula:</p><p>(<em>L</em>1 + 0.05) / (<em>L</em>2 + 0.05)</p><p>Where:</p><ul><li>L1 is the <a href="https://www.w3.org/TR/WCAG20/#relativeluminancedef">relative luminance</a> of the lighter of the colours</li><li>L2 is the <a href="https://www.w3.org/TR/WCAG20/#relativeluminancedef">relative luminance</a> of the darker of the colours</li></ul><p><a href="https://www.w3.org/WAI/WCAG21/Understanding/contrast-minimum.html">WCAG success criterion 1.4.3</a> generally requires a contrast ratio of at least 4.5:1.</p><p>Colour contrast checkers are widely available to validate that text and background colours have sufficient contrast so that text is clear and legible to all levels of visual capability. 
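</p><p>The formula above is straightforward to implement. Below is a minimal sketch in Python using the WCAG 2.x relative-luminance definition for sRGB colours (the function names are illustrative, not from any particular library):</p>

```python
def relative_luminance(rgb):
    """WCAG 2.x relative luminance for an sRGB colour given as 0-255 integers."""
    def channel(c):
        c = c / 255
        # Linearize the gamma-encoded sRGB channel value
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4
    r, g, b = (channel(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def contrast_ratio(colour_a, colour_b):
    """(L1 + 0.05) / (L2 + 0.05), where L1 is the lighter colour's luminance."""
    l1, l2 = sorted(
        (relative_luminance(colour_a), relative_luminance(colour_b)), reverse=True
    )
    return (l1 + 0.05) / (l2 + 0.05)

# Black on white yields the maximum possible ratio, 21:1
print(round(contrast_ratio((0, 0, 0), (255, 255, 255)), 1))  # → 21.0
```

<p>Checking a proposed palette against the minimum threshold then reduces to <code>contrast_ratio(text, background) &gt;= 4.5</code>.</p><p>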
They should be used during both design and development processes to validate all text and background colours, as well as adjacent colours.</p><p>Some examples of colour contrast checkers:</p><p><a href="https://webaim.org/resources/contrastchecker/">WebAIM Colour Contrast Checker</a></p><p><a href="https://color.adobe.com/create/color-contrast-analyzer">Adobe Color — Colour Contrast Analyzer Tool</a></p><p><a href="https://snook.ca/technical/colour_contrast/colour.html#fg=33FF33,bg=333333">Colour Contrast Check</a></p><p><a href="https://www.figma.com/community/plugin/748533339900865323/Contrast">Contrast — Figma Colour Contrast Checker Plugin</a></p><h4>Component States</h4><p>Component states communicate the current status of a component and are especially important for interactive components. Some examples of component states include:</p><ul><li>Enabled: A component is interactive.</li><li>Disabled: A component is non-interactive.</li><li>Hover: When the user’s mouse cursor is over a component.</li><li>Focused: When an element has received focus via keyboard or voice.</li><li>Selected: An option is chosen by the user.</li><li>Error: A mistake has occurred due to the user or the system.</li></ul><figure><img alt="Enabled, disabled, hover, and focused states of the button from Halo-UI, StackAdapt’s custom design system" src="https://cdn-images-1.medium.com/max/498/1*4gSe2xz5vQR_ZrkBGAoe5A.png" /><figcaption><em>Enabled, disabled, hover, and focused states of the button from Halo-UI, StackAdapt’s custom design system</em></figcaption></figure><p>There are more states for some components, while others have fewer, so designers should carefully consider the required states for each component in their designs. 
Material UI provides a helpful <a href="https://m2.material.io/design/interaction/states.html#usage">summary of states</a>, outlining each state and its usage.</p><h4>Semantic HTML</h4><p>For developers, the most fundamental tool to ensure accessibility is HTML. HTML is the backbone of <strong>semantics</strong> on the web; in other words, it is crucial in providing meaning to a web page. Using the correct semantic HTML elements in the correct places ensures convenient navigation and meaningful interactions.</p><p>For example, when implementing an interactive action, use the &lt;button&gt; element instead of just a &lt;div&gt;. The advantages of using the &lt;button&gt; element are:</p><ul><li>Buttons have default styling, such as the interactive states mentioned in the previous section.</li><li>Screen readers identify it as a button.</li><li>They are focusable and clickable.</li></ul><p>As long as HTML best practices are followed, many of the accessibility issues that a web application could fall prey to can be addressed. The following are some of the most important best practices when writing HTML:</p><ol><li>Structure headings properly, starting from &lt;h1&gt; and descending to &lt;h6&gt; sequentially.</li><li>Provide alternative text for images.</li><li>Use landmark elements, such as &lt;header&gt;, &lt;main&gt;, and &lt;section&gt;, to provide semantics.</li><li>Use the &lt;label&gt; element to name form fields.</li></ol><h4>ARIA</h4><p>While semantic HTML is the most powerful tool when implementing accessible web applications, <a href="https://www.w3.org/WAI/standards-guidelines/aria/">ARIA</a> can also be a useful tool to use alongside semantic HTML.</p><p>ARIA stands for Accessible Rich Internet Applications, and is a set of attributes and roles that can be used to enhance accessibility on web applications. 
It may be helpful to think of ARIA as CSS for assistive technologies like screen readers.</p><p>Some of ARIA’s features include:</p><ul><li><a href="https://developer.mozilla.org/en-US/docs/Web/Accessibility/ARIA/Roles">Roles</a> — Each element has a default role, e.g. &lt;button&gt; has the button role, &lt;h1&gt; to &lt;h6&gt; have the heading role, and &lt;a&gt; has the link role.</li><li><a href="https://developer.mozilla.org/en-US/docs/Web/Accessibility/ARIA/Attributes/aria-label">Aria-label</a> and <a href="https://developer.mozilla.org/en-US/docs/Web/Accessibility/ARIA/Attributes/aria-labelledby">aria-labelledby</a> provide powerful, but potentially dangerous, overwriting of element names which are read by assistive technologies.</li><li><a href="https://developer.mozilla.org/en-US/docs/Web/Accessibility/ARIA/Attributes/aria-describedby">Aria-describedby</a> provides a mechanism for adding extra context to an element.</li><li><a href="https://developer.mozilla.org/en-US/docs/Web/Accessibility/ARIA/Attributes/aria-activedescendant">Aria-activedescendant</a> is used in composite widgets or elements to indicate the currently active element.</li></ul><p>Though powerful, ARIA should be used sparingly, especially when an accessibility issue can be solved using HTML. The rule of thumb is if an HTML feature can be used to satisfy an accessibility requirement, use HTML instead of adding ARIA.</p><h3>4. Test Your Product</h3><p>Lastly, a product should be tested to validate that it is accessible. Accessibility testing can range from using automated tools to professional 3rd-party audits, and manual and user testing.</p><p>All these options provide feedback in different ways at different times and monetary costs. It is advisable to use a combination of methods when testing for accessibility, so that there is a balance between quick feedback and thorough insights.</p><p>Automated testing tools provide the quickest feedback loop at the cost of inaccuracy. 
Many testing tools exist to test a web application against the WCAG success criteria, often providing feedback in a report consisting of:</p><ul><li>Specific success criteria conformances and violations.</li><li>Suggested steps to take to remedy those violations.</li></ul><p>However, they often provide inaccurate results in the form of false positives, which cause inefficiencies when trying to fix a problem that is not a problem, or false negatives, allowing accessibility issues to slip through. Examples of automated testing tools include <a href="https://developer.chrome.com/docs/lighthouse/overview/">Lighthouse</a> and <a href="https://www.deque.com/axe/">Axe Tools</a>.</p><figure><img alt="The Lighthouse Panel of Chrome DevTools" src="https://cdn-images-1.medium.com/max/1024/0*uYwlw5FGKDuWjdBC" /><figcaption><em>The Lighthouse Panel of Chrome DevTools</em></figcaption></figure><p>A professional 3rd-party audit can provide a similar report that automated testing tools provide, but in a more accurate and detailed manner. However, it is often time consuming and expensive to conduct an audit, making it more suitable for sporadic health checks rather than continuous monitoring and remedying.</p><p>Manual and user testing should be conducted to gather real user feedback that could be missed by automated testing. Product managers, designers and developers should be encouraged to use screen readers to test their own products.</p><p>However, nothing can replace getting real user feedback from people with disabilities who use assistive technologies to consume web content regularly.</p><h3>Start Your Web Accessibility Journey</h3><p>Digital products are not built to be accessible overnight. An accessible digital product requires multiple parties working iteratively over time. 
It starts with understanding the problems that people face when consuming web content, then continues with leadership, product, design, and engineering taking ownership of a product’s accessibility.</p><p>It is then up to design and engineering to use the tools at their disposal to implement and test accessibility. Only then can an organization work towards building and maintaining an accessible digital product.</p><p><em>Interested in learning more about working at StackAdapt? Explore our </em><a href="https://go.stackadapt.com/tech-blog-engineering-jobs"><em>Engineering career path</em></a><em>!</em></p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=c957787d1e92" width="1" height="1" alt=""><hr><p><a href="https://stackadapt.tech/4-requisites-for-building-accessible-digital-products-c957787d1e92">4 Requisites For Building Accessible Digital Products</a> was originally published in <a href="https://stackadapt.tech">StackAdapt Tech Blog</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[How We Manage the StackAdapt UI Design System]]></title>
            <link>https://stackadapt.tech/how-we-manage-the-stackadapt-ui-design-system-6c8edba4326f?source=rss----20d84872b28e---4</link>
            <guid isPermaLink="false">https://medium.com/p/6c8edba4326f</guid>
            <category><![CDATA[ui-design]]></category>
            <category><![CDATA[product-management]]></category>
            <category><![CDATA[design-systems]]></category>
            <category><![CDATA[designer]]></category>
            <category><![CDATA[design-process]]></category>
            <dc:creator><![CDATA[Ellery Yang @ StackAdapt]]></dc:creator>
            <pubDate>Wed, 25 Jan 2023 12:02:31 GMT</pubDate>
            <atom:updated>2023-02-28T22:08:49.244Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="Dark and light blue graphic background overlayed with the StackAdapt logo, and text reading Halo Design System" src="https://cdn-images-1.medium.com/max/1024/0*PG56PsMI2MJHYT4o" /></figure><p><em>This article was co-authored by Ellery Yang and Cloris Qian.</em></p><h3>The Beginning of Halo</h3><p>The year was 2020. StackAdapt was on a fast track of growth in terms of both the size of our team and the complexity of our platform.</p><p>With this growth, a new challenge presented itself in front of us: It became essential for us to build standardized tools and guidance to ensure UI/UX consistency across the platform.</p><p>To address this challenge, we introduced a new StackAdapt UI Design System, codenamed Halo. At its core, Halo was created as a standardized UI component library.</p><p>It is a collection of StackAdapt-branded UI building blocks that feature teams across the company could use to build customer-facing features for the StackAdapt platform.</p><p>Halo started as a small, experimental project. 
In its early days, 3 designers and 1 engineer came together in a casual, ad-hoc way to discuss UI components that might need to be standardized; agreed-upon components were then built by the engineer and added to the Halo library.</p><p>Quickly, this standardization-driven approach gained popularity among feature teams, and increasing usage of the Halo components demanded more resources be directed to the development of Halo.</p><h3>The Challenge of Sustainability</h3><p>As more members joined the team and the Halo library expanded, the ad-hoc style in which we ran Halo quickly became inefficient and cumbersome.</p><p>The growing Halo team found it increasingly challenging to reach consensus on the ever-larger number of issues needing discussion in our casual meetings, and lack of documentation resulted in repeated discussions.</p><p>The intuitive prioritization of work items that once worked just fine became clearly unscalable. And without a good system for keeping each other informed, the growing team of engineers and designers was sometimes unaware of the full picture: what’s needed, what’s being developed, and the gaps between them.</p><p>This also impacted the boundary between the Halo team and feature teams. Sometimes, the Halo team spent a lot of effort developing a component, only to realize no feature team was planning to use it any time soon.</p><p>Other times, a feature team may have been looking for a UI component but didn’t know the best way to make such a request to the Halo team, resulting in delays in collaboration.</p><h3>The Evolution of Our Design Process</h3><p>It was clear our process was unsustainable, so we started diagnosing the underlying issues. We did so in a two-phase exercise.</p><h4>Looking Inwards</h4><p>In the first phase, we looked inwards at our own effectiveness in running the Halo project. 
A few problems were identified and we started upgrading our process to address them.</p><p><strong>Problem #1: No source of truth.</strong></p><p>Ad-hoc discussions worked just fine when Halo was a small library casually taken care of by just a few StackAdapters. As both the project and the team grew, we needed Halo topics, work items, and requests to live in established venues of discourse for visibility.</p><p><strong>Solution #1:</strong></p><ul><li>Create regular guided huddles and Slack channels to reach consensus.</li><li>Use a templated component request format.</li><li>Document the design and decision-making process in our project management tool (Jira).</li></ul><figure><img alt="graphics illustrating the request form, review process, and approval process for Halo projects at StackAdapt" src="https://cdn-images-1.medium.com/max/1024/0*dv2GrjkRDceNIR-h" /></figure><p><strong>Problem #2: Lack of guidance for prioritization.<br> <br></strong>The intuitive prioritization method that we used in our early days was clearly not sustainable as the project grew. The sometimes-adopted method of prioritizing the lowest-hanging fruits first was also problematic.</p><p>The increasing complexity of the Halo project required us to have a structured approach to work item prioritization.</p><p><strong>Solution #2:</strong></p><p>We installed systematic guidelines into our prioritization exercises. 
While we identified many factors to consider, at the core we decided to focus on two questions when assigning priority to a task:</p><ul><li>Does this significantly improve our user experience (UX) or fix an important breakage that’s causing unexpected UX?</li><li>Does this have a downstream consumer already lined up?<br>(Is there a feature team ready to use this once it’s done?)</li></ul><p>The philosophy we employed here goes back to the purpose of the Halo project, which is to help feature teams easily build great user experiences for StackAdapt.</p><p>We decided to look at the UX impact and proposed usage of component work items as the centrepieces of our prioritization.</p><p><strong>Problem #3: Lack of transparency with stakeholders.</strong></p><p>We didn’t have a great system in place to identify and notify stakeholders in the Halo team as work items unfolded. As the team and project grew, we needed to mindfully give each other visibility of our work.</p><p><strong>Solution #3:</strong></p><p>We implemented many process changes such as:</p><ul><li>Identify individuals to be the core stakeholders of a work item as soon as it’s created.</li><li>Automate notifications to these stakeholders of status updates or actions required.</li><li>Keep feature team product managers (PMs) in the loop on updates to existing components to eliminate UI/UX “surprise updates.”</li></ul><h4>Looking Outwards</h4><p>In addition to improving our own effectiveness running the Halo project, we also looked outwards at how we could become more effective collaborating with feature teams at StackAdapt.</p><p>We felt the necessity of this phase came from the nature of our work: Halo components were meant to be used by StackAdapt feature teams and designers, so it was imperative that we also up our game in cross-team collaboration.</p><p><strong>Problem #1: No clarity on how our components are consumed by feature teams.</strong></p><p>We often did not have clarity on how or if feature 
teams planned to use the proposed UI components in our Halo backlog.</p><p>This resulted in newly created Halo components not actually being used by feature teams, or requested components not being prioritized properly by the Halo team.</p><p><strong>Solution #1:</strong></p><p>We made it part of our process to align scenarios and timelines with feature teams. In other words, our investigation into a potential Halo backlog item is not complete until we know if someone will use it, and when.</p><p>Of course, this did not mean we could not implement components for their long-term benefits, but it effectively reduced the instances where our priorities and those of feature teams were misaligned.</p><p><strong>Problem #2: Low awareness of the component request process.</strong></p><p>This problem was in the other direction, where it was the feature teams that may need to reach out to the Halo team for component requests. We realized that not everyone was aware of how to do that, often resulting in delayed conversations or ineffective communication of requirements.</p><p><strong>Solution #2:<br> </strong><br>With newly-created request templates and channels, we built a process for feature teams to easily follow and share all the required info with us, illustrated in the image below.</p><p>Once such requests were received, we triaged them and communicated with the requestor when we expected to take on the request, or in some cases, why we felt the component didn’t need to be built and what our recommended alternative was.</p><p>We shared this new process with the entire StackAdapt Platform team in a Lunch and Learn session, and have received positive feedback that the request process is much easier to follow now.</p><figure><img alt="graphic illustration of the process of making component requests on the StackAdapt engineering team" src="https://cdn-images-1.medium.com/max/1024/0*dyTAdxkjMdTGFjOR" /></figure><h3>Our StackAdapt UI Design System Today</h3><p>These 
self-reflections and course corrections were by no means easy, quick, or painless. But as we embarked on these changes, we started to see great improvements in our process.</p><p>As our process allowed us to involve all relevant stakeholders more efficiently for component changes, we saw our communications and collaborations become more effective.</p><p>We were able to reduce the time it took to triage incoming UI component requests from a few weeks to under a week. Our work item prioritization became more structured and, as a result, our Halo project backlog became a better representation of how we could help feature teams deliver the most impact to customers.</p><p>Feature teams also find it easier to work with us to address their feature scenarios in need of UI components, and we are keeping them informed of Halo changes that they should be aware of and plan for.</p><p>In addition to these tangible improvements, we also experienced some less tangible but important changes that smooth and accelerate our development process.</p><h4>Build an adaptive team culture that constantly optimizes efficiency.</h4><p>After seeing positive results from the aforementioned process evolution efforts, the Halo team didn’t stop there and continues to tweak our work process.</p><p>For example, we consistently evolve how we run our regular Halo team huddle as the team grows, sometimes making minor tweaks to the meeting structure every few weeks.</p><p>Recently, we started decentralizing our Halo huddle. We reserve it for topics needing the entire team’s attention and create follow-up meetings involving only the team members who will be working on an item once its scope is defined.</p><p>We made these changes due to the growing size of our team and the number of items needing our attention. 
More importantly, we were able to constantly make these changes thanks to the adaptive mindset of our team.</p><p>This adaptive culture ensures we as a team collectively keep a growth mindset and are constantly looking for and open to ways to improve how we work together.</p><h4>Build an elevated level of trust in the process.</h4><p>We also saw an increased level of trust in the process both within the Halo team and between the Halo team and feature teams.</p><p>What we mean by this is not a trust in the process being perfect, but rather that the process is always improving, and any imperfections we encounter can be addressed by our adaptive team culture.</p><p>To paraphrase <a href="https://franklincovey.co.nz/the-4-cores-of-credibility/">author Stephen M. R. Covey</a>, one of the most important factors that build trust is past results.</p><p>The Halo team and our partner feature teams have been through a lot of process-improving exercises together, and have observed the positive results we were able to achieve collaboratively.</p><p>Both teams now have a great track record of making our collaboration process better, and that gives us trust in our ability to continue improving it in the future.</p><p><em>Interested in learning more about working at StackAdapt? Explore our </em><a href="https://go.stackadapt.com/tech-blog-engineering-jobs"><em>Engineering career path</em></a><em>!</em></p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=6c8edba4326f" width="1" height="1" alt=""><hr><p><a href="https://stackadapt.tech/how-we-manage-the-stackadapt-ui-design-system-6c8edba4326f">How We Manage the StackAdapt UI Design System</a> was originally published in <a href="https://stackadapt.tech">StackAdapt Tech Blog</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[How to 5X your Background Job Processing With Sidekiq]]></title>
            <link>https://stackadapt.tech/how-to-5x-your-background-job-processing-with-sidekiq-e0be8279a074?source=rss----20d84872b28e---4</link>
            <guid isPermaLink="false">https://medium.com/p/e0be8279a074</guid>
            <category><![CDATA[rails]]></category>
            <category><![CDATA[delayed-jobs]]></category>
            <category><![CDATA[sidekiq]]></category>
            <category><![CDATA[background-jobs]]></category>
            <category><![CDATA[ruby-on-rails]]></category>
            <dc:creator><![CDATA[Shahid Khaliq]]></dc:creator>
            <pubDate>Wed, 25 Jan 2023 12:02:31 GMT</pubDate>
            <atom:updated>2023-02-28T22:09:23.035Z</atom:updated>
<content:encoded><![CDATA[<h3>How to 5X Your Background Job Processing With Sidekiq</h3><figure><img alt="Graphic that has a two toned blue background, overlayed with text that reads “Sidekiq” and “Delayed Job.”" src="https://cdn-images-1.medium.com/max/1024/1*MTv41nm-TaYe46WqQyyK2g.jpeg" /></figure><p>Here at StackAdapt, excellent user experience is our highest priority. We are constantly evaluating the tools and systems we use to ensure that they meet the evolving needs of our users.</p><p>In a project led by our web infrastructure team, we are currently undergoing a migration from Delayed Job to Sidekiq for background job processing in our Ruby on Rails web application monolith. In this post, we will explain why we decided to make the switch, describe the challenges we faced during the migration and the benefits we’ve seen so far, and offer our conclusions on the process.</p><h3>Introduction to Delayed Job and Sidekiq</h3><p>Ruby on Rails is a synchronous web application framework: each application process handles one request at a time, so it is crucial that a single request does not take too long to execute, as it would block that process and prevent it from handling subsequent incoming requests.</p><p>In any web application, big or small, it is natural to have tasks, or “jobs” as we prefer to call them, which can take a long time to execute. Ruby gems like Delayed Job and Sidekiq solve this problem by enabling long-running jobs to execute asynchronously in the background, leaving the main web application free to process short, quick-running web requests.</p><p><a href="https://www.oreilly.com/library/view/distributed-programming-with/9780321669919/ch10.html">First released in 2008</a>, Delayed Job was written by Shopify CEO Tobias Lütke and extracted from the core Shopify Rails application. Delayed Job is a free-to-use, single-threaded, database-backed, asynchronous job processing gem.
It offers a simple delay API that developers can use to process jobs asynchronously.</p><pre># without delayed_job<br>@user.activate!(@device)<br><br># with delayed_job<br>@user.delay.activate!(@device)</pre><p>Sidekiq is a more popular and modern alternative, <a href="https://www.mikeperham.com/2012/02/07/sidekiq-simple-efficient-messaging-for-rails/">first released</a> in 2012 and authored by Mike Perham. It is multi-threaded, backed by Redis and offers much better community support. Sidekiq offers a free OSS (Open-Source Software) version that is actively maintained by the Ruby community. It also offers advanced features in its Pro and Enterprise versions. Sidekiq has a simple API, with the idea of “jobs”, which are simply a way to encapsulate long-running tasks.</p><pre>class HardJob<br>  include Sidekiq::Job<br><br>  def perform(name, count)<br>    # do something<br>  end<br>end<br><br>HardJob.perform_async(&#39;bob&#39;, 5)</pre><h3>Why Sidekiq Over Delayed Job?</h3><p>Tied to our tremendous growth, the number of background jobs we process for our web application has grown exponentially over the past couple of years. One of the main reasons for our move away from Delayed Job was our trouble scaling it to handle the increased load.</p><h3>Redis as the Data Store</h3><p>We had always used our main MySQL database as the data store for Delayed Job and, on any given day, 40% to 60% of the aggregate load on our database would be from the dozens of Delayed Job processes we had running.</p><p>If there was a particularly large influx of background jobs on a certain day, and we bumped up the number of Delayed Job workers, we would see noticeably degraded performance across the board on our web application because the database, congested by all the queries from the Delayed Job processes, would be slower in responding to regular queries from the web application itself.</p><p>Sidekiq, on the other hand, uses Redis as its data store.
Being an in-memory data store, Redis is much faster than a traditional RDBMS. This lets Sidekiq push and pull jobs from Redis much faster than Delayed Job can from the database.</p><p>The added benefit for us is that using Sidekiq automatically introduces a separation of concerns between the web application and our asynchronous task processing infrastructure. If there is an unexpectedly high number of asynchronous jobs, it does not affect the performance of our web application at all because the data stores, that is, Redis and MySQL, are completely independent of each other.</p><h3>Multi-Threading</h3><p>We run our Delayed Job and Sidekiq processes on AWS EC2 instances. We configure these processes as <em>systemd</em> services so they can start on system startup and restart on failure. There is, of course, a limit to the number of processes that can run on a single machine and, as our requirements went up, we started hitting this limit with Delayed Job.</p><p>Delayed Job and Sidekiq processes need to load the whole Ruby on Rails application to be able to process async jobs and, for our web application, the resident set size (RSS) for these processes usually starts around the 400 MB to 500 MB mark. This size varies, based on the kinds of jobs the process is executing, and usually goes up during the life of a process.</p><p>Unlike Delayed Job, Sidekiq is multi-threaded by default. Each Sidekiq process can be configured to run multiple threads, where each thread independently processes asynchronous jobs.
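For illustration, a per-process thread count like ours is typically set in a Sidekiq configuration file; a minimal sketch (queue names here are placeholders, not our production configuration):

```yaml
# Sketch of a config/sidekiq.yml.
# "concurrency" is the number of worker threads per Sidekiq process.
concurrency: 5
queues:
  - default
  - low_priority
```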
We have configured each of our Sidekiq processes to have 5 threads.</p><p>For newly started processes, this directly results in a 5x drop in memory overhead for us, for the same number of workers, and allows a single EC2 machine to process a lot more async jobs when using Sidekiq processes compared to Delayed Job processes.</p><h3>More Features, Better Tooling, Bigger Community</h3><p>Out of the box, Sidekiq offers more features than Delayed Job such as support for middleware and a built-in web UI. The pro and enterprise versions offer an even greater range of features such as batches, cron jobs, unique jobs and worker metrics. The Sidekiq community is also more active in building useful extensions that add lots of additional functionality to Sidekiq.</p><p>It is difficult to gauge popularity but, from our experience, Sidekiq definitely seems more popular in the Ruby community. The gem is updated more frequently, it has more open source contributors and the GitHub repository has more stars than Delayed Job.</p><p>Ultimately, the popularity only matters to us because it is easier to find guides and tutorials online and when new Ruby on Rails developers join our team, they are more likely to be familiar with Sidekiq than with Delayed Job.</p><h3>Challenges Faced During the Migration</h3><p>As with any large migration, there are bound to be some hiccups along the way. 
In this section, we will share some of the challenges we faced when migrating from Delayed Job to Sidekiq.</p><h3>Database Transactions</h3><p>The biggest problem we had in our migration was that many of our Delayed Job jobs were being created within database transactions.</p><p>As an example, if a new advertisement campaign was created within a transaction, there were instances where jobs related to that campaign would also be created within the same transaction.</p><p>Since Delayed Job uses the application’s database as the job store, this worked out perfectly because if a transaction was rolled back, the job would never be committed to the database or picked up by a worker.</p><p>As soon as we moved some of these jobs to Sidekiq, we started seeing two problems:</p><ol><li>Jobs were pushed to Redis but the transaction had not finished.</li><li>Jobs were pushed to Redis but the transaction had rolled back.</li></ol><p>In both cases, Sidekiq processes would pick up the job, only to find that the object referenced by the job did not exist in the database. We solved this problem in three ways:</p><ol><li>Move job queueing outside the transaction blocks in the code.</li><li>Modify jobs to exit if the parameters (records) are not present in the database.</li><li>Use the after_commit_everywhere gem to catch missed cases.</li></ol><p>The first two solutions are quite self-explanatory, but we will go into more detail for the last one.
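To make the race concrete, here is a toy, framework-free Ruby simulation of the rollback failure mode (all names are hypothetical; this is not our application code). A rollback undoes the database write, but not the job that was already pushed to an external queue:

```ruby
# Toy simulation: why enqueueing to Redis inside a DB transaction is unsafe.
# A fake "database" runs a block transactionally; a job pushed to a fake
# queue inside a rolled-back transaction survives the rollback.
class FakeDB
  class Rollback < StandardError; end

  def initialize
    @rows = []
  end
  attr_reader :rows

  def transaction
    snapshot = @rows.dup
    yield
  rescue Rollback
    @rows = snapshot # undo the writes -- but queued jobs are NOT undone
  end

  def insert(row)
    @rows << row
  end
end

queue = [] # stands in for Redis
db = FakeDB.new

db.transaction do
  db.insert(:campaign)
  queue << :campaign_job # enqueued too early, inside the transaction
  raise FakeDB::Rollback
end

db.rows  # => []               the row was rolled back
queue    # => [:campaign_job]  an orphaned job referencing a missing row
```

A worker picking up `:campaign_job` would find no matching row, which is exactly the symptom described above.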
As we use Ruby on Rails’ ActiveJob framework for our background jobs, we were able to leverage ActiveJob hooks to ensure that no jobs are ever pushed to Redis within a transaction.</p><pre>require &#39;after_commit_everywhere&#39;<br><br>module JobExtensions<br>  module SidekiqJob<br>    include AfterCommitEverywhere<br><br>    def self.included(base)<br>      base.queue_adapter = :sidekiq<br>      base.around_enqueue do |_, block|<br>        after_commit { block.call }<br>      end<br>    end<br>  end<br>end</pre><p>We include this module in every job migrated to Sidekiq from Delayed Job. The first thing it does is change the job’s adapter to Sidekiq. Secondly, it uses the <a href="https://github.com/Envek/after_commit_everywhere">after_commit_everywhere</a> gem to ensure that the job is enqueued on Redis only after all surrounding transactions are committed to the database.</p><p>This technique has proven to be an incredibly effective way to solve one of Sidekiq’s biggest drawbacks for us and has made our migration process much smoother.</p><h3>Mitigating the Risk of Job Loss</h3><p>For StackAdapt, it is of utmost importance that no jobs are ever lost from our web application, as this would result in a terrible user experience for our clients.</p><p>Therefore, one of our biggest concerns when migrating to Sidekiq was that Redis is an in-memory data store and, if it went down for whatever reason, it might result in data (job) loss. We have taken a number of steps to mitigate this risk.</p><ol><li>We have configured Redis in a <a href="https://redis.io/docs/management/sentinel/">Sentinel</a> configuration. This ensures that there is automatic failover in case one of the nodes goes down.</li><li>By default, Sidekiq processes remove jobs from Redis to process them. As such, if a process terminates unexpectedly, the job is lost.
Instead of using this default behavior, we have configured the Sidekiq server to use Sidekiq Pro’s <a href="https://github.com/mperham/sidekiq/wiki/Reliability#setup"><em>super_fetch!</em></a> feature to only completely remove jobs from Redis <em>after</em> they are completed.</li><li>Going a step further, we have utilized ActiveJob hooks to ensure that if the Sidekiq client is unable to reach Redis to enqueue a job, the job will be enqueued to Delayed Job automatically instead. This is a temporary measure for the migration, and we will keep it in place for as long as we have Delayed Job processes running on the sidelines.</li><li>Another temporary measure we have in place, again using ActiveJob hooks, is dual-writing jobs to a table in our main database. When a job is finished processing, it is removed from the table. We only do this for low-frequency, high-priority jobs, since it negatively affects performance. It is mostly a sanity check for us, as it lets us catch cases where a job was enqueued but not executed by Sidekiq for whatever reason.</li></ol><p>Some of these measures are definitely overkill but, at the end of the day, it is in our clients’ best interest for us to be safe rather than sorry.</p><h3>Migrating Jobs From Delayed Job to Sidekiq</h3><p>Most of our jobs on Delayed Job were already using the ActiveJob framework. Migrating these jobs was smooth sailing for the most part, as all that needed to be done was switching the job adapter.</p><p>Some of our legacy jobs are still directly using Delayed Job APIs. What makes things more complicated is that many of these jobs are also using an in-house implementation of a job uniqueness feature which relies on the jobs being present in a database table.</p><p>For the more complicated jobs, which account for 5% to 10% of all of our jobs, the web infrastructure team has delegated the migration to Sidekiq to the teams that own the job.
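The super_fetch idea mentioned above (remove a job from the pending queue only after it has completed) can be illustrated with a toy, pure-Ruby reliable queue. This is only a conceptual sketch, not Sidekiq Pro’s implementation:

```ruby
# Toy illustration of a reliable-fetch pattern: a fetched job is kept in a
# "working" list until it is acknowledged, so a crash mid-job leaves it
# recoverable instead of lost. Not Sidekiq's actual code.
class ToyReliableQueue
  def initialize
    @pending = [] # jobs waiting to run
    @working = [] # jobs picked up but not yet acknowledged
  end

  def push(job)
    @pending << job
  end

  # Move a job from pending to working in one step.
  def fetch
    job = @pending.shift
    @working << job if job
    job
  end

  # Remove the job only once it has finished successfully.
  def acknowledge(job)
    @working.delete(job)
  end

  # After a crash, unacknowledged jobs are re-queued instead of lost.
  def recover!
    @pending.concat(@working)
    @working.clear
  end
end

q = ToyReliableQueue.new
q.push("job-1")
job = q.fetch   # job is now "in progress", not deleted
q.recover!      # simulate a crash before acknowledge
q.fetch         # => "job-1" again, nothing was lost
```

Sidekiq Pro implements the analogous bookkeeping inside Redis; the point of the sketch is only that deletion happens on acknowledgement, not on fetch.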
This slow but steady process ensures that these jobs are safely migrated over.</p><h3>Conclusion and Future Outlook</h3><p>Having migrated about 80% of our jobs to Sidekiq so far, we have seen a big decrease in database load. Sidekiq has been rock solid during the migration, and the Web UI has been incredibly useful in helping us keep an eye on our background jobs.</p><p>We are pleasantly surprised at how well Redis performs compared to the database for the same use case, and how little memory it consumes. Moving forward, we are hoping to see continued gains in the performance and reliability of our background job processing infrastructure.</p><p>As we wrap up this migration, the next big step for the web infrastructure team is training. On this front, some of the things we are already working on are Sidekiq runbooks for on-call, documentation to help developers run Sidekiq locally for testing and debugging, and lunch-and-learns for the benefit of other engineering teams.</p><p><em>Interested in learning more about working at StackAdapt? Explore our </em><a href="https://go.stackadapt.com/tech-blog-engineering-jobs"><em>Engineering career path</em></a><em>!</em></p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=e0be8279a074" width="1" height="1" alt=""><hr><p><a href="https://stackadapt.tech/how-to-5x-your-background-job-processing-with-sidekiq-e0be8279a074">How to 5X your Background Job Processing With Sidekiq</a> was originally published in <a href="https://stackadapt.tech">StackAdapt Tech Blog</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[5 Takeaways From Building a Custom Component Library]]></title>
            <link>https://stackadapt.tech/5-takeaways-from-building-a-custom-component-library-89481c107bbe?source=rss----20d84872b28e---4</link>
            <guid isPermaLink="false">https://medium.com/p/89481c107bbe</guid>
            <category><![CDATA[react]]></category>
            <category><![CDATA[component-libraries]]></category>
            <category><![CDATA[material-ui]]></category>
            <category><![CDATA[user-experience]]></category>
            <category><![CDATA[user-interface]]></category>
            <dc:creator><![CDATA[Javier Ching]]></dc:creator>
            <pubDate>Wed, 25 Jan 2023 12:02:31 GMT</pubDate>
            <atom:updated>2023-02-28T22:10:13.282Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="Graphic containing user interface components such as dropdowns, buttons, switches and checkboxes." src="https://cdn-images-1.medium.com/max/1024/1*XAaEFYXfD7sWAsu-JcgC0A.png" /></figure><p>When I joined StackAdapt in 2019, the web platform team consisted of only a few engineers. We were building the platform frontend using React and at the time, the team had worked with several component libraries but ultimately decided upon Material UI (MUI).</p><p>Within the platform team, individual project teams consisted of one or two engineers with minimal interactions between engineers of other projects. Throughout the year, as our design and engineering teams grew, so did the number of component variants. This was caused by two main issues:</p><ol><li>Designers were primarily focused on their own individual projects, which contributed to a lack of communication with other designers about components, theming, and user experience (UX). Each designer had their own design templates which further diverged designs.</li><li>There was infrequent communication between engineering project teams. Developers would build components exactly to the design spec. Because of that, shared components got extremely complex, while many new components were added to support one-off functionality.</li></ol><p>Ultimately, it got to a point where the increasing amount of components weren’t scalable and it caused a lot of confusion within the team. My goal was to fix these issues with the introduction of an internal component library. These are the five most important learnings from that process.</p><h3>1. The Importance of Proof of Concepts</h3><p>When I had initially pitched the component library idea to the whole team, not everybody was on board. 
The team had yet to understand its value because resources assigned to building the component library were resources diverted from pushing out new features.</p><p>Because of this, I took the opportunity to build a proof of concept to show the team. I took some of our components and hosted them on a tool called Storybook, which is an isolated virtual playground for components.</p><p>Storybook has many controls to tweak a component’s props, display those changes immediately, and generate code snippets for developers to reference.</p><figure><img alt="An example of a button built with Storybook, which is an isolated virtual playground for components." src="https://cdn-images-1.medium.com/max/326/0*nnQcDM0zLL4T4MiP" /><figcaption>Storybook Button Example</figcaption></figure><p>I compiled a list of the different component variants we had, and explained how a customized component library built on top of MUI would be the single source of truth to standardize the components we use across different projects.</p><p>Once the team was able to visualize the benefits of the library using this tool, they were on board and our internal component library, which we named Halo UI, was born.</p><h3>2. Communication is Everything</h3><p>The next problem to solve was communication. Not only is it one of the most important parts of a well-functioning team, but it is also one of the most challenging problems to solve.</p><p>The product team, the design team, and the engineering team often spoke different languages because of the unique context each team held.</p><p>What this created in the past was a game of broken telephone; by the time the information reached the developers, it was no longer what was originally envisioned by the product team.</p><p>To address this, we started organizing a weekly meeting to align all these teams.
Each team would bring their work to the meeting, allowing members to discuss, ask questions, and provide feedback.</p><p>No design or engineering work would be taken on until all teams fully understood the requirements and agreed upon the work.</p><p>Over the years, we invested more time in teaching each other about UI, UX, and coding so that we could understand each other’s perspectives better.</p><h3>3. Defining Better Processes</h3><p>To add to the communication point, we had to ensure the documented work was written in a way such that someone without context could understand what needed to be done.</p><p>We introduced the concept of product requirement documents (PRDs) to document design decisions, requirements, and edge cases.</p><p>These documents were provided to the Halo UI engineering team to review and ask questions about. Any answers would be documented in the PRD so that whoever picked up the work would still have context.</p><p>To better support our feature teams, we set up a process through which they can request new features or enhancements. These requests are researched and analyzed by the Halo UI team to determine whether the effort is worth taking on.</p><p>Finally, we also introduced backlog grooming sessions to review requirements with the Halo UI engineering team in advance, reducing the amount of time tickets are blocked due to unclear information.</p><h3>4. Predicting the Future</h3><p>Building the component library in a robust way was very important to account for changes in the future.</p><p>An example of this was deciding to build Halo UI as an internally hosted npm module. At the time, we only had one GitHub repo housing our platform code. But as the company scaled, it became clear that multiple codebases would benefit from importing our library as an npm module.</p><p>Fortunately, it was an easy transition a year later when we decided to start migrating our monorepo to micro frontends that managed their own independent repos.
As our team was releasing regular package updates, we had to balance flexibility with strictness.</p><p>We needed to build our components to account for potential changes in product or design requirements, but also to prevent engineers from making drastic changes that could deviate from the platform’s look and feel.</p><p>We made sure to spend a lot of time thinking about how to structure our components to strike this balance. This upfront effort really helped in minimizing the number of breaking changes when major revisions were required.</p><h3>5. Scaling the Team</h3><p>The company was barely 100 people when I joined. Fast forward to today, and StackAdapt has grown to an impressive 900+ people over these past three years. As the product and design teams grew, being the sole developer on the Halo UI team didn’t scale with all the incoming requirements.</p><p>It became clear we needed to scale the Halo UI engineering team, but hiring was a challenge because we wanted to find not only technically strong candidates, but candidates who had a real passion for frontend code and who would fit well within the team.</p><p>We also took this opportunity to improve our onboarding documentation, which really helped in expediting the onboarding of new frontend engineers. On top of that, to ensure the team’s code aligned with best practices, we improved our ESLint rules to automate this.</p><p>We found that building a proof of concept really helped convince team members to adopt a custom component library. Maintaining good communication with the team also helped us make sure everybody was on the same page.</p><p>Having well-defined processes expedited and improved the team’s work. Making sure we thought ahead when choosing technical approaches helped reduce pain points when making major revisions.
And scaling the team helped maintain good team health.</p><p>Our team has made massive improvements over these last couple years, and I am excited to continue learning and growing with the team in the years to come.</p><p><em>Interested in learning more about working at StackAdapt? Explore our </em><a href="https://go.stackadapt.com/tech-blog-engineering-jobs"><em>Engineering career path</em></a><em>!</em></p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=89481c107bbe" width="1" height="1" alt=""><hr><p><a href="https://stackadapt.tech/5-takeaways-from-building-a-custom-component-library-89481c107bbe">5 Takeaways From Building a Custom Component Library</a> was originally published in <a href="https://stackadapt.tech">StackAdapt Tech Blog</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[A Gh-ost Story: Online Schema Change at StackAdapt]]></title>
            <link>https://stackadapt.tech/a-gh-ost-story-online-schema-change-at-stackadapt-72bfe424689e?source=rss----20d84872b28e---4</link>
            <guid isPermaLink="false">https://medium.com/p/72bfe424689e</guid>
            <category><![CDATA[scalability]]></category>
            <category><![CDATA[database]]></category>
            <category><![CDATA[schema]]></category>
            <category><![CDATA[percona-toolkit]]></category>
            <dc:creator><![CDATA[Kenneth Thomas]]></dc:creator>
            <pubDate>Wed, 25 Jan 2023 12:02:31 GMT</pubDate>
            <atom:updated>2023-02-28T22:10:40.207Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="A collage of images including bar graph, pie chart, cloud and code." src="https://cdn-images-1.medium.com/max/1024/1*RiwvbIGX4kGfNS5GQk9rMQ.png" /></figure><p>The StackAdapt platform’s backend is powered by a rails monolith. In the past, the way we ran database schema changes, similar to most rails applications, was using the <a href="https://guides.rubyonrails.org/v3.2/migrations.html">ActiveRecord::Migration</a> class.</p><p>This class is provided by rails to help to alter database schemas using a managed approach.</p><p>Since the StackAdapt platform was an always-up platform, rails migrations were our de facto approach to keeping it up and available. Database migrations were run on production whenever a new software release was deployed.</p><p>This approach was sustainable while the table sizes were relatively small. As StackAdapt scaled, so did our data, and some of our critical tables grew exponentially to a point where we could no longer run db schema migrations on those tables using rails migrations.</p><p>ActiveRecord::Migration is a versatile tool that we still use for most of our smaller models, but when it comes to our larger tables, we had to find another alternative.</p><p>As any engineering team would, we started looking at tools that could help us perform online schema changes to our production database with zero downtime. This led us to look at the <a href="https://docs.percona.com/percona-toolkit/pt-online-schema-change.html">Percona Toolkit</a>.</p><h3>Percona PT-Online Schema Change</h3><p>Pt-osc is one of the most popular tools engineering teams use to run schema migration. 
To run migrations using pt-osc, we had to choose a time when production traffic was at its lowest and run the migration script.</p><p>The Percona Toolkit runs a migration by creating a copy of the original table, running the schema change on the new, empty table, and then copying over the data from the original table in chunks. This does not block reads and writes to the original table.</p><p>Pt-osc keeps both tables in sync using triggers. Triggers installed on the original table replay every CRUD operation onto the duplicate table to keep it in sync.</p><p>Pt-osc also throttles the migration if there is a sudden surge in traffic to the DB, to make sure the database is not put under too much load.</p><p>This approach made sense to the team, and we chose pt-osc to run migrations. At first there were no issues and migrations ran fine.</p><p>As we had to run more migrations to support the engineering team’s needs, we noticed migrations failing quite often, with quite a few exceptions due to lock contention.</p><p>The issues with using the Percona Toolkit to run migrations came down to reliability, since most of the time the migrations failed or took too long to run.</p><p>Percona Toolkit’s trigger-based approach was the main issue.</p><ul><li>It adds overhead to write operations. For each write operation to the original table, another synchronous write operation to the pt-osc table is needed in the same transaction in order for the tool to function correctly, even if the tool is paused. This is at least 2X the writer’s workload.</li><li>While concurrent queries compete for locks in the original table, the triggers need to simultaneously compete for locks in the pt-osc table.
This causes lock contention issues.</li></ul><p>Because of the above concerns, we decided to look at other alternatives, which led us to GitHub’s <a href="https://github.com/github/gh-ost">gh-ost</a>, an online schema migration tool that does not use a trigger-based approach to keep tables in sync.</p><h3>Gh-ost</h3><p>GitHub’s gh-ost works very similarly to the Percona Toolkit. It creates a copy table called the gh-ost table, runs the schema migration on the empty table, and then copies over the data from the original table in chunks.</p><p>Instead of relying on triggers to keep the tables in sync, gh-ost reads the database binary logs to make sure new CRUD operations are also propagated to the gh-ost table.</p><p>By not relying on triggers, gh-ost reduces the load on the database writer while the migration is being throttled. This was what made it interesting to look deeper into gh-ost.</p><p>The first step we took was to benchmark migrations using pt-osc and gh-ost while monitoring the database and logs for any lock contention issues.</p><h3>Benchmarking</h3><p>We wanted to compare running migrations with pt-osc and gh-ost under various stress levels, which would help us make an informed decision as to which tool could be a good fit for us.</p><p>We used <a href="https://dev.mysql.com/doc/refman/8.0/en/mysqlslap.html">mysqlslap</a>, a load emulation client, to apply various levels of stress on the database while we ran the migration.</p><ul><li>The SQL script to simulate a naive CRUD operation would be run by concurrent threads on the database.
The script updates a random row and then resets the data back to its original state.</li></ul><pre>SET autocommit=0;<br>BEGIN;<br>SET @randomId = (SELECT `id` FROM `table_name` ORDER BY RAND() LIMIT 1);<br>SET @oldName = (SELECT `name` FROM `table_name` WHERE `id` = @randomId);<br>SELECT * FROM `table_name` WHERE `id`=@randomId FOR UPDATE;<br>UPDATE `table_name` SET `name` = &#39;test&#39; WHERE `id` = @randomId;<br>UPDATE `table_name` SET `name` = @oldName WHERE `id` = @randomId;<br>COMMIT;</pre><ul><li><a href="https://dev.mysql.com/doc/refman/8.0/en/mysqlslap.html">Mysqlslap</a> command</li></ul><pre>mysqlslap --create-schema=&quot;database_name&quot; --query=&quot;sql_script.sql&quot; --host=&quot;127.0.0.1&quot; --port=3306 --concurrency=20 --iterations=10000</pre><p>Below are the results of running the migration under three different load conditions. We had 20 concurrent threads running against the database, and the schema migration was run on a table with approximately 3 million rows.</p><p><strong>Baseline: </strong>Run the migration with no load on the DB.</p><p><strong>Experiment 1: </strong>Run the migration with 20 concurrent threads hammering the database, and set the max-load threshold to 25 threads on the database. Setting the max-load on the migration would throttle the migration in situations where the number of running threads to the database exceeds 25.</p><p><strong>Experiment 2: </strong>Run the migration with 20 concurrent threads hammering the database, and set the max-load threshold to 35 threads on the database.
Setting the max-load on the migration would throttle the migration in situations where the number of running threads to the database exceeds 35.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/46f179ff5a3ba375dfbd1c5f23e2bbd8/href">https://medium.com/media/46f179ff5a3ba375dfbd1c5f23e2bbd8/href</a></iframe><p><em>*Occurrences of lock contention/deadlocks on the table</em></p><p>In most cases gh-ost outperformed pt-osc, and there was no lock contention when using gh-ost to run the migration. This made our decision easy.</p><p>Before choosing gh-ost as our default tool, we had to make sure we dropped foreign keys and added a read replica to our database. Gh-ost can run migrations directly on the master, but it prefers the replica approach, which is safer and does not add additional load on the master writer.</p><p>Gh-ost does provide a flag (discard-foreign-keys) that can be turned on to drop foreign keys on tables when running migrations.</p><p>Gh-ost also provides an option to delay the cut-over, which is when the gh-ost table and the original table are swapped. This gives us the option to verify the tables are in sync and to perform the cut-over when it is safe to do so.</p><h3>Tuning Gh-ost</h3><p>Gh-ost provides various options to tune the migration, which we took advantage of and describe below.</p><p><strong>`max-load`</strong></p><p>This value tells gh-ost when it needs to pause the migration. This avoids putting the database under extra load when there are sudden bursts of CRUD operations running against the DB, which could be caused by db-intensive scheduled jobs or a spike in production traffic.</p><p>We had to determine a value of max-load that we were comfortable with, and we chose a value that was an average of our baseline threads and peak threads running over 24 hours.</p><p><strong>`critical-load`</strong></p><p>A threshold at which gh-ost will terminate the migration.
This could be because of a random spike or a production issue; either way, once the critical value is reached, gh-ost terminates the migration.</p><p>There are other critical-load-related flags that can be set so that gh-ost does not immediately terminate the migration; you can read more about them on the gh-ost wiki.</p><p><strong>`chunk-size`</strong></p><p>The size of each chunk of rows copied from the original table to the gh-ost table.</p><p><strong>`postpone-cut-over-flag-file`</strong></p><p>Allows you to postpone the cut-over: gh-ost waits to swap the tables for as long as this flag file exists.</p><pre>gh-ost \<br>  --host=&lt;replica_host&gt; \<br>  --assume-master-host=&lt;master_host&gt; \<br>  --max-load=Threads_running=&lt;max_load&gt; \<br>  --critical-load=Threads_running=&lt;critical_load&gt; \<br>  --chunk-size=&lt;chunk_size&gt; \<br>  --cut-over-lock-timeout-seconds=&lt;cutover_timeout&gt; \<br>  --default-retries=&lt;retry_count&gt; \<br>  --postpone-cut-over-flag-file=&lt;cutover_flag_name&gt; \<br>  --initially-drop-ghost-table \<br>  --initially-drop-old-table \<br>  --assume-rbr \<br>  --user=&lt;db_username&gt; \<br>  --password=&quot;&lt;db_password&gt;&quot; \<br>  --database=&lt;database_name&gt; \<br>  --table=&lt;table_name&gt; \<br>  --alter=&quot;&lt;alter_query&gt;&quot; \<br>  --verbose</pre><h3>Conclusion</h3><p>After testing both Percona Toolkit and gh-ost, our team was able to make an informed decision about which tool to adopt for our system.</p><p>Both pt-osc and gh-ost are great tools for running migrations in a production environment with zero downtime; however, we found that gh-ost works better when dealing with larger tables.</p><p><em>Interested in learning more about working at StackAdapt? 
Explore our </em><a href="https://go.stackadapt.com/tech-blog-engineering-jobs"><em>Engineering career path</em></a><em>!</em></p><hr><p><a href="https://stackadapt.tech/a-gh-ost-story-online-schema-change-at-stackadapt-72bfe424689e">A Gh-ost Story: Online Schema Change at StackAdapt</a> was originally published in <a href="https://stackadapt.tech">StackAdapt Tech Blog</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
    </channel>
</rss>