From the course: Advanced Data Engineering with Snowflake

DevOps in the world of data engineering

If you were to ask 10 different engineers, "What is DevOps?" you'd likely get 10 different answers. So before we dive into how Snowflake does DevOps, let's take a quick detour and first understand what we mean by the term. To start, DevOps isn't a product. It's also not a feature. It's something more fluid than that. At a high level, it's most often a core set of philosophies and best practices that allow teams to quickly deliver and maintain software at scale. Now, I'm sure that even that high-level definition will encourage some debate, but for our purposes, it's just the right scope.

Let's break it down a bit more. Data pipelines are living engineering systems that, like many engineering systems, have a set of critical requirements to satisfy. For example, a pipeline may need to react quickly to changes in, say, database objects, schemas, and more. Pipelines should also be reliable and keep downtime to a minimum. They might also need to be modern enough to easily integrate with newer approaches, like capturing streaming data or incorporating a third-party tool into an existing data environment. These are just a few examples; there are countless more requirements we could think of. To satisfy requirements like these, data engineering teams need to be able to quickly evolve data pipelines by deploying changes to them in a fast, but also safe and reliable, way. DevOps best practices help engineering teams do exactly this. DevOps practices are common in the world of software development, and more and more they're finding their way into the field of data engineering.

Okay, enough of the high level. Exactly which DevOps practices am I referring to? The first is source control and collaboration. Software engineering teams have employed source control for decades. It helps them iteratively improve software, as well as maintain a source of truth and a log of all changes. This pattern is increasingly finding its way into the building of data pipelines as well, helping teams keep track of the immense number of changes to pipeline objects, logic, and much more.

Next, declarative management of code. This means being able to incrementally update code without requiring time-intensive or error-prone procedures to advance or roll back source code. Coupled with source control, this is a powerful way of managing changes in data pipelines. We'll dive into much more detail on this in an upcoming video; a brief sketch of the idea also appears below.

The third is automation, specifically automation around testing and deployment. This is frequently known as continuous integration and continuous deployment, or CI/CD. We'll get into the details behind this concept in an upcoming video. For now, it's enough to know that teams must be able to continuously deploy changes to their software or their pipelines, so that software and data pipelines maintain high uptime while also reflecting the newest changes. The practice of automated deployments allows them to do this, meaning they can quickly and safely test, roll out, and roll back changes as needed.

And finally, tooling. It would be a pain to have to do all of these things manually, and one important aspect of DevOps is that teams are able to move quickly. They do this by using tooling that allows them to incorporate all of these best practices. Oftentimes, this means teams use a set of dedicated tools to achieve these outcomes, like product-specific command-line interfaces, for example.
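To make the declarative idea a bit more concrete, here is a minimal sketch in Snowflake SQL. It uses Snowflake's CREATE OR ALTER TABLE syntax and a hypothetical raw_orders table (the table name and columns are illustrative, not from the course). The point is that a script like this states the desired shape of an object rather than the steps to reach it, so the same file, checked into source control, can both create the object the first time and evolve it on later deployments.

CREATE OR ALTER TABLE raw_orders (
    order_id    NUMBER,
    customer_id NUMBER,
    order_ts    TIMESTAMP_NTZ,
    order_total NUMBER(12,2)
);
-- Re-running this statement is idempotent: Snowflake creates the table
-- if it doesn't exist, or alters it in place to match this definition.
-- Adding a column later is just an edit to this same file (for example,
-- appending an order_channel VARCHAR column), committed to source
-- control and re-run by the deployment pipeline.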
There are many more best practices in the world of DevOps, but these are the core ones that we'll touch on in this course. These practices allow data engineering teams to build resilient, reliable, and modern data pipelines. Join me in the next video to learn about how Snowflake is supporting DevOps practices.
