Keyur23/Azure-Data-Engineering

Azure Data Engineering 🌐

Welcome to the Azure Data Engineering repository! 🚀 Here I have implemented solutions that use Azure services to build a complete ETL pipeline. The repository covers the full data engineering lifecycle on Azure, from data ingestion through transformation to analytics.

Overview 📊

This project showcases a data pipeline leveraging the following Azure services:

  • Data Source 📥: Raw data comes from source datasets (the FIFA datasets in the project below).
  • Data Ingestion 🔄: Azure Data Factory (ADF) orchestrates the data ingestion processes.
  • Raw Data Storage 💾: Data is stored in Azure Data Lake Gen 2 for scalable and secure storage.
  • Data Transformation 🔄: Azure Databricks transforms the data using Spark.
  • Analytics 📈: Azure Synapse Analytics is used for analytics and querying.
  • Visualization 📊: Power BI is used to build dashboards for reporting.

The entire process follows a typical ETL (Extract, Transform, Load) pipeline pattern.
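As a rough illustration of the ETL pattern itself, here is a minimal local sketch in plain Python. It is not the code from this repository: an in-memory CSV string stands in for a raw file in the data lake, a small function stands in for the Databricks transformation, and an in-memory SQLite table stands in for the analytics store. The column names (`name`, `club`, `overall`) and the sample rows are hypothetical placeholders.

```python
import csv
import io
import sqlite3

# Hypothetical sample standing in for a raw file in the data lake.
RAW_CSV = """name,club,overall
 Player A ,Club X,80
Player B,Club Y,85
,Unknown,
"""


def extract(raw_text):
    """Extract: parse raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def transform(rows):
    """Transform: trim whitespace, drop rows with no name, cast the rating to int."""
    cleaned = []
    for row in rows:
        name = row["name"].strip()
        if not name:
            continue  # skip incomplete records before casting
        cleaned.append((name, row["club"].strip(), int(row["overall"])))
    return cleaned


def load(rows, conn):
    """Load: write the cleaned rows into a queryable table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS players (name TEXT, club TEXT, overall INTEGER)"
    )
    conn.executemany("INSERT INTO players VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)

# A simple analytical query over the loaded data.
top = conn.execute(
    "SELECT name FROM players ORDER BY overall DESC LIMIT 1"
).fetchone()[0]
```

In the actual pipeline each stage is handled by a managed service (ADF, Databricks, Synapse) instead of a local function, but the extract, transform, load order is the same.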

Projects 🗂️

This repository contains the following projects:

1. FIFA Analysis Using Azure Services

  • This FIFA data engineering project uses Azure cloud services to process and analyze FIFA datasets. The raw data is stored in dedicated containers in an Azure Storage Account, and an Azure Data Factory pipeline automates the data movement. The data is then held in Azure Data Lake Storage Gen2 and cleaned and structured with Azure Databricks. Finally, Azure Synapse Analytics and SQL are used to analyze the transformed data. The result is a scalable, automated workflow for FIFA-related analytics.

(Note: The client ID, tenant ID, and secret key have been removed from FIFA Transformation.ipynb for security reasons. Supply your own values to run the notebook.)
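For readers who need to supply their own credentials, the sketch below shows one common way to build the Spark settings for service principal (OAuth) access to Azure Data Lake Storage Gen2 without hardcoding secrets. Whether the notebook connects exactly this way is an assumption, since the source does not show it. The helper names (`adls_oauth_conf`, `abfss_path`, `conf_from_env`) and the environment variable names are hypothetical.

```python
import os


def adls_oauth_conf(storage_account, client_id, tenant_id, client_secret):
    """Build the Spark/Hadoop settings for OAuth access to one ADLS Gen2 account."""
    suffix = f"{storage_account}.dfs.core.windows.net"
    return {
        f"fs.azure.account.auth.type.{suffix}": "OAuth",
        f"fs.azure.account.oauth.provider.type.{suffix}":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        f"fs.azure.account.oauth2.client.id.{suffix}": client_id,
        f"fs.azure.account.oauth2.client.secret.{suffix}": client_secret,
        f"fs.azure.account.oauth2.client.endpoint.{suffix}":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }


def abfss_path(container, storage_account, relative_path=""):
    """Build an abfss:// URI for a file or folder in a container."""
    return (
        f"abfss://{container}@{storage_account}.dfs.core.windows.net/"
        f"{relative_path.lstrip('/')}"
    )


def conf_from_env(storage_account, env=os.environ):
    """Read credentials from environment variables (names are an assumption)."""
    return adls_oauth_conf(
        storage_account,
        client_id=env["AZURE_CLIENT_ID"],
        tenant_id=env["AZURE_TENANT_ID"],
        client_secret=env["AZURE_CLIENT_SECRET"],
    )


# In a Databricks notebook, where `spark` is predefined, the settings
# could then be applied like this:
#   for key, value in conf_from_env("<storage-account>").items():
#       spark.conf.set(key, value)
```

Keeping the values in environment variables or a Databricks secret scope, instead of in notebook cells, means the notebook can be committed without exposing them.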

Azure Services Used 💡

  • Azure Data Factory (ADF) 🔄: Orchestration and automation of data workflows.
  • Azure Data Lake Gen 2 🗄️: Storage for raw and processed data.
  • Azure Databricks 🔥: Big data processing and transformation using Apache Spark.
  • Azure Synapse Analytics 📊: Data warehousing and analytics service for querying large datasets.
  • Power BI 📈: Business intelligence tool for visualizing and sharing insights through interactive dashboards.

Features 🌟

  • Complete ETL pipeline from data ingestion to transformation and visualization.
  • Uses Spark for large-scale data processing in Azure Databricks.
  • Uses Azure Synapse Analytics to run complex queries and extract insights.
  • Provides Power BI dashboards for visualizing key metrics and trends.

Getting Started ⚙️

Prerequisites 📝

To run this project, you'll need:

  • An Azure account with permissions to create and manage the resources listed below.
  • Access to Azure Synapse Analytics, Azure Data Factory, Azure Databricks, Azure Data Lake Gen 2, and Power BI.
  • Azure resources set up to mirror the project architecture described above.
