Hi, I'm Parth Chauhan

A Microsoft Certified

Analytics & Data Engineer

Building scalable data pipelines and infrastructure to transform raw data into actionable insights.

About Me

Profile

Transforming Data into Value

I'm a data engineer who loves turning complex data challenges into streamlined, scalable solutions. I build end-to-end data pipelines, architect cloud infrastructure, and work with tools like Microsoft Fabric and Delta Lake to create reliable data foundations that support smarter decisions. I'm always curious and driven to make data work smarter, not harder.

Data Pipeline Development
Cloud Infrastructure
Big Data Frameworks
Python & SQL Expertise
Download Resume

Technical Skills

Programming

Python 90%
SQL 90%

Data Tools

Microsoft Fabric 90%
Apache Spark 90%
Delta Tables 90%

Cloud

Azure 90%
AWS 70%

Concepts

PySpark Optimization 90%
Medallion Architecture 90%
Data Modelling 90%

Featured Projects

Microsoft Fabric

Earthquake Data Pipeline

Developed and optimized scalable data pipelines on Microsoft Fabric using Apache Spark and Delta Tables for efficient, reliable, and high-performance data processing.

Microsoft Fabric Apache Spark Delta Tables
View Details
Microsoft Azure

Data Pipeline Development on Azure

Designed and implemented scalable data pipelines using Azure Data Factory and Azure Databricks to ingest, transform, and store data in ADLS Gen2 for advanced analytics.

Azure Databricks Azure Data Factory ADLS Gen2
View Details
GNews API Project

Engineered scalable news ingestion and transformation pipelines in Microsoft Fabric using Dataflows, PySpark Notebooks, and Delta Lake on ADLS Gen2 to support sentiment analysis and real-time reporting.

Data Ingestion Data Transformation Data Modelling
View Details
All Projects

Data Engineering Project

Query Garden

Curated a well-organized digital garden in Obsidian using HTML to showcase comprehensive solutions and insights for a wide array of SQL problems.

SQL HTML Obsidian
View Details
Data Visualization Dashboard

Dataframe Diaries

Built a rich digital garden in Obsidian with HTML to break down and explain advanced PySpark concepts through practical examples and structured notes.

PySpark HTML Obsidian
View Details
Machine Learning Pipeline

ML Pipeline Automation

Built an automated machine learning pipeline for predictive analytics and model deployment.

Python Scikit-learn MLflow
View Details
Real-time Streaming

Real-time Data Streaming

Built a real-time data processing pipeline using Kafka and Spark Streaming.

Kafka Spark Streaming Flink
View Details
Data Governance

Data Governance Framework

Implemented data governance policies and a metadata management system.

Collibra Data Catalog Data Quality
View Details
Data Lake

Enterprise Data Lake

Designed and implemented a scalable enterprise data lake architecture.

ADLS Gen2 Delta Lake Medallion
View Details

Get In Touch

Let's Talk About Your Data Needs

Whether you're looking to build a new data infrastructure, optimize existing pipelines, or implement advanced analytics, I'd love to discuss how I can help.