My Projects

Selected Work

How Much Is Your House Worth?

Data-Driven Real Estate Prediction

This project builds a machine learning web application that predicts housing prices in Bengaluru from key property features: location, square footage, number of bedrooms (BHK), and number of bathrooms. Using a dataset of over 13,000 real estate listings, it covers an end-to-end data science workflow, from data cleaning and feature engineering through model building and deployment. Key steps include data preprocessing (cleaning location data, handling missing values, and removing irrelevant features), feature engineering (extracting BHK from text fields and one-hot encoding categorical variables), outlier detection and removal, model building with regression algorithms, and deployment behind a Flask API that serves real-time predictions. The final system lets users enter property details and receive an instant price prediction through a web interface.
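The feature engineering steps described above (extracting BHK from text and one-hot encoding location) can be sketched in plain Python. This is a minimal illustration rather than the project's actual code: the function names `extract_bhk` and `one_hot`, the input format ("3 BHK", "4 Bedroom"), and the sample locations are all assumptions made for the example.

```python
import re
from typing import List, Optional


def extract_bhk(size_text: str) -> Optional[int]:
    """Return the leading integer from strings such as '3 BHK' or '4 Bedroom'."""
    match = re.match(r"\s*(\d+)", size_text or "")
    return int(match.group(1)) if match else None


def one_hot(location: str, known_locations: List[str]) -> List[int]:
    """Return a 0/1 vector with a 1 at the matching location (all zeros if unseen)."""
    cleaned = location.strip().lower()
    return [1 if cleaned == known.lower() else 0 for known in known_locations]


# Hypothetical sample values, not taken from the project dataset.
locations = ["Indiranagar", "Whitefield", "Jayanagar"]

print(extract_bhk("3 BHK"))
print(one_hot("  whitefield ", locations))
```

Stripping whitespace and lowercasing before comparison is one simple way to standardize location names, so that "Whitefield" and " whitefield " map to the same column.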

13,000+ Listings
240+ Locations
Real-Time Flask Predictions
Key Insights
  • Location is the most influential factor, with price variations significantly driven by neighborhood demand and accessibility (modeled using one-hot encoding of 200+ locations).
  • Price increases proportionally with square footage, making total area one of the strongest continuous predictors in the model.
  • Number of bedrooms (BHK) and bathrooms directly impact pricing, with higher configurations consistently associated with premium valuations.
  • Data cleaning improved model reliability, including removal of irrelevant features like availability and balcony, and standardization of location names to eliminate inconsistencies.
  • The deployed Flask API enables real-time predictions, allowing users to dynamically estimate house prices based on custom inputs.
  • The model architecture supports scalable deployment, with serialized model artifacts (pickle) and structured feature inputs enabling efficient inference.
Machine Learning, Regression, Flask API, Feature Engineering, Model Deployment, Scikit-learn
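The inference path mentioned in the insights (a pickled model artifact plus structured feature inputs) might look roughly like the sketch below. It is framework-agnostic and uses only the standard library. The artifact layout, the column names, the weights, and the helper names `build_features` and `predict_price` are hypothetical; a simple linear scorer stands in for the trained regressor.

```python
import pickle
from typing import List

# Hypothetical artifact: column order plus linear weights standing in for the
# trained regressor. All numbers are made up for illustration.
artifact = {
    "columns": ["total_sqft", "bath", "bhk", "Indiranagar", "Whitefield"],
    "weights": [0.05, 2.0, 3.0, 40.0, 10.0],
    "bias": 5.0,
}

# Serialize once after training, deserialize once when the service starts.
model_bytes = pickle.dumps(artifact)
model = pickle.loads(model_bytes)

NUMERIC_COLUMNS = 3  # the first three columns are numeric, the rest are one-hot flags


def build_features(location: str, sqft: float, bath: int, bhk: int) -> List[float]:
    """Place the inputs in the column order stored with the model."""
    columns = model["columns"]
    row = [0.0] * len(columns)
    row[columns.index("total_sqft")] = float(sqft)
    row[columns.index("bath")] = float(bath)
    row[columns.index("bhk")] = float(bhk)
    if location in columns[NUMERIC_COLUMNS:]:
        row[columns.index(location)] = 1.0  # unseen locations stay all zeros
    return row


def predict_price(location: str, sqft: float, bath: int, bhk: int) -> float:
    """Score one property with the stand-in linear model."""
    row = build_features(location, sqft, bath, bhk)
    score = sum(w * x for w, x in zip(model["weights"], row)) + model["bias"]
    return round(score, 2)


print(predict_price("Whitefield", 1000, 2, 2))
```

In a web deployment, a route handler would parse the request fields, call a function like `predict_price`, and return the result as JSON. Storing the column order alongside the model keeps the feature vector aligned with what the model was trained on.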

What Drives Medical Costs?

Machine Learning Analysis

This project applies machine learning and exploratory data analysis (EDA) to predict individual health insurance costs from demographic and lifestyle factors. Using a dataset of over 1,300 insurance records, the analysis identifies how age, BMI, smoking habits, family size, and region affect medical expenses. The workflow includes data cleaning and preprocessing (handling duplicates, encoding categorical variables), exploratory data analysis to uncover patterns in healthcare costs, feature engineering with label encoding and one-hot encoding, model building with Random Forest regression, and evaluation using MAE, RMSE, and R². The final model reaches an R² of 0.88 with an RMSE of about $4.6K, which supports better financial planning for individuals and more informed pricing strategies for insurance providers.
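For reference, the three evaluation metrics named above can be written out directly. The sketch below uses the standard definitions of MAE, RMSE, and R² in plain Python; the function names and the sample values are illustrative assumptions, not the project's data or code.

```python
import math
from typing import Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error: average size of the prediction errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error: penalizes large errors more than MAE."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - residual variance / total variance."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical charges in thousands of dollars, for illustration only.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 6.0]

print(mae(actual, predicted), rmse(actual, predicted), r2(actual, predicted))
```

Because RMSE squares each error before averaging, a single large miss moves it more than it moves MAE, which is why the two are usually reported together.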

$32K vs $8.4K Smokers vs Non-Smokers
0.88 R² Score
$4.6K RMSE
Key Insights
  • Smoking is the strongest cost driver: smokers incur roughly 3.8x the medical expenses of non-smokers ($32,050 vs $8,440), making smoking status the most influential predictor in the model.
  • Age and BMI show positive correlations with healthcare costs, with age having a stronger relationship (r ≈ 0.30) compared to BMI (r ≈ 0.20), indicating increasing costs with aging and higher body mass.
  • The Random Forest model achieved an R² score of 0.88, demonstrating high predictive accuracy and strong capability in capturing complex cost patterns.
  • Prediction error (RMSE ≈ $4,647) indicates the model can estimate insurance costs with relatively low deviation compared to actual charges.
  • Minimal gender-based cost variation was observed, suggesting that lifestyle and health factors (like smoking and BMI) are far more significant drivers than demographic attributes like gender.
  • Regional differences had limited impact on costs after encoding, indicating that individual health and behavior factors dominate geographic influence.
Python, Machine Learning, Random Forest, EDA, Feature Engineering, Scikit-learn, Seaborn, Matplotlib
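The group comparison and correlation figures in the insights come from two simple calculations, sketched here in plain Python. The records below are hypothetical and far smaller than the real dataset; `mean_by_group` and `pearson_r` are illustrative helper names. Only the last line uses numbers quoted in the insights.

```python
import math
from typing import Dict, List, Sequence


def mean_by_group(records: List[dict], key: str, value: str) -> Dict[str, float]:
    """Average `value` for each distinct `key` (e.g. mean charges per smoker flag)."""
    totals: Dict[str, float] = {}
    counts: Dict[str, int] = {}
    for record in records:
        group = record[key]
        totals[group] = totals.get(group, 0.0) + record[value]
        counts[group] = counts.get(group, 0) + 1
    return {group: totals[group] / counts[group] for group in totals}


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical records, for illustration only.
records = [
    {"age": 25, "smoker": "yes", "charges": 30000.0},
    {"age": 45, "smoker": "yes", "charges": 34000.0},
    {"age": 30, "smoker": "no", "charges": 6000.0},
    {"age": 60, "smoker": "no", "charges": 10000.0},
]

means = mean_by_group(records, "smoker", "charges")
print(means["yes"] / means["no"])
print(pearson_r([r["age"] for r in records], [r["charges"] for r in records]))

# Ratio of the group averages quoted in the insights ($32,050 vs $8,440).
print(round(32050 / 8440, 2))
```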

Optimizing Oncology Operations

Data-Driven Performance Analysis

This project analyzes operational and financial performance at a multi-location oncology practice to uncover the key drivers of patient volume, revenue, and provider efficiency. Using Power BI dashboards and KPI-based analysis, it evaluates how staffing changes, specifically the addition of a new oncologist and the retirement of two providers, affected overall clinic performance. The analysis focuses on four healthcare KPIs: patient volume (unique patients and visits), new patient growth, drug administration activity, and payer mix and revenue contribution. By combining time-series analysis, provider-level benchmarking, and location-based insights, the project identifies performance gaps, growth opportunities, and strategic recommendations for improving both clinical operations and financial outcomes. The dashboard lets stakeholders monitor monthly and quarterly trends in patients and revenue, compare provider performance against targets, analyze location efficiency and revenue generation, and evaluate the impact of staffing changes on business performance.

$14.5M Total Annual Revenue
8,267 Unique Patients
48K Total Visits
Key Insights
  • Provider 2 led overall performance, generating ~$1.8M in revenue and treating the highest number of patients, making them the benchmark for operational efficiency.
  • Provider 5 drove the highest new patient growth (653 patients), contributing significantly to future revenue pipeline and practice expansion.
  • The addition of Provider 1 partially offset the loss of two retiring providers, but a noticeable drop in patient volume followed their departure, highlighting dependency on high-performing physicians.
  • Location 1 dominated in patient volume and revenue (~$7.28M), while Location 3 achieved the highest revenue per patient, suggesting higher-value treatments or payer mix differences.
  • Medicare accounted for ~64% of revenue, indicating reliance on lower-reimbursement payers and potential opportunity to optimize payer mix.
  • Significant variation in provider performance vs targets (new patients & chemo administrations) reveals inefficiencies and opportunities for standardizing best practices across providers.
Power BI, Healthcare Analytics, DAX, Time Series Analysis, Operational Analytics
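A few derived KPIs follow directly from the headline figures above. The arithmetic below is a rough sketch in Python rather than the DAX used in the dashboard. The inputs are the rounded values quoted on this page (48K visits, ~$7.28M, ~64%), so the results are approximate, and the variable names are illustrative.

```python
# Headline figures as quoted above (rounded in the source).
total_revenue = 14_500_000      # total annual revenue, USD
unique_patients = 8_267
total_visits = 48_000           # quoted as "48K"
location_1_revenue = 7_280_000  # quoted as "~$7.28M"
medicare_share = 0.64           # quoted as "~64%"

# Derived KPIs.
revenue_per_patient = total_revenue / unique_patients
visits_per_patient = total_visits / unique_patients
location_1_share = location_1_revenue / total_revenue
medicare_revenue = medicare_share * total_revenue

print(f"Revenue per unique patient: ${revenue_per_patient:,.0f}")
print(f"Visits per unique patient:  {visits_per_patient:.1f}")
print(f"Location 1 revenue share:   {location_1_share:.1%}")
print(f"Medicare revenue:           ${medicare_revenue:,.0f}")
```

Ratios like these make locations and providers of different sizes comparable, which is the idea behind the revenue-per-patient comparison for Location 3.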

Decoding Airline Passenger Satisfaction

Data-Driven Customer Experience Analysis

This Tableau analytics project investigates the key drivers of airline passenger satisfaction using verified airline review data. It combines exploratory data analysis, correlation analysis, and interactive dashboards to identify how seat comfort, cabin service, food quality, in-flight entertainment, and delays influence the overall passenger experience. The analysis examines passenger feedback across aircraft types, cabin classes, and travel routes to uncover patterns in satisfaction and complaint behavior. The dashboard lets users compare satisfaction across Economy, Premium Economy, and Business cabins, identify which service factors most strongly influence passenger ratings, analyze complaint patterns and service gaps, and explore correlations between comfort metrics and customer loyalty.

45% Complaints from Delays
IFE & Seat Comfort Top Satisfaction Correlates
Economy Cabin Lowest Satisfaction
Key Insights
  • 45% of passenger complaints were associated with flight delays and operational inefficiencies, highlighting operational reliability as the largest driver of dissatisfaction.
  • In-flight entertainment (IFE) and seat comfort showed the highest correlation with overall satisfaction scores, indicating that cabin experience significantly impacts passenger perception.
  • Economy class passengers reported the lowest satisfaction levels, particularly on large aircraft such as the A380, where seat comfort ratings were substantially lower.
  • Seat comfort demonstrated a strong positive correlation with Net Promoter Score (r ≈ 0.72), suggesting that improvements in seating experience could significantly increase passenger loyalty.
  • Business class passengers showed nearly 3× higher retention intent, indicating premium service quality plays a critical role in long-term customer loyalty.
Tableau, EDA, Correlation Analysis, NPS Analysis, Customer Analytics, Dashboard Design
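Net Promoter Score, referenced in the insights above, has a simple standard definition: the percentage of promoters (scores 9 to 10) minus the percentage of detractors (scores 0 to 6) on a 0 to 10 "would you recommend" scale. The Python sketch below assumes the review data carries such a score, which this page does not confirm. The responses are hypothetical and the function name is illustrative.

```python
from typing import Dict, List


def net_promoter_score(scores: List[int]) -> float:
    """NPS = % promoters (9-10) minus % detractors (0-6), on a -100..100 scale."""
    if not scores:
        raise ValueError("at least one score is required")
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100.0 * (promoters - detractors) / len(scores)


# Hypothetical 0-10 recommendation scores per cabin, for illustration only.
responses: Dict[str, List[int]] = {
    "Economy": [3, 6, 7, 9, 5],
    "Business": [9, 10, 8, 10, 6],
}

nps_by_cabin = {cabin: net_promoter_score(s) for cabin, s in responses.items()}
print(nps_by_cabin)
```

Passives (scores of 7 or 8) count toward the denominator but toward neither group, so a cabin can have a high average rating and still a modest NPS.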

Verified Credentials

Industry certifications validating expertise in ETL, machine learning, and data visualization.

Alteryx Designer Core Certification
Alteryx
Verified
ALTX

Alteryx Academy

Demonstrates proficiency in building end-to-end ETL workflows, data blending across multiple sources, spatial analytics, and predictive tool deployment within the Alteryx Designer platform.

Issued October 24, 2024
Expires October 24, 2026
ETL Workflows, Data Blending, Predictive Tools, Spatial Analytics
Alteryx Machine Learning Fundamentals
Alteryx
Verified
ALML

Alteryx Academy

Covers model building, evaluation metrics, and deployment using Alteryx Intelligence Suite. Focuses on practical ML workflows including classification, regression, clustering, and time-series forecasting — all without writing code.

Platform Alteryx Intelligence Suite
Level Fundamentals
Model Building, Evaluation Metrics, Classification, Clustering, Forecasting
Tableau Desktop Foundations
Tableau
Verified
TBLO

Tableau / Salesforce

Validates skills in data visualization, dashboard design, and data storytelling using Tableau Desktop. Covers connecting to data sources, building interactive charts, Level of Detail (LOD) expressions, and publishing to Tableau Server/Public.

Issued August 07, 2025
Credential ID 6653705
Data Visualization, LOD Expressions, Dashboard Design, Data Storytelling

Have a Data Challenge
Worth Solving?

From predictive models to executive dashboards — let's figure out the right solution together.

Get In Touch