My Projects
Selected Work
How Much Is Your House Worth?
Data-Driven Real Estate Prediction
This project builds a machine-learning web application that predicts housing prices in Bengaluru from key property features: location, square footage, number of bedrooms (BHK), and bathrooms. Working from a dataset of over 13,000 real estate listings, it covers an end-to-end data science workflow: data preprocessing (cleaning location data, handling missing values, and removing irrelevant features), feature engineering (extracting BHK from textual data and one-hot encoding categorical variables), outlier detection and removal, model building with regression algorithms, and deployment behind a Flask API that serves real-time predictions. The final system lets users enter property details and receive an instant price prediction through a web interface.
- Location is the most influential factor, with price variations significantly driven by neighborhood demand and accessibility (modeled using one-hot encoding of 200+ locations).
- Price rises with square footage, making total area one of the strongest continuous predictors in the model.
- Number of bedrooms (BHK) and bathrooms directly impact pricing, with higher configurations consistently associated with premium valuations.
- Data cleaning improved model reliability, including removal of irrelevant features like availability and balcony, and standardization of location names to eliminate inconsistencies.
- The deployed Flask API enables real-time predictions, allowing users to dynamically estimate house prices based on custom inputs.
- The model architecture supports scalable deployment, with serialized model artifacts (pickle) and structured feature inputs enabling efficient inference.
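The feature engineering and structured-input steps described above can be sketched in a few lines. This is a minimal illustration, not the project's actual code: the helper names `parse_bhk` and `build_feature_vector`, the column order, and the three-entry location list are all assumptions made for the example (the real model encodes 200+ locations).

```python
import re

def parse_bhk(size_text):
    """Extract the bedroom count from text such as '2 BHK' or '4 Bedroom'."""
    match = re.match(r"\s*(\d+)", size_text)
    return int(match.group(1)) if match else None

def build_feature_vector(sqft, bath, bhk, location, known_locations):
    """Return [sqft, bath, bhk, one-hot location...] in a fixed column order."""
    one_hot = [1 if location == name else 0 for name in known_locations]
    return [sqft, bath, bhk] + one_hot

# Hypothetical location list for illustration only.
LOCATIONS = ["Indiranagar", "Whitefield", "Rajaji Nagar"]

features = build_feature_vector(1200, 2, parse_bhk("2 BHK"), "Whitefield", LOCATIONS)
print(features)  # [1200, 2, 2, 0, 1, 0]
```

In this sketch a location that is not in the list produces an all-zero one-hot segment, which is one simple way to treat rare or unseen neighborhoods as a shared "other" bucket. A vector in this fixed order is the kind of structured input a serialized regression model could consume at inference time.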
What Drives Medical Costs?
Machine Learning Analysis
This project applies machine learning and exploratory data analysis (EDA) to predict individual healthcare insurance costs from demographic and lifestyle factors. Using a dataset of over 1,300 insurance records, the analysis identifies how age, BMI, smoking habits, family size, and region affect medical expenses. The workflow includes data cleaning and preprocessing (handling duplicates, encoding categorical variables), exploratory data analysis to uncover patterns in healthcare costs, feature engineering using label encoding and one-hot encoding, model building with Random Forest regression, and evaluation using MAE, RMSE, and R². The final model provides accurate cost predictions, supporting better financial planning for individuals and more informed pricing strategies for insurance providers.
- Smoking is the strongest cost driver, with smokers incurring roughly 3.8x the medical expenses of non-smokers ($32,050 vs $8,440), making it the most influential predictor in the model.
- Age and BMI show positive correlations with healthcare costs, with age having a stronger relationship (r ≈ 0.30) compared to BMI (r ≈ 0.20), indicating increasing costs with aging and higher body mass.
- The Random Forest model achieved an R² score of 0.88, demonstrating high predictive accuracy and strong capability in capturing complex cost patterns.
- Prediction error (RMSE ≈ $4,647) indicates the model can estimate insurance costs with relatively low deviation compared to actual charges.
- Minimal gender-based cost variation was observed, suggesting that lifestyle and health factors (like smoking and BMI) are far more significant drivers than demographic attributes like gender.
- Regional differences had limited impact on costs after encoding, indicating that individual health and behavior factors dominate geographic influence.
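For readers less familiar with the evaluation metrics named above, the sketch below shows how MAE, RMSE, and R² are computed from actual and predicted charges. The four data points are made-up toy values chosen for easy arithmetic, not records from the insurance dataset, and `regression_metrics` is a hypothetical helper name. The last lines reproduce the smoker cost ratio from the two figures quoted in the findings.

```python
import math

def regression_metrics(actual, predicted):
    """Return (MAE, RMSE, R^2) for paired lists of actual and predicted values."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)          # residual sum of squares
    rmse = math.sqrt(ss_res / n)
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)  # total sum of squares
    r2 = 1 - ss_res / ss_tot
    return mae, rmse, r2

# Toy values for illustration only (not from the project dataset).
actual = [1000.0, 2000.0, 3000.0, 4000.0]
predicted = [1100.0, 1900.0, 3200.0, 3800.0]

mae, rmse, r2 = regression_metrics(actual, predicted)
print(mae, round(rmse, 2), round(r2, 2))  # 150.0 158.11 0.98

# Ratio implied by the smoker vs non-smoker figures quoted above.
smoker_ratio = 32050 / 8440
print(round(smoker_ratio, 2))  # 3.8
```

RMSE penalizes large misses more heavily than MAE, which is why the two are usually reported together.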
Optimizing Oncology Operations
Data-Driven Performance Analysis
This project analyzes operational and financial performance at a multi-location oncology practice to uncover key drivers of patient volume, revenue, and provider efficiency. Using Power BI dashboards and KPI-based analysis, the project evaluates how changes in staffing — specifically the addition of a new oncologist and the retirement of two providers — impacted overall clinic performance. The analysis focuses on four critical healthcare KPIs: patient volume (unique patients & visits), new patient growth, drug administration activity, and payer mix and revenue contribution. By combining time-series analysis, provider-level benchmarking, and location-based insights, the project identifies performance gaps, growth opportunities, and strategic recommendations for improving both clinical operations and financial outcomes. The dashboard enables stakeholders to monitor monthly and quarterly trends in patients and revenue, compare provider performance against targets, analyze location efficiency and revenue generation, and evaluate the impact of staffing changes on business performance.
- Provider 2 led overall performance, generating ~$1.8M in revenue and treating the highest number of patients, making them the benchmark for operational efficiency.
- Provider 5 drove the highest new patient growth (653 patients), contributing significantly to future revenue pipeline and practice expansion.
- The addition of Provider 1 partially offset the loss of two retiring providers, but a noticeable drop in patient volume followed their departure, highlighting dependency on high-performing physicians.
- Location 1 dominated in patient volume and revenue (~$7.28M), while Location 3 achieved the highest revenue per patient, suggesting higher-value treatments or payer mix differences.
- Medicare accounted for ~64% of revenue, indicating reliance on lower-reimbursement payers and potential opportunity to optimize payer mix.
- Significant variation in provider performance vs targets (new patients & chemo administrations) reveals inefficiencies and opportunities for standardizing best practices across providers.
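Two of the KPIs above, revenue per patient by location and payer mix share, reduce to simple ratios. The sketch below illustrates the arithmetic only. The location names, revenue, patient counts, and the non-Medicare payer split are invented for the example and are not the practice's data, and the function names are hypothetical.

```python
def revenue_per_patient(revenue, unique_patients):
    """Average revenue generated per unique patient."""
    return revenue / unique_patients

def payer_mix(revenue_by_payer):
    """Return each payer's share of total revenue as a fraction of 1."""
    total = sum(revenue_by_payer.values())
    return {payer: amount / total for payer, amount in revenue_by_payer.items()}

# Made-up numbers for illustration only.
locations = {
    "Location A": {"revenue": 500_000, "patients": 1_000},
    "Location B": {"revenue": 200_000, "patients": 250},
}

per_patient = {
    name: revenue_per_patient(data["revenue"], data["patients"])
    for name, data in locations.items()
}
print(per_patient)  # {'Location A': 500.0, 'Location B': 800.0}

shares = payer_mix({"Medicare": 640, "Commercial": 300, "Other": 60})
print(round(shares["Medicare"], 2))  # 0.64
```

The toy numbers mirror the pattern in the findings: the location with the most volume and total revenue is not necessarily the one with the highest revenue per patient.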
Decoding Airline Passenger Satisfaction
Data-Driven Customer Experience Analysis
This Tableau analytics project investigates the key drivers of airline passenger satisfaction using verified airline review data. The analysis combines exploratory data analysis, correlation analysis, and interactive dashboards to identify how factors such as seat comfort, cabin service, food quality, in-flight entertainment, and delays influence overall passenger experience. Using Tableau dashboards and statistical exploration, the project analyzes passenger feedback across aircraft types, cabin classes, and travel routes to uncover patterns in satisfaction and complaint behavior. The dashboard enables users to compare satisfaction across Economy, Premium Economy, and Business cabins, identify which service factors most strongly influence passenger ratings, analyze complaint patterns and service gaps, and explore correlations between comfort metrics and customer loyalty.
- 45% of passenger complaints were associated with flight delays and operational inefficiencies, highlighting operational reliability as the largest driver of dissatisfaction.
- In-flight entertainment (IFE) and seat comfort showed the highest correlation with overall satisfaction scores, indicating that cabin experience significantly impacts passenger perception.
- Economy class passengers reported the lowest satisfaction levels, particularly on large aircraft such as the A380, where seat comfort ratings were substantially lower.
- Seat comfort demonstrated a strong positive correlation with Net Promoter Score (r ≈ 0.72), suggesting that improvements in seating experience could significantly increase passenger loyalty.
- Business class passengers showed nearly 3× higher retention intent, indicating premium service quality plays a critical role in long-term customer loyalty.
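The correlation figures above are Pearson coefficients. The project computed them in Tableau. The sketch below shows the same calculation in plain Python as an illustration. The two rating lists are toy values invented for the example, not review data, and `pearson_r` is a hypothetical helper name.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Toy ratings for illustration only (1-5 scale).
seat_comfort = [1, 2, 3, 4, 5]
recommend_score = [2, 1, 4, 3, 5]

r = pearson_r(seat_comfort, recommend_score)
print(round(r, 2))  # 0.8
```

A value near 1 means the two ratings rise together. It does not by itself show that better seats cause higher loyalty.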
Verified Credentials
Industry certifications validating expertise in ETL, machine learning, and data visualization.
Alteryx Designer Core
Alteryx Academy
Demonstrates proficiency in building end-to-end ETL workflows, data blending across multiple sources, spatial analytics, and predictive tool deployment within the Alteryx Designer platform.
Alteryx Machine Learning Fundamentals
Alteryx Academy
Covers model building, evaluation metrics, and deployment using Alteryx Intelligence Suite. Focuses on practical ML workflows including classification, regression, clustering, and time-series forecasting, all without writing code.
Tableau Desktop Foundations
Tableau / Salesforce
Validates skills in data visualization, dashboard design, and data storytelling using Tableau Desktop. Covers connecting to data sources, building interactive charts, Level of Detail (LOD) expressions, and publishing to Tableau Server/Public.
Have a Data Challenge Worth Solving?
From predictive models to executive dashboards — let's figure out the right solution together.
Get In Touch