Project summary
The goal of this project is to build a model that predicts the tip amount for a new ride-sharing company in NYC, using New York City taxi trip data.
The report consists of three parts:
- Data exploration and cleaning
- Data summary
- Model building
Data
The required data is stored in the data folder. It was downloaded from https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page (2017 Yellow Taxi Trip Records for March, June and November). The combined dataset contains approximately 3 million observations.
It is accompanied by a data dictionary that describes the dataset.
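One detail from the TLC data dictionary matters for a tip model: tip_amount is recorded automatically only for credit card tips, and cash tips are not included (payment_type 1 is credit card, 2 is cash). The sketch below shows, under stated assumptions, one way to combine the three monthly files and keep only credit-card trips. It is not the project's cleaning code (that is in the linked code); the function name credit_card_trips and the rule that drops negative tips and non-positive fares are hypothetical, and the inline CSV rows are toy values, not real trips.

```python
import csv
import io
from typing import Iterable, Iterator

# payment_type code for credit card, per the TLC yellow taxi data dictionary
CREDIT_CARD = "1"


def credit_card_trips(monthly_rows: Iterable[Iterable[dict]]) -> Iterator[dict]:
    """Yield credit-card trips from several months of rows (hypothetical filter)."""
    for rows in monthly_rows:  # one iterable of rows per month
        for row in rows:
            # Cash tips are not recorded, so non-credit-card trips are dropped.
            if row["payment_type"] != CREDIT_CARD:
                continue
            # Hypothetical sanity rule: drop negative tips and non-positive fares.
            if float(row["tip_amount"]) < 0 or float(row["fare_amount"]) <= 0:
                continue
            yield row


# Toy stand-ins for two monthly CSV files (only three columns shown).
march = io.StringIO("payment_type,fare_amount,tip_amount\n1,10.0,2.0\n2,8.0,0\n")
june = io.StringIO("payment_type,fare_amount,tip_amount\n1,-5.0,0\n1,12.5,3.0\n")
kept = list(credit_card_trips([csv.DictReader(march), csv.DictReader(june)]))
print(len(kept))
```

Using a generator keeps memory flat, which matters at roughly 3 million rows; with the real files, each csv.DictReader would wrap an open file handle instead of a StringIO.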
Result
The model is a linear regression, and it achieves an R-squared of 0.75. (The R-squared value may differ if you run the code with R 3.6.0.)
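R-squared measures the share of the variance in the tip amount that the model explains: R² = 1 − SS_res / SS_tot, where SS_res is the sum of squared residuals and SS_tot is the sum of squared deviations of the observed tips from their mean. The sketch below illustrates that formula with a single-predictor least-squares fit in plain Python. It is not the project's model: the choice of fare_amount as the only predictor, the helper names fit_simple_ols and r_squared, and the numbers are all hypothetical toy values.

```python
from typing import Sequence, Tuple


def fit_simple_ols(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = intercept + slope * x (one predictor)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def r_squared(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """R^2 = 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, y_hat))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Hypothetical toy data: fare amounts and tips in dollars.
fares = [5.0, 10.0, 15.0, 20.0]
tips = [1.0, 2.5, 3.0, 4.5]
intercept, slope = fit_simple_ols(fares, tips)
predicted = [intercept + slope * f for f in fares]
print(round(r_squared(tips, predicted), 3))
```

With several predictors the fit would normally use a matrix solver, but R² is computed the same way from the predictions.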
You can find the code for this project here.
Visualization
A Tableau visualization of this project is available here.