Project summary

The goal of this project is to build a model that predicts the tip amount for a new ride-sharing company in NYC, based on the New York taxi data. The report consists of three parts:
  • Data exploration and cleaning
  • Data summary
  • Model building


Data

The required data is stored in the data folder. It was downloaded from https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page, under 2017 Yellow Taxi Trip Records, for March, June and November. The whole dataset consists of approximately 3 million observations. The accompanying data dictionary describes the data set.


Result

The model is a linear regression, and it achieves an R-squared of 0.75. (The R-squared may differ if you are running R 3.6.0.)
The code for this project is available here.
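
For readers unfamiliar with the metric, here is a minimal sketch of how a linear regression is fitted and how R-squared is computed from its predictions. It is not the project's code: the project appears to use R, while this sketch is plain Python with a single predictor. The function names and the fare and tip values are made up for illustration and are not taken from the taxi data.

```python
def fit_simple_linear_regression(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squared deviations of x, and cross-deviations of x and y.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def r_squared(y, y_pred):
    """Share of the variance in y explained by the predictions."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, y_pred))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Hypothetical values for illustration only (not real trip records).
fare_amount = [5.0, 10.0, 15.0, 20.0, 25.0]
tip_amount = [1.0, 2.5, 2.0, 4.5, 5.0]

intercept, slope = fit_simple_linear_regression(fare_amount, tip_amount)
predicted = [intercept + slope * fare for fare in fare_amount]
score = r_squared(tip_amount, predicted)
print(f"intercept={intercept:.3f} slope={slope:.3f} r_squared={score:.3f}")
```

An R-squared of 0.75 means that the model's predictions account for about three quarters of the variance in the observed tip amounts.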


Visualization

A Tableau visualization of this project is available here.