Project summary

The goal of this project is to visualize the Broadway data set with Tableau and extract useful insights from the dashboard.

Personally I am a Broadway fan, every time I told people this hobby, their first response is "Hamilton". Yes I would like to watch it live if I have 500 dollars. However, despite its high price, it's still hard to secure a ticket. Richard Rogers Theatre must be the most profitable theatre since it can make tons of money out of Hamilton. Is this true? So I downloaded a broadway dataset, which contains all the broadway shows information ranging from 1990 to 2016. Plus, I'm also interested in early 2000s shows. Things must be very different 20 years ago.
Then I divided this project into two parts:
  • data cleaning with Python
  • data visualization with Tableau

Data Cleaning

I don't expect my data to be perfect, so I inspected it first. Here are the steps I performed during data cleaning: First I imported pandas and read the dataset as dataframe. Then I get a glimpse of the first several rows and printed the summary of the data. The data has 12 columns and I'm only going to use a few of them. Since I'm interested in finding the most profitable theatre, so I will use capacity or gross potential as the key metric.

Capacity means the percentage of seats that are filled with customers for that particular week. If the tickets sell very well (like Hamilton, they even sell standing room tickets), then the capacity would be very high, even a number higher than 100. But if the tickets don't sell well, then we might expect a capacity of 50, 60. However, some theatres are very small (they only has two floors). Then these theatres can also get a high capacity because it only can hold a small amount of people. When I go to see the play 'Angels in America'(Neil Simon Theatre), not all the seats are filled. That doesn't mean this is not a good play, a part of the reason is that the theatre is quite huge (three floors and 'Cats' was in there for decades). So I don't think using capacity as the key metric is a good idea.

On the other hand, gross potential would be a perfect metric. First, gross means gross profit for that particular week. And gross potential here means the week over week change in gross profit. So I am going to use this as a metric to see if a theatre is profitable.

Then after I inspecting the data, I deleted some incorrect data: such as rows with zero performance that week and over 120 capacity observations. I also checked for missing values and removed duplicate rows.

Since I'm interested in shows that run over a year (running longer means the theatre is more stable), I applied a filter to filter only those fit the requirement. Then I sort over their average gross potential and get a list of theatres with top 10 average gross potential.

Data Visualization

Then here comes my Tableau visualization. You can view it below or go to this link.