A quick take on Capital BikeShare data

What’s even more fun than riding a BikeShare bike down the Mall on a sunny afternoon? Analyzing the data about BikeShare bikes! It’s available to the public and easy to get, and BikeShare actively encourages the curious to kick the tires and look under the hood (er, basket) of their data: ergo, the existence of Bikeshare Hack Night, which combines the panache of data science with the dernier cri of The Black Cat.

We looked at data from the 4th quarter of 2017. Here’s what we found:

How long were most bike trips?

  • There were 815,370 bike trips during this time period.
  • The average trip was about 16.5 minutes long, and 75% of trips were shorter than about 18 minutes. That makes sense: if you’re planning a long trip, you probably don’t want to use a BikeShare bike (they’re heavy and slow, and you have to pay for trips longer than 30 minutes).
  • The shortest trip was 1 minute, and the longest trip was about 1,438 minutes (just under 24 hours) long!

Summary statistics for trip length, in minutes:

count    815370.000000
mean         16.561572
std          31.220295
min           1.000417
25%           6.339492
50%          10.736758
75%          18.121937
max        1437.924233
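
A summary like the one above can be reproduced with a few lines of Python. This is a minimal sketch that uses only the standard library, not necessarily how the numbers in this post were produced. It assumes that trip durations arrive in seconds (so they need converting to minutes); the function name and the sample values are made up for illustration.

```python
import statistics


def summarize_minutes(durations_seconds):
    """Return describe()-style summary statistics for trip lengths, in minutes."""
    # Assumption: raw durations are recorded in seconds.
    minutes = [seconds / 60 for seconds in durations_seconds]
    # "inclusive" treats the data as the whole population and interpolates linearly.
    q1, median, q3 = statistics.quantiles(minutes, n=4, method="inclusive")
    return {
        "count": len(minutes),
        "mean": statistics.fmean(minutes),
        "std": statistics.stdev(minutes),
        "min": min(minutes),
        "25%": q1,
        "50%": median,
        "75%": q3,
        "max": max(minutes),
    }


# Made-up durations (in seconds), for illustration only.
sample_seconds = [60, 300, 600, 900, 1200]
summary = summarize_minutes(sample_seconds)
for name, value in summary.items():
    print(f"{name:<6}{value:>12.4f}")
```

Comparing the mean with the 50% row is a quick way to spot a skewed distribution: in the real data the mean (about 16.6 minutes) sits well above the median (about 10.7 minutes), which hints at a long tail of lengthy trips.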

Which stations saw the most trip starts?

The bike dock at Columbus Circle was the clear winner, with over 16,000 bike trips starting from this location. That makes sense: hop off the train, and get on your bike! The other stations in the top 10 didn’t even come close to the Columbus Circle bike dock. Most of them were clustered downtown, with the outliers being the New Hampshire Ave and Eastern Market stations.
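
Ranking stations by trip starts boils down to counting how often each station name appears as a starting point. Here is a small sketch of that idea, assuming each trip is a record with a `start_station` field; that field name and the sample trips are hypothetical, not taken from the actual data file.

```python
from collections import Counter


def top_start_stations(trips, n=10):
    """Return the n stations with the most trip starts as (station, count) pairs."""
    # Assumption: each trip is a dict with a "start_station" key.
    counts = Counter(trip["start_station"] for trip in trips)
    return counts.most_common(n)


# Made-up trips, for illustration only.
sample_trips = [
    {"start_station": "Station A"},
    {"start_station": "Station B"},
    {"start_station": "Station A"},
    {"start_station": "Station C"},
    {"start_station": "Station A"},
    {"start_station": "Station B"},
]

top_two = top_start_stations(sample_trips, n=2)
for station, count in top_two:
    print(f"{station}: {count} trips")
```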

[Figure: the 10 stations with the most trip starts]

What did the distribution of trip lengths look like?

So we know that the average trip was about 16 minutes. Let’s visualize that: most trips are shorter than the mean, but there’s a long tail of trips that run as long as 60 minutes. (For this visualization, we removed the outliers longer than 60 minutes; they were a tiny fraction of the whole dataset.)
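
The filtering and binning behind a histogram like this can be sketched in a few lines. The example below drops trips longer than a cutoff, groups the rest into fixed-width bins, and reports what share of trips was dropped. The 5-minute bin width, the function name, and the sample values are assumptions made for illustration.

```python
from collections import Counter


def histogram_under_cutoff(minutes, cutoff=60, bin_width=5):
    """Bin trip lengths up to `cutoff` minutes; also return the share of trips dropped."""
    kept = [m for m in minutes if m <= cutoff]
    # Label each bin by its lower edge: 0 covers [0, 5), 5 covers [5, 10), and so on.
    bins = Counter(int(m // bin_width) * bin_width for m in kept)
    dropped_share = 1 - len(kept) / len(minutes)
    return dict(sorted(bins.items())), dropped_share


# Made-up trip lengths in minutes, for illustration only.
sample_minutes = [2.5, 4, 7, 11, 12, 18, 45, 130]
bins, dropped_share = histogram_under_cutoff(sample_minutes)

for lower_edge, count in bins.items():
    print(f"{lower_edge:>3}-{lower_edge + 5:<3} {'#' * count}")
print(f"Share of trips dropped as outliers: {dropped_share:.1%}")
```

Reporting the dropped share alongside the chart makes it easy to check that the outliers really are a small fraction of the data.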

[Figure: histogram of trip lengths, for trips of 60 minutes or less]

Do frequently-used stations have shorter trips?

We guessed that stations with heavy use (like the one at Columbus Circle) would probably have shorter trips originating from that spot. We visualized this using a scatter plot (y is the average length of trips from a station, and x is the number of trips). Most stations were clustered together: they had fewer than 5,000 trips, and their trips averaged about 12 minutes. We can see that Columbus Circle is a huge outlier on the far right, with over 16,000 trips, but the average length of those trips is right at the overall mean (about 16 minutes). There are also a few outliers at the top, suggesting that stations with a high average trip length tend to have fewer riders. All in all, the scatter plot did not confirm our guess that high-frequency stations have shorter trips. Instead, it suggested that trips from high-frequency stations are about the average length of all trips. When you think about it, that makes sense, since the average trip length of the whole dataset is heavily influenced by the average trip length of the high-frequency stations.
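
The last point deserves a closer look: the overall mean is a weighted average of the per-station means, with each station weighted by its number of trips, so busy stations pull it toward their own average. The sketch below computes the two values plotted for each station (trip count and average trip length) and checks that weighting on made-up data. The station names and numbers are invented for illustration.

```python
from collections import defaultdict


def station_stats(trips):
    """Map each start station to (number of trips, average trip length in minutes)."""
    totals = defaultdict(lambda: [0, 0.0])  # station -> [trip count, total minutes]
    for station, minutes in trips:
        totals[station][0] += 1
        totals[station][1] += minutes
    return {station: (n, total / n) for station, (n, total) in totals.items()}


# Made-up (station, minutes) pairs: one busy station, one quiet station.
sample = [("Busy", 10), ("Busy", 14), ("Busy", 12), ("Busy", 12), ("Quiet", 30)]
stats = station_stats(sample)

overall_mean = sum(minutes for _, minutes in sample) / len(sample)
# Weight each station's mean by its trip count.
weighted_mean = (
    sum(n * mean for n, mean in stats.values()) / sum(n for n, _ in stats.values())
)

print(stats)
print(f"overall mean: {overall_mean:.1f}, weighted mean of station means: {weighted_mean:.1f}")
```

In this toy example the busy station averages 12 minutes and the quiet one 30, yet the overall mean lands at 15.6, much closer to the busy station. Each (count, mean) pair in `stats` corresponds to one point on a scatter plot like the one described above.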

[Figure: scatter plot of average trip length (y) against number of trips (x), one point per station]
