More Cool Public Datasets and Lots of Ideas for Exploring Them

A few weeks ago, Google announced a new feature, Dataset Search, that aims to make it easier for anyone to discover public datasets. Journalists, analysts, and data nerds will likely find Dataset Search useful: it uses metadata to help people discover different versions of a dataset and publications that reference it.

In the same spirit of encouraging data discovery and exploration, today we’re once again featuring five interesting public datasets that live in the Mode Public Warehouse. To take them for a spin in Mode, sign up for a free account and start exploring.

To help you get started, below we’ve provided a brief description of each dataset, along with some questions you might ask and interesting visualizations you could make for each. Let’s dive in!


NBA data, 2017-18 season

Sports and data viz. It’s a classic pairing by now.

That’s why there’s a LOT of sports data to explore in the Mode Public Warehouse. For all you basketball fans out there, we’ve pulled together some of the most interesting data about the 2017-18 NBA season. The tables listed below will give you a window into player stats, salaries, Elo ratings (which measure the strength of a team’s record), and even players’ activity on Twitter.

What kinds of questions can I ask?

  • Are there differences between positions (point guard, center, and power forward) in number of free throws, field goals, and assists?
  • What are some of the highest-paid positions in the league?
  • Does Elo rating correlate with team attendance? How have Elo ratings for a given team changed over time? For example, for the Sacramento Kings during the “Greatest Show on Court” era?
  • What are the odds that a particular team will win based on previous performance?
  • Which players are the most popular on Twitter? How does popularity relate to their performance, efficiency, or pay?
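
If you want to play with the odds question above, here’s a minimal Python sketch of the standard Elo expected-score formula. It assumes you’ve already pulled two teams’ current ratings out of the Elo table. The function name is ours, and the sketch ignores adjustments such as home-court advantage that a published NBA Elo model may apply.

```python
def elo_win_probability(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score for team A against team B.

    Assumption: both ratings are on the usual 400-point Elo scale and no
    home-court or rest adjustment is applied.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# Hypothetical ratings, purely for illustration.
print(round(elo_win_probability(1600, 1500), 2))
```

With this formula, evenly matched teams come out at 50/50, and a 100-point rating edge works out to roughly a 64% chance of winning.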

What are the tables called?
victor_low.nbaallelo, marquette_mscs.duo_player_shooting_stats_17_18, marquette_mscs.nba_2017_players_stats_combined, marquette_mscs.duo_player_salary_17_18, jlburkhead.nba_stats, ahill3233.nba_2016_regular_season

What kinds of cool visualizations can I make?

  • Playoff shot charts, like the one above for Ben Simmons, or this one for LeBron James.
  • Density plots and hexbins for fancier shot charts.
  • Histograms or pairwise scatterplots comparing player stats.


Hot springs & swimming hole data

Now that fall is setting in here in the US, you might be wistfully putting away your swimming gear til next season rolls around. But wait! When it’s cold outside, make like these monkeys and find a body of hot water to lounge in. This hot springs and swimming hole dataset has you covered.

You can also plan your next warm-weather vacation using this dataset to explore where to find swimming holes in the US. Both tables include latitude and longitude data, so you can create interactive maps to share with your friends. Have fun and send us a postcard.

What kinds of questions can I ask?

  • Where can I find natural hot springs near my home? Or near my next vacation spot?
  • Just how hot are these hot springs? How cool are these swimming holes? (Temperature data is included where available, but proceed with caution, because some of them are over 200 degrees Fahrenheit.)
  • How many swimming holes are in my home state and where are they?
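
For the “near my home” questions, one option is to compute great-circle distances from the latitude and longitude columns. Here’s a rough Python sketch using the haversine formula. It assumes you’ve exported rows as dictionaries with "lat" and "lon" keys; those key names, the function names, and the sample rows are all made up for illustration, so check the actual column names in the tables first.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Clamp to 1.0 to guard against tiny floating-point overshoot.
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(min(1.0, a)))


def nearest_places(places, lat, lon, k=3):
    """Return the k places closest to (lat, lon), nearest first."""
    return sorted(
        places,
        key=lambda p: haversine_miles(lat, lon, p["lat"], p["lon"]),
    )[:k]


# Placeholder rows, not real entries from the dataset.
sample = [
    {"name": "Spring A", "lat": 40.0, "lon": -120.0},
    {"name": "Spring B", "lat": 45.0, "lon": -110.0},
]
print(nearest_places(sample, 40.5, -119.5, k=1)[0]["name"])
```

Sorting every row is fine for a table of this size; for much larger tables you’d probably filter by a bounding box in SQL first.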

What are the tables called?
benn.us_hot_springs_and_swimming_holes and benn.us_hot_springs

What kinds of cool visualizations can I make?

  • Maps, maps, and more maps. For example: maps of swimming holes in the US, or your state or region.
  • Or, heat maps showing the temperatures of hot springs (like the map above), so you can pick your preferred hot-or-cold plunge experience.


Beer Advocate data

Beer Advocate is home to a thriving online beer community, where millions of enthusiasts and professionals gather to trade, discuss, and review beer. Founded in 1996 as a clearinghouse for reviews, the site now sources reviews and discussions from a huge community of users.

For each review, this dataset includes extensive information about things like beer style and reviewer, along with scores for appearance, aroma, palate, taste, and overall quality. In the sample report embedded just below, we’ve joined the BA dataset with geographic information about breweries in the US, for some added texture.

What kinds of questions can I ask?

  • What are the highest-rated beers over time? The lowest-rated? The breweries with the highest-rated beers? The states with the highest-rated beers?
  • How many unique beers are there in the reviews? How many different styles of beer are there? How many people have reviewed beers on BA since the 90s?
  • Which states have had the most growth in breweries since the 1980s? Which states (if any) have seen a decline in beer brewing over the last 30 years?

What are the tables called?
marquette_mscs.mjk_beer_advocate and marquette_mscs.modified_brewery_counts

What kinds of cool visualizations can I make?

  • Maps! For example, showing which states have the most breweries or the highest-rated beers.
  • Or, maps showing the growth of breweries by state over time.
  • Wordclouds of reviews for a given beer, brewery, style, reviewer, etc.
  • Chord diagrams clustering beers together by aroma, style, palate, etc.


State and national parks data

If you’re the outdoorsy type (or even if you prefer glamping to camping), this dataset about state and national parks in the US is stuffed with useful information. It starts with the basics (park name, website, latitude and longitude info, and state) so you can make maps. There are also columns about the availability of a wide variety of activities: camping, hiking, fishing, hunting, picnicking, swimming, water sports, birdwatching, and winter activities.

Or maybe you want to analyze reviews or learn more about the history of the parks? You can use this dataset as a jumping-off point for that, too: there are links to the parks’ Yelp, Tripadvisor, and Wikipedia pages. Happy trails.

What kinds of questions can I ask?

  • Which states or regions have the most parks with birdwatching? Or with camping or hiking? Or with camping AND hiking?
  • Where can I find fishing, water sports, winter activities, camping, etc. near me?
  • Where should I go for my next vacation? 🤔

What’s the table called?
sethteicher.crowdflower_final_state_national_park_output_aug_19_2014

What kinds of cool visualizations can I make?

  • Maps! Of activities by state or region.
  • Join this dataset with the hot springs and swimming holes data to make maps of places to take a dip near campsites, winter activities, or picnic spots.
  • Wordclouds of review data.


FiveThirtyEight risk survey data

Back in 2014, the data nerds over at the FiveThirtyEight culture desk decided to satisfy a personal curiosity. One of their analysts, Walt Hickey (now of Numlock), wanted to know: “why do people order their steak well-done?” He had a hunch that it might be because of a general tendency to avoid risks. Maybe, he reasoned, “one reason people prefer tougher, less juicy steaks is the threat of food-borne illness.”

So, FiveThirtyEight conducted a survey about Americans’ attitudes toward risk, and included a question about…steak. The survey had questions about Americans’ participation in all different kinds of risky activities, from speeding and gambling to smoking and cheating on their significant others. The results? Let’s just say that eating steak well-done is not just for the faint of heart. We’ll leave it at that, and let you explore this bizarre and interesting dataset yourself.

What kinds of questions can I ask?

  • Is there a correlation between eating steak rare and speeding? What about between smoking cigarettes and eating rare steak? Between eating steak rare and cheating on your significant other? No? Nothing?
  • Why did I ask these ^ questions anyway?
  • More importantly, how do Americans who live in different regions like their steak? (See the embedded report above for our analysis.)
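
For yes/no questions like the ones above, one simple way to check for a relationship is the phi coefficient, which is the Pearson correlation for two binary variables. The Python sketch below assumes you’ve already turned each survey response into a pair of booleans (for example, “orders steak rare” and “speeds”). The function name and that encoding are our own assumptions, not something the survey table provides directly.

```python
import math


def phi_coefficient(pairs):
    """Phi coefficient for two yes/no answers, given (a, b) boolean pairs.

    Returns a value between -1 and 1; values near 0 suggest little association.
    """
    n11 = sum(1 for a, b in pairs if a and b)
    n10 = sum(1 for a, b in pairs if a and not b)
    n01 = sum(1 for a, b in pairs if not a and b)
    n00 = sum(1 for a, b in pairs if not a and not b)
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denom == 0:
        # One of the answers never varies, so phi is undefined.
        # Returning 0.0 is a convention chosen for this sketch.
        return 0.0
    return (n11 * n00 - n10 * n01) / denom


# Toy input, not survey results: answers that always agree give phi = 1.0.
toy = [(True, True)] * 5 + [(False, False)] * 5
print(phi_coefficient(toy))
```

A value close to zero on the real responses would line up with the “No? Nothing?” hunch; a correlation in either direction still wouldn’t tell you why.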

What’s the table called?
cooldata.fivethirtyeight_steak_survey

What kinds of cool visualizations can I make?

  • Scatterplots of linear regressions showing correlations between education, household income, region, age, etc. and various risky behaviors.
  • Mosaic plots exploring correlations. (For example, do eating steak and cheating on one’s significant other correlate?)
  • Maps or bar charts of risky behavior by region: speeding, gambling, smoking, etc.

Ideas for More?

Inspired to do something fun with one of these datasets? Send us a link to your project on Twitter, and we’ll share some of the best work!

Looking for even more fun data? Check out this post featuring five other public datasets. From UFOs to campaign finance, there’s something for everyone. Happy analyzing.