An Analysis Of Popular Open Datasets

Arthur Gron
September 2017

I did a quick analysis of which open datasets people download most often. This is part of a Budgetpedia program to compare Toronto with other cities. Here I am trying to determine which variables people would be most interested in when comparing Toronto with other cities.

Most Popular Open Datasets
1) Police
2) Building Permits
3) GIS
4) Social Services
5) Employee Earnings
6) Crime
7) Business Database
8) Waste
9) Traffic
10) 311
11) Public Transport
12) Parking
13) Food Inspection
14) Budgeting
15) Weather
16) Tree Map
17) Property Tax
18) Lobbyist
19) Bicycling
20) Vehicle Collisions

This was not a scientific study, so the results should be treated with considerable skepticism.

Several trends stand out.

Police statistics were the most popular, but this result may be skewed because many of Houston's most popular datasets are police datasets. This could simply be an oddity of the way Houston collects statistics. Chicago also collects many crime statistics. It may also be that cities make a particular effort to collect and quantify crime and police statistics, which would lead to some over-representation. Even so, there does seem to be strong demand for them.

There was also demand for GIS datasets such as road maps. This may reflect a bias: the people interested enough to download open data tend to be people with the skills to build mapping tools, and hence do not represent the general public.

311 is a non-emergency phone service available in North America, and its data lends itself to quantification, which may explain its popularity. If 311 services were available outside North America, 311 data might be an even more popular download.

Employee earnings data was a popular download. In my sample it was even more popular than such basic resources as business databases and public transportation data.

Datasets on taxes and budgets did not get as much attention as I expected, perhaps because the budget is released as a PDF document.

Tree maps and bicycle infrastructure do make an appearance in the top 20. This could be because such data is easy to plot using latitude and longitude.


I first selected 16 cities: Toronto plus 15 cities of roughly the same size.
1) Toronto
2) Dublin
3) Atlanta
4) Sydney
5) Los Angeles
6) Melbourne
7) Miami
8) Chicago
9) Boston
10) San Francisco
11) Washington
12) Manchester
13) Dallas
14) Houston
15) Edinburgh
16) Auckland

Atlanta, Washington, Manchester, and Auckland were excluded because their open datasets were too difficult to sort by popularity. For Dublin, I used Ireland's national open data in place of municipal open data. For Sydney, I used data from New South Wales. The open data office at the City of Toronto emailed me the popularity of its open datasets, as the website does not list them in that order.

I then found the ten most-downloaded datasets for each remaining city: 11 cities × 10 datasets gives 110 datasets in total. The next step was to sort the datasets into a handful of categories, which took a couple of iterations.
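The article does not spell out how the categorized datasets were turned into the ranked list, and the categorizing itself was done by hand in a spreadsheet. As a rough illustration only, the Python sketch below assumes a simple keyword match for categorizing titles and then ranks categories by how often they appear across all cities' top-ten lists. The keyword map, the function names `categorize` and `rank_categories`, and the sample titles are all hypothetical and are not the data used in this study.

```python
from collections import Counter

# Hypothetical keyword map: a category matches when any of its lower-case
# keywords appears in a dataset title. The real categories were assigned by
# hand over several iterations; this map is illustrative only.
CATEGORY_KEYWORDS = {
    "Police": ["police", "officer", "arrest"],
    "Building Permits": ["building permit", "permits"],
    "GIS": ["street", "road", "boundary", "parcel"],
    "311": ["311"],
}


def categorize(title, keyword_map=CATEGORY_KEYWORDS):
    """Return the first category (in map order) whose keywords appear in the title, else 'Other'."""
    lowered = title.lower()
    for category, keywords in keyword_map.items():
        if any(keyword in lowered for keyword in keywords):
            return category
    return "Other"


def rank_categories(top_datasets, keyword_map=CATEGORY_KEYWORDS):
    """Count categories across every city's top-ten list.

    top_datasets maps a city name to its list of most-downloaded dataset titles.
    Returns (category, count) pairs, most frequent first; ties are broken
    alphabetically so the ordering is deterministic.
    """
    counts = Counter(
        categorize(title, keyword_map)
        for titles in top_datasets.values()
        for title in titles
    )
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical sample input, not the titles from the study.
sample = {
    "City A": ["Police Calls for Service", "311 Service Requests", "Street Centreline"],
    "City B": ["Building Permits Issued", "Police Officer Use of Force"],
}
print(rank_categories(sample))
# prints [('Police', 2), ('311', 1), ('Building Permits', 1), ('GIS', 1)]
```

A keyword map is only a convenience for a first pass; titles that fall into "Other" would still need to be reviewed and assigned by hand, which matches the iterative categorizing described above.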


Chicago Data Portal. (2017). City of Chicago. Retrieved from

Dallas Open Data. (2017). City of Dallas. Retrieved from

Data Boston. (2017). City of Boston. Retrieved from

Datasets. (2017). The City of Edinburgh Council. Retrieved from

Department of Public Expenditure and Reform, Open Data Unit. (n.d.). Total Number of Datasets. Dublin, Ireland. Retrieved from

Los Angeles Open Data. (2017). City of Los Angeles. Retrieved from

Melbourne Data. (2017). City of Melbourne. Retrieved from

Miami-Dade County's Open Data Portal. (2017). Miami-Dade County. Retrieved from

Open Data. (2017). Atlanta Regional Commission. Retrieved from

Open Data Stats. (2017). DataSF. Retrieved from

OpenGov NSW. (2017). Datasets. New South Wales, Australia. Retrieved from

Rayes, J. (2017, September 12). RE: Sorting the Open Data Catalogue.


A spreadsheet with the data I used is available as a Gnumeric file.