SQL Data Exploration for BoardGameGeek datasets

SQL Logo

Based on the BoardGameGeek data fetched with the Python script (and this script) I’ve created before, I have made some SQL data exploration for the datasets that are available in the repository. You can check the SQL queries in the queries.sql file in this repo, or clone the repo to your machine and use the datasets for your own exploration needs.

For start I will find number of games published in each year, and which year have the highest number of published games.

SELECT year_published, COUNT(*) total_games
FROM bgg.items
WHERE type = 'boardgame'
	AND year_published <> '0'
	AND rating <> '0'
	AND year_published < 2024
	AND year_published > 2012
GROUP BY year_published
ORDER BY year_published;

The output:

year_publishedtotal_games
20132539
20142941
20153158
20163439
20173619

Next exploration is about the average rating of games per year for the past 20 years:

SELECT year_published, ROUND(AVG(rating), 3) AS rating
FROM bgg.items
WHERE type = 'boardgame'
	AND year_published <> '0'
   	AND rating <> '0'
   	AND year_published < 2024
	AND year_published > 2012
GROUP BY year_published
ORDER BY year_published;

The output:

year_publishedrating
20136.284
20146.289
20156.369
20166.517
20176.645

Followed by a discovery for the top 10 board game categories in the past 10 years:

SELECT categories, SUM(games) AS total_games
FROM (
    SELECT categories, COUNT(*) AS games
    FROM bgg.items AS items
    LEFT JOIN bgg.categories AS categories
    USING (game_id)
    WHERE year_published <> '0'
        AND categories <> '0'
        AND type = 'boardgame'
        AND year_published > 2012
        AND year_published < 2024
    GROUP BY categories
) AS category_counts
GROUP BY categories
ORDER BY total_games DESC
LIMIT 10;

The output:

categoriestotal_games
Card Game14860
Party Game5345
Wargame5129
Fantasy4928
Children’s Game4164

And also, beside the top categories, here are the 10 most popular mechanics in the past 10 years:

SELECT mechanics, SUM(games) AS total_games
FROM (
    SELECT mechanics, COUNT(*) AS games
    FROM bgg.items AS items
    LEFT JOIN bgg.mechanics AS mechanics
    USING (game_id)
    WHERE year_published <> '0'
        AND mechanics <> '0'
        AND type = 'boardgame'
        AND year_published > 2012
        AND year_published < 2024
    GROUP BY mechanics
) AS mechanics_counts
GROUP BY mechanics
ORDER BY total_games DESC
LIMIT 10;

The output:

mechanicstotal_games
Dice Rolling10461
Hand Management7833
Set Collection5305
Variable Player Powers4215
Cooperative Game4195

Next in the list are the top five categories in the past 10 years with total games per category / per year ratio

WITH top_categories AS (
    SELECT categories, COUNT(*) AS total_games
    FROM bgg.items AS items
    LEFT JOIN bgg.categories AS category
    USING (game_id)
    WHERE type = 'boardgame'
        AND year_published <> '0'
        AND rating <> '0'
        AND year_published > 2012
        AND year_published < 2024
    GROUP BY categories
    ORDER BY total_games DESC
    LIMIT 5
)
SELECT year_published, categories, COUNT(*) AS total_games
FROM bgg.items AS items
LEFT JOIN bgg.categories AS category
USING (game_id)
WHERE type = 'boardgame'
	AND year_published <> '0'
	AND rating <> '0'
	AND year_published > 2012
	AND year_published < 2024
	AND categories IN (SELECT categories FROM top_categories)
GROUP BY year_published, categories
ORDER BY year_published

The output:

year_publishedcategoriestotal_games
2013Card Game817
2013Dice238
2013Fantasy234
2013Party Game269
2013Wargame325
2014Card Game1022

In addition, I’ve done the similar exploration for the top five mechanics in the past 10 years with total games per category / per year ratio:

WITH top_mechanics AS (
    SELECT mechanics, COUNT(*) AS total_games
    FROM bgg.items AS items
    LEFT JOIN bgg.mechanics AS mechanic
    USING (game_id)
    WHERE type = 'boardgame'
        AND year_published <> '0'
        AND rating <> '0'
        AND year_published > 2012
        AND year_published < 2024
    GROUP BY mechanics
    ORDER BY total_games DESC
    LIMIT 5
)
SELECT year_published, mechanics, COUNT(*) AS total_games
FROM bgg.items AS items
LEFT JOIN bgg.mechanics AS mechanic
USING (game_id)
WHERE type = 'boardgame'
	AND year_published <> '0'
	AND rating <> '0'
	AND year_published > 2012
	AND year_published < 2024
	AND mechanics IN (SELECT mechanics FROM top_mechanics)
GROUP BY year_published, mechanics
ORDER BY year_published

The output:

year_publishedmechanicstotal_games
2013Cooperative Game148
2013Dice Rolling635
2013Hand Management487
2013Set Collection327
2013Variable Player Powers260

Next are the top 15 most active publishers in the past 10 years,

SELECT publishers, COUNT(*) AS total_games
FROM bgg.items as items
LEFT JOIN bgg.publishers
USING (game_id)
WHERE publishers <> '0'
	AND year_published > 2012
	AND year_published < 2024
	AND type = 'boardgame'
	AND publishers NOT IN (
        '(Self-Published)',
        '(Web published)',
        'Inc.',
        'LLC',
        'Ltd.',
        '(Unknown)',
        '(Looking for a publisher)',
        '(Public Domain)'
    )
GROUP BY publishers
ORDER BY total_games DESC
LIMIT 15;

The output:

publisherstotal_games
Hasbro506
Pegasus Spiele468
Hobby World397
Korea Boardgames Co.384
Rebel Sp. z o.o.369

top 15 game designers in the past 10 years,

SELECT designers, COUNT(*) AS total_games
FROM bgg.items AS items
LEFT JOIN bgg.designers
USING (game_id)
WHERE type = "boardgame"
	AND year_published > 2012
	AND year_published < 2024
	AND designers NOT IN (
		'0',
		'(Uncredited)',
		'Jr.'
	)
GROUP BY designers
ORDER BY total_games DESC
LIMIT 15;

The output:

designerstotal_games
Reiner Knizia203
Paul Rohrbaugh130
Joseph Miranda110
Prospero Hall109
Charles Darrow108

and the top 15 game artists in the past 10 years:

SELECT artists, COUNT(*) AS total_games
FROM bgg.items AS items
LEFT JOIN bgg.artists
USING (game_id)
WHERE type = "boardgame"
	AND year_published > 2012
	AND year_published < 2024
	AND artists NOT IN (
		'0',
		'(Uncredited)'
	)
GROUP BY artists
ORDER BY total_games DESC
LIMIT 15;

The output:

artiststotal_games
Joe Youst168
Mark Mahaffey135
Ilya Kudriashov123
Klemens Franz109
Michael Menzel106

There are two more discoveries that I have made, one being the top 15 categories in the past 10 years that have the highest average rating (although, these are not ordered by the games published in these categories; some other categories have more games published)

SELECT categories, ROUND(AVG(rating), 3) AS average_rating, COUNT(*) AS total_games
FROM bgg.items AS items
LEFT JOIN bgg.categories AS categories
USING (game_id)
WHERE type = 'boardgame'
	AND categories <> '0'
	AND year_published > 2012
	AND year_published < 2024
GROUP BY categories
ORDER BY average_rating DESC
LIMIT 15;

The output:

categoriesaverage_ratingtotal_games
World War II6.5041448
Vietnam War6.47396
Civilization6.437466
Renaissance6.382311
Civil War6.353199

And finally, the top 15 mechanics in the past 10 years that have the highest average rating:

SELECT mechanics, ROUND(AVG(rating), 3) AS average_rating, COUNT(*) AS total_games
FROM bgg.items AS items
LEFT JOIN bgg.mechanics AS mechanics
USING (game_id)
WHERE type = 'boardgame'
	AND mechanics <> '0'
	AND year_published > 2012
	AND year_published < 2024
GROUP BY mechanics
ORDER BY average_rating DESC
LIMIT 15;

The output:

mechanicsaverage_ratingtotal_games
Auction: English8.3482
Tags7.632
Neighbor Scope7.5578
Auction Compensation7.3742
Ratio / Combat Results Table7.374137