Spotify Rewind 2022

Author

Jun Ryu

Published

August 11, 2023

yes… I know I’m 8 months late.

This post is heavily inspired by David Sjoberg and their tutorial found here. Here, I will essentially walk through the tutorial but with my own listening data. Huge thanks to David!

Before we begin, I need to clarify some of the adjustments I made when collecting data. In David’s tutorial, the process to import Spotify data and attach tags through the last.fm API is thoroughly explained and one could easily replicate it if need be. However, I decided that since I will be categorizing by month (instead of seasons) and the monthly tag chart is already available on last.fm (but only with their subscription service, last.fm pro), I have pulled and manually created my own data. Due to this, I have skipped the first half of the tutorial.


As always, we import the necessary packages for data manipulation and visualization. Here, our focus is on the ggbump package, which will allow us to create a smooth bump chart in R.

library(dplyr)

Attaching package: 'dplyr'
The following objects are masked from 'package:stats':

    filter, lag
The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union
library(tidyr) 
library(ggplot2)
library(cowplot)
library(ggbump)

…and the following is the dataset that I have manually created:

gen <- read.csv("musicgenres-2022.csv")
head(gen)
    tags month_year order rank
1 ballad   Jan 2022     1    5
2 ballad   Feb 2022     2    5
3 ballad   Mar 2022     3    3
4 ballad   Apr 2022     4    3
5 ballad   May 2022     5    3
6 ballad   Jun 2022     6    5

Now, the rest is David’s code with minor changes to fit my own data:

Show the Code
# data manipulation
gen <- gen %>%
  group_by(tags) %>%
  mutate(first_top5 = min(order[rank <= 5]),
         d_first_top5 = if_else(order == first_top5, 1, 0)) %>%
  filter(!is.na(first_top5),
         order >= first_top5) %>%
  ungroup()

gen <- gen %>% 
  arrange(tags, order) %>% 
  group_by(tags) %>% 
  mutate(lag_zero = if_else(lag(rank) %in% c(6, NA) & rank <= 5, 1, 0, 0)) %>% 
  ungroup() %>% 
  mutate(group = cumsum(lag_zero))

# create custom palette
set.seed(567)
custom_palette <- c(RColorBrewer::brewer.pal(9, "Set1"),
                    RColorBrewer::brewer.pal(8, "Dark2")) %>% 
  sample(n_distinct(gen$tags))

# initial plot with highlighted groups
p <- gen %>% 
  ggplot(aes(order, rank, color = tags, group = tags)) +
  geom_bump(smooth = 15, size = 2, alpha = 0.2) +
  scale_y_reverse() 

p <- p +
  geom_bump(data = gen %>% filter(rank <= 5), 
            aes(order, rank, group = group, color = tags), 
            smooth = 15, size = 2, inherit.aes = F)

# add starting points
p <- p + 
  geom_point(data = gen %>% filter(d_first_top5 == 1),
             aes(x = order - .2),
             size = 5) +
  geom_segment(data = gen %>% filter(rank <=5), 
               aes(x = order - .2, xend = order + .2, y = rank, yend = rank),
               size = 2,
               lineend = "round")

# customization
p +
  scale_x_continuous(breaks = gen$order %>% unique() %>% sort(),
                     labels = gen %>% distinct(order, month_year) %>% arrange(order) %>% pull(month_year), 
                     expand = expand_scale(mult = .1)) +
  geom_text(data = gen %>% filter(d_first_top5 == 1),
            aes(label = tags, x = order-.2),
            color = "white",
            nudge_y = .225, 
            nudge_x = -.05,
            size = 3.5,
            fontface = 2,
            hjust = 0) +
  geom_text(data = gen %>% filter(order == max(order)),
            aes(label = tags),
            color = "gray70",
            nudge_x = .31,
            hjust = 0,
            size = 3,
            fontface = 2) +
  cowplot::theme_minimal_hgrid(font_size = 14) +
  theme(legend.position = "none",
        panel.grid = element_blank(),
        plot.title = element_text(hjust = .5, color = "white"),
        plot.caption = element_text(hjust = 1, color = "white", size = 8),
        plot.subtitle = element_text(hjust = .5, color = "white", size = 10),
        axis.line = element_blank(),
        axis.ticks = element_blank(),
        axis.text.y = element_blank(),
        axis.title.y = element_blank(),
        axis.text.x = element_text(face = 2, color = "white"),
        panel.background = element_rect(fill = "black"),
        plot.background = element_rect(fill = "black")) +
  labs(x = NULL,
       title ="Tag Timeline - 2022",
       subtitle ="Top 5 genre tags for each month",
       caption = "\nSource:\nlast.fm") +
  scale_colour_manual(values = custom_palette) +
  geom_point(data = tibble(x = 0.55, y = 1:5), aes(x = x, y = y), 
            inherit.aes = F,
            color = "white",
            size = 10,
            pch = 21) +
  geom_text(data = tibble(x = .55, y = 1:5), aes(x = x, y = y, label = y), 
             inherit.aes = F,
            color = "white")

It seems like I just couldn’t get enough of Korean music…