data engineering · 2026 · ● live

Music Growth Pipeline

Weekly Last.fm pipeline tracking 7,755 artists — Postgres, dbt, GitHub Actions

architecture

The Last.fm API returns only cumulative all-time stats — there is no native time series. To study whether chart position correlates with listener growth over time, you have to build the longitudinal dataset yourself by snapshotting repeatedly.
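As a minimal sketch of why repeated snapshots are enough: differencing consecutive cumulative readings recovers per-week growth. The function name and the numbers below are illustrative assumptions, not taken from the project.

```python
from datetime import date


def weekly_deltas(snapshots):
    """Turn cumulative readings into per-period growth.

    snapshots: list of (snapshot_date, cumulative_value) pairs for one artist.
    Returns a list of (later_date, growth_since_previous_snapshot) pairs.
    """
    ordered = sorted(snapshots)  # sort by date so pairs are consecutive
    return [
        (later_date, later_value - earlier_value)
        for (_, earlier_value), (later_date, later_value) in zip(ordered, ordered[1:])
    ]


# Made-up cumulative listener counts on three consecutive Sundays.
example = [
    (date(2026, 4, 5), 1_000_000),
    (date(2026, 4, 12), 1_004_500),
    (date(2026, 4, 19), 1_010_200),
]

for snapshot_date, growth in weekly_deltas(example):
    print(snapshot_date, growth)
```

Each additional snapshot adds one more growth observation per artist, which is why the longitudinal dataset only becomes useful after several weeks of runs.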

A weekly ingestion pipeline snapshots listener and playcount data for 7,755 artists from the Last.fm global chart into a cloud Postgres database on Neon. Artists are split into a mainstream tier (chart pages 1–50) and an indie tier (chart pages 500–2000). A dbt transformation layer (staging and mart models) powers the cross-sectional and longitudinal analysis. GitHub Actions runs the snapshot job every Sunday at 09:00 UTC with no manual intervention.
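The ingestion step could look roughly like the sketch below, which turns one chart page into snapshot rows tagged with a tier. The page ranges come from the description above; the function names, the row field names, and the assumed response shape (an "artists" object holding an "artist" list with string-valued "listeners" and "playcount") are assumptions for illustration, not the project's actual code.

```python
from datetime import date

# Page ranges as described in the text; both ends inclusive.
MAINSTREAM_PAGES = range(1, 51)
INDIE_PAGES = range(500, 2001)


def tier_for_page(page):
    """Map a chart page number to a tier name, or None if the page is not tracked."""
    if page in MAINSTREAM_PAGES:
        return "mainstream"
    if page in INDIE_PAGES:
        return "indie"
    return None


def rows_from_chart_page(payload, page, snapshot_date):
    """Build one snapshot row per artist on a chart page (assumed payload shape)."""
    tier = tier_for_page(page)
    if tier is None:
        return []
    rows = []
    for artist in payload.get("artists", {}).get("artist", []):
        rows.append(
            {
                "artist_name": artist["name"],
                "listeners": int(artist["listeners"]),  # counts assumed to arrive as strings
                "playcount": int(artist["playcount"]),
                "tier": tier,
                "chart_page": page,
                "snapshot_date": snapshot_date,
            }
        )
    return rows


# Hypothetical payload with made-up values.
sample_payload = {
    "artists": {
        "artist": [
            {"name": "Example Artist", "listeners": "2300000", "playcount": "172000000"},
        ]
    }
}
sample_rows = rows_from_chart_page(sample_payload, page=3, snapshot_date=date(2026, 4, 5))
print(sample_rows)
```

Storing the snapshot date on every row keeps each weekly run append-only, so earlier snapshots are never overwritten by later ones.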

  • 7,755 artists tracked: 250 mainstream (pages 1–50), 7,505 indie (pages 500–2000)
  • dbt mart models: artist_tiers, genre_stats, artist_similarity_network
  • Cross-sectional finding: ~4x plays-per-listener gap (mainstream median 74.76 vs indie 17.69), consistent across the full distribution and not driven by outliers
  • The indie P90 listener count (782K) falls below the mainstream P25 (2.3M), so the two tiers' listener distributions barely overlap
  • Genre associations: 15 genres × 500 artists; similarity networks: ~2,000 artists, 20 similar artists each
  • Longitudinal analysis is still accumulating data: weekly snapshots have been running since April 2026
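The cross-sectional comparison above rests on two simple statistics: a per-tier median of plays per listener, and percentiles of listener counts. A rough sketch using only the Python standard library follows; the helper names and sample rows are hypothetical, and the project itself may compute these in SQL or dbt instead.

```python
from statistics import median, quantiles


def plays_per_listener(rows):
    """Plays-per-listener ratio for each row, skipping rows with no listeners."""
    return [r["playcount"] / r["listeners"] for r in rows if r["listeners"] > 0]


def percentile(values, p):
    """p-th percentile (1-99) using linear interpolation between data points."""
    # quantiles(..., n=100) returns 99 cut points; index p - 1 is the p-th percentile.
    return quantiles(values, n=100, method="inclusive")[p - 1]


# Made-up rows, not the project's data.
mainstream = [
    {"listeners": 100, "playcount": 7000},
    {"listeners": 200, "playcount": 16000},
    {"listeners": 50, "playcount": 3000},
]
indie = [
    {"listeners": 10, "playcount": 150},
    {"listeners": 20, "playcount": 400},
    {"listeners": 30, "playcount": 300},
]

mainstream_median = median(plays_per_listener(mainstream))
indie_median = median(plays_per_listener(indie))
print("median ratio gap:", mainstream_median / indie_median)

# Compare the upper end of the indie tier with the lower end of the mainstream tier.
indie_p90 = percentile([r["listeners"] for r in indie], 90)
mainstream_p25 = percentile([r["listeners"] for r in mainstream], 25)
print("indie P90 below mainstream P25:", indie_p90 < mainstream_p25)
```

Comparing medians and percentiles rather than means is what makes the gap robust to a handful of very large artists.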
Python · PostgreSQL (Neon) · dbt Core (dbt-postgres) · Last.fm API · GitHub Actions · SQL