Music Growth Pipeline
Weekly Last.fm pipeline tracking 7,755 artists — Postgres, dbt, GitHub Actions
view on github ↗
architecture
problem
The Last.fm API returns only cumulative all-time stats — there is no native time series. To study whether chart position correlates with listener growth over time, you have to build the longitudinal dataset yourself by snapshotting repeatedly.
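A minimal sketch of why repeated snapshots are enough: differencing consecutive cumulative totals yields per-interval growth. The `Snapshot` record, its field names, and the sample numbers are hypothetical illustrations, not the pipeline's actual schema or data.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Snapshot:
    # Hypothetical record shape; the real table layout is not shown in this page.
    artist: str
    taken_on: date
    listeners: int  # cumulative all-time listeners at snapshot time
    playcount: int  # cumulative all-time plays at snapshot time


def weekly_deltas(snapshots):
    """Turn one artist's cumulative snapshots into per-interval growth rows."""
    ordered = sorted(snapshots, key=lambda s: s.taken_on)
    deltas = []
    # Pair each snapshot with the one that follows it and subtract the totals.
    for prev, curr in zip(ordered, ordered[1:]):
        deltas.append({
            "artist": curr.artist,
            "from": prev.taken_on,
            "to": curr.taken_on,
            "new_listeners": curr.listeners - prev.listeners,
            "new_plays": curr.playcount - prev.playcount,
        })
    return deltas


# Made-up numbers, purely to show the shape of the result.
history = [
    Snapshot("example-artist", date(2026, 4, 5), 1_000, 20_000),
    Snapshot("example-artist", date(2026, 4, 12), 1_050, 21_500),
    Snapshot("example-artist", date(2026, 4, 19), 1_080, 22_100),
]
deltas = weekly_deltas(history)
```

Three snapshots produce two growth rows, so the series only becomes useful once several weeks have accumulated.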
what i built
A weekly ingestion pipeline snapshots listener and playcount data for 7,755 artists from the Last.fm global chart into a cloud Postgres database on Neon. Artists are split into a mainstream tier (chart pages 1–50) and an indie tier (chart pages 500–2000). A dbt transformation layer (staging + mart models) powers cross-sectional and longitudinal analysis. GitHub Actions runs the snapshot job every Sunday at 09:00 UTC with zero manual intervention.
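The tier split can be sketched as a simple page-range lookup. The function and constant names are hypothetical, and the page size of 5 artists is an inference from the stated counts (250 artists over 50 pages, 7,505 over 1,501 pages), not something the pipeline's code is known to use.

```python
# Page ranges come from the description above; both ends are inclusive.
MAINSTREAM_PAGES = range(1, 51)    # chart pages 1-50
INDIE_PAGES = range(500, 2001)     # chart pages 500-2000

# Assumption: 5 artists per chart page, inferred from the published counts.
ARTISTS_PER_PAGE = 5


def tier_for_page(page):
    """Return the tier name for a chart page, or None if the page is not sampled."""
    if page in MAINSTREAM_PAGES:
        return "mainstream"
    if page in INDIE_PAGES:
        return "indie"
    return None


# Sanity-check the inferred page size against the stated artist counts.
mainstream_count = len(MAINSTREAM_PAGES) * ARTISTS_PER_PAGE
indie_count = len(INDIE_PAGES) * ARTISTS_PER_PAGE
total_count = mainstream_count + indie_count
```

Pages 51 through 499 fall into neither tier, which keeps a clear gap between the two groups.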
highlights
- → 7,755 artists tracked: 250 mainstream (pages 1–50), 7,505 indie (pages 500–2000)
- → dbt mart models: artist_tiers, genre_stats, artist_similarity_network
- → Cross-sectional finding: ~4x plays-per-listener gap (mainstream median 74.76 vs indie median 17.69), consistent across the full distribution and not driven by outliers
- → Indie P90 listener count (782K) falls below the mainstream P25 (2.3M), so the bulk of the two distributions does not overlap
- → Genre associations: 15 genres × 500 artists; similarity networks: ~2,000 artists, 20 similar artists each
- → Longitudinal analysis accumulating — weekly snapshots running since April 2026
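The cross-sectional comparisons above can be reproduced in outline with Python's standard library. This is a sketch under stated assumptions: the helper names are hypothetical, the listener lists are made-up toy values, and only the two medians (74.76 and 17.69) come from the findings above. The actual analysis runs in the dbt mart models, not in this code.

```python
from statistics import quantiles


def percentile(values, p):
    """p-th percentile (1-99), linearly interpolated within the sample."""
    # quantiles(..., n=100) returns 99 cut points; index p - 1 is the p-th percentile.
    return quantiles(values, n=100, method="inclusive")[p - 1]


def plays_per_listener(playcount, listeners):
    """Average number of plays contributed by each listener."""
    return playcount / listeners


# Ratio of the two medians reported above.
gap = 74.76 / 17.69  # roughly 4.2, i.e. the "~4x" gap

# Toy listener counts, NOT real data, to show the separation check.
mainstream_listeners = [2_000_000, 3_000_000, 4_000_000, 5_000_000, 6_000_000]
indie_listeners = [10_000, 50_000, 100_000, 400_000, 900_000]

indie_p90 = percentile(indie_listeners, 90)
mainstream_p25 = percentile(mainstream_listeners, 25)

# True when even a high-end indie artist sits below a low-end mainstream one.
separated = indie_p90 < mainstream_p25
```

Comparing a high percentile of one group against a low percentile of the other is a quick way to show separation without depending on the extreme values at either end.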
stack