Web Scraping for the CPI: Learning Resources on Reproducible Analytical Pipelines (RAP)

This site contains resources developed by ESCAP to introduce the concept of Reproducible Analytical Pipelines (RAP) in the context of web scraping for Consumer Price Indexes (CPI). These materials were created as part of ESCAP’s project on Big Data for Official Statistics, funded by the 2030 Agenda Sub-Fund of the UN Peace and Development Trust Fund (UNPDF).
Background
During 2024, ESCAP supported a group of countries on web scraping for price statistics. This support included a series of remote training sessions, an in-person workshop and ongoing mentoring. Check out the Webscraping for CPI Learning Resources page for more details about the training in general, to watch videos of past sessions, and to review the materials.
An important component of this support focused on the development of Reproducible Analytical Pipelines, or RAPs. RAP was originally developed within the United Kingdom government to make the outputs of government teams more reproducible and to speed up their production processes. RAP can be thought of as an approach to working that brings together a range of open source tools and techniques from fields such as reproducible research, software engineering, and DevOps to make statistical releases easily reproducible, testable, and auditable (check out the original article from 2017 explaining the concept and its purpose). Because it is a framework for mature processes, it can also be applied to web scraping and to the related steps in producing official statistics from scraped data. This matters because data collection and preparation for the CPI should be robust and highly reproducible!
All materials from this support are available on this site and can be used for self-paced learning or adaptation in other national statistical contexts.
Other useful reading
This site (and the training) does not aim to be comprehensive, but simply to demonstrate and document the key concepts so that learners have enough to continue learning on their own. For more on RAP, we encourage you to check out:
- The NHS RAP Community of Practice – which contains much more useful content on a large number of RAP concepts
- The RAP Companion – which provides a good guide to each component (although it focuses on R)
- The Udemy class on Reproducible Analytical Pipelines (RAP) using R – which provides a good overview of RAP with R
- A presentation and a paper – which provide a good guide to the application of RAP to price statistics, specifically for rail fares
Each page aims to provide external and supporting links so that learners can explore related content and refer to it later.
