Data Registry
The OCP Data Registry helps you find and access public procurement datasets that are available in the OCDS format. This documentation describes the open-source software itself.
(If you are viewing this on GitHub, open the full documentation for additional details.)
How it works
This project is made up of two apps:
data_registry
: Serves the website, metadata API and admin site, and performs orchestration.
exporter
: Creates the bulk downloads in JSON, Excel and CSV format.
The complex part of the project is the orchestration. The tasks to orchestrate are:
Collect data, via Scrapyd, used as the interface to Kingfisher Collect
Pre-process data, via Kingfisher Process
Measure quality, via Pelican frontend, used as the interface to Pelican backend
Export JSON files, via the exporter worker
Export Excel and CSV files, via the flattener worker
Each task is implemented as a TaskManager under the data_registry/process_manager/task directory. The JOB_TASKS_PLAN setting controls the order and choice of tasks.
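As a rough illustration of how an ordered plan can drive the choice and order of tasks, the sketch below assumes a minimal task interface. Only the TaskManager and JOB_TASKS_PLAN names come from the text above. The `run` method, the task identifiers in the plan, the `EchoTask` stand-in and the `run_plan` helper are hypothetical and are not taken from the project's code.

```python
from abc import ABC, abstractmethod

# Hypothetical plan: these identifiers are illustrative, not the project's actual values.
JOB_TASKS_PLAN = ["collect", "process", "pelican", "exporter", "flattener"]


class TaskManager(ABC):
    """Hypothetical minimal interface for one orchestration task."""

    @abstractmethod
    def run(self, job: dict) -> str:
        """Perform the task for the job and return a short status string."""


class EchoTask(TaskManager):
    """Stand-in task that only records its name, so that the sketch runs offline."""

    def __init__(self, name: str) -> None:
        self.name = name

    def run(self, job: dict) -> str:
        job.setdefault("history", []).append(self.name)
        return "COMPLETED"


def run_plan(job: dict, plan: list[str]) -> list[str]:
    """Run each task named in the plan, in order, stopping at the first failure."""
    for name in plan:
        status = EchoTask(name).run(job)
        if status != "COMPLETED":
            break
    # Return the names of the tasks that ran (empty if the plan was empty).
    return job.get("history", [])
```

Under these assumptions, removing an entry from the plan skips that task, and reordering the plan reorders the tasks, without any change to the task classes.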
The most relevant logic is:
the manageprocess command, which calls…
the data_registry.process_manager.process() function with each publication, which calls…
the data_registry.models.Collection.is_out_of_date() method to decide whether to start a job.
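The call chain above can be sketched as follows. This is a simplified stand-in, not the project's implementation: the `Collection` fields, the seven-day threshold, the `today` parameter (added here to keep the example deterministic) and the bodies of `process()` and `manageprocess()` are assumptions made for illustration only.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class Collection:
    """Hypothetical stand-in for a publication; the real model has other fields."""

    title: str
    last_retrieved: Optional[date] = None  # assumed field
    max_age: timedelta = timedelta(days=7)  # assumed refresh interval

    def is_out_of_date(self, today: date) -> bool:
        # Assumed rule: never retrieved, or retrieved longer ago than max_age.
        if self.last_retrieved is None:
            return True
        return today - self.last_retrieved > self.max_age


def process(collection: Collection, today: date) -> bool:
    """Start a job for the publication if it is out of date (assumed behavior)."""
    if collection.is_out_of_date(today):
        print(f"starting job for {collection.title}")
        return True
    return False


def manageprocess(collections: list[Collection], today: date) -> list[str]:
    """Mimic the command: call process() with each publication."""
    return [c.title for c in collections if process(c, today)]
```

The point of the sketch is the layering: the command only iterates, `process()` only decides whether to act, and the staleness rule lives on the model.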
Word choice
In this project’s code, “collection” has a different meaning than in Kingfisher Collect or Kingfisher Process. The term should be “publication”, as used in the UI and documentation.