Site administrators¶
Use the Django admin to:
Add and edit publications
Note
Once a new publication is added, the manageprocess command will collect its data, unless Frozen is checked.
Check the status of a job and its tasks
Note
The flattener task was added in September 2022, so earlier jobs have a Last completed task of “exporter (4/5)”.
Review log entries for other administrators’ actions
Publications and jobs can be searched by publication country and publication title.
Note
The search performs only case normalization. For example, “Montreal” will not match “Montréal” (with an accent).
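As an illustration of why the accent matters, the sketch below compares case normalization alone with a hypothetical extra accent-stripping step. It is not the admin's actual search code; `normalize_case` and `strip_accents` are names made up for this example.

```python
import unicodedata


def normalize_case(text: str) -> str:
    """Case normalization only, approximating the search behavior described above."""
    return text.casefold()


def strip_accents(text: str) -> str:
    """Hypothetical extra step (not performed by the search): drop combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Case differences are ignored.
print(normalize_case("MONTRÉAL") == normalize_case("montréal"))  # True
# Accents are not: the two strings still differ after case normalization.
print(normalize_case("Montreal") == normalize_case("Montréal"))  # False
# Only with accent stripping would the two forms match.
print(strip_accents("Montréal").casefold() == "montreal")  # True
```

In practice, search for the title as it is stored, including accents.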
Add a publication¶
Refer to our internal documentation (contains links to internal resources).
Review publications¶
From time to time, use the filters in the right-hand sidebar to:
Review publications for out-of-date or missing information:
Non-frozen publications that weren’t recently reviewed (By last reviewed: More than a year ago)
Publications without all English and Spanish translations (By untranslated: Yes)
Publications without licenses (By data license: Empty)
Non-frozen publications without quality summaries (By quality summary [en]: Empty)
Non-frozen publications without other information (By incomplete: Yes), one or more of:
Country flag
Country (en)
Retrieval frequency
Source URL
Language (en)
Description (en)
Data availability (en)
Review publications for visibility and processing:
Unpublished publications (By public: No)
Frozen publications (By frozen: Yes)
Historical publications (By retrieval frequency: This dataset is no longer updated by the publisher)
Review jobs¶
From time to time, use the filters in the right-hand sidebar to:
Check for failed jobs, and restart tasks as appropriate (By failed: Yes)
Check for completed jobs whose temporary data has not been deleted (By temporary data deleted: No, By status: COMPLETED)
Check for running jobs that are old (By status: RUNNING)
Troubleshoot a job¶
A job’s detail page:
Displays the status, result and note (e.g. error messages) for each task, in the Job tasks section.
If a task’s result is FAILED, but process() considers the failure to be temporary, then the manageprocess command retries the task until it succeeds or fails permanently. Read the Note, and judge whether the failure is permanent. If so, you can set the job’s Status to COMPLETED to stop the retries. The manageprocess command will then delete the job’s temporary data. The next job will be scheduled according to the publication’s retrieval status.
Attention
If you want it scheduled sooner, prioritize #350.
Defines and displays metadata (Context) from its tasks, in the Management section
Use the metadata to troubleshoot other applications. For example, to check the Scrapy log, replace the hostname and port in the scrapy_log value with collect.data.open-contracting.org.
See also
How to check on progress in:
This project’s RabbitMQ management interface is at rabbitmq.data.open-contracting.org.
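The hostname replacement for the scrapy_log value, described above, can be sketched with the standard library. The example URL is hypothetical (the real value comes from the job's Context), and `point_log_url_at` is a name made up for this example. The scheme is left unchanged, because the text above mentions only the hostname and port.

```python
from urllib.parse import urlsplit, urlunsplit


def point_log_url_at(url: str, host: str = "collect.data.open-contracting.org") -> str:
    """Replace the hostname and port of `url` with `host`, keeping the path and query."""
    parts = urlsplit(url)
    # netloc holds "hostname:port"; replacing it drops the original port.
    return urlunsplit(parts._replace(netloc=host))


# Hypothetical scrapy_log value; the real path depends on the job.
example = "http://localhost:6800/logs/kingfisher/my_spider/0123456789abcdef.log"
print(point_log_url_at(example))
# http://collect.data.open-contracting.org/logs/kingfisher/my_spider/0123456789abcdef.log
```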
Cancel a job¶
A job can stall (always “running”). The only option is to cancel the Scrapyd job and set the job’s Status to COMPLETED using the Django admin.
Attention
To properly implement this feature, see #352.
Restart a task¶
You can restart the Exporter and Flattener tasks. Do this only if the data_registry_production_exporter_init and data_registry_production_flattener_init queues are empty in the RabbitMQ management interface.
Note
The Flattener task publishes one message per file. You might receive a Sentry notification about a failed conversion, while other conversions are still enqueued or in-progress.
The Exporter task publishes one message per job. This task can be restarted while the queue is non-empty – as long as another administrator has not restarted it independently.
Set only the Exporter and/or Flattener task’s Status to PLANNED
Click SAVE
Any lockfiles are deleted to allow the task to run.
Attention
See #350.
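The queue check that precedes a restart can also be scripted. The sketch below assumes you have already fetched and decoded the queue list from the RabbitMQ management API (GET /api/queues), which requires credentials not shown here. It relies on the `name` and `messages` fields of each queue object. The sample payload and the `non_empty_queues` helper are made up for illustration.

```python
def non_empty_queues(queues, names):
    """Return {name: message count} for the named queues that still hold messages.

    `queues` is assumed to be the decoded JSON list from GET /api/queues.
    """
    counts = {queue["name"]: queue.get("messages", 0) for queue in queues}
    return {name: counts[name] for name in names if counts.get(name, 0) > 0}


INIT_QUEUES = [
    "data_registry_production_exporter_init",
    "data_registry_production_flattener_init",
]

# Made-up payload: the exporter queue is empty, the flattener queue is not.
sample = [
    {"name": "data_registry_production_exporter_init", "messages": 0},
    {"name": "data_registry_production_flattener_init", "messages": 3},
]
print(non_empty_queues(sample, INIT_QUEUES))
# {'data_registry_production_flattener_init': 3}
```

An empty result means both queues are empty.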
Unblock the Process task¶
Bugs can cause a job to get stuck on the Process task. To diagnose and fix a bug, run Kingfisher Process’ collectionstatus command and select the collection’s notes, for example:
SELECT * FROM collection_note WHERE collection_id = 100;
If the collection is large, you can manually unblock the Process task.
No data collected¶
Note
This bug is fixed. The Process task now fails with “Collection is empty”.
If the collectionstatus command shows that no collection files were created and that the compiled collection has started but not ended:
steps: compile
data_type: to be determined
store_end_at: 2001-02-03 04:05:06.979418
completed_at: 2001-02-03 04:05:07.074971
expected_files_count: 0
collection_files: 0
processing_steps: 0
Compiled collection
compilation_started: True
store_end_at: None
completed_at: None
collection_files: 0
processing_steps: 0
completable: yes
Then, confirm that the Collect task didn’t write files, by checking the crawl’s log file in Scrapyd for a message like:
2001-02-03 04:05:06 [my_spider] INFO: +---------------- DATA DIRECTORY ----------------+
2001-02-03 04:05:06 [my_spider] INFO: | |
2001-02-03 04:05:06 [my_spider] INFO: | Something went wrong. No data was downloaded. |
2001-02-03 04:05:06 [my_spider] INFO: | |
2001-02-03 04:05:06 [my_spider] INFO: +------------------------------------------------+
If so, run Kingfisher Process’ closecollection command using the ID of the original collection, to allow the task to finish.
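If you have saved the crawl's log locally, the check for the "no data" message above can be sketched as a simple text search. The helper name `crawl_wrote_no_data` is made up for this example, and the sample log reuses the lines shown above.

```python
NO_DATA_MARKER = "Something went wrong. No data was downloaded."


def crawl_wrote_no_data(log_text: str) -> bool:
    """Return True if any line of the crawl log contains the "no data" message."""
    return any(NO_DATA_MARKER in line for line in log_text.splitlines())


sample_log = """\
2001-02-03 04:05:06 [my_spider] INFO: +---------------- DATA DIRECTORY ----------------+
2001-02-03 04:05:06 [my_spider] INFO: | Something went wrong. No data was downloaded. |
2001-02-03 04:05:06 [my_spider] INFO: +------------------------------------------------+
"""
print(crawl_wrote_no_data(sample_log))  # True
```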
Processing step remaining¶
Note
This bug is fixed. It was diagnosed by observing one remaining load step and a note like:
Empty format 'empty package' for file /data/my_spider/20010203_040506/E76/my_file.json (id: 55555).
The fix was to delete load steps for empty packages.
If the output looks like:
steps: compile
data_type: release package
store_end_at: 2001-02-03 04:05:06.979418
completed_at: None
expected_files_count: 654321
collection_files: 654321
processing_steps: 1
2001-02-03 04:05:07,074 DEBUG [process.management.commands.compiler:120] Collection my_spider:2001-02-03 04:05:06 (id: 100) not compilable (load steps remaining)
compilable: no (or not yet)
2001-02-03 04:05:07,074 DEBUG [process.management.commands.finisher:130] Collection my_spider:2001-02-03 04:05:06 (id: 100) not completable (steps remaining)
completable: no (or not yet)
Compiled collection
compilation_started: False
store_end_at: None
completed_at: None
collection_files: 0
processing_steps: 0
2024-07-04 14:45:01,718 DEBUG [process.management.commands.finisher:114] Collection my_spider:2001-02-03 04:05:06 (id: 101) not completable (compile steps not created)
completable: no (or not yet)
Then, confirm that the messages corresponding to the remaining processing steps have already been consumed by the file_worker worker, by checking RabbitMQ’s management interface. If so, select the remaining load steps for the original collection, for example:
SELECT collection_file_id FROM processing_step WHERE name = 'LOAD' AND collection_id = 100;
collection_file_id
--------------------
55555
(1 row)
Then, re-publish the messages from the Django shell, for example:
from process.util import get_publisher

with get_publisher() as client:
    message = {"collection_id": 100, "collection_file_id": 55555}
    client.publish(message, routing_key="api_loader")
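If the query returns several rows, the same message can be published once per row. The sketch below assumes only that the client has the `publish(message, routing_key=...)` method used above. `republish_load_steps` and `RecordingClient` are names made up for this example, and the second file ID is made up as well.

```python
def republish_load_steps(client, collection_id, collection_file_ids, routing_key="api_loader"):
    """Publish one message per remaining load step, and return the messages sent."""
    sent = []
    for collection_file_id in collection_file_ids:
        message = {"collection_id": collection_id, "collection_file_id": collection_file_id}
        client.publish(message, routing_key=routing_key)
        sent.append(message)
    return sent


class RecordingClient:
    """Stand-in for the client from get_publisher(), for a dry run outside the Django shell."""

    def __init__(self):
        self.published = []

    def publish(self, message, routing_key):
        self.published.append((routing_key, message))


dry_run = RecordingClient()
republish_load_steps(dry_run, 100, [55555, 55556])  # 55556 is a made-up ID
print(dry_run.published)
```

In the Django shell, pass the client from `get_publisher()` instead of the recording stand-in.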
Freeze or unpublish a publication¶
A publication is frozen if the source is temporarily broken or otherwise unavailable. Unfreeze the publication when the source is fixed.
A publication is unpublished if there are security concerns (like Afghanistan), if it duplicates another publication, or if it was added in error.
Only delete a publication if it is a duplicate or if it was otherwise created in error.
Note
If the publication is no longer updated, or the spider is removed from Kingfisher Collect, set the retrieval frequency to NEVER, instead of freezing the publication.
Tip
To audit whether publications ought to be frozen, run scrapy checkall from Kingfisher Collect.
If freezing: Check Frozen, to stop jobs from being scheduled
If unpublishing: Uncheck Public, to hide the publication
Click Save at the bottom of the page
Add an administrator¶
Click Add next to Users in the left-hand menu
Fill in Username and Password, using a strong password
Click Save and continue editing
On the next form:
Fill in First name, Last name and Email address
Check Staff status (only James and Yohanna should have Superuser status)
Assign Groups (multiple can be selected, as they have non-overlapping permissions)
- Viewer
Can view publications, licenses, jobs and job tasks
- Contributor
Can add/change publications and licenses
Click SAVE