Sphnc 197
The proposal is to add an optional flag, anonymize, that can be configured in the scrambleField de-identification rule:
"scrambleField": {
  "defaultScrambling": {
    "applies_to_fields": [
      "sphn:SubjectPseudoIdentifier/id",
      "sphn:SubjectPseudoIdentifier/sphn:hasIdentifier",
      "sphn:AdministrativeCase/id",
      "sphn:AdministrativeCase/sphn:hasIdentifier",
      "sphn:Sample/id",
      "sphn:Sample/sphn:hasIdentifier",
      "sphn:hasSample/id",
      "sphn:hasSample/sphn:hasIdentifier",
      "sphn:hasAllergen/sphn:hasCode/id"
    ],
    "anonymize": true
  }
}
- anonymize: false: same logic as before.
- anonymize: true: when a patient is processed, instead of extracting the previously scrambled values for the patient from de_identification_scrambling, we set scrambling_map={}. This means that all fields are scrambled from scratch at the first encounter. We still store the new scrambled values in the table de_identification_scrambling so that the downloaded data has the scrambled patient ID (if de-identified); for this, an ON CONFLICT DO UPDATE SET clause was added to the insert statement. Repeated processing of a patient will give different results.
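The two behaviours can be sketched as follows. This is only an illustration under stated assumptions, not the connector's actual code: the table columns, the function names, the use of SQLite, and the random-hex scrambling are all hypothetical, and treating a missing anonymize flag as false is an assumption based on the flag being optional.

```python
import sqlite3
import uuid

# Hypothetical schema: the real de_identification_scrambling table may differ.
SCHEMA = """
CREATE TABLE de_identification_scrambling (
    patient_id TEXT NOT NULL,
    field TEXT NOT NULL,
    original_value TEXT NOT NULL,
    scrambled_value TEXT NOT NULL,
    PRIMARY KEY (patient_id, field, original_value)
)
"""


def load_scrambling_map(conn, patient_id, anonymize):
    """Return previously scrambled values, or an empty map when anonymize is set."""
    if anonymize:
        return {}  # forget earlier runs: every field is scrambled from scratch
    rows = conn.execute(
        "SELECT field, original_value, scrambled_value "
        "FROM de_identification_scrambling WHERE patient_id = ?",
        (patient_id,),
    )
    return {(field, original): scrambled for field, original, scrambled in rows}


def scramble(conn, patient_id, field, value, scrambling_map):
    """Scramble one value and persist it, overwriting any earlier entry."""
    key = (field, value)
    if key not in scrambling_map:
        # Placeholder for the real scrambling function (assumption).
        scrambling_map[key] = uuid.uuid4().hex
    conn.execute(
        "INSERT INTO de_identification_scrambling "
        "(patient_id, field, original_value, scrambled_value) "
        "VALUES (?, ?, ?, ?) "
        "ON CONFLICT (patient_id, field, original_value) "
        "DO UPDATE SET scrambled_value = excluded.scrambled_value",
        (patient_id, field, value, scrambling_map[key]),
    )
    return scrambling_map[key]


def process_patient(conn, patient_id, fields, rule):
    """Scramble all fields of one patient according to a scrambleField rule."""
    anonymize = rule.get("anonymize", False)  # assumed default when the flag is absent
    scrambling_map = load_scrambling_map(conn, patient_id, anonymize)
    return {
        field: scramble(conn, patient_id, field, value, scrambling_map)
        for field, value in fields.items()
    }
```

With anonymize set to false, processing the same patient twice returns the stored values; with anonymize set to true, the second run overwrites the stored row through the upsert, so the table still holds exactly one entry per field but its value changes.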
UPDATE: in order to "forget" about the ingested patient, we apply the following additional steps:
- We do not write any record to the de_identification history table if anonymize: true.
- Table de_identification_scrambling has an additional column anonymized, which is used later on for cleanup and for extracting the scrambled patient ID.
- If one of the scrambling de-id rules has anonymize: true, we extract the scrambled patient ID (if it has been scrambled) from the de_identification_scrambling table and update the internal patient ID in the connector. The patient file is written to the refined zone with the scrambled ID, and that ID is used in the tables refined_zone, graph_zone, release_zone.
- When the pre-checks are run, we clean up the table de_identification_scrambling by removing all records with anonymized=True.
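The ID lookup and the cleanup step above could look roughly like this. It is a sketch under assumptions: only the anonymized column is named in the description, so the remaining columns, the function names, the use of SQLite, and the choice of sphn:SubjectPseudoIdentifier/id as the field that carries the patient ID are hypothetical.

```python
import sqlite3

# Hypothetical schema; only the `anonymized` column comes from the description.
SCHEMA = """
CREATE TABLE de_identification_scrambling (
    patient_id TEXT NOT NULL,
    field TEXT NOT NULL,
    original_value TEXT NOT NULL,
    scrambled_value TEXT NOT NULL,
    anonymized BOOLEAN NOT NULL DEFAULT 0
)
"""

# Assumption: this field holds the patient ID that the connector tracks.
PATIENT_ID_FIELD = "sphn:SubjectPseudoIdentifier/id"


def any_rule_anonymizes(scramble_field_config):
    """True if at least one scrambling rule sets anonymize: true."""
    return any(
        rule.get("anonymize", False) for rule in scramble_field_config.values()
    )


def resolve_patient_id(conn, patient_id, scramble_field_config):
    """Return the scrambled patient ID if one was stored, else the original ID."""
    if not any_rule_anonymizes(scramble_field_config):
        return patient_id
    row = conn.execute(
        "SELECT scrambled_value FROM de_identification_scrambling "
        "WHERE patient_id = ? AND field = ? AND anonymized = 1",
        (patient_id, PATIENT_ID_FIELD),
    ).fetchone()
    # The ID may not have been scrambled at all; keep the original in that case.
    return row[0] if row else patient_id


def cleanup_anonymized(conn):
    """Pre-check cleanup: delete anonymized rows and return how many were removed."""
    cursor = conn.execute(
        "DELETE FROM de_identification_scrambling WHERE anonymized = 1"
    )
    return cursor.rowcount
```

Once cleanup_anonymized has run, the link between the original and the scrambled ID is gone, which is why the re-ingestion caveat in the warnings below applies.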
Warnings:
- When a patient is re-ingested, the anonymized patients in the release, graph, and refined zones won't be removed, because we do not know that those patients are related to the re-ingested patient. A full project reset is needed to clean up the data.
- The patient logs, when extracted for anonymized patients, will be split: logs up to refined_zone will be related to the original patient ID, while logs from the integration step will be related to the anonymized ID.
Edited by Nicola Stoira