Principles for Sharing Sensitive Biodiversity Data: Protecting Ethical Data Management, Lecture notes of Data Mining

High-level principles for managing and sharing sensitive biodiversity data, focusing on the importance of ethical data management and respecting access restrictions. It includes guidelines for assessing the sensitivity of data, determining the level of generalization, and documenting reasons for data restrictions.

Current Best Practices for Generalizing Sensitive Species Occurrence Data

Arthur D. Chapman

Version HEAD detached, 2022-02-26 10:19:43 UTC

Table of contents

  • Colophon
    • Suggested citation
    • Author
    • Licence
    • Persistent URI
    • Document control
    • Cover image
  • Introduction
    • Purpose
    • Audience
    • Scope
  • 1. Principles
  • 2. Determining sensitivity
    • 2.1. Criteria for determining sensitivity
    • 2.2. Categories of sensitivity
  • 3. Generalizing textual information
  • 4. Generalizing spatial information
    • 4.1. Generalization versus randomization
    • 4.2. Generalization
    • 4.3. Documentation
    • 4.4. Duplicates and GUIDs
  • 5. Documentation and metadata
    • 5.1. Documenting sensitivity
    • 5.2. Spatial fit
  • 6. Authentication and authorization
  • 7. Implementations
  • Afterword
    • Listing sensitive taxa
    • Metadata recommendations
  • Glossary
  • Acknowledgements
  • References
  • Annex 1: Scenarios using Criteria 1 and 2 as triggers
    • Criterion 1
    • Criterion 2

Introduction

The unprotected distribution of sensitive primary species occurrence data (for example, the exact localities of rare, endangered or commercially valuable taxa) has been a concern of GBIF (the Global Biodiversity Information Facility) since its inception. The GBIF Secretariat has an interest in making data available through its portals while respecting the wishes of data providers to restrict information on sensitive taxa. In early 2006, GBIF initiated a process to address this issue, particularly for data shared through the GBIF network and made visible through GBIF.org and other data aggregation initiatives.

This resulted in the Guide to Best Practices for Generalizing Sensitive Species Occurrence Data. That document drew heavily on the results of an online survey conducted on Survey Monkey and on the workshops that followed, the reports of which were originally published on the GBIF website (Chapman 2006).

A final report on dealing with sensitive primary species occurrence data was prepared following these processes and discussions and was presented to GBIF in April 2007 (Chapman 2007). That report made a number of recommendations, many of which have been incorporated into this document.

The final stage of that process was to develop a Guide to Best Practices for primary species occurrence data. The document was put forward as an important guideline for institutions, data providers and GBIF nodes to use in developing their own in-house guidelines. Organizations and institutions were encouraged to produce their own in-house documents, taking into account the practices described in the Guide and in related documents such as the Guide to Best Practices for Georeferencing (https://doi.org/10.15468/doc-2zpf-zf42; Chapman and Wieczorek 2006), and to incorporate them into their own working environments. Unfortunately, fewer institutions than we had hoped took up the challenge and produced their own in-house documents. Two key agencies did, however: SANBI in South Africa (SANBI 2010, http://biodiversityadvisor.sanbi.org/wp-content/uploads/2012/09/SANBI-Biodiversity-Information-Policy-Series-Digital-Access-to-Sensitive-Taxon. df) and the Atlas of Living Australia (Tann and Flemons 2009, ALA 2018a) (see Implementations).

It is also important to understand the possible impact that approaches to restricting sensitive data may have on biodiversity science and, while limiting the availability or resolution of some data, not to restrict unduly the uses to which the data can be put. For this reason, a set of principles is set out below. Foremost among them is the need to make biodiversity information freely available wherever possible, in the interests of science, the environment and biodiversity itself.

Hyperlinked words in the text link to terms included in the Glossary; citations link to the sources where these are available online, and to the References where they are not.

Purpose

This document aims to provide best practices (or current best practices) for dealing with sensitive primary species occurrence data, and to give guidance on how to make as much data as possible available without exposing the species to harm because the data have been placed in the public domain.

It is now more than a decade since the first Guide was published. This new publication aims to update those practices and to incorporate the experience gained by institutions that have implemented the Guide in whole or in part.

Audience

This work is designed for those who need or want to know how best to make as much data as possible on sensitive taxa available without the published data harming the species. It is also aimed at people or organizations faced with developing a policy on dealing with sensitive primary species occurrence data and with writing in-house documentation that conforms to current best practice.

Above all, this document will help end users of the data to understand the implications of using records that may have been generalized to protect sensitive species, and how to interpret that generalization at different precisions.

Scope

The term ‘best practice’ generally refers to the best way of doing something. It is commonly used in business management, software engineering and medicine, and increasingly in public administration. The term ‘current best practice’ is more specific in that it signals the possibility of future developments and better practices. Because this subject is still immature, this publication generally refers to ‘current’ best practice, which is certain to mature over time as more institutions adopt and adapt the principles described here.

Two issues that neither this document nor its predecessor covers are the privacy of living persons and the development of data-sharing agreements and data licences. Both have legal implications and vary considerably from one jurisdiction to another. They have been covered in detail by others (for example, Corti et al. 2000, Parry and Mauthner 2004 (https://doi.org/10.1177/0038038504039366), GBIF 2017, GBIF 2019, ALA 2018b, OEH 2019b).

Ethics and biodiversity is a subject that has received little coverage, although for hundreds of years biologists have followed an implicit ethic in their work. Managing sensitive data requires considerable ethical practice and, in many cases, a great deal of trust and collaboration. Amateur biologists and citizen scientists are often aware of the locations of sensitive taxa, and it is up to biologists to work with these groups to ensure the survival of the species. This will not always be possible because there will be

1. Principles

Biodiversity information should be made freely available to be shared globally to enable their use for not-for-profit decision making, education, research and other public benefit purposes. Making the full detail of biodiversity information available should reduce the risk of damage to the environment and help safeguard a sustainable future. Where release will have the opposite effect, access to the full detail may need to be controlled.

Below is a set of high-level principles for sharing data in general and sensitive data in particular.

  1. The management of sensitive data is integral to ethical data management.
  2. Wherever possible, environmental information should be freely available to all. Generally this benefits the environment by increasing awareness, enabling better decision-making and reducing risk of damage.
  3. Public release of information can sometimes result in environmental harm. In such cases availability of information may need to be controlled; although the presumption remains in favour of release and any restrictions should be assessed and reviewed rigorously.
  4. All data regarded as being sensitive should include a date for review of their sensitivity status, along with documented reasons for the sensitivity status. The date for review may be short or long depending on the nature of the sensitivity.

Whenever a data provider receives an application for enhanced access to restricted data, they should avoid assuming continued sensitivity and use it as an opportunity to revisit the determination.

  5. If the data are to be restricted for distribution, then this should only be done to a copy of the data at the time of their distribution. Data should never be altered, falsified or deleted from the stored record.
  6. Documentation is essential for many reasons, and where data have been restricted or generalized, it is important that the reason(s) for the categorization is recorded as metadata that remains with the record.
  7. Where data are restricted or generalized for distribution (such as the name of a collector, textual locality information, etc.), this should be documented by replacing the content with appropriate wording; the field should not be left blank or null.
  8. There are extremely strong reasons not to restrict data on related collections (e.g. collector’s numbers in sequence, collector’s name, etc.) because of the restrictions this places on data quality and data validation procedures, etc.
  9. Users of sensitive data should comply with any and all restrictions of access that the data provider has placed on the data. If granted enhanced access to restricted information, users must not compromise or otherwise infringe the confidentiality of such information.
  10. Data providers should respect the needs of data users to have access to data and documentation in order to determine the ‘fitness for use’ of the data and to ensure that analyses are robust and not misleading.
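Two of the principles above (restrict only a copy of the data, and replace restricted fields with wording rather than leaving them blank) can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed implementation: the function name, the replacement wording and the example record are all hypothetical, and the field names are borrowed from Darwin Core purely for familiarity.

```python
import copy

# Hypothetical replacement wording; a provider would choose its own text.
WITHHELD = "Withheld: sensitive information"


def prepare_for_distribution(record, restricted_fields, wording=WITHHELD):
    """Return a restricted copy of `record`; the stored record is left untouched."""
    released = copy.deepcopy(record)
    for name in restricted_fields:
        if name in released:
            # Replace the content with wording instead of blanking the field.
            released[name] = wording
    return released


# Illustrative stored record (all values are made up).
stored = {
    "scientificName": "Example taxon",
    "recordNumber": "1234",
    "locality": "2 km north of an example landmark",
}

released = prepare_for_distribution(stored, ["locality"])
print(released["locality"])  # replacement wording
print(stored["locality"])    # original text, unchanged in the stored record
```

Working on a deep copy keeps the stored record intact, and leaving fields such as the collector's number unrestricted keeps the released record usable for data validation.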

A few examples with which I have had direct experience include Centropyge boylei, Centropyge narcosis [and] Belonoperca pylei … among a number of others.

Often in these and other cases, the existence of the new species is brought to the attention of the scientific community ‘by’ the commercial (aquarium) trade; rather than the other way around. Thus, it is usually not considered so much of a ‘problem’, but rather a sort of ‘symbiotic’ relationship between the commercial trade and the taxonomists. Moreover, in such cases in reef fishes, the species has eluded prior discovery not so much because it is rare or has an extremely restricted distribution, but because it simply lives somewhere that scientists have not yet been able to survey. Hence, there are usually few, if any, conservation implications in this context.

(Richard Pyle, personal communication 2006)

2.1. Criteria for determining sensitivity

The National Biodiversity Network (NBN) in the UK (Countryside Agencies OIN 2007) and the Department of Environment and Conservation in New South Wales, Australia (Department of Environment and Conservation 2007) developed detailed sensitivity criteria, and the previous version of this publication (Chapman and Oliver 2008) relied heavily on the work of those two agencies. Since the publication of the previous Guide, both these agencies (DECCW 2009, NBN 2019a, NBN 2019b, OEH 2019a), along with the South African National Biodiversity Institute (SANBI 2010, SANBI 2016), the Atlas of Living Australia (ALA 2018a) and others, have given a lot of thought to criteria for determining sensitivity within their jurisdictions. Documentation from all of them has contributed greatly to this document.

A series of criteria for determining the sensitivity of taxa and data along with recommended metadata statements for documenting the reasons for the determination are set out in Table 1. The first two are for use by biodiversity data holders and those creating trigger lists of potentially sensitive taxa and refer largely to the taxa themselves. The last two are for use by biodiversity data holders and deal with an assessment of the data they hold and are considering making available – they are not suitable for the creation of trigger lists.

The criteria are used to determine:

Table 1. Criteria for determining the sensitivity of taxa and data along with recommended metadata statements for documenting the reasons for the determination

  1. Risk of harm: An assessment of whether the taxon is subject to harmful human activity.
  2. Impact of harm: An assessment of the sensitivity of the taxon to the harmful human activity.
  3. Sensitivity of data: An assessment of whether the release of data will increase harm.
  4. Decision on release and category of sensitivity: A balanced decision regarding the release of the data and a determination of the category of sensitivity, and thus the level of generalization, of the data for release.

A set of scenarios using Criteria 1 and 2 above to determine triggers for sensitivity of taxa is attached as an Annex to this document.

The first step in determining sensitivity is to assess whether the taxon is subject to a harmful human activity and whether the availability of related biodiversity data will increase the likelihood of that activity occurring.

If it is not, there would appear to be no reason to list it as a potentially environmentally sensitive taxon. It is recommended that you use the documented wording supplied, together with additional supporting rationale documenting the specifics of the threat, for example:

The taxon is at risk from harmful human activity; it is subject to attack by Phytophthora, which is transported by human-operated vehicles.
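One way to keep the documented wording, the supporting rationale and the review date together with a determination is a small record structure. The sketch below is only an illustration: the class names, field names, taxon and review date are assumptions made for this example, not part of any published schema.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class SensitivityStatement:
    """A documented statement plus its supporting rationale (hypothetical structure)."""
    wording: str
    rationale: str
    statement_id: Optional[str] = None  # e.g. "2a" where the criteria tables supply one


@dataclass
class SensitivityDetermination:
    """Groups the statements for a taxon with the date the status is to be reviewed."""
    taxon: str
    review_date: date
    statements: List[SensitivityStatement] = field(default_factory=list)

    def is_due_for_review(self, today: date) -> bool:
        return today >= self.review_date


determination = SensitivityDetermination(
    taxon="Example taxon",          # placeholder name
    review_date=date(2024, 6, 30),  # placeholder review date
)
determination.statements.append(
    SensitivityStatement(
        wording="The taxon is at risk from harmful human activity",
        rationale=("It is subject to attack by Phytophthora, "
                   "which is transported by human-operated vehicles."),
    )
)

print(determination.is_due_for_review(date(2025, 1, 1)))
```

Storing the review date next to the reasons makes it straightforward to list determinations that are due to be revisited.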

Table 3. Impact of harm. Assessing sensitivity of taxa to a harmful human activity.

2.1. Does the taxon have characteristics that make it significantly vulnerable to the harmful human activity?

  YES: Document with supporting rationale using Statement 2a: “The taxon has characteristics that make it significantly vulnerable to the harmful human activity.” Go to 2.2.

  NO: Document with supporting rationale using Statement 2b: “The taxon is not significantly vulnerable to the harmful human activity.” Go to 2.2.

2.2. Is the taxon vulnerable to harmful human activity over its total range, or are there areas (such as in conservation zones, or other parts of the world) where the taxon is not at the same level of risk?

  YES: Document with supporting rationale using Statement 2c: “The taxon is vulnerable to harmful human activity over its total range.” Go to 3.

  NO: Document with supporting rationale using Statement 2d: “The taxon is not vulnerable to harmful human activity over its total range and/or there are areas where the taxon occurs but is not at significant risk.” Go to 3.

Once it has been decided whether the taxon is subject to a significant risk and impact from harm, a decision needs to be taken on whether the release of specific data on that taxon, or of other related data, will increase the risk and impact of harm.

Table 4. Sensitivity of data. Assess whether the release of data will increase harm.

3.1. Is the content and detail of the biodiversity data such that their release would enable someone to carry out a harmful activity upon the taxon or attribute?

  YES: Document with supporting rationale using statement 3a: “The content and detail of the data is such that their release would enable someone to carry out a harmful activity upon the taxon or attribute.” Go to 3.2.

  NO: Data is not sensitive. Document with supporting rationale using statement 3b: “The content and detail of the data if released would not enable someone to carry out a harmful activity upon the taxon or attribute.” Go to 4.

3.2. Is information already in the public domain, or already known to those individuals or groups likely to undertake the harmful activity?

  YES: Document with supporting rationale using statement 3d: “The information is already in the public domain, or is already known to the individuals or groups likely to undertake harmful activities.” Go to 3.3.

  NO: Document with supporting rationale using statement 3c: “The information is not in the public domain, and is not already known to individuals or groups likely to undertake harmful activities.” Go to 3.3.

3.3. Would disclosure damage a partnership or relationship (especially where the maintenance of which is essential to helping achieve a specific conservation objective)?

  YES: Document with supporting rationale using statement 3e: “Disclosure of the data is likely to damage a partnership or relationship the maintenance of which is essential to helping achieve a specific conservation objective.” Go to 3.4.

  NO: Document with supporting rationale using statement 3f: “Disclosure of the data will not damage any partnership or relationship essential to conservation.” Go to 3.4.

3.4. Would disclosure allow the locations of sensitive features to be derived through combination with other publicly available information sources?

Table 5. Decision on release and category of sensitivity. Make a balanced decision regarding the release of data and determining the category and level of generalization.

4.1. On balance, considering criteria 1 to 3 above and any important wider context, will releasing the information increase the risk of environmental harm or harm to a living person?

  YES: Document using statement 4a: “On balance, release of the information will, or is likely to, increase the risk of environmental harm or harm to a living person.” Go to 4.2.

  NO: Document using statement 4b: “On balance, release of the data will not increase the risk of environmental harm or harm to a living person.” Go to 4.2.

4.2. Is the taxon distinctive and of high biological significance, under high threat from exploitation/disease or other identifiable threat where even general locality information may threaten the taxon? Or could the release of any part of the record cause irreparable harm to the environment or to an individual?

  YES: Document using statement 4c, collating all supporting rationale and documenting the decision to withhold the data: “The species is a distinctive species of high biological significance, is under high threat from exploitation/disease or other identifiable threat and even general locality information may threaten the taxon, or the release of the information could cause irreparable harm to the environment, an individual, or some other feature.” Category 1.

  NO: Go to 4.3.

4.3. Is the taxon such that the provision of precise locations at finer than 0.1 degrees (~10 km) would subject the taxon to threats such as disturbance and exploitation? Or does the record include highly sensitive information, the release of which could cause extreme harm to an individual or the environment?

  YES: Document using statement 4d, collating all supporting rationale and documenting the decision to release the data: “The species is classed as highly sensitive, and the provision of precise locations would subject the species to threats such as disturbance and exploitation, and/or the record includes highly sensitive information, the release of which could cause extreme harm to the environment or an individual.” Category 2.

  NO: Go to 4.4.

4.4. Is the taxon such that the provision of precise locations at finer than 0.01 degrees (~1 km) would subject the species to threats such as collection or deliberate damage? Or does the record include sensitive information, the release of which could cause harm to an individual or the environment?

  YES: Document using statement 4e, collating all supporting rationale and documenting the decision to release the data: “The species is classed as of medium to high sensitivity, and the provision of precise locations could subject the species to threats such as collection or deliberate damage, and/or the record includes sensitive information, the release of which could cause harm to the environment or to an individual.” Category 3.

  NO: Go to 4.5.

4.5. Is the taxon subject to low to medium threat if precise locations (i.e. locations with a precision greater than 0.001 degrees or 100 m) become publicly available and where there is some risk of collection or deliberate damage?
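Questions 4.2 to 4.4 form a cascade in which the first YES answer fixes the category. The sketch below expresses that ordering in Python under stated assumptions: the function name and the boolean encoding of the answers are hypothetical, and because the outcome of question 4.5 is not shown in this excerpt, the function returns None when all three answers are NO rather than guessing a category.

```python
from typing import Optional


def category_from_answers(q4_2: bool, q4_3: bool, q4_4: bool) -> Optional[int]:
    """Map YES (True) / NO (False) answers to questions 4.2-4.4 onto a category.

    The questions are evaluated in order and the first YES decides the result.
    """
    if q4_2:
        return 1  # even general locality information may threaten the taxon
    if q4_3:
        return 2  # locations finer than 0.1 degrees would expose the taxon
    if q4_4:
        return 3  # locations finer than 0.01 degrees would expose the taxon
    return None   # continue with question 4.5, which is not modelled here


print(category_from_answers(False, True, True))    # first YES is question 4.3
print(category_from_answers(False, False, False))  # no category assigned yet
```

Evaluating the most restrictive question first means a record is never assigned a weaker category than its answers warrant.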

2.2. Categories of sensitivity

Table 6. Categories of sensitivity

Category 1

Criterion: Species or records for which no records will be provided at all, or which are only released as present within a large region such as a county, watershed, etc.

Reasoning: The reason for non-disclosure is that:

  1. a distinctive species of high biological significance is under high threat from exploitation/disease or other identifiable threat where even general locality information may threaten the taxon.
  2. the information in the record is of such a nature that its release could cause irreparable harm to the environment, to an individual or to some other feature.

Data may only be supplied under strict Licence conditions or as presence in a large region such as a watershed, county, or biogeographic region.

Category 2

Criterion: Species or records for which coordinates will be publicly available ‘denatured’ (to 0.1 degrees) and/or other information in the record is generalized. Finer scale data (Category 3, Category 4 or detailed data) may be supplied to individuals under Licence.

Reasoning: The reasons for restriction are that:

  1. The species is classed as highly sensitive, and the provision of precise locations would subject the species to threats such as disturbance and exploitation.
  2. The record includes highly sensitive information, the release of which could cause extreme harm to an individual or to the environment.

Data is supplied to the public

  1. with the georeference denatured to 0.1 degrees (~10 km) and/or
  2. with sensitive fields generalized or removed and replaced with suitable replacement wording

Data may be supplied at finer scales on request under the conditions of a written data agreement, usually a Data Licence Agreement. When data is provided to clients, they will be advised which species or fields are sensitive and may have their coordinates denatured to that available under Category 3 or Category 4.

NB: In the case where the sensitivity is triggered by fields other than the georeference, it may be more appropriate to class the record as Category 3 or Category 4.
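As a rough illustration of what denaturing a georeference to a category's precision can look like, the sketch below rounds decimal-degree coordinates to a fixed number of decimal places. Several things here are assumptions rather than statements from this guide: rounding (as opposed to truncation or snapping to a grid) is just one possible approach, the function name and example coordinates are made up, and the 0.001-degree precision for Category 4 is inferred from question 4.5 rather than from a category definition shown in this excerpt.

```python
from typing import Optional, Tuple

# Decimal places per category: 0.1 degrees (Category 2), 0.01 degrees (Category 3).
# The Category 4 entry (0.001 degrees) is an assumption made for this sketch.
DECIMAL_PLACES = {2: 1, 3: 2, 4: 3}


def generalize_coordinates(lat: float, lon: float,
                           category: int) -> Optional[Tuple[float, float]]:
    """Return coordinates generalized for public release, or None if withheld."""
    if category == 1:
        return None  # Category 1: no coordinates are released
    if category not in DECIMAL_PLACES:
        raise ValueError(f"unknown sensitivity category: {category}")
    places = DECIMAL_PLACES[category]
    return (round(lat, places), round(lon, places))


# Made-up coordinates, used only to show the effect of each category.
print(generalize_coordinates(-33.86785, 151.20732, 2))  # (-33.9, 151.2)
print(generalize_coordinates(-33.86785, 151.20732, 3))  # (-33.87, 151.21)
print(generalize_coordinates(-33.86785, 151.20732, 1))  # None
```

In line with the principles earlier in the document, a function like this would be applied to the copy prepared for distribution, never to the stored record.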