Data Quality

Descriptions, referrals, instructions how and why to improve the data quality.

Based on the result of the project ‘Zahedi, Z. (2019). A report on the data quality issues of different content types in Pure & some recommendations for data quality improvement, Information policy department, KNAW’, it is important to:

  • Validate content types with accurate and complete information; 
  • Use persistent identifiers for better identification of outputs (such as DOI), individuals (Such as ORCID), and organizations (such as GRID);
  • Use uniform names and use it consistently when registering and validating records (for instance for internal and external organizations or publishers);
  • Register and validate content types under uniform top or (sub)types of organizational units;
  • Consider and think carefully about what (sub)category is needed to assign to different content types;
  • Be clear and transparent about the choice each institute at KNAW makes when registering and validating different content types in Pure;
  • Regularly monitor and update the information of the different content types, individuals, affiliations, publishers,  journals, etc. of each KNAW institute;
  • Think how relevant it is for each KNAW institute to make use of different content types and their subtypes and use them in a consistent way;
  • Use Pure in a responsible way in line with the aim of the institute. Pure should be used as a platform to register and validate relevant content types to make report or to be used for any other purpose related to the need or aim of each institute at KNAW.

The above recommendations will be served as starting points to improve data quality issues of KNAW research outputs in Pure. In the coming years we (KNAW and its institutes) will work together to meet all the above issues.

Suggested workflow for registering, validating, monitoring, and updating different research outputs in Pure.

A persistent identifier (such as DOIs, PubMed IDs, ISSN/ ISBNs, ORCIDs, GRID IDs, etc.) is a globally unique identifier that is permanently assigned to a resource.

Persistent Identifiers (PIDs) are an important component of Open Science and FAIR data practices.  PIDs ensure that research outputs are findable, accessible, interoperable, and reusable.

Benefits of Persistent identifiers

  • They are required by publishers, research funders, and universities,
  • They increase the discoverability and the accessibility of research outputs and researchers,
  • They enable sharing and reuse,
  • They facilitate the opportunity for system integrations and interoperability.

For more info on Persistent identifiers see:

What is Open Access?

Based on Budapest Declaration (2002) and Berlin Declaration (2003), Open Access (OA) means unrestricted online access to Peer-reviewed scholalry research for any user. It includes the right to read, download, print, copy, distribute, search, link, crawl and mine. Another definition of Open Access by Suber (2004)is: “Open-access (OA) literature is digital, online, free of charge, and free of most copyright and licensing restrictions“, (See Open Access Overview, CC BY 3.0 US). OA articles are free to read online, either on the publisher website or in an OA repository (Piwowar et al., 2018). For more info visit:

Why Open Access is important?

  • Requried by funders such as NWO and EU as a condition for funding.
  • Facilitates transtition to Plans S (see also PlanS 10 Principles and revised PlanS princiles) that by 2020 scientific publicaitons resulted by public funds must be published in OA journals.
  • Increase accessibility, usage, visibility, and impact of research outputs.

For pros and cons of OA visit here: