A Risk-based Approach for Assessing R package Accuracy within a Validated Infrastructure: White Paper Summary

By Andy Nicholls | January 30, 2020

In this article I summarise some of the key themes from the R Validation Hub’s white paper, A Risk-based Approach for Assessing R package Accuracy within a Validated Infrastructure.

What is the R Validation Hub?

The R Validation Hub is a cross-industry initiative whose mission is to enable the use of R by the Bio-Pharmaceutical Industry in a regulatory setting, where the output may be used in submissions to regulatory agencies. The group was initially formed in 2018 by members of PSI’s AIMS SIG but has now expanded to include just under 100 members from multiple organisations across the pharmaceutical sector.

Why write a white paper?

The R Validation Hub is following a three-phase roadmap. Phase 1 of the roadmap has been focussed on consolidating relevant information for those wishing to use R for regulatory work. The white paper reflects the current thinking of the R Validation Hub working group and may evolve over time. It consolidates the collective high-level thinking that has been established through various communication channels over the past 12-18 months. Additional detail will be provided via the website and future papers.

Key Themes from the White Paper

Validation, R and R Packages

The FDA provide a clear definition of validation in the ‘Glossary of Computer System Software Development Terminology’. This can be broken down into three core components:

  1. Accuracy
  2. Reproducibility
  3. Traceability

As a constantly evolving language, R presents challenges in each of these areas but the greatest challenges typically concern accuracy.

When assessing accuracy it is important to make the distinction between core R (base and recommended packages) R and the many contributed R packages. The R Foundation have produced R: Regulatory Compliance and Validation Issues. A Guidance Document for the Use of R in Regulated Clinical Trial Environments which is a very useful reference for anyone wishing to use core R for regulatory work. Based on an assessment of the available information we conclude that there “is minimal risk in using base and recommended (core) packages as a component in a validated system for regulatory analysis and reporting with R”.

For the contributed R packages we propose a risk-based approach to establish package accuracy/validity.

An R Package Risk Assessment Framework

We propose to assess the risk of contributed R packages based on four criteria:

  • Purpose
  • Maintenance Good Practice (Software Development Life Cycle)
  • Community Usage
  • Testing

The criteria form part of a proposed workflow summarised in the figure below:

source: Assessing Package Accuracy

The purpose (eg statistical modelling, data manipulation), maintenance practices, community usage and testing each helps establish confidence (or otherwise) in the accuracy of an R package. We propose to gather information in each of these areas in order to determine an overall risk score for a package. For packages that present a medium/high degree of risk we may suggest to mitigate the risk by clearly defining usage requirements and developing tests against them through a unit-testing framework such as testthat. For low-risk packages we feel that “additional remediation for such packages is unlikely to yield any significant reduction in risk.” Based on the earlier conclusion this would include base and recommended R packages.

Trust

A vendor audit can be used establish trust in a company that develops proprietary software. A successful audit may result in the vendor being allocated a ‘trusted resource’ status and thus any software produced by that vendor could be deemed to be low risk. We may also begin to develop a similar level of trust in certain package authors or collections of packages (eg ‘tidyverse’) that continue to demonstrate good practice over a sustained period of time.
# Next Steps

The white paper represents the R Validation Hub’s current thinking and will likely evolve over time as we explore key themes, such as testing, in more depth. As we do so, further shorter papers will follow.

In addition, R Validation Hub has now commenced phase 2 of its roadmap. The aim of this phase is to supplement the white paper with tools. This will begin with the release of an R package, riskmetric, for the collection of metrics that can be used to evaluate package risk and a corresponding Shiny app that can be used to generate package reports. See www.pharmar.org for further information and updates.