Risk Assessment

The term “risk” refers to the likelihood of an error in the code that, even when the code is used appropriately, could lead to an incorrect calculation and hence an incorrect decision when analysing pharmaceutical data. The relative impact of an error should be determined by the individual organisation. Impact is therefore not considered as part of the risk assessment.

Risk Philosophy

The following tables highlight metrics that could be used to assess the risk of an R package. The metrics are grouped into two areas:

  1. Package Maintenance
  2. Community Usage and Testing

Package Maintenance

| Metric | Reason for Inclusion |
| --- | --- |
| Vignette | It is good practice to provide additional help documentation for a package in the form of one or more vignettes. |
| Website | It is good practice to maintain a website with further information about the package, contact details, and other supporting information. |
| Source control (public) | The use of source control is good practice that facilitates development. In general it will not be possible to determine whether the maintainer is using a source code repository unless the repository is publicly accessible. |
| Formal bug tracking | Bugs can be logged by email, but the better practice is to provide a formal mechanism for bug tracking. |
| News | A news feed is good practice as it makes users aware of updates to the package, thereby highlighting the areas of greatest risk. |
| Release rate (18 months) | A stable but continuing release pattern is a sign of active maintenance. Active maintenance reduces the risk of bugs or errors, so a higher release frequency indicates a reduced risk. However, as a package stabilises over time, the release rate may slow. When interpreting this metric it is therefore also important to consider the package maturity. |
| Size of codebase (lines of code) | The larger the codebase, the greater the risk of error. |
| License? | [It is important to understand which licenses are acceptable for regulatory work in a corporate environment and for submissions to the agency.] |
| Author reputation? | [Need to think about how we’d measure this - number of packages authored? The website https://www.rdocumentation.org/trends provides a metric for author reputation.] |
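
As an illustration of the release rate (18 months) metric, the sketch below counts the releases that fall inside an 18-month window and expresses the count as releases per year. It is a minimal sketch, not a prescribed method: the function names `months_before` and `release_rate`, the window boundaries, and the release dates are all hypothetical, and Python is used only because the arithmetic is language-neutral.

```python
import calendar
from datetime import date


def months_before(day, months):
    """Return the date `months` calendar months before `day`.

    The day of month is clamped to the last valid day of the target month.
    """
    index = day.year * 12 + (day.month - 1) - months
    year, month0 = divmod(index, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, min(day.day, last_day))


def release_rate(release_dates, as_of, months=18):
    """Return (number of releases in the window, releases per year)."""
    start = months_before(as_of, months)
    # Assumption: the window excludes its start date and includes `as_of`.
    count = sum(1 for d in release_dates if start < d <= as_of)
    return count, count * 12 / months


# Hypothetical release history, for illustration only.
releases = [date(2017, 3, 1), date(2018, 1, 15), date(2018, 9, 30), date(2019, 2, 10)]
count, per_year = release_rate(releases, as_of=date(2019, 6, 30))
print(count, round(per_year, 2))  # 3 2.0
```

A low count on its own does not imply high risk; as noted in the table, it should be read alongside the package maturity.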

Community Usage and Testing

| Metric | Reason for Inclusion |
| --- | --- |
| Maturity (package) | The longer a package has been in existence, the more exposure it has had to community testing. |
| Maturity (version) | When measuring package maturity, it is also important to consider the version maturity. The more recent a version, the less exposure it has had to community testing. |
| Package available from CRAN or Bioconductor | CRAN and Bioconductor are the two standard public repositories for R packages. To publish a package on these repositories, a package maintainer must ensure that the package passes a series of technical checks including the ‘R CMD check’. Packages on GitHub and other popular public repositories are not required to meet these checks and so pose a greater risk. |
| Implements a standard unit-testing framework | The three standard R unit-testing frameworks are testthat, RUnit and svUnit. It is good practice to adopt one of these frameworks in order to write unit tests. A package author may implement their own framework but this increases the risk of error. |
| Code coverage | For packages that implement a test framework, it is possible to use the R package covr to calculate the proportion of the source code covered by these tests. Although there are an infinite number of ways of testing code, a 100% covr score would mean that every line of source code is executed at least once when running the tests. |
| Number of reverse dependencies | A high number of reverse dependencies increases the indirect exposure of the package and so reduces the risk of error. |
| Average downloads in past 12 months (all versions) | The more times a package has been downloaded, the more extensive the user testing and the greater the chance of someone finding a bug and logging it. |
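
To make the code coverage metric concrete, the sketch below works through the arithmetic behind a line-coverage percentage: the share of executable lines that were run at least once by the tests. It does not reproduce how covr is implemented; the function `line_coverage` and the hit counts are hypothetical and serve only to illustrate the calculation.

```python
def line_coverage(hits_per_line):
    """Return the percentage of lines executed at least once.

    `hits_per_line` maps a line identifier to the number of times the
    tests executed that line.
    """
    if not hits_per_line:
        raise ValueError("no executable lines recorded")
    covered = sum(1 for hits in hits_per_line.values() if hits > 0)
    return 100.0 * covered / len(hits_per_line)


# Hypothetical hit counts for five executable lines, for illustration only.
hits = {
    "R/mean.R:3": 12,
    "R/mean.R:4": 12,
    "R/mean.R:7": 0,
    "R/sd.R:2": 4,
    "R/sd.R:5": 0,
}
print(line_coverage(hits))  # 60.0
```

A line counts as covered whether it was executed once or many times, and whether or not a test checked its result, which is why a 100% score shows only that every line was run.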