planningML

A Sample Size Calculator for Machine Learning Applications in Healthcare

Download and install planningML package within the R console
Install from CRAN:

 install.packages("planningML")

Install from Github:

 library("remotes")
 install_github("cran/planningML")

Install by package version:

 library("remotes")
 install_version("planningML", "1.0.1")

Attach the package and use:

 library("planningML")

Maintained by
Xinying Fang

All associated links for this package

10.32614/CRAN.package.planningML . planningML results . planningML.pdf . planningML User Guide . planningML_1.0.1.tar.gz . planningML_1.0.1.zip . planningML_1.0.1.tgz . planningML archive . https://CRAN.R-project.org/package=planningML

First Published: 2022-11-08

Latest Update: 2023-06-23

Description:

Advances in automated document classification have led to the identification of massive numbers of clinical concepts from handwritten clinical notes. These high-dimensional clinical concepts can serve as highly informative predictors in building classification algorithms for identifying patients with different clinical conditions, commonly referred to as patient phenotyping. However, from a planning perspective, it is critical to ensure that enough data are available for the phenotyping algorithm to obtain a desired classification performance. This challenge in sample size planning is further exacerbated by the high dimension of the feature space and the inherent imbalance of the response class. Currently available sample size planning methods can be categorized into: (i) model-based approaches that predict the sample size required for achieving a desired accuracy using a linear machine learning classifier and (ii) learning curve-based approaches (Figueroa et al. (2012)) that fit an inverse power law curve to pilot data to extrapolate performance. We develop model-based approaches for imbalanced data with correlated features, deriving sample size formulas for performance metrics that are sensitive to class imbalance, such as the Area Under the receiver operating characteristic Curve (AUC) and the Matthews Correlation Coefficient (MCC). This is done using a two-step approach in which we first perform feature selection using the innovated Higher Criticism thresholding method (Hall and Jin (2010)), then determine the sample size by optimizing the two performance metrics. Further, we develop software in the form of an R package named 'planningML' and an 'R' 'Shiny' app to facilitate the convenient implementation of the developed model-based approaches and learning curve approaches for imbalanced data. We apply our methods to the problem of phenotyping rare outcomes using the MIMIC-III electronic health record database.
We show that our developed methods, which relate training data size to performance on AUC and MCC, can predict the true or observed performance of linear ML classifiers such as LASSO and SVM at different training data sizes. Therefore, in high-dimensional classification analysis with imbalanced data and correlated features, our approach can efficiently and accurately determine the sample size needed for machine learning-based classification.
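
The learning curve-based approach mentioned in the description (fitting an inverse power law curve to pilot data and extrapolating) can be illustrated with a minimal base R sketch. This is not the package's own implementation and does not call any planningML function: the pilot values are simulated, and the curve form auc = a - b * n^(-c), the variable names, and the starting values are assumptions chosen for illustration.

```r
# Minimal sketch of a learning-curve extrapolation using base R only.
# Assumption: performance follows an inverse power law, auc = a - b * n^(-c),
# where a is the asymptotic performance. The data below are simulated,
# not taken from planningML or MIMIC-III.
set.seed(1)
pilot <- data.frame(n = c(50, 100, 200, 400, 800, 1600))
pilot$auc <- 0.90 - 1.5 * pilot$n^(-0.5) + rnorm(nrow(pilot), sd = 0.005)

# Fit the three curve parameters by nonlinear least squares.
fit <- nls(
  auc ~ a - b * n^(-c),
  data = pilot,
  start = list(a = 0.9, b = 1, c = 0.5),
  control = nls.control(maxiter = 200)
)

# Extrapolate the fitted curve to a larger, not yet collected, sample size.
predicted_auc <- predict(fit, newdata = data.frame(n = 5000))

print(coef(fit))
print(predicted_auc)
```

In practice the pilot points would come from training a classifier on subsets of real data at increasing sizes. The fitted curve can then be scanned for the smallest sample size whose predicted performance reaches the target.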

How to cite:

Xinying Fang (2022). planningML: A Sample Size Calculator for Machine Learning Applications in Healthcare. R package version 1.0.1, https://cran.r-project.org/web/packages/planningML. Accessed 07 May. 2025.

Previous versions and publish date:

1.0.0 (2022-11-08 11:20)

Functions, R codes and Examples using the planningML R package

Some associated functions: featureselection . fit_learningcurve . learningcurve_data . plot.planningML . samplesize . summary.planningML
Some associated R codes: calculate_PCC_by_DS_Updated.R . featureselection.R . fit_learningcurve.R . learningcurve_data.R . plot.planningML.R . samplesize.R . summary.planningML.R
