This is another classification example. We have to classify breast tumors as malignant or benign.

The dataset is available on the UCI Machine Learning Repository website as well as on Kaggle.

We have taken ideas from several blog posts listed below in the references section.

Import the data

library(tidyverse)
df <- read_csv("../bookdown-MachineLearningwithR/dataset/BreastCancer.csv")

# This is definitely one of the most important steps:
# check that each variable has an appropriate class.
glimpse(df)
## Observations: 569
## Variables: 32
## $ id                      <int> 842302, 842517, 84300903, 84348301, 84...
## $ diagnosis               <chr> "M", "M", "M", "M", "M", "M", "M", "M"...
## $ radius_mean             <dbl> 17.990, 20.570, 19.690, 11.420, 20.290...
## $ texture_mean            <dbl> 10.38, 17.77, 21.25, 20.38, 14.34, 15....
## $ perimeter_mean          <dbl> 122.80, 132.90, 130.00, 77.58, 135.10,...
## $ area_mean               <dbl> 1001.0, 1326.0, 1203.0, 386.1, 1297.0,...
## $ smoothness_mean         <dbl> 0.11840, 0.08474, 0.10960, 0.14250, 0....
## $ compactness_mean        <dbl> 0.27760, 0.07864, 0.15990, 0.28390, 0....
## $ concavity_mean          <dbl> 0.30010, 0.08690, 0.19740, 0.24140, 0....
## $ concave points_mean     <dbl> 0.14710, 0.07017, 0.12790, 0.10520, 0....
## $ symmetry_mean           <dbl> 0.2419, 0.1812, 0.2069, 0.2597, 0.1809...
## $ fractal_dimension_mean  <dbl> 0.07871, 0.05667, 0.05999, 0.09744, 0....
## $ radius_se               <dbl> 1.0950, 0.5435, 0.7456, 0.4956, 0.7572...
## $ texture_se              <dbl> 0.9053, 0.7339, 0.7869, 1.1560, 0.7813...
## $ perimeter_se            <dbl> 8.589, 3.398, 4.585, 3.445, 5.438, 2.2...
## $ area_se                 <dbl> 153.40, 74.08, 94.03, 27.23, 94.44, 27...
## $ smoothness_se           <dbl> 0.006399, 0.005225, 0.006150, 0.009110...
## $ compactness_se          <dbl> 0.049040, 0.013080, 0.040060, 0.074580...
## $ concavity_se            <dbl> 0.05373, 0.01860, 0.03832, 0.05661, 0....
## $ concave points_se       <dbl> 0.015870, 0.013400, 0.020580, 0.018670...
## $ symmetry_se             <dbl> 0.03003, 0.01389, 0.02250, 0.05963, 0....
## $ fractal_dimension_se    <dbl> 0.006193, 0.003532, 0.004571, 0.009208...
## $ radius_worst            <dbl> 25.38, 24.99, 23.57, 14.91, 22.54, 15....
## $ texture_worst           <dbl> 17.33, 23.41, 25.53, 26.50, 16.67, 23....
## $ perimeter_worst         <dbl> 184.60, 158.80, 152.50, 98.87, 152.20,...
## $ area_worst              <dbl> 2019.0, 1956.0, 1709.0, 567.7, 1575.0,...
## $ smoothness_worst        <dbl> 0.1622, 0.1238, 0.1444, 0.2098, 0.1374...
## $ compactness_worst       <dbl> 0.6656, 0.1866, 0.4245, 0.8663, 0.2050...
## $ concavity_worst         <dbl> 0.71190, 0.24160, 0.45040, 0.68690, 0....
## $ concave points_worst    <dbl> 0.26540, 0.18600, 0.24300, 0.25750, 0....
## $ symmetry_worst          <dbl> 0.4601, 0.2750, 0.3613, 0.6638, 0.2364...
## $ fractal_dimension_worst <dbl> 0.11890, 0.08902, 0.08758, 0.17300, 0....

So we have 569 observations of 32 variables. Ideally, with this many variables, it would be appropriate to have a few more observations.

Tidy the data

Basic changes: convert the outcome variable to a factor and rename the badly encoded variables.

df$diagnosis <- as.factor(df$diagnosis)

df <- df %>% rename(concave_points_mean = `concave points_mean`, 
                    concave_points_se = `concave points_se`, 
                    concave_points_worst = `concave points_worst`)

As you might have noticed, in this case and the previous one we had very little to do here. This is not usually the case, but we’ll take it!

Understand the data

This is the circular phase of working with data, where the transforming, visualizing and modeling stages reinforce each other to create a better understanding.

Check for missing values

map_int(df, function(.x) sum(is.na(.x)))
##                      id               diagnosis             radius_mean 
##                       0                       0                       0 
##            texture_mean          perimeter_mean               area_mean 
##                       0                       0                       0 
##         smoothness_mean        compactness_mean          concavity_mean 
##                       0                       0                       0 
##     concave_points_mean           symmetry_mean  fractal_dimension_mean 
##                       0                       0                       0 
##               radius_se              texture_se            perimeter_se 
##                       0                       0                       0 
##                 area_se           smoothness_se          compactness_se 
##                       0                       0                       0 
##            concavity_se       concave_points_se             symmetry_se 
##                       0                       0                       0 
##    fractal_dimension_se            radius_worst           texture_worst 
##                       0                       0                       0 
##         perimeter_worst              area_worst        smoothness_worst 
##                       0                       0                       0 
##       compactness_worst         concavity_worst    concave_points_worst 
##                       0                       0                       0 
##          symmetry_worst fractal_dimension_worst 
##                       0                       0

Good news: there are no missing values.

If there were many missing values, we would go back to the data-transformation step for some appropriate imputation.
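
For illustration only, here is a minimal sketch of what a simple imputation could look like (not needed here since there are no NAs): replace the missing values of each numeric column by that column’s median.

df_imputed <- df %>% 
  mutate_if(is.numeric, ~ ifelse(is.na(.x), median(.x, na.rm = TRUE), .x))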

Let’s check how balanced our response variable is:

round(prop.table(table(df$diagnosis)), 2)
## 
##    B    M 
## 0.63 0.37

The response variable is slightly unbalanced.
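
If the imbalance were more severe, one option would be to re-balance the training data. A minimal sketch using the upSample() function from caret (the package is loaded further below), shown purely for illustration:

balanced_df <- caret::upSample(x = df %>% select(-diagnosis), 
                               y = df$diagnosis, yname = "diagnosis")
table(balanced_df$diagnosis)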

Let’s look at correlations among the variables. Most ML algorithms assume that the predictor variables are independent of each other, and for an analysis to be robust it is good to remove multicollinearity (i.e. remove highly correlated predictors).

df_corr <- cor(df %>% select(-id, -diagnosis))
corrplot::corrplot(df_corr, order = "hclust", tl.cex = 1, addrect = 8)

Indeed, quite a few variables are correlated. In the next step, we will remove the highly correlated ones using the caret package.

Transform the data

library(caret)
# The findCorrelation() function from the caret package removes highly correlated
# predictors, here those whose correlation is above 0.9. It uses a heuristic to
# decide which variable of each correlated pair to drop instead of selecting blindly.
df2 <- df %>% select(-findCorrelation(df_corr, cutoff = 0.9))

# Number of columns for our new data frame
ncol(df2)
## [1] 22

So our new data frame df2 is 10 variables shorter.
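
To see which variables were dropped, we can simply compare the column names of the two data frames:

setdiff(names(df), names(df2))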

Pre-process the data

Using PCA

Let’s first run an unsupervised analysis with PCA. To do so, we remove the id and diagnosis variables, then scale and center the remaining variables.

preproc_pca_df <- prcomp(df %>% select(-id, -diagnosis), scale = TRUE, center = TRUE)
summary(preproc_pca_df)
## Importance of components:
##                           PC1    PC2     PC3     PC4     PC5     PC6
## Standard deviation     3.6444 2.3857 1.67867 1.40735 1.28403 1.09880
## Proportion of Variance 0.4427 0.1897 0.09393 0.06602 0.05496 0.04025
## Cumulative Proportion  0.4427 0.6324 0.72636 0.79239 0.84734 0.88759
##                            PC7     PC8    PC9    PC10   PC11    PC12
## Standard deviation     0.82172 0.69037 0.6457 0.59219 0.5421 0.51104
## Proportion of Variance 0.02251 0.01589 0.0139 0.01169 0.0098 0.00871
## Cumulative Proportion  0.91010 0.92598 0.9399 0.95157 0.9614 0.97007
##                           PC13    PC14    PC15    PC16    PC17    PC18
## Standard deviation     0.49128 0.39624 0.30681 0.28260 0.24372 0.22939
## Proportion of Variance 0.00805 0.00523 0.00314 0.00266 0.00198 0.00175
## Cumulative Proportion  0.97812 0.98335 0.98649 0.98915 0.99113 0.99288
##                           PC19    PC20   PC21    PC22    PC23   PC24
## Standard deviation     0.22244 0.17652 0.1731 0.16565 0.15602 0.1344
## Proportion of Variance 0.00165 0.00104 0.0010 0.00091 0.00081 0.0006
## Cumulative Proportion  0.99453 0.99557 0.9966 0.99749 0.99830 0.9989
##                           PC25    PC26    PC27    PC28    PC29    PC30
## Standard deviation     0.12442 0.09043 0.08307 0.03987 0.02736 0.01153
## Proportion of Variance 0.00052 0.00027 0.00023 0.00005 0.00002 0.00000
## Cumulative Proportion  0.99942 0.99969 0.99992 0.99997 1.00000 1.00000
# Calculate the proportion of variance explained
pca_df_var <- preproc_pca_df$sdev^2
pve_df <- pca_df_var / sum(pca_df_var)
cum_pve <- cumsum(pve_df)
pve_table <- tibble(comp = seq(1:ncol(df %>% select(-id, -diagnosis))), pve_df, cum_pve)

ggplot(pve_table, aes(x = comp, y = cum_pve)) + 
  geom_point() + 
  geom_abline(intercept = 0.95, color = "red", slope = 0)

With the original dataset, 95% of the variance is explained by 10 PCs.
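
We can confirm that number directly from the cumulative proportions computed above:

which(cum_pve >= 0.95)[1]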

Let’s do the same exercise with our second df, the one where we removed the highly correlated predictors.

preproc_pca_df2 <- prcomp(df2, scale = TRUE, center = TRUE)
summary(preproc_pca_df2)
## Importance of components:
##                           PC1    PC2     PC3     PC4     PC5     PC6
## Standard deviation     3.2051 2.1175 1.46634 1.09037 0.95215 0.90087
## Proportion of Variance 0.4669 0.2038 0.09773 0.05404 0.04121 0.03689
## Cumulative Proportion  0.4669 0.6707 0.76847 0.82251 0.86372 0.90061
##                            PC7     PC8    PC9    PC10    PC11    PC12
## Standard deviation     0.77121 0.56374 0.5530 0.51130 0.45605 0.36602
## Proportion of Variance 0.02703 0.01445 0.0139 0.01188 0.00945 0.00609
## Cumulative Proportion  0.92764 0.94209 0.9560 0.96787 0.97732 0.98341
##                           PC13    PC14   PC15   PC16    PC17   PC18   PC19
## Standard deviation     0.31602 0.28856 0.2152 0.2098 0.16346 0.1558 0.1486
## Proportion of Variance 0.00454 0.00378 0.0021 0.0020 0.00121 0.0011 0.0010
## Cumulative Proportion  0.98795 0.99174 0.9938 0.9958 0.99706 0.9982 0.9992
##                           PC20    PC21    PC22
## Standard deviation     0.09768 0.08667 0.03692
## Proportion of Variance 0.00043 0.00034 0.00006
## Cumulative Proportion  0.99960 0.99994 1.00000
pca_df2_var <- preproc_pca_df2$sdev^2

# proportion of variance explained
pve_df2 <- pca_df2_var / sum(pca_df2_var)
cum_pve_df2 <- cumsum(pve_df2)
pve_table_df2 <- tibble(comp = seq(1:ncol(df2)), pve_df2, cum_pve_df2)

ggplot(pve_table_df2, aes(x = comp, y = cum_pve_df2)) + 
  geom_point() + 
  geom_abline(intercept = 0.95, color = "red", slope = 0)

In this case, around 8 PCs explain 95% of the variance.
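
Again, this can be checked from the cumulative proportions:

which(cum_pve_df2 >= 0.95)[1]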

To visualize which variables are the most influential on the first 2 components:

library(ggfortify)
autoplot(preproc_pca_df2, data = df,  colour = 'diagnosis',
                    loadings = FALSE, loadings.label = TRUE, loadings.colour = "blue")

Let’s visualize the first 3 components.

df_pcs <- cbind(as_tibble(df$diagnosis), as_tibble(preproc_pca_df2$x))
GGally::ggpairs(df_pcs, columns = 2:4, ggplot2::aes(color = value))

As can be seen from the plots above, the first 3 principal components separate the two classes only to some extent; this is expected, since the variance explained by these components is not large.

Using LDA

The advantage of using LDA is that it takes the class labels into consideration.

preproc_lda_df <- MASS::lda(diagnosis ~., data = df, center = TRUE, scale = TRUE)
preproc_lda_df
## Call:
## lda(diagnosis ~ ., data = df, center = TRUE, scale = TRUE)
## 
## Prior probabilities of groups:
##         B         M 
## 0.6274165 0.3725835 
## 
## Group means:
##         id radius_mean texture_mean perimeter_mean area_mean
## B 26543825    12.14652     17.91476       78.07541  462.7902
## M 36818050    17.46283     21.60491      115.36538  978.3764
##   smoothness_mean compactness_mean concavity_mean concave_points_mean
## B      0.09247765       0.08008462     0.04605762          0.02571741
## M      0.10289849       0.14518778     0.16077472          0.08799000
##   symmetry_mean fractal_dimension_mean radius_se texture_se perimeter_se
## B      0.174186             0.06286739 0.2840824   1.220380     2.000321
## M      0.192909             0.06268009 0.6090825   1.210915     4.323929
##    area_se smoothness_se compactness_se concavity_se concave_points_se
## B 21.13515   0.007195902     0.02143825   0.02599674       0.009857653
## M 72.67241   0.006780094     0.03228117   0.04182401       0.015060472
##   symmetry_se fractal_dimension_se radius_worst texture_worst
## B  0.02058381          0.003636051     13.37980      23.51507
## M  0.02047240          0.004062406     21.13481      29.31821
##   perimeter_worst area_worst smoothness_worst compactness_worst
## B        87.00594   558.8994        0.1249595         0.1826725
## M       141.37033  1422.2863        0.1448452         0.3748241
##   concavity_worst concave_points_worst symmetry_worst
## B       0.1662377           0.07444434      0.2702459
## M       0.4506056           0.18223731      0.3234679
##   fractal_dimension_worst
## B              0.07944207
## M              0.09152995
## 
## Coefficients of linear discriminants:
##                                   LD1
## id                      -2.512117e-10
## radius_mean             -1.080876e+00
## texture_mean             2.338408e-02
## perimeter_mean           1.172707e-01
## area_mean                1.595690e-03
## smoothness_mean          5.251575e-01
## compactness_mean        -2.094197e+01
## concavity_mean           6.955923e+00
## concave_points_mean      1.047567e+01
## symmetry_mean            4.938898e-01
## fractal_dimension_mean  -5.937663e-02
## radius_se                2.101503e+00
## texture_se              -3.979869e-02
## perimeter_se            -1.121814e-01
## area_se                 -4.083504e-03
## smoothness_se            7.987663e+01
## compactness_se           1.387026e-01
## concavity_se            -1.768261e+01
## concave_points_se        5.350520e+01
## symmetry_se              8.143611e+00
## fractal_dimension_se    -3.431356e+01
## radius_worst             9.677207e-01
## texture_worst            3.540591e-02
## perimeter_worst         -1.204507e-02
## area_worst              -5.012127e-03
## smoothness_worst         2.612258e+00
## compactness_worst        3.636892e-01
## concavity_worst          1.880699e+00
## concave_points_worst     2.218189e+00
## symmetry_worst           2.783102e+00
## fractal_dimension_worst  2.117830e+01
# Making a data frame out of the LDA output for visualization purposes.
predict_lda_df <- predict(preproc_lda_df, df)$x %>% 
  as_data_frame() %>% 
  cbind(diagnosis = df$diagnosis)
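
One quick way to use this data frame is to look at how well the single linear discriminant LD1 separates the two classes; a possible sketch:

ggplot(predict_lda_df, aes(x = LD1, fill = diagnosis)) + 
  geom_density(alpha = 0.5)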

Model the data

Let’s first create a training and testing set using caret.

set.seed(1815)
df3 <- cbind(diagnosis = df$diagnosis, df2)
df_sampling_index <- createDataPartition(df3$diagnosis, times = 1, p = 0.8, list = FALSE)
df_training <- df3[df_sampling_index, ]
df_testing <-  df3[-df_sampling_index, ]
df_control <- trainControl(method = "cv",
                           number = 15,
                           classProbs = TRUE,
                           summaryFunction = twoClassSummary)

Logistic regression

model_logreg_df <- train(diagnosis ~., data = df_training, method = "glm", 
                         metric = "ROC", preProcess = c("scale", "center"), 
                         trControl = df_control)

prediction_logreg_df <- predict(model_logreg_df, df_testing)
cm_logreg_df <- confusionMatrix(prediction_logreg_df, df_testing$diagnosis, positive = "M")
cm_logreg_df
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  B  M
##          B 71  2
##          M  0 40
##                                           
##                Accuracy : 0.9823          
##                  95% CI : (0.9375, 0.9978)
##     No Information Rate : 0.6283          
##     P-Value [Acc > NIR] : <2e-16          
##                                           
##                   Kappa : 0.9617          
##  Mcnemar's Test P-Value : 0.4795          
##                                           
##             Sensitivity : 0.9524          
##             Specificity : 1.0000          
##          Pos Pred Value : 1.0000          
##          Neg Pred Value : 0.9726          
##              Prevalence : 0.3717          
##          Detection Rate : 0.3540          
##    Detection Prevalence : 0.3540          
##       Balanced Accuracy : 0.9762          
##                                           
##        'Positive' Class : M               
## 

Random Forest

model_rf_df <- train(diagnosis ~., data = df_training,
                     method = "rf", 
                     metric = 'ROC', 
                     trControl = df_control)

prediction_rf_df <- predict(model_rf_df, df_testing)
cm_rf_df <- confusionMatrix(prediction_rf_df, df_testing$diagnosis, positive = "M")
cm_rf_df
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  B  M
##          B 70  2
##          M  1 40
##                                           
##                Accuracy : 0.9735          
##                  95% CI : (0.9244, 0.9945)
##     No Information Rate : 0.6283          
##     P-Value [Acc > NIR] : <2e-16          
##                                           
##                   Kappa : 0.9429          
##  Mcnemar's Test P-Value : 1               
##                                           
##             Sensitivity : 0.9524          
##             Specificity : 0.9859          
##          Pos Pred Value : 0.9756          
##          Neg Pred Value : 0.9722          
##              Prevalence : 0.3717          
##          Detection Rate : 0.3540          
##    Detection Prevalence : 0.3628          
##       Balanced Accuracy : 0.9691          
##                                           
##        'Positive' Class : M               
## 
plot(model_rf_df)

plot(model_rf_df$finalModel)

varImpPlot(model_rf_df$finalModel)

model_rf2_df <- train(diagnosis ~., data = df_training, 
                      method = "ranger", 
                      metric = "ROC", 
                      preProcess = c("scale", "center"), 
                      trControl = df_control)
prediction_rf2_df <- predict(model_rf2_df, df_testing)
confusionMatrix(prediction_rf2_df, df_testing$diagnosis, positive = "M")
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  B  M
##          B 70  3
##          M  1 39
##                                           
##                Accuracy : 0.9646          
##                  95% CI : (0.9118, 0.9903)
##     No Information Rate : 0.6283          
##     P-Value [Acc > NIR] : <2e-16          
##                                           
##                   Kappa : 0.9235          
##  Mcnemar's Test P-Value : 0.6171          
##                                           
##             Sensitivity : 0.9286          
##             Specificity : 0.9859          
##          Pos Pred Value : 0.9750          
##          Neg Pred Value : 0.9589          
##              Prevalence : 0.3717          
##          Detection Rate : 0.3451          
##    Detection Prevalence : 0.3540          
##       Balanced Accuracy : 0.9572          
##                                           
##        'Positive' Class : M               
## 

KNN

model_knn_df <- train(diagnosis ~., data = df_training, 
                      method = "knn", 
                      metric = "ROC", 
                      preProcess = c("scale", "center"), 
                      trControl = df_control, 
                      tuneLength = 31)

plot(model_knn_df)

prediction_knn_df <- predict(model_knn_df, df_testing)
cm_knn_df <- confusionMatrix(prediction_knn_df, df_testing$diagnosis, positive = "M")
cm_knn_df
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  B  M
##          B 71  5
##          M  0 37
##                                           
##                Accuracy : 0.9558          
##                  95% CI : (0.8998, 0.9855)
##     No Information Rate : 0.6283          
##     P-Value [Acc > NIR] : < 2e-16         
##                                           
##                   Kappa : 0.9029          
##  Mcnemar's Test P-Value : 0.07364         
##                                           
##             Sensitivity : 0.8810          
##             Specificity : 1.0000          
##          Pos Pred Value : 1.0000          
##          Neg Pred Value : 0.9342          
##              Prevalence : 0.3717          
##          Detection Rate : 0.3274          
##    Detection Prevalence : 0.3274          
##       Balanced Accuracy : 0.9405          
##                                           
##        'Positive' Class : M               
## 

Support Vector Machine

model_svm_df <- train(diagnosis ~., data = df_training, method = "svmRadial", 
                      metric = "ROC", 
                      preProcess = c("scale", "center"), 
                      trace = FALSE, 
                      trControl = df_control)

prediction_svm_df <- predict(model_svm_df, df_testing)
cm_svm_df <- confusionMatrix(prediction_svm_df, df_testing$diagnosis, positive = "M")
cm_svm_df
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  B  M
##          B 71  1
##          M  0 41
##                                           
##                Accuracy : 0.9912          
##                  95% CI : (0.9517, 0.9998)
##     No Information Rate : 0.6283          
##     P-Value [Acc > NIR] : <2e-16          
##                                           
##                   Kappa : 0.981           
##  Mcnemar's Test P-Value : 1               
##                                           
##             Sensitivity : 0.9762          
##             Specificity : 1.0000          
##          Pos Pred Value : 1.0000          
##          Neg Pred Value : 0.9861          
##              Prevalence : 0.3717          
##          Detection Rate : 0.3628          
##    Detection Prevalence : 0.3628          
##       Balanced Accuracy : 0.9881          
##                                           
##        'Positive' Class : M               
## 

Neural Network with LDA

To use the LDA pre-processing step, we also need to create the same training and testing split from the LDA-transformed data.

lda_training <- predict_lda_df[df_sampling_index, ]
lda_testing <- predict_lda_df[-df_sampling_index, ]
model_nnetlda_df <- train(diagnosis ~., lda_training, 
                          method = "nnet", 
                          metric = "ROC", 
                          preProcess = c("center", "scale"), 
                          tuneLength = 10, 
                          trace = FALSE, 
                          trControl = df_control)
## Loading required package: nnet
prediction_nnetlda_df <- predict(model_nnetlda_df, lda_testing)
cm_nnetlda_df <- confusionMatrix(prediction_nnetlda_df, lda_testing$diagnosis, positive = "M")

Model evaluation

model_list <- list(rf = model_rf_df, svm = model_svm_df, 
                   logistic = model_logreg_df, Neural_with_LDA = model_nnetlda_df)
results <- resamples(model_list)

summary(results)
## 
## Call:
## summary.resamples(object = results)
## 
## Models: rf, svm, logistic, Neural_with_LDA 
## Number of resamples: 15 
## 
## ROC 
##                      Min.   1st Qu.    Median      Mean 3rd Qu. Max. NA's
## rf              0.9425837 0.9828947 0.9952153 0.9887879       1    1    0
## svm             0.9617225 0.9952153 1.0000000 0.9929825       1    1    0
## logistic        0.8640351 0.9150718 1.0000000 0.9616959       1    1    0
## Neural_with_LDA 0.9617225 0.9952153 1.0000000 0.9949548       1    1    0
## 
## Sens 
##                      Min.   1st Qu.    Median      Mean 3rd Qu. Max. NA's
## rf              0.8421053 0.9473684 1.0000000 0.9684211       1    1    0
## svm             0.8421053 0.9736842 1.0000000 0.9719298       1    1    0
## logistic        0.8421053 0.9236842 0.9473684 0.9477193       1    1    0
## Neural_with_LDA 0.9473684 1.0000000 1.0000000 0.9896491       1    1    0
## 
## Spec 
##                      Min.   1st Qu.    Median      Mean 3rd Qu. Max. NA's
## rf              0.8181818 0.8712121 0.9166667 0.9181818       1    1    0
## svm             0.8333333 0.9128788 1.0000000 0.9651515       1    1    0
## logistic        0.8181818 0.9090909 1.0000000 0.9530303       1    1    0
## Neural_with_LDA 0.9090909 0.9090909 1.0000000 0.9585859       1    1    0
bwplot(results, metric = "ROC")

#dotplot(results)

The logistic model has too much variability for it to be reliable. The SVM and the neural network with LDA pre-processing give the best results. The ROC metric measures the AUC of the ROC curve of each model; this metric is independent of any threshold. Let’s remember how these models performed on the testing dataset. Predicted classes are obtained by default with a threshold of 0.5, which might not be the best with an unbalanced dataset like this one.

cm_list <- list(cm_rf = cm_rf_df, cm_svm = cm_svm_df, 
                   cm_logistic = cm_logreg_df, cm_nnet_LDA = cm_nnetlda_df)
results <- map_df(cm_list, function(x) x$byClass) %>% as_tibble() %>% 
  mutate(stat = names(cm_rf_df$byClass))

The best result for sensitivity (detection of malignant cases) is the neural network with LDA pre-processing, which also has a great F1 score.
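
As mentioned above, these predicted classes all use the default 0.5 probability threshold. A minimal sketch of how a different cutoff could be applied, using the SVM model as an example (the 0.4 value is arbitrary, purely for illustration):

svm_probs <- predict(model_svm_df, df_testing, type = "prob")
prediction_svm_04 <- factor(ifelse(svm_probs$M >= 0.4, "M", "B"), 
                            levels = levels(df_testing$diagnosis))
confusionMatrix(prediction_svm_04, df_testing$diagnosis, positive = "M")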

References

A useful and popular kernel on this dataset on Kaggle.
Another one, also on Kaggle.