- **Purpose:** Using the `cut()` function to add a categorical column to a data frame based on a numeric column in R
- **Author:** Tamim Ahsan
- **Date:** April 07, 2025

In [1]:
# Install truncnorm, which draws from a normal distribution truncated to a fixed range
install.packages("truncnorm")
library(truncnorm)

Installing package into ‘/usr/local/lib/R/site-library’
(as ‘lib’ is unspecified)



In [2]:
# Simulate 50 temperatures, truncated to [-15, 60], for five places (10 each)
set.seed(345)
temperature <- round(rtruncnorm(50, mean = 35, sd = 20, a = -15, b = 60), 2)
place <- rep(c("A", "B", "C", "D", "E"), each = 10)
df <- data.frame(place = place, temperature = temperature)
df

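A data.frame: 50 × 2 (first 10 rows shown)

| place `<chr>` | temperature `<dbl>` |
|---|---|
| A | 19.3 |
| A | 29.41 |
| A | 31.77 |
| A | 29.19 |
| A | 33.65 |
| A | 22.33 |
| A | 16.45 |
| A | 7.0 |
| A | 18.01 |
| A | 41.37 |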
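The `a` and `b` arguments of `rtruncnorm()` restrict every draw to [-15, 60]. To show what "truncated" means, here is a minimal base-R sketch of one standard technique, inverse-transform sampling: draw uniforms between the normal CDF values at the two bounds, then map them back through the normal quantile function. This is not how truncnorm is implemented internally, and `rtrunc_base` is a hypothetical helper name.

```r
# Hypothetical helper: inverse-transform sampling from a normal distribution
# truncated to [a, b], using only base R (not truncnorm's internals).
rtrunc_base <- function(n, mean = 0, sd = 1, a = -Inf, b = Inf) {
  # Map the bounds onto the untruncated normal CDF
  p_lo <- pnorm(a, mean = mean, sd = sd)
  p_hi <- pnorm(b, mean = mean, sd = sd)
  # Draw uniformly between those probabilities, then invert the CDF
  u <- runif(n, min = p_lo, max = p_hi)
  qnorm(u, mean = mean, sd = sd)
}

set.seed(345)
x <- rtrunc_base(1000, mean = 35, sd = 20, a = -15, b = 60)
range(x)  # both ends lie inside [-15, 60]
```

Because no draw can fall below -15, the "very cold" bin used later can only be filled by values in (-15, 0].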


In [3]:
# Summarize the dataframe
summary(df)

    place            temperature   
 Length:50          Min.   : 6.94  
 Class :character   1st Qu.:18.33  
 Mode  :character   Median :27.91  
                    Mean   :30.59  
                    3rd Qu.:43.19  
                    Max.   :55.92  

In [4]:
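# Add a categorical weather column by binning temperature with cut().
# Intervals are right-closed, (a, b], so a value of exactly 20 falls in "cold";
# include.lowest = TRUE also keeps a value of exactly -15 in "very cold".
df$weather <- cut(df$temperature,
                  breaks = c(-15, 0, 20, 30, 40, 60),
                  labels = c("very cold", "cold", "normal", "hot", "very hot"),
                  include.lowest = TRUE)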
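Which bin a value lands in depends on how `cut()` closes its intervals, which matters for values sitting exactly on a break. The sketch below uses the same breaks and labels on a few hand-picked edge values (chosen for illustration, not taken from the data) to compare the default `right = TRUE`, `include.lowest = TRUE`, and `right = FALSE`.

```r
breaks <- c(-15, 0, 20, 30, 40, 60)
labels <- c("very cold", "cold", "normal", "hot", "very hot")
x <- c(-15, 0, 20, 20.01, 60)  # illustrative edge values

# Default right = TRUE: intervals are (a, b], so -15 is outside and becomes NA
default_cut <- cut(x, breaks = breaks, labels = labels)
# include.lowest = TRUE closes the first interval: [-15, 0]
lowest_cut <- cut(x, breaks = breaks, labels = labels, include.lowest = TRUE)
# right = FALSE flips to [a, b): 0 moves to "cold", 20 to "normal", 60 becomes NA
left_cut <- cut(x, breaks = breaks, labels = labels, right = FALSE)

data.frame(x, default_cut, lowest_cut, left_cut)
```

With right-closed bins, "cold" means (0, 20] and "normal" means (20, 30], so a reading of exactly 20 counts as cold.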

In [5]:
# Count observations in each weather category
table(df$weather)


very cold      cold    normal       hot  very hot 
        0        14        13         7        16 
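`table()` still lists "very cold" with a count of 0 because `cut()` returns a factor that keeps every label as a level, even unused ones. The sketch below applies the same breaks to five temperatures from place A in the output above. It adds `ordered_result = TRUE` so the categories can be compared by rank, and `droplevels()` to drop empty categories.

```r
temps <- c(7.0, 19.3, 22.33, 33.65, 41.37)  # values from place A above
weather <- cut(temps,
               breaks = c(-15, 0, 20, 30, 40, 60),
               labels = c("very cold", "cold", "normal", "hot", "very hot"),
               ordered_result = TRUE)

counts  <- table(weather)              # unused "very cold" level kept with count 0
trimmed <- table(droplevels(weather))  # empty levels removed
at_least_hot <- weather >= "hot"       # ordered factors support comparisons

counts
trimmed
at_least_hot
```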

In [6]:
sessionInfo()

R version 4.4.3 (2025-02-28)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 22.04.4 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so;  LAPACK version 3.10.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: Etc/UTC
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] truncnorm_1.0-9

loaded via a namespace (and not attached):
 [1] digest_0.6.37     IRdisplay_1.1     base64enc_0.1-3   fastmap_1.2.0    
 [5] glue_1.8.0        htmltools_0.5.8