data-wrangling

Here are 654 public repositories matching this topic...

thadguidry commented May 28, 2022

It is a bit confusing when someone imports a file or pastes from clipboard and wants to enable column headers.
The label we chose, "Parse next", does not quite convey what we meant, which is closer to "Parse first".
We have several translation files that likely need to be updated.
For example, the EN translation:
https://github.com/OpenRefine/OpenRefine/blob/3cbb5b143df3caa6e50d5493a34cee3f12cbcf77/extensions/dat

UI good first issue import
dakirsc commented Jun 7, 2022

My comment is specifically about the example code showing how to add points to a ggplot boxplot. This may have been fixed in an update to R or the ggplot2 package, but historically, using geom_jitter and geom_boxplot together has caused outlier points to be duplicated.

I realize the code in the episode takes care of

help wanted good first issue
davis68 commented Sep 17, 2020
  • I felt like nunique was arbitrarily (re)introduced when it was necessary. It wouldn't be top-of-mind for students solving problems.
  • The lesson answers need to be adjacent to the exercises.
  • I like the pre-introduction of masks and then circling back around to explain them.
  • I feel like Part 4 needs to be broken up and integrated across other lessons: it felt thin on its own.
  • Horizo
good first issue
umnik20 commented May 4, 2020

Dear Community,

There is a typo in the section titled "The StringsAsFactors argument", after the second block of code that demonstrates the use of the str() function. Right after the code boxes, the text reads "We can see that the $Color and $State columns are factors and $Speed is a numeric column", but the box shows that the $Color column is a vector of strings.

Regards,

Rodolfo

good first issue
dsmanufacturing commented Apr 15, 2022

In the second episode, I believe it would be more useful to have the sections "Python is case-sensitive" and "Use meaningful variable names" right after "Use variables to store values". That would keep all the information and instructions for using and allocating variables in one place before going into printing and slicing.
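
A minimal sketch of how the three topics named in this comment fit together in one place. The variable names below are hypothetical, chosen for illustration, and are not taken from the lesson itself:

```python
# Hypothetical names for illustration only; not from the lesson.

# Use variables to store values.
age = 42
first_name = "Ahmed"  # a meaningful name says what the value represents

# Python is case-sensitive: `Age` is a different name from `age`,
# and because it was never assigned, referring to it raises NameError.
try:
    Age
    case_sensitive = False
except NameError:
    case_sensitive = True

print(first_name, "is", age, "years old")
print("case-sensitive:", case_sensitive)
```

Showing assignment, naming, and case-sensitivity back to back is one way to read the suggestion; the lesson's actual examples may differ.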

help wanted type:enhancement good first issue
lachlandeer commented Jul 30, 2018

In episode _episodes_rmd/12-time-series-raster.Rmd

There is a big chunk of code that can probably be made to look nicer via dplyr:

# Plot RGB data for Julian day 133
 RGB_133 <- stack("data/NEON-DS-Landsat-NDVI/HARV/2011/RGB/133_HARV_landRGB.tif")
 RGB_133_df <- raster::as.data.frame(RGB_133, xy = TRUE)
 quantiles = c(0.02, 0.98)
 r <- quantile(RGB_133_df$X133_HARV_landRGB.1, q
good first issue
jjmedinaariza commented Feb 17, 2022

The Setup section (https://datacarpentry.org/r-socialsci/setup.html) probably needs to be updated to reference Rtools in the Windows instructions. As noted in the CRAN repository, "Starting with R 4.0.0 (released April 2020), R for Windows uses a toolchain bundle called rtools4." As I discovered through a recent install on a new machine, not having gone through this process gave me problems

help wanted good first issue
