A hodgepodge of notes from learning R, kept for my own reference and segmented by topic.

Visualisations

ggplot(data = <DATA>) + 
  <GEOM_FUNCTION>(
     mapping = aes(<MAPPINGS>),
     stat = <STAT>, 
     position = <POSITION>
  ) +
  <COORDINATE_FUNCTION> +
  <FACET_FUNCTION>
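As a concrete instance of the template, here is a minimal sketch that fills every slot. It assumes ggplot2 is installed and uses its built-in mpg data. The particular choices (a bar geom, coord_flip(), faceting by year) are only illustrative, and the stat and position shown are simply geom_bar()'s defaults written out.

```r
library(ggplot2)

# Each line corresponds to one slot of the template above.
p_template <- ggplot(data = mpg) +          # <DATA>
  geom_bar(                                 # <GEOM_FUNCTION>
    mapping = aes(x = class, fill = drv),   # <MAPPINGS>
    stat = "count",                         # <STAT> (geom_bar's default)
    position = "stack"                      # <POSITION> (geom_bar's default)
  ) +
  coord_flip() +                            # <COORDINATE_FUNCTION>
  facet_wrap(~ year)                        # <FACET_FUNCTION>

# Type p_template (or print(p_template)) to draw it.
```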
  • use facets: subplots that each display one subset of the data, split by a categorical variable
ggplot(data = mpg) + 
  geom_point(mapping = aes(x = displ, y = hwy)) + 
  facet_wrap(~ class, nrow = 2)
  • can also use facet_grid() to facet on the combination of two variables (rows ~ columns):

    ggplot(data = mpg) + 
      geom_point(mapping = aes(x = displ, y = hwy)) + 
      facet_grid(drv ~ cyl)
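facet_grid() can also facet in one direction only by putting a dot on the other side of the formula. A small sketch, assuming the same mpg data (the names base, p_cols and p_rows are just for illustration):

```r
library(ggplot2)

base <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy))

# Columns only: one panel per value of cyl, laid out left to right.
p_cols <- base + facet_grid(. ~ cyl)

# Rows only: one panel per value of drv, stacked top to bottom.
p_rows <- base + facet_grid(drv ~ .)
```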
    
  • geom_point vs geom_smooth: the geom decides which kind of object represents the data (points vs a fitted line). Every geom function takes a mapping argument. Instead of linetype below, the group aesthetic also draws one line per category, but keeps the same linetype for all of them and adds no legend

    ggplot(data = mpg) + 
      geom_smooth(mapping = aes(x = displ, y = hwy, linetype = drv))
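For comparison, a sketch of the group variant mentioned above, next to a color mapping. It assumes the same mpg data; the plot names are illustrative.

```r
library(ggplot2)

# group = drv fits a separate smooth per drive type, all drawn with the
# same line style and without a legend.
p_group <- ggplot(data = mpg) +
  geom_smooth(mapping = aes(x = displ, y = hwy, group = drv))

# Mapping color instead also splits the data by drv, and adds a legend.
p_color <- ggplot(data = mpg) +
  geom_smooth(mapping = aes(x = displ, y = hwy, color = drv))
```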
    

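  • repetition isn’t necessary: mappings (and data) given to ggplot() are global and inherited by every layer; a mapping or data argument inside a geom extends or overrides them for that layer only: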

    ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) + 
      geom_point(mapping = aes(color = class)) + 
      geom_smooth(data = filter(mpg, class == "subcompact"), se = FALSE)
    
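  • bar heights from precomputed values: geom_bar() counts rows by default, so use stat = "identity" to take the height from a y variable. Bar order follows the factor levels of x (alphabetical for a character column such as cut below), not the frequency: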

    demo <- tribble(
      ~cut,         ~freq,
      "Fair",       1610,
      "Good",       4906,
      "Very Good",  12082,
      "Premium",    13791,
      "Ideal",      21551
    )
      
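    ggplot(data = demo) +
      # stat = "identity" uses freq as the bar height instead of counting rows
      geom_bar(mapping = aes(x = cut, y = freq), stat = "identity")
      # for proportions on raw (uncounted) data, keep the default stat and use
      # aes(x = cut, y = after_stat(prop), group = 1); stat(prop) is the older spelling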
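If the aim is to control the order of the bars, one possible sketch is to turn cut into a factor with explicit levels. This assumes the tibble and ggplot2 packages and rebuilds the demo table so the block runs on its own; the names cut_levels, demo_ordered and the p_* plots are illustrative. The last plot shows the proportion variant, which needs after_stat() from ggplot2 3.3.0 or later.

```r
library(ggplot2)
library(tibble)

demo <- tribble(
  ~cut,         ~freq,
  "Fair",       1610,
  "Good",       4906,
  "Very Good",  12082,
  "Premium",    13791,
  "Ideal",      21551
)

# A character column is placed alphabetically on the axis
# (Fair, Good, Ideal, Premium, Very Good).
p_alpha <- ggplot(data = demo) +
  geom_bar(mapping = aes(x = cut, y = freq), stat = "identity")

# Turning cut into a factor with explicit levels fixes the bar order.
cut_levels <- c("Fair", "Good", "Very Good", "Premium", "Ideal")
demo_ordered <- demo
demo_ordered$cut <- factor(demo_ordered$cut, levels = cut_levels)

p_ordered <- ggplot(data = demo_ordered) +
  geom_bar(mapping = aes(x = cut, y = freq), stat = "identity")

# Proportions from raw data: default stat, after_stat(prop), and group = 1
# so that proportions are computed over the whole data set.
p_prop <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, y = after_stat(prop), group = 1))
```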
    

  • per-category summaries with stat_summary() (a point with a vertical range line, here the median with min to max) / boxplots:

    ggplot(data = diamonds) + 
      stat_summary(
        mapping = aes(x = cut, y = depth),
        fun.min = min,
        fun.max = max,
        fun = median
      )
      
    ggplot(data = mpg, mapping = aes(x = class, y = hwy)) + 
      geom_boxplot() 
      # add + coord_flip() to make it horizontal
    

  • for bars, fill colours the whole bar, while color (colour) only colours the outline; both can be mapped to a categorical variable

  • position = "identity" places each object exactly where it falls in the context of the graph. This is not very useful for bars, because they overlap: unlike the default stacking, every bar starts at zero, so its height shows the original count, but taller bars hide shorter ones. To see the overlap, make the bars slightly transparent by setting alpha to a small value, or completely transparent by setting fill = NA

  • position = "fill" works like stacking, but makes each set of stacked bars the same height. This makes it easier to compare proportions across groups

  • position = "dodge" places overlapping objects directly beside one another. This makes it easier to compare individual values
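The three position adjustments can be compared side by side with a sketch like the following. It assumes ggplot2's built-in diamonds data; the names base, p_identity, p_fill and p_dodge are illustrative.

```r
library(ggplot2)

base <- ggplot(data = diamonds, mapping = aes(x = cut, fill = clarity))

# identity: bars overlap, so make them semi-transparent to see them all.
p_identity <- base + geom_bar(alpha = 1/5, position = "identity")

# fill: stacked and rescaled so every stack has height 1 (proportions).
p_fill <- base + geom_bar(position = "fill")

# dodge: the bars for each clarity sit side by side within each cut.
p_dodge <- base + geom_bar(position = "dodge")
```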

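  • use position = "jitter" for scatterplots when points overplot each other; geom_jitter() is shorthand for geom_point(position = "jitter")
  • can use a bar chart + coord_polar() to get a polar (Coxcomb) chart; a single stacked bar with coord_polar(theta = "y") gives a pie chart
  • explore labs() for titles, subtitles, captions, tags and axis labels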
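The last three notes can be sketched together. This assumes the built-in mpg and diamonds data; the seed is set only to make the jitter reproducible, and the label text is made up for illustration.

```r
library(ggplot2)

# Jitter: add a little random noise so overlapping points become visible.
p_jitter <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point(position = position_jitter(seed = 1))

# Pie chart: a single stacked bar drawn in polar coordinates.
p_pie <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = factor(1), fill = cut), width = 1) +
  coord_polar(theta = "y")

# labs(): title, caption and axis labels (the wording here is illustrative).
p_labelled <- p_jitter +
  labs(
    title = "Engine size vs highway mileage",
    x = "Engine displacement (litres)",
    y = "Highway miles per gallon",
    caption = "Data: ggplot2::mpg"
  )
```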

For more resources:

  1. Notes taken from an online book here