Session 4: Manipulating Data
Session Description
In this session, we’ll learn more advanced tools for data manipulation. We will discuss the principles of tidy data and explore the dplyr package for more efficient data manipulation and summarization.
Before Class
Data Carpentry (Manipulating Data Frames)
Please download the Session 4 workbook.
Optional: Check out Allison Horst’s interactive dplyr tutorial
Slides
Other Resources
In this lab, you are introduced to the dplyr package, which is designed to help us manipulate rectangular data frames. dplyr is a useful replacement for many of the base R commands for querying our data.
- The use of pipes (%>%) greatly helps with the legibility of our commands and allows us to see each of the manipulation steps we are taking our data through before producing a result.
- Commands like filter() and select() allow us to query out rows and columns of our dataset, respectively.
- The summarise() command allows us to produce summary tables of our dataset, which are very useful for exploring patterns and producing information on small multiples of data. We frequently combine group_by() and summarise() in order to produce summaries of subsets of our data.
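The points above can be seen together in a short sketch. It assumes the dplyr package is installed and uses R's built-in mtcars dataset as a stand-in, since the workbook data is not shown here; the object names (cars, efficient, efficient_base, by_cyl) and the 25 mpg cutoff are illustrative choices, not part of the lab.

```r
library(dplyr)

# mtcars ships with R; it stands in for the workbook data (an assumption).
cars <- mtcars
cars$model <- rownames(mtcars)  # keep the car names as an ordinary column

# filter() keeps rows, select() keeps columns; the pipe chains the steps
# so each manipulation reads top to bottom.
efficient <- cars %>%
  filter(mpg > 25) %>%
  select(model, mpg, cyl)

# The same query in base R, for comparison with the piped version.
efficient_base <- cars[cars$mpg > 25, c("model", "mpg", "cyl")]

# group_by() + summarise() returns one summary row per group.
by_cyl <- cars %>%
  group_by(cyl) %>%
  summarise(n_cars = n(), mean_mpg = mean(mpg))

print(efficient)
print(by_cyl)
```

Reading the pipeline aloud as "take cars, then filter, then select" is a handy check that each step does what you intend. The base R version gives the same rows and columns, but packs the row and column logic into a single bracket expression.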