This section shows how to extract part of a data frame, something you might do after finding outliers, or when your data is so large that you want to pull out only the part you are interested in. Often the first task in data processing is to create subsets of your data in R for further analysis. You're already familiar with the three subset operators. One of them is $: the dollar-sign operator selects a single element of your data (and drops the dimensions of the returned object). When you use this operator with a data frame, the result is always a vector; when you use it with a named list, you get that element.
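The behaviour of $ described above can be checked with a minimal sketch. It uses the built-in iris data set; the names df, sepal_len and settings are made up for this illustration and do not come from the tutorial.

```r
# iris ships with base R (datasets package), so no extra packages are needed.
df <- head(iris, 3)

# On a data frame, $ returns the column as a plain vector (dimensions dropped).
sepal_len <- df$Sepal.Length
is.numeric(sepal_len)      # a numeric vector ...
is.data.frame(sepal_len)   # ... and no longer a data frame

# On a named list, $ returns the named element itself, whatever its type.
# 'settings' is a hypothetical list used only for this example.
settings <- list(name = "iris", dims = dim(iris))
settings$name
settings$dims
```

Because $ drops the data frame structure, it is convenient for grabbing one column for a quick calculation, but not for keeping several columns together.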
In this tutorial, you will learn how to select or subset data frame columns by name and position using the functions select() and pull() from the dplyr package. We'll also show how to remove columns from a data frame. You will learn how to use the following functions:

pull: Extract column values as a vector. The column of interest can be specified either by name or by index.
select: Extract one or multiple columns as a data frame (tibble). It can also be used to remove columns from the data frame.
select_if: Select columns based on a particular condition. One can use this function to, for example, select columns if they are numeric.
Helper functions - starts_with, ends_with, contains, matches, one_of: Select columns/variables based on their names.

Select columns by names

Select columns by names: Sepal.Length and Petal.Length

mydata %>% select(Sepal.Length, Petal.Length)
## # A tibble: 150 x 2
##   Sepal.Length Petal.Length
##          <dbl>        <dbl>
## 1          5.1          1.4
## 2          4.9          1.4
## 3          4.7          1.3
## 4          4.6          1.5
## 5          5            1.4
## 6          5.4          1.7
## # ... with 144 more rows

Select all columns from Sepal.Length to Petal.Length

mydata %>% select(Sepal.Length:Petal.Length)
## # A tibble: 150 x 3
##   Sepal.Length Sepal.Width Petal.Length
##          <dbl>       <dbl>        <dbl>
## 1          5.1         3.5          1.4
## 2          4.9         3            1.4
## 3          4.7         3.2          1.3
## 4          4.6         3.1          1.5
## 5          5           3.6          1.4
## 6          5.4         3.9          1.7
## # ... with 144 more rows
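For comparison with select(), which keeps the result as a tibble, the following sketch shows pull() returning the column values as a vector, by name and by position. It assumes that mydata is the built-in iris data set converted to a tibble, which matches the printed output but is not stated in this section; the variable names on the left-hand side are illustrative only.

```r
library(dplyr)

# Assumption: mydata is iris converted to a tibble.
mydata <- as_tibble(iris)

# By name: the result is a numeric vector, not a one-column tibble.
petal_len <- mydata %>% pull(Petal.Length)

# By position: column 5 is Species; a negative index counts from the right,
# so -1 is the last column.
species  <- mydata %>% pull(5)
last_col <- mydata %>% pull(-1)

# select() on the same column keeps the tibble structure instead.
petal_tbl <- mydata %>% select(Petal.Length)
```

Use pull() when the next step expects a vector (for example mean() or table()), and select() when the result should stay a data frame.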
There are several special functions that can be used inside select: starts_with, ends_with, contains, matches, one_of, etc.

# Select columns whose names start with 'Petal'
mydata %>% select(starts_with('Petal'))
# Select columns whose names end with 'Width'
mydata %>% select(ends_with('Width'))
# Select columns whose names contain 'etal'
mydata %>% select(contains('etal'))
# Select columns whose names match a regular expression
mydata %>% select(matches('.t.'))
# Select variables provided in a character vector
mydata %>% select(one_of(c('Sepal.Length', 'Petal.Length')))

Note that, to remove a column from a data frame, prefix its name with a minus sign.

Removing Sepal.Length and Petal.Length columns:

mydata %>% select(-Sepal.Length, -Petal.Length)

Removing all columns from Sepal.Length to Petal.Length:

mydata %>% select(-(Sepal.Length:Petal.Length))
## # A tibble: 150 x 2
##   Petal.Width Species
##         <dbl> <fct>
## 1         0.2 setosa
## 2         0.2 setosa
## 3         0.2 setosa
## 4         0.2 setosa
## 5         0.2 setosa
## 6         0.4 setosa
## # ... with 144 more rows

Removing all columns whose name starts with “Petal”:

mydata %>% select(-starts_with('Petal'))
## # A tibble: 150 x 3
##   Sepal.Length Sepal.Width Species
##          <dbl>       <dbl> <fct>
## 1          5.1         3.5 setosa
## 2          4.9         3   setosa
## 3          4.7         3.2 setosa
## 4          4.6         3.1 setosa
## 5          5           3.6 setosa
## 6          5.4         3.9 setosa
## # ...
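Besides choosing columns by name, columns can also be chosen by a condition on their contents, which is what select_if() is for. The sketch below again assumes that mydata is iris converted to a tibble; numeric_cols and factor_cols are illustrative names. Recent dplyr releases mark select_if() as superseded in favour of select(where(...)), so treat this as an illustration of the idea rather than the recommended spelling for new code.

```r
library(dplyr)

# Assumption: mydata is iris converted to a tibble.
mydata <- as_tibble(iris)

# Keep only the columns for which the predicate returns TRUE.
numeric_cols <- mydata %>% select_if(is.numeric)
factor_cols  <- mydata %>% select_if(is.factor)

names(numeric_cols)  # the four measurement columns
names(factor_cols)   # only Species
```

The predicate is passed as a function (is.numeric, without parentheses) and is applied to each column in turn.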