

dplyr
Original author(s): Hadley Wickham, Romain François, Lionel Henry, Kirill Müller, Davis Vaughan
Initial release: January 7, 2014
Stable release: 1.1.0 / January 29, 2023
Written in: R
License: MIT License
Website: dplyr.tidyverse.org

dplyr is one of the core packages of the tidyverse in the R programming language. It is primarily a set of functions designed to make dataframe manipulation intuitive and user-friendly. Data analysts typically use dplyr to transform existing datasets into a format better suited to a particular type of analysis or data visualization.[1][2]

For instance, someone seeking to analyze an enormous dataset may wish to only view a smaller subset of the data. Alternatively, a user may wish to rearrange the data in order to see the rows ranked by some numerical value, or even based on a combination of values from the original dataset.

dplyr was launched in 2014.[3] On the dplyr web page, the package is described as "a grammar of data manipulation, providing a consistent set of verbs that help you solve the most common data manipulation challenges."[4]

The five core verbs

While dplyr actually includes several dozen functions that enable various forms of data manipulation, the package features five primary verbs:[5]

filter(), which is used to extract rows from a dataframe, based on conditions specified by a user;

select(), which is used to subset a dataframe by its columns;

arrange(), which is used to sort rows in a dataframe based on attributes held by particular columns;

mutate(), which is used to create new variables, by altering and/or combining values from existing columns; and

summarize(), also spelled summarise(), which is used to collapse values from a dataframe into a single summary.
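
The five verbs can be illustrated with a minimal sketch. The `scores` dataframe and its column names below are invented for this example and do not come from dplyr; the sketch assumes only that the package is installed.

```r
library(dplyr)

# A small, made-up dataframe used purely for illustration.
scores <- data.frame(
  name   = c("Ana", "Ben", "Cy", "Dee"),
  team   = c("red", "blue", "red", "blue"),
  points = c(12, 7, 20, 15),
  games  = c(4, 2, 5, 5)
)

# filter(): keep only the rows that satisfy a condition.
red_team <- filter(scores, team == "red")

# select(): keep only the named columns.
names_points <- select(scores, name, points)

# arrange(): sort rows; desc() sorts from highest to lowest.
ranked <- arrange(scores, desc(points))

# mutate(): add a column computed from existing columns.
with_rate <- mutate(scores, points_per_game = points / games)

# summarize(): collapse all rows into a single summary row.
overall <- summarize(
  scores,
  total_points = sum(points),
  mean_games   = mean(games)
)
```

Each verb takes a dataframe as its first argument and returns a new dataframe, leaving `scores` itself unchanged.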

Additional functions

In addition to its five main verbs, dplyr also includes several other functions that enable exploration and manipulation of dataframes. Included among these are:

count(), which is used to count the number of observations (rows) for each unique value of a variable or combination of variables;

rename(), which enables a user to alter the column names for variables, often to improve ease of use and intuitive understanding of a dataset;

slice_max(), which returns a data subset that contains the rows with the highest values of some particular variable;

slice_min(), which returns a data subset that contains the rows with the lowest values of some particular variable.
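
As a rough sketch of these four functions, the following uses a small invented dataframe (`scores`, with hypothetical columns `name`, `team`, and `points`). It assumes a dplyr version that provides slice_max() and slice_min().

```r
library(dplyr)

# A small, made-up dataframe used purely for illustration.
scores <- data.frame(
  name   = c("Ana", "Ben", "Cy", "Dee"),
  team   = c("red", "blue", "red", "blue"),
  points = c(12, 7, 20, 15)
)

# count(): number of rows for each distinct value of `team`;
# the counts are returned in a column named `n`.
team_sizes <- count(scores, team)

# rename(): the syntax is new_name = old_name.
renamed <- rename(scores, player = name)

# slice_max() / slice_min(): rows with the largest / smallest `points`.
top_two <- slice_max(scores, points, n = 2)
lowest  <- slice_min(scores, points, n = 1)
```

Here `top_two` holds the two highest-scoring rows and `lowest` holds the single lowest-scoring row.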

Built-in datasets

The dplyr package comes with five datasets: band_instruments, band_instruments2, band_members, starwars, and storms.
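
A brief sketch of how these datasets might be inspected, assuming dplyr is installed. Once the package is attached, the datasets can be referred to directly by name.

```r
library(dplyr)

# List the datasets bundled with the installed dplyr version.
dataset_names <- data(package = "dplyr")$results[, "Item"]

# The datasets are available by name once dplyr is attached.
members <- band_members

# glimpse() prints a compact, column-by-column overview.
glimpse(starwars)
```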

Copyright & license

The copyright to dplyr is held by Posit PBC, formerly RStudio PBC. dplyr was originally released under a GPL license[citation needed], but in 2022 Posit changed the license terms for the package to the "more permissive" MIT License.[6] The chief difference between the two types of license is that the MIT License allows the code to be reused within proprietary software, whereas a GPL license requires that derivative works, when distributed, also be released under the GPL.

References

  1. ^ Yadav, Rohit (2019-10-29). "Python's Pandas vs R's Tidyverse: Who Comes Out On Top?". Analytics India Magazine. Retrieved 2021-02-06.
  2. ^ Krill, Paul (2015-06-30). "Why R? The pros and cons of the R language". InfoWorld. Retrieved 2021-02-06.
  3. ^ "Introducing dplyr". blog.rstudio.com. 2014-01-17. Retrieved 2020-09-02.
  4. ^ "Function reference". dplyr.tidyverse.org. Retrieved 2021-02-06.
  5. ^ Grolemund, Garrett; Wickham, Hadley. 5 Data transformation | R for Data Science.
  6. ^ "A Grammar of Data Manipulation". tidyverse.org. Retrieved 2023-01-14.