Data frames in R
A data frame is one of the most important and widely used data structures in R. It is a two-dimensional, tabular structure in which each column holds the values of one variable and each row holds one observation, with one value from each column. All columns have the same length, but they can differ in type: a column can be numeric, character, factor, logical, and so on.
Creating a Data Frame
Creating a data frame in R is straightforward: pass one vector per column to the data.frame() function. Here is an example:
# create a data frame
students <- data.frame(
  Name = c("John", "Sara", "Anna", "Mark"),
  Age = c(21, 23, 22, 24),
  Grade = c("A", "B", "A", "C")
)
print(students)
In this example, we create a data frame named students containing three columns: Name, Age, and Grade.
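To check that the columns were created with the types you expect, it helps to inspect the result. The following is a minimal sketch that rebuilds the same students data frame and looks at its shape and column classes with base R's str(), dim(), and sapply(). It assumes R 4.0 or later, where data.frame() keeps strings as character vectors by default; in older versions Name and Grade would be factors instead.

```r
# Rebuild the example data frame so this snippet runs on its own
students <- data.frame(
  Name  = c("John", "Sara", "Anna", "Mark"),
  Age   = c(21, 23, 22, 24),
  Grade = c("A", "B", "A", "C")
)

# Compact overview: one line per column with its type and first values
str(students)

# Number of rows and columns, in that order
dims <- dim(students)
print(dims)

# Class of each column, returned as a named character vector
col_classes <- sapply(students, class)
print(col_classes)
```

The variable names dims and col_classes are only illustrative.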
Accessing Data Frame Elements
You can access elements of a data frame much as you would those of a matrix, using row and column indices in the form [row, column].
# access 1st row, 1st column
print(students[1,1])
You can also access elements using column names:
# access 'Name' column
print(students$Name)
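Beyond a single cell or a single column, the same indexing rules cover whole rows and several columns at once. The sketch below rebuilds the students data frame and shows a few common forms; the variable names on the left-hand side are made up for illustration.

```r
# Rebuild the example data frame so this snippet runs on its own
students <- data.frame(
  Name  = c("John", "Sara", "Anna", "Mark"),
  Age   = c(21, 23, 22, 24),
  Grade = c("A", "B", "A", "C")
)

# A whole row: leave the column index empty; the result is a one-row data frame
second_row <- students[2, ]

# A column by name with [[ ]]: the result is a plain vector, like with $
ages <- students[["Age"]]

# Several columns by name: the result is a data frame
name_grade <- students[, c("Name", "Grade")]

# A single cell by row index and column name
sara_age <- students[2, "Age"]

print(second_row)
print(ages)
print(name_grade)
print(sara_age)
```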
Adding Rows and Columns
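To add rows to a data frame, you can use the rbind() function, which combines vectors, matrices, or data frames by rows. When you bind data frames, the new rows must have the same column names as the existing data frame.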
# add a new row
new_student <- data.frame(Name = "Emma", Age = 22, Grade = "B")
students <- rbind(students, new_student)
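When both arguments are data frames, rbind() lines up columns by name rather than by position. The sketch below illustrates this by building the new row with its columns in a different order; the reordering is an assumption made only for the demonstration.

```r
# Rebuild the example data frame so this snippet runs on its own
students <- data.frame(
  Name  = c("John", "Sara", "Anna", "Mark"),
  Age   = c(21, 23, 22, 24),
  Grade = c("A", "B", "A", "C")
)

# Same column names as students, deliberately listed in a different order
new_student <- data.frame(Grade = "B", Name = "Emma", Age = 22)

# rbind() matches the columns of the second data frame to the first by name
students <- rbind(students, new_student)

print(nrow(students))
print(students[5, ])
```

If the column names did not match, rbind() would stop with an error instead of guessing.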
Adding a new column is just as easy. Assign a vector to a new column name with the $ operator; the vector needs one value per row (five here, now that Emma has been added):
# add a new column
students$GPA <- c(3.5, 3.7, 3.8, 3.2, 3.6)
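A new column does not have to be typed in by hand; it can also be computed from existing columns, and a single value is recycled to every row. The following sketch shows both cases. The column names AboveAverage and Enrolled are hypothetical and are used only for illustration.

```r
# Rebuild the five-row data frame, including GPA, so this snippet runs on its own
students <- data.frame(
  Name  = c("John", "Sara", "Anna", "Mark", "Emma"),
  Age   = c(21, 23, 22, 24, 22),
  Grade = c("A", "B", "A", "C", "B")
)
students$GPA <- c(3.5, 3.7, 3.8, 3.2, 3.6)

# Derived column: compared element-wise against the mean GPA
# (AboveAverage is a hypothetical column name)
students$AboveAverage <- students$GPA > mean(students$GPA)

# A single value is recycled to every row
# (Enrolled is a hypothetical column name)
students$Enrolled <- TRUE

print(students)
```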
Subsetting Data Frames
Subsetting data frames in R can be done in multiple ways. One common way is to use the subset() function.
# subset data frame
high_GPA <- subset(students, GPA > 3.6)
In this example, we create a new data frame that includes only the students with a GPA greater than 3.6 (here, Sara and Anna).
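Another of those ways is logical indexing with square brackets, and subset() itself can also restrict the columns it returns through its select argument. The sketch below rebuilds the five-row data frame and compares the approaches; the names high_GPA_idx and high_GPA_names are made up for this example.

```r
# Rebuild the five-row data frame, including GPA, so this snippet runs on its own
students <- data.frame(
  Name  = c("John", "Sara", "Anna", "Mark", "Emma"),
  Age   = c(21, 23, 22, 24, 22),
  Grade = c("A", "B", "A", "C", "B"),
  GPA   = c(3.5, 3.7, 3.8, 3.2, 3.6)
)

# subset(): the condition is evaluated inside the data frame
high_GPA <- subset(students, GPA > 3.6)

# Logical indexing: keep the rows where the condition is TRUE, and all columns
high_GPA_idx <- students[students$GPA > 3.6, ]

# subset() with select: filter rows and keep only some columns
high_GPA_names <- subset(students, GPA > 3.6, select = c(Name, GPA))

print(high_GPA)
print(high_GPA_idx)
print(high_GPA_names)
```

One difference worth knowing: subset() drops rows where the condition is NA, while bracket indexing returns a row of NA values for them.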
Summary of a Data Frame
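The summary() function gives a quick overview of each column of a data frame. For numeric columns, it reports the minimum, 1st quartile, median, mean, 3rd quartile, and maximum. For factor columns, it reports the count of each level. Character columns, such as Name and Grade here when the data frame is created in R 4.0 or later, are reported only by length, class, and mode unless you convert them to factors.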
# get summary of data frame
summary(students)
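To get per-level counts for a text column such as Grade, one option is to convert it to a factor before calling summary(). This is a minimal sketch of that idea on the original four-row data frame; grade_counts and age_summary are illustrative names.

```r
# Rebuild the example data frame so this snippet runs on its own
students <- data.frame(
  Name  = c("John", "Sara", "Anna", "Mark"),
  Age   = c(21, 23, 22, 24),
  Grade = c("A", "B", "A", "C")
)

# Convert Grade to a factor so that summary() counts each level
students$Grade <- factor(students$Grade)

# Summary of the whole data frame, one block per column
print(summary(students))

# Summary of a single factor column: a named vector of counts per level
grade_counts <- summary(students$Grade)
print(grade_counts)

# Summary of a single numeric column: min, quartiles, median, mean, max
age_summary <- summary(students$Age)
print(age_summary)
```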
In conclusion, the data frame is a powerful tool for data manipulation in R. It's flexible and easy to use. Once you get comfortable with it, you'll find it indispensable for your data analysis tasks.