# Notes on *R in Action*: recoding variables

###### MissAiyo (sense and sensibility)

Currently reading R in Action

- Chapter: recoding variables
- 2015-07-06 14:18:15

`variable[condition] <- expression`: the assignment is made only where the condition is TRUE.

The code here was typed casually just to help me memorize it. Don't copy it!

`leadership <- within(leadership, { agecat <- NA agecat[age>75` …

`mydata <- transform(mydata, sumx = x1 + x2)`

The reshape package has a `rename()` function that is useful for altering the names of variables. Its format is `rename(dataframe, c(oldname = "newname"))`, after loading the package with `library(reshape)`.

`is.na()`: missing values are considered not comparable, even to themselves. This means we cannot use comparison operators to test for the presence of missing values. For example, the logical test `myvar == NA` is never TRUE. Instead, use the missing-value functions.

`na.rm = TRUE`: once you have identified the missing values, you need to eliminate them in some way before analyzing the data further. The reason is that arithmetic expressions and functions that contain missing values yield missing values.

`na.omit()`: removes any observation with missing values, i.e. it deletes every row that has missing data. `newdata <- na.omit(leadership)`. This is called listwise deletion.

## 4.6 Date Values (P81)

`as.Date(x, "input_format")`: `x` is the character data and `input_format` gives the appropriate format for reading the date. The default format is yyyy-mm-dd. Once the variable is in date format, you can analyze and plot the dates using a wide range of analytic techniques.

`Sys.Date()` returns today's date. `date()` returns the current date and time.

`format()`: we can use `format(x, format = "")` to output dates in a specified format and to extract portions of dates, e.g. `format(today, format = "%B %d %Y")`.

When R stores dates internally, they are represented as the number of days since Jan. 1, 1970, with negative values for earlier dates. This means we can perform arithmetic operations on them.

`difftime()`: calculates a time interval and expresses it in seconds, minutes, hours, days, or weeks, as we like.
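The recoding and missing-value notes above can be tried on a toy example. This is only a sketch: the data values, the age cut-offs (75 and 55), the category labels, and the names `sumq` and `item1` are made up for illustration and are not from the book. Renaming is done with base R here so that the block does not depend on the reshape package.

```r
# Hypothetical toy data: column names echo the notes, values are invented.
leadership <- data.frame(
  age = c(32, 45, 25, 80, 60),
  q1  = c(5, 3, 3, NA, 2),
  q2  = c(4, 5, 5, 3, 2)
)

# variable[condition] <- expression: assign only where the condition is TRUE.
# The cut-offs and labels below are illustrative assumptions.
leadership <- within(leadership, {
  agecat <- NA
  agecat[age > 75] <- "Elder"
  agecat[age >= 55 & age <= 75] <- "Middle Aged"
  agecat[age < 55] <- "Young"
})

# transform() adds a derived column in one call.
leadership <- transform(leadership, sumq = q1 + q2)

# Rename a column with base R (an alternative to reshape's rename()).
names(leadership)[names(leadership) == "q1"] <- "item1"

# NA is not comparable, even to itself: == NA gives NA for every element.
na_by_equals <- leadership$item1 == NA
na_by_isna   <- is.na(leadership$item1)

# NA propagates through arithmetic unless na.rm = TRUE drops it first.
mean_raw   <- mean(leadership$item1)
mean_clean <- mean(leadership$item1, na.rm = TRUE)

# Listwise deletion: drop every row that contains at least one NA.
newdata <- na.omit(leadership)
```

In this toy data the fourth row has a missing `item1`, so its `sumq` is also missing and `na.omit()` drops that row.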
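For the date notes, a small sketch may help. The specific dates are arbitrary examples chosen for illustration. Numeric format codes are used where the result is compared, because month names from `%B` depend on the locale.

```r
# as.Date(x, "input_format"): the default input format is yyyy-mm-dd.
d_default <- as.Date("2015-07-06")
d_custom  <- as.Date("07/06/2015", "%m/%d/%Y")  # same day, US-style input

# Dates are stored as days since 1970-01-01; earlier dates are negative.
days_since_epoch <- as.numeric(d_default)
before_epoch     <- as.numeric(as.Date("1969-12-31"))

# format() outputs a date in a chosen format or extracts parts of it.
year_part  <- format(d_default, "%Y")
month_part <- format(d_default, "%m")

# Today's date, formatted with the pattern from the notes
# (the month name printed by %B depends on the locale).
today <- Sys.Date()
today_text <- format(today, format = "%B %d %Y")

# Because dates are numbers internally, subtraction works directly,
# and difftime() lets us pick the unit.
start     <- as.Date("2015-01-01")
gap_days  <- as.numeric(d_default - start)
gap_weeks <- difftime(d_default, start, units = "weeks")
```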
## Packages we may use

The lubridate package contains a number of functions that simplify working with dates, including functions to identify and parse date-time data, extract date-time components, and perform arithmetic calculations on date-times.

The fCalendar package provides a myriad of functions for dealing with dates. It can handle multiple time zones at once and provides sophisticated calendar manipulations that recognize business days, weekends, and holidays.

`as.datatype()`, `is.datatype()`

`order()`: ascending by default. Prepend the sorting variable with a minus sign to indicate descending order: `leadership[order(gender, -age), ]`. What does the comma mean?

`merge()`, `cbind()`, `rbind()`

Selecting variables: `dataframe[row indices, column indices]`

`myvars <- names(leadership) %in% c("q3", "q4")`

We can remove variables by setting the columns to undefined, which is NULL. Note that NULL is not the same as NA.

The `subset()` function is probably the easiest way to select variables and observations: `newdata <- subset(leadership, age >= 35 | age < 24, select = c(q1, q2, q3, q4))`

`sample()`: sampling from a larger dataset is a common practice in data mining and machine learning. For example, you may want to select two random samples, creating a predictive model from one and validating its effectiveness on the other. The `sample()` function enables you to take a random sample of size n from a dataset, with or without replacement.

Arguments:

- `x`: either a vector of one or more elements from which to choose, or a positive integer.
- `n`: a positive number, the number of items to choose from.
- `size`: a non-negative integer giving the number of items to choose.
- `replace`: should sampling be with replacement?
- `prob`: a vector of probability weights for obtaining the elements of the vector being sampled.
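The sorting, merging, and selection notes above can be sketched on a small data frame. All values here, and the `scores` table, are hypothetical. The sketch also speaks to the question about the comma: in `dataframe[row indices, column indices]`, leaving the slot after the comma empty keeps every column.

```r
# Hypothetical toy data: column names echo the notes, values are invented.
leadership <- data.frame(
  id     = 1:5,
  gender = c("M", "F", "F", "M", "F"),
  age    = c(32, 45, 25, 39, 58),
  q1 = c(5, 3, 3, 3, 2), q2 = c(4, 5, 5, 3, 2),
  q3 = c(5, 2, 5, 4, 1), q4 = c(5, 5, 5, NA, 2)
)

# order() returns row positions: gender ascending, then age descending.
# Nothing after the comma means "all columns".
sorted <- leadership[with(leadership, order(gender, -age)), ]

# Drop q3 and q4 with a logical vector ...
myvars  <- names(leadership) %in% c("q3", "q4")
dropped <- leadership[!myvars]

# ... or by setting the columns to NULL (NULL removes, NA would not).
also_dropped <- leadership
also_dropped$q3 <- NULL
also_dropped$q4 <- NULL

# subset(): rows chosen by a condition, columns chosen by select.
newdata <- subset(leadership, age >= 35 | age < 24,
                  select = c(q1, q2, q3, q4))

# merge() joins on a key column; rbind() stacks rows; cbind() adds columns.
scores  <- data.frame(id = c(1, 2, 3), score = c(90, 85, 70))
merged  <- merge(leadership, scores, by = "id")
stacked <- rbind(leadership[1:2, ], leadership[3:5, ])
widened <- cbind(leadership, flag = leadership$age > 40)
```

By default `merge()` keeps only the ids present in both tables, so `merged` has three rows here.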
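The two-sample idea described for `sample()` might look like the sketch below. The data frame, the 7/3 split, the seed, and the probability weights are arbitrary choices for illustration only.

```r
set.seed(1234)  # arbitrary seed, only so the draw is reproducible

# Hypothetical data set with 10 observations.
mydata <- data.frame(id = 1:10, x = (1:10)^2)

# Without replacement (the default): draw 7 distinct row indices
# for model building and keep the remaining rows for validation.
train_idx <- sample(nrow(mydata), size = 7)
train     <- mydata[train_idx, ]
holdout   <- mydata[-train_idx, ]

# With replacement: the same row index may be drawn more than once.
boot_idx <- sample(nrow(mydata), size = nrow(mydata), replace = TRUE)

# prob gives unequal selection weights to the elements of x.
weighted <- sample(c("a", "b"), size = 5, replace = TRUE,
                   prob = c(0.9, 0.1))
```

Passing a single positive integer as `x`, as in `sample(nrow(mydata), ...)`, samples from `1:x`, which is why the result can be used directly as row indices.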

