recoding variables
Miss我要改个名 (sense and sensibility)
Currently reading: R in Action
 Chapter: recoding variables
 20150706 14:18:15
variable[condition] <- expression
The assignment is made only where condition is TRUE.

(The code in these notes was typed quickly just to help me remember it. Don't copy it!)

    leadership <- within(leadership, {
      agecat <- NA
      agecat[age > 75] <- ...
    })
    mydata <- transform(mydata, sumx = x1 + x2)

Renaming variables
The reshape package has a rename() function that is useful for altering the names of variables. Its format is rename(dataframe, c(oldname="newname")). Load the package first with library(reshape).

is.na()
Missing values are considered noncomparable, even to themselves. This means we cannot use comparison operators to test for the presence of missing values; for example, the logical test myvar == NA is never TRUE. Instead, use the missing-value functions such as is.na().

na.rm=TRUE
Once you have identified the missing values, you need to eliminate them in some way before analyzing the data further, because arithmetic expressions and functions that contain missing values yield missing values. Most numeric functions accept na.rm=TRUE, which removes the missing values before the calculation.

na.omit()
We can remove any observation with missing values by using the na.omit() function. It deletes every row that contains missing data. This is called listwise deletion.

    newdata <- na.omit(leadership)

4.6 Date Values (p. 81)

as.Date(x, "input_format")
x is the character data and input_format gives the format for reading the date. The default format is yyyy-mm-dd. Once the variable is in date format, you can analyze and plot the dates using the wide range of analytic techniques.

Sys.Date() returns today's date. date() returns the current date and time.

format()
We can use format(x, format="output_format") to output dates in a specified format and to extract portions of dates, for example format(today, format="%B %d %Y") with today <- Sys.Date().

When R stores dates internally, they are represented as the number of days since January 1, 1970, with negative values for earlier dates. This means we can perform arithmetic operations on them.

difftime()
Calculates a time interval and expresses it as seconds, minutes, hours, days, or weeks, as we like.
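The notes above on recoding, missing values, and dates can be tried out together in a small sketch. This is only an illustration under assumptions: the toy leadership data frame, the use of 99 as a missing-value code, the age cut-offs (75 and 55), and the helper names such as sumq and renamed are all made up for the example. Renaming is done in base R here so the sketch does not depend on the reshape package.

```r
# Hypothetical toy data: column names follow the notes, values are made up.
leadership <- data.frame(
  age = c(32, 45, 80, 60, 99),
  q1  = c(5, 3, NA, 4, 2),
  q2  = c(4, 5, 2, NA, 1)
)

# variable[condition] <- expression: assign only where the condition is TRUE.
# Assumption: 99 is used as a "missing" code for age.
leadership$age[leadership$age == 99] <- NA

# Recode age into categories (the cut-offs 75 and 55 are assumed).
leadership <- within(leadership, {
  agecat <- NA
  agecat[age > 75]              <- "Elder"
  agecat[age >= 55 & age <= 75] <- "Middle Aged"
  agecat[age < 55]              <- "Young"
})

# Add a derived column with transform().
leadership <- transform(leadership, sumq = q1 + q2)

# Rename a column in base R (an alternative to reshape's rename()).
renamed <- leadership
names(renamed)[names(renamed) == "q1"] <- "item1"

# Missing values: == NA yields NA for every element, so use is.na() instead.
never_true <- leadership$q1 == NA
is_missing <- is.na(leadership$q1)

# na.rm = TRUE drops missing values before computing.
mean_q1 <- mean(leadership$q1, na.rm = TRUE)

# Listwise deletion: drop every row that contains any NA.
complete <- na.omit(leadership)

# Dates: the default input format is yyyy-mm-dd; other formats need a pattern.
d1 <- as.Date("2015-07-06")
d2 <- as.Date("06/29/2015", "%m/%d/%Y")

# Extract a portion of a date.
year_only <- format(d1, format = "%Y")

# Dates are stored as days since January 1, 1970.
days_since_epoch <- as.numeric(as.Date("1970-01-01"))

# Intervals in the unit of our choice.
gap_days  <- difftime(d1, d2, units = "days")
gap_weeks <- difftime(d1, d2, units = "weeks")
```

Printing never_true next to is_missing shows why the comparison operator cannot be used to find missing values. Month names produced by %B depend on the locale, so the sketch extracts only the year.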
Packages we may use
The lubridate package contains a number of functions that simplify working with dates, including functions to identify and parse date-time data, extract date-time components, and perform arithmetic calculations on date-times.
The fCalendar package provides a myriad of functions for dealing with dates. It can handle multiple time zones at once and provides sophisticated calendar manipulations that recognize business days, weekends, and holidays.

Type conversions
is.datatype() tests whether an object is of the given type and as.datatype() converts it to that type, for example is.numeric() and as.numeric().

order()
The default sort order is ascending. Prepend the sorting variable with a minus sign to indicate descending order.

    leadership[order(gender, age), ]

What does the comma mean? It separates the row indices from the column indices; leaving the column position empty keeps all columns.

Merging datasets
merge(), cbind(), rbind()

Selecting variables
dataframe[row indices, column indices]

    myvars <- names(leadership) %in% c("q3", "q4")

We can also remove variables by setting the columns to undefined, which is NULL. Note that NULL is not the same as NA.

The subset() function is probably the easiest way to select variables and observations.

    newdata <- subset(leadership, age >= 35 | age < 24, select=c(q1, q2, q3, q4))

sample()
Sampling from a larger dataset is a common practice in data mining and machine learning. For example, you may want to select two random samples, creating a predictive model from one and validating its effectiveness on the other. The sample() function enables you to take a random sample of size n from a dataset, with or without replacement.

Arguments
x: either a vector of one or more elements from which to choose, or a positive integer.
n: a positive number, the number of items to choose from.
size: a non-negative integer giving the number of items to choose.
replace: should sampling be with replacement?
prob: a vector of probability weights for obtaining the elements of the vector being sampled.
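The sorting, selecting, and sampling notes above can be sketched on a small data frame. This is a minimal illustration, not the book's listing: the toy leadership data, the seed value, the half-and-half split, and names such as sorted, dropped, train, and holdout are assumptions made for the example.

```r
# Hypothetical toy data: column names follow the notes, values are made up.
leadership <- data.frame(
  manager = 1:6,
  gender  = c("M", "F", "F", "M", "F", "M"),
  age     = c(32, 45, 25, 39, 51, 22),
  q1      = c(5, 3, 3, 3, 2, 4),
  q2      = c(4, 5, 5, 3, 2, 1),
  q3      = c(5, 2, 5, 4, 1, 3),
  q4      = c(5, 5, 2, 4, 2, 3)
)

# Type tests and conversions: is.datatype() / as.datatype().
a <- c(1, 2, 3)
was_numeric <- is.numeric(a)
a_chr <- as.character(a)

# Sort by gender (ascending), then by age (descending, via the minus sign).
# The empty slot after the comma keeps all columns.
sorted <- leadership[order(leadership$gender, -leadership$age), ]

# Drop q3 and q4 with a logical vector over the column names.
myvars  <- names(leadership) %in% c("q3", "q4")
dropped <- leadership[!myvars]

# Drop a column by setting it to NULL (not the same as NA).
no_q3 <- leadership
no_q3$q3 <- NULL

# Select rows and columns in one step with subset().
newdata <- subset(leadership, age >= 35 | age < 24,
                  select = c(q1, q2, q3, q4))

# Random split without replacement: one half to build a model,
# the other half to validate it. The seed is arbitrary.
set.seed(1234)
idx     <- sample(1:nrow(leadership), size = 3, replace = FALSE)
train   <- leadership[idx, ]
holdout <- leadership[-idx, ]
```

Negative indices (leadership[-idx, ]) select every row that was not sampled, so train and holdout never share a manager.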