Notes on "R in Action" by Miss我要改个名 (3)

Miss我要改个名 (sense and sensibility)

Currently reading R in Action

R in Action
  • Title: R in Action
  • Author: Robert Kabacoff
  • Subtitle: Data Analysis and Graphics with R
  • Pages: 472
  • Publisher: Manning Publications
  • Publication date: 2011-8-27
  • Page 38: Input and output

    By default, launching R starts an interactive session with input from the keyboard and output to the screen. But you can also process commands from a script file and direct output to a variety of destinations.

    Input
    source("filename")
    e.g. source("myscript.R")
    If the filename does not include a path, the file is assumed to be in the current working directory.

    Output
    sink("filename")
    By default, if the file already exists, its contents are overwritten. Include the option append=TRUE to append text to the file rather than overwriting it. Including the option split=TRUE will send output to both the screen and the output file. Issuing the command sink() without options will return output to the screen alone.

    Graphic output
    Although sink() redirects text output, it has no effect on graphic output. To direct graphic output, use one of the functions listed in table 1.4. Use dev.off() to return output to the terminal.

    Table 1.4
      pdf("filename.pdf")
      win.metafile("filename.wmf")    Windows metafile
      png, jpeg, bmp
      postscript("filename.ps")       PostScript file

    e.g.
    sink("myoutput", append=TRUE, split=TRUE)
    pdf("mygraphs.pdf")
    source("script2.R")
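The redirection steps in this note can be tried end to end with a small sketch like the one below. The file names and the three-line script are made up for illustration, and everything is written under tempdir() so no real files are touched; this is only one way to exercise source(), sink(), pdf() and dev.off() together.

```r
# Hypothetical file names, all placed in the session's temporary directory.
script_file <- file.path(tempdir(), "myscript.R")
output_file <- file.path(tempdir(), "myoutput.txt")
graph_file  <- file.path(tempdir(), "mygraphs.pdf")

# A tiny made-up script: prints one number and draws one plot.
writeLines(c("x <- c(1, 2, 3, 4)",
             "print(mean(x))",
             "plot(x)"),
           script_file)

sink(output_file, append = FALSE, split = TRUE)  # text goes to screen and file
pdf(graph_file)                                  # graphics go to the PDF
source(script_file)                              # run the script
invisible(dev.off())                             # close the PDF device
sink()                                           # text output back to screen only

# Read back what sink() captured.
captured <- readLines(output_file)
```

With split = TRUE the printed mean still shows on the screen while also landing in the text file; the plot only appears in the PDF.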

    2015-06-19 13:48:37
  • Page 42: 1.5 Batch processing

    R CMD BATCH options infile outfile

    where infile is the name of the file containing R code to be executed, outfile is the name of the file receiving the output, and options lists options that control execution. By convention, infile is given the extension .R and outfile is given the extension .Rout.

    2015-06-19 17:34:00
  • recoding variables

    variable[condition] <- expression
    The assignment is made only where condition is TRUE.

    All the code here was typed casually just to help me remember it, so don't copy it!

    leadership <- within(leadership, {
      agecat <- NA
      agecat[age>75 [...]
    mydata <- transform(mydata, sumx=x1+x2)

    The reshape package has a rename() function that is useful for altering the names of variables. The format of the rename() function is
    rename(dataframe, c(oldname="newname"))
    library(reshape)

    is.na()
    Missing values are considered not comparable, even to themselves. This means we cannot use comparison operators to test for the presence of missing values, e.g. the logical test myvar == NA is never TRUE. Instead, you have to use the missing-value functions.

    na.rm=TRUE
    Once you have identified the missing values, you need to eliminate them in some way before analyzing the data further. The reason is that arithmetic expressions and functions that contain missing values yield missing values.

    na.omit()
    We can remove any observation with missing values by using the na.omit() function. It deletes any rows with missing data.
    newdata <- na.omit(leadership)
    This is called listwise deletion.

    4.6 Date values (p. 81)

    as.Date(x, "input_format")
    x is the character data and input_format gives the appropriate format for reading the date. The default format is yyyy-mm-dd. Once the variable is in date format, you can analyze and plot the dates using the wide range of analytic techniques.

    Sys.Date() returns today's date.
    date() returns the current date and time.

    format()
    format(today, format="%B %d %Y")
    We can use the format(x, format="") function to output dates in a specified format, and to extract portions of dates.

    When R stores dates internally, they are represented as the number of days since January 1, 1970, with negative values for earlier dates. This means we can perform arithmetic operations on them.

    difftime()
    Calculates a time interval and expresses it as seconds, minutes, hours, days, or weeks, as we like.
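The recoding, missing-value and date notes above can be checked with a minimal sketch such as the following. The data frame `people`, its column names, its values and the category labels are all invented for illustration; they are not the book's leadership data.

```r
# Hypothetical data; one age is missing on purpose.
people <- data.frame(
  age     = c(30, 80, NA, 60),
  visited = c("2015-06-19", "2015-07-06", "2015-01-01", "2014-12-31"),
  stringsAsFactors = FALSE
)

# variable[condition] <- expression: assign only where the condition is TRUE.
people$agecat <- NA
people$agecat[people$age > 75]  <- "over 75"
people$agecat[people$age <= 75] <- "75 or under"

# Missing values: comparing with == NA yields NA, so use is.na() instead.
never_true <- people$age == NA            # all NA, never TRUE
is_missing <- is.na(people$age)           # TRUE only for the third row
plain_mean <- mean(people$age)            # NA, because one value is missing
clean_mean <- mean(people$age, na.rm = TRUE)
complete   <- na.omit(people)             # listwise deletion: drops the third row

# Dates: the default input format is yyyy-mm-dd; other formats need a pattern.
people$visited <- as.Date(people$visited)
us_style  <- as.Date("06/19/2015", "%m/%d/%Y")
year_part <- format(us_style, "%Y")       # extract a portion of the date
epoch_day <- as.numeric(as.Date("1970-01-01"))  # days since 1970-01-01
gap <- difftime(as.Date("2015-07-06"), as.Date("2015-06-19"), units = "days")
```

Because dates are stored as day counts, subtracting two Date values directly also works; difftime() is mainly useful when the unit should be chosen explicitly.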
    Packages we may use
    The lubridate package contains a number of functions that simplify working with dates, including functions to identify and parse date-time data, extract date-time components, and perform arithmetic calculations on date-times.
    The fCalendar package provides a myriad of functions for dealing with dates, can handle multiple time zones at once, and provides sophisticated calendar manipulations that recognize business days, weekends, and holidays.

    as.datatype()
    is.datatype()

    order()
    The default is ascending order. Prepend the sorting variable with a minus sign to indicate descending order.
    leadership[order(gender, -age), ]
    What does the comma mean?

    merge()
    cbind()
    rbind()

    Selecting variables
    dataframe[row indices, column indices]
    myvars <- names(leadership) %in% c("q3","q4")
    We can remove variables by setting their columns to undefined, which is NULL. Note that NULL is not the same as NA.
    The subset() function is probably the easiest way to select variables and observations.
    newdata <- subset(leadership, age >=35|age<24, select=c(q1,q2,q3,q4))

    sample()
    Sampling from a larger dataset is a common practice in data mining and machine learning. For example, you may want to select two random samples, creating a predictive model from one and validating its effectiveness on the other. The sample() function enables you to take a random sample, with or without replacement, of size n from a dataset.

    Arguments
      x        Either a vector of one or more elements from which to choose, or a positive integer.
      n        A positive number, the number of items to choose from.
      size     A non-negative integer giving the number of items to choose.
      replace  Should sampling be with replacement?
      prob     A vector of probability weights for obtaining the elements of the vector being sampled.
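A small sketch of sorting, selecting and sampling, assuming a made-up data frame `staff` (its columns and values are hypothetical, not the book's leadership data). It also illustrates the comma in dataframe[rows, columns]: whatever comes before the comma picks rows, whatever comes after picks columns, and leaving one side empty keeps all of them.

```r
set.seed(1)  # only so the random split is reproducible

# Hypothetical data.
staff <- data.frame(
  gender = c("M", "F", "F", "M"),
  age    = c(32, 45, 25, 39),
  q1     = c(5, 3, 3, 3),
  q2     = c(4, 5, 5, 3),
  stringsAsFactors = FALSE
)

# Rows reordered by gender (ascending), then age (descending);
# nothing after the comma, so every column is kept.
sorted <- staff[order(staff$gender, -staff$age), ]

# Column selection with a logical vector built by %in%.
is_q      <- names(staff) %in% c("q1", "q2")
only_q    <- staff[, is_q]     # keep q1 and q2
without_q <- staff[, !is_q]    # drop q1 and q2

# Setting a column to NULL removes it (done on a copy here).
trimmed <- staff
trimmed$q2 <- NULL

# subset(): pick rows by condition and columns by name in one call.
picked <- subset(staff, age >= 35 | age < 24, select = c(q1, q2))

# sample(): split the rows into two random, non-overlapping halves.
idx     <- sample(nrow(staff), size = 2, replace = FALSE)
train   <- staff[idx, ]
holdout <- staff[-idx, ]
```

Writing staff$gender and staff$age inside order() avoids relying on attach(); the bare column names in the note's example only work when the data frame is attached or the call is wrapped in with().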

    2015-07-06 18:13:12