Announcements
- Our November 12th class will be held online. Information on how to connect is forthcoming.
Agenda
- Finish lecture on chapter 1
- Work on lab 0
September 24, 2018
A census involves collecting data for the entire population of interest. This is problematic for several reasons, including:
Sampling involves measuring a subset of the population of interest, usually one selected at random.
- Simple random sample: randomly select cases from the population, where there is no implied connection between the points that are selected.
- Stratified sample: strata are made up of similar observations. We take a simple random sample from each stratum.
- Multistage (cluster) sample: clusters are usually not made up of homogeneous observations, so we take a simple random sample of clusters and then a simple random sample of observations within each sampled cluster.
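The sampling strategies above can be sketched in base R. This is a minimal illustration, not part of the original slides: the `population` data frame, its `region` strata, its `neighborhood` clusters, and all sample sizes are made up for the example.

```r
# Hypothetical population: 1,000 people in 4 regions and 20 neighborhoods.
set.seed(2018)
population <- data.frame(
  id = 1:1000,
  region = rep(c("North", "South", "East", "West"), each = 250),
  neighborhood = rep(1:20, each = 50)
)

# Simple random sample: every case is equally likely to be chosen.
srs <- population[sample(nrow(population), 40), ]

# Stratified sample: a simple random sample of 10 cases from each stratum.
by_region <- split(population, population$region)
stratified <- do.call(rbind, lapply(by_region, function(stratum) {
  stratum[sample(nrow(stratum), 10), ]
}))

# Multistage sample: randomly pick 4 clusters, then 10 cases within each.
chosen <- sample(unique(population$neighborhood), 4)
multistage <- do.call(rbind, lapply(chosen, function(cl) {
  cluster <- population[population$neighborhood == cl, ]
  cluster[sample(nrow(cluster), 10), ]
}))

# Each region contributes exactly 10 cases to the stratified sample.
table(stratified$region)
```

All three samples contain 40 cases, but only the stratified sample guarantees that every region is represented equally.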
Difference between blocking and explanatory variables
ggplot2
- ggplot2 is an R package that provides an alternative framework based upon Wilkinson’s (2005) Grammar of Graphics.
- ggplot2 is, in general, more flexible for creating "prettier" and complex plots.
- ggplot2 has at least three ways of creating plots:
  - qplot
  - ggplot(...) + geom_XXX(...) + ...
  - ggplot(...) + layer(...)
library(ggplot2)
data(diamonds)
ggplot(diamonds, aes(x=carat, y=price, color=cut)) + geom_point()
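For comparison, here is a sketch of the same scatterplot written with `layer()` instead of `geom_point()`. It assumes a ggplot2 version (2.0 or later) in which `layer()` needs the geom, stat, and position named explicitly. The object names `p_geom` and `p_layer` are only for illustration.

```r
library(ggplot2)
data(diamonds)

# geom_XXX() form: geom_point() fills in the stat and position for us.
p_geom <- ggplot(diamonds, aes(x = carat, y = price, color = cut)) +
  geom_point()

# layer() form: the geom, stat, and position are all spelled out.
p_layer <- ggplot(diamonds, aes(x = carat, y = price, color = cut)) +
  layer(geom = "point", stat = "identity", position = "identity")
```

Printing either object draws the plot. The `geom_XXX()` form is usually the more convenient of the two, since each geom comes with sensible defaults.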
ggplot2 Statement
- Data and aesthetics: ggplot(myDataFrame, aes(x=x, y=y))
- Layers (geoms): geom_point(), geom_histogram()
- Facets: facet_wrap(~ cut), facet_grid(~ cut)
- Scales: scale_y_log10()
- Titles, limits, and labels: ggtitle('my title'), ylim(c(0, 10000)), xlab('x-axis label')
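The pieces listed above can be combined into a single statement by joining them with `+`. The sketch below uses the `diamonds` data that ships with ggplot2. The title, axis label, and `alpha` value are arbitrary choices for the example.

```r
library(ggplot2)
data(diamonds)

p <- ggplot(diamonds, aes(x = carat, y = price)) +  # data and aesthetics
  geom_point(alpha = 0.2) +                         # layer
  facet_wrap(~ cut) +                               # one panel per cut
  scale_y_log10() +                                 # log scale for price
  ggtitle('Diamond price by carat') +               # title
  xlab('Carat')                                     # x-axis label
```

Typing `p` at the console draws the plot. `ylim()` is left out here because it also sets the y scale and would replace `scale_y_log10()`.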
ls('package:ggplot2')[grep('geom_', ls('package:ggplot2'))]
##  [1] "geom_abline"          "geom_area"            "geom_bar"
##  [4] "geom_bin2d"           "geom_blank"           "geom_boxplot"
##  [7] "geom_col"             "geom_contour"         "geom_count"
## [10] "geom_crossbar"        "geom_curve"           "geom_density"
## [13] "geom_density_2d"      "geom_density2d"       "geom_dotplot"
## [16] "geom_errorbar"        "geom_errorbarh"       "geom_freqpoly"
## [19] "geom_hex"             "geom_histogram"       "geom_hline"
## [22] "geom_jitter"          "geom_label"           "geom_line"
## [25] "geom_linerange"       "geom_map"             "geom_path"
## [28] "geom_point"           "geom_pointrange"      "geom_polygon"
## [31] "geom_qq"              "geom_qq_line"         "geom_quantile"
## [34] "geom_raster"          "geom_rect"            "geom_ribbon"
## [37] "geom_rug"             "geom_segment"         "geom_sf"
## [40] "geom_smooth"          "geom_spoke"           "geom_step"
## [43] "geom_text"            "geom_tile"            "geom_violin"
## [46] "geom_vline"           "update_geom_defaults"
ggplot(legosets, aes(x=Pieces, y=USD_MSRP)) + geom_point()
ggplot(legosets, aes(x=Pieces, y=USD_MSRP, color=Availability)) + geom_point()
ggplot(legosets, aes(x=Pieces, y=USD_MSRP, size=Minifigures, color=Availability)) + geom_point()
ggplot(legosets, aes(x=Pieces, y=USD_MSRP, size=Minifigures)) + geom_point() + facet_wrap(~ Availability)
ggplot(legosets, aes(x='Lego', y=USD_MSRP)) + geom_boxplot()
ggplot(legosets, aes(x=Availability, y=USD_MSRP)) + geom_boxplot()
ggplot(legosets, aes(x=Availability, y=USD_MSRP)) + geom_boxplot() + coord_flip()
Likert scales are a type of questionnaire where respondents are asked to rate items on scales usually ranging from four to seven levels (e.g. strongly disagree to strongly agree).
library(likert)
library(reshape)
data(pisaitems)
items24 <- pisaitems[, substr(names(pisaitems), 1, 5) == 'ST24Q']
items24 <- rename(items24, c(
  ST24Q01="I read only if I have to.",
  ST24Q02="Reading is one of my favorite hobbies.",
  ST24Q03="I like talking about books with other people.",
  ST24Q04="I find it hard to finish books.",
  ST24Q05="I feel happy if I receive a book as a present.",
  ST24Q06="For me, reading is a waste of time.",
  ST24Q07="I enjoy going to a bookstore or a library.",
  ST24Q08="I read only to get information that I need.",
  ST24Q09="I cannot sit still and read for more than a few minutes.",
  ST24Q10="I like to express my opinions about books I have read.",
  ST24Q11="I like to exchange books with my friends."))
likert R Package
l24 <- likert(items24)
summary(l24)
##                                                        Item      low
## 10   I like to express my opinions about books I have read. 41.07516
## 5            I feel happy if I receive a book as a present. 46.93475
## 8               I read only to get information that I need. 50.39874
## 7                I enjoy going to a bookstore or a library. 51.21231
## 3             I like talking about books with other people. 54.99129
## 11                I like to exchange books with my friends. 55.54115
## 2                    Reading is one of my favorite hobbies. 56.64470
## 1                                 I read only if I have to. 58.72868
## 4                           I find it hard to finish books. 65.35125
## 9  I cannot sit still and read for more than a few minutes. 76.24524
## 6                       For me, reading is a waste of time. 82.88729
##    neutral     high     mean        sd
## 10       0 58.92484 2.604913 0.9009968
## 5        0 53.06525 2.466751 0.9446590
## 8        0 49.60126 2.484616 0.9089688
## 7        0 48.78769 2.428508 0.9164136
## 3        0 45.00871 2.328049 0.9090326
## 11       0 44.45885 2.343193 0.9609234
## 2        0 43.35530 2.344530 0.9277495
## 1        0 41.27132 2.291811 0.9369023
## 4        0 34.64875 2.178299 0.8991628
## 9        0 23.75476 1.974736 0.8793028
## 6        0 17.11271 1.810093 0.8611554
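To make the summary columns concrete, the sketch below computes similar quantities by hand for a single simulated item. It rests on two assumptions that are not stated in the slides: that low and high are the percentages of responses below and above the middle of the scale, and that the mean and sd treat the levels as the integers 1 to 4. The level labels and the responses are invented.

```r
# Simulated responses on a hypothetical four-level agreement scale.
set.seed(606)
agreement <- c("Strongly disagree", "Disagree", "Agree", "Strongly agree")
responses <- factor(sample(agreement, 200, replace = TRUE), levels = agreement)

pct <- 100 * prop.table(table(responses))  # percent at each level
low <- sum(pct[1:2])                       # levels below the midpoint
high <- sum(pct[3:4])                      # levels above the midpoint
item_mean <- mean(as.integer(responses))   # levels scored 1 to 4
item_sd <- sd(as.integer(responses))

round(c(low = low, high = high, mean = item_mean, sd = item_sd), 2)
```

With an even number of levels there is no middle category, so under these assumptions low and high add up to 100 and the neutral share is 0, as it is for every item in the output above.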
likert Plots
plot(l24)
plot(l24, type='heat')
plot(l24, type='density')
Some problems [1]:
library(DATA606)
shiny_demo('DualScales', package='DATA606')
My advice:
[1] http://blog.revolutionanalytics.com/2016/08/dual-axis-time-series.html
[2] http://ellisp.github.io/blog/2016/08/18/dualaxes