R Summer School Information Page


Introduction

The GMU R Summer School is a free, week-long course available on a first-come, first-served basis. Priority goes first to GMU students and post-docs, then to GMU community members, then to students and post-docs outside GMU, and finally to other faculty members (by invitation only). The purpose of the free summer course is to help students and faculty become familiar with the R/S programming language so that they may take advantage of the most flexible and powerful statistical tools. My hope is that by spreading R use, R will get better and gain more support in the process. Better statistical tools will make for more informed users and hopefully better science.

LOCATION: George Mason University, Fairfax campus
Monday through Friday: David King Hall (ARCH Lab Conference Room: 2nd floor)
Refer to this campus map and the building numbers if you are unfamiliar with GMU.

Date: July 25th - 29th, 2011
Times: 10:30am-3pm

Prerequisites

  • Everyone must have a notebook computer that she can install software on and use for the entire week. I often have additional notebooks in my lab but I cannot guarantee that they will either be available or in working condition by the time the course begins.

  • Everyone must have R and an editor installed and in running condition no later than one week prior to the beginning of the course. R does not come with a Graphical User Interface (GUI) but that does not mean that you cannot make R more user friendly. For many users, Emacs and ESS are all you need to edit R files and submit them to the R process. Windows users might prefer to use a handy program such as Tinn-R. Linux users might prefer RKWard. I need to strictly enforce this requirement because we cannot spend time configuring systems during the sessions. Instructions for installing and configuring your system can be found at the following links. Please let me know if you have any problems with the documentation. I will be happy to edit it to make it more user friendly.

Windows XP instructions

Mac OS X instructions

Linux instructions

  • Everyone must read through the following documents prior to the course (NB. you do not need to study these documents. Instead, please skim them at a minimum so you are at least familiar with the R system):

R for Beginners

R reference card

Bill Revelle’s Excellent R Primer

Course Details

R summer school runs for 5 consecutive weekdays - usually the last full week in July. The course content is divided by days so that each day focuses on one aspect of R and one aspect only.

My aims and expectations

My primary aim for the course is to introduce the R programming language and demonstrate the benefits of the system over traditional graphical user interface applications (aka “point and click” applications) or procedural programming languages (e.g., SPSS and SAS). I expect that everyone who participates in R Summer School will attend all the sessions every day throughout the week. It is too difficult to cover material repeatedly for those who miss it due to absences. Given that the course is free, I ask that everyone who commits to the course commit to the entire course.

Your aims

You - the student - ought to expect to have a fairly strong introduction to the R statistical language. You will be able to read and write different data formats as well as conduct basic (GLM) statistical procedures and graph the results. Please do not expect to leave with a sense of mastery. If you expect that outcome, you will leave frustrated and revert to your current package after wasting five precious days. I implore you to have realistic expectations. Also, due to the nature of the course, I cannot cover statistical content during the lectures. Please do not come to the course with the expectation that you will learn graduate statistics. This is a software course and thus you ought to focus on learning the software rather than the statistical content.

Format

The content for each day will be divided into two 2.5-hour sessions totaling 5 hours of focused classroom activity per day. Each 2.5-hour session is divided roughly as follows:

  • Part 1 (45 minutes): I will introduce the topic and demonstrate it in R.
  • Part 2 (1 hour): Students will work independently on several exercises.
  • Part 3 (30 minutes): Students will volunteer to show their work and explain the exercise to the class.
  • Part 4 (15 minutes): I will provide feedback to the class.

In addition to the focused classroom activity, there will be a 60-minute period between sessions for lunch and approximately an hour of review or introduction each day. The review and introduction periods will be largely informal discussions that allow me to tailor the day’s activities to the students’ wishes. The goal of this structured format is to minimize lecture time and to maximize learning time. Some of you may recognize the “see one, do one, teach one” method in parts 1–3 of each session. This method is common in medical education, and research suggests that this approach maximizes learning and retention.



Schedule Outline (Updated each day during the course)



Day 0: Introduction (Weekend before the course)

Please familiarize yourselves with the following materials and resources. There is no need to read them in their entirety but I would like you to read through the material - especially the R frequently asked questions (FAQ).

Also, please check to make sure R is installed properly. The way to check your R installation is to follow the procedure in the box below.

  1. Start R

  2. Input the following line at the prompt:

demo(graphics)

  • A new window should appear and you should be able to cycle through several graphics.

  3. Input the following line at the new prompt:

example(lm)

  • Regression statistics should appear in the R buffer and a new window of standard regression diagnostics plots should appear.

If the output appeared, then you have correctly installed R. Now check whether you have installed R+ESS+Emacs correctly with the following steps.

  1. Start Emacs

  2. Select the window (click on it).

  3. Type: C-x 2 (a capitalized ‘C’ represents the ‘Control’ key)

This should open two buffers within that same window.

  4. Select the top buffer.

  5. Type: M-x (that is, Alt-x)

This will create a prompt at the very bottom of the window.

  6. At that new prompt, type a capital R (Shift-r) and hit the “enter” key twice.

This will open R in the top buffer.

  7. Select the bottom buffer.

  8. Enter the following code: demo(graphics)

  9. Select that line of code (this can be done just like selecting text in a Word document).

  10. Type: C-c C-j

At the very bottom of the window “Process to load into: R” will appear.

  11. Click on that line and hit the “enter” key.

A new window should appear and you should be able to cycle through several graphics.

  12. Enter and submit the following line of code:

example(lm)

If the output appeared, then you have correctly installed R+ESS+Emacs. If the output is not the same, review the installation instructions.
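For those who want one more sanity check, the short script below exercises the same two pieces (model fitting and graphics) without needing a graphics window. It is only an illustrative sketch and not part of the official installation instructions; the object names (fit, plot_file) are arbitrary, and it uses the cars data set that ships with R.

```r
# Show which version of R is running
print(R.version.string)

# Fit a small regression on a built-in data set
fit <- lm(dist ~ speed, data = cars)
print(coef(fit))

# Draw a plot into a temporary PDF file, so no graphics window is required
plot_file <- tempfile(fileext = ".pdf")
pdf(plot_file)
plot(cars, main = "Installation check")
dev.off()

# TRUE means the graphics device wrote the file
print(file.exists(plot_file))
```

If the script runs without errors, both the modeling functions and a graphics device are working.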


Day 1: Introduction and Working with Data (10:30am-3pm)

  • Introduction to basic concepts and course overview (1 hour)
    • objects, assignment, operators
    • interactive vs. batch processing systems
    • the R advantage
    • the R disadvantage
    • R summer school and the pragmatic approach to data analysis

  • Working with Data (6 hours)
    • Session 1: Reading and Writing Data (2.5 hours)
    • Break (1 hour)
    • Session 2: Navigating and Editing Data (2.5 hours)
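As a rough preview of what reading, writing, navigating and editing data can look like in R, here is a minimal sketch. It is not taken from the course exercises: it assumes the built-in mtcars data set and a temporary file, and the names csv_file, cars_back and kpl are made up for illustration.

```r
# Write a built-in data frame to a temporary CSV file
csv_file <- tempfile(fileext = ".csv")
write.csv(mtcars, csv_file, row.names = TRUE)

# Read it back; the first column holds the row names
cars_back <- read.csv(csv_file, row.names = 1)

# Navigate: inspect the structure and the first few rows
str(cars_back)
head(cars_back)

# Navigate: select rows by a condition and columns by name
efficient <- cars_back[cars_back$mpg > 25, c("mpg", "cyl")]
print(efficient)

# Edit: add a derived column (miles per gallon to km per litre, approximate factor)
cars_back$kpl <- cars_back$mpg * 0.4251
```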


Day 2: R Graphics (10:30am-3pm)

  • Please read through the following chapters prior to tomorrow:

  • Review from Day 1 and Introduction to Day 2 (30 minutes)
    • the R graphics engine
    • customizing graphics
    • par()
    • labels
    • paste()
    • assign()

  • Session 1: Univariate and Bivariate graphs (2.5 hours)
  • Break (1 hour)
  • Session 2: Special Graphs and Customizations (2.5 hours)
  • Questions and Answers from Day 2 (30 minutes)
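To give a flavor of the topics listed above, the following sketch uses par() to lay out two panels and paste() to build a label. It is an illustration only, assuming the built-in mtcars data set; the plots go to a temporary PDF file (plot_file is an arbitrary name) so the code also runs without a graphics window.

```r
# Open a file device so the example does not depend on a display
plot_file <- tempfile(fileext = ".pdf")
pdf(plot_file)

# par() returns the previous settings, which makes them easy to restore
old_par <- par(mfrow = c(1, 2), mar = c(4, 4, 2, 1))

# Univariate graph: a histogram with custom labels
hist(mtcars$mpg, main = "Univariate", xlab = "Miles per gallon")

# Bivariate graph: a scatterplot whose title is assembled with paste()
n <- nrow(mtcars)
plot(mtcars$wt, mtcars$mpg,
     main = paste("Bivariate, n =", n),
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")

# Restore the old graphical parameters and close the device
par(old_par)
dev.off()
```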


Day 3: Linear Models (10:30am-3pm)

  • Review from Day 2 and Introduction to Day 3 (30 minutes)
    • formula specification
    • data formats
    • objects from models
    • diagnostics

  • Session 1: The GLM (2.5 hours)
  • Break (1 hour)
  • Session 2: Extending the GLM (2.5 hours)
  • Questions and Answers from Day 3 (30 minutes)
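The sketch below touches each of the Day 3 topics once: a model formula, a data frame as the data format, the object returned by the model, and the standard diagnostic plots. It is only an example under the assumption that lm() and the built-in mtcars data are used; the actual course material may differ.

```r
# Formula specification: outcome on the left, predictors on the right
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Objects from models: the fit is a list with class "lm"
print(class(fit))
print(names(fit))

# Extractor functions pull pieces out of the model object
print(coef(fit))
print(head(residuals(fit)))
print(summary(fit))

# Diagnostics: the four default plots, written to a temporary PDF file
diag_file <- tempfile(fileext = ".pdf")
pdf(diag_file)
par(mfrow = c(2, 2))
plot(fit)
dev.off()
```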


Day 4: Measurement Models and Psychometrics (10:30am-3pm)

  • Review from Day 3 and Introduction to Day 4 (30 minutes)
    • classical test theory
    • latent response models
    • handling response time data
    • latent trait models
    • generalizability theory

  • Session 1: The different measurement models (2.5 hours)
  • Break (1 hour)
  • Session 2: Diagnosing measurement models (2.5 hours)
  • Questions and Answers from Day 4 (30 minutes)
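As one small classical test theory example, coefficient alpha can be computed in a few lines of base R. This is a hedged sketch rather than the course's own code: the item responses are simulated (they are not real data), and cronbach_alpha is a hypothetical helper written for this illustration.

```r
# Simulate 200 persons answering 5 items that share one true score
set.seed(1)
n_persons <- 200
n_items <- 5
true_score <- rnorm(n_persons)
items <- sapply(seq_len(n_items), function(i) true_score + rnorm(n_persons))
colnames(items) <- paste("item", seq_len(n_items), sep = "")

# Coefficient alpha: k / (k - 1) * (1 - sum of item variances / variance of total score)
cronbach_alpha <- function(x) {
  k <- ncol(x)
  item_var <- apply(x, 2, var)
  total_var <- var(rowSums(x))
  (k / (k - 1)) * (1 - sum(item_var) / total_var)
}

alpha_hat <- cronbach_alpha(items)
print(round(alpha_hat, 2))
```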


Day 5: Extending the R system for your own needs (10:30am-3pm)

  • Review from Day 4 and Introduction to Day 5 (30 minutes)
    • functions
    • objects
    • inheritance
    • methods
    • classes

  • Session 1: Your first function (2.5 hours)
  • Break (1 hour)
  • Session 2: Making use of generic functions with your new function (2.5 hours)
  • Questions and Answers from Day 5 (30 minutes)
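A first function and a method for a generic might look something like the sketch below. The function sem and the class name "sem" are invented for this illustration and are not part of R or of the course materials; the example assumes the S3 system, where a function named print.sem becomes the print method for objects of class "sem".

```r
# A first function: standard error of the mean, returned as a classed list
sem <- function(x) {
  x <- x[!is.na(x)]                      # drop missing values
  out <- list(mean = mean(x),
              se = sd(x) / sqrt(length(x)),
              n = length(x))
  class(out) <- "sem"                    # attach an S3 class
  out
}

# A method for the generic print(): called automatically for "sem" objects
print.sem <- function(x, ...) {
  cat("mean =", round(x$mean, 2),
      " se =", round(x$se, 2),
      " n =", x$n, "\n")
  invisible(x)
}

result <- sem(c(2, 4, 4, 4, 5, 5, 7, 9, NA))
print(result)
```

Typing result at the prompt would dispatch to print.sem in the same way, which is the point of writing methods for generic functions.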


Last updated July 23, 2011, at 12:44 AM