Random forests have only recently been included in standard textbooks on statistical learning, such as Hastie et al. (2009) (the previous edition, Hastie et al. 2001, did not yet cover this topic). In addition to a short introduction to random forests, this reference gives a thorough background on classification trees and the related concepts of resampling and model validation, and is therefore highly recommended for further reading. For a social sciences audience, a first instructive review of ensemble methods, including random forests and the related method bagging, was given by Berk (2006). We suggest this reference for the treatment of unbalanced data (for example, in the case of a rare disease or mental condition), which can be handled either by means of asymmetric misclassification costs or, equivalently, by means of weighting with different prior probabilities in classification trees and related methods (see also Chen, Liaw, and Breiman 2004, for the alternative strategy of “down sampling”, i.e., drawing only as many observations from the majority class as there are in the minority class). Note, however, that the interpretation of interaction effects in Berk (2006) is not coherent, as demonstrated above. The original works of Breiman (1996a,b, 1998a,b, 2001a,b), to name a few, are also well suited and not too technical for further reading.
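As an illustration of the “down sampling” strategy for unbalanced data, the following sketch uses the randomForest package, whose sampsize and strata arguments allow drawing equally many observations from each class for every tree; the data set and variable names here are hypothetical and serve only as an example.

```r
library(randomForest)  # Breiman and Cutler's random forests (Liaw and Wiener's R port)

## Hypothetical unbalanced data: 500 controls, 50 cases
set.seed(42)
d <- data.frame(
  x1 = rnorm(550),
  x2 = rnorm(550),
  y  = factor(c(rep("control", 500), rep("case", 50)))
)

## Size of the minority class (here 50)
n_min <- min(table(d$y))

## "Down sampling": for each tree, sample n_min observations per class
rf <- randomForest(y ~ ., data = d,
                   strata   = d$y,
                   sampsize = c(n_min, n_min))
print(rf$confusion)
```

Alternatively, the classwt argument of randomForest can be used to assign different prior weights to the classes rather than subsampling the majority class.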
For practical applications of the methods introduced here, several up-to-date tools for data analysis are freely available in the R system for statistical computing (R Development Core Team 2008). Regarding this choice of software, we believe that the supposed disadvantage of command-line data analysis criticized by Berk (2006) is easily outweighed by the advanced functionality of the R language and its add-on packages, which reflect the state of the art of statistical research. However, in statistical computing, too, textbooks lag behind the latest scientific developments: the standard reference Venables and Ripley (2002) does not (yet) cover random forests either, while the handbook of Everitt and Hothorn (2006) gives a short introduction to the use of both classification trees and random forests. This handbook, together with the instructive examples in the following section and the R code provided in a supplement to this work, can offer a good starting point for applying random forests to your own data. Interactive means of visual data exploration in R, which can support further interpretation, are described in Cook and Swayne (2007).
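As a minimal starting point of the kind just described, the following sketch fits a single classification tree and a random forest in R on the built-in iris data; it assumes the rpart and randomForest packages, which are standard choices but not the only implementations available (the party package, for instance, provides ctree and cforest).

```r
library(rpart)         # classification and regression trees (CART)
library(randomForest)  # random forests

data(iris)
set.seed(1)

## A single classification tree predicting species from the four measurements
tree <- rpart(Species ~ ., data = iris)

## A random forest of 500 trees; importance = TRUE stores variable importance
rf <- randomForest(Species ~ ., data = iris, ntree = 500, importance = TRUE)

print(rf)       # out-of-bag (OOB) error rate and confusion matrix
importance(rf)  # variable importance for each predictor
```

The out-of-bag error reported by print(rf) is computed from observations left out of each bootstrap sample, so no separate test set is needed for an honest error estimate.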