Learning R (and statistics)
Updated often. If you’re an enrolled student and would like to contribute content here, please open a thread on Piazza under “Misc” and say you’d like to share a useful link.
0.1 Getting help with R
START HERE, with fbriatte.org, where you can learn how to find answers to your burning R questions online.
- Getting help with R
- Stack Overflow: topics are tagged, and “r” is a very popular tag on the site. To go directly to R-related topics, visit http://stackoverflow.com/questions/tagged/r.
- RStudio Community
- Reddit: r/rstats, r/Rlanguage, r/rprogramming, r/RStudio
- GitHub: GitHub is a must for anyone in data science and has many functions. For your purposes, searching for a particular package and looking at its “issues” page can be very helpful. E.g., here is the issues page for the rmarkdown package.
0.2 Free Online Books
“Introduction to Data Science: Data Wrangling and Visualization with R”, by Rafael Irizarry
“Introduction to Data Science: Statistics and Prediction Algorithms Through Case Studies”, by Rafael Irizarry
“R for Data Science”, by Hadley Wickham & Garrett Grolemund
“YaRrr! The Pirate’s Guide to R”, by Nathaniel Phillips
“An introduction to Biostatistics using R”, by Glover & Mitchell
“Introductory Biostatistics with R”, by Dylan Childs, Bethan Hindle & Philip Warren
“Applied Biostats”, by Yaniv Brandvain.
0.3 Videos
- Overview of R Markdown: a crash course
- Code Like a Pro
- Intro to RStudio and RStudio Cloud
- Overview of R Markdown
- R Markdown with RStudio
- This YouTube playlist - I curated it, and it covers both R and biostatistics in general.
0.4 Tutorials & Simulations
0.5 Web visualizations
Central limit theorem (CLT): https://www.zoology.ubc.ca/~whitlock/Kingfisher/CLT.htm
Central limit theorem (CLT): https://shiny.maths.nottingham.ac.uk/pmzdjc/YujingCLTApp/?showcase=0
Sampling from the Normal: https://brandvain.shinyapps.io/standardnormal/
Sample means with a normal distribution: https://www.zoology.ubc.ca/~whitlock/Kingfisher/SamplingNormal.htm
Confidence intervals for the mean: https://www.zoology.ubc.ca/~whitlock/Kingfisher/CIMean.htm
This link has web apps for several topics covered in the course: https://shinyapps.science.psu.edu/
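For readers who would rather see the central limit theorem in code than in the apps listed above, here is a minimal simulation sketch in base R. The choice of an exponential population, the sample size of 30, and the 5000 replicates are illustrative assumptions and do not come from any of the linked apps.

```r
# Minimal CLT simulation sketch (base R only).
# Assumptions for illustration: exponential population with rate 1
# (mean 1, sd 1, strongly right-skewed), samples of size 30.
set.seed(1)

n_reps      <- 5000   # number of repeated samples
sample_size <- 30     # observations per sample

# Draw many samples and keep the mean of each one
sample_means <- replicate(n_reps, mean(rexp(sample_size, rate = 1)))

# The sample means should centre near the population mean (1),
# with spread near sd / sqrt(n) = 1 / sqrt(30)
mean(sample_means)
sd(sample_means)
1 / sqrt(sample_size)

# Despite the skewed population, the histogram of means looks roughly normal
hist(sample_means, breaks = 40,
     main = "Sampling distribution of the mean",
     xlab = "Sample mean")
```

Try changing `sample_size` to 2 or 5: the histogram of means becomes visibly skewed again, which is the same behaviour the web apps let you explore interactively.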
0.6 Freely Available (Biological) Datasets
I will keep adding stuff here, but here are some good places to search.
- Dryad - lots of scientific papers make their datasets available here.
- Zenodo - lots of scientific papers make their datasets available here.
- National Library of Medicine Dataset Catalog - great resource from the NIH; it points to datasets hosted elsewhere, such as Zenodo, Dryad, FigShare, etc.
- re3data
- FigShare - lots of scientific papers make their datasets available here.
- World Population Review - not focused on biology but very cool nevertheless. Some subsections do have biological datasets, such as the one focusing on health.
1 Articles & Other Resources
Statistics for Biologists - a Nature collection of articles that highlights important statistical issues biologists should be aware of and provides practical advice to help them improve the rigor of their work. nature.com/collections/qghhqm
G*Power: a tool to compute statistical power analyses for many different t tests, F tests, χ2 tests, z tests, and some exact tests. G*Power can also be used to compute effect sizes and to display the results of power analyses graphically. www.psychologie.hhu.de/arbeitsgruppen/allgemeine-psychologie-und-arbeitspsychologie/gpower
The pwr package in R is a powerful tool for conducting power analysis. Power analysis is essential in determining the sample size needed to detect an effect of a given size at a chosen significance level and power. It is widely used in experimental design and statistical hypothesis testing. data-wise.github.io/doe/appendix/r-packages/pwr.html
R Markdown Gallery: Check out the range of outputs and formats you can create using R Markdown.
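Returning to power analysis: the sample-size calculation described for the pwr package can be sketched as follows. To keep the example runnable without installing anything, it uses base R's `power.t.test()` as a stand-in; the effect size (0.5 standard deviations), significance level (0.05), and power (0.80) are illustrative assumptions, not course recommendations.

```r
# Sample size per group for a two-sample, two-sided t test (base R, stats package).
# Illustrative assumptions: difference of 0.5 sd, alpha = 0.05, power = 0.80.
res <- power.t.test(delta = 0.5, sd = 1,
                    sig.level = 0.05, power = 0.80,
                    type = "two.sample", alternative = "two.sided")

# power.t.test() returns a fractional n; round up to whole individuals
n_per_group <- ceiling(res$n)

print(res)
print(n_per_group)

# A roughly equivalent call with the pwr package, if it is installed:
# pwr::pwr.t.test(d = 0.5, sig.level = 0.05, power = 0.80, type = "two.sample")
```

Leaving a different argument unspecified (for example, supplying `n` and omitting `power`) makes the same function solve for that quantity instead, which is handy for checking how much power an existing design has.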