The Coursera course "Getting and Cleaning Data" teaches the concepts of raw and tidy data in the context of obtaining data and cleaning it for later analysis. Practical demonstrations of how to apply the concepts use the programming language R.
The following lists some features of the course:
- Motivation for the effort of transforming raw data into tidy data.
- Raw and processed data: defines the terms raw and tidy data and gives examples such as:
  - Raw data: as obtained from a gene sequencer.
  - Tidy data: as in an Excel spreadsheet.
- Reading data from various sources using R, depending on where the data is located:
  - How to read local files.
  - How to read specific types of files such as Excel, XML and JSON.
  - How to read from database systems such as MySQL.
  - How to read from HDF5 (used for storing large data sets).
  - How to read from the Web in order to scrape HTML pages, call web APIs and such.
- Processing data using R, as a step on the way to tidy data:
  - Subsetting data to get only the observations of interest.
  - Data summarization for a succinct overview of the data.
  - Merging data for combining data from various sources.
  - Text editing using, for example, regular expressions.
- Video lectures, which you can pause and replay whenever you need something repeated.
- Quizzes to check your understanding of the material presented in the video lectures.
- Course project to demonstrate your practical skills in working with the material presented in the video lectures.
- Discussion forums so you can discuss topics with possibly thousands of other course participants.
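
To give a feel for the reading and processing topics listed above, here is a minimal sketch in base R. It is not taken from the course material: the file contents, column names (`subject_id`, `Activity.Label`, `value`, `group`) and values are all made up for illustration. It reads a local CSV file, then subsets, summarizes, merges and cleans up text with a regular expression.

```r
# Hypothetical toy data written to a temporary file so the example is
# self-contained; in practice the CSV would already exist on disk.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "subject_id,Activity.Label,value",
  "1,walking,2.5",
  "2,sitting,0.5",
  "3,walking,3.5",
  "4,standing,1.0"
), csv_path)

# Reading a local file.
measurements <- read.csv(csv_path, stringsAsFactors = FALSE)

# A second, equally hypothetical data source to merge with.
subjects <- data.frame(
  subject_id = 1:4,
  group = c("train", "test", "train", "test"),
  stringsAsFactors = FALSE
)

# Subsetting: keep only the observations of interest.
walking <- measurements[measurements$Activity.Label == "walking", ]

# Summarizing: mean value per activity for a succinct overview.
mean_by_activity <- aggregate(value ~ Activity.Label,
                              data = measurements, FUN = mean)

# Merging: combine the two sources on their shared key column.
merged <- merge(measurements, subjects, by = "subject_id")

# Text editing: replace dots in column names with underscores using a
# regular expression, then convert the names to lower case.
names(merged) <- tolower(gsub("\\.", "_", names(merged)))

print(mean_by_activity)
print(merged)
```

Reading Excel, XML, JSON, MySQL or HDF5 sources follows the same pattern but relies on add-on packages, so it is left out of this sketch.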
Visit the course web site here.