The Data Scientist's Toolbox course on Coursera presents some of the tools and concepts used by data scientists. It introduces data science as a science, meaning that results should be reproducible and well communicated. The course covers the very basics at an absolute beginner's level.
The following lists some features of the course:
- Command line interface. Basics of using command line interfaces, such as the DOS prompt and Unix shells, to navigate and update the file system and to execute programs.
- Git. Git is a distributed version control system in which each user works in a local repository. The course teaches how to install and configure Git and covers its basic usage.
- GitHub. GitHub is a web-based hosting service for software projects that use Git. Users synchronize the software projects with their own local Git repositories. The course teaches the fundamentals of GitHub, such as:
  - Setting up a GitHub account.
  - Creating a GitHub repository.
  - Synchronizing with a local Git repository.
- Markdown. Markdown is a plain text formatting syntax designed to be easy for humans to read and easy to convert into HTML. The course presents Markdown as a means of communicating the results of data science work and covers basic formatting such as headings and lists.
- Installation of R packages. R is a programming language and runtime environment that is well suited to statistical computing. R packages extend the functionality of base R into other domains such as biostatistics. The course teaches how to install and use R packages.
- Types of data science questions. The course stresses the importance of asking the right kinds of questions. The questions are grouped into the following categories of analysis:
  - Descriptive analysis: Quantitatively describe the main features of a collection of information.
  - Exploratory analysis: Find previously unknown relationships. The correlations found are useful for defining further studies. The well-known rule that correlation does not imply causation is stressed.
  - Inferential analysis: Use a relatively small sample of data to say something about a bigger population. Be aware of the sampled population and the sampling scheme when evaluating the inferred result.
  - Predictive analysis: Use data on some objects (historical facts) to predict values for another object (a future or otherwise unknown event). The course notes that more data combined with a simple prediction model can work very well.
  - Causal analysis: Find out the cause-and-effect relationship between variables. Causality is usually identified through randomized studies, and the effects found are averages, so they may not apply to every individual.
  - Mechanistic analysis: Find out the exact effect on a variable of changing some other variable. The relationship is usually expressed as deterministic equations, known for example from engineering science.
- The concept of data. The concept of data is defined and examples are given. The examples range from raw data, such as the output of DNA sequencing machines and image data from media such as YouTube, to tidier data such as that in a spreadsheet.
- The concept of big data. The amount of computer-processable data available today is colossal compared to forty years ago. This availability gives rise to the concept of big data, which enables insights into diverse domains such as social media, bioscience and cosmology.
- Data experiments. Advice on conducting good data experiments is given, enabling the data scientist to reach scientifically acceptable and well-documented conclusions. The course stresses that it is essential to be skilled at experimenting with data and at documenting the data processing steps. In healthcare, for example, patients may be treated based on the results of a data analysis, so it is vital to be able to trace the path followed to reach the conclusion.
- Video lectures, which you can pause and replay whenever you need something repeated.
- Quizzes to check your understanding of the material presented in the video lectures.
- Course project to demonstrate your practical skills in working with the material presented in the video lectures.
- Discussion forums so you can discuss topics with possibly thousands of other course participants.
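
To make the difference between descriptive and inferential analysis described above more concrete, here is a minimal Python sketch. It is not taken from the course, which works in R. The function names `describe` and `infer_mean`, the synthetic data and the normal-approximation confidence interval (z = 1.96) are all assumptions chosen for illustration.

```python
import math
import random
import statistics


def describe(values):
    """Descriptive analysis: summarize only the data we actually have."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),
    }


def infer_mean(sample, z=1.96):
    """Inferential analysis: estimate a population mean from a sample.

    Returns an approximate 95% confidence interval based on the normal
    approximation. This assumes a reasonably large random sample.
    """
    mean = statistics.mean(sample)
    stderr = statistics.stdev(sample) / math.sqrt(len(sample))
    return mean - z * stderr, mean + z * stderr


if __name__ == "__main__":
    random.seed(42)  # fixed seed so that every run gives the same numbers
    # Hypothetical population: 10,000 synthetic measurements (not real data).
    population = [random.gauss(170.0, 10.0) for _ in range(10_000)]
    sample = random.sample(population, 100)

    print("Descriptive summary of the sample:", describe(sample))
    print("Interval for the population mean:", infer_mean(sample))
```

The descriptive summary says nothing beyond the 100 sampled values, while the interval is a statement about the whole population and is only as trustworthy as the sampling scheme. The fixed seed is a small instance of the reproducibility the course asks for.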
Visit the course web site here.