We (mostly) have René Descartes and his Cartesian coordinate system to thank for graphical representations of data as we know them today: the system enables us to use algebra to describe geometry. In other words, it lets us use math to translate numbers into a shape that represents the information contained in the data.
Exploratory Data Analysis sadly takes a lot of practice, so that's what we'll do. You'll be looking up information, assessing and cleaning data, and creating descriptive statistics and visualisations of data.
So let's get cracking!
Download the swirl course Getting and Cleaning Data by running the following code in the RStudio console:
swirl::install_course("Getting and Cleaning Data")
Complete module 1: Manipulating Data with dplyr.
Download the swirl course Data Analysis directly from the swirl GitHub repo and install its lessons manually. Then complete all 3 modules: Central Tendency, Data Visualisation and Dispersion.
Hint: You can install the course directly from GitHub using a function, or from the zip file you get when you download the course with your browser.
For help use:
?InstallCourses
If you get stuck, first ask your peers and then us. We're available from 9:00 to 17:00 as usual.
Create an R script called "MeanMockAssessment" for this exercise.
A) Quick exercise: Now calculate the mean using R as a calculator. Open RStudio and run the following code:
sample(1:100, 10, replace = TRUE)
We have now generated 10 observations, each with a random value from 1 to 100.
The mean is the sum of all the values divided by the number of observations (N).
Or in other words:
The sum of all values/the number of observations
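As a rough sketch of that definition, the snippet below stores one sample in a variable, divides its sum by the number of observations, and compares the result with R's built-in mean(). The variable names (observations, n, manual_mean, builtin_mean) and the seed value are illustrative assumptions, not part of the exercise. The seed is only there so that the same 10 values are drawn on every run.

```r
# Fix the random seed so the same 10 values are drawn on every run
# (the seed value 42 is an arbitrary, illustrative choice).
set.seed(42)

# Store the sample, so the same values are used in every step below
observations <- sample(1:100, 10, replace = TRUE)

# Mean "by hand": the sum of all values divided by the number of observations (N)
n <- length(observations)
manual_mean <- sum(observations) / n

# R's built-in mean() should give the same result;
# all.equal() compares the two while tolerating floating-point rounding
builtin_mean <- mean(observations)

print(manual_mean)
print(isTRUE(all.equal(manual_mean, builtin_mean)))
```

Storing the sample in a variable matters here: calling sample() again would draw new random values, so the sum and the count would no longer describe the same observations.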
B) Now translate the following function into sigma notation:
sample(1:100, 10, replace = TRUE)
Hint: the "sample()" function goes in the place where the formula typically goes.
Save your script to your GitHub repository.
At 16:00, there's an online meeting on our Microsoft Teams channel. You're encouraged to take part to ask questions, discuss our progress and reflect on today's activities.
That was it for today; tomorrow, we'll continue until module 10. Then we'll perform an exploratory data analysis on our own data based on the problem statement and research question we made in the previous data lab (Datalab_00_SDG_Indicators).