Introduction to Business Intelligence

Assignment

Part 1: Pre-processing

By now you should have mastered the basics of SQL and data architecture & pipeline design (Datalab 2, Week 1). The knowledge you acquired so far will enable you to complete this section of the assignment. You will work with two data sources:

  1. A PostgreSQL database containing both socio-economic and health data. Hereinafter, data collection 1.
  2. A csv./flat-file containing health data. Hereinafter, data collection 2.

SQL meme
Figure 1. Joining tables in SQL…

Additionally, you will have to prepare and run various SQL queries on the loaded data. After you have successfully prepared and run the queries, listed in the procedure below :point_down:, you should be familiar youth care dataset.
Requirements:

For this part of the assignment, we work with three levels that correspond to the proficiency levels in the assessment form (except the ‘Missing' criterion). For example, to get a ‘Sufficient' on this section of the assignment you should master the skills listed in the yellow block (i.e. Level 2)

LEVEL 1 ('Insufficient')
Elementary SQL queries & Modifying databases in SQL 1
A) Prepare, and run SQL queries that use DELETE, UPDATE, SET, ADD, SELECT INTO, INSERT INTO SELECT, CREATE TABLE, ALTER TABLE, and COPY.
B) Prepare, and run SQL queries that use SELECT, FROM, ORDER BY, WHERE, IN, BETWEEN, OR, AND, NOT, LIKE, EXISTS, and LIMIT.
LEVEL 2 ('Sufficient')
Intermediate SQL queries
A) Prepare, and run SQL queries that use COUNT, SUM, MIN, MAX, AVG, GROUP BY, DISTICT, DISTICT ON, HAVING, DATE, and CASE.
LEVEL 3 ('Good')
Modifying databases in SQL 2
B) Prepare, and run SQL queries that use LEFT JOIN, INNER JOIN, OUTER JOIN, IS NULL, and IS NOT NULL.
A) Prepare, and run SQL queries that use PRIMARY KEY, FOREIGN KEY, and REFERENCES.
LEVEL 4 ('Excellent')
Advanced SQL queries
A) Prepare, and run SQL subqueries.

Procedure:

Each colour block corresponds to a specific criterion of ILO 4.1. To attain a certain skill level, e.g. yellow (Level 2, ‘Sufficient'), you need to answer all the questions in the corresponding colour block correctly.

1. Install the PostgreSQL database server.
2. Install a client that is compatible with PostgreSQL (e.g. DBeaver, pgAdmin etc.), and connect them.
3. Place the data collection files into your computer's 'Public' folder (e.g. C:\Users\Public).
4. Load data collection 1 into your PostgreSQL database (for instructions, see Importing and exporting Database in Postgres Using PGadmin).
5. a) Use CREATE to construct a new table for data collection 2 in the PostgreSQL database, and name it health.
b) Import data collection 2 into the table health by using the COPY statement.
6. Produce a list of instances (maximum of 5) where the value of column [...], located within the table health, exceeds a certain value.
7. a) Calculate the mean value for each numerical column in the health table.
b) Use the statement ROUND() to turn your calculated averages into integers.
c) Insert the newly created values into the first row into the socio_eco table.
d) Rename the table socio_eco_v2.
8. a) Produce a list of each client's name (i.e. first name, middle name and last name), ClientKey, NotaRegelKey, and their first 'nota' (= invoice) after the date [....]. Order by client's ID.
b) Create a table where you store the result from question 8a , and name it query_result.
9. Connect the tables from data collection 1 in the database by specifying appropriate relationships:
a) Draw an ERD (a.k.a. relational database schema) by hand or automatically generate one in your database client.
b) Prepare, and run the SQL clause(s).
c) Explain your approach.
10. Write a subquery to find those clients whose NotaRegelBedrag are larger than the average NotaRegelBedrag. Produce a list of the client's names (i.e. first name, middle name and last name), IDs, and corresponding NotaRegelBedrag from the table query_result.

Deliverable(s):

  • A .sql-file, and a picture with the answers to the SQL assignment questions (i.e. SQL code, and additional written/drawn explanations).

The .sql-file, and the picture with the ERD are to be uploaded to GitHub Classroom no later than 5pm on last DataLab day. Confer with a lecturer beforehand if you are handing in something other than a .sql-file and/or picture.


Part 2: Analysis & Visualization

For part 2 of the assignment, you are going to perform operations on the youth care data to provide insightful visualizations (See ILO 4.2).

Power BI dashboard
Figure 2. Power BI dashboard example

Requirements:

  • Create a maximum of 5 dashboard pages.
  • Import the results from ML analyses, in ‘flat-file' format, to Power BI.
  • Clean and/or transform the data that is imported into Power BI.
  • Use visualizations that are appropriate for the data type you are working with. See Microsoft's visualization guideline in Literature.
  • Visualize descriptive statistics in the dashboard.
  • Visualize results of the ML model(s) in the dashboard.
  • Record or perform a live demo of the dashboard (to stakeholders). The demo should not take longer than 5 minutes.
  • Provide a pleasant User experience (UX) for the dashboard: layout, buttons etc. See book by Stull (2018) in Literature.
  • Optimize (a part of) the data pipeline (i.e. dashboard) in terms of performance and security. See Microsoft's (e.g. Kesselman et al.) optimizing performance and security in Literature.

Deliverable(s):

  • A Power BI dashboard in .pbix-format.
  • A recorded or live demo of the Power BI dashboard.
  • Additional documentation (e.g. filters per page, codebook etc.)

For information on how to record your demo, see Windows 10s screen recorder tool guide.

The Power BI dashboard in .pbix-format and the additional documentation is to be uploaded to GitHub Classrooms no later than 5pm on last DataLab day.

The recorded demo of the Power BI dashboard is to be uploaded to GitHub Classrooms no later than 5pm on last DataLab day. Confer with a lecturer beforehand if you want to demo your dashboard in front of a live audience.


Codebook

Variable Data type Description Data source
BuurtCode String Neighbourhood ID Youth care database
BuurtNaam String Neighbourhood name Youth care database
ClientAchternaam String Client's last name Youth care database
ClientGeboorteJaar Integer Client's birth year Youth care database
ClientGeslacht String Client's biological sex Youth care database
ClientKey Integer Client ID VNG/WaarStaatJeGemeente
Clienttussenvoegsel String Client's middle name Youth care database
Clientvoornamen String Client's first name Youth care database
Datum date Date of NotaRegelBedrag Youth care database
DrankGebruik Integer The percentage of people aged 19 and older who drink no alcohol or at most one glass of alcohol per day. RIVM
EducationalAttainmentLow Integer The proportion of the labor force with a low level of education aged 18-74 for every 1000 inhabitants . This includes education at the level of primary education, VMBO, the first 3 years of HAVO/VWO or the assistant training (MBO-1). VNG/WaarStaatJeGemeente
GemeenteCode String Municipality ID CBS/SN
GemeenteCode Integer Municipality ID Youth care database
GemeenteNaam String Municipality name CBS/SN
GemeenteNaam String Municipality name Youth care database
GoedErvarenGezondheid Integer The percentage of people aged 19 and older with an answer category ‘very good' or ‘good' to the question about general health status. RIVM
HoogRisicoDepressie Integer The percentage of people aged 19 and older who are at risk of suffering from an anxiety disorder or depression RIVM
IncomeLow Float The lowest 10% of all standardized incomes group. VNG/WaarStaatJeGemeente
LangdurigeZiekteAandoening Integer The percentage of people aged 19 and older who answered yes to the question, ‘Do you have one or more long-term illnesses or conditions? Long-term is (expected to be) 6 months or longer'. RIVM
MantelzorgGeven Integer The percentage of people aged 19 and older who provide informal care (at least 8 hours a week and/or longer than 3 months). RIVM
NotaKey Integer Youth care invoice ID Youth care database
NotaRegelBedrag Float Youth care invoice Youth care database
Overgewicht Integer The percentage of individuals aged 19 years and older with a Body Mass Index (BMI) of 25.0 kg/m2 and above, based on self-reported height and weight. RIVM
ProductKey Integer Youth care product ID Youth care database
ProductNaam String Youth care product name Youth care database
Roker Integer The percentage of people aged 19 and older who answered the question ‘Do you ever smoke?' answers with ‘yes', excluding electronic cigarette. RIVM
Unemployment Float The percentage of persons without paid work, who have recently looked for work and are immediately available for it. VNG/WaarStaatJeGemeente
UnemploymentBenefits Float The number of people who receive benefits under the social assistance act or social assistance-related law aged 18 or older for every 1000 inhabitants. VNG/WaarStaatJeGemeente
VoldoenBeweegrichtlijnen Integer The percentage of people aged 19 and older that comply with the Exercise Directive. RIVM
ZorgleverancierKey Integer Youth care supplier ID Youth care database
ZorgleverancierNaam String Youth care supplier name Youth care database

Literature

Batra, R. (2018). SQL Primer: An Accelerated Introduction to SQL Basics. (Freely accessible through BUAS's MetaSearch!)

Database data Warehousing Guide. (2017, June 07). Retrieved June 30, 2021, from https://docs.oracle.com/database/121/DWHSG/toc.htm

Kesselman, Y., Osborne, P., Neely, M., & Bencic, T. (2021, April 2). Power BI security white paper - Power BI. Retrieved June 30, 2021, from https://docs.microsoft.com/en-us/power-bi/guidance/whitepaper-powerbi-security

Master a skill - design databases with postgresql. (n.d.). Retrieved June 30, 2021, from https://www.codecademy.com/learn/paths/design-databases-with-postgresql

Optimization guide for Power BI - Power BI. (2021, April 2). Retrieved June 30, 2021, from https://docs.microsoft.com/en-gb/power-bi/guidance/power-bi-optimization

Visualization types in Power BI - Power BI. (2021, March 18). Retrieved June 30, 2021, from https://docs.microsoft.com/en-us/power-bi/visuals/power-bi-visualization-types-for-reports-and-q-and-a#feedback

Stull, E. (2018). UX Fundamentals for Non-UX Professionals: User Experience Principles for Managers, Writers, Designers, and Developers. Berkeley, CA: Apress. (Freely accessible through BUAS' MetaSearch!)