SQL: Data Query Language (DQL) clauses



Books banner

1. Introduction

Today's independent study material focuses on a specific type of SQL, Data Query Language (DQL), which is comprised of SQL clauses that allow you to retrieve data from a database.

Books banner

Figure 1. Types of SQL.

1a Complete the What Can I Do With A Database? module in Codeacademy, which you can find here.


2. Basic queries

In this section, you are going to experience how you, as a data analyst, can truly benefit from using SQL:

One of the core purposes of the SQL language is to retrieve information stored in a database. This is commonly referred to as querying. Queries allow us to communicate with the database by asking questions and returning a result set with data relevant to the question (Codeacademy, Design databases with PostgreSQL).

In order to complete the exercises in this section, you need to use the information displayed in Table 1, which you can download as a csv.-formatted file, here.

ClientKey ClientVoornamen ClientTussenvoegsel ClientAchternaam GeboorteJaar ClientGeslacht NotaRegelBedrag NotaJaar NotaKwartaal NotaMaand BuurtCode BuurtNaam
1 Bader van Wilgenburg 2007 Jongen 49.8 2019 Qtr 3 September BU08260102 Slotjes-West
2 Lorena ? Steensma 2010 Meisje 0.3 2020 Qtr 4 October BU08261300 Dorst
3 Jikke ? Bosveld 2007 Jongen 50.1 2020 Qtr 2 June BU08261307 Buitengebied Dorst-Zuid
4 Maroua de Backer 2009 Meisje 2022.21 2015 Qtr 4 November BU08260601 Sterrenbuurt
5 Izaak ? Lake 2008 Jongen 67.8 2019 Qtr 2 June BU08260300 Vogelbuurt
5 Izaak ? Lake 2008 Jongen 90.5 2019 Qtr 4 October BU08260300 Vogelbuurt

Table 1. A selection of demographic, geographic and financial information from the youth care database (Version 1)

2a Is the following SQL clause written correctly? Explain your answer.

select ClientKey, ClientGeslacht
FROM youth;

2b The data analyst of the municipality of Oosterhout wants to permanently change the title of the column ClientKey to ClientID in the table youth. Write an SQL statement to successfully solve this task.

2c Once in a while the youth care database gets filled with erroneous data. To tackle this issue the data analyst of the municipality of Oosterhout regularly runs an SQL query that checks if the values in the column NotaJaar are always greater than the values in the column GeboorteJaar. Prepare and run such an SQL statement.

2d Can you apply an ORDER BY with multiple columns? Explain your answer.

2e What will happen when the data analyst runs the following SQL query on Table 1:

SELECT *
FROM youth
LIMIT 25;

Write your answer down.

2f Which of the three SQL queries below is correctly written?

Example 1:

SELECT ClientKey
  CASE
    WHEN NotaRegelBedrag < 50 THEN 'Categorie A'
    ELSE 'Categorie B'
  END AS 'Kosten'
FROM youth;

Example 2:

SELECT ClientKey,
FROM youth
  CASE
    WHEN NotaRegelBedrag < 50 THEN 'Categorie A'
    ELSE 'Categorie B'
  END AS 'Kosten';

Example 3:

SELECT ClientKey
FROM youth
  CASE
    WHEN NotaRegelBedrag < 50 THEN 'Categorie A'
    ELSE 'Categorie B'
  END AS 'Kosten';

Write your answer down.

3. Aggregate functions

An aggregate function performs a calculation on a set of values and returns a single value. For example, the SQL function AVG() takes a list of values and returns the mean value of them.

3a The data analyst of the municipality of Oosterhout is almost finished for the day when his manager pops in and asks if he can provide a list with the number of distinct clients for the municipal council meeting. Write an SQL statement that solves this task.

3b How does the COUNT() function differ from the SUM() function? Write your answer down.


4. Multiple tables

By deploying SQL join statements you can get information from columns in more than one table. These operations are specified by placing the names of those tables that you want to join in the same FROM clause of a SELECT statement.

Books banner

Figure 2. SQL joins & Set theory.

Tip: Want to explore the mathematical foundations of these SQL join operators, see Khanacademy's Basic set operations tutorial.

4a There is no difference between a JOIN and INNER JOIN statement. True or false? Write your answer down.

4b List at least two requirements for a PRIMARY KEY. Write your answer down (Hint: take a look at Table 1!).

4c What happens if the tables you perform a UNION operator on have duplicate rows? Explain your answer.

4d What is the main difference between the join operators (e.g. INNER JOIN, LEFT JOIN etc.) and the UNION operator? Write your answer down.

4e The data analyst of the municipality of Oosterhout has created a new table, by combining Table A and Table B (See Table 2, and Codebook). Clearly, something went wrong :weary:. Can you identify, and subsequently fix the problem with applying appropriate SQL clauses?

The original tables can be downloaded as a csv.-formatted file:

Tip: Want to import and/or export the table into the PostgreSQL client, see the article Import CSV File Into PostgreSQL Table.
clientkey clientvoornamen clienttussenvoegsel clientachternaam geboortejaar clientgeslacht clientkey-2 notaregelbedrag notajaar notakwartaal notamaand buurtcode buurtnaam
1 Bader van Wilgenburg 2007 Jongen 1 49.8 2019 Qtr 3 September BU08260102 Slotjes-West
2 Lorena ? Steensma 2010 Meisje 2 0.3 2020 Qtr 4 October BU08261300 Dorst
3 Jikke ? Bosveld 2007 Jongen 3 50.1 2020 Qtr 2 June BU08261307 Buitengebied Dorst-Zuid
4 Maroua de Backer 2009 Meisje [null] [null] [null] [null] [null] [null] [null]
5 Izaak ? Lake 2008 Jongen [null] [null] [null] [null] [null] [null] [null]

Table 2. A selection of demographic, geographic and financial information from the youth care database (Version 2)


5. Additional resources (optional)

Books banner

Figure 3. An example of how you might look when you reach the level of data wrangling SQL wizard/action hero/avatar etc…

Do you have time left, and want to stay on track to eventually become a data wrangling SQL wizard, explore the following additional resources:

Interactive exercises:

Tip: Select your practice exercises wisely, not all are relevant for the assignment (See Assignment Part 1 requirements in the Project Brief).

Instructional videos:

Books/articles/webpages etc.:

  • Batra, R. (2018). SQL primer: An accelerated introduction to SQL basics. (Freely available through BUAS' MetaSearch).
  • W3schools: SQL Tutorial.

Codebook

Number Variable Description
1 ClientKey Client ID
2 ClientVoornamen Client first name
3 ClientTussenvoegsel Client middle name
4 ClientAchternaam Client last name
5 GeboorteJaar Client birth year
6 ClientGeslacht Client sex
7 NotaRegelBedrag Invoice amount in Euro's
8 NotaJaar Invoice year
9 NotaKwartaal Invoice quarter
10 NotaMaand Invoice month
11 BuurtCode Neighborhood ID
12 BuurtNaam Neighborhood name

Literature

Batra, R. (2018). SQL primer: An accelerated introduction to SQL basics. (Freely available through BUAS' MetaSearch).