Learn Intermediate SQL with Databricks

Learn Intermediate SQL with Databricks

·

4 min read

Table of Contents

  1. Aggregate Functions

    • SUM

    • AVG

    • COUNT

  2. GROUP BY and HAVING

    • GROUP BY

    • HAVING

  3. DISTINCT and NULL Handling

    • DISTINCT

    • NULL

  4. Math Functions and SQL Arithmetic

    • Math Functions

    • SQL Arithmetic

  5. CASE Statements

    • Simple CASE

    • Searched CASE

  6. JOIN Operations

    • INNER JOIN

    • LEFT JOIN

    • RIGHT JOIN

    • FULL OUTER JOIN

Sample test data files for Hands-on

  1. Employee: https://github.com/vipinputhanveetil/ctb_blog_demo_files/blob/main/databricks_files/employees.csv

  2. Orders: https://github.com/vipinputhanveetil/ctb_blog_demo_files/blob/main/databricks_files/orders.csv

  3. Products: https://github.com/vipinputhanveetil/ctb_blog_demo_files/blob/main/databricks_files/products.csv

  4. Sales: https://github.com/vipinputhanveetil/ctb_blog_demo_files/blob/main/databricks_files/sales.csv

1. Aggregate Functions

SUM

The SUM function calculates the sum of a set of values.

SELECT SUM(amount) AS total_sales FROM sales;

This query calculates the total sales amount from the sales table.

AVG

The AVG function calculates the average of a set of values.

SELECT AVG(amount) AS avg_amount FROM sales;

This query calculates the average amount from the sales table.

COUNT

The COUNT function counts the number of rows returned by a query.

SELECT COUNT(*) AS total_orders FROM sales;

This query counts the total number of orders in the sales table.

2. GROUP BY and HAVING

GROUP BY

The GROUP BY clause groups rows that have the same values into summary rows.

SELECT region, SUM(amount) AS total_sales FROM sales GROUP BY region;

This query groups sales by region and calculates total sales for each region.

HAVING

The HAVING clause is used to filter records returned by the GROUP BY clause.

SELECT region, SUM(amount) AS total_sales FROM sales GROUP BY region HAVING SUM(amount) > 500;

This query selects regions with total sales greater than 500.

3. DISTINCT and NULL Handling

DISTINCT

The DISTINCT keyword is used to return unique values in a specified column.

SELECT DISTINCT region FROM sales;

This query selects distinct regions from the sales table.

NULL

Handling NULL values using IS NULL and IS NOT NULL.

SELECT product, amount FROM sales WHERE region IS NOT NULL;

This query selects products with NULL region values.

4. Math Functions and SQL Arithmetic

Math Functions

SQL supports standard math functions like ROUND, CEIL, FLOOR, ABS, etc.

SELECT product, ROUND(amount/1.1, 4) AS rounded_amount FROM sales;

This query rounds amount to four decimal places.

SQL Arithmetic

Basic arithmetic operations (+, -, *, /) can be performed in SQL.

SELECT product, amount * 1.1 AS new_amount FROM sales;

This query calculates 10% increase in the amount column.

5. CASE Statements

Simple CASE

The CASE statement evaluates a list of conditions and returns one of multiple possible results.

SELECT product,
       CASE region
           WHEN 'North' THEN 'N'
           WHEN 'South' THEN 'S'
           ELSE 'Other'
       END AS region_code
FROM sales;

This query assigns region codes based on the region value.

Searched CASE

The CASE statement can also be used for searched conditions.

SELECT product,
       CASE
           WHEN amount > 200 THEN 'High'
           WHEN amount > 100 THEN 'Medium'
           ELSE 'Low'
       END AS amount_category
FROM sales;

This query categorizes products based on the amount.

6. JOIN Operations

INNER JOIN

An INNER JOIN returns rows when there is at least one match in both tables.

SELECT s.product, s.amount, p.category
FROM sales s
INNER JOIN products p
ON s.product_id = p.product_id;

This query joins sales and products tables based on product_id.

LEFT JOIN

A LEFT JOIN returns all rows from the left table and matching rows from the right table.

SELECT s.product, s.amount, p.category
FROM sales s
LEFT JOIN products p
ON s.product_id = p.product_id;

This query retrieves all sales and matches them with products.

RIGHT JOIN

A RIGHT JOIN returns all rows from the right table and matching rows from the left table.

SELECT s.product, s.amount, p.category
FROM sales s
RIGHT JOIN products p
ON s.product_id = p.product_id;

This query retrieves all products and matches them with sales.

FULL OUTER JOIN

A FULL OUTER JOIN returns all rows when there is a match in either table.

SELECT s.product, s.amount, p.category
FROM sales s
FULL OUTER JOIN products p
ON s.product_id = p.product_id;

This query retrieves all sales and products, with matching rows from both tables.

Conclusion

Mastering these SQL queries will empower you to manipulate and analyze your data effectively. Whether you are a beginner or an experienced SQL user, understanding these fundamental concepts is essential for advanced data analysis and reporting.