KevsRobots Learning Platform

DuckDB - Fast, free analytics


Performance tuning (practical)

Quick wins to keep queries fast: schema, files, PRAGMAs, stats, EXPLAIN, profiling.

By Kevin McAleer, 3 minutes


This lesson is a checklist of practical steps you can apply to most DuckDB analytics workloads.

Seeing new terms (threads, memory limit, statistics, EXPLAIN)? See the Beginner glossary.


PRAGMAs: threads and memory

PRAGMA threads = 8;          -- use more CPU cores
PRAGMA memory_limit = '2GB'; -- cap memory usage

You try it

  • Set threads to the number of cores you have; re-run a heavy aggregation and time it
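For example, a minimal sketch of the timing comparison, assuming the DuckDB CLI (for the .timer command) and the orders table used elsewhere in this lesson:

.timer on                    -- DuckDB CLI command: print wall-clock time per statement

PRAGMA threads = 4;          -- baseline run with fewer cores
SELECT order_date, SUM(amount) FROM orders GROUP BY order_date;

PRAGMA threads = 8;          -- match your actual core count and re-run
SELECT order_date, SUM(amount) FROM orders GROUP BY order_date;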

Create the right objects

  • Use views to reuse query logic; use tables when you need persistence and faster repeated reads.
  • Materialize expensive subqueries to tables when reused in multiple steps.
CREATE OR REPLACE VIEW v_orders_clean AS
SELECT * FROM orders WHERE order_status = 'COMPLETE';

CREATE OR REPLACE TABLE orders_daily AS
SELECT order_date, SUM(amount) AS revenue
FROM v_orders_clean
GROUP BY order_date;

You try it

  • Convert one reused view in your project into a materialized table and compare timings
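A rough way to compare the two, assuming the v_orders_clean view and orders_daily table created above and the CLI's .timer:

.timer on

-- Querying the view recomputes the filter and aggregation every time
SELECT order_date, SUM(amount) AS revenue
FROM v_orders_clean
GROUP BY order_date;

-- Reading the materialized table returns the precomputed result
SELECT * FROM orders_daily;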

Statistics and ANALYZE

DuckDB can collect column stats to guide the optimizer.

ANALYZE;            -- whole database
-- or
ANALYZE my_table;   -- a specific table

You try it

  • Run ANALYZE after bulk loads; compare query plans before/after with EXPLAIN
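A sketch of that before/after workflow, assuming the orders table from this lesson:

EXPLAIN SELECT order_date, SUM(amount) FROM orders GROUP BY order_date;  -- plan before

ANALYZE;                                                                 -- refresh statistics after the bulk load

EXPLAIN SELECT order_date, SUM(amount) FROM orders GROUP BY order_date;  -- plan after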

EXPLAIN basics (read the plan)

EXPLAIN
SELECT c.category, SUM(o.amount) AS revenue
FROM orders o
JOIN customers c USING (customer_id)
WHERE o.order_date >= DATE '2024-01-01'
GROUP BY c.category;

How to read:

  • Look for Scan nodes: are they filtered early? (predicate pushdown)
  • Join order: small/selective inputs should be on the build side for hash joins.
  • Aggregation: ideally after filters and joins, not earlier.

You try it

  • Get an index-like effect via materialization: create a filtered table orders_2024 and re-check EXPLAIN (sketch below)
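A minimal sketch, assuming the orders and customers tables used above; orders_2024 is just an illustrative name:

CREATE OR REPLACE TABLE orders_2024 AS
SELECT * FROM orders
WHERE order_date >= DATE '2024-01-01';   -- bake the filter into the data

EXPLAIN
SELECT c.category, SUM(o.amount) AS revenue
FROM orders_2024 o
JOIN customers c USING (customer_id)
GROUP BY c.category;                     -- the scan now touches only 2024 rows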

Profiling walkthrough (operator timings)

Enable profiling per session, run a query, then inspect timings.

PRAGMA enable_profiling = 'json';               -- or 'query_tree'
PRAGMA profiling_output = 'exports/profile.json';

SELECT c.category, SUM(o.amount) AS revenue
FROM orders o
JOIN customers c USING (customer_id)
WHERE o.order_date >= DATE '2024-01-01'
GROUP BY c.category;

Then open exports/profile.json in your editor. Look for:

  • The slowest operator by time (e.g., a hash join or hash aggregate)
  • Bytes read vs returned (column pruning effective?)
  • Whether filters reduced rows early (good) or late (optimize)

You try it

  • Turn profiling off when you are done: PRAGMA disable_profiling;
  • Change the filter to a tighter range and compare operator timings
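For example (the narrower date below is just an illustration; keep the profiling PRAGMAs from above active while you compare):

-- Narrower filter: re-run, then compare operator timings in exports/profile.json
SELECT c.category, SUM(o.amount) AS revenue
FROM orders o
JOIN customers c USING (customer_id)
WHERE o.order_date >= DATE '2024-06-01'
GROUP BY c.category;

PRAGMA disable_profiling;   -- switch profiling off for the session when finished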

File/layout tips

  • Parquet: prefer bigger files (tens to hundreds of MB) over thousands of tiny ones.
  • Partition by low-cardinality keys you filter on (year, month, region).
  • Avoid wide rows with many rarely-used columns; split into slim tables when helpful.

You try it

  • Rewrite a CSV dataset to Parquet and compare read time for a simple aggregation
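A sketch of the conversion and comparison; orders.csv and orders.parquet are placeholder file names, so substitute your own dataset:

-- One-off conversion: CSV in, Parquet out
COPY (SELECT * FROM read_csv_auto('orders.csv'))
TO 'orders.parquet' (FORMAT PARQUET);

.timer on                                                        -- DuckDB CLI timing
SELECT COUNT(*), SUM(amount) FROM read_csv_auto('orders.csv');   -- CSV scan
SELECT COUNT(*), SUM(amount) FROM 'orders.parquet';              -- Parquet scan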

Recap

  • Control parallelism and memory with PRAGMAs.
  • Use ANALYZE and EXPLAIN to understand and improve plans.
  • Profile occasionally to find real bottlenecks; fix the biggest one first.

See also: Troubleshooting DuckDB for quick fixes that impact performance (file formats, partitions, stats).

