KevsRobots Learning Platform

DuckDB - Fast, free analytics

Introduction

Start here to learn what DuckDB is, why it shines for analytics, and how this course is structured.

By Kevin McAleer, 3 minutes


DuckDB for Analytics — Introduction

DuckDB is an in-process, columnar SQL database that runs anywhere you can run your app. It excels at fast analytical queries on local files like Parquet and CSV, without needing a server, a cluster, or cloud credits.

New to terms like Parquet, CTE, or materialize? See the Beginner glossary.

This course is part of the Databases learning pathway, focused on practical analytics: querying files, creating small analytical datasets, and integrating with Python.

What you’ll learn

  • Install and run DuckDB (CLI and Python)
  • Query CSV/Parquet directly with SQL
  • Create and persist a local analytics database
  • Work with DuckDB from Python and Pandas/Polars
  • Read data from data lakes (local, HTTP/S3) and tune performance
  • Use extensions and best practices for analytical workloads

Who this is for

  • Makers, analysts, and developers who want fast local analytics
  • Beginners to databases and data lakes who prefer practical examples

Prerequisites

  • Basic command line and Python familiarity
  • macOS, Linux, or Windows
  • Optional: Python 3.10+ and VS Code for notebook-style work

Why DuckDB

  • No server: embed it in your scripts and apps
  • Fast analytical SQL: vectorized, columnar execution
  • File-first: query Parquet/CSV directly, no ETL required
  • Great with Python: query DataFrames and return results as DataFrames
  • Portable: a single .duckdb file you can version and ship

When to choose it:

  • You need quick, local analytics (OLAP) on files
  • You want to prototype or build repeatable data workflows without infra
  • You want SQL over DataFrames, or to complement Pandas/Polars

Quick start (macOS)

  • CLI: brew install duckdb
  • Python: python -m pip install duckdb

Try the CLI:

duckdb

Then run a tiny analytics query directly on a CSV from the web:

-- Create a table from a remote CSV
CREATE TABLE tips AS
SELECT * FROM read_csv_auto('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/tips.csv');

-- Simple aggregate
SELECT day, ROUND(SUM(total_bill), 2) AS revenue, COUNT(*) AS orders
FROM tips
GROUP BY day
ORDER BY revenue DESC;

Exit with .quit.

Or from Python:

import duckdb, pandas as pd

df = pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/tips.csv')

# Query a DataFrame directly
result = duckdb.query("""
  SELECT day, AVG(total_bill) AS avg_bill
  FROM df
  GROUP BY day
  ORDER BY avg_bill DESC
""").df()

print(result)

What we’ll build in this course

  • A small, local analytics workspace using DuckDB
  • Reproducible queries over CSV/Parquet (local and remote)
  • A persisted .duckdb database and lightweight “data mart”
  • Python integrations for analysis and notebooks (e.g., Jupyter)
  • Practical performance tips (statistics, Parquet predicate pushdown, and more)

Course structure

  • 00 intro: what DuckDB is and why use it
  • 01 installing: CLI, Python, VS Code setup
  • 02 basic queries: selecting, filtering, aggregations, joins
  • 03 advanced features: views, CTEs, extensions, persistence
  • 04 python integration: DataFrames, results, notebooks
  • 05 data analysis: end-to-end mini analysis
  • 06 data lakes: reading from local/remote object storage
  • 07 parquet: formats, partitioning, predicates
  • 08 performance tuning: tips for speed and memory
  • 09 conclusion: recap and patterns
  • 10 additional resources: docs, tools, datasets

Next up

If you want a one-page setup and first queries, see the Quickstart. Otherwise, head to 01 installing to set up DuckDB and the Python environment.

