By Kevin McAleer, 7 Minutes
Parquet is a columnar file format ideal for analytics. DuckDB reads and writes Parquet natively and efficiently.
New to terms like Parquet, partitioning, or predicate pushdown? See the Beginner glossary.
Why this matters:
- Columnar layout means a query reads only the columns it selects.
- Per-row-group min/max statistics let DuckDB skip data that cannot match a filter.
- Compact, compressed files travel well between tools.
Load a whole folder of Parquet files into a table in one statement:
CREATE TABLE trips AS SELECT * FROM read_parquet('data/trips/*.parquet');
Notes:
- The *.parquet glob loads every matching file in the folder as one table.
What is a "glob"? A glob is a simple wildcard pattern for matching files and folders:
- *.parquet matches all Parquet files in a folder
- trips_2024-*.parquet matches files starting with trips_2024-
- year=*/month=*/*.parquet matches partitioned folders by year/month
DuckDB accepts these patterns in functions like read_parquet() and read_csv_auto().
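To check which files a glob actually matches, read_parquet can expose each row's source file as a column. A minimal sketch, assuming the data/trips folder from above:

-- filename=true adds a column holding each row's source path
SELECT DISTINCT filename
FROM read_parquet('data/trips/*.parquet', filename=true);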
Select only the columns you need and filter early to reduce IO:
SELECT vendor_id, pickup_date, fare_amount
FROM trips
WHERE pickup_date >= DATE '2024-01-01';
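To confirm the column list and filter are reaching the scan, wrap the query in EXPLAIN; the scan node lists its projections and filters. Same hypothetical trips table as above:

EXPLAIN
SELECT vendor_id, pickup_date, fare_amount
FROM trips
WHERE pickup_date >= DATE '2024-01-01';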
Produce compact, portable outputs for downstream tools.
COPY (
    SELECT
        vendor_id,
        DATE_TRUNC('month', pickup_date) AS month,
        SUM(fare_amount) AS revenue
    FROM trips
    GROUP BY vendor_id, month
) TO 'exports/trips_monthly.parquet' (FORMAT 'parquet');
Notes:
- Outputs land in the exports/ folder.
- A consistent naming scheme like *_monthly keeps related exports easy to find.
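It is worth reading an export back as a sanity check; a quick look at the file written above:

-- Confirm the export has the expected shape
SELECT * FROM read_parquet('exports/trips_monthly.parquet') LIMIT 5;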
Partitioning creates subfolders by keys; DuckDB can skip folders when those keys are filtered.
COPY (
    SELECT
        *,
        strftime(pickup_date, '%Y') AS year,
        strftime(pickup_date, '%m') AS month
    FROM trips
) TO 'exports/trips_by_year_month' (FORMAT 'parquet', PARTITION_BY (year, month));
Now you can scan selected partitions quickly:
SELECT COUNT(*) FROM read_parquet('exports/trips_by_year_month/year=2024/month=09/*.parquet');
Partitioning, explained
Imagine a filing cabinet: you put trips into drawers by year, then folders by month. If you only need 2024-09, you open just that folder. On disk this looks like year=2024/month=09/…. DuckDB understands this pattern and can skip all other folders. A good partition key is something you filter by often (date, region) and has a limited number of distinct values (12 months, a handful of regions).

When to partition
Datasets larger than a few hundred MB, or when you regularly query slices (by month, by region). Don't over-partition (e.g., by minute or by user ID): it creates thousands of tiny files.

How to query partitions
By folder path:
SELECT * FROM read_parquet('…/year=2024/month=09/*.parquet');
Or by predicate using virtual columns:
SELECT COUNT(*) FROM read_parquet('…/year=*/month=*/*.parquet') WHERE year = 2024 AND month = 9;
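The partition keys also behave like ordinary columns once read back. A small sketch against the trips_by_year_month export created earlier; hive_partitioning=true tells the reader to derive year and month from the folder names:

-- year and month come from folder names, not from the files themselves
SELECT year, month, COUNT(*) AS trips
FROM read_parquet('exports/trips_by_year_month/year=*/month=*/*.parquet', hive_partitioning=true)
GROUP BY year, month
ORDER BY year, month;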
What is "predicate pushdown"? A predicate is your filter (the WHERE clause). Pushdown means DuckDB sends that filter into the file reader, so it only reads the parts that could match. Parquet stores min/max stats per row group. If you filter on fare_amount > 0 and a row group's max is 0, DuckDB can skip that group entirely. Result: less data read from disk or network and faster queries.
Without pushdown (conceptually): read data -> then filter -> keep a few rows.
With pushdown: tell the reader the filter first -> skip non-matching chunks -> read much less.
SELECT vendor_id, SUM(fare_amount) AS revenue
FROM read_parquet('exports/trips_by_year_month/year=2024/month=*/*.parquet')
WHERE fare_amount > 0
GROUP BY vendor_id;
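To verify the pushdown, run EXPLAIN on the same query; the filter should appear inside the Parquet scan node rather than as a separate step:

EXPLAIN
SELECT vendor_id, SUM(fare_amount) AS revenue
FROM read_parquet('exports/trips_by_year_month/year=2024/month=*/*.parquet')
WHERE fare_amount > 0
GROUP BY vendor_id;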
Schema evolution
What it is: files written at different times can have different columns or types (a new column appears, an ID gets widened).
Why it matters: stacking mismatched files can fail outright or silently mix types.
Safe patterns: add missing columns as NULL (or supply a default with COALESCE), and cast types explicitly, e.g. CAST(id AS BIGINT).
Inspect schemas quickly:
-- Peek at columns and types via DESCRIBE on a scan
DESCRIBE SELECT * FROM read_parquet('exports/trips_by_year_month/year=2024/month=*/*.parquet');
Normalize when combining different vintages:
-- Example: newer files have `surcharge`, older files do not
CREATE OR REPLACE VIEW trips_all AS
SELECT vendor_id, pickup_date, fare_amount, surcharge
FROM read_parquet('data/new/*.parquet')
UNION ALL
SELECT vendor_id, pickup_date, fare_amount, CAST(NULL AS DOUBLE) AS surcharge
FROM read_parquet('data/old/*.parquet');
Persist a curated, consistent table:
CREATE OR REPLACE TABLE trips_curated AS
SELECT
    CAST(vendor_id AS VARCHAR) AS vendor_id,
    CAST(pickup_date AS TIMESTAMP) AS pickup_ts,
    fare_amount,
    COALESCE(surcharge, 0) AS surcharge
FROM trips_all;
Tips: prefer UNION ALL with explicit CASTs when stacking vintages, and keep a short schema.md next to the data describing the expected columns and types.
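As an alternative to a hand-written UNION ALL view, read_parquet's union_by_name option aligns columns by name across files and fills the gaps with NULL. A minimal sketch, reusing the hypothetical data/new and data/old folders:

-- Files missing `surcharge` get NULL for that column automatically
SELECT vendor_id, pickup_date, fare_amount, surcharge
FROM read_parquet(['data/new/*.parquet', 'data/old/*.parquet'], union_by_name=true);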
Common pitfalls:
- SELECT * reads every column; name only the columns you need so pruning can help.
- Over-partitioning (e.g., keying on year, month, day for small data) creates thousands of tiny files.
- A partitioned COPY writes a folder tree, not a single file; read it back with a glob.
SELECT COUNT(*) FROM read_parquet('data/sales/part-*.parquet');
You try it: replace the path with any folder you have, and confirm the count matches the number of rows across the files.
COPY (
    SELECT * FROM my_table
) TO 'exports/my_table.parquet' (FORMAT PARQUET);
You try it: export a filtered subset (e.g., WHERE date >= '2024-01-01').
COPY (
    SELECT * FROM my_table  -- assumes my_table already has year and month columns
) TO 'exports/sales_by_year_month' (FORMAT PARQUET, PARTITION_BY (year, month));
You try it: list the created folder tree and spot the year=YYYY/month=MM folders.
SELECT customer_id, SUM(amount) AS revenue
FROM read_parquet('exports/sales_by_year_month/year=*/month=*/*.parquet')
WHERE year = 2024 AND month = 6
GROUP BY customer_id;
You try it: add a second filter on amount > 0 and run EXPLAIN to see the filtered scans.
-- Old files lack column `coupon`; new files add a nullable `coupon` string
-- union_by_name aligns columns by name and fills missing ones with NULL
SELECT COUNT(*), COUNT(coupon) AS with_coupon
FROM read_parquet('exports/sales_by_year_month/year=*/month=*/*.parquet', union_by_name=true);
You try it: add one small Parquet file with an extra nullable column (e.g., country) and confirm your queries still work.