Duckdb

Duckdb, the (tabular) data exploration tool I use supports window operations. I recently discovered that it can also perform cumulative sums in a very efficient manner. Let us generate a toy dataset where we want to calculate the sum of one column relative to the order of another one. CREATE OR REPLACE TABLE seed AS SELECT SETSEED(0.1); -- seeding for reproducibility, creating a table to hide output -- Create a mock dataset with two integer columns CREATE OR REPLACE TABLE my_table AS SELECT #1 AS column_1, CAST(FLOOR(RANDOM() * 100) AS INT) AS column_2 FROM generate_series(1, 10); -- This generates 10 rows SELECT * FROM my_table; -- We write it to a csv for future use COPY my_table TO my_table.csv; ┌──────────┬──────────┐ │ column_1 │ column_2 │ │ int64 │ int32 │ ├──────────┼──────────┤ │ 1 │ 27 │ │ 2 │ 45 │ │ 3 │ 2 │ │ 4 │ 84 │ │ 5 │ 84 │ │ 6 │ 26 │ │ 7 │ 18 │ │ 8 │ 65 │ │ 9 │ 97 │ │ 10 │ 11 │ ├──────────┴──────────┤ │ 10 rows 2 columns │ └─────────────────────┘ If we wanted to calculate the distribution of the cumulative sum of the table we could use the OVER clause to perform the sum of column_2 in the order defined by column_1. ...

Duckdb

Use dired-do-shell to explore the parquet schema from Emacs

Calculate the cumulative sum of a column using DuckDB