Recursive SQL queries over tree
Some 10+ years ago on a job interview I was asked to query the database for the comments tree in a front-end friendly way, meaning no heavy processing/sorting to happen on the back-end. Turns out this question was to check if I knew about the Hierarchical Queries with Oracle (WITH RECURSIVE
on other databases), which I did not back then.
So this is how an idiomatic way to do it looks like, with SQLite.
Sample comment tree:
- (id=101) uno: A
- (id=102) due: B
- (id=108) tre: H
- (id=110) uno: I
- (id=109) tre: J
- (id=102) due: B
- (id=103) due: C
- (id=104): uno: D
- (id=105) tre: E
- (id=106) uno: F
- (id=107) due: G
- (id=106) uno: F
Create and open SQLite database:
touch test.db
sqlite3 test.db
Define the table:
CREATE TABLE comments (
entry_id SERIAL PRIMARY KEY,
reply_to BIGINT UNSIGNED,
issue_id BIGINT UNSIGNED NOT NULL,
entry_datetime DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
entry_content TEXT NOT NULL,
entry_author TEXT NOT NULL,
FOREIGN KEY (reply_to) REFERENCES comments(entry_id)
);
Insert comments:
INSERT INTO comments (entry_id, reply_to, issue_id, entry_author, entry_content)
VALUES (101, NULL, 42, 'uno', 'A');
INSERT INTO comments (entry_id, reply_to, issue_id, entry_author, entry_content)
VALUES (102, 101, 42, 'due', 'B');
INSERT INTO comments (entry_id, reply_to, issue_id, entry_author, entry_content)
VALUES (103, NULL, 42, 'due', 'C');
INSERT INTO comments (entry_id, reply_to, issue_id, entry_author, entry_content)
VALUES (104, 103, 42, 'uno', 'D');
INSERT INTO comments (entry_id, reply_to, issue_id, entry_author, entry_content)
VALUES (105, NULL, 42, 'tre', 'E');
INSERT INTO comments (entry_id, reply_to, issue_id, entry_author, entry_content)
VALUES (106, 105, 42, 'uno', 'F');
INSERT INTO comments (entry_id, reply_to, issue_id, entry_author, entry_content)
VALUES (107, 106, 42, 'due', 'G');
INSERT INTO comments (entry_id, reply_to, issue_id, entry_author, entry_content)
VALUES (108, 102, 42, 'due', 'H');
INSERT INTO comments (entry_id, reply_to, issue_id, entry_author, entry_content)
VALUES (109, 101, 42, 'tre', 'I');
INSERT INTO comments (entry_id, reply_to, issue_id, entry_author, entry_content)
VALUES (110, 102, 42, 'uno', 'J');
Query the tree (pre-order traversal):
WITH RECURSIVE
is followed by a table-defining clause, that describes a starting rows set (WHERE reply_to is NULL
) and a “connection” between tree layers with (JOIN comments c ON c.reply_to = ct.entry_id
) and then glues them together using UNION ALL
.
WITH RECURSIVE comment_tree (
issue_id,
entry_id,
reply_to,
entry_author,
entry_content,
depth
) AS (
SELECT
issue_id,
entry_id,
reply_to,
entry_author,
entry_content,
0 AS depth
FROM comments
WHERE reply_to is NULL
UNION ALL
SELECT
c.issue_id,
c.entry_id,
c.reply_to,
c.entry_author,
c.entry_content,
ct.depth+1 AS depth
FROM comment_tree ct
JOIN comments c ON c.reply_to = ct.entry_id
)
SELECT entry_id, reply_to, entry_author, entry_content, depth
FROM comment_tree
WHERE issue_id = 42
ORDER BY entry_id, reply_to
;
Get the results (with NULL
represented as ---
just to keep formatting straight):
101|---|uno|A|0
102|101|due|B|1
103|---|due|C|0
104|103|uno|D|1
105|---|tre|E|0
106|105|uno|F|1
107|106|due|G|2
108|102|due|H|2
109|101|tre|I|1
110|102|uno|J|2
More details in the book: SQL Antipatterns, Volume 1.