Saturday, December 25, 2010

updatable SQL table with external ordering

Nearly half a year a crucial bug was found in uguu. It was fixed that time and it haven't bothered us since then. I've always wanted to write about it here.

We store all files in a single SQL table. Each row has a path_id, and a file_id. The first is same for all files from the same directory. The second represents a (lexicographical) order on files. We assign file ids sequentially without a gap. While updating contents of a share we need to insert/delete some rows in the files table preserving the sequential order.


First solution.

To insert a row with file id N, increment ids for rows with ids >= N
UPDATE files SET file_id = file_id + 1 WHERE file_id >= N AND path_id = some_path
then insert the new row with file_id N.

Deletion is performed analogously:
DELETE FROM files WHERE file_id = N AND path_id = some_path
UPDATE files SET file_id = file_id - 1 WHERE file_id > N AND path_id = some_path

Clearly, for a directory with M files addition of K new files with ids 1,2....,K can result in O(M*K^2) time. And which is more frustrating if all changes are done in a single transaction, the amount of additional space can rise dramatically for Postgresql 8.4. The same is true for deletion.


Linear solution.

Work is done in two passes. In the first pass removed files are deleted from the table and new files are inserted into a new temporary table. In the second pass file ids are being corrected in a single loop with the help of postgresql cursors. We make loop over rows from old table and use cursor to point to current entry in the new table. After the second pass gaps in file ids in the old table are exactly ids of new files (in a temporary table). Finally, we insert everything from temporary table into old one. For an implementation of the second pass see function push_path_files here.

No comments: