Páginas

Sunday, May 27, 2012

How to select random rows from a SQLite table fast

Today I wanted to select some random rows from a large (~3GB, ~900K rows) SQLite database, where the approach found elsewhere on the Web is simply too slow:

SELECT * FROM table ORDER BY random() LIMIT n; 

 Instead of doing that, I came up with a faster alternative:

SELECT * FROM table WHERE random() % k = 0 LIMIT n;

In the examples above, adjust *, table, k and n as suitable for you. n is the (maximum) number of rows returned, and k is an integer constant that determines how probable it is to select a given row.
For instance, k = 2 means (about) 0.5 probability, k = 3 means 0.33, k = 4 means 0.25 and so on.

My alternative above will return random rows, but sorted by the primary key. If you want random rows in random order you can save the retrieved rows in a temporary table and then shuffle them:

CREATE TEMP TABLE temp_rows AS
    SELECT * FROM table WHERE random() % k = 0 LIMIT n;
SELECT * FROM temp_rows ORDER BY random();

That went faster than a compound select.

No comments: