
Showing posts with label sqlite. Show all posts

Wednesday, July 25, 2012

Upgrade your Python's SQLite module on WebFaction

I needed to deploy an application that uses some "new" SQLite features, such as Full-Text Search (FTS) support. Both the SQLite library installed on the system and Python's standard sqlite3 module were too old.
To see the installed versions:

$ python2.7 -c 'import sqlite3; print "lib version:", sqlite3.version, "| SQLite version:", sqlite3.sqlite_version'
lib version: 2.6.0 | SQLite version: 3.3.6
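The same version check can also be done inside a program. Below is a minimal sketch, assuming FTS4 is the feature you need. FTS4 first appeared in SQLite 3.7.4, so the hypothetical helper `sqlite_at_least` compares the `sqlite3.sqlite_version_info` tuple against that minimum. It avoids Python-3-only syntax, so it should also run under python2.7.

```python
import sqlite3

# FTS4 first shipped with SQLite 3.7.4; change this for other features.
FTS4_MIN_VERSION = (3, 7, 4)


def sqlite_at_least(minimum):
    """Return True if the SQLite library used by the sqlite3 module
    is at least `minimum`, a tuple such as (3, 7, 4)."""
    # sqlite_version_info is the runtime SQLite version as a tuple of ints,
    # e.g. (3, 3, 6); tuples compare element by element.
    return sqlite3.sqlite_version_info >= tuple(minimum)


if __name__ == "__main__":
    print("SQLite version: %s" % sqlite3.sqlite_version)
    print("new enough for FTS4: %s" % sqlite_at_least(FTS4_MIN_VERSION))
```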

To test FTS support you can run this on the command line:

$ python2.7 -c 'import sqlite3; db = sqlite3.connect(":memory:"); db.execute("create virtual table test using fts4()"); print db.execute("select count(*) from test").fetchone()[0]'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
sqlite3.OperationalError: near "virtual": syntax error

It should print "0" without raising any error. (FTS4 needs SQLite 3.7.4 or newer.)
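A version number alone doesn't guarantee that FTS is available, because it can be left out at compile time. Here is a sketch of the probe above as a reusable function (the name `has_fts4` is hypothetical). It tries to create an FTS4 table in a throwaway in-memory database and reports whether that worked.

```python
import sqlite3


def has_fts4():
    """Return True if this SQLite build can create an FTS4 virtual table."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE VIRTUAL TABLE probe USING fts4(body)")
        return True
    except sqlite3.OperationalError:
        # Very old SQLite fails with 'near "virtual": syntax error';
        # a build without FTS fails with 'no such module: fts4'.
        return False
    finally:
        conn.close()


if __name__ == "__main__":
    print("FTS4 available: %s" % has_fts4())
```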

I talked with WebFaction's support team (these guys are great!), and they recommended installing pysqlite2 from source. That is the project the standard sqlite3 module is taken from, with one minor modification that we will reproduce below.

The steps to get your sqlite3 upgraded:

1. Download and extract the pysqlite2 sources:
$ wget http://pysqlite.googlecode.com/files/pysqlite-2.6.3.tar.gz
$ tar xf pysqlite-2.6.3.tar.gz

2. Build using the most recent version of SQLite:
$ cd pysqlite-2.6.3
$ python2.7 setup.py build_static

Note: do not use "setup.py build"; use "build_static" instead. It downloads the newest version of SQLite and links the module statically against it. (I learned this from a question on StackOverflow.)

3. Install to your home directory:
$ mkdir -p $HOME/lib/python2.7
$ PYTHONPATH=$HOME/lib/python2.7 python2.7 setup.py install --home=$HOME

4. (optional) Override the standard sqlite3 module:
$ echo "from dbapi2 import *" >> $HOME/lib/python2.7/pysqlite2/__init__.py
$ ln -s $HOME/lib/python2.7/{pysqlite2,sqlite3}

This lets your application pick up the new module without any code changes (no import lines to update).
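If you'd rather not alter the installed package, a common alternative is to choose the module at import time in your own code. The sketch below assumes pysqlite2 was installed as in step 3, so that its DB-API module is `pysqlite2.dbapi2`. When pysqlite2 isn't importable, it falls back to the standard library module.

```python
try:
    # Prefer the separately built pysqlite2 (linked against a newer SQLite).
    from pysqlite2 import dbapi2 as sqlite3
except ImportError:
    # Fall back to the standard library module.
    import sqlite3

# Shows which module was picked and the SQLite version it uses.
print("using %s with SQLite %s" % (sqlite3.__name__, sqlite3.sqlite_version))
```

The trade-off is that every module importing sqlite3 needs this block, which is exactly what step 4 avoids.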

We should be done now:
$ python2.7 -c 'import sqlite3; print "lib version:", sqlite3.version, "| SQLite version:", sqlite3.sqlite_version'
lib version: 2.6.3 | SQLite version: 3.7.13
$ python2.7 -c 'import sqlite3; db = sqlite3.connect(":memory:"); db.execute("create virtual table test using fts4()"); print db.execute("select count(*) from test").fetchone()[0]'
0

Tuesday, May 29, 2012

Compiling SQLite as a shared library on Ubuntu

I wanted the most recent SQLite to be available to my Python applications.
The Ubuntu repositories couldn't help, so, for future reference, here is what I did:

cd /tmp
wget http://www.sqlite.org/sqlite-autoconf-3071201.tar.gz
tar xvzf sqlite-autoconf-3071201.tar.gz
cd sqlite-autoconf-3071201/
# set your own options
CFLAGS="-Os -DSQLITE_ENABLE_FTS3 -DSQLITE_ENABLE_RTREE" ./configure --prefix=/usr
make

# quick-n-dirty way to replace the original lib
sudo mv /usr/lib/x86_64-linux-gnu/libsqlite3.so.0.8.6{,.orig}
sudo mv /usr/bin/sqlite3{,.orig}

chmod -x .libs/libsqlite3.so.0.8.6
sudo cp .libs/libsqlite3.so.0.8.6 /usr/lib/x86_64-linux-gnu/
sudo cp .libs/sqlite3 /usr/bin/

cd
# check that Python sees the new version
python -c 'import sqlite3; print sqlite3.sqlite_version'
# check that SQLite shell works
sqlite3
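To confirm that the new library was really built with the CFLAGS above, you can ask SQLite for its compile-time options. The sketch below uses `PRAGMA compile_options`, which needs SQLite 3.6.23 or newer and lists option names without the `SQLITE_` prefix. The helper name `compile_options` is hypothetical.

```python
import sqlite3


def compile_options():
    """Return the set of compile-time options of the SQLite library
    that the sqlite3 module is using."""
    conn = sqlite3.connect(":memory:")
    try:
        # Each row is a 1-tuple such as ("ENABLE_FTS3",) or ("THREADSAFE=1",).
        return set(row[0] for row in conn.execute("PRAGMA compile_options"))
    finally:
        conn.close()


if __name__ == "__main__":
    opts = compile_options()
    # The options requested through CFLAGS in the build above.
    for wanted in ("ENABLE_FTS3", "ENABLE_RTREE"):
        print("%s: %s" % (wanted, "yes" if wanted in opts else "no"))
```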

Sunday, May 27, 2012

How to select random rows from a SQLite table fast

Today I wanted to select some random rows from a large (~3 GB, ~900K rows) SQLite database, and the approach commonly suggested on the Web is simply too slow:

SELECT * FROM table ORDER BY random() LIMIT n; 

Instead, I came up with a faster alternative:

SELECT * FROM table WHERE random() % k = 0 LIMIT n;

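In the examples above, adjust *, table, k and n to suit your case. n is the maximum number of rows returned, and k is an integer constant that sets the probability of selecting a given row, about 1/k.
For instance, k = 2 means a probability of about 0.5, k = 3 about 0.33, k = 4 about 0.25, and so on. Since the scan stops as soon as n rows match, a small k draws every row from the beginning of the table. To spread the sample over the whole table, pick k close to (number of rows) / n.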
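Here is a sketch of the same query run through Python's sqlite3 module. The table `items`, its columns and the helper `sample_rows` are hypothetical. `k` and `n` are passed as bound parameters. The table name cannot be bound, so it is interpolated and must come from trusted code.

```python
import sqlite3


def sample_rows(conn, table, k, n):
    """Return up to n rows, each kept with probability about 1/k.

    Rows come back in table scan order (usually rowid order), and the
    scan stops as soon as n rows have matched.
    """
    # random() yields a signed 64-bit integer; random() % k = 0 holds for
    # roughly one row in k. "%%" becomes a literal "%" after formatting.
    query = "SELECT * FROM %s WHERE random() %% ? = 0 LIMIT ?" % table
    return conn.execute(query, (k, n)).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO items (name) VALUES (?)",
                     [("item %d" % i,) for i in range(1000)])
    # 1000 rows and about 10 wanted: k = 1000 // 10 = 100 spreads the
    # sample over the whole table.
    print(sample_rows(conn, "items", k=100, n=10))
```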

My alternative above returns random rows, but in table order (usually sorted by the integer primary key). If you want random rows in random order, you can save the retrieved rows in a temporary table and then shuffle them:

CREATE TEMP TABLE temp_rows AS
    SELECT * FROM table WHERE random() % k = 0 LIMIT n;
SELECT * FROM temp_rows ORDER BY random();

That went faster than a compound select.
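The temp-table variant could look like this from Python. The sketch uses the same hypothetical `items` table, and the function name `sample_rows_shuffled` is illustrative. It drops any previous `temp_rows` table first, so it can be called repeatedly on one connection.

```python
import sqlite3


def sample_rows_shuffled(conn, table, k, n):
    """Select up to n rows with random() % k = 0, then return them shuffled."""
    # Temporary tables live only as long as this connection.
    conn.execute("DROP TABLE IF EXISTS temp.temp_rows")
    # k and n are forced to int before being formatted into the SQL;
    # `table` must come from trusted code.
    conn.execute(
        "CREATE TEMP TABLE temp_rows AS "
        "SELECT * FROM %s WHERE random() %% %d = 0 LIMIT %d"
        % (table, int(k), int(n)))
    # Only the n saved rows are sorted by random(), not the whole table.
    return conn.execute("SELECT * FROM temp_rows ORDER BY random()").fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO items (name) VALUES (?)",
                     [("item %d" % i,) for i in range(1000)])
    print(sample_rows_shuffled(conn, "items", k=100, n=10))
```

For a small n, fetching the rows and shuffling the list with Python's `random.shuffle` would work too.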