
lxml OSX compilation madness

September 29th, 2009

I don’t think that there has ever been a point in time where installing lxml on OSX was not a horrible pain in the ass. I think that at one point, it was enough just to install updated versions of libxml2 and libxslt, and lxml would compile nicely. But I must have done something bad, because at some point recently, lxml just stopped compiling for me.

This would not do at all. I use virtualenv (and virtualenvwrapper) pretty religiously, and am loath to install much of anything in the system-wide site-packages. I needed to get things to a point where a straight-up easy_install lxml would work. I didn’t wanna be messing around with custom install flags or whatever every time I cut a new env.

This is not a neat step-by-step guide for getting lxml working in OSX. This is just a record of some stuff that I saw, and some things that I did. Some combination thereof was sufficient to get things working.

For the record, I’m using lxml 2.2.2, libxml2 2.7.5, and libxslt 1.1.26.

One day, I fired up python, typed from lxml import etree and got an error much like the following:

>>> from lxml import etree
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: dlopen(/Users/nwilliams/.virtualenvs/lxml1/lib/python2.6/site-packages/lxml-2.2.2-py2.6-macosx-10.5-universal.egg/lxml/etree.so, 2): Symbol not found: _xmlFree
  Referenced from: /Users/nwilliams/.virtualenvs/lxml1/lib/python2.6/site-packages/lxml-2.2.2-py2.6-macosx-10.5-universal.egg/lxml/etree.so
  Expected in: dynamic lookup

>>>

Needless to say, I was less than pleased.

Looking through the source, I gathered that _xmlFree was a symbol exported by libxml2. My first thought was that lxml was somehow compiling against the old system version of libxml2, or for some other reason couldn’t find the new version.
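
One way to test that theory without going through lxml at all is to ask the dynamic loader directly. This is only a rough sketch: check_symbol is a made-up helper name, it assumes ctypes can locate libxml2 on the machine, and it looks up xmlFree rather than _xmlFree because the leading underscore is added by the Mach-O symbol naming, not by the C source.

```python
import ctypes
import ctypes.util


def check_symbol(symbol="xmlFree", library="xml2", path=None):
    """Report whether a shared library can be loaded and exports a symbol.

    Hypothetical helper for illustration; not part of lxml or libxml2.
    """
    path = path or ctypes.util.find_library(library)
    if path is None:
        return "library not found"
    try:
        lib = ctypes.CDLL(path)
    except OSError as exc:
        # A library built for the wrong architecture typically fails here.
        return "could not load %s: %s" % (path, exc)
    # Attribute access on a CDLL performs the symbol lookup.
    return "found" if hasattr(lib, symbol) else "missing"


if __name__ == "__main__":
    print(check_symbol())
    print(check_symbol(path="/usr/local/lib/libxml2.dylib"))
```

A load failure on the /usr/local copy, run from the same Python that fails to import lxml, would point at the library itself rather than at lxml.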

I noticed a few funny things about the output when installing lxml. First, there were two lines right before all the heavy compilation work:

Using build configuration of libxslt 1.1.26
Building against libxml2/libxslt in the following directory: /usr/local/lib

The second line suggested that lxml was, in fact, seeing the version of libxml2 that I wanted it to. The first was a problem though, because normally, it looks like this:

Using build configuration of libxml2 2.7.5 and libxslt 1.1.26

For some reason, lxml wasn’t finding xml2-config. Doing export XML2_CONFIG=/usr/local/bin/xml2-config seemed to fix things.
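
To see which xml2-config a build would end up with, something like the following sketch may help. It assumes the XML2_CONFIG variable simply takes priority over whatever is on PATH; the function names are invented for this example, and this is not a copy of lxml's own setup logic.

```python
import os
import shutil
import subprocess


def find_xml2_config(env=None):
    """Return the xml2-config path a build would likely use, or None.

    Hypothetical helper: prefers XML2_CONFIG, then falls back to PATH.
    """
    env = os.environ if env is None else env
    explicit = env.get("XML2_CONFIG")
    if explicit:
        return explicit
    return shutil.which("xml2-config", path=env.get("PATH", ""))


def xml2_version(config_path):
    """Ask xml2-config which libxml2 version it describes."""
    output = subprocess.check_output([config_path, "--version"])
    return output.decode().strip()


if __name__ == "__main__":
    config = find_xml2_config()
    if config is None:
        print("no xml2-config found; try setting XML2_CONFIG")
    else:
        print(config, xml2_version(config))
```

If this reports nothing, or reports a version other than the one installed under /usr/local, that would be consistent with the missing libxml2 version in the build output.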

The other issue was probably the important one. While compiling lxml, I would see a bunch of warnings that looked like this:

ld: warning in /usr/local/lib/libxml2.dylib, file is not of required architecture
ld: warning in /usr/local/lib/libxslt.dylib, file is not of required architecture

The same warning showed up for another file or two, and the whole batch was repeated a few times.
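
Since the warnings repeat, it can be easier to boil a saved build log down to the list of offending libraries. A minimal sketch, assuming the warning text looks exactly like the two lines above; wrong_arch_libraries is a hypothetical name.

```python
import re

# Matches the linker warning format quoted in this post.
ARCH_WARNING = re.compile(
    r"^ld: warning in (?P<path>.+?), file is not of required architecture\s*$",
    re.MULTILINE,
)


def wrong_arch_libraries(build_log):
    """Return each library named in an architecture warning, once, in order."""
    seen = []
    for match in ARCH_WARNING.finditer(build_log):
        path = match.group("path")
        if path not in seen:
            seen.append(path)
    return seen


if __name__ == "__main__":
    log = (
        "ld: warning in /usr/local/lib/libxml2.dylib, file is not of required architecture\n"
        "ld: warning in /usr/local/lib/libxslt.dylib, file is not of required architecture\n"
        "ld: warning in /usr/local/lib/libxml2.dylib, file is not of required architecture\n"
    )
    for path in wrong_arch_libraries(log):
        print(path)
```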

Running file /usr/local/lib/libxml2.dylib told me that the files in question were only compiled as i386 binaries. I eventually found this page, which showed me how to compile libxml2 and libxslt as universal binaries. I did modify things somewhat, though, most notably adding 64-bit architectures.
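
The same architecture check can be done without the file command by reading the Mach-O header directly. The sketch below is illustrative only: macho_archs is a made-up name, the CPU type numbers are the ones defined in Apple's mach/machine.h, and only the four architectures mentioned in this post are given names.

```python
import struct

CPU_ARCH_ABI64 = 0x01000000  # flag marking the 64-bit variant of a CPU type
CPU_NAMES = {
    7: "i386",
    7 | CPU_ARCH_ABI64: "x86_64",
    18: "ppc",
    18 | CPU_ARCH_ABI64: "ppc64",
}
FAT_MAGIC = 0xCAFEBABE                  # universal header, always big-endian
THIN_MAGICS = (0xFEEDFACE, 0xFEEDFACF)  # 32-bit and 64-bit Mach-O


def macho_archs(data):
    """Return the architecture names found in the leading bytes of a file."""
    if len(data) < 8:
        return []
    (magic,) = struct.unpack(">I", data[:4])
    if magic == FAT_MAGIC:
        (count,) = struct.unpack(">I", data[4:8])
        archs = []
        for index in range(count):
            start = 8 + index * 20  # each fat_arch entry is five 32-bit fields
            if start + 4 > len(data):
                break  # header was truncated; report what we have
            (cputype,) = struct.unpack(">I", data[start:start + 4])
            archs.append(CPU_NAMES.get(cputype, hex(cputype)))
        return archs
    # A thin file stores its header in the target's own byte order.
    for order in (">", "<"):
        (magic,) = struct.unpack(order + "I", data[:4])
        if magic in THIN_MAGICS:
            (cputype,) = struct.unpack(order + "I", data[4:8])
            return [CPU_NAMES.get(cputype, hex(cputype))]
    return []
```

Calling macho_archs(open("/usr/local/lib/libxml2.dylib", "rb").read(4096)) before and after the rebuild should show the list growing from a single entry to one entry per -arch flag.
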
The configure command I used for libxml2 was:

env CFLAGS="-arch i386 -arch ppc -arch x86_64 -arch ppc64" ./configure --enable-static=no --without-python --disable-dependency-tracking

And for libxslt:

env CFLAGS="-arch i386 -arch ppc -arch x86_64 -arch ppc64" ./configure --disable-dependency-tracking

The CFLAGS bit is the necessary part; that’s what’s making things work. I don’t actually know exactly what the point of --enable-static=no is. I’m sure it’ll come back to bite me in the ass at some point. --without-python sounds scary, but since lxml is actually an alternative to the gross “real” bindings, we don’t care about them. I also have no idea what --disable-dependency-tracking does, but make will fail without it.
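
Because both configure commands share the same CFLAGS, the invocation can be generated from a list of architectures instead of being retyped. This is just a sketch under the assumption that the build is driven from Python; universal_cflags and configure_invocation are hypothetical names, and the flags are the ones quoted above.

```python
import os

ARCHS = ["i386", "ppc", "x86_64", "ppc64"]
LIBXML2_OPTIONS = ["--enable-static=no", "--without-python", "--disable-dependency-tracking"]
LIBXSLT_OPTIONS = ["--disable-dependency-tracking"]


def universal_cflags(archs):
    """Build the repeated -arch flags for a universal binary."""
    return " ".join("-arch " + arch for arch in archs)


def configure_invocation(archs, options, base_env=None):
    """Return (env, argv) suitable for subprocess.check_call."""
    env = dict(os.environ if base_env is None else base_env)
    env["CFLAGS"] = universal_cflags(archs)
    return env, ["./configure"] + list(options)


if __name__ == "__main__":
    env, argv = configure_invocation(ARCHS, LIBXML2_OPTIONS)
    print("CFLAGS=" + env["CFLAGS"])
    print(" ".join(argv))
```

Passing the result to subprocess.check_call(argv, env=env, cwd=source_dir), where source_dir is wherever the libxml2 or libxslt tarball was unpacked, would run the same command as the env line above.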

Once I had these universal libraries built and installed, easy_install lxml worked like a charm.