Intermittent Pickle Problems in Jython

The previous post mentioned "integrating" Jython and CPython by transmitting a stream of pickles between the two. I encountered one intermittent problem with this approach, and I'm unsure of its cause. (Hm, and I should probably post this to a Jython mailing list...)

Problem

In Jython I'd pickled the str() of a java.io.StringWriter, into which I'd just written the SD representation of a CDK molecule. Jython created the pickle without complaint. But when I tried to unpickle it in CPython, for some molecules I got a traceback:

File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/pickle.py", line 970, in load_string
    raise ValueError, "insecure string pickle"
ValueError: insecure string pickle

Within my application the error was reproducible: it always occurred on the same input structure. But I couldn't reduce it to a simple test script that demonstrates the problem.

Investigation

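Examination of the problematic pickle data showed that a Python unicode string literal marker, the u prefix, had somehow been inserted, while the opcode for the item was still S (STRING) rather than V (UNICODE). That combination explains the traceback: CPython's load_string expects the argument of an S opcode to be a quoted string literal, and a line beginning with u' fails that check, which is reported as "insecure string pickle".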

...
sS'sdf'
p3
Su'ZINC00000181\n  CDK...
 ^ What the... ?
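To see why the unpickler objects, here is a minimal sketch using hand-written protocol-0 pickles. It assumes Python 3 rather than the CPython 2.5 from the traceback (Python 3 raises pickle.UnpicklingError with different wording instead of "insecure string pickle"), and the byte strings are hypothetical, not the actual data from my application. A well-formed S line and a V line both load, while an S line whose argument starts with u is rejected.

```python
import pickle

# Hypothetical hand-written protocol-0 pickles; '.' is the STOP opcode
# and 'p0' (PUT) just stores the value in the memo, as in the excerpt.
GOOD_STRING = b"S'sdf'\np0\n."   # S + quoted string literal
GOOD_UNICODE = b"Vsdf\np0\n."    # V + raw-unicode-escape text, no quotes
BAD_MIXED = b"Su'sdf'\np0\n."    # S opcode, but a u'...' literal as argument


def try_load(data):
    """Return ('ok', value) or ('error', message) for a pickle byte string."""
    try:
        # In Python 3, STRING data is decoded with loads()' encoding (ASCII by default).
        return ("ok", pickle.loads(data))
    except (ValueError, pickle.UnpicklingError) as exc:
        return ("error", str(exc))


for name, data in [("GOOD_STRING", GOOD_STRING),
                   ("GOOD_UNICODE", GOOD_UNICODE),
                   ("BAD_MIXED", BAD_MIXED)]:
    print(name, try_load(data))
```

Only the BAD_MIXED case fails, which matches the pattern in the excerpt: the u prefix itself is harmless after V, but not after S.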

Workaround

Google turned up a usable workaround: encode the offending string as UTF-8 before pickling it.

import codecs

enc = codecs.getencoder('utf8')
...
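# The encoder returns an (encoded_string, length_consumed) tuple;
# keep only the encoded string.
sdf = enc(sdf)[0]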
...
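For reference, here's a small sketch of the encode-then-pickle round trip. It's written for Python 3 rather than Jython, so it can't reproduce the original bug; it only shows what the workaround does to the data. The helper name encode_for_pickle and the SD fragment are hypothetical. The key detail is that the function returned by codecs.getencoder() yields an (encoded, length) tuple, so only the first element is the UTF-8 byte string; text.encode('utf-8') does the same in one step.

```python
import codecs
import pickle


def encode_for_pickle(text):
    """Return *text* as UTF-8 bytes (hypothetical helper for illustration).

    The function returned by codecs.getencoder() yields an
    (encoded_bytes, length_consumed) tuple, so only element 0 is kept.
    """
    enc = codecs.getencoder('utf8')
    encoded, _consumed = enc(text)
    return encoded


# Hypothetical SD-file fragment; real CDK output is much longer.
sdf = u"ZINC00000181\n  CDK\n"
payload = encode_for_pickle(sdf)

# Sender side: protocol 0 is the text-based pickle format.
wire = pickle.dumps({'sdf': payload}, protocol=0)

# Receiver side: unpickle, then decode the UTF-8 bytes back to text.
restored = pickle.loads(wire)['sdf'].decode('utf-8')
print(restored == sdf)
```

The cost of the workaround is that the receiving side gets a byte string and must decode it itself if it needs text.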