I’ve been working a lot recently on an internal project for a large software company, and we’re doing it in Google App Engine with Google Cloud SQL behind it. I like this combo, because it allows me to code a pretty powerful web application in Python and use a mySQL backend. Also, it’s pretty easy to code locally and deploy to Google’s servers.

But the setup isn’t without its quirks. For one thing, while Google claims that anything that you develop locally will just work when you deploy it to GAE, that is not true at all. Here’s an example:

Our front end is basically static HTML and jQuery, making AJAX calls to a web service that I set up in GAE’s Python layer, which talks to the Google Cloud SQL / mySQL backend. The data on the front end is displayed using Google Chart Tools (formerly Google Visualizations) tables and charts. When a data set is passed to a visualization, the data types of the columns need to be passed in as well. I didn’t want my front-end developer to maintain a list of the column types, since the schema is subject to frequent change anyway. The smart thing seemed to ask the database to pass back the schema of the data sets, including the column types, so I could pass those on to the front-end developer.

This seems deceptively easy when using Google App Engine’s implementation of DB-API with MySQLdb (aka mysql-python): rdbms. When a cursor is returned with data from the database, it includes the schema in the cursor’s description, including, along with the column names, integers that represent the column’s type. So for example:

from google.appengine.api import rdbms
...
# Set up data connection
conn = rdbms.connect(instance=INSTANCE_NAME, database=DATABASE_NAME)
sql = 'SELECT * FROM test_table'
cursor = conn.cursor()

# Get the data
cursor.execute(sql)

# Get the schema of the data returned
for column in cursor.description:
    column_name = column[0]
    column_type = column[1]
...

So in the example above, column[1] returns an integer that corresponds to the column’s type. Seems simple enough. In fact, the standard mappings for the codes are easy enough to find. So what’s the problem?

The problem is that when using Google App Engine with Google Cloud SQL, the codes are different from the standard codes shown above. Here’s a table comparing them:

type MySQLdb (localhost) rdbms (Google App Engine)
varchar 253 12
char 254 1
datetime 12 93
date 10 91
time 11 92
timestamp 7 93
year 13 91
bigint 8 -5
double 5 8
decimal 246 3
float 4 6
mediumint 9 4
int 3 4
smallint 2 5
tinyint 1 -6
longtext 252 2005
mediumtext 252 2005
tinytext 252 12
text 252 12
bit 16 -7

As you can see, the type codes aren’t logically related or close in any way. So how can the code reliably test for the column type if the codes are totally different on the server and in my dev environment?

Well, I ended up cheating. The code of my application tests whether the current URL contains “localhost” or not. If so, it uses one set of type codes, and if not, the other set of type codes are used.

I’m open to better ideas.

UPDATE: September 6, 2012: So I ended up posting the issue to the Google Cloud SQL forum, and I got an official answer from Google: it’s a bug.  They were returning the JDBC codes instead of the MySQLdb codes.  They plan to fix this, so this problem will eventually go away.  However, Google can’t tell me when the switch will take place, and can’t guarantee that the fix won’t be rolled back for one reason or another.

So, faced with type code mappings that will fluctuate unpredictably, I had to modify the Python code to determine the db status by explicitly testing a known type against its known MySQLdb type code.  Specifically, the Python code checks that a field typed as a decimal returns 246 (the standard MySQLdb type code for decimal), in which case Python uses the MySQLdb mappings.  If anything else comes back, the Python code assumes that the JDBC codes are being used.

The Python code looks something like this:

def _is_using_mysqldb_type_codes(self):
    conn = rdbms.connect(instance=self._instance_name,
                         database=self._database_name)

    sql = 'SELECT cast(1.0 as decimal) as my_decimal'
    cursor = conn.cursor()
    cursor.execute(sql)

    column = cursor.description[0]
    type_code = column[1]
    if type_code == 246:
        return True

    return False

 

This might actually qualify as the “better idea”, since I won’t have to worry about changing any code when Google does actually make the switch.

A hearty “thank you” to Ken Ashcraft at Google for actually getting back to me on the forum and logging the bug.

 

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Set your Twitter account name in your settings to use the TwitterBar Section.