Psycopg and Python: Perfect Together

106 9
Introduction

As a complement to my tutorial on inserting data into a PostgreSQL database, we will insert data into a PostgreSQL database in this tutorial. While MySQL is perhaps the most popular database system of the day, particularly for SMBs (small and medium businesses), it is dwarfed in its power and flexibility by PostgreSQL. As mentioned previously, MySQL may be free as in gratis (i.e., cost-free) but it is not free in terms of libre (i.e., ability to modify it).

PostgreSQL is free in both aspects and is powerful enough to rival a certain DBMS that begins with an "O".

This tutorial presupposes a basic familiarity with SQL. It will show how to read an entire PostgreSQL database without need for a function or class.

Psycopg and Python: Perfect Together

The module we will use for this tutorial is again psycopg. It is available from http://www.initd.org/projects/psycopg1. So download it and install it using the directions that come with the package.

Once it is installed, you can import it like any other module:

# libs for database interfaceimport psycopg
If you would like your program to take input from the keyboard, you will also want to import the sys module.

In this tutorial, I will illustrate how every major part of the SQL statement can be input from the keyboard. So, let your import line read as follows:

import sys, psycopg

Receiving Command Line Arguments for the PostgreSQL Statement - I

Before opening a connection to the database, we should attend to the variables we would like to define. As we intend to read from the database, we will be using a SELECT statement; in our case, we will ask for all data that matches a given string to be returned unformatted. In the interest of increased flexibility on our returned data, we will also use a WHERE clause. So the essential skeleton of the statement to be run looks like this:

SELECT * FROM <table> WHERE <column> <operator> <string> As you can probably guess, we will be inputting four pieces of data:
  • table: the name of the table from the given database
  • column: the name of the column within which the operation is to be performed
  • operation: the actual operation to be performed; this may be any of the unary or boolean operations or a simple "IS" or "IS NOT"
  • string: the string to be evaluated against the data
SQL gurus will rightly object at the absence of the semi-colon to end the statement. Psycopg, however, takes care of line termination signals for us.


Receiving Command Line Arguments for the PostgreSQL Statement - II

With that understanding, we then need to assign values to these variables using the sys module. The sys module has an attribute argv which is an array holding arguments from when the program is executed. By way of example, when one uses the shell command 'mkdir', the name of the directory to be created is the first (and only) argument of the command. If using Python's sys.argv, this argument would be sys.argv[1] -- the name by which the program is executed is always sys.argv[0].

Each additional argument follows in sequence.

Our program will take four arguments, one for each part of the SELECT statement to be made.

table = sys.argv[1]column = sys.argv[2]string = sys.argv[3]operation = sys.argv[4]

Connecting to PostgreSQL Through Psycopg

To open a connection to a database, psycopg needs two arguments: the name of the database ('dbname') and the name of the user ('user'). If the program is to be executed in the name of a user other than the one used for the PostgreSQL account, you will also need to use the 'password=' option. The syntax for opening a connection follows this format:

<variable name for connection> = psycopg.connect('dbname=<dbname>', 'user=<user>')

For our database, we shall use the database name 'Melange' and the username 'tempsql'. For the connection object within the program, let's use the variable 'connection'. As mentioned, we are writing this program without classes and without any other function than main(). So, the beginning of main(), including our connection command will read as follows:

def main():connection = psycopg.connect('dbname=Melange', 'user=tempsql')
Naturally, this command will only work if both variables are accurate: there must be a real database named 'Melange' to which a user named 'tempsql' has access. If either of these conditions are not filled, Python will throw an error.

Next, Python likes to be able to keep track of where it last left off in reading and writing to the database. In psycopg, this is called the cursor, but we will use the variable 'mark' for our program. So, we can then construct the following assignment:

mark = connection.cursor()

Now we can define the statement we would have executed. Since the variables are already defined at runtime (i.e., when the program is executed), we can create the statement and plug the variables in like concatenating a string.

statement = 'SELECT * FROM ' + table + ' WHERE ' + column + ' ' + operator + ' ' + string Do note that this statement will work for any value. When using this statement one must supply the quotes for any character strings.

However, if one wants to match character strings alone, not allowing for numerical calculations, one may supply the quotes within the statement itself.
statement = 'SELECT * FROM ' + table + ' WHERE ' + column + ' ' + operator + ' \'' + string +\ '\'' If you use this statement instead of the previous one, it is a good idea to evaluate the variable operator and to reject any numerical operators. Otherwise, PostgreSQL, and therefore both psycopg and Python, will throw an error.

Next, we need to tell psycopg to pass the statement to PostgreSQL. We do this by use the method 'execute', a method of connection.cursor(). The next statement thus looks like this:

mark.execute(statement) The data returned will be an array of lists, one list for every line returned.
Given how flexible Python tends to be, one might ask why we define statement separately instead of passing its contents directly to execute.

By defining statement separately, one is able to debug the program with fewer complications by simply inserting a print command at the appropriate point.

Consider, for example: PostgreSQL keeps throwing an error (e.g., "ERROR: syntax error at or near "' "]). You look at the SELECT statement a thousand times and still cannot figure out what is off. If you have embedded the statement, you have no way of printing the argument and thereby seeing the SELECT statement from the computer's perspective. If you define it separately, you can print it and better grasp where things are awry. In this way, form is kept separate from function.

Now, after executing the statement, we need a container into which psycopg can pour the results. We shall call this 'records'. One assigns the results to records using the 'fetchall' method as follows:

record = mark.fetchall() 'record' is an array holding the lists returned by execute. [Note that, if one merely wants the first hit, one can use the method 'fetchone' similarly.]
Having dumped the results into an array, we now need to extract them in an orderly fashion.

The simplest and neatest way of doing this is with a 'for' loop:

for i in record:print i
Note that I say 'neatest' from the perspective of programming, not of viewing the data. One naturally programs for functionality, but one must always keep in mind the person who will need to read, modify, or debug your program in six months or a year from its creation and avoid spaghetti code.

This loop will return the records in list form. The user will probably not like this format, but I leave it to you to configure the output according to their needs.

Finally, we should return Python's attention from the main() function and finish off the program.

return if __name__ == '__main__':main() As usual, the last loop evaluates the runtime command from the user and passes Python's attention to main().
One can now call the program with the necessary four arguments.
python ./readpostgresql.py cornucopia id > 0 The results are tasty:
(1, 'banana')
(2, 'kiwi')
(3, 'nectarine')
Using this knowledge, you can easily automate access to several PostgreSQL databases and collate the data for any purpose. With my tutorial on inserting data, one can even create new databases "automagically".


Source...
Subscribe to our newsletter
Sign up here to get the latest news, updates and special offers delivered directly to your inbox.
You can unsubscribe at any time

Leave A Reply

Your email address will not be published.