Python, SAX and creating instances of your classes from it
So I made up a small XML-Snippet to send data of of probes around. Basically it's just the information
which host collected data of
when it was and
what it's value is.
What we could end up with in the worst case is an XML document of quite a few megabytes. Now to efficiently parse that thing I wanted to use something that doesn't need to keep the whole document in memory.
I chose to use sax as I didn't want to jump into XML to deep, and a bit of googling around suggestedSo what's the simplest way of using this SAX thing with python?
First let's define some sample data
Now our first draft of using the SAX module in python. What you do with SAX essentially is:
import the needed modules
make a subclass of sax.ContentHandler
Later we will implement the methods for:
what to do when the document starts
what to do when the document ends
what to do when an element starts
what to do when an element ends
what to do with content of elements
No that code doesn't really do much except that it doesn't generate errors. Let's enhance it so that we at least get some info on what's happening
If you run this code you should be able to see how SAX works, it's actually just executing the method defined for an event. Now the good news is you don't need to worry about memory (mostly, that is). The bad news is that you don't really have any object like access so you need some sort of state to know when to do what. Let's create a simple class that will hold the values of your items defined in the XML sample
A simple data structure like Item class. The hard part with SAX now is that you don't have access to the necessary parameters without some work to create an instance of the Item class. Let's fix that by first doing a step backwards and writing some documentation for the ItemListParser class so that we have a plan of what we want.
That's nice, but our script is back to a noop. So let's implement what we just defined.
Now we are all good to go, but still we don't have anything done to our Item class that holds the data we encountered. Let's change that and at least create a single Item that we can print. To do that just modify the endelement method to look like the following:
If we now call our python script with a file that contains our sample XML you should get the following output: