Wednesday, September 18, 2013

Yi Jiang's note, 20130918

Yi Jiang's note:

I found that these two methods are what we need to parse those huge XML files: cElementTree.iterparse() and ElementTree.iterparse().
(1) http://effbot.org/zone/celementtree.htm
(2) http://effbot.org/zone/element-iterparse.htm

Both parse the XML file incrementally, loading a small piece of it into memory at a time instead of the whole document, which avoids using too much memory. The iterparse() method in the cElementTree library is faster than the one in the ElementTree library.
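
As a rough illustration of the incremental parsing described above, here is a minimal sketch. It assumes Python 3, where xml.etree.ElementTree is the import to use (cElementTree was the separate C-accelerated module in Python 2). The sample data, the "record" tag and the iter_record_ids() helper are made up for the example and are not from our real files.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample data standing in for one of the huge XML files.
SAMPLE = b"<records><record id='1'>a</record><record id='2'>b</record></records>"


def iter_record_ids(source):
    """Yield the id attribute of each <record> once it has been read completely."""
    # An "end" event is reported when the closing tag of an element is reached,
    # so the element (including its text and children) is complete at that point.
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "record":
            yield elem.get("id")


ids = list(iter_record_ids(io.BytesIO(SAMPLE)))
```

For a real file, a filename or an open file object can be passed instead of the io.BytesIO wrapper.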

There are two questions I need to figure out:
1. How to remove items.
Once an item is parsed, it needs to be removed to release the memory. This means releasing not only the content within the item but also the item itself. We also need to remove the items that we are not interested in.
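
One possible way to do this, following the pattern on the effbot pages, is to keep a reference to the root element and clear it after each item. This is only a sketch under assumptions: the "record", "noise" and "title" tags, the sample data and the extract_titles() helper are hypothetical, and it assumes the items are direct children of the root.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample: two interesting <record> items and one <noise> item.
SAMPLE = b"""<records>
  <record id="1"><title>first</title></record>
  <noise>not needed</noise>
  <record id="2"><title>second</title></record>
</records>"""


def extract_titles(source):
    """Collect record titles while releasing every item after it is handled."""
    titles = []
    context = ET.iterparse(source, events=("start", "end"))
    # The first event is the "start" of the root element; keep a reference to it.
    _, root = next(context)
    for event, elem in context:
        if event != "end" or elem is root:
            continue
        if elem.tag == "record":
            titles.append(elem.findtext("title"))
        if elem.tag in ("record", "noise"):
            # elem.clear() releases the content of the item;
            # root.clear() drops the (now empty) item itself from the tree.
            elem.clear()
            root.clear()
    # Number of children still attached to the root after parsing.
    return titles, len(root)


titles, remaining = extract_titles(io.BytesIO(SAMPLE))
```

Items we are not interested in (here "noise") are cleared the same way, just without reading anything from them first.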

2. Namespaces
Some documents discuss how to solve problems related to namespaces. I will try to figure out whether they will be a problem in our project.
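
To see what the namespace issue looks like in practice: ElementTree reports a namespaced tag as "{uri}localname", so a plain comparison such as elem.tag == "record" no longer matches. The sketch below shows this and one simple workaround. The namespace URI, the sample data and the local_name() helper are hypothetical and only for illustration.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical namespaced sample; the URI is made up for the example.
SAMPLE = (
    b'<r:records xmlns:r="http://example.com/ns">'
    b"<r:record>a</r:record>"
    b"</r:records>"
)


def local_name(tag):
    """Strip the '{uri}' prefix that ElementTree puts in front of namespaced tags."""
    return tag.rsplit("}", 1)[-1]


# With the default "end" events, the inner element is reported before the root.
tags = [elem.tag for _, elem in ET.iterparse(io.BytesIO(SAMPLE))]
local_tags = [local_name(tag) for tag in tags]
```

If our files do not declare any namespace, the tags come through unchanged and local_name() returns them as they are.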
