Yi Jiang's note:
I found that these two methods are what we need to parse those huge
XML files: cElementTree.iterparse() and ElementTree.iterparse().
(1) http://effbot.org/zone/celementtree.htm
(2) http://effbot.org/zone/element-iterparse.htm
Both parse the XML file incrementally, loading only a small piece of it
into memory at a time, which keeps memory use low. The iterparse()
method in the cElementTree library is faster than the one in the
ElementTree library.
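As a rough illustration of the incremental style, here is a minimal sketch using iterparse(). The sample data and the "item" tag are made up for the example. It imports xml.etree.ElementTree, because in Python 3 that module uses the C accelerator automatically when it is available, and the separate cElementTree module was removed in Python 3.9.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample data standing in for a huge XML file.
SAMPLE = b"<root><item id='1'>a</item><item id='2'>b</item></root>"


def iter_item_ids(source):
    """Yield the id attribute of each <item> once its end tag has been read."""
    # "end" events fire when an element is complete, so its attributes,
    # text and children are all available at that point.
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "item":
            yield elem.get("id")


ids = list(iter_item_ids(io.BytesIO(SAMPLE)))
print(ids)
```

On its own this loop still builds the whole tree in memory as it goes; it only shows how elements are handed over one at a time.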
There are two questions I need to figure out:
1. How to remove items.
Once an item is parsed, it needs to be removed to release the memory.
What has to be released includes not only the content within the item
but also the item itself. We also need to remove the items that we are
not interested in.
2. Namespaces.
Some documents discuss how to solve problems related to namespaces. I
will try to figure out whether they will be a problem in our project.
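One possible way to approach both questions is sketched below, following the pattern described on the element-iterparse page linked above: keep a reference to the root element and clear it after each top-level child is finished. That drops both the child's content and the child itself, whether or not it was interesting. The namespace URI, the tag names and the iter_titles() helper are assumptions made up for this example. For namespaces, ElementTree reports tags in the form "{uri}localname", so the sketch compares against that form.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical namespaced sample; the URI and tag names are made up.
SAMPLE = (
    b'<feed xmlns="http://example.com/ns">'
    b"<entry><title>first</title></entry>"
    b"<skip>not interesting</skip>"
    b"<entry><title>second</title></entry>"
    b"</feed>"
)
NS = "{http://example.com/ns}"  # ElementTree prefixes tags with "{uri}"


def iter_titles(source):
    """Yield the title of each <entry>, discarding every finished child of the root."""
    context = ET.iterparse(source, events=("start", "end"))
    # The first event is the start of the root element; keep a handle on it.
    _, root = next(context)
    depth = 0  # nesting level below the root
    for event, elem in context:
        if event == "start":
            depth += 1
            continue
        depth -= 1
        if depth != 0:
            continue  # nested element (or the root's own end); leave it alone
        # elem is now a complete direct child of the root.
        title = elem.findtext(NS + "title") if elem.tag == NS + "entry" else None
        # Detach all finished children from the root, so both the item and
        # its content can be garbage-collected, interesting or not.
        root.clear()
        if title is not None:
            yield title


titles = list(iter_titles(io.BytesIO(SAMPLE)))
print(titles)
```

Clearing only at the top level means a nested element such as the title is still intact when its parent entry is processed. Whether this is enough for our files depends on how they are structured, for example whether all the interesting items are direct children of the root.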