I have an XML file that contains 100 documents. Each block looks like this:
<DOC>
<DOCNO> FR940104-2-00001 </DOCNO>
<PARENT> FR940104-2-00001 </PARENT>
<TEXT>
<!-- PJG FTAG 4703 -->
<!-- PJG STAG 4703 -->
<!-- PJG ITAG l=90 g=1 f=1 -->
<!-- PJG /ITAG -->
<!-- PJG ITAG l=90 g=1 f=4 -->
Federal Register
<!-- PJG /ITAG -->
<!-- PJG ITAG l=90 g=1 f=1 -->
/ Vol. 59, No. 2 / Tuesday, January 4, 1994 / Notice
<!-- PJG 0012 frnewline -->
<!-- PJG /ITAG -->
<!-- PJG ITAG l=01 g=1 f=1 -->
Vol. 59, No. 2
<!-- PJG 0012 frnewline -->
<!-- PJG /ITAG -->
<!-- PJG ITAG l=02 g=1 f=1 -->
Tuesday, January 4, 1994
<!-- PJG 0012 frnewline -->
<!-- PJG 0012 frnewline -->
<!-- PJG /ITAG -->
<!-- PJG /STAG -->
<!-- PJG /FTAG -->
</TEXT>
</DOC>
I have to load this XML document into a dictionary named text, with the DOCNO as the key and the text inside the TEXT tag as the value. All the comments inside that text should be ignored. Example: text['FR940104-2-00001'] should contain

Federal Register / Vol. 59, No. 2 / Tuesday, January 4, 1994 / Notice Vol. 59, No. 2 Tuesday, January 4, 1994

I have written this code:
# doc is the parsed DOM document; docno is a list and text is a dict, both created earlier
l = doc.getElementsByTagName("DOCNO")
for node2 in l:
    for node3 in node2.childNodes:
        if node3.nodeType == Node.TEXT_NODE:
            docno.append(node3.data)
            #print node2.data
l = doc.getElementsByTagName("TEXT")
i = 0
for node2 in l:
    for node3 in node2.childNodes:
        if node3.nodeType == Node.TEXT_NODE:
            text[docno[i]] = node3.data
    i = i + 1

Surprisingly, with my code I'm getting text['FR940104-2-00001'] as u'\n'. How come? How do I get what I want?
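Each comment splits the content of TEXT into separate text nodes, and your loop overwrites text[docno[i]] with every one of them, so only the last node (the newline before the closing tag) survives. You could use a SAX parser instead, which never passes comments to the content handler: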
import xml.sax
import xml.sax.handler
import collections

class DocBuilder(xml.sax.handler.ContentHandler):
    def __init__(self):
        xml.sax.handler.ContentHandler.__init__(self)
        self.state = ''   # name of the most recently opened element
        self.docno = ''   # DOCNO of the document currently being read
        self.text = collections.defaultdict(list)

    def startElement(self, name, attrs):
        self.state = name

    def endElement(self, name):
        # A closing TEXT tag ends the document's content, so reset the key.
        if name == u'TEXT':
            self.docno = ''

    def characters(self, content):
        # Comments never reach this method; skip whitespace-only chunks.
        content = content.strip()
        if content:
            if self.state == u'DOCNO':
                self.docno += content
            elif self.state == u'TEXT':
                self.text[self.docno].append(content)

with open('test.xml') as f:
    data = f.read()

builder = DocBuilder()
xml.sax.parseString(data, builder)
for key, value in builder.text.items():
    print('{k}: {v}'.format(k=key, v=' '.join(value)))
# FR940104-2-00001: Federal Register / Vol. 59, No. 2 / Tuesday, January 4, 1994 / Notice Vol. 59, No. 2 Tuesday, January 4, 1994
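If you would rather stay with the DOM approach from the question, a minimal sketch along these lines should also work: instead of overwriting the dictionary entry for every text node, it joins all the non-empty text nodes found directly under TEXT. The helper names direct_text and load_docs, the SAMPLE string, and the DOCS root element are made up for illustration, and the sample is a shortened version of the block in the question, so adjust them to your real file.

```python
from xml.dom.minidom import parseString, Node

# Hypothetical, shortened sample. The DOCS root element is an assumption:
# a well-formed XML file needs a single root around the DOC blocks.
SAMPLE = """<DOCS>
<DOC>
<DOCNO> FR940104-2-00001 </DOCNO>
<PARENT> FR940104-2-00001 </PARENT>
<TEXT>
<!-- PJG FTAG 4703 -->
<!-- PJG ITAG l=90 g=1 f=4 -->
Federal Register
<!-- PJG /ITAG -->
<!-- PJG ITAG l=90 g=1 f=1 -->
/ Vol. 59, No. 2 / Tuesday, January 4, 1994 / Notice
<!-- PJG /ITAG -->
<!-- PJG /FTAG -->
</TEXT>
</DOC>
</DOCS>"""


def direct_text(element):
    """Join the stripped, non-empty text nodes directly under element.

    Comment nodes have a different nodeType, so they are skipped.
    """
    parts = []
    for child in element.childNodes:
        if child.nodeType == Node.TEXT_NODE:
            chunk = child.data.strip()
            if chunk:
                parts.append(chunk)
    return ' '.join(parts)


def load_docs(xml_string):
    """Build a {DOCNO: text} dictionary from a string of DOC blocks."""
    doc = parseString(xml_string)
    text = {}
    for doc_el in doc.getElementsByTagName("DOC"):
        # Look up DOCNO and TEXT inside each DOC so they cannot get out of step.
        docno = direct_text(doc_el.getElementsByTagName("DOCNO")[0])
        text[docno] = direct_text(doc_el.getElementsByTagName("TEXT")[0])
    return text


text = load_docs(SAMPLE)
print(text['FR940104-2-00001'])
```

Pairing DOCNO and TEXT per DOC element also removes the need for the separate docno list and the index counter, which would drift if any document lacked one of the two tags.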