Maintaining a top-k set in Java


Can't think of a neat way of doing this in Java:

I'm streaming sets of strings, line by line, from a file:

  s1 s2 s3 s4 s5 s6 s7 s8 s9 s10 ...  

I read a line into a TreeSet, do some analysis, throw it away and move on to the next line... I can fit the content of individual lines in memory, but not everything.

Now I want to maintain the top 5 biggest sets of strings I encounter during the scan (without storing anything else).

I'm thinking of a PriorityQueue with a SetSizeComparator, with add / poll once the queue reaches size 5. Has anyone got a neater solution?
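For reference, the PriorityQueue idea above might look roughly like this. It is only a sketch: the class name TopKSets and its methods are made up for illustration, and BY_SIZE stands in for the SetSizeComparator mentioned above. The queue is a min-heap on set size, so once it grows past k the smallest retained set is the one that gets polled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: keeps only the k largest sets offered so far.
class TopKSets {
    // Stand-in for the SetSizeComparator from the question: orders sets by size.
    private static final Comparator<Set<String>> BY_SIZE = Comparator.comparingInt(Set::size);

    private final int k;
    // Min-heap: the smallest retained set sits at the head.
    private final PriorityQueue<Set<String>> heap = new PriorityQueue<>(BY_SIZE);

    TopKSets(int k) {
        this.k = k;
    }

    // Add the new set, then drop the smallest one if more than k are held.
    void offer(Set<String> line) {
        heap.add(line);
        if (heap.size() > k) {
            heap.poll();
        }
    }

    // Snapshot of the retained sets, largest first.
    List<Set<String>> largestFirst() {
        List<Set<String>> out = new ArrayList<>(heap);
        out.sort(BY_SIZE.reversed());
        return out;
    }

    public static void main(String[] args) {
        TopKSets top = new TopKSets(2);
        for (String line : new String[] {"a b c", "a", "a b c d e", "a b"}) {
            top.offer(new TreeSet<>(Arrays.asList(line.split("\\s+"))));
        }
        System.out.println(top.largestFirst());
    }
}
```

The queue briefly holds k + 1 sets between add and poll, so memory stays bounded by k + 1 lines. Sets of equal size are kept or dropped in no particular order.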

(I can't brain today. I have the dumb...)

  • Create a tuple, say LineTuple, that holds a line and its string frequency (the number of strings on that line).

  • Keep a min-heap of LineTuples, with a comparator that compares the frequency values.

  • For the first k lines, insert them into the heap.

  • From the (k + 1)st line onward, compare the current line's frequency with that of the root, i.e. the tuple with the minimum frequency. If the current line's is larger (otherwise just skip the line):

    • Remove the root from the heap. (This operation is O(lg k).)
    • Create a tuple with the current line and insert it into the heap. (This operation is constant time on average, O(lg k) in the worst case.)
  • At any point in time, the k tuples in the heap are the k largest lines seen so far.

I'm not fluent in Java, so I can't provide a code sample, but check it out.
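A rough Java sketch of the steps above, in case it helps. The names TopKLines and topK are made up for illustration, and "frequency" is assumed here to mean the number of whitespace-separated strings on a line (duplicates counted, unlike the TreeSet in the question). A StringReader stands in for the real file.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.Comparator;
import java.util.PriorityQueue;

class TopKLines {
    // A line paired with its frequency (assumed: count of whitespace-separated strings).
    static final class LineTuple {
        final String line;
        final int count;

        LineTuple(String line) {
            this.line = line;
            String trimmed = line.trim();
            this.count = trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
        }
    }

    // Returns a min-heap holding the k lines with the highest counts.
    static PriorityQueue<LineTuple> topK(BufferedReader in, int k) {
        PriorityQueue<LineTuple> heap =
                new PriorityQueue<>(Comparator.comparingInt((LineTuple t) -> t.count));
        if (k <= 0) {
            return heap;
        }
        // lines() reads lazily, so only one line plus the heap is in memory.
        in.lines().forEach(line -> {
            LineTuple current = new LineTuple(line);
            if (heap.size() < k) {
                heap.add(current);              // first k lines go straight in
            } else if (current.count > heap.peek().count) {
                heap.poll();                    // evict the current minimum
                heap.add(current);              // insert the larger line
            }
        });
        return heap;
    }

    public static void main(String[] args) {
        String data = "a b c\na\na b c d e\na b\n";
        PriorityQueue<LineTuple> heap = topK(new BufferedReader(new StringReader(data)), 2);
        // Draining the heap yields the retained lines from smallest to largest.
        while (!heap.isEmpty()) {
            LineTuple t = heap.poll();
            System.out.println(t.count + ": " + t.line);
        }
    }
}
```

Peeking at the root before polling means lines smaller than everything already kept cost only one comparison and never touch the heap.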

