optimization - Optimizing a mean computation in Python


I have a function that updates the centroid (mean) in a k-means algorithm. I ran a profiler and saw that this function takes a lot of the computing time.

It looks like this:

  def updateCentroid(self, label):
      x = []
      y = []
      for point in self.clusters[label].points:
          x.append(point.x)
          y.append(point.y)
      self.clusters[label].centroid.x = numpy.mean(x)
      self.clusters[label].centroid.y = numpy.mean(y)

So I wonder: is there a more efficient way to calculate the mean of these points? If not, is there a more elegant way to write it? ;)
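For reference, one vectorized variant of the loop above would build the coordinate arrays in a single pass and let numpy do the averaging. This is a minimal sketch with a hypothetical `Point` class standing in for the question's point objects:

```python
import numpy

class Point:
    """Minimal stand-in for the question's point objects (hypothetical)."""
    def __init__(self, x, y):
        self.x = x
        self.y = y

def centroid_of(points):
    # Collect each coordinate into a float array without an intermediate
    # Python list, then average in C instead of a Python-level loop.
    xs = numpy.fromiter((p.x for p in points), dtype=float, count=len(points))
    ys = numpy.fromiter((p.y for p in points), dtype=float, count=len(points))
    return xs.mean(), ys.mean()
```

This still iterates over the point objects once per coordinate, so the real win comes only when the points are stored as arrays to begin with.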

Edit:

Thanks for all the great responses! I was wondering whether I could compute the mean cumulatively, with something like:

  x_bar(t) = ((n-1)/n) * x_bar(t-1) + x(n)/n

where x_bar(t) is the new mean and x_bar(t-1) is the old mean.

The resulting code would be something like:

  def updateCentroid(self, label):
      cluster = self.clusters[label]
      n = len(cluster.points)
      cluster.centroid.x *= (n - 1) / n
      cluster.centroid.x += cluster.points[n - 1].x / n
      cluster.centroid.y *= (n - 1) / n
      cluster.centroid.y += cluster.points[n - 1].y / n

This isn't really working, but do you think it could work with some tweaking?
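The running-mean idea can be made to work. One pitfall in the posted version: under Python 2, `(n-1)/n` is integer division and truncates to 0, silently wiping out the old mean. A sketch with explicit float division (function and parameter names are illustrative, not from the question):

```python
def add_point_to_mean(mean_x, mean_y, n, x, y):
    """Return the updated centroid after appending the n-th point (1-based).

    Implements x_bar(t) = ((n-1)/n) * x_bar(t-1) + x(n)/n.
    The explicit floats matter: with Python 2 integers, (n-1)/n would
    truncate to 0 and discard the old mean entirely.
    """
    w = (n - 1.0) / n
    return w * mean_x + x / float(n), w * mean_y + y / float(n)
```

With `n == 1` the weight `w` is 0, so the first point simply becomes the mean; no special case is needed.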

The k-means algorithm is already implemented in scipy. If there is something about that implementation you are trying to change, I suggest starting by studying the code there:

  In [62]: import scipy.cluster.vq as scv
  In [64]: scv.__file__
  Out[64]: '/usr/lib/python2.6/dist-packages/scipy/cluster/vq.pyc'

PS. Because your posted algorithm keeps the data behind a dict (self.clusters) and attribute lookups (.points), you are forced to use slow Python looping to get at your data. A big speed gain could be achieved by using numpy arrays instead. For ideas on a better data structure, see the scipy implementation of k-means clustering.
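As a sketch of what such a layout could look like (storing each cluster's points as an (N, 2) array is an assumption here, not the asker's actual structure):

```python
import numpy

# Hypothetical layout: one (N, 2) float array per cluster,
# rows are points, columns are the x and y coordinates.
points = numpy.array([[0.0, 0.0],
                      [2.0, 4.0],
                      [4.0, 2.0]])

# The centroid is then a single vectorized reduction, no Python loop:
centroid = points.mean(axis=0)   # array([2., 2.])
```

Appending points cheaply then becomes the design question (e.g. preallocating and tracking a fill count), but the centroid update itself is one line.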

