arrays - Subsetting data in Python


For some Python code I am writing, I want to use the equivalent of the `subset` command in R.

Here's my data:

    col1    col2  col3  col4  col5
    100002  2006  1.1   0.01  6352
    100002  2006  1.2   0.84  304518
    100002  2006  2     1.52  148219
    100002  2007  1.1   0.01  6292
    10002   2006  1.1   0.01  5968
    10002   2006  1.2   0.25  104318
    10002   2007  1.1   0.01  6800
    10002   2007  4     2.03  25446
    10002   2008  1.1   0.01  6408

I want to subset the data based on the contents of col1 and col2. (The unique values in col1 are 100002 and 10002, and in col2 are 2006, 2007 and 2008.)

This can be done in R using the `subset` command. Is there something similar in Python?

While the iterator-based answers are perfectly fine, if you are working with numpy arrays (as you mention that you are), there is a better and faster way of selecting things:

    import numpy as np

    data = np.array([[100002, 2006, 1.1, 0.01, 6352],
                     [100002, 2006, 1.2, 0.84, 304518],
                     [100002, 2006, 2, 1.52, 148219],
                     [100002, 2007, 1.1, 0.01, 6292],
                     [10002, 2006, 1.1, 0.01, 5968],
                     [10002, 2006, 1.2, 0.25, 104318],
                     [10002, 2007, 1.1, 0.01, 6800],
                     [10002, 2007, 4, 2.03, 25446],
                     [10002, 2008, 1.1, 0.01, 6408]])

    # A boolean mask on the first column selects the matching rows
    subset1 = data[data[:, 0] == 100002]
    subset2 = data[data[:, 0] == 10002]

This yields

subset1:

    array([[1.00002e+05, 2.006e+03, 1.10e+00, 1.00e-02, 6.352e+03],
           [1.00002e+05, 2.006e+03, 1.20e+00, 8.40e-01, 3.04518e+05],
           [1.00002e+05, 2.006e+03, 2.00e+00, 1.52e+00, 1.48219e+05],
           [1.00002e+05, 2.007e+03, 1.10e+00, 1.00e-02, 6.292e+03]])

subset2:

    array([[1.0002e+04, 2.006e+03, 1.10e+00, 1.00e-02, 5.968e+03],
           [1.0002e+04, 2.006e+03, 1.20e+00, 2.50e-01, 1.04318e+05],
           [1.0002e+04, 2.007e+03, 1.10e+00, 1.00e-02, 6.800e+03],
           [1.0002e+04, 2.007e+03, 4.00e+00, 2.03e+00, 2.5446e+04],
           [1.0002e+04, 2.008e+03, 1.10e+00, 1.00e-02, 6.408e+03]])

If you do not know the unique values in the first column beforehand, you can find them with either `np.unique` or the built-in function `set`.
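To find the distinct values in the first column, something along these lines should work. This is only a minimal sketch: the four-row `data` array is a shortened sample in the same layout as the question's data, and the names `unique_np`, `unique_set` and `subsets_by_col1` are made up for illustration.

```python
import numpy as np

# Shortened sample in the same column layout as the question (col1..col5).
data = np.array([
    [100002, 2006, 1.1, 0.01, 6352],
    [100002, 2007, 1.1, 0.01, 6292],
    [10002, 2006, 1.1, 0.01, 5968],
    [10002, 2008, 1.1, 0.01, 6408],
])

col1 = data[:, 0]

# np.unique returns the distinct values as a sorted array.
unique_np = np.unique(col1)

# The built-in set gives the same values, but unordered.
unique_set = set(col1)

# Build one subset per distinct value in the first column.
subsets_by_col1 = {val: data[col1 == val] for val in unique_np}
```

`np.unique` is usually the more convenient of the two here, since its result is sorted and stays a numpy array.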

Edit: I just realized that you want to select the data where you have unique combinations of two columns. In that case, you can do something like this:

    import itertools

    col1 = data[:, 0]
    col2 = data[:, 1]

    subsets = {}
    # Try every (col1, col2) pairing and keep the ones that match any rows
    for val1, val2 in itertools.product(np.unique(col1), np.unique(col2)):
        subset = data[(col1 == val1) & (col2 == val2)]
        if len(subset):  # skip combinations that do not occur in the data
            subsets[(val1, val2)] = subset

(I am storing the subsets in a dictionary, with the key being a tuple of the combination. There are certainly other ways to do this, and better ones, depending on what you are doing!)
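As one possible alternative, `np.unique` with `axis=0` and `return_inverse=True` finds only the combinations that actually occur, so there are no empty pairings to skip. This is a sketch rather than a recommendation: it assumes a NumPy version that supports the `axis` argument of `np.unique`, and the five-row `data` array is a shortened sample of the question's data.

```python
import numpy as np

# Shortened sample in the same column layout as the question (col1..col5).
data = np.array([
    [100002, 2006, 1.1, 0.01, 6352],
    [100002, 2006, 1.2, 0.84, 304518],
    [100002, 2007, 1.1, 0.01, 6292],
    [10002, 2006, 1.1, 0.01, 5968],
    [10002, 2008, 1.1, 0.01, 6408],
])

# combos: the distinct (col1, col2) rows that occur in the data.
# inverse: for every row of data, the index of its combination in combos.
combos, inverse = np.unique(data[:, :2], axis=0, return_inverse=True)
inverse = inverse.ravel()  # guard: some NumPy versions return a 2-D inverse

# Key each subset by a plain tuple of floats.
subsets = {
    (float(combo[0]), float(combo[1])): data[inverse == i]
    for i, combo in enumerate(combos)
}
```

Here `subsets[(100002.0, 2006.0)]` holds the two rows with col1 equal to 100002 and col2 equal to 2006.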

