Add a field to a pandas DataFrame with MultiIndex columns
I have looked for an answer to this question, since it seems fairly simple, but I have not found anything yet. Apologies if I have missed something. I have pandas version 0.10.0 and I have been experimenting with data of the following form:
import pandas
import numpy as np
import datetime
start_date = datetime.datetime(2009,3,1,6,29,59)
r = pandas.date_range(start_date, periods=12)
cols_1 = ['AAPL', 'AAPL', 'GOOG', 'GOOG', 'GS', 'GS']
cols_2 = ['close', 'rate', 'close', 'rate', 'close', 'rate']
dat = np.random.randn(12, 6)
cols = pandas.MultiIndex.from_arrays([cols_1, cols_2], names=['ticker','field'])
dftst = pandas.DataFrame(dat, columns=cols, index=r)
print dftst
ticker                   AAPL                GOOG                  GS
field                   close      rate     close      rate     close      rate
2009-03-01 06:29:59  1.956255 -2.074371 -0.200568  0.759772 -0.951543  0.514577
2009-03-02 06:29:59  0.069611 -2.684352 -0.310006  0.730205 -0.302949 -0.830452
2009-03-03 06:29:59  2.077130 -0.903784  0.449857 -1.357464 -0.469572 -0.008757
2009-03-04 06:29:59  1.585358 -2.063672  0.600889 -1.741606 -0.299875  0.565253
2009-03-05 06:29:59  0.269123  0.226593  1.132663  0.485035  0.796858 -0.423112
2009-03-06 06:29:59  0.094879 -1.040069  0.613450 -0.175266 -0.065172  3.374658
2009-03-07 06:29:59 -1.255167 -0.326474  0.437053 -0.231594  0.437703 -0.256811
2009-03-08 06:29:59  0.115454 -1.096841 -1.189211 -0.208098 -0.807860  0.158198
2009-03-09 06:29:59  2.142816  0.173878 -0.160932  0.367309 -0.449765 -0.325400
2009-03-10 06:29:59  0.470669 -0.346805  1.152648  0.844632  1.031602 -0.012502
2009-03-11 06:29:59 -1.366954  0.452177  0.010713 -1.331553  0.226781  0.456900
2009-03-12 06:29:59  2.182409  0.890023 -0.627318 -1.516574 -1.565416 -0.694320
As you can see, I am trying to represent 3D time-series data, so I have a time-series index and MultiIndex columns. I am fairly comfortable with slicing the data. If I just wanted a rolling mean of the close data, I can do the following:
pandas.rolling_mean(dftst.ix[:,::2], 5)
ticker                   AAPL      GOOG        GS
field                   close     close     close
2009-03-01 06:29:59       NaN       NaN       NaN
2009-03-02 06:29:59       NaN       NaN       NaN
2009-03-03 06:29:59       NaN       NaN       NaN
2009-03-04 06:29:59       NaN       NaN       NaN
2009-03-05 06:29:59  0.410966 -0.412356  0.722951
2009-03-06 06:29:59 -0.103187 -0.497165  0.137731
2009-03-07 06:29:59  0.000194 -0.645375 -0.298504
2009-03-08 06:29:59 -0.074036 -0.541717 -0.035906
2009-03-09 06:29:59 -0.391863 -0.671918 -0.554380
2009-03-10 06:29:59 -0.336397 -0.411845 -0.992615
2009-03-11 06:29:59 -0.251645 -0.289512 -0.458246
2009-03-12 06:29:59 -0.138925  0.244572 -0.230743
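Note for readers on a newer pandas: pandas.rolling_mean and .ix were both removed in later releases. A rough equivalent that selects the close columns by label instead of by position might look like the sketch below. The names df and close_avg and the seeded random data are illustrative assumptions, not part of the original setup.

```python
import numpy as np
import pandas as pd

# Rebuild a frame shaped like dftst (seeded random data, purely illustrative).
index = pd.date_range("2009-03-01 06:29:59", periods=12)
columns = pd.MultiIndex.from_product(
    [["AAPL", "GOOG", "GS"], ["close", "rate"]], names=["ticker", "field"]
)
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((12, 6)), index=index, columns=columns)

# Select every 'close' column by label; drop_level=False keeps both column levels.
close = df.xs("close", axis=1, level="field", drop_level=False)

# 5-period rolling mean; the first 4 rows are NaN because the window is incomplete.
close_avg = close.rolling(5).mean()
print(close_avg)
```

Selecting by label avoids relying on the close columns sitting at every second position, which is what the [:, ::2] slice assumes.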
What I cannot do is create a new field, such as avg_close, and assign to it. Ideally I would like to do something like the following:
dftst[:, 'avg_close'] = pandas.rolling_mean(dftst.ix[:, ::2], 5)
Even if I swap the levels of my MultiIndex, I cannot get it to work:
dftst = dftst.swaplevel(1,0,axis=1)
print dftst['close']
ticker AAPL GOOG GS
2009-03-01 06:29:59 1.178557 -0.505672 -0.336645
2009-03-02 06:29:59 0.234305 0.581429 -0.232252
2009-03-03 06:29:59 -0.734798 0.117810 1.658418
2009-03-04 06:29:59 -1.555033 -0.298322 0.127408
2009-03-05 06:29:59 0.244102 -1.030041 -0.562039
2009-03-06 06:29:59 -0.297454 1.150564 -1.930883
2009-03-07 06:29:59 0.818910 -0.905296 1.219946
2009-03-08 06:29:59 0.586816 0.965242 0.928546
2009-03-09 06:29:59 -0.357693 0.071455 0.072956
2009-03-10 06:29:59 0.651803 -0.685937 0.805779
2009-03-11 06:29:59 0.569802 -0.062447 -1.349261
2009-03-12 06:29:59 -1.886335 0.205778 -0.864273
dftst['avg_close'] = pandas.rolling_mean(dftst['close'], 3)
----> 1 dftst['avg_close'] = pandas.rolling_mean(dftst['close'], 3)
/usr/local/lib/python2.7/dist-packages/pandas/core/frame.pyc in __setitem__(self, key, value)
2041 else:
2042 # set column
-> 2043 self._set_item(key, value)
2044
2045 def _boolean_set(self, key, value):
/usr/local/lib/python2.7/dist-packages/pandas/core/frame.pyc in _set_item(self, key, value)
2077 """
2078 value = self._sanitize_column(key, value)
-> 2079 NDFrame._set_item(self, key, value)
2080
2081 def insert(self, loc, column, value):
/usr/local/lib/python2.7/dist-packages/pandas/core/generic.pyc in _set_item(self, key, value)
544
545 def _set_item(self, key, value):
--> 546 self._data.set(key, value)
547 self._clear_item_cache()
548
/usr/local/lib/python2.7/dist-packages/pandas/core/internals.pyc in set(self, item, value)
951 except KeyError:
952 # insert at end
--> 953 self.insert(len(self.items), item, value)
954
955 self._known_consolidated = False
/usr/local/lib/python2.7/dist-packages/pandas/core/internals.pyc in insert(self, loc, item, value)
963
964 # new block
--> 965 self._add_new_block(item, value, loc=loc)
966
967 if len(self.blocks) > 100:
/usr/local/lib/python2.7/dist-packages/pandas/core/internals.pyc in _add_new_block(self, item, value, loc)
992 loc = self.items.get_loc(item)
993 new_block = make_block(value, self.items[loc:loc+1].copy(),
--> 994 self.items)
995 self.blocks.append(new_block)
996
/usr/local/lib/python2.7/dist-packages/pandas/core/internals.pyc in make_block(values, items, ref_items)
463 klass = ObjectBlock
464
--> 465 return klass(values, items, ref_items, ndim=values.ndim)
466
467 # TODO: flexible with index=None and/or items=None
/usr/local/lib/python2.7/dist-packages/pandas/core/internals.pyc in __init__(self, values, items, ref_items, ndim)
30 if len(items) != len(values):
31 raise AssertionError('Wrong number of items passed (%d vs %d)'
---> 32 % (len(items), len(values)))
33
34 self._ref_locs = None
AssertionError: Wrong number of items passed (1 vs 3)
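The assertion appears to come from assigning a three-column result (one column per ticker) to the single key 'avg_close'. One possible workaround, sketched here against a recent pandas rather than 0.10.0, is to assign one column at a time using a full (field, ticker) tuple as the key. The frame df below is an illustrative stand-in for the swapped dftst, built from seeded random data.

```python
import numpy as np
import pandas as pd

# Illustrative stand-in for dftst (assumption: seeded random data).
index = pd.date_range("2009-03-01 06:29:59", periods=12)
columns = pd.MultiIndex.from_product(
    [["AAPL", "GOOG", "GS"], ["close", "rate"]], names=["ticker", "field"]
)
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((12, 6)), index=index, columns=columns)

# Swap the column levels so that 'field' is the outer level.
df = df.swaplevel("ticker", "field", axis=1)

# df['close'] is a 3-column frame (one column per ticker), so compute once ...
close_avg = df["close"].rolling(3).mean()

# ... then add each ticker's result under its own two-level key.
for ticker in close_avg.columns:
    df[("avg_close", ticker)] = close_avg[ticker]

print(df["avg_close"])
```

Each assignment now passes exactly one column for one key, so the item count matches.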
If my columns were not a MultiIndex, I could assign in the following way:
start_date = datetime.datetime(2009,3,1,6,29,59)
r = pandas.date_range(start_date, periods=12)
cols = ['AAPL', 'GOOG', 'GS']
dat = np.random.randn(12, 3)
dftst2 = pandas.DataFrame(dat, columns=cols, index=r)
print dftst2
AAPL GOOG GS
2009-03-01 06:29:59 2.476787 2.386037 -0.777566
2009-03-02 06:29:59 -0.820647 1.006159 -0.590240
2009-03-03 06:29:59 0.433960 0.104458 0.282641
2009-03-04 06:29:59 0.300190 -0.300786 -1.780412
2009-03-05 06:29:59 -0.247919 1.616572 1.145594
2009-03-06 06:29:59 -0.779130 0.695256 0.845819
2009-03-07 06:29:59 0.572073 0.349394 -3.557776
2009-03-08 06:29:59 2.019885 0.358346 1.350812
2009-03-09 06:29:59 0.472328 -0.334223 -0.605862
2009-03-10 06:29:59 -1.570479 0.410808 0.616515
2009-03-11 06:29:59 1.177562 -0.240396 -2.126951
2009-03-12 06:29:59 0.311566 -1.743213 0.382617
To add a field based on another field, I can do the following:
dftst2['GOOG_avg'] = pandas.rolling_mean(dftst2['GOOG'], 3)
print dftst2
AAPL GOOG GS GOOG_avg
2009-03-01 06:29:59 2.476787 2.386037 -0.777566 NaN
2009-03-02 06:29:59 -0.820647 1.006159 -0.590240 NaN
2009-03-03 06:29:59 0.433960 0.104458 0.282641 1.165551
2009-03-04 06:29:59 0.300190 -0.300786 -1.780412 0.269944
2009-03-05 06:29:59 -0.247919 1.616572 1.145594 0.473415
2009-03-06 06:29:59 -0.779130 0.695256 0.845819 0.670347
2009-03-07 06:29:59 0.572073 0.349394 -3.557776 0.887074
2009-03-08 06:29:59 2.019885 0.358346 1.350812 0.467666
2009-03-09 06:29:59 0.472328 -0.334223 -0.605862 0.124506
2009-03-10 06:29:59 -1.570479 0.410808 0.616515 0.144977
2009-03-11 06:29:59 1.177562 -0.240396 -2.126951 -0.054604
2009-03-12 06:29:59 0.311566 -1.743213 0.382617 -0.524267
I have tried using a Panel object, but so far I have not found a quick way to add a field when I have MultiIndex columns; ideally the other level of the columns would be broadcast. Apologies if there are other posts that already answer this question. Any suggestions would be much appreciated.
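For what it is worth, the "broadcast over the other level" behaviour described above can be approximated in a recent pandas by relabelling the computed frame and concatenating it back onto the original. This is only a sketch under that assumption, not something tested on 0.10.0; df, avg and result are illustrative names and the data is seeded random noise.

```python
import numpy as np
import pandas as pd

# Illustrative stand-in for dftst (assumption: seeded random data).
index = pd.date_range("2009-03-01 06:29:59", periods=12)
columns = pd.MultiIndex.from_product(
    [["AAPL", "GOOG", "GS"], ["close", "rate"]], names=["ticker", "field"]
)
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((12, 6)), index=index, columns=columns)

# Take the 'close' field for every ticker; the result has plain ticker columns.
close = df.xs("close", axis=1, level="field")
avg = close.rolling(5).mean()

# Give the result two-level labels: (ticker, 'avg_close').
avg.columns = pd.MultiIndex.from_product(
    [avg.columns, ["avg_close"]], names=["ticker", "field"]
)

# Attach the new field and sort so each ticker's fields sit together.
result = pd.concat([df, avg], axis=1).sort_index(axis=1)
print(result["GOOG"])
```

After this, result['GOOG'] holds avg_close, close and rate side by side, and the same holds for the other tickers.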
Thanks for this post, I found a way to do it with Panel objects. It seems, however, that there are quite a few key things I cannot do with Panel objects. I will ask some Panel-specific questions in another post. Thanks again! – granders19