Friday, October 21, 2016

Efficiently creating lists in Python for a variable prediction window size

I'm using Keras to build an LSTM recurrent neural network. My code works well, but it could do with some serious refactoring. I am forecasting time series values, and depending on the size of the prediction window I end up writing code that is far too specific to that window size, i.e. it is hard to cater for a range of different sizes.

I split my dataset into train and test sets:

train_size = int(len(dataset) * 0.67)
test_size = len(dataset) - train_size
train, test = dataset[0:train_size,:], dataset[train_size:len(dataset),:]

If I want to predict 5 timesteps ahead, then my predictions, stored in the variable output, take the form [[t+6, t+7, t+8, t+9, t+10], [t+7, t+8, t+9, t+10, t+11]], and so on, i.e.

prediction 1    [t+6,   t+7,    t+8,    t+9,    t+10]
prediction 2    [t+7,   t+8,    t+9,    t+10,   t+11]
prediction 3    [t+8,   t+9,    t+10,   t+11,   t+12]
prediction 4    [t+9,   t+10,   t+11,   t+12,   t+13]
prediction 5    [t+10,  t+11,   t+12,   t+13,   t+14]
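
Each prediction starts one timestep later than the previous one, so output[i][j] is a forecast for timestep t+6+i+j (with i and j counted from 0); in other words, every entry on the same anti-diagonal (constant i + j) targets the same timestep. A tiny stand-in matrix just to illustrate that structure (the values are placeholders, not real forecasts):

import numpy as np

# Toy stand-in for `output`: entry [i][j] targets timestep t + 6 + i + j,
# so equal values line up along the anti-diagonals.
output = np.array([
    [ 6,  7,  8,  9, 10],
    [ 7,  8,  9, 10, 11],
    [ 8,  9, 10, 11, 12],
    [ 9, 10, 11, 12, 13],
    [10, 11, 12, 13, 14],
])
assert output[1][3] == output[3][1] == 10   # both are forecasts for t+10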

Now, if I want to get these values back into a single logical sequence, i.e. t+6, t+7, t+8, ..., t+14, I average the overlapping predictions for each timestep (the anti-diagonals of the matrix above) using this code:

# Average the partial anti-diagonals at the start (1, 2, 3, 4 overlapping forecasts)
output_plot = np.array([])
output_plot = np.append(output_plot, output[0][0])
output_plot = np.append(output_plot, np.mean([output[0][1], output[1][0]]))
output_plot = np.append(output_plot, np.mean([output[0][2], output[1][1], output[2][0]]))
output_plot = np.append(output_plot, np.mean([output[0][3], output[1][2], output[2][1], output[3][0]]))

# Then average the full anti-diagonals (5 overlapping forecasts per timestep)
for i in range(len(output) - predict_steps + 1):
    tmp = np.mean([output[i][4], output[i+1][3], output[i+2][2], output[i+3][1], output[i+4][0]])
    output_plot = np.append(output_plot, tmp)

My problem arises when I want to extend the prediction window out to, say, 10 timesteps. Then I have to manually extend the preceding code as follows:

output_plot = np.array([])
output_plot = np.append(output_plot, output[0][0])
output_plot = np.append(output_plot, np.mean([output[0][1], output[1][0]]))
output_plot = np.append(output_plot, np.mean([output[0][2], output[1][1], output[2][0]]))
output_plot = np.append(output_plot, np.mean([output[0][3], output[1][2], output[2][1], output[3][0]]))
output_plot = np.append(output_plot, np.mean([output[0][4], output[1][3], output[2][2], output[3][1], output[4][0]]))
output_plot = np.append(output_plot, np.mean([output[0][5], output[1][4], output[2][3], output[3][2], output[4][1], output[5][0]]))
output_plot = np.append(output_plot, np.mean([output[0][6], output[1][5], output[2][4], output[3][3], output[4][2], output[5][1], output[6][0]]))
output_plot = np.append(output_plot, np.mean([output[0][7], output[1][6], output[2][5], output[3][4], output[4][3], output[5][2], output[6][1], output[7][0]]))
output_plot = np.append(output_plot, np.mean([output[0][8], output[1][7], output[2][6], output[3][5], output[4][4], output[5][3], output[6][2], output[7][1], output[8][0]]))

for i in range(len(output) - predict_steps + 1):
    tmp = np.mean([output[i][9], output[i+1][8], output[i+2][7], output[i+3][6], output[i+4][5], output[i+5][4], output[i+6][3], output[i+7][2], output[i+8][1], output[i+9][0]])
    output_plot = np.append(output_plot, tmp)

While this works, it is horrendously inefficient. How can I best refactor these steps to make the code more amenable to a wider range of prediction windows? Also, my question title could do with some improvement, so please edit away!
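
For reference, one way the hard-coded averaging above could be generalised is to loop over the anti-diagonals directly. This is only a minimal sketch (the function name average_anti_diagonals is illustrative), assuming output is convertible to a 2-D array of shape (n_predictions, predict_steps). Note that it also averages the trailing partial diagonals, which the hand-written version stops before, so trim the tail of the result if you want identical output.

import numpy as np

def average_anti_diagonals(output, predict_steps):
    # Sketch only: average all forecasts that target the same timestep,
    # i.e. all entries output[i][j] sharing the same i + j.
    output = np.asarray(output)
    n = len(output)
    output_plot = []
    for d in range(n + predict_steps - 1):
        # Rows that contain a forecast lying on this anti-diagonal
        rows = range(max(0, d - predict_steps + 1), min(n, d + 1))
        output_plot.append(np.mean([output[i][d - i] for i in rows]))
    return np.array(output_plot)

output_plot = average_anti_diagonals(output, predict_steps)

Building an ordinary Python list and converting it once at the end also avoids the repeated reallocation that np.append does on every call.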
