Is it preferable/safer/optimal to use lists or numpy arrays in kernels?
From the documentation, lists are listed as a supported type, whereas arrays are not explicitly mentioned. Furthermore, to use the append_to_dataset() method, the documentation explicitly says the dataset must be a list. However, numpy arrays work in kernels, and indeed have an append() method. In contrast, the set_dataset() documentation mentions that datasets can be numpy arrays but says nothing about lists. More critically, the "multi-dimensional slicing" of mutate_dataset() fails with an error if the dataset was set using a multi-dimensional list (more accurately, a list of lists), whereas it works as I would expect if the dataset was created from a numpy array (a true multi-dimensional object).
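To illustrate the slicing difference outside of ARTIQ, here is a minimal plain-Python sketch. It makes no ARTIQ calls. I am assuming, without having checked the ARTIQ source, that mutate_dataset() ends up applying a tuple of indices/slices to the stored object. The variable names are my own, for illustration only.

```python
import numpy as np

# Hypothetical stand-ins for a dataset's stored value; no ARTIQ calls are made.
as_list = [[0, 0, 0], [0, 0, 0]]         # list of lists
as_array = np.zeros((2, 3), dtype=int)   # true 2-D array

# A multi-dimensional index is a tuple of indices and/or slices.
index = (1, slice(0, 2))

# The numpy array accepts the tuple index directly.
as_array[index] = [7, 8]

# The list of lists rejects it: list indices must be integers or slices.
try:
    as_list[index] = [7, 8]
    list_error = None
except TypeError as exc:
    list_error = exc

# Updating the nested list needs one index per nesting level instead.
as_list[1][0:2] = [7, 8]
```

If that assumption holds, it would explain why the same multi-dimensional index works for a dataset created from a numpy array but raises an error for one created from a list of lists.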
On the other hand, vector math, the typical reason numpy arrays are preferred over lists, does not work in kernels. Then again, neither does list concatenation using +.
All in all, I am getting mixed messages about which object I should default to in kernels. My impression is that the documentation uses the words "list" and "array" rather interchangeably, except in the case of mutate_dataset(). Which would generally be better to use?