
Question:
How can min_itemsize be changed for <strong>all</strong> string columns in an HDF5 table? I don't know my DataFrame's structure until run time, so I cannot hardcode it.
Answer1: See the docs <a href="http://pandas.pydata.org/pandas-docs/dev/io.html#string-columns" rel="nofollow">here</a>.
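The itemsize is fixed on the first append and cannot be changed later. If min_itemsize
is not specified, it defaults to the length of the longest string in that first append.
Passing a single integer, as in the session below, sets the minimum without naming any column.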
In [1]: df = pd.DataFrame({'A': ['foo', 'bar']})
In [2]: store = pd.HDFStore('test.h5',mode='w')
In [3]: store.append('df',df,min_itemsize=30)
In [4]: store.get_storer('df')
Out[4]: frame_table (typ->appendable,nrows->2,ncols->1,indexers->[index])
In [5]: store.get_storer('df').table
Out[5]:
/df/table (Table(2,)) ''
description := {
"index": Int64Col(shape=(), dflt=0, pos=0),
"values_block_0": StringCol(itemsize=30, shape=(1,), dflt='', pos=1)}
byteorder := 'little'
chunkshape := (1724,)
autoindex := True
colindexes := {
"index": Index(6, medium, shuffle, zlib(1)).is_csi=False}
In [6]: store['df']
Out[6]:
     A
0  foo
1  bar
In [7]: store.close()
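If the column layout is only known at run time, one option is to measure the strings in the first frame and pass a single integer with some headroom. The following is a minimal sketch, not part of the original answer: `longest_string_bytes`, `append_with_headroom`, and the `headroom` factor are hypothetical names chosen for illustration, and it assumes the stored strings are UTF-8 encoded and that PyTables is installed for the HDF5 step.

```python
import pandas as pd


def longest_string_bytes(df: pd.DataFrame, encoding: str = "utf-8") -> int:
    """Return the encoded byte length of the longest string in any string-like column."""
    longest = 0
    for _, col in df.items():
        # Skip columns that cannot hold Python strings (numeric, datetime, ...).
        if not (pd.api.types.is_object_dtype(col) or pd.api.types.is_string_dtype(col)):
            continue
        for value in col.dropna():
            if isinstance(value, str):
                longest = max(longest, len(value.encode(encoding)))
    return longest


def append_with_headroom(path: str, key: str, df: pd.DataFrame, headroom: float = 2.0) -> int:
    """Append df to an HDF5 table, reserving extra room for longer strings later.

    Hypothetical helper: the itemsize is fixed by the first append, so the
    headroom factor is a guess about how much later strings may grow.
    """
    size = max(1, int(longest_string_bytes(df) * headroom))
    with pd.HDFStore(path) as store:
        # A single integer is used here, as in the session above,
        # so no column names need to be hardcoded.
        store.append(key, df, min_itemsize=size)
    return size
```

For example, `append_with_headroom('test.h5', 'df', df)` would be called for the first chunk only. Later appends reuse the itemsize already stored in the table, so a string longer than that size will still be rejected.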