Iterate over pandas dataframe columns containing nested arrays

<h3>Question</h3>

I hope you can help me with this issue,

I have the data below (the column names do not matter):

data=([['file0090', ([[ 84, 55, 189], [248, 100, 18], [ 68, 115, 88]])], ['file6565', ([[ 86, 58, 189], [24, 10, 118], [ 68, 11, 8]]) ]])

I need to iterate over columns 0 and 1 and collect the values into a list that I can turn into a DataFrame with this output:

col0      col1  col2  col3
file0090    84    55   189
file0090   248   100    18
file0090    68   115    88
file6565    86    58   189
file6565    24    10   118
file6565    68    11     8

I've tried every DataFrame iteration method I could find (iterrows, iteritems, items) and appended the results to a list, but I always end up with the same output, and I don't understand how to separate the items from these arrays.

Thank you in advance if you can help.


<h3>Answer1:</h3>

You can try this:-

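import pandas as pd

# Prepend the file name (i[0]) to each inner row (j) of the nested list (i[1]).
data_f = [[i[0]] + j for i in data for j in i[1]]
df = pd.DataFrame(data_f, columns=['col0', 'col1', 'col2', 'col3'])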

Output:-

col0      col1  col2  col3
file0090    84    55   189
file0090   248   100    18
file0090    68   115    88
file6565    86    58   189
file6565    24    10   118
file6565    68    11     8

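For anyone who wants to paste and run this directly, here is a minimal self-contained sketch of the same list-comprehension idea. It assumes every inner row has the same length and derives the column names from that length instead of hard-coding them; the `col0`, `col1`, ... naming simply mirrors the question, and `rows`, `width` and `columns` are illustrative names.

```python
import pandas as pd

# Sample data copied from the question: [name, list of rows] pairs.
data = [
    ['file0090', [[84, 55, 189], [248, 100, 18], [68, 115, 88]]],
    ['file6565', [[86, 58, 189], [24, 10, 118], [68, 11, 8]]],
]

# Prepend the file name to every inner row.
rows = [[name, *values] for name, nested in data for values in nested]

# Assumption: all rows have the same width, so the first row is representative.
width = len(rows[0])
columns = [f'col{i}' for i in range(width)]

df = pd.DataFrame(rows, columns=columns)
print(df)
```

If the inner lists can differ in length, the column names would need to be built from the widest row instead.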
<h3>Answer2:</h3>

You can use explode, then join the result with another DataFrame created from the series of lists:

df = pd.DataFrame(data).add_prefix('col')
out = df.explode('col1').reset_index(drop=True)
out = out.join(pd.DataFrame(out.pop('col1').tolist()).add_prefix('col_'))

Adding another solution if the list structure is similar:

import itertools
import numpy as np

l = [*itertools.chain.from_iterable(data)]
pd.DataFrame(np.vstack(l[1::2]), index=np.repeat(l[::2], len(l[1])))

<hr />

Output of the first solution:

       col0  col_0  col_1  col_2
0  file0090     84     55    189
1  file0090    248    100     18
2  file0090     68    115     88
3  file6565     86     58    189
4  file6565     24     10    118
5  file6565     68     11      8

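Note that `np.repeat(l[::2], len(l[1]))` repeats every name the same number of times, so it relies on each file having the same number of rows. A possible variant for uneven data is sketched below; the shortened second file is made up for illustration, and `names`, `blocks` and `counts` are illustrative names. It still assumes all inner rows have the same width so that `np.vstack` can stack them.

```python
import numpy as np
import pandas as pd

# Hypothetical variant of the question's data: the second file has only two rows.
data = [
    ['file0090', [[84, 55, 189], [248, 100, 18], [68, 115, 88]]],
    ['file6565', [[86, 58, 189], [24, 10, 118]]],
]

names = [name for name, _ in data]
blocks = [np.asarray(nested) for _, nested in data]

# Repeat each name once per row of its own block, not a fixed number of times.
counts = [len(block) for block in blocks]
index = np.repeat(names, counts)

out = pd.DataFrame(np.vstack(blocks), index=index)
print(out)
```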
<h3>Answer3:</h3>

You can create a custom function to output the correct form of data.

from itertools import chain

def transform(d):
    for l in d:
        *x, y = l
        yield list(map(lambda s: x+s, y))

df = pd.DataFrame(chain(*transform(data)))
df

          0    1    2    3
0  file0090   84   55  189
1  file0090  248  100   18
2  file0090   68  115   88
3  file6565   86   58  189
4  file6565   24   10  118
5  file6565   68   11    8

Timeit results of all the solutions:

# YOBEN_S's answer
In [275]: %%timeit
     ...: s = pd.DataFrame(data).set_index(0)[1].explode()
     ...: df = pd.DataFrame(s.tolist(), index = s.index.values)
     ...:
1.52 ms ± 59.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

# Anky's answer
In [276]: %%timeit
     ...: df = pd.DataFrame(data).add_prefix('col')
     ...: out = df.explode('col1').reset_index(drop=True)
     ...: out = out.join(pd.DataFrame(out.pop('col1').tolist()).add_prefix('col_'))
     ...:
3.71 ms ± 606 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

# Dhaval's answer
In [277]: %%timeit
     ...: data_f = []
     ...: for i in data:
     ...:     for j in i[1]:
     ...:         data_f.append([i[0]]+j)
     ...: df = pd.DataFrame(data_f, columns =['col0','col1','col2','col3'])
     ...:
712 µs ± 24.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

# My answer
In [280]: %%timeit
     ...: pd.DataFrame(chain(*transform(data)))
     ...:
489 µs ± 8.91 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

# Using list comp of Dhaval's answer
In [306]: %%timeit
     ...: data_f = [[i[0]]+j for i in data for j in i[1]]
     ...: df = pd.DataFrame(data_f, columns =['col0','col1','col2','col3'])
     ...:
586 µs ± 25 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

# Anky's 2nd solution
In [308]: %%timeit
     ...: l = [*chain.from_iterable(data)]
     ...: pd.DataFrame(np.vstack(l[1::2]),index = np.repeat(l[::2],len(l[1])))
     ...:
221 µs ± 18.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

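The numbers above come from IPython's `%%timeit` on the two-file sample. To repeat the comparison outside IPython, something like the following sketch with the standard `timeit` module could be used. The function names and the repeat counts are arbitrary choices for illustration, only two of the approaches are included, and the results will vary by machine and pandas version.

```python
import timeit

import pandas as pd

# Sample data copied from the question.
data = [
    ['file0090', [[84, 55, 189], [248, 100, 18], [68, 115, 88]]],
    ['file6565', [[86, 58, 189], [24, 10, 118], [68, 11, 8]]],
]

def list_comp():
    # List-comprehension approach.
    rows = [[i[0]] + j for i in data for j in i[1]]
    return pd.DataFrame(rows, columns=['col0', 'col1', 'col2', 'col3'])

def explode_index():
    # set_index + explode approach.
    s = pd.DataFrame(data).set_index(0)[1].explode()
    return pd.DataFrame(s.tolist(), index=s.index.values)

candidates = {'list_comp': list_comp, 'explode_index': explode_index}
calls = 100  # arbitrary number of calls per measurement

timings = {}
for label, func in candidates.items():
    # Take the best of 3 measurements and report microseconds per call.
    best = min(timeit.repeat(func, number=calls, repeat=3))
    timings[label] = best / calls * 1e6
    print(f'{label}: {timings[label]:.1f} µs per call')
```

With only six output rows, such timings mostly reflect fixed pandas overhead, so it would be worth rerunning them on data closer to the real size before choosing an approach.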
<h3>Answer4:</h3>

We can explode by row first, then expand the resulting lists into columns:

s = pd.DataFrame(data).set_index(0)[1].explode()
df = pd.DataFrame(s.tolist(), index = s.index.values)
df
Out[396]:
            0    1    2
file0090   84   55  189
file0090  248  100   18
file0090   68  115   88
file6565   86   58  189
file6565   24   10  118
file6565   68   11    8
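This result keeps the file names in the index, while the question asks for them in a regular `col0` column. One way to get there, sketched here with the question's sample data, is to rename the value columns and then reset the index. The `col1` to `col3` names follow the question's desired output, and `out` is an illustrative name.

```python
import pandas as pd

# Sample data copied from the question.
data = [
    ['file0090', [[84, 55, 189], [248, 100, 18], [68, 115, 88]]],
    ['file6565', [[86, 58, 189], [24, 10, 118], [68, 11, 8]]],
]

s = pd.DataFrame(data).set_index(0)[1].explode()
df = pd.DataFrame(s.tolist(), index=s.index.values)

# Rename the value columns 0, 1, 2 to col1, col2, col3.
df.columns = [f'col{i + 1}' for i in df.columns]

# Name the index col0 and move it into a regular column.
out = df.rename_axis('col0').reset_index()
print(out)
```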

Source: https://stackoverflow.com/questions/62284286/iterate-over-pandas-dataframe-columns-containing-nested-arrays
