Python/Pandas - ValueError: Index contains duplicate entries, cannot reshape

I have a dataframe called 'bal'. It looks like this:

                  ano   id   unit period
    business_id
    9564         2012  302  sdasd  anual
    9564         2011  303  sdasd  anual
    2361         2013  304  sdasd  anual
    2361         2012  305  sdasd  anual
    ...

I'm running the following code on it:

bal=bal.merge(bal.pivot(columns='ano', values='id'),right_index=True,left_index=True)

My intention is to turn that into something like this:

                  ano    id   unit period  2006  2007  2008  2009  2010  \
    business_id
    72           2013   774  sdasd  anual   NaN   NaN   NaN   NaN   NaN
    72           2012   775  sdasd  anual   NaN   NaN   NaN   NaN   NaN
    74           2012  1120  sdasd  anual   NaN   NaN   NaN   NaN   NaN
    119          2013   875  sdasd  anual   NaN   NaN   NaN   NaN   NaN
    119          2012   876  sdasd  anual   NaN   NaN   NaN   NaN   NaN
    ...

When I run that code, I get this error:

ValueError: Index contains duplicate entries, cannot reshape

So to avoid duplicates, I added a drop_duplicates line:

bal=bal.drop_duplicates()
bal=bal.merge(bal.pivot(columns='ano', values='id'),right_index=True,left_index=True)

When I run the code, voilà, I get the same error:

ValueError: Index contains duplicate entries, cannot reshape

Am I doing something wrong or misunderstanding something?

<strong>EDIT</strong>

bal is a dataframe I'm creating from a SQL query using the following code:

bal=pd.read_sql('select * from table;',connection).set_index('business_id')[['ano','id','unit','period']]

The weird thing is that if I limit the SQL query it works fine:

bal=pd.read_sql('select * from table limit 1000;',connection).set_index('business_id')[['ano','id','unit','period']]

I thought the problem might be that the index contains many repeated values (as you can see in the example above). However, if I print(bal.head(4)) on this limited bal, it looks exactly the same as the one above, with index values that repeat.

Answer1:

<strong>UPDATE2:</strong>

    qry = "select distinct business_id,ano,id,unit,period from table where period='anual'"
    bal = pd.read_sql(qry, connection, index_col=['business_id'])

assume we get the following DF (still with duplicated values in the ano column):

    In [167]: bal
    Out[167]:
                  ano   id   unit period
    business_id
    9564         2012  302  sdasd  anual
    9564         2012  299  sdasd  anual
    9564         2011  303  sdasd  anual
    2361         2013  304  sdasd  anual
    2361         2012  305  sdasd  anual

we can do this:

    In [169]: bal.join(bal.pivot_table(index=bal.index, columns='ano', values='id', aggfunc='first'))
    Out[169]:
                  ano   id   unit period   2011   2012   2013
    business_id
    2361         2013  304  sdasd  anual    NaN  305.0  304.0
    2361         2012  305  sdasd  anual    NaN  305.0  304.0
    9564         2012  302  sdasd  anual  303.0  302.0    NaN
    9564         2012  299  sdasd  anual  303.0  302.0    NaN
    9564         2011  303  sdasd  anual  303.0  302.0    NaN
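
If you would rather decide explicitly which row survives, a rough equivalent is to drop the repeated (business_id, ano) pairs first and then use the plain pivot. This is only a sketch on the sample data from this answer: keeping the first row with keep="first" is an assumption that mirrors aggfunc='first', and the names unique_pairs, wide and result are made up for illustration.

```python
import pandas as pd

# Sample frame from this answer; the (9564, 2012) pair appears twice.
bal = pd.DataFrame(
    {
        "business_id": [9564, 9564, 9564, 2361, 2361],
        "ano": [2012, 2012, 2011, 2013, 2012],
        "id": [302, 299, 303, 304, 305],
        "unit": ["sdasd"] * 5,
        "period": ["anual"] * 5,
    }
).set_index("business_id")

# Keep only the first row for each (business_id, ano) pair (assumed policy).
unique_pairs = bal.reset_index().drop_duplicates(
    subset=["business_id", "ano"], keep="first"
)

# With unique pairs, the plain pivot no longer raises.
wide = unique_pairs.pivot(index="business_id", columns="ano", values="id")

# Attach the wide columns to every original row, matching on business_id.
result = bal.join(wide)
```

Unlike a bare drop_duplicates(), the subset argument compares only the two key columns, so rows that differ in id are still treated as repeats.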

<strong>UPDATE:</strong>

consider the following sample DF:

    In [161]: bal
    Out[161]:
                  ano   id   unit period
    business_id
    9564         2012  302  sdasd  anual
    9564         2012  299  sdasd  anual   # I've intentionally added this row with a duplicated `ano`
    9564         2011  303  sdasd  anual
    2361         2013  304  sdasd  anual
    2361         2012  305  sdasd  anual

reproducing your error:

    In [162]: bal.pivot(columns='ano', values='id')
    ... skipped ...
    ValueError: Index contains duplicate entries, cannot reshape
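
To see which rows trigger it, you can look for repeated (business_id, ano) pairs. A minimal sketch, assuming the sample frame above (pairs, dup_mask and conflicts are illustrative names, not anything from your code). It also shows why a plain drop_duplicates() does not help: the conflicting rows differ in id, so they are not identical rows.

```python
import pandas as pd

# Sample frame from this answer; the (9564, 2012) pair appears twice.
bal = pd.DataFrame(
    {
        "business_id": [9564, 9564, 9564, 2361, 2361],
        "ano": [2012, 2012, 2011, 2013, 2012],
        "id": [302, 299, 303, 304, 305],
        "unit": ["sdasd"] * 5,
        "period": ["anual"] * 5,
    }
).set_index("business_id")

# pivot needs each (index, columns) pair to be unique, so flag the repeats.
pairs = bal.reset_index()[["business_id", "ano"]]
dup_mask = pairs.duplicated(keep=False).to_numpy()

# Rows that collide on (business_id, ano).
conflicts = bal[dup_mask]

# drop_duplicates() compares whole rows; every id is different here,
# so nothing is removed and the collision is still there.
rows_after_dedup = len(bal.drop_duplicates())
```

Printing conflicts on the full table should tell you whether the repeats are real data (several ids per year) or something to filter out in the SQL query.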

<strong>Old answer:</strong>

Is that what you want?

    In [144]: bal.join(bal.pivot(columns='ano', values='id'))
    Out[144]:
                  ano   id   unit period   2011   2012   2013
    business_id
    2361         2013  304  sdasd  anual    NaN  305.0  304.0
    2361         2012  305  sdasd  anual    NaN  305.0  304.0
    9564         2012  302  sdasd  anual  303.0  302.0    NaN
    9564         2011  303  sdasd  anual  303.0  302.0    NaN

Answer2:

Consider using unstack() and merge(). With ano added to the index, the repeated business_id values are no longer a problem, as long as each (business_id, ano) pair is unique.

    import pandas as pd

    # sample data
    data = {"business_id": [9564, 9564, 2361, 2361],
            "ano": [2012, 2011, 2013, 2012],
            "id": [302, 303, 304, 305],
            "unit": ["sdasd"]*4,
            "period": ["anual"]*4}
    df = pd.DataFrame(data)

    # include ano for MultiIndex
    df.set_index(["business_id", "ano"], inplace=True)
    df

                       id period   unit
    business_id ano
    9564        2012  302  anual  sdasd
                2011  303  anual  sdasd
    2361        2013  304  anual  sdasd
                2012  305  anual  sdasd

Now unstack(), grab the id data, and merge(). The inner-most level is unstacked, which is why we added ano to the index above.

    df.merge(df.unstack()['id'], right_index=True, left_index=True)

                       id period   unit   2011   2012   2013
    business_id ano
    9564        2012  302  anual  sdasd  303.0  302.0    NaN
                2011  303  anual  sdasd  303.0  302.0    NaN
    2361        2013  304  anual  sdasd    NaN  305.0  304.0
                2012  305  anual  sdasd    NaN  305.0  304.0
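
Note that unstack() needs each (business_id, ano) pair to be unique, just like pivot; with repeated pairs it raises the same ValueError. If the real data contains repeats, one option is to filter on the index before unstacking. This is a sketch, assuming that keeping the first row per pair is acceptable; the extra (9564, 2012, 299) row and the names deduped, wide and result are added here for illustration.

```python
import pandas as pd

# Sample data with one repeated (business_id, ano) pair added for illustration.
data = {
    "business_id": [9564, 9564, 9564, 2361, 2361],
    "ano": [2012, 2012, 2011, 2013, 2012],
    "id": [302, 299, 303, 304, 305],
    "unit": ["sdasd"] * 5,
    "period": ["anual"] * 5,
}
df = pd.DataFrame(data).set_index(["business_id", "ano"])

# Keep only the first row for each (business_id, ano) index entry.
deduped = df[~df.index.duplicated(keep="first")]

# Unstack the inner-most level (ano) of the id column: one column per year.
wide = deduped["id"].unstack()

# Merge on the index; business_id is matched by level name.
result = deduped.merge(wide, left_index=True, right_index=True)
```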
