
Matrix operation using numpy and pandas

Question:

I am trying to test the same example given in Matrix search operation using numpy and pandas (https://stackoverflow.com/questions/30293881/matrix-search-operation-using-numpy-and-pandas) on a 3.5.0-17-generic #28-Ubuntu SMP Tue Oct 9 19:32:08 UTC 2012 i686 i686 i686 GNU/Linux machine with Python 2.7.3, numpy 1.9.2 and pandas 0.15.2.

For this small example:

    import numpy as np
    import pandas as pd

    ds1 = [[ 4, 13,  6,  9],
           [ 7, 12,  5,  7],
           [ 7,  0,  4, 22],
           [ 9,  8, 12,  0]]
    ds2 = [[ 4, 1], [ 5, 3], [ 6, 1], [ 7, 2], [ 8, 2],
           [ 9, 3], [12, 1], [13, 2], [22, 3]]
    ds1 = pd.DataFrame(ds1)
    ds2 = pd.DataFrame(ds2)
    # Compare every flattened ds1 value with every value in ds2's first column
    C = np.where(ds1.values.ravel()[:, None] == ds2.values[:, 0])
    print(C)

gives the wrong result:

(array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 10, 11, 12, 13, 14]), array([ 0, 7, 2, 5, 3, 6, 1, 3, 3, 0, 8, 5, 4, 6]))
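What C contains is a pair of index arrays, not a replaced matrix. The comparison broadcasts the 16 flattened values of ds1 (as a column) against the 9 keys in the first column of ds2, giving a 16 × 9 boolean matrix, and np.where returns the row and column indices of its True entries. The sketch below uses plain NumPy arrays instead of DataFrames for brevity; it reproduces the two arrays printed above and pairs a few of them with their replacement values.

```python
import numpy as np

# Same data as in the question, as plain NumPy arrays
ds1 = np.array([[4, 13, 6, 9], [7, 12, 5, 7], [7, 0, 4, 22], [9, 8, 12, 0]])
ds2 = np.array([[4, 1], [5, 3], [6, 1], [7, 2], [8, 2],
                [9, 3], [12, 1], [13, 2], [22, 3]])

flat = ds1.ravel()                  # 16 values
mask = flat[:, None] == ds2[:, 0]   # broadcast: (16, 1) against (9,)
print(mask.shape)                   # (16, 9)

rows, cols = np.where(mask)
# rows: positions in the flattened ds1; cols: matching row numbers in ds2
for r, c in zip(rows[:3], cols[:3]):
    print(r, flat[r], "->", ds2[c, 1])
```

Positions 9 and 15 (the two zeros) are absent from rows because 0 never appears in the first column of ds2.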

Expected output:

output = [[1, 2, 1, 3], [2, 1, 3, 2], [2, 0, 1, 3], [3, 2, 1, 0]]
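The expected output can be read as a lookup: each value of ds1 that appears in the first column of ds2 is replaced by the number in the second column of that row. The zeros have no entry in ds2 and still come out as 0, so this sketch assumes unmatched values are left unchanged. It is plain Python written for readability, not a fast approach for large matrices.

```python
import numpy as np

ds1 = np.array([[4, 13, 6, 9], [7, 12, 5, 7], [7, 0, 4, 22], [9, 8, 12, 0]])
ds2 = np.array([[4, 1], [5, 3], [6, 1], [7, 2], [8, 2],
                [9, 3], [12, 1], [13, 2], [22, 3]])

# Lookup table: value in ds2's first column -> value in its second column
lookup = dict(zip(ds2[:, 0].tolist(), ds2[:, 1].tolist()))

# Replace cells that have an entry; keep the others (here the zeros) as-is
output = [[lookup.get(v, v) for v in row] for row in ds1.tolist()]
print(output)  # [[1, 2, 1, 3], [2, 1, 3, 2], [2, 0, 1, 3], [3, 2, 1, 0]]
```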

And when working with the larger matrices:

    ds1 = pd.read_table('https://gist.githubusercontent.com/karimkhanp/9527bad750fbe75e072c/raw/ds1',
                        sep=' ', header=None)
    ds2 = pd.read_table('https://gist.githubusercontent.com/karimkhanp/1692f1f76718c35e939f/raw/6f6b348ab0879b702e1c3c5e362e9d2062e9e9bc/ds2',
                        header=None, sep=' ')
    C = np.where(ds1.values.ravel()[:, None] == ds2.values[:, 0])
    print(C)

it gives:

(1000, 1001)
(4000, 2)
(array([], dtype=int32),)

instead of the matrix with the values replaced.
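One detail worth checking, though it is not confirmed anywhere in this thread: with the shapes printed above, the broadcast comparison has to build a boolean matrix of 1,001,000 × 4,000 = 4,004,000,000 elements (about 4 GB at one byte each). That is more elements than a signed 32-bit index can count and more memory than a 32-bit i686 process can address, so the full comparison probably cannot be created on that machine. The sketch below computes that size and then shows a lower-memory alternative based on np.searchsorted, which is a different technique from the np.where approach above. replace_by_lookup is a hypothetical helper, it assumes the first column of ds2 has no duplicate values, and it is demonstrated on the small example rather than on the gist files.

```python
import numpy as np

# Size of the boolean matrix built by ds1.values.ravel()[:, None] == ds2.values[:, 0]
# for the shapes printed in the question: (1000, 1001) and (4000, 2)
n_cells = 1000 * 1001
n_keys = 4000
print(n_cells * n_keys)   # 4004000000 elements, about 4 GB at 1 byte per bool
print(2**31 - 1)          # 2147483647, the largest signed 32-bit index


def replace_by_lookup(values, keys, replacements):
    """Hypothetical helper: replace entries of `values` that occur in `keys`
    with the corresponding entry of `replacements`; other entries (including
    NaN) are left unchanged. Assumes `keys` contains no duplicates."""
    order = np.argsort(keys)
    sorted_keys = keys[order]
    sorted_repl = replacements[order]
    flat = values.ravel()
    # Position where each value would be inserted into the sorted keys
    pos = np.searchsorted(sorted_keys, flat)
    pos = np.clip(pos, 0, len(sorted_keys) - 1)  # keep indices in bounds
    found = sorted_keys[pos] == flat             # True only for exact matches
    out = flat.copy()
    out[found] = sorted_repl[pos[found]]
    return out.reshape(values.shape)


ds1 = np.array([[4, 13, 6, 9], [7, 12, 5, 7], [7, 0, 4, 22], [9, 8, 12, 0]])
ds2 = np.array([[4, 1], [5, 3], [6, 1], [7, 2], [8, 2],
                [9, 3], [12, 1], [13, 2], [22, 3]])
print(replace_by_lookup(ds1, ds2[:, 0], ds2[:, 1]))
```

Memory use grows with the size of ds1 plus the size of ds2 rather than their product. With the DataFrames from the question the call would be replace_by_lookup(ds1.values, ds2.values[:, 0], ds2.values[:, 1]); that call has not been tried against the gist data here.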

Any suggestions would be very helpful.

Answer1:

I agree with @Anthony Lethuillier's answer, and my guess is that the IndexError may be caused by a different version. It seems that in @nlper's situation C is (array([], dtype=int32),), which means nothing was found in ds1.values.ravel()[:, None] == ds2.values[:, 0]; this is clearly different from @Anthony's result. Since nothing was found, C is a tuple that contains only one element, so an IndexError is raised when you access C[1].
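A refinement of that reasoning: the length of the tuple returned by np.where(condition) follows the number of dimensions of the condition, not the number of matches. An all-False 2-D condition still yields two (empty) arrays, so a 1-tuple such as (array([], dtype=int32),) suggests the comparison did not produce a 2-D boolean array at all. The sketch below shows both cases and the resulting IndexError.

```python
import numpy as np

# 2-D condition: one index array per dimension
two_d = np.array([[True, False], [False, True]])
C2 = np.where(two_d)
print(len(C2))                  # 2

# 2-D condition with no True entries: still two arrays, both empty
empty_2d = np.zeros((3, 4), dtype=bool)
print(len(np.where(empty_2d)))  # 2

# 1-D condition: a 1-tuple, so C1[1] does not exist
one_d = np.zeros(5, dtype=bool)
C1 = np.where(one_d)
print(len(C1), C1[0].size)      # 1 0

try:
    C1[1]
except IndexError as exc:
    print("IndexError:", exc)   # IndexError: tuple index out of range
```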

This also works on my machine, so I don't know why C is empty. I recommend printing ds1.values.ravel() and ds2.values[:, 0] in detail to see why no values are equal.

For reference, I am using Python 2.7.9, numpy 1.9.2 and pandas 0.16.1.
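A minimal sketch of the check suggested above: print the shapes, dtypes and first few values of both sides, then count how many flattened ds1 values occur in the first column of ds2. describe_overlap is a hypothetical helper name. It uses np.isin, which needs NumPy 1.13 or later; on the NumPy 1.9.2 used in this thread, np.in1d is the older equivalent.

```python
import numpy as np
import pandas as pd


def describe_overlap(ds1, ds2):
    """Hypothetical diagnostic helper: print shapes, dtypes and a few values,
    then count how many flattened ds1 values occur in ds2's first column."""
    flat = ds1.values.ravel()
    keys = ds2.values[:, 0]
    print("ds1:", ds1.shape, flat.dtype, flat[:10])
    print("ds2:", ds2.shape, keys.dtype, keys[:10])
    # np.isin tests membership without building the full comparison matrix
    n_found = int(np.isin(flat, keys).sum())
    print("ds1 values found in ds2[:, 0]:", n_found, "of", flat.size)
    return n_found


# Usage with the small example from the question
ds1 = pd.DataFrame([[4, 13, 6, 9], [7, 12, 5, 7], [7, 0, 4, 22], [9, 8, 12, 0]])
ds2 = pd.DataFrame([[4, 1], [5, 3], [6, 1], [7, 2], [8, 2],
                    [9, 3], [12, 1], [13, 2], [22, 3]])
describe_overlap(ds1, ds2)   # 14 of 16 found (the two zeros have no match)
```

If the count is nonzero for the large files while np.where still returns an empty 1-tuple, the problem is in the comparison step itself rather than in the data.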

Answer2:

The first array in C gives the positions, in ds1.values.ravel(), of the values you want to replace, and the second array (array([ 0, 7, 2, 5, 3, 6, 1, 3, 3, 0, 8, 5, 4, 6])) gives the rows of ds2 that hold the replacement values.

So you have to replace the values of ds1.values.ravel() at the indices in the first array of C with the values in the second column of ds2 at the indices in the second array of C.

Here is code that gives the right output for the small example, followed by the same steps for the large one:

    import pandas as pd
    import numpy as np

    # Small example
    ds1 = [[ 4, 13,  6,  9],
           [ 7, 12,  5,  7],
           [ 7,  0,  4, 22],
           [ 9,  8, 12,  0]]
    ds2 = [[ 4, 1], [ 5, 3], [ 6, 1], [ 7, 2], [ 8, 2],
           [ 9, 3], [12, 1], [13, 2], [22, 3]]
    ds1 = pd.DataFrame(ds1)
    ds2 = pd.DataFrame(ds2)
    # C[0]: positions in the flattened ds1, C[1]: matching rows of ds2
    C = np.where(ds1.values.ravel()[:, None] == ds2.values[:, 0])
    ds1_new = ds1.values.ravel()
    # Overwrite each matched value with the second column of its ds2 row
    ds1_new[C[0]] = ds2.values[C[1], 1]
    ds1_new = ds1_new.reshape(ds1.shape)   # (4, 4)
    print(ds1_new)

    # Large example from the question
    ds1 = pd.read_table('https://gist.githubusercontent.com/karimkhanp/9527bad750fbe75e072c/raw/ds1',
                        sep=' ', header=None)
    ds2 = pd.read_table('https://gist.githubusercontent.com/karimkhanp/1692f1f76718c35e939f/raw/6f6b348ab0879b702e1c3c5e362e9d2062e9e9bc/ds2',
                        header=None, sep=' ')
    C = np.where(ds1.values.ravel()[:, None] == ds2.values[:, 0])
    ds1_new = ds1.values.ravel()
    ds1_new[C[0]] = ds2.values[C[1], 1]
    ds1_new = ds1_new.reshape(ds1.shape)   # (1000, 1001)
    print(ds1_new)

This gives the following output:

[[1 2 1 3]
 [2 1 3 2]
 [2 0 1 3]
 [3 2 1 0]]
[[ 1. 1. 1. ..., 1. 1. nan]
 [ 1. 1. 1. ..., 0. 1. nan]
 [ 1. 0. 1. ..., 1. 0. nan]
 ...,
 [ 1. 1. 1. ..., 0. 1. nan]
 [ 1. 0. 1. ..., 1. 1. nan]
 [ 1. 1. 1. ..., 0. 1. nan]]

Recommend

  • Why is my datagrid only populating one row?
  • How can I loop through an array and populate a datagrid?
  • More numpy way of iterating through the 'orthogonal' diagonals of a 2D array
  • Iterate Go map get index
  • Newtonsoft JSON- Conversion to/from DataSet causes Decimal to become Double?
  • Speed up for loop with numpy
  • Left outer join not emitting null values when joining two streams in spark structured streaming 2.3.
  • Using meshgrid to convert X,Y,Z triplet to three 2D arrays for surface plot in matplotlib
  • How to add new index numbers to the upsampled data while preserving the orginal indices one
  • Transpose table then set and rename index
  • How to filter on year and quarter in pandas
  • Color time-series based on column values in pandas
  • R convert summary result (statistics with all dataframe columns) into dataframe
  • Make new pandas columns based on pipe-delimited column with possible repeats
  • xtable package: Skipping some rows in the output
  • Error when parsing timestamp with pandas read_csv
  • Meteor: Do Something On Email Verification Confirmation
  • Fetching methods from BroadcastReceiver to update UI
  • WinForms: two way TextBox problem
  • Fill an image in a square container while keeping aspect ratio
  • Join two tables and save into third-sql
  • Where to put my custom functions in Wordpress?
  • Convert array of 8 bytes to signed long in C++
  • Rearranging Cells in UITableView Bug & Saving Changes
  • align graphs with different xlab
  • Return words with double consecutive letters
  • RestKit - RKRequestDelegate does not exist
  • Numpy divide by zero. Why?
  • php design question - will a Helper help here?
  • AngularJs get employee from factory
  • WPF Applying a trigger on binding failure
  • Benchmarking RAM performance - UWP and C#
  • Angular 2 constructor injection vs direct access
  • Understanding cpu registers
  • IndexOutOfRangeException on multidimensional array despite using GetLength check
  • Authorize attributes not working in MVC 4
  • Recursive/Hierarchical Query Using Postgres
  • Running Map reduces the dimensions of the matrices
  • Reading document lines to the user (python)
  • Python/Django TangoWithDjango Models and Databases