25065

Transpose CSV data with awk (pivot transformation)

my CSV data looks like this:

Indicator;Country;Value no_of_people;USA;500 no_of_people;Germany;300 no_of_people;France;200 area_in_km;USA;18 area_in_km;Germany;16 area_in_km;France;17 proportion_males;USA;5.3 proportion_males;Germany;7.9 proportion_males;France;2.4

I want my data to look like this:

Country;no_of_people;area_in_km;proportion_males USA;500;18;5.3 Germany;300;16;7.9 France;200;17;2.4

There are more Indicators and more countries than listed here.

Pretty large files (number of rows something with 5 digits). Looked around for some transpose threads, but nothing matched my situation (also I'm quite new to awk, so I couldn't change the code I found to fit my data).

Thanks for your help. Regards Ad

Answer1:

Using awk and maintaining the order of output:

awk -F\; ' NR>1 { if(!($1 in indicators)) { indicator[++types] = $1 }; indicators[$1]++ if(!($2 in countries)) { country[++num] = $2 }; countries[$2]++ map[$1,$2] = $3 } END { printf "%s;" ,"Country"; for(ind=1; ind<=types; ind++) { printf "%s%s", sep, indicator[ind]; sep = ";" } print ""; for(coun=1; coun<=num; coun++) { printf "%s", country[coun] for(val=1; val<=types; val++) { printf "%s%s", sep, map[indicator[val], country[coun]]; } print "" } }' file Country;no_of_people;area_in_km;proportion_males USA;500;18;5.3 Germany;300;16;7.9 France;200;17;2.4

Answer2:

If the number of Ind fields is fixed, you can do:

awk 'BEGIN{FS=OFS=";"} {a[$2,$1]=$3; count[$2]} END {for (i in count) print i, a[i,"Ind1"], a[i, "Ind2"], a[i, "Ind3"]}' file

Explanation

    <li>BEGIN{FS=OFS=";"} set input and output field separator as semicolon.</li> <li>{a[$2,$1]=$3; count[$2]} get list of countries in count[] array and values of each Ind on a["country","Ind"] array.</li> <li>END {for (i in count) print i, a[i,"Ind1"], a[i, "Ind2"], a[i, "Ind3"]} print the summary of the values.</li> </ul>

    Output

    $ awk 'BEGIN{FS=OFS=";"} {a[$2,$1]=$3; count[$2]} END {for (i in count) print i, a[i,"Ind1"], a[i, "Ind2"], a[i, "Ind3"]}' file France;200;17;2.4 Germany;300;16;7.9 USA;500;18;5.3 <hr>

    Update

    unfortunately, the number of Indicators is not fixed. Also, they are not named like "Ind1", "Ind2" etc. but are just strings.' I clarified my question.

    $ awk -v FS=";" '{a[$2,$1]=$3; count[$2]; indic[$1]} END {for (j in indic) printf "%s ", j; printf "\n"; for (i in count) {printf "%s ", i; for (j in indic) printf "%s ", a[i,j]; printf "\n"}}' file proportion_males no_of_people area_in_km France 2.4 200 17 Germany 7.9 300 16 USA 5.3 500 18

    To have ; separated, do replace each space with ;:

    $ awk -v FS=";" '{a[$2,$1]=$3; count[$2]; indic[$1]} END {for (j in indic) printf "%s ", j; printf "\n"; for (i in count) {printf "%s ", i; for (j in indic) printf "%s ", a[i,j]; printf "\n"}}' file | tr ' ' ';' proportion_males;no_of_people;area_in_km; France;2.4;200;17; Germany;7.9;300;16; USA;5.3;500;18;

Recommend

  • Matlab code optimization and removing loops [closed]
  • Multiply two data frames with similar index in python pandas
  • Find by regex and replace match to lowercase in Bash
  • Does Perl currently (5.8 and 5.10) make any promises about the order alternations will be used?
  • Multiprocessing in Python: how to implement a loop over “apply_async” as “map_async” using a callbac
  • overlapping appointments using the entity framework
  • Remove characters after a specific character in column
  • Is an if-let or a normal if condition better?
  • Parallel sieve of Eratosthenes - Java Multithreading
  • In Akka, is ActorContext thread safe?
  • Facebook Messenger Platform - Detect user typing
  • matching similar elements in between two lists
  • Is it possible to use arbitrary image sizes in caffe?
  • Python cosine function precision [duplicate]
  • Calculate time difference in hh:mm:ss with simple javascript/jquery
  • Get specific string
  • MVC3 Razor - ListBox pre-select not working
  • Word Open XML Mail Merge
  • Record samples being played with OpenAL
  • Checking free space on FTP server
  • Retrieving value from sql ExecuteScalar()
  • Spray.io: When (not) to use non-blocking route handling?
  • JavaScriptCore crash on iOS9
  • WinForms: two way TextBox problem
  • vba code to select only visible cells in specific column except heading
  • VBA Convert delimiter text file to Excel
  • Do I've to free mysql result after storing it?
  • Calling of Constructors in a Java
  • SVN: Merging two branches together
  • Traverse Array and Display in markup
  • Buffer size for converting unsigned long to string
  • Acquiring multiple attributes from .xml file in c#
  • Why can't I rebase on to an ancestor of source changesets if on a different branch?
  • How to CLICK on IE download dialog box i.e.(Open, Save, Save As…)
  • How can I remove ASP.NET Designer.cs files?
  • Append folder name and increment by 1 using batch script
  • Sorting a 2D array using the second column C++
  • Android Heatmap on canvas or ImageView
  • Conditional In-Line CSS for IE and Others?
  • java string with new operator and a literal