
reading tab-delimited data without header in pandas

Question:

I'm having trouble using pandas to open tab-delimited data without headers.

My test data (actually contains 200 lines, of which I am showing the first 10):

Tag19184 CTAAC hffef 1 a 36 - chr1 10006 0 36M 36 Tag19184 CTAAC hffef 1 a 36 - chr1 10012 0 36M 36 Tag19184 CTAAC hffef 1 a 36 - chr1 10018 0 36M 36 Tag19184 CTAAC hffef 1 a 36 - chr1 10024 0 36M 36 Tag19184 CTAAC hffef 1 a 36 - chr1 10030 0 36M 36 Tag19184 CTAAC hffef 1 a 36 - chr1 10036 0 36M 36 Tag19184 CTAAC hffef 1 a 36 - chr1 10042 0 36M 36 Tag20198 CTAAC hffef 1 a 36 - chr1 10048 0 36M 36 Tag20198 CTAAC hffef 1 a 36 - chr1 10054 0 36M 36 Tag45093 CTAAC hffef 1 a 36 - chr1 10060 0 36M 36

My code:

import pandas as pd

# header=None: the first line of the file is data, not column names
df = pd.read_csv('in_test.txt', sep='\t', header=None)
print(df)

However, I get the following output, which I don't think I can use to further process data (?):

<class 'pandas.core.frame.DataFrame'>
Int64Index: 200 entries, 0 to 199
Data columns:
X.1     200  non-null values
X.2     200  non-null values
X.3     200  non-null values
X.4     200  non-null values
X.5     200  non-null values
X.6     200  non-null values
X.7     200  non-null values
X.8     200  non-null values
X.9     200  non-null values
X.10    200  non-null values
X.11    200  non-null values
X.12    200  non-null values
dtypes: int64(5), object(7)

The <a href="http://pandas.pydata.org/pandas-docs/dev/io.html" rel="nofollow">tutorial here</a> suggests that printing df should just display the data frame itself. What am I doing wrong?

Answer1:

I think the file is being read correctly, but:

<ol>
<li>See <a href="https://stackoverflow.com/questions/21482546/change-pandas-0-13-0-print-dataframe-to-print-dataframe-like-in-earlier-version" rel="nofollow">change pandas 0.13.0 "print dataframe" to print dataframe like in earlier versions</a>. Older versions of pandas print this summary instead of the table when the DataFrame is too large to display, so updating pandas will solve it.</li>
<li>You can use the IPython notebook, where DataFrames are displayed as HTML tables.</li>
<li>You can use df.head(5) (similar to R's head) to view the first few rows and check that your DataFrame is correct.</li>
</ol>
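To illustrate the third point, here is a minimal sketch that checks the parse on a few rows of the sample data. It assumes a reasonably recent pandas on Python 3, and it reads from an in-memory buffer (`io.StringIO`) instead of `in_test.txt` so that it runs on its own; the `rows` and `sample` names are only for illustration.

```python
import io

import pandas as pd

# Three rows taken from the question's sample. The fields are re-joined
# with real tab characters, since the original file is tab-delimited.
rows = [
    "Tag19184 CTAAC hffef 1 a 36 - chr1 10006 0 36M 36",
    "Tag19184 CTAAC hffef 1 a 36 - chr1 10012 0 36M 36",
    "Tag20198 CTAAC hffef 1 a 36 - chr1 10048 0 36M 36",
]
sample = "\n".join("\t".join(r.split(" ")) for r in rows) + "\n"

# header=None tells pandas that the first line is data, not column names.
df = pd.read_csv(io.StringIO(sample), sep="\t", header=None)

print(df.shape)    # (number of rows, number of columns)
print(df.head(2))  # first two rows, shown as a table
print(df.dtypes)   # inferred type of each column
```

If `df.shape` reports 12 columns and `df.head()` shows the expected values, the file was parsed correctly and the summary output is only a display matter.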
