
Combining 2 text files and making a new one in Python

Question:

I have 2 big text files, major and minor; small examples of each are shown below. Both files have 4 columns. In the major file the difference between the 2nd and 3rd columns is 10000. In the minor file the difference between the 2nd and 3rd columns is 31 or 32, or a number close to that but not much higher.

small example of major file:

chr4 530000 540000 0.0
chr4 540000 550000 1719.0
chr4 550000 560000 0.0

small example of minor file:

chr4 295577 295608 12
chr4 323326 323357 10
chr4 548873 548904 32
chr4 548873 548904 20
chr4 549047 549078 32
chr4 549047 549078 20
chr4 549137 549168 32
chr4 549137 549168 20
chr4 549181 549212 32
chr4 549181 549212 20
chr4 549269 549300 22
chr4 549269 549300 381
chr4 549269 549300 67
chr4 549269 549300 89
chr4 549269 549300 95
chr4 549269 549300 124
chr4 549269 549300 149
chr4 549269 549300 87
chr4 549269 549300 33
chr4 549269 549300 65
chr4 549269 549300 68
chr4 549269 549300 190
chr4 549269 549300 20
chr4 549355 549386 32
chr4 549355 549386 20
chr4 549443 549474 16
chr4 705810 705841 10
chr4 846893 846924 28

I want to make a new text file with 7 columns, like the expected output below.

expected output:

chr4 548873 548904 32 chr4 540000 550000
chr4 548873 548904 20 chr4 540000 550000
chr4 549047 549078 32 chr4 540000 550000
chr4 549047 549078 20 chr4 540000 550000
chr4 549137 549168 32 chr4 540000 550000
chr4 549137 549168 20 chr4 540000 550000
chr4 549181 549212 32 chr4 540000 550000
chr4 549181 549212 20 chr4 540000 550000
chr4 549269 549300 22 chr4 540000 550000
chr4 549269 549300 381 chr4 540000 550000
chr4 549269 549300 67 chr4 540000 550000
chr4 549269 549300 89 chr4 540000 550000
chr4 549269 549300 95 chr4 540000 550000
chr4 549269 549300 124 chr4 540000 550000
chr4 549269 549300 149 chr4 540000 550000
chr4 549269 549300 87 chr4 540000 550000
chr4 549269 549300 33 chr4 540000 550000
chr4 549269 549300 65 chr4 540000 550000
chr4 549269 549300 68 chr4 540000 550000
chr4 549269 549300 190 chr4 540000 550000
chr4 549269 549300 20 chr4 540000 550000
chr4 549355 549386 32 chr4 540000 550000
chr4 549355 549386 20 chr4 540000 550000
chr4 549443 549474 16 chr4 540000 550000

The first 4 columns are from the minor file and the last 3 columns are from the major file. In each row of the expected output, the numbers in the 2nd and 3rd columns (from the minor file) fall within the range given by columns 6 and 7 of the same row (from the major file), and the 1st column is equal to the 5th column (these are the 1st columns of the minor and major files). In other words, I want to find the rows in the minor file whose 1st column equals the 1st column of a row in the major file, and whose 2nd and 3rd columns both lie within the range between the 2nd and 3rd columns of that major row. So there are 3 conditions a row in the minor file must meet to be included in the output file, and the last 3 columns of each output row come from the major row that it fits.

I am trying to do that in Python and have written the following code, but it does not return what I expected:

major = open("major.txt", 'rb')
minor = open("minor.txt", 'rb')
major_list = []
minor_list = []
for m in major:
    major_list.append(m)
for n in minor:
    minor_list.append(n)
final = []
for i in minor_list:
    for j in major_list
    if minor_list[i] == major_list[j] and minor_list[i+1] <= major_list[j+1] and minor_list[i+2] >= major_list[j+2]:
        final.append(i)
with open('output.txt', 'w') as f:
    for item in final:
        f.write("%s\n" % item)

Answer1:

You should do something like this (note that append takes a single argument, so the pair is added as a tuple):

final = []
for i, j in zip(minor_list, major_list):
    final.append((i, j))

Answer2:

Maybe it's a typo in your code: I can see that you're missing a tab before your if minor_list[i] line, and the colon after for j in major_list.

final = []
for i in minor_list:
    for j in major_list
    if minor_list[i] == major_list[j] and minor_list[i+1] <= major_list[j+1] and minor_list[i+2] >= major_list[j+2]:
        final.append(i)

should be

final = []
for i in minor_list:
    for j in major_list:
        if minor_list[i] == major_list[j] and minor_list[i+1] <= major_list[j+1] and minor_list[i+2] >= major_list[j+2]:
            final.append(i)
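Fixing the indentation alone is probably not enough. In `for i in minor_list`, `i` is a whole line of the file rather than an index, so `minor_list[i]` raises a TypeError, and the columns are never split or converted to numbers. A minimal sketch of the three conditions from the question, assuming whitespace-separated columns and integer coordinates (the helper names `parse` and `match_rows` are made up for illustration):

```python
def parse(line):
    # Split one whitespace-separated row into chrom, start, end, value.
    chrom, start, end, value = line.split()
    return chrom, int(start), int(end), value


def match_rows(minor_lines, major_lines):
    # Parse the major rows once; the 4th column is not needed in the output.
    major_rows = [parse(line) for line in major_lines if line.strip()]
    out = []
    for line in minor_lines:
        if not line.strip():
            continue  # skip blank lines
        chrom, start, end, value = parse(line)
        for m_chrom, m_start, m_end, _ in major_rows:
            # Three conditions: same chromosome, and both minor
            # coordinates inside the major interval.
            if chrom == m_chrom and m_start <= start and end <= m_end:
                out.append(f"{chrom} {start} {end} {value} {m_chrom} {m_start} {m_end}")
    return out


major = [
    "chr4 530000 540000 0.0",
    "chr4 540000 550000 1719.0",
    "chr4 550000 560000 0.0",
]
minor = [
    "chr4 295577 295608 12",
    "chr4 548873 548904 32",
    "chr4 549443 549474 16",
    "chr4 705810 705841 10",
]
result = match_rows(minor, major)
for row in result:
    print(row)
```

With real files, the same function can be given the two open file objects instead of lists. Opening them in text mode ('r') rather than 'rb' keeps the rows as str, which is what the f-string formatting here expects.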

Answer3:

Do you HAVE to use Python for this? If you install "bedtools", this can be accomplished from a bash shell with the following line:

bedtools intersect -wa -wb -a minor.bed -b major.bed > intersected_file.bed

A few bioinformatics tools are Linux/Mac-only, so if you're going to be doing any amount of bioinformatics, it's worth learning how to script in the shell.
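If the job does have to stay in Python, comparing every minor row against every major row gets slow on big files. One possible alternative is to sort the major intervals per chromosome and binary-search them with the standard-library bisect module. This is only a sketch, not how bedtools works internally; it assumes the major intervals on a chromosome do not overlap (as in the example bins), and the names `build_index` and `find_container` are made up for illustration:

```python
from bisect import bisect_right
from collections import defaultdict


def build_index(major_lines):
    # Group (start, end) pairs by chromosome.
    by_chrom = defaultdict(list)
    for line in major_lines:
        if line.strip():
            chrom, start, end, _value = line.split()
            by_chrom[chrom].append((int(start), int(end)))
    # Sort each chromosome's intervals and keep a parallel list of starts
    # so bisect can search it.
    index = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        index[chrom] = ([s for s, _ in intervals], intervals)
    return index


def find_container(index, chrom, start, end):
    # Return the (start, end) major interval containing [start, end], or None.
    if chrom not in index:
        return None
    starts, intervals = index[chrom]
    # Last major interval whose start is <= the minor start.
    pos = bisect_right(starts, start) - 1
    if pos >= 0 and end <= intervals[pos][1]:
        return intervals[pos]
    return None


major = [
    "chr4 530000 540000 0.0",
    "chr4 540000 550000 1719.0",
    "chr4 550000 560000 0.0",
]
index = build_index(major)
hit = find_container(index, "chr4", 548873, 548904)
miss = find_container(index, "chr4", 295577, 295608)
print(hit, miss)
```

Each lookup is then logarithmic in the number of major rows for that chromosome. If the major intervals can overlap, this approach would miss matches and a nested loop or an interval tree would be needed instead.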
