Sort os.listdir files Python


I have downloaded several years of data stored in files with the naming convention year_day.dat. For example, the file named 2014_1.dat has the data for January 1, 2014. I need to read these data files ordered by day: 2014_1.dat, 2014_2.dat, 2014_3.dat, and so on until the end of the year. In the folder they are listed in that order, but when I create a list of the files in the directory they are reordered: 2014_1.dat, 2014_10.dat, 2014_100.dat, 2014_101.dat ... 2014_199.dat, 2014_2.dat. I think I need to use a sort function, but how do I force it to sort the listed files by day so I can continue processing them? Here's the code so far:

    import sys, os, gzip, fileinput, collections
    # Set the input/output directories
    wrkDir = "C:/LJBTemp"
    inDir = wrkDir + "/Input"
    outDir = wrkDir + "/Output"
    # here we go
    inList = os.listdir(inDir)  # List all the files in the 'Input' directory
    print inList  # print to screen reveals 2014_1.dat.gz followed by 2014_10.dat.gz NOT 2014_2.dat.gz HELP
    d = {}
    for fileName in inList:  # Step through each input file
        readFileName = inDir + "/" + fileName
        with gzip.open(readFileName, 'r') as f:  # call built in utility to unzip file for reading
            for line in f:
                city, long, lat, elev, temp = line.split()  # create dictionary
                d.setdefault(city, []).append(temp)  # populate dictionary with city and associated temp data from each input file
                collections.OrderedDict(sorted(d.items(), key=lambda d: d[0]))  # QUESTION? why doesn't this work
    # now collect and write to output file
    outFileName = outDir + "/" + "1981_maxT.dat"  # create output file in output directory with .dat extension
    with open(outFileName, 'w') as f:
        for city, values in d.items():
            f.write('{} {}\n'.format(city, ' '.join(values)))
    print "All done!!"
    raw_input("Press <enter>")  # this keeps the window open until you press "enter"
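A minimal reproduction of the ordering problem, using hypothetical file names. os.listdir documents its result as being in arbitrary order, and plain sorted() compares strings character by character, so any day that starts with the digit 1 ends up ahead of day 2:

```python
# Hypothetical file names following the year_day.dat.gz convention
names = ["2014_2.dat.gz", "2014_100.dat.gz", "2014_1.dat.gz", "2014_10.dat.gz"]

# sorted() on strings compares character by character, so "2014_10..."
# sorts before "2014_2..." because the character '1' comes before '2'
lexical = sorted(names)
print(lexical)
```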


If you don't mind using third party libraries, you can use the natsort library (https://github.com/SethMMorton/natsort), which was designed for exactly this situation.

    import natsort
    inList = natsort.natsorted(os.listdir(inDir))

This should take care of all the numerical sorting without having to worry about the details.
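If installing a package is not an option, the core idea of natural sorting can be approximated with the standard library. This is a simplified sketch, not natsort's actual algorithm: it splits each name into runs of digits and non-digits and compares the digit runs as integers. The function name natural_key is hypothetical.

```python
import re

def natural_key(text):
    """Hypothetical helper: a simplified natural-sort key (not natsort's algorithm)."""
    # re.split with a capturing group alternates: non-digit, digits, non-digit, ...
    parts = re.split(r"(\d+)", text)
    # Odd positions are digit runs (compare as ints); even positions are text
    return [int(part) if i % 2 else part.lower() for i, part in enumerate(parts)]

files = ["2014_10.dat.gz", "2014_2.dat.gz", "2014_1.dat.gz", "2013_365.dat.gz"]
print(sorted(files, key=natural_key))
```

Because the digit runs always sit at odd positions, two keys never compare an int against a str at the same position.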

You can also use the ns.PATH option to make the sorting algorithm path-aware:

    from natsort import natsorted, ns
    inList = natsorted(os.listdir(inDir), alg=ns.PATH)

Full disclosure, I am the natsort author.


Try this if all of your files start with '2014_':

    inList = sorted(inList, key=lambda k: int(k.split('_')[1].split('.')[0]))
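A usage sketch of that key on a hypothetical single-year listing with the .dat.gz suffix shown in the question. The key takes the text between '_' and the first '.', so the double extension does not matter. Note that the result has to be kept: list.sort sorts in place, while sorted returns a new list.

```python
# Hypothetical directory listing (single year, as this key assumes)
inList = ["2014_100.dat.gz", "2014_2.dat.gz", "2014_10.dat.gz", "2014_1.dat.gz"]

# Same key as above: the text between '_' and the first '.' is the day number
inList.sort(key=lambda k: int(k.split('_')[1].split('.')[0]))
print(inList)
```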

Otherwise, take advantage of tuple comparison and sort by the year first, then by the day:

    inList = sorted(inList, key=lambda k: (int(k.split('_')[0]), int(k.split('_')[1].split('.')[0])))
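Tuples compare element by element, so (2013, 365) sorts before (2014, 1): the day only breaks ties within the same year. A small sketch with hypothetical names spanning two years; year_day is a hypothetical helper equivalent to the lambda above.

```python
names = ["2014_2.dat", "2013_365.dat", "2014_10.dat", "2013_9.dat", "2014_1.dat"]

def year_day(name):
    """Hypothetical helper: return (year, day) as ints from a 'YYYY_D.dat' name."""
    year, rest = name.split('_', 1)
    return (int(year), int(rest.split('.')[0]))

print(sorted(names, key=year_day))
```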


dict.items returns a list of (key, value) pairs.

Your key function uses only the first element of each pair (d[0], i.e. the key, which is the city), so it orders the cities, not the temperatures.

There's another problem: sorted returns a new, sorted copy of the list and does not sort the list in place. Also, the OrderedDict object is created but never assigned to anything, so it is discarded immediately. In any case, you don't need to sort every time you append an item to the list.
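A short illustration of the discarded-result issue, with hypothetical city data. Building the OrderedDict on its own line has no effect on d; the ordered mapping only exists if it is bound to a name. Even then, it orders the cities, not the temperatures inside each list.

```python
import collections

d = {"Paris": ["21"], "Austin": ["30"], "Lima": ["18"]}  # hypothetical data

# Builds an OrderedDict and immediately throws it away; d is unchanged
collections.OrderedDict(sorted(d.items(), key=lambda item: item[0]))

# To keep the ordering, bind the result to a name and use that instead
ordered = collections.OrderedDict(sorted(d.items(), key=lambda item: item[0]))
print(list(ordered))  # city names in alphabetical order
```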

Remove the collections.OrderedDict(sorted(...)) line. What needs sorting is the list of file names, not the per-city values: each temperature is appended in the order the files are read, so if inList is in day order, every city's list will be too. Replace the os.listdir line with:

    inList = os.listdir(inDir)  # List all the files in the 'Input' directory
    # Sort numerically by (year, day): '2014_10.dat.gz' -> [2014, 10]
    inList.sort(key=lambda fn: [int(part) for part in fn.split('.')[0].split('_')])

The output loop can then stay as it is.
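Putting the pieces together, a minimal Python 3 sketch of the read loop with the file list sorted numerically before any file is opened. The helper names day_key and collect_temps are hypothetical, and the city, coordinates, and temperatures are made-up sample data written to a temporary directory for the demo.

```python
import gzip
import os
import tempfile

def day_key(file_name):
    """Hypothetical sort key for 'YYYY_D.dat.gz': [year, day] as ints."""
    stem = file_name.split('.')[0]          # '2014_10'
    return [int(part) for part in stem.split('_')]

def collect_temps(in_dir):
    """Read every file in in_dir in (year, day) order; map city -> temps."""
    d = {}
    for file_name in sorted(os.listdir(in_dir), key=day_key):
        path = os.path.join(in_dir, file_name)
        # 'rt' opens the gzip file in text mode (Python 3)
        with gzip.open(path, 'rt') as f:
            for line in f:
                city, lon, lat, elev, temp = line.split()
                d.setdefault(city, []).append(temp)
    return d

# Demo: write hypothetical files out of order, then read them back in day order
with tempfile.TemporaryDirectory() as in_dir:
    for day, temp in [(10, "12"), (1, "3"), (2, "5")]:
        with gzip.open(os.path.join(in_dir, "2014_%d.dat.gz" % day), 'wt') as f:
            f.write("Boston -71.06 42.36 43 %s\n" % temp)
    result = collect_temps(in_dir)

print(result)
```

Sorting once, up front, means the dictionary never needs reordering afterwards.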

By the way, instead of manually joining paths with a hard-coded / separator, use os.path.join (http://docs.python.org/2/library/os.path.html#os.path.join):

inDir + "/" + fileName => os.path.join(inDir, fileName)
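A small sketch of the same path-building step with os.path.join, reusing the question's directory names as sample values. os.path.join inserts the platform's separator between components, so the result uses / on POSIX systems and \ on Windows.

```python
import os

inDir = os.path.join("C:/LJBTemp", "Input")  # directory names from the question
fileName = "2014_1.dat.gz"
readFileName = os.path.join(inDir, fileName)
print(readFileName)
```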