Python: Parsing and grouping filenames in directory


I'm pretty new to Python, but I have lots of experience with MATLAB & C.

What I need to do is parse the filenames of files in a particular directory, separate them into groups according to the fields within the filenames, and perform operations within these groups.

Specifically, the filenames are:


where '-x-' has been deliberately inserted as the field divider. I need to perform operations on every group of files that share the same PROJECT-x-SUBJECT-x-SESSION component.
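A minimal sketch of splitting one filename into its fields, assuming four '-x-'-separated fields followed by an extension (as the attempt and answers here imply). The helper `parse_name` and the sample filename are hypothetical, for illustration only.

```python
import os


def parse_name(filename, sep='-x-'):
    """Split 'PROJECT-x-SUBJECT-x-SESSION-x-FOURTH.ext' into its fields.

    parse_name is a hypothetical helper, not part of any library.
    """
    base, ext = os.path.splitext(filename)  # strips only the last suffix
    fields = base.split(sep)                # one entry per field
    return fields, ext


fields, ext = parse_name('proj1-x-subj01-x-sess1-x-T1.nii')  # hypothetical name
print(fields)  # ['proj1', 'subj01', 'sess1', 'T1']
print(ext)     # .nii
```

Note that `os.path.splitext` removes only the final suffix, so a double suffix such as `.tar.gz` would leave `.tar` attached to the last field.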

My best attempt follows:

I can parse each of the files one at a time by:

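```python
import os

directory = '.'  # the directory to scan
PROJECT_list, SUBJECT_list = [], []

dirList = os.listdir(directory)
for fname in dirList:
    # kill extension
    ext = os.path.splitext(fname)
    # get the 4 fields
    labels = ext[0].split('-x-')
    PROJECT_list.append(labels[0])
    SUBJECT_list.append(labels[1])
    # ...
```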

... which reflects the only idea I have had on how to organize this: creating 4 lists and appending to them for each filename.

Then, with my 4 (ordered?) lists, I could call something like:

```python
from collections import Counter
c = Counter(SESSION_list)
list(c)
```

Then at least I have a unique list of SESSION names.
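If only the unique names are needed, `set` is the more direct tool; `Counter` pays off when the number of files per name is also wanted. A small sketch, assuming a hypothetical `SESSION_list` like the one the loop above would build:

```python
from collections import Counter

# Hypothetical SESSION values, as the parsing loop might collect them
SESSION_list = ['sess2', 'sess1', 'sess2', 'sess3', 'sess1']

unique_sorted = sorted(set(SESSION_list))            # unique names, sorted
unique_in_order = list(dict.fromkeys(SESSION_list))  # unique names, first-seen order
counts = Counter(SESSION_list)                       # unique names plus file counts

print(unique_sorted)     # ['sess1', 'sess2', 'sess3']
print(unique_in_order)   # ['sess2', 'sess1', 'sess3']
print(counts['sess2'])   # 2
```

`dict.fromkeys` keeps first-seen order because dicts preserve insertion order in Python 3.7+.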

Suggestions? I could go on, but since I really just need a starting point, I think that this is sufficient.

Thanks, guys.


You can use defaultdict to make a dictionary that contains lists:

```python
import os
from collections import defaultdict

directory = '.'  # the directory to scan
groups = defaultdict(list)
for filename in os.listdir(directory):
    basename, extension = os.path.splitext(filename)
    project, subject, session, ftype = basename.split('-x-')
    groups[session].append(filename)
```

Now, groups contains a mapping between session names and filenames.
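Keying on `session` alone would merge files from different projects or subjects whenever session names repeat across them. A sketch of the same `defaultdict` idea keyed on the full (PROJECT, SUBJECT, SESSION) tuple, as the question asks. The helper `group_by_pss`, the sample names, and the choice to skip names without exactly four fields are assumptions for illustration:

```python
import os
from collections import defaultdict


def group_by_pss(filenames, sep='-x-'):
    """Group filenames on their (PROJECT, SUBJECT, SESSION) fields.

    group_by_pss is hypothetical; names without exactly four fields are skipped.
    """
    groups = defaultdict(list)
    for filename in filenames:
        basename, _ext = os.path.splitext(filename)
        fields = basename.split(sep)
        if len(fields) != 4:
            continue  # not in the expected format (e.g. a stray file)
        project, subject, session, _ftype = fields
        groups[(project, subject, session)].append(filename)
    return groups


# Hypothetical names; in practice pass os.listdir(directory)
names = [
    'projA-x-s01-x-ses1-x-T1.nii',
    'projA-x-s01-x-ses1-x-T2.nii',
    'projA-x-s02-x-ses1-x-T1.nii',
    'notes.txt',
]
groups = group_by_pss(names)
for (project, subject, session), files in sorted(groups.items()):
    # perform the per-group operation here
    print(project, subject, session, files)
```

Skipping malformed names avoids the `ValueError` that tuple unpacking raises when a file in the directory does not follow the naming scheme.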


How about using a `defaultdict` to group filenames, `glob` to find the appropriate files, and `fileinput` to read lines from all files with the same key? (untested)

```python
import os
from glob import glob
import fileinput
from collections import defaultdict

filenames = glob('*-x-*')  # matches in the current working directory
dd = defaultdict(list)
for filename in filenames:
    name, ext = os.path.splitext(filename)
    # key on the first three fields: PROJECT, SUBJECT, SESSION
    dd[tuple(name.split('-x-')[:3])].append(filename)

for key, fnames in dd.items():  # dd.iteritems() in Python 2
    for line in fileinput.FileInput(fnames):
        pass  # do something with lines from files with same key
```
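A different technique for the same grouping, swapped in here rather than taken from the answer: sort the names on the (PROJECT, SUBJECT, SESSION) key and then use `itertools.groupby`. The helper `pss_key` and the sample names are hypothetical:

```python
import os
from itertools import groupby


def pss_key(filename, sep='-x-'):
    """Return the (PROJECT, SUBJECT, SESSION) part of a filename (hypothetical helper)."""
    name, _ext = os.path.splitext(filename)
    return tuple(name.split(sep)[:3])


# Hypothetical names; in practice use glob('*-x-*') or os.listdir(directory)
filenames = [
    'p1-x-s2-x-a-x-T1.txt',
    'p1-x-s1-x-a-x-T1.txt',
    'p1-x-s2-x-a-x-T2.txt',
]

# groupby only merges *adjacent* items, so sort on the same key first
result = {}
for key, group in groupby(sorted(filenames, key=pss_key), key=pss_key):
    result[key] = list(group)  # consume the group before groupby advances
    print(key, result[key])
```

Groups come out in sorted key order, which can be convenient when processing sessions in sequence; the `defaultdict` version avoids the sort but yields groups in first-seen order.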

