Is this the right way to read lines from a file and split them into words in Rust?



Editor's note: This code example is from a version of Rust prior to 1.0 and is not syntactically valid Rust 1.0 code. Updated versions of this code produce different errors, but the answers still contain valuable information.


I've implemented the following function to return the words from a file in a two-dimensional data structure:

    fn read_terms() -> Vec<Vec<String>> {
        let path = Path::new("terms.txt");
        let mut file = BufferedReader::new(File::open(&path));
        return file.lines().map(|x| x.unwrap().as_slice().words().map(|x| x.to_string()).collect()).collect();
    }

Is this the right, idiomatic and efficient way in Rust? I'm wondering if collect() needs to be called so often and whether it's necessary to call to_string() here to allocate memory. Maybe the return type should be defined differently to be more idiomatic and efficient?


You could instead read the entire file as a single String and then build a structure of references that points to the words inside:

    use std::io::{self, Read};
    use std::fs::File;

    fn filename_to_string(s: &str) -> io::Result<String> {
        let mut file = File::open(s)?;
        let mut s = String::new();
        file.read_to_string(&mut s)?;
        Ok(s)
    }

    fn words_by_line<'a>(s: &'a str) -> Vec<Vec<&'a str>> {
        s.lines().map(|line| {
            line.split_whitespace().collect()
        }).collect()
    }

    fn example_use() {
        let whole_file = filename_to_string("terms.txt").unwrap();
        let wbyl = words_by_line(&whole_file);
        println!("{:?}", wbyl)
    }

This reads the file with less overhead because it can slurp the contents into a single buffer. Reading lines with BufReader, by contrast, implies a lot of copying and allocating: first into the buffer inside BufReader, then into a newly allocated String for each line, and then into a newly allocated String for each word. It also uses less memory, because the single large String and the vectors of references are more compact than many individual Strings.

A drawback is that you can't directly return the structure of references, because it can't outlive the stack frame that holds the single large String. In example_use above, we have to bind the large String with a let in order to call words_by_line. It is possible to get around this with unsafe code, wrapping the String and the references in a private struct, but that is much more complicated.
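If the goal is only to return the data from a function, one safe alternative to the unsafe approach is to store byte ranges instead of references and hand out the &str slices on demand. The following is a minimal sketch of that idea, not the technique described above: the names WordsByLine, words and load_words are hypothetical, and it trades a little extra work per access for not needing unsafe.

```rust
use std::fs;
use std::io;
use std::ops::Range;

/// Hypothetical wrapper: owns the text and records where each word
/// sits inside it, so the whole thing can be returned by value.
struct WordsByLine {
    text: String,
    /// For each line, the byte ranges of its words within `text`.
    lines: Vec<Vec<Range<usize>>>,
}

impl WordsByLine {
    fn new(text: String) -> Self {
        let base = text.as_ptr() as usize;
        let lines: Vec<Vec<Range<usize>>> = text
            .lines()
            .map(|line| {
                line.split_whitespace()
                    .map(|word| {
                        // `word` is a subslice of `text`, so its byte offset
                        // is the difference between the two start addresses.
                        let start = word.as_ptr() as usize - base;
                        start..start + word.len()
                    })
                    .collect()
            })
            .collect();
        WordsByLine { text, lines }
    }

    /// Rebuild the borrowed view; the slices borrow from `self`.
    fn words(&self) -> Vec<Vec<&str>> {
        self.lines
            .iter()
            .map(|line| line.iter().map(|r| &self.text[r.clone()]).collect())
            .collect()
    }
}

/// Unlike `words_by_line`, the result can leave the function that read the file.
fn load_words(path: &str) -> io::Result<WordsByLine> {
    Ok(WordsByLine::new(fs::read_to_string(path)?))
}

fn example_use() {
    let wbyl = WordsByLine::new(String::from("foo bar\nbaz"));
    println!("{:?}", wbyl.words()); // prints [["foo", "bar"], ["baz"]]
}
```

Calling example_use from main prints the nested vectors. In real use, load_words("terms.txt") would replace the in-memory string.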


There is a shorter and more readable way of getting words from a text file.

    use std::io::{BufRead, BufReader};
    use std::fs::File;

    fn main() {
        let reader = BufReader::new(File::open("file.txt").expect("Cannot open file.txt"));

        // Each item from lines() is an io::Result<String>.
        for line in reader.lines() {
            for word in line.unwrap().split_whitespace() {
                println!("word '{}'", word);
            }
        }
    }
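If you need the two-dimensional structure from the question rather than printed output, the same loop can be written as a pair of nested collect() calls. This is a sketch under a few assumptions: the function names read_words and read_terms are illustrative, and I/O errors are returned to the caller instead of being unwrapped.

```rust
use std::fs::File;
use std::io::{self, BufRead, BufReader};

/// Split every line from `reader` into owned words.
/// Generic over BufRead so it works on files and in-memory buffers alike.
fn read_words<R: BufRead>(reader: R) -> io::Result<Vec<Vec<String>>> {
    reader
        .lines()
        .map(|line| {
            // `line` is an io::Result<String>; map over it so that a read
            // error is passed through to the outer collect().
            line.map(|l| {
                l.split_whitespace()
                    .map(String::from)
                    .collect::<Vec<String>>()
            })
        })
        // Collecting an iterator of Results into a Result stops at the first error.
        .collect()
}

/// Hypothetical convenience wrapper for a file path.
fn read_terms(path: &str) -> io::Result<Vec<Vec<String>>> {
    read_words(BufReader::new(File::open(path)?))
}
```

Each word still needs its own String here, because the line buffer it was split from is dropped at the end of the closure. The outer collect() is what builds the list of lines, and the inner one builds the list of words for each line.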

