
Is this the right way to read lines from file and split them into words in Rust?

Question:

<blockquote>

Editor's note: This code example is from a version of Rust prior to 1.0 and is not syntactically valid Rust 1.0 code. Updated versions of this code produce different errors, but the answers still contain valuable information.

</blockquote>

I've implemented the following function to return the words from a file in a two-dimensional data structure:

    fn read_terms() -> Vec<Vec<String>> {
        let path = Path::new("terms.txt");
        let mut file = BufferedReader::new(File::open(&path));
        return file.lines().map(|x|
            x.unwrap().as_slice().words().map(|x| x.to_string()).collect()
        ).collect();
    }

Is this the right, idiomatic and efficient way in Rust? I'm wondering if collect() needs to be called so often and whether it's necessary to call to_string() here to allocate memory. Maybe the return type should be defined differently to be more idiomatic and efficient?

Answer1:

You could instead read the entire file as a single String and then build a structure of references that points to the words inside:

    use std::io::{self, Read};
    use std::fs::File;

    fn filename_to_string(s: &str) -> io::Result<String> {
        let mut file = File::open(s)?;
        let mut s = String::new();
        file.read_to_string(&mut s)?;
        Ok(s)
    }

    fn words_by_line<'a>(s: &'a str) -> Vec<Vec<&'a str>> {
        s.lines().map(|line| {
            line.split_whitespace().collect()
        }).collect()
    }

    fn example_use() {
        let whole_file = filename_to_string("terms.txt").unwrap();
        let wbyl = words_by_line(&whole_file);
        println!("{:?}", wbyl)
    }

This will read the file with less overhead because it can slurp it into a single buffer, whereas reading lines with BufReader implies a lot of copying and allocating: first into the buffer inside BufReader, then into a newly allocated String for each line, and then into a newly allocated String for each word. It will also use less memory, because the single large String and the vectors of references are more compact than many individual Strings.

A drawback is that you can't directly return the structure of references, because it can't live past the stack frame that holds the single large String. In example_use above, we have to bind the large String with a let in order to call words_by_line. It is possible to get around this with unsafe code by wrapping the String and the references in a private struct, but that is much more complicated.
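If the unsafe route is not worth the complexity, one safe alternative is to return a value that owns the String and to build the vectors of references on demand. The following is only a sketch of that idea; the names `Terms`, `from_text`, and `from_file` are hypothetical and not part of the original answer.

```rust
use std::fs;
use std::io;

/// Hypothetical wrapper that owns the whole file contents.
struct Terms {
    text: String,
}

impl Terms {
    /// Wraps text that is already in memory.
    fn from_text(text: String) -> Self {
        Terms { text }
    }

    /// Reads the whole file into a single String.
    fn from_file(path: &str) -> io::Result<Self> {
        Ok(Terms { text: fs::read_to_string(path)? })
    }

    /// Borrowed view of the words, grouped by line.
    /// The returned references cannot outlive `self`.
    fn words_by_line(&self) -> Vec<Vec<&str>> {
        self.text
            .lines()
            .map(|line| line.split_whitespace().collect())
            .collect()
    }
}
```

A function can return a `Terms` freely because it owns its data. The trade-off is that each call to `words_by_line` splits the text again, so a caller that needs the view repeatedly should keep the returned vectors around.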

Answer2:

There is a shorter and more readable way of getting words from a text file.

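    use std::io::{BufRead, BufReader};
    use std::fs::File;

    fn main() {
        // Statements must live inside a function, so the loop is wrapped in main.
        let reader = BufReader::new(File::open("file.txt").expect("Cannot open file.txt"));

        for line in reader.lines() {
            // Each item is an io::Result<String>; unwrap panics on a read error.
            for word in line.unwrap().split_whitespace() {
                println!("word '{}'", word);
            }
        }
    }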
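The loop above only prints each word. To get back the two-dimensional structure from the question, the same loop can collect owned Strings and propagate I/O errors instead of panicking. This is a minimal sketch under that assumption; `read_terms_from_path` is a hypothetical helper name, and `read_terms` is made generic over `BufRead` here purely so it can be exercised without a file.

```rust
use std::fs::File;
use std::io::{self, BufRead, BufReader};
use std::path::Path;

/// Splits every line of `reader` into whitespace-separated words.
fn read_terms<R: BufRead>(reader: R) -> io::Result<Vec<Vec<String>>> {
    let mut terms = Vec::new();
    for line in reader.lines() {
        // `?` forwards a read error to the caller instead of panicking.
        let line = line?;
        // The line's String is dropped at the end of each iteration,
        // so every word has to be copied into its own String.
        let words = line
            .split_whitespace()
            .map(String::from)
            .collect::<Vec<String>>();
        terms.push(words);
    }
    Ok(terms)
}

/// Hypothetical convenience wrapper that opens a file by path.
fn read_terms_from_path<P: AsRef<Path>>(path: P) -> io::Result<Vec<Vec<String>>> {
    read_terms(BufReader::new(File::open(path)?))
}
```

The per-word allocation is what the question asked about: it is required in this line-by-line version because nothing else keeps the line's text alive after the loop iteration ends.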
