Replace accents in string vector with LaTeX code



df <- data.frame(name=c("México","Michoacán"),dat=c(1,2))


> df
       name dat
1    México   1
2 Michoacán   2

When I print this table to a .tex file using xtable, the accented characters get garbled, which is not surprising.

I would like to replace the accents with proper LaTeX markup, e.g.:

> df
           name dat
1    M\'{e}xico   1
2 Michoac\'{a}n   2

Note that the real dataset has many different names with different accented letters, but they all carry the same type of accent (acute, i.e. a forward slash), so the only thing that needs to change in \'{.} is the letter in place of the dot.

Trying one reader's suggestion, I did the following:

> df <- data.frame(name=c("México","Michoacán"),dat=c(1,2))
> df
       name dat
1    México   1
2 Michoacán   2
> df$name <- sub("é", "\\\\'{e}", df$name)
> df
         name dat
1 M\\'{e}xico   1
2   Michoacán   2
> capture.output(
+   print(xtable(df)),
+   file = "../paper/rTables.tex", append = FALSE)
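The doubled backslash in the `> df` display above is only how print() shows a backslash; the string itself holds a single one, which is what xtable receives. A quick check on one string:

```r
# Same substitution as in the transcript, applied to a single string
x <- sub("é", "\\\\'{e}", "México")

print(x)      # print() escapes backslashes when displaying: [1] "M\\'{e}xico"
cat(x, "\n")  # cat() writes the raw characters: M\'{e}xico
nchar(x)      # 10 characters, so only one backslash is stored
```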

When I opened rTables.tex in Notepad, it contained:

% latex table generated in R 2.13.1 by xtable 1.5-6 package
% Fri Jul 15 13:19:17 2011
\begin{table}[ht]
\begin{center}
\begin{tabular}{rlr}
  \hline
 & name & dat \\
  \hline
1 & M$\backslash$'\{e\}xico & 1.00 \\
  2 & Michoacán & 2.00 \\
   \hline
\end{tabular}
\end{center}
\end{table}

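This is not what I need: xtable has escaped the backslash and the braces, so \'{e} is typeset literally instead of as an accented e.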


I think the problem is that xtable's attempt to convert characters that are special in LaTeX also escapes the backslash and braces you inserted. Try overriding sanitize.text.function as follows:
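A minimal sketch of that override, assuming the identity function is what is wanted (it hands every text cell to LaTeX unchanged) and that df is built as in the question:

```r
library(xtable)

# Data as in the question; only "é" has been converted so far
df <- data.frame(name = c("México", "Michoacán"), dat = c(1, 2),
                 stringsAsFactors = FALSE)
df$name <- sub("é", "\\\\'{e}", df$name)

# Identity sanitizer: text cells reach the .tex output exactly as stored,
# so \'{e} is no longer rewritten as $\backslash$'\{e\}
print(xtable(df), sanitize.text.function = function(x) x)
```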


which on my system outputs this:

% latex table generated in R 2.13.0 by xtable 1.5-6 package
% Fri Jul 15 10:30:00 2011
\begin{table}[ht]
\begin{center}
\begin{tabular}{rlr}
  \hline
 & name & dat \\
  \hline
1 & M\'{e}xico & 1.00 \\
  2 & Michoacán & 2.00 \\
   \hline
\end{tabular}
\end{center}
\end{table}

Be aware, though, that with this override other characters that are special in LaTeX (such as %, &, _ or $) are no longer escaped, so any that appear in the data may break the table.
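If such characters do turn up, one option is a narrower sanitizer in place of the identity function. The sketch below uses a hypothetical helper, escape_some(), that escapes %, &, #, _ and $ but leaves backslashes and braces alone; it does not handle ^ or ~, and it assumes no cell contains a backslash or brace that should appear literally.

```r
library(xtable)

# Hypothetical helper: put a backslash before % & # _ $ so LaTeX prints them,
# while leaving \, { and } untouched so accent markup such as \'{e} survives.
# "\\\\\\1" in R source is \\\1: an escaped (literal) backslash, then group 1.
escape_some <- function(x) {
  gsub("([%&#_$])", "\\\\\\1", x)
}

df <- data.frame(name = c("M\\'{e}xico", "Michoac\\'{a}n"),
                 note = c("50% & up", "id_2"),
                 dat = c(1, 2), stringsAsFactors = FALSE)

print(xtable(df), sanitize.text.function = escape_some)
```

The trade-off is that every backslash or brace in the data is passed through as LaTeX markup, which is exactly what the accent codes need but would be wrong for data that contains those characters as plain text.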


Use the stringr package and replace each type of accented character one at a time:

library(stringr)
df$name <- str_replace_all(df$name, "é", "\\\\'{e}")
df$name <- str_replace_all(df$name, "á", "\\\\'{a}")
df$name
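With many names, the one-at-a-time replacements can be driven from a lookup table instead of being written out by hand. A minimal base-R sketch, assuming only acute accents on vowels occur; the names `acute` and `latex_acute()` are hypothetical, and the table would need extending for any other letters in the data:

```r
# Hypothetical lookup table: acute-accented letter -> unaccented base letter
acute <- c("á" = "a", "é" = "e", "í" = "i", "ó" = "o", "ú" = "u",
           "Á" = "A", "É" = "E", "Í" = "I", "Ó" = "O", "Ú" = "U")

latex_acute <- function(x) {
  x <- as.character(x)  # handles factor columns from data.frame()
  for (ch in names(acute)) {
    # "\\\\" in R source is two backslashes; the regex replacement turns them
    # into one, so each accented letter becomes \'{letter}
    x <- gsub(ch, paste0("\\\\'{", acute[[ch]], "}"), x)
  }
  x
}

df <- data.frame(name = c("México", "Michoacán"), dat = c(1, 2))
df$name <- latex_acute(df$name)
cat(df$name, sep = "\n")  # M\'{e}xico and Michoac\'{a}n, one per line
```

Keeping the mapping in a named vector means the only thing that changes per letter is the table entry, which matches the observation that only the letter inside \'{.} varies.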


