
R convert summary result (statistics with all dataframe columns) into dataframe

[I'm new to R...] I have this dataframe:

    df1 <- data.frame(c(2,1,2), c(1,2,3,4,5,6), seq(141,170)) #create data.frame
    names(df1) <- c('gender', 'age', 'height') #column names

I want the df1's summary in a dataframe object that looks like this:

              count      mean     std       min       25%       50%       75%       max
    age     30.0000    3.5000  1.7370    1.0000    2.0000    3.5000    5.0000    6.0000
    gender  30.0000    1.6667  0.4795    1.0000    1.0000    2.0000    2.0000    2.0000
    height  30.0000  155.5000  8.8034  141.0000  148.2500  155.5000  162.7500  170.0000

I've generated this in Python with df1.describe().T. How can I do this in R?

It would be a bonus if my summary dataframe also contained the "dtype", "null" (number of NULL values), "unique" (number of unique values) and "range" columns, to give comprehensive summary statistics:

              count      mean     std       min       25%       50%       75%       max  null  unique  range  dtype
    age     30.0000    3.5000  1.7370    1.0000    2.0000    3.5000    5.0000    6.0000     0       6      5  int64
    gender  30.0000    1.6667  0.4795    1.0000    1.0000    2.0000    2.0000    2.0000     0       2      1  int64
    height  30.0000  155.5000  8.8034  141.0000  148.2500  155.5000  162.7500  170.0000     0      30     29  int64

The Python code that produces the above result is:

    df1.describe().T.join(pd.DataFrame(df1.isnull().sum(), columns=['null']))\
        .join(pd.DataFrame.from_dict({i:df1[i].nunique() for i in df1.columns}, orient='index')\
        .rename(columns={0:'unique'}))\
        .join(pd.DataFrame.from_dict({i:(df1[i].max() - df1[i].min()) for i in df1.columns}, orient='index')\
        .rename(columns={0:'range'}))\
        .join(pd.DataFrame(df1.dtypes, columns=['dtype']))

Thank you!

Answer1:

I commonly use a little function (adapted from a script found on the net) to do this kind of transformation:

    sumstats = function(x) {
      # per-column helpers; non-numeric columns get the placeholder "N*N"
      null.k   <- function(x) sum(is.na(x))
      unique.k <- function(x) {if (sum(is.na(x)) > 0) length(unique(x)) - 1 else length(unique(x))}
      range.k  <- function(x) max(x, na.rm=TRUE) - min(x, na.rm=TRUE)
      mean.k   <- function(x) {if (is.numeric(x)) round(mean(x, na.rm=TRUE), digits=2) else "N*N"}
      sd.k     <- function(x) {if (is.numeric(x)) round(sd(x, na.rm=TRUE), digits=2) else "N*N"}
      var.k    <- function(x) {if (is.numeric(x)) round(var(x, na.rm=TRUE), digits=2) else "N*N"}
      min.k    <- function(x) {if (is.numeric(x)) round(min(x, na.rm=TRUE), digits=2) else "N*N"}
      q05 <- function(x) quantile(x, probs=.05, na.rm=TRUE)
      q10 <- function(x) quantile(x, probs=.1, na.rm=TRUE)
      q25 <- function(x) quantile(x, probs=.25, na.rm=TRUE)
      q50 <- function(x) quantile(x, probs=.5, na.rm=TRUE)
      q75 <- function(x) quantile(x, probs=.75, na.rm=TRUE)
      q90 <- function(x) quantile(x, probs=.9, na.rm=TRUE)
      q95 <- function(x) quantile(x, probs=.95, na.rm=TRUE)
      max.k    <- function(x) {if (is.numeric(x)) round(max(x, na.rm=TRUE), digits=2) else "N*N"}

      # one row per input column, one column per statistic
      sumtable <- cbind(as.matrix(colSums(!is.na(x))), sapply(x, null.k), sapply(x, unique.k),
                        sapply(x, range.k), sapply(x, mean.k), sapply(x, sd.k), sapply(x, var.k),
                        sapply(x, min.k), sapply(x, q05), sapply(x, q10), sapply(x, q25),
                        sapply(x, q50), sapply(x, q75), sapply(x, q90), sapply(x, q95),
                        sapply(x, max.k))
      sumtable <- as.data.frame(sumtable)
      names(sumtable) <- c('count', 'null', 'unique', 'range', 'mean', 'std', 'var', 'min',
                           '5%', '10%', '25%', '50%', '75%', '90%', '95%', 'max')
      return(sumtable)
    }

    sumstats(df1)
    #        count null unique range   mean  std   var    min     5%    10%    25%    50%    75%    90%    95%    max
    # gender 30.00 0.00   2.00  1.00   1.67 0.48  0.23   1.00   1.00   1.00   1.00   2.00   2.00   2.00   2.00   2.00
    # age    30.00 0.00   6.00  5.00   3.50 1.74  3.02   1.00   1.00   1.00   2.00   3.50   5.00   6.00   6.00   6.00
    # height 30.00 0.00  30.00 29.00 155.50 8.80 77.50 141.00 142.45 143.90 148.25 155.50 162.75 167.10 168.55 170.00

You can easily adapt it to add or drop descriptive columns (other quantiles, for instance). It returns a data.frame. You may also want to expose the NA handling as a function argument instead of hard-coding na.rm=TRUE in every helper.
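As a rough sketch of that last point, the NA handling can be passed in once rather than fixed inside each helper. The function name `describe_df` and its `na.rm` parameter are illustrative assumptions, not part of the script above. The sketch assumes every column is numeric, follows the column layout asked for in the question, and uses R's `class()` as a stand-in for the pandas dtype.

```r
# Hypothetical helper (name and signature are assumptions for illustration).
describe_df <- function(df, na.rm = TRUE) {
  stopifnot(is.data.frame(df), all(vapply(df, is.numeric, logical(1))))

  # Statistics for a single column, returned as a named numeric vector.
  one_col <- function(x) {
    # Note: quantile() stops with an error if na.rm = FALSE and x contains NA.
    q  <- quantile(x, probs = c(0.25, 0.5, 0.75), na.rm = na.rm, names = FALSE)
    lo <- min(x, na.rm = na.rm)
    hi <- max(x, na.rm = na.rm)
    c(count  = sum(!is.na(x)),
      mean   = mean(x, na.rm = na.rm),
      std    = sd(x, na.rm = na.rm),
      min    = lo,
      `25%`  = q[1],
      `50%`  = q[2],
      `75%`  = q[3],
      max    = hi,
      null   = sum(is.na(x)),
      unique = length(unique(x[!is.na(x)])),
      range  = hi - lo)
  }

  # vapply gives statistics in rows, so transpose to one row per column of df.
  m   <- t(vapply(df, one_col, numeric(11)))
  out <- as.data.frame(m)
  names(out) <- colnames(m)  # keep names such as "25%" verbatim
  out$dtype <- vapply(df, function(x) class(x)[1], character(1))
  out
}

df1 <- data.frame(c(2,1,2), c(1,2,3,4,5,6), seq(141,170))
names(df1) <- c('gender', 'age', 'height')
res <- describe_df(df1)
print(res)
```

Because `dtype` is added after the numeric matrix has been converted, the numeric columns stay numeric instead of being coerced to character.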

Hope it helps.

Answer2:

You can do this quite easily and readably with the tidyr and dplyr libraries:

    library("tidyr")
    library("dplyr")

    df1 <- data.frame(c(2,1,2), c(1,2,3,4,5,6), seq(141,170)) #create data.frame
    names(df1) <- c('gender', 'age', 'height') #column names

    df2 <- gather(df1, "attributes", "value")
    df2 %>%
      group_by(attributes) %>%
      summarise(count = n(), mean = mean(value), med = median(value), sd = sd(value),
                min = min(value), max = max(value))

    # A tibble: 3 x 7
    #   attributes count       mean   med        sd   min   max
    #        <chr> <int>      <dbl> <dbl>     <dbl> <dbl> <dbl>
    # 1        age    30   3.500000   3.5 1.7370208     1     6
    # 2     gender    30   1.666667   2.0 0.4794633     1     2
    # 3     height    30 155.500000 155.5 8.8034084   141   170
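The pipeline above leaves out the quartiles and the "null", "unique" and "range" columns from the question. One way they might be added is sketched below, assuming tidyr 1.0.0 or later, where `pivot_longer()` supersedes `gather()`. The result name `tidy_summary` is an illustrative choice. The original column types are lost once the values are stacked into a single column, so a "dtype" column would have to be computed from `df1` separately.

```r
library(tidyr)
library(dplyr)

df1 <- data.frame(c(2,1,2), c(1,2,3,4,5,6), seq(141,170))
names(df1) <- c('gender', 'age', 'height')

tidy_summary <- df1 %>%
  # stack all columns into (attributes, value) pairs
  pivot_longer(everything(), names_to = "attributes", values_to = "value") %>%
  group_by(attributes) %>%
  summarise(
    count  = sum(!is.na(value)),
    mean   = mean(value, na.rm = TRUE),
    std    = sd(value, na.rm = TRUE),
    min    = min(value, na.rm = TRUE),
    `25%`  = quantile(value, 0.25, na.rm = TRUE, names = FALSE),
    `50%`  = quantile(value, 0.50, na.rm = TRUE, names = FALSE),
    `75%`  = quantile(value, 0.75, na.rm = TRUE, names = FALSE),
    max    = max(value, na.rm = TRUE),
    null   = sum(is.na(value)),
    unique = n_distinct(value, na.rm = TRUE),
    range  = max(value, na.rm = TRUE) - min(value, na.rm = TRUE)
  )

print(tidy_summary)
```

The result is a tibble with one row per original column, sorted alphabetically by `attributes`; `as.data.frame(tidy_summary)` converts it to a plain data.frame if that is what is needed.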
