
What's the fastest way to partition a SAS dataset for batch processing?

Question:

I have a large SAS dataset (1.5 million observations, ~250 variables) that I need to split into several smaller SAS datasets of equal size for batch processing. Each dataset needs to contain all the variables but only a fraction of the observations. What is the fastest way of doing this?

Answer1:

You could do something like the following:

    %macro splitds(inlib=,inds=,splitnum=,outid=);
      %local i j nobs;

      /* Look up the observation count in the dictionary view */
      proc sql noprint;
        select nobs into :nobs
        from sashelp.vtable
        where libname=upcase("&inlib") and memname=upcase("&inds");
      quit;

      %put Number of observations in &inlib..&inds.: &nobs;

      /* One output data set per split: &outid.1 ... &outid.&splitnum */
      data %do i=1 %to &splitnum.; &outid.&i %end;;
        set &inlib..&inds.;
        /* Send each row to the first split whose upper bound covers _n_ */
        %do j=1 %to (&splitnum.-1);
          %if &j.=1 %then %do; if %end;
          %else %do; else if %end;
          _n_<=((&nobs./&splitnum.)*&j.) then output &outid.&j.;
        %end;
        else output &outid.&splitnum.;
      run;
    %mend;

An example call to split MYLIB.MYDATA into 10 data sets named NEWDATA1 - NEWDATA10 would be:

    %splitds(inlib=mylib,inds=mydata,splitnum=10,outid=newdata);
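For reference, with splitnum=3 the macro above would expand to roughly the following DATA step. This is only a sketch under a few assumptions: SASHELP.CLASS stands in for the real input, the output names PART1 to PART3 are hypothetical, and the row count comes from the SET statement's NOBS= option rather than from SASHELP.VTABLE.

```sas
/* Sketch: the if / else-if chain that the macro builds, written out by hand */
/* SASHELP.CLASS and PART1-PART3 are placeholder names, not from the answer  */
data part1 part2 part3;
  /* NOBS= fills TOTAL at compile time; TOTAL is not written to the outputs */
  set sashelp.class nobs=total;

  /* Each row goes to the first part whose upper bound covers _N_ */
  if _n_ <= (total / 3) * 1 then output part1;
  else if _n_ <= (total / 3) * 2 then output part2;
  else output part3;   /* the last part takes whatever remains */
run;
```

Because the final branch is a plain ELSE, any remainder left by the division ends up in the last data set, so no rows are lost when the count is not a multiple of the number of parts.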

Answer2:

Try this. I haven't tested it yet, so expect a bug somewhere. You will need to edit the macro call to BATCH_PROCESS to supply the name of the input dataset, the prefix for the output datasets, and the number of new datasets.

    /* Return the observation count of &dsn (0 if it cannot be opened) */
    %macro nobs(dsn);
      %local nobs dsid rc;
      %let nobs=0;
      %let dsid = %sysfunc(open(&dsn));
      %if &dsid %then %do;
        %let nobs = %sysfunc(attrn(&dsid,NOBS));
        %let rc = %sysfunc(close(&dsid));
      %end;
      %else %put Open for dataset &dsn failed - %sysfunc(sysmsg());
      &nobs
    %mend nobs;

    %macro batch_process(dsn_in=, dsn_out_prefix=, number_of_dsns=);
      %local i dsn_obs obs_per_dsn;
      %let dsn_obs = %nobs(&dsn_in);
      /* Round up so that every observation falls inside some range */
      %let obs_per_dsn = %sysevalf(&dsn_obs / &number_of_dsns, ceil);

      data %do i = 1 %to &number_of_dsns; &dsn_out_prefix.&i %end; ;
        set &dsn_in;
        drop _count;
        retain _count 0;
        _count = _count + 1;
        /* One range test per output data set */
        %do i = 1 %to &number_of_dsns;
          if (1 + ((&i - 1) * &obs_per_dsn)) <= _count <= (&i * &obs_per_dsn) then do;
            output &dsn_out_prefix.&i;
          end;
        %end;
      run;
    %mend batch_process;

    %batch_process(
      dsn_in=DSN_NAME
      , dsn_out_prefix = PREFIX_
      , number_of_dsns = 5
    );
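With range tests of this kind it is worth checking what happens when the row count is not a multiple of the number of parts. The following is a small sketch, using made-up values of 10 rows and 3 parts and hypothetical variable names, that writes to the log the range each part would cover if the chunk size is rounded up to a whole number.

```sas
/* Sketch: print the row range each part covers for assumed example values */
data _null_;
  total = 10;                        /* assumed row count, for illustration */
  parts = 3;                         /* assumed number of output data sets  */
  per_part = ceil(total / parts);    /* whole rows per part, rounded up     */

  do i = 1 to parts;
    first_obs = 1 + (i - 1) * per_part;
    last_obs  = min(i * per_part, total);   /* last part may be shorter */
    put i= first_obs= last_obs=;
  end;
run;
```

For these values the log should show the ranges 1 to 4, 5 to 8 and 9 to 10, so every row belongs to exactly one part. With a fractional chunk size (10 / 3), a row such as 4 would fall between the end of one range and the start of the next.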
