Issues with CRSPMERGE
One of the workhorse SAS macros provided by Wharton’s WRDS is crspmerge. Written by Rabih Moussawi and Luis Palacios, is was published in April 2009. The crspmerge macro accepts lists of variables to be taken from monthly or daily CRSP files crsp.msf (crsp.dsf) and crsp.mseall (crsp.dseall). The necessity of the macro arises because CRSP provides information in several different files and many research projects require that these data be merged.
There are two problems with crspmerge and several inconveniences. The first problem, or error, occurs on lines 34 and 35:
34 %let sdate = %sysfunc(putn(“&start”d,5.)) ;
35 %let edate = %sysfunc(putn(“&end”d,5.)) ;
These statements take the macro arguments start and end and convert them into numbers stored as text in macro variables sdate and edate. The probelm is that dates earlier than 16Aug1932 require a space of 6 not 5. That is, 16Aug1932 becomes -9999, which takes 5 spaces but 15Aug1932 becomes -10000 which takes 6 spaces. The conversion on the above lines turns 10Aug1932 into -1E4 which is actually the numerical representation of 15Aug1932. So, all dates earlier than that are converted incorrectly. The fix is simple: change the 5’s to 6′ in lines 34 and 35.
The second problem is that when crspmerge creates a daily file it misses distribution events that occur on non-trading days. There are 95 such events in the 2020 CRSP file. For example,on September 11, 2001, Herley Industries (permno 41663) announced a 2-for-1 stock split. However, there was no trading that week on Wall Street and the event does not appear on the crspmerge daily output. On October 12, 2012, hurricane Sandy halted trading and the $2 per share dividend announced by Lockhhed Martin (permno 21178) does not show up on the crspmerge daily file output.
SUPERCRSPMERGE
Richard Rumelt and Fan Xia have written a new merge program named supercrspmerge. It fixes the above issues with crspmerge and has a number of new convenience features. To deal with distribution events on non-trading days (in a daily merge) the macro moves the distribution event(s) to the next available trading day.
An inconvience in crspmerge is that one has to properly list the desired variables as SFVARS (from the price file) or SEVARS (from the mseall or dseall files). Supercrspmerge knows where to get variables. It also has a keyword _allv_ which brings in all available CRSP variables from these files. In addition, supercrspmerge allows the user to specify index variables from the msi or dsi files (veretd etc.) or use the keyword _alli_ to bring them all in.
Supercrspmerge allows the user to specify a list of variables for which one-period lags will be added. For example, by specifying lags = prc, the output daily file will contain prc_L, the price on the preceeding trading day (or a missing value if the focal date is the first for the permno). The macro will look back before the start date to get a lag. On a monthly file, the lag would be denoted prc_M.
There are a number of useful options provided by supercrspmerge. They are:
- abs Apply the absolute value operator to any prices among the specified variables.
- ntdist Find and shift non-trading day distributions events on daily file merge.
- mv Create market value variable (millions of $). Can be lagged.
- exdt On monthly file, include the date of the event as variable exdt.
- cut Drop distribution events occuring after the delisting date.
- nonul Exclude rows with no data on selected variables (except permno and date).
- ym Include the ‘yearmo’ variable = 100*year+month
The calling syntax of the macro is:
%supercrspmerge(freq=, lib=, start=, end=, thevars=, plist=, filter=, outfile=, options=, lags=);
where
- freq = d or m for daily or monthly merge. Defaults to m.
- lib = name of the crsp data library. If no such local libname, use wrds/crspsasdata/a_stock.
- start = start date as text. For example, 01Jan1973. Defaults to earlist date on CRSP files.
- end = end date as text. Defaults to most recent date on CRSP files.
- thevars = list of crsp variables to be included. _allv_ brings in all from dsf (msf) and dseall (mseall). _alli_ brings in all indices from dsi (msi). Permno and date will always be included and need not be listed. No default.
- plist = list of permnos to select. The list should be within parentheses and formatted as an argument to the IN operator. For example, plist = (10005, 20000:29999). Defaults to none.
- outfile = name of the output file. Defaults to test_&freq.
- options = list of option keywords. Defaults to none.
- lags = list of variables to be lagged one observation. Defaults to none.
When the user specifies distribution variables the merge may create more than one row for each date. Each row marks a different distribution event. The distribution variables are: distcd divamt facpr facshr dclrdt rcrddt paydt acperm accomp. Delisting variables may also create an additional row. They are: dlstcd nwperm nwcomp nextdt dlamt dlretx dlprc dlpdt dlret. Each such row will have the same values for the non-distribution non-delisting variables. Note that distribution and delisting variables are not taken from dseall (mseall) files as in crspmerge. Instead, they are taken from the source files dsedist and dsedelist for both monthly and daily merges. If one is using a local crsp library, these files must be present in that library.