Aside from the very interesting theoretical and political-sociology oriented posts of late, some of us at the UT Austin Soc blog would also like to encourage other types of posts with a more methodological angle.
Since many of us use STATA for statistical work, I thought a series of posts on STATA tips and tricks would be a good place to start our “geek out” and share some time-saving, or just plain cool, commands.
So here I’m going to give a little bit of sample code for getting tables and graphs out of STATA for a more manageable look at results.
TABLES
When it comes to tables, there are a number of useful programs for exporting regression results and other data, some built in to STATA and some user-written add-ons. STATA 12 now comes with an improved menu option for exporting parts of the raw data to Excel. “Tabout” is a useful user-written tool for creating summary Excel tables of tabulated data, for example average income by gender for a single country.
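As a rough sketch of the same idea using only built-in commands (not tabout itself), the do-file lines below export a few raw variables and then a small table of group means to Excel. The “auto” dataset that ships with STATA stands in for real data, and the file names are made up for illustration; “export excel” requires STATA 12 or later.

```stata
* Stand-in data: the auto dataset shipped with STATA (an assumption, not the post's data)
sysuse auto, clear

* Export selected raw variables, with variable names in the first row
export excel make price mpg using "auto_raw.xlsx", firstrow(variables) replace

* Build a small summary table (mean price by car origin) and export it
preserve
collapse (mean) mean_price = price, by(foreign)
export excel using "auto_mean_price.xlsx", firstrow(variables) replace
restore
```

The preserve/restore pair puts the full dataset back in memory after the collapsed table has been written.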
But many times it is not simple data or tabs that we want to see in excel, but rather more complex results of our regressions. Our eyes can only take so much of staring at the output window and it is hard to make connections without seeing things in neat tables. Sometimes we even need to create publication quality tables to insert into articles.
Now, copying and pasting and formatting by hand is always an option. But over the course of just one project, not to mention an entire PhD program, the countless hours spent making revisions by hand until perfection seem to justify the short-term time-expenditure on learning how to automate tables in your STATA code.
For this purpose, there are two excellent little user-written programs called “estout” and “outreg”. The “estout” package lets you output a specific set of regression (or other analysis) results after first storing them with “eststo”; its “esttab” command then writes the stored results out as a table. “outreg” is a bit more automated, and my favorite, which I will demonstrate here, is “outreg2”, which has more bells and whistles and seems to work well even with more advanced models beyond simple regression. Because these are add-ons, each one has to be installed once, for example by typing “ssc install outreg2”.
For the demonstration, I will use a simple country-level data set collected from the World Bank website, which includes three variables: 1) country, 2) hiv (average HIV infection rate for the past five years), and 3) pov (average poverty rate for the past five years).
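For comparison, here is a minimal sketch of the estout route: store each model with eststo, then write the stored models side by side with esttab. The auto dataset and the file name are stand-ins chosen for illustration, not the data used in this post.

```stata
* Install the estout package once if it is not already present
capture which esttab
if _rc ssc install estout

* Stand-in data (an assumption): the auto dataset shipped with STATA
sysuse auto, clear

* Store two models under the names m1 and m2
eststo clear
eststo m1: quietly regress price mpg
eststo m2: quietly regress price mpg weight

* Write both models to a CSV file with standard errors and R-squared
esttab m1 m2 using "auto_models.csv", se r2 replace
```

The CSV file opens directly in Excel, with one column per stored model.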
DISCLAIMER: I came across the question of whether poverty has a significant effect on HIV infection rates in some development literature, much of which assumes that the two are linked. However, the jury is still out and this simple regression exercise does not in any way claim to offer answers. Rather it aims only to demonstrate some techniques for data analysis in STATA. Statistics probably can tell us something about this question, but for that, a much more complicated model would be appropriate.
So, there are several steps to get our neat excel output. (In case you are totally new to STATA, note that the actual code you type comes on the lines below that begin with periods, and what follows is the output.)
First, we run our basic regression:
. reg hiv pov

      Source |       SS       df       MS              Number of obs =      61
-------------+------------------------------           F(  1,    59) =    0.76
       Model |  19.0177362     1  19.0177362           Prob > F      =  0.3873
    Residual |  1478.91636    59  25.0663791           R-squared     =  0.0127
-------------+------------------------------           Adj R-squared = -0.0040
       Total |   1497.9341    60  24.9655684           Root MSE      =  5.0066

------------------------------------------------------------------------------
         hiv |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         pov |  -.0190502   .0218708    -0.87   0.387    -.0628136    .0247133
       _cons |     3.1443    .970026     3.24   0.002     1.203282    5.085317
------------------------------------------------------------------------------
Next, we can call up a list of the estimates obtained from the above regression.
. ereturn list

scalars:
                  e(N) =  61
               e(df_m) =  1
               e(df_r) =  59
                  e(F) =  .7586949891917172
                 e(r2) =  .0126959765324349
               e(rmse) =  5.006633506264527
                e(mss) =  19.01773619459277
                e(rss) =  1478.916364896987
               e(r2_a) =  -.0040379899670153
                 e(ll) =  -183.795077505511
               e(ll_0) =  -184.1847839095903
               e(rank) =  2

macros:
            e(cmdline) : "regress hiv pov"
              e(title) : "Linear regression"
          e(marginsok) : "XB default"
                e(vce) : "ols"
             e(depvar) : "hiv"
                e(cmd) : "regress"
         e(properties) : "b V"
            e(predict) : "regres_p"
              e(model) : "ols"
          e(estat_cmd) : "regress_estat"

matrices:
                  e(b) :  1 x 2
                  e(V) :  2 x 2

functions:
             e(sample)
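The stored results are ordinary numbers that you can use in later commands. In the output above, for instance, e(mss) divided by e(mss) plus e(rss) gives 19.0177 / 1497.9341, which is about 0.0127 and matches e(r2). Likewise, the coefficient on pov divided by its standard error, -.0190502 / .0218708, is about -0.87, the reported t statistic. The sketch below repeats those checks on the auto dataset that ships with STATA, which is only a stand-in for the post's data.

```stata
* Stand-in data (an assumption): the auto dataset shipped with STATA
sysuse auto, clear
quietly regress price mpg

* Scalars from ereturn list can be displayed or used in expressions
display "N         = " e(N)
display "R-squared = " e(mss) / (e(mss) + e(rss))

* Coefficients and standard errors are available as _b[] and _se[]
display "t for mpg = " _b[mpg] / _se[mpg]

* The coefficient vector is stored as a matrix
matrix list e(b)

* Save a stored result in a local macro for later use
local n = e(N)
display "Stored sample size: `n'"
```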
With these estimates we can use outreg2 to create a simple table.
. outreg2 using OUTPUT_hiv_pov_countries, e(N df_m F rss ll) excel replace
OUTPUT_hiv_pov_countries.xml
dir : seeout
So, it’s as simple as that. Just run your analysis, call up the list of estimates, and plug those in to have outreg2 create a nice excel table like the one below. As you can see, one cool feature of outreg2 is that it automatically adds 1, 2 or 3 stars to your estimates in order to indicate whether they are statistically significant at the .1, .05, or .01 levels. This is not only a publication convention, but is also very useful for a quick eyeball look at your results, to see if you are on the right track.
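If you do not have the World Bank file at hand, the three steps can be tried on data that ships with STATA. This is only a sketch: the auto dataset, its variables, and the output file name are stand-ins, while the outreg2 options are the same ones used above.

```stata
* Install outreg2 once if it is not already present
capture which outreg2
if _rc ssc install outreg2

* Stand-in data (an assumption): the auto dataset shipped with STATA
sysuse auto, clear

* Step 1: run the regression
regress price mpg

* Step 2: see which estimates are stored
ereturn list

* Step 3: write the table, adding selected stored scalars below the coefficients
outreg2 using OUTPUT_auto, e(N df_m F rss ll) excel replace
```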
NOTE ON AUTOMATION:
When you are running a large number of analyses, however, it is useful to note a few things about automating outreg2.
1) Advanced formatting: Type “help outreg2” and take a closer look at the advanced features in order to be able to play with the formatting. This can save you from having to format every excel table by hand.
2) Replace: When running a number of analyses, for example the same regression over and over on individual countries, or separately for men and women… using the “replace” option on the very first analysis will make sure that you save over the old versions of your excel file when you re-run your code with the latest tweaks.
3) Append: Using the “append” option in your outreg2 code after each additional analysis that you wish to include in the same Excel file will ensure that you have one big comparable table, which lists the results of the other regressions right alongside the first one.
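The three points above can be combined in a short loop, sketched here under some assumptions: the auto dataset stands in for real data, the two car-origin groups stand in for countries or for men and women, and the file name is made up. The first pass uses “replace” and every later pass uses “append”, so re-running the do-file always rebuilds the table from scratch. The ctitle() and dec() options set the column title and the number of decimals; see “help outreg2” for the full list.

```stata
* Install outreg2 once if it is not already present
capture which outreg2
if _rc ssc install outreg2

* Stand-in data (an assumption): the auto dataset shipped with STATA
sysuse auto, clear

* Start with replace, then switch to append after the first model
local mode replace
forvalues f = 0/1 {
    * Same regression, run separately for each group
    quietly regress price mpg if foreign == `f'
    outreg2 using OUTPUT_auto_by_origin, excel `mode' ctitle(Origin `f') dec(3)
    local mode append
}
```

Keeping the replace/append choice in a local macro means the loop body does not need a separate branch for the first model.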
This was just a simple example to show outreg2 in action. However, one nice thing about this command is that it works well with more advanced analyses as well, including multi-level models, and can give you additional statistics such as intraclass correlations… Basically, any scalar that STATA stores after estimation will appear in the “ereturn list” and can be output with outreg2.
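As a hedged sketch of that last point, the lines below fit a two-level random-intercept model, compute the intraclass correlation with “estat icc”, and pass it to outreg2 through the addstat() option. Several things here are assumptions: the nlsw88 dataset that ships with STATA stands in for real multi-level data, occupation is used as the grouping variable purely for illustration, and the “mixed” command needs STATA 13 or later (in STATA 12 the equivalent command is “xtmixed”). How outreg2 lays out the variance components may depend on your version, so check the resulting table.

```stata
* Install outreg2 once if it is not already present
capture which outreg2
if _rc ssc install outreg2

* Stand-in data (an assumption): the nlsw88 dataset shipped with STATA
sysuse nlsw88, clear

* Random-intercept model: wages on tenure, grouped by occupation
mixed wage tenure || occupation:

* Intraclass correlation, saved in a local macro before it is overwritten
estat icc
local icc = r(icc2)

* Add the ICC as an extra row at the bottom of the table
outreg2 using OUTPUT_multilevel, excel replace addstat(ICC, `icc') dec(3)
```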
Although it’s a bit basic, I hope you found this little geek out useful.
Please share with us some of your favorite STATA tips and tricks either in the comments, or perhaps as a guest blogger.