Step #6: Automation - Separating the Men from the Boys

Before we start, to make sure, when I'm talking about automation I'm referring to all sorts of ways to write a program in Stata that will run, and save output from, commands in a more-or-less structured program without having to write all commands separately. In the Stata manual you will sometimes see the term Automation as reserved to OLE automation, which is making Stata available to Microsoft Office programs. I will not deal with this automation at all.

Intro: Why?
OK, so what is automation good for? We had a preview in the previous step: one thing we can do is have the program calculate things based on the output of commands we ran, automatically and before we even see that output: divide the mean by the standard deviation, add 1.96 standard errors to the mean, and so on. Of course, one way (the boys' way) is to run the command and then do the calculation with the di command, or in a spreadsheet or calculator. But the men don't do things manually. They tell their programs what to do.
Another thing we can do is to avoid repeating similar commands. Instead of having 50 rows of the same reg command but with different regressors or different samples, we can write a loop that will do it in 5 rows. This is good not only for saving rain forests when you print out your code; it also gives your regressions a structure, so that when you need to change something in the command (report clustered standard errors, for example), you do it just once and not 50 times (or 48 times, accidentally forgetting to change 2 of the regressions).
Finally, what we can also do is to construct tables of the results we want to report. But we'll deal with this possibility in the following step.
When is it better to leave automation out? Probably when you need just a few regressions and you're not doing anything too time-consuming with the output. Think of automation as an investment you make in your program. It entails a fixed cost of thinking about the structure and implementing it, but above a certain threshold the benefits of having the program do most of the work for you outweigh that cost. What I used to do in many cases when I was starting to automate do-files was to first write the program simply, and then, when I saw that I was starting to repeat almost the same code (copying and pasting like there's no tomorrow), I started thinking about how to automate things.

Macros
A macro is a name that stands for a string (a set of characters). Whenever we write a reference to this name in a Stata command, Stata first replaces the reference with the string that was stored for the macro, and only after the replacement does it run the command.
Enough with the definitions, let's see a simple example:
To define a macro x that contains the value 6, run the following line (without the . in the beginning):
. local x = 6
Now Stata has assigned a place in memory called x and put 6 in it. So whenever we want to tell Stata to use this x we saved, we use the backquote (usually the key in the top-left corner of the main keyboard, to the left of "1") and quote (single quote) characters:
. di `x' + 5
11
What really happens, behind the scene, is that Stata first sees the ` followed by the ', and looks within them to find a local we previously defined. Then it replaces this referral by the value that was saved:
First (as typed): di `x' + 5
Then (replacing `x' by 6): di 6 + 5
And only then when no `' are left in the command, Stata will run it and return 11.

Note: if nothing was defined for y, then Stata will replace the `y' by nothing:
First (as typed): di `y' + 5
Then (replacing `y' by an empty string): di + 5
And then you will get an error because the di command can't handle "+ 5" as input. Note that you will not always get an error. If Stata has no problem running the command after replacing the macro by an empty string, then it will run. Many annoying bugs in your programs will stem from this problem.
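
To see the silent version of this bug, here is a small sketch. It assumes the auto example dataset that ships with Stata (my choice for illustration, any dataset will do), the macro names are made up, and it stores a word in a macro, which is explained right below:

```stata
sysuse auto, clear          // example dataset shipped with Stata (assumption)

local depvar "price"        // hypothetical macro holding a variable name

* Spelled correctly: runs "summarize price"
summarize `depvar'

* Misspelled: `depvr' was never defined, so it expands to nothing and
* Stata runs plain "summarize", which summarizes ALL variables, with no error
summarize `depvr'
```

The second command does not fail, it just does something you did not ask for, which is exactly why this kind of bug is so hard to spot.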


We can also put strings inside macros:
. local x = "Hi there, how are you?"
or
. local dependent_variable = "wage"

Why do we need the double quotes? Because otherwise Stata will think that what we put after the = sign is part of our command and not simply a value. Specifically, when we put words (instead of numeric values) after the = sign, Stata thinks we're referring to a variable in the dataset (if there is indeed a variable by that name, it will put the value of the first observation of this variable in the macro). Thus, the double quotes tell Stata that you don't want the value of what's inside wage, but rather that you simply want the name "wage" to be kept inside the macro dependent_variable.

Now, let's see how we can refer to these macros. Almost the same as when it was numeric:

. di "`x'"
Hi there, how are you?

. sum `dependent_variable'

Why did we use the double quotes for the first command but not for the second?
Remember what the macro does, it replaces the `macro' with what we've saved in it. So for the first line this would be:
First (as typed): di "`x'"
Then (replacing for the macro): di "Hi there, how are you?"
So if we had not put the double quotes, it would have been equivalent to running:
. di Hi there, how are you?
But Stata will then look for a variable named Hi and won't find it. We didn't intend Stata to look up a variable, but to simply display a string as it is.

However, when we ran the second line with the summarize (sum) command, we indeed wanted the command to treat wage not as just a word, but as a reference to the variable wage! In other words, we wanted to run:
. sum wage
(and not sum "wage")
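
If you want to try the two cases side by side, here is a minimal sketch. It assumes the auto dataset that ships with Stata and uses price in place of wage; the macro greeting is made up for the example:

```stata
sysuse auto, clear                      // example dataset (assumption)

local dependent_variable = "price"      // a variable name kept as text
local greeting = "Summary of"           // plain text, hypothetical macro

* With double quotes: the expanded text is displayed as a string
display "`greeting' `dependent_variable'"

* Without double quotes: the expanded text is read as a variable name,
* so this runs "summarize price"
summarize `dependent_variable'
```

The rule of thumb: ask yourself what the line looks like after the replacement, and whether that is a command you would have typed yourself.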

Three final remarks before we move on to loops (if you're still wondering why do we need all this, hang on).
  • local and global - you might have wondered why the command to define a macro is called local. In step #8, which hopefully will be written some day, we will deal with writing commands in Stata, and then it will matter. local means that the macro is defined within the program it was set in, and global means that all commands and programs can refer to the macro.
In any case, to define global, you will use the word global instead of local:
. global x = 6
But to refer to the global later, we do something a little different. We use ${macroname}
. di ${x}

  • Long strings - at least in Stata 9, there is a weird issue with defining macros for strings that are longer than 255 characters. You might wonder why you would ever need more than 255 characters, but it happens sometimes, and unless somebody told you about it, this can be one of the most annoying bugs (Stata might simply cut your string after 255 characters...). To avoid that, define string macros without the = operator:
. local mylongstring "This is my very long string, and since it is longer than 255 characters, I omitted the = in its definition. Looks strange, but this is how it works. Good luck!"
  • Predefined macros - Stata has some macros within it that you might find helpful. They're not exactly macros, when I come to think of them, but we treat them as such (but without the `'). For example, _N holds the number of observations in the dataset (try to run di _N), _pi holds the number pi. For a few more, you can look up help _variables
  • Nested macro reference - You can refer to a macro within another macro reference. What does that mean? Say you have one macro named x_a and another macro named x_b, you can define a macro named i and do the following:
local x_a = 800
local x_b = 43.2
local i = "b"
di `x_`i''
Note that it is not double-quote at the end of the di command but two single-quotes. What happens when Stata hits the di command is the following:
First (as typed): di `x_`i''
Then (replacing the innermost macro): di `x_b'
Then (replacing the next innermost macro): di 43.2
And then Stata will execute the command and show the number 43.2

This can be handy when you have several macros and you want to alternate referring to them (i decides which of the x's to use), which sometimes you need to do inside loops.
  • Extended functions - Ever wondered how to save a variable's label? Ever wanted to count how many words there are in a string? Maybe you didn't, but sometimes there are things you need to save to a macro and you have no idea how to do that. In some of these cases, you might find your answer in the extended functions. They work a bit differently (but just like egen, you get many different features with variations on the same command):
local <macro_name> : <extended_function>
Note that we use : instead of = which tells Stata we're not using the regular functions but the extended functions. For example, say you want to keep the label of the variable w2gef (usually questionnaire data will have cryptic variable names but hopefully informative labels) inside the macro w2gef_label:
local w2gef_label : variable label w2gef
Another example:
local x "This is my string. How many words are in it?"
local num_words : word count `x'
local sixth_word : word 6 of `x'
di "There are `num_words' words in x. The sixth of them is `sixth_word'"
The output will then be:
There are 10 words in x. The sixth of them is many
More on that in help extended_fcn
  • Saving and reusing - I once thought that this part was obvious, but teaching Stata has taught me otherwise. So to make things clear... When you save with the "save" command or with the icon on the top left and so on, it just saves your data. It will not save the macros. Macros are part of programs. If you use the interpreter interface of Stata (the command line below the output window), then when you close Stata, your macros will disappear. If you want to reuse them when you open Stata next time, you have got to work with .do files. Stata comes with a do-file editor, but you can write them in any text editor. Make sure from now on you work with .do files. Of course, experimenting with commands in the interpreter is always worth doing, but in the end keep the commands you liked in a do-file.
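
To tie the remarks above together, here is a sketch of a tiny do-file. It is only an illustration under my own assumptions: it uses the auto dataset that ships with Stata, the file name example.do and the macro names are made up, and it relies on the saved results r(mean), r(sd) and r(N) that summarize leaves behind:

```stata
* example.do (hypothetical file name); run it with: do example.do
sysuse auto, clear

* Extended function (note the colon): keep the variable's label in a macro
local price_label : variable label price

* Saved results of summarize, turned into a rough 95% interval for the mean
quietly summarize price
local mean  = r(mean)
local se    = r(sd) / sqrt(r(N))
local lower = `mean' - 1.96 * `se'
local upper = `mean' + 1.96 * `se'

display "`price_label': mean = " %9.2f `mean'
display "rough 95% interval: [" %9.2f `lower' ", " %9.2f `upper' "]"
```

Because everything sits in a do-file, the macros are rebuilt every time you run it, so nothing is lost when you close Stata.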

Loops (and Conditions)
The power of automation comes mainly from the ability to create a loop and repeat commands with it. The next subsection will show some examples.

* Before we start, the following examples sometimes have lines extend beyond the boundaries of the blog, so it cuts them to two. If you're not sure where each line ends and another one starts, copy the example to a text editor.


while


The simplest loop is the while. The syntax goes like this:
while <exp> {
    ...
}
where <exp> is a condition. If you remember, when we talked about creating dummy variables we said that a condition in Stata is an expression that is equal to 1 if the condition holds and 0 otherwise. The while command tells Stata to keep running the commands between the {} as long as the condition is satisfied, that is, until <exp> is equal to 0.
When we dealt with dummy variables we usually constructed the condition on one of the other variables (and Stata checked the condition on the values of the variables for each observation: educ>12 for example). But when you deal with loops and other matters of flow control (that is, how your program runs contingent upon the situations it faces), the conditions will mainly deal with macros instead of variables.*

* There is no technical problem with referring to variables in the condition. The thing is that, as opposed to conditions when creating variables (in which Stata goes through all the observations), here referring to a variable will give its value in the first observation only, because nothing tells Stata to go through all observations. If you want to refer specifically to the value of the variable in an observation other than the first, just refer to varname[observation_number]. You can experiment with this using the di command.

Anyway, here's an example:
local i = 1
while `i' <= 4 {
    di "counting `i'"
    di "good."
    local i = `i' + 1
}
This will output:
counting 1
good.
counting 2
good.
counting 3
good.
counting 4
good.
Note that the last row within the while loop increments the macro i. Each time, before the iteration is over, we increase i by 1. If we didn't do so, i would have stayed 1 and the condition would always be satisfied. To get out of the loop we need to make sure that after a finite number of iterations the condition is no longer satisfied.

But for examples like the one I gave, there is a better loop which is less cumbersome (the foreach/forvalues loop). While is good for situations in which you don't know in advance how many iterations you want.
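
Here is a minimal sketch of such a situation, where the loop stops when a threshold is crossed rather than after a preset number of rounds. The macro names and the threshold of 1000 are made up for the example:

```stata
local value = 1
local steps = 0

* Keep doubling until the value reaches 1000; we let the loop count the rounds
while `value' < 1000 {
    local value = `value' * 2
    local steps = `steps' + 1
}

display "Reached `value' after `steps' doublings"
```

This will output:
Reached 1024 after 10 doublings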

forvalues
When you know how many iterations you want, using a for loop is much better. The simplest for loop is the forvalues loop. Let's start with an example which will do the same thing as the example for the while loop:
forvalues i=1/4 {
    di "counting `i'"
    di "good."
}
Let's try to find the differences between the examples:
First and foremost, the condition from the while loop has changed to i=1/4. Second, the initialization of the macro i before the while loop and the incrementation before the end of the iteration are both gone. This is done by the simple i=1/4 which we wrote for the forvalues. It means that we are creating a loop that will start with i=1, then increase i by 1 until it reaches 4 (including 4). We can refer to i inside the loop or we can ignore it. The loop will run 4 times with each time having the next number for i.

More generally, our forvalues loop looks like this:
forvalues <loop_macro> = <range> {
    ...
}

In our example <loop_macro> was i and <range> was 1/4. Note that when we're in the forvalues context, 1/4 doesn't mean a quarter, but rather "from 1 until 4 in steps of 1". The range can be different both in terms of boundaries and in terms of steps. We can do this:
forvalues proportion = 0(0.05)1 { ...
Which will start with proportion=0, then the next iteration will have proportion=0.05, then the next one 0.1, and so on until proportion=1.

More on the possibilities of range in help forvalues.
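
A few more ranges, as a quick sketch you can paste into a do-file (the macro names are made up). The last loop shows one possible way to get proportions while keeping the loop counter a whole number, which some people find easier to reason about than fractional steps:

```stata
* Every second year from 1998 to 2004
forvalues year = 1998(2)2004 {
    display "year `year'"
}

* Counting down with a negative step
forvalues i = 3(-1)1 {
    display "`i'..."
}

* Whole-number counter, converted to a proportion inside the loop
forvalues pct = 0(25)100 {
    local proportion = `pct' / 100
    display "proportion = `proportion'"
}
```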

foreach
The foreach command is pretty versatile. In my experience, two of its versions are very common. The first and simplest one is this:
foreach <loop_macro> in <list> {
  ...
}
where <list> is simply a list of words (can also be numbers if you want) separated by white space. Let's see some examples:
foreach regressor in educ_mom educ_dad "educ_mom educ_dad" {
    reg wage educ `regressor'
}
The loop will run the following three regressions:
reg wage educ educ_mom
reg wage educ educ_dad
reg wage educ educ_mom educ_dad
Note that the double-quotes in the last expression are there to tell Stata we want it to treat it as one word (one iteration in which the whole string inside the double quotes is the value that is assigned to the macro regressor). In other words, use double quotes whenever you don't want Stata to treat the space as a separator.
foreach male_value in 0 1 {
    reg unemployed wage educ shock if male == `male_value'
}
This will run twice:
reg unemployed wage educ shock if male == 0
reg unemployed wage educ shock if male == 1
What if you want an additional regression for both males and females? Because macros are simply text substitutions before commands are run, there are quite a few possibilities to implement this. I would try to do the one which makes the code easiest to read. One possibility is doing it this way:
foreach male_cond in "male == 0" "male == 1" 1 {
    reg unemployed wage educ shock if `male_cond'
}
This will run the following three regressions:
reg unemployed wage educ shock if male == 0
reg unemployed wage educ shock if male == 1
reg unemployed wage educ shock if 1
The last 1 says that the condition is always satisfied. Thus, all observations (including, for example, those with a missing value in the variable male) will be in the last regression.

Now, besides lists of strings and numbers, we can tell foreach to iterate over variables only. This is good for two reasons: (1) you can refer to a group of many variables with just one word, and (2) if we're really interested in iterating over names of variables, we can get something which is usually absent in Stata: an error message if there is no such variable (error messages are definitely underrated. It is true you don't want any of them, but if you misspelled one of the variables' names, you probably want Stata to tell you).
How do we do it?
foreach <loop_macro> of varlist <varlist> {
    ...
}
For example (suppose the following variables exist in the loaded dataset: educ educ_dad educ_mom year1998 year1999 year2000 year2001 year2002):
foreach var_to_sum of varlist educ* year1998-year2002 {
    sum `var_to_sum'
}
The educ* will make the loop go through all variables of which names start with educ. Then, year1998-year2002 will make the loop go through all the variables between year1998 and year2002.

As always, further details are to be found in help foreach
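
Here is a sketch of a loop over a varlist that also uses saved results, along the lines of the "divide by the standard deviation" idea from the intro. It assumes the auto dataset that ships with Stata; the _std suffix for the new variables is my own naming choice:

```stata
sysuse auto, clear

* Create a standardized copy of each variable:
* (value - mean) / standard deviation
foreach v of varlist price mpg weight {
    quietly summarize `v'
    generate `v'_std = (`v' - r(mean)) / r(sd)
}

* Check: each new variable should have mean about 0 and sd 1
summarize price_std mpg_std weight_std
```

If you misspell one of the names in the varlist (say mgp instead of mpg), Stata stops with an error before the loop starts, instead of quietly doing something else.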

if
You are already familiar with the if condition most commands support. This if is meant to limit the execution of the command only to observations for which the condition is satisfied. As we said when we talked about the while loop, sometimes we would like conditions to control how our program flows. Those conditions are a bit different.

Let's do an example. Suppose you want to run the loop above which iterates over different samples: male, female and all. But when you run both males and females in the regression you want to add the male dummy as a regressor (this is sometimes called adding a main effect), or an interaction between the male dummy and a treatment variable. You only need to add those regressors to the "all-sample" iteration (actually you can put the regressors in the male-only and female-only regressions too and Stata will just drop those variables as they are multicollinear with the constant, but let's ignore this for the sake of the example). You can do something like
foreach male_cond in "male == 0" "male == 1" 1 {
    if "`male_cond'" == "1" {
       local add2reg "male maleXshock"
    }
    else {
       local add2reg ""
    }
    reg unemployed wage educ shock `add2reg' if `male_cond'
}
Note that I put double-quotes on both sides of the condition. If I hadn't, the first iteration (and similarly the second) would make the if command look like this:
if male == 0 == 1

Stata would then treat the whole line as a numeric expression involving the variable male (using its value in the first observation only), instead of comparing pieces of text. You didn't want this. You simply wanted to compare the string of the condition to the string "1" (to pick out the last iteration).
This example brings me to another point. Note that we wrote 9 rows of code for a loop that replaces 3 rows of simple regression commands. In many cases, simply writing the original regressions will do the job. In others you might be working in a greater framework, or you want to later add additional subsamples which will make it better to write the loop instead of the regressions themselves. Do your own calculation of whether complicating things with a loop (and inner conditions) is better than simply repeating your commands, however stupid it feels.

For further help (this time I'm going to surprise you), look up help ifcmd.
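
To make the difference between the two kinds of if concrete, here is a short sketch. It assumes the auto dataset that ships with Stata, and the macro detailed is a made-up switch:

```stata
sysuse auto, clear
local detailed = 1          // hypothetical switch you set at the top of a do-file

* if as a command: decides whether a block of code runs at all
if `detailed' == 1 {
    summarize price, detail
}
else {
    summarize price
}

* if as a qualifier: the command always runs, but only on some observations
summarize price if foreign == 1
```

The first if is evaluated once, on the macro. The second one is evaluated for every observation in the dataset.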

Additional issues for loops and conditions:
  • Nested loops - you can write a loop inside a loop. This will make the inner loop run anew for each iteration of the outer loop. This is where the whole thing really starts to pay off, because you can run many regressions and make it pretty readable, enabling easier changes in the specification when you need it. Here's an example
local control_vars "educ_dad educ_mom hh_income grade_5 grade_6"

foreach dep_var of varlist score pass_dummy admitted {
    foreach treatment of varlist hours_tutored tutored_dummy {
       foreach sample in "male == 0" "male == 1" "male == 0 & educ_dad < 12" {
          reg `dep_var' `treatment' `control_vars' if `sample'
       }
    }
}
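  • continue - if you want to skip the rest of the current iteration and move straight on to the next one, you can use the continue command. If you want to exit the loop altogether before it ends naturally (i.e., murder a loop?), use continue, break. Usually either one will appear inside an inner if condition. This is very uncommon, though, and makes the code less readable.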
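
Here is a runnable sketch of both remarks. It assumes the auto dataset that ships with Stata, and the choice of variables and samples is only for illustration:

```stata
sysuse auto, clear
local control_vars "weight length"

* Nested loops: 2 dependent variables x 3 samples = 6 regressions
foreach dep_var of varlist price mpg {
    foreach sample in "foreign == 0" "foreign == 1" "1" {
        display "--- `dep_var', sample: `sample'"
        regress `dep_var' `control_vars' if `sample'
    }
}

* continue skips one iteration; continue, break leaves the loop
forvalues i = 1/10 {
    if mod(`i', 2) == 1 {
        continue            // odd number: go on to the next iteration
    }
    if `i' > 6 {
        continue, break     // past 6: exit the loop for good
    }
    display "i = `i'"
}
```

The second loop displays i = 2, i = 4 and i = 6, and then stops when it reaches 8.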


Summary
So we learned how to define macros and give our regressions a structure with loops (and nested loops). I hope by now you understand how this can contribute to your project. I think the last example - for the nested loop remark - makes it very clear. As we will see in the following steps, loops and macros can help us automate not only the statistical commands, but also how we save the output we're interested in and export it to nice tables (if reading logs of Stata isn't your favorite pastime activity).

27 comments:

Owen Martin said...

Hey Stataman,

This tutorial has been great! I needed to know the style of using Stata in a pinch and your blog was there-- so thanks a lot.

Any chance you'll be doing the final two updates soon?

stataman said...

Hi Owen,

Thanks for the thumbs up. I appreciate it.
I will be doing the final two steps some day. As the winter quarter just started, I'm afraid I won't do it until later this year. Things are pretty stressed here.

Thanks again for the feedback

Owen Martin said...

Hi again,

I'm actually coming up on an interview in which I demonstrate my competency in Stata, for which I would LOVE to be able to show them excellent export abilities for the many regressions I'm generating. If you couldn't do post #7 can you possibly give me a few hints on which commands to check out or send me to another link? Much much appreciated,

Owen.

Michael said...

Stataman: I made it through your material for Step 6 on automation. I am still struggling in understanding how that sort of automation saves significant time in economic and finance research. It seems to me that writing (or copying and pasting with some tweaking) separate Stata commands in a .do file would be as efficient as macros when it comes to running regressions and so forth on typical data sets we see in economics and finance.

If you get a moment, can you provide a few examples where it is significantly more efficient to use macros/loops/etc. instead of using multiple commands in a .do file? The time to learn the syntax of Stata's macros/loops/etc. seems pretty significant. I currently can't see much time-savings. Thank you for your time and your website.

stataman said...

Good question, Michael.

First, let me say that whenever you learn a new language it looks at first like a waste of time, or even if not, it takes a lot of time and energy. But once you become more acquainted and experienced with the language, things are much faster and more efficient.

Now, even after you will get to know the language, it won't always be better to write things generically and some times just copying a list of commands will do the job. Especially when you already experimented with the command line and got the job done.

However, in many papers you see tables of regressions, for example, or of other statistics, and maybe some figures too. When the program gets big - in terms of running time or in terms of the lists of variables you use in your analysis and the number of commands you write - it becomes easier to just write a loop that will do the job. You can still do everything command after command, but it will make your code long, hard to edit and much more time-consuming.
Suppose now you want to do the whole table with a new control variable you didn't think of when you first ran the regressions. If you didn't write a loop, then you need to go over the lines and add the variable's name to each line. However, if you had a loop for all of the regressions, all you need to do is to add it to one place. It will be clearer, I hope, once I will find time to write the remaining steps here, but even copying your numbers to a table can be done automatically.

Another place where you must use loops is Monte Carlo and other simulations. If you want to simulate samples of some random variable, you need to repeat it many times to learn more about your estimator (or other statistics you have in mind).

Lastly, knowing the syntax of the language allows you to understand what some commands you find in Stata are actually doing. You can find those commands in the "ado" subfolder of where Stata is installed. You can then extend them if you don't have the command. For example, I think only the upcoming version of Stata 11 is going to systematically deal with GMM. However, you can think of ways to do, say, nonlinear IV regressions in earlier versions of Stata.

I hope this gives you enough examples but, you know, it's a free country. If you don't buy it - ignore it. From my experience, the ability to generalize and automate my commands made me do stuff thrice as fast, and therefore freed up time to do more analysis.

Michael said...

Stataman: I want to post a follow up to my message of a few weeks ago. After reading your tutorial on automation, I've been thinking how I might incorporate those ideas into my own work with Stata. I managed to recognize a couple of opportunities to use automation. But I needed to learn more about the syntax, application, etc. I found Larry Hamilton's text, "Statistics with Stata" in chapter 14 a very good follow up to your excellent primer. Reading your material is much easier than any text I think and it gave me a start in the new area (for me) of automation. Your comments about automation syntax being a language is spot on. It comes along with practice. Thanks.

I'll be checking back to your website to see if you manage to write the final two sections. We appreciate your time in putting this together. It's made me more proficient for my doctorate research.

stataman said...

Hi Michael,

Thanks! I'm glad you found this helpful.

Roy

marthe said...

Hi Stataman,

thanks a lot for your sooooo usefull contribution. It is very clear and interesting and I am deadly looking forward to reading the next steps ! Using the output results and making nice tables might be sometimes pretty tricky, particularly with ecommands...
hope you will find some time !
cheers

Marie G

Jonathan said...

Your tutorial is excellent! I come from a programming background and this has brought me up to speed. One thing I am sorely lacking in understanding is how to build tables. Did you ever get around to writing that next step in the tutorial? If not, any example code you could send me? Thanks again.

stataman said...

Actually, yes. I prepared some handouts for a short Stata course I gave. Will post a link in the main site soon.

thought monger said...

thank you for this blog. i have often found it helpful.

i just want to say that i'm disappointed by the title of this post (and some of its context). there are many skilled female stata users out there, including myself. saying that automating the code separates the "men from the boys" excludes women. why not just separate the "great coders from the good ones"?

i know this is a touchy issue, and i appreciate your thoughtfulness on the topic.

Long said...

Hi Stataman

I have problem with local command. I use Stata 11/MP.
I have a dofile like this:
sysuse auto, clear
regress mpg weight
local rsqf e(r2)
local rsqv = e(r2)
di 'rsqf' // this has the current R-squared
di 'rsqv' // as does this
regress mpg weight foreign
di 'rsqf' // the formula has the new R-squared
di 'rsqv' // this guy has the old one

And the result is here:
sysuse auto, clear
(1978 Automobile Data)

. regress mpg weight

      Source |       SS       df       MS              Number of obs =      74
-------------+------------------------------           F(  1,    72) =  134.62
       Model |   1591.9902     1   1591.9902           Prob > F      =  0.0000
    Residual |  851.469256    72  11.8259619           R-squared     =  0.6515
-------------+------------------------------           Adj R-squared =  0.6467
       Total |  2443.45946    73  33.4720474           Root MSE      =  3.4389

------------------------------------------------------------------------------
         mpg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      weight |  -.0060087   .0005179   -11.60   0.000    -.0070411   -.0049763
       _cons |   39.44028   1.614003    24.44   0.000     36.22283    42.65774
------------------------------------------------------------------------------

. local rsqf e(r2)

. local rsqv = e(r2)

. di 'rsqf' // this has the current R-squared
'rsqf' invalid name
r(198);

end of do-file

r(198);

Why it says: 'rsqf' invalid name

Please help!

stataman said...

Hi,

You are not using the back-tick for your local when you call it, you are using the regular tick. The back-tick on my keyboard is in the top-left corner of the main set of keys (to the left of the 1 key).
do:
di `rsqv'
and not:
di 'rsqv'

Long said...

Thank you very much

Long said...

Hi Stataman,

Could you please guide me how to export the results from Stata into Excel file format.

For example, after I run the "summarize" command and I want to save the result into a Excel file. Please help !!!!!!!!!!!!!!!!!!

stataman said...

Hi,

You may find additional automation tips and tricks in my personal webpage, for a short course that I co-taught. Look for the third lecture notes file of Stata:

http://www.stanford.edu/~roymill/cgi-bin/computing/material.php

Long said...

Thank you very much!

rec.bicycle said...

Thank you, very helpful!

sheetal said...

hi,

blog very useful for someone like me (scared of stata and just want to get done with it soon...)

q: i need to create an indicator variable such that there are ten 1s only (out of 19 observations). this is going to be part of a loop, where i'll need stata to come up with different combinations of 1 and 0 distributions... and i'll need it to do other stuff with this variable... but i'll get my doubts clarified one at a time...

thanks
sheetal

Niyati Sharma said...

HELP needed
My data is the exicison of skin cancers from patients over 4 1/2 year period. Many people have multiple events (max is 120) and I want to count number of lesions if the cancers are <=90days apart. If they are then I want to also delete that record. I have started working through the permutations but if anyone can suggest how i can loop this?
doservice = date lesion taken out
numsyn1 = number of skin lesions taken out on that day.
For example if person had 2 events, I would look at difference between 1-2
if person had 3 events, I would first look at 1-3, then 1-2 then 2-3.
if person had 4 events, I would look at 1-4, then 1-3, then 1-2 then 2-4, then 2-3 then 3-4

(starting to get combinations formation occuring)

Please see below.
//1-4
bysort id (event) : replace numsyn1 = numsyn1 + numsyn1[_n+3] if (doservice[_n+3] - doservice[1]<=90)
drop if event==4 & (doservice - doservice[_n-3]<=90)
//for 3 events
//1-3
bysort id (event) : replace numsyn1 = numsyn1 + numsyn1[_n+2] if (doservice[_n+2] - doservice[1]<=90)
drop if event==3 & (doservice - doservice[_n-2]<=90)
//for 2 events
//1-2
bysort id (event) : replace numsyn1 = numsyn1 + numsyn1[_n+1] if (doservice[_n+1] - doservice[1]<=90)
drop if event==2 & (doservice - doservice[_n-1]<=90)

//SECOND LOOP
//for 4 events
//2-4
bysort id (event) : replace numsyn1 = numsyn1 + numsyn1[_n+2] if event==2 & (doservice[_n+2] - doservice<=90)
drop if event==4 & (doservice - doservice[_n-2]<=90)
//for 3 events
//2-3
bysort id (event) : replace numsyn1 = numsyn1 + numsyn1[_n+1] if event==2 & (doservice[_n+1] - doservice<=90)
drop if event==3 & (doservice - doservice[_n-1]<=90)

//THIRD LOOP
//for 4 events
//3-4
bysort id (event) : replace numsyn1 = numsyn1 + numsyn1[_n+1] if event==3 & (doservice[_n+1] - doservice[1]<=90)
drop if event==4 & (doservice - doservice[_n-1]<=90)

id  doservice  survday  failure  numsyn1  event  repeat  newstopdate  maxevent
 9  20-Feb-08       41        1        1      1       1    30-Jun-08         2
 9   1-Apr-08       90        0        1      2       0    30-Jun-08         2
14   1-Jun-04       14        1        1      1       1    30-Jun-08         2
14  15-Jun-04     1476        0        1      2       0    30-Jun-08         2
15   6-Jun-06      268        1        1      1       0     1-Mar-07         3
15   1-Mar-07      257        1        1      2       0    13-Nov-07         3
15  13-Nov-07      230        0        1      3       0    30-Jun-08         3
16  20-Jan-04      497        1        1      1       0    31-May-05         4
16  31-May-05        7        1        1      2       1    16-Jun-05         4
16   7-Jun-05        9        1        1      3       1    30-Jun-08         4
16  16-Jun-05     1110        0        1      4       0    30-Jun-08         4

Regards
Niyati

mahmoud aymo said...

Hey Stataman,

This tutorial has been very useful, thank you so much for sharing.

I have a quesiton, My data set is panel data on daily basis. I would like to run daily regression for each quarter and extract the estimates to have a quarterly time series of estimates (coef, tstat, rsquare and stand error). but my problem is Not all my dependent var or id available in all quarters, some of them are present at the beginning of sample and some only at the end.

when I run my loop to run the regression it gives me the following error message:

no observations found or no variable defined.

how can I solve this problem?

Best,

Mahmoud

Unknown said...

Thank you very much for all of your advises!! They have been really helpful to my work! I just wanted to make one remark in the kindest way possible: I believe that the title of the post is quite sexist. Like, if only 'men' use this kind of commands, which is the kind that use 'women' or 'girls'? Why don't you title it: separating adults' commands from children's! That would be totally gender-neutral! Regards!

Tran Phuong said...

Dear Stataman,
Thank you so much for your tutorial.However I tried to apply some of the Stata codes in your blog for my analysis, but it didn't work.

My dataset contains a set of 4 dependent variables which were cost1, cost2, cost3, cost4, and a set of 4 explanatory variables such as country, company, market, province. My aim is to run regression models to find out the significant explanatory variables. Basically I want the full model and the reduced models for each of the costs which were as followings:

Cost 1 <- a + b1country + b2company + b3market + b4province
Cost 1<-a + b1country + b2company + b3market
Cost 1<- a + b1 country + b2company
Cost1<-a+b1country

Cost 2 <- a + b1country + b2company + b3market + b4province
Cost 2<-a + b1country + b2company + b3market
Cost 2<- a + b1 country + b2company
Cost2<-a+b1country

and so on.....to Cost n

Would you please help me with the loops code as it would take ages to run each regression individually?

Thank you so much! Fong
