The Stata Project-Oriented Guide: Introduction

Introduction

This blog is a free Stata tutorial. I have been using Stata for the last two years now for different applied work in economics and other fields of the social sciences. If you are in your undergraduate or graduate studies or if you are working for some agency that performs social research, you will probably need to use Stata in the context of your project. Stata has an extensive manual which is very accessible, in my opinion, but in order to know how to use it, one needs to already know the commands' names.

However, if you are new to Stata and you have a project to do, there is a sequence of actions you will probably need to go through. This tutorial is constructed to follow that sequence: data assembly first, and then construction of additional variables. I deliberately skip the commands that perform statistical analyses and leave those to your statistics or econometrics courses. The second part of the tutorial (steps #5-#8) is dedicated to automating those commands and to creating the tables that report the results. Along the way, there are best practices for writing code that is easy to follow and to change if needed.

I am assuming the reader has basic knowledge of Econometrics (regressions etc.) and I will not get into issues of how to specify an appropriate model. I will concentrate, though, on the practical steps one needs to do before and after the regressions, and how to organize the code so as to minimize mistakes.

The tutorial is divided into steps. You might not need to go through all of them, so feel free to move on if a step is irrelevant for you. You can also navigate through the tutorial with the labels on the left bar. They consist of keywords (like an index) and step numbers (like a table of contents).

The steps are as follows (keep in mind that the tutorial is still under construction):

Step #1: Getting the Data - if your data isn't in .dta format - Excel as an example
Step #2: Combine Multiple Datasets into One - for datasets already in .dta format
Step #3: Simple Data Manipulation - generate variables, change values and drop variables or observations
Step #4: Thank God for the egen Command - a very powerful command that extends the possibilities of data manipulation.
Step #5: Keeping commands' calculations - How to tell your program to use the output from reg, sum, etc.
Step #6: Automation - macros, loops, and other sorts of fun

Step #7: Exporting Results to a Spreadsheet - Excel as an example

Step #8: Program Definition - if you start to see the same code in many .do files, maybe you should read this step.
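
To give a flavor of what steps #5 and #6 are about, here is a minimal sketch that uses the auto dataset shipped with Stata (my choice of example data, not something the tutorial itself relies on). It loops over a few variables, picks up the mean that summarize leaves behind in r(mean), and then reads the R-squared that regress stores in e(r2).

```stata
* Example data that ships with Stata (an assumption made for illustration)
sysuse auto, clear

* Step #6 flavor: loop over a list of variables
foreach var of varlist price mpg weight {
    * Step #5 flavor: summarize stores its results in r()
    quietly summarize `var'
    display "`var': mean = " %9.2f r(mean)
}

* Estimation commands such as regress store their results in e()
quietly regress price mpg weight
display "R-squared = " %5.3f e(r2)
```

Typing "return list" after summarize, or "ereturn list" after regress, shows everything the command saved.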

I did not find time to fill in steps #7 and #8 here. For step #7, and other good things, I prepared some slides for a short Stata sequence I gave at the department. You can find them here.

Good luck!

210 comments:


Katrinayellow said...: hey stata man!
I have a question. I have 4 samples, each of which use different years. So of course I have to repeat my code 4 times cos I'm not smart enough to figure out how to use a list of years or something.

I repeat this code four times, just using different years each time:

use sample1_1999.dta, clear
foreach year in 2000 2001 2002 {
append using sample1_`year'.dta
}
foreach year in 1999 2000 2001 2002 {
gen year`year' = (year==`year')
}
egen count_year = count(year), by(zehut)
unique zehut
tab count_year
drop if count_year > 1
unique zehut
save stack_sample1.dta, replace

I know I need to make some sort of macro list, and then loop through the list, but I don't know quite how. also I have to use the first year each time to start the stacked file, but then I don't append it, so I have to somehow get rid of that first element of the years list, or just add it in twice and then not worry because in any case I get rid of the duplicates later on.

any ideas?; December 10, 2007 at 2:01 PM
stataman said...: Hi Katherine,

If I understand correctly, you need to change the list of years every time.
To do this, define four local macros:

local sample1_years "1999 2000 2001 2002"
local sample2_years "1998 2002 2003"
local sample3_years "..."
local sample4_years "..."

foreach sample in sample1 sample2 sample3 sample4 {
   local first = 1
   foreach year in ``sample'_years' {
      if `first' {
         use `sample'_`year', clear
         local first = 0
      }
      else {
         append using `sample'_`year'
      }
   }

   // The following command creates one dummy per year. With prefix(year),
   // xi names them yearyear_1999, yearyear_2000, etc. (prefix + variable name + value)
   xi, prefix(year) noomit i.year

   egen count_year = count(year), by(zehut)
   unique zehut
   tab count_year
   drop if count_year > 1
   unique zehut
   save stack_`sample'.dta, replace
}; December 11, 2007 at 10:25 AM
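
On the question of how to "get rid of that first element" of a list of years: one possible alternative to the first-time flag is gettoken, which splits a macro into its first word and the rest. The sketch below is only an illustration. It first writes four tiny stand-in files into the working directory (in the thread the real sample1_`year'.dta files already exist), and the contents of those files are made up.

```stata
* Build four tiny stand-in files so the sketch runs on its own.
* (Hypothetical contents; in the thread these files already exist.)
foreach year in 1999 2000 2001 2002 {
    clear
    set obs 2
    gen zehut = _n + 10 * (`year' - 1999)
    gen year = `year'
    save sample1_`year'.dta, replace
}

local years "1999 2000 2001 2002"

* gettoken puts the first word in `first' and what is left in `rest'
gettoken first rest : years
display "first year: `first'"
display "remaining: `rest'"

* Start the stack with the first year, then append the others
use sample1_`first'.dta, clear
foreach year of local rest {
    append using sample1_`year'.dta
}
tab year
```

The same three lines (local, gettoken, foreach) can sit inside a loop over samples, with a different list of years for each sample.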
Anonymous said...: Hi, I am using stata to analyse a discrete choice experiment. I am using the clogit function and I am totally stuck!
I have got my main variable effects by this:
clogit y x11 x12 x21 x31 x32 x41,group (setid)
I now need to get the effects for just the males ect...
I rally do not know what to do.
Please help me!!; April 15, 2008 at 8:09 PM
stataman said...: Hi Keith,

I'm not sure I understand exactly where the males come into the regression, but I am assuming you need to run this regressions for males only? Or do you want to run the regression for all observations and allow just some of the variables to have a different effect for males and females?

If you want to run the regression just for males, add the if option after the command:

clogit y x11 x12 x21 x31 x32 x41 if male==1,group (setid)

If you are looking for the second option, you will need to interact the variables you want to allow the effect to vary between males and females. For example, if you want to see how the effect of x11 changes between males and females (while the rest of the variables have the same effect on both), you should run:

gen male_x11 = male * x11
clogit y x11 male_x11 male x12 x21 x31 x32 x41, group (setid)

The effect of x11 for females will be the coefficient reported for x11, and the effect of x11 for males will be the sum of the x11 coefficient and the male_x11 coefficient.

I hope that answers your question.; April 15, 2008 at 8:43 PM
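
If you want Stata to add the two coefficients for you and also report a standard error for the sum, lincom can do it right after the estimation command. The sketch below shows the idea with regress on the auto dataset that ships with Stata, purely so that it runs as is. The variable names are stand-ins (foreign plays the role of male, weight the role of x11), and the same lincom line should work after the clogit command in the thread.

```stata
* Stand-in data: foreign plays the role of male, weight the role of x11
sysuse auto, clear

* Interaction term, built the same way as male_x11 in the thread
gen foreign_weight = foreign * weight

regress price weight foreign_weight foreign

* Effect of weight for the foreign == 1 group:
* the main coefficient plus the interaction coefficient
lincom weight + foreign_weight
```

After the clogit model in the thread, the analogous call would be "lincom x11 + male_x11".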
Galia said...: Hello Stataman :)

I love your Stata Blog, it look like the clearest, i.e. best guide i could find on the net. So firstly a big thank you!

I am having one small problem however.... I am using Stata on a Mac computer. I am trying to load an excel file onto Stata and therefore have been reading your guide for that. The problem is that i cannot input the 'location' of my file to type it into the command window... The only location name my Mac gives me doesn't seem to be working when i type it in exactly as you describe.

I wonder if it would be easier, instead of typing the precise commands into the commands window, you know how to go about doing it in the more tedious way? i.e. through the panel at the top? i.e. 'file', 'open' etc. ????

The deadline for my project is in a few days, i would be extremely grateful if you could reply before then! Fingers crossed,

Galia; January 5, 2009 at 11:16 PM
stataman said...: Thanks Galia!
I have got to say I have no experience with Mac whatsoever. I worked with UNIX, some Linux and mainly Windows, but never on a Mac, so I don't know even how files are saved and referred to in Mac. The best I could find was here: http://www.macworld.com/article/57685/2007/05/copyfilepath.html (maybe this will help you to get the location of the file).

There are other ways to get the data. I'm assuming you wanted the location (or path) of the file for the insheet command, but you can try to use StatTransfer if you have it (the best way that I know of), or you can simply write "edit" in the command line, then open the Excel file, copy the columns you need and paste it into the data editor that Stata opened after you entered "edit". In Windows the paste action is done by clicking Ctrl-V. I understand that the Mac counterpart is Cmd-V.

Does that help in any way?; January 5, 2009 at 11:48 PM
Galia said...: Hello,

Yes, I have now solved the problem, thank you. :D I decided to try pasting the data and it worked very well just as you described. So I will stick to that method since finding the location of the file seems a little complicated!

Looking forward to reading the next of your articles now :); January 6, 2009 at 8:42 PM
Odovakar said...: Hi Stataman,

I am new to Stata (have been using Eviews earlier), and despite your excellent blog I'm still facing some basic trouble getting my panel data right into the editor. I have a "normal" panel so to speak, with "Country" in the first column, "Year" in the second, and then the variables. When I try to get it into Stata from Excel, either by using StatTransfer or by Copy/Paste, "Country" is all the time being treated as a string variable, no matter what I do, i.e. the panel can't be estimated. "Year" is being treated as 'Int'. What am I doing wrong?; March 12, 2009 at 3:11 PM
stataman said...: Hi Odovakar,

I don't know how your country is coded. If it's coded like "ARG", "USA", etc, then there is no other choice but to treat the original country variable as a string. If there is a number code that StatTransfer fails to code into a numeric Stata variable, then look again at your Excel - there is probably a tiny green triangle in the top-right corner of each cell that says that the numbers are treated as strings. You can click on the green triangle and tell Excel to treat them as numbers.

But, in any case, even if the country is coded as string, and you want to give it a numeric code, you can use the following command in Stata (assuming the original country variable is named "country"):

egen country_code = group(country)

And then you have a code for each distinct string stored under country.

Good luck; March 12, 2009 at 5:06 PM
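
A small sketch of the string-to-code step, on made-up data (the country codes and years are invented for illustration). It shows egen group() next to encode, which does a similar job but also attaches the original strings as value labels, so tables still show "ARG" and "USA".

```stata
* Made-up panel with a string country identifier
clear
input str3 country int year
"ARG" 2000
"ARG" 2001
"USA" 2000
"USA" 2001
end

* Numeric code 1, 2, ... for each distinct string
egen country_code = group(country)

* Same idea, but the numbers carry the strings as value labels
encode country, generate(country_id)

* Either numeric variable can be used to declare the panel
xtset country_id year
list, sepby(country)
```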
snowtree said...: hey.. this is a great help. your blog inspired a breakthrough for me, over something that had me blocked for a week. thanks man!; March 19, 2009 at 7:43 PM
stataman said...: Cool! More than happy to help.; March 19, 2009 at 7:57 PM
Odovakar said...: Hi again Stataman,

Thank you very much for your previous help - I immediately got it right! You are the best.

Now, a rude leecher as I am (as well as a Stata newbie...), I have to ask you once again for some help. I have in vain for several days now been reading everywhere on the Internet and in manuals, and have been experimenting with commands such as vif, xtserial, hettest and xtgls, but without any luck. All I want to do is to test my panel models for autoregression, multicollinearity and heteroscedasticity. Please tell me simply and generally: How do I do this? Nothing seems to work and I get different kinds of error messages all the time.

I have estimated a few log-log models with fixed effects, and one with random effects as well.

Also: Is there any simple command available in Stata with which to estimate an EGLS panel model, such as the settings that are available in Eviews?

All the best to you!; March 25, 2009 at 10:13 PM
stataman said...: Pfew.... Sorry for this huge delay in answer, but first year PhD is tough here.

In any case, it would have been more helpful if you could copy & paste the errors you got.

To tell you the truth, I didn't have the chance to use any of the commands you mentioned, but I experimented with some now and hopefully I can help with some of them.

1. xtserial - you first need to specify what your panel looks like with the tsset command. For example, suppose I have a panel of years and countries. Then I should first run:

tsset country year

and then:

xtserial <variable>

where variable is what you want to check autocorrelation in.

A note about tsset. If you have the panel defined by either a string variable or more than one variable (country, province, county, town) then to have one numeric variable with a code for each of the combinations you can run:

egen <varname> = group(<varlist>)

where varname is the name of the variable you want to keep the code in, and varlist is something like "country province county town".

Then you run:

tsset <varname> year

2. xtgls In xtgls you just need to specify the group-identifying-variable (country in our case) with the i() option:

xtgls agri_gdp rainfall, i(country)

3. EGLS? I never heard of EGLS. I know FGLS and you have several ways to implement it. I guess xtgls implements one of them.

Again, sorry for the long delay.; May 10, 2009 at 8:13 PM
marie said...: Hi!
I've tried to use xtserial to test for serialcorrelation in a paneldataset, but I get an error: r(2000) no observations. All the variables are either float or byte. Can you help me?; June 7, 2009 at 3:04 PM
stataman said...: hmmm... pretty hard for me to tell. If you're working on a panel data, make sure you do the tsset before the command with both the repeated-observation identifying variable and the time variable.

This is what I did to see if the command works:

set mem 20m
set obs 20000
gen year = 2000 + mod(_n, 3)
gen x = mod(_n,6667)
drawnorm u
tsset x year
xtserial u

Then I got:

Wooldridge test for autocorrelation in panel data
H0: no first order autocorrelation
F( 1, 6665) = 0.108
Prob > F = 0.7427

(as expected, as I drew u regardless of years)

Try to copy the list of commands above and see how the structure of the dataset is different than yours.
In general, the "no observations" error comes up in two cases: (1) there are actually no observations in the data (2) there are observations, but they have missing values (the annoying "."). Maybe one of the years has missing values throughout, even though other years are ok. Make sure all years have non-missing values (at least some - probably more than one, but no need to have them all full).; June 7, 2009 at 6:47 PM
Marleen @ DC said...: hi Stataman and Marie,

I was going to write that I had the same problem of "no observations" after the xtserial command.. However I solved it with the advice that there should be no missing data..

Although I have no missing data in my years, I have gaps (1981, 1984, 1987...). Just by creating another time variable (1,2,3...) the problem was solved =)

So thanx a lot Stataman and good luck with the serial correlation tests.; June 17, 2009 at 12:07 PM
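
For readers with the same kind of gaps, here is a sketch of two ways to handle evenly spaced years such as 1981, 1984, 1987. The data are invented for illustration. The first option builds the consecutive counter Marleen describes; the second keeps the calendar year and tells Stata that one period is three years long.

```stata
* Invented panel observed every third year
clear
input id year y
1 1981 .5
1 1984 .7
1 1987 .2
2 1981 .1
2 1984 .9
2 1987 .4
end

* Option 1: consecutive period counter (1, 2, 3, ...) from the distinct years
egen period = group(year)
xtset id period

* Option 2: keep the calendar year and declare the spacing between periods
xtset id year, delta(3)
```

Option 1 also works when the spacing is irregular, but then "one period back" no longer means a fixed number of years, so lags should be read with care.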
ellewelle said...: Hey,

I am trying to do a factor analysis with Stata but I keep on receiving a r(2000)- no observation- error message. Does anybody have an idea what goes wrong here?

Elena; July 2, 2009 at 6:11 PM
stataman said...: I haven't done much factor analysis in the past. My first two guesses would be to either check that you have nonmissing values for the varlist you give or check that your conditions (if you have any) match at least some observations (and that the value of the variables for those observations specifically is nonmissing). To check that your condition is ok, you can do:

sum varlist if <condition>

If you get 0 in the number of observations, you should check your condition again.

Any other suggestions are, as always, welcome.; July 2, 2009 at 6:34 PM
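
A quick way to see how many observations a command will actually have to work with is to count the rows where all the variables are nonmissing and the condition holds. The sketch uses the auto dataset that ships with Stata as a stand-in; the variables and the condition are only examples.

```stata
* Stand-in data; rep78 has a few missing values
sysuse auto, clear

* How many observations are missing rep78?
count if missing(rep78)

* How many observations have all three variables AND satisfy the condition?
* If this is 0, a command using them will stop with r(2000).
count if !missing(price, mpg, rep78) & foreign == 1

* Overview of missing values per variable (Stata 11 or later)
misstable summarize price mpg rep78
```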
mac said...: This comment has been removed by the author.; July 7, 2009 at 6:16 PM
stataman said...: Delete your comment? I never deleted any comment... I don't know, perhaps a blogspot bug.

Maybe you should write it again?; July 7, 2009 at 6:18 PM
mac said...: Elena: Make sure your variables are defined as numeric. If they are defined as 'string,' one option is to -destring-

Everyone: Stata has a discussion board that is another resource for technical questions about Stata.; July 7, 2009 at 6:25 PM
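
A minimal sketch of the destring suggestion, on made-up values. The variable name score and its contents are hypothetical. The force option turns anything that cannot be read as a number into a missing value, so it is worth counting the missings afterwards.

```stata
* Made-up string variable holding mostly numbers
clear
input str5 score
"12"
"7.5"
"n/a"
end

* Convert to numeric; non-numeric entries such as "n/a" become missing
destring score, generate(score_num) force

count if missing(score_num)
list
```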
ken1088 said...: hello!
i have a question. i used the xtserial command to determine autocorrelation on my panel data but i'm getting the no observations error. how can i correct this? there were no missing values in my data and i did not put any conditions. i have also tried destringing my data.; August 2, 2009 at 1:14 PM
mac said...: Ken1088:
Did you define your data as time series?
-tsset-; August 4, 2009 at 5:20 AM
Monalisa said...: I have been using STATA to develop a logit model. The variables in my model are all binomial.Could you please suggest how I show the regression fitness graphically? Is it at all possible?; August 6, 2009 at 6:54 PM
stataman said...: Well, you can graph something, but it will not give you extra information. Remember, the binary dependent variables models, such as probit or logit, will give you an estimate of the probability to get 1 in the dependent variable given your independent variables. If your independent variable is also binary then you can plot a graph that connects between the dots: (0,Pr[y=1|x=0]) and (1,Pr[y=1|x=1]). As you can see, in this case you can simply report those probabilities in your text or table.

I hope that answers the question.; August 6, 2009 at 7:09 PM
stataman said...: ken1088, sorry but without seeing the data I can only guess. I also learned that there is an illegal copy of Stata10 going around that does funky stuff - loses variables, drops observations, I don't know.

I'm not saying you are using an illegal copy, but in case you do, make sure you get a legal one.; August 6, 2009 at 7:13 PM
Awesome0 said...: Stataman, have a question for you. I am trying to create a local varlist that contains all of my variables
such as
local varlist1 v1 v2 v3 v4

and then at times be able to remove a variable or two for instance

local varlist2 v2 v4

The code I have been trying to use is

local varlist1 v1 v2 v3 v4
local varlist2 v2 v3
local varlist: list varlist1 - varlist2

but it doesn't some seem to work. I can't find any examples despite a couple hours of googling.; November 2, 2009 at 10:40 PM
stataman said...: Hi Benjamin,

To make things generalized I would need to know more of which variables are you trying to remove. Is there any criterion for them? Maybe construct two lists - one for each subset of variables - and combine them when you need all.

In short, it depends on what you're trying to do.

In general, if you need to put a varlist into a local (perhaps there's a direct command for it, but if there isn't:) you can fill it with a loop:

local myVarlist ""
foreach var of varlist v1-v3 v4-v8 {
local myVarlist "`myVarlist' `var'"
}

will do the job.; November 2, 2009 at 11:42 PM
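
For what it is worth, the macro list syntax in the question is valid Stata, so it should work when all the lines are run together. One possible cause of trouble (a guess, since the error was not posted) is running the lines one at a time from the do-file editor: local macros disappear when each run ends. The sketch below uses the auto dataset that ships with Stata, with variable names chosen only for illustration, and also shows unab, which expands a varlist into a macro without a loop.

```stata
sysuse auto, clear

* Full list, and the variables to take out of it
local all_vars  price mpg weight length
local drop_vars mpg length

* List subtraction: everything in all_vars that is not in drop_vars
local keep_vars : list all_vars - drop_vars
display "`keep_vars'"
summarize `keep_vars'

* unab expands a varlist (ranges, wildcards) into a macro
unab expanded : price-weight
display "`expanded'"
```

Run the whole block in one go so that the locals are still defined when they are used.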
Victoria said...: Hi, I just discovered your blog! I'm trying to run a fixed effects model on cross sectional data (not panel data), but every time I run the model, I get this error:

. xtreg ratio_arriv gdp_origin hdi_origin gini_origin empire_o perc_mfg_o cul_sit
> e_o travel_o embassies region_d region_o pop_mill_o gdp_dest hdi_dest gini_dest
> empire_d perc_mfg_d cul_site_d travel_d pop_mill_d empire_same region_same, fe
must specify panelvar; use xtset
r(459);

Why is this and how can I "fix" it?

Thanks!; March 9, 2010 at 4:18 PM
Lim said...: Hey Stata man,

I have a panel data with "ethnicity" coded for only one year of the individual id but missing for the other 3 years. How do I fill in the missing values with the observed value for ethnicity?; March 10, 2010 at 5:43 AM
Unknown said...: Hi Stata man!!!!
I am working on a document for the central bank of Colombia. I am trying to estimate the following model:

. xi: xtpcse endcp1 vario crec tam end1 z pi diftasa pib z2 i. nit i. ao, correlation(ar1)

or

. xi: xtgls endcp1 vario crec tam end1 z pi diftasa pib z2 i. nit i. ao, panels (correlated) corr(ar1)

but I get this error:
varlist required
r(100);

could you help me please?

thanks

Luisa; April 3, 2010 at 7:11 AM
stataman said...: Victoria, to run fixed effects you must specify which variable has the group code for the groups you want to run fixed effects for. Suppose it is a country code (it should be numeric; if you have a string code like a three-letter code, run "egen country_num = group(country)").
After you have the country_num as the variable that contains each country's distinct code, you can run fixed effects in two ways:

1. Directly, with an i(.) argument:

xtreg dep_var indep_vars, fe i(country_num)

2. Indirectly, by specifying the panel's structure.

xtset country_num

Or if there's a time variable too, like year:

xtset country_num year

see "help xtset" for more details on the second approach.

-=-=-=-=-=-=

Lim, I think I have something in the egen chapter about how to populate a value. It's pretty simple. Suppose the individual id is stored in a variable named indiv_id:

egen ethnicity_temp = max(ethnicity), by(indiv_id)

replace ethnicity = ethnicity_temp

drop ethnicity_temp
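
Another way to do the same fill, sketched on invented data. It assumes ethnicity is numeric and observed at most once per person (or always with the same value). Sorting by ethnicity within each person puts the nonmissing value first, because missing values sort last, and then every missing row copies it.

```stata
* Invented panel: ethnicity observed in only one year per person
clear
input indiv_id year ethnicity
1 2001 .
1 2002 3
1 2003 .
2 2001 1
2 2002 .
2 2003 .
end

* Within each person, sort so the nonmissing value comes first,
* then copy it into the rows where ethnicity is missing
bysort indiv_id (ethnicity): replace ethnicity = ethnicity[1] if missing(ethnicity)

sort indiv_id year
list, sepby(indiv_id)
```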

-=-=-=-=-=-=-=-=-=-

Luisa,

Try to avoid using the space bar after the "i." Stata reads commands in words, so if you do "i. variable" it reads the i. separately and expects a variable name to come right after the dot.

Try running:

xi: xtpcse endcp1 vario crec tam end1 z pi diftasa pib z2 i.nit i.ao, correlation(ar1)

-=-=-=-=-=-; April 3, 2010 at 7:05 PM
Unknown said...: Thanks for answer!!

I tried to estimate the model again with your recommendation but I get another error:

. xi: xtpcse endcp1 vario crec tam end1 z pi diftasa pib z2 i.nit i.ao, correlation(ar1)
no room to add more variables
An attempt was made to add a variable that would have resulted in more than 5000 or 4999 variables (Stata reserves
one variable for its own use). You have the following alternatives:

1. Drop some variables; see help drop.

2. If you are using Stata/SE, increase maxvar; see help maxvar.
r(900);

I drop the variables that I am not using and I set the maximum variables possible (5000)

What else can I do?

Thanks!!; April 3, 2010 at 9:06 PM
stataman said...: Well, it looks like you have too many categories in your fixed effects (total possible values in the nit and ao variables is too high). This poses a computational problem for Stata.

As for a solution, I am not familiar with the command you are using, neither am I familiar with the dataset and the econometric model.

If your data is truly a panel, maybe nit or ao are the variables that define the group? Try to use xtset to define the dataset as a panel and get rid of the explicit use of the i.var coefficients.

Sorry I can't help more.; April 3, 2010 at 9:15 PM
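
To illustrate what "getting rid of the explicit dummies" can look like, here is a sketch for the plain linear case only. It is not a replacement for xtpcse with AR(1) errors. On the auto dataset that ships with Stata (a stand-in; rep78 plays the role of the group variable), areg with absorb() gives the same slope as the regression with one dummy per category, without creating the dummy variables.

```stata
sysuse auto, clear

* Version with explicit dummies: xi creates one variable per category
xi: regress price weight i.rep78
scalar b_dummies = _b[weight]

* Version that absorbs the categories: no dummy variables are created
areg price weight, absorb(rep78)
scalar b_absorb = _b[weight]

* The two slopes should agree up to rounding
display b_dummies - b_absorb
```

With panel data declared through xtset, "xtreg depvar indepvars, fe" plays the same role for the panel identifier.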
Eric said...: Hi Stataman,
I am working on panel data with missing observations. When I declare the dataset as panel data using xtset, the data is described as strongly balanced, which should not be the case. Secondly, I'm trying to fill in the missing data by using the tsfill, full command but nothing really happens: the missing data is not filled.

Please suggest!

Thanks!
Eric; April 21, 2010 at 1:34 PM
stataman said...: Sorry, Eric. I can't really tell why it says it's balanced without seeing the data. My guess is that your missing values are not in the variables that contain the group and individual code. But then again, that's just a guess.

Never used tsfill before. Sorry.; April 21, 2010 at 7:20 PM
Eric said...: Hi Stataman,

Thanks! You are right and I realised that I dont require to use tsfill.

Cheers!
Eric; April 22, 2010 at 4:22 AM
Eva said...: Hey!
So, you'll probably think this is a ridiculous question, but I haven't been able to input data. I downloaded .dat and .do files from ipums then ran the do file in stata. The file ran and the variables appeared, but when I went to data editor, the variables were listed, but no values. I also tried to use tabulate but it said no observations. Help please! Thanks!!; May 2, 2010 at 8:40 AM
Unknown said...: Hey StataMan

I must say thank you for the Guide.

But i have this task at hand - i have two datasets (panel Data) where 2009 data set is a follow up survey. So what i want to do is only select the HouseHolds (HH) in 2007 data set that are also in 2009 dataset. The unique variable is HH id. Which is the best way around this assignment?; July 6, 2010 at 9:50 AM
stataman said...: Hi Kristine,

Sorry for the delay. I had some personal stuff to attend to.

What you can do is use the following commands (assuming both years are stacked in one dataset with a year variable):

gen in2009 = year==2009
egen hh_in2009 = max(in2009), by(hh)
keep if year == 2007 & hh_in2009

The first command will create a dummy variable with 1's for the observations for which the year variable contains 2009. The second command will take each household and give it the maximum of the in2009 variable, which is 1 if this household has an observation in 2009 or 0 if there is no observation in 2009. Finally, the last command will keep the 2007 observations of households that also appear in 2009.

I hope this helps.; July 14, 2010 at 6:06 AM
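
If the two surveys are still in separate files, a merge on the household id is another possible route. The sketch below invents two tiny waves (the variable names hh, income07 and income09 are hypothetical) and keeps only households found in both. It uses the merge 1:1 syntax, which needs Stata 11 or later, and a tempfile, so it should be run in one go from a do-file.

```stata
* Invented 2009 wave, saved to a temporary file
clear
input hh income09
1 12
3 14
4 9
end
tempfile wave2009
save `wave2009'

* Invented 2007 wave
clear
input hh income07
1 10
2 8
3 15
end

* Keep only households that appear in both waves
merge 1:1 hh using `wave2009', keep(match) nogenerate
list
```

Households 2 and 4 drop out because each appears in only one of the two waves.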
Victoria said...: Hi. I was wondering if I have unstandardized data (on tourism), how can I normalize it by population?

I want to make a graph that is

number of dyads (cumulative) by tourist arrivals so it reflects a power log

Thanks!; July 14, 2010 at 8:51 PM
stataman said...: Sorry, Victoria, but I don't think I can help without knowing what the data looks like. It sounds like your question is less about Stata and more about the analysis at hand. I'm sorry, but I don't think I can help you with that.

Good luck!; July 15, 2010 at 1:28 AM
Victoria said...: Thanks Stata Man...perhaps framing it like this will help:

how can I make a graph where the number of observations (e.g. dyads in my case) is the x? (with y being the percent of tourist flows)

I cannot just use the option "scatter percent" since that's too few variables and because I want the x to be number of dyads (observations)....

What I want is a graph where x is 1, 2, 3, etc (or 5, 10, 15 - something to that effect); July 20, 2010 at 10:28 PM
stataman said...: It's a bit hard without knowing what the original dataset looks like. You say that every dyad appears several times (hence "number of observations" on the x-axis). If this is the case, you probably want to collapse the dataset so that every dyad becomes one observation, with a new variable holding the number of observations this dyad had in the original dataset. In addition you get some y variable (tourist flow percentage). I'm not sure how this y variable appears in the original dataset. Suppose you have it for each observation in the original dataset and you want to present the mean y (mean tourist flow percentage)... you can select statistics other than the mean.

So if this is what your original dataset looks like (every dyad appears several times, with some y-variable for each observation), you can do:

/* The following command will collapse the dataset from multiple observations per dyad to one observation per dyad, taking the mean of the tourist_flows variable, counting the number of observations in each dyad, and putting that count in the variable obs_num. */

collapse tourist_flows (count) obs_num=dyad_id, by(dyad_id)

/* The following line will draw the plot */
twoway (scatter tourist_flows obs_num)

I hope this is what you wanted to do. If not, I hope this gives you a hint at what you're aiming to.

Sorry I can't help more,
Roy; July 20, 2010 at 11:06 PM
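
If each dyad appears only once and the x-axis is simply "dyad number 1, 2, 3, ...", one guess at the intended graph is a cumulative share plotted against rank. The sketch uses the auto dataset that ships with Stata as a stand-in: each car plays the role of a dyad and price plays the role of tourist arrivals. Both substitutions are assumptions made for illustration.

```stata
* Stand-in data: one row per "dyad", price stands in for arrivals
sysuse auto, clear

* Order from largest to smallest and number the rows 1, 2, 3, ...
gsort -price
gen rank = _n

* Cumulative percentage of the total accounted for by the top `rank' rows
egen grand_total = total(price)
gen cum_share = 100 * sum(price) / grand_total

twoway line cum_share rank, xtitle("Number of dyads (cumulative)") ytitle("Percent of total")
```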
Victoria said...: Hi,

Each dyad appears only once (unique destination/origin combo) - but your advice did give me good ideas on how to move forward.

Thanks!
Victoria; July 20, 2010 at 11:17 PM
Unknown said...: Hey stataman!

I have a noob question concerning the estimation of a random effects model (xtreg re). I hope you can help me: I have run the random effects model and accordingly a fixed effects model. Afterwards I performed a Hausman test, which was insignificant (confirming that my data can be estimated with random effects). I also confirmed this with the Breusch-Pagan Lagrange multiplier test for random effects. I am now testing the assumptions for my model, but I am unsure about all the assumptions that should be tested. For now I have tested my models for collinearity (collin), heteroscedasticity (lrtest) and autocorrelation (xtserial). I am however unsure whether I have to test for other assumptions. Maybe you have a clue. I have some textbooks (Wooldridge & Baltagi) concerning random effects modelling, but since I am an econometrics-illiterate, I do not understand everything. thnx in advance!
Kind regards
David.; July 29, 2010 at 5:20 PM
stataman said...: Hi David,

Hmmm... I'm not sure what to say. It's very hard to run tests without knowing what your maintained assumptions are. Moreover, "testing assumptions" is something that you can't do without maintaining others. For example, testing for endogeneity of some variables requires you to assume that the instrumental variables you use are exogenous (which you can't test, but... well... assume). In other words, if it's something that you can check, it's not really an assumption. It is more of a result.

This is an econometrics question and not a Stata question. My take is that you can't really do econometrics without having some economic/behavioral model (even if simple or implicit) behind. This model, and your understanding of the data, should give you ideas about what assumptions are required for you to test your hypotheses using the data at hand.

It might be just my take on it (or my econometrics professors') and others will think that you can test hypotheses (or assumptions if you insist) without thinking about the mechanism or a model.

I'm sorry I can't help you more on this.

Roy; July 29, 2010 at 8:03 PM
kim said...: Hi stataman!
How do I know if the runs test shows autocorrelation? My runs test looks like this:

. runtest residual
N(residual <= -.006206203950569) = 158
N(residual > -.006206203950569) = 158
obs = 316
N(runs) = 79
z = -9.02
Prob>|z| = 0; August 2, 2010 at 2:51 PM
stataman said...: hi Kim,

Sorry, I have never used this command.

It looks like this page addresses this issue http://www.stata.com/support/faqs/stat/panel.html

sorry I can't help more.
Roy; August 2, 2010 at 8:48 PM
kathy said...: Hello Stataman,
I am running some data and applying kmeans clustering. I got 3 different groups. My further step is to apply a fuzzy c-means cluster and get the degree of membership of each observation to each fuzzy cluster, but I do not know how to get it. For your reference, I applied the following command:

cluster kmeans stu_pdifte pub_pdifte patappl_pdifte, k(3) measure(L2) name(prueba) start(prandom)

And for the fuzzy option, I read something like that in the Stata help:

cluster set myclus, addname type(fuzzy) method(kmeans) dissimilarity(L1) var(group g2l1)

But nothing happens. Please, could you help me?
Thanks in advance.; August 7, 2010 at 1:31 PM
Gaëlle said...: Hey stata man!

First I used xtreg and the results were OK.
Second, I'm trying to use xtabond but Stata said:
no observations
r(2000).
:-(
I can't understand why xtreg runs and xtabond doesn't...

Please help me.

Gaelle; August 11, 2010 at 5:50 PM
daticon said...: Hi stataman...

Thanks for your BLOG. Good reading.

I've been having trouble with a command and was hoping you might be able to help.

Quite simply, all I want to be able to do is rename var1, var2, var3,... var_n to the corresponding data within the first observation.

Thus, var1's first obs might be "user_id", var2's is "lastname", var3's is "firstname".

The closest I have gotten to accomplishing this is:

foreach v of varlist var* {
    di "`v'"
    di `v'
}

This command will correctly display:
var1
user_id
var2
lastname
var3
firstname

However, to get it to rename var1 to user_id... if I add in the following line to the foreach statement:

rename `v' `v'

..it says 'var1 already defined'

If I change it to
rename "`v'" `v'

...it says ' "var1 invalid name'

Any suggestions?; August 23, 2010 at 3:05 PM
stataman said...: Good question, daticon, and the answer lies in the nature of macros (locals or globals).

When you invoke a macro in a command (say, " `mymacro'"), what happens is that Stata replaces all macros with their content BEFORE running the command. After all macros have been substituted by their content, then the command will run as if there weren't any macros used at all.

So, for example in the first iteration of the loop, when you run:
> di "`v'"

Stata will replace `v' with var1 and then will run:
> di "var1"

which will display the string var1.

When you run
> di `v'

It will do the same replacement:
> di var1

but now that we don't have quotes, Stata doesn't treat var1 as a literal string whose meaning it can ignore, but as a word that is part of the command; in this case, a variable's name. Commands that take a single value (like di, for example) rather than whole variables (like corr, reg, su, and so on) will use the first observation of the variable (unless you add the observation number in brackets after the variable's name). This is why you're getting the first value of var1, which you set to be the name.
Finally, when you run:
> rename `v' `v'

Stata will translate this into
> rename var1 var1

which is not what you had in mind.

Generally speaking, putting the variable's name as the observation is not a very good strategy. It's usually an outcome of a flawed data import (there's a way to tell insheet to treat the first row as variable names, look at the help file). Otherwise, all the variables become strings and you probably don't want that.

If the problem wasn't a problematic data import, but rather your own idea (to put the variables names in the first row), a better approach would have been to make a long local with all the names ordered, and then loop like this:

local varnames "user_id lastname firstname ..."
local totvars : word count `varnames'

forvalues varnum = 1/`totvars' {
    local renameto : word `varnum' of `varnames'
    rename var`varnum' `renameto'
}

(I haven't checked that code so it might have a bug in it, but that's the spirit, anyways)

Good luck; August 23, 2010 at 6:39 PM
daticon said...: Thanks for the reply stataman. Yes, the problem is that I have to 'import' data from web-based tables that all have different column headings. The issue is compounded by the fact that the leading 0 or 0s get dropped on import, but which are vital to maintain, column headings are not STATA friendly, etc. etc.

I have used a similar technique to your suggestion that basically involves importing the data (by cutting and pasting into the data editor) into STATA 11 two times. On the first run I treat the first row of data as an observation, so that the leading 0's don't drop. Clear. On the second run I treat the first row as headings. Then it is matter of having to click each heading to get the list of variables (like in your example).

Of course, this takes a lot of time! And is tedious. The only way around it is to find a programmatic way. Unfortunately, neither of our efforts yet solves it so that I only have to import the data once...treating the first row as data...and then simply renaming var1-var(n) to the first row.

Would be so easy to accomplish in Java, C, etc. that it is quite frustrating how impossible it seems in STATA.

If you (or any of your readers) can think of an answer... please post back! Thanks again, daticon.; August 23, 2010 at 8:08 PM
daticon said...: ...should have said with the first method..the STATA command is then quite simply (for example)

renvars var1-var100 / user_id lastname firstname ... etc.; August 23, 2010 at 8:33 PM
stataman said...: if that's the problem you can take the first line (or any line) of the data (in the format from which you want to import), and add some character (a non-numeric character) to the value of the variable you want to be imported as a string.
Save your file, import it to Stata, and fix the value of the first line that you changed (take the character away).

Alternatively, if your leading zeros create a number of equal length across the dataset (say 9 digits), you can run:

gen stringvar = string(var, "%09.0f")

(right after you import)

I hope this helps.; August 23, 2010 at 10:53 PM
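A minimal sketch of the zero-padding approach above, on made-up data. The variable names id and id_str are hypothetical, and the 9-digit width is just the example width mentioned in the comment:

```stata
* Toy data: IDs whose leading zeros were lost on import (hypothetical values)
clear
input long id
12345
678
end

* Rebuild fixed-width strings, padding with zeros to 9 digits
gen str9 id_str = string(id, "%09.0f")
list id id_str
```

This only works when every ID is supposed to have the same length; IDs of varying length would need the original text.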
daticon said...: I wish it would...but that would take longer than the 'paste into STATA 2 times' option. I may look at ways I could fix files using Java first, and then import into STATA...but that will have to be some weekend fun.

Unfortunately, the only 'fix' that will save me time is the ability to paste once into STATA and then rename all the column headers (var1 - var(n)) to the contents of the first row.

I do appreciate all the suggestions, but I really do need an automatic / programmatic answer to this.; August 24, 2010 at 9:30 AM
stataman said...: if you insist, you can do what you suggested, but instead of running the

rename 'v' 'v'

first, do this
local newname = 'v'[1]

And then
rename 'v' 'newname'

Sorry for the lack of backticks on my current keyboard.; August 24, 2010 at 10:14 AM
daticon said...: Hooray! Many thanks stataman for your help! That will save me loads of time.

The final code was:

foreach v of varlist var* {
local newname = strtoname(`v'[1])
rename `v' `newname'
}

The strtoname was needed to convert non-stata friendly names to stata friendly ones.; August 24, 2010 at 11:54 AM
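For reference, the loop above can be tried on a small made-up dataset. This is only a sketch: the three string variables and their contents are invented, and the final drop in 1 (removing the header row once it has been used) is an addition that was not part of daticon's code.

```stata
* Toy data pasted with the header row as the first observation (hypothetical values)
clear
input str10 var1 str10 var2 str10 var3
"user_id" "lastname" "firstname"
"001"     "Smith"    "Ann"
"002"     "Jones"    "Bob"
end

* Rename each variable to the (sanitized) content of its first observation
foreach v of varlist var* {
    local newname = strtoname(`v'[1])
    rename `v' `newname'
}

* Assumption: the header row is no longer wanted as data
drop in 1
list
```

Because everything was read as strings, the leading zeros in user_id survive; numeric columns would still need a destring afterwards.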
Paul de Boer said...: Hi Stata man,

I'm trying to conduct a negative binomial regression. With the emphasis on trying!

I get stuck before even running it...

When I enter the syntax (or via the tabs above) I get:

number of dependent variables for equation 1 must be greater than zero. This, while I most certainly did select a dependent variable!

I cannot find out what that means (beyond what it literally says) and how to solve it.

I can conduct, for example, a linear regression and create graphs, so I don't think that it's a data-import problem.

And yes, I'm new to Stata (or negative binomial regression for that matter :)

I hope you can help me

Sincerely yours,

Paul de Boer; August 27, 2010 at 8:35 PM
mac said...: Hi, stata man,
I have a question that I can't solve. I've got panel data of company stocks and their month-end prices. For instance, for firm ABC, its stock prices are Jan 31, 20x1 $10.00, Feb 28, 20x1 $11.20, and so on. The data set is in a vertical form.

For each year, I'm trying to calculate each stock's average monthly return based on the change in stock price between Dec 31, 20x1 and Dec 31, 20x2. That is, ([Pr(month 13) / Pr(month 1)] - 1) / 12.

I can't figure out how to do this in Stata, but I'm guessing it's in relative observation values.

I appreciate your help. Thanks!; August 29, 2010 at 1:01 AM
Unknown said...: Hi Stataman,

First of all, thanks a lot for this very useful blog!
I have some troubles performing a unit root test for an unbalanced panel with xtunitroot and would appreciate your help. My original variable is numeric and log-transformed ("logsalesperemp"). I tried to run a unit root test (fisher pperron) using the following command but got the error "r(2000) no observations". Note that I tsset the data before running the program and also tried with ips test but no panel was used and no result returned....
Have you already encountered this error with xtunitroot? Any idea where the problem comes from? Thank you

. xtunitroot fisher logsalesperemp, pperron lags(2); September 28, 2010 at 9:58 PM
Woody said...: hey stata man!

Is there a command in Stata that solves heteroskedasticity and autocorrelation problems for binary dependent variable models in panel data?

Thank you for reading this...; October 16, 2010 at 9:56 PM
mac said...: Woody:
Tell us more about what sort of model are you using. It's not obvious to me since your dependent variable is binary.; October 16, 2010 at 10:05 PM
Woody said...: I am trying to model the determinants of implementation of clean tech based on several firm characteristics. As such, my dependent variable is structured as follows: 1 indicates the presence of clean tech in firm i, and 0 indicates no clean tech is present in the firm.; October 16, 2010 at 11:16 PM
mac said...: Woody:
Are you trying to use OLS regression or logistic regression? Logit is for categorical dependent variables like you have. Don't think you can properly use OLS regression with a binary dependent variable.; October 17, 2010 at 12:08 AM
Woody said...: Indeed, it has to be a logistic, 'cause xtgls would be a solution for a continuous dependent variable; that's why I'm currently at a dead end...; October 17, 2010 at 1:14 AM
mac said...: Woody:
Now that I know that you are using a model with the usual linear assumptions, back to your original question. A stats program can't automatically fix problems of heteroskedasticity and autocorrelation. Solutions come from the human.

Heteroskedasticity can mean that you have a missing variable in your model or your data set comes from different populations. In the first case, you add the missing explanatory variable to your model if you can identify it. In the second case, you can split your data into two (or more) data sets and run your model separately on each set.

Regarding the problem of autocorrelation, it seems to me that you have cross-sectional data rather than time-series, so I'm not clear on why you have an autocorrelation problem in your logit model.; October 17, 2010 at 2:21 AM
Long said...: Hi Stataman,

I use the panel data and I try to use the command

XTABOND

but it turns out the message that "no observations". Please advise me what should I do.

Thanks a lot!; November 30, 2010 at 9:16 AM
Peter Pan said...: Hi, nice guide.

I'm just looking for one solution. I have a huge data set with many N/A, N/R and some strings saying "data not found for year 20xx for company with code "asdf".

Of course I could replace that in Excel with a rather complicated if command, but since it is so easy to replace the N/A and N/R in Excel with a. and b. for missing values, I would just like to replace any string left in my data with a c. directly in Stata.

So, to sum up: is it possible to use, e.g., the replace command to just replace any string in any variable?; November 30, 2010 at 6:03 PM
stataman said...: Hi Max,

destring <varlist>, replace force

should work.

That is, if your variable name is, say, "amount", then:

destring amount, replace force; November 30, 2010 at 7:01 PM
Peter Pan said...: Thx, it does work. Could I tell it to replace the missings as .c directly?

But OK, I can recode them later easily.; December 1, 2010 at 3:37 PM
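One way the .c recoding could be done is sketched below. It assumes a hypothetical string variable amount and made-up values; it is not a built-in destring option, just a replace applied after the forced conversion.

```stata
* Toy data: a string column mixing numbers, a system missing, and leftover text
clear
input str40 amount
"12.5"
"data not found for year 20xx"
"."
"7"
end

* Keep the original strings and build a numeric copy; force turns non-numeric text into .
destring amount, generate(amount_num) force

* Recode to .c only where some real text (not blank, not ".") failed to convert
replace amount_num = .c if amount_num == . & !inlist(trim(amount), "", ".")
list
```

Using generate() instead of replace keeps the original strings around, so the recoding can still tell leftover text apart from genuinely empty cells.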
Unknown said...: Hi, stataman,

I'm studying economics at university and I started using Stata
recently.

I have a question.
I get a "no observations" error message -r(2000) when I gave the
following command (and had the shown output):
probit bio09 emagri08 exrate08 upov08

but
When I gave the following command (and had the shown output):
probit bio09 emagri08 upov08,
I can get result.

supposedly,
the data "exrate08" has a problem.
Do you have some solutions?
This data's type is "str2" and format is "%92."

"bio09" is dummy variable whether a country cultivate GM crops.
"emagri08" is employment in agriculture by country.
"exrate08" is exchange rate by country.
"upov08" is dummy variable whether a country affiliates UPOV.; December 4, 2010 at 2:44 PM
mac said...: Kyoko:
Stata is reading exrate08 as a string variable meaning that it is reading the field as non-numeric. Functions like -probit- need numeric data to run. That's why you get the "no observations" error. Stata doesn't see any numeric data in that field.

As a first step to solve this problem, I suggest that you look at the data in this field to see what characters or entries are being interpreted as non-numeric.
-list exrate08-

The solution depends on what's causing Stata to read the data as string data. You could have alpha characters in this field. This link may help:

http://www.stata.com/support/faqs/data/allstring.html

Also, Stataman gave someone else a solution to a similar problem on November 30, 2010 so you might want to read that discussion. But I'd suggest that you understand what's causing Stata to read the field as string data before using -destring-. -destring- alters your data set.; December 4, 2010 at 8:14 PM
Unknown said...: Michael,

Thank you very much for your answer.
I'll try at the university on next Monday.; December 5, 2010 at 8:34 AM
Unknown said...: Michael,

There are blank data cells entered as "..",
so maybe they are read as strings.
So I can use the destring command,
and I can get a result.

Thank you for your kindness.; December 6, 2010 at 3:38 AM
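The inspect-first approach mac describes might look like the sketch below. The values are invented; only the variable name exrate08 and the ".." placeholder come from the comments above.

```stata
* Toy data: an exchange-rate column where blanks were entered as ".."
clear
input str8 exrate08
"1.25"
".."
"103.4"
end

* Look at the entries that cannot be read as numbers before changing anything
list exrate08 if missing(real(exrate08))

* Blank out the placeholder, then convert; no force is needed once the text is clean
replace exrate08 = "" if exrate08 == ".."
destring exrate08, replace
describe exrate08
```

Cleaning the known placeholder explicitly, rather than using force, means destring will still refuse to convert if some other unexpected text is hiding in the column.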
Peter Pan said...: Is there a way to tell stata to use a minimum group size in xtreg?; December 7, 2010 at 7:05 PM
stataman said...: I don't think there's a direct option, but how about doing:

local min_group_size = 10
egen group_size = count(group_id), by(group_id)
xtreg y x1 x2 if group_size >= `min_group_size', fe i(group_id); December 7, 2010 at 10:32 PM
Peter Pan said...: Yeah, I thought about something like that, but my problem is that I have many ., .a, .b, .c. So out of my 24000 and something observations I can only use around 3000. Is it possible to change that command so that it only counts observations where variables y, x, and z are not missing?; December 8, 2010 at 11:33 AM
stataman said...: Sure.

First do:

mark inSample
markout inSample x y z

Then condition the egen with "if inSample"; December 8, 2010 at 1:33 PM
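Putting the two suggestions together, a rough sketch on simulated data follows. All variable names and the threshold of 8 are hypothetical, and total(inSample) is used here as one way of counting usable observations per group.

```stata
* Simulated panel-like data with some extended missing values (all names hypothetical)
clear
set seed 2010
set obs 300
gen group_id = ceil(30*runiform())
gen x1 = rnormal()
gen x2 = rnormal()
gen y  = 1 + 0.5*x1 - 0.3*x2 + rnormal()
replace y = .a if runiform() < 0.2

* Flag observations that are complete on all model variables
mark inSample
markout inSample y x1 x2

* Count usable observations per group, then keep only sufficiently large groups
egen group_size = total(inSample), by(group_id)
local min_group_size = 8
xtreg y x1 x2 if inSample & group_size >= `min_group_size', fe i(group_id)
```

Conditioning on inSample in the xtreg line as well avoids relying on how missing group sizes compare with the threshold.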
Peter Pan said...: thx that helped; December 8, 2010 at 5:19 PM
Peter Pan said...: Hey me again :) i hope u dont mind.

I have a problem generating growth rates of some of my variables. I am worried that Stata does not respect the groups because my results so far don't seem to be correct.

I thought I could just take d.sales/l.sales for the growth rate,
or (sales/l.sales)-1. When I look at the results it seems that it considers the groups, but the summary shows me a mean and a std which are not possible for a growth rate.

Is there any other way doing this correctly?; December 9, 2010 at 5:16 PM
mary said...: hi there

thanks for you useful blog
I want to know why, when I run the sum command, it shows 0 observations for all my variables even though I have entered the data correctly. Is it because I have entered them as string variables? Help out please.; December 17, 2010 at 2:46 PM
Unknown said...: SARA
HI
I have a problem: in my panel model I have many observations for every variable in a year. How do I edit my observations in the data editor and run this panel?; January 4, 2011 at 8:05 PM
Victoria said...: I have a question: how do i create a frequency line graph by groups where the x axis is the year (its longitudinal), the y axis is frequency, and the body of the graph are 5 different lines, each representing a group.

I want to trace how the frequency of something changes by year and by group. Does this make sense?; January 5, 2011 at 7:17 PM
mac said...: Hi, Victoria:
I'm not clear on what you are doing. Are you wanting a frequency distribution or a count of records?; January 5, 2011 at 7:44 PM
Victoria said...: I'm essentially trying to replicate the "Number of World Heritage properties inscribed each year by region"graph on this website: http://whc.unesco.org/en/list/stat#s12

I want this:

hist(date_inscr~d), by(region) freq

But instead of bars, I'd like just lines, and I want all the regions to be in one graph. Does this make sense?; January 5, 2011 at 7:56 PM
mac said...: Victoria:
The graph on the website looks like count information rather than a frequency distribution. Histograms are graphics showing distributions. You'll probably need to create count information in your dataset and then graph that information. I believe Stataman has a tutorial on creating count information on this website.; January 5, 2011 at 8:17 PM
mac said...: One more thing, if you have the data on the website already in your dataset, try -twoway connected y1 y2 y3 year- to create the graph.; January 5, 2011 at 8:23 PM
Victoria said...: Thanks. I searched the blog and didn't find the count data tutorial that you mentioned...do you have a link (sorry); January 5, 2011 at 8:30 PM
Victoria said...: So I just created a variable "counts" with a 1 in each cell - I assume that's the same thing?

I can do:
scatter counts region, by(date_inscr~d)

where date_inscr~d is the year, but then that gives me different graphs, one for each year, and it won't let me do "by(region)" because its a factor variable.; January 5, 2011 at 8:36 PM
Victoria said...: Sorry for so many comments! I basically want to graph the outcome of :

table region date_inscr~d

Where the frequencies in the table are the number of sites in each region by year (correct?)

eg
| date_inscribed
name_en | 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989

Africa | 4 3 7 5 3 2 3 1 2 2 3 2
Arab States | 9 4 2 9 4 6 3 2 3 1
Asia and the Pacific | 5 3 5 5 4 3 5 5 11 5 1
Europe and North America | 7 25 10 11 3 18 10 14 17 17 12 3
Latin America and the Caribbean | 2 2 3 3 4 5 2 4 2 9 4

|; January 5, 2011 at 8:40 PM
mac said...: What does your data set look like? Does it look like the data under the graph? If it does, try:
-sort year-
-twoway connected EU_NA AS_PAC LA_CAR ARAB AFR year-; January 5, 2011 at 8:41 PM
mac said...: Victoria:
This should do it:

clear*
input year EU_NA AS_PAC LA_CAR AR AFR
1978 7 0 2 0 4
1979 25 5 2 9 3
1980 10 3 3 4 7
1981 11 5 3 2 5
1982 3 5 4 9 3
end
sort year
twoway connected EU_NA AS_PAC LA_CAR AR AFR year; January 5, 2011 at 8:50 PM
Victoria said...: Hmm...when I try that

twoway connected Africa Arab Asia_Pac E_NA LA_C year

Where: e.g.
gen Arab = 1 if name_en == "Arab States"

(and name_en lists each region)

I just get a single (the Asia Pac color) straight dot line, and the y axis goes from -1 to 1; January 5, 2011 at 8:50 PM
Victoria said...: Thanks - that does work. So each time, I essentially have to create a new data set? Or create the graphs in excel? I can't take my larger data set and create it?

Thanks again for the help; January 5, 2011 at 8:55 PM
mac said...: I'm afraid that I don't know what your data set looks like. If it looks like the data under the online graph, then, yes, it is easily done with -twoway connected-. I essentially recreated that dataset from under the graph with -input-.; January 5, 2011 at 9:13 PM
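If the larger data set has one row per site, the counts can be built inside Stata rather than retyped. The sketch below is one possible route, using made-up rows and hypothetical variable names (region, year, n_sites, region_id); it swaps in contract and xtline instead of the -twoway connected- call used above.

```stata
* Toy data: one row per inscribed site (hypothetical values)
clear
input str30 region int year
"Africa"      1978
"Africa"      1978
"Africa"      1979
"Africa"      1980
"Arab States" 1978
"Arab States" 1979
"Arab States" 1979
"Arab States" 1980
end

* Collapse to one row per region-year, holding the number of sites
contract region year, freq(n_sites)

* Declare the region-year structure and draw one line per region on a single graph
encode region, gen(region_id)
xtset region_id year
xtline n_sites, overlay
```

Region-year combinations with no sites simply do not appear after contract, so a line skips those years unless zero rows are added.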
Unknown said...: I have a question. When I run a panel model in Stata I get this message: "repeated time values within panel". What do I do to produce a composite identifier to use with "xtset" in panel data?; January 5, 2011 at 9:39 PM
Samuel Chase said...: Quick question, is it possible to form a frequency distribution for a string variable? Thanks; January 26, 2011 at 4:23 AM
stataman said...: Did you try to tabulate it?

tab country3letcode; January 26, 2011 at 5:55 AM
LLA said...: Hi stata man!
May I ask a question that has been evading me for days, but I can't believe is that hard!

I have a data set with 3 variables:
1) "number weeks treatment": in weeks so ordinal or categorical rather than continuous I assume
codebook agrees it is numerical
2) "strain bacteria" either A or B
3) "Time to detect bacteria" in days: a continuous variable

I would like to test whether the strain of bacteria (subgroup A or subgroup B) affects the time to detect baceria (y axis) over number of weeks treatment (x axis)

I think this must be a multiple regression so have created 2 dummy variables, one that contains all the time to detect bacteria if strain=A and one all the time to detect bacteria if strain=B I used:
gen strain_A=time_to_detect_bac
replace strain_A=. if strain!=A
gen strain_B=time_to_detect_bac
replace strain_B=. if strain!=B

when I type
regress strain_A strain_B num_weeks_rx
it consistently replies "no observations"
can't work out why, have tried various things, any ideas?; February 19, 2011 at 1:55 AM
mac said...: LLA:
Are any of your variables in your regression model a string variable? You can use -codebook- to examine your variables. Variables in a regression model must be numeric.; February 19, 2011 at 2:38 AM
LLA said...: Thank you for the thought, but have checked them all with codebook & they are all numeric (byte).; February 19, 2011 at 8:33 PM
mac said...: Next step, which seems basic, is examining your data using -browse y x1 x2-. You are looking at the data for your regression variables (y, x1, x2 = strain_A strain_B num_weeks_rx) for some clue why Stata cannot see any observations. All these values should be either numbers or . and in black (not red).; February 19, 2011 at 9:10 PM
LLA said...: Thanks, yes, they're all black.

I don't understand why they're called y, x1, x2 (as Stata says in the help section too) when really it's y1, y2, x, i.e. dependent var1 strain_A, dependent var2 strain_B, independent variable number weeks treatment. This makes me concerned the error is because I'm doing the wrong test. There are quite a lot of missing values: 76 obs in strain A and 343 in strain B, but surely this doesn't matter, as 2 groups are rarely the same size and tests still work..
Thanks very much for your help; February 19, 2011 at 9:50 PM
mac said...: No need to worry about the naming of y, x1, x2. In my background, y= dependent variable. Just make sure the order of your variables for -regress- is correct.

Next step, what type of data do you have for your dependent variable: continuous, categorical, etc. And what type of data for your two predictor variables?; February 19, 2011 at 9:59 PM
LLA said...: Thank you, sorry, for some reason I didn't see your reply that time.
My variables are as follows:
dependent var1 strain_A: continuous (time)
dependent var2 strain_B: continuous (time)
independent variable number weeks treatment: ordinal (or categorical), I think, as it is time but in weeks (0 weeks, 1 week, 2 weeks etc.) with no fractions of weeks; February 24, 2011 at 8:38 PM
mac said...: Are you trying to model two dependent variables in one regression? If you are, that won't work. Perhaps you need two models: (1) - reg strain_A number_weeks_treatment-
(2) - reg strain_B number_weeks_treatment-; February 24, 2011 at 8:50 PM
LLA said...: I'm not sure, Don't think the 2 models will give me the answer I want, as I want to see if strain is correlated to num weeks rx but also if strain type (A or B) affects the time to detect bacteria.. Maybe I've ordered the data wrongly?; February 24, 2011 at 9:03 PM
mac said...: I'm afraid I can't help there. That's a research methodology issue and not my field. Perhaps you ought to talk with someone in your field. I guess that ANOVA might be a technique to explore. Good luck.; February 25, 2011 at 9:01 AM
LLA said...: Thank you very much for your advice Michael, think I need to look over he data again to decide what I have..; February 25, 2011 at 8:33 PM
LLA said...: May I take advantage of your stata knowledge one more time?
I have some survival data on which I'd like to plot Kaplan-Meier curves. Because the 'event' of time to event is a positive one, I think the graph would make more sense if it started low (i.e. where it crosses x at 0) and went to 1 at the end. I think this means I would like to plot analysis time against 1-survival. Do you know if it's possible to ask Stata to do this?
Many thanks; March 16, 2011 at 2:54 AM
PV said...: Hi Stata Man,

I would like to second Paul de Boer's request.

I have a dummy variable. I can run "tab dummy", I can run "reg dummy x1 x2", I can run "cor dummy x1".

But I CANNOT run "logit dummy x1", nor "xtlogit dummy x1".

Everytime I try a logit command, I get the error message "number of dependent variables for equation 1 must be greater than zero r(198)"

What am I doing wrong? I used to run logits all of the time, and I am not aware of doing anything differently.

Thank you,
PV; March 30, 2011 at 1:24 AM
stataman said...: Sorry. I have no idea.

I guess you can call me statapig now... ?
http://www.youtube.com/watch?v=714-Ioa4XQw; March 30, 2011 at 1:27 AM
PV said...: Thanks for trying, Statapig.

I just ran the same do file with the same dataset on a different computer, and everything worked perfectly. I guess at some point a computer just decides it won't run another logit model?

And there's your solution Paul: buy another computer.; March 30, 2011 at 1:55 AM
ncarvalho said...: To PV and Paul de Boer (and anyone who was getting this error:)

Everytime I try a logit command, I get the error message "number of dependent variables for equation 1 must be greater than zero r(198)"

You need to uninstall and reinstall your version of Stata b/c for some reason it must have become corrupted. I had the same issue and uninstalled/reinstalled it and the issue has resolved.

Good luck!; April 9, 2011 at 12:55 AM
priya said...: Hi!

I am glad to see the blog helping ppl out .. I am running xtlogit and want to know how I can check for autocorrelation and heteroskedasticity. I tried xtserial but it gives me no result, just a dot in place of the F and p value. What other test could I use in case this is not possible?; May 2, 2011 at 8:54 AM
priya said...: I have another query.. I want to run dea in Stata but I have a lot of missing values in my data... Stata keeps running but gives no output for almost five to six hours... I have around 3000 observations in a panel dataset; May 2, 2011 at 8:56 AM
Adam said...: stata man, blog is great. Question for you. I am looking at a time series of data (15years or so) for a bunch of firms. I am trying to estimate a left hand side variable for firm i and I am supposed to estimate my coefficients by industry and year, excluding firm i.

So essentially, I've got 76,000 observations and lets say I've got 20 observations in industry "a" in year "t". I've got to run a regression for firms 2-20 in that industry/year to estimate my coefficients (I've got all relevant left and right hand side variables), then plug in those estimates with my right hand side variables for firm 1 in that industry/year to calculate my left hand side dependent variable. Then repeat for firm 2, etc...

Any suggestions on how to write this code?

Thanks; June 17, 2011 at 1:41 AM
zaber said...: Hi I am trying to find effect of a policy variable (dummy) that does not change over time, on a pooled cross sectional database consisting of more than 100 countries. I have 500 observations.
In my regression I tried to use country dummies ( 1 for specific country 0 for others) but stata says "no observations" but when I run regression without these country dummies the results are there. Can you please tell me what is going wrong ? Is there anyother way to include country dummies rather than the technique I just described ?; September 2, 2011 at 7:03 PM
mac said...: How did you create your dummies? Also, look at the properties of each
dummy variable to make sure they are numeric rather than string variables.; September 2, 2011 at 7:22 PM
Economist said...: Hey Stata Man!

I want to make a loop that has inside of it:

forvalues k=1/6{
clear all
use data`k'.dta
save data`k'.dta,replace
}

and I can't get the program to work. It does everything except that when reading data`k'.dta it reads data.dta instead of data1.dta.

Do you know how to make a loop with clear all and save inside?

Thank you,
Ana; September 7, 2011 at 4:13 AM
Economist said...: Hi Stata Man!

I would like to make a loop like the following:

forvalues v=1/6{
clear all
use data`v'.dta
save data`v'.dta
}

but when I write it like this instead of saving it as data1.dta it saves it as data.dta for all cases. Would you happen to know how I can fix this?

Thank you,
AG; September 7, 2011 at 4:15 AM
Peter Pan said...: Maybe someone knows if there is a command for that.

if i wanna create a dummy var for a set of countries is there a way to create it for a list?

Say i have the ISO 3 digit Numerical codes and i want a dummy for Countries 110-120 or for different numbers.; September 12, 2011 at 4:43 PM
stataman said...: sure.

look up the command xi; September 12, 2011 at 4:45 PM
Edson C. Araujo said...: Hey, well done for the site, very useful!

I'm not sure if this is still active, but would you mind giving me a hand with a Stata issue?

If so, can I send through here or to your e-mail?
Thanks

araujoec@gmail.com; September 12, 2011 at 7:16 PM
Peter Pan said...: Thx for the tip, but I do want to create a dummy for a group of countries, not separate ones for each country. So let's say I have ISO codes for all countries and I want a dummy for all G77 countries. I'd like something like

gen Dummy_G77=0
replace Dummy_G77=1 if ISO==(...)

In the brackets I would like to put a list of the G77 ISO codes or all numbers. I would not like to write a loop or add a | condition for every country; September 13, 2011 at 4:37 PM
stataman said...: Try:

egen g7= anymatch(iso), ....

I put the dots because I don't remember the option, but you can give it a list of numbers. Look it up.; September 13, 2011 at 5:26 PM
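For what it's worth, the option on egen's anymatch() is values(), which takes a list of integers. A small sketch with made-up codes follows; the variable names and the particular numbers are hypothetical and are not the actual G77 list.

```stata
* Toy data: numeric country codes (hypothetical values)
clear
input int iso
100
110
115
120
124
300
end

* 1 if iso matches any value in the list, 0 otherwise
egen byte dummy_group = anymatch(iso), values(110/120 124)

* Alternatives for short lists or contiguous ranges
gen byte dummy_list  = inlist(iso, 110, 115, 124)
gen byte dummy_range = inrange(iso, 110, 120)
list
```

values() accepts ranges such as 110/120 mixed with single codes, which keeps a long country list on one line.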
Help Me said...: Help me Stataman,
all my variables are binary, I keep running the logit command and i get no observations; September 21, 2011 at 9:01 AM
MVS said...: Dear Stataman,
Please help me. I’m stuck. I intend to do discrete choice modeling. I have thousands of patients (say 5,000) and a few hospitals and tens of surgeons. Hospital-surgeon pair is a choice. Each patient faces around 50 choices. My dataset has one observation for each patient which corresponds to the chosen hospital and the chosen surgeon i.e. the variable choice = 1 in all these observations. There are at least 10 variables that describe hospital and surgeon characteristics in all the observations. Now I want to create a choice set for each one of the patients. That means adding around 50 observations per patient ID where the patient did not choose the remaining hospital-surgeon pairs in the choice dataset (for the non-chosen alternatives). The challenge I’m facing is how to create this dataset of 250,000 observations from my dataset. How do I do it in STATA or SAS?
Mahesh; September 28, 2011 at 8:09 PM
Lizard said...: Hi Stata man,

Does collapsing a data set create problems with variable estimation if my original data contained some missing values? For example: collapse...(mean)age. And some age variables are missing. Will my mean age be affected by the missing values?

Thanks.; October 21, 2011 at 5:56 PM
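As far as I know, collapse computes each statistic over the nonmissing values only, so missing ages are skipped rather than counted as zeros. A small sketch with made-up data (the variable names are assumptions):

```stata
* Toy data: one missing age in each group
clear
input group age
1 20
1 30
1 .
2 40
2 .
end

* (mean) uses nonmissing values only; (count) reports how many were used
collapse (mean) mean_age = age (count) n_age = age, by(group)
list
```

Group 1 should come out with a mean of 25 based on 2 observations. Keeping a (count) column like this makes it easy to see how many values each mean rests on.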
Thomas said...: Hello everybody

I am new to Stata but already have a big problem. I am trying to run a unit root (ADF) test on some time series. The problem is that I get the error message:

. dfuller i_spiindex, trend lags(2)
no observations
r(2000);

What is the problem here? The time series is already numeric, and summarize tells me it has 141 observations.

thanks for your help; October 25, 2011 at 2:44 PM
mac said...: Did you -tsset- the data?; October 25, 2011 at 4:58 PM
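A minimal sketch of that suggestion, assuming the 141 rows are already in time order with no real gaps. The index variable t is hypothetical; with a proper date variable, tsset that variable with the matching frequency instead.

```stata
* Create a consecutive time index and declare the data as a time series
generate t = _n
tsset t

* Lags are now computed relative to t
dfuller i_spiindex, trend lags(2)
```

If the time variable has gaps (for example monthly data stored as daily dates), the lagged values are all missing, and that alone can produce "no observations".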
Buddy said...: Hi Stataman!

I need some advice on some very basic things. This is my first time to do a choice experiment and to use Stata.

I already have my raw data in my computer and I need to set it up for multinomial logistic regression.
My choice experiment is about vehicle choice.
I have 6 attributes with the following levels (5x3x3x2x2x2)
Each respondent has to answer 10 choice sets. Each choice set has 2 alternatives with the opt out option.
This is how I initially set up my data (just an illustration)

id choice P1 P2 P3 P4 P5 T1 T2 T3..
1 0 1 0 0 0 0 0 1 0
1 1 0 0 0 1 0 1 0 0
2 1 0 1 0 0 0 0 0 1
2 0 0 0 1 0 0 0 1 0
.
.
P1..P5, T1-T3 are attribute levels

So I made a column for each attribute level and made it a dummy variable.
Each row is a type of vehicle. If the vehicle possesses a certain attribute level (e.g. eng2), that variable has value=1, otherwise 0.

If the respondent chooses an alternative, choice=1; if not, choice=0.
If the respondent "opts out" (e.g. ce_id 3), both alternatives have choice=0.

Am I doing this data setup right?

Plus, how do I treat the "opt out" option?

Many thanks!
BudZ; November 28, 2011 at 7:53 AM
SBB said...: Hi! I'm wondering if there is a way to collapse a list of dummy variables while turning them into a count of how many times that dummy variable appears as 1 in the original data. For instance, if my panel data where A is a country is:
Dum1 Dum2 Dum3
A 1 0 1
A 1 1 0

I want to get:
A 2 1 1

I tried
collapse (sum) Dum*, by(country)
and
collapse (count) Dum*, by(country)
neither of which works.
I look forward to your suggestion,
Best.
SBB; January 9, 2012 at 12:47 AM
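On a toy version of the example above, the first command appears to do what is asked. A sketch, with the data typed in by hand:

```stata
* Toy data matching the two rows in the question
clear
input str1 country Dum1 Dum2 Dum3
"A" 1 0 1
"A" 1 1 0
end

* Summing 0/1 dummies counts how often each one equals 1
collapse (sum) Dum1 Dum2 Dum3, by(country)
list
```

This should leave one row for country A with 2, 1 and 1. If the real data behave differently, it may be worth checking whether the dummies are stored as strings or contain missing values.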
Unknown said...: Stata Man. I hope you can help. I work with a dataset with a lot of binary data. I want to create a connected line graph of the "mean", aka proportion, of one of my binary variables by year. I'm having a terrible time with it (using Stata 12). Any advice?
LB; January 21, 2012 at 9:51 PM
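One possible approach, sketched under the assumption that the binary variable is called binvar and the time variable year (both names are placeholders): collapse to yearly means, then plot them.

```stata
preserve
* The mean of a 0/1 variable is the proportion of ones
collapse (mean) share = binvar, by(year)
twoway connected share year, ytitle("Proportion") xtitle("Year")
restore
```

The preserve/restore pair brings the original data back once the graph is drawn.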
Has said...: Hi, can you please tell me how to use yes/no responses in survey data?; February 26, 2012 at 2:41 AM
Thannaletchimy said...: Hey Stata man,

I would like to clarify my doubts regarding running the DW test on my panel. I have a panel data for bilateral trade of 11 countries disaggregated over 20 sectors. To create the IDs, I simply assumed that each bilateral trade pair for a particular sector is coded as an id. So I ended up having 200 ids ! Is that the correct way to shape my data ? When I ran the regressions and when I try to do dwstat, I get r 459. So I am wondering if it is the format of my data that is wrong.

Thanks in advance for your help.; March 7, 2012 at 1:16 PM
Jonathan Haskel said...: Thanks STATAMAN my search for how to code "country" turned you up and you have saved me days. thanks. One thing: in panel data on stata, i think i am right, you DO have to code your own time dummies, is that right?
thanks again, Jonathan; April 2, 2012 at 5:08 PM
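With factor-variable notation (available, as far as I know, from Stata 11 on) the year dummies do not have to be created by hand. A sketch with placeholder names id, year, y and x:

```stata
xtset id year
* i.year expands into one indicator per year, omitting a base year
xtreg y x i.year, fe
* Joint test of the year effects
testparm i.year
```

In older versions the xi: prefix with i.year plays a similar role.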
Unknown said...: Dear Stataman!

I have a panel data on prices that vary across country and time:

clear all

input id str8 (dates) variable
1 "23/11/08" 2
1 "28/12/08" 3
1 "25/01/09" 4
1 "22/02/09" 5
1 "29/03/09" 6
1 "26/04/09" 32
1 "24/05/09" 23
1 "28/06/09" 32
2 "26/10/08" 45
2 "23/11/08" 46
2 "21/12/08" 90
2 "18/01/09" 54
2 "15/02/09" 65
2 "16/03/09" 77
2 "12/04/09" 7
2 "10/05/09" 6
end

As you can see, the start and end dates of the time series for countries 1 and 2 are different. For example, for country 1 the time series begins on "23/11/08", while for country 2 it begins on "26/10/08".

My data on prices are available every 28 days (or equivalently every 4 weeks), so each observation is a 4-week average. But in some cases I have jumps (35 days or 29 days instead of 28 days). For example, in the above table we have the jumps from "23/11/08" to "28/12/08", from "22/02/09" to "29/03/09", etc.

My goal is to create a unified sequence of dates across countries; otherwise I cannot do further econometric/data analysis. Unless you have a different suggestion, I want to take what I have and calculate monthly average prices. So I want to change the data frequency (via interpolation?) and, instead of having 4-week or 5-week averages, have monthly averages.
Please, I would be grateful if you could provide some code to achieve this.
thank you for your understanding; June 18, 2012 at 2:24 AM
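A rough sketch of one way to get to a monthly panel, using the variable names from the input block above. It simply assigns each 4-week average to the calendar month of its date and averages within month, which is cruder than interpolation and is only an assumption about what is acceptable here.

```stata
* Convert the dd/mm/yy strings to Stata daily dates (20 = assume years 20xx)
generate ddate = date(dates, "DM20Y")
format ddate %td

* Map each daily date to its calendar month
generate mdate = mofd(ddate)
format mdate %tm

* Average within country and month, then declare the panel
collapse (mean) variable, by(id mdate)
xtset id mdate
```

Months without any observation remain gaps in the panel after this step.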
Unknown said...: Hi Stataman, I could really use your suggestion! I am using a data set of 209 countries and have run a fixed effects regression for non-OECD countries. When I summarize my variables, the summary covers the whole data set; I only need a summary of the variables as they were used in my fixed effects regression. For instance, when I run fixed effects, the number of groups is 138. I believe Stata drops some observations or some countries, so my total obs for a variable should be 138 multiplied by 26 years (1980-2006), which is 3588 observations only. How can I ask Stata to generate this? Please help.; August 8, 2012 at 4:49 AM
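A minimal sketch of restricting the summary to the estimation sample, with placeholder variable names y, x1 and x2 and data assumed to be xtset already:

```stata
xtreg y x1 x2, fe
* e(sample) equals 1 for the observations used in the last estimation
summarize y x1 x2 if e(sample)
```

The flag can also be stored with `generate byte used = e(sample)` right after the regression, in case other estimation commands are run in between.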
Anonymous said...: Hi Stataman,

I need your help urgently!

I am dealing with an unbalanced panel data (30 countries and 25 years).

My questions:

1. I tested my panel dataset for stationarity with the Levin-Lin-Chu test. Stata requires testing one variable at a time. I had to use the dependent variable because it has no gaps; I could not use any other variable because they have gaps, and the software does not run the test when it identifies gaps. Performing the test as described, I get a p-value of 0.0014, which I interpret as rejecting the null hypothesis that the panels contain unit roots. In other words, the test confirms the absence of a unit root, which means that the specification in use is stationary/valid. Am I right?

2. I run LR test for the heteroskedasticity as follows

xtgls depvar indepvars, igls panels(heteroskedastic)

estimates store hetero

xtgls depvar indepvars

local df = e(N_g)-1

lrtest hetero . , df(`df')

The p-value turns out to be 0.0000, which I interpret as rejecting the null hypothesis, meaning we have heteroskedasticity. Am I right?

3. I run the Wooldridge test for autocorrelation by using

xtserial depvar indepvars

and I get a p-value of 0.0000. Here again I reject the null hypothesis of no autocorrelation. In other words, I have autocorrelation. Am I right?

My panel then is stationary (which is good!), but has heteroskedasticity and autocorrelation problems.

To overcome these problems, I think I am right to rely on the results from the following estimation (which could be my final task ... right?)

xtgls depvar indepvars, igls panels(heteroskedastic)

which gives me Cross-sectional time-series FGLS regression

I wonder if I can consider (although the signs of the coefficients are different)

xtmixed depvar indepvars || _all: R.id || _all: R.year,mle

With respect to this, I would like to know whether this command already corrects for heteroskedasticity and autocorrelation, or whether I should add some option to do this.

Thanks in advance.; August 19, 2012 at 8:50 PM
⭐ResDoc94⭐ said...: . alpha v243 v52 v223 v183, casewise item std
no observations
r(2000);
. alpha v243 v52 v223 v183
cannot determine the sense empirically; must specify option asis
r(459);
. corr v243 v52 v223 v183
no observations
r(2000);
. fuck you and your no observations stata you fucking bastard there is nothing wrong with the fucking data
unrecognized command: fuck
r(199);; October 24, 2012 at 9:45 PM
John said...: I am working with the UCDP Battle Related Deaths dataset. It has data on conflicts grouped by warring party, location and year. I am comparing it with another dataset and my unit of analysis is the Country Year. I want to find a way to break out the warring party observations in the UCDP dataset by individual country.

Just to clarify: what I am looking for is the total number of Battle Deaths in a Country Year, whether the country was on either side of the battle or the battle took place in that country. The command (using the varnames from the dataset):

total(bdBest) if strpos(SideA, "Burundi") | strpos(SideA2nd, "Burundi")| strpos(SideB, "Burundi") | strpos(SideB2nd, "Burundi") | strpos(Battlelocation, "Burundi"), over(YEAR)

Gives me a table of what I am looking for but of course I would have to do this for every single country and I would like to get the info from that table inserted into the dataset somehow.

Thanks; November 2, 2012 at 6:35 PM
HHB said...: Hello Stataman - thank you for this great BLOG. I am using STATA for the first time to analyze some discrete choice experiment data. I can run McFadden's cond logit using the clogit command (with choices grouped into choice sets), but I also want to run a random effects model (I have 200 respondents who each performed the same 13 choice tasks). When I use xtset or xtlogit with "re", STATA treats my data as binary choice- not as choice experiment data. I cannot find the appropriate command for this anywhere. Appreciate your help - Henry; November 9, 2012 at 1:54 PM
M said...: Do you know what "variable mode has replicate levels for one or more cases; this is not allowed
r(459);" means is wrong with my data?; December 3, 2012 at 8:20 AM
M said...: I'm sorry but I realized I didn't mention that I was trying to run a nested logit regression. My data looks just like the one in the restaurant example in STATA help so I can't figure out what's wrong.; December 3, 2012 at 8:23 AM
xarifx said...: Hi Stataman!

Being fairly new to Stata, I'm having a difficulty figuring out how to do the following:

I have time-series data on selling price (p) and quantity sold (q) for 10 products in a single data file (i.e., 20 variables, p01-p10 and q01-q10). I am struggling to find the appropriate Stata command to compute sales revenue (pq) for each of these 10 products (i.e., pq01-pq10).

I would greatly appreciate your help.

Thank you.; December 25, 2012 at 9:19 AM
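A short sketch of a loop that could do this, assuming the variables are named exactly p01-p10 and q01-q10 as described:

```stata
forvalues i = 1/10 {
    * Build the two-digit suffix: 01, 02, ..., 10
    local j : display %02.0f `i'
    generate pq`j' = p`j' * q`j'
}
```

The `display %02.0f` step is only there to reproduce the leading zero in the variable names.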
Anonymous said...: In my data, I have variables as follows: household ID, ID of persons in household, father ID, years of education, who is the father. So person 3 in house 23 for example might say that person 1 is his or her father, while person 6 and 7 and 8 also in house 23 says that person 9 is their father. This is likely a joint family.

So I can't make a new column eduF in the usual way, since for person 3 and 6/7/8 in the same household, the father is different so the eduF level varies even in the same household. I need however this new column eduF saying, for each member of the family, what is the education level of the person they list to be their father.

I think this requires forvalues or foreach and loops, but am not sure what would be the code! Any help would be SINCERELY appreciated.; July 4, 2013 at 9:47 PM
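This can probably be done without a loop, by merging the data with a copy of itself. A sketch under assumed variable names hhid (household), pid (person within household), fatherid (the pid of the father) and educ (years of education):

```stata
* Build a lookup file: one row per person, relabelled as a potential father
preserve
keep hhid pid educ
rename pid fatherid
rename educ eduF
tempfile fathers
save `fathers'
restore

* Attach the father's education to every person who names a father
merge m:1 hhid fatherid using `fathers', keep(master match) nogenerate
```

People with no father listed in the household keep a missing eduF. The merge assumes pid is unique within hhid.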
vazir98 said...: I am using time series data to analyse a bilateral relation. As the distance variable is constant, Stata drops it from the analysis. How can I include the distance variable so that Stata uses it?

Ambrose; September 12, 2013 at 5:13 PM
boeme said...: Hello,

I am trying to test my panel data models for serial correlation with xtserial, but when I use the command I get the result: unrecognized command: xtserial. And when I tried to ssc install xtserial, I got the result: ssc install: "xtserial" not found at SSC. I wonder if there is an alternative to this test or if the syntax of the command has changed.
I am using Stata 12 and I have done the updates.
Many thanks in advance.; October 11, 2013 at 3:05 PM
daticon said...: Boeme, try any of these:

findit xtserial
net sj 3-2 st0039
net install st0039; October 11, 2013 at 3:19 PM
boeme said...: Thank you very much the last two commands worked and I run the test. Many thanks.; October 11, 2013 at 3:47 PM
Menka said...: Hello,
I have a panel data set and after a Hausman and LM test, I now have to carry out a pooled OLS regression. I need to test for autocorrelation. I only wanted to confirm whether the command "xtserial" is for fixed and random effects only or I can use it for OLS as well?
Thanks in advance!; March 12, 2014 at 7:36 AM
Unknown said...: Hi,
I need your guidance.
I have a panel of 670 firms from 1972 to 2010. I have already completed the panel unit root analysis. Now I want to run an ADF time series unit root test for each firm in the data. As there are 670 firms, is there a command to run the test for all of these firms at once, or do I have to do it separately for each firm? I want a summary of the results so that I can interpret them. Please help; I will be thankful to you.
ahsan; August 17, 2014 at 9:11 AM
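A sketch of one way to loop the test over firms and collect the results, assuming a numeric firm identifier firm, a series y, and data already declared with `xtset firm year`. All names and the choice of lags(1) are placeholders.

```stata
tempname h
tempfile results
* Results file with one row per firm: test statistic and approximate p-value
postfile `h' firm zt pval using `results'

levelsof firm, local(firms)
foreach f of local firms {
    * capture skips firms where the test cannot be run (e.g. too few obs)
    capture dfuller y if firm == `f', lags(1)
    if _rc == 0 {
        post `h' (`f') (r(Zt)) (r(p))
    }
}
postclose `h'

* Note: this replaces the data in memory with the results table
use `results', clear
summarize zt pval
```

Save the original data first, since the last step loads the results table in its place.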
Unknown said...: Hi, I am working on the YRBS dataset, which is weighted data. After putting the survey commands in place, Stata doesn't seem to run the logistic regression analysis. I can do linear regression but not logistic with the same variables. The code and error are
svy linearized : logit qn8 q92
(running logit on estimation sample)
an error occurred when svy executed logit
r(2000);
Do you have any idea what might be wrong?
qn8 is binary
q92 is multinomial, both are numerical variable.
Thank You.; June 11, 2015 at 10:24 PM
Stephen Angudubo said...: Problem with mixlogit or conditional logit implementation of a discrete choice experiment with multiple choice scenarios
Hi! I am new to the Statalist forum and to Stata, but I am working hard to get used to the software.

I am implementing a discrete choice experiment to model the choice between cassava planting material alternatives. In my questionnaire, I presented each respondent with 16 choice sets, each having 2 alternatives plus an opt-out option. The explanatory variables are the attributes (11 in total) of the cassava planting material, with varying attribute levels that were randomly assigned to the 2 alternatives. With this, I am fitting a conditional logit model. In my data set, some explanatory variables are represented by dummy variables while others are categorical variables with up to 3 categories.
Since each choice experiment has 2 alternative options and an opt-out option, each choice set has 11 rows and each respondent was presented with 16 choice sets.

(A sample of the original attributes and of the data is shown below.)

Attributes of the cassava planting material
Cassava stem attributes | Alternative A | Alternative B | Alternative C
Yield | Low (<20 Tons/Ha) | Moderate (20-30 Tons/Ha) | High (>30 Tons/Ha)
Disease tolerance | Susceptible | Tolerant |
Raw taste | Bitter | Sweet |
Cooked taste | Bitter | Sweet |
Mealiness | Hard | Mealy | Watery
Maturity | Late (>18 months) | Intermediate (13-18 months) | Early (6-12 months)
Seed availability | Scarce | Available | Plenty
In soil longevity | Short term (Up to 1 year) | Long term (Above 1 year) |
Shelf-life of stakes | Low shelf-life | High shelf-life |
Suitability in crop systems | Suitable | Not suitable |
Price | 10,000 | 27,000 | 40,000
Which one would you choose? | 1. Yes 0. No | 1. Yes 0. No | 1. Yes 0. No

Choice set
Attribute | Alternative 1 | Alternative 2 | Alternative 3
Yield | Low (<20 tons/ha) | Low (<20 tons/ha) | Opt out
Disease tolerance | Susceptible | Susceptible |
Raw taste | Bitter | Bitter |
Cooked taste | Sweet | Sweet |
Mealiness | Mealy | Mealy |
Maturity | Late (>18 months) | Intermediate (13-18 months) |
Seed availability | Available | Plenty |
In soil longevity | Short term (Up to 1 year) | Short term (Up to 1 year) |
Shelf-life of stakes | Short term | Short term |
Suitability in crop system | Suitable | Suitable |
Price | 10000 | 27000 |
Question: Which alternative do you prefer? | 1. Yes 0. No | 1. Yes 0. No |
Sample data
Respondent Choice_set Choice Yield_new Disease_tol Rawtaste Cookedtaste Mealiness Maturity_1
1 1 1 3 1 1 1 2 3
1 1 0 3 1 1 1 1 2
1 2 0 1 0 0 1 3 3
1 2 1 1 1 0 1 3 3
1 3 1 3 1 1 1 2 3
1 3 0 3 1 1 1 1 2
1 4 0 2 0 1 0 1 2
1 4 1 3 0 1 1 1 2
1 5 0 3 0 1 1 1 3
1 5 1 3 0 1 1 3 3
1 6 1 2 1 0 1 3 3
For clogit to work, I tried selecting the choice set as the grouping variable, but Stata shows "variable Choiceset has replicate levels for one or more cases".
clogit y Yield Disease_tol Rtaste Ctaste Mealines Maturity Seed_avail InSoil_long Shelflife Suit_crop_sys nprice , group(Choice_set )

I want to establish which attributes (and levels) the respondents prefer, and later determine the willingness to pay and potential demand.
Can Stataman clarify this for me?; August 26, 2015 at 7:47 PM
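One thing that may be worth checking: in the sample data the choice set number restarts for every respondent, so it does not identify a single choice situation on its own. A hedged sketch that builds a unique group id first, using variable names from the sample data header (the covariate list is shortened and only illustrative):

```stata
* Choice_set repeats (1 to 16) for every respondent, so build an id
* that is unique for each respondent-by-choice-set combination
egen long cs_id = group(Respondent Choice_set)

* Conditional logit with one group per choice situation
clogit Choice Yield_new Disease_tol Rawtaste Cookedtaste, group(cs_id)
```

How the opt-out alternative is coded as its own row is a separate question that this sketch does not address.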
statanewb said...: Hello, I am having some troubles running svy logistic. Would greatly appreciate any help.

I am able to run the model no problem using:
svy: logistic outcome mode i.agecat gender i.edcat i.incomecat i.prov

however, when I add the propensity score to my model, I get the following error:
svy: logistic outcome mode i.agecat gender i.edcat i.incomecat i.prov propscore
(running logistic on estimation sample)
an error occurred when svy executed logistic
r(2000);

I've looked into this error, and confirmed that all my variables are numeric (not string) and also confirmed that the outcome is coded as 0,1. I'm stuck. Any ideas?; December 25, 2015 at 1:29 AM
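Since r(2000) means no observations were left for estimation, a quick check is to count how many rows have complete data on every model variable. A sketch using the variable names from the command above:

```stata
* Number of observations with no missing value in any model variable
count if !missing(outcome, mode, agecat, gender, edcat, incomecat, prov, propscore)

* How many propensity scores are missing
count if missing(propscore)
```

If the first count is zero, the propensity score is probably missing for the observations that are complete on everything else. Missing survey design variables (weights, strata, PSU) could have the same effect and are not covered by this check.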