Step #2 - Combine Multiple Datasets into One

In many cases, the data needed for a statistical analysis come from different sources. For example, if you want to analyze international growth, you might find economic indicators in a World Bank dataset, political indicators from think tanks such as Freedom House, and climate data in yet another dataset. Another case is a single dataset that is divided into multiple files. In this post I will explain how to combine them.

Types of Dataset Combinations

There are actually two main types of combinations:
  1. "Vertical" combination - You want to do this when you want to add observations from one file to another file. For instance, suppose you are working on a sports statistics project and you have data on players' performance in four separate files, one for each year between 2001 and 2004. Another possibility is that the data are separated according to different leagues, groups, etc. As long as the variables in the files are the same and the only thing you need to do is add observations, this is a vertical combination. The Stata command we will use is append. We will explore this command later.


  2. "Horizontal" combination - This is the kind of combination in which you want to add variables, not observations. The observations appear in both files (at least most of them), but each file holds different information about them. For example, suppose we're dealing with high school students and we have one file with their personal information and grades, and another file with SAT scores only. If we have an identifying variable in both files (e.g., Social Security Number), we can assign each student his/her SAT score. This example is a One-to-One matching. There are three types of matches of this kind:

    1. One-to-One matching: If the identifying variable which appears in the files is unique in both files, then it's a one-to-one match. Unique means that for each value of this variable, there is only one observation that contains it. In the figure below, country is the identifying variable. In both datasets, each country has only one observation.



    2. One-to-Many matching: If the identifying variable is unique in one file, but not unique in the other, then it's a one-to-many matching. This is very common when you have groups of observations in one file (the file with the identifying variable which is not unique), and information regarding each group in another file (the other file). The following figure will make it clearer:



       As you can see, one can group the individuals into households. The household identifying variable (fam_ID) is common to both files. It is not unique in the individuals file, but it is unique in the households file. This enables Stata to assign the same value of each household variable to all the members of the household. Note that although we have a unique identifier for the individuals (indiv_ID), it is irrelevant for this merge.

    3. Many-to-Many matching: This is very rare. It is also problematic, since there is no unambiguous rule for assigning values from observations in one file to observations in the other file. I will not elaborate on this matching too much.

Command Syntax

There are three commands you should know if you want to combine datasets: append, merge and joinby. All three of them combine the dataset currently in memory with data from a file you specify. We will name the data in memory "Master Data" and the data to combine from the specified file "Using Data". It will be clear why we use the word Using here.

Append

The append command does what we called "vertical" combination: it adds observations. Its syntax, in a simple form, goes like this (for options not covered in this tutorial, you can always type help append in the command line to learn more about the command):

append using <filename>

Example:

append using "C:\more_observations.dta"

append using "C:\more_observations" // (this is equivalent)

This will add the observations from the file C:\more_observations.dta to the data in memory. If no extension is specified (i.e., no .dta at the end of the filename), Stata assumes it's .dta, so you can omit it.

Now you understand why we call the data in C:\more_observations.dta "Using Data".

What happens if you have variables in the Master Data which do not exist in the Using Data? The observations from the Using Data will be assigned missing values in those variables. If there are additional variables in the Using Data which do not appear in the Master Data, the observations from the Master Data will have missing values in them.

Tip: Before you append, make sure you will be able to tell which source file each observation came from. For example, if you append 2008 data to 2007 data currently in memory, make sure the variable year exists in each of the datasets before you bring in the Using Data.
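
The behavior described above can be sketched with two tiny made-up datasets. The player names, the numbers and the tempfile name stats2008 are all hypothetical; the point is only to show the year tag and the missing values that append fills in.

```stata
* Hypothetical 2008 file (Using Data): it has no assists variable.
clear
input str10 player goals
"Adams" 12
"Baker" 7
end
generate year = 2008
tempfile stats2008
save `stats2008'

* Hypothetical 2007 file (Master Data): it has one extra variable.
clear
input str10 player goals assists
"Adams" 9 4
"Clark" 3 8
end
generate year = 2007

* Stack the 2008 observations below the 2007 ones.
append using `stats2008'

* assists is missing for the two rows that came from the 2008 file.
list player year goals assists
```

Because year was created in each file before the append, every row can still be traced back to its source.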

Merge

For "horizontal" combination of datasets you will need either merge or joinby. The difference between them is the method they use in order to do the merging, but in one-to-one or one-to-many merges, they give almost the same functionality. We will start with the merge command. The syntax, in its simplest form, is:

merge <identifying variable(s)> using <filename>

Examples:

(1)
use "D:\geography", clear // Assumes "D:\geography.dta"
merge country using "D:\economy" // Assumes "D:\economy.dta"

(2)
merge fam_id using "K:\households.dta"

(3)
merge state year using "K:\USA_data\precipitation.dta"

In the first example, Stata first loads observations from a file called geography and then matches them to observations in the economy.dta file. This will do what the figure in the one-to-one section above shows.
Note: whatever comes after the double forward-slash (//) is ignored by Stata. It is used to make the code clearer to the human reader.

In the second example, assume the individuals dataset is already in memory. This does what the figure in the one-to-many section above shows. Notice that there is no difference in the syntax of the command. The only difference is in the structure of the files you are operating on.

In the third example, I wanted to show that you can use more than one identifying variable. If only a combination of variables is unique (and you want to identify observations uniquely), you can specify all of them. In this example, suppose you have data on a state-year basis (this is called Panel Data, because the same subjects reappear at different points in time), say car accident data (number of accidents, injuries, etc.), and you need to add data about the weather conditions in each year. You then need to tell Stata to match the datasets according to both state and year.
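
To make the second example concrete, here is a small self-contained sketch of a one-to-many merge. The values and the tempfile name households are invented for illustration, and both datasets are sorted by fam_id first because this form of merge expects sorted data.

```stata
* Hypothetical households file (Using Data): fam_id is unique here.
clear
input fam_id income
1 52000
2 38000
end
sort fam_id
tempfile households
save `households'

* Hypothetical individuals file (Master Data): fam_id repeats.
clear
input indiv_id fam_id age
101 1 34
102 1 36
103 2 51
end
sort fam_id

* Same syntax as a one-to-one merge; only the file structure differs.
merge fam_id using `households'
list indiv_id fam_id age income _merge
```

Both members of household 1 end up with the same income value, which is the one-to-many assignment described earlier.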

Important: The merge command requires that both the Master and Using Data be sorted by the identifying variables. If the Master Data isn't sorted, run sort <identifying variable(s)> before the merge command. If the Using Data isn't sorted, open it first (use <filename>, clear), then run the sort command, then save it (save <filename>, replace), then open the Master Data and run the merge command. Here's an example:

use "D:\economy", clear
sort country
save "D:\economy", replace
use "D:\geography", clear
sort country
merge country using "D:\economy"

1) Since you saved D:\economy.dta in the third line, you will not need to open D:\economy.dta and sort it again in future runs.
2) If you are doing a one-to-one match (i.e., if the identifying variable(s) are unique in both sets), you can run the merge command with the sort option. It will automatically sort the datasets within the merge command. The sort option will not work if the identifying variables are not unique.
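
A side note for readers on newer versions: assuming Stata 11 or later, merge also accepts a match type (1:1, m:1 or 1:m) before the identifying variables, and in that form it does not need the data to be sorted beforehand. The sketch below uses made-up numbers and a hypothetical tempfile in place of the economy file.

```stata
* Assumes Stata 11 or later. Toy data stand in for geography and economy.
clear
input str10 country gdp
"Chile" 250
"Brazil" 1800
end
tempfile economy
save `economy'

clear
input str10 country area
"Brazil" 8516
"Chile" 756
end

* 1:1 declares that country is unique in both files; no sort is needed.
merge 1:1 country using `economy'
tab _merge
```

The rest of this post keeps to the merge syntax shown above, which newer versions still accept.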

The _merge variable:

The merge command automatically creates a variable named _merge, which contains information regarding the observation's existence in each of the two datasets. In the simple cases I mentioned above, it will contain, for each of the observations, one of the following values:
1 => the observation (the identifying variable(s) values) appeared only in the Master Data
2 => the observation (the identifying variable(s) values) appeared only in the Using Data
3 => the observation (the identifying variable(s) values) appeared in both datasets

It is up to you to decide what you want to do with each of the cases. In some projects you will not want observations with the value 2 in the _merge variable. For example, take example 2 above. If you have household data in the Using Data, but your interest is individuals (in the Master Data), you don't need observations with household data but without individuals linked to them. If you want to get rid of them, you can either type drop if _merge == 2 after the merge command, or, even better, run the merge command with the option nokeep. That is:
merge fam_id using "K:\households.dta", nokeep

You can also decide that observations in the Master Data that have no corresponding observations in the Using Data are irrelevant for your research. In that case, there is no special option for the merge command, so you need to add the command drop if _merge == 1 after the merge command.
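
Here is a rough sketch of inspecting _merge and dropping the unmatched cases. The data are invented: household 3 has no individuals, and individual 104 belongs to a household that is absent from the households file.

```stata
* Hypothetical households file (Using Data).
clear
input fam_id income
1 52000
2 38000
3 61000
end
sort fam_id
tempfile households
save `households'

* Hypothetical individuals file (Master Data).
clear
input indiv_id fam_id
101 1
102 1
103 2
104 4
end
sort fam_id
merge fam_id using `households'

* Look at the outcome before deciding what to drop.
tab _merge
drop if _merge == 2     // household 3: no individuals linked to it
drop if _merge == 1     // individual 104: no household record
```

Whether either drop is appropriate depends on the project; tabulating _merge first keeps that decision visible.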

Other options of interest

update and replace

What happens if you have some overlap between the variables in the files? Say, when you are merging data from the CIA World Factbook and the World Bank, you might have GNI in both datasets. If you specify neither update nor replace, Stata will keep the values that were in the Master Data (in memory). If you specify the options update replace (replace can't be specified without update), Stata will instead take the values that are in the Using Data and put them in place of the Master Data values. If you type only the update option (without replace), however, Stata will use the Using Data values only in observations where the Master Data values are missing.

So in case you have the same variable but different values, use neither option when you think the Master Data is more reliable. Use the update replace options if you think the Using Data is more reliable. If they are equally reliable, use just update.

If you specified the update option, _merge will contain 5 possible values:
1 => the observation (the identifying variable(s) values) appeared only in the Master Data
2 => the observation (the identifying variable(s) values) appeared only in the Using Data
3 => the observation (the identifying variable(s) values) appeared in both datasets and the values are the same in both
4 => the observation (the identifying variable(s) values) appeared in both datasets and the value in the Master Data is missing.
5 => the observation (the identifying variable(s) values) appeared in both datasets but the values in the datasets are not missing and not the same.

Examples:

merge country using "D:\Economy", update replace

merge id using "K:\second_version", update
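
To see the update option at work, the following sketch merges two made-up versions of a gni variable. The figures and the tempfile name worldbank are hypothetical.

```stata
* Hypothetical World Bank figures (Using Data).
clear
input str10 country gni
"Brazil" 1900
"Chile" 260
"Peru" 210
end
sort country
tempfile worldbank
save `worldbank'

* Hypothetical Factbook figures (Master Data); Chile's GNI is missing.
clear
input str10 country gni
"Brazil" 1800
"Chile" .
"Peru" 210
end
sort country

* update alone: only missing Master values are filled from the Using Data.
merge country using `worldbank', update
list country gni _merge
```

Brazil keeps its Master value and is flagged 5, Chile is filled in and flagged 4, and Peru agrees in both files and is flagged 3. Adding replace would overwrite Brazil's value with the Using Data figure.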

keep

If you want only some variables to be merged, instead of all of them, you can specify keep().

Example:

merge country year using "F:\intl_health_stats.dta", keep(birth_rate death_rate)

unique, uniqmaster, uniqusing and sort

To make sure the one-to-one or one-to-many matches are really unambiguously defined, you can require the identifying variables to be unique in the Master Data (uniqmaster), the Using Data (uniqusing) or both datasets (unique). I strongly recommend specifying them, although they won't change the result of a correct merge. Their main contribution is to make Stata print an error and exit if what you think is unique is not really unique. The sort option makes the merge command sort the datasets on its own, but it is only possible if you're running a one-to-one match (in other words, sort implies unique).
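
As a small illustration with invented data, the sketch below checks uniqueness and then lets merge sort both files itself. The isid command is an addition not covered in this post; it simply stops with an error when the listed variables do not uniquely identify the observations.

```stata
* Hypothetical economy file (Using Data), deliberately left unsorted.
clear
input str10 country gdp
"Chile" 250
"Brazil" 1800
end
tempfile economy
save `economy'

* Hypothetical geography file (Master Data), also unsorted.
clear
input str10 country area
"Chile" 756
"Brazil" 8516
end

* Stops with an error if country repeats in the data in memory.
isid country

* unique makes merge stop if either file has repeated countries;
* sort (which implies unique) spares the manual sorting.
merge country using `economy', unique sort
tab _merge
```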

More than one dataset

You can merge more than one file in one command. Instead of specifying one filename after using, you can add more filenames. Unless the nosummary option is specified, the command will create the variables _merge1, _merge2, ... , _mergen, where the value of the k-th of them (_mergek) is 1 if the k-th dataset contained this observation and 0 otherwise. The _merge variable will still be there, but now the value 3 in it means that the observation appeared in at least one of the Using datasets.

Personally, I prefer running the merge command iteratively and adding one dataset at a time. It requires dropping the _merge variable each time, and it might take longer, but I can better report and deal with the merging outcomes.
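
One way to sketch the one-at-a-time approach is a loop. Everything here is made up for illustration: the variable names gdp, rainfall and literacy, the values, and the tempfiles that stand in for real files on disk.

```stata
* Build a tiny base dataset keyed on country.
clear
set obs 2
generate str10 country = cond(_n == 1, "Brazil", "Chile")
sort country

* Create three hypothetical Using files, one extra variable in each.
foreach v in gdp rainfall literacy {
    preserve
    generate `v' = _n
    tempfile f_`v'
    save `f_`v''
    restore
}

* Merge them into the data in memory one at a time.
foreach v in gdp rainfall literacy {
    sort country
    merge country using `f_`v''
    tab _merge            // inspect each outcome separately
    drop _merge           // must go before the next merge
}
```

Each pass through the second loop shows its own _merge tabulation, so a problem with one file is easy to spot.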

Joinby

The joinby command does almost the same job merge does, but its internal working is different, so there might be differences in terms of processing time. Its main difference arises when you're dealing with many-to-many matches, but it can be used for one-to-one and one-to-many matches too. The simple syntax is:

joinby <identifying variable(s)> using <filename>

Example:

joinby country using "D:\economy"

Unlike merge, the default of joinby is to drop all observations that do not appear in both datasets. In order to keep those observations, you need to use the unmatched() option. This option has four possible variations:

  • unmatched(none) - Keep none of the unmatched observations (this is the default)
  • unmatched(master) - Keep observations in Master Data that have no match in Using Data (but not vice versa)
  • unmatched(using) - Keep observations from Using Data that have no match in Master Data (but not vice versa)
  • unmatched(both) - Keep all unmatched observations, from both Using and Master Data

So if you want to do the same thing done in the first example of the merge command, use the following command:

joinby country using "D:\economy", unmatched(both)

There is no need for the datasets to be sorted by the identifying variable(s), which is an advantage over merge.

The update and replace options are available for joinby too.
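
Here is a short self-contained sketch of joinby with unmatched(both), using invented data in which only Brazil appears in both files. Neither dataset is sorted.

```stata
* Hypothetical economy file (Using Data).
clear
input str10 country gdp
"Peru" 210
"Brazil" 1800
end
tempfile economy
save `economy'

* Hypothetical geography file (Master Data).
clear
input str10 country area
"Chile" 756
"Brazil" 8516
end

* Keep unmatched rows from both sides. joinby creates _merge only
* when unmatched() is specified.
joinby country using `economy', unmatched(both)
list country area gdp _merge
```

Without the unmatched() option, only the Brazil row would remain.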

As I said, more details with:

help joinby

Many-to-Many Merge

Although I have never needed it, this is where merge and joinby will give you totally different results. The question is how to match values from one dataset to the other. I think the best way to explain the difference between the commands is graphically:

Now you can understand the meaning of the sentence describing the joinby command in the help reference: "Form all pairwise combinations within groups".
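
The difference can also be seen in a tiny experiment. The data below are made up: one id value that appears twice in the Master Data and three times in the Using Data.

```stata
* Hypothetical Using Data: id 1 appears three times.
clear
input id str1 u
1 "x"
1 "y"
1 "z"
end
sort id, stable
tempfile using_data
save `using_data'

* Hypothetical Master Data: id 1 appears twice.
clear
input id str1 m
1 "a"
1 "b"
end
sort id, stable
tempfile master_data
save `master_data'

* merge pairs rows in sequence within id and repeats the last row of
* the shorter group, so the result has 3 rows.
merge id using `using_data'
count

* joinby forms every pairing within id: 2 x 3 = 6 rows.
use `master_data', clear
joinby id using `using_data'
count
```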

Conclusion

If you want to add observations: append.
If you want to add variables: merge or joinby.

As always, before you celebrate, make sure you got the combination of the files right by looking at the means, counts, minimum and maximum values (sum command) and tabulations (tab command). Take a special look at the _merge variable. Look for missing values or other outlying observations. If you have too many of them, you might have made a mistake along the way. Browse the data a bit. See that the data merged correctly.
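
A minimal sketch of such checks follows. The toy data mimic the result of a merge in which one country (Chile, hypothetically) found no match and therefore has no gdp value.

```stata
* Toy stand-in for a merged dataset; values are invented.
clear
input str10 country gdp _merge
"Brazil" 1800 3
"Chile" . 1
"Peru" 210 3
end

summarize gdp              // count, mean, minimum and maximum
tabulate _merge            // how many rows fall in each merge outcome
count if missing(gdp)      // rows left without a value from the Using Data
```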

Don't forget to save the file (that is, if you don't want to rerun the merge command later).

(go on to Step #3)

161 comments:

  1. nice one dude! I will be visiting this post quite a few times methinks :)

  2. hey stataman
    I do a merge of a bunch of files, which are all already sorted by the personid I'm using. here's the command and the output:
    merge personid using idS7G__IND.DTA idS7B__IND.DTA idS19C_IND.DTA idS7F__IND.DTA idS9___IND.DTA idS7H__IND.DTA idS3___IND.DTA, _merge(ind)
    (label timeunit already defined)
    (label yesno already defined)
    (label timeunit already defined)
    (label yesno already defined)

    do you have any clue what this label already defined thing is?

    hey and when are we getting our post on thank god for the egen command? hmm?

  3. Hi Katherine,

    I haven't seen an error like this before. My guess is that it talks about labels defined in each file. These labels are later attached to variables and then numeric values are displayed with their corresponding label.

    Is this actually an error or a warning? If it's an error, it appears in red color and stops the program. If it's a warning, it's in green and you can go on with your program without a problem.

    If it is indeed an error, try to run the merge with the option "nolabel". The help file says it will not copy value labels from the using files.

    Does this help?

  4. hey you
    so yes it was just a warning and not an error, and using nolabel did fix the problem.

    here's a suggestion for a post - the importance of using log files. I just had crimson go loopy on me, and it deleted all the pretty code I had written over the past 5 days. It was pretty code! luckily I could use my log file to retrieve the code and recreate my file - thank goodness. So now I have put into place a proper repository backup system but in the meantime I am happy I was using log files!

  5. Hi Stataman,

    nice blog with interesting articles. I started a similar project a while ago, but I didn't discover yours until now. Keep up the good work!

    Sofie

  6. Hi Stataman,

    Your blog is terrific and thank you for your time and efforts on putting it together!

    I have to append 230 datasets together (using vertical combination). Do you have any tips on doing it all at once?

  7. Thanks!

    To combine 230 files I'd recommend looking at stage #6 in this tutorial. It shows how to use loops. If your dataset files have systematic names (file1.dta file2.dta ... file230.dta), it would be really easy with a forvalues loop. Otherwise you can construct a long macro with all the filenames one after the other (except for the first). Load the first with the "use" command and then use a foreach loop to append the other files to the accumulated dataset in memory (or to merge or joinby them, if you are combining horizontally).
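
    A minimal sketch of the forvalues approach, assuming the systematic names file1.dta, file2.dta, ... in the current working directory. Only three small stand-in files are created here so that the example runs on its own; the variable source is hypothetical.

    ```stata
    * Create three small stand-in files: file1.dta, file2.dta, file3.dta.
    forvalues i = 1/3 {
        clear
        set obs 2
        generate source = `i'
        save "file`i'.dta", replace
    }

    * Load the first file, then append the rest one at a time.
    use "file1.dta", clear
    forvalues i = 2/3 {
        append using "file`i'.dta"
    }
    ```

    For 230 real files, the second loop would run over 2/230 instead.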

  8. You're awesome! Thank you so much!

  9. Hi Stataman!
    I urgently need some help understanding how to merge many datasets. I have to merge 6 or 7 datasets, in fact, in files named 1 to 7. I started by merging 1 with 2 using this code:
    use 1
    sort id
    save 1, replace
    use 2
    sort id
    save 2, replace
    use 1
    merge id using 2, nokeep
    tab _merge
    keep if _merge==3
    use 1
    .
    .
    .
    I followed the same pattern until I had merged all 7 files into one, and I finally got a merged dataset.
    My question is: should I drop _merge == 1 if I have to use a repeated cross-sectional sample?

  10. First of all you probably need to drop _merge, if it exists from previous merges, before any merge.

    As to the _merge == 1 (those in memory that did not find a match in the file on disk you are merging into memory), it's your decision. I don't think there's a rule. Maybe they were missing from the first dataset but have observations in datasets 2 to 7. Still in some projects dataset 1 might be crucial, so you might want to drop them after all.

    What I usually do is look at the most inclusive dataset (with all the ones that did not find a match), try to understand why there is no match and then decide according to what I got what I want to keep. Some times it's only _merge==3, other times not.

  11. Thank you Stataman for the reply. I think I should explain my query a bit more. I have 28 quarterly datasets, collected in 5 waves each, and only 20% of the individuals are repeated in each wave: of those who entered in the first wave, 20% are interviewed in the 2nd wave, and so on until they exit after the 5th wave. So there is a panel touch to it, but it is more generally used as a cross-section, so I do not need to drop an individual who was contacted only once. In this case, if my datasets do not exactly match, do I still need to keep only _merge == 3 and drop everything else? Can you help me with the choice between the merge and append commands in that case? I have many variables with the same name and coding over the quarters.

  12. Hi again.

    Wait, if you have a recurring cross-section, why are you merging it "horizontally" instead of "vertically"? Usually you will have the same variables, right? Just use the append command and add each wave below the other. You can add a variable that indicates which wave each observation came from.

    Does this help?

  13. I am sorry for the late reply; I was away and could not respond in time. After following some hints from these postings, I think I have to use the append command, and I have only one slight confusion left that I hope you can help me sort out. I appended the 28 waves, and only one wave has variables coded differently from the others. For example, the rest of the waves code countries by name and one uses numeric codes. I know I have to recode with the tostring and replace commands, but as there are more than 100 countries, is there any way to directly recode these numeric codes into country names? I know there might not be, but I still want to confirm. Also, would it be fine to use both codings for the same named variable?

  14. I would recommend creating a dataset that works like a code dictionary. In it you can have a variable for each coding method: one for the numeric codes, another for the three-character country code, another for the two-character code, etc. (only if you need them). Then, if your original datasets are tidy, you can merge the relevant variables from the dictionary according to the code you have in the original file and the one you want in the big destination file. After you create the dictionary you only need to merge each file.

    One more thing to remember, though, is that some commands in Stata don't like string values (for example, if you try fixed effects regression with xtreg). So maybe the best thing is to keep the numeric country code and maybe label the values with some string format of the country name - so that human eyes can read it easily too.

    I hope this helps, but I'm less and less sure.

  15. Hi Again!
    Thank you very much for your guidance, which has let me work out most of the issues by now. The last thing I would like you to confirm is this: suppose I have the same variable, such as country, but with different values in each dataset, like
    use dataset1
    list country
    UK
    USA
    France
    Spain

    and

    use dataset2
    list countru
    UK
    USA
    Spain
    Germany.

    Would the append command replace the entries that are not alike, or would it create another category in the same variable? e.g.
    use apndeddataset
    list country
    UK
    USA
    Spain
    Germany
    France

    Or would it add the entries that are alike and superimpose the dataset1 entry of France with Germany? Please confirm this for me. As I have more than a hundred countries in my country variable, I could not figure out whether appending the country variable across 8 different quarterly datasets would be consistent.

  16. The best way to learn that is to experiment. Try to construct datasets as you gave in the example and then do the append and see what happens.

    Append does not superimpose datasets on each other. It just puts the appended dataset below the dataset in memory. If you have the same variable name for country, it will put the appended observations' countries in the same variable, but in the appended observations. If there are two names (country and countru), then a new variable named countru will be created and the first dataset's observations will have missing values for countru whereas the appended dataset's observations will have missing values for country.

    I'm pretty sure experimenting will be much more helpful than my comments.

  17. THANK YOU SO SO SO MUCH! Your site (the merge/append post) just saved me from hours & hours of further struggling (I've already spent many such hours). Thanks!

  18. Thank you, I really needed a refresher on Stata. :) Your blog is wonderful.

  19. Hi Stataman, I am working on a project. I need to make combinations of variables of the common values in those variables and create new variables from these. For example, in one dataset, I have 8 variables so possible number of combinations would be 28 for two, 56 for three, 70 for four etc. I have worked out a way but this takes a long time. Can you help me write a shorter code or guide me which command(s) should be used to accomplish this. Thanks. Nafees

  20. You can use the gen or egen command in the format gen newvar = var1 if var2 == var3. This way a new variable is generated wherever the variables are equal in value.

  21. Hi, I hope this question is not too basic, but I am new to Stata and don't really know how to search for help with this question. I am analyzing data from the American National Election Study of 2008. In the post election part of the survey, respondents are asked two questions about their perception of government responsiveness.

    The problem is that about half of the respondents are asked one version (labeled "old" question) of the first question. The other half are asked another version (labeled "new") of the first question. The only difference between the two versions, however, is the presence of the word "about" in one and its absence in the other. Thus, I want to assume that the questions are asking essentially the same thing.

    The second of these Government Responsiveness questions (the actual second question, not the second version of the first question) just has one version. I want to create a scale to combine the responses to the two Government Responsiveness questions , but don't know how given the two versions of the first question.

    Normally, if two questions only have one version each, I would generate a new scaled variable to combine the two questions, as in gen NewScale = (Question1 + Question2). However, given that there are two versions of question 1, I don't know how to do this.

    If you would help me I would be most grateful.

    Thanks for your time.

  22. I am merging data on 1 to 1, 1 to many, and many to one, but I am getting the message "variable hhid does not uniquely identify observations in the master data"
    When I merge on m to m, data especially on group variables is becoming correlated. What can I do?

    I used the following commands:

    use "C:\Users\MWENIAK\Documents\LCMS2006\Education 14.08.2010.dta", clear
    rename SEC4_PID pid
    rename HID hhid
    sort hhid pid
    save newfile1.dta, replace

    use "C:\Users\MWENIAK\Documents\LCMS2006\Household Roster and migration and poverty.dta", clear
    sort hhid pid
    save newfile2.dta, replace

    Thanx

    Kabaso Nkandu

  23. I am merging data on 1 to 1, 1 to many, and many to one, but I am getting the message "variable hhid does not uniquely identify observations in the master data"
    When I merge on m to m there is no problem and it is successful, but data especially on group variables is becoming correlated. What can I do?

    I used the following commands:

    use "C:\Users\MWENIAK\Documents\LCMS2006\Education 14.08.2010.dta", clear
    rename SEC4_PID pid
    rename HID hhid
    sort hhid pid
    save newfile1.dta, replace

    use "C:\Users\MWENIAK\Documents\LCMS2006\Household Roster and migration and poverty.dta", clear
    sort hhid pid
    save newfile2.dta, replace
    /*Merges the three new files generated*/

    use newfile1.dta, clear
    merge 1:1 hhid using newfile2.dta
    tab _merge /*check the file to verify that _merge takes the appropriate value*/
    drop if _merge!=3
    drop _merge

    Thanx

  24. Try to merge according to both hhid and pid:

    merge hhid pid using ...

  25. Thanx for your quick response. I tried merging using both hhid and pid but i am getting the following error message:

    merge 1:1 hhid pid using newfile2.dta
    variables hhid pid do not uniquely identify observations in the master data

  26. This means your dataset has at least one case in which at least two observations share the same combination of hhid and pid. Stata doesn't know which one of them to choose for the merge. You need to figure out exactly how your datasets are constructed. Using different egen commands can help you learn more about it. For example:

    egen c = count(_n), by(hhid pid)
    tab c

    browse if c > 1

    Will show you the cases that confuse the merge

  27. Thanx once again. I have managed to use the egen and got the following results:

    use "C:\Users\MWENIAK\Documents\LCMS2006\Education 14.08.2010.dta", clear

    . rename SEC4_PID pid

    . rename HID hhid

    . sort hhid pid

    . egen c = count(_n), by(hhid pid)

    . tab c
    c Freq. Percent Cum.
    1 95009 99.82 99.82
    2 170 0.18 100
    Total 95179 100


    what can i do to make merge 1 to 1 possible. please advise!

  28. I'm sorry I can't help more, but I'd look at the 170 cases of 2 obs per hhid-pid combination and see why you have them. If they are just duplicates, drop one of each (duplicates command can help with that). If they are not exact duplicates, try to find out what distinguishes each observation in the pair and see maybe there's a third variable you need to merge by.
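
    As a rough sketch of how the duplicates command could be used here, the toy data below contain one exact duplicate of an hhid-pid pair. The values, the variable grade and the variable name dup are hypothetical.

    ```stata
    * Toy data: rows 1 and 2 are exact duplicates (hhid 1, pid 1).
    clear
    set obs 5
    generate hhid = cond(_n <= 3, 1, 2)
    generate pid = cond(_n == 3 | _n == 5, 2, 1)
    generate grade = 10

    duplicates report hhid pid              // how many repeated pairs exist
    duplicates tag hhid pid, generate(dup)  // dup > 0 marks repeated pairs
    list if dup > 0                         // inspect them before deciding
    duplicates drop                         // drops rows identical on every variable
    isid hhid pid                           // errors out if any repeats remain
    ```

    If the repeated rows are not exact duplicates, duplicates drop leaves them in place and isid will complain, which is the signal to look for a third identifying variable.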

  29. Thanx very much stataman. may almighty God bless you. your advice worked. i dropped the 170 cases and a 1to 1 merge worked.

  30. Hi kabaso,

    I'd drop only half of the 170 cases (those that are duplicates), not all of them. There is still valuable information in them. To keep just one instance of every group of the same hhid-pid you can:

    egen tag = tag(hhid pid)
    keep if tag == 1
    drop tag

    Good luck

  31. Hi stataman. With your advice I managed to merge the first four files successfully. When I decided to merge three extra files to make 7 files, there was a problem: variables from the second and third files were dropped from the final merged file. What can I do to retain all the variables in the seven files?

  32. hi stataman i want to withdraw my earlier post. You took too long to reply. Therefore i made so many tries and research only to discover a typographical error in my do file. it is working perfectly. you are genius

  34. Hi Stataman!
    I have two datasets, one baseline and one follow up each of these have unique ID for household (hhid). I want to merge these to construct a panel of it. I need your suggestions. Thanking you in anticipation.

  35. Stataman!!! You're brilliant!!! Thanks a lot!

  36. Hi,

    I have a question regarding how to merge datasets. I want to combine datasets (individual data) from different countries where the categories for each variable will be different, for example with "political party" or "province". Although they are the same variables, what do I do so that all of the categories for all three countries appear in the 'base' dataset? Right now I am trying to do this in SPSS but I am not sure how to continue or if I should try this in STATA. In one dataset I have added more categories for the political parties in each country, but do I have to recode them then in the original dataset before merging? I hope this makes sense and thanks in advance for any advice you can give me!

  37. Sorry, I meant to elaborate, I think this would be either a one to many merge or many to many merge. Another example like I said is the province variable where for one country there are certain provinces and for another country there are others. So the variable is the same, but the categories are different. I would really appreciate specifically on the best method to use and the commands I would need to do this. I have read over the post but any extra advice regarding my examples would help!

  38. Our SLM household survey data contains a number of files pertaining to
    various socioeconomic aspects of the population. We have managed to merge
    different files with the master file by jointly using HHcode and IDC (the
    personal identifier). However, we are finding difficulty in merging the file
    containing data on remittances with the master file. This remittance file
    has only HHcode as identifier, and as is the case with other files, is not
    unique. One solution that works is to drop all non-unique HHcode
    observations in the remittance file, and then do a m:1 merge with the master
    file. We are wondering if there exists a better solution to the problem.
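
    One alternative to dropping the duplicated households, sketched here as an assumption rather than a recommendation for this particular survey, is to reduce the remittance file to one row per HHcode before the m:1 merge. The variable amount, the values, and the choice of summing within a household are all hypothetical.

    ```stata
    * Remittance file (hypothetical values): several rows per household
    clear
    input HHcode amount
    101 500
    101 250
    102 300
    end

    * Instead of dropping rows, keep one total per household
    collapse (sum) amount, by(HHcode)
    tempfile remit
    save `remit'

    * Master file (hypothetical values): one row per person
    clear
    input HHcode IDC age
    101 1 40
    101 2 38
    102 1 55
    103 1 29
    end

    * HHcode is now unique in the using file, so m:1 is valid
    merge m:1 HHcode using `remit'
    tabulate _merge
    ```

    Whether a sum, a mean, or a count is the right household-level summary depends on what the remittance rows represent.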

  39. Hi Stataman!!

    I have a huge problem!!

    I'm using data from the WB, and because it's too big they divide it into 45 files. I merged them one by one... but then they have 2 files at the end with the weights. I'm stuck. I really need the weights, but how can I merge them since the variables don't correspond? Any little help would be highly appreciated.

  41. Hi, if I need to merge data based on more than one key variable, how do I do it?

  42. List all of the key variables after the match type, which can be 1:1, 1:m, m:1 or m:m.

    for more details, see help merge in Stata.
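
    To merge on more than one key variable, the key variables are listed together after the match type. A minimal sketch, with hypothetical datasets and variable names (id, year, score, days):

    ```stata
    * Hypothetical data: scores keyed by id and year
    clear
    input id year score
    1 2001 80
    1 2002 85
    2 2001 70
    end
    tempfile scores
    save `scores'

    * Hypothetical data: attendance keyed by the same two variables
    clear
    input id year days
    1 2001 170
    1 2002 165
    2 2001 180
    end

    * Both key variables follow the match type; together they must
    * identify each observation uniquely for a 1:1 merge
    merge 1:1 id year using `scores'
    tabulate _merge
    ```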


    Anees
    aneconomist dot com

  45. Thanks for your help regarding the already defined error when merging datasets! That was helpful!

  50. Stataman: Your blog is really informative. How often do you clean out spam nowadays? There appear to be several spam comments: people are trying to sell their junk training courses.

    Could you help explain the difference in the following merging commands?

    The first merge command I tried is:
    . merge 1:m idgr using ... /*idgr is the identifying var, which is created by grouping two vars, id and session*/

    The second is:
    . merge 1:m id session using... /* id and session are the two identifying vars*/

    The results of the two merges are not the same. The first one gives fewer merged observations (_merge==3) than the second one does.

    Should I keep the second merge result or the first one?

  51. #ThangViet:

    Your point about people selling junk courses may be true, but my own comment, as an instructor of econometrics using Stata at www.econometricians.club, might be an exception: it is fully relevant, and I have been following this forum since 2009/2010.

    Now, the two commands differ: the first matches observations on idgr only, while the second matches on each unique combination of id and session.

    With the first command, the two datasets are compared on idgr only. Where a value of idgr appears in both datasets, the observations are merged and _merge equals 3. Otherwise the observation appears only in the master dataset or only in the using dataset.

    The second command treats each pair of id and session as the key. Where both id and session match between the two datasets, _merge equals 3; otherwise the observation belongs to only one of the datasets.

    I hope this simple explanation helps you understand the issue.
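
    One way the two commands can disagree is sketched below. It assumes that idgr was created with egen group() separately in each file, which the question does not state. In that case the group numbers follow the sort order within each file, so the same idgr value can stand for different (id, session) pairs. All data values are hypothetical.

    ```stata
    * Master (hypothetical): one row per id-session pair
    clear
    input id session x
    1 1 10
    1 2 11
    2 1 12
    end
    egen idgr = group(id session)   // (1,1)=1, (1,2)=2, (2,1)=3
    tempfile master
    save `master'

    * Using file (hypothetical): pair (1,2) is absent, so numbering shifts
    clear
    input id session y
    1 1 100
    2 1 200
    2 1 201
    end
    egen idgr = group(id session)   // (1,1)=1, (2,1)=2
    tempfile other
    save `other'

    * Merge on idgr: idgr==2 pairs master (1,2) with using (2,1)
    use `master', clear
    merge 1:m idgr using `other'
    list id session idgr x y _merge

    * Merge on the original keys: pairs are matched correctly
    use `master', clear
    merge 1:m id session using `other'
    list id session x y _merge
    ```

    Under that assumption, merging on id and session directly is the safer choice, because the keys mean the same thing in both files.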

  104. How do you merge like 10 files in Stata?
