Simple task in Stata analysis
Homework 5: Non-Linear Regression
(Due: 8:50am CST, Nov 29, 2024)
Suppose you are interested in how life satisfaction among older adults is associated with internet use, age, gender, and marital status (i.e., married vs. not married), or, put differently, in how older adults’ internet use, age, gender, and marital status influence their life satisfaction.
You set up your research question in the following model:
(Eq. 1)
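The equation itself does not appear in this copy of the assignment. A linear specification consistent with the description above is sketched here as an assumption only. The variable names are borrowed from the later steps, and the coefficient symbols are illustrative:

```latex
\[
\text{life\_sat}_i = \beta_0
  + \beta_1\,\text{internet\_use}_i
  + \beta_2\,\text{age}_i
  + \beta_3\,\text{gender}_i
  + \beta_4\,\text{i\_married}_i
  + \varepsilon_i
\tag{Eq. 1}
\]
```

In the ordered probit and ordered logit steps, the left-hand side would be read as a latent index behind the seven observed response categories rather than as the observed score itself.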
To examine your research question empirically, you determine that the Health and Retirement Study (HRS) is the secondary data source that best suits your goal. Thus, you want to work with the 2018 HRS core data.
1. Please go to the following website:
2. Download the relevant files (e.g., h18sta.zip files… ).
3. In Section LB: Leave Behind Questionnaires (Respondent), you want to take the information about life satisfaction. Use the relevant dta file and keep only three variables: hhid, pn, and QLB002C.
HHID HOUSEHOLD IDENTIFICATION NUMBER
Section: LB Level: Respondent Type: Character Width: 6 Decimals: 0
...........................................................................
17146 010003-959738. Household Identification Number
PN RESPONDENT PERSON IDENTIFICATION NUMBER
Section: LB Level: Respondent Type: Character Width: 3 Decimals: 0
...........................................................................
9738 010. Person Identifier
654 011. Person Identifier
32 012. Person Identifier
1 013. Person Identifier
5480 020. Person Identifier
180 021. Person Identifier
21 022. Person Identifier
1 023. Person Identifier
353 030. Person Identifier
44 031. Person Identifier
3 032. Person Identifier
1 033. Person Identifier
586 040. Person Identifier
47 041. Person Identifier
4 042. Person Identifier
1 043. Person Identifier
QLB002C Q02C. SATISFIED WITH LIFE
Section: LB Level: Respondent Type: Numeric Width: 1 Decimals: 0
Please say how much you agree or disagree with the following statements. (Mark (X) one box for each line.)
I am satisfied with my life.
....................................................................
262 1. STRONGLY DISAGREE
323 2. SOMEWHAT DISAGREE
348 3. SLIGHTLY DISAGREE
362 4. NEITHER AGREE OR DISAGREE
734 5. SLIGHTLY AGREE
1841 6. SOMEWHAT AGREE
1774 7. STRONGLY AGREE
11502 Blank. INAP (Inapplicable); Partial Interview
4. Rename the variable QLB002C to life_sat.
5. Drop individuals with a missing value (.) in the life_sat variable.
6. Label the values of the life_sat variable:
1. STRONGLY DISAGREE
2. SOMEWHAT DISAGREE
3. SLIGHTLY DISAGREE
4. NEITHER AGREE OR DISAGREE
5. SLIGHTLY AGREE
6. SOMEWHAT AGREE
7. STRONGLY AGREE
7. Save the data, temp_lb_2018.dta.
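Steps 3 to 7 can be sketched in Stata roughly as follows. This is a minimal illustration on a few made-up rows, not on the HRS file. The name of the Section LB dta file is not given here, so the `use` line in the comment carries a placeholder. The label name `life_sat_lbl` is likewise a hypothetical choice. Variable-name case is assumed to match the step text; if the file stores names in upper case, adjust accordingly.

```stata
* Sketch of steps 3-7 on toy data (NOT HRS data).
* With the real file, replace the input block by something like:
*   use hhid pn QLB002C using "<Section LB respondent file>.dta", clear
clear
input str6 hhid str3 pn byte QLB002C
"010003" "010" 7
"010004" "010" .
"010004" "040" 5
"010013" "020" 1
end

* Step 3: keep the two identifiers and the life-satisfaction item
keep hhid pn QLB002C

* Step 4
rename QLB002C life_sat

* Step 5: blank (INAP / partial interview) responses are system missing
drop if missing(life_sat)

* Step 6: attach the seven value labels from the codebook
label define life_sat_lbl               ///
    1 "STRONGLY DISAGREE"               ///
    2 "SOMEWHAT DISAGREE"               ///
    3 "SLIGHTLY DISAGREE"               ///
    4 "NEITHER AGREE OR DISAGREE"       ///
    5 "SLIGHTLY AGREE"                  ///
    6 "SOMEWHAT AGREE"                  ///
    7 "STRONGLY AGREE"
label values life_sat life_sat_lbl

* Step 7
save "temp_lb_2018.dta", replace
```

Running `tabulate life_sat` after the label step is a quick way to confirm that the labels attached and that no missing values remain.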
8. In Section W: Event History, Internet Use and Social Security (Respondent), you want to take the information about internet use. Use the relevant dta file and keep only three variables: hhid, pn, and QW303.
HHID HOUSEHOLD IDENTIFICATION NUMBER
Section: W Level: Respondent Type: Character Width: 6 Decimals: 0
.........................................................................
17146 010003-959738. Household Identification Number
==========================================================================
PN RESPONDENT PERSON IDENTIFICATION NUMBER
Section: W Level: Respondent Type: Character Width: 3 Decimals: 0
...........................................................................
9738 010. Person Identifier
654 011. Person Identifier
32 012. Person Identifier
1 013. Person Identifier
5480 020. Person Identifier
180 021. Person Identifier
21 022. Person Identifier
1 023. Person Identifier
353 030. Person Identifier
44 031. Person Identifier
3 032. Person Identifier
1 033. Person Identifier
586 040. Person Identifier
47 041. Person Identifier
4 042. Person Identifier
1 043. Person Identifier
QW303 REGULAR USE OF WEB FOR EMAIL
Section: W Level: Respondent Type: Numeric Width: 2 Decimals: 0
Ref: EventHistory.W303_
Do you regularly use the Internet (or the World Wide Web) for sending and receiving e-mail or for any other purpose, such as making purchases, searching for information, or making travel reservations?
User Note: Interviewer-administered item.
...........................................................................
14 -8. Web non-response
10068 1. YES
6911 5. NO
9 8. DK (Don’t Know); NA (Not Ascertained)
22 9. RF (Refused)
122 Blank. INAP (Inapplicable); Partial Interview
9. Rename the variable QW303 to internet_use.
10. Drop individuals whose internet_use value is -8, 8, or 9.
11. Replace the value 5 (NO) with 0.
12. Save the data, temp_w_2018.dta.
13. Merge the two data files, temp_lb_2018.dta and temp_w_2018.dta.
14. Keep only individuals that are exactly matched (i.e., individuals who appear in both the master and the using data).
15. Save the merged data, temp_merged_lb_w_2018.dta.
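Steps 8 to 15 might look like the sketch below. It again uses made-up rows so that it runs on its own. Two assumptions are built in: the pair hhid and pn identifies a respondent uniquely (hence `merge 1:1`), and the LB file serves as the master, which the steps do not specify. Running it writes small toy files with the assignment's file names to the working directory, so a scratch folder is advisable.

```stata
* Toy stand-in for temp_lb_2018.dta (NOT HRS data)
clear
input str6 hhid str3 pn byte life_sat
"010003" "010" 7
"010004" "040" 5
"010013" "020" 1
end
save "temp_lb_2018.dta", replace

* Toy stand-in for the Section W respondent file (NOT HRS data)
clear
input str6 hhid str3 pn byte QW303
"010003" "010" 1
"010004" "040" 5
"010013" "020" 9
"010020" "010" -8
end

* Steps 8-9
keep hhid pn QW303
rename QW303 internet_use

* Step 10: drop web non-response, DK/NA, and refusals
drop if inlist(internet_use, -8, 8, 9)

* Step 11: recode NO from 5 to 0, giving a 0/1 indicator
replace internet_use = 0 if internet_use == 5

* Step 12
save "temp_w_2018.dta", replace

* Steps 13-15: one-to-one merge on the respondent identifiers,
* keeping only rows found in both files (_merge == 3)
use "temp_lb_2018.dta", clear
merge 1:1 hhid pn using "temp_w_2018.dta"
keep if _merge == 3
drop _merge
save "temp_merged_lb_w_2018.dta", replace
```

Step 10 does not drop blank responses to internet_use. Estimation commands exclude observations with missing values on their own, so any blanks that survive the merge do not enter the regressions.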
16. Please download h18_trk.dta from Canvas and keep the variables: hhid, pn, qage, gender, and qmarst.
HHID HOUSEHOLD IDENTIFICATION NUMBER
Section: W Level: Respondent Type: Character Width: 6 Decimals: 0
010003-959738. Household Identification Number
===========================================================================
PN RESPONDENT PERSON IDENTIFICATION NUMBER
Section: W Level: Respondent Type: Character Width: 3 Decimals: 0
QAGE AGE AT 2018 INTERVIEW
Section: TR Level: Respondent Type: Numeric Width: 3 Decimals: 0
999: Not applicable
GENDER GENDER
Section: TR Level: Respondent Type: Numeric Width: 1 Decimals: 0
...........................................................................
19180 1. Male
24234 2. Female
144 Blank. Unknown
QMARST 2018 MARITAL STATUS
Section: TR Level: Respondent Type: Numeric Width: 1 Decimals: 0
This variable may not be completely consistent with core data. Corrections have been made to this variable based on cross-wave information. See Section 5B1 in the tracker data description for more information.
...........................................................................
9758 1. Married
3625 2. Separated/Divorced
3024 3. Widowed
1343 4. Never Married
260 5. Marital Status Unknown
25548 Blank. No core interview from household, or not in sample this wave
17. Rename the variable qage to age.
18. Drop individuals whose age variable is coded 999 (not applicable).
19. Drop individuals with a missing value (.) in the gender variable.
20. Rename the variable qmarst to marital_st.
21. Drop individuals with a value of 5 or a missing value (.) in the marital_st variable.
22. Generate an indicator variable for married (i.e., i_married) and code 1 if individuals reported 1 in the marital_st variable and 0 if individuals reported 2, 3, or 4 in the marital_st variable.
23. Save the data, temp_trk_2018.dta.
24. Merge the two data files, temp_trk_2018.dta and temp_merged_lb_w_2018.dta.
25. Keep individuals that are exactly matched (i.e., Keep individuals shown both in the master and using data).
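The tracker steps (16 to 25) can be sketched in the same way. The rows below are made up. The sketch assumes the tracker variables are stored under the lower-case names listed in step 16. It treats the tracker file as the master in the final merge, which is an assumption, since the steps do not say which file is the master. It also overwrites any file named temp_merged_lb_w_2018.dta in the working directory with a toy version.

```stata
* Toy stand-in for temp_merged_lb_w_2018.dta (NOT HRS data)
clear
input str6 hhid str3 pn byte life_sat byte internet_use
"010003" "010" 7 1
"010004" "040" 5 0
end
save "temp_merged_lb_w_2018.dta", replace

* Toy stand-in for h18_trk.dta (NOT HRS data)
clear
input str6 hhid str3 pn int qage byte gender byte qmarst
"010003" "010" 82  1 1
"010004" "040" 71  2 3
"010013" "020" 999 2 5
"010020" "010" 66  . .
end

* Steps 16-21
keep hhid pn qage gender qmarst
rename qage age
drop if age == 999
drop if missing(gender)
rename qmarst marital_st
drop if marital_st == 5 | missing(marital_st)

* Step 22: 1 = married; 0 = separated/divorced, widowed, never married.
* Codes 5 and missing were dropped above, so the comparison is safe.
generate byte i_married = (marital_st == 1)

* Step 23
save "temp_trk_2018.dta", replace

* Steps 24-25: keep only respondents present in both files
merge 1:1 hhid pn using "temp_merged_lb_w_2018.dta"
keep if _merge == 3
drop _merge
```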
This time, instead of using the OLS method, you want to use non-linear regression models.
26. Run the ordered probit regression to estimate (Eq. 1) above.
27. Interpret your estimate for internet_use.
28. Run the ordered logit regression to estimate (Eq. 1) above, and obtain the odds-ratio estimates (i.e., use the “or” option in the ologit command).
29. Interpret your estimate for internet_use.
30. Your null hypothesis: Gender does not affect life satisfaction. Based on your ordered logit estimate for gender in Q28, test your hypothesis using the 95% confidence interval.
31. Visualize the conditional predicted probability of each response category (1. SDA, … 7. SA) for individuals who are female and aged between 20 and 80.
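One possible shape for steps 26 to 31 is sketched below on simulated data, so the numbers it produces say nothing about the HRS. Several choices are assumptions rather than part of the assignment: the covariates are entered with factor-variable notation (`i.`), the plot in step 31 is built from the ordered logit fit, age is evaluated in 10-year steps, and Stata 14 or later is assumed, where `margins` after `ologit` reports every outcome by default.

```stata
* Simulated stand-in for the merged analysis file (NOT HRS data)
clear
set seed 12345
set obs 3000
generate byte internet_use = runiform() < 0.6
generate age               = floor(50 + 40*runiform())
generate byte gender       = 1 + (runiform() < 0.55)   // 1 = male, 2 = female
generate byte i_married    = runiform() < 0.5

* Latent index with arbitrary illustrative coefficients,
* cut into seven ordered categories coded 1-7
generate ystar = 0.4*internet_use - 0.01*age + 0.1*(gender == 2) ///
    + 0.5*i_married + rnormal()
xtile life_sat = ystar, nq(7)

* Step 26: ordered probit
oprobit life_sat i.internet_use age i.gender i.i_married

* Step 28: ordered logit, reported as odds ratios
ologit life_sat i.internet_use age i.gender i.i_married, or

* Step 30: the 95% CI for the gender odds ratio is in the ologit table;
* lincom repeats it on its own line
lincom 2.gender, or

* Step 31: predicted probability of each category for women aged 20-80
margins, at(age = (20(10)80) gender = 2)
marginsplot, noci
```

In the odds-ratio table, a value above 1 for internet_use indicates higher odds of being in a higher satisfaction category for regular users, holding the other covariates fixed. The null of no gender effect corresponds to an odds ratio of 1, so a 95% confidence interval that excludes 1 rejects it at the 5% level. The ordered probit coefficient has no odds-ratio reading and is usually interpreted through its sign, its significance, or predicted probabilities.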
Submit your 1) do file and 2) MS Word document.