Using Stata to Analyze Complex Survey Data
2013 Tanzania National Grade 2 Cross-sectional Survey using EGRA/EGMA SSME instruments
Chris Cummiskey & Marissa Gargano
Sunday March 5, 8:30-14:45, Georgia 2 (South Tower)
CIES 2017, Downtown Sheraton, Atlanta, GA
RTI International is a registered trademark and a trade name of Research Triangle Institute.
www.rti.org

Purpose 1: Show that simple random sampling (SRS) is incorrect for inference
Descriptive (SRS): describes the sample (not the population from which the sample came)
Complex Survey (inferential): projects the sample to the population
[Figure: Sample and Population, with labels svyset, weights, and Cluster Effect]

Review Background, Research Questions, Sample Methodology.

Materials used in this section:
0_Research Questions for Grade 2 Tanzania EGRA-2013.docx [Paper]
1_Analyze_Tanzania-Data_CIES2017_Analysis-Workshop.do [Electronic]
1_Worksheet_Tanzania Grade 2 National Cross_sectional EGRA-EGMA study.xlsx [Electronic]

Background
0_Research Questions for Grade 2 Tanzania EGRA-2013.docx [Paper]
Who?: Standard 2 students attending government schools.
What?: English/Kiswahili EGRA & EGMA cross-sectional (snapshot) study.
Where?: Tanzania, national.
When?: October 2013 [end of Grade 2].
Why?: To get a national picture of Grade 2 classrooms, teachers, students, and student reading/math ability. To better understand how schools differ across the low, middle, and high performance bands, as labeled in the National Standard 7 Leaving Examination.

List Frame Population Data
2012 National Primary-Schools-Leaving-Certificate-Examination (PSLCE).
Contains all primary government schools.
Contains the information needed to draw the sample.

0.) Stata: Let's take a look at the Population Data
1_Analyze_Tanzania-Data_CIES2017_Analysis-Workshop.do
Read in the Tanzania Census Data [School Level]:
use "\Analysis Workshop\Materials\Tanzania_2013\Data\Census_ListFrame For Tanzania2013-National Survey_PLSE7.dta", clear
What level is the Population dataset?
How many units are in the list?

What percent of the POPULATION is urban/rural?
What percentage of the POPULATION is in the high/mid/low performing band?

TZ-2013 Sample Methodology
Stage  Item sampled         Strata                  Probability of selection  FPC (finite population correction)
1      Councils (20)        Urban/Rural (2)         Number of schools         Total urban councils + total rural councils
2      Schools (200)        School performance (3)  Number of schools         Total high/mid/low performing schools in selected councils
3      G2 classrooms (200)  (None)                  Equal                     Total grade 2 classrooms in selected school
4      G2 students (2,266)  Gender (2)              Equal                     Total grade 2 female/male in selected classroom

1-1.) Comparison of the Sample and Estimated Population
1_Analyze_Tanzania-Data_CIES2017_Analysis-Workshop.do
What are the sample counts and percentages for school performance (band)?
What are the estimated population counts and percentages for school performance (band)?
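The two tabulations asked for in 1-1 can be contrasted on a toy dataset. This is a minimal sketch, not the workshop do-file: the band codes, sample sizes, and weights are invented for illustration, and only `band` is a variable name taken from the slides (`n_students` and `wt` are hypothetical). It tabulates the sample unweighted, then declares the weights with `svyset` and tabulates the estimated population.

```stata
* Toy data (all values hypothetical): 100 sampled students per band,
* with very different sampling weights per band.
clear
input byte band int n_students double wt
1 100 30
2 100 10
3 100  2
end
expand n_students          // one row per sampled student

* Sample view: unweighted counts and percentages (each band is 1/3)
tabulate band

* Population view: weighted counts and percentages
svyset [pweight = wt]
svy: tabulate band, count cell percent format(%9.1f)
```

In this toy sample each band makes up one third of the rows, while the weighted shares are roughly 71%, 24%, and 5%. Bands 2 and 3 are therefore oversampled relative to the population they represent, and band 1 is undersampled.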

What group(s) were oversampled?
What group(s) were undersampled?
How might this over/undersampling affect the results if it is not accounted for in the analysis?

1.2) Compare SRS vs. Complex Analyses
1_Analyze_Tanzania-Data_CIES2017_Analysis-Workshop.do
Compare the SRS vs. complex mean estimates for the high band. Is there a big difference?
Compare the SRS vs. complex SE and 95% CI estimates for the high band. Is there a big difference?
Do the same for the low/mid bands.
How might the over/undersampling affect the NATIONAL results if it is not accounted for in the analysis?

1.2) Compare SRS vs. Complex Analyses: Nationally
How might the over/undersampling affect the NATIONAL results if it is not accounted for in the analysis?
If the analysis thinks the students were sampled with SRS?
If the analysis knows how the students were really sampled?
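The SRS vs. complex comparison can be tried on simulated data before running it on the survey file. The sketch below is an assumption-laden illustration: the 12 schools, the weights, and the simulated scores are invented, and `school`, `wt`, and `u_school` are hypothetical names (`k_orf` and `band` come from the slides). It runs `mean` as if the students were an SRS, then repeats the estimate with the weights and school clusters declared.

```stata
* Toy data (hypothetical): 12 schools as clusters, 10 students each
clear
set seed 2013
set obs 12
gen school   = _n
gen band     = 1 + mod(_n, 3)
gen wt       = cond(band == 1, 30, cond(band == 2, 10, 2))
gen u_school = rnormal(0, 8)          // shared school effect -> clustering
expand 10
gen k_orf = max(0, 10 * band + u_school + rnormal(0, 5))

* "SRS" analysis: ignores weights and clustering
mean k_orf
mean k_orf, over(band)

* Complex analysis: weights and school clusters declared
svyset school [pweight = wt]
svy: mean k_orf
svy: mean k_orf, over(band)
```

Within a band every toy weight is equal, so the band means match between the two analyses while their standard errors differ. The national mean shifts as well, because the unweighted estimate gives the oversampled bands too much influence.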

Mean estimates
SE and 95% CI estimates

1.3) Explore the TZ-2013 svyset
1_Analyze_Tanzania-Data_CIES2017_Analysis-Workshop.do
[Refer back to the Sample Methodology Table]
[Figure labels: Cluster Effect, weights]
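One way to read a multi-stage `svyset` is to map each row of the Sample Methodology table to one stage of the command. The following is only a sketch under stated assumptions: the actual TZ-2013 `svyset` is in the workshop data and is not reproduced here, every variable name and count below is hypothetical, and the toy weight assumes equal-probability selection within strata at every stage.

```stata
* Toy four-stage sample (all counts hypothetical, not the TZ-2013 data)
clear
set obs 4                                  // 4 sampled councils
gen council    = _n
gen urban      = mod(_n, 2)                // stage-1 strata: 2 councils each
gen n_councils = cond(urban == 1, 30, 120) // FPC 1: councils in the stratum
expand 6                                   // 6 sampled schools per council
bysort council: gen band = 1 + mod(_n, 3)  // stage-2 strata: 2 schools per band
gen school       = _n                      // unique school ID
gen n_schools    = 40                      // FPC 2: schools in council-by-band
gen classroom    = school                  // 1 sampled classroom per school
gen n_classrooms = 2                       // FPC 3: G2 classrooms in school
expand 10                                  // 10 sampled students per classroom
bysort school: gen female = mod(_n, 2)     // stage-4 strata: 5 girls, 5 boys
gen n_pupils = 25                          // FPC 4: girls (or boys) in classroom

* Final weight = product of inverse selection probabilities at each stage
gen wt = (n_councils / 2) * (n_schools / 2) * (n_classrooms / 1) * (n_pupils / 5)

* One "||" per stage: sampled unit, then its strata and FPC
svyset council [pweight = wt], strata(urban) fpc(n_councils)   ///
    || school,    strata(band) fpc(n_schools)                  ///
    || classroom, fpc(n_classrooms)                            ///
    || _n,        strata(female) fpc(n_pupils)

svyset          // display the stored design
svydescribe     // sampled units per stage-1 stratum
```

Typing `svyset` with no arguments on the workshop dataset displays the stored design in the same stage-by-stage layout, which can then be checked against the table.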

Understanding the Sample Motive
Sample methodology:
Must be developed to answer the primary research questions.
Must account for the cost of data collection.
Should account for the data collection logistics.
Should be tweaked to maximize the statistical benefit-to-cost ratio.

Statistically Ideal Sample:
Sample weights are roughly balanced
Large sample size
Small clusters (no more than 20 students per cluster)
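The three criteria above can each be checked with a few commands. This is a hedged sketch on simulated data: `school`, `wt`, `u_school`, `cl_size`, and `one_per_school` are hypothetical names, and the values are invented. It summarizes the spread of the weights, counts the sample, measures cluster sizes, and then asks Stata for the design effect, which compares the variance under the declared design with the variance an SRS of the same size would give.

```stata
* Toy data (hypothetical): 12 schools, 10 sampled students each
clear
set seed 2013
set obs 12
gen school   = _n
gen band     = 1 + mod(_n, 3)
gen wt       = cond(band == 1, 30, cond(band == 2, 10, 2))
gen u_school = rnormal(0, 8)               // shared school effect
expand 10
gen k_orf = max(0, 10 * band + u_school + rnormal(0, 5))

* 1) Weight balance: spread of the weights and max/min ratio
summarize wt, detail
display "max/min weight ratio = " r(max) / r(min)

* 2) Sample size
count

* 3) Cluster size: students per school, one row per school
bysort school: gen cl_size = _N
egen byte one_per_school = tag(school)
summarize cl_size if one_per_school

* 4) Design effect: variance under the design relative to SRS
svyset school [pweight = wt]
svy: mean k_orf
estat effects
```

A large max/min weight ratio or large clusters tend to push the design effect up, which means less precision for the same number of students.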

How effective was our sample?
How balanced were the weights in the sample?
How large was our sample of grade 2 students?
How large were the clusters in the sample?
In what ways could this sample have been more statistically efficient?
Conduct the same analysis but for Urban/Rural and k_orf.

Begin to answer some research questions.

Materials used in this section:
0_Research Questions for Grade 2 Tanzania EGRA-2013.docx

Primary Analysis: Answering the Primary Research Questions
0_Research Questions for Grade 2 Tanzania EGRA-2013.docx [Paper]
P-I. What is the national Kiswahili literacy ability of Grade 2 students attending non-special government schools?
P-II. What is the national English literacy ability of Grade 2 students attending non-special government schools?
P-III. What is the national mathematics ability of Grade 2 students attending non-special government schools?
P-IV. Based on the national Kiswahili literacy ability, how different was Grade 2 students' reading ability by:
    a. School-band (high/mid/low performing)
    b. Gender
    c. Urban/Rural

Primary Analysis: Answering the Primary Research Questions
P-I. What is the national Kiswahili literacy ability of Grade 2 students attending non-special government schools?
CODE: svy: mean k_orf

P-IV. Based on the national Kiswahili literacy ability, how different was Grade 2 students' reading ability by:
    a. School-band (high/mid/low performing)
CODE: svy: reg k_orf i.band
OR: svy: mean k_orf, over(band)

Secondary Analysis: Answering the Secondary Research Questions
0_Research Questions for Grade 2 Tanzania EGRA-2013.docx [Paper]
S-1. What does a Grade 2 student from a low/medium/high performing school look like?
    a. What is the students' demographic make-up?
    b. What does the school they attend look like?
    c. What does the classroom they are instructed in look like?
    d. What does the household environment look like?
S-2. Of the characteristics mentioned above, which seem correlated with higher/lower reading ability?
S-3. How well correlated is English literacy ability with Kiswahili literacy ability?

Secondary Analysis: Answering the Secondary Research Questions
0_Research Questions for Grade 2 Tanzania EGRA-2013.docx [Paper]
S-1. What does a Grade 2 student from a low/medium/high performing school look like?
    a. What is the students' demographic make-up?
CODE: svy: proportion female, over(band)
CODE: svy, subpop(if band == 1): tab age
    c. What does the classroom they are instructed in look like?
CODE: svy, subpop(if band == 2 | band == 3): proportion tr_1
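The S-1 code restricts estimates to particular bands with `subpop()`. A short sketch on simulated data shows how that differs from an `if` qualifier and from `over()`, which is an option of the estimation command rather than of the `svy` prefix. Everything except `k_orf` and `band` is hypothetical here (`school`, `wt`, `u_school`, and the matrix names), and the data are invented.

```stata
* Toy data (hypothetical): 12 schools, 10 sampled students each
clear
set seed 2013
set obs 12
gen school   = _n
gen band     = 1 + mod(_n, 3)
gen wt       = cond(band == 1, 30, cond(band == 2, 10, 2))
gen u_school = rnormal(0, 8)
expand 10
gen k_orf = max(0, 10 * band + u_school + rnormal(0, 5))
svyset school [pweight = wt]

* over(): one estimate per band, in a single call
svy: mean k_orf, over(band)

* subpop(): restricts the estimate but keeps the full design for the SE
svy, subpop(if band == 1): mean k_orf
matrix b_subpop = e(b)

* if: drops the other observations before estimation
svy: mean k_orf if band == 1
matrix b_if = e(b)

display "subpop() estimate: " b_subpop[1,1]
display "if estimate:       " b_if[1,1]
```

The two point estimates agree. The standard errors can differ, because `if` discards the clusters and strata outside the subpopulation before the variance is computed, so `subpop()` is generally the safer choice with a complex design.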

S-2. Of the characteristics mentioned above, which seem correlated with higher/lower reading ability?
CODE: svy: reg k_orf ib1.band ib0.tr_1 age ib0.female

Contact Information
THANK YOU!
Chris Cummiskey: Email: [email protected] Skype: chris.cummiskey
Marissa Gargano: Email: [email protected] Skype: marissangargano
RTI: @RTI_EdWork @RTI_Intl_Dev SharEd.rti.org