STATA Code's Corner Workshop


Maryiam Haroon and Sadia Hussain

CREB organized its third three-day virtual workshop on the use of Stata. Stata is a general-purpose statistical software package developed by StataCorp for data manipulation, visualization, statistics, and automated reporting. Researchers use it in many fields, including economics, sociology, political science, biomedicine, and epidemiology. The objective of this workshop was to provide participants with practical tips on handling data, exporting graphs and tables, and writing code for data analysis, research papers, and policy briefs[1]. The workshop was conducted online in three sessions from August 2nd to August 4th, 2021. The contents of the three sessions were as follows:

Session I: Data cleaning for string variables and merging datasets using the matchit routine

The first session was held on August 2nd from 1:00 to 2:00 pm. It focused on cleaning string variables and matching datasets. Many raw datasets, both survey and administrative, contain string variables that need to be cleaned before they can be processed and analyzed. Stata has many functions that make working with string variables much easier. The trainers also covered merging datasets that lack unique identifiers. Researchers commonly face the problem that the unique identifier in one of the survey waves is missing or wrongly recorded, so that a standard merge is not possible. The trainers showed how string variables can be used to match such datasets.
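The two steps described in this session can be sketched as follows. This is only an illustration and not the workshop's own material: the names, the variable names, and the file name wave2.dta are hypothetical, and matchit is a user-written command that has to be installed from SSC before use.

```stata
* One-time installation of the user-written command (assumed not yet installed)
* ssc install matchit

* Hypothetical second survey wave, saved to disk so it can be matched against
clear
input byte id_b str30 name_b
1 "ali khan"
2 "sara ahmad"
end
save "wave2.dta", replace

* Hypothetical first survey wave with messy name strings
clear
input byte id_a str30 name_a
1 "  Ali   KHAN "
2 "Sara Ahmed."
end

* Clean the strings: lower case, trim the ends, collapse repeated blanks
replace name_a = strtrim(stritrim(strlower(name_a)))
* Remove periods (the final "." means replace all occurrences)
replace name_a = subinstr(name_a, ".", "", .)

* Fuzzy-match on the cleaned names; the result holds candidate pairs
* together with a similarity score in the variable similscore
matchit id_a name_a using "wave2.dta", idusing(id_b) txtusing(name_b)

* Keep the best-scoring candidate for each record of the first wave
bysort id_a (similscore): keep if _n == _N
list
```

Matches produced this way are candidates rather than confirmed links, so low-scoring pairs would normally be reviewed by hand before the datasets are merged.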

Session II: Good STATA practices and creating graphs

The second session was held on August 3rd from 1:00 to 2:00 pm. It introduced participants to good Stata practices and to making meaningful graphs. Careful data and code management during a project is essential for transparency after the project is completed. Such practices also matter for internal use, since projects often run for several years, with team members working on them one after another. Participants were introduced to best practices in data and code management for better record keeping and more effective use of data. They also learned to create eye-catching graphs, since Stata's default graphs are not always the most appealing. The trainers demonstrated a consistent and relatively easy way to adjust graph settings globally, so that the code of each graph does not have to be changed separately, which comes in handy when preparing multiple graphs.
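One way to set graph options globally, using only built-in Stata commands, is sketched below. It is an assumption that this resembles what was shown in the session; the scheme, the font, and the output file name are chosen purely for illustration.

```stata
* Example dataset shipped with Stata
sysuse auto, clear

* Set the look of all subsequent graphs once, instead of per graph
set scheme s1color
* Font choice is an assumption; it must be installed on the machine
graph set window fontface "Arial"

* Both graphs inherit the settings above without extra options
scatter price mpg
graph export "price_mpg.pdf", replace

histogram mpg
```

Adding the option permanently to set scheme keeps the chosen scheme across Stata sessions.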

Session III: Flexible code for balance and summary tables and automating Stata to convert tables into LaTeX

The last session was held on August 4th from 1:00 to 2:00 pm and covered how to convert Stata output to LaTeX using flexible code. Most empirical research papers, for example, require balance tables and summary tables. While standardized tools such as toolkit do the job, the code presented in this session allows one to create LaTeX output from Stata more flexibly, for instance tables with selected samples and varying numbers of observations. Next, the trainers showed how to automate the transfer of Stata output to LaTeX so that tables, figures, and statistics are updated automatically. Such practices minimize human error when the data are updated or changed.
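A minimal sketch of this kind of workflow is given below. It relies on the user-written estout package (estpost and esttab), which is an assumption and not necessarily the code used in the workshop; the variables, the subsample, and the file name are illustrative.

```stata
* One-time installation of the user-written package (assumed not yet installed)
* ssc install estout

* Example dataset shipped with Stata
sysuse auto, clear

* Summary statistics for a selected subsample (domestic cars only)
estpost summarize price mpg weight if foreign == 0

* Write the table to a .tex file; rerunning the do-file refreshes it
esttab using "summary_domestic.tex", ///
    cells("mean(fmt(2)) sd(fmt(2)) count(fmt(0))") ///
    nonumbers nomtitles noobs booktabs replace
```

The resulting file can be pulled into a paper with \input{summary_domestic.tex}, so the table in the document changes whenever the do-file is rerun on updated data. The booktabs option assumes that the LaTeX document loads the booktabs package.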

[1] The material used in the workshop was not developed by the trainers. We have adopted the material from the Code’s Corner, Centre for the Study of African Economies, University of Oxford.