6 Prepare data for PovcalNet update

In this chapter we briefly describe what needs to be done with each auxiliary dataset used to update PovcalNet. Each of the next sections describes one dataset and, in general, they are divided in Raw Data and Update Master File subsection.

Most of the updating of the master file is carried by the pcn master, udpate(sheet) command in Stata, where sheet refers to name of the sheet in the master file. If you need to make any modification, please make sure to create a new branch from the most recent of the master branch in the PovcalNet-Team/pcn Github repository.

6.1 Population

6.1.1 Raw data

Everything related population data should be placed in the folder p:/01.PovcalNet/03.QA/03.Population, hereafter (./).

Population data is saved in the folder ./data. The data may come from two different sources, WDI or from an internal source in DECDG. Right now, the data is built and shared by Emi Suzuki <esuzuki1@worldbank.org> in an Excel file.

The file shared by Emi should be placed without modification in the folder ./data/original. Then, the file is copied again into the folder ./data with the name population_country_yyyy-mm-dd.xlsx where yyyy-mm-dd refers to the official release date of the population data. As of today, April 06, 2021, the files available in the system are,

fs::dir_tree("p:/01.PovcalNet/03.QA/03.Population/data", 
             recurse = FALSE)

#> p:/01.PovcalNet/03.QA/03.Population/data
#> +-- original
#> +-- population_aggregate_2019-07-01.xlsx
#> +-- population_country_2019-07-01.xlsx
#> +-- population_country_2020-07-01.xlsx
#> +-- population_country_2020-12-01.xlsx
#> \-- population_missing_2020-12-01.xlsx

The main file shared by Emi has data gaps for KWT, PSE, and SXM. She shared another file that fills these gaps, which is saved in the above folder as population_missing_2020_12_01.xlsx. It is unlikely that this file will change, given that the data gaps concern historical figures. Right now this ‘missing’ file is automatically accounted for when updating the master file (see below). In the unlikely case that these data will be updated, the code in pcn_master_update.ado needs to be updated manually to reflect the newer file.

6.1.2 update master file

Once the new raw population data is in the folder ./data it is necessary to update the population sheet in the master file. This can be done by just typing,

pcn master, update(population)

Finally, you can check the differences between the two versions by using the following code,

pcn master, load(Population) 
rename population pop2

tempfile pop2
save `pop2'

// 'the version to compare with must change depending on the vintage available
pcn master, load(Population) version(-1) 
rename population pop1
merge 1:1 countrycode coveragetype year using `pop2', gen(checkpop)  // '

tab year if (checkpop == 2)
tab countryname if (checkpop == 2 & year < 2019)
gen diffpop = pop1 - pop2
sum diffpop
br if abs(diffpop) > `r(mean)'*2*`r(sd)' & diffpop != .

Keep in mind that option version(-1) refers to one version before the current one. If it were version(-2), it would be two versions before the current one. Keep in mind that the negative integer in this function refers to the version of the master file and not to the version of the sheet you’re loading. Currently, there is no way to track the versions of each sheet in the Master file.

6.2 GDP and Consumption (PCE)

GDP and consumption data are used to measure economic growth for the interpolation and extrapolation calculations. PCE refers to “per capita expenditure,” more conventionally known as household final consumption expenditure (HFCE).

6.2.1 Raw data

The raw data of GDP and PCE data comes from WDI through the Stata command wbopendata. For GDP the indicator is NY.GDP.PCAP.KD (GDP per capita–constant 2010 US$) and for PCE the indicator used is NE.CON.PRVT.PC.KD (Households Final Consumption Expenditure per capita–constant 2010 US$). For some countries, however, the economic growth data is not available in WDI, in which case we use data from WEO, the Maddison project, and country specific sources, in that order. You can see below a deeper explanation and process on how to update the auxiliary files for these special cases.

Preparing for updating of master file

In order to prepare the data before the master file can be updated, do the following,

Get the data from the latest version of the World Economic Outlook (WEO). Save it in the folder P:/01.PovcalNet/03.QA/04.NationalAccounts/data with the name WEO_YYYY-mm-dd.xls, where YYYY refers to year, mmto month, and ddto day (identifying the vintage of the file).
Make sure that the Maddison data used in the pcn_master_update.ado is the most recent vintage available of the Maddison Database: https://www.rug.nl/ggdc/historicaldevelopment/maddison/data. If it is not, change the line in the pcn_master_update.ado accordingly.
Check with Andrés, Christoph, or Daniel on whether any things need to be changed for the countries that use country specific sources. These Special Cases can be found in the folder P:/01.PovcalNet/03.QA/04.NationalAccounts/data. This folder contains all the different versions previously used in PovcalNet with national accounts info on special cases. Each file is called NAS special_YYYY-mm-dd.xlsx. When you update the information of these files make sure you, open the most recent version, save with the current date, edit it as needed. In the country-specific sheets, you will find the source of data and clues on how best to update each observation. IMPORTANT: You should check whether the country-year is still a special case; e.g. the standard WDI, WEO and Maddison sources may have been extended to cover additional years. Similarly, there could be new special cases (i.e. WDI/WEO/Maddison observations were available in a previous vintage, but no longer are).Make sure you have the exact variables (countryname coverage countrycode year GDP PCE sourceGDP sourcePCE) as in the previous Excel file for the special cases national accounts.The variable names are case-sensitive.

Description of WEO series

NE.CON.PRVT.PC.KD: Household final consumption expenditure per capita (constant 2010 US$).
NY.GDP.PCAP.KD: GDP per capita is gross domestic product divided by midyear population. Data are in constant 2010 U.S. dollars.
NGDPRPC: Gross domestic product per capita, constant prices. GDP is expressed in constant national currency per person. Data are derived by dividing constant price GDP by total population.
NGDPRPPPPCPCH: Gross domestic product per capita, constant prices (purchasing power parity; percent change). Data are derived by dividing constant price purchasing-power parity (PPP) GDP by total population.

6.2.2 Update master file

Once the special cases file has been updated you must update the master file by using the following command lines,

pcn master, update(gdp)
pcn master, update(pce)

The pcn command integrates all the WDI, WEO, Maddison, and Special Cases files in the correct format of the master file. You can run these lines even if only one of the sources gets updated. If you want to understand how these two sheets get updated, you can see the procedure in the file pcn_master_update.ado. In particular, you can see these lines for GDP and these lines for PCE.

6.3 CPI and PPP

Even though the CPI and PPP come from different sources, in the PovcalNet update process we use the most recent version provided by datalibweb (DLW). DLW provides a curated, clean, and organized file with all the CPIs and PPPs values for all the countries and years. If you want to load the most recent version directly from DLW, you use the following lines of code,

local cpipath "c:\ado\personal\Datalibweb\data\GMD\SUPPORT\SUPPORT_2005_CPI"
local cpidirs: dir "`cpipath'" dirs "*CPI_*_M"

local cpivins "0"
foreach cpidir of local cpidirs {
    if regexm("`cpidir'", "cpi_v([0-9]+)_m") local cpivin = regexs(1)
    local cpivins "`cpivins', `cpivin'"
}
local cpivin = max(`cpivins')


cap datalibweb, country(Support) year(2005) type(GMDRAW) /*
*/  surveyid(Support_2005_CPI_v0`cpivin'_M) /* 
*/  filename(Final_CPI_PPP_to_be_used.dta)

6.3.1 Word of Caution

Be aware that as of today (2020-11-17) the datalibweb system is being updating and it is likely that the convention used to vintage control the CPI data will change in the following month. That is, Dec 2020. If that is the case, the code above would be obsolete and it will need to be updated in the pcn_update_cpi.ado file and in the pcn_master_update.ado.

6.3.2 Different CPI datasets

The pcn command updates and loads two databases of the CPI and PPP, and thus, you need to update both datasets when the CPI or PPP data is updated in DLW.

Special CPI cases

The primary source of CPI data for PovcalNet updates is the IMF’s International Financial Statistics (IFS) database (Lakner et al. 2018). When CPI series from surveys are deemed more appropriate, they are used instead of IFS CPI series. Currently, six countries are in this category of ‘special CPI cases,’ including Bangladesh, Ghana, Lao, Malawi, Tajikistan, and Zimbabwe.

The datafiles and dofiles for the special CPI cases are stored in this path, p:/01.PovcalNet/03.QA/09.CPI/Special CPI Cases/.

You should follow the steps below to update the special CPI cases:

Get CPI input file labeled as “Imputed_CPI_input.xlsx” from the P-drive. This file contains raw CPI series imputed from survey to survey.
Update the CPI input file if necessary. This file is updated when there is a new survey from which a new CPI can be imputed. (A country is not always a special case: A new survey may use the IFS CPI; the imputed CPIs are only used in exceptional cases.) The new CPIs are provided by poverty economists (PE’s). Daniel Gerszon Mahler currently liaises with PE’s to get the correct CPI and survey year. You may contact him for guidance and ideas for the whole exercise.
Get yearly annual CPI series from Datalibweb. The yearly annual series are collated and organized by Minh Cong Nguyen from monthly (or, in a few cases, quarterly) IFS data and annual World Economic Outlook (WEO) data. Use the code below to obtain the required data.


dlw, country(support) year(2005) type(gmdraw) surveyid(Support_2005_CPI_v05_M) filename(Yearly_CPI_Final.dta)

[Note that this code is for the current version of Datalibweb (i.e., Support_2005_CPI_v05). Check with Minh Cong Nguyen for the latest version and line of code to use.]

Generate a final data set with label “ImputedCPIs_output_yyyymmdd.dta.” This file contains the final CPI series with the ICP reference year as base year, which is currently 2011. See the latest file for the required format.
As robustness checks, construct the final CPI series afresh as in sheet(Calculations) in “Compare Imputed CPIs yyyymmdd.xlsx” and compare the Excel results with the Stata output. You may compare the results using the dofile labelled “combine_countries_compare_results.do.”
Send the Stata output to Minh Cong Nguyen. He will integrate the file into his price framework.
Save all datafiles and dofiles in a sub-folder in the P-drive [e.g., P:\CPI\Special CPI Cases\PovcalNet Update Mar 2021].

Internal data

This dataset is used only by the PovcalNet team and can be loaded in Stata by using the pcn load cpi, clear directive and updated by using the directive pcn update cpi. This dataset is intended to be used for general purposes that require deflation such as reports, papers, or even internal tests. However, it can’t be used to update the CPI sheet in the master file. Though the data is the same, the format and labels are very different between the two. If you need to update the format of this dataset, you need to modify the file pcn_update_cpi.ado

CPI sheet in the Master file

This dataset can be loaded though the directive, pcn master, load(cpi). By default the data will be loaded in Stata in wide form, which is the original format in the master file. You can load it in Stata basically for testing and checking purposes but it is highly discouraged to use this data for any data work. It comes in a very unfriendly format that is only useful for the PovcalNet system. pcn, however, provides the option pcn master, load(cpi) shape(long) that attempts to reshape the data in long format for ease of use, but it is still not as nice and friendly as the data loaded from pcn load cpi.

In order to updated the CPI sheet in the master file, you must use the directives pcn master, update(cpi) and pcn master, update(ppp). Even though the CPI and PPP come from the same file, they are split in two different sheets in the master file. Keep in mind that if the CPI vintage control in DLW changes as mention above, you will need to update the corresponding lines in pcn_master_update.ado.

6.4 Updating the “SurveyInfo” sheet of the Master file

Updating the SurveyInfo sheet in the Master file comprises three general steps:

1. Identifying the recently added country/year combinations (data points) from PRIMUS;

2. merging new rows into the existing SurveyInfo sheet;

3. manually updating columns E to L based on previous years information, as well as finding out the number of observations in the survey to update column I.

A set of short do-files have been written and saved in the folder p:\01.PovcalNet\03.QA\10.Master file work\ to expedite this process. To work in a new update, create a new folder with Month and Year copy the do-files from the previous update.

6.4.1 New surveys from PRIMUS

To identify the recently added data points, open do-file 01. primus dowload(approved & pending).do and update with the following changes:

Adjust the path on top.
Set a date_modified Steps I and II usually to the past 3 months. For instance, for March updates, date is usually set it for December 1st of the previous year.

keep if date_modified > clock("2020-12-01", "YMD")

Check Step IV. Some surveys like EU-SILC collect information on earnings from the “previous year.” Thus, PovcalNet adjusts the survey year to year-1 for published estimates. Make sure the list of modifications on this step is complete.
Save changes and run do-file.

6.4.2 Merging new rows into SurveyInfo Sheet

Open do-file 02. merge primus into SurveyInfo.do, update the path on top and run. (This should run smoothly if excel files in the previous step remained unchanged).

6.4.3 Manual updates

The newly added surveys obtained from PRIMUS should be those identified as “using only (2)” in your column _merge (O). The final step thus, consists of manually updating the information on columns E to M based on the information you have from previous years in each particular country.

For most countries with a continuous series, the survey acronym has not changed from the previous year, so you can just copy and paste fields like the survey name and conductor. In the cases when the acronym is not the same from the past, some researching will be required (google, microdatalibrary, IHSN, etc.) in order to obtain the official survey name and conducting institution.

Finally, the SampleSize (column L) must be updated on a case by case basis. You can load the new databases with the command pcn and check observations. The do-file _count obs in vintage ctrl bases.do, can assist you to quickly run individual and household observations for a chunk of years in a particular country (Note: the module “mod” in line 13 might need adjustment depending on the case).

6.4.4 Save & Inform

When you have finished all updates in the document, compare the number of rows in your new Excel with the ones in the Master SurveyInfo sheet.

Update the Master:

Open ONLY SurveyInfo sheet from the latest vintage,
Copy and Paste from your file into the Master omitting columns N & O (overall_status and _merge). Make sure columns A to M are in the same order.
Go to Management tab -> click on Save in Master -> add a short note describing your updates -> click on “OK.”

Inform the team managers that Master file has been updated, indicating the number of new surveys (rows) added.

5 Introduction to PovcalNet Update

7 LIS Data