6 Prepare data for PovcalNet update
In this chapter we briefly describe what needs to be done with each auxiliary dataset used to update PovcalNet. Each of the following sections describes one dataset and is generally divided into a Raw data and an Update master file subsection.
Most of the updating of the master file is carried out by the Stata command
pcn master, update(sheet)
where sheet refers to the name of the sheet in the master file. If you need to
make any modification, please make sure to create a new branch from the most
recent version of the master branch in the PovcalNet-Team/pcn GitHub repository.
6.1 Population
6.1.1 Raw data
Everything related to population data should be placed in the folder
p:/01.PovcalNet/03.QA/03.Population, hereafter referred to as ./
Population data is saved in the folder ./data. The data may come from two
different sources: WDI or an internal source in DECDG. Right now, the data
is built and shared by Emi Suzuki <esuzuki1@worldbank.org> in an Excel file.
The file shared by Emi should be placed without modification in the folder
./data/original. The file is then copied into the folder ./data with the name
population_country_yyyy-mm-dd.xlsx, where yyyy-mm-dd refers to the official
release date of the population data. As of today, April 06, 2021, the files
available in the system are,
fs::dir_tree("p:/01.PovcalNet/03.QA/03.Population/data",
recurse = FALSE)
#> p:/01.PovcalNet/03.QA/03.Population/data
#> +-- original
#> +-- population_aggregate_2019-07-01.xlsx
#> +-- population_country_2019-07-01.xlsx
#> +-- population_country_2020-07-01.xlsx
#> +-- population_country_2020-12-01.xlsx
#> \-- population_missing_2020-12-01.xlsx
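The naming convention above can be scripted. The following is only a minimal sketch: the raw file name (population_from_decdg.xlsx) is a hypothetical placeholder, and the release date is typed by hand. It builds the dated file name and copies the untouched original into ./data only if the raw file is found.

```stata
* Sketch only: the raw file name below is a hypothetical placeholder
local root    "p:/01.PovcalNet/03.QA/03.Population"
local rawfile "population_from_decdg.xlsx"

* Official release date of the population data, entered manually
local release = date("2020-12-01", "YMD")
local vintage : display %tdCCYY-NN-DD `release'

* Target name follows population_country_yyyy-mm-dd.xlsx
local target "population_country_`vintage'.xlsx"
display "`target'"

* Copy only if the raw file exists; without -replace-, copy stops
* with an error rather than overwriting an existing vintage
capture confirm file "`root'/data/original/`rawfile'"
if (_rc == 0) {
    copy "`root'/data/original/`rawfile'" "`root'/data/`target'"
}
else display as text "raw file not found; nothing copied"
```

Leaving out the replace option is deliberate here, so that an existing vintage is never overwritten by accident.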
The main file shared by Emi has data gaps for KWT, PSE, and SXM. She shared
another file that fills these gaps, which is saved in the above folder as
population_missing_2020-12-01.xlsx. It is unlikely that this file will change,
given that the data gaps concern historical figures. Right now this ‘missing’
file is automatically accounted for when updating the master file (see below).
In the unlikely case that these data are updated, the code in
pcn_master_update.ado needs to be updated manually to reflect the newer file.
6.1.2 Update master file
Once the new raw population data is in the folder ./data, it is necessary to
update the population sheet in the master file. This can be done by just
typing,
pcn master, update(population)
Finally, you can check the differences between the two versions by using the following code,
pcn master, load(Population)
rename population pop2
tempfile pop2
save `pop2'
// the version to compare with must change depending on the vintage available
pcn master, load(Population) version(-1)
rename population pop1
merge 1:1 countrycode coveragetype year using `pop2', gen(checkpop)
tab year if (checkpop == 2)
tab countryname if (checkpop == 2 & year < 2019)
gen diffpop = pop1 - pop2
sum diffpop
// browse observations whose difference lies more than two standard deviations from the mean
br if abs(diffpop - `r(mean)') > 2*`r(sd)' & diffpop != .
Keep in mind that the option version(-1) refers to one version before the
current one; version(-2) would be two versions before the current one. Note
also that the negative integer in this option refers to the version of the
master file and not to the version of the sheet you are loading. Currently,
there is no way to track the versions of each sheet in the master file.
6.2 GDP and Consumption (PCE)
GDP and consumption data are used to measure economic growth for the interpolation and extrapolation calculations. PCE refers to “per capita expenditure,” more conventionally known as household final consumption expenditure (HFCE).
6.2.1 Raw data
The raw GDP and PCE data come from WDI through the Stata command wbopendata.
For GDP the indicator is NY.GDP.PCAP.KD (GDP per capita, constant 2010 US$) and
for PCE the indicator is NE.CON.PRVT.PC.KD (Households Final Consumption
Expenditure per capita, constant 2010 US$). For some countries, however, the
economic growth data is not available in WDI, in which case we use data from
WEO, the Maddison project, and country-specific sources, in that order. The
process for updating the auxiliary files for these special cases is explained
in more detail below.
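To make the order of precedence concrete, here is a minimal sketch with made-up numbers. It assumes the four sources have already been merged into one dataset with hypothetical variable names (gdp_wdi, gdp_weo, gdp_mad, gdp_spc) and simply takes the first non-missing level. The actual procedure in pcn_master_update.ado may differ (for instance, it may splice growth rates rather than levels), so treat this only as an illustration of the ordering.

```stata
clear
* Toy data: values and variable names are illustrative assumptions
input str3 countrycode int year double(gdp_wdi gdp_weo gdp_mad gdp_spc)
"AAA" 2000 100 101  99  .
"AAA" 2001   . 103 102  .
"BBB" 2000   .   .  50  .
"BBB" 2001   .   .   . 55
end

gen double gdp       = .
gen str8   sourceGDP = ""

* Walk through the sources in order of precedence:
* WDI, then WEO, then Maddison, then country-specific sources
foreach s in wdi weo mad spc {
    replace sourceGDP = "`s'"    if missing(gdp) & !missing(gdp_`s')
    replace gdp       = gdp_`s'  if missing(gdp) & !missing(gdp_`s')
}

list countrycode year gdp sourceGDP, sepby(countrycode)
```

Recording the source next to each value mirrors the sourceGDP and sourcePCE variables kept in the special cases file.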
Preparing for updating of master file
In order to prepare the data before the master file can be updated, do the following,
- Get the data from the latest version of the World Economic Outlook (WEO). Save it in the folder P:/01.PovcalNet/03.QA/04.NationalAccounts/data with the name WEO_YYYY-mm-dd.xls, where YYYY refers to the year, mm to the month, and dd to the day (identifying the vintage of the file).
- Make sure that the Maddison data used in pcn_master_update.ado is the most recent vintage available of the Maddison Database: https://www.rug.nl/ggdc/historicaldevelopment/maddison/data. If it is not, change the corresponding line in pcn_master_update.ado accordingly.
- Check with Andrés, Christoph, or Daniel whether anything needs to be changed for the countries that use country-specific sources. These special cases can be found in the folder P:/01.PovcalNet/03.QA/04.NationalAccounts/data. This folder contains all the versions previously used in PovcalNet with national accounts information on special cases. Each file is called NAS special_YYYY-mm-dd.xlsx. When you update these files, open the most recent version, save it with the current date, and edit it as needed. In the country-specific sheets, you will find the source of the data and clues on how best to update each observation.
  IMPORTANT: You should check whether the country-year is still a special case; e.g. the standard WDI, WEO and Maddison sources may have been extended to cover additional years. Similarly, there could be new special cases (i.e. WDI/WEO/Maddison observations were available in a previous vintage, but no longer are).
  Make sure you have exactly the same variables (countryname coverage countrycode year GDP PCE sourceGDP sourcePCE) as in the previous Excel file for the special cases national accounts. The variable names are case-sensitive.
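Because the variable names must match exactly, a quick check before running the update can save time. The sketch below uses a one-row toy dataset as a stand-in for the imported special cases sheet; in practice you would load the Excel file first (for example with import excel and the firstrow option). The country and values shown are invented.

```stata
clear
* Toy stand-in for the imported sheet; contents are illustrative only
input str20 countryname str1 coverage str3 countrycode int year ///
      double(GDP PCE) str10(sourceGDP sourcePCE)
"Exampleland" "N" "XXX" 2019 100 80 "official" "official"
end

* Variable list required by the special cases file (case-sensitive)
local required countryname coverage countrycode year GDP PCE sourceGDP sourcePCE

* Collect any required name that is absent; -exact- disables
* abbreviation matching, so "gdp" or "GD" would not pass for "GDP"
local absent
foreach v of local required {
    capture confirm variable `v', exact
    if (_rc) local absent `absent' `v'
}

if ("`absent'" == "") display as text "all required variables found"
else display as error "missing variables: `absent'"
```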
Description of WDI and WEO series
NE.CON.PRVT.PC.KD: Household final consumption expenditure per capita (constant 2010 US$).
NY.GDP.PCAP.KD: GDP per capita is gross domestic product divided by midyear population. Data are in constant 2010 U.S. dollars.
NGDPRPC: Gross domestic product per capita, constant prices. GDP is expressed in constant national currency per person. Data are derived by dividing constant price GDP by total population.
NGDPRPPPPCPCH: Gross domestic product per capita, constant prices (purchasing power parity; percent change). Data are derived by dividing constant price purchasing-power parity (PPP) GDP by total population.
6.2.2 Update master file
Once the special cases file has been updated you must update the master file by using the following command lines,
pcn master, update(gdp)
pcn master, update(pce)
The pcn command integrates all the WDI, WEO, Maddison, and Special Cases files
into the correct format of the master file. You can run these lines even if only
one of the sources gets updated. If you want to understand how these two sheets
get updated, you can see the procedure in the file pcn_master_update.ado, in
particular the lines that handle GDP and the lines that handle PCE.
6.3 CPI and PPP
Even though the CPI and PPP come from different sources, in the PovcalNet update
process we use the most recent version provided by datalibweb (DLW). DLW
provides a curated, clean, and organized file with all the CPI and PPP values
for all the countries and years. If you want to load the most recent version
directly from DLW, you can use the following lines of code,
// folder where DLW stores the CPI vintages locally
local cpipath "c:\ado\personal\Datalibweb\data\GMD\SUPPORT\SUPPORT_2005_CPI"
local cpidirs: dir "`cpipath'" dirs "*CPI_*_M"
// collect the vintage numbers found in the folder names
local cpivins "0"
foreach cpidir of local cpidirs {
    if regexm("`cpidir'", "cpi_v([0-9]+)_m") local cpivin = regexs(1)
    local cpivins "`cpivins', `cpivin'"
}
// keep the most recent vintage and load it from DLW
local cpivin = max(`cpivins')
cap datalibweb, country(Support) year(2005) type(GMDRAW) /*
*/ surveyid(Support_2005_CPI_v0`cpivin'_M) /*
*/ filename(Final_CPI_PPP_to_be_used.dta)
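The vintage detection above can be tried without access to the DLW folder. The sketch below runs the same regular expression over a hard-coded list of folder names, which are invented for illustration. It also pads the version number with string() instead of hard-coding the v0 prefix; that padding is a suggestion, not what the pcn code currently does.

```stata
* Invented folder names standing in for the result of the -dir- call
local cpidirs `""support_2005_cpi_v03_m" "support_2005_cpi_v05_m" "support_2005_cpi_v04_m""'

* Keep the largest version number found
local cpivin 0
foreach d of local cpidirs {
    if regexm("`d'", "cpi_v([0-9]+)_m") {
        local v = real(regexs(1))
        if (`v' > `cpivin') local cpivin = `v'
    }
}

* Two-digit padding keeps working once the vintage reaches v10,
* whereas a literal "v0" prefix would produce "v010"
local padded = string(`cpivin', "%02.0f")
display "latest vintage: Support_2005_CPI_v`padded'_M"
```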
6.3.1 Word of Caution
Be aware that as of today (2020-11-17) the datalibweb system is being updated,
and it is likely that the convention used for vintage control of the CPI data
will change in the following month, that is, December 2020. If that is the case,
the code above will be obsolete and will need to be updated in the
pcn_update_cpi.ado file and in pcn_master_update.ado.
6.3.2 Different CPI datasets
The pcn command updates and loads two databases of the CPI and PPP. Thus, you
need to update both datasets when the CPI or PPP data is updated in DLW.
Special CPI cases
The primary source of CPI data for PovcalNet updates is the IMF’s International Financial Statistics (IFS) database (Lakner et al. 2018). When CPI series from surveys are deemed more appropriate, they are used instead of IFS CPI series. Currently, six countries are in this category of ‘special CPI cases’: Bangladesh, Ghana, Lao, Malawi, Tajikistan, and Zimbabwe.
The datafiles and dofiles for the special CPI cases are stored in the path p:/01.PovcalNet/03.QA/09.CPI/Special CPI Cases/.
You should follow the steps below to update the special CPI cases:
1. Get the CPI input file labeled “Imputed_CPI_input.xlsx” from the P-drive. This file contains raw CPI series imputed from survey to survey.
2. Update the CPI input file if necessary. This file is updated when there is a new survey from which a new CPI can be imputed. (A country is not always a special case: a new survey may use the IFS CPI; the imputed CPIs are only used in exceptional cases.) The new CPIs are provided by poverty economists (PEs). Daniel Gerszon Mahler currently liaises with PEs to get the correct CPI and survey year. You may contact him for guidance and ideas for the whole exercise.
3. Get the annual CPI series from Datalibweb. The annual series are collated and organized by Minh Cong Nguyen from monthly (or, in a few cases, quarterly) IFS data and annual World Economic Outlook (WEO) data. Use the code below to obtain the required data.
dlw, country(support) year(2005) type(gmdraw) surveyid(Support_2005_CPI_v05_M) filename(Yearly_CPI_Final.dta)
[Note that this code is for the current version of Datalibweb (i.e., Support_2005_CPI_v05). Check with Minh Cong Nguyen for the latest version and line of code to use.]
4. Generate a final data set labeled “ImputedCPIs_output_yyyymmdd.dta.” This file contains the final CPI series with the ICP reference year as base year, which is currently 2011. See the latest file for the required format.
5. As a robustness check, construct the final CPI series afresh as in sheet(Calculations) in “Compare Imputed CPIs yyyymmdd.xlsx” and compare the Excel results with the Stata output. You may compare the results using the dofile labeled “combine_countries_compare_results.do.”
6. Send the Stata output to Minh Cong Nguyen. He will integrate the file into his price framework.
7. Save all datafiles and dofiles in a sub-folder in the P-drive [e.g., P:\CPI\Special CPI Cases\PovcalNet Update Mar 2021].
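Expressing a CPI series with the ICP reference year as base year amounts to dividing each country's series by its own value in that year. The sketch below illustrates this with invented numbers and hypothetical variable names (cpi_raw, cpi). Whether the final file stores the base year as 1 or as 100 is an assumption here, so check the latest output file for the required format.

```stata
clear
* Invented CPI values for two placeholder countries
input str3 countrycode int year double cpi_raw
"XXX" 2010  95
"XXX" 2011 100
"XXX" 2012 110
"YYY" 2010 180
"YYY" 2011 200
"YYY" 2012 230
end

local refyear 2011   // ICP reference year

* Spread each country's reference-year value across all its rows
bysort countrycode: egen double cpi_ref = max(cond(year == `refyear', cpi_raw, .))

* Rebased series: equals 1 in the reference year
gen double cpi = cpi_raw / cpi_ref

* A country without an observation in the reference year gets a missing cpi
count if missing(cpi)
list countrycode year cpi_raw cpi, sepby(countrycode)
```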
Internal data
This dataset is used only by the PovcalNet team. It can be loaded in Stata with
the directive pcn load cpi, clear and updated with the directive
pcn update cpi. This dataset is intended for general purposes that require
deflation, such as reports, papers, or even internal tests. However, it cannot
be used to update the CPI sheet in the master file. Though the data is the
same, the format and labels are very different between the two. If you need to
update the format of this dataset, you need to modify the file
pcn_update_cpi.ado.
CPI sheet in the Master file
This dataset can be loaded through the directive pcn master, load(cpi). By
default the data is loaded in Stata in wide form, which is the original format
in the master file. You can load it in Stata for testing and checking purposes,
but using this data for any data work is highly discouraged. It comes in a very
unfriendly format that is only useful for the PovcalNet system. pcn, however,
provides the option pcn master, load(cpi) shape(long), which attempts to
reshape the data into long format for ease of use, but it is still not as
friendly as the data loaded from pcn load cpi.
In order to update the CPI and PPP sheets in the master file, you must use the
directives pcn master, update(cpi) and pcn master, update(ppp). Even though the
CPI and PPP come from the same file, they are split into two different sheets
in the master file. Keep in mind that if the CPI vintage control in DLW changes
as mentioned above, you will need to update the corresponding lines in
pcn_master_update.ado.
6.4 Updating the “SurveyInfo” sheet of the Master file
Updating the SurveyInfo sheet in the Master file comprises three general steps:
1. Identifying the recently added country/year combinations (data points) from PRIMUS;
2. merging new rows into the existing SurveyInfo sheet;
3. manually updating columns E to L based on previous years' information, as well as finding out the number of observations in the survey to update column I.
A set of short do-files has been written and saved in the folder
p:\01.PovcalNet\03.QA\10.Master file work\ to expedite this process. To work on
a new update, create a new folder named with the month and year, and copy the
do-files from the previous update.
6.4.1 New surveys from PRIMUS
To identify the recently added data points, open the do-file
01. primus dowload(approved & pending).do and make the following changes:
- Adjust the path at the top.
- Set the date_modified cutoff in Steps I and II, usually to cover the past 3 months. For instance, for March updates, the date is usually set to December 1st of the previous year.
keep if date_modified > clock("2020-12-01", "YMD")
- Check Step IV. Some surveys like EU-SILC collect information on earnings from the “previous year.” Thus, PovcalNet adjusts the survey year to year-1 for published estimates. Make sure the list of modifications on this step is complete.
- Save changes and run do-file.
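The two adjustments described in this list (the date_modified cutoff and the survey-year correction) can be illustrated on toy data. The sketch below is not the content of the do-file: the variable names surveyyear, acronym, and modified are hypothetical, the rows are invented, and EU-SILC is the only correction shown, whereas the real list in Step IV may contain more cases.

```stata
clear
* Invented PRIMUS-like records
input str3 countrycode int surveyyear str10 acronym str19 modified
"AAA" 2019 "EU-SILC" "2021-01-15 10:30:00"
"BBB" 2018 "HBS"     "2020-10-02 09:00:00"
"CCC" 2019 "LFS"     "2020-12-20 16:45:00"
end

* Convert the timestamp string into a Stata datetime (%tc)
gen double date_modified = clock(modified, "YMDhms")
format date_modified %tc

* Keep only records modified after the cutoff (roughly the past 3 months)
keep if date_modified > clock("2020-12-01", "YMD")

* Surveys that report income of the previous year are shifted to year-1
replace surveyyear = surveyyear - 1 if acronym == "EU-SILC"

list countrycode surveyyear acronym date_modified
```

Storing date_modified as a double matters, because datetime values in milliseconds lose precision in float variables.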
6.4.2 Merging new rows into SurveyInfo Sheet
Open the do-file 02. merge primus into SurveyInfo.do, update the path at the
top, and run it. (This should run smoothly if the Excel files from the previous
step remained unchanged.)
6.4.3 Manual updates
The newly added surveys obtained from PRIMUS should be those identified as “using only (2)” in your column _merge (O). The final step thus consists of manually updating the information in columns E to M based on the information you have from previous years for each particular country.
For most countries with a continuous series, the survey acronym has not changed from the previous year, so you can just copy and paste fields like the survey name and conductor. When the acronym differs from the previous one, some research will be required (Google, Microdata Library, IHSN, etc.) to obtain the official survey name and conducting institution.
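The logic of the merge and of the copy-from-previous-year step can be sketched on toy data. Here the existing SurveyInfo rows are the master data and the PRIMUS list is the using data, so new surveys show up with _merge == 2. The variable survey_acronym and all rows are invented, and the pre-fill shown is only a convenience that still needs to be reviewed by hand.

```stata
clear
* Invented list of data points coming from PRIMUS (using data)
input str3 countrycode int year
"AAA" 2017
"AAA" 2018
"BBB" 2018
end
tempfile primus
save `primus'

clear
* Invented existing SurveyInfo rows (master data)
input str3 countrycode int year str10 survey_acronym
"AAA" 2017 "HBS"
end

merge 1:1 countrycode year using `primus'

* New surveys are the rows found only in the PRIMUS list
count if _merge == 2

* Pre-fill from the previous survey of the same country, when there is one
bysort countrycode (year): replace survey_acronym = survey_acronym[_n-1] ///
    if _merge == 2 & missing(survey_acronym) & _n > 1

* Rows still empty have no earlier survey and need manual research
list countrycode year if _merge == 2 & missing(survey_acronym)
```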
Finally, the SampleSize (column L) must be updated on a case-by-case basis.
You can load the new databases with the command pcn and check observations. The
do-file _count obs in vintage ctrl bases.do can help you quickly count
individual and household observations for a chunk of years in a particular
country. (Note: the module “mod” in line 13 might need adjustment depending on
the case.)
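As an illustration of what such a count involves, the sketch below counts individual and household observations in a toy person-level dataset. The identifiers hhid and pid are hypothetical names, and the source does not say whether SampleSize refers to individuals or households, so confirm this against previous entries in the sheet.

```stata
clear
* Invented person-level records: five individuals in three households
input str3 countrycode int year long hhid byte pid
"AAA" 2018 1 1
"AAA" 2018 1 2
"AAA" 2018 2 1
"AAA" 2018 3 1
"AAA" 2018 3 2
end

* Individual observations: one row per person
count
local n_ind = r(N)

* Household observations: tag one row per country-year-household
egen byte hhtag = tag(countrycode year hhid)
count if hhtag
local n_hh = r(N)

display "individuals: `n_ind'   households: `n_hh'"
```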
6.4.4 Save & Inform
When you have finished all updates in the document, compare the number of rows in your new Excel with the ones in the Master SurveyInfo sheet.
Update the Master:
1. Open ONLY the SurveyInfo sheet from the latest vintage.
2. Copy and paste from your file into the Master, omitting columns N & O (overall_status and _merge). Make sure columns A to M are in the same order.
3. Go to the Management tab -> click on Save in Master -> add a short note describing your updates -> click on “OK.”
Inform the team managers that the Master file has been updated, indicating the number of new surveys (rows) added.