B Kihoon’s Handover

The documentation below has been directly copied from Kihoon’s files.

B.1 GPWG process

Introduction

  • Prior to 2014, PovcalNet team obtained the survey data from regional focal points with the assistance of Minh.
  • Global Poverty Working Group (GPWG) was formed to compile and harmonize the surveys from regional teams.
  • In May 2014, the GPWG database started with ECA (257 surveys) and LAC (242 surveys) regional surveys.
  • In September 2014, EAP (48 surveys) and MNA (16 surveys) were added.
  • In May 2015, SAR (21 surveys) and SSA (107 surveys) regional surveys were added, so that all the 6 regional surveys were compiled for GPWG database.
  • As of 2019, about 1000 survey data from GPWG data are used for the PovcalNet poverty statistics update, which is more than 60% of all data used in the total PovcalNet database.
  • Other sources include those from LIS, EU-SILC high income countries, historical surveys and old grouped data.
  • Since the regional colleagues add or revise the survey data irregularly, it is important to check the database periodically to synchronize the GPWG database and PovcalNet databank.
  • Previously, PovcalNet team updated only surveys that were reported to have been changed or added. But there have been many missing updates and mistakes. Since computer technology became much faster and memory spaces became much larger, I have downloaded all the surveys each time and compared them with previous downloads. In that manner, I was able to log the changes of the database and confirmed that the changes were planned.
  • Many times, the database have been harmonized with mistakes such as year, terms of consumption (annual or monthly). I have reported the problems to the PovcalNet team and Minh so that the mistakes be corrected.
  • From 2017, GPWG decided to exclude negative households from the sample. It was reflected when the data were downloaded.

How to download surveys from the GPWG data and generate pcb files

  1. Ask Minh to send us the list with the most recent data files in the PRIMUS/datalibweb system.

    1.1. Data list received in May 2016 was in “p:/Kihoon/Handover/GPWG/Datalib_all_DTAs_20160502-101732.xlsx”

    1.2. Data list received in April 2017 was in “p:/Kihoon/Handover/GPWG/GPWG_latest.dta”

  2. Using the list, download the all the datasets using datalibweb. Datalibweb is an upgraded program to download the GPWG surveys from July 2018.

    2.1. Download using do-file: (p:/Kihoon/Handover/GPWG/Program/GPWG_Get_Data.2019.02.06.do)

    2.2. The download is saved in C hard drive and then copied to P drive: p:/Kihoon/Handover/Data/GPWG/AGO2000_dlw.dta

    2.3. Each time data is downloaded it is placed in a particular folder according to the date of download.

    2.4. Each GPWG file is renamed to CCCYYYYZ_dlw.dta, where CCC refers to the country and YYYY to the year, Z refers to income or consumption if there are multiple welfare sources. This happens in ECA regional countries and Mexico.

  3. Using do-file (p:/Kihoon/Handover/GPWG/Program/DLW_All_Summary.do), calculate summary statistics.

    3.1. One log file per update.

    3.2. results are copied and pasted manually in Excel files (For example, p:/Kihoon/Handover/GPWG/GPWG_All_Summary_2015.06.17.xlsx).

    3.3. Generate the differences between the current and the previous updates and report them to Shaohua/Prem.)For example, p:/Kihoon/Handover/GPWG/GPWG_All_Summary_2017.04.11.xlsx)

  4. Using do-file (p:/Kihoon/Handover/GPWG/Program/GPWG_All_Text.do) generate text files and place them in P drive: P:/2019/Textfiles/GPWG

  5. Prem generates .pcb files using “p:/UnitRecordConvertor/unitdataConvertor.exe” from there.

Note: From 2018, PRIMUS system started to synchronize the two database and to be used to generate PovcalNet statistics directly, currently the GPWG process is not necessary any more.

B.2 LIS Data Process

Introduction

  • LIS database is harmonized microdata collected from 50 countries in Europe, North America, Latin America, Africa, Asia, and Oceania. Out of the microdata from the 50 countries, the surveys of 7 high income countries were not currently available in the World Bank microdata database.
  • The welfare variable that PovcalNet elected from the 7 high income countries (OHI) is per capita disposable income excluding negative income households.
  • Under the contract for privacy and confidentiality, we do not offer the users original microdata. We generate distributional data from the raw microdata and provide them to the users for poverty simulation. The differences between using raw microdata and using distributional data are negligible.

How to download surveys from the LIS data server and generate pcb file

  1. Register to the LIS datacenter for LIS data access. Renew the registration annually.

    1.1. Password is given by the datacenter.

    1.2. My id and password are: “kihlee” …

  2. Send the STATA do program to

    2.1 First 4 lines of the program should be

    • user=kihlee
    • password =
    • project = lis
    • package = stata

    2.2. When the World Bank email system was Lotus Notes, there was no problem to send the program with no font changes. But after the email system has been changed to Microsoft Outlook, you have to change the font to “text.” If not, the LIS server cannot read the program.

    2.3. In the menu, go to “Format Text” and select “Aa Plain Text.”

    2.4. I have copied some programs that I have sent to “p:/Kihoon/Handover/OECD and LIS/Programs/.”

    2.5. Generate 400 bins: “p:/Kihoon/Handover/OECD and LIS/Programs/Generating 400 bins 2019.txt”

    2.6. Education data: “p:/Kihoon/Handover/OECD and LIS/Programs/Education in LIS data.txt”

    2.7. Gini coefficients: “p:/Kihoon/Handover/OECD and LIS/Programs/Generating gini coefficient.txt”

  3. The result will arrive in a few minutes depending on the time of processing and the environment of the LIS datacenter server.

  4. There are 8 countries we use in the PovcalNet database. They are not available from other sources than the LIS data. We collect the data for Taiwan but not publish. Australia, Canada, Germany, Israel, Japan, , Republic of Korea, Taiwan,9 U.S.A.

  5. As of now 89 LIS data points are available. Compared with last update in August 2018, I found that 15 surveys were either added or revised. Therefore, this time I have generated the text file of the surveys and sent them to Minh.

    5.1. Australia 2004, 2014

    5.2. Germany 1995, 1998, 2002, 2003, 2005, 2008, 2009, 2012, 2015

    5.3. Israel 2010, 2012, 2014, 2016

  6. For some EU countries, EU-SILC data are only available after 2003. The LIS data center has the data for previous years. We use the historical data of the countries when the data are compatible. (Austria, Belgium, Czech Republic, Denmark, Finland, France, Greece, Ireland, Italy, Luxembourg, Netherlands, Norway, Poland, Slovakia, Slovenia, Spain, Sweden, Switzerland, UK)

  7. Copy and paste the result to empty Excel file.

    7.1. The space in the output file of the result is not the same space in the Microsoft Office.

    7.2. You may need to either

    • copy and paste to text editor first and copy and paste again to Excel.
    • Or copy and paste to Excel and when you use “text to columns” use the output file space as a delimiter.
    • A sample result is in “p:/Kihoon/Handover/OECD and LIS/LIS 8 Countries 400 bins and nobs.xlsx.”
    • For each survey-year data, copy 400 bins and name it as CCCYYYY.TXT and copy them into “p:/2019/Textfiles/LIS” in case of 2019 update.
  8. Using the LIS data we generate 400 bin data for each country and provide them to Minh and he uploads them to the PRIMUS.

    8.1. Some LIS data files include negative income households. Before 2018, we included them but from last year we changed our policy and excluded them from the sample. Zero income households are still included.

    8.2. Previously, we used xtile command to generate the 400 bin data file but we use _ebin command written by poverty GP. When there are multiple households with the same welfare indicators, _ebin separate them into bins in the way more evenly than xtile.

    8.3. The income reference year is the calendar year for which income data has been collected, which is different from EU-SILC data where the income data is a year before the survey.

    8.4. Per Capita Disposable income is selected as a welfare indicator.

DHI: disposable household income, which includes total monetary and nonmonetary current income net of income taxes and social security contributions (HI-HXIT)

  1. It is important to cross check that the statistics from the original files and those from the 400 bin data do not have a significant difference. Because of the rounding errors, they are not 100% identical but the difference should be very small.

B.3 LAC Historical data

Kihoon collected data from 1997 to 2000 when he was working with the MECOVI team in LAC PREM. Since 2000, he got the data from Marco Robles in the IDB. These data are placed on Kihoon/Handover/Data/LAC/DX/. According to Kihoon these data do not require permissions of use but it is necessary to coordinate with the LAC team if they agree on make it part of their collection.

B.4 ECA Historical data

Minh provided EUSILC data. However, the PovcalNet is not allowed to use it directly, so we have to use bin data. From 2003 to now we use the 400-bin data calcualted by Minh. From 2003 backwrds we use 400-bin data calculated using LIS server. The data are placed in this folder Kihoon/Handover/Data/ECA/EUSILC/. In that folder there are regional files (e.g., ECA_2005_EU-SILC-C_v01_M_v05_A_UDB_X.dta) where X might be D, H, P, R (Kihoon is going to find out what D and R mean). He split the regional files into Non-OECD and OECD countries.