2 Folder Structures

This short note provides everything you need to understand the folder structure of the PovcalNet workflow, which has the following objectives in mind:

  1. Store vintages of data for replicability purposes.
  2. Share documents (i.e., editable files) among team members through a stable, friendly platform that is suitable for vintage control.
  3. Execute, load, and save data uninterruptedly and, if possible, in a fast manner.

Unfortunately, the World Bank systems do not provide an ITS service that allows us to meet all the objectives, so we are forced to use different platforms: a network drive for the first objective, OneDrive for the second, and a remote-execution server (a ‘supercomputer’) for the third.

2.1 Network Drive

The Network drive is mainly used to archive data. Currently, it is known as the P drive, but it could be disconnected and remapped by following the steps below.

This drive has many folders in its root, but under the new folder structure only three take precedence: 01.PovcalNet, 02.personal, and 03.ProjectX. The last one is not explained in this document.

  • 01.PovcalNet has four main subfolders and follows the structure below:
#>                             levelName
#> 1  P:                                
#> 2   °--01.PovcalNet                  
#> 3       ¦--00.Master                 
#> 4       ¦   ¦--_aux                  
#> 5       ¦   ¦--_vintage_control.xlsx 
#> 6       ¦   ¦--01.current            
#> 7       ¦   ¦--02.vintage            
#> 8       ¦   ¦--03.metadata           
#> 9       ¦   °--Master.xlsm           
#> 10      ¦--01.Vintage_control        
#> 11      ¦   ¦--_aux                  
#> 12      ¦   ¦   ¦--countries         
#> 13      ¦   ¦   ¦--cpi               
#> 14      ¦   ¦   ¦--info              
#> 15      ¦   ¦   ¦--pcn_create        
#> 16      ¦   ¦   °--price_framework   
#> 17      ¦   °--AGO                   
#> 18      ¦       ¦--AGO_2000_HBS      
#> 19      ¦       ¦--AGO_2008_IBEP-MICS
#> 20      ¦       °--AGO_2018_IDREA    
#> 21      ¦--02.Production             
#> 22      ¦   ¦--_aux                  
#> 23      ¦   ¦--2014_OCT              
#> 24      ¦   ¦--2015_OCT              
#> 25      ¦   ¦--2016_OCT              
#> 26      ¦   ¦--2017_OCT              
#> 27      ¦   ¦--2018_APR              
#> 28      ¦   ¦--2018_SEP              
#> 29      ¦   ¦--2019_MAR              
#> 30      ¦   ¦--2020_JUL              
#> 31      ¦   ¦--2020_SM               
#> 32      ¦   °--2021_MAR              
#> 33      ¦--03.QA                     
#> 34      ¦   ¦--01.GroupData          
#> 35      ¦   ¦--02.PRIMUS             
#> 36      ¦   ¦--03.Population         
#> 37      ¦   ¦--04.NationalAccounts   
#> 38      ¦   ¦--05.PCN_estimates      
#> 39      ¦   ¦--06.LIS                
#> 40      ¦   ¦--07.historical         
#> 41      ¦   ¦--08.DLW                
#> 42      ¦   ¦--09.CPI                
#> 43      ¦   °--10.Master file work   
#> 44      °--04.admin
  • /00.Master contains everything related to the master.xlsx file that is uploaded into the PovcalNet system. This file has its own rules for proper use and requires a separate explanation.

  • /01.Vintage_control contains the historical data of PovcalNet. Eventually, this would be the folder accessible through datalibweb.

  • /02.Production is the folder with the current version of the data in the PovcalNet system.

  • /03.QA is a working folder that is constantly being modified. Each subfolder here contains the necessary material to work on particular stages of the PovcalNet process.

  • 02.personal contains one folder for each member of the team. The name of the folder is the UPI of the user preceded by the letters wb.

#>              levelName
#> 1  P:                 
#> 2   °--02.personal    
#> 3       ¦--_handover  
#> 4       ¦   ¦--Espen  
#> 5       ¦   ¦--Prem   
#> 6       ¦   °--Rebecca
#> 7       ¦--wb020687   
#> 8       ¦--wb108988   
#> 9       ¦--wb372541   
#> 10      ¦--wb384996   
#> 11      ¦--wb424681   
#> 12      ¦--wb463998   
#> 13      ¦--wb499754   
#> 14      ¦--wb514665   
#> 15      ¦--wb537472   
#> 16      ¦--wb548542   
#> 17      °--wb562318
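
The naming rule for personal folders can be sketched in a few lines. The snippet below is only an illustration in Python (a language choice made for this note, not a team standard); the function name `personal_folder` and the default root are hypothetical, and it assumes that a UPI consists only of digits, as in the listing above.

```python
from pathlib import PureWindowsPath


def personal_folder(upi: str, root: str = "P:/02.personal") -> PureWindowsPath:
    """Return the personal folder for a UPI given as a string.

    The UPI is passed as text so that leading zeros (e.g. 020687) are kept.
    A leading 'wb' is accepted and not duplicated.
    """
    digits = upi.strip()
    if digits.lower().startswith("wb"):
        digits = digits[2:]
    if not digits.isdigit():
        # Assumption: UPIs are purely numeric, as in the examples above.
        raise ValueError(f"UPI must be numeric, got {upi!r}")
    return PureWindowsPath(root) / ("wb" + digits)


if __name__ == "__main__":
    print(personal_folder("384996"))
```

Using `PureWindowsPath` keeps the sketch independent of the operating system on which it runs, since it only builds the path and never touches the drive.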

2.1.1 Steps to map drives

In these steps, we disconnect any mapped drive that has previously been assigned to the letters P or E. Then, we assign the network drive to letter P and the high-speed drive in the server to letter E.

  1. Open Notepad.
  2. Copy and paste the following lines:
net use /del P: /Y
net use /del E: /Y

net use P: \\wbntpcifs\povcalnet /PERSISTENT:YES
net use E: \\wbgmsddg001\PovcalNet /PERSISTENT:YES
  3. Save it on your desktop as link_drives.bat
  4. Close Notepad.
  5. Double click the file link_drives.bat on your desktop.
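
For those who prefer not to copy and paste by hand, the same batch file could be generated with a short script. This is a minimal sketch in Python, assuming the drive letters and shares listed in step 2; the names `SHARES`, `batch_lines`, and `write_batch` are hypothetical and not part of any team tool.

```python
from pathlib import Path

# Drive letter -> network share, copied from the commands in step 2.
SHARES = {
    "P": r"\\wbntpcifs\povcalnet",
    "E": r"\\wbgmsddg001\PovcalNet",
}


def batch_lines(shares):
    """Build the lines of link_drives.bat: disconnect first, then map."""
    lines = [f"net use /del {letter}: /Y" for letter in shares]
    lines.append("")
    lines += [
        f"net use {letter}: {unc} /PERSISTENT:YES"
        for letter, unc in shares.items()
    ]
    return lines


def write_batch(target: Path, shares=SHARES) -> Path:
    """Write the batch file with Windows line endings and return its path."""
    text = "\r\n".join(batch_lines(shares)) + "\r\n"
    target.write_bytes(text.encode("ascii"))
    return target


if __name__ == "__main__":
    # Assumption: the desktop lives under the user's home directory.
    print(write_batch(Path.home() / "Desktop" / "link_drives.bat"))
```

The script only writes the file; you still need to double click link_drives.bat to run it.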

2.2 Remote server connection

The server is a ‘supercomputer’ with 8 processors and 64GB of RAM. All the

2.2.1 Steps to connect to the server

In order to get access to the remote server, please do the following:

  1. Click on Start, type ‘remote’, and click on ‘Remote Desktop Connection’.
  2. Type WBGMSDDG001 in the field ‘Computer:’ and click on ‘Show Options’.
  3. Make sure the box ‘Always ask for credentials’ is unchecked.
  4. Click the tab ‘Local Resources’ and make sure the boxes ‘Printers’ and ‘Clipboard’ are checked. Then click ‘More…’.
  5. Make sure that only the boxes ‘(C:)OSDisk’ and ‘Drives that I plug in later’ are checked, and then click OK.
  6. Click ‘Connect’.
  7. Enter your username (i.e., wbXXXXXX, where XXXXXX is your UPI number) and Windows passphrase.
  8. Once you’re in the server, enter your username and Windows passphrase again.

You need to execute steps 2 to 7 only once. The next time you log in to the server, you only need to execute steps 1 and 8.

2.2.2 Map Network drive in the server

Once you’re in the server, open Windows Explorer and go to the path E:\PovcalNet\02.core_team\_aux\, where you will see a file called link_P_Drive.bat. Double click that file in order to map the PovcalNet network drive in the server. This procedure has to be done only once. After that, the network drive will always be mapped in the server for you.

2.2.3 Folder structure in the server

If you have already mapped the P drive in the server, you will see that you have access to three drives: the C:/ drive, which is the main drive of the server; the P:/ drive, which is our team network drive; and the E:/ drive.

You are not supposed to use the C:/ drive for anything. Some programs like R or Python save their packages or libraries in the C:/ drive, which is fine. But the C:/ drive is not for storing data, MS files, or anything personal. If you need to save something on the C:/ drive, please let the leads of the PovcalNet team know.

The P:/ is fully accessible through the server, so any code pointing to the P:/ drive in your computer will work effortlessly in the server.

The E:/ drive is a high-speed drive for storing big data or large amounts of files that need to be processed fast. It is NOT a storage drive, since its capacity is limited. All your data should be stored in the P:/ drive and copied to the E:/ drive temporarily for fast execution. Once you’re done with your analysis, you can copy any results back to the P:/ drive and empty the E:/ drive to allow others to use it. Ideally, we would increase the size of the E:/ drive, but that is not a possibility now.
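
The copy, run, copy back, and clean-up cycle described above can be sketched as a small helper. This is an illustrative Python sketch, not an official team tool: the function name `staged`, its parameters, and the `results` subfolder are assumptions made for the example.

```python
import shutil
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def staged(source: Path, scratch_root: Path, results_to: Path,
           results_dir: str = "results"):
    """Copy `source` to the fast drive, hand over the copy, then tidy up.

    source       folder on the archive drive (e.g. somewhere under P:/)
    scratch_root folder on the fast drive (e.g. your folder under E:/)
    results_to   archive folder that receives the results afterwards
    results_dir  hypothetical subfolder of the working copy holding results
    """
    work = scratch_root / source.name
    shutil.copytree(source, work)
    try:
        yield work
        # Reached only if the analysis finished without an error.
        out = work / results_dir
        if out.exists():
            shutil.copytree(out, results_to, dirs_exist_ok=True)
    finally:
        # Always free the space on the fast drive for other users.
        shutil.rmtree(work, ignore_errors=True)
```

A call would look like `with staged(archive_folder, my_scratch_folder, archive_results) as work:` followed by the analysis, which reads from `work` and writes into `work / "results"`. Because the clean-up sits in a `finally` clause, the working copy is removed even when the analysis fails, in which case no results are copied back.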

#>                     levelName
#> 1  E:                        
#> 2   ¦--01.personal           
#> 3   ¦   ¦--02.core_team      
#> 4   ¦   ¦--wb020687          
#> 5   ¦   ¦--wb108988          
#> 6   ¦   ¦--wb372541          
#> 7   ¦   ¦--wb384996          
#> 8   ¦   ¦--wb424681          
#> 9   ¦   ¦--wb499754          
#> 10  ¦   ¦--wb537472          
#> 11  ¦   ¦--wb548542          
#> 12  ¦   ¦--wb561460          
#> 13  ¦   ¦--wb562318          
#> 14  ¦   °--wb562350          
#> 15  ¦--03.pcn_update         
#> 16  ¦   ¦--00.Master         
#> 17  ¦   ¦--01.QA             
#> 18  ¦   ¦--02.Vintage_control
#> 19  ¦   °--03.Production     
#> 20  ¦--04.projects           
#> 21  ¦--05.CPI                
#> 22  ¦--06.National_accounts  
#> 23  °--07.ProjectX

In the subfolder E:/01.personal you may place all the data files you need for high-speed performance. For some particular projects, you will be asked to place the folders in the folder E:/02.core_team.

2.3 OneDrive

OneDrive has a double storage functionality. On the one hand, each person has been granted 5TB of storage in a personal folder that is accessible through either the web browser or Windows Explorer.

On the other hand, OneDrive offers shared libraries for collaboration among the members of a private team. The PovcalNet team has currently been assigned the library called PovcalNet Data, Systems and Management, which is accessible through either the web browser or the Microsoft Teams app. In general, the shared library in OneDrive works in the same way as the personal OneDrive folder, with the exception that MS Teams automatically creates a folder in the root of the library for each new channel that is added to the team. Apart from this inconvenience, the suggested folder structure is as follows:

#>                                       levelName
#> 1  wbntpcifs                                   
#> 2   °--PovcalNet                               
#> 3       °--TestFolder                          
#> 4           ¦--01.admin                        
#> 5           ¦   ¦--01.Recruitment              
#> 6           ¦   ¦--02.Funding                  
#> 7           ¦   °--03.Concept_note             
#> 8           ¦--02.core_team                    
#> 9           ¦   ¦--01.code                     
#> 10          ¦   ¦   ¦--01.packages             
#> 11          ¦   ¦   ¦   ¦--01.Stata            
#> 12          ¦   ¦   ¦   ¦--02.R                
#> 13          ¦   ¦   ¦   ¦--03.Python           
#> 14          ¦   ¦   ¦   °--04.VB               
#> 15          ¦   ¦   °--02.routines             
#> 16          ¦   ¦--02.dashboard                
#> 17          ¦   ¦--03.PPT                      
#> 18          ¦   °--04.Minutes                  
#> 19          °--03.projects                     
#> 20              ¦--01.Metadata                 
#> 21              ¦   ¦--01.CPI                  
#> 22              ¦   ¦--02.PPP                  
#> 23              ¦   ¦--03.National_accounts    
#> 24              ¦   °--04.Population           
#> 25              ¦--02.Nowcasting_error         
#> 26              ¦   ¦--_aux                    
#> 27              ¦   ¦--01.Data                 
#> 28              ¦   ¦--02.Code                 
#> 29              ¦   ¦--03.Results              
#> 30              ¦   ¦--04.Writeup              
#> 31              ¦   °--05.Literature           
#> 32              ¦--03.The_Real_Value_of_Poverty
#> 33              ¦   ¦--_aux                    
#> 34              ¦   ¦--01.Data                 
#> 35              ¦   ¦--02.Code                 
#> 36              ¦   ¦--03.Results              
#> 37              ¦   ¦--04.Writeup              
#> 38              ¦   °--05.Literature           
#> 39              ¦--04.Whats_New_notes          
#> 40              ¦   ¦--_aux                    
#> 41              ¦   ¦--01.Data                 
#> 42              ¦   ¦--02.Code                 
#> 43              ¦   ¦--03.Results              
#> 44              ¦   ¦--04.Writeup              
#> 45              ¦   °--05.Literature           
#> 46              °--05.Project_X                
#> 47                  ¦--_aux                    
#> 48                  ¦--01.Data                 
#> 49                  ¦--02.Code                 
#> 50                  ¦--03.Results              
#> 51                  ¦--04.Writeup              
#> 52                  °--05.Literature

In general, the folder structure is divided by topic according to functionality: 01.admin, 02.core_team, and 03.projects. Notice that up to the third level in the folder structure, all folders are prefixed with two-digit numbers. This system guarantees that folders are sorted in the order in which they are added, and it is useful for navigation when using the keyboard. Also notice that there are no blank spaces in folder names; instead, underscores (_) are used when the name of the folder has two or more words. This is to avoid problems with some systems.
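
The naming convention can be expressed as a small check. The following is a hedged sketch in Python; the names `folder_name` and `follows_convention` are hypothetical, and the pattern only covers the numbered folders of the first three levels (unnumbered helpers such as _aux are deliberately left out).

```python
import re

# Two digits, a dot, then a name with no blank spaces.
NUMBERED = re.compile(r"^\d{2}\.\S+$")


def folder_name(index: int, label: str) -> str:
    """Build a numbered name, joining the words of the label with underscores."""
    if not 0 <= index <= 99:
        raise ValueError("index must fit in two digits")
    return f"{index:02d}." + "_".join(label.split())


def follows_convention(name: str) -> bool:
    """True if the name has a two-digit prefix and contains no blank spaces."""
    return bool(NUMBERED.match(name))


if __name__ == "__main__":
    print(folder_name(3, "National accounts"))
    print(follows_convention("02.core_team"))
```

Zero-padding the prefix is what keeps plain alphabetical sorting in line with the intended order once a folder reaches 10 or more entries.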

  • 01.admin This folder contains everything related to administrative information. Each subfolder corresponds to a big subtopic such as recruitment, funding, or concept notes.

  • 02.core_team This folder contains information that is common and useful to all the members of the team and intersects two or more functions (or projects). For instance, the Stata and R packages to query the PovcalNet API might be used in many different projects and do not belong to any project besides the production of the packages itself. Thus, four main categories of common information have been added: code, dashboard, PPT, and minutes.

  • 03.projects This folder contains all the analytic projects in which the PovcalNet team participates. By default, each project contains six subfolders, /01.Data, /02.Code, /03.Results, /04.Writeup, /05.Literature, and /_aux, but this structure could be modified by following any of these examples: example1, example2, or my favorite.
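
The default project skeleton listed above could be created with a few lines of code rather than by hand. This is a minimal sketch in Python under stated assumptions: the function name `create_project` is hypothetical, and the subfolder names are taken from the tree shown earlier in this section.

```python
from pathlib import Path

# Default subfolders of a project, as shown in the folder tree above.
SUBFOLDERS = ["_aux", "01.Data", "02.Code", "03.Results",
              "04.Writeup", "05.Literature"]


def create_project(projects_root: Path, name: str) -> Path:
    """Create the default subfolders for a project and return its path.

    Existing folders are left untouched, so the call is safe to repeat.
    """
    project = projects_root / name
    for sub in SUBFOLDERS:
        (project / sub).mkdir(parents=True, exist_ok=True)
    return project
```

For example, `create_project(Path("03.projects"), "06.Example_project")` would create the six subfolders under a project folder with that (made-up) name.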

2.4 Additional topics to discuss

2.4.1 R or Stata or both?

2.4.2 GitHub
