Displaying data outside RapidPro - The Pakistan Experience
Nov. 10, 2015
These guidelines on how to use data captured by RapidPro have been shared by Elliot McBride from UNICEF Pakistan CO.
1. Using RapidPro data to create a scatter plot
In a new sheet with some rearranging you can isolate the headings: registration number, First seen, and District check – value. (I’ve shortened the data to make it more consumable). As you can see we have delivered the WASHkits to 2 different Districts: Rajanpur and D.G. Khan.
The first thing you need to do is isolate the time and date separately which you can do with the =INT function. This takes the date part (or integer) away from the time. First create two new columns with the headings Date and Time. In the first cell under the Date column type =INT(B2) thereby selecting the score under the ‘First Seen’ column and choosing to take the (date) integer part only. Hit enter and drag the formula to the rest of the column. You’ll notice that the time 00:00:00 is still there – changing the number format to ‘Date’ under Home tab then the number dropdown box will eliminate these zeros.
To isolate the time – it’s even easier. Int the first cell under ‘time’ minus Column C from Column B and format the number to ‘time’ and you are done. =B2-C2 --> Enter
Now we need to separate each registration into our two districts Rajanpur and D.G. Khan. To do this you will need a new sheet (I’m sure there are other ways but this helps me keep track of everything.)
You will need 4 columns for the scatter graph – Date Rajanpur, Time Rajanpur, Date, D.G. Khan, Time D.G. Khan. Using the Filter function isolate each district’s time and date and copy and paste the values only using ‘paste special’ dropdown.
Once you have the four columns and all of the relevant data in the correct headings you just need to create the scatterplot and re-select the data. I’ve never had this work for me first time because in my experience MS office products often try to be smarter than you. They are not. Make sure your MS product knows this. I like to remind mine daily.
Select all four Columns hit create scatterplot.
It will create a chart that looks like something closer to an abstract painting – horrible. But do not despair, we can fix this!!
Right click on the freak of nature, horrible chart and hit ‘select data’
REMOVE the entries under the ‘Legend entries (series)’
heading and then click Add.
The first box is your heading. I like to use the District time cell. Then make the Series X values the scores under the Rajanpur Date column, make the Series Y values the scores under Rajanpur time column, Click OK. You should now have one title under the Legend entry series which will be your first district time cell. If you created the graph now it would correctly display the scatter graph of this district. To add another district’s scores, click Add under the above image and repeat the process with the next districts scores.
To make the graph clearer format the horizontal axis to display Major Units of 1.0 and minor Units of 0.5 this will evenly space the axis into single days.
Some playing with formatting and you are ready to rumble with your scatter graph. Awesome.
2. Gender dependent results using the RapidPro data
In this example we want to graph the favorite and least favorite items in a distributed WASHkit using Gender as the independent variable. If your registration flow is separate to your project flow in RapidPro. You will be working with 2 separate excel workbooks, this creates a problem because then you have to consolidate gender information without mixing up each individuals registration details, i.e. you cannot simply copy and paste the gender data across as you cannot ensure that the gender will align next to the correct phone number on both sheets. Luckily because both workbooks have a common identifying number (Phone number) Excel can use this to assign the participants registration details to the program data sheet next to the correct phone number.
To start, you want to move your registration information (gender and participant number) to your program data worksheet so all of your data is on the same excel workbook. Simply highlight both rows, and under the home tab in the ‘cells’ category, click ‘Format’ then ‘Move or Copy Sheet’
Send your registration cells (number and gender) to your program data spreadsheet. Remember to select ‘Create a Copy’ so that you do not delete the selected cells from the registration workbook.
Now you have your registration data on the same workbook as your program data. Next use the =VLOOKUP function to move the participant’s gender next to their registration number on the same registration number on the program data sheet. This has a few steps.
First you must identify the source of information you are going to be merging – this is your registration data, which should be the first two columns of your registration tab ‘Phone’ and ‘Gender (Category)’ that you just moved from your registration workbook. Highlight all of the data under this heading and call this section List1 (see below)
Now go back to your program data tab, to the first cell under Gender and type “=Vlookup”. There are 4 categories that must be satisfied for the Vlookup function:
Lookup_value – the unique identifier, in our case we want this to be the registration number common to each participant in both the registration data and program data tabs. Click on the Phone number in the cell directly on the left of your Vlookup entry (see below)
Table_array – the list we just created with phone number and gender labeled ‘List1’
Col_index_num – the column number that we are moving over to the new column – in this case it is the gender column ‘2’.
[range_lookup] – this is a check to see if we want a near match or a perfect match, we want a perfect match so we need to select ‘FALSE’.
Once we have entered these details it should look like this
Hit enter then drag that formula down to the rest of the empty column.
To eliminate the #N/A error message go to the first cell
under Gender and in the formula bar use the =IFERROR function to replace the
#N/A with a blank cell by putting the entire VLOOKUP formula in brackets as the
‘value’ and “ “ as the ‘value_if_error’:
(spaces are important here) then drag down to eliminate error messages.
Once that is done it is a case of highlighting your rows of program data along with your Gender data (as they will now be aligned) and copy past them into a new tab I labeled this MALEvFEMALE.
From here use the filter to highlight Males or Females only, then use the =COUNTIF function to count the number of items. For this equation we need to satisfy 2 categories:
1. Range – is the range of scores that we are searching for our scores in. In this case it is within the ‘Most Useful (Category)’ column you can select all of this column to make it easier.
2. Criteria - is the individual scores that we are searching for, in our case the beneficiaries were asked about the most useful items in their kit. The Criteria are the item names. Put these into a graph that you will then make the bar chart from. (see below).
Enter, drag that formula down and then you can bar graph it. Copy this data so it doesn’t get lost when you adjust the filter settings and paste the values, using ‘paste special’. Repeat for Males and you’re DONE.
3. Time Spent Reporting
The first thing you need to do to develop this graph is highlight the time it takes each individual. This can be done by taking the time from the ‘first seen’ column away from the time under the ‘last seen’ column. To ensure that you measure both the DAYS taken as well as hours and minutes you first need to take the date first seen away from the date last seen.
To do this create a new column after the “Date Last Seen” and label it ‘Days’. In your first cell enter =INT (last seen) – INT (first seen) change this date format to a number and you have the number of days it took to complete the poll. Any cell with a number greater than zero took that many days plus the time in hours and minutes which we will isolate next.
To isolate the hours and minutes it took to complete the poll you need two columns of hours and minutes (one for first seen, one for last seen). Create two new columns and call them ‘First Seen Time’ and ‘Last Seen Time’. To isolate the time without the date in these cells, click under your first heading and type =D2-INT(D2) then hit enter. This tells excel to give you the contents of the cell without the integer (or date):
Drag that formula down and then repeat for the ‘Last seen - time’ column. To identify the hours and minutes each UReporter spent, create a new column and give it the title ‘Time spent reporting’. Then in the first cell under this heading, take the ‘first seen’ time away from the ‘last seen’ time and drag to the entire column.
However as we need to identify times longer than one day we will need to clarify this in the formula, using the column of days that we created earlier. We can use an IF function to label them ‘<24hrs’ if they took longer than one day.
Here the formula dictates that if there is a number in the Days column greater than 0 then the Time Spent column will display >24hrs instead of the hours/minutes time from the two time cells.
Drag that formula down. Notice how entries with a number greater than 0 in the days column have been labeled as >24hours.
Now create the histogram. In a new tab labeled Histogram copy the Time spent reporting column and the results to your last question you asked in RapidPro (if this is in another workbook see notes on moving data accurately between workbooks on the scatter graph guide in the blog). In this case our last question column was called Response 10, therefore if this column had a value entered in it the beneficiary had completed the poll. We copy this column next to the time spent reporting and replace the title of this column and call it “Finish Poll?”Using CTRL+G under the special tab we highlight all of the cells that are filled with text constants (see below)
This highlights all the cells with responses in them. Replace this with the word yes by typing Yes and hitting CTRL+ENTER
Then highlight all of the blank cells with the same method, but instead select the ‘Blanks’ radio button and replace these cells with the word NO.
You then can get a list of the Time spent reporting for only beneficiaries who have completed the poll. Copy and Paste the cells of the beneficiary times who have completed the polls in a new column and label it “Time spent reporting + Finished”
From here you can create a Histogram
You need 3 more columns titled bins, freq, and intervals
In the intervals column type the titles of the categories for your histogram i.e. response time in: <5mins, 5mins – 0:09:59, 0:10:00 – 0:19:59, etc.
In the bins column type the upper limit of the intervals i.e. 0:05:00, 0:09:59, 0:19:59 leaving the last one empty:
in the first Freq cell type =FREQUENCY then highlight your time spent recording column (that has been filtered to only display finished polls) as the data array and your bins cells as your bins array close bracket and hit
CTRL+SHIFT+ENTER if you do not it will not work.
Highlight the results of the frequency column and create a 2 dimensional bar graph replace the x axis with the titles under your ‘intervals’ heading and you are done.