Glossary of data use terms

Administrative levels – Areas of a country defined for the purposes of government or administration, such as regions, provinces, or districts. Shapefiles used for mapping are distinguished by administrative level, where a higher level number represents a smaller administrative unit. For example, in Tanzania, administrative level 0 is the national level, administrative level 1 is the regional level, administrative level 2 is the district level, and administrative level 3 is the ward level.

Coordinates – Numerical values that represent specific point locations on the surface of the earth. Geographic coordinates are used to represent locations on a three-dimensional sphere; they are defined in degrees of latitude (north-south position, drawn as horizontal lines) and longitude (east-west position, drawn as vertical lines). Geographic coordinates are calculated and recorded using a GPS receiver. Projected coordinates are used to represent locations on a two-dimensional flat surface (such as a computer screen); they are represented as discrete x and y values.

Data – Any specific information that is meant to fulfill the purpose for which it was collected or generated. Data may be numerical or non-numerical.

Coordinate Reference System (CRS) – A set of rules for assigning coordinates to real-world locations. Because the Earth is a three-dimensional, slightly flattened sphere, geographic data must be stretched or compressed in order for (1) images to be visualized on a flat surface, and (2) spatial relationships (e.g., distance, area) to be measured. GPS receivers record coordinates by completing a set of calculations defined by the CRS. If the CRS is changed, a different set of coordinates will be produced for the same location. When using a GPS receiver to gather coordinates, the CRS in use must be specified and recorded so that shapefiles can be accurately created from the coordinates. The most common geographic coordinate system is the World Geodetic System 1984 (WGS84).
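
As a rough illustration of how the same location gets different coordinates under a different CRS, the sketch below converts a WGS84 longitude/latitude pair into projected x/y values using the spherical Web Mercator formulas (the projection used by many online maps). The function name and the sample point are assumptions made for this example; in practice a GIS or a projection library would normally perform the conversion.

```python
import math

# Radius of the sphere used by Web Mercator (the WGS84 semi-major axis), in metres.
EARTH_RADIUS_M = 6378137.0


def wgs84_to_web_mercator(lon_deg, lat_deg):
    """Convert geographic degrees to projected x/y metres (spherical Web Mercator)."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


# Hypothetical point: 35.0 degrees east, 6.0 degrees south.
lon, lat = 35.0, -6.0
x, y = wgs84_to_web_mercator(lon, lat)
print(f"Geographic (WGS84): lon={lon}, lat={lat}")
print(f"Projected (Web Mercator): x={x:.0f} m, y={y:.0f} m")
```

The two printed lines describe the same place; only the CRS differs, which is why the CRS has to be recorded alongside the coordinates.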

Data errors – Mistakes made when collecting or aggregating data that compromise data quality

  • Transposition error – The order of numbers or words is switched (e.g. 12 is entered as 21)
  • Copying error – A number or letter is copied as the wrong number or letter (e.g. 0 entered as the letter O)
  • Coding error – The incorrect code is entered for a certain response (e.g. interview subject circled 1 = Yes, but the coder copied 2 = No)
  • Routing error – An entry is placed in the wrong field (e.g. sex entered into the age field)
  • Consistency error – Responses within a certain indicator are contradictory (e.g. a respondent named Mary is entered as Male)
  • Range error – A number lies outside the range of probable or possible values (e.g. Age = 151 yrs)
  • Gap error – Data that are missing
  • Calculation error – Data erroneously calculated (e.g. 3 positive HIV female tests + 1 positive HIV male test = 5 positive HIV tests in total)
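
Some of these error types can be flagged automatically. The sketch below is one possible approach, not a prescribed procedure: it checks a single record for gap, routing and range errors in an age field, and checks a reported total against the sum of its parts. The record layout, the function names and the 0 to 120 age limit are all assumptions chosen for illustration.

```python
# Hypothetical plausible age range; adjust to the population being reported.
MIN_AGE, MAX_AGE = 0, 120


def find_errors(record):
    """Return a list of error labels for the age field of one record (a dict)."""
    errors = []
    age = record.get("age")
    if age is None or age == "":
        errors.append("gap error: age is missing")
    elif not isinstance(age, (int, float)):
        errors.append("possible routing error: age is not a number")
    elif not MIN_AGE <= age <= MAX_AGE:
        errors.append("range error: age outside plausible range")
    return errors


def check_total(female_positive, male_positive, reported_total):
    """Return True when the reported total equals the sum of its parts."""
    return female_positive + male_positive == reported_total


# Hypothetical records, loosely based on the examples in the list above.
records = [
    {"id": 1, "age": 34},
    {"id": 2, "age": 151},       # range error
    {"id": 3, "age": ""},        # gap error
    {"id": 4, "age": "Female"},  # sex entered into the age field
]
for record in records:
    print(record["id"], find_errors(record))

# The calculation example from the list: 3 + 1 reported as 5.
print(check_total(3, 1, 5))
```

Transposition, copying and coding errors usually cannot be detected from the value alone and need comparison against the source form.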

Data Quality Checks – Procedures for verifying that forms, registers and databases are completely and correctly filled at each step of the reporting process.

  • Spot-checks – Look at specific data elements to check for missing data and to confirm that the data make logical sense (e.g. age should not be >100 years; birthday column should not contain a name)
  • Cross-checks – Compare report totals with other data sources
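
A cross-check can be sketched as a comparison of totals from two sources, listing every indicator where they disagree. The function name, the indicator names and the figures below are hypothetical and serve only to show the idea.

```python
def cross_check(report_totals, register_totals):
    """Return {indicator: (report value, register value)} for every mismatch."""
    mismatches = {}
    # Look at every indicator that appears in either source.
    for indicator in sorted(set(report_totals) | set(register_totals)):
        reported = report_totals.get(indicator)
        registered = register_totals.get(indicator)
        if reported != registered:
            mismatches[indicator] = (reported, registered)
    return mismatches


# Hypothetical monthly totals from a summary report and from a facility register.
report = {"tests_done": 120, "tests_positive": 9}
register = {"tests_done": 120, "tests_positive": 8}
print(cross_check(report, register))
```

An indicator missing from one source shows up with `None` on that side, so gaps are reported together with disagreements.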

Data Quality Guiding Principles – Elements needed to achieve high data quality

  • Accuracy – Data measure what they are intended to measure and have minimal errors (e.g., recording or interviewer bias, transcription error, sampling error), to the point of being negligible.
  • Completeness – All variables in either reporting or recording tools are filled. A complete dataset represents the complete list of eligible persons or units and not just a fraction of the list.
  • Confidentiality – Data are not disclosed inappropriately and are maintained according to national and/or international standards for protected data.
  • Integrity – Data and the systems used to generate them are protected from deliberate bias or manipulation for political or personal reasons.
  • Precision – Data have sufficient detail, meaning they include all the parameters and details needed to produce the required information.
  • Reliability – Data generated by a program’s information system are based on protocols and procedures that do not change according to who is using them and when or how often they are used; data are measured and collected consistently.
  • Timeliness – Data are up-to-date (current) and information is reported and available within the requested/recommended timeframe.

Data Use Tool – Excel-based system that allows users to synthesize country-level data from multiple sources, display them visually in tables and figures, and, in conjunction with mapping software, produce maps by geographic level. These tables, figures and maps are used to interpret data to inform programmatic decisions and resource allocation.

Filter – Function in the Data Use Tool used to exclude some sources of data when making pivot tables

Fields – Variable levels in the Data Use Tool, found in the pivot table

Geographic Information System (GIS) – A system used to capture, manage, analyze and display geographic information. This system works by assigning (or “referencing”) data to a geographic location on the earth’s surface to create a digital representation of the world. Geographic Information Systems can geographically display non-spatial data, such as prevalence or testing coverage, by associating values with specific locations or areas on a digital globe or atlas.

Indicator – A specific marker or pointer that uses data to show change over the course of an event.

Inputs – Data related to the strategic planning questions input into a dataset

Layer – The visual representation of a single data source in a GIS. One layer represents one theme of the map – such as a road network or hospital locations. A final map is made up of multiple layers. For example, roads, political boundaries, and rivers might be considered different layers on the same map. (Figure 1)

Quality Data – Data that are reliable and accurately represent the measure they were intended to present. High levels of data quality are achieved when information is valid for the use to which it is applied and when decision makers have confidence in and rely upon the data.

Shapefiles – Files specific to mapping that contain geographic coordinates indicating point locations or physical boundaries. To use a shapefile in a mapping program, all files batched with the shapefile (.dbf, .sbn, .sbx, .shx) must be saved on the computer in the same location as the shapefile (.shp).
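
Before opening a shapefile in a mapping program, it can help to confirm that its companion files sit in the same folder. The sketch below checks for the extensions named in this entry. The function name and the example path are hypothetical, and the extension list is an assumption, since a given dataset may ship with a different set of companion files.

```python
from pathlib import Path

# Companion extensions named in the glossary entry; a given dataset may ship
# with a different set, so treat this list as an assumption.
COMPANION_EXTENSIONS = (".dbf", ".sbn", ".sbx", ".shx")


def missing_companions(shp_path, extensions=COMPANION_EXTENSIONS):
    """Return the companion extensions with no file next to the given .shp file."""
    shp = Path(shp_path)
    return [ext for ext in extensions if not shp.with_suffix(ext).exists()]


# Hypothetical path; replace with the location of a real shapefile.
print(missing_companions("maps/districts.shp"))
```

An empty list means every listed companion file was found beside the .shp file. The check matches extensions exactly as written, so upper-case names such as .DBF would need to be added on case-sensitive file systems.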