By Jhumki Kundu, Ajit Kumar Jaiswal[1], Prabhu Ponnusamy[2]
Background
Over the past several years, technological improvements have significantly changed how statistical and research organizations collect, capture, and process respondent data. Even so, one significant area of survey and census work has yet to find a cost-effective technology that delivers the improvements in data collection and capture seen in other types of censuses and surveys.
Data collection is the process of gathering and measuring information on targeted variables in an established system, which then enables one to answer pertinent questions and evaluate outcomes. It is a component of research in all fields of study, including the physical and social sciences, humanities, and business. Researchers assess their hypotheses on the basis of the data collected. In most cases, data collection is the first and most essential step in research, irrespective of the field. The approach differs from field to field, depending on the information required. All facets of the data collection process should be carefully reviewed to ensure reliable and valid data. In particular, data must be collected in the same fashion from the beginning to the end of the study.
Many data collection tools are available worldwide. Compared with in-person data collection, collecting data digitally allows for much larger sample sizes and can improve the reliability of the data. Increasingly, open-source software and standardized data structures are used to build advanced applications for high-throughput experiments.
This article discusses leading open-source data collection tools, such as ODK, Kobo Toolbox, and CSPro, which can be customised and installed fairly easily to suit research and survey requirements.
Open Data Kit (ODK) – https://opendatakit.org/
ODK is a free and open-source set of software tools, developed by the University of Washington, that helps create mobile data services: generating data collection forms, collecting data on a mobile device, and providing online data storage and aggregation (1). It permits offline data collection with mobile devices in remote areas, with the collected data stored on the device until it can be transferred. ODK is a good fit when a community wants to collect data while keeping full control over it, because collecting and aggregating the data from the devices can be done with open-source tools in line with the community's privacy concerns. Because it works in resource-constrained environments, ODK is intended to serve underserved populations, helping to recognize their needs and to support community-driven innovation based on the aggregated data (2). The ODK approach is relevant wherever the privacy concerns of communities need to be respected, e.g. for health-related data (3, 4), environmental monitoring (5), and political elections (6). In resource-constrained environments, SMS-based methods of data collection have limitations, e.g. in message length and in attaching a geolocation to the collected record.
Features: Study builder, Offline data collection
Mode of Availability: Android
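The offline workflow described above (store records on the device, transfer them when a connection is available) can be sketched in a few lines of Python. This is an illustration of the general pattern only, not ODK's own implementation: the `OfflineStore` class, its table layout, the sample household records, and the `upload` callback are all hypothetical names introduced for this example.

```python
import json
import sqlite3
from typing import Callable


class OfflineStore:
    """Keeps form submissions in a local SQLite table until they are uploaded."""

    def __init__(self, path: str = ":memory:") -> None:
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS submissions ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "payload TEXT NOT NULL, "
            "sent INTEGER NOT NULL DEFAULT 0)"
        )

    def save(self, record: dict) -> None:
        # Needs no network: the record only touches local storage.
        self.db.execute(
            "INSERT INTO submissions (payload) VALUES (?)", (json.dumps(record),)
        )
        self.db.commit()

    def pending(self) -> int:
        """Number of records that have not been uploaded yet."""
        row = self.db.execute(
            "SELECT COUNT(*) FROM submissions WHERE sent = 0"
        ).fetchone()
        return row[0]

    def sync(self, upload: Callable[[dict], bool]) -> int:
        """Try to upload every unsent record; return how many succeeded."""
        rows = self.db.execute(
            "SELECT id, payload FROM submissions WHERE sent = 0 ORDER BY id"
        ).fetchall()
        uploaded = 0
        for row_id, payload in rows:
            if upload(json.loads(payload)):
                self.db.execute(
                    "UPDATE submissions SET sent = 1 WHERE id = ?", (row_id,)
                )
                uploaded += 1
        self.db.commit()
        return uploaded


store = OfflineStore()
store.save({"household_id": "HH-001", "members": 4})
store.save({"household_id": "HH-002", "members": 6})

# No connectivity: every upload attempt fails, so nothing is marked as sent.
store.sync(lambda record: False)
print("pending after failed sync:", store.pending())

# Connectivity restored: the same records go out on the next attempt.
store.sync(lambda record: True)
print("pending after successful sync:", store.pending())
```

The point of the design is that saving never depends on the network, and a record is marked as sent only after the upload callback reports success, so a dropped connection cannot lose data.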
Kobo Toolbox – https://www.kobotoolbox.org/
Kobo Toolbox is a free and open-source tool developed by the Harvard Humanitarian Initiative that helps collect and organize field data. It is currently used in humanitarian crises, research, peacekeeping, economic development, and similar settings. The software helps overcome various problems faced by professionals in the field, making data collection quicker and more reliable. It is fully compatible and interchangeable with ODK but delivers more functionality, such as an easy-to-use form builder, question libraries, and integrated data management. It also integrates other open-source ODK-based developments such as formhub and Enketo.
Features: Study builder, Offline data collection, Highly Customizable Forms
Mode of Availability: Android and Web
CSPro – (Census and Survey Processing System) – https://www.census.gov/data/software/cspro.html
The Census and Survey Processing System (CSPro) is a software package for the entry, editing, tabulation, and dissemination of census and survey data. It is designed to be as user-friendly and easy to use as possible, yet powerful enough to handle the most complex applications. The package is used extensively by statistical organizations, international agencies, NGOs, consulting companies, colleges and universities, hospitals, and private sector groups in over 160 countries (7). For instance, major international household survey programs such as the Multiple Indicator Cluster Surveys (MICS) and the Demographic and Health Surveys (DHS) also use CSPro for census and survey work.
It runs on any modern machine with Microsoft Windows and on Android phones and tablets. It can export data to the major statistical software formats and is often used in combination with other programs. However, only one user can write to a file at any given time, and there is no way to turn applications into internet applications. Another limitation is that CSPro is not compatible with common database programs and languages, e.g., SQL.
Features: Highly Customizable application, Offline data collection, Strong Validation Rules, Batch Editing and Tabulation
Mode of Availability: Windows, Android and Web
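To make "validation rules" and "batch editing" concrete, here is a minimal Python sketch of the kind of checks such a system applies. It is not CSPro code (CSPro has its own application language), and the field names (`age`, `age_at_first_marriage`, `children_ever_born`) and the thresholds (120 years, 12 years) are assumptions chosen purely for illustration.

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[int]]


def check_person(person: Record) -> List[str]:
    """Return a list of edit messages; an empty list means the record passed."""
    errors: List[str] = []
    age = person.get("age")
    age_at_marriage = person.get("age_at_first_marriage")
    children = person.get("children_ever_born")

    # Range check: a single field must fall inside a plausible interval.
    if age is None or not 0 <= age <= 120:
        errors.append("age must be between 0 and 120")
        return errors  # the remaining checks need a usable age

    # Consistency check: two fields must agree with each other.
    if age_at_marriage is not None and age_at_marriage > age:
        errors.append("age at first marriage exceeds current age")

    # Skip-pattern check: a question that should have been skipped was answered.
    if age < 12 and children not in (None, 0):
        errors.append("children reported for a respondent under 12")

    return errors


def batch_edit(records: List[Record]) -> Dict[int, List[str]]:
    """Run the checks over a whole file and keep only the records that failed."""
    report: Dict[int, List[str]] = {}
    for index, record in enumerate(records):
        errors = check_person(record)
        if errors:
            report[index] = errors
    return report


records = [
    {"age": 34, "age_at_first_marriage": 22, "children_ever_born": 2},
    {"age": 19, "age_at_first_marriage": 25, "children_ever_born": 0},
    {"age": 8, "age_at_first_marriage": None, "children_ever_born": 1},
]
for index, messages in batch_edit(records).items():
    print(index, messages)
```

The same `check_person` function could run at entry time, one record at a time, or afterwards over a whole file, which is the distinction between interactive validation and batch editing.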
Survey Solutions – https://mysurvey.solutions/en/
Survey Solutions is free software developed in the Data group of the World Bank (7). It lets users create surveys with a wide selection of standard questions, nested rosters and response pipelines, cascading and linked questions, barcode scanning, image and audio recording, and data from external sensors. Its server components can be installed on-premises or in the cloud. It uses the power of .NET to validate responses and direct the interview flow, and supports advanced data validation using macros, computed variables, and lookup tables. It analyzes rich paradata to track survey progress in real time. It captures data offline on tablets (CAPI), online via a web interface (CAWI), over the phone (CATI), and in cost-effective mixed-mode surveys. Survey Solutions allows data to be stored on the local servers of a national statistical office (NSO), which helps comply with local data privacy and anonymity laws. Using high-quality satellite photos and built-in GPS sensors, Survey Solutions collects extensive GIS information on locations, distances, and areas, applies geofencing, and leads interviewers to the point of interview offline. Rather than hosting a server on users' behalf, the Survey Solutions team now assists users in setting up and managing their own cloud or local server.
Features: Highly Customizable application, Offline data collection and Strong Multimedia support
Mode of Availability: Android and Web
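Geofencing, mentioned above, boils down to checking whether the GPS position of an interview falls inside an allowed area. The Python sketch below assumes the simplest possible case, a circular fence around an assigned point, and uses the haversine great-circle distance. It does not reproduce how Survey Solutions itself implements geofencing; the function names, the coordinates, and the 0.5 km radius are hypothetical.

```python
import math
from typing import Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

Point = Tuple[float, float]  # (latitude, longitude) in decimal degrees


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def inside_geofence(point: Point, center: Point, radius_km: float) -> bool:
    """True when the point lies within radius_km of the fence centre."""
    return haversine_km(point[0], point[1], center[0], center[1]) <= radius_km


# Hypothetical assignment: the interview must take place within 0.5 km
# of the sampled dwelling.
dwelling = (19.0760, 72.8777)
interview_position = (19.0772, 72.8781)
print(inside_geofence(interview_position, dwelling, radius_km=0.5))
```

A supervisor could run a check like this on each submitted interview to flag records captured away from the assigned location.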
Epi Info – https://www.cdc.gov/epiinfo/index.html
Epi Info is free software developed in 1988 by the Centers for Disease Control and Prevention (CDC) in Atlanta to facilitate field epidemiological investigations and statistical analysis. It provides easy data entry form and database construction, a customized data entry experience, and data analyses with epidemiologic statistics, maps, and graphs for public health professionals who may lack an information technology background. It is used for outbreak investigations, for developing small to mid-sized disease surveillance systems, as the analysis, visualization, and reporting (AVR) component of larger systems, and in continuing education in epidemiology and public health analytic methods at schools of public health around the globe. Approximately 300,000 downloads of Epi Info occurred in 2002 from approximately 130 countries (8). These numbers probably make Epi Info one of the most widely distributed and used public domain programs in the world.
Features: Standard Form Design and Strong Multimedia support
Mode of Availability: Windows, Android and Web
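As an example of the "epidemiologic statistics" referred to above, the two measures most often computed in an outbreak investigation are the risk ratio and the odds ratio from a 2x2 table of exposure against illness. The Python sketch below shows the arithmetic only; it is not Epi Info code, and the `TwoByTwo` class and the counts are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Counts from a 2x2 table of exposure (rows) against illness (columns)."""

    exposed_ill: int
    exposed_well: int
    unexposed_ill: int
    unexposed_well: int

    def risk_ratio(self) -> float:
        # Attack rate among the exposed divided by attack rate among the unexposed.
        risk_exposed = self.exposed_ill / (self.exposed_ill + self.exposed_well)
        risk_unexposed = self.unexposed_ill / (
            self.unexposed_ill + self.unexposed_well
        )
        return risk_exposed / risk_unexposed

    def odds_ratio(self) -> float:
        # Cross-product ratio; raises ZeroDivisionError if an off-diagonal
        # cell is zero, which real software handles with a correction.
        return (self.exposed_ill * self.unexposed_well) / (
            self.exposed_well * self.unexposed_ill
        )


# Hypothetical outbreak: 100 people ate the suspect dish, 100 did not.
table = TwoByTwo(exposed_ill=30, exposed_well=70, unexposed_ill=10, unexposed_well=90)
print(f"risk ratio: {table.risk_ratio():.2f}")
print(f"odds ratio: {table.odds_ratio():.2f}")
```

With these counts the attack rates are 30% and 10%, so those who ate the dish were about three times as likely to fall ill.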
REDCap – https://www.project-redcap.org/
Research Electronic Data Capture (REDCap) is a web-based application developed by Vanderbilt University to capture data for clinical research and to create databases and projects. REDCap offers a free, easy-to-use, and secure method of flexible yet robust data collection. It gives users multiple features to choose from when creating a data collection instrument. No programming knowledge is needed to set up a database or project in REDCap. It is more secure than Microsoft Excel or Microsoft Access and can be accessed from any device with an internet connection and a web browser. It also provides easy exports, so users remain in control of their data.
Features: Longitudinal data collection, offline forms, randomization, on-premise hosting
Mode of Availability: Android, iOS and Web
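The randomization feature listed above can be illustrated with a permuted-block allocation list, a common way of assigning participants to study arms while keeping the arms balanced. The Python sketch below assumes the allocation is prepared ahead of time as a simple list of assignments; the function name, arm labels, block size, and seed are all hypothetical, and the output is not in any format that REDCap specifically requires.

```python
import random
from typing import List, Sequence


def permuted_block_allocation(
    n_participants: int,
    arms: Sequence[str] = ("control", "treatment"),
    block_size: int = 4,
    seed: int = 2024,
) -> List[str]:
    """Build an allocation list in which every full block is balanced across arms."""
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")

    rng = random.Random(seed)  # a fixed seed keeps the list reproducible
    per_arm = block_size // len(arms)
    allocation: List[str] = []
    while len(allocation) < n_participants:
        # Each block holds the same number of slots for every arm ...
        block = [arm for arm in arms for _ in range(per_arm)]
        # ... and only the order inside the block is random.
        rng.shuffle(block)
        allocation.extend(block)
    return allocation[:n_participants]


schedule = permuted_block_allocation(12)
for participant, arm in enumerate(schedule, start=1):
    print(participant, arm)
```

Because each block of four contains two slots per arm, the group sizes never drift apart by more than two at any point during enrolment.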
Google Forms – https://www.google.com/intl/en-GB/forms/about/
Google Forms is one of the most common free online tools for creating surveys and sharing them with users for data collection. It can be used to create forms, surveys, and quizzes, and to edit and share them collaboratively with other people. Its simple user interface makes creating surveys a hassle-free task and, besides being free, it supports many features for the different question types you may wish to use (9).
The Google Forms service has undergone several updates over the years. Features include, but are not limited to, menu search, shuffling of questions for randomized order, limiting responses to one per person, shorter URLs, custom themes (10), automatically generated answer suggestions when creating forms (11), and an "Upload file" option for respondents answering questions that require them to share content or files from their computer or Google Drive.
Features: Standard Forms, randomization
Mode of Availability: Web
Conclusion
On the surface, the main reason for choosing open-source tools may seem to be cost alone. But, as explained here, when used with the right combination of devices, team, planning, and timelines, they are a technical boon to survey research.
The open-source data capture tools described above are widely used across the globe in large-scale demographic and social health surveys. Several reasons underlie their popularity and demand. First, they have simple, comprehensible interfaces and are therefore convenient to use. Second, no expert programming skills are required to customize them; anyone with good data management knowledge and a little programming skill can do so. Third, they are available free of charge, which makes them cost-effective. Fourth, they are flexible: formatting and other changes can be made conveniently without redoing the entire document, unlike with some older software tools. Lastly, because they are widely used and accepted across the world, they have become reliable tools to work with.
References
1. Open Data Kit (ODK). (2013, October 17). Retrieved from Betterevaluation.org website: https://www.betterevaluation.org/en/resources/tool/open_data_kit
2. Hartung et al. (2010). Open Data Kit: Tools to Build Information Services for Developing Regions. Available from: http://www.nixdell.com/classes/Tech-for-the underserved/Hartung.pdf
3. Tom-Aba, D., Olaleye, A., Olayinka, A. T., Nguku, P., Waziri, N., Adewuyi, P., … & Shuaib, F. (2015). Innovative technological approach to Ebola virus disease outbreak response in Nigeria using the open data kit and form hub technology. PloS one, 10(6), e0131000.
4. Macharia, P., Muluve, E., Lizcano, J., Cleland, C., Cherutich, P., & Kurth, A. (2013, May). Open Data Kit, A solution implementing a mobile health information system to enhance data management in public health. In 2013 IST-Africa Conference & Exhibition (pp. 1-6). IEEE.
5. Ferdoush, S., & Li, X. (2014). Wireless sensor network system design using Raspberry Pi and Arduino for environmental monitoring applications. Procedia Computer Science, 34, 103-110.
6. Aranha, D. F., Ribeiro, H., & Paraense, A. L. O. (2016). Crowdsourced integrity verification of election results. Annals of Telecommunications, 71(7-8), 287-297.
7. “Archived copy”. Archived from the original on 2017-12-01. Retrieved 2017-11-22.
8. Su, Y., & Yoon, S. S. (2003). Epi Info–Present and Future. In AMIA Annual Symposium Proceedings (Vol. 2003, p. 1023). American Medical Informatics Association.
9. Sulaiman, S. (2021, August 24). Top 5 Open Source Survey Software & Google Forms Alternatives. FP. Available from: https://fosspost.org/open-source-survey-google-forms-alternative/
10. “More ways to build and share Google Forms”. G Suite Updates. September 29, 2014. Retrieved December 12, 2016.
11. “Custom themes in Google Forms”. G Suite Updates. September 2, 2014. Retrieved December 12, 2016.
[1] PhD Scholar, International Institute for Population Sciences, Mumbai
[2] CEO, Iotalytics Research and Analytics Solutions Pvt Ltd