Big Data Analytics and Technologies in COVID-19

Mario Caesar
Feb 2, 2022

A Big Data Case Study

The Need for Big Data Solutions

Introduction to Big Data in Healthcare Domain

In recent years, big data technology has been used to cope with data growth in several industrial sectors. According to Ambigavathi and Sridharan (2018), big data is a large amount of data that is generated, captured, and processed at high speed and that cannot be handled by a conventional relational database. Big data has been widely adopted in finance, education, retail, e-commerce, and other industries, including healthcare. In healthcare, big data technology allows the industry to store a large amount of patient data digitally and to perform computational analysis that reveals disease patterns, trends, associations, and differences (Haleem et al., 2020). Medical data such as patient name, address, condition, current prognosis, X-ray scans, CT scans, or MRI scans require more in-depth analysis so that the healthcare industry can better understand a patient’s condition and find hidden patterns in current diseases.

The implementation of big data technologies in the health sector has a substantially positive impact and allows the healthcare industry to save more lives. A study by Corsi et al. (2020) mentions that big data technologies allow the healthcare industry to predict essential information, such as the length of a patient’s stay in the hospital or the number of patients requiring surgical intervention, and to prevent complications that a patient might develop. Another study by Hussain et al. (2020) mentions that big data in the healthcare industry can improve patient satisfaction, avoid errors in medical facilities, predict disease patterns and outcomes, improve patient healthcare, help policymakers or governments reduce medical expenses, and solve medical problems. Furthermore, Ambigavathi and Sridharan (2018) state that big data implementation has benefits that could help develop the healthcare industry, such as improving patient health tracking, reducing and avoiding human errors, reducing unnecessary wage costs, and providing a better understanding of a disease or of patients with chronic diseases.

This article discusses the importance of big data in the health sector, especially for the recent COVID-19 disease. It also discusses the types and characteristics of big data that are suitable for dealing with the COVID-19 pandemic. Furthermore, it proposes five techniques or methods that can be used to apply big data to the COVID-19 disease.

Importance of Big Data for COVID-19

Recently, the number of new COVID-19 (coronavirus) cases has increased globally at an alarming rate. The rapid spread of the virus, its evolving patterns, and the differences in its symptoms make it difficult to control and analyze. In the case considered here, 20,000 people have died from COVID-19, and each of them experienced different symptoms, a different health condition, and a different death prognosis. In addition, doctors and healthcare staff recorded the symptoms by giving each symptom a rating. This case shows that the exponential growth of COVID-19 data makes it difficult for doctors and healthcare staff to analyze the data of patients affected by the virus.

Moreover, the massive quantity of data generated makes it difficult for doctors and healthcare staff to determine the symptoms most often experienced by COVID-19 patients. This is where big data technology plays a role in helping doctors and medical staff analyze the COVID-19 virus. As stated in the previous section, big data technology allows medical staff to perform computational analysis to reveal COVID-19 patterns, identify the most common symptoms, and correlate and predict the spread of COVID-19. Besides, the implementation of big data technology in the healthcare industry allows the government to reduce unnecessary expenses. In the next section, the author will discuss big data types and their characteristics in relation to this case study.

Data Types and Characteristics of Big Data and Its Relation to COVID-19

According to Ramírez et al. (2018) and Corsi et al. (2020), big data can be classified into three types: structured, semi-structured, and unstructured data.

  • Structured data is a data type with a defined format or structure that can be stored in table format. Its advantages are that the data is consistent, follows specific standards, and can be processed using SQL queries.
  • Semi-structured data is a combination of structured and unstructured data, stored with tags that indicate the data type and other characteristics.
  • Unstructured data refers to data that is stored as collected, without a specific format. This data type cannot be stored in tabular formats and must be processed in a complex manner before being used for analysis.
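
The difference between the three types can be illustrated with a minimal Python sketch. The table name, column names, and all values below are hypothetical and chosen only for illustration; they are not taken from the case study data.

```python
import json
import sqlite3

# Structured: fixed columns, so the data can be queried with SQL.
# (Hypothetical table, columns, and values.)
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE patients (patient_id INTEGER, age INTEGER, fever_rating INTEGER)"
)
conn.executemany(
    "INSERT INTO patients VALUES (?, ?, ?)",
    [(1, 64, 5), (2, 71, 3), (3, 58, 4)],
)
high_fever = conn.execute(
    "SELECT COUNT(*) FROM patients WHERE fever_rating >= 4"
).fetchone()[0]

# Semi-structured: keys act as tags, but records may differ in shape.
record = json.loads(
    '{"patient_id": 1, "symptoms": {"fever": 5, "dry_cough": 2}, "notes": "ICU"}'
)
symptom_names = sorted(record["symptoms"])

# Unstructured: free text with no schema; it needs processing before analysis.
clinical_note = "Patient reports fever and shortness of breath since Monday."
mentions_fever = "fever" in clinical_note.lower()
```

Only the structured table can be filtered directly with a query; the JSON record has to be navigated key by key, and the free-text note needs at least a text search before it yields anything countable.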

According to Sonnati (2015) and Kumar and Singh (2019), there are four characteristics of big data: volume, velocity, variety, and veracity (often referred to as the four V’s).

  • Volume: the quantity of data produced by IT infrastructure in the health sector, which continues to grow. This growth occurs because the data stored in healthcare databases (such as CT scan images, disease history, or transaction data) increases every day.
  • Velocity: refers to data collection speed. In the healthcare domain, healthcare systems are generating data at an increasing rate.
  • Variety: means the variation of data produced. In the healthcare domain, it can be structured or unstructured data. Examples of structured data are transaction records, patient health records, or clinical data. Examples of unstructured data are medical imagery, audio, video, or sensor data.
  • Veracity: refers to how vulnerable the data is in terms of accuracy and reliability. As a result, big data analysis, especially in the healthcare industry, is needed to extract valuable insights from this data to treat patients and make the best decisions.

The case study states that 20,000 people died due to the impact of COVID-19. These 20,000 people had different health conditions and specific symptoms such as shortness of breath, fever, dry cough, phlegm collection in the lung, acute pneumonia, high blood pressure, high blood glucose, fluctuations, coronary failure, and low blood oxygen level. Furthermore, each symptom is rated out of 5 stars. From this, it can be concluded that the type of big data suited to this COVID-19 case is structured big data. The reason is that the stored data has a fixed structure and standards for recording patient data and the symptoms experienced by these patients on a predetermined scale. Healthcare staff can use this type of big data to store patient ID, patient name, address, gender, age, the symptoms experienced (with a 5-star score assigned to each symptom), and death prognosis.
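
A structured record of this kind could be sketched as follows. The class name, field names, and the assumption that ratings run from 1 to 5 are illustrative choices by way of example; the case study only states that each symptom is rated out of 5 stars.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class PatientRecord:
    """One fixed-structure row of the case study data (hypothetical schema)."""

    patient_id: int
    name: str
    address: str
    gender: str
    age: int
    # Maps a symptom name to its star rating (assumed range: 1 to 5).
    symptom_ratings: Dict[str, int] = field(default_factory=dict)
    death_prognosis: str = ""

    def __post_init__(self) -> None:
        # Enforce the predetermined rating scale when the record is created.
        for symptom, stars in self.symptom_ratings.items():
            if not 1 <= stars <= 5:
                raise ValueError(f"{symptom}: rating {stars} is outside 1-5")


example = PatientRecord(
    patient_id=1,
    name="Patient A",
    address="Unknown",
    gender="F",
    age=67,
    symptom_ratings={"fever": 5, "dry_cough": 3},
    death_prognosis="acute pneumonia",
)
```

Validating the rating at creation time is one way to keep every record on the same standard, which is the property that makes the data structured in the first place.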

According to R.C. (2020) and Talend (2021), structured big data has other advantages: machine learning algorithms can easily use it to predict trends or uncover hidden insights, organizations can analyze it with less processing and storage, and more tools are available for analyzing it. These advantages can help healthcare staff apply machine learning algorithms or other tools to analyze the most common symptoms of COVID-19, predict the spread of COVID-19, and predict death prognosis based on symptoms. However, R.C. (2020) and Talend (2021) also state a disadvantage of structured big data: its uses are limited because the data follows a predetermined standard, and if the record format needs to be changed, all of the structured data must be updated. If other symptoms or more detailed patient information need to be recorded in the future, this disadvantage may become a big problem for healthcare staff.

Furthermore, in this case, volume, velocity, and veracity are the most prominent big data characteristics. Volume is prominent because the data on deaths due to COVID-19 will continue to grow in the future. Velocity is prominent because healthcare staff can record and retrieve data quickly: the acquired data already follows a standard, and the staff only need to follow the previously set standard. Veracity is also prominent because the patient data provided by healthcare staff is not necessarily accurate. As a result, a more in-depth analysis is necessary to generate insights from the COVID-19 data for further research and study.
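
As one illustration of the analysis that structured ratings make possible, the sketch below counts how often each symptom appears and averages its star rating. The three sample records and the function name `symptom_summary` are hypothetical; a real analysis would read the 20,000 records from storage.

```python
from collections import Counter
from statistics import mean

# Hypothetical sample: one dict of symptom -> star rating per patient.
sample_records = [
    {"fever": 5, "dry_cough": 3},
    {"fever": 4, "shortness_of_breath": 5},
    {"dry_cough": 2, "fever": 3, "low_blood_oxygen": 4},
]


def symptom_summary(records):
    """Return (symptom, patient count, mean rating), most frequent first."""
    counts = Counter()
    ratings = {}
    for record in records:
        for symptom, stars in record.items():
            counts[symptom] += 1
            ratings.setdefault(symptom, []).append(stars)
    return [(s, n, mean(ratings[s])) for s, n in counts.most_common()]


summary = symptom_summary(sample_records)
most_common_symptom = summary[0][0]
```

In this sample, fever is the most frequent symptom because it appears in all three records.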

Insights of Big Data

Techniques and Methods for Big Data Analytics

The previous section discussed the importance of big data in the healthcare domain, especially in addressing the COVID-19 pandemic, as well as the characteristics and types of big data that can be applied to this case study. In this section, the author will discuss five methods or techniques for using big data technology to help doctors or healthcare staff: artificial intelligence, machine learning, data assimilation, deep learning, and geographic information systems.

1. Artificial Intelligence

Artificial intelligence has been used for the last few years as a tool to collect and analyze data in order to predict disease. Bragazzi et al. (2020) proposed software that uses artificial intelligence to distinguish cough sound data when detecting COVID-19 disease based on gender, age, and symptoms. In that study, detection of COVID-19-positive cases reached an area under the curve (AUC) of 82%.
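
To make the AUC figure concrete: the AUC can be read as the probability that a randomly chosen positive case receives a higher score from the model than a randomly chosen negative case. The sketch below computes it by direct pairwise comparison. The labels and scores are hypothetical, and the function is a plain illustration of the metric, not the evaluation code used in the cited study.

```python
def roc_auc(labels, scores):
    """AUC as the share of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical test results: 1 = COVID-19 positive, 0 = negative.
example_labels = [1, 1, 0, 0, 1, 0]
example_scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]
example_auc = roc_auc(example_labels, example_scores)
```

Here eight of the nine positive/negative pairs are ranked correctly, so the AUC is 8/9, or about 0.89. An AUC of 0.5 corresponds to random guessing and 1.0 to perfect ranking.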

Based on this research, artificial intelligence can also be applied to the case study discussed earlier. Artificial intelligence can help collect patient data and detect and evaluate patients’ symptoms, for example by measuring blood sugar, oxygen levels, and body temperature. Medical staff can then quickly analyze whether someone has COVID-19 based on the data collected by artificial intelligence.

2. Machine Learning

According to Alsunaidi et al. (2021), machine learning combined with big data technology can improve accuracy in detecting COVID-19 by identifying new disease patterns, symptoms, and the course of the disease, can predict the development of COVID-19 outbreaks, and allows disease-related risk factors to be identified. Furthermore, governments and ministries of health can use machine learning to describe aspects of the COVID-19 epidemic and predict it early, so that the medical infrastructure can be prepared to manage the impact of a pandemic. This technology also helps to formulate strategies and proactive measures and to make decisions related to the allocation of medical resources.

The study above suggests that applying machine learning to big data can help medical staff and doctors analyze and predict the impact of COVID-19 in the previously mentioned case. In addition, machine learning can improve prediction accuracy in research on related diseases by using the 20,000 available records.
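
As a minimal sketch of what such a prediction could look like, the example below trains a logistic regression classifier with plain gradient descent. Logistic regression is the author's choice for illustration and is not named in the cited study; the two features, the labels, and all values are hypothetical.

```python
import math

# Hypothetical training data: each row is (fever rating, low-blood-oxygen
# rating) on the 1-5 star scale; label 1 = severe outcome, 0 = otherwise.
FEATURES = [(1, 1), (2, 1), (1, 2), (4, 5), (5, 4), (5, 5)]
LABELS = [0, 0, 0, 1, 1, 1]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(features, labels, learning_rate=0.1, epochs=5000):
    """Fit weights and a bias with batch gradient descent on the log loss."""
    n_features = len(features[0])
    n_samples = len(features)
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(features, labels):
            z = sum(w * xi for w, xi in zip(weights, x)) + bias
            error = sigmoid(z) - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - learning_rate * g / n_samples
                   for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n_samples
    return weights, bias


def predict(weights, bias, x):
    """Return 1 if the predicted probability is at least 0.5, else 0."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if sigmoid(z) >= 0.5 else 0


weights, bias = train_logistic_regression(FEATURES, LABELS)
predictions = [predict(weights, bias, x) for x in FEATURES]
```

With real data, the model would be evaluated on records held out from training, and a library implementation would normally replace the hand-written training loop.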

3. Data Assimilation

In a study by Li et al. (2020), data assimilation combined with big data is used to predict the spread of COVID-19 disease and to support preventive measures. This can be done by combining epidemic models, such as the Susceptible-Exposed-Infectious-Recovered (SEIR) model or the Susceptible-Infectious-Removed (SIR) model, with real-time observation data and setting the model parameters based on that data.
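
The idea of tuning an epidemic model against observations can be sketched as follows. The example simulates a discrete-time SIR model and picks the transmission rate that best matches observed infection counts using a simple grid search. This grid search is a deliberately crude stand-in for a real data assimilation scheme and is not the method of Li et al. (2020); the population size, rates, and "observed" series are all hypothetical.

```python
def simulate_sir(beta, gamma, s0, i0, r0, days):
    """Simulate a discrete-time SIR model with one-day steps.

    beta is the transmission rate and gamma the removal rate (per day).
    Returns the number of infectious people for day 0 through day `days`.
    """
    population = s0 + i0 + r0
    s, i, r = float(s0), float(i0), float(r0)
    infectious = [i]
    for _ in range(days):
        new_infections = beta * s * i / population
        new_removals = gamma * i
        s -= new_infections
        i += new_infections - new_removals
        r += new_removals
        infectious.append(i)
    return infectious


def fit_beta(observed, gamma, s0, i0, r0, candidates):
    """Pick the candidate beta whose simulation best matches the observations."""
    def squared_error(beta):
        simulated = simulate_sir(beta, gamma, s0, i0, r0, len(observed) - 1)
        return sum((a - b) ** 2 for a, b in zip(simulated, observed))
    return min(candidates, key=squared_error)


# Hypothetical "observations", generated here from a known beta of 0.3.
observed_infectious = simulate_sir(0.3, 0.1, s0=990, i0=10, r0=0, days=30)
best_beta = fit_beta(observed_infectious, 0.1, 990, 10, 0,
                     candidates=[0.1, 0.2, 0.3, 0.4])
```

Because the observations were generated with a transmission rate of 0.3, the search recovers that value. A real assimilation scheme would update the parameters repeatedly as new observations arrive rather than fit them once.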

Based on this research, data assimilation can be applied to the case study mentioned earlier. By using data assimilation methods, medical staff can predict the spread of COVID-19 based on the collected data of 20,000 patients and take further action to prevent COVID-19.

4. Deep Learning

According to Pham (2021), deep learning can help medical staff determine whether a person has COVID-19. Deep learning models such as Bayes-SqueezeNet can classify a patient’s X-ray scan as normal, viral pneumonia, or COVID-19. Besides, deep learning can shorten the inspection time compared with the time required for PCR testing.

Based on this research, deep learning can help speed up data collection and classify the symptoms suffered in the previously mentioned case. For example, deep learning can help analyze the amount of sputum collected in the lungs or the extent of pneumonia and then score the patient’s symptoms.

5. Geographic Information Systems

In a study by Zhou et al. (2020), the combination of geographic information systems (GIS) and big data can help the government map the distribution of the COVID-19 pandemic. GIS can map medical resource needs, social sentiment, logistical needs, and the number of COVID-19 patients who have recovered in each region. Furthermore, the GIS visualization can help governments and related organizations make decisions about the spread of COVID-19.

Based on this study, the GIS model can be combined with big data to help the government predict the spread of the COVID-19 virus from the data of 20,000 patients. Moreover, the government can make early decisions about the spread of COVID-19.
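
A very small piece of what a GIS does, counting cases per map cell, can be sketched as follows. The grid size, the function names, and the coordinates are hypothetical; a real GIS would use proper map projections and administrative boundaries rather than a plain latitude/longitude grid.

```python
import math
from collections import defaultdict


def grid_cell(lat, lon, cell_deg=1.0):
    """Map a coordinate to the index of the grid cell that contains it."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def cases_per_cell(points, cell_deg=1.0):
    """Count how many case locations fall into each grid cell."""
    counts = defaultdict(int)
    for lat, lon in points:
        counts[grid_cell(lat, lon, cell_deg)] += 1
    return dict(counts)


# Hypothetical case locations as (latitude, longitude) pairs.
case_locations = [(30.6, 114.3), (30.2, 114.9), (31.2, 121.5)]
cell_counts = cases_per_cell(case_locations)
```

The resulting counts per cell are the kind of aggregate that a map layer would then color by intensity to show where cases concentrate.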

References

  • Alsunaidi, S.J., Almuhaideb, A.M., Ibrahim, N.M., Shaikh, F.S., Alqudaihi, K.S., Alhaidari, F.A., Khan, I.U., Aslam, N. & Alshahrani, M.S. (2021). Applications of Big Data Analytics to Control COVID‐19 Pandemic. Sensors. 21 (7).
  • Ambigavathi, M. & Sridharan, D. (2018). Big Data Analytics in Healthcare. 2018 10th International Conference on Advanced Computing, ICoAC 2018. (December 2019). pp. 269–276.
  • Bragazzi, N.L., Dai, H., Damiani, G., Behzadifar, M., Martini, M. & Wu, J. (2020). How Big Data and Artificial Intelligence Can Help Against COVID-19. IE Business School. [Online]. pp. 4–11. Available from: https://www.ie.edu/business-school/news-and-events/whats-going-on/big-data-artificial-intelligence-can-help-covid-19/.
  • Corsi, A., de Souza, F.F., Pagani, R.N. & Kovaleski, J.L. (2020). Big data analytics as a tool for fighting pandemics: a systematic review of literature. Journal of Ambient Intelligence and Humanized Computing. [Online]. Available from: https://doi.org/10.1007/s12652-020-02617-4.
  • Haleem, A., Javaid, M., Khan, I.H. & Vaishya, R. (2020). Significant Applications of Big Data in COVID-19 Pandemic. [Online]. 2020. Indian Journal of Orthopaedics. Available from: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7204193/. [Accessed: 23 March 2021].
  • Hussain, M.K., Hussain, M.J., Osman, M.B., Abdurraheem, M. & Al-areefi, M. (2020). Big Data in Healthcare. International Journal of Recent Technology and Engineering. 8 (6). pp. 2127–2131.
  • Kumar, S. & Singh, M. (2019). Big data analytics for healthcare industry: Impact, applications, and tools. Big Data Mining and Analytics. 2 (1). pp. 48–57.
  • Li, X., Zhao, Z. & Liu, F. (2020). Big data assimilation to improve the predictability of COVID-19. Geography and Sustainability. [Online]. 1 (4). pp. 317–320. Available from: https://doi.org/10.1016/j.geosus.2020.11.005.
  • Pham, T.D. (2021). Classification of COVID-19 chest X-rays with deep learning: new models or fine tuning? Health Information Science and Systems. [Online]. Available from: https://doi.org/10.1007/s13755-020-00135-3.
  • R.C., A. (2020). Structured Data Vs Unstructured Data: Which One is Better for Your Business? [Online]. 2020. Available from: https://sarasanalytics.com/blog/stuctured-data-vs-unstructured-data. [Accessed: 27 March 2021].
  • Ramírez, M., Moreno, H.B.R. & Rojas, E.M. (2018). Big Data in HealthCare. [Online]. Springer Singapore. Available from: http://link.springer.com/10.1007/978-981-10-8476-8.
  • Sonnati, R. (2015). Improving Healthcare Using Big Data Analytics. Improving Healthcare Using Big Data Analytics. 6 (3). pp. 142–146.
  • Talend (2021). Structured vs. Unstructured Data: A Complete Guide. [Online]. 2021. Available from: https://www.talend.com/resources/structured-vs-unstructured-data/. [Accessed: 27 March 2021].
  • Zhou, C., Su, F., Pei, T., Zhang, A., Du, Y., Luo, B., Cao, Z., Wang, J., Yuan, W., Zhu, Y., Song, C., Chen, J., Xu, J., Li, F., Ma, T., Jiang, L., Yan, F., Yi, J., Hu, Y., Liao, Y. & Xiao, H. (2020). COVID-19: Challenges to GIS with Big Data. Geography and Sustainability. [Online]. 1 (1). pp. 77–87. Available from: https://doi.org/10.1016/j.geosus.2020.03.005.
