Contents

Executive Summary
Problem Definition
  The Big Data Problem
    Volume
    Velocity
    Variety
    Veracity
    Value
Solution Description
  Possible Solutions
  Cloud Computing in the Context of Big Data
    Software as a Service
    Features in Relation to the Big Data Problem
  Plans for the Development and Implementation of Cloud Computing
    The Connection Between Its Development and Implementation
    Energy Efficiency
    Reliability
    Education
Evaluation
  Advantages of Cloud Computing as a Solution to the Big Data Problem
  Results from Implementing the Solution
    Mapping Glacier Changes Using Clustering on Cloud Computing
    A Modular Software Architecture
    A Spatiotemporal Indexing Approach
    Recording Urban Land Dynamic by Cloud Computing with Landsat Series
  Disadvantages of Cloud Computing as a Solution to the Big Data Problem
    Security
    Privacy
  The Next Step
Conclusion
References
Appendix
Executive Summary
The enhancement of data acquisition tools set the stage for the Big Data problem, defined by volume, velocity, variety, veracity and value. There is unanswered demand for wide-scale implementation of software capable of efficiently conducting broad integrative analysis of global climate change data. The lack thereof limits insightful decision-making and, in turn, restricts optimal global climate change action.
While there is a wide array of possible solutions that all address the issue, the best solution to the Big Data problem is cloud computing. Its service-oriented nature brings ease of use, which helps guide the transition to full adoption. Cloud computing is capable of fostering broad integrative analysis because it is all-encompassing in providing the strengths demanded by the Big Data problem's defining characteristics. Specifically, it offers elasticity, pooled resources, on-demand access, self-service and pay-as-you-go pricing. Further, its implementation depends on how well it can adapt, become more energy efficient, increase in reliability and spread through education.
Finally, in comparison to other proposed solutions, cloud computing proves to be optimal. A series of case studies shows that its unique capabilities bring the potential to derive key insights necessary to make substantial progress in mitigating climate change. While a thorough evaluation reveals that cloud computing provides a valuable starting point for addressing the problem, additional research is needed to understand how to keep improving its efficiency. The next step addresses the main disadvantage of cloud computing and paves the way for Big Data to continue creating opportunities out of the challenges brought forth by innovation.
Problem Definition
The Big Data Problem
The continuously expanding capacity to measure physical parameters of the Earth sparks
the Big Data problem in climate science. Since traditional computing software does not scale to the same extent as incoming data, it inefficiently conducts broad integrative analysis. Therefore, the Big Data problem refers to the insufficient implementation of technology that efficiently helps store, manage, process, analyze, present and utilize the data to inform optimal action.
The Big Data problem is characterized by the data’s volume, velocity, variety, veracity and value.
Volume
The growing volume of data results in extremely large data sets. For example, by 2030, the National Aeronautics and Space Administration (NASA) alone will have produced 300+ petabytes of climate data that differs in format, spatiotemporal resolution and study purpose (Yang et al. 2017a). Fundamentally speaking, traditional computing infrastructure does not scale to the capacity necessary to synthesize Big Data, given that the majority of traditional tools operate on only one computer (Yang et al. 2017b). Further, since the Big Data characteristics function in tandem, the volume exacerbates the other computing issues.
Velocity
For instance, considering that velocity is based upon a measurement of volume in relation to time, the high volume of data decreases the speed of integration. Even before analyzing intricate spatiotemporal relationships, it takes nearly three hours to read one terabyte of data using a single computer with a hard drive read speed of 100 megabytes per second (Li et al. 2016), as the calculation below illustrates. Additionally, timeliness in the transmission of data to decision-makers cannot be achieved using traditional computing technology. The conventional strategy of storing the data in a centralized repository and, after the fact, moving it to specialized computing facilities for analysis is no longer efficient (Li et al. 2016). However, there is a high demand for real-time data analysis of natural and man-made disasters (Yang et al. 2015). These disasters are only going to intensify and become more frequent as climate change worsens. It is illogical for scientists to use tools that operate too slowly in a climate crisis calling for a sense of urgency.
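The roughly three-hour figure cited from Li et al. (2016) can be checked with simple arithmetic. The short sketch below is a back-of-the-envelope verification only, using decimal units (1 TB = 10^12 bytes, 100 MB/s = 10^8 bytes per second); it is not drawn from the cited study.

```python
# Back-of-the-envelope check of the sequential read time for one terabyte of data
# on a single disk reading at 100 megabytes per second (decimal units assumed).
data_volume_bytes = 1 * 10**12          # 1 TB
read_speed_bytes_per_s = 100 * 10**6    # 100 MB/s

seconds = data_volume_bytes / read_speed_bytes_per_s
print(f"{seconds:.0f} s = {seconds / 3600:.2f} h")   # 10000 s, roughly 2.8 hours
```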
Variety
Additionally, given the variety of factors contributing to the change in climate, traditional computing infrastructure cannot make sense of Big Data's complexity. Variety refers to the diverse data forms in which structural data are archived (Yang et al. 2017a). However, traditional machine learning is accustomed to using attribute-value data as input to the majority of models (Faghmous and Kumar 2014). This is concerning, considering that a multitude of climate phenomena cannot be represented in attribute-value form. For example, a hurricane evolves as a pattern over a deep spatiotemporal span. Thus, scientists cannot depict its occurrence with a binary value, as it does not simply appear and then disappear. Instead, a storm gradually strengthens, plateaus, and dissipates (Faghmous and Kumar 2014). Other practical examples include detecting temperature anomalies in the global climate system and identifying geographical regions with similar climate patterns (Yang et al. 2017a). The computational requirements for deriving data on regional and global carbon, water and energy conditions from multi-sensor and multi-temporal datasets are too intense for existing analytical algorithms on a single computer to handle (Yang et al. 2015). A lack of understanding of the interconnectivity of geographic phenomena restricts informed predictions about climate change (Yang et al. 2015).
Further, in terms of variety, the inability to visualize 3D/4D Big Data with traditional computing software has been a major challenge that limits the ability to forecast future conditions. The five main functionalities for Big Data visualization are: (i) interactive graphics; (ii) integrated, intuitive and approachable visual analytics; (iii) web-based interfaces to preview or filter data before visualization; (iv) in-memory processing; and (v) easily distributed insight via mobile devices and web portals (Li et al. 2016). Designing and developing these functionalities is difficult due to the necessity of bringing together multiple data sources as well as the high spatial resolution and high dimensionality of geospatial data (Li et al. 2016). Visualization depends on sound data organization to facilitate access to advanced visualization algorithms for 3D/4D data and to display the phenomena described by such data, revealing otherwise hidden patterns and bringing to light unknown correlations. The insufficiency of traditional software restricts the ability to obtain fresh approaches to solving complex problems (Li et al. 2016).
Veracity
The Big Data problem interferes with cross-domain collaborations in terms of its veracity (Li et al. 2016), which is defined as the diversity in quality and trustworthiness of the data (Yang et al. 2017b). The modern difficulty involved in cultivating fine-tuned collaboration in broad integrative analysis can be understood by considering that data quality includes four aspects: accuracy, completeness, redundancy and consistency (Yang et al. 2015). Since the intrinsically complex and heterogeneous nature of Big Data makes data accuracy and completeness difficult to recognize and monitor, there is a major risk of false discoveries amongst collaborators (Yang et al. 2015). Additionally, in terms of completeness, Big Data is stored and managed in geographically dispersed locations without a way to consolidate the data collected, stored and shared by dispersed providers, organizations and governmental agencies all over the world (Yang et al. 2015). Data redundancy control should also be conducted in real time, at the point of data collection, to enhance subsequent stages of the analytics (Yang et al. 2015). However, as mentioned earlier, traditional software is not capable of achieving the required timeliness. Finally, considering the data frequently changes as it is shared with many collaborators, ensuring data consistency and integrity is a challenge (Yang et al. 2015). At the same time, it is important to support input from experts in different domains to gain a better understanding of the data and, in turn, make wise decisions (Yang et al. 2015). Therefore, making collaboration high-quality and trustworthy is required to inform optimal global climate change action.
Value
Considering how Big Data contributes to the mitigation of global climate change aids in understanding the Big Data problem. By cultivating knowledge discovery of potential relationships and causal inferences, Big Data contributes to prediction. Accurate forecasts can aid adaptive policy-making in relation to climate change, whilst the value creation feature of Big Data puts emphasis on optimizing (Hassani et al. 2019). Being able to understand and predict by drawing inferences across different cases is relatively straightforward (Hassani et al. 2019). However, optimization requires Big Data analytics tools with adequate modeling and infrastructure so as to maximize performance, efficiency and utility and achieve sustainable development (Hassani et al. 2019). Without fully implementing updated software to conduct Big Data analysis, the potential value of the data cannot be fully leveraged and is ultimately lost. While it is clear that replacement of traditional broad integrative analysis tools is necessary, there are many possible solutions.
Solution Description
Possible Solutions
The Big Data problem can be tackled with a range of possible solutions illustrated in Appendix A (Yang et al. 2017b). For instance, a distributed file system (DFS) stores and manages data in a way that makes it convenient to share information among users, while a non-relational (NoSQL) database does not require a fixed schema, which makes it particularly supportive of heterogeneous data and easy to scale. Further, by compiling data into a minimal set of metrics so that users can rapidly locate and access it in a database, indexing improves storage and management while also enhancing transfer, processing and analysis. Alternatively, the Hadoop MapReduce system is a programming model that additionally improves efficiency in processing, analysis, visualization, integration, and architecture; a minimal sketch of the model follows this paragraph. Moreover, by logically depicting relationships, structuring data in a specific way that adds meaning and integrates it, semantics is a useful solution because it is uniquely effective at satisfying demands for security/privacy as well as quality. Additionally, remote collaboration's capacity to facilitate communication and productivity among a dispersed team of global climate change scientists makes it a particularly useful solution (Yang et al. 2013). These strategies all contribute to climate science's goals of optimizing production and reducing GHG emissions.
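To make the MapReduce programming model mentioned above more concrete, the minimal sketch below expresses a per-station maximum-temperature computation as a map step, a shuffle step and a reduce step, the same pattern Hadoop executes in parallel across many machines. The station names and readings are hypothetical, and the sketch runs on a single machine purely for illustration.

```python
from collections import defaultdict

# Hypothetical (station, temperature) records standing in for a large climate dataset.
records = [("GISS_01", 14.2), ("GISS_01", 15.8), ("AIRS_07", 21.3), ("AIRS_07", 19.9)]

# Map step: emit a (key, value) pair for every input record.
mapped = [(station, temp) for station, temp in records]

# Shuffle step: group all values that share the same key.
grouped = defaultdict(list)
for station, temp in mapped:
    grouped[station].append(temp)

# Reduce step: collapse each group to a single result (here, the maximum reading).
reduced = {station: max(temps) for station, temps in grouped.items()}
print(reduced)  # {'GISS_01': 15.8, 'AIRS_07': 21.3}
```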
Further, tactics have evolved from early High Performance Computing (HPC), grid computing, peer-to-peer computing, and cyberinfrastructure to the recent cloud computing (Yang et al. 2013). Cloud computing provides the most well-rounded solution for Big Data storage and computational requirements (Hassani et al. 2019). It provides outstanding quality while addressing the roots of the Big Data problem in climate science.
Cloud Computing in the Context of Big Data
Software as a Service
In the midst of Big Data, one of the reasons cloud computing is particularly valuable is that it offers scientists access to distributed computing as a worldwide service (Yang et al. 2013). The National Institute of Standards and Technology (2022) defines Software as a Service (SaaS) as the capability to access applications running on cloud infrastructure from various client devices, typically through a thin interface such as a web browser, without managing the underlying servers. Any solution delivered as a service, defined as a unified approach to outsourcing that integrates disparate systems, is also cost-efficient (Yang et al. 2017b). Additionally, considering cloud computing is difficult to understand and its implementation is quite abstract, a crucial aspect of SaaS is that scientists do not need to comprehend application deployment details (Yang et al. 2017a). For example, scientists do not need to devote time and energy to understanding where the hardware is located or how the application is configured (Yang et al. 2017a). Cloud computing can supply platforms for running models so that geoscientists can concentrate on the content without taking into account the underlying mechanism carrying out the computational tasks (Yang et al. 2017a).
Features in Relation to the Big Data Problem
Appendix B shows cloud computing is capable of fostering broad integrative analysis because it contains a specific array of features demanded by the volume, velocity, variety, veracity and value that define Big Data (Yang et al. 2017b). Cloud computing is a new computing paradigm for delivering computation as a fifth utility (after water, electricity, gas and telephony) with features of elasticity, pooled resources, on-demand access, self-service and pay-as-you-go (Yang et al. 2017b). For instance, the data volume is processed with the large pooling of resources (Yang et al. 2017b). Users can easily upload and store public data in the Microsoft Cloud while processing through the Windows Azure Platform (Yang et al. 2013). Also, in terms of velocity, cloud computing's elasticity and on-demand features enable it to rapidly synthesize data, which is a key asset in the measurement of software analytic system maturity (Yang et al. 2015). Additionally, the variety of multi-sourced inputs is addressed by the elasticity, pooled resources and self-service advantages (Yang et al. 2017b). It offers easy-to-use interfaces so that domain specialists without advanced technical knowledge can manipulate Big Data (Yang et al. 2013). Crucially, the veracity of Big Data is relieved by self-service selection of the best-matched services and pay-as-you-go cost modeling (Yang et al. 2017b). Finally, cloud computing brings value, represented as accurate forecasting with high resolution, justifiable cost and customer satisfaction, through its on-demand, elasticity and pay-as-you-go features (Yang et al. 2017b). A wide-scale implementation of cloud computing would provide unlimited data storage and management, simplify infrastructure, accelerate data processing, improve analysis, and reduce cost (Sebestyén et al. 2021).
Plans for the Development and Implementation of Cloud Computing
The Connection Between Its Development and Implementation
The degree of growth in cloud computing implementation will be shaped by how well it develops. Evolving visions for the geosciences and Digital Earth will cultivate more demand for cloud computing advancements and, in turn, increase its implementation (Yang et al. 2017b). However, while cloud computing holds immense potential to enhance the impact of data-intensive applications, several barriers interfere with transitioning from traditional computing to cloud computing. For the purpose of developing cloud computing, research is needed to understand current applications in regard to changing patterns in Big Data. A thorough model is necessary to forecast system behavior as usage patterns change and working loads evolve (Yang et al. 2017b). For instance, as mentioned earlier, the majority of geoscientists do not have the parallel programming skills required to determine when and how to parallelize an application task. Implementation would therefore be further supported by a parallel programming language that automatically generates parallelization code with minimal programming work, for example by dragging and sketching various modules or clicking a group of buttons (Yang et al. 2013). In order to leverage the complicated benefits of cloud computing, it is necessary to simplify it so that its usage is easy to implement and sustainable.
Energy Efficiency
It also makes synergistic sense to output as minimal a carbon footprint as possible, considering that the investigation of how cloud computing can optimally process large quantities of climate change datasets is ultimately for the purpose of mitigating global climate change. However, high energy consumption is a significant hurdle to making sustainable computing a reality in the context of large-scale data centers (Yang et al. 2017b). Algorithms for activation of cloud computing systems need to be examined in the context of energy efficiency and sustainability (Yang et al. 2017b). Research to tackle this issue is beginning from different approaches, including hybrid cloud solutions, cloud cost modeling and intelligent scaling (Yang et al. 2017b). Notable efforts, including GreenHDFS, GreenHadoop and Green Scheduling, have been proposed to tackle this issue and need to be incorporated into its implementation (Yang et al. 2017b). It is also important to examine the compatibility of on-site renewable energy generation with the power demands of a typical cloud computing facility. Cloud service providers are typically cheaper and more efficient than traditional data centers, such that the transition to cloud computing between 2021 and 2024 should prevent at least 629 million metric tons of CO2 from entering the atmosphere (Francis 2021), and the inclusion of renewable energies would drastically reduce total overall emissions. Cloud computing needs to pair with energy-efficient infrastructure, as the latter is going to be a growing sector in the coming years.
Reliability
Additionally, it is necessary to devote attention to ensuring cloud computing's reliability. Scientists have come up with an intrusive analyzer as part of cloud computing that detects interesting events, such as task failure (Yang et al. 2017b). It is crucial to incorporate the intrusive analyzer component into the solution, particularly considering that cloud computing is under scrutiny as it is surrounded by other possible solutions in its early adoption phase.
Education
Finally, the paradigm conveyed in this report can alter how society works, thinks and lives insofar as Big Data concepts are made a priority in formal education arenas. It is essential to enhance education related to Big Data and cloud computing by strengthening Big Data and spatiotemporal thinking education initiatives from K-12 onward to professional training (Yang et al. 2017b). Teaching the various components of Big Data and recognizing their impacts, from analytics to intelligence, will support the development of understanding of Big Data, cloud computing and their applications (Yang et al. 2017b). It will prepare the future workforce to realize the potential power held in the ability to efficiently conduct broad integrative analysis of Big Data and lead to new insights that will inform optimal climate change action.
Evaluation
In order for the solution to be successful, it needs to address the elements that characterize Big Data: volume, velocity, variety, veracity, and value. Cloud computing's advantages set it apart from the other possible solutions. The examples of implementation reveal that cloud computing is not only theoretically a stellar strategy in itself, but is also successful in practice and can be used to enhance other solutions. However, as Appendix A shows, cloud computing does have a disadvantage in terms of its security and privacy issues. Finally, a research agenda is necessary in order to reap maximum benefits from cloud computing and ultimately take the most effective global climate change action.
Advantages of Cloud Computing as a Solution to the Big Data Problem
Cloud computing provides a flexible stack of massive computing services in a scalable manner at a low cost (Yang et al. 2017b). A wide-scale adoption of cloud computing as a service in climate science would lower the barriers to risk and organizational change, foster innovation and experimentation, facilitate technology transfer, provide the agility required to meet increasing and changing needs, and, crucially, increase the capacity to engage communities in the construction of new capabilities (Schnase et al. 2014). Cloud computing not only provides the computing infrastructure to support Big Data but also provides a tested business model to sustain Big Data innovation (Yang et al. 2017b). Its implementation is a sound strategy for reducing cost even independent of its superior capabilities. Taking the latter into account further confirms that cloud computing is a valuable solution to inform optimal climate change action.
The characteristics of Big Data present the need for implementation of technology containing attributes conducive to enhancing the critical success factors of broad integrative data analysis, including storage, transfer, management, processing, analysis, visualization, integration, architecture, security/privacy, quality, and cost/energy efficiency. Appendix B reveals that each of the possible solutions individually fulfills some of the critical success factors. Nonetheless, as Appendix B shows, cloud computing is comparatively advantageous, satisfying the highest number of critical success factors. Although remote collaboration satisfies the same number of critical success factors, cloud computing is superior in quality and can improve upon its own inferior standing in security and privacy. Examples of cloud computing's implementation reveal how remarkable insights are derived from its high-quality methodologies that enhance broad integrative analysis of Big Data in climate science.
Results from Implementing the Solution
Mapping Glacier Changes Using Clustering on Cloud Computing
For the purpose of overcoming the computational barriers faced in Remote Sensing Big Data analysis, researchers proposed to leverage the capabilities of cloud computing infrastructure for processing particularly large datasets (Ayma et al. 2019). The researchers studied implementing Expectation Maximization (EM) algorithms along with K-Means algorithms as distributed clustering solutions. They extended the functionalities of the InterCloud Data Mining Package, a collection of distributed classification strategies, by using the clustering algorithms to support the processing of multispectral images (Ayma et al. 2019).
To validate the added clustering functions on cloud computing infrastructure, the researchers investigated the Ausangate glacier in the Andes Mountains, using multi-temporal Remote Sensing analysis to map changes in the environment (Ayma et al. 2019). Their evaluation of results is based upon the thematic accuracy and the computational performance achieved by their solution. They evaluated thematic accuracy by comparing the glacier areas automatically detected by the clustering approaches with manually selected ground truth data. To determine computational performance, they compared the computational load of conducting the clustering processes sequentially with running them in a distributed fashion, using a local mode and a cluster configuration over a cloud computing infrastructure. They concluded that the K-Means algorithm performed better than the EM algorithm in both thematic accuracy and computational performance (Ayma et al. 2019). The K-Means algorithm achieved higher accuracy and higher speedups when deployed over the distributed infrastructures (Ayma et al. 2019). Their approach offers a flexible and scalable solution functioning over a cloud computing environment, enabling end-users to perform Remote Sensing Big Data analysis over similarly distributed infrastructures (Ayma et al. 2019).
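The clustering idea itself can be illustrated with a minimal, single-machine sketch that is not the authors' InterCloud implementation: K-Means from scikit-learn applied to a synthetic multispectral scene, treating each pixel's band values as a feature vector. The band count, cluster count and random data are assumptions made only for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic multispectral scene: 100 x 100 pixels, 4 spectral bands (illustrative values).
rng = np.random.default_rng(0)
image = rng.random((100, 100, 4))

# Reshape so each pixel becomes one sample whose features are its band values.
pixels = image.reshape(-1, 4)

# Cluster pixels into 3 spectral classes (e.g., ice, rock, vegetation in a glacier scene).
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(pixels)

# Map the cluster labels back onto the image grid for inspection or mapping.
classified = labels.reshape(100, 100)
print(classified.shape, np.bincount(labels))
```

In the distributed setting described above, the same per-pixel computation is partitioned across cluster nodes rather than run in a single process.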
A Modular Software Architecture
Scientists also proposed a modular software architecture that enables processing of Big Data in the cloud environment and supports multiple algorithm design paradigms, such as in-memory computing and agent-based programming (Krämer and Senner 2015). The software contains a web-based user interface where domain experts can use a domain-specific language to define high-level processing workflows (Krämer and Senner 2015). It uses declarative and procedural knowledge, encoded in rules according to the constraints defined by the user, to produce a processing chain detailing how the workflows are executed on a particular cloud infrastructure (Krämer and Senner 2015). The services communicate with each other via a DFS that is scalable and fault-tolerant (Krämer and Senner 2015). They specialize in processing the variety in Big Data and do not depend on only one specific cloud infrastructure; instead, they support many models. In tandem with the power to steer the processing through workflows written in a domain-specific language, this software has a flexible and adjustable architecture and is useful for leveraging the processing capability of distributed computing environments for high-volume geospatial data (Krämer and Senner 2015).
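The translation from a declarative, domain-level workflow into a concrete processing chain can be conveyed with a deliberately simplified sketch; the step names and rule table below are hypothetical and are not taken from Krämer and Senner's system.

```python
# Hypothetical rule table: each domain-level step expands into the concrete
# operations a cloud back end would run, in order.
RULES = {
    "load":     ["fetch_from_dfs"],
    "resample": ["reproject", "interpolate"],
    "classify": ["train_model", "predict"],
}

def build_processing_chain(workflow):
    """Expand a high-level workflow into an ordered chain of concrete operations."""
    chain = []
    for step in workflow:
        chain.extend(RULES[step])
    return chain

# A domain expert's high-level workflow, written with no knowledge of the back end.
print(build_processing_chain(["load", "resample", "classify"]))
# ['fetch_from_dfs', 'reproject', 'interpolate', 'train_model', 'predict']
```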
A Spatiotemporal Indexing Approach
Cloud computing can also be implemented in other ways to enhance existing strategies by making them more adaptable to the demands of Big Data. MapReduce has become a popular choice for cloud providers offering data analysis because it provides high scalability (Li et al. 2016), and many traditional algorithms and data processing tasks built for a single-machine environment are being transferred to the MapReduce platform (Yang et al. 2017). As part of MapReduce, Big Data is stored directly in the Hadoop Distributed File System. However, since Hadoop is not equipped to handle spatiotemporal data, researchers looked into how to adjust the computing resources to successfully manage the dynamic geo-processing workload. They proposed an indexing approach to process Big Data with MapReduce (Li et al. 2016).
Specifically, the researchers proposed a spatiotemporal index to unify the logical array-based data model (space, time, and variables) and the physical data layout (node, file, and byte), which allows data to be retrieved rapidly. The researchers also presented a comprehensive framework to automatically scale the Hadoop cluster in the cloud (Li et al. 2016). In doing so, they could designate the correct amount of computing resources to match the dynamic geo-processing workload (Li et al. 2016). The researchers used a Digital Elevation Model interpolation as a prototype system to show that the proposed scaling mechanism was a success (Li et al. 2016). Results demonstrated that the auto-scaling technology had the capability to (1) reduce computing resource utilization by 80% while providing performance similar to a full-powered cluster; and (2) manage the peak processing workload effectively by automatically increasing the computing resources to guarantee that processing is completed in a reasonable timeframe (Li et al. 2016). The cost-effective auto-scaling approach is valuable for optimizing the performance of geospatial applications to address Big Data challenges in GIScience (Li et al. 2016). The researchers thus created a way for MapReduce to achieve high data locality and a more balanced workload (Li et al. 2016).
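The core idea of the index, mapping a logical (variable, time, space) request onto the physical (node, file, byte) layout so that only the relevant bytes are read on the node that already holds them, can be sketched as a simple lookup structure. The entries below are hypothetical and greatly simplified relative to the published design.

```python
# Hypothetical, simplified spatiotemporal index: logical coordinates -> physical location.
# Keys are (variable, time step, spatial tile); values are (node, file, byte offset, length).
index = {
    ("temperature", "2015-07-01", "tile_03"): ("node-2", "merra_2015_07.nc", 409600, 8192),
    ("temperature", "2015-07-02", "tile_03"): ("node-5", "merra_2015_07.nc", 417792, 8192),
}

def locate(variable, time_step, tile):
    """Return where the requested array chunk physically lives, so a map task can be
    scheduled on the node that already holds the bytes (high data locality)."""
    return index.get((variable, time_step, tile))

print(locate("temperature", "2015-07-01", "tile_03"))
# ('node-2', 'merra_2015_07.nc', 409600, 8192)
```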
The spatiotemporal indexing approach via MapReduce was evaluated with NASA's climate reanalysis dataset, the Modern-Era Retrospective Analysis for Research and Applications (MERRA). MERRA used a combination of data acquired on the ground with basic numerical models to create a globally, temporally and spatially consistent integrative analysis of 26 key climate variables (Schnase et al. 2014). Results demonstrated that the spatiotemporal indexing approach is aligned with the native, array-based data (Schnase et al. 2014). It also showed that, even while using the same computing cluster, the spatiotemporal index accelerates querying and processing by approximately 1000 percent (Schnase et al. 2014). Additionally, it demonstrated that computationally extensive tasks can be strategically broken down into smaller operations to be processed in parallel with high data locality, which also leads to a well-balanced workload throughout the nodes (Schnase et al. 2014). The MERRA analytics service enabled total life cycle management of gigantic scientific repositories (Hassani et al. 2019). Finally, MERRA analytics revealed that cloud computing's capacity to engage communities in the construction of new capabilities is perhaps the most important link between cloud computing and Big Data (Schnase et al. 2014). That community focus extends to cities, where cloud computing is also used to track urban land dynamics.
Recording Urban Land Dynamic by Cloud Computing with Landsat Series
In order to achieve strategic climate change adaptation through sustainable urban development, it is essential to gain a thorough understanding of land cover and land use dynamics in urban regions. Rapid urban growth has sparked massive problems such as pollution, carbon emissions, and biodiversity loss. At the core of global climate change, cities are high-impact land cover land use change (LCLUC) hotspots (Dong et al. 2020).
Researchers used Landsat images to create a Google Earth Engine (GEE) application for high-resolution urban LCLUC mapping with a classification scheme. The change detection showed extensive urban expansion from 2000 to 2019 (Dong et al. 2020). The use of Landsat imagery was crucial in enabling wall-to-wall mapping of temporal land dynamics at a high spatial resolution (Dong et al. 2020). The scientists also quantified the annual LCLUC in Changchun and detected the abrupt turning points of LCLUC. They were able to classify annual LCLUC in Changchun with high accuracy (Dong et al. 2020). The scientists also gathered ancillary data to reveal that socio-economic status is correlated with the LCLUC rate (Dong et al. 2020). The study was part of the initial effort to understand the drivers of change with a particularly accurate, scalable method to support better decision-making for sustainable urban development and climate change mitigation (Dong et al. 2020). Because the method relies primarily on GEE for mapping urban land, it is transferable to other users. However, in order for it to function at maximum capacity, mechanisms need to be in place to protect the data.
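For readers unfamiliar with GEE, the sketch below shows the general shape of a supervised Landsat classification in the GEE Python API. It is illustrative only and is not the authors' workflow, classification scheme or 15-m processing: the collection ID, band names, region, tree count and the training-sample asset path are all assumptions, and an authenticated GEE account is required for ee.Initialize() to succeed.

```python
import ee
ee.Initialize()  # assumes prior GEE authentication and project setup

# Rough area of interest near Changchun (illustrative coordinates and buffer).
region = ee.Geometry.Point(125.32, 43.88).buffer(20000)
bands = ['SR_B2', 'SR_B3', 'SR_B4', 'SR_B5', 'SR_B6', 'SR_B7']

# Annual median composite from a Landsat 8 surface-reflectance collection (assumed ID).
composite = (ee.ImageCollection('LANDSAT/LC08/C02/T1_L2')
             .filterBounds(region)
             .filterDate('2019-01-01', '2019-12-31')
             .median()
             .select(bands))

# Hypothetical labeled training points stored as a user asset.
training_points = ee.FeatureCollection('users/example/changchun_lcluc_samples')
samples = composite.sampleRegions(collection=training_points,
                                  properties=['class'], scale=30)

# Train a random-forest classifier and classify the composite.
classifier = ee.Classifier.smileRandomForest(50).train(
    features=samples, classProperty='class', inputProperties=bands)
classified = composite.classify(classifier)
```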
Disadvantages of Cloud Computing as a Solution to the Big Data Problem
Security
Big Data brings security issues in terms of encryption standards, methodologies and algorithms that cloud computing is not equipped to navigate. Previous studies of data encryption that exclusively investigate small-to-medium-size data do not apply to the large-scale aspect of Big Data (Yang et al. 2017b). Current security mechanisms are incapable of providing a self-aware approach to defending against security attacks (Abdulsalam and Hedabou 2021). In addition, data security policies designed to align with the structured data contained in traditional database management systems are ineffective at handling the extreme variety in Big Data (Yang et al. 2017b). Standard definitions of cloud policies have not been sufficiently established and may therefore conflict with each other, impacting the enforcement of confidentiality (Abdulsalam and Hedabou 2021). With virtualized storage, data owners' concerns range from not being able to fully guarantee integrity to doubting data availability (Yang et al. 2017b). Therefore, it is essential to create policies for data access control that take the needs of Big Data into account. Scientists will not be nearly as motivated to engage in broad integrative data analysis to inform optimal climate change action insofar as they do not feel safe doing so.
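One common partial mitigation, encrypting data on the client side in manageable chunks before it ever reaches cloud storage, can be sketched as follows. The chunk size and key handling are illustrative assumptions; the sketch is not drawn from the cited sources and says nothing about the policy and access-control issues discussed above, which remain the harder problem at Big Data scale.

```python
from cryptography.fernet import Fernet

# Illustrative client-side encryption: the file is encrypted chunk by chunk
# before upload, so the cloud provider only ever stores ciphertext.
key = Fernet.generate_key()      # in practice the key remains with the data owner
fernet = Fernet(key)

CHUNK_SIZE = 64 * 1024 * 1024    # 64 MB chunks (arbitrary choice)

def encrypt_file_in_chunks(path):
    """Yield encrypted chunks of a large file, ready to be uploaded one at a time."""
    with open(path, "rb") as source:
        while True:
            chunk = source.read(CHUNK_SIZE)
            if not chunk:
                break
            yield fernet.encrypt(chunk)
```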
Privacy
Further, the unprecedented networking among devices and computing platforms brings privacy issues into consideration. Specifically, it is unclear where an individual's location, behavior and transactions are digitally recorded (Yang et al. 2017b). The privacy concerns expose a gap between traditional regulations and Big Data, which supports the case for new policies (Yang et al. 2017b). Cloud computing's disadvantage simply prompts the formation of a conquerable agenda for further research considering, despite its flaw, the technology has proven to be immensely successful.
The Next Step
Therefore, the next step to thoroughly implement cloud computing as the wide-scale tool for sharing data across various enterprises and conducting broad integrative analysis of climate change data is to improve its security and privacy. Initially, it would be useful to establish privacy-preserving protocols that strengthen and maintain user confidentiality (Abdulsalam and Hedabou 2021). Cloud computing will remain vulnerable to security and privacy threats without the practical adoption of adaptive mechanisms for an efficient client and user experience (Abdulsalam and Hedabou 2021). Crucially, secure adaptive techniques are ubiquitous and can be adopted at any stage of an underlying technology, from hardware and software to the core computing infrastructure (Abdulsalam and Hedabou 2021). Secure adaptiveness also implies that the system can self-protect during multiple attacks or from a malicious user exploring multiple vulnerabilities (Abdulsalam and Hedabou 2021). The introduction of adaptive systems will also lower operating costs in complex environments and under uncertainty (Abdulsalam and Hedabou 2021). Just as the next steps to fully implement this cutting-edge broad integrative analysis tool were derived from its disadvantage, the optimal solution of cloud computing in climate science arose out of the necessity to create a cost-effective answer to the Big Data problem.
Conclusion
The Big Data problem is a challenge that can be leveraged into a valuable opportunity by adapting to the change. The challenge originated from the curiosity of scientists who created the need for vessels to answer their questions. It prompted inventors to see the challenge as a unique opportunity to come up with tools that would empower humans to measure parameters of the Earth beyond anything they could imagine. As a result of the inventors seizing the opportunity, they created the Big Data challenge in climate science. However, considering cloud computing already exists to integrate the data, the challenge is clearly a life-changing opportunity to mitigate global climate change.
The focus just needs to shift from gathering the best data to implementing innovative processes that will efficiently integrate the information humans already hold. It is essential to utilize all of the existing data analysis tools to their fullest potential in order to derive the highest level of insight. The urgency of the issue in the face of global warming calls for implementing the most efficient, cost-effective methods to understand how to take action. Getting a handle on the data systems that inform action amidst climate change is a way of clearly orienting collective consciousness towards a focus that will protect the planet.
References
Abdulsalam, Yunusa Simpa and Mustapha Hedabou. 2021. “Security and Privacy in Cloud Computing: Technical Review.” Future Internet 14, no. 11 (December): 1-27. https://doi.org/10.3390/fi14010011.
Ayma, V., Beltrán, C., Happ, P., Costa, G., and Feitosa, R. 2019. “Mapping Glacier Changes Using Clustering Techniques on Cloud Computing Infrastructure.” The International Archives of the Photogrammetry, Remote Sensing, and Spatial Information Sciences 42, no. 2/W16 (September): 29–34. https://doi.org/10.5194/isprs-archives-XLII-2-W16-29-2019.
Dong, Yulin, Zhibin Ren, Yao Fu, Zhenghong Miao, Ran Yang, Yuanhe Sun and Xingyuan He. 2020. “Recording Urban Land Dynamic and Its Effects during 2000-2019 at 15-m Resolution by Cloud Computing with Landsat Series.” Remote Sensing 12 (July): 1-19. https://doi.org/10.3390/rs12152451.
Faghmous, James and Vipin Kumar. 2014. “A Big Data Guide to Understanding Climate Change: The Case for Theory-Guided Data Science.” Big Data 2, no. 3 (September): 155-163. https://doi.org/10.1089/big.2014.0026.
Francis, Taylor. 2021. “The Next Big Cloud Competition is the Race to Zero Emissions.” World Economic Forum, September 20. https://www.weforum.org/agenda/2021/09/the-next-big-cloud-competition-is-the-race-to-zero-emissions/.
Hassani, Hossein, Xu Huang, and Emmanuel Silva. 2019. “Big Data and Climate Change.” Big Data and Cognitive Computing 3, no. 1 (February): 1-17. https://doi.org/10.3390/bdcc3010012.
Krämer, Michel and Ivo Senner. 2015. “A Modular Software Architecture for Processing of Big Geospatial Data in the Cloud.” Computers and Graphics 49 (June): 69–81. https://doi.org/10.1016/j.cag.2015.02.005.
Li, Zhenlong, Chaowei Yang, Kai Liu, Fei Hu, and Baoxuan Jin. 2016. “Automatic Scaling Hadoop in the Cloud for Efficient Process of Big Geospatial Data.” International Journal of Geo-Information 5, no. 173 (September): 1-14. https://doi.org/10.3390/ijgi5100173.
Li, Zhenlong, Fei Hu, John L. Schnase, Daniel Q. Duffy, Tsengdar Lee, Michael K. Bowen, and Chaowei Yang. 2016. “A Spatiotemporal Indexing Approach for Efficient Processing of Big Array-Based Climate Data with MapReduce.” International Journal of Geographical Information Science 31, no. 1 (January): 17-35. https://doi.org/10.1080/13658816.2015.1131830.
Lushbough, Carol M., Etienne Z. Gnimpieba, and Rion Dooley. 2014. “Life Science Data Analysis Workflow Development Using the Bioextract Server Leveraging the iPlant Collaborative Cyberinfrastructure.” Concurrency and Computation: Practice and Experience 27, no. 2 (February): 408–419. https://doi.org/10.1002/cpe.3237.
National Institute of Standards and Technology. 2022. “Software as a Service (SaaS).” Accessed January 18. https://csrc.nist.gov/glossary/term/software_as_a_service.
Schnase, John, Daniel Duffy, Glenn Tamkin, Denis Nadeau, John Thompson, Cristina Grieg, Mark McInerney, and William Webster. 2014. “MERRA Analytic Services: Meeting the Big Data Challenges of Climate Science Through Cloud-Enabled Climate Analytics-as-a-Service.” Computers, Environment and Urban Systems 61 (January): 198-211. https://doi.org/10.1016/j.compenvurbsys.2013.12.003.
Sebestyén, Viktor, Timea Czvetkó, and Janos Abonyi. 2021. “The Applicability of Big Data in Climate Change Research: The Importance of System of Systems Thinking.” Frontiers in Environmental Science 9 (March). https://doi.org/10.3389/fenvs.2021.619092.
Yang, Chaowei, Manzhu Yu, Fei Hu, Yongyao Jiang, and Yun Li. 2017a. “Utilizing Cloud Computing to Address Big Geospatial Data Challenges.” Computers, Environment and Urban Systems 61 (January): 120-28. https://doi.org/10.1016/j.compenvurbsys.2016.10.010.
Yang, Chaowei, Min Sun, Kai Liu, Qunying Huang, Zhenlong Li, Zhipeng Gui, Yunfeng Jiang, Jizhe Xia, Manzhu Yu, Chen Xu, Peter Lostritto, and Nanying Zhou. 2015. “Contemporary Computing Technologies for Processing Big Spatiotemporal Data.” In Space-Time Integration in Geography and GIScience, 327-351. Netherlands: Springer.
Yang, Chaowei, Qunying Huang, Zhenlong Li, Kai Liu, and Fei Hu. 2017b. “Big Data and Cloud Computing: Innovation Opportunities and Challenges.” International Journal of Digital Earth 10, no. 1 (January): 13-53. https://doi.org/10.1080/17538947.2016.1239771.
Yang, Chaowei, Yan Xu, and Douglass Nebert. 2013. “Redefining the Possibility of Digital Earth and Geosciences with Spatial Cloud Computing.” International Journal of Digital Earth 6, no. 4 (May): 297–312. https://doi.org/10.1080/17538947.2013.769783.