Breath of Clarity

Report Draft Workshop

Hi all,

I attached my Report Draft, thanks for the feedback!

Comment by Kyle Pesek:

Hello Mary,

Please see below for my peer review.

Writer's Purpose

The main argument of this Report Draft is that utilization of Software as a Service (SaaS) can provide comprehensive analysis of large and differing sets of complicated climate data. The paper covered the main challenges posed by the ‘Big Data’ offered by the body of climate change research, how cloud computing has served as the launching pad for Big Data integrative analysis developments, and how the proposed solution of a collaborative framework provides the most efficient methods of Big Data accessibility from high-level considerations. Your thesis statement was clear as the opening to your Executive Summary, “The purpose of this paper is to present a solution to mitigate climate change by specifically improving the efficiency of broad integrative analysis given the applications of data-driven solutions in specific scientific fields”. I would rephrase the beginning of the Thesis to remove the passive language and also change the word ‘specific’ as you already used ‘specifically’ earlier in the sentence.

Research 9/10

Your research was a decent mix of 11 peer-reviewed research papers and 1 industry reference to the technical definition of SaaS. All of your included references were highly specific to these climate data processing considerations. My only suggestion would be to include more citations from more recent articles (from within the last 5 years) as possible replacements for the numerous citations from sources dated 2016 or older. Also, the four Yang et al. references were the most frequently cited pieces throughout your Report. Consider drawing citations from numerous sources, as you did in the Introduction to the Problem subsection, rather than relying on multiple paragraphs that each cite a single source many times. Perhaps more industry reports or media coverage of SaaS developments could prove insightful.

Executive Summary 8/10

The Executive Summary (ES) did a succinct job of covering the main points that your Report Draft dives into in further sections. However, you should provide another round of edits focusing on formal tone. For instance, your 4th sentence of the opening paragraph begins with “From there, I outline the points of insufficiency in traditional data analysis…”. I would reword this to remove the ‘I’ entirely with phrasing possibly like “From there, the points of insufficiency in traditional data analysis are weighed against the benefits provided by Big Data”. The second paragraph also begins with “From there,” which could be changed for variety.

Problem Definition 8/10

This section was the most forceful of the 3 main Report sections due to the consistent narrative thread of the challenges currently posed by Big Data. The four challenges of “volume, velocity, veracity, and variety” are presented in an effective way, but the various solutions covered in the next section leave it unclear how these solutions specifically answer these defined problems in the realm of SaaS handling Big Data from climate research.

Solution Description 7/10

While the Introduction to SaaS in the Context of Big Data subsection covers the advantages offered by SaaS, it devotes no sentences to discussing the obstacles preventing SaaS adoption or difficulties faced post-implementation. As noted in the previous section, your coverage of the 5 solutions of mapping using clustering, using MapReduce, MERRA, modular software, and collaboration should all refer back to the specific problems that they directly or indirectly answer. Otherwise, it is unclear to what capacity any of the solutions are being argued for final incorporation (apart from the Collaboration aspect which does get slightly referenced in the beginning lines of the Evaluation section).

Evaluation 6/10

This section begins by saying “SaaS is the most valuable data analysis tool in the face of Big Data related to climate change (Yang et al. 2017).” However, this should be a statement based upon the evaluated body of presented research that has been synthesized with analysis of its advantages and drawbacks instead of a citation from an individual source. As it stands, the Report does not discuss whether any alternatives exist and where they stand in relation to SaaS adoption. You should also be calling back to the solutions proposed in the previous section so that the Evaluation draws conclusions as to how the analyzed solutions were cohesively included and were not merely tangential.

Genre Awareness 10/10

The Report Draft is well tailored to its audience who are familiar with SaaS concepts. However, the Introduction to the Problem subsection could be more filled-out to make the SaaS concepts more approachable to general novices.

Mechanics, Grammar, Formatting 1/5

There were a few minor citation issues that require further attention. For instance, your Reference section contains two sources that are notated as (Yang et al. 2017), which necessitates differentiating these using the 2017a and 2017b convention specified in Turabian section Additionally, the Table of Contents needs to align with formatting standards as outlined in Turabian section A.2.1.7 (it might help to choose one of the formatting styles shown in Figures A.3 thru A.6). Per the University College Format and Style Requirements (pgs. 9-10), your page numeration in the Header should be blank on the Title page (also, include an actual Title for your report here instead of after the Table of Contents), start with Roman numeral ‘ii’ on the Executive Summary page, Roman numeral ‘iii’ on the Table of Contents, and then starting with Arabic numeral ‘1’ from the Problem Definition onwards. Also, this was uploaded as a .pdf instead of a .doc/.docx file. After copy/pasting the text into another program for a word count, I found the submission to be about 1,000 words short of the 5,000 minimum requirement.

Originality of Thought and Design 6/10

While the concept of using a hosted SaaS resource to efficiently handle and generate value from the ‘Big Data’ associated with climate change datasets strikes me as particularly insightful and original, the flow of this paper is ultimately congested by almost continuous references to the Research from one sentence to the next. Consider consolidating references and citations to an absolute minimum for easier digestion of these points by your audience. Without more points that consist of your own conclusions, the frequency of citations gives off the feeling of a collection of restatements of relevant case studies rather than a firsthand analysis of what the findings from these various research sources mean for the underlying problem.

I am personally interested to see where you take your project from here. As someone working in the renewable energy sector, reliable climate data could better help inform decision-makers across the industry as to how to predict the long-term viability of renewable energy generation sites (particularly wherever wildfires, flooding, and/or sea level rise may become concerns). Good luck on your final weeks of this Program!

My Comment:

Hi Kyle,

Thanks for the thorough feedback.

I clarified the purpose with a new Executive Summary in order to establish a newfound organization for the entire report. I created a problem statement and inserted it as the first paragraph of the executive summary. It more clearly defines the problem by connecting the various components of the problem, and, in turn, the beginning of the purpose. Good point regarding the tone of the executive summary. I changed it accordingly. Here it is:

“Given the large quantity of diverse data related to global climate change, there is unanswered demand for implementation of a scalable software capable of effectively conducting broad integrative analysis. That said, there is a pressing problem of inefficiency regarding analysis of data related to global climate change that is limiting collaboration amongst geoscience experts. As a result, the lack of collaboration presents a problem in fulfilling potential in terms of informing the best strategic action.

The solution to enhance efficiency regarding analysis of data related to global climate change is Software as a Service (SaaS), specifically a cloud computing framework. The service-oriented feature brings ease of use which is beneficial to guiding the transition into its full adoption and successful implementation. Specifically, cloud computing is the most effective solution considering its capability to improve efficiency in facilitating synthesis of the existing large quantity and diverse array of global climate change data. In doing so, its capability to cultivate increased collaboration brings the potential to derive key insights necessary to make massive progress in mitigating climate change. Finally, while a thorough evaluation reveals providing SaaS in the form of cloud computing is a valuable starting point to address the problem, there is a need for additional research to understand how to continue moving forward in improving efficiency of the optimal broad integrative data analysis tool.”

Clarifying Questions: How is the current tone of the Executive Summary given that my audience is the Center for Climate and Energy Solutions? Are there any unclear transitions present in the executive summary? I am curious whether the new executive summary can serve as a sound skeleton for the outline of my entire report.

I then used the Executive Summary to edit the Problem Definition section. Doing so helped me see that I had placed too much focus on explaining the technical details of the traditional data analysis software instead of walking the audience through the logical progression of the problem. Instead of having my problem definition headers be titled the various components of data analysis, I now have headers that are essentially phrases directly from my executive summary’s problem paragraph. That way, I can remember to constantly be looping all of my sub-arguments in the problem definition back to the purpose. I condensed the description of traditional data analysis software to make space for the problem definition to be more filled-out to make the SaaS concepts more approachable to general novices as well as enhance originality of thought and design. Thank you for the suggestion to consolidate references and citations to a minimum for easier digestion of the points. My new focus in revising is eliminating confusion as much as possible by inserting my own, original transitions between the ideas to improve flow.

My next step is to gather information from new sources about how lack of collaboration is leading to inefficiency in mitigating global climate change to expand upon my problem definition. As part of doing so, I will address your feedback about all of my included references being highly specific to these climate data processing considerations. The root of my lack of clarity in the report is that all of the sources were hyper-focused on the software element of the argument. I have gathered information useful in providing me with the context about technical aspects of the topic and was able to organize it in a way that helped me understand details about the inefficiency of traditional software and features of the software I proposed. However, now, I need additional research that supports the connection between my problem and solution sections considering that the logic of my argument makes claims about concepts beyond just the software in itself. Moving forward, I need to make the current body of my report more concise so that I can make space to clearly connect the problem to my solution with new information throughout the paper.

Here is the edited version of the Problem Definition:

“Problem Definition

The Big Data Problem

Increasing Quantity and Diversity of Global Climate Change Data

The rapidly increasing quantity and diversity of global climate change data (Sebestyén et al. 2021) sets the stage for the Big Data problem. The original definition of the Big Data problem is characterized by the four Vs: volume, velocity, veracity and variety (Yang et al. 2017). The evolution of technologies, such as space-borne instruments, has significantly improved the ability to measure physical parameters of the Earth. The capacity scientists have to record physical phenomena is growing from one dimension, to two, three, four and even higher dimensions. Also, scientists can now use active and passive sensors working at a frequency of years, months, days, seconds or even milliseconds to gather important information more frequently and detect parameters that change at different speeds according to their nature. Additionally, scientists have sensors of various spatial resolution and spectral proficiencies to identify various phenomena for different usage as well as collect a wide range of spectral resolution and scope coverage from nanometer to millimeter (Yang et al. 2017). By 2030, NASA alone will have produced 300+ petabytes of climate data that differs in format, spatiotemporal resolution and study purpose (Schanse et al. 2017). The various types of data and formats from different organizations all over the world present a massive problem for computing sciences.

A Massive Problem for Computing Sciences

Fundamentally speaking, the problem is that traditional computing infrastructure does not scale well enough to keep up with Big Data. Standard tools for traditional data sets are not equipped to support analytics for Big Data because the majority of traditional tools only operate on one individual computer (Yang et al. 2017). For example, the computational requirements for deriving data on regional and global carbon, water and energy conditions from multi-sensor and multi-temporal datasets are too much for the capacity of existing analytical algorithms on a single computer to comprehend (Yang et al. 2015). Subsequently, mining interesting climate trends and spatiotemporal patterns from terabytes of data sets is important to climate studies (Li et al. 2016). Practical examples include investigating spatiotemporal distribution of extreme weather events, detecting temperature anomalies in the global climate system, and identifying geographical regions with similar climate patterns (Li et al. 2016). However, there is inefficiency in the broad integrative analysis of data from customarily separate geoscience fields to obtain fresh approaches to solving complex problems.

Moreover, timeliness is a vital factor in many applications that cannot be achieved using traditional computing software. For example, there is a high demand for real-time data analysis of natural and man-made disasters (Yang et al. 2015). The disasters are only going to intensify and become more frequent as climate change worsens. However, even before analyzing intricate spatiotemporal relationships, it takes nearly three hours to read one terabyte of data using a single computer with a 100 megabytes per second hard drive read speed (Li et al. 2016). In addition, the conventional strategy of storing the data in a centralized repository and, after the fact, moving it to specialized computing facilities for analysis is no longer efficient (Li et al. 2016). Essentially, the human species is facing a crisis of survival in the midst of global climate change and electing to use tools that operate too slowly.

Additionally, visualizing 3D/4D Big Data has been a major challenge that limits scientists' ability to comprehend and forecast geographic phenomena for relevant decision making. The five main functionalities for Big Data visualization are: (i) interactive graphics; (ii) integrated, intuitive and approachable visual analytics; (iii) web-based interfaces to preview or filter data before the visualizations; (iv) in-memory processing and (v) easily distributed insight via mobile devices and web portals (Li et al. 2016). Designing and developing these functionalities is difficult because of the fusion of multiple data sources as well as the high spatial resolution and high-dimensionality of geospatial data (Li et al. 2016). Efficient visualization is dependent upon data organizations to facilitate quick access to advanced visualization algorithms of 3D/4D data to display the phenomena described by such data (Li et al. 2016). The variety of Big Data is redefining the data management paradigm, calling for technologies capable of cleaning, storing, and organizing unstructured data in a way that is conducive to assisting decision-making (Yang et al. 2017).

In turn, the volume-based definition of the Big Data challenge now encompasses the data itself, pertinent skills and technologies to help generate, collect, store, manage, process, analyze, present and utilize data, as well as the information and knowledge derived. Therefore, the ultimate Big Data challenge is to systematically leverage datasets that are too complex to handle with traditional data processing application software. Big Data reveals otherwise hidden patterns and brings to light unknown correlations to enhance decision-making. It is critical to efficiently analyze these datasets to recognize geographical regions with similar or disparate climate patterns, identify global temperature anomalies, and investigate spatiotemporal distribution of extreme weather events (Yang et al. 2017) to better understand geographic phenomena which will lead to making better informed predictions regarding issues such as long term climate change (Yang et al. 2015). Therefore, Big Data may bring significant value insofar as the handling of it, from storage to integration, is efficient (Yang et al. 2017). Further, data integration is critical for achieving the 5th V (value) of Big Data through integrative data analysis and cross-domain collaborations (Li et al. 2016).

Obstacles to Collaboration

The difficulty involved in cultivating fine-tuned global collaboration can be understood by considering that data quality includes four aspects: accuracy, completeness, redundancy and consistency (Yang et al. 2015). Since the intrinsic complex and heterogeneous nature of Big Data makes data accuracy and completeness difficult to recognize and monitor, there is a major risk of false discoveries amongst collaborators (Yang et al. 2015). On top of that, in terms of completeness, there is geographically dispersed storage and management of Big Data. So a need for the integrating of data collected, stored and shared by dispersed providers, organizations and governmental agencies all over the world is not being fulfilled (Yang et al. 2015). Additionally, data redundancy control should be conducted in real-time, at the point of data collection to improve efficiency of proximate stages (Yang et al. 2015). However, traditional software is not capable of achieving the required timeliness. Finally, considering the data frequently changes as it is shared with many collaborators, ensuring data consistency and integrity is a challenge (Yang et al. 2015). It is important to support the input from many experts in different domains to have a better understanding of the data and, in turn, make wise decisions (Yang et al. 2015). Therefore, enhancing analysis of a broad scientific problem requires a mechanism that makes collaboration seamless to inform optimal global climate change action.”

Also, thank you for highlighting the need for me to differentiate the Yang et al. 2017 sources. I still need to go through the entire report and do that. It is a big endeavor because, in order to be accurate, I have to go back to the articles and confirm where I got each statistic labeled Yang et al. 2017 from. I will also fix the Table of Contents and page numbers. Thanks for being so detailed in regards to the directions about how to fix the Table of Contents and page numbers.

I just checked the word count and saw it was actually 5,040. So, I disagree with the feedback about it. I did not include the title page, executive summary, table of contents, references or appendix in my word count. Either way, I am changing the paper so much from the Report Draft stage to Final Report stage that I am going to be cutting the word count down heavily to eliminate confusion and make room for new material that helps clarify my purpose.

Your feedback about the solution section is so useful. The advantages of SaaS are adequately covered. Although I explained how SaaS cloud computing works, I did not convey how increased collaboration enhances decision-making and global climate change action. I also did not provide a specific plan for how to implement it because MapReduce is more so an example of cloud computing adoption than a plan of how to make the adoption happen. However, I did put the difficulties faced post-implementation in the evaluation section under the future research agenda which is in accordance with the rubric. After receiving your feedback, I moved the mapping using clustering content and MERRA content to the evaluation section because it is proving the success of cloud computing in solving my problem which is more so evaluating the solution than describing it. With a more fleshed-out description of traditional data analysis software resulting in limited collaboration and how that all relates to climate change within the Problem Definition, it is going to be easier to refer back to which specific problems collaboration addresses.

Finally, I agree with the feedback about the evaluation section. The quoted statement is also not useful because I go on to talk about the difficulties faced in post-implementation directly after it. My plan is to completely restructure the evaluation. I can briefly list the advantages of all components of the solution (the software and how enhanced collaboration supports mitigation of global climate change), then present the mapping using clustering content and MERRA content, reframe the future research agenda as instead drawbacks concerning what cloud computing is lacking, research alternatives to cloud computing adoption and then research how to prove SaaS as the best option. I can do a thorough explanation of Table 1 (see my appendix) as my basis of conveying why cloud computing is the best option. I will focus on gathering sources produced recently by authors different from the ones I already have.

Clarifying question: How recently does a source need to have been produced in order for it to be reputable? Is it within 7 years or within 10 years or dependent on the type of content?

Thanks for reviewing my paper and reading all of my revision thoughts. I am glad the topic is relevant to the professional sector that you’re in. Good luck!

Comment by Kyle Pesek:

Hello Mary,

Your rewriting of the Problem Statement and Executive Summary now strikes me as effectively solution-oriented and the new wording of the first paragraph leaves the reader with a much clearer sense of the purpose for the Report. The tone is not only more professional, but also comes off more confident with the direct framing of what will be covered and why it is impactful. However, the phrasing of the last sentence could be made a bit smoother. Consider phrasing the last sentence of the beginning paragraph such as “…in fulfilling potential by determining the best strategic action through readily accessible cloud-based data storage.” I essentially treated my Executive Summary as a helpful, guiding outline for the Report to follow; and per your Clarifying Question, see no reason why you should avoid doing so as well.

I am glad to read that you are incorporating more logical conclusions based upon the included citations while tying these back to your Problem Statement/Executive Summary. The inclusion of more of your own transitions to provide emphasis as to why the details provided in the first 2 sections merit evaluation in the final section should also provide focus when tailoring your Report to an audience as well-familiar with the topic of climate change as C2ES. Considering that the C2ES is an international leader on informing climate change policies, the minutiae of technical details related to cloud-computing solutions should be provided a limited focus compared to how your suggested Solutions can yield efficiency breakthroughs in regards to problems faced by researchers handling massive amounts of climate data, as you correctly point out. Now that your audience is clearly defined, this should lead to much greater cohesion from point-to-point as you proceed through the sections of your revised Report Draft. Furthermore, it sounds like you are on the right track for future research in these remaining weeks to drive home the connection between the problem and the proposed solutions with as much impact as possible. Per your comment in the other Week 7 discussion, the inclusion of more numerical references will help reinforce the real-world implications of your solutions. Especially when so much of the topic revolves around data, expansion of the Appendix with relevant figures and/or tables would prove much more accessible to C2ES researchers and Executive leadership alike.

It is possible that my copy-paste from pdf into a wordcounter application earlier this week accidentally left out some parts of your paper. You say throughout your response that you need to aim for trimming down more parts of your paper, but your current 5,040 word count leaves you with nearly another thousand words to expound upon individual sections as you see fit.

As for your last Clarifying question regarding what determines recent versus aging sources, it seems like different professors hold various standards regarding this designation. As my own rule-of-thumb, I try to include a majority of sources from at least the last half-decade for a report to be considered comprehensive and current. Unless there is a justification such as data on a certain topic only being available very infrequently (which is not the case for a burgeoning sector like cloud computing), the sources from 6-10 years back could already be supplanted by more recent research that either disproves or further clarifies the hypothesis of the previous research.

My Comment:

Hi Kyle,

Thanks for taking a thorough look at the report! I rephrased the last sentence of the first paragraph, as the comment mentioned. Great point to keep the audience in mind while restructuring and adding information into the report. I agree incorporating more numerical support is going to bring tangibility to my solution. Good point that the cloud computing sector is burgeoning and therefore has a plethora of modern resources related to it. I appreciate the support, good luck with everything!