Page 5 - i1052-5173-32-11
analyzes the relationship between citations and their textual context (i.e., whether the citation is used in a positive or negative way). SCITLDR is used to create a short summary of a given paper (without truly understanding what the underlying content means). Our work is complementary to these directions because we aim for deeper language understanding. That is, the purpose of the proposed approach is to spatially and temporally contextualize a given geoscience research question and to identify whether the content of the analyzed papers supports or negates it.

To this end, we developed a geoscience application that demonstrates the potential of the proposed approach and lets us examine the limitations of this type of literature analysis and how they can be overcome. The application investigates whether there is a causal relationship between volcanism and climate change in the geologic record, as seen through the lens of published literature. Specifically, we ask whether volcanism influenced climate change in the deep-time geologic archive. We selected this question because several geological studies appear to support this link (e.g., Lee and Dee, 2019). Our results, however, indicate more variability than this suggests in whether the available studies on the subject actually support the link.

SYSTEMATIC MACHINE REVIEW OF GEOSCIENCE DATA

Because no prebuilt corpus existed for this geoscience task, we retrieved papers from the Web of Science through the University of Arizona’s library. Papers were selected because they contained keywords relevant to the research question at hand, such as volcanism or magmatism, and climate change. This selection was implemented as the Boolean query (volcanism OR magmatism) AND “climate change,” where OR and AND are the disjunctive and conjunctive Boolean operators, and the quotation marks indicate that the entire phrase must be present. This query returned 1164 papers from the Web of Science. We then randomly chose 200 of these papers and extracted the abstract, introduction, and conclusion sections from each, to be manually annotated according to whether or not they support the research question. Note that for this work we assume that the authors’ data, interpretations, and conclusions are correct. The annotation task was conducted on FindingFive (https://www.findingfive.com), an online annotation platform. Each paper was placed into one of four classes: SUPPORT, NEGATE, NEGATE&SUPPORT, and UNRELATED (see Table 1). The annotations for these four classes were collected by two of the co-authors of this study, who are domain experts (i.e., geoscientists). The two annotators worked independently.

Next, we implemented a natural language processing (NLP) component for geosciences that extracts two types of information. First, we contextualized individual publications by extracting and normalizing the geospatial and temporal contexts addressed in these papers (e.g., Pliocene, 4 million years ago, and Bering Sea). Normalization is needed because a single place can be named in several ways: for the purposes of this analysis, Tucson and Saguaro National Park can be considered the same geographic location even though the texts describe them differently. To facilitate the consolidation of findings, we normalized the geospatial contexts to absolute latitude/longitude coordinates (see the next section for details). Similarly, temporal expressions such as 4 million years ago were converted to geological eras or epochs (e.g., Paleoproterozoic) to better understand the relationship between volcanism and climate change on the geologic time scale.

Second, we built a document classifier that is trained to determine whether a given paper supports the observation that “volcanism affected climate change,” so that it can make predictions on new papers. The outputs of these two components were aggregated into a publication knowledge base, which contains the publication itself; the prediction of our classifier (SUPPORT, NEGATE, NEGATE&SUPPORT, or UNRELATED; see Table 1 for details); the occurrence of geological eras and epochs (e.g., the frequency of Pliocene in a given paper); and the occurrence of geographic locations (e.g., the frequency of Africa in a given paper). We used this knowledge base to visualize the evidence for the research question on a world map and to identify global temporal and geospatial patterns.

THE HYBRID MACHINE-HUMAN APPROACH

Below, we detail the three key components of our hybrid machine-human approach in this experiment.

Contextualizing Findings: Time and Site Identification

To analyze the relationship between volcanism and climate change at different times and places in the geological past, we built a custom named entity recognizer to extract spatial and temporal information from the analyzed text. Named entity recognition (NER) is a common NLP task that aims to identify named entities in a given text and classify them into predefined categories. Our focus in this work is on identifying locations and geological eras and epochs, which are necessary to contextualize the findings discussed in the papers.

Existing NER tools such as Stanford’s CoreNLP (Manning et al., 2014) or spaCy (Honnibal and Montani, 2017) focus on generic locations, times, and dates rather than geoscience-specific ones. For example, when we fed the sample sentence “Clay mineral assemblages and crystallinities in sediments from IODP Site 1340 in the Bering Sea were analyzed in order to trace sediment sources and reconstruct the paleoclimatic history of the Bering Sea since Pliocene (the last 4.3 Ma)” into the Stanford CoreNLP NER, the result was:

Clay mineral assemblages and crystallinities in sediments from IODP Site [1340]DATE in the [Bering Sea]LOCATION were analyzed in order to trace sediment sources and reconstruct the [paleoclimatic]MISC history of the [Bering Sea]LOCATION since Pliocene (the last [4.3]NUMBER Ma).

Even though the Stanford CoreNLP NER correctly identified Bering Sea as a LOCATION, it did not recognize geoscience-specific expressions such as Pliocene, and, further, it classified expressions into the incorrect

TABLE 1. NAMES AND DESCRIPTIONS OF THE LABELS USED DURING THE MACHINE CLASSIFICATION PROCESS*

Classification label | Definition
Support | The given text supports the relationship between volcanism and climate change.
Negate | The given text negates the relationship between volcanism and climate change.
Negate&Support | The same overall text both supports and negates the relationship between volcanism and climate change, with different paragraphs discussing each relationship.
Unrelated | The given text is unrelated to the topic at hand, i.e., the relationship between volcanism and climate change.

*See text footnote 1.
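The corpus-selection step can be approximated locally. The Python sketch below is illustrative only: `contains_phrase`, `matches_query`, and `sample_for_annotation` are hypothetical helpers; case-insensitive whole-word matching without stemming is an assumption, not a description of how Web of Science evaluates queries; and the fixed random seed is an arbitrary choice for reproducibility.

```python
import random
import re

# Hypothetical local approximation of the Web of Science query
#   (volcanism OR magmatism) AND "climate change"
# Assumption: case-insensitive whole-word matching, no stemming or wildcards.


def contains_phrase(text: str, phrase: str) -> bool:
    """True if `phrase` occurs in `text` as whole words (quoted-phrase semantics)."""
    words = [re.escape(w) for w in phrase.split()]
    pattern = r"\b" + r"\s+".join(words) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def matches_query(text: str) -> bool:
    """Evaluate (volcanism OR magmatism) AND "climate change" on one record."""
    either_term = contains_phrase(text, "volcanism") or contains_phrase(text, "magmatism")
    return either_term and contains_phrase(text, "climate change")


def sample_for_annotation(records, k=200, seed=0):
    """Draw a reproducible random sample of k records (200 in the study)."""
    rng = random.Random(seed)  # the seed value is an arbitrary assumption
    return rng.sample(records, k=min(k, len(records)))


abstracts = [
    "Large igneous province magmatism and abrupt climate change in the Triassic.",
    "Volcanism on Io: a review.",
    "Climate change and sea-level rise along the Bering Sea coast.",
]
selected = [a for a in abstracts if matches_query(a)]
print(len(selected))  # 1
```

Fixing the seed makes the 200-paper annotation sample reproducible, which matters if the annotated subset has to be reconstructed later.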
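A custom recognizer for geoscience entities could start from term lists and patterns before moving to a trained model. The following sketch is a hypothetical illustration, not the recognizer built for this study: the term lists, the GEO_TIME and AGE labels, and `tag_geo_entities` are invented for the example. Unlike the generic CoreNLP output shown above, it tags Pliocene and 4.3 Ma in the sample sentence.

```python
import re

# Hypothetical, deliberately tiny term lists; a real recognizer would load the full
# geologic time scale and a geographic gazetteer, or be trained on annotated text.
GEO_TIME_TERMS = ["Paleoproterozoic", "Pleistocene", "Holocene", "Pliocene", "Miocene"]
LOCATION_TERMS = ["Bering Sea", "Saguaro National Park", "Tucson", "Africa"]

# Hypothetical labels: GEO_TIME for named eras/epochs, AGE for numeric ages.
PATTERNS = [
    ("GEO_TIME", re.compile(r"\b(?:" + "|".join(GEO_TIME_TERMS) + r")\b")),
    ("LOCATION", re.compile(r"\b(?:" + "|".join(map(re.escape, LOCATION_TERMS)) + r")\b")),
    ("AGE", re.compile(r"\b\d+(?:\.\d+)?\s*(?:Ma|ka|Ga|million years ago)\b")),
]


def tag_geo_entities(text):
    """Return (surface form, label) pairs in order of appearance in `text`."""
    hits = []
    for label, pattern in PATTERNS:
        for m in pattern.finditer(text):
            hits.append((m.start(), m.group(0), label))
    hits.sort()  # order by character offset
    return [(surface, label) for _, surface, label in hits]


sentence = ("Clay mineral assemblages and crystallinities in sediments from IODP Site 1340 "
            "in the Bering Sea were analyzed in order to trace sediment sources and "
            "reconstruct the paleoclimatic history of the Bering Sea since Pliocene "
            "(the last 4.3 Ma).")
for surface, label in tag_geo_entities(sentence):
    print(f"[{surface}]{label}")
```

Note that the site number 1340 is left untagged here rather than being mislabeled as a date.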
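Converting temporal expressions to eras or epochs amounts to parsing a numeric age and looking it up in a table of interval boundaries. The sketch below assumes boundary ages from the International Chronostratigraphic Chart and lists only a handful of intervals; `age_in_ma` and `age_to_interval` are hypothetical names, and a full implementation would cover the complete time scale as well as relative expressions such as “early Pliocene.”

```python
import re

# (name, older boundary in Ma, younger boundary in Ma) from the International
# Chronostratigraphic Chart. Only a subset is listed for illustration.
INTERVALS = [
    ("Holocene", 0.0117, 0.0),
    ("Pleistocene", 2.58, 0.0117),
    ("Pliocene", 5.333, 2.58),
    ("Miocene", 23.03, 5.333),
    ("Oligocene", 33.9, 23.03),
    ("Eocene", 56.0, 33.9),
    ("Paleocene", 66.0, 56.0),
    ("Paleoproterozoic", 2500.0, 1600.0),
]

# Multipliers that convert each unit to millions of years (Ma).
UNIT_TO_MA = {
    "ka": 0.001, "thousand years ago": 0.001,
    "ma": 1.0, "million years ago": 1.0,
    "ga": 1000.0, "billion years ago": 1000.0,
}

AGE_RE = re.compile(
    r"(\d+(?:\.\d+)?)\s*(thousand years ago|million years ago|billion years ago|ka|ma|ga)\b",
    re.IGNORECASE,
)


def age_in_ma(expression):
    """Parse the first numeric age in `expression` and return it in Ma (or None)."""
    m = AGE_RE.search(expression)
    if m is None:
        return None
    return float(m.group(1)) * UNIT_TO_MA[m.group(2).lower()]


def age_to_interval(expression):
    """Map an age expression to the listed interval containing it (or None)."""
    age = age_in_ma(expression)
    if age is None:
        return None
    for name, older, younger in INTERVALS:
        # An age exactly on a boundary goes to the interval whose base it is.
        if younger < age <= older:
            return name
    return None  # outside the intervals listed above


print(age_to_interval("4 million years ago"))  # Pliocene
print(age_to_interval("the last 4.3 Ma"))      # Pliocene
```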
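Deciding that two differently named places, such as Tucson and Saguaro National Park, refer to the same site can be sketched as a distance test on normalized coordinates. The gazetteer entries below are approximate, illustrative coordinates, the 50-km threshold is an assumption, and `same_site` is a hypothetical helper; the text does not specify this rule. Representing an extensive feature such as the Bering Sea by a single point is a further simplification.

```python
import math

# Hypothetical mini-gazetteer with approximate, illustrative coordinates
# (decimal degrees). A real pipeline would resolve names with a geocoder.
GAZETTEER = {
    "Tucson": (32.22, -110.97),
    "Saguaro National Park": (32.25, -110.50),
    "Bering Sea": (58.0, -178.0),
}

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def same_site(name_a, name_b, threshold_km=50.0):
    """Treat two place names as one site if their coordinates are within threshold_km."""
    return haversine_km(GAZETTEER[name_a], GAZETTEER[name_b]) <= threshold_km


print(same_site("Tucson", "Saguaro National Park"))  # True
print(same_site("Tucson", "Bering Sea"))             # False
```

The threshold controls how aggressively nearby mentions are merged; a smaller value keeps distinct field sites apart at the cost of splitting some references to the same area.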
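The text does not specify which classification algorithm the document classifier uses. As a stand-in, the sketch below implements multinomial Naive Bayes with add-one smoothing in plain Python over the four Table 1 labels; the class, the tokenizer, and the toy training sentences are hypothetical and are not drawn from the annotated corpus.

```python
import math
import re
from collections import Counter, defaultdict

# Label set from Table 1.
LABELS = ("SUPPORT", "NEGATE", "NEGATE&SUPPORT", "UNRELATED")


def tokenize(text):
    """Lowercase alphabetic tokens; a deliberately simple tokenizer."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.doc_counts = Counter()
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            if label not in LABELS:
                raise ValueError(f"unknown label: {label}")
            self.doc_counts[label] += 1
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_docs = sum(self.doc_counts.values())
        return self

    def predict(self, text):
        """Return the label with the highest log-posterior for `text`."""
        tokens = [t for t in tokenize(text) if t in self.vocab]  # skip unseen words
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / self.total_docs)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy examples; NOT drawn from the annotated corpus.
train = [
    ("volcanic eruptions triggered global cooling", "SUPPORT"),
    ("large igneous province emissions drove warming", "SUPPORT"),
    ("no evidence links volcanism to the climate shift", "NEGATE"),
    ("the warming began before eruptions so volcanism was not the cause", "NEGATE"),
    ("dinosaur tooth enamel morphology", "UNRELATED"),
    ("groundwater chemistry of desert aquifers", "UNRELATED"),
]
clf = NaiveBayesClassifier().fit([t for t, _ in train], [y for _, y in train])
print(clf.predict("eruptions triggered cooling"))  # SUPPORT
```

With no NEGATE&SUPPORT training examples, this toy model can never predict that label, which illustrates why each class needs annotated examples.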
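A knowledge-base entry of the kind described above pairs each paper's predicted label with frequency counts of the eras, epochs, and locations it mentions. The record layout, `build_record`, and `labels_by_location` in this sketch are hypothetical; tallying labels per location is one plausible input for a world-map visualization, not necessarily the one used in the study.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field


@dataclass
class PublicationRecord:
    """Hypothetical knowledge-base entry for one paper."""
    paper_id: str
    label: str  # classifier prediction, one of the Table 1 labels
    time_mentions: Counter = field(default_factory=Counter)      # e.g., Pliocene -> 2
    location_mentions: Counter = field(default_factory=Counter)  # e.g., Africa -> 1


def build_record(paper_id, label, geo_times, locations):
    """Turn lists of extracted mentions into per-paper frequency counts."""
    return PublicationRecord(paper_id, label, Counter(geo_times), Counter(locations))


def labels_by_location(records):
    """Count how many papers mentioning each location received each label."""
    tally = defaultdict(Counter)
    for record in records:
        for location in record.location_mentions:
            tally[location][record.label] += 1
    return tally


records = [
    build_record("paper-001", "SUPPORT", ["Pliocene", "Pliocene"], ["Bering Sea"]),
    build_record("paper-002", "NEGATE", ["Miocene"], ["Africa", "Bering Sea"]),
]
print(dict(labels_by_location(records)["Bering Sea"]))  # {'SUPPORT': 1, 'NEGATE': 1}
```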
www.geosociety.org/gsatoday 5