Page 7 - i1052-5173-32-11

P. 7

B words, the two annotators somewhat agreed
on whether a given paper supported or
negated the observation that “volcanism
affected the climate change.” This “moder-
ate” agreement is often found in this type of
annotation task since the research question
itself is quite complex and only part of the
papers (e.g., abstract, introduction, conclu-
sion) was provided to the annotators.

Classification of Results
We evaluated the quality of the proposed
classifiers that were trained on the annota-
tions by comparing the micro-F1 score cal-
culated using 10-fold cross validation. More
formally, we collected the algorithm’s pre-
C dictions on each test partition, and calcu-
lated the micro-F1 score (see supplemental
material, including a formal definition of
these measures in document 3) from all
these predictions.
In these experiments, we observed that
the MLP classifier outperforms both the
NB-SVM and SVM classifiers, and that the
ensemble approach does not improve over
the performance of the MLP method (see
supplemental document 3 for all these
results). Informed by these results, we used
the MLP model to classify all the 957
remaining papers in the collected data set
on whether they supported/negated or were
unrelated to the research question at hand.
D
Aggregation of Results for
Visualization
With the two components described
above that (a) place a scientific finding in
its proper geospatial and temporal context,
and (b) identify if publications support or
negate the research question at hand, we
can aggregate and visualize results at
scale. To further simplify the visualiza-
tions, we used the geopy (https://pypi.org/
project/geopy/) Python library to convert
IODP sites to latitudes and longitudes, and
we converted the identified specific geo-
Figure 1 (continued from page 6). logical periods and epochs into broader
(larger time intervals) geological eras. For
annotators. After reading the provided text, supplemental document 3). Before conduct- each paper analyzed, we used the most fre-
the annotators determined whether the given ing the annotation session, authors discussed quent top k (where k = 1, or k = 3) spatial
paper supported or negated the relationship annotation criteria using papers that were and temporal entities for context.
between volcanism and climate change. As a not selected for annotation. To measure the Figure 1 shows several visualizations of
result, we produced 400 annotation results agreement between annotators, Cohen’s the results, with light blue indicating sup-
(200 papers × 2 annotators). All of 400 anno- kappa score (Cohen, 1968) was measured. port for the observation that volcanism
tation results were used as a data set to train, Cohen’s kappa score is a commonly used impacts climate change and pink negating
validate, and evaluate the proposed system. metric to measure the agreement between the observation. The sizes of the circles
Thus, even the disagreement between two two annotators. The Kappa result was 0.523, were determined based on the number of
annotators was used as data so that the pro- which showed moderate agreement between papers that the classifier predicted the
posed system could learn the ambiguity (see annotators (Landis and Koch, 1977). In other corresponding label (i.e., light blue for

www.geosociety.org/gsatoday 7

2 3 4 5 6 7 8 9 10 11 12