SUPPORT, and pink for NEGATE). Figure 1A shows the most frequent locations during the Cenozoic in Europe, and Figure 1B shows the top three most frequent locations during the Cenozoic in North America. When manually inspecting the machine prediction results from the MLP model, the domain experts observed that 11 out of 17 data points within the North American continent were correctly identified and visualized on the world map. Out of the six errors, four data points were from simulation papers, and two data points were based on incorrect predictions by the MLP classifier, as identified by the domain experts. For example, one pink circle (i.e., the corresponding paper was classified as not supporting the observation that volcanism impacts climate change) was incorrectly predicted when the actual paper was unrelated to this observation.

These figures immediately highlight several important observations:
• Our data processing reduces the search space by almost two orders of magnitude (from ~1,000 papers that are shallowly related to the topic of interest to 17 that validate/invalidate the current observation that volcanism affects climate change), while our visualizations allow the scientist to quickly draw important conclusions that would not be easily available otherwise. For example, our figures show that while the majority of publications support the investigated hypothesis that volcanism impacts climate change, not all do.
• Similarly, this bird's-eye view of a scientific question allows one to quickly identify "blank spaces" in research, i.e., topics that are insufficiently investigated. For example, our visualizations show that while support for our research question is well represented for the North American continent, it is scarce in other continents.
• Further, this work allows one to identify (potential) contradictions in scientific findings quickly, which provides opportunities for better science. For example, Figure 1B shows apparent contradictions in findings from the east coast of the North American continent in the Cenozoic.
• Lastly, the fact that 11 out of the 17 identified papers are correctly classified is not surprising considering that none of the automated components (i.e., the module that extracts temporal and spatial context, and the research question classifier) is perfect. However, this result emphasizes that the human/machine interaction must continue if this system is to be improved.

All in all, this experiment finds strong support in favor of feedbacks existing between volcanism and climate change. However, the precise correlation is not a simple one. Our literature parsing system suggests that we do not yet have a clear and complete understanding of how volcanic events affect climate change.

CONCLUSIONS

This preliminary work introduces a methodology to automatically provide a global review of the geoscientific literature and to evaluate the impact of specific research questions (i.e., understand if the question is [mostly] supported or rejected by the literature), in this case the causal relationship between volcanism and climate change. We show the promises and limitations of this approach to the geoscience literature with this admittedly simplistic example. This approach helps us process and interpret a large amount of published scientific papers, without the need for human annotators to invest time in reading and parsing all of the papers. In addition, with the visualization, researchers are able to investigate chronological changes in the relationship between volcanism and climate change. This approach could be expanded to any number of queries in the geoscience literature for the systematic analysis of various observations and ideas by examining a large body of previously published papers. Results can be further plotted at reconstructed sample or study locations using paleogeographic maps.

It is vital to emphasize that the proposed methodology is hybrid, requiring direct collaboration between humans and machines. For example, geoscientists were required to provide training data for our research question classifier. Further, as discussed, our resulting classifier is only ~80% accurate, which means that it needs continuous feedback from the scientists using it in order to improve. Longer term, we envision a community-wide effort in which such classifiers are created and deployed in the cloud to mine an arbitrary number of observations and are continuously improved over time by their human end users.

MANUSCRIPT RECEIVED 16 NOV. 2021
REVISED MANUSCRIPT RECEIVED 6 MAY 2022
MANUSCRIPT ACCEPTED 23 MAY 2022
8 GSA TODAY | November 2022