
Big Data and Artificial Intelligence Analytics in Geosciences: Promises and Potential

Roberto Spina, Geologist and DCompSci, CNG (National Council of Geologists), Rome, Italy, robertospina@geologi.it
ABSTRACT

Big data and machine learning are IT methodologies that are bringing substantial changes in the analysis and interpretation of scientific data. By adding GPU processing resources to the typical equipment of a server host, it is possible to speed up queries performed on large databases and reduce training time for deep learning architectures.

A recent pairing of big data technologies, applied to old and new data, and artificial intelligence techniques has enabled a team of scientists to create an interactive virtual globe that shows a color mosaic of the seabed geology. This interactive model allows us to obtain robust reconstructions and predictions of climate changes and their impacts on the ocean environment. We suggest a possible evolution of such a model by means of the expansion of functionalities and performance improvements. We refer respectively to the implementation of isochronic layers of seabed lithologies and the addition of GPU resources to speed up the learning phase of the support vector machine (SVM) model. These additional features would allow us to establish broader correlations and extract additional information on large-scale geological phenomena.

INTRODUCTION

The Earth system generates continuous data, and our acquisition capacity has significantly increased over time. The growing availability of acquired geological data and the methods developed in the field of information technology make it possible to identify associations and understand patterns and trends within data (Big Data), solve difficult decision problems (artificial intelligence), and provide acceleration to data processing (GPU computing).

Big Data is a term that indicates very large databases (often on the order of zettabytes, i.e., billions of terabytes) that can contain huge amounts of heterogeneous, structured and unstructured data (text, numerical values, images, e-mail, GPS data, and data acquired from social networks), which can be extrapolated, analyzed, and correlated with each other.

Artificial Intelligence (AI) is a branch of computer science that studies the way in which the combination of hardware and software systems can simulate typical behaviors of the human brain. One of its most important applications is machine learning, a family of algorithms able to learn from data and make decisions.

GPU Parallel Computing (GPGPU) involves the processing of data by the processors present in the graphics card (GPU) and has allowed the computation, in relatively short times, of huge amounts of data with an efficiency at least two orders of magnitude greater than in the past.

There are several cases in which these technologies have been applied: in the study of potential earthquakes (Rouet-Leduc et al., 2017) and volcanic eruptions (Ham et al., 2012), and to solve spatial modeling problems in the assessment of landslide susceptibility (Korup and Stolle, 2014).

The following describes a mixed approach (AI and Big Data) in the field of geosciences, analyzing potentials and possible future developments.

CASE STUDY: BIG DATA AND AI MAP WORLD’S OCEAN FLOOR

An example of an application combining Big Data and machine learning technologies was implemented by a team of Australian scientists who created the first digital map of seabed lithologies (Dutkiewicz et al., 2015) through the analysis and cataloging of ~15,000 samples of sediments found in marine basins. Before such a map, the most recent map of oceanic lithologies was hand drawn ~40 years ago, at the beginning of ocean exploration. Since then, the map has undergone few changes, with at most six types of sediment dominant in the ocean basins.

The digital map was created using an AI method consisting of the support vector machine (SVM) model. Through a cross-validation approach, the classifier was trained by adding new data gradually so as to allow its learning. Learning the parameter values, which optimize the classifier’s performance on withheld data, is an important step in the workflow. In this way, the vast set of point data has been transformed into a continuous digital map with very high accuracy (up to 80%).

The new lithological map of the seabed is very important for the interpretation of global phenomena related to the evolution of ocean basins. An example of this is diatoms, siliceous phytoplankton that live in the oceans and that through chlorophyll photosynthesis produce about one-quarter of the oxygen present in the atmosphere, helping to reduce global warming. At their death, these organisms precipitate through the water column, accumulating on the underlying sea floor. Satellite surveys over the years have identified places where diatomaceous activity is more productive; that is, the marine areas in which there are the maximum concentrations of chlorophyll, considering that they should also correspond to the areas of maximum accumulation of these organisms on the sea floor. Surprisingly, the digital map of the seabed has revealed that there is a decoupling between the productivity of diatoms and the corresponding accumulation areas on the sea floor. The possibility of diatom ooze formation is however favored by the low surface temperature (0.9–5.7 °C), by salinity (33.8–34 PSS), and by the high concentration of nutrients, and therefore can represent an important indicator of the oceanographic variables of the surface of the sea (Cunningham and

          GSA Today, v. 29, https://www.doi.org/10.1130/GSATG372GW.1. Copyright 2018, The Geological Society of America. CC-BY-NC.
       42 GSA Today  |  January 2019