Benchmarks
A list of benchmarks we currently work on
Currently we are working on a number of scientific applications.
| Benchmark | Science | Task | Owner Institute | Specific Benchmark Targets |
| --- | --- | --- | --- | --- |
| CloudMask | Climate | Segmentation | RAL (link) | cloudmask specifics |
| STEMDL | Material | Classification | ORNL (link) | stemdl specifics |
| CANDLE-UNO | Medicine | Classification | ANL (link) | candle-uno specifics |
| TEvolOp Forecasting | Earthquake | Regression | University of Virginia (link) | tevolop specifics |
1 - CANDLE-UNO
The CANDLE (Exascale Deep Learning and Simulation Enabled Precision Medicine for Cancer) project aims to implement deep learning architectures that are relevant to problems in cancer.
CANDLE-UNO
The CANDLE (Exascale Deep Learning and Simulation Enabled Precision Medicine
for Cancer) project aims to implement deep learning architectures that
are relevant to problems in cancer. These architectures address problems
at three biological scales: cellular (Pilot1, P1), molecular (Pilot2, P2),
and population (Pilot3, P3).
Pilot1 (P1) benchmarks are formed out of problems and data at the
cellular level. The high-level goal of the problem behind the P1
benchmarks is to predict drug response based on molecular features of
tumor cells and drug descriptors. Pilot2 (P2) benchmarks are formed out
of problems and data at the molecular level. The high-level goal of the
problem behind the P2 benchmarks is molecular dynamics simulation of
proteins involved in cancer, specifically the RAS protein. Pilot3 (P3)
benchmarks are formed out of problems and data at the population level.
The high-level goal of the problem behind the P3 benchmarks is to
predict cancer recurrence in patients based on patient-related data.
The Uno application from Pilot1 (P1): the goal of Uno is to predict tumor
response to single and paired drugs, based on molecular features of
tumor cells across multiple data sources. The combined dose response data
contains the following sources: CCLE, CTRP, gCSI, GDSC, NCI60, SCL, SCLC,
ALMANAC.FG, ALMANAC.FF, and ALMANAC.1A. Uno implements a deep learning
architecture with 21M parameters in the TensorFlow framework in Python. The
code is publicly available on
GitHub.
The script in this repository downloads all required datasets. The
primary metric used to evaluate this application is throughput (samples per
second). More details on running Uno can be found
here.
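Since throughput is the headline metric, the following minimal sketch shows one way samples per second could be measured during Keras training. The callback, the `samples_per_epoch` argument, and the usage line are illustrative assumptions, not part of the Uno code.

```python
import time
import tensorflow as tf

class ThroughputCallback(tf.keras.callbacks.Callback):
    """Reports training throughput (samples per second) for each epoch.

    `samples_per_epoch` must be supplied by the caller; it is not inferred
    from the dataset here.
    """
    def __init__(self, samples_per_epoch):
        super().__init__()
        self.samples_per_epoch = samples_per_epoch

    def on_epoch_begin(self, epoch, logs=None):
        self.start = time.perf_counter()

    def on_epoch_end(self, epoch, logs=None):
        elapsed = time.perf_counter() - self.start
        print(f"epoch {epoch}: {self.samples_per_epoch / elapsed:.1f} samples/s")

# Illustrative usage with a hypothetical model and dataset:
# model.fit(train_ds, epochs=10,
#           callbacks=[ThroughputCallback(samples_per_epoch=600_000)])
```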
CANDLE-UNO Specific Benchmark Targets
- Scientific objective(s):
- Objective: Predictions of tumor response to drug treatments,
based on molecular features of tumor cells and drug descriptors
- Formula: Validation loss
- Score: 0.0054
- Data
- Example implementation
2 - CloudMask (Segmentation)
Estimation of sea surface temperature (SST) from space-borne sensors.
CloudMask (Segmentation)
Estimation of sea surface temperature (SST) from space-borne sensors,
such as satellites, is crucial for a number of applications in
environmental sciences. One of the aspects that underpins the derivation
of SST is cloud screening, a step that marks every pixel of thousands of
satellite images as containing either cloud or clear sky, historically
performed using thresholding or Bayesian methods.
This benchmark focuses on using a machine learning based model for
masking clouds in data from the Sentinel-3 satellite, which carries the
Sea and Land Surface Temperature Radiometer (SLSTR) instrument. More
specifically, the benchmark operates on multispectral image data. The
example implementation is a variation of the U-Net deep neural network.
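As a rough illustration of the U-Net idea (encoder, decoder, and skip connections ending in a per-pixel cloud probability), the sketch below builds a deliberately tiny model in Keras. The layer sizes and the assumption of a 9-channel input (6 reflectance plus 3 brightness-temperature channels regridded to a common resolution) are illustrative choices, not the benchmark architecture.

```python
import tensorflow as tf
from tensorflow.keras import layers

def tiny_unet(input_shape=(256, 256, 9)):
    """Minimal U-Net-style encoder/decoder with a single skip connection."""
    inputs = tf.keras.Input(shape=input_shape)

    # Encoder: convolve, then downsample
    c1 = layers.Conv2D(16, 3, padding="same", activation="relu")(inputs)
    p1 = layers.MaxPooling2D()(c1)
    c2 = layers.Conv2D(32, 3, padding="same", activation="relu")(p1)

    # Decoder: upsample and concatenate with the matching encoder features
    u1 = layers.UpSampling2D()(c2)
    cat = layers.Concatenate()([u1, c1])
    c3 = layers.Conv2D(16, 3, padding="same", activation="relu")(cat)

    # One output channel: probability that each pixel contains cloud
    outputs = layers.Conv2D(1, 1, activation="sigmoid")(c3)
    return tf.keras.Model(inputs, outputs)

model = tiny_unet()
model.compile(optimizer="adam", loss="binary_crossentropy")
```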
The benchmark includes two datasets, DS1-Cloud and DS2-Cloud, with
sizes of 180 GB and 4.9 TB, respectively. Each dataset is made up of two
parts: reflectance and brightness temperature. The reflectance is
captured across six channels with a resolution of 2400 x 3000 pixels,
and the brightness temperature is captured across three channels with
a resolution of 1200 x 1500 pixels.
CloudMask Specific Benchmark Targets
- Scientific objective(s):
- Objective: Compare the accuracy produced by the Neural Network
with the accuracy of a Bayesian method
- Formula: Weighted Binary Cross Entropy of validation data (a minimal sketch of this loss follows the list below)
- Score: 0.9 for convergence
- Data
- Download: aws s3 --no-sign-request --endpoint-url https://s3.echo.stfc.ac.uk sync s3://sciml-datasets/en/cloud_slstr_ds1 .
- Data Size: 180GB
- Training samples: 15488
- Validation samples: 3840
- Example implementation
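The weighted binary cross-entropy referenced above can be sketched as follows; the `cloud_weight` value and the choice to weight only the positive (cloud) class are illustrative assumptions, not the benchmark's actual weighting scheme.

```python
import tensorflow as tf

def weighted_binary_crossentropy(y_true, y_pred, cloud_weight=2.0):
    """Per-pixel binary cross-entropy with extra weight on cloud pixels.

    y_true and y_pred are expected to have shape (batch, H, W, 1);
    `cloud_weight` is an illustrative value, not the benchmark's setting.
    """
    y_true = tf.cast(y_true, tf.float32)
    per_pixel = tf.keras.losses.binary_crossentropy(y_true, y_pred)  # (batch, H, W)
    weights = 1.0 + (cloud_weight - 1.0) * tf.squeeze(y_true, axis=-1)
    return tf.reduce_mean(per_pixel * weights)
```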
3 - TEvolOp Earthquake Forecasting
Forecasting Earthquakes
TEvolOp Earthquake Forecasting
Time series appear in many scientific problems, and many of them are
geospatial, that is, functions of space and time; this benchmark
illustrates that type. Some time series have a clear spatial structure
that, for example, strongly relates nearby points in space. The problem
chosen here is termed a spatial bag, where there is spatial variation
but it is not clearly linked to the geometric distance between spatial
regions. In contrast, traffic-related time series have a strong spatial
structure. We intend the benchmarks to cover a broad range of problem
types.
The earthquake data comes from USGS, and we have chosen a region of 4
degrees of latitude (32 to 36 N) and 6 degrees of longitude (-120 to
-114) covering Southern California. The data runs from 1950 to the
present day and is presented as events: magnitude, ground location,
depth, and time. We have divided the data into time and space bins. The
time interval is daily, but in our reference models we accumulate this
into fortnightly data. Southern California is divided into a 40 by 60
grid of 0.1 by 0.1-degree “pixels”, which corresponds roughly to squares
with an 11 km side. The dataset also includes an assignment of pixels to
known faults and a list of the largest earthquakes in that region from
1950 until today. We have chosen various samplings of the dataset to
provide both input and predicted values. These include time ranges from
a fortnight up to 4 years. Further, we calculate summed magnitudes and
depths and counts of significant quakes (magnitude > 3.29). Other easily
available quantities are powers of quake energy (using Energy ~ 10^(1.5 m),
where m is magnitude). Quantities are “energy averaged” when there are
multiple events in a single space-time bin, except for simple event
counts.
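A minimal sketch of this kind of space-time binning is shown below, assuming a pandas DataFrame of USGS-style events with hypothetical columns `time`, `lat`, `lon`, and `mag`; the column names, the 14-day grouping, and the way the summed energy is converted back to an equivalent magnitude are all illustrative assumptions rather than the benchmark's actual preprocessing.

```python
import numpy as np
import pandas as pd

def bin_events(events: pd.DataFrame) -> pd.DataFrame:
    """Aggregate an event list into fortnightly 0.1-degree space-time bins.

    Assumes columns: time (datetime64), lat, lon, mag. All names and the
    aggregation choices here are illustrative.
    """
    df = events.copy()
    df["energy"] = 10.0 ** (1.5 * df["mag"])       # Energy ~ 10^(1.5 m)
    df["lat_bin"] = np.floor(df["lat"] * 10) / 10  # 0.1-degree (~11 km) pixels
    df["lon_bin"] = np.floor(df["lon"] * 10) / 10

    grouped = df.groupby([pd.Grouper(key="time", freq="14D"),  # fortnightly bins
                          "lat_bin", "lon_bin"])
    out = grouped.agg(
        n_events=("mag", "size"),
        n_significant=("mag", lambda m: int((m > 3.29).sum())),
        energy_sum=("energy", "sum"),
    )
    # One possible reading of "energy averaged": convert the summed energy
    # in a bin back to an equivalent magnitude.
    out["mag_energy_avg"] = np.log10(out["energy_sum"]) / 1.5
    return out.reset_index()
```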
Current reference models are a basic LSTM recurrent neural network and a
modification of the original science transformer. Details can be found
here and here.
TEvolOp Specific Benchmark Targets
- Scientific objective(s):
- Objective: Improve the quality of Earthquake forecasting
- Formula: Normalized Nash–Sutcliffe model efficiency coefficient
  (NNSE); see the sketch after this list
- Score: The NNSE lies between 0.8 and 0.99 depending on the model and
  the predicted time series
- Data
- Download:
https://drive.google.com/drive/folders/1wz7K2R4gc78fXLNZMHcaSVfQvIpIhNPi?usp=sharing
- Data Size: 5GB from USGS
- Training samples: Data is divided spatially in an 80%-20%
  fashion between training and validation. The full dataset covers
  6 degrees of longitude (-114 to -120) and 4 degrees of latitude
  (32 to 36) in Southern California. This is divided into 2400
  spatial bins of 0.1 degree (~11 km) on a side.
- Validation samples: Most analyses use the 500 most active bins, of
  which 400 are training and 100 validation.
- Example implementation
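The NNSE metric named above can be computed as in the sketch below; it assumes the standard normalization NNSE = 1 / (2 - NSE) of the Nash–Sutcliffe efficiency, which maps NSE from (-inf, 1] onto (0, 1].

```python
import numpy as np

def nnse(observed, predicted):
    """Normalized Nash-Sutcliffe efficiency: NNSE = 1 / (2 - NSE).

    NSE = 1 - sum((obs - pred)^2) / sum((obs - mean(obs))^2); a value of
    1 means a perfect prediction, and 0.5 matches predicting the mean.
    """
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    nse = 1.0 - np.sum((observed - predicted) ** 2) / np.sum((observed - observed.mean()) ** 2)
    return 1.0 / (2.0 - nse)

# Example: predicting the observed mean everywhere gives NSE = 0 and NNSE = 0.5.
print(nnse([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]))  # 0.5
```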
Example Implementation:
The example implementation is primarily to demonstrate feasibility, show
how the data is represented, help address any interpretation
considerations, and potentially trigger initial ideas on how the
benchmark can be improved.
4 - STEMDL (Classification)
State-of-the-art scanning transmission electron microscopes (STEM) produce focused electron beams with atomic dimensions and allow the capture of diffraction patterns arising from the interaction of incident electrons with nanoscale material volumes.
STEMDL (Classification)
State-of-the-art scanning transmission electron microscopes (STEM)
produce focused electron beams with atomic dimensions and allow the
capture of diffraction patterns arising from the interaction of incident
electrons with nanoscale material volumes. Backing out the local atomic
structure of these materials requires compute- and time-intensive
analyses of the diffraction patterns (a technique known as convergent
beam electron diffraction, CBED). Traditional analysis of CBED requires
iterative numerical solutions of partial differential equations and
comparison with experimental data to refine the starting material
configuration. This process is repeated anew for every newly acquired
experimental CBED pattern and/or probed material.
In this benchmark, we used newly developed multi-GPU and multi-node
electron scattering simulation codes [1] on the Summit supercomputer to
generate CBED patterns from over 60,000 solid-state materials,
representing nearly every known crystal structure. A scaled-down version
of this data [2] is used for one of the data challenges [3] at the SMC
2020 conference, and the overarching goals are to: (1) explore the
suitability of machine learning algorithms in the advanced analysis of
CBED and (2) produce a machine learning algorithm capable of overcoming
intrinsic difficulties posed by scientific datasets.
A data sample from this data set is given by a 3-d array formed by
stacking CBED patterns simulated from the same material at distinct
material projections (i.e. crystallographic orientations). Each CBED
pattern is a 2-d array of 32-bit float image intensities. Associated
with each data sample in the data set is a host of material attributes
or properties which are, in principle, retrievable via analysis of this
CBED stack. Of note are (1) 200 crystal space groups out of the 230
unique mathematical discrete space groups and (2) the local electron
density, which governs the material's properties.
This benchmark consists of two tasks: classification for crystal space
groups and reconstruction for local electron density, example
implementations of which are provided in [4] and [5].
STEMDL Specific Benchmark Targets
- Scientific objective(s):
- Objective: Classification for crystal space groups
- Formula: F1 score on validation data (see the sketch below)
- Score: 0.9 is considered converged
- Data
- Example implementation
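As a minimal sketch of the F1 metric for the space-group classification task, the snippet below scores randomly generated labels with scikit-learn; the random labels and the choice of macro averaging over the 200 classes are illustrative assumptions, since the benchmark evaluates a trained model's predictions on the real validation split.

```python
import numpy as np
from sklearn.metrics import f1_score

# Illustrative stand-ins for true and predicted space-group labels
# (200 classes); real evaluation uses model outputs on validation data.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 200, size=1000)
y_pred = rng.integers(0, 200, size=1000)

# Macro-averaged F1 over the space-group classes; the averaging mode used
# by the benchmark is not specified here, so "macro" is an assumption.
print(f1_score(y_true, y_pred, average="macro"))
```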