
SCIENTIFIC COMPUTING WORLD
Computing solutions for scientists and engineers
October/November 2018 | Issue #162

High performance computing: Scaling up capacity
Laboratory informatics: Big data in biomedicine
Modelling and simulation: Intelligent optimisation

DEEP LEARNING, DEEP IMPACT?
Artificial intelligence and HPC development

www.scientific-computing.com
Contents | October/November 2018 | Issue 162

High performance computing
Deep learning 4 – Rob Farber looks at the impact that AI is having on HPC hardware and application development
Machine learning with FPGAs 8 – Bill Jenkins looks at the role of FPGA technology in machine learning
OpenMP 5.0 release 12 – Matthijs van Waveren shares highlights of the latest iteration of the parallel computing API OpenMP
Scaling up computing capacity 14 – Wim Slagter highlights the importance of HPC for engineering simulation

Laboratory informatics
Big data in biomedicine 18 – Clare Sansom looks at the changing ways that scientists can use data
AI advances healthcare research 21 – Sophia Ktori explores the role of AI and deep learning in healthcare
Data ecosystems in the cloud 24 – Faisal Mushtaq explains the role of cloud informatics in overcoming the challenges associated with modern pharmaceutical R&D

Modelling and simulation
Decentralised systems buck data-sharing trend 26 – Adrian Giordani reports on the use of blockchain in the automotive industry
Intelligent optimisation 30 – Gemma Church looks at the role of optimisation software in modern engineering practices
Suppliers directory 34

LEADER
AI boosts HPC
Robert Roe, Editor

This issue leads with coverage exploring the impact that AI and deep learning technology is having on HPC development. This has been a continuing theme throughout the year, as AI technology has been rising rapidly, both in performance and the number of applications using the technology. This is driving new users to start using HPC – they would not have done so just a few years ago.

In this issue we have a broad overview of news and updates covering the HPC industry, starting on page 4 with Rob Farber’s piece on the impact of AI in computing and HPC. The theme of AI research continues on page 8 with Intel’s Bill Jenkins, who discusses the use of FPGAs in machine learning research. On page 12 we have an update on the OpenMP specification, which will be announced formally at SC18. Finally, we have an article looking at the creation of a free benchmark tool that demonstrates the performance increase on a user’s model if they were to move to an HPC infrastructure.

The laboratory informatics coverage starts on page 18, with an article from Clare Sansom on the use of big data in biomedicine. On page 21 Sophia Ktori explores the use of AI technology in healthcare in the first of a two-part series. We also have an article from Thermo’s Faisal Mushtaq, which looks at the role of cloud-based informatics software and its use in modern pharmaceutical R&D.

Modelling and simulation coverage begins with a feature from Adrian Giordani, on page 26, which highlights the use of blockchain in the automotive industry, with a particular focus on autonomous vehicles. On page 30, Gemma Church looks at the use of optimisation in modern engineering simulation.

Editorial and administrative team
Editor: Robert Roe, robert.roe@europascience.com
Managing editor: Tim Gillett, editor.scw@europascience.com
Specialist reporters: Sophia Ktori, Clare Sansom
Design: David Houghton, Zoë Andrews

Advertising team
Advertising manager: Mike Nelson, mike.nelson@europascience.com, +44 (0)1223 221039
Production manager: David Houghton, david.houghton@europascience.com, +44 (0)1223 221034

Corporate team
Managing director: Warren Clark

Subscriptions: Free registration is available to qualifying individuals. Register online at www.scientific-computing.com. Subscriptions £180 a year for six issues to readers outside registration requirements. Single issue £20. Orders to ESL, SCW Circulation, 4 Signet Court, Swann Road, Cambridge, CB5 8LA, UK. Tel: +44 (0)1223 221 030. Fax: +44 (0)1223 213 385. ©2018 Europa Science Ltd.

Whilst every care has been taken in the compilation of this magazine, errors or omissions are not the responsibility of the publishers or of the editorial staff. Opinions expressed are not necessarily those of the publishers or editorial staff. All rights reserved. Unless specifically stated, goods or services mentioned are not formally endorsed by Europa Science Ltd, which does not guarantee or endorse or accept any liability for any goods and/or services featured in this publication.

US copies: Scientific Computing World (ISSN 1356-7853/USPS No 018-753) is published bi-monthly for £180 per year by Europa Science Ltd, and distributed in the USA by DSW, 75 Aberdeen Rd, Emigsville PA 17318-0437. Periodicals postage paid at Emigsville PA. Postmaster: send address corrections to Scientific Computing World, PO Box 437, Emigsville, PA 17318-0437.

Cover: Immersion Imagery/Shutterstock.com

Scientific Computing World is published by Europa Science Ltd, 4 Signet Court, Cambridge, CB5 8LA | ISSN 1744-8026
Tel: +44 (0)1223 211170 | Fax: +44 (0)1223 213385 | Web: www.researchinformation.info | @scwmagazine
Subscribe for free at www.scientific-computing.com/subscribe



HIGH PERFORMANCE COMPUTING

The impact of AI

ROB FARBER CONSIDERS THE EFFECT THAT AI IS HAVING ON HPC HARDWARE AND APPLICATION DEVELOPMENT

Just as technology changes in the personal computer market brought about a revolution in the design and implementation of the systems and algorithms used in high performance computing (HPC), so are recent technology changes in machine learning bringing about an AI revolution in the HPC community. Expressed as a single word, one can describe the extent of the impact of AI on HPC as ‘everywhere’.

Deep learning, in particular, has been revolutionising visual recognition tasks that previously could only be performed by humans. It is truly remarkable that these deep types of artificial neural networks (DNNs) are able to match and even outperform human abilities on a wide variety of visual tasks. The scientific applications are being investigated by scientists and HPC centres worldwide, using some of the largest supercomputers in the world.

Reflecting this trend, the Argonne Leadership Computing Facility (ALCF) continues its effort to create an environment that supports traditional simulation-based research, as well as emerging data science and machine learning approaches, in preparation for Aurora, the first US exascale supercomputer.

Overall, machine learning is changing how scientists perform research and interact with data, with remarkable research efforts bringing AI technology to exascale computation, high-energy physics (HEP), materials design, climate simulation, and more.

Don’t be misled that the impact is limited to high-profile research projects or big computer centres. The use of AI in HPC permeates everything in HPC, from research and software to hardware design technology, including the use of data flow hardware such as FPGAs and ASICs, together with the impact on more ‘traditional’ hardware like CPUs and GPUs (graphics processing units). Along with Feynman’s quantum conjecture, researchers believe that machine learning maps well to quantum hardware.

Looking to the future, hardware implementations of neuromorphic computing promise to redefine edge computing using sensors and the Internet of Things (IoT), thanks to orders-of-magnitude increases in efficiency. Just as DNNs brought artificial neural networks (ANNs) into the mainstream, so may neuromorphic computing multiply that impact through ubiquity and the removal of humans from the data-intensive preparation of clean and relevant training sets, as neuromorphic computing systems can identify their own training sets. Overall, big data, extreme computing and machine learning are causing us to rethink the role of a supercomputer.

At the moment, HPC modelling and simulation is experiencing a revolutionary change in mindset, as ANNs are now being incorporated directly into strict physics-based codes. More specifically, new algorithmic efforts at high-profile institutions like CERN use GANs (generative adversarial networks), which have been shown to bring orders-of-magnitude increased performance to physics-based modelling and simulation, while preserving accuracy and without introducing non-physical effects. More indirectly, reduced precision support for AI is redefining the numerical and matrix methods that are core to HPC.

Everywhere you look, AI is being incorporated by researchers into new hardware, techniques and technologies.

“This work demonstrates the viability of machine learning models in physics-based simulations” – Federico Carminati, project coordinator, CERN

Same bandwagon – computation-driven adoption
For a period of roughly 10 years, starting in the mid-1980s, the field of machine learning exploded with the advent of nonlinear neural networks, backpropagation, the work of John Holland in genetic algorithms, work in Hidden Markov Models, and precursor work for neuromorphic computing performed by Carver Mead.

The field stagnated for a number of years, due to a lack of computational power coupled with promises made by many who entered the field but did not fully understand the limitations of the technology.






As a result of many broken promises, investment and funding slowed to a trickle.

The field reemerged in the early 2000s with the advent of DNNs and their ability to perform commercially viable image recognition. Further, these DNNs could be trained in reasonable amounts of time, due to the significant increase in computational power that modern CPUs and GPUs could provide.

Currently, people refer collectively to all these technologies as AI, but it is important to heed the warnings from the past and not over-promise. In fact, the use of ‘AI’ is in itself misleading, as is the use of the word ‘learning’ for ANNs, although both are commonly used. For example, it has been known since the 1980s that ANNs do not ‘learn’ or ‘think’, but rather fit multidimensional surfaces which are used for inferencing.

[Figure: DOE objective – drive integration of simulation, data analytics and machine learning. Traditional HPC systems (large-scale numerical simulation), scalable data analytics and deep learning converge on CORAL supercomputers and exascale systems. Image: ANL]

New analytic techniques
Two highly impactful projects at CERN and the ALCF demonstrate the power of AI as a new analytic technique.

An award-winning effort at CERN has demonstrated the ability of AI-based models to act as orders-of-magnitude-faster replacements for computationally expensive tasks in simulation. More specifically, the team demonstrated that energy showers detected by calorimeters can be interpreted as a 3D image. Adapting existing deep learning techniques, they then decided to train a GAN to act as a replacement for the expensive Monte Carlo methods used in HEP simulations.

During validation, they observed a ‘remarkable’ agreement between the images from the GAN generator and the Monte Carlo images, along with orders-of-magnitude-faster runtime. The CERN team points out that the impact for high-energy physics (HEP) scientists could be substantial as, ‘currently, most of the LHC’s worldwide distributed CPU budget – in the range of half a million CPU-years equivalent – is dedicated to simulation’.

Dr Federico Carminati, CERN’s project coordinator, explains: ‘This work demonstrates the potential of machine learning models in physics-based simulations’.

Previously, AI models were excluded because it was not possible for the data scientist to guarantee the data-derived model would not introduce non-physical artifacts. As CERN’s Dr Sofia Vallecorsa points out, ‘the data distributions coming from the trained machine learning model are effectively indistinguishable from real and simulated data,’ so why not use the faster computational version?
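The validation pattern described here – swap an expensive sampling routine for a cheap learned surrogate, then check that the two produce statistically indistinguishable output – can be sketched without any physics at all. The toy C++ example below is emphatically not CERN’s GAN: the ‘Monte Carlo’ and ‘surrogate’ functions are invented stand-ins, and the chi-squared-style comparison is only meant to mirror the idea of comparing binned distributions while timing both routes (requires C++17).

```cpp
// Toy illustration of surrogate validation: compare the binned output of a
// slow "Monte Carlo" generator with a fast stand-in surrogate, and time both.
// All names here are illustrative; this only mirrors the validation pattern
// (distribution agreement plus a large speedup), not the CERN work itself.
#include <chrono>
#include <cmath>
#include <iostream>
#include <random>
#include <utility>
#include <vector>

// Pretend this is the expensive physics-based simulation.
double monte_carlo_energy(std::mt19937& rng) {
    std::normal_distribution<double> shower(50.0, 10.0);
    double e = 0.0;
    for (int i = 0; i < 1000; ++i) e += shower(rng);  // artificial extra work
    return e / 1000.0;
}

// Pretend this is the trained surrogate: one cheap draw instead of many.
double surrogate_energy(std::mt19937& rng) {
    std::normal_distribution<double> learned(50.0, 10.0 / std::sqrt(1000.0));
    return learned(rng);
}

template <typename Gen>
std::vector<int> histogram(Gen gen, std::mt19937& rng, int n,
                           double lo, double hi, int bins) {
    std::vector<int> h(bins, 0);
    for (int i = 0; i < n; ++i) {
        double x = gen(rng);
        int b = static_cast<int>((x - lo) / (hi - lo) * bins);
        if (b >= 0 && b < bins) ++h[b];
    }
    return h;
}

int main() {
    std::mt19937 rng(42);
    const int n = 20000, bins = 40;
    auto timed = [&](auto gen) {
        auto t0 = std::chrono::steady_clock::now();
        auto h = histogram(gen, rng, n, 48.0, 52.0, bins);
        auto t1 = std::chrono::steady_clock::now();
        return std::make_pair(h, std::chrono::duration<double>(t1 - t0).count());
    };
    auto [h_mc, t_mc] = timed(monte_carlo_energy);
    auto [h_sg, t_sg] = timed(surrogate_energy);

    // Simple chi-squared-style discrepancy between the two histograms.
    double chi2 = 0.0;
    for (int b = 0; b < bins; ++b) {
        double expect = 0.5 * (h_mc[b] + h_sg[b]);
        if (expect > 0) chi2 += std::pow(h_mc[b] - h_sg[b], 2) / expect;
    }
    std::cout << "Monte Carlo: " << t_mc << " s, surrogate: " << t_sg
              << " s, chi2/bins = " << chi2 / bins << "\n";
}
```

A small discrepancy per bin alongside a large runtime gap is, in miniature, the result the CERN team reports at vastly greater scale.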
The CERN work reflects a huge change in thinking in the physics and modelling and simulation community – a change in mindset that can help others in a wide range of fields realise similar orders-of-magnitude speedups for computationally expensive simulation tasks.

The work is part of a CERN openlab project in collaboration with Intel Corporation, who partially funded the endeavor through the Intel Parallel Computing Center (IPCC) programme.

At Argonne National Laboratory, researchers are exploring ways to improve collaboration and eliminate barriers to AI using next-generation exascale systems like Aurora. However, the Argonne effort is but one example of a number of efforts being explored to help scientists exploit the convergence of big data, machine learning and extreme-scale HPC.

In particular, a team of researchers from the ALCF and Argonne’s Data Science and Learning Division is developing a service that will make it easier for users to access, share, analyse and reuse both large-scale datasets and associated data analysis and learning methods. The service leverages well-known tools, including Globus for research data management and Argonne’s Petrel storage system.

‘Our motivation,’ explains Ian Foster, Argonne Data Science and Learning Division director and Distinguished Fellow, ‘is to create increasingly rich data services, so people don’t just come to the ALCF for simulation, but for simulation and data-centric activities.’ The ALCF is a US Department of Energy Office of Science user facility.

Indirect impacts – a huge change in mindset
The prevalence of AI-optimised hardware – in particular the reduced-precision convolutions used in image recognition that have been added to CPUs and GPUs, plus expanding support for FPGAs – has given the HPC community new hardware capabilities to exploit for increased library and application performance, including new optimised versions of the workhorse BLAS linear algebra libraries.

Reduced precision maths libraries
Of extreme interest to the HPC community is work on optimised batched and reduced-precision BLAS libraries.

Over the past few decades there has been a tremendous amount of work performed by vendors and the HPC community to create optimised libraries that implement the BLAS specification. The reason is, as Jack Dongarra noted in his ISC18 talk on NLAFET, that ‘Linear algebra is both fundamental and ubiquitous in computational science and its vast application areas’.


So many HPC applications heavily rely on BLAS that even a small performance increment can translate into huge savings in runtime across the aggregate HPC community.

The Parallel Numerical Linear Algebra for Future Extreme Scale Systems (NLAFET) project is a high-profile example where methods and hardware popularised for AI applications – including DAGs (directed acyclic graphs) and reduced-precision hardware – are used to map BLAS operations to various hardware platforms, including CPUs, GPUs and FPGAs. Similarly, Alexander Heinecke (research scientist, Intel Labs) has created the open-source libXSMM library that can speed batched small linear algebra operations, including techniques that use reduced-precision arithmetic operators, while still preserving accuracy.

The advent of optimised reduced-precision hardware for AI has brought the question of ‘how much numerical precision is enough?’ to the attention of computer scientists, and is in some sense motivating the development of new numerical approaches, such as the efforts at King Abdullah University of Science and Technology (KAUST) to build an enhanced ecosystem of numerical tools. In particular, the HiCMA linear algebra library can operate on billion-by-billion matrices in workstations containing only gigabytes of memory.
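Both ideas mentioned above, batching many small problems and asking how much precision is enough, are easy to demonstrate without any vendor library. The hedged sketch below is plain C++ rather than libXSMM or any real BLAS implementation: it runs a batch of small matrix multiplies in single precision and measures how far the results drift from a double-precision reference.

```cpp
// A batch of small (8x8) matrix multiplies, computed in float and compared
// against a double-precision reference. Illustrative only: real batched BLAS
// libraries use hand-tuned kernels rather than this naive triple loop.
#include <cmath>
#include <cstdio>
#include <random>
#include <vector>

constexpr int N = 8;        // matrix dimension of each small problem
constexpr int BATCH = 1000; // number of independent problems in the batch

template <typename T>
void gemm(const T* a, const T* b, T* c) {   // C = A * B, row-major NxN
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j) {
            T sum = 0;
            for (int k = 0; k < N; ++k) sum += a[i * N + k] * b[k * N + j];
            c[i * N + j] = sum;
        }
}

int main() {
    std::mt19937 rng(1);
    std::uniform_real_distribution<double> u(-1.0, 1.0);
    std::vector<double> Ad(BATCH * N * N), Bd(BATCH * N * N), Cd(BATCH * N * N);
    std::vector<float>  Af(BATCH * N * N), Bf(BATCH * N * N), Cf(BATCH * N * N);
    for (std::size_t i = 0; i < Ad.size(); ++i) {
        Ad[i] = u(rng); Bd[i] = u(rng);
        Af[i] = static_cast<float>(Ad[i]); Bf[i] = static_cast<float>(Bd[i]);
    }

    // The "batched" pattern: many independent small GEMMs in one sweep.
    for (int m = 0; m < BATCH; ++m) {
        gemm(&Ad[m * N * N], &Bd[m * N * N], &Cd[m * N * N]);  // fp64 reference
        gemm(&Af[m * N * N], &Bf[m * N * N], &Cf[m * N * N]);  // fp32 "reduced"
    }

    double max_rel_err = 0.0;
    for (std::size_t i = 0; i < Cd.size(); ++i) {
        double err = std::abs(Cd[i] - Cf[i]) / (std::abs(Cd[i]) + 1e-12);
        if (err > max_rel_err) max_rel_err = err;
    }
    std::printf("max relative error, fp32 vs fp64: %.2e\n", max_rel_err);
}
```

In a production library the per-matrix loop would be vectorised and run in parallel; the point here is only that the error grows with precision choice and problem size, which is exactly the trade-off reduced-precision hardware forces library writers to quantify.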
Data flow architectures
New ideas, numerical methods and programmable hardware devices such as FPGAs open the door to data flow architectures. Cloud providers such as Microsoft Azure are already using persistent neural networks for inference. In this case, the ANN is implemented directly on the FPGA – there is no program counter or executable like a conventional CPU or GPU. Instead, data ‘flows’ through the computational elements on the FPGA to produce the desired output. The result is high performance, low latency and low power consumption.

While there is speculation about the adoption of non-von Neumann data flow architectures in HPC and exascale supercomputers, it is clear that scientists are currently laying the groundwork for the use of data flow architectures. This is a natural merging of the use of DAGs in projects such as NLAFET and programmable hardware such as FPGAs. As the ALCF posted, ‘Because current innovation is driven by collaboration among colleagues and between disciplines, the potential for discovery through the pragmatic application of new computational resources, coupled with unrestrained data flow, staggers the imagination.’

From new hardware to new approaches, the true impact of deep and machine learning on HPC is yet to be seen. In short, the biggest impact is not technological, but rather a change in mindset that is stimulating innovation and new approaches to decades-old technologies and problems. Thus, we are at the start of a point of inflection brought about by the popularity of AI hardware that a host of bright and innovative scientists are exploiting to bring about the convergence of AI and HPC. The ramifications are difficult to predict but will be extraordinarily fun to see.

A fully referenced version of this feature is available online.

Rob Farber was a pioneer in the field of neural networks while on staff as a scientist in the theoretical division at Los Alamos National Laboratory. He works with companies and national laboratories as a consultant, and also teaches about HPC and AI technology worldwide. Rob can be reached at info@techenablement.com

HIGH PERFORMANCE COMPUTING

Exploring machine learning with FPGAs

BILL JENKINS, OF INTEL PROGRAMMABLE SYSTEMS GROUP, LOOKS AT THE ROLE OF FPGA TECHNOLOGY IN MACHINE LEARNING

Machine learning is not a new idea: computer scientists have been developing algorithms based on models of the way in which the human brain works for decades. But if you’ve recently been tagged in a photo on social media, targeted by ads on an e-commerce site, or curious about just how autonomous vehicles are going to drive themselves, then you’ll know that machine learning is rapidly moving out of the lab and into the real world.

This is driving a period of feverish development in machine learning, within which field-programmable gate arrays (FPGAs) are already taking a starring role.

One approach to machine learning that is currently getting a lot of attention is the convolutional neural network (CNN), widely used in image-processing tasks such as object detection and classification.

[Figure 1: Intel PSG FPGAs offer fine-grained and low latency between compute and memory]

A CNN is loosely based on the way that human vision works, and so is made up of artificial neurons, each with multiple inputs and outputs. Each neuron takes its inputs, applies a transform to them, and delivers an output value. The value of each input signal can be modified by a weighting factor, and the value that each neuron needs to reach before it will deliver an output can be altered by a biasing factor.

The neurons are arranged into a series of layers: an input layer that takes in the raw data, an output layer that delivers the network’s conclusion, and multiple ‘hidden’ layers in between. These hidden layers create intermediate results that help the network formulate a conclusion to the question it has been trained to answer, such as ‘Is there a cat in this image?’

An untrained neural network begins with random values for the weights and biasing factors attributed to each neuron, and is trained to give the right answer by being shown many images in which, for example, cats have been tagged. Each neuron ‘decides’ on its output values based on the input image, passing those values on as inputs to neurons lower down the network’s layers. The final layer of the network delivers the network’s verdict. Then, crucially, the error between the network’s decision and the right answer is calculated and used to propagate a series of adjustments to the weighting and bias factors for each neuron back through each layer of the network, from the output to the input.

Repeat this process often enough, with a large enough dataset of accurately tagged images, and the network will adapt its weights and biases so that it can ‘recognise’, in this example, cats within an untagged image.

This training process is computationally intensive – think days or weeks of dedicated computational effort in large data centres. However, once the right set of weights and biases have emerged from the training process, applying them, a process known as inferencing, can be done with much less processing power and, often, using much less precise number representations.
Object detection and classification is such a useful skill that there’s an annual competition, the ImageNet Large Scale Visual Recognition Challenge, to test the accuracy of new algorithms using a standard dataset of tagged images. This closely fought contest has seen rapid improvements in accuracy using new algorithms or fresh adaptations of existing approaches over the past five years.



There’s similarly rapid progress in other machine-learning fields, such as autonomous driving, and the semantic analysis of text to enable services such as online chatbots.

The FPGA advantage
A lot of machine-learning work to date has been done using CPUs, or GPUs whose architectures happen to match the computational requirements of algorithms such as CNNs, so where do FPGAs fit? The answer is that FPGAs have a unique set of attributes that give them particular advantages for the implementation of machine-learning algorithms.

The first of these is the flexibility that comes with the fact that FPGAs are simply reconfigurable hardware. This means that as algorithms change, so can the hardware. It also means that FPGA users can control the whole datapath of their application, from input through processing to output. This sounds obvious, but many comparisons of machine-learning performance look at how fast a CPU runs an algorithm, rather than how fast the algorithm works in its systemic context. An FPGA implementation, however, can be programmed to achieve the greatest systemic performance – and will deliver it in a highly deterministic way, unlike a CPU-based approach subject to interrupts.

The structure of FPGAs is also a good match for many machine-learning algorithms, with highly distributed logic resources, rich interconnect schemes and plenty of distributed local memory. This is important because neural network implementations usually involve many distributed computations to mimic the neurons, lots of local storage to hold intermediate values, and rich interconnections to pass the outputs of one layer of neurons to the inputs of the next, or among neurons in the same layer. This improves performance and reduces power consumption by cutting the amount of off-chip memory accesses necessary to implement machine-learning algorithms.

The highly parallel hardware of an FPGA also promises low latency and high throughput, essential characteristics in applications such as advanced driver assistance systems. And the performance and programmability of an FPGA’s I/O capabilities also makes it easier to integrate the devices into a system, and adapt the implementation for different markets or evolving I/O standards. It would be possible, for example, to use FPGAs to apply machine-learning strategies to pre-process data as it came off disc storage in a large data centre, before it even reached the server farm.

[Figure: Intel programmable acceleration card (PAC) with Intel Stratix 10 SX FPGA]

Applying FPGAs to machine learning
Many developers working on machine learning are used to implementing their algorithms by writing software in relatively high-level languages and then running it on CPUs or GPUs, with compilers and related tools doing the job of parallelising the task across multiple processor execution threads, cores or even CPUs. Making the move to running algorithms directly on hardware, even if it is reprogrammable, may seem alien to their software-centric outlook.

This needn’t be so. The history of software development has been one of a steady rise in the level of abstraction at which developers have expressed themselves. This is also true in the field of machine learning: Google, for example, has released TensorFlow, an open-source software library for ‘numerical computation using data-flow graphs’ – in other words, neural networks and related algorithms. Nodes in the graph represent mathematical operations, such as the transfer functions of the neurons we described above, while the graph edges represent the multidimensional data arrays (tensors in Google’s terms, weighted values in ours) that are communicated between them.
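The graph idea itself, nodes holding operations and edges carrying values, can be mocked up in a few lines. The sketch below is purely conceptual: it is not TensorFlow and has nothing to do with Intel’s tools, but it shows how evaluation ‘pulls’ data through a graph rather than stepping a program counter, which is also how an FPGA data-flow implementation behaves once the graph is laid out in hardware.

```cpp
// A toy data-flow graph: each node is an operation whose inputs are other
// nodes; evaluating the output node pulls data through the graph. On an FPGA
// the same graph would be laid out as physical compute elements and wires.
#include <functional>
#include <iostream>
#include <memory>
#include <vector>

struct Node {
    std::function<double(const std::vector<double>&)> op;   // the operation
    std::vector<std::shared_ptr<Node>> inputs;               // incoming edges

    double eval() const {
        std::vector<double> vals;
        for (const auto& in : inputs) vals.push_back(in->eval());
        return op(vals);
    }
};

std::shared_ptr<Node> constant(double v) {
    return std::make_shared<Node>(
        Node{[v](const std::vector<double>&) { return v; }, {}});
}

std::shared_ptr<Node> add(std::shared_ptr<Node> a, std::shared_ptr<Node> b) {
    return std::make_shared<Node>(
        Node{[](const std::vector<double>& v) { return v[0] + v[1]; }, {a, b}});
}

std::shared_ptr<Node> mul(std::shared_ptr<Node> a, std::shared_ptr<Node> b) {
    return std::make_shared<Node>(
        Node{[](const std::vector<double>& v) { return v[0] * v[1]; }, {a, b}});
}

int main() {
    // y = w*x + b : the core of a neuron's transfer function, as a graph.
    auto x = constant(0.5), w = constant(-2.0), b = constant(1.0);
    auto y = add(mul(w, x), b);
    std::cout << "y = " << y->eval() << '\n';   // prints y = 0
}
```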
Intel PSG is implementing some of the key primitives used in common machine-learning algorithms, so that developers who are used to working at the level of abstraction of the TensorFlow library can also do so when FPGAs are the implementation target. One way in which this is delivered is as a Deep Learning Accelerator, which enables users to implement the key primitives on an FPGA in such a way that they can then configure various network topologies within the FPGA, without having to reprogram its hardware.

If this is too constricting, Intel PSG is also implementing a software development kit for OpenCL, a common platform for parallel programming various types of processor to work together, so that users can customise and extend the facilities of the Deep Learning Accelerator. They might do this, for example, by changing primitives or adding custom accelerators. The solution is available today for Intel PSG’s Arria 10 PCIe cards.

Interest in machine learning is growing very rapidly at the moment. Although some very successful approaches have emerged over the past five years, it is clear that there’s still room for enormous amounts of innovation in both algorithms and implementations. FPGAs can bring the advantages of dedicated hardware to machine-learning developers, as well as offering a flexible path to efficient systemic implementations once an algorithm has been finalised.

Bill Jenkins is a senior product specialist in artificial intelligence at Intel Programmable Systems group


HIGH PERFORMANCE COMPUTING

OpenMP 5.0 is changing the landscape

Matthijs van Waveren, coordinator at the OpenMP ARB, shares highlights of the latest iteration of the parallel computing API

The OpenMP API is an application programming interface (API) that gives parallel programmers a simple and flexible interface for developing parallel applications. The OpenMP community has made a multitude of requests since the OpenMP language introduced version 4.5 in 2015 and, as a result, OpenMP 5.0 adds many new features that will be useful for highly parallel and complex applications. With version 5.0, the OpenMP API now covers the whole hardware spectrum, from embedded systems and accelerator devices to multicore and shared-memory systems. Vendors have started releasing reference implementations of parts of the standard, and user courses will soon be given at OpenMP workshops and major conferences. The OpenMP specification version 5.0 has been released at SC18.

OpenMP users are in a wide range of fields, from automotive and aeronautics to biotech, automation, robotics and financial analysis. There were user requests to bring OpenMP to the embedded system space and the accelerator space. Also, there was an urgent need to bring OpenMP to the latest levels of the C, C++ and Fortran standards, and to have a standard interface for debugging and performance tools. Finally, there were user requests for improved portability and usability.

The OpenMP ARB, a group of major computer hardware and software vendors, as well as users, jointly developed version 5.0 of the OpenMP specification to fulfil these requests. In addition to several minor and major improvements, the updated specification includes the following features:

Full support for accelerator devices. With the latest additions, the OpenMP specification has full support for accelerator devices. These include mechanisms to require unified shared memory between the host system and coprocessor devices, the ability to use device-specific function implementations, better control of implicit data mapping, and the ability to override device offload at runtime. Reverse offload, implicit function generation, and the ability to easily copy object-oriented data structures are also supported.

Improved debugging and performance analysis. Two new tools interfaces enable intuitive debugging and support deeper performance analysis.

Support for the latest versions of C, C++ and Fortran. The OpenMP API now supports important features of Fortran 2008, C11, C++14 and C++17.

Multilevel memory systems. Memory allocation mechanisms are available that place data in different types of memory, such as high-bandwidth memory. New OpenMP features also make it easier to deal with the NUMAness of modern HPC systems.

Greater memory flexibility. Support for acquire/release semantics to optimise low-level memory synchronisation.

Enhanced portability. A declare variant directive and a new metadirective enable performance portability through compile-time adaptation of OpenMP pragmas.

The specifications can be downloaded from the OpenMP website, and it is possible to participate in the discussions on the OpenMP Forum.
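A few of those features can be shown together in one short kernel. The fragment below is a hedged sketch rather than anything taken from the specification’s examples document: it assumes a compiler and runtime with OpenMP 5.0 support and a device that can honour the unified shared memory requirement, and it uses the new allocator API to request high-bandwidth memory (the runtime typically falls back to ordinary memory if none is available).

```cpp
// Sketch of three OpenMP 5.0 additions mentioned in the article: the
// unified-shared-memory requirement, target offload, and memory allocators.
// Requires a compiler and runtime with OpenMP 5.0 support.
#include <cstdio>
#include <omp.h>

#pragma omp requires unified_shared_memory  // host and device share one address space

int main() {
    const int n = 1 << 20;

    // OpenMP 5.0 allocator API: request high-bandwidth memory if available.
    double *x = static_cast<double*>(
        omp_alloc(n * sizeof(double), omp_high_bw_mem_alloc));
    double *y = static_cast<double*>(
        omp_alloc(n * sizeof(double), omp_high_bw_mem_alloc));

    for (int i = 0; i < n; ++i) { x[i] = 1.0; y[i] = 2.0; }

    // Offload the loop to an accelerator; with unified shared memory no
    // explicit map clauses are needed for x and y.
    #pragma omp target teams distribute parallel for
    for (int i = 0; i < n; ++i)
        y[i] += 2.0 * x[i];

    std::printf("y[0] = %f\n", y[0]);  // expect 4.0

    omp_free(x, omp_high_bw_mem_alloc);
    omp_free(y, omp_high_bw_mem_alloc);
}
```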
Implementations
The major vendors have implemented parts of the OpenMP 5.0 specification in their compiler products. GNU is the furthest along, with its implementation in GCC, and plans to have quite a few features in the next release, GCC 9. In addition, the debugging and performance tools of the vendors are being extended with OpenMP 5.0 features. More information can be found on the Resources page of the OpenMP website, where you can also find links to OpenMP benchmarks and OpenMP research projects.

The OpenMP YouTube channel is a great place to find education videos, from basic entry-level material to advanced videos treating the different OpenMP 5.0 features. On the OpenMP website you will find links to tutorials. Advanced OpenMP courses treating elements of OpenMP 5.0 are being given at OpenMP workshops and the SC conferences (SC18 and SC19). The OpenMP workshops where courses on OpenMP 5.0 can be followed are the International Workshop on OpenMP (IWOMP), the OpenMP Users’ Conference (OpenMPCon) and the UK OpenMP Users’ Conference. Once OpenMP 5.0 has been released, the basic OpenMP courses given at universities and other venues will be updated. In addition, a guide with OpenMP examples is available for download from the OpenMP website.


HIGH PERFORMANCE COMPUTING

Scaling up computing capacity

Wim Slagter highlights the importance of HPC for engineering simulation – whether on premise or in the cloud

Ansys has developed a free benchmark tool that allows users to test their own simulation model on a small HPC cluster to demonstrate the benefits of scaling up an organisation’s computing infrastructure. This tool is aimed at driving adoption of HPC resources through HPC appliances and cloud-based solutions that Ansys offers in collaboration with its partners. This can benefit organisations that require HPC but do not have the time or inclination to set up, configure and operate an in-house cluster.

‘I personally have never come across an engineer yet who does not want more compute power. But many of them want to see proof that using more CPU cores on a more powerful machine is worth the investment,’ said Wim Slagter, director of HPC and cloud alliances at Ansys.

‘Engineers need to convince their boss, and possibly the purchaser of the organisation, and they can now get that proof for free through our benchmark program,’ said Slagter. ‘It was designed as an easy way for an engineer to see proof that their own Ansys model can be significantly sped up by using more CPU cores.’

Slagter noted that the concept came from a survey of more than 1,800 Ansys users. From the survey Ansys found that ‘customers said how often they are constrained by turnaround limitations. It was striking to me that a really large percentage of the respondents limit the size or amount of detail for nearly every simulation model,’ added Slagter.

According to the survey, 40 per cent of respondents limit detail in simulation models due to time constraints. The survey also reflects that, in many cases, limiting the size or amount of detail can result in lower-fidelity results less useful to respondents’ design experiments.

‘Engineers are running bigger models, bigger in terms of size and complexity, and they also have to run an increasing number of design variants to ensure product integrity and robustness. But customers and engineers are compute bound and constrained by their compute capacity. That is why we have established this free benchmark programme. Customers wonder about the performance of their simulation model. No matter how many HPC benchmarks we produce, engineers still want to know the performance of their own model,’ Slagter stated.

While limiting a model’s size or scale can help to reduce the simulation time, engineering organisations must compete in highly competitive markets, which require a careful balance between performance, innovation and time-to-market for new products or components. The benchmark tool allows users to look at their own workloads to see how their current projects could be accelerated through the application of HPC resources, either on premise or in the cloud.

If engineers have to limit the size of a model ‘then they are almost wasting their time, because they have to change and re-mesh the model in order to be able to squeeze it onto the machine or to get acceptable run times,’ said Slagter.



‘This is not what they want. They clearly want to solve their engineering problem and not to spend time unnecessarily adapting the model to suit the computing resources available.’

Selecting the right technology
Once the performance benchmark has been run on a user’s model, Ansys engineers will produce a report that highlights performance on a small HPC cluster compared to their existing infrastructure, and simulation timings over a number of CPU cores.

‘We want to demonstrate that, if they were to run their model on a more powerful machine, you can get time savings in a certain order of magnitude. On average we have shown a six-times shorter runtime using a small cluster, when compared to a customer’s current configuration. This is across the board, across different models that we have received so far,’ said Slagter.
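The figures such a report boils down to are straightforward to compute. The sketch below is a generic illustration, not Ansys’s benchmark tool, and the timings in it are invented: given wall-clock times at several core counts, it prints the speedup and parallel efficiency relative to the smallest configuration measured.

```cpp
// Turn raw benchmark timings into speedup and parallel-efficiency figures,
// relative to the smallest core count measured. The timings here are made up.
#include <cstdio>
#include <vector>

struct Timing { int cores; double seconds; };

int main() {
    // Hypothetical wall-clock times for one simulation model.
    std::vector<Timing> runs = {{4, 5400.0}, {16, 1600.0}, {64, 520.0}, {128, 310.0}};

    const Timing& base = runs.front();
    std::printf("%8s %12s %10s %12s\n", "cores", "seconds", "speedup", "efficiency");
    for (const Timing& r : runs) {
        double speedup = base.seconds / r.seconds;
        // Efficiency: the fraction of ideal (linear) scaling actually achieved.
        double efficiency = speedup * base.cores / r.cores;
        std::printf("%8d %12.1f %10.2f %11.0f%%\n",
                    r.cores, r.seconds, speedup, 100.0 * efficiency);
    }
}
```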
With this jump in performance, an organisation ‘can increase their simulation throughput, tack on more complex models, larger models with more complex physics, or create more design variants to demonstrate the performance or make product trade-off studies’.

The benefits of using HPC for simulation are well understood, yet the technology has not been adopted by some engineering organisations. This can be for a number of reasons, but generally it comes down to two primary factors: lack of expertise, or a reluctance to invest in a new technology.

In Slagter’s experience, it is often that companies lack the expertise or the staff to set up and manage an HPC cluster. ‘Proper sizing of an HPC cluster involves more than just choosing the processor, core count or memory. People need to consider storage, remote visualisation, job scheduling and possibly workload management software, and that is not what these engineers want to do,’ said Slagter.

‘We do not want to put a burden on their IT department either. We propose two options: one is an on-premise deployment option and the other is off-premise deployment in the cloud,’ said Slagter.

HPC in the cloud
Ansys has developed cloud-based solutions with a number of cloud-hosting partners who provide HPC infrastructure and IT services to cater for either burst capacity, or as an extension of an organisation’s in-house computing capacity. Slagter noted that flexibility can be a key factor when talking to customers. This relates not only to flexibility on compute capacity, but also the ability to scale the number of software licenses an organisation is using.

Cloud hosting offers a platform to expand computing capacity, or a way of trying HPC on-demand before bringing a cluster in-house. It can also be used to provide burst capacity for users with an existing HPC system.

‘We have developed an ecosystem of cloud-hosting partners such as Rescale, Gompute (Gridcore) and Ubercloud, which provide HPC infrastructure and IT services for customers,’ said Slagter. ‘This allows the customers to stay more focussed on their core engineering competence, rather than spending time on IT. Many of those engineering organisations lack good IT staff or even an IT department. Sometimes they have to manage the computer themselves.’

These products – developed by Ansys partners – offer a solution to a challenge that faces many HPC users. While large automotive and aerospace companies have been using HPC simulation for some time, many smaller enterprises are much less experienced. Without the infrastructure and expertise to configure and manage an HPC cluster, the barrier to entry can be too large, which discourages organisations from taking the first step.

HPC out of the box
For enterprises that prefer an on-premise solution, Ansys has partnered with Hewlett Packard Enterprise (HPE) to offer an out-of-the-box solution for users that want a fully managed cluster, optimised for Ansys software and pre-configured with Ansys engineering and job management software.

This system has been designed to shift simulation jobs off local workstations to a central cluster resource, enabling users to increase resource utilisation. The appliance reduces the time and cost of acquisition and aims to make it easier to adopt and maintain HPC, even if you don’t have IT support and resources. Ansys offers several different appliances for different application workloads, with each optimised for a specific set of workflows.

‘Not everybody is ready for cloud computing yet, but we want to give people the choice. Either they choose to adopt cloud or continue to invest in on-premise hardware.

‘We need to make it much simpler to acquire a system and to reduce the risk of acquiring a new cluster that may not be balanced or optimised for engineering workloads,’ added Slagter.

‘Simulation is a different application, a more computationally demanding application than any other enterprise application. Even different Ansys products have different requirements for hardware,’ Slagter concluded.


White papers – now available online for free at www.scientific-computing.com/white-papers (registration required)

Data integrity: Audit trails with ease of review (Thermo Fisher Scientific)
Learn about the Thermo Scientific™ Chromeleon™ CDS audit trail controls, the regulatory requirements and guidance that they pertain to, and how Chromeleon CDS will ease the review of audit trails.

Enabling data integrity from drug discovery through manufacturing (Thermo Fisher Scientific)
There’s no aspect of quality control quite as complex as keeping up with regulatory guidelines.

ADAS Simulation Under Severe Vibrations (Altair)
Automotive radars are becoming standard equipment on vehicles, with several antenna architectures being used to cover the different safety functions in complex chassis environments, where the side effects on radar performance become more significant.

Ultra-Fast, High-Fidelity Computational Fluid Dynamics on GPUs for Automotive Aerodynamics (Altair)
Altair ultraFluidX™ is a simulation tool for ultra-fast prediction of the aerodynamic properties of passenger and heavy-duty vehicles, as well as for the evaluation of building and environmental aerodynamics.

How the Internet of Things Validates the Lab of the Future (BIOVIA)
Imagine a laboratory in the future where all devices and instruments communicate their status, activities and data with each other and with enterprise information systems. Data would be acquired without manual intervention.
LABORATORY INFORMATICS

Big data opens new avenues for genomics research

CLARE SANSOM LOOKS AT THE IMPACT BIG DATA IS HAVING ON GENOMICS RESEARCH

What was the earliest example of digital data? The answer is surprisingly clear and probably earlier than you think. Some 30,000 years ago, in the Palaeolithic era, someone put 57 scratches on a wolf bone. These are arranged in groups of five, making the first known example of something that is still used from time to time: a tally stick. It is also an example of digital data. For almost all human history, however, such data was scarce, relatively few people handled it and it was easy to manage. Only half a century ago the Moon landings were controlled by banks of computers that together held less data than one of today’s iPhones. More than 200 million of these were sold in 2017 alone.

So we are now in an age of big data, but what makes it big? The term was coined in 2006, but five years earlier an analyst called Doug Laney had described the growing complexity of data through ‘three Vs’: volume, velocity and variety. That is, there is a lot of data, there are a lot of different kinds and it is growing very fast. Two more Vs have been added since: veracity (i.e. data can somehow be verified) and value (it can and should be useful).

The 57 scratches on that Palaeolithic wolf bone form one byte of data, as does any integer up to 255. If you are interested enough to be reading this, you probably own at least one device with a hard drive with a capacity of at least 1TB. You could fit one trillion virtual wolf bones, or, only just less implausibly, almost 20,000 copies of the complete works of Shakespeare onto such a hard drive.

Scientific data is a few orders of magnitude further on. Within the sciences, particle physics is at the top of the data league; the data centre at CERN, home of the Large Hadron Collider, processes about 1PB (1,000TB, or 20 million Shakespeares) of data every day. Biomedicine may be some way behind, but it is fast catching up, driven first and foremost by genomics.

Sequencing the human genome
The first human genome sequence was completed in 2003, after 15 years’ research and an investment of about $3bn. The same task today takes less than a day and costs less than a thousand dollars. There is probably no tally of the number of human genomes now known, but Genomics England’s project to sequence 100,000 genomes from people with cancer and rare disease and their close relatives within a few years gives an idea of what is now possible. The raw sequence data for a single genome will occupy about 30GB (30 × 10⁹ bytes) of storage, and the processed data about 1GB, so you could theoretically fit a thousand on your home hard drive (if with little space for anything else).

About a third of that first human genome was sequenced at the Wellcome Trust Sanger Institute, south of Cambridge. This is now one of the largest repositories of gene sequences and associated data in the world, and possibly the largest in Europe. Data in the form of DNA sequences – As, Cs, Gs, and Ts – pours off its sequencers at an unprecedented rate, initially destined for the Sanger Institute’s private cloud and massive data centre. This centre was extended from three ‘quadrants’ to four in the summer of 2018, giving the Institute a massive 50PB of storage space. ‘We now generate raw data at the rate of about 6PB per year, but even keeping scratch space free for processing, we should have enough capacity for the next few years’, says the Institute’s director of ICT, Paul Woobey.

The data is managed using an open source data management system, iRODS, which is becoming a favourite of research funding bodies in the UK and elsewhere. ‘One benefit of iRODS is that it is highly queryable’, adds Woobey. ‘This makes it easy to locate, for instance, all the data produced by a particular sequencer on a particular day.’

By itself, DNA sequence data means very little; like almost any form of data, it only becomes meaningful once it is analysed. Much of this analysis is done in-house, but many researchers worldwide need access to the Sanger data.



The growth in genomics data
Some years ago, scientists would routinely download Sanger datasets, but this is not always possible now because of their sheer size. Tim Cutts, head of scientific computing at the Institute, explains: ‘In an extreme scenario, it would take a user a year and cost a fortune to download all our data, even over the fastest commodity networks in the world’. This is clearly impossible, so the default now is for researchers to log in to a server and analyse the data onsite or in the cloud. Another approach is to perform an initial analysis as soon as data is generated, save that and throw the raw stuff away. ‘Data analysis on the fly is commonly used in some disciplines, including particle physics and crystallography, but it is only recently becoming common in bioinformatics, which is a younger science’, adds Cutts.

Cutts expects that the sequencing capacity of the Sanger Institute will grow further over the next few years, putting further demands on its data storage and analysis capacity. However, the most significant growth is likely to be in another of big data’s Vs: variety. Increasingly, genomic data will be integrated with physiological and pathological data from individuals’ health records, accelerating the growth of personalised medicine.

‘We are developing datasets and analysis tools for integrating these diverse data types, so they can be used for drug discovery’, says Cutts. ‘But we have no ambition to become involved in this ourselves.’ BenevolentAI, headquartered in London and New York, is one of a new type of bioinformatics company that is using machine learning to look for patterns in such diverse data and to discover or re-purpose ‘the right drug for the right patient at the right time’.

The unique reputation of the Sanger Institute and, by association, the University of Cambridge in genetics and genomics, is matched at the ‘other place’ by a similarly high one in other data-rich biomedical sciences: epidemiology and population health.

The pioneering British Doctors’ Smoking Study began in Oxford in 1951, when academics Richard Doll and Austin Bradford Hill, both later knighted, sent a survey of smoking habits to almost 60,000 registered British doctors. Over two-thirds of the doctors returned questionnaires, and the data gathered was of sufficient statistical power to demonstrate smokers’ increased risk of death from lung cancer and from heart and lung disease within five and seven years respectively.

Over 60 years later, research in these disciplines has been integrated with the rest of Oxford’s biomedical research in a new Big Data Institute, part-funded with a generous donation from one of Asia’s richest and most influential entrepreneurs, the Hong Kong billionaire Li Ka-Shing.




This institute hosts the single biggest computer centre in Oxford and takes in data from all over the world. ‘Our largest datasets come from imaging and genomics, but we also hold gene expression and proteomics data, as well as consumer data and NHS records’, says the Institute’s director, Gil McVean. ‘And “imaging data” itself covers a huge variety of modalities and therefore of image types, from whole-body imaging and functional MRI brain scans to digital pathology slides’.

The Big Data Institute holds all the data collected through one of the largest epidemiology initiatives yet launched: the UK Biobank, designed to track diseases of middle and old age in the general population. This recruited half a million middle-aged British individuals between 2006 and 2010. Initially, they provided demographic and health information, blood, saliva and urine samples, and consented to long-term health follow-up.

This information, including universal genome-wide genotyping of all participants, is being combined with health records, including primary and secondary care and national death and cancer registries, to link genetics, physiology, lifestyle and disease.

Many participants have also been asked to answer further web-based questionnaires or to wear activity trackers for a given period, and the genetic part of the study is still being extended, as its senior epidemiologist Naomi Allen explains. ‘A consortium of pharma companies, led by Regeneron, has funded exome sequencing – that is, sequencing the two per cent of the genome that actually codes for functional genes – of all participants. They hope to complete the sequencing by the […] publicly available to the wider research community a year later.’ The eventual, if ambitious, aim of UK Biobank is to sequence all 500,000 complete genomes over the next few years and release that data, too, to the research community.

Ensuring data access and anonymity
All academic and even some commercial biomedical research is now carried out under open access principles. These can be hard to apply to human health data, however, as issues of privacy are also important. The UK Biobank restricts its data to bona fide scientists seeking to use the data in the interest of public health; as such, the type of research projects varies widely. All the data are ‘pseudonymised’ by using encrypted identifiers that are unique to each research project, before being made available to researchers.

However, with the amount of detailed data available nowadays, it is becoming harder to guarantee anonymity. ‘Some datasets, particularly those including genomic data, are so rich that it becomes possible – if very hard – to track down an individual from their data, even if all identification has been removed’, explains McVean.

One perhaps drastic solution is the creation of artificial data records. These have similar characteristics to patient records and can be fully analysed but […] Fletcher, CEO of Rome-based technology consultancy Lynkeus, has pioneered machine learning methods for creating this type of data for e-clinical trials. ‘We use a method called recursive conditional parameter aggregation to create data for a set of artificial patients that is statistically indistinguishable from the real patient data it was derived from,’ he explains.

Any health study that depends on volunteers providing data can also suffer from the problem of volunteer bias: people interested enough in their health to join such studies are generally likely to be healthier than the population as a whole. It is possible to collect data directly from the millions of Fitbit fitness trackers in regular use, but this would represent an even more biased dataset, as most people who own Fitbits are richer and more tech-savvy, as well as more health-conscious, than average.

Martin Landray, deputy director of the Big Data Institute, was involved in the Biobank study of exercise and health, which overcame some of this bias by sending accelerometers to 100,000 volunteers and asking them to wear them for a week. Each one returned 100 data-points per second, generating a huge dataset. ‘This presented a big data challenge in three ways: in logistics, in “cleaning” the raw data for analysis, and in machine learning: picking out movement patterns associated with each type of activity,’ he says. ‘Only then could we – after discarding outliers – detect any relationship between actual (as opposed to self-reported) activity levels and health outcomes’.
epidemiology initiatives yet launched: the then could we – after discarding outliers –
UK Biobank, designed to track diseases Ensuring data access and anonymity detect any relationship between actual (as
of middle and old age in the general All academic and even some commercial opposed to self-reported) activity levels
population. This recruited half a million biomedical research is now carried out and health outcomes’.
middle-aged British individuals between under open access principles. These can But whose data is it anyway? All
2006 and 2010. Initially, they provided be hard to apply to human health data, biomedical data, including ‘omics data,
demographic and health information, however, as issues of privacy are also is derived from one or more individuals,
blood, saliva and urine samples, and important. The UK Biobank restricts its and much of the data collection depends
consented to long-term health follow-up. data to bona fide scientists seeking to use on willing volunteers. And, particularly as
This information, including universal the data in the interest of public health; as anonymisation cannot be fully guaranteed
genome-wide genotyping of all such, the type of research projects varies in all circumstances, all these individuals
participants, is being combined with health widely. All the data are ‘pseudonymised’ by as ‘data subjects’ must have some rights
records including primary and secondary using encrypted identifiers that are unique over their data.
care and national death and cancer to each research project, before being Anna Middleton, head of society and
registries, to link genetics, physiology, made available to researchers. ethics research at the Wellcome Trust
lifestyle and disease. However, with the amount of detailed Genome Campus (the home of the Sanger
Many participants have also been to data available nowadays, it is becoming Institute), leads a global project called
answer further web-based questionnaires harder to guarantee anonymity. ‘Some ‘Your DNA Your Say’ that aims to discover
or to wear activity trackers for a given datasets, particularly those including how willing people are to donate and
period, and the genetic part of the study genomic data, are so rich that it becomes share their genomic and other biomedical
is still being extended, as its senior possible – if very hard – to track down data. ‘We are finding that people are
epidemiologist Naomi Allen explains. ‘A an individual from their data, even if all more willing to share their data if they
consortium of pharma companies, led by identification has been removed’, explains understand what it means, and if they trust
Regeneron, has funded exome sequencing McVean. the scientists who will be using it,’ she
– that is, sequencing the two per cent One perhaps drastic solution is the says. ‘Unfortunately, levels of knowledge
of the genome that actually codes for creation of artificial data records. These and trust are low in many countries, and
functional genes – of all participants. They have similar characteristics to patient scientists must do more to communicate
hope to complete the sequencing by the records and can be fully analysed but the meaning and value of our big data and
end of 2019 and UKB will make the data represent no individuals. Edwin Morley- how it drives our work.’
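UK Biobank's identifiers are described above only as 'encrypted identifiers that are unique to each research project'. As a minimal sketch of one common way to obtain that property (not necessarily the Biobank's own scheme), a keyed hash with a per-project secret gives every participant a stable pseudonym within a project that cannot be linked across projects without the keys; all names and keys below are illustrative:

```python
import hmac
import hashlib
import secrets

def project_pseudonym(participant_id: str, project_key: bytes) -> str:
    """Derive a stable pseudonym for one participant within one project.

    The same participant_id always maps to the same pseudonym inside a
    project, but to different, unlinkable pseudonyms in other projects,
    because each approved project holds its own secret key.
    """
    digest = hmac.new(project_key, participant_id.encode("utf-8"),
                      hashlib.sha256).hexdigest()
    return digest[:16]  # truncated here for readability only

# Each approved research project is issued its own secret key (illustrative).
project_keys = {
    "cardio_study": secrets.token_bytes(32),
    "cancer_registry_link": secrets.token_bytes(32),
}

participant = "participant-0001234"
for project, key in project_keys.items():
    print(project, project_pseudonym(participant, key))
```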



LABORATORY INFORMATICS

AI advances
healthcare research

SOPHIA KTORI EXPLORES
THE ROLE OF AI AND DEEP
LEARNING IN HEALTHCARE
– IN THE FIRST OF A TWO
PART SERIES

Harnessing AI for drug discovery applications will significantly speed the identification of promising drug candidates, believes Matt Segall, CEO at Optibrium. The UK-based firm, together with partners Intellegens and Medicines Discovery Catapult, recently received a grant from Innovate UK to help fund a £1 million project focussed on combining Optibrium's existing StarDrop software for small molecule design, optimisation and data analysis, and Intellegens' deep learning platform Alchemite.

The aim is to develop a novel, deep learning AI-based method for predicting the ADMET (absorption, distribution, metabolism, excretion and toxicity) properties of new drug candidates. Ultimately, the platform could help to guide the selection and design of more effective, safer compounds earlier in the discovery process, Segall states.

'The pharma industry is increasingly leveraging quantitative structure-activity relationship [QSAR] models to help predict the biological activity and toxicity of drug compounds, based on their structure,' he explains. 'People have been applying deep learning techniques to datasets to build these QSAR models, but frankly, they often add very little new insight. Published data suggests that using conventional deep learning algorithms to build QSAR models adds very little extra intelligence, compared with random forest and other algorithms that are commonly used.'

The reason for this is that deep learning methods are designed to deal with very large, but complete datasets, whereas the pharmaceutical sector commonly deals with very sparse and incomplete data, Segall continues.

'A big pharma company might have one to two million compounds and thousands of assays, but there's no way that every compound will have been through every type of assay. Their overall pot of data may only be a few per cent complete and most of the information that they would ideally have for every compound, or for every assay, is actually missing. Conventional deep learning methods really can't work with that sort of data input,' said Segall.

The Intellegens platform is designed to be able to bridge these gaps, by learning underlying correlations and relationships between different bioactivities and different assay endpoints, so that missing properties, relationships and activities can be predicted. 'The platform is a unique approach to deep learning, based on sparse and uncertain data, but it's also very generic and can be applied in almost any vertical,' Segall notes. 'Intellegens' algorithms are already being exploited with commercial success in the materials science space.'

Optibrium's own StarDrop software is a very comprehensive platform for small molecule design optimisation and data analysis. 'StarDrop combines conventional data visualisation techniques with a unique decision analysis algorithm to tackle multi-parameter optimisation challenges, and also a very comprehensive suite of computational chemistry capabilities for understanding the structure activity relationships within existing projects, and applying those to guide the design of new compounds,' Segall comments.

'The platform combines two-dimensional and three-dimensional structure activity relationships with de novo design, to help explore new strategies for optimisation. Our platform is an ideal fit with Intellegens' deep learning capabilities for the compound optimisation space.'

The third partner in the funded project, Medicines Discovery Catapult, is providing specialist expertise in data curation, to increase the amount of data available to the algorithms.
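Alchemite's internals are proprietary and not described in this article. Purely to make the sparse-data problem concrete, the sketch below fills in a mostly empty compound-by-assay matrix with a simple masked matrix factorisation fitted only on the measured entries; it is a much cruder stand-in for the deep learning approach, but it shows how missing assay values can be predicted from the correlations that are present in the data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy compound x assay matrix: only ~10% of entries are measured, echoing
# real discovery datasets where most compound/assay pairs were never run.
n_compounds, n_assays, rank = 200, 15, 4
true = rng.normal(size=(n_compounds, rank)) @ rng.normal(size=(rank, n_assays))
mask = rng.random((n_compounds, n_assays)) < 0.10
observed = np.where(mask, true + 0.05 * rng.normal(size=true.shape), np.nan)

# Masked low-rank factorisation: fit factors U, V using measured entries only.
U = 0.1 * rng.normal(size=(n_compounds, rank))
V = 0.1 * rng.normal(size=(n_assays, rank))
rows, cols = np.where(mask)
lr, reg = 0.02, 0.01

for epoch in range(500):
    pred = (U[rows] * V[cols]).sum(axis=1)
    err = pred - observed[rows, cols]
    grad_u = err[:, None] * V[cols] + reg * U[rows]
    grad_v = err[:, None] * U[rows] + reg * V[cols]
    np.add.at(U, rows, -lr * grad_u)
    np.add.at(V, cols, -lr * grad_v)

completed = U @ V.T                    # predictions for every unmeasured cell
rmse = np.sqrt(np.mean((completed[~mask] - true[~mask]) ** 2))
print(f"prediction error on unmeasured entries (RMSE): {rmse:.3f}")
```

A production system would also need calibrated uncertainty for each imputed value, the 'confidence dial' Segall describes later in the article, which this toy example does not attempt.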




New avenues for research
'We have some data that we have generated over the years, and the companies who we are working with have big datasets of their own based on their own compounds and assays, but we also believe that there is a treasure trove of hard to find, but valuable data in the public domain,' Segall notes, and that is where the expertise of Medicines Discovery Catapult – a national, Innovate UK-funded scientific gateway to specialist expertise – comes into its own.

'Medicines Discovery Catapult is using natural language processing machine learning technology to find published papers with relevant data on compounds and structures, which can be used to build on what we already have and what our customers have, and provide a really rich source of data for these more hard-to-find targets,' Segall added. 'We have presented some early proof-of-concept work demonstrating that by combining all this expertise we can significantly outperform conventional QSAR approaches.'

The new machine learning technology for ADMET is also versatile, and can be tuned to select results with a greater or lesser confidence. 'We can ask the algorithm to present only those activities that can be predicted with the highest confidence. It'll be a smaller number of results, but the data will be more reliable for decision making,' stated Segall.

Turn the predictive power dial the other way and it is possible to take a bit more risk and include some more uncertain data, but you then get a bigger pot of compounds for selection. 'Putting it all together, we believe this platform will be able to make more accurate predictions across a broader range of important endpoints for making decisions, and also learn in real time from new data as it is generated by our customers or in the literature,' said Segall.

While the pharmaceutical industry is embracing deep learning technologies for discovery and development, a tendency towards 'overhype' of some research means that not all algorithms are created equal, Segall adds.

'One needs to be very careful, because in the drug optimisation space, for example, some of the new methods emerging are really more like conventional QSAR machine-learning based methods rebranded through good marketing. But the industry needs much more than just the same old stuff in a different box of tricks,' said Segall.

This new platform is designed to represent the next step forward in being able to answer questions intuitively using the data already available, however fragmented. 'Our goal is to give scientists the power to help make informed decisions on which compounds they can test, but also provide insight into the most appropriate assays for the next round of screening, to move promising compounds forwards towards optimised clinical candidates,' Segall added.

Rapid detection
Genedata is poised to launch an AI-based phenotypic image analysis software, Imagence, which has been developed to automate the workflows and analysis of otherwise massively time-consuming high-content screening data. Typically, this analysis is carried out during pharmaceutical R&D to evaluate the effects of potential drug compounds on individual cells.

While traditional high content screening (HCS) methods rely on teams of scientists laboriously setting up the image analysis to identify and quantify compound-related phenotypic changes, Imagence takes this processing time down to literally seconds, explains Professor Stephan Steigele, head of science at Genedata.

'The software uses machine learning technology to understand, and then rapidly detect and extract cellular phenotypes that provide quantitative insight into the effects of an individual compound,' he states. 'Importantly, the software can be applied to a wide range of phenotypic assay formats and workflows. For the pharma industry, this means it is now possible to conduct massive screens involving millions of different chemical substances, in microtiter plate format, and generate far more consistent, reproducible and high-quality data for the discovery workflow than with classical computer vision,' stated Steigele.
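Genedata has not published how Imagence is implemented, so the following is only a loose, hypothetical illustration of the general pattern: let similar-responding cells group together so an expert labels a handful of clusters rather than every image, then train a classifier on that curated set. The feature vectors and phenotype names below are invented:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

# Pretend these are per-cell feature vectors extracted from high-content
# screening images: three underlying phenotypes, unknown to the analyst.
centres = rng.normal(scale=5.0, size=(3, 16))
cells = np.vstack([c + rng.normal(size=(400, 16)) for c in centres])

# Step 1: group similar-responding cells so they can be reviewed together.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(cells)

# Step 2: an expert labels each cluster (faked here), which is far quicker
# than labelling every individual cell or image by hand.
expert_labels = {0: "healthy", 1: "apoptotic", 2: "mitotic"}
training_labels = np.array([expert_labels[c] for c in kmeans.labels_])

# Step 3: train a classifier on the curated set for production screening runs.
clf = LogisticRegression(max_iter=1000).fit(cells, training_labels)

new_cells = centres[1] + rng.normal(size=(5, 16))   # cells from a new plate
print(clf.predict(new_cells))
```

In a real workflow the per-cell features would come from the screening images themselves, and the cluster-labelling step is exactly where the biologist's domain knowledge enters.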




So, traditional HCS approaches tie up weeks of time as technicians, IT scientists and biologists must define and set up the screens and the image analysis parameters, run the workflows, and then capture and analyse the images. 'But using Imagence, one single biologist can now run the complete workflow from image production to the interpretation of results,' Dr Steigele says.

'It saves time, money, resources, and a multidisciplinary workforce. Labs can now effectively scale-up throughput and data output massively; and whereas classical approaches to HCS require major computing clusters to analyse huge sets of images, we use some essentially very primitive hardware setups,' he adds.

Imagence is the result of collaborations between Genedata and leaders in the biopharmaceutical industry. It was the basis for a Genedata project with AstraZeneca, Deep Learning for Phenotypic Image Analysis, which won the 2018 Bio-IT World Best Practices Award. 'We wanted to remove the burden of classical analysis, in which humans have to think about, recognise and then handcraft features covering relevant phenotypic information from the cell images,' Dr Steigele notes. 'For AI-based approaches, the biologist just provides the algorithm with a curated training set, and the software can then learn by itself which features to extract and are best suited to differentiate between the existing training classes (eg, images of dogs vs cats, or different cellular states in the applied example of high-content screening). It's the expertise of the biologist that drives the whole process, but the underlying complexity is hidden.'

Efficient generation of training data is key. As Genedata explains, the process involves generation of interactive, human-readable maps from images in which similar responding cells present in the images co-cluster and thereby allow a very fast exploration of the phenotypic space and the collection of training data for all phenotype classes of interest; a process typically applied during assay development and via deep learning, which is capable of detecting even very subtle differences. It's then over to the biologist, who has the expertise to understand which phenotypes are important for the assay in development. Relevant training data are then curated in just a very few hours to train networks for production application on quantities of screening data (typically 50,000 to 2.5 million tested substances per screening campaign).

Advancing healthcare
The ability to apply deep learning techniques to typically workforce-intensive tasks in discovery and preclinical R&D will have important knock-on effects for personalised medicine and other areas of clinical practice, Dr Steigele believes. Large-scale assays using low-volume clinical samples, coupled with more consistent data extraction and analysis, will help to identify which and how drugs impact on subsets of patients, and make it faster and more cost-effective to apply technologies, such as HCS, for the fast-developing field of personalised medicine.

Come out of the lab and into the clinic and healthcare environment, and artificial intelligence is being harnessed to aid and speed diagnosis at the patient bedside. Transformative AI is harnessing artificial intelligence and analytical techniques developed at CERN and Cambridge University to generate predictive monitoring tools that they hope will ultimately save patients' lives.

'We believe that predictive analytics hold the key to transforming healthcare,' states Dr Marek Sirendi, CEO and co-founder at Transformative AI. 'We aim to transform the emergency medicine paradigm from rapid response to personalised preparation and prevention.'

The firm's first product, designed for use in hospitals, analyses data from monitors to warn doctors in advance that the patient may be likely to suffer deadly cardiac arrhythmias. The algorithm detects tiny changes in physiology that are predictive of sudden cardiac death, triggering an alarm that gives doctors the opportunity to prepare for, and potentially prevent, the episode, explains Dr Sirendi.

Perhaps surprisingly, the Transformative AI algorithm used for predictive monitoring in hospital settings is based on decision algorithms developed and used at CERN's large hadron collider, which detect exotic proton-proton collision events in real time. 'The algorithm employs a number of state-of-the-art deep learning models along with other machine learning frameworks. We present it with examples, and ask it to learn to recognise the cases of interest.'

This basic AI approach can thus be used in multiple fields, from business analytics, to physics or even predictive medicine. 'AI is best thought of as an optimiser and a mathematical learning machine,' Dr Sirendi continues. 'It can streamline many processes encountered in the modern workplace, which can boost the productivity of existing business activities. But AI can also reveal unexpected insights from large messy datasets, which opens up entirely new product categories. Our AI is designed to accomplish both tasks. It will make alarms more actionable and clinically relevant, while identifying something the human eye is unable to spot – subtle changes in human physiology that precede the onset of sudden cardiac arrest.'

And that predictive capacity can save lives, the company believes. Although doctors can give lifesaving CPR and defibrillation to patients already in cardiac arrest, the arrest occurs suddenly and there is no warning, so treatment is started only after blood has stopped flowing to the brain. This increases mortality rates and can have damaging effects on long-term neurological function among survivors, Dr Sirendi states. AI could be used to warn doctors in advance of a cardiac event. 'Telemetry monitoring only identifies arrhythmias after they have started. Our algorithm can potentially predict deadly arrhythmias up to an hour before they begin, giving doctors a chance to proactively manage this life-threatening condition.'

Importantly, the algorithm also reduces the number of false alarms. 'Nurses are frustrated by an abundance of alarms, rather than a lack of them,' Dr Sirendi adds. 'We cut out 54 per cent of irrelevant alarms while making the remaining more clinically relevant. The key is to make technology smarter, so that people (the nurses, doctors, technicians) can rely on it to a greater extent.'

Ultimately, the decision on whether to act on a machine-derived prediction is up to the clinician, who will also have blood test and potentially other clinical data and results to inform their course of action. 'We're working with a number of hospitals, cardiologists and electrophysiologists,' Dr Sirendi notes.

Engaging clinicians and healthcare providers will be key if AI is to be accepted into mainstream healthcare, the company believes. 'To get healthcare providers excited about integrating AI into healthcare, new tools shouldn't just replicate tasks that human healthcare workers are capable of. Rather, AI tools should provide novel insights that elevate the standard of clinical care.'
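Transformative AI's models are proprietary and trained on real monitor telemetry, so the sketch below is a deliberately simplified, hypothetical version of the workflow described above: features summarising telemetry windows, a probabilistic classifier, and an alarm threshold raised to suppress false alerts. All data here are synthetic:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)

# Synthetic stand-in for features summarising one-hour telemetry windows
# (for example heart-rate variability or ectopic-beat counts). Label 1 marks
# windows that preceded a dangerous arrhythmia in this toy dataset.
n = 4000
X = rng.normal(size=(n, 6))
risk_score = 1.2 * X[:, 0] - 0.8 * X[:, 3] + 0.3 * rng.normal(size=n)
y = (risk_score > 1.5).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

model = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)

# Alarm only on high predicted risk: raising the threshold trades a little
# sensitivity for far fewer false alarms, the balance the clinicians describe.
proba = model.predict_proba(X_test)[:, 1]
for threshold in (0.5, 0.8):
    alarm = proba >= threshold
    tp = np.sum(alarm & (y_test == 1))
    fp = np.sum(alarm & (y_test == 0))
    fn = np.sum(~alarm & (y_test == 1))
    sens = tp / (tp + fn) if (tp + fn) else 0.0
    print(f"threshold={threshold}: sensitivity={sens:.2f}, false alarms={fp}")
```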



LABORATORY INFORMATICS

Data ecosystems in the cloud
Faisal Mushtaq explains the role of cloud informatics in overcoming the
challenges associated with modern pharmaceutical R&D

Increased automation and powerful bioinformatics have created a pharmaceutical R&D landscape where data can be rapidly generated on a truly remarkable scale.

Take genome sequencing, for example. Not long ago, mapping gigabase-sized sequences took scientists years to complete using traditional techniques. Today, this can be accomplished in a matter of hours with next-generation sequencing (NGS) technologies(1). Similar advances in mass spectrometry, synthetic biology and quantitative polymerase chain reaction approaches mean today's R&D pipelines are bursting at the seams with complex, multi-dimensional data.

However, the sustained growth in the volume of information generated by modern R&D workflows presents a challenge for biotech and pharmaceutical companies, in terms of organising and utilising these vast datasets. To truly capitalise on the value of these datasets, information management tools must support integrative thinking and enable fast, informed decision-making by these organisations. Moreover, these tools must not only support innovation today – they must be sufficiently flexible and scalable to adapt to tomorrow's R&D landscape.

Increasingly, forward-thinking biotech and pharmaceutical firms are turning to cloud-based informatics platforms, which overcome data management challenges by integrating R&D streams and centrally organising the information they generate. In particular, platform solutions are gaining traction as a scalable and cost-effective approach to help laboratories connect individual processes to achieve end-to-end visibility of R&D pipelines. In this article, we look at how these cloud-based tools are ideally placed to help businesses take back control of their data and meet the needs of modern pharmaceutical R&D.

The challenge of managing increasingly complex R&D data
Drug discovery today is as challenging as it's ever been. R&D budgets may be squeezed, yet industry players are under continued pressure to bring safe and effective medicines to market against accelerated timeframes. Meanwhile, regulatory authorities are turning their attention to the integrity of pharmaceutical data, putting additional demands on laboratories to demonstrate
compliance quickly and cost-effectively.

Against this backdrop, biotech and pharmaceutical organisations need an integrated, open and secure informatics platform to utilise their growing volumes of R&D data in the most efficient way. However, for many organisations, this isn't the reality. While large numbers of biotech and pharmaceutical firms have made the leap from paper-based workflows to laboratory information management systems (LIMS) and electronic laboratory notebooks (ELNs), for a significant proportion, these tools are not used in a joined-up, integrated way.

Instead, they're employed as point solutions or assigned to specific workflows or departments – resulting in poor visibility over the full R&D pipeline, and severely limiting organisational output. Most importantly, when new technologies are brought on-line, these fragmented systems are often unable to cope. So, what's the solution?

Cloud-based informatics: the solution to expanding R&D pipelines
Cloud-based laboratory informatics platforms bring together R&D data, creating a single connected digital ecosystem for drug discovery, development and manufacture. In doing so, these systems make end-to-end pipeline data fully searchable, sharable, accessible and actionable, freeing organisations from the technical challenges associated with fragmented approaches. And because these tools are deployed through cloud-based architecture, they offer much greater flexibility and scalability compared to traditional in-house set-ups. In short, they offer the perfect solution to the challenge of managing an expanding information pipeline.

What's more, because cloud-based platforms are developed and maintained by independent software vendors, many boast innovative workflow support tools that would be unfeasible to develop in-house. For instance, some of the latest platforms incorporate artificial intelligence (AI) functionality, empowering organisations with highly-effective reporting and trending capabilities. AI algorithms can, for example, analyse complex unstructured biological data in real-time using machine learning, natural language processing and text analytics, enabling faster and smarter decision-making. Additionally, AI frees experienced scientists from some of the more routine and time-consuming data analysis responsibilities, so they can put their expertise to better use driving innovation within the R&D pipeline.

Cloud-based solutions also offer enhanced data sharing functionality, an increasingly important requirement for modern pharmaceutical data and analytics platforms. The last decade has witnessed a tangible shift towards more collaborative working practices, as stakeholders recognise that many of today's most pressing healthcare challenges require knowledge and expertise from across the industry. Equally, growth in the contract research and manufacturing sector means that organisational partnerships are becoming the new norm.

To thrive, biotech and pharmaceutical companies need platform-based approaches that make securely sharing large datasets as simple as sending an email. Cloud-based platforms make data retrieval and distribution fast and straightforward, providing a solid framework on which to build successful partnerships. By giving authorised users real-time access to pipeline data – from genomics datasets through to chromatography method parameters – these platforms can streamline workflows, improve communication and accelerate R&D innovation at the click of a button.

Managing the R&D pipelines of tomorrow
Developing, implementing and validating new IT infrastructure in-house can often be complex, expensive and resource-intensive. In contrast, the latest solutions make migrating to a cloud-based informatics platform simple and straightforward. In particular, systems based on modular frameworks, such as Thermo Fisher Platform for Science software, offer additional flexibility and are capable of seamlessly integrating data from existing LIMS and ELN systems in a way that is consistent with an organisation's growth needs. Even when a complete system re-design is needed, data can be carried over in full from the previous framework. Alternatively, if partial upgrades to an existing platform are required to bring additional capabilities on-stream, new features can be added on without fundamental IT infrastructure having to be replaced.

One of the most important features of cloud-based platforms is the ability to scale and adapt to users' needs. Some providers offer application libraries that allow laboratories to install new features, tools and interfaces as their needs evolve and their pipelines grow. Because these pre-configured modular applications are designed to comply with the latest industry best practice and regulatory requirements, they are ready to be used alongside existing LIMS and ELNs from the moment they are installed. By supporting organisations through this flexible approach, cloud-based platform solutions can help laboratories tackle the R&D challenges that are most relevant to them, in the most cost-effective and scalable way.

Biotech and pharmaceutical companies are under continued pressure to find informatics solutions to manage the mountains of multi-dimensional data generated by their R&D workflows. To maintain the competitive advantage moving forwards, these platforms must make searching, sharing and manipulating large datasets quick and efficient, and above all, they must be capable of evolving with the rapidly changing drug development landscape.

As a result, many future-savvy organisations are implementing cloud-based informatics platforms to create a single, integrated digital ecosystem for their R&D data and analytics. These modern tools for data and analytics are bringing together R&D streams and overcoming the limitations associated with fragmented approaches, helping organisations to achieve greater pipeline oversight, boost efficiency and drive faster, more effective decision-making.

Faisal Mushtaq is the Vice President and General Manager of the Digital Science business unit at Thermo Fisher Scientific. Faisal began his career developing software solutions for healthcare providers. In recent years, Faisal has transitioned to executive management roles at firms that deliver focussed, software-as-a-service solutions.

Reference
[1] JK Kulski. Next-Generation Sequencing – An Overview of the History, Tools, and "Omic" Applications, in Next Generation Sequencing – Advances, Applications and Challenges, 2016, Ed. JK Kulski, IntechOpen, DOI: 10.5772/61964.



MODELLING AND SIMULATION

Decentralised systems
buck data-sharing trend

ADRIAN GIORDANI REPORTS ON THE USE OF BLOCKCHAIN IN THE AUTOMOTIVE INDUSTRY

Until the first half of 2018 automotive manufacturers struggled to integrate a smartphone-based system into their new car models. Communication between an external network and a car would only work if the manufacturer's central database was always online. However, systems that increase trust and remove the concept of third parties within a network, known as distributed ledger technologies, are growing at a fast pace. This is disrupting centralised data management approaches, both at a commercial and academic level.

In July at the international exhibition for global microelectronics industries, SEMICON West in San Francisco, US, analysts touted that the automotive sector – particularly self-driving cars, in addition to artificial intelligence and high-powered computing – is driving the next boom in semiconductors. Car chips, for example, use the same amount of silicon as five to 15 iPhones.

In addition to a rise in computational demand, players in the decentralised data movement are working in partnerships to solve problems and explore opportunities. This year XAIN, based near Berlin in Germany, solved trust between a database of its automotive client Porsche and its car model application. This was done using a complex machine learning network to enable a four-door sedan Porsche Panamera to act as the software client independently of backend database support, with overall control managed by the driver through a blockchain application, a type of distributed ledger technology.

XAIN successfully tested their technology to remotely open and close the Panamera up to six times faster than before (1.6 seconds).

'However, we need to differentiate here that other automotive manufacturers were previously integrating a blockchain wallet into a car; which only stores private [cryptographic] keys so that the car can, for example, pursue payments,' said Leif-Nissen Lundbæk, CEO, chairman and co-founder of XAIN. 'In our case, we integrated a full client in a car that also verifies all communication, so that the blockchain is not part of the backend system, but actually the car itself is the blockchain.'

His work focuses on AI algorithms for distributed cryptographic systems. In fact, XAIN originated as a spin-off from his academic work at Oxford University and Imperial College London.

His firm's mission is an eXpandable Artificial Intelligence Network (XAIN), using blockchain systems as an open or permission-based system that enables connectivity to Internet-connected devices. These include electronic control units (ECU) in cars to control electrical systems or subsystems. XAIN cooperates with semiconductor firms and integrates its technology with automotive microcontrollers from other vendors.

Smart contracts are used by the network to bypass the need for middlemen between permitted parties within the network, saving time. It algorithmically decides on the contents of the next 'block' in the blockchain with a proprietary-created consensus mechanism called a 'Proof of Kernel Work' (PoKW). Only specific users or drivers are given rights to the car for opening or locking the doors or starting the engine.

XAIN's network runs on the Ethereum Blockchain, an open-source distributed computing platform. Its uses range from decentralised video streaming services to helping distribute food vouchers and mobile banking in third world countries.

'Here security and incentivisation are really the key points. The most powerful techniques are open consensus methods that protect from changes, while also allowing for incentivising all sorts of processes, ranging from the consensus itself to standardisation processes,' said Lundbæk.

The key layer of XAIN's novel approach is the distributed ledger technology, or DLT, protocol. It enables a special form of electronic data processing and storage. Importantly, a DLT is a database that exists across several locations, nodes
or participants. A blockchain is just one specific type of DLT. Not all DLTs use a chain of blocks, for example, and not all distributed databases are DLTs – the trust boundaries are different.

A DLT delegates access to functions and data over multiple parties through secure cryptographic principles. This platform works in tandem with Porsche's traditional centralised servers, but the novel network becomes part of the backend with better trust and security, and gives network users a new level of access between 'read' and 'write' permissions and more.

DLTs are most famously known for underpinning the digital currency Bitcoin. In 2018, Bitcoin mining represented roughly 0.6% of global energy demand – equivalent to Argentina's consumption. Even though Bitcoin mining consumes lots of energy, a DLT network controlled algorithmically does not necessarily have to consume more energy per node.

'First of all, I think blockchain is outdated. Blockchain became a buzzword, we even used it on our company name,' said Robert Küfner, advisor to Advanced Blockchain, a German publicly listed company focusing on the design, development and deployment of DLTs.

'Using more energy in order to solve a problem is simply wrong,' said Küfner. 'All this will be obsolete because code can replace the work that currently requires lots of energy. This goes back to what we are doing in Advanced Blockchain.

'If we look at distributed ledger technology, blockchain is just one... the majority of people and start-ups are focussing on Fintech, everything around finance and money. But there are a few industries with a higher potential that will disrupt sooner than others; those industries that involve a middle party have a problem with the decentralised movement,' said Küfner.

The automotive sector is one such area. It can enable better oversight of odometer fraud for car buyers, such as clocking or busting miles, which is the illegal practice of rolling back odometers to make it appear that vehicles have lower distances travelled, costing millions.

'The vehicle identification number, the time stamp and actual mileage of the vehicle will be uploaded and stored to a ledger,' said Küfner; 'which cannot be compromised. By doing so, you have a digital twin logbook of the car.'

Opening up to more reproducible scientific research
A distributed approach to data also opens up possibilities for academia. In August, researchers at the San Diego Supercomputer Center (SDSC), of the University of California San Diego, US, were awarded a three-year National Science Foundation grant of $818,000 to design and develop an infrastructure of open-source distributed ledger technologies to enable researchers to efficiently share information about their scientific data, while preserving the original information. The hope is that researchers will be able to efficiently access and securely verify data, according to the SDSC press release.

Known as the Open Science Chain, this is not a supercomputer-related project, according to Subhashini Sivagnanam, principal investigator for the grant and a scientific computing specialist with SDSC's data-enabled scientific computing division. This infrastructure will be integrated with actual scientific datasets. The Open Science Chain aims to increase productivity, security and reproducibility. As the datasets change over time, new information will be appended to the chain.

'My first impressions are that it may help part of the problem – validating and verifying the data,' said Les Hatton, emeritus professor of forensic software engineering at Kingston University, London, UK.

Hatton, who is not involved in this research, said: 'However, nothing is said about the software which analyses that data. Full computational reproducibility depends on the whole package: data, analysis software, glue software and testing software.'

However, the large-scale challenge of reproducing scientific data is bigger than just one technical approach, Hatton states.
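Neither XAIN's Proof of Kernel Work consensus nor the Open Science Chain design is specified in enough detail here to reproduce. The sketch below only illustrates the append-only, hash-chained record idea that both build on, in the spirit of the 'digital twin logbook' of odometer readings Küfner describes: tampering with any earlier entry invalidates every later hash. It is a single-party toy, not a distributed ledger:

```python
import hashlib
import json
import time

def entry_hash(body: dict) -> str:
    """Hash an entry deterministically (sorted keys, canonical JSON)."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

def append_record(chain: list, record: dict) -> None:
    """Append a record, linking it to the hash of the previous entry."""
    prev = chain[-1]["hash"] if chain else "0" * 64
    entry = {"record": record, "prev_hash": prev, "timestamp": time.time()}
    entry["hash"] = entry_hash(entry)      # hash covers record, prev_hash, timestamp
    chain.append(entry)

def verify(chain: list) -> bool:
    """Recompute every hash and link; any tampering breaks the chain."""
    prev = "0" * 64
    for entry in chain:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if entry["prev_hash"] != prev or entry["hash"] != entry_hash(body):
            return False
        prev = entry["hash"]
    return True

logbook = []
append_record(logbook, {"vin": "DEMO-VIN-0001", "mileage_km": 12040})
append_record(logbook, {"vin": "DEMO-VIN-0001", "mileage_km": 15310})
print("valid:", verify(logbook))

logbook[0]["record"]["mileage_km"] = 9000   # attempted odometer roll-back
print("after tampering:", verify(logbook))
```

The same pattern applies to scientific datasets: append a fingerprint of each new data version, and anyone holding the chain can check later whether the data they received still matches it.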




More education for researchers about computational reproducibility and the basic techniques, rather than something sophisticated, may prove more fruitful.

Enhanced security
Advocates of DLTs say that they enable better security between parties in the network by decreasing the chances of a malicious outside attack. For example, an automotive manufacturer stops being a single entry point; but DLTs are not a panacea and must not be used in isolation.

'It would be wrong to use a blockchain to store data, for reasons of scalability and post-quantum security, so it is used rather as a synchronisation proof mechanism and user registry,' said Lundbæk.

These architectures can be used with traditional centralised or decentralised databases to enhance resilience to hacks. However, any executable program running within that network could also be a security risk open to hackers.

'A lot of the most impactful advances are algorithmic in nature, instead of a matter of scale,' said André Platzer, an associate professor in the computer science department at Carnegie Mellon University, Pennsylvania, US. His work focusses on the general principles for designing motion or other physical processes in cyber-physical systems, such as surgical robots, aircraft or self-driving car applications.

'The most subtle but impactful challenges in all these systems is the identification of which actions can safely be taken under what circumstance and why,' said Platzer. 'For example, when should an aircraft climb and when it should descend instead; or, when can a car accelerate or coast, or even brake.'

Platzer and his colleagues develop methods that enforce safety in reinforcement learning algorithms. For example, they use programming principles with an automated pipeline approach called 'VeriPhy' (verified controller executables from verified cyber-physical system models). It provides a safe interaction between the code and actual physics to generate executables that perform exactly in the same way that the original models and algorithms were supposed to execute. Platzer says that this is a more disciplined approach to address security concerns.

'Privacy is another matter and, indeed, it's not clear if the world is better off favouring privacy over exchange of information – in case only the latter can prevent collisions of cars. Privacy clearly is something to be thought about carefully,' said Platzer.

Privacy in a networked world
The growth of automation and distributed data exchanges heightens the concerns of personal data threats and control. The European Union's General Data Protection Regulation (GDPR) came into effect in May 2018 for 28 member states, including businesses. The GDPR's main goals are stronger rules on data protection, so individuals have more control over their personal data and businesses have a level playing field on processing private data.

The use of cryptography within DLTs can also support GDPR principles. All personal data is encrypted and can be stored on individual file storage. 'A GDPR-compliant process – specific to a given use case – can then be defined, individualised and built into the user interface by mapping out or combining processes for consent, access, update and erasure policy,' said Lundbæk. 'The user can then decide whether or not to grant access of this data to the manufacturer or other parties.'

Now XAIN is testing other applications with Porsche vehicles. These include real-time notifications for drivers about third-party car access, granting remote access to a parked car for secure delivery of packages, and unlocking or locking a car with a blockchain-powered offline connection, with no server connection. Porsche benefits from increased trust in vehicle data, audited information for reports, local data access for predictive maintenance and training of distributed machine learning for autonomous driving.

'We are working with Daimler and other automotive manufacturers, representing roughly 39 per cent of the world's vehicle production, on integrations in their cars,' said Lundbæk. 'We are working with Infineon, a leading microcontroller manufacturer, to embed our protocol on their devices to make our solution easier to adapt in vehicles, and more secure using the trusted microcontroller environment as an accelerator.'

Based on these early adopters, could a DLT approach make centralised approaches go the way of the minidisc or eight-track tape?

'Traditional databases will likely continue to be the standard and have their role until these protocols are tested at production level,' said Lundbæk. 'This process has just started and it is still very challenging for centralised businesses to think this way, but it is mostly the case that they fear that their business will end if they don't change.'

According to Küfner, the rise of DLTs compares with the TCP/IP protocol that underpins email exchanges. In the 90s, when people tried to explain to others how to create and use an email address, there was little knowledge available. Küfner sees DLTs moving much faster.

As it progresses, blockchain suffers from issues such as block response times and block sizes. Advanced Blockchain is focussed on implementing another type of DLT that is practical and scalable, called a Directed Acyclic Graph (DAG). 'These issues are not sustainable and blockchain will be replaced by a better version called DAG,' said Küfner. 'If you have a fully functional DAG, imagine all the energy that you can save. So DLT will be fantastic for climate change because it will reduce the amount of energy that is consumed by all the processes that are currently running.'

Large organisations or companies, like large cruise ships or tankers, are floating on an ocean of possibilities. Some are actively directing their paths straight into the currents of decentralised data management streams more than others. The market is ambitious.

'There is no one in DLT who is a professional and has proven themselves to know what they are doing, because the industry is so young,' Küfner said. 'In fact, I would not call this an industry, rather a movement. Because it affects human interactions; that is why it is not a business. It is a beginning of a change to social-economic behaviour and will involve in the near future.'



MODELLING AND SIMULATION

Intelligent
Optimisation
GEMMA CHURCH LOOKS AT THE ROLE OF OPTIMISATION SOFTWARE IN MODERN ENGINEERING PRACTICES

Optimisation is everywhere. It's prevalent in our day-to-day decisions as we choose the fastest route to work, the cheapest product or service for our needs, or the healthiest snack (or, maybe, the tastiest) when hunger arises.

It's also inherent to optimise the designs, processes and services that shape our lives through a range of evolving simulation techniques and applications. As such, optimisation is finding a new place in the additive manufacturing market and is helping manufacturers meet rising demands for lighter and more energy efficient products by optimising mass distributions.

Bjorn Sjodin, VP of product management at Comsol, explained: '3D printing, or additive manufacturing, is probably the current strongest trend within the optimisation space. Here, shape and topology optimisation are important methods, and they result in new shapes and designs that the human mind could never envision without the help of computational tools. It is an exciting area that we are very active in, with respect to customer interactions and implementation of new tools for the future.'

Consequently, topology optimisation is finding a new lease of life in the 3D printing and additive manufacturing markets. Jeffrey Brennan, chief marketing officer at Altair, explained: 'Topology optimisation, with its generation of efficient, non-traditional, organic-like designs, serves as a perfect complement for 3D printing, given the manufacturing flexibility that it offers. But it goes beyond that to include the design and optimisation of even complex lattice structures using different optimisation disciplines.'

However, tolerances and precision in the aerospace industry (where optimisation is an established tool) are astoundingly tight because of the high degree of regulation and safety required. As a result, errors and deformation from thermal cycling that occur on the micron scale during manufacturing processes can render a part unusable.

Manufacturing Technology Centre (MTC) is working to optimise the achieved precision in its additive processes and overcome this challenge. Using Comsol Multiphysics modeling and simulation apps, MTC has created an app that predicts the deformation of parts and allows designers to build deformations directly into their designs.

The implications of such apps are wider reaching, according to Sjodin, who added: 'The field of simulation apps also opens up the door for a variety of optimisation opportunities, such as supporting a sales team, or product optimisation with non-simulation experts, to name a few.'

Digital twins
Optimisation is now helping companies create digital twins with the integration of machine learning techniques. Brennan explained: 'As things mature, optimisation algorithms will combine with learning algorithms to improve product design.

'For example, if we have a product out in the field such as a wind turbine, data will be gathered on its performance in various conditions. This puts a load on that device, which starts to make the baseline design struggle for its normal longevity of product duration, particularly during adverse conditions.'

Brennan added: 'That data can be used the next time a structure needs to be placed in that same place, under the same conditions. That new structure will then withstand that environment better because of the information provided to the digital twin version with data from the field. That's a form of optimisation that takes place over years – after the product leaves the manufacturing plant and is in service. Product improvement still hasn't stopped because the data is taken from the environment of true operation to inform the next generation of that product.'

Image: The latest Altair 365 and Altair Inspire platform images (Altair)



Image: Optimised, varying-density lattice geometry created using Ansys Mechanical, showing displacement (Ansys)

Lightweight designs
Altair was the first company to implement and commercialise topology optimisation. Now, its primary optimisation focus is simulation and the use of digital tools for design improvement, specifically to reduce component complexity and weight, and consequently simplify the resulting manufacturing processes. For example, it recently developed a 3D printed bracket for the BMW i8 Roadster with the help of optimisation techniques that realised 44 per cent component weight savings compared to the previous car model.

Altair also worked with Sogeclair Aerospace to find a new development and manufacturing approach to reduce the weight of its components, while ensuring the safety of the designs. A CAE-driven design process was developed, which combined topology optimisation using the optistruct and additive layer manufacturing (ALM) tools. As a result, a greater degree of design freedom was realised and weight savings were maximised while retaining the parts' stiffness. The overall number of system parts was reduced, which reduced the assembly time.

Altair customer Rolo Bikes also applied CAE techniques to optimise carbon fibre structures, with the objective of optimising a new bike frame to achieve world-leading performance for weight, stiffness and comfort. 'Composites materials are an emerging technology as well. Given the focus on lightweight designs, composite materials are increasingly gaining prominence. And this trend extends beyond aerospace to other industries as well, including automotive,' Brennan added.

Changing times
Ansys is another established name in the optimisation space. Its DesignXplorer tool, which is part of Ansys Workbench, has been around for more than 15 years and makes parametric optimisation in any simulation area easy for engineers of all levels of experience. More recently, Ansys released a new topology optimisation capability within Ansys Mechanical.

The company is noticing real change in the optimisation space. Richard Mitchell, lead product marketing manager for structures at Ansys, said: 'There are definite moves to adopt new technologies and to innovate faster. The trends I see becoming less common are building models entirely with scripts. We do still have a good number of engineers who are well versed in working this way, moving optimisation closer to the user and opening up new methods, meaning that more users can get to better designs faster.'

Challenges remain, as optimisation naturally leads to more solution runs, according to Mitchell, who explained: 'There are two main obvious obstacles here. The first is, will the runs be carried out successfully? A 99 per cent automatic process is as effective as a 10 per cent automatic one. In order to carry out optimisation effectively, a process must be 100 per cent automatic.'

'Optimisation inside of Ansys Workbench is built to be robust and reliable, even if solutions fail. A user is still able to make choices though, that cause issues with things like geometry updates (from a CAD system) or exploring the edge of viable solution sets.

'The other challenge is to find the optimised result in as few iterations as possible without missing the target. Ansys has numerous optimisation tools that include single and multi-objective algorithms, as well as reducing domain algorithms,' he added.
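None of the vendors' solver internals are described in this article. Purely as a generic, hypothetical illustration of what parametric optimisation means in practice, with free design parameters, an objective and constraints evaluated by some model, the sketch below sizes a simple cantilever cross-section with SciPy, standing in for the automated loops that tools of this kind wrap around full simulations:

```python
import numpy as np
from scipy.optimize import minimize

# Parametric model of a rectangular aluminium cantilever: the design
# variables are the cross-section width b and height h (in metres).
# A real workflow would call a finite-element solve instead of these formulas.
L, F, E = 1.0, 2000.0, 70e9           # length (m), tip load (N), Young's modulus (Pa)
SIGMA_MAX, DEFL_MAX = 120e6, 0.005    # allowable stress (Pa) and tip deflection (m)
RHO = 2700.0                          # density (kg/m^3)

def mass(x):
    b, h = x
    return RHO * L * b * h

def max_stress(x):
    b, h = x
    return 6.0 * F * L / (b * h ** 2)          # bending stress at the root

def tip_deflection(x):
    b, h = x
    return F * L ** 3 / (3.0 * E * b * h ** 3 / 12.0)

constraints = [   # normalised so both constraints are of order one
    {"type": "ineq", "fun": lambda x: 1.0 - max_stress(x) / SIGMA_MAX},
    {"type": "ineq", "fun": lambda x: 1.0 - tip_deflection(x) / DEFL_MAX},
]
bounds = [(0.005, 0.10), (0.005, 0.20)]        # manufacturable size range (m)

result = minimize(mass, x0=[0.05, 0.10], bounds=bounds,
                  constraints=constraints, method="SLSQP")

b_opt, h_opt = result.x
print(f"b = {b_opt * 1e3:.1f} mm, h = {h_opt * 1e3:.1f} mm, "
      f"mass = {mass(result.x):.2f} kg, "
      f"stress = {max_stress(result.x) / 1e6:.0f} MPa, "
      f"deflection = {tip_deflection(result.x) * 1e3:.2f} mm")
```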




g Engineering expertise

COMSOL
Esteco has been working in the
optimisation field for nearly two decades,
with a focus on real-life engineering
problems. It promotes an approach based
on ‘optimisation driven design’, where
users can advance their design process
with intelligent algorithms, which utilise
machine learning techniques. These
algorithms ‘understand’ each specific
problem and find optimal designs in
less time, at lower cost and using fewer
computational resources than traditional
techniques.
Matteo Nicolich, Volta product manager
at Esteco, explained: ‘When building
workflows composed of complex software
chains, having a tool that automates all the
repetitive work needed to interface one
software with the other, or to interface the
output of one software with the input of
the following one is essential.’  Shape optimisation analysis of a mounting bracket created using Comsol Multiphysics software
Esteco also focuses on parametric
optimisation. ‘This type of optimisation of variables, which will be used by set of smart algorithms that come with a
– based on the management of ‘free’, user- Volta, either as range limits or specific possibility to run in ‘autonomous’ mode.’
defined parameters – allows to apply the predetermined sets. Kleanthous said: ‘We provide our customers with
same techniques to a significantly more ‘Volta has incorporated those variables algorithms that are not only able to deploy
vast spectrum of problems compared to into easily editable number sets. Volta multiple strategies at once to the same
topological optimisation, which remains has given us the opportunity to run engineering problem, but are also able to
limited to geometry problems,’ Nicolich optimisation and sensitivity analysis for learn from the problem itself and adapt
added. the various scenarios our vessel was accordingly.
Esteco’s optimisation tools have been tested on.’ ‘This is not about optimisation for
used as part of the GasVessel project in The cargo capacity, vessel’s speed, dummies, but it is about obtaining viable
Cyprus, where researchers are developing distance travelled, supply and demand and accurate insights with little time or
a prototype tank containment system, variables have all been optimised using information about the product at hand.’
which will be installed on a vessel and the Volta platform. CHC is working with However, such automation opens up
used to transport compressed natural gas Esteco to add new features or change another can of worms: management of the
(CNG). specific processes in the simulation huge swathes of data that result. ‘Together
The Volta platform has incorporated algorithm. It also expects to integrate Volta with the management of simulation
a deterministic simulation to generate with its other projects. processes, there is the big issue of
results as cost per cubic unit. Athos Kleanthous said: ‘Volta has helped managing the data and all the work in
Kleanthous, commercial analyst at Cyprus identify the smallest vessel capacity progress that is related to the construction
Hydrocarbons Company (CHC), explained: possible for a specific route, less number and creation of process automation, and
‘More specifically, we are running a of ships employed in the specific supply all the data that are generated upon the
simulation of a vessel, which lifts a cargo chain and optimal speed to achieve on- execution of these processes,’ Nicolich
from point A and has to travel a certain time deliveries of the cargo in transit.’ added.
number of nautical miles to point B, where ‘The results Volta has returned have Esteco has developed a Simulation Data
it should discharge the cargo.’ shed light to the development team for the Management tool, which allows users to
The simulation needs a number directions to follow, in order to deliver a keep track of all the information related
Kleanthous said: 'Volta has helped identify the smallest vessel capacity possible for a specific route, fewer ships employed in the specific supply chain, and the optimal speed to achieve on-time deliveries of the cargo in transit.
'The results Volta has returned have shed light for the development team on the directions to follow in order to deliver a viable project.
'Besides the algorithm and the results itself, Volta is a web-based tool, which is easily accessible through any internet browser, has inputs in a simple format and also offers an extremely simple user interface,' Kleanthous added.

Autonomous future
Esteco is now focusing on its optimisation automation tools. Nicolich explained the reasoning for this approach: 'One of the biggest issues can be finding the most effective strategy to tackle the problem at hand. This is why we invested in researching and developing a whole new set of smart algorithms that come with the possibility to run in 'autonomous' mode.
'We provide our customers with algorithms that are not only able to deploy multiple strategies at once to the same engineering problem, but are also able to learn from the problem itself and adapt accordingly.
'This is not about optimisation for dummies, but about obtaining viable and accurate insights with little time or information about the product at hand.'
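The sketch below illustrates, in miniature, the idea of running several search strategies on the same problem and shifting effort towards whichever one is paying off. It is a toy example under invented assumptions, not Esteco's autonomous algorithm.

# Toy illustration of deploying several search strategies on one problem and
# adapting effort towards whichever strategy keeps improving the best design.
import random

def objective(x):                      # toy objective to minimise (noisy)
    return (x[0] - 3) ** 2 + (x[1] + 1) ** 2 + random.gauss(0, 0.01)

def random_restart(best):              # strategy 1: global random sample
    return [random.uniform(-10, 10) for _ in best]

def gaussian_mutation(best):           # strategy 2: local search around best
    return [v + random.gauss(0, 0.5) for v in best]

def coordinate_step(best):             # strategy 3: perturb one variable only
    x = best[:]
    x[random.randrange(len(x))] += random.choice([-1, 1]) * 0.25
    return x

strategies = [random_restart, gaussian_mutation, coordinate_step]
scores = [1.0] * len(strategies)       # running credit for each strategy
best, best_f = [0.0, 0.0], objective([0.0, 0.0])

for _ in range(300):
    i = random.choices(range(len(strategies)), weights=scores)[0]
    candidate = strategies[i](best)
    f = objective(candidate)
    if f < best_f:                     # reward the strategy that improved
        best, best_f = candidate, f
        scores[i] += 1.0
    else:                              # slowly forget unproductive strategies
        scores[i] = max(0.1, scores[i] * 0.99)

print("best point:", [round(v, 2) for v in best], "value:", round(best_f, 3))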
However, such automation opens up another can of worms: management of the huge swathes of data that result. 'Together with the management of simulation processes, there is the big issue of managing the data and all the work in progress that is related to the construction and creation of process automation, and all the data that are generated upon the execution of these processes,' Nicolich added.
Esteco has developed a Simulation Data Management tool, which allows users to keep track of all the information related to a simulation process, so they can cycle back along the history to see how a single piece of information was created, by which data and which model.
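A minimal sketch of the bookkeeping behind this kind of tool is shown below: every run records which model and inputs produced which result, so a figure in a report can be traced back through the history. The ledger, run identifiers and field names are invented for illustration, not Esteco's product.

# Minimal sketch of simulation data provenance: record which model and inputs
# produced which outputs, so any result can be traced back. Purely illustrative.
import hashlib, json, time

LEDGER = []                                   # in-memory stand-in for a database

def record_run(model_name, inputs, outputs):
    entry = {
        "run_id": hashlib.sha1(json.dumps([model_name, inputs, time.time()],
                                          sort_keys=True).encode()).hexdigest()[:10],
        "model": model_name,
        "inputs": inputs,
        "outputs": outputs,
        "timestamp": time.strftime("%Y-%m-%d %H:%M:%S"),
    }
    LEDGER.append(entry)
    return entry["run_id"]

def trace(run_id):
    """Return the model and inputs that produced a given result."""
    return next(e for e in LEDGER if e["run_id"] == run_id)

run = record_run("vessel_cost_model_v2",
                 {"capacity_m3": 10_000, "speed_kn": 14, "ships": 2},
                 {"cost_per_m3": 1.87})
print(trace(run))                             # shows where the 1.87 figure came from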
Nicolich said: 'With the Engineering Data Intelligence tool, we want the set of functionalities to understand and mine information in this potentially huge amount of engineering data, and extract information on how to predict the next step of the design iteration.'
It's an innovative approach, and one that should help Esteco future-proof its optimisation tools as automation and the resulting reams of big data start to impact the simulation and modelling space.


Webcasts now available online
View for free (registration required): www.scientific-computing.com/webcasts

Creating a connected ecosystem to gain insights across R&D (sponsored webcast)
As data continues to increase exponentially, many organisations are turning to the latest advancements in digital technology to respond to the evolving needs of R&D scientists. For many organisations, a key challenge is managing the complexity and the number of systems in place that generate multiple types of complex data.

Ensuring Data Integrity & Facilitating Compliance with ISO 17025 (sponsored webcast)
Today's quality landscape is incredibly complex, with multiple data analysis, informatics and enterprise systems. To ensure quality results, regulators across industries are focusing on data integrity, data security and consistent laboratory practices.

Laboratory 4.0: Moving Beyond Digitalization in the Lab (sponsored webcast)
Today's laboratories are becoming increasingly complex, with ever more data being generated and captured. At the same time, regulatory oversight is stronger than ever and places new compliance burdens on everyday operations.

Leveraging Machine Learning for Decision Making in the Materials Sciences (sponsored webcast)
Machine learning and big data analytics offer significant opportunities to improve R&D in the materials sciences, providing scientists with a new set of tools to analyze their data.

Suppliers’ Directory
Find the suppliers you need quickly and easily
www.scientific-computing.com/suppliers

HIGH-PERFORMANCE COMPUTING

Altair Engineering, Inc
(248) 614-2400, info@altair.com, www.altair.com
• Software

Boston Limited
+44 (0)1727 876 100, sales@boston.co.uk, www.boston.co.uk
• Systems Integrator

Calyos SA
0032 4 75 73 54 65, newbusiness@calyos-tm.com, www.calyos-tm.com
• Cooling

Cluster Vision
+31 20 407 7550, marketing@clustervision.com, www.clustervision.com
• Software

Eurotech Spa
+39 0433 485 411, welcome@eurotech.com, www.eurotech.com
• Systems Integrator

Numerical Algorithms Group (NAG)
+44 (0)1865 511 245, katie.ohare@nag.co.uk, www.nag.co.uk
• Software

OCF
+44 (0) 114 257 2200, scw@ocf.co.uk, www.ocf.co.uk
• Systems Integrator

LABORATORY INFORMATICS

Abbott Informatics
+44 (0)161 711 0340, AISalesUK@abbott.com, www.abbottinformatics.com
• LIMS

Accelerated Technology Laboratories
+1 910 673 8165 (outside of US), 800 565 LIMS (5467) US and Canada, info@atlab.com, www.atlab.com
• Bioinformatics • LIMS

Amphora Research Systems
+44 (0)845 230 0160, info@amphora-research.com, www.amphora-research.com
• Electronic Laboratory Notebooks

Autoscribe
+44 (0) 118 984 0610, info@autoscribeinformatics.com, www.autoscribeinformatics.com
• Data Management • Electronic Laboratory Notebooks • LIMS • Scientific Document Management Systems

Biovia
+1 858 799 5000, www.3ds.com/products-services/biovia
• LIMS • Electronic Laboratory Notebooks • Scientific Document Management Systems

Eusoft
+39 080 5426799, info@eusoft.it, www.eusoft.it
• LIMS

IDBS
info@idbs.com, www.idbs.com
• Electronic Laboratory Notebooks

InfoChem
+49 (0)8958 3002, info@infochem.de, www.infochem.de
• Cheminformatics

LabWare
infoEU@labware.com, www.labware.com
• LIMS

Osthus GmbH
+49 241 943 140, office@osthus.de, www.osthus.de
• Electronic Laboratory Notebooks • Cheminformatics • LIMS

Siemens
+1 322 536 2139, industrial-it.swe@siemens.com, www.siemens.com/simaticit-rdsuite, www.siemens.com/industrial-it/lims
• Electronic Laboratory Notebooks • LIMS

Thermo Fisher Scientific
+44 (0)161 942 3000, marketing.informatics@thermofisher.com, www.thermoscientific.com/informatics
• LIMS • Scientific Document Management Systems • Chromatography Data Systems • Spectroscopy Software

MODELLING AND SIMULATION

Altair Engineering, Inc
(248) 614-2400, info@altair.com, www.altair.com
• Software

Integrated Engineering Software
+1 204 632 5636, info@integratedsoft.com, www.integratedsoft.com
• Mathematics, Simulation and Modelling

Integrated Engineering Software
+1 204 632 5636, info@integratedsoft.com, www.integratedsoft.com
Integrated Engineering Software is a leading developer of hybrid simulation tools for electromagnetic, thermal and structural design analysis.

Eurotech Spa
+39 0433 485 411, welcome@eurotech.com, www.eurotech.com
Eurotech high-performance computing solutions help universities, research centres, companies and governments to excel in their field.

Osthus GmbH
+49 241 943 140, office@osthus.de, www.osthus.de
Our experts in life science and industrial R&D help you to transform information into knowledge by custom data integration projects for LIMS/ELN/Bio&Chem Informatics.

OCF
+44 (0) 114 257 2200, scw@ocf.co.uk, www.ocf.co.uk
At OCF we work with you to help you meet your high performance data processing, data management and data storage needs.


PARTNERSHIP FOR ADVANCED COMPUTING IN EUROPE

12–15 November 2018


Visit PRACE at booth 2033

PRACE HOSTING MEMBERS

JOLIOT-CURIE @ GENCI @ CEA, France
HAZEL HEN @ GCS @ HLRS, Germany
JUWELS @ GCS @ FZJ, Germany
SUPERMUC @ GCS @ LRZ, Germany
MARCONI @ CINECA, Italy
MARENOSTRUM 4 @ BSC, Spain
PIZ DAINT @ CSCS, Switzerland

Drop by the PRACE booth 2033 and speak to our experts about:
• PRACE Project and Preparatory Access
• PRACE Industry Access and SHAPE (SME HPC Adoption Programme in Europe)
• PRACE Training & Education
• PRACE White Papers & Best Practice Guides
• PRACE MOOCs, CodeVault
The EXDCI-2 project is presented too!

Call 18: 4 September – 30 October 2018


The PRACE Research Infrastructure provides a persistent world-class high performance computing service for
scientists and researchers from academia and industry in Europe. The computer systems and their operations
accessible through PRACE are provided by 5 PRACE members (BSC representing Spain, CINECA representing
Italy, ETH Zurich/CSCS representing Switzerland, GCS representing Germany and GENCI representing France).
The Implementation Phase of PRACE receives funding from the EU’s Horizon 2020 Research and Innovation
Programme (2014-2020) under grant agreement 730913. For more information, see www.prace-ri.eu
LABWARE 7
LIMS and ELN together in a single
integrated software platform.
A laboratory automation solution
for the entire enterprise.

Offices worldwide supporting customers


in more than 100 countries

www.labware.com
