Workshop 2012

on Supercomputing and computational solid and fluid mechanics


Autumn School

on Parallel solution of large engineering problems

Ostrava, November 19 - 23, 2012

List of abstracts

Autumn school

Johannes Kraus

Auxiliary space multigrid: Two-grid analysis and aspects of parallelization

We introduce a multigrid algorithm for elliptic boundary-value problems that is based on simultaneous two-level factorization of local (finite element stiffness) matrices associated with a sequence of coverings of the entire domain by overlapping or non-overlapping subdomains. In this method, ``coarse-grid correction'' is defined in an auxiliary space which is equipped with an energy inner product. The construction is such that not only the (globally) energy minimizing interpolation (in the auxiliary space) but also the related Galerkin coarse-grid operator have a sparse matrix representation. We present a two-grid analysis and discuss the favorable properties of this algorithm for an implementation on parallel computer architectures.

Svetozar Margenov

Robust solution methods for strongly heterogeneous problems and supercomputing applications in porous media

The first part of the talk is devoted to construction and analysis of Algebraic MultiLevel Iteration (AMLI) methods for strongly heterogeneous problems. Some advantages/disadvantages of the hierarchical bases (HB) and additive Schur complement approximation (ASCA) approaches are addressed.

Three applications related to computer simulations in porous media are discussed in the second part: a) problems of high-frequency and high contrast including strongly anisotropic channels; b) voxel analysis of bone microstructure; c) simulation of radio-frequency hepatic tumor ablation. They demonstrate both, robust discretizations and robust multilevel solution methods and algorithms. Some of the problems have coefficient jumps which are not aligned with the interfaces of the initial mesh. Complex microstructures obtained from high resolution computer tomography are used. The numerical tests include supercomputing simulations on IBM Blue Gene/P.

Oliver Rheinbach

A parallel simulation environment for nonlinear structural mechanics based on a FETI-DP solver and FEAP

In this talk our approach and experiences with a simulation environment built around a legacy implicit Finite Element code and a parallel FETI-DP solver will be presented. We will talk about software related issues as well as numerical and parallel scalability of different FETI-DP methods. We will also discuss inexact and hybrid FETI-DP methods and their advantages and disadvantages.

Erhan Turan

Large scale micro finite element analysis of 3D bone poroelasticity

Osteoporosis is one of the major health problems in the world. A good understanding of bio-mechanical response of human bone under load requires an accurate simulation of underlying physics. For that reason a solver called ParFE is developed using Trilinos that is specialized for 3D micro finite element analysis of bone elasticity. The geometry taken from medical imaging is converted to equal sized voxel that are approximated with trilinear elements which leads to a linear algebraic system that is solved using PCG using multilevel techniques.

In this talk, a discussion will be made on PorFE, a poroelastic variant of ParFE, at which the physical model is extended using Biot's Consolidation. In order to discretize the governing equations, mixed finite element formulation is used which treats interfacial velocities and elemental pressure as primitive variables, as well. Unfortunately, the resulting 3x3 block linear problem is indefinite and needs special Krylov solvers, like Minres or Gmres with suitable preconditioners. For that purpose, a block variant preconditioner is used that relies on Schur complement. The core of the lecture will be the implementation details that focuses on representation of the matrix-vector products and incorporation of the multilevel solvers within the block structures using the TRILINOS library. The course will also address programming issues related to the transient behaviour of the model.

Jakub Šístek

Parallel implementation of Adaptive-Multilevel BDDC method and applications to linear elasticity

In the first part of the talk, basic concepts of iterative substructuring will be recalled, followed by a brief historical overview of domain decomposition methods based on primal unknowns (Neumann-Neumann, BDD, BDDC). Next, basic components of the standard BDDC method by Dohrmann (2003) will be explained in detail. The original method solves large sparse systems of linear equations resulting from discretization of elliptic PDEs, although extensions to other problems have been proposed past the last decade.

In the second half of the talk, we focus on some extensions of BDDC. Namely, the algorithms of Adaptive BDDC, Multilevel BDDC and, finally, Adaptive-Multilevel BDDC will be described. The Adaptive BDDC, which aims at problems with severe numerical difficulties, such as large variations in material coefficients, generates constraints based on prior analysis of the problem and eliminates the worst modes from the space of admissible functions. The Multilevel BDDC is a recursive application of the BDDC method to successive coarse problems aiming at very large problems solved at thousands of compute cores. The Adaptive-Multilevel BDDC method combines the two approaches, hopefully leading to some synergistic effects.

We present a parallel implementation of these concepts within our open-source package BDDCML. Performance of the solver is analysed on several benchmark and engineering problems, and main issues with an efficient implementation of both Adaptive and Multilevel extensions are discussed. This is a joint work with Jan Mandel and Bedřich Sousedík.

Roman Wyrzykowski

Unleashing parallel hybrid architectures with accelerators for numerical modeling of geophysical flows

EULAG (Eulerian/semi-Lagrangian fluid solver) is an established computational model developed by the group headed by Piotr K. Smolarkiewicz for simulating thermo-fluid flows across a wide range of scales and physical scenarios. The anelastic dynamic core of the EULAG model includes MPDATA advection algorithm and elliptic solver. One of the most interesting applications of the EULAG model is numerical weather prediction (NWP). The EULAG dynamic core is now being coupled with the COSMO model - high-precision NWP system supported by MeteoSwiss. The result of coupling is capable of forecasting the weather using Alpine topography with at least 0.55 km grid spacing.

Till now 3D MPI parallelization is used for running EULAG on large systems with tens of thousands of cores or even with more than 100K cores. Such large numbers of cores are necessary to provide the required execution time. However, when parallelizing EULAG computation on supercomputers and CPU clusters with large number of cores, the efficiency is declined below 10%. One of negative consequences is large energy consumption. To reduce energy consumption by NWP, we propose to rewrite the EULAG dynamic core and replace standard HPC systems by small hybrid clusters with accelerators such as GPU and Intel MIC.

In this talk we focus on investigating tenets of optimal parallel formulation of anelastic numerical models on modern hybrid architectures. Novel multi-, many-core and GPU platforms require not only a different philosophy of memory management than traditional massively parallel super-computers, but also a comprehensive look at load balancing in the heterogeneous co-processing computing model. We expect that new strategies for memory and computing resources management will allow us to ease memory bounds, and better exploit the theoretical floating point and energy efficiency of hybrid architectures.

SPOMECH Workshop

Owe Axelsson

Efficient preconditioning techniques for phase-field models

In order to avoid interface conditions and enable the use of a fixed mesh in time-dependent problems, it is shown that a diffusive model of Cahn-Hilliard type can be used.Some numerical examples illustrate the method. A preconditioning method is used that needs no update.Efficient implementation on parallel clusters are shown.

Pavel Burda

Stabilization of the finite element method for incompressible flows

Stabilization of the finite element method for incompressible flow problems with higher Reynolds number is the main subject. The semiGLS method is recalled as a modification of the Galerkin Least Squares method. In this contribution we analyze and comment on the accuracy of the method. A posteriori error estimates for incompressible Navier-Stokes equations are used as the principal tool for error analysis and some conclusions concerning accuracy are derived. Numerical results are presented.

Zdeněk Dostál

Twenty years of research in the Department of Applied Mathematics of FEECS VSB-TU Ostrava

The talk on the occasion of the twenty years anniversary of the Department of Applied Mathematics will informally review the research and related activities of the members and students of the department from modest beginnings to this workshop.

Frédéric Feyel

Multiscale approaches and their parallelization

Multiscale approaches are for a wide range of specific materials something that cannot be ignored. Such materials exhibiting complex microstructures cannot be correctly described using classical macroscopic phenomenological equation sets: the mechanical role of the different microscopic phases must be explicitely taken into account and propagated to the macroscopic scale using some kind of homogenisation theory. Such approaches lead to very heavy numerical computations whose cost is adressed using high performance computation technics. During the talk we will present some of these approaches and show how we numerical handle them.

Jaroslav Haslinger

Shape optimization for the Stokes system with treshold slip

The talk deals with the mathematical analysis of optimal shape design problems in fluid mechanics with slip boundary conditions. We restict ourselves to the Stokes system with treshold slip (analogy of Tresca friction in solid mechanics). We prove the existence of at least one optimal shape in an appropriate class of domains. The second part of the contribution is devoted to the discretization of this problem and convergence analysis.

Jaroslav Kruis

Visco-elastic analysis of prestressed concrete bridge

Creep analysis of existing prestressed concrete bridge is presented. The numerical analysis will be compared with measured data. Therefore, 3D numerical model based on hexahedral finite elements was generated. Prestressed tendons are modelled by bar elements which are connected to the bridge with the help of hanging nodes. The creep is described by Bazant's B3 model. The first results are obtained on a mesh which can be processed on a single processor computer by a sparse direct solver but more detailed analysis will require application of a suitable domain decomposition method and parallel processing.

Radek Kučera

On the role of generalized inverses in solving two-by-two block linear systems

The goal is to analyze the role of generalized inverses in the projected Schur complement based algorithm for solving nonsymmetric two-by-two block linear systems. The outer level of the algorithm combines the Schur complement reduction with the null-space method in order to treat the singularity of the (1,1)-block. The inner level uses a projected variant of the Krylov subspace method. We prove that the inner level is invariant to the choice of the generalized inverse to (1,1)-block so that each generalized inverse is internally adapted to the More-Penrose one. The algorithm extends ideas used in the background of the FETI domain decomposition methods. Numerical experiments confirm the theoretical results.

Michal Kuráž

Domain Decomposition Method for the Nonstationary Richards’ Equation Problem

The problem of predicting fluid movement in an unsaturated/saturated zone is important in many fields, ranging from agriculture, via hydrology to technical applications of dangerous waste disposal in deep rock formations. Richards’ equation model on engineering problems typically involves solving systems of linear equations of huge dimensions, and thus multi-thread methods are often preferred in order to decrease the computational time required. In case of non-homogenous materials, if splitting the computational domain efficiently, the problem conditionality can be significantly improved, as each subdomain can contain only a certain material set within some defined parameter range. For linear problems, as e.g. heat conduction, the domain splitting in such a way can be performed very easily. The problem arises in case of the non-linear Richards’ equation, where the coefficients of a single material can vary within several orders of magnitude. If Euler method is considered for the time integration, then a robust algorithm will be obtained if the domain is split adaptively over the time integration levels. The domain decomposition technique considered here is the standard multiplicative Schwarz method with coarse level. Method of the domain decomposition adaptivity will be studied in this presentation.

Dalibor Lukáš

Parallel BEM-based methods

We consider Galerkin boundary element method (BEM) accelerated by means of hierarchical matrices (H-matrices) and adaptive cross approximation (ACA). This leads to almost linear complexity O(n log n) of a serial code, where n denotes the number of boundary nodes or elements. Once the setup of an H-matrix is done, parallel assembling is straightforward via a load-balanced distribution of admissible (far-field) and nonadmissible (near-field) parts of the matrix to N concurrent processes. This traditional approach scales the computational complexity as O((n log n) / N). However, the boundary mesh is shared by all processes. We propose a method, which leads to memory scalability O((n log n)/sqrt(N)), which is optimal due to the dense matter of BEM matrices. The method relies on our recent results in cyclic decompositions of undirected graphs. Each process is assigned to a subgraph and to related boundary submesh. The parallel scalability of the Laplace single-layer matrix is documented on a distributed memory computer up to 133 cores and three milions of boundary triangles.

At the end of the talk, we will make a note on a BEM counterpart to the primal domain decomposition and present first results for the vertex-based algorithm in 2 dimensions.

Maya G. Neytcheva

Numerical solution of constant coefficient stationary Navier-Stokes problems

These lectures aim at describing various preconditioning approaches for the two-by-two block matrices as arising in discrete models of the Navier-Stokes problem in its various forms (constant or variable coefficients, stationary or time-dependent). We will assume that the discretization is done using proper finite element discretisation - either using stable finite element pairs or it has been properly stabilized.

In the first part we consider the stationary constant coefficient case and describe shortly the most popular techniques to precondition the so-arising linearized discrete problems. The algebraic problems to be solved are indefinite linear systems with matrices of two-by-two block form. Whenever suitable, we connect the to preconditioning techniques used for general matrices, split into two-by-two block form. The presentation includes some theoretical results as well as illustrations from numerical experiments.

Numerical solution of variable-coefficient and time-dependent Navier-Stokes problems

The second lecture presents two extensions of the Navier-Stokes problem. First we consider stationary problems with constant density and variable viscosity. We show how some of the well-known preconditioners can be used in this case.

Next we consider the Navier-Stokes equations in their full complexity, including time-dependence and variable density. The original system of partial differential equations is enlarged with one more equation for the density, which varies now in time and space. We discuss some popular splitting schemes for this problem, their pros and cons. A reformulation of the problem is described, where instead of the velocity, one can use the so-called momentum.

Multiphase flow problems: coupled Navier-Stokes and Cahn-Hilliard systems

In this lecture we describe multiphase flow problems and in more details, one of the models, used to simulate such phenomena, the so-called 'diffuse interface' methods. One of the main modelling tool for these methods in the Cahn-Hilliard equation. The corresponding discrete systems to be solved are also of two-by-two block form and some special preconditioners for those are included in the presentation. In presence of convection, the Cahn-Hilliard is coupled to the variable density Navier-Stokes equation. We finalize the presentation with some details on that coupled system and show some results on simulating the so-called Rayleigh-Taylor instability.

Martin Palkovič

IT4Innovations - present and future

During the period of 2012-2015, 6 new strategic research centres are going to be built in Czech Republic. In this talk, IT4Innovations will be introduced as the only strategic research centre focusing on IT and particularly on HPC.

The expertise of the centre w.r.t. different HPC application domains will be reviewed and compared to state-of-the-art. The planned hardware will be introduced and discussed in the context of Top500 list released just one week ago. The talk will be concluded with directions and future of the IT4Innovations research centre.

Štěpán Papáček

Estimation of in-vivo biomolecule mass transport and binding rate parameters from spatio-temporal FRAP images: An inverse ill-posed problem

Since the 1980's, with first commercial confocal microscopes, Fluorescence Recovery after Photobleaching (FRAP) technique is very useful in studies of protein dynamics in live cells. FRAP is based on measurement of the change of fluorescence intensity (so-called recovery) of either fluorescently tagged or autofluorescent molecules in a region of interest (ROI) in response to an external stimulus, so-called bleach. Earlier work on FRAP methods has derived analytical results for some specific geometries of ROI and bleach spot (i.e. initial and boundary conditions), and for some additional assumptions, turning the quantification of the transport and binding rate parameters into the curve fitting problem. Naturally, the underlying process of redistribution of fluorescent molecules can be modeled and solve numerically. Afterwards, the parameter estimation turns into an optimization problem by minimizing the disparity between simulated and experimentally measured data. I will review this more general approach, i.e. the process is modeled by the reaction-diffusion PDE with the noisy initial condition and the time-varying Dirichlet or Neumann boundary condition, and will then describe the unexpected pitfalls residing mainly in the ill-posedness of the inverse problem similar to that of backward heat equation. The ongoing work is motivated by the study of photosynthetic protein mobility in thylakoid membrane of cyanobacteria and algae. The real world example will be provided. Finally, I will conclude by briefly discussing the identifiability issues and how simple method using the sensitivity matrix can shed light on the posedness analysis and parameter uncertainty assessment, i.e. on the calculation of confidence intervals of the estimated parameters.

Alexander Popp

Dual mortar approach and semi-smooth Newton methods for computational contact mechanics

In this talk, a mortar finite element approach for contact interaction in the fully nonlinear realm is presented, which draws its effectiveness from a sound mathematical foundation based on findings in the fields of domain decomposition and non-conforming discretization. An emphasis is put on the so-called dual Lagrange multiplier approach, where the discrete coupling variables are defined based on a biorthogonality relationship with the primary unknowns (i.e. displacements). As compared with standard Lagrange multiplier techniques, a localization of the interface coupling conditions is achieved, and thus the dual Lagrange multiplier approach significantly facilitates the resulting algorithms without impinging upon the optimality of mortar methods.

A fully consistent linearization of the dual mortar approach in the context of implicit time integration is presented, from which very efficient nonlinear solution methods can be derived. Moreover, the inequality nature of contact constraints is accommodated with an enhancement of so-called semi-smooth Newton methods, thus combining the search for the active contact constraints and all other sources of nonlinearities within one single nonlinear iteration scheme.

Mortar finite element methods for general interfaces in solid mechanics, fluid mechanics and coupled problems

In this talk, the applicability of mortar finite element methods for a very broad range of so-called general interfaces is explored. It is demonstrated that dual mortar methods also provide a convenient framework for many other single-field and multi-field problems of computational engineering beyond classical solid and contact mechanics. Exemplarily, mortar finite element discretization in computational fluid dynamics (using a variational multiscale formulation for incompressible flow), mortar-based interface treatment in fluid-structure interaction (using a classical ALE-based moving-grid description) and a coupled fluid-structure-contact interaction approach (using a novel fixed-grid / XFEM formulation) are outlined.

Ivana Pultarová

Fourier analysis of aggregation based multigrid for nonsymmetric problems

We study the aggregation based multigrid method for general nonsymmetric matrices. We present a spectral analysis of the error propagation operator of this method for a certain class of matrices. This approach is based on the Fourier transformation. It is shown for example, how the optimal number of smoothing steps depends on sizes of aggregation groups. Numerical examples are presented.

Talal Rahman

Crouzeix-Raviart mortar finite element and its domain decomposition preconditioners

In this talk, we will consider the Crouzeix-Raviart mortar finite element discretization of the second order elliptic problem on non-matching meshes, based on an approximate matching condition for the discrete functions using only the nodal values only on the mortar side of a subdomain interface for the calculation of the mortar projection, as opposed to using the conventional approach where nodal values in the interior of a subdomain are also required. Since the interior degrees of freedom disappear completely from the computation of the matching condition, it makes the design of any algorithm less intricate and more flexible as compared to the conventional approach. In the second part of the talk, we will present several domain decomposition preconditioners, including two level additive Schwarz and substructuring type, for the Crouzeix-Raviart mortar finite element on problems with discontinuous coefficients.

Domain decomposition methods for a discontinuous Galerkin formulation of the multiscale elliptic problem

Numerical scientists have started to depend more and more on the use of domain decomposition framework to design efficient solvers for their multiscale problems. In this talk, we will consider the multi-scale second order elliptic problem with highly varying coefficients, where we use a discontinuous Galerkin formulation for the discretization, and present several domain decomposition algorithms, including two level additive Schwarz and FETI-DP algorithms, for the numerical solution of the resullting system. For the FETI-DP, we even consider non-matching meshes. The convergence in either case can be shown to be independent of the coefficients under certain assumptions.

Oliver Rheinbach

FETI methods and incompressibility

Almost incompressible elasticity problems require a special treatment in iterative solution methods. We will present a new coarse space for FETI-DP methods for incompressible elasticity in 2D as well as 3D. This coarse space can be implemented, e.g., by projector preconditioning but not by a transformation of basis with partial finite element assembly.

François-Xavier Roux

FETI methods 1

In the first lecture, we present the now standard FETI method for the solution of large scale structural analysis problems. We focus on the main practical issues with the implementation of these methods: reliability of the detection of local rigid body modes, numerical and computational scalability. We also present some techniques for accelerating the convergence in the case of multiple right hand sides.

FETI methods 2

This second lecture is devoted to the many variants of FETI method: FETI-DP, FETI-H, FETI-2, YADP-FETI… We analyze each method, from both theoretical and implementation viewpoints, and explain for which kind of problems it is better fitted.

Zdeněk Strakoš

Challenges in coupling iterative algebraic computations with modeling and discretisation

The current state-of-the art of iterative solvers is an outcome of the tremendous algorithmic development over the last few decades and of investigations of how to solve given problems. In this contribution we focus on Krylov subspace methods and more on the dual question why things do or do not work. In particular, we will pose and discuss open questions such as what does the spectral information tell us about the behaviour of Krylov subspace methods, how important is considering of rounding errors, whether it is useful to view Krylov subspace methods as matching moment model reduction, and how the algebraic error should be measured in the context of adaptive PDE solvers.

Jan Zeman

Modeling Random Heterogeneous Materials by Wang Tilings

Microstructure generation algorithms have become an irreplaceable tool to study the overall behavior of heterogeneous media. In general, they can be classified as reconstruction and compression-based. The goal of the first class of the methods is to reproduce a microstructure corresponding to a given set of statistical descriptors. The objective of the compression algorithms is to replace the original (complex) microstructure with a simpler object, a statistically equivalent unit cell, which approximates the original media with a reasonable accuracy. Here, the terms 'statistically equivalent' refers to the fact that the error induced by the approximation is quantified by the same statistical descriptors as in the first case.

Compression algorithms are particularly useful in multi-scale homogenization schemes, as they allow capturing the most dominant microstructure features at a feasible computational cost. Their main deficiency, however, remains in the selection of suitable conditions at the cell boundary. The usually adopted assumption of periodicity is known to highly influence the overall response, particularly for the problems characterized by localized fields.

In the present contribution, we discuss as yet another approach to microstructural compression, inspired by successful applications of Wang tiles in computer graphics and game industry. Wang tiles are square cells with distinct codes at their edges, that are not allowed to rotate when tiling is performed.

In this framework, we represent the microstructure by a square tile with edges marked by particular code, and the corresponding tile set created by permutation of the codes in all possible directions. On the basis of this representation, the complete microstructure can be generated by randomly gathering compatible members (compatibility on a contiguous edge codes).

The potential of this methodology is illustrated by compression of an artificial microstructure, corresponding to an equilibrium distribution of approximately 10,000 impenetrable equi-sized discs. The parameters of the compressed representation are the number and positions of disc centers in the reference unit cell and the set tiles, respectively. These are found by minimizing the difference between the one- and the two-point probability functions. We shall demonstrate that the non-periodic set provides substantially better approximation to the original structure, and removes to a great extent artificial regularities of its periodic counterpart.