Journal and Conference Contributions


Junger, Landmann, Speck, Heist, Notni: Multimodal 3D measurement setup for generating multimodal real-world data sets for AI-based transparent object recognition, SPIE Conference "Dimensional Optical Metrology and Inspection for Practical Applications XIII", 2024
Bedini, Räth, Maschotta, Sattler, Zimmermann: Automated Transformation of a Domain-Specific-Language for System Modeling to Stochastic Colored Petri Nets, 18th IEEE International Systems Conference (SysCon), 2024
Bedini, Jungebloud, Maschotta, Zimmermann: An Analysis and Simulation Framework for Systems with Classification Components, 12th International Conference on Model-Based Software and Systems Engineering (Modelsward), 2024


Junger, Speck, Landmann, Srokos, Notni: TranSpec3D: A Novel Measurement Principle to Generate A Non-Synthetic Data Set of Transparent and Specular Surfaces without Object Preparation, Sensors, 2023
Aganian, Köhler, Baake, Eisenbach, Gross: How Object Information Improves Skeleton-Based Human Action Recognition in Assembly Tasks, IEEE Int. Joint Conf. on Neural Networks (IJCNN), 2023
Aganian, Köhler, Stephan, Eisenbach, Gross: Fusing Hand and Body Skeletons for Human Action Recognition in Assembly, ENNS International Conference on Artificial Neural Networks (ICANN), 2023
Aganian, Stephan, Eisenbach, Stretz, Gross: ATTACH Dataset: Annotated Two-Handed Assembly Actions for Human Action Understanding, International Conference on Robotics and Automation (ICRA), 2023
Eisenbach, Lübberstedt, Aganian, Gross: A Little Bit Attention Is All You Need for Person Re-Identification, International Conference on Robotics and Automation (ICRA), 2023
Jungebloud, Nguyen, Kim, Zimmermann: Hierarchical Model-Based Cybersecurity Risk Assessment During System Design, Int. Conf. on ICT Systems Security and Privacy Protection (IFIP SEC), 2023
Junger, Buch, Notni: Triangle-Mesh-Rasterization-Projection (TMRP): An Algorithm to Project a Point Cloud onto a Consistent, Dense and Accurate 2D Raster Image, Sensors, 2023
Junger, Notni: Investigations of closed source registration methods of depth technologies for human-robot collaboration, 60th IWK - Ilmenau Scientific Colloquium, 2023
Köhler, Eisenbach, Gross: Few-Shot Object Detection: A Comprehensive Survey, IEEE Transactions on Neural Networks and Learning Systems, 2023
Räth, Bedini, Sattler, Zimmermann: Interactive Performance Exploration of Stream Processing Applications Using Colored Petri Nets, ACM International Conference on Distributed and Event-Based Systems (DEBS), 2023
Räth, Onah, Sattler: Interactive Data Cleaning for Real-Time Streaming Applications, Workshop on Human-In-the-Loop Data Analytics (HILDA), 2023
Räth, Sattler: Traveling Back in Time: A Visual Debugger for Stream Processing Applications, IEEE International Conference on Data Engineering (ICDE), 2023
Schricker, Schmidt, Friedmann, Bergmann: Gap and Force Adjustment during Laser Beam Welding by Means of a Closed-Loop Control Utilizing Fixture-Integrated Sensors and Actuators, Applied Sciences, 13, 2744, 2023


Junger, Notni: Optimisation of a stereo image analysis by densify the disparity map based on a deep learning stereo matching framework, SPIE Conference "Dimensional Optical Metrology and Inspection for Practical Applications XI", 2022
Orbbec Best Student Paper Award
Räth, Sattler: Interactive and Explorative Stream Processing, ACM International Conference on Distributed and Event-Based Systems (DEBS), 2022
Räth, Sattler: StreamVizzard – An Interactive and Explorative Stream Processing Editor, ACM International Conference on Distributed and Event-Based Systems (DEBS), 2022
Schmidt, Friedmann, Seibold, Schricker, Bergmann: Digitalisierung in der Lasertechnik, LEF Bricks 2022, Brick 3, 2022
Schmidt, Schricker, Seibold, Friedmann, Bergmann: Vorrichtungsfreies Laserstrahlschweißen – Einsatz von Digitalisierung und KI in der Fertigung, Thüringer KI-Frühling, 2022
Schricker, Schmidt, Friedmann, Römer, Böttger, Bergmann: Ereignisorientierte Prozessüberwachung und -regelung in der Lasermaterialbearbeitung, Jenaer Lasertagung, 2022
Seichter, Fischedick, Köhler, Gross: Efficient Multi-Task RGB-D Scene Analysis for Indoor Environments, International Joint Conference on Neural Networks (IJCNN), 2022
Stephan, Aganian, Hinneburg, Eisenbach, Müller, Gross: On the Importance of Label Encoding and Uncertainty Estimation for Robotic Grasp Detection, IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2022
Walther, Schmidt, Schricker, Junger, Bergmann, Notni, Mäder: Automatic Detection and Prediction of Discontinuities in Laser Beam Butt Welding Utilizing Deep Learning, Journal of Advanced Joining Processes, 2022


Aganian, Eisenbach, Wagner, Seichter, Gross: Revisiting Loss Functions for Person Re-Identification, ENNS International Conference on Artificial Neural Networks (ICANN), 2021
Bedini, Maschotta, Zimmermann: A generative Approach for creating Eclipse Sirius Editors for generic Systems, IEEE SysCon, 2021
Eisenbach, Aganian, Köhler, Stephan, Schröter, Gross: Visual Scene Understanding for Enabling Situation-Aware, IEEE International Conference on Automation Science and Engineering (CASE), 2021
Müller, Stephan, Gross: MDP-based Motion Planning for Grasping in Dynamic Scenarios, European Conference on Mobile Robots (ECMR), 2021
Schmidt, Junger, Schricker, Bergmann, Notni: Echtzeitfähige Ansätze zum Monitoring der dehnungsfeldbasierten Spaltentstehung und resultierender Nahtqualität beim Laserstrahlschweißen, Jenaer Lasertagung, 2021
Seichter, Köhler, Lewandowski, Wengefeld, Gross: Efficient RGB-D Semantic Segmentation for Indoor Scene Analysis, IEEE International Conference on Robotics and Automation (ICRA), 2021

Christina Junger, Martin Landmann, Henri Speck, Stefan Heist, Gunther Notni
SPIE Conference "Dimensional Optical Metrology and Inspection for Practical Applications XIII", 2024

Abstract: Multimodal 3D imaging is a key technology in various areas, such as medical technology, trust-based human-robot collaboration and material recognition for recycling. This technology offers new possibilities, particularly for the 3D perception of optically uncooperative surfaces in the VIS and NIR spectral range, e.g., transparent or specular materials. For this purpose, a thermal 3D sensor developed by Landmann et al. allows the 3D detection of transparent and reflective surfaces without object preparation, which can be used to generate real multimodal 3D data sets for AI-based methods. The 3D perception of optically uncooperative surfaces in VIS or NIR remains an open challenge (cf. Jiang et al.). However, to overcome this challenge with AI-based networks for segmentation, pose estimation or 3D reconstruction (monocular/stereo), data sets with optically uncooperative objects are mandatory. Currently, only a few real-world data sets are available. This is due to the high effort and time-consuming process of generating these data sets with ground truth. Currently, transparent objects must be prepared, e.g., painted or powdered, or an identical opaque twin of the uncooperative object is needed (for 3D reconstruction). For segmentation, transparent objects must currently be labeled manually. This makes data acquisition very time-consuming and laborious. We present our multimodal 3D measurement system as well as our new measurement principle, with which we can generate real multimodal 3D data sets with annotation without object preparation techniques.
This system significantly reduces the effort required for data acquisition. We also show the advantages and disadvantages of our measurement principle and data set compared to other data sets (generated with object preparation), as well as the current limitations of our novel method. In addition, we discuss the key role of data sets in AI-based methods.

Co-funded by the Journal of Imaging 2024 Travel Award (prize money)


Francesco Bedini, Timo Räth, Ralph Maschotta, Kai-Uwe Sattler, Armin Zimmermann
IEEE Systems Conference (SysCon), Montréal, Canada, 2024

Abstract: Petri Net models are widely recognized for their ability to analyze concurrent, stochastic processes based on a solid mathematical foundation. However, one drawback of Petri Nets is their low-level abstraction: they offer only a few basic elements like places and transitions to represent all system components.
While this limitation may not be an issue when working with small models, it becomes challenging when attempting to model larger processes or systems. As the complexity increases, the number of elements in the Petri Net also grows, making it difficult to distinguish and maintain them effectively.
Furthermore, Petri Nets require verification to ensure that they accurately represent the behavior of the system they are intended to model. This verification process must be repeated whenever a model is created or modified.
To address these challenges, this paper describes a Stochastic colored Petri Net semantics of a domain-specific language that allows modeling time-based hardware and software systems. We have developed a custom Eclipse-based framework that allows for both graphical and textual modeling, providing editors with useful features such as real-time validation of model constraints, which is not feasible at the low-level Petri Net abstraction due to the lack of contextual information.
The DSL also offers the advantage of easy conversion from other modeling languages thanks to an intermediate language. From the model, valid Stochastic Colored Petri Nets (SCPNs) can be generated, which can automatically simulate certain system properties consistently. This approach aims to enhance modeling capabilities and alleviate some of the limitations associated with traditional Petri Nets.
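
To make the "places and transitions" abstraction mentioned in this abstract concrete, the following is a minimal sketch of a plain place/transition net in Python. It is untimed and uncolored, so it does not reproduce the paper's SCPN semantics or its Eclipse-based framework. The class name `PetriNet` and the example places (`idle`, `queue`, `busy`, `done`) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class PetriNet:
    """Toy place/transition net (illustrative only, not the paper's SCPN)."""
    # Marking: place name -> number of tokens currently in that place.
    marking: dict
    # Transition name -> (input arc weights, output arc weights).
    transitions: dict = field(default_factory=dict)

    def enabled(self, name):
        """A transition is enabled if every input place holds enough tokens."""
        inputs, _ = self.transitions[name]
        return all(self.marking.get(p, 0) >= w for p, w in inputs.items())

    def fire(self, name):
        """Consume tokens from input places and produce tokens in output places."""
        if not self.enabled(name):
            raise ValueError(f"transition {name!r} is not enabled")
        inputs, outputs = self.transitions[name]
        for place, weight in inputs.items():
            self.marking[place] -= weight
        for place, weight in outputs.items():
            self.marking[place] = self.marking.get(place, 0) + weight


# Hypothetical example: one server processing jobs from a queue.
net = PetriNet(
    marking={"idle": 1, "queue": 2, "busy": 0, "done": 0},
    transitions={
        "start": ({"idle": 1, "queue": 1}, {"busy": 1}),
        "finish": ({"busy": 1}, {"idle": 1, "done": 1}),
    },
)
net.fire("start")
net.fire("finish")
print(net.marking)
```

Even this small example needs four places and two transitions for a single server, which illustrates why larger systems become hard to maintain at this level of abstraction.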

Francesco Bedini, Tino Jungebloud, Ralph Maschotta, Armin Zimmermann
12th International Conference on Model-Based Software and Systems Engineering (Modelsward), Rome, Italy, 2024

Abstract: Machine learning solutions are becoming more widespread as they can solve some classes of problems better than traditional software. Hence, industries look forward to integrating this new technology into their products and workflows. However, this calls for new models and analysis concepts in systems design that can incorporate the properties and effects of machine learning components.
In this paper, we propose a framework that allows designing, analyzing, and simulating hardware-software systems that contain deep learning classification components.
We focus on modeling and predicting the uncertainty aspects that are typical for machine-learning applications.
These may lead to incorrect results that negatively affect the entire system’s dependability, reliability, and even safety.
This issue is receiving increasing attention as "explainable" or "certifiable" AI.
We propose a Domain-Specific Language with a precise stochastic colored Petri net semantics to model such systems, which then can be simulated and analyzed to compute performance and reliability measures.
The language is extensible and allows adding parameters to any of its elements, supporting the definition of additional analysis methods for future modular extensions.

Christina Junger, Henri Speck, Martin Landmann, Kevin Srokos, Gunther Notni
Sensors, 2023

Abstract: Estimating depth from images is a common technique in 3D perception. However, dealing with non-Lambertian materials, e.g., transparent or specular, remains an open challenge. To overcome this challenge with deep stereo matching networks or monocular depth estimation, data sets with non-Lambertian objects are mandatory. Currently, only a few real-world data sets are available. This is due to the high effort and time-consuming process of generating these data sets with ground truth. Currently, transparent objects must be prepared, e.g., painted or powdered, or an opaque twin of the non-Lambertian object is needed. This makes data acquisition very time-consuming and laborious. We present a new measurement principle for how to generate a real data set of transparent and specular surfaces without object preparation techniques, which greatly reduces the effort and time required for data collection. For this purpose, we use a thermal 3D sensor as a reference system, which allows the 3D detection of transparent and reflective surfaces without object preparation. In addition, we publish the first-ever real stereo data set, called TranSpec3D, where ground truth disparities without object preparation were generated using this measurement principle. The data set contains 110 objects and consists of 148 scenes, each taken in different lighting environments, which increases the size of the data set and creates different reflections on the surface. We also show the advantages and disadvantages of our measurement principle and data set compared to the Booster data set (generated with object preparation), as well as the current limitations of our novel method.

Dustin Aganian, Mona Köhler, Sebastian Baake, Markus Eisenbach, Horst-Michael Gross
International Joint Conference on Neural Networks (IJCNN), 2023

Abstract: As the use of collaborative robots (cobots) in industrial manufacturing continues to grow, human action recognition for effective human-robot collaboration becomes increasingly important. This ability is crucial for cobots to act autonomously and assist in assembly tasks. Recently, skeleton-based approaches are often used as they tend to generalize better to different people and environments. However, when processing skeletons alone, information about the objects a human interacts with is lost. Therefore, we present a novel approach of integrating object information into skeleton-based action recognition. We enhance two state-of-the-art methods by treating object centers as further skeleton joints. Our experiments on the assembly dataset IKEA ASM show that our approach improves the performance of these state-of-the-art methods to a large extent when combining skeleton joints with objects predicted by a state-of-the-art instance segmentation model. Our research sheds light on the benefits of combining skeleton joints with object information for human action recognition in assembly tasks. We analyze the effect of the object detector on the combination for action classification and discuss the important factors that must be taken into account.
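
The idea of "treating object centers as further skeleton joints" can be sketched as a simple array concatenation. This is an illustration under assumptions, not the authors' implementation: the `(frames, joints, coordinates)` array layout, the function name `append_object_joints`, and the fixed number of object slots per frame are all hypothetical.

```python
import numpy as np


def append_object_joints(skeleton, object_centers):
    """Append object centers to a skeleton sequence along the joint axis.

    skeleton:        array of shape (T, J, C): T frames, J joints, C coordinates
    object_centers:  array of shape (T, K, C): K object slots per frame
    returns:         array of shape (T, J + K, C)
    """
    same_frames = skeleton.shape[0] == object_centers.shape[0]
    same_coords = skeleton.shape[2] == object_centers.shape[2]
    if not (same_frames and same_coords):
        raise ValueError("frame count and coordinate dimension must match")
    return np.concatenate([skeleton, object_centers], axis=1)


# Hypothetical sizes: 4 frames, 17 body joints, 3 detected objects, 2D coordinates.
skeleton = np.zeros((4, 17, 2))
centers = np.ones((4, 3, 2))
extended = append_object_joints(skeleton, centers)
print(extended.shape)  # (4, 20, 2)
```

A downstream skeleton-based model would then see 20 "joints" per frame instead of 17. How the object joints are connected to the body joints is a design decision the abstract does not describe.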

Dustin Aganian, Mona Köhler, Benedict Stephan, Markus Eisenbach, Horst-Michael Gross
International Conference on Artificial Neural Networks (ICANN), 2023

Abstract: As collaborative robots (cobots) continue to gain popularity in industrial manufacturing, effective human-robot collaboration becomes crucial. Cobots should be able to recognize human actions to assist with assembly tasks and act autonomously. To achieve this, skeleton-based approaches are often used due to their ability to generalize across various people and environments. Although body skeleton approaches are widely used for action recognition, they may not be accurate enough for assembly actions where the worker’s fingers and hands play a significant role. To address this limitation, we propose a method in which less detailed body skeletons are combined with highly detailed hand skeletons. We investigate CNNs and transformers, the latter of which are particularly adept at extracting and combining important information from both skeleton types using attention. This paper demonstrates the effectiveness of our proposed approach in enhancing action recognition in assembly scenarios.

Dustin Aganian, Benedict Stephan, Markus Eisenbach, Corinna Stretz, Horst-Michael Gross
International Conference on Robotics and Automation (ICRA), 2023

Abstract: With the emergence of collaborative robots (cobots), human-robot collaboration in industrial manufacturing is coming into focus. For a cobot to act autonomously and as an assistant, it must understand human actions during assembly. To effectively train models for this task, a dataset containing suitable assembly actions in a realistic setting is crucial. For this purpose, we present the ATTACH dataset, which contains 51.6 hours of assembly with 95.2k annotated fine-grained actions monitored by three cameras, which represent potential viewpoints of a cobot. Since in an assembly context workers tend to perform different actions simultaneously with their two hands, we annotated the performed actions for each hand separately. Therefore, in the ATTACH dataset, more than 68% of annotations overlap with other annotations, which is many times more than in related datasets, typically featuring more simplistic assembly tasks. For better generalization with respect to the background of the working area, we did not only record color and depth images, but also used the Azure Kinect body tracking SDK for estimating 3D skeletons of the worker. To create a first baseline, we report the performance of state-of-the-art methods for action recognition as well as action detection on video and skeleton-sequence inputs. The dataset is available at

Markus Eisenbach, Jannik Lübberstedt, Dustin Aganian, Horst-Michael Gross
International Conference on Robotics and Automation (ICRA), 2023

Abstract: Person re-identification plays a key role in applications where a mobile robot needs to track its users over a long period of time, even if they are partially unobserved for some time, in order to follow them or be available on demand. In this context, deep-learning-based real-time feature extraction on a mobile robot is often performed on special-purpose devices whose computational resources are shared for multiple tasks. Therefore, the inference speed has to be taken into account. In contrast, person re-identification is often improved by architectural changes that come at the cost of significantly slowing down inference. Attention blocks are one such example. We will show that some well-performing attention blocks used in the state of the art are subject to inference costs that are far too high to justify their use for mobile robotic applications. As a consequence, we propose an attention block that only slightly affects the inference speed while keeping up with much deeper networks or more complex attention blocks in terms of re-identification accuracy. We perform extensive neural architecture search to derive rules at which locations this attention block should be integrated into the architecture in order to achieve the best trade-off between speed and accuracy. Finally, we confirm that the best performing configuration on a re-identification benchmark also performs well on an indoor robotic dataset.

Tino Jungebloud, N. Nguyen, Dong Seong Kim, Armin Zimmermann
International Conference on ICT Systems Security and Privacy Protection (IFIP SEC), 2023


Christina Junger, Benjamin Buch, Gunther Notni
Sensors, 2023

Abstract: The projection of a point cloud onto a 2D camera image is relevant in the case of various image analysis and enhancement tasks, e.g., (i) in multimodal image processing for data fusion, (ii) in robotic applications and in scene analysis, and (iii) for deep neural networks to generate real datasets with ground truth. The challenges of the current single-shot projection methods, such as simple state-of-the-art projection, conventional, polygon, and deep learning-based upsampling methods or closed source SDK functions of low-cost depth cameras, have been identified. We developed a new way to project point clouds onto a dense, accurate 2D raster image, called Triangle-Mesh-Rasterization-Projection (TMRP). The only gaps that the 2D image still contains with our method are valid gaps that result from the physical limits of the capturing cameras. Dense accuracy is achieved by simultaneously using the 2D neighborhood information (𝑟𝑥,𝑟𝑦) of the 3D coordinates in addition to the points 𝑃(𝑋,𝑌,𝑉). In this way, a fast triangulation interpolation can be performed. The interpolation weights are determined using sub-triangles. Compared to single-shot methods, our algorithm is able to solve the following challenges. This means that: (1) no false gaps or false neighborhoods are generated, (2) the density is XYZ independent, and (3) ambiguities are eliminated. Our TMRP method is also open source, freely available on GitHub, and can be applied to almost any sensor or modality. We also demonstrate the usefulness of our method with four use cases by using the KITTI-2012 dataset or sensors with different modalities. Our goal is to improve recognition tasks and processing optimization in the perception of transparent objects for robotic manufacturing processes.
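
The abstract states that interpolation weights are determined using sub-triangles. One standard way to do this is barycentric interpolation, where the weight of each vertex is the area of the sub-triangle opposite to it divided by the area of the whole triangle. The sketch below shows that general technique only. It is not taken from the TMRP source code, and the function names and example values are hypothetical.

```python
def signed_area(a, b, c):
    """Signed area of the 2D triangle (a, b, c); positive if counter-clockwise."""
    return 0.5 * ((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1]))


def interpolate_in_triangle(p, triangle, values):
    """Interpolate a value at pixel p from the three triangle vertices.

    Each weight is the area of the sub-triangle formed by p and the two
    other vertices, divided by the full triangle area (barycentric weights).
    Returns None if p lies outside the triangle.
    """
    a, b, c = triangle
    total = signed_area(a, b, c)
    if total == 0:
        raise ValueError("degenerate triangle")
    w_a = signed_area(p, b, c) / total
    w_b = signed_area(a, p, c) / total
    w_c = signed_area(a, b, p) / total
    if min(w_a, w_b, w_c) < 0:
        return None  # outside the triangle: leave the pixel as a gap
    return w_a * values[0] + w_b * values[1] + w_c * values[2]


# Hypothetical triangle in raster coordinates with one value per vertex.
triangle = ((0.0, 0.0), (4.0, 0.0), (0.0, 4.0))
values = (10.0, 20.0, 30.0)
inside = interpolate_in_triangle((1.0, 1.0), triangle, values)
outside = interpolate_in_triangle((5.0, 5.0), triangle, values)
print(inside, outside)  # 17.5 None
```

Returning `None` outside the triangle mirrors the idea that only pixels covered by a valid mesh triangle receive a value, so no false neighborhoods are filled in.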

Christina Junger, Gunther Notni
60th IWK—Ilmenau Scientific Colloquium, Ilmenau, 2023

Abstract: Productive teaming is the new form of human-robot interaction. Multimodal 3D imaging plays a key role here, both in gaining a more comprehensive understanding of the production system and in enabling trustful collaboration within the teams. For a complete scene capture, the registration of the image modalities is required. Currently, low-cost RGB-D sensors are often used. These come with a closed-source registration function. In order to have an efficient and freely available method for any sensor, we have developed a new method, called Triangle-Mesh-Rasterization-Projection (TMRP). To verify the performance of our method, we compare it with the closed-source projection function of the Azure Kinect sensor (Microsoft). The qualitative comparison showed that both methods produce almost identical results. Minimal differences at the edges indicate that our TMRP interpolation is more accurate. Our method thus provides a freely available open-source registration method that can be applied to almost any multimodal 3D/2D image dataset and, unlike the Microsoft SDK, is not optimized for Microsoft products.

Mona Köhler, Markus Eisenbach, Horst-Michael Gross
IEEE Transactions on Neural Networks and Learning Systems, 2023

Abstract: Humans are able to learn to recognize new objects even from a few examples. In contrast, training deep-learning-based object detectors requires huge amounts of annotated data. To avoid the need to acquire and annotate these huge amounts of data, few-shot object detection aims to learn from few object instances of new categories in the target domain. In this survey, we provide an overview of the state of the art in few-shot object detection. We categorize approaches according to their training scheme and architectural layout. For each type of approaches, we describe the general realization as well as concepts to improve the performance on novel categories. Whenever appropriate, we give short takeaways regarding these concepts in order to highlight the best ideas. Eventually, we introduce commonly used datasets and their evaluation protocols and analyze reported benchmark results. As a result, we emphasize common challenges in evaluation and identify the most promising current trends in this emerging field of few-shot object detection.

Timo Räth, Francesco Bedini, Kai-Uwe Sattler, Armin Zimmermann
ACM International Conference on Distributed and Event-Based Systems (DEBS), 2023

Abstract: Stream processing is becoming increasingly important as the amount of data being produced, transmitted, processed, and stored continues to grow. One of the greatest difficulties in designing stream processing applications is estimating the final runtime performance in production. This is often complicated by differences between the development and final execution environments, unexpected outliers in the incoming data, or subtle long-term problems such as congestion due to bottlenecks in the pipeline. In this demonstration, we present an automated tool workflow for interactively simulating and exploring the performance characteristics of a stream processing pipeline in real time. Changes to input data, pipeline structure, or operator configurations during the simulation are immediately reflected in the simulation results, allowing users to interactively explore the robustness of the pipeline to outliers, changes in input data, or long-term effects.

Timo Räth, Ngozichukwuka Onah, Kai-Uwe Sattler
Workshop on Human-In-the-Loop Data Analytics (HILDA), 2023

Abstract: The importance of data cleaning systems has continuously grown in recent years. Especially for real-time streaming applications, it is crucial to identify and possibly remove anomalies in the data on the fly before further processing. The main challenge, however, lies in the construction of an appropriate data cleaning pipeline, which is complicated by the dynamic nature of streaming applications. To simplify this process and help data scientists explore and understand the incoming data, we propose an interactive data cleaning system for streaming applications. In this paper, we list requirements for such a system and present our implementation to overcome the stated issues. Our demonstration shows how a data cleaning pipeline can be interactively created, executed, and monitored at runtime. We also present several different tools, such as the automated advisor and the adaptive visualizer, that engage the user in the data cleaning process and help them understand the behavior of the pipeline.

Timo Räth, Kai-Uwe Sattler
IEEE International Conference on Data Engineering (ICDE), Anaheim, 2023

Abstract: Stream processing is one of the hot topics of our time. More and more applications generate large amounts of heterogeneous data that need to be processed in real time. However, the dynamic, high-frequency nature of stream processing applications complicates the debugging process, since the constant flow of data cannot be slowed down, paused, or reverted to previous states to analyze the execution step by step. In this demonstration, we present StreamVizzard’s visual and interactive pipeline debugger, which allows reverting the pipeline state to any arbitrary point in the past to review or repeat critical parts of the pipeline step by step. During this process, our extensive visualizer allows users to explore the processed data and statistics of each operator to retrace and understand the data flow and behavior of the pipeline.

Klaus Schricker, Leander Schmidt, Hannes Friedmann, Jean Pierre Bergmann
Applied Sciences, 13(4), 2744, 2023

Abstract: The development of adaptive and intelligent clamping devices allows for the reduction of rejects and defects based on weld discontinuities in laser-beam welding. The utilization of fixture-integrated sensors and actuators is a new approach, realizing adaptive clamping devices that enable in-process data acquisition and a time-dependent adjustment of process conditions and workpiece position by means of a closed-loop control. The present work focused on sensor and actuator integration for an adaptive clamping device utilized for laser-beam welding in a butt-joint configuration, in which the position and acting forces of the sheets to be welded can be adjusted during the process (studied welding speeds: 1 m/min, 5 m/min). Therefore, a novel clamping system was designed allowing for the integration of inductive probes and force cells for obtaining time-dependent data of the joint gap and resulting forces during welding due to the displacement of the sheets. A novel automation engineering concept allowed the communication between different sensors, actuators and the laser-beam welding setup based on an EtherCAT bus. The subsequent development of a position control and a force control and their combination was operated with a real time PC as master in the bus system and proved the feasibility of the approach based on proportional controllers. Finally, the scalability regarding higher welding speeds was demonstrated.
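
The closed-loop idea based on proportional controllers can be illustrated with a very small simulation. This is a toy sketch under strong assumptions: the plant is modeled as an ideal integrator (the actuator velocity directly changes the gap), and the gain, time step, and gap values are hypothetical and not taken from the paper or its EtherCAT setup.

```python
def p_controller(setpoint, measurement, gain):
    """Proportional control law: output is the gain times the current error."""
    return gain * (setpoint - measurement)


def simulate_gap(initial_gap_mm, target_gap_mm, gain, dt_s, steps):
    """Simulate a joint gap driven toward a target by a P controller.

    Assumed toy plant: the controller output is an actuator velocity in mm/s
    that changes the gap directly (pure integrator, no delay, no disturbance).
    """
    gap = initial_gap_mm
    history = [gap]
    for _ in range(steps):
        velocity = p_controller(target_gap_mm, gap, gain)
        gap += velocity * dt_s
        history.append(gap)
    return history


# Hypothetical values: close a 0.5 mm gap to 0.1 mm, 10 Hz control cycle.
trace = simulate_gap(initial_gap_mm=0.5, target_gap_mm=0.1, gain=2.0, dt_s=0.1, steps=50)
print(f"final gap: {trace[-1]:.4f} mm")
```

With these values the error shrinks by a factor of `1 - gain * dt_s = 0.8` per cycle. A real clamping system would additionally have to handle sensor noise, actuator limits, and the coupling between position and force control.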

Christina Junger, Gunther Notni
SPIE Conference "Dimensional Optical Metrology and Inspection for Practical Applications XI" in Orlando, USA, 2022

Orbbec Best Student Paper Award

Abstract: Stereo vision is used in many application areas, such as robot-assisted manufacturing processes. Recently, many different efficient stereo matching algorithms based on deep learning have been developed to overcome, among other things, the limitations of traditional correspondence point analysis. The challenges include texture-poor objects or non-cooperative objects. One of these end-to-end learning algorithms is the Adaptive Aggregation Network (AANet/AANet+), which is divided into five steps: feature extraction, cost volume construction, cost aggregation, disparity computation and disparity refinement. By combining different components, it is easy to create an individual stereo matching model. Our goal is to develop efficient learning methods for robot-assisted manufacturing processes for cross-domain data streams. The aim is to improve recognition tasks and process optimisation. To achieve this, we have investigated the AANet+ in terms of usability and efficiency on our own test dataset with different measurement setups (passive stereo system). The inputs to the AANet+ are rectified stereo pairs of the test dataset and a pre-trained model. Instead of generating our own training dataset, we used two pre-trained models based on the KITTI-2015 and SceneFlow datasets. Our research has shown that the pre-trained model based on the SceneFlow dataset predicts disparities with better object delineation. Because the inputs are out-of-distribution, reliable disparity predictions of the AANet are only possible for test data sets with a parallel measurement setup. We compared the results with two traditional stereo matching algorithms (semi-global block matching and DAISY). Compared to the traditionally computed disparity maps, the AANet+ is able to robustly detect texture-poor objects and optically non-cooperative objects.

Timo Räth, Kai-Uwe Sattler
ACM International Conference on Distributed and Event-Based Systems (DEBS), Copenhagen, 2022


Abstract: Formulating a suitable stream processing pipeline for a particular use case is a complicated process that highly depends on the processed data and usually requires many cycles of refinement. By combining the advantages of visual data exploration with the concept of real-time modifiability of a stream processing pipeline, we want to contribute an interactive approach that simplifies and enhances the process of pipeline engineering. As a proof of concept, a prototype has been developed that delivers promising results and allows the parameters and structure of stream processing pipelines to be modified at the development stage in a matter of milliseconds. By utilizing collected data and statistics from this explorative intermediate stage, we will automatically generate optimized runtime code for a standalone execution of the constructed pipeline.

Timo Räth, Kai-Uwe Sattler
ACM International Conference on Distributed and Event-Based Systems (DEBS), Copenhagen, 2022

Abstract: Processing continuous data streams is one of the hot topics of our time. A major challenge is the formulation of a suitable and efficient stream processing pipeline. This process is complicated by long restart times after pipeline modifications and tight dependencies on the actual data to process. To approach these issues, we have developed StreamVizzard, an interactive and explorative stream processing editor, to simplify the pipeline engineering process. Our system allows users to visually configure, execute, and completely modify a pipeline during runtime without any delay. Furthermore, an adaptive visualizer automatically displays the operator’s processed data and statistics in a comprehensible way and allows the user to explore the data and support their design decisions. After the pipeline has been finalized, our system automatically optimizes the pipeline based on collected statistics and generates standalone runtime code for productive use at a targeted stream processing engine.

Leander Schmidt, Hannes Friedmann, Marc Seibold, Klaus Schricker, Jean Pierre Bergmann
LEF Bricks 2022, Brick 3, 2022

Abstract: Laser beam welding benefits significantly from event-driven process control. This innovative approach encompasses an active clamping technology, penetration depth control for hybrid joints, and acoustic process monitoring, collectively enhancing the efficiency and quality of the welding process. Active clamping technology enables precise, real-time adjustment of the workpiece’s position during welding. This adaptive feature ensures optimal alignment, reducing distortions and, subsequently, improving the overall quality of the weld. Simultaneously, it decreases the rate of rejected welds, thereby increasing efficiency. Penetration depth control is particularly critical when welding materials with varying thicknesses and properties, common in hybrid joints. This automatic regulation ensures a consistent weld seam quality across the entire workpiece. As a result, the mechanical properties of the hybrid joints meet the necessary standards, enhancing the reliability of the weld. Acoustic process monitoring involves analyzing sound waves produced during welding. Anomalies in these acoustic patterns can serve as early indicators of issues in the welding process, such as the presence of pores or cracks. Real-time monitoring empowers timely interventions, ensuring the integrity of the weld. This presentation illustrates the potential of these methods, showing weld results and correlating weld setups.

Leander Schmidt, Klaus Schricker, Marc Seibold, Hannes Friedmann, Jean Pierre Bergmann
Thüringer KI-Frühling, 2022

Abstract: The use of jigless laser beam welding offers manufacturing companies significant advantages in terms of flexibility and weld quality. However, the implementation of jigless laser beam welding requires real-time acquisition and processing of process data. This presentation will show concrete examples of how digitization and artificial intelligence can be used in manufacturing to implement jigless laser welding. This can significantly reduce the cost of massive clamping systems and make production more flexible. In addition, early intervention in the welding process can increase process stability and reduce distortion.

Klaus Schricker, Leander Schmidt, Hannes Friedmann, Florian Römer, David Böttger, Jean Pierre Bergmann
Jenaer Lasertagung, 2022

Abstract: Process monitoring and process control of laser beam welding processes are becoming increasingly important. Both require a deep understanding of the interactions between process and material technology in order to provide an event-oriented view of the processes: on the one hand for early detection of deviations and defects, and on the other hand for direct intervention in the process through control loops. This article presents these event-oriented approaches to process monitoring and control based on three application scenarios. First, inline process monitoring of gaps during laser beam butt welding using acoustic process emissions; second, inline process control based on spectral process emissions for laser-based surface processing; and third, online process control for gap compensation using adaptive clamping devices and device-integrated sensors. Based on the application scenarios, the paper shows the possibilities and requirements of different monitoring and control approaches in the field of laser material processing.

Daniel Seichter, Söhnke B. Fischedick, Mona Köhler, Horst-Michael Gross
International Joint Conference on Neural Networks (IJCNN), 2022

Abstract: Semantic scene understanding is essential for mobile agents acting in various environments. Although semantic segmentation already provides a lot of information, details about individual objects as well as the general scene are missing but required for many real-world applications. However, solving multiple tasks separately is expensive and cannot be accomplished in real time given limited computing and battery capabilities on a mobile platform. In this paper, we propose an efficient multi-task approach for RGB-D scene analysis (EMSANet) that simultaneously performs semantic and instance segmentation (panoptic segmentation), instance orientation estimation, and scene classification. We show that all tasks can be accomplished using a single neural network in real time on a mobile platform without diminishing performance – by contrast, the individual tasks are able to benefit from each other. In order to evaluate our multi-task approach, we extend the annotations of the common RGB-D indoor datasets NYUv2 and SUNRGB-D for instance segmentation and orientation estimation. To the best of our knowledge, we are the first to provide results in such a comprehensive multi-task setting for indoor scene analysis on NYUv2 and SUNRGB-D.

Benedict Stephan, Dustin Aganian, Lars Hinneburg, Markus Eisenbach, Steffen Müller, Horst-Michael Gross
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2022

Abstract: Automated grasping of arbitrary objects is an essential skill for many applications such as smart manufacturing and human-robot interaction. This makes grasp detection a vital skill for automated robotic systems. Recent work in model-free grasp detection uses point cloud data as input and typically outperforms the earlier work on RGB(D)-based methods. We show that RGB(D)-based methods are being underestimated due to suboptimal label encodings used for training. Using the evaluation pipeline of the GraspNet-1Billion dataset, we investigate different encodings and propose a novel encoding that significantly improves grasp detection on depth images. Additionally, we show shortcomings of the 2D rectangle grasps supplied by the GraspNet-1Billion dataset and propose a filtering scheme by which the ground truth labels can be improved significantly. Furthermore, we apply established methods for uncertainty estimation to our trained models, since knowing when we can trust the model’s decisions provides an advantage for real-world application. By doing so, we are the first to directly estimate uncertainties of detected grasps. We also investigate the applicability of the estimated aleatoric and epistemic uncertainties based on their theoretical properties. Additionally, we demonstrate the correlation between estimated uncertainties and grasp quality, thus improving the selection of high-quality grasp detections. With all these modifications, our approach using only depth images can compete with point-cloud-based approaches for grasp detection despite the lower degree of freedom for grasp poses in 2D image space.
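The distinction between aleatoric and epistemic uncertainty can be made concrete with a small Python sketch. The abstract does not say which estimators were used, so the sketch assumes one common entropy-based decomposition over several stochastic predictions (for example from an ensemble or repeated dropout passes). The function names and the framing of each prediction as a grasp success probability are illustrative assumptions.

```python
import math
from typing import Sequence, Tuple


def binary_entropy(p: float) -> float:
    """Entropy (in nats) of a Bernoulli distribution with success probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def decompose_uncertainty(probs: Sequence[float]) -> Tuple[float, float, float]:
    """Split predictive uncertainty for one candidate (assumed setup).

    probs: success probabilities from several stochastic model evaluations.
    Returns (total, aleatoric, epistemic), where
      total     = entropy of the averaged prediction,
      aleatoric = average entropy of the individual predictions,
      epistemic = total - aleatoric (disagreement between predictions).
    """
    if not probs:
        raise ValueError("at least one prediction is required")
    mean_p = sum(probs) / len(probs)
    total = binary_entropy(mean_p)
    aleatoric = sum(binary_entropy(p) for p in probs) / len(probs)
    return total, aleatoric, total - aleatoric


if __name__ == "__main__":
    # All predictions agree but are unsure: uncertainty is purely aleatoric.
    print(decompose_uncertainty([0.5, 0.5, 0.5]))
    # Predictions are individually certain but disagree: purely epistemic.
    print(decompose_uncertainty([0.0, 1.0]))
```

Under this decomposition, a candidate with high epistemic uncertainty is one the model has not learned well, while high aleatoric uncertainty points to ambiguity in the input itself.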

Dominik Walther, Leander Schmidt, Klaus Schricker, Christina Junger, Jean Pierre Bergmann, Gunther Notni, Patrick Mäder
Journal of Advanced Joining Processes, 2022


Abstract: Laser beam butt welding of thin sheets of high-alloy steel can be challenging due to the formation of joint gaps, which affects weld seam quality. Industrial approaches rely on massive clamping systems to limit joint gap formation. However, those systems have to be adapted to each individual component geometry, making them very cost-intensive and limiting flexibility. In contrast, jigless welding can be a highly flexible alternative to conventionally used clamping systems. Based on the collaboration of different actuators, motion systems, or robots, the approach allows almost free workpiece positioning. As a result, jigless welding makes it possible to influence the formation of the joint gap through active position control. However, active position control requires early and reliable error prediction to counteract the formation of joint gaps during laser beam welding. This paper proposes different approaches to predict the formation of joint gaps and gap-induced weld discontinuities (lack of fusion) based on optical and tactile sensor data. Our approach achieves 97.4 % accuracy for video-based weld discontinuity detection and a mean absolute error of 0.02 mm when predicting the formation of joint gaps from tactile length measurements with inductive probes.
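The two figures of merit named in the abstract, classification accuracy and mean absolute error, can be computed as in the following Python sketch. It only illustrates the metric definitions. The function names are chosen here, and the sample values in the demo are made up, not data from the paper.

```python
from typing import Sequence


def accuracy(predicted: Sequence[int], actual: Sequence[int]) -> float:
    """Fraction of predictions that match the reference labels."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def mean_absolute_error(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Average absolute deviation, in the unit of the inputs (e.g. mm)."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


if __name__ == "__main__":
    # Made-up labels: 1 = discontinuity, 0 = sound weld.
    print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))
    # Made-up joint gap widths in mm.
    print(mean_absolute_error([0.10, 0.30], [0.15, 0.20]))
```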

Dustin Aganian, Markus Eisenbach, Joachim Wagner, Daniel Seichter, Horst-Michael Gross
ENNS International Conference on Artificial Neural Networks (ICANN), 2021

Loss Functions

Abstract: Appearance-based person re-identification is very challenging, among other things due to changing illumination, image distortion, and differences in viewpoint. Therefore, it is crucial to learn an expressive feature embedding that compensates for changing environmental conditions. There are many loss functions available to achieve this goal. However, it is hard to judge which one is the best. In related work, experiments are performed on the same datasets, but the use of different setups and different training techniques compromises comparability. Therefore, we compare the most widely used and most promising loss functions under identical conditions on three different setups. We provide insights into why some of the loss functions work better than others and what additional benefits they provide. We further propose sequential training as an additional training trick that improves the performance of most loss functions. In our conclusion, we provide guidance for future usage and research regarding loss functions for appearance-based person re-identification. Source code is available.

Francesco Bedini, Ralph Maschotta, Armin Zimmermann
IEEE International Systems Conference (SysCon), 2021

Example Domain Description Model


Abstract: Model-Driven Engineering (MDE) is becoming increasingly important for modeling, analyzing, and simulating complicated systems. It can also be used for both documenting and generating source code, which is less error-prone than manually written code. For defining a model, it is common to have a graphical representation that can be edited through an editor. Creating such an editor for a given domain may be a difficult task for first-time users and a tedious, repetitive, and error-prone task for experienced ones. This paper introduces a new automated flow to ease the creation of ready-to-use Sirius editors based on a model, graphically defined by the domain experts, that describes their domain’s structure. We provide different model transformations to generate the artifacts required to obtain a fully fledged Sirius editor based on a generated domain metamodel. The generated editor can then be distributed as an Eclipse application or as a collaborative web application. Thanks to this generative approach, it is possible to reduce the cost of refactoring the domain’s model in successive iterations, as only the final models need to be updated to conform to the latest format. At the same time, the editor is generated and hence updated automatically at practically no cost.

Markus Eisenbach, Dustin Aganian, Mona Köhler, Benedict Stephan, Christof Schröter, Horst-Michael Gross
IEEE International Conference on Automation Science and Engineering (CASE), 2021

Visual Scene Analysis

Abstract: Although in the course of Industry 4.0, a high degree of automation is the objective, not every process can be fully automated – especially in versatile manufacturing. In these applications, collaborative robots (cobots) as helpers are a promising direction. We analyze the collaborative assembly scenario and conclude that visual scene understanding is a prerequisite to enable autonomous decisions by cobots. We identify the open challenges in these visual recognition tasks and propose promising new ideas on how to overcome them.

Steffen Müller, Benedict Stephan, Horst-Michael Gross
European Conference on Mobile Robots (ECMR), 2021

Motion Planning

Abstract: Path planning for robotic manipulation is a well-understood topic as long as the execution of the plan takes place in a static scene. Unfortunately, for applications involving human interaction partners, a dynamic obstacle configuration has to be considered. Furthermore, when it comes to grasping objects from a human hand, there is not a single goal position, and the optimal grasping configuration may change during the execution of the grasp movement. This makes continuous replanning in a loop necessary. Besides efficiency and security concerns, such periodic planning raises the additional requirement of consistency, which is hard to achieve with traditional sampling-based planners. We present an online-capable planner for continuous control of a robotic grasp task. The planner is additionally able to resolve multiple possible grasp poses and additional goal functions by applying an MDP-like optimization of future rewards. Furthermore, we present a heuristic for setting edges in a probabilistic roadmap graph that improves connectivity and keeps the edge count low.

Leander Schmidt, Christina Junger, Klaus Schricker, Jean Pierre Bergmann, Gunther Notni
DVS Berichte 367 (Jenaer Lasertagung), pp. 43-54, 2021

Abstract: Due to the local melting and solidification process, laser beam welding causes a joint gap driven by the resulting strain field. Industrial solutions limit this phenomenon with massive clamping systems, which are very cost-intensive and must be adapted individually whenever the component geometry changes. In contrast, jigless welding offers a way to make workpiece clamping more flexible. Initial approaches for arc-based welding demonstrate the high potential for in-process adjustment of the joint gap using collaborative robotics. Because the process requirements differ (in particular position tolerance), these approaches have not yet been implemented for laser beam welding. In particular, solutions are lacking that allow real-time monitoring of weld seam quality as a function of the joint gap and the angular distortion of the sheets. This publication therefore presents first approaches for assessing the seam quality of laser-beam-welded thin sheets made of X5CrNi18–10/1.4301 as a function of the joint gap and the angular distortion of a square butt weld. By varying the welding speed (1/5/10 m/min) and the sheet thickness (0.5/1/2 mm), different characteristics were recorded and evaluated in relation to the resulting seam quality. Based on a multimodal data analysis, possible control variables were evaluated that offer a promising basis for implementing real-time process control.

Daniel Seichter, Mona Köhler, Benjamin Lewandowski, Tim Wengefeld, Horst-Michael Gross
IEEE International Conference on Robotics and Automation (ICRA), 2021

ESANet

Abstract: Analyzing scenes thoroughly is crucial for mobile robots acting in different environments. Semantic segmentation can enhance various subsequent tasks, such as (semantically assisted) person perception, (semantic) free space detection, (semantic) mapping, and (semantic) navigation. In this paper, we propose an efficient and robust RGB-D segmentation approach that can be optimized to a high degree using NVIDIA TensorRT and, thus, is well suited as a common initial processing step in a complex system for scene analysis on mobile robots. We show that RGB-D segmentation is superior to processing RGB images solely and that it can still be performed in real time if the network architecture is carefully designed. We evaluate our proposed Efficient Scene Analysis Network (ESANet) on the common indoor datasets NYUv2 and SUNRGB-D and show that we reach state-of-the-art performance while enabling faster inference. Furthermore, our evaluation on the outdoor dataset Cityscapes shows that our approach is suitable for other areas of application as well. Finally, instead of presenting benchmark results only, we also show qualitative results in one of our indoor application scenarios.