The Institute for Machine Learning and Analytics (IMLA) is proud to host the 6th URAI Symposium, which will take place on November 13th and 14th at HS Offenburg.
The theme for URAI'24 is "Applied AI in the age of Generative AI"
News
A Tutorial at URAI24
Abstract:
“Okay Google, set a timer for 10 minutes.” The ability to interact with AI assistants using natural speech has become a reality for millions of households worldwide. Yet, beyond smart speakers, spoken conversation remains rare in interactive devices, despite being one of the most natural forms of human engagement. Even in the field of Human-Robot Interaction (HRI), spoken dialogue is often overlooked, largely because it is perceived as too challenging to implement with satisfying results. Enter pre-trained Large Language Models. These models, now widely commercially available, promise seamless natural conversation and hence spark excitement that robots could soon achieve similar conversational capabilities with just a simple API call.
In this talk, I will argue for caution. While generative models seem to excel in written contexts, they struggle with the complexity and nuances of spoken interaction. More importantly, pre-trained models are designed to be passive responders, following a user's lead - an approach that falls short in real-world robotic applications where systems must pursue specific, often complex, goals. Ensuring robots adhere to these objectives is far from straightforward with current AI solutions.
Despite these challenges, the potential for natural conversations with robots is immense, spanning disciplines from industrial robotics to in-home support, and from social to task-oriented applications. The message is clear: It is time to make your robots more conversational. Generative AI provides a powerful foundation, but we are still only beginning the journey to robust, goal-oriented spoken interactions.
CV: Maike Paetzel-Prüsmann specializes in Human-AI interaction and user experience research. Her work involves not only programming robots and virtual agents, but also analyzing and shaping how they are perceived by people. With a decade of experience in both academic and industrial research labs, she contributes to the evolution of conversational AI, aiming to excite users about robots and virtual characters in their everyday life. Maike obtained her PhD from Uppsala University, Sweden, in Computer Science with a specialization in Human-Robot Interaction. During her PostDoc, she then shifted her work to dive even deeper into conversational AI, Natural Language Processing and using Large Language Models for human-robot interaction. Most recently, Maike worked as an Associate Research Scientist for Disney Research to help bring artificial characters to life in Disney's theme parks. When she is not helping robots to improve their communication skills, Maike spends her time supporting robots playing soccer as part of the RoboCup Humanoid League.
Extended Abstract in appendix...
Abstract
This paper explores the application of Artificial Intelligence (AI) to contribute to the improvement of quality assurance and troubleshooting in manufacturing. The goal is to identify and resolve quality issues effectively using AI techniques, applying Explainable AI (XAI) to ensure transparency and comprehensibility. We propose three approaches to tackle these industry challenges: semantic reasoning, Long Short-Term Memory (LSTM) networks for time series quality prediction, and a combined Machine Learning (ML) and Fault Tree Analysis (FTA) method for comprehensive fault detection and analysis.
Approaches
Figure 1 shows the X-Quality framework. Data is collected from each machine, and AI/XAI methods provide predictions and explanations for the operators. In parallel, expert knowledge is captured in an ontology (a formal model that allows reasoning). Together, these components provide integrated explanations that help the foreman identify and understand the origins of quality issues.
Time series anomaly detection is crucial for Industry 4.0, ensuring predictive maintenance and quality control by spotting rare deviations from normal behavior. This work merges advanced data mining methods with XAI techniques to identify anomalies. Utilizing the matrix profile combined with SHAP (SHapley Additive exPlanations) enhances anomaly detection and makes decision-making transparent. Our model addresses the challenge of anomaly detection, enabling early failure detection and supporting proactive maintenance. This demonstrates the method's effectiveness in improving operational efficiency and quality control.
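To make this pipeline concrete, the following sketch pairs matrix-profile-based anomaly scoring with a SHAP explanation of a simple downstream classifier, assuming the stumpy and shap packages; the synthetic signal, window length and features are illustrative and not the authors' exact configuration.

```python
# Minimal sketch: matrix-profile anomaly scoring with a SHAP-explained classifier.
# Window length, features and thresholds are illustrative assumptions.
import numpy as np
import stumpy
import shap
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
ts = np.sin(np.linspace(0, 60, 3000)) + 0.1 * rng.normal(size=3000)
ts[1500:1520] += 2.0                      # injected anomaly

m = 50                                    # sub-sequence (window) length
mp = stumpy.stump(ts, m)                  # matrix profile
scores = mp[:, 0].astype(float)           # distance to nearest neighbour per window

# Windows with an unusually high matrix-profile value are flagged as anomalous.
labels = (scores > np.quantile(scores, 0.98)).astype(int)

# Simple per-window features for a downstream, explainable classifier.
windows = np.lib.stride_tricks.sliding_window_view(ts, m)
X = np.column_stack([windows.mean(axis=1), windows.std(axis=1),
                     windows.min(axis=1), windows.max(axis=1)])

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, labels)

# SHAP values make the anomaly decision transparent for the operator.
explainer = shap.TreeExplainer(clf)
shap_values = explainer.shap_values(X[labels == 1])
print("SHAP values for flagged windows:", np.asarray(shap_values).shape)
```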
Building upon this, the second approach combines AI and FTA to further enhance the predictive and explanatory power of our system. AI is used to predict Basic Events (BEs) within Fault Trees (FTs), converting these predictions into probabilities to determine the likelihood of the Top Event (TE). This enhances transparency and understanding of system failures. In our experiment, we first classified the TE directly; we then classified the underlying BEs and used them to determine the TE. The second approach outperformed direct classification: it improves prediction accuracy, identifies root causes and provides interpretability.
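As a worked illustration of the probabilistic step, the following sketch combines predicted Basic Event probabilities into a Top Event probability via AND/OR gates under an independence assumption; the tree structure and numbers are purely illustrative.

```python
# Minimal sketch: combining predicted Basic Event (BE) probabilities into a
# Top Event (TE) probability via AND/OR gates, assuming independent BEs.
# The tree structure and probabilities below are illustrative only.

def or_gate(probs):
    """P(at least one child event occurs) for independent events."""
    p = 1.0
    for q in probs:
        p *= (1.0 - q)
    return 1.0 - p

def and_gate(probs):
    """P(all child events occur) for independent events."""
    p = 1.0
    for q in probs:
        p *= q
    return p

# Probabilities of basic events, e.g. produced by per-BE classifiers.
p_be = {"sensor_drift": 0.10, "valve_stuck": 0.05, "overheating": 0.20}

# Example fault tree: TE = (sensor_drift OR valve_stuck) AND overheating
intermediate = or_gate([p_be["sensor_drift"], p_be["valve_stuck"]])
p_te = and_gate([intermediate, p_be["overheating"]])
print(f"Estimated Top Event probability: {p_te:.4f}")
```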
The insights gained from these AI/XAI methods are then used to exploit a domain ontology built from expert knowledge. Stream Reasoning plays a key role here by enabling continuous querying of heterogeneous data streams from different sources in real time and by incorporating logical reasoning over the data. In anomaly detection, this approach makes it possible to detect quality issues from the streams and to enrich the ontology. Reasoning over the ontology then explains the origin of the detected quality issue. A first illustrative case study on quality assurance succeeded in detecting anomalies and proposing an explanation.
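The following sketch illustrates, under strong simplifying assumptions, how a detected anomaly could be asserted into a small ontology with rdflib and explained via a SPARQL query; the class and property names are invented for illustration, and a real deployment would use a dedicated stream-reasoning engine.

```python
# Minimal sketch: asserting a detected anomaly into a toy ontology and using a
# SPARQL query to surface a possible upstream cause. Class/property names are
# illustrative; a real deployment would use a stream-reasoning engine.
from rdflib import Graph, Namespace, RDF

EX = Namespace("http://example.org/quality#")
g = Graph()

# Static expert knowledge: machine M1 feeds M2, and spindle wear on M1
# is a known cause of surface defects downstream.
g.add((EX.M1, RDF.type, EX.Machine))
g.add((EX.M2, RDF.type, EX.Machine))
g.add((EX.M1, EX.feeds, EX.M2))
g.add((EX.SpindleWear, EX.occursOn, EX.M1))
g.add((EX.SpindleWear, EX.causes, EX.SurfaceDefect))

# Streamed observation: an anomaly of type SurfaceDefect detected on M2.
g.add((EX.anomaly42, RDF.type, EX.SurfaceDefect))
g.add((EX.anomaly42, EX.observedOn, EX.M2))

# Explain the anomaly: find failure modes on upstream machines that cause it.
query = """
PREFIX ex: <http://example.org/quality#>
SELECT ?failure ?machine WHERE {
    ?anomaly a ?defectType ; ex:observedOn ?affected .
    ?failure ex:causes ?defectType ; ex:occursOn ?machine .
    ?machine ex:feeds ?affected .
}
"""
for row in g.query(query):
    print(f"Possible origin: {row.failure} on upstream machine {row.machine}")
```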
Conclusion
In conclusion, the X-Quality framework combines machine data with AI/XAI methods to predict future quality issues and to explain them in terms of the possible failures producing them upstream in the production line. This approach reduces costs and improves operational efficiency and maintenance through transparent, data-driven decision-making.
A scalable and rapidly deployable fault detection framework for building heating systems is presented. Unlike existing data-intensive machine learning approaches, a SARIMAX-based concept was implemented to address the limited data availability after commissioning of a plant. The effectiveness of this framework is demonstrated on real-world data from multiple solar thermal systems, indicating potential for extensive field tests and applications to broader systems, including heat pumps and district heating.
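A minimal sketch of such a SARIMAX-based residual check with statsmodels is shown below; the file name, columns, model orders and thresholds are assumptions rather than the implemented configuration.

```python
# Minimal sketch: SARIMAX-based fault indication via prediction intervals,
# assuming an hourly collector temperature series and solar irradiance as an
# exogenous input. Orders, columns and thresholds are illustrative only.
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

df = pd.read_csv("solar_thermal.csv", parse_dates=["timestamp"], index_col="timestamp")
train, test = df.iloc[:-24], df.iloc[-24:]          # hold out the last day

model = SARIMAX(
    train["collector_temp"],
    exog=train[["irradiance"]],
    order=(1, 0, 1),
    seasonal_order=(1, 1, 1, 24),                   # daily seasonality, hourly data
)
result = model.fit(disp=False)

forecast = result.get_forecast(steps=len(test), exog=test[["irradiance"]])
ci = forecast.conf_int(alpha=0.01)                  # 99% prediction interval

# Flag time steps where the measured value leaves the prediction interval.
measured = test["collector_temp"].to_numpy()
lower, upper = ci.iloc[:, 0].to_numpy(), ci.iloc[:, 1].to_numpy()
faults = test[(measured < lower) | (measured > upper)]
print(f"{len(faults)} potentially faulty time steps detected")
```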
full abstract in the appendix
...
Predicting energy production from photovoltaics (PV) is crucial for efficient energy management. To apply different operating strategies, it is necessary to predict the expected amount of PV energy. The operating strategies are typically optimized with regard to economic or technical goals, or a combination of both. In this work, we show how PV power production can be predicted using local weather data. Our model is trained and validated on measured values from the PV system and an associated weather station. The measurement data come from the PV system of the former Campus North of Offenburg University of Applied Sciences and cover the years 2017-2021.
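A minimal sketch of this kind of weather-based PV prediction is shown below; the file name, feature columns and regressor choice are assumptions for illustration only.

```python
# Minimal sketch: predicting PV power from local weather measurements with a
# gradient-boosted regressor. File name and feature columns are assumptions.
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

df = pd.read_csv("pv_weather_2017_2021.csv", parse_dates=["timestamp"])
features = ["global_irradiance", "ambient_temp", "wind_speed", "humidity"]
X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["pv_power_kw"], test_size=0.2, shuffle=False  # keep time order
)

model = GradientBoostingRegressor(n_estimators=300, learning_rate=0.05)
model.fit(X_train, y_train)

print("MAE [kW]:", mean_absolute_error(y_test, model.predict(X_test)))
```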
...
full abstract in the appendix
This paper examines the most effective means of supporting small and medium-sized enterprises (SMEs) in integrating artificial intelligence (AI) into their business practices. In addition to the challenges and limitations described in the existing literature, the experiences of the KI-Labor Südbaden are used as a case study. Although AI has shown rapid technical progress in recent years, the adoption of AI solutions within companies, especially within German SMEs, has been noticeably slower. In this paper, we undertake a literature review and then summarize our experiences from working with local SMEs over the past two years. We have identified many challenges that delay AI adoption, including a lack of technical know-how, infrastructure, and usable data. To overcome these challenges, we build on our experiences to detail a systematic approach for AI onboarding.
Machine learning (ML) models are increasingly used for predictive tasks, yet traditional models based on expert knowledge remain prevalent. This paper examines the enhancement of an expert model for thermomechanical fatigue (TMF) life prediction of turbine components using ML. Using explainable artificial intelligence (XAI) methods such as Permutation Feature Importance (PFI) and SHAP values, we analyzed the patterns and relationships learned by the ML models. Our findings reveal that ML models can outperform expert models, but integrating domain knowledge remains crucial. The study concludes with a proposal to further refine the expert model using insights gained from the ML models, aiming for a synergistic improvement.
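A minimal sketch of how such an ML model can be inspected with PFI and SHAP is given below; the data file, feature names and model choice are assumptions, not the study's setup.

```python
# Minimal sketch: inspecting an ML life-prediction model with Permutation
# Feature Importance and SHAP. Feature names and data file are assumptions.
import pandas as pd
import shap
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

df = pd.read_csv("tmf_test_data.csv")
features = ["strain_amplitude", "max_temperature", "dwell_time", "strain_rate"]
X_train, X_test, y_train, y_test = train_test_split(df[features], df["cycles_to_failure"])

model = RandomForestRegressor(n_estimators=200).fit(X_train, y_train)

# Permutation Feature Importance: drop in score when a feature is shuffled.
pfi = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
for name, score in zip(features, pfi.importances_mean):
    print(f"{name}: {score:.3f}")

# SHAP values: per-prediction attribution of the estimated fatigue life.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)
shap.summary_plot(shap_values, X_test, feature_names=features, show=False)
```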
Abstract
Large Language Models (LLMs) offer a promising approach to improving phishing detection through advanced natural language processing. This paper evaluates the effectiveness of context-augmented open-source LLMs in identifying phishing emails. An approach was developed that combines Few-Shot Learning and Retrieval-Augmented Generation (RAG) to significantly improve the performance of LLMs in this area. The results show that the presented approach can significantly improve the detection rate even for smaller models.
Introduction
Phishing attacks are a major threat to cybersecurity, using evolving techniques to trick individuals into revealing sensitive information. With an estimated 90% of successful cyber attacks starting with phishing, robust detection mechanisms are crucial. Large Language Models (LLMs), such as OpenAI's GPT, have revolutionised NLP by using large text corpora to perform tasks beyond text generation, which makes them suitable for detecting phishing emails. This paper presents a promising approach that combines Few-Shot Learning and Retrieval-Augmented Generation (RAG) with open-source LLMs to improve the detection of phishing emails. The focus on open-source models has the advantage that the LLM can be operated in an isolated network, which can significantly increase the protection of confidential email content.
Related Work
Previous studies have investigated simple classification approaches for phishing detection using LLMs. These studies showed that pre-trained models could detect phishing attempts, but focused primarily on commercial models such as GPT. However, there have also been attempts using open-source models; for example, Koide et al. achieved an accuracy of 88.61% using the open-source LLM Llama2.
Methodology
A balanced dataset was created using legitimate emails from the CSDMC Spam Corpus and phishing emails from the Phishing Pot dataset, resulting in 5,800 emails equally divided between phishing and non-phishing. On this basis, several open-source LLMs were evaluated, including Phi-3 3.8B (Microsoft Research), OpenChat 7B, Mixtral 8x7B, Mistral 7B, Gemma 7B (Google DeepMind) and Llama3 (8B and 70B) from Meta AI. Two specific prompts were designed for the phishing detection evaluation: the first followed a persona pattern and instructed the LLM to identify phishing emails, while the second included a list of indicators suggesting phishing attempts. The proposed approach extends the context using Few-Shot Learning and RAG: relevant examples are retrieved from a knowledge base and provided to the LLM as context prior to generation, improving domain-specific performance without additional fine-tuning.
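The following sketch illustrates the general retrieval-augmented few-shot idea; the embedding model, the toy knowledge base and the query_local_llm placeholder are assumptions, not the evaluated implementation.

```python
# Minimal sketch: retrieval-augmented few-shot prompting for phishing detection.
# The embedding model, knowledge base and query_local_llm() are assumptions; in
# practice the prompt is sent to a locally hosted open-source LLM.
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")

# Labeled knowledge base of example emails (phishing and legitimate).
examples = [
    ("Your account is locked, verify at http://paypa1-login.example", "phishing"),
    ("Meeting notes from Tuesday are attached, see you Thursday.", "legitimate"),
    ("You won a prize! Send your bank details to claim it today.", "phishing"),
]
example_embeddings = encoder.encode([text for text, _ in examples], convert_to_tensor=True)

def build_prompt(email_text, k=2):
    """Retrieve the k most similar labeled examples and build a few-shot prompt."""
    query_emb = encoder.encode(email_text, convert_to_tensor=True)
    hits = util.semantic_search(query_emb, example_embeddings, top_k=k)[0]
    shots = "\n\n".join(
        f"Email: {examples[hit['corpus_id']][0]}\nLabel: {examples[hit['corpus_id']][1]}"
        for hit in hits
    )
    return (
        "You are a security analyst. Classify the email as 'phishing' or 'legitimate'.\n\n"
        f"{shots}\n\nEmail: {email_text}\nLabel:"
    )

prompt = build_prompt("Please confirm your password at http://secure-update.example")
# response = query_local_llm(prompt)   # placeholder for the locally hosted LLM call
print(prompt)
```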
Experiments and Results
The performance metrics considered were precision, recall, F1 score and accuracy, with 121,800 classifications performed across different models and settings. The results demonstrate considerable variability in performance across models, influenced by architecture and training data. Larger models, such as Llama3 70B, consistently outperformed their smaller counterparts. Prompt 2 improved detection rates for most models, particularly those close to 50% accuracy with Prompt 1.
Conclusion
This study demonstrates that LLMs can effectively distinguish between legitimate and phishing emails. The proposed approach, combining Few-Shot Learning and RAG, significantly improves detection rates, particularly for smaller models.
Abstract. The availability of generative artificial intelligence (GenAI) tools has substantially increased, resulting in numerous positive impacts on the marketing sector. However, issues related to misinformation and deepfakes, biases and fairness, privacy, and ethical concerns, among others, have been highlighted. This research aims to examine the effects of utilizing GenAI for text, image, and audio creation in Instagram marketing. Employing the Customer Experience Tracking method, the study evaluated the differences between traditionally created and AI-generated Instagram Reels. The findings indicated that AI-generated content can garner higher levels of user attention, thereby enhancing brand interest. Negative effects such as mistrust or ethical concerns associated with AI were not substantiated in this study. These results suggest that companies can enhance their social media campaigns by integrating AI tools for content creation.
Keywords: Content Creation; CXT; GenAI; Instagram Marketing
In recent years, the integration of artificial intelligence (AI) into various business processes has attracted significant attention. This paper explores the use of generative AI, specifically Large Language Models (LLMs) such as ChatGPT, to improve the Design Thinking (DT) process in business and IT consulting. The primary focus is on the technical implementation in practice and the insights gained from integrating AI-based chatbots to facilitate different phases of the design thinking process. The aim is to work out how such technology can inspire and streamline design thinking consulting while addressing potential challenges.
The robust generalization of models to rare, in-distribution (ID) samples drawn from the long tail of the training distribution and to out-of-training-distribution (OOD) samples is one of the major challenges of current deep learning methods. For image classification, this manifests in the existence of adversarial attacks, performance drops on distorted images, and a lack of generalization to concepts such as sketches. The current understanding of generalization in neural networks is very limited, but some biases that differentiate models from human vision have been identified and might be causing these limitations. Consequently, several attempts with varying success have been made to reduce these biases during training to improve generalization. We take a step back and sanity-check these attempts. Fixing the architecture to the well-established ResNet-50, we perform a large-scale study on 48 ImageNet models obtained via different training methods to understand how and if these biases, including shape bias, spectral biases, and critical bands, interact with generalization. Contrary to previous findings, our extensive results reveal that these biases are insufficient to accurately predict the generalization of a model holistically. We provide access to all checkpoints and evaluation code at https://github.com/paulgavrikov/biases_vs_generalization.
In this paper, we address the problem of segmentation of pathogens within fluorescence microscopy images. To our knowledge, the quantification of pathogens from such images is an original problem.
As a consequence, there is no available database to rely upon for supervised machine learning techniques. We provide a workaround by creating realistic images containing the desired filamentary pattern and a variable blur effect. Numerical results demonstrate the value of this data augmentation technique, especially on images that are difficult to segment.
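A minimal sketch of how such synthetic training images might be generated is shown below; the random-walk filament model, image size and blur range are illustrative assumptions.

```python
# Minimal sketch: generating a synthetic fluorescence-like image containing a
# filamentary pattern with a variable blur, as a stand-in for annotated data.
# Pattern model, image size and blur range are illustrative assumptions.
import numpy as np
from scipy.ndimage import gaussian_filter

def synthetic_filament_image(size=256, n_points=2000, sigma=1.5, noise=0.05, seed=0):
    rng = np.random.default_rng(seed)
    image = np.zeros((size, size), dtype=float)

    # Draw a smooth random-walk filament across the image.
    x, y = size / 2, size / 2
    angle = rng.uniform(0, 2 * np.pi)
    for _ in range(n_points):
        angle += rng.normal(scale=0.15)            # gentle curvature
        x = np.clip(x + np.cos(angle), 0, size - 1)
        y = np.clip(y + np.sin(angle), 0, size - 1)
        image[int(y), int(x)] = 1.0

    mask = image > 0                               # ground-truth segmentation mask
    blurred = gaussian_filter(image, sigma=sigma)  # variable optical blur
    noisy = blurred + noise * rng.normal(size=image.shape)
    return np.clip(noisy, 0, None), mask

image, mask = synthetic_filament_image(sigma=2.5)
print(image.shape, mask.sum(), "filament pixels")
```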
The integration of generative artificial intelligence (AI) in sustainable process design has gained substantial traction, with AI increasingly employed to generate innovative solution ideas. However, the efficacy of these AI-generated ideas requires rigorous evaluation to ensure their usefulness, feasibility, novelty, and sustainability. This study examines the reliability of AI evaluations by comparing them with human expert assessments. Advanced generative AI models were utilized to produce design ideas and evaluate them using AI-driven metrics aligned with human evaluation criteria. Concurrently, a panel of domain experts assessed the same ideas based on predefined criteria. The comparative analysis identifies both areas of alignment and divergence between AI and human evaluations, providing valuable insights into the strengths and limitations of AI in early-stage process design. The findings highlight AI’s potential to facilitate sustainable innovation while underscoring the necessity for thorough validation of AI-generated assessments. This research advances AI evaluation methods and provides a framework for integrating AI effectively in sustainable process design.
In this paper, we propose a machine learning based approach for identifying the inertia parameters of robotic systems. The method is evaluated in simulation and compared against classical methods. To this end, parameter identification based on numerical optimization is implemented and tested on ground-truth data. For a case study, a physical simulation of a four-degree-of-freedom robot arm is set up, formulating the problem with Newton-Euler equations in contrast to the conventional Lagrangian formulation. Additionally, a test methodology for assessing various neural network architectures is derived.
Keywords: Inertia parameters identification, robotics, numerical optimization, Newton-Euler, neural networks
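A minimal sketch of classical least-squares parameter identification in the linear regressor form tau = Y(q, qd, qdd) theta is shown below for a single rigid link; the 4-DOF case follows the same pattern with larger regressor blocks, and all numbers are illustrative.

```python
# Minimal sketch: identifying inertia parameters in the linear regressor form
# tau = Y(q, qd, qdd) @ theta via least squares, shown for a single rigid link
# (theta = [J, m*l_c]); the 4-DOF case stacks larger regressor blocks per sample.
import numpy as np

g = 9.81
rng = np.random.default_rng(0)

# "Ground truth" parameters of the simulated link (assumptions for the sketch).
J_true, ml_true = 0.12, 0.45          # joint inertia [kg m^2], mass*com distance [kg m]

# Simulated excitation trajectory and the resulting joint torques.
t = np.linspace(0, 5, 500)
q = 0.8 * np.sin(2 * np.pi * 0.5 * t)
qd = np.gradient(q, t)
qdd = np.gradient(qd, t)
tau = J_true * qdd + ml_true * g * np.cos(q) + 0.01 * rng.normal(size=t.size)

# Regressor matrix: each row is [qdd_i, g*cos(q_i)], so tau = Y @ [J, m*l_c].
Y = np.column_stack([qdd, g * np.cos(q)])
theta_hat, *_ = np.linalg.lstsq(Y, tau, rcond=None)
print("Identified [J, m*l_c]:", theta_hat)
```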
Domain-Driven Design (DDD) is a key framework for developing customer-oriented software, focusing on the precise modeling of an application's domain. Traditionally, metamodels that describe these domains are created manually by system designers, forming the basis for iterative software development. This paper explores the partial automation of metamodel generation using generative AI, particularly for producing domain-specific JSON objects. By training a model on real-world DDD project data, we demonstrate that generative AI can produce syntactically correct JSON objects based on simple prompts, offering significant potential for streamlining the design process. To address resource constraints, the AI model was fine-tuned on a consumer-grade GPU using a 4-bit quantized version of Code Llama and Low-Rank Adaptation (LoRA). Despite limited hardware, the model achieved high performance, generating accurate JSON objects with minimal post-processing. This research illustrates the viability of incorporating generative AI into the DDD process, improving efficiency and reducing resource requirements, while also laying the groundwork for further advancements in AI-driven software development.
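A minimal sketch of this 4-bit quantization plus LoRA setup with the transformers, bitsandbytes and peft libraries is shown below; the hyperparameters and base checkpoint are assumptions, not the paper's exact configuration.

```python
# Minimal sketch: preparing a 4-bit quantized Code Llama model for LoRA
# fine-tuning on consumer hardware. Hyperparameters and the dataset of
# prompt/JSON pairs are assumptions, not the paper's exact configuration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "codellama/CodeLlama-7b-hf"
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # attention projections only
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()          # only the low-rank adapters are trained
# Training then proceeds on prompt -> domain-specific JSON pairs, e.g. with
# transformers.Trainer or the TRL SFTTrainer.
```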
Entity Matching (EM) is the task of learning to group objects by transferring semantic concepts from example groups (=entities) to unseen data. Despite the general availability of image data in the context of many EM problems, most currently available EM algorithms rely solely on (textual) metadata.
In this paper, we introduce the first publicly available large-scale dataset for “visual entity matching”, based on a production-level use case in the retail domain. Using scanned advertisement leaflets collected over several years from different European retailers, we provide a total of ∼786k manually annotated, high-resolution product images containing ∼18k different individual retail products, which are grouped into ∼3k entities. The annotation of these product entities is based on a price comparison task, where each entity forms an equivalence class of comparable products. In a first baseline evaluation, we show that the proposed “visual entity matching” constitutes a novel learning problem which cannot be solved sufficiently well using standard image-based classification and retrieval algorithms. Instead, novel approaches that can transfer example-based visual equivalence classes to new data are needed. The aim of this paper is to provide a benchmark for such algorithms.
Information about the dataset, evaluation code and download instructions is provided on the website: https://www.retail-786k.org/.
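For orientation, the following sketch shows the kind of standard nearest-neighbour image-retrieval baseline the paper argues is insufficient for visual entity matching; the paths and the toy gallery are placeholders.

```python
# Minimal sketch of a standard image-retrieval baseline: embed product images
# with a pretrained ResNet-50 and assign each query the entity of its nearest
# neighbour. Paths and the gallery below are placeholders.
import torch
import torchvision.models as models
from PIL import Image

weights = models.ResNet50_Weights.IMAGENET1K_V2
backbone = models.resnet50(weights=weights)
backbone.fc = torch.nn.Identity()            # use 2048-d pooled features
backbone.eval()
preprocess = weights.transforms()

@torch.no_grad()
def embed(path):
    image = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    feat = backbone(image)
    return torch.nn.functional.normalize(feat, dim=1)

gallery = {"entity_0001": "gallery/0001.jpg", "entity_0002": "gallery/0002.jpg"}
gallery_feats = {eid: embed(p) for eid, p in gallery.items()}

query = embed("query/leaflet_crop.jpg")
scores = {eid: float(query @ feat.T) for eid, feat in gallery_feats.items()}
print("Predicted entity:", max(scores, key=scores.get))
```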
The paper discusses a project aimed at automating the differentiation of mold samples to ensure clean air in offices and production facilities, using deep neural networks to reduce the time and cost of manual differentiation. Two classification models, EfficientNet V2 and Normalization-Free Net (NFNet), were trained on artificially created data to identify five classes of mold plus an "other" category. The NFNet model, trained on unpadded images, achieved superior performance with an accuracy of 85.9%, precision of 83.7%, and recall of 78.9%. The employed semi-supervised approach reduced manual differentiation time by 50%, making the process more efficient and cost-effective. Grad-CAM was used for model interpretability, ensuring transparent decision-making.
Identifying and classifying similar or identical food products in e-commerce and retail is challenging due to diverse data formats, inconsistent product descriptions, and varying detail levels across sources. Yet precise product matching can optimize databases, enhance customer experience, enable personalized recommendations, and improve competitor price analyses. Current work primarily focuses on recognizing identical products or returning similar items. The relationship types defined in this work extend the conventional distinction between identical and similar products by providing a more nuanced categorization of similarities between products in the food sector.
For this purpose, general properties were taken from Schema.org and then adapted to the food sector. The defined product relationships are:
• SameAs: describes identical products.
• IsVariantOf: refers to products of the same brand and product type, but which vary in certain characteristics, such as flavor or processing method.
• IsSimilarTo: classifies products of different brands but of the same product type with a high degree of similarity.
• Predecessor/Successor: identifies successor or predecessor products that are the result of product updates.
• IsRelatedTo: covers products with further connections such as common areas of use.
• IsConsumableFor: refers to products that serve as refill packs for other products.
To determine these relationships automatically, a multi-stage process was developed and evaluated using data from various web shops (extracted by a web crawler) and data from the internal ERP system of a retail company. Both text and image data were used.
The method developed to determine these product relationships includes three main stages: data preparation, blocking and classification.
1. Data preparation: This step includes normalisation of the data to ensure consistency, enrichment with a Named Entity Recognition (NER) model to identify product attributes (such as brand) from the product name and the creation of embeddings. BERT, SBERT and OpenAI embeddings for text and ResNet50 embeddings for images were tested to optimise the classification of the different relationships.
2. Blocking: To increase efficiency and limit the number of product comparisons, a special procedure was implemented based on the classification of GPC bricks, the brand of the product and an ANN (Approximate Nearest Neighbour) approach. A trained BERT model for text classification is used to precisely determine the GPC brick codes for each product. The blocking procedure restricted the comparison set while retaining 80% of the potential product pairs.
3. Classification of the product relationships: Both an attribute-based approach and machine learning methods were used to determine the relationships SameAs, IsVariantOf and IsSimilarTo; for the other relationships, rule-based methods were used. The attribute-based approach analysed the similarities in product attributes (e.g. name, description, ingredients) with metrics like cosine similarity and served as a baseline for the machine learning models (see the sketch below). The machine learning approach tested random forest models (F1 score of 0.86) and siamese neural networks (F1 score of 0.84) in different experimental settings for the classification of the product relationships.
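A minimal sketch of the attribute-based baseline idea, using SBERT embeddings and cosine similarity, is shown below; the thresholds and example products are illustrative assumptions, not the evaluated configuration.

```python
# Minimal sketch of an attribute-based baseline: compare product names with
# SBERT embeddings and cosine similarity, then apply simple thresholds.
# Thresholds, fields and example products are illustrative only.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

def name_similarity(product_a, product_b):
    emb = model.encode([product_a["name"], product_b["name"]], convert_to_tensor=True)
    return float(util.cos_sim(emb[0], emb[1]))

a = {"name": "ChocoCrunch Müsli Schoko 500g", "brand": "ChocoCrunch"}
b = {"name": "ChocoCrunch Müsli Nuss 500g", "brand": "ChocoCrunch"}

sim = name_similarity(a, b)
if sim > 0.95 and a["brand"] == b["brand"]:
    relation = "SameAs"
elif sim > 0.75 and a["brand"] == b["brand"]:
    relation = "IsVariantOf"            # same brand and type, different flavour
elif sim > 0.75:
    relation = "IsSimilarTo"            # different brand, same product type
else:
    relation = "IsRelatedTo or none"
print(relation, f"(cosine similarity {sim:.2f})")
```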
In Germany, electricity from renewable energy sources is primarily generated from wind energy.
To support the EU's renewable energy targets, it is important to reduce wind turbine downtime and prevent damage. This can be achieved through condition monitoring and intelligent fault diagnosis, which increases the energy yield.
Supervised learning is a method for performing intelligent fault diagnosis on wind turbines.
In the literature, numerous solutions based on supervised learning are available. However, these solutions typically focus solely on diagnosing faults in one particular machine, resulting in the development of a separate model for each machine. With an increasing number of wind turbines, more and more specialized staff (diagnosticians) are needed to monitor the outputs of an increasing number of machine learning models. Furthermore, to achieve accurate diagnostic results with this approach, a large dataset is required, wherein each fault to be detected must have occurred on each machine. Such a dataset is often not available. As a result, only a general statement can be made as to whether a system exhibits anomalies, and a subsequent manual analysis is required to identify the exact defect, which increases the diagnostic team's workload even more. An automated intelligent fault diagnosis solution is therefore needed in order to reduce the manual effort.
To this end, we propose a supervised transfer learning framework for fault diagnosis in wind turbines.
Our database includes data from various wind farms and turbines of different types.
SCADA data are typically used for condition monitoring in wind turbines.
In general, the term SCADA stands for "Supervisory Control and Data Acquisition" and refers to the monitoring and control of technical processes using data that originates from sensors, actuators and other field devices and is sent to a control system. Among other things, process variables such as temperature, pressure and similar values are recorded. Each recorded 10-minute window is aggregated into four scalar values: minimum, maximum, standard deviation and average.
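A minimal sketch of this 10-minute aggregation with pandas is shown below; the file and column names are assumptions.

```python
# Minimal sketch: aggregating raw SCADA sensor readings into the four scalar
# values per 10-minute window described above. Column names are assumptions.
import pandas as pd

raw = pd.read_csv("scada_raw.csv", parse_dates=["timestamp"], index_col="timestamp")

agg = raw[["gearbox_temp", "generator_speed", "power_output"]].resample("10min").agg(
    ["min", "max", "std", "mean"]
)
agg.columns = ["_".join(col) for col in agg.columns]   # e.g. gearbox_temp_min
print(agg.head())
```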
The SCADA data feature space has been transformed into an anomaly score feature space by applying several simple anomaly detection models. This provides an anomaly score for each wind turbine component. A higher anomaly score corresponds to a more severe anomaly. We use these anomaly scores as input to our metamodel, which we train for fault diagnosis.
We conducted an extensive evaluation using popular supervised classifiers such as Random Forest (RF), LightGBM and Multi-Layer Perceptron (MLP) as metamodels.
We trained our metamodel on specific wind turbines and transferred this model to similar wind turbines, i.e., the prediction quality of our trained models was measured on similar wind turbines which are part of the test data.
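A minimal sketch of such a transfer setup, training a Random Forest metamodel on the anomaly scores of some turbines and evaluating it on similar, held-out turbines, is shown below; the file, column and turbine names are assumptions.

```python
# Minimal sketch: training a Random Forest metamodel on component anomaly
# scores from some turbines and evaluating it on other, similar turbines.
# File names, columns and the turbine split are assumptions.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report

df = pd.read_csv("anomaly_scores.csv")        # one row per turbine and time window
score_cols = [c for c in df.columns if c.endswith("_anomaly_score")]

train = df[df["turbine_id"].isin(["WT01", "WT02", "WT03"])]
test = df[df["turbine_id"].isin(["WT04", "WT05"])]   # similar turbines, unseen in training

metamodel = RandomForestClassifier(n_estimators=300, class_weight="balanced")
metamodel.fit(train[score_cols], train["fault_label"])

print(classification_report(test["fault_label"], metamodel.predict(test[score_cols])))
```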
The best performing model is able to classify faults with a high degree of accuracy, which is a significant asset to the diagnostic team.
This paper explores the application of generative AI for systematic innovation and inventive engineering problem solving. Using an automated prompt generation approach, including iterative problem definition and multi-directional prompting with numerous elementary solution principles, the study investigates the ability of AI chatbots to autonomously generate solution ideas and to create and evaluate innovative concepts based on one or more partial solution ideas. Through several experiments with different Large Language Models, the results show that while generative AI can quickly generate a large number of ideas, it often overestimates the feasibility and usefulness of its solutions and tends to generate overly complex concepts. The varying performance of different AI tools throughout the innovation process provides an opportunity to form mixed AI innovation teams, where different generative chatbots can complement, monitor and correct each other as needed. Using case studies to illustrate different strategies for generating solution concepts, this paper attempts to determine the optimal level of human involvement in the AI-assisted innovation process. The currently observed discrepancy between AI-generated textual descriptions and the practical implementation of engineering solutions underscores a fundamental challenge in the current capabilities of generative AI.
Data-driven modeling of complex physical systems is receiving a growing amount of attention in the simulation and machine learning communities. Since most physical simulations are based on compute-intensive, iterative implementations of differential equation systems, a (partial) replacement with learned, 1-step inference models has the potential for significant speedups in a wide range of application areas. In this context, we present a novel benchmark for the evaluation of 1-step generative learning models in terms of speed and physical correctness.
Our Urban Sound Propagation benchmark is based on the physically complex and practically relevant, yet intuitively easy to grasp, task of modeling the 2d propagation of waves from a sound source in an urban environment. We provide a dataset with 100k samples, where each sample consists of pairs of real 2d building maps drawn from OpenStreetMap, a parameterized sound source, and a simulated ground-truth sound propagation for the given scene. The dataset provides four different simulation tasks with increasing complexity regarding reflection, diffraction and source variance. A first baseline evaluation of common generative U-Net, GAN and Diffusion models shows that while these models are very well capable of modeling sound propagation in simple cases, the approximation of sub-systems represented by higher-order equations systematically fails.
Abstract: Generative Artificial Intelligence, Large Language Models and Humanoid Robots are increasingly used for educational purposes as well as for the treatment of patients, put to use by the entertainment and creative industries, and applied in the retail and financial sectors. At the same time, these technologies are used for fraud, deception, exploitation and identity theft, and have deeper effects on the post-truth society, the privatisation of the public sphere, and the loss of individual autonomy and societal trust. This keynote will focus on three themes raised by Generative AI, discuss their societal implications and evaluate whether the current regulatory regime provides adequate safeguards. These are the impact of Generative AI on Autonomy, on Safety and on Truth.
CV: Bart van der Sloot specializes in questions revolving around law and technology, Artificial Intelligence and Human Rights. His most recent book Regulating the Synthetic Society: Generative AI, Legal Questions, and Societal Challenges deals with the many normative, legal and societal complexities raised by Generative AI. Bart is associate professor at the Tilburg Institute for Law, Technology, and Society, Tilburg University, and is the Editor in Chief of the European Data Protection Law Review. He has a dual background in philosophy and law.
Ministerialdirektor Dr. Hans J. Reiter, Ministerium für Wissenschaft, Forschung und Kunst
Oberbürgermeister Marco Steffens, Stadt Offenburg
Rektor Prof. Dr. Stephan Trahasch
Nicole Büttner, Dr. Felix Kalkum, Prof. Bart van der Sloot, Prof. Dr. Janis Keuper