
MISSION KI presents data set search engine and automatic quality assessment for precise data searches

Copyright MISSION KI / Christian Laukemper

Berlin, September 8, 2025 – Artificial intelligence requires high-quality data in order to be trained effectively and make accurate predictions. Although large amounts of data are available, only a fraction of it is in an easily usable form. Much of the data is not recorded in a uniform format and lacks descriptions of its quality, known as data profiles, which makes it harder to use. MISSION KI aims to make suitable data easier to find and to improve data quality. As part of a project, the initiative has developed the innovative data set search engine Daseen, which for the first time enables searches for data sets across multiple sources. Daseen is now available to the public as a beta version at https://www.daseen.de, free of charge and without registration.

The data set search engine is based on an open-source software solution and currently has access to more than 70,000 curated data sets from 29 data providers, drawn from public and private data portals and repositories across various domains (e.g., administration, geodata, weather). The database will be continuously expanded in the coming months. The AI service provider beebucket implemented the project for MISSION KI with support from the companies eXXcellent solutions, deltaDAO, and nexyo.

Simpler data searches require descriptions of data quality. For this purpose, the partners have developed the Extended Dataset Profile Service (EDPS), a uniform method for indexing and cataloging data. The EDPS automatically creates metadata for data sets, known as data profiles. This allows data providers to catalog and curate data from different sources automatically and to make it findable and usable on the basis of these profiles. Once data has been described in this way, data users can search for it across data spaces and data portals, manually or automatically, using the data profiles. The team has integrated the EDPS into Daseen so that the quality of the available data sets is immediately visible. Together, Daseen and the EDPS enable data users to find high-quality data tailored precisely to their needs.
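To make the idea of a data profile more concrete, here is a minimal Python sketch. It is not the EDPS API or its profile schema: the `DataProfile` fields, the `build_profile` and `search` functions, the completeness metric, and the sample records are all hypothetical, chosen only to illustrate how quality metadata can be derived automatically from a data set and then used to filter data sets that come from different sources.

```python
from dataclasses import dataclass


@dataclass
class DataProfile:
    """Hypothetical, simplified data profile with a few quality indicators."""
    dataset_id: str
    domain: str
    row_count: int
    columns: list[str]
    completeness: float  # share of non-missing cells, from 0.0 to 1.0


def build_profile(dataset_id: str, domain: str, rows: list[dict]) -> DataProfile:
    """Derive a profile automatically from the records of one data set."""
    columns = sorted({key for row in rows for key in row})
    total_cells = len(rows) * len(columns)
    filled = sum(1 for row in rows for col in columns if row.get(col) is not None)
    completeness = filled / total_cells if total_cells else 0.0
    return DataProfile(dataset_id, domain, len(rows), columns, completeness)


def search(profiles: list[DataProfile], domain: str, required_column: str,
           min_completeness: float) -> list[str]:
    """Select data sets from any source by looking only at their profiles."""
    return [
        p.dataset_id
        for p in profiles
        if p.domain == domain
        and required_column in p.columns
        and p.completeness >= min_completeness
    ]


# Two made-up weather data sets from two hypothetical portals.
portal_a = [{"station": "Berlin", "temp_c": 18.5}, {"station": "Bonn", "temp_c": None}]
portal_b = [{"station": "Köln", "temp_c": 17.0}, {"station": "Ulm", "temp_c": 16.2}]

profiles = [
    build_profile("portal-a/weather", "weather", portal_a),
    build_profile("portal-b/weather", "weather", portal_b),
]
print(search(profiles, "weather", "temp_c", 0.9))  # ['portal-b/weather']
```

Because `search` inspects only the profiles, it behaves the same way regardless of which portal a data set came from. A real data profile would carry far richer quality information than the single completeness score used here.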

The EDPS is designed to be operated locally by the data provider, using common connector solutions such as the Eclipse Dataspace Connector. It thus follows the compute-to-data principle: the algorithms that create the data profiles are executed where the data is physically located, i.e., at the data provider's site. As a result, the data does not have to be transferred in order to generate the metadata.
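The compute-to-data principle can be illustrated with a small Python sketch. The `ProviderNode` class and `profiling_job` function are hypothetical and do not model the EDPS or any connector; they only show the pattern of running the profiling code where the data lives and handing back nothing but the resulting metadata.

```python
from typing import Callable


class ProviderNode:
    """Hypothetical provider-side runtime that keeps the raw records local."""

    def __init__(self, records: list[dict]):
        self._records = records  # held at the provider's site, never returned directly

    def run(self, job: Callable[[list[dict]], dict]) -> dict:
        """Execute a job on the local records and return only its result."""
        result = job(self._records)
        if not isinstance(result, dict):
            raise TypeError("a job must return a metadata dictionary")
        return result


def profiling_job(records: list[dict]) -> dict:
    """Count records and summarize the numeric 'value' field of each one."""
    numbers = [r["value"] for r in records if isinstance(r.get("value"), (int, float))]
    return {
        "row_count": len(records),
        "missing_or_invalid": len(records) - len(numbers),
        "min": min(numbers) if numbers else None,
        "max": max(numbers) if numbers else None,
    }


# The profiling job runs at the provider; only the metadata leaves the node.
node = ProviderNode([{"value": 3}, {"value": None}, {"value": 7.5}])
profile = node.run(profiling_job)
print(profile)  # {'row_count': 3, 'missing_or_invalid': 1, 'min': 3, 'max': 7.5}
```

The sketch shows only the flow of code and results. Nothing here prevents a job from copying records into its output, so a real deployment would also need to control which jobs may run and what they may return.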

The team has integrated the service into established data spaces such as the Mobility Data Space and Pontus-X. It also ensured that the software is compatible with existing technical solutions and standards, so that it is secure, openly accessible, and can be operated over the long term.

Data providers, data users, and operators of data spaces and data portals alike will benefit from Daseen and the EDPS, which are separate but interoperable components. To enable widespread reuse, the team is now making the software available as open source on GitHub: https://github.com/Mission-KI/Dataset-Search-Engine.

Manfred Rauhmeier, Chairman of the acatech Foundation:

“The data set search engine, in combination with the EDPS for the automatic creation of data profiles, is a significant step toward unlocking data treasures. Data becomes easier to find and its quality is made transparent during the search. This allows us to use data more effectively. We are significantly expanding the database for AI innovations, opening up a wide range of opportunities for German and European companies to develop innovative, data-driven business models.” 

Florian Mauer-Endler, managing partner at beebucket GmbH:

"We are delighted that the project has been so successful—and with a technology that did not exist in this form before. The next steps will also be exciting, as the solution developed is relevant for all data services in the European Union. Knowledge of metadata is a prerequisite for the development of sustainable digital services and is essential for legally compliant operation."
