With the open source ETL tool hale studio, the open cloud platform hale connect and an agile harmonisation methodology, the German company WeTransform is helping geospatial professionals implement INSPIRE and other data standards. CEO and co-founder Thorsten Reitz talks about the implications of data harmonisation.
To most of us, geospatial data standards are a given. However, discussing the actual implementation of INSPIRE, ISO 191xx, eCH, OGC CityGML, NEN, SOSI or XPlanung can evoke hair-raising moments among the stakeholders involved.
In terms of general recognition and overall use, the European INSPIRE project might look a bit bleak in places. Since it was set up in 2007 as an EU directive to harmonise environmental data, there has been no going back. After massive projects involving the mandatory production of metadata and semi-automated alignments, the parties involved started asking themselves whether INSPIRE would actually be useful. Thorsten Reitz, co-founder and CEO of WeTransform, confirms: “I know that to some people it looks like it’s all a waste. But on the other hand, what are we expecting? We have twelve and a half thousand organisations across Europe who are supposed to build a common infrastructure together. That is not trivial. Whatever the technology is going to be, there is going to be disagreement and frustration and whatnot. And there is politics.”
Data, before and after
“Before, there were information silos, for example cadastral data or assets like cables and pipelines. After, in INSPIRE and similar standards, everything should be connected explicitly. WeTransform is building these platforms right now, mostly for INSPIRE, but also for other fields.” One might get the feeling that INSPIRE is a done deal, but in fact a lot still needs to be done. “Maybe a quarter of the work is done in terms of making the data accessible, usable and useful. What people did first was to describe their datasets in metadata. There we are probably at something like 75 percent. What is still missing is harmonised, INSPIRE-compliant data. In 2018, we were at 17 percent. I have not seen this year’s annual report, but I assume it would be something like 25 percent. To be honest, I do not think we will get beyond 75 percent at all.”
Any data is better than no data
When you open up your datasets to third parties, people are bound to discover mistakes, errors and omissions. Data owners might even feel ashamed. “That’s a feeling we often encounter”, says Reitz. “From my perspective, any data is usually better than no data. The other common issue is that surveyors take real pride in achieving optimal precision. They have parcel borders, and of course those should be accurate. But there are quite a lot of use cases where it does not matter much whether the dataset’s precision is 10 centimetres or 25 centimetres.”
INSPIRE as a starting point
According to Thorsten Reitz, the successful implementation of INSPIRE is not where the project stops. “Certainly not. From my perspective, it’s only the starting point. If we have the same data models all across Europe to start with, we will be able to apply the same analytical models all across Europe, and we will be able to learn much faster. For example, let’s say we need to decide on mitigation measures for environmental or climate issues. Today, either the training datasets are too small to scale, or our models are too specific, both for machine learning and for classic analytical GIS work. Given common standards, we can circumvent both problems. That’s another reason why I think INSPIRE is totally worth it.”
Go significantly beyond
“Right now, if we want to work with geospatial data in Europe, or even just in Germany, we need to call a gazillion different organisations, negotiate with them and then get their data on DVDs or USB sticks. That’s the wrong speed for how things are developing today. It is still workable, but it’s a hundred times less efficient than what we could achieve. Efforts like INSPIRE will help us go significantly beyond that. Just putting some data out and hoping that somebody will find it on a geoportal simply does not work.”
Information models: the main value
“The main value of INSPIRE for me is actually in the information models. We get to work with a lot of ‘dirty stuff’. There might be no documentation, and the experts who built it moved on five years ago. Or there was never a formal QA process for the data. And now you have a well-documented common standard. It makes such a big difference! It also helps make all the data in the community better, which in turn simplifies the whole analytical process. The actual network interfaces, such as WFS 2.0, which is the current INSPIRE-compliant download service, matter less. Switching to a new API is minor work. In our cloud infrastructure, it is literally just toggling a switch to enable WFS 3 (OGC API Features). It’s fed from the same information model. Both INSPIRE and we will keep adding APIs and encodings to be future-proof.”
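The idea that several interfaces can be “fed from the same information model” can be illustrated with a small sketch. It is not how hale connect works internally; the `RoadLink` class and both encoder functions are hypothetical, and the XML output is deliberately simplified and is not valid INSPIRE GML. One feature record is defined once, and each encoding is just a different serialisation of it.

```python
import json
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RoadLink:
    """Hypothetical, heavily simplified feature type standing in for a shared information model."""
    identifier: str
    name: str
    # Assumed to be WGS 84 longitude/latitude pairs, as RFC 7946 GeoJSON requires.
    coordinates: List[Tuple[float, float]]


def to_geojson(feature: RoadLink) -> str:
    """Serialise the feature as a GeoJSON Feature, the encoding commonly served by OGC API Features."""
    return json.dumps({
        "type": "Feature",
        "id": feature.identifier,
        "geometry": {
            "type": "LineString",
            "coordinates": [list(c) for c in feature.coordinates],
        },
        "properties": {"name": feature.name},
    })


def to_simple_xml(feature: RoadLink) -> str:
    """Serialise the same feature as simplified XML (illustrative only, NOT valid INSPIRE GML)."""
    root = ET.Element("RoadLink", {"id": feature.identifier})
    ET.SubElement(root, "name").text = feature.name
    ET.SubElement(root, "posList").text = " ".join(f"{x} {y}" for x, y in feature.coordinates)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    road = RoadLink("rl.1", "Example Road", [(8.65, 49.87), (8.66, 49.88)])
    print(to_geojson(road))
    print(to_simple_xml(road))
```

Adding another API or encoding in this sketch means adding one more serialiser; the feature model itself does not change.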
Data conversion versus harmonisation
“When we combine data from different sources, the result can be used like one homogeneous dataset. To make that harmonisation happen, we need to look at several things. Some are specific to geospatial data, like spatial reference systems; there is so much potential for error there. Another is the matching of geometries, for example along a border. There should be no overlaps and no gaps.”
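The “no overlaps, no gaps” requirement along a shared border can be illustrated with a minimal edge-matching sketch. It assumes both datasets are already in the same projected reference system with coordinates in metres, and it only snaps vertex to vertex; production tools also snap to line segments and handle topology more thoroughly. The function name `snap_border` and the 0.25 m tolerance are hypothetical choices for illustration.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def snap_border(reference: List[Point], candidate: List[Point],
                tolerance: float) -> Tuple[List[Point], List[int]]:
    """Snap each candidate border vertex to the nearest reference vertex within tolerance.

    Assumes a non-empty reference border and a shared projected CRS in metres.
    Returns the snapped vertices and the indices of vertices that could not be
    snapped; those are potential gaps or overlaps that need an expert decision.
    """
    snapped: List[Point] = []
    unresolved: List[int] = []
    for i, (x, y) in enumerate(candidate):
        # Nearest reference vertex by Euclidean distance.
        nearest = min(reference, key=lambda p: math.hypot(p[0] - x, p[1] - y))
        if math.hypot(nearest[0] - x, nearest[1] - y) <= tolerance:
            snapped.append(nearest)      # close enough: adopt the reference position
        else:
            snapped.append((x, y))       # too far: keep it and flag it for review
            unresolved.append(i)
    return snapped, unresolved


if __name__ == "__main__":
    border_a = [(0.0, 0.0), (100.0, 0.0), (200.0, 0.0)]     # border as digitised by dataset A
    border_b = [(0.03, 0.02), (100.1, -0.05), (200.0, 3.0)]  # same border from dataset B
    print(snap_border(border_a, border_b, tolerance=0.25))
```

With these sample values, the first two vertices of dataset B are within 25 cm of dataset A and are snapped, while the third vertex is 3 m away and is reported as unresolved (index 2).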
Best fit, applying best practices
“Then there are aspects that you can find in every structured dataset. Take individual classification systems. In one country, you might have six levels of roads; in another, you might only have five. How do you consolidate that? These are decisions that an expert will need to make at least once. And when you’ve made those decisions, there will always be a semantic discussion, for example: ‘is this road in this dataset really the same thing as in that dataset or not?’ We look at what is the best fit from our perspective, and apply best practices.”
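Consolidating six road levels into five can be sketched as an explicit, reviewable mapping table. The class numbers and the decision to merge source levels 5 and 6 are hypothetical; as Reitz says, the actual mapping is an expert decision. The sketch raises an error for unmapped values rather than guessing, and lists the target classes where several source classes are merged, because that is where information is lost.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical example: country A distinguishes six road levels, the target model five.
# Merging levels 5 and 6 is an illustrative expert decision, not a real rule.
COUNTRY_A_TO_TARGET: Dict[int, int] = {1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 5}


def harmonise_road_class(source_class: int, mapping: Dict[int, int]) -> int:
    """Translate a source road class into the target classification."""
    if source_class not in mapping:
        # Surface undecided values instead of silently guessing a class.
        raise ValueError(f"no mapping decided for source class {source_class}")
    return mapping[source_class]


def merged_classes(mapping: Dict[int, int]) -> Dict[int, List[int]]:
    """Return target classes that receive more than one source class."""
    groups: Dict[int, List[int]] = defaultdict(list)
    for source, target in sorted(mapping.items()):
        groups[target].append(source)
    return {target: sources for target, sources in groups.items() if len(sources) > 1}


if __name__ == "__main__":
    print(harmonise_road_class(6, COUNTRY_A_TO_TARGET))
    print(merged_classes(COUNTRY_A_TO_TARGET))
```

Keeping the mapping as data rather than burying it in code means the decision only has to be made once and can be reviewed, which matters when the semantic discussion comes up again.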
A stakeholder of an INSPIRE dataset might feel, and maybe rightly so, that they maintain the best dataset or the best data model in the world. They may fear that a less perfect model or harmonisation process will degrade their existing data. Thorsten Reitz smiles: “We’ve never had a case where somebody said INSPIRE compliance had downgraded the quality of their data. In fact, in most cases, the data becomes much better through the harmonisation process. Even datasets that are really, really good from the outset can still benefit. Let’s take the Swiss topographic landscape model. It’s one of the best datasets that I have ever seen; only about one in a million objects has some error. But even those errors are caught in this process. We can report them back and they can fix them.”
“What INSPIRE stakeholders do say is that harmonisation can reduce the direct usefulness of the data to them. That’s often an issue. They might have a database which is fully integrated into their workflow, and now they have a different schema in a different language or a different format, which is not necessarily something they can use internally. Often the internal usefulness of the INSPIRE data is lower. But the quality in terms of the classic criteria, correctness and precision, usually improves.”
Easy and effective
Thorsten Reitz is clear about it. “Our goal is to make data harmonisation easy and effective. Spatial information can be super-fragmented. With the implementation of INSPIRE, or any data infrastructure, this tends to be a major issue. Every organisation does its own thing. Our main field of activity is transforming spatial data through manipulation and harmonisation. We have done around 200 projects so far. We have worked across Europe, and also in Canada and a couple of other places, but the focus is really on Europe.”
WeTransform: Real-time Transformation and Validation Tools
What are the tools that WeTransform uses to work its magic on the data? “Our methodology is to make data accessible, usable and useful. For this purpose, we provide two tools. Typically, you create a data harmonisation project in hale studio. This can become part of an analyse-transform-publish-validate workflow on hale connect. The result is a useful and usable dataset that different organisations can start working with immediately.”
Europe has a good installed base of the open source ETL tool hale studio, with approximately 5,000 active users. Reitz remembers: “During the HUMBOLDT project, when my co-founder Simon and I still worked at Fraunhofer IGD, we created the first version. hale studio was developed from the start to support the implementation of open standards. Working with these complex data structures and linked data should be much easier and much more interactive. It’s an ETL tool with real-time feedback and real-time validation.”
“To address the whole workflow, we also deliver a cloud platform called hale connect. There, the idea is to automate as much of the data provisioning process as possible. You can upload a dataset, transform it to an open standard and then make it accessible through open APIs, described by automatically generated metadata. All that you need. We now support the most common spatial APIs, and plan to add more as they develop. However, our platform is not just for providing data, it is also for making use of it. We therefore also develop tools for analysis and visualisation.
There are already quite a few solutions that build on this toolset, such as the Geohazards application built by Minerva Intelligence using hale studio and hale connect. This application just won this year’s INSPIRE Data Challenge award in Helsinki.”
Transpose to other standards
For WeTransform, INSPIRE is a core activity, but also a template. Reitz explains: “With INSPIRE, we refine our approach and our technology, but we also want to apply it to other areas. What we want to do is provide standards-based data platforms.” INSPIRE mostly covers environmental information, but there are also other areas, like transport and logistics, where a set of new standards called TN-ITS is being implemented. Similar developments are under way in aerospace and in the UAV industry. “The key thing is that many organisations need to move to a common standard. And that is when we can really help. Once we have implemented a couple of integrations, it becomes very easy to bring more organisations on board.”