The collection and integration of data coming from different sources will probably be one of the key elements of many future markets and services, and is indeed the vision buttressing all efforts made in Cross-CPP. Still, as any analyst would tell you, having data is only a necessary condition, not a sufficient one: what is also required are the capabilities to analyse and manipulate those data. This realisation was the seed behind the introduction of the CPP Data Analytics Toolbox, a suite of modules designed to simplify the analysis of data, ranging from basic statistical functions to complex predictive models.
Yet, could the ambition be to provide a toolbox able to solve any foreseen (and yet to be foreseen) analytics task, over data whose nature will evolve and change, while satisfying the needs of services yet to be specified? This is clearly beyond the reach of any three-year research project. Furthermore, some service providers will prefer to resort to their in-house algorithms and models, especially when these are part of their core business – to illustrate, a weather forecast company would not rely on external models to predict tomorrow’s rain. Instead, the project decided to follow a different strategy: provide basic yet comprehensive tools that allow service providers to rapidly develop prototypes and test ideas.
The Data Analytics Toolbox is based on a modular structure, with different components offering different types of analysis; yet all of them share the same way of communicating with the user, and of retrieving data from and returning results to the system. Here we start reviewing these modules by focusing on two of them, devoted respectively to trajectory and network analysis.
Trajectory Analysis Component. The concept of “trajectory analysis” is a very general one, encompassing many different analyses of data representing a spatio-temporal evolution. With the exception of buildings, all CPPs composing the Cross-CPP system are expected to move at some point in their lifetime. With these concepts in mind, this component aims at providing a set of basic tools to simplify the handling and manipulation of this mathematical object. On the one hand, this includes a set of functions to analyse trajectories in an individual fashion, i.e. without considering their interconnections. On the other hand, a second level deals with the analysis of multiple trajectories by taking into account the relationships between them, for instance to detect groups of similar trajectories, or the presence of causal relationships between them.
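To give a flavour of the group-level analysis, consider comparing two trajectories by a simple mean point-wise distance after resampling both to the same number of points. The sketch below is a hypothetical, minimal illustration in Python, not the toolbox’s actual code; the function names and the resampling-based metric are our own illustrative choices.

```python
import math

def resample(traj, n=10):
    """Linearly resample a trajectory (list of (x, y) points) to n points."""
    # Cumulative arc length along the trajectory
    dists = [0.0]
    for (x0, y0), (x1, y1) in zip(traj, traj[1:]):
        dists.append(dists[-1] + math.hypot(x1 - x0, y1 - y0))
    total = dists[-1]
    out = []
    for i in range(n):
        target = total * i / (n - 1)
        # Find the segment containing the target arc length and interpolate
        for j in range(1, len(dists)):
            if dists[j] >= target:
                seg = dists[j] - dists[j - 1] or 1.0
                t = (target - dists[j - 1]) / seg
                x = traj[j - 1][0] + t * (traj[j][0] - traj[j - 1][0])
                y = traj[j - 1][1] + t * (traj[j][1] - traj[j - 1][1])
                out.append((x, y))
                break
    return out

def similarity(a, b, n=10):
    """Mean point-wise distance between two resampled trajectories
    (0 means identical; smaller values mean more similar)."""
    ra, rb = resample(a, n), resample(b, n)
    return sum(math.hypot(p[0] - q[0], p[1] - q[1])
               for p, q in zip(ra, rb)) / n
```

Grouping similar trajectories then amounts to clustering on this (or any other) pairwise distance.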
Network Analysis Component. Sensors in the Cross-CPP ecosystem are organised in complex interaction structures. These structures may be physical: sensors in a car, for instance, are connected through the CAN bus and can therefore directly share information. Yet such structures can also be functional, i.e. the result of sensors being embedded in a common context. To illustrate, two temperature sensors in two different cars may yield the same (or very similar) time series, provided the two cars travel along similar paths. From a mathematical point of view, such connectivity networks can be analysed by means of complex network theory, a statistical physics extension of classical graph theory. Complex networks have been used, for instance, to assess and reduce the vulnerability of the resulting communication patterns, or to optimise the spread of new information through the system. This component provides several functions to both manage and analyse networks, such as the extraction of metrics or the identification of groups of strongly connected objects.
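As a small illustration of these two tasks, the sketch below represents a sensor network as an adjacency dictionary and extracts two basic quantities: the degree of each node and the groups of mutually reachable sensors. This is a hypothetical, dependency-free example, not the component’s actual code; the sensor names are invented.

```python
from collections import deque

def degrees(adj):
    """Degree of each node in an undirected network (adjacency dict)."""
    return {node: len(neigh) for node, neigh in adj.items()}

def connected_components(adj):
    """Groups of mutually reachable nodes, found by breadth-first search."""
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        queue, group = deque([start]), set()
        seen.add(start)
        while queue:
            node = queue.popleft()
            group.add(node)
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        components.append(group)
    return components

# Example: two connected sensors in one car, plus an isolated sensor
sensors = {
    "temp_car1": {"speed_car1"},
    "speed_car1": {"temp_car1"},
    "temp_car2": set(),
}
```

On this toy network, `connected_components` returns two groups, and `degrees` shows that the isolated sensor has degree zero.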
The Cross-CPP project deals with cross-sectorial Cyber Physical Products – CPPs in short – such as vehicles and smart buildings. CPPs can have many sensors that collect information about the CPP’s environment and its use.
The project offers a big data marketplace as a “One-Stop-Shop” for data customers who want to tap into the enormous opportunity that arises from collecting data from various cross-sectorial CPPs. But is it enough just to collect data from CPPs and use it for different applications? Can we be sure that the data coming from a CPP is not influenced by other factors, such as the weather, the geographical location, or simply the colour of a car?
Have you ever heard of the word “context”? According to the Oxford dictionary, context is “The circumstances that form the setting for an event, statement, or idea, and in terms of which it can be fully understood.” In the artificial intelligence domain, the concept of context is usually defined as the “generalization of a collection of assumptions”. For Cross-CPP, “Context can be a set of information which characterizes the situation under which sensor data are obtained (e.g. the situation under which the data from a temperature sensor in a car is obtained)”. Sounds difficult, doesn’t it? Well, let’s take a simple example to understand what context means for a vehicle. Did you know that the mobile sensor networks of modern vehicles can produce over 4,000 signals per second per vehicle? That is a huge amount of data, isn’t it? Now imagine that this raw sensor data came with additional information, such as the circumstances under which the data was collected, or the factors that can influence the sensor measurements observed from vehicles. Such answers can be provided by context. Context information is the additional information that data customers get when they request data collection from the Big Data Marketplace. Still not quite clear?
Let’s say we have a black car equipped with an exterior temperature sensor. Wouldn’t it be great if we could retrieve data from this temperature sensor and provide it to a data customer who might build a new service making use of it? We also know that many factors influence the value measured by the sensor: the colour of the car (black), the current location of the car (e.g. its altitude), the height at which the sensor is installed in the car, the time of day or year, and many other factors. All this information, which is either the car’s metadata or can be measured with other sensors, we will from now on call enhanced monitored data. Furthermore, we can deduce certain situations for the temperature value based on this enhanced monitored data, which defines the context of this car. One such situation: a temperature value measured by a black car, with its sensor located 20 cm above ground level, while the car was standing still at midday on a summer day in the south of France, is not very reliable.
We hope the example above is clear enough to understand the concept of context in the Cross-CPP project. We would also like to use context not only on the data collection side but also for the security aspects of Cross-CPP modules and the usage of services, to provide the CPP user/owner with flexible (context-based) protection for their CPP information.
For Cross-CPP modules, context will be extracted as specified by the request from the data customer, or as needed by internal modules such as the Cross-CPP security module. And as we learned above, in order to extract context, the extractor will use enhanced monitored data (a combination of the metadata for a particular CPP and raw sensor data) together with rules and defined context models.
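As a rough illustration of how such rules could combine enhanced monitored data into context, consider the sketch below. All field names, thresholds, and the rules themselves are hypothetical and chosen to mirror the black-car example above; the real Cross-CPP extractor uses its own rules and context models.

```python
def assess_temperature_context(reading):
    """Flag exterior-temperature readings whose context makes them unreliable.

    `reading` combines raw sensor data with CPP metadata (enhanced
    monitored data). The rules below are simplified, hypothetical examples.
    """
    reasons = []
    # Rule 1: dark body colour plus a sensor close to the road surface
    if (reading.get("car_colour") == "black"
            and reading.get("sensor_height_cm", 100) < 30):
        reasons.append("dark body, sensor close to hot road surface")
    # Rule 2: car standing still around midday (solar heating of the sensor)
    if reading.get("speed_kmh", 0) == 0 and 11 <= reading.get("hour", 12) <= 15:
        reasons.append("car standing still around midday")
    return {"value": reading.get("temperature_c"),
            "reliable": not reasons,
            "context": reasons}

# The black-car situation from the example above
example = {"temperature_c": 41.5, "car_colour": "black",
           "sensor_height_cm": 20, "speed_kmh": 0, "hour": 13}
```

For this reading, both rules fire, so the measurement is delivered with its context marked as unreliable rather than silently passed on.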
In case you are wondering how all of this is going to be realised step by step within the scope of the project, we are offering a series of blogs on the topic of context, and we will make sure that you get enough insights to work with context! In the following blogs, we will explain how data customers can work with enhanced monitored and contextual data. We will explain how a context tool can extract context data and how it can help the data customer to make informed decisions. Furthermore, we will explain context-based security for Cross-CPP modules, where we will learn how context can help to improve security for CPP owners. Last but not least, we will also provide insights into context-related tools that give service providers like Meteologix a toolkit they can use to improve their innovative services. All of these interesting topics will be covered in a series of subsequent blogs, so …
Stay tuned 🙂
Your ATB Team and Cross-CPP consortium partners
The Cross-CPP (Cross-Cyber Physical Products) project and its consortium partners aim to build a cross-sectorial marketplace that offers data from various sources.
Service providers can then use this data to enhance their services and offer them back to the data owner: for example, if you are driving a car and opt to share its outside temperature data.
We at Meteologix, as a meteorological service provider, can use this data to enhance our own “SwissHD” forecast system, and in turn provide you with a tailored and even better weather forecast for your car and travel.
To understand this whole process, it might be helpful to dig a little into the theory of how modern weather forecasting is done in the first place.
Modern forecast systems are highly complex computer programs consisting of thousands of lines of code. With the help of algorithms, they process vast amounts of data for grid points around the world in order to compute a forecast for a specific location at a certain point in time.
What’s a grid point then?
Imagine laying a mesh around the globe – then each node within this mesh is a grid point.
For each grid point a forecast is calculated that takes the height and other geographical features of this specific location into account. Of course, you can also get a forecast for any other location that is not a grid point: this is achieved by interpolation between nearby grid points.
Thus, the farther apart the grid points are and the more coarse-meshed a weather model is, the poorer its resolution and the more interpolation is needed; and vice versa.
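The interpolation between grid points can be illustrated with standard bilinear interpolation: a value at an arbitrary location is blended from the four surrounding grid points. This is a generic textbook sketch, not the code of any particular weather model; the grid values and spacing are invented.

```python
def bilinear(x, y, grid, spacing):
    """Interpolate a value at (x, y) from the four surrounding grid points.
    `grid[i][j]` holds the value at position (i * spacing, j * spacing)."""
    i, j = int(x // spacing), int(y // spacing)
    tx = (x - i * spacing) / spacing   # fractional position within the cell
    ty = (y - j * spacing) / spacing
    v00, v10 = grid[i][j], grid[i + 1][j]
    v01, v11 = grid[i][j + 1], grid[i + 1][j + 1]
    top = v00 * (1 - tx) + v10 * tx
    bottom = v01 * (1 - tx) + v11 * tx
    return top * (1 - ty) + bottom * ty

# Temperatures at four grid points 22 km apart; interpolate halfway between them
temps = [[10.0, 12.0], [14.0, 16.0]]
mid = bilinear(11.0, 11.0, temps, 22.0)  # 13.0, the average of the four corners
```

Everything between grid points is a blend like this, which is exactly why a coarse mesh washes out local detail.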
There are a lot of weather models on the market and they differ tremendously in resolution. Probably the most famous and widely used, the Global Forecast System (GFS), has a grid point only every ~22 km in mid-latitudes. Its data is free to use, which is why it forms the basis of a lot of (low-quality) weather apps.
You can easily observe the problems that arise from low resolution in the following comparison of pictures showing the terrain in Liechtenstein that each model can “see” and differentiate with its grid-point density.
Let’s take a look at how well these different model resolutions reflect the topography of Liechtenstein:
The first one is a model with grid points every 22 km, then one with grid points every 13 km, then a ~7 km grid, and the last one is our Meteologix Swiss HD 1 km model. The differences are quite obvious: the coarse-meshed models only capture two to four different terrain heights, as these get averaged and smoothed out. This means these models take only these few regional features into account when computing their forecast, which leads to very biased weather predictions. The two finer-grained models differentiate the regional ground features much better.
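The averaging effect is easy to reproduce. The sketch below coarsens a hypothetical 1 km terrain profile by block averaging, which is roughly what a coarse-meshed model “sees”: the invented valley-and-ridge profile collapses into two nearly identical heights.

```python
def coarsen(heights, factor):
    """Average terrain heights over factor-sized blocks, mimicking what a
    coarse-meshed model 'sees' of fine-grained topography."""
    n = len(heights)
    return [sum(heights[i:i + factor]) / len(heights[i:i + factor])
            for i in range(0, n, factor)]

# Hypothetical 1 km terrain profile (metres) across valleys and ridges
profile_1km = [450, 500, 1800, 2100, 600, 550, 1900, 2200]
profile_4km = coarsen(profile_1km, 4)  # [1212.5, 1312.5]
```

Peaks above 2,000 m and valley floors around 500 m all become ~1,250 m in the coarse view, so the model forecasts for terrain that does not actually exist anywhere along the profile.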
Of course, there is more to a weather model than just the density of its grid points; its inner logic and formulas are very important as well. But if the mesh is too broad, the underlying topography cannot be projected realistically. The same applies to forecasts of small-scale weather events, such as showers and thunderstorms, whose evolution higher-resolution models can predict more accurately than coarser models. Thus, all mathematical sophistication does not help when the weather model does not “know” what kind of terrain it calculates the forecast for.
Hence, it is important to have a fine-grained weather model to begin with in order to make reasonable forecasts, although it is also important to have as much “ground truth” as possible to enhance the model’s forecasts.
What exactly is meant by “ground truth”, and how the Cross-CPP project aims to help with it so that you as a consumer can get the best possible weather predictions, we will explain in our next weather blog post.
Stay tuned 🙂
Your Meteologix Team and Cross-CPP consortium partners
The Cross-CPP project defines a new concept of identification services. These enable users to share their identity, and the identity of related entities, with service providers (for example, to get a cheaper vehicle insurance plan if the insurance company is allowed to monitor the user’s driving behaviour). At the same time, they let the user keep full control over information that does not directly identify an entity (such as a geo-located temperature measurement) but could reveal the user’s identity when combined with other data (for example, regular travel from a distant place at a specific time). The following figure describes an overall schema of the system and positions it in the context of other Cross-CPP modules.
Identification services primarily interact with the CPP Cloud Storage and the CPP Big Data Marketplace and interlink the data with additional information. Service providers or, potentially, Cyber Physical Products can ask for particular functions by invoking the relevant services and reading the results. The data access policy is managed by the Cross-CPP Security module, but the policy can also specify that the only way a particular service provider may receive data is in a privacy-aware, transformed form (for example, data aggregated over relevant map tiles rather than exact GPS locations). Similarly, a rule for data filtration can employ context (by invoking the Context Awareness module) to deliver only the relevant subset of the data agreed upon in a contract between the data owner (for example, a building operator or a vehicle owner) and a service provider (for example, a weather forecast service asking only for plausible measurements of the outside temperature). The implemented functionality will help Cross-CPP guarantee privacy-aware data sharing in the CPP Big Data Marketplace.
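As an illustration of the tile-aggregation transform mentioned above, the sketch below maps exact GPS fixes to standard Web Mercator (slippy-map) tile indices and shares only per-tile counts. It is a simplified, hypothetical example, not the module’s actual implementation; the tile formula itself is the standard one used by web maps.

```python
import math
from collections import Counter

def tile_of(lat, lon, zoom):
    """Map a GPS fix to a slippy-map tile index (standard Web Mercator)."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return x, y

def aggregate(fixes, zoom=12):
    """Replace exact GPS fixes with per-tile counts before sharing,
    so the service provider sees areas rather than precise locations."""
    return Counter(tile_of(lat, lon, zoom) for lat, lon in fixes)
```

Two fixes a hundred metres apart typically land in the same tile at this zoom level, so a service provider learns “two measurements in this area” instead of the exact route.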
Discuss on Twitter @CrossCPP