MSV, one of the biggest international engineering fairs, is over now. As usual, it took place in the city of Brno in the Czech Republic. Cross-CPP was also present: Siemens had a booth and presented the project concept and current results to many visitors from industry, as well as to various other relevant stakeholders, including policy makers and academia.
The main topics of the exhibition were Industry 4.0 and the Digital Factory. The fair attracted around 80,000 visitors from all over the world. Cross-CPP data that could support the vision of Smart Cities was of particular interest, and various visitors as well as co-exhibitors were keen on sharing ideas about privacy-aware data sharing and its use in innovative future services.
Communication among cars and buildings was also a hot topic during parallel sessions. Siemens showed how to get data from a motorcycle, Škoda Auto introduced new functionalities of their smart e-vehicle, and many more exhibitors presented their solutions that could benefit from the Cross-CPP ecosystem.
The fair provided us with a great opportunity to get feedback from potential customers and partners. Thank you to everyone who stopped by and showed interest in our expertise!
Daniel Zeman, Siemens
The Cross-CPP Data Analytics Toolbox provides specialised functions for particular tasks that service providers identified as the most critical for their respective interests. On the other hand, there are also many analytics functions that cannot simply be performed by a pre-defined method and that are generally applicable to a large set of problems. Such functions need to combine various analytics techniques, to test and evaluate alternative methods, and to keep the freedom to select the best method for each particular case. The need to examine a large pool of potential methods available in a toolbox is also motivated by the extreme pace of research and development in the advanced analysis of big data sets.
The most successful methods for particular problems are often based on machine learning (ML) concepts that build complex analytics models from available data (annotated or plain). Data scientists do not usually implement ML-based solutions from scratch. They employ sophisticated libraries and packages to initialize and construct the underlying ML structures and to train ML models. Python is used as the primary language to define the data analytics task and to build and interact with advanced ML systems. (Even though the core training functionality can be implemented in a system language such as C, there are often Python bindings or specialised interfaces taking advantage of the elegant syntax and expressive power of the high-level dynamic language.) For example, the Python SciKit-Learn package is one of the most popular frameworks, implementing a wide range of traditional ML methods.
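As a minimal illustration of this workflow (assuming scikit-learn is installed; the synthetic dataset is purely for demonstration, not project data), a traditional ML model can be trained and evaluated in a few lines:

```python
# Sketch: train and evaluate a classic ML model with scikit-learn.
# The synthetic dataset stands in for any tabular sensor data.
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=200, n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = SGDClassifier(max_iter=1000, random_state=0)
clf.fit(X_train, y_train)             # build the model from annotated data
accuracy = clf.score(X_test, y_test)  # apply it to unseen data
```

The same few-line pattern (construct, fit, score) applies across most estimators in the package, which is exactly what makes such libraries attractive for rapid experimentation.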
As in many other disciplines, the field of analytics over very large data sets has recently become dominated by deep learning methods. Large companies such as Google, Facebook, Microsoft, or Baidu, as well as various research teams, have made available sophisticated frameworks (TensorFlow, Torch, etc.) that facilitate defining, training, and experimenting with various neural network architectures and settings. The training phase can then be accelerated using highly optimised implementations of linear algebra operations on modern GPUs. Some algorithms also benefit from the distributed nature of computation nodes in today’s data centres. Consequently, the biggest cloud providers such as Amazon, Microsoft, or Google offer special subscriptions for running ML tasks (using the above-mentioned frameworks) in their computing facilities.
To support these modern trends, the Cross-CPP Data Analytics Toolbox defines a generic interface that can potentially access any function available in the connected frameworks or libraries. To demonstrate the functionality, our realisation interfaces three existing ML and data analytics libraries within the Cross-CPP project (SciKit-Learn, TensorFlow, PyTorch) and specifies the protocols necessary to interlink other popular solutions. Furthermore, the final project demonstrators will show the applicability and scalability of the implemented solution by means of selected analytics functions operating on the data collected within the project (from cars, buildings, weather forecasts, etc.).
The functionality of the ML model connector needs to distinguish between the phase of building an analytical model (for example, training a deep neural network) and the application of the model to new data. This is also reflected in the functional schema of the initial steps of the analytics model preparation, shown in the following figure.
The scheme demonstrates the process of model initialization (training) from the provided data. The service invocation needs to fully specify the method to be used (for example, the implementation of the Stochastic Gradient Descent algorithm from the SciKit-Learn package), a set of parameters necessary to start the process (for example, the structure, the batch size, and the learning rate for a feed-forward neural network), and the way the training data can be obtained (for example, a URI corresponding to a query run in the Big Data Marketplace, joined with additional metadata).
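Purely for illustration, such an invocation could be expressed as a payload like the following sketch; the field names and URI are hypothetical, not the actual Cross-CPP API:

```python
# Hypothetical training-invocation payload: method, parameters, and a
# pointer to the training data. All field names are illustrative only.
import json

training_request = {
    "method": "sklearn.linear_model.SGDClassifier",   # fully specified method
    "parameters": {                                   # method-specific settings
        "batch_size": 32,
        "learning_rate": 0.01,
    },
    "training_data": {
        "uri": "https://marketplace.example/query/1234",  # hypothetical URI
        "metadata": {"label_column": "target"},
    },
}
payload = json.dumps(training_request)
```

Keeping the three concerns (method, parameters, data source) in separate fields is what lets one generic interface drive very different back-end libraries.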
The ‘model ready’ response to the status check indicates that the model can be used for the analysis of new data. The invocation specifies the model ID and a URI of the data to be processed. The caller has to take care that the format of the sent data corresponds to the data used in the model training phase. The analysis result matches the way expected results were provided in the training data. The module also implements additional functionality enabling future updates of existing models (for example, further training of a neural network on newly acquired data), as well as simple model management tasks (removal of previous models that are no longer needed, stopping/killing the training process in the case of an identified issue in the data, etc.).
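A client-side sketch of this lifecycle might look as follows; the status strings, method names, and in-memory store are illustrative assumptions, not the real Cross-CPP protocol:

```python
# Hypothetical connector lifecycle: train, check status, apply, remove.
# Everything here is a simplified stand-in for asynchronous service calls.
class ModelConnector:
    def __init__(self):
        self._models = {}

    def train(self, model_id):
        # in reality this would start an asynchronous training job
        self._models[model_id] = "model ready"

    def status(self, model_id):
        return self._models.get(model_id, "unknown")

    def apply(self, model_id, data_uri):
        # the new data must match the format used during training
        if self.status(model_id) != "model ready":
            raise RuntimeError("model not ready")
        return {"model": model_id, "input": data_uri}

    def remove(self, model_id):
        # model management: drop models that are no longer needed
        self._models.pop(model_id, None)

connector = ModelConnector()
connector.train("model-1")
result = connector.apply("model-1", "https://marketplace.example/query/42")
connector.remove("model-1")
```

The caller only ever sees model IDs and URIs; which library actually trained the model stays hidden behind the connector.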
Last time, we introduced you to the fundamental building blocks of a weather model and demonstrated how important it is to have a high-resolution, fine-grained weather model in order to properly represent the underlying topography.
Another important aspect is that the quality of a weather model depends heavily on what is often referred to as “ground truth”.
Ground truth describes actual measurements and observations from weather stations that are used to initialize the weather model, so it “knows” where to start from. Besides ground weather stations, other measurement methods such as weather balloon soundings, radars, or satellites are used to complement the initialization data set.
The more measured data the weather model has at its point of initialization, the better the forecast will be, as there is less inaccuracy to begin with. This is especially true for the first forecast hours of very high-resolution models, because they are able to reflect small-scale features. If the starting point of the model is already a poor guess, any forecast resulting from it will likely be even worse than that.
The density of weather observations across the world varies greatly: some countries have hundreds of weather stations, while others contain regions that are almost blank spots in terms of measured meteorological parameters.
Cross-CPP can help with that, as it aims to provide a platform where service providers like us can buy weather data from sensors that originate in cars, buildings, or other technology. These sensors need different handling than regular weather station data and must undergo a special plausibility check that we are developing within the project. However, because of their density, their data will still help with the model initialization process, especially in regions and areas where “ground truth” is currently rare.
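As an illustration only (the plausibility check we are developing in the project is more elaborate), such a check could compare each crowd-sourced reading against nearby measurements:

```python
# Illustrative plausibility filter for crowd-sourced temperature readings.
# The threshold and the median comparison are assumptions for this sketch,
# not the project's actual check.
def plausible_temperature(nearby_readings, value, max_deviation=10.0):
    """Accept a car-derived temperature only if it stays within
    max_deviation degrees of the median of nearby readings."""
    if not nearby_readings:
        return True  # nothing to compare against, accept provisionally
    ordered = sorted(nearby_readings)
    median = ordered[len(ordered) // 2]
    return abs(value - median) <= max_deviation

nearby = [14.2, 14.8, 15.1, 13.9]
ok = plausible_temperature(nearby, 15.5)          # consistent with neighbours
suspicious = plausible_temperature(nearby, 38.0)  # e.g. sensor heated by engine
```

Only readings that pass such a filter would then be allowed to contribute to the model initialization.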
Individualized weather forecasts
Another benefit of access to different data sources is that we are able to enhance our services for the data providers themselves. For example, imagine building owners who want to automate the operation of window blinds or optimize energy usage (heating, cooling, etc.): these owners would benefit greatly from a tailored weather forecast for their buildings that takes the special local meteorological characteristics into account and adjusts for them. Smart buildings usually have a weather station located on the roof, whose data we can use to refine our forecast for that specific building once we have collected at least about a year of measurements from this station.
Weather sensors and weather data can be derived from various sources and put to use in a variety of ways. In our next blog, we will show you some other products we are working on, such as weather-based navigation and a precipitation map derived from car sensors!
Thanks for reading and stay with us 🙂
Your Meteologix Team and Cross-CPP consortium partners
We all have experience of online shopping on Amazon, and probably of (binge) watching videos on YouTube or Netflix. However, we hardly think about the suggestions that these kinds of online platforms create for us, which are completely or partially tailor-made for us.
In Data Science and AI, we can usually distinguish between two types of recommendation systems: collaborative and content-based. Collaborative systems predict what individual users might like based on data gathered from other users about what they have liked, while content-based systems predict based on what the individual user has liked in the past. Both approaches have their pros and cons, which is why we came up with a context-related recommendation system. For example, it would be great to have a music recommendation system consider places of interest based on location and time, the activity the user is performing, and, at the extreme, even the mood of the listener. This is where the value of knowledge modelling, or context modelling, comes in. In Cross-CPP we have developed an ontology-based algorithm that makes suggestions about data that can be relevant for our data customers when they are out looking for CPP data.
How does this work?
The first step was to develop an ontology detailing the relationships among CPP sensors and signals, which models the environment of the entire CPP system. We took the simple concept of binding sensor signals with relationships and dependencies and made an informative context model that holds all the relationship information about the sensor signals hosted by the Cross-CPP Big Data Marketplace. The second step involved using the ontology to develop an algorithm that receives the data customer’s selection of signals and uses the ontology to suggest what further signals could also be selected.
The figure below shows part of the vehicle-specific ontology model, which contains sensor measurement values and static information, or, as we call it, basic CPP information gathered from vehicles. The relations we introduced primarily indicate the relationships between different sensor measurement values, which is of high importance for generating recommendations for the data customer. As you might see, two very frequently used relationships stand out: the “affectedBy” relationship, which indicates that a specific sensor value may be influenced by another sensor value (e.g., the indoor temperature may be influenced by the air conditioning mode). On the other side, we use the “relatedWith” relationship to indicate that there is a correlation between two sensor values (e.g., there is a direct correlation between engine RPM and engine coolant temperature).
Figure 1: Vehicle-specific context model
Based on the user’s current interest, we then query this model to determine what sensor signals should be recommended to him/her and which can additionally be chosen and added to the shopping basket. A similar model creation process is under construction for sensor signals collected by smart buildings.
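The recommendation step can be sketched in a few lines; the signal names and the relationship graph below are illustrative stand-ins for the real ontology, not its actual content:

```python
# Minimal sketch of ontology-driven recommendation using the two
# relationships described above. The graph is an illustrative example.
RELATIONS = {
    "indoor_temperature": {"affectedBy": ["ac_mode"], "relatedWith": []},
    "engine_rpm": {"affectedBy": [], "relatedWith": ["engine_coolant_temperature"]},
    "ac_mode": {"affectedBy": [], "relatedWith": ["indoor_temperature"]},
}

def recommend(selected_signals):
    """Suggest related signals that the data customer has not yet selected."""
    suggestions = set()
    for signal in selected_signals:
        for relation in ("affectedBy", "relatedWith"):
            suggestions.update(RELATIONS.get(signal, {}).get(relation, []))
    return sorted(suggestions - set(selected_signals))

chosen = ["engine_rpm", "indoor_temperature"]
extra = recommend(chosen)  # signals linked to the selection via the ontology
```

The real algorithm queries the ontology rather than a hard-coded dictionary, but the principle, following "affectedBy" and "relatedWith" links outward from the current selection, is the same.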
In the next context related blog, we will give a more detailed look on the context monitoring and extraction aspects of our Context Analysis journey, so stay tuned!
Cross-CPP Early Prototype results were presented and discussed at the AI Expo Korea 2019 in Seoul. The Brno University of Technology shared its booth with the Czech government agencies CzechInvest and CzechTrade at the AI Expo, July 17-19, 2019. Several visitors from large Korean companies (including those doing business in the automotive and smart building sectors) were interested in the preliminary results of the big data analytics and will follow the project's future development.
The Siemens service scenario is related to e-mobility, which is of great business interest today. Our desired service builds on data exchange between vehicles and buildings. It provides end-users (vehicle drivers) with the most suitable e-chargers according to their needs and the buildings' capabilities.
The idea of this service is to send simple information about the presence of a charging station inside a building or outside (public parking lots, airports, hospitals) to the vehicle.
Real-time communication between the car and the building about the occupancy of e-chargers would allow the vehicle to send out its own information about its battery capacity. Together with its current position and speed, this could be used to calculate the time of arrival and to reserve an e-charger for this specific car. Within the scope of the project, such a reservation will be done manually by the vehicle driver via the application.
Besides the actual connection to the socket and the battery status of the car, the energy performance of the building is also considered. It is not possible to charge vehicles without limit while ignoring the building's own power consumption, expected power load, or input power.
The energy output has to be distributed in order to sufficiently satisfy multiple users. We have to avoid the situation where, on the one hand, a customer fully charges the vehicle and then blocks the socket for other users, while, on the other hand, there is not enough energy to charge the rest of the customers. Energy has to be divided accordingly and the energy flow controlled systematically.
Putting together vehicle information and requirements, energy will be distributed based on a scheme chosen by the service provider (first come, first served, vs. fair distribution to those in need, where 2% battery left takes precedence over 80% charged, etc.).
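As a rough sketch of the "fair distribution" scheme (the function name, power figures, and per-socket limit are assumptions for this example, not the actual service logic):

```python
# Illustrative 'fair distribution': available power goes first to the
# vehicles with the lowest battery level. All numbers are example values.
def allocate_power(vehicles, total_kw, per_vehicle_kw=11.0):
    """vehicles: list of (vehicle_id, battery_percent) tuples.
    Returns a dict mapping vehicle_id to allocated charging power in kW."""
    allocation = {}
    remaining = total_kw
    for vid, battery in sorted(vehicles, key=lambda v: v[1]):  # neediest first
        share = min(per_vehicle_kw, remaining)
        if share <= 0:
            break
        allocation[vid] = share
        remaining -= share
    return allocation

fleet = [("car_a", 80), ("car_b", 2), ("car_c", 45)]
plan = allocate_power(fleet, total_kw=20.0)
# car_b (2 % battery) is served first; car_a may get nothing this round
```

A "first come, first served" scheme would simply sort by arrival time instead of battery level, so switching schemes amounts to swapping the sort key.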
An indication that charging has finished will be available in the marketplace, coming from the vehicle based on its movement, and there will also be information from the building that the e-charger has been unplugged.
After evaluation of these conditions, the e-charger will again be marked as available for the next customer.
The future of mobility is our biggest passion. We are determined to make it come alive for you and, more importantly, usable! The Cross-CPP project will make numerous new applications and services conceivable!
For example, could you imagine that your vehicle contributes to predicting the weather for others? When you are on the road with your Volkswagen, your vehicle provides various information: you are informed about the outside temperature, and a rain sensor detects the intensity of rainfall so that your wipers adapt accordingly. Even the electronic stabilization program (ESP) might be activated automatically. This kind of information surely comes in handy for you personally, but it is even more useful when combined with other vehicle data!
Based on such data, applications can create weather reports for your planned route ahead. And in case the transmitted data of others indicates a weather risk, the system can suggest alternatives to your previously planned route. A real milestone on the way to autonomous driving!
Sounds convenient? Applications resulting from the EU-funded Cross-CPP project could look quite similar: the project links data from various industry representatives. Besides Volkswagen, companies like Siemens participate in the funded project, providing information that, for example, concerns building automation or charging infrastructure.
The graphic above illustrates the corresponding process:
- Product manufacturers and additional partners generate and transmit data.
- The data is harmonized into the common CPP format.
- The data is stored and made available in the cloud.
- Depending on the desired service, the data is made accessible and usable.
- In this way, new applications and services develop, benefiting all of us.
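The steps above can be sketched end to end; the fields of the "CPP format" shown here are illustrative assumptions, not the project's actual data model:

```python
# Illustrative end-to-end sketch: harmonize source-specific records into a
# common format and store them for later service access. Field names are
# assumptions for this example only.
def harmonize(raw_record, source):
    """Map a manufacturer-specific record into a common CPP-style record."""
    return {
        "source": source,
        "signal": raw_record["name"],
        "value": raw_record["val"],
        "unit": raw_record.get("unit", "unknown"),
    }

cloud_store = []  # stands in for the cloud storage layer

def ingest(raw_record, source):
    cloud_store.append(harmonize(raw_record, source))

ingest({"name": "outside_temperature", "val": 14.5, "unit": "°C"}, "vehicle")
ingest({"name": "room_temperature", "val": 21.0, "unit": "°C"}, "building")
# a service can now read harmonized records from the shared store
```

The key point is the shared record shape: once vehicle and building data land in the same structure, one service can consume both without source-specific code.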
The usage of data by different cooperation partners and the standardized data format offer many possibilities, ranging from “helpful” to “entertaining”. Locating charging infrastructure for electric vehicles and notifications about remaining charging times are only a few examples of further possible applications.
This project meeting took place in Madrid on the premises of UPM (Universidad Politécnica de Madrid), June 11-12, 2019. The focus of this meeting was to resolve the last open questions regarding the early prototypes' software development, the testing and assessment of these prototypes, and the discussion of the demonstration scenarios of the Ecosystem modules and cross-sectorial services.
The collection and integration of data coming from different sources will probably be one of the key elements of many future markets and services, and is indeed the vision buttressing all efforts being made in Cross-CPP. Still, as any analyst will tell you, having data is only a necessary condition, not a sufficient one: what is also required are the capabilities to analyse and manipulate those data. This realisation was the seed behind the introduction of the CPP Data Analytics Toolbox, a suite of modules designed to simplify the analysis of data, ranging from basic statistical functions to complex predictive models.
Yet, could the ambition be to provide a toolbox able to solve any foreseen (and yet-to-be-foreseen) analytics task, over data whose nature will evolve and change, while satisfying the needs of services yet to be specified? This is clearly beyond the reach of any 3-year research project. Furthermore, some service providers will prefer to resort to their in-house algorithms and models, especially when these are part of their core business: to illustrate, a weather forecast company would not rely on external models to predict tomorrow's rain. Instead, the project decided to follow a different strategy: provide basic yet comprehensive tools that allow service providers to rapidly develop prototypes and test ideas.
The Data Analytics Toolbox is based on a modular structure, with different components offering different types of analysis; yet all of them share the same way of communicating with the user, and of retrieving data from and returning results to the system. Here we start reviewing these modules, focusing on two of them: trajectory analysis and network analysis.
Trajectory Analysis Component. The concept of “trajectory analysis” is a very general one, encompassing many different analyses on data representing a spatio-temporal evolution. With the exception of buildings, all CPPs composing the Cross-CPP system are expected to move at some point in their lives. With these concepts in mind, this component aims at providing a set of basic tools to simplify the handling and manipulation of this mathematical object. On the one hand, this includes a set of functions to analyse trajectories individually, i.e. without considering their interconnections. On the other hand, a second level deals with the analysis of multiple trajectories by taking into account the relationships between them, for instance to detect groups of similar trajectories or the presence of causal relationships between them.
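The two levels can be illustrated with simple 2D point sequences and a naive point-wise distance; these are stand-ins for demonstration, not the toolbox's actual functions:

```python
# Illustrative sketch of individual vs. pairwise trajectory analysis.
# Trajectories are time-ordered (x, y) points; the measures are toy examples.
import math

def path_length(traj):
    """Individual analysis: total distance travelled along one trajectory."""
    return sum(math.dist(a, b) for a, b in zip(traj, traj[1:]))

def mean_pointwise_distance(t1, t2):
    """Pairwise analysis: average distance between time-aligned points,
    usable as a crude similarity score between two trajectories."""
    return sum(math.dist(a, b) for a, b in zip(t1, t2)) / min(len(t1), len(t2))

t1 = [(0, 0), (1, 0), (2, 0)]
t2 = [(0, 1), (1, 1), (2, 1)]
length = path_length(t1)                    # 2.0: two unit steps
similarity = mean_pointwise_distance(t1, t2)  # 1.0: the paths run in parallel
```

Grouping similar trajectories then reduces to clustering on such pairwise scores, which is the kind of higher-level function the component's second level targets.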
Network Analysis Component. Sensors in the Cross-CPP ecosystem are organised in complex interaction structures. These structures may be physical: for instance, sensors in a car can be connected through the CAN bus and can therefore directly share information. Yet such structures can also be functional, i.e. the result of the fact that sensors are embedded in a common context. To illustrate, two temperature sensors in two different cars can yield the same (or very similar) time series, provided the two cars travel along similar paths. From a mathematical point of view, such connectivity networks can be analysed by means of complex network theory, a statistical-physics take on classical graph theory. Complex networks have been used, for instance, to assess and reduce the vulnerability of the resulting communication patterns, or to optimise the spread of new information in the system. This component provides several functions to both manage and analyse networks, such as the extraction of metrics or the identification of groups of strongly connected objects.
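A minimal sketch of such metric extraction and group identification, using an illustrative toy sensor network (the node names and edges are made up for this example):

```python
# Toy network analysis: degree extraction and connected components over a
# hypothetical functional sensor network. Pure-Python, union-find based.
def degrees(edges):
    """Metric extraction: number of direct connections per sensor."""
    deg = {}
    for a, b in edges:
        deg[a] = deg.get(a, 0) + 1
        deg[b] = deg.get(b, 0) + 1
    return deg

def connected_components(nodes, edges):
    """Identify groups of (transitively) connected sensors."""
    parent = {n: n for n in nodes}
    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n
    for a, b in edges:
        parent[find(a)] = find(b)
    groups = {}
    for n in nodes:
        groups.setdefault(find(n), set()).add(n)
    return list(groups.values())

nodes = ["temp_car1", "temp_car2", "rain_car1", "hvac_b1"]
edges = [("temp_car1", "temp_car2"), ("temp_car1", "rain_car1")]
components = connected_components(nodes, edges)
# the three car sensors form one group; hvac_b1 stands alone
```

Real analyses would run richer metrics (centrality, vulnerability) over much larger graphs, but the interface, nodes plus edges in, metrics or groups out, stays the same.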
The project Cross-CPP deals with cross-sectorial Cyber Physical Products – CPPs in short – such as vehicles and smart buildings. CPPs can have many sensors that are collecting information about the CPP environment and their use.
The project offers a big data marketplace as a “One-Stop-Shop” to data customers who want to tap into the enormous opportunity that arises from collecting data from various cross-sectorial CPPs. But is it enough just to collect data from CPPs and use it for different applications? Can we be sure that the data coming from a CPP is not influenced by other factors, such as the weather, the geographical location, or simply the colour of a car?
Have you ever heard about the word context? According to the Oxford dictionary, context is “the circumstances that form the setting for an event, statement, or idea, and in terms of which it can be fully understood.” In the artificial intelligence domain, the concept of context is usually defined as the “generalization of a collection of assumptions”. For Cross-CPP, “context can be a set of information which characterizes the situation under which sensor data are obtained (e.g. the situation under which the data from a temperature sensor in a car is obtained)”. Sounds difficult, doesn't it? Well, let’s take a simple example to understand what context means for a vehicle. Did you know that in modern vehicles, mobile sensor networks can produce over 4,000 signals per second per vehicle? This is a huge amount of data, isn't it? Now imagine this raw sensor data coming with additional information, such as the circumstances under which the data has been collected, or the factors that can influence the sensor measurements observed from vehicles. Such answers can be provided by context. Context information is additional information that data customers get when they request data collections from the Big Data Marketplace. Still not quite clear?
Let’s say we have a black car equipped with an exterior temperature sensor: wouldn’t it be great if we could retrieve data from this temperature sensor to provide it to a data customer who might build a new service making use of this data? We also know that many factors influence the value measured by the sensor: this could be the colour of the car (black), the current location of the car (e.g., its altitude), the height at which the sensor is installed in the car, the time of day or year, and many other factors. All this information, which is either the car's metadata or can be measured with other sensors, we will from now on call enhanced monitored data. Furthermore, we can deduce certain situations for the temperature value based on this enhanced monitored data, which defines the context of this car. One such situation is that a temperature value measured by a black car, with the sensor located 20 cm above ground level, while standing at mid-day on a summer day in the south of France, is not very reliable.
We hope the above example is clear enough to understand the concept of context in the Cross-CPP project. We would also like to use context not only on the data collection side, but also for the security aspects of the Cross-CPP modules and the usage of services, to provide the CPP user/owner with flexible (context-based) protection for his or her CPP information.
For the Cross-CPP modules, context will be extracted as specified by the data customer's request or as needed by internal modules such as the Cross-CPP security module. And as we learned above, in order to extract context, the extractor uses enhanced monitored data (a combination of metadata for a particular CPP and raw sensor data) together with rules and defined context models.
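A simple rule-based sketch of this extraction step, built around the black-car example above (the field names, rules, and thresholds are assumptions for this example only, not the extractor's real model):

```python
# Illustrative context extraction: combine metadata and raw sensor data
# (enhanced monitored data) with simple rules to produce a context annotation.
def extract_context(record):
    """Annotate a reading with a reliability judgement and its reasons."""
    context = {"reliable": True, "reasons": []}
    if record.get("car_colour") == "black" and record.get("parked_in_sun"):
        context["reliable"] = False
        context["reasons"].append("black car standing in direct sunlight")
    if record.get("sensor_height_cm", 100) < 30:
        context["reliable"] = False
        context["reasons"].append("sensor close to hot road surface")
    return context

reading = {
    "signal": "exterior_temperature",
    "value": 41.5,
    "car_colour": "black",
    "parked_in_sun": True,
    "sensor_height_cm": 20,
}
annotation = extract_context(reading)  # flags the measurement as unreliable
```

A data customer receiving this annotation alongside the raw value can then decide to down-weight or discard the reading, which is exactly the kind of informed decision context is meant to enable.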
In case you are wondering how all of this is going to be realised step by step within the scope of the project, we are offering several blogs on the context topic and will make sure you get enough insights to work with context! In the following blogs, we will explain how data customers can work with enhanced monitored and contextual data, how a context tool can extract context data, and how it can help the data customer make informed decisions. Furthermore, we will explain context-based security for the Cross-CPP modules, where we will learn how context can help improve security for CPP owners. Last but not least, we will also provide insights into context-related tools that give service providers like Meteologix a toolkit to improve their innovative services. All of these interesting topics will be provided as a series of subsequent blogs, so …
Stay tuned 🙂
Your ATB Team and Cross-CPP consortium partners