The volume of data generated by millions of connected IoT sensors and devices is growing exponentially. Extracting relevant information from this data in modern and future-generation computing systems requires efficient data handling and processing platforms. These platforms must migrate such big data between locations seamlessly and securely. They must also let the data be preprocessed and analyzed before it reaches its final destination.
Various data pipeline architectures have been proposed to help data administrators and users handle data migration efficiently. However, modern data pipeline architectures do not offer built-in functionality for ensuring data veracity, which covers data accuracy, trustworthiness, and security. Furthermore, processing intermediate data is becoming cumbersome, especially in serverless computing environments.
To fill this research gap, this paper introduces a novel and efficient data pipeline architecture named CCoDaMiC (Coherent Coordination of Data Migration and Computation). CCoDaMiC brings data migration and its computation together in one place. It also ensures that the data delivered to the next destination or pipeline block is accurate and secure. The proposed framework is implemented with Apache NiFi in a private OpenStack environment.
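The following sketch illustrates, in rough terms, what coupling migration with computation while checking accuracy on delivery can look like. Each pipeline block verifies a SHA-256 digest on the data it receives, runs a computation step, and re-packages the result with a fresh digest for the next block. This is an assumption for illustration only. The abstract does not say that CCoDaMiC uses checksums. The envelope format and the names `package`, `verify`, `pipeline_block`, and `drop_invalid` are all hypothetical.

```python
import hashlib
import json
from typing import Callable


def digest(payload: bytes) -> str:
    """Return the hex SHA-256 digest of a payload."""
    return hashlib.sha256(payload).hexdigest()


def package(records: list) -> dict:
    """Serialize records and attach their digest (hypothetical envelope format)."""
    payload = json.dumps(records, sort_keys=True).encode("utf-8")
    return {"payload": payload, "sha256": digest(payload)}


def verify(envelope: dict) -> list:
    """Raise if the payload no longer matches its digest; otherwise decode it."""
    if digest(envelope["payload"]) != envelope["sha256"]:
        raise ValueError("integrity check failed: payload was altered in transit")
    return json.loads(envelope["payload"])


def pipeline_block(envelope: dict, compute: Callable[[list], list]) -> dict:
    """Verify incoming data, run one computation step, and re-package the result.

    `compute` stands in for a processing function (for example, one deployed
    on a serverless platform); here it is an ordinary Python callable.
    """
    records = verify(envelope)
    return package(compute(records))


def drop_invalid(records: list) -> list:
    """Example computation: keep only readings whose value is numeric."""
    return [r for r in records if isinstance(r.get("value"), (int, float))]


if __name__ == "__main__":
    source = package([{"sensor": "s1", "value": 21.5}, {"sensor": "s2", "value": None}])
    delivered = pipeline_block(source, drop_invalid)
    print(verify(delivered))

    # Simulate tampering between blocks: the next verification must fail.
    tampered = dict(delivered, payload=delivered["payload"].replace(b"21.5", b"99.9"))
    try:
        verify(tampered)
    except ValueError as err:
        print(err)
```

Re-digesting after each computation step is deliberate. A digest taken at the source can no longer match once a block has legitimately transformed the data, so integrity is checked hop by hop.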
- CCoDaMiC is introduced, integrating data migration and computation operations.
- The CCoDaMiC framework ensures data accuracy, trustworthiness, and validation.
- Data management artifacts are aligned with an extended version of the TOSCA standard.
- A component that analyzes the user's TOSCA template is introduced.
- Serverless platforms are integrated into the data pipeline to process the data.
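The highlights mention a component that analyzes the user's TOSCA template. The sketch below shows what a basic structural check could look like, using standard TOSCA keys: `tosca_definitions_version`, `topology_template`, `node_templates`, `type`, and `requirements`. The paper's extended TOSCA types are not described here, so the `example.nodes.*` type names and the `analyze_template` function are hypothetical. In practice the template would be read from YAML, for instance with PyYAML's `yaml.safe_load`. Here it is a plain dict so the sketch needs only the standard library.

```python
SUPPORTED_VERSIONS = {"tosca_simple_yaml_1_2", "tosca_simple_yaml_1_3"}


def analyze_template(template: dict) -> list:
    """Return a list of problems found in a TOSCA-style template (empty if none)."""
    problems = []
    version = template.get("tosca_definitions_version")
    if version not in SUPPORTED_VERSIONS:
        problems.append(f"unsupported or missing tosca_definitions_version: {version!r}")

    nodes = template.get("topology_template", {}).get("node_templates", {})
    if not nodes:
        problems.append("topology_template has no node_templates")

    for name, node in nodes.items():
        if "type" not in node:
            problems.append(f"node template '{name}' has no type")
        # Requirements are a list of single-entry maps. The target is either a
        # node template name (short form) or a map with a 'node' key; this
        # sketch assumes it always names a node template in the same topology.
        for requirement in node.get("requirements", []):
            for req_name, target in requirement.items():
                target_name = target.get("node") if isinstance(target, dict) else target
                if target_name not in nodes:
                    problems.append(
                        f"node template '{name}': requirement '{req_name}' "
                        f"targets unknown node {target_name!r}"
                    )
    return problems


# Hypothetical three-block pipeline: source -> serverless step -> sink.
example = {
    "tosca_definitions_version": "tosca_simple_yaml_1_3",
    "topology_template": {
        "node_templates": {
            "sensor_source": {"type": "example.nodes.DataSource",
                              "properties": {"protocol": "mqtt"}},
            "cleanse": {"type": "example.nodes.ServerlessFunction",
                        "requirements": [{"input": "sensor_source"}]},
            "sink": {"type": "example.nodes.DataSink",
                     "requirements": [{"input": {"node": "cleanse"}},
                                      {"input": "archive"}]},
        }
    },
}

if __name__ == "__main__":
    for problem in analyze_template(example):
        print(problem)
```

Running it reports one problem: the `sink` block depends on an `archive` node that the template never defines. Catching this kind of error before deployment is the purpose of analyzing the template up front.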
You can access the paper here.