What is the Sandbox?
One of the COVID-X programme’s main results is the Sandbox. But what exactly is the Sandbox? It is certainly not a box made of sand.
The official definition of the Sandbox is:
A data-centric ecosystem of core enabling services that collectively:
- Enables seamless, unified and interoperable access to annotated and catalogued health and other data from heterogeneous sources: both historical data from health care providers or open data (where needed);
- Provides data/metadata management, harmonization, storage, querying and retrieval, access, visualization and federated learning services;
- Coupled with advanced security services enabling RBAC, secure encrypted communications; and
- an API Gateway for both data and software tools integration from third parties.
Within COVID-X, three technical partners (Netcompany-Intrasoft, Universidad Politécnica de Madrid and Eight Bells) helped the Game Changers understand the Sandbox and how it ingests and handles their data.
How does it work?
Let’s start with some basics:
- RBAC: Role-based access control. It allows the creation of different roles with different privileges, so that data is stored safely and accessed only by users whose role permits it. (Security)
- ELASTICSEARCH: free, open, and distributed search and analytics engine for all data types, including textual, numeric, geospatial, structured, and unstructured. (Database)
- LOGSTASH: lightweight, open-source, server-side data processing pipeline that collects data from various sources, transforms it on the fly, and sends it to the desired destination. (Harmonization)
- KIBANA: source-available data visualization dashboard software for Elasticsearch. Its free and open-source fork in the OpenSearch project is OpenSearch Dashboards. (Visualization)
- ELK STACK: the combination of Elasticsearch, Logstash and Kibana. This solution reliably and securely takes data from any source and format, then stores, searches, analyzes, and visualizes it in real time.
This diagram shows the components:
The actors have different profiles and roles: data providers, data processors and the third-party software itself. RBAC allows different roles to be created: each role has specific, well-defined privileges and each user is assigned to a specific role, so each user has specific privileges when accessing the Sandbox.
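The role model described above can be illustrated with a minimal sketch. It is not the Sandbox's actual implementation: the role names, privilege names and the `is_allowed` helper are all hypothetical, chosen only to show how a user's role determines what that user may do.

```python
from dataclasses import dataclass

# Hypothetical roles and privileges, for illustration only.
ROLE_PRIVILEGES = {
    "data_provider": {"ingest", "read_own"},
    "data_processor": {"read_harmonized", "visualize"},
    "third_party_software": {"read_subset"},
}


@dataclass(frozen=True)
class User:
    name: str
    role: str  # each user is assigned to exactly one role


def is_allowed(user: User, privilege: str) -> bool:
    """Return True if the user's role grants the requested privilege."""
    # Unknown roles get an empty privilege set, so access is denied by default.
    return privilege in ROLE_PRIVILEGES.get(user.role, set())


hospital = User("hospital_a", "data_provider")
algorithm = User("risk_model", "third_party_software")

print(is_allowed(hospital, "ingest"))   # True
print(is_allowed(algorithm, "ingest"))  # False
```

Privileges are attached to roles rather than to individual users, so changing what a whole class of actors may do only requires editing one entry in the role table.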
Data can be ingested with a variety of security barriers.
The ingested data needs to be harmonized; hospitals can then consult it in raw form or through Kibana, which provides data visualization (for example, hospitals can check the information graphically).
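To give an idea of what harmonization means in practice, here is a small sketch in plain Python. In the Sandbox this step is handled by Logstash; the code below only mimics the idea. The source names, field names and the Fahrenheit-to-Celsius conversion are assumptions made up for the example.

```python
# Hypothetical per-source mappings from local field names to a common schema.
FIELD_MAP = {
    "hospital_a": {"pid": "patient_id", "temp_c": "temperature_c"},
    "hospital_b": {"PatientID": "patient_id", "temp_f": "temperature_f"},
}


def harmonize(source: str, record: dict) -> dict:
    """Rename fields to the common schema and normalize units."""
    mapping = FIELD_MAP[source]
    # Keep only the fields that the mapping knows about, under their new names.
    out = {mapping[key]: value for key, value in record.items() if key in mapping}
    # Normalize temperatures to Celsius so all sources are comparable.
    if "temperature_f" in out:
        out["temperature_c"] = round((out.pop("temperature_f") - 32) * 5 / 9, 1)
    out["source"] = source
    return out


record_a = harmonize("hospital_a", {"pid": "a-1", "temp_c": 37.2})
record_b = harmonize("hospital_b", {"PatientID": "b-7", "temp_f": 100.4})
print(record_a)
print(record_b)
```

After this step both records share the same field names and units, which is what makes it possible to query and visualize them together.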
Then there is the third-party software integrated by the Solution Providers. It is an actor with its own identity and can access only some parts of the data in order to implement the desired functionality. An important use case is training a model remotely (without any human interaction with the stored data) by allowing the algorithm to explore a specific subset of the records stored within the Sandbox.
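The remote-training use case can be pictured with the following sketch. It is a simplified illustration under stated assumptions, not the Sandbox's real service: the records, the `cohort` filter and the `train_in_sandbox` function are hypothetical, and a one-variable least-squares fit stands in for whatever model a Solution Provider would actually train. The point is that the caller receives only the fitted parameters and never the records themselves.

```python
from statistics import mean

# Hypothetical harmonized records that live inside the sandbox.
_SANDBOX_RECORDS = [
    {"cohort": "covid", "age": 40, "days_in_hospital": 5},
    {"cohort": "covid", "age": 60, "days_in_hospital": 9},
    {"cohort": "covid", "age": 80, "days_in_hospital": 13},
    {"cohort": "other", "age": 50, "days_in_hospital": 2},
]


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    variance = sum((x - x_bar) ** 2 for x in xs)
    slope = covariance / variance
    return {"slope": slope, "intercept": y_bar - slope * x_bar}


def train_in_sandbox(cohort: str) -> dict:
    """Train next to the data and return only the model parameters."""
    # The algorithm sees only the subset it is allowed to explore.
    subset = [r for r in _SANDBOX_RECORDS if r["cohort"] == cohort]
    ages = [r["age"] for r in subset]
    stays = [r["days_in_hospital"] for r in subset]
    return fit_line(ages, stays)


model = train_in_sandbox("covid")
print(model)
```

Only the `slope` and `intercept` values leave the function, so the party requesting the training gets a usable model without ever reading an individual patient record.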
Jenkins, an automation server, acts as the gate through which third-party software is introduced into the Sandbox.
Why is the Sandbox important?
Meaningful clinical information allows clinicians and staff to make informed decisions that improve the quality of care. All European healthcare systems collect and store clinical information. Yet most of it sits in data silos, in unstructured formats, and is usually not easily accessible to interested communities. As a result, medical data goes largely unused in the delivery of care, even though it often holds essential information and insights that could contribute to better care for all patients.
COVID-X envisions the COVID-X Sandbox and its data-driven services as the core enabling platform for aggregating, curating, structuring, cataloguing and providing seamless, interoperable access to anonymized health data from the retrospective data sets of Clinical partners and, when needed, to data from open data sources and to streaming data from connected medical devices and apps.
These services complement security, visualization and data analytics/federated validation services.
The purpose of the COVID-X Sandbox is to provide core services to Clinical partners/data providers to ingest data into the Sandbox and to Solution Providers to integrate their solutions with the Sandbox to consume and manage data.
Thanks to the COVID-X Sandbox, hospitals save time and effort when handling patient data and predicting a patient’s likelihood of relapse or disease development.
To learn more about the Sandbox, please see these documents: First Sandbox implementation, Sandbox Design and Datalake Creation & Ingestion, Sandbox services descriptions and Frequently Asked Questions.