PAR Solution Searcher

What is research data?

We follow the same definition for research data as NWO, adapted from their Data Management Plan template (September 2020):

Research data are the evidence that underpin the answer to research questions, and can be used to validate findings. Data can be quantitative information or qualitative statements collected by researchers in the course of their work by experimentation, observation, modelling, interview or other methods, or information derived from existing evidence. This definition of research data does not include software (algorithms, scripts and code developed by researchers in the course of their work) and physical objects such as scientific and archaeological collections, physical arts works or biobanks; however, digital information extracted from such objects are to be regarded as research data.

What is the risk classification / confidentiality level of my research data?

The present risk classification is provisional and based on the following categorization according to the confidentiality level of the data (non-exclusive):

Low

Published research data;
Data intended for public disclosure.

Middle

Unpublished research data;
Data with access restrictions;
Small amount of non-special category personal data.

High

Medical research data;
Research with physiological and psychological data;
Data on vulnerable groups (e.g. people with disabilities, low-income groups);
Special category of (sensitive) personal data;
Data that is subject to a Non-Disclosure Agreement;
Research data that requires a Data Protection Impact Assessment (DPIA)*.

* A DPIA is required whenever data processing is likely to result in a high risk to the rights and freedoms of individuals.
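
The categorization above can be read as a lookup in which the most sensitive matching category decides the level. The sketch below illustrates that reading only: the tag names, the `classify` function and the "highest level wins" rule are assumptions made for this example, not an official classification tool.

```python
# Illustrative sketch only: tag names are hypothetical labels for the
# categories listed above, and "highest level wins" is an assumption.
LOW, MIDDLE, HIGH = "Low", "Middle", "High"
_ORDER = [LOW, MIDDLE, HIGH]

RISK_BY_TAG = {
    "published": LOW,
    "public_disclosure": LOW,
    "unpublished": MIDDLE,
    "access_restricted": MIDDLE,
    "personal_non_special_small": MIDDLE,
    "medical": HIGH,
    "physiological_or_psychological": HIGH,
    "vulnerable_group": HIGH,
    "special_category_personal": HIGH,
    "nda": HIGH,
    "dpia_required": HIGH,
}


def classify(tags):
    """Return the highest risk level among the given tags."""
    tags = list(tags)
    if not tags:
        raise ValueError("at least one tag is required")
    # Unknown tags raise KeyError on purpose: better to fail than to guess.
    levels = [RISK_BY_TAG[tag] for tag in tags]
    return max(levels, key=_ORDER.index)


print(classify(["unpublished", "nda"]))  # High
```

Because the classification is provisional, treat the result of such a lookup as a starting point for discussion rather than a final answer.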

What are the main specifications of the storage solutions?

The storage characteristics presented are:

Communication protocol

Sometimes called a "client/server protocol". It provides the commands for opening, reading, writing and closing files across the network. Each protocol has its own performance (latency/data transfer) level.

Disaster recovery

A strategy to keep the service available after a significant disruptive event, for example by storing data at different locations. After a recovery, you may be working with a copy that is up to 24 hours old, so the most recent updates can be missing.

File versioning

The ability to access previous versions of files.

Server-side encryption

A basic security measure in which files are saved on disk in an encrypted state. It protects files against attackers who might gain direct access to the data servers (in datacenters). If an additional security layer is required (based on your data risk category), consider client-side encryption.

What does latency mean?

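Latency is the delay between an action and the response to it. A low latency system can complete more operations per unit of time, which usually results in a higher effective transfer rate, particularly when a workload consists of many separate operations. Consider low latency systems if: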

Data interactions demand high transaction or transfer rates;
Thousands of files are distributed amongst many (sub)folders;
Many individual files exceed 10 GB.
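
To get a feeling for how latency affects a workload with many files, the sketch below uses a deliberately simplified model: every file costs one latency delay, and the data itself moves at a fixed bandwidth. The function name and all numbers are assumptions chosen for illustration; real storage systems behave in more complex ways.

```python
# Simplified model (an assumption for illustration):
#   total time = number of files * latency + total bytes / bandwidth
def transfer_time_s(n_files, total_bytes, latency_s, bandwidth_bytes_per_s):
    """Estimate the time in seconds to transfer n_files totalling total_bytes."""
    return n_files * latency_s + total_bytes / bandwidth_bytes_per_s


GB = 10**9
BANDWIDTH = 100 * 10**6  # 100 MB/s, illustrative value

# The same 10 GB of data, stored as one file or as 100,000 small files.
one_file = transfer_time_s(1, 10 * GB, 0.020, BANDWIDTH)
many_files_slow = transfer_time_s(100_000, 10 * GB, 0.020, BANDWIDTH)
many_files_fast = transfer_time_s(100_000, 10 * GB, 0.001, BANDWIDTH)

print(f"1 file, 20 ms latency:        {one_file:.0f} s")
print(f"100,000 files, 20 ms latency: {many_files_slow:.0f} s")
print(f"100,000 files, 1 ms latency:  {many_files_fast:.0f} s")
```

In this model the single large file is barely affected by latency, while the data set split over many files spends most of its time waiting. That is why the number of files, and not only the total volume, matters when choosing a storage solution.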

What is considered immediate access to the compute resources?

Waiting time on the order of several minutes after submitting the job or a request to reserve the nodes.

What is the difference between interactive and batch jobs?

Interactive and batch jobs are defined as follows:

Interactive jobs

Interactive jobs enable users to interact with applications in real-time within a cluster. In these jobs, users initially request (reserve) compute nodes on the cluster and then utilize them to execute commands directly via the command line or graphical interface instead of running a predefined set of commands provided by a batch script.

For instance, you may be using software that involves an interactive element, such as Matlab, COMSOL or Mathematica, which have their own interactive interpreters and graphical layers. Interactive jobs can also be useful for tasks like testing, debugging, and troubleshooting code.

Batch jobs

Batch jobs allow you to run applications on compute nodes without supervision or interaction and are commonly used for computations that take a long time to complete (hours, days, or weeks). To run these jobs, users must provide a script containing all instructions needed to run the job in the background, so that they can log off and perform other tasks while the job runs uninterrupted.
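
What makes a program suitable for a batch job is that it needs no input while it runs. The sketch below shows one possible shape of such a program, not a prescribed one: all settings arrive as arguments and the result is written to a file. The workload, the option names and the file name `par_batch_demo.json` are hypothetical, and the scheduler-specific job script that would launch it is not shown because it depends on the cluster.

```python
import argparse
import json
import os
import tempfile


def run_workload(steps):
    """Stand-in computation (hypothetical): sum of squares below `steps`."""
    return sum(i * i for i in range(steps))


def main(argv):
    # All inputs come from arguments; nothing is read from the keyboard.
    parser = argparse.ArgumentParser(description="Unattended workload sketch")
    parser.add_argument("--steps", type=int, default=1000)
    parser.add_argument("--out", required=True)
    args = parser.parse_args(argv)

    result = run_workload(args.steps)

    # Results are written to a file so they remain available
    # after the user has logged off.
    with open(args.out, "w") as fh:
        json.dump({"steps": args.steps, "result": result}, fh)
    return args.out


# Example invocation with the arguments a batch script might pass.
out_path = os.path.join(tempfile.gettempdir(), "par_batch_demo.json")
main(["--steps", "10", "--out", out_path])
with open(out_path) as fh:
    print(json.load(fh))  # {'steps': 10, 'result': 285}
```

An interactive session, by contrast, suits the phase in which you are still exploring which settings to use or debugging the workload itself.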

What does multinode mean?

A type of parallelization in which a single job runs across several compute nodes at once. It is used for applications that can run with MPI (Message Passing Interface).
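
In an MPI application the work is divided among processes, called ranks, which may be placed on different nodes. The sketch below shows only that division step and does not use an MPI library: `block_range` is a hypothetical helper, and the rank and the number of ranks are passed in as plain arguments, whereas a real MPI program would obtain them from the MPI runtime.

```python
def block_range(n_items, n_ranks, rank):
    """Return the half-open range [start, stop) of items owned by `rank`."""
    if not 0 <= rank < n_ranks:
        raise ValueError("rank must be in [0, n_ranks)")
    base, extra = divmod(n_items, n_ranks)
    # The first `extra` ranks take one additional item each,
    # so the shares differ by at most one item.
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


# 10 work items divided among 4 ranks.
for rank in range(4):
    print(rank, block_range(10, 4, rank))
# 0 (0, 3)
# 1 (3, 6)
# 2 (6, 8)
# 3 (8, 10)
```

Each rank then processes only its own range, and the ranks exchange messages when they need each other's results. An application that cannot divide its work in this way will not benefit from a multinode reservation.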
