Identify and evaluate relevant data sources
When formulating the research question, it is advisable to start taking inventory of the data sources that are available and relevant to the research project. A well-defined question makes it easier to identify and evaluate relevant data sources.
Most data holders publish information on their websites about their data holdings and how the data can be requested, including any requirements or forms that apply. Contacting the data holder early in the process is recommended, as they have detailed knowledge of the data source and can provide valuable information for planning.
Assess the availability of the data source
When assessing the availability of a data source, check whether any legal or ethical requirements or restrictions apply to using the data for the research in question. For example:
- If the research requires ethical approval, for example when processing special categories of personal data or personal data relating to criminal convictions and offences.
- If the data is classified as secret and therefore requires a secrecy assessment before it can be disclosed for research purposes.
- If the data is subject to copyright, which means that its use is regulated by licences.
It is also important to find out if there are any practical or technical obstacles that may affect the availability of certain data. For example:
- If the data is spread across different local systems within the same data holder's organisation.
- If the data is only available in formats that are difficult to process.
- If it is not possible to combine the data with other data sources or researcher-generated data.
Has the data holder regulated the disclosure of data?
Legal requirements and the data holder's own procedures may affect the ability to access data. Private entities, for example, are not covered by the principle of public access to information and are therefore not obliged to disclose data. Some data holders have set a maximum limit on the working time they can allocate to processing a disclosure request, which means that they do not carry out disclosures that would exceed that limit.
Can the project access the data it needs within the time and budget constraints?
When planning resources, time and costs, consider the following:
- The processing time for data disclosure varies depending on the complexity of the disclosure and the data holder's available resources and processes.
- If data from several data holders is to be linked or collated, the processing time will normally increase.
- The cost of processing a case varies between different data holders and is affected by the complexity of the case.
- Ethical approval will lapse if the research has not commenced within two years of the decision gaining legal force.
- Data management and making data accessible may incur costs.
- Support functions may need to be involved during the course of the project.
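The two-year limit on ethical approval can be checked against a project timeline with simple date arithmetic. The following is a minimal sketch, not a legal calculation: the function names are hypothetical, and it assumes that the deadline falls on the same calendar date two years after the decision gained legal force and that starting on that date still counts. Confirm the exact rule with the ethical review authority.

```python
from datetime import date


def approval_lapse_date(legal_force: date, years: int = 2) -> date:
    """Return the same calendar date `years` later (assumed deadline)."""
    try:
        return legal_force.replace(year=legal_force.year + years)
    except ValueError:
        # 29 February has no counterpart in a non-leap target year;
        # falling back to 28 February is an assumption of this sketch.
        return legal_force.replace(year=legal_force.year + years, day=28)


def started_in_time(legal_force: date, planned_start: date) -> bool:
    """True if the planned start is on or before the assumed deadline."""
    return planned_start <= approval_lapse_date(legal_force)


if __name__ == "__main__":
    decision = date(2024, 3, 15)  # hypothetical date of legal force
    print(approval_lapse_date(decision))
    print(started_in_time(decision, date(2026, 1, 10)))
```

A check like this is most useful when the expected processing time for data disclosure is added to the timeline, since a long disclosure process can push the research start close to the deadline.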
Assess the relevance and usability of the data source
To assess whether a data source is relevant and usable for a particular project, check that the data is representative and suited to answering the research question. If necessary, it is recommended to contact the data holder to obtain answers to the following questions:
- Is the data volume sufficient to enable statistically reliable conclusions?
- How was the data collected and what is its quality?
- Are there sources of error, such as reporting errors, non-response or missing values?
- Are there significant differences over time, such as changes in variable definitions or classifications, which could complicate longitudinal analyses?
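If a test extract or sample file is available, some of the questions above can be given a first rough answer before contacting the data holder. The sketch below is an illustration only: the helper names, the field names and the sample rows are made up, and it treats absent keys, `None` and empty strings as missing, which may not match how a given data source encodes missing values.

```python
from collections import Counter


def missing_share(rows, variable):
    """Share of rows where `variable` is absent, None or an empty string."""
    if not rows:
        return 0.0
    missing = sum(1 for row in rows if row.get(variable) in (None, ""))
    return missing / len(rows)


def rows_per_year(rows, year_field="year"):
    """Number of rows per year, useful for spotting gaps or breaks over time."""
    return dict(sorted(Counter(row[year_field] for row in rows).items()))


# Hypothetical sample rows, not real data.
sample = [
    {"year": 2019, "diagnosis": "A"},
    {"year": 2019, "diagnosis": None},
    {"year": 2020, "diagnosis": "B"},
    {"year": 2020, "diagnosis": ""},
]

if __name__ == "__main__":
    print(missing_share(sample, "diagnosis"))
    print(rows_per_year(sample))
```

A sudden change in the number of rows per year, or in the share of missing values, does not explain itself. It is a prompt to ask the data holder whether a definition, a classification or a reporting routine changed at that point.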
Different data sources may sometimes contain similar variables. In order to select the most suitable variable to answer the project's question, it may be useful to take the following into account and, if necessary, consult the data holder regarding:
- whether the data sources use comparable concepts and classifications.
- whether the data sources use the same definition for the variable in question.
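When the code lists for a variable are available from both sources, a first comparison of the classifications can be sketched as follows. The function name and the example codes are hypothetical. Note that overlapping codes only show that the labels match; whether the underlying definitions are the same still has to be confirmed in the documentation or with the data holder.

```python
def compare_code_sets(codes_a, codes_b):
    """Compare the codes used for one variable in two data sources."""
    a, b = set(codes_a), set(codes_b)
    return {
        "shared": sorted(a & b),
        "only_in_a": sorted(a - b),
        "only_in_b": sorted(b - a),
    }


if __name__ == "__main__":
    # Hypothetical code lists for the same variable in two sources.
    result = compare_code_sets(["1", "2", "3"], ["2", "3", "9"])
    print(result)
```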
In some cases, data holders can help the researcher assess the relevance and usability of the data source by providing statistics.
Healthcare providers can also carry out feasibility counts prior to clinical research, which means assisting the researcher by providing information on how many people meet pre-specified criteria. The result is then only provided as a number or range – no personal data or lists of individuals are disclosed.
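The idea of a feasibility count can be illustrated with a small sketch: records are checked against pre-specified criteria, and only a coarse range leaves the function. Everything here is an assumption made for illustration, including the function name, the record fields and the bucket width of 10. It does not describe how any healthcare provider actually performs or reports these counts.

```python
def feasibility_count(records, criteria, bucket=10):
    """Count records that meet all criteria and report only a coarse range.

    `criteria` is a list of functions that each take a record and return
    True or False. No record or identifier is returned, only a range string.
    """
    n = sum(1 for record in records if all(check(record) for check in criteria))
    low = (n // bucket) * bucket
    return f"{low}-{low + bucket - 1}"


if __name__ == "__main__":
    # Hypothetical records, not real data.
    records = [{"age": 30 + i, "treated": i % 2 == 0} for i in range(40)]
    criteria = [lambda r: r["age"] >= 50, lambda r: r["treated"]]
    print(feasibility_count(records, criteria))
```

Reporting a range rather than an exact number is one way to reduce the risk that a small count points to identifiable individuals.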
Assess what type of data is needed
What type of data will be needed and what does this mean for the project? Does personal data need to be requested or is anonymised data sufficient?