De-Identifying Your Data
When a dataset is too sensitive to share in its entirety, consider how a version that is safe to share can be created. Doing so involves de-identification or anonymisation: removing all data that could be used to identify individual participants in a research project, thereby protecting their privacy. This may require hiring a professional statistician, the cost of which can be written into a grant.
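As a rough illustration of what de-identification can involve, the Python sketch below drops direct identifiers, replaces the record key with a salted hash, coarsens age and ZIP code, and then counts how many records share each combination of remaining quasi-identifiers. Every column name, the salt, and the sample records are hypothetical and chosen only for this example. It is a minimal sketch, not a compliance procedure: salted hashing is pseudonymisation rather than full anonymisation, and a real release should be reviewed by someone with statistical disclosure expertise.

```python
import hashlib
from collections import Counter

# Hypothetical column names, chosen only for this illustration.
DIRECT_IDENTIFIERS = {"name", "mrn"}
COARSENED_FIELDS = {"age", "zip"}
QUASI_IDENTIFIERS = ("age_band", "zip3", "sex")


def pseudonym(value: str, salt: str) -> str:
    """Return a truncated salted SHA-256 digest to use as a study ID."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()[:12]


def age_band(age: int, width: int = 10) -> str:
    """Generalize an exact age into a band such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def deidentify(record: dict, salt: str) -> dict:
    """Drop direct identifiers and coarsen quasi-identifiers."""
    out = {
        key: value
        for key, value in record.items()
        if key not in DIRECT_IDENTIFIERS and key not in COARSENED_FIELDS
    }
    out["participant_id"] = pseudonym(record["mrn"], salt)
    out["age_band"] = age_band(record["age"])
    out["zip3"] = record["zip"][:3]  # keep only the first three digits
    return out


def smallest_group(records: list[dict]) -> int:
    """Size of the smallest group sharing the same quasi-identifier values."""
    counts = Counter(tuple(r[q] for q in QUASI_IDENTIFIERS) for r in records)
    return min(counts.values())


# Made-up sample records.
records = [
    {"name": "A. Example", "mrn": "001", "age": 34, "zip": "11549", "sex": "F", "outcome": 1},
    {"name": "B. Example", "mrn": "002", "age": 37, "zip": "11550", "sex": "F", "outcome": 0},
    {"name": "C. Example", "mrn": "003", "age": 62, "zip": "10001", "sex": "M", "outcome": 1},
]

shared = [deidentify(r, salt="keep-this-secret") for r in records]
# The third record is unique on its quasi-identifiers, so the smallest group is 1.
print(smallest_group(shared))
```

A smallest group of 1 means at least one participant is still unique on age band, ZIP prefix, and sex, so the data would need further generalization or suppression before sharing. The salt must be kept private, since anyone who knows it could recompute the hashes from known record numbers.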
The Biomedical Informatics Coordinating Committee (BMIC) has identified the following as desirable characteristics of data repositories:
- Persistent unique identifiers
- Long-term sustainability
- Curation & quality assurance
- Maximally open access
- Tracking data re-use
- Free of charge
- Common format
Additional considerations for repositories involving human data:
- Fidelity to consent
- Restricted use compliant
- Plan for breach
- Download audit and control
- Clear use guidance
- Retention guidance
- Plan for use violations
- Request review
Source: Huerta MF. Strategic Approaches to Data Science & Open Science: Research Data Management. Presented at: Research Data Management Symposium; 2019 Dec 5; New York, NY.
These repositories are selected from the much longer list available via NIH.
Archived Clinical Research Datasets
The data from NINDS-supported clinical trials are an important scientific resource, made available to the wider scientific community, while ensuring that the confidentiality and privacy of study participants are protected. NINDS requires all investigators seeking access to data from archived NINDS-supported trials to agree to certain terms and conditions.
Gene Expression Omnibus
GEO is a public functional genomics data repository supporting MIAME-compliant data submissions.
A data archive of more than 250,000 files of research in the social and behavioral sciences.
National COVID Cohort Collaborative (N3C)
The NCATS National COVID Cohort Collaborative (N3C) Data Enclave contains harmonized clinical, laboratory and diagnostic data derived from the EHRs of more than 12 million people who were tested for COVID-19 or had related symptoms.
National Institute of Mental Health Data Archive (NDA)
The National Institute of Mental Health Data Archive (NDA) makes available human subjects data collected from hundreds of research projects across many scientific domains.
The NHGRI Genomic Data Science Analysis, Visualization, and Informatics Lab-space (AnVIL)
The NHGRI Genomic Data Science Analysis, Visualization, and Informatics Lab-space (AnVIL) is a scalable and interoperable resource for the genomic scientific community that leverages cloud-based infrastructure to democratize access to, sharing of, and computing across large genomic and genomic-related data sets.
The OpenNeuro database is a public repository of human and non-human brain imaging data collected using several different imaging techniques (MRI, PET, EEG and MEG). No registration or license agreement is required to obtain the data, which are distributed, by default, under a Public Domain dedication. This is possible because data are anonymized before distribution to protect the confidentiality of participants.
These generalist repositories are affiliated with NIH, which suggests depositing in a generalist repository if a domain-specific repository cannot be found. List from NIH.
A repository of data underlying scientific and medical publications.
Figshare enables academics to upload, share, cite and importantly discover all manner of research outputs with the security of knowing our hosting options and platform support long term preservation of data.
An open research data repository, where researchers can upload and share their research data. Datasets can be shared privately amongst individuals, as well as published to share with the world.
Open Science Framework
OSF is a free, open source web application that connects and supports the research workflow, enabling scientists to increase the efficiency and effectiveness of their research. Researchers use OSF to collaborate, document, archive, share, and register research projects, materials, and data.
Synapse is a collaborative, open-source research platform, developed by Sage, that allows teams to share data, track analyses, and collaborate.
Vivli is an independent, non-profit organization that has developed a global data-sharing and analytics platform, focused on sharing individual participant-level data from completed clinical trials to serve the international research community.
An all-purpose open-access repository developed under the European OpenAIRE program and operated by CERN.
Using Northwell Data for Research
There are two options for Northwell employees who wish to obtain EHR data for research purposes.
- Fill out this form from Quantitative Intelligence. Under "Service Requested," select "Data Request for Research."
- Visit the Analytic Resource Center, where you can request access to various dashboards and reports on Northwell Healthcare Analytics. These may be especially useful for Quality Improvement projects.
Data Discovery Resources
Zucker School of Medicine at Hofstra/Northwell Data Catalog
A website tool to facilitate researchers’ discovery of data by providing a searchable and browsable online collection of records describing datasets generated by ZSOM, Northwell Health, and Feinstein Institute researchers.
A prototype biomedical data search engine that will allow users to discover data sets across data repositories or data aggregators.
A global registry of research data repositories that cover a wide range of academic disciplines. re3data presents repositories for the permanent storage and access of data sets to researchers, funding bodies, publishers and scholarly institutions.
A community-driven project providing access to data across multiple member repositories, supporting enhanced search and discovery of Earth and environmental data.
Our World in Data
Our World in Data (OWID) is an open access scientific online publication that focuses on large global problems such as poverty, disease, hunger, climate change, war, existential risks, and inequality.
Choosing the Right Repository for Your Data
There are several considerations you may need to account for when choosing a repository for your data.
- Does your funding agency or publisher specify a repository to use?
- If they do not specify a repository, do they have guidelines? For instance, the NIH directs researchers to seek out a discipline-specific repository.
- How large is your data? Many repositories have limits on the amount of free storage provided. Do you need to budget for storage?
- Is the data sensitive or under embargo? What controls do you need for your data?
- Who is the funder for the repository? Is it sustainable? Is it endorsed by a scholarly or professional group?
Data Management and Sharing Policies