Selecting the best repository to house a dataset may be straightforward if your discipline already has a well-established subject-based repository, or it may take some research to determine the best place for your data. Look for a research data repository that supports open licenses, which make your datasets more accessible (CC0 is the least restrictive option). The repository should also provide clear, persistent citations for datasets. Repositories offer a range of services to depositors (from data validation to peer review) and to users (from in-browser data exploration to visualization and analysis tools), and these may also influence your choice. The Digital Scholarship and Scholarly Communications Team is happy to assist you as you select an appropriate data repository.
There are several useful tools for finding data repositories that serve your field.
The National Institutes of Health (NIH) maintains a list of generalist repositories that may be used if there is no domain-specific repository that is suitable for a particular dataset. Some of those repositories are described in the list below.
Harvard Dataverse is a repository for research data and code. “The Harvard Dataverse is open to all scientific data from all disciplines worldwide. It includes the world’s largest collection of social science research data. It is hosting data for projects, archives, researchers, journals, organizations, and institutions.”
Dryad was originally created by a group of journals and scientific societies as a place to archive the data underlying their publications. It is flexible about data format and assigns citable DOIs to submissions. It is also committed to long-term preservation and access. Because it is integrated with partner journal workflows, it may be a good choice when a journal requires data to be archived prior to publication.
If you would like to publish a dataset but cannot find an appropriate subject-based repository, you may want to consider using Figshare, “a repository where users can make all of their research outputs available in a citable, shareable and discoverable manner.” The research outputs you can upload to Figshare include datasets, figures, papers, posters, and video. When you publish research materials on Figshare, they receive a Digital Object Identifier (DOI), providing a persistent citation. Figshare also supports version control, so that you can update or add to a dataset without confusing other researchers who may wish to cite it.
The Inter-university Consortium for Political and Social Research (ICPSR) archives data from any source. It has the world’s largest collection of social science data.
Data can be deposited for free, although there is a fee for curated deposits. Using the openICPSR system, researchers can self-deposit raw data without going through the full ICPSR data review process.
For more information about ICPSR, visit this research guide.
If you are using GitHub to manage a project, you can archive releases of your dataset using Zenodo. Zenodo assigns a Digital Object Identifier (DOI) to each archived release, making it easier to cite the dataset in publications.
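Whichever repository issues the DOI, the identifier can be turned into a stable link by prefixing it with https://doi.org/. The following is a minimal Python sketch of that step, not part of any repository's tooling. The function name doi_to_url, the format check, and the sample DOI 10.1234/example.dataset are illustrative assumptions; the sample is a placeholder, not a real dataset.

```python
import re

# Loose format check (an assumption for this sketch): "10.", a 4-9 digit
# registrant prefix, a slash, then a non-empty suffix without whitespace.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

# Prefixes people commonly paste in front of a DOI.
KNOWN_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def doi_to_url(raw: str) -> str:
    """Return the https://doi.org/ link for a DOI written in a common form."""
    doi = raw.strip()
    for prefix in KNOWN_PREFIXES:
        if doi.lower().startswith(prefix):
            # Drop the prefix and any space left after "doi:".
            doi = doi[len(prefix):].strip()
            break
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a recognizable DOI: {raw!r}")
    return f"https://doi.org/{doi}"


if __name__ == "__main__":
    # Placeholder DOI for illustration only.
    print(doi_to_url("doi:10.1234/example.dataset"))
```

Using the https://doi.org/ form in a data availability statement or reference list keeps the citation working even if the repository later changes the address of the dataset's landing page.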
Questions? Contact us