Having a consistent file naming scheme will help you keep track of all your data. It will help you avoid computational mistakes when you analyze the data. It will help you browse your data and see what is in a file folder at a glance. Finally, when you return to old data, it will help you remember what is in each file.
The most important part of creating a file naming scheme is choosing something that you can consistently follow. The second most important part is documenting it. This information can go in a plain-text README in the folder(s) where you are storing your files.
Source: File Naming Best Practices document created by Christine Malinowski, MIT Libraries
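The naming advice above can be sketched in code. The convention below (`<project>_<description>_<YYYYMMDD>_v<NN>.<ext>`) is purely an assumption made for illustration and does not come from the cited best-practices document; the function names and the sample project are hypothetical too. The sketch shows one way to build names consistently and to check whether an existing name follows the scheme.

```python
import re
from datetime import date

# Hypothetical convention (an assumption, not a standard):
#   <project>_<description>_<YYYYMMDD>_v<NN>.<ext>
NAME_PATTERN = re.compile(
    r"^(?P<project>[a-z0-9]+)_(?P<description>[a-z0-9-]+)_"
    r"(?P<date>\d{8})_v(?P<version>\d{2})\.(?P<ext>[a-z0-9]+)$"
)


def build_name(project, description, day, version, ext):
    """Assemble a file name that follows the hypothetical convention."""
    return f"{project}_{description}_{day:%Y%m%d}_v{version:02d}.{ext}"


def follows_scheme(filename):
    """Return True if the file name matches the convention exactly."""
    return NAME_PATTERN.match(filename) is not None


# Illustrative values only.
name = build_name("soilsurvey", "ph-readings", date(2024, 3, 5), 2, "csv")
print(name)
print(follows_scheme(name))
print(follows_scheme("Final data (2).csv"))
```

Whatever pattern you settle on, recording it in the README alongside a check like this makes it easier to notice files that drift from the scheme.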
Anyone who has tried to load a file created in an obsolete software program knows the pain of unstable file formats. For example, you may have attempted to load an old document created in WordPerfect and saved in the obsolete .wpd format. A file format can become obsolete or orphaned when the creator of a program abandons it, leaving the software as abandonware. Stable file formats are unlikely to suffer from these issues. Using a stable file format helps to preserve your data for yourself and for other researchers who may want to use it in the future.
Stable file formats have these characteristics:
| Data type | Preferred file format examples |
| --- | --- |
| Containers | TAR, GZIP, ZIP |
| Databases | XML, CSV, SQLITE |
| Geospatial | SHP, DBF, GeoTIFF, NetCDF |
| Moving images | MOV, MPEG, AVI, MXF |
| Sounds | WAVE, AIFF, MP3, MXF |
| Statistics | ASCII, DTA, POR, SAS, SAV |
| Still images | TIFF, JPEG2000, PDF, PNG, GIF, BMP |
| Tabular data | CSV |
| Text | XML, PDF/A, HTML, ASCII, UTF-8 |
| Web archive | WARC |
In addition to this general advice, some repositories, such as Dryad, give researchers directions on which stable file formats to use. Your funder or research IT may also have preferred file formats.
Some data has no file format that meets the standards laid out here and must be saved in a proprietary format. When sharing data in a proprietary format, document in your README the name of the program (and version number, if applicable) that can be used to read the data.
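As a minimal sketch of that documentation step, the snippet below formats one README line per proprietary file. The file names, program names, version number, and line layout are all hypothetical placeholders chosen for illustration; adapt them to whatever your README already looks like.

```python
def readme_entry(filename, program, version=None):
    """Format one README line naming the software needed to open a file."""
    # Include the version only when one is known.
    software = f"{program} {version}" if version else program
    return f"{filename}: proprietary format; open with {software}"


# Hypothetical files and programs, for illustration only.
entries = [
    readme_entry("scan_001.xyz", "ExampleScope", "4.2"),
    readme_entry("notes.abc", "ExampleNotes"),
]
readme_text = "\n".join(entries) + "\n"
print(readme_text, end="")
```

The resulting text can be pasted into, or appended to, the plain-text README stored with the data.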
Some data types, such as GIS files, require multiple files working together to be read. In this case, make sure you supply all the files needed and document the file structure in your README.
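One way to check for a complete set before sharing is sketched below for the shapefile case, where the `.shp` file needs `.shx` and `.dbf` files with the same base name. The function name and the sample path are hypothetical, and the check assumes lowercase extensions; optional companions such as `.prj` are not covered.

```python
from pathlib import Path

# A shapefile needs at least these three files sharing one base name.
REQUIRED_SUFFIXES = (".shp", ".shx", ".dbf")


def missing_companions(shp_path):
    """Return the required suffixes whose files are absent on disk."""
    base = Path(shp_path)
    return [s for s in REQUIRED_SUFFIXES if not base.with_suffix(s).exists()]


# Hypothetical path, for illustration only.
missing = missing_companions("data/parcels.shp")
if missing:
    print("Missing companion files:", ", ".join(missing))
else:
    print("All required shapefile parts are present.")
```

An empty list means the three required parts were found; anything else names the pieces to locate before you deposit or share the data.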
For questions or comments, email us at lena.g.bohman@hofstra.edu.