
SOM Data Management: Overview

Data Management Glossary

Access Path: The path a database management system chooses to retrieve the data requested by the end user.
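As an illustration, SQLite, used here through Python's standard `sqlite3` module, reports the access path it plans to use when a query is prefixed with `EXPLAIN QUERY PLAN`. The sketch below assumes a hypothetical `patients` table and `idx_patients_mrn` index. It shows the chosen path changing from a full table scan to an index search once an index exists. The exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

def plan_for(conn, sql):
    """Return the access-path description SQLite reports for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The human-readable description is the last column of each plan row.
    return " | ".join(str(row[-1]) for row in rows)

conn = sqlite3.connect(":memory:")
# Hypothetical table used only for illustration.
conn.execute("CREATE TABLE patients (id INTEGER PRIMARY KEY, mrn TEXT, name TEXT)")

query = "SELECT * FROM patients WHERE mrn = 'A123'"
before = plan_for(conn, query)  # no index yet: the planner must scan the table

conn.execute("CREATE INDEX idx_patients_mrn ON patients (mrn)")
after = plan_for(conn, query)   # the planner can now search the index instead

print(before)
print(after)
```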

Advanced Analytics: The examination of data using sophisticated tools, typically beyond those of traditional Business Intelligence, allowing for deeper insights or predictions to be made.

Administrative Data: Data that helps a data warehouse administrator manage a data warehouse. This data typically includes user profiles and warehouse history.

Aggregate Data: Data that results from a process that combines individual data elements, usually presented collectively or as a summary.
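A minimal sketch of producing aggregate data, assuming hypothetical visit-level records of (department, charge). The detail rows are combined into per-department counts, totals, and averages.

```python
from collections import defaultdict

def summarize(records):
    """Combine detail records into per-department aggregate figures."""
    totals = defaultdict(lambda: {"visits": 0, "charges": 0.0})
    for department, charge in records:
        totals[department]["visits"] += 1
        totals[department]["charges"] += charge
    # Derive an average from the combined figures.
    return {
        dept: {**agg, "avg_charge": agg["charges"] / agg["visits"]}
        for dept, agg in totals.items()
    }

# Hypothetical detail-level data: (department, charge in dollars).
records = [("Cardiology", 250.0), ("Cardiology", 400.0), ("Oncology", 900.0),
           ("Oncology", 600.0), ("Pediatrics", 120.0)]
summary = summarize(records)
print(summary["Cardiology"])  # {'visits': 2, 'charges': 650.0, 'avg_charge': 325.0}
```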

Analytics: The discovery of meaningful patterns in data, usually revealed by an analytics software solution.

Source: Data management Solution Review

Behavioral Analytics: A subset of Business Intelligence that focuses on how and why users behave the way they do, using the data collected for analysis.

Big Data: Extremely large data sets that may be analyzed to reveal patterns and trends and that are typically too complex to be dealt with using traditional processing techniques.

Bulk Data Transfer: A mechanism, usually software-based, designed to move large data files, using compression, blocking, and buffering to reduce wait times.

Business Intelligence: A process for analyzing data and presenting actionable insights to stakeholders in order to help them make more informed business decisions.


Citizen Data Scientist: Business analysts and other personnel who may have experience working within an organization’s data architecture and using software tools to derive valuable business insights from stored data.

Cluster: A means of storing data together from multiple tables when the data contains common information that is needed for analysis.

Compliance: Conforming to a set of rules, usually established by a governing body. In Data Management, compliance means following data collection and usage practices that safeguard private data; it is especially important in highly regulated industries.


Dashboard: A tool that is used to create, deploy and analyze information. Typically, a dashboard will consist of a single screen and show various reports and other metrics that the organization is studying.

Database: A collection of data that is purposefully arranged for fast and convenient search and retrieval by business applications and Business Intelligence software.

Data Blending: A fast and straightforward way to extract value from multiple data sources and find patterns without deploying a traditional data warehouse architecture.

Data Cleansing: Transforming data from its native state into a predefined, standardized format, often using vendor software.
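The sketch below shows one small cleansing step: converting dates and phone numbers from mixed source formats into a single standard format. The list of accepted input formats is an assumption. In particular, `%m/%d/%Y` assumes US month-first dates. Values that cannot be parsed are returned as `None` so they can be reviewed rather than guessed.

```python
import re
from datetime import datetime

# Hypothetical input formats seen in the raw data; a real project would
# list the formats actually present in its sources.
DATE_FORMATS = ("%m/%d/%Y", "%Y-%m-%d", "%d %b %Y")

def standardize_date(raw):
    """Convert a date string in any known format to ISO 8601 (YYYY-MM-DD)."""
    text = raw.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    return None  # unparseable values are flagged, not guessed

def standardize_phone(raw):
    """Keep digits only and format 10-digit US numbers as NNN-NNN-NNNN."""
    digits = re.sub(r"\D", "", raw)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop a leading US country code
    if len(digits) != 10:
        return None
    return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"

print(standardize_date("03/05/2024"))        # 2024-03-05
print(standardize_phone("(555) 123-4567"))   # 555-123-4567
```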

Data Cube: A database structure with multiple dimensions which can be stacked, combined and manipulated to enable browsing.

Data Democratization: Giving users across an enterprise access to data, allowing them to run analyses at any time to answer any question.

Data Discovery: User-driven process of searching for patterns in a data set, providing self-service and data democratization. Data Discovery has been labeled by Gartner as “modern Business Intelligence.”

Data Governance: The management of the availability, usability, integrity and security of the data stored within an enterprise.

Data Integration: The combination of technical and business processes used to combine data from disparate sources into meaningful insights.

Data Lake: A storage repository that holds a large amount of raw data in its native format until it is needed.

Data Lineage: A description of the data life cycle: where the data originates, where it moves over time, and what happens to it as it passes through various processes.
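To make lineage concrete, here is a minimal sketch in which a dataset carries a log of its origin and of every transformation applied to it. The `TrackedDataset` class, the `clinic_export.csv` origin, and the filtering steps are hypothetical. Dedicated lineage tools record far richer information, but the idea is the same.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class TrackedDataset:
    """Hypothetical container pairing data rows with a simple lineage log."""
    rows: list
    origin: str
    lineage: list = field(default_factory=list)

    def apply(self, step_name, func):
        """Apply a transformation and return a new dataset with the step logged."""
        new_rows = func(self.rows)
        entry = {
            "step": step_name,
            "rows_in": len(self.rows),
            "rows_out": len(new_rows),
            "at": datetime.now(timezone.utc).isoformat(),
        }
        return TrackedDataset(new_rows, self.origin, self.lineage + [entry])

raw = TrackedDataset([{"age": 34}, {"age": None}, {"age": 71}], origin="clinic_export.csv")
clean = raw.apply("drop missing age", lambda rows: [r for r in rows if r["age"] is not None])
seniors = clean.apply("filter age >= 65", lambda rows: [r for r in rows if r["age"] >= 65])
for entry in seniors.lineage:
    print(entry["step"], entry["rows_in"], "->", entry["rows_out"])
```

Each step returns a new dataset rather than modifying the old one, so earlier stages and their histories stay available for auditing.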

Data Management: The development and execution of architectures, policies and practices to manage the data life-cycle needs of an enterprise.

Data Mart: A collection of reports, metrics, and other stored data on a specific subject. Grouping related information this way makes it easier to discover.

Data Migration: The process of moving data between two or more storage systems, data formats, warehouses or servers.

Data Mining: Extracting previously unknown patterns and information from databases and using them to inform important business decisions, often producing new insights.

Data Protection: Safeguarding vital business data from corruption or loss.

Data Quality: The quality of an organization’s data in the context of how it will be used. The more relevant, available, complete, and accurate the information, the better the chance that profitable business insights will result.
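Quality can be measured along each of the dimensions named above. The sketch below measures only completeness: the share of records that have a non-empty value for each field. The patient records are hypothetical. Relevance, availability, and accuracy need other checks.

```python
def completeness(records, fields):
    """Return, for each field, the fraction of records with a non-empty value."""
    if not records:
        return {name: 0.0 for name in fields}
    return {
        name: sum(1 for r in records if r.get(name) not in (None, "")) / len(records)
        for name in fields
    }

# Hypothetical patient records with some gaps.
patients = [
    {"mrn": "A1", "dob": "1980-02-11", "email": "a1@example.org"},
    {"mrn": "A2", "dob": "", "email": "a2@example.org"},
    {"mrn": "A3", "dob": "1975-07-30", "email": None},
    {"mrn": "A4", "dob": "1990-12-01", "email": "a4@example.org"},
]
scores = completeness(patients, ["mrn", "dob", "email"])
print(scores)  # {'mrn': 1.0, 'dob': 0.75, 'email': 0.75}
```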

Data Replication: The frequent copying of data from one database to another so that all users share the same level of information. The result is a distributed database that allows users to access data relevant to their own specific tasks.
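One simple form of replication is a periodic full copy. The sketch below uses the `Connection.backup` method from Python's standard `sqlite3` module (Python 3.7 or later) to copy one database into another. The `labs` table is hypothetical. Production replication systems usually ship incremental changes rather than full copies.

```python
import sqlite3

def replicate(source, replica):
    """Overwrite the replica with a full copy of the source database."""
    source.backup(replica)

def count_labs(conn):
    """Count rows in the hypothetical labs table."""
    cur = conn.execute("SELECT COUNT(*) FROM labs")
    try:
        return cur.fetchall()[0][0]
    finally:
        cur.close()  # release the statement so it does not block the next copy

primary = sqlite3.connect(":memory:")
primary.execute("CREATE TABLE labs (id INTEGER PRIMARY KEY, test TEXT, result REAL)")
primary.executemany("INSERT INTO labs (test, result) VALUES (?, ?)",
                    [("glucose", 92.0), ("sodium", 140.0)])
primary.commit()

replica = sqlite3.connect(":memory:")
replicate(primary, replica)

# A later change on the primary reaches the replica only after the next copy.
primary.execute("INSERT INTO labs (test, result) VALUES ('potassium', 4.1)")
primary.commit()
before_sync = count_labs(replica)
replicate(primary, replica)
after_sync = count_labs(replica)
print(before_sync, after_sync)  # 2 3
```

The gap between `before_sync` and `after_sync` shows why copy frequency matters: between copies, replica users see slightly older data.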

Data Science: A field of study involving the processes and systems used to extract insights from data in all of its forms. The profession is seen as a continuation of other data analysis fields, such as statistics.

Data Staging: A temporary storage area into which data from outside sources is copied before further processing.

Data Warehouse: A system used for Data Analytics. It is a central location for data integrated from other, more disparate sources, storing both current and historical data that can then be used to create trend reports.

Data Visualization: Transforming numerical data into a visual or pictorial context in order to assist users in better understanding what the data is telling them.

Drilling: The process of navigating through different levels of data in multidimensional sets, ranging from the most summarized level (drilling up) to the most detailed level (drilling down).
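Drilling can be pictured as moving along a hierarchy such as year, then quarter, then month. The sketch below uses hypothetical monthly admission counts. It re-totals them at whichever level is requested, optionally restricted to one branch of the hierarchy. The `view_at` helper is illustrative and does not come from any OLAP product.

```python
from collections import defaultdict

# Hypothetical monthly admissions: (year, quarter, month, admissions)
MONTHLY = [
    (2023, "Q1", "Jan", 120), (2023, "Q1", "Feb", 110), (2023, "Q1", "Mar", 130),
    (2023, "Q2", "Apr", 125), (2023, "Q2", "May", 140), (2023, "Q2", "Jun", 135),
]

# Levels of the time hierarchy, from most summarized to most detailed.
LEVELS = ("year", "quarter", "month")

def view_at(level, rows=MONTHLY, **filters):
    """Total admissions at one hierarchy level, optionally within one parent member."""
    depth = LEVELS.index(level) + 1
    totals = defaultdict(int)
    for year, quarter, month, count in rows:
        member = {"year": year, "quarter": quarter, "month": month}
        if all(member[k] == v for k, v in filters.items()):
            key = tuple(member[name] for name in LEVELS[:depth])
            totals[key] += count
    return dict(totals)

print(view_at("year"))                   # {(2023,): 760}
print(view_at("quarter"))                # {(2023, 'Q1'): 360, (2023, 'Q2'): 400}
print(view_at("month", quarter="Q2"))    # drill down into Q2 only
```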


Embedded Analytics: The integration of external Business Intelligence tools and capabilities into existing business software.

Enterprise Data Warehouse (EDW): A database environment that provides a single view of an enterprise and is considered a reliable source of controlled information for strategic planning and decision making.

Enterprise Information System (EIS): Applications that are used for presenting and analyzing corporate data, typically used by high-level management.

Enterprise Resource Planning (ERP): This type of software allows a business or organization to manage a suite of integrated applications which are used to collect, manage and store data on a variety of business activities.

Extract, Transform, Load (ETL): A data warehousing process that extracts data from source systems, transforms it into the required format, and loads it into a target such as a data warehouse. Combining the three functions into one process allows data to be migrated faster.
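A minimal ETL sketch in Python, assuming a hypothetical CSV export as the source and an in-memory SQLite database standing in for the warehouse. Each function performs one of the three steps. The transform drops rows with a missing weight and converts pounds to kilograms.

```python
import csv
import io
import sqlite3

# Hypothetical source extract: a CSV export with inconsistent formatting.
RAW_CSV = """patient_id,visit_date,weight_lb
101,2024-01-05, 154.0
102,2024-01-06,
103,2024-01-07,198.4
"""

def extract(text):
    """Extract: read rows from the source (here, an in-memory CSV string)."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: drop incomplete rows and convert pounds to kilograms."""
    out = []
    for row in rows:
        weight = row["weight_lb"].strip()
        if not weight:
            continue  # skip rows missing a weight
        kg = round(float(weight) * 0.45359237, 1)
        out.append((int(row["patient_id"]), row["visit_date"], kg))
    return out

def load(rows, conn):
    """Load: write the cleaned rows into the target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS visits "
                 "(patient_id INTEGER, visit_date TEXT, weight_kg REAL)")
    conn.executemany("INSERT INTO visits VALUES (?, ?, ?)", rows)
    conn.commit()

warehouse = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), warehouse)
loaded = warehouse.execute("SELECT * FROM visits").fetchall()
print(loaded)  # [(101, '2024-01-05', 69.9), (103, '2024-01-07', 90.0)]
```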


Hadoop: A programming framework that supports the processing of large data sets in a distributed computing environment.


Legacy Solution: An old or outdated software tool.

Location Intelligence: A BI feature that relates geographic context to business data and is designed to turn data into insights for a range of business purposes.


Machine Learning: A type of artificial intelligence that gives computers the ability to learn without being explicitly programmed to do so. It focuses on developing computer applications that can teach themselves to change when exposed to new data.

Master Data Management: The processes, policies, standards, and tools that define and manage all of an organization’s critical data to provide a single point of reference.

Metadata: Data that describes other data within a database, keeping that data organized as end users sift through it.
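Databases store metadata about their own tables, and it can be queried like ordinary data. In SQLite, accessed here through Python's `sqlite3` module, `PRAGMA table_info` returns a description of each column. The `specimens` table below is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; its column names, types, and constraints are metadata.
conn.execute("""
    CREATE TABLE specimens (
        specimen_id INTEGER PRIMARY KEY,
        collected_on TEXT NOT NULL,
        site TEXT
    )
""")

# PRAGMA table_info returns one row per column:
# (cid, name, declared type, notnull flag, default value, primary-key position)
columns = conn.execute("PRAGMA table_info(specimens)").fetchall()
for cid, name, col_type, notnull, default, pk in columns:
    print(f"{name}: type={col_type}, not_null={bool(notnull)}, primary_key={bool(pk)}")
```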


Online Analytical Processing (OLAP): A technology used to organize large business databases for fast, multidimensional analysis, supporting Business Intelligence.

Operational Analytics: Data analytics focused on improving the internal operations of the enterprise.

Operational Data Store (ODS): A current and relevant store of data used to support tactical decision making within an organization.


Predictive Analytics: BI solutions that help the user discover patterns in large data sets in order to predict future behavior.

Prescriptive Analytics: The area of Business Intelligence dedicated to finding the best course of action for a given situation.


Real-Time Analytics: The ability to use all available enterprise data as needed. It usually involves streaming data, allowing users to make business decisions on the fly.

Relational Database Management System (RDBMS): A system used to store data managed in relational tables, typically organized according to the relationships between different data values.

Reporting: The collection of data from various sources and software tools for presentation to end-users in a way that is understandable and easy to analyze.

Repository: A mechanism for storing data defining a system at any point in its life-cycle.


Scalability: The ability of a data warehouse to accommodate growing volumes of data and numbers of users, which is critical for the data and technical architectures of the enterprise.

Schema: The structure that defines how data inside a database is organized.

Self-Service: A BI practice that enables business users to access and work with corporate data without a background in statistical analysis.

Service Level Agreement (SLA): A contract between a service provider or vendor and the customer that defines the level of service expected. SLAs are service-based and specifically define what the customer can expect to receive.

Slice And Dice: Breaking large data sets down into smaller portions so that they can be analyzed from different perspectives.
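In OLAP terms, a slice fixes one dimension to a single value, and a dice keeps a chosen subset of values on several dimensions at once. The sketch below applies both operations to a few hypothetical fact records with three dimensions (year, department, payer) and one measure (visits). Plain Python lists stand in for a real cube engine.

```python
# Hypothetical fact records with three dimensions and one measure.
FACTS = [
    {"year": 2023, "department": "Cardiology", "payer": "Medicare", "visits": 40},
    {"year": 2023, "department": "Cardiology", "payer": "Private", "visits": 25},
    {"year": 2023, "department": "Oncology", "payer": "Medicare", "visits": 30},
    {"year": 2024, "department": "Cardiology", "payer": "Medicare", "visits": 45},
    {"year": 2024, "department": "Oncology", "payer": "Private", "visits": 20},
]

def slice_cube(facts, dimension, value):
    """Slice: fix one dimension to a single value."""
    return [f for f in facts if f[dimension] == value]

def dice_cube(facts, **allowed):
    """Dice: keep facts whose values fall in a chosen subset on each named dimension."""
    return [f for f in facts
            if all(f[dim] in values for dim, values in allowed.items())]

def total(facts, measure="visits"):
    """Sum a measure over the selected facts."""
    return sum(f[measure] for f in facts)

print(total(slice_cube(FACTS, "year", 2024)))  # 65
print(total(dice_cube(FACTS, department={"Cardiology"},
                      payer={"Medicare", "Private"})))  # 110
```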

Software as a Service (SaaS): A software delivery model in which software is licensed on a subscription basis, is centrally hosted, and is typically accessed by end users through a web browser.

Snapshot: A view of a data set at a particular point in time.
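A minimal sketch of a snapshot, assuming a hypothetical bed-census dictionary as the live data. The snapshot stores a deep copy together with the time it was taken, so later changes to the live data do not alter it.

```python
import copy
from datetime import datetime, timezone

def take_snapshot(dataset):
    """Capture a frozen copy of the data set plus the moment it was taken."""
    return {"taken_at": datetime.now(timezone.utc), "data": copy.deepcopy(dataset)}

census = {"ICU": 12, "Med-Surg": 48}  # hypothetical live data
snap = take_snapshot(census)

census["ICU"] = 15  # the live data keeps changing; the snapshot does not
print(snap["data"]["ICU"], census["ICU"])  # 12 15
```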

Structured Query Language (SQL): The accepted standard for relational database systems, covering query, data definition, data manipulation, security and additional aspects of data integrity.
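The short example below shows three of the areas SQL covers: data definition (`CREATE TABLE`), data manipulation (`INSERT`), and querying (`SELECT` with `GROUP BY`). It runs against SQLite through Python's standard `sqlite3` module, and the `admissions` table is hypothetical. SQLite does not implement SQL's security statements such as `GRANT`, so that area is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Data definition: create the table structure (its schema).
conn.execute("""
    CREATE TABLE admissions (
        admission_id INTEGER PRIMARY KEY,
        department TEXT NOT NULL,
        length_of_stay_days INTEGER NOT NULL
    )
""")

# Data manipulation: insert rows using parameter placeholders.
conn.executemany(
    "INSERT INTO admissions (department, length_of_stay_days) VALUES (?, ?)",
    [("Cardiology", 3), ("Cardiology", 5), ("Oncology", 7)],
)
conn.commit()

# Query: average length of stay per department.
rows = conn.execute("""
    SELECT department, AVG(length_of_stay_days) AS avg_los
    FROM admissions
    GROUP BY department
    ORDER BY department
""").fetchall()
print(rows)  # [('Cardiology', 4.0), ('Oncology', 7.0)]
```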


Feinstein Institutes for Medical Research

Research Data Life Cycle

Thinking about best practices and planning how to handle data at every stage of the life cycle can help you develop a strategy for data management.

Research Data Management Lifecycle from USCS

NIH Data Management Plan Policy

Previously, the NIH only required grants with $500,000 per year or more in direct costs to provide a brief explanation of how and when data resulting from the grant would be shared.

The 2023 policy is entirely new. Beginning in 2023, ALL grant applications or renewals that generate Scientific Data must now include a robust and detailed plan for how you will manage and share data during the entire funded period. This includes information on data storage, access policies/procedures, preservation, metadata standards, distribution approaches, and more. You must provide this information in a data management and sharing plan (DMSP). The DMSP is similar to what other funders call a data management plan (DMP).

In addition, to reduce burden on investigators also subject to the Genomic Data Sharing (GDS) Policy, NIH will no longer require submission of separate GDS Plans. Instead, one plan will be expected where applicants describe genomic data sharing within their DMSP.

The DMSP will be assessed by NIH Program Staff (though peer reviewers will be able to comment on the proposed data management budget). The Institute, Center, or Office (ICO)-approved plan becomes a Term and Condition of the Notice of Award.

Adapted from University of Arizona Data Management Plan Guide

If you plan to generate scientific data, you must submit a Data Management and Sharing Plan to the funding NIH ICO as part of the Budget Justification section of your application for extramural awards. 

Your plan should be two pages or fewer and must include:

  • Data Type
  • Related Tools, Software and/or Code
  • Standards
  • Data Preservation, Access, and Associated Timelines
  • Access, Distribution, or Reuse Considerations
  • Oversight of Data Management and Sharing

Adapted from University of Arizona

Zucker School of Medicine at Hofstra/Northwell Data Catalog

Data Catalog

The data catalog will be a web-based tool that helps researchers discover data by providing a searchable, browsable online collection of records describing datasets generated by ZSOM, Northwell Health, and Feinstein Institutes researchers.


What is data management?

Data management is an administrative process that includes acquiring, validating, storing, protecting and processing required data to ensure the accessibility, reliability and timeliness of the data for its users.



Contact Us

For questions or comments, email us at:

Hofstra University

This site is compliant with the W3C-WAI Web Content Accessibility Guidelines
HOFSTRA UNIVERSITY Hempstead, NY 11549-1000 (516) 463-6600 © 2000-2009 Hofstra University