Improving Discoverability, Usability and Governance of Priority Agency Data

Overview

To improve the discoverability, usability, and governance of priority agency data assets, agencies shall apply the documentation techniques illustrated below.

Details

The requirements outlined here are meant to be used in conjunction with the documentation for the full DCAT-US Schema v1.1 (Project Open Data Metadata Schema).

Consistent with M-20-16, Federal Agency Operational Alignment to Slow the Spread of Coronavirus COVID-19, wherever the Federal Data Strategy 2020 Action Plan calls for agencies to prioritize data assets and projects, agencies are required to treat COVID-19 response data as their highest priority.

Field: keyword
Required: Agencies must include the keywords COVID-19 and coronavirus.
Recommended: Agencies are encouraged to include further keywords that would improve discoverability.
Example: {"keyword":["COVID-19","coronavirus","viral-testing","CARES-Act","CORD-19","SARS-CoV-2"]}
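As an illustration only, the keyword requirement above could be checked with a short script. This is a minimal sketch, not an official validator: the function name missing_covid_keywords is hypothetical, and the case-insensitive comparison is an assumption rather than something the schema guidance specifies.

```python
# Required keywords from the guidance above, stored lowercase for comparison.
REQUIRED_COVID_KEYWORDS = {"covid-19", "coronavirus"}


def missing_covid_keywords(dataset):
    """Return the required COVID-19 keywords absent from a dataset entry.

    `dataset` is a dict shaped like a DCAT-US dataset entry. Matching is
    case-insensitive, which is an assumption made for this sketch.
    """
    present = {kw.strip().lower() for kw in dataset.get("keyword", [])}
    return sorted(REQUIRED_COVID_KEYWORDS - present)


# An entry that has only one of the two required keywords.
dataset = {"keyword": ["COVID-19", "viral-testing", "CARES-Act"]}
print(missing_covid_keywords(dataset))  # ['coronavirus']
```

An empty list means both required keywords are present.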

Data Assets to Fuel AI R&D

Consistent with the Executive Order on Maintaining American Leadership in Artificial Intelligence (EO 13859), agencies are directed to improve data “inventory documentation to enable discovery and usability [in order to] prioritize improvements to access and quality of data … based on the AI research community’s user feedback.”

Field: keyword
Required: Agencies must include the keyword usg-artificial-intelligence.
Recommended: Agencies are encouraged to include further keywords that would improve discoverability. Datasets that specifically serve as training data for machine learning applications should additionally include the keyword usg-ai-training-data.
Example: {"keyword":["usg-artificial-intelligence","AI","artificial-intelligence","AI-R&D","natural-language-processing","machine-learning","research","COVID-19"]}
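One way to apply these keywords consistently is to add them programmatically when an inventory is generated. The sketch below is illustrative only: the helper tag_ai_dataset and its is_training_data flag are hypothetical names introduced here, and the entry is assumed to be a plain dict with an optional keyword list.

```python
AI_KEYWORD = "usg-artificial-intelligence"
AI_TRAINING_KEYWORD = "usg-ai-training-data"


def tag_ai_dataset(dataset, is_training_data=False):
    """Return a copy of a dataset entry with the AI R&D keywords added.

    The required keyword is always added if absent. The training-data
    keyword is added only when `is_training_data` is True. The input
    entry is not modified.
    """
    keywords = list(dataset.get("keyword", []))
    wanted = [AI_KEYWORD]
    if is_training_data:
        wanted.append(AI_TRAINING_KEYWORD)
    for kw in wanted:
        if kw not in keywords:
            keywords.append(kw)
    return {**dataset, "keyword": keywords}


entry = {"title": "Example corpus", "keyword": ["machine-learning"]}
print(tag_ai_dataset(entry, is_training_data=True)["keyword"])
```

Returning a copy rather than editing in place keeps the original inventory record available for comparison.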
Field: contactPoint
Required: Agencies must include the name and email address of a contact person who can discuss restrictions or controls on the dataset with interested AI researchers.
Recommended: Agencies are encouraged to identify a domain expert, with contact information, who can discuss the dataset with interested AI researchers.
Example: See contactPoint
Field: dataQuality
Required: While dataQuality is generally optional for comprehensive data inventory documentation, it is required for all datasets identified for the purposes of AI R&D.
Example: {"dataQuality":true}
Field: references
Required: While references is generally optional for comprehensive data inventory documentation, it is required when references, including model documentation, exist for data assets identified for the purposes of AI R&D.
Example: {"references":["https://github.com/GSA/AI-Assistant-Pilot"]}
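Taken together, the AI R&D requirements for keyword, contactPoint, dataQuality, and references could be checked along the following lines. This is a hedged sketch, not an official validator. The function ai_rd_problems and its has_references flag are hypothetical; the contactPoint sub-fields fn and hasEmail follow the DCAT-US v1.1 contactPoint definition as understood here and should be confirmed against the full schema documentation; the contact name and email in the sample entry are placeholders.

```python
def ai_rd_problems(dataset, has_references=False):
    """List the AI R&D documentation requirements a dataset entry misses.

    `has_references` states whether references (including model
    documentation) exist for the asset. It is a caller-supplied
    assumption, since the entry itself cannot reveal that.
    """
    problems = []
    if "usg-artificial-intelligence" not in dataset.get("keyword", []):
        problems.append("keyword: missing usg-artificial-intelligence")
    contact = dataset.get("contactPoint")
    if not isinstance(contact, dict) or not contact.get("fn") or not contact.get("hasEmail"):
        problems.append("contactPoint: contact name and email are required")
    if not isinstance(dataset.get("dataQuality"), bool):
        problems.append("dataQuality: required for AI R&D datasets")
    if has_references and not dataset.get("references"):
        problems.append("references: required when references exist")
    return problems


entry = {
    "keyword": ["usg-artificial-intelligence", "machine-learning"],
    "contactPoint": {
        "@type": "vcard:Contact",
        "fn": "Jane Doe",  # placeholder name
        "hasEmail": "mailto:jane.doe@agency.example",  # placeholder address
    },
    "dataQuality": True,
    "references": ["https://github.com/GSA/AI-Assistant-Pilot"],
}
print(ai_rd_problems(entry, has_references=True))  # []
```

An empty list indicates that the entry meets the four requirements checked here; it does not imply that the entry is valid against the full schema.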