This section contains explanations of common terms referenced in Project Open Data and the Open Data Policy.
An application programming interface (API) is a set of definitions of the ways one piece of computer software communicates with another. It is a method of achieving abstraction, usually (but not necessarily) between higher-level and lower-level software.
Rate limiting will be part of any API platform, but without some sort of usage log and analytics showing developers where they stand, the rate limits will cause nothing but frustration. Clearly show developers where they stand with daily, weekly, or monthly API usage, and provide proper relief valves that allow them to scale their usage.
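As a minimal sketch of what "showing developers where they stand" could look like, the following builds a usage summary per rate-limit window. The `UsageWindow` class, the `usage_report` function, and the limits shown are hypothetical names and values chosen for illustration, not part of any particular API platform.

```python
from dataclasses import dataclass


@dataclass
class UsageWindow:
    """Usage against one rate-limit window (hypothetical structure)."""
    name: str   # e.g. "daily" or "monthly"
    limit: int  # maximum calls allowed in the window
    used: int   # calls made so far in the window

    @property
    def remaining(self) -> int:
        # Never report a negative number of remaining calls.
        return max(self.limit - self.used, 0)

    @property
    def percent_used(self) -> float:
        return round(100 * self.used / self.limit, 1)


def usage_report(windows):
    """Build the summary a developer dashboard or API response could show."""
    return {
        w.name: {
            "limit": w.limit,
            "used": w.used,
            "remaining": w.remaining,
            "percent_used": w.percent_used,
        }
        for w in windows
    }


# Illustrative numbers only.
report = usage_report([
    UsageWindow("daily", limit=1000, used=250),
    UsageWindow("monthly", limit=20000, used=19000),
])
print(report)
```

A developer who sees the monthly window at 95 percent can request a higher tier before being cut off, which is the kind of relief valve described above.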
Quality API documentation is the gateway to a successful API. API documentation needs to be complete yet simple, which is a very difficult balance to achieve. Striking this balance takes work, usually from more than one individual on an API development team.
API documentation can be written by developers of the API, but additional edits should be made by developers who were not responsible for deploying the API. The developers who built an API can easily overlook parameters and other details that they have made assumptions about.
Complete, functioning applications built on an API are the end goal of any API owner. Make sure to present all applications that are built on an API in an application showcase or directory. App showcases are a great way to highlight not just applications built by the API owner, but also the successful integrations of ecosystem partners and individual developers.
Basic Auth is a way for a web browser or application to provide credentials in the form of a username and password. Because Basic Auth is integrated into the HTTP protocol, it is the easiest way for users to authenticate with a RESTful API.
Basic Auth is easily integrated; however, if SSL/TLS is not used, the username and password travel with only Base64 encoding, which is not encryption, and can be easily intercepted on the open Internet.
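The sketch below shows how a Basic Auth `Authorization` header is formed, and why it offers no secrecy on its own: the credentials are only Base64 encoded, which anyone can reverse. The function name `basic_auth_header` is an assumption for illustration; the username and password are the example pair used in the HTTP Basic authentication RFC.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Return the value of an HTTP Authorization header for Basic Auth."""
    credentials = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


header = basic_auth_header("Aladdin", "open sesame")
print(header)

# Anyone who intercepts the header can recover the credentials directly.
intercepted = base64.b64decode(header.split(" ", 1)[1]).decode("utf-8")
print(intercepted)
```

Because decoding is this easy, the header should only ever be sent over an encrypted connection.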
A catalog is a collection of datasets or web services.
Working code samples in all the top programming languages are commonplace in the most successful APIs. Documentation will describe how to use an API in a general way, but code samples will speak in the specific language of developers.
A web service that provides dynamic access to the page content of a website, including the title, body, and body elements of individual pages. Such an API often, but not always, functions atop a Content Management System.
A comma-separated values (CSV) file is a computer data file used for implementing the tried and true organizational tool, the comma-separated list. The CSV file is used for the digital storage of data structured in tabular form. Each line in the CSV file corresponds to a row in the table. Within a line, fields are separated by commas, and each field belongs to one table column. CSV files are often used for moving tabular data between two different computer programs (for example, between a database program and a spreadsheet program).
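To illustrate the row and column structure described above, here is a small sketch that reads CSV text with Python's standard `csv` module. The column names and values are made up for illustration. Note that a field containing a comma is wrapped in double quotes so it is not split into two columns.

```python
import csv
import io

# Illustrative CSV content: a header line followed by two rows.
csv_text = (
    "agency,dataset_count\n"
    '"Example Agency, Office of Data",120\n'
    "Example Bureau,85\n"
)

# DictReader uses the first line as column names for every later row.
rows = list(csv.DictReader(io.StringIO(csv_text)))

for row in rows:
    print(row["agency"], row["dataset_count"])
```

All values come back as strings; converting `dataset_count` to a number is left to the program that consumes the file.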
Catalog Service for the Web (CSW) is an API used by geospatial systems to provide metadata in open standards, including in the FGDC-endorsed ISO 19115 schema. The CSW-provided metadata can be mapped into the Project Open Data metadata schema.
A value or set of values representing a specific concept or concepts. Data become “information” when analyzed and possibly combined with other data in order to extract meaning and to provide context. The meaning of data can vary depending on its context. The term covers all data, including but not limited to 1) geospatial data, 2) unstructured data, and 3) structured data.
A hub for data discovery which provides a common location that lists and links to an organization’s datasets. Such a hub is often located at www.example.com/data.
A collection of data elements or datasets that make sense to group together. Each community of interest identifies the Data Assets specific to supporting the needs of their respective mission or business functions. Notably, a Data Asset is a deliberately abstract concept. A given Data Asset may represent an entire database consisting of multiple distinct entity classes, or may represent a single entity class.
A dataset is an organized collection of data. The most basic representation of a dataset is data elements presented in tabular form. Each column represents a particular variable. Each row corresponds to a given value of that column’s variable. A dataset may also present information in a variety of non-tabular formats, such as an Extensible Markup Language (XML) file, a geospatial data file, or an image file.
A hub for API discovery which provides a common location where an organization’s APIs and their associated documentation can be found. Such a hub is often located at www.example.com/developer.
A collection of data stored according to a schema and manipulated according to the rules set out in one Data Modelling Facility.
An association between a binding and a network address, specified by a URI, that may be used to communicate with an instance of a service. An end point indicates a specific location for accessing a service using a specific protocol and data format.
Error Response Code
Errors are an inevitable part of API integration. It is essential to provide not only a robust set of clear and meaningful API error response codes, but also a clear listing of these codes for developers to follow and learn from.
API errors are directly related to frustration during developer integration: the friendlier and more meaningful they are, the greater the chance a developer will move forward after encountering an error. Put a lot of consideration into your error responses and the documentation that educates developers about them.
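One way to keep error codes both meaningful and documented is to generate responses from a single catalog, sketched below. The catalog entries, the `error_response` function, and the `/developer/errors` documentation path are all hypothetical; only the HTTP status codes 401 and 429 are standard.

```python
import json

# Hypothetical catalog: machine-readable code -> (HTTP status, friendly message).
ERROR_CATALOG = {
    "missing_api_key": (401, "No API key was supplied. Add your key and retry."),
    "rate_limit_exceeded": (429, "You have used all calls for this period."),
}


def error_response(code: str):
    """Return (HTTP status, JSON body) for a known error code."""
    status, message = ERROR_CATALOG[code]
    body = {
        "error": {
            "code": code,
            "message": message,
            # Assumed location of the published error listing.
            "docs": f"/developer/errors#{code}",
        }
    }
    return status, json.dumps(body)


status, body = error_response("rate_limit_exceeded")
print(status, body)
```

Because the same catalog can also drive the published listing of codes, the documentation and the live responses are less likely to drift apart.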
GitHub is a social coding platform that allows developers to build code repositories, publicly or privately, and to interact with other developers around these repositories. Developers can download or fork a repository and contribute back, resulting in a collaborative environment for software development.
An event in which computer programmers and others in the field of software development, like graphic designers, interface designers, project managers and computational philologists, collaborate intensively on software projects. Occasionally, there is a hardware component as well. Hackathons typically last between a day and a week. Some hackathons are intended simply for educational or social purposes, although in many cases the goal is to create usable software. Hackathons tend to have a specific focus, which can include the programming language used, the operating system, an application, an API, the subject, or the demographic group of the programmers. In other cases, there is no restriction on the type of software being created.
Information, as defined in OMB Circular A-130, means any communication or representation of knowledge such as facts, data, or opinions in any medium or form, including textual, numerical, graphic, cartographic, narrative, or audiovisual forms.
Information Life Cycle
Information life cycle, as defined in OMB Circular A-130, means the stages through which information passes, typically characterized as creation or collection, processing, dissemination, use, storage, and disposition.
Information system, as defined in OMB Circular A-130, means a discrete set of information resources organized for the collection, processing, maintenance, transmission, and dissemination of information, in accordance with defined procedures, whether automated or manual.
Information System Life Cycle
Information system life cycle, as defined in OMB Circular A-130, means the phases through which an information system passes, typically characterized as initiation, development, operation, and termination.
JSONP or “JSON with padding” is a JSON extension wherein the name of a callback function is specified as an input argument of the underlying JSON call itself. JSONP makes use of runtime script tag injection.
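A minimal server-side sketch of the idea: the caller supplies a callback name, and the server wraps the JSON payload in a call to that function so that an injected script tag executes it. The function name `jsonp_response` and the sample payload are assumptions for illustration. The callback name is validated here because echoing an arbitrary string into executable script would be unsafe.

```python
import json
import re

# Accept only simple JavaScript identifiers as callback names.
_CALLBACK_NAME = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")


def jsonp_response(data, callback: str) -> str:
    """Wrap JSON-serializable data in a call to the named callback."""
    if not _CALLBACK_NAME.fullmatch(callback):
        raise ValueError(f"invalid callback name: {callback!r}")
    return f"{callback}({json.dumps(data)});"


body = jsonp_response({"title": "Glossary"}, "handleData")
print(body)  # handleData({"title": "Glossary"});
```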
Machine Readable File
Refers to information or data that is in a format that can be easily processed by a computer without human intervention while ensuring no semantic meaning is lost.
To facilitate common understanding, a number of characteristics, or attributes, of data are defined. These characteristics of data are known as “metadata”, that is, “data that describes data.” For any particular datum, the metadata may describe how the datum is represented, ranges of acceptable values, its relationship to other data, and how it should be labeled. Metadata also may provide other relevant information, such as the responsible steward, associated laws and regulations, and access management policy. Each of the types of data described above has a corresponding set of metadata. Two of the many metadata standards are the Dublin Core Metadata Initiative (DCMI) and Department of Defense Discovery Metadata Standard (DDMS). The metadata for structured data objects describes the structure, data elements, interrelationships, and other characteristics of information, including its creation, disposition, access and handling controls, formats, content, and context, as well as related audit trails. Metadata includes data element names (such as Organization Name, Address, etc.), their definition, and their format (numeric, date, text, etc.). In contrast, data is the actual data values such as the “US Patent and Trademark Office” or the “Social Security Administration” for the metadata called “Organization Name”. Metadata may include metrics about an organization’s data, including its data quality (accuracy, completeness, etc.).
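The distinction between metadata and data can be sketched with two small structures: one describing each data element (name, definition, format) and one holding actual values. The `METADATA` layout, the `invalid_fields` helper, the "Dataset Count" element, and its value are hypothetical and chosen only for illustration.

```python
# Metadata: describes each data element, not the values themselves.
METADATA = {
    "Organization Name": {"definition": "Official name of the organization",
                          "format": "text"},
    "Dataset Count": {"definition": "Number of datasets published",
                      "format": "numeric"},
}

# Data: the actual values for those elements (illustrative count).
record = {"Organization Name": "Social Security Administration",
          "Dataset Count": 3}

# How each declared format is checked in this sketch.
FORMAT_CHECKS = {
    "text": lambda v: isinstance(v, str),
    "numeric": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
}


def invalid_fields(record, metadata):
    """List the fields that are undescribed or do not match their format."""
    return [
        name for name, value in record.items()
        if name not in metadata
        or not FORMAT_CHECKS[metadata[name]["format"]](value)
    ]


print(invalid_fields(record, METADATA))
```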
An open standard for authorization. It allows users to share their private resources stored on one site with another site without having to hand out their credentials, typically username and password.
Open Source Software
Computer software that is available in source code form: the source code and certain other rights normally reserved for copyright holders are provided under an open-source license that permits users to study, change, improve and at times also to distribute the software.
Open source software is very often developed in a public, collaborative manner. Open source software is the most prominent example of open source development and is often compared to (technically defined) user-generated content or (legally defined) open content movements.
A standard developed or adopted by voluntary consensus standards bodies, both domestic and international. These standards include provisions requiring that owners of relevant intellectual property have agreed to make that intellectual property available on a non-discriminatory, royalty-free or reasonable royalty basis to all interested parties.
A special kind of variable, used in a subroutine to refer to one of the pieces of data provided as input to the subroutine. The semantics for how parameters can be declared and how the arguments get passed to the parameters of subroutines are defined by the language, but the details of how this is represented in any particular computer system depend on the calling conventions of that system.
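A short sketch of the difference between parameters and arguments, using Python's declaration and passing rules as one example of language-defined semantics. The function `area` is hypothetical.

```python
def area(width, height=1.0):
    """`width` and `height` are parameters: names declared by the subroutine."""
    return width * height


# 3.0 and 2.0 are arguments: the actual values supplied by the caller.
# The first is passed by position, the second by keyword.
result = area(3.0, height=2.0)
print(result)  # 6.0

# Omitting an argument makes the parameter take its declared default.
print(area(4.0))  # 4.0
```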
Resource Description Framework - A family of specifications for a metadata model. The RDF family of specifications is maintained by the World Wide Web Consortium (W3C). The RDF metadata model is based upon the idea of making statements about resources in the form of a subject-predicate-object expression…and is a major component in what is proposed by the W3C’s Semantic Web activity: an evolutionary stage of the World Wide Web in which automated software can store, exchange, and utilize metadata about the vast resources of the Web, in turn enabling users to deal with those resources with greater efficiency and certainty. RDF’s simple data model and ability to model disparate, abstract concepts has also led to its increasing use in knowledge management applications unrelated to Semantic Web activity.
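The subject-predicate-object idea can be sketched with plain tuples. This is a deliberate simplification that uses no RDF library and ignores details such as literals versus URI references. The dataset URI and the title and publisher values are hypothetical; the predicate URIs are built on the Dublin Core terms namespace.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str  # "object" in RDF terms; renamed to avoid shadowing the builtin


DATASET = "http://www.example.com/data/dataset-1"  # hypothetical resource
DCT = "http://purl.org/dc/terms/"                  # Dublin Core terms namespace

# Each statement says: <subject> has <predicate> with value <object>.
graph = {
    Triple(DATASET, DCT + "title", "Example Budget Dataset"),
    Triple(DATASET, DCT + "publisher", "Example Agency"),
}


def objects(graph, subject, predicate):
    """Return every object asserted for a subject/predicate pair, sorted."""
    return sorted(
        t.obj for t in graph
        if t.subject == subject and t.predicate == predicate
    )


print(objects(graph, DATASET, DCT + "title"))
```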
A style of software architecture for distributed systems such as the World Wide Web. REST has emerged as a predominant Web service design model. REST facilitates transactions between web servers by allowing loose coupling between different services. REST is less strongly typed than its counterpart, SOAP. REST is not a language but a set of conventions built around nouns (resources) and verbs (HTTP methods), with an emphasis on readability. Unlike SOAP, REST does not require XML parsing and does not require a message header to and from a service provider, so it typically uses less bandwidth.
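The pairing of nouns and verbs can be sketched by building (not sending) HTTP requests with Python's standard library. The base URL, the `datasets` resource name, and the `rest_request` helper are hypothetical.

```python
from urllib.request import Request

BASE_URL = "https://www.example.com/api"  # hypothetical API root


def rest_request(verb: str, noun: str, identifier: str = "") -> Request:
    """Pair an HTTP verb with a resource (noun) URL; nothing is sent."""
    url = f"{BASE_URL}/{noun}"
    if identifier:
        url += f"/{identifier}"
    return Request(url, method=verb)


# The same noun combined with different verbs expresses different actions.
read_one = rest_request("GET", "datasets", "ds-1")
remove_one = rest_request("DELETE", "datasets", "ds-1")
list_all = rest_request("GET", "datasets")

for req in (read_one, remove_one, list_all):
    print(req.get_method(), req.full_url)
```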
A family of web feed formats (often dubbed Really Simple Syndication) used to publish frequently updated works (such as blog entries, news headlines, audio, and video) in a standardized format. An RSS document (which is called a “feed,” “web feed,” or “channel”) includes full or summarized text, plus metadata such as publishing dates and authorship.
An XML schema defines the structure of an XML document. An XML schema defines things such as which data elements and attributes can appear in a document; how the data elements relate to one another; whether an element is empty or can include text; which types of data are allowed for specific data elements and attributes; and what the default and fixed values are for elements and attributes. A schema is also a description of the data represented within a database. The format of the description varies but includes a table layout for a relational database or an entity-relationship diagram. It is a method for specifying constraints on XML documents.
Software Development Kits (SDKs) are the next step in providing code for developers, after basic code samples. SDKs are more complete code libraries that usually include authentication and production-ready objects, which developers can use once they are more familiar with an API and are ready for integration.
Just like code samples, SDKs should be provided in as many common programming languages as possible. Code samples will help developers understand an API, while SDKs will actually facilitate their integration of an API into their application. When providing SDKs, consider a software license that gives your developers as much flexibility as possible in their commercial products.
Expresses a software architectural concept that defines the use of services to support the requirements of software users. In a SOA environment, nodes on a network make resources available to other participants in the network as independent services that the participants access in a standardized way. Most definitions of SOA identify the use of Web services (using SOAP and WSDL) in its implementation. However, one can implement SOA using any service-based technology with loose coupling among interacting software agents.
SOAP (Simple Object Access Protocol) is a message-based protocol based on XML for accessing services on the Web. It employs XML syntax to send text commands across the Internet using HTTP. SOAP is similar in purpose to the DCOM and CORBA distributed object systems, but is more lightweight and less programming-intensive. Because of its simple exchange mechanism, SOAP can also be used to implement a messaging system.
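As a rough sketch of what an XML-based SOAP message looks like, the following builds a SOAP 1.1 envelope with Python's standard library. The service namespace, the `GetDataset` operation, and its `identifier` parameter are hypothetical; only the envelope namespace is the standard SOAP 1.1 one. A real service would define its operations in a WSDL document.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope
SERVICE_NS = "http://www.example.com/catalog"          # hypothetical service


def build_envelope(operation: str, params: dict) -> str:
    """Return a SOAP envelope whose Body holds one operation element."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    op = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in params.items():
        # Each parameter becomes a child element of the operation.
        ET.SubElement(op, f"{{{SERVICE_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="unicode")


message = build_envelope("GetDataset", {"identifier": "dataset-1"})
print(message)
```

The resulting text would typically be sent as the body of an HTTP POST to the service's endpoint.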
A specification and complete framework implementation for describing, producing, consuming, and visualizing RESTful web services. The overarching goal of Swagger is to enable client and documentation systems to update at the same pace as the server. The documentation of methods, parameters, and models is tightly integrated into the server code, allowing APIs to always stay in sync.
Terms of Service
Terms of Service provide a legal framework for developers to operate within. They set the stage for the business development relationships that will occur within an API ecosystem. Terms of Service should protect the API owner’s company, assets and brand, but should also provide assurances for developers who are building businesses on top of an API.
A simple text format for a database table. Each record in the table is one line of the text file. Each field value of a record is separated from the next by a tab character. It is a form of the more general delimiter-separated values format.
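A brief sketch of reading tab-separated text with Python's standard `csv` module, which handles the general delimiter-separated family when given a different delimiter. The field names and values are made up for illustration.

```python
import csv
import io

# One record per line, fields separated by a tab character ("\t").
tsv_text = "identifier\ttitle\nds-1\tBudget, FY2013\nds-2\tStaff Directory\n"

records = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))

for record in records:
    print(record)
```

Because the delimiter is a tab, the comma inside "Budget, FY2013" stays part of a single field.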
Data that is more free-form, such as multimedia files, images, sound files, or unstructured text. Unstructured data does not necessarily follow any format or hierarchical sequence, nor does it follow any relational rules. Unstructured data refers to masses of (usually) computerized information which do not have a data structure that is easily readable by a machine. Examples of unstructured data may include audio, video and unstructured text such as the body of an email or word processor document. Data mining techniques are used to find patterns in, or otherwise interpret, this information. Merrill Lynch estimates that more than 85 percent of all business information exists as unstructured data, commonly appearing in e-mails, memos, notes from call centers and support operations, news, user groups, chats, reports, letters, surveys, white papers, marketing material, research, presentations, and Web pages (“The Problem with Unstructured Data”).
A Web service is a software system designed to support interoperable machine-to-machine interaction over a network. It has an interface described in a machine-processable format (specifically WSDL). Other systems interact with the Web service in a manner prescribed by its description using SOAP messages, typically conveyed using HTTP with an XML serialization in conjunction with other Web-related standards.
An XML-based language (Web Services Description Language) used to describe the services a business offers and to provide a way for individuals and other businesses to access those services electronically.
Extensible Markup Language (XML) is a flexible language for creating common information formats and sharing both the format and content of data over the Internet and elsewhere. XML is a formatting language recommended by the World Wide Web Consortium (W3C).
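A small sketch of how XML carries both the format (element and attribute names) and the content of data, read here with Python's standard library. The `catalog`, `dataset`, and `title` element names and their values are hypothetical.

```python
import xml.etree.ElementTree as ET

# Illustrative document: the tags describe the format, the text is the content.
xml_text = """
<catalog>
  <dataset id="ds-1"><title>Budget</title></dataset>
  <dataset id="ds-2"><title>Staff Directory</title></dataset>
</catalog>
"""

root = ET.fromstring(xml_text)

# Map each dataset's id attribute to the text of its title element.
titles = {d.get("id"): d.findtext("title") for d in root.findall("dataset")}
print(titles)
```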