How to Blend Your Data: BEA and BLS Harness Big Data to Gain New Insights about Foreign Direct Investment in the U.S.
Bureau of Economic Analysis / Bureau of Labor Statistics
Many data users are curious about the effects of globalization on the U.S. economy. A recent collaboration between the Bureau of Economic Analysis (BEA) and the Bureau of Labor Statistics (BLS) helps shed light on the segment of the American workforce employed by foreign multinational companies.
BEA combined its wealth of survey data on foreign-owned U.S. businesses with the BLS Quarterly Census of Employment and Wages (QCEW) and Occupational Employment Statistics (OES) to uncover new insights on employment, wages, and occupations for foreign-owned companies in 2012:
Did you know that Ohio was home to the top two counties in the country in terms of the share of employment attributed to foreign-owned companies? Foreign-owned companies employed 40 percent of workers in Union County and 34 percent in Logan County.
Or that STEM (science, technology, engineering, and mathematics) occupations make up nearly 13 percent of employment in foreign-owned companies, compared with 6 percent in domestically owned companies?
By blending these existing data sets, BEA and BLS produced new information at the national, state, and local levels, as well as additional industry-level detail, without increasing public burden.
This case study shows the opportunities of cross-agency data collaboration, as well as some of the challenges of using big data and administrative data in the federal government.
Both BEA and BLS strive to produce data that are accurate, objective, timely, and relevant. These data are used by the private sector to drive important business decisions and by federal, state, and local governments to craft policy and regulations.
Economic development organizations, business leaders, academic researchers, and foreign investors regularly seek data on foreign direct investment that offer granular detail. As part of its regular publication process, BEA produces national- and state-level data that provide valuable information, but these data users are often interested in information for “their” specific areas, down to the county, city, or metropolitan statistical area (MSA).
At the same time, federal statistical agencies are exploring ways to use big data[1] and administrative data[2] to produce or enhance statistics without adding to the burden that more survey-based data collection would place on the public. BEA and BLS therefore saw this collaboration as an opportunity to use blended data to provide the more granular information their users wanted.
The Direct Investment Division (DID) within BEA’s International Directorate collects data and publishes statistics on the activities (that is, the financial and operating data) of foreign-owned U.S. businesses. These data are collected through annual surveys of U.S. companies with foreign owners. Currently, official U.S. statistics on foreign direct investment in the United States (FDIUS)[3] and on the activities of these foreign-owned U.S. businesses are available mainly at the national level, with a few data items available at the state level.
BLS’ QCEW program publishes quarterly employment and wage data, reported by employers and covering more than 95 percent of U.S. jobs, at the county, MSA, state, and national levels by industry. These data are the product of a federal-state cooperative program in which State Workforce Agencies (SWAs) provide BLS with administrative data on the employment and wages of workers covered by unemployment insurance legislation.
BLS’ OES program is a federal-state cooperative program between BLS and SWAs that collects employment and wage data on nonfarm wage and salary workers in over 800 occupations. The program surveys establishments selected from a list that SWAs maintain for unemployment insurance purposes.
Data from BEA’s 2012 Benchmark Survey of Foreign Direct Investment in the United States were used to identify establishments in the QCEW that were part of foreign-owned companies. These same establishments were then identified in the OES survey data for the 2011-2013 period. Special adjustments were made to the OES methodology to estimate employment and wages by occupation for establishments with foreign ownership.
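The core of the QCEW link can be illustrated with a minimal Python sketch under simplifying assumptions: each establishment record carries an EIN, a county, and an employment count, and BEA's benchmark survey yields a set of EINs belonging to foreign-owned companies. The `Establishment` record layout, its field names, and the toy figures are hypothetical, not the actual QCEW or BEA formats, and the sketch leaves out the manual review and OES adjustments described in this case study.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Establishment:
    """Hypothetical establishment record (not the real QCEW layout)."""
    ein: str         # employer identification number, assumed normalized to 9 digits
    county: str      # county name or code
    employment: int  # e.g., annual average employment


def foreign_owned_share_by_county(establishments, foreign_eins):
    """Return {county: share of county employment at establishments whose EIN is in foreign_eins}."""
    total = defaultdict(int)
    foreign = defaultdict(int)
    for est in establishments:
        total[est.county] += est.employment
        # Assumption: an establishment is attributed to a foreign-owned company
        # when its EIN exactly matches an EIN from BEA's survey data.
        if est.ein in foreign_eins:
            foreign[est.county] += est.employment
    # Skip counties with zero total employment to avoid division by zero.
    return {
        county: foreign[county] / total[county]
        for county in total
        if total[county] > 0
    }


# Toy data for illustration only; these are not BEA or BLS figures.
records = [
    Establishment("111111111", "County A", 400),
    Establishment("222222222", "County A", 600),
    Establishment("333333333", "County B", 250),
    Establishment("444444444", "County B", 750),
]
shares = foreign_owned_share_by_county(records, foreign_eins={"111111111", "333333333"})
print(shares)  # {'County A': 0.4, 'County B': 0.25}
```

Counting at the establishment level is what makes county detail possible in this sketch: BEA's survey reports are consolidated at the enterprise level, so the employment has to be located through the individual establishments that carry a matched identifier.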
Although the team focused on a “research” set of statistics for calendar year 2012, data users are interested in an ongoing series with data for more recent years. Both agencies are exploring the possibility of producing a similar data set for a more recent period.
Blending data sets can be very labor-intensive, requiring a large upfront investment of time and resources to build the initial data link and to meet legal and privacy requirements. To ensure data protection and security, BEA and BLS staff created a new interagency agreement for data sharing. To facilitate the linking, the team recommends developing procedures that allow secure “on-site” access to confidential data at participating agencies. And although common identifiers available in both data sets made an initial automated match possible[4], manual review and validation may still be necessary and should be factored into the production timeline. Laying the groundwork for linking the data might seem daunting, but once the initial link is complete, subsequent links may be less time- and labor-intensive, allowing more frequent linkages.
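The split between an automated first pass and manual review might look like the following Python sketch. It assumes EINs arrive in inconsistent formats (for example, with or without the hyphen after the first two digits), normalizes them to nine digits, matches them exactly, and routes anything that cannot be matched automatically to a review queue. The function names `normalize_ein` and `automated_match`, and the rule of sending every non-match to review, are hypothetical illustrations rather than the agencies' actual procedure.

```python
import re


def normalize_ein(raw):
    """Hypothetical helper: strip non-digits and return a 9-digit EIN string, or None if malformed."""
    digits = re.sub(r"\D", "", raw or "")
    return digits if len(digits) == 9 else None


def automated_match(survey_eins, qcew_eins):
    """Exact-match survey EINs against establishment EINs (hypothetical first pass).

    Returns (matched, needs_review): normalized EINs found in both sources,
    and the raw survey entries that were malformed or had no counterpart.
    """
    qcew_set = {e for e in (normalize_ein(x) for x in qcew_eins) if e}
    matched, needs_review = set(), []
    for raw in survey_eins:
        ein = normalize_ein(raw)
        if ein is not None and ein in qcew_set:
            matched.add(ein)
        else:
            # Malformed or unmatched identifiers are kept for analysts to
            # resolve by hand (e.g., by checking company names or addresses).
            needs_review.append(raw)
    return matched, needs_review


matched, review = automated_match(
    survey_eins=["12-3456789", "98 7654321", "55-555"],
    qcew_eins=["123456789", "111111111"],
)
print(sorted(matched))  # ['123456789']
print(review)           # ['98 7654321', '55-555']
```

Keeping the unmatched entries instead of discarding them reflects the point made above: the automated pass shrinks the workload, but the remaining cases still need analyst time in the production schedule.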
The collaboration produced many benefits. BEA and BLS combined existing information to create a new data set that satisfied the needs of their data users, allowing the agencies to further their missions without the more substantial resource investment traditionally needed to produce new data products. No additional data collection was necessary, so public burden did not increase. In addition, information relating enterprises to establishments, a byproduct of the link, will be useful for other linking projects, and blending survey data with administrative data makes it possible to improve and validate the survey data.
This BLS-BEA collaboration provides a model for other agencies to publish linked or blended data to satisfy user demands for expanded data products, illustrating the necessary resource investments for such projects, as well as their payoff.
The Incubator Project helps federal data practitioners think through how to improve government services, enabling the public to get the most out of federal data. This Proof Point and others will highlight the many successes and challenges data innovators face every day, revealing valuable lessons learned to share with data practitioners throughout government.
[1] Big data are data sources typically described by their volume (number of records or file size, usually too large to open on a desktop computer), velocity (high frequency of data generation), and variety (high-dimensional data with a large number of fields, types of data such as imagery and text, or varying data structures). The main benefits of big data are (1) higher statistical power from more observations, (2) greater coverage of variable concepts, and (3) higher-resolution information that enables more granular insights.
[2] Administrative data are sources of information that are collected for record-keeping and operational purposes. These may include transactions, registries, or other ‘touch points.’ If the data cover a sufficiently large proportion of the population, administrative data may be viable substitutes for survey collections.
[3] BEA’s statistics on FDIUS and on the activities of foreign-owned U.S. businesses are produced from BEA surveys of such businesses, which report at a consolidated enterprise level for all their U.S. operations (i.e., one report can cover hundreds, or even thousands, of business establishments). This level of reporting does not allow statistics to be produced at a subnational level, except for the select data items that are collected by state.
[4] For this project, the employer identification number (EIN), a unique nine-digit number issued by the Internal Revenue Service to identify a business entity, was used as the common identifier.