The next session featured two speakers from the UK Office for National Statistics (ONS), Peter Brodie and Sarah Henry, and one speaker from Statistics Canada, Eric Rancourt.
PETER BRODIE: FOREIGN VIEW OF THE BENEFITS AND COSTS OF TRANSPARENCY
Peter Brodie began his presentation by providing some context, noting that he was very interested in the presentations that preceded his because issues regarding transparency are very different in the United States and the United Kingdom. The United Kingdom has the fifth largest economy in the world, and the ONS has the responsibility of producing the information that supports the public good through better statistics that provide for better decisions, including economic ones.
Brodie said that there are costs and benefits of that work. Unfortunately, the benefits are hard to measure, and the burdens on businesses for providing data and on both businesses and people to respond to surveys are not easy to assess either. As ONS increases its use of administrative data, the burden becomes increasingly hard to measure. What is the burden of collecting administrative data, and what is its value to decision making for public policies? Brodie said the office’s challenges and opportunities include the following: matching and linking data to describe the society and the economy, exploiting nonsurvey data in the age of the data revolution, remaining trusted in an era of allegations of “fake news,” and safeguarding the data while allowing it to be used as a public asset for the public good.
Brodie explained that the ONS has a unique position: as a result of its legal framework to collect data, it is in control of a tremendous public asset. Thus, it has a duty to exploit that public asset for the public good. He pointed out that ONS’s approach is different from that in the United States in part because the office is part of the UK Statistics Authority, which is itself answerable to Parliament, but it is a body that is independent of government. That is a subtle but important difference, he stressed.
Brodie explained that the UK Statistics Authority comprises three entities. One is the Government Statistical Service, a cross-government network, led by the national statistician. Although ONS produces most of the statistics for the Authority, there are other government agencies that produce their own: notably, agricultural statistics and health statistics are not under ONS's jurisdiction. The second entity is ONS; it not only produces the majority of the nation's official statistics, but it also has the goal of improving national statistics. The third entity is the Office for Statistics Regulation, which ensures that statistics are produced and disseminated in the public interest. It does so by assessing official statistics for compliance with the Code of Practice for Official Statistics, monitoring and reporting publicly on statistical issues, and challenging the misuse of statistics. Looking toward the future, Brodie said, he expects ONS's organization to be quite different from what it currently is. Three layers are expected to change: delivery and impact, production and flow, and capability and capacity. Delivery and impact are the most visible parts of the organization; production and flow are where all of the statistics are produced; and capability and capacity are the least visible part of the operation, encompassing how ONS actually reacts to what is needed outside of itself.
One of the key drivers for change is how to be much more flexible in the future, Brodie said, not just to produce the statistical products ONS has always produced. What are the policy issues coming in the future? For example, something that is topical in the United Kingdom at the moment is the measurement of net migration. Brodie said ONS should be ahead of the game on what it is doing on this issue and not wait to be asked by policy makers. Another example is in the measurement of services, which could be more agile. On this topic, ONS is determined to work more closely with academics in the future. Internationally, he said, ONS realizes that collaboration with its international partners will answer the questions facing UK society. Brodie noted that with Britain’s exit from the European Union, those international relationships will change. How this will play out remains to be seen, he said, but it may lead to an increased focus on relationships with the OECD, the International Monetary Fund, and the United Nations. Eurostat and other national statistics institutes in Europe and around the globe will remain key partners and colleagues, he said.
SARAH HENRY: FOREIGN VIEW OF THE BENEFITS AND COSTS OF TRANSPARENCY
Sarah Henry began her presentation on the subject of what opportunities transparency and reproducibility will provide, not only to ONS as a statistics agency but to society and the economy. The first opportunity, she said, is enhanced knowledge. ONS staff have spent some time thinking about how to use the information they have to provide the story that sits behind some of the country's most important issues: not just describing the society and the economy, but understanding them. She said that over the past decade ONS has increasingly looked at how to describe the economy and society together; they are not two separate entities.
Henry noted that the United Kingdom is facing some type of “social recession” that has not been predicted. For many survey questions in the United Kingdom, the results show a split in society. There is a huge amount of blogging and tweeting, but the volume is no indication that people really understand what happened in terms of Brexit, for example. What the agency can do, she suggested, is to make sure that data and methods are made available to people who want to focus on explaining these important issues. Democratizing the data in a safe and responsible way is an important pillar of democracy.
Turning to products, Henry said that ONS has a wide range of statistical data products. Many of them are just descriptive statistics. However, ONS wants to engage a wider range of users, so it is improving its website and how it tweets to help its users understand and interpret what the agency is producing. Although many years ago most users may have been professionally knowledgeable, the vast majority of current users are lay users of statistics.
Henry explained that another big opportunity for ONS is the data revolution. There is a huge amount of data produced, some of which is administrative data that taxpayers have paid to produce and arguably own as well. How does ONS make that available? In many cases, data are by-products of companies producing things—and those data are very important to them—but there are also data that are products in their own right. She gave an example of the difference. Mobile phone companies provide services and, as a by-product, produce very interesting data that can help explain and describe a wide range of phenomena, such as commuting patterns. In contrast, a company like Thomson Reuters, which offers intelligence, technology, and human expertise, produces data that are very important to explaining the economy and society. ONS would like to tap into those kinds of data.
One very important achievement in the past few months is the passing of the Digital Economy Act, Henry said, particularly with regard to improvements for data sharing. Data sharing touches on a number of areas, although statistics and research are the ones that matter most to ONS in the context of this workshop. Another very important improvement in the new act relates to public service delivery. The underlying argument, she said, is that public service agencies could stop operating in silos and gain access to data that help them improve delivery of services.
Henry said that the new legislation generally gives ONS the right to go to pretty much any organization and say it needs that organization's data (though it is not quite as simple as that). It then uses those data to improve its data products. She noted that ONS is now attempting to replace some of its survey data with administrative or other data and is creating an entirely new set of other statistics, along with enhancing its statistical outputs.
Another opportunity and challenge for ONS is centered around trust, Henry said. This is where transparency is of the utmost importance. ONS has regulations and it also attempts to explain the statistics, as well as the processes used to produce them, as much as possible. ONS is not shy about addressing inaccurate representation when it happens.
Henry noted that the United Kingdom’s national statistician announced that the practice of prerelease access for some users would cease as of July 1, 2017. This change had been under debate for quite a while, and she said it represents the importance of transparency.
ONS also has a code of practice.1 Henry said that her sense is that the code captures quite a lot of the things that were mentioned in the previous talks. It is written in plain English to make it accessible to as many people as possible, but there are technical notes included as well. Of the eight principles, she highlighted the one concerning sound methods and assured quality. Within that principle, she indicated eight practices:
- Seek to achieve continuous improvement in statistical processes.
- Produce official statistics to a level of quality that meets users’ needs.
- Adopt quality assurance procedures.
- Inform users about the quality of statistical outputs.
- Publish quality guidelines and ensure staff are suitably trained in quality management.
- Produce official statistics according to scientific principles.
- Promote comparability within the United Kingdom and internationally.
- Produce consistent historical data where possible.
Henry acknowledged that there is tension. When one gets into the complexities, ONS tries its best to produce consistent historical data, but it is not easy. The treatment of time series is a good example of the difficulty: if ONS starts using different methods and different data, it will lose that continuity. What is the tradeoff between using an inferior estimate and reducing comparability? How much is an agency willing to invest in running both in parallel? Given ONS's limited resources, doing so is not always easy.

1 See https://www.statisticsauthority.gov.uk/publication/code-of-practice [January 2018].
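The continuity tradeoff Henry describes is often handled by running the old and new methods in parallel for an overlap period and chain-linking the two series. A minimal sketch with entirely hypothetical numbers and a simple average-ratio link (this is an illustration of the general technique, not a description of any ONS procedure):

```python
# Illustrative sketch: splicing a time series when the production method
# changes, assuming both methods run in parallel for an overlap period.
# The linking factor rescales the new series to preserve continuity.

def splice_series(old, new, overlap):
    """Chain-link `new` onto `old` using the average old/new ratio over
    the `overlap` periods that both methods measured."""
    ratios = [o / n for o, n in zip(old[-overlap:], new[:overlap])]
    factor = sum(ratios) / overlap  # average old/new ratio in the overlap
    # Keep old values up to the break; rescale the new series from there on.
    return old[:-overlap] + [factor * x for x in new]

# Hypothetical numbers: the new method reads slightly lower in the overlap.
old = [100.0, 102.0, 104.0, 105.0]
new = [99.8, 100.7, 103.0, 106.1]  # first two periods overlap with old's last two
spliced = splice_series(old, new, overlap=2)
```

The basic cost Henry mentions is visible here: every overlap period must be produced twice, once under each method, before the link can be computed.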
Regarding official statistics as a public asset, Henry said, data are probably the most important manmade asset in a society. Furthermore, it is crucial to see them as a public asset, because the public paid for them. The goal is to provide as much access as possible to the raw material. Obviously, ONS cannot provide all of the raw data, since there are important questions and constraints concerning confidentiality. However, she said, ONS can probably do more than it does now in providing access to raw data. Henry explained that ONS has something called the Secure Research Service, which provides, in a very secure environment, the rawest data possible. Sometimes, they are just de-identified data. The Secure Research Service also provides samples of data from the census and surveys. Accredited researchers who explain who they are and what they want to do with the data can access the data in that environment. ONS is working on improving that even further, she said. Most researchers use the service to carry out their own research, which would previously have been impossible. In addition, details of research projects are published to promote greater transparency, and accredited researchers agree to publish the results of relevant research.
Finally, Henry said, there is the issue of methodological transparency. ONS is now building a methods library. The goal is for the methods library to be available to the public so that when people access the data that ONS has made available, they can also access the methods used to create those data. Furthermore, the methods library will include not just the methods but also the code that sits behind the methods, including such information as when it was last updated and when it will be updated again.
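Given Henry's description of a methods library that pairs each dataset's methods with the code behind them and their update dates, one way to picture a single entry is a simple record structure. All field names and values below are illustrative assumptions, not ONS's actual schema:

```python
# Hypothetical sketch of one record in a public methods library: the method
# description, a pointer to the code behind it, and the update metadata
# Henry mentioned. The schema and values are invented for illustration.
from dataclasses import dataclass
from datetime import date

@dataclass
class MethodRecord:
    dataset: str       # the statistical output the method produces
    method: str        # plain-English description of the method
    code_url: str      # repository holding the code behind the method
    last_updated: date # when the method/code was last updated
    next_review: date  # when it is scheduled to be updated again

record = MethodRecord(
    dataset="Example price index",
    method="Chained Laspeyres index, annually re-weighted",
    code_url="https://example.org/methods/price-index",  # placeholder URL
    last_updated=date(2017, 6, 1),
    next_review=date(2018, 6, 1),
)
```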
ERIC RANCOURT: FOREIGN VIEW OF THE BENEFITS AND COSTS OF TRANSPARENCY
Eric Rancourt said that his goal was to outline the information management strategy that they have at Statistics Canada. In addition, he wanted to point out issues related to transparency and reproducibility along the way. He said that his presentation would be a little bit Statistics Canada-centric, but he would at times present a picture of the whole Canadian government and how it functions.
Statistics Canada is very centralized. It has all types of statistical activities, including national accounts, balance of payments, censuses of population and agriculture, economics, social justice, and the like. One difference between the context of the United Kingdom and that of Canada is probably
the momentum at all levels at this time. In society and in the Parliament, there is a very favorable view of Statistics Canada. In fact, the current government, as part of its electoral platform, explicitly included statistics. The first item in the mandate letter by the prime minister to the minister responsible for Statistics Canada was an order to reinstate the mandatory long-form census and modernize the agency, and the second item was about increasing access to data not only for Statistics Canada but for the whole of government. And many ministers’ letters instructed them to make decisions based on information and to connect with Statistics Canada if data could be useful. Rancourt said these instructions set a tone of Statistics Canada’s importance to public policy.
When the new cabinet was formed in November 2015, the first decision of the new government was to reinstate the mandatory long-form census. Statistics Canada also has a relatively new chief statistician, and he has embarked on an aggressive modernization agenda to update how the agency works and how it positions itself within the national statistical system, starting within the federal system. Instead of just producing statistics on a cost-recovery basis, the goal is for Statistics Canada to be much more proactive and engaging on entering into partnerships and participating in other departments’ production cycles. The same goal applies for the provinces.
Rancourt noted that since 1953, Statistics Canada has been very survey-centric, but it is now changing the paradigm to consider administrative data first. By administrative data, he said, he means nonsurvey data, which include satellite imagery, sensor data, data from the private sector, data from telephone companies, data from credit cards, etc.
The agency will take stock of what exists administratively and then carry out surveys for what is missing, for what is not collected frequently enough, or for assisting in improving data quality. He stressed, however, that this change does not mean that Statistics Canada is going to throw away the 350 surveys that it runs. Some of them will remain because they are successful and the alternatives are not preferred. Statistics Canada has already made a big start in this direction, he said. Every year, it acquires about 13,000 administrative files from outside the agency. At some point, the intake of administrative data is going to surpass its other collection efforts.
Statistics Canada’s information management strategy is the main interest of this workshop, although Rancourt said it is a work in progress. Information management is a priority for the government of Canada, for the chief statistician, and for Library and Archives Canada. The vision is that Statistics Canada brings value to Canadians using information assets. Information is considered an asset that is digitally available, optimally collected and processed, and professionally safeguarded. He said that managing information is a legal obligation, a policy obligation, and a business imperative.
Rancourt said that in terms of what Statistics Canada is trying to achieve, one could think of its strategy as a pyramid. At the bottom is the information that Statistics Canada knows it holds. Moving up the pyramid, that information is made accessible and usable. Then that information is managed as a strategic asset. In the past, Statistics Canada did not talk about retention periods and access responsibilities, and deleting files was almost a sin. But it turns out that some files are not of business value, he said. Information management has now been gradually integrated into daily activities, which is a culture shift for staff at Statistics Canada. As a result, employees can perform their activities more effectively, and given that, Canadians are better served.
Rancourt listed six benefits of information management as follows: (1) it helps to preserve corporate memory, which enhances reproducibility; (2) it identifies, documents, and preserves corporate information assets; (3) it facilitates information access and retrieval and increases work efficiencies; (4) it improves data sharing, knowledge transfer, and the preservation of information; (5) it reduces the amount of information retained, keeping only what is of business value; and (6) it reduces the risk of information loss. He said that one major goal of this approach is that within the federal family, every dataset produced and collected could be made available to everyone else. Although that is not going to happen due to the sensitivity of some information, there is openness by design.
Looking at the policy and legal instruments for Statistics Canada, Rancourt said, there is a government-wide information management policy, the Management Accountability Framework, which is a monitoring process under the secretary of the Treasury Board to determine compliance. There are also the Statistics Act, Statistics Canada’s policy and strategy, the policy informing users of data quality and methodology, and finally, audits. In terms of the audits, he noted, it is better not to wait for them but to make information available on a voluntary basis.
Rancourt added that there are outside drivers for Statistics Canada’s information management strategy. Society is changing, and there are increased demands and different views of statistics and data needs. There is an open government plan at the same time as a lot of innovation and new technologies, and standardization of various kinds. The standardization includes international standardization, which includes the General Statistical Business Process Model and the Statistical Data and Metadata Exchange, as well as the Data Documentation Initiative. Statistics Canada is following these.
Rancourt then talked about the strengths and needs of Statistics Canada’s information management strategy. The strengths are a rich heritage of data management, a secured processing environment, a strong culture of confidentiality, a set of highly skilled and professional employees, and an
exemplary track record of an “information contract” with Canadians; they trust us with their data, he said.
The Statistics Canada information management strategy is centered on three pillars. The first is people and information. For example, the agency is embarking on a mobility approach for staff so people can work from home, which involves working with data from portable devices. Related to this, Statistics Canada is considering using the cloud for some aspects of computing. The agency will not put the census on the cloud tomorrow, but it might be able to use the cloud for storing some data, Rancourt said.
The second pillar of the strategy is comprehensiveness. It covers data (microdata, aggregate data, metadata), documents (articles, presentations, e-mail, spreadsheets), and other information (collaborative space, corporate services information). Statistics Canada allows everyone to work in a collaborative space, but information of business value covers documents that are collaboratively refined to a final version.
He noted that the third pillar, which is still being built, is the use of modern tools. These are not necessarily state-of-the-art software, but they are much more modern than what has been used to date. There are three systems that support this pillar, Rancourt explained. One is the Government of Canada Documentation System (GCDOCS); the second is Picasso, a system that manages data and metadata; and the third is the Corporate Access Request System (CARS). These systems are to be fully implemented in 2019. Statistics Canada recently started implementing GCDOCS and Picasso. The agency also has other systems, including MEDOC for methodology documents, which is an internal system that Statistics Canada would like to make available to everyone. He noted that the agency also has other tools for finance and human resources.
Rancourt said that Statistics Canada is trying to distinguish between information of business value and information that is transitory. One of the things that the agency has not done well in the past is remove files that are no longer needed. Once transitory information has been used, it can be discarded. This strategy entails putting an end date on the preservation of information, because otherwise a system could be overwhelmed by the proliferation of files and versions. He added that in the stewardship of information, a strong set of security practices may be necessary.
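The distinction Rancourt draws could be sketched as a simple retention rule: transitory items carry an end date after which they may be discarded, while information of business value is kept. A hypothetical illustration (the 90-day window is an assumption for the example, not a Statistics Canada policy):

```python
# Illustrative retention rule: transitory files past their retention window
# can be discarded; information of business value is kept indefinitely.
# The retention period here is an invented placeholder.
from datetime import date, timedelta

def is_disposable(created, transitory, today, retention_days=90):
    """Return True if a transitory item has passed its end date."""
    if not transitory:
        return False  # business-value information is retained
    return today > created + timedelta(days=retention_days)

today = date(2018, 1, 1)
old_draft = is_disposable(date(2017, 1, 1), transitory=True, today=today)
dataset = is_disposable(date(2017, 1, 1), transitory=False, today=today)
```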
The next issue Rancourt addressed was the cost of implementing the strategy. First, there are the costs of analysts’ time. Currently at Statistics Canada, 35 people are responsible for aspects of information management. As part of that group, there is a subgroup that looks at the legal aspects and the directives and policies. Second, he said, is the cost of developing and acquiring systems, maintenance, licensing, and related activities. He noted that there is also a program effort, because if one deploys a new
system, it impacts the managers and the employees. Third, he said, there is bureaucratic creep. If one creates many layers of practices, there is a risk that the agency is adding policies and procedures without subtracting anything. So there might be some kind of assessment when there is an opportunity to simplify things. To address this issue, he said, Statistics Canada explicitly put in place a lean process to prevent having too much bureaucracy.
Rancourt then discussed progress to date on implementing Statistics Canada’s information management strategy. Many new policy instruments have been developed, the agency has modernized the library, and solid microdata and aggregated data management practices are in place. The agency also has many metadata-driven statistical programs, and it has promoted a culture of information management. And as he noted earlier, the Picasso statistical and metadata management system has been developed, CARS has been implemented for some divisions, and the GCDOCS document management tool has been prepared. As a result, Rancourt said, instead of having a file folder system for each employee, Statistics Canada will have only one corporate file folder with GCDOCS. This is all occurring while enabling a more mobile workforce.
Rancourt then looked at the path ahead. Statistics Canada will soon have completed a shift to a modern information management culture. The benefits to employees will become more and more obvious—ease of storage, ease of retrieval, no need to maintain one’s own classification, and ease of information sharing. In addition to the new systems just mentioned, the agency has strengthened information management communications aligned with a mobile workplace. He acknowledged that some units have been somewhat reluctant to go along, not fully adopting everything, which creates inconsistencies. Uniform adoption is important, he said, and for this to happen, the value of information management has to be clear to employees. He said that the resistance has come mostly from employees and less from senior management. Rancourt said that it is important to build a sound architecture before one switches systems. Software deployment is not only a switching of tools, but an evolution in information management practices. Rancourt noted that the strategy seems to be working well when there is a matrix approach. The Statistics Canada group dedicated to information management has delegates in areas that it wants to change.
The strategy faces challenges, he said. One is legacy information, that is, paper holdings. Statistics Canada has an electronic vault with 120 million e-mails from employees who are no longer at the agency that have to be dealt with, as well as some physical files. Once the new strategy is fully in place, the agency still has to get rid of or manage those data. Furthermore, there are specific requirements for regional offices and research data centers for which managing information is slightly different, mainly in
terms of different levels of access. He noted again that Statistics Canada is also considering and embracing cloud environments, and asked, What can Statistics Canada process there? How can Statistics Canada manage that with the protection of confidentiality and privacy? Using the cloud is going to be a substantial change, he said.
Rancourt closed by noting that a few months ago the government decided to renovate the privacy act. He is hopeful that it will become an act that is open and that allows Statistics Canada to play a more central role in the Canadian system.
The first question from the floor was about the architecture and goals of the new systems and the progress toward implementation. The questioner also noted that a Google search turns up nothing for Picasso, the system Rancourt had described.
Henry responded that ONS is in the process of setting up a private cloud on Cloudera with a range of tools. A statistical production platform on it will draw from a data platform and the methods library. Both of those will be managed so that they have one repository with access controls. That is the new vision. For the development of this capability, ONS is using agile development, and currently there are a handful of examples on it, along with strategies for migration from existing platforms. Henry said that the UK 2021 census will exist on that production platform.
Rancourt acknowledged that Picasso is not “out there.” It was initiated 3 years ago, and a few more months are needed to complete it. It is a Statistics Canada product for its own data files. In response to a follow-up question about architecture, particularly the logical architecture, Rancourt responded that in Canada, information technology is centralized. Shared Services Canada provides service to all departments, including Statistics Canada. A few services are on the agency’s own premises. Shared Services Canada is considering a private cloud space for processing, as well as public and even community cloud computing.
Rancourt continued that traditionally their data were from survey collections, but now they have a lot of administrative data. Some of the data are already public, but some are only for internal consumption. In terms of the logical architecture, Rancourt said that his difficulty in answering is because it has changed. If he answers in terms of what now exists, it may not be what Statistics Canada wants it to be in the future. However, he said, the agency wants to go mainly with cloud computing. The problem is that right now there are two networks: one network and set of systems is internal to Statistics Canada, while the other network is available externally and is built around it. Statistics Canada is trying to change that so that all of its holdings will be mainly on the public network, with just a few on the private one.
The next question noted that both ONS and Statistics Canada seem to be moving away from the traditional outputs to the idea of outcomes. If the
outcomes are better decisions by the decision makers, who determines what those outcomes are going to be? Who decides how to measure them? And how is someone going to evaluate whether or not the resulting decisions are effective? One participant noted that it is a complicated issue because there are so many intermediate filters.
In responding, Henry first offered a slight correction. ONS is not moving away from outputs at all. Outputs are very much what it produces. What ONS is trying to do is be more efficient in creating those outputs so that it can carve out capacity for outputs that are more relevant to what decision makers are asking the agency to produce. She said this means more thematic outputs, more frequent outputs, etc. In terms of outcomes, the result of better decisions by users is something ONS obtains input on through user engagement. ONS believes that setting priorities on the resources devoted to various issues is something that it could be doing. This priority setting includes stopping work on projects that are not being used.
Rancourt said that Statistics Canada, roughly speaking, has complete authority and freedom for its work and what is done to create content, but that is always achieved through comprehensive consultations. For surveys, Statistics Canada has the authority to decide what is to be covered. For example, Statistics Canada is responsible for economic statistics, but there is nothing specific about which variables or which topics. The Statistics Act lists topics, such as transportation, agriculture, etc. One example is cannabis. Legalization is likely coming soon, so Statistics Canada is trying to get ahead of things by planning to estimate the current sales. He said the agency can do this because it does not have to report to any ministry—it has a minister that represents the agency in the cabinet. The chief statistician is at the level of a deputy minister and has access to the meetings of the deputy ministers. Thus, Statistics Canada is trying to inform decision making as thoroughly as possible.
The next question was about Rancourt’s statement that the Statistics Canada approach is to use administrative records first. Rancourt said that Statistics Canada’s approaches are still very survey-centric. Over the years, the agency used administrative records data, like most other statistical agencies, to help build frames and to have information for more intelligent sampling, for editing, for imputation, for calibration, and for estimation. More recently, however, Statistics Canada has started replacing surveys. For example, instead of asking for revenue in the census, the agency tells respondents that it can get it from their tax forms if they provide the approval to do so. This reduces response burden. More broadly, he said, there exists a wealth of data that can help provide information on a wide variety of topics for which Statistics Canada already provides summary statistics. Credit card companies have retail information, so why does Statistics
Canada conduct a retail survey? Statistics Canada is trying to shift to using these alternative sources instead of fielding surveys. If there are some gaps in the administrative source, then the agency can use a survey to try to compensate. Or maybe it would be useful to carry out a survey for the changes over time or to assess the quality of the data.
The participant continued that the interesting scientific question is how to design surveys to validate administrative data, in contrast to using surveys to get the data and using administrative data to validate the surveys. Rancourt responded that he thinks the key point for official statistics is to remain capable of doing inference for the complete population. Private-sector organizations and others produce data based on what they find, with much less concern about coverage and bias because they have different priorities than a national statistical organization such as Statistics Canada.
The next question centered on whether ONS and Statistics Canada get much input from their stakeholders about quality criteria and about the information necessary to support informed decisions. Brodie noted that this was an important point because, historically, ONS’s users have not worried much about quality: they just want a number. As part of the agency’s code of practice, he said, it always releases quality information, but how much of it is used is not clear. Education plays a role here. A current example is the measurement of migration into the United Kingdom. The agency has determined that the annual figure has changed from 350,000 people to 273,000 people, with an error of 40,000. Given the magnitude of that error, the measure is not adequate for its purpose, which is to tell whether immigration is increasing or decreasing. Therefore, there is a real need to educate policy makers, he said.
The next question concerned what Statistics Canada knows at present, or is moving toward, in terms of understanding the cost structure of collecting various data and generating various estimates. With these changes, is Statistics Canada saving money, or does it have higher quality or a greater degree of granularity in its statistical products? Rancourt said that Statistics Canada started with the mindset of establishing the business case: if the agency can get the data for less than it costs to produce them directly, it should do so. But the modernization agenda has shifted with the arrival of the new chief statistician. The new focus is how Statistics Canada can more proactively and effectively be part of the overall data-driven decision-making system. That is, for data produced outside of the agency, how can Statistics Canada help with editing, imputation, and data quality? The agency is even entering partnerships in which others will collect data for it, and it might collect data for others. There is not always going to be a clear-cut line between producer and recipient in these data-sharing activities, Rancourt explained.
In terms of costs, he said, Statistics Canada is certainly trying to save money because every time it does so, it provides an opportunity to do something else toward the next priority. Thus, Statistics Canada is trying to get rid of the less relevant programs and create new ones. In response to the question about what users want, Rancourt said that the agency constantly gets requests from users, but not for accuracy: timeliness is the driver. He explained that if a department asks for a survey on a given topic and the agency agrees, telling them that the results will be available in 3 years will cause the department to go somewhere else—to a source that will likely produce data of much poorer quality than Statistics Canada would have produced.
Rancourt added that another characteristic of data quality that is not valued the way it used to be is coherence. That is, Statistics Canada now produces statistics that are not necessarily exhaustive or perfectly benchmarked in all dimensions. However, he noted, some users have become more sophisticated and can tolerate statistical discrepancies among different sources, which is changing the way the agency does calibration and weighting. The change is also giving the agency a bit more freedom on quality and perhaps to speed up some processes.
Another participant raised questions about challenges to transparency: once an agency moves to administrative data, it loses control over the data production process. Because the agency does not control the entire process, the metadata associated with acquisition and the documentation of how the data are collected will pose a bigger challenge than if the agency had done the collection in-house. How does this affect ensuring transparency, down to where the data are collected and transmitted, without degrading the information acquired along the way?
Henry responded that, first of all, the metadata are key. ONS is sending staff to the government departments from which it receives administrative data so that they get to know the data at least as well as those departments do. The agency will then link and match those data on its data management platform to make them more useful, and it will also apply techniques such as editing and imputation to create a higher-quality version. There will be a quality improvement process as well, she said. There will be quality implications from losing control, but these are offset by the scale of the data, their reach, and the fact that users of the underlying services have a vested interest in the data being accurate. Henry said that the agency is hoping that the quality will be better.
In response to a follow-up question about embedding staff with commercial providers, Henry agreed it may not be feasible. However, she said, there are things that the agency can offer; there is a tradeoff. The challenge that ONS now faces is that people are less willing to participate in surveys, and there is also less trust in survey responses. On the whole, she said, she thinks the quality is going to be better using administrative data, which is why ONS wants to focus on that.
Rancourt added that using administrative data is not new: Statistics Canada has been using administrative data for many years. The agency’s registers are founded on administrative data, and it also uses administrative data as auxiliary information.
The questioner clarified that the key issue of interest is the quality of the metadata being communicated and transparency about how the data are collected. Rancourt responded that this lack of transparency is indeed an important risk at Statistics Canada in terms of data acquisition, but there are mitigation strategies. Statistics Canada will not enter into an arrangement with a data producer and cancel a survey right from the start. In the negotiations, contracts, and memoranda of understanding, the agency includes requirements that the producer inform Statistics Canada of any changes by providing updated metadata. Before a data source becomes a regular program, or part of one, Statistics Canada can make sure that there is stability. This risk is something the agency is constantly trying to mitigate because it is inherently the main risk of greater use of administrative and other data, as the agency becomes more dependent on external sources.
Another participant said that it occurred to her from the recent comments that there is perhaps an assumption that the data that agencies now have are of reasonably high quality. In survey data, it is important to know how much false survey response there is. With administrative data, one might need to determine, for instance, how much underreporting there is of tax records. She said that she does not know how one records those assumptions and maybe they are not recorded. She asked whether anyone had thoughts on this issue, and another participant agreed that she had a valid point, as there is often interviewer information that one does not know about.
Another aspect that some agencies are looking to exploit a bit more is paradata, and electronic data collection has provided a lot of it. With paper self-completed questionnaires, one knows only how long it takes people to send them back and how many times one has to call to get a response. With electronic questionnaires, one will have records of how long a person spent on each page and how long it took to answer individual questions, and that information can be linked to the number of errors made. Thus, one can actually compare some of the information from cognitive testing with information about whether someone is struggling with a response. There seems to be an assumption that agencies are starting with a gold standard for surveys, but that is only an assumption because the measurement error is unknown. Just because data collection is changing does not mean that things are going to be worse.
Henry added that ONS is not under the illusion that the data it is getting from administrative sources are perfect by any stretch of the imagination. ONS and other agencies could be useful in developing methods for imputation and editing of these new data, and the more linking and matching there is, the better one is able to develop such tools.
One participant added that he thinks the field is entering a time in which practice will lead to more theory, similar to the breakthroughs in editing and imputation between 1970 and 2000. Everybody knows that more administrative data can be used. The next step is to go forward a bit more. There is a little bit of catch-up in terms of quality indicators and making sure that, on the theory side, everything meshes perfectly.
Another participant said that he does not believe anybody thinks that survey data are perfect. People have spent decades measuring data quality with validation tests, field tests, pretests, and posttests; part of the bias against administrative data comes from the fact that no one knows whether they are better or worse. It is not known in part because there is no access to the instrument itself. Administrative data come from somebody else running the collection, and the instrument did not get pretesting or cognitive testing. The users are downstream in the information flow, so there is a loss of metadata. However, the data may still be better because they are collected at the source, where the verification editors are in place. There are now ways to measure different aspects of the data to assess their quality that have not been used for survey data because they were not relevant there. In summary, the quality of survey data is known, but the quality of data from other sources is not.