Information matching generally involves comparing one set of records with another to find records in both sets that belong to the same person. When it is done by a computer, it is known as data matching. An example is the comparison of a list of people receiving a monetary benefit with a list of people who have been imprisoned. In some programmes it is the absence of a person from one set of records that is of interest. The process is commonly used to detect fraud in public assistance programmes or to trace people wanted by the State. Less frequently, the technique is used to assist individuals (e.g. to identify someone who has not claimed an entitlement).
Information matching is perceived to have negative effects on privacy by:
Unchecked, data matching would seriously undermine public trust in government. To address the risks, the Privacy Act regulates the practice of data matching in the public sector. It does this through controls directed at:
Figure 1 illustrates the processes involved in typical authorised information matching programmes, and some of the safeguards applied to ensure fairness and data quality.
The process starts with two databases, one at the source agency and the other at the user agency (though in more complex programmes there may be more databases or agencies involved). Records, typically only those relating to people who have been involved in a recent transaction or activity, such as leaving the country, are selected from the source agency database. Certain information is extracted from the records that have been selected. For example, the agency may have 20 items of data relating to individuals who have left the country but only five of these may need to be extracted for the programme.
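The selection and extraction step can be pictured as a filter followed by a projection onto only the fields the programme needs. The sketch below is illustrative only: the record layout, the field names and the cut-off date are hypothetical assumptions, not taken from any actual programme.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical source-agency record; field names are assumptions for illustration.
@dataclass
class SourceRecord:
    full_name: str
    date_of_birth: date
    address: str
    departure_date: Optional[date]  # None if the person has not left the country
    passport_number: str            # held by the agency but not needed for matching

# Hypothetical set of fields the programme is authorised to extract.
EXTRACT_FIELDS = ("full_name", "date_of_birth", "address", "departure_date")

def select_and_extract(records, since):
    """Select records with a departure on or after `since`, keeping only EXTRACT_FIELDS."""
    extracted = []
    for rec in records:
        if rec.departure_date is not None and rec.departure_date >= since:
            extracted.append({field: getattr(rec, field) for field in EXTRACT_FIELDS})
    return extracted
```

Fields outside `EXTRACT_FIELDS`, such as `passport_number` here, never leave the source agency, which mirrors the example of extracting five items out of 20.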
The extracted information is sent by one agency to the other for matching. Sometimes an outsourced computer bureau performs this function on the user agency's behalf. The matching is an automated process that compares the lists of data. The information being matched is kept physically separate from operational records until checking processes are complete. It is important that unverified information is not added to an individual's file until it is confirmed that the data do indeed relate to that individual and are accurate and relevant.
An algorithm is developed and used to establish what constitutes a successful match or 'hit'. For example, the algorithm may treat as a match any case where the full name, date of birth and address are all the same. The algorithm may also identify 'likely' matches even when not all of the data correspond exactly (e.g. where the surname and date of birth are the same even though the first name differs). The process will normally produce pairs of records that are judged likely to relate to the same person but cannot be said with certainty to do so. The algorithm requires careful thought and practical trialling before implementation: too 'tight' an algorithm will miss many pairs of records that are actually about the same individual, and too 'loose' an algorithm will pair an unacceptably high proportion of records that are really about different individuals.
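A matching rule of the kind described above might be sketched as follows. This is a minimal illustration under assumed rules, not the algorithm used by any actual programme: an exact 'match' requires full name, date of birth and address to agree, and a 'likely' match requires only surname and date of birth. The `Person` type and the normalisation helper are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass(frozen=True)
class Person:
    first_name: str
    surname: str
    date_of_birth: date
    address: str

def _norm(text: str) -> str:
    # Ignore case and extra whitespace so trivial formatting differences do not block a match.
    return " ".join(text.lower().split())

def classify(a: Person, b: Person) -> Optional[str]:
    """Return 'match', 'likely' or None for a pair of records (illustrative rules only)."""
    same_first = _norm(a.first_name) == _norm(b.first_name)
    same_surname = _norm(a.surname) == _norm(b.surname)
    same_dob = a.date_of_birth == b.date_of_birth
    same_address = _norm(a.address) == _norm(b.address)
    if same_first and same_surname and same_dob and same_address:
        return "match"   # full name, date of birth and address all agree
    if same_surname and same_dob:
        return "likely"  # surname and DOB agree, but first name or address differs
    return None

def run_match(source, user):
    """Compare every source record with every user record and collect raw hits."""
    hits = []
    for s in source:
        for u in user:
            kind = classify(s, u)
            if kind is not None:
                hits.append((kind, s, u))
    return hits
```

The 'likely' rule is where tightness is tuned: removing it makes the algorithm tighter, while relaxing it further (say, to date of birth alone) makes it looser. Comparing every pair is only practical for small lists; a larger system would more plausibly index records on a key such as date of birth first.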
Once the match has been run, information that did not produce any pairs of records (hits) must be destroyed. The matching process normally produces a list of raw hits that are put through confirmation procedures. This may involve a manual check of the original records held by the user agency. The confirmation procedures may reveal some mismatches, which are then also destroyed.
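The bookkeeping behind destroying non-hits and rejected mismatches can be sketched as set arithmetic over record identifiers. This assumes, hypothetically, that each extracted record carries an identifier and that a `confirm` callback stands in for the manual check against the user agency's original records; the secure destruction itself is outside the sketch.

```python
from typing import Callable, List, Set, Tuple

Hit = Tuple[str, str]  # (source_record_id, user_record_id); hypothetical identifiers

def confirm_and_find_purge(extracted_ids: Set[str], raw_hits: List[Hit],
                           confirm: Callable[[Hit], bool]) -> Tuple[List[Hit], Set[str]]:
    """Return (confirmed_hits, ids_to_destroy).

    ids_to_destroy covers extracted records that produced no hit at all,
    plus those whose only hits were rejected as mismatches on confirmation.
    """
    confirmed = [hit for hit in raw_hits if confirm(hit)]
    kept = {source_id for source_id, _ in confirmed}
    return confirmed, extracted_ids - kept
```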
If the resulting checked hits are to be used as a basis for taking action against individuals, they should be acted upon promptly; the Act sets maximum time limits. The information must not be allowed to become out of date, since this may prejudice the individuals concerned. Even so, the confirmation procedures do not necessarily verify that the two records are about the same person. As noted earlier, unverified information derived from matching must not be added to administrative files.
It is not advisable to act on the basis of an apparent discrepancy produced by a match, even with some in-house checking completed. In fairness, the information should be verified by being shown to the individual concerned before action is taken. This allows an opportunity for the data to be challenged. People should not be 'presumed guilty' solely on the basis of inferences drawn from a matching process. Notice is an especially important safeguard where the matching process might have wrongly associated records relating to different individuals.
If a government agency intends to take adverse action based on a discrepancy revealed by a data matching programme, the user agency must first serve written notice on the individual under s.103 of the Privacy Act. The notice must give details of the discrepancy and the proposed adverse action, and must allow the individual five working days from receipt of the notice to show cause why the action should not be taken.
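Working out when the five-working-day response period ends is simple date arithmetic. The sketch below assumes the count starts on the day after receipt and that a 'working day' excludes weekends plus any holiday dates the caller supplies; the statutory definition may exclude further days, and the sketch ignores how the date of receipt is established and any reply the individual makes. The function names are hypothetical.

```python
from datetime import date, timedelta
from typing import FrozenSet

def response_deadline(received: date, working_days: int = 5,
                      holidays: FrozenSet[date] = frozenset()) -> date:
    """Return the last day of the response period, counting from the day after receipt."""
    day = received
    counted = 0
    while counted < working_days:
        day += timedelta(days=1)
        # weekday() is 0 for Monday ... 4 for Friday; skip weekends and supplied holidays
        if day.weekday() < 5 and day not in holidays:
            counted += 1
    return day

def response_period_expired(received: date, today: date,
                            holidays: FrozenSet[date] = frozenset()) -> bool:
    # Adverse action should not be considered until the whole period has passed.
    return today > response_deadline(received, holidays=holidays)
```

For a notice received on Monday 2 July 2012, the period would run to Monday 9 July 2012 under these assumptions, or to Tuesday 10 July if 4 July were passed in as a holiday.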
The information matching rules prohibit the use of online computer connections for transferring information unless approval is obtained from the Privacy Commissioner.
Agencies continue to adopt new forms of online transfer as their preferred method of conducting their programmes. Just over half of all operating programmes use online transfers.
This diagram shows the flow of information between agencies involved in information matching. Details of each programme (by number) can be found in the programme-by-programme reports of the 2011/12 Annual Report.
The following graph shows the total number of operating (active) programmes, the number of programmes that have been authorised but that we know are not expected to operate (inoperative), and an estimate of the number of programmes that have been authorised and that agencies intend to implement (expected) as their resources permit. It does not include programmes that we know agencies are considering but for which there is not yet any authorising legislation.