Data matching

Figure 1 illustrates the processes involved in typical authorised information matching programmes, and some of the safeguards applied to ensure fairness and data quality.

The process starts with two databases, one at the source agency and the other at the user agency (though in more complex programmes there may be more databases or agencies involved). Records, typically only those relating to people who have been involved in a recent transaction or activity, such as leaving the country, are selected from the source agency database. Certain information is extracted from the records that have been selected. For example, the agency may have 20 items of data relating to individuals who have left the country but only five of these may need to be extracted for the programme.
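The selection and extraction steps described above can be sketched in a few lines of Python. This is an illustration only, not the method of any actual programme: the field names, the selection rule based on a `departure_date` item and the function `select_and_extract` are all hypothetical.

```python
from datetime import date

# Hypothetical: the five items the programme needs, out of the many
# items the source agency holds about each person.
EXTRACT_FIELDS = ("surname", "first_name", "date_of_birth", "address", "departure_date")


def select_and_extract(records, since):
    """Select records with a departure on or after `since`, then keep
    only the items listed in EXTRACT_FIELDS."""
    selected = [rec for rec in records if rec["departure_date"] >= since]
    return [{field: rec[field] for field in EXTRACT_FIELDS} for rec in selected]


# Made-up example data: each record holds more items than are extracted.
source_records = [
    {"surname": "Smith", "first_name": "Jane", "date_of_birth": date(1970, 1, 2),
     "address": "1 High St", "departure_date": date(2024, 3, 1),
     "passport_no": "X000", "flight": "ZZ1"},
    {"surname": "Brown", "first_name": "Tom", "date_of_birth": date(1980, 5, 6),
     "address": "2 Low Rd", "departure_date": date(2023, 1, 1),
     "passport_no": "X001", "flight": "ZZ2"},
]

extract = select_and_extract(source_records, since=date(2024, 1, 1))
```

Only the record with a recent departure is selected, and items that the programme does not need (here the made-up `passport_no` and `flight`) never leave the source agency.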

The extracted information is sent by one agency to the other for matching. Sometimes an outsourced computer bureau performs this function on the user agency's behalf. The matching is an automated process that compares the lists of data. The information being matched is kept physically separate from operational records until checking processes are complete. It is important that unverified information is not added to an individual's file until it is confirmed that the data do indeed relate to that individual and are accurate and relevant.

An algorithm is developed and used to establish what constitutes a successful match or 'hit'. For example, the algorithm may treat as a match any case where the full name, date of birth and address are all the same. The algorithm may also allow for the identification of 'likely' matches even where not all of the data correspond exactly (e.g. where the surname and date of birth are the same even though the first name differs). The process will normally produce pairs of records which are judged likely to relate to the same person, although this cannot be said to be certain. The algorithm to be used requires careful thought and practical trialling before implementation: too 'tight' an algorithm will miss many matches between records which are actually about the same individual, and too 'loose' an algorithm will pair an unacceptably high proportion of records which are really about different individuals.
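A matching rule of the kind described above might look something like the following Python sketch. It is a simplified illustration under stated assumptions, not the algorithm of any actual programme: the field names, the `normalise` step and the functions `classify_pair` and `run_match` are hypothetical, and a real programme would settle its rules through the trialling mentioned above.

```python
def normalise(value):
    """Assumed clean-up step: lower-case and collapse whitespace so that
    trivial formatting differences do not block a match."""
    return " ".join(str(value).lower().split())


def classify_pair(a, b):
    """Return 'match', 'likely' or 'none' for two records (dicts)."""
    def same(field):
        return normalise(a[field]) == normalise(b[field])

    # 'Tight' rule: full name, date of birth and address all agree.
    if all(same(f) for f in ("surname", "first_name", "date_of_birth", "address")):
        return "match"
    # 'Looser' rule: surname and date of birth agree; other items may differ.
    if same("surname") and same("date_of_birth"):
        return "likely"
    return "none"


def run_match(source_extract, user_records):
    """Compare every source record with every user record and list raw hits."""
    hits = []
    for s in source_extract:
        for u in user_records:
            outcome = classify_pair(s, u)
            if outcome != "none":
                hits.append((s, u, outcome))
    return hits


# Made-up example records.
jane = {"surname": "Smith", "first_name": "Jane",
        "date_of_birth": "1970-01-02", "address": "1 High St"}
jane_on_file = {"surname": "SMITH", "first_name": "jane",
                "date_of_birth": "1970-01-02", "address": "1  High St"}
janet = dict(jane, first_name="Janet", address="9 Other Rd")
tom = {"surname": "Brown", "first_name": "Tom",
       "date_of_birth": "1980-05-06", "address": "2 Low Rd"}

raw_hits = run_match([jane], [jane_on_file, janet, tom])
```

Here `jane_on_file` is reported as a match and `janet` only as a likely match, while `tom` produces no hit. Dropping the second rule would make the algorithm 'tighter'; matching on surname alone would make it far 'looser'.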

Once the match has been run, information that did not produce any pairs of records (hits) must be destroyed. The matching process normally produces a list of raw hits that are put through confirmation procedures. This may involve a manual check of the original records held by the user agency. The confirmation procedures may reveal some mismatches, which are then also destroyed.
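The destruction and confirmation steps can be pictured as two filters applied one after the other. The Python sketch below is illustrative only: the record identifiers, the functions `split_by_hits` and `confirm`, and the `is_confirmed` check (standing in for a manual check of the original records) are all hypothetical.

```python
def split_by_hits(extracted_ids, raw_hits):
    """Separate extracted records that produced a hit from those that did not.

    raw_hits is a list of (source_id, user_id) pairs."""
    hit_ids = {source_id for source_id, _ in raw_hits}
    retained = [i for i in extracted_ids if i in hit_ids]
    to_destroy = [i for i in extracted_ids if i not in hit_ids]
    return retained, to_destroy


def confirm(raw_hits, is_confirmed):
    """Split raw hits into confirmed hits and mismatches.

    is_confirmed stands in for the confirmation procedure."""
    confirmed = [hit for hit in raw_hits if is_confirmed(hit)]
    mismatches = [hit for hit in raw_hits if not is_confirmed(hit)]
    return confirmed, mismatches


# Made-up example: three extracted records, two raw hits, one of which
# fails the confirmation check.
extracted_ids = ["s1", "s2", "s3"]
raw_hits = [("s1", "u7"), ("s3", "u9")]

retained, to_destroy = split_by_hits(extracted_ids, raw_hits)
confirmed, mismatches = confirm(raw_hits, lambda hit: hit != ("s3", "u9"))
```

In this example `s2` produced no hit and is marked for destruction straight away, and the pair `("s3", "u9")` is marked for destruction once confirmation shows it to be a mismatch.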

If the resultant checked hits are to be used as a basis for taking action against individuals, they should be acted upon in a timely fashion. The Act sets maximum time limits, and the information must not be allowed to become out of date, since this may prejudice the individuals concerned. However, the confirmation procedures do not necessarily verify that the two records are about the same person. As mentioned earlier, unverified information derived from matching must not be added to administrative files.

It is not advisable to act on the basis of an apparent discrepancy produced by a match, even with some in-house checking completed. In fairness, the information should be verified by being shown to the individual concerned before action is taken. This allows an opportunity for the data to be challenged. People should not be 'presumed guilty' solely on the basis of inferences drawn from a matching process. Notice is an especially important safeguard where the matching process might have wrongly associated records relating to different individuals.

If a government agency intends to take adverse action based on a discrepancy revealed by a data matching programme, the user agency must first serve written notice on the individual under s.103 of the Privacy Act. The notice must give details of the discrepancy and the proposed adverse action, and must allow the individual five working days from receipt of the notice to show cause why the action should not be taken.
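The five working day period can be worked out with a simple date calculation. The Python sketch below rests on assumptions that are not taken from the Act: it treats Monday to Friday as working days, does not count the day of receipt, and relies on the caller to supply any public holidays. The Act's own definition of a working day governs in practice, and the function name `response_deadline` is hypothetical.

```python
from datetime import date, timedelta


def response_deadline(received, working_days=5, holidays=frozenset()):
    """Return the date of the last working day of the notice period.

    Assumes Monday-Friday working days and excludes the day of receipt."""
    current = received
    remaining = working_days
    while remaining > 0:
        current += timedelta(days=1)
        # weekday() is 0 for Monday through 6 for Sunday.
        if current.weekday() < 5 and current not in holidays:
            remaining -= 1
    return current


# Notice received on Monday 4 March 2024.
deadline = response_deadline(date(2024, 3, 4))
```

For a notice received on Monday 4 March 2024, the period under these assumptions runs to Monday 11 March 2024; a public holiday falling in between would push it out by a day.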

Figure 1: Typical Information Matching Process