National identification number: Finland, part 2

Continuing our theme from last time. The Finnish social security number (FSSn) has the form xxxxxxyzzzq, where the check digit is q.

To check whether an FSSn is valid, the check character q is compared with the remainder of dividing the nine-digit number xxxxxxzzz by 31. The check character is one of the digits 0-9 or one of the letters A, B, C, D, E, F, H, J, K, L, M, N, P, R, S, T, U, V, W, X, Y. A remainder of 0-9 maps to the same digit, and a remainder of 10-30 maps to the letters in the order given. For example, 10 is A and 20 is M.
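As a small worked example of the rule above, the sketch below computes the check character for one identifier by hand. The value 131052-308T is only an illustrative input and is not taken from this post; the variable names are likewise made up for the example.

```r
# The 31 check characters in order: digits 0-9, then the allowed letters
check.char = strsplit("0123456789ABCDEFHJKLMNPRSTUVWXY", "")[[1]]

id = "131052-308T"  # illustrative value only

# Join the six date digits and the three individual digits: "131052308"
digits = paste0(substr(id, 1, 6), substr(id, 8, 10))

remainder = as.integer(digits) %% 31   # 25
expected = check.char[remainder + 1]   # +1 because R vectors start at 1; gives "T"

expected == substr(id, 11, 11)         # TRUE
```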

It makes sense to drop I, O and Z from the alphabet because they can be confused with 1, 0 and 2 (the code below drops G and Q as well), but handwritten S and 5 get mixed up quite often too.
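To see exactly which letters are left out, one can list the letters that are missing from the check-character sequence. This is a minimal sketch that reuses the index removal from the function in this post; the name all.char is made up for the example.

```r
all.char = c(0:9, LETTERS)                     # 10 digits + 26 letters, all as characters
check.char = all.char[-c(17, 19, 25, 27, 36)]  # same indices as in the function in this post

length(check.char)            # 31, one character per possible remainder 0-30
setdiff(LETTERS, check.char)  # "G" "I" "O" "Q" "Z"
```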

The algorithm in R would be the following:
#Given x a vector of FSSn's

finID.real = function(x){
 #if the number of characters of some FSSn is not 11
 #or some FSSn is NA, we replace it with ""
 x[is.na(x) | nchar(x) != 11] = ""
 check.char = c(0:9,LETTERS) #LETTERS is the whole alphabet; the digits are coerced to characters
 check.char = check.char[-c(17,19,25,27,36)] #We remove the unwanted letters G, I, O, Q and Z
 last.chars = substr(x,11,11) #The check characters
 x = matrix(x,nrow=length(x)) #The standard trick so that apply works.
 x = apply(x, 1, function(x) {
      # as.integer("5") gives out 5; text that is not a number gives NA.
      x = as.integer(paste(substr(x,1,6),substr(x,8,10),sep="")) %% 31
      x+1 #We add +1 because vectors in R begin from 1 not 0.
 })
 #the ifelse returns either the check.char or NA
 #which is then matched to the last character
 bol = last.chars == ifelse(!is.na(x), check.char[x], NA)
 bol[is.na(bol)] = FALSE #Changing NAs to FALSE
 return(bol) #Returning a TRUE / FALSE vector.
}
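
Since substr and paste already work on whole vectors, the same check can probably be written without apply. The following is only a sketch under that assumption, not the function from this post: the name fssn.check is hypothetical, and the regular expression that rejects non-digit input is an addition of mine.

```r
# Hypothetical vectorized variant; fssn.check is not a name from the post.
fssn.check = function(x) {
  x = as.character(x)
  x[is.na(x) | nchar(x) != 11] = ""   # wrong length or NA can never match
  check.char = c(0:9, LETTERS)[-c(17, 19, 25, 27, 36)]
  digits = paste0(substr(x, 1, 6), substr(x, 8, 10))
  # Assumption: anything that is not exactly nine digits is treated as invalid
  digits[!grepl("^[0-9]{9}$", digits)] = NA
  idx = as.integer(digits) %% 31 + 1  # +1 because R vectors start at 1
  ok = substr(x, 11, 11) == check.char[idx]
  ok[is.na(ok)] = FALSE               # NA comparisons count as invalid
  ok
}

# Illustrative inputs: a matching check character, a wrong one, a short string and NA
fssn.check(c("131052-308T", "131052-308U", "131052-308", NA))
# TRUE FALSE FALSE FALSE
```

The result is a TRUE / FALSE vector of the same length as the input, so it can be used directly to subset a data set.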
