by

*Lin Yangchen*

Laboratory of Computational Philately

Coconut Academy of Sciences

A fascinating aspect of perfins is the mathematical and typographical arrangement of the holes, but this area has seen very little research. Mustacich (2015b,c) developed computer programs for characterizing stamp perforations, but those pertain to holes along straight lines, whereas perfins are two-dimensional patterns.

I present a computational workflow (download open-source R code) that automatically detects and measures perfin holes from scans or photographs. I also detail mathematical formulae to quantify higher-order perfin structure and typography, and an algorithm to automatically align and compare the holes of two samples of a perfin. This work was originally published in *The Perfins Bulletin* (Lin 2021a).

As a case study I used a selection of coconut definitives bearing eight different perfins. As far as possible I chose stamps with a complete set of cleanly punched holes to minimize measurement errors.

**Perfin imaging**

Stamps were placed face-down on a black background to maximize perfin contrast and flattened with an optical-grade quartz plate. They were photographed using a camera system developed by the author.

Where necessary, the edges of the stamps were cropped out of the images, and the images were flipped and/or rotated so that the perfins read the right way. After initial processing, the images were saved in lossless Portable Network Graphics (PNG) format for computational analysis. Everything from this point on can be executed automatically with a single command.

**Hole detection**

Pixels darker than a specified threshold are recognized as holes. For maximal accuracy, my algorithm considers not only the brightness of each pixel but also the colour. The background is black in this study and most philatelic situations, so the computer looks for pixels that are very dark in all three RGB channels.

One should sample some pixels from the image to determine the best threshold values to apply. This can be done during the initial processing above. I used a threshold of 50 out of 255 for each channel.
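
The thresholding rule can be sketched as follows. This is an illustrative Python/NumPy version (the article's own code is in R), and the function name is hypothetical; a pixel counts as a hole only if all three RGB channels fall below the threshold.

```python
import numpy as np

def detect_hole_pixels(img, threshold=50):
    """Boolean mask of pixels dark in all three RGB channels.

    img: H x W x 3 uint8 array; threshold: 0-255 cutoff (50 as in the text).
    """
    return np.all(img < threshold, axis=2)

# Tiny synthetic example: a 3x3 "scan" with one dark (hole) pixel.
img = np.full((3, 3, 3), 200, dtype=np.uint8)   # bright stamp paper
img[1, 1] = (10, 12, 8)                         # black background seen through a hole
mask = detect_hole_pixels(img)
```

Requiring darkness in all three channels, rather than in overall brightness alone, prevents saturated but dark-looking inks from being mistaken for holes.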

If there are stray fibres criss-crossing the holes, a hole may be detected as multiple objects or a weird-shaped object. This problem can be mitigated by opening the original image in raster editing software and using the lasso or brush tool to paint over the fibres with black. This procedure is not recommended if many of the holes in a perfin are affected, as too much “cleaning” may make the subsequent measurements of overall hole characteristics inaccurate.

**Hole position and diameter**

The positions and sizes of the holes are measured using the computeFeatures.moment and computeFeatures.shape functions in the EBImage library in R. For each hole, the algorithm measures the radius at different angles and calculates an average for that hole. Stray specks in the image may be mistakenly detected as holes; these are automatically identified and deleted, as they are much smaller than the actual holes.
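
The same measurements can be sketched in Python with SciPy's ndimage module (an assumed stand-in for EBImage; the function name and `min_area` cutoff are illustrative): label the connected regions of the hole mask, discard specks below a minimum area, and record each hole's centroid and equivalent-circle radius.

```python
import numpy as np
from scipy import ndimage

def measure_holes(mask, min_area=5):
    """Label connected hole regions, discard small specks, and return
    (centroids, radii), analogous to EBImage's feature measurements."""
    labels, n = ndimage.label(mask)
    centroids, radii = [], []
    for i in range(1, n + 1):
        region = labels == i
        area = region.sum()
        if area < min_area:                       # stray speck, not a hole
            continue
        centroids.append(ndimage.center_of_mass(region))
        radii.append(np.sqrt(area / np.pi))       # radius of equal-area circle
    return np.array(centroids), np.array(radii)

# Synthetic mask: one round hole (radius ~3 px) plus a one-pixel speck.
mask = np.zeros((20, 20), bool)
yy, xx = np.ogrid[:20, :20]
mask[(yy - 10) ** 2 + (xx - 10) ** 2 <= 9] = True
mask[0, 0] = True                                 # speck, should be discarded
centroids, radii = measure_holes(mask)
```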

The characteristic hole diameter of a given perfin is estimated as the most frequently occurring diameter in the distribution, i.e. the statistical mode, which is robust to skew in the distribution. Smaller peaks usually come from imperfect holes. This example is from the CBI (Kuala Lumpur) perfin.
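
A minimal sketch of the modal estimate (Python/NumPy, hypothetical function name and bin width): bin the measured diameters and take the centre of the most populated bin.

```python
import numpy as np

def characteristic_diameter(diameters, bin_width=0.05):
    """Characteristic hole diameter as the mode of the size distribution:
    the centre of the most populated histogram bin."""
    d = np.asarray(diameters, float)
    bins = np.arange(d.min(), d.max() + bin_width, bin_width)
    counts, edges = np.histogram(d, bins=bins)
    k = np.argmax(counts)
    return (edges[k] + edges[k + 1]) / 2

# Four well-punched ~1 mm holes dominate one undersized 0.6 mm hole.
diams = [1.02, 1.00, 0.98, 1.01, 0.60, 1.03]
mode = characteristic_diameter(diams, bin_width=0.1)
```

Unlike the mean, this estimate is not dragged down by the single imperfect hole.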

**Perfin typography**

How does one distill the design of perfin characters into a single number that can be calculated from just the hole positions and sizes? I wanted to avoid a formula that explicitly encodes the individual characters (e.g. “C”, “B”, “I”), which would be complicated and less comparable across perfins.

I propose three easily calculated metrics of perfin typography: stroke clarity, hole congestion, and perfin readability. They are described below.

**Stroke clarity**

The perceived sharpness or clarity of the strokes in a dotted character depends on three variables:

1. Characteristic hole diameter as described earlier. The larger the holes, the fatter and less distinct the stroke.

2. Distance between a hole and the hole nearest to it. This is calculated for every hole and the average is taken. The closer the holes are to each other, the clearer the stroke generally.

3. Cap height (height of the uppercase letters). For the same interhole distance and hole size, a greater cap height gives clearer strokes since there are more holes tracing out shorter strokes. Most perfins are composed mostly or entirely of uppercase letters so cap height is a good indicator of font size. The algorithm measures cap height as the height difference between the centres of the lowest and highest holes. It may slightly overestimate but will never underestimate (pun not intended) cap height.

These variables are combined in an overall measure of stroke clarity given by

*cap height ÷ (characteristic hole diameter × average distance to nearest hole)*

The number of characters in the perfin and the presence of lowercase letters and punctuation do not affect this definition of stroke clarity, so different perfins can be compared. It does, however, ignore the spacing between characters: closely spaced characters can make a perfin hard to read, as in the SHELL perfin.

Because of how cap height is measured, the stroke clarity calculation works only for perfins with a single, uncurved line of text oriented horizontally in the image. Slanted perfins can of course be rotated to horizontal during initial processing. Automatic measurement of cap height in multi-line perfins would require optical character recognition algorithms. This would be challenging for perfins because the strokes are dotted rather than continuous.
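
The three variables and the stroke clarity formula can be sketched as follows (an illustrative Python/SciPy version; the function name is hypothetical). Hole centres are given as (x, y) coordinates, with y vertical so that cap height is the vertical span of the centres.

```python
import numpy as np
from scipy.spatial.distance import cdist

def stroke_clarity(centres, hole_diameter):
    """Stroke clarity = cap height / (hole diameter x mean nearest-hole distance).

    centres: N x 2 array of hole centres (x, y); hole_diameter: the
    characteristic diameter estimated from the size distribution."""
    d = cdist(centres, centres)
    np.fill_diagonal(d, np.inf)        # a hole is not its own neighbour
    mean_nn = d.min(axis=1).mean()     # average distance to the nearest hole
    cap_height = centres[:, 1].max() - centres[:, 1].min()
    return cap_height / (hole_diameter * mean_nn)

# A vertical stroke of five holes 1 mm apart, punched with 0.8 mm holes.
centres = np.array([[0.0, i] for i in range(5)])
clarity = stroke_clarity(centres, hole_diameter=0.8)   # 4 / (0.8 * 1)
```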

**Hole congestion**

There is a phenomenon not accounted for in the stroke clarity equation. When the holes are large but the distance between them is small, the perfin can look congested and hard to make out (see illustrations of SMN and SHELL above). Congestion is therefore measured by

*characteristic hole diameter ÷ average distance to nearest hole*
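
The congestion ratio is equally simple to compute from the same quantities (again an illustrative Python/SciPy sketch with a hypothetical function name):

```python
import numpy as np
from scipy.spatial.distance import cdist

def hole_congestion(centres, hole_diameter):
    """Hole congestion = characteristic hole diameter / mean nearest-hole distance."""
    d = cdist(centres, centres)
    np.fill_diagonal(d, np.inf)        # exclude self-distances
    return hole_diameter / d.min(axis=1).mean()

# Five holes 1 mm apart with 0.8 mm holes: congestion = 0.8 / 1.0.
centres = np.array([[0.0, i] for i in range(5)])
congestion = hole_congestion(centres, hole_diameter=0.8)
```

Values approaching 1 mean the holes are nearly as wide as the gaps between them, which is when neighbouring holes begin to blur together visually.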

**Perfin readability**

The readability of a perfin can be conceptualized as the presence of “structure” in the arrangement of its holes, which helps the eye to recognize the characters. Structure is measured using the average nearest neighbour (ANN) method widely used in spatial analysis. For each hole, the algorithm measures the distance to the nearest neighbouring hole (first-order neighbour), second nearest (second-order neighbour) and so on, using the nndist function in R. Then it calculates the average distance for each neighbour order. If there is little structure in the arrangement of holes, the average distance will increase smoothly with neighbour order. If the hole arrangement is highly structured, the increase will be uneven.

To compare this across perfins, the experiment must control for the different number of holes and different total area of each perfin. This is done by first comparing each perfin with a set of as many randomly distributed holes occupying the same area. The area is defined as the bounding box whose edges intersect the centres of the top, bottom, leftmost and rightmost holes of the perfin.

For each perfin, 250 sets of uniformly distributed random points are simulated using R’s pseudorandom number generator. This yields 250 ANN distributions, from which an “average” distribution is obtained. The algorithm then calculates the root mean square error (RMSE) between this and the perfin’s ANN distribution. The larger the RMSE, the more structured and the more readable the perfin.

*Left, LYS; right, SHELL.*
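
The whole readability calculation can be sketched compactly (an illustrative Python/NumPy/SciPy version of the procedure; function names, the seeded generator, and the reduced simulation count in the example are assumptions, and the article's own implementation uses R's nndist):

```python
import numpy as np
from scipy.spatial.distance import cdist

rng = np.random.default_rng(0)    # seeded pseudorandom number generator

def ann_curve(pts):
    """Average distance to the k-th nearest neighbour, k = 1..N-1."""
    d = np.sort(cdist(pts, pts), axis=1)[:, 1:]   # drop the self-distance column
    return d.mean(axis=0)

def readability(centres, n_sim=250):
    """RMSE between a perfin's ANN curve and the mean ANN curve of n_sim
    uniformly random hole sets in the perfin's bounding box."""
    centres = np.asarray(centres, float)
    lo, hi = centres.min(axis=0), centres.max(axis=0)   # bounding box
    obs = ann_curve(centres)
    sims = [ann_curve(rng.uniform(lo, hi, size=centres.shape))
            for _ in range(n_sim)]
    return np.sqrt(np.mean((obs - np.mean(sims, axis=0)) ** 2))

# A highly structured arrangement (a regular 4 x 4 grid of holes).
grid_pts = np.array([[i, j] for i in range(4) for j in range(4)], float)
r = readability(grid_pts, n_sim=50)
```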

Table: vital statistics of some Malayan perfins. “Area” refers to the total area of all the holes in the perfin. Lengths and areas are in mm and mm² respectively.

The proposed metrics of stroke clarity, hole congestion and perfin readability work reasonably well. For example, SMN and SHELL are particularly hard to read because of oversized holes and congestion. The larger version of CBI is more congested than the smaller one, yet it is more readable because of clearer strokes. Meanwhile, LYS gets top marks in all three metrics.

**Automatic hole alignment**

One may wish to compare two samples of a perfin for small differences in hole positions or sizes that may indicate forgeries or multiple punch dies in a perfin machine. Matching by eye may be possible if you have good eyesight, a good loupe and steady hands. But computerised analysis has advantages. It can objectively and accurately record the results, which facilitates further analysis or dissemination. And if you have numerous samples of a perfin, the computer can rapidly match all of them pairwise and reveal otherwise invisible statistical patterns.

The matching algorithm can take perfins of any size or shape. As a case study I ran the algorithm on two samples of the HSBC perfin and 25 samples of the YSB perfin. The YSB samples encompass almost all denominations of both KGV and KGVI issues including the top values.

*Euclidean transformation.* First the algorithm finds the centroid or “centre of mass” of each sample by averaging the coordinates of its hole centres. Then it translates the sample images so that both centroids lie on (0,0). The centred samples are then rotated to obtain a least-squares match of their hole centres. The rotation angle is determined by principal component analysis (PCA) and linear algebra, which are standard techniques for this purpose.

The samples are first rotated into rough alignment by multiplying them with their respective component loadings matrices obtained from PCA. If the loadings produce a mirror image of the sample, as indicated by their signs, the first principal component is flipped and the data rotated 180 degrees to get the image the right way round.

For each hole in one sample, the computer identifies the hole nearest to it, i.e. the corresponding hole, in the other sample. This step is needed because the hole detection algorithm described earlier does not necessarily number the holes in the same order across different samples. The result can be verified visually by plotting and labelling the points.

Finally, the angle and direction of rotation that give the least-squares fit between the hole centres of the two samples are found via singular value decomposition of the covariance matrix (Arun et al. 1987) of the corresponding hole pairs. This serves a similar purpose to the earlier PCA but is less susceptible to noise and ideal for points that have a one-to-one correspondence, such as in perfins.
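
The final least-squares step can be sketched as follows (an illustrative Python/NumPy version of the Arun et al. 1987 procedure; the function name is hypothetical, and the rows of the two inputs are assumed to have already been paired hole-for-hole by the nearest-hole matching described above):

```python
import numpy as np

def align_points(a, b):
    """Least-squares rigid alignment of hole centres b onto a, via singular
    value decomposition of the covariance matrix (Arun et al. 1987)."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    ca, cb = a.mean(axis=0), b.mean(axis=0)   # centroids
    A, B = a - ca, b - cb                     # translate centroids to (0,0)
    U, _, Vt = np.linalg.svd(B.T @ A)         # SVD of the covariance matrix
    R = Vt.T @ U.T                            # optimal rotation
    if np.linalg.det(R) < 0:                  # guard against a reflection
        Vt[-1] *= -1
        R = Vt.T @ U.T
    return (b - cb) @ R.T + ca                # rotate b and move it onto a

# Example: recover a point set that was rotated by 0.5 rad and shifted.
a = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 1.0]])
c, s = np.cos(0.5), np.sin(0.5)
b = a @ np.array([[c, -s], [s, c]]).T + np.array([2.0, -1.0])
aligned = align_points(a, b)
```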

The percentage misalignment of each pair of samples was calculated by summing the black and white pixels, i.e. those covered by a hole in only one of the two samples, and dividing the result by the sum of the total hole pixels of both samples. This “double counting” in the denominator reduces bias by averaging out any difference in total hole area between the samples.
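
With the two aligned samples rasterized as boolean hole masks, the calculation reduces to a pixelwise exclusive-or (a minimal Python/NumPy sketch; the function name is hypothetical):

```python
import numpy as np

def misalignment_pct(mask_a, mask_b):
    """Percentage misalignment of two aligned, rasterized perfin samples:
    pixels covered by a hole in only one sample, divided by the total
    hole pixels of both samples."""
    mismatched = np.logical_xor(mask_a, mask_b).sum()
    return 100.0 * mismatched / (mask_a.sum() + mask_b.sum())

# Two square "holes" of 4 pixels each, overlapping in 2 pixels.
mask_a = np.zeros((4, 4), bool)
mask_b = np.zeros((4, 4), bool)
mask_a[1:3, 1:3] = True
mask_b[1:3, 2:4] = True
pct = misalignment_pct(mask_a, mask_b)   # (2 + 2) / (4 + 4) = 50%
```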

The HSBC samples shown above are misaligned by 50%, strongly suggesting that they came from two different punch dies. In this case one can tell by examining the actual stamps that they differ, but the computer records precisely what the differences are and quantifies them.

Part of the code for aligning perfins and generating the misalignment map.

As for the YSB perfin, the much larger sample size permits more statistical analysis. For each sample, the computer takes the misalignment values of all pairings with that sample and plots the distribution. The bandwidth of the kernel density must be standardized across samples but can be tuned across the board to resolve or smooth out details as shown below. Any aberrant sample will have an odd-shaped distribution that stands out from the rest. This anomaly detection method can find the “needle in the haystack” more quickly, consistently and reliably than manual examination by eye or even by regular image editing software.
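
A sketch of the density estimate with a standardized bandwidth (illustrative Python/SciPy; the function name and data are hypothetical). SciPy's `gaussian_kde` takes its bandwidth as a factor multiplied by the data's standard deviation, so dividing the desired absolute bandwidth by each sample's standard deviation (ddof=1, matching SciPy's internal covariance) fixes the same kernel width across all samples:

```python
import numpy as np
from scipy.stats import gaussian_kde

def misalignment_density(values, bandwidth, grid):
    """Kernel density of one sample's pairwise misalignment values,
    evaluated on grid, with a fixed absolute bandwidth so the curves
    are comparable across samples."""
    values = np.asarray(values, float)
    kde = gaussian_kde(values, bw_method=bandwidth / np.std(values, ddof=1))
    return kde(grid)

# Hypothetical misalignment values (%) of one sample against the others.
vals = [5.0, 7.0, 9.0, 11.0, 8.0]
grid = np.linspace(0, 20, 41)
density = misalignment_density(vals, bandwidth=1.0, grid=grid)
```

Increasing the bandwidth smooths out detail; decreasing it resolves finer structure but amplifies noise.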

For the 25 YSB samples, although the peaks are somewhat spread out, none is far off enough to raise suspicion. Noise from stray fibres and imperfectly punched holes in some samples can cause misalignment values to diverge. Some judgement is needed, but one can always go back and inspect the sample in question.

One could also try to deduce from the data whether multiple punch dies were used and how many. For the YSB perfin, the ranked misalignment values appear Gaussian-distributed across a fairly narrow range from about 5% to 15%, so there is no reason to suspect multiple dies. If there were multiple dies, one may see two or more distinct groups of misalignment values, since samples from the same die tend to match more closely with one another than with samples from another die. This of course assumes that the differences between multiple dies are greater than the background noise. There may be cases of multiple dies that show no statistical difference.

If the statistics do indicate multiple dies, the dies can be separated using DBSCAN (Density-Based Spatial Clustering of Applications with Noise), a robust and widely used algorithm by Ester et al. (1996). Unlike some common algorithms, DBSCAN does not require a priori knowledge of the number of clusters (dies). One only has to input the approximate misalignment threshold above which two samples are regarded as coming from different dies. This value can be deduced from the aforementioned ranked misalignment values. If successful, the samples are grouped into the different dies for further characterization.
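
To make the clustering step concrete, here is a minimal, self-contained DBSCAN sketch operating directly on a precomputed pairwise misalignment matrix (a library implementation, such as R's dbscan package, would normally be used instead; the function name, `min_pts` default and example data are assumptions):

```python
import numpy as np

def dbscan(dist, eps, min_pts=2):
    """Minimal DBSCAN on a precomputed pairwise distance (misalignment)
    matrix; returns one cluster label per sample (-1 = noise)."""
    n = len(dist)
    labels = np.full(n, -1)
    cluster = 0
    for i in range(n):
        if labels[i] != -1:
            continue
        neighbours = np.flatnonzero(dist[i] < eps)
        if len(neighbours) < min_pts:   # too isolated to seed a cluster
            continue
        labels[i] = cluster
        queue = list(neighbours)
        while queue:                    # grow the cluster outwards
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster
                nj = np.flatnonzero(dist[j] < eps)
                if len(nj) >= min_pts:
                    queue.extend(nj)
        cluster += 1
    return labels

# Hypothetical misalignments forming two well-separated groups of samples.
vals = np.array([5.0, 6.0, 7.0, 40.0, 41.0])
dist = np.abs(vals[:, None] - vals[None, :])
labels = dbscan(dist, eps=3.0)
```

Here `eps` plays the role of the misalignment threshold above which two samples are regarded as coming from different dies.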

**References**
