PILL-ID: Matching and Retrieval of Drug Pill Imprint Images
Young-Beom Lee1, Unsang Park2, and Anil K. Jain1,2
1Brain and Cognitive Engineering, Korea University, Korea
2Computer Science and Engineering, Michigan State University, USA
http://Biometrics.cse.msu.edu
• Legal drug pill or illicit drug pill?
• If illicit pill, which cartel manufactured it?
• What is an effective way to identify illicit drugs?
• ~35M in the U.S. used illicit or abused prescription drugs; $14B spent for drug treatment & prevention (2007)
• Prescription pills must be identifiable (by color, shape, and imprints) per FDA regulations
• Illicit pills (e.g., narcotics) also contain imprints to identify the cartel or distributor
• Databases of prescription pills and illegal pills are available (pharmaceutical companies, FBI)
[Figure: query image and top-6 retrievals (ranks 1 to 6), with database contents for the retrieved pill: Imprint: 5883; Shape: round; Color: brown; Ingredient: MDMA, BZP, TFMPP; Cartel: Gulf]
• An imprint is an indented or printed mark on a pill, tablet or capsule
• Symbol, text, digits or their combination
[Figure: example imprints on legal drug pills and illicit drug pills]
• Sobel operator to obtain gradient magnitude image
• Segmentation, scale normalization
[Figure: original image and gradient magnitude image]
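The Sobel step can be illustrated with a minimal sketch. It assumes a grayscale image stored as a 2-D list and leaves border pixels at zero; the function name `gradient_magnitude` and the border handling are illustrative choices, not details given on the poster.

```python
import math

# Standard 3x3 Sobel kernels for horizontal and vertical derivatives.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]

def gradient_magnitude(gray):
    """Return the Sobel gradient magnitude of a 2-D list of gray levels.

    Border pixels are left at 0 (an assumption; the poster does not say
    how borders are handled).
    """
    h, w = len(gray), len(gray[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gy = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    v = gray[y + dy][x + dx]
                    gx += SOBEL_X[dy + 1][dx + 1] * v
                    gy += SOBEL_Y[dy + 1][dx + 1] * v
            out[y][x] = math.hypot(gx, gy)
    return out

# Toy image: a vertical step edge between columns 1 and 2.
image = [[0, 0, 10, 10]] * 4
mag = gradient_magnitude(image)
```

On this toy image the interior pixels next to the step edge receive a large magnitude, while flat regions and the untouched border stay at zero.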
• Rotation normalization: primary & secondary dominant orientations
• Landmarks (key-points) are selected within a preset radius (SIFT descriptor)
[Figure: multiple templates with rotation variation]
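One plausible way to find the primary and secondary dominant orientations is a magnitude-weighted orientation histogram. The sketch below assumes 36 bins of 10 degrees and simply takes the two strongest bins; both choices, and the name `dominant_orientations`, are assumptions rather than the poster's stated procedure.

```python
import math

def dominant_orientations(gx, gy, n_bins=36):
    """Return (primary, secondary) dominant orientations in degrees.

    gx, gy: flat lists of horizontal/vertical gradient components.
    n_bins=36 (10-degree bins) is an assumed setting.
    """
    hist = [0.0] * n_bins
    bin_width = 360.0 / n_bins
    for dx, dy in zip(gx, gy):
        angle = math.degrees(math.atan2(dy, dx)) % 360.0
        # Each gradient votes with its magnitude.
        hist[int(angle // bin_width) % n_bins] += math.hypot(dx, dy)
    order = sorted(range(n_bins), key=lambda b: hist[b], reverse=True)
    # Report the bin centres of the two strongest bins.
    return tuple((b + 0.5) * bin_width for b in order[:2])

# Toy gradients: three pointing roughly along +x, two roughly along +y.
gx = [1.0, 1.0, 1.0, 0.1, 0.1]
gy = [0.1, 0.1, 0.1, 1.0, 1.0]
primary, secondary = dominant_orientations(gx, gy)
```

Rotating the image so that each of the two orientations is aligned with a reference direction would then yield one template per orientation.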
• Gradient magnitude images have smaller intra-class variations
[Figure: original image, gray image, and gradient magnitude image]

Rank-1 accuracy (%) (using the 602 query-gallery dataset)
Method                      Gradient magnitude   Grayscale
Optimized SIFT descriptor   90.03                83.55
[Figure: images that did not match at rank 1 using SIFT but matched using the proposed method (fixed key-points + SIFT descriptor). Red dots: SIFT key-points; blue dots: preset key-points]

Method                         Number of key-points        Rank-1 accuracy (%)
Original SIFT                  Min 17, Max 340, Avg. 126   43.02
Our method (SIFT descriptor)   29                          90.03
• Select a set of key-points
• Collect gradient magnitude and orientation with Gaussian weighting and tri-linear interpolation
• Truncation
• Length of feature vector: 4 × 4 × 8 = 128 per key-point; 128 × 29 = 3712 per image
[Figure: Gaussian weighting (Gaussian window centered at a key-point), tri-linear interpolation, and truncation]
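The truncation step and the feature-length arithmetic can be sketched as follows. The clip-then-renormalize order follows the usual SIFT convention; the poster names only "truncation", so the renormalization and the helper name `normalize_and_truncate` are assumptions.

```python
import math

def normalize_and_truncate(vec, threshold=0.2):
    """L2-normalize, clip each entry at `threshold`, then renormalize."""
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    clipped = [min(v / norm, threshold) for v in vec]
    norm2 = math.sqrt(sum(v * v for v in clipped)) or 1.0
    return [v / norm2 for v in clipped]

# Descriptor layout from the poster: 4 x 4 spatial cells x 8 orientation bins.
DESCRIPTOR_LEN = 4 * 4 * 8                     # 128 values per key-point
NUM_KEYPOINTS = 29                             # preset key-points per image
FEATURE_LEN = DESCRIPTOR_LEN * NUM_KEYPOINTS   # 3712 values per image

# Toy descriptor with one dominant gradient bin.
raw = [10.0] + [1.0] * (DESCRIPTOR_LEN - 1)
desc = normalize_and_truncate(raw)
```

Truncation reduces the influence of a single very strong gradient bin, so the remaining bins contribute more to the distance between descriptors.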
• LBP histograms with multiple neighborhood parameters (P, R) are created and concatenated
[Figure: LBP neighborhoods for (P=8, R=1.0), (P=4, R=1.0), and (P=12, R=2.0)]
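As a rough illustration of the LBP operator, the sketch below thresholds P neighbors against the center pixel and counts circular bit transitions to decide whether a pattern is uniform. Circular sampling with interpolation at radius R is omitted, and the function names are hypothetical.

```python
def transitions(code, p):
    """Count circular 0/1 transitions in the p-bit pattern `code`."""
    bits = [(code >> k) & 1 for k in range(p)]
    return sum(bits[k] != bits[(k + 1) % p] for k in range(p))

def lbp_code(center, neighbors):
    """Threshold the P neighbors against the center pixel."""
    return sum((1 << k) for k, v in enumerate(neighbors) if v >= center)

def num_uniform_bins(p):
    """Uniform patterns (at most 2 transitions) plus one shared bin."""
    uniform = sum(1 for c in range(2 ** p) if transitions(c, p) <= 2)
    return uniform + 1

# Eight neighbors around a center pixel of value 5.
code = lbp_code(5, [6, 7, 8, 1, 2, 3, 4, 9])
```

Counting bins this way gives 59 for P = 8 and 135 for P = 12, the histogram sizes commonly used for uniform LBP.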
• Feature vectors are constructed with the following parameters (P, R)

(P, R)     Window size   Shift value
U(8, 1)    20 × 20       4
U(4, 1)    10 × 10       2
U(12, 2)   30 × 30       6
• Length of feature vector: U(8,1) = 59, (4,1) = 16, U(12,2) = 135
  59 × (13 × 13) + 16 × (31 × 31) + 135 × (7 × 7) = 31,962
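The window counts 13, 31, and 7 in the length formula are consistent with sliding each window over a 70 × 70 image and dropping partial windows. The 70-pixel size is an assumption made here to reproduce those counts, and `windows_per_side` is an illustrative helper.

```python
def windows_per_side(image_size, window, shift):
    """Number of window positions along one side (partial windows dropped)."""
    return (image_size - window) // shift + 1

IMAGE_SIZE = 70  # assumption: 70 x 70 input image
# (histogram bins, window size, shift value) for each LBP operator
CONFIGS = [(59, 20, 4), (16, 10, 2), (135, 30, 6)]

counts = []
total = 0
for bins, window, shift in CONFIGS:
    n = windows_per_side(IMAGE_SIZE, window, shift)
    counts.append(n)
    total += bins * n * n   # one histogram per window position
```

Note that 16 bins for (4,1) matches a full 2^4 histogram rather than a uniform-pattern histogram.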
• Given a query image (q) and N gallery images (g), the K feature vectors of the query are compared with the L_n feature vectors of the nth gallery image (n = 1 to N) using the L2 norm.
• L_n is different for each gallery image
• The ID of the closest match in the gallery is selected as the ID of the query:

$$\mathrm{ID}_m = \arg\min_{n} \, \min_{i,j} \, d(q_{mi}, g_{nj}), \quad i = 1, \dots, K_m, \; j = 1, \dots, L_n, \; n = 1, \dots, N$$

where q_{mi} is feature vector i of query image q_m and g_{nj} is feature vector j of gallery image g_n.
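The matching rule can be sketched as a nearest-neighbor search over all query/gallery feature-vector pairs. This is a minimal illustration with hypothetical names (`match_query`, the toy gallery IDs), not the authors' MATLAB code.

```python
import math

def match_query(query_vectors, gallery):
    """Return the gallery ID whose closest feature vector is nearest.

    query_vectors: list of feature vectors for one query image.
    gallery: dict mapping gallery ID -> list of feature vectors
             (the lists may have different lengths).
    """
    best_id, best_dist = None, math.inf
    for gallery_id, gallery_vectors in gallery.items():
        for q in query_vectors:
            for g in gallery_vectors:
                d = math.dist(q, g)  # Euclidean (L2) distance
                if d < best_dist:
                    best_id, best_dist = gallery_id, d
    return best_id, best_dist

gallery = {
    "pill_A": [[0.0, 0.0], [1.0, 0.0]],
    "pill_B": [[5.0, 5.0]],
}
query = [[4.0, 4.0], [9.0, 9.0]]
best_id, best_dist = match_query(query, gallery)
```

Because each image may contribute several feature vectors (for example, one per rotation template), the minimum is taken over every pair before the gallery ID is chosen.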
• 822 illicit drug pill images from the Australian Federal Police; 138 illicit drug pill images and 14,003 legal pill images from the U.S. DEA website, Drug information online, and pharmer.org
• Image size: from 48 × 42 to 2,088 × 1,550 pixels; 96 dpi
• Query set: 602 illicit drug pills with duplicate images of the same imprint pattern (88 distinctive patterns)
• Gallery set: 960 (illicit drug pill images) + 14,003 (legal drug pill images) = 14,963 images
• Leave-one-out method: each of the 602 queries is matched against the remaining 14,962 gallery images
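A leave-one-out evaluation of this kind can be sketched as below: each query is removed from the gallery, the remaining images are ranked by distance, and the rank of the first image with the same imprint pattern is recorded. The toy features, labels, and function names are illustrative assumptions.

```python
def rank_of_true_mate(query_idx, labels, distance):
    """Rank (1-based) of the closest gallery image sharing the query's label.

    The query itself is left out of the gallery. Returns None if no mate.
    """
    others = [i for i in range(len(labels)) if i != query_idx]
    others.sort(key=lambda i: distance(query_idx, i))
    for rank, i in enumerate(others, start=1):
        if labels[i] == labels[query_idx]:
            return rank
    return None

def rank_k_accuracy(query_indices, labels, distance, k=1):
    """Percentage of queries whose true mate is within the top k."""
    hits = 0
    for q in query_indices:
        r = rank_of_true_mate(q, labels, distance)
        hits += r is not None and r <= k
    return 100.0 * hits / len(query_indices)

# Toy data: 1-D "features"; images 0, 1 and 4 share imprint pattern "x".
features = [0.0, 0.2, 1.0, 1.1, 5.0]
labels = ["x", "x", "y", "y", "x"]
dist = lambda a, b: abs(features[a] - features[b])
acc1 = rank_k_accuracy([0, 1, 2, 3, 4], labels, dist, k=1)
acc3 = rank_k_accuracy([0, 1, 2, 3, 4], labels, dist, k=3)
```

Rank-1 and rank-20 accuracies are the same computation with k = 1 and k = 20.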
• SIFT descriptor parameters are optimized for pill imprint matching
1. Smoothing
2. Gradient orientation & magnitude
3. Gaussian weighting
4. Trilinear interpolation
5. Truncation with threshold values of 0.2, 0.5 and 1

                                                                                Rank-1 accuracy (%)
Method                                      Truncation value   Rotation normalization   Edge image   Grayscale image
SIFT with 1, 2, 3, 4, 5 (original SIFT)     0.2                No                       83.89        83.39
SIFT with 2, 3, 4, 5                        0.2                No                       87.87        78.74
SIFT with 2, 4, 5                           0.2                No                       88.70        79.57
SIFT with 2, 5                              0.2                No                       87.54        81.56
SIFT with 2, 4, 5                           0.5                No                       87.71        -
SIFT with 2, 4, 5                           1.0                No                       87.71        -
SIFT with 2, 4, 5                           0.2                Yes                      90.03        -
• 602 query and 14,962 gallery images

Method                  Rank 1 (%)   Rank 20 (%)
MLBP                    64.78        82.72
SIFT descriptor         82.72        90.20
SIFT (0.7)+MLBP (0.3)   84.39        91.53
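The SIFT (0.7) + MLBP (0.3) entry suggests a weighted score-level fusion. The sketch below assumes the two distance lists are min-max normalized before the weighted sum; the poster gives only the weights, so the normalization and the function names are assumptions.

```python
def min_max_normalize(scores):
    """Map distances to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]

def fuse(sift_dists, mlbp_dists, w_sift=0.7, w_mlbp=0.3):
    """Weighted sum of normalized distances, one value per gallery image."""
    s = min_max_normalize(sift_dists)
    m = min_max_normalize(mlbp_dists)
    return [w_sift * a + w_mlbp * b for a, b in zip(s, m)]

# Distances from one query to three gallery images (toy numbers).
sift = [0.2, 0.4, 1.0]
mlbp = [30.0, 10.0, 20.0]
fused = fuse(sift, mlbp)
best = min(range(len(fused)), key=fused.__getitem__)
```

In this toy case neither descriptor alone would pick the same gallery image, and the fused distance selects the one that is reasonably close under both.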
[Figure: example queries and their top-6 retrievals]
• Queries that were not correctly retrieved in top 20 matches
[Figure: failed queries, their top-6 retrievals, and the rank of the true mate; ranks shown: 13042, 12841, 3402, 3259, 1897]
− Illumination noise in the background
− Similar shape and imprints
− Very similar pattern between query and top retrieved images
• Numeric or text information in imprints can be used for matching/filtering
[Figure: imprint 5883 and its database contents: Imprint: 5883; Shape: round; Color: brown; Ingredient: MDMA, BZP, TFMPP; Cartel: Gulf]
[Figure: retrievals for a query with Shape: round; Color: pink; Text: no; Numbers: no. Using only imprints: ranks 1 to 7 and rank 97 shown. Using imprint, shape, and color: ranks 1 to 7 and rank 15 shown]
Content based matching can reduce retrieval errors
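Content-based filtering of this kind can be sketched as a metadata filter applied before ranking by imprint distance. The dictionary keys (`id`, `shape`, `color`, `dist`) and the toy gallery are hypothetical.

```python
def filter_then_rank(query_meta, gallery, distance):
    """Keep gallery entries whose shape and color match, then sort by distance.

    gallery: list of dicts with hypothetical keys "id", "shape", "color".
    distance: function mapping a gallery entry to its imprint distance.
    """
    kept = [
        g for g in gallery
        if g["shape"] == query_meta["shape"] and g["color"] == query_meta["color"]
    ]
    return sorted(kept, key=distance)

gallery = [
    {"id": "g1", "shape": "round", "color": "brown", "dist": 0.10},
    {"id": "g2", "shape": "round", "color": "pink", "dist": 0.35},
    {"id": "g3", "shape": "oval", "color": "pink", "dist": 0.20},
    {"id": "g4", "shape": "round", "color": "pink", "dist": 0.30},
]
query_meta = {"shape": "round", "color": "pink"}
ranked = filter_then_rank(query_meta, gallery, lambda g: g["dist"])
ranked_ids = [g["id"] for g in ranked]
```

Entries with a closer imprint distance but the wrong shape or color are removed first, which moves the remaining candidates up the ranked list.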
• Proposed an image retrieval system for identifying illicit drugs
• 84.39% rank-1 (91.53% rank-20) accuracy with ~600 query and ~15K gallery images
• Evaluated two image descriptors (SIFT and MLBP) & their fusion; rotation invariant matching scheme was used
• Computation time: 2.3 (0.5) sec/image for feature extraction and 13.0 (4.0) sec for each query with ~15K gallery for SIFT (MLBP); code in MATLAB running on 2.8 GHz CPU, 8 GB RAM
• Future work
  – Content based matching/filtering
  – Evaluation on a larger database; collaboration with AFP
  – More efficient matching scheme
• If we can identify numbers or text in imprints, content based methods can be used.
[Figure: examples of number and text imprints (Number: 5883; Text: WYETH)]
• MLBP is also evaluated with various parameters using the 602 query-gallery dataset to optimize it for pill imprint matching
1. Number of LBPs
2. Sub-region (window size, shift value)
3. Input image size

LBP                                            Sub-region                Image size   Rank-1 accuracy (%)
LBP^{u2}_{8,1} + LBP_{4,1}                     No                        60           51.01
LBP^{u2}_{8,1} + LBP_{4,1} + LBP^{u2}_{12,2}   No                        60           54.15
LBP^{u2}_{8,1} + LBP_{4,1} + LBP^{u2}_{12,2}   No                        70           55.81
LBP^{u2}_{8,1} + LBP_{4,1} + LBP^{u2}_{12,2}   (32, 8)(16, 4)(48, 12)    70           63.12
LBP^{u2}_{8,1} + LBP_{4,1} + LBP^{u2}_{12,2}   (16, 4)(8, 2)(24, 6)      70           65.78
LBP^{u2}_{8,1} + LBP_{4,1} + LBP^{u2}_{12,2}   (20, 4)(10, 2)(30, 6)     70           75.42
[Figure: gradient magnitude image, orientation histogram, and multiple templates]