Pitfalls of Bibliometrics: What are the scientometric data good for?

Karol Życzkowski (Jagiellonian University and Center for Theoretical Physics, PAS)

Polish Academy of Sciences, Warsaw, April 9, 2013

My scientific interests:
- Nonlinear Dynamics
- Mathematical Physics
- Quantum Chaos
- Quantum Information

also:
- Statistical Physics
- Voting Theory
- Bibliometrics

Nobel Prize in Physics 2012

Laureate                  Papers   Citations   h-index
Serge Haroche (Fr)        219      16350       65
David J. Wineland (USA)   260      21205       71

Experiments on the control of photons and atoms, of key importance for the foundations of quantum theory.

Fields Medal 2010 (Mathematics)

Laureate                       Papers   Citations   h-index
Elon Lindenstrauss (Israel)    34       364         9
Bao Chau Ngo (Viet/Fr)         9        121         7
Stanislav Smirnov (Ru/Sui)     27       461         8
Cedric Villani (Fr)            51       919         14

Fields Medal 1998 (Mathematics)

Laureate                       Papers   Citations   h-index
Richard Borcherds (UK)         20       1302        13
Timothy Gowers (UK)            29       744         14
Maxim Kontsevich (Ru/Fr)       42       2265        18
Curtis McMullen (USA)          46       790         18

An explanation: standards in various fields differ! In comparison with physicists, mathematicians:
- publish fewer papers a year,
- cite fewer references in each article,
- write papers with a smaller average number of authors.

Conclusion: one should not compare bare data from different fields! (One should rather use data rescaled to the field average...)

A case example: consider a (productive) mathematician who wrote 10 articles alone during the last 10 years, and a physicist who, jointly with 4 other colleagues, writes five papers each year, so that his 10-year list of publications consists of 50 items. What is
a) the amount of work done by each of them?
b) the estimated increase of their numbers of citations and, e.g., of their h-indices?
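Both questions can be made concrete with a short Python sketch. It assumes that "amount of work" is measured by fractional counting (each of n co-authors is credited with 1/n of a paper), which the slide does not specify; the function names and the citation record in the last line are hypothetical. The h-index follows its standard definition: the largest h such that h papers have at least h citations each.

```python
def fractional_papers(author_counts):
    """Sum of 1/n over papers, where n is the number of authors of each paper."""
    return sum(1.0 / n for n in author_counts)


def h_index(citations):
    """Largest h such that at least h papers have >= h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, c in enumerate(ranked, start=1):
        if c >= rank:
            h = rank
        else:
            break
    return h


# Case example: 10 solo papers vs 50 papers with 5 authors each.
mathematician = fractional_papers([1] * 10)
physicist = fractional_papers([5] * 50)
print(round(mathematician, 6), round(physicist, 6))  # 10.0 10.0

# Hypothetical citation record (not from the slides):
print(h_index([25, 8, 5, 3, 3, 1, 0]))  # 3
```

Under this assumption both researchers did the same amount of work (10 paper-equivalents), but the physicist has five times as many papers, each of which can collect citations, so his citation count and h-index are expected to be substantially higher.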

Comparison of different fields I (data: USA 2006-2011)

Columns: (1) articles per year; (2) scientists (USA); (3) authors per article; (4) articles per year per author; (5) citations per article; (6) citations per year per scientist; (7) citations per scientist compared to Mathematics.

Field              (1)     (2)      (3)   (4)   (5)    (6)     (7)
Mathematics        4190    37000    2.0   0.2   15.0   3.5     1
Physics            18227   49000    5.3   2.0   27.8   54.8    15.9
Chemistry          16430   86000    4.3   0.8   31.7   26.0    7.5
Computer Sci.      2188    20000    3.0   0.3   17.4   5.8     1.7
Engineering        14609   144000   3.8   0.4   18.7   7.3     2.1
Space Science      3187    5000     5.9   3.8   41.2   154.9   44.9
Geosciences        11621   21000    4.0   2.2   33.1   73.0    21.2
Agriculture        3469    22000    4.3   0.7   20.8   14.2    4.1
Biological Sci.    49614   193000   5.3   1.4   41.1   55.8    16.2
Medical Science    58664   45000    5.6   7.3   43.2   315.3   91.4
Psychology         9805    114000   3.2   0.3   46.5   13.0    3.8
Social Science     12020   100000   1.9   0.2   26.8   6.2     1.8
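The derived columns can be checked against the raw ones. The sketch below assumes, based on the numbers rather than on a stated definition, that column (4) = (1) × (3) / (2) and column (6) = (4) × (5); the function name is hypothetical.

```python
def citations_per_scientist(articles_per_year, scientists, authors_per_article,
                            citations_per_article):
    """Return (articles per author per year, citations per scientist per year).

    Assumes every co-author is credited with the whole article
    (whole counting), which reproduces the table's columns (4) and (6).
    """
    articles_per_scientist = articles_per_year * authors_per_article / scientists
    return articles_per_scientist, articles_per_scientist * citations_per_article


math_row = citations_per_scientist(4190, 37000, 2.0, 15.0)
phys_row = citations_per_scientist(18227, 49000, 5.3, 27.8)
print(round(math_row[0], 2), round(math_row[1], 2))  # 0.23 3.4
print(round(phys_row[0], 2), round(phys_row[1], 2))  # 1.97 54.81
print(round(phys_row[1] / math_row[1], 1))           # 16.1
```

The recomputed values (3.4 for Mathematics, a Physics/Mathematics ratio of about 16.1) differ slightly from the table (3.5 and 15.9), presumably because the input columns are rounded.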

Comparison of different fields II: citations of the 100th most highly cited scientist (Web of Science, ESI 2002-2012)

Field                Citations of the   Relative to
                     100th scientist    Mathematics
Mathematics          733                1.0
Physics              14772              20.2
Chemistry            12420              16.9
Computer Sci.        1247               1.7
Engineering          3165               4.3
Space Science        9700               13.2
Geosciences          3571               4.9
Agricultural Sci.    1288               1.7
Biology & Bioch.     6092               8.3
Psychiat./Psycho.    3256               4.4
Social Sciences      948                1.3
Clinical Medicine    17051              23.3
Economics            960                1.3
Ecology              3013               4.1
Immunology           4169               5.7
Materials Sci.       4750               6.5
Microbiology         3244               4.4
Molecular Biol.      9021               12.3
Neuroscience         5781               7.9
Pharmacology         2081               2.8
Plant & Animal       3114               4.3

The above numbers are only approximate and should not be treated as conversion coefficients! The reason: there are differences inside a given field. Example:

Pure Mathematics (1.0) and Particle Physics (20.2) lie at the two ends, with Mathematical Physics, Applied Math., … in between: an entire spectrum of intermediate cases exists…

Statistics of citations in scientific papers

What are they good for?
• Some people (librarians, publishers, editors) do care about bibliometric indices, e.g. the impact factor (IF-2) of E. Garfield (1972):
  $\mathrm{IF}(i) = c_i / (Z_{i-1} + Z_{i-2})$,
  where $c_i$ is the number of citations gained in year i by articles published in years i-1 and i-2, and $Z_i$ is the number of articles published in year i.
• Can we afford the luxury of neglecting it?? Perhaps not...
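A direct transcription of the IF-2 formula, as a minimal sketch; the function name and the journal with its article and citation counts are hypothetical.

```python
def impact_factor(citations_in_year, articles_by_year, year):
    """Two-year impact factor IF(i) = c_i / (Z_{i-1} + Z_{i-2}).

    citations_in_year: c_i, citations received in `year` by items
                       published in year-1 and year-2.
    articles_by_year:  mapping year -> number of articles published (Z).
    """
    denominator = articles_by_year[year - 1] + articles_by_year[year - 2]
    return citations_in_year / denominator


# Hypothetical journal: 80 + 120 articles in 2006-2007, cited 300 times in 2008.
print(impact_factor(300, {2006: 80, 2007: 120}, 2008))  # 1.5
```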

2008 Impact Factors of exemplary journals, selected out of the 6220 covered by the Journal Citation Reports:

Cancer J. Clin.      74.6      Phys. Rep.            16.3
New Engl. J. Med.    50.0      Phys. Rev. Lett.      7.2
Ann. Rev. Immun.     41.1      Phys. Rev. D          5.1
Rev. Mod. Phys.      34.0      Ann. Math.            3.5
Nature               31.4      Phys. Rev. E          2.5
Cell                 31.2      J. Am. Math. Soc.     2.4
Lancet               28.4      J. Phys. A            1.5
Science              28.1      Duke Math. J.         1.4
Ann. Rev. Astr.      24.3      Acta Phys. Pol. B     0.8
Chem. Rev.           23.6      Rep. Math. Phys.      0.6

Mean Impact Factor, mean number of citations, and probability that a paper will be cited within 2 years (data from Journal Citation Reports for 1994-2005)

Field          Size   Mean IF   Mean number    Probability of citation
                                of citations   within 2 years [%]
Biology        511    4.76      45.8           20.5
Astronomy      25     4.29      38.3           21.5
Medicine       766    2.89      33.9           18.3
Chemistry      145    2.61      33.1           17.0
Physics        503    1.91      24.0           16.7
Economics      159    0.82      30.4           12.1
Comp. Scien.   124    0.63      17.2           19.3
Mathematics    149    0.56      18.4           8.5
History        23     0.41      81.8           10.1

Althouse, West, Bergstrom, 2008

[Figure: mean Impact Factor vs mean number of citations; data from Journal Citation Reports for 1994-2005]

Universality of citation distribution P(c_f) (Radicchi, Fortunato and Castellano, 2008)
• Study of citation patterns P(c) in 19 scientific fields.
• In each discipline one finds the mean number of citations $\langle c \rangle$ and defines the rescaled citation number $c_f = c / \langle c \rangle$.
• The distribution P(c_f) is claimed to be universal! It is described by a log-normal distribution, with an algebraic distribution for the tails of P(c_f).

[Figure: P(c) vs c for different fields; P(c_f) vs c_f after rescaling]

Rescaled data allow for comparison across different fields and different publication years.
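The rescaling $c_f = c / \langle c \rangle$ takes one line per field. In this sketch the function name, field names and citation counts are hypothetical, and every field is assumed to have a nonzero mean.

```python
from statistics import mean


def rescale_by_field(citations_by_field):
    """Return c_f = c / <c>, where <c> is the mean citation count of the field."""
    rescaled = {}
    for field, counts in citations_by_field.items():
        avg = mean(counts)  # assumed > 0
        rescaled[field] = [c / avg for c in counts]
    return rescaled


# Hypothetical counts: physics papers are cited 10 times more than maths papers.
data = {"mathematics": [0, 1, 2, 5], "physics": [0, 10, 20, 50]}
print(rescale_by_field(data))
# both fields become [0.0, 0.5, 1.0, 2.5]
```

Each rescaled list has mean 1, which is what makes the two fields directly comparable.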

Traps and pitfalls related to citation statistics

R. Adler, J. Ewing, P. Taylor, Report of the Joint Committee on Quantitative Assessment of Research, November 2008

The number of citations k received by papers follows, approximately, a power-law distribution $P(k) = a k^{-b}$ (Garfield 1987).

Example a): 2005 data for the Proceedings of the AMS imply an impact factor IF = 0.43.

Example b): 2005 data for the Transactions of the AMS imply an impact factor IF = 0.85.

Question: Is an average Transactions paper twice as good as one published in the Proceedings?
• What is the probability that a random Proceedings paper has at least as many citations as a random one from the Transactions?

The answer is: 62%

Judging papers by the journal IF alone, we are more often wrong than right!
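The question can be answered empirically by comparing every paper of one journal with every paper of the other. The sketch below uses small hypothetical samples, not the AMS data analysed in the report; the function name is hypothetical, and it only illustrates how a skewed distribution lets the journal with the higher mean fail to win most pairwise comparisons.

```python
def prob_at_least_as_many(sample_a, sample_b):
    """Empirical P(X >= Y) for X drawn from sample_a and Y from sample_b,
    averaged over all pairs (ties count as 'at least as many')."""
    hits = sum(1 for x in sample_a for y in sample_b if x >= y)
    return hits / (len(sample_a) * len(sample_b))


# Hypothetical skewed citation counts (NOT the AMS data):
proceedings = [0, 0, 1, 1, 2, 3]
transactions = [0, 0, 1, 2, 3, 12]
print(prob_at_least_as_many(proceedings, transactions))
```

Here the second sample has a mean about 2.6 times higher (3.0 vs. about 1.17), yet the first sample has at least as many citations in 19 of the 36 pairs (about 53%).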

• The 2008 report of Adler, Ewing and Taylor criticises sole reliance on the impact factor, since the 'objectivity' of such numbers can be illusory. The authors compare judging a journal by its Impact Factor alone to judging a patient's health by his weight alone…

• See also: E. Falagas and V. Alexiou, The top-ten in journal impact factor manipulation, Arch. Immunol. Ther. Exp. 2008

Why the Impact Factor of journals should not be used for evaluating research (P. O. Seglen, British Medical J. 1997):
• The IF does not reflect the skewness of the citation distribution (50% of citations are gained by 10% of the papers).
• The IF depends on the field.
• Article citations determine the journal IF, but not vice versa!
• The IF depends on the mean number of references in each article.
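Seglen's skewness argument can be quantified by the share of all citations collected by the most cited 10% of papers. A minimal sketch; the function name and the citation list are hypothetical.

```python
def top_share(citations, fraction=0.10):
    """Share of all citations collected by the most cited `fraction` of papers."""
    ranked = sorted(citations, reverse=True)
    k = max(1, round(fraction * len(ranked)))  # at least one paper
    total = sum(ranked)
    return sum(ranked[:k]) / total if total else 0.0


# Hypothetical journal with 10 papers:
papers = [50, 9, 8, 7, 6, 5, 4, 3, 2, 1]
print(round(top_share(papers), 2))  # 0.53
```

For this list the top paper alone collects 50 of the 95 citations, so the mean, and hence an IF-like average, is dominated by a single article.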

Summing the Impact Factors of the journals in which all papers of a given author were published is a capital crime against the rules of bibliometrics!!

(Would you sum, e.g., the $$$ price of each issue?)

• If one considers citations as a quality indicator, one should rather care about the average (true) impact factor of one's own papers (older than 3 years): for each article published in year i, count the citations it gained in years i+1 and i+2, and average over the articles. This is the author's direct contribution to the IF of the journals!
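A minimal sketch of this "true" impact factor, assuming each paper is stored as its publication year plus a mapping from calendar year to citations received in that year; the function name and the record below are hypothetical.

```python
def author_true_if(papers):
    """Average number of citations a paper gained in years i+1 and i+2,
    where i is its publication year (its direct contribution to the IF-2).

    papers: list of (publication_year, {year: citations_in_that_year}).
    """
    if not papers:
        return 0.0
    per_paper = [
        cites.get(year + 1, 0) + cites.get(year + 2, 0) for year, cites in papers
    ]
    return sum(per_paper) / len(per_paper)


# Hypothetical record of three papers (all older than 3 years):
record = [
    (2005, {2006: 4, 2007: 6, 2008: 10}),  # contributes 4 + 6 = 10
    (2006, {2007: 1, 2008: 1}),            # contributes 1 + 1 = 2
    (2007, {2009: 3}),                     # contributes 0 + 3 = 3
]
print(author_true_if(record))  # 5.0
```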

Are all citations equal?
• Some experts believe that even mediocre articles can make it to the list of often cited papers, just because they are easy to understand and simple to cite...
• Is a citation of my work by an expert in the field more valuable than one by a beginner?
• Perhaps the value of any citation should depend on the scientific output of the citing author...
• Google's PageRank algorithm!

Various constructions of the graph. Each node of the graph represents:
a) an individual article: Chen, Xie, Maslov, Redner '07; Ma, Guan, Zhao '08
b) an individual author: K. Życzkowski, 'Citation graph, weighted impact factors and performance indices', Scientometrics 2010; Radicchi et al., Sept. 2009
c) a single scientific journal: Bergstrom 2007

PageRank algorithm:
• One forms a citation graph and the corresponding citation matrix G of size N.
• One normalises each column of G to obtain a stochastic matrix G'.
• To ensure irreducibility (and to increase the spectral gap) one defines the PageRank matrix
  $H = a G' + (1-a) W$, where $W_{ij} = 1/N$ (Google uses a = 0.85).
• The PageRank is defined by the components of the leading eigenvector of H (Frobenius-Perron!).
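The steps above can be sketched with power iteration, which converges to the leading eigenvector of H because H is a positive stochastic matrix. This plain-Python sketch is not the implementation used by Google or eigenfactor.org; the function name, the edge-list input and the treatment of nodes that cite nothing (their column is taken as uniform 1/N) are assumptions not stated on the slide.

```python
def pagerank(edges, n, a=0.85, tol=1e-12, max_iter=1000):
    """Leading eigenvector of H = a*G' + (1-a)*W by power iteration.

    edges: (citing, cited) pairs over nodes 0..n-1; column j of G holds
    the references made by node j, and G' normalises each column to sum 1.
    W_ij = 1/N spreads the remaining weight (1-a) uniformly.
    """
    refs = [[] for _ in range(n)]
    for citing, cited in edges:
        refs[citing].append(cited)

    p = [1.0 / n] * n
    for _ in range(max_iter):
        new = [(1.0 - a) / n] * n            # (1-a) W p, since sum(p) == 1
        for j, targets in enumerate(refs):
            if targets:                      # a G' p: split p_j over j's references
                share = a * p[j] / len(targets)
                for i in targets:
                    new[i] += share
            else:                            # no references (assumption:
                for i in range(n):           # treat the column as uniform 1/N)
                    new[i] += a * p[j] / n
        if sum(abs(x - y) for x, y in zip(new, p)) < tol:
            return new
        p = new
    return p


# Toy graph: papers 0 and 1 cite paper 2, paper 2 cites paper 0.
ranks = pagerank([(0, 2), (1, 2), (2, 0)], n=3)
print([round(r, 3) for r in ranks])  # [0.464, 0.05, 0.486]
```

Paper 2, cited by two others, ranks highest; paper 1, cited by nobody, keeps only the uniform share (1-a)/N = 0.05.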

• C. Bergstrom, Eigenfactor: measuring the value and prestige of scholarly journals, C&RL News 68, 314–316 (2007).
• Advantages: insensitivity to field differences and to insignificant journals; freely available at:

• http://www.eigenfactor.org

Mathematical journals: SCI-JCR 2011 data. The Eigenfactor Article Influence Score (EF-AIS) characterizes journals better than the Impact Factor.

No.   Journal                  IF-2   IF-5   EF-AIS
1     J AM MATH SOC.           3.8    3.2    5.3
2     FOUND COMPUT MATH        3.6    2.9    3.0
3     ACTA MATH-DJURSHOLM      3.3    4.0    5.9
4     ANN MATH                 2.9    3.3    5.0
5     COMMUN PUR APPL MATH     2.6    3.9    4.1
6     INVENT MATH              2.3    2.9    4.5
...   …                        …      …      …
10    FIXED POINT THEORY       1.6    1.8    0.7
11    J DIFFER GEOM            1.6    1.5    2.2
12    DUKE MATH J              1.5    1.7    2.8
13    NONLINEAR ANAL-THEOR     1.5    1.6    0.7
14    ANN SCI ECOLE NORM       1.4    1.8    3.2

Impact Factor vs Eigenfactor Article Influence Score for exemplary journals (2011 data, ISI Journal Citation Reports)

Journal               IF-2    AIS
Rev. Mod. Phys.       43.9    28.9
Cancer J. Clin.       67.4    24.5
Ann. Rev. Immun.      42.9    23.4
New Engl. J. Med.     50.7    21.3
Cell                  34.7    20.5
Nature                36.2    20.3
Lancet                38.2    13.6
Science               31.2    17.5
Chem. Rev.            40.1    13.3
Econometrica          3.0     8.6
Astronom. Rev.        11.5    8.1
J. Am. Math. Soc.     3.8     5.2

Is the number of citations really an objective measure of the scientific quality of an article ??

Several authors have formulated various arguments against this claim, e.g. that some simple papers are cited more often than deep articles… Here we* analyze data from our own lists of publications to contribute to this discussion.
(* B. Kamys, M. Kuś, J. Zakrzewski, K. Życzkowski)

A humble example I: number of citations vs IF

[Figure: total number of citations vs Impact Factor (IF) of the journal]

The number of citations of 100 selected papers (in quantum physics) by two authors (KZ and JZ) is only weakly correlated with the Impact Factor of the journal in which they appeared.

A humble example II: number of citations vs scientific quality

[Figure: scientific quality vs number of citations of an article]

The number of citations of selected papers is only weakly correlated with their scientific quality (assessed a posteriori by the authors on a scale [0, 10]).

A humble example III: scientific quality vs Impact Factor

[Figure: scientific quality vs Impact Factor (IF) of the journal]

The scientific quality of selected papers (assessed a posteriori by the authors) is very weakly correlated with the Impact Factor of the journals in which they appeared.
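The slides do not say which correlation measure was used. As an assumption, the sketch below uses the Pearson coefficient r, where |r| close to 0 means weak correlation; the function name and the (IF, citations) pairs are hypothetical, not the authors' data, and neither sample may be constant.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient r of two equally long samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical (IF, citations) pairs, not the authors' data:
impact = [1.5, 2.5, 7.2, 2.5, 1.5, 7.2]
cites = [40, 5, 12, 60, 8, 20]
print(round(pearson(impact, cites), 2))
```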

Remarks on tools used: we often work with ISI Web of Science (the standard in several fields!), but
• Mathematicians prefer MathSciNet
• Some physicists like Scopus or …
• + Publish or Perish is free
• The Eigenfactor Article Influence Score is well normalized and much more reliable than the IF

ISI Web of Science: negligence concerning (some) European journals.
Example: Reports on Mathematical Physics (RoMP), the best Polish journal in the field, edited by Copernicus University, Toruń. Articles published in 1970-1980 are recorded in the ISI database, but do not appear in an author's citation summary:

Article                                     Citations
Jamiołkowski A., Rep. Math. Phys. 1972      275
Kijowski J., Rep. Math. Phys. 1974          81

ISI Web of Science: not significant (= redundant) data in the calculation of the 2-year Impact Factor IF-2.
Example: Reproductive Systems journals, Oxford Rev. Reprod. B: IF = 1.765 = C/D, with C = 30 (citations in 1992 to papers from 1990-91) and D = 17 (articles published in 1990-91). If all articles had jointly one citation more, C' = 31 and IF = 1.82 => the last digits of IF = 1.765 are not significant!!
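The arithmetic of this example as a quick check; the function name is hypothetical.

```python
def impact_factor_sensitivity(citations, articles):
    """Return IF = C/D and the IF after one extra citation, (C+1)/D."""
    return citations / articles, (citations + 1) / articles


base, bumped = impact_factor_sensitivity(30, 17)
print(f"{base:.3f} {bumped:.3f}")                        # 1.765 1.824
print(f"one citation shifts IF by {bumped - base:.3f}")  # one citation shifts IF by 0.059
```

A single extra citation moves the IF by 1/17 ≈ 0.059, so only the first decimal of 1.765 carries information.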

A. Molinié, G. Bodenhausen, Bibliometrics as Weapons of Mass Citation, Chimia 2010: Since we fear that decision-takers of granting agencies will be too busy to read our humble paper in Chimia, we appeal to scientists of all countries and disciplines to unite against the tyranny of bibliometrics.

Role of the bibliometric approach
i) Bibliometric data have a statistical character, so they can be used to compare the scientific output of a single country or progress in a given branch of science.
ii) Bibliometric data are less useful for evaluating the research record of an individual scientist, and they allow for reasoning in one direction only:
- an author with a small number of papers and citations is unlikely to be an influential scientist,
- a huge number of papers and citations does not prove that the author is a prominent scientist
=> peer review is required!

Concluding remarks I
I.1) To characterize the quality of journals, the Impact Factor (IF) should be replaced by modern tools, e.g. the Eigenfactor Article Influence Score, which is:
a) normalized (its average value is equal to unity),
b) insensitive to differences between fields,
c) much more difficult to manipulate than the IF,
d) available free online!
I.2) The Impact Factor should be ruled out from any process of evaluating the scientific record of a single scientist or his grant proposals…
I.3) When comparing two scientists from different fields, one cannot use bare bibliometric data.

Concluding remarks II.a

a) For researchers:
* Do your research well, write fine papers and publish them in good journals.
* Do not care much about impact factors, citations, benchmarks and indices…
* Do not spend your time and energy playing the game of inflating your bibliometric indices:
** a good scientist will have healthy numbers according to any bibliometric measure...

Concluding remarks II.b

b) For reviewers:
• During the evaluation procedure, treat any bibliometric numbers as auxiliary data only.
• Bureaucrats love to play with a single numerical indicator (as it is often the only tool available to them!).
• You should take advantage of your deep knowledge of the field: look into the essence of the proposal before looking at any parametric data…

Concluding remarks II.c

c) For managers of science:
• Science has multiple goals: do not try to project the multi-dimensional system onto a single axis!
• Do not hope to get a single bibliometric parameter universally suitable for evaluation.
• Using several bibliometric indicators in parallel reduces the risk of data manipulation.
• Support versatile use of bibliometrics, in which the evaluated researchers take an active part.

Example: the benchmarks used in ERC grants: the applicant selects his ten best papers published in the last decade and provides the number of citations of each of them.

Bibliometrics: Use and Abuse in the Review of Research Performance, Stockholm, May 2013
Critique in a nutshell:
• Researchers today face an overload of evaluation activities of all kinds,
• time pressure may jeopardize the quality of assessments,
• a heavy bias disqualifies the humanities and social sciences, which do not fit into the system,
• there is an increasing imbalance between the costs of the evaluation system, the advancement of science, and the profits made by private companies.

The very last final remark: if necessary, do use bibliometric data, but do it in a reasonable way!

! Thank You !
