Even a man of slow intellect who is trained and exercised
in arithmetic,
if he gets nothing else from it,
will at least improve and become sharper than before
Plato
Preface
These days, we are no strangers to
statistics. We hear the number of fatalities on the road, the shifts in
public opinion for or against the government, how many housewives now use
the latest Channel 5 products. Politicians, newspaper editors, advertisers
and people with an axe to grind all throw statistics at our heads. If we
are to evaluate such statistics properly, we require an understanding of
statistical reasoning and methods.
As one writer put it, "statistical
thinking will one day be as necessary for efficient citizenship as the
ability to read and write". In this series, I will attempt to introduce
you to this interesting topic. The series is the result of one man's effort;
it is thus hardly perfect, and mistakes are inevitable. I would welcome any
comments or suggestions that would help me improve it.
Statistics as a formal system
I feel a little anxious writing
about statistics for this audience, because I know that more than
one of you developed a distaste for this discipline in your university days.
I certainly did. More than twenty years ago, while I was a first-year student
in statistics in Vietnam, I felt dizzy with strange probabilistic concepts,
even though I was mathematically trained; I was thinking of quitting the
subject. I was intimidated not by the arithmetic, but by the appalling description
of statistical ideas and concepts in many textbooks at the time (and even
now in Western countries).
However, one day, I happened to
read a few lines from a book by Bertrand Russell which said, in part:
"there was a footpath leading across fields to New Southgate, and
I used to go there alone to watch the sunset and contemplate suicide. I
did not, however, commit suicide, because I wished to learn more about
mathematics". I decided to learn more about mathematics and statistics.
Now, after fifteen years of involvement in teaching and research, I
look back and have no regrets about the career path that I have chosen.
If we think seriously about mathematics,
we find that it is the basis for everything we do nowadays. It is not surprising
that mathematics is regarded as the prince of science. In most sciences,
particularly biological science, the pace of "discoveries" can leave
one confused. Indeed, these so-called discoveries often turn out to be
untrue. This is not surprising given that in biological science, what one
researcher establishes, another undoes. By contrast, in mathematics, each generation
adds a new story to the old structure. Mathematical ideas are timeless
and may be described as "eternal truth".
You are now probably wondering whether
you have been reading the wrong article, since so far I have talked about
mathematics, not really statistics. However, the rationale for my preamble
is that statistics is usually defined as a branch of applied mathematics,
which is in turn a modern discipline of modern mathematics. But, in practice,
modern mathematics is one of the principal tools of statistics. So, a few
words of mathematical introduction are required. In this series, I will
introduce you to some of the modern statistical concepts and ideas that
have become something of a hallmark of modern science. I will begin with a discussion
of an elementary question: what is statistics?
Many textbooks define statistics
as an applied mathematical discipline concerned with the collection,
analysis and interpretation of data. However, I think statistics is a composite
domain, containing at least two distinctly different intellectual activities:
(1) the acquisition, logical organisation and numerical presentation of
data, and (2) the analysis of the data to arrive at decisions about degrees
of variation, interrelation, and difference. The first type of activity
may be called "descriptive statistics"; it produces the collections
of data that appear in financial charts, birth rates, death rates, population
censuses, etc. The second type of activity may be referred to as "inferential
statistics"; it is responsible for such calculations as the t-test,
confidence intervals, the chi-square test, linear regression, analysis of variance,
etc. The first activity requires no particular scholastic training in statistics
and can be performed by any intelligent person, while the second activity
requires formal statistical and mathematical training.
Statistical activities closely
resemble the task of science, which is to gather natural knowledge, to arrange
that knowledge coherently and to comprehend patterns or theories discerned
therein. Statistics is thus a science. It is also a formal system. Examples
of formal systems include logic, grammar and mathematics. These are
concerned with the form, not the content, of statements. We may write "the
Vietnamese always have blond hair". Grammatically, this is correct,
but substantively, it is nonsense.
The study of grammar does not protect
one from writing nonsense. Similarly, logic is the study of the formal properties
of propositions, and its rules tell us which conclusions deduced from them
are valid, but logic does not ensure that the conclusions are true. Consider,
for example,
Premise 1: All men are creatures of habit;
Premise 2: All creatures of habit are fools;
Conclusion: All men are fools.
The conclusion is validly deduced
from the two premises, but it is not necessarily true. Its truth is guaranteed
only if both premises are true. It should be noted that if a premise is false,
it does not follow that the conclusion is also false. Men can be fools for
other reasons: love, for example! The science of inferential statistics
may be considered in the same way. For example, the average of the set
of values 1, 1, 2, 3, 2, 1, 11 is 3, which is statistically correct, but
is meaningless as a representative value, since it conceals entirely the
abnormal value of 11. In statistics, a number of axioms and postulates
are stated and conclusions are deduced from them by the rules of the mathematical game.
This system, like logic, may be used as a model only if we ensure that the premises
or assumptions it makes are warranted by the nature of the phenomena it
purports to describe. I will return to this important point in the next
few articles.
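The averaging example above can be checked directly. What follows is only a
minimal sketch in Python, using the standard library; the variable names are
illustrative, and the median is brought in here as one possible alternative
summary, an assumption of this sketch rather than something the discussion
above prescribes.

```python
from statistics import mean, median

# The seven values from the example in the text.
values = [1, 1, 2, 3, 2, 1, 11]

# Arithmetic mean: the sum (21) divided by the count (7).
average = mean(values)

# Median: the middle value once the data are sorted (assumed here as an
# alternative summary that is less sensitive to a single extreme value).
middle = median(values)

# Mean of the six values that remain when the abnormal value 11 is set aside.
typical = mean([v for v in values if v != 11])

print("mean of all values:", average)
print("median of all values:", middle)
print("mean without the value 11:", round(typical, 2))
```

The mean of 3 is arithmetically correct, yet six of the seven values are at
or below 3 and the median is 2; the single value of 11 pulls the average
upward, which is why it says little about a typical member of the set.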
But statistics does not just deal
with scientific theories; it is also concerned with more practical issues
in real life. In the old days, medicine was taught as a black-and-white
discipline: there was no room for error, and the doctor was always right.
However, with the advance of knowledge and information, modern medicine
has finally realised that doctors can indeed make errors in diagnosis and clinical
judgement. Sir G. Pickering, a prominent British medical researcher, implicitly
acknowledged this by noting that "doctors want to help patients, but
the extent to which they can help obviously depends on the doctor's knowledge.
But knowledge is a matter of probability. Diagnosis is a matter of probability,
and in judging treatment, doctors have to base their judgment on knowledge
of probability". A new drug is unlikely to treat 100%
of patients successfully. The reality of the world is harsh and unyielding, and must
be dealt with on its own terms.
There is no way to eliminate completely
the risks of being wrong. Our real problem is not how to eliminate them,
but how to live with them intelligently. In Vietnamese, we have a wonderful
saying: "Mưu sự tại nhân, thành sự tại thiên" (roughly, "man proposes,
heaven disposes"). In real
life, things do not always work out the way we hypothesized or planned.
The main reasons for this are likely that (i) our hypothesis is incorrect
and/or (ii) we do not have enough evidence to reject or accept the hypothesis.
The former is a hypothetical idea which can be redefined; the latter, however,
is a fact that cannot be changed but can be dealt with in probabilistic terms.
It is not surprising that statistics has now become an important, if not
essential, tool in quality control, medical research and any experimental
research.
I have heard prominent academics
in the US say privately that the distinction in science between the
East and the West is that the latter knows how to do statistics in research,
while the former does not. Although the comment is rather arrogant and
tasteless, there is an element of truth in it. I have personally reviewed
many scientific papers and research grant applications from researchers in Eastern
Europe and Japan, and found that while their experimental work is fine,
their treatment of data is absolutely ridiculous. There is no choice but
to reject such pieces of research for publication. That perhaps
explains why most research is published by US and other Western
scientists.
Vietnamese students are traditionally
competent mathematicians, yet to my knowledge, very few specialise in statistics.
This has its roots in how statistics is taught there. The teaching of statistics
in Vietnam was and is still dominated by the first type of activity (descriptive
statistics), which is not in line with the rest of the world, notably
developed countries, which are more concerned with inferential statistics.
In fact, most universities in Vietnam do not have a department of statistics,
and most statisticians there are pure mathematicians, not applied statisticians.
Moreover, statistics, while taught at universities, has not yet
found its way into application in industry and research. There are, however,
encouraging signs in Hanoi and Saigon, where academic books on statistics
have been translated and used in the teaching of statistics.
Some years ago, I read a book containing
the following advice, which I would like to quote here: "If
you are young, I would suggest you to learn statistics as soon as you can.
Do not dismiss it through ignorance or because it calls for thought. Do
not pass into eternity without having examined these techniques and thought
about the possibility of application in your field of work, because very
likely you will find it an excellent substitute for your lack of experience
in some directions. If you are older and already crowned with the laurels
of success, see to it that those under your wing who look to you for advice
are encouraged to look into this subject. In this way, you will show that
your arteries are not yet hardened and you will be able to reap the benefits
without doing overmuch work yourself. Whoever you are, if your work calls
for the interpretation of data, you may be able to do without statistics,
but you will not be able to do so well." I strongly believe that the
advice is still appropriate to our brothers and sisters.
Tuan V. Nguyen, Ph.D.
Bone and Mineral Research Division
Garvan Institute of Medical Research
384 Victoria St Sydney 2010 Australia
Phone: +612 295 8246
Fax: +612 295 8241
Copyright © 1996 by VACETS and T V Nguyen