James Tauber : Parts of Speech and Number of Accents

I thought I'd write a quick Python script to check how many accents were on each of the lemmata in MorphGNT 5.06.
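
A minimal sketch of such a check (not the script actually used) might look like the following. It assumes the lemmata are available as Unicode Greek strings paired with their part-of-speech codes. The names `count_accents`, `tally` and `sample` are hypothetical, and treating only grave, acute and circumflex as "accents" is an assumption.

```python
import unicodedata
from collections import Counter, defaultdict

# Combining marks counted as accents: grave, acute, and Greek perispomeni (circumflex).
# Breathings, diaeresis, and iota subscript are deliberately not counted (an assumption).
ACCENTS = {"\u0300", "\u0301", "\u0342"}


def count_accents(word):
    """Return the number of accent marks in a Unicode Greek word."""
    # NFD splits precomposed letters into a base letter plus combining marks.
    return sum(1 for ch in unicodedata.normalize("NFD", word) if ch in ACCENTS)


def tally(pairs):
    """Map (part of speech, accent count) to a Counter of the lemmata seen."""
    cells = defaultdict(Counter)
    for pos, lemma in pairs:
        cells[(pos, count_accents(lemma))][lemma] += 1
    return cells


# Hypothetical input: (part-of-speech code, lemma) pairs, one per word in the text.
sample = [("N", "λόγος"), ("RA", "ὁ"), ("N", "δοῦλος"), ("RA", "ὁ")]

cells = tally(sample)
for (pos, n), lemmata in sorted(cells.items()):
    # Print the part of speech, accent count, total occurrences, and the lemmata involved.
    print(pos, n, sum(lemmata.values()), sorted(lemmata))
```

Keeping the lemmata for each cell, rather than just a count, makes it easy to list the handful of words behind any suspiciously small number.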

Here are the counts by part of speech (rows) and number of accents on the lemma (columns):

	0	1	2
A	-	9159	-
C	924	17361	-
D	1592	4606	-
I	-	17	-
N	30	28271	1
P	5433	5488	-
RA	19862	4	-
RD	-	1744	-
RI	-	1165	-
RP	-	11584	-
RR	-	1677	-
V	8	28101	1
X	147	844	-

Some of the low numbers are definitely errors in the database. Now to investigate...

UPDATE (2005-07-16): Both 2-accent cases were mistakes. The 30 0-accent nouns and 5 of the 0-accent verbs were foreign loan words that are intentionally unaccented, but the other 3 0-accent verbs were mistakes. The 4 accented articles were the result of crasis with the following noun; those words should probably be analyzed as nouns rather than articles. I guess there'll be a 5.07 release soon. NOTE: I haven't looked at the particles, adverbs, conjunctions or prepositions yet.

The original post was in the category: morphgnt but I'm still in the process of migrating categories over.