Next comes living. At first it also seems to be a special case, since it can be a Noun, a Gerund, a Verb (as part of a Compound Tense), an Adjective or a Participle. Instruction No 69 establishes that this word ends in -ing and No 70 sends it for further analysis to instructions No 246-303. Almost at the end (instructions No 300 and 301), the algorithm decides to attribute living to the NG, acknowledging that it is a Present Participle. If the program were more precise, it would also be able to say that living is an Adjective used as an attribute. The last word in this sequence is quarters. Its ending closely resembles a verbal ending (3rd person singular). Will the algorithm make a mistake this time? Instruction No 67 recognizes that the ending -s is ambiguous and sends quarters to instructions No 165-245 for more detailed analysis. The word then passes unrecognized through many instructions until it finally reaches instruction No 233, which establishes that quarters is followed by a Punctuation Mark; this serves as sufficient reason to attribute it to the NG.
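The routing just described can be pictured with a minimal sketch. It is not the algorithm itself: only the instruction numbers and the two endings come from the text, while the function names, the punctuation set and the token list are assumptions made for illustration.

```python
from typing import List, Optional

# Hypothetical set of punctuation marks; the text only says "a Punctuation Mark".
PUNCTUATION = {".", ",", ";", ":", "!", "?"}


def route_by_ending(word: str) -> Optional[range]:
    """Return the block of instruction numbers that analyses the word further."""
    lower = word.lower()
    if lower.endswith("s"):
        # Instruction No 67: the ending -s is ambiguous -> instructions No 165-245.
        return range(165, 246)
    if lower.endswith("ing"):
        # Instructions No 69 and 70: the ending -ing -> instructions No 246-303.
        return range(246, 304)
    return None  # other endings are outside this sketch


def group_if_before_punctuation(tokens: List[str], i: int) -> Optional[str]:
    """Mimic the test made at instruction No 233: a word in -s that is
    followed by a punctuation mark is attributed to the NG."""
    if i + 1 < len(tokens) and tokens[i + 1] in PUNCTUATION:
        return "NG"
    return None  # undecided; the real algorithm applies further tests


tokens = ["living", "quarters", "."]
for i, word in enumerate(tokens[:2]):
    block = route_by_ending(word)
    print(word, "-> instructions", block.start, "to", block.stop - 1)
print("quarters ->", group_if_before_punctuation(tokens, 1))
```

The sketch keeps only the two steps named in the text: a dispatch on the ending, followed by a single context test. The many intermediate instructions that quarters passes through unrecognized are deliberately left out.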
Finally, our algorithmic analysis of the above sentence ends with commendable results: no error. In the long run, however, we would expect errors to appear, mainly when dealing with Verbs, but these are not likely to exceed 2 per cent. For example, an error can be detected in the following sample sentence: Not only has his poetic fame - as was inevitable - been overshadowed by that of Shakespeare but he was long believed to have entertained and to have taken frequent opportunities of expressing a malign jealousy of one both greater and more successful than himself.
This sentence is divided into VG and NG in the following manner:
His poetic fame [NG]
By that of Shakespeare [NG]
Was long believed to have entertained [VG]
To have taken [VG]
Frequent opportunities of expressing [NG]
A malign jealousy of one both greater [NG]
More successful than himself. [NG]
As the above example shows, the word long was wrongly attributed to the VG (according to the specifications laid down as a starting point for the algorithm, it should belong to the NG). The patient reader can put many sentences to the test in the way described above, following the algorithmic instructions, to verify the accuracy of our description. Although this description is designed for computer use (to be turned into a software program), it is interesting to put ourselves for a moment on a par with the computer in order to understand better how it works. Of course, that is not the way we would do the job ourselves. Our knowledge of grammar is far superior, and we understand the meaning of the sentence, while the computer does not. The information used by the computer is extremely limited: only that presented in the instructions (operations) and in the Lists. Further on we will try to give the computer more information (Algorithm No 3 and the algorithms in Part 2) and correspondingly increase our requirements.
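A hand check of this kind amounts to comparing, word by word, the group the algorithm assigned with the group required by the specifications. The following minimal sketch does this for the single fragment discussed above. The function name and the data layout are assumptions, and the labels cover only the one VG line in which the error occurs.

```python
from typing import List, Tuple

Labelled = List[Tuple[str, str]]  # (word, group) pairs, group being "NG" or "VG"


def compare_groups(assigned: Labelled, expected: Labelled) -> Tuple[List[str], float]:
    """Return the wrongly attributed words and the share of words in error."""
    if [w for w, _ in assigned] != [w for w, _ in expected]:
        raise ValueError("both labellings must cover the same words in the same order")
    if not assigned:
        return [], 0.0
    wrong = [w for (w, got), (_, want) in zip(assigned, expected) if got != want]
    return wrong, len(wrong) / len(assigned)


# Output of the algorithm for the fragment: everything was put in the VG.
assigned = [("was", "VG"), ("long", "VG"), ("believed", "VG"),
            ("to", "VG"), ("have", "VG"), ("entertained", "VG")]
# According to the specifications, "long" should belong to the NG.
expected = [("was", "VG"), ("long", "NG"), ("believed", "VG"),
            ("to", "VG"), ("have", "VG"), ("entertained", "VG")]

wrong, rate = compare_groups(assigned, expected)
print("wrongly attributed:", wrong)
print("share of words in error: {:.1%}".format(rate))
```

The share computed for one short fragment chosen because it contains an error says nothing about the overall rate; a figure comparable with the 2 per cent estimate would require running the same comparison over a large body of text.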
- Most of the procedures to determine the nominal or verbal nature of the wordform, depending on its context, are based on the phrasal and syntactic structures present in the Sentence (for example, instructions 11 and 12, 67 and 68, 85, etc.), i.e. structures such as Preposition + Article + Noun; will (shall) + be + (Adverb) + Participle; to + be + (not) + Participle 2nd + to + Verb; -ing + Possessive Pronoun + Noun, etc. (the words in brackets represent alternatives).
- When constructing the algorithm it was thought to be more expedient to deal first with the auxiliary and short words of two-letter length, then with words of three-letter length, then with the rest of the words - for frequency considerations and also because they represent the main body of the markers.
- The approach presented in this study is not based on formal grammars and is to be used exclusively for text analysis (not for text synthesis). One should not associate the VP (Verbal Phrase) with the VG and the NP (Noun Phrase) with the NG - for these are completely different notions as has been shown by the presentation.
- The algorithm can be checked by feeding texts through the procedures (the instructions) manually; a reader who is dissatisfied with the results may change the instructions to improve them. (See Section 3.3 for details of how the performance of the algorithms can be hand checked.) The algorithm can be easily programmed in one of the existing artificial languages best suited for this type of operation.
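As an illustration of the first point above, a structure such as will (shall) + be + (Adverb) + Participle can be tested against the context of a wordform with a few lines of code. This is a minimal sketch under stated assumptions: the part-of-speech tags, the pattern encoding and the function name are hypothetical, "(shall)" is read as an alternative to will, and "(Adverb)" is read as an element that may be absent.

```python
from typing import List, Optional, Set, Tuple

Token = Tuple[str, str]            # (wordform, hypothetical part-of-speech tag)
Element = Tuple[Set[str], bool]    # (accepted wordforms or tags, may be absent?)

# will (shall) + be + (Adverb) + Participle
WILL_BE_PARTICIPLE: List[Element] = [
    ({"will", "shall"}, False),
    ({"be"}, False),
    ({"Adverb"}, True),
    ({"Participle"}, False),
]


def match_structure(pattern: List[Element], tokens: List[Token],
                    start: int = 0) -> Optional[int]:
    """Return the index just past the matched structure, or None if it is absent.

    Matching is greedy and does not backtrack, which is enough for simple
    structures of this kind.
    """
    i = start
    for accepted, may_be_absent in pattern:
        if i < len(tokens) and (tokens[i][0].lower() in accepted
                                or tokens[i][1] in accepted):
            i += 1                 # element found, move to the next wordform
        elif not may_be_absent:
            return None            # an obligatory element is missing
    return i


sentence = [("will", "Aux"), ("be", "Aux"), ("soon", "Adverb"),
            ("finished", "Participle")]
end = match_structure(WILL_BE_PARTICIPLE, sentence)
if end is not None:
    print("verbal structure:", " ".join(word for word, _ in sentence[:end]))
```

Listing wordforms (will, shall, be) and word classes (Adverb, Participle) side by side in one pattern reflects the way the structures are written in the text, where auxiliary words serve as markers and the remaining positions are filled by classes.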