Métodos de Aprendizaje Automático aplicados a la Predicción de Palabras para el Portugués de Brasil [Machine Learning Methods Applied to Word Prediction for Brazilian Portuguese]

  1. Cruz Cavalieri, Daniel
  2. Filho, Teodiano Freire Bastos
  3. Palazuelos Cagigas, Sira Elena
  4. Macías Guarasa, Javier
  5. Martín Sánchez, José Luis
Journal:
Procesamiento del lenguaje natural

ISSN: 1135-5948

Year of publication: 2010

Issue: 45

Pages: 87-94

Type: Article

Other publications in: Procesamiento del lenguaje natural

Abstract

People with physical disabilities may have serious difficulty using computer keyboards to write. For this reason, they may use specialized tools that include systems to assist the writing process, such as word prediction, which reduces the number of keystrokes needed to write a text. Word prediction may be based on different sources of information: statistical, grammatical, specific to the subject and/or the user, etc. In this paper we improve the quality of word prediction in Brazilian Portuguese by improving the prediction of the part of speech (POS) of the predicted word. We propose the following methods to predict the POS: artificial neural networks, support vector machines, regularized logistic models and a naïve Bayes classifier. When included in the word prediction system, they save between 32.55 % and 34.58 % of the keystrokes needed to write the text.
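The abstract reports keystroke savings without stating how the figure is computed. The sketch below shows one common way such a percentage can be obtained. It is not the authors' system: the counting convention (one keystroke per letter, one for the space, one to select a suggestion that also inserts the space), the frequency-only predictor, and the names `predict`, `keystrokes_for_word` and `keystroke_savings` are all assumptions made for illustration, and the toy lexicon is invented.

```python
from collections import Counter


def predict(prefix, freqs, n=5):
    """Return up to n known words starting with prefix, most frequent first."""
    candidates = [w for w in freqs if w.startswith(prefix)]
    # Sort by descending frequency; break ties alphabetically for determinism.
    candidates.sort(key=lambda w: (-freqs[w], w))
    return candidates[:n]


def keystrokes_for_word(word, freqs, n=5):
    """Keystrokes needed to enter one word with the prediction list available.

    Assumed convention: the user types letters until the word appears in the
    list, then spends one keystroke selecting it (the space is added for free).
    """
    for typed in range(len(word)):
        if word in predict(word[:typed], freqs, n):
            return typed + 1  # letters typed so far + one selection keystroke
    return len(word) + 1  # never offered: every letter plus the space


def keystroke_savings(words, freqs, n=5):
    """Percentage of keystrokes saved versus typing every letter and space."""
    baseline = sum(len(w) + 1 for w in words)
    if baseline == 0:
        raise ValueError("text must contain at least one word")
    used = sum(keystrokes_for_word(w, freqs, n) for w in words)
    return 100.0 * (baseline - used) / baseline


# Toy lexicon with made-up frequencies (illustrative only).
freqs = Counter({"casa": 10, "carro": 5, "cachorro": 2})
text = ["casa", "cachorro", "gato"]
print(f"{keystroke_savings(text, freqs, n=1):.2f} % of keystrokes saved")
```

In this simplified setting, a better predictor ranks the intended word higher, so it appears after fewer typed letters. POS prediction, as proposed in the abstract, is one way to improve that ranking.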

References

  • Bick, Eckhard. 2000. The Parsing System "Palavras": Automatic Grammatical Analysis of Portuguese in a Constraint Grammar Framework. Ph.D. thesis, Aarhus University, Aarhus, Denmark, November.
  • Burges, Christopher J. C. 1998. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 2:121–167.
  • Cavalieri, Daniel C., Teodiano F. Bastos-Filho, Mário Sarcinelli-Filho, and Sira Elena Palazuelos Cagigas. 2008. Redes neuronales artificiales para predicción de categorías de palabras en portugués de Brasil. In V Congreso IBERDISCAP, Cartagena, Colombia.
  • Cristianini, N. and J. Shawe-Taylor. 2000. An introduction to Support Vector Machines and other kernel-based learning methods. Cambridge University Press.
  • Elman, Jeffrey L. 1990. Finding structure in time. Cognitive Science 14, University of California, California, San Diego.
  • Fan, Rong-En, Kai-Wei Chang, Cho-Jui Hsieh, Xiang-Rui Wang, and Chih-Jen Lin. 2008. LIBLINEAR: A library for large linear classification. Journal of Machine Learning Research, 9:1871–1874.
  • Figueira, Cleonis Viater. 2006. Modelos de regressão logística. Master's thesis, Universidade Federal do Rio Grande do Sul - UFRGS, Instituto de Matemática da UFRGS, Porto Alegre, Brazil, March.
  • Garay-Vitoria, N. and J. González-Abascal. 1997. Intelligent word prediction to enhance text input rate (a syntactic analysis based word prediction aid for people with severe motor speech disability). In Annual International Conference on Intelligent User Interfaces, pages 241–247.
  • Joachims, Thorsten. 1998. Text categorization with support vector machines: Learning with many relevant features. Pages 137–142. Springer Verlag.
  • Lin, Chih-Jen, Ruby C. Weng, and S. Sathiya Keerthi. 2007. Trust region Newton method for large-scale logistic regression. In An Interior-Point Method For Large-Scale l1-Regularized Logistic Regression.
  • Nakamura, Masami, Katsuteru Maruyama, Takeshi Kawabata, and Kiyohiro Shikano. 1990. Neural network approach to word category prediction for English texts. In Helsinki University, pages 213–218.
  • Ng, Andrew Y. 2004. Feature selection, l1 vs. l2 regularization, and rotational invariance. In ICML.
  • Osuna, Edgar, Robert Freund, and Federico Girosi. 1997. Training support vector machines: An application to face detection. Pages 130–136.
  • Palazuelos, Sira Elena. 2001. Contribution to Word Prediction in Spanish and its Integration in Technical Aids for People with Physical Disabilities. Ph.D. thesis, Universidad Politécnica de Madrid, Madrid, Spain.
  • Santos, Diana and Paulo Rocha. 2004. The key to the first CLEF in Portuguese: Topics, questions and answers in CHAVE. In 5th Workshop of the Cross-Language Evaluation Forum, CLEF 2004, pages 821–832, Bath, UK, September 15-17.
  • Sundarkantham, K. and S. M. Shalinie. 2007. Word predictor using natural language grammar induction technique. Journal of Theoretical and Applied Information Technology, 3(3):1–8.
  • Vert, J. P. 2002. Support vector machine prediction of signal peptide cleavage site using a new class of kernels for strings. In Proceedings of the Pacific Symposium on Biocomputing, pages 649–660. World Scientific.
  • Zhang, Harry and Jiang Su. 2008. Naive Bayes for optimal ranking. J. Exp. Theor. Artif. Intell., 20(2):79–93.