My current and past research is in the broad area of Machine Learning, with particular emphasis on graph-structured data. My work has been applied to challenging problems arising in a variety of domains, including computer program optimization and analysis, computer vision, and computational biology.
I am currently focusing my efforts on investigating ways to use modern deep learning approaches to improve program optimization and analysis.
Machine Learning for Program Optimization
Computer systems today, from embedded devices to exascale computing systems, are being developed using heterogeneous architectures. Due to this architectural diversity, most programs run well below the performance limits of computational, memory, and network capability of modern systems. Developers are now forced to spend significant amounts of time tuning and porting software across multiple target architectures.
My current research aims to bridge recent developments in deep learning and program optimization in the context of computationally intensive applications. With the goal of maximizing performance on current and future hardware platforms, I investigate ways to characterize programs as graph structures derived from source code or intermediate representations, and I develop deep learning methods that learn feature representations from these graph characterizations.
An interesting application of machine learning on graph representations of programs is the automatic identification of malware. In prior research, our experiments showed that graph-based representations improve classification accuracy over the corresponding feature-vector representations. Further improvements in accuracy were observed when different representations were combined.
- J. Hope, M. Gjergji, J. Di Girolamo, M. Alvarez, and A. Qasem. Characterizing input-sensitivity in tightly-coupled collaborative graph algorithms. In IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGRID), 287–296. 2021. Best Paper Award.
- A. Jilling and M. Alvarez. Optimizing recommendations for clustering algorithms using meta-learning. In International Joint Conference on Neural Networks (IJCNN), 1–10. 2020.
- L. Xu, D. Zhang, M. Alvarez, J. Morales, X. Ma, and J. Cavazos. Dynamic android malware classification using graph-based representations. In International Conference on Cyber Security and Cloud Computing (CSCloud), 220–231. Beijing, China, 2016.
- W. Killian, R. Miceli, E. Park, M. Alvarez, and J. Cavazos. Performance improvement in kernels by guiding compiler auto-vectorization heuristics. White Paper, Performance Prediction, Partnership for Advanced Computing in Europe (PRACE), 2014.
- L. Xu, W. Wang, M. Alvarez, J. Cavazos, and D. Zhang. Parallelization of shortest path graph kernels on multi-core cpus and gpus. In International Workshop on Programmability Issues for Heterogeneous Multicores (MULTIPROG). Vienna, Austria, 2014. Best Paper Award.
- L. Xu, W. Wang, M. Alvarez, and J. Cavazos. Parallelization of the shortest path graph kernel on the gpu. In International Workshop on OpenCL (IWOCL). Atlanta, GA, USA, 2013.
- E. Park, J. Cavazos, and M. Alvarez. Using graph-based program characterization for predictive modeling. In International Symposium on Code Generation and Optimization (CGO), 196–206. San Jose, CA, USA, 2012.
Apparent/Real Age Estimation and Applications
The automated detection of minors/children in images and videos is a challenging problem, aggravated by poor image and video quality and by variations in media dimensions. We address the problem by developing methods to detect faces in images/videos and estimate their apparent and real ages. Using deep learning approaches, we achieved state-of-the-art results for apparent and real age estimation on the APPA-Real dataset.
As a natural application of this research, we developed a framework to automatically detect whether a media file contains illicit material. Fusing deep learning models that detect nudity and/or the presence of a minor makes it possible to identify instances of child pornography without ever coming into contact with the illicit material during model training. The performance of this approach was thoroughly evaluated on several widely used age estimation and nudity detection datasets. Additionally, preliminary tests conducted with the help of a local law enforcement agency, on a private dataset taken from real-world cases, reached up to 97% video classification accuracy.
- J. Rondeau, D. Deslauriers, T. Howard III, and M. Alvarez. A deep learning framework for finding illicit images/videos of children. Machine Vision and Applications, 33(5):66, 2022.
- D. Moreira, E. Pereira, and M. Alvarez. Improving real age estimation from apparent age data. In International Joint Conference on Neural Networks (IJCNN), 1–7. 2021.
- D. Moreira, E. Pereira, and M. Alvarez. Peda 376k: a novel dataset for deep-learning based porn-detectors. In International Joint Conference on Neural Networks (IJCNN), 1–8. 2020.
- J. Rondeau and M. Alvarez. Deep modeling of human age guesses for apparent age estimation. In International Joint Conference on Neural Networks (IJCNN), 01–08. 2018.
Parts of this research were supported by Award No. 2016-MU-CXK015 granted by the National Institute of Justice, U.S. Department of Justice.
Collaborations and Interdisciplinary Research
In recent years I have been fortunate to work with collaborators from other fields, involving students from my lab. These collaborations are motivated by the need to explore novel deep learning approaches to problems in other fields. I usually invite graduate and advanced undergraduate students to get involved in these collaborative efforts, so that they gain experience applying deep learning to real-world data and develop basic research skills before embarking on their own research projects.
Our joint work has led to state-of-the-art performance in several computer vision applications, a number of scientific publications in conferences and journals, and funding in the form of research grants.
- F. Borges, J. Balta, M. Roghanian, A. Gonçalves, M. Alvarez, and H. Pistori. The interference of optical zoom in human and machine classification of pollen grain images. In Workshop de Visão Computacional (WVC), 100–106. 2021.
- J. Couret, D. Moreira, D. Bernier, A. Loberti, E. Dotson, and M. Alvarez. Delimiting cryptic morphological variation among human malaria vector species using convolutional neural networks. PLOS NTDs, 14(12):1–21, 2020.
- J. Souza, V. Weber, A. Gonçalves, M. Alvarez, M. Cereda, W. Gonçalves, V. Odakura, and H. Pistori. Viable yeast identification using bag of visual words in colored images. In Workshop de Visão Computacional (WVC). Uberlandia, Brasil, 2020.
- G. Astolfi, A. Gonçalves, G. Menezes, F. Borges, A. Astolfi, E. Matsubara, M. Alvarez, and H. Pistori. Pollen73s: an image dataset for pollen grains classification. Ecological Informatics, 60:101165, 2020.
- M. Gjergji, V. Weber, L. Silva, R. Gomes, T. De Araujo, H. Pistori, and M. Alvarez. Deep learning techniques for beef cattle body weight prediction. In International Joint Conference on Neural Networks (IJCNN), 1–8. 2020.
- P. Asadi, M. Gindy, M. Alvarez, and A. Asadi. A computer vision based rebar detection chain for automatic processing of concrete bridge deck gpr data. Automation in Construction, 112:103106, 2020.
- E. Tetila, B. Machado, G. Menezes, A. Oliveira Jr, M. Alvarez, W. Amorim, N. Belete, G. da Silva, and H. Pistori. Automatic recognition of soybean leaf diseases using uav images and deep convolutional neural networks. IEEE Geoscience and Remote Sensing Letters, 17(5):903–907, 2020.
- P. Asadi, M. Gindy, and M. Alvarez. A machine learning based approach for automatic rebar detection and quantification of deterioration in concrete bridge deck ground penetrating radar b-scan images. KSCE Journal of Civil Engineering, 23:2618–2627, 2019.
- R. Viana, R. Rodrigues, M. Alvarez, and H. Pistori. Svm with stochastic parameter selection for bovine leather defect classification. In Pacific Rim Conference on Advances in Image and Video Technology (PSIVT), 600–612. Santiago, Chile, 2007.
- R. Rodrigues, R. Viana, A. Pasquali, H. Pistori, and M. Alvarez. Máquinas de vetores de suporte aplicadas à classificação de defeitos em couro bovino. In Workshop de Visão Computacional (WVC). São José do Rio Preto, SP, Brasil, 2007.
Graph Kernels and Applications in Bioinformatics
The development of machine learning approaches for tasks that involve graph-structured data in computational biology was the focus of my PhD work. I proposed novel graph representations for proteins and developed machine learning approaches based on graph kernels to address protein function prediction. The study of protein structure and function is one of the most important subjects in bioinformatics.
By modeling proteins as graphs, we evaluated our methods on two types of function prediction: the discrimination of proteins as enzymes or non-enzymes, and the recognition of DNA-binding proteins. In both cases, the resulting performance was higher than that of existing methods. Furthermore, given the establishment of ontologies as a popular topic in biomedical research, we proposed a novel semantic similarity measure between pairs of proteins, based on graph kernels. This latter approach, when compared with state-of-the-art methods, yielded improved performance.
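To make the idea of a graph kernel concrete, the sketch below shows one simple shortest-path kernel: two graphs are compared by counting how many pairs of node pairs share the same shortest-path length. It is only an illustration under stated assumptions, not the formulation used in the publications listed here. It assumes unlabeled, undirected graphs stored as adjacency dictionaries with integer node ids; the function names and the two toy graphs are hypothetical, and a kernel for real protein graphs would typically also compare node and edge labels.

```python
from collections import Counter, deque


def shortest_path_lengths(adj):
    """Count shortest-path lengths over all unordered, connected node pairs."""
    counts = Counter()
    for source in adj:
        # Breadth-first search gives shortest paths in an unweighted graph.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        for target, d in dist.items():
            if source < target:  # count each unordered pair once
                counts[d] += 1
    return counts


def sp_kernel(adj_a, adj_b):
    """Delta kernel on path lengths: matching (pair in A, pair in B) combinations."""
    ca, cb = shortest_path_lengths(adj_a), shortest_path_lengths(adj_b)
    return sum(ca[d] * cb[d] for d in ca)


# Hypothetical toy graphs (assumption): a 3-node path and a triangle.
path = {0: [1], 1: [0, 2], 2: [1]}
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
print(sp_kernel(path, triangle), sp_kernel(path, path))
```

Because the kernel is an inner product of path-length histograms, it is symmetric and can be plugged into any kernel-based learner, such as a support vector machine.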
- M. Alvarez and C. Yan. A new protein graph model for function prediction. Computational Biology and Chemistry, 37:6–10, 2012.
- M. Alvarez. Graph Kernels and Applications in Bioinformatics. Ph. D. Dissertation, Department of Computer Science, Utah State University, 2011. Committee: Adele Cutler, Changhui Yan, Minghui Jiang, Vicki Allan, and Xiaojun Qi.
- M. Alvarez, X. Qi, and C. Yan. A shortest-path graph kernel for estimating gene product semantic similarity. Journal of Biomedical Semantics, 2(1):3, 2011.
- M. Alvarez and C. Yan. Exploring structural modeling of proteins for kernel-based enzyme discrimination. In Symposium on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB), 1–5. Montreal, Canada, 2010.
Semantic Similarity between Words/Terms
The problem of measuring the semantic similarity between pairs of words is considered a fundamental operation in several machine learning and information retrieval applications. Developing a computational method is a challenging task due to the subjective nature of similarity. I introduced a novel algorithm that exploits concepts, relationships, and descriptive glosses available in the WordNet ontology. Experiments indicated higher correlations with human ratings than existing algorithms.
The proposed method was also evaluated within the context of the Gene Ontology (GO). When calculating the semantic similarity between a pair of GO terms, our algorithm takes into account their shortest path in the ontology, the depth of their nearest common ancestor, and a novel similarity score calculated between the definitions of the involved GO terms. We use our method to calculate semantic similarities between pairs of proteins by combining pairwise semantic similarities between their annotating GO terms. The performance was evaluated by comparing our algorithm with expert annotations for functional similarity between proteins and with sequence similarity. Results show that our method is highly competitive with state-of-the-art methods, which require external databases of functional annotations.
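As a rough illustration of how the three signals named above (shortest path, depth of the nearest common ancestor, and definition similarity) might be combined into a single score, consider the sketch below. Everything specific in it is an assumption made for the example and not the published method: the toy ontology and its definitions are invented rather than taken from GO, the path is measured through the nearest common ancestor, the structural score is an exponential decay in path length multiplied by a saturating function of depth, the definition score is a plain Jaccard word overlap, and the parameters `alpha`, `beta`, and `weight` are arbitrary.

```python
import math

# Hypothetical toy ontology (assumption): term -> (parents, definition).
ONTOLOGY = {
    "root": ([], "any biological function"),
    "binding": (["root"], "selective interaction with a molecule"),
    "dna_binding": (["binding"], "selective interaction with dna molecule"),
    "rna_binding": (["binding"], "selective interaction with rna molecule"),
    "catalysis": (["root"], "acceleration of a chemical reaction"),
}


def ancestor_distances(term):
    """Map each ancestor of `term` (itself included) to its minimum edge distance."""
    dist = {term: 0}
    frontier = [term]
    while frontier:
        next_frontier = []
        for t in frontier:
            for parent in ONTOLOGY[t][0]:
                if parent not in dist:
                    dist[parent] = dist[t] + 1
                    next_frontier.append(parent)
        frontier = next_frontier
    return dist


def gloss_overlap(a, b):
    """Jaccard overlap between the word sets of two definitions."""
    wa, wb = set(ONTOLOGY[a][1].split()), set(ONTOLOGY[b][1].split())
    return len(wa & wb) / len(wa | wb)


def term_similarity(a, b, alpha=0.2, beta=0.6, weight=0.5):
    """Blend a structural score with a definition score (illustrative formula)."""
    da, db = ancestor_distances(a), ancestor_distances(b)
    common = set(da) & set(db)
    # Nearest common ancestor: shared ancestor on the shortest a -> c -> b route.
    nca = min(common, key=lambda c: da[c] + db[c])
    path_len = da[nca] + db[nca]
    depth = ancestor_distances(nca)["root"]  # edges between the root and the NCA
    structural = math.exp(-alpha * path_len) * math.tanh(beta * depth)
    return weight * structural + (1 - weight) * gloss_overlap(a, b)


print(term_similarity("dna_binding", "rna_binding"))
print(term_similarity("dna_binding", "catalysis"))
```

In this toy setting, the two binding terms share a deep common ancestor and most of their definition words, so they score higher than the pair whose only common ancestor is the root. A protein-level score could then be obtained by aggregating such pairwise term scores over the terms annotating each protein.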
- M. Alvarez and C. Yan. A graph-based semantic similarity measure for the gene ontology. Journal of Bioinformatics and Computational Biology, 2011.
- M. Alvarez, X. Qi, and C. Yan. Go-based term semantic similarity. In Ontology Learning and Knowledge Discovery Using the Web: Challenges and Recent Advances, chapter IX. IGI Publishing, 2011.
- M. Alvarez and S. Lim. Discovering interchangeable words from string databases. In International Conference on Digital Information Management (ICDIM), 25–30. Lyon, France, 2007.
- M. Alvarez and S. Lim. A graph modeling of semantic similarity between words. In International Conference on Semantic Computing (ICSC), 355–362. Irvine, CA, USA, 2007.
Computer Science Education
Education in Computer Science represents another of my interests. While I am not a computing education researcher, I am always open to exploring new ways of teaching and learning computer science, and to contributing to the design of computer science curricula in higher education programs.
- H. Pistori, M. Pereira, M. Alvarez, and X. Qi. Open source tools and project-based teaching as enablers of research experience in computer vision students. In Congresso Brasileiro de Educação em Engenharia (COBENGE). Gramado, RS, Brasil, 2013.
- B. Shelton, J. Scoresby, T. Stowell, M. Capell, M. Alvarez, and C. Coats. A frankenstein approach to open source: the construction of a 3d game engine as meaningful educational process. IEEE Transactions on Learning Technologies, 3:85–90, 2010.
- B. Shelton, M. Alvarez, M. Capell, C. Coats, J. Scoresby, and T. Stowell. The heat engine: a demonstration of sustainable design from an open-source 3d game engine. In Open Education Conference 2008: Celebrating Ten Years of Open Content. Logan, UT, USA, 2008.
- B. Shelton, M. Alvarez, M. Capell, C. Coats, J. Scoresby, and T. Stowell. Iterations of an open-source 3d game engine: multiplayer environments for learners. In Meaningful Play. East Lansing, MI, USA, 2008.
- M. Alvarez and S. Lim. A machine learning approach for one-stop learning. In Data Mining and Knowledge Discovery Technologies, chapter XIV. IGI Publishing, 2008.
- M. Alvarez, J. Baiocchi, and J. Pow-Sang. Computing and higher education in peru. Inroads, 40:35–39, 2008.