Arun Balaji Ramathilagam

Virudhunagar · Tamil Nadu, India · arunthilak95@gmail.com

Research and Development Engineer with 2+ years of experience in handling large geospatial datasets and satellite imagery using SQL and Python. Hands-on experience in building and deploying advanced Machine Learning/Deep Learning models for regression, classification and segmentation tasks.


Experience

Research and Development Engineer

Vassarlabs IT Solutions Pvt Ltd
  • Developed and executed Python code to automate data cleaning, preprocessing and transformation pipelines using packages such as GeoPandas, GDAL, pandas and NumPy.
  • Executed Google Earth Engine scripts to download remote sensing data for different use cases.
  • Saved time by implementing k-means and hierarchical clustering to analyze temporal NDVI profiles of crops and classify them using very few ground truth points.
  • Quantified and visualized changes in cropping patterns over time in Telangana using Sentinel-1 and Sentinel-2 time series data.
  • Worked closely with the backend team to download and process data as required for a national-level drought monitoring dashboard.
May 2023 - Present

Junior Research Fellow

SASTRA University - School of Computing
  • Applied 1D CNN, LSTM and transformer models to crop classification with Sentinel-1 and Sentinel-2 time series data.
  • Developed a unique research methodology to derive NDVI from SAR data using a pix2pix GAN.
  • Compared UNet, ResNet and LinkNet for agricultural field boundary extraction from high-resolution Indian Remote Sensing data.
  • Pre-processed the remote sensing data using GDAL, NumPy, SNAP and QGIS.
June 2021 - March 2023

Junior Research Fellow

Shiv Nadar University - Department of Civil Engineering
  • Collected daily spectral reflectance data from agricultural fields using a spectroradiometer for crop water stress monitoring.
  • Automated preprocessing of the reflectance data and derived 10 different spectral indices using a Python script.
  • Compiled reports and documented research methodology.
January 2021 - May 2021

Volunteer

Humanitarian OpenStreetMap Team
  • Adding and improving base data in OpenStreetMap using satellite imagery and local knowledge.
  • Validating various HOTOSM tasks across the world.
  • One of the active OSM contributors in India.
August 2016 - Present

Intern

Mahalanobis National Crop Forecast Center
  • Classified paddy-growing areas using multi-temporal RISAT-1 data.
  • Predicted transplanting dates, fresh biomass and grain yield for paddy.
August 2016 - November 2016

Education

University of Twente - Indian Institute of Remote Sensing, ISRO (ITC-IIRS JEP)

Master of Science in Geoinformatics

GPA: 7.93

  • Estimated pearl millet biomass, LAI and crop height from RADARSAT-2 data using the Water Cloud Model.
  • Published a paper titled "Evaluation of different Machine Learning classifiers for Pearl Millet crop classification using Sentinel-1 and RADARSAT-2 data", which used Decision Tree, Random Forest and SVM classifiers.
  • Received IIRS Golden Jubilee Scholarship for scoring highest marks in the Earth Observation module.
September 2018 - June 2020

Tamil Nadu Agricultural University

Bachelor of Technology in Agricultural Information Technology

GPA: 8.37

  • Published a paper titled "Area estimation of cotton and maize crops in Perambalur district of Tamil Nadu using multi date Sentinel-1A SAR and optical data".
  • Played for the university basketball team and won tournaments at the state and national levels.
July 2013 - May 2017

Skills

Programming Languages
  • Python
  • SQL
  • R
Libraries/Frameworks
  • NumPy
  • pandas
  • scikit-learn
  • Keras
  • TensorFlow
  • Seaborn
  • Matplotlib
  • Rasterio
  • GDAL
  • GeoPandas
  • Basics of Streamlit
Tools
  • JOSM
  • ArcGIS
  • QGIS
  • PostGIS
  • ERDAS
  • Basics of Google Earth Engine
  • Tableau
  • SNAP

Projects

DonorsChoose Classification
  • Classified 100k project proposals into two classes using Decision Tree, Naive Bayes, XGBoost and LSTM models, with cross-validation.
  • Pre-processed the data: encoded text with TF-IDF and Word2Vec, visualized it with t-SNE, and performed transformation and initial analysis.
  • Conducted hyper-parameter tuning using k-fold cross-validation to maximize the AUC score.
Microsoft Malware Detection
  • Used Random Forest and XGBoost classifiers to identify malware from .byte and .asm files.
  • Pre-processed the data, including cleaning, encoding, visualizing, transforming and performing initial analysis.
  • Performed feature engineering by creating bi-grams from .byte files and pixel values from .asm files, then selected the best features to optimize model performance.
  • Conducted hyper-parameter tuning to improve the performance of the XGBoost model.
Document Classification using CNN
  • Classified a total of 18,828 text files into 20 different classes using a 1D CNN with TensorFlow and Keras.
  • Preprocessed the text data using Python packages such as regex (to remove tabs, spaces, emails, numbers and special characters) and spaCy (to chunk the text).
  • Vectorized the text data using a Keras embedding layer and pre-trained GloVe vectors, and compared the results.
New York City taxi trip duration prediction
  • Predicted taxi trip durations between locations in NYC using an XGBoost regressor.
  • Built a web app with Streamlit using the best model.
  • Approach: Performed feature engineering to reduce the prediction error. Additional features were generated by adding extra data using table and spatial joins.
  • Use Case: Can be used in the logistics domain to estimate the trip duration between two locations.

Certifications