Arun Balaji Ramathilagam

Virudhunagar · Tamil Nadu, India · arunthilak95@gmail.com

Junior Data Scientist with ~3 years of experience, proficient in Python, SQL, Advanced Excel, and GIS with a strong foundation in Machine Learning. Skilled at enabling cross-functional teams with actionable insights and workflow automation across business functions.


Experience

Junior Data Scientist

Mitti Labs
  • Created and optimized spatial SQL queries in Metabase to generate reports of rice sowing progression at project sites, supporting strategic planning and on-ground decision-making.
  • Automated agricultural field boundary segmentation from satellite images using the Segment Anything Model (SAM) and a U-Net model, reducing manual annotation effort from months to weeks. Implemented metrics for quick quality checks on the predicted farm boundaries.
  • Replaced manual NDVI profile extraction with unsupervised clustering techniques (K-means and hierarchical clustering) to handle scarce labels.
  • Extracted and processed MODIS and Sentinel-2 data using the Google Earth Engine API to identify crop rotation and the start, end, peak and length of seasons at field and village levels.
  • Worked cross-functionally with science and engineering teams to understand spatial data requirements and deliver tailored geospatial solutions.
February 2024 - Present

Junior Research Fellow

SASTRA University - School of Computing
  • Used 1D CNN, LSTM and transformer models for crop classification using time series Sentinel-1 and Sentinel-2 data.
  • Developed a unique research methodology to derive NDVI from SAR data using a pix2pix GAN.
  • Used UNet, ResNet and LinkNet for agricultural field boundary extraction from high-resolution Indian Remote Sensing data and compared the results.
  • Pre-processed the remote sensing data using GDAL, NumPy, SNAP and QGIS.
June 2021 - March 2023

Junior Research Fellow

Shiv Nadar University - Department of Civil Engineering
  • Collected spectral reflectance data from agricultural fields daily using a spectroradiometer for crop water stress monitoring.
  • Automated preprocessing of reflectance data and derived 10 different spectral indices using a Python script.
  • Compiled reports and documented research methodology.
January 2021 - May 2021

Volunteer

Humanitarian OpenStreetMap
  • Adding and improving base data in OpenStreetMap using satellite imagery and local knowledge.
  • Validating various HOTOSM tasks across the world.
  • One of the active OSM contributors in India.
August 2016 - Present

Intern

Mahalanobis National Crop Forecast Center
  • Classified paddy growing areas using multi-temporal RISAT-1 data.
  • Predicted the transplanting dates, fresh biomass and grain yield for paddy.
August 2016 - November 2016

Education

University of Twente - Indian Institute of Remote Sensing, ISRO (ITC-IIRS JEP)

Master of Science in Geoinformatics

GPA: 7.93

  • Estimated pearl millet biomass, LAI and crop height from RADARSAT-2 data using the Water Cloud Model.
  • Published a paper titled "Evaluation of different Machine Learning classifiers for Pearl Millet crop classification using Sentinel-1 and RADARSAT-2 data" using Decision Tree, Random Forest and SVM classifiers.
  • Received IIRS Golden Jubilee Scholarship for scoring highest marks in the Earth Observation module.
September 2018 - June 2020

Tamil Nadu Agricultural University

Bachelor of Technology in Agricultural Information Technology

GPA: 8.37

  • Published a paper titled "Area estimation of cotton and maize crops in Perambalur district of Tamil Nadu using multi date Sentinel-1A SAR and optical data"
  • Played on the university basketball team and won tournaments at state and national levels.
July 2013 - May 2017

Skills

Programming Languages
  • Python
  • SQL
  • R
Libraries/Frameworks
  • NumPy
  • Pandas
  • Scikit-learn
  • Keras
  • TensorFlow
  • Seaborn
  • Matplotlib
  • Rasterio
  • GDAL
  • GeoPandas
  • Basics of Streamlit
Tools
  • JOSM
  • ArcGIS
  • QGIS
  • PostGIS
  • ERDAS
  • Basics of Google Earth Engine
  • Tableau
  • SNAP

Projects

DonorsChoose Classification
  • Classified 100k project proposals into two classes using Decision Tree, Naive Bayes, XGBoost and LSTM models, with cross-validation.
  • Pre-processed data, including encoding with TF-IDF and Word2Vec, visualizing with t-SNE, transforming and performing initial analysis
  • Conducted hyper-parameter tuning using k-fold cross-validation to maximize the AUC score.
Microsoft Malware Detection
  • Used Random Forest and XGBoost classifiers to identify malware using .byte and .asm files
  • Pre-processed data, including cleaning, encoding, visualizing, transforming and performing initial analysis
  • Performed feature engineering by creating bi-grams from .byte files and pixel values from .asm files, and selected the best features to optimize model performance.
  • Conducted hyper-parameter tuning to improve the performance of the XGBoost model.
Document Classification using CNN
  • Classified a total of 18828 text files into 20 different classes using a 1D CNN with TensorFlow and Keras.
  • Preprocessed the text data using Python packages: regex to remove tabs, spaces, emails, numbers and special characters, and spaCy for chunking the text.
  • Vectorized the text data with a Keras embedding layer and with pre-trained GloVe vectors, and compared the results.
New York City taxi trip duration prediction
  • Predicted taxi trip duration between locations in NYC using an XGBoost regressor.
  • Built a web app with Streamlit using the best model.
  • Approach: Performed feature engineering to reduce the prediction error. Generated additional features by bringing in extra data through table and spatial joins.
  • Use Case: Can be used in the logistics domain to estimate the trip duration between two locations.

Certifications