Skip to main content

Excel Parser

What does Excel Parser do?

Excel Parser takes an excel file, detects the number of sheets in it and dynamically creates tables based on data types. For each file uploaded, Excel Parser creates a new SQLite db, sanitizes the column names, creates SQLite tables, and insterts data into the tables. Excel Parser also maintains the queries which have been ran for each file so a user can verify for themselves. Excel parser also supports an API for running raw SQL queries on the database created for any file.

Setting up locally

Clone the repository locally

git clone https://github.com/BharatSahAIyak/excel-parser.git && cd excel-parser

Create and activate venv. Ex:

python -m venv venv &&
source venv/bin/activate

This project uses poetry.

pip install poetry==1.7.0
poetry install

Start the django server

python manage.py runserver <PORT NUMBER>

Architecture Diagram

Architecture Diagram

API Specification

Download

POST
/download/ Download files associated with an taskId

Request Body

NameTypeRequiredDescription
taskIDStringYESThe Id returned when we start processing an excel using /process/
formatStringNOThe Format which we want the data in. supports xlsx,db and sql

If no format is provided for an taskId, all three supported formats are returned

Example request using cURL

  • Format is not passed
curl --location 'localhost:8000/download/' \
--form 'taskId="d3607557-4263-4285-afb7-e6e15861ad27"' \

{
"data": {
"taskId": "d3607557-4263-4285-afb7-e6e15861ad27",
"url": "MINIO LINK FOR THE XLSX FILE",
"files": {
"d3607557-4263-4285-afb7-e6e15861ad27": {
"url": {
"xlsx": "MINIO LINK FOR THE XLSX file uploaded",
"sql": "MINIO LINK FOR THE sql file",
"db": "MINIO LINK FOR THE DB"
}
}
}
},
"error": false
}
  • When format is passed
curl --location 'localhost:8000/download/' \
--form 'taskId="d3607557-4263-4285-afb7-e6e15861ad27"' \
--form 'format="db"'
{
"data": {
"taskId": "d3607557-4263-4285-afb7-e6e15861ad27",
"url": "MINIO LINK FOR THE XLSX FILE",
"files": {
"d3607557-4263-4285-afb7-e6e15861ad27": {
"url": {
"db": "MINIO LINK FOR THE DB"
}
}
}
},
"error": false
}

Process

POST
/process/ Upload an excel file and create the DB for its data

Request Body

NameTypeRequiredDescription
fileFileYESThe file we want to process

Example request using cURL

curl --location POST 'localhost:8000/process/' \
--form 'file=@"/home/shady/Downloads/Testing Spreadsheet.xlsx"'
{
"data": {
"taskId": "5c36800e-2a94-4261-8129-69e3596a1a05",
"sqlURL": "MinIO for the SQL file created",
"dbURL": "MinIO for the database created",
"xlsxURL": "MinIO for the xlsx file uploaded"
},
"error": false
}

Query

POST
/query/ Run raw SQL quesies on the DB for an taskId

Request Body

NameTypeRequiredDescription
taskIdStringYESThe Id returned when we start processing an excel using /process/
queryStringYESThe query we want to run on the database created for taskId

Example request using cURL

curl --location 'localhost:8000/query/' \
--form 'query="SELECT name FROM sqlite_master WHERE type='\''table'\'';"' \
--form 'taskId="d3607557-4263-4285-afb7-e6e15861ad27"'

{
"data": {
"rows": [
{
"name": "GenderSplit"
},
{
"name": "GenderQnA"
},
{
"name": "Bank"
}
]
},
"error": false
}