Excel Parser
What does Excel Parser do?
Excel Parser takes an excel file, detects the number of sheets in it and dynamically creates tables based on data types. For each file uploaded, Excel Parser creates a new SQLite db, sanitizes the column names, creates SQLite tables, and insterts data into the tables. Excel Parser also maintains the queries which have been ran for each file so a user can verify for themselves. Excel parser also supports an API for running raw SQL queries on the database created for any file.
Setting up locally
Clone the repository locally
git clone https://github.com/BharatSahAIyak/excel-parser.git && cd excel-parser
Create and activate venv. Ex:
python -m venv venv &&
source venv/bin/activate
This project uses poetry.
pip install poetry==1.7.0
poetry install
Start the django server
python manage.py runserver <PORT NUMBER>
Architecture Diagram
API Specification
Download
POST
/download/ Download files associated with an taskIdRequest Body
Name | Type | Required | Description |
---|---|---|---|
taskID | String | YES | The Id returned when we start processing an excel using /process/ |
format | String | NO | The Format which we want the data in. supports xlsx ,db and sql |
If no format is provided for an taskId
, all three supported formats are returned
Example request using cURL
- Format is not passed
curl --location 'localhost:8000/download/' \
--form 'taskId="d3607557-4263-4285-afb7-e6e15861ad27"' \
{
"data": {
"taskId": "d3607557-4263-4285-afb7-e6e15861ad27",
"url": "MINIO LINK FOR THE XLSX FILE",
"files": {
"d3607557-4263-4285-afb7-e6e15861ad27": {
"url": {
"xlsx": "MINIO LINK FOR THE XLSX file uploaded",
"sql": "MINIO LINK FOR THE sql file",
"db": "MINIO LINK FOR THE DB"
}
}
}
},
"error": false
}
- When format is passed
curl --location 'localhost:8000/download/' \
--form 'taskId="d3607557-4263-4285-afb7-e6e15861ad27"' \
--form 'format="db"'
{
"data": {
"taskId": "d3607557-4263-4285-afb7-e6e15861ad27",
"url": "MINIO LINK FOR THE XLSX FILE",
"files": {
"d3607557-4263-4285-afb7-e6e15861ad27": {
"url": {
"db": "MINIO LINK FOR THE DB"
}
}
}
},
"error": false
}
Process
POST
/process/ Upload an excel file and create the DB for its dataRequest Body
Name | Type | Required | Description |
---|---|---|---|
file | File | YES | The file we want to process |
Example request using cURL
curl --location POST 'localhost:8000/process/' \
--form 'file=@"/home/shady/Downloads/Testing Spreadsheet.xlsx"'
{
"data": {
"taskId": "5c36800e-2a94-4261-8129-69e3596a1a05",
"sqlURL": "MinIO for the SQL file created",
"dbURL": "MinIO for the database created",
"xlsxURL": "MinIO for the xlsx file uploaded"
},
"error": false
}
Query
POST
/query/ Run raw SQL quesies on the DB for an taskIdRequest Body
Name | Type | Required | Description |
---|---|---|---|
taskId | String | YES | The Id returned when we start processing an excel using /process/ |
query | String | YES | The query we want to run on the database created for taskId |
Example request using cURL
curl --location 'localhost:8000/query/' \
--form 'query="SELECT name FROM sqlite_master WHERE type='\''table'\'';"' \
--form 'taskId="d3607557-4263-4285-afb7-e6e15861ad27"'
{
"data": {
"rows": [
{
"name": "GenderSplit"
},
{
"name": "GenderQnA"
},
{
"name": "Bank"
}
]
},
"error": false
}