Vendor Report Generator
A Python tool that generates comprehensive vendor punchlist reports from Excel files. The tool processes Excel data, normalizes vendor information, calculates metrics, and generates both JSON and interactive HTML reports.
📘 For Taskboard Integration: See TASKBOARD_INTEGRATION_CONTEXT.md for detailed context and integration possibilities.
Features
- Direct Excel Processing: Reads Excel files directly using pandas
- Data Normalization: Automatically normalizes vendor names, statuses, and priorities
- 24-Hour Updates: Tracks items added, closed, or changed to monitor status in the last 24 hours (based on Baltimore/Eastern timezone)
- Priority Tracking: Groups items by priority levels (Very High, High, Medium, Low)
- Oldest Unaddressed Items: Identifies and highlights the oldest 3 unaddressed items per vendor
- Interactive HTML Reports: Generates searchable, filterable HTML reports with tabs and filters
- JSON Export: Exports structured JSON data for further processing
- SharePoint Integration: Automatically download Excel files from SharePoint
- Scheduled Generation: Automatically generate reports on a schedule (interval or cron)
- Web API: REST API for on-demand report generation
Requirements
- Python 3.8 or higher
- Dependencies listed in
requirements.txt
Installation
-
Clone the repository:
git clone https://gitea.lci.ge/ilia.gurielidze/vendor_report.git cd vendor_report -
Create a virtual environment (recommended):
python3 -m venv venv source venv/bin/activate # On Windows: venv\Scripts\activate -
Install dependencies:
pip install -r requirements.txt
Setup
-
Prepare your Excel files:
- Place your Excel files (
.xlsxor.xls) in thereports/directory - Ensure your Excel files have the following columns (in order):
- Column 0: Punchlist Name
- Column 1: Vendor
- Column 2: Priority
- Column 3: Description
- Column 4: Date Identified
- Column 5: Status Updates
- Column 6: Issue Image
- Column 7: Status
- Column 8: Date Completed (optional)
- Place your Excel files (
-
Create necessary directories (if they don't exist):
mkdir -p reports output
Usage
Basic Usage
Generate a report from Excel files in the reports/ directory:
python3 report_generator.py
This will:
- Process all Excel files in the
reports/directory - Generate a JSON report at
output/report.json - Generate an HTML report at
output/report.html - Save preprocessed data to
output/preprocessed_data.txt
Command-Line Options
python3 report_generator.py [OPTIONS]
Options:
--reports-dir DIR: Directory containing Excel files (default:reports)--output FILE: Output JSON file path (default:output/report.json)--verbose: Print verbose output (default: True)
Examples:
# Use a custom reports directory
python3 report_generator.py --reports-dir /path/to/excel/files
# Specify custom output file
python3 report_generator.py --output /path/to/output/report.json
# Combine options
python3 report_generator.py --reports-dir my_reports --output my_output/report.json
Programmatic Usage
You can also use the report generator in your own Python scripts:
from report_generator import generate_report
# Generate report with default settings
report_data = generate_report()
# Or with custom settings
report_data = generate_report(
reports_dir="my_reports",
output_file="my_output/report.json",
verbose=True
)
# report_data is a dictionary containing the full report structure
print(f"Processed {len(report_data['vendors'])} vendors")
Report Structure
JSON Report Structure
The generated JSON report follows this structure:
{
"report_generated_at": "2025-11-05T22:00:00",
"vendors": [
{
"vendor_name": "VendorName",
"total_items": 10,
"closed_count": 5,
"open_count": 3,
"monitor_count": 2,
"updates_24h": {
"added": [...],
"closed": [...],
"changed_to_monitor": [...]
},
"oldest_unaddressed": [...],
"very_high_priority_items": [...],
"high_priority_items": [...],
"closed_items": [...],
"monitor_items": [...],
"open_items": [...]
}
],
"summary": {
"total_vendors": 5,
"total_items": 50,
"total_closed": 25,
"total_open": 15,
"total_monitor": 10
}
}
HTML Report Features
The HTML report includes:
- Summary Cards: Overview statistics at the top
- Vendor Tabs: Quick navigation between vendors
- Status Tabs: Filter by status (All, Yesterday's Updates, Oldest Unaddressed, Closed, Monitor, Open)
- Search & Filters:
- Search by item name or description
- Filter by vendor, status, or priority
- Quick Filters:
- Show only vendors with yesterday's updates
- Show only vendors with oldest unaddressed items
- Show all vendors
- Interactive Elements: Click tabs to switch views, use filters to narrow down results
Data Processing Details
Vendor Name Normalization
The tool automatically normalizes vendor names:
- Handles case variations (e.g., "autstand" → "Autstand")
- Preserves intentional capitalization (e.g., "AutStand" stays as-is)
- Normalizes combined vendors (e.g., "Autstand/Beumer")
- Handles vendors in parentheses (e.g., "MFO (Amazon)")
Status Normalization
Statuses are normalized to:
- Complete: Items with status containing "complete" or "complette"
- Monitor: Items with status containing "monitor" or "montor"
- Incomplete: All other items (default)
Priority Classification
Priorities are classified as:
- Very High: Priority contains "(1) Very High" or "Very High"
- High: Priority contains "(2) High" or "High" (but not "Very High")
- Medium: Priority contains "(3) Medium" or "Medium"
- Low: Priority contains "(4) Low" or "Low"
24-Hour Window Calculation
The tool uses Baltimore/Eastern timezone (America/New_York) for calculating 24-hour updates:
- Items are considered "added in last 24h" if their
date_identifiedfalls on yesterday's date - Items are considered "closed in last 24h" if their
date_completedfalls on yesterday's date - Items are considered "changed to monitor" if their status is Monitor and the date falls within the 24-hour window
Output Files
After running the generator, you'll find:
output/report.json: Structured JSON report dataoutput/report.html: Interactive HTML report (open in browser)output/preprocessed_data.txt: Human-readable preprocessed data (for debugging)
Project Structure
vendor_report/
├── report_generator.py # Main report generation script
├── data_preprocessor.py # Excel data preprocessing and normalization
├── html_generator.py # HTML report generation
├── models.py # Pydantic data models
├── excel_to_text.py # Utility for Excel to text conversion
├── sharepoint_downloader.py # SharePoint file downloader
├── scheduler.py # Scheduled report generation
├── api_server.py # REST API for on-demand reports
├── web_ui.py # Web UI for easy access
├── config.py # Configuration management
├── config.yaml.template # Configuration template
├── requirements.txt # Python dependencies
├── reports/ # Directory for input Excel files
├── output/ # Directory for generated reports
└── README.md # This file
Troubleshooting
No Excel files found
Ensure your Excel files are in the reports/ directory and have .xlsx or .xls extensions.
Date parsing errors
The tool supports common date formats:
MM/DD/YY(e.g.,10/14/25)MM/DD/YYYY(e.g.,10/14/2025)YYYY-MM-DD(e.g.,2025-10-17)YYYY-MM-DD HH:MM:SS(e.g.,2025-10-17 00:00:00)
Permission errors
If you encounter permission errors, ensure you have write access to the output/ directory.
Missing dependencies
If you get import errors, ensure all dependencies are installed:
pip install -r requirements.txt
Timezone Notes
The tool uses Baltimore/Eastern timezone (America/New_York) for all date calculations. This ensures consistent 24-hour window calculations regardless of where the script is run. All dates are stored as timezone-aware datetime objects.
SharePoint Integration
The application can automatically download Excel files from SharePoint before generating reports. This is useful when your source data is stored in SharePoint.
Setup SharePoint Integration
-
Create a configuration file:
cp config.yaml.template config.yaml -
Edit
config.yamland configure SharePoint settings:sharepoint: enabled: true site_url: "https://yourcompany.sharepoint.com/sites/YourSite" folder_path: "/Shared Documents/Reports" local_dir: "reports" use_app_authentication: true # Recommended for automation client_id: "your-azure-ad-client-id" client_secret: "your-azure-ad-client-secret" -
Authentication Options:
Option A: App Authentication (Recommended)
- Register an app in Azure AD
- Grant SharePoint permissions (Sites.Read.All or Sites.ReadWrite.All)
- Use
client_idandclient_secretin config - Set
use_app_authentication: true
Option B: User Authentication
- Use your SharePoint username and password
- Set
usernameandpasswordin config - Set
use_app_authentication: false
-
Test SharePoint download:
python sharepoint_downloader.py
Manual SharePoint Download
Download files from SharePoint without generating a report:
python sharepoint_downloader.py
Scheduled Report Generation
The application can automatically generate reports on a schedule, optionally downloading from SharePoint first.
Setup Scheduling
-
Edit
config.yaml:scheduler: enabled: true schedule_type: "interval" # or "cron" interval_hours: 24 # Generate every 24 hours # OR use cron expression: # cron_expression: "0 8 * * *" # 8 AM daily timezone: "America/New_York" -
Start the scheduler:
python scheduler.pyThe scheduler will run continuously and generate reports according to your schedule.
-
Schedule Types:
- interval: Generate report every N hours
- cron: Use cron expression for precise scheduling (e.g., "0 8 * * *" for 8 AM daily)
- once: Run once immediately (for testing)
Running Scheduler as a Service
Linux (systemd):
# Create service file: /etc/systemd/system/vendor-report-scheduler.service
[Unit]
Description=Vendor Report Scheduler
After=network.target
[Service]
Type=simple
User=your-user
WorkingDirectory=/path/to/vendor_report
ExecStart=/usr/bin/python3 /path/to/vendor_report/scheduler.py
Restart=always
[Install]
WantedBy=multi-user.target
# Enable and start
sudo systemctl enable vendor-report-scheduler
sudo systemctl start vendor-report-scheduler
Windows (Task Scheduler):
- Create a scheduled task that runs
python scheduler.pyat startup or on a schedule
Web UI & On-Demand Report Generation
The application includes both a Web UI and a REST API for generating reports on demand.
Web UI (Recommended for Easy Access)
A simple, user-friendly web interface for generating reports without using the terminal.
-
Start the Web UI server:
python web_ui.py -
Open in browser:
http://localhost:8080 -
Features:
- One-click report generation
- Download from SharePoint & generate (single button)
- View generated reports
- View service status
- View configuration
- No terminal knowledge required!
REST API
The application also includes a REST API for integration with other systems or manual triggers.
Setup API Server
-
Edit
config.yaml:api: enabled: true host: "0.0.0.0" port: 8080 api_key: "your-secret-api-key" # Optional, for authentication -
Start the Web UI (recommended):
python web_ui.pyThen open
http://localhost:8080in your browser.OR start the API server (for programmatic access):
python api_server.py -
Generate report via API:
# Without authentication curl -X POST http://localhost:8080/api/generate \ -H "Content-Type: application/json" \ -d '{"download_from_sharepoint": true}' # With API key authentication curl -X POST http://localhost:8080/api/generate \ -H "Content-Type: application/json" \ -H "X-API-Key: your-secret-api-key" \ -d '{"download_from_sharepoint": true}'
API Endpoints
-
POST
/api/generate: Generate report on demand- Request body (optional):
{ "download_from_sharepoint": true, "reports_dir": "reports", "output_file": "output/report.json" }
- Request body (optional):
-
GET
/api/status: Get service status and configuration -
GET
/health: Health check endpoint
Example: Integration with Webhook
You can trigger report generation from SharePoint webhooks, Power Automate, or any HTTP client:
import requests
response = requests.post(
'http://your-server:8080/api/generate',
json={'download_from_sharepoint': True},
headers={'X-API-Key': 'your-api-key'}
)
print(response.json())
Configuration
The application uses a YAML configuration file (config.yaml) for all settings. You can also use environment variables:
Environment Variables
# SharePoint
export SHAREPOINT_ENABLED=true
export SHAREPOINT_SITE_URL="https://yourcompany.sharepoint.com/sites/YourSite"
export SHAREPOINT_FOLDER_PATH="/Shared Documents/Reports"
export SHAREPOINT_CLIENT_ID="your-client-id"
export SHAREPOINT_CLIENT_SECRET="your-client-secret"
export SHAREPOINT_USE_APP_AUTH=true
# Scheduler
export SCHEDULER_ENABLED=true
export SCHEDULER_INTERVAL_HOURS=24
# API
export API_ENABLED=true
export API_PORT=8080
export API_KEY="your-api-key"
Complete Workflow Example
Here's a complete example setup for automated SharePoint → Report generation:
-
Setup configuration (
config.yaml):sharepoint: enabled: true site_url: "https://company.sharepoint.com/sites/Reports" folder_path: "/Shared Documents/Vendor Reports" use_app_authentication: true client_id: "your-client-id" client_secret: "your-client-secret" scheduler: enabled: true schedule_type: "cron" cron_expression: "0 8 * * *" # 8 AM daily timezone: "America/New_York" report: output_dir: "output" reports_dir: "reports" -
Start scheduler:
python scheduler.py -
The scheduler will:
- Download latest Excel files from SharePoint at 8 AM daily
- Generate reports automatically
- Save to
output/report.jsonandoutput/report.html
License
[Add your license information here]
Contributing
[Add contribution guidelines if applicable]
Support
For issues or questions, please contact [your contact information or issue tracker URL].