544 lines
15 KiB
Markdown
544 lines
15 KiB
Markdown
# Vendor Report Generator
|
|
|
|
A Python tool that generates comprehensive vendor punchlist reports from Excel files. The tool processes Excel data, normalizes vendor information, calculates metrics, and generates both JSON and interactive HTML reports.
|
|
|
|
> **📘 For Taskboard Integration**: See [TASKBOARD_INTEGRATION_CONTEXT.md](./TASKBOARD_INTEGRATION_CONTEXT.md) for detailed context and integration possibilities.
|
|
|
|
## Features
|
|
|
|
- **Direct Excel Processing**: Reads Excel files directly using pandas
|
|
- **Data Normalization**: Automatically normalizes vendor names, statuses, and priorities
|
|
- **24-Hour Updates**: Tracks items added, closed, or changed to monitor status in the last 24 hours (based on Baltimore/Eastern timezone)
|
|
- **Priority Tracking**: Groups items by priority levels (Very High, High, Medium, Low)
|
|
- **Oldest Unaddressed Items**: Identifies and highlights the oldest 3 unaddressed items per vendor
|
|
- **Interactive HTML Reports**: Generates searchable, filterable HTML reports with tabs and filters
|
|
- **JSON Export**: Exports structured JSON data for further processing
|
|
- **SharePoint Integration**: Automatically download Excel files from SharePoint
|
|
- **Scheduled Generation**: Automatically generate reports on a schedule (interval or cron)
|
|
- **Web API**: REST API for on-demand report generation
|
|
|
|
## Requirements
|
|
|
|
- Python 3.8 or higher
|
|
- Dependencies listed in `requirements.txt`
|
|
|
|
## Installation
|
|
|
|
1. **Clone the repository**:
|
|
```bash
|
|
git clone https://gitea.lci.ge/ilia.gurielidze/vendor_report.git
|
|
cd vendor_report
|
|
```
|
|
|
|
2. **Create a virtual environment** (recommended):
|
|
```bash
|
|
python3 -m venv venv
|
|
source venv/bin/activate # On Windows: venv\Scripts\activate
|
|
```
|
|
|
|
3. **Install dependencies**:
|
|
```bash
|
|
pip install -r requirements.txt
|
|
```
|
|
|
|
## Setup
|
|
|
|
1. **Prepare your Excel files**:
|
|
- Place your Excel files (`.xlsx` or `.xls`) in the `reports/` directory
|
|
- Ensure your Excel files have the following columns (in order):
|
|
- Column 0: Punchlist Name
|
|
- Column 1: Vendor
|
|
- Column 2: Priority
|
|
- Column 3: Description
|
|
- Column 4: Date Identified
|
|
- Column 5: Status Updates
|
|
- Column 6: Issue Image
|
|
- Column 7: Status
|
|
- Column 8: Date Completed (optional)
|
|
|
|
2. **Create necessary directories** (if they don't exist):
|
|
```bash
|
|
mkdir -p reports output
|
|
```
|
|
|
|
## Usage
|
|
|
|
### Basic Usage
|
|
|
|
Generate a report from Excel files in the `reports/` directory:
|
|
|
|
```bash
|
|
python3 report_generator.py
|
|
```
|
|
|
|
This will:
|
|
- Process all Excel files in the `reports/` directory
|
|
- Generate a JSON report at `output/report.json`
|
|
- Generate an HTML report at `output/report.html`
|
|
- Save preprocessed data to `output/preprocessed_data.txt`
|
|
|
|
### Command-Line Options
|
|
|
|
```bash
|
|
python3 report_generator.py [OPTIONS]
|
|
```
|
|
|
|
**Options**:
|
|
- `--reports-dir DIR`: Directory containing Excel files (default: `reports`)
|
|
- `--output FILE`: Output JSON file path (default: `output/report.json`)
|
|
- `--verbose`: Print verbose output (default: True)
|
|
|
|
**Examples**:
|
|
|
|
```bash
|
|
# Use a custom reports directory
|
|
python3 report_generator.py --reports-dir /path/to/excel/files
|
|
|
|
# Specify custom output file
|
|
python3 report_generator.py --output /path/to/output/report.json
|
|
|
|
# Combine options
|
|
python3 report_generator.py --reports-dir my_reports --output my_output/report.json
|
|
```
|
|
|
|
### Programmatic Usage
|
|
|
|
You can also use the report generator in your own Python scripts:
|
|
|
|
```python
|
|
from report_generator import generate_report
|
|
|
|
# Generate report with default settings
|
|
report_data = generate_report()
|
|
|
|
# Or with custom settings
|
|
report_data = generate_report(
|
|
reports_dir="my_reports",
|
|
output_file="my_output/report.json",
|
|
verbose=True
|
|
)
|
|
|
|
# report_data is a dictionary containing the full report structure
|
|
print(f"Processed {len(report_data['vendors'])} vendors")
|
|
```
|
|
|
|
## Report Structure
|
|
|
|
### JSON Report Structure
|
|
|
|
The generated JSON report follows this structure:
|
|
|
|
```json
|
|
{
|
|
"report_generated_at": "2025-11-05T22:00:00",
|
|
"vendors": [
|
|
{
|
|
"vendor_name": "VendorName",
|
|
"total_items": 10,
|
|
"closed_count": 5,
|
|
"open_count": 3,
|
|
"monitor_count": 2,
|
|
"updates_24h": {
|
|
"added": [...],
|
|
"closed": [...],
|
|
"changed_to_monitor": [...]
|
|
},
|
|
"oldest_unaddressed": [...],
|
|
"very_high_priority_items": [...],
|
|
"high_priority_items": [...],
|
|
"closed_items": [...],
|
|
"monitor_items": [...],
|
|
"open_items": [...]
|
|
}
|
|
],
|
|
"summary": {
|
|
"total_vendors": 5,
|
|
"total_items": 50,
|
|
"total_closed": 25,
|
|
"total_open": 15,
|
|
"total_monitor": 10
|
|
}
|
|
}
|
|
```
|
|
|
|
### HTML Report Features
|
|
|
|
The HTML report includes:
|
|
|
|
- **Summary Cards**: Overview statistics at the top
|
|
- **Vendor Tabs**: Quick navigation between vendors
|
|
- **Status Tabs**: Filter by status (All, Yesterday's Updates, Oldest Unaddressed, Closed, Monitor, Open)
|
|
- **Search & Filters**:
|
|
- Search by item name or description
|
|
- Filter by vendor, status, or priority
|
|
- **Quick Filters**:
|
|
- Show only vendors with yesterday's updates
|
|
- Show only vendors with oldest unaddressed items
|
|
- Show all vendors
|
|
- **Interactive Elements**: Click tabs to switch views, use filters to narrow down results
|
|
|
|
## Data Processing Details
|
|
|
|
### Vendor Name Normalization
|
|
|
|
The tool automatically normalizes vendor names:
|
|
- Handles case variations (e.g., "autstand" → "Autstand")
|
|
- Preserves intentional capitalization (e.g., "AutStand" stays as-is)
|
|
- Normalizes combined vendors (e.g., "Autstand/Beumer")
|
|
- Handles vendors in parentheses (e.g., "MFO (Amazon)")
|
|
|
|
### Status Normalization
|
|
|
|
Statuses are normalized to:
|
|
- **Complete**: Items with status containing "complete" or "complette"
|
|
- **Monitor**: Items with status containing "monitor" or "montor"
|
|
- **Incomplete**: All other items (default)
|
|
|
|
### Priority Classification
|
|
|
|
Priorities are classified as:
|
|
- **Very High**: Priority contains "(1) Very High" or "Very High"
|
|
- **High**: Priority contains "(2) High" or "High" (but not "Very High")
|
|
- **Medium**: Priority contains "(3) Medium" or "Medium"
|
|
- **Low**: Priority contains "(4) Low" or "Low"
|
|
|
|
### 24-Hour Window Calculation
|
|
|
|
The tool uses **Baltimore/Eastern timezone (America/New_York)** for calculating 24-hour updates:
|
|
- Items are considered "added in last 24h" if their `date_identified` falls on yesterday's date
|
|
- Items are considered "closed in last 24h" if their `date_completed` falls on yesterday's date
|
|
- Items are considered "changed to monitor" if their status is Monitor and the date falls within the 24-hour window
|
|
|
|
## Output Files
|
|
|
|
After running the generator, you'll find:
|
|
|
|
- `output/report.json`: Structured JSON report data
|
|
- `output/report.html`: Interactive HTML report (open in browser)
|
|
- `output/preprocessed_data.txt`: Human-readable preprocessed data (for debugging)
|
|
|
|
## Project Structure
|
|
|
|
```
|
|
vendor_report/
|
|
├── report_generator.py # Main report generation script
|
|
├── data_preprocessor.py # Excel data preprocessing and normalization
|
|
├── html_generator.py # HTML report generation
|
|
├── models.py # Pydantic data models
|
|
├── excel_to_text.py # Utility for Excel to text conversion
|
|
├── sharepoint_downloader.py # SharePoint file downloader
|
|
├── scheduler.py # Scheduled report generation
|
|
├── api_server.py # REST API for on-demand reports
|
|
├── web_ui.py # Web UI for easy access
|
|
├── config.py # Configuration management
|
|
├── config.yaml.template # Configuration template
|
|
├── requirements.txt # Python dependencies
|
|
├── reports/ # Directory for input Excel files
|
|
├── output/ # Directory for generated reports
|
|
└── README.md # This file
|
|
```
|
|
|
|
## Troubleshooting
|
|
|
|
### No Excel files found
|
|
|
|
Ensure your Excel files are in the `reports/` directory and have `.xlsx` or `.xls` extensions.
|
|
|
|
### Date parsing errors
|
|
|
|
The tool supports common date formats:
|
|
- `MM/DD/YY` (e.g., `10/14/25`)
|
|
- `MM/DD/YYYY` (e.g., `10/14/2025`)
|
|
- `YYYY-MM-DD` (e.g., `2025-10-17`)
|
|
- `YYYY-MM-DD HH:MM:SS` (e.g., `2025-10-17 00:00:00`)
|
|
|
|
### Permission errors
|
|
|
|
If you encounter permission errors, ensure you have write access to the `output/` directory.
|
|
|
|
### Missing dependencies
|
|
|
|
If you get import errors, ensure all dependencies are installed:
|
|
```bash
|
|
pip install -r requirements.txt
|
|
```
|
|
|
|
## Timezone Notes
|
|
|
|
The tool uses **Baltimore/Eastern timezone (America/New_York)** for all date calculations. This ensures consistent 24-hour window calculations regardless of where the script is run. All dates are stored as timezone-aware datetime objects.
|
|
|
|
## SharePoint Integration
|
|
|
|
The application can automatically download Excel files from SharePoint before generating reports. This is useful when your source data is stored in SharePoint.
|
|
|
|
### Setup SharePoint Integration
|
|
|
|
1. **Create a configuration file**:
|
|
```bash
|
|
cp config.yaml.template config.yaml
|
|
```
|
|
|
|
2. **Edit `config.yaml`** and configure SharePoint settings:
|
|
```yaml
|
|
sharepoint:
|
|
enabled: true
|
|
site_url: "https://yourcompany.sharepoint.com/sites/YourSite"
|
|
folder_path: "/Shared Documents/Reports"
|
|
local_dir: "reports"
|
|
use_app_authentication: true # Recommended for automation
|
|
client_id: "your-azure-ad-client-id"
|
|
client_secret: "your-azure-ad-client-secret"
|
|
```
|
|
|
|
3. **Authentication Options**:
|
|
|
|
**Option A: App Authentication (Recommended)**
|
|
- Register an app in Azure AD
|
|
- Grant SharePoint permissions (Sites.Read.All or Sites.ReadWrite.All)
|
|
- Use `client_id` and `client_secret` in config
|
|
- Set `use_app_authentication: true`
|
|
|
|
**Option B: User Authentication**
|
|
- Use your SharePoint username and password
|
|
- Set `username` and `password` in config
|
|
- Set `use_app_authentication: false`
|
|
|
|
4. **Test SharePoint download**:
|
|
```bash
|
|
python sharepoint_downloader.py
|
|
```
|
|
|
|
### Manual SharePoint Download
|
|
|
|
Download files from SharePoint without generating a report:
|
|
```bash
|
|
python sharepoint_downloader.py
|
|
```
|
|
|
|
## Scheduled Report Generation
|
|
|
|
The application can automatically generate reports on a schedule, optionally downloading from SharePoint first.
|
|
|
|
### Setup Scheduling
|
|
|
|
1. **Edit `config.yaml`**:
|
|
```yaml
|
|
scheduler:
|
|
enabled: true
|
|
schedule_type: "interval" # or "cron"
|
|
interval_hours: 24 # Generate every 24 hours
|
|
# OR use cron expression:
|
|
# cron_expression: "0 8 * * *" # 8 AM daily
|
|
timezone: "America/New_York"
|
|
```
|
|
|
|
2. **Start the scheduler**:
|
|
```bash
|
|
python scheduler.py
|
|
```
|
|
|
|
The scheduler will run continuously and generate reports according to your schedule.
|
|
|
|
3. **Schedule Types**:
|
|
- **interval**: Generate report every N hours
|
|
- **cron**: Use cron expression for precise scheduling (e.g., "0 8 * * *" for 8 AM daily)
|
|
- **once**: Run once immediately (for testing)
|
|
|
|
### Running Scheduler as a Service
|
|
|
|
**Linux (systemd)**:
|
|
```bash
|
|
# Create service file: /etc/systemd/system/vendor-report-scheduler.service
|
|
[Unit]
|
|
Description=Vendor Report Scheduler
|
|
After=network.target
|
|
|
|
[Service]
|
|
Type=simple
|
|
User=your-user
|
|
WorkingDirectory=/path/to/vendor_report
|
|
ExecStart=/usr/bin/python3 /path/to/vendor_report/scheduler.py
|
|
Restart=always
|
|
|
|
[Install]
|
|
WantedBy=multi-user.target
|
|
|
|
# Enable and start
|
|
sudo systemctl enable vendor-report-scheduler
|
|
sudo systemctl start vendor-report-scheduler
|
|
```
|
|
|
|
**Windows (Task Scheduler)**:
|
|
- Create a scheduled task that runs `python scheduler.py` at startup or on a schedule
|
|
|
|
## Web UI & On-Demand Report Generation
|
|
|
|
The application includes both a **Web UI** and a **REST API** for generating reports on demand.
|
|
|
|
### Web UI (Recommended for Easy Access)
|
|
|
|
A simple, user-friendly web interface for generating reports without using the terminal.
|
|
|
|
1. **Start the Web UI server**:
|
|
```bash
|
|
python web_ui.py
|
|
```
|
|
|
|
2. **Open in browser**:
|
|
```
|
|
http://localhost:8080
|
|
```
|
|
|
|
3. **Features**:
|
|
- One-click report generation
|
|
- Download from SharePoint & generate (single button)
|
|
- View generated reports
|
|
- View service status
|
|
- View configuration
|
|
- No terminal knowledge required!
|
|
|
|
### REST API
|
|
|
|
The application also includes a REST API for integration with other systems or manual triggers.
|
|
|
|
### Setup API Server
|
|
|
|
1. **Edit `config.yaml`**:
|
|
```yaml
|
|
api:
|
|
enabled: true
|
|
host: "0.0.0.0"
|
|
port: 8080
|
|
api_key: "your-secret-api-key" # Optional, for authentication
|
|
```
|
|
|
|
2. **Start the Web UI** (recommended):
|
|
```bash
|
|
python web_ui.py
|
|
```
|
|
|
|
Then open `http://localhost:8080` in your browser.
|
|
|
|
**OR start the API server** (for programmatic access):
|
|
```bash
|
|
python api_server.py
|
|
```
|
|
|
|
3. **Generate report via API**:
|
|
```bash
|
|
# Without authentication
|
|
curl -X POST http://localhost:8080/api/generate \
|
|
-H "Content-Type: application/json" \
|
|
-d '{"download_from_sharepoint": true}'
|
|
|
|
# With API key authentication
|
|
curl -X POST http://localhost:8080/api/generate \
|
|
-H "Content-Type: application/json" \
|
|
-H "X-API-Key: your-secret-api-key" \
|
|
-d '{"download_from_sharepoint": true}'
|
|
```
|
|
|
|
### API Endpoints
|
|
|
|
- **POST `/api/generate`**: Generate report on demand
|
|
- Request body (optional):
|
|
```json
|
|
{
|
|
"download_from_sharepoint": true,
|
|
"reports_dir": "reports",
|
|
"output_file": "output/report.json"
|
|
}
|
|
```
|
|
|
|
- **GET `/api/status`**: Get service status and configuration
|
|
|
|
- **GET `/health`**: Health check endpoint
|
|
|
|
### Example: Integration with Webhook
|
|
|
|
You can trigger report generation from SharePoint webhooks, Power Automate, or any HTTP client:
|
|
|
|
```python
|
|
import requests
|
|
|
|
response = requests.post(
|
|
'http://your-server:8080/api/generate',
|
|
json={'download_from_sharepoint': True},
|
|
headers={'X-API-Key': 'your-api-key'}
|
|
)
|
|
print(response.json())
|
|
```
|
|
|
|
## Configuration
|
|
|
|
The application uses a YAML configuration file (`config.yaml`) for all settings. You can also use environment variables:
|
|
|
|
### Environment Variables
|
|
|
|
```bash
|
|
# SharePoint
|
|
export SHAREPOINT_ENABLED=true
|
|
export SHAREPOINT_SITE_URL="https://yourcompany.sharepoint.com/sites/YourSite"
|
|
export SHAREPOINT_FOLDER_PATH="/Shared Documents/Reports"
|
|
export SHAREPOINT_CLIENT_ID="your-client-id"
|
|
export SHAREPOINT_CLIENT_SECRET="your-client-secret"
|
|
export SHAREPOINT_USE_APP_AUTH=true
|
|
|
|
# Scheduler
|
|
export SCHEDULER_ENABLED=true
|
|
export SCHEDULER_INTERVAL_HOURS=24
|
|
|
|
# API
|
|
export API_ENABLED=true
|
|
export API_PORT=8080
|
|
export API_KEY="your-api-key"
|
|
```
|
|
|
|
## Complete Workflow Example
|
|
|
|
Here's a complete example setup for automated SharePoint → Report generation:
|
|
|
|
1. **Setup configuration** (`config.yaml`):
|
|
```yaml
|
|
sharepoint:
|
|
enabled: true
|
|
site_url: "https://company.sharepoint.com/sites/Reports"
|
|
folder_path: "/Shared Documents/Vendor Reports"
|
|
use_app_authentication: true
|
|
client_id: "your-client-id"
|
|
client_secret: "your-client-secret"
|
|
|
|
scheduler:
|
|
enabled: true
|
|
schedule_type: "cron"
|
|
cron_expression: "0 8 * * *" # 8 AM daily
|
|
timezone: "America/New_York"
|
|
|
|
report:
|
|
output_dir: "output"
|
|
reports_dir: "reports"
|
|
```
|
|
|
|
2. **Start scheduler**:
|
|
```bash
|
|
python scheduler.py
|
|
```
|
|
|
|
3. **The scheduler will**:
|
|
- Download latest Excel files from SharePoint at 8 AM daily
|
|
- Generate reports automatically
|
|
- Save to `output/report.json` and `output/report.html`
|
|
|
|
## License
|
|
|
|
[Add your license information here]
|
|
|
|
## Contributing
|
|
|
|
[Add contribution guidelines if applicable]
|
|
|
|
## Support
|
|
|
|
For issues or questions, please contact [your contact information or issue tracker URL].
|
|
|