ilia gu cf28d462b2 Fix status handling and deduplication

- Fix incomplete status detection: check for 'incomplete' before 'complete' to avoid substring match bug
- Change incomplete status badge color to yellow/orange (warning) instead of red
- Remove redundant 'Open' status card, keep only actual statuses (Closed, Monitor, Incomplete)
- Add deduplication at Excel processing level to prevent duplicate items
- Skip Sheet1 and Comments sheets which contain duplicate data
- Improve HTML deduplication when combining items for All tab
- Fix open_count to use incomplete_count for consistency

2025-11-08 18:20:06 +04:00

Vendor Report Generator

A Python tool that generates comprehensive vendor punchlist reports from Excel files. The tool processes Excel data, normalizes vendor information, calculates metrics, and generates both JSON and interactive HTML reports.

Features

Direct Excel Processing: Reads Excel files directly using pandas
Data Normalization: Automatically normalizes vendor names, statuses, and priorities
24-Hour Updates: Tracks items added, closed, or changed to monitor status in the last 24 hours (based on Baltimore/Eastern timezone)
Priority Tracking: Groups items by priority levels (Very High, High, Medium, Low)
Oldest Unaddressed Items: Identifies and highlights the three oldest unaddressed items per vendor
Interactive HTML Reports: Generates searchable, filterable HTML reports with tabs and filters
JSON Export: Exports structured JSON data for further processing

Requirements

Python 3.8 or higher
Dependencies listed in requirements.txt

Installation

Clone the repository:

```
git clone https://gitea.lci.ge/ilia.gurielidze/vendor_report.git
cd vendor_report
```

Create a virtual environment (recommended):

```
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

Install dependencies:
```
pip install -r requirements.txt
```

Setup

Prepare your Excel files:
- Place your Excel files (.xlsx or .xls) in the reports/ directory
- Ensure your Excel files have the following columns (in order):
  - Column 0: Punchlist Name
  - Column 1: Vendor
  - Column 2: Priority
  - Column 3: Description
  - Column 4: Date Identified
  - Column 5: Status Updates
  - Column 6: Issue Image
  - Column 7: Status
  - Column 8: Date Completed (optional)

Create necessary directories (if they don't exist):
```
mkdir -p reports output
```
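
Because the columns are identified by position rather than by header text, a row can be mapped to named fields by index. The sketch below illustrates that idea with plain Python lists. The field names in `COLUMNS` and the helper `row_to_item` are hypothetical and are not taken from `data_preprocessor.py`, which may read the sheets differently.

```python
from typing import Dict, List, Optional

# Hypothetical field names, in the positional order listed above (columns 0-8).
COLUMNS: List[str] = [
    "punchlist_name", "vendor", "priority", "description", "date_identified",
    "status_updates", "issue_image", "status", "date_completed",
]


def row_to_item(row: List[Optional[str]]) -> Dict[str, Optional[str]]:
    """Map one spreadsheet row (a list of cell values) to a dict keyed by field name."""
    # Pad short rows so the optional trailing column (Date Completed) becomes None.
    padded = list(row) + [None] * (len(COLUMNS) - len(row))
    # zip() stops at the shorter sequence, so extra trailing cells are ignored.
    return dict(zip(COLUMNS, padded))


# Made-up example row with no Date Completed value.
item = row_to_item([
    "Conveyor guard", "Autstand", "(2) High", "Guard missing",
    "10/14/25", "", "", "Incomplete",
])
print(item["vendor"], item["date_completed"])
```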

Usage

Basic Usage

Generate a report from Excel files in the reports/ directory:

```
python3 report_generator.py
```

This will:

Process all Excel files in the reports/ directory
Generate a JSON report at output/report.json
Generate an HTML report at output/report.html
Save preprocessed data to output/preprocessed_data.txt

Command-Line Options

```
python3 report_generator.py [OPTIONS]
```

Options:

--reports-dir DIR: Directory containing Excel files (default: reports)
--output FILE: Output JSON file path (default: output/report.json)
--verbose: Print verbose output (default: True)

Examples:

```
# Use a custom reports directory
python3 report_generator.py --reports-dir /path/to/excel/files

# Specify custom output file
python3 report_generator.py --output /path/to/output/report.json

# Combine options
python3 report_generator.py --reports-dir my_reports --output my_output/report.json
```
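
For readers who want to wrap or extend the CLI, the options listed above could be declared with `argparse` roughly as follows. This is only a sketch built from the documented flags and defaults. The function name `build_parser` is hypothetical, and the actual parser in `report_generator.py` may be written differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the three documented options with their documented defaults."""
    parser = argparse.ArgumentParser(description="Generate vendor punchlist reports")
    parser.add_argument("--reports-dir", default="reports", metavar="DIR",
                        help="Directory containing Excel files")
    parser.add_argument("--output", default="output/report.json", metavar="FILE",
                        help="Output JSON file path")
    # The documented default is True, so in this sketch passing --verbose
    # leaves the value unchanged.
    parser.add_argument("--verbose", action="store_true", default=True,
                        help="Print verbose output")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["--reports-dir", "my_reports"])
print(args.reports_dir, args.output, args.verbose)
```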

Programmatic Usage

You can also use the report generator in your own Python scripts:

```
from report_generator import generate_report

# Generate report with default settings
report_data = generate_report()

# Or with custom settings
report_data = generate_report(
    reports_dir="my_reports",
    output_file="my_output/report.json",
    verbose=True
)

# report_data is a dictionary containing the full report structure
print(f"Processed {len(report_data['vendors'])} vendors")
```

Report Structure

JSON Report Structure

The generated JSON report follows this structure:

```
{
  "report_generated_at": "2025-11-05T22:00:00",
  "vendors": [
    {
      "vendor_name": "VendorName",
      "total_items": 10,
      "closed_count": 5,
      "open_count": 3,
      "monitor_count": 2,
      "updates_24h": {
        "added": [...],
        "closed": [...],
        "changed_to_monitor": [...]
      },
      "oldest_unaddressed": [...],
      "very_high_priority_items": [...],
      "high_priority_items": [...],
      "closed_items": [...],
      "monitor_items": [...],
      "open_items": [...]
    }
  ],
  "summary": {
    "total_vendors": 5,
    "total_items": 50,
    "total_closed": 25,
    "total_open": 15,
    "total_monitor": 10
  }
}
```
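
As an example of further processing, the sketch below ranks vendors by their number of open items using only the `vendor_name` and `open_count` keys shown above. The sample data is made up for illustration, and `vendors_by_open_items` is a hypothetical helper, not part of the tool.

```python
import json

# Illustrative data only; a real report would be loaded from output/report.json.
sample_report = json.loads("""
{
  "vendors": [
    {"vendor_name": "Autstand", "open_count": 3},
    {"vendor_name": "Beumer", "open_count": 0},
    {"vendor_name": "MFO (Amazon)", "open_count": 7}
  ]
}
""")


def vendors_by_open_items(report):
    """Return (vendor_name, open_count) pairs with open items, largest count first."""
    rows = [
        (vendor["vendor_name"], vendor["open_count"])
        for vendor in report["vendors"]
        if vendor["open_count"] > 0
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)


for name, count in vendors_by_open_items(sample_report):
    print(f"{name}: {count} open")
```

To run this against a generated report, replace `sample_report` with the result of `json.load()` on `output/report.json`.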

HTML Report Features

The HTML report includes:

Summary Cards: Overview statistics at the top
Vendor Tabs: Quick navigation between vendors
Status Tabs: Filter by status (All, Yesterday's Updates, Oldest Unaddressed, Closed, Monitor, Open)
Search & Filters:
- Search by item name or description
- Filter by vendor, status, or priority
Quick Filters:
- Show only vendors with yesterday's updates
- Show only vendors with oldest unaddressed items
- Show all vendors
Interactive Elements: Click tabs to switch views, use filters to narrow down results

Data Processing Details

Vendor Name Normalization

The tool automatically normalizes vendor names:

Handles case variations (e.g., "autstand" → "Autstand")
Preserves intentional capitalization (e.g., "AutStand" stays as-is)
Normalizes combined vendors (e.g., "Autstand/Beumer")
Handles vendors in parentheses (e.g., "MFO (Amazon)")
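
The rules above can be approximated in a few lines. The following is a minimal sketch, assuming that all-lowercase words are capitalized, mixed-case words are left untouched, and combined vendors are split on `/`. The function `normalize_vendor` is hypothetical, and the real logic in `data_preprocessor.py` may handle more cases (for example, acronyms typed in lowercase).

```python
import re


def normalize_vendor(raw: str) -> str:
    """Capitalize all-lowercase words; leave mixed-case words as typed."""

    def fix_word(match: "re.Match[str]") -> str:
        word = match.group(0)
        # Mixed or upper case is treated as intentional and kept as-is.
        return word.capitalize() if word.islower() else word

    # Normalize each side of a combined vendor such as "autstand / beumer".
    parts = [part.strip() for part in raw.split("/")]
    # Rewriting only alphabetic runs keeps parentheses and spaces in place.
    cleaned = [re.sub(r"[A-Za-z]+", fix_word, part) for part in parts if part]
    return "/".join(cleaned)


print(normalize_vendor("autstand"))
print(normalize_vendor("AutStand"))
print(normalize_vendor(" autstand / beumer "))
print(normalize_vendor("MFO (Amazon)"))
```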

Status Normalization

Statuses are normalized to:

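Complete: Items with status containing "complete" or the misspelling "complette" (checked only after "incomplete" has been ruled out, because "incomplete" contains "complete" as a substring)
Monitor: Items with status containing "monitor" or the misspelling "montor"
Incomplete: Items with status containing "incomplete", and all other items (default)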
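
The order of the checks matters because "incomplete" contains "complete" as a substring. The sketch below shows one way to apply these rules with the "incomplete" test first. `normalize_status` is a hypothetical name, and the tool's own implementation may differ in detail.

```python
from typing import Optional


def normalize_status(raw: Optional[str]) -> str:
    """Map a free-text status cell to Complete, Monitor, or Incomplete."""
    text = (raw or "").strip().lower()
    # Check "incomplete" first: it contains "complete" as a substring.
    if "incomplete" in text:
        return "Incomplete"
    # "complette" and "montor" are the misspellings listed above.
    if "complete" in text or "complette" in text:
        return "Complete"
    if "monitor" in text or "montor" in text:
        return "Monitor"
    # Anything else, including empty cells, falls back to the default.
    return "Incomplete"


for value in ("Incomplete", "Complette", "Montor", None):
    print(repr(value), "->", normalize_status(value))
```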

Priority Classification

Priorities are classified as:

Very High: Priority contains "(1) Very High" or "Very High"
High: Priority contains "(2) High" or "High" (but not "Very High")
Medium: Priority contains "(3) Medium" or "Medium"
Low: Priority contains "(4) Low" or "Low"
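
The "but not Very High" condition falls out naturally if the labels are tested from most to least specific. A minimal sketch follows; `classify_priority` is a hypothetical helper, and returning `None` for unrecognized values is an assumption rather than documented behavior.

```python
from typing import Optional

# Ordered so that "Very High" is tested before "High".
PRIORITY_LABELS = ("Very High", "High", "Medium", "Low")


def classify_priority(raw: Optional[str]) -> Optional[str]:
    """Return the first priority label found in the cell text, or None."""
    text = (raw or "").lower()
    for label in PRIORITY_LABELS:
        if label.lower() in text:
            return label
    return None  # Assumption: unrecognized priorities are left unclassified.


for value in ("(1) Very High", "(2) High", "Medium", "(4) Low"):
    print(value, "->", classify_priority(value))
```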

24-Hour Window Calculation

The tool uses Baltimore/Eastern timezone (America/New_York) for calculating 24-hour updates:

Items are considered "added in last 24h" if their date_identified falls on yesterday's date
Items are considered "closed in last 24h" if their date_completed falls on yesterday's date
Items are considered "changed to monitor" if their status is Monitor and the date falls within the 24-hour window
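
The "yesterday" comparison depends on which calendar day it currently is in America/New_York, which can differ from the day on the machine running the script. The sketch below illustrates this with the standard-library `zoneinfo` module, which requires Python 3.9 or later and an available time zone database. The helpers `report_today` and `fell_on_yesterday` are hypothetical and not taken from the tool.

```python
from datetime import date, datetime, timedelta, timezone
from typing import Optional
from zoneinfo import ZoneInfo  # standard library from Python 3.9

REPORT_TZ = "America/New_York"


def report_today(now: Optional[datetime] = None) -> date:
    """Return the current calendar date in the report timezone.

    `now` should be timezone-aware; it defaults to the current time.
    """
    tz = ZoneInfo(REPORT_TZ)
    current = datetime.now(tz) if now is None else now.astimezone(tz)
    return current.date()


def fell_on_yesterday(day: date, today: date) -> bool:
    """True if `day` is the calendar day immediately before `today`."""
    return day == today - timedelta(days=1)


# 03:00 UTC on 8 November 2025 is still the evening of 7 November in New York.
now_utc = datetime(2025, 11, 8, 3, 0, tzinfo=timezone.utc)
today = report_today(now_utc)
print(today, fell_on_yesterday(date(2025, 11, 6), today))
```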

Output Files

After running the generator, you'll find:

output/report.json: Structured JSON report data
output/report.html: Interactive HTML report (open in browser)
output/preprocessed_data.txt: Human-readable preprocessed data (for debugging)

Project Structure

```
vendor_report/
├── report_generator.py      # Main report generation script
├── data_preprocessor.py     # Excel data preprocessing and normalization
├── html_generator.py        # HTML report generation
├── models.py                # Pydantic data models
├── excel_to_text.py         # Utility for Excel to text conversion
├── requirements.txt         # Python dependencies
├── reports/                 # Directory for input Excel files
├── output/                  # Directory for generated reports
└── README.md                # This file
```

Troubleshooting

No Excel files found

Ensure your Excel files are in the reports/ directory and have .xlsx or .xls extensions.
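
To check quickly which files would qualify, a short script such as the following can list them. It is a sketch that assumes a case-insensitive match on the two extensions. `find_excel_files` is a hypothetical helper, and the tool's own discovery logic may differ.

```python
from pathlib import Path
from typing import List

EXCEL_SUFFIXES = {".xlsx", ".xls"}


def find_excel_files(reports_dir: str = "reports") -> List[Path]:
    """Return Excel files directly inside reports_dir, sorted by path."""
    root = Path(reports_dir)
    if not root.is_dir():
        return []  # A missing directory simply yields no files.
    return sorted(
        path for path in root.iterdir()
        if path.is_file() and path.suffix.lower() in EXCEL_SUFFIXES
    )


for path in find_excel_files():
    print(path)
```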

Date parsing errors

The tool supports common date formats:

MM/DD/YY (e.g., 10/14/25)
MM/DD/YYYY (e.g., 10/14/2025)
YYYY-MM-DD (e.g., 2025-10-17)
YYYY-MM-DD HH:MM:SS (e.g., 2025-10-17 00:00:00)
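
When a date cell fails to parse, it can help to test the value against these formats in isolation. The sketch below tries each listed format in turn with `datetime.strptime`. The name `parse_date` is hypothetical, and it returns naive datetimes, so attaching the report timezone is left out.

```python
from datetime import datetime
from typing import Optional

# strptime patterns for the four formats listed above, in the same order.
DATE_FORMATS = ("%m/%d/%y", "%m/%d/%Y", "%Y-%m-%d", "%Y-%m-%d %H:%M:%S")


def parse_date(text: str) -> Optional[datetime]:
    """Return the first successful parse, or None if no format matches."""
    cleaned = text.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt)
        except ValueError:
            continue  # Try the next format.
    return None


for value in ("10/14/25", "10/14/2025", "2025-10-17", "2025-10-17 00:00:00", "soon"):
    print(value, "->", parse_date(value))
```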

Permission errors

If you encounter permission errors, ensure you have write access to the output/ directory.

Missing dependencies

If you get import errors, ensure all dependencies are installed:

```
pip install -r requirements.txt
```

Timezone Notes

The tool uses Baltimore/Eastern timezone (America/New_York) for all date calculations. This ensures consistent 24-hour window calculations regardless of where the script is run. All dates are stored as timezone-aware datetime objects.

License

[Add your license information here]

Contributing

[Add contribution guidelines if applicable]

Support

For issues or questions, please contact [your contact information or issue tracker URL].