From f294ac155e3ba794869ab58501fe711f84ab8328 Mon Sep 17 00:00:00 2001
From: ilia gu
Date: Wed, 5 Nov 2025 22:43:06 +0400
Subject: [PATCH] Add comprehensive README with setup and usage instructions

---
 README.md | 270 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 270 insertions(+)
 create mode 100644 README.md

diff --git a/README.md b/README.md
new file mode 100644
index 0000000..3cdab14
--- /dev/null
+++ b/README.md
@@ -0,0 +1,270 @@
+# Vendor Report Generator
+
+A Python tool that generates comprehensive vendor punchlist reports from Excel files. The tool processes Excel data, normalizes vendor information, calculates metrics, and generates both JSON and interactive HTML reports.
+
+## Features
+
+- **Direct Excel Processing**: Reads Excel files directly using pandas (no LLM required)
+- **Data Normalization**: Automatically normalizes vendor names, statuses, and priorities
+- **24-Hour Updates**: Tracks items added, closed, or changed to monitor status in the last 24 hours (based on Baltimore/Eastern timezone)
+- **Priority Tracking**: Groups items by priority levels (Very High, High, Medium, Low)
+- **Oldest Unaddressed Items**: Identifies and highlights the oldest 3 unaddressed items per vendor
+- **Interactive HTML Reports**: Generates searchable, filterable HTML reports with tabs and filters
+- **JSON Export**: Exports structured JSON data for further processing
+
+## Requirements
+
+- Python 3.8 or higher
+- Dependencies listed in `requirements.txt`
+
+## Installation
+
+1. **Clone the repository**:
+   ```bash
+   git clone https://gitea.lci.ge/ilia.gurielidze/vendor_report.git
+   cd vendor_report
+   ```
+
+2. **Create a virtual environment** (recommended):
+   ```bash
+   python3 -m venv venv
+   source venv/bin/activate  # On Windows: venv\Scripts\activate
+   ```
+
+3. **Install dependencies**:
+   ```bash
+   pip install -r requirements.txt
+   ```
+
+## Setup
+
+1. **Prepare your Excel files**:
+   - Place your Excel files (`.xlsx` or `.xls`) in the `reports/` directory
+   - Ensure your Excel files have the following columns (in order):
+     - Column 0: Punchlist Name
+     - Column 1: Vendor
+     - Column 2: Priority
+     - Column 3: Description
+     - Column 4: Date Identified
+     - Column 5: Status Updates
+     - Column 6: Issue Image
+     - Column 7: Status
+     - Column 8: Date Completed (optional)
+
+2. **Create necessary directories** (if they don't exist):
+   ```bash
+   mkdir -p reports output
+   ```
+
+## Usage
+
+### Basic Usage
+
+Generate a report from Excel files in the `reports/` directory:
+
+```bash
+python3 report_generator.py
+```
+
+This will:
+- Process all Excel files in the `reports/` directory
+- Generate a JSON report at `output/report.json`
+- Generate an HTML report at `output/report.html`
+- Save preprocessed data to `output/preprocessed_data.txt`
+
+### Command-Line Options
+
+```bash
+python3 report_generator.py [OPTIONS]
+```
+
+**Options**:
+- `--reports-dir DIR`: Directory containing Excel files (default: `reports`)
+- `--output FILE`: Output JSON file path (default: `output/report.json`)
+- `--verbose`: Print verbose output (default: True)
+
+**Examples**:
+
+```bash
+# Use a custom reports directory
+python3 report_generator.py --reports-dir /path/to/excel/files
+
+# Specify custom output file
+python3 report_generator.py --output /path/to/output/report.json
+
+# Combine options
+python3 report_generator.py --reports-dir my_reports --output my_output/report.json
+```
+
+### Programmatic Usage
+
+You can also use the report generator in your own Python scripts:
+
+```python
+from report_generator import generate_report
+
+# Generate report with default settings
+report_data = generate_report()
+
+# Or with custom settings
+report_data = generate_report(
+    reports_dir="my_reports",
+    output_file="my_output/report.json",
+    verbose=True
+)
+
+# report_data is a dictionary containing the full report structure
+print(f"Processed {len(report_data['vendors'])} vendors")
+```
+
+## Report Structure
+
+### JSON Report Structure
+
+The generated JSON report follows this structure:
+
+```json
+{
+  "report_generated_at": "2025-11-05T22:00:00",
+  "vendors": [
+    {
+      "vendor_name": "VendorName",
+      "total_items": 10,
+      "closed_count": 5,
+      "open_count": 3,
+      "monitor_count": 2,
+      "updates_24h": {
+        "added": [...],
+        "closed": [...],
+        "changed_to_monitor": [...]
+      },
+      "oldest_unaddressed": [...],
+      "very_high_priority_items": [...],
+      "high_priority_items": [...],
+      "closed_items": [...],
+      "monitor_items": [...],
+      "open_items": [...]
+    }
+  ],
+  "summary": {
+    "total_vendors": 5,
+    "total_items": 50,
+    "total_closed": 25,
+    "total_open": 15,
+    "total_monitor": 10
+  }
+}
+```
+
+### HTML Report Features
+
+The HTML report includes:
+
+- **Summary Cards**: Overview statistics at the top
+- **Vendor Tabs**: Quick navigation between vendors
+- **Status Tabs**: Filter by status (All, Yesterday's Updates, Oldest Unaddressed, Closed, Monitor, Open)
+- **Search & Filters**:
+  - Search by item name or description
+  - Filter by vendor, status, or priority
+- **Quick Filters**:
+  - Show only vendors with yesterday's updates
+  - Show only vendors with oldest unaddressed items
+  - Show all vendors
+- **Interactive Elements**: Click tabs to switch views, use filters to narrow down results
+
+## Data Processing Details
+
+### Vendor Name Normalization
+
+The tool automatically normalizes vendor names:
+- Handles case variations (e.g., "autstand" → "Autstand")
+- Preserves intentional capitalization (e.g., "AutStand" stays as-is)
+- Normalizes combined vendors (e.g., "Autstand/Beumer")
+- Handles vendors in parentheses (e.g., "MFO (Amazon)")
+
+### Status Normalization
+
+Statuses are normalized to:
+- **Complete**: Items with status containing "complete" or "complette"
+- **Monitor**: Items with status containing "monitor" or "montor"
+- **Incomplete**: All other items (default)
+
+### Priority Classification
+
+Priorities are classified as:
+- **Very High**: Priority contains "(1) Very High" or "Very High"
+- **High**: Priority contains "(2) High" or "High" (but not "Very High")
+- **Medium**: Priority contains "(3) Medium" or "Medium"
+- **Low**: Priority contains "(4) Low" or "Low"
+
+### 24-Hour Window Calculation
+
+The tool uses **Baltimore/Eastern timezone (America/New_York)** for calculating 24-hour updates:
+- Items are considered "added in last 24h" if their `date_identified` falls on yesterday's date
+- Items are considered "closed in last 24h" if their `date_completed` falls on yesterday's date
+- Items are considered "changed to monitor" if their status is Monitor and the date falls within the 24-hour window
+
+## Output Files
+
+After running the generator, you'll find:
+
+- `output/report.json`: Structured JSON report data
+- `output/report.html`: Interactive HTML report (open in browser)
+- `output/preprocessed_data.txt`: Human-readable preprocessed data (for debugging)
+
+## Project Structure
+
+```
+vendor_report/
+├── report_generator.py      # Main report generation script
+├── data_preprocessor.py     # Excel data preprocessing and normalization
+├── html_generator.py        # HTML report generation
+├── models.py                # Pydantic data models
+├── excel_to_text.py         # Utility for Excel to text conversion
+├── requirements.txt         # Python dependencies
+├── reports/                 # Directory for input Excel files
+├── output/                  # Directory for generated reports
+└── README.md                # This file
+```
+
+## Troubleshooting
+
+### No Excel files found
+
+Ensure your Excel files are in the `reports/` directory and have `.xlsx` or `.xls` extensions.
+
+### Date parsing errors
+
+The tool supports common date formats:
+- `MM/DD/YY` (e.g., `10/14/25`)
+- `MM/DD/YYYY` (e.g., `10/14/2025`)
+- `YYYY-MM-DD` (e.g., `2025-10-17`)
+- `YYYY-MM-DD HH:MM:SS` (e.g., `2025-10-17 00:00:00`)
+
+### Permission errors
+
+If you encounter permission errors, ensure you have write access to the `output/` directory.
+
+### Missing dependencies
+
+If you get import errors, ensure all dependencies are installed:
+```bash
+pip install -r requirements.txt
+```
+
+## Timezone Notes
+
+The tool uses **Baltimore/Eastern timezone (America/New_York)** for all date calculations. This ensures consistent 24-hour window calculations regardless of where the script is run. All dates are stored as timezone-aware datetime objects.
+
+## License
+
+[Add your license information here]
+
+## Contributing
+
+[Add contribution guidelines if applicable]
+
+## Support
+
+For issues or questions, please contact [your contact information or issue tracker URL].
+
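
The README's "24-Hour Window Calculation" and "Timezone Notes" sections describe matching items against yesterday's date in `America/New_York`. A minimal sketch of that rule follows, under stated assumptions: the helper names `yesterday_eastern` and `fell_on_yesterday` are hypothetical and do not come from the repository, and the project may use a different timezone library. The sketch uses the standard-library `zoneinfo` module, which requires Python 3.9 or newer; on Python 3.8 a backport such as `backports.zoneinfo` would be needed.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo  # standard library from Python 3.9 onward

# Timezone named in the README; everything is compared in this zone.
EASTERN = ZoneInfo("America/New_York")


def yesterday_eastern(now=None):
    """Return yesterday's calendar date as seen in America/New_York.

    `now` must be a timezone-aware datetime (hypothetical helper).
    """
    if now is None:
        now = datetime.now(EASTERN)
    # Convert to Eastern first, then step back one calendar day.
    return now.astimezone(EASTERN).date() - timedelta(days=1)


def fell_on_yesterday(stamp, now=None):
    """True if a timezone-aware datetime falls on yesterday's Eastern date."""
    return stamp.astimezone(EASTERN).date() == yesterday_eastern(now)


if __name__ == "__main__":
    now = datetime(2025, 11, 5, 22, 0, tzinfo=EASTERN)
    print(yesterday_eastern(now))  # 2025-11-04

    # 03:30 UTC on Nov 5 is 22:30 on Nov 4 in Eastern time.
    late_utc = datetime(2025, 11, 5, 3, 30, tzinfo=timezone.utc)
    print(fell_on_yesterday(late_utc, now))  # True
```

Converting to Eastern time before taking `.date()` is what keeps the result independent of the machine's local timezone, which is the property the Timezone Notes section asks for.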