Utility for Data Reconciliation: Comparing Table Extracts and Files

Utility for Data Reconciliation: Comparing Table Extracts and Files

·

2 min read

In data reconciliation projects, it's crucial to ensure the integrity and consistency of data between various sources. To address this need, I have developed a Python utility that compares extracts from tables and files. This utility simplifies the process of identifying discrepancies and ensures data accuracy.

Key Features:

  • Compatibility with CSV and Excel: The utility can handle both CSV and Excel file formats, making it versatile for different data sources.

  • Visual Differentiation: It highlights matching cells in green and differing cells in red, providing clear visual feedback.

  • Error Handling: The script includes error handling to manage scenarios where input files may be missing.

Python Utility Code:

import pandas as pd
from openpyxl import Workbook, load_workbook
from openpyxl.styles import PatternFill

def read_file(file):
    if file.endswith('.csv'):
        return pd.read_csv(file, header=None)
    elif file.endswith('.xlsx'):
        return pd.read_excel(file, header=None)
    else:
        raise ValueError("Unsupported file format. Please provide a .csv or .xlsx file.")

def compare_files(file1, file2, file3):
    try:
        # Load the data from the files
        df1 = read_file(file1)
        df2 = read_file(file2)
    except FileNotFoundError as e:
        print(f"Error: {e}")
        print(f"Please check if the files {file1} and {file2} exist.")
        return

    # Create a new workbook for the result
    wb = Workbook()
    ws = wb.active

    green_fill = PatternFill(start_color="C6EFCE", end_color="C6EFCE", fill_type="solid")
    red_fill = PatternFill(start_color="FFC7CE", end_color="FFC7CE", fill_type="solid")

    # Compare each cell
    for i in range(df1.shape[0]):
        for j in range(df1.shape[1]):
            cell_value1 = df1.iloc[i, j] if i < len(df1) and j < len(df1.columns) else None
            cell_value2 = df2.iloc[i, j] if i < len(df2) and j < len(df2.columns) else None

            if cell_value1 == cell_value2:
                ws.cell(row=i+1, column=j+1).value = f"{file1}: {cell_value1}\n{file2}: {cell_value2}"
                ws.cell(row=i+1, column=j+1).fill = green_fill
            else:
                ws.cell(row=i+1, column=j+1).value = f"{file1}: {cell_value1}\n{file2}: {cell_value2}"
                ws.cell(row=i+1, column=j+1).fill = red_fill

    # Save the result to File3
    wb.save(file3)

# Example usage
file1 = 'File1.csv'  # or 'File1.xlsx'
file2 = 'File2.xlsx'  # or 'File2.xlsx'
file3 = 'comparison_results.xlsx'  # result file (should be Excel to handle cell colors)

compare_files(file1, file2, file3)

Result File(comparison_results.xlsx):

Github:

https://github.com/vipinputhanveetil/data-reconcil-util

How to Use:

  1. Install Required Libraries:

    bash

     pip install pandas openpyxl
    
  2. Prepare Your Files:

    • Ensure your input files (File1 and File2) are in either .csv or .xlsx format and place them in the same directory as your script.
  3. Run the Script:

    • Update the file1, file2, and file3 variables with your actual file names.

    • Execute the script to generate the comparison result in File3.

This utility is designed to make data reconciliation more efficient and error-free, saving you valuable time and effort. If you have any questions or need further customization, feel free to reach out!