In data reconciliation projects, it's crucial to ensure the integrity and consistency of data between various sources. To address this need, I have developed a Python utility that compares extracts from tables and files. This utility simplifies the process of identifying discrepancies and ensures data accuracy.
Key Features:
Compatibility with CSV and Excel: The utility can handle both CSV and Excel file formats, making it versatile for different data sources.
Visual Differentiation: It highlights matching cells in green and differing cells in red, providing clear visual feedback.
Error Handling: The script includes error handling to manage scenarios where input files may be missing.
Python Utility Code:
import pandas as pd
from openpyxl import Workbook, load_workbook
from openpyxl.styles import PatternFill
def read_file(file):
if file.endswith('.csv'):
return pd.read_csv(file, header=None)
elif file.endswith('.xlsx'):
return pd.read_excel(file, header=None)
else:
raise ValueError("Unsupported file format. Please provide a .csv or .xlsx file.")
def compare_files(file1, file2, file3):
try:
# Load the data from the files
df1 = read_file(file1)
df2 = read_file(file2)
except FileNotFoundError as e:
print(f"Error: {e}")
print(f"Please check if the files {file1} and {file2} exist.")
return
# Create a new workbook for the result
wb = Workbook()
ws = wb.active
green_fill = PatternFill(start_color="C6EFCE", end_color="C6EFCE", fill_type="solid")
red_fill = PatternFill(start_color="FFC7CE", end_color="FFC7CE", fill_type="solid")
# Compare each cell
for i in range(df1.shape[0]):
for j in range(df1.shape[1]):
cell_value1 = df1.iloc[i, j] if i < len(df1) and j < len(df1.columns) else None
cell_value2 = df2.iloc[i, j] if i < len(df2) and j < len(df2.columns) else None
if cell_value1 == cell_value2:
ws.cell(row=i+1, column=j+1).value = f"{file1}: {cell_value1}\n{file2}: {cell_value2}"
ws.cell(row=i+1, column=j+1).fill = green_fill
else:
ws.cell(row=i+1, column=j+1).value = f"{file1}: {cell_value1}\n{file2}: {cell_value2}"
ws.cell(row=i+1, column=j+1).fill = red_fill
# Save the result to File3
wb.save(file3)
# Example usage
file1 = 'File1.csv' # or 'File1.xlsx'
file2 = 'File2.xlsx' # or 'File2.xlsx'
file3 = 'comparison_results.xlsx' # result file (should be Excel to handle cell colors)
compare_files(file1, file2, file3)
Result File(comparison_results.xlsx):
Github:
https://github.com/vipinputhanveetil/data-reconcil-util
How to Use:
Install Required Libraries:
bash
pip install pandas openpyxl
Prepare Your Files:
- Ensure your input files (
File1
andFile2
) are in either.csv
or.xlsx
format and place them in the same directory as your script.
- Ensure your input files (
Run the Script:
Update the
file1
,file2
, andfile3
variables with your actual file names.Execute the script to generate the comparison result in
File3
.
This utility is designed to make data reconciliation more efficient and error-free, saving you valuable time and effort. If you have any questions or need further customization, feel free to reach out!