import os
import re
import pandas as pd
...
date_pattern = r"\d{4}-\d{2}-\d{2}"  # Date format
amount_pattern = r"\d+(\.\d+)?"  # Amount format
location_pattern = r"阿苏港"  # Location keyword
# Process text files
for filename in os.listdir(data_folder):
    if filename.endswith(".txt"):
        filepath = os.path.join(data_folder, filename)
        try:
            with open(filepath, 'r', encoding='utf-8') as f:
                ...
...
Chen Cheng's code first defined regular expressions to match dates, amounts, and key locations. It then iterated over every Excel, CSV, and text file on the hard drive, loading tabular data with the pandas library and using the regular expressions to extract matching information.
All of the extracted data was collected in a list, then converted into a DataFrame and saved to a CSV file for later analysis.
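As a rough illustration of the steps just described, and not Chen Cheng's actual script, a minimal sketch might look like the following. The function names `extract_from_text` and `scan_folder`, the record fields, and the rule of keeping only lines that mention the location keyword are all assumptions made for this example. The amount pattern here uses a non-capturing group so that `findall` returns whole numbers rather than only the decimal part.

```python
import re
from pathlib import Path

import pandas as pd

# Patterns mirroring the ones in the excerpt (amount uses a non-capturing group).
DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")
AMOUNT_RE = re.compile(r"\d+(?:\.\d+)?")
LOCATION_RE = re.compile(r"阿苏港")

COLUMNS = ["source", "date", "amounts", "line"]


def extract_from_text(text, source):
    """Return one record per line that mentions the location keyword (assumed rule)."""
    records = []
    for line in text.splitlines():
        if not LOCATION_RE.search(line):
            continue
        date = DATE_RE.search(line)
        # Blank out dates first so their digits are not counted as amounts.
        amounts = AMOUNT_RE.findall(DATE_RE.sub(" ", line))
        records.append({
            "source": source,
            "date": date.group(0) if date else None,
            "amounts": ";".join(amounts),
            "line": line.strip(),
        })
    return records


def scan_folder(data_folder):
    """Scan text, CSV, and Excel files in one folder and collect matches."""
    records = []
    for path in sorted(Path(data_folder).iterdir()):
        suffix = path.suffix.lower()
        try:
            if suffix == ".txt":
                text = path.read_text(encoding="utf-8")
            elif suffix == ".csv":
                # Flatten the table back to text so the same patterns apply.
                text = pd.read_csv(path, dtype=str).to_csv(index=False)
            elif suffix in {".xlsx", ".xls"}:
                # Reading Excel files requires an engine such as openpyxl.
                text = pd.read_excel(path, dtype=str).to_csv(index=False)
            else:
                continue
        except Exception as exc:
            # An unreadable file should not stop the whole scan.
            print(f"Skipping {path.name}: {exc}")
            continue
        records.extend(extract_from_text(text, path.name))
    return pd.DataFrame(records, columns=COLUMNS)
```

Calling `scan_folder(data_folder).to_csv("extracted.csv", index=False)` would then write the collected records out for later analysis; the output filename is a placeholder.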
Mei Xiang stood by, stunned, watching the fully engrossed Chen Cheng as if she were watching a hacker at work on something remarkable.