Add data processing and sampling for fake news dataset

2025-03-26 17:49:39 +02:00
parent 97466edeae
commit 1dc796b59e
7 changed files with 737 additions and 187 deletions

archives/fnc1b.log (normal file, 6 lines added)

@@ -0,0 +1,6 @@
nohup: ignoring input
🔍 Loading data from Parquet file at 'processed_fakenews.parquet'
🔍 Dataset contains 8,528,956 rows.
📉 Reducing dataset from 8,528,956 to 852,895 rows...
✅ Sample contains 852,895 rows (expected 852,895 rows)
💾 Sample saved to 'sampled_fakenews.csv' and 'sampled_fakenews.parquet'.
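
The row counts in the log above are consistent with keeping one tenth of the dataset and rounding down (8,528,956 × 0.1 = 852,895.6, logged as 852,895). The script that produced this log is not shown in this hunk, so the following is only a minimal sketch of that reduction step under stated assumptions: the 10% fraction is inferred from the logged numbers, and `sample_size`, `sample_row_indices`, and the seed value are hypothetical names and choices, not taken from the commit.

```python
import math
import random


def sample_size(total_rows: int, fraction: float) -> int:
    """Return how many rows to keep, rounded down to a whole row."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    return math.floor(total_rows * fraction)


def sample_row_indices(total_rows: int, fraction: float, seed: int = 42) -> list[int]:
    """Pick a reproducible random subset of row indices without replacement.

    The seed is an assumption for illustration; the log does not say
    whether the original run was seeded.
    """
    rng = random.Random(seed)
    k = sample_size(total_rows, fraction)
    # random.sample draws k distinct values, so no row is selected twice.
    return sorted(rng.sample(range(total_rows), k))


if __name__ == "__main__":
    total = 8_528_956  # row count reported in the log
    print(f"Reducing dataset from {total:,} to {sample_size(total, 0.1):,} rows...")
```

Rounding down rather than to the nearest integer matters here: a helper that rounds (for example a fraction-based sampler) would give 852,896 rows, one more than the log's expected count.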