Miscs (Interro 28)
parent 0836d5809d
commit 7e7045293a
60
Readme.org
@@ -1,10 +1,11 @@
 #+title: Script
 #+author: Sébastien Miquel
 #+date: 14-03-2026
-# Time-stamp: <08-05-26 22:52>
+# Time-stamp: <14-05-26 08:55>
 #+OPTIONS:
 
-* What is this?
+* Meta
+** What is this?
 
 This repository contains a number of Python scripts that I use
 to have Gemini grade student papers.
@@ -20,7 +21,7 @@ to have Gemini grade student papers.
 4. These handwritten annotations are read and recompiled into a
    version of the paper for the student.
 
-* Disclaimer
+** Disclaimer
 
 I use this tool regularly and am satisfied with it, but I have
 made little effort to make it universal and easy to use.
@@ -37,9 +38,9 @@ examples of the final output (in the =BGnot= subfolder).
 This situation may improve, but making this system easier to use
 is not a priority.
 
-* Requirements
+** Requirements
 
-** Python
+*** Python
 
 Libraries:
 
@@ -47,13 +48,13 @@ Libraries:
 pip install numpy pandas matplotlib pillow pydantic pypdf pdf2image reportlab img2pdf pymupdf ftfy ezodf google
 #+END_SRC
 
-** Poppler (for pdf2image)
+*** Poppler (for pdf2image)
 
 + Linux: install poppler-utils
 + Windows: download from https://github.com/oschwartz10612/poppler-windows
   and add it to your PATH
 
-** Gemini access
+*** Gemini access
 
 You need to create an API key for Gemini (not easy).
 
@@ -66,7 +67,7 @@ Then add =GEMINI_API_KEY= to the environment with:
 export GEMINI_API_KEY=…
 #+END_SRC
 
-* Grading a batch of papers
+** Grading a batch of papers
 
 1. Create a =names= file in the current directory, with the
    students' last and first names, one per line
@@ -83,7 +84,8 @@ export GEMINI_API_KEY=…
    for such-and-such, etc.)
 6. Follow the steps below.
 
-* Preprocessing
+* Steps and scripts
+** Preprocessing
 
 1. =./rotate_all.sh Interro=
    (optional)
@@ -107,14 +109,14 @@ export GEMINI_API_KEY=…
 
 Rerun on a single file with =python cutleft.py Interro/Copie01.pdf=
 
-* Generating information about the exam statement
+** Generating information about the exam statement
 
 1. =python enonce_info.py Interro= (personal setup)
 OR
 2. =python gemini_for_enonce.py Interro=
    + Requires =enonce.tex/org= and `correction.tex/org`
 
-* Labeling and grouping
+** Labeling and grouping
 
 Set proxy with ~export HTTPS_PROXY="http://10.0.0.1:3128"~
 
@@ -130,25 +132,27 @@ Set proxy with ~export HTTPS_PROXY="http://10.0.0.1:3128"~
    + When a label is missing, you can click on the image, which
      copies the coordinates to the clipboard (on Linux…); the
      label can then be added by hand.
-   + Use of `_`, `|…` and `…|`
+   + Use of `_`, `|…` and `…|`:
+     + `|…` is not stopped vertically by its opposite type.
+     + `…|` is stopped horizontally by the nearest `|…`.
    To modify a single paper:
    =python plotting.py Interro/Copie01.pdf=
 
    It also generates the =Copie01.json= files from the =Copie01_01.json= files
-3. If something goes wrong (for example, the pages are out of order)
-   - Reorder the pages of the PDF file
-   - Rerun =python cutleft.py Interro/Copie{id}=
-   - Rerun =python gemini_dir_batching.py Interro/Copie{id}= ?? To be
-     checked, not sure it works.
-4. =python splitting_int.py Interro=
+   1. If something goes wrong (for example, the pages are out of order)
+      - Reorder the pages of the PDF file
+      - Rerun =python cutleft.py Interro/Copie{id}=
+      - Rerun =python gemini_dir_batching.py Interro/Copie{id}= ?? To be
+        checked, not sure it works.
+3. =python splitting_int.py Interro=
 
    Splits the papers by exercise
-5. =python grouping.py Interro=
+4. =python grouping.py Interro=
 
    Groups the same questions from different papers into
    reasonably sized groups.
 
-* Grading and annotation
+** Grading and annotation
 
 Set proxy with ~export HTTPS_PROXY="http://10.0.0.1:3128"~
 
@@ -170,16 +174,18 @@ Set proxy with ~export HTTPS_PROXY="http://10.0.0.1:3128"~
    To reduce cost, requests can be batched; they will then be
    processed within at most 24 hours.
    + =python correction.py Interro --batch=
+   + OR =python correction.py Interro --batch-from 'Ex 4'=
    + =python submit_batches.py Interro=
    + =python batch_status.py=
    + =python fetch_batched_results.py Interro=
    + =python correction.py Interro --deal-with-batched=
 3. =python post-correction.py Interro=
 
-   Tries to fix encoding/accent errors in
-   =correction.json=.
+   - Tries to fix encoding/accent errors in
+     =correction.json=.
+   - Also escapes `_` outside math mode, for LaTeX.
 
-* Generating the annotated papers
+** Generating the annotated papers
 
 1. =python annotating.py Interro= (optional)
 
@@ -208,7 +214,7 @@ OR
    - Empty =Syncthing/Annotées= on the tablet and locally.
    To be automated; it is also slow…
 
-* Reading the handwritten grading
+** Reading the handwritten grading
 
 1. =python from_tablette.py Interro= (personal setup)
 
@@ -243,6 +249,7 @@ OR
    + =gestion_classe ne= to create the quiz, then
    + =gestion_classe we= (set the grading scale here)
    + =python update_ods.py Interro=
+     or =python update_ods.py Interro --sum= (when there is no grading scale)
    + =gestion_classe re=
    + =gestion_classe wsent=
    + =python add_final_score.py Interro21=
@@ -252,10 +259,7 @@ OR
    + update the copies from =miqmacs.fr/admin=.
 6. (personal setup) Printing a paper. Via Evince » print to pdf.
 
-
-
-* Re-grading a single paper (little tested)
-
+** Re-grading a single paper (little tested)
 
 !! Warning: refaire will not work if you put a non-grouped
 annotation into refaire !!
@ -160,11 +160,35 @@ def main():
|
|||
|
||||
used_prefixes.add(unique_prefix)
|
||||
|
||||
existing_items = set()
|
||||
max_existing_group = 0
|
||||
|
||||
|
||||
if not args.overwrite and os.path.exists(bgnot_dir):
|
||||
for d in os.listdir(bgnot_dir):
|
||||
if d.startswith(f"{unique_prefix} G"):
|
||||
try:
|
||||
g_id = int(d.split(' G')[-1])
|
||||
max_existing_group = max(max_existing_group, g_id)
|
||||
except ValueError:
|
||||
pass
|
||||
|
||||
bnote_path = os.path.join(bgnot_dir, d, "bnote.json")
|
||||
if os.path.exists(bnote_path):
|
||||
with open(bnote_path, "r") as bf:
|
||||
bdata = json.load(bf)
|
||||
for img in bdata.get("images", []):
|
||||
existing_items.add((img["id"], img["label"]))
|
||||
|
||||
items_to_render = []
|
||||
for sid, lbls in results.items():
|
||||
for lbl in labels:
|
||||
if lbl in lbls:
|
||||
items_to_render.append((sid, lbl, lbls[lbl]))
|
||||
# Only add if it hasn't been generated yet
|
||||
if (sid, lbl) not in existing_items:
|
||||
items_to_render.append((sid, lbl, lbls[lbl]))
|
||||
if not items_to_render:
|
||||
continue
|
||||
|
||||
# Sort structurally: by student id and label
|
||||
items_to_render.sort(key=lambda x: (natural_key(x[0]), natural_key(x[1])))
|
||||
|
|
@ -217,7 +241,7 @@ def main():
|
|||
batches = batches2
|
||||
|
||||
for i, batch in enumerate(batches, 1):
|
||||
save_batch(batch, unique_prefix, i, root_dir, args.overwrite)
|
||||
save_batch(batch, unique_prefix, max_existing_group + i, root_dir, args.overwrite)
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
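The two hunks above make the grouping step resumable: groups already on disk are skipped, and new groups are numbered after the highest existing one. A minimal standalone sketch of that idea follows. The helper names `max_existing_group` and `pending_items` are hypothetical, and the directory naming `<prefix> G<N>` is taken from the `startswith(f"{unique_prefix} G")` test in the diff; the real script also reads `bnote.json` files, which this sketch replaces with a plain set.

```python
def max_existing_group(dir_names, prefix):
    """Return the highest N among names shaped like '<prefix> G<N>', or 0."""
    highest = 0
    for name in dir_names:
        if not name.startswith(f"{prefix} G"):
            continue
        try:
            highest = max(highest, int(name.split(" G")[-1]))
        except ValueError:
            pass  # the text after ' G' is not a number; ignore this directory
    return highest


def pending_items(results, labels, existing):
    """List (student_id, label, value) triples that are not rendered yet."""
    return [
        (sid, lbl, lbls[lbl])
        for sid, lbls in results.items()
        for lbl in labels
        if lbl in lbls and (sid, lbl) not in existing
    ]


# Hypothetical state: two groups already exist for "Ex 1", one item is done.
dirs = ["Ex 1 G1", "Ex 1 G2", "Ex 1 Gx", "Ex 2 G7"]
offset = max_existing_group(dirs, "Ex 1")
results = {"s1": {"Ex 1": "a"}, "s2": {"Ex 1": "b"}}
todo = pending_items(results, ["Ex 1"], {("s1", "Ex 1")})

# New batches are numbered from offset + 1, as in save_batch(..., max_existing_group + i, ...).
new_group_ids = [offset + i for i, _ in enumerate([todo], 1)]
```

Numbering from the existing maximum rather than from 1 means a rerun without `--overwrite` adds new group directories and leaves existing ones in place.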
|
||||
|
|
|
|||
182
correction.py
|
|
@ -5,14 +5,11 @@ from pathlib import Path
|
|||
import argparse
|
||||
|
||||
if len(sys.argv) < 2:
|
||||
sys.exit("Usage: python script.py InterroTest/Ex 2/Group_1.jpg OR <InputDir>")
|
||||
|
||||
arg_path = Path(sys.argv[1])
|
||||
tasks = [] # List of tuples: (filepath_str, label_str)
|
||||
results = {}
|
||||
sys.exit("Usage: python script.py 'InterroTest/Ex 2/Group_1.jpg' OR <InputDir> OR 'file1' 'file2'")
|
||||
|
||||
# Parse Arguments
|
||||
parser = argparse.ArgumentParser()
|
||||
parser.add_argument("paths", nargs="+", help="List of images or directories")
|
||||
parser.add_argument("--overwrite", action="store_true",
|
||||
help="Force redo requests even if output exists")
|
||||
parser.add_argument("--limit", type=int, help="limit calls to gemini rpo integer")
|
||||
|
|
@ -20,28 +17,40 @@ parser.add_argument("--refaire", action="store_true",
|
|||
help="Redo specific copies/labels defined in refaire.json")
|
||||
parser.add_argument("--batch", action="store_true",
|
||||
help="Generate a JSONL file of requests to send to the Gemini Batch API")
|
||||
parser.add_argument("--batch-from", type=str, metavar="LABEL",
|
||||
help="Do live requests before LABEL, and batch requests from LABEL onwards")
|
||||
parser.add_argument("--deal-with-batched", action="store_true",
|
||||
help="Process a JSONL file containing completed batch results")
|
||||
args, _ = parser.parse_known_args()
|
||||
|
||||
tasks = [] # List of tuples: (filepath_str, label_str)
|
||||
results = {}
|
||||
|
||||
|
||||
for path_str in args.paths:
|
||||
arg_path = Path(path_str)
|
||||
|
||||
if arg_path.suffix == ".jpg":
|
||||
INPUT_DIR = str(arg_path.parents[1])
|
||||
FULL_LABEL = arg_path.parent.name
|
||||
tasks.append((str(arg_path), FULL_LABEL))
|
||||
results[FULL_LABEL] = []
|
||||
else:
|
||||
# Directory behaviour
|
||||
INPUT_DIR = str(arg_path)
|
||||
if not arg_path.exists():
|
||||
sys.exit(f"Directory {INPUT_DIR} not found.")
|
||||
print(f"Warning: {path_str} not found. Skipping.")
|
||||
continue
|
||||
|
||||
for sub in arg_path.iterdir():
|
||||
if sub.is_dir() and sub.name.startswith("Ex"):
|
||||
label = sub.name
|
||||
if arg_path.is_file() and arg_path.suffix.lower() == ".jpg":
|
||||
# Handle individual file
|
||||
# Note: assumes structure InterroTest/Ex 2/Group_1.jpg to get parents[1]
|
||||
label = arg_path.parent.name
|
||||
tasks.append((str(arg_path), label))
|
||||
if label not in results:
|
||||
results[label] = []
|
||||
for img in sub.glob("*.jpg"):
|
||||
tasks.append((str(img), label))
|
||||
|
||||
elif arg_path.is_dir():
|
||||
# Handle directory (original behavior)
|
||||
for sub in arg_path.iterdir():
|
||||
if sub.is_dir() and sub.name.startswith("Ex"):
|
||||
label = sub.name
|
||||
if label not in results:
|
||||
results[label] = []
|
||||
for img in sub.glob("*.jpg"):
|
||||
tasks.append((str(img), label))
|
||||
|
||||
my_prompt = """I'm giving you an image of several written answers to an exam.
|
||||
|
||||
|
|
@ -135,17 +144,15 @@ You are asked to score the question or exercice labeled `<<label>>`,
|
|||
do not score or give feedback to any other question."""
|
||||
|
||||
def make_prompt(full_label):
|
||||
# l = full_label.split(" ")
|
||||
# ex_label = l[0] + " " + l[1]
|
||||
# text = (Path(INPUT_DIR) / "Text" / ex_label).read_text()
|
||||
# corr = (Path(INPUT_DIR) / "Sol" / ex_label).read_text()
|
||||
# persp = (Path(INPUT_DIR) / "Persp" / ex_label).read_text()
|
||||
def read_longest_prefix_file(subdir):
|
||||
dir_path = Path(INPUT_DIR) / subdir
|
||||
matches = [f for f in dir_path.iterdir() if f.is_file() and full_label.startswith(f.name)]
|
||||
matches = [f for f in dir_path.iterdir()
|
||||
if f.is_file()
|
||||
and full_label.startswith(f.name)
|
||||
and f.suffix not in [".pdf", ".tex"]]
|
||||
if not matches:
|
||||
return ""
|
||||
return max(matches, key=lambda f: len(f.name)).read_text()
|
||||
return max(matches, key=lambda f: len(f.name)).read_text(encoding="utf-8", errors="replace")
|
||||
|
||||
text = read_longest_prefix_file("Text")
|
||||
corr = read_longest_prefix_file("Sol")
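The lookup in `read_longest_prefix_file` picks, among the files in a subdirectory, the one whose name is the longest prefix of the full label. The sketch below reproduces that rule in isolation so it can be tried on a temporary directory. The signature with an explicit `dir_path` argument and the sample file names are assumptions made for the illustration; the script itself closes over `INPUT_DIR` and `full_label`.

```python
import tempfile
from pathlib import Path


def read_longest_prefix_file(dir_path, full_label, skip=(".pdf", ".tex")):
    """Read the file whose name is the longest prefix of full_label."""
    matches = [
        f for f in Path(dir_path).iterdir()
        if f.is_file()
        and full_label.startswith(f.name)
        and f.suffix not in skip
    ]
    if not matches:
        return ""
    best = max(matches, key=lambda f: len(f.name))
    # errors="replace" keeps going on badly encoded bytes instead of raising
    return best.read_text(encoding="utf-8", errors="replace")


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "Ex 2").write_text("whole exercise", encoding="utf-8")
    (Path(tmp) / "Ex 2 Q1").write_text("question 1 only", encoding="utf-8")
    specific = read_longest_prefix_file(tmp, "Ex 2 Q1 a")  # most specific file wins
    fallback = read_longest_prefix_file(tmp, "Ex 2 Q3")    # falls back to "Ex 2"
    missing = read_longest_prefix_file(tmp, "Ex 5")        # no prefix matches
```

With this rule, one text per exercise is enough, and a more specific file can be added only for the questions that need it.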
|
||||
|
|
@ -482,7 +489,7 @@ def handle_label_errors(pid, label, res, pdf_path):
|
|||
error_type = res.get("error")
|
||||
|
||||
all_labels = read_all_labels(INPUT_DIR)
|
||||
labels_txt = (Path(INPUT_DIR) / "labels").read_text()
|
||||
labels_txt = (Path(INPUT_DIR) / "labels").read_text(encoding="utf-8", errors="replace")
|
||||
enonce = enonce_total(INPUT_DIR)
|
||||
|
||||
if error_type == "wrong-label":
|
||||
|
|
@ -499,7 +506,7 @@ Here is the full content of the exam :
|
|||
|
||||
{enonce}
|
||||
|
||||
Here is a list of all possible lables. You need to answer with one of these :
|
||||
Here is a list of all possible labels. You need to answer with one of these :
|
||||
|
||||
{labels_txt}
|
||||
"""
|
||||
|
|
@ -780,62 +787,89 @@ if __name__ == "__main__":
|
|||
print(f"Warning: --refaire flag used, but {refaire_path} not found.", file=sys.stderr)
|
||||
|
||||
|
||||
if args.batch:
|
||||
batch_flash_file = Path(INPUT_DIR) / "batch_requests_flash.jsonl"
|
||||
batch_pro_file = Path(INPUT_DIR) / "batch_requests_pro.jsonl"
|
||||
if args.batch or args.batch_from:
|
||||
from utils import read_all_labels
|
||||
all_labels = read_all_labels(INPUT_DIR)
|
||||
|
||||
count_flash = 0
|
||||
count_pro = 0
|
||||
batch_tasks = []
|
||||
if args.batch_from:
|
||||
if args.batch_from not in all_labels:
|
||||
sys.exit(f"Error: Label '{args.batch_from}' not found. Available labels: {all_labels}")
|
||||
|
||||
with open(batch_flash_file, "w", encoding="utf-8") as f_flash, \
|
||||
open(batch_pro_file, "w", encoding="utf-8") as f_pro:
|
||||
target_idx = all_labels.index(args.batch_from)
|
||||
live_tasks = []
|
||||
|
||||
for task in tasks_to_process:
|
||||
file_path, label = task[0], task[1]
|
||||
group_name = os.path.splitext(file_path)[0]
|
||||
json_path = group_name + '.json'
|
||||
lbl = task[1]
|
||||
# Any label found sequentially equal or after `args.batch_from` gets batched
|
||||
if lbl in all_labels and all_labels.index(lbl) >= target_idx:
|
||||
batch_tasks.append(task)
|
||||
else:
|
||||
live_tasks.append(task)
|
||||
|
||||
with open(json_path, 'r') as jf:
|
||||
group_data = json.load(jf)
|
||||
use_flash = len(group_data) >= 4 or group_data[-1][2] <= 500
|
||||
tasks_to_process = live_tasks # Keep live tasks to be run right after
|
||||
else:
|
||||
batch_tasks = tasks_to_process
|
||||
tasks_to_process = [] # Run nothing live if just `--batch`
|
||||
|
||||
image_data = Path(file_path).read_bytes()
|
||||
b64_img = base64.b64encode(image_data).decode("utf-8")
|
||||
if batch_tasks:
|
||||
batch_flash_file = Path(INPUT_DIR) / "batch_requests_flash.jsonl"
|
||||
batch_pro_file = Path(INPUT_DIR) / "batch_requests_pro.jsonl"
|
||||
|
||||
# Format payload matching Gemini Batch API file requirements
|
||||
req = {
|
||||
"key": file_path, # The ID returned in the output file
|
||||
"request": {
|
||||
"contents": [{
|
||||
"role": "user",
|
||||
"parts": [
|
||||
{"inlineData": {"mimeType": "image/jpeg", "data": b64_img}},
|
||||
{"text": make_prompt(label)}
|
||||
]
|
||||
}],
|
||||
"generation_config": {
|
||||
"temperature": 1.0,
|
||||
"topP": 0.95,
|
||||
"maxOutputTokens": 65535,
|
||||
"responseMimeType": "application/json",
|
||||
"responseSchema": UNROLLED_SCHEMA
|
||||
# TypeAdapter(List[EvaluationEntry]).json_schema()
|
||||
count_flash = 0
|
||||
count_pro = 0
|
||||
|
||||
with open(batch_flash_file, "w", encoding="utf-8") as f_flash, \
|
||||
open(batch_pro_file, "w", encoding="utf-8") as f_pro:
|
||||
|
||||
for task in batch_tasks:
|
||||
file_path, label = task[0], task[1]
|
||||
group_name = os.path.splitext(file_path)[0]
|
||||
json_path = group_name + '.json'
|
||||
|
||||
with open(json_path, 'r') as jf:
|
||||
group_data = json.load(jf)
|
||||
use_flash = len(group_data) >= 4 or group_data[-1][2] <= 500
|
||||
|
||||
image_data = Path(file_path).read_bytes()
|
||||
b64_img = base64.b64encode(image_data).decode("utf-8")
|
||||
|
||||
# Format payload matching Gemini Batch API file requirements
|
||||
req = {
|
||||
"key": file_path, # The ID returned in the output file
|
||||
"request": {
|
||||
"contents": [{
|
||||
"role": "user",
|
||||
"parts": [
|
||||
{"inlineData": {"mimeType": "image/jpeg", "data": b64_img}},
|
||||
{"text": make_prompt(label)}
|
||||
]
|
||||
}],
|
||||
"generation_config": {
|
||||
"temperature": 1.0,
|
||||
"topP": 0.95,
|
||||
"maxOutputTokens": 65535,
|
||||
"responseMimeType": "application/json",
|
||||
"responseSchema": UNROLLED_SCHEMA
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
if use_flash:
|
||||
f_flash.write(json.dumps(req) + "\n")
|
||||
count_flash += 1
|
||||
else:
|
||||
f_pro.write(json.dumps(req) + "\n")
|
||||
count_pro += 1
|
||||
if use_flash:
|
||||
f_flash.write(json.dumps(req) + "\n")
|
||||
count_flash += 1
|
||||
else:
|
||||
f_pro.write(json.dumps(req) + "\n")
|
||||
count_pro += 1
|
||||
|
||||
print(f"Batch generation complete.")
|
||||
print(f" - {count_flash} requests saved to {batch_flash_file} (for {MODEL_ID_flash})")
|
||||
print(f" - {count_pro} requests saved to {batch_pro_file} (for {MODEL_ID_pro})")
|
||||
print("Upload these files via the File API and create two separate batch jobs.")
|
||||
sys.exit(0)
|
||||
print(f"Batch generation complete.")
|
||||
print(f" - {count_flash} requests saved to {batch_flash_file} (for {MODEL_ID_flash})")
|
||||
print(f" - {count_pro} requests saved to {batch_pro_file} (for {MODEL_ID_pro})")
|
||||
print("Upload these files via the File API and create two separate batch jobs.")
|
||||
|
||||
# If there's no live tasks to do, and we aren't doing a batched ingestion, exit right away
|
||||
if not tasks_to_process and not args.deal_with_batched:
|
||||
sys.exit(0)
|
||||
|
||||
batched_responses = {}
|
||||
if args.deal_with_batched:
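The `--batch-from` logic in the hunk above splits the task list in two: tasks whose label comes at or after the given label (in the order of `all_labels`) are written to the batch files, and the others are run live. A minimal sketch of that split, assuming tasks are `(file_path, label)` tuples as in the diff; the function name `split_live_and_batch` and the sample data are hypothetical, and the sketch raises an exception where the script calls `sys.exit`.

```python
def split_live_and_batch(tasks, all_labels, batch_from):
    """Split (file_path, label) tasks at batch_from.

    Labels at or after batch_from in all_labels are batched; every other
    task, including one with an unknown label, stays live.
    """
    if batch_from not in all_labels:
        raise ValueError(f"Label {batch_from!r} not found in {all_labels}")
    target_idx = all_labels.index(batch_from)
    live, batch = [], []
    for task in tasks:
        label = task[1]
        if label in all_labels and all_labels.index(label) >= target_idx:
            batch.append(task)
        else:
            live.append(task)
    return live, batch


labels = ["Ex 1", "Ex 2", "Ex 3", "Ex 4"]
tasks = [("a.jpg", "Ex 1"), ("b.jpg", "Ex 4"), ("c.jpg", "Ex 3"), ("d.jpg", "Bonus")]
live, batch = split_live_and_batch(tasks, labels, "Ex 3")
```

Note that a label missing from `all_labels` (here `"Bonus"`) ends up in the live list, which matches the `else` branch of the diff.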
|
||||
|
|
@ -883,7 +917,7 @@ if __name__ == "__main__":
|
|||
print("Time elapsed : ", end_time - start_time)
|
||||
print("Requests to pro / flash : ", pro_count, flash_count)
|
||||
if errors_summary:
|
||||
print("\n--- Summary of Exceptions ---", file=sys.stderr)
|
||||
print("\n--- Summary of Exceptions (You can use several images on one instance) ---", file=sys.stderr)
|
||||
for (err, file) in errors_summary:
|
||||
print(err, file=sys.stderr)
|
||||
escaped_path = shlex.quote(str(file))
|
||||
|
|
|
|||
|
|
@ -296,7 +296,7 @@ def process_copy_group(group_key, files):
|
|||
continue # Retry immediately
|
||||
else:
|
||||
name = "Unknown"
|
||||
|
||||
annota.name = name
|
||||
# Save result
|
||||
with open(output_json, "w", encoding="utf-8") as f:
|
||||
json.dump(annota.model_dump(), f, indent=2)
|
||||
|
|
|
|||
|
|
@@ -12386,6 +12386,7 @@ maternelles
 maternité
 mathématicien
 mathématique
 mathématiquement
 mathématiques
+maths
 matière
@ -7,6 +7,46 @@ import argparse
|
|||
if len(sys.argv) < 2:
|
||||
sys.exit("Usage: python script.py <InputDir>")
|
||||
|
||||
def escape_latex_underscores(text):
|
||||
r"""
|
||||
Escape '_' outside LaTeX math environments.
|
||||
Supports:
|
||||
- $...$
|
||||
- $$...$$
|
||||
- \( ... \)
|
||||
- \[ ... \]
|
||||
"""
|
||||
|
||||
# Regex matching LaTeX math blocks
|
||||
math_pattern = re.compile(
|
||||
r'(\$\$.*?\$\$|' # $$...$$
|
||||
r'\$.*?\$|' # $...$
|
||||
r'\\\(.*?\\\)|' # \( ... \)
|
||||
r'\\\[.*?\\\])', # \[ ... \]
|
||||
re.DOTALL
|
||||
)
|
||||
|
||||
parts = []
|
||||
last_end = 0
|
||||
|
||||
for match in math_pattern.finditer(text):
|
||||
start, end = match.span()
|
||||
|
||||
# Escape underscores outside math
|
||||
outside = text[last_end:start].replace('_', r'\_')
|
||||
parts.append(outside)
|
||||
|
||||
# Keep math block unchanged
|
||||
parts.append(match.group(0))
|
||||
|
||||
last_end = end
|
||||
|
||||
# Remaining text after last math block
|
||||
outside = text[last_end:].replace('_', r'\_')
|
||||
parts.append(outside)
|
||||
|
||||
return ''.join(parts)
|
||||
|
||||
arg_path = Path(sys.argv[1])
|
||||
tasks = [] # List of tuples: (filepath_str, label_str)
|
||||
results = {}
|
||||
|
|
@ -79,7 +119,8 @@ def clean_string(s: str) -> str:
|
|||
if '\x00' in s:
|
||||
s = fast_fix(s)
|
||||
s = s.replace('\x00', '')
|
||||
return some_other_replacements(s)
|
||||
s = some_other_replacements(s)
|
||||
return escape_latex_underscores(s)
|
||||
|
||||
|
||||
def clean_obj(obj):
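The behaviour of `escape_latex_underscores` can be checked with a shorter variant. The sketch below is not the function from the diff: it uses `re.split` with a capturing group instead of `finditer`, which should give the same result because the split list alternates between text outside math (even indexes) and math blocks (odd indexes). The name `escape_outside_math` is hypothetical.

```python
import re

# Same four math delimiters as in the diff: $$...$$, $...$, \(...\), \[...\]
MATH = re.compile(r"(\$\$.*?\$\$|\$.*?\$|\\\(.*?\\\)|\\\[.*?\\\])", re.DOTALL)


def escape_outside_math(text):
    """Escape '_' everywhere except inside LaTeX math blocks."""
    parts = MATH.split(text)
    # Even indexes are outside math, odd indexes are the captured math blocks.
    return "".join(
        part if i % 2 else part.replace("_", r"\_")
        for i, part in enumerate(parts)
    )


inline = escape_outside_math("a_b $x_1$ c_d")
paren = escape_outside_math(r"\(y_2\) z_3")
display = escape_outside_math("$$u_n$$ tends to l_0")
```

One limitation applies to both versions: an underscore that is already escaped gets a second backslash, so the function is not safe to run twice on the same text.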
|
||||
|
|
|
|||
|
|
@ -8,6 +8,9 @@ import shutil
|
|||
from pathlib import Path
|
||||
from collections import defaultdict
|
||||
|
||||
carreau = 1000 // 38
|
||||
|
||||
|
||||
def decode_json(pdf_file):
|
||||
file_path = Path(pdf_file)
|
||||
with open(file_path.with_suffix(".json"), "r") as f:
|
||||
|
|
@ -26,8 +29,7 @@ def decode_json(pdf_file):
|
|||
for d in bb_list:
|
||||
(b, label) = d["box_2d"], d["label"]
|
||||
pn = page_number(b)
|
||||
carreau = 1000 // 38
|
||||
result.append((label, pn, b[0] - int(carreau), b[2]-int(carreau), b[1], b[3]))
|
||||
result.append((label, pn, b[0] - carreau, b[2]-carreau, b[1], b[3]))
|
||||
result.sort(key=lambda x: (x[1], x[2]))
|
||||
return (name, result)
|
||||
|
||||
|
|
@ -98,7 +100,7 @@ def split_an_interro(base_dir, input_pdf, coords_list):
|
|||
|
||||
# RULE 2: Determine stopping label
|
||||
for next_item in coords_list[idx + 1:]:
|
||||
n_clean, n_type, n_pn, n_y_start, _, _, _ = next_item
|
||||
n_clean, n_type, n_pn, n_y_start, n_y_end, _, _ = next_item
|
||||
|
||||
if c_type == "L":
|
||||
is_stop = (n_type in ("L", "N"))
|
||||
|
|
@ -109,7 +111,9 @@ def split_an_interro(base_dir, input_pdf, coords_list):
|
|||
|
||||
if is_stop:
|
||||
end_page = n_pn
|
||||
end_y_target_raw = n_y_start
|
||||
# end_y_target_raw = n_y_start
|
||||
# We removed one square (carreau) earlier, so add it back here…
|
||||
end_y_target_raw = min(n_y_start + int(1.25 * carreau), 1000)
|
||||
break
|
||||
|
||||
# RULES 3 & 4: Calculate horizontal boundaries (0.0 to 1.0 fraction of local page width)
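The arithmetic behind `carreau` in the hunks above can be worked through by hand. The sketch assumes, as the `min(..., 1000)` clamp suggests, that vertical coordinates are normalized to the range 0 to 1000; the names `PAGE_UNITS`, `shift_up` and `stop_y` are hypothetical.

```python
PAGE_UNITS = 1000            # assumption: coordinates run from 0 to 1000
CARREAU = PAGE_UNITS // 38   # one grid square, as in the diff: 1000 // 38


def shift_up(y):
    """Move a label's vertical coordinate up by one square (decode_json)."""
    return y - CARREAU


def stop_y(next_label_y_start):
    """Lower edge of a crop: next label's start plus 1.25 squares, clamped."""
    return min(next_label_y_start + int(1.25 * CARREAU), PAGE_UNITS)


start = shift_up(100)   # a label detected at y = 100 is stored at 74
end = stop_y(start)     # the crop above it then stops at 74 + 32 = 106
clamped = stop_y(990)   # near the bottom of the page the value is capped
```

Since `int(1.25 * 26)` is 32, the stop position ends up 6 units below the originally detected label position, so the one-square shift is undone with a small margin.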
|
||||
|
|
|
|||
|
|
@ -72,8 +72,6 @@ def main():
|
|||
|
||||
print("-" * 50)
|
||||
print("All batch jobs have been initiated.")
|
||||
print("Save the Batch Job Names above. You can monitor them with:")
|
||||
print(" client.batches.get(name='YOUR_BATCH_JOB_NAME')")
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
|
||||
|
|
|
|||
110
update_ods.py
|
|
@ -1,3 +1,4 @@
|
|||
import argparse
|
||||
import os
|
||||
import sys
|
||||
import json
|
||||
|
|
@ -12,11 +13,12 @@ ODS_PATH = "/home/sebastien/Rust/gestion_classe/Staging/current_eval.ods"
|
|||
TARGET_DIR_NAME = "A Rendre"
|
||||
|
||||
def main():
|
||||
if len(sys.argv) < 2:
|
||||
# Default to current directory if not provided, or raise error
|
||||
work_dir = os.getcwd()
|
||||
else:
|
||||
work_dir = os.path.abspath(sys.argv[1])
|
||||
parser = argparse.ArgumentParser(description="Update ODS with student scores.")
|
||||
parser.add_argument("work_dir", nargs="?", default=os.getcwd(), help="Directory to process")
|
||||
parser.add_argument("--sum", action="store_true", help="Write only the total sum per student")
|
||||
args = parser.parse_args()
|
||||
|
||||
work_dir = os.path.abspath(args.work_dir)
|
||||
|
||||
all_labels = read_all_labels(Path(work_dir))
|
||||
|
||||
|
|
@ -101,53 +103,65 @@ def main():
|
|||
# Start filling from Row 2 (index 2), immediately below the name line
|
||||
start_row = 2
|
||||
|
||||
# for i, key in enumerate(scores_data.keys()):
|
||||
for i, key in enumerate(all_labels):
|
||||
row_idx = start_row + i
|
||||
|
||||
# Ensure we don't go out of bounds
|
||||
if row_idx >= sheet.nrows():
|
||||
sheet.append_rows(1)
|
||||
|
||||
if key in scores_data:
|
||||
val_str = str(scores_data[key])
|
||||
else:
|
||||
val_str = ""
|
||||
|
||||
# Logic: if "" -> "NT"
|
||||
new_val = "NT" if val_str == "" else val_str
|
||||
|
||||
cell = sheet[row_idx, col_idx]
|
||||
current_val = cell.value
|
||||
|
||||
# Conflict Detection
|
||||
# Normalize current ODS value to string for comparison
|
||||
# ODS might store 2.0 as float 2.0. JSON has "2.0".
|
||||
is_different = False
|
||||
|
||||
if current_val is not None and current_val != "":
|
||||
# specific check to handle float/string mismatch (2.0 vs "2.0")
|
||||
if args.sum:
|
||||
# Calculate total
|
||||
total = 0.0
|
||||
for val in scores_data.values():
|
||||
try:
|
||||
if float(str(current_val)) != float(str(new_val)):
|
||||
is_different = True
|
||||
except ValueError:
|
||||
# If conversion fails (e.g. comparing "NT" to "2.0"), compare strings
|
||||
if str(current_val).strip() != str(new_val).strip():
|
||||
is_different = True
|
||||
total += float(val)
|
||||
except (ValueError, TypeError):
|
||||
continue
|
||||
|
||||
if is_different:
|
||||
print(f"DEBUG: Conflict for {item} at {key} (Row {row_idx}). "
|
||||
f"Existing: '{current_val}' vs New: '{new_val}'. Overwriting.")
|
||||
cell = sheet[start_row, col_idx]
|
||||
cell.set_value(total)
|
||||
print(f"Set sum for {item}: {total}")
|
||||
else:
|
||||
for i, key in enumerate(all_labels):
|
||||
row_idx = start_row + i
|
||||
|
||||
# Set value
|
||||
# Try to set as float if it looks like a number, otherwise string
|
||||
if new_val == "NT":
|
||||
cell.set_value(new_val)
|
||||
else:
|
||||
try:
|
||||
cell.set_value(float(new_val))
|
||||
except ValueError:
|
||||
# Ensure we don't go out of bounds
|
||||
if row_idx >= sheet.nrows():
|
||||
sheet.append_rows(1)
|
||||
|
||||
if key in scores_data:
|
||||
val_str = str(scores_data[key])
|
||||
else:
|
||||
val_str = ""
|
||||
|
||||
# Logic: if "" -> "NT"
|
||||
new_val = "NT" if val_str == "" else val_str
|
||||
|
||||
cell = sheet[row_idx, col_idx]
|
||||
current_val = cell.value
|
||||
|
||||
# Conflict Detection
|
||||
# Normalize current ODS value to string for comparison
|
||||
# ODS might store 2.0 as float 2.0. JSON has "2.0".
|
||||
is_different = False
|
||||
|
||||
if current_val is not None and current_val != "":
|
||||
# specific check to handle float/string mismatch (2.0 vs "2.0")
|
||||
try:
|
||||
if float(str(current_val)) != float(str(new_val)):
|
||||
is_different = True
|
||||
except ValueError:
|
||||
# If conversion fails (e.g. comparing "NT" to "2.0"), compare strings
|
||||
if str(current_val).strip() != str(new_val).strip():
|
||||
is_different = True
|
||||
|
||||
if is_different:
|
||||
print(f"DEBUG: Conflict for {item} at {key} (Row {row_idx}). "
|
||||
f"Existing: '{current_val}' vs New: '{new_val}'. Overwriting.")
|
||||
|
||||
# Set value
|
||||
# Try to set as float if it looks like a number, otherwise string
|
||||
if new_val == "NT":
|
||||
cell.set_value(new_val)
|
||||
else:
|
||||
try:
|
||||
cell.set_value(float(new_val))
|
||||
except ValueError:
|
||||
cell.set_value(new_val)
|
||||
|
||||
print("Saving ODS file...")
|
||||
doc.save()
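Two small rules in the hunk above are easy to test on their own: the `--sum` total skips values that are not numbers, and conflict detection compares values as floats when possible and as stripped strings otherwise. The sketch below isolates both; the function names `total_score` and `differs` and the sample scores are hypothetical, and no ODS file is involved.

```python
def total_score(scores):
    """Sum the numeric values of a label -> score mapping, skipping the rest."""
    total = 0.0
    for val in scores.values():
        try:
            total += float(val)
        except (ValueError, TypeError):
            continue  # "NT", "" or None do not count toward the total
    return total


def differs(current, new):
    """Compare a cell value with a new value, tolerant of 2.0 vs "2.0"."""
    try:
        return float(str(current)) != float(str(new))
    except ValueError:
        # At least one side is not a number (for example "NT")
        return str(current).strip() != str(new).strip()


scores = {"Ex 1": "2.5", "Ex 2": "NT", "Ex 3": 1, "Ex 4": None, "Ex 5": ""}
total = total_score(scores)
same_number = differs(2.0, "2.0")
number_vs_nt = differs("NT", "2.0")
```

Catching `TypeError` as well as `ValueError` matters for the sum, because `float(None)` raises the former.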
|
||||
|
|
|
|||
2
utils.py
|
|
@ -14,7 +14,7 @@ def enonce_total(base_dir):
|
|||
if not text_dir.is_dir():
|
||||
return ""
|
||||
|
||||
files = [f for f in text_dir.iterdir() if f.is_file()]
|
||||
files = [f for f in text_dir.iterdir() if f.is_file() and f.suffix not in [".pdf", ".tex"]]
|
||||
files.sort(key=lambda f: natural_key(f.name))
|
||||
|
||||
output = []
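The filter added to `enonce_total` drops `.pdf` and `.tex` files before the natural sort. The diff does not show `natural_key`, so the version below is an assumed, common implementation (digit runs compared as integers), not the one from `utils.py`; the file names are made up for the illustration.

```python
import re
from pathlib import PurePath


def natural_key(name):
    """Sort key that compares digit runs as numbers (assumed implementation)."""
    return [int(tok) if tok.isdigit() else tok.lower()
            for tok in re.split(r"(\d+)", name)]


names = ["Ex 10", "Ex 2", "Ex 2.pdf", "Ex 1", "Ex 1.tex"]

# Keep only the plain text files, then sort them in human order.
kept = sorted(
    (n for n in names if PurePath(n).suffix not in (".pdf", ".tex")),
    key=natural_key,
)
plain = sorted(n for n in names if PurePath(n).suffix not in (".pdf", ".tex"))
```

The plain string sort puts "Ex 10" before "Ex 2", which is why a natural key is needed to assemble the exam statement in exercise order.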
|
||||
|
|
|
|||