Miscs (Interro 28)

master
Sébastien Miquel 2026-05-14 09:02:09 +02:00
parent 0836d5809d
commit 7e7045293a
10 changed files with 281 additions and 161 deletions


@@ -1,10 +1,11 @@
 #+title: Script
 #+author: Sébastien Miquel
 #+date: 14-03-2026
-# Time-stamp: <08-05-26 22:52>
+# Time-stamp: <14-05-26 08:55>
 #+OPTIONS:
-* What is this?
+* Meta
+** What is this?
 This repository contains a number of Python scripts that I use
 to have exam papers graded by Gemini.
@@ -20,7 +21,7 @@ to have exam papers graded by Gemini.
 4. These handwritten annotations are read and compiled into a
    version of the paper for the student.
-* Disclaimer
+** Disclaimer
 I use this tool regularly and I am happy with it, but I have
 made little effort to make it general-purpose or easy to use.
@@ -37,9 +38,9 @@ examples of the final output (in the =BGnot= subfolder).
 This may improve, but making this system easier to use
 is not a priority.
-* Requirements
-** Python
+** Requirements
+*** Python
 Libraries:
@@ -47,13 +48,13 @@ Libraries:
 pip install numpy pandas matplotlib pillow pydantic pypdf pdf2image reportlab img2pdf pymupdf ftfy ezodf google
 #+END_SRC
-** Poppler (for pdf2image)
+*** Poppler (for pdf2image)
 + Linux: install poppler-utils
 + Windows: download from https://github.com/oschwartz10612/poppler-windows
   and add it to your PATH
-** Access to Gemini
+*** Access to Gemini
 You need to create an API key for Gemini (not easy).
@@ -66,7 +67,7 @@ Then add =GEMINI_API_KEY= to the environment with:
 export GEMINI_API_KEY=…
 #+END_SRC
-* Grading a stack of papers
+** Grading a stack of papers
 1. Create a =names= file in the current directory, with the
    students' last and first names, one per line
@@ -83,7 +84,8 @@ export GEMINI_API_KEY=…
    for this or that, etc.)
 6. Follow the steps below.
-* Preprocessing
+* Steps and scripts
+** Preprocessing
 1. =./rotate_all.sh Interro=
    (optional)
@@ -107,14 +109,14 @@ export GEMINI_API_KEY=…
    Rerun on a single file with =python cutleft.py Interro/Copie01.pdf=
-* Generating information about the exam statement
+** Generating information about the exam statement
 1. =python enonce_info.py Interro= (personal setup)
 OR
 2. =python gemini_for_enonce.py Interro=
    + Requires =enonce.tex/org= and =correction.tex/org=
-* Labeling and grouping
+** Labeling and grouping
 Set proxy with ~export HTTPS_PROXY="http://10.0.0.1:3128"~
@@ -130,25 +132,27 @@ Set proxy with ~export HTTPS_PROXY="http://10.0.0.1:3128"~
    + When a label is missing, you can click on the
      image, which copies its coordinates to the clipboard
      (on Linux…); the label can then be added by hand.
-   + Use of `_`, `|…` and `…|`
+   + Use of `_`, `|…` and `…|`:
+     + `|…` is not stopped vertically by its opposite type.
+     + `…|` is stopped horizontally by the nearest `|…`.
    To edit a single paper:
    =python plotting.py Interro/Copie01.pdf=
    It also generates =Copie01.json= from the =Copie01_01.json= files.
-3. If something goes wrong (for example, the pages are in the wrong order):
+1. If something goes wrong (for example, the pages are in the wrong order):
    - Reorder the pages of the PDF file
    - Rerun =python cutleft.py Interro/Copie{id}=
    - Rerun =python gemini_dir_batching.py Interro/Copie{id}= ?? To be
      checked; not sure this works.
-4. =python splitting_int.py Interro=
+3. =python splitting_int.py Interro=
    Splits the papers by exercise.
-5. =python grouping.py Interro=
+4. =python grouping.py Interro=
    Groups the same question from different papers into
    reasonably sized groups.
-* Grading and annotation
+** Grading and annotation
 Set proxy with ~export HTTPS_PROXY="http://10.0.0.1:3128"~
@@ -170,16 +174,18 @@ Set proxy with ~export HTTPS_PROXY="http://10.0.0.1:3128"~
    To reduce costs, requests can be batched; they are then
    processed within 24 hours at most.
    + =python correction.py Interro --batch=
+   + OR =python correction.py Interro --batch-from 'Ex 4'=
    + =python submit_batches.py Interro=
    + =python batch_status.py=
    + =python fetch_batched_results.py Interro=
    + =python correction.py Interro --deal-with-batched=
 3. =python post-correction.py Interro=
-   Tries to fix encoding/accent errors in
+   - Tries to fix encoding/accent errors in
    =correction.json=.
+   - Also escapes `_` outside math mode, for LaTeX.
-* Generating the annotated papers
+** Generating the annotated papers
 1. =python annotating.py Interro= (optional)
@@ -208,7 +214,7 @@ OR
    - Empty =Syncthing/Annotées= on the tablet and locally.
    To be automated; it is also slow…
-* Reading the handwritten grading
+** Reading the handwritten grading
 1. =python from_tablette.py Interro= (personal setup)
@@ -243,6 +249,7 @@ OR
    + =gestion_classe ne= to create the test, then
    + =gestion_classe we= (set the grading scale here)
    + =python update_ods.py Interro=
+     or =python update_ods.py Interro --sum= (when there is no grading scale)
    + =gestion_classe re=
    + =gestion_classe wsent=
    + =python add_final_score.py Interro21=
@@ -252,10 +259,7 @@ OR
    + update the papers from =miqmacs.fr/admin=.
 6. (personal setup) Printing a paper: via Evince » print to pdf.
-* Regrading a single paper (barely tested)
+** Regrading a single paper (barely tested)
 !! Warning: refaire will not work if you make a non-grouped
 annotation into refaire !!


@@ -160,11 +160,35 @@ def main():
         used_prefixes.add(unique_prefix)
+        existing_items = set()
+        max_existing_group = 0
+        if not args.overwrite and os.path.exists(bgnot_dir):
+            for d in os.listdir(bgnot_dir):
+                if d.startswith(f"{unique_prefix} G"):
+                    try:
+                        g_id = int(d.split(' G')[-1])
+                        max_existing_group = max(max_existing_group, g_id)
+                    except ValueError:
+                        pass
+                    bnote_path = os.path.join(bgnot_dir, d, "bnote.json")
+                    if os.path.exists(bnote_path):
+                        with open(bnote_path, "r") as bf:
+                            bdata = json.load(bf)
+                        for img in bdata.get("images", []):
+                            existing_items.add((img["id"], img["label"]))
         items_to_render = []
         for sid, lbls in results.items():
             for lbl in labels:
                 if lbl in lbls:
-                    items_to_render.append((sid, lbl, lbls[lbl]))
+                    # Only add if it hasn't been generated yet
+                    if (sid, lbl) not in existing_items:
+                        items_to_render.append((sid, lbl, lbls[lbl]))
+        if not items_to_render:
+            continue
         # Sort structurally: by student id and label
         items_to_render.sort(key=lambda x: (natural_key(x[0]), natural_key(x[1])))
@@ -217,7 +241,7 @@ def main():
         batches = batches2
         for i, batch in enumerate(batches, 1):
-            save_batch(batch, unique_prefix, i, root_dir, args.overwrite)
+            save_batch(batch, unique_prefix, max_existing_group + i, root_dir, args.overwrite)
 if __name__ == "__main__":
     main()
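The resume logic in this hunk scans `bgnot_dir` for folders named `<prefix> G<n>`, keeps the highest `n`, and offsets the new batch numbers by it, so earlier groups are not overwritten. Below is a minimal standalone sketch of just the numbering step. It assumes folder names follow exactly that pattern. `next_group_offset` is a hypothetical name, and it uses an anchored regex in place of the hunk's `startswith` plus `split(' G')`.

```python
import os
import re


def next_group_offset(bgnot_dir, prefix):
    """Highest n among folders named '<prefix> G<n>' in bgnot_dir, or 0 if none.

    Hypothetical helper; assumes the '<prefix> G<n>' naming used in the hunk.
    """
    if not os.path.isdir(bgnot_dir):
        return 0
    # fullmatch keeps "Ex 1" from matching "Ex 10 G7" or "Ex 1 Gx"
    pattern = re.compile(re.escape(prefix) + r" G(\d+)")
    highest = 0
    for name in os.listdir(bgnot_dir):
        m = pattern.fullmatch(name)
        if m:
            highest = max(highest, int(m.group(1)))
    return highest
```

New batches would then be numbered `next_group_offset(...) + i` with `i` starting at 1, matching `max_existing_group + i` in the diff.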


@@ -5,14 +5,11 @@ from pathlib import Path
 import argparse
 if len(sys.argv) < 2:
-    sys.exit("Usage: python script.py InterroTest/Ex 2/Group_1.jpg OR <InputDir>")
+    sys.exit("Usage: python script.py 'InterroTest/Ex 2/Group_1.jpg' OR <InputDir> OR 'file1' 'file2'")
-arg_path = Path(sys.argv[1])
-tasks = [] # List of tuples: (filepath_str, label_str)
-results = {}
 # Parse Arguments
 parser = argparse.ArgumentParser()
+parser.add_argument("paths", nargs="+", help="List of images or directories")
 parser.add_argument("--overwrite", action="store_true",
                     help="Force redo requests even if output exists")
 parser.add_argument("--limit", type=int, help="limit calls to gemini rpo integer")
@@ -20,28 +17,40 @@ parser.add_argument("--refaire", action="store_true",
                     help="Redo specific copies/labels defined in refaire.json")
 parser.add_argument("--batch", action="store_true",
                     help="Generate a JSONL file of requests to send to the Gemini Batch API")
+parser.add_argument("--batch-from", type=str, metavar="LABEL",
+                    help="Do live requests before LABEL, and batch requests from LABEL onwards")
 parser.add_argument("--deal-with-batched", action="store_true",
                     help="Process a JSONL file containing completed batch results")
 args, _ = parser.parse_known_args()
+tasks = [] # List of tuples: (filepath_str, label_str)
+results = {}
+for path_str in args.paths:
+    arg_path = Path(path_str)
-if arg_path.suffix == ".jpg":
-    INPUT_DIR = str(arg_path.parents[1])
-    FULL_LABEL = arg_path.parent.name
-    tasks.append((str(arg_path), FULL_LABEL))
-    results[FULL_LABEL] = []
-else:
-    # Directory behaviour
-    INPUT_DIR = str(arg_path)
     if not arg_path.exists():
-        sys.exit(f"Directory {INPUT_DIR} not found.")
+        print(f"Warning: {path_str} not found. Skipping.")
+        continue
-    for sub in arg_path.iterdir():
-        if sub.is_dir() and sub.name.startswith("Ex"):
-            label = sub.name
-            results[label] = []
-            for img in sub.glob("*.jpg"):
-                tasks.append((str(img), label))
+    if arg_path.is_file() and arg_path.suffix.lower() == ".jpg":
+        # Handle individual file
+        # Note: assumes structure InterroTest/Ex 2/Group_1.jpg to get parents[1]
+        label = arg_path.parent.name
+        tasks.append((str(arg_path), label))
+        if label not in results:
+            results[label] = []
+    elif arg_path.is_dir():
+        # Handle directory (original behavior)
+        for sub in arg_path.iterdir():
+            if sub.is_dir() and sub.name.startswith("Ex"):
+                label = sub.name
+                if label not in results:
+                    results[label] = []
+                for img in sub.glob("*.jpg"):
+                    tasks.append((str(img), label))
 my_prompt = """I'm giving you an image of several written answers to an exam.
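Because `paths` is declared with `nargs="+"`, the script now accepts any mix of single `.jpg` group images and directories. In the visible lines, the new loop no longer assigns `INPUT_DIR`, yet `make_prompt` and the batch file paths still read it. The sketch below collects tasks the same way. As an assumption, it also derives an input directory: `parents[1]` of a file (as the `# Note` comment suggests) or the directory itself. `collect_tasks` is a hypothetical helper, and it uses only `pathlib`.

```python
from pathlib import Path


def collect_tasks(paths):
    """Return (tasks, results, input_dir) for a list of .jpg files and directories.

    Hypothetical helper: a .jpg is labelled by its parent folder; a directory
    contributes every *.jpg inside its "Ex*" subfolders. input_dir is taken from
    the first usable path (assumption about how INPUT_DIR should be derived).
    """
    tasks, results, input_dir = [], {}, None
    for path_str in paths:
        p = Path(path_str)
        if not p.exists():
            print(f"Warning: {path_str} not found. Skipping.")
            continue
        if p.is_file() and p.suffix.lower() == ".jpg":
            if input_dir is None:
                input_dir = p.parents[1]  # InterroTest/Ex 2/Group_1.jpg -> InterroTest
            label = p.parent.name
            tasks.append((str(p), label))
            results.setdefault(label, [])
        elif p.is_dir():
            if input_dir is None:
                input_dir = p
            for sub in sorted(p.iterdir()):  # sorted for a deterministic task order
                if sub.is_dir() and sub.name.startswith("Ex"):
                    results.setdefault(sub.name, [])
                    for img in sorted(sub.glob("*.jpg")):
                        tasks.append((str(img), sub.name))
    return tasks, results, input_dir
```

Using `setdefault` keeps the labels already seen when several paths point into the same exercise, which is what the `if label not in results` checks in the diff achieve.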
@@ -135,17 +144,15 @@ You are asked to score the question or exercice labeled `<<label>>`,
 do not score or give feedback to any other question."""
 def make_prompt(full_label):
-    # l = full_label.split(" ")
-    # ex_label = l[0] + " " + l[1]
-    # text = (Path(INPUT_DIR) / "Text" / ex_label).read_text()
-    # corr = (Path(INPUT_DIR) / "Sol" / ex_label).read_text()
-    # persp = (Path(INPUT_DIR) / "Persp" / ex_label).read_text()
     def read_longest_prefix_file(subdir):
         dir_path = Path(INPUT_DIR) / subdir
-        matches = [f for f in dir_path.iterdir() if f.is_file() and full_label.startswith(f.name)]
+        matches = [f for f in dir_path.iterdir()
+                   if f.is_file()
+                   and full_label.startswith(f.name)
+                   and f.suffix not in [".pdf", ".tex"]]
         if not matches:
             return ""
-        return max(matches, key=lambda f: len(f.name)).read_text()
+        return max(matches, key=lambda f: len(f.name)).read_text(encoding="utf-8", errors="replace")
     text = read_longest_prefix_file("Text")
     corr = read_longest_prefix_file("Sol")
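`read_longest_prefix_file` selects, among the files in a subfolder, the one whose name is the longest prefix of the full label. It now skips `.pdf` and `.tex` files. Here is the same selection rule written as a pure function over names, so it can be checked without touching the filesystem. `longest_prefix_match` is a hypothetical helper, not part of the repository.

```python
from pathlib import Path

EXCLUDED_SUFFIXES = {".pdf", ".tex"}  # same exclusion as the diff


def longest_prefix_match(full_label, names):
    """Return the longest name that is a prefix of full_label, or None.

    Names ending in .pdf or .tex are ignored, mirroring the new filter.
    """
    candidates = [n for n in names
                  if full_label.startswith(n)
                  and Path(n).suffix not in EXCLUDED_SUFFIXES]
    return max(candidates, key=len, default=None)
```

For a label such as `Ex 2 Q1 a`, a file named `Ex 2 Q1` wins over `Ex 2`. Question-specific text therefore takes precedence over exercise-wide text.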
@@ -482,7 +489,7 @@ def handle_label_errors(pid, label, res, pdf_path):
     error_type = res.get("error")
     all_labels = read_all_labels(INPUT_DIR)
-    labels_txt = (Path(INPUT_DIR) / "labels").read_text()
+    labels_txt = (Path(INPUT_DIR) / "labels").read_text(encoding="utf-8", errors="replace")
     enonce = enonce_total(INPUT_DIR)
     if error_type == "wrong-label":
@@ -499,7 +506,7 @@ Here is the full content of the exam :
 {enonce}
-Here is a list of all possible lables. You need to answer with one of these :
+Here is a list of all possible labels. You need to answer with one of these :
 {labels_txt}
 """
@@ -780,62 +787,89 @@ if __name__ == "__main__":
             print(f"Warning: --refaire flag used, but {refaire_path} not found.", file=sys.stderr)
-    if args.batch:
-        batch_flash_file = Path(INPUT_DIR) / "batch_requests_flash.jsonl"
-        batch_pro_file = Path(INPUT_DIR) / "batch_requests_pro.jsonl"
-        count_flash = 0
-        count_pro = 0
-        with open(batch_flash_file, "w", encoding="utf-8") as f_flash, \
-             open(batch_pro_file, "w", encoding="utf-8") as f_pro:
+    if args.batch or args.batch_from:
+        from utils import read_all_labels
+        all_labels = read_all_labels(INPUT_DIR)
+        batch_tasks = []
+        if args.batch_from:
+            if args.batch_from not in all_labels:
+                sys.exit(f"Error: Label '{args.batch_from}' not found. Available labels: {all_labels}")
+            target_idx = all_labels.index(args.batch_from)
+            live_tasks = []
             for task in tasks_to_process:
-                file_path, label = task[0], task[1]
-                group_name = os.path.splitext(file_path)[0]
-                json_path = group_name + '.json'
-                with open(json_path, 'r') as jf:
-                    group_data = json.load(jf)
-                use_flash = len(group_data) >= 4 or group_data[-1][2] <= 500
-                image_data = Path(file_path).read_bytes()
-                b64_img = base64.b64encode(image_data).decode("utf-8")
-                # Format payload matching Gemini Batch API file requirements
-                req = {
-                    "key": file_path, # The ID returned in the output file
-                    "request": {
-                        "contents": [{
-                            "role": "user",
-                            "parts": [
-                                {"inlineData": {"mimeType": "image/jpeg", "data": b64_img}},
-                                {"text": make_prompt(label)}
-                            ]
-                        }],
-                        "generation_config": {
-                            "temperature": 1.0,
-                            "topP": 0.95,
-                            "maxOutputTokens": 65535,
-                            "responseMimeType": "application/json",
-                            "responseSchema": UNROLLED_SCHEMA
-                            # TypeAdapter(List[EvaluationEntry]).json_schema()
-                        }
-                    }
-                }
-                if use_flash:
-                    f_flash.write(json.dumps(req) + "\n")
-                    count_flash += 1
-                else:
-                    f_pro.write(json.dumps(req) + "\n")
-                    count_pro += 1
-        print(f"Batch generation complete.")
-        print(f" - {count_flash} requests saved to {batch_flash_file} (for {MODEL_ID_flash})")
-        print(f" - {count_pro} requests saved to {batch_pro_file} (for {MODEL_ID_pro})")
-        print("Upload these files via the File API and create two separate batch jobs.")
-        sys.exit(0)
+                lbl = task[1]
+                # Any label found sequentially equal or after `args.batch_from` gets batched
+                if lbl in all_labels and all_labels.index(lbl) >= target_idx:
+                    batch_tasks.append(task)
+                else:
+                    live_tasks.append(task)
+            tasks_to_process = live_tasks # Keep live tasks to be run right after
+        else:
+            batch_tasks = tasks_to_process
+            tasks_to_process = [] # Run nothing live if just `--batch`
+        if batch_tasks:
+            batch_flash_file = Path(INPUT_DIR) / "batch_requests_flash.jsonl"
+            batch_pro_file = Path(INPUT_DIR) / "batch_requests_pro.jsonl"
+            count_flash = 0
+            count_pro = 0
+            with open(batch_flash_file, "w", encoding="utf-8") as f_flash, \
+                 open(batch_pro_file, "w", encoding="utf-8") as f_pro:
+                for task in batch_tasks:
+                    file_path, label = task[0], task[1]
+                    group_name = os.path.splitext(file_path)[0]
+                    json_path = group_name + '.json'
+                    with open(json_path, 'r') as jf:
+                        group_data = json.load(jf)
+                    use_flash = len(group_data) >= 4 or group_data[-1][2] <= 500
+                    image_data = Path(file_path).read_bytes()
+                    b64_img = base64.b64encode(image_data).decode("utf-8")
+                    # Format payload matching Gemini Batch API file requirements
+                    req = {
+                        "key": file_path, # The ID returned in the output file
+                        "request": {
+                            "contents": [{
+                                "role": "user",
+                                "parts": [
+                                    {"inlineData": {"mimeType": "image/jpeg", "data": b64_img}},
+                                    {"text": make_prompt(label)}
+                                ]
+                            }],
+                            "generation_config": {
+                                "temperature": 1.0,
+                                "topP": 0.95,
+                                "maxOutputTokens": 65535,
+                                "responseMimeType": "application/json",
+                                "responseSchema": UNROLLED_SCHEMA
+                            }
+                        }
+                    }
+                    if use_flash:
+                        f_flash.write(json.dumps(req) + "\n")
+                        count_flash += 1
+                    else:
+                        f_pro.write(json.dumps(req) + "\n")
+                        count_pro += 1
+            print(f"Batch generation complete.")
+            print(f" - {count_flash} requests saved to {batch_flash_file} (for {MODEL_ID_flash})")
+            print(f" - {count_pro} requests saved to {batch_pro_file} (for {MODEL_ID_pro})")
+            print("Upload these files via the File API and create two separate batch jobs.")
+    # If there's no live tasks to do, and we aren't doing a batched ingestion, exit right away
+    if not tasks_to_process and not args.deal_with_batched:
+        sys.exit(0)
     batched_responses = {}
     if args.deal_with_batched:
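`--batch-from LABEL` splits the work by position in the label list. Tasks whose label sits at or after `LABEL` go to the batch files. The rest, including labels absent from the list, run live right away. The sketch below expresses that partition as a pure function. It assumes `all_labels` is the ordered list returned by `read_all_labels`, and `split_live_and_batched` is a hypothetical name. It precomputes first-occurrence positions in a dict, which gives the same result as calling `all_labels.index(lbl)` for each task without rescanning the list every time.

```python
def split_live_and_batched(tasks, all_labels, batch_from=None):
    """Partition (path, label) tasks into (live, batched) lists.

    batch_from=None mimics plain --batch: everything is batched, nothing runs
    live. Otherwise tasks whose label is at or after batch_from in all_labels
    are batched; the others, including labels missing from all_labels, stay live.
    """
    if batch_from is None:
        return [], list(tasks)
    # First-occurrence index of each label, equivalent to all_labels.index(lbl)
    position = {}
    for i, lbl in enumerate(all_labels):
        position.setdefault(lbl, i)
    if batch_from not in position:
        raise ValueError(f"Label {batch_from!r} not found in {all_labels}")
    target = position[batch_from]
    live, batched = [], []
    for task in tasks:
        idx = position.get(task[1])
        if idx is not None and idx >= target:
            batched.append(task)
        else:
            live.append(task)
    return live, batched
```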
@@ -883,7 +917,7 @@ if __name__ == "__main__":
     print("Time elapsed : ", end_time - start_time)
     print("Requests to pro / flash : ", pro_count, flash_count)
     if errors_summary:
-        print("\n--- Summary of Exceptions ---", file=sys.stderr)
+        print("\n--- Summary of Exceptions (You can use several images on one instance) ---", file=sys.stderr)
         for (err, file) in errors_summary:
             print(err, file=sys.stderr)
             escaped_path = shlex.quote(str(file))


@@ -296,7 +296,7 @@ def process_copy_group(group_key, files):
                 continue # Retry immediately
             else:
                 name = "Unknown"
+        annota.name = name
         # Save result
         with open(output_json, "w", encoding="utf-8") as f:
             json.dump(annota.model_dump(), f, indent=2)


@@ -12386,6 +12386,7 @@ maternelles
 maternité
 mathématicien
 mathématique
+mathématiquement
 mathématiques
 maths
 matière


@@ -7,6 +7,46 @@ import argparse
 if len(sys.argv) < 2:
     sys.exit("Usage: python script.py <InputDir>")
+def escape_latex_underscores(text):
+    r"""
+    Escape '_' outside LaTeX math environments.
+    Supports:
+    - $...$
+    - $$...$$
+    - \( ... \)
+    - \[ ... \]
+    """
+    # Regex matching LaTeX math blocks
+    math_pattern = re.compile(
+        r'(\$\$.*?\$\$|'   # $$...$$
+        r'\$.*?\$|'        # $...$
+        r'\\\(.*?\\\)|'    # \( ... \)
+        r'\\\[.*?\\\])',   # \[ ... \]
+        re.DOTALL
+    )
+    parts = []
+    last_end = 0
+    for match in math_pattern.finditer(text):
+        start, end = match.span()
+        # Escape underscores outside math
+        outside = text[last_end:start].replace('_', r'\_')
+        parts.append(outside)
+        # Keep math block unchanged
+        parts.append(match.group(0))
+        last_end = end
+    # Remaining text after last math block
+    outside = text[last_end:].replace('_', r'\_')
+    parts.append(outside)
+    return ''.join(parts)
 arg_path = Path(sys.argv[1])
 tasks = [] # List of tuples: (filepath_str, label_str)
 results = {}
@@ -79,7 +119,8 @@ def clean_string(s: str) -> str:
     if '\x00' in s:
         s = fast_fix(s)
         s = s.replace('\x00', '')
-    return some_other_replacements(s)
+    s = some_other_replacements(s)
+    return escape_latex_underscores(s)
 def clean_obj(obj):
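`escape_latex_underscores` replaces every `_` outside math. A string that already contains `\_` would therefore come out as `\\_`, which LaTeX reads as a line break followed by a bare underscore. Below is a hedged variant that assumes the same four math delimiters and skips any underscore already preceded by a backslash. `escape_underscores_outside_math` is a hypothetical name. It relies on `re.split` placing the captured math blocks at odd indices when the pattern has a single capturing group. Like the original, it still treats an escaped `\$` as a math delimiter.

```python
import re

# Same four math delimiters as escape_latex_underscores, in one capturing group
MATH_BLOCK = re.compile(r"(\$\$.*?\$\$|\$.*?\$|\\\(.*?\\\)|\\\[.*?\\\])", re.DOTALL)
# An underscore not already preceded by a backslash
BARE_UNDERSCORE = re.compile(r"(?<!\\)_")


def escape_underscores_outside_math(text):
    """Escape bare '_' outside math, leaving math blocks and existing '\\_' as-is."""
    parts = MATH_BLOCK.split(text)
    # One capturing group: even indices are outside math, odd ones are math blocks
    for i in range(0, len(parts), 2):
        parts[i] = BARE_UNDERSCORE.sub(r"\\_", parts[i])
    return "".join(parts)
```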


@@ -8,6 +8,9 @@ import shutil
 from pathlib import Path
 from collections import defaultdict
+carreau = 1000 // 38
 def decode_json(pdf_file):
     file_path = Path(pdf_file)
     with open(file_path.with_suffix(".json"), "r") as f:
@@ -26,8 +29,7 @@ def decode_json(pdf_file):
     for d in bb_list:
         (b, label) = d["box_2d"], d["label"]
         pn = page_number(b)
-        carreau = 1000 // 38
-        result.append((label, pn, b[0] - int(carreau), b[2]-int(carreau), b[1], b[3]))
+        result.append((label, pn, b[0] - carreau, b[2]-carreau, b[1], b[3]))
     result.sort(key=lambda x: (x[1], x[2]))
     return (name, result)
@@ -98,7 +100,7 @@ def split_an_interro(base_dir, input_pdf, coords_list):
         # RULE 2: Determine stopping label
         for next_item in coords_list[idx + 1:]:
-            n_clean, n_type, n_pn, n_y_start, _, _, _ = next_item
+            n_clean, n_type, n_pn, n_y_start, n_y_end, _, _ = next_item
             if c_type == "L":
                 is_stop = (n_type in ("L", "N"))
@@ -109,7 +111,9 @@ def split_an_interro(base_dir, input_pdf, coords_list):
             if is_stop:
                 end_page = n_pn
-                end_y_target_raw = n_y_start
+                # end_y_target_raw = n_y_start
+                # We removed one grid square earlier, so add it back…
+                end_y_target_raw = min(n_y_start + int(1.25 * carreau), 1000)
                 break
         # RULES 3 & 4: Calculate horizontal boundaries (0.0 to 1.0 fraction of local page width)
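Moving `carreau` ("grid square") to module level is what allows `split_an_interro` to pad the stopping position. The worked numbers below assume two things: the 0 to 1000 coordinate scale implied by the `min(..., 1000)` cap, and that 38 is the number of grid squares per page height (an assumption). `stop_y` is a hypothetical helper.

```python
carreau = 1000 // 38        # one grid square: 26 units on the 0-1000 scale
pad = int(1.25 * carreau)   # int(32.5) == 32 units


def stop_y(n_y_start):
    """End of a region: next label's (shifted) start plus the pad, capped at 1000."""
    return min(n_y_start + pad, 1000)
```

Suppose `n_y_start` ultimately carries the `b[0] - carreau` shift applied in `decode_json`. The region then ends at `b[0] - 26 + 32 = b[0] + 6`, slightly below the raw top edge of the next label rather than one square above it. The old `int(carreau)` calls were redundant, since `//` on two ints already returns an int.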


@@ -72,8 +72,6 @@ def main():
     print("-" * 50)
     print("All batch jobs have been initiated.")
-    print("Save the Batch Job Names above. You can monitor them with:")
-    print(" client.batches.get(name='YOUR_BATCH_JOB_NAME')")
 if __name__ == "__main__":
     main()


@@ -1,3 +1,4 @@
+import argparse
 import os
 import sys
 import json
@@ -12,11 +13,12 @@ ODS_PATH = "/home/sebastien/Rust/gestion_classe/Staging/current_eval.ods"
 TARGET_DIR_NAME = "A Rendre"
 def main():
-    if len(sys.argv) < 2:
-        # Default to current directory if not provided, or raise error
-        work_dir = os.getcwd()
-    else:
-        work_dir = os.path.abspath(sys.argv[1])
+    parser = argparse.ArgumentParser(description="Update ODS with student scores.")
+    parser.add_argument("work_dir", nargs="?", default=os.getcwd(), help="Directory to process")
+    parser.add_argument("--sum", action="store_true", help="Write only the total sum per student")
+    args = parser.parse_args()
+    work_dir = os.path.abspath(args.work_dir)
     all_labels = read_all_labels(Path(work_dir))
@@ -101,53 +103,65 @@ def main():
         # Start filling from Row 2 (index 2), immediately below the name line
         start_row = 2
-        # for i, key in enumerate(scores_data.keys()):
-        for i, key in enumerate(all_labels):
-            row_idx = start_row + i
-            # Ensure we don't go out of bounds
-            if row_idx >= sheet.nrows():
-                sheet.append_rows(1)
-            if key in scores_data:
-                val_str = str(scores_data[key])
-            else:
-                val_str = ""
-            # Logic: if "" -> "NT"
-            new_val = "NT" if val_str == "" else val_str
-            cell = sheet[row_idx, col_idx]
-            current_val = cell.value
-            # Conflict Detection
-            # Normalize current ODS value to string for comparison
-            # ODS might store 2.0 as float 2.0. JSON has "2.0".
-            is_different = False
-            if current_val is not None and current_val != "":
-                # specific check to handle float/string mismatch (2.0 vs "2.0")
-                try:
-                    if float(str(current_val)) != float(str(new_val)):
-                        is_different = True
-                except ValueError:
-                    # If conversion fails (e.g. comparing "NT" to "2.0"), compare strings
-                    if str(current_val).strip() != str(new_val).strip():
-                        is_different = True
-            if is_different:
-                print(f"DEBUG: Conflict for {item} at {key} (Row {row_idx}). "
-                      f"Existing: '{current_val}' vs New: '{new_val}'. Overwriting.")
-            # Set value
-            # Try to set as float if it looks like a number, otherwise string
-            if new_val == "NT":
-                cell.set_value(new_val)
-            else:
-                try:
-                    cell.set_value(float(new_val))
-                except ValueError:
-                    cell.set_value(new_val)
+        if args.sum:
+            # Calculate total
+            total = 0.0
+            for val in scores_data.values():
+                try:
+                    total += float(val)
+                except (ValueError, TypeError):
+                    continue
+            cell = sheet[start_row, col_idx]
+            cell.set_value(total)
+            print(f"Set sum for {item}: {total}")
+        else:
+            for i, key in enumerate(all_labels):
+                row_idx = start_row + i
+                # Ensure we don't go out of bounds
+                if row_idx >= sheet.nrows():
+                    sheet.append_rows(1)
+                if key in scores_data:
+                    val_str = str(scores_data[key])
+                else:
+                    val_str = ""
+                # Logic: if "" -> "NT"
+                new_val = "NT" if val_str == "" else val_str
+                cell = sheet[row_idx, col_idx]
+                current_val = cell.value
+                # Conflict Detection
+                # Normalize current ODS value to string for comparison
+                # ODS might store 2.0 as float 2.0. JSON has "2.0".
+                is_different = False
+                if current_val is not None and current_val != "":
+                    # specific check to handle float/string mismatch (2.0 vs "2.0")
+                    try:
+                        if float(str(current_val)) != float(str(new_val)):
+                            is_different = True
+                    except ValueError:
+                        # If conversion fails (e.g. comparing "NT" to "2.0"), compare strings
+                        if str(current_val).strip() != str(new_val).strip():
+                            is_different = True
+                if is_different:
+                    print(f"DEBUG: Conflict for {item} at {key} (Row {row_idx}). "
+                          f"Existing: '{current_val}' vs New: '{new_val}'. Overwriting.")
+                # Set value
+                # Try to set as float if it looks like a number, otherwise string
+                if new_val == "NT":
+                    cell.set_value(new_val)
+                else:
+                    try:
+                        cell.set_value(float(new_val))
+                    except ValueError:
+                        cell.set_value(new_val)
     print("Saving ODS file...")
     doc.save()
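In `--sum` mode, every value that `float()` accepts is added to the total and everything else is skipped. The same accumulation is shown below as a standalone function, useful for checking edge cases. `total_score` is a hypothetical name, and the dict argument stands in for `scores_data`.

```python
def total_score(scores):
    """Sum the entries of a {label: score} mapping that float() accepts."""
    total = 0.0
    for val in scores.values():
        try:
            total += float(val)
        except (ValueError, TypeError):
            continue  # e.g. "", "NT" or None are left out of the total
    return total
```

A score written with a decimal comma (`"2,5"`) makes `float()` raise `ValueError`, so such a value would be silently left out of the total.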


@@ -14,7 +14,7 @@ def enonce_total(base_dir):
     if not text_dir.is_dir():
         return ""
-    files = [f for f in text_dir.iterdir() if f.is_file()]
+    files = [f for f in text_dir.iterdir() if f.is_file() and f.suffix not in [".pdf", ".tex"]]
     files.sort(key=lambda f: natural_key(f.name))
     output = []
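With this filter, `enonce_total` skips `.pdf` and `.tex` files in `Text/` before concatenating the exam statement. The sketch below combines that listing step with the tolerant decoding (`errors="replace"`) used elsewhere in this commit. `natural_key` here is a hypothetical stand-in for the repository's helper. The newline join is also an assumption, since the hunk does not show how `output` is assembled.

```python
import re
from pathlib import Path

SKIP_SUFFIXES = {".pdf", ".tex"}


def natural_key(name):
    # Hypothetical stand-in: "Ex 2" sorts before "Ex 10"
    return [int(t) if t.isdigit() else t.lower() for t in re.split(r"(\d+)", name)]


def concat_text_files(text_dir):
    """Concatenate the readable text files of text_dir in natural order."""
    d = Path(text_dir)
    if not d.is_dir():
        return ""
    files = [f for f in d.iterdir() if f.is_file() and f.suffix not in SKIP_SUFFIXES]
    files.sort(key=lambda f: natural_key(f.name))
    # Undecodable bytes become U+FFFD instead of raising UnicodeDecodeError
    return "\n".join(f.read_text(encoding="utf-8", errors="replace") for f in files)
```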