Miscs (Interro 28)

2026-05-14 09:02:09 +02:00 · 2026-05-14 09:02:09 +02:00 · 7e7045293a
parent 0836d5809d
commit 7e7045293a
10 changed files with 281 additions and 161 deletions
--- a/Readme.org
+++ b/Readme.org
@ -1,10 +1,11 @@
 #+title:  Script
 #+author: Sébastien Miquel
 #+date:   14-03-2026
-# Time-stamp: <08-05-26 22:52>
+# Time-stamp: <14-05-26 08:55>
 #+OPTIONS:
-* What is this?
+* Meta
+** What is this?
 This repository contains a number of Python scripts that I use
 to have Gemini grade student papers.
@ -20,7 +21,7 @ to have Gemini grade student papers.
 4. These handwritten annotations are read and recompiled into a
    version of the paper for the student.
-* Disclaimer
+** Disclaimer
 I use this tool regularly and am happy with it, but I have
 made little effort to make it general-purpose or easy to use.
@ -37,9 +38,9 @ examples of the final output (in the =BGnot= subfolder).
 This situation may improve, but making this system easier to use
 is not a priority.
-* Requirements
+** Requirements
-** Python
+*** Python
 Libraries :
@ -47,13 +48,13 @@ Libraries :
 pip install numpy pandas matplotlib pillow pydantic pypdf pdf2image reportlab img2pdf pymupdf ftfy ezodf google
 #+END_SRC
-** Poppler (for pdf2image)
+*** Poppler (for pdf2image)
 + Linux : install poppler-utils
 + Windows : Download from: https://github.com/oschwartz10612/poppler-windows
   and add it to your PATH
-** Access to Gemini
+*** Access to Gemini
 You need to create an API key for Gemini (not easy).
@ -66,7 +67,7 @ Then add =GEMINI_API_KEY= to the environment with:
 export GEMINI_API_KEY=…
 #+END_SRC
-* Grading a batch of papers
+** Grading a batch of papers
 1. Create a =names= file in the current directory containing the
    students' last and first names, one per line
@ -83,7 +84,8 @ export GEMINI_API_KEY=…
    for such-and-such, etc.)
 6. Follow the steps below.
-* Preprocessing
+* Steps and scripts
+** Preprocessing
 1. =./rotate_all.sh Interro=
    (optional)
@ -107,14 +109,14 @@ export GEMINI_API_KEY=…
    Rerun on a single file with =python cutleft.py Interro/Copie01.pdf=
-* Generating information about the exam statement
+** Generating information about the exam statement
 1. =python enonce_info.py Interro= (personal setup)
 OR
 2. =python gemini_for_enonce.py Interro=
    + Requires =enonce.tex/org= and `correction.tex/org`
-* Labelling and grouping
+** Labelling and grouping
 Set proxy with ~export HTTPS_PROXY="http://10.0.0.1:3128"~
@ -130,25 +132,27 @@ Set proxy with ~export HTTPS_PROXY="http://10.0.0.1:3128"~
    + When a label is missing, you can click on the image, which
      copies the coordinates to the clipboard (on Linux…); the label
      can then be added by hand.
-    + Use of `_`, `|…` and `…|`
+    + Use of `_`, `|…` and `…|`:
      + `|…` is not stopped vertically by its opposite type.
      + `…|` is stopped horizontally by the nearest `|…`.
    To modify a single paper:
 =python plotting.py Interro/Copie01.pdf=
    It also generates the =Copie01.json= files from the =Copie01_01.json= files.
- 3. If something goes wrong (for example, the pages are out of order)
+    1. If something goes wrong (for example, the pages are out of order)
-    - Reorder the pages of the PDF file
+       - Reorder the pages of the PDF file
-    - Rerun =python cutleft.py Interro/Copie{id}=
+       - Rerun =python cutleft.py Interro/Copie{id}=
-    - Rerun =python gemini_dir_batching.py Interro/Copie{id}= ?? To be
+       - Rerun =python gemini_dir_batching.py Interro/Copie{id}= ?? To be
-      checked; not sure this works.
+         checked; not sure this works.
- 4. =python splitting_int.py Interro=
+ 3. =python splitting_int.py Interro=
    Splits the papers by exercise
- 5. =python grouping.py Interro=
+ 4. =python grouping.py Interro=
    Groups the same question from different papers into groups of
    reasonable size.
-* Grading and annotation
+** Grading and annotation
 Set proxy with ~export HTTPS_PROXY="http://10.0.0.1:3128"~
@ -170,16 +174,18 @@ Set proxy with ~export HTTPS_PROXY="http://10.0.0.1:3128"~
    To reduce the cost, requests can be batched; they are then
    processed within at most 24 hours.
    + =python correction.py Interro --batch=
    + OR =python correction.py Interro --batch-from 'Ex 4'=
    + =python submit_batches.py Interro=
    + =python batch_status.py=
    + =python fetch_batched_results.py Interro=
    + =python correction.py Interro --deal-with-batched=
 3. =python post-correction.py Interro=
-    Tries to fix encoding/accent errors in
+    - Tries to fix encoding/accent errors in
- =correction.json=.
+ =correction.json=.
    - also escapes `_` outside math mode, for LaTeX.
-* Generating the annotated papers
+** Generating the annotated papers
 1. =python annotating.py Interro= (optional)
@ -208,7 +214,7 @ OR
    - Empty =Syncthing/Annotées= on the tablet and locally.
      To be automated; it is also slow…
-* Reading the handwritten grading
+** Reading the handwritten grading
 1. =python from_tablette.py Interro= (personal setup)
@ -243,6 +249,7 @ OR
    + =gestion_classe ne= to create the quiz, then
    + =gestion_classe we= (set the marking scheme here)
    + =python update_ods.py Interro=
      or =python update_ods.py Interro --sum= (when there is no marking scheme)
    + =gestion_classe re=
    + =gestion_classe wsent=
    + =python add_final_score.py Interro21=
@ -252,10 +259,7 @@ OU
    + update the copies from =miqmacs.fr/admin=.
 6. (personal setup) Printing a paper, via Evince » print to PDF.
-
-* Re-grading a single paper (little tested)
+** Re-grading a single paper (little tested)
 !! Warning: refaire will not work if you put a non-grouped
 annotation into refaire !!
--- a/annotating_by_label.py
+++ b/annotating_by_label.py
@ -160,11 +160,35 @@ def main():
        used_prefixes.add(unique_prefix)
+        existing_items = set()
+        max_existing_group = 0
+        if not args.overwrite and os.path.exists(bgnot_dir):
+            for d in os.listdir(bgnot_dir):
+                if d.startswith(f"{unique_prefix} G"):
+                    try:
+                        g_id = int(d.split(' G')[-1])
+                        max_existing_group = max(max_existing_group, g_id)
+                    except ValueError:
+                        pass
+                    bnote_path = os.path.join(bgnot_dir, d, "bnote.json")
+                    if os.path.exists(bnote_path):
+                        with open(bnote_path, "r") as bf:
+                            bdata = json.load(bf)
+                            for img in bdata.get("images", []):
+                                existing_items.add((img["id"], img["label"]))
        items_to_render = []
        for sid, lbls in results.items():
            for lbl in labels:
                if lbl in lbls:
-                    items_to_render.append((sid, lbl, lbls[lbl]))
+                    # Only add if it hasn't been generated yet
+                    if (sid, lbl) not in existing_items:
+                        items_to_render.append((sid, lbl, lbls[lbl]))
        if not items_to_render:
            continue
        # Sort structurally: by student id and label
        items_to_render.sort(key=lambda x: (natural_key(x[0]), natural_key(x[1])))
@ -217,7 +241,7 @@ def main():
            batches = batches2
        for i, batch in enumerate(batches, 1):
-            save_batch(batch, unique_prefix, i, root_dir, args.overwrite)
+            save_batch(batch, unique_prefix, max_existing_group + i, root_dir, args.overwrite)
 if __name__ == "__main__":
    main()
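The group-numbering part of this hunk can be tried in isolation. The following is a minimal sketch, assuming group directories are named `<prefix> G<n>` as the `startswith`/`split(' G')` calls above suggest; `max_existing_group_id` is a hypothetical helper name, not a function of the repository, and the directory names are made up.

```python
import os
import tempfile

def max_existing_group_id(bgnot_dir, unique_prefix):
    """Return the highest N among entries named '<unique_prefix> G<N>', or 0 if none."""
    best = 0
    if not os.path.isdir(bgnot_dir):
        return best
    for d in os.listdir(bgnot_dir):
        if d.startswith(f"{unique_prefix} G"):
            try:
                best = max(best, int(d.split(" G")[-1]))
            except ValueError:
                # Suffix after ' G' is not an integer: ignore this entry
                pass
    return best

with tempfile.TemporaryDirectory() as root:
    # Made-up directory names, for illustration only
    for name in ("Ex 1 G1", "Ex 1 G2", "Ex 1 Gx", "Ex 2 G7"):
        os.mkdir(os.path.join(root, name))
    start = max_existing_group_id(root, "Ex 1")
    # New batches continue after the existing groups instead of restarting at 1
    new_ids = [start + i for i, _ in enumerate(["batch_a", "batch_b"], 1)]
    print(start, new_ids)
```

Numbering new batches from `max_existing_group + i` means a second run without `--overwrite` should not reuse the directory of an earlier group.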
--- a/correction.py
+++ b/correction.py
@ -5,14 +5,11 @@ from pathlib import Path
 import argparse
 if len(sys.argv) < 2:
-    sys.exit("Usage: python script.py InterroTest/Ex 2/Group_1.jpg OR <InputDir>")
+    sys.exit("Usage: python script.py 'InterroTest/Ex 2/Group_1.jpg' OR <InputDir> OR 'file1' 'file2'")
 arg_path = Path(sys.argv[1])
 tasks = [] # List of tuples: (filepath_str, label_str)
 results = {}
 # Parse Arguments
 parser = argparse.ArgumentParser()
 parser.add_argument("paths", nargs="+", help="List of images or directories")
 parser.add_argument("--overwrite", action="store_true",
                    help="Force redo requests even if output exists")
 parser.add_argument("--limit", type=int, help="limit calls to gemini rpo integer")
@ -20,28 +17,40 @@ parser.add_argument("--refaire", action="store_true",
                    help="Redo specific copies/labels defined in refaire.json")
 parser.add_argument("--batch", action="store_true",
                    help="Generate a JSONL file of requests to send to the Gemini Batch API")
 parser.add_argument("--batch-from", type=str, metavar="LABEL",
                    help="Do live requests before LABEL, and batch requests from LABEL onwards")
 parser.add_argument("--deal-with-batched", action="store_true",
                    help="Process a JSONL file containing completed batch results")
 args, _ = parser.parse_known_args()
 tasks = [] # List of tuples: (filepath_str, label_str)
 results = {}
 for path_str in args.paths:
    arg_path = Path(path_str)
 if arg_path.suffix == ".jpg":
    INPUT_DIR = str(arg_path.parents[1])
    FULL_LABEL = arg_path.parent.name
    tasks.append((str(arg_path), FULL_LABEL))
    results[FULL_LABEL] = []
 else:
    # Directory behaviour
    INPUT_DIR = str(arg_path)
    if not arg_path.exists():
-        sys.exit(f"Directory {INPUT_DIR} not found.")
+        print(f"Warning: {path_str} not found. Skipping.")
        continue
-    for sub in arg_path.iterdir():
+    if arg_path.is_file() and arg_path.suffix.lower() == ".jpg":
-        if sub.is_dir() and sub.name.startswith("Ex"):
+        # Handle individual file
-            label = sub.name
+        # Note: assumes structure InterroTest/Ex 2/Group_1.jpg to get parents[1]
        label = arg_path.parent.name
        tasks.append((str(arg_path), label))
        if label not in results:
            results[label] = []
-            for img in sub.glob("*.jpg"):
+
-                tasks.append((str(img), label))
+    elif arg_path.is_dir():
        # Handle directory (original behavior)
        for sub in arg_path.iterdir():
            if sub.is_dir() and sub.name.startswith("Ex"):
                label = sub.name
                if label not in results:
                    results[label] = []
                for img in sub.glob("*.jpg"):
                    tasks.append((str(img), label))
 my_prompt = """I'm giving you an image of several written answers to an exam.
@ -135,17 +144,15 @@ You are asked to score the question or exercice labeled `<<label>>`,
 do not score or give feedback to any other question."""
 def make_prompt(full_label):
-    # l = full_label.split(" ")
-    # ex_label = l[0] + " " + l[1]
-    # text = (Path(INPUT_DIR) / "Text" / ex_label).read_text()
-    # corr = (Path(INPUT_DIR) / "Sol" / ex_label).read_text()
-    # persp = (Path(INPUT_DIR) / "Persp" / ex_label).read_text()
    def read_longest_prefix_file(subdir):
        dir_path = Path(INPUT_DIR) / subdir
-        matches = [f for f in dir_path.iterdir() if f.is_file() and full_label.startswith(f.name)]
+        matches = [f for f in dir_path.iterdir()
+                   if f.is_file()
+                   and full_label.startswith(f.name)
+                   and f.suffix not in [".pdf", ".tex"]]
        if not matches:
            return ""
-        return max(matches, key=lambda f: len(f.name)).read_text()
+        return max(matches, key=lambda f: len(f.name)).read_text(encoding="utf-8", errors="replace")
    text = read_longest_prefix_file("Text")
    corr = read_longest_prefix_file("Sol")
@ -482,7 +489,7 @@ def handle_label_errors(pid, label, res, pdf_path):
    error_type = res.get("error")
    all_labels = read_all_labels(INPUT_DIR)
-    labels_txt = (Path(INPUT_DIR) / "labels").read_text()
+    labels_txt = (Path(INPUT_DIR) / "labels").read_text(encoding="utf-8", errors="replace")
    enonce = enonce_total(INPUT_DIR)
    if error_type == "wrong-label":
@ -499,7 +506,7 @@ Here is the full content of the exam :
 {enonce}
-Here is a list of all possible lables. You need to answer with one of these :
+Here is a list of all possible labels. You need to answer with one of these :
 {labels_txt}
 """
@ -780,62 +787,89 @@ if __name__ == "__main__":
            print(f"Warning: --refaire flag used, but {refaire_path} not found.", file=sys.stderr)
-    if args.batch:
+    if args.batch or args.batch_from:
-        batch_flash_file = Path(INPUT_DIR) / "batch_requests_flash.jsonl"
+        from utils import read_all_labels
-        batch_pro_file = Path(INPUT_DIR) / "batch_requests_pro.jsonl"
+        all_labels = read_all_labels(INPUT_DIR)
-        count_flash = 0
+        batch_tasks = []
-        count_pro = 0
+        if args.batch_from:
            if args.batch_from not in all_labels:
                sys.exit(f"Error: Label '{args.batch_from}' not found. Available labels: {all_labels}")
-        with open(batch_flash_file, "w", encoding="utf-8") as f_flash, \
+            target_idx = all_labels.index(args.batch_from)
-             open(batch_pro_file, "w", encoding="utf-8") as f_pro:
+            live_tasks = []
            for task in tasks_to_process:
-                file_path, label = task[0], task[1]
+                lbl = task[1]
-                group_name = os.path.splitext(file_path)[0]
+                # Any label found sequentially equal or after `args.batch_from` gets batched
-                json_path = group_name + '.json'
+                if lbl in all_labels and all_labels.index(lbl) >= target_idx:
                    batch_tasks.append(task)
                else:
                    live_tasks.append(task)
-                with open(json_path, 'r') as jf:
+            tasks_to_process = live_tasks # Keep live tasks to be run right after
-                    group_data = json.load(jf)
+        else:
-                use_flash = len(group_data) >= 4 or group_data[-1][2] <= 500
+            batch_tasks = tasks_to_process
            tasks_to_process = [] # Run nothing live if just `--batch`
-                image_data = Path(file_path).read_bytes()
+        if batch_tasks:
-                b64_img = base64.b64encode(image_data).decode("utf-8")
+            batch_flash_file = Path(INPUT_DIR) / "batch_requests_flash.jsonl"
            batch_pro_file = Path(INPUT_DIR) / "batch_requests_pro.jsonl"
-                # Format payload matching Gemini Batch API file requirements
+            count_flash = 0
-                req = {
+            count_pro = 0
-                    "key": file_path,  # The ID returned in the output file
+
-                    "request": {
+            with open(batch_flash_file, "w", encoding="utf-8") as f_flash, \
-                        "contents": [{
+                 open(batch_pro_file, "w", encoding="utf-8") as f_pro:
-                            "role": "user",
+
-                            "parts": [
+                for task in batch_tasks:
-                                {"inlineData": {"mimeType": "image/jpeg", "data": b64_img}},
+                    file_path, label = task[0], task[1]
-                                {"text": make_prompt(label)}
+                    group_name = os.path.splitext(file_path)[0]
-                            ]
+                    json_path = group_name + '.json'
-                        }],
+
-                        "generation_config": {
+                    with open(json_path, 'r') as jf:
-                            "temperature": 1.0,
+                        group_data = json.load(jf)
-                            "topP": 0.95,
+                    use_flash = len(group_data) >= 4 or group_data[-1][2] <= 500
-                            "maxOutputTokens": 65535,
+
-                            "responseMimeType": "application/json",
+                    image_data = Path(file_path).read_bytes()
-                            "responseSchema": UNROLLED_SCHEMA
+                    b64_img = base64.b64encode(image_data).decode("utf-8")
-                            # TypeAdapter(List[EvaluationEntry]).json_schema()
+
                    # Format payload matching Gemini Batch API file requirements
                    req = {
                        "key": file_path,  # The ID returned in the output file
                        "request": {
                            "contents": [{
                                "role": "user",
                                "parts": [
                                    {"inlineData": {"mimeType": "image/jpeg", "data": b64_img}},
                                    {"text": make_prompt(label)}
                                ]
                            }],
                            "generation_config": {
                                "temperature": 1.0,
                                "topP": 0.95,
                                "maxOutputTokens": 65535,
                                "responseMimeType": "application/json",
                                "responseSchema": UNROLLED_SCHEMA
                            }
                        }
                    }
                }
-                if use_flash:
+                    if use_flash:
-                    f_flash.write(json.dumps(req) + "\n")
+                        f_flash.write(json.dumps(req) + "\n")
-                    count_flash += 1
+                        count_flash += 1
-                else:
+                    else:
-                    f_pro.write(json.dumps(req) + "\n")
+                        f_pro.write(json.dumps(req) + "\n")
-                    count_pro += 1
+                        count_pro += 1
-        print(f"Batch generation complete.")
+            print(f"Batch generation complete.")
-        print(f" - {count_flash} requests saved to {batch_flash_file} (for {MODEL_ID_flash})")
+            print(f" - {count_flash} requests saved to {batch_flash_file} (for {MODEL_ID_flash})")
-        print(f" - {count_pro} requests saved to {batch_pro_file} (for {MODEL_ID_pro})")
+            print(f" - {count_pro} requests saved to {batch_pro_file} (for {MODEL_ID_pro})")
-        print("Upload these files via the File API and create two separate batch jobs.")
+            print("Upload these files via the File API and create two separate batch jobs.")
-        sys.exit(0)
+
        # If there's no live tasks to do, and we aren't doing a batched ingestion, exit right away
        if not tasks_to_process and not args.deal_with_batched:
            sys.exit(0)
    batched_responses = {}
    if args.deal_with_batched:
@ -883,7 +917,7 @@ if __name__ == "__main__":
    print("Time elapsed : ", end_time - start_time)
    print("Requests to pro / flash : ", pro_count, flash_count)
    if errors_summary:
-        print("\n--- Summary of Exceptions ---", file=sys.stderr)
+        print("\n--- Summary of Exceptions (You can use several images on one instance) ---", file=sys.stderr)
        for (err, file) in errors_summary:
            print(err, file=sys.stderr)
            escaped_path = shlex.quote(str(file))
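The `--batch-from` split introduced in this file can be sketched on its own. This is a minimal sketch, assuming tasks are `(filepath, label)` tuples and `all_labels` is an ordered list, as in the hunk above; `split_live_and_batch` is a hypothetical name and the paths are invented.

```python
def split_live_and_batch(tasks, all_labels, batch_from):
    """Split tasks into (live, batch): labels at or after `batch_from` are batched."""
    if batch_from not in all_labels:
        raise ValueError(f"Label {batch_from!r} not found. Available labels: {all_labels}")
    target_idx = all_labels.index(batch_from)
    # First position of each label, matching list.index() semantics
    position = {}
    for i, lbl in enumerate(all_labels):
        position.setdefault(lbl, i)
    live, batch = [], []
    for task in tasks:
        lbl = task[1]
        if lbl in position and position[lbl] >= target_idx:
            batch.append(task)
        else:
            # Labels before `batch_from`, and labels unknown to all_labels, stay live
            live.append(task)
    return live, batch

all_labels = ["Ex 1", "Ex 2", "Ex 3", "Ex 4"]
tasks = [
    ("Interro/Ex 1/Group_1.jpg", "Ex 1"),
    ("Interro/Ex 4/Group_1.jpg", "Ex 4"),
    ("Interro/Ex 3/Group_2.jpg", "Ex 3"),
    ("Interro/Ex 9/Group_1.jpg", "Ex 9"),
]
live, batch = split_live_and_batch(tasks, all_labels, "Ex 3")
print([t[1] for t in live], [t[1] for t in batch])
```

As in the diff, a label missing from `all_labels` falls through to the live list rather than being batched. The dictionary only avoids repeated `list.index` calls and is not needed for correctness.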
--- a/gemini_for_labels.py
+++ b/gemini_for_labels.py
@ -296,7 +296,7 @@ def process_copy_group(group_key, files):
                        continue  # Retry immediately
                    else:
                        name = "Unknown"
-
+                annota.name = name
                # Save result
                with open(output_json, "w", encoding="utf-8") as f:
                    json.dump(annota.model_dump(), f, indent=2)
--- a/liste_francais.txt
+++ b/liste_francais.txt
@ -12386,6 +12386,7 @@ maternelles
 maternité
 mathématicien
 mathématique
 mathématiquement
 mathématiques
 maths
 matière
--- a/post-correction.py
+++ b/post-correction.py
@ -7,6 +7,46 @@ import argparse
 if len(sys.argv) < 2:
    sys.exit("Usage: python script.py <InputDir>")
+def escape_latex_underscores(text):
+    r"""
+    Escape '_' outside LaTeX math environments.
+    Supports:
+      - $...$
+      - $$...$$
+      - \( ... \)
+      - \[ ... \]
+    """
+    # Regex matching LaTeX math blocks
+    math_pattern = re.compile(
+        r'(\$\$.*?\$\$|'      # $$...$$
+        r'\$.*?\$|'           # $...$
+        r'\\\(.*?\\\)|'       # \( ... \)
+        r'\\\[.*?\\\])',      # \[ ... \]
+        re.DOTALL
+    )
+    parts = []
+    last_end = 0
+    for match in math_pattern.finditer(text):
+        start, end = match.span()
+        # Escape underscores outside math
+        outside = text[last_end:start].replace('_', r'\_')
+        parts.append(outside)
+        # Keep math block unchanged
+        parts.append(match.group(0))
+        last_end = end
+    # Remaining text after last math block
+    outside = text[last_end:].replace('_', r'\_')
+    parts.append(outside)
+    return ''.join(parts)
 arg_path = Path(sys.argv[1])
 tasks = [] # List of tuples: (filepath_str, label_str)
 results = {}
@ -79,7 +119,8 @@ def clean_string(s: str) -> str:
    if '\x00' in s:
        s = fast_fix(s)
        s = s.replace('\x00', '')
-    return some_other_replacements(s)
+    s = some_other_replacements(s)
+    return escape_latex_underscores(s)
 def clean_obj(obj):
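A quick way to see what `escape_latex_underscores` does, and where it is fragile, is to run it on a few strings. The block below is a condensed restatement of the function from this hunk (same regex, same replacement) so that it runs on its own; the sample strings are made up.

```python
import re

# Same alternation as in the hunk: $$...$$, $...$, \( ... \), \[ ... \]
MATH_PATTERN = re.compile(
    r'(\$\$.*?\$\$|\$.*?\$|\\\(.*?\\\)|\\\[.*?\\\])',
    re.DOTALL,
)

def escape_latex_underscores(text):
    """Escape '_' outside the math spans matched by MATH_PATTERN."""
    parts, last_end = [], 0
    for match in MATH_PATTERN.finditer(text):
        parts.append(text[last_end:match.start()].replace('_', r'\_'))
        parts.append(match.group(0))  # math span kept unchanged
        last_end = match.end()
    parts.append(text[last_end:].replace('_', r'\_'))
    return ''.join(parts)

once = escape_latex_underscores("file_name and $x_1$")
print(once)   # file\_name and $x_1$

# Applying it a second time escapes the already-escaped underscore again
twice = escape_latex_underscores(once)
print(twice)

# A literal \$ is treated as the start of a math span by this regex
money = escape_latex_underscores(r"cost \$5 for a_b and $y_2$")
print(money)
```

As written, the function has no guard for text that already contains `\_` or a literal `\$`, so these two cases may be worth checking if `post-correction.py` is ever run twice on the same `correction.json`.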
--- a/splitting_int.py
+++ b/splitting_int.py
@ -8,6 +8,9 @@ import shutil
 from pathlib import Path
 from collections import defaultdict
+carreau = 1000 // 38
 def decode_json(pdf_file):
    file_path = Path(pdf_file)
    with open(file_path.with_suffix(".json"), "r") as f:
@ -26,8 +29,7 @@ def decode_json(pdf_file):
    for d in bb_list:
        (b, label) = d["box_2d"], d["label"]
        pn = page_number(b)
-        carreau = 1000 // 38
-        result.append((label, pn, b[0] - int(carreau), b[2]-int(carreau), b[1], b[3]))
+        result.append((label, pn, b[0] - carreau, b[2]-carreau, b[1], b[3]))
    result.sort(key=lambda x: (x[1], x[2]))
    return (name, result)
@ -98,7 +100,7 @@ def split_an_interro(base_dir, input_pdf, coords_list):
        # RULE 2: Determine stopping label
        for next_item in coords_list[idx + 1:]:
-            n_clean, n_type, n_pn, n_y_start, _, _, _ = next_item
+            n_clean, n_type, n_pn, n_y_start, n_y_end, _, _ = next_item
            if c_type == "L":
                is_stop = (n_type in ("L", "N"))
@ -109,7 +111,9 @@ def split_an_interro(base_dir, input_pdf, coords_list):
            if is_stop:
                end_page = n_pn
-                end_y_target_raw = n_y_start
+                # end_y_target_raw = n_y_start
+                # We removed one square earlier, so add it back…
+                end_y_target_raw = min(n_y_start + int(1.25 * carreau), 1000)
                break
        # RULES 3 & 4: Calculate horizontal boundaries (0.0 to 1.0 fraction of local page width)
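The arithmetic behind `carreau` in this file is easy to check by hand. The sketch below assumes, as the `min(..., 1000)` clamp suggests, that vertical coordinates live on a 0 to 1000 scale; the diff does not say what 38 stands for, and `stop_y` is a hypothetical name used only here.

```python
carreau = 1000 // 38  # integer division, so no int() cast is needed afterwards

def stop_y(n_y_start, carreau=carreau):
    """Where the current crop ends, given the (already shifted) start of the next label."""
    # decode_json subtracted one carreau from each box; add 1.25 carreau back, clamped to the page
    return min(n_y_start + int(1.25 * carreau), 1000)

print(carreau)          # 26
print(stop_y(500))      # 532
print(stop_y(990))      # 1000 (clamped)
```

Since one `carreau` was subtracted in `decode_json` and `int(1.25 * carreau)` is added back here, the stop line ends up slightly below the next label's original top edge.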
--- a/submit_batches.py
+++ b/submit_batches.py
@ -72,8 +72,6 @@ def main():
    print("-" * 50)
    print("All batch jobs have been initiated.")
    print("Save the Batch Job Names above. You can monitor them with:")
    print("  client.batches.get(name='YOUR_BATCH_JOB_NAME')")
 if __name__ == "__main__":
    main()
--- a/update_ods.py
+++ b/update_ods.py
@ -1,3 +1,4 @@
+import argparse
 import os
 import sys
 import json
@ -12,12 +13,13 @@ ODS_PATH = "/home/sebastien/Rust/gestion_classe/Staging/current_eval.ods"
 TARGET_DIR_NAME = "A Rendre"
 def main():
-    if len(sys.argv) < 2:
-        # Default to current directory if not provided, or raise error
-        work_dir = os.getcwd()
-    else:
-        work_dir = os.path.abspath(sys.argv[1])
+    parser = argparse.ArgumentParser(description="Update ODS with student scores.")
+    parser.add_argument("work_dir", nargs="?", default=os.getcwd(), help="Directory to process")
+    parser.add_argument("--sum", action="store_true", help="Write only the total sum per student")
+    args = parser.parse_args()
+    work_dir = os.path.abspath(args.work_dir)
    all_labels = read_all_labels(Path(work_dir))
    a_rendre_path = os.path.join(work_dir, TARGET_DIR_NAME)
@ -101,53 +103,65 @@ def main():
        # Start filling from Row 2 (index 2), immediately below the name line
        start_row = 2
-        # for i, key in enumerate(scores_data.keys()):
+        if args.sum:
-        for i, key in enumerate(all_labels):
+            # Calculate total
-            row_idx = start_row + i
+            total = 0.0
-
+            for val in scores_data.values():
            # Ensure we don't go out of bounds
            if row_idx >= sheet.nrows():
                sheet.append_rows(1)
            if key in scores_data:
                val_str = str(scores_data[key])
            else:
                val_str = ""
            # Logic: if "" -> "NT"
            new_val = "NT" if val_str == "" else val_str
            cell = sheet[row_idx, col_idx]
            current_val = cell.value
            # Conflict Detection
            # Normalize current ODS value to string for comparison
            # ODS might store 2.0 as float 2.0. JSON has "2.0".
            is_different = False
            if current_val is not None and current_val != "":
                # specific check to handle float/string mismatch (2.0 vs "2.0")
                try:
-                    if float(str(current_val)) != float(str(new_val)):
+                    total += float(val)
-                        is_different = True
+                except (ValueError, TypeError):
-                except ValueError:
+                    continue
                    # If conversion fails (e.g. comparing "NT" to "2.0"), compare strings
                    if str(current_val).strip() != str(new_val).strip():
                        is_different = True
-                if is_different:
+            cell = sheet[start_row, col_idx]
-                    print(f"DEBUG: Conflict for {item} at {key} (Row {row_idx}). "
+            cell.set_value(total)
-                          f"Existing: '{current_val}' vs New: '{new_val}'. Overwriting.")
+            print(f"Set sum for {item}: {total}")
        else:
            for i, key in enumerate(all_labels):
                row_idx = start_row + i
-            # Set value
+                # Ensure we don't go out of bounds
-            # Try to set as float if it looks like a number, otherwise string
+                if row_idx >= sheet.nrows():
-            if new_val == "NT":
+                    sheet.append_rows(1)
-                cell.set_value(new_val)
+
-            else:
+                if key in scores_data:
-                try:
+                    val_str = str(scores_data[key])
-                    cell.set_value(float(new_val))
+                else:
-                except ValueError:
+                    val_str = ""
                # Logic: if "" -> "NT"
                new_val = "NT" if val_str == "" else val_str
                cell = sheet[row_idx, col_idx]
                current_val = cell.value
                # Conflict Detection
                # Normalize current ODS value to string for comparison
                # ODS might store 2.0 as float 2.0. JSON has "2.0".
                is_different = False
                if current_val is not None and current_val != "":
                    # specific check to handle float/string mismatch (2.0 vs "2.0")
                    try:
                        if float(str(current_val)) != float(str(new_val)):
                            is_different = True
                    except ValueError:
                        # If conversion fails (e.g. comparing "NT" to "2.0"), compare strings
                        if str(current_val).strip() != str(new_val).strip():
                            is_different = True
                    if is_different:
                        print(f"DEBUG: Conflict for {item} at {key} (Row {row_idx}). "
                              f"Existing: '{current_val}' vs New: '{new_val}'. Overwriting.")
                # Set value
                # Try to set as float if it looks like a number, otherwise string
                if new_val == "NT":
                    cell.set_value(new_val)
                else:
                    try:
                        cell.set_value(float(new_val))
                    except ValueError:
                        cell.set_value(new_val)
    print("Saving ODS file...")
    doc.save()
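The `--sum` branch boils down to a tolerant sum over the score values. A minimal sketch, assuming `scores_data` maps labels to strings or numbers as the conflict-detection comments above imply; `total_score` is a hypothetical name and the sample data is invented.

```python
def total_score(scores_data):
    """Sum every value that parses as a float; skip blanks, None and markers such as 'NT'."""
    total = 0.0
    for val in scores_data.values():
        try:
            total += float(val)
        except (ValueError, TypeError):
            # '' and 'NT' raise ValueError, None raises TypeError
            continue
    return total

sample = {"Ex 1": "2.5", "Ex 2": "", "Ex 3": None, "Ex 4": 1, "Ex 5": "NT"}
print(total_score(sample))  # 3.5
```

One thing to keep in mind with this approach: `float("nan")` parses successfully, so a stray `"nan"` string in the data would make the whole total `nan` instead of being skipped.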
--- a/utils.py
+++ b/utils.py
@ -14,7 +14,7 @@ def enonce_total(base_dir):
    if not text_dir.is_dir():
        return ""
-    files = [f for f in text_dir.iterdir() if f.is_file()]
+    files = [f for f in text_dir.iterdir() if f.is_file() and f.suffix not in [".pdf", ".tex"]]
    files.sort(key=lambda f: natural_key(f.name))
    output = []