Miscs (Interro 28)
parent
0836d5809d
commit
7e7045293a
60
Readme.org
|
|
@ -1,10 +1,11 @@
|
||||||
#+title: Script
|
#+title: Script
|
||||||
#+author: Sébastien Miquel
|
#+author: Sébastien Miquel
|
||||||
#+date: 14-03-2026
|
#+date: 14-03-2026
|
||||||
# Time-stamp: <08-05-26 22:52>
|
# Time-stamp: <14-05-26 08:55>
|
||||||
#+OPTIONS:
|
#+OPTIONS:
|
||||||
|
|
||||||
* What is this?
|
* Meta
|
||||||
|
** What is this?
|
||||||
|
|
||||||
This repository contains a number of Python scripts that I use
|
This repository contains a number of Python scripts that I use
|
||||||
to have Gemini grade exam papers.
|
to have Gemini grade exam papers.
|
||||||
|
|
@ -20,7 +21,7 @@ to have Gemini grade exam papers.
|
||||||
4. These handwritten annotations are read and recompiled into a
|
4. These handwritten annotations are read and recompiled into a
|
||||||
version of the paper for the student.
|
version of the paper for the student.
|
||||||
|
|
||||||
* Disclaimer
|
** Disclaimer
|
||||||
|
|
||||||
I use this tool regularly and am happy with it, but I have
|
I use this tool regularly and am happy with it, but I have
|
||||||
made little effort to make it general-purpose or easy to use.
|
made little effort to make it general-purpose or easy to use.
|
||||||
|
|
@ -37,9 +38,9 @@ examples of the final output (in the =BGnot= subfolder).
|
||||||
This situation may improve, but making this system easier
|
This situation may improve, but making this system easier
|
||||||
to use is not a priority.
|
to use is not a priority.
|
||||||
|
|
||||||
* Requirements
|
** Requirements
|
||||||
|
|
||||||
** Python
|
*** Python
|
||||||
|
|
||||||
Libraries :
|
Libraries :
|
||||||
|
|
||||||
|
|
@ -47,13 +48,13 @@ Libraries :
|
||||||
pip install numpy pandas matplotlib pillow pydantic pypdf pdf2image reportlab img2pdf pymupdf ftfy ezodf google
|
pip install numpy pandas matplotlib pillow pydantic pypdf pdf2image reportlab img2pdf pymupdf ftfy ezodf google
|
||||||
#+END_SRC
|
#+END_SRC
|
||||||
|
|
||||||
** Poppler (for pdf2image)
|
*** Poppler (for pdf2image)
|
||||||
|
|
||||||
+ Linux : install poppler-utils
|
+ Linux : install poppler-utils
|
||||||
+ Windows : Download from: https://github.com/oschwartz10612/poppler-windows
|
+ Windows : Download from: https://github.com/oschwartz10612/poppler-windows
|
||||||
and add it to your PATH
|
and add it to your PATH
|
||||||
|
|
||||||
** Gemini access
|
*** Gemini access
|
||||||
|
|
||||||
You need to create an API key for Gemini (not easy).
|
You need to create an API key for Gemini (not easy).
|
||||||
|
|
||||||
|
|
@ -66,7 +67,7 @@ Then add =GEMINI_API_KEY= to the environment with:
|
||||||
export GEMINI_API_KEY=…
|
export GEMINI_API_KEY=…
|
||||||
#+END_SRC
|
#+END_SRC
|
||||||
|
|
||||||
* Grading a batch of papers
|
** Grading a batch of papers
|
||||||
|
|
||||||
1. Create a =names= file in the current directory, with the
|
1. Create a =names= file in the current directory, with the
|
||||||
students' last and first names, one per line
|
students' last and first names, one per line
|
||||||
|
|
@ -83,7 +84,8 @@ export GEMINI_API_KEY=…
|
||||||
for such-and-such, etc.)
|
for such-and-such, etc.)
|
||||||
6. Follow the steps below.
|
6. Follow the steps below.
|
||||||
|
|
||||||
* Preprocessing
|
* Steps and Scripts
|
||||||
|
** Preprocessing
|
||||||
|
|
||||||
1. =./rotate_all.sh Interro=
|
1. =./rotate_all.sh Interro=
|
||||||
(optional)
|
(optional)
|
||||||
|
|
@ -107,14 +109,14 @@ export GEMINI_API_KEY=…
|
||||||
|
|
||||||
Rerun on a single file with =python cutleft.py Interro/Copie01.pdf=
|
Rerun on a single file with =python cutleft.py Interro/Copie01.pdf=
|
||||||
|
|
||||||
* Generating information about the exam statement
|
** Generating information about the exam statement
|
||||||
|
|
||||||
1. =python enonce_info.py Interro= (personal workflow)
|
1. =python enonce_info.py Interro= (personal workflow)
|
||||||
OR
|
OR
|
||||||
2. =python gemini_for_enonce.py Interro=
|
2. =python gemini_for_enonce.py Interro=
|
||||||
+ Requires =enonce.tex/org= and =correction.tex/org=
|
+ Requires =enonce.tex/org= and =correction.tex/org=
|
||||||
|
|
||||||
* Labeling and grouping
|
** Labeling and grouping
|
||||||
|
|
||||||
Set proxy with ~export HTTPS_PROXY="http://10.0.0.1:3128"~
|
Set proxy with ~export HTTPS_PROXY="http://10.0.0.1:3128"~
|
||||||
|
|
||||||
|
|
@ -130,25 +132,27 @@ Set proxy with ~export HTTPS_PROXY="http://10.0.0.1:3128"~
|
||||||
+ When a label is missing, you can click on
|
+ When a label is missing, you can click on
|
||||||
the image, which copies the coordinates to the clipboard
|
the image, which copies the coordinates to the clipboard
|
||||||
(on Linux…); the label can then be added by hand.
|
(on Linux…); the label can then be added by hand.
|
||||||
+ Use of `_`, `|…` and `…|`
|
+ Use of `_`, `|…` and `…|`:
|
||||||
|
+ `|…` is not stopped vertically by its opposite type.
|
||||||
|
+ `…|` is stopped horizontally by the nearest `|…`.
|
||||||
To modify a single paper:
|
To modify a single paper:
|
||||||
=python plotting.py Interro/Copie01.pdf=
|
=python plotting.py Interro/Copie01.pdf=
|
||||||
|
|
||||||
It also generates the =Copie01.json= files from the =Copie01_01.json= files
|
It also generates the =Copie01.json= files from the =Copie01_01.json= files
|
||||||
3. If something goes wrong (for example, the pages are out of order)
|
1. If something goes wrong (for example, the pages are out of order)
|
||||||
- Reorder the pages of the PDF file
|
- Reorder the pages of the PDF file
|
||||||
- Rerun =python cutleft.py Interro/Copie{id}=
|
- Rerun =python cutleft.py Interro/Copie{id}=
|
||||||
- Rerun =python gemini_dir_batching.py Interro/Copie{id}= ?? To be
|
- Rerun =python gemini_dir_batching.py Interro/Copie{id}= ?? To be
|
||||||
checked, not sure it works.
|
checked, not sure it works.
|
||||||
4. =python splitting_int.py Interro=
|
3. =python splitting_int.py Interro=
|
||||||
|
|
||||||
Splits the papers by exercise
|
Splits the papers by exercise
|
||||||
5. =python grouping.py Interro=
|
4. =python grouping.py Interro=
|
||||||
|
|
||||||
Gathers the same questions from different papers into groups of
|
Gathers the same questions from different papers into groups of
|
||||||
reasonable size.
|
reasonable size.
|
||||||
|
|
||||||
* Grading and annotation
|
** Grading and annotation
|
||||||
|
|
||||||
Set proxy with ~export HTTPS_PROXY="http://10.0.0.1:3128"~
|
Set proxy with ~export HTTPS_PROXY="http://10.0.0.1:3128"~
|
||||||
|
|
||||||
|
|
@ -170,16 +174,18 @@ Set proxy with ~export HTTPS_PROXY="http://10.0.0.1:3128"~
|
||||||
To reduce cost, the requests can be batched; they
|
To reduce cost, the requests can be batched; they
|
||||||
will then be processed within at most 24 hours.
|
will then be processed within at most 24 hours.
|
||||||
+ =python correction.py Interro --batch=
|
+ =python correction.py Interro --batch=
|
||||||
|
+ OR =python correction.py Interro --batch-from 'Ex 4'=
|
||||||
+ =python submit_batches.py Interro=
|
+ =python submit_batches.py Interro=
|
||||||
+ =python batch_status.py=
|
+ =python batch_status.py=
|
||||||
+ =python fetch_batched_results.py Interro=
|
+ =python fetch_batched_results.py Interro=
|
||||||
+ =python correction.py Interro --deal-with-batched=
|
+ =python correction.py Interro --deal-with-batched=
|
||||||
3. =python post-correction.py Interro=
|
3. =python post-correction.py Interro=
|
||||||
|
|
||||||
Tries to fix encoding/accent errors in
|
- Tries to fix encoding/accent errors in
|
||||||
=correction.json=.
|
=correction.json=.
|
||||||
|
- Also escapes `_` outside math mode, for LaTeX.
|
||||||
|
|
||||||
* Generating the annotated papers
|
** Generating the annotated papers
|
||||||
|
|
||||||
1. =python annotating.py Interro= (optional)
|
1. =python annotating.py Interro= (optional)
|
||||||
|
|
||||||
|
|
@ -208,7 +214,7 @@ OR
|
||||||
- Empty =Syncthing/Annotées= on the tablet and locally.
|
- Empty =Syncthing/Annotées= on the tablet and locally.
|
||||||
To be automated; it is also slow…
|
To be automated; it is also slow…
|
||||||
|
|
||||||
* Reading the handwritten corrections
|
** Reading the handwritten corrections
|
||||||
|
|
||||||
1. =python from_tablette.py Interro= (personal workflow)
|
1. =python from_tablette.py Interro= (personal workflow)
|
||||||
|
|
||||||
|
|
@ -243,6 +249,7 @@ OR
|
||||||
+ =gestion_classe ne= to create the quiz, then
|
+ =gestion_classe ne= to create the quiz, then
|
||||||
+ =gestion_classe we= (set the marking scheme here)
|
+ =gestion_classe we= (set the marking scheme here)
|
||||||
+ =python update_ods.py Interro=
|
+ =python update_ods.py Interro=
|
||||||
|
or =python update_ods.py Interro --sum= (when there is no marking scheme)
|
||||||
+ =gestion_classe re=
|
+ =gestion_classe re=
|
||||||
+ =gestion_classe wsent=
|
+ =gestion_classe wsent=
|
||||||
+ =python add_final_score.py Interro21=
|
+ =python add_final_score.py Interro21=
|
||||||
|
|
@ -252,10 +259,7 @@ OR
|
||||||
+ update the copies from =miqmacs.fr/admin=.
|
+ update the copies from =miqmacs.fr/admin=.
|
||||||
6. (personal workflow) Printing a paper: via Evince » print to PDF.
|
6. (personal workflow) Printing a paper: via Evince » print to PDF.
|
||||||
|
|
||||||
|
** Regrading a single paper (lightly tested)
|
||||||
|
|
||||||
* Regrading a single paper (lightly tested)
|
|
||||||
|
|
||||||
|
|
||||||
!! Warning: refaire will not work if you put an ungrouped
|
!! Warning: refaire will not work if you put an ungrouped
|
||||||
annotation into refaire !!
|
annotation into refaire !!
|
||||||
|
|
|
||||||
|
|
@ -160,11 +160,35 @@ def main():
|
||||||
|
|
||||||
used_prefixes.add(unique_prefix)
|
used_prefixes.add(unique_prefix)
|
||||||
|
|
||||||
|
existing_items = set()
|
||||||
|
max_existing_group = 0
|
||||||
|
|
||||||
|
|
||||||
|
if not args.overwrite and os.path.exists(bgnot_dir):
|
||||||
|
for d in os.listdir(bgnot_dir):
|
||||||
|
if d.startswith(f"{unique_prefix} G"):
|
||||||
|
try:
|
||||||
|
g_id = int(d.split(' G')[-1])
|
||||||
|
max_existing_group = max(max_existing_group, g_id)
|
||||||
|
except ValueError:
|
||||||
|
pass
|
||||||
|
|
||||||
|
bnote_path = os.path.join(bgnot_dir, d, "bnote.json")
|
||||||
|
if os.path.exists(bnote_path):
|
||||||
|
with open(bnote_path, "r") as bf:
|
||||||
|
bdata = json.load(bf)
|
||||||
|
for img in bdata.get("images", []):
|
||||||
|
existing_items.add((img["id"], img["label"]))
|
||||||
|
|
||||||
items_to_render = []
|
items_to_render = []
|
||||||
for sid, lbls in results.items():
|
for sid, lbls in results.items():
|
||||||
for lbl in labels:
|
for lbl in labels:
|
||||||
if lbl in lbls:
|
if lbl in lbls:
|
||||||
items_to_render.append((sid, lbl, lbls[lbl]))
|
# Only add if it hasn't been generated yet
|
||||||
|
if (sid, lbl) not in existing_items:
|
||||||
|
items_to_render.append((sid, lbl, lbls[lbl]))
|
||||||
|
if not items_to_render:
|
||||||
|
continue
|
||||||
|
|
||||||
# Sort structurally: by student id and label
|
# Sort structurally: by student id and label
|
||||||
items_to_render.sort(key=lambda x: (natural_key(x[0]), natural_key(x[1])))
|
items_to_render.sort(key=lambda x: (natural_key(x[0]), natural_key(x[1])))
|
||||||
|
|
@ -217,7 +241,7 @@ def main():
|
||||||
batches = batches2
|
batches = batches2
|
||||||
|
|
||||||
for i, batch in enumerate(batches, 1):
|
for i, batch in enumerate(batches, 1):
|
||||||
save_batch(batch, unique_prefix, i, root_dir, args.overwrite)
|
save_batch(batch, unique_prefix, max_existing_group + i, root_dir, args.overwrite)
|
||||||
|
|
||||||
if __name__ == "__main__":
|
if __name__ == "__main__":
|
||||||
main()
|
main()
|
||||||
|
|
|
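The hunk above makes regrouping incremental: it scans the groups already on disk, remembers which (student id, label) pairs were rendered, and continues numbering after the highest existing group. The following is a minimal standalone sketch of that scan, assuming the `<prefix> G<n>` directory naming and the `bnote.json` layout visible in the diff; the function name `scan_existing_groups` and the demo data are hypothetical.

```python
import json
import os
import tempfile


def scan_existing_groups(bgnot_dir, unique_prefix):
    """Return (existing_items, max_group_id) for the groups already on disk."""
    existing_items = set()
    max_group = 0
    if not os.path.isdir(bgnot_dir):
        return existing_items, max_group
    for d in os.listdir(bgnot_dir):
        if not d.startswith(f"{unique_prefix} G"):
            continue
        try:
            max_group = max(max_group, int(d.split(" G")[-1]))
        except ValueError:
            pass  # directory name does not end with a group number
        bnote_path = os.path.join(bgnot_dir, d, "bnote.json")
        if os.path.exists(bnote_path):
            with open(bnote_path, "r", encoding="utf-8") as bf:
                for img in json.load(bf).get("images", []):
                    existing_items.add((img["id"], img["label"]))
    return existing_items, max_group


# Demo with two pre-existing groups, G1 and G3 (hypothetical data).
with tempfile.TemporaryDirectory() as root:
    demo = [(1, "Copie01"), (3, "Copie02")]
    for group_id, student in demo:
        group_dir = os.path.join(root, f"Ex 1 G{group_id}")
        os.makedirs(group_dir)
        with open(os.path.join(group_dir, "bnote.json"), "w", encoding="utf-8") as f:
            json.dump({"images": [{"id": student, "label": "Ex 1"}]}, f)
    existing, max_group = scan_existing_groups(root, "Ex 1")

next_group = max_group + 1  # new batches are numbered from here
```

New batches then start at `max_group + 1`, which mirrors the `max_existing_group + i` argument passed to `save_batch` in the diff.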
||||||
182
correction.py
|
|
@ -5,14 +5,11 @@ from pathlib import Path
|
||||||
import argparse
|
import argparse
|
||||||
|
|
||||||
if len(sys.argv) < 2:
|
if len(sys.argv) < 2:
|
||||||
sys.exit("Usage: python script.py InterroTest/Ex 2/Group_1.jpg OR <InputDir>")
|
sys.exit("Usage: python script.py 'InterroTest/Ex 2/Group_1.jpg' OR <InputDir> OR 'file1' 'file2'")
|
||||||
|
|
||||||
arg_path = Path(sys.argv[1])
|
|
||||||
tasks = [] # List of tuples: (filepath_str, label_str)
|
|
||||||
results = {}
|
|
||||||
|
|
||||||
# Parse Arguments
|
# Parse Arguments
|
||||||
parser = argparse.ArgumentParser()
|
parser = argparse.ArgumentParser()
|
||||||
|
parser.add_argument("paths", nargs="+", help="List of images or directories")
|
||||||
parser.add_argument("--overwrite", action="store_true",
|
parser.add_argument("--overwrite", action="store_true",
|
||||||
help="Force redo requests even if output exists")
|
help="Force redo requests even if output exists")
|
||||||
parser.add_argument("--limit", type=int, help="limit calls to gemini rpo integer")
|
parser.add_argument("--limit", type=int, help="limit calls to gemini rpo integer")
|
||||||
|
|
@ -20,28 +17,40 @@ parser.add_argument("--refaire", action="store_true",
|
||||||
help="Redo specific copies/labels defined in refaire.json")
|
help="Redo specific copies/labels defined in refaire.json")
|
||||||
parser.add_argument("--batch", action="store_true",
|
parser.add_argument("--batch", action="store_true",
|
||||||
help="Generate a JSONL file of requests to send to the Gemini Batch API")
|
help="Generate a JSONL file of requests to send to the Gemini Batch API")
|
||||||
|
parser.add_argument("--batch-from", type=str, metavar="LABEL",
|
||||||
|
help="Do live requests before LABEL, and batch requests from LABEL onwards")
|
||||||
parser.add_argument("--deal-with-batched", action="store_true",
|
parser.add_argument("--deal-with-batched", action="store_true",
|
||||||
help="Process a JSONL file containing completed batch results")
|
help="Process a JSONL file containing completed batch results")
|
||||||
args, _ = parser.parse_known_args()
|
args, _ = parser.parse_known_args()
|
||||||
|
|
||||||
|
tasks = [] # List of tuples: (filepath_str, label_str)
|
||||||
|
results = {}
|
||||||
|
|
||||||
|
|
||||||
|
for path_str in args.paths:
|
||||||
|
arg_path = Path(path_str)
|
||||||
|
|
||||||
if arg_path.suffix == ".jpg":
|
|
||||||
INPUT_DIR = str(arg_path.parents[1])
|
|
||||||
FULL_LABEL = arg_path.parent.name
|
|
||||||
tasks.append((str(arg_path), FULL_LABEL))
|
|
||||||
results[FULL_LABEL] = []
|
|
||||||
else:
|
|
||||||
# Directory behaviour
|
|
||||||
INPUT_DIR = str(arg_path)
|
|
||||||
if not arg_path.exists():
|
if not arg_path.exists():
|
||||||
sys.exit(f"Directory {INPUT_DIR} not found.")
|
print(f"Warning: {path_str} not found. Skipping.")
|
||||||
|
continue
|
||||||
|
|
||||||
for sub in arg_path.iterdir():
|
if arg_path.is_file() and arg_path.suffix.lower() == ".jpg":
|
||||||
if sub.is_dir() and sub.name.startswith("Ex"):
|
# Handle individual file
|
||||||
label = sub.name
|
# Note: assumes structure InterroTest/Ex 2/Group_1.jpg to get parents[1]
|
||||||
|
label = arg_path.parent.name
|
||||||
|
tasks.append((str(arg_path), label))
|
||||||
|
if label not in results:
|
||||||
results[label] = []
|
results[label] = []
|
||||||
for img in sub.glob("*.jpg"):
|
|
||||||
tasks.append((str(img), label))
|
elif arg_path.is_dir():
|
||||||
|
# Handle directory (original behavior)
|
||||||
|
for sub in arg_path.iterdir():
|
||||||
|
if sub.is_dir() and sub.name.startswith("Ex"):
|
||||||
|
label = sub.name
|
||||||
|
if label not in results:
|
||||||
|
results[label] = []
|
||||||
|
for img in sub.glob("*.jpg"):
|
||||||
|
tasks.append((str(img), label))
|
||||||
|
|
||||||
my_prompt = """I'm giving you an image of several written answers to an exam.
|
my_prompt = """I'm giving you an image of several written answers to an exam.
|
||||||
|
|
||||||
|
|
@ -135,17 +144,15 @@ You are asked to score the question or exercice labeled `<<label>>`,
|
||||||
do not score or give feedback to any other question."""
|
do not score or give feedback to any other question."""
|
||||||
|
|
||||||
def make_prompt(full_label):
|
def make_prompt(full_label):
|
||||||
# l = full_label.split(" ")
|
|
||||||
# ex_label = l[0] + " " + l[1]
|
|
||||||
# text = (Path(INPUT_DIR) / "Text" / ex_label).read_text()
|
|
||||||
# corr = (Path(INPUT_DIR) / "Sol" / ex_label).read_text()
|
|
||||||
# persp = (Path(INPUT_DIR) / "Persp" / ex_label).read_text()
|
|
||||||
def read_longest_prefix_file(subdir):
|
def read_longest_prefix_file(subdir):
|
||||||
dir_path = Path(INPUT_DIR) / subdir
|
dir_path = Path(INPUT_DIR) / subdir
|
||||||
matches = [f for f in dir_path.iterdir() if f.is_file() and full_label.startswith(f.name)]
|
matches = [f for f in dir_path.iterdir()
|
||||||
|
if f.is_file()
|
||||||
|
and full_label.startswith(f.name)
|
||||||
|
and f.suffix not in [".pdf", ".tex"]]
|
||||||
if not matches:
|
if not matches:
|
||||||
return ""
|
return ""
|
||||||
return max(matches, key=lambda f: len(f.name)).read_text()
|
return max(matches, key=lambda f: len(f.name)).read_text(encoding="utf-8", errors="replace")
|
||||||
|
|
||||||
text = read_longest_prefix_file("Text")
|
text = read_longest_prefix_file("Text")
|
||||||
corr = read_longest_prefix_file("Sol")
|
corr = read_longest_prefix_file("Sol")
|
||||||
|
|
@ -482,7 +489,7 @@ def handle_label_errors(pid, label, res, pdf_path):
|
||||||
error_type = res.get("error")
|
error_type = res.get("error")
|
||||||
|
|
||||||
all_labels = read_all_labels(INPUT_DIR)
|
all_labels = read_all_labels(INPUT_DIR)
|
||||||
labels_txt = (Path(INPUT_DIR) / "labels").read_text()
|
labels_txt = (Path(INPUT_DIR) / "labels").read_text(encoding="utf-8", errors="replace")
|
||||||
enonce = enonce_total(INPUT_DIR)
|
enonce = enonce_total(INPUT_DIR)
|
||||||
|
|
||||||
if error_type == "wrong-label":
|
if error_type == "wrong-label":
|
||||||
|
|
@ -499,7 +506,7 @@ Here is the full content of the exam :
|
||||||
|
|
||||||
{enonce}
|
{enonce}
|
||||||
|
|
||||||
Here is a list of all possible lables. You need to answer with one of these :
|
Here is a list of all possible labels. You need to answer with one of these :
|
||||||
|
|
||||||
{labels_txt}
|
{labels_txt}
|
||||||
"""
|
"""
|
||||||
|
|
@ -780,62 +787,89 @@ if __name__ == "__main__":
|
||||||
print(f"Warning: --refaire flag used, but {refaire_path} not found.", file=sys.stderr)
|
print(f"Warning: --refaire flag used, but {refaire_path} not found.", file=sys.stderr)
|
||||||
|
|
||||||
|
|
||||||
if args.batch:
|
if args.batch or args.batch_from:
|
||||||
batch_flash_file = Path(INPUT_DIR) / "batch_requests_flash.jsonl"
|
from utils import read_all_labels
|
||||||
batch_pro_file = Path(INPUT_DIR) / "batch_requests_pro.jsonl"
|
all_labels = read_all_labels(INPUT_DIR)
|
||||||
|
|
||||||
count_flash = 0
|
batch_tasks = []
|
||||||
count_pro = 0
|
if args.batch_from:
|
||||||
|
if args.batch_from not in all_labels:
|
||||||
|
sys.exit(f"Error: Label '{args.batch_from}' not found. Available labels: {all_labels}")
|
||||||
|
|
||||||
with open(batch_flash_file, "w", encoding="utf-8") as f_flash, \
|
target_idx = all_labels.index(args.batch_from)
|
||||||
open(batch_pro_file, "w", encoding="utf-8") as f_pro:
|
live_tasks = []
|
||||||
|
|
||||||
for task in tasks_to_process:
|
for task in tasks_to_process:
|
||||||
file_path, label = task[0], task[1]
|
lbl = task[1]
|
||||||
group_name = os.path.splitext(file_path)[0]
|
# Any label found sequentially equal or after `args.batch_from` gets batched
|
||||||
json_path = group_name + '.json'
|
if lbl in all_labels and all_labels.index(lbl) >= target_idx:
|
||||||
|
batch_tasks.append(task)
|
||||||
|
else:
|
||||||
|
live_tasks.append(task)
|
||||||
|
|
||||||
with open(json_path, 'r') as jf:
|
tasks_to_process = live_tasks # Keep live tasks to be run right after
|
||||||
group_data = json.load(jf)
|
else:
|
||||||
use_flash = len(group_data) >= 4 or group_data[-1][2] <= 500
|
batch_tasks = tasks_to_process
|
||||||
|
tasks_to_process = [] # Run nothing live if just `--batch`
|
||||||
|
|
||||||
image_data = Path(file_path).read_bytes()
|
if batch_tasks:
|
||||||
b64_img = base64.b64encode(image_data).decode("utf-8")
|
batch_flash_file = Path(INPUT_DIR) / "batch_requests_flash.jsonl"
|
||||||
|
batch_pro_file = Path(INPUT_DIR) / "batch_requests_pro.jsonl"
|
||||||
|
|
||||||
# Format payload matching Gemini Batch API file requirements
|
count_flash = 0
|
||||||
req = {
|
count_pro = 0
|
||||||
"key": file_path, # The ID returned in the output file
|
|
||||||
"request": {
|
with open(batch_flash_file, "w", encoding="utf-8") as f_flash, \
|
||||||
"contents": [{
|
open(batch_pro_file, "w", encoding="utf-8") as f_pro:
|
||||||
"role": "user",
|
|
||||||
"parts": [
|
for task in batch_tasks:
|
||||||
{"inlineData": {"mimeType": "image/jpeg", "data": b64_img}},
|
file_path, label = task[0], task[1]
|
||||||
{"text": make_prompt(label)}
|
group_name = os.path.splitext(file_path)[0]
|
||||||
]
|
json_path = group_name + '.json'
|
||||||
}],
|
|
||||||
"generation_config": {
|
with open(json_path, 'r') as jf:
|
||||||
"temperature": 1.0,
|
group_data = json.load(jf)
|
||||||
"topP": 0.95,
|
use_flash = len(group_data) >= 4 or group_data[-1][2] <= 500
|
||||||
"maxOutputTokens": 65535,
|
|
||||||
"responseMimeType": "application/json",
|
image_data = Path(file_path).read_bytes()
|
||||||
"responseSchema": UNROLLED_SCHEMA
|
b64_img = base64.b64encode(image_data).decode("utf-8")
|
||||||
# TypeAdapter(List[EvaluationEntry]).json_schema()
|
|
||||||
|
# Format payload matching Gemini Batch API file requirements
|
||||||
|
req = {
|
||||||
|
"key": file_path, # The ID returned in the output file
|
||||||
|
"request": {
|
||||||
|
"contents": [{
|
||||||
|
"role": "user",
|
||||||
|
"parts": [
|
||||||
|
{"inlineData": {"mimeType": "image/jpeg", "data": b64_img}},
|
||||||
|
{"text": make_prompt(label)}
|
||||||
|
]
|
||||||
|
}],
|
||||||
|
"generation_config": {
|
||||||
|
"temperature": 1.0,
|
||||||
|
"topP": 0.95,
|
||||||
|
"maxOutputTokens": 65535,
|
||||||
|
"responseMimeType": "application/json",
|
||||||
|
"responseSchema": UNROLLED_SCHEMA
|
||||||
|
}
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
}
|
|
||||||
|
|
||||||
if use_flash:
|
if use_flash:
|
||||||
f_flash.write(json.dumps(req) + "\n")
|
f_flash.write(json.dumps(req) + "\n")
|
||||||
count_flash += 1
|
count_flash += 1
|
||||||
else:
|
else:
|
||||||
f_pro.write(json.dumps(req) + "\n")
|
f_pro.write(json.dumps(req) + "\n")
|
||||||
count_pro += 1
|
count_pro += 1
|
||||||
|
|
||||||
print(f"Batch generation complete.")
|
print(f"Batch generation complete.")
|
||||||
print(f" - {count_flash} requests saved to {batch_flash_file} (for {MODEL_ID_flash})")
|
print(f" - {count_flash} requests saved to {batch_flash_file} (for {MODEL_ID_flash})")
|
||||||
print(f" - {count_pro} requests saved to {batch_pro_file} (for {MODEL_ID_pro})")
|
print(f" - {count_pro} requests saved to {batch_pro_file} (for {MODEL_ID_pro})")
|
||||||
print("Upload these files via the File API and create two separate batch jobs.")
|
print("Upload these files via the File API and create two separate batch jobs.")
|
||||||
sys.exit(0)
|
|
||||||
|
# If there's no live tasks to do, and we aren't doing a batched ingestion, exit right away
|
||||||
|
if not tasks_to_process and not args.deal_with_batched:
|
||||||
|
sys.exit(0)
|
||||||
|
|
||||||
batched_responses = {}
|
batched_responses = {}
|
||||||
if args.deal_with_batched:
|
if args.deal_with_batched:
|
||||||
|
|
@ -883,7 +917,7 @@ if __name__ == "__main__":
|
||||||
print("Time elapsed : ", end_time - start_time)
|
print("Time elapsed : ", end_time - start_time)
|
||||||
print("Requests to pro / flash : ", pro_count, flash_count)
|
print("Requests to pro / flash : ", pro_count, flash_count)
|
||||||
if errors_summary:
|
if errors_summary:
|
||||||
print("\n--- Summary of Exceptions ---", file=sys.stderr)
|
print("\n--- Summary of Exceptions (You can use several images on one instance) ---", file=sys.stderr)
|
||||||
for (err, file) in errors_summary:
|
for (err, file) in errors_summary:
|
||||||
print(err, file=sys.stderr)
|
print(err, file=sys.stderr)
|
||||||
escaped_path = shlex.quote(str(file))
|
escaped_path = shlex.quote(str(file))
|
||||||
|
|
|
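The `--batch-from LABEL` option introduced in this file splits the work in two: tasks whose label comes before `LABEL` in the ordered label list run live, and the rest are written to the batch request files. The sketch below isolates that partition, assuming tasks are `(filepath, label)` tuples as in the script; the function name `split_live_and_batch` and the sample paths are hypothetical.

```python
def split_live_and_batch(tasks, all_labels, batch_from):
    """Partition tasks into (live, batch) according to label order."""
    if batch_from not in all_labels:
        raise ValueError(f"Label {batch_from!r} not found in {all_labels}")
    target_idx = all_labels.index(batch_from)

    # First position of each label, matching list.index() semantics.
    position = {}
    for i, lbl in enumerate(all_labels):
        position.setdefault(lbl, i)

    live, batch = [], []
    for task in tasks:
        lbl = task[1]
        if lbl in position and position[lbl] >= target_idx:
            batch.append(task)
        else:
            live.append(task)  # unknown labels stay live, as in the diff
    return live, batch


all_labels = ["Ex 1", "Ex 2", "Ex 3", "Ex 4"]
tasks = [
    ("Interro/Ex 1/Group_1.jpg", "Ex 1"),
    ("Interro/Ex 4/Group_1.jpg", "Ex 4"),
    ("Interro/Ex 3/Group_2.jpg", "Ex 3"),
    ("Interro/Bonus/Group_1.jpg", "Bonus"),
]
live, batch = split_live_and_batch(tasks, all_labels, "Ex 3")
```

Precomputing the label positions avoids calling `all_labels.index` once per task; the result is the same as the loop in the diff.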
||||||
|
|
@ -296,7 +296,7 @@ def process_copy_group(group_key, files):
|
||||||
continue # Retry immediately
|
continue # Retry immediately
|
||||||
else:
|
else:
|
||||||
name = "Unknown"
|
name = "Unknown"
|
||||||
|
annota.name = name
|
||||||
# Save result
|
# Save result
|
||||||
with open(output_json, "w", encoding="utf-8") as f:
|
with open(output_json, "w", encoding="utf-8") as f:
|
||||||
json.dump(annota.model_dump(), f, indent=2)
|
json.dump(annota.model_dump(), f, indent=2)
|
||||||
|
|
|
||||||
|
|
@ -12386,6 +12386,7 @@ maternelles
|
||||||
maternité
|
maternité
|
||||||
mathématicien
|
mathématicien
|
||||||
mathématique
|
mathématique
|
||||||
|
mathématiquement
|
||||||
mathématiques
|
mathématiques
|
||||||
maths
|
maths
|
||||||
matière
|
matière
|
||||||
|
|
|
||||||
|
|
@ -7,6 +7,46 @@ import argparse
|
||||||
if len(sys.argv) < 2:
|
if len(sys.argv) < 2:
|
||||||
sys.exit("Usage: python script.py <InputDir>")
|
sys.exit("Usage: python script.py <InputDir>")
|
||||||
|
|
||||||
|
def escape_latex_underscores(text):
|
||||||
|
r"""
|
||||||
|
Escape '_' outside LaTeX math environments.
|
||||||
|
Supports:
|
||||||
|
- $...$
|
||||||
|
- $$...$$
|
||||||
|
- \( ... \)
|
||||||
|
- \[ ... \]
|
||||||
|
"""
|
||||||
|
|
||||||
|
# Regex matching LaTeX math blocks
|
||||||
|
math_pattern = re.compile(
|
||||||
|
r'(\$\$.*?\$\$|' # $$...$$
|
||||||
|
r'\$.*?\$|' # $...$
|
||||||
|
r'\\\(.*?\\\)|' # \( ... \)
|
||||||
|
r'\\\[.*?\\\])', # \[ ... \]
|
||||||
|
re.DOTALL
|
||||||
|
)
|
||||||
|
|
||||||
|
parts = []
|
||||||
|
last_end = 0
|
||||||
|
|
||||||
|
for match in math_pattern.finditer(text):
|
||||||
|
start, end = match.span()
|
||||||
|
|
||||||
|
# Escape underscores outside math
|
||||||
|
outside = text[last_end:start].replace('_', r'\_')
|
||||||
|
parts.append(outside)
|
||||||
|
|
||||||
|
# Keep math block unchanged
|
||||||
|
parts.append(match.group(0))
|
||||||
|
|
||||||
|
last_end = end
|
||||||
|
|
||||||
|
# Remaining text after last math block
|
||||||
|
outside = text[last_end:].replace('_', r'\_')
|
||||||
|
parts.append(outside)
|
||||||
|
|
||||||
|
return ''.join(parts)
|
||||||
|
|
||||||
arg_path = Path(sys.argv[1])
|
arg_path = Path(sys.argv[1])
|
||||||
tasks = [] # List of tuples: (filepath_str, label_str)
|
tasks = [] # List of tuples: (filepath_str, label_str)
|
||||||
results = {}
|
results = {}
|
||||||
|
|
@ -79,7 +119,8 @@ def clean_string(s: str) -> str:
|
||||||
if '\x00' in s:
|
if '\x00' in s:
|
||||||
s = fast_fix(s)
|
s = fast_fix(s)
|
||||||
s = s.replace('\x00', '')
|
s = s.replace('\x00', '')
|
||||||
return some_other_replacements(s)
|
s = some_other_replacements(s)
|
||||||
|
return escape_latex_underscores(s)
|
||||||
|
|
||||||
|
|
||||||
def clean_obj(obj):
|
def clean_obj(obj):
|
||||||
|
|
|
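One property of `escape_latex_underscores` worth keeping in mind: it replaces every `_` outside math, so running it twice on the same string turns `\_` into `\\_`. If the post-correction step may be rerun on an already cleaned file, an idempotent variant could look like the sketch below. This is not the repository's code; the name `escape_underscores_once` is hypothetical, and it assumes that a `_` directly preceded by a backslash is already escaped.

```python
import re

# Same four math delimiters as in the diff: $$...$$, $...$, \(...\), \[...\]
MATH = re.compile(r'(\$\$.*?\$\$|\$.*?\$|\\\(.*?\\\)|\\\[.*?\\\])', re.DOTALL)
# An underscore that is not already preceded by a backslash.
BARE_UNDERSCORE = re.compile(r'(?<!\\)_')


def _escape(segment):
    # A function replacement avoids any escape processing of the template.
    return BARE_UNDERSCORE.sub(lambda m: '\\_', segment)


def escape_underscores_once(text):
    """Escape '_' outside math blocks, leaving existing '\\_' untouched."""
    parts = []
    last_end = 0
    for match in MATH.finditer(text):
        parts.append(_escape(text[last_end:match.start()]))
        parts.append(match.group(0))  # math block kept unchanged
        last_end = match.end()
    parts.append(_escape(text[last_end:]))
    return ''.join(parts)


once = escape_underscores_once("a_b $x_1$ c_d")
twice = escape_underscores_once(once)
```

Like the original, this sketch does not handle an escaped dollar sign (`\$`) in the text, which would be read as a math delimiter.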
||||||
|
|
@ -8,6 +8,9 @@ import shutil
|
||||||
from pathlib import Path
|
from pathlib import Path
|
||||||
from collections import defaultdict
|
from collections import defaultdict
|
||||||
|
|
||||||
|
carreau = 1000 // 38
|
||||||
|
|
||||||
|
|
||||||
def decode_json(pdf_file):
|
def decode_json(pdf_file):
|
||||||
file_path = Path(pdf_file)
|
file_path = Path(pdf_file)
|
||||||
with open(file_path.with_suffix(".json"), "r") as f:
|
with open(file_path.with_suffix(".json"), "r") as f:
|
||||||
|
|
@ -26,8 +29,7 @@ def decode_json(pdf_file):
|
||||||
for d in bb_list:
|
for d in bb_list:
|
||||||
(b, label) = d["box_2d"], d["label"]
|
(b, label) = d["box_2d"], d["label"]
|
||||||
pn = page_number(b)
|
pn = page_number(b)
|
||||||
carreau = 1000 // 38
|
result.append((label, pn, b[0] - carreau, b[2]-carreau, b[1], b[3]))
|
||||||
result.append((label, pn, b[0] - int(carreau), b[2]-int(carreau), b[1], b[3]))
|
|
||||||
result.sort(key=lambda x: (x[1], x[2]))
|
result.sort(key=lambda x: (x[1], x[2]))
|
||||||
return (name, result)
|
return (name, result)
|
||||||
|
|
||||||
|
|
@ -98,7 +100,7 @@ def split_an_interro(base_dir, input_pdf, coords_list):
|
||||||
|
|
||||||
# RULE 2: Determine stopping label
|
# RULE 2: Determine stopping label
|
||||||
for next_item in coords_list[idx + 1:]:
|
for next_item in coords_list[idx + 1:]:
|
||||||
n_clean, n_type, n_pn, n_y_start, _, _, _ = next_item
|
n_clean, n_type, n_pn, n_y_start, n_y_end, _, _ = next_item
|
||||||
|
|
||||||
if c_type == "L":
|
if c_type == "L":
|
||||||
is_stop = (n_type in ("L", "N"))
|
is_stop = (n_type in ("L", "N"))
|
||||||
|
|
@ -109,7 +111,9 @@ def split_an_interro(base_dir, input_pdf, coords_list):
|
||||||
|
|
||||||
if is_stop:
|
if is_stop:
|
||||||
end_page = n_pn
|
end_page = n_pn
|
||||||
end_y_target_raw = n_y_start
|
# end_y_target_raw = n_y_start
|
||||||
|
# A grid square was subtracted earlier, so we add it back…
|
||||||
|
end_y_target_raw = min(n_y_start + int(1.25 * carreau), 1000)
|
||||||
break
|
break
|
||||||
|
|
||||||
# RULES 3 & 4: Calculate horizontal boundaries (0.0 to 1.0 fraction of local page width)
|
# RULES 3 & 4: Calculate horizontal boundaries (0.0 to 1.0 fraction of local page width)
|
||||||
|
|
|
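The arithmetic behind the `carreau` change can be checked in isolation. The sketch below assumes, as the `1000 // 38` constant and the `min(..., 1000)` clamp suggest, that vertical coordinates are normalized to the range 0 to 1000 and that one "carreau" is one grid square of the paper; the helper names are hypothetical.

```python
CARREAU = 1000 // 38  # one grid square in normalized units: 26


def shifted_y(raw_y):
    """Shift applied when decoding the boxes: move up by one square."""
    return raw_y - CARREAU


def end_y_target(next_label_y_start):
    """Lower bound of a crop, given the (already shifted) start of the next label.

    1.25 squares are added back, so the crop ends slightly below the raw
    top of the next label, clamped to the bottom of the page.
    """
    return min(next_label_y_start + int(1.25 * CARREAU), 1000)


raw_top_of_next_label = 400
crop_end = end_y_target(shifted_y(raw_top_of_next_label))
```

With these numbers the crop ends 6 units below the raw top of the next label, since `int(1.25 * 26)` is 32 and 26 was subtracted beforehand.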
||||||
|
|
@ -72,8 +72,6 @@ def main():
|
||||||
|
|
||||||
print("-" * 50)
|
print("-" * 50)
|
||||||
print("All batch jobs have been initiated.")
|
print("All batch jobs have been initiated.")
|
||||||
print("Save the Batch Job Names above. You can monitor them with:")
|
|
||||||
print(" client.batches.get(name='YOUR_BATCH_JOB_NAME')")
|
|
||||||
|
|
||||||
if __name__ == "__main__":
|
if __name__ == "__main__":
|
||||||
main()
|
main()
|
||||||
|
|
|
||||||
110
update_ods.py
|
|
@ -1,3 +1,4 @@
|
||||||
|
import argparse
|
||||||
import os
|
import os
|
||||||
import sys
|
import sys
|
||||||
import json
|
import json
|
||||||
|
|
@ -12,12 +13,13 @@ ODS_PATH = "/home/sebastien/Rust/gestion_classe/Staging/current_eval.ods"
|
||||||
TARGET_DIR_NAME = "A Rendre"
|
TARGET_DIR_NAME = "A Rendre"
|
||||||
|
|
||||||
def main():
|
def main():
|
||||||
if len(sys.argv) < 2:
|
parser = argparse.ArgumentParser(description="Update ODS with student scores.")
|
||||||
# Default to current directory if not provided, or raise error
|
parser.add_argument("work_dir", nargs="?", default=os.getcwd(), help="Directory to process")
|
||||||
work_dir = os.getcwd()
|
parser.add_argument("--sum", action="store_true", help="Write only the total sum per student")
|
||||||
else:
|
args = parser.parse_args()
|
||||||
work_dir = os.path.abspath(sys.argv[1])
|
|
||||||
|
|
||||||
|
work_dir = os.path.abspath(args.work_dir)
|
||||||
|
|
||||||
all_labels = read_all_labels(Path(work_dir))
|
all_labels = read_all_labels(Path(work_dir))
|
||||||
|
|
||||||
a_rendre_path = os.path.join(work_dir, TARGET_DIR_NAME)
|
a_rendre_path = os.path.join(work_dir, TARGET_DIR_NAME)
|
||||||
|
|
@ -101,53 +103,65 @@ def main():
|
||||||
# Start filling from Row 2 (index 2), immediately below the name line
|
# Start filling from Row 2 (index 2), immediately below the name line
|
||||||
start_row = 2
|
start_row = 2
|
||||||
|
|
||||||
# for i, key in enumerate(scores_data.keys()):
|
if args.sum:
|
||||||
for i, key in enumerate(all_labels):
|
# Calculate total
|
||||||
row_idx = start_row + i
|
total = 0.0
|
||||||
|
for val in scores_data.values():
|
||||||
# Ensure we don't go out of bounds
|
|
||||||
if row_idx >= sheet.nrows():
|
|
||||||
sheet.append_rows(1)
|
|
||||||
|
|
||||||
if key in scores_data:
|
|
||||||
val_str = str(scores_data[key])
|
|
||||||
else:
|
|
||||||
val_str = ""
|
|
||||||
|
|
||||||
# Logic: if "" -> "NT"
|
|
||||||
new_val = "NT" if val_str == "" else val_str
|
|
||||||
|
|
||||||
cell = sheet[row_idx, col_idx]
|
|
||||||
current_val = cell.value
|
|
||||||
|
|
||||||
# Conflict Detection
|
|
||||||
# Normalize current ODS value to string for comparison
|
|
||||||
# ODS might store 2.0 as float 2.0. JSON has "2.0".
|
|
||||||
is_different = False
|
|
||||||
|
|
||||||
if current_val is not None and current_val != "":
|
|
||||||
# specific check to handle float/string mismatch (2.0 vs "2.0")
|
|
||||||
try:
|
try:
|
||||||
if float(str(current_val)) != float(str(new_val)):
|
total += float(val)
|
||||||
is_different = True
|
except (ValueError, TypeError):
|
||||||
except ValueError:
|
continue
|
||||||
# If conversion fails (e.g. comparing "NT" to "2.0"), compare strings
|
|
||||||
if str(current_val).strip() != str(new_val).strip():
|
|
||||||
is_different = True
|
|
||||||
|
|
||||||
if is_different:
|
cell = sheet[start_row, col_idx]
|
||||||
print(f"DEBUG: Conflict for {item} at {key} (Row {row_idx}). "
|
cell.set_value(total)
|
||||||
f"Existing: '{current_val}' vs New: '{new_val}'. Overwriting.")
|
print(f"Set sum for {item}: {total}")
|
||||||
|
else:
|
||||||
|
for i, key in enumerate(all_labels):
|
||||||
|
row_idx = start_row + i
|
||||||
|
|
||||||
# Set value
|
# Ensure we don't go out of bounds
|
||||||
# Try to set as float if it looks like a number, otherwise string
|
if row_idx >= sheet.nrows():
|
||||||
if new_val == "NT":
|
sheet.append_rows(1)
|
||||||
cell.set_value(new_val)
|
|
||||||
else:
|
if key in scores_data:
|
||||||
try:
|
val_str = str(scores_data[key])
|
||||||
cell.set_value(float(new_val))
|
else:
|
||||||
except ValueError:
|
val_str = ""
|
||||||
|
|
||||||
|
# Logic: if "" -> "NT"
|
||||||
|
new_val = "NT" if val_str == "" else val_str
|
||||||
|
|
||||||
|
cell = sheet[row_idx, col_idx]
|
||||||
|
current_val = cell.value
|
||||||
|
|
||||||
|
# Conflict Detection
|
||||||
|
# Normalize current ODS value to string for comparison
|
||||||
|
# ODS might store 2.0 as float 2.0. JSON has "2.0".
|
||||||
|
is_different = False
|
||||||
|
|
||||||
|
if current_val is not None and current_val != "":
|
||||||
|
# specific check to handle float/string mismatch (2.0 vs "2.0")
|
||||||
|
try:
|
||||||
|
if float(str(current_val)) != float(str(new_val)):
|
||||||
|
is_different = True
|
||||||
|
except ValueError:
|
||||||
|
# If conversion fails (e.g. comparing "NT" to "2.0"), compare strings
|
||||||
|
if str(current_val).strip() != str(new_val).strip():
|
||||||
|
is_different = True
|
||||||
|
|
||||||
|
if is_different:
|
||||||
|
print(f"DEBUG: Conflict for {item} at {key} (Row {row_idx}). "
|
||||||
|
f"Existing: '{current_val}' vs New: '{new_val}'. Overwriting.")
|
||||||
|
|
||||||
|
# Set value
|
||||||
|
# Try to set as float if it looks like a number, otherwise string
|
||||||
|
if new_val == "NT":
|
||||||
cell.set_value(new_val)
|
cell.set_value(new_val)
|
||||||
|
else:
|
||||||
|
try:
|
||||||
|
cell.set_value(float(new_val))
|
||||||
|
except ValueError:
|
||||||
|
cell.set_value(new_val)
|
||||||
|
|
||||||
print("Saving ODS file...")
|
print("Saving ODS file...")
|
||||||
doc.save()
|
doc.save()
|
||||||
|
|
|
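The `--sum` branch reduces to one operation: add up every score that parses as a number and ignore the rest. A minimal sketch of that step, separated from the spreadsheet handling, follows; the function name `total_score` and the sample scores are hypothetical.

```python
def total_score(scores_data):
    """Sum the numeric values of a {label: score} mapping.

    Values that cannot be converted to float (for example "NT", "" or
    None) are skipped, as in the --sum branch of the diff.
    """
    total = 0.0
    for val in scores_data.values():
        try:
            total += float(val)
        except (ValueError, TypeError):
            continue
    return total


example_total = total_score(
    {"Ex 1": "2.0", "Ex 2": "NT", "Ex 3": 1.5, "Ex 4": None, "Ex 5": ""}
)
```

Catching `TypeError` as well as `ValueError` matters here because `float(None)` raises the former while `float("NT")` raises the latter.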
||||||
2
utils.py
|
|
@ -14,7 +14,7 @@ def enonce_total(base_dir):
|
||||||
if not text_dir.is_dir():
|
if not text_dir.is_dir():
|
||||||
return ""
|
return ""
|
||||||
|
|
||||||
files = [f for f in text_dir.iterdir() if f.is_file()]
|
files = [f for f in text_dir.iterdir() if f.is_file() and f.suffix not in [".pdf", ".tex"]]
|
||||||
files.sort(key=lambda f: natural_key(f.name))
|
files.sort(key=lambda f: natural_key(f.name))
|
||||||
|
|
||||||
output = []
|
output = []
|
||||||
|
|
|
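The change to `enonce_total` excludes `.pdf` and `.tex` files before the statement texts are sorted and concatenated, presumably so that only the plain-text pieces are read. A small sketch of that selection, assuming a natural sort on file names: the `natural_key` below is a hypothetical stand-in for the repository's helper, which is not shown in the diff, and `statement_files` is an illustrative name.

```python
import re
import tempfile
from pathlib import Path

EXCLUDED_SUFFIXES = {".pdf", ".tex"}


def natural_key(name):
    # Hypothetical stand-in: digit runs compare as integers, the rest as text.
    return [int(tok) if tok.isdigit() else tok.lower()
            for tok in re.split(r"(\d+)", name)]


def statement_files(text_dir):
    """List the files of text_dir to read, skipping .pdf and .tex sources."""
    text_dir = Path(text_dir)
    if not text_dir.is_dir():
        return []
    files = [f for f in text_dir.iterdir()
             if f.is_file() and f.suffix not in EXCLUDED_SUFFIXES]
    files.sort(key=lambda f: natural_key(f.name))
    return files


with tempfile.TemporaryDirectory() as tmp:
    for name in ["Ex 10", "Ex 2", "Ex 2.pdf", "enonce.tex"]:
        (Path(tmp) / name).write_text("contenu", encoding="utf-8")
    kept_names = [f.name for f in statement_files(tmp)]
```

The natural sort places `Ex 2` before `Ex 10`, which a plain string sort would not.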
||||||