TL;DR: How do I process (resize + rescale) a Hugging Face dataset of ~16,000 PIL images into numpy arrays or TensorFlow tensors and convert it to a TensorFlow dataset? And how do I optimize this in terms of runtime and disk space?

I've been discovering Hugging Face recently. I've uploaded my first dataset, consisting of 16,500 images: corentinm7/MyoQuant-SDH-Data. I'm trying to import them in a Jupyter Notebook to train a model with Keras/TensorFlow.

```python
import tensorflow as tf
from datasets import load_dataset
from tensorflow.keras.utils import load_img, img_to_array
from tensorflow.keras.applications.resnet_v2 import preprocess_input  # the exact "..._v2" module name was garbled in the post

ds = load_dataset("corentinm7/MyoQuant-SDH-Data")
```

I first need to process the images in two ways: (i) resizing them with `tf.image.resize(image, (256, 256))`, and (ii) rescaling them with `preprocess_input(x)`.

```python
def transforms(examples):
    # batched transform (partially reconstructed; some identifiers were lost in the post)
    processed = []
    for _img in examples["image"]:
        _img = img_to_array(_img)
        _im_resized = tf.image.resize(_img, (256, 256))
        processed.append(preprocess_input(_im_resized))
    examples["image"] = processed
    print("Finished processing !")
    return examples

ds = ds.map(transforms, remove_columns=[...], batched=True)  # column list lost in the post

tf_ds_train = ds["train"].to_tf_dataset(columns=[...], label_cols=[...], batch_size=BATCH_SIZE, shuffle=True)
tf_ds_val = ds["validation"].to_tf_dataset(columns=[...], label_cols=[...], batch_size=BATCH_SIZE, shuffle=True)
tf_ds_test = ds["test"].to_tf_dataset(columns=[...], label_cols=[...], batch_size=BATCH_SIZE, shuffle=True)
```

The issue I have is that `ds.map()` takes a very long time to run and then even crashes. The `print("Finished processing !")` fires really fast (meaning the processing itself goes fast), but then it takes multiple minutes to move on to the next batch:

```
0%|          | 0/13 [00:00<…]
```

It took 3 minutes to run three batches and then it crashed due to "lack of space", with the traceback pointing at `datasets`' Arrow writer:

```
-> 2985     writer.write_batch(batch)
   2986     if update_data and writer is not None:
```

I don't understand how this could happen when the dataset is only about 1 GB and I have dozens of gigabytes of space available.

For the size issue, it looks like my 50 GB data volume is full because this single dataset created a Hugging Face cache folder of 33 GB! I don't understand how this can happen:

```
-rw-r--r-- 1 meyer bio3d    8 Nov  4 13:23 LICENSE
-rw-r--r-- 1 meyer bio3d 1.9K Nov  4 13:23 dataset_info.json
-rw-r--r-- 1 meyer bio3d 534K Nov  4 13:23 myo_quant-sdh-data-test.arrow
-rw-r--r-- 1 meyer bio3d 1.9M Nov  4 13:23 myo_quant-sdh-data-train.arrow
-rw-r--r-- 1 meyer bio3d 222K Nov  4 13:23 myo_quant-sdh-data-validation.arrow
-rw------- 1 meyer bio3d 3.0G Nov  5 21:28 tmpyp8oimey
```

Could anyone help me figure out how to optimize this task? (Having a dataset of PIL objects that need to be resized and rescaled into numpy arrays or tensors compatible with TensorFlow/Keras, optimized in terms of runtime and disk space.)

Modifying the processing to non-batched seems to be faster, but it also gets stuck for a while after processing a "batch":

```python
def transforms2(examples):
    # non-batched variant (partially reconstructed)
    _img = img_to_array(examples["image"])
    _im_resized = tf.image.resize(_img, (256, 256))
    examples["image"] = preprocess_input(_im_resized)
    return examples

ds = ds.map(transforms2, remove_columns=[...])
```

The first 8% took like 10 seconds and then it hangs here (for 5 minutes):

```
16%|█▋        | 1979/12085
```
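One explanation for the 33 GB cache that fits the numbers: `ds.map()` writes every transformed batch to Arrow cache files, and a 256×256×3 float32 tensor weighs about 786 KB, so 16,500 of them already come to roughly 13 GB per full pass over the data (plus temporary files such as the 3.0 G `tmpyp8oimey` left behind by interrupted runs). That also matches the "fast print, slow batch" pattern: the Python transform is quick, but serializing the tensors to disk is not. Below is a minimal sketch, not from the original post, of the usual way around this: `set_transform()`, which runs the transform at access time and writes nothing to disk. The column name `"image"` and the `resnet_v2` module are assumptions, since both were garbled in the post.

```python
import tensorflow as tf
from datasets import load_dataset
from tensorflow.keras.utils import img_to_array
from tensorflow.keras.applications.resnet_v2 import preprocess_input  # assumed "_v2" module

ds = load_dataset("corentinm7/MyoQuant-SDH-Data")

def preprocess(batch):
    # Resize + rescale each PIL image only when the batch is accessed,
    # instead of materializing every transformed tensor on disk.
    batch["image"] = [
        preprocess_input(tf.image.resize(img_to_array(img), (256, 256)))
        for img in batch["image"]
    ]
    return batch

ds.set_transform(preprocess)  # applied lazily to each split; no Arrow cache is written
sample = ds["train"][0]       # the transform runs here, on the fly
```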
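Another option along the same lines, again a sketch rather than the poster's code, is to leave the Hugging Face dataset untouched and move the resize/rescale into the `tf.data` pipeline, where it runs in parallel per batch and is likewise never cached to disk. This assumes the source images share a single size (so they can be batched before resizing), reuses `ds`, `preprocess_input`, and `BATCH_SIZE` from above, and again assumes `"image"`/`"label"` column names:

```python
AUTOTUNE = tf.data.AUTOTUNE

def tf_preprocess(images, labels):
    # Per-batch resize + model-specific rescaling inside the tf.data pipeline.
    images = tf.image.resize(tf.cast(images, tf.float32), (256, 256))
    return preprocess_input(images), labels

tf_ds_train = (
    ds["train"]
    .to_tf_dataset(columns="image", label_cols="label",
                   batch_size=BATCH_SIZE, shuffle=True)  # BATCH_SIZE as defined earlier
    .map(tf_preprocess, num_parallel_calls=AUTOTUNE)
    .prefetch(AUTOTUNE)
)
```

If the raw images cannot be batched directly (for example, if they vary in size), `to_tf_dataset` also accepts a `collate_fn` to stack them.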
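Lastly, if the `map()` route is kept, two real `datasets` knobs are worth knowing. `map()` buffers `writer_batch_size` examples (1,000 by default) before each Arrow write, which is exactly the "processing prints fast, then stalls for minutes" rhythm described above (the `0/13` bar is 12,085 training images in default batches of 1,000), and `cleanup_cache_files()` deletes a dataset's cached Arrow files once they are no longer needed. A short sketch with illustrative values:

```python
# Smaller buffers mean more frequent, shorter flushes to the Arrow cache
# and less temporary-file buildup between progress-bar steps.
ds = ds.map(transforms, batched=True, batch_size=100, writer_batch_size=100)

# Reclaim disk space by removing this dataset's cached Arrow files:
ds.cleanup_cache_files()
```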