AI

Colab ReviewRadar

hyunjun's developing 🐣 2025. 6. 3. 17:43

 

Hello!
I'm very interested in AI, so I took on my first movie review sentiment analysis project.
AI had always felt vague to me,
but working with the data and building a model myself in Google Colab
turned out to be more fun and fascinating than I expected.
In this post, I use the IMDB movie review data provided through TensorFlow
and walk through, step by step, how an AI model learns to read the sentiment in what people write.
I hope it helps, even a little, anyone else who is just getting started with AI.

 


 

What is the IMDB data?

 

IMDB ๋ฐ์ดํ„ฐ์…‹์€ ์Šคํƒ ํฌ๋“œ ๋Œ€ํ•™์—์„œ ๊ณต๊ฐœํ•œ ์˜ํ™” ๋ฆฌ๋ทฐ ๋ฐ์ดํ„ฐ๋กœ,
๊ฐ ๋ฆฌ๋ทฐ์— ๊ธ์ •(1) ๋˜๋Š” ๋ถ€์ •(0) ๋ ˆ์ด๋ธ”์ด ๋ถ™์–ด ์žˆ์Šต๋‹ˆ๋‹ค.
์ž์—ฐ์–ด ์ฒ˜๋ฆฌ ๋ถ„์•ผ์—์„œ ๊ฐ์ • ๋ถ„์„ ์‹ค์Šต์šฉ์œผ๋กœ ์ž์ฃผ ์‚ฌ์šฉ๋˜๋Š” ๋Œ€ํ‘œ์ ์ธ ๋ฐ์ดํ„ฐ์…‹์ž…๋‹ˆ๋‹ค.

 


 

Development environment and libraries

 

 Google Colab
 Python 3.x
 TensorFlow, TensorFlow Hub, TensorFlow Datasets
 matplotlib, numpy

 

* TensorFlow Datasets (usually imported as tfds) is a Python library that makes public datasets commonly used in machine learning and deep learning
 easy to load and use right away.


Code

 

import numpy as np
import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_datasets as tfds
import matplotlib.pyplot as plt

 

๋ฐ์ดํ„ฐ ๋‹ค๋ฃจ๋Š” ๋ฐ ์ž์ฃผ ์“ฐ๋Š” numpy,
๋”ฅ๋Ÿฌ๋‹ ๋ชจ๋ธ ๋งŒ๋“ค ๋•Œ ํ•„์ˆ˜์ธ tensorflow,
์‚ฌ์ „ํ•™์Šต๋œ ์ž„๋ฒ ๋”ฉ ๋ชจ๋ธ ์“ธ ๋•Œ ํ•„์š”ํ•œ tensorflow_hub,
์œ ๋ช…ํ•œ ๋ฐ์ดํ„ฐ์…‹์„ ์‰ฝ๊ฒŒ ๋ถˆ๋Ÿฌ์˜ค๋Š” tensorflow_datasets(tfds),
๊ทธ๋ฆฌ๊ณ  ๊ทธ๋ž˜ํ”„ ๊ทธ๋ฆด ๋•Œ ์“ฐ๋Š” matplotlib๊นŒ์ง€ ๋ถˆ๋Ÿฌ์™”์–ด์š”.

 

print("Version: ", tf.__version__)
print("Eager mode: ", tf.executing_eagerly())
print("Hub version: ", hub.__version__)
print("GPU is", "available" if tf.config.list_physical_devices('GPU') else "NOT AVAILABLE")

 

๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๋“ค์˜ ๋ฒ„์ „์„ ํ™•์ธํ•ด์ฃผ๋Š” ์ฝ”๋“œ์ž…๋‹ˆ๋‹ค. 

 

* Version mismatches frequently cause errors. Library versions can change from one day to the next, so if something breaks, try testing with several versions.
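Since version mismatches are such a common source of trouble, it can help to record what is installed before running anything. Below is a minimal sketch using only the standard library. The helper name `installed_versions` is my own, and the package names passed to it are the usual PyPI names, which I am assuming match your environment.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {package name: version string, or None if it is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Not installed in this environment
            versions[name] = None
    return versions


# Assumed PyPI package names for the libraries used in this post
report = installed_versions(["tensorflow", "tensorflow-hub", "tensorflow-datasets"])
for name, version in report.items():
    print(f"{name}: {version or 'not installed'}")
```

Saving this output next to a notebook makes it easier to tell later whether an error came from the code or from a changed library version.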

 

train_data, test_data = tfds.load(
    name="imdb_reviews",
    split=["train", "test"],
    batch_size=-1,
    as_supervised=True
)

 

tfds.load downloads and loads the IMDB movie review dataset.
The split option requests the training (train) and test (test) splits separately.
batch_size=-1 loads the entire dataset into memory as a single batch (useful when the dataset is not large).
as_supervised=True returns the data as (input, label) pairs.

 

train_examples, train_labels = tfds.as_numpy(train_data)
test_examples, test_labels = tfds.as_numpy(test_data)

 

This converts the tensors returned by tfds.load into numpy arrays.

Each split of the IMDB dataset comes as a pair of reviews and labels, so the training and test data are each unpacked into examples and labels as shown above.

 

model_url = "https://tfhub.dev/google/nnlm-en-dim50/2"
hub_layer = hub.KerasLayer(model_url, input_shape=[], dtype=tf.string, trainable=True)

 

This loads a pretrained English embedding model from TensorFlow Hub (`nnlm-en-dim50/2`) as a KerasLayer.
The layer converts text into fixed-dimensional embedding vectors so that it can be used as input to a deep learning model.

 

* An embedding vector is a point in an n-dimensional space: unstructured data such as text is turned into numbers so that it can be placed in that space.
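The idea of a fixed-dimensional embedding can be sketched with a toy example. This is not how nnlm-en-dim50 works internally: the three-dimensional word vectors below are invented for illustration, and averaging them is just one simple way to get a sentence vector of constant size.

```python
# Hypothetical 3-dimensional word vectors (the real model uses 50 dimensions)
TOY_VECTORS = {
    "good": [0.9, 0.1, 0.0],
    "bad": [-0.8, 0.2, 0.1],
    "movie": [0.0, 0.5, 0.5],
}
UNKNOWN = [0.0, 0.0, 0.0]  # vector used for words outside the toy vocabulary


def embed(sentence):
    """Average the word vectors so any sentence maps to a vector of the same size."""
    tokens = sentence.lower().split()
    if not tokens:
        return list(UNKNOWN)
    vectors = [TOY_VECTORS.get(token, UNKNOWN) for token in tokens]
    dim = len(UNKNOWN)
    return [sum(vec[i] for vec in vectors) / len(vectors) for i in range(dim)]


short = embed("good movie")
longer = embed("a much longer sentence about a bad movie")
print(len(short), len(longer))
```

Both sentences end up as vectors of the same length, which is what lets the Dense layers that follow accept reviews of any length.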

model = tf.keras.Sequential()
model.add(hub_layer)
model.add(tf.keras.layers.Dense(16, activation='relu'))
model.add(tf.keras.layers.Dense(1))

 

๋ชจ๋ธ ์•„ํ‚คํ…์ณ๋ฅผ ์„ค๊ณ„ํ•ฉ๋‹ˆ๋‹ค.

 

16๊ฐœ์˜ ๋‰ด๋Ÿฐ์„ ๊ฐ€์ง„ ์€๋‹‰์ธต๊ณผ ์ถœ๋ ฅ์ธต์„ ์ถ”๊ฐ€ํ•ด์ฃผ์—ˆ์Šต๋‹ˆ๋‹ค.
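To see how small the two Dense layers are, their parameter counts can be worked out by hand. The sketch below assumes the embedding layer outputs 50 values per review, as the name nnlm-en-dim50 suggests; the helper `dense_param_count` is my own.

```python
def dense_param_count(input_dim, units):
    """A Dense layer has one weight per (input, unit) pair plus one bias per unit."""
    return input_dim * units + units


EMBEDDING_DIM = 50  # assumed from the model name nnlm-en-dim50

hidden_params = dense_param_count(EMBEDDING_DIM, 16)  # Dense(16, activation='relu')
output_params = dense_param_count(16, 1)              # Dense(1)

print(hidden_params, output_params, hidden_params + output_params)
```

These numbers can be compared against the Dense rows printed by model.summary() in the full code.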

 

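model.compile(
    optimizer='adam',
    loss=tf.losses.BinaryCrossentropy(from_logits=True),
    # The model outputs logits, so threshold accuracy at 0.0 (probability 0.5)
    metrics=[tf.metrics.BinaryAccuracy(threshold=0.0, name='accuracy')]
)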

 

๋ชจ๋ธ์„ ์ปดํŒŒ์ผํ•ฉ๋‹ˆ๋‹ค. 

 

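The Adam optimizer is a widely used optimization algorithm that makes training efficient and stable.
The BinaryCrossentropy loss is the standard choice for binary classification; from_logits=True tells it that the model outputs raw scores (logits) rather than probabilities.
The accuracy metric is monitored alongside the loss during training and evaluation. Because the outputs are logits, accuracy should be thresholded at 0.0, as the full code at the end of this post does with BinaryAccuracy(threshold=0.0).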
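For readers curious about what the loss and the threshold mean numerically, here is a rough pure-Python sketch for a single example. It is my own illustration of binary cross-entropy on logits, not the Keras implementation, and the function names are hypothetical.

```python
import math


def sigmoid(logit):
    """Map a raw score (logit) to a probability between 0 and 1."""
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    e = math.exp(logit)
    return e / (1.0 + e)


def bce_from_logit(logit, label):
    """Binary cross-entropy for one example, computed directly from the logit."""
    # Numerically stable form of: -[y*log(p) + (1-y)*log(1-p)] with p = sigmoid(logit)
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


def predict_label(logit, threshold=0.0):
    """Predict 1 (positive) when the logit is above the threshold."""
    return 1 if logit > threshold else 0


print(bce_from_logit(0.0, 1))                 # loss for an undecided model
print(predict_label(0.3))                     # logit 0.3 is a probability above 0.5
print(predict_label(0.3, threshold=0.5))      # a 0.5 threshold on logits disagrees
```

A logit of 0.0 corresponds to a probability of exactly 0.5, which is why the threshold for logits is 0.0 rather than 0.5.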

 

x_val = train_examples[:10000]
partial_x_train = train_examples[10000:]
y_val = train_labels[:10000]
partial_y_train = train_labels[10000:]

 

Set aside part of the training data as a validation set. That way the test data is used only once, at the very end, to check the accuracy of the finished model.
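The same split can be wrapped in a small helper that also guards against mismatched lengths. This is only a sketch: `holdout_split` is a name I made up, and the lists of integers stand in for the 25,000 IMDB training reviews and their labels.

```python
def holdout_split(examples, labels, n_val):
    """Hold out the first n_val items for validation and keep the rest for training."""
    if len(examples) != len(labels):
        raise ValueError("examples and labels must have the same length")
    if not 0 < n_val < len(examples):
        raise ValueError("n_val must be between 1 and len(examples) - 1")
    train = (examples[n_val:], labels[n_val:])
    val = (examples[:n_val], labels[:n_val])
    return train, val


# Stand-in data: integer ids instead of review texts
examples = list(range(25000))
labels = [i % 2 for i in examples]

(train_x, train_y), (val_x, val_y) = holdout_split(examples, labels, 10000)
print(len(train_x), len(val_x))
```

The slicing works the same way on numpy arrays, so the helper could be called with train_examples and train_labels directly.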

 

history = model.fit(
    partial_x_train,
    partial_y_train,
    epochs=40,
    batch_size=512,
    validation_data=(x_val, y_val),
    verbose=1
)

 

๋ชจ๋ธ์„ ํ•™์Šต์‹œํ‚ต๋‹ˆ๋‹ค.

์—ํญ์€ ํšŒ๋…์ด๋ผ๊ณ  ์ƒ๊ฐํ•˜์‹œ๋ฉด ๋ฉ๋‹ˆ๋‹ค. 40ํšŒ๋…์„ ์‹œํ‚จ๊ฒ๋‹ˆ๋‹ค.
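To make the epoch and batch_size numbers concrete, here is the arithmetic as a short sketch. It assumes the 15,000 training reviews left after holding out 10,000 of the 25,000 for validation.

```python
import math


def steps_per_epoch(n_examples, batch_size):
    """Number of weight updates in one epoch; the last batch may be smaller."""
    return math.ceil(n_examples / batch_size)


n_train = 25000 - 10000  # training reviews left after the validation split
steps = steps_per_epoch(n_train, 512)
total_updates = steps * 40  # 40 epochs

print(steps, total_updates)
```

So each epoch is a few dozen updates, and the whole run is on the order of a thousand.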

 

acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
epochs = range(1, len(acc) + 1)

plt.plot(epochs, acc, 'bo', label='Training acc')
plt.plot(epochs, val_acc, 'b', label='Validation acc')
plt.title('Training and validation accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.show()

 

Plot the training and validation accuracy.

 

 

This produces the output shown in the image above.

In the graph, validation accuracy starts to dip slightly after about 20 epochs, a sign that the model is beginning to overfit the training data.

 

In that case, simply train for fewer epochs.
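Rather than reading the number of epochs off the graph by eye, it can be picked from the recorded validation accuracies. The sketch below uses invented values purely for illustration (they are not results from this project), and both helper functions are my own.

```python
def best_epoch(val_acc):
    """Return the 1-based epoch with the highest validation accuracy."""
    best_index = max(range(len(val_acc)), key=lambda i: val_acc[i])
    return best_index + 1


def early_stop_epoch(val_acc, patience):
    """Return the epoch at which training would stop after `patience` epochs without improvement."""
    best = float("-inf")
    waited = 0
    for epoch, acc in enumerate(val_acc, start=1):
        if acc > best:
            best = acc
            waited = 0
        else:
            waited += 1
            if waited >= patience:
                return epoch
    return len(val_acc)


# Made-up validation accuracies for illustration only
toy_val_acc = [0.70, 0.80, 0.85, 0.87, 0.866, 0.865, 0.864]

print(best_epoch(toy_val_acc))
print(early_stop_epoch(toy_val_acc, patience=2))
```

With the real run, history.history['val_accuracy'] could be passed in instead of the toy list. Keras also ships a tf.keras.callbacks.EarlyStopping callback that applies the same patience idea during model.fit, which may be the more convenient route.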

 

This post is a bit disorganized, but thank you for reading.

 

The full code is attached below.

import numpy as np

import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_datasets as tfds

import matplotlib.pyplot as plt

print("Version: ", tf.__version__)
print("Eager mode: ", tf.executing_eagerly())
print("Hub version: ", hub.__version__)
print("GPU is", "available" if tf.config.list_physical_devices('GPU') else "NOT AVAILABLE")

train_data, test_data = tfds.load(name="imdb_reviews", split=["train", "test"],
                                  batch_size=-1, as_supervised=True)

train_examples, train_labels = tfds.as_numpy(train_data)
test_examples, test_labels = tfds.as_numpy(test_data)

print("Training entries: {}, test entries: {}".format(len(train_examples), len(test_examples)))

model_url = "https://tfhub.dev/google/nnlm-en-dim50/2"
hub_layer = hub.KerasLayer(model_url, input_shape=[], dtype=tf.string, trainable=True)
hub_layer(train_examples[:3])

model = tf.keras.Sequential()
model.add(hub_layer)
model.add(tf.keras.layers.Dense(16, activation='relu'))
model.add(tf.keras.layers.Dense(1))

model.summary()

model.compile(optimizer='adam',
              loss=tf.losses.BinaryCrossentropy(from_logits=True),
              metrics=[tf.metrics.BinaryAccuracy(threshold=0.0, name='accuracy')])


x_val = train_examples[:10000]
partial_x_train = train_examples[10000:]

y_val = train_labels[:10000]
partial_y_train = train_labels[10000:]

history = model.fit(partial_x_train,
                    partial_y_train,
                    epochs=40,
                    batch_size=512,
                    validation_data=(x_val, y_val),
                    verbose=1)

results = model.evaluate(test_examples, test_labels)

print(results)

model.save('imdb_model')

history_dict = history.history
history_dict.keys()

# history.history is a dictionary; the values of each metric are stored as a list.
acc = history.history['accuracy']  # or 'binary_accuracy' (depends on the metric name)
val_acc = history.history['val_accuracy']  # or 'val_binary_accuracy'
epochs = range(1, len(acc) + 1)

plt.clf()   # clear figure

plt.plot(epochs, acc, 'bo', label='Training acc')
plt.plot(epochs, val_acc, 'b', label='Validation acc')
plt.title('Training and validation accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()

plt.show()