
Notes on TensorFlow v1.x

Here are some useful notes and tricks for TensorFlow 1.x, a powerful deep learning framework developed by Google.

Op

placeholder

tf.placeholder_with_default(
    input, shape, name=None
)
"""
Args:
  input: a tensor. The default value to produce when the output is not fed.
  shape: a `tf.TensorShape` / list(int); may be a partial shape.
  name (optional): tensor name.
Returns:
  A Tensor.
"""

tensor

  1. eval() [2]

    Tensor.eval(feed_dict=None, session=None)
    """
    Args:
      feed_dict: a feed dict, as in `session.run()`.
      session: the session in which to evaluate the tensor. If None, the default session is used.
    Returns:
      A numpy array of values.
    """
  2. tf.group()

    tf.group(
        *inputs, name=None
    )
    Args:
      `*inputs`: zero or more tensors to group.
      `name` (optional): op name.
  3. Get the dynamic batch size from a placeholder (a combined runnable sketch follows this list)

    bsz = tf.shape(placeholder)[0]
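
A combined sketch of the three items above (shapes and values are assumed for illustration):

import tensorflow as tf

x = tf.placeholder(tf.float32, shape=[None, 4])
bsz = tf.shape(x)[0]                         # dynamic batch size from the placeholder
counter = tf.Variable(0)
step = tf.group(tf.assign_add(counter, 1))   # group returns an op with no output value

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(step)
    # eval() uses the default session installed by the `with` block
    print(bsz.eval(feed_dict={x: [[0.0] * 4] * 3}))  # -> 3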

Tensor manipulation

tf.where

tf.where(
    condition, x=None, y=None, name=None
)
"""
Args:
  condition: a bool tensor.
Returns:
  If x == y == None, returns the coordinates of the true elements of condition.
  Output shape -> (# of true elements, rank(condition)).
"""

tf.gather

  • tf.gather slices params with the given indices along axis.
  • output shape = params.shape[:axis] + indices.shape[batch_dims:] + params.shape[axis+1:]. The middle term replaces the dimension of the sliced axis.
tf.gather(
    params, indices, axis=None, batch_dims=0, name=None
)
Args:
  params: the tensor to gather from.
  indices: the index tensor.
  axis: the axis in params to gather indices along. Defaults to the first non-batch dimension.
  batch_dims: an integer, < rank(indices).
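
A short sketch of the shape rule above (values are illustrative):

import tensorflow as tf
tf.enable_eager_execution()

params = tf.reshape(tf.range(12), [3, 4])    # shape (3, 4)
print(tf.gather(params, [2, 0], axis=0))     # rows 2 and 0 -> shape (2, 4)
print(tf.gather(params, [1, 3], axis=1))     # columns 1 and 3 -> shape (3, 2)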

tf.gather_nd

  • Slice the params with the specified shape of indices.
  • Slice on the first N dims, where N=indices.shape[-1]. i.e., # of the slice op.
  • indices.shape[-1] <= params.rank
    • if equal, slice the element
    • if not equal, slice along the indices.shape[-1] axis.
  • out.shape = indices.shape[:-1] + params.shape[indices.shape[-1]:]. indices.shape[-1] indicate the dim after slicing.
    tf.gather_nd(
        params, indices, batch_dims=0, name=None
    )
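
A minimal sketch of the rule out.shape = indices.shape[:-1] + params.shape[indices.shape[-1]:] (illustrative values):

import tensorflow as tf
tf.enable_eager_execution()

params = tf.constant([[1, 2], [3, 4]])
print(tf.gather_nd(params, [[0, 1], [1, 0]]))  # index vectors of length 2 pick elements -> [2 3]
print(tf.gather_nd(params, [[1], [0]]))        # index vectors of length 1 pick rows -> [[3 4], [1 2]]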

matrix mask

  1. tf.linalg.band_part (formerly tf.matrix_band_part)
    Copies a tensor, setting everything outside a central band in each innermost matrix to zero. That is, elements more than num_lower sub-diagonals below the main diagonal (if num_lower is not -1) or more than num_upper super-diagonals above it (if num_upper is not -1) are set to zero.
in_band(m, n) = (num_lower < 0 || (m - n) <= num_lower) && (num_upper < 0 || (n - m) <= num_upper)
# deprecated: tf.matrix_band_part
tf.linalg.band_part(input, num_lower, num_upper, name=None)
"""
input: tensor
num_lower: int. Number of sub-diagonals to keep; negative keeps the entire lower triangle.
num_upper: int. Number of super-diagonals to keep; negative keeps the entire upper triangle.
"""

# example

# if 'input' is [[ 0,  1,  2,  3]
#                [-1,  0,  1,  2]
#                [-2, -1,  0,  1]
#                [-3, -2, -1,  0]],

# tf.linalg.band_part(input, 1, -1) ==> [[ 0,  1,  2,  3]
#                                        [-1,  0,  1,  2]
#                                        [ 0, -1,  0,  1]
#                                        [ 0,  0, -1,  0]]
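
A common use in practice is a lower-triangular (causal) attention mask; a minimal sketch:

import tensorflow as tf
tf.enable_eager_execution()

ones = tf.ones([4, 4])
causal_mask = tf.linalg.band_part(ones, -1, 0)  # keep the lower triangle only
print(causal_mask)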

Gradient Clipping

tf.clip_by_norm

tf.clip_by_norm(t, clip_norm, axes=None, name=None)
"""
t: the gradient tensor to be clipped
clip_norm: the maximum clipping norm
axes: dimensions over which to compute the L2 norm. Default None uses all dimensions.
name: op name (optional).
"""
# if l2norm(t) > clip_norm, t is rescaled to:
t * clip_norm / l2norm(t)
# example
opt = tf.train.AdamOptimizer(learning_rate=0.001, beta1=0.9, beta2=0.999, epsilon=1e-08)
grads_vals = opt.compute_gradients(loss)
grads_vals = [(tf.clip_by_norm(g, clip_norm), v) for g, v in grads_vals if g is not None]
train_op = opt.apply_gradients(grads_vals)

tf.clip_by_global_norm

tf.clip_by_global_norm(t_list, clip_norm, use_norm=None, name=None)
"""
t_list: a tuple or list of tensors
clip_norm: A maximum clipping norm
use_norm (optional): specify the global norm if already computed.
name (optional): op name.
"""
t_list[i] * clip_norm / max(global_norm, clip_norm)
global_norm = sqrt(sum([l2norm(t)**2 for t in t_list]))
# example
opt = tf.train.AdamOptimizer(learning_rate=0.001, beta1=0.9, beta2=0.999, epsilon=1e-08)
params = tf.trainable_variables()
grads = tf.gradients(loss, params)
grads, grad_norm = tf.clip_by_global_norm(grads, clip_norm=5)
train_op = opt.apply_gradients(zip(grads, params))

tf.clip_by_average_norm

tf.clip_by_average_norm(t, clip_norm, name=None)
"""
t: grad tensor to be clipped
clip_norm: A maximum clipping norm
name: op name
"""
t * clip_norm / l2norm_avg(t)

where l2norm_avg(t) = l2norm(t) / m, and $m$ is the number of elements in tensor $t$.

tf.clip_by_value

tf.clip_by_value(t, clip_value_min, clip_value_max, name=None)
"""
t: grad tensor
clip_value_min: clip min value
clip_value_max: clip max value
name: op name
"""
t[t > clip_value_max] = clip_value_max
t[t < clip_value_min] = clip_value_min
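
A quick sketch of the clipping behaviour above (illustrative values):

import tensorflow as tf
tf.enable_eager_execution()

t = tf.constant([-3.0, -1.0, 0.5, 2.0, 10.0])
print(tf.clip_by_value(t, clip_value_min=-1.0, clip_value_max=1.0))
# -> [-1.  -1.   0.5  1.   1. ]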

tf.distributions

Multinomial

Sample from probs and get the probabilities of the sampled actions:

import tensorflow as tf
tf.enable_eager_execution()

# each row is a probability distribution over 4 actions
probs = tf.constant([[0.5, 0.2, 0.1, 0.2], [0.6, 0.2, 0.1, 0.1]], dtype=tf.float32)

# method 1 (recommended): tf.distributions.Multinomial
multinom = tf.distributions.Multinomial(
    total_count=tf.constant(1, dtype=tf.float32),  # draw one sample per record in the batch
    probs=probs)
sampled_actions = multinom.sample()                 # one-hot sample for each record in the batch
predicted_actions = tf.argmax(sampled_actions, axis=-1)
action_probs = sampled_actions * probs              # keep only the probability of the sampled action
action_probs = tf.reduce_sum(action_probs, axis=-1)

# method 2: tf.multinomial + tf.gather_nd
idx = tf.multinomial(tf.log(probs), 1)              # tf.multinomial expects logits, hence the log
row_indices = tf.range(probs.get_shape()[0], dtype=tf.int64)
full_indices = tf.stack([row_indices, tf.squeeze(idx)], axis=1)
rs = tf.gather_nd(probs, full_indices)

Categorical

Intuitively, a sample from Categorical(probs) is identical to argmax{ OneHotCategorical(probs) }, which in turn matches argmax{ Multinomial(probs, total_count=1) }.
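
A minimal sketch of the Categorical distribution (illustrative probabilities):

import tensorflow as tf
tf.enable_eager_execution()

probs = tf.constant([[0.5, 0.2, 0.1, 0.2]])
cat = tf.distributions.Categorical(probs=probs)
action = cat.sample()            # integer index directly, e.g. [0]
log_prob = cat.log_prob(action)  # log-probability of the sampled action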

Control flow

(Figure omitted; image source: [5])

tf.cond

  • Forward pass tf.cond(pred, fn1, fn2)
  • Gradient tf.cond(pred, grad(fn1), grad(fn2))

tf.while_loop

  • Forward pass tf.while_loop(cond_fn, body_fn, loop_vars) -> executes N times
  • Gradient
    tf.while_loop(
        lambda i, g_vars: i < N,
        lambda i, g_vars: (i + 1, grad(body_fn)(g_vars)),
        grad_ys
    )

Examples

# used in SeqGAN training.

import tensorflow as tf

def condition(t, output_ta):
    return tf.less(t, 3)

def body(t, output_ta):
    # write the value at the t-th index of the TensorArray
    output_ta = output_ta.write(t, [2.0, 3.0])
    return t + 1, output_ta

t = tf.constant(0)
# define the TensorArray (float32, growable)
output_ta = tf.TensorArray(dtype=tf.float32, size=1, dynamic_size=True)
# while_loop
result = tf.while_loop(condition, body, loop_vars=[t, output_ta])
last_t, last_out = result

final_out = last_out.stack()  # stack the written values into a single tensor



Eager execution mode

In version 1.x, eager execution mode supports dynamic graphs and prints tensor values as the graph is built, without using tf.Session(). [4]

# start the file with (right after importing TensorFlow, before any graph is built)
import tensorflow as tf
tf.enable_eager_execution()
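
With eager mode on, ops run immediately; a tiny sketch:

x = tf.constant([[1.0, 2.0], [3.0, 4.0]])
print(tf.matmul(x, x))  # prints the numeric result, no Session needed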

Model saving and loading

Load trained model

  • Get the names of the Placeholder ops (the inputs that must be fed)
    import tensorflow as tf

    checkpoint_path = "<ckpt-file>.ckpt"

    saver = tf.train.import_meta_graph('<meta-file>.meta')
    imported_graph = tf.get_default_graph()

    placeholders = [op for op in imported_graph.get_operations() if op.type == "Placeholder"]
    print(len(placeholders))
    for p in placeholders:
        print(p.name)
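
    If you also want to restore the weights, a minimal follow-up sketch (the fetched tensor name is hypothetical):

    with tf.Session() as sess:
        saver.restore(sess, checkpoint_path)
        # e.g. fetch a tensor by name and run it:
        # logits = imported_graph.get_tensor_by_name("logits:0")
        # sess.run(logits, feed_dict={...})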

TensorBoard

Remote connection

# forward local port 16006 to the remote TensorBoard port 6006
$ ssh -L 16006:127.0.0.1:6006 <account>@<server.address>

# at the remote server
(remote) $ tensorboard --logdir="<./modeldir>"

# then visit http://127.0.0.1:16006 in the local browser

Configuration

GPU designation

  • Designate GPUs in TensorFlow [7][8]

    import os
    import tensorflow as tf

    os.environ['CUDA_VISIBLE_DEVICES'] = "0,1,2,3"

    """
    allow_soft_placement
    ---------------------
    Whether soft placement is allowed. If allow_soft_placement is true,
    an op will be placed on CPU if
    1. there's no GPU implementation for the op, or
    2. no GPU devices are known or registered, or
    3. it needs to be co-located with reftype input(s) which are on CPU.
    """
    config = tf.ConfigProto(
        allow_soft_placement=True,  # allow soft device placement
        log_device_placement=True   # whether device placements should be logged
    )

    # allow memory growth
    config.gpu_options.allow_growth = True
    # or assign a fixed fraction of GPU memory
    # config.gpu_options.per_process_gpu_memory_fraction = 0.4
    sess = tf.Session(config=config)
  • Check GPU usage

    $ nvidia-smi
  • Periodically watch the GPU usage:

    # watch the gpu usage every 10 secs
    $ watch -n 10 nvidia-smi

    # --loop
    $ nvidia-smi -l

Log

Level   Level for Humans   Level Description
0       DEBUG              [Default] Print all messages
1       INFO               Filter out INFO messages
2       WARNING            Filter out INFO & WARNING messages
3       ERROR              Filter out all messages
# filter out INFO, WARNING and ERROR messages (set '2' to keep ERROR)
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'  # default '0'

References