StreamTable: Lazy-evaluating sequential rows

class carriage.StreamTable(iterable, *, pipeline=None)

StreamTable is similar to Stream but designed to work on Rows only.

classmethod count(start, step=1)

Create an infinite StreamTable of consecutive values, beginning at start and increasing by step

>>> StreamTable.count(3, 5).take(3).show()
|   count |
|---------|
|       3 |
|       8 |
|      13 |
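
To reason about what count yields, the same start/step sequence can be reproduced with the standard library. This is only a sketch of the semantics using itertools, not carriage's implementation; the take helper is a hypothetical stand-in for StreamTable.take.

```python
from itertools import count, islice


def take(iterable, n):
    """Return the first n items of a (possibly infinite) iterable as a list."""
    return list(islice(iterable, n))


# count(3, 5) never ends on its own; islice stops it after three items.
first_three = take(count(3, 5), 3)
print(first_three)  # [3, 8, 13]
```

Because nothing is computed until items are requested, an infinite source is safe as long as it is bounded before being consumed.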
classmethod cycle(iterable)

Create a StreamTable that cycles through an iterable

>>> StreamTable.cycle([1,2]).take(5).show()
|   cycle |
|---------|
|       1 |
|       2 |
|       1 |
|       2 |
|       1 |
explode(field)

Expand each row into multiple rows, one for each element in the field

>>> stb = StreamTable([Row(name='a', nums=[1,3,4]), Row(name='b', nums=[2, 1])])
>>> stb.explode('nums').show()
| name   |   nums |
|--------+--------|
| a      |      1 |
| a      |      3 |
| a      |      4 |
| b      |      2 |
| b      |      1 |
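
The row expansion shown above can be sketched in plain Python. This is an illustrative stand-in that uses dicts instead of Row objects and a hypothetical free function named explode; it is not how carriage implements the method.

```python
def explode(rows, field):
    """Yield one copy of each row per element of row[field]."""
    for row in rows:
        for element in row[field]:
            # Copy the row, replacing the list with a single element.
            yield {**row, field: element}


rows = [{"name": "a", "nums": [1, 3, 4]}, {"name": "b", "nums": [2, 1]}]
exploded = list(explode(rows, "nums"))
for row in exploded:
    print(row)
```

In this sketch a row whose field is an empty list produces no output rows; whether StreamTable.explode behaves the same way is not stated here.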
classmethod from_dataframe(df, with_index=False)

Create from a pandas DataFrame

>>> import pandas as pd
>>> df = pd.DataFrame([(0, 1), (2, 3)], columns=['a', 'b'])
>>> StreamTable.from_dataframe(df).show()
|   a |   b |
|-----+-----|
|   0 |   1 |
|   2 |   3 |
Parameters:
  • df (pandas.DataFrame) – source DataFrame
  • with_index (bool) – include index value or not
Returns:
Return type:StreamTable

classmethod from_tuples(tuples, fields=None)

Create from an iterable of tuples

>>> StreamTable.from_tuples([(1, 2), (3, 4)], fields=('x', 'y')).show()
|   x |   y |
|-----+-----|
|   1 |   2 |
|   3 |   4 |
Parameters:
  • tuples (Iterable[tuple]) – data
  • fields (Tuple[str]) – field names
classmethod iterate(func, x)

Create a StreamTable by repeatedly applying a function to the last return value, starting from x.

>>> def multiply2(x): return x * 2
>>> StreamTable.iterate(multiply2, 3).take(4).show()
|   iterate |
|-----------|
|         3 |
|         6 |
|        12 |
|        24 |
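
The recurrence behind iterate can be written as a small generator. This is a sketch of the semantics only, with a hypothetical free function named iterate; it yields the seed first and then each successive result, matching the table above.

```python
from itertools import islice


def iterate(func, x):
    """Yield x, func(x), func(func(x)), ... without end."""
    while True:
        yield x
        x = func(x)


def multiply2(x):
    return x * 2


# The generator is infinite, so bound it before materializing.
first_four = list(islice(iterate(multiply2, 3), 4))
print(first_four)  # [3, 6, 12, 24]
```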
map_fields(**field_funcs)

Add or replace fields by applying a function to each row

>>> from carriage import Row, X
>>> st = StreamTable([Row(x=3, y=4), Row(x=-1, y=2)])
>>> st.map_fields(z=X.x + X.y).to_list()
[Row(x=3, y=4, z=7), Row(x=-1, y=2, z=1)]
Parameters:**field_funcs (Map[field_name, Function]) – Each function will be evaluated with the current row as the only argument, and the return value will be the new value of the field.
Returns:
Return type:StreamTable
classmethod range(start, end=None, step=1)

Create a StreamTable from range

>>> StreamTable.range(1, 10, 3).show()
|   range |
|---------|
|       1 |
|       4 |
|       7 |
classmethod read_jsonl(path)

Create from a jsonlines file

>>> StreamTable.read_jsonl('person.jsonl') 
|   name |   age |
|--------+-------|
|   john |    18 |
|   jane |    26 |
Parameters:path (str or path or file object) – path to the input file
classmethod repeat(elems, times=None)

Create a StreamTable repeating elems

>>> StreamTable.repeat(1, 3).show()
|   repeat |
|----------|
|        1 |
|        1 |
|        1 |
classmethod repeatedly(func, times=None)

Create a StreamTable by repeatedly calling a zero-parameter function

>>> def counter():
...     counter.num += 1
...     return counter.num
>>> counter.num = -1
>>> StreamTable.repeatedly(counter, 5).show()
|   repeatedly |
|--------------|
|            0 |
|            1 |
|            2 |
|            3 |
|            4 |
select(*fields, **field_funcs)

Keep only specified fields, and add/replace fields.

>>> from carriage import Row, X
>>> st = StreamTable([Row(x=3, y=4), Row(x=-1, y=2)])
>>> st.select('x', z=X.x + X.y, pi=3.14).to_list()
[Row(x=3, z=7, pi=3.14), Row(x=-1, z=1, pi=3.14)]
Parameters:
  • *fields (List[str]) – fields to keep
  • **field_funcs (Map[str, Function or scalar]) – If value is a function, this function will be evaluated with the current row as the only argument. If value is not callable, use the value directly.
Returns:
Return type:StreamTable
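
The callable-versus-scalar rule for field_funcs can be sketched with plain dicts. This is a hypothetical free function, not carriage's code, and it assumes each function sees the original row rather than a partially updated one.

```python
def select(rows, *fields, **field_funcs):
    """Keep the named fields, then add or replace fields from field_funcs."""
    for row in rows:
        new_row = {name: row[name] for name in fields}
        for name, value in field_funcs.items():
            # Callables are evaluated with the row; anything else is used as-is.
            new_row[name] = value(row) if callable(value) else value
        yield new_row


rows = [{"x": 3, "y": 4}, {"x": -1, "y": 2}]
selected = list(select(rows, "x", z=lambda r: r["x"] + r["y"], pi=3.14))
print(selected)
```

Here a lambda plays the role that the X expression plays in the example above.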

show(n=10)

Print rows

Parameters:n (int) – number of rows to show
tabulate(n=10, tablefmt='orgtbl')

Return a tabulate-formatted string

Parameters:
  • n (int) – number of rows to show
  • tablefmt (str) – output table format. All possible format strings are in StreamTable.tabulate.tablefmts
to_dataframe()

Convert to Pandas DataFrame

Returns:
Return type:pandas.DataFrame
to_stream()

Convert to Stream

Returns:
Return type:Stream
where(*conds, **kwconds)

Create a new StreamTable containing only the Rows that pass all conditions.

>>> from carriage import Row, X
>>> st = StreamTable([Row(x=3, y=4), Row(x=3, y=5), Row(x=4, y=5)])
>>> st.where(x=3).to_list()
[Row(x=3, y=4), Row(x=3, y=5)]
>>> st.where(X.y > 4).to_list()
[Row(x=3, y=5), Row(x=4, y=5)]
Returns:
Return type:StreamTable
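
How positional and keyword conditions combine can be sketched as follows. This is an illustrative function over dicts, not carriage's implementation; it assumes, based on the examples above, that keyword conditions are equality tests and positional conditions are predicates called with the row.

```python
def where(rows, *conds, **kwconds):
    """Yield only the rows that satisfy every condition."""
    for row in rows:
        passes_predicates = all(cond(row) for cond in conds)
        passes_equalities = all(row[k] == v for k, v in kwconds.items())
        if passes_predicates and passes_equalities:
            yield row


rows = [{"x": 3, "y": 4}, {"x": 3, "y": 5}, {"x": 4, "y": 5}]
by_keyword = list(where(rows, x=3))
by_predicate = list(where(rows, lambda r: r["y"] > 4))
both = list(where(rows, lambda r: r["y"] > 4, x=3))
print(by_keyword, by_predicate, both)
```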
write_jsonl(path)

Write into file in the format of jsonlines

>>> stb.write_jsonl('person.jsonl') 
Parameters:path (str or path or file object) – path to the output file
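
For readers unfamiliar with the jsonlines format, the round trip between write_jsonl and read_jsonl can be sketched with the standard library: one JSON object per line. These two functions are hypothetical stand-ins operating on dicts and a string path, not carriage's own code.

```python
import json
import os
import tempfile


def write_jsonl(rows, path):
    """Write each row as one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row) + "\n")


def read_jsonl(path):
    """Lazily yield one dict per non-empty line."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)


people = [{"name": "john", "age": 18}, {"name": "jane", "age": 26}]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "person.jsonl")
    write_jsonl(people, path)
    # Consume the generator while the temporary file still exists.
    loaded = list(read_jsonl(path))
print(loaded)
```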