StreamTable: Lazy-evaluating sequential rows
-
class carriage.StreamTable(iterable, *, pipeline=None)
StreamTable is similar to Stream, but it is designed to work on Rows only.
-
classmethod count(start, step=1)
Create an infinite StreamTable of consecutive values, beginning at start and advancing by step.
>>> StreamTable.count(3, 5).take(3).show()
|   count |
|---------|
|       3 |
|       8 |
|      13 |
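The values in the doctest above match what the standard library's `itertools.count` and `itertools.islice` produce. The sketch below uses only the standard library to illustrate the lazy behavior; it is an analogy, not carriage's implementation.

```python
from itertools import count, islice

# An infinite arithmetic sequence: nothing is computed until values are requested.
numbers = count(3, 5)

# Taking the first three values forces only three steps of the sequence.
first_three = list(islice(numbers, 3))
print(first_three)  # [3, 8, 13]
```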
-
classmethod cycle(iterable)
Create a StreamTable that cycles through an iterable indefinitely.
>>> StreamTable.cycle([1,2]).take(5).show()
|   cycle |
|---------|
|       1 |
|       2 |
|       1 |
|       2 |
|       1 |
-
explode(field)
Expand each row into multiple rows, one for each element in the field.
>>> stb = StreamTable([Row(name='a', nums=[1,3,4]), Row(name='b', nums=[2, 1])])
>>> stb.explode('nums').show()
| name   |   nums |
|--------+--------|
| a      |      1 |
| a      |      3 |
| a      |      4 |
| b      |      2 |
| b      |      1 |
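The expansion shown above can be sketched in plain Python, using dictionaries as stand-ins for Rows. The helper name `explode` and the dict representation are assumptions made for illustration, not carriage's implementation. In this sketch a row whose field is empty produces no output rows; whether carriage behaves the same way is not stated here.

```python
def explode(rows, field):
    """Yield one copy of each row per element of row[field]."""
    for row in rows:
        for element in row[field]:
            # Copy the row and replace the list-valued field with a single element.
            yield {**row, field: element}

rows = [{"name": "a", "nums": [1, 3, 4]}, {"name": "b", "nums": [2, 1]}]
exploded = list(explode(rows, "nums"))
print(exploded)
```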
-
classmethod from_dataframe(df, with_index=False)
Create a StreamTable from a Pandas DataFrame.
>>> import pandas as pd
>>> df = pd.DataFrame([(0, 1), (2, 3)], columns=['a', 'b'])
>>> StreamTable.from_dataframe(df).show()
|   a |   b |
|-----+-----|
|   0 |   1 |
|   2 |   3 |
Parameters:
- df (pandas.DataFrame) – source DataFrame
- with_index (bool) – whether to include the index values
Return type: StreamTable
-
classmethod from_tuples(tuples, fields=None)
Create a StreamTable from an iterable of tuples.
>>> StreamTable.from_tuples([(1, 2), (3, 4)], fields=('x', 'y')).show()
|   x |   y |
|-----+-----|
|   1 |   2 |
|   3 |   4 |
Parameters:
- tuples (Iterable[tuple]) – data
- fields (Tuple[str]) – field names
-
classmethod iterate(func, x)
Create a StreamTable by repeatedly applying a function to its previous return value, starting from x.
>>> def multiply2(x): return x * 2
>>> StreamTable.iterate(multiply2, 3).take(4).show()
|   iterate |
|-----------|
|         3 |
|         6 |
|        12 |
|        24 |
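The sequence above (x, func(x), func(func(x)), and so on) can be written as a plain Python generator. The generator below is a hypothetical stand-in named `iterate`; it illustrates the recurrence under that assumption and is not carriage's implementation.

```python
from itertools import islice

def iterate(func, x):
    """Yield x, func(x), func(func(x)), ... without ever terminating."""
    while True:
        yield x
        x = func(x)

def multiply2(x):
    return x * 2

# islice bounds the infinite generator, much like take(4) in the doctest.
first_four = list(islice(iterate(multiply2, 3), 4))
print(first_four)  # [3, 6, 12, 24]
```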
-
map_fields(**field_funcs)
Add or replace fields by applying a function to each row.
>>> from carriage import Row, X
>>> st = StreamTable([Row(x=3, y=4), Row(x=-1, y=2)])
>>> st.map_fields(z=X.x + X.y).to_list()
[Row(x=3, y=4, z=7), Row(x=-1, y=2, z=1)]
Parameters: **field_funcs (Map[field_name, Function]) – Each function is evaluated with the current row as its only argument, and its return value becomes the new value of the field.
Return type: StreamTable
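A minimal plain-Python sketch of the same idea follows, with dictionaries standing in for Rows and a lambda standing in for the `X` expression. The helper `map_fields` here is hypothetical; it also assumes every function sees the original row rather than a partially updated one, which the description above does not specify.

```python
def map_fields(rows, **field_funcs):
    """Lazily add or replace fields; each function receives the whole row."""
    for row in rows:
        # Evaluate every function against the original row before merging.
        new_values = {name: func(row) for name, func in field_funcs.items()}
        yield {**row, **new_values}

rows = [{"x": 3, "y": 4}, {"x": -1, "y": 2}]
mapped = list(map_fields(rows, z=lambda r: r["x"] + r["y"]))
print(mapped)  # [{'x': 3, 'y': 4, 'z': 7}, {'x': -1, 'y': 2, 'z': 1}]
```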
-
classmethod range(start, end=None, step=1)
Create a StreamTable from a range.
>>> StreamTable.range(1, 10, 3).show()
|   range |
|---------|
|       1 |
|       4 |
|       7 |
-
classmethod read_jsonl(path)
Create a StreamTable from a jsonlines file.
>>> StreamTable.read_jsonl('person.jsonl')
| name   |   age |
|--------+-------|
| john   |    18 |
| jane   |    26 |
Parameters: path (str or path or file object) – path to the input file
-
classmethod repeat(elems, times=None)
Create a StreamTable that repeats elems.
>>> StreamTable.repeat(1, 3).show()
|   repeat |
|----------|
|        1 |
|        1 |
|        1 |
-
classmethod repeatedly(func, times=None)
Create a StreamTable by repeatedly calling a zero-parameter function.
>>> def counter():
...     counter.num += 1
...     return counter.num
>>> counter.num = -1
>>> StreamTable.repeatedly(counter, 5).show()
|   repeatedly |
|--------------|
|            0 |
|            1 |
|            2 |
|            3 |
|            4 |
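The behavior above can be sketched as a generator that calls a zero-argument function on demand. The helper `repeatedly` below is hypothetical, and treating `times=None` as "call forever" is an assumption based on the signature, not something the description confirms.

```python
from itertools import count, islice

def repeatedly(func, times=None):
    """Yield func() forever, or `times` times when a limit is given."""
    calls = count() if times is None else range(times)
    for _ in calls:
        yield func()

ticker = count()  # stateful source: 0, 1, 2, ...
limited = list(repeatedly(lambda: next(ticker), 5))
print(limited)  # [0, 1, 2, 3, 4]

# Without a limit the generator is infinite, so it must be bounded by the consumer.
unbounded = list(islice(repeatedly(lambda: "x"), 2))
print(unbounded)  # ['x', 'x']
```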
-
select(*fields, **field_funcs)
Keep only the specified fields, and add or replace fields.
>>> from carriage import Row, X
>>> st = StreamTable([Row(x=3, y=4), Row(x=-1, y=2)])
>>> st.select('x', z=X.x + X.y, pi=3.14).to_list()
[Row(x=3, z=7, pi=3.14), Row(x=-1, z=1, pi=3.14)]
Parameters:
- *fields (List[str]) – fields to keep
- **field_funcs (Map[str, Function or scalar]) – If the value is a function, it is evaluated with the current row as its only argument. If the value is not callable, it is used directly.
Return type: StreamTable
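The callable-versus-scalar rule for keyword arguments can be sketched in plain Python as follows. The helper `select` and the use of dictionaries in place of Rows are assumptions made for illustration, not carriage's implementation.

```python
def select(rows, *fields, **field_funcs):
    """Keep the named fields, then add fields computed from callables or scalars."""
    for row in rows:
        out = {name: row[name] for name in fields}
        for name, value in field_funcs.items():
            # Callables are applied to the original row; other values are used as-is.
            out[name] = value(row) if callable(value) else value
        yield out

rows = [{"x": 3, "y": 4}, {"x": -1, "y": 2}]
selected = list(select(rows, "x", z=lambda r: r["x"] + r["y"], pi=3.14))
print(selected)
```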
-
show(n=10)
Print rows.
Parameters: n (int) – number of rows to show
-
tabulate(n=10, tablefmt='orgtbl')
Return a tabulate-formatted string.
Parameters:
- n (int) – number of rows to show
- tablefmt (str) – output table format. All possible format strings are listed in StreamTable.tabulate.tablefmts.
-
to_dataframe()
Convert to a Pandas DataFrame.
Return type: pandas.DataFrame
-
where(*conds, **kwconds)
Create a new StreamTable containing only the Rows that pass all conditions.
>>> from carriage import Row, X
>>> st = StreamTable([Row(x=3, y=4), Row(x=3, y=5), Row(x=4, y=5)])
>>> st.where(x=3).to_list()
[Row(x=3, y=4), Row(x=3, y=5)]
>>> st.where(X.y > 4).to_list()
[Row(x=3, y=5), Row(x=4, y=5)]
Return type: StreamTable
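The two kinds of condition in the doctest can be sketched in plain Python: positional arguments as predicates over the row, keyword arguments as equality checks on a field. The helper `where` below is hypothetical and uses dictionaries and lambdas in place of Rows and `X` expressions; it is an illustration, not carriage's implementation.

```python
def where(rows, *conds, **kwconds):
    """Lazily keep rows that satisfy every predicate and every field equality."""
    for row in rows:
        passes_predicates = all(cond(row) for cond in conds)
        passes_equalities = all(row[key] == value for key, value in kwconds.items())
        if passes_predicates and passes_equalities:
            yield row

rows = [{"x": 3, "y": 4}, {"x": 3, "y": 5}, {"x": 4, "y": 5}]
by_keyword = list(where(rows, x=3))
by_predicate = list(where(rows, lambda r: r["y"] > 4))
combined = list(where(rows, lambda r: r["y"] > 4, x=3))
print(by_keyword, by_predicate, combined)
```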
-
write_jsonl(path)
Write the rows to a file in jsonlines format.
>>> stb.write_jsonl('person.jsonl')
Parameters: path (str or path or file object) – path to the output file
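For reference, a jsonlines file stores one JSON object per line. The sketch below round-trips the sample records from the read_jsonl example through an in-memory buffer using only the standard library; it shows the file format, not how carriage reads or writes it.

```python
import io
import json

people = [{"name": "john", "age": 18}, {"name": "jane", "age": 26}]

# Write: one JSON document per line, each terminated by a newline.
buffer = io.StringIO()
for person in people:
    buffer.write(json.dumps(person) + "\n")

# Read: parse each non-empty line independently.
buffer.seek(0)
loaded = [json.loads(line) for line in buffer if line.strip()]
print(loaded == people)  # True
```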
-
classmethod